**Data extraction from [Yahoo Finance](https://finance.yahoo.com/) using Python and web scraping. The data comprise daily prices and historical records for the shares of seven prominent global companies: Apple, IBM, Google, Amazon, Meta, Tesla, and Microsoft.**

In [None]:
# Import libraries
from datetime import datetime
import pandas as pd

In [None]:
# Data extraction function

def data_extraction(start_date, end_date):
  # Company names mapped to their ticker symbols
  dic_com={"Apple":"AAPL", "IBM":"IBM", "Google":"GOOG", "Meta":"META", "Amazon":"AMZN", "Tesla": "TSLA", "Microsoft":"MSFT"}
  df_list=[]

  # Download the daily price history of each company as CSV
  for ticker in dic_com.values():
    url=f"https://query1.finance.yahoo.com/v7/finance/download/{ticker}?period1={start_date}&period2={end_date}&interval=1d&events=history&includeAdjustedClose=true"
    df=pd.read_csv(url)
    df["id_company"]=ticker  # Tag each row with its ticker symbol
    df_list.append(df)
  # Combine all companies into one table ordered by date
  df=pd.concat(df_list).sort_values(by="Date")

  return df.reset_index(drop=True)
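
The URL inside `data_extraction` is assembled with an f-string. As a minimal sketch of the same construction, the query string can also be built from a dictionary with `urllib.parse.urlencode`, which makes each parameter visible. The helper name `build_download_url` and the constant `BASE_URL` are hypothetical and not part of the notebook; the two timestamps are the UTC midnight values for 2013-01-01 and 2023-09-19.

```python
from urllib.parse import urlencode

# Base of the download endpoint used in data_extraction (hypothetical constant name)
BASE_URL = "https://query1.finance.yahoo.com/v7/finance/download"


def build_download_url(ticker, start_ts, end_ts):
    """Return the CSV download URL for one ticker (hypothetical helper)."""
    params = {
        "period1": start_ts,             # start of the range, UNIX seconds
        "period2": end_ts,               # end of the range, UNIX seconds
        "interval": "1d",                # one row per trading day
        "events": "history",
        "includeAdjustedClose": "true",
    }
    return f"{BASE_URL}/{ticker}?{urlencode(params)}"


url = build_download_url("AAPL", 1356998400, 1695081600)
print(url)
```

The sketch only builds the string and makes no network request, so it can be checked without access to the endpoint.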

In [None]:
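# Current date and time in UTC (datetime.now() without a tz argument returns local time)
from datetime import timezone
datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")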

'2023-09-19 20:35:47'

In [None]:
# Get UNIX timestamps for the dates
start_date=int(pd.to_datetime("2013-01-01").timestamp())
end_date=int(pd.to_datetime("2023-09-19").timestamp()) # Ends at '2023-09-19 00:00:00' so that only complete trading days are included

df=data_extraction(start_date, end_date)
df

,Date,Open,High,Low,Close,Adj Close,Volume,id_company
0,2013-01-02,19.779285,19.821428,19.343929,19.608213,16.791185,560518000,AAPL
1,2013-01-02,12.804000,12.905000,12.663000,12.865500,12.865500,65420000,AMZN
2,2013-01-02,27.440001,28.180000,27.420000,28.000000,28.000000,69846400,META
3,2013-01-02,2.333333,2.363333,2.314000,2.357333,2.357333,17922000,TSLA
4,2013-01-02,27.250000,27.730000,27.150000,27.620001,22.620338,52899300,MSFT
...,...,...,...,...,...,...,...,...
18867,2023-09-18,137.630005,139.929993,137.630005,138.960007,138.960007,16233600,GOOG
18868,2023-09-18,140.479996,141.750000,139.220001,139.979996,139.979996,42823500,AMZN
18869,2023-09-18,145.770004,146.479996,145.059998,145.089996,145.089996,2508100,IBM
18870,2023-09-18,298.190002,303.600006,297.799988,302.549988,302.549988,14234200,META
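
The date-to-timestamp step can be reproduced with the standard library alone. The sketch below assumes the dates are meant as midnight UTC, which is how pandas interprets a timezone-naive `Timestamp` when `.timestamp()` is called. The helper name `to_unix_utc` is hypothetical.

```python
from datetime import datetime, timezone


def to_unix_utc(date_str):
    """Convert a 'YYYY-MM-DD' string to UNIX seconds at midnight UTC."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


start_ts = to_unix_utc("2013-01-01")
end_ts = to_unix_utc("2023-09-19")

# Number of calendar days covered by the requested range
days_covered = (end_ts - start_ts) // 86400

print(start_ts, end_ts, days_covered)
```

Because the end timestamp falls at the very start of 2023-09-19, the last row in the output above is dated 2023-09-18.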


In [None]:
# Dataset information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18872 entries, 0 to 18871
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Date        18872 non-null  object 
 1   Open        18872 non-null  float64
 2   High        18872 non-null  float64
 3   Low         18872 non-null  float64
 4   Close       18872 non-null  float64
 5   Adj Close   18872 non-null  float64
 6   Volume      18872 non-null  int64  
 7   id_company  18872 non-null  object 
dtypes: float64(5), int64(1), object(2)
memory usage: 1.2+ MB
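
The summary above lists `Date` with dtype `object`, meaning the dates are stored as plain strings. The sketch below shows one way to parse them and to count rows per ticker. It uses a three-row stand-in copied from the output shown earlier, not the full dataset, and the name `sample` is hypothetical. The 18872 rows divide evenly by seven (2696 each), which is what one would expect if every ticker covers the same trading days, although the summary alone does not confirm it.

```python
import pandas as pd

# Three rows copied from the notebook output, as a small stand-in for df
sample = pd.DataFrame({
    "Date": ["2013-01-02", "2013-01-02", "2023-09-18"],
    "Close": [19.608213, 12.865500, 139.979996],
    "id_company": ["AAPL", "AMZN", "AMZN"],
})

# Parse the text dates into datetime64 values
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d")

# Rows per ticker: a quick completeness check
rows_per_company = sample.groupby("id_company").size()

print(sample.dtypes)
print(rows_per_company)
```

On the full DataFrame, the same `groupby("id_company").size()` call would show whether each company contributes the same number of rows.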


In [None]:
# Export the unified dataset to CSV
df.to_csv("datasets_company.csv", index=False)
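
The export passes `index=False`. As a small illustration of why that matters, the sketch below writes a two-row stand-in to an in-memory buffer with and without the index and reads it back. The names `sample` and `round_trip` are hypothetical, and no file is written to disk.

```python
import io

import pandas as pd

# Two rows copied from the notebook output, as a stand-in for df
sample = pd.DataFrame({
    "Date": ["2013-01-02", "2023-09-18"],
    "Close": [19.608213, 139.979996],
    "id_company": ["AAPL", "AMZN"],
})


def round_trip(frame, **to_csv_kwargs):
    """Write frame to an in-memory CSV and read it back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = round_trip(sample)                   # default: index is written
without_index = round_trip(sample, index=False)   # as in the notebook

print(list(with_index.columns))
print(list(without_index.columns))
```

With the default setting, the row index is saved as an unnamed first column, which `read_csv` then labels `Unnamed: 0`. With `index=False` the file contains only the data columns.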