In [1]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

1. `yfinance`: A Python library for downloading historical market data from Yahoo Finance. It is used to retrieve data on stocks, options, futures, and more.

2. `pandas`: A Python library for data manipulation and analysis. It provides the data structures and functions needed to work with large, complex datasets.

3. `requests`: A library for making HTTP requests in Python. It can be used to interact with web services.

4. `bs4 (BeautifulSoup)`: A Python library for extracting data from HTML and XML files. It is used for web scraping, the process of collecting data from websites.

5. `plotly.graph_objects`: A Plotly module for building interactive charts. It provides the figure and trace classes, such as `Scatter`, used to draw the charts in this notebook.

6. `plotly.subplots`: A Plotly module for creating subplots, that is, several charts arranged in a single figure.

In Python, warnings can be suppressed with the `warnings` module. Its `filterwarnings` function filters or ignores specific warning messages or warning categories.


In [2]:
# Ignore FutureWarning messages only
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
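The filter above applies to the whole session. As a minimal sketch of a narrower alternative, `warnings.catch_warnings` limits the suppression to a single block and restores the previous filters on exit. The `noisy` function is a hypothetical stand-in for a library call that emits a `FutureWarning`.

```python
import warnings

def noisy():
    """Hypothetical stand-in for a library call that emits a FutureWarning."""
    warnings.warn("this behaviour will change", FutureWarning)
    return 42

# Inside this block FutureWarning is ignored; record=True collects whatever
# still gets through, so the list should stay empty.
with warnings.catch_warnings(record=True) as caught_inside:
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy()

# In a fresh block with the "always" filter, the same call is reported again.
with warnings.catch_warnings(record=True) as caught_outside:
    warnings.simplefilter("always")
    noisy()

print(result, len(caught_inside), len(caught_outside))
```

Scoping the filter this way keeps unrelated warnings visible in the rest of the notebook.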

Next we define a function that will be used later to plot the share price and revenue data for a stock.

In [3]:
def make_graph(stock_data, revenue_data, stock):
    # Two stacked panels: share price on top, revenue below
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing=.3)
    # Keep only the rows up to the cut-off dates
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()
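The date cut-offs in `make_graph` compare the `Date` column against a string. When `Date` is itself a string column, as in the scraped revenue tables, that comparison is lexicographic and matches chronological order only because the dates are in ISO `YYYY-MM-DD` form. A minimal sketch of a more explicit variant, using a small hypothetical frame (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical revenue frame with dates stored as strings
revenue = pd.DataFrame(
    {
        "Date": ["2021-01-31", "2021-04-30", "2021-07-31"],
        "Revenue": ["100", "200", "300"],
    }
)

# Convert to real timestamps first, then compare against a Timestamp cut-off
cutoff = pd.Timestamp("2021-04-30")
dates = pd.to_datetime(revenue["Date"])
subset = revenue[dates <= cutoff]

print(subset)
```

Converting before comparing makes the filter independent of how the date strings happen to be formatted.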

# Working through the questions

## Question 1: Use yfinance to Extract Stock Data


Using the `Ticker` function enter the ticker symbol of the stock we want to extract data on to create a ticker object. The stock is Tesla and its ticker symbol is `TSLA`.


In [4]:
# Create a ticker object for Tesla
tesla = yf.Ticker("TSLA")

Using the ticker object and the function `history` extract stock information and save it in a dataframe named `tesla_data`. Set the `period` parameter to `max` so we get information for the maximum amount of time.


In [5]:
tesla_data = tesla.history(period="max")

**Reset the index** using the `reset_index(inplace=True)` function on the tesla_data DataFrame and display the first five rows of the `tesla_data` dataframe using the `head` function. Take a screenshot of the results and code from the beginning of Question 1 to the results below.


In [6]:
tesla_data.reset_index(inplace=True)

In [7]:
# Display the downloaded data
tesla_data.head()

| | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010-06-29 00:00:00-04:00 | 1.266667 | 1.666667 | 1.169333 | 1.592667 | 281494500 | 0.0 | 0.0 |
| 1 | 2010-06-30 00:00:00-04:00 | 1.719333 | 2.028 | 1.553333 | 1.588667 | 257806500 | 0.0 | 0.0 |
| 2 | 2010-07-01 00:00:00-04:00 | 1.666667 | 1.728 | 1.351333 | 1.464 | 123282000 | 0.0 | 0.0 |
| 3 | 2010-07-02 00:00:00-04:00 | 1.533333 | 1.54 | 1.247333 | 1.28 | 77097000 | 0.0 | 0.0 |
| 4 | 2010-07-06 00:00:00-04:00 | 1.333333 | 1.333333 | 1.055333 | 1.074 | 103003500 | 0.0 | 0.0 |
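To see what `reset_index(inplace=True)` does without downloading anything, here is a minimal sketch on a hypothetical three-row frame shaped like the output of `history`, with the dates in the index. The prices are rounded placeholders, not real quotes.

```python
import pandas as pd

# Hypothetical frame: dates live in the index, as they do after `history`
prices = pd.DataFrame(
    {"Close": [1.59, 1.58, 1.46]},
    index=pd.DatetimeIndex(["2010-06-29", "2010-06-30", "2010-07-01"], name="Date"),
)

before = list(prices.columns)
prices.reset_index(inplace=True)  # moves the "Date" index into a regular column
after = list(prices.columns)

print(before)
print(after)
print(list(prices.index))
```

After the call, `Date` is an ordinary column and the index is a plain 0-based range, which is what lets `make_graph` refer to `stock_data.Date`.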


## Question 2: Use Webscraping to Extract Tesla Revenue Data


Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm. Save the text of the response as a variable named `html_data`.


In [8]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm"
html_data = requests.get(URL).text

Parse the HTML data using `BeautifulSoup`.


In [9]:
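# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html_data, 'html5lib')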

In [45]:
#We need to identify how many tables there are on the page
tables = soup.find_all('table')
print('Tables count: ', len(tables))

#Then we need to find which table we need
table_index = 0
for index,table in enumerate(tables):
    if ("Tesla Quarterly Revenue" in str(table)):
        table_index = index

print('Table needed at index: ',table_index)

Tables count:  6
Table needed at index:  1
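The loop above starts from `table_index = 0`, so a page with no matching table would silently select the first one, and with several matches the last one wins. A minimal sketch of a stricter lookup, written against plain strings that stand in for `str(table)` (the table contents and the `find_table_index` helper are hypothetical):

```python
# Stand-ins for str(table) of each <table> on the page (hypothetical content)
tables_as_text = [
    "<table>Tesla Annual Revenue</table>",
    "<table>Tesla Quarterly Revenue</table>",
    "<table>Sector comparison</table>",
]

def find_table_index(tables_as_text, needle):
    """Return the index of the first table whose text contains `needle`, else None."""
    return next(
        (i for i, text in enumerate(tables_as_text) if needle in text),
        None,
    )

print(find_table_index(tables_as_text, "Tesla Quarterly Revenue"))
print(find_table_index(tables_as_text, "GameStop Quarterly Revenue"))
```

Returning `None` for a missing table makes the failure visible instead of quietly parsing the wrong table.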


In [50]:
#Now we know that there are 6 tables but the one we need is at index 1
#We can parse the table to a bs4 object
table_parsed = soup.find_all('table')[table_index]

In [64]:
table_parsed.find('tbody').find_all('tr')[0].text.replace('\n','').split('$')

['2022-09-30', '21,454']
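The row text splits into a date and a revenue string on the `$` sign. As a sketch of that step in isolation, the hypothetical helper `parse_revenue_row` below strips whitespace, splits on `$`, and converts the revenue to a number. It assumes each row looks like the one shown above.

```python
def parse_revenue_row(row_text):
    """Split the text of one table row into (date, revenue), or return None.

    Assumes the row text looks like '2022-09-30$21,454' once whitespace is removed.
    """
    cleaned = "".join(row_text.split())  # drop newlines, tabs and spaces
    parts = cleaned.split("$")
    if len(parts) < 2 or not parts[1]:
        return None  # the row has no revenue value
    return parts[0], float(parts[1].replace(",", ""))

print(parse_revenue_row("\n2022-09-30\n$21,454\n"))
print(parse_revenue_row("\n2009-06-30\n\n"))
```

Keeping the parsing in one small function makes it easy to check the edge cases, such as rows with an empty revenue cell, before looping over the whole table.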

Using `BeautifulSoup` or the `read_html` function extract the table with `Tesla Revenue` and store it into a dataframe named `tesla_revenue`. The dataframe should have columns `Date` and `Revenue`.


In [66]:
# Use table_parsed to extract the revenue table into a dataframe called tesla_revenue.
rows = []
for row in table_parsed.find("tbody").find_all('tr'):
    current_row = row.text.replace('\n', '').split('$')
    # Some rows have no revenue value, so only keep rows that have both columns
    if len(current_row) >= 2:
        date = current_row[0]
        # Drop the thousands separator; splitting on '$' already removed the dollar sign
        rev = current_row[1].replace(',', '')
        rows.append((date, rev))

# Build the dataframe once instead of concatenating inside the loop
tesla_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])


In [69]:
tesla_revenue.tail()

| | Date | Revenue |
|---|---|---|
| 48 | 2010-09-30 | 31 |
| 49 | 2010-06-30 | 28 |
| 50 | 2010-03-31 | 21 |
| 51 | 2009-09-30 | 46 |
| 52 | 2009-06-30 | 27 |
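The `Revenue` column is built from scraped text, so its values are strings, and `make_graph` later casts them with `astype("float")`. A minimal sketch of cleaning such a column in one pass with pandas string methods, using a small hypothetical frame that still contains dollar signs, commas, and one empty cell:

```python
import pandas as pd

# Hypothetical raw frame, as it might look straight after scraping
raw = pd.DataFrame(
    {
        "Date": ["2022-09-30", "2022-06-30", "2009-06-30"],
        "Revenue": ["$21,454", "$16,934", ""],
    }
)

cleaned = raw.copy()
# Remove dollar signs and thousands separators in a single regex pass
cleaned["Revenue"] = cleaned["Revenue"].str.replace(r"[$,]", "", regex=True)
# Empty strings cannot be parsed and become NaN
cleaned["Revenue"] = pd.to_numeric(cleaned["Revenue"], errors="coerce")
# Drop the rows that had no revenue value
cleaned = cleaned.dropna(subset=["Revenue"]).reset_index(drop=True)

print(cleaned)
```

Converting to a numeric dtype up front means any malformed value shows up here rather than as an error inside the plotting function.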


## Question 3: Use yfinance to Extract Stock Data


Using the `Ticker` function enter the ticker symbol of the stock we want to extract data on to create a ticker object. The stock is GameStop and its ticker symbol is `GME`.


In [70]:
#Create a ticker object for GameStop
gme = yf.Ticker("GME")

Using the ticker object and the function `history` extract stock information and save it in a dataframe named `gme_data`. Set the `period` parameter to `max` so we get information for the maximum amount of time.


In [71]:
#Download the GameStop data into a dataframe called gme_data
gme_data = gme.history(period="max")

**Reset the index** using the `reset_index(inplace=True)` function on the gme_data DataFrame and display the first five rows of the `gme_data` dataframe using the `head` function. Take a screenshot of the results and code from the beginning of Question 3 to the results below.


In [72]:
#Reset the index of the dataframe
gme_data.reset_index(inplace=True)
#Display the first five rows of the dataframe
gme_data.head()

| | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2002-02-13 00:00:00-05:00 | 1.620128 | 1.69335 | 1.603296 | 1.691666 | 76216000 | 0.0 | 0.0 |
| 1 | 2002-02-14 00:00:00-05:00 | 1.712707 | 1.716074 | 1.670626 | 1.683251 | 11021600 | 0.0 | 0.0 |
| 2 | 2002-02-15 00:00:00-05:00 | 1.683251 | 1.687459 | 1.658002 | 1.674834 | 8389600 | 0.0 | 0.0 |
| 3 | 2002-02-19 00:00:00-05:00 | 1.666417 | 1.666417 | 1.578047 | 1.607504 | 7410400 | 0.0 | 0.0 |
| 4 | 2002-02-20 00:00:00-05:00 | 1.615921 | 1.66221 | 1.603296 | 1.66221 | 6892800 | 0.0 | 0.0 |


## Question 4: Use Webscraping to Extract GME Revenue Data


Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html. Save the text of the response as a variable named `html_data`.


In [73]:
#Scrape the html at https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html"
html_data = requests.get(url).text

Parse the HTML data using `BeautifulSoup`.


In [74]:
soup = BeautifulSoup(html_data, 'html5lib')

Using `BeautifulSoup` or the `read_html` function, extract the table with `GameStop Revenue` and store it in a dataframe named `gme_revenue`. The dataframe should have columns `Date` and `Revenue`. Make sure the comma and dollar sign are removed from the `Revenue` column using a method similar to the one used in Question 2.


In [75]:
#We need to identify how many tables there are on the page
tables = soup.find_all('table')
print('Tables count: ', len(tables))

#Then we need to find which table we need
table_index = 0
for index,table in enumerate(tables):
    if ("GameStop Quarterly Revenue" in str(table)):
        table_index = index

print('Table needed at index: ',table_index)

Tables count:  6
Table needed at index:  1


In [85]:
#Now we know that there are 6 tables but the one we need is at index 1
#We can parse the table to a bs4 object
table_parsed = soup.find_all('table')[table_index]
table_parsed.find('tbody').find_all('tr')[0].text.replace('\n','').replace('\t','').replace('  ','').split('$')

['2020-04-30', '1,021']

In [86]:
# Use table_parsed to extract the revenue table into a dataframe called gme_revenue.
rows = []
for row in table_parsed.find("tbody").find_all('tr'):
    current_row = row.text.replace('\n', '').replace('\t', '').replace('  ', '').split('$')
    # Some rows have no revenue value, so only keep rows that have both columns
    if len(current_row) >= 2:
        date = current_row[0]
        # Drop the thousands separator; splitting on '$' already removed the dollar sign
        rev = current_row[1].replace(',', '')
        rows.append((date, rev))

# Build the dataframe once instead of concatenating inside the loop
gme_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])

Display the last five rows of the `gme_revenue` dataframe using the `tail` function. Take a screenshot of the results.


In [87]:
gme_revenue.tail()

| | Date | Revenue |
|---|---|---|
| 57 | 2006-01-31 | 1667 |
| 58 | 2005-10-31 | 534 |
| 59 | 2005-07-31 | 416 |
| 60 | 2005-04-30 | 475 |
| 61 | 2005-01-31 | 709 |
