## Webscraping to Extract Tesla Revenue Data


I will use the requests library to download the webpage https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue, parse the HTML with BeautifulSoup, and load the quarterly revenue table into a pandas DataFrame for further analysis.


In [11]:
import pandas as pd
from bs4 import BeautifulSoup 
import requests 

In [15]:
url='https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue/'
html_data= requests.get(url).text
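
The plain `requests.get(url).text` call above returns whatever the server sends back, including error pages. A slightly more defensive version, as a minimal sketch: the helper name `fetch_html`, the `HEADERS` value and the 10-second timeout are my own assumptions, not something the site documents or requires.

```python
# Assumption: an illustrative User-Agent string, not one the site requires.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; tesla-revenue-notebook)"}


def fetch_html(url, client=None, timeout=10):
    """Download a page and return its body as text, raising on HTTP errors."""
    if client is None:
        # Imported here so the helper can also be exercised with a stand-in client.
        import requests
        client = requests
    response = client.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text
```

With this helper, `html_data = fetch_html(url)` could stand in for the call above; the rest of the notebook would stay the same.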

In [16]:
soup = BeautifulSoup(html_data, "html.parser")

In [18]:
#find all html tables in the web page
tables = soup.find_all('table')

In [19]:
# we can see how many tables were found by checking the length of the tables list
print('Number of tables: ',len(tables))

Number of tables:  6


In [20]:
print('All the tables: ', pd.read_html(str(tables), flavor='bs4'))

All the tables:  [    Tesla Annual Revenue(Millions of US $)  \
0                                     2021   
1                                     2020   
2                                     2019   
3                                     2018   
4                                     2017   
5                                     2016   
6                                     2015   
7                                     2014   
8                                     2013   
9                                     2012   
10                                    2011   
11                                    2010   
12                                    2009   

   Tesla Annual Revenue(Millions of US $).1  
0                                   $53,823  
1                                   $31,536  
2                                   $24,578  
3                                   $21,461  
4                                   $11,759  
5                                    $7,000  
6              

In [21]:
# Find the table I want by its title, e.g. "Tesla Quarterly Revenue"
table_index = None  # stays None if no table matches the title
for index, table in enumerate(tables):
    if "Tesla Quarterly Revenue" in str(table):
        table_index = index
print('identifier number of the table I am searching: ',table_index)


identifier number of the table I am searching:  1
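
The search above can also be wrapped in a small function. This is only a sketch: `find_table_index` is a hypothetical helper name, and unlike the loop above it returns the first match rather than the last and raises an error when nothing matches. The two sample strings are made-up stand-ins for the parsed tables.

```python
def find_table_index(tables, title):
    """Return the index of the first table whose HTML contains `title`."""
    for index, table in enumerate(tables):
        if title in str(table):
            return index
    raise ValueError(f"No table containing {title!r} was found")


# Made-up stand-ins for the BeautifulSoup table objects.
sample_tables = [
    "<table><th>Tesla Annual Revenue</th></table>",
    "<table><th>Tesla Quarterly Revenue</th></table>",
]
print(find_table_index(sample_tables, "Tesla Quarterly Revenue"))  # prints 1
```

Raising an error makes a changed page layout visible immediately, instead of failing later when the index is used.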


In [22]:
# Take a look at the matching table:
print('Table that I searched: ', pd.read_html(str(tables[table_index]), flavor='bs4'))

Table that I searched:  [   Tesla Quarterly Revenue(Millions of US $)  \
0                                 2022-09-30   
1                                 2022-06-30   
2                                 2022-03-31   
3                                 2021-12-31   
4                                 2021-09-30   
5                                 2021-06-30   
6                                 2021-03-31   
7                                 2020-12-31   
8                                 2020-09-30   
9                                 2020-06-30   
10                                2020-03-31   
11                                2019-12-31   
12                                2019-09-30   
13                                2019-06-30   
14                                2019-03-31   
15                                2018-12-31   
16                                2018-09-30   
17                                2018-06-30   
18                                2018-03-31   
19             

In [24]:
# Final pandas DataFrame for the table:
df = pd.read_html(str(tables[table_index]), flavor='bs4')[0]

df.head()

|    | Tesla Quarterly Revenue(Millions of US $) | Tesla Quarterly Revenue(Millions of US $).1 |
|---:|---|---|
| 0 | 2022-09-30 | $21,454 |
| 1 | 2022-06-30 | $16,934 |
| 2 | 2022-03-31 | $18,756 |
| 3 | 2021-12-31 | $17,719 |
| 4 | 2021-09-30 | $13,757 |


In [25]:
df=df.rename(columns={'Tesla Quarterly Revenue(Millions of US $)':'Date', 'Tesla Quarterly Revenue(Millions of US $).1':'Revenue'})
df.head()

|    | Date | Revenue |
|---:|---|---|
| 0 | 2022-09-30 | $21,454 |
| 1 | 2022-06-30 | $16,934 |
| 2 | 2022-03-31 | $18,756 |
| 3 | 2021-12-31 | $17,719 |
| 4 | 2021-09-30 | $13,757 |


I will remove the commas and dollar signs from the Revenue column.


In [26]:
df["Revenue"] = df['Revenue'].str.replace(r',|\$', "", regex=True)


I will drop rows containing null values, along with rows whose Revenue is an empty string.

In [27]:
df.dropna(inplace=True)

df = df[df['Revenue'] != ""]
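
The cleaning steps above can be collected into one function and tried on a few rows. This is a sketch under my own assumptions: `clean_revenue` is a hypothetical name, the third sample row is invented to show a missing value, and the function only checks the Revenue column for nulls, whereas `df.dropna()` above checks every column.

```python
import pandas as pd


def clean_revenue(frame):
    """Strip '$' and ',' from Revenue, then drop rows with null or empty Revenue."""
    cleaned = frame.copy()
    cleaned["Revenue"] = cleaned["Revenue"].str.replace(r"[$,]", "", regex=True)
    cleaned = cleaned.dropna(subset=["Revenue"])
    cleaned = cleaned[cleaned["Revenue"] != ""]
    return cleaned.reset_index(drop=True)


# The first two rows come from the table above; the last row is invented.
sample = pd.DataFrame({
    "Date": ["2022-09-30", "2022-06-30", "2009-06-30"],
    "Revenue": ["$21,454", "$16,934", None],
})
cleaned_sample = clean_revenue(sample)
print(cleaned_sample)
```

The values are still strings after this step; `pd.to_numeric(cleaned_sample["Revenue"])` would convert them to numbers if the later analysis needs arithmetic.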

In [28]:
df.head()

|    | Date | Revenue |
|---:|---|---|
| 0 | 2022-09-30 | 21454 |
| 1 | 2022-06-30 | 16934 |
| 2 | 2022-03-31 | 18756 |
| 3 | 2021-12-31 | 17719 |
| 4 | 2021-09-30 | 13757 |
