## Python Pandas Working With HTML - Part 4

In this part, we are going to learn about:

1. Reading HTML (`read_html`)
2. Writing HTML (`to_html`)

In Part 4 of Python Pandas Working with HTML, we will explore two essential functions: `read_html` and `to_html`.

1. `read_html`: This function reads the tables in an HTML file, a URL, or a file-like object into pandas DataFrames. It parses every `<table>` element it finds and returns a list of DataFrames, one per table, even when the page contains only a single table. It relies on an HTML parser (`lxml`, or BeautifulSoup4 together with `html5lib`), so one of these must be installed.

Example usage of `read_html`:
```python
import pandas as pd

# Read tables from an HTML file
tables = pd.read_html('example.html')

# Access the first DataFrame (assuming there's at least one table in the HTML)
df = tables[0]
print(df.head())
```
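To try `read_html` without a file or an internet connection, you can feed it a string of HTML. The sketch below uses a small, made-up document with two tables. It wraps the string in `io.StringIO` because recent pandas versions (2.1 and later) deprecate passing literal HTML text directly, and it assumes `lxml` (or BeautifulSoup4 plus `html5lib`) is installed.

```python
from io import StringIO

import pandas as pd

# A small, made-up HTML document with two tables (illustrative data only)
html_doc = """
<html><body>
  <table>
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>Alice</td><td>25</td></tr>
    <tr><td>Bob</td><td>30</td></tr>
  </table>
  <table>
    <tr><th>City</th><th>Country</th></tr>
    <tr><td>Tokyo</td><td>Japan</td></tr>
  </table>
</body></html>
"""

# Wrap the text in StringIO instead of passing the raw string
tables = pd.read_html(StringIO(html_doc))

print(len(tables))                  # one DataFrame per <table> element
print(tables[0])                    # the all-<th> first row became the column labels
print(tables[1].columns.tolist())
```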

2. `to_html`: This DataFrame method renders a DataFrame as an HTML `<table>`. Called with no arguments, it returns the HTML as a string; if you pass a file path or a writable buffer as its first argument (`buf`), it writes the HTML there instead and returns `None`. This is useful when you want to embed a DataFrame in a web page or save it as an HTML file.

Example usage of `to_html`:
```python
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 28],
    'Occupation': ['Engineer', 'Manager', 'Analyst']
}
df = pd.DataFrame(data)

# Convert the DataFrame to an HTML table
html_table = df.to_html()

# Print the generated HTML string
print(html_table)
```
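`to_html` also accepts keyword arguments that control the generated markup. This sketch shows three of them: `index=False` to leave out the row labels, `classes` to add CSS classes to the `<table>` tag, and a file path as the first argument to write the HTML to disk. The class name `my-table` and the filename `people.html` are only examples.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

# index=False drops the row labels; classes adds CSS classes next to the default "dataframe"
html_table = df.to_html(index=False, classes="my-table")
print(html_table)

# Passing a path writes the HTML to that file; in this case to_html returns None
df.to_html("people.html", index=False)
```

Because `to_html` returns `None` when it writes to a path, keep the string-returning call separate if you need both the string and the file.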

Together, `read_html` and `to_html` cover both directions: `read_html` extracts tabular data from HTML into DataFrames for analysis, and `to_html` turns DataFrames back into HTML tables for web applications or reports.

In [1]:
# read_html needs an HTML parser; lxml is the one pandas tries first
!pip install lxml

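`read_html` cannot work without an HTML parser. With the default `flavor=None` it tries `lxml` first and falls back to BeautifulSoup4 with `html5lib`. The helper below (`available_read_html_flavor` is a hypothetical name, not part of pandas) checks which of these packages can be imported, so the result could be passed as the `flavor` argument of `read_html`.

```python
import importlib.util


def available_read_html_flavor():
    """Hypothetical helper: return a flavor usable by pandas.read_html, or None."""
    # lxml is the first parser pandas tries
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    # The fallback needs both BeautifulSoup4 (imported as bs4) and html5lib
    if (importlib.util.find_spec("bs4") is not None
            and importlib.util.find_spec("html5lib") is not None):
        return "bs4"
    return None


print(available_read_html_flavor())
```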


In [2]:
import pandas as pd

In [4]:
# read every HTML table on a Wikipedia page into a list of DataFrames

html = pd.read_html("https://en.wikipedia.org/wiki/List_of_largest_technology_companies_by_revenue")
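A page like this usually contains several tables, so it helps to print a short summary of each DataFrame before choosing an index. The sketch below runs offline on a tiny stand-in page (two tables with a few values taken from the outputs further down) instead of the live Wikipedia URL; the selection rule "first table with a `Company` column" is only an example.

```python
from io import StringIO

import pandas as pd

# Offline stand-in for a downloaded page: a legend table followed by a data table
page = """
<table>
  <tr><th>Column</th><th>Explanation</th></tr>
  <tr><td>Rank</td><td>Rank of company by revenue</td></tr>
</table>
<table>
  <tr><th>Rank</th><th>Company</th></tr>
  <tr><td>1</td><td>Apple</td></tr>
  <tr><td>2</td><td>Samsung Electronics</td></tr>
</table>
"""

tables = pd.read_html(StringIO(page))

# Summarise every table: its position in the list, its shape and its columns
for i, table in enumerate(tables):
    print(i, table.shape, list(table.columns))

# Example rule: take the first table that has a "Company" column
companies = next(t for t in tables if "Company" in t.columns)
print(companies)
```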

In [5]:
# display all the tables that were found

html

[         Column                                        Explanation
 0          Rank                         Rank of company by revenue
 1       Company                  Name of the international company
 2       Revenue  The revenue of the company in billions of USDs...
 3     Employees                     Number of employees of company
 4  Headquarters                 Location of company's headquarters,
     Rank  Company            Company.1 Revenue ($B) USD[2]  Employees[2]   
 0      1      NaN                Apple            $274.515        147000  \
 1      2      NaN  Samsung Electronics            $200.734        267937   
 2      3      NaN             Alphabet            $182.527        135301   
 3      4      NaN              Foxconn            $181.945        878429   
 4      5      NaN            Microsoft            $143.015        163000   
 5      6      NaN               Huawei            $129.184        197000   
 6      7      NaN    Dell Technologies             

In [6]:
type(html)

list

In [8]:
# 1st table of the page

html[0]

|   | Column | Explanation |
|---|---|---|
| 0 | Rank | Rank of company by revenue |
| 1 | Company | Name of the international company |
| 2 | Revenue | The revenue of the company in billions of USDs... |
| 3 | Employees | Number of employees of company |
| 4 | Headquarters | Location of company's headquarters |


In [10]:
# 2nd table of the page

html[1]

|   | Rank | Company | Company.1 | Revenue ($B) USD[2] | Employees[2] | Revenue per Employees ($K USD)[2] | Headquarters |
|---|---|---|---|---|---|---|---|
| 0 | 1 | NaN | Apple | $274.515 | 147000 | 1867.44897 | Cupertino, California, US |
| 1 | 2 | NaN | Samsung Electronics | $200.734 | 267937 | 749.18357 | Suwon, South Korea |
| 2 | 3 | NaN | Alphabet | $182.527 | 135301 | 1349.04398 | Mountain View, California, US |
| 3 | 4 | NaN | Foxconn | $181.945 | 878429 | 207.12544 | New Taipei City, Taiwan |
| 4 | 5 | NaN | Microsoft | $143.015 | 163000 | 877.39263 | Redmond, Washington, US |
| 5 | 6 | NaN | Huawei | $129.184 | 197000 | 655.75634 | Shenzhen, China |
| 6 | 7 | NaN | Dell Technologies | $92.224 | 158000 | 583.6962 | Round Rock, Texas, US |
| 7 | 8 | NaN | Meta | $85.965 | 58604 | 1466.87939 | Menlo Park, California, US |
| 8 | 9 | NaN | Sony | $84.893 | 109700 | 773.86508 | Tokyo, Japan |
| 9 | 10 | NaN | Hitachi | $82.345 | 350864 | 234.69207 | Tokyo, Japan |

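This table still needs some clean-up: the `Company` column is entirely `NaN`, the company names sit in `Company.1`, and the revenue values are strings with a leading `$`. A minimal sketch of tidying it, using the first three rows of the output above typed in by hand (the shorter new column names are our own choice, not something pandas produces):

```python
import numpy as np
import pandas as pd

# First rows of html[1], typed in by hand with the column names shown above
df = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Company": [np.nan, np.nan, np.nan],  # all-NaN in the scraped output
    "Company.1": ["Apple", "Samsung Electronics", "Alphabet"],
    "Revenue ($B) USD[2]": ["$274.515", "$200.734", "$182.527"],
    "Employees[2]": [147000, 267937, 135301],
})

clean = (
    df.dropna(axis=1, how="all")  # drop columns that contain only NaN
      .rename(columns={
          "Company.1": "Company",
          "Revenue ($B) USD[2]": "Revenue ($B)",
          "Employees[2]": "Employees",
      })
)

# Strip the leading "$" so the revenue can be stored as a float
clean["Revenue ($B)"] = clean["Revenue ($B)"].str.lstrip("$").astype(float)
print(clean)
```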

In [14]:
# 3rd table of the page

html[2]

|   | Rank | Company | Company.1 | Revenue ($B) USD[6] | Employees[6] | Revenue per Employees ($K USD)[6] | Headquarters |
|---|---|---|---|---|---|---|---|
| 0 | 1 | NaN | Apple | $260.174 | 137000 | 1899.08029 | Cupertino, California, US |
| 1 | 2 | NaN | Samsung Electronics | $197.705 | 287439 | 687.8155 | Suwon, South Korea |
| 2 | 3 | NaN | Foxconn | $178.869 | 757404 | 236.16062 | New Taipei City, Taiwan |
| 3 | 4 | NaN | Alphabet | $161.857 | 118899 | 1361.29824 | Mountain View, California, US |
| 4 | 5 | NaN | Microsoft | $125.843 | 144000 | 873.90972 | Redmond, Washington, US |
| 5 | 6 | NaN | Huawei | $124.316 | 194000 | 640.80412 | Shenzhen, China |
| 6 | 7 | NaN | Dell Technologies | $92.154 | 165000 | 558.50909 | Round Rock, Texas, US |
| 7 | 8 | NaN | Hitachi | $80.639 | 301056 | 267.85382 | Tokyo, Japan |
| 8 | 9 | NaN | IBM | $77.147 | 383056 | 201.39875 | Armonk, New York, US |
| 9 | 10 | NaN | Sony | $75.972 | 111700 | 680.14324 | Tokyo, Japan |


In [15]:
# keep only the tables whose text contains "Government debt"
html = pd.read_html("https://en.wikipedia.org/wiki/Economy_of_the_United_States", match="Government debt")

In [16]:
html[0]

|   | 0 | 1 |
|---|---|---|
| 0 | New York City, the world’s principal financial... | New York City, the world’s principal financial... |
| 1 | Currency | United States dollar (USD) US Dollar Index |
| 2 | Fiscal year | October 1, 2022 – September 30, 2023 |
| 3 | Trade organizations | WTO, G-20, G7, OECD, USMCA, APEC and others |
| 4 | Country group | .mw-parser-output .plainlist ol,.mw-parser-out... |
| 5 | Statistics | Statistics |
| 6 | Population | 334,977,281 (June 6, 2023)[4] |
| 7 | GDP | $26.854 trillion (nominal; 2023 est.)[5] $26.... |
| 8 | GDP rank | 1st (nominal; 2023) 2nd (PPP; 2023) |
| 9 | GDP growth | 2.1% (2022)[6] 1.6% (2023f)[6] 1.0% (2024f)[6] |

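The `match` argument acts as a filter: `read_html` returns only the tables whose text matches the given string or regular expression, in page order, and raises `ValueError` when none match. That is presumably why `html[0]` above is the page's infobox: it also contains the phrase "Government debt". The sketch below demonstrates the filter on a small made-up page.

```python
from io import StringIO

import pandas as pd

# Made-up page with two tables (values borrowed from the outputs in this notebook)
page = """
<table>
  <tr><th>Year</th><th>Unemployment (in Percent)</th></tr>
  <tr><td>1980</td><td>7.2%</td></tr>
</table>
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Minneapolis</td><td>425336</td></tr>
</table>
"""

# Only the first table contains "Unemployment"
unemployment = pd.read_html(StringIO(page), match="Unemployment")

# match is treated as a regular expression, so an alternation selects both tables
both = pd.read_html(StringIO(page), match="Unemployment|Population")

# No table contains "Inflation", so read_html raises ValueError
try:
    pd.read_html(StringIO(page), match="Inflation")
except ValueError as err:
    print("No match:", err)

print(len(unemployment), len(both))
```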

In [17]:
# keep only the tables whose text contains "Unemployment", then show the 3rd of them
html = pd.read_html("https://en.wikipedia.org/wiki/Economy_of_the_United_States", match="Unemployment")
html[2]

|   | Year | GDP (in Bil. US$PPP) | GDP per capita (in US$ PPP) | GDP (in Bil. US$nominal) | GDP per capita (in US$ nominal) | GDP growth (real) | Inflation rate (in Percent) | Unemployment (in Percent) | Government debt (in % of GDP) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1980 | 2857.3 | 12552.9 | 2857.3 | 12552.9 | -0.3% | 13.5% | 7.2% | NaN |
| 1 | 1981 | 3207.0 | 13948.7 | 3207.0 | 13948.7 | 2.5% | 10.4% | 7.6% | NaN |
| 2 | 1982 | 3343.8 | 14405.0 | 3343.8 | 14405.0 | -1.8% | 6.2% | 9.7% | NaN |
| 3 | 1983 | 3634.0 | 15513.7 | 3634.0 | 15513.7 | 4.6% | 3.2% | 9.6% | NaN |
| 4 | 1984 | 4037.7 | 17086.4 | 4037.7 | 17086.4 | 7.2% | 4.4% | 7.5% | NaN |
| 5 | 1985 | 4339.0 | 18199.3 | 4339.0 | 18199.3 | 4.2% | 3.5% | 7.2% | NaN |
| 6 | 1986 | 4579.6 | 19034.8 | 4579.6 | 19034.8 | 3.5% | 1.9% | 7.0% | NaN |
| 7 | 1987 | 4855.3 | 20001.0 | 4855.3 | 20001.0 | 3.5% | 3.6% | 6.2% | NaN |
| 8 | 1988 | 5236.4 | 21376.0 | 5236.4 | 21376.0 | 4.2% | 4.1% | 5.5% | NaN |
| 9 | 1989 | 5641.6 | 22814.1 | 5641.6 | 22814.1 | 3.7% | 4.8% | 5.3% | NaN |

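In this table the percentage columns hold strings such as `"-0.3%"`, not numbers. One way to convert them is to strip the trailing `%` and call `pd.to_numeric`; `errors="coerce"` turns anything that still fails to parse into `NaN`. The sketch below works on three rows copied from the output above.

```python
import pandas as pd

# Three rows copied from the table above; the percentages are text
df = pd.DataFrame({
    "Year": [1980, 1981, 1982],
    "GDP growth (real)": ["-0.3%", "2.5%", "-1.8%"],
    "Unemployment (in Percent)": ["7.2%", "7.6%", "9.7%"],
})

for col in ["GDP growth (real)", "Unemployment (in Percent)"]:
    # Remove the trailing "%" and parse the rest as a number
    df[col] = pd.to_numeric(df[col].str.rstrip("%"), errors="coerce")

print(df.dtypes)
```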

In [20]:
html = pd.read_html("https://en.wikipedia.org/wiki/Minnesota", match="Average daily maximum and minimum temperatures for selected cities in Minnesota")

In [21]:
html[0]

|   | Location | July (°F) | July (°C) | January (°F) | January (°C) |
|---|---|---|---|---|---|
| 0 | Minneapolis | 83/64 | 28/18 | 23/7 | −4/−13 |
| 1 | Saint Paul | 83/63 | 28/17 | 23/6 | −5/−14 |
| 2 | Rochester | 82/63 | 28/17 | 23/3 | −5/−16 |
| 3 | Duluth | 76/55 | 24/13 | 19/1 | −7/−17 |
| 4 | St. Cloud | 81/58 | 27/14 | 18/−1 | −7/−18 |
| 5 | Mankato | 86/62 | 30/16 | 23/3 | −5/−16 |
| 6 | International Falls | 77/52 | 25/11 | 15/−6 | −9/−21 |

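Each temperature cell packs a high and a low into one string (`"83/64"`), and the negative values use the Unicode minus sign `−` (U+2212) rather than the ASCII hyphen-minus that `int()` expects. A minimal sketch that normalises the sign and splits the January (°F) column into two integer columns; the new column names are illustrative.

```python
import pandas as pd

MINUS = "\u2212"  # the Unicode minus sign used in the scraped table

# Three rows copied from the table above
df = pd.DataFrame({
    "Location": ["Minneapolis", "Duluth", "International Falls"],
    "January (°F)": ["23/7", "19/1", f"15/{MINUS}6"],
})

jan = (
    df["January (°F)"]
    .str.replace(MINUS, "-", regex=False)  # normalise to the ASCII minus
    .str.split("/", expand=True)           # "15/-6" -> columns "15" and "-6"
    .astype(int)
)
jan.columns = ["Jan high (°F)", "Jan low (°F)"]

result = pd.concat([df["Location"], jan], axis=1)
print(result)
```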

In [22]:
html = pd.read_html("https://en.wikipedia.org/wiki/Minnesota", match="Largest cities or towns in Minnesota")

In [23]:
html

[   Largest cities or towns in Minnesota Source:[81]                          
                                  Unnamed: 0_level_1                   Rank   
 0                            Minneapolis Saint Paul                      1  \
 1                            Minneapolis Saint Paul                      2   
 2                            Minneapolis Saint Paul                      3   
 3                            Minneapolis Saint Paul                      4   
 4                            Minneapolis Saint Paul                      5   
 5                            Minneapolis Saint Paul                      6   
 6                            Minneapolis Saint Paul                      7   
 7                            Minneapolis Saint Paul                      8   
 8                            Minneapolis Saint Paul                      9   
 9                            Minneapolis Saint Paul                     10   
 10                           Minneapolis Saint Paul

In [24]:
html[0]

Unnamed: 0_level_0,Largest cities or towns in Minnesota Source:[81],Largest cities or towns in Minnesota Source:[81],Largest cities or towns in Minnesota Source:[81],Largest cities or towns in Minnesota Source:[81],Largest cities or towns in Minnesota Source:[81],Largest cities or towns in Minnesota Source:[81],Largest cities or towns in Minnesota Source:[81],Largest cities or towns in Minnesota Source:[81],Largest cities or towns in Minnesota Source:[81],Largest cities or towns in Minnesota Source:[81]
Unnamed: 0_level_1,Unnamed: 0_level_1,Rank,Name,County,Pop.,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
0,Minneapolis Saint Paul,1,Minneapolis,Hennepin,425336.0,Rochester Bloomington,,,,
1,Minneapolis Saint Paul,2,Saint Paul,Ramsey,307193.0,Rochester Bloomington,,,,
2,Minneapolis Saint Paul,3,Rochester,Olmsted,121465.0,Rochester Bloomington,,,,
3,Minneapolis Saint Paul,4,Bloomington,Hennepin,89298.0,Rochester Bloomington,,,,
4,Minneapolis Saint Paul,5,Duluth,St. Louis,86372.0,Rochester Bloomington,,,,
5,Minneapolis Saint Paul,6,Brooklyn Park,Hennepin,84526.0,Rochester Bloomington,,,,
6,Minneapolis Saint Paul,7,Plymouth,Hennepin,79828.0,Rochester Bloomington,,,,
7,Minneapolis Saint Paul,8,Woodbury,Washington,76990.0,Rochester Bloomington,,,,
8,Minneapolis Saint Paul,9,Lakeville,Dakota,72812.0,Rochester Bloomington,,,,
9,Minneapolis Saint Paul,10,Blaine,Anoka,70935.0,Rochester Bloomington,,,,

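This table comes back with two levels of column labels: the table caption repeated as the outer level, and the actual headers (`Rank`, `Name`, `County`, `Pop.`) as the inner level. If you only need the inner labels, `DataFrame.droplevel(0, axis=1)` removes the outer one. The sketch rebuilds a two-row frame of that shape by hand rather than downloading the page again.

```python
import pandas as pd

TOP = "Largest cities or towns in Minnesota Source:[81]"

# Two rows from the output above, with the same two-level column structure
columns = pd.MultiIndex.from_tuples(
    [(TOP, "Rank"), (TOP, "Name"), (TOP, "County"), (TOP, "Pop.")]
)
df = pd.DataFrame(
    [[1, "Minneapolis", "Hennepin", 425336.0],
     [2, "Saint Paul", "Ramsey", 307193.0]],
    columns=columns,
)

# Drop the outer (caption) level so the columns become plain strings
flat = df.droplevel(0, axis=1)
print(flat.columns.tolist())
```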

In [25]:
type(html[0])

pandas.core.frame.DataFrame

In [None]:
# write the first table to demo.html (to_html returns None when given a path)
html[0].to_html('demo.html')
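To check a file written by `to_html`, you can read it back with `read_html`, which accepts a local path as well as a URL. The sketch below does such a round trip in a temporary directory with a small hand-made DataFrame, using `index=False` so the row labels are not written as an extra column. It assumes an HTML parser such as `lxml` is installed.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Minneapolis", "Saint Paul"], "County": ["Hennepin", "Ramsey"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.html")
    # Write the table to disk without the index column
    df.to_html(path, index=False)

    # read_html returns a list; this file holds exactly one table
    back = pd.read_html(path)[0]

print(back)
```

With the default `index=True`, the index is written as a first column with an empty header, which typically comes back as a column named `Unnamed: 0`; passing `index_col=0` to `read_html` turns it back into the index.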