# Interacting with HTML files

The pandas read_html() function is a quick and convenient way to turn an HTML table into a pandas DataFrame. As an example, we will parse the election-results table in the Politics section of the Minnesota Wikipedia page.
The basic usage of pandas read_html() is simple and works well on many Wikipedia pages, since their tables are usually not complicated. Note that read_html() needs an HTML parser backend installed: lxml, or BeautifulSoup4 together with html5lib. To get started, I am also including some extra imports that we will use for data cleaning in more complicated examples:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize

table_MN = pd.read_html('https://en.wikipedia.org/wiki/Minnesota')
```

Note that table_MN is not a single DataFrame: read_html() returns a list containing one DataFrame for every table it finds on the page.

```python
print(f'Total tables: {len(table_MN)}')
```

```
Total tables: 28
```
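Because read_html() returns one DataFrame per table element, this list behaviour can also be checked offline. The sketch below assumes a small, hypothetical HTML snippet (the html string and its placeholder values are made up for illustration) and an installed parser backend such as lxml. The string is wrapped in io.StringIO because recent pandas versions deprecate passing literal HTML text directly.

```python
from io import StringIO

import pandas as pd

# Hypothetical page containing two small tables with placeholder data
html = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>a</td><td>1</td></tr>
</table>
<table>
  <tr><th>Key</th><th>Count</th></tr>
  <tr><td>b</td><td>2</td></tr>
</table>
"""

# read_html always returns a list, with one DataFrame per <table> element
tables = pd.read_html(StringIO(html))
print(f'Total tables: {len(tables)}')

# Select a single table by position, just like table_MN[0]
first = tables[0]
print(first.columns.tolist())
```

Since the header rows consist only of th cells, pandas uses them as column names, so the first table should have the columns Name and Value.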


With 28 tables, finding the one you need can be tedious. To narrow the selection, pass the match parameter, which accepts a string or regular expression and returns only the tables whose text matches it. Here we use the caption “Election results from statewide races” to select the table:

```python
table_MN = pd.read_html('https://en.wikipedia.org/wiki/Minnesota', match='Election results from statewide races')
len(table_MN)
```

```
1
```
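To see how match filters tables, here is a minimal offline sketch. It assumes a hypothetical two-table page in which only the second table carries the caption we are looking for; the captions and cell values are placeholders, not data from the Wikipedia page. The caption text counts as part of the table's text for matching, but it does not become a row in the resulting DataFrame.

```python
from io import StringIO

import pandas as pd

# Hypothetical page: only the second table has the caption we want
html = """
<table>
  <caption>Some other table</caption>
  <tr><th>County</th><th>Residents</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table>
<table>
  <caption>Election results from statewide races</caption>
  <tr><th>Year</th><th>Office</th></tr>
  <tr><td>2020</td><td>President</td></tr>
</table>
"""

# match keeps only the tables whose text matches the string/regex
tables = pd.read_html(StringIO(html), match='Election results from statewide races')
print(len(tables))
print(tables[0].columns.tolist())
```

Because match is treated as a regular expression, characters such as parentheses or periods in a caption would need escaping (for example with re.escape) to be matched literally.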

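```python
df = table_MN[0]
df.head()
```

|   | Year | Office | GOP | DFL | Others |
|---|------|--------|-----|-----|--------|
| 0 | 2020 | President | 45.3% | 52.4% | 2.3% |
| 1 | 2020 | Senator | 43.5% | 48.8% | 7.7% |
| 2 | 2018 | Governor | 42.4% | 53.9% | 3.7% |
| 3 | 2018 | Senator | 36.2% | 60.3% | 3.4% |
| 4 | 2018 | Senator | 42.4% | 53.0% | 4.6% |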


Pandas reads the table in a single call and also handles the Year cells that span multiple rows, repeating the year in every row the cell covers.
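The rowspan handling can be reproduced on a small, hypothetical table. This sketch assumes an HTML layout similar to the Wikipedia one, where a single Year cell uses rowspan="2" to cover two offices; the rows are placeholders, not the page's actual markup.

```python
from io import StringIO

import pandas as pd

# Hypothetical table: the 2020 cell spans two rows via rowspan
html = """
<table>
  <tr><th>Year</th><th>Office</th></tr>
  <tr><td rowspan="2">2020</td><td>President</td></tr>
  <tr><td>Senator</td></tr>
  <tr><td>2018</td><td>Governor</td></tr>
</table>
"""

df = pd.read_html(StringIO(html))[0]

# The spanned value is copied into every row it covers
print(df['Year'].tolist())    # [2020, 2020, 2018]
print(df['Office'].tolist())  # ['President', 'Senator', 'Governor']
```

Without this expansion, the Senator row would have only one cell and its values would be misaligned with the column headers.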

Reference: https://pbpython.com/pandas-html-table.html