### ``Day 46: Create a DataFrame`` 
* Create a DataFrame using pandas. Write code that recreates the table below as a DataFrame, using the values shown in the table.

|year|title|genre|
|----|----|----|
|2009|Brothers|Drama|
|2002|Spider-Man|Sci-Fi|
|2009|WatchMen|Drama|
|2010|Inception|Sci-Fi|
|2009|Avatar|Fantasy|

In [1]:
import pandas as pd

In [2]:
movies = pd.DataFrame({
    'year': [2009, 2002, 2009, 2010, 2009],
    'title': ['Brothers', 'Spider-Man', 'WatchMen', 'Inception', 'Avatar'],
    'genre': ['Drama', 'Sci-Fi', 'Drama', 'Sci-Fi', 'Fantasy']
})
movies

Unnamed: 0,year,title,genre
0,2009,Brothers,Drama
1,2002,Spider-Man,Sci-Fi
2,2009,WatchMen,Drama
3,2010,Inception,Sci-Fi
4,2009,Avatar,Fantasy
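
* The dictionary of columns above is one way to build the table. As a hedged alternative, the same DataFrame can be sketched row by row, which mirrors how the table is written. The names `rows` and `movies_from_rows` are illustrative and not part of the original exercise.

```python
import pandas as pd

# Each tuple is one row of the table: (year, title, genre)
rows = [
    (2009, 'Brothers', 'Drama'),
    (2002, 'Spider-Man', 'Sci-Fi'),
    (2009, 'WatchMen', 'Drama'),
    (2010, 'Inception', 'Sci-Fi'),
    (2009, 'Avatar', 'Fantasy'),
]

# The column names are supplied separately when the data is a list of rows
movies_from_rows = pd.DataFrame(rows, columns=['year', 'title', 'genre'])
print(movies_from_rows)
```

Both constructions should give the same five rows and three columns; which one reads better depends on whether the data arrives column-wise or row-wise.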


#### `Extra Challenge: Website Data with Pandas` 
* Create code that extracts data from a website. 
* You will extract a table from the website and, from that table, the columns describing Python's data types and their mutability. Use the following page: https://en.wikipedia.org/wiki/Python_(programming_language)


In [3]:
from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
data = requests.get(url).text

soup = BeautifulSoup(data, 'html.parser')
print(type(soup))

<class 'bs4.BeautifulSoup'>


In [4]:
tables = soup.find_all('table')

In [5]:
len(soup.find_all('table', class_="wikitable"))

1
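
* The class filter used above can be tried offline on a small stand-in page. This is only a sketch: `sample_html` is a made-up snippet (not the real article), used to show that `class_` matches a tag when any one of its classes equals the given name.

```python
from bs4 import BeautifulSoup

# Hypothetical two-table page standing in for the real article
sample_html = """
<table class="infobox"><tr><td>sidebar</td></tr></table>
<table class="wikitable sortable"><tr><th>Type</th></tr></table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# Every <table> on the page
all_tables = sample_soup.find_all('table')

# Only tables whose class list contains "wikitable"
wikitables = sample_soup.find_all('table', class_='wikitable')

print(len(all_tables), len(wikitables))
```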

* We inspected the webpage and found that the table we want has the class `wikitable`. The `find_all` count above confirms there is only one such table, so it should be our primary table. Let's look at it now.

In [6]:
wiki_table = soup.find_all('table', class_="wikitable")
table = wiki_table[0]
# print each table heading (<th>) element
for heading in table.tbody.find_all('th'):
    print(heading)

<th>Type
</th>
<th><a href="/wiki/Immutable_object" title="Immutable object">Mutability</a>
</th>
<th>Description
</th>
<th>Syntax examples
</th>


In [7]:
# The result set returned by find_all can be sliced, so we can look at just the first three rows
for row_number, row in enumerate(table.tbody.find_all('tr')[:3]):
    print(row_number, row)

0 <tr>
<th>Type
</th>
<th><a href="/wiki/Immutable_object" title="Immutable object">Mutability</a>
</th>
<th>Description
</th>
<th>Syntax examples
</th></tr>
1 <tr>
<td><code>bool</code>
</td>
<td>immutable
</td>
<td><a class="mw-redirect" href="/wiki/Boolean_value" title="Boolean value">Boolean value</a>
</td>
<td><code class="mw-highlight mw-highlight-lang-python mw-content-ltr" dir="ltr" id="" style=""><span class="kc">True</span></code><br/><code class="mw-highlight mw-highlight-lang-python mw-content-ltr" dir="ltr" id="" style=""><span class="kc">False</span></code>
</td></tr>
2 <tr>
<td><code>bytearray</code>
</td>
<td>mutable
</td>
<td>Sequence of <a href="/wiki/Byte" title="Byte">bytes</a>
</td>
<td><code class="mw-highlight mw-highlight-lang-python mw-content-ltr" dir="ltr" id="" style=""><span class="nb">bytearray</span><span class="p">(</span><span class="sa">b</span><span class="s1">'Some ASCII'</span><span class="p">)</span></code><br/><code class="mw-highlight mw-highligh

In [8]:
# Strip the newline characters from each heading
headings = [heading.text.strip('\n') for heading in table.tbody.find_all('th')]
print(headings)
# Curly braces without key: value pairs build a set, not a dict, so the printed order is not guaranteed
headings_set = {heading.text.strip('\n') for heading in table.tbody.find_all('th')}
print(headings_set)

['Type', 'Mutability', 'Description', 'Syntax examples']
{'Description', 'Mutability', 'Syntax examples', 'Type'}
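
* The second printout above comes from a set, which is why its order differs from the list. Column order matters for a DataFrame, so the list is the safer choice. If duplicates ever had to be removed while keeping order, one possible sketch uses `dict.fromkeys`; the extra `'Type'` entry below is added only to demonstrate deduplication.

```python
headings = ['Type', 'Mutability', 'Description', 'Syntax examples']

# A set removes duplicates but does not promise any particular order
headings_set = set(headings)

# dict.fromkeys removes duplicates and keeps first-seen order (Python 3.7+)
ordered_unique = list(dict.fromkeys(headings + ['Type']))
print(ordered_unique)
```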


In [9]:
# Create an empty DataFrame whose columns are the first two headings
wiki_python_df = pd.DataFrame(columns=headings[:2])
wiki_python_df

Unnamed: 0,Type,Mutability


In [10]:
# Loop through each table row after the heading row
rows_data = []
for row_data in table.tbody.find_all('tr')[1:]:
    # Start a fresh list of cell values for this row
    columns_data = []
    # Keep only the first two cells: Type and Mutability
    for column_data in row_data.find_all('td')[:2]:
        columns_data.append(column_data.text.strip('\n'))
    rows_data.append(columns_data)

In [11]:
pd.DataFrame(rows_data, columns=wiki_python_df.columns)

Unnamed: 0,Type,Mutability
0,bool,immutable
1,bytearray,mutable
2,bytes,immutable
3,complex,immutable
4,dict,mutable
5,types.EllipsisType,immutable
6,float,immutable
7,frozenset,immutable
8,int,immutable
9,list,mutable
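
* The heading and row steps above can be folded into one function. The sketch below is a hypothetical helper (`table_to_frame` is not part of bs4 or pandas) and runs on a small made-up table rather than the live page. It assumes two deviations from the cells above: it searches the `<table>` tag directly instead of `table.tbody`, and it uses `get_text(strip=True)` instead of `.text.strip('\n')`.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_frame(table, n_columns=2):
    """Hypothetical helper: turn a bs4 <table> tag into a DataFrame."""
    # Heading text, limited to the first n_columns headings
    headings = [th.get_text(strip=True) for th in table.find_all('th')][:n_columns]
    rows = []
    for tr in table.find_all('tr'):
        cells = tr.find_all('td')[:n_columns]
        if cells:  # the heading row has no <td> cells, so it is skipped
            rows.append([td.get_text(strip=True) for td in cells])
    return pd.DataFrame(rows, columns=headings)


# Made-up table shaped like the one on the page
sample_html = """
<table class="wikitable">
<tr><th>Type
</th><th><a href="#">Mutability</a>
</th><th>Description
</th></tr>
<tr><td><code>bool</code>
</td><td>immutable
</td><td>Boolean value
</td></tr>
<tr><td><code>list</code>
</td><td>mutable
</td><td>List, can contain mixed types
</td></tr>
</table>
"""

sample_table = BeautifulSoup(sample_html, 'html.parser').find('table', class_='wikitable')
sample_df = table_to_frame(sample_table)
print(sample_df)
```

Searching the table tag directly avoids depending on a `<tbody>` element, which `html.parser` does not add when the source HTML lacks one.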


* Alternatively, pandas' `read_html` function can parse the tables on a page directly into DataFrames. Let's try it!

In [12]:
scrubbed_tables = pd.read_html('https://en.wikipedia.org/wiki/Python_(programming_language)')

In [13]:
print(len(scrubbed_tables), [type(x) for x in scrubbed_tables])

13 [<class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.frame.DataFrame'>]


In [14]:
# Let's see where our table is in the list of DataFrames
scrubbed_tables[1].columns

Index(['Type', 'Mutability', 'Description', 'Syntax examples'], dtype='object')

In [15]:
pyth_types_table_list = [x for x in scrubbed_tables if list(x.columns) == 
                       ['Type', 'Mutability', 'Description', 'Syntax examples']]

In [16]:
type(pyth_types_table_list), len(pyth_types_table_list), type(pyth_types_table_list[0])

(list, 1, pandas.core.frame.DataFrame)
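
* The column check above requires an exact match on all four names in order. A looser variant, sketched below, keeps any DataFrame that contains the columns we care about. `frames_with_columns` is a hypothetical helper and `stand_in_tables` is made-up data standing in for the list returned by `read_html`.

```python
import pandas as pd


def frames_with_columns(frames, wanted):
    """Hypothetical helper: keep frames whose columns include every name in wanted."""
    wanted = set(wanted)
    return [frame for frame in frames if wanted.issubset(frame.columns)]


# Made-up stand-ins for the list of DataFrames parsed from a page
stand_in_tables = [
    pd.DataFrame({'Release': ['3.12']}),
    pd.DataFrame({
        'Type': ['bool'],
        'Mutability': ['immutable'],
        'Description': ['Boolean value'],
    }),
]

matches = frames_with_columns(stand_in_tables, ['Type', 'Mutability'])
print(len(matches))
```

Because the check is a subset test, it should keep working if the page later adds or reorders other columns, at the cost of possibly matching more than one table.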

In [18]:
pyth_types_table_list[0]

Unnamed: 0,Type,Mutability,Description,Syntax examples
0,bool,immutable,Boolean value,TrueFalse
1,bytearray,mutable,Sequence of bytes,"bytearray(b'Some ASCII')bytearray(b""Some ASCII..."
2,bytes,immutable,Sequence of bytes,"b'Some ASCII'b""Some ASCII""bytes([119, 105, 107..."
3,complex,immutable,Complex number with real and imaginary parts,3+2.7j3 + 2.7j
4,dict,mutable,Associative array (or dictionary) of key and v...,"{'key1': 1.0, 3: False}{}"
5,types.EllipsisType,immutable,An ellipsis placeholder to be used as an index...,...Ellipsis
6,float,immutable,Double-precision floating-point number. The pr...,1.33333
7,frozenset,immutable,"Unordered set, contains no duplicates; can con...","frozenset([4.0, 'string', True])"
8,int,immutable,Integer of unlimited magnitude[108],42
9,list,mutable,"List, can contain mixed types","[4.0, 'string', True][]"


In [20]:
# Filtering the result list by a column check leaves a single DataFrame, still wrapped in a list.
# Index into the list to get the DataFrame, then subset it as needed.
pyth_types_table_list[0].iloc[:, [0, 1]]

Unnamed: 0,Type,Mutability
0,bool,immutable
1,bytearray,mutable
2,bytes,immutable
3,complex,immutable
4,dict,mutable
5,types.EllipsisType,immutable
6,float,immutable
7,frozenset,immutable
8,int,immutable
9,list,mutable
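
* Selecting by position with `iloc` works, but it depends on the column order staying the same. As a sketch, the same subset can be taken by column name. `types_df` below is a two-row stand-in built from values shown in the output above, not the scraped table itself.

```python
import pandas as pd

# Two-row stand-in using values from the table above
types_df = pd.DataFrame({
    'Type': ['bool', 'bytearray'],
    'Mutability': ['immutable', 'mutable'],
    'Description': ['Boolean value', 'Sequence of bytes'],
})

# Positional selection, as in the cell above
by_position = types_df.iloc[:, [0, 1]]

# Label-based selection: the same columns chosen by name
by_label = types_df[['Type', 'Mutability']]

print(by_label.equals(by_position))
```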
