# Practice Assignment II
## *Python*
> Scrape the content of this [wikipedia page](https://en.wikipedia.org/wiki/Comma-separated_values), and save the contents of the table in the page to a CSV file.

#### Import libraries

In [36]:
import numpy as np
from bs4 import BeautifulSoup
import requests
import pandas as pd

#### Store the URL in a variable

In [37]:
url = "https://en.wikipedia.org/wiki/Comma-separated_values"

## Pandas Solution
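#### Use pandas' `read_html` function to scrape the table with class *wikitable*, and store it in a dataframe.<br>`read_html` returns a list of dataframes, one per matching table, so the first element is selected.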

In [38]:
df1 = pd.read_html(url, attrs={"class": "wikitable"})[0]
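
To make the list-returning behaviour concrete, here is a minimal offline sketch. The `SAMPLE_HTML` snippet and the names `tables` and `sample_df` are made up for illustration and are not part of the assignment. It assumes an HTML parser that `read_html` can use (such as `lxml`) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature page: one plain table and one table with class "wikitable".
SAMPLE_HTML = """
<table><tr><th>Ignored</th></tr><tr><td>x</td></tr></table>
<table class="wikitable">
  <tr><th>Year</th><th>Make</th></tr>
  <tr><td>1997</td><td>Ford</td></tr>
  <tr><td>1999</td><td>Chevy</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> that matches attrs.
tables = pd.read_html(StringIO(SAMPLE_HTML), attrs={"class": "wikitable"})

# Only the "wikitable" matches, so the list has a single element.
sample_df = tables[0]
print(len(tables))
print(sample_df)
```

Indexing with `[0]` is safe only when at least one table matches; checking `len(tables)` first may be worthwhile on pages whose layout can change.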

#### Verify that the data has been correctly stored.

In [39]:
df1.head()

|   | Year | Make | Model | Description | Price |
|---|------|------|-------|-------------|-------|
| 0 | 1997 | Ford | E350 | ac, abs, moon | 3000.0 |
| 1 | 1999 | Chevy | Venture "Extended Edition" | | 4900.0 |
| 2 | 1999 | Chevy | Venture "Extended Edition, Very Large" | | 5000.0 |
| 3 | 1996 | Jeep | Grand Cherokee | MUST SELL!air, moon roof, loaded | 4799.0 |


In [40]:
df1.dtypes

Year             int64
Make            object
Model           object
Description     object
Price          float64
dtype: object

#### Save the dataframe in a CSV file.<br>Since some of the columns contain strings with commas, use semicolons as the separator.

In [41]:
df1.to_csv("cars_table.csv", index=False, sep=";")
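
The effect of the semicolon separator can be checked with a small round trip. This is only a sketch: the one-row `sample` dataframe is hypothetical, and an in-memory buffer stands in for the file on disk.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample row with a comma inside one field.
sample = pd.DataFrame({"Model": ["E350"], "Description": ["ac, abs, moon"]})

# Write with ";" as the separator, as in the cell above, but into memory.
buffer = StringIO()
sample.to_csv(buffer, index=False, sep=";")
text = buffer.getvalue()
print(text)

# Anyone reading the file back must pass the same separator.
restored = pd.read_csv(StringIO(text), sep=";")
print(restored)
```

With `;` as the delimiter the comma-containing field needs no quoting, but every consumer of the file has to know about the non-default separator.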

## BeautifulSoup Solution
#### Retrieve the page using __requests__, and parse the HTML with __BeautifulSoup__ to extract the desired data.

In [42]:
html = requests.get(url)
soup = BeautifulSoup(html.text, 'html.parser')
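
A slightly more defensive variant of this step might look like the sketch below. The helper name `fetch_soup` and the 10-second default timeout are assumptions made for illustration; they are not part of the original solution.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page and return it parsed with BeautifulSoup.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    # A timeout keeps the call from hanging indefinitely on a slow server.
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")


# Usage (performs a network request):
# soup = fetch_soup("https://en.wikipedia.org/wiki/Comma-separated_values")
```

Checking the status before parsing means a failed download surfaces as an exception rather than as a missing table later on.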

In [43]:
cars_table = soup.find(attrs={"class": "wikitable"})

#### Loop through *th* elements to get the table columns.

In [44]:
df_columns = [header.text.strip('\n') for header in cars_table.find_all("th")]

#### Parse *tr* elements one at a time, and store the contents of each row's *td* elements in a list.

In [45]:
df_data = []
for row in cars_table.find_all("tr"):
    row_cells = row.find_all("td")
    
    if row_cells:
        df_data.append([cell.text.strip('\n') for cell in row_cells])
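
The header and row extraction above can be tried offline on a small snippet. In this sketch `SAMPLE_HTML` is a hypothetical stand-in for the Wikipedia table, with trailing newlines inside some cells to mimic the real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: a header row plus two data rows.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Year
</th><th>Make
</th></tr>
  <tr><td>1997
</td><td>Ford
</td></tr>
  <tr><td>1999</td><td>Chevy</td></tr>
</table>
"""

table = BeautifulSoup(SAMPLE_HTML, "html.parser").find(attrs={"class": "wikitable"})

# Column names come from the <th> cells, with surrounding newlines stripped.
columns = [th.text.strip("\n") for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    # The header row has no <td> cells, so the empty list skips it.
    if cells:
        rows.append([td.text.strip("\n") for td in cells])

print(columns)
print(rows)
```

Every extracted value is a string at this point, which is why a dataframe built from these lists starts out with `object` columns.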

#### Manually create a Pandas dataframe from the retrieved columns and rows.

In [46]:
df2 = pd.DataFrame(data=df_data, columns=df_columns)

In [47]:
df2.head()

|   | Year | Make | Model | Description | Price |
|---|------|------|-------|-------------|-------|
| 0 | 1997 | Ford | E350 | ac, abs, moon | 3000.0 |
| 1 | 1999 | Chevy | Venture "Extended Edition" | | 4900.0 |
| 2 | 1999 | Chevy | Venture "Extended Edition, Very Large" | | 5000.0 |
| 3 | 1996 | Jeep | Grand Cherokee | MUST SELL!air, moon roof, loaded | 4799.0 |


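#### Check the datatypes, and convert them as necessary.<br>All values scraped with BeautifulSoup are strings, so every column starts out as `object`.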

In [48]:
df2.dtypes

Year           object
Make           object
Model          object
Description    object
Price          object
dtype: object

In [49]:
df2.Year = df2.Year.astype(int)
df2.Price = df2.Price.astype(float)
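
`astype` raises an error if any cell cannot be converted. Where scraped data might contain malformed values, `pd.to_numeric` with `errors="coerce"` is one possible alternative. The sketch below uses a hypothetical `sample` dataframe with an invented `"n/a"` price to show the difference; the assignment's table does not contain such a value.

```python
import pandas as pd

# Hypothetical scraped data: all strings, with one price that is not numeric.
sample = pd.DataFrame({"Year": ["1997", "1999"], "Price": ["3000.00", "n/a"]})

# astype works when every value is a clean number.
sample["Year"] = sample["Year"].astype(int)

# to_numeric with errors="coerce" turns unparseable values into NaN
# instead of raising, which astype(float) would do for "n/a".
sample["Price"] = pd.to_numeric(sample["Price"], errors="coerce")

print(sample.dtypes)
print(sample)
```

Coercing silently hides bad values, so counting the resulting NaNs afterwards is a reasonable sanity check.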

In [50]:
df2.dtypes

Year             int64
Make            object
Model           object
Description     object
Price          float64
dtype: object

#### Store the resulting dataframe in a CSV file (as above).

In [51]:
df2.to_csv("cars_table2.csv", index=False, sep=";")