# Practice Project: Web Scraping Using Pandas & NumPy
---
This practice project demonstrates extracting data from a website using web scraping and 
request APIs, then processing it with the Pandas and NumPy libraries.
## Objectives:
- Use web scraping to extract the required information from a website.
- Use Pandas to load and process the tabular data as a dataframe.
- Use NumPy to manipulate the information contained in the dataframe.
- Load the updated dataframe to a CSV file.

In [None]:
#Install required packages
!pip install pandas numpy 
!pip install lxml

In [None]:
# Suppress warnings generated by the code in this notebook.
import warnings  # standard-library module for issuing and filtering warnings

def warn(*args, **kwargs):
    pass

warnings.warn = warn  # replace warnings.warn with a no-op, so warnings issued through it are ignored
warnings.filterwarnings('ignore')  # also filter out warnings issued by other means
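
The cell above silences warnings for the whole session. As a hedged alternative, the sketch below limits suppression to a single block using the standard library's `warnings.catch_warnings()` context manager. The function `noisy_step` is a hypothetical stand-in for any call that emits a warning. The snippet is meant to run on its own, since the cell above replaces `warnings.warn` with a no-op.

```python
import warnings


def noisy_step():
    # Hypothetical function standing in for any call that emits a warning.
    warnings.warn("example warning", UserWarning)
    return "done"


# Warnings are ignored only inside this block; filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    quiet_result = noisy_step()

# The same context manager can record warnings instead of hiding them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loud_result = noisy_step()

print(quiet_result, loud_result, len(caught))
```

Scoping the filter this way keeps unrelated warnings visible elsewhere in the notebook, which can help when debugging later cells.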

In [None]:
URL="https://web.archive.org/web/20230902185326/https://en.wikipedia.org/wiki/List_of_countries_by_GDP_%28nominal%29"


### We are interested in table 3, as highlighted in the image below
- Use `pd.read_html()` to extract it
<img src="GDP_countries.png" alt="Shows which table is scraped" width="300"/>

## Extract Table 3 & Adjust Columns with Pandas

In [None]:
# Extract tables from the webpage using Pandas. Retain table number 3 as the required dataframe.
import pandas as pd

tables = pd.read_html(URL)
df = tables[3]

# Replace the column headers with column numbers.
# df.shape is a (rows, columns) tuple, so df.shape[1] is the number of columns.
df.columns = range(df.shape[1])

# Retain columns with index 0 and 2 (name of country and value of GDP quoted by IMF)
df = df[[0,2]]

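# Retain the rows with index 1 to 10, indicating the top 10 economies of the world.
# .copy() makes df an independent dataframe, so later column assignments do not act on a slice.
df = df.iloc[1:11,:].copy()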

# Assign column names as "Country" and "GDP (Million USD)"
df.columns = ['Country','GDP (Million USD)']
df
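
The column and row selection above can be tried without network access. The following is a minimal sketch on a hand-built dataframe: the column names, the country labels `A` to `C`, and all values are made up for illustration, and row 0 is assumed to be a row that should be skipped.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table; values are illustrative only.
raw = pd.DataFrame({
    "Country/Territory": ["Skipped row", "A", "B", "C"],
    "Region": ["-", "X", "Y", "Z"],
    "Estimate": [600, 300, 200, 100],
})

raw.columns = range(raw.shape[1])   # headers become 0, 1, 2
subset = raw[[0, 2]]                # keep the name and estimate columns
top = subset.iloc[1:3, :].copy()    # rows at positions 1 and 2 (end is exclusive)
top.columns = ["Country", "GDP (Million USD)"]
print(top)
```

Note that `iloc` selects by position and excludes the end of the range, which is why `iloc[1:11, :]` in the cell above yields ten rows.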

## Modify Dataframes Using Numpy

In [None]:
import numpy as np

# Change the data type of the 'GDP (Million USD)' column to integer using the astype() method.
df['GDP (Million USD)'] = df['GDP (Million USD)'].astype(int)
# Convert the GDP value from Million USD to Billion USD.
df[['GDP (Million USD)']] = df[['GDP (Million USD)']]/1000
# Use numpy.round() to round the value to 2 decimal places.
df[['GDP (Million USD)']] = np.round(df[['GDP (Million USD)']],2)
# Rename the column header from 'GDP (Million USD)' to 'GDP (Billion USD)'.
# rename() returns a new dataframe, so assign the result back to df.
df = df.rename(columns = {'GDP (Million USD)' : 'GDP (Billion USD)'})
df
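
The unit conversion can be checked on a tiny sample. This sketch assumes two made-up GDP values and hypothetical country labels `A` and `B`; it also shows that `rename()` returns a new dataframe rather than changing the original in place.

```python
import numpy as np
import pandas as pd

# Illustrative values only, not real GDP figures.
sample = pd.DataFrame({
    "Country": ["A", "B"],
    "GDP (Million USD)": [1234567.0, 2500000.0],
})

sample["GDP (Million USD)"] = sample["GDP (Million USD)"].astype(int)
# Millions to billions, rounded to 2 decimal places.
sample["GDP (Million USD)"] = np.round(sample["GDP (Million USD)"] / 1000, 2)

# Without assignment, the result of rename() is discarded.
sample.rename(columns={"GDP (Million USD)": "GDP (Billion USD)"})
columns_before = list(sample.columns)

# Assigning the result keeps the new header.
sample = sample.rename(columns={"GDP (Million USD)": "GDP (Billion USD)"})
columns_after = list(sample.columns)
print(sample)
```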

## Load Dataframe to CSV File 

In [None]:
df.to_csv('./Largest_economies.csv')
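
By default, `to_csv()` also writes the dataframe index as the first column. The sketch below, using a made-up two-row dataframe and temporary files, compares the default with `index=False` by reading both files back.

```python
import os
import tempfile

import pandas as pd

# Illustrative dataframe; labels and values are made up.
frame = pd.DataFrame(
    {"Country": ["A", "B"], "GDP (Billion USD)": [1234.57, 2500.0]},
    index=[1, 2],
)

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")

    frame.to_csv(with_index)                  # default: index is written
    frame.to_csv(without_index, index=False)  # index is omitted

    cols_with = pd.read_csv(with_index).columns.tolist()
    cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```

Whether to keep the index depends on the later analysis: here it holds the rank-like row positions 1 to 10, which may or may not be worth preserving.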

#### Scraping and cleaning are done, and the data is ready for further analysis!