##### **What is an Application Programming Interface (API)?**
- It's an interface that allows two software programs to communicate with each other.
- Just like a function, you don't need to know its internal workings; you only need to know its inputs and outputs.
- APIs help your program interact with external software, data, or services.

**Example -**
- The pandas API allows you to process and manipulate data.
- Many parts of pandas are implemented in C or Cython rather than pure Python, but you can still use them through its Python API.

**How It Works -**
- You create a dictionary.
- Convert it into a DataFrame using the pandas API.
- Call methods on the DataFrame to interact with the API.

In [28]:
# Creating an API instance:
# When you create a DataFrame, you get an object (an "instance") that exposes the pandas API.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Communicating with the API: call pandas methods on df
print(df.head(), '\n')
print(df.mean(numeric_only=True))  # numeric_only=True skips non-numeric columns such as 'Name'

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35 

Age    30.0
dtype: float64


##### **What is a REST API?**
- REST stands for Representational State Transfer.
- A REST API is a type of API that lets programs communicate over the internet using HTTP.
- It helps clients (your program) interact with web services and the resources they expose.

**How Do REST APIs Work?**
1) The client (your program) sends a request to a URL (an endpoint).
2) The request is sent as an HTTP message, often with a JSON body.
3) The server processes the request and sends back a response (often in JSON format).
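The three steps can be imitated locally to see the JSON round trip without any network. This is only a sketch: `fake_server` is a hypothetical stand-in for a real web server, the plain function call replaces an actual HTTP message, and the price in its lookup table is a made-up figure.

```python
import json


def fake_server(request_body: str) -> str:
    """Hypothetical stand-in for a web server: JSON request in, JSON response out."""
    request = json.loads(request_body)                # server decodes the JSON request
    prices_usd = {"bitcoin": 105265.54}               # made-up lookup table
    coin = request["coin"]
    response = {"coin": coin, "price_usd": prices_usd.get(coin)}
    return json.dumps(response)                       # server encodes the JSON response


# 1) The client (your program) prepares a request ...
request_body = json.dumps({"coin": "bitcoin"})
# 2) ... "sends" it (a plain function call here, not a real HTTP message) ...
response_body = fake_server(request_body)
# 3) ... and decodes the server's JSON response into a Python dict.
response = json.loads(response_body)
print(response)  # {'coin': 'bitcoin', 'price_usd': 105265.54}
```

In a real client, step 2 would be an HTTP call (for example with `urllib.request.urlopen`), but the encode/decode steps stay the same.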

**Common REST API Terms -**
- GET : Retrieves data
- POST : Sends (creates) new data
- PUT : Updates existing data
- DELETE : Removes data
- Client : Your Python code
- Resource : The data or object the web service exposes (e.g., a coin's price history)
- Endpoint : The URL where a resource is accessed
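To connect the HTTP verbs to code, here is a sketch that builds (but never sends) one request per verb with the standard library's `urllib.request.Request`. The endpoint `https://api.example.com/users` and the helper `build_request` are hypothetical placeholders, not a real service.

```python
import json
from typing import Optional
from urllib.request import Request

# Placeholder endpoint (hypothetical): api.example.com is not a real REST service.
BASE_URL = "https://api.example.com/users"


def build_request(method: str, path: str = "", payload: Optional[dict] = None) -> Request:
    """Build, but do not send, an HTTP request for a REST endpoint."""
    # POST and PUT carry a JSON body here; GET and DELETE do not.
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


requests_by_verb = {
    "GET": build_request("GET", "/42"),                              # retrieve user 42
    "POST": build_request("POST", payload={"name": "Alice"}),        # create a new user
    "PUT": build_request("PUT", "/42", payload={"name": "Alicia"}),  # update user 42
    "DELETE": build_request("DELETE", "/42"),                        # remove user 42
}

for verb, req in requests_by_verb.items():
    print(req.get_method(), req.full_url, req.data)
```

Against a live server, each request would be sent with `urllib.request.urlopen(req)`; the endpoint path identifies the resource, and the verb says what to do with it.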

##### **Example - Using PyCoinGecko API for Cryptocurrency Data**

In [1]:
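# Install the library first (e.g. pip install pycoingecko), then import it
from pycoingecko import CoinGeckoAPI
cg = CoinGeckoAPI()  # client object that sends requests to the CoinGecko REST API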

In [2]:
# Fetch Bitcoin market data for the past 30 days, priced in USD
data = cg.get_coin_market_chart_by_id(
    id='bitcoin',
    vs_currency='usd',
    days=30
)

# The API returns a dictionary (parsed from JSON) with the keys
# 'prices', 'market_caps' and 'total_volumes'
# print(data)

# Extract the prices: a list of [timestamp_ms, price] pairs
prices = data['prices']
print(prices)

[[1737536618751, 105265.54039003127], [1737540171930, 104964.5927061866], [1737543829650, 105125.72871098916], [1737547433648, 105064.6017722564], [1737551031756, 105274.67946424602], [1737554629674, 104058.3806290983], [1737558460597, 104461.75496730869], [1737561611802, 104585.95000093221], [1737565445496, 103809.42598027962], [1737568958733, 104404.40751192812], [1737572640639, 103862.59857173236], [1737576091015, 104604.75924580242], [1737579821164, 104270.85038371527], [1737583419610, 104063.7203689605], [1737587028831, 103889.57174880913], [1737590631303, 103610.2884720945], [1737594072223, 103213.21801098887], [1737597752971, 102932.26960635831], [1737601414822, 102652.0766969804], [1737605020249, 102393.49466027308], [1737608636984, 101968.18865279967], [1737612233176, 102477.01865221249], [1737615812310, 102348.1246088757], [1737619411927, 102601.69451384385], [1737623031202, 102187.99212580184], [1737626617948, 101564.38402240237], [1737630227028, 101524.9578065445], [1737633
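The `pycoingecko` call is a convenience wrapper around a REST GET request. Assuming it targets CoinGecko's public v3 `coins/{id}/market_chart` endpoint (an assumption about the library's internals, not something this notebook shows), the equivalent URL can be built with the standard library. `market_chart_url` is a hypothetical helper, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed base URL of CoinGecko's public v3 REST API.
API_BASE = "https://api.coingecko.com/api/v3"


def market_chart_url(coin_id: str, vs_currency: str, days: int) -> str:
    """Return the GET URL for a coin's market-chart data (endpoint path is assumed)."""
    query = urlencode({"vs_currency": vs_currency, "days": days})
    return f"{API_BASE}/coins/{coin_id}/market_chart?{query}"


url = market_chart_url("bitcoin", "usd", 30)
print(url)
# https://api.coingecko.com/api/v3/coins/bitcoin/market_chart?vs_currency=usd&days=30
```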

In [3]:
# Convert the API data into a pandas DataFrame
import pandas as pd

df = pd.DataFrame(prices, columns=['Timestamp', 'Price'])
print(df.head())

       Timestamp          Price
0  1737536618751  105265.540390
1  1737540171930  104964.592706
2  1737543829650  105125.728711
3  1737547433648  105064.601772
4  1737551031756  105274.679464


In [4]:
# Convert the Unix timestamps (milliseconds since 1970-01-01 UTC) to datetimes with pd.to_datetime
df['date'] = pd.to_datetime(df['Timestamp'], unit='ms')
print(df.head())

       Timestamp          Price                    date
0  1737536618751  105265.540390 2025-01-22 09:03:38.751
1  1737540171930  104964.592706 2025-01-22 10:02:51.930
2  1737543829650  105125.728711 2025-01-22 11:03:49.650
3  1737547433648  105064.601772 2025-01-22 12:03:53.648
4  1737551031756  105274.679464 2025-01-22 13:03:51.756
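What `unit='ms'` means can be checked with the standard library: the timestamp is a count of milliseconds since the Unix epoch (1970-01-01 UTC). This sketch converts the first `[timestamp, price]` pair from the API output; `ms_to_datetime` is an illustrative helper, not part of pandas.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_datetime(ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    # Adding an integer-millisecond timedelta avoids float rounding issues.
    return EPOCH + timedelta(milliseconds=ms)


# First [timestamp, price] pair returned by the API in this notebook
timestamp_ms, price = [1737536618751, 105265.54039003127]
print(ms_to_datetime(timestamp_ms), price)
# 2025-01-22 09:03:38.751000+00:00 105265.54039003127
```

The result matches the first row of the `date` column; pandas returns timezone-naive values here, which represent the same UTC time.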


In [5]:
# Creating a candlestick chart:
# Each candle shows one period's opening price, closing price, highest price and lowest price (OHLC)

# Group the data by calendar day and take the min, max, first and last price of each day
daily_data = df.groupby(df['date'].dt.date).agg({'Price': ['min', 'max', 'first', 'last']})

In [6]:
# Plotting using Plotly 
import plotly.graph_objects as go

In [7]:
print(daily_data.head())

                    Price                                             
                      min            max          first           last
date                                                                  
2025-01-22  103809.425980  105274.679464  105265.540390  103889.571749
2025-01-23  101524.957807  105928.726268  103610.288472  103996.383586
2025-01-24  103007.080902  106462.475105  104067.609912  104828.410234
2025-01-25  104297.641103  105196.811090  104865.340402  105043.523379
2025-01-26  104252.449455  105205.034735  104757.721694  104252.449455
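The aggregation maps directly onto candlestick values: `first` is the open, `max` the high, `min` the low, and `last` the close. This only works because the rows are in time order. A plain-Python sketch of the same daily grouping, using made-up prices and a hypothetical `daily_ohlc` helper:

```python
from datetime import datetime
from itertools import groupby

# Made-up (time, price) samples, already sorted by time.
samples = [
    (datetime(2025, 1, 22, 9), 100.0),
    (datetime(2025, 1, 22, 10), 104.0),
    (datetime(2025, 1, 22, 11), 98.0),
    (datetime(2025, 1, 22, 12), 101.0),
    (datetime(2025, 1, 23, 9), 101.5),
    (datetime(2025, 1, 23, 10), 99.0),
]


def daily_ohlc(rows):
    """Group time-sorted (datetime, price) rows by calendar day into OHLC tuples."""
    result = {}
    # itertools.groupby only merges consecutive rows, so the input must be sorted.
    for day, group in groupby(rows, key=lambda row: row[0].date()):
        prices = [price for _, price in group]
        # (open, high, low, close) = (first, max, min, last)
        result[day] = (prices[0], max(prices), min(prices), prices[-1])
    return result


for day, (open_, high, low, close) in daily_ohlc(samples).items():
    print(day, open_, high, low, close)
# 2025-01-22 100.0 104.0 98.0 101.0
# 2025-01-23 101.5 101.5 99.0 99.0
```

In pandas, `df.set_index('date')['Price'].resample('D').ohlc()` should produce the same four values per day with columns already named `open`, `high`, `low` and `close`.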


In [8]:
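# Creating a candlestick chart
fig = go.Figure(data=[go.Candlestick(
    x=daily_data.index,
    open=daily_data['Price']['first'].values,   # first price of the day
    high=daily_data['Price']['max'].values,     # highest price of the day
    low=daily_data['Price']['min'].values,      # lowest price of the day
    close=daily_data['Price']['last'].values    # last price of the day
)])

# Set the renderer (only if needed): open the chart in a web browser
# instead of inline, which avoids the nbformat requirement described below
import plotly.io as pio
pio.renderers.default = "browser"

fig.show()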

**What Happened?**
- You tried to display a Plotly candlestick chart inline, in a Jupyter Notebook or another environment that supports inline output (such as VS Code).

- However, Plotly needs additional libraries to render the chart inside a notebook, and they were missing. Specifically:

**nbformat was missing or outdated:**

- nbformat is the Jupyter library for the notebook file format; Plotly uses it to embed rich output such as interactive charts.
- Plotly's inline renderers require nbformat >= 4.2.0.
- Your system either did not have it installed or had an older version.

**IPython was not installed:**

- IPython is the interactive Python shell that Jupyter notebooks run on.
- It provides the display functions Plotly uses to render interactive visualizations.

**How to Fix It:**

- Install or upgrade the missing packages (`pip install --upgrade nbformat ipython`) and restart the kernel, or
- Set `pio.renderers.default = "browser"` so the chart opens in a web browser instead of inline.
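To confirm the diagnosis before reinstalling anything, the installed versions can be checked from Python with `importlib.metadata` (Python 3.8+). The helpers below are illustrative, and the version parsing is deliberately rough; `packaging.version.Version` handles pre-releases and other edge cases properly.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple:
    """Rough parse: '5.10.4' -> (5, 10, 4); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str, minimum: str) -> str:
    """Report whether a package is installed and at least `minimum`."""
    try:
        installed = version(name)
    except PackageNotFoundError:
        return f"{name}: not installed (try: pip install {name})"
    if meets_minimum(installed, minimum):
        return f"{name} {installed}: OK"
    return f"{name} {installed}: older than {minimum} (try: pip install --upgrade {name})"


print(check_package("nbformat", "4.2.0"))
print(check_package("ipython", "0"))
```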

#### **Practice Project - GDP Data Extraction & Processing** 
**Project Scenario:**  
An international firm that is looking to expand its business into countries across the world has recruited you. You have been hired as a junior Data Engineer and tasked with creating a script that extracts the top 10 largest economies in the world, in descending order of GDP in billion USD (rounded to 2 decimal places), as logged by the International Monetary Fund (IMF).

**Objectives:**

After completing this lab you will be able to:

 - Use web scraping to extract required information from a website.
 - Use pandas to load and process the tabular data as a DataFrame.
 - Use NumPy to manipulate the information contained in the DataFrame.
 - Save the updated DataFrame to a CSV file.

**What is lxml and What Does It Do?**
- lxml is a Python library for parsing and manipulating XML and HTML documents.
- It is fast, efficient, and widely used in web scraping and data extraction tasks.
- It combines the power of:
  - libxml2 (a C library for parsing XML and HTML)
  - libxslt (a C library for handling XSLT transformations)
- `pd.read_html()` uses lxml as its default parser, which is why it is installed below.
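To make "parsing an HTML table" concrete, here is a sketch that uses the standard library's `html.parser.HTMLParser` as a stand-in for lxml, so it runs without extra installs. It only illustrates the idea of turning `<tr>`/`<td>` tags into rows of cell text; `TableCellParser` and the sample table (with placeholder country names and numbers) are hypothetical, and it assumes every cell sits inside a `<tr>`.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])            # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")        # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                   # ignore whitespace between tags
            self.rows[-1][-1] += data.strip()


html = """
<table>
  <tr><th>Country</th><th>GDP</th></tr>
  <tr><td>Country A</td><td>1000</td></tr>
  <tr><td>Country B</td><td>750</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(html)
print(parser.rows)
# [['Country', 'GDP'], ['Country A', '1000'], ['Country B', '750']]
```

`pd.read_html()` does this work (plus header detection and type conversion) for every `<table>` on a page, which is why it returns a list of DataFrames.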

In [43]:
%pip install lxml

Collecting lxml
  Downloading lxml-5.3.1-cp312-cp312-win_amd64.whl.metadata (3.8 kB)
Downloading lxml-5.3.1-cp312-cp312-win_amd64.whl (3.8 MB)
   ---------------------------------------- 3.8/3.8 MB 4.3 MB/s eta 0:00:00
Installing collected packages: lxml
Successfully installed lxml-5.3.1
Note: you may need to restart the kernel to use updated packages.


In [41]:
import pandas as pd
import numpy as np

In [47]:
# Extracting GDP Data from the given URL using Web Scraping
URL="https://web.archive.org/web/20230902185326/https://en.wikipedia.org/wiki/List_of_countries_by_GDP_%28nominal%29"

# Extract tables from webpage using Pandas. Retain table number 3 as the required dataframe.
tables = pd.read_html(URL) #Extracts all tables from webpage and stores them in a list of DataFrames.
df = tables[3] #selects the 4th table from the extracted tables (indexing starts from 0)

# Replace the column headers with column numbers
df.columns = range(df.shape[1]) #replaces the original column names with column numbers (0, 1, 2, ...).

# Retain columns with index 0 and 2 (name of country and value of GDP quoted by IMF)
df = df[[0,2]]

# Retain the Rows with index 1 to 10, indicating the top 10 economies of the world.
df = df.iloc[1:11, :].copy()  # .copy() avoids a SettingWithCopyWarning when columns are modified later

# Assign column names as "Country" and "GDP (Million USD)"
df.columns = ['Country','GDP (Million USD)']

In [52]:
# Change the data type of the 'GDP (Million USD)' column to integer. Use astype() method.
df['GDP (Million USD)'] = df['GDP (Million USD)'].astype(int)

# Convert the GDP value from million USD to billion USD
df['GDP (Million USD)'] = df['GDP (Million USD)'] / 1000

# Use numpy.round() to round the values to 2 decimal places
df['GDP (Million USD)'] = np.round(df['GDP (Million USD)'], 2)

# Rename the column header from 'GDP (Million USD)' to 'GDP (Billion USD)'.
# rename() returns a new DataFrame, so assign the result back to df.
df = df.rename(columns={'GDP (Million USD)': 'GDP (Billion USD)'})

# Save the DataFrame to a CSV file named "Largest_economies.csv"
df.to_csv('./Largest_economies.csv')
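The same million-to-billion conversion, rounding, and CSV export can be sketched with only the standard library, which makes each step easy to verify by hand. The rows below are made-up placeholder values, not IMF figures, `to_billions` is a hypothetical helper, and an in-memory `io.StringIO` buffer stands in for `Largest_economies.csv`.

```python
import csv
import io

# Hypothetical rows: (country, GDP in million USD). Not real IMF figures.
rows = [("Country A", 12345678), ("Country B", 9876543), ("Country C", 1234567)]


def to_billions(rows):
    """Convert (country, GDP in million USD) rows to GDP in billion USD, rounded to 2 dp."""
    return [(country, round(gdp_million / 1000, 2)) for country, gdp_million in rows]


converted = to_billions(rows)

buffer = io.StringIO()              # stands in for a file such as Largest_economies.csv
writer = csv.writer(buffer)
writer.writerow(["Country", "GDP (Billion USD)"])
writer.writerows(converted)
print(buffer.getvalue())
```

Dividing by 1000 turns 12345678 million USD into 12345.678 billion USD, which rounds to 12345.68; the pandas cell performs the same arithmetic column-wide with `/ 1000` and `np.round(..., 2)`.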