### **API EDA Process**

- Objective: This notebook shows how we query the API and integrate its data with the main dataset (Credit Card Transactions).

From the API we want each state's population and full name, plus the country code and country name. (The country fields need no special handling, since the dataset covers only the United States.)

---


#### **First Step**: Obtain the list of states present in our data


In [1]:
import requests
import pandas as pd
import sys
import os

# Add the 'src' folder to sys.path
sys.path.append(os.path.abspath(os.path.join('..', 'src')))
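Re-running the cell above appends the same `src` directory to `sys.path` each time. A minimal sketch that adds it only once; `add_to_sys_path` is a hypothetical helper, and the `../src` layout is taken from the cell above.

```python
import os
import sys


def add_to_sys_path(relative_dir: str) -> str:
    """Append the absolute form of relative_dir to sys.path, at most once."""
    path = os.path.abspath(relative_dir)
    # Skip the append if a previous run of the cell already added this entry
    if path not in sys.path:
        sys.path.append(path)
    return path


src_path = add_to_sys_path(os.path.join("..", "src"))
```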

In [2]:
from connections.db import DB
db = DB()

# Fetch the data from the database as a dataframe
df = db.fetch_as_dataframe("SELECT * FROM raw_table;")

INFO:root:✔ Connected to database
INFO:root:✔ Data loaded into DataFrame
INFO:root:✔ Cursor closed
INFO:root:✔ Connection closed


In [3]:
df['state'].unique()

array(['NC', 'ID', 'VA', 'PA', 'TN', 'IA', 'WV', 'FL', 'NJ', 'OK', 'IN',
       'MA', 'TX', 'WI', 'MI', 'WY', 'HI', 'LA', 'DC', 'KY', 'NY', 'MS',
       'KS', 'AL', 'WA', 'AR', 'MD', 'GA', 'ME', 'CA', 'NE', 'MN', 'OH',
       'MO', 'SC', 'OR', 'IL', 'NH', 'CO', 'SD', 'MT', 'ND', 'CT', 'VT',
       'AZ', 'UT', 'NM', 'NV', 'RI'], dtype=object)
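Having the state codes from the data makes it easy to check coverage once the API data is loaded in the second step. For example, AK and DE appear in the API output but not in the array above. A minimal sketch using set differences; `state_coverage` is a hypothetical helper and the inputs below are made-up samples.

```python
def state_coverage(data_states, api_states):
    """Compare the state codes in the transactions with those returned by the API."""
    data_set = set(data_states)
    api_set = set(api_states)
    return {
        # Codes in the data with no API match: these rows get NaN after a left merge
        "missing_from_api": sorted(data_set - api_set),
        # Codes the API provides that never appear in the transactions
        "unused_api_states": sorted(api_set - data_set),
    }


# Sample inputs; in the notebook these would be
# df['state'].unique() and df_api['State_Abbreviation']
coverage = state_coverage(["NC", "TX", "DC"], ["AL", "AK", "NC", "TX", "DC"])
print(coverage)  # → {'missing_from_api': [], 'unused_api_states': ['AK', 'AL']}
```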

#### **Second Step**: Connect to the API to obtain the population of each State and perform the merge.

In [4]:
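API_URL = "https://api-world-population-etl-project.onrender.com/api/v1/data/usa/states"

# Fail fast on network problems or a non-2xx response (the 30 s timeout is an arbitrary choice)
response = requests.get(API_URL, timeout=30)
response.raise_for_status()
df_api = pd.DataFrame(response.json())
df_api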

| | Country | Country_Code | State_Abbreviation | State_Name | State_Population |
|---|---|---|---|---|---|
| 0 | United States | USA | AL | Alabama | 5024294 |
| 1 | United States | USA | AK | Alaska | 733374 |
| 2 | United States | USA | AZ | Arizona | 7157902 |
| 3 | United States | USA | AR | Arkansas | 3011490 |
| 4 | United States | USA | CA | California | 39538212 |
| 5 | United States | USA | CO | Colorado | 5773707 |
| 6 | United States | USA | CT | Connecticut | 3605912 |
| 7 | United States | USA | DE | Delaware | 989946 |
| 8 | United States | USA | DC | District of Columbia | 689548 |
| 9 | United States | USA | FL | Florida | 21538216 |
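Because the merge depends on the API returning specific columns, a quick schema check before merging can catch a changed or partial response. A minimal sketch, assuming the column names shown in the output above; `check_api_frame` and `REQUIRED_COLUMNS` are hypothetical names, and the sample rows are copied from that output.

```python
import pandas as pd

# Columns this notebook relies on, taken from the API response shown above
REQUIRED_COLUMNS = {"Country", "Country_Code", "State_Abbreviation",
                    "State_Name", "State_Population"}


def check_api_frame(df_api: pd.DataFrame) -> list:
    """Return a list of problems found in the API frame (empty if none)."""
    problems = []
    missing = REQUIRED_COLUMNS - set(df_api.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems
    # A repeated abbreviation would duplicate transaction rows in the merge
    if df_api["State_Abbreviation"].duplicated().any():
        problems.append("duplicate State_Abbreviation values")
    if not pd.api.types.is_numeric_dtype(df_api["State_Population"]):
        problems.append("State_Population is not numeric")
    return problems


sample = pd.DataFrame({
    "Country": ["United States", "United States"],
    "Country_Code": ["USA", "USA"],
    "State_Abbreviation": ["AL", "AK"],
    "State_Name": ["Alabama", "Alaska"],
    "State_Population": [5024294, 733374],
})
print(check_api_frame(sample))  # → []
```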


In [5]:
# Lowercase the API column names, then left-join on the two-letter state code
df_api.columns = [col.lower() for col in df_api.columns.to_list()]
df = df.merge(df_api, left_on='state', right_on='state_abbreviation', how='left')
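With `how='left'`, transactions whose state code is missing from the API silently receive NaN population values, and a repeated code in the API would multiply transaction rows. pandas' `merge` can guard against both through its `validate` and `indicator` parameters. A minimal sketch; `merge_with_checks` is a hypothetical helper, `ZZ` is a made-up unmatched code, and the populations are copied from the API output above.

```python
import pandas as pd


def merge_with_checks(df: pd.DataFrame, df_api: pd.DataFrame) -> pd.DataFrame:
    """Left-merge on the state code and report transactions with no API match."""
    merged = df.merge(
        df_api,
        left_on="state",
        right_on="state_abbreviation",
        how="left",
        validate="many_to_one",  # raises MergeError if a state code repeats in df_api
        indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
    )
    unmatched = merged.loc[merged["_merge"] == "left_only", "state"].unique()
    if len(unmatched) > 0:
        print(f"States without population data: {sorted(unmatched)}")
    return merged.drop(columns="_merge")


transactions = pd.DataFrame({"state": ["AL", "AZ", "AL", "ZZ"],
                             "amt": [10.0, 5.5, 3.2, 1.0]})
states = pd.DataFrame({"state_abbreviation": ["AL", "AZ"],
                       "state_population": [5024294, 7157902]})
result = merge_with_checks(transactions, states)
# prints: States without population data: ['ZZ']
```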

In [6]:
df.columns

Index(['id', 'trans_date_trans_time', 'cc_num', 'merchant', 'category', 'amt',
       'first', 'last', 'gender', 'street', 'city', 'state', 'zip', 'lat',
       'long', 'job', 'dob', 'trans_num', 'is_fraud', 'merch_zipcode', 'age',
       'country', 'country_code', 'state_abbreviation', 'state_name',
       'state_population'],
      dtype='object')
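The merged frame carries both `state` and `state_abbreviation`, which hold the same code wherever a match was found. One alternative, sketched below as an assumption rather than the notebook's method, is to rename the API key to `state` before merging so only one key column remains; `DataFrame.rename` also returns a copy, so the caller's `df_api` is not mutated. `merge_on_state` is a hypothetical helper and the sample rows are copied from the API output above.

```python
import pandas as pd


def merge_on_state(df: pd.DataFrame, df_api: pd.DataFrame) -> pd.DataFrame:
    """Merge so the result keeps a single 'state' key column."""
    api = (df_api
           .rename(columns=str.lower)                        # copy with lowercase names
           .rename(columns={"state_abbreviation": "state"}))  # align the join key
    return df.merge(api, on="state", how="left")


transactions_demo = pd.DataFrame({"state": ["AL", "AZ"], "amt": [12.5, 40.0]})
api_raw = pd.DataFrame({"State_Abbreviation": ["AL", "AZ"],
                        "State_Name": ["Alabama", "Arizona"],
                        "State_Population": [5024294, 7157902]})
merged_demo = merge_on_state(transactions_demo, api_raw)
print(list(merged_demo.columns))  # → ['state', 'amt', 'state_name', 'state_population']
```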

In [7]:
# Save the preprocessed data to a new CSV file.
df.to_csv('../data/credit_card_transactions_api_preprocessed.csv', index=False)
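The path `../data/...` resolves against the process's current working directory, so it works from the notebook folder but may point elsewhere when the same code runs under a scheduler such as Airflow. A minimal sketch that takes the data directory explicitly and creates it if needed; `save_preprocessed` and its `data_dir` parameter are hypothetical, and the demo writes to a temporary directory only.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_preprocessed(df: pd.DataFrame, data_dir: Path,
                      filename: str = "credit_card_transactions_api_preprocessed.csv") -> Path:
    """Write df to data_dir/filename, creating the directory if needed."""
    data_dir = Path(data_dir).resolve()
    # exist_ok=True makes this a no-op when the folder already exists
    data_dir.mkdir(parents=True, exist_ok=True)
    out_path = data_dir / filename
    df.to_csv(out_path, index=False)
    return out_path


# Demo: write into a throwaway directory and read the file back
with tempfile.TemporaryDirectory() as tmp:
    out = save_preprocessed(pd.DataFrame({"amt": [1.5, 2.0]}), Path(tmp) / "data")
    written = pd.read_csv(out)
```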

---

#### **Third Step**: Wrap the whole process in reusable functions so it can run in Airflow

In [8]:
import logging
import pandas as pd
import requests

logging.basicConfig(level=logging.INFO)

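API_URL = "https://api-world-population-etl-project.onrender.com/api/v1/data/usa/states"


def api_connect():
    """Fetch the state population table from the API; return None on failure."""
    try:
        # 30 s timeout is an arbitrary choice; without one, requests can wait indefinitely
        response = requests.get(API_URL, timeout=30)
    except requests.RequestException as e:
        # requests raises (it never returns None) on DNS errors, timeouts, etc.
        logging.error(f'API connection failed. Error: {e}')
        return None

    if response.status_code != 200:
        logging.error(f'API connection failed. Status code: {response.status_code}')
        return None

    return pd.DataFrame(response.json())


def merge_data(df, df_api):
    # Lowercase the API column names to match the main dataset's convention
    df_api.columns = [col.lower() for col in df_api.columns]
    # Left join keeps every transaction, even if its state is missing from the API
    return df.merge(df_api, left_on='state', right_on='state_abbreviation', how='left')


# Main function: fetch, merge, save, and return the merged frame
def api_integration(df):
    try:
        df_api = api_connect()
        if df_api is None:
            return None
        df = merge_data(df, df_api)
        df.to_csv('../data/credit_card_transactions_api_preprocessed.csv', index=False)
        return df
    except Exception as e:
        logging.error(f'API integration failed. Error: {str(e)}')
        return None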
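`api_connect` makes a single attempt and returns `None` on failure. Since the API is hosted on onrender.com, its first response after a period of inactivity may be slow or fail, so retrying with exponential backoff is one option. A minimal sketch; `with_retries` and its parameters are hypothetical, and the `sleep` argument exists only so the demo runs without waiting.

```python
import logging
import time


def with_retries(func, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func() until it returns a non-None value or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    (exponential backoff). Returns None if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        result = func()
        if result is not None:
            return result
        if attempt < attempts:
            delay = base_delay * 2 ** (attempt - 1)
            logging.warning("Attempt %d/%d failed; retrying in %.1f s",
                            attempt, attempts, delay)
            sleep(delay)
    return None


# Demo with a fake call that succeeds on the third try; in the notebook this
# would be something like: df_api = with_retries(api_connect)
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    return "ok" if calls["n"] == 3 else None


delays = []
result = with_retries(flaky, sleep=delays.append)
print(result)  # → ok
print(delays)  # → [2.0, 4.0]
```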