
In [1]:
# Step 1: Set Up the Environment
# Install required libraries (requests and pandas)
!pip install requests pandas -q


In [2]:
# Step 2: Ingest Data from a Public REST API
# Example: Fetch random user data from the Random User Generator API.
import requests
import pandas as pd

api_url = "https://randomuser.me/api/?results=10"
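response = requests.get(api_url, timeout=10)  # avoid hanging indefinitely
response.raise_for_status()  # fail early on HTTP errors (4xx/5xx)
data = response.json()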

# Convert the results to a DataFrame
df_api = pd.json_normalize(data['results'])
print("Data from REST API:")
print(df_api.head())

Data from REST API:
   gender                        email           phone            cell nat  \
0    male  augusto.ordonez@example.com  (610) 661 7412  (618) 246 3076  MX   
1  female    hannah.walker@example.com  (708)-075-4643  (598)-590-9998  NZ   
2  female   chloe.gonzalez@example.com  05-22-37-82-58  06-52-38-06-99  FR   
3  female    afet.kormukcu@example.com  (634)-571-5236  (147)-813-2177  TR   
4  female     sara.vasquez@example.com  (624) 307 3012  (624) 286 2484  MX   

  name.title name.first name.last  location.street.number  \
0         Mr    Augusto   Ordóñez                    8929   
1        Mrs     Hannah    Walker                    3604   
2       Miss      Chloé  Gonzalez                    9230   
3       Miss       Afet  Körmükçü                    8464   
4        Mrs       Sara   Vásquez                    8940   

    location.street.name  ...  \
0     Andador Sur Urbina  ...   
1          Totara Avenue  ...   
2    Rue Jean-Baldassini  ...   
3           

In [3]:
# Step 3: Ingest Data from a CSV File
# Example using the Iris dataset (direct CSV link):

csv_url = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
df_csv = pd.read_csv(csv_url)
print("\nData from CSV File:")
print(df_csv.head())



Data from CSV File:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


In [4]:
# Step 4: Inspect and Compare Data
# Perform basic inspection on both data sources.
# Inspect columns and info for both DataFrames
print("API Data Columns:", df_api.columns)
print("CSV Data Columns:", df_csv.columns)

print("\nAPI Data Info:")
df_api.info()  # info() prints directly and returns None, so no print() wrapper

print("\nCSV Data Info:")
df_csv.info()

API Data Columns: Index(['gender', 'email', 'phone', 'cell', 'nat', 'name.title', 'name.first',
       'name.last', 'location.street.number', 'location.street.name',
       'location.city', 'location.state', 'location.country',
       'location.postcode', 'location.coordinates.latitude',
       'location.coordinates.longitude', 'location.timezone.offset',
       'location.timezone.description', 'login.uuid', 'login.username',
       'login.password', 'login.salt', 'login.md5', 'login.sha1',
       'login.sha256', 'dob.date', 'dob.age', 'registered.date',
       'registered.age', 'id.name', 'id.value', 'picture.large',
       'picture.medium', 'picture.thumbnail'],
      dtype='object')
CSV Data Columns: Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

API Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 34 columns):
 #   Column                          Non-Null Count  Dtype 

1. What were the main steps for ingesting data from a REST API vs. a CSV?

**REST API Ingestion Steps:**

*   Identify and construct the API endpoint URL.
*   Send a request using a library like requests.
*   Receive a JSON response.
*   Parse the response and convert it to a pandas DataFrame, for example with pandas.json_normalize() as in Step 2.
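
The parse-and-convert step can be tried offline. The sketch below assumes a hand-written payload (`sample_payload`, a hypothetical name) shaped like the randomuser.me response from Step 2, and shows how pandas.json_normalize() flattens nested objects into dotted column names.

```python
import pandas as pd

# Hypothetical, hand-written payload shaped like the API response in Step 2.
sample_payload = {
    "results": [
        {
            "gender": "male",
            "name": {"title": "Mr", "first": "Augusto", "last": "Ordóñez"},
            "location": {"street": {"number": 8929, "name": "Andador Sur Urbina"}},
        }
    ]
}

# Nested dictionaries become dotted column names such as "name.first".
df_sample = pd.json_normalize(sample_payload["results"])
print(list(df_sample.columns))
```

This is why the API DataFrame in Step 4 has columns like location.street.number rather than a single location column.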

**CSV File Ingestion Steps:**

*   Locate the CSV file: a direct URL (as in Step 3), a file uploaded with Colab’s files.upload(), or a known path.
*   Use pandas.read_csv() to read the file into a DataFrame.
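
As a small illustration, pandas.read_csv() also accepts a file-like object, so the read step can be tried without a file or network access. The sketch below inlines the first rows of the Iris output from Step 3; the names `csv_text` and `df_inline` are made up for this example.

```python
import io
import pandas as pd

# First rows of the Iris file from Step 3, inlined so no network is needed.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
df_inline = pd.read_csv(io.StringIO(csv_text))
print(df_inline.dtypes)
```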

**Key Difference:**

REST API ingestion involves an HTTP request and parsing a (possibly nested) JSON response, while CSV ingestion reads a static, flat file from a local path or a URL.





2. What are some possible challenges or error scenarios for each ingestion method?

**For REST API Ingestion:**

*   Network issues (no internet, timeouts) and HTTP errors (e.g., 400 Bad Request).
*   API changes (endpoint or response structure).
*   Rate limits or API key restrictions.
*   JSON structure inconsistencies or deeply nested data.
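
One way to guard against several of these failures is sketched below. It is a minimal example, not a complete client: `fetch_results` is a hypothetical helper, the 10-second timeout is an arbitrary choice, and the "results" key is assumed from the response used in Step 2.

```python
import requests

def fetch_results(url, timeout=10):
    """Return the 'results' list from a JSON API, or None on any failure."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raises HTTPError on 4xx/5xx (e.g. 429 rate limit)
        payload = response.json()
    except requests.exceptions.Timeout:
        print("Request timed out")
        return None
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, invalid URLs, and HTTP error statuses.
        print(f"Request failed: {exc}")
        return None
    except ValueError as exc:
        print(f"Response was not valid JSON: {exc}")
        return None

    # Guard against a changed response structure.
    results = payload.get("results") if isinstance(payload, dict) else None
    if results is None:
        print("Unexpected structure: no 'results' key")
    return results
```

Calling `fetch_results("https://randomuser.me/api/?results=10")` would then return either a list ready for pandas.json_normalize() or None, so the caller can decide whether to retry or stop.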

**For CSV Ingestion:**

*   File not found or incorrect path.
*   Encoding errors (e.g., UTF-8 vs ANSI).
*   Inconsistent delimiters or missing headers.
*   Corrupted or improperly formatted CSV files.
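
A minimal sketch of handling these cases with pandas follows. The helper name `load_csv` and its defaults are assumptions for illustration; the exception classes are the ones pandas and Python raise for these failures.

```python
import io
import pandas as pd

def load_csv(source, encoding="utf-8", sep=","):
    """Read a CSV into a DataFrame, returning None if a common failure occurs."""
    try:
        return pd.read_csv(source, encoding=encoding, sep=sep)
    except FileNotFoundError:
        print(f"File not found: {source}")
    except UnicodeDecodeError:
        print(f"Could not decode the file as {encoding}; try another encoding")
    except pd.errors.EmptyDataError:
        print("The file is empty or has no header row")
    except pd.errors.ParserError as exc:
        print(f"Malformed CSV: {exc}")
    return None

# A row with an extra field triggers ParserError with the default settings.
bad_rows = io.StringIO("a,b\n1,2\n3,4,5\n")
print(load_csv(bad_rows))
```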










3. For your workflow, when would you prefer an API vs. a CSV file?

**Prefer API when:**

*   You need real-time or frequently updated data.
*   You're building a dynamic app or dashboard.
*   You want to automate the data fetching process.

**Prefer CSV when:**

*   Data is static, pre-downloaded, or shared offline.
*   You’re working on data cleaning, exploration, or training models.
*   There's no need for frequent updates, and the data is already cleaned.








In [5]:
# (Optional) Save Your Results
# Save both datasets to Colab files (optional)
df_api.to_csv("users_api.csv", index=False)
df_csv.to_csv("iris_csv.csv", index=False)
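
To check that a saved file reads back as expected, a round trip can be sketched as follows. This example uses a small stand-in DataFrame (`df_small`, hypothetical) and a temporary directory rather than the notebook's own files.

```python
import os
import tempfile
import pandas as pd

df_small = pd.DataFrame({"species": ["setosa", "setosa"], "sepal_length": [5.1, 4.9]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip.csv")
    df_small.to_csv(path, index=False)  # index=False avoids an extra unnamed column
    df_back = pd.read_csv(path)

# Raises AssertionError if columns or values differ after the round trip.
pd.testing.assert_frame_equal(df_small, df_back)
```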