In [1]:
# Step 1: Set Up the Environment
# Install required libraries (requests and pandas)
!pip install requests pandas -q

In [2]:
# Step 2: Ingest Data from a Public REST API
# Example: Fetch random user data from the Random User Generator API.
import requests
import pandas as pd

api_url = "https://randomuser.me/api/?results=10"
response = requests.get(api_url, timeout=10)
response.raise_for_status()  # Stop early on HTTP errors (4xx/5xx)
data = response.json()

# Convert the results to a DataFrame
df_api = pd.json_normalize(data['results'])
print("Data from REST API:")
print(df_api.head())

Data from REST API:
   gender                            email           phone            cell  \
0  female          amy.laurent@example.com   076 918 27 77   077 282 97 74   
1  female       grazyna.klasen@example.com    0206-4166587    0171-8933964   
2    male          brdy.jaafry@example.com    023-32087736   0901-370-4469   
3    male         francis.hart@example.com     01534 25688    07497 736791   
4  female  magdalena.moseychuk@example.com  (098) A33-5030  (067) A04-5751   

  nat name.title name.first  name.last  location.street.number  \
0  CH     Madame        Amy    Laurent                    5317   
1  DE       Miss    Grazyna     Klasen                    9592   
2  IR         Mr      بردیا      جعفری                    8307   
3  GB         Mr    Francis       Hart                    1220   
4  UA        Mrs  Magdalena  Moseychuk                    7802   

              location.street.name  ...  \
0              Rue des Cuirassiers  ...   
1                      Haupt

In [3]:
# Step 3: Ingest Data from a CSV File
# Example using the Iris dataset (direct CSV link):

csv_url = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
df_csv = pd.read_csv(csv_url)
print("\nData from CSV File:")
print(df_csv.head())


Data from CSV File:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


In [4]:
# Step 4: Inspect and Compare Data
# Inspect the columns and summary info of both DataFrames.
print("API Data Columns:", df_api.columns)
print("CSV Data Columns:", df_csv.columns)

print("\nAPI Data Info:")
df_api.info()  # info() prints its summary itself and returns None

print("\nCSV Data Info:")
df_csv.info()  # info() prints its summary itself and returns None

API Data Columns: Index(['gender', 'email', 'phone', 'cell', 'nat', 'name.title', 'name.first',
       'name.last', 'location.street.number', 'location.street.name',
       'location.city', 'location.state', 'location.country',
       'location.postcode', 'location.coordinates.latitude',
       'location.coordinates.longitude', 'location.timezone.offset',
       'location.timezone.description', 'login.uuid', 'login.username',
       'login.password', 'login.salt', 'login.md5', 'login.sha1',
       'login.sha256', 'dob.date', 'dob.age', 'registered.date',
       'registered.age', 'id.name', 'id.value', 'picture.large',
       'picture.medium', 'picture.thumbnail'],
      dtype='object')
CSV Data Columns: Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

API Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 34 columns):
 #   Column                          Non-Null Count  Dtype 

In [None]:
# Step 5: Reflection Questions
# At the end of your notebook, answer these questions:

# 1. What were the main steps for ingesting data from a REST API vs. a CSV?

# 2. What are some possible challenges or error scenarios for each ingestion method?

# 3. For your workflow, when would you prefer an API vs. a CSV file?

1. For REST API:

Import requests and pandas.

Use requests.get() to call the API.

Convert the response to JSON.

Use pd.json_normalize() to create a DataFrame.

For CSV:

Import pandas.

Use pd.read_csv() with the file path or URL to load the data directly.
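The steps above can be sketched side by side without any network access. This is a minimal illustration, not the notebook's actual run: the `sample_payload` dictionary and the `sample_csv` text are hypothetical stand-ins, shaped like the API response and the Iris file shown earlier.

```python
import io

import pandas as pd

# Hypothetical stand-in for response.json(): nested records under "results".
sample_payload = {
    "results": [
        {
            "gender": "female",
            "name": {"first": "Amy", "last": "Laurent"},
            "location": {"street": {"number": 5317}},
        }
    ]
}

# REST API path: nested JSON is flattened into dotted column names.
df_api_demo = pd.json_normalize(sample_payload["results"])
print(df_api_demo.columns.tolist())

# Hypothetical stand-in for the CSV file: already tabular, so one call is enough.
sample_csv = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)
df_csv_demo = pd.read_csv(sample_csv)
print(df_csv_demo.shape)
```

The API path needs an extra flattening step because each record is nested, while the CSV is read directly into rows and columns.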

2. For REST API:

Network or server issues (timeouts, dropped connections, HTTP error responses).

The API may change its response structure or become unavailable.

Rate limits or access restrictions.

Complex nested JSON can be tricky to flatten.
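One way to guard against the REST API failure modes listed above is to wrap the request in a small helper. The function name `fetch_json` and the 10-second timeout are assumptions made for illustration; the notebook itself does not define them.

```python
import requests


def fetch_json(url, timeout=10):
    """Return the parsed JSON body from url, or None if anything goes wrong."""
    try:
        response = requests.get(url, timeout=timeout)
        # Raises HTTPError for 4xx/5xx responses, e.g. 429 when rate limited.
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, HTTP errors, and malformed URLs.
        print(f"Request failed: {exc}")
    except ValueError as exc:
        # Older requests versions raise a plain ValueError for a non-JSON body.
        print(f"Response was not valid JSON: {exc}")
    return None


# A deliberately malformed URL shows the failure path without using the network.
result = fetch_json("not-a-valid-url")
print(result)
```

With the real endpoint from Step 2, the call would be `fetch_json(api_url)`, followed by a check that the returned dictionary still contains the `results` key before it is passed to `pd.json_normalize()`.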

For CSV:

Wrong file path or broken link.

The file may have missing data or bad formatting.

Encoding problems (e.g., special characters).

Header or column mismatch.
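The CSV problems above can be checked right after loading. The sketch below assumes a hypothetical helper `load_checked_csv` and an `EXPECTED_COLUMNS` list taken from the Iris columns shown earlier; the in-memory sample with one blank value is invented for the demonstration.

```python
import io

import pandas as pd

# Assumed schema, copied from the Iris columns printed in Step 4.
EXPECTED_COLUMNS = [
    "sepal_length", "sepal_width", "petal_length", "petal_width", "species",
]


def load_checked_csv(source, expected_columns):
    """Read a CSV and return a DataFrame, or None if reading or validation fails."""
    try:
        df = pd.read_csv(source, encoding="utf-8")
    except (OSError, UnicodeDecodeError,
            pd.errors.ParserError, pd.errors.EmptyDataError) as exc:
        # OSError covers a wrong file path as well as a broken URL.
        print(f"Could not read CSV: {exc}")
        return None
    missing_cols = [c for c in expected_columns if c not in df.columns]
    if missing_cols:
        print(f"Missing expected columns: {missing_cols}")
        return None
    return df


# Invented sample with one blank sepal_width value.
sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,,1.4,0.2,setosa\n"
)
df_checked = load_checked_csv(sample, EXPECTED_COLUMNS)
# Count missing values per column to spot incomplete rows.
print(df_checked.isna().sum())
```

Returning `None` keeps the sketch short; in a longer pipeline, raising an exception with a clear message may be the better choice.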

3. Prefer API when:

You need real-time or frequently updated data.

You are automating workflows or building data pipelines.

Prefer CSV when:

You’re doing offline analysis or using static data.

You need a simple, portable format to share or store data.
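The two preferences can be combined: fetch from the API once, then save a CSV snapshot for offline work. The sketch below is an assumption-laden illustration; the two records and the file name `users_snapshot.csv` are hypothetical, and a temporary directory is used so nothing is left behind in the working folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical records standing in for data already fetched from the API.
records = [
    {"gender": "female", "name": {"first": "Amy", "last": "Laurent"}},
    {"gender": "male", "name": {"first": "Francis", "last": "Hart"}},
]
df_snapshot = pd.json_normalize(records)

# Write the flattened table to a CSV file without the index column.
snapshot_path = os.path.join(tempfile.mkdtemp(), "users_snapshot.csv")
df_snapshot.to_csv(snapshot_path, index=False, encoding="utf-8")

# Later, or offline, reload the snapshot instead of calling the API again.
df_offline = pd.read_csv(snapshot_path)
print(df_offline.head())
```

The snapshot is only as fresh as the moment it was written, which is the trade-off against calling the API each time.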