# DATA INGESTION:
---

Data ingestion reliably imports the raw marketing campaign files and validates their structure before they are consolidated into a unified dataset in the Data Cleaning step. This stage ensures consistent input structure, preserves original data fidelity, and establishes the foundation for all downstream analysis and modeling.

### Objective:
Import raw marketing data into the project workspace and preserve it in its original form for reproducibility.

---

### Scope & Tasks:

- Load source CSV files
  - marketing_campaign_2024.csv
  - marketing_campaign_2025.csv
- Perform initial structural validation
  - Row counts
  - Column names match expected schema
- Save unmodified dataset into /data/raw/
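
The structural checks listed above could be sketched as follows. This is a minimal illustration, not the project's actual validation code: the helper name `validate_structure` is hypothetical, the expected column list is copied from the `df.info()` output shown later in this notebook, and the empty demo frame is a stand-in for a raw file.

```python
import pandas as pd

# Expected schema, copied from the df.info() output in this notebook.
EXPECTED_COLUMNS = [
    "campaign_id", "campaign_name", "start_date", "end_date", "channel",
    "region", "impressions", "clicks", "conversions", "spend_usd",
    "revenue_usd", "target_audience", "product_category", "device", "year",
]


def validate_structure(df, expected_columns, expected_rows=None):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    actual = list(df.columns)
    missing = [c for c in expected_columns if c not in actual]
    unexpected = [c for c in actual if c not in expected_columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if unexpected:
        problems.append(f"unexpected columns: {unexpected}")
    if not missing and not unexpected and actual != list(expected_columns):
        problems.append("columns are present but in a different order")
    if expected_rows is not None and len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    return problems


# Empty in-memory frame standing in for a raw file (hypothetical, no real data).
demo = pd.DataFrame({col: [] for col in EXPECTED_COLUMNS})
print(validate_structure(demo, EXPECTED_COLUMNS, expected_rows=0))
print(validate_structure(demo.drop(columns=["year"]), EXPECTED_COLUMNS, expected_rows=0))
```

In the notebook, a call such as `validate_structure(df1, EXPECTED_COLUMNS, expected_rows=500)` after each `pd.read_csv` would cover both bullet points; the value 500 is taken from the row counts reported in the ingestion summary.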

---

### Output:
Raw data snapshot stored for auditability and reprocessing if needed.
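
One way to make the raw snapshot auditable is to record a checksum for each file at ingestion time, so a later run can confirm the raw data has not changed. The sketch below is an assumption about how this could be done, not something this notebook currently does; the helper name `sha256_of_file` and the temporary demo file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large CSVs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; in the project this would point at the CSVs in data/raw/.
with tempfile.TemporaryDirectory() as tmp:
    raw_file = Path(tmp) / "example.csv"
    raw_file.write_bytes(b"campaign_id,year\n2024_0001,2024\n")
    before = sha256_of_file(raw_file)
    after = sha256_of_file(raw_file)
    # Unchanged content yields an identical digest on every run.
    print(before == after)
```

Storing the digests next to the raw files would let a reprocessing run verify its inputs before any cleaning starts.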

---

## Import Libraries:

In [3]:
import pandas as pd

## marketing_campaign_2024.csv

### Load Data:

In [4]:
#LOAD DATA
#LOAD 1ST CSV - 2024 FILE

df1 = pd.read_csv("../data/raw/marketing_campaign_2024.csv")


### DF1 Initial Check:

In [5]:
#INITIAL CHECK 1ST 5 ROWS

df1.head()


Unnamed: 0,campaign_id,campaign_name,start_date,end_date,channel,region,impressions,clicks,conversions,spend_usd,revenue_usd,target_audience,product_category,device,year
0,2024_0001,Campaign_2024_0001,2024-05-16,2024-08-16,Search,South America,28252,5609,65466,39193.43,79017.74,Youth,Electronics,Desktop,2024
1,2024_0002,Campaign_2024_0002,2024-04-06,2024-10-13,Search,Asia,89608,83584,26865,17291.53,49868.54,Adults,Home,Mobile,2024
2,2024_0003,Campaign_2024_0003,2024-05-08,2024-11-27,Social,Europe,37853,62661,43662,6729.63,63021.28,Seniors,Electronics,Desktop,2024
3,2024_0004,Campaign_2024_0004,2024-01-28,2024-08-03,Display,Africa,10577,41421,75023,15077.58,133106.71,Seniors,Clothing,Desktop,2024
4,2024_0005,Campaign_2024_0005,2024-02-06,2024-08-23,Social,Asia,84039,56010,11283,16877.69,144736.99,Adults,Home,Mobile,2024


In [6]:
df1.shape

(500, 15)

In [7]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   campaign_id       500 non-null    object 
 1   campaign_name     500 non-null    object 
 2   start_date        500 non-null    object 
 3   end_date          500 non-null    object 
 4   channel           500 non-null    object 
 5   region            500 non-null    object 
 6   impressions       500 non-null    int64  
 7   clicks            500 non-null    int64  
 8   conversions       500 non-null    int64  
 9   spend_usd         500 non-null    float64
 10  revenue_usd       500 non-null    float64
 11  target_audience   500 non-null    object 
 12  product_category  500 non-null    object 
 13  device            500 non-null    object 
 14  year              500 non-null    int64  
dtypes: float64(2), int64(4), object(9)
memory usage: 58.7+ KB


## marketing_campaign_2025.csv

### Load Data:

In [8]:
#LOAD DATA
#LOAD 2ND CSV - 2025 FILE

df2 = pd.read_csv("../data/raw/marketing_campaign_2025.csv")


### DF2 Initial Check:

In [9]:
#INITIAL CHECK 1ST 5 ROWS OF THE 2025 FILE
#(this cell previously called df1.head() and displayed the 2024 rows; re-run to display the 2025 rows)

df2.head()


In [10]:
df2.shape

(500, 15)

In [11]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   campaign_id       500 non-null    object 
 1   campaign_name     500 non-null    object 
 2   start_date        500 non-null    object 
 3   end_date          500 non-null    object 
 4   channel           500 non-null    object 
 5   region            500 non-null    object 
 6   impressions       500 non-null    int64  
 7   clicks            500 non-null    int64  
 8   conversions       500 non-null    int64  
 9   spend_usd         500 non-null    float64
 10  revenue_usd       500 non-null    float64
 11  target_audience   500 non-null    object 
 12  product_category  500 non-null    object 
 13  device            500 non-null    object 
 14  year              500 non-null    int64  
dtypes: float64(2), int64(4), object(9)
memory usage: 58.7+ KB


## Ingestion Summary:

---

#### Overview:

The data ingestion phase was limited to loading the two raw datasets, one for 2024 and one for 2025, and performing preliminary validation in preparation for cleaning and transformation.

#### Initial Check:
- No missing or null values were detected in either raw CSV file.
- Data types are consistent across both datasets.
- Record counts match the expected totals for both CSV files:
  - 2024: 500 campaigns
  - 2025: 500 campaigns
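
The checks summarized above can also be reproduced programmatically rather than read off the `info()` output. The following is a minimal sketch under stated assumptions: the helper name `summarize_ingestion` is hypothetical, and the two small frames are stand-ins for `df1` and `df2`, not real campaign data.

```python
import pandas as pd


def summarize_ingestion(frames):
    """Collect row counts, null counts, and dtype agreement for named DataFrames."""
    names = list(frames)
    # Map column name -> dtype name for the first frame, used as the reference.
    reference = frames[names[0]].dtypes.astype(str).to_dict()
    return {
        "row_counts": {name: len(df) for name, df in frames.items()},
        "null_counts": {name: int(df.isna().sum().sum()) for name, df in frames.items()},
        # Compares column names and dtypes; column order is ignored.
        "dtypes_consistent": all(
            frames[name].dtypes.astype(str).to_dict() == reference for name in names
        ),
    }


# Stand-in frames with illustrative values only.
demo_2024 = pd.DataFrame({"campaign_id": ["a", "b"], "clicks": [1, 2], "spend_usd": [1.5, 2.5]})
demo_2025 = pd.DataFrame({"campaign_id": ["c", "d"], "clicks": [3, 4], "spend_usd": [3.5, 4.5]})

report = summarize_ingestion({"2024": demo_2024, "2025": demo_2025})
print(report)
```

With the real data, `summarize_ingestion({"2024": df1, "2025": df2})` would return the three findings in one dictionary that can be asserted on or logged.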

#### Prepared for Cleaning Phase:
- Datasets successfully loaded and verified.
- Ready for data type conversions, column standardization, and consolidation in the Data Cleaning step.

#### Conclusion:

The raw marketing campaign datasets for 2024 and 2025 were successfully ingested, validated, and confirmed to be structurally consistent.
The data is ready to move into the next phase, Data Cleaning, where further formatting, type correction, and merging operations will be applied.