# Pipeline output – quick read

This notebook shows the output of the last successful pipeline run:

1. **Validation report** – data quality issues (no rows dropped; all recorded here).
2. **Warehouse** – `dim_campaign`, `fact_ad_performance`, `fact_orders` in `output/warehouse.db`.

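Run all cells in order to refresh the outputs, because later cells reuse `OUTPUT_DIR`, `db_path` and the imports from earlier ones. Requires `pandas` (listed in the project's `requirements.txt`).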

## 1. Validation report

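Every data quality issue found during validation is listed here for review. No rows are dropped, so flagged records (for example the `ORD-10081` revenue outlier) still appear in the warehouse tables in section 2.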
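
The pipeline's validation code is not part of this notebook. As a rough illustration of the "flag, don't drop" approach, here is a minimal sketch of checks that would produce CRM entries shaped like those in the report. The `validate_orders` function, the `OUTLIER_THRESHOLD` constant and the toy input are all hypothetical, and the real pipeline may implement these rules differently.

```python
import pandas as pd

# Assumption: mirrors the "Revenue > 1e6 (absolute)" rule seen in the report.
OUTLIER_THRESHOLD = 1e6
REPORT_COLUMNS = ["source", "row_id", "column", "issue_type", "message", "value"]


def validate_orders(orders: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: record CRM order issues; never drops or edits rows."""
    issues = []

    def flag(row_id, column, issue_type, message, value=None):
        issues.append(dict(zip(REPORT_COLUMNS, ["crm", row_id, column, issue_type, message, value])))

    # Coerce once up front; non-numeric strings become NaN.
    revenue = pd.to_numeric(orders["revenue"], errors="coerce")
    for idx, row in orders.iterrows():
        oid = row["order_id"]
        cust = row["customer_id"]
        if pd.isna(cust) or str(cust).strip() == "":
            flag(oid, "customer_id", "missing", "Null or empty customer_id")

        raw, rev = row["revenue"], revenue.at[idx]
        if pd.isna(raw):
            flag(oid, "revenue", "missing", "Null revenue")
        elif pd.isna(rev):
            flag(oid, "revenue", "invalid", "Non-numeric revenue")
        elif rev < 0:
            flag(oid, "revenue", "invalid", "Negative revenue", rev)
        elif abs(rev) > OUTLIER_THRESHOLD:
            flag(oid, "revenue", "outlier", "Revenue > 1e6 (absolute)", rev)

        channel = row["channel_attributed"]
        if isinstance(channel, str) and channel != channel.lower():
            flag(oid, "channel_attributed", "inconsistent_casing",
                 f"Value '{channel}' normalized to '{channel.lower()}'", channel)

    # Every repeat of an order_id after its first occurrence is flagged.
    for oid in orders.loc[orders["order_id"].duplicated(), "order_id"]:
        flag(oid, "order_id", "duplicate", "Duplicate order_id")

    return pd.DataFrame(issues, columns=REPORT_COLUMNS)


# Toy input with made-up values (not pipeline data); revenue kept as raw strings.
orders = pd.DataFrame(
    {
        "order_id": ["ORD-10008", "ORD-10011", "ORD-10021", "ORD-10021", "ORD-10030", "ORD-10076"],
        "customer_id": ["C1", "C2", "C3", "C3", "", "C5"],
        "revenue": ["n/a", "125.5", "30.0", "30.0", "40", "-50"],
        "channel_attributed": ["google", "Google", "facebook", "facebook", "google", "google"],
    }
)
report = validate_orders(orders)
print(report)
```

The checks only append to `issues`, so `orders` still has all six rows afterwards. This matches the report's "no rows dropped" rule.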

In [1]:
```python
import pandas as pd
from pathlib import Path
from IPython.display import display  # built into Jupyter; imported explicitly so the cell also runs elsewhere

OUTPUT_DIR = Path("output")
report_path = OUTPUT_DIR / "validation_report.csv"

if report_path.exists():
    df_report = pd.read_csv(report_path)
    print(f"Total issues: {len(df_report)}")
    display(df_report)
else:
    print("No validation_report.csv found. Run the pipeline first.")
```

Total issues: 15

|   | source | row_id | column | issue_type | message | value |
|---|---|---|---|---|---|---|
| 0 | facebook | fb_camp_001\|2024-01-13 | purchases | missing | Null purchases | |
| 1 | facebook | fb_camp_003\|2024-01-06 | purchases | missing | Null purchases | |
| 2 | crm | ORD-10011 | channel_attributed | inconsistent_casing | Value 'Google' normalized to 'google' | Google |
| 3 | crm | ORD-10012 | channel_attributed | inconsistent_casing | Value 'FACEBOOK' normalized to 'facebook' | FACEBOOK |
| 4 | crm | ORD-10008 | revenue | invalid | Non-numeric revenue | |
| 5 | crm | ORD-10030 | customer_id | missing | Null or empty customer_id | |
| 6 | crm | ORD-10076 | revenue | invalid | Negative revenue | -50.0 |
| 7 | crm | ORD-10081 | campaign_source | missing | Null or empty campaign_source | |
| 8 | crm | ORD-10081 | revenue | outlier | Revenue > 1e6 (absolute) | 9999999.99 |
| 9 | crm | ORD-10021 | order_id | duplicate | Duplicate order_id | |
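
To see which sources and checks account for most issues, you can cross-tabulate the report by `source` and `issue_type` with `pd.crosstab`. The sketch below uses a stand-in frame with only the ten rows shown above. In the notebook you would pass `df_report` from the cell above instead, which holds all 15 issues.

```python
import pandas as pd

# Stand-in for df_report: source and issue_type of the ten rows shown above.
df_report = pd.DataFrame(
    {
        "source": ["facebook", "facebook", "crm", "crm", "crm", "crm", "crm", "crm", "crm", "crm"],
        "issue_type": ["missing", "missing", "inconsistent_casing", "inconsistent_casing",
                       "invalid", "missing", "invalid", "missing", "outlier", "duplicate"],
    }
)

# Rows: source system; columns: issue type; cells: issue counts (0 where none).
# margins=True adds a "total" row and column.
breakdown = pd.crosstab(
    df_report["source"], df_report["issue_type"], margins=True, margins_name="total"
)
print(breakdown)
```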


## 2. Warehouse tables

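The pipeline loads its data into SQLite at `output/warehouse.db`. The database holds one campaign dimension (`dim_campaign`) and two fact tables (`fact_ad_performance`, `fact_orders`). The cell below reads every row of each table. However, pandas truncates long tables when displaying them, so only the first and last rows of the fact tables appear in the output. Section 3 gives the full row counts.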
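
`fact_ad_performance` references `dim_campaign` through `campaign_id`, so per-campaign metrics come down to a join plus `GROUP BY`. The sketch below runs against an in-memory SQLite database. It is seeded with the nine `fact_ad_performance` rows visible in the output below and a subset of their columns, rather than the real `warehouse.db`. CTR (clicks / impressions) and ROAS (conversion value / spend) use their conventional definitions, which are assumed here and not taken from the pipeline.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Rows copied from the notebook output (subset of columns), not the full warehouse.
DIM_ROWS = [
    ("goog_camp_001", "google_ads", "Brand Awareness - Q1"),
    ("fb_camp_003", "facebook", "Video Views - Brand Story"),
]
FACT_ROWS = [  # (campaign_id, date, impressions, clicks, spend, conversion_value)
    ("goog_camp_001", "2024-01-01", 45230, 892, 125.34, 2875.50),
    ("goog_camp_001", "2024-01-02", 48120, 956, 134.56, 3520.00),
    ("goog_camp_001", "2024-01-03", 42890, 834, 118.23, 2280.00),
    ("goog_camp_001", "2024-01-04", 51230, 1023, 145.67, 3875.25),
    ("goog_camp_001", "2024-01-05", 38920, 756, 108.45, 1955.00),
    ("fb_camp_003", "2024-01-11", 135670, 978, 73.34, 1150.00),
    ("fb_camp_003", "2024-01-12", 129340, 934, 70.12, 1035.00),
    ("fb_camp_003", "2024-01-13", 79230, 567, 42.67, 230.00),
    ("fb_camp_003", "2024-01-14", 84560, 612, 46.12, 345.00),
]

QUERY = """
SELECT d.campaign_id,
       d.campaign_name,
       SUM(f.impressions)                       AS impressions,
       SUM(f.clicks)                            AS clicks,
       SUM(f.spend)                             AS spend,
       1.0 * SUM(f.clicks) / SUM(f.impressions) AS ctr,
       SUM(f.conversion_value) / SUM(f.spend)   AS roas
FROM fact_ad_performance AS f
JOIN dim_campaign AS d ON d.campaign_id = f.campaign_id
GROUP BY d.campaign_id, d.campaign_name
ORDER BY roas DESC
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE dim_campaign (campaign_id TEXT PRIMARY KEY, platform TEXT, campaign_name TEXT)")
    conn.execute(
        "CREATE TABLE fact_ad_performance (campaign_id TEXT, date TEXT, impressions INTEGER,"
        " clicks INTEGER, spend REAL, conversion_value REAL)"
    )
    conn.executemany("INSERT INTO dim_campaign VALUES (?, ?, ?)", DIM_ROWS)
    conn.executemany("INSERT INTO fact_ad_performance VALUES (?, ?, ?, ?, ?, ?)", FACT_ROWS)
    per_campaign = pd.read_sql(QUERY, conn)

print(per_campaign.round(4))
```

Against the real warehouse, the same `QUERY` can be passed to `pd.read_sql` with a connection to `output/warehouse.db`.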

In [4]:
```python
import sqlite3
from contextlib import closing

db_path = OUTPUT_DIR / "warehouse.db"
if not db_path.exists():
    print("No warehouse.db found. Run the pipeline first.")
else:
    # closing() guarantees the connection is closed even if a query raises
    with closing(sqlite3.connect(db_path)) as conn:
        print("=== dim_campaign (all rows) ===")
        dim = pd.read_sql("SELECT * FROM dim_campaign", conn)
        display(dim)

        print("=== fact_ad_performance (all rows) ===")
        fact_ad = pd.read_sql("SELECT * FROM fact_ad_performance", conn)
        display(fact_ad)

        print("=== fact_orders (all rows) ===")
        fact_ord = pd.read_sql("SELECT * FROM fact_orders", conn)
        display(fact_ord)
```

=== dim_campaign (all rows) ===

|   | campaign_id | platform | campaign_name | campaign_type | status |
|---|---|---|---|---|---|
| 0 | goog_camp_001 | google_ads | Brand Awareness - Q1 | DISPLAY | ENABLED |
| 1 | goog_camp_002 | google_ads | Product Launch - Winter Sale | SEARCH | ENABLED |
| 2 | goog_camp_003 | google_ads | Retargeting - Cart Abandoners | DISPLAY | ENABLED |
| 3 | fb_camp_001 | facebook | Spring Collection Awareness | | |
| 4 | fb_camp_002 | facebook | Conversion Campaign - Promo | | |
| 5 | fb_camp_003 | facebook | Video Views - Brand Story | | |


=== fact_ad_performance (all rows) ===

|   | id | date | campaign_id | platform | impressions | clicks | spend | conversions | conversion_value | reach | frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2024-01-01 00:00:00 | goog_camp_001 | google_ads | 45230 | 892 | 125.34 | 23 | 2875.50 | | |
| 1 | 2 | 2024-01-02 00:00:00 | goog_camp_001 | google_ads | 48120 | 956 | 134.56 | 28 | 3520.00 | | |
| 2 | 3 | 2024-01-03 00:00:00 | goog_camp_001 | google_ads | 42890 | 834 | 118.23 | 19 | 2280.00 | | |
| 3 | 4 | 2024-01-04 00:00:00 | goog_camp_001 | google_ads | 51230 | 1023 | 145.67 | 31 | 3875.25 | | |
| 4 | 5 | 2024-01-05 00:00:00 | goog_camp_001 | google_ads | 38920 | 756 | 108.45 | 17 | 1955.00 | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 85 | 86 | 2024-01-11 00:00:00 | fb_camp_003 | facebook | 135670 | 978 | 73.34 | 10 | 1150.00 | 108500.0 | 1.25 |
| 86 | 87 | 2024-01-12 00:00:00 | fb_camp_003 | facebook | 129340 | 934 | 70.12 | 9 | 1035.00 | 103500.0 | 1.25 |
| 87 | 88 | 2024-01-13 00:00:00 | fb_camp_003 | facebook | 79230 | 567 | 42.67 | 2 | 230.00 | 63400.0 | 1.25 |
| 88 | 89 | 2024-01-14 00:00:00 | fb_camp_003 | facebook | 84560 | 612 | 46.12 | 3 | 345.00 | 67600.0 | 1.25 |


=== fact_orders (all rows) ===

|   | order_id | customer_id | order_date | revenue | channel_attributed | campaign_source | product_category | region |
|---|---|---|---|---|---|---|---|---|
| 0 | ORD-10001 | CUST-5234 | 2024-01-01 00:00:00 | 125.50 | google | goog_camp_002 | Electronics | North America |
| 1 | ORD-10002 | CUST-8921 | 2024-01-01 00:00:00 | 89.99 | facebook | fb_camp_002 | Apparel | Europe |
| 2 | ORD-10003 | CUST-3456 | 2024-01-01 00:00:00 | 234.00 | google | goog_camp_001 | Home & Garden | North America |
| 3 | ORD-10004 | CUST-7823 | 2024-01-01 00:00:00 | 67.50 | facebook | fb_camp_001 | Apparel | North America |
| 4 | ORD-10005 | CUST-2341 | 2024-01-01 00:00:00 | 445.00 | google | goog_camp_002 | Electronics | Europe |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 77 | ORD-10078 | CUST-9876 | 2024-01-15 00:00:00 | 156.25 | facebook | fb_camp_002 | Electronics | North America |
| 78 | ORD-10079 | CUST-3210 | 2024-01-15 00:00:00 | 534.00 | google | goog_camp_001 | Apparel | Asia Pacific |
| 79 | ORD-10080 | CUST-7654 | 2024-01-15 00:00:00 | 67.75 | facebook | fb_camp_003 | Home & Garden | Europe |
| 80 | ORD-10081 | CUST-2109 | 2024-01-15 00:00:00 | 9999999.99 | google | | Electronics | North America |
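
Validation drops no rows, so flagged records such as `ORD-10081` (revenue 9999999.99) reach `fact_orders` unchanged and would dominate any revenue total. One way to exclude them at analysis time is an anti-join against the validation report. This sketch makes two assumptions suggested by the outputs above: CRM `row_id` values are `order_id`s, and `campaign_source` holds a `campaign_id`. It excludes only rows with a `revenue` issue, which is a judgment call. Both frames are small stand-ins built from rows shown in this notebook.

```python
import pandas as pd

# Stand-in for fact_orders: rows copied from the output above (subset of columns).
orders = pd.DataFrame(
    {
        "order_id": ["ORD-10001", "ORD-10002", "ORD-10003", "ORD-10004", "ORD-10005",
                     "ORD-10078", "ORD-10079", "ORD-10080", "ORD-10081"],
        "campaign_source": ["goog_camp_002", "fb_camp_002", "goog_camp_001", "fb_camp_001",
                            "goog_camp_002", "fb_camp_002", "goog_camp_001", "fb_camp_003", None],
        "revenue": [125.50, 89.99, 234.00, 67.50, 445.00, 156.25, 534.00, 67.75, 9999999.99],
    }
)

# Stand-in for the CRM rows of the validation report in section 1.
report = pd.DataFrame(
    {
        "source": ["crm"] * 8,
        "row_id": ["ORD-10011", "ORD-10012", "ORD-10008", "ORD-10030",
                   "ORD-10076", "ORD-10081", "ORD-10081", "ORD-10021"],
        "column": ["channel_attributed", "channel_attributed", "revenue", "customer_id",
                   "revenue", "campaign_source", "revenue", "order_id"],
    }
)

# Assumption: for source == "crm", row_id is the order_id.
revenue_flagged = report.loc[
    (report["source"] == "crm") & (report["column"] == "revenue"), "row_id"
]
# Anti-join: keep only orders whose order_id was not flagged for revenue.
clean = orders[~orders["order_id"].isin(revenue_flagged)]

print(f"Total revenue (all rows): {orders['revenue'].sum():,.2f}")
print(f"Total revenue (revenue-flagged rows excluded): {clean['revenue'].sum():,.2f}")

by_campaign = clean.groupby("campaign_source")["revenue"].sum().sort_values(ascending=False)
print(by_campaign)
```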


## 3. Row counts

Quick summary of how many rows are in each warehouse table.

In [3]:
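```python
import sqlite3
from contextlib import closing

# Reuses db_path from the previous cell.
if db_path.exists():
    with closing(sqlite3.connect(db_path)) as conn:
        for table in ["dim_campaign", "fact_ad_performance", "fact_orders"]:
            n = pd.read_sql(f"SELECT COUNT(*) AS n FROM {table}", conn).iloc[0, 0]
            print(f"{table}: {n} rows")
else:
    print("No warehouse.db found. Run the pipeline first.")
```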

```
dim_campaign: 6 rows
fact_ad_performance: 90 rows
fact_orders: 82 rows
```
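
The row-count cell builds its SQL with an f-string. That is safe here because the table names come from a fixed list. SQLite cannot bind identifiers as `?` parameters, though, so if the names ever came from elsewhere you would need to validate them first. One option is a variant like this sketch: the `table_row_counts` helper is hypothetical, checks each name against `sqlite_master`, and then quotes it into the query over a single connection.

```python
import sqlite3
from contextlib import closing


def table_row_counts(conn: sqlite3.Connection, tables: list[str]) -> dict[str, int]:
    """Hypothetical helper: return {table: row count}, rejecting unknown table names."""
    existing = {
        name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    counts = {}
    for table in tables:
        if table not in existing:
            raise ValueError(f"unknown table: {table!r}")
        # Identifiers can't be bound as parameters; quote the validated name,
        # doubling any embedded double quotes as SQL requires.
        quoted = '"' + table.replace('"', '""') + '"'
        (n,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
        counts[table] = n
    return counts


# Demo on a throwaway in-memory database; in the notebook you would pass
# a connection to output/warehouse.db instead.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE dim_campaign (campaign_id TEXT)")
    conn.executemany("INSERT INTO dim_campaign VALUES (?)", [("goog_camp_001",), ("fb_camp_001",)])
    counts = table_row_counts(conn, ["dim_campaign"])

print(counts)
```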
