# 01 – Dataset Exploration

**Purpose:**  
This notebook provides a lightweight exploration of a sample CSV dataset
to understand schema, column types, and basic statistics.

No core analysis logic is implemented here. All computation is delegated
to the main application and executor modules.



In [2]:
import pandas as pd

# Load a sample CSV (small, <100MB)
df = pd.read_csv("../data/BOOK1.csv")

df.head()


| | ORDERNUMBER | QUANTITYORDERED | PRICEEACH | ORDERLINENUMBER | SALES | ORDERDATE | STATUS | QTR_ID | MONTH_ID | YEAR_ID | ... | ADDRESSLINE1 | ADDRESSLINE2 | CITY | STATE | POSTALCODE | COUNTRY | TERRITORY | CONTACTLASTNAME | CONTACTFIRSTNAME | DEALSIZE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10107 | 30 | 95.7 | 2 | 2871.0 | 2/24/2003 0:00 | Shipped | 1 | 2 | 2003 | ... | 897 Long Airport Avenue | | NYC | NY | 10022.0 | USA | | Yu | Kwai | Small |
| 1 | 10121 | 34 | 81.35 | 5 | 2765.9 | 05-07-2003 00:00 | Shipped | 2 | 5 | 2003 | ... | 59 rue de l'Abbaye | | Reims | | 51100.0 | France | EMEA | Henriot | Paul | Small |
| 2 | 10134 | 41 | 94.74 | 2 | 3884.34 | 07-01-2003 00:00 | Shipped | 3 | 7 | 2003 | ... | 27 rue du Colonel Pierre Avia | | Paris | | 75508.0 | France | EMEA | Da Cunha | Daniel | Medium |
| 3 | 10145 | 45 | 83.26 | 6 | 3746.7 | 8/25/2003 0:00 | Shipped | 3 | 8 | 2003 | ... | 78934 Hillside Dr. | | Pasadena | CA | 90003.0 | USA | | Young | Julie | Medium |
| 4 | 10159 | 49 | 100.0 | 14 | 5205.27 | 10-10-2003 00:00 | Shipped | 4 | 10 | 2003 | ... | 7734 Strong St. | | San Francisco | CA | | USA | | Brown | Julie | Medium |
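In the `head()` output, `ORDERDATE` uses two layouts: `2/24/2003 0:00` and `05-07-2003 00:00`. `read_csv` leaves such a column as plain text unless it is told to parse dates. The sketch below converts it. It assumes pandas 2.0 or later, which provides `format="mixed"`, and it assumes the dates are month-first. That month-first reading is checked against `MONTH_ID` for the five rows shown. The `sample` frame is a stand-in copied from that output and is not part of the original notebook.

```python
import pandas as pd

# Stand-in: ORDERDATE strings copied from the df.head() output, with the matching MONTH_ID values.
sample = pd.DataFrame(
    {
        "ORDERDATE": [
            "2/24/2003 0:00",
            "05-07-2003 00:00",
            "07-01-2003 00:00",
            "8/25/2003 0:00",
            "10-10-2003 00:00",
        ],
        "MONTH_ID": [2, 5, 7, 8, 10],
    }
)

# format="mixed" (pandas >= 2.0) infers the layout per element.
# Assumption: dates are month-first, so dayfirst=False reads "05-07-2003" as 7 May 2003.
parsed = pd.to_datetime(sample["ORDERDATE"], format="mixed", dayfirst=False)

# Sanity check: the parsed month should agree with the dataset's own MONTH_ID column.
mismatches = int((parsed.dt.month != sample["MONTH_ID"]).sum())
print(parsed.dt.date.tolist(), "mismatches:", mismatches)
```

On the full data, apply the same call to `df["ORDERDATE"]`. A non-zero mismatch count would mean the month-first assumption does not hold for some rows.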


In [3]:
print("Shape:", df.shape)
print("\nColumns:")
for col in df.columns:
    print("-", col)


Shape: (2823, 25)

Columns:
- ORDERNUMBER
- QUANTITYORDERED
- PRICEEACH
- ORDERLINENUMBER
- SALES
- ORDERDATE
- STATUS
- QTR_ID
- MONTH_ID
- YEAR_ID
- PRODUCTLINE
- MSRP
- PRODUCTCODE
- CUSTOMERNAME
- PHONE
- ADDRESSLINE1
- ADDRESSLINE2
- CITY
- STATE
- POSTALCODE
- COUNTRY
- TERRITORY
- CONTACTLASTNAME
- CONTACTFIRSTNAME
- DEALSIZE
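The cell above lists only column names, but the stated purpose also covers column types. `schema_summary` is a hypothetical helper, not part of the application or executor modules. It shows each column's dtype, non-null count and number of distinct values side by side. The example runs it on a few values copied from the `head()` output, so it works without `../data/BOOK1.csv`. The built-in `df.info()` gives a similar dtype and non-null view.

```python
import pandas as pd


def schema_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, non-null count and number of distinct values."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "non_null": frame.notna().sum(),
            "n_unique": frame.nunique(),  # NaN is not counted as a distinct value
        }
    )


# Stand-in built from three of the df.head() rows (a subset of columns),
# so the sketch runs without ../data/BOOK1.csv.
sample = pd.DataFrame(
    {
        "ORDERNUMBER": [10107, 10121, 10134],
        "PRICEEACH": [95.7, 81.35, 94.74],
        "ORDERDATE": ["2/24/2003 0:00", "05-07-2003 00:00", "07-01-2003 00:00"],
        "STATE": ["NY", None, None],
    }
)

print(schema_summary(sample))
```

On the real data, `schema_summary(df)` returns one row for each of the 25 columns.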


In [4]:
df.describe(include="all")

| | ORDERNUMBER | QUANTITYORDERED | PRICEEACH | ORDERLINENUMBER | SALES | ORDERDATE | STATUS | QTR_ID | MONTH_ID | YEAR_ID | ... | ADDRESSLINE1 | ADDRESSLINE2 | CITY | STATE | POSTALCODE | COUNTRY | TERRITORY | CONTACTLASTNAME | CONTACTFIRSTNAME | DEALSIZE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2823.0 | 2823.0 | 2823.0 | 2823.0 | 2823.0 | 2823 | 2823 | 2823.0 | 2823.0 | 2823.0 | ... | 2823 | 302 | 2823 | 1337 | 2747.0 | 2823 | 1749 | 2823 | 2823 | 2823 |
| unique | | | | | | 252 | 6 | | | | ... | 92 | 9 | 73 | 16 | 73.0 | 19 | 3 | 77 | 72 | 3 |
| top | | | | | | 11/14/2003 0:00 | Shipped | | | | ... | C/ Moralzarzal, 86 | Level 3 | Madrid | CA | 28034.0 | USA | EMEA | Freyre | Diego | Medium |
| freq | | | | | | 38 | 2617 | | | | ... | 259 | 55 | 304 | 416 | 259.0 | 1004 | 1407 | 259 | 259 | 1384 |
| mean | 10258.725115 | 35.092809 | 83.658544 | 6.466171 | 3553.889072 | | | 2.717676 | 7.092455 | 2003.81509 | ... | | | | | | | | | | |
| std | 92.085478 | 9.741443 | 20.174277 | 4.225841 | 1841.865106 | | | 1.203878 | 3.656633 | 0.69967 | ... | | | | | | | | | | |
| min | 10100.0 | 6.0 | 26.88 | 1.0 | 482.13 | | | 1.0 | 1.0 | 2003.0 | ... | | | | | | | | | | |
| 25% | 10180.0 | 27.0 | 68.86 | 3.0 | 2203.43 | | | 2.0 | 4.0 | 2003.0 | ... | | | | | | | | | | |
| 50% | 10262.0 | 35.0 | 95.7 | 6.0 | 3184.8 | | | 3.0 | 8.0 | 2004.0 | ... | | | | | | | | | | |
| 75% | 10333.5 | 43.0 | 100.0 | 9.0 | 4508.0 | | | 4.0 | 11.0 | 2004.0 | ... | | | | | | | | | | |
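The `count` row shows that some columns are only partly populated. For example, ADDRESSLINE2 has 302 non-null values out of 2823. The sketch below lists those gaps explicitly. `missing_report` is a hypothetical helper. The stand-in frame uses the ADDRESSLINE2, CITY, STATE, POSTALCODE and TERRITORY values from the five `head()` rows, with blank cells entered as missing.

```python
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Missing-value count and percentage per column, keeping only columns with gaps."""
    missing = frame.isna().sum()
    report = pd.DataFrame(
        {
            "missing": missing,
            "pct": (missing / len(frame) * 100).round(1),
        }
    )
    # Drop fully populated columns and show the worst offenders first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Stand-in copied from the five df.head() rows; blank cells become None / NaN.
sample = pd.DataFrame(
    {
        "ADDRESSLINE2": [None, None, None, None, None],
        "CITY": ["NYC", "Reims", "Paris", "Pasadena", "San Francisco"],
        "STATE": ["NY", None, None, "CA", "CA"],
        "POSTALCODE": [10022.0, 51100.0, 75508.0, 90003.0, None],
        "TERRITORY": [None, "EMEA", "EMEA", None, None],
    }
)

print(missing_report(sample))
```

When applied to `df`, ADDRESSLINE2, STATE, TERRITORY and POSTALCODE should all appear, because their counts are below 2823. The columns hidden behind `...` in the `describe()` output are not visible there, so they may add further rows.

One caution applies to TERRITORY. In the `head()` rows, the blank TERRITORY cells are exactly the USA rows. If the raw file spells North America as the literal string `NA`, `read_csv` treats it as missing by default, because `NA` is in pandas' default `na_values`. Checking the raw file would tell these cases apart. If needed, pass `keep_default_na=False` together with an explicit `na_values` list.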


**Observations:**
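- Dataset contains both numerical columns (e.g. QUANTITYORDERED, PRICEEACH, SALES) and categorical columns (e.g. STATUS, COUNTRY, DEALSIZE); ORDERDATE is stored as text rather than as a date
- Several columns have missing values: ADDRESSLINE2 (302 of 2823 non-null), STATE (1337), TERRITORY (1749) and POSTALCODE (2747)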
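- Suitable for aggregation, comparison, and correlation queries (e.g. SALES by COUNTRY, PRODUCTLINE, or YEAR_ID; relationships between QUANTITYORDERED, PRICEEACH, and SALES)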
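- Customer contact fields (CUSTOMERNAME, PHONE, ADDRESSLINE1/2, CONTACTLASTNAME, CONTACTFIRSTNAME) are present; as this is a sample dataset, no sensitive or private data is expected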
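The observations above can be turned into concrete queries. The sketch below assumes pandas 2.x, and `split_columns` is a hypothetical helper. The stand-in frame copies six columns from the five `head()` rows, so its output describes only those rows, not the dataset.

```python
import pandas as pd

# Stand-in: six columns copied from the five df.head() rows. On the real data, use `df`.
sample = pd.DataFrame(
    {
        "QUANTITYORDERED": [30, 34, 41, 45, 49],
        "PRICEEACH": [95.7, 81.35, 94.74, 83.26, 100.0],
        "SALES": [2871.0, 2765.9, 3884.34, 3746.7, 5205.27],
        "STATUS": ["Shipped"] * 5,
        "COUNTRY": ["USA", "France", "France", "USA", "USA"],
        "DEALSIZE": ["Small", "Small", "Medium", "Medium", "Medium"],
    }
)


def split_columns(frame: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Separate numeric-dtype columns from everything else (treated here as categorical)."""
    numeric = frame.select_dtypes(include="number").columns.tolist()
    categorical = frame.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical


numeric_cols, categorical_cols = split_columns(sample)

# Aggregation / comparison: number of rows, total and average SALES per COUNTRY.
sales_by_country = sample.groupby("COUNTRY")["SALES"].agg(["count", "sum", "mean"])

# Correlation: restrict to numeric columns (pandas >= 2.0 raises on text columns otherwise).
correlations = sample[numeric_cols].corr()

print(numeric_cols, categorical_cols)
print(sales_by_country)
print(correlations.round(2))
```

`select_dtypes` classifies columns by dtype alone. On the full `df`, identifiers and codes such as ORDERNUMBER, QTR_ID, MONTH_ID and YEAR_ID therefore land in the numeric list, since `describe()` computes means for them. ORDERDATE lands in the categorical list because it is stored as text. Both groups are worth reclassifying by hand before aggregating or correlating.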
