# 📊 EDA: PepsiCo Stock and Product Data

This notebook explores two datasets:
- `PEP_stock_data.csv`: Historical PepsiCo stock prices
- `pepsico_products.csv`: Metadata on PepsiCo’s product portfolio

We will:
- Load the datasets
- Examine structure, datatypes, and missing values
- Visualize basic distributions and relationships
- Prepare insights for cleaning and modeling

In [None]:
# Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os

# Configure settings
pd.set_option('display.max_columns', 100)
sns.set_theme(style='whitegrid')  # sns.set is an alias of set_theme

In [None]:
# Load datasets
stock_path = '../data/raw/PEP_stock_data.csv'
product_path = '../data/raw/pepsico_products.csv'

df_stock = pd.read_csv(stock_path)
df_products = pd.read_csv(product_path)

# Preview
df_stock.head()
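
Parsing the date column at load time can save a later conversion step. The sketch below is only an illustration: it reads a tiny in-memory CSV with made-up rows instead of the real file, and the `Open` and `Volume` column names are assumptions (only `Date` and `Close` are used elsewhere in this notebook).

```python
import io
import pandas as pd

# Made-up rows standing in for PEP_stock_data.csv; column names other than
# Date and Close are assumptions about the real file.
csv_text = """Date,Open,Close,Volume
2024-01-02,100.0,101.5,1000
2024-01-03,101.5,99.0,1200
"""

# parse_dates converts the listed columns to datetime64 while reading
toy_stock = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(toy_stock.dtypes)
```

If the real file has the same layout, passing `parse_dates=['Date']` to the `pd.read_csv(stock_path)` call should have the same effect as converting the column afterwards.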

In [None]:
# Inspect stock data
df_stock.info()
df_stock.describe()

In [None]:
# Count missing values per column
df_stock.isnull().sum()
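
Null counts alone do not show value ranges. One possible way to see both at once is a small per-column summary, sketched below. `summarize_columns` is a hypothetical helper, not part of pandas, and the toy frame uses made-up values rather than the real stock data.

```python
import numpy as np
import pandas as pd

def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: dtype, null count, null share, and numeric min/max."""
    numeric = df.select_dtypes(include="number")
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_null": df.isnull().sum(),
        "pct_null": df.isnull().mean().mul(100).round(2),
    })
    # min/max align on column name; non-numeric columns are left as NaN
    summary["min"] = numeric.min()
    summary["max"] = numeric.max()
    return summary

# Toy frame with made-up values
toy = pd.DataFrame({
    "Close": [10.0, np.nan, 12.5, 11.0],
    "Volume": [100, 200, 150, 300],
    "Ticker": ["PEP", "PEP", None, "PEP"],
})
summary = summarize_columns(toy)
print(summary)
```

Calling `summarize_columns(df_stock)` would give the same table for the real data, assuming the numeric columns were read with numeric dtypes.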

In [None]:
# Time range of stock dataset
df_stock['Date'] = pd.to_datetime(df_stock['Date'])
print(f"Earliest date: {df_stock['Date'].min()} | Latest date: {df_stock['Date'].max()}")
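
Beyond the earliest and latest dates, it can help to check for unparseable entries, duplicate dates, ordering, and long gaps. The sketch below shows one way to do that. `date_diagnostics` is a hypothetical helper, and the sample dates are made up for illustration.

```python
import pandas as pd

def date_diagnostics(dates: pd.Series) -> dict:
    """Hypothetical helper: basic sanity checks on a date column."""
    dates = pd.to_datetime(dates, errors="coerce")  # bad values become NaT
    valid = dates.dropna()
    gaps = valid.sort_values().diff()  # time between consecutive dates
    return {
        "n_unparseable": int(dates.isna().sum()),
        "n_duplicates": int(valid.duplicated().sum()),
        "is_sorted": bool(valid.is_monotonic_increasing),
        "largest_gap": gaps.max(),
    }

# Made-up dates: one duplicate, one unparseable entry, one multi-day gap
toy_dates = pd.Series(
    ["2024-01-02", "2024-01-03", "2024-01-03", "not a date", "2024-01-08"]
)
diag = date_diagnostics(toy_dates)
print(diag)
```

For daily stock data, gaps of a few days are expected around weekends and market holidays, so only much longer gaps would suggest missing rows.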

In [None]:
# Line plot of closing price
plt.figure(figsize=(12, 4))
sns.lineplot(data=df_stock, x='Date', y='Close')
plt.title('PepsiCo (PEP) Stock Closing Price Over Time')
plt.ylabel('Close Price')
plt.xlabel('Date')
plt.tight_layout()
plt.show()
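
A raw price line can hide day-to-day behavior. As a possible next step, the sketch below derives daily percentage returns and a short moving average from the closing price. The prices are made up, and the 2-day window is chosen only to keep the example small.

```python
import pandas as pd

# Made-up closing prices for illustration
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
    ),
    "Close": [100.0, 110.0, 99.0, 99.0],
}).sort_values("Date")

# Fractional change from the previous row; the first row has no predecessor
toy["daily_return"] = toy["Close"].pct_change()

# Mean of the current and previous close; NaN until the window is full
toy["close_ma2"] = toy["Close"].rolling(window=2).mean()

print(toy)
```

On the real data, a longer window (for example 30 or 90 rows) would likely be more informative, and a histogram of `daily_return` would show the distribution of returns.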

In [None]:
# Product dataset overview
df_products.head()

In [None]:
# Unique categories, regions served, and ownership types
print("Categories:", df_products['Category'].unique())
print("Regions:", df_products['Region Served'].unique())
print("Ownership:", df_products['Ownership'].unique())
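
`unique()` lists the distinct values but not how often each occurs. A frequency count per column, sketched below, may be a useful complement. The column names match the ones used above, while the category and ownership values are made up for illustration.

```python
import pandas as pd

# Made-up product rows; the real values in pepsico_products.csv may differ
toy_products = pd.DataFrame({
    "Category": ["Beverage", "Snack", "Beverage", None],
    "Ownership": ["Owned", "Owned", "Licensed", "Owned"],
})

for col in ["Category", "Ownership"]:
    # dropna=False keeps missing values as their own row in the counts
    counts = toy_products[col].value_counts(dropna=False)
    print(f"{col}: {toy_products[col].nunique()} distinct non-null values")
    print(counts, end="\n\n")

category_counts = toy_products["Category"].value_counts(dropna=False)
```

The same loop could run over `['Category', 'Region Served', 'Ownership']` on `df_products`.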