# Introduction to pandas for Data Analysis

This notebook introduces pandas, Python's powerful data manipulation library, by comparing it with familiar Excel operations.

## Learning Objectives
1. Understand pandas DataFrame basics
2. Translate common Excel operations into pandas
3. Perform basic data analysis with real sales data

In [None]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set plot style
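# The 'seaborn' style name was deprecated in matplotlib 3.6 (renamed 'seaborn-v0_8'),
# so apply seaborn's own default theme instead
sns.set_theme()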
%matplotlib inline

## 1. Loading Data (Like Opening an Excel File)

In Excel, you open a file directly. In pandas, you load it into a DataFrame with a reader function such as `pd.read_csv()` or `pd.read_excel()`.

In [None]:
# Load the retail sales data
sales_df = pd.read_csv('../../datasets/sales/retail_sales_data.csv')

# Convert Date column to datetime
sales_df['Date'] = pd.to_datetime(sales_df['Date'])

# Display the first few rows (like scrolling to top in Excel)
print("First 5 rows:")
sales_df.head()
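
As a minimal sketch of the loading step, the snippet below reads a tiny in-memory CSV so it runs without the dataset file. The three sample rows and the names `csv_text` and `sample_df` are hypothetical stand-ins, not the contents of `retail_sales_data.csv`. It also shows that `parse_dates` can convert the date column while reading, as an alternative to calling `pd.to_datetime()` afterwards.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real sales file
csv_text = """Date,Category,Total_Sales
2023-01-05,Electronics,2500.0
2023-01-06,Clothing,120.5
2023-02-01,Electronics,1800.0
"""

# parse_dates converts the listed columns to datetime while reading
sample_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(sample_df.dtypes)
print(sample_df.head())
```

For an Excel workbook, `pd.read_excel()` plays the same role, though it needs an extra engine package such as openpyxl installed.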

## 2. Basic Excel Operations in pandas

Common Excel operations and their pandas equivalents:

In [None]:
# Data Overview (like Excel's sheet overview)
print("Dataset Shape (rows, columns):", sales_df.shape)
print("\nColumn Names:")
print(sales_df.columns.tolist())

# Basic statistics (like Excel's descriptive statistics)
print("\nNumerical Summary:")
sales_df.describe()
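
Beyond `shape` and `describe()`, an overview often includes column types and missing-value counts. The sketch below uses a small hypothetical DataFrame (`demo_df`, with made-up values) rather than the sales data, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each of two columns
demo_df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', None],
    'Total_Sales': [2500.0, np.nan, 300.0],
    'Quantity': [2, 1, 3],
})

# Count of missing cells per column (like scanning for blanks in Excel)
missing_per_column = demo_df.isna().sum()

# Data type of each column
column_types = demo_df.dtypes

# include='all' adds text columns to the summary (count, unique, top, freq)
full_summary = demo_df.describe(include='all')

print(missing_per_column)
print(column_types)
print(full_summary)
```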

## 3. Filtering Data (Like Excel's Filter Feature)

In [None]:
# Three separate single-condition filters (each is like filtering one Excel column)
electronics = sales_df[sales_df['Category'] == 'Electronics']
high_value = sales_df[sales_df['Total_Sales'] > 2000]
promotional_sales = sales_df[sales_df['On_Promotion'] == True]

print("Electronics Products Summary:")
print(electronics['Total_Sales'].describe())

print("\nNumber of High-Value Sales:", len(high_value))
print("Number of Promotional Sales:", len(promotional_sales))
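
The filters above each apply one condition. To combine conditions, as when stacking several column filters in Excel, pandas uses `&` (and) and `|` (or), with each condition wrapped in parentheses. The sketch below uses a small hypothetical DataFrame (`demo_df`) that reuses the column names from the sales data. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical rows reusing the column names from the sales data
demo_df = pd.DataFrame({
    'Category': ['Electronics', 'Electronics', 'Clothing', 'Clothing'],
    'Total_Sales': [2500.0, 900.0, 2100.0, 300.0],
    'On_Promotion': [True, False, True, False],
})

# AND: both conditions must hold; the parentheses are required
electronics_high = demo_df[
    (demo_df['Category'] == 'Electronics') & (demo_df['Total_Sales'] > 2000)
]

# OR: either condition may hold
promo_or_high = demo_df[
    demo_df['On_Promotion'] | (demo_df['Total_Sales'] > 2000)
]

# isin: match any value from a list (like ticking several boxes in a filter)
selected = demo_df[demo_df['Category'].isin(['Electronics', 'Clothing'])]

print(len(electronics_high), len(promo_or_high), len(selected))
```

Python's plain `and` and `or` keywords do not work element-wise on a Series, which is why the operators `&` and `|` are used here.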

## 4. Time-Based Analysis (Like Excel's Date Functions)

In [None]:
# Add time-based columns
sales_df['Month'] = sales_df['Date'].dt.month
sales_df['Day_of_Week'] = sales_df['Date'].dt.day_name()
# dayofweek counts from Monday=0, so 5 and 6 are Saturday and Sunday
sales_df['Is_Weekend'] = sales_df['Date'].dt.dayofweek.isin([5, 6])

# Monthly sales analysis (grouped by month number, so the same month in different years is combined)
monthly_sales = sales_df.groupby('Month')['Total_Sales'].sum()

# Plot monthly sales
plt.figure(figsize=(12, 6))
monthly_sales.plot(kind='bar')
plt.title('Monthly Sales Distribution')
plt.xlabel('Month')
plt.ylabel('Total Sales ($)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
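
Grouping by month number merges, for example, January of one year with January of the next. If the data spans several years, one possible alternative is to group by a year-month period. The sketch below compares the two approaches on a small hypothetical DataFrame (`demo_df`). The dates and amounts are invented for illustration.

```python
import pandas as pd

# Hypothetical sales spanning two calendar years
demo_df = pd.DataFrame({
    'Date': pd.to_datetime(
        ['2022-01-15', '2022-02-10', '2023-01-20', '2023-01-25']
    ),
    'Total_Sales': [100.0, 200.0, 300.0, 400.0],
})

# Month number only: January 2022 and January 2023 fall into one group
by_month_number = demo_df.groupby(demo_df['Date'].dt.month)['Total_Sales'].sum()

# Year-month period: each calendar month stays separate
by_year_month = demo_df.groupby(
    demo_df['Date'].dt.to_period('M')
)['Total_Sales'].sum()

print(by_month_number)
print(by_year_month)
```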

## 5. Grouping and Aggregating (Like Excel's PivotTables)

In [None]:
# Complex pivot table equivalent
pivot_table = sales_df.pivot_table(
    values=['Total_Sales', 'Quantity'],
    index=['Category', 'Region'],
    columns=['On_Promotion'],
    aggfunc={'Total_Sales': 'sum', 'Quantity': 'mean'},
    fill_value=0
)

print("Sales and Quantity by Category, Region, and Promotion Status:")
print(pivot_table)
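
With `columns=['On_Promotion']`, the resulting pivot table has two-level column labels (value name, promotion flag), which can be awkward to work with or export. The sketch below shows one way to flatten those labels, using a small hypothetical DataFrame (`demo_df`) with the same column names. The flattened naming scheme is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical rows reusing the column names from the sales data
demo_df = pd.DataFrame({
    'Category': ['Electronics', 'Electronics', 'Clothing', 'Clothing'],
    'Region': ['North', 'North', 'South', 'South'],
    'On_Promotion': [True, False, True, True],
    'Total_Sales': [1000.0, 500.0, 200.0, 300.0],
    'Quantity': [2, 1, 4, 6],
})

demo_pivot = demo_df.pivot_table(
    values=['Total_Sales', 'Quantity'],
    index=['Category', 'Region'],
    columns=['On_Promotion'],
    aggfunc={'Total_Sales': 'sum', 'Quantity': 'mean'},
    fill_value=0,
)

# Join the two column levels into single names, e.g. 'Total_Sales_promo_True'
flat = demo_pivot.copy()
flat.columns = [f"{name}_promo_{flag}" for name, flag in flat.columns]

# Turn the Category/Region row labels back into ordinary columns
flat = flat.reset_index()

print(flat)
```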

## 6. Advanced Visualizations

In [None]:
# Create a dashboard-style visualization
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))

# 1. Category Distribution
category_sales = sales_df.groupby('Category')['Total_Sales'].sum()
category_sales.plot(kind='pie', autopct='%1.1f%%', ax=ax1)
ax1.set_title('Sales Distribution by Category')

# 2. Regional Performance
sns.boxplot(data=sales_df, x='Region', y='Total_Sales', ax=ax2)
ax2.set_title('Sales Distribution by Region')
ax2.tick_params(axis='x', rotation=45)

# 3. Promotional Impact
sns.barplot(data=sales_df, x='Category', y='Total_Sales', hue='On_Promotion', ax=ax3)
ax3.set_title('Promotional vs Regular Sales')
ax3.tick_params(axis='x', rotation=45)

# 4. Price vs Quantity Relationship
sns.scatterplot(data=sales_df, x='Unit_Price', y='Quantity', hue='Category', alpha=0.5, ax=ax4)
ax4.set_title('Price vs Quantity Relationship')

plt.tight_layout()
plt.show()

## Practice Exercises

1. Time Series Analysis:
   - Calculate daily total sales
   - Find the busiest day of the week
   - Compare weekday vs weekend sales

2. Product Analysis:
   - Calculate profit margins (assuming cost is 40% of the sale price)
   - Find best-selling products by quantity
   - Analyze promotional effectiveness

3. Regional Analysis:
   - Create regional sales rankings
   - Find best-performing category by region
   - Analyze regional seasonal patterns

4. Advanced Dashboard:
   - Create a monthly performance dashboard
   - Add trend lines to visualizations
   - Calculate and plot year-to-date totals