# Introduction to pandas for Data Analysis

This notebook introduces pandas, a Python library for data manipulation and analysis, by comparing its operations with their familiar Excel counterparts.

## Learning Objectives
1. Understand pandas DataFrame basics
2. Translate common Excel operations into pandas
3. Perform basic data analysis

In [None]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Loading Data (Like Opening an Excel File)

In Excel, you open a file directly. In pandas, you load a file into a DataFrame with a reader function such as `pd.read_csv()` or `pd.read_excel()`.
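
As a minimal sketch of what a reader function does, the example below parses a small CSV held in memory instead of a file on disk. The rows and the `Date` column are made up for illustration and are not taken from the sample sales file; only `Region`, `Category`, `Quantity` and `Total_Sales` appear elsewhere in this notebook.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """Date,Region,Category,Quantity,Total_Sales
2024-01-01,North,Electronics,2,2400.00
2024-01-02,South,Furniture,1,350.50
2024-01-03,North,Electronics,1,1200.00
"""

# read_csv accepts a file path or any file-like object;
# parse_dates converts the (assumed) Date column to datetime values
demo_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(demo_df.shape)   # (rows, columns)
print(demo_df.dtypes)  # column types inferred by pandas
```

Passing a path string instead of the `StringIO` object works the same way, which is what the next cell does with the sample sales data.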

In [None]:
# Load the sample sales data
sales_df = pd.read_csv('../../datasets/sales/sample_sales_data.csv')

# Display the first few rows (like scrolling to top in Excel)
print("First 5 rows:")
sales_df.head()

## 2. Basic Excel Operations in pandas

Common Excel operations and their pandas equivalents:

In [None]:
# Data overview (like checking how many rows and columns a sheet has)
print("Dataset Shape (rows, columns):", sales_df.shape)
print("\nColumn Names:")
print(sales_df.columns.tolist())

# Basic statistics (like Excel's descriptive statistics)
print("\nNumerical Summary:")
sales_df.describe()

## 3. Filtering Data (Like Excel's Filter Feature)

In [None]:
# Filter electronics products (like Excel filter)
electronics = sales_df[sales_df['Category'] == 'Electronics']

# Filter high-value sales (>$2000)
high_value = sales_df[sales_df['Total_Sales'] > 2000]

print("Electronics Products Summary:")
print(electronics['Total_Sales'].describe())

print("\nNumber of High-Value Sales:", len(high_value))
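
The two filters above can also be combined, much like applying filters to several columns at once in Excel. The sketch below uses a small made-up DataFrame with the same column names, so the rows and values are assumptions rather than the real sales data.

```python
import pandas as pd

# Hypothetical rows using the same column names as the sales data
demo_df = pd.DataFrame({
    "Category": ["Electronics", "Furniture", "Electronics", "Clothing"],
    "Total_Sales": [2400.0, 350.5, 1200.0, 2100.0],
})

# Each comparison yields a boolean Series (one True/False per row)
is_electronics = demo_df["Category"] == "Electronics"
is_high_value = demo_df["Total_Sales"] > 2000

# & means AND, | means OR; when written inline, wrap each condition in parentheses
both = demo_df[is_electronics & is_high_value]
either = demo_df[is_electronics | is_high_value]

# isin() matches several values, like ticking multiple boxes in an Excel filter
selected = demo_df[demo_df["Category"].isin(["Electronics", "Clothing"])]

print(len(both), len(either), len(selected))
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the `&` and `|` operators are used here.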

## 4. Grouping and Aggregating (Like Excel's PivotTables)

In [None]:
# Sales by Region (like a PivotTable)
region_sales = sales_df.groupby('Region')['Total_Sales'].sum().sort_values(ascending=False)

# Sales by Category
category_sales = sales_df.groupby('Category').agg({
    'Total_Sales': 'sum',
    'Quantity': 'sum'
}).round(2)

print("Sales by Region:")
print(region_sales)
print("\nSales by Category:")
print(category_sales)
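
For a two-dimensional layout closer to an Excel PivotTable (rows by columns), pandas also provides `pivot_table()`. The sketch below runs on a small made-up DataFrame; the values are illustrative assumptions, not figures from the sample file.

```python
import pandas as pd

# Hypothetical rows using the same column names as the sales data
demo_df = pd.DataFrame({
    "Region": ["North", "South", "North", "South"],
    "Category": ["Electronics", "Electronics", "Furniture", "Furniture"],
    "Total_Sales": [2400.0, 1200.0, 350.5, 800.0],
})

# Regions become rows, categories become columns, cells hold summed sales
pivot = demo_df.pivot_table(
    index="Region",
    columns="Category",
    values="Total_Sales",
    aggfunc="sum",
    fill_value=0,  # show 0 instead of NaN for combinations with no sales
)

print(pivot)
```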

## 5. Data Visualization (Better than Excel Charts!)

In [None]:
# Bar chart of sales by region
plt.figure(figsize=(10, 6))
region_sales.plot(kind='bar')
plt.title('Total Sales by Region')
plt.xlabel('Region')
plt.ylabel('Total Sales ($)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## Practice Exercises

1. Data Loading and Inspection:
   - Load the sales data
   - Display the last 5 rows
   - Show basic information about the dataset

2. Data Analysis:
   - Calculate average sales by product
   - Find the day with highest total sales
   - Calculate the percentage of sales by category

3. Data Visualization:
   - Create a line plot of daily sales
   - Make a pie chart of sales by category
   - Plot quantity sold vs total sales

4. Advanced Analysis:
   - Calculate running total of sales
   - Find products with above-average sales
   - Create a sales performance dashboard
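
As a starting hint for some of the analysis exercises, the sketch below shows a running total, a percentage breakdown, and an above-average filter on a small made-up DataFrame. The data is an assumption for illustration; applying the same ideas to `sales_df` is left as the exercise.

```python
import pandas as pd

# Hypothetical rows using the same column names as the sales data
demo_df = pd.DataFrame({
    "Category": ["Electronics", "Furniture", "Electronics", "Clothing"],
    "Total_Sales": [500.0, 250.0, 150.0, 100.0],
})

# Running total, like a cumulative SUM column in Excel
demo_df["Running_Total"] = demo_df["Total_Sales"].cumsum()

# Share of total sales per category, as a percentage
category_totals = demo_df.groupby("Category")["Total_Sales"].sum()
category_pct = (category_totals / category_totals.sum() * 100).round(1)

# Rows whose sales exceed the overall mean
above_avg = demo_df[demo_df["Total_Sales"] > demo_df["Total_Sales"].mean()]

print(demo_df)
print(category_pct)
print(above_avg)
```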