# Comprehensive Pandas Tutorial

This notebook walks through pandas from the basics (Series, DataFrames, reading and writing files) to cleaning, grouping, merging, time series, and reshaping, using three synthetic datasets.

## Table of Contents
1. [Introduction to Pandas](#introduction)
2. [Creating DataFrames](#creating-dataframes)
3. [Reading and Writing Data](#reading-writing)
4. [Data Inspection](#data-inspection)
5. [Data Selection and Indexing](#selection-indexing)
6. [Data Cleaning](#data-cleaning)
7. [Data Manipulation](#data-manipulation)
8. [Grouping and Aggregation](#grouping-aggregation)
9. [Merging and Joining](#merging-joining)
10. [Time Series](#time-series)
11. [Advanced Operations](#advanced-operations)
12. [Practice Exercises](#practice-exercises)

## 1. Introduction to Pandas <a id='introduction'></a>

Pandas is a powerful Python library for data manipulation and analysis. It provides two main data structures:
- **Series**: 1-dimensional labeled array
- **DataFrame**: 2-dimensional labeled data structure (like a table)

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
# Silence all warnings to keep the output short; remove this line when debugging
warnings.filterwarnings('ignore')

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

Pandas version: 2.2.3
NumPy version: 2.1.2


### Creating a Series

In [None]:
# Create a simple Series
s = pd.Series([10, 20, 30, 40, 50])
print("Simple Series:")
print(s)
print("\nSeries with custom index:")
s_indexed = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(s_indexed)
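
A Series can be read by label or by position, and arithmetic between two Series aligns on labels rather than on position. The sketch below illustrates both with made-up values; the names `prices` and `discounts` are illustrative and not part of the tutorial's datasets.

```python
import pandas as pd

prices = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
discounts = pd.Series([1, 2], index=['c', 'a'])

# Label-based vs position-based access
by_label = prices.loc['b']      # value stored under label 'b'
by_position = prices.iloc[0]    # first value, regardless of its label

# Arithmetic aligns on index labels; a label missing on one side gives NaN
net = prices - discounts
print(net)
```

Because `'b'` has no matching discount, its result is missing rather than silently paired with another row.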

## 2. Creating DataFrames <a id='creating-dataframes'></a>

Let's create three synthetic datasets (employees, sales, and customers) that the rest of the tutorial uses.

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Create Employee Dataset
n_employees = 100

employee_data = {
    'employee_id': range(1001, 1001 + n_employees),
    'name': [f"Employee_{i}" for i in range(1, n_employees + 1)],
    'department': np.random.choice(['Sales', 'Engineering', 'HR', 'Marketing', 'Finance'], n_employees),
    'age': np.random.randint(22, 60, n_employees),
    'salary': np.random.randint(40000, 150000, n_employees).astype(float),  # Convert to float to allow NaN
    'years_experience': np.random.randint(0, 30, n_employees),
    'performance_score': np.random.uniform(2.5, 5.0, n_employees).round(2),
    'city': np.random.choice(['New York', 'San Francisco', 'Chicago', 'Boston', 'Seattle'], n_employees),
    'hire_date': [datetime(2020, 1, 1) + timedelta(days=int(x)) for x in np.random.randint(0, 1460, n_employees)]
}

# Add some missing values intentionally
# (in-place assignment works because these dict values are float NumPy arrays)
employee_data['performance_score'][np.random.choice(n_employees, 10, replace=False)] = np.nan
employee_data['salary'][np.random.choice(n_employees, 5, replace=False)] = np.nan

df_employees = pd.DataFrame(employee_data)

print("Employee Dataset created!")
print(f"Shape: {df_employees.shape}")
df_employees.head(10)

In [None]:
# Create Sales Dataset
n_sales = 500

sales_data = {
    'sale_id': range(5001, 5001 + n_sales),
    'employee_id': np.random.choice(df_employees['employee_id'].values, n_sales),
    'product': np.random.choice(['Product_A', 'Product_B', 'Product_C', 'Product_D', 'Product_E'], n_sales),
    'quantity': np.random.randint(1, 50, n_sales),
    'unit_price': np.random.uniform(10, 500, n_sales).round(2),
    'sale_date': [datetime(2023, 1, 1) + timedelta(days=int(x)) for x in np.random.randint(0, 365, n_sales)],
    'region': np.random.choice(['North', 'South', 'East', 'West'], n_sales)
}

df_sales = pd.DataFrame(sales_data)
df_sales['total_amount'] = (df_sales['quantity'] * df_sales['unit_price']).round(2)

print("Sales Dataset created!")
print(f"Shape: {df_sales.shape}")
df_sales.head(10)

In [None]:
# Create Customer Dataset
n_customers = 200

customer_data = {
    'customer_id': range(2001, 2001 + n_customers),
    'customer_name': [f"Customer_{i}" for i in range(1, n_customers + 1)],
    'email': [f"customer{i}@email.com" for i in range(1, n_customers + 1)],
    'country': np.random.choice(['USA', 'Canada', 'UK', 'Germany', 'France', 'Japan'], n_customers),
    'signup_date': [datetime(2022, 1, 1) + timedelta(days=int(x)) for x in np.random.randint(0, 730, n_customers)],
    'total_purchases': np.random.randint(0, 50, n_customers),
    'lifetime_value': np.random.uniform(100, 10000, n_customers).round(2)
}

df_customers = pd.DataFrame(customer_data)

print("Customer Dataset created!")
print(f"Shape: {df_customers.shape}")
df_customers.head(10)

### Creating DataFrames from Different Sources

In [None]:
# From dictionary
df_dict = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
print("DataFrame from dictionary:")
print(df_dict)

# From list of lists
df_list = pd.DataFrame(
    [[1, 4, 7], [2, 5, 8], [3, 6, 9]],
    columns=['A', 'B', 'C']
)
print("\nDataFrame from list of lists:")
print(df_list)

# From numpy array
df_numpy = pd.DataFrame(
    np.random.randn(5, 3),
    columns=['X', 'Y', 'Z']
)
print("\nDataFrame from numpy array:")
print(df_numpy)
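
Another common source is a list of dictionaries, for example rows decoded from JSON. The following is a small sketch with made-up records; keys that are missing from a record become missing values in that column.

```python
import pandas as pd

records = [
    {'A': 1, 'B': 4},
    {'A': 2, 'B': 5, 'C': 8},   # this record has an extra key
]

# Each dict becomes a row; the columns are the union of all keys
df_records = pd.DataFrame(records)
print(df_records)
```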

## 3. Reading and Writing Data <a id='reading-writing'></a>

In [None]:
# Save our datasets to CSV files
df_employees.to_csv('employees.csv', index=False)
df_sales.to_csv('sales.csv', index=False)
df_customers.to_csv('customers.csv', index=False)

print("Datasets saved to CSV files!")

# Read from CSV
df_read = pd.read_csv('employees.csv')
print("\nDataset read from CSV:")
print(df_read.head())
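
CSV files store only text, so a date column does not come back as datetime unless `read_csv` is asked to parse it. The sketch below shows this with a tiny made-up frame and an in-memory buffer (`df_small` and `buffer` are illustrative names), so it does not touch the files written above.

```python
import io
import pandas as pd

df_small = pd.DataFrame({
    'employee_id': [1001, 1002],
    'hire_date': pd.to_datetime(['2020-01-15', '2021-06-30']),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df_small.to_csv(buffer, index=False)

# Default read: the date column is not parsed as datetime
buffer.seek(0)
plain = pd.read_csv(buffer)

# Asking for date parsing restores a datetime64 column
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=['hire_date'])

print(plain.dtypes)
print(parsed.dtypes)
```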

In [None]:
# Save to Excel (requires openpyxl)
try:
    with pd.ExcelWriter('company_data.xlsx') as writer:
        df_employees.to_excel(writer, sheet_name='Employees', index=False)
        df_sales.to_excel(writer, sheet_name='Sales', index=False)
        df_customers.to_excel(writer, sheet_name='Customers', index=False)
    print("Data saved to Excel file!")
except ImportError:
    print("openpyxl not installed. Install with: pip install openpyxl")

## 4. Data Inspection <a id='data-inspection'></a>

In [None]:
# Basic information
print("Dataset Info:")
print(df_employees.info())

print("\n" + "="*50)
print("Dataset Shape:", df_employees.shape)
print("Number of rows:", len(df_employees))
print("Number of columns:", len(df_employees.columns))
print("Column names:", df_employees.columns.tolist())

In [None]:
# First and last rows
print("First 5 rows:")
display(df_employees.head())

print("\nLast 5 rows:")
display(df_employees.tail())

print("\nRandom 5 rows:")
display(df_employees.sample(5))

In [None]:
# Statistical summary
print("Statistical Summary:")
display(df_employees.describe())

print("\nSummary for all columns (including non-numeric):")
display(df_employees.describe(include='all'))

In [None]:
# Data types
print("Data types:")
print(df_employees.dtypes)

print("\nMemory usage:")
print(df_employees.memory_usage(deep=True))
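
Memory usage depends heavily on dtype. As a rough illustration with a made-up column of repeated labels, storing the values as `category` usually takes far less memory than storing every string separately.

```python
import pandas as pd

# 3,000 values drawn from only three distinct labels
dept = pd.Series(['Sales', 'Engineering', 'HR'] * 1000)

as_strings = dept.memory_usage(deep=True)
as_category = dept.astype('category').memory_usage(deep=True)

print(f"strings: {as_strings} bytes, category: {as_category} bytes")
```

The saving shrinks as the number of distinct values approaches the number of rows.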

In [None]:
# Missing values
print("Missing values count:")
print(df_employees.isnull().sum())

print("\nMissing values percentage:")
print((df_employees.isnull().sum() / len(df_employees) * 100).round(2))

In [None]:
# Value counts
print("Department distribution:")
print(df_employees['department'].value_counts())

print("\nDepartment distribution (normalized):")
print(df_employees['department'].value_counts(normalize=True))

## 5. Data Selection and Indexing <a id='selection-indexing'></a>

In [None]:
# Selecting columns
print("Single column (Series):")
print(df_employees['name'].head())

print("\nMultiple columns (DataFrame):")
print(df_employees[['name', 'department', 'salary']].head())

In [None]:
# Selecting rows by position (iloc)
print("First row:")
print(df_employees.iloc[0])

print("\nFirst 5 rows:")
print(df_employees.iloc[:5])

print("\nSpecific rows and columns:")
print(df_employees.iloc[0:5, 1:4])

In [None]:
# Selecting rows by label (loc); unlike iloc, the end label is included
print("Selecting by label:")
print(df_employees.loc[0:4, ['name', 'department', 'salary']])
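
With the default 0, 1, 2, ... index, labels and positions coincide, which hides the difference between `loc` and `iloc`. The sketch below uses a made-up frame whose index starts at 10 to make the difference visible.

```python
import pandas as pd

df = pd.DataFrame({'value': ['a', 'b', 'c', 'd']}, index=[10, 11, 12, 13])

by_label = df.loc[10:12]      # labels 10, 11, 12: the end label is included
by_position = df.iloc[0:2]    # positions 0 and 1: the end position is excluded

print(by_label)
print(by_position)
```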

In [None]:
# Boolean indexing
print("Employees in Engineering department:")
engineering = df_employees[df_employees['department'] == 'Engineering']
print(engineering.head())

print("\nEmployees with salary > 100000:")
high_earners = df_employees[df_employees['salary'] > 100000]
print(high_earners.head())

In [None]:
# Multiple conditions
print("Engineering employees with salary > 80000:")
filtered = df_employees[
    (df_employees['department'] == 'Engineering') & 
    (df_employees['salary'] > 80000)
]
print(filtered.head())

print("\nEmployees in Sales OR Marketing:")
sales_marketing = df_employees[
    (df_employees['department'] == 'Sales') | 
    (df_employees['department'] == 'Marketing')
]
print(sales_marketing.head())
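
One detail worth keeping in mind when filtering: comparisons against a missing value evaluate to `False`, so rows with `NaN` fall out of both a filter and its apparent opposite. A small sketch with made-up salaries:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'salary': [50000.0, np.nan, 120000.0]})

high = df[df['salary'] > 100000]
not_high = df[df['salary'] <= 100000]
missing = df[df['salary'].isna()]

# The NaN row appears in neither comparison result
print(len(high), len(not_high), len(missing))
```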

In [None]:
# Using isin() for multiple values
print("Employees in specific departments:")
specific_depts = df_employees[df_employees['department'].isin(['Sales', 'Engineering', 'HR'])]
print(specific_depts.head())

print("\nEmployees in specific cities:")
specific_cities = df_employees[df_employees['city'].isin(['New York', 'San Francisco'])]
print(specific_cities.head())

## 6. Data Cleaning <a id='data-cleaning'></a>

In [None]:
# Handling missing values
print("Missing values before cleaning:")
print(df_employees.isnull().sum())

# Create a copy for cleaning
df_clean = df_employees.copy()

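# Fill missing performance scores with median
# (assign the result back; calling fillna(..., inplace=True) on a selected column is deprecated)
df_clean['performance_score'] = df_clean['performance_score'].fillna(df_clean['performance_score'].median())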

# Fill missing salaries with mean by department
df_clean['salary'] = df_clean.groupby('department')['salary'].transform(
    lambda x: x.fillna(x.mean())
)

print("\nMissing values after cleaning:")
print(df_clean.isnull().sum())

In [None]:
# Dropping rows with missing values
df_dropped = df_employees.dropna()
print(f"Original shape: {df_employees.shape}")
print(f"After dropping NaN: {df_dropped.shape}")

# Dropping columns with missing values
df_dropped_cols = df_employees.dropna(axis=1)
print(f"After dropping columns with NaN: {df_dropped_cols.shape}")
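
`dropna` can also be restricted to specific columns or relaxed to a minimum number of present values. The sketch below uses a small made-up frame to show the `subset` and `thresh` parameters.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'salary': [50000.0, np.nan, np.nan],
    'score': [4.1, 3.9, np.nan],
})

# Drop rows only when 'salary' is missing
has_salary = df.dropna(subset=['salary'])

# Keep rows that have at least 2 non-missing values
mostly_complete = df.dropna(thresh=2)

print(has_salary)
print(mostly_complete)
```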

In [None]:
# Removing duplicates
print(f"Duplicates in employees: {df_employees.duplicated().sum()}")

# Create some duplicates for demonstration
df_with_dupes = pd.concat([df_employees, df_employees.head(5)], ignore_index=True)
print(f"\nDuplicates after adding: {df_with_dupes.duplicated().sum()}")

# Remove duplicates
df_no_dupes = df_with_dupes.drop_duplicates()
print(f"After removing duplicates: {df_no_dupes.duplicated().sum()}")
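
By default a row counts as a duplicate only when every column matches. To deduplicate on a key instead, `drop_duplicates` accepts `subset` and `keep`. A minimal sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': [1, 1, 2],
    'salary': [50000, 55000, 60000],
})

# No row is a full duplicate, so nothing is dropped here
full = df.drop_duplicates()

# Treat rows with the same employee_id as duplicates and keep the last one
latest = df.drop_duplicates(subset='employee_id', keep='last')

print(latest)
```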

In [None]:
# Data type conversion
print("Original data types:")
print(df_clean.dtypes)

# Ensure hire_date is datetime
# (already datetime64 in this frame; the call matters after a CSV round trip)
df_clean['hire_date'] = pd.to_datetime(df_clean['hire_date'])

# Convert department to category
df_clean['department'] = df_clean['department'].astype('category')

print("\nUpdated data types:")
print(df_clean.dtypes)
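
Real data often contains values that cannot be converted directly. As an illustration with made-up strings, `pd.to_numeric` with `errors='coerce'` converts what it can and marks the rest as missing.

```python
import pandas as pd

raw = pd.Series(['42', '3.5', 'n/a'])

# astype(float) would raise on 'n/a'; errors='coerce' turns it into NaN
numbers = pd.to_numeric(raw, errors='coerce')

print(numbers)
```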

In [None]:
# Renaming columns
df_renamed = df_clean.rename(columns={
    'employee_id': 'emp_id',
    'years_experience': 'experience_years'
})
print("Renamed columns:")
print(df_renamed.columns.tolist())

In [None]:
# String operations
print("Original names:")
print(df_clean['name'].head())

# Convert to uppercase
df_clean['name_upper'] = df_clean['name'].str.upper()
print("\nUppercase names:")
print(df_clean['name_upper'].head())

# Extract parts of strings
df_clean['employee_number'] = df_clean['name'].str.extract(r'(\d+)')
print("\nExtracted employee numbers:")
print(df_clean[['name', 'employee_number']].head())
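
Note that `str.extract` returns strings, not numbers. The sketch below (made-up names, including one without digits) shows one way to turn the extracted text into integers while tolerating rows that did not match.

```python
import pandas as pd

names = pd.Series(['Employee_1', 'Employee_25', 'Contractor'])

# expand=False returns a Series when the pattern has one capture group
digits = names.str.extract(r'(\d+)', expand=False)

# Rows without a match are missing, so use the nullable integer dtype
numbers = pd.to_numeric(digits).astype('Int64')

print(numbers)
```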

## 7. Data Manipulation <a id='data-manipulation'></a>

In [None]:
# Adding new columns
df_clean['salary_per_year_exp'] = (df_clean['salary'] / (df_clean['years_experience'] + 1)).round(2)
# pd.cut bins are right-inclusive by default: (0, 30], (30, 40], (40, 50], (50, 100]
df_clean['age_group'] = pd.cut(df_clean['age'], bins=[0, 30, 40, 50, 100], labels=['20-30', '31-40', '41-50', '51+'])

print("New columns added:")
print(df_clean[['name', 'age', 'age_group', 'salary', 'years_experience', 'salary_per_year_exp']].head(10))
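
Which bin a boundary value lands in depends on the `right` parameter of `pd.cut`. A small sketch with made-up ages and labels:

```python
import pandas as pd

ages = pd.Series([30, 31, 40])

# Default right=True: bins are (0, 30] and (30, 40]
right_closed = pd.cut(ages, bins=[0, 30, 40], labels=['<=30', '31-40'])

# right=False: bins are [0, 30) and [30, 40); 40 is outside both and becomes NaN
left_closed = pd.cut(ages, bins=[0, 30, 40], right=False, labels=['<30', '30-39'])

print(right_closed.tolist())
print(left_closed.tolist())
```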

In [None]:
# Apply function
def categorize_performance(score):
    if pd.isna(score):
        return 'Unknown'
    elif score >= 4.5:
        return 'Excellent'
    elif score >= 3.5:
        return 'Good'
    elif score >= 2.5:
        return 'Average'
    else:
        return 'Below Average'

df_clean['performance_category'] = df_clean['performance_score'].apply(categorize_performance)

print("Performance categories:")
print(df_clean[['name', 'performance_score', 'performance_category']].head(10))
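
`apply` calls the Python function once per value, which can be slow on large columns. One vectorized alternative is `np.select`; the sketch below reproduces the same thresholds on made-up scores and is only one of several possible approaches.

```python
import numpy as np
import pandas as pd

scores = pd.Series([4.8, 3.6, 2.9, np.nan, 1.0])

# Conditions are checked in order; the first match wins
conditions = [scores.isna(), scores >= 4.5, scores >= 3.5, scores >= 2.5]
choices = ['Unknown', 'Excellent', 'Good', 'Average']

categories = pd.Series(
    np.select(conditions, choices, default='Below Average'),
    index=scores.index,
)
print(categories)
```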

In [None]:
# Sorting
print("Top 10 highest paid employees:")
print(df_clean.nlargest(10, 'salary')[['name', 'department', 'salary']])

print("\nBottom 10 lowest paid employees:")
print(df_clean.nsmallest(10, 'salary')[['name', 'department', 'salary']])

In [None]:
# Sort by multiple columns
print("Sorted by department and salary:")
sorted_df = df_clean.sort_values(['department', 'salary'], ascending=[True, False])
print(sorted_df[['name', 'department', 'salary']].head(15))

In [None]:
# Ranking
df_clean['salary_rank'] = df_clean['salary'].rank(ascending=False, method='min')
df_clean['dept_salary_rank'] = df_clean.groupby('department')['salary'].rank(ascending=False, method='min')

print("Salary rankings:")
print(df_clean[['name', 'department', 'salary', 'salary_rank', 'dept_salary_rank']].head(10))
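
The `method` argument controls how ties are ranked. The following sketch compares three methods on made-up salaries with one tie.

```python
import pandas as pd

salaries = pd.Series([90000, 80000, 80000, 70000])

ranks = pd.DataFrame({
    'salary': salaries,
    'min': salaries.rank(ascending=False, method='min'),          # ties share the lowest rank
    'dense': salaries.rank(ascending=False, method='dense'),      # no gap after a tie
    'average': salaries.rank(ascending=False, method='average'),  # ties share the mean rank
})
print(ranks)
```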

## 8. Grouping and Aggregation <a id='grouping-aggregation'></a>

In [None]:
# Simple groupby
print("Average salary by department:")
dept_avg_salary = df_clean.groupby('department')['salary'].mean().round(2)
print(dept_avg_salary)

print("\nEmployee count by department:")
dept_count = df_clean.groupby('department').size()
print(dept_count)

In [None]:
# Multiple aggregations
print("Salary statistics by department:")
dept_stats = df_clean.groupby('department')['salary'].agg([
    'count', 'mean', 'median', 'min', 'max', 'std'
]).round(2)
print(dept_stats)

In [None]:
# Aggregating multiple columns
print("Multiple column aggregations:")
multi_agg = df_clean.groupby('department').agg({
    'salary': ['mean', 'max'],
    'age': ['mean', 'min', 'max'],
    'years_experience': 'mean',
    'performance_score': 'mean'
}).round(2)
print(multi_agg)
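
Aggregating with a dict of lists produces two-level column names such as `('salary', 'mean')`. If flat names are easier to work with, the levels can be joined, as in this sketch on a small made-up frame.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['HR', 'HR', 'Sales'],
    'salary': [50000, 60000, 70000],
    'age': [30, 40, 35],
})

stats = df.groupby('department').agg({'salary': ['mean', 'max'], 'age': ['mean']})

# Each column is a tuple like ('salary', 'mean'); join the parts into one name
stats.columns = ['_'.join(col) for col in stats.columns]
print(stats)
```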

In [None]:
# Custom aggregation functions
def salary_range(x):
    return x.max() - x.min()

print("Custom aggregations:")
custom_agg = df_clean.groupby('department')['salary'].agg([
    ('avg_salary', 'mean'),
    ('salary_range', salary_range),
    ('total_payroll', 'sum')
]).round(2)
print(custom_agg)
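
pandas also supports named aggregation, where each output column is given as `name=(column, function)`. The sketch below applies it across several columns of a small made-up frame; the output names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['HR', 'HR', 'Sales'],
    'salary': [50000, 60000, 70000],
    'employee_id': [1, 2, 3],
})

summary = df.groupby('department').agg(
    avg_salary=('salary', 'mean'),
    salary_range=('salary', lambda x: x.max() - x.min()),
    headcount=('employee_id', 'count'),
)
print(summary)
```

This form yields flat, explicitly named columns, so no renaming step is needed afterwards.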

In [None]:
# Groupby multiple columns
print("Groupby department and city:")
dept_city_stats = df_clean.groupby(['department', 'city']).agg({
    'salary': 'mean',
    'employee_id': 'count'
}).round(2)
dept_city_stats.columns = ['avg_salary', 'employee_count']
print(dept_city_stats.head(15))

In [None]:
# Pivot tables
print("Pivot table - Average salary by department and city:")
pivot = pd.pivot_table(
    df_clean,
    values='salary',
    index='department',
    columns='city',
    aggfunc='mean',
    fill_value=0  # 0 marks department/city pairs with no employees, not a zero salary
).round(2)
print(pivot)

In [None]:
# Cross-tabulation
print("Cross-tabulation - Department vs City:")
crosstab = pd.crosstab(
    df_clean['department'],
    df_clean['city'],
    margins=True,
    margins_name='Total'
)
print(crosstab)

## 9. Merging and Joining <a id='merging-joining'></a>

In [None]:
# Inner join
print("Inner join - Employees and Sales:")
inner_merged = pd.merge(
    df_employees[['employee_id', 'name', 'department']],
    df_sales[['sale_id', 'employee_id', 'product', 'total_amount']],
    on='employee_id',
    how='inner'
)
print(inner_merged.head(10))
print(f"\nShape: {inner_merged.shape}")

In [None]:
# Left join
print("Left join - All employees with their sales:")
left_merged = pd.merge(
    df_employees[['employee_id', 'name', 'department']],
    df_sales.groupby('employee_id').agg({
        'sale_id': 'count',
        'total_amount': 'sum'
    }).reset_index(),
    on='employee_id',
    how='left'
)
left_merged.columns = ['employee_id', 'name', 'department', 'num_sales', 'total_sales']
left_merged.fillna(0, inplace=True)
print(left_merged.head(10))
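
When a join does not behave as expected, two `merge` parameters help with diagnosis: `indicator=True` records where each row came from, and `validate` checks the assumed key relationship. A sketch with two small made-up frames:

```python
import pandas as pd

employees = pd.DataFrame({'employee_id': [1, 2, 3], 'name': ['A', 'B', 'C']})
sales = pd.DataFrame({'employee_id': [1, 1, 4], 'total_amount': [100.0, 50.0, 75.0]})

checked = pd.merge(
    employees,
    sales,
    on='employee_id',
    how='outer',
    indicator=True,          # adds a '_merge' column: both / left_only / right_only
    validate='one_to_many',  # raises MergeError if employee_id repeats on the left
)
print(checked)
```

Rows marked `left_only` are employees without sales; `right_only` rows are sales whose employee is not in the employee table.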

In [None]:
# Concatenating DataFrames
print("Concatenating vertically (rows):")
df1 = df_employees.head(5)
df2 = df_employees.tail(5)
concat_vertical = pd.concat([df1, df2], ignore_index=True)
print(concat_vertical[['employee_id', 'name', 'department']])

print("\nConcatenating horizontally (columns):")
df_extra = pd.DataFrame({
    'bonus': np.random.randint(1000, 10000, 5)
})
concat_horizontal = pd.concat([df1[['name', 'salary']].reset_index(drop=True), df_extra], axis=1)
print(concat_horizontal)
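
Horizontal concatenation aligns rows by index label, not by position, which is why resetting the index matters when the frames come from different slices. A sketch with made-up frames whose indexes do not overlap:

```python
import pandas as pd

left = pd.DataFrame({'name': ['A', 'B']}, index=[5, 6])
right = pd.DataFrame({'bonus': [1000, 2000]})   # default index 0, 1

# No shared labels: the result has 4 rows padded with NaN
misaligned = pd.concat([left, right], axis=1)

# After resetting, both frames share the labels 0 and 1
aligned = pd.concat([left.reset_index(drop=True), right], axis=1)

print(misaligned)
print(aligned)
```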

## 10. Time Series <a id='time-series'></a>

In [None]:
# Working with dates
df_sales['sale_date'] = pd.to_datetime(df_sales['sale_date'])

# Extract date components
df_sales['year'] = df_sales['sale_date'].dt.year
df_sales['month'] = df_sales['sale_date'].dt.month
df_sales['day'] = df_sales['sale_date'].dt.day
df_sales['day_of_week'] = df_sales['sale_date'].dt.day_name()
df_sales['quarter'] = df_sales['sale_date'].dt.quarter

print("Date components:")
print(df_sales[['sale_date', 'year', 'month', 'day', 'day_of_week', 'quarter']].head(10))

In [None]:
# Time-based grouping
print("Monthly sales:")
monthly_sales = df_sales.groupby(df_sales['sale_date'].dt.to_period('M')).agg({
    'total_amount': 'sum',
    'sale_id': 'count'
}).round(2)
monthly_sales.columns = ['total_revenue', 'num_sales']
print(monthly_sales.head(12))

In [None]:
# Resampling (set date as index first)
df_sales_indexed = df_sales.set_index('sale_date').sort_index()

print("Weekly sales (resampled):")
weekly_sales = df_sales_indexed['total_amount'].resample('W').sum().round(2)
print(weekly_sales.head(10))

In [None]:
# Rolling windows
print("7-day rolling average of sales:")
daily_sales = df_sales_indexed['total_amount'].resample('D').sum()
rolling_avg = daily_sales.rolling(window=7).mean().round(2)
print(rolling_avg.head(20))
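
The first rows of a rolling result are `NaN` because a full window is not yet available. The `min_periods` parameter relaxes this, as the sketch below shows on a short made-up daily series.

```python
import pandas as pd

daily = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.date_range('2023-01-01', periods=4, freq='D'),
)

strict = daily.rolling(window=3).mean()                  # NaN until 3 values exist
lenient = daily.rolling(window=3, min_periods=1).mean()  # averages whatever is available

print(strict)
print(lenient)
```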

In [None]:
# Date filtering
print("Sales in Q1 2023:")
q1_sales = df_sales[
    (df_sales['sale_date'] >= '2023-01-01') & 
    (df_sales['sale_date'] <= '2023-03-31')
]
print(f"Total Q1 sales: ${q1_sales['total_amount'].sum():,.2f}")
print(f"Number of transactions: {len(q1_sales)}")
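
The `<= '2023-03-31'` bound works here because every `sale_date` falls at midnight. If timestamps carry a time of day, a date string is read as midnight and later sales on that day are excluded. The sketch below uses made-up timestamps to show a half-open interval as a safer pattern.

```python
import pandas as pd

sales = pd.DataFrame({
    'sale_date': pd.to_datetime(['2023-03-31 00:00', '2023-03-31 15:30', '2023-04-01 00:00']),
    'total_amount': [100.0, 200.0, 300.0],
})

# '2023-03-31' means midnight, so the 15:30 sale is left out
closed = sales[sales['sale_date'] <= '2023-03-31']

# Half-open interval: everything from Jan 1 up to, but not including, Apr 1
half_open = sales[(sales['sale_date'] >= '2023-01-01') & (sales['sale_date'] < '2023-04-01')]

print(len(closed), len(half_open))
```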

## 11. Advanced Operations <a id='advanced-operations'></a>

In [None]:
# Window functions
print("Cumulative sum of sales by employee:")
df_sales_sorted = df_sales.sort_values(['employee_id', 'sale_date'])
df_sales_sorted['cumulative_sales'] = df_sales_sorted.groupby('employee_id')['total_amount'].cumsum()
print(df_sales_sorted[['employee_id', 'sale_date', 'total_amount', 'cumulative_sales']].head(15))

In [None]:
# Shift and lag
print("Previous sale amount (lag):")
df_sales_sorted['prev_sale'] = df_sales_sorted.groupby('employee_id')['total_amount'].shift(1)
df_sales_sorted['sale_change'] = df_sales_sorted['total_amount'] - df_sales_sorted['prev_sale']
print(df_sales_sorted[['employee_id', 'total_amount', 'prev_sale', 'sale_change']].head(15))

In [None]:
# Binning and discretization
print("Salary bins:")
df_clean['salary_bin'] = pd.cut(
    df_clean['salary'],
    bins=[0, 50000, 75000, 100000, 150000],
    labels=['Low', 'Medium', 'High', 'Very High']
)
print(df_clean['salary_bin'].value_counts().sort_index())

print("\nQuantile-based bins:")
df_clean['salary_quantile'] = pd.qcut(
    df_clean['salary'],
    q=4,
    labels=['Q1', 'Q2', 'Q3', 'Q4']
)
print(df_clean['salary_quantile'].value_counts().sort_index())

In [None]:
# Melt and pivot
print("Original wide format:")
wide_df = df_clean.groupby('department')[['salary', 'age', 'years_experience']].mean().round(2)
print(wide_df)

print("\nMelted to long format:")
long_df = wide_df.reset_index().melt(
    id_vars='department',
    var_name='metric',
    value_name='value'
)
print(long_df.head(10))
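
The reverse step, going from long back to wide, can be done with `pivot`. The sketch below uses a small made-up long table with the same column names as the melted result.

```python
import pandas as pd

long_table = pd.DataFrame({
    'department': ['HR', 'HR', 'Sales', 'Sales'],
    'metric': ['salary', 'age', 'salary', 'age'],
    'value': [55000.0, 35.0, 70000.0, 41.0],
})

# One row per department, one column per metric
wide_again = long_table.pivot(index='department', columns='metric', values='value')
print(wide_again)
```

`pivot` raises an error if an index/column pair occurs more than once; `pivot_table` with an aggregation function handles that case.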

In [None]:
# Query method
print("Using query method:")
result = df_clean.query('department == "Engineering" and salary > 80000 and age < 40')
print(result[['name', 'department', 'age', 'salary']].head())
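
Inside a `query` string, Python variables can be referenced with an `@` prefix, which avoids building the expression by string formatting. A sketch with made-up data and variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'Engineering', 'Sales'],
    'salary': [90000, 70000, 95000],
})

min_salary = 80000
target_depts = ['Engineering']

# @name refers to a Python variable in the surrounding scope
matches = df.query('department in @target_depts and salary > @min_salary')
print(matches)
```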

In [None]:
# Correlation matrix
print("Correlation matrix:")
numeric_cols = ['age', 'salary', 'years_experience', 'performance_score']
correlation = df_clean[numeric_cols].corr().round(3)
print(correlation)

In [None]:
# Exploding lists
print("Exploding lists:")
df_with_lists = pd.DataFrame({
    'employee': ['Alice', 'Bob', 'Charlie'],
    'skills': [['Python', 'SQL'], ['Java', 'C++', 'Python'], ['R', 'SQL', 'Excel']]
})
print("Before explode:")
print(df_with_lists)

df_exploded = df_with_lists.explode('skills')
print("\nAfter explode:")
print(df_exploded)
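
After `explode`, every new row keeps the index label of the row it came from, so labels repeat. The sketch below (made-up data) shows `ignore_index=True` for a fresh index and a simple follow-up count.

```python
import pandas as pd

df = pd.DataFrame({
    'employee': ['Alice', 'Bob'],
    'skills': [['Python', 'SQL'], ['Python']],
})

exploded = df.explode('skills')                        # index: 0, 0, 1
renumbered = df.explode('skills', ignore_index=True)   # index: 0, 1, 2

# Count how many employees list each skill
skill_counts = renumbered['skills'].value_counts()
print(skill_counts)
```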

## 12. Practice Exercises <a id='practice-exercises'></a>

Try these exercises to test your understanding:

### Exercise 1: Basic Analysis
Find the top 5 departments by average salary and show the employee count for each.

In [None]:
# Your solution here


### Exercise 2: Data Filtering
Find all employees who:
- Are in the Engineering or Sales department
- Have more than 5 years of experience
- Have a performance score above 4.0
- Earn more than the median salary of their department

In [None]:
# Your solution here


### Exercise 3: Sales Analysis
Calculate the total sales revenue by product and region, and find which product-region combination has the highest revenue.

In [None]:
# Your solution here


### Exercise 4: Time Series Analysis
Calculate the month-over-month growth rate of sales revenue for 2023.

In [None]:
# Your solution here


### Exercise 5: Advanced Grouping
For each department, find:
- The employee with the highest salary
- The employee with the most years of experience
- The average performance score

In [None]:
# Your solution here


## Solutions to Exercises

In [None]:
# Solution 1
print("Solution 1: Top 5 departments by average salary")
dept_analysis = df_clean.groupby('department').agg({
    'salary': 'mean',
    'employee_id': 'count'
}).round(2)
dept_analysis.columns = ['avg_salary', 'employee_count']
print(dept_analysis.nlargest(5, 'avg_salary'))

In [None]:
# Solution 2
print("Solution 2: Filtered employees")
dept_median_salary = df_clean.groupby('department')['salary'].transform('median')
filtered_employees = df_clean[
    (df_clean['department'].isin(['Engineering', 'Sales'])) &
    (df_clean['years_experience'] > 5) &
    (df_clean['performance_score'] > 4.0) &
    (df_clean['salary'] > dept_median_salary)
]
print(filtered_employees[['name', 'department', 'years_experience', 'performance_score', 'salary']])

In [None]:
# Solution 3
print("Solution 3: Sales revenue by product and region")
product_region_sales = df_sales.groupby(['product', 'region'])['total_amount'].sum().round(2)
print(product_region_sales.sort_values(ascending=False).head(10))
print(f"\nHighest revenue combination: {product_region_sales.idxmax()} with ${product_region_sales.max():,.2f}")

In [None]:
# Solution 4
print("Solution 4: Month-over-month growth rate")
monthly_revenue = df_sales.groupby(df_sales['sale_date'].dt.to_period('M'))['total_amount'].sum()
mom_growth = monthly_revenue.pct_change() * 100
print(mom_growth.round(2))

In [None]:
# Solution 5
print("Solution 5: Department analysis")

# Highest salary employee per department
highest_salary = df_clean.loc[df_clean.groupby('department')['salary'].idxmax()]
print("Highest salary per department:")
print(highest_salary[['department', 'name', 'salary']])

# Most experienced employee per department
most_experienced = df_clean.loc[df_clean.groupby('department')['years_experience'].idxmax()]
print("\nMost experienced per department:")
print(most_experienced[['department', 'name', 'years_experience']])

# Average performance score
avg_performance = df_clean.groupby('department')['performance_score'].mean().round(2)
print("\nAverage performance score per department:")
print(avg_performance)

## Summary

This tutorial covered:
1. ✅ Creating and inspecting DataFrames
2. ✅ Reading and writing data
3. ✅ Data selection and indexing
4. ✅ Data cleaning and preprocessing
5. ✅ Data manipulation and transformation
6. ✅ Grouping and aggregation
7. ✅ Merging and joining datasets
8. ✅ Time series operations
9. ✅ Advanced pandas operations
10. ✅ Practice exercises with solutions

### Next Steps
- Explore visualization with matplotlib and seaborn
- Learn about pandas performance optimization
- Practice with real-world datasets
- Explore pandas integration with SQL databases