# Modern Pandas Exploration

This notebook explores modern features and best practices in Pandas for data manipulation and analysis.

## Table of Contents
1. [Getting Started](#getting-started)
2. [Creating DataFrames](#creating-dataframes)
3. [Data Inspection](#data-inspection)
4. [Data Selection and Filtering](#data-selection)
5. [Data Manipulation](#data-manipulation)
6. [Grouping and Aggregation](#grouping)
7. [Time Series](#time-series)
8. [Data Visualization](#visualization)

## 1. Getting Started <a id='getting-started'></a>

First, let's import the necessary libraries.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")
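
The display settings above apply for the rest of the session. When a setting is only needed for one cell, `pd.option_context` can scope it temporarily. The sketch below is illustrative only, and the values 5 and 10 are arbitrary choices, not recommendations.

```python
import pandas as pd

# Remember whatever the current setting is so we can compare afterwards
rows_before = pd.get_option('display.max_rows')

# Options set here only apply inside the with-block
with pd.option_context('display.max_rows', 5, 'display.max_columns', 10):
    rows_inside = pd.get_option('display.max_rows')

# After the block, the previous value is restored automatically
rows_after = pd.get_option('display.max_rows')
print(rows_before, rows_inside, rows_after)
```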

## 2. Creating DataFrames <a id='creating-dataframes'></a>

A DataFrame can be built from several kinds of input. The cells below use a dictionary of columns and a date range.

In [None]:
# Creating a DataFrame from a dictionary
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'city': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
    'salary': [70000, 85000, 92000, 78000, 88000]
}

df = pd.DataFrame(data)
print("DataFrame from dictionary:")
print(df)

In [None]:
# Creating a DataFrame with dates
# A seeded generator keeps the random values reproducible between runs
rng = np.random.default_rng(42)
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df_timeseries = pd.DataFrame({
    'date': dates,
    'temperature': rng.integers(60, 80, 10),
    'humidity': rng.integers(30, 70, 10)
})

print("\nDataFrame with time series data:")
print(df_timeseries.head())
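
Two other common inputs are a list of row dictionaries and a 2-D NumPy array. The following is a minimal sketch; the records and the column names `x` and `y` are hypothetical and not part of the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical records: each dict is one row, keys become columns
records = [
    {'name': 'Frank', 'age': 41},
    {'name': 'Grace', 'age': 29, 'city': 'Seattle'},  # extra key adds a column
]
df_records = pd.DataFrame(records)
print(df_records)  # Frank's missing 'city' is filled with a missing value

# From a 2-D NumPy array, with explicit column names
arr = np.arange(6).reshape(3, 2)
df_array = pd.DataFrame(arr, columns=['x', 'y'])
print(df_array)
```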

## 3. Data Inspection <a id='data-inspection'></a>

Let's explore various methods to inspect our data.

In [None]:
# Basic information
print("DataFrame shape:", df.shape)
print("\nColumn names:", df.columns.tolist())
print("\nData types:")
print(df.dtypes)

In [None]:
# Summary statistics
print("Summary statistics:")
print(df.describe())

In [None]:
# Detailed information
print("DataFrame info:")
df.info()
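
Inspection usually also covers missing values and cardinality. The sketch below uses a small hypothetical frame with gaps (the notebook's `df` has none) to show `isna`, `nunique`, and `memory_usage`.

```python
import pandas as pd

# Hypothetical data with deliberate gaps, for illustration only
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'city': ['New York', None, 'New York'],
    'salary': [70000, 85000, None],
})

# Count of missing values in each column
missing_per_column = people.isna().sum()
print(missing_per_column)

# Number of distinct non-missing values in each column
unique_per_column = people.nunique()
print(unique_per_column)

# Memory footprint in bytes; deep=True also measures the string contents
print(people.memory_usage(deep=True))
```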

## 4. Data Selection and Filtering <a id='data-selection'></a>

Pandas supports selection by column name, boolean mask, query string, and label or position.

In [None]:
# Selecting columns
print("Selecting a single column:")
print(df['name'])

print("\nSelecting multiple columns:")
print(df[['name', 'age']])

In [None]:
# Filtering rows
print("People older than 30:")
print(df[df['age'] > 30])

print("\nPeople in New York or Boston:")
print(df[df['city'].isin(['New York', 'Boston'])])

In [None]:
# Using the query method (filter with a string expression)
print("Using query method - people with salary > 80000:")
print(df.query('salary > 80000'))

In [None]:
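# Using loc and iloc
# loc slices by label and includes the end label: 0:2 returns rows 0, 1 and 2
print("Using loc (label-based):")
print(df.loc[0:2, ['name', 'city']])

# iloc slices by position and excludes the end: 0:3 also returns rows 0, 1 and 2
print("\nUsing iloc (position-based):")
print(df.iloc[0:3, [0, 2]])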
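
Filters often need more than one condition. As a rough sketch, boolean masks combine with `&` and `|` (each condition in parentheses), and `query` can reference a Python variable with `@`. The frame below repeats part of the sample data so the cell runs on its own; `min_salary` is a hypothetical threshold.

```python
import pandas as pd

df_people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'salary': [70000, 85000, 92000, 78000, 88000],
})

# Parentheses are required because & binds tighter than the comparisons
mask = (df_people['age'] > 28) & (df_people['salary'] < 90000)
selected = df_people[mask]
print(selected)

# query() can read local variables when they are prefixed with @
min_salary = 80000  # hypothetical threshold
via_query = df_people.query('age > 28 and salary > @min_salary')
print(via_query)
```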

## 5. Data Manipulation <a id='data-manipulation'></a>

Let's explore common data manipulation operations.

In [None]:
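# Adding new columns
df['salary_k'] = df['salary'] / 1000
# right=False makes the bins [0, 30), [30, 40), [40, 100), so a 30-year-old lands in '30s'
df['age_group'] = pd.cut(df['age'], bins=[0, 30, 40, 100], labels=['20s', '30s', '40+'], right=False)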

print("DataFrame with new columns:")
print(df)
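
How `pd.cut` treats a value that sits exactly on a bin edge depends on the `right` parameter. The small comparison below uses the same ages and bin edges as the sample data.

```python
import pandas as pd

ages = pd.Series([25, 30, 35, 28, 32])
bins = [0, 30, 40, 100]
labels = ['20s', '30s', '40+']

# Default right=True: intervals are (0, 30], (30, 40], (40, 100]
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals are [0, 30), [30, 40), [40, 100)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

# Age 30 is the only value whose label differs between the two
print(pd.DataFrame({'age': ages, 'right_closed': right_closed, 'left_closed': left_closed}))
```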

In [None]:
# Applying functions
df['name_upper'] = df['name'].str.upper()
df['name_length'] = df['name'].apply(len)

print("DataFrame with applied functions:")
print(df[['name', 'name_upper', 'name_length']])

In [None]:
# Sorting
print("Sorted by age (descending):")
print(df.sort_values('age', ascending=False)[['name', 'age', 'salary']])

print("\nSorted by multiple columns:")
print(df.sort_values(['age_group', 'salary'], ascending=[True, False])[['name', 'age_group', 'salary']])
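
The same kinds of steps can also be written as one method chain that leaves the original frame untouched. This is a sketch of that style using `assign`, not a replacement for the cells above; the frame repeats part of the sample data so the cell runs on its own.

```python
import pandas as pd

df_people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'salary': [70000, 85000, 92000, 78000, 88000],
})

result = (
    df_people
    # assign returns a new DataFrame; each lambda receives the frame so far
    .assign(
        salary_k=lambda d: d['salary'] / 1000,
        name_length=lambda d: d['name'].str.len(),
    )
    .sort_values('salary_k', ascending=False)
    .reset_index(drop=True)
)
print(result)

# The source frame has no new columns
print(df_people.columns.tolist())
```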

## 6. Grouping and Aggregation <a id='grouping'></a>

`groupby` splits the rows by one or more keys and applies an aggregation to each group.

In [None]:
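# Group by age_group and calculate statistics
# age_group is categorical: observed=True keeps only groups that actually occur,
# and setting it explicitly avoids the warning recent pandas versions raise
print("Average salary by age group:")
print(df.groupby('age_group', observed=True)['salary'].mean())

print("\nMultiple aggregations:")
print(df.groupby('age_group', observed=True)['salary'].agg(['mean', 'min', 'max', 'count']))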

In [None]:
# Creating a more complex dataset for grouping
sales_data = pd.DataFrame({
    'region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West'],
    'product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'sales': [100, 150, 120, 180, 200, 220, 190, 210],
    'quantity': [10, 15, 12, 18, 20, 22, 19, 21]
})

print("Sales data:")
print(sales_data)

print("\nGrouped by region and product:")
print(sales_data.groupby(['region', 'product']).sum())
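
When different columns need different aggregations, named aggregation gives each output column an explicit name. The sketch below repeats the sales data so it runs on its own; the output names `total_sales`, `avg_quantity`, and `n_regions` are illustrative choices.

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West'],
    'product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'sales': [100, 150, 120, 180, 200, 220, 190, 210],
    'quantity': [10, 15, 12, 18, 20, 22, 19, 21],
})

# Named aggregation: output_name=(input_column, function)
summary = sales.groupby('product').agg(
    total_sales=('sales', 'sum'),
    avg_quantity=('quantity', 'mean'),
    n_regions=('region', 'nunique'),
)
print(summary)

# The same data as a region-by-product table
pivot = sales.pivot_table(index='region', columns='product', values='sales', aggfunc='sum')
print(pivot)
```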

## 7. Time Series <a id='time-series'></a>

Pandas supports date-indexed data directly, including resampling and rolling windows.

In [None]:
# Set date as index
df_ts = df_timeseries.set_index('date')
print("Time series with date index:")
print(df_ts.head())

In [None]:
# Resampling - create daily averages from hourly data
# Create hourly data first
hourly_dates = pd.date_range('2024-01-01', periods=72, freq='h')  # 'H' is deprecated since pandas 2.2
hourly_data = pd.DataFrame({
    'value': np.random.randn(72).cumsum()
}, index=hourly_dates)

print("Daily mean from hourly data:")
print(hourly_data.resample('D').mean().head())
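
Resampling is not limited to a single statistic. The sketch below uses deterministic hourly values (0 to 47, chosen here so the result is easy to check by hand) and computes several daily aggregates at once.

```python
import numpy as np
import pandas as pd

# Two days of hourly values: 0..23 on day one, 24..47 on day two
hourly = pd.DataFrame(
    {'value': np.arange(48, dtype=float)},
    index=pd.date_range('2024-01-01', periods=48, freq='h'),
)

# One row per day, one column per aggregate
daily = hourly['value'].resample('D').agg(['mean', 'min', 'max'])
print(daily)
```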

In [None]:
# Rolling windows
print("3-day rolling average temperature:")
df_ts['temp_rolling_3d'] = df_ts['temperature'].rolling(window=3).mean()
print(df_ts[['temperature', 'temp_rolling_3d']])
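
A fixed-size window returns NaN until it is full, so the first two rows of a 3-row window are empty. As a sketch with made-up temperatures, `min_periods` or a time-based window such as `'3D'` produces partial averages instead.

```python
import pandas as pd

# Hypothetical daily temperatures, for illustration only
temps = pd.Series(
    [60, 62, 64, 66, 68],
    index=pd.date_range('2024-01-01', periods=5, freq='D'),
)

# Fixed window of 3 rows: NaN until three values are available
fixed = temps.rolling(window=3).mean()

# Same window, but start averaging as soon as one value is available
partial = temps.rolling(window=3, min_periods=1).mean()

# Time-based window covering the last 3 days; requires a datetime index
time_based = temps.rolling('3D').mean()

print(pd.DataFrame({'fixed': fixed, 'partial': partial, 'time_based': time_based}))
```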

## 8. Data Visualization <a id='visualization'></a>

Pandas integrates well with matplotlib for quick visualizations.

In [None]:
# Bar chart
df.plot(x='name', y='salary', kind='bar', figsize=(10, 6), title='Salary by Person')
plt.ylabel('Salary ($)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Line plot for time series
df_ts[['temperature', 'humidity']].plot(figsize=(12, 6), title='Temperature and Humidity Over Time')
plt.ylabel('Value')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Histogram
df['age'].plot(kind='hist', bins=10, figsize=(8, 6), title='Age Distribution', edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()

In [None]:
# Scatter plot
df.plot(x='age', y='salary', kind='scatter', figsize=(8, 6), title='Age vs Salary', s=100)
plt.xlabel('Age')
plt.ylabel('Salary ($)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
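
The cells above rely on matplotlib's implicit "current figure". An alternative is to create the figure and axes explicitly and pass `ax` to `DataFrame.plot`, which makes it clear which plot each label applies to. The sketch below assumes a non-interactive backend and writes to an in-memory buffer so it also runs outside a notebook; in a notebook, `plt.show()` would be used instead.

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

df_people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'salary': [70000, 85000, 92000, 78000, 88000],
})

# Create the figure and axes first, then let pandas draw onto them
fig, ax = plt.subplots(figsize=(8, 5))
df_people.plot(x='name', y='salary', kind='bar', ax=ax, legend=False, title='Salary by Person')

# Labels are set on the axes object rather than through pyplot state
ax.set_ylabel('Salary ($)')
ax.tick_params(axis='x', labelrotation=45)
fig.tight_layout()

# Render to an in-memory PNG instead of a file on disk
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
plt.close(fig)
```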

## Conclusion

This notebook covered the fundamentals of modern Pandas:
- Creating and inspecting DataFrames
- Selecting and filtering data
- Manipulating data with various operations
- Grouping and aggregating data
- Working with time series
- Creating visualizations

Feel free to modify and extend this notebook with your own data and analysis!