# Session 7: Pandas - Mastering Data Manipulation

**Objective:** Learn to use the core data structures of the Pandas library, the Series and DataFrame, to import, inspect, and manipulate financial data.

## Introduction

Pandas is a widely used open-source Python library for data manipulation and analysis. It is the de facto standard for working with tabular data (such as spreadsheets or SQL tables) in Python.

**Why is Pandas essential for finance and economics?**
- **Intuitive Data Structures:** The `DataFrame` object makes working with rows and columns of data simple and efficient.
- **Data Handling:** Pandas loads data from many formats (CSV, Excel, SQL databases) and provides tools for common problems such as missing values.
- **Time Series Powerhouse:** Pandas has built-in support for time-stamped data (date-based indexing, resampling, rolling windows), which is fundamental to financial analysis.
- **Integration:** It is built on NumPy and works well with other data science libraries such as Matplotlib and scikit-learn.

## 1. Core Data Structures: Series and DataFrame

First, let's import pandas using its standard alias, `pd`.

In [None]:
import pandas as pd

### The Series
A **Series** is a one-dimensional labeled array, like a single column in a spreadsheet. It pairs an index (the labels) with the data (the values).

In [None]:
# A Series of stock prices
stock_prices = pd.Series([150.25, 151.00, 149.75, 152.50], index=['Mon', 'Tue', 'Wed', 'Thu'])

print(stock_prices)
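
Because a Series carries labels as well as values, elements can be looked up either by label or by position. The following is a minimal sketch that rebuilds the same Series so it runs on its own; the variable names `wed_price`, `first_price`, and `price_change` are illustrative, not part of the session's code.

```python
import pandas as pd

# Same Series as above, rebuilt so this snippet runs on its own
stock_prices = pd.Series(
    [150.25, 151.00, 149.75, 152.50],
    index=['Mon', 'Tue', 'Wed', 'Thu'],
)

# Label-based lookup with .loc
wed_price = stock_prices.loc['Wed']

# Position-based lookup with .iloc (0 is the first element)
first_price = stock_prices.iloc[0]

# Arithmetic is applied element by element and keeps the index labels
price_change = stock_prices - first_price

print(wed_price)
print(first_price)
print(price_change)
print(stock_prices.mean())
```

Using `.loc` and `.iloc` explicitly makes it clear whether a lookup is by label or by position, which matters once the index is something other than the default integers.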

### The DataFrame
A **DataFrame** is a two-dimensional labeled data structure, like a full spreadsheet. It's the most common object you'll work with in Pandas.

In [None]:
# Creating a DataFrame from a dictionary
data = {
    'Ticker': ['AAPL', 'MSFT', 'GOOGL'],
    'Price': [172.2, 305.1, 2855.5],
    'Sector': ['Technology', 'Technology', 'Technology']
}

df = pd.DataFrame(data)
print(df)
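
A quick way to see the two-dimensional structure is to look at the DataFrame's shape, column labels, and per-column data types. This is a small sketch using the same dictionary as above; `price_column` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Ticker': ['AAPL', 'MSFT', 'GOOGL'],
    'Price': [172.2, 305.1, 2855.5],
    'Sector': ['Technology', 'Technology', 'Technology'],
})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.dtypes)         # one data type per column

# Each column of a DataFrame is itself a Series
price_column = df['Price']
print(type(price_column))
```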

## 2. Reading Data and Basic Inspection

A common task is to read data from a CSV file. For this example, we'll simulate a file by wrapping a CSV-formatted string in `StringIO`, which `pd.read_csv` accepts in place of a file path.

In [None]:
from io import StringIO

csv_data = """Date,Open,High,Low,Close,Volume
2023-10-02,150.00,152.50,149.80,152.20,1200000
2023-10-03,152.30,153.10,151.50,151.90,980000
2023-10-04,151.80,152.00,148.90,149.50,1500000
2023-10-05,149.60,151.10,149.00,150.80,1100000
2023-10-06,151.00,154.50,150.90,154.20,1800000
2023-10-09,155.00,157.10,149.90,157.10,1900000
2023-10-10,144.00,160.30,135.90,145.60,1600000
2023-10-11,140.00,159.20,134.50,134.50,1700000
"""

# Read the data into a DataFrame
stock_df = pd.read_csv(StringIO(csv_data))

# Let's inspect it
print("--- First 5 Rows (head) ---")
print(stock_df.head())

print("\n--- Data Types and Info ---")
# info() prints its summary directly and returns None, so it needs no print()
stock_df.info()

print("\n--- Descriptive Statistics ---")
print(stock_df.describe())
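
Read this way, the `Date` column is stored as text rather than as dates. For time-series work it is usually more convenient to parse it while reading. The sketch below is one possible approach, using the `parse_dates` and `index_col` parameters of `pd.read_csv` on two rows copied from the sample data; `sample_csv` and `dated_df` are illustrative names.

```python
from io import StringIO

import pandas as pd

# Two rows copied from the sample data above
sample_csv = """Date,Open,High,Low,Close,Volume
2023-10-02,150.00,152.50,149.80,152.20,1200000
2023-10-03,152.30,153.10,151.50,151.90,980000
"""

# parse_dates converts the Date column to datetime values;
# index_col makes that column the row index
dated_df = pd.read_csv(
    StringIO(sample_csv),
    parse_dates=['Date'],
    index_col='Date',
)

print(dated_df.index)

# Rows can now be looked up by date
print(dated_df.loc[pd.Timestamp('2023-10-03'), 'Close'])
```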

## 3. Data Selection and Filtering

Pandas provides powerful and flexible ways to select the exact data you need.

In [None]:
# Select a single column (this returns a Series)
close_prices = stock_df['Close']
print("--- Close Price Column (Series) ---")
print(close_prices)

# Select multiple columns (this returns a DataFrame)
ohlc_data = stock_df[['Open', 'High', 'Low', 'Close']]
print("\n--- OHLC Data (DataFrame) ---")
print(ohlc_data.head())

# Conditional Filtering (Boolean Indexing)
# Find all days where the trading volume was greater than 1,000,000
high_volume_days = stock_df[stock_df['Volume'] > 1000000]
print("\n--- High Volume Trading Days ---")
print(high_volume_days)
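
Boolean conditions can also be combined, and `.loc` can filter rows and pick columns in a single step. The following is a minimal sketch on a small stand-in frame (`sample_df`, built from the first three rows of the sample data) so that it runs on its own.

```python
import pandas as pd

# Stand-in frame using the first three rows of the sample data
sample_df = pd.DataFrame({
    'Open': [150.00, 152.30, 151.80],
    'Close': [152.20, 151.90, 149.50],
    'Volume': [1200000, 980000, 1500000],
})

# Combine conditions with & (and) or | (or).
# Each condition needs its own parentheses.
high_volume_up_days = sample_df[
    (sample_df['Volume'] > 1_000_000) & (sample_df['Close'] > sample_df['Open'])
]

# .loc takes a row condition and a list of columns together
closes_on_high_volume = sample_df.loc[
    sample_df['Volume'] > 1_000_000, ['Close', 'Volume']
]

print(high_volume_up_days)
print(closes_on_high_volume)
```

Python's `and` and `or` keywords do not work element by element on a Series, which is why the `&` and `|` operators are used here.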

---

## Finance Exercise: Basic Stock Data Analysis

**Task:** Add calculated columns to a stock DataFrame and use them to answer simple questions about the data.

**Scenario:** We will use the `stock_df` DataFrame created earlier.

In [None]:
print("--- Original Data ---")
print(stock_df)

# Step 1: Create a new column 'Daily_Change' which is the difference between the Close and Open price.
stock_df['Daily_Change'] = stock_df['Close'] - stock_df['Open']

print("\n--- Data with Daily Change ---")
print(stock_df)

# Step 2: Create a new column 'Daily_Return' which is the daily change as a percentage of the opening price.
stock_df['Daily_Return'] = (stock_df['Daily_Change'] / stock_df['Open']) * 100

print("\n--- Data with Daily Return (%) ---")
print(stock_df)

# Step 3: Find the day with the highest closing price.
# .idxmax() returns the index of the first occurrence of the maximum.
max_close_day_index = stock_df['Close'].idxmax()
day_with_max_close = stock_df.loc[max_close_day_index]

print("\n--- Day with Highest Closing Price ---")
print(day_with_max_close)

# Step 4: Find all days where the stock closed higher than it opened (a 'green' day).
green_days = stock_df[stock_df['Daily_Change'] > 0]
print("\n--- Days the Stock Closed Higher ---")
print(green_days)
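
The `Daily_Return` column above measures the move from open to close within a single day. Returns are also often measured from one day's close to the next. As a sketch of that alternative (not part of the exercise itself), `Series.pct_change()` computes the change relative to the previous row; the `closes` Series below holds the first three closing prices from the sample data.

```python
import pandas as pd

# First three closing prices from the sample data
closes = pd.Series([152.20, 151.90, 149.50], name='Close')

# pct_change() computes (current - previous) / previous for each row.
# The first row has no previous value, so its result is NaN.
close_to_close_return = closes.pct_change() * 100

print(close_to_close_return.round(2))
```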