# Introduction to Pandas

Pandas is a Python library for data analysis and manipulation. It is built on top of NumPy and is especially powerful for working with tabular data, such as financial time series, transaction records, or any dataset stored in rows and columns.

In this notebook, we will cover:
- Installing pandas and numpy
- Creating and inspecting Series and DataFrames
- Reading and writing data
- Selecting and filtering data
- Basic transformations and computations
- Grouping and merging data
- Simple financial time series analysis

## Installation

If pandas and NumPy are not already installed, install them with pip:

In [None]:
%pip install pandas numpy --quiet

## 1. Importing Libraries and Defining Our Example DataFrame

We define a single DataFrame here and reuse it in most of the examples that follow.
The dataset contains illustrative stock market data.

In [None]:
import pandas as pd
import numpy as np

# Define example stock market data
data = {
    "Symbol": ["AAPL", "MSFT", "GOOG", "AMZN", "TSLA"],
    "Price": [150.0, 280.0, 2700.0, 3400.0, 720.0],
    "Volume": [1000000, 800000, 1200000, 500000, 1500000],
    "Sector": ["Tech", "Tech", "Tech", "E-Commerce", "Automotive"]
}

df = pd.DataFrame(data)
df

### Save to CSV and Reload

Writing a DataFrame to disk and reading it back is a routine step in most analysis workflows. Passing `index=False` keeps the row index from being written as an extra column.

In [None]:
# Save DataFrame to CSV
df.to_csv("stocks.csv", index=False)

# Reload from CSV
df = pd.read_csv("stocks.csv")
df
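To see what `index=False` changes, here is a minimal sketch that performs the same round trip in memory rather than on disk. The two-row `stocks` frame and the use of `io.StringIO` are assumptions made for illustration; they are not part of the notebook's dataset.

```python
import io
import pandas as pd

# Small hypothetical frame with two of the notebook's columns
stocks = pd.DataFrame({
    "Symbol": ["AAPL", "MSFT"],
    "Price": [150.0, 280.0],
})

# With index=False only the data columns are written
without_index = stocks.to_csv(index=False)

# With the default index=True the row labels are written as an unnamed first column
with_index = stocks.to_csv()

# Reading each CSV text back shows the difference in columns
reloaded_clean = pd.read_csv(io.StringIO(without_index))
reloaded_extra = pd.read_csv(io.StringIO(with_index))

print(list(reloaded_clean.columns))
print(list(reloaded_extra.columns))
```

The second frame picks up an extra `Unnamed: 0` column, which is why `index=False` is usually the better choice when the index carries no information.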

## 2. Pandas Data Structures

Pandas provides two main data structures:
- **Series**: a 1D labeled array
- **DataFrame**: a 2D table of data with labeled rows and columns

In [None]:
# Series example: selecting the Price column
prices_series = df["Price"]
prices_series

In [None]:
# DataFrame example: selecting multiple columns
df[["Symbol", "Price"]]
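The relationship between the two structures can also be sketched from scratch. The example below builds a Series by hand and shows that single brackets return a Series while double brackets return a DataFrame. The variable names are hypothetical and the values are a subset of the sample data.

```python
import pandas as pd

# A Series is a column of values plus an index of labels
prices = pd.Series([150.0, 280.0, 2700.0], index=["AAPL", "MSFT", "GOOG"], name="Price")

# Label-based lookup goes through the index
msft_price = prices["MSFT"]

# A DataFrame is a collection of Series that share one index
frame = pd.DataFrame({"Price": prices, "Volume": [1000000, 800000, 1200000]})

one_column_series = frame["Price"]     # single brackets -> Series
one_column_frame = frame[["Price"]]    # double brackets -> DataFrame

print(type(one_column_series).__name__, one_column_series.shape)
print(type(one_column_frame).__name__, one_column_frame.shape)
```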

## 3. Inspecting Data

Useful methods for exploring datasets:

In [None]:
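# A cell displays only its last expression, so earlier results are printed explicitly
print(df.head())    # First 5 rows (the default)
print(df.tail(2))   # Last 2 rows
df.info()           # Column dtypes and non-null counts (prints its own output)
df.describe()       # Summary statistics for the numeric columns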
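The inspection results are ordinary Python and pandas objects, so they can be stored and queried rather than only displayed. The sketch below assumes a three-row subset of the sample data; the names `stocks` and `summary` are chosen for illustration.

```python
import pandas as pd

stocks = pd.DataFrame({
    "Symbol": ["AAPL", "MSFT", "GOOG"],
    "Price": [150.0, 280.0, 2700.0],
    "Volume": [1000000, 800000, 1200000],
})

n_rows, n_cols = stocks.shape          # (rows, columns) tuple
column_names = list(stocks.columns)    # column labels in order
summary = stocks.describe()            # a DataFrame covering the numeric columns

print(n_rows, n_cols)
print(column_names)
print(summary.loc["mean", "Price"])    # individual statistics are addressable by label
```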

## 4. Selecting and Filtering Data

In [None]:
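# A cell displays only its last expression, so earlier results are printed explicitly

# Selecting one column (returns a Series)
print(df["Price"])

# Selecting multiple columns (returns a DataFrame)
print(df[["Symbol", "Volume"]])

# Filtering rows with a boolean condition
df[df["Price"] > 1000]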
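Filtering works because a comparison produces a boolean Series that acts as a row mask. The following sketch, which rebuilds the sample data under the hypothetical name `stocks`, shows how masks can be combined and how `.loc` filters rows and selects columns in one step.

```python
import pandas as pd

stocks = pd.DataFrame({
    "Symbol": ["AAPL", "MSFT", "GOOG", "AMZN", "TSLA"],
    "Price": [150.0, 280.0, 2700.0, 3400.0, 720.0],
    "Volume": [1000000, 800000, 1200000, 500000, 1500000],
    "Sector": ["Tech", "Tech", "Tech", "E-Commerce", "Automotive"],
})

# A comparison yields one True/False value per row
is_expensive = stocks["Price"] > 1000

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
expensive_tech = stocks[is_expensive & (stocks["Sector"] == "Tech")]

# .loc takes a row mask and a list of columns together
liquid_symbols = stocks.loc[stocks["Volume"] >= 1000000, ["Symbol", "Volume"]]

print(expensive_tech["Symbol"].tolist())
print(liquid_symbols["Symbol"].tolist())
```

Python's `and` and `or` keywords do not work element-wise on Series, which is why the `&` and `|` operators are used here.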

## 5. Basic Transformations

In [None]:
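# Add DollarVolume column: price times shares traded.
# (True market capitalization would need shares outstanding, which this dataset lacks.)
df["DollarVolume"] = df["Price"] * df["Volume"]

# Add a formatted price; the result is a string column, meant for display only
df["PriceUSD"] = df["Price"].apply(lambda x: f"${x:.2f}")
df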
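Two details of the cell above are worth spelling out: column arithmetic is element-wise, and a formatted column holds strings rather than numbers. The sketch below illustrates both on a two-row subset of the sample data. It also uses `assign`, which returns a new DataFrame instead of modifying the original; the column name `TradedValue` is a hypothetical choice for this example.

```python
import pandas as pd

stocks = pd.DataFrame({
    "Symbol": ["AAPL", "MSFT"],
    "Price": [150.0, 280.0],
    "Volume": [1000000, 800000],
})

# Row i of the result is computed from row i of each input column
traded_value = stocks["Price"] * stocks["Volume"]

# assign returns a new DataFrame and leaves `stocks` unchanged
enriched = stocks.assign(TradedValue=traded_value)

# Formatting produces strings, so keep the numeric column for calculations
labels = stocks["Price"].map(lambda x: f"${x:.2f}")

print(traded_value.tolist())
print(labels.tolist())
```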

## 6. Grouping Data with groupby

The `groupby` method is used to split data into groups, apply functions, and combine results.

In [None]:
# Example: group by Sector and compute average price and volume
# (selecting several columns requires a list, hence the double brackets)
df.groupby("Sector")[["Price", "Volume"]].mean()

In [None]:
# Example: multiple aggregation functions
df.groupby("Sector").agg({
    "Price": ["mean", "max"],
    "Volume": "sum"
})
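The split, apply, and combine steps can be written out by hand to show what `groupby` does internally at a conceptual level. This sketch rebuilds the sample data under the hypothetical name `stocks`, compares the manual result with the one-line version, and then uses named aggregation, an alternative to the dictionary form that yields flat column names. The output column names (`avg_price` and so on) are made up for this example.

```python
import pandas as pd

stocks = pd.DataFrame({
    "Symbol": ["AAPL", "MSFT", "GOOG", "AMZN", "TSLA"],
    "Price": [150.0, 280.0, 2700.0, 3400.0, 720.0],
    "Volume": [1000000, 800000, 1200000, 500000, 1500000],
    "Sector": ["Tech", "Tech", "Tech", "E-Commerce", "Automotive"],
})

# Split: iterate over (key, sub-frame) pairs.
# Apply: take the mean price of each sub-frame.
# Combine: collect the results into one Series keyed by sector.
manual_means = pd.Series(
    {sector: group["Price"].mean() for sector, group in stocks.groupby("Sector")}
)

# groupby performs the same three steps in one call
grouped_means = stocks.groupby("Sector")["Price"].mean()

# Named aggregation: output_name=(input_column, function)
summary = stocks.groupby("Sector").agg(
    avg_price=("Price", "mean"),
    max_price=("Price", "max"),
    total_volume=("Volume", "sum"),
)

print(summary)
```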

## 7. Combining Data with merge

`merge` combines two DataFrames on one or more shared columns, much like a SQL join. The `how` argument selects the join type; `how="left"` keeps every row of the left DataFrame.

In [None]:
# Example: merging stock info with sector performance
sector_perf = pd.DataFrame({
    "Sector": ["Tech", "E-Commerce", "Automotive"],
    "YTD_Return": [0.15, 0.10, 0.25]
})

merged_df = pd.merge(df, sector_perf, on="Sector", how="left")
merged_df
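The effect of the `how` argument is easiest to see when the two tables do not match perfectly. In the sketch below, the `XOM` row and its `Energy` sector are hypothetical additions with no counterpart in `sector_perf`, and `Automotive` has no matching stock. The `indicator=True` option adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Hypothetical stock list; "Energy" has no entry in sector_perf
stocks = pd.DataFrame({
    "Symbol": ["AAPL", "AMZN", "XOM"],
    "Sector": ["Tech", "E-Commerce", "Energy"],
})
sector_perf = pd.DataFrame({
    "Sector": ["Tech", "E-Commerce", "Automotive"],
    "YTD_Return": [0.15, 0.10, 0.25],
})

# left: keep every stock; unmatched sectors get NaN for YTD_Return
left = pd.merge(stocks, sector_perf, on="Sector", how="left")

# inner: keep only sectors present in both tables
inner = pd.merge(stocks, sector_perf, on="Sector", how="inner")

# outer: keep everything from both sides and label each row's origin
outer = pd.merge(stocks, sector_perf, on="Sector", how="outer", indicator=True)

print(len(left), len(inner), len(outer))
print(outer[["Sector", "_merge"]])
```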

## 8. Simple Financial Time Series Example

In [None]:
# Hypothetical daily closing prices for AAPL
dates = pd.date_range(start="2023-01-01", periods=5, freq="D")
prices = pd.Series([150, 152, 151, 153, 155], index=dates)

# Daily returns: (today / yesterday) - 1; the first value is NaN (no previous day)
returns = prices.pct_change()

# Cumulative returns: compound the daily returns, then subtract 1
cumulative = (1 + returns).cumprod() - 1

pd.DataFrame({
    "Price": prices,
    "Daily Return": returns,
    "Cumulative Return": cumulative
})
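As a sanity check, the two quantities above can be recomputed directly from the prices. The sketch below assumes the same five hypothetical prices and confirms that `pct_change` matches `prices / prices.shift(1) - 1`, and that the compounded return matches the price relative to the first day. The names `manual_returns` and `direct_cumulative` are introduced here for illustration.

```python
import pandas as pd

dates = pd.date_range(start="2023-01-01", periods=5, freq="D")
prices = pd.Series([150, 152, 151, 153, 155], index=dates)

# pct_change is equivalent to dividing by the previous value and subtracting 1
returns = prices.pct_change()
manual_returns = prices / prices.shift(1) - 1

# Compounding daily returns gives the same result as comparing with the first price
cumulative = (1 + returns).cumprod() - 1
direct_cumulative = prices / prices.iloc[0] - 1

# Skip the first row: it is NaN in the return-based series
max_return_gap = (returns.iloc[1:] - manual_returns.iloc[1:]).abs().max()
max_cumulative_gap = (cumulative.iloc[1:] - direct_cumulative.iloc[1:]).abs().max()

print(max_return_gap, max_cumulative_gap)
print(cumulative.iloc[-1])
```

The final cumulative return equals 155 / 150 - 1, or roughly 3.3% over the five days.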