# WiDS 5.0 Market Mood & Moves - Week 1: Pandas Tutorial

**Topic:** Data Manipulation with Pandas

## 1. Setup and Installation
First, we import the libraries used in this tutorial: pandas for data manipulation, NumPy for generating random values, and Matplotlib for plotting.

In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## 2. Creating Dummy Data
Since we are in a tutorial environment, let's create a synthetic dataset that mimics the stock data we will use in the project.

In [9]:
# Fixing the random seed so the random columns are the same on every run
np.random.seed(42)

# Generating a date range
dates = pd.date_range(start='2024-01-01', periods=10, freq='D')

# Creating a DataFrame
data = {
    'Date': dates,
    'Close_Price': [150, 152, 149, 155, 157, 153, 158, 160, 159, 162],
    'Sentiment_Score': np.random.uniform(0, 1, 10), # Random sentiment between 0 and 1
    'Volume': np.random.randint(10000, 50000, 10)
}

df = pd.DataFrame(data)
df.set_index('Date', inplace=True) # A date index enables date-based slicing and resampling

print("Top 5 rows of our dataset:")
display(df.head())

Top 5 rows of our dataset:


| Date | Close_Price | Sentiment_Score | Volume |
|---|---|---|---|
| 2024-01-01 | 150 | 0.37454 | 11685 |
| 2024-01-02 | 152 | 0.950714 | 10769 |
| 2024-01-03 | 149 | 0.731994 | 12433 |
| 2024-01-04 | 155 | 0.598658 | 15311 |
| 2024-01-05 | 157 | 0.156019 | 47819 |
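
To illustrate why a date index is useful for time series, here is a minimal sketch that rebuilds only the `Close_Price` column from the dataset above and then selects rows by date label and averages prices over fixed windows. The names `prices`, `date_slice`, and `five_day_mean`, and the choice of a 5-day window, are assumptions made for illustration and are not part of the project code.

```python
import pandas as pd

# Rebuild a small frame with the same closing prices used above
dates = pd.date_range(start="2024-01-01", periods=10, freq="D")
prices = pd.DataFrame(
    {"Close_Price": [150, 152, 149, 155, 157, 153, 158, 160, 159, 162]},
    index=dates,
)
prices.index.name = "Date"

# Label-based slicing on a DatetimeIndex includes both endpoints
date_slice = prices.loc["2024-01-03":"2024-01-05"]
print(date_slice)

# resample() groups rows into calendar windows; here, 5-day bins averaged
five_day_mean = prices["Close_Price"].resample("5D").mean()
print(five_day_mean)
```

Neither operation would work on the default integer index, which is why the `Date` column is moved into the index before any analysis.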


## 3. Core Operations

### 3.1 Filtering Data
We often need to isolate rows that meet specific criteria, for example days with high trading volume.

In [10]:
# Filter for days where Volume is greater than 30,000
high_volume_days = df[df['Volume'] > 30000]
print("\nHigh Volume Days:")
display(high_volume_days)


High Volume Days:


| Date | Close_Price | Sentiment_Score | Volume |
|---|---|---|---|
| 2024-01-05 | 157 | 0.156019 | 47819 |
| 2024-01-06 | 153 | 0.155995 | 49188 |
| 2024-01-09 | 159 | 0.601115 | 38693 |
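
Filters can also combine several conditions. The sketch below is one possible way to do this; it uses a small frame named `sample` (a hypothetical name) holding the `Close_Price` and `Volume` values of the first five rows displayed earlier, and the thresholds are arbitrary choices for illustration.

```python
import pandas as pd

# Illustrative frame: the first five rows of Close_Price and Volume shown earlier
sample = pd.DataFrame(
    {
        "Close_Price": [150, 152, 149, 155, 157],
        "Volume": [11685, 10769, 12433, 15311, 47819],
    },
    index=pd.date_range("2024-01-01", periods=5, freq="D", name="Date"),
)

# Each condition needs its own parentheses because & binds tighter than >
busy_and_high = sample[(sample["Volume"] > 12000) & (sample["Close_Price"] > 150)]
print(busy_and_high)

# | is the element-wise OR for boolean masks
quiet_or_cheap = sample[(sample["Volume"] < 11000) | (sample["Close_Price"] < 150)]
print(quiet_or_cheap)

# query() expresses the same AND filter as a string
same_as_first = sample.query("Volume > 12000 and Close_Price > 150")
print(same_as_first.equals(busy_and_high))
```

Python's `and` and `or` keywords do not work on boolean Series, so the element-wise operators `&` and `|` are needed outside of `query()`.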


### 3.2 Feature Engineering (Derived Columns)
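In finance, raw prices are less useful than **returns**, because returns are comparable across assets and time periods regardless of price level. Let's calculate both the absolute daily change and the percentage daily return.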

In [11]:
# Calculating Daily Change: Price_t - Price_{t-1}
df['Price_Change'] = df['Close_Price'].diff()

# Calculating Percentage Return (Daily Returns)
df['Daily_Return'] = df['Close_Price'].pct_change() * 100

display(df.head())

| Date | Close_Price | Sentiment_Score | Volume | Price_Change | Daily_Return |
|---|---|---|---|---|---|
| 2024-01-01 | 150 | 0.37454 | 11685 | NaN | NaN |
| 2024-01-02 | 152 | 0.950714 | 10769 | 2.0 | 1.333333 |
| 2024-01-03 | 149 | 0.731994 | 12433 | -3.0 | -1.973684 |
| 2024-01-04 | 155 | 0.598658 | 15311 | 6.0 | 4.026846 |
| 2024-01-05 | 157 | 0.156019 | 47819 | 2.0 | 1.290323 |
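
The percentage return on day t is (P_t - P_{t-1}) / P_{t-1} × 100, where P_t is the closing price on day t. As a sanity check, the sketch below computes the change and the return by hand with `shift()` and compares them with `diff()` and `pct_change()`. It uses only the first five closing prices; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

close = pd.Series(
    [150, 152, 149, 155, 157],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="Close_Price",
)

# shift(1) moves each price down one row, so previous[t] is the price on day t-1
previous = close.shift(1)

# Absolute change and percentage return written out by hand
manual_change = close - previous
manual_return = (close - previous) / previous * 100

# Compare with the built-in helpers, treating NaN in the same position as equal
matches_diff = np.allclose(manual_change, close.diff(), equal_nan=True)
matches_pct = np.allclose(manual_return, close.pct_change() * 100, equal_nan=True)
print(matches_diff, matches_pct)

# The first row has no previous day, so it is NaN; dropna() removes it
clean_returns = manual_return.dropna()
print(clean_returns.round(6))
```

The first row is always `NaN` because there is no earlier price to compare against, so it is usually dropped or ignored before computing statistics on returns.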


### 3.3 Aggregations
We can compute summary statistics such as the mean, sum, or count of a column. Here we take the mean closing price and the maximum sentiment score.

In [12]:
average_price = df['Close_Price'].mean()
max_sentiment = df['Sentiment_Score'].max()

print(f"Average Closing Price: ${average_price:.2f}")
print(f"Maximum Sentiment Score: {max_sentiment:.4f}")

Average Closing Price: $155.50
Maximum Sentiment Score: 0.9507
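
Several statistics can also be computed in one call. The following is a minimal sketch using `Series.agg` and `Series.describe` on the ten closing prices from the dataset above; the names `closes`, `summary`, and `overview` are illustrative.

```python
import pandas as pd

closes = pd.Series(
    [150, 152, 149, 155, 157, 153, 158, 160, 159, 162],
    name="Close_Price",
)

# agg() with a list of function names returns one value per statistic
summary = closes.agg(["mean", "sum", "count", "min", "max"])
print(summary)

# describe() is a shortcut for count, mean, std, min, quartiles, and max
overview = closes.describe()
print(overview)
```

The same `agg` call works on a whole DataFrame, where it returns one column of statistics per numeric column.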
