# **GTSF IC Quant Mentorship**
### **Python for Quantitative Finance Fundamentals**

**Name:** [Your Name Here]  
**GT Username:** [Your GT Username]  
**Due Date:** October 7, 2025  

---

## **Project Overview**

This project tests your understanding of the fundamentals covered in our first four weeks: probability/statistics, time value of money, basic portfolio theory, and financial data analysis using Python.

### **What You'll Demonstrate**
- **Data Handling**: Download, clean, and analyze real financial data
- **Statistical Analysis**: Calculate returns, risk metrics, and correlations  
- **Portfolio Basics**: Apply diversification and risk-return concepts
- **Python Skills**: Use pandas, numpy, and matplotlib effectively

---
## **Setup and Libraries**

In [None]:
# You may only use these libraries (same as original project)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import yfinance as yf

# Configuration
pd.set_option('display.max_columns', 10)
pd.set_option('display.precision', 4)
plt.style.use('default')
plt.rcParams['figure.figsize'] = (10, 6)

print("✅ Environment setup complete!")

---
## **Part 1: Data Collection & Basic Analysis (25 points)**

### **Step 1.1: Download Stock Data**
Download three years of daily data (January 2021 through December 2023) for these assets and create a clean dataset:

**Required Assets:**
- **AAPL** (Apple) - Large tech stock
- **JNJ** (Johnson & Johnson) - Defensive stock  
- **SPY** (S&P 500 ETF) - Market benchmark

In [None]:
def download_stock_data(tickers, start_date, end_date):
    """
    Download adjusted close prices for multiple stocks.
    
    Parameters:
    -----------
    tickers : list
        List of stock symbols
    start_date : str  
        Start date as 'YYYY-MM-DD'
    end_date : str
        End date as 'YYYY-MM-DD'
        
    Returns:
    --------
    pandas.DataFrame
        DataFrame with dates as index, stocks as columns
    """
    ######################
    # YOUR CODE HERE
    # Hint: Use yfinance to download data
    # Hint: Extract only 'Adj Close' prices (newer yfinance versions may return
    #       only an adjusted 'Close' unless you pass auto_adjust=False)
    # Hint: Handle any missing data by forward-filling
    ######################
    pass

# Download the data
tickers = ['AAPL', 'JNJ', 'SPY'] 
prices_df = download_stock_data(tickers, '2021-01-01', '2024-01-01')

# Display first few rows and basic info
print("First 5 rows:")
print(prices_df.head())
print(f"\nDataset shape: {prices_df.shape}")
print(f"Date range: {prices_df.index[0].date()} to {prices_df.index[-1].date()}")

### **Step 1.2: Calculate Returns**
Calculate both simple and log returns as covered in the slides:
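
As a quick toy check of the two definitions, the sketch below uses three made-up prices (not market data) and plain numpy. The variable names are illustrative only; it shows how simple returns compound while log returns add.

```python
import numpy as np

# Made-up closing prices for illustration only (not real market data)
prices = np.array([100.0, 110.0, 99.0])

# Simple return: (P_t / P_{t-1}) - 1
simple_rets = prices[1:] / prices[:-1] - 1

# Log return: ln(P_t / P_{t-1})
log_rets = np.log(prices[1:] / prices[:-1])

# The two are linked by log = ln(1 + simple)
print(np.allclose(log_rets, np.log1p(simple_rets)))

# Total return over the whole window
total_simple = prices[-1] / prices[0] - 1

# Log returns add across time; simple returns compound
print(np.isclose(log_rets.sum(), np.log1p(total_simple)))
print(np.isclose(np.prod(1 + simple_rets) - 1, total_simple))
```

For small daily moves the two return types are nearly equal; the gap widens as the move gets larger.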

In [None]:
def calculate_returns(prices_df):
    """
    Calculate simple and log returns from price data.
    
    Returns:
    --------
    tuple: (simple_returns_df, log_returns_df)
    """
    ######################
    # YOUR CODE HERE
    # Simple returns: (P_t / P_{t-1}) - 1
    # Log returns: ln(P_t / P_{t-1})
    ######################
    pass

# Calculate returns
simple_returns, log_returns = calculate_returns(prices_df)

print("Simple returns - first 5 rows:")
print(simple_returns.head())

### **Questions - Part 1**
**Answer these questions in the markdown cell below:**

1. How many trading days of data do you have for each stock?
2. What's the difference between the largest daily simple return and the largest daily log return for AAPL?
3. Why do we typically use adjusted close prices instead of regular close prices?

### **Your Answers - Part 1:**

1. [Your answer here]

2. [Your answer here]

3. [Your answer here]

---
## **Part 2: Risk and Return Analysis (25 points)**

### **Step 2.1: Basic Statistics**
Calculate the key statistics we covered in class:
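
To make the annualization conventions concrete, here is a minimal sketch on made-up daily figures (the numbers and constant names are assumptions for illustration, not estimates from any stock). It applies the 252-day scaling and the 2% annual risk-free rate used in this project.

```python
import numpy as np

TRADING_DAYS = 252        # trading days per year, as used in this project
RISK_FREE_ANNUAL = 0.02   # 2% annual risk-free rate, as used in this project

# Made-up daily figures for illustration only
daily_mean = 0.0005
daily_std = 0.01

# The mean scales with time; the standard deviation scales with sqrt(time)
annual_return = daily_mean * TRADING_DAYS
annual_vol = daily_std * np.sqrt(TRADING_DAYS)

# Sharpe ratio: annualized excess return per unit of annualized volatility
sharpe = (annual_return - RISK_FREE_ANNUAL) / annual_vol

print(f"Annualized return:     {annual_return:.2%}")
print(f"Annualized volatility: {annual_vol:.2%}")
print(f"Sharpe ratio:          {sharpe:.3f}")
```

Note that the risk-free rate is subtracted from the annualized return, not from the daily mean.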

In [None]:
def calculate_basic_statistics(returns_df):
    """
    Calculate mean, standard deviation, and annualized metrics.
    
    Returns:
    --------
    pandas.DataFrame
        Table with statistics for each stock
    """
    stats_dict = {}
    
    for stock in returns_df.columns:
        ######################
        # YOUR CODE HERE
        # Calculate:
        # - Daily mean return
        # - Daily standard deviation  
        # - Annualized return (daily_mean * 252)
        # - Annualized volatility (daily_std * sqrt(252))
        # - Sharpe ratio (assume a 2% annual risk-free rate; use the annualized figures)
        ######################
        pass
    
    return pd.DataFrame(stats_dict).T

# Calculate and display statistics
stats_table = calculate_basic_statistics(simple_returns)
print("Return and Risk Statistics:")
print(stats_table)

### **Step 2.2: Risk-Return Visualization** 
Create the classic risk-return scatter plot from the slides:

In [None]:
def plot_risk_return(stats_df):
    """
    Create a risk-return scatter plot.
    """
    plt.figure(figsize=(10, 8))
    
    ######################
    # YOUR CODE HERE  
    # Create scatter plot with:
    # - x-axis: Annualized Volatility
    # - y-axis: Annualized Return
    # - Label each point with stock symbol
    # - Add proper title and axis labels
    ######################
    
    plt.show()

plot_risk_return(stats_table)

### **Questions - Part 2**

4. Which stock has the highest Sharpe ratio? What does this mean?
5. Does the risk-return relationship match what you'd expect from the slides?
6. How does SPY compare to the individual stocks in terms of risk?

### **Your Answers - Part 2:**

4. [Your answer here]

5. [Your answer here]

6. [Your answer here]

---
## **Part 3: Correlation and Diversification (25 points)**

### **Step 3.1: Correlation Analysis**
Calculate and analyze correlations as discussed in portfolio theory:

In [None]:
def analyze_correlations(returns_df):
    """
    Calculate correlation matrix and analyze diversification benefits.
    """
    # Calculate correlation matrix
    corr_matrix = returns_df.corr()
    
    print("Correlation Matrix:")
    print(corr_matrix)
    
    # Create correlation heatmap
    plt.figure(figsize=(8, 6))
    ######################
    # YOUR CODE HERE
    # Create a heatmap of the correlation matrix
    # Hint: You can use plt.imshow() with a colormap
    # Add colorbar and proper labels
    ######################
    plt.show()
    
    return corr_matrix

correlation_matrix = analyze_correlations(simple_returns)

### **Step 3.2: Portfolio Construction**
Build a simple equal-weighted portfolio and analyze its performance:
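
The diversification effect can be checked numerically. The sketch below is a toy illustration on synthetic returns for two hypothetical assets (not AAPL or JNJ data); it compares the volatility of a 50/50 portfolio with the standard two-asset variance formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns for two hypothetical assets (not real data)
a = rng.normal(0.0005, 0.02, size=500)
b = 0.3 * a + rng.normal(0.0003, 0.01, size=500)

w = 0.5  # equal weights
portfolio = w * a + (1 - w) * b

sd_a, sd_b = a.std(ddof=1), b.std(ddof=1)
rho = np.corrcoef(a, b)[0, 1]

# Two-asset formula:
# var_p = w^2 var_a + (1-w)^2 var_b + 2 w (1-w) rho sd_a sd_b
formula_sd = np.sqrt(
    w**2 * sd_a**2 + (1 - w)**2 * sd_b**2 + 2 * w * (1 - w) * rho * sd_a * sd_b
)

# The formula matches the volatility computed directly from portfolio returns
print(np.isclose(portfolio.std(ddof=1), formula_sd))

# With rho < 1, portfolio volatility is below the weighted average volatility
print(formula_sd < w * sd_a + (1 - w) * sd_b)
```

The lower the correlation, the larger the gap between the weighted average volatility and the portfolio volatility.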

In [None]:
def create_equal_weighted_portfolio(returns_df):
    """
    Create an equal-weighted portfolio of AAPL and JNJ.
    (Exclude SPY since it's our benchmark)
    """
    # Create equal-weighted portfolio of individual stocks
    portfolio_returns = returns_df[['AAPL', 'JNJ']].mean(axis=1)
    
    ######################
    # YOUR CODE HERE
    # Calculate portfolio statistics:
    # - Mean return (annualized)  
    # - Standard deviation (annualized)
    # - Sharpe ratio
    ######################
    
    return portfolio_returns

# Create portfolio and compare to individual stocks
portfolio_rets = create_equal_weighted_portfolio(simple_returns)

# Calculate portfolio statistics
port_mean = portfolio_rets.mean() * 252
port_std = portfolio_rets.std() * np.sqrt(252)
port_sharpe = (port_mean - 0.02) / port_std

print("Portfolio Statistics:")
print(f"Annual Return: {port_mean:.2%}")
print(f"Annual Volatility: {port_std:.2%}")  
print(f"Sharpe Ratio: {port_sharpe:.3f}")

### **Questions - Part 3**

7. What is the correlation between AAPL and JNJ? Is this good or bad for diversification?
8. How does the portfolio's risk compare to the average risk of AAPL and JNJ individually?
9. Calculate the "diversification benefit": (Average individual volatility) - (Portfolio volatility)

### **Your Answers - Part 3:**

7. [Your answer here]

8. [Your answer here]

9. [Your answer here]

---
## **Part 4: Market Relationships (Beta Analysis) (15 points)**

### **Step 4.1: Beta Calculation**
Calculate beta using the regression approach from the slides:
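
The covariance formula and the regression slope give the same number, which the toy sketch below checks on synthetic data. The "market" and "stock" series are made up with a known beta of 1.3, and `np.polyfit` is used here only as a numpy stand-in for an ordinary least squares fit.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic daily returns: a hypothetical market and a stock built with beta 1.3
market = rng.normal(0.0004, 0.01, size=750)
stock = 1.3 * market + rng.normal(0.0, 0.008, size=750)

# Method 1: covariance formula (same ddof in numerator and denominator)
beta_cov = np.cov(stock, market, ddof=1)[0, 1] / np.var(market, ddof=1)

# Method 2: slope of a least-squares line; polyfit returns [slope, intercept]
beta_ols, alpha = np.polyfit(market, stock, deg=1)

print(f"Covariance beta: {beta_cov:.4f}")
print(f"OLS slope beta:  {beta_ols:.4f}")
```

Both estimates land near 1.3 but not exactly on it, because the noise term adds sampling error.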

In [None]:
def calculate_beta(stock_returns, market_returns):
    """
    Calculate a stock's beta relative to the market.
    
    Beta = Covariance(Stock, Market) / Variance(Market)
    """
    ######################
    # YOUR CODE HERE
    # Method 1: Using the covariance formula
    # Method 2: Using sklearn.LinearRegression
    # Compare both results
    ######################
    pass

# Calculate betas for AAPL and JNJ vs SPY
aapl_beta = calculate_beta(simple_returns['AAPL'], simple_returns['SPY'])
jnj_beta = calculate_beta(simple_returns['JNJ'], simple_returns['SPY'])

print(f"AAPL Beta: {aapl_beta:.3f}")
print(f"JNJ Beta: {jnj_beta:.3f}")

### **Step 4.2: Beta Visualization**
Create scatter plots showing the relationship between stock and market returns:

In [None]:
def plot_beta_relationship(stock_returns, market_returns, stock_name, beta):
    """
    Create scatter plot of stock vs market returns with regression line.
    """
    plt.figure(figsize=(10, 8))
    
    ######################
    # YOUR CODE HERE
    # Create scatter plot of stock returns vs market returns
    # Add regression line 
    # Include beta value in title
    # Add proper axis labels
    ######################
    
    plt.show()

# Create plots for both stocks
plot_beta_relationship(simple_returns['AAPL'], simple_returns['SPY'], 'AAPL', aapl_beta)
plot_beta_relationship(simple_returns['JNJ'], simple_returns['SPY'], 'JNJ', jnj_beta)

### **Questions - Part 4**

10. Which stock is more sensitive to market movements? How do you know?
11. Based on beta, which stock would you expect to fall more in a market crash?
12. Do the betas make intuitive sense given what you know about these companies?

### **Your Answers - Part 4:**

10. [Your answer here]

11. [Your answer here]

12. [Your answer here]

---
## **Part 5: Time Series Analysis (15 points)**

### **Step 5.1: Cumulative Returns**
Calculate and plot cumulative returns to show total performance:

In [None]:
def analyze_cumulative_returns(returns_df):
    """
    Calculate and plot cumulative returns over time.
    """
    # Calculate cumulative returns (compound growth)
    cum_returns = (1 + returns_df).cumprod()
    
    # Plot cumulative returns
    plt.figure(figsize=(12, 8))
    ######################
    # YOUR CODE HERE
    # Plot cumulative returns for all three assets
    # Add legend, title, and axis labels
    # Show which investment performed best over time
    ######################
    plt.show()
    
    return cum_returns

cumulative_returns = analyze_cumulative_returns(simple_returns)

# Calculate total returns over the period
total_returns = cumulative_returns.iloc[-1] - 1
print("Total Returns over the period:")
for asset in total_returns.index:
    print(f"{asset}: {total_returns[asset]:.2%}")

### **Step 5.2: Rolling Statistics**
Calculate rolling volatility to see how risk changes over time:
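
To see what a rolling window picks up, the sketch below builds a synthetic return series with a calm regime followed by a turbulent one (made-up parameters, not real data) and applies the same 60-day rolling standard deviation with 252-day annualization.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic daily returns: a calm regime followed by a turbulent one
calm = rng.normal(0.0, 0.005, size=250)
turbulent = rng.normal(0.0, 0.02, size=250)
returns = pd.Series(np.concatenate([calm, turbulent]))

window = 60
rolling_vol = returns.rolling(window=window).std() * np.sqrt(252)

# The first window-1 entries are NaN because the window is not yet full
print(rolling_vol.isna().sum())

# Compare the estimate at the end of the calm regime with the final estimate
print(rolling_vol.iloc[249], rolling_vol.iloc[-1])
```

A shorter window reacts faster to a regime change but gives a noisier estimate; a longer window is smoother but lags.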

In [None]:
def plot_rolling_volatility(returns_df, window=60):
    """
    Plot rolling annualized volatility for all assets (default window: 60 trading days).
    """
    # Calculate rolling standard deviation
    rolling_vol = returns_df.rolling(window=window).std() * np.sqrt(252)
    
    plt.figure(figsize=(12, 8))
    ######################
    # YOUR CODE HERE
    # Plot rolling volatility for all assets
    # Add title indicating the window size
    ######################
    plt.show()

plot_rolling_volatility(simple_returns, window=60)

### **Questions - Part 5**

13. Which asset had the best total return? Was this expected based on risk levels?
14. During which time periods was volatility highest? Can you guess why?
15. How does rolling volatility help us understand changing market conditions?

### **Your Answers - Part 5:**

13. [Your answer here]

14. [Your answer here]

15. [Your answer here]

---
## **Part 6: Summary Analysis and Interpretation (10 points)**

Write a brief analysis (300-500 words) answering these questions:

### **Investment Summary**
Based on your analysis, write responses to these prompts:

1. **Risk-Return Profile**: Summarize the risk and return characteristics of each asset. Which offered the best risk-adjusted returns?

2. **Diversification Benefits**: Explain whether combining AAPL and JNJ in a portfolio provided diversification benefits. Use specific numbers from your analysis.

3. **Market Sensitivity**: Compare how AAPL and JNJ respond to market movements using your beta analysis. What does this mean for an investor?

4. **Time-Varying Risk**: Describe how volatility changed over your sample period. What events might explain these changes?

5. **Investment Recommendation**: If you had to choose between investing in individual stocks or the diversified portfolio, what would you recommend and why?

### **My Analysis**

[Write your 300-500 word analysis here, referencing specific numbers and charts from your work above]

---
## **Bonus Section: Probability Application (5 extra points)**

Apply probability concepts from Week 1 slides:
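
As a warm-up, here is a minimal Monte Carlo sketch under geometric Brownian motion for a single hypothetical asset. The drift and volatility are made-up assumptions rather than estimates from the project data, and the approach (summing normal daily log returns) is one common way to simulate GBM, not necessarily the one from the slides.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up annual parameters for a hypothetical asset (not estimated from data)
mu, sigma = 0.08, 0.20
num_simulations, time_horizon = 1000, 252
dt = 1 / 252

# Under GBM, each daily log return is normal with
# mean (mu - sigma^2 / 2) * dt and standard deviation sigma * sqrt(dt)
log_returns = rng.normal(
    (mu - 0.5 * sigma**2) * dt,
    sigma * np.sqrt(dt),
    size=(num_simulations, time_horizon),
)

# Terminal value of 1 unit invested: exponentiate the summed log returns
final_values = np.exp(log_returns.sum(axis=1))

# Share of simulated paths that end below the starting value
prob_loss = np.mean(final_values < 1.0)

print(f"Mean final value: {final_values.mean():.3f}")
print(f"Estimated probability of loss: {prob_loss:.1%}")
```

Because the terminal values are exponentials of normal sums, they are always positive and right-skewed, which is worth keeping in mind when reading the histogram.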

In [None]:
def simulate_portfolio_outcomes(returns_df, num_simulations=1000, time_horizon=252):
    """
    Use Monte Carlo simulation to project potential portfolio outcomes.
    Assume daily returns are normally distributed and that the price follows
    a geometric Brownian motion (under which log returns are normal).
    """
    ######################
    # YOUR CODE HERE
    # 1. Calculate historical mean and std for portfolio
    # 2. Simulate random returns for next year 
    # 3. Calculate final portfolio values
    # 4. Create histogram of outcomes
    # 5. Calculate probability of losing money
    ######################
    pass

# Run simulation
simulate_portfolio_outcomes(simple_returns)

---
## **Submission Checklist**

Before submitting, make sure you have:

- [ ] Filled in your name and GT username at the top
- [ ] Completed all code sections with working implementations
- [ ] Answered all 15 numbered questions in the markdown cells
- [ ] Written the 300-500 word summary analysis
- [ ] All code cells run without errors
- [ ] All plots display correctly
- [ ] Saved the notebook as `quant_intro_project_[GTUsername].ipynb`
- [ ] Exported a PDF version showing all outputs
- [ ] Pushed your code to your personal Git repo

**Good luck!**