# FINMA Python Lab 08: NumPy, Pandas and Matplotlib for Financial Analysis

## Overview

In this lab, you'll practice:
- NumPy arrays and operations
- Financial calculations with NumPy
- Pandas DataFrames for tabular data
- Reading and writing data with Pandas
- Data manipulation and aggregation
- Time series analysis
- Calculating financial metrics
- Portfolio analysis

**Important:** Complete your work and have it manually checked by your instructor.



---

## Programming Exercises

### Exercise 1: Portfolio Value Calculation (NumPy)

Using NumPy arrays:
1. Create arrays for: symbols, shares, and prices
2. Calculate the value of each position
3. Calculate the total portfolio value
4. Calculate the percentage allocation for each position

**Data:**
- AAPL: 150 shares at $163.75
- GOOGL: 75 shares at $155.30
- MSFT: 100 shares at $406.20
- AMZN: 60 shares at $200.45
- TSLA: 40 shares at $276.30

```python
# Exercise 1: Portfolio Value Calculation
import numpy as np

symbols = np.array(["AAPL", "GOOGL", "MSFT", "AMZN", "TSLA"])
shares = np.array([150, 75, 100, 60, 40])
prices = np.array([163.75, 155.30, 406.20, 200.45, 276.30])

position_values = shares * prices                 # 2. value of each position
total_value = np.sum(position_values)             # 3. total portfolio value
allocation = position_values / total_value * 100  # 4. percentage allocation

print("Portfolio Value per Stock:", position_values)
print("Total Portfolio Value:", total_value)
print("Allocation per Stock:", allocation)
```

```
Portfolio Value per Stock: [24562.5 11647.5 40620.  12027.  11052. ]
Total Portfolio Value: 99909.0
Allocation per Stock: [24.58487223 11.65810888 40.65699787 12.03795454 11.06206648]
```


### Exercise 2: Return Statistics (NumPy)

Given daily returns for a stock, calculate:
1. Average daily return
2. Volatility (standard deviation)
3. Annualized return (assume 252 trading days)
4. Annualized volatility
5. Sharpe ratio (assume risk-free rate = 2%)

**Formula:** Sharpe Ratio = (Annualized Return - Risk Free Rate) / Annualized Volatility

**Data:** Use returns array: `[1.2, -0.5, 0.8, 1.5, -0.3, 0.9, 1.1, -0.7, 1.3, 0.6]`

```python
# Exercise 2: Return Statistics
# Write your code here
```


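One possible sketch for Exercise 2. It assumes the returns are percentages (1.2 means 1.2%), uses the sample standard deviation (`ddof=1`), and annualizes by simple scaling: mean × 252 and std × sqrt(252). The function name `return_statistics` is illustrative.

```python
import numpy as np

# Daily returns from the exercise, assumed to be in percent (1.2 means 1.2%).
returns_pct = np.array([1.2, -0.5, 0.8, 1.5, -0.3, 0.9, 1.1, -0.7, 1.3, 0.6])


def return_statistics(returns_pct, risk_free=0.02, periods=252):
    """Daily and annualized return statistics plus the Sharpe ratio."""
    r = np.asarray(returns_pct, dtype=float) / 100  # percent -> decimal
    mean_daily = r.mean()                           # 1. average daily return
    vol_daily = r.std(ddof=1)                       # 2. sample standard deviation
    annual_return = mean_daily * periods            # 3. simple (arithmetic) scaling
    annual_vol = vol_daily * np.sqrt(periods)       # 4. square-root-of-time rule
    sharpe = (annual_return - risk_free) / annual_vol  # 5. Sharpe ratio
    return {
        "mean_daily": mean_daily,
        "vol_daily": vol_daily,
        "annual_return": annual_return,
        "annual_vol": annual_vol,
        "sharpe": sharpe,
    }


stats = return_statistics(returns_pct)
for name, value in stats.items():
    print(f"{name:>14}: {value:.4f}")
```

A compounded annual return, `(1 + mean_daily) ** 252 - 1`, is a common alternative to simple scaling; whichever you choose, use it consistently in the Sharpe ratio.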

### Exercise 3: Price Matrix Analysis (NumPy)

Create a 2D NumPy array representing 10 days of prices for 5 stocks.

Calculate:
1. Average price for each stock (across all days)
2. Highest price for each stock
3. Lowest price for each stock
4. Which stock had the highest average price?
5. Which day had the highest average price across all stocks?

**Use the first 10 days from stock_prices_timeseries.csv**

```python
# Exercise 3: Price Matrix Analysis
# Write your code here
```


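A possible sketch for Exercise 3. The `Date` column is mentioned in Exercise 6; the rest of the layout (one price column per stock) is an assumption about `stock_prices_timeseries.csv`. Rows are days and columns are stocks, so `axis=0` summarizes each stock and `axis=1` summarizes each day. Both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def load_price_matrix(path="stock_prices_timeseries.csv", n_days=10):
    """Return (prices, symbols, dates) for the first n_days rows.

    Assumes a wide layout: a 'Date' column plus one price column per stock.
    """
    df = pd.read_csv(path, index_col="Date", parse_dates=True).head(n_days)
    return df.to_numpy(dtype=float), list(df.columns), list(df.index)


def price_matrix_stats(prices, symbols):
    """prices has shape (days, stocks): rows are days, columns are stocks."""
    avg_per_stock = prices.mean(axis=0)  # 1. average down each column
    max_per_stock = prices.max(axis=0)   # 2. highest price per stock
    min_per_stock = prices.min(axis=0)   # 3. lowest price per stock
    avg_per_day = prices.mean(axis=1)    # average across each row (day)
    return {
        "avg": dict(zip(symbols, avg_per_stock)),
        "max": dict(zip(symbols, max_per_stock)),
        "min": dict(zip(symbols, min_per_stock)),
        "top_stock": symbols[int(np.argmax(avg_per_stock))],  # 4.
        "top_day_index": int(np.argmax(avg_per_day)),         # 5.
    }


# Usage with the lab file:
# prices, symbols, dates = load_price_matrix()
# stats = price_matrix_stats(prices, symbols)
# print("Highest average price:", stats["top_stock"])
# print("Highest average day:", dates[stats["top_day_index"]])
```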

### Exercise 4: DataFrame Creation and Analysis (Pandas)

Create a DataFrame from the company_info.csv file and:
1. Display the first 5 rows
2. Show summary statistics
3. Find the company with the highest market cap
4. Find the company with the highest P/E ratio
5. Calculate the average dividend yield by sector
6. List all Technology sector companies

```python
# Exercise 4: DataFrame Creation and Analysis
import pandas as pd

df = pd.read_csv("company_info.csv")
# print(df.head())      # 1. first 5 rows
# print(df.describe())  # 2. summary statistics

# 3. Highest market cap: sort descending, then take the first row by position.
# sort_values keeps the original index labels, so .at[0, ...] would return the
# row originally labelled 0, not the largest one; .iloc[0] is the top row.
sorteddf = df.sort_values("market_cap", ascending=False)
max_comp = sorteddf.iloc[0]["company"]
max_marketcap = sorteddf.iloc[0]["market_cap"]
# print("Company with max market cap is", max_comp, "at", max_marketcap)

# Simpler alternative: idxmax() returns the index label of the maximum,
# which .loc then looks up directly.
top_market_cap_row = df.loc[df["market_cap"].idxmax()]

# 4. Highest P/E ratio
top_pe_row = df.loc[df["pe_ratio"].idxmax()]

# Average dividend yield across all companies
avg_yield = df["dividend_yield"].mean()
# print("Average dividend yield is", round(avg_yield, 2))

# 5a. Average dividend yield for one sector, using a boolean filter
print(df.loc[df["sector"] == "Technology", "dividend_yield"].mean())

# 5b. Average dividend yield for every sector at once, using groupby
sector_yields = df.groupby("sector")["dividend_yield"].mean()
print(sector_yields)
```

```
0.18571428571428572
sector
Consumer Discretionary    2.500000
Consumer Staples          2.333333
Energy                    3.633333
Financials                2.533333
Healthcare                2.366667
Technology                0.185714
Name: dividend_yield, dtype: float64
```

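The cell above leaves steps 1, 2 and 6 commented out or missing. One possible sketch that answers all six steps in one place, reusing the column names from that cell (`company`, `sector`, `market_cap`, `pe_ratio`, `dividend_yield`); the function name `analyze_companies` is illustrative.

```python
import pandas as pd


def analyze_companies(df: pd.DataFrame) -> dict:
    """Answer the six Exercise 4 questions for a company_info-style DataFrame.

    Assumes the columns 'company', 'sector', 'market_cap', 'pe_ratio'
    and 'dividend_yield' exist, as used in the cell above.
    """
    return {
        "head": df.head(),                                                # 1.
        "summary": df.describe(),                                         # 2.
        "top_market_cap": df.loc[df["market_cap"].idxmax(), "company"],   # 3.
        "top_pe": df.loc[df["pe_ratio"].idxmax(), "company"],             # 4.
        "yield_by_sector": df.groupby("sector")["dividend_yield"].mean(), # 5.
        "tech_companies": df.loc[df["sector"] == "Technology", "company"].tolist(),  # 6.
    }


# Usage with the lab file:
# results = analyze_companies(pd.read_csv("company_info.csv"))
# for name, value in results.items():
#     print(f"--- {name} ---")
#     print(value)
```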

### Exercise 5: Transaction Analysis (Pandas)

Read transactions.csv and:
1. Calculate total shares bought and sold for each symbol
2. Calculate total commission paid
3. Calculate average buy price and average sell price for each symbol
4. Determine current holdings (shares bought - shares sold)
5. Export a summary to CSV with columns: symbol, shares_held, avg_buy_price, total_commission

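```python
# Exercise 5: Transaction Analysis
import pandas as pd

trades = pd.read_csv("transactions.csv")

# 2. Total commission paid across all trades
# print("Total commission is", round(trades["commission"].sum(), 2))

# 1 & 3. Total shares and average price per symbol, split by BUY/SELL
avg_prices = trades.groupby(["symbol", "action"]).agg({"price": "mean", "shares": "sum"})
# print(avg_prices)

# 4. Current holdings = shares bought - shares sold.
# fill_value=0 treats a symbol with no SELL rows as 0 shares sold. Aligning the
# two Series introduces missing values for those symbols before the fill, which
# is why the result below is float rather than int.
buys = trades.loc[trades["action"] == "BUY"].groupby("symbol")["shares"].sum()
sells = trades.loc[trades["action"] == "SELL"].groupby("symbol")["shares"].sum()
holdings = buys.subtract(sells, fill_value=0)
print(holdings)
```

```
symbol
AAPL     200.0
AMZN      15.0
BAC      100.0
COP       70.0
CVX       55.0
GOOGL     15.0
HD        35.0
JNJ       60.0
JPM       50.0
KO        90.0
META      25.0
MSFT      60.0
NVDA      10.0
PFE      120.0
PG        40.0
TSLA      15.0
UNH       25.0
WMT       45.0
XOM       80.0
Name: shares, dtype: float64
```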

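The cell above covers holdings (step 4) but not the per-symbol averages or the export (steps 3 and 5). A possible sketch, assuming the same `transactions.csv` columns used above (`symbol`, `action`, `shares`, `price`, `commission`) and taking "average price" as the simple mean of trade prices, as the cell above does. The function name and the output file name `transaction_summary.csv` are hypothetical.

```python
import pandas as pd


def summarize_transactions(trades: pd.DataFrame) -> pd.DataFrame:
    """Per-symbol summary of a transactions-style DataFrame."""
    is_buy = trades["action"] == "BUY"
    is_sell = trades["action"] == "SELL"

    summary = pd.DataFrame({
        "shares_bought": trades[is_buy].groupby("symbol")["shares"].sum(),
        "shares_sold": trades[is_sell].groupby("symbol")["shares"].sum(),
        "avg_buy_price": trades[is_buy].groupby("symbol")["price"].mean(),
        "avg_sell_price": trades[is_sell].groupby("symbol")["price"].mean(),
        "total_commission": trades.groupby("symbol")["commission"].sum(),
    })
    # Symbols never bought or never sold show up as NaN after alignment.
    share_cols = ["shares_bought", "shares_sold"]
    summary[share_cols] = summary[share_cols].fillna(0)
    summary["shares_held"] = summary["shares_bought"] - summary["shares_sold"]
    summary.index.name = "symbol"
    return summary.reset_index()


# Usage with the lab file:
# trades = pd.read_csv("transactions.csv")
# print("Total commission:", round(trades["commission"].sum(), 2))
# summary = summarize_transactions(trades)
# cols = ["symbol", "shares_held", "avg_buy_price", "total_commission"]
# summary[cols].to_csv("transaction_summary.csv", index=False)
```

A share-weighted average buy price (total spent divided by shares bought) is often more meaningful than a simple mean when trade sizes differ.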

### Exercise 6: Time Series Returns (Pandas)

Using stock_prices_timeseries.csv:
1. Read the data with Date as index
2. Calculate daily returns for all stocks
3. Calculate cumulative returns for all stocks
4. Find which stock had:
   - Highest total return
   - Lowest total return
   - Highest volatility (std of daily returns)
   - Lowest volatility
5. Export the daily returns to CSV

```python
# Exercise 6: Time Series Returns
# Write your code here
```


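A possible sketch for Exercise 6, assuming the time series has one price column per stock once `Date` is the index. Total return is read from the last row of the cumulative returns, and volatility is pandas' default sample standard deviation of daily returns. The function name and the output file `daily_returns.csv` are hypothetical.

```python
import pandas as pd


def return_report(prices: pd.DataFrame):
    """Daily returns, cumulative returns and best/worst stocks."""
    daily = prices.pct_change().dropna(how="all")  # first row has no prior price
    cumulative = (1 + daily).cumprod() - 1
    total = cumulative.iloc[-1]                    # total return per stock
    vol = daily.std()                              # volatility per stock
    extremes = {
        "highest_total_return": total.idxmax(),
        "lowest_total_return": total.idxmin(),
        "highest_volatility": vol.idxmax(),
        "lowest_volatility": vol.idxmin(),
    }
    return daily, cumulative, extremes


# Usage with the lab file:
# prices = pd.read_csv("stock_prices_timeseries.csv", index_col="Date", parse_dates=True)
# daily, cumulative, extremes = return_report(prices)
# print(extremes)
# daily.to_csv("daily_returns.csv")
```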

### Exercise 7: Moving Averages (Pandas)

Using stock_prices_timeseries.csv:
1. Calculate 5-day and 10-day moving averages for AAPL
2. Identify days where price crosses above the 5-day MA (potential buy signal)
3. Identify days where price crosses below the 5-day MA (potential sell signal)
4. Calculate how many buy and sell signals occurred
5. Create a new DataFrame with columns: Date, Price, MA5, MA10, Signal

```python
# Exercise 7: Moving Averages
# Write your code here
```


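A possible sketch for Exercise 7. It defines a cross above as "price at or below MA5 yesterday, above MA5 today" and mirrors that for a cross below; days without a full 5-day window on both days get no signal. Using an empty string for "no signal" and the `AAPL` column name are assumptions, and `ma_signals` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def ma_signals(price: pd.Series) -> pd.DataFrame:
    """Moving averages and 5-day MA crossover signals for one price Series."""
    ma5 = price.rolling(window=5).mean()
    ma10 = price.rolling(window=10).mean()

    prev_price, prev_ma5 = price.shift(1), ma5.shift(1)
    # Only compare days where both today's and yesterday's MA5 exist.
    valid = ma5.notna() & prev_ma5.notna()
    cross_up = valid & (prev_price <= prev_ma5) & (price > ma5)
    cross_down = valid & (prev_price >= prev_ma5) & (price < ma5)

    signal = np.where(cross_up, "BUY", np.where(cross_down, "SELL", ""))
    return pd.DataFrame({
        "Date": price.index,
        "Price": price.to_numpy(),
        "MA5": ma5.to_numpy(),
        "MA10": ma10.to_numpy(),
        "Signal": signal,
    })


# Usage with the lab file (assumes an 'AAPL' column):
# prices = pd.read_csv("stock_prices_timeseries.csv", index_col="Date", parse_dates=True)
# table = ma_signals(prices["AAPL"])
# print("Buy signals:", (table["Signal"] == "BUY").sum())
# print("Sell signals:", (table["Signal"] == "SELL").sum())
```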

### Exercise 8: Portfolio Performance Report (Combined)

Create a comprehensive portfolio performance report:

1. Read transactions.csv to determine current holdings
2. Read stock_prices_timeseries.csv to get latest prices
3. Merge company_info.csv to get sector information
4. Calculate for each holding:
   - Current value
   - Cost basis
   - Unrealized gain/loss
   - Return percentage
5. Calculate portfolio-level metrics:
   - Total value
   - Total cost
   - Total gain/loss
   - Overall return %
   - Allocation by sector
6. Export to CSV with all details

**Hint:** Use the last price in the time series as the current price

```python
# Exercise 8: Portfolio Performance Report
# Write your code here
```


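A possible end-to-end sketch for Exercise 8. Assumptions, none confirmed by the lab files: `company_info.csv` has a `symbol` column to join on, the time series is wide (one column per symbol), and cost basis uses the share-weighted average buy price with commissions ignored. Realized gains from past sales are not included. The function name and `portfolio_report.csv` are hypothetical.

```python
import pandas as pd


def portfolio_report(trades, prices, info):
    """Per-holding report, portfolio totals and sector allocation (%)."""
    buys = trades[trades["action"] == "BUY"]
    sells = trades[trades["action"] == "SELL"]

    bought = buys.groupby("symbol")["shares"].sum()
    sold = sells.groupby("symbol")["shares"].sum()
    held = bought.subtract(sold, fill_value=0)
    held = held[held > 0]  # keep open positions only

    # Average cost per share = total spent on buys / shares bought.
    spent = (buys["shares"] * buys["price"]).groupby(buys["symbol"]).sum()
    avg_cost = spent / bought

    report = pd.DataFrame({"shares_held": held})
    report.index.name = "symbol"
    report["current_price"] = prices.iloc[-1].reindex(report.index)  # last row = current
    report["avg_cost"] = avg_cost.reindex(report.index)
    report["current_value"] = report["shares_held"] * report["current_price"]
    report["cost_basis"] = report["shares_held"] * report["avg_cost"]
    report["unrealized_gain"] = report["current_value"] - report["cost_basis"]
    report["return_pct"] = report["unrealized_gain"] / report["cost_basis"] * 100
    report = report.reset_index().merge(info[["symbol", "sector"]], on="symbol", how="left")

    total_value = report["current_value"].sum()
    total_cost = report["cost_basis"].sum()
    totals = {
        "total_value": total_value,
        "total_cost": total_cost,
        "total_gain": total_value - total_cost,
        "return_pct": (total_value - total_cost) / total_cost * 100,
    }
    sector_alloc = report.groupby("sector")["current_value"].sum() / total_value * 100
    return report, totals, sector_alloc


# Usage with the lab files:
# trades = pd.read_csv("transactions.csv")
# prices = pd.read_csv("stock_prices_timeseries.csv", index_col="Date", parse_dates=True)
# info = pd.read_csv("company_info.csv")
# report, totals, sector_alloc = portfolio_report(trades, prices, info)
# report.to_csv("portfolio_report.csv", index=False)
```

A held symbol that is missing from the price file gets a NaN current value, which the `.sum()` totals silently skip, so it is worth checking `report["current_price"].isna()` first.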

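```python
# Install matplotlib into the Python environment running this notebook.
# The leading "!" is IPython shell-escape syntax, so this only works in Jupyter/IPython.
import sys
!{sys.executable} -m pip install matplotlib
```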



### Exercise 9: Correlation Analysis (Pandas + NumPy)

Analyze correlations between stocks:
1. Read stock_prices_timeseries.csv
2. Calculate daily returns for all stocks
3. Create a correlation matrix of returns
4. Find the pair of stocks with:
   - Highest correlation
   - Lowest correlation
5. Calculate portfolio variance for an equal-weighted portfolio

**Hint:** Use `df.corr()` for correlation matrix

```python
# Exercise 9: Correlation Analysis
# Write your code here
```


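A possible sketch for Exercise 9, assuming one price column per stock. Each unordered pair is checked once, and the equal-weighted portfolio variance is computed as w^T Σ w, where w holds 1/n for each of the n stocks and Σ is the sample covariance matrix of daily returns. Annualizing by ×252 is an assumption that mirrors the other exercises; `correlation_summary` is a hypothetical name.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def correlation_summary(prices: pd.DataFrame, periods: int = 252) -> dict:
    returns = prices.pct_change().dropna()
    corr = returns.corr()

    # Each unordered pair once (A-B, not also B-A), skipping the diagonal.
    pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)})

    # Equal weights w_i = 1/n; portfolio variance = w^T Σ w.
    n = returns.shape[1]
    w = np.full(n, 1.0 / n)
    daily_var = float(w @ returns.cov().to_numpy() @ w)

    return {
        "corr": corr,
        "highest_pair": pairs.idxmax(), "highest_corr": pairs.max(),
        "lowest_pair": pairs.idxmin(), "lowest_corr": pairs.min(),
        "portfolio_var_daily": daily_var,
        "portfolio_var_annual": daily_var * periods,
    }


# Usage with the lab file:
# prices = pd.read_csv("stock_prices_timeseries.csv", index_col="Date", parse_dates=True)
# result = correlation_summary(prices)
# print(result["highest_pair"], round(result["highest_corr"], 3))
# print(result["lowest_pair"], round(result["lowest_corr"], 3))
```

With equal weights the same variance can be cross-checked as `returns.mean(axis=1).var()`, the variance of the average daily return across stocks.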

### Exercise 10: Risk-Adjusted Returns (Advanced)

Calculate risk-adjusted metrics for each stock:
1. Calculate daily returns and annualized returns (252 trading days)
2. Calculate volatility (std) and annualized volatility
3. Calculate Sharpe Ratio (risk-free rate = 2%)
4. Calculate Maximum Drawdown for each stock
5. Create a summary DataFrame ranking stocks by:
   - Total return
   - Sharpe ratio
   - Risk (volatility)
6. Export the analysis to CSV

**Formula for Maximum Drawdown:**
```
Running maximum = cumulative maximum of prices
Drawdown = (Current price - Running maximum) / Running maximum
Maximum Drawdown = minimum drawdown value
```

```python
# Exercise 10: Risk-Adjusted Returns
# Write your code here
```


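A possible sketch for Exercise 10 that follows the maximum drawdown formula above with `cummax()`. It assumes one price column per stock, annualizes by simple scaling (mean × 252, std × sqrt(252)), and ranks with 1 as best (highest return or Sharpe, lowest volatility). `risk_report` and `risk_analysis.csv` are hypothetical names.

```python
import numpy as np
import pandas as pd


def risk_report(prices: pd.DataFrame, rf: float = 0.02, periods: int = 252) -> pd.DataFrame:
    daily = prices.pct_change().dropna()
    ann_return = daily.mean() * periods
    ann_vol = daily.std() * np.sqrt(periods)
    sharpe = (ann_return - rf) / ann_vol
    total_return = prices.iloc[-1] / prices.iloc[0] - 1

    running_max = prices.cummax()                     # running maximum of prices
    drawdown = (prices - running_max) / running_max   # <= 0 everywhere
    max_drawdown = drawdown.min()                     # most negative drawdown

    summary = pd.DataFrame({
        "total_return": total_return,
        "annual_return": ann_return,
        "annual_volatility": ann_vol,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
    })
    summary["rank_total_return"] = summary["total_return"].rank(ascending=False)
    summary["rank_sharpe"] = summary["sharpe_ratio"].rank(ascending=False)
    summary["rank_risk"] = summary["annual_volatility"].rank(ascending=True)
    return summary.sort_values("sharpe_ratio", ascending=False)


# Usage with the lab file:
# prices = pd.read_csv("stock_prices_timeseries.csv", index_col="Date", parse_dates=True)
# summary = risk_report(prices)
# summary.to_csv("risk_analysis.csv")
```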

### Exercise 11: Plot Return Distribution (Matplotlib)

Plot a histogram of provided daily returns:
1. Use the returns array below.
2. Plot a histogram with 20–30 bins, add a vertical line at the mean, and show grid/labels/title.
3. Annotate the chart with mean and standard deviation.
4. Optionally save as `returns_histogram.png`.

**Data:**
- `returns = [0.004, -0.003, 0.006, 0.002, 0.005, -0.001, 0.007, 0.003, -0.002, 0.004, 0.006, 0.001, 0.005, -0.002, 0.003, 0.004, 0.002, 0.005, 0.006, 0.003]`

```python
# Exercise 11: Plot Return Distribution
# Write your code here
```


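A possible sketch for Exercise 11 using the provided returns. The function returns the figure instead of calling `plt.show()`, so it can be saved or reused; the sample standard deviation (`ddof=1`) and the styling choices are assumptions, and `plot_return_histogram` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import numpy as np

# Data from the exercise
returns = [0.004, -0.003, 0.006, 0.002, 0.005, -0.001, 0.007, 0.003, -0.002, 0.004,
           0.006, 0.001, 0.005, -0.002, 0.003, 0.004, 0.002, 0.005, 0.006, 0.003]


def plot_return_histogram(returns, bins=20, save_path=None):
    r = np.asarray(returns, dtype=float)
    mean, std = r.mean(), r.std(ddof=1)

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(r, bins=bins, edgecolor="black", alpha=0.7)
    ax.axvline(mean, color="red", linestyle="--", label=f"Mean = {mean:.4f}")
    # Annotation box in axes coordinates (0-1), top-left corner.
    ax.text(0.02, 0.95, f"Mean: {mean:.4%}\nStd dev: {std:.4%}",
            transform=ax.transAxes, va="top",
            bbox=dict(facecolor="white", alpha=0.8))
    ax.set_title("Distribution of Daily Returns")
    ax.set_xlabel("Daily return")
    ax.set_ylabel("Frequency")
    ax.grid(True, alpha=0.3)
    ax.legend()
    if save_path:
        fig.savefig(save_path, dpi=150, bbox_inches="tight")
    return fig, ax


# In the notebook:
# fig, ax = plot_return_histogram(returns, bins=20, save_path="returns_histogram.png")
# plt.show()
```

With only 20 observations, 20 bins leaves many bins empty; that is expected for this small sample.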

### Exercise 12: Plot Hypothetical Share Prices (Matplotlib)

Plot provided price paths for two stocks:
1. Use the given dates and prices.
2. Plot both series on one line chart with title, axes labels, legend, and grid.
3. Optionally save as `hypothetical_prices.png`.

**Data:**
- `dates = ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "2024-01-08", "2024-01-09", "2024-01-10", "2024-01-11", "2024-01-12", "2024-01-15"]`
- `aapl = [150.0, 151.5, 150.8, 152.3, 153.0, 154.2, 153.7, 155.0, 154.5, 156.0]`
- `msft = [320.0, 321.0, 322.5, 321.8, 323.0, 324.5, 325.0, 324.0, 326.5, 327.0]`

```python
# Exercise 12: Plot Hypothetical Share Prices
# Write your code here
```

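A possible sketch for Exercise 12 using the provided dates and prices. The dates are parsed with `pd.to_datetime` so matplotlib spaces them as real dates; `plot_prices` is a hypothetical name, and the figure is returned rather than shown.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data from the exercise
dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "2024-01-08",
                        "2024-01-09", "2024-01-10", "2024-01-11", "2024-01-12", "2024-01-15"])
aapl = [150.0, 151.5, 150.8, 152.3, 153.0, 154.2, 153.7, 155.0, 154.5, 156.0]
msft = [320.0, 321.0, 322.5, 321.8, 323.0, 324.5, 325.0, 324.0, 326.5, 327.0]


def plot_prices(dates, series, save_path=None):
    """Plot several price series that share one date axis.

    series: mapping of label -> list of prices (same length as dates).
    """
    fig, ax = plt.subplots(figsize=(9, 5))
    for label, prices in series.items():
        ax.plot(dates, prices, marker="o", label=label)
    ax.set_title("Hypothetical Share Prices")
    ax.set_xlabel("Date")
    ax.set_ylabel("Price ($)")
    ax.grid(True, alpha=0.3)
    ax.legend()
    fig.autofmt_xdate()  # tilt date labels so they do not overlap
    if save_path:
        fig.savefig(save_path, dpi=150, bbox_inches="tight")
    return fig, ax


# In the notebook:
# fig, ax = plot_prices(dates, {"AAPL": aapl, "MSFT": msft}, save_path="hypothetical_prices.png")
# plt.show()
```

Because MSFT trades at roughly twice AAPL's price level, AAPL's moves look flatter on a shared axis; `ax.twinx()` adds a second y-axis if that matters.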

---

## Summary: Key Concepts

### NumPy:
```python
# Array creation
arr = np.array([1, 2, 3])
zeros = np.zeros(5)
ones = np.ones((3, 4))  # 3x4 matrix
range_arr = np.arange(0, 10, 2)

# Operations (vectorized)
arr * 2              # Multiply all elements
arr1 + arr2          # Element-wise addition
np.mean(arr)         # Average
np.std(arr)          # Standard deviation
np.sum(arr)          # Sum

# 2D operations
matrix.mean(axis=0)  # Mean of each column
matrix.mean(axis=1)  # Mean of each row
```

### Pandas:
```python
# Reading data
df = pd.read_csv('file.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Selection
df['column']                # Single column (Series)
df[['col1', 'col2']]        # Multiple columns
df[df['price'] > 100]       # Filter rows
df.iloc[0]                  # By position
df.loc['2025-12-01']        # By label

# Operations
df['new'] = df['a'] * df['b']  # New column
df.pct_change()                # Percentage change
df.rolling(window=5).mean()    # Moving average
df.groupby('sector').mean(numeric_only=True)  # Group and aggregate numeric columns
pd.merge(df1, df2, on='key')   # Merge DataFrames

# Statistics
df.describe()           # Summary statistics
df.mean()              # Column means
df.corr()              # Correlation matrix
```

### Financial Calculations:
```python
# Returns
returns = prices.pct_change()
cum_returns = (1 + returns).cumprod() - 1

# Volatility (annualized)
volatility = returns.std() * np.sqrt(252)

# Sharpe Ratio
sharpe = (annual_return - risk_free) / volatility

# Moving averages
ma = prices.rolling(window=20).mean()
```

---

## Testing and Submission

**Before moving on:**
1. Complete all exercises using the sample data
2. Verify your calculations make sense
3. Export results to CSV where requested
4. Understand the difference between NumPy and Pandas
5. Know when to use each library
6. Have your instructor manually check your work

**Excellent work completing Lab 8!** 🎉