# **📊 Stock Market Analysis and Portfolio Optimization**

This project combines **data analysis** with **statistical concepts** to create a practical financial application. It's **challenging enough for intermediate learners** while producing **valuable insights**.

---

## **📌 Project Overview**
Create a program that **analyzes historical stock data** for multiple companies, calculates **key metrics**, and recommends an **optimal portfolio allocation** based on risk/reward preferences.

---

## **🔹 Key Components**

### **1️⃣ Data Acquisition & Preprocessing**
- Import **historical stock price data** for **5-10 companies** (from CSV files or using a **financial API**).
- Handle **missing values** and **normalize data**.
- Calculate **daily returns**.

### **2️⃣ Statistical Analysis**
- Calculate **mean returns**, **volatility**, and **correlation matrices**.
- Implement **covariance analysis**.
- Create **rolling window calculations** to observe trend changes.
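
The statistics listed above can be sketched with plain NumPy. This is a minimal illustration on synthetic returns: the random data, the 20-day window, and all variable names are assumptions made for the example, not values from this project.

```python
import numpy as np

# Synthetic daily returns (in percent) for 3 hypothetical assets.
# This is a stand-in for real data, used only to keep the example self-contained.
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.05, scale=1.5, size=(500, 3))

mean_returns = returns.mean(axis=0)               # average daily return per asset
volatility = returns.std(axis=0, ddof=1)          # sample standard deviation per asset
cov_matrix = np.cov(returns, rowvar=False)        # 3x3 covariance matrix (columns are assets)
corr_matrix = np.corrcoef(returns, rowvar=False)  # 3x3 correlation matrix

# Rolling volatility over an assumed 20-day window.
window = 20
# sliding_window_view appends the window axis last: shape (500 - 20 + 1, 3, 20)
windows = np.lib.stride_tricks.sliding_window_view(returns, window, axis=0)
rolling_vol = windows.std(axis=-1, ddof=1)        # shape (481, 3)

print(mean_returns.shape, cov_matrix.shape, rolling_vol.shape)
```

The diagonal of `cov_matrix` holds each asset's variance, so its square root matches `volatility`. Plotting `rolling_vol` over time is one way to observe how risk changes.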

### **3️⃣ Portfolio Optimization**
- Implement the **Sharpe ratio** (risk-adjusted return measure).
- Create a **Monte Carlo simulation** to test different **portfolio weightings**.
- Find the **optimal portfolio allocation** based on different **risk tolerances**.
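
A minimal sketch of the Monte Carlo search described above is shown here. The expected returns, covariance matrix, and 2% risk-free rate are hypothetical annualized numbers chosen for illustration, and sampling weights from a Dirichlet distribution is one convenient choice for this sketch rather than a requirement of the method.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical annualized inputs for 3 assets (illustrative, not market data).
mean_returns = np.array([0.12, 0.18, 0.25])
cov_matrix = np.array([
    [0.040, 0.018, 0.022],
    [0.018, 0.060, 0.030],
    [0.022, 0.030, 0.160],
])
risk_free_rate = 0.02  # assumed value

# Draw random long-only weightings; each row is non-negative and sums to 1.
n_portfolios = 5000
weights = rng.dirichlet(np.ones(3), size=n_portfolios)

# Portfolio return w.mu and volatility sqrt(w' Sigma w) for every row at once.
port_returns = weights @ mean_returns
port_vols = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov_matrix, weights))

# Sharpe ratio: excess return per unit of volatility.
sharpe = (port_returns - risk_free_rate) / port_vols

best = np.argmax(sharpe)        # highest risk-adjusted return
min_vol = np.argmin(port_vols)  # lowest-risk portfolio among the samples

print("Max-Sharpe weights:", weights[best].round(3))
print("Min-volatility weights:", weights[min_vol].round(3))
```

In this sketch, `weights[best]` would suit an investor maximizing risk-adjusted return, while `weights[min_vol]` reflects a more risk-averse preference. Other risk tolerances can be expressed by, for example, picking the highest return among portfolios whose volatility stays below a chosen cap.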

### **4️⃣ Visualization**
- Create an **efficient frontier plot**.
- Generate **correlation heatmaps**.
- Plot **historical performance of optimized portfolios**.
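
The first two plots might look roughly like the sketch below. It uses hypothetical inputs (the same style of made-up returns and covariance as any toy example, with the risk-free rate assumed to be zero), and the scatter of random portfolios only approximates the efficient frontier along its upper-left edge.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
labels = ["A", "B", "C"]  # hypothetical asset names

# Hypothetical annualized inputs (illustrative, not market data).
mean_returns = np.array([0.12, 0.18, 0.25])
cov_matrix = np.array([
    [0.040, 0.018, 0.022],
    [0.018, 0.060, 0.030],
    [0.022, 0.030, 0.160],
])

# Random long-only portfolios and their return, volatility, and Sharpe ratio.
weights = rng.dirichlet(np.ones(3), size=2000)
port_returns = weights @ mean_returns
port_vols = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov_matrix, weights))
sharpe = port_returns / port_vols  # risk-free rate assumed to be 0

# Convert covariance to correlation for the heatmap.
std = np.sqrt(np.diag(cov_matrix))
corr = cov_matrix / np.outer(std, std)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))

# Left: risk/return scatter coloured by Sharpe ratio.
sc = ax1.scatter(port_vols, port_returns, c=sharpe, cmap="viridis", s=5)
fig.colorbar(sc, ax=ax1, label="Sharpe ratio")
ax1.set_xlabel("Volatility")
ax1.set_ylabel("Expected return")
ax1.set_title("Simulated portfolios")

# Right: correlation heatmap.
im = ax2.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax2.set_xticks(range(3))
ax2.set_xticklabels(labels)
ax2.set_yticks(range(3))
ax2.set_yticklabels(labels)
ax2.set_title("Correlation matrix")
fig.colorbar(im, ax=ax2)

fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, `plt.show()` or `fig.savefig(...)` would be needed.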

---

## **🛠 Technical Skills Used**
- **NumPy** arrays and operations
- **Statistical functions**
- **Matrix operations**
- **Random number generation**
- **Integration with Matplotlib** for visualization

---

This project provides **hands-on experience** with **real-world financial concepts** while strengthening **NumPy skills** in a practical context. It can be **expanded or simplified** based on specific learning goals. 🚀📈


In [8]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf

In [9]:
# Download daily closing prices for the selected tickers
tickers = ["NVDA", "AAPL", "GOOG"]
start_date = "2020-01-01"
end_date = "2025-01-01"

stock_data = yf.download(tickers, start=start_date, end=end_date)["Close"]
stock_data.head(5)

[*********************100%***********************]  3 of 3 completed


| Date | AAPL | GOOG | NVDA |
|------------|-----------|-----------|----------|
| 2020-01-02 | 72.716057 | 68.123726 | 5.97271 |
| 2020-01-03 | 72.009132 | 67.789421 | 5.877112 |
| 2020-01-06 | 72.582909 | 69.460922 | 5.901758 |
| 2020-01-07 | 72.241531 | 69.41758 | 5.973209 |
| 2020-01-08 | 73.403641 | 69.964615 | 5.984412 |


In [10]:
# Optional: move Date out of the index into a regular column.
# stock_data.reset_index(inplace=True)


# Optional: convert the Date column to datetime.
# (Not needed here: yf.download already returns a DatetimeIndex.)
# stock_data['Date'] = pd.to_datetime(stock_data['Date'])
# stock_data

In [11]:
# Check for the NaN values in the data frame
NA_Values = stock_data.isna().sum()

print("\nCHECKING THE MISSING VALUES IN THE DATA FRAME:\n")
print(NA_Values)

# Handle missing (NaN) values.
# ffill (forward fill): fills each missing value with the previous row's value.
# bfill (backward fill): fills anything ffill could not reach (e.g. NaNs in the
# first rows) with the next available value.
# Older equivalent; the `method` argument of fillna is deprecated in pandas 2.x:
# df = stock_data.fillna(method='ffill').fillna(method='bfill')
df = stock_data.ffill().bfill()


CHECKING THE MISSING VALUES IN THE DATA FRAME:

Ticker
AAPL    0
GOOG    0
NVDA    0
dtype: int64


In [12]:
# Convert the DataFrame to a NumPy array (rows = dates, columns = tickers) for faster numerical work
prices = df.values
prices

array([[ 72.71605682,  68.12372589,   5.97270966],
       [ 72.00913239,  67.78942108,   5.87711191],
       [ 72.58290863,  69.46092224,   5.90175819],
       ...,
       [255.30929565, 194.03999329, 137.00999451],
       [251.92301941, 192.69000244, 137.49000549],
       [250.14497375, 190.44000244, 134.28999329]])

In [13]:
prices[0, 1]  # first trading day, second column (GOOG)

68.12372589111328

In [15]:
# Calculate the daily return in percent.
# np.diff(prices, axis=0) takes the difference between consecutive rows (prices[i+1] - prices[i]).
# axis=0 means the difference is taken down the rows, i.e. from one date to the next.
# prices[:-1] drops the last row, so each change is divided by the previous day's price
# and the denominator has the same shape as the differences.
print("\n Returns from the stocks per DAY : \n")
returns = (np.diff(prices,axis = 0)/prices[:-1] * 100)
print(returns)


 Returns from the stocks per DAY : 

[[-0.97217103 -0.49073183 -1.60057578]
 [ 0.79681039  2.46572567  0.41936044]
 [-0.47032733 -0.06239852  1.21066821]
 ...
 [-1.32421274 -1.55251787 -2.08675647]
 [-1.32634272 -0.69572814  0.35034742]
 [-0.70578928 -1.16767864 -2.32745078]]
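
To make the role of `prices[:-1]` concrete, here is the same calculation on a tiny made-up price array (the numbers are invented for illustration). The first asset rises 10% on both days; the second falls 10% and then rises 20%.

```python
import numpy as np

# Three days of made-up prices for two hypothetical assets.
prices = np.array([
    [100.0, 50.0],
    [110.0, 45.0],
    [121.0, 54.0],
])

diffs = np.diff(prices, axis=0)        # day-to-day changes, shape (2, 2)
returns = diffs / prices[:-1] * 100    # divide by the previous day's price

# Equivalent formulation: ratio of consecutive prices minus one.
alt_returns = (prices[1:] / prices[:-1] - 1) * 100

print(returns)
print(np.allclose(returns, alt_returns))
```

Three price rows produce two return rows, which is why the returns array is always one row shorter than the price array.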


#### 📌 Available Columns in `yf.download()`

| Column       | Description |
|-------------|------------|
| **`Open`**   | Opening price of the stock for the given date. |
| **`High`**   | Highest price of the stock for the given date. |
| **`Low`**    | Lowest price of the stock for the given date. |
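| **`Close`**  | Closing price. With `auto_adjust=True` (the default in recent yfinance releases) this is already adjusted for splits and dividends; with `auto_adjust=False` it is not dividend-adjusted. |
| **`Adj Close`** | **Adjusted closing price** (adjusts for stock splits and dividends). Returned only when `auto_adjust=False`. |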
| **`Volume`** | Number of shares traded on that day. |