# Final Project

## Available Assets for Portfolio Optimization
Here is a list of the available assets to consider in the portfolio:

<table>
  <thead>
    <tr>
      <th>Fund Name</th>
      <th>Ticker</th>
      <th>Inception Date</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>JPMorgan Equity Income Fund - Class R6</td>
      <td><a href="https://finance.yahoo.com/quote/OIEJX/history/" target="_blank">OIEJX</a></td>
      <td>Jan 31, 2012</td>
    </tr>
    <tr>
      <td>State Street Equity 500 Index K</td>
      <td><a href="https://finance.yahoo.com/quote/SSSYX/history/" target="_blank">SSSYX</a></td>
      <td>Sep 18, 2014</td>
    </tr>
    <tr>
      <td>T. Rowe Price Dividend Growth</td>
      <td><a href="https://finance.yahoo.com/quote/PRDGX/history/" target="_blank">PRDGX</a></td>
      <td>Dec 31, 1992</td>
    </tr>
    <tr>
      <td>American Funds Growth Fund of Amer R6</td>
      <td><a href="https://finance.yahoo.com/quote/RGAGX/history/" target="_blank">RGAGX</a></td>
      <td>May 1, 2009</td>
    </tr>
    <tr>
      <td>Vanguard Mid Cap Index Admiral</td>
      <td><a href="https://finance.yahoo.com/quote/VIMAX/history/" target="_blank">VIMAX</a></td>
      <td>Nov 12, 2001</td>
    </tr>
    <tr>
      <td>Vanguard Small Cap Value Index Admiral</td>
      <td><a href="https://finance.yahoo.com/quote/VSIAX/history/" target="_blank">VSIAX</a></td>
      <td>Sep 27, 2011</td>
    </tr>
  </tbody>
</table>

Historical data for these assets can be found on the [Yahoo Finance website](https://finance.yahoo.com).

Because the portfolio optimization problem stacks the return series of all assets into a single matrix $R$, every asset must be observed over the same period. The returns are therefore restricted to the window allowed by the most recent inception date (SSSYX, Sep 18, 2014), i.e., **from Sep 18, 2014 until Aug 27, 2025**.
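
As a rough illustration of that alignment step, the sketch below builds a return matrix from several per-ticker tables by keeping only the dates they share. The function name `build_return_matrix`, the dict-of-DataFrames input, and the toy tickers `AAA`/`BBB` are assumptions made for this example; only the `date` and `adj_close` column names come from this notebook.

```python
import pandas as pd

def build_return_matrix(frames):
    """Align adjusted closes on shared dates and return simple daily returns.

    `frames` maps a ticker to a DataFrame with 'date' and 'adj_close' columns
    (hypothetical input layout, mirroring the columns used in this notebook).
    """
    series = {
        ticker: (
            frame.dropna(subset=["adj_close"])    # drop rows without a price
                 .drop_duplicates(subset="date")  # one observation per date
                 .set_index("date")["adj_close"]
        )
        for ticker, frame in frames.items()
    }
    # join="inner" keeps only the dates present for every ticker
    prices = pd.concat(series, axis=1, join="inner").sort_index()
    # Row t holds each asset's return between consecutive shared dates
    return prices.pct_change().dropna()

# Toy data (made up) with partially overlapping dates
toy = {
    "AAA": pd.DataFrame({
        "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
        "adj_close": [100.0, 110.0, 121.0],
    }),
    "BBB": pd.DataFrame({
        "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
        "adj_close": [50.0, 55.0, 66.0],
    }),
}
R_toy = build_return_matrix(toy)
print(R_toy)
```

Only the two shared dates survive the inner join, so the toy result has a single row of returns. Calling `.to_numpy()` on the result would give a plain array with one row per period and one column per asset.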

Data sources:
<table>
  <thead>
    <tr>
      <th>Ticker</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="https://finance.yahoo.com/quote/OIEJX/history/?period1=1410998400&period2=1756339200" target="_blank">OIEJX</a></td>
    </tr>
    <tr>
      <td><a href="https://finance.yahoo.com/quote/SSSYX/history/?period1=1410998400&period2=1756339200" target="_blank">SSSYX</a></td>
    </tr>
    <tr>
      <td><a href="https://finance.yahoo.com/quote/PRDGX/history/?period1=1410998400&period2=1756339200" target="_blank">PRDGX</a></td>
    </tr>
    <tr>
      <td><a href="https://finance.yahoo.com/quote/RGAGX/history/?period1=1410998400&period2=1756339200" target="_blank">RGAGX</a></td>
    </tr>
    <tr>
      <td><a href="https://finance.yahoo.com/quote/VIMAX/history/?period1=1410998400&period2=1756339200" target="_blank">VIMAX</a></td>
    </tr>
    <tr>
      <td><a href="https://finance.yahoo.com/quote/VSIAX/history/?period1=1410998400&period2=1756339200" target="_blank">VSIAX</a></td>
    </tr>
  </tbody>
</table>

The service has been behind a paywall since early 2025, but the data is still accessible by inspecting the HTML elements of the history page.
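
The `period1` and `period2` query parameters in the links above look like Unix timestamps in seconds. That reading is an assumption about Yahoo's URL scheme rather than something documented here, but it can be sanity-checked by decoding the two values. The helper name `to_utc` is made up for this sketch.

```python
from datetime import datetime, timezone

def to_utc(epoch_seconds):
    """Interpret an integer as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Values copied from the data-source links above
period1 = to_utc(1410998400)
period2 = to_utc(1756339200)

print(period1.isoformat())  # 2014-09-18T00:00:00+00:00
print(period2.isoformat())  # 2025-08-28T00:00:00+00:00
```

Under this reading the start matches the Sep 18, 2014 inception date, and the end falls at midnight UTC on Aug 28, 2025, which is consistent with a window whose last trading day is Aug 27.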

In [23]:
# Import necessary libraries
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np
import cvxpy as cp

## Preprocessing

In [24]:
# Path to the HTML file
html_file = '../data/sssyx.html'

# Read the HTML file
with open(html_file, 'r') as file:
    html_content = file.read()

# Parse HTML content
soup = BeautifulSoup(html_content, 'lxml')

# Extract table data
table_rows = soup.find_all('tr')

# Prepare data for pandas DataFrame
data = []
for row in table_rows:
    row_data = []
    for cell in row.find_all(['td', 'th']):
        row_data.append(cell.text.strip())
    if row_data:  # Skip empty rows
        data.append(row_data)

# Create DataFrame
columns = ['date', 'open', 'high', 'low', 'close', 'adj_close', 'volume']
df = pd.DataFrame(data, columns=columns)
df['date'] = pd.to_datetime(df['date'])
# Convert all columns except 'date' to numeric, downcasting to the smallest float possible
for col in df.columns:
    if col != 'date':
        df[col] = pd.to_numeric(df[col], errors='coerce', downcast='float')
df = df.sort_values('date', ascending=True).reset_index(drop=True)

# Display the DataFrame
df

Unnamed: 0,date,open,high,low,close,adj_close,volume
0,2014-09-18,170.899994,170.899994,170.899994,170.899994,126.000000,
1,2014-09-19,170.800003,170.800003,170.800003,170.800003,125.930000,
2,2014-09-22,169.399994,169.399994,169.399994,169.399994,124.900002,
3,2014-09-23,168.500000,168.500000,168.500000,168.500000,124.230003,
4,2014-09-24,169.800003,169.800003,169.800003,169.800003,125.190002,
...,...,...,...,...,...,...,...
2766,2025-08-21,479.130005,479.130005,479.130005,479.130005,479.130005,
2767,2025-08-22,486.440002,486.440002,486.440002,486.440002,486.440002,
2768,2025-08-25,484.369995,484.369995,484.369995,484.369995,484.369995,
2769,2025-08-26,486.399994,486.399994,486.399994,486.399994,486.399994,


## Exploratory Data Analysis

In [25]:
df.info()
df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2771 entries, 0 to 2770
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   date       2771 non-null   datetime64[ns]
 1   open       2752 non-null   float32       
 2   high       2752 non-null   float32       
 3   low        2752 non-null   float32       
 4   close      2752 non-null   float32       
 5   adj_close  2752 non-null   float32       
 6   volume     0 non-null      float32       
dtypes: datetime64[ns](1), float32(6)
memory usage: 86.7 KB


Unnamed: 0,date,open,high,low,close,adj_close,volume
count,2771,2752.0,2752.0,2752.0,2752.0,2752.0,0.0
mean,2020-03-06 18:55:59.653554688,271.758911,271.758911,271.758911,271.758911,244.565323,
min,2014-09-18 00:00:00,154.0,154.0,154.0,154.0,116.860001,
25%,2017-06-14 12:00:00,202.100006,202.100006,202.100006,202.100006,159.970001,
50%,2020-03-09 00:00:00,242.25,242.25,242.25,242.25,215.054993,
75%,2022-11-28 12:00:00,331.987518,331.987518,331.987518,331.987518,313.332489,
max,2025-08-27 00:00:00,487.559998,487.559998,487.559998,487.559998,487.559998,
std,,85.183502,85.183502,85.183502,85.183502,98.743935,
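
The summary above shows 2771 rows but only 2752 non-null prices, an entirely empty `volume` column, and identical statistics for `open`, `high`, `low`, and `close`. The sketch below is one way to inspect those three observations before computing returns. The function name `missing_report` and the toy frame are hypothetical. Which rows lack a price in the real file is not shown in this notebook, so the dates have to be inspected directly.

```python
import numpy as np
import pandas as pd

def missing_report(df, price_col="adj_close"):
    """Summarize gaps in a price table parsed as in the preprocessing cell."""
    ohlc = ["open", "high", "low", "close"]
    return {
        # Number of missing values per column
        "missing_counts": df.isna().sum(),
        # Dates of rows that carry no price at all
        "unpriced_dates": df.loc[df[price_col].isna(), "date"].tolist(),
        # Columns with no usable value
        "empty_columns": [c for c in df.columns if df[c].isna().all()],
        # True when every row has at most one distinct OHLC value
        "single_daily_price": bool(df[ohlc].nunique(axis=1).le(1).all()),
    }

# Toy frame (made up) with one unpriced row and an empty volume column
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "open": [10.0, np.nan, 11.0],
    "high": [10.0, np.nan, 11.0],
    "low": [10.0, np.nan, 11.0],
    "close": [10.0, np.nan, 11.0],
    "adj_close": [9.0, np.nan, 11.0],
    "volume": [np.nan, np.nan, np.nan],
})
report = missing_report(toy)
print(report["missing_counts"])
print(report["unpriced_dates"], report["empty_columns"], report["single_daily_price"])
```

If the unpriced rows turn out to be non-price entries, dropping them with `df.dropna(subset=["adj_close"])` before computing returns would be a natural next step.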


## Optimization
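
Read from the code in this section, the problem being solved can be written as the equality-constrained least-squares problem below. The notation $\mathbf{1}$ for a vector of ones is introduced here for convenience. $R \in \mathbb{R}^{T \times n}$, $\mu$, $\rho$, and $w$ correspond to the variables `R`, `mu`, `rho`, and `w`.

$$
\begin{aligned}
\min_{w \in \mathbb{R}^n} \quad & \lVert R w - \rho \mathbf{1} \rVert_2 \\
\text{s.t.} \quad & \mathbf{1}^\top w = 1, \\
& \mu^\top w = \rho, \qquad \mu = \tfrac{1}{T} R^\top \mathbf{1}.
\end{aligned}
$$

Under the second constraint the mean of the portfolio returns $R w$ equals $\rho$, so $\lVert R w - \rho \mathbf{1} \rVert_2^2 = T \cdot \operatorname{Var}(R w)$ and minimizing the norm is the same as minimizing the standard deviation of the portfolio returns. There is no sign constraint on $w$, so short positions are allowed in this formulation.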

In [26]:
# Mock return matrix: 5 assets, 10 time periods
np.random.seed(0)
T, n = 10, 5
R = np.random.randn(T, n) * 0.01  # Simulated daily returns: zero mean, 1% standard deviation

# Compute average return per asset
mu = np.mean(R, axis=0)  # Shape: (n,)

# Set a mock target return
rho = 0.1 / 250  # 10% annual return spread over ~250 trading days

# Define optimization variable
w = cp.Variable(n)

# Define constraints
constraints = [
    cp.sum(w) == 1,       # weights sum to 1
    mu @ w == rho         # target average return
]

# Define the objective (least squares formulation)
objective = cp.Minimize(cp.norm(R @ w - rho, 2))

# Solve the problem
problem = cp.Problem(objective, constraints)
problem.solve()

# Output results
print("Optimal weights w:", w.value)
print("Achieved average return:", mu @ w.value)
print("Portfolio risk (std dev of returns):", np.std(R @ w.value))


Optimal weights w: [0.35218793 0.10524019 0.20972512 0.11947077 0.21337598]
Achieved average return: 0.0003999999999999991
Portfolio risk (std dev of returns): 0.006035366033597584
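
Because both constraints are linear equalities and the objective is a least-squares norm, the same mock problem can also be solved without a convex solver by writing down its KKT conditions as one linear system. The sketch below is offered as an independent cross-check, not as the method used above. The names `A`, `b`, `kkt`, and `w_kkt` are introduced here, and the system minimizes the squared norm, which has the same minimizer.

```python
import numpy as np

# Same mock data as the cell above
np.random.seed(0)
T, n = 10, 5
R = np.random.randn(T, n) * 0.01
mu = R.mean(axis=0)
rho = 0.1 / 250

# Equality constraints A w = b: weights sum to 1, average return equals rho
A = np.vstack([np.ones(n), mu])
b = np.array([1.0, rho])

# KKT system for: minimize ||R w - rho*1||^2 subject to A w = b
#   [2 R^T R   A^T] [w ]   [2 R^T (rho*1)]
#   [A          0 ] [nu] = [b            ]
kkt = np.block([
    [2 * R.T @ R, A.T],
    [A, np.zeros((2, 2))],
])
rhs = np.concatenate([2 * R.T @ (rho * np.ones(T)), b])
solution = np.linalg.solve(kkt, rhs)
w_kkt = solution[:n]  # the remaining two entries are the Lagrange multipliers

print("KKT weights:", w_kkt)
print("Sum of weights:", w_kkt.sum())
print("Average return:", mu @ w_kkt)
print("Std dev of returns:", np.std(R @ w_kkt))
```

Assuming `R` has full column rank, the minimizer is unique, so these weights should agree with the CVXPY result up to solver tolerance.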
