<h1 style="text-align:center">
Stock Portfolio Forecasting and Optimization on S&P500 Using
Machine Learning and Search Methods</h1>

---

- Constança Fernandes, nº 202205398
- Daniela Osório, nº 202208679
- Inês Amorim, nº 202108108
- Pedro Afonseca, nº 202105394

---

 ## == IMPORTS == 

In [None]:
%pip install -r ../requirements.txt

In [2]:
import yfinance as yf
from pathlib import Path
import pandas as pd

---

## 1. Introduction

---

## 2. Dataset Quality Assessment and Exploratory Data Analysis

### 2.1. Raw Dataset (S&P500)

The S&P500 dataset covers historical data (from 2010 to January 2024) for the S&P500, an index of 500 large-cap publicly traded U.S. companies weighted by market capitalization. It was obtained from Yahoo Finance using the **yfinance** library.
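As a rough sketch of how such a download could be cached to the pickle file read below, the following assumes the index was fetched with `yfinance.Ticker.history`. The function names `fetch_sp500_index` and `load_or_download` and the exact `start`/`end` dates are assumptions for illustration, not taken from the project code.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def fetch_sp500_index(start: str = "2010-01-01", end: str = "2024-01-01") -> pd.DataFrame:
    """Download daily ^GSPC history from Yahoo Finance (needs network access)."""
    import yfinance as yf  # imported lazily so the caching helper also works offline

    # The date range is an assumption based on the period described in the text.
    return yf.Ticker("^GSPC").history(start=start, end=end)


def load_or_download(
    path: Path, fetch: Callable[[], pd.DataFrame] = fetch_sp500_index
) -> pd.DataFrame:
    """Return the cached pickle if it exists; otherwise fetch the data and cache it."""
    path = Path(path)
    if path.exists():
        return pd.read_pickle(path)
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    data.to_pickle(path)
    return data
```

With this helper, `load_or_download(Path("../data/raw/raw.pkl"))` would only contact Yahoo Finance when the pickle is missing, which keeps repeated notebook runs reproducible.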

In [19]:
raw_data = pd.read_pickle("../data/raw/raw.pkl")
raw_data

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2010-01-04 00:00:00-05:00,1116.560059,1133.869995,1116.560059,1132.989990,3991400000,0.0,0.0
2010-01-05 00:00:00-05:00,1132.660034,1136.630005,1129.660034,1136.520020,2491020000,0.0,0.0
2010-01-06 00:00:00-05:00,1135.709961,1139.189941,1133.949951,1137.140015,4972660000,0.0,0.0
2010-01-07 00:00:00-05:00,1136.270020,1142.459961,1131.319946,1141.689941,5270680000,0.0,0.0
2010-01-08 00:00:00-05:00,1140.520020,1145.390015,1136.219971,1144.979980,4389590000,0.0,0.0
...,...,...,...,...,...,...,...
2023-12-22 00:00:00-05:00,4753.919922,4772.939941,4736.770020,4754.629883,3046770000,0.0,0.0
2023-12-26 00:00:00-05:00,4758.859863,4784.720215,4758.450195,4774.750000,2513910000,0.0,0.0
2023-12-27 00:00:00-05:00,4773.450195,4785.390137,4768.899902,4781.580078,2748450000,0.0,0.0
2023-12-28 00:00:00-05:00,4786.439941,4793.299805,4780.979980,4783.350098,2698860000,0.0,0.0


This dataset has a multi-level header structure because it includes stock prices from different companies and various market sectors.

**Top Level:** Ticker Symbols for various stock market indices


- **^GSPC:** S&P500 Index. This includes 500 large-cap U.S. stocks and is widely used to represent the U.S. stock market.
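To illustrate how a two-level header of this kind can be navigated in pandas, here is a small sketch on a toy frame. The second ticker (`AAPL`) and its values are made up for the example; only the `^GSPC` figures are rounded from the table above.

```python
import pandas as pd

# Toy frame with two header levels: ticker on top, price field underneath.
# "AAPL" is a hypothetical second ticker and its values are invented.
columns = pd.MultiIndex.from_product(
    [["^GSPC", "AAPL"], ["Open", "Close"]], names=["Ticker", "Field"]
)
toy = pd.DataFrame(
    [[1116.56, 1132.99, 7.6, 7.7], [1132.66, 1136.52, 7.7, 7.8]],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
    columns=columns,
)

# Selecting a top-level key returns every field for that ticker.
gspc = toy["^GSPC"]

# A cross-section on the second level returns one field for every ticker.
closes = toy.xs("Close", axis=1, level="Field")
```

Here `gspc` has the plain columns `Open` and `Close`, while `closes` has one column per ticker.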

**Why choose these indexes?**

Removed:
- GEV, SOLV, AMTM: too new, so there is no historical data
- SW: Yahoo has no historical data
- BF.B, BRK.B, ZT: not available on Yahoo
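A filter of this kind could be automated along the following lines. This is only a sketch: the helper `split_by_history` and the `min_rows` threshold are hypothetical, and the source does not state how the removals were actually decided.

```python
import pandas as pd


def split_by_history(histories: dict, min_rows: int):
    """Split tickers into (kept, removed) by how many rows of history each has.

    `histories` maps a ticker to its DataFrame, or to None when no data came back.
    `removed` maps each dropped ticker to the number of rows that were available.
    """
    kept, removed = {}, {}
    for ticker, frame in histories.items():
        n_rows = 0 if frame is None else len(frame)
        if n_rows >= min_rows:
            kept[ticker] = frame
        else:
            removed[ticker] = n_rows
    return kept, removed
```

Recording the row count for each removed ticker makes it easy to tell tickers that are too new apart from tickers for which nothing was returned.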

**Sub-categories:**

- **Open:** The opening price of the stock for the given day.
- **High:** The highest price during the day.
- **Low:** The lowest price.
- **Close:** The closing price.
- **Volume:** The number of shares traded.
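These definitions imply simple consistency rules that can serve as a data quality check: the daily high should not be below the open or the close, the daily low should not be above them, and volume should not be negative. A minimal sketch, assuming a frame with the five columns listed above (the function name `find_ohlc_violations` is illustrative):

```python
import pandas as pd


def find_ohlc_violations(prices: pd.DataFrame) -> pd.DataFrame:
    """Return rows where High/Low do not bound Open/Close, or Volume is negative."""
    body_top = prices[["Open", "Close"]].max(axis=1)
    body_bottom = prices[["Open", "Close"]].min(axis=1)
    bad = (
        (prices["High"] < body_top)
        | (prices["Low"] > body_bottom)
        | (prices["Volume"] < 0)
    )
    return prices[bad]
```

An empty result from `find_ohlc_violations(raw_data)` would suggest the price fields are internally consistent.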

### 2.2. Dropping Columns

In [None]:
raw_data["Dividends"].value_counts()

Dividends
0.0    3522
Name: count, dtype: int64

In [None]:
raw_data["Stock Splits"].value_counts()

Stock Splits
0.0    3522
Name: count, dtype: int64

As the values of these two columns are all zero, we will delete them.

In [None]:
# Drop both all-zero columns in a single call
raw_data = raw_data.drop(columns=["Dividends", "Stock Splits"])
raw_data

Date,Open,High,Low,Close,Volume
2010-01-04 00:00:00-05:00,1116.560059,1133.869995,1116.560059,1132.989990,3991400000
2010-01-05 00:00:00-05:00,1132.660034,1136.630005,1129.660034,1136.520020,2491020000
2010-01-06 00:00:00-05:00,1135.709961,1139.189941,1133.949951,1137.140015,4972660000
2010-01-07 00:00:00-05:00,1136.270020,1142.459961,1131.319946,1141.689941,5270680000
2010-01-08 00:00:00-05:00,1140.520020,1145.390015,1136.219971,1144.979980,4389590000
...,...,...,...,...,...
2023-12-22 00:00:00-05:00,4753.919922,4772.939941,4736.770020,4754.629883,3046770000
2023-12-26 00:00:00-05:00,4758.859863,4784.720215,4758.450195,4774.750000,2513910000
2023-12-27 00:00:00-05:00,4773.450195,4785.390137,4768.899902,4781.580078,2748450000
2023-12-28 00:00:00-05:00,4786.439941,4793.299805,4780.979980,4783.350098,2698860000
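The manual `value_counts()` check used for `Dividends` and `Stock Splits` can be generalized. The sketch below drops every column that holds a single distinct value, which is a slightly broader rule than "all zeros"; the helper name `drop_constant_columns` is an assumption for illustration.

```python
import pandas as pd


def drop_constant_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop every column that holds a single distinct value (NaN counts as a value)."""
    n_unique = frame.nunique(dropna=False)
    constant = n_unique[n_unique <= 1].index
    return frame.drop(columns=constant)
```

On a frame like the one above, this would be expected to remove the two all-zero columns and keep the price and volume columns.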


---

## 3. Application of ML Algorithms for Stock Price Prediction

---

## 4. Application of Optimization Techniques for Selection of Stocks

---

## 5. Application of Optimization Techniques for Selection of Stocks

---

## 6. Assess Portfolio Performance

---

## 7. Bibliography