In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import yfinance as yf
import pandas as pd
import logging
import sys
from copy import deepcopy
from pathlib import Path
from datetime import datetime
from dateutil.relativedelta import relativedelta

In [None]:
src_path: str = "../src"
sys.path.append(src_path)
logging.basicConfig()
logging.getLogger().setLevel(logging.INFO)

In [None]:
from data.utils import download_yfinance_data, get_price_statistics

In [None]:
random_seed = 8080
portfolio_name = "big_tech"
data_path = Path("..").resolve().joinpath("data")
data_path

## 1. Business understanding


It is often not straightforward to discern which stocks have performed better than others when looking at historical data. Stock prices change constantly, and events such as economic recessions, pandemics or natural disasters heavily affect the valuations of many companies.

However, these companies are not necessarily affected in the same way, and their publicly traded prices can, due to a multitude of factors, react very differently to the same events. For example, during the COVID-19 outbreak in 2020, eCommerce companies such as Amazon, as well as hardware manufacturers such as NVIDIA, saw a dramatic increase in demand for their products due to the pandemic's lockdowns. We will be able to see this clearly in the data later on.

As small investors, we might be interested in analysing the past performance of a certain portfolio of companies, as well as in forecasting whether they will continue their present trend.

The goal of this project is to provide an easy-to-use interface to quickly compare the performance of multiple companies over a given period of time by means of visualizations and aggregated statistics. This will allow users to see which companies did best. It will also give an idea of how stock prices may change in the near future. We aim to answer questions such as:

1. Which company grew the most in value?
2. Which company experienced the lowest volatility? Which experienced the highest?
3. Which company is expected to grow the most in value?

---
NOTES:
In finance, random walk theory holds that market prices move randomly, so their past values are of little use for predicting their future values.

[https://www.cienciadedatos.net/documentos/py41-forecasting-cryptocurrency-bitcoin-machine-learning-python.html]

When creating a forecaster model, historical data are used to get a mathematical representation capable of predicting future values. This idea is based on a very important assumption: the future behavior of a phenomenon can be explained from its past behavior. However, this rarely happens in reality, or at least not in its entirety. For more on this, see the following definition:

$$\text{Forecast} = \text{pattern} + \text{unexplained variance}$$

The first term of the equation refers to everything that has a repetitive character over time (trend, seasonality, cyclical factors...). The second term represents everything that influences the response variable but is not captured (explained) by the past of the time series.
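
As a minimal sketch of the two terms, the snippet below builds a synthetic series (not real prices) from a known linear trend plus noise, fits the trend back, and measures how much variance is left unexplained. It then simulates a pure random walk, whose consecutive steps are close to uncorrelated. The series length, trend slope and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8080)  # arbitrary seed, used only for reproducibility

# Synthetic series (assumption, not market data): linear trend plus noise.
n = 250
t = np.arange(n)
pattern = 100.0 + 0.2 * t               # the repeatable part (trend)
noise = rng.normal(0.0, 2.0, size=n)    # the part the past cannot explain
series = pattern + noise

# Recover the pattern with a least-squares straight line.
slope, intercept = np.polyfit(t, series, deg=1)
fitted = intercept + slope * t
residuals = series - fitted

# Share of the variance explained by the fitted pattern.
explained_share = 1.0 - residuals.var() / series.var()
print(f"estimated slope: {slope:.3f}")
print(f"explained share of variance: {explained_share:.3f}")

# A pure random walk: each value is the previous one plus unpredictable noise.
random_walk = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=n))
steps = np.diff(random_walk)

# Correlation between consecutive steps; for a random walk it is near zero,
# meaning yesterday's move says little about today's move.
lag1_corr = np.corrcoef(steps[:-1], steps[1:])[0, 1]
print(f"lag-1 correlation of random-walk steps: {lag1_corr:.3f}")
```

In the first series the pattern dominates, so a forecaster has something to learn. In the random walk there is no pattern in the steps, which is the situation random walk theory describes for market prices.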


---

**DISCLAIMER:** This project is merely meant to be used for understanding the past and getting a sense of the future. The insights gained and any recommendations made **are not financial advice**. The value of a company at any given time and its evolution depend on many factors that aren't taken into account in this project. Real-world value investing requires an in-depth analysis of each company and sector, and it's still not guaranteed to yield better returns than simply investing in a market index. And above all, **past performance is no guarantee of future results. Don't assume an investment will continue to do well in the future simply because it's done well in the past.**


## 2. Understanding the data through Exploratory Data Analysis (EDA)

In this section, we preview the kind of financial data that can be downloaded through the Yahoo Finance API.


In [None]:
portfolio_filepath = data_path.joinpath("portfolios").joinpath(f"{portfolio_name}.txt")
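# The ticker symbol is the first whitespace-separated token on each line.
tickers = [
    line.split()[0]
    for line in portfolio_filepath.read_text().splitlines()
    if line.strip()  # skip blank lines, e.g. a trailing newline at end of file
]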
date_range = (
    datetime.now() - relativedelta(years=5),
    datetime.now(),
)
save_path = data_path.joinpath(portfolio_name)
save_path.mkdir(parents=True, exist_ok=True)

In [None]:
tickers_info, tickers_data = download_yfinance_data(tickers, date_range, save_path)

### 2.1. Tickers information


In [None]:
tickers_info.dropna(how="all")

**And the conclusion from this is...**

### 2.2. Historical Price Data


In [None]:
tickers_data

**And the conclusion from this is...**

Different information is available for each date and ticker: `Adj Close`, `Close`, `High`, `Low`, `Open` and `Volume`. We will only be using `Adj Close` for performance analysis as well as forecasting.

`NaN` values indicate periods when the companies were not yet publicly traded (or did not even exist). We keep these values and rows for the sake of completeness.
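
Keeping the `NaN` rows is usually harmless for aggregate statistics, because pandas reductions skip missing values by default. The sketch below illustrates this on a small synthetic frame. The tickers `OLDCO` and `NEWCO` and all prices are made up for illustration and are not part of the downloaded data.

```python
import numpy as np
import pandas as pd

# Synthetic adjusted-close prices (illustrative values, not real quotes).
# "NEWCO" is a hypothetical ticker that only starts trading on the 4th day.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
adj_close = pd.DataFrame(
    {
        "OLDCO": [10.0, 11.0, 12.0, 11.0, 13.0, 14.0],
        "NEWCO": [np.nan, np.nan, np.nan, 50.0, 55.0, 60.0],
    },
    index=dates,
)

# First date with a valid price for each ticker.
first_trade = adj_close.apply(lambda s: s.first_valid_index())

# Reductions such as mean() use skipna=True by default, so NaN rows can stay.
mean_price = adj_close.mean()

# Number of missing observations per ticker.
missing = adj_close.isna().sum()

print(first_trade)
print(mean_price)
print(missing)
```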


## 3. Data Preparation

In the previous section, we already did some minor pre-processing such as ensuring that the index of the price data is a `datetime` object. Next, we will extract price statistics as well as visualizations to improve our understanding of the data.
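
The implementation of `get_price_statistics` is not shown in this notebook. As a rough idea of what growth and volatility statistics could look like, here is a hypothetical stand-in, `sketch_price_statistics`, applied to made-up prices. The 252 trading days per year used for annualization is a common convention and an assumption here, as are the ticker names and the output column names.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # common convention; an assumption, not from the project


def sketch_price_statistics(adj_close: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in, NOT the project's get_price_statistics."""
    # Simple daily returns; rows before a ticker starts trading stay NaN.
    daily_returns = adj_close / adj_close.shift(1) - 1.0
    # First and last available price per ticker, ignoring NaN padding.
    first_price = adj_close.bfill().iloc[0]
    last_price = adj_close.ffill().iloc[-1]
    return pd.DataFrame(
        {
            "total_return": last_price / first_price - 1.0,
            # Sample standard deviation of daily returns, scaled to one year.
            "annualized_volatility": daily_returns.std()
            * np.sqrt(TRADING_DAYS_PER_YEAR),
        }
    )


# Made-up prices for two hypothetical tickers.
example = pd.DataFrame(
    {
        "OLDCO": [100.0, 110.0, 99.0, 108.9],
        "NEWCO": [np.nan, 50.0, 55.0, 60.5],
    },
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
stats = sketch_price_statistics(example)
print(stats)
```

Sorting the result by `total_return` or `annualized_volatility` would then rank the companies by growth or by volatility.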


In [None]:
price_stats = get_price_statistics(tickers_data)

**And the conclusion from this is...**

In [None]:
# TODO: show the same plots as in the Web App

**And the conclusion from this is...**

## 4. Data Modelling

In this section, we will model our time-series data for price forecasting.
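
Whatever model is eventually chosen, a time-series split should be chronological, and any forecast should be compared against a simple baseline. The sketch below shows one way to do both on a synthetic random walk. The 80/20 split ratio, the naive "tomorrow equals today" baseline and the mean absolute error metric are illustrative assumptions, not decisions made in this project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(8080)  # arbitrary seed for reproducibility

# Synthetic random-walk "prices" on business days (not real market data).
dates = pd.bdate_range("2023-01-02", periods=200)
prices = pd.Series(100.0 + np.cumsum(rng.normal(0.0, 1.0, size=200)), index=dates)

# Chronological split: every test date is later than every training date,
# so the model never sees the future during training.
split_point = int(len(prices) * 0.8)
train, test = prices.iloc[:split_point], prices.iloc[split_point:]

# Naive one-step-ahead baseline: predict each day's price as the previous day's.
naive_forecast = prices.shift(1).loc[test.index]
mae_naive = (test - naive_forecast).abs().mean()

# A second baseline: always predict the mean of the training period.
mae_mean = (test - train.mean()).abs().mean()

print(f"train: {train.index.min().date()} to {train.index.max().date()}")
print(f"test:  {test.index.min().date()} to {test.index.max().date()}")
print(f"MAE naive baseline: {mae_naive:.3f}")
print(f"MAE train-mean baseline: {mae_mean:.3f}")
```

A forecasting model is only worth keeping if it beats baselines like these on the test period.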


In [None]:
# 1. Necessary data pre-processing
# 2. Data splits
# 3. Model selection
# 4. Hyper-parameter tuning
# 5. Model evaluation
# 6. Visualization of predictions and test performance

In [None]:
# TODO: plot data split

In [None]:
# TODO: plot predictions with errors

**And the conclusion from this is...**