In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import yfinance as yf
import json
import numpy as np
import pandas as pd
import logging
import sys

from pathlib import Path
from datetime import datetime

In [None]:
src_path: str = "../src"
sys.path.append(src_path)
logging.basicConfig()
logging.getLogger().setLevel(logging.INFO)

In [None]:
from data.pipeline import main as data_pipeline

In [None]:
data_path = Path("..").resolve().joinpath("data")
data_path

## 1. Business understanding

It is often not straightforward to discern which stocks have performed better than others when looking at historical data. Stock prices change constantly, and events such as economic recessions, pandemics or natural disasters heavily affect the valuations of many companies.

However, these companies are not necessarily affected in the same way, and their publicly traded prices can, due to a multitude of factors, react very differently to the same events. For example, during the COVID-19 outbreak in 2020, eCommerce companies such as Amazon, as well as hardware manufacturers such as NVIDIA, saw a dramatic increase in demand for their products due to the pandemic's lockdowns. We will be able to see this clearly in the data later on.

As small investors, we might be interested in analysing the past performance of a certain portfolio of companies, as well as in forecasting whether they will continue their present trend.

The goal of this project is to provide an easy-to-use interface to quickly compare the performance of multiple companies over a given period of time by means of visualizations and aggregated statistics. This will allow users to see which companies did best. It will also provide an idea of how stock prices might change in the near future. We aim to answer questions such as:

1. Which company grew the most in value?
2. Which company experienced the lowest volatility? Which experienced the highest?
3. Which company is expected to grow the most in value?

**DISCLAIMER:** This project is merely meant to be used for understanding the past and getting a sense of the future. The insights gained and any recommendations made **are not financial advice**. The value of a company at any given time and its evolution depend on many factors that aren't taken into account in this project. Real-world value investing requires an in-depth analysis of each company and sector, and it's still not guaranteed to yield better returns than simply investing in a market index. And above all, **past performance is no guarantee of future results. Don't assume an investment will continue to do well in the future simply because it's done well in the past.**

## 2. Understanding the data through Exploratory Data Analysis (EDA)

In this section, we preview the kind of financial data that can be downloaded through the Yahoo Finance API.

In [None]:
portfolio_filepath = data_path.joinpath("portfolios").joinpath("big_tech.txt")
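# The ticker symbol is the first space-separated token of each line; blank lines are skipped.
tickers = [line.split(" ")[0] for line in portfolio_filepath.read_text().splitlines() if line.strip()]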

### 2.1. Tickers information

In [None]:
tickers_objs = yf.Tickers(" ".join(tickers))

In [None]:
tickers_info = []
for ticker_name, ticker_obj in tickers_objs.tickers.items():
    try:
        tickers_info.append(pd.Series(ticker_obj.info).rename(ticker_name))
    except Exception as e:
        logging.warning(f"Problem retrieving information for {ticker_name}: {e}")
        continue

if len(tickers_info) != 0:
    tickers_info = pd.concat(tickers_info, axis=1)
else:
    tickers_info = pd.DataFrame()

tickers_info

### 2.2. Historical Price Data

In [None]:
tickers_data = yf.download(" ".join(tickers), start="2000-01-01", end="2022-01-01")
tickers_data

Different information is available for each date and ticker: `Adj Close`, `Close`, `High`, `Low`, `Open` and `Volume`. We will only be using `Adj Close` for performance analysis as well as forecasting.

`NaN` values indicate periods when the companies were not yet public (or did not yet exist). We keep these values and rows for the sake of completeness.
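
To illustrate how these leading `NaN` values behave, here is a minimal sketch on synthetic data. The tickers `AAA` and `BBB` and their prices are made up for illustration and are not part of the downloaded data. It shows one way to find the first date with a valid price for each ticker, and how daily returns stay `NaN` for the pre-listing period when no filling is applied.

```python
import numpy as np
import pandas as pd

# Synthetic adjusted-close prices; "AAA" and "BBB" are made-up tickers.
dates = pd.date_range("2020-01-01", periods=5, freq="D")
adj_close = pd.DataFrame(
    {
        "AAA": [10.0, 10.5, 10.4, 10.8, 11.0],
        "BBB": [np.nan, np.nan, 20.0, 20.5, 21.0],  # not yet listed on the first two days
    },
    index=dates,
)

# First date with a valid price per ticker (a rough proxy for the listing date).
first_valid = adj_close.apply(lambda s: s.first_valid_index())

# Daily returns without filling gaps: the pre-listing NaN values are kept,
# and the first valid day also has no return because there is no prior price.
daily_returns = adj_close.pct_change(fill_method=None)
```

Keeping the `NaN` rows means every ticker shares the same date index, at the cost of having to remember that statistics for late-listed tickers cover a shorter period.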

In [None]:
tickers_data["Adj Close"]

## 3. Data Preparation

In this section, we will run our data pipeline, which will take care of all the data processing steps:

1. Download and save ticker information and historical price data.
2. Compute and save price statistics on historical data.
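
The statistics computed by the pipeline are not shown in this notebook. As a rough idea of what step 2 could look like, the sketch below computes a total return and an annualized volatility per ticker. The helper `summarize_prices`, the choice of these two statistics and the 252 trading days per year are assumptions made for illustration, not the actual contents of `data.pipeline`.

```python
import numpy as np
import pandas as pd


def summarize_prices(adj_close: pd.DataFrame, periods_per_year: int = 252) -> pd.DataFrame:
    """Hypothetical helper: per-ticker total return and annualized volatility."""
    # Period-over-period returns; gaps are not filled, so NaN prices give NaN returns.
    returns = adj_close.pct_change(fill_method=None)
    # First and last available price per ticker, skipping leading/trailing NaN values.
    first_price = adj_close.bfill().iloc[0]
    last_price = adj_close.ffill().iloc[-1]
    return pd.DataFrame(
        {
            "total_return": last_price / first_price - 1.0,
            # Sample standard deviation of returns, scaled to a yearly horizon.
            "annualized_volatility": returns.std() * np.sqrt(periods_per_year),
        }
    )


# Synthetic prices for two made-up tickers; "BBB" starts trading one period later.
prices = pd.DataFrame(
    {
        "AAA": [100.0, 110.0, 121.0, 108.9],
        "BBB": [np.nan, 50.0, 45.0, 49.5],
    }
)
stats = summarize_prices(prices)
```

Sorting `stats` by either column would give a first answer to the growth and volatility questions from section 1, keeping in mind that tickers listed at different times are measured over different periods.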

In [None]:
portfolio_filepath = data_path.joinpath("portfolios").joinpath("big_tech.txt")
date_range = (datetime(2000, 1, 1), datetime(2023, 1, 1))

In [None]:
data_pipeline(
    portfolio_filepath=portfolio_filepath,
    date_range=date_range,
    save_root_path=data_path,
)

## 4. Data Modelling

In this section, we will model our time-series data for price forecasting.