<img src="res/PP_logotyp_ANG_CMYK.svg" width="90%" />

### Table of Contents

1. [Executive summary](#Executive-summary)
2. [Foreword](#Foreword)
3. [Setting up](#Setting-up)
    1. [Tools](#Tools)
    2. [Datasets](#Datasets)
    3. [Individual stocks descriptions](#Individual-stocks-descriptions)
        1. [AAPL](#AAPL)
        2. [GOOG](#GOOG)
        3. [MSFT](#MSFT)
        4. [AMZN](#AMZN)
4. [Analysis](#Analysis)
    1. [Pair trading similarity](#Pair-trading-similarity)
    2. [Daily stocks exchange](#Daily-stocks-exchange)
    3. [Moving average](#Moving-average)
    4. [Daily average return](#Daily-average-return)
    5. [Correlation](#Correlation)
        1. [Pairplot](#Pairplot)
        2. [Return on risk factor](#Return-on-risk-factor)
    6. [Risk](#Risk)

# Executive summary

TODO - Summary / Afterword

--||--. The analysis result is, for the interval DD.MM.YYYY - DD.MM.YYYY, out of `AAPL`, `GOOG`, `MSFT` and `AMZN`:
- X was the best choice, given a risk factor of y
- pairs X-Y, Y-Z can be considered for pair trading (the delays are x and y)

# Foreword

The following report is an analysis of the top tech stocks on the US market. Its goal is to decide on the best stock for
an investment, based on a given risk factor. Let us first look at why this type of analysis is needed.

Everything starts with an investment and its return, with some degree of safety and security. There are a lot of stocks
to choose from, and every investor has their own needs when it comes to measuring the *subjectively* best stock. That is
why it is really important to filter stocks on some parameters when building a personal stock portfolio.

As an example, we will analyze the stocks of the top 4 US tech companies to see which stock is most suitable for a
portfolio based on the risk factor.

# Setting up

## Tools

For data processing and visualization, we are going to use Python with the following tools:

In [20]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
plt.style.use("fivethirtyeight")
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
from pandas_datareader.data import DataReader
from datetime import datetime

## Datasets

The datasets we are using are imported from Yahoo Finance through `DataReader` from `pandas_datareader`. The 4 stocks are:
- `AAPL` - Apple
- `GOOG` - Google
- `MSFT` - Microsoft
- `AMZN` - Amazon

In [21]:
tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']

# TODO - hardcode date (makes more sense for **reproducible** report)
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)

for stock in tech_list:
    globals()[stock] = DataReader(stock, 'yahoo', start, end)

# tag each dataset with a readable company name
company_list = [AAPL, GOOG, MSFT, AMZN]
company_name = ["APPLE", "GOOGLE", "MICROSOFT", "AMAZON"]

for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name


TODO - explanation of each attribute

Besides the actual data, each dataset contains an additional attribute `company_name`, the sole purpose of which is
to identify which stock a given set of values belongs to. In fact, there are four datasets, presented below.
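
As a minimal sketch of how such a tag column can be used, the snippet below builds two tiny made-up frames (the dates and
prices are illustrative, not real quotes), tags them with `company_name`, stacks them with `pd.concat`, and groups by the
tag. The `frames` dictionary is a hypothetical stand-in for the downloaded datasets.

```python
import pandas as pd

# Illustrative data only: three business days of invented prices per company
dates = pd.date_range("2021-01-04", periods=3, freq="B")
frames = {
    "APPLE": pd.DataFrame({"Adj Close": [1.0, 2.0, 3.0]}, index=dates),
    "GOOGLE": pd.DataFrame({"Adj Close": [10.0, 20.0, 30.0]}, index=dates),
}

# Tag every row with the company it belongs to
for name, frame in frames.items():
    frame["company_name"] = name

# Stack the frames; the tag keeps the rows distinguishable afterwards
df = pd.concat(list(frames.values()), axis=0)
mean_close = df.groupby("company_name")["Adj Close"].mean()
print(mean_close)
```

Without the tag, rows from different companies could not be told apart once they share a single frame.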

## Individual stocks descriptions

### AAPL

In [None]:
AAPL.describe()

### GOOG

In [None]:
GOOG.describe()

### MSFT

In [None]:
MSFT.describe()

### AMZN

In [None]:
AMZN.describe()

# Analysis

## Pair trading similarity

TODO - comments/explanation

Changes in some stocks tend to follow each other. For example, it may be the case that AMD's stock value follows
Nvidia's very closely, with a delay of roughly 2 minutes.

When that is the case, pair trading is worth considering. If one is sure the above holds true and knows the delay, then
instead of spending resources on analyzing both stocks, it suffices to analyze one and reuse that analysis for the
other.

We can find stocks suitable for pair trading by comparing their plots of `Adj Close`.

In [None]:
plt.figure(figsize=(20, 8))
plt.subplots_adjust(top=1.25, bottom=1.2)

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Adj Close'].plot(color='g')
    plt.ylabel('Adj Close')
    plt.xlabel(None)
    plt.title(f"{tech_list[i - 1]}")

TODO - conclusions of the above

It can be seen that `MSFT` follows `AAPL` quite accurately, and `GOOG` does so for `MSFT`. Therefore, these pairs can be
considered for pair trading.

TODO - remove with cell below OR leave if cell below stays

We can investigate these pairs further by looking at the delay.

In [None]:
# TODO - close up on MSFT and AAPL to spot the delay?
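
One possible way to estimate such a delay, sketched here on synthetic data rather than on the real quotes, is to shift
one series against the other and keep the shift with the highest correlation. The helper `best_lag` and the series
`leader` and `follower` are hypothetical names introduced for this illustration. Daily returns are assumed as input,
because raw prices trend over time and that trend would inflate the correlation at every shift.

```python
import numpy as np
import pandas as pd


def best_lag(leader: pd.Series, follower: pd.Series, max_lag: int = 10) -> int:
    """Return the shift (in rows) at which `follower` correlates best with `leader`."""
    corr_by_lag = {
        lag: follower.corr(leader.shift(lag))  # Series.corr skips the NaN rows
        for lag in range(max_lag + 1)
    }
    return max(corr_by_lag, key=corr_by_lag.get)


# Synthetic returns: the follower repeats the leader 3 rows later, plus noise
rng = np.random.default_rng(0)
leader = pd.Series(rng.normal(size=500))
follower = leader.shift(3) + rng.normal(scale=0.1, size=500)

lag = best_lag(leader, follower)
print(f"estimated delay: {lag} rows")
```

With daily data, one row corresponds to one trading day, so delays shorter than a day cannot be detected this way.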

TODO - remove with cell above OR leave if cell below stays;

TODO - conclusion

As can be seen, the delay between `AAPL` and `MSFT` is X, and the delay between `MSFT` and `GOOG` is Y.

Thus, we can be quite sure that trading the latter stock of each pair just like the former, taking the delay into
account, should give us a profit, provided our trades are good in the first place.

## Daily stocks exchange

TODO - comments/explanations (volume, see comment in code block)

In [None]:
# Plotting the total volume of stock being traded each day
plt.figure(figsize=(20, 8))
plt.subplots_adjust(top=1.25, bottom=1.2)

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Volume'].plot(color='b')
    plt.ylabel('Volume')
    plt.xlabel(None)
    plt.title(f"{tech_list[i - 1]}")

## Moving average

TODO - comments/explanations

Moving averages are usually calculated to identify the trend direction of a stock or to determine its support and resistance levels. They can therefore indicate good times for buying and selling, based on the crossover of the 20-day and 50-day moving averages: by the common convention, the 20-day average crossing above the 50-day average is read as a signal to buy, and crossing below it as a signal to sell.
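
The crossover of the 20-day and 50-day averages can be located programmatically. The sketch below does so on a synthetic
random-walk price series (not real quotes). Reading "short average crosses above long average" as a buy signal is a
common convention assumed here, and the names `buy_days` and `sell_days` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic price series: a random walk starting near 100
rng = np.random.default_rng(1)
prices = pd.Series(100 + rng.normal(size=300).cumsum(), name="Adj Close")

ma_short = prices.rolling(20).mean()
ma_long = prices.rolling(50).mean()

# Only compare where both averages exist (the first 49 rows of ma_long are NaN)
valid = ma_long.notna()
above = (ma_short > ma_long)[valid]

# +1 where the short average moves above the long one, -1 where it drops below
cross = above.astype(int).diff()
buy_days = cross[cross == 1].index
sell_days = cross[cross == -1].index

print("buy signals at rows:", list(buy_days))
print("sell signals at rows:", list(sell_days))
```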

In [None]:
ma_day = [10, 20, 50]

for ma in ma_day:
    for company in company_list:
        column_name = f"MA for {ma} days"
        company[column_name] = company['Adj Close'].rolling(ma).mean()

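# Combine the four datasets into one frame so they can be grouped by company
df = pd.concat(company_list, axis=0)
df.groupby("company_name").hist(figsize=(20, 10));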

TODO - comments/explanations (breaking the above figure from this one)

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(8)
fig.set_figwidth(20)

AAPL[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,0])
axes[0,0].set_title('APPLE')

GOOG[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,1])
axes[0,1].set_title('GOOGLE')

MSFT[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,0])
axes[1,0].set_title('MICROSOFT')

AMZN[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,1])
axes[1,1].set_title('AMAZON')

fig.tight_layout()

## Daily average return

TODO - comments/explanations

The daily return measures the dollar change in a stock's price as a percentage of the previous day's closing price. 
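
In other words, the return on day *t* is `price[t] / price[t - 1] - 1`. The short check below, on three made-up prices,
shows that this manual formula matches what `pct_change` computes.

```python
import pandas as pd

# Made-up prices: up 2% on the second day, down 2% on the third
prices = pd.Series([100.0, 102.0, 99.96])

manual = prices / prices.shift(1) - 1   # today's price relative to yesterday's
builtin = prices.pct_change()

print(pd.DataFrame({"manual": manual, "pct_change": builtin}))
```

The first value is `NaN` in both columns, since the first day has no previous close to compare against.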

In [None]:
# Finding the percent change for each day
for company in company_list:
    company['Daily Return'] = company['Adj Close'].pct_change()

# Plotting the daily return percentage
fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(8)
fig.set_figwidth(20)

AAPL['Daily Return'].plot(ax=axes[0,0], legend=True,  marker='o',color='b')
axes[0,0].set_title('APPLE')

GOOG['Daily Return'].plot(ax=axes[0,1], legend=True,  marker='o',color='g')
axes[0,1].set_title('GOOGLE')

MSFT['Daily Return'].plot(ax=axes[1,0], legend=True, marker='o',color='r')
axes[1,0].set_title('MICROSOFT')

AMZN['Daily Return'].plot(ax=axes[1,1], legend=True,  marker='o',color='y')
axes[1,1].set_title('AMAZON')

fig.tight_layout()

TODO - comments/explanations (breaking up figure from the above)

In [None]:
plt.figure(figsize=(20, 12))

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    sns.histplot(company['Daily Return'].dropna(), bins=100, color='purple')
    plt.xlabel('Daily Return')
    plt.title(f'{company_name[i - 1]}')

## Correlation

TODO - comments/explanation

Correlation plays a very important role in avoiding huge losses over a short period of time: highly correlated stocks tend to fall together, so holding them side by side offers little protection.

TODO - give explanation of the below

In [None]:
# TODO - is the below simply close price?
closing_df = DataReader(tech_list, 'yahoo', start, end)['Adj Close']
closing_df.head()

TODO - give explanation of the below

In [None]:
# TODO - is the below relative change between today and previous day (as in above)?
tech_rets = closing_df.pct_change()
tech_rets.head()

### Pairplot

TODO - reformat text

A pair plot shows every pairwise relationship in a dataset at once. It is used to find the set of features that best explains a relationship between two variables, or that forms the most clearly separated clusters.

TODO - explain the visualization

In [None]:
sns.pairplot(tech_rets, kind='reg')

### Return on risk factor

TODO - reformat text

Now we are going to compare each stock by its return on risk factor and see how it clusters together with the other stocks.

TODO - explain the below (or ditch the above if duplicate)

In [None]:
# TODO - duplicate of the above?
# Set up our figures
return_fig = sns.PairGrid(tech_rets.dropna())
return_fig.map_upper(plt.scatter, color='purple')
return_fig.map_lower(sns.kdeplot, cmap='cool_d')
return_fig.map_diag(plt.hist, bins=30)

TODO - text for the below

In [None]:
# TODO - give title
sns.heatmap(tech_rets.corr(), annot=True, cmap='gist_heat_r')

TODO - reformat text

We can clearly see that Apple and Microsoft are highly correlated, so it is better to pair trade them than to trade them separately and waste resources.

TODO - explain the below

In [None]:
# TODO - what's this one about compared to the above?
sns.heatmap(closing_df.corr(), annot=True, cmap='inferno')
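
One reason a heatmap of prices can differ from a heatmap of returns: prices that both trend upwards look correlated even
when their day-to-day moves are unrelated. The sketch below illustrates this with two independent synthetic price
series (the drift and volatility values are arbitrary assumptions), comparing the correlation of prices with the
correlation of daily returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1000

# Two independent return streams that share nothing but a positive drift
ret_a = rng.normal(0.002, 0.01, n)
ret_b = rng.normal(0.002, 0.01, n)

# Compound the returns into price paths starting at 100
prices = pd.DataFrame({
    "A": 100 * np.cumprod(1 + ret_a),
    "B": 100 * np.cumprod(1 + ret_b),
})

price_corr = prices["A"].corr(prices["B"])
return_corr = prices["A"].pct_change().corr(prices["B"].pct_change())

print(f"correlation of prices:  {price_corr:.2f}")
print(f"correlation of returns: {return_corr:.2f}")
```

Under these assumptions the price correlation comes out high purely because of the shared trend, which suggests reading
the returns heatmap when judging how closely two stocks actually move together.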

## Risk

TODO - comments/explanations

In [None]:
rets = tech_rets.dropna()

area = np.pi*20

plt.figure(figsize=(12, 10))
plt.scatter(rets.mean(), rets.std(), s=area)
plt.xlabel('Expected return')
plt.ylabel('Risk')

for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
    plt.annotate(label, xy=(x, y), xytext=(50, 50), textcoords='offset points', ha='right', va='bottom', 
                 arrowprops=dict(arrowstyle='-', color='blue', connectionstyle='arc3,rad=-0.3'))

TODO - reformat (conclusion)

We can clearly see that *Apple* is one of the most secure investments, combining a higher return with low risk.
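
To put a number on the trade-off shown in the scatter plot, one simple option (an assumption of this sketch, not
something the analysis above computes) is to divide each stock's mean daily return by its standard deviation. The
tickers `AAA`, `BBB` and `CCC` and their returns are synthetic placeholders.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for three placeholder tickers (about one trading year)
rng = np.random.default_rng(3)
rets = pd.DataFrame({
    "AAA": rng.normal(0.0010, 0.010, 250),
    "BBB": rng.normal(0.0010, 0.030, 250),
    "CCC": rng.normal(0.0002, 0.010, 250),
})

# Same quantities as the scatter plot: mean as expected return, std as risk
summary = pd.DataFrame({
    "expected_return": rets.mean(),
    "risk": rets.std(),
})
summary["return_per_risk"] = summary["expected_return"] / summary["risk"]

# Highest return per unit of risk first
ranking = summary.sort_values("return_per_risk", ascending=False)
print(ranking)
```

Applied to the real data, `rets` would be the `tech_rets.dropna()` frame from the cell above.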