In [None]:
#@title Student Information
Name = 'Lingxuan Ye' #@param {type:"string"}
Login_ID = '' #@param {type:"string"}
SIS_ID = 'value' #@param {type:"string"}

# The Dow Jones Industrial Average (DJIA)

According to [Wikipedia](https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average): "The Dow Jones Industrial Average, Dow Jones, or simply the Dow, is a stock market index of 30 prominent companies listed on stock exchanges in the United States. The DJIA is one of the oldest and most commonly followed equity indices."

The 30 component companies of DJIA can be found online at https://www.slickcharts.com/dowjones. The online table contains the company names, their symbols (i.e., tickers), and the [weights](https://www.investopedia.com/terms/p/priceweightedindex.asp) used in the DJIA calculations. 
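To make the idea of a price-weighted index concrete, here is a minimal sketch. The tickers, prices, and divisor are hypothetical values chosen for illustration and are not real DJIA data. It assumes the usual definition: the index level is the sum of the component prices divided by a divisor, and each component's weight is its share of the total price.

```python
# Hypothetical prices and divisor, for illustration only (not real DJIA data).
prices = {"AAA": 150.0, "BBB": 100.0, "CCC": 50.0}
divisor = 0.15


def price_weighted_index(prices, divisor):
    """Index level = sum of component prices / divisor."""
    return sum(prices.values()) / divisor


def price_weights(prices):
    """Each component's weight is its price divided by the total price."""
    total = sum(prices.values())
    return {ticker: price / total for ticker, price in prices.items()}


index_level = price_weighted_index(prices, divisor)
weights = price_weights(prices)
print(index_level)
print(weights)
```

In this scheme a higher-priced stock moves the index more than a lower-priced one, regardless of company size.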

We can use the Yahoo Finance API (through the `yfinance` package) to download historical data for companies by ticker. The DJIA index itself can be downloaded with the ticker "^DJI". For example:




In [None]:
# Install Yahoo Finance APIs
%pip install yfinance

In [None]:
import yfinance as yf
import pandas as pd

# Download historical data for the DJIA index, Apple, and Amazon stocks
yf.download(
  ["^DJI", "AAPL", "AMZN"],
  start='2022-01-15',
  end='2022-02-01',
  progress=False
)

# Problem 1. Year-to-Date (YTD) Historical Data of the DJIA Companies (1 percentage point)

Download the YTD (i.e., **01/01/2022 - 11/04/2022**) historical data of all the DJIA companies. Save this data into a pandas DataFrame for use in the other problems.

**Requirements**

1. The DJIA company tickers should **ONLY** be extracted from the website https://www.slickcharts.com/dowjones
2. YTD means **01/01/2022 - 11/04/2022**.



In [None]:
# The site blocks automated requests, so reading the URL directly raises an error
try:
    pd.read_html('https://www.slickcharts.com/dowjones')
except Exception as e:
    print(e)

In [None]:
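# 'dowjones.html' was saved manually from the page using Chrome Developer Tools.
# Convert the 'Symbol' column to a plain list: passing the pandas Series itself
# to yf.download failed with "'Series' object has no attribute 'split'".
tickers = pd.read_html('./dowjones.html')[0]['Symbol'].tolist()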
tickers

In [None]:
data = yf.download(
  tickers,
  start='2022-01-01',
  end='2022-11-05',  # `end` is exclusive, so the last day included is 2022-11-04
  progress=False
)
data

# Problem 2. Positively Trending and Negatively Trending Stocks (2 percentage points)

Using the YTD data of the DJIA companies from Problem 1, identify the following stocks:

1. Positively trending stocks at a confidence level of 95%
2. Negatively trending stocks at a confidence level of 95%
3. Non-trending stocks at a confidence level of 95%

**Requirements**

1. Use the `Close` price for the trending test

**Hints**

1. Use the Mann-Kendall trending test
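For intuition about what the Mann-Kendall test computes, here is a simplified pure-Python sketch. It assumes no tie correction in the variance and uses the normal approximation, so it is an illustration only and is not the `pymannkendall` implementation. The function name `mann_kendall` and its return format are hypothetical.

```python
import math


def mann_kendall(values, alpha=0.05):
    """Simplified Mann-Kendall trend test (no tie correction; illustration only)."""
    n = len(values)
    # S counts increasing pairs minus decreasing pairs over all i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis of no trend (ties ignored).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    if p_value < alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no trend"
    return trend, p_value


print(mann_kendall([float(x) for x in range(1, 21)]))
```

A confidence level of 95% corresponds to `alpha=0.05`: a trend is reported only when the two-sided p-value falls below that threshold.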

In [None]:
%pip install pymannkendall

In [None]:
from pymannkendall import original_test

In [None]:
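trend = pd.DataFrame(index=tickers, columns=['Trend'])

# DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement.
# alpha=0.05 corresponds to the required 95% confidence level.
for ticker, close in data['Close'].items():
    trend.loc[ticker, 'Trend'] = original_test(close, alpha=0.05).trend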

In [None]:
# Positively trending stocks at a confidence level of 95%
trend[trend['Trend'] == 'increasing']

In [None]:
# Negatively trending stocks at a confidence level of 95%
trend[trend['Trend'] == 'decreasing']

In [None]:
# Non-trending stocks at a confidence level of 95%
trend[trend['Trend'] == 'no trend']

# Problem 3. Pairplot of the Daily Returns of the 5 Most Traded DJIA Companies (2 percentage points)

Generate a seaborn pairplot showing the daily returns of the stocks of the 5 most traded DJIA companies. 

**Requirements**

1. The 5 most traded companies should be identified by looking at the daily average `Volume` of the data generated in problem 1.

2. The daily return of a stock on a specific day is defined as `close_today / close_previous_day - 1`, where `close_today` is the `Close` price of the stock on that day and `close_previous_day` is the `Close` price of the stock on the previous trading day. Note that the previous trading day is not necessarily the previous calendar day, since that day could be a holiday or fall on a weekend.
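The daily-return definition above can be sketched in plain Python. The dates and prices below are hypothetical; the point is that "previous trading day" simply means the previous row in the price history, so the weekend gap needs no special handling.

```python
# Hypothetical closing prices, one row per trading day (note the weekend gap).
closes = [
    ("2022-01-06", 100.0),  # Thursday
    ("2022-01-07", 110.0),  # Friday
    ("2022-01-10", 99.0),   # Monday; the previous trading day is Friday
]


def daily_returns(closes):
    """Return (date, close_today / close_previous_day - 1) for each day after the first."""
    returns = []
    for (_, prev_close), (date, close) in zip(closes, closes[1:]):
        returns.append((date, close / prev_close - 1))
    return returns


returns = daily_returns(closes)
for date, value in returns:
    print(date, round(value, 4))
```

The first trading day has no previous close, so it produces no return and the result has one row fewer than the price history.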


In [None]:
import seaborn as sns

In [None]:
volume_mean: pd.Series = data['Volume'].mean()
top_5 = volume_mean.sort_values(ascending=False)[:5].index
top_5

In [None]:
top_5_close = data['Close'][top_5]
top_5_close

In [None]:
# Daily return = close_today / close_previous_trading_day - 1.
# shift(1) lines each row up with the previous trading day's close, so every
# return is measured against the prior row rather than against the first row.
daily_return = (top_5_close / top_5_close.shift(1) - 1).dropna().astype('float')
daily_return

In [None]:
sns.pairplot(daily_return)

# Problem 4. The Most Strongly Correlated Company Pair of the 5 Most Traded Companies (1 percentage point)

Among the 5 most traded DJIA companies, which two companies have the most strongly linearly correlated daily returns? What is the p-value of the correlation?

**Requirements**

1. The 5 most traded companies should be identified by looking at the daily average `Volume` of data generated in problem 1.

2. The daily return of a stock on a specific day is defined as `close_today / close_previous_day - 1`, where `close_today` is the `Close` price of the stock on that day and `close_previous_day` is the `Close` price of the stock on the previous trading day. Note that the previous trading day is not necessarily the previous calendar day, since that day could be a holiday or fall on a weekend.
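As a small illustration of what "strongest linear correlation" means, the sketch below computes the Pearson coefficient `r` for every pair and picks the pair with the largest magnitude `|r|`, so a strongly negative pair counts as strongly correlated. The tickers and return values are hypothetical, and `pearson_r` is a hand-written helper for illustration; it does not compute a p-value.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily returns; BBB is exactly -2 times AAA.
returns = {
    "AAA": [0.01, -0.02, 0.03, 0.00, -0.01],
    "BBB": [-0.02, 0.04, -0.06, 0.00, 0.02],
    "CCC": [0.01, 0.01, -0.02, 0.03, -0.01],
}

# One coefficient per unordered pair of tickers.
correlations = {
    pair: pearson_r(returns[pair[0]], returns[pair[1]])
    for pair in combinations(returns, 2)
}

# Rank by magnitude so that negative correlations are not overlooked.
strongest = max(correlations, key=lambda pair: abs(correlations[pair]))
print(strongest, correlations[strongest])
```

Here the strongest pair has a negative coefficient, which a comparison of raw `r` values would have ranked last.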

In [None]:
from itertools import combinations

from scipy.stats import linregress

# Regress each unordered pair of the top-5 tickers exactly once
results = {
    (row, col): linregress(daily_return[row], daily_return[col])
    for row, col in combinations(top_5, 2)
}

# The strength of a linear correlation is |r|, so compare absolute values;
# otherwise a strongly negative pair would be ranked below a weak positive one
most_correlated = max(results, key=lambda pair: abs(results[pair].rvalue))
most_correlated

The pair returned above ('INTC' and 'VZ' in the recorded run) has the most strongly linearly correlated daily returns. The p-value of that correlation is:

In [None]:
results[most_correlated].pvalue