# QF 627 Programming and Computational Finance
## Individual Assessment

***

> Good evening, Team. 👋

> This assessment is prepared so that you may review what you have learned in the course. You can find the answers in the lessons and in the scripts of each lesson that you have received throughout the course. Please do not feel under pressure. `Read each question carefully and answer accordingly`.

> Using Python in real-world financial data analysis does not mean simply executing a single, independent chunk of code. It requires a chain of lines of code with a sharp logical progression. To give you good practice for real-world work, the questions here, just like all the lessons and exercise problem sets, require you to work through `inter-related` and `logically deduced lines of programming`.

***

> Below are 10 questions. Each question asks you to write a sequence of code that leads to an answer. First `ensure you fully understand the question`, so that you do not overlook essential steps and answers. When a question asks for answers in addition to lines of code, provide them in a markdown cell.

***

> Be sure to submit your work before the deadline: `9:30pm tonight, November 8, 2022`. It is an open-book exercise, and is also a timed task. To be fair to all students, a late submission will incur a point reduction.

> Please use `your last name` for `naming your submission` file (e.g., `Roh.ipynb`).

> If you find that you cannot answer a question, it would be wise to move on to another question that you can answer, and to finish that one first. `Make the best use of the time available`. If you cannot fully answer all the questions, then do as much as you can.

***

> Under a relative grading scheme, not everyone can receive an A grade for the course. This is school policy. If you find the questions easy, that does not guarantee a good final result. If you find the questions a bit difficult, that is so that you may be given a valid and fair assessment. `It does not mean that you are failing`.

***

> Rather than feeling pressured by the assessment, I hope you will enjoy the opportunity presented by the hands-on exercise. You will notice that `answering each question will further consolidate your learning`.

***

> I wish you the best for your individual assessment, Team.🤞

### For standardization of your answers…

> Please execute the lines of code below before you start work on your answers.

In [None]:
# Our standardized printing options

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib as mpl

np.set_printoptions(precision = 3)

pd.set_option("display.float_format", lambda x: "%.3f" % x)

plt.style.use("ggplot")

mpl.rcParams["axes.grid"] = True
mpl.rcParams["grid.color"] = "grey"
mpl.rcParams["grid.alpha"] = 0.25

mpl.rcParams["axes.facecolor"] = "white"

mpl.rcParams["legend.fontsize"] = 14

## 👇 Questions 1 and 2 ask you to assess the correlations of stock returns, using two methods of analysis.

###  <font color = blue> 👉 Question 1 </font>. Extract the stock prices of the following ticker symbols, from July 2013 to June 2022.

| Security | Symbol |
| -------- | ------ |
| Merck | `MRK` |
| Marriott | `MAR` |
| 3M | `MMM` |
| Adobe | `ADBE` |
| Aon | `AON` |
| American Airlines | `AAL` |
| Capital One | `COF` |
| Coca-Cola | `KO` |
| Citigroup | `C` |

### Assess which of the pairs of tickers (there are 36 unique possible pairs) appear to show the closest relationships (i.e., greatest correlations) when comparing daily percentage changes.
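The pairwise comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the expected solution: the prices are synthetic and the tickers `A` to `D` are hypothetical stand-ins for downloaded data. With four tickers there are 4 × 3 / 2 = 6 unique pairs; with the nine tickers above, the same logic gives 9 × 8 / 2 = 36.

```python
import numpy as np
import pandas as pd

# Synthetic returns stand in for downloaded data (hypothetical tickers A-D).
rng = np.random.default_rng(627)
common = rng.normal(0, 0.01, 500)  # shared shock that drives both A and B
returns = pd.DataFrame({
    "A": common + rng.normal(0, 0.002, 500),
    "B": common + rng.normal(0, 0.002, 500),
    "C": rng.normal(0, 0.01, 500),
    "D": rng.normal(0, 0.01, 500),
})
prices = 100 * (1 + returns).cumprod()

# Daily percentage changes, then the pairwise correlation matrix.
daily_pct = prices.pct_change().dropna()
corr = daily_pct.corr()

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

n_pairs = len(pairs)       # n * (n - 1) / 2 unique pairs
top_pair = pairs.index[0]  # the pair with the greatest correlation
print(n_pairs, top_pair)
```

Masking the upper triangle avoids counting each pair twice and excludes the trivial correlation of a ticker with itself.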

### Make sure to provide the lines of code that lead to your answers, and give your answers in `Answer 1 cell`.

### <font color = red> Answer 1 </font>

    The answer is ____________&______________ .
    

###  <font color = blue> 👉 Question 2 </font>. Let’s look for clusters of correlations using the agglomerative hierarchical clustering technique (AGNES) for the ticker symbols above. Run the analysis and prepare a dendrogram. 

### According to the dendrogram, which of the stocks are most strongly correlated? Working from the dendrogram, please also identify two stocks that are not well correlated.

### Report the results using the `average` and `ward` methods, respectively.

### Below are the lines of code that lead to an answer:

### <font color = red> Answer 2 </font>

    The answer is ________________________________________________ .

## 👇 Questions 3 to 5 ask you to create a predictive model for the weekly return of IBM stock. You will use supervised learning for your predictive modelling.

### <font color = "green"> NOTE: There are 10 questions in the assessment, and each question carries three credits. Questions 3–5 therefore carry nine credits in total, although they comprise only two tasks. </font>
    
### <font color = "green"> Question 3 is about building a predictive model. For correctly completing all parts of it, you will receive three credits. </font>
    
### <font color = "green"> In Questions 4 and 5, you are competing with your classmates on the performance of your predictive models. </font>
    
### <font color = "green"> Predictive models are ultimately assessed on their predictive performance. Questions 4 and 5 are subject to `relative grading`, based on your predictive model’s performance. That is, your answer here will be assessed relative to other students’ performances. </font>

You must give your best algorithm (`best`, based on performance metrics). 

* if your best model’s performance is among the `top three` results in class, you will receive `six points`; 
* if your best model’s performance is `between the 4th and 6th ranks`, you will receive `four points`; 
* `the rest of our classmates` will receive `two points`.

* `MSE` will be the first criterion for performance appraisal. In the event of a tie, `R-squared` will be used as a tie-breaker.

> Among the three major factors (correlated assets, technical indicators, and fundamental analysis), you will use correlated assets and technical indicators as input features here.

    Step 0. The analysis horizon is 10 years between 2010 and 2019.
    
    Step 1. Use 80% of your data for the training of your algorithm, and 20% for the testing set.

    Step 2. For your feature engineering...
    
> Our operational definition of `outcome` (`Y`) is the weekly return of IBM. The number of trading days in a week is assumed to be five, and we compute the return using five trading days. 
<br>
    
### <font color = "green"> NOTE: The lagged five-day variables embed the time series component by using a time-delay approach, where the lagged variable is included as one of the predictor variables. This step translates the time series data into a supervised regression-based model framework. </font>
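The time-delay framing can be sketched as below. This is an illustration under assumptions rather than the required implementation: the price series is synthetic, log returns are used, and the choice to keep every fifth row (so that consecutive outcomes do not overlap) is one common convention, not something the text prescribes. All variable names are hypothetical.

```python
import numpy as np
import pandas as pd

return_period = 5  # five trading days per week, as assumed in the text

# Synthetic business-day price series (a stand-in for IBM prices).
rng = np.random.default_rng(627)
dates = pd.bdate_range("2010-01-04", periods=300)
price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))),
                  index=dates, name="price")
log_price = np.log(price)

# Outcome Y: the return over the NEXT five days, shifted back to align with today.
Y = log_price.diff(return_period).shift(-return_period).rename("Y_fwd_5d")

# Predictors X: returns over the PAST 5, 15, 30, and 60 days, all known today.
X = pd.concat({f"lag_{k}d": log_price.diff(k) for k in (5, 15, 30, 60)}, axis=1)

# Align Y with X, drop incomplete rows, and keep every fifth row
# so that consecutive five-day outcomes do not overlap (an assumption).
dataset = pd.concat([Y, X], axis=1).dropna().iloc[::return_period]

# Chronological 80/20 split: no shuffling, so the test set lies in the future.
split = int(len(dataset) * 0.8)
train, test = dataset.iloc[:split], dataset.iloc[split:]
print(train.shape, test.shape)
```

Because `Y` looks forward and every column of `X` looks backward, each row pairs information available today with a return that is realized later, which is what makes the problem a supervised regression.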
    
### <font color = "green"> For `input features` (`predictors`; `Xs`), you may `choose` to use any or all of the following features. </font>

> `Correlated assets`

* lagged five-day returns of stocks (AAPL, AMZN, MSFT);
* currency exchange rates (USD/JPY and GBP/USD);
* indices (S&P 500, Dow Jones, and VIX);
* lagged five-day, 15-day, 30-day, and 60-day returns of IBM.

> `Technical indicators`

* 21-day, 63-day, and 252-day moving averages;
* 10-day, 30-day, and 200-day exponential moving averages;
* 10-day, 30-day, and 200-day relative strength index;
* stochastic oscillator %K and %D (using rolling windows of 10, 30, and 200 days);
* rate of change (using prices from 10 and 30 days earlier).
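The indicators listed above can be sketched with plain pandas. The definitions below are common textbook conventions and should be read as assumptions: the RSI uses Wilder-style exponential smoothing, %D is taken as a 3-day average of %K, and the high and low series are crude synthetic stand-ins. The helper names (`sma`, `ema`, `roc`, `rsi`, `stochastic`) are hypothetical.

```python
import numpy as np
import pandas as pd

def sma(close, n):
    """Simple moving average over n days."""
    return close.rolling(n).mean()

def ema(close, n):
    """Exponential moving average with span n."""
    return close.ewm(span=n, min_periods=n).mean()

def roc(close, n):
    """Rate of change in percent versus the price n days earlier."""
    return (close / close.shift(n) - 1) * 100

def rsi(close, n):
    """Relative strength index with Wilder-style smoothing (alpha = 1/n)."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(com=n - 1, adjust=False).mean()
    loss = (-delta).clip(lower=0).ewm(com=n - 1, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)

def stochastic(close, low, high, n, smooth=3):
    """Stochastic oscillator: %K over n days, %D as a moving average of %K."""
    lowest, highest = low.rolling(n).min(), high.rolling(n).max()
    k = (close - lowest) / (highest - lowest) * 100
    return k, k.rolling(smooth).mean()

# Synthetic prices for illustration only.
rng = np.random.default_rng(627)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 400))))
high, low = close * 1.01, close * 0.99  # crude synthetic highs and lows

features = pd.DataFrame({
    "MA21": sma(close, 21),
    "EMA10": ema(close, 10),
    "RSI10": rsi(close, 10),
    "ROC10": roc(close, 10),
})
features["%K10"], features["%D10"] = stochastic(close, low, high, 10)
features = features.dropna()
print(features.tail())
```

The other window lengths in the list follow by changing `n`. Rows are dropped until the longest window has enough history.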

    
    Step 3. For your choice of algorithm, please assess the model performance of the following algorithms: 

* Linear Regression
* Elastic Net
* LASSO
* Support Vector Machine
* K-Nearest Neighbor
* ARIMA
* Decision Tree
* Extra Trees 
* Random Forest
* Gradient Boosting Tree
* Adaptive Boosting
    
    
    Step 4. For this question, hyperparameter tuning is not required. 
    
    Step 5. However, make sure to compare the model performance of the above algorithms.

### Set `num_folds` to `10` and `seed` to `627`.
    
### The metrics for assessing model performance will be mean squared error (`MSE`) and R-squared ($R^2$).
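For reference, the two metrics follow their standard definitions. The notation is an assumption, since the text does not fix symbols: $y_i$ is the observed weekly return, $\hat{y}_i$ is the model's prediction, $\bar{y}$ is the mean observed return, and $n$ is the number of observations in the evaluation set.

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

A lower MSE and a higher $R^2$ indicate better predictive performance.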

### Below are the lines of code that lead to <font color = red> Answer 3 </font>:

### Below are the lines of code that lead to <font color = red>Answer 4</font> and <font color = red>Answer 5</font>:

### <font color = green> NOTE: You must give your best algorithm here (`best`, based on performance metrics). </font>

### <font color = blue> 👉 Question 6</font>. In this question, you will perform principal component analysis (PCA) for portfolio management.
    
### As you learned in the course, the principal components of the correlation matrix capture most of the covariation among assets in descending order and are mutually uncorrelated. Importantly, we can employ standardized principal components as portfolio weights. 
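The idea can be sketched with NumPy alone. This is a minimal illustration under assumptions, not the course's reference code: the returns are synthetic, the tickers `S0` to `S5` are hypothetical, and "standardized" is interpreted here as rescaling a component's loadings so that they sum to one. A component whose loadings sum to nearly zero would produce extreme weights under that rescaling.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns: one shared market factor plus asset-specific noise.
rng = np.random.default_rng(627)
n_days, n_assets = 750, 6
market = rng.normal(0, 0.01, (n_days, 1))
returns = pd.DataFrame(market + rng.normal(0, 0.005, (n_days, n_assets)),
                       columns=[f"S{i}" for i in range(n_assets)])

# 75% of the sample for PCA, 25% held out for backtesting.
split = int(n_days * 0.75)
train, test = returns.iloc[:split], returns.iloc[split:]

# Eigen-decomposition of the training correlation matrix.
corr = train.corr()
eigvals, eigvecs = np.linalg.eigh(corr.values)  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]               # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()             # share of variance per component

# Rescale the first component's loadings to sum to one and use them as weights.
pc1 = eigvecs[:, 0]
weights = pd.Series(pc1 / pc1.sum(), index=returns.columns)

# Out-of-sample performance of the resulting portfolio.
portfolio_returns = test.dot(weights)
cumulative = (1 + portfolio_returns).cumprod()
print(explained.round(3), weights.round(3), sep="\n")
```

The columns of `eigvecs` are orthogonal, which is why the components are mutually uncorrelated. Repeating the weight construction for `eigvecs[:, 1]`, `eigvecs[:, 2]`, and so on gives the other candidate portfolios.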

### Randomly choose 30 stock tickers from the DataFrame below.
    
### Set the seed to 627 so that everyone starts the analysis with an identical set of stock tickers.
    
### Your objective is to find the best-performing portfolio, using PCA. Using what you learned in class, identify the profile of each portfolio. Please visualize and show the relative performances of the four portfolios (Disregard the portfolio that shows infinite returns, and use the other four).
    
### Please use 75% of your data for PCA and 25% for backtesting. 
    
### <font color = "green"> NOTE: The investment horizon will be 7 years, from 2013 to 2019. </font>

In [None]:
list_of_tickers = pd.read_html("https://en.wikipedia.org/wiki/Nasdaq-100")[4]

list_of_tickers

### Below are the lines of code that lead to an answer:

### <font color = red> Answer 6 (`including visualization component`) is presented in the cell below: </font>

## 👇 Questions 7 to 10 ask you to build, execute, and backtest a `moving average crossover strategy`.

###  <font color = blue> 👉 Question 7. </font> Our securities of interest are Apple (`AAPL`), Google (`GOOGL`), and Microsoft (`MSFT`) stocks. The time period for analysis is from October 2006 to December 2012.

### The strategy that you'll be developing is as follows: you create two separate Simple Moving Averages (SMA) of a time series with differing lookback periods (here, 40 days and 100 days). If the short moving average exceeds the long moving average, you go long; if the long moving average exceeds the short moving average, you exit.

### On the days that the signal is 1 and the short moving average crosses the long moving average (for the period greater than the shortest moving average window), you'll buy 100 shares. On the days when the signal is 0, the final result will be 0 as a result of the operation 100 × signal.

### For rolling statistics, set `min_periods` to `1` and the `center` argument to `False`.

### Use the `adjusted` closing price.

### Let’s suppose that you started from a `$100,000` capital base for each of the three securities.

### Disregarding commission, how much will you have in the end in your account for each of the securities as a result of the current momentum-based trading?
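The mechanics of the crossover signal and the account value can be sketched as follows. This is a hedged illustration on a synthetic price series, so the resulting figure says nothing about the actual securities. The variable names are hypothetical, and commission is ignored as stated above.

```python
import numpy as np
import pandas as pd

short_window, long_window = 40, 100
initial_capital = 100_000.0
shares_per_trade = 100

# Synthetic adjusted closing prices (illustration only).
rng = np.random.default_rng(627)
dates = pd.bdate_range("2006-10-02", periods=600)
price = pd.Series(50 * np.exp(np.cumsum(rng.normal(0.0005, 0.015, 600))),
                  index=dates)

# Two simple moving averages with the rolling settings given in the text.
short_mavg = price.rolling(short_window, min_periods=1, center=False).mean()
long_mavg = price.rolling(long_window, min_periods=1, center=False).mean()

# Signal is 1 while the short average is above the long average,
# but only after the first `short_window` observations.
signal = (short_mavg > long_mavg).astype(float)
signal.iloc[:short_window] = 0.0

# Orders: +1 marks a buy (entry), -1 marks a sell (exit).
orders = signal.diff().fillna(0.0)

# Position is 100 x signal; cash falls when shares are bought and rises when sold.
shares_held = shares_per_trade * signal
holdings = shares_held * price
cash = initial_capital - (shares_held.diff().fillna(0.0) * price).cumsum()
total = cash + holdings

final_value = total.iloc[-1]
print(f"Final account value: ${final_value:,.2f}")
```

Because trades are booked at the same day's price, the account value changes from one day to the next only through the shares held on the previous day times the price change.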

### Below are the lines of code that lead to an answer:

### <font color = red> Answer 7 </font>


    AAPL  : _$______________ 
    
    GOOGL : _$______________ 
    
    MSFT  : _$______________ 


###  <font color = blue> 👉 Question 8. </font>  

### Calculate and visualize the maximum drawdowns and the longest drawdown periods for `AAPL`, `GOOGL`, and `MSFT`.
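A minimal sketch of both quantities follows, assuming one common convention: drawdown is measured as the gap between the running maximum of cumulative gross returns and the current level (which yields percentage points), and a drawdown period is the time between two successive running highs. The series below is hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
import pandas as pd

# Hypothetical cumulative gross returns of a strategy (1.00 = starting value).
dates = pd.bdate_range("2012-01-02", periods=8)
cum_returns = pd.Series([1.00, 1.10, 1.05, 0.99, 1.12, 1.20, 1.14, 1.25],
                        index=dates)

# Drawdown: distance between the running maximum and the current level.
running_max = cum_returns.cummax()
drawdown = running_max - cum_returns
max_drawdown = drawdown.max()  # 1.10 - 0.99, about 11 percentage points

# Dates on which the strategy sits at a running high (drawdown of exactly zero).
highs = drawdown[drawdown == 0].index

# Drawdown periods: time elapsed between consecutive running highs.
periods = highs[1:] - highs[:-1]
longest_drawdown = periods.max()  # measured in calendar days

print(f"{max_drawdown * 100:.1f} percentage points, {longest_drawdown.days} days")
```

Note that the period is counted in calendar days because it is a difference of dates. Counting trading days instead would require differencing row positions.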

### Below are the lines of code that lead to an answer:

### <font color = red> Answer 8 (`visualization component`) is presented in the cell below: </font>

### <font color = red> Answer 8 </font>
    
    As to AAPL,
    
    The maximum drawdown is about ____________ percentage points.
    The longest drawdown period lasts for _____________ days.
    
    As to GOOGL,
    
    The maximum drawdown is about ____________ percentage points.
    The longest drawdown period lasts for _____________ days.
    
    As to MSFT,
    
    The maximum drawdown is about ____________ percentage points.
    The longest drawdown period lasts for _____________ days.


###  <font color = blue> 👉 Question 9. </font> Using the current momentum strategy, which of the securities shows the greatest Sharpe ratio?
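One common way to compute the ratio is sketched below. The conventions are assumptions, not requirements stated in the question: a risk-free rate of zero and annualization with 252 trading days. The function name `annualized_sharpe` and the two return series are hypothetical.

```python
import numpy as np
import pandas as pd

def annualized_sharpe(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from daily returns (sample standard deviation)."""
    excess = daily_returns - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std()

# Hypothetical daily strategy returns: same mean, different volatility.
daily = pd.DataFrame({
    "X": [0.01, -0.01, 0.02, 0.00],
    "Y": [0.01, -0.03, 0.04, 0.00],
})

sharpe = daily.apply(annualized_sharpe)  # one ratio per column
best = sharpe.idxmax()
print(sharpe.round(3), best, sep="\n")
```

Both series have the same average return, so the less volatile one earns the higher ratio.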

### Below are the lines of code that lead to an answer:

### <font color = red> Answer 9 </font>

    The answer is ____________________________ .

###  <font color = blue> 👉 Question 10. </font> Report the compound annual growth rate (CAGR) for `AAPL`, `GOOGL`, and `MSFT`.
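As a reminder of the arithmetic, CAGR is the constant yearly growth rate that takes the starting value to the ending value over the elapsed time. The sketch below assumes a 365-day year and uses a hypothetical account series. The helper name `cagr` is not from the course material.

```python
import pandas as pd

def cagr(values, days_per_year=365.0):
    """Compound annual growth rate of a date-indexed value series."""
    days = (values.index[-1] - values.index[0]).days
    return (values.iloc[-1] / values.iloc[0]) ** (days_per_year / days) - 1

# Hypothetical account values: $100,000 grows to $121,000 over two years.
account = pd.Series(
    [100_000.0, 108_000.0, 121_000.0],
    index=pd.to_datetime(["2010-01-01", "2011-01-01", "2012-01-01"]),
)

growth = cagr(account)
print(f"{growth:.2%}")  # 10.00%
```

Only the first and last values matter: 1.21 raised to the power 1/2 is 1.10, so the account grew at 10% per year regardless of the path in between.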

### Below are the lines of code that lead to an answer:

### <font color = red> Answer 10 </font>


    AAPL  : _____________%__ 
    
    GOOGL : _____________%__ 
    
    MSFT  : _____________%__ 


> 💯 “Thank you for putting your efforts into the individual assessment questions” 😊