# WEIGHTED COIN MODEL

## ML NOTEBOOK 

The objective of this tutorial notebook is to illustrate how to generate a prediction to *buy* or *sell* a unit of a given security using a weighted coin model.

There are three distinct parts to this notebook:

- data preparation;
- ML analysis;
- prediction.

### LIBRARY IMPORTS

Before we begin, we import the relevant classes from the project scripts used in this notebook.


In [1]:
from dataprep import PriceData
from ml_analysis import mlModel, mlResults
from predict import Predict

**Note.** *We do not need to import any core Python libraries such as `pandas`, `numpy` or `scikit-learn`. These are all imported in the scripts `dataprep`, `ml_analysis` and `predict`.*

## DATA PREP

### PREPARATORY USER INPUTS

Inputs to prepare:

- time period (starting and ending dates) over which to analyse;
- interval length (in days);
- depth (dimension of feature vector).


In [2]:
period = '', '' # pass two string dates (start, end), format YEAR-MONTH-DAY
intv = int() # pass an integer number of days e.g., 3 for 3 days
depth = int() # pass an integer, defines feature vector dimension
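The placeholders above must be replaced before running the rest of the notebook: `int()` evaluates to `0` and `''` is not a valid date. As a minimal sketch, the inputs could be filled in and sanity-checked as follows. The concrete values and the helper `check_inputs` are hypothetical and are not part of `dataprep`.

```python
from datetime import date

# Hypothetical example values; choose your own dates, interval and depth.
period = '2020-01-01', '2020-12-31'  # (start, end), format YEAR-MONTH-DAY
intv = 3    # interval length in days
depth = 5   # dimension of the feature vector


def check_inputs(period, intv, depth):
    """Run basic sanity checks and return the number of days spanned by period.

    Hypothetical helper for illustration only; it is not defined in dataprep.
    """
    start, end = (date.fromisoformat(d) for d in period)
    if start >= end:
        raise ValueError("the start date must precede the end date")
    if intv < 1 or depth < 1:
        raise ValueError("intv and depth must be positive integers")
    return (end - start).days


n_days = check_inputs(period, intv, depth)
print(n_days)
```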

#### PRICE DATAFRAME

With the above inputs we can instantiate a *price dataframe object* and then call methods on it to carry the analysis forward.

In [3]:
# priceDataFrame = PriceData(period, intv, depth)

### TRAINING AND TEST DATA OUTPUTS

With the price dataframe initialized, we can call methods on it with respect to a security of interest.

#### TICKER CODE

In [4]:
ticker = '' # pass a valid ticker code

#### HISTORICAL DATA TO CSV

We can use the API token from `secrets.py` to call data from **IEX Cloud**, download historical data between the dates specified in the input `period` and save it to a `.csv` file in the folder `datasets/`. 

To do this, call `.to_csv(ticker)` on the `priceDataFrame` object created above.

In [5]:
# priceDataFrame.to_csv(ticker)

#### TRAINING AND LABELS

Once the historical data is downloaded and saved, call `.parse(ticker)` on the dataframe object to generate the training and label datasets.

These are saved again as `.csv` files in the folder `datasets/`.

In [6]:
# priceDataFrame.parse(ticker)
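The notebook does not show how `.parse(ticker)` builds its datasets, so the following is only a sketch of one plausible scheme, stated under assumptions: prices are sampled every `intv` days, each interval is marked $+1$ if the price rose and $-1$ otherwise, each feature vector holds `depth` consecutive interval moves, and the label is the move that follows. The function `make_dataset` and the toy price list are hypothetical.

```python
def make_dataset(prices, intv, depth):
    """Build (features, labels) from a list of daily prices.

    Assumed scheme (not confirmed by dataprep): a feature vector is the
    sequence of `depth` consecutive interval moves, and its label is the
    move over the next interval.
    """
    sampled = prices[::intv]  # one price per intv-day interval
    # +1 if the price rose over the interval, -1 otherwise
    moves = [1 if later > earlier else -1
             for earlier, later in zip(sampled, sampled[1:])]
    n_rows = len(moves) - depth
    features = [moves[i:i + depth] for i in range(n_rows)]
    labels = [moves[i + depth] for i in range(n_rows)]
    return features, labels


# Toy prices, made up purely for illustration.
toy_prices = [10, 11, 12, 11, 13, 14, 13, 15, 16]
features, labels = make_dataset(toy_prices, intv=2, depth=2)
print(features)  # [[1, 1], [1, -1]]
print(labels)    # [-1, 1]
```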

### THE WEIGHTED COIN

With the data prepared so far, we can call the method `.weightings(ticker)` on the dataframe object to return the `BUY/SELL` signal rate for the security between the dates specified in `period`.


In [7]:
# priceDataFrame.weightings(ticker)

#### GENERAL CONCLUSION

The weighted coin records how often, in retrospect, we should have bought or sold the security in order to maximize our returns, given the movement in price over each `intv`-day interval. That is, imagine that at the start of each `intv`-day interval between the dates in `period` we decide whether to buy or sell by tossing a weighted coin.

- if the `BUY` signal rate is *higher* than the `SELL` signal rate, the security price has tended to appreciate over a typical `intv`-day interval;
- if the `BUY` signal rate is *lower* than the `SELL` signal rate, the security price has tended to depreciate over a typical `intv`-day interval.
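To make the weighted coin concrete, here is a minimal sketch of how the signal rates could be computed from a list of daily prices. It assumes that an interval counts as a `BUY` when the price rose over it and as a `SELL` otherwise; the actual rule inside `.weightings(ticker)` may differ. The function `signal_rates` and the toy prices are hypothetical.

```python
def signal_rates(prices, intv):
    """Return the fraction of intv-day intervals labelled BUY and SELL.

    Assumption: BUY (+1) when the price rose over the interval,
    SELL (-1) otherwise.
    """
    sampled = prices[::intv]  # one price per intv-day interval
    moves = [1 if later > earlier else -1
             for earlier, later in zip(sampled, sampled[1:])]
    if not moves:
        raise ValueError("need at least two sampled prices")
    buy_rate = moves.count(1) / len(moves)
    return {'BUY': buy_rate, 'SELL': 1 - buy_rate}


# Toy prices, made up purely for illustration.
rates = signal_rates([10, 11, 12, 11, 13, 14, 13, 15, 16], intv=2)
print(rates)  # {'BUY': 0.75, 'SELL': 0.25}
```

With these toy prices the coin is weighted three to one in favour of `BUY`, which corresponds to the first case above.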

## ML ANALYSIS

In this section we do not need to make any API calls. In `DATA PREP` we used an API to generate data and parsed it into classified training data (training data and labels). With these datasets we can move to train a classifier and build a machine learning model.

### TRAINING

The parameter `trainTestSplit` is the fraction of the classified training data that is set aside for validation. This held-out data is used to test the accuracy of the model's predictions.


In [8]:
trainTestSplit = float() # enter a number between 0 and 1
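As an illustration of what such a split does, the sketch below holds out the final fraction of the rows for validation. Whether `ml_analysis` splits chronologically like this or shuffles first (as `scikit-learn`'s `train_test_split` does by default) is not stated in this notebook, so the helper `holdout_split` is an assumption made for illustration.

```python
def holdout_split(data, labels, test_fraction):
    """Split rows into training and validation parts without shuffling.

    Hypothetical helper: the last `test_fraction` of the rows is held out.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    n_test = max(1, round(len(data) * test_fraction))
    cut = len(data) - n_test
    if cut < 1:
        raise ValueError("not enough rows to leave any training data")
    return data[:cut], data[cut:], labels[:cut], labels[cut:]


# Ten made-up rows with alternating labels.
rows = [[i, i + 1] for i in range(10)]
signs = [1 if i % 2 == 0 else -1 for i in range(10)]
data_train, data_test, labels_train, labels_test = holdout_split(rows, signs, 0.2)
print(len(data_train), len(data_test))  # 8 2
```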

In `ml_analysis.py`, the following models from `scikit-learn` are trained with their default hyperparameters:

- Perceptron;
- Logistic Regression;
- Support Vector Machine;
- Naive Bayes;
- Decision Tree;
- Random Forest;
- $k$-Nearest Neighbours.
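For orientation, the sketch below fits one `scikit-learn` estimator per entry in the list above, each with default hyperparameters, on synthetic data. The exact estimator classes used in `ml_analysis.py` are not shown in this notebook, so the choices here (for example `SVC` for the support vector machine and `GaussianNB` for naive Bayes) are assumptions, as is the synthetic dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for the parsed price features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One estimator per listed model, default hyperparameters throughout.
# The specific classes are assumed, not taken from ml_analysis.py.
models = {
    'Perceptron': Perceptron(),
    'Logistic Regression': LogisticRegression(),
    'Support Vector Machine': SVC(),
    'Naive Bayes': GaussianNB(),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
    'k-Nearest Neighbours': KNeighborsClassifier(),
}

# Fit each model and record its accuracy on the held-out rows.
scores = {}
for name, estimator in models.items():
    estimator.fit(X_train, y_train)
    scores[name] = estimator.score(X_test, y_test)

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```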

We instantiate the class `mlModel`, passing the same parameters `period`, `intv` and `depth` used for the price dataframe earlier, together with `trainTestSplit`.

In [9]:
# model = mlModel(period, intv, depth, trainTestSplit)

Calling the method `.fitted(ticker)` then trains the above models and returns the datasets and fitted models based on the validation parameter `trainTestSplit` specified earlier.

In [10]:
# data_train, data_test, labels_train, labels_test, fitted_models = model.fitted(ticker)

### ML RESULTS

The outputs of `model.fitted(ticker)` above are then passed to initialise an `mlResults` object.

In [11]:
# results = mlResults(data_train, data_test, labels_train, labels_test, fitted_models)

Calling `.show()` returns a table recording the performance of each classifier.

In [12]:
# results.show()

#### BEST CLASSIFIER

Calling the method `.bestModel(metric)` on the `results` object above returns the classifier which outperformed all others with respect to the chosen metric.


In [13]:
# metric = '' # e.g., metric = 'Aggregate score'
# classifier = results.bestModel(metric)
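Selecting the best classifier amounts to taking the maximum over one column of the results table. The sketch below shows the idea on a plain dictionary. The function `best_model`, the `'Accuracy'` column and all the scores are made up for illustration; only the metric name `'Aggregate score'` comes from the cell above, and the real `mlResults` table may be structured differently.

```python
def best_model(results, metric):
    """Return the name of the classifier with the highest value of `metric`.

    `results` maps classifier name -> {metric name -> score}.
    Hypothetical stand-in for results.bestModel(metric).
    """
    missing = [name for name, row in results.items() if metric not in row]
    if missing:
        raise KeyError(f"metric {metric!r} missing for: {missing}")
    return max(results, key=lambda name: results[name][metric])


# Illustrative scores only; these are not real results.
example_results = {
    'Perceptron': {'Accuracy': 0.52, 'Aggregate score': 0.50},
    'Decision Tree': {'Accuracy': 0.55, 'Aggregate score': 0.58},
    'Random Forest': {'Accuracy': 0.57, 'Aggregate score': 0.56},
}

print(best_model(example_results, 'Aggregate score'))  # Decision Tree
print(best_model(example_results, 'Accuracy'))         # Random Forest
```

Note that the winner can change with the metric, which is why `metric` is an explicit argument.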

## PREDICTION

The last part of this notebook is prediction. As of *now*, should we buy or sell a unit of the security which we have been studying so far? The variable `classifier` in the previous code block is the classifier which outperformed the others in our list of possible classifiers with respect to buy/sell prediction.

This classifier, along with the parameters `intv` and `depth` specified for the price dataframe object, is now passed to instantiate a prediction object.

In [14]:
# prediction = Predict(intv, depth, classifier)

Call `.predict(ticker)` on the prediction object to generate a prediction on whether to buy or sell. 

Recall:

- $+1$ is a *buy* signal;
- $-1$ is a *sell* signal.

In [15]:
# prediction.predict(ticker)
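To show how a $\pm 1$ output turns into a decision, here is a minimal sketch of the final step. It assumes the feature vector is the most recent `depth` interval moves and that the classifier exposes a `scikit-learn`-style `.predict(rows)` method. The stub classifier, the function `latest_signal` and the toy prices are hypothetical and do not reflect the internals of `predict.py`.

```python
class FollowLastMove:
    """Stub classifier with a scikit-learn style predict() method.

    It simply repeats the most recent move; a fitted model would go here.
    """

    def predict(self, rows):
        return [row[-1] for row in rows]


def latest_signal(prices, intv, depth, classifier):
    """Map the classifier's output for the latest feature vector to BUY/SELL."""
    # Sample every intv days, counting back from the most recent price.
    sampled = prices[::-intv][::-1]
    moves = [1 if later > earlier else -1
             for earlier, later in zip(sampled, sampled[1:])]
    if len(moves) < depth:
        raise ValueError("not enough price history for the requested depth")
    features = moves[-depth:]  # assumed feature layout
    signal = classifier.predict([features])[0]
    return {1: 'BUY', -1: 'SELL'}[signal]


# Toy prices, made up purely for illustration.
toy_prices = [10, 11, 12, 11, 13, 14, 13, 15, 16]
decision = latest_signal(toy_prices, intv=2, depth=2, classifier=FollowLastMove())
print(decision)  # BUY
```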