# Using Machine Learning Techniques to Boost the Capital Asset Pricing Model

## Aim

The aim of this project is to apply Sentiment Analysis to companies' Annual Reports and Artificial Neural Networks to stock prices in order to boost the performance of the CAPM. The approach seeks to reduce the risk of investing in distressed stocks by predicting each company's volatility, and to improve the accuracy of the predictions for the market return and risk-free rate components of the model.

Traditional investment techniques use fundamental analysis to value corporate shares: financial information is assessed to measure a company's management capacity, which allows investors to value the company and, in turn, its shares. Another traditional technique, technical analysis, forecasts share and bond prices based on historical data (Ganti & Segal 2019; Hayes 2019).

Return on investment comes with systematic risk as a trade-off: the higher the return, the higher the risk. This relationship is measured through the Capital Asset Pricing Model (CAPM), which provides an expected return for a given level of risk. The CAPM uses several components to measure this trade-off. It incorporates the time value of money through a "risk-free" rate, which is compared with the expected return of the market to calculate the Market Risk Premium. It also incorporates the risk arising from the volatility of the investment, represented by the value $\beta$ (Kenton 2019).

$$ Er = Rf + \beta * ( ERm - Rf ) $$

* $Er$: Expected return of the investment
* $Rf$: Risk-free rate
* $\beta$: Risk ("volatility") of the investment
* $ERm$: Expected return of the market
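The formula above can be evaluated directly. The following is a minimal sketch in Python; the function name `capm_expected_return` and the input values are illustrative assumptions, not figures from any data set.

```python
def capm_expected_return(risk_free_rate: float,
                         beta: float,
                         expected_market_return: float) -> float:
    """Evaluate Er = Rf + beta * (ERm - Rf)."""
    # The term in parentheses is the Market Risk Premium.
    market_risk_premium = expected_market_return - risk_free_rate
    return risk_free_rate + beta * market_risk_premium


# Illustrative inputs only (assumed values): Rf = 2%, beta = 1.5, ERm = 8%.
expected_return = capm_expected_return(
    risk_free_rate=0.02, beta=1.5, expected_market_return=0.08
)
print(f"Expected return: {expected_return:.4f}")
```

With these assumed inputs the Market Risk Premium is 6%, so the expected return is 2% + 1.5 × 6% = 11%.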

The CAPM has some major defects. The first is the assumption that risk ($\beta$) can be measured by the stock's price volatility, even though price movements in either direction are not equally risky. The second is the assumption that the risk-free rate ($Rf$) will remain constant during the discount period. The third is the assumption that future cash flows (returns) can be estimated with high accuracy for the discount; if that were possible, the CAPM would not be necessary (Kenton 2019).
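To make the first defect concrete, the sketch below estimates $\beta$ from historical returns. It assumes the conventional definition, $\beta = \mathrm{Cov}(R_i, R_m) / \mathrm{Var}(R_m)$, which is not spelled out in the text above; the function names and the return series are illustrative assumptions.

```python
from statistics import mean


def simple_returns(prices):
    """Convert a price series into period-over-period simple returns."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def estimate_beta(asset_returns, market_returns):
    """Estimate beta as Cov(asset, market) / Var(market) (assumed definition)."""
    n = len(asset_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_asset = mean(asset_returns)
    mean_market = mean(market_returns)
    covariance = sum(
        (a - mean_asset) * (m - mean_market)
        for a, m in zip(asset_returns, market_returns)
    ) / (n - 1)
    market_variance = sum((m - mean_market) ** 2 for m in market_returns) / (n - 1)
    if market_variance == 0:
        raise ValueError("market returns have zero variance")
    return covariance / market_variance


# Made-up market returns; the asset moves exactly twice as much as the market.
market = [0.01, -0.02, 0.015, 0.005, -0.01]
asset = [2 * r for r in market]
beta = estimate_beta(asset, market)
print(f"Estimated beta: {beta:.2f}")
```

Because the estimate is built from squared and cross deviations, upward and downward price moves contribute to it symmetrically, which is the limitation described in the first defect.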

The techniques required to develop this approach are Sentiment Analysis and Artificial Neural Networks. Sentiment analysis is a technique used to extract subjective information from text, which makes it possible to determine whether the writer feels positive, negative or neutral about the content of the text (Monkey Learn 2018).
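As a rough illustration of the positive/negative/neutral idea, the sketch below scores a text by counting words from two small word lists. The word lists, the threshold and the function names are hypothetical placeholders chosen for this example; a real financial dictionary would be far larger.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration).
POSITIVE_WORDS = {"growth", "profit", "improved", "strong"}
NEGATIVE_WORDS = {"loss", "decline", "impairment", "default"}


def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: (positives - negatives) / (positives + negatives)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    matched = positives + negatives
    if matched == 0:
        return 0.0  # no sentiment-bearing words found
    return (positives - negatives) / matched


def sentiment_label(score: float, threshold: float = 0.1) -> str:
    """Map a score to 'positive', 'negative' or 'neutral' (threshold is assumed)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


sample = "Revenue growth was strong despite a loss in one segment."
score = sentiment_score(sample)
print(sentiment_label(score))
```

A dictionary count like this ignores negation and context, which is one reason to also evaluate trained classifiers for the sentiment task.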

The aim is broken down into the following objectives.


1.	Create a potential portfolio
<br>

2.	Collect the companies' Annual Reports (10-K), including financial statements
<br>

3.	Extract meaning from the text
<br>

4.	Implement machine learning algorithms to predict stock returns
<br>

5.	Implement machine learning algorithms to predict risk-free rates
<br>

6.	Construct a Capital Asset Pricing Model that uses Sentiment Analysis as a volatility proxy and machine learning algorithms to estimate the market return and the risk-free rate


## Background


As explained above, fundamental analysis and technical analysis are traditionally used to evaluate and predict returns. Both have been the subject of research into many different applications of machine learning. Gradient Boosting Decision Trees, Support Vector Machines and Neural Networks have been tested for predicting stock returns, and non-linear models have been shown to outperform linear models (Sugitomo & Minami 2018).

Artificial Neural Networks (ANNs) have been studied for predicting capital market returns in combination with dimensionality reduction. PCA-based ANN classifiers were found to be the best of the models tested for predicting the SPDR S&P 500 ETF (Zhong & Enke 2019).

Sentiment analysis of Annual Reports has been used to predict financial distress in United States banking institutions. This research has shown that managers facing financial distress in their institutions are more likely to use negative words in their Annual Reports when describing their institution's economic situation (Gandhi, Loughran & McDonald 2019).
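One simple way to check for such a relationship is to correlate the share of negative words in each report with a distress indicator. The sketch below computes a Pearson correlation by hand; the numbers are invented for illustration and are not taken from Gandhi, Loughran & McDonald (2019), and the variable names are assumptions.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Invented example: share of negative words per report, and whether the
# institution later showed financial distress (1) or not (0).
negative_word_share = [0.010, 0.012, 0.030, 0.035]
distress_flag = [0, 0, 1, 1]

r = pearson_correlation(negative_word_share, distress_flag)
print(f"Correlation between negative tone and distress: {r:.2f}")
```

A high correlation on real data would support using report sentiment as a distress signal, although a correlation alone does not establish that the signal is predictive out of sample.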

Opinions on this topic are divided, and traditional investing techniques are still considered more reliable than the "computer" approach. This is because the information keeps changing, multiple factors affect the results, the data contains more noise than signal, and behaviour can change drastically in a short space of time (Dewey 2019).

This project proposes combining the above methods into a hybrid traditional/machine learning approach. Its objective is to minimize risk factors when creating a portfolio by using Annual Report Sentiment Analysis to predict whether a company is prone to financial distress and the direction in which its stock price will move. The proposed model also seeks to improve the prediction of the expected market return and the risk-free rate by implementing hybrid machine learning algorithms such as a PCA-based ANN classifier.

## Research Project



The proposed application of machine learning models to the CAPM seeks to reduce the error introduced by the model's underlying assumptions (market volatility, risk-free rate and market return predictions).

The proposed innovation of this project is to merge two state-of-the-art models (PCA-based ANN and Sentiment Analysis) to boost the performance of the traditional CAPM. The resulting hybrid model aims to improve investment decisions and mitigate risk factors, bridging the gap between the new era of machine learning applied to the stock market and the traditional approach to constructing investment techniques.

This proposal will ease the transition between traditional investment techniques and state-of-the-art machine learning models, bringing experienced investors and data scientists together to research the optimal use of both techniques.

The process by which this model would be developed is:

- 1.	Create potential portfolio:
    - Select potential companies for investment based on traditional methods such as screening and expert criteria
<br>

- 2.	Data Set:
    - Collect the companies' Annual Reports (10-K), including financial statements
    - Split the data into training and test sets using time frames
<br>

- 3.	Word List:
    - Extract and clean the text from the Annual Reports by removing all non-textual elements such as HTML, ASCII, tables, images, links and others, then vectorize the words using word2vec
    - Create the dictionary and classify words as positive or negative based on financial vocabulary
<br>

- 4.	Undertake Sentiment Analysis
    - Evaluate different classification algorithms for sentiment analysis task (Naive Bayes, Logistic Regression, Support Vector Machines and Neural Networks)
<br>

- 5.	Sentiment Analysis Evaluation:
    - Compare the Sentiment Analysis results with the financial statements to determine whether negative sentiment correlates with financial distress
<br>

- 6.	Implement machine learning algorithms to predict stock returns
    - Create and compare machine learning algorithms to find the best prediction of the expected returns (PCA-based NN, Gradient Boosting, Decision Trees, Support Vector Machines and Neural Networks)

- 7.	Implement machine learning algorithms to predict risk-free rates based on the 10-year US bond
    - Create and compare machine learning algorithms to find the best prediction of the risk-free rate (PCA-based NN, Gradient Boosting, Decision Trees, Support Vector Machines and Neural Networks)

- 8.	Construct a Capital Asset Pricing Model that uses Sentiment Analysis as a volatility proxy and machine learning algorithms to estimate the market return and the risk-free rate
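Steps 2 and 3 of the process above (splitting by time frame and cleaning the report text) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the record layout (`ticker`, `year`), the tickers, the cut-off year and the set of skipped tags are hypothetical, and real 10-K filings would need more thorough cleaning.

```python
import re
from html.parser import HTMLParser


class ReportTextExtractor(HTMLParser):
    """Collect visible text, skipping tables, scripts and style blocks."""

    SKIPPED_TAGS = {"table", "script", "style"}  # assumed choice of tags

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the text chunks and collapse runs of whitespace.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def clean_report(html: str) -> str:
    """Strip markup, tables and images from one report and return plain text."""
    parser = ReportTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def split_by_year(reports, first_test_year):
    """Time-based split: earlier years for training, later years for testing."""
    train = [r for r in reports if r["year"] < first_test_year]
    test = [r for r in reports if r["year"] >= first_test_year]
    return train, test


# Hypothetical records; tickers and years are placeholders.
reports = [
    {"ticker": "AAA", "year": 2016},
    {"ticker": "AAA", "year": 2017},
    {"ticker": "AAA", "year": 2018},
]
train_set, test_set = split_by_year(reports, first_test_year=2018)

raw = ("<p>Net income rose.</p><table><tr><td>1</td></tr></table>"
       "<img src='x.png'><p>Debt fell.</p>")
print(clean_report(raw))
```

Splitting on time rather than at random keeps future reports out of the training set, which matters when the goal is to predict later returns.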


The expected result of this project is a boosted CAPM capable of building a better investment portfolio with higher accuracy while reducing risk factors by minimizing the impact of the model's underlying assumptions. Figure 1 illustrates the expected schedule for developing the project.

Specialised personnel are required for this project. Two professional traders are needed to build the portfolio based on screening and trading experience; they will also select the variables that feed the models and build the CAPM. Two data scientists with finance knowledge are also required to build the Sentiment Analysis and the ANN models for stock price and risk-free rate prediction. The implementation will require powerful computers capable of processing large data sets and highly complex algorithms. Two cloud computing instances (AWS EC2 GPU instances) will be hired for testing the most complex models. The budget is detailed in Figure 2.

<center>Figure 1. Project Gantt Chart</center> 		
    
    
![image.png](attachment:image.png)

<center>Figure 2. Project Budget</center> 


![image.png](attachment:image.png)



## References 


Abdullah, S., Rahaman, M. & Rahman, M. 2013, Analysis of stock market using text mining and natural language processing.

Amazon Web Services, Amazon EMR - EC2 pricing - Amazon Web Services, viewed Oct 7, 2019, <https://aws.amazon.com/emr/pricing/>.

Dewey, R. 2019, 'Computer Models Won’t Beat the Stock Market Any Time Soon', Bloomberg.com, May.

Gandhi, P., Loughran, T. & McDonald, B. 2019, 'Using Annual Report Sentiment as a Proxy for Financial Distress in U.S. Banks', Journal of Behavioral Finance, pp. 1-13.

Ganti, A. & Segal, T. 2019, Fundamental Analysis, viewed Oct 3, 2019, <https://www.investopedia.com/terms/f/fundamentalanalysis.asp>.

Hayes, A. 2019, Technical Analysis Definition, viewed Oct 3, 2019, <https://www.investopedia.com/terms/t/technicalanalysis.asp>.

Kenton, W. 2019, Capital Asset Pricing Model (CAPM), viewed Oct 3, 2019, <https://www.investopedia.com/terms/c/capm.asp>.

Kim, R., Yoo, D. & Kim, G. 2016, 'Development of Prediction Model of Financial Distress and Improvement of Prediction Performance Using Data Mining Techniques', Information Systems Review, vol. 18, no. 2, pp. 173-98.

Monkey Learn 2018, Sentiment Analysis: Nearly Everything You Need to Know, viewed Oct 3, 2019, <https://monkeylearn.com/sentiment-analysis>.

Sugitomo, S. & Minami, S. 2018, 'Fundamental Factor Models Using Machine Learning', Journal of Mathematical Finance, vol. 08, pp. 111-8.

Zhong, X. & Enke, D. 2019, 'Predicting the daily return direction of the stock market using hybrid machine learning algorithms', Financial Innovation, vol. 5, no. 1, pp. 24.