# Part 1: Feature Engineering

Indicators are tools that help an investor or trader decide whether to buy or sell a stock.
Technical indicators (called features in this context) are constructed from stock data such as `price` or `volume`.
In this part we will create the following features: `Bollinger Bands`, `RSI`, `MACD`, `Moving Average`, `Return`, `Momentum`, `Change` and `Volatility`.

`Return` will serve as a **target** or dependent variable. Other features will serve as independent variables.
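
As a small illustration of the target, the sketch below computes the intraday return (close divided by open, minus one), rounded to three decimals as in the feature-generation cell later in this part. The three rows are copied from the first trading days in the Tesla data preview; the `sample` frame exists only for this illustration.

```python
import pandas as pd

# Open/Close values of the first three trading days from the data preview
sample = pd.DataFrame(
    {"Open": [19.0, 25.790001, 25.0], "Close": [23.889999, 23.83, 21.959999]},
    index=pd.to_datetime(["2010-06-29", "2010-06-30", "2010-07-01"]),
)

# Intraday return: how far the close ended above (+) or below (-) the open
sample["Return"] = (sample["Close"] / sample["Open"] - 1).round(3)
print(sample)
```

A positive value means the stock closed above its opening price that day; a negative value means it closed below.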

## Importing Libraries

In [1]:
import pandas_datareader as pdr
import os
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from importlib import reload
from features_engineering import ma7, ma21, rsi, macd, bollinger_bands, momentum, get_tesla_headlines

from bs4 import BeautifulSoup
import requests
from nltk.sentiment.vader import SentimentIntensityAnalyzer
warnings.filterwarnings('ignore')
plt.rcParams['figure.dpi'] = 227 # native screen dpi for my computer

# Original Data

In [3]:
tsla_df = pdr.get_data_yahoo('tsla', '1980')
tsla_df.to_csv('data/raw_stocks/tsla.csv')  # stored in data/raw_stocks/, the folder read back under "Generating Features"

Let's take a look at the historical data of **Tesla**.

In [4]:
tsla_df.head()

Date,High,Low,Open,Close,Volume,Adj Close
2010-06-29,25.0,17.540001,19.0,23.889999,18766300,23.889999
2010-06-30,30.42,23.299999,25.790001,23.83,17187100,23.83
2010-07-01,25.92,20.27,25.0,21.959999,8218800,21.959999
2010-07-02,23.1,18.709999,23.0,19.200001,5139800,19.200001
2010-07-06,20.0,15.83,20.0,16.110001,6866900,16.110001


In [5]:
tsla_df.describe()

Statistic,High,Low,Open,Close,Volume,Adj Close
count,2355.0,2355.0,2355.0,2355.0,2355.0,2355.0
mean,183.326119,176.897919,180.195248,180.210909,5585331.0,180.210909
std,114.760392,111.145495,113.008218,113.030067,11340590.0,113.030067
min,16.629999,14.98,16.139999,15.8,118500.0,15.8
25%,34.504999,33.395,33.975,34.02,1799750.0,34.02
50%,214.440002,206.850006,210.380005,210.089996,4481700.0,210.089996
75%,264.604996,255.834999,261.125,260.945007,7175250.0,260.945007
max,389.609985,379.350006,386.690002,385.0,505990000.0,385.0


### Checking for missing data

In [6]:
missing = tsla_df.isna().sum()  # number of missing values per column
print('No missing data' if missing.sum() == 0 else missing)

No missing data
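
To see what the check reports when values are missing, here is a minimal sketch on a made-up frame (`toy` and its values are purely illustrative): `isna().sum()` counts the missing entries per column, and summing those counts gives the total.

```python
import numpy as np
import pandas as pd

# Made-up prices with two gaps, only to show what the check reports
toy = pd.DataFrame({"Open": [19.0, np.nan, 25.0], "Close": [23.9, 23.8, np.nan]})

missing_per_column = toy.isna().sum()          # Series: column name -> number of NaNs
total_missing = int(missing_per_column.sum())  # total across all columns

if total_missing == 0:
    print("No missing data")
else:
    print(missing_per_column)
```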


# Generating Features

In [6]:
files = os.listdir('data/raw_stocks')
stocks = {}
for file in files:
    # Ticker name from the file name, e.g. 'TSLA.csv' -> 'tsla'
    name = file.lower().split('.')[0]
    df = pd.read_csv(os.path.join('data/raw_stocks', file))

    # Return: intraday return, close relative to open
    df['Return'] = round(df['Close'] / df['Open'] - 1, 3)
    # Change: difference from the previous day's close, in price units (not percent)
    df['Change'] = (df['Close'] - df['Close'].shift(1)).fillna(0)
    # Date: parse and use as the index
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)
    # Volatility: exponentially weighted standard deviation of the close (center of mass 21)
    df['Volatility'] = df['Close'].ewm(com=21).std()
    # Moving Average, 7 days
    df['MA7'] = ma7(df)
    # Moving Average, 21 days
    df['MA21'] = ma21(df)
    # Momentum
    df['Momentum'] = momentum(df['Close'], 3)
    # RSI (Relative Strength Index)
    df['RSI'] = rsi(df)
    # MACD (Moving Average Convergence/Divergence) and its signal line
    df['MACD'], df['Signal'] = macd(df)
    # Upper Band and Lower Band for Bollinger Bands
    df['Upper_band'], df['Lower_band'] = bollinger_bands(df)
    # Drop the warm-up rows where rolling indicators are not yet defined
    df.dropna(inplace=True)
    # Saving
    df.to_csv('data/stocks/' + name + '.csv')
    stocks[name] = df

stocks['tsla'].head()
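
The helper functions imported from `features_engineering` are not shown in this part. As a rough sketch of what such indicators typically compute, and not the author's implementation, the functions below use common textbook defaults: a 20-day window with two standard deviations for Bollinger Bands, a 14-day RSI built from simple rolling averages, 12/26/9-day EMAs for MACD, and momentum as the price difference over `period` days. All function names, window lengths, and the synthetic price series are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def sma(close: pd.Series, window: int) -> pd.Series:
    """Simple moving average over `window` observations."""
    return close.rolling(window).mean()


def bollinger(close: pd.Series, window: int = 20, n_std: float = 2.0):
    """Upper and lower band: moving average +/- n_std rolling standard deviations."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std()
    return mid + n_std * std, mid - n_std * std


def rsi_sketch(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI from simple rolling averages of gains and losses (range 0..100)."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = gain / loss  # a zero average loss gives inf, which maps to RSI = 100
    return 100 - 100 / (1 + rs)


def macd_sketch(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """MACD line (fast EMA minus slow EMA) and its signal line (EMA of the MACD line)."""
    macd_line = (close.ewm(span=fast, adjust=False).mean()
                 - close.ewm(span=slow, adjust=False).mean())
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line


def momentum_sketch(close: pd.Series, period: int = 3) -> pd.Series:
    """Price difference between today and `period` observations ago."""
    return close - close.shift(period)


# Synthetic random-walk prices, only so the sketch runs without downloaded data
rng = np.random.default_rng(0)
close = pd.Series(
    100 + rng.normal(0, 1, 60).cumsum(),
    index=pd.bdate_range("2019-01-01", periods=60),
    name="Close",
)

upper, lower = bollinger(close)
macd_line, signal_line = macd_sketch(close)
features = pd.DataFrame({
    "Close": close,
    "MA7": sma(close, 7),
    "MA21": sma(close, 21),
    "Momentum": momentum_sketch(close, 3),
    "RSI": rsi_sketch(close),
    "MACD": macd_line,
    "Signal": signal_line,
    "Upper_band": upper,
    "Lower_band": lower,
}).dropna()  # rolling windows leave NaNs in the first rows

print(features.tail())
```

Because every rolling indicator needs a full window before it produces a value, the `dropna()` call removes the warm-up rows, which is also why the feature-generation cell above drops rows before saving.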

We will rely mostly on historical data and technical indicators. Additionally, we will use Tesla news headlines to test the hypothesis that news affects price movement.

## Tesla News Headlines

As the news source we will use the <a href="https://www.nasdaq.com">NASDAQ</a> website.
At the time of parsing there were 120 pages of news, from `2019-01-10` through `2019-09-05`.

In [None]:
headlines_list, dates_list = [], []
for i in range(1, 121):  # pages 1 through 120 (range excludes its upper bound)
    headlines, dates = get_tesla_headlines("https://www.nasdaq.com/symbol/tsla/news-headlines?page={}".format(i))
    headlines_list.append(headlines)
    dates_list.append(dates)
    time.sleep(1)

In [None]:
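tesla_headlines = pd.DataFrame({
    # Flatten the per-page lists into one list of headlines
    'Title': [title for page in headlines_list for title in page],
    # Keep the first 10 date entries of each page
    'Date': [date for page in dates_list for date in page[:10]],
})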

## Unsupervised sentiment prediction

Once the headlines are parsed, we use VADER, a lexicon- and rule-based sentiment analyzer that needs no labeled training data, to assign a sentiment score to each headline.

In [15]:
sid = SentimentIntensityAnalyzer()  # needs the VADER lexicon: nltk.download('vader_lexicon')

In [57]:
tesla_headlines['Sentiment'] = tesla_headlines['Title'].map(lambda x: sid.polarity_scores(x)['compound'])
tesla_headlines.Date = pd.to_datetime(tesla_headlines.Date)
tesla_headlines.to_csv('data/tesla_headlines.csv')

In [24]:
tesla_headlines.head()

,Title,Date,Sentiment
0,Tesla's use of individual driver data for insu...,2019-09-05,0.0
1,U.S. safety agency cites Tesla Autopilot desig...,2019-09-04,0.0258
2,"U.S. safety agency cites driver error, Tesla A...",2019-09-04,-0.3818
3,"U.S. safety regulator cites driver error, Tesl...",2019-09-04,-0.3818
4,"U.S. NTSB cites driver error, Tesla Autopilot ...",2019-09-04,-0.6597
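
The `compound` value returned by VADER lies between -1 (most negative) and +1 (most positive). A commonly cited convention, which is an assumption here and not something this notebook applies, treats scores of at least 0.05 as positive, scores of at most -0.05 as negative, and everything in between as neutral. The sketch below applies that rule to the five scores shown above; `label_sentiment` is a hypothetical helper.

```python
import pandas as pd


def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse label (threshold is a convention)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores of the five headlines in the preview above
scores = pd.Series([0.0, 0.0258, -0.3818, -0.3818, -0.6597], name="Sentiment")
labels = scores.map(label_sentiment)
print(labels.tolist())
# -> ['neutral', 'neutral', 'negative', 'negative', 'negative']
```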


# Conclusion

The exploratory analysis, machine learning algorithms, and Q-learning will rely on the features generated in this part.