# Part 3: Data Analytics

## Step 1: Selecting real-world dataset

The dataset used here is stock market data for Tesla and several companies related to it. It is extracted using the yfinance API. yfinance is an open-source library developed by Ran Aroussi to access the financial data available on Yahoo Finance [1]. Of the many variables extracted, we will focus on the closing price and the volume of shares traded.

In [None]:
#installing needed libraries
pip install yfinance

In [None]:
pip install keras

In [None]:
pip install tensorflow

In [None]:
# Importing packages
import yfinance as yf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from keras.models import Sequential,load_model
from keras.layers import Dense, LSTM, Dropout

The stock market data is extracted by passing the ticker symbols of the stocks we need to yfinance. The company I am primarily interested in is Tesla (TSLA). It is growing at a tremendous pace and is currently the biggest electric vehicle manufacturer in the world [2]. I am also interested in the effect that the TSLA share price has on its main battery supplier, Panasonic (PCRFY) [3]. I also intend to examine the effect of the TSLA share price on its competitors in vehicle manufacturing (Ford (F) and General Motors (GM)) and in energy (Royal Dutch Shell (RDS-B) and BP (BP)). We will extract data for the last 5 years.

In [None]:
# Setting the start and end date
start_date = '2016-11-30'
end_date = '2021-12-01'

# Set the tickers
ticker = ['TSLA', 'PCRFY', 'GM', 'F', 'RDS-B', 'BP']

# Get the data, grouped by ticker so that each stock's columns sit together
data = yf.download(ticker, start=start_date, end=end_date, group_by='ticker')

# Print data
data
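
Because the download is grouped by ticker, the resulting frame has two-level columns: the ticker on the first level and the field (for example `Adj Close` or `Volume`) on the second. The sketch below illustrates how such a layout can be indexed. It uses a small synthetic frame named `demo` with made-up numbers rather than a real download, so only the column structure is meant to mirror the data above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded frame (values are invented).
dates = pd.date_range("2021-01-04", periods=3, freq="B")
columns = pd.MultiIndex.from_product([["GM", "TSLA"], ["Adj Close", "Volume"]])
demo = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# Selecting a ticker returns all of its fields as an ordinary DataFrame.
tsla = demo["TSLA"]

# Selecting a (ticker, field) tuple returns a single Series.
tsla_close = demo[("TSLA", "Adj Close")]

# Selecting one field across all tickers uses a cross-section on level 1.
all_close = demo.xs("Adj Close", axis=1, level=1)

print(tsla_close)
print(all_close)
```

The tuple form is the one used in the rest of this part, for example `stock_data[('TSLA', 'Adj Close')]`.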

In [None]:
# Saving the data into a csv file
data.to_csv('data.csv')

## Step 2: Data preparation and Cleaning

First, we check whether there are any null values in the dataset. We then create 3 additional columns for each stock: percentage change of price (%changeprice), percentage change of volume (%changevolume) and cumulative return (CumulativeRet).

In [None]:
# Loading the dataset into a DataFrame
stock_data = pd.read_csv('data.csv', header=[0, 1], index_col=0)
# Print dataset
stock_data
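
The `header=[0, 1]` argument matters because the CSV file stores the two column levels (ticker and field) on two separate header rows. The following sketch shows the round trip on a tiny synthetic frame written to an in-memory buffer instead of `data.csv`. The frame `original` and its values are invented for illustration.

```python
import io

import pandas as pd

# Tiny synthetic frame with the same two-level column layout (values invented).
columns = pd.MultiIndex.from_product([["GM", "TSLA"], ["Adj Close", "Volume"]])
original = pd.DataFrame(
    [[10.0, 100.0, 20.0, 200.0], [11.0, 110.0, 22.0, 220.0]],
    index=pd.Index(["2021-01-04", "2021-01-05"], name="Date"),
    columns=columns,
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] rebuilds the two column levels; index_col=0 restores the dates.
restored = pd.read_csv(buffer, header=[0, 1], index_col=0)

print(restored.columns.tolist())
print(restored[("TSLA", "Adj Close")])
```

Read this way, the dates in the index come back as text. Passing `parse_dates=True` to `pd.read_csv` is one option if a `DatetimeIndex` is needed for later time-series work.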

In [None]:
# Print the shape of dataset
stock_data.shape

In [None]:
# Checking no. of null values
stock_data.isnull().sum()

In [None]:
# Checking that no numeric column was read in as a string
stock_data.dtypes.value_counts()
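
To illustrate what these two checks would flag, here is a minimal sketch on a deliberately flawed synthetic frame. The frame `demo`, its column names and its values are invented, and the clean-up with `pd.to_numeric` is only one possible response, not a step the dataset above required.

```python
import numpy as np
import pandas as pd

# Deliberately flawed synthetic data: one missing price, one non-numeric volume.
demo = pd.DataFrame(
    {
        "price": [10.0, np.nan, 12.0],
        "volume": ["100", "110", "bad"],
    }
)

# Count the missing values per column.
null_counts = demo.isnull().sum()

# List the columns that pandas did not read as numeric.
non_numeric = [
    col for col in demo.columns if not pd.api.types.is_numeric_dtype(demo[col])
]

# One possible fix: coerce to numbers, turning unparseable entries into NaN.
cleaned_volume = pd.to_numeric(demo["volume"], errors="coerce")

print(null_counts)
print(non_numeric)
print(cleaned_volume)
```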

###### Pre-Processing Completed

We can see that there are no null values in the data and that none of the stock values were read in as strings. Both are good indicators that all values are present and numeric.

###### Creation of additional columns

In [None]:
# Calculating daily change of price
tesladailychanges= stock_data[( 'TSLA', 'Adj Close')]
teslapercentagedailychange= tesladailychanges.pct_change(periods=1)
stock_data['TSLA','%changeprice'] = teslapercentagedailychange

panasonicdailychanges= stock_data[( 'PCRFY', 'Adj Close')]
panasonicpercentagedailychange= panasonicdailychanges.pct_change(periods=1)
stock_data['PCRFY','%changeprice'] = panasonicpercentagedailychange

GMdailychanges= stock_data[( 'GM', 'Adj Close')]
GMpercentagedailychange= GMdailychanges.pct_change(periods=1)
stock_data['GM','%changeprice'] = GMpercentagedailychange

forddailychanges= stock_data[( 'F', 'Adj Close')]
fordpercentagedailychange= forddailychanges.pct_change(periods=1)
stock_data['F','%changeprice'] = fordpercentagedailychange

shelldailychanges= stock_data[( 'RDS-B', 'Adj Close')]
shellpercentagedailychange= shelldailychanges.pct_change(periods=1)
stock_data['RDS-B','%changeprice'] = shellpercentagedailychange

bpdailychanges= stock_data[( 'BP', 'Adj Close')]
bppercentagedailychange= bpdailychanges.pct_change(periods=1)
stock_data['BP','%changeprice'] = bppercentagedailychange

stock_data = stock_data.sort_index(axis=1)
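
The same calculation can be written as a loop over the tickers, which avoids one block of code per stock. The sketch below is an alternative formulation on a small synthetic frame named `prices` with invented values, not the code used above. Note that `pct_change` returns a fraction (0.1 for a 10% rise), and that the first row is `NaN` because it has no previous day to compare against.

```python
import pandas as pd

# Synthetic adjusted closing prices for two tickers (values invented).
prices = pd.DataFrame(
    {
        ("GM", "Adj Close"): [50.0, 55.0, 44.0],
        ("TSLA", "Adj Close"): [100.0, 110.0, 99.0],
    }
)

# For each ticker, add the day-over-day fractional change of the price:
# change_t = price_t / price_(t-1) - 1
for ticker in prices.columns.get_level_values(0).unique():
    prices[(ticker, "%changeprice")] = prices[(ticker, "Adj Close")].pct_change(
        periods=1
    )

# Sort the columns so that each ticker's fields sit next to each other.
prices = prices.sort_index(axis=1)

print(prices)
```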

In [None]:
# Calculating daily change of volume traded
teslavolumechanges= stock_data[( 'TSLA', 'Volume')]
teslapercentagevolumechange= teslavolumechanges.pct_change(periods=1)
stock_data['TSLA','%changevolume'] = teslapercentagevolumechange

panasonicvolumechanges= stock_data[( 'PCRFY', 'Volume')]
panasonicpercentagevolumechange= panasonicvolumechanges.pct_change(periods=1)
stock_data['PCRFY','%changevolume'] = panasonicpercentagevolumechange

GMvolumechanges= stock_data[( 'GM', 'Volume')]
GMpercentagevolumechange= GMvolumechanges.pct_change(periods=1)
stock_data['GM','%changevolume'] = GMpercentagevolumechange

fordvolumechanges= stock_data[( 'F', 'Volume')]
fordpercentagevolumechange= fordvolumechanges.pct_change(periods=1)
stock_data['F','%changevolume'] = fordpercentagevolumechange

shellvolumechanges= stock_data[( 'RDS-B', 'Volume')]
shellpercentagevolumechange= shellvolumechanges.pct_change(periods=1)
stock_data['RDS-B','%changevolume'] = shellpercentagevolumechange

bpvolumechanges= stock_data[( 'BP', 'Volume')]
bppercentagevolumechange= bpvolumechanges.pct_change(periods=1)
stock_data['BP','%changevolume'] = bppercentagevolumechange

stock_data = stock_data.sort_index(axis=1)
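
One caveat with the percentage change of volume: if a stock records zero traded volume on some day, the change on the following day is a division by zero and comes out as infinity. Whether this occurs in the dataset above has not been checked here. The sketch below uses an invented series to show the effect and one possible way to neutralise it.

```python
import numpy as np
import pandas as pd

# Synthetic volume series with a zero-volume day (values invented).
volume = pd.Series([1000.0, 0.0, 500.0, 750.0])

# The change after the zero-volume day is infinite.
change = volume.pct_change(periods=1)

# One possible treatment: replace infinite values with NaN.
cleaned = change.replace([np.inf, -np.inf], np.nan)

print(change)
print(cleaned)
```

A quick check on real data would be to count the infinite values per column, for example with `np.isinf(...)` followed by `.sum()`.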

In [None]:
# Calculating Cumulative Return of stock
stock_data['TSLA','CumulativeRet'] = (1+ stock_data[( 'TSLA', '%changeprice')]).cumprod()

stock_data['PCRFY','CumulativeRet'] = (1+ stock_data[( 'PCRFY', '%changeprice')]).cumprod()

stock_data['GM','CumulativeRet'] = (1+ stock_data[( 'GM', '%changeprice')]).cumprod()

stock_data['F','CumulativeRet'] = (1+ stock_data[( 'F', '%changeprice')]).cumprod()

stock_data['RDS-B','CumulativeRet'] = (1+ stock_data[( 'RDS-B', '%changeprice')]).cumprod()

stock_data['BP','CumulativeRet'] = (1+ stock_data[( 'BP', '%changeprice')]).cumprod()

stock_data = stock_data.sort_index(axis=1)
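
The cumulative return is the running product of (1 + daily change), which is equivalent to dividing each day's price by the price on the first day. The following sketch checks that equivalence on a short synthetic price series with invented values.

```python
import numpy as np
import pandas as pd

# Synthetic adjusted closing prices (values invented).
prices = pd.Series([100.0, 110.0, 99.0, 118.8])

# Daily fractional change; the first entry is NaN.
daily_return = prices.pct_change()

# Running product of (1 + daily change); the leading NaN stays NaN.
cumulative = (1 + daily_return).cumprod()

# The same quantity computed directly from the prices.
relative_price = prices / prices.iloc[0]

# Compare the two from the second day onwards.
matches = np.allclose(cumulative.iloc[1:], relative_price.iloc[1:])

print(cumulative)
print(matches)
```

A value of 1.188 on the last day therefore means the stock is worth 18.8% more than on the first day of the series.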

In [None]:
# Printing stock data
stock_data