# Final Project for DNDS6288 - Scientific Python 2021/22 Fall
## Student: Abay Jumabayev
## 1. Introduction
I want to accomplish two things:
1. Assess the efficiency of a simple trading algorithm
2. Predict future stock prices

I will start with a description of the trading algorithm. The idea is to buy a stock and hold it until its price rises by $x$ percent, then sell it and take the profit. If the price then drops by $x$ percent from the price of the last transaction, buy the stock again. Repeat this cycle over and over.
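The rule above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the project's final implementation. It assumes the investor becomes fully invested at the first price, can buy fractional shares, trades at exactly the observed prices, and pays no fees. The function name `simulate_threshold_strategy` and its parameters are hypothetical.

```python
def simulate_threshold_strategy(prices, x, cash=1000.0):
    """Simulate the buy/sell threshold rule (hypothetical helper).

    prices: prices in chronological order
    x: threshold as a fraction (0.05 means 5 percent)
    cash: starting capital
    Returns the final portfolio value (cash plus shares valued at the last price).
    Assumes fractional shares, no fees, and fills at the observed prices.
    """
    # Buy at the first price with all available cash
    shares = cash / prices[0]
    cash = 0.0
    last_price = prices[0]  # price of the most recent transaction

    for p in prices[1:]:
        if shares > 0 and p >= last_price * (1 + x):
            # Price rose by x percent since the last buy: sell everything
            cash = shares * p
            shares = 0.0
            last_price = p
        elif shares == 0 and p <= last_price * (1 - x):
            # Price fell by x percent since the last sale: buy back with all cash
            shares = cash / p
            cash = 0.0
            last_price = p

    return cash + shares * prices[-1]
```

For example, `simulate_threshold_strategy(algo["Close"].to_numpy(), x=0.05)` would test a 5 percent threshold on the TSLA closing prices.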

This algorithm needs a stock with a volatile price. In this project, I use Tesla (ticker TSLA) because it is popular and volatile. The stock under analysis can be changed in the code.
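One way to check that a stock is volatile enough is to compare the spread of its hourly returns with that of SPY. The sketch below uses a hypothetical helper, `hourly_volatility`. It measures volatility as the sample standard deviation of hourly log returns. This is an assumed choice of measure, not one made elsewhere in this notebook.

```python
import numpy as np

def hourly_volatility(close):
    """Sample standard deviation of hourly log returns (hypothetical helper)."""
    close = np.asarray(close, dtype=float)
    # log return between consecutive hourly closes
    log_returns = np.diff(np.log(close))
    return log_returns.std(ddof=1)
```

Calling `hourly_volatility(algo["Close"])` and `hourly_volatility(base["Close"])` would show whether TSLA is indeed more volatile than SPY over the chosen period.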

The testing period is 1 year (this can be changed in the code). The data are hourly prices from Yahoo Finance.

I will compare the effectiveness of the algorithm with the S&P 500 ETF Trust (ticker SPY), which is often considered a good option for beginning investors who prefer not to manage individual stocks actively.

Problem: Suppose an investor has \$1000 and 1 year. The base option is to buy the S&P 500 ETF Trust. The second option is to use the algorithm described above.
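The base option can be expressed as a buy-and-hold calculation: spend the whole \$1000 at the first price and value the position at the last price. This is a minimal sketch. It assumes fractional shares, no fees, and ignores dividends. The helper name `buy_and_hold_value` is hypothetical.

```python
def buy_and_hold_value(prices, cash=1000.0):
    """Final value of buying at the first price and holding to the last (hypothetical helper).

    Assumes fractional shares, no transaction fees, and no dividends.
    """
    shares = cash / prices[0]
    return shares * prices[-1]
```

Applied to the SPY closing prices, for example `buy_and_hold_value(base["Close"].to_numpy())`, this gives the benchmark that the algorithm's final value has to beat.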

## 2. Getting and cleaning data
Import the necessary modules.

In [3]:
import json
import random as rnd
import warnings
from math import exp

import numpy as np
import pandas as pd
import scipy.stats as stats
import scipy.optimize as opt
import matplotlib as mpl
from matplotlib import pyplot as plt
import yfinance as yf
from IPython.core.pylabtools import figsize
from IPython.display import display, HTML

# fix the random seed for reproducibility
rnd.seed(2)
# silence library warnings (e.g. deprecation notices) in the notebook output
warnings.filterwarnings('ignore')

Get the data for SPY and TSLA separately.

In [26]:
# Choosing tickers for an analysis
base_ticker = "SPY"
algo_ticker = "TSLA"

# Choosing period of analysis and interval
period = "365d" # 1 year
interval = "60m" # 1 hour




In [None]:
# # UNCOMMENT THIS TO DOWNLOAD THE DATA
# # Get the data
# base = yf.download(tickers=base_ticker, period=period, interval=interval)
# algo = yf.download(tickers=algo_ticker, period=period, interval=interval)

Save the raw data, since we may need it for reproducibility.

First, define the folder paths.

In [27]:
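import os

# location folders; os.path.join keeps the paths portable across operating systems,
# and the trailing "" adds the final separator so file names can be appended with +
data_in = os.path.join(".", "Documents", "GitHub", "SciPy_final_project", "raw", "")
data_out = os.path.join(".", "Documents", "GitHub", "SciPy_final_project", "clean", "")
results = os.path.join(".", "Documents", "GitHub", "SciPy_final_project", "results", "")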

In [28]:
# # UNCOMMENT THIS TO SAVE THE DATA
# # save raw data
# base.to_csv(data_in+'base_raw.csv',index=True)
# algo.to_csv(data_in+'algo_raw.csv',index=True)

For now, I will use the saved data. If you want to download the data from scratch, uncomment the two cells above.

In [29]:
base = pd.read_csv(data_in+'base_raw.csv')
algo = pd.read_csv(data_in+'algo_raw.csv')


In [14]:
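# convert the Datetime column to timezone-aware timestamps in New York time;
# utc=True handles the mix of -04:00 (EDT) and -05:00 (EST) offsets in the data
for df in (base, algo):
    df["Datetime"] = pd.to_datetime(df["Datetime"], utc=True).dt.tz_convert("America/New_York")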
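The hourly timestamps carry UTC offsets that switch between -04:00 (daylight saving time) and -05:00 (standard time), as the SPY rows in this notebook show. The sketch below parses two of those rows to show why `utc=True` is passed before converting to New York time. The inline CSV string is an assumption that stands in for the saved file.

```python
import io

import pandas as pd

# Two rows copied from the SPY output in this notebook: one in EDT (-04:00), one in EST (-05:00)
csv_text = """Datetime,Close
2020-07-20 09:30:00-04:00,322.609985
2021-12-28 09:30:00-05:00,477.760010
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse to UTC first so the mixed offsets end up in a single datetime column,
# then express every timestamp in exchange (New York) time
df["Datetime"] = pd.to_datetime(df["Datetime"], utc=True).dt.tz_convert("America/New_York")
print(df["Datetime"])
```

Both rows come back as 09:30 local time even though their UTC offsets differ, so the regular trading-hour timestamps are preserved across the daylight-saving switch.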

In [30]:
base

| Unnamed: 0 | Datetime | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2020-07-20 09:30:00-04:00 | 321.429993 | 322.649994 | 320.619995 | 322.609985 | 322.609985 | 13794187 |
| 1 | 2020-07-20 10:30:00-04:00 | 322.630005 | 323.019989 | 322.329987 | 322.739990 | 322.739990 | 5203122 |
| 2 | 2020-07-20 11:30:00-04:00 | 322.750000 | 323.140015 | 322.459991 | 322.739990 | 322.739990 | 5009515 |
| 3 | 2020-07-20 12:30:00-04:00 | 322.739990 | 323.320007 | 322.670013 | 323.295013 | 323.295013 | 3248668 |
| 4 | 2020-07-20 13:30:00-04:00 | 323.297699 | 323.809998 | 323.153992 | 323.730011 | 323.730011 | 5758334 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2536 | 2021-12-27 13:30:00-05:00 | 475.984985 | 476.600006 | 475.839996 | 476.595001 | 476.595001 | 4855098 |
| 2537 | 2021-12-27 14:30:00-05:00 | 476.600006 | 476.609985 | 475.693115 | 476.269989 | 476.269989 | 6853121 |
| 2538 | 2021-12-27 15:30:00-05:00 | 476.269989 | 477.309998 | 476.109985 | 477.269989 | 477.269989 | 10442363 |
| 2539 | 2021-12-28 09:30:00-05:00 | 477.720001 | 478.809998 | 477.230011 | 477.760010 | 477.760010 | 6944843 |


Save both dataframes 
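A minimal sketch of this step follows, assuming the cleaned frames go into the `clean` folder as CSV files. The helper `save_clean` and the file names `base_clean.csv` and `algo_clean.csv` are hypothetical; they mirror the `*_raw.csv` names used for the raw data.

```python
import os

import pandas as pd

def save_clean(base, algo, out_dir):
    """Write both cleaned DataFrames to CSV files in out_dir (hypothetical helper).

    Returns the paths of the two files that were written.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the folder if it does not exist yet
    base_path = os.path.join(out_dir, "base_clean.csv")
    algo_path = os.path.join(out_dir, "algo_clean.csv")
    # index=False: the Datetime column already identifies each row
    base.to_csv(base_path, index=False)
    algo.to_csv(algo_path, index=False)
    return base_path, algo_path
```

In this notebook it would be called as `save_clean(base, algo, data_out)`.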

In [18]:
base.Datetime[0]

Timestamp('2020-07-20 09:30:00-0400', tz='America/New_York')

In [6]:
data.head(20)

| Datetime | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2020-07-20 09:30:00-04:00 | 321.429993 | 322.649994 | 320.619995 | 322.609985 | 322.609985 | 13794187 |
| 2020-07-20 10:30:00-04:00 | 322.630005 | 323.019989 | 322.329987 | 322.739990 | 322.739990 | 5203122 |
| 2020-07-20 11:30:00-04:00 | 322.750000 | 323.140015 | 322.459991 | 322.739990 | 322.739990 | 5009515 |
| 2020-07-20 12:30:00-04:00 | 322.739990 | 323.320007 | 322.670013 | 323.295013 | 323.295013 | 3248668 |
| 2020-07-20 13:30:00-04:00 | 323.297699 | 323.809998 | 323.153992 | 323.730011 | 323.730011 | 5758334 |
| ... | ... | ... | ... | ... | ... | ... |
| 2021-12-27 13:30:00-05:00 | 475.984985 | 476.600006 | 475.839996 | 476.595001 | 476.595001 | 4855098 |
| 2021-12-27 14:30:00-05:00 | 476.600006 | 476.609985 | 475.693115 | 476.269989 | 476.269989 | 6853121 |
| 2021-12-27 15:30:00-05:00 | 476.269989 | 477.309998 | 476.109985 | 477.269989 | 477.269989 | 10442363 |
| 2021-12-28 09:30:00-05:00 | 477.720001 | 478.209991 | 477.230011 | 478.059998 | 478.059998 | 3865967 |
