# Bitcoin Price Predictor Using Linear and Logistic Regression Analysis - Andrew Turvey

The idea is to take Google Trends data, together with the previous day's price fluctuation and trading volume, and use linear regression to predict the following day's closing price. With logistic regression, the prediction is instead a boolean: whether the price will go up.



In [1]:
import pandas as pd
import numpy as np
import random as rnd
import math
from pytrends.request import TrendReq
from seaborn import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score, mean_absolute_error, r2_score, mean_squared_error
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, recall_score, precision_score, fbeta_score, classification_report

## Google Trends Data
The block below gets the search interest for the keyword "Bitcoin" from Google Trends. Note the date range, 2021-03-31 to 2021-06-29.

In [2]:
pytrends = TrendReq(hl='en-US', tz=360)

# build the list of keywords; in this case only "Bitcoin" is used
kw_list = ["Bitcoin"] 

# build the payload
pytrends.build_payload(kw_list, timeframe='2021-03-31 2021-06-29', geo='US')

bitcoinTrendsdf = pytrends.interest_over_time()
bitcoinTrendsdf = bitcoinTrendsdf.rename(columns={'Bitcoin': 'Previous_Search_Interest'})
bitcoinTrendsdf.reset_index(inplace=True, drop=True)
bitcoinTrendsdf

Unnamed: 0,Previous_Search_Interest,isPartial
0,29,False
1,41,False
2,29,False
3,22,False
4,22,False
...,...,...
86,24,False
87,22,False
88,22,False
89,25,False


## Getting Stock Data
Next, get two sets of price data. One spans 2021-03-31 to 2021-06-29 and is used as the previous day's data. The other spans 2021-04-01 to 2021-06-30 and is used as the current day's data.

In [3]:
bitcoinPricedf = pd.read_csv("https://raw.githubusercontent.com/atlas125gev/StockProject/main/Homework7/Data%20Files/BTC-USD.csv")
bitcoinPreviousPricedf = pd.read_csv("https://raw.githubusercontent.com/atlas125gev/StockProject/main/Homework7/Data%20Files/BTC-USD-Previous.csv")
mergedPrice = pd.concat([bitcoinPricedf, bitcoinPreviousPricedf], axis=1)
mergedPrice

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Previous_Open,Previous_High,Previous_Low,Previous_Close,Previous_Adj_Close,Previous_Volume
0,2021-04-01,58926.562500,59586.070313,58505.277344,59095.808594,59095.808594,61669163792,58930.277344,59930.027344,57726.417969,58918.832031,58918.832031,65520826225
1,2021-04-02,59098.878906,60267.187500,58869.281250,59384.312500,59384.312500,58727860620,58926.562500,59586.070313,58505.277344,59095.808594,59095.808594,61669163792
2,2021-04-03,59397.410156,60110.269531,57603.890625,57603.890625,57603.890625,59641344484,59098.878906,60267.187500,58869.281250,59384.312500,59384.312500,58727860620
3,2021-04-04,57604.839844,58913.746094,57168.675781,58758.554688,58758.554688,50749662970,59397.410156,60110.269531,57603.890625,57603.890625,57603.890625,59641344484
4,2021-04-05,58760.875000,59891.296875,57694.824219,59057.878906,59057.878906,60706272115,57604.839844,58913.746094,57168.675781,58758.554688,58758.554688,50749662970
...,...,...,...,...,...,...,...,...,...,...,...,...,...
86,2021-06-26,31594.664063,32637.587891,30184.501953,32186.277344,32186.277344,38585385521,34659.105469,35487.246094,31350.884766,31637.779297,31637.779297,40230904226
87,2021-06-27,32287.523438,34656.128906,32071.757813,34649.644531,34649.644531,35511640894,31594.664063,32637.587891,30184.501953,32186.277344,32186.277344,38585385521
88,2021-06-28,34679.121094,35219.890625,33902.074219,34434.335938,34434.335938,33892523752,32287.523438,34656.128906,32071.757813,34649.644531,34649.644531,35511640894
89,2021-06-29,34475.558594,36542.109375,34252.484375,35867.777344,35867.777344,37901460044,34679.121094,35219.890625,33902.074219,34434.335938,34434.335938,33892523752
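
The previous-day columns above come from a second CSV file. As an alternative, here is a minimal sketch of deriving them from a single price table with `shift(1)`. The `prices` frame and its values are hypothetical and only illustrate the mechanics; they are not the notebook's data.

```python
import pandas as pd

# Hypothetical miniature price table; the values are made up for illustration.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2021-03-31", "2021-04-01", "2021-04-02", "2021-04-03"]),
    "Open": [100.0, 110.0, 105.0, 120.0],
    "Close": [110.0, 105.0, 120.0, 115.0],
    "Volume": [10, 12, 9, 11],
})

value_columns = ["Open", "Close", "Volume"]

# shift(1) moves every value down one row, so each row sees the prior day's values.
previous = prices[value_columns].shift(1).add_prefix("Previous_")

# Join side by side and drop the first row, which has no previous day.
with_previous = pd.concat([prices, previous], axis=1).dropna().reset_index(drop=True)
print(with_previous)
```

Deriving both sets of columns from one table removes the risk that the two files fall out of step with each other.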


## Merging the stock price data and the Google Trends data

In [4]:
mergedStockPrice = pd.concat([mergedPrice, bitcoinTrendsdf], axis=1)
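# Note: set_index returns a new DataFrame; the result is not assigned here,
# so mergedStockPrice still has isPartial as a regular column.
mergedStockPrice.set_index('isPartial', drop=True)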

mergedStockPrice

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Previous_Open,Previous_High,Previous_Low,Previous_Close,Previous_Adj_Close,Previous_Volume,Previous_Search_Interest,isPartial
0,2021-04-01,58926.562500,59586.070313,58505.277344,59095.808594,59095.808594,61669163792,58930.277344,59930.027344,57726.417969,58918.832031,58918.832031,65520826225,29,False
1,2021-04-02,59098.878906,60267.187500,58869.281250,59384.312500,59384.312500,58727860620,58926.562500,59586.070313,58505.277344,59095.808594,59095.808594,61669163792,41,False
2,2021-04-03,59397.410156,60110.269531,57603.890625,57603.890625,57603.890625,59641344484,59098.878906,60267.187500,58869.281250,59384.312500,59384.312500,58727860620,29,False
3,2021-04-04,57604.839844,58913.746094,57168.675781,58758.554688,58758.554688,50749662970,59397.410156,60110.269531,57603.890625,57603.890625,57603.890625,59641344484,22,False
4,2021-04-05,58760.875000,59891.296875,57694.824219,59057.878906,59057.878906,60706272115,57604.839844,58913.746094,57168.675781,58758.554688,58758.554688,50749662970,22,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86,2021-06-26,31594.664063,32637.587891,30184.501953,32186.277344,32186.277344,38585385521,34659.105469,35487.246094,31350.884766,31637.779297,31637.779297,40230904226,24,False
87,2021-06-27,32287.523438,34656.128906,32071.757813,34649.644531,34649.644531,35511640894,31594.664063,32637.587891,30184.501953,32186.277344,32186.277344,38585385521,22,False
88,2021-06-28,34679.121094,35219.890625,33902.074219,34434.335938,34434.335938,33892523752,32287.523438,34656.128906,32071.757813,34649.644531,34649.644531,35511640894,22,False
89,2021-06-29,34475.558594,36542.109375,34252.484375,35867.777344,35867.777344,37901460044,34679.121094,35219.890625,33902.074219,34434.335938,34434.335938,33892523752,25,False
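
The concatenation above lines the two tables up by row position, which works only if both have the same number of rows in the same order. A sketch of a date-keyed alternative follows. It assumes the Trends frame keeps an explicit `Date` column (the notebook drops it with `reset_index(drop=True)`), and the frame names `price_df` and `trends_df` are hypothetical. The three values are taken from the first rows of the tables above.

```python
import pandas as pd

price_df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-04-01", "2021-04-02", "2021-04-03"]),
    "Close": [59095.808594, 59384.312500, 57603.890625],
})

# Search interest per day, deliberately out of order to show that
# a keyed merge does not depend on row position.
trends_df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-04-02", "2021-03-31", "2021-04-01"]),
    "Previous_Search_Interest": [29, 29, 41],
})

# Interest observed on day D is the "previous" interest for day D + 1.
trends_df["Date"] = trends_df["Date"] + pd.Timedelta(days=1)

# validate raises if either side has duplicate dates.
merged_by_date = price_df.merge(
    trends_df, on="Date", how="inner", validate="one_to_one"
)
print(merged_by_date)
```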


In [5]:
#mergedStockPrice.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 91 entries, 0 to 90
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Date                      91 non-null     object 
 1   Open                      91 non-null     float64
 2   High                      91 non-null     float64
 3   Low                       91 non-null     float64
 4   Close                     91 non-null     float64
 5   Adj Close                 91 non-null     float64
 6   Volume                    91 non-null     int64  
 7   Previous_Open             91 non-null     float64
 8   Previous_High             91 non-null     float64
 9   Previous_Low              91 non-null     float64
 10  Previous_Close            91 non-null     float64
 11  Previous_Adj_Close        91 non-null     float64
 12  Previous_Volume           91 non-null     int64  
 13  Previous_Search_Interest  91 non-null     int64  
 14  isPartial   

## Getting Features and Target

The next two blocks establish the feature columns used for prediction and the target to be predicted. Notice that the previous day's information is used to predict the next day's Close price.

In [6]:
columns = ["Previous_Open", "Previous_High", "Previous_Low", "Previous_Close", "Previous_Volume", "Previous_Search_Interest", "Close"]
mergedStockPrice = mergedStockPrice[columns]
mergedStockPrice

Unnamed: 0,Previous_Open,Previous_High,Previous_Low,Previous_Close,Previous_Volume,Previous_Search_Interest,Close
0,58930.277344,59930.027344,57726.417969,58918.832031,65520826225,29,59095.808594
1,58926.562500,59586.070313,58505.277344,59095.808594,61669163792,41,59384.312500
2,59098.878906,60267.187500,58869.281250,59384.312500,58727860620,29,57603.890625
3,59397.410156,60110.269531,57603.890625,57603.890625,59641344484,22,58758.554688
4,57604.839844,58913.746094,57168.675781,58758.554688,50749662970,22,59057.878906
...,...,...,...,...,...,...,...
86,34659.105469,35487.246094,31350.884766,31637.779297,40230904226,24,32186.277344
87,31594.664063,32637.587891,30184.501953,32186.277344,38585385521,22,34649.644531
88,32287.523438,34656.128906,32071.757813,34649.644531,35511640894,22,34434.335938
89,34679.121094,35219.890625,33902.074219,34434.335938,33892523752,25,35867.777344


In [7]:
features = list(mergedStockPrice.columns)
features.remove("Close")
target = "Close"

X = mergedStockPrice[features]
y = mergedStockPrice[target]

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
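
The split above shuffles rows at random, so days from late in the period can end up in the training set while earlier days are tested. For time-ordered data, a chronological split is a common alternative. The sketch below uses hypothetical data and variable names (`days`, `target`, and the `_c` suffixed splits) and only demonstrates the `shuffle=False` option of `train_test_split`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical time-ordered data: 20 days, oldest first.
days = pd.DataFrame({"day": np.arange(20), "feature": np.arange(20) * 2.0})
target = pd.Series(np.arange(20) * 3.0, name="target")

# shuffle=False keeps the row order, so the last 25% of days form the test set.
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    days, target, test_size=0.25, shuffle=False
)
print(X_train_c["day"].max(), X_test_c["day"].min())
```

With this split, every test day comes after every training day, which is closer to how the model would be used in practice.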

## Linear Regression

Linear regression is run and the results look very promising! An r2 score of 0.96 is very close to 1, indicating a good fit.

In [9]:
lr = LinearRegression()
lr

LinearRegression()

In [10]:
lr.fit(X_train, y_train)

LinearRegression()

In [11]:
lr.score(X_train, y_train)

0.9600860921909471

In [12]:
lr.score(X_test, y_test)

0.9614613962910867

In [13]:

def printMetrics(test, predictions):
    """Print regression metrics; "Score" is the explained variance score."""
    print(f"Score: {explained_variance_score(test, predictions):.2f}")
    print(f"MAE: {mean_absolute_error(test, predictions):.2f}")
    print(f"RMSE: {math.sqrt(mean_squared_error(test, predictions)):.2f}")
    print(f"r2: {r2_score(test, predictions):.2f}")
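
For reference, the four printed quantities follow the standard definitions below, where $y_i$ are the true values, $\hat{y}_i$ the predictions, $\bar{y}$ the mean of the true values, and $n$ the number of test rows.

$$
\begin{aligned}
\text{Score (explained variance)} &= 1 - \frac{\operatorname{Var}(y - \hat{y})}{\operatorname{Var}(y)} \\
\text{MAE} &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\text{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
r^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
$$

MAE and RMSE are in the same units as the target (here, US dollars), while the other two are unitless.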

In [14]:
predictions = lr.predict(X_test)
printMetrics(y_test, predictions)

Score: 0.96
MAE: 1573.91
RMSE: 1988.37
r2: 0.96
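
One way to put these numbers in context is a persistence baseline, which predicts that tomorrow's close equals today's close. The sketch below uses a synthetic random walk, not the notebook's Bitcoin data, so its output says nothing about the real model; it only shows how such a baseline could be computed. The names `closes`, `baseline_r2`, and `baseline_mae` are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Synthetic random-walk prices (an assumption; not the notebook's BTC data).
rng = np.random.default_rng(0)
closes = 50_000 + np.cumsum(rng.normal(0, 500, size=200))

previous_close = closes[:-1]  # what was known the day before
actual_close = closes[1:]     # what we try to predict

# Persistence baseline: the prediction is simply the previous close.
baseline_r2 = r2_score(actual_close, previous_close)
baseline_mae = mean_absolute_error(actual_close, previous_close)
print(f"baseline r2: {baseline_r2:.2f}, baseline MAE: {baseline_mae:.2f}")
```

Applying the same idea here would mean calling `printMetrics(y_test, X_test["Previous_Close"])` and comparing the result with the regression's metrics.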


## Make Some Prediction with Dummy Data

Three instances of dummy data are used to predict what the price would be. A negative price is nonsensical, so this will need some tuning in the future.

In [15]:
numElements = 3
samplePrice = []
for _ in range(numElements):
    sample = {}  # named to avoid shadowing the built-in dict
    for column in X.columns:
        minValue = 0  # lower bound for the random draw
        maxValue = round(max(mergedStockPrice[column].values))
        sample[column] = rnd.randint(minValue, maxValue)
    samplePrice.append(sample)
samplePrice

[{'Previous_Open': 8685,
  'Previous_High': 19197,
  'Previous_Low': 40904,
  'Previous_Close': 4405,
  'Previous_Volume': 84302337763,
  'Previous_Search_Interest': 77},
 {'Previous_Open': 8700,
  'Previous_High': 41195,
  'Previous_Low': 55079,
  'Previous_Close': 5822,
  'Previous_Volume': 62010145256,
  'Previous_Search_Interest': 58},
 {'Previous_Open': 36856,
  'Previous_High': 61704,
  'Previous_Low': 5072,
  'Previous_Close': 55719,
  'Previous_Volume': 27694817943,
  'Previous_Search_Interest': 51}]

In [16]:
pdSamplePrice = pd.DataFrame.from_dict(samplePrice)
pdSamplePrice

Unnamed: 0,Previous_Open,Previous_High,Previous_Low,Previous_Close,Previous_Volume,Previous_Search_Interest
0,8685,19197,40904,4405,84302337763,77
1,8700,41195,55079,5822,62010145256,58
2,36856,61704,5072,55719,27694817943,51


In [17]:
predictions = lr.predict(pdSamplePrice)
predictions



array([-33236.90250245, -43507.18938791,  88669.31502458])

In [18]:
pdSamplePrice = pdSamplePrice.copy()
pdSamplePrice['Predicted'] = predictions
pdSamplePrice

Unnamed: 0,Previous_Open,Previous_High,Previous_Low,Previous_Close,Previous_Volume,Previous_Search_Interest,Predicted
0,8685,19197,40904,4405,84302337763,77,-33236.902502
1,8700,41195,55079,5822,62010145256,58,-43507.189388
2,36856,61704,5072,55719,27694817943,51,88669.315025
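
One plausible reason for the negative predictions is that the dummy rows fall far outside the range the model was trained on. In the first row, for example, Previous_Low (40904) is higher than Previous_High (19197), which never happens in real data. The sketch below draws each column between its observed minimum and maximum instead of between 0 and the maximum. `X_demo` is a hypothetical stand-in for `X` with a few illustrative values.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table standing in for X; values are illustrative only.
X_demo = pd.DataFrame({
    "Previous_Low": [57726.4, 58505.3, 58869.3],
    "Previous_High": [59930.0, 59586.1, 60267.2],
    "Previous_Search_Interest": [29, 41, 29],
})

rng = np.random.default_rng(1)
num_samples = 3

# Draw each column uniformly between its observed minimum and maximum.
samples = pd.DataFrame({
    column: rng.uniform(X_demo[column].min(), X_demo[column].max(), size=num_samples)
    for column in X_demo.columns
})
print(samples)
```

This still samples each column independently, so it does not guarantee consistent rows (for instance Low below High); resampling whole rows from the real data would be one way to keep them consistent.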


# Logistic Regression

This is more or less a rinse and repeat of the above. The only difference is that you are now predicting, in a boolean manner, whether the closing price will go up from the opening price, based on the previous day's metrics.

In [19]:
def printClassificationMetrics(test, predictions):
    print("Confusion Matrix:")
    print(confusion_matrix(test, predictions))
    print("------------------")
    print(f"Accuracy: {accuracy_score(test, predictions):.2f}")
    print(f"Recall: {recall_score(test, predictions):.2f}")
    # The "Prediction" label below reports the precision score.
    print(f"Prediction: {precision_score(test, predictions):.2f}")
    print(f"f-measure: {fbeta_score(test, predictions, beta=1):.2f}")
    print("------------------")
    print(classification_report(test, predictions))

In [20]:
#pd.set_option('display.max_rows', len(mergedStockPrice))

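# Note: Open - Close > 0 is True when the day closed below its open,
# so as written a value of 1 marks a day on which the price fell.
mergedPrice["Price_Increase"] = bitcoinPricedf["Open"] - bitcoinPricedf["Close"] > 0.0
mergedPrice['Price_Increase'] = mergedPrice.Price_Increase.astype(int)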
mergedStockPrice = pd.concat([mergedPrice, bitcoinTrendsdf], axis=1)
mergedStockPrice

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Previous_Open,Previous_High,Previous_Low,Previous_Close,Previous_Adj_Close,Previous_Volume,Price_Increase,Previous_Search_Interest,isPartial
0,2021-04-01,58926.562500,59586.070313,58505.277344,59095.808594,59095.808594,61669163792,58930.277344,59930.027344,57726.417969,58918.832031,58918.832031,65520826225,0,29,False
1,2021-04-02,59098.878906,60267.187500,58869.281250,59384.312500,59384.312500,58727860620,58926.562500,59586.070313,58505.277344,59095.808594,59095.808594,61669163792,0,41,False
2,2021-04-03,59397.410156,60110.269531,57603.890625,57603.890625,57603.890625,59641344484,59098.878906,60267.187500,58869.281250,59384.312500,59384.312500,58727860620,1,29,False
3,2021-04-04,57604.839844,58913.746094,57168.675781,58758.554688,58758.554688,50749662970,59397.410156,60110.269531,57603.890625,57603.890625,57603.890625,59641344484,0,22,False
4,2021-04-05,58760.875000,59891.296875,57694.824219,59057.878906,59057.878906,60706272115,57604.839844,58913.746094,57168.675781,58758.554688,58758.554688,50749662970,0,22,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86,2021-06-26,31594.664063,32637.587891,30184.501953,32186.277344,32186.277344,38585385521,34659.105469,35487.246094,31350.884766,31637.779297,31637.779297,40230904226,0,24,False
87,2021-06-27,32287.523438,34656.128906,32071.757813,34649.644531,34649.644531,35511640894,31594.664063,32637.587891,30184.501953,32186.277344,32186.277344,38585385521,0,22,False
88,2021-06-28,34679.121094,35219.890625,33902.074219,34434.335938,34434.335938,33892523752,32287.523438,34656.128906,32071.757813,34649.644531,34649.644531,35511640894,1,22,False
89,2021-06-29,34475.558594,36542.109375,34252.484375,35867.777344,35867.777344,37901460044,34679.121094,35219.890625,33902.074219,34434.335938,34434.335938,33892523752,0,25,False
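
In the table above, the rows where Close is higher than Open carry a Price_Increase of 0, and the rows where Close is lower carry a 1. If the intent is for 1 to mean that the day closed above its open, the comparison can be written the other way round. The sketch below assumes that intent and uses the first three Open/Close pairs from the table; `demo` is a hypothetical name. The outputs in the rest of this notebook were produced with the original expression.

```python
import pandas as pd

# First three rows of Open/Close from the table above.
demo = pd.DataFrame({
    "Open": [58926.562500, 59098.878906, 59397.410156],
    "Close": [59095.808594, 59384.312500, 57603.890625],
})

# 1 when the day closed above its open, 0 otherwise.
demo["Price_Increase"] = (demo["Close"] > demo["Open"]).astype(int)
print(demo)
```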


In [21]:

columns = ["Previous_Open", "Previous_Close", "Previous_Search_Interest", "Price_Increase"]
mergedStockPrice = mergedStockPrice[columns]

features = list(mergedStockPrice.columns)
features.remove("Price_Increase")
target = "Price_Increase"

X = mergedStockPrice[features]
y = mergedStockPrice[target]

In [22]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

In [23]:


logReg = LogisticRegression(solver="liblinear")
logReg

LogisticRegression(solver='liblinear')

In [24]:
logReg.fit(X_train, y_train)

LogisticRegression(solver='liblinear')
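
The features here differ by orders of magnitude: prices are in the tens of thousands while search interest is below 100. Regularized logistic regression can be sensitive to such differences, so standardizing the features first may help, although that has not been verified on this dataset. The sketch below shows the mechanics on synthetic data with random labels; `X_demo`, `y_demo`, and `scaled_logreg` are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (an assumption for illustration):
# two price-like columns and one search-interest-like column.
rng = np.random.default_rng(42)
X_demo = np.column_stack([
    rng.uniform(30_000, 60_000, size=80),
    rng.uniform(30_000, 60_000, size=80),
    rng.uniform(20, 100, size=80),
])
y_demo = rng.integers(0, 2, size=80)  # random labels, so no real signal

# The scaler is fitted on the training data only, then applied before the classifier.
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))
scaled_logreg.fit(X_demo, y_demo)
print(scaled_logreg.score(X_demo, y_demo))
```

Wrapping the scaler and the classifier in one pipeline means `scaled_logreg.predict` applies the same scaling to new rows automatically.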

## Prediction is Not as Good

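The training and test scores are much lower than those from the linear regression. Note that `score` here is classification accuracy rather than r2, so the two sets of numbers are not directly comparable. For a two-class problem, an accuracy near 0.5 is close to what guessing would give.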

In [25]:
logReg.score(X_train, y_train)

0.5441176470588235

In [26]:
logReg.score(X_test, y_test)

0.5217391304347826

In [27]:
predictions = logReg.predict(X_test)
printMetrics(y_test, predictions)

Score: -0.32
MAE: 0.48
RMSE: 0.69
r2: -0.95


In [28]:
predictions = logReg.predict(X_test)
printClassificationMetrics(y_test, predictions)



Confusion Matrix:
[[ 0 10]
 [ 1 12]]
------------------
Accuracy: 0.52
Recall: 0.92
Prediction: 0.55
f-measure: 0.69
------------------
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        10
           1       0.55      0.92      0.69        13

    accuracy                           0.52        23
   macro avg       0.27      0.46      0.34        23
weighted avg       0.31      0.52      0.39        23
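
The printed metrics can be reproduced by hand from the confusion matrix, which also makes it easy to compare the accuracy with the share of the larger class. The sketch below copies the matrix from the output above; the variable names are hypothetical.

```python
import numpy as np

# Confusion matrix copied from the output above: rows are true labels,
# columns are predicted labels, both in the order [0, 1].
cm = np.array([[0, 10],
               [1, 12]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

# Share of the larger true class in the test set, i.e. the accuracy of
# always predicting that class.
majority_share = cm.sum(axis=1).max() / cm.sum()

print(f"accuracy={accuracy:.2f} recall={recall:.2f} "
      f"precision={precision:.2f} f1={f1:.2f}")
print(f"majority-class share={majority_share:.2f}")
```

The model predicts class 1 for 22 of the 23 test rows, and its accuracy of 0.52 is slightly below the 0.57 that always predicting class 1 would reach on this test set.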



In [29]:
numElements = 3
samplePrice = []
for _ in range(numElements):
    sample = {}  # named to avoid shadowing the built-in dict
    for column in X.columns:
        minValue = 0  # lower bound for the random draw
        maxValue = round(max(mergedStockPrice[column].values))
        sample[column] = rnd.randint(minValue, maxValue)
    samplePrice.append(sample)
samplePrice


[{'Previous_Open': 3477,
  'Previous_Close': 48530,
  'Previous_Search_Interest': 43},
 {'Previous_Open': 30349,
  'Previous_Close': 47562,
  'Previous_Search_Interest': 99},
 {'Previous_Open': 57515,
  'Previous_Close': 18416,
  'Previous_Search_Interest': 66}]

In [30]:
bitcoinPreparedData = pd.DataFrame.from_dict(samplePrice)
bitcoinPreparedData

Unnamed: 0,Previous_Open,Previous_Close,Previous_Search_Interest
0,3477,48530,43
1,30349,47562,99
2,57515,18416,66


In [31]:
predictions = logReg.predict(bitcoinPreparedData)
predictions

array([1, 1, 0])
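
`predict` returns only the chosen class. To see how confident the model is, `predict_proba` returns the estimated probability of each class. The sketch below fits a small model on synthetic data purely to show the call; `X_demo`, `y_demo`, and `demo_model` are hypothetical names and the data is unrelated to the notebook's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: the label depends loosely on the first feature.
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(60, 3))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=60) > 0).astype(int)

demo_model = LogisticRegression(solver="liblinear").fit(X_demo, y_demo)

# One row per sample, one column per class, in the order of demo_model.classes_.
probabilities = demo_model.predict_proba(X_demo[:3])
print(demo_model.classes_)
print(probabilities)
```

In this notebook the equivalent call would be `logReg.predict_proba(bitcoinPreparedData)`; probabilities close to 0.5 would indicate that the model is barely separating the two classes.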

## Made Three Dummy Cases

The dummy cases show whether the model predicts that the price will go up the next day. By the looks of it, the results seem to make sense. The model appears to weight whether the price went up the previous day more heavily than the previous search interest as an indicator of the next day's price. All in all, it is a very interesting exercise!

In [32]:
pdPredictedStockTrend = bitcoinPreparedData
pdPredictedStockTrend["Price_Increase"] = predictions.astype(bool)
pdPredictedStockTrend

Unnamed: 0,Previous_Open,Previous_Close,Previous_Search_Interest,Price_Increase
0,3477,48530,43,True
1,30349,47562,99,True
2,57515,18416,66,False
