### Autoregression Of DogeCoin

Aim: Test how well an autoregressive linear model predicts the daily DogeCoin price between 2021-01-01 and 2022-01-01.

In [4]:
import pandas as pd
import plotly.express as px
import numpy as np

df_doge = pd.read_csv("../dat/crypto-stocks/doge-coin-daily.csv")
# Daily average of the four price columns (Open, High, Low, Close)
df_doge["OHLC Average"] = df_doge[["Open", "High", "Low", "Close"]].mean(axis=1)
df_doge["Date"] = pd.to_datetime(df_doge["Date"])

In [5]:
def get_sliding_window_df(df, column, window, start_date, end_date):
    """Attach to each row the `window` preceding values of `column`, most recent first.

    Assumes at least `window` rows exist before `start_date`.
    """
    start_index = np.where(df.Date == start_date)[0][0]
    end_index = np.where(df.Date == end_date)[0][0]

    all_windows = []
    for i in range(end_index - start_index - window):
        index_use = start_index + i
        # values at index_use-1, index_use-2, ..., index_use-window
        all_windows.append([df[column].iloc[index_use - w - 1] for w in range(window)])

    # Copy the slice so that adding a column does not raise SettingWithCopyWarning
    df = df.iloc[start_index:end_index - window].copy()
    df["Sliding Window"] = all_windows
    return df

In [6]:
df_doge_windows = get_sliding_window_df(df_doge, "OHLC Average", 3, "2021-01-01", "2022-01-01")
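The loop in `get_sliding_window_df` collects, for each row, the `window` preceding values, most recent first. A minimal sketch of the same idea with `Series.shift`, using made-up numbers; the helper name `add_lag_columns` and the `lag_k` column names are illustrative assumptions, not part of the notebook. Unlike the function above, this version drops the first rows instead of looking back before the start date.

```python
import pandas as pd

def add_lag_columns(df, column, window):
    """Return a copy of df with lag_1..lag_window columns for `column` (hypothetical helper)."""
    out = df.copy()
    lag_names = [f"lag_{k}" for k in range(1, window + 1)]
    for k, name in enumerate(lag_names, start=1):
        # lag_k holds the value observed k rows earlier
        out[name] = out[column].shift(k)
    # the first `window` rows have no complete history, so drop them
    return out.dropna(subset=lag_names)

# toy data, not real DogeCoin prices
toy = pd.DataFrame({"OHLC Average": [1.0, 2.0, 3.0, 4.0, 5.0]})
lagged = add_lag_columns(toy, "OHLC Average", 3)
```

Each `lag_k` column can then be passed to the regression directly, for example with `lagged[["lag_1", "lag_2", "lag_3"]].to_numpy()`.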


In [10]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
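# One row per day, one column per lag (most recent first)
X = np.stack(df_doge_windows["Sliding Window"].values)
y = df_doge_windows["OHLC Average"]

# shuffle=False keeps the split chronological: earlier 67% for training, later 33% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)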

reg = LinearRegression()
reg.fit(X_train, y_train)

pred = reg.predict(X_test)


In [23]:
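# Training rows reuse the true values so the column matches the frame length;
# only the last len(pred) rows are actual model predictions.
df_doge_windows["pred"] = list(y_train) + list(pred)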
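Because the `pred` column above is filled with the true training values followed by the test predictions, the plotted prediction curve coincides with the data over the training period. A sketch of one alternative that leaves the training rows as NaN so only out-of-sample predictions are drawn; the helper name `align_predictions` and the toy numbers are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def align_predictions(index, pred):
    """Place test-set predictions at the tail of `index`, NaN elsewhere (hypothetical helper)."""
    full = pd.Series(np.nan, index=index, dtype=float)
    # the test set is the last len(pred) rows of a chronological split
    full.iloc[len(index) - len(pred):] = pred
    return full

toy_index = pd.RangeIndex(10, 16)   # six rows labelled 10..15
toy_pred = np.array([0.5, 0.6])     # made-up predictions for the last two rows
aligned = align_predictions(toy_index, toy_pred)
```

In the notebook this would correspond to something like `align_predictions(df_doge_windows.index, pred)`; plotting libraries generally skip NaN points, so the training span would show only the true series.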






In [24]:
fig = px.line(df_doge_windows, x="Date", y=["OHLC Average","pred"], title='DogeCoin Price')
fig.show()

In [33]:
np.mean(np.abs(pred - y_test) < 0.01)

0.8666666666666667

In [31]:
y_test.std()

0.03660643974019772
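The two cells above report the share of test predictions within 0.01 of the true value and the standard deviation of the test target. To put those numbers side by side, here is a sketch that computes MAE, RMSE, the within-tolerance rate, and RMSE relative to the target's standard deviation. The function name `error_summary` and the toy arrays are assumptions, not results from the notebook.

```python
import numpy as np

def error_summary(y_true, y_pred, tol=0.01):
    """Summarise forecast errors (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": rmse,
        # same quantity as np.mean(np.abs(pred - y_test) < 0.01) in the notebook
        "within_tol": float(np.mean(np.abs(err) < tol)),
        # ddof=1 matches the default of pandas Series.std()
        "rmse_over_std": rmse / float(np.std(y_true, ddof=1)),
    }

# made-up values for illustration only
toy_true = np.array([0.20, 0.22, 0.21, 0.25])
toy_pred = np.array([0.205, 0.22, 0.19, 0.25])
summary = error_summary(toy_true, toy_pred)
```

A ratio `rmse_over_std` well below 1 would suggest the errors are small compared with the variation in the target itself.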

In [40]:
# Day-over-day change of the true series and of the prediction series
df_doge_windows["True Diffs"] = df_doge_windows["OHLC Average"].diff()
df_doge_windows["Pred Diffs"] = df_doge_windows["pred"].diff()






In [49]:
df_doge_windows["Value Increased"] = df_doge_windows["True Diffs"] > 0
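`Pred Diffs` above is the change between consecutive predictions, which need not have the same sign as the move the model forecasts relative to the last observed value. A sketch of a directional-accuracy check that compares each prediction with the previous day's actual value instead; the function name `directional_accuracy` and the toy numbers are assumptions.

```python
import pandas as pd

def directional_accuracy(actual, pred):
    """Share of days where the forecast move has the same sign as the true move (hypothetical helper)."""
    actual = pd.Series(actual, dtype=float).reset_index(drop=True)
    pred = pd.Series(pred, dtype=float).reset_index(drop=True)
    prev = actual.shift(1)               # last observed value before each day
    true_up = (actual - prev) > 0        # did the value actually increase?
    pred_up = (pred - prev) > 0          # did the model forecast an increase?
    valid = prev.notna()                 # the first row has no previous value
    return float((true_up[valid] == pred_up[valid]).mean())

# made-up values for illustration only
toy_actual = [1.0, 2.0, 1.5, 3.0]
toy_pred = [1.0, 1.5, 2.5, 2.0]
acc = directional_accuracy(toy_actual, toy_pred)
```

When applied to the notebook's data, only the test rows should be passed in, since the training rows of `pred` hold true values.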






In [51]:
df_doge_windows.iloc[-100:]

| | Date | Open | High | Low | Close | Volume | OHLC Average | Sliding Window | pred | True Diffs | Pred Diffs | Value Increased |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1411 | 2021-09-20 | 0.233161 | 0.233606 | 0.200022 | 0.207071 | 2244003542 | 0.218465 | [0.23676124999999998, 0.2417175, 0.24489724999... | 0.234987 | -0.018296 | -0.005092 | False |
| 1412 | 2021-09-21 | 0.208773 | 0.218094 | 0.198161 | 0.201027 | 1766963639 | 0.206514 | [0.21846500000000002, 0.23676124999999998, 0.2... | 0.211161 | -0.011951 | -0.023826 | False |
| 1413 | 2021-09-22 | 0.200822 | 0.229423 | 0.200224 | 0.224858 | 2016471206 | 0.213832 | [0.20651375, 0.21846500000000002, 0.2367612499... | 0.203764 | 0.007318 | -0.007398 | True |
| 1414 | 2021-09-23 | 0.224748 | 0.227095 | 0.218115 | 0.224832 | 1169581249 | 0.223698 | [0.21383175, 0.20651375, 0.21846500000000002] | 0.218912 | 0.009866 | 0.015148 | True |
| 1415 | 2021-09-24 | 0.224726 | 0.228267 | 0.197720 | 0.209451 | 1883155313 | 0.215041 | [0.22369750000000002, 0.21383175, 0.20651375] | 0.227572 | -0.008657 | 0.008661 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1506 | 2021-12-24 | 0.184979 | 0.195290 | 0.179761 | 0.186622 | 1853415104 | 0.186663 | [0.17849025000000002, 0.17349575, 0.16918025] | 0.181172 | 0.008173 | 0.004949 | True |
| 1507 | 2021-12-25 | 0.186712 | 0.194876 | 0.185571 | 0.190657 | 1010443046 | 0.189454 | [0.18666300000000002, 0.17849025000000002, 0.1... | 0.190577 | 0.002791 | 0.009406 | True |
| 1508 | 2021-12-26 | 0.190567 | 0.192546 | 0.185646 | 0.190020 | 650674078 | 0.189695 | [0.18945399999999998, 0.18666300000000002, 0.1... | 0.190545 | 0.000241 | -0.000033 | True |
| 1509 | 2021-12-27 | 0.189986 | 0.192923 | 0.187239 | 0.187705 | 666773423 | 0.189463 | [0.18969475, 0.18945399999999998, 0.1866630000... | 0.190252 | -0.000231 | -0.000293 | False |


In [54]:
# Previous day's OHLC average, aligned with the current row
df_doge_windows["OHLC Average"].shift(1)

1149         NaN
1150    0.005167
1151    0.008896
1152    0.010912
1153    0.009713
          ...   
1506    0.178490
1507    0.186663
1508    0.189454
1509    0.189695
1510    0.189463
Name: OHLC Average, Length: 362, dtype: float64
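The shifted series above is yesterday's average aligned with today's row, which is what a naive "no change" forecast would predict. A sketch of how a model's mean absolute error could be compared with that baseline; the function name `persistence_mae` and all numbers are made up for illustration.

```python
import pandas as pd

def persistence_mae(actual):
    """MAE of predicting each value with the previous one (hypothetical helper)."""
    actual = pd.Series(actual, dtype=float)
    naive = actual.shift(1)
    # the first row has no previous value; mean() skips the resulting NaN
    return float((actual - naive).abs().mean())

# toy data, not real DogeCoin prices
toy_actual = pd.Series([1.0, 2.0, 1.5, 3.0])
toy_model_pred = pd.Series([1.2, 1.8, 1.9, 2.5])

baseline_mae = persistence_mae(toy_actual)
# score the model on the same rows as the baseline (skip the first row)
model_mae = float((toy_actual.iloc[1:] - toy_model_pred.iloc[1:]).abs().mean())
```

If `model_mae` is not clearly below `baseline_mae` on the test period, the lagged regression adds little over simply carrying the last value forward.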