### Dataset Engineering for the More Recent Period: 2020-2023

Data collection periods: 01/01/2005 - 01/01/2010 and 01/01/2020 - 03/30/2023. The first period will primarily serve as the training set, and the more recent period will be used for testing.

### NBER Recession Indicators for the United States

These are daily indicators of whether the United States is in a recession (0 = no, 1 = yes), downloaded from the St. Louis Fed's FRED economic database. These values will likely be used as our main labels.

In [1]:
import pandas as pd

us_rec = pd.read_csv(r'Macroeconomic_Data\20_23_USRECD.csv')
us_rec

Unnamed: 0,DATE,USRECD
0,2020-01-01,0
1,2020-01-02,0
2,2020-01-03,0
3,2020-01-04,0
4,2020-01-05,0
...,...,...
1180,2023-03-26,0
1181,2023-03-27,0
1182,2023-03-28,0
1183,2023-03-29,0
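
Before relying on `USRECD` as a label, it can help to check how many days fall into each class. The sketch below uses a small hand-made frame in place of the real CSV; the rows are illustrative and are not FRED data.

```python
import pandas as pd

# Illustrative stand-in for the USRECD file; these rows are made up
sample = pd.DataFrame({
    "DATE": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    "USRECD": [0, 0, 1, 1],
})

# Parse dates up front so later merges compare datetimes, not strings
sample["DATE"] = pd.to_datetime(sample["DATE"])

# Number of days per label and the share of recession days
label_counts = sample["USRECD"].value_counts().sort_index()
recession_share = sample["USRECD"].mean()
print(label_counts)
print(f"share of recession days: {recession_share:.2f}")
```

The same two lines applied to `us_rec` would show how imbalanced the labels are for this period.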


## Collection of Further Macroeconomic Indicators

- T10YIE: 10-Year Breakeven Inflation Rate
- T10Y2Y: 10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity
- DFF: Federal Funds Effective Rate
- SP500: S&P 500
- VIXCLS: CBOE Volatility Index (VIX)

### Collecting Security Data

Daily data for the SVB stock:

In [3]:
import yfinance as yf

In [4]:
svb_p2 = yf.download("SIVB", start="2020-01-01", end="2023-03-31")
svb_p2.to_csv("svb_p2.csv")
svb_p2

[*********************100%***********************]  1 of 1 completed


Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,252.649994,254.279999,249.669998,254.270004,254.270004,242917
2020-01-03,247.960007,251.119995,246.800003,250.330002,250.330002,476454
2020-01-06,246.839996,249.800003,245.059998,249.240005,249.240005,491021
2020-01-07,248.660004,251.119995,247.580002,250.399994,250.399994,501982
2020-01-08,249.820007,254.115005,249.130005,252.779999,252.779999,502032
...,...,...,...,...,...,...
2023-03-24,106.040001,106.040001,106.040001,106.040001,106.040001,0
2023-03-27,106.040001,106.040001,106.040001,106.040001,106.040001,0
2023-03-28,0.530000,0.740000,0.010000,0.400000,0.400000,84502118
2023-03-29,0.390000,1.290000,0.331000,0.970000,0.970000,67419705


## Feature Engineering

### Creating Price Percentile vs Self

First, add a column with the percentile rank of each day's price relative to the previous 30 trading days, using the adjusted close as the price.

In [5]:
from scipy import stats
import numpy as np

# Change this variable assignment to rerun on a different df
df = svb_p2

close = df['Adj Close']
percentiles = []

for i in range(len(close)):

    # Rows without a full 30-day history get NaN
    if i < 30:
        percentiles.append(np.nan)
        continue

    # Selecting the previous 30 trading days of prices (positional slice)
    last_30 = close.iloc[i-30: i]

    # Percentile rank of the current price relative to the previous 30
    pct = stats.percentileofscore(last_30, close.iloc[i])
    percentiles.append(pct)

df["percentile_last_30"] = percentiles
df

Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30
2020-01-02,252.649994,254.279999,249.669998,254.270004,254.270004,242917,
2020-01-03,247.960007,251.119995,246.800003,250.330002,250.330002,476454,
2020-01-06,246.839996,249.800003,245.059998,249.240005,249.240005,491021,
2020-01-07,248.660004,251.119995,247.580002,250.399994,250.399994,501982,
2020-01-08,249.820007,254.115005,249.130005,252.779999,252.779999,502032,
...,...,...,...,...,...,...,...
2023-03-24,106.040001,106.040001,106.040001,106.040001,106.040001,0,20.000000
2023-03-27,106.040001,106.040001,106.040001,106.040001,106.040001,0,21.666667
2023-03-28,0.530000,0.740000,0.010000,0.400000,0.400000,84502118,0.000000
2023-03-29,0.390000,1.290000,0.331000,0.970000,0.970000,67419705,3.333333
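
The loop above can also be written with a pandas rolling window. The following is a minimal sketch, not the notebook's method: `rolling_percentile` is a hypothetical helper name, and the toy series is made up. It assumes the same definition as the loop (the current price ranked against the `window` prices before it, with `scipy.stats.percentileofscore` defaults).

```python
import numpy as np
import pandas as pd
from scipy import stats

def rolling_percentile(prices: pd.Series, window: int = 30) -> pd.Series:
    """Percentile of each price relative to the `window` prices before it."""
    # Each rolling window holds the previous `window` prices plus the current one
    return prices.rolling(window + 1).apply(
        lambda w: stats.percentileofscore(w[:-1], w[-1]),
        raw=True,  # pass plain NumPy arrays to the lambda
    )

# Tiny illustrative series with a 3-day lookback
toy = pd.Series([1.0, 2.0, 3.0, 0.0, 2.5])
toy_pct = rolling_percentile(toy, window=3)
print(toy_pct)
```

Rows without a full lookback come out as NaN, matching the empty cells at the top of the table above.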


### Adding Previous Row Prices as Columns

In [6]:
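# Shift the six price/volume columns down one row so each row carries the previous trading day's values
df_t1 = df.iloc[:, 0:6].shift()
df_t1 = df_t1.add_suffix('_t1')

# Merging on the index values produces a 'key_0' column, which is renamed to 'Date' in the next cell
df = pd.merge(df, df_t1, on=df.index)
df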

Unnamed: 0,key_0,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,Low_t1,Close_t1,Adj Close_t1,Volume_t1
0,2020-01-02,252.649994,254.279999,249.669998,254.270004,254.270004,242917,,,,,,,
1,2020-01-03,247.960007,251.119995,246.800003,250.330002,250.330002,476454,,252.649994,254.279999,249.669998,254.270004,254.270004,242917.0
2,2020-01-06,246.839996,249.800003,245.059998,249.240005,249.240005,491021,,247.960007,251.119995,246.800003,250.330002,250.330002,476454.0
3,2020-01-07,248.660004,251.119995,247.580002,250.399994,250.399994,501982,,246.839996,249.800003,245.059998,249.240005,249.240005,491021.0
4,2020-01-08,249.820007,254.115005,249.130005,252.779999,252.779999,502032,,248.660004,251.119995,247.580002,250.399994,250.399994,501982.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
812,2023-03-24,106.040001,106.040001,106.040001,106.040001,106.040001,0,20.000000,106.040001,106.040001,106.040001,106.040001,106.040001,0.0
813,2023-03-27,106.040001,106.040001,106.040001,106.040001,106.040001,0,21.666667,106.040001,106.040001,106.040001,106.040001,106.040001,0.0
814,2023-03-28,0.530000,0.740000,0.010000,0.400000,0.400000,84502118,0.000000,106.040001,106.040001,106.040001,106.040001,106.040001,0.0
815,2023-03-29,0.390000,1.290000,0.331000,0.970000,0.970000,67419705,3.333333,0.530000,0.740000,0.010000,0.400000,0.400000,84502118.0


In [7]:
df = df.rename(columns={"key_0": "Date"})
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,Low_t1,Close_t1,Adj Close_t1,Volume_t1
0,2020-01-02,252.649994,254.279999,249.669998,254.270004,254.270004,242917,,,,,,,
1,2020-01-03,247.960007,251.119995,246.800003,250.330002,250.330002,476454,,252.649994,254.279999,249.669998,254.270004,254.270004,242917.0
2,2020-01-06,246.839996,249.800003,245.059998,249.240005,249.240005,491021,,247.960007,251.119995,246.800003,250.330002,250.330002,476454.0
3,2020-01-07,248.660004,251.119995,247.580002,250.399994,250.399994,501982,,246.839996,249.800003,245.059998,249.240005,249.240005,491021.0
4,2020-01-08,249.820007,254.115005,249.130005,252.779999,252.779999,502032,,248.660004,251.119995,247.580002,250.399994,250.399994,501982.0
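
As an alternative sketch, the lagged columns can be attached with `DataFrame.join`, which aligns on the shared index and so avoids the `key_0` column and the rename step. The toy frame below is made up for illustration. Note that "previous row" means the previous trading day, not the previous calendar day (the lag for 2020-01-06 comes from 2020-01-03).

```python
import pandas as pd

# Made-up prices on three trading days, with a weekend gap before the last one
toy = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0], "Volume": [100, 200, 300]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# shift(1) moves each row down by one, so row t holds the values from row t-1
lagged = toy.shift(1).add_suffix("_t1")

# join() aligns on the shared DatetimeIndex, so no key_0 column is created
with_lags = toy.join(lagged)
print(with_lags)
```

With this approach the dates stay in the index; a later `reset_index()` would be needed if a `Date` column is required for merging.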


### Adding Macro Data into Dataframe

In [8]:
from functools import reduce

t10y2y = pd.read_csv(r'Macroeconomic_Data\T10Y2Y_P2.csv')
t10yie = pd.read_csv(r'Macroeconomic_Data\T10YIE_P2.csv')
dff = pd.read_csv(r'Macroeconomic_Data\DFF_P2.csv')
vix = pd.read_csv(r'Macroeconomic_Data\VIXCLS_P2.csv')

to_merge = [t10y2y, t10yie, dff, vix]

period1_macro = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                    how='outer'), to_merge)

period1_macro

Unnamed: 0,DATE,T10Y2Y,T10YIE,DFF,VIXCLS
0,2020-01-02,0.3,1.8,1.55,12.47
1,2020-01-03,0.27,1.77,1.55,14.02
2,2020-01-06,0.27,1.75,1.55,13.85
3,2020-01-07,0.29,1.74,1.55,13.79
4,2020-01-08,0.29,1.75,1.55,13.45
...,...,...,...,...,...
1181,2023-03-12,,,4.57,
1182,2023-03-18,,,4.58,
1183,2023-03-19,,,4.58,
1184,2023-03-25,,,4.83,


In [9]:
period1_macro = period1_macro.dropna()
period1_macro

Unnamed: 0,DATE,T10Y2Y,T10YIE,DFF,VIXCLS
0,2020-01-02,0.3,1.8,1.55,12.47
1,2020-01-03,0.27,1.77,1.55,14.02
2,2020-01-06,0.27,1.75,1.55,13.85
3,2020-01-07,0.29,1.74,1.55,13.79
4,2020-01-08,0.29,1.75,1.55,13.45
...,...,...,...,...,...
841,2023-03-24,-0.38,2.22,4.83,21.74
842,2023-03-27,-0.41,2.24,4.83,20.6
843,2023-03-28,-0.47,2.31,4.83,19.97
844,2023-03-29,-0.51,2.33,4.83,19.12
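
The outer merge followed by `dropna()` discards every date that is missing from at least one series, such as the weekend rows that only carry `DFF`. To see which dates are affected before dropping them, one option is the `indicator` flag of `pd.merge`. This is a sketch on two small frames whose rows are copied from the tables above; the `_toy` names are hypothetical.

```python
import pandas as pd

# DFF has a weekend observation (2023-03-25); VIXCLS only has trading days
dff_toy = pd.DataFrame({"DATE": ["2023-03-24", "2023-03-25", "2023-03-27"],
                        "DFF": [4.83, 4.83, 4.83]})
vix_toy = pd.DataFrame({"DATE": ["2023-03-24", "2023-03-27"],
                        "VIXCLS": [21.74, 20.60]})

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
merged = pd.merge(dff_toy, vix_toy, on="DATE", how="outer", indicator=True)

# Rows present in only one of the two series
unmatched = merged[merged["_merge"] != "both"]
print(unmatched)
```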


### Cleaning Macro Data

Missing observations appear in the data as '.'. Replace each one with the previous day's value (forward fill).

In [10]:
# Replacing '.' placeholders with NaN
import numpy as np

# Work on an explicit copy so the column assignments below do not
# trigger a SettingWithCopyWarning
period1_macro = period1_macro.copy()

macro_cols = period1_macro.columns[1:]

for col in macro_cols:
    period1_macro[col] = period1_macro[col].replace('.', np.nan, regex=False)

# Forward fill so each missing value takes the previous day's value
period1_macro = period1_macro.ffill()
period1_macro = period1_macro.dropna()
period1_macro


Unnamed: 0,DATE,T10Y2Y,T10YIE,DFF,VIXCLS
0,2020-01-02,0.3,1.8,1.55,12.47
1,2020-01-03,0.27,1.77,1.55,14.02
2,2020-01-06,0.27,1.75,1.55,13.85
3,2020-01-07,0.29,1.74,1.55,13.79
4,2020-01-08,0.29,1.75,1.55,13.45
...,...,...,...,...,...
841,2023-03-24,-0.38,2.22,4.83,21.74
842,2023-03-27,-0.41,2.24,4.83,20.6
843,2023-03-28,-0.47,2.31,4.83,19.97
844,2023-03-29,-0.51,2.33,4.83,19.12
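
Another way to handle the '.' placeholder is to declare it as a missing-value marker when the file is read, which keeps the column numeric from the start. The sketch below uses made-up CSV text in the same two-column layout rather than the real files.

```python
import io
import pandas as pd

# Made-up CSV text with '.' marking a missing observation
raw = io.StringIO("DATE,T10Y2Y\n2020-01-02,0.30\n2020-01-03,.\n2020-01-06,0.27\n")

# na_values='.' makes pandas read the placeholder as NaN;
# parse_dates converts DATE to datetime64 at load time
t10y2y_toy = pd.read_csv(raw, na_values=".", parse_dates=["DATE"])

# Forward fill: the missing day takes the previous day's value
t10y2y_toy["T10Y2Y"] = t10y2y_toy["T10Y2Y"].ffill()
print(t10y2y_toy)
```

Parsing the dates here would also make the later `astype('datetime64[ns]')` conversion unnecessary.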


In [11]:
period1_macro['DATE'] = period1_macro['DATE'].astype('datetime64[ns]')

df_features = pd.merge(df, period1_macro, left_on=['Date'], right_on=['DATE'])
df_features.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,Low_t1,Close_t1,Adj Close_t1,Volume_t1,DATE,T10Y2Y,T10YIE,DFF,VIXCLS
0,2020-01-02,252.649994,254.279999,249.669998,254.270004,254.270004,242917,,,,,,,,2020-01-02,0.3,1.8,1.55,12.47
1,2020-01-03,247.960007,251.119995,246.800003,250.330002,250.330002,476454,,252.649994,254.279999,249.669998,254.270004,254.270004,242917.0,2020-01-03,0.27,1.77,1.55,14.02
2,2020-01-06,246.839996,249.800003,245.059998,249.240005,249.240005,491021,,247.960007,251.119995,246.800003,250.330002,250.330002,476454.0,2020-01-06,0.27,1.75,1.55,13.85
3,2020-01-07,248.660004,251.119995,247.580002,250.399994,250.399994,501982,,246.839996,249.800003,245.059998,249.240005,249.240005,491021.0,2020-01-07,0.29,1.74,1.55,13.79
4,2020-01-08,249.820007,254.115005,249.130005,252.779999,252.779999,502032,,248.660004,251.119995,247.580002,250.399994,250.399994,501982.0,2020-01-08,0.29,1.75,1.55,13.45


### Adding S&P 500 Daily Data

In [13]:
import yfinance as yf

sp500 = yf.download("^GSPC", start="2020-01-01", end="2023-03-30")
sp500 = sp500.add_suffix('_SP500')
sp500.head()

[*********************100%***********************]  1 of 1 completed


Date,Open_SP500,High_SP500,Low_SP500,Close_SP500,Adj Close_SP500,Volume_SP500
2020-01-02,3244.669922,3258.139893,3235.530029,3257.850098,3257.850098,3459930000
2020-01-03,3226.360107,3246.149902,3222.340088,3234.850098,3234.850098,3484700000
2020-01-06,3217.550049,3246.840088,3214.639893,3246.280029,3246.280029,3702460000
2020-01-07,3241.860107,3244.909912,3232.429932,3237.179932,3237.179932,3435910000
2020-01-08,3238.590088,3267.070068,3236.669922,3253.050049,3253.050049,3726840000


In [14]:
df_features = pd.merge(df_features, sp500, left_on=['Date'], right_on=sp500.index)

In [15]:
df_features

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,...,T10Y2Y,T10YIE,DFF,VIXCLS,Open_SP500,High_SP500,Low_SP500,Close_SP500,Adj Close_SP500,Volume_SP500
0,2020-01-02,252.649994,254.279999,249.669998,254.270004,254.270004,242917,,,,...,0.3,1.8,1.55,12.47,3244.669922,3258.139893,3235.530029,3257.850098,3257.850098,3459930000
1,2020-01-03,247.960007,251.119995,246.800003,250.330002,250.330002,476454,,252.649994,254.279999,...,0.27,1.77,1.55,14.02,3226.360107,3246.149902,3222.340088,3234.850098,3234.850098,3484700000
2,2020-01-06,246.839996,249.800003,245.059998,249.240005,249.240005,491021,,247.960007,251.119995,...,0.27,1.75,1.55,13.85,3217.550049,3246.840088,3214.639893,3246.280029,3246.280029,3702460000
3,2020-01-07,248.660004,251.119995,247.580002,250.399994,250.399994,501982,,246.839996,249.800003,...,0.29,1.74,1.55,13.79,3241.860107,3244.909912,3232.429932,3237.179932,3237.179932,3435910000
4,2020-01-08,249.820007,254.115005,249.130005,252.779999,252.779999,502032,,248.660004,251.119995,...,0.29,1.75,1.55,13.45,3238.590088,3267.070068,3236.669922,3253.050049,3253.050049,3726840000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
811,2023-03-23,106.040001,106.040001,106.040001,106.040001,106.040001,0,18.333333,106.040001,106.040001,...,-0.38,2.19,4.83,22.61,3959.209961,4007.659912,3919.050049,3948.719971,3948.719971,4991600000
812,2023-03-24,106.040001,106.040001,106.040001,106.040001,106.040001,0,20.000000,106.040001,106.040001,...,-0.38,2.22,4.83,21.74,3939.209961,3972.739990,3909.159912,3970.989990,3970.989990,4583970000
813,2023-03-27,106.040001,106.040001,106.040001,106.040001,106.040001,0,21.666667,106.040001,106.040001,...,-0.41,2.24,4.83,20.6,3982.929932,4003.830078,3970.489990,3977.530029,3977.530029,4233540000
814,2023-03-28,0.530000,0.740000,0.010000,0.400000,0.400000,84502118,0.000000,106.040001,106.040001,...,-0.47,2.31,4.83,19.97,3974.129883,3979.199951,3951.530029,3971.270020,3971.270020,4014600000


### Adding Label Column

In [16]:
us_rec['DATE'] = us_rec['DATE'].astype('datetime64[ns]')

df_labeled = pd.merge(df_features, us_rec, left_on=['DATE'], right_on=['DATE'])
df_labeled.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,...,T10YIE,DFF,VIXCLS,Open_SP500,High_SP500,Low_SP500,Close_SP500,Adj Close_SP500,Volume_SP500,USRECD
0,2020-01-02,252.649994,254.279999,249.669998,254.270004,254.270004,242917,,,,...,1.8,1.55,12.47,3244.669922,3258.139893,3235.530029,3257.850098,3257.850098,3459930000,0
1,2020-01-03,247.960007,251.119995,246.800003,250.330002,250.330002,476454,,252.649994,254.279999,...,1.77,1.55,14.02,3226.360107,3246.149902,3222.340088,3234.850098,3234.850098,3484700000,0
2,2020-01-06,246.839996,249.800003,245.059998,249.240005,249.240005,491021,,247.960007,251.119995,...,1.75,1.55,13.85,3217.550049,3246.840088,3214.639893,3246.280029,3246.280029,3702460000,0
3,2020-01-07,248.660004,251.119995,247.580002,250.399994,250.399994,501982,,246.839996,249.800003,...,1.74,1.55,13.79,3241.860107,3244.909912,3232.429932,3237.179932,3237.179932,3435910000,0
4,2020-01-08,249.820007,254.115005,249.130005,252.779999,252.779999,502032,,248.660004,251.119995,...,1.75,1.55,13.45,3238.590088,3267.070068,3236.669922,3253.050049,3253.050049,3726840000,0


In [17]:
df_labeled = df_labeled.dropna()
df_labeled = df_labeled.drop(columns='DATE')
df_labeled.to_csv('labelled_data_period.csv')
df_labeled

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,...,T10YIE,DFF,VIXCLS,Open_SP500,High_SP500,Low_SP500,Close_SP500,Adj Close_SP500,Volume_SP500,USRECD
30,2020-02-14,264.609985,267.500000,263.940002,265.420013,265.420013,275087,96.666667,262.649994,265.980011,...,1.66,1.58,13.68,3378.080078,3380.689941,3366.149902,3380.159912,3380.159912,3419700000,0
31,2020-02-18,264.329987,265.512909,258.774994,261.109985,261.109985,276633,80.000000,264.609985,267.500000,...,1.65,1.59,14.83,3369.040039,3375.010010,3355.610107,3370.290039,3370.290039,3750400000,0
32,2020-02-19,263.179993,267.399994,261.929993,266.989990,266.989990,266087,100.000000,264.329987,265.512909,...,1.65,1.59,14.38,3380.389893,3393.520020,3378.830078,3386.149902,3386.149902,3614200000,0
33,2020-02-20,264.119995,270.950012,263.010010,270.790009,270.790009,390627,100.000000,263.179993,267.399994,...,1.63,1.59,15.56,3380.449951,3389.149902,3341.020020,3373.229980,3373.229980,4019180000,0
34,2020-02-21,268.339996,270.000000,259.510010,261.420013,261.420013,561191,73.333333,264.119995,270.950012,...,1.61,1.58,17.08,3360.500000,3360.760010,3328.449951,3337.750000,3337.750000,3908780000,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
811,2023-03-23,106.040001,106.040001,106.040001,106.040001,106.040001,0,18.333333,106.040001,106.040001,...,2.19,4.83,22.61,3959.209961,4007.659912,3919.050049,3948.719971,3948.719971,4991600000,0
812,2023-03-24,106.040001,106.040001,106.040001,106.040001,106.040001,0,20.000000,106.040001,106.040001,...,2.22,4.83,21.74,3939.209961,3972.739990,3909.159912,3970.989990,3970.989990,4583970000,0
813,2023-03-27,106.040001,106.040001,106.040001,106.040001,106.040001,0,21.666667,106.040001,106.040001,...,2.24,4.83,20.6,3982.929932,4003.830078,3970.489990,3977.530029,3977.530029,4233540000,0
814,2023-03-28,0.530000,0.740000,0.010000,0.400000,0.400000,84502118,0.000000,106.040001,106.040001,...,2.31,4.83,19.97,3974.129883,3979.199951,3951.530029,3971.270020,3971.270020,4014600000,0


## Scaling
### Applying Standard Scaling

In [16]:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Dropping the duplicate DATE column left over from the macro merge
df_features = df_features.drop(columns='DATE')

# Scaling every column except the first one (Date)
all_columns = df_features.columns[1:]
scaler.fit(df_features[all_columns])

df_features[all_columns] = scaler.transform(df_features[all_columns])

df_features.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,...,T10Y2Y,T10YIE,DFF,VIXCLS,Open_SP500,High_SP500,Low_SP500,Close_SP500,Adj Close_SP500,Volume_SP500
0,2005-01-03,-0.008555,-0.008174,0.047273,0.011952,0.011952,-0.198305,,,,...,0.209647,0.686082,-0.38367,-0.593336,-0.101169,-0.115271,-0.112968,-0.150119,-0.150119,-1.223083
1,2005-01-04,0.015261,-0.029749,0.000592,-0.053223,-0.053223,-0.403868,,-0.008757,-0.008416,...,0.167299,0.649046,-0.414562,-0.60135,-0.149999,-0.175116,-0.186459,-0.219752,-0.219752,-1.101886
2,2005-01-05,-0.067014,-0.09124,-0.022206,-0.084724,-0.084724,0.313785,,0.01505,-0.029983,...,0.146125,0.630528,-0.414562,-0.592534,-0.219619,-0.240716,-0.19468,-0.241143,-0.241143,-1.091565
3,2005-01-06,-0.060519,-0.090162,-0.014607,-0.053223,-0.053223,-0.450244,,-0.067194,-0.091452,...,0.188473,0.686082,-0.414562,-0.633406,-0.241007,-0.24622,-0.196895,-0.220546,-0.220546,-1.189468
4,2005-01-07,-0.03562,-0.086926,-0.069973,-0.136863,-0.136863,-0.570581,,-0.060701,-0.090374,...,0.167299,0.61201,-0.419711,-0.640619,-0.220414,-0.243368,-0.202359,-0.228984,-0.228984,-1.242052
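
The scaling cell fits the scaler on the same frame it transforms. Because the earlier period is meant for training and this one for testing, one alternative is to fit on the training period only and reuse those statistics here, so both sets share a single scale. The sketch below illustrates that idea on made-up arrays; it is an assumption about the intended workflow, not what the notebook does.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature arrays standing in for the two periods
train_X = np.array([[0.0], [2.0]])  # hypothetical period-1 (training) values
test_X = np.array([[3.0]])          # hypothetical period-2 (test) value

scaler = StandardScaler()
scaler.fit(train_X)                 # mean and std come from the training data only

# The test value is scaled with the training mean (1.0) and std (1.0)
test_scaled = scaler.transform(test_X)
print(test_scaled)
```

In practice the fitted scaler would have to be saved from the period-1 notebook and loaded here.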


### Creating the Labeled and Scaled DataFrame

In [17]:

df_labeled_scaled = pd.merge(df_features, us_rec, left_on=['Date'], right_on=['DATE'])
df_labeled_scaled = df_labeled_scaled.dropna()
df_labeled_scaled = df_labeled_scaled.drop(columns='DATE')
df_labeled_scaled.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,...,T10YIE,DFF,VIXCLS,Open_SP500,High_SP500,Low_SP500,Close_SP500,Adj Close_SP500,Volume_SP500,USRECD
30,2005-02-15,-0.055106,-0.086926,-0.025463,-0.080379,-0.080379,-0.898149,0.148018,-0.001182,-0.044003,...,0.61201,-0.270397,-0.81853,-0.129852,-0.142092,-0.087372,-0.110215,-0.110215,-1.213696,0
31,2005-02-16,-0.072427,-0.072901,-0.012436,-0.026066,-0.026066,-0.735819,0.711801,-0.05529,-0.087139,...,0.649046,-0.296141,-0.832153,-0.110102,-0.142092,-0.089636,-0.109123,-0.109123,-1.235018,0
32,2005-02-17,-0.036702,-0.088004,-0.05586,-0.127087,-0.127087,-0.752837,-0.03991,-0.072605,-0.07312,...,0.704601,-0.285844,-0.778459,-0.10901,-0.147646,-0.110901,-0.15672,-0.15672,-1.183114,0
33,2005-02-18,-0.138463,-0.173229,-0.115569,-0.182486,-0.182486,-0.670108,-0.227837,-0.036893,-0.088217,...,0.815709,-0.280695,-0.825742,-0.156598,-0.189727,-0.127588,-0.152551,-0.152551,-1.199789,0
34,2005-02-22,-0.178518,-0.208829,-0.192648,-0.225935,-0.225935,-0.619112,-0.321801,-0.138617,-0.173411,...,0.852746,-0.249802,-0.668668,-0.15243,-0.191929,-0.192514,-0.239059,-0.239059,-1.088082,0


In [23]:
df_labeled_scaled.to_csv('labelled_scaled_period1.csv')

## Collecting 4/25/2023 Data for Demo

In [63]:
import yfinance as yf

svb = yf.download("SIVB").tail(40)

[*********************100%***********************]  1 of 1 completed


In [64]:
import numpy as np
import scipy.stats as stats
import pandas as pd

# Work on an explicit copy so the column assignment below does not
# trigger a SettingWithCopyWarning
svb = svb.copy()
close = svb['Adj Close']
percentiles = []

for i in range(len(close)):

    # Rows without a full 30-day history get NaN
    if i < 30:
        percentiles.append(np.nan)
        continue

    # Selecting the previous 30 trading days of prices (positional slice)
    last_30 = close.iloc[i-30: i]

    # Percentile rank of the current price relative to the previous 30
    pct = stats.percentileofscore(last_30, close.iloc[i])
    percentiles.append(pct)

svb["percentile_last_30"] = percentiles
svb = svb.tail(2)
svb


Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30
2023-04-24,0.7111,0.82,0.7111,0.72,0.72,2424161,30.0
2023-04-25,0.72,0.725,0.5565,0.576,0.576,3202508,13.333333


In [65]:
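# Shift so the last row carries the previous trading day's values
df_t1 = svb.iloc[:, 0:6].shift()
df_t1 = df_t1.add_suffix('_t1')
svb = pd.merge(svb, df_t1, on=svb.index)

# Keep only the most recent day for the demo
demo_df = svb.tail(1)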

In [66]:
demo_df = demo_df.rename(columns={"key_0": "Date"})
demo_df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,Low_t1,Close_t1,Adj Close_t1,Volume_t1
1,2023-04-25,0.72,0.725,0.5565,0.576,0.576,3202508,13.333333,0.7111,0.82,0.7111,0.72,0.72,2424161.0


### Adding Macro Data to Demo Set

In [67]:
# looking up values manually because they are only for one day

demo_df['T10Y2Y'] = -0.46
demo_df['T10YIE'] = 2.25
demo_df['DFF'] = 4.83
demo_df['VIXCLS'] = 16.89

In [68]:
# Adding S&P

sp500 = yf.download("^GSPC")
sp500 = sp500.add_suffix('_SP500')
sp500 = sp500.tail(1)
sp500

demo_df = pd.merge(demo_df, sp500, left_on=['Date'], right_on=sp500.index)

[*********************100%***********************]  1 of 1 completed


In [70]:
demo_df.to_csv('demo_data.csv')
demo_df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,percentile_last_30,Open_t1,High_t1,...,T10Y2Y,T10YIE,DFF,VIXCLS,Open_SP500,High_SP500,Low_SP500,Close_SP500,Adj Close_SP500,Volume_SP500
0,2023-04-25,0.72,0.725,0.5565,0.576,0.576,3202508,13.333333,0.7111,0.82,...,-0.46,2.25,4.83,16.89,4126.430176,4126.430176,4071.379883,4071.629883,4071.629883,3978640000
