# Create the pickle file for the Pipeline Scaler

The scaler values must be stored so that the scaler can be applied on the new data collected each day.

The best interval used for the ANN model was Interval 4 (April 01, 2013 to September 01, 2021) so that is the date range for the data to be used to fit the scaler. 

That was the way the scaler was created in the notebook Bitcoin Next Day Price Prediction Using ANN Models. The exact scaler for those models was not stored so this will be the process to recreate it.

In [44]:
from model import get_full_dataset


# get the full dataset from bitinfocharts
df = get_full_dataset()

In [45]:
df.head()

Unnamed: 0,Date,median_transaction_fee3momUSD,fee_to_reward7momUSD,top100cap7mom,mining_profitability7rsi,top100cap14mom,price3wmaUSD,transactionvalue90emaUSD,difficulty30sma,fee_to_reward90smaUSD
0,2010/07/17,0.0,0.0,-0.508,0.0,-0.088,0.0,0.0,29.004,0.0
1,2010/07/18,0.0,0.0,-0.477,0.0,-0.327,0.0,0.0,34.476,0.0
2,2010/07/19,0.0,0.0,-0.184,0.0,-0.49,0.075,0.0,39.948,0.0
3,2010/07/20,0.0,0.0,0.005,0.0,-0.532,0.08,0.0,45.421,0.0
4,2010/07/21,0.0,0.0,0.163,0.0,-0.599,0.079,0.0,50.893,0.0


In [46]:
df.tail()

Unnamed: 0,Date,median_transaction_fee3momUSD,fee_to_reward7momUSD,top100cap7mom,mining_profitability7rsi,top100cap14mom,price3wmaUSD,transactionvalue90emaUSD,difficulty30sma,fee_to_reward90smaUSD
4099,2021/10/06,0.597,0.782,-0.096,46.344,-0.445,51045.0,662624.0,18746690000000.0,1.551
4100,2021/10/07,1.033,1.604,-0.158,61.109,-0.499,53095.0,664606.0,18822620000000.0,1.549
4101,2021/10/08,0.931,0.859,-0.032,50.21,-0.411,54124.0,667106.0,18872260000000.0,1.558
4102,2021/10/09,-0.015,0.622,-0.047,58.389,-0.418,54574.0,668069.0,18921530000000.0,1.549
4103,2021/10/10,-0.547,0.211,-0.04,53.873,-0.095,54887.0,669472.0,18970790000000.0,1.524


## Create and fit the Pipeline Scaler

In [47]:
from sklearn.preprocessing import MinMaxScaler, RobustScaler
from sklearn.pipeline import Pipeline

# extract the data from the dates of Interval 4
date_bool = (df.iloc[:, 0] >= '2013/04/01') & (df.iloc[:, 0] <= '2021/09/01')
X = df[date_bool]
# remove the date column
X = df.iloc[:, 1:]

# select only the features desired
features = [
            'median_transaction_fee3momUSD',
            'fee_to_reward7momUSD',
            'top100cap7mom',
            'mining_profitability7rsi',
            'top100cap14mom',
            'price3wmaUSD',
            'transactionvalue90emaUSD',
            'difficulty30sma',
            'fee_to_reward90smaUSD'
            ]

X = X.loc[:, features]

# scale the data
estimators = [] # create a list for the scalers
estimators.append(['minmax', MinMaxScaler()])
estimators.append(['robust', RobustScaler()])
# add the scalers to the Pipeline
scale = Pipeline(estimators, verbose=True)
# fit the scaler to the training data
scale.fit(X)

[Pipeline] ............ (step 1 of 2) Processing minmax, total=   0.0s
[Pipeline] ............ (step 2 of 2) Processing robust, total=   0.0s


Pipeline(steps=[('minmax', MinMaxScaler()), ['robust', RobustScaler()]],
         verbose=True)

## Save the pickle file

In [48]:
import pickle

# open the scaler pickle file
outfile = open('./scaler/final_scaler.pkl', 'wb')

pickle.dump(scale, outfile)

outfile.close()

## Test the scaler pickle file

In [49]:
def preprocess_the_data():
    """
        Takes in the dataframe from either today's or yesterday's features
        Loads the pickle file of the scaler
        Scales the data and returns a df and the date of the data
    """
    
    import pickle
    from btcinfocharts_scraper import grab_the_data

    # load the pickle file of the scaler
    infile = open('./scaler/final_scaler.pkl', 'rb')
    scale = pickle.load(infile)
    infile.close()

    # get the new dataset for the most recent data
    new_df, todays_date = grab_the_data()

    # scale the new data
    transformed_new_data = scale.transform(new_df)

    return transformed_new_data, todays_date

In [50]:
import pickle
# load the pickle file of the scaler
infile = open('./scaler/final_scaler.pkl', 'rb')
scale = pickle.load(infile)

# view the values for the scaling function
minmax_scales = scale.named_steps['minmax'].scale_
robust_scales = scale.named_steps['robust'].scale_

infile.close()

print(minmax_scales)
print(robust_scales)

[3.07219662e-02 1.90038198e-02 2.38714760e-02 1.09717696e-02
 2.40274874e-02 1.58846142e-05 1.49371445e-06 4.28164193e-14
 5.12505125e-02]
[0.0002765  0.00661333 0.00523382 0.13074235 0.00834355 0.11472389
 0.04178629 0.26707841 0.19946699]


In [53]:
# test the full preprocess function
todays_df, todays_date = preprocess_the_data()

# view the scaled current data
print('\nDate:', todays_date)
print('\nFeatures Data:\n', todays_df)

All of the data for today 2022-04-04 is available.
Today's 2022-04-04 data will be used

Date: 2022/04/04

Features Data: 
 [[ 7.46666667e+01  3.60919540e+00  7.06590650e+01 -4.01275569e+00
   4.46105112e+01  6.29118702e+00  2.09676047e+01  4.56000570e+00
   2.27389517e-02]]


### The Scaler has been stored as a pickle file and can be loaded and used to transform the data from bitinforcharts.org