# Rewriting Data Loader

In [2]:
import pickle
import pandas as pd
import numpy as np

In [3]:
from lightgbm import LGBMRegressor

I wanted to use vaex because I think it's faster - but I encountered a lot of problems with it not having all the features of pandas. So I'm switching back to pandas, unfortunately.

## Train Features

Actually, there are no train features - this is where the target is stored. At test time, targets are only revealed 2 days after the current date, so train will probably have to be 2 days behind current date. I'll come back to handle this later.

In [4]:
import pandas as pd
import datetime as dt
import numpy as np

import pandas as pd
import datetime as dt
import numpy as np
import time


class TrainLoader:
    """Stores the data in the 'train' file or 'test' files (which typically consists of target and grouping variables)"""
    def __init__(self, df):
        self.df = df
        self.recent_excerpt = self.init_recent_excerpt(df)

    def init_recent_excerpt(self, df):
        """Gets only the most recent data from the df, to reduce processing time for future feature creation"""
        pass

    def add_test_df(self, df):
        """adds the test df to the data"""
        pass  

## Revealed Targets Features

We'll be joining with the train_df on data_block_id, prediction_unit_id, is_consumption and hour.

At test time, we need to predict a day's worth of energy production/consumption, and we'll get a single datablock's worth of revealed info for our day. Each data_block_id gives us a single day of data 2 days in the past. In previous features, I moved the date 2 days in the future and joined on datetime. But now that I'm doing it from scratch, why not reimagine it? Seeing as the data is 2 days behind, there is no reason to stick to only the equivalent hour of the day and before it.

Note: because revealed targets are 2 days behind, the weekly period is offset by two days I think - so same day historical targets would be at 5 days, 12 days, etc? 

So, there are a few things that I think should be of interest to our model:

- The entire datablock's worth of targets for that prediction_id. Why not?
- The entire datablock's worth of targets for the complement of production/consumption. I've noticed that consumption is somewhat affected by production, so why not include the data for the model? I don't think production is affected by consumption though.
- Past values for the hour for 1-14 days previous                   + (both production and consumption?)
- Average for the same hour, 7 day, 14 day                          + means/stds? + (both production and consumption?)
- Daily Past averages for the last 7 days                           + means/stds? + (both production and consumption?)
- Average for the same hour, for the entire county                  + means/stds? + (both production and consumption?)
- Daily Past averages for the last 7 days, for the entire county    + means/stds? + (both production and consumption?)

+ Averages across entire product_type ?

In [101]:
class RevealedTargetsLoader:
    """Stores and processes the data from the revealed_targets df"""
    def __init__(self, df):
        self.df = df
        self.recent_excerpt = self.init_recent_excerpt(df)

    def init_recent_excerpt(self, df):
        """Gets only the most recent data from the df, to reduce processing time for future feature creation"""
        pass

    def add_test_df(self, df):
        """adds the test df to the data"""
        pass

    def _create_entire_data_block_features(self, df:pd.DataFrame):
        # ensure df is sorted by date
        # group by create a column for each hour of the day, odd numbers are opposite of is_consumption for the row
        df['unique_id'] = df.groupby(['data_block_id', 'prediction_unit_id']).cumcount()
        #create pivot table to create a column for each historical target
        pivot_df = df.pivot_table(index=['data_block_id', 'prediction_unit_id'], 
                          columns='unique_id', 
                          values='target', 
                          fill_value=0)
        # reset index to flatten the df and create separate columns
        pivot_df = pivot_df.reset_index()
        #name columns
        pivot_df.columns = [(f'target_{int(col/2)}' if col%2 == 0 else f'target_{int((col-1)/2)}_complement') if isinstance(col, int) else col for col in pivot_df.columns]
        return pivot_df
    
    def _create_past_hourly_values(self, df:pd.DataFrame):
        # ensure df is sorted by date

        df = df.set_index('datetime')
        df['id'] = df['prediction_unit_id'].astype(str) + '_' + df['is_consumption'].astype(str)


        # Every day from 1-12 days ago
        periods = [24, 48, 72, 96, 120, 144, 168, 192, 216, 240, 264, 288]

        # Initialize a list to store the shifted DataFrames
        shifted_dfs = []

        # Loop through each period and create a shifted DataFrame
        for period in periods:
            shifted_df = df.groupby('id')['target'].shift(period)
            shifted_df.name = f'target_lag_{period}'  # Rename the column
            # display(shifted_df)
            shifted_dfs.append(shifted_df)

        # Concatenate all shifted DataFrames with the original DataFrame
        # Note: I should cut these columns down further to only the ones I need for merging.
        # result_df = pd.concat([df[['county', 'is_business', 'product_type', 'is_consumption', 'data_block_id', 'row_id', 'prediction_unit_id']]] + shifted_dfs, axis=1)
        result_df = pd.concat([df] + shifted_dfs, axis=1)
        result_df = result_df.reset_index()
        return result_df
    
    def _create_sma_values(self, df:pd.DataFrame):
        # ensure df is sorted by date
        df = df.set_index('datetime')
        df['id'] = df['prediction_unit_id'].astype(str) + '_' + df['is_consumption'].astype(str)

        windows = [8, 12, 24, 48, 24*7, 24*14]
        windowed_dfs = []

        other_df = df.reset_index()
        # Loop through each period and create avgs
        for window in windows:
            my_df = df.groupby('id')['target'].rolling(window=window, min_periods=1).mean()
            my_df = my_df.reset_index()
            my_df = my_df.rename(columns={'target': f'target_{window}h_sma'})
            other_df = pd.merge(other_df, my_df, on=['id', 'datetime'])

        # result_df = pd.concat(windowed_dfs, axis=1)
        return other_df.drop('id', axis=1)
    
    def _create_hourly_sma_values(self, df:pd.DataFrame):
        # ensure df is sorted by date
        
        df['id'] = df['prediction_unit_id'].astype(str) + '_' + df['is_consumption'].astype(str) + '_' + df['datetime'].dt.hour.astype(str)
        df = df.set_index('datetime')
        windows = [24*7, 24*14]
        windowed_dfs = []

        other_df = df.reset_index()
        # Loop through each period and create avgs
        for window in windows:
            my_df = df.groupby('id')['target'].rolling(window=window, min_periods=1).mean()
            my_df = my_df.reset_index()
            my_df = my_df.rename(columns={'target': f'target_{window}h_hsma'})
            other_df = pd.merge(other_df, my_df, on=['id', 'datetime'])

        # result_df = pd.concat(windowed_dfs, axis=1)
        return other_df.drop('id', axis=1)
    
    def _create_county_sma_values(self, df:pd.DataFrame):
        # ensure df is sorted by date
        df = df.set_index('datetime')
        df['id'] = df['county'].astype(str) + '_' + df['is_consumption'].astype(str)
        
        windows = [24, 48, 24*7]
        
        # Loop through each period and create avgs
        df = df
        for window in windows:
            df[f'target_{window}h_csma'] = df.groupby('id')['target'].transform(lambda x: x.rolling(window=window, min_periods=1).mean())

        # result_df = pd.concat(windowed_dfs, axis=1)
        return df.drop(['id', 'target'], axis=1)
    
    def _create_hourly_county_sma_values(self, df:pd.DataFrame):
        # ensure df is sorted by date
        df['id'] = df['county'].astype(str) + '_' + df['is_consumption'].astype(str) + '_' + df['datetime'].dt.hour.astype(str)
        df = df.set_index('datetime')
        windows = [24*7, 24*14]
        # Loop through each period and create avgs
        df = df
        for window in windows:
            df[f'target_{window}h_hcsma'] = df.groupby('id')['target'].transform(lambda x: x.rolling(window=window, min_periods=1).mean())

        return df.drop(['id', 'target'], axis=1)
    
    def _merge(self, df1:pd.DataFrame, df2:pd.DataFrame, merge_cols=['datetime', 'county', 'is_business', 'product_type', 'target', 'is_consumption', 'data_block_id', 'row_id', 'prediction_unit_id']):
        # Assuming df and hourly_df are your DataFrames

        # Create unique_id in both DataFrames
        df1['unique_id'] = df1[merge_cols].astype(str).agg('_'.join, axis=1)
        df2['unique_id'] = df2[merge_cols].astype(str).agg('_'.join, axis=1)
        df2 = df2.drop(merge_cols, axis=1)

        # Merge on unique_id
        merged_df = pd.merge(df1, df2, on='unique_id', how='inner')

        # Drop the unique_id column
        merged_df.drop('unique_id', axis=1, inplace=True)

        # Drop the duplicated columns from one of the DataFrames
        # Assuming the columns in hourly_df are the ones to be dropped
        # for col in merge_cols:
        #     if col + '_y' in merged_df:
        #         merged_df.drop(col + '_y', axis=1, inplace=True)
        #         merged_df.rename(columns={col + '_x': col}, inplace=True)
        return merged_df

    def create_features(self, df:pd.DataFrame):
        """Creates the features"""
        # First, ensure df is sorted
        df = df.sort_values(['datetime', 'prediction_unit_id', 'is_consumption'])
        # display(df)

        #### Entire datablocks worth of targets for that prediction_id - both production and consumption
        datablock_df = self._create_entire_data_block_features(df)
        # display(datablock_df)
        

        #### Past values for the same hour
        hourly_df = self._create_past_hourly_values(df)
        
        # df =  pd.merge(df, hourly_df, on=['datetime','county','is_business','product_type','target','is_consumption','data_block_id','row_id', 'prediction_unit_id'])

        #### simple moving average
        rolling_avg_df = self._create_sma_values(df)
        #join
        
        # df =  pd.merge(df, rolling_avg_df, on=['datetime','county','is_business','product_type','target','is_consumption','data_block_id','row_id', 'prediction_unit_id'])

        #### simple moving average for only same hourly values
        hourly_rolling_avg_df = self._create_hourly_sma_values(df)
        #join
        
        # df =  pd.merge(df, hourly_rolling_avg_df, on=['datetime','county','is_business','product_type','target','is_consumption','data_block_id','row_id', 'prediction_unit_id'])

        #### simple moving average for county
        county_rolling_avg_df = self._create_county_sma_values(df)
        #join
        
        # df =  pd.merge(df, county_rolling_avg_df, on=['datetime','county','is_business','product_type','is_consumption','data_block_id','row_id', 'prediction_unit_id'])

        #### simple moving average for county for only same hourly values
        county_hourly_rolling_avg_df = self._create_hourly_county_sma_values(df)
        #join
        
        # df =  pd.merge(df, county_hourly_rolling_avg_df, on=['datetime','county','is_business','product_type','is_consumption','data_block_id','row_id', 'prediction_unit_id'])

        #join
        df = pd.merge(df, datablock_df, on=['data_block_id', 'prediction_unit_id'])
        #join
        df = self._merge(df, hourly_df)
        df = self._merge(df, rolling_avg_df)
        df = self._merge(df, hourly_rolling_avg_df)
        df = self._merge(df, county_rolling_avg_df, merge_cols=['county', 'is_business', 'product_type', 'is_consumption', 'data_block_id', 'row_id', 'prediction_unit_id'])
        df = self._merge(df, county_hourly_rolling_avg_df, merge_cols=['county', 'is_business', 'product_type', 'is_consumption', 'data_block_id', 'row_id', 'prediction_unit_id'])

        #drop some cols that made it though
        # df = df.drop(['unique_id', 'id'], axis=1)
        # print(df.columns)
        return df


datablock = 274
revealed_targets = pd.read_csv('data/train.csv', parse_dates=['datetime'])
revealed_targets = revealed_targets[revealed_targets.data_block_id < datablock-2].copy()
revealed_targets['data_block_id'] += 2
# display(revealed_targets)

ob = RevealedTargetsLoader(revealed_targets)
# tdf = ob._create_entire_data_block_features(revealed_targets)
# tdf = ob._create_past_hourly_values(revealed_targets)
# tdf = ob._create_sma_values(revealed_targets)
# tdf = ob._create_hourly_sma_values(revealed_targets)
# tdf = ob._create_county_sma_values(revealed_targets)
# tdf = ob._create_hourly_county_sma_values(revealed_targets)
# tdf
feats = ob.create_features(revealed_targets)
feats.sort_values('row_id')


Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,id_x,...,target_48h_sma,target_168h_sma,target_336h_sma,target_168h_hsma,target_336h_hsma,target_24h_csma,target_48h_csma,target_168h_csma,target_168h_hcsma,target_336h_hcsma
0,0,0,1,0.713,0,2021-09-01 00:00:00,2,0,0,0_0_0,...,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000
1,0,0,1,96.590,1,2021-09-01 00:00:00,2,1,0,0_1_0,...,96.590000,96.590000,96.590000,96.590000,96.590000,96.590000,96.590000,96.590000,96.590000,96.590000
48,0,0,2,0.000,0,2021-09-01 00:00:00,2,2,1,0_0_0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.356500,0.356500,0.356500,0.356500,0.356500
49,0,0,2,17.314,1,2021-09-01 00:00:00,2,3,1,0_1_0,...,17.314000,17.314000,17.314000,17.314000,17.314000,56.952000,56.952000,56.952000,56.952000,56.952000
96,0,0,3,2.904,0,2021-09-01 00:00:00,2,4,2,0_0_0,...,2.904000,2.904000,2.904000,2.904000,2.904000,1.205667,1.205667,1.205667,1.205667,1.205667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
844655,15,1,0,108.908,1,2022-05-30 23:00:00,273,844795,64,15_1_23,...,133.171604,141.056583,143.589423,162.695589,161.846757,136.184792,143.070437,131.021595,119.710565,127.176765
844414,15,1,1,0.000,0,2022-05-30 23:00:00,273,844796,59,15_0_23,...,76.201917,76.541125,75.369533,0.000000,0.012868,4.120000,55.986938,77.921196,0.653500,0.739021
844415,15,1,1,40.896,1,2022-05-30 23:00:00,273,844797,59,15_1_23,...,31.517104,35.347310,38.828467,57.173470,51.560629,125.378042,134.915812,128.677655,118.110208,126.984560
844462,15,1,3,1.000,0,2022-05-30 23:00:00,273,844798,60,15_0_23,...,163.585396,147.296375,148.308545,0.273042,0.178191,3.690250,54.590958,77.373667,0.659452,0.741997


## Client Features

I am doing no transformations with client features

In [5]:
class ClientLoader:
    """Stores the data in the client file"""
    def __init__(self, df):
        self.df = df
        self.recent_excerpt = self.init_recent_excerpt(df)

    def init_recent_excerpt(self, df):
        """Gets only the most recent data from the df, to reduce processing time for future feature creation"""
        pass

    def add_test_df(self, df):
        """adds the test df to the data"""
        pass  

## Historical Weather

We are provided with historical weather data for weather stations within counties. A couple of ideas for this weather data:

- Latest historical weather data for the county (avg if more than 1 station)
- Avg weather data for a county 24h, 48h + var
- Avg weather data for all of the country + var


In [8]:
class HistoricalWeatherLoader:
    """Stores the data in the client file"""
    def __init__(self, df):
        self.df = df
        self.recent_excerpt = self.init_recent_excerpt(df)
        self.weather_mapping = self.init_weather_mapping()
        self.weather_features = ['temperature','dewpoint','rain','snowfall','surface_pressure','cloudcover_total','cloudcover_low','cloudcover_mid','cloudcover_high','windspeed_10m','winddirection_10m','shortwave_radiation','direct_solar_radiation','diffuse_radiation']

    def init_weather_mapping(self):
        # https://www.kaggle.com/code/tsunotsuno/enefit-eda-baseline/notebook#Baseline
        county_point_map = {
            0: (59.4, 24.7), # "HARJUMAA"
            1 : (58.8, 22.7), # "HIIUMAA"
            2 : (59.1, 27.2), # "IDA-VIRUMAA"
            3 : (58.8, 25.7), # "JÄRVAMAA"
            4 : (58.8, 26.2), # "JÕGEVAMAA"
            5 : (59.1, 23.7), # "LÄÄNE-VIRUMAA"
            6 : (59.1, 23.7), # "LÄÄNEMAA"
            7 : (58.5, 24.7), # "PÄRNUMAA"
            8 : (58.2, 27.2), # "PÕLVAMAA"
            9 : (58.8, 24.7), # "RAPLAMAA"
            10 : (58.5, 22.7),# "SAAREMAA"
            11 : (58.5, 26.7),# "TARTUMAA"
            12 : (58.5, 25.2),# "UNKNOWNN" (center of the map)
            13 : (57.9, 26.2),# "VALGAMAA"
            14 : (58.2, 25.7),# "VILJANDIMAA"
            15 : (57.9, 27.2) # "VÕRUMAA"
        }
        # Convert the dictionary to a list of tuples
        data = [(county_code, lat, lon) for county_code, (lat, lon) in county_point_map.items()]

        # Create DataFrame
        df = pd.DataFrame(data, columns=['county', 'latitude', 'longitude'])
        
        return df

    def init_recent_excerpt(self, df):
        """Gets only the most recent data from the df, to reduce processing time for future feature creation"""
        pass

    def add_test_df(self, df):
        """adds the test df to the data"""
        pass

    def _add_latest_weather(self, df:pd.DataFrame):
        pass

    def create_features(self, df:pd.DataFrame):
        # map l+l to county
        df = df.merge(self.weather_mapping, how='inner', on=('latitude', 'longitude'))
        # print(weather_features)

        # Apply the function
        county_mean = self.add_24h_mean_var(df, self.weather_features)    
        country_mean = self.add_24h_mean_var_estonia(df, self.weather_features)
           
        latest = self._add_latest_weather(df)
        pass

In [31]:
datablock = 274
df = pd.read_csv('data/historical_weather.csv', parse_dates=['datetime'])
df = df[df.data_block_id < datablock].copy()
# display(revealed_targets)

# ob = RevealedTargetsLoader(revealed_targets)
# tdf = ob._create_entire_data_block_features(revealed_targets)
# tdf = ob._create_past_hourly_values(revealed_targets)
# tdf = ob._create_sma_values(revealed_targets)
# tdf = ob._create_hourly_sma_values(revealed_targets)
# tdf = ob._create_county_sma_values(revealed_targets)
# tdf = ob._create_hourly_county_sma_values(revealed_targets)
# tdf

ob = HistoricalWeatherLoader(df)

df = df.merge(ob.weather_mapping, on=('latitude', 'longitude'))

# df.groupby(['county', 'datetime'])

max_datetime_rows = df.loc[df.groupby(['data_block_id', 'county'])['datetime'].idxmax()]
max_datetime_rows
# df
# df

Unnamed: 0,datetime,temperature,dewpoint,rain,snowfall,surface_pressure,cloudcover_total,cloudcover_low,cloudcover_mid,cloudcover_high,windspeed_10m,winddirection_10m,shortwave_radiation,direct_solar_radiation,diffuse_radiation,latitude,longitude,data_block_id,county
98095,2021-09-01 10:00:00,13.6,8.7,0.0,0.0,1008.9,12,7,9,0,6.305556,331,396.0,311.0,85.0,59.4,24.7,1.0,0
52322,2021-09-01 10:00:00,14.3,8.4,0.0,0.0,1014.4,53,28,0,94,7.777778,340,377.0,223.0,154.0,58.8,22.7,1.0,1
91556,2021-09-01 10:00:00,14.5,10.9,0.0,0.0,1004.2,50,46,15,0,4.916667,322,318.0,196.0,122.0,59.1,27.2,1.0,2
65400,2021-09-01 10:00:00,14.4,10.5,0.0,0.0,1003.3,38,26,24,0,4.861111,323,362.0,224.0,138.0,58.8,25.7,1.0,3
71939,2021-09-01 10:00:00,14.5,10.6,0.0,0.0,1001.1,62,49,30,0,5.194444,320,375.0,235.0,140.0,58.8,26.2,1.0,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52311,2022-05-31 10:00:00,12.1,10.0,0.0,0.0,1001.6,96,72,51,1,5.472222,75,221.0,81.0,140.0,58.5,26.7,273.0,11
45772,2022-05-31 10:00:00,11.4,10.2,0.2,0.0,1002.3,100,98,59,3,5.333333,64,173.0,21.0,152.0,58.5,25.2,273.0,12
6538,2022-05-31 10:00:00,11.3,9.9,0.1,0.0,1001.5,100,89,71,1,6.444444,66,130.0,14.0,116.0,57.9,26.2,273.0,13
19616,2022-05-31 10:00:00,10.8,10.0,0.1,0.0,994.5,100,100,65,0,6.000000,64,118.0,3.0,115.0,58.2,25.7,273.0,14


In [57]:
datablock = 274
df = pd.read_csv('data/train.csv', parse_dates=['datetime'])
df = df[df.data_block_id < datablock-2].copy()
df['data_block_id'] += 2

df = df.sort_values(['datetime', 'prediction_unit_id', 'is_consumption'])
df = df.set_index('datetime')
df['id'] = df['prediction_unit_id'].astype(str) + '_' + df['is_consumption'].astype(str)

windows = [8, 12, 24, 48, 24*7, 24*14]
other_df = df.reset_index()
# Loop through each period and create avgs
for window in windows:
    df[f'target_{window}h_sma'] = df.groupby('id')['target'].transform(lambda x: x.rolling(window=window, min_periods=1).mean())
    # my_df = my_df.reset_index()
    # my_df = my_df.rename(columns={'target': f'target_{window}h_sma'})
    # display(my_df)
    # break
    # other_df = pd.merge(other_df, my_df, on=['id', 'datetime'])
    # display(my_df)
    # df[f'{window}h_sma'] = my_df

df

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_8h_sma,target_12h_sma,target_24h_sma,target_48h_sma,target_168h_sma,target_336h_sma
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
2021-09-01 00:00:00,0,0,1,0.713,0,2,0,0,0_0,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000
2021-09-01 00:00:00,0,0,1,96.590,1,2,1,0,0_1,96.590000,96.590000,96.590000,96.590000,96.590000,96.590000
2021-09-01 00:00:00,0,0,2,0.000,0,2,2,1,1_0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2021-09-01 00:00:00,0,0,2,17.314,1,2,3,1,1_1,17.314000,17.314000,17.314000,17.314000,17.314000,17.314000
2021-09-01 00:00:00,0,0,3,2.904,0,2,4,2,2_0,2.904000,2.904000,2.904000,2.904000,2.904000,2.904000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-05-30 23:00:00,2,1,1,41.338,1,273,844691,65,65_1,43.745750,42.385583,45.605417,47.576250,47.422845,47.986027
2022-05-30 23:00:00,4,1,1,0.000,0,273,844708,66,66_0,22.252375,51.620667,45.153542,42.830354,40.564274,47.786018
2022-05-30 23:00:00,4,1,1,88.192,1,273,844709,66,66_1,85.601125,68.667833,72.642625,67.988542,75.066607,78.876830
2022-05-30 23:00:00,11,1,0,0.000,0,273,844764,67,67_0,7.285125,15.096167,13.520625,24.017271,19.158619,22.592821


In [48]:
datablock = 274
df = pd.read_csv('data/train.csv', parse_dates=['datetime'])
df = df[df.data_block_id < datablock-2].copy()
df['data_block_id'] += 2

df = df.sort_values(['datetime', 'prediction_unit_id', 'is_consumption'])
df = df.set_index('datetime')
df['id'] = df['prediction_unit_id'].astype(str) + '_' + df['is_consumption'].astype(str)



periods = [24, 48, 72, 96, 120, 144, 168, 192, 216, 240, 268, 292]

# Initialize a list to store the shifted DataFrames
shifted_dfs = []

# Loop through each period and create a shifted DataFrame
for period in periods:
    shifted_df = df.groupby('id')['target'].shift(period)
    shifted_df.name = f'target_lag_{period}'  # Rename the column
    shifted_dfs.append(shifted_df)

# Concatenate all shifted DataFrames with the original DataFrame
result_df = pd.concat([df] + shifted_dfs, axis=1)
result_df

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_24,...,target_lag_72,target_lag_96,target_lag_120,target_lag_144,target_lag_168,target_lag_192,target_lag_216,target_lag_240,target_lag_268,target_lag_292
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2021-09-01 00:00:00,0,0,1,0.713,0,2,0,0,0_0,,...,,,,,,,,,,
2021-09-01 00:00:00,0,0,1,96.590,1,2,1,0,0_1,,...,,,,,,,,,,
2021-09-01 00:00:00,0,0,2,0.000,0,2,2,1,1_0,,...,,,,,,,,,,
2021-09-01 00:00:00,0,0,2,17.314,1,2,3,1,1_1,,...,,,,,,,,,,
2021-09-01 00:00:00,0,0,3,2.904,0,2,4,2,2_0,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-05-30 23:00:00,2,1,1,41.338,1,273,844691,65,65_1,49.310,...,57.566,45.138,37.696,40.427,46.279,50.540,51.677,51.263,44.959,33.549
2022-05-30 23:00:00,4,1,1,0.000,0,273,844708,66,66_0,0.000,...,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,8.014,10.065
2022-05-30 23:00:00,4,1,1,88.192,1,273,844709,66,66_1,89.906,...,79.231,92.174,95.418,91.396,73.996,92.429,93.970,103.111,92.670,105.669
2022-05-30 23:00:00,11,1,0,0.000,0,273,844764,67,67_0,0.000,...,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.713,1.444


In [66]:
import pandas as pd
import datetime as dt
import numpy as np

import pandas as pd
import datetime as dt
import numpy as np
import time

class TrainDataProcessor:
    """I am rewriting this training data processor to process a few more variables differently."""

    def __init__(self, train, revealed_targets, client, historical_weather,
                 forecast_weather, electricity_prices, gas_prices, for_testing=False,
                add_log_cols=False):
        self.add_log_cols = add_log_cols
        self.test_orig_dfs = self.get_test_orig_dfs([train, revealed_targets, client, historical_weather,
                 forecast_weather, electricity_prices, gas_prices])
        
        self.weather_mapping = self.init_weather_mapping()
        
        if not for_testing:
            self.train = self.init_train(train)
            display(self.train)
            self.revealed_targets = self.init_revealed_targets(revealed_targets)
            self.client = self.init_client(client)
            
            self.historical_weather = self.init_historical_weather(historical_weather)
            self.forecast_weather = self.init_forecast_weather(forecast_weather)
            self.electricity_prices = self.init_electricity(electricity_prices)
            self.gas_prices = self.init_gas_prices(gas_prices)
            
            self.df_all_cols = self.join_data(self.train, self.revealed_targets, self.client, self.historical_weather, self.forecast_weather, self.electricity_prices, self.gas_prices)
            if self.add_log_cols:
                self.df_all_cols = self.create_log_cols(self.df_all_cols)
            self.df = self.remove_cols(self.df_all_cols)
            # self.target = self.df['target']
            # self.df = self.df.drop('target', axis=1)
            
        
    def get_test_orig_dfs(self, dfs):
        for i, df in enumerate(dfs):
            if 'datetime' in df.columns:
                df['datetime'] = pd.to_datetime(df.datetime)
                col = 'datetime'
            if 'prediction_datetime' in df.columns:
                df['prediction_datetime'] = pd.to_datetime(df.prediction_datetime)
                col = 'prediction_datetime'
            if 'forecast_date' in df.columns:
                df['forecast_date'] = pd.to_datetime(df['forecast_date'])
                col = 'forecast_date'
            if 'forecast_datetime' in df.columns:
                df['forecast_datetime'] = pd.to_datetime(df['forecast_datetime'])
                col = 'forecast_datetime'
            if 'date' in df.columns:
                df['date'] = pd.to_datetime(df.date).dt.date
                col = 'date'

            test_date = df[col].iloc[-1]  # Assuming test is a DataFrame
            start_date = test_date - pd.Timedelta(days=14)
            historical_subset = df[df[col] >= start_date]
            dfs[i] = historical_subset.copy()
        return dfs
    
    def trim_test_orig_dfs(self, dfs):
        for i, df in enumerate(dfs):
            if 'datetime' in df.columns:
                col = 'datetime'
            if 'prediction_datetime' in df.columns:
                col = 'prediction_datetime'
            if 'forecast_date' in df.columns:
                col = 'forecast_date'
            if 'forecast_datetime' in df.columns:
                col = 'forecast_datetime'
            if 'date' in df.columns:
                col = 'date'

            test_date = df[col].iloc[-1]  # Assuming test is a DataFrame
            #set cutoff date to a little longer than 14 for a buffer. Shoudn't need the buffer tho honestly
            start_date = test_date - pd.Timedelta(days=20)
            historical_subset = df[df[col] >= start_date]
            dfs[i] = historical_subset.copy()
        return dfs
        
    def init_train(self, df):
        """Prepares the training data for model training."""
        try:
            df['datetime'] = pd.to_datetime(df.datetime)
        except Exception as e:
            df['datetime'] = pd.to_datetime(df.prediction_datetime)
        df['date'] = df.datetime.dt.date
        
        #for test dfs, set index to row id
        df.index = df.row_id.tolist()
            
        # df = self.get_data_block_id(df, 'datetime')
        return df
    
    def add_electricity_lag_features(self, df):
        ##### mean from entire last week
        df.set_index('datetime', inplace=True)
        # Use rolling to calculate mean price of the last week
        # The window is 7 days, min_periods can be set as per requirement
        # 'closed' determines which side of the interval is closed; it can be 'right' or 'left'
        # display(df)
        df['mean_euros_per_mwh_last_week'] = df['euros_per_mwh'].rolling(window='7D', min_periods=1, closed='right').mean()
        # Shift the results to align with the requirement of lagging
        df['mean_euros_per_mwh_last_week'] = df['mean_euros_per_mwh_last_week'].shift()
        
        ##### mean from last week this hour only
        # Extract hour from datetime
        df['hour'] = df.index.hour

        # Group by hour and apply rolling mean for each group
        hourly_groups = df.groupby('hour')
        dff = hourly_groups['euros_per_mwh'].rolling(window='7D', min_periods=1, closed='right').mean()#.shift()#.reset_index(level=0, drop=True)
        dff = dff.reset_index().set_index('datetime').groupby('hour')['euros_per_mwh'].shift()
        dff = dff.rename('mean_euros_per_mwh_same_hour_last_week')
        df = df.join(dff)
        #### yesterday's power price
        df['yesterdays_euros_per_mwh'] = df['euros_per_mwh'].shift(24)
        
        ### 24h average
        # Calculate the 24-hour rolling average
        df['euros_per_mwh_24h_average_price'] = df['euros_per_mwh'].rolling(window=24, min_periods=1).mean()

        # Resetting the index if needed
        df.reset_index(inplace=True)
        df = df.drop(['forecast_date', 'origin_date', 'hour'], axis=1)
        return df

    def init_electricity(self, df):
        ## LAG = 1 Day
        ## Move forecast datetime ahead by 1 day
        ## change name to datetime
        df['datetime'] = pd.to_datetime(df['forecast_date'])
        df['datetime'] = df['datetime'] + dt.timedelta(days=1)
        # df = self.get_data_block_id(df, 'datetime')
        df = self.add_electricity_lag_features(df)
        return df
    
    def add_historical_weather_lag_features(self, df):
        ##### LATEST WEATHER
        def add_latest_weather(df):
            # Assuming df is your original DataFrame
            # Step 1: Convert datetime to a Datetime Object
            df['datetime'] = pd.to_datetime(df['datetime'])
            df.set_index('datetime', inplace=True)

            # Step 2: Sorting the Data
            df.sort_values(by=['datetime', 'latitude', 'longitude'], inplace=True)

            # Step 3: Creating a Unique Identifier for each location
            df['location_id'] = df['latitude'].astype(str) + '_' + df['longitude'].astype(str)

            # Step 4: Filtering for 10:00 AM Entries
            df.reset_index(inplace=True)
            df_10am = df[df['datetime'].dt.hour == 10]
            df_10am.set_index('datetime', inplace=True)

            # Step 5: Shifting the Features by 1 day
            lagged_features = df_10am.groupby('location_id').shift(periods=1, freq='D')
            
            # grouped = lagged_features.groupby('county')
            # lagged_features = grouped[weather_features].mean()
            
            
            # Renaming columns to indicate lag
            lagged_features = lagged_features.add_suffix('_hw_lagged')
            lagged_features['location_id'] = lagged_features['location_id_hw_lagged']
            lagged_features.reset_index(inplace=True)
            lagged_features['date'] = lagged_features.datetime.dt.date

            df['date'] = df.datetime.dt.date
            return lagged_features
            # Step 6: Merging Lagged Features with Original DataFrame
            df = df.merge(lagged_features, on=['date', 'location_id'], how='left', suffixes=('', '_hw_lagged'))
            return df
        
        ##### mean from last day
        def add_24h_mean_var(df, weather_features):
            # Calculate the start and end times for each row
            # df['start_time'] = pd.to_datetime(df['datetime'].dt.date) - pd.Timedelta(days=2) + pd.Timedelta(hours=11)
            # df['end_time'] = pd.to_datetime(df['datetime'].dt.date) - pd.Timedelta(days=1) + pd.Timedelta(hours=10)
            # df['time_code'] = df['start_time'].astype(str) +'_' + df['end_time'].astype(str) + '_' + df['latitude'].astype(str) + '_' + df['longitude'].astype(str)
            # print(df.time_code)

            # Create a helper column for grouping
            # If the time is before 11:00 AM, subtract a day
            df['group'] = df['datetime'].apply(lambda dt: dt if dt.time() >= pd.to_datetime('11:00').time() else dt - pd.Timedelta(days=1))
            df['group'] = df['group'].dt.date  # Keep only the date part for grouping
            df['group'] = (pd.to_datetime(df['group']) + pd.Timedelta(hours=11)).astype(str) + '_' + (pd.to_datetime(df['group']) + pd.Timedelta(days=1, hours=10)).astype(str) + '_' + df['latitude'].astype(str) + '_' + df['longitude'].astype(str)

            # Now group by this new column
            grouped = df.groupby('group')
            means = grouped[weather_features].mean()
            variances = grouped[weather_features].var()

            # Merge means and variances into the original DataFrame
            my_df = df.merge(means, on='group', suffixes=('', '_hw_means'), how='left')
            my_df = my_df.merge(variances, on='group', how='left', suffixes=('', '_hw_variances'))

            return my_df
        
        ##### mean from last day all estonia
        def add_24h_mean_var_estonia(df, weather_features):
            # Calculate the start and end times for each row
            # df['start_time'] = pd.to_datetime(df['datetime'].dt.date) - pd.Timedelta(days=2) + pd.Timedelta(hours=11)
            # df['end_time'] = pd.to_datetime(df['datetime'].dt.date) - pd.Timedelta(days=1) + pd.Timedelta(hours=10)
            # df['time_code'] = df['start_time'].astype(str) +'_' + df['end_time'].astype(str)
            # print(df.time_code)

            # Create a helper column for grouping
            # If the time is before 11:00 AM, subtract a day
            df['group'] = df['datetime'].apply(lambda dt: dt if dt.time() >= pd.to_datetime('11:00').time() else dt - pd.Timedelta(days=1))
            df['group'] = df['group'].dt.date  # Keep only the date part for grouping
            df['group'] = (pd.to_datetime(df['group']) + pd.Timedelta(hours=11)).astype(str) + '_' + (pd.to_datetime(df['group']) + pd.Timedelta(days=1, hours=10)).astype(str) + '_' + df['latitude'].astype(str) + '_' + df['longitude'].astype(str)

            # Now group by this new column
            grouped = df.groupby('group')
            means = grouped[weather_features].mean()
            variances = grouped[weather_features].var()

            # Merge means and variances into the original DataFrame
            my_df = df.merge(means, on='group', suffixes=('', '_hw_means_estonia'), how='left')
            my_df = my_df.merge(variances, on='group', how='left', suffixes=('', '_hw_variances_estonia'))

            return my_df

        df['datetime'] = pd.to_datetime(df['datetime'])
        # 
        weather_features = ['temperature','dewpoint','rain','snowfall','surface_pressure','cloudcover_total','cloudcover_low','cloudcover_mid','cloudcover_high','windspeed_10m','winddirection_10m','shortwave_radiation','direct_solar_radiation','diffuse_radiation']
        # print(weather_features)

        # Apply the function
        df = add_24h_mean_var(df, weather_features)    
        df = add_24h_mean_var_estonia(df, weather_features)
           
        latest = add_latest_weather(df)
        df = df.merge(latest, on=['date', 'location_id'], how='left', suffixes=('', '_hw_lagged'))
        
        return df

    def init_historical_weather(self, df):
        ## LAG: From 11:00 AM 2 days ago to 10:00 AM 1 day ago
        ## What to do? Give most recent weather forecast? Give average over the last day?
        """
        Processes the historical weather data.
        """
        df['datetime'] = pd.to_datetime(df.datetime)
        df = df.drop('data_block_id', axis=1)
        
        
        df = self.add_historical_weather_lag_features(df)
        
        df = df.merge(self.weather_mapping, how='inner', on=('latitude', 'longitude'))
        
        return df

    def init_forecast_weather(self, df):
        ## LAG: DON't ADJUST
        ##      The forecast is from yesterday, but can forecast today, which is 22 hours ahead
        ## Drop any columns where:
        ##                        hours_ahead < 22 and hours_ahead > 45
        ## Then rename forecast_datetime to datetime and join on datetime
        """
        Processes the forecast weather data.
        """
        df['datetime'] = pd.to_datetime(df['forecast_datetime'])
        # keep only datetimes from our relevant period
        df = df[(df['hours_ahead'] < 46) & (df['hours_ahead'] > 21)]
        df['datetime'] = df['datetime'] + dt.timedelta(days=1)
        df = df.merge(self.weather_mapping, how='inner', on=('latitude', 'longitude'))
        return df
    
    def add_gas_prices_lag_features(self, df):
        df['date'] = pd.to_datetime(df['date'])
        df.set_index('date', inplace=True)

        # Sort the DataFrame by date, if it's not already sorted
        df.sort_index(inplace=True)

        # Calculate rolling averages for different time windows
        df['lowest_price_3d_avg'] = df['lowest_price_per_mwh'].rolling(window=3).mean()
        df['highest_price_3d_avg'] = df['highest_price_per_mwh'].rolling(window=3).mean()

        df['lowest_price_7d_avg'] = df['lowest_price_per_mwh'].rolling(window=7).mean()
        df['highest_price_7d_avg'] = df['highest_price_per_mwh'].rolling(window=7).mean()

        df['lowest_price_14d_avg'] = df['lowest_price_per_mwh'].rolling(window=14).mean()
        df['highest_price_14d_avg'] = df['highest_price_per_mwh'].rolling(window=14).mean()

        # Reset the index if you want the 'date' column back
        df.reset_index(inplace=True)
        return df

    def init_gas_prices(self, df):
        ## LAG: 1 DAY
        ## Predictions are made from 2 days ago and predict for yesterday
        ## add one day to forecast_date
        ## Rename forecast_date to date, join on date
        """
        Processes the gas prices data.
        Implement the logic to handle gas prices data processing here.
        """
        df['date'] = pd.to_datetime(df['forecast_date']).dt.date
        df['date'] = df['date'] + dt.timedelta(days=1)
        df = self.add_gas_prices_lag_features(df)
        return df
    
    def add_revealed_target_features(self, df):
        df['datetime'] = pd.to_datetime(df['datetime'])
        #for test dfs, set index to row id
        df.index = df.row_id.tolist()
        df['hour'] = df.datetime.dt.hour
        df['day'] = df.datetime.dt.dayofweek
        df.set_index('datetime', inplace=True)
        
        # let me add some new features here too
        # Adding lag features
        # Step 2: Sorting the Data
        df.sort_values(by=['datetime'], inplace=True)

        # Step 3: Creating a Unique Identifier for each location
        df['id'] = df['county'].astype(str) + '_' + df['is_business'].astype(str) + '_' + df['product_type'].astype(str) + '_' + df['is_consumption'].astype(str)
        lagged_features = []
        lagged_hours = []
        ### Defining lagged target features
        display('df')
        display(df)

        for lag_hours in range(1, 24):
            lagged_feature = df.groupby('id').shift(periods=lag_hours, freq='H')
            lagged_features.append(lagged_feature)
            lagged_hours.append(lag_hours)
            if lag_hours == 1:
                display('lagged_feature 1')
                display(lagged_feature)

        for lag_hours in ([i*24 for i in range(1,8)] + [24*11, 24*12]):
            lagged_feature = df.groupby('id').shift(periods=lag_hours, freq='H')
            lagged_features.append(lagged_feature)
            lagged_hours.append(lag_hours)
            if lag_hours == 1:
                display('lagged_feature 2')
                display(lagged_feature)
            
        df.reset_index(inplace=True)
        for lagged_feature, lag_hours in zip(lagged_features, lagged_hours):
            lagged_feature.reset_index(inplace=True)
            lagged_feature.dropna(inplace=True)
            df = df.merge(lagged_feature[['datetime', 'target', 'id']], on=['id', 'datetime'], how='left', suffixes=('', f'_lag_{lag_hours}h'))

        df.set_index('datetime', inplace=True)
        display('df 2')
        display(df)
        

        window_size = 7
        # Group by the specified columns and then apply the rolling mean
        grouped = df.groupby(['county', 'is_business', 'product_type', 'is_consumption'])
        df['target_rolling_avg_24h'] = grouped['target'].transform(lambda x: x.rolling(window=24, min_periods=1).mean())

        grouped = df.groupby(['county', 'is_business', 'product_type', 'is_consumption', 'hour'])
        df['target_rolling_avg_hour_7d'] = grouped['target'].transform(lambda x: x.rolling(window=7, min_periods=1).mean())

        # grouped = df.groupby(['county', 'is_business', 'product_type', 'is_consumption', 'hour', 'day'])
        # df['target_rolling_avg_hour_hour_day_4w'] = grouped['target'].transform(lambda x: x.rolling(window=4, min_periods=1).mean())

        grouped = df.groupby(['county', 'is_business', 'is_consumption'])
        df['target_rolling_allp_avg_24h'] = grouped['target'].transform(lambda x: x.rolling(window=24, min_periods=1).mean())

        grouped = df.groupby(['county', 'is_business', 'is_consumption', 'hour'])
        df['target_rolling_allp_avg_hour_7d'] = grouped['target'].transform(lambda x: x.rolling(window=7, min_periods=1).mean())

        grouped = df.groupby(['county', 'is_business', 'is_consumption', 'hour', 'day'])
        df['target_rolling_allp_avg_hour_hour_day_4w'] = grouped['target'].transform(lambda x: x.rolling(window=4, min_periods=1).mean())
        
        #All of estonia
        grouped = df.groupby(['is_business', 'product_type', 'is_consumption'])
        df['target_rolling_avg_24h_estonia'] = grouped['target'].transform(lambda x: x.rolling(window=24, min_periods=1).mean())

        grouped = df.groupby(['is_business', 'product_type', 'is_consumption', 'hour'])
        df['target_rolling_avg_hour_7d_estonia'] = grouped['target'].transform(lambda x: x.rolling(window=7, min_periods=1).mean())

        # grouped = df.groupby(['is_business', 'product_type', 'is_consumption', 'hour', 'day'])
        # df['target_rolling_avg_hour_hour_day_4w_estonia'] = grouped['target'].transform(lambda x: x.rolling(window=4, min_periods=1).mean())

        grouped = df.groupby(['is_business', 'is_consumption'])
        df['target_rolling_allp_avg_24h_estonia'] = grouped['target'].transform(lambda x: x.rolling(window=24, min_periods=1).mean())

        grouped = df.groupby(['is_business', 'is_consumption', 'hour'])
        df['target_rolling_allp_avg_hour_7d_estonia'] = grouped['target'].transform(lambda x: x.rolling(window=7, min_periods=1).mean())

        # grouped = df.groupby(['is_business', 'is_consumption', 'hour', 'day'])
        # df['target_rolling_allp_avg_hour_hour_day_4w_estonia'] = grouped['target'].transform(lambda x: x.rolling(window=4, min_periods=1).mean())
        
        df = df.drop(['hour', 'day'], axis=1)

        if df.target.isna().sum() > 0:
            print("### NAS IN TARGET FEATURES")
            # display(df)

        return df
    
    def init_revealed_targets(self, df):
        
        df['datetime'] = pd.to_datetime(df.datetime)
        df['datetime'] = df['datetime'] + dt.timedelta(days=2)
        df = self.add_revealed_target_features(df)
        return df
    
    def init_client(self, df):
        ## LAG: 2 days
        ## Add 2 days to date, join on date
        df['date'] = pd.to_datetime(df.date).dt.date
        df['date'] = df['date'] + dt.timedelta(days=2)
        # df = self.get_data_block_id(df, 'date')
        return df

    def init_weather_mapping(self):
        # https://www.kaggle.com/code/tsunotsuno/enefit-eda-baseline/notebook#Baseline
        county_point_map = {
            0: (59.4, 24.7), # "HARJUMAA"
            1 : (58.8, 22.7), # "HIIUMAA"
            2 : (59.1, 27.2), # "IDA-VIRUMAA"
            3 : (58.8, 25.7), # "JÄRVAMAA"
            4 : (58.8, 26.2), # "JÕGEVAMAA"
            5 : (59.1, 23.7), # "LÄÄNE-VIRUMAA"
            6 : (59.1, 23.7), # "LÄÄNEMAA"
            7 : (58.5, 24.7), # "PÄRNUMAA"
            8 : (58.2, 27.2), # "PÕLVAMAA"
            9 : (58.8, 24.7), # "RAPLAMAA"
            10 : (58.5, 22.7),# "SAAREMAA"
            11 : (58.5, 26.7),# "TARTUMAA"
            12 : (58.5, 25.2),# "UNKNOWNN" (center of the map)
            13 : (57.9, 26.2),# "VALGAMAA"
            14 : (58.2, 25.7),# "VILJANDIMAA"
            15 : (57.9, 27.2) # "VÕRUMAA"
        }
        # Convert the dictionary to a list of tuples
        data = [(county_code, lat, lon) for county_code, (lat, lon) in county_point_map.items()]

        # Create DataFrame
        df = pd.DataFrame(data, columns=['county', 'latitude', 'longitude'])
        
        return df
    
    def add_date_features(self, df):
        df['year'] = df['datetime'].dt.year
        df['month'] = df['datetime'].dt.month
        df['day'] = df['datetime'].dt.day
        df['hour'] = df['datetime'].dt.hour
        df['quarter'] = df['datetime'].dt.quarter
        df['day_of_week'] = df['datetime'].dt.day_of_week
        df['day_of_year'] = df['datetime'].dt.dayofyear
        df['week_of_year'] = df['datetime'].dt.isocalendar().week
        df['is_weekend'] = df['datetime'].dt.day_of_week >= 5
        df['is_month_start'] = df['datetime'].dt.is_month_start
        df['is_month_end'] = df['datetime'].dt.is_month_end
        df['is_quarter_start'] = df['datetime'].dt.is_quarter_start
        df['is_quarter_end'] = df['datetime'].dt.is_quarter_end
        df['is_year_start'] = df['datetime'].dt.is_year_start
        df['is_year_end'] = df['datetime'].dt.is_year_end
        df['season'] = df['datetime'].dt.month % 12 // 3 + 1
        df['hour_sin'] = np.sin(df['datetime'].dt.hour * (2. * np.pi / 24))
        df['hour_cos'] = np.cos(df['datetime'].dt.hour * (2. * np.pi / 24))
        # Calculate sin and cos for day of year
        days_in_year = 365.25  # accounts for leap year
        df['day_of_year_sin'] = np.sin((df['day_of_year'] - 1) * (2 * np.pi / days_in_year))
        df['day_of_year_cos'] = np.cos((df['day_of_year'] - 1) * (2 * np.pi / days_in_year))
        return df
    
    def add_ee_holidays(self, df):
        import holidays
        # Define Estonia public holidays
        ee_holidays = holidays.CountryHoliday('EE')
        
        print(df['date'].isna().sum())
        
        def find_problem(x):
            try:
                return x in ee_holidays
            except Exception as e:
                print(x)
                raise e

        # Function to check if the date is a holiday
        df['is_ee_holiday'] = df['date'].apply(lambda x: x in ee_holidays)

        return df
    
    def create_log_cols(self, df):
        log_cols = ['target_lag_1h', 'target_lag_2h', 'target_lag_3h', 'target_lag_4h',
       'target_lag_5h', 'target_lag_6h', 'target_lag_7h', 'target_lag_8h',
       'target_lag_9h', 'target_lag_10h', 'target_lag_11h', 'target_lag_12h',
       'target_lag_13h', 'target_lag_14h', 'target_lag_15h', 'target_lag_16h',
       'target_lag_17h', 'target_lag_18h', 'target_lag_19h', 'target_lag_20h',
       'target_lag_21h', 'target_lag_22h', 'target_lag_23h', 'target_lag_24h',
       'target_lag_48h', 'target_lag_72h', 'target_lag_96h', 'target_lag_120h',
       'target_lag_144h', 'target_lag_168h', 'target_lag_264h',
       'target_lag_288h', 'eic_count', 'installed_capacity', 'temperature', 'dewpoint', 'rain',
       'snowfall', 'surface_pressure', 'cloudcover_total', 'cloudcover_low',
       'cloudcover_mid', 'cloudcover_high', 'windspeed_10m',
       'winddirection_10m', 'shortwave_radiation', 'direct_solar_radiation',
       'diffuse_radiation', 'temperature_hw_means', 'dewpoint_hw_means',
       'rain_hw_means', 'snowfall_hw_means', 'surface_pressure_hw_means',
       'cloudcover_total_hw_means', 'cloudcover_low_hw_means',
       'cloudcover_mid_hw_means', 'cloudcover_high_hw_means',
       'windspeed_10m_hw_means', 'winddirection_10m_hw_means',
       'shortwave_radiation_hw_means', 'direct_solar_radiation_hw_means',
       'diffuse_radiation_hw_means', 'temperature_hw_variances',
       'dewpoint_hw_variances', 'rain_hw_variances', 'snowfall_hw_variances',
       'surface_pressure_hw_variances', 'cloudcover_total_hw_variances',
       'cloudcover_low_hw_variances', 'cloudcover_mid_hw_variances',
       'cloudcover_high_hw_variances', 'windspeed_10m_hw_variances',
       'winddirection_10m_hw_variances', 'shortwave_radiation_hw_variances',
       'direct_solar_radiation_hw_variances', 'diffuse_radiation_hw_variances',
       'temperature_hw_lagged', 'dewpoint_hw_lagged', 'rain_hw_lagged',
       'snowfall_hw_lagged', 'surface_pressure_hw_lagged',
       'cloudcover_total_hw_lagged', 'cloudcover_low_hw_lagged', 'cloudcover_mid_hw_lagged',
       'cloudcover_high_hw_lagged', 'windspeed_10m_hw_lagged',
       'winddirection_10m_hw_lagged', 'shortwave_radiation_hw_lagged',
       'direct_solar_radiation_hw_lagged', 'diffuse_radiation_hw_lagged',
       'temperature_hw_means_hw_lagged', 'dewpoint_hw_means_hw_lagged',
       'rain_hw_means_hw_lagged', 'snowfall_hw_means_hw_lagged',
       'surface_pressure_hw_means_hw_lagged',
       'cloudcover_total_hw_means_hw_lagged',
       'cloudcover_low_hw_means_hw_lagged',
       'cloudcover_mid_hw_means_hw_lagged',
       'cloudcover_high_hw_means_hw_lagged',
       'windspeed_10m_hw_means_hw_lagged',
       'winddirection_10m_hw_means_hw_lagged',
       'shortwave_radiation_hw_means_hw_lagged',
       'direct_solar_radiation_hw_means_hw_lagged',
       'diffuse_radiation_hw_means_hw_lagged',
       'temperature_hw_variances_hw_lagged', 'dewpoint_hw_variances_hw_lagged',
       'rain_hw_variances_hw_lagged', 'snowfall_hw_variances_hw_lagged',
       'surface_pressure_hw_variances_hw_lagged',
       'cloudcover_total_hw_variances_hw_lagged',
       'cloudcover_low_hw_variances_hw_lagged',
       'cloudcover_mid_hw_variances_hw_lagged',
       'cloudcover_high_hw_variances_hw_lagged',
       'windspeed_10m_hw_variances_hw_lagged',
       'winddirection_10m_hw_variances_hw_lagged',
       'shortwave_radiation_hw_variances_hw_lagged',
       'direct_solar_radiation_hw_variances_hw_lagged',
       'diffuse_radiation_hw_variances_hw_lagged', 'temperature_fw', 'dewpoint_fw', 'cloudcover_high_fw',
       'cloudcover_low_fw', 'cloudcover_mid_fw', 'cloudcover_total_fw',
       '10_metre_u_wind_component', '10_metre_v_wind_component',
       'direct_solar_radiation_fw', 'surface_solar_radiation_downwards',
       'snowfall_fw', 'total_precipitation', 'euros_per_mwh', 'mean_euros_per_mwh_last_week',
       'mean_euros_per_mwh_same_hour_last_week', 'yesterdays_euros_per_mwh',
       'euros_per_mwh_24h_average_price', 'lowest_price_per_mwh',
       'highest_price_per_mwh', 'lowest_price_3d_avg', 'highest_price_3d_avg',
       'lowest_price_7d_avg', 'highest_price_7d_avg', 'lowest_price_14d_avg',
       'highest_price_14d_avg']
        
        log_cols = [col for col in log_cols if col in df.columns]
        
        dff = np.log1p(df[log_cols] )
        dff.rename(columns={col: col + "_log" for col in log_cols}, inplace=True)
        return pd.concat([df, dff], axis=1)
        
    
    def remove_cols(self, df):
        col_list = [
                   'prediction_unit_id',
                    'date_train',
                    'hour_part',
                   'date_client',
                    'forecast_date_elec_price',
                    'origin_date_elec_price',
                    'forecast_date_gas_price',
                    'origin_date_gas_price',
                    'datetime_hist_weath',
                   'hour_part_hist_weath_latest',
                    'datetime_hist_weath_latest',
                   'origin_datetime',
                   'hour_part_fore_weath',
                    'id',
                     'data_block_id',
                     'prediction_unit_id',
                     'date',
                    'data_block_id_rt',
                     'row_id_rt',
                     'prediction_unit_id_rt',
                    'data_block_id_client',
                    'latitude',
                     'longitude',
                     'data_block_id_hw',
                    'start_time',
                     'end_time',
                     'time_code',
                     'group',
                    'data_block_id_hw_means',
                    'data_block_id_hw_variances',
                     'location_id',
                     'date_hw',
                     'datetime_hw_lagged',
                    'latitude_hw_lagged',
                     'longitude_hw_lagged',
                     'data_block_id_hw_lagged',
                     'start_time_hw_lagged',
                     'end_time_hw_lagged',
                     'time_code_hw_lagged',
                     'group_hw_lagged',
                    'data_block_id_hw_means_hw_lagged',
                    'data_block_id_hw_variances_hw_lagged',
                    'location_id_hw_lagged',
                     'latitude_fw',
                     'longitude_fw',
                     'origin_datetime',
                    'data_block_id_fw',
                     'forecast_datetime',
                    'data_block_id_elec',
                    'forecast_date',
                    'origin_date',
                     'data_block_id_gasp',
                   ]
        columns_to_drop = [col for col in col_list if col in df.columns]
        df = df.drop(columns_to_drop, axis=1)
        return df
    
    def remove_test_cols(self, df):
        col_list = [
                   'prediction_unit_id',
                    'date_train',
                    'hour_part',
                   'date_client',
                    'forecast_date_elec_price',
                    'origin_date_elec_price',
                    'forecast_date_gas_price',
                    'origin_date_gas_price',
                    'datetime_hist_weath',
                   'hour_part_hist_weath_latest',
                    'datetime_hist_weath_latest',
                   'origin_datetime',
                   'hour_part_fore_weath',
                     'data_block_id',
                     'prediction_unit_id',
                     'date',
                    'data_block_id_rt',
                     'row_id_rt',
                     'prediction_unit_id_rt',
                    'data_block_id_client',
                    'latitude',
                     'longitude',
                     'data_block_id_hw',
                    'start_time',
                     'end_time',
                     'time_code',
                     'group',
                    'data_block_id_hw_means',
                    'data_block_id_hw_variances',
                     'location_id',
                     'date_hw',
                     'datetime_hw_lagged',
                    'latitude_hw_lagged',
                     'longitude_hw_lagged',
                     'data_block_id_hw_lagged',
                     'start_time_hw_lagged',
                     'end_time_hw_lagged',
                     'time_code_hw_lagged',
                     'group_hw_lagged',
                    'data_block_id_hw_means_hw_lagged',
                    'data_block_id_hw_variances_hw_lagged',
                    'location_id_hw_lagged',
                     'latitude_fw',
                     'longitude_fw',
                     'origin_datetime',
                    'data_block_id_fw',
                     'forecast_datetime',
                    'data_block_id_elec',
                    'forecast_date',
                    'origin_date',
                     'data_block_id_gasp',
                    'id'
                   ]
        columns_to_drop = [col for col in col_list if col in df.columns]
        df = df.drop(columns_to_drop, axis=1)
        return df
    
    def join_data(self, train, revealed_targets, client, historical_weather, forecast_weather, electricity_prices, gas_prices):
        df = train
        display("TRAIN")
        display(df)
        display("revealed_targets")
        display(revealed_targets.sort_values(['datetime', 'row_id']))
        df = df.merge(revealed_targets, how='left', on=('datetime', 'county', 'is_business', 'product_type', 'is_consumption'), suffixes=('', '_rt'))
        display("MERGED with revealed targets")
        display(df.sort_values(['datetime', 'row_id']))
        df = df.merge(client, how='left', on=('date', 'county', 'is_business', 'product_type'), suffixes=('', '_client'))
        df = df.merge(historical_weather, how='left', on=('datetime', 'county'), suffixes=('', '_hw'))
        df = df.merge(forecast_weather, how='left', on=('datetime', 'county'), suffixes=('', '_fw'))
        df = df.merge(electricity_prices, how='left', on='datetime', suffixes=('', '_elec'))
        df['date'] = pd.to_datetime(df['date'])
        df = df.merge(gas_prices, how='left', on='date', suffixes=('', '_gasp'))
        df = self.add_date_features(df)
        df = self.add_ee_holidays(df)
        return df
    
    def add_test_data(self, test, revealed_targets, client, historical_weather,
            forecast_weather, electricity_prices, gas_prices):
        dfs = [test.copy(), revealed_targets, client, historical_weather,
                 forecast_weather, electricity_prices, gas_prices]
        for i, df in enumerate(dfs):
            if 'datetime' in df.columns:
                df['datetime'] = pd.to_datetime(df.datetime)
                col = 'datetime'
            if 'prediction_datetime' in df.columns:
                df['datetime'] = pd.to_datetime(df.prediction_datetime)
                col = 'datetime'
                df = df.drop('prediction_datetime', axis=1)
            if 'forecast_date' in df.columns:
                df['forecast_date'] = pd.to_datetime(df['forecast_date'])
                col = 'forecast_date'
            if 'forecast_datetime' in df.columns:
                df['forecast_datetime'] = pd.to_datetime(df['forecast_datetime'])
                col = 'forecast_datetime'
                
            self.test_orig_dfs[i] = pd.concat([ self.test_orig_dfs[i], df ])

    def update_targets(self, df, revealed_targets):
        # Select only the necessary columns from revealed_targets for merging
        cols_to_merge = ['county', 'is_business', 'product_type', 'is_consumption', 'datetime', 'target']

        # Performing a left join with an indicator
        merged_df = df.merge(revealed_targets[cols_to_merge], 
                            on=['county', 'is_business', 'product_type', 'is_consumption', 'datetime'], 
                            how='left',
                            suffixes=('', '_revealed'),
                            indicator=True)

        # Update 'target' in df only where it's NaN and the row is from the left DataFrame only
        mask = (merged_df['_merge'] == 'left_only') & (merged_df['target'].isna())
        df.loc[df.index[mask], 'target'] = merged_df.loc[mask, 'target_revealed']

        return df

    def add_to_df(self, df, revealed_targets):
        if 'currently_scored' in df.columns:
            df = df.drop('currently_scored', axis=1)
        # Filter out rows in df that have indices already in self.df
        o_index = self.df.index
        n_index = df.index
        df = df[~df.index.isin(self.df.index)]
        # Add to training data only if df is not empty after filtering
        if not df.empty:
            self.df = pd.concat([self.df, df])
        else:
            print('df was empty')
            display(o_index)
            display(n_index)
        # Add revealed targets if appropriate to target column
        self.df = self.update_targets(self.df, revealed_targets)
    
    def process_test_data_timestep(self, test, revealed_targets, client, historical_weather,
            forecast_weather, electricity_prices, gas_prices):
        test_revealed_targets = revealed_targets
        #append test data to test data cache
        marker = time.time()
        self.add_test_data(test, revealed_targets, client, historical_weather,
            forecast_weather, electricity_prices, gas_prices)
        diff = time.time()-marker
        print(f"Time to add test: {diff}")
        
        # process test data
        marker = time.time()
        test = self.init_train(self.test_orig_dfs[0])
        revealed_targets = self.init_revealed_targets(self.test_orig_dfs[1])
        client = self.init_client(self.test_orig_dfs[2])
        historical_weather = self.init_historical_weather(self.test_orig_dfs[3])
        forecast_weather = self.init_forecast_weather(self.test_orig_dfs[4])
        electricity_prices = self.init_electricity(self.test_orig_dfs[5])
        gas_prices = self.init_gas_prices(self.test_orig_dfs[6])
        df_all_cols = self.join_data(test, revealed_targets, client, historical_weather,
            forecast_weather, electricity_prices, gas_prices)
        if self.add_log_cols:
            df_all_cols = self.create_log_cols(df_all_cols)
        df = self.remove_test_cols(df_all_cols)
        diff = time.time()-marker
        print(f"Time to process test data: {diff}")
        
        #reindex to get row_id as the index
        # display(df)
        df = df.drop_duplicates('row_id')
        df.index = df.row_id.tolist()

        marker = time.time()
        self.add_to_df(df, test_revealed_targets)
        diff = time.time()-marker
        print(f"Time to add test to df: {diff}")

        #trim original dfs to prevent processing time taking too long
        self.trim_test_orig_dfs(self.test_orig_dfs)

        return df
        

In [67]:
%%time
datablock = 274

train = pd.read_csv('data/train.csv')
train = train[train.data_block_id < datablock]
display(train)

revealed_targets = pd.read_csv('data/train.csv')
revealed_targets = revealed_targets[revealed_targets.data_block_id < datablock-2].copy()
revealed_targets['data_block_id'] += 2
display(revealed_targets)


client = pd.read_csv('data/client.csv')
client= client[client.data_block_id < datablock]

historical_weather = pd.read_csv('data/historical_weather.csv')
historical_weather = historical_weather[historical_weather.data_block_id < datablock]

forecast_weather = pd.read_csv('data/forecast_weather.csv')
forecast_weather = forecast_weather[forecast_weather.data_block_id < datablock]

electricity_prices = pd.read_csv('data/electricity_prices.csv')
electricity_prices = electricity_prices[electricity_prices.data_block_id < datablock]

gas_prices = pd.read_csv('data/gas_prices.csv')
gas_prices = gas_prices[gas_prices.data_block_id < datablock]

data_processor = TrainDataProcessor(train, revealed_targets, client, historical_weather, forecast_weather, electricity_prices, gas_prices, add_log_cols=False)

data_processor.df

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id
0,0,0,1,0.713,0,2021-09-01 00:00:00,0,0,0
1,0,0,1,96.590,1,2021-09-01 00:00:00,0,1,0
2,0,0,2,0.000,0,2021-09-01 00:00:00,0,2,1
3,0,0,2,17.314,1,2021-09-01 00:00:00,0,3,1
4,0,0,3,2.904,0,2021-09-01 00:00:00,0,4,2
...,...,...,...,...,...,...,...,...,...
851227,15,1,0,114.072,1,2022-06-01 23:00:00,273,851227,64
851228,15,1,1,0.000,0,2022-06-01 23:00:00,273,851228,59
851229,15,1,1,36.401,1,2022-06-01 23:00:00,273,851229,59
851230,15,1,3,0.000,0,2022-06-01 23:00:00,273,851230,60


Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id
0,0,0,1,0.713,0,2021-09-01 00:00:00,2,0,0
1,0,0,1,96.590,1,2021-09-01 00:00:00,2,1,0
2,0,0,2,0.000,0,2021-09-01 00:00:00,2,2,1
3,0,0,2,17.314,1,2021-09-01 00:00:00,2,3,1
4,0,0,3,2.904,0,2021-09-01 00:00:00,2,4,2
...,...,...,...,...,...,...,...,...,...
844795,15,1,0,108.908,1,2022-05-30 23:00:00,273,844795,64
844796,15,1,1,0.000,0,2022-05-30 23:00:00,273,844796,59
844797,15,1,1,40.896,1,2022-05-30 23:00:00,273,844797,59
844798,15,1,3,1.000,0,2022-05-30 23:00:00,273,844798,60


Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,date
0,0,0,1,0.713,0,2021-09-01 00:00:00,0,0,0,2021-09-01
1,0,0,1,96.590,1,2021-09-01 00:00:00,0,1,0,2021-09-01
2,0,0,2,0.000,0,2021-09-01 00:00:00,0,2,1,2021-09-01
3,0,0,2,17.314,1,2021-09-01 00:00:00,0,3,1,2021-09-01
4,0,0,3,2.904,0,2021-09-01 00:00:00,0,4,2,2021-09-01
...,...,...,...,...,...,...,...,...,...,...
851227,15,1,0,114.072,1,2022-06-01 23:00:00,273,851227,64,2022-06-01
851228,15,1,1,0.000,0,2022-06-01 23:00:00,273,851228,59,2022-06-01
851229,15,1,1,36.401,1,2022-06-01 23:00:00,273,851229,59,2022-06-01
851230,15,1,3,0.000,0,2022-06-01 23:00:00,273,851230,60,2022-06-01


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2021-09-03 00:00:00,0,0,1,0.713,0,2,0,0,0,4,0_0_1_0
2021-09-03 00:00:00,11,0,2,7.620,1,2,89,44,0,4,11_0_2_1
2021-09-03 00:00:00,11,0,2,0.000,0,2,88,44,0,4,11_0_2_0
2021-09-03 00:00:00,11,0,1,21.099,1,2,87,43,0,4,11_0_1_1
2021-09-03 00:00:00,11,0,1,0.000,0,2,86,43,0,4,11_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-01 23:00:00,4,0,1,0.001,0,273,844702,15,23,2,4_0_1_0
2022-06-01 23:00:00,3,1,3,739.677,1,273,844701,14,23,2,3_1_3_1
2022-06-01 23:00:00,3,1,3,0.000,0,273,844700,14,23,2,3_1_3_0
2022-06-01 23:00:00,5,0,3,0.065,0,273,844714,20,23,2,5_0_3_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2021-09-03 01:00:00,0,0,1,0.713,0,2,0,0,0,4,0_0_1_0
2021-09-03 02:00:00,0,0,1,1.132,0,2,122,0,1,4,0_0_1_0
2021-09-03 03:00:00,0,0,1,0.490,0,2,244,0,2,4,0_0_1_0
2021-09-03 04:00:00,0,0,1,0.496,0,2,366,0,3,4,0_0_1_0
2021-09-03 05:00:00,0,0,1,0.149,0,2,488,0,4,4,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-01 20:00:00,9,1,3,127.300,1,273,844211,37,19,2,9_1_3_1
2022-06-01 21:00:00,9,1,3,135.970,1,273,844345,37,20,2,9_1_3_1
2022-06-01 22:00:00,9,1,3,107.730,1,273,844479,37,21,2,9_1_3_1
2022-06-01 23:00:00,9,1,3,92.512,1,273,844613,37,22,2,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2021-09-03 00:00:00,0,0,1,0.713,0,2,0,0,0,4,...,,,,,,,,,,
2021-09-03 00:00:00,11,0,2,7.620,1,2,89,44,0,4,...,,,,,,,,,,
2021-09-03 00:00:00,11,0,2,0.000,0,2,88,44,0,4,...,,,,,,,,,,
2021-09-03 00:00:00,11,0,1,21.099,1,2,87,43,0,4,...,,,,,,,,,,
2021-09-03 00:00:00,11,0,1,0.000,0,2,86,43,0,4,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-01 23:00:00,4,0,1,0.001,0,273,844702,15,23,2,...,0.000,0.002,0.000,0.001,0.005,0.000,0.003,0.001,0.017,0.017
2022-06-01 23:00:00,3,1,3,739.677,1,273,844701,14,23,2,...,774.939,777.866,761.635,754.692,735.991,725.900,718.745,756.855,689.395,715.552
2022-06-01 23:00:00,3,1,3,0.000,0,273,844700,14,23,2,...,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-01 23:00:00,5,0,3,0.065,0,273,844714,20,23,2,...,0.288,0.092,0.646,0.679,0.228,0.178,0.153,0.329,0.129,1.504


### NAS IN TARGET FEATURES


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,date
0,0,0,1,0.713,0,2021-09-01 00:00:00,0,0,0,2021-09-01
1,0,0,1,96.590,1,2021-09-01 00:00:00,0,1,0,2021-09-01
2,0,0,2,0.000,0,2021-09-01 00:00:00,0,2,1,2021-09-01
3,0,0,2,17.314,1,2021-09-01 00:00:00,0,3,1,2021-09-01
4,0,0,3,2.904,0,2021-09-01 00:00:00,0,4,2,2021-09-01
...,...,...,...,...,...,...,...,...,...,...
851227,15,1,0,114.072,1,2022-06-01 23:00:00,273,851227,64,2022-06-01
851228,15,1,1,0.000,0,2022-06-01 23:00:00,273,851228,59,2022-06-01
851229,15,1,1,36.401,1,2022-06-01 23:00:00,273,851229,59,2022-06-01
851230,15,1,3,0.000,0,2022-06-01 23:00:00,273,851230,60,2022-06-01


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2021-09-03 00:00:00,0,0,1,0.713,0,2,0,0,0_0_1_0,,...,,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000,0.713000
2021-09-03 00:00:00,0,0,1,96.590,1,2,1,0,0_0_1_1,,...,,96.590000,96.590000,256.921000,256.921000,256.921000,19.134100,21.204857,68.124565,119.252571
2021-09-03 00:00:00,0,0,2,0.000,0,2,2,1,0_0_2_0,,...,,0.000000,0.000000,1.205667,1.205667,1.205667,0.000000,0.000000,0.165130,0.427429
2021-09-03 00:00:00,0,0,2,17.314,1,2,3,1,0_0_2_1,,...,,17.314000,17.314000,337.086500,337.086500,337.086500,12.467000,12.467000,66.830682,111.059857
2021-09-03 00:00:00,0,0,3,2.904,0,2,4,2,0_0_3_0,,...,,2.904000,2.904000,1.808500,1.808500,1.808500,0.280455,0.428000,0.172636,0.428000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-01 23:00:00,15,1,0,108.908,1,273,844795,64,15_1_0_1,113.134,...,128.295,148.408500,116.198143,202.630292,137.274857,74.630250,471.246500,313.735571,642.526333,324.414000
2022-06-01 23:00:00,15,1,1,0.000,0,273,844796,59,15_1_1_0,0.000,...,0.000,55.732333,0.000000,31.139708,0.142857,0.000000,0.026083,0.000000,2.235750,0.569571
2022-06-01 23:00:00,15,1,1,40.896,1,273,844797,59,15_1_1_1,43.256,...,39.851,42.087583,38.239857,205.342708,127.022000,129.226750,165.953000,144.902143,674.855875,200.259000
2022-06-01 23:00:00,15,1,3,1.000,0,273,844798,60,15_1_3_0,0.346,...,0.000,94.921667,0.285714,22.381875,0.285714,0.250000,0.371375,0.354000,0.282708,0.225571


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,date,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.713,0,2021-09-01 00:00:00,0,0,0,2021-09-01,...,,,,,,,,,,
1,0,0,1,96.590,1,2021-09-01 00:00:00,0,1,0,2021-09-01,...,,,,,,,,,,
2,0,0,2,0.000,0,2021-09-01 00:00:00,0,2,1,2021-09-01,...,,,,,,,,,,
3,0,0,2,17.314,1,2021-09-01 00:00:00,0,3,1,2021-09-01,...,,,,,,,,,,
4,0,0,3,2.904,0,2021-09-01 00:00:00,0,4,2,2021-09-01,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
851227,15,1,0,114.072,1,2022-06-01 23:00:00,273,851227,64,2022-06-01,...,128.295,148.408500,116.198143,202.630292,137.274857,74.63025,471.246500,313.735571,642.526333,324.414000
851228,15,1,1,0.000,0,2022-06-01 23:00:00,273,851228,59,2022-06-01,...,0.000,55.732333,0.000000,31.139708,0.142857,0.00000,0.026083,0.000000,2.235750,0.569571
851229,15,1,1,36.401,1,2022-06-01 23:00:00,273,851229,59,2022-06-01,...,39.851,42.087583,38.239857,205.342708,127.022000,129.22675,165.953000,144.902143,674.855875,200.259000
851230,15,1,3,0.000,0,2022-06-01 23:00:00,273,851230,60,2022-06-01,...,0.000,94.921667,0.285714,22.381875,0.285714,0.25000,0.371375,0.354000,0.282708,0.225571


0
CPU times: total: 1min 17s
Wall time: 3min 42s


Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,row_id,target_rt,target_lag_1h,target_lag_2h,...,is_quarter_start,is_quarter_end,is_year_start,is_year_end,season,hour_sin,hour_cos,day_of_year_sin,day_of_year_cos,is_ee_holiday
0,0,0,1,0.713,0,2021-09-01 00:00:00,0,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
1,0,0,1,96.590,1,2021-09-01 00:00:00,1,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
2,0,0,2,0.000,0,2021-09-01 00:00:00,2,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
3,0,0,2,17.314,1,2021-09-01 00:00:00,3,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
4,0,0,3,2.904,0,2021-09-01 00:00:00,4,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
851353,15,1,0,114.072,1,2022-06-01 23:00:00,851227,108.908,113.134,127.814,...,False,False,False,False,3,-0.258819,0.965926,0.517586,-0.855631,False
851354,15,1,1,0.000,0,2022-06-01 23:00:00,851228,0.000,0.000,0.001,...,False,False,False,False,3,-0.258819,0.965926,0.517586,-0.855631,False
851355,15,1,1,36.401,1,2022-06-01 23:00:00,851229,40.896,43.256,42.361,...,False,False,False,False,3,-0.258819,0.965926,0.517586,-0.855631,False
851356,15,1,3,0.000,0,2022-06-01 23:00:00,851230,1.000,0.346,2.162,...,False,False,False,False,3,-0.258819,0.965926,0.517586,-0.855631,False


In [68]:
import pickle

with open('data_processor_test.pkl', 'wb') as f:
    pickle.dump(data_processor, f)

In [69]:
with open('data_processor_test.pkl', 'rb') as f:
    data_processor = pickle.load(f)
data_processor.df

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,row_id,target_rt,target_lag_1h,target_lag_2h,...,is_quarter_start,is_quarter_end,is_year_start,is_year_end,season,hour_sin,hour_cos,day_of_year_sin,day_of_year_cos,is_ee_holiday
0,0,0,1,0.713,0,2021-09-01 00:00:00,0,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
1,0,0,1,96.590,1,2021-09-01 00:00:00,1,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
2,0,0,2,0.000,0,2021-09-01 00:00:00,2,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
3,0,0,2,17.314,1,2021-09-01 00:00:00,3,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
4,0,0,3,2.904,0,2021-09-01 00:00:00,4,,,,...,False,False,False,False,4,0.000000,1.000000,-0.861693,-0.507430,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
851353,15,1,0,114.072,1,2022-06-01 23:00:00,851227,108.908,113.134,127.814,...,False,False,False,False,3,-0.258819,0.965926,0.517586,-0.855631,False
851354,15,1,1,0.000,0,2022-06-01 23:00:00,851228,0.000,0.000,0.001,...,False,False,False,False,3,-0.258819,0.965926,0.517586,-0.855631,False
851355,15,1,1,36.401,1,2022-06-01 23:00:00,851229,40.896,43.256,42.361,...,False,False,False,False,3,-0.258819,0.965926,0.517586,-0.855631,False
851356,15,1,3,0.000,0,2022-06-01 23:00:00,851230,1.000,0.346,2.162,...,False,False,False,False,3,-0.258819,0.965926,0.517586,-0.855631,False


### Creating Test data that is a year long

In [None]:
def create_test_df(df, name):
    if 'datetime' in df.columns:
        df['datetime'] = pd.to_datetime(df.datetime)
        col = 'datetime'
    if 'prediction_datetime' in df.columns:
        df['prediction_datetime'] = pd.to_datetime(df.prediction_datetime)
        col = 'prediction_datetime'
    if 'forecast_date' in df.columns:
        df['forecast_date'] = pd.to_datetime(df['forecast_date'])
        col = 'forecast_date'
    if 'forecast_datetime' in df.columns:
        df['forecast_datetime'] = pd.to_datetime(df['forecast_datetime'])
        col = 'forecast_datetime'
    if 'date' in df.columns:
        df['date'] = pd.to_datetime(df.date).dt.date
        col = 'date'

    if name =='test.csv':
        df = df.drop('target', axis=1)
        df['currently_scored'] = False
        sample_submission = df[['row_id', 'data_block_id']]
        sample_submission['target'] = 0
        sample_submission = sample_submission.drop_duplicates()
        sample_submission.to_csv(f"data/example_test_files/sample_submission.csv", index=False)

    test_date = df[col].iloc[-1]  # Assuming test is a DataFrame
    start_date = test_date - pd.Timedelta(days=367)
    historical_subset = df[df[col] >= start_date]
    historical_subset = historical_subset.drop_duplicates()
    historical_subset.to_csv(f"data/example_test_files/{name}", index=False)
    

In [None]:
%%time

import pandas as pd
import datetime as dt

train = pd.read_csv('data/train.csv')
train = train[train.data_block_id > 274].copy()

revealed_targets = pd.read_csv('data/train.csv')
revealed_targets['datetime'] = pd.to_datetime(revealed_targets.datetime)
revealed_targets = revealed_targets[revealed_targets.data_block_id > 272].copy()
revealed_targets['data_block_id'] += 2


client = pd.read_csv('data/client.csv')
client = client[client.data_block_id > 274].copy()

historical_weather = pd.read_csv('data/historical_weather.csv')
historical_weather = historical_weather[historical_weather.data_block_id > 274].copy()

forecast_weather = pd.read_csv('data/forecast_weather.csv')
forecast_weather = forecast_weather[forecast_weather.data_block_id > 274].copy()

electricity_prices = pd.read_csv('data/electricity_prices.csv')
electricity_prices = electricity_prices[electricity_prices.data_block_id > 274].copy()

gas_prices = pd.read_csv('data/gas_prices.csv')
gas_prices = gas_prices[gas_prices.data_block_id > 274].copy()

dfs = [train, revealed_targets, client, historical_weather, forecast_weather, electricity_prices, gas_prices]
display(revealed_targets)
names = ['test.csv', 'revealed_targets.csv', 'client.csv', 'historical_weather.csv', 'forecast_weather.csv', 'electricity_prices.csv', 'gas_prices.csv']

for df, name in zip(dfs, names):
    create_test_df(df, name)

# Testing

For my experimental CV, I want to take the approach of doing a stratified CV by time - splitting the year into 4 different parts, basically testing the model on each season, 3 months at a time. There was something in the kaggle forums that recommended something like this:

Key: 
= -> training data
+ -> CV data

4 splits in time:
1. =============+++
2. ================+++
3. ===================+++
4. ======================+++



The data starts on 2021-09-01 and ends on 2023-05-31

BUT we don't have enough data to do that properly. So, my CV will instead be:


(Thanks chatgpt)

Splitting the period from 2022-09-01 to 2023-05-31 into five equal parts, here are the date ranges for each segment:

#### First Segment:

From 2022-09-01 to 2022-10-24

#### Second Segment:

From 2022-10-25 to 2022-12-17

#### Third Segment:

From 2022-12-18 to 2023-02-09

#### Fourth Segment:

From 2023-02-10 to 2023-04-04

#### Fifth Segment:

From 2023-04-05 to 2023-05-29


In [70]:
def fill_drop_na(df):
    df = df[~df.target.isna()]
    df = df[~df.target_rolling_avg_24h.isna()]
    means = df.mean()
    # For each column, add an indicator column for NA values
    # for col in df.columns:
    #     if df[col].isna().any():
    #         df[f'{col}_is_na'] = df[col].isna()
    df = df.fillna(means)
    return df, means

In [71]:
%%time
processed_df_no_na, means = fill_drop_na(data_processor.df)
processed_df_no_na.isna().sum()

CPU times: total: 1.95 s
Wall time: 5.16 s


county             0
is_business        0
product_type       0
target             0
is_consumption     0
                  ..
hour_sin           0
hour_cos           0
day_of_year_sin    0
day_of_year_cos    0
is_ee_holiday      0
Length: 238, dtype: int64

In [72]:
# processed_df_no_na['target_installed_capacity'] = processed_df_no_na['target'] / processed_df_no_na['installed_capacity'] * 1000
# processed_df_no_na

In [73]:
from datetime import datetime

# cv_range = [('2022-09-10', '2023-05-31')]

# Note: This training range means that the models don't have any observations for months 6,7,8 - which isn't ideal. But we'll see. I might try another one with my previous range and compare blending results.
cv_range = [('2022-06-01', '2023-05-31')]

# Function to convert a date string into a datetime object
def to_datetime(date_str):
    return datetime.strptime(date_str, '%Y-%m-%d')

# Converting the date strings in cv_ranges to datetime objects
datetime_cv_ranges = [(to_datetime(start), to_datetime(end)) for start, end in cv_range]
datetime_cv_ranges

date_filter = data_processor.df_all_cols.date[processed_df_no_na.index]
date_filter

cv1_train = processed_df_no_na[date_filter <= datetime_cv_ranges[0][0]]
cv1_test = processed_df_no_na[(date_filter <= datetime_cv_ranges[0][1]) & (date_filter > datetime_cv_ranges[0][0])]

In [74]:
import datetime as dt
print(to_datetime('2023-04-05') + dt.timedelta(days=14))
print(to_datetime('2023-04-05') + dt.timedelta(days=48))

2023-04-19 00:00:00
2023-05-23 00:00:00


In [75]:
cv1_train[['year' ,'month', 'day']]

Unnamed: 0,year,month,day
5856,2021,9,3
5857,2021,9,3
5858,2021,9,3
5859,2021,9,3
5860,2021,9,3
...,...,...,...
851353,2022,6,1
851354,2022,6,1
851355,2022,6,1
851356,2022,6,1


In [76]:
cv1_test[['year' ,'month', 'day']]

Unnamed: 0,year,month,day


In [77]:
processed_df_no_na[['year', 'month', 'day']]

Unnamed: 0,year,month,day
5856,2021,9,3
5857,2021,9,3
5858,2021,9,3
5859,2021,9,3
5860,2021,9,3
...,...,...,...
851353,2022,6,1
851354,2022,6,1
851355,2022,6,1
851356,2022,6,1


In [57]:
grouped = processed_df_no_na.groupby(['year', 'month', 'day'])

for i, ((year, month, day), group) in enumerate(grouped):
    # Extracting the corresponding data from df_val_target2
    target_data = processed_df_no_na.loc[group.index]
    print(year, month, day)
    
    
    if i == 10:
        break

2021 9 3
2021 9 4
2021 9 5
2021 9 6
2021 9 7
2021 9 8
2021 9 9
2021 9 10
2021 9 11
2021 9 12
2021 9 13


### Retrainable Model

In [59]:
%%time

train_pred_list = []
train_mae_list = []
train_targets_list = []

pred_list = []
mae_list = []
val_targets_list = []

daily_mae_list = []
date_check_list = []
daily_preds = []
daily_true = []

df = processed_df_no_na
i=0
start = datetime_cv_ranges[i][0] + dt.timedelta(days=0*14)
stop = datetime_cv_ranges[i][0] + dt.timedelta(days=(0+1)*14)
train = processed_df_no_na[date_filter <= start]
val = processed_df_no_na[(date_filter <= stop) & (date_filter > start)]

print(f"Fold {i}, period {f}")
print(f"Train rows: {len(train)}")
print(f"Val rows: {len(val)}")

target_cols = ['target']
drop_cols = ['target', 'quarter', 'season', 'is_year_end', 'is_year_start', 'is_month_end', 'is_quarter_end', 'is_quarter_start', 'is_month_start', 'snowfall_hw_lagged', 'snowfall_hw_variances',
            'snowfall_fw', 'snowfall_hw_means', 'datetime', 'row_id']

df_train_target = train[target_cols]
df_train_data = train.drop(drop_cols, axis=1)

df_val_target2 = val[target_cols]
df_val_data2 = val.drop(drop_cols, axis=1)

cat_features = ["county", "is_business", "product_type", "is_consumption", 'month', 'hour', 'quarter',
        'day_of_week', 'is_weekend', 'is_month_start', 'is_month_end', 'is_quarter_start' ,'is_quarter_end', 
        'is_year_start', 'is_year_end', 'season'] + list(df_train_data.columns[df_train_data.columns.str.contains('is_na')])
cat_features = [c for c in cat_features if c in df_train_data.columns]

# We leave max_depth as -1
# Tune num_leaves, default is 31, let's double it       

params = {'lambda_l1': 0.7466999841658806, 'lambda_l2': 3.2140838539606458, 'learning_rate': 0.13753679743025782, 'max_bin': 250, 'min_data_in_leaf': 150, 'n_estimators': 5593,  
            'metric': 'mae', 'n_jobs': 22, 'boosting': 'dart', 'objective': 'tweedie', 'device':'gpu'}

clf = LGBMRegressor(**params, random_state=42, verbose=0, importance_type='gain')

clf.fit(df_train_data, df_train_target.target, categorical_feature=cat_features)

y_pred = clf.predict(df_train_data)
train_pred_list.append(y_pred)

from sklearn.metrics import mean_absolute_error

# Assuming you have two pandas Series: y_true and y_pred
mae = mean_absolute_error(df_train_target.target, y_pred)
train_mae_list.append(mae)
train_targets_list.append(df_train_target.target)
print(f" Train Mean Absolute Error_consumption:", mae)

# y_pred_val = clf.predict(df_val_data2)
# pred_list.append(y_pred_val)

# mae = mean_absolute_error(df_val_target2.target, y_pred_val)
# val_targets_list.append(df_val_target2.target)
# mae_list.append(mae)
# print("Val Mean Absolute Error:", mae)

# importance = pd.DataFrame({'importance':clf2.feature_importances_, 'name':clf2.feature_name_})
# importance = importance.sort_values('importance', ascending=False)
# display(importance.head(30))
# display(importance.tail(30))
print()
print()  

Fold 0, period <_io.BufferedReader name='data_processor_test.pkl'>
Train rows: 837658
Val rows: 0
 Train Mean Absolute Error_consumption: 16.291311952567586


CPU times: total: 1h 55min 19s
Wall time: 6min 47s


In [60]:
import pickle

with open('experiments/clf.pkl', 'wb') as f:
    pickle.dump(clf, f)

In [61]:
import pickle

with open('experiments/clf.pkl', 'rb') as f:
    clf = pickle.load(f)

In [62]:
clf.set_params(**{'verbose':-1})

In [63]:
def train_model(df):
    train = df
    print(f"Train rows: {len(train)}")

    target_cols = ['target']
    drop_cols = ['target', 'quarter', 'season', 'is_year_end', 'is_year_start', 'is_month_end', 'is_quarter_end', 'is_quarter_start', 'is_month_start', 'snowfall_hw_lagged', 'snowfall_hw_variances',
                'snowfall_fw', 'snowfall_hw_means', 'datetime', 'row_id', 'currently_scored']

    df_train_target = train[target_cols]
    drop_cols = [c for c in drop_cols if c in train.columns] 
    df_train_data = train.drop(drop_cols, axis=1)

    cat_features = ["county", "is_business", "product_type", "is_consumption", 'month', 'hour', 'quarter',
            'day_of_week', 'is_weekend', 'is_month_start', 'is_month_end', 'is_quarter_start' ,'is_quarter_end', 
            'is_year_start', 'is_year_end', 'season'] + list(df_train_data.columns[df_train_data.columns.str.contains('is_na')])
    cat_features = [c for c in cat_features if c in df_train_data.columns]

    # We leave max_depth as -1
    # Tune num_leaves, default is 31, let's double it       

    params = {'lambda_l1': 0.7466999841658806, 'lambda_l2': 3.2140838539606458, 'learning_rate': 0.13753679743025782, 'max_bin': 250, 'min_data_in_leaf': 150, 'n_estimators': 5593,  
                'metric': 'mae', 'n_jobs': 22, 'boosting': 'dart', 'objective': 'tweedie', 'device':'gpu'}

    clf = LGBMRegressor(**params, random_state=42, verbose=-1, importance_type='gain')

    clf.fit(df_train_data, df_train_target.target, categorical_feature=cat_features)

    return clf

## Testing Retraining model

#### Up to testing my data processor to see if it works

In [64]:
import warnings
warnings.filterwarnings('ignore', 'SettingWithCopyWarning')

In [78]:
%%time


with open('data_processor_test.pkl', 'rb') as f:
    data_processor = pickle.load(f)

processed_df_no_na, means = fill_drop_na(data_processor.df)


from data import public_timeseries_testing_util as enefit

env = enefit.make_env()

for i, (test, revealed_targets, client, historical_weather,
            forecast_weather, electricity_prices, gas_prices, sample_submission) in enumerate(env.iter_test()):
    test_data = data_processor.process_test_data_timestep(test, revealed_targets, client, historical_weather, forecast_weather, electricity_prices, gas_prices)
    
    print(f"Day: {i}")
    # display(test_data)
    # display(data_processor.df.columns)

    # filter based on predictions
    test_data['currently_scored'] = test_data['currently_scored'].fillna(True)
    score_mask = ~test_data.currently_scored
    test_data = test_data[score_mask]
    sample_submission = test_data[['row_id']]

    # we need to drop a few columns
    drop_cols = ['target', 'quarter', 'season', 'is_year_end', 'is_year_start', 'is_month_end', 'is_quarter_end', 'is_quarter_start', 'is_month_start', 'snowfall_hw_lagged', 'snowfall_hw_variances','snowfall_fw', 'snowfall_hw_means', 
                 'datetime', 'row_id', 'currently_scored']

    test_data = test_data.drop(drop_cols, axis=1)

    cat_features = ["county", "is_business", "product_type", "is_consumption", 'month', 'hour', 'quarter',
'day_of_week', 'is_weekend', 'is_month_start', 'is_month_end', 'is_quarter_start' ,'is_quarter_end', 
'is_year_start', 'is_year_end', 'season'] + list(test_data.columns[test_data.columns.str.contains('is_na')])
    cat_features = [c for c in cat_features if c in test_data.columns]

    #retrain every 30 days for testing
    if (i!= 0) and (i%30 == 0):
        print(f"Train rows: {len(data_processor.df)}")
        clf = train_model(data_processor.df)

    preds = clf.predict(test_data)

    # display(preds)
    # display(sample_submission)
    # print(f"Length of preds: {len(preds)}")

    sample_submission['target'] = preds
    env.predict(sample_submission)

    # set test data currently scored to true
    data_processor.test_orig_dfs[0]['currently_scored'] = True
    print('Preds predicted')
    

Time to add test: 0.015670299530029297


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-18 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,2,0_0_1_0
2022-05-18 23:00:00,11,1,0,0.000,0,259.0,799740,67,23,2,11_1_0_0
2022-05-18 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,2,11_0_3_1
2022-05-18 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,2,11_0_3_0
2022-05-18 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,2,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-03 23:00:00,4,0,1,0.018,0,,851134,15,23,4,4_0_1_0
2022-06-03 23:00:00,3,1,3,747.461,1,,851133,14,23,4,3_1_3_1
2022-06-03 23:00:00,3,1,3,0.000,0,,851132,14,23,4,3_1_3_0
2022-06-03 23:00:00,5,0,3,0.161,0,,851146,20,23,4,5_0_3_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-19 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,2,0_0_1_0
2022-05-19 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,3,0_0_1_0
2022-05-19 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,3,0_0_1_0
2022-05-19 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,3,0_0_1_0
2022-05-19 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,3,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-03 20:00:00,9,1,3,116.610,1,,850645,37,19,4,9_1_3_1
2022-06-03 21:00:00,9,1,3,123.902,1,,850779,37,20,4,9_1_3_1
2022-06-03 22:00:00,9,1,3,94.437,1,,850913,37,21,4,9_1_3_1
2022-06-03 23:00:00,9,1,3,94.907,1,,851047,37,22,4,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-18 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,2,...,,,,,,,,,,
2022-05-18 23:00:00,11,1,0,0.000,0,259.0,799740,67,23,2,...,,,,,,,,,,
2022-05-18 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,2,...,,,,,,,,,,
2022-05-18 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,2,...,,,,,,,,,,
2022-05-18 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,2,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-03 23:00:00,4,0,1,0.018,0,,851134,15,23,4,...,,,0.001,0.002,0.000,0.001,0.005,0.000,0.002,0.003
2022-06-03 23:00:00,3,1,3,747.461,1,,851133,14,23,4,...,,,739.677,777.866,761.635,754.692,735.991,725.900,731.024,741.122
2022-06-03 23:00:00,3,1,3,0.000,0,,851132,14,23,4,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-03 23:00:00,5,0,3,0.161,0,,851146,20,23,4,...,,,0.065,0.092,0.646,0.679,0.228,0.178,1.330,0.114


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
806074,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,,2022-05-18
806075,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,,2022-05-18
806076,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,,2022-05-18
806077,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,,2022-05-18
806078,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,,2022-05-18
...,...,...,...,...,...,...,...,...,...,...,...
857659,15,1,0,,1,2022-06-03 23:00:00,,857659,64,False,2022-06-03
857660,15,1,1,,0,2022-06-03 23:00:00,,857660,59,False,2022-06-03
857661,15,1,1,,1,2022-06-03 23:00:00,,857661,59,False,2022-06-03
857662,15,1,3,,0,2022-06-03 23:00:00,,857662,60,False,2022-06-03


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-18 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-05-18 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,443.696000,443.696000,443.69600,59.229182,62.714143,112.964542,209.464000
2022-05-18 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,1.808000,1.808000,1.80800,0.000000,0.000000,0.491565,0.608857
2022-05-18 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,504.632000,504.632000,504.63200,7.910000,7.910000,103.883696,169.216286
2022-05-18 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,2.712000,2.712000,2.71200,0.757909,0.972143,0.513909,0.613429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-03 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,115.094429,181.676625,109.704143,148.60500,405.987375,447.267571,640.923250,250.733286
2022-06-03 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,58.728083,0.285714,0.00000,0.184917,0.000000,2.363000,1.324714
2022-06-03 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.367857,179.704750,132.102857,128.91850,202.130250,145.093857,709.604042,288.623714
2022-06-03 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.285714,51.345667,0.285714,0.00000,3.147583,0.295429,1.099167,0.147571


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
1,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,,...,,321.824000,321.824000,443.696000,443.696000,443.69600,59.229182,62.714143,112.964542,209.464000
2,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,,...,,0.000000,0.000000,1.808000,1.808000,1.80800,0.000000,0.000000,0.491565,0.608857
3,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,,...,,14.841000,14.841000,504.632000,504.632000,504.63200,7.910000,7.910000,103.883696,169.216286
4,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,,...,,3.825000,3.825000,2.712000,2.712000,2.71200,0.757909,0.972143,0.513909,0.613429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48369,15,1,0,,1,2022-06-03 23:00:00,,857659,64,False,...,124.325,145.376000,115.094429,181.676625,109.704143,148.60500,405.987375,447.267571,640.923250,250.733286
48370,15,1,1,,0,2022-06-03 23:00:00,,857660,59,False,...,0.000,74.130667,0.000000,58.728083,0.285714,0.00000,0.184917,0.000000,2.363000,1.324714
48371,15,1,1,,1,2022-06-03 23:00:00,,857661,59,False,...,41.927,42.307917,38.367857,179.704750,132.102857,128.91850,202.130250,145.093857,709.604042,288.623714
48372,15,1,3,,0,2022-06-03 23:00:00,,857662,60,False,...,0.000,140.917625,0.285714,51.345667,0.285714,0.00000,3.147583,0.295429,1.099167,0.147571


0
Time to process test data: 14.511048316955566
Time to add test to df: 2.151273727416992
Day: 0
Preds predicted
Time to add test: 0.03260517120361328


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-20 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,4,0_0_1_0
2022-05-20 23:00:00,7,1,3,0.001,0,259.0,799708,30,23,4,7_1_3_0
2022-05-20 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,4,0_0_1_1
2022-05-20 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,4,0_0_2_0
2022-05-20 23:00:00,0,0,2,14.841,1,259.0,799645,1,23,4,0_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-05 23:00:00,15,1,0,0.000,0,,851226,64,23,6,15_1_0_0
2022-06-05 23:00:00,15,1,0,114.072,1,,851227,64,23,6,15_1_0_1
2022-06-05 23:00:00,15,1,1,0.000,0,,851228,59,23,6,15_1_1_0
2022-06-05 23:00:00,14,0,1,0.000,0,,851214,53,23,6,14_0_1_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-21 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,4,0_0_1_0
2022-05-21 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,5,0_0_1_0
2022-05-21 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,5,0_0_1_0
2022-05-21 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,5,0_0_1_0
2022-05-21 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,5,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-05 20:00:00,9,1,3,116.610,1,,850645,37,19,6,9_1_3_1
2022-06-05 21:00:00,9,1,3,123.902,1,,850779,37,20,6,9_1_3_1
2022-06-05 22:00:00,9,1,3,94.437,1,,850913,37,21,6,9_1_3_1
2022-06-05 23:00:00,9,1,3,94.907,1,,851047,37,22,6,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-20 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,4,...,,,,,,,,,,
2022-05-20 23:00:00,7,1,3,0.001,0,259.0,799708,30,23,4,...,,,,,,,,,,
2022-05-20 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,4,...,,,,,,,,,,
2022-05-20 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,4,...,,,,,,,,,,
2022-05-20 23:00:00,0,0,2,14.841,1,259.0,799645,1,23,4,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-05 23:00:00,15,1,0,0.000,0,,851226,64,23,6,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-05 23:00:00,15,1,0,114.072,1,,851227,64,23,6,...,,,108.908,124.036,114.279,116.714,114.263,113.389,116.796,124.325
2022-06-05 23:00:00,15,1,1,0.000,0,,851228,59,23,6,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-05 23:00:00,14,0,1,0.000,0,,851214,53,23,6,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
806074,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,2022-05-18
806075,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,2022-05-18
806076,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,2022-05-18
806077,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,2022-05-18
806078,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,2022-05-18
...,...,...,...,...,...,...,...,...,...,...,...
860875,15,1,0,,1,2022-06-04 23:00:00,,860875,64,False,2022-06-04
860876,15,1,1,,0,2022-06-04 23:00:00,,860876,59,False,2022-06-04
860877,15,1,1,,1,2022-06-04 23:00:00,,860877,59,False,2022-06-04
860878,15,1,3,,0,2022-06-04 23:00:00,,860878,60,False,2022-06-04


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-20 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-05-20 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,321.824000,321.824000,321.82400,321.824000,321.824000,321.824000,321.824000
2022-05-20 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,0.799500,0.799500,0.79950,0.000000,0.000000,0.799500,0.799500
2022-05-20 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,168.332500,168.332500,168.33250,14.841000,14.841000,168.332500,168.332500
2022-05-20 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,1.808000,1.808000,1.80800,3.825000,3.825000,1.808000,1.808000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-05 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,115.070571,189.915917,166.435571,157.60550,423.324333,464.416571,470.216958,758.193571
2022-06-05 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,51.345667,0.105143,0.00000,0.000375,0.000000,2.973042,2.160571
2022-06-05 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.876714,192.196958,155.981857,137.91900,166.387125,51.152714,669.041708,288.623714
2022-06-05 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.390857,62.757833,0.248000,0.00000,3.039000,0.252714,2.039792,0.147571


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,...,,,,,,,,,,
1,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,...,,,,,,,,,,
3,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,...,,,,,,,,,,
4,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
51585,15,1,0,,1,2022-06-04 23:00:00,,860875,64,False,...,125.606,148.077333,114.973000,201.320542,109.938000,77.35325,421.015958,447.146143,685.407000,306.437571
51586,15,1,1,,0,2022-06-04 23:00:00,,860876,59,False,...,0.000,68.878917,0.000000,45.448000,0.285714,0.00000,0.076583,0.000000,2.251292,1.243857
51587,15,1,1,,1,2022-06-04 23:00:00,,860877,59,False,...,46.820,37.640458,38.723143,203.675417,132.458143,130.59250,200.384208,78.658000,682.330208,297.985857
51588,15,1,3,,0,2022-06-04 23:00:00,,860878,60,False,...,0.000,130.846542,0.390857,34.030583,0.248000,0.18400,2.733625,0.155857,0.100708,0.132714


0
Time to process test data: 15.412883043289185
Time to add test to df: 2.0014443397521973
Day: 1
Preds predicted
Time to add test: 0.033805131912231445


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-22 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,6,0_0_1_0
2022-05-22 23:00:00,7,1,3,803.905,1,259.0,799709,30,23,6,7_1_3_1
2022-05-22 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,6,11_0_3_1
2022-05-22 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,6,11_0_3_0
2022-05-22 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,6,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-07 23:00:00,10,1,1,0.000,0,,851186,40,23,1,10_1_1_0
2022-06-07 23:00:00,10,0,3,111.845,1,,851185,39,23,1,10_0_3_1
2022-06-07 23:00:00,11,1,0,0.000,0,,851196,67,23,1,11_1_0_0
2022-06-07 23:00:00,9,0,1,0.000,0,,851174,34,23,1,9_0_1_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-23 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,6,0_0_1_0
2022-05-23 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,0,0_0_1_0
2022-05-23 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,0,0_0_1_0
2022-05-23 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,0,0_0_1_0
2022-05-23 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,0,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-07 20:00:00,9,1,3,116.610,1,,850645,37,19,1,9_1_3_1
2022-06-07 21:00:00,9,1,3,123.902,1,,850779,37,20,1,9_1_3_1
2022-06-07 22:00:00,9,1,3,94.437,1,,850913,37,21,1,9_1_3_1
2022-06-07 23:00:00,9,1,3,94.907,1,,851047,37,22,1,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-22 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,6,...,,,,,,,,,,
2022-05-22 23:00:00,7,1,3,803.905,1,259.0,799709,30,23,6,...,,,,,,,,,,
2022-05-22 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,6,...,,,,,,,,,,
2022-05-22 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,6,...,,,,,,,,,,
2022-05-22 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,6,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-07 23:00:00,10,1,1,0.000,0,,851186,40,23,1,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.00,0.000
2022-06-07 23:00:00,10,0,3,111.845,1,,851185,39,23,1,...,,,105.022,115.747,140.525,135.324,112.928,105.144,126.48,141.703
2022-06-07 23:00:00,11,1,0,0.000,0,,851196,67,23,1,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.00,0.000
2022-06-07 23:00:00,9,0,1,0.000,0,,851174,34,23,1,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.00,0.000


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
806074,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,2022-05-18
806075,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,2022-05-18
806076,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,2022-05-18
806077,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,2022-05-18
806078,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,2022-05-18
...,...,...,...,...,...,...,...,...,...,...,...
864091,15,1,0,,1,2022-06-05 23:00:00,,864091,64,False,2022-06-05
864092,15,1,1,,0,2022-06-05 23:00:00,,864092,59,False,2022-06-05
864093,15,1,1,,1,2022-06-05 23:00:00,,864093,59,False,2022-06-05
864094,15,1,3,,0,2022-06-05 23:00:00,,864094,60,False,2022-06-05


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-22 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-05-22 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,443.696000,443.696000,443.69600,59.229182,62.714143,112.964542,209.464000
2022-05-22 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,1.808000,1.808000,1.80800,0.000000,0.000000,0.471083,0.608857
2022-05-22 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,504.632000,504.632000,504.63200,7.910000,7.910000,103.883696,169.216286
2022-05-22 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,2.712000,2.712000,2.71200,0.757909,0.972143,0.491565,0.608857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-07 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.976714,191.180333,167.077714,157.33675,423.324333,384.722286,796.200917,758.193571
2022-06-07 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,51.345667,0.191143,0.00000,0.035000,0.000000,0.712792,2.160571
2022-06-07 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,39.520714,189.915917,156.719571,138.08975,204.496583,285.453000,811.790042,357.335286
2022-06-07 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.476857,58.728083,0.334000,0.00000,2.716500,1.201286,0.029625,0.000000


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,...,,,,,,,,,,
1,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,...,,,,,,,,,,
3,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,...,,,,,,,,,,
4,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57969,15,1,1,,1,2022-06-05 23:00:00,,864093,59,False,...,39.851,40.719792,39.224143,190.616042,175.240857,194.70550,143.075292,115.376143,327.341792,295.821000
57970,15,1,3,,0,2022-06-05 23:00:00,,864094,60,False,...,0.000,147.557333,0.285714,3.049250,0.285714,0.25000,3.361833,0.757143,2.282583,0.187571
57971,15,1,3,,0,2022-06-05 23:00:00,,864094,60,False,...,0.000,139.371042,0.371714,2.207083,0.371714,0.40050,0.347125,0.221000,2.327000,0.086143
57972,15,1,3,,1,2022-06-05 23:00:00,,864095,60,False,...,348.584,491.349625,313.274143,192.220042,170.877000,206.64800,1273.021583,1220.489286,768.973208,394.864571


0
Time to process test data: 16.412899494171143
Time to add test to df: 1.9494192600250244
Day: 2
Preds predicted
Time to add test: 0.016823291778564453


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-24 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,1,0_0_1_0
2022-05-24 23:00:00,11,1,1,0.000,0,259.0,799742,46,23,1,11_1_1_0
2022-05-24 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,1,0_0_1_1
2022-05-24 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,1,0_0_2_0
2022-05-24 23:00:00,0,0,2,14.841,1,259.0,799645,1,23,1,0_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-09 23:00:00,15,1,1,0.000,0,,851228,59,23,3,15_1_1_0
2022-06-09 23:00:00,13,1,3,0.000,0,,851212,52,23,3,13_1_3_0
2022-06-09 23:00:00,7,1,1,822.523,1,,851165,29,23,3,7_1_1_1
2022-06-09 23:00:00,14,0,3,0.022,0,,851216,54,23,3,14_0_3_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-25 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,1,0_0_1_0
2022-05-25 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,2,0_0_1_0
2022-05-25 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,2,0_0_1_0
2022-05-25 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,2,0_0_1_0
2022-05-25 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,2,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-09 20:00:00,9,1,3,116.610,1,,850645,37,19,3,9_1_3_1
2022-06-09 21:00:00,9,1,3,123.902,1,,850779,37,20,3,9_1_3_1
2022-06-09 22:00:00,9,1,3,94.437,1,,850913,37,21,3,9_1_3_1
2022-06-09 23:00:00,9,1,3,94.907,1,,851047,37,22,3,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-24 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,1,...,,,,,,,,,,
2022-05-24 23:00:00,11,1,1,0.000,0,259.0,799742,46,23,1,...,,,,,,,,,,
2022-05-24 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,1,...,,,,,,,,,,
2022-05-24 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,1,...,,,,,,,,,,
2022-05-24 23:00:00,0,0,2,14.841,1,259.0,799645,1,23,1,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-09 23:00:00,15,1,1,0.000,0,,851228,59,23,3,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-09 23:00:00,13,1,3,0.000,0,,851212,52,23,3,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-09 23:00:00,7,1,1,822.523,1,,851165,29,23,3,...,,,383.720,379.280,218.988,136.040,213.211,158.718,166.944,190.000
2022-06-09 23:00:00,14,0,3,0.022,0,,851216,54,23,3,...,,,0.046,0.023,0.029,0.035,0.033,0.046,0.066,0.014


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
806074,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,2022-05-18
806075,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,2022-05-18
806076,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,2022-05-18
806077,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,2022-05-18
806078,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,2022-05-18
...,...,...,...,...,...,...,...,...,...,...,...
867307,15,1,0,,1,2022-06-06 23:00:00,,867307,64,False,2022-06-06
867308,15,1,1,,0,2022-06-06 23:00:00,,867308,59,False,2022-06-06
867309,15,1,1,,1,2022-06-06 23:00:00,,867309,59,False,2022-06-06
867310,15,1,3,,0,2022-06-06 23:00:00,,867310,60,False,2022-06-06


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-24 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-05-24 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,321.824000,321.824000,321.82400,321.824000,321.824000,321.824000,321.824000
2022-05-24 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,0.799500,0.799500,0.79950,0.000000,0.000000,0.799500,0.799500
2022-05-24 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,168.332500,168.332500,168.33250,14.841000,14.841000,168.332500,168.332500
2022-05-24 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,1.808000,1.808000,1.80800,3.825000,3.825000,1.808000,1.808000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-09 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,189.915917,156.719571,138.08975,423.324333,464.416571,718.885125,241.064857
2022-06-09 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,51.345667,0.248000,0.00000,0.000375,0.000000,2.973042,2.160571
2022-06-09 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,192.196958,155.981857,137.91900,166.093917,77.187857,699.711625,705.709429
2022-06-09 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.579571,62.757833,0.248000,0.00000,2.934667,0.147571,2.039375,0.000000


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,...,,,,,,,,,,
1,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,...,,,,,,,,,,
3,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,...,,,,,,,,,,
4,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61185,15,1,1,,1,2022-06-06 23:00:00,,867309,59,False,...,36.536,16.917833,37.827000,134.429375,103.553143,133.86225,150.846625,80.991000,392.106583,188.099571
61186,15,1,3,,0,2022-06-06 23:00:00,,867310,60,False,...,0.000,290.672875,0.102714,12.960167,0.102714,0.17975,2.577500,0.150429,0.053833,0.102857
61187,15,1,3,,0,2022-06-06 23:00:00,,867310,60,False,...,0.000,267.268250,0.245571,11.973083,0.245571,0.42975,0.303375,0.709857,0.250958,0.709857
61188,15,1,3,,1,2022-06-06 23:00:00,,867311,60,False,...,344.203,155.271375,312.579714,133.312792,135.100286,170.44075,952.819083,1129.020286,624.762917,284.999714


0
Time to process test data: 17.13053798675537
Time to add test to df: 1.968811273574829
Day: 3
Preds predicted
Time to add test: 0.016802549362182617


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-26 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,3,0_0_1_0
2022-05-26 23:00:00,1,0,1,0.000,0,259.0,799656,6,23,3,1_0_1_0
2022-05-26 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,3,11_0_3_1
2022-05-26 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,3,11_0_3_0
2022-05-26 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,3,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-11 23:00:00,4,0,1,9.686,1,,851135,15,23,5,4_0_1_1
2022-06-11 23:00:00,4,0,1,0.018,0,,851134,15,23,5,4_0_1_0
2022-06-11 23:00:00,3,1,3,747.461,1,,851133,14,23,5,3_1_3_1
2022-06-11 23:00:00,7,1,0,0.000,0,,851162,28,23,5,7_1_0_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-27 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,3,0_0_1_0
2022-05-27 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,4,0_0_1_0
2022-05-27 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,4,0_0_1_0
2022-05-27 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,4,0_0_1_0
2022-05-27 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,4,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-11 20:00:00,9,1,3,116.610,1,,850645,37,19,5,9_1_3_1
2022-06-11 21:00:00,9,1,3,123.902,1,,850779,37,20,5,9_1_3_1
2022-06-11 22:00:00,9,1,3,94.437,1,,850913,37,21,5,9_1_3_1
2022-06-11 23:00:00,9,1,3,94.907,1,,851047,37,22,5,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-26 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,3,...,,,,,,,,,,
2022-05-26 23:00:00,1,0,1,0.000,0,259.0,799656,6,23,3,...,,,,,,,,,,
2022-05-26 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,3,...,,,,,,,,,,
2022-05-26 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,3,...,,,,,,,,,,
2022-05-26 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,3,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-11 23:00:00,4,0,1,9.686,1,,851135,15,23,5,...,,,9.324,9.340,16.628,9.969,8.879,9.171,10.627,9.666
2022-06-11 23:00:00,4,0,1,0.018,0,,851134,15,23,5,...,,,0.001,0.002,0.000,0.001,0.005,0.000,0.002,0.003
2022-06-11 23:00:00,3,1,3,747.461,1,,851133,14,23,5,...,,,739.677,777.866,761.635,754.692,735.991,725.900,731.024,741.122
2022-06-11 23:00:00,7,1,0,0.000,0,,851162,28,23,5,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
806074,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,2022-05-18
806075,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,2022-05-18
806076,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,2022-05-18
806077,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,2022-05-18
806078,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,2022-05-18
...,...,...,...,...,...,...,...,...,...,...,...
870523,15,1,0,,1,2022-06-07 23:00:00,,870523,64,False,2022-06-07
870524,15,1,1,,0,2022-06-07 23:00:00,,870524,59,False,2022-06-07
870525,15,1,1,,1,2022-06-07 23:00:00,,870525,59,False,2022-06-07
870526,15,1,3,,0,2022-06-07 23:00:00,,870526,60,False,2022-06-07


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-26 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-05-26 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,443.696000,443.696000,443.69600,59.229182,62.714143,112.964542,209.464000
2022-05-26 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,1.808000,1.808000,1.80800,0.000000,0.000000,0.471083,0.608857
2022-05-26 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,504.632000,504.632000,504.63200,7.910000,7.910000,103.883696,169.216286
2022-05-26 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,2.712000,2.712000,2.71200,0.757909,0.967571,0.491565,0.608857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-11 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,182.941042,158.529286,148.33625,405.987375,409.156143,568.395542,338.923143
2022-06-11 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,64.954250,0.248000,0.00000,0.167667,0.000000,2.363000,1.324714
2022-06-11 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,177.423708,157.887143,148.60500,203.861208,193.353286,485.398958,705.709429
2022-06-11 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.579571,51.345667,0.105143,0.00000,3.104583,2.296286,2.973042,0.000000


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,...,,,,,,,,,,
1,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,...,,,,,,,,,,
3,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,...,,,,,,,,,,
4,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64401,15,1,1,,1,2022-06-07 23:00:00,,870525,59,False,...,47.267,16.026375,37.901143,140.405292,156.722714,159.80700,154.571208,43.649571,366.074250,90.031286
64402,15,1,3,,0,2022-06-07 23:00:00,,870526,60,False,...,0.000,303.126833,0.142857,12.049125,0.142857,0.25000,2.852542,8.773714,2.618250,8.452286
64403,15,1,3,,0,2022-06-07 23:00:00,,870526,60,False,...,0.000,291.759667,0.142857,7.465250,0.142857,0.25000,2.967875,7.417143,5.385917,9.547571
64404,15,1,3,,1,2022-06-07 23:00:00,,870527,60,False,...,341.570,172.700958,318.504714,135.401208,135.639286,140.15900,875.188667,428.212000,606.855083,236.684286


0
Time to process test data: 17.981678247451782
Time to add test to df: 1.9487783908843994
Day: 4
Preds predicted
Time to add test: 0.015621185302734375


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-28 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,5,0_0_1_0
2022-05-28 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,5,11_1_0_1
2022-05-28 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,5,0_0_1_1
2022-05-28 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,5,0_0_2_0
2022-05-28 23:00:00,0,0,2,14.841,1,259.0,799645,1,23,5,0_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-13 23:00:00,0,1,2,0.000,0,,851108,61,23,0,0_1_2_0
2022-06-13 23:00:00,15,1,3,0.000,0,,851230,60,23,0,15_1_3_0
2022-06-13 23:00:00,3,0,3,0.003,0,,851128,12,23,0,3_0_3_0
2022-06-13 23:00:00,1,0,1,0.000,0,,851112,6,23,0,1_0_1_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-29 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,5,0_0_1_0
2022-05-29 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,6,0_0_1_0
2022-05-29 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,6,0_0_1_0
2022-05-29 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,6,0_0_1_0
2022-05-29 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,6,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-13 20:00:00,9,1,3,116.610,1,,850645,37,19,0,9_1_3_1
2022-06-13 21:00:00,9,1,3,123.902,1,,850779,37,20,0,9_1_3_1
2022-06-13 22:00:00,9,1,3,94.437,1,,850913,37,21,0,9_1_3_1
2022-06-13 23:00:00,9,1,3,94.907,1,,851047,37,22,0,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-28 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,5,...,,,,,,,,,,
2022-05-28 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,5,...,,,,,,,,,,
2022-05-28 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,5,...,,,,,,,,,,
2022-05-28 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,5,...,,,,,,,,,,
2022-05-28 23:00:00,0,0,2,14.841,1,259.0,799645,1,23,5,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-13 23:00:00,0,1,2,0.000,0,,851108,61,23,0,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-13 23:00:00,15,1,3,0.000,0,,851230,60,23,0,...,,,1.000,1.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-13 23:00:00,3,0,3,0.003,0,,851128,12,23,0,...,,,0.003,0.002,0.004,0.005,0.004,0.003,0.003,0.004
2022-06-13 23:00:00,1,0,1,0.000,0,,851112,6,23,0,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
806074,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,2022-05-18
806075,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,2022-05-18
806076,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,2022-05-18
806077,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,2022-05-18
806078,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,2022-05-18
...,...,...,...,...,...,...,...,...,...,...,...
873739,15,1,0,,1,2022-06-08 23:00:00,,873739,64,False,2022-06-08
873740,15,1,1,,0,2022-06-08 23:00:00,,873740,59,False,2022-06-08
873741,15,1,1,,1,2022-06-08 23:00:00,,873741,59,False,2022-06-08
873742,15,1,3,,0,2022-06-08 23:00:00,,873742,60,False,2022-06-08


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-28 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-05-28 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,321.824000,321.824000,321.82400,321.824000,321.824000,321.824000,321.824000
2022-05-28 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,0.799500,0.799500,0.79950,0.000000,0.000000,0.799500,0.799500
2022-05-28 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,168.332500,168.332500,168.33250,14.841000,14.841000,168.332500,168.332500
2022-05-28 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,1.808000,1.808000,1.80800,3.825000,3.825000,1.808000,1.808000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-13 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,182.941042,134.155571,148.33625,406.767833,409.156143,394.718250,399.844429
2022-06-13 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,60.164833,0.105143,0.00000,0.241833,0.000000,3.021208,2.176000
2022-06-13 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,181.676625,123.797429,129.08925,193.676167,199.113286,532.385125,772.519143
2022-06-13 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.722429,51.345667,0.105143,0.00000,5.410333,0.147571,2.370417,0.000000


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.900,0,2022-05-18 23:00:00,259.0,806074,0,True,...,,,,,,,,,,
1,0,0,1,329.241,1,2022-05-18 23:00:00,259.0,806075,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-18 23:00:00,259.0,806076,1,True,...,,,,,,,,,,
3,0,0,2,16.189,1,2022-05-18 23:00:00,259.0,806077,1,True,...,,,,,,,,,,
4,0,0,3,2.327,0,2022-05-18 23:00:00,259.0,806078,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67617,15,1,1,,1,2022-06-08 23:00:00,,873741,59,False,...,,31.129667,38.888286,176.955917,141.890571,194.73875,155.642958,93.164714,405.258250,636.356857
67618,15,1,3,,0,2022-06-08 23:00:00,,873742,60,False,...,,241.021000,0.000000,10.649042,0.000000,0.00000,4.105667,0.061143,0.074958,0.041286
67619,15,1,3,,0,2022-06-08 23:00:00,,873742,60,False,...,,218.372958,0.000000,8.309708,0.000000,0.00000,0.261875,0.591429,1.672583,5.177000
67620,15,1,3,,1,2022-06-08 23:00:00,,873743,60,False,...,,420.953958,317.985571,173.218042,191.611429,132.38150,1013.292208,1300.662429,730.245333,464.833000


0
Time to process test data: 18.9217472076416
Time to add test to df: 1.9499704837799072
Day: 5
Preds predicted
Time to add test: 0.013619661331176758


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-30 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,0,0_0_1_0
2022-05-30 23:00:00,3,1,1,0.000,0,259.0,799674,13,23,0,3_1_1_0
2022-05-30 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,0,11_0_3_1
2022-05-30 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,0,11_0_3_0
2022-05-30 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,0,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-15 23:00:00,0,1,3,1.033,0,,851110,5,23,2,0_1_3_0
2022-06-15 23:00:00,5,0,1,32.886,1,,851145,19,23,2,5_0_1_1
2022-06-15 23:00:00,3,0,3,48.056,1,,851129,12,23,2,3_0_3_1
2022-06-15 23:00:00,4,1,3,412.493,1,,851143,18,23,2,4_1_3_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-05-31 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,0,0_0_1_0
2022-05-31 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,1,0_0_1_0
2022-05-31 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,1,0_0_1_0
2022-05-31 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,1,0_0_1_0
2022-05-31 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,1,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-15 20:00:00,9,1,3,116.610,1,,850645,37,19,2,9_1_3_1
2022-06-15 21:00:00,9,1,3,123.902,1,,850779,37,20,2,9_1_3_1
2022-06-15 22:00:00,9,1,3,94.437,1,,850913,37,21,2,9_1_3_1
2022-06-15 23:00:00,9,1,3,94.907,1,,851047,37,22,2,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-30 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,0,...,,,,,,,,,,
2022-05-30 23:00:00,3,1,1,0.000,0,259.0,799674,13,23,0,...,,,,,,,,,,
2022-05-30 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,0,...,,,,,,,,,,
2022-05-30 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,0,...,,,,,,,,,,
2022-05-30 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-15 23:00:00,0,1,3,1.033,0,,851110,5,23,2,...,,,0.578,0.334,0.345,0.635,0.336,0.391,0.531,0.354
2022-06-15 23:00:00,5,0,1,32.886,1,,851145,19,23,2,...,,,29.603,32.046,36.792,46.232,30.447,21.281,40.684,47.177
2022-06-15 23:00:00,3,0,3,48.056,1,,851129,12,23,2,...,,,43.146,53.955,58.036,63.694,44.783,44.863,64.110,75.317
2022-06-15 23:00:00,4,1,3,412.493,1,,851143,18,23,2,...,,,410.018,322.817,333.310,387.427,602.840,352.839,291.932,363.286


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
806208,0,0,1,1.438,0,2022-05-19 00:00:00,260.0,806208,0,True,2022-05-19
806209,0,0,1,311.155,1,2022-05-19 00:00:00,260.0,806209,0,True,2022-05-19
806210,0,0,2,0.000,0,2022-05-19 00:00:00,260.0,806210,1,True,2022-05-19
806211,0,0,2,12.559,1,2022-05-19 00:00:00,260.0,806211,1,True,2022-05-19
806212,0,0,3,1.352,0,2022-05-19 00:00:00,260.0,806212,2,True,2022-05-19
...,...,...,...,...,...,...,...,...,...,...,...
876955,15,1,0,,1,2022-06-09 23:00:00,,876955,64,False,2022-06-09
876956,15,1,1,,0,2022-06-09 23:00:00,,876956,59,False,2022-06-09
876957,15,1,1,,1,2022-06-09 23:00:00,,876957,59,False,2022-06-09
876958,15,1,3,,0,2022-06-09 23:00:00,,876958,60,False,2022-06-09


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-05-30 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-05-30 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,443.696000,443.696000,443.69600,59.229182,62.714143,115.237000,206.520000
2022-05-30 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,1.808000,1.808000,1.80800,0.000000,0.000000,0.491565,0.608857
2022-05-30 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,504.632000,504.632000,504.63200,7.910000,7.910000,105.846682,164.748429
2022-05-30 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,2.712000,2.712000,2.71200,0.757909,0.967571,0.513909,0.608857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-15 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,116.450571,182.941042,189.013429,148.33625,405.987375,409.156143,802.541375,361.936429
2022-06-15 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,56.135083,0.105143,0.00000,0.176458,0.000000,2.990042,3.473143
2022-06-15 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.890857,177.423708,148.171143,74.79700,159.427792,51.852429,610.912833,749.634000
2022-06-15 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.579571,51.345667,0.105143,0.00000,3.104542,0.147857,2.973042,0.000000


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,1.438,0,2022-05-19 00:00:00,260.0,806208,0,True,...,,,,,,,,,,
1,0,0,1,311.155,1,2022-05-19 00:00:00,260.0,806209,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-19 00:00:00,260.0,806210,1,True,...,,,,,,,,,,
3,0,0,2,12.559,1,2022-05-19 00:00:00,260.0,806211,1,True,...,,,,,,,,,,
4,0,0,3,1.352,0,2022-05-19 00:00:00,260.0,806212,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-09 23:00:00,,876957,59,False,...,,37.753875,37.413143,205.266625,148.265286,211.46175,154.241792,63.597286,393.443625,413.957714
70700,15,1,3,,0,2022-06-09 23:00:00,,876958,60,False,...,,196.322542,0.000000,8.885333,0.000000,0.00000,2.949875,0.504286,1.239917,4.218571
70701,15,1,3,,0,2022-06-09 23:00:00,,876958,60,False,...,,191.729458,0.000000,2.082708,0.000000,0.00000,2.937208,0.550857,1.244000,4.257571
70702,15,1,3,,1,2022-06-09 23:00:00,,876959,60,False,...,,449.253833,312.718286,194.753167,183.676286,151.47200,1218.564917,1296.798714,706.667042,468.852857


0
Time to process test data: 19.21613097190857
Time to add test to df: 1.9529597759246826
Day: 6
Preds predicted
Time to add test: 0.01562809944152832


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-01 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,2,0_0_1_0
2022-06-01 23:00:00,8,0,1,0.000,0,259.0,799710,31,23,2,8_0_1_0
2022-06-01 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,2,11_1_0_1
2022-06-01 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,2,0_0_1_1
2022-06-01 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,2,0_0_2_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-17 23:00:00,14,1,3,1074.844,1,,851221,56,23,4,14_1_3_1
2022-06-17 23:00:00,13,1,3,0.000,0,,851212,52,23,4,13_1_3_0
2022-06-17 23:00:00,14,1,1,98.738,1,,851219,55,23,4,14_1_1_1
2022-06-17 23:00:00,7,1,1,0.000,0,,851164,29,23,4,7_1_1_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-02 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,2,0_0_1_0
2022-06-02 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,3,0_0_1_0
2022-06-02 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,3,0_0_1_0
2022-06-02 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,3,0_0_1_0
2022-06-02 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,3,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-17 20:00:00,9,1,3,116.610,1,,850645,37,19,4,9_1_3_1
2022-06-17 21:00:00,9,1,3,123.902,1,,850779,37,20,4,9_1_3_1
2022-06-17 22:00:00,9,1,3,94.437,1,,850913,37,21,4,9_1_3_1
2022-06-17 23:00:00,9,1,3,94.907,1,,851047,37,22,4,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-01 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,2,...,,,,,,,,,,
2022-06-01 23:00:00,8,0,1,0.000,0,259.0,799710,31,23,2,...,,,,,,,,,,
2022-06-01 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,2,...,,,,,,,,,,
2022-06-01 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,2,...,,,,,,,,,,
2022-06-01 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,2,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-17 23:00:00,14,1,3,1074.844,1,,851221,56,23,4,...,,,887.351,568.845,523.650,801.878,931.881,954.863,497.578,815.446
2022-06-17 23:00:00,13,1,3,0.000,0,,851212,52,23,4,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-17 23:00:00,14,1,1,98.738,1,,851219,55,23,4,...,,,100.536,105.151,105.583,100.211,92.691,87.873,111.337,100.417
2022-06-17 23:00:00,7,1,1,0.000,0,,851164,29,23,4,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
809424,0,0,1,0.600,0,2022-05-20 00:00:00,261.0,809424,0,True,2022-05-20
809425,0,0,1,303.162,1,2022-05-20 00:00:00,261.0,809425,0,True,2022-05-20
809426,0,0,2,0.000,0,2022-05-20 00:00:00,261.0,809426,1,True,2022-05-20
809427,0,0,2,14.367,1,2022-05-20 00:00:00,261.0,809427,1,True,2022-05-20
809428,0,0,3,2.060,0,2022-05-20 00:00:00,261.0,809428,2,True,2022-05-20
...,...,...,...,...,...,...,...,...,...,...,...
880171,15,1,0,,1,2022-06-10 23:00:00,,880171,64,False,2022-06-10
880172,15,1,1,,0,2022-06-10 23:00:00,,880172,59,False,2022-06-10
880173,15,1,1,,1,2022-06-10 23:00:00,,880173,59,False,2022-06-10
880174,15,1,3,,0,2022-06-10 23:00:00,,880174,60,False,2022-06-10


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-01 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-06-01 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,321.824000,321.824000,321.82400,321.824000,321.824000,321.824000,321.824000
2022-06-01 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,0.799500,0.799500,0.79950,0.000000,0.000000,0.533000,0.533000
2022-06-01 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,168.332500,168.332500,168.33250,14.841000,14.841000,168.332500,168.332500
2022-06-01 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,1.808000,1.808000,1.80800,3.825000,3.825000,1.356000,1.356000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-17 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,189.915917,166.435571,157.60550,423.324333,464.416571,526.375333,749.634000
2022-06-17 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,53.938667,0.105143,0.00000,0.009167,0.000000,2.413542,0.135714
2022-06-17 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,192.196958,156.196000,137.91900,199.062208,182.087857,661.376000,210.316286
2022-06-17 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.722429,62.757833,0.191143,0.00000,5.402875,0.105143,2.395250,0.027571


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.600,0,2022-05-20 00:00:00,261.0,809424,0,True,...,,,,,,,,,,
1,0,0,1,303.162,1,2022-05-20 00:00:00,261.0,809425,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-20 00:00:00,261.0,809426,1,True,...,,,,,,,,,,
3,0,0,2,14.367,1,2022-05-20 00:00:00,261.0,809427,1,True,...,,,,,,,,,,
4,0,0,3,2.060,0,2022-05-20 00:00:00,261.0,809428,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-10 23:00:00,,880173,59,False,...,,31.917375,36.862857,190.953917,137.531143,172.87750,152.881208,131.219857,472.608750,417.489857
70700,15,1,3,,0,2022-06-10 23:00:00,,880174,60,False,...,,158.916375,0.000000,2.061000,0.000000,0.00000,2.414125,6.582714,1.977833,6.468571
70701,15,1,3,,0,2022-06-10 23:00:00,,880174,60,False,...,,142.408625,0.000000,0.963458,0.000000,0.00000,2.165667,0.000000,1.685083,0.000000
70702,15,1,3,,1,2022-06-10 23:00:00,,880175,60,False,...,,486.827958,308.055143,194.550292,137.096857,119.84550,937.465333,519.135571,730.112417,255.549857


0
Time to process test data: 19.145593643188477
Time to add test to df: 1.9512159824371338
Day: 7
Preds predicted
Time to add test: 0.01886725425720215


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-03 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,4,0_0_1_0
2022-06-03 23:00:00,7,1,1,142.565,1,259.0,799707,29,23,4,7_1_1_1
2022-06-03 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,4,11_0_3_1
2022-06-03 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,4,11_0_3_0
2022-06-03 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,4,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-19 23:00:00,7,1,0,0.000,0,,851162,28,23,6,7_1_0_0
2022-06-19 23:00:00,0,1,3,1.033,0,,851110,5,23,6,0_1_3_0
2022-06-19 23:00:00,5,0,1,32.886,1,,851145,19,23,6,5_0_1_1
2022-06-19 23:00:00,7,0,1,65.935,1,,851157,25,23,6,7_0_1_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-04 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,4,0_0_1_0
2022-06-04 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,5,0_0_1_0
2022-06-04 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,5,0_0_1_0
2022-06-04 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,5,0_0_1_0
2022-06-04 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,5,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-19 20:00:00,9,1,3,116.610,1,,850645,37,19,6,9_1_3_1
2022-06-19 21:00:00,9,1,3,123.902,1,,850779,37,20,6,9_1_3_1
2022-06-19 22:00:00,9,1,3,94.437,1,,850913,37,21,6,9_1_3_1
2022-06-19 23:00:00,9,1,3,94.907,1,,851047,37,22,6,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-03 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,4,...,,,,,,,,,,
2022-06-03 23:00:00,7,1,1,142.565,1,259.0,799707,29,23,4,...,,,,,,,,,,
2022-06-03 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,4,...,,,,,,,,,,
2022-06-03 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,4,...,,,,,,,,,,
2022-06-03 23:00:00,11,0,2,0.979,1,259.0,799737,44,23,4,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-19 23:00:00,7,1,0,0.000,0,,851162,28,23,6,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-19 23:00:00,0,1,3,1.033,0,,851110,5,23,6,...,,,0.578,0.334,0.345,0.635,0.336,0.391,0.531,0.354
2022-06-19 23:00:00,5,0,1,32.886,1,,851145,19,23,6,...,,,29.603,32.046,36.792,46.232,30.447,21.281,40.684,47.177
2022-06-19 23:00:00,7,0,1,65.935,1,,851157,25,23,6,...,,,60.504,67.313,68.837,75.909,58.891,49.133,74.875,73.983


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
812640,0,0,1,0.958,0,2022-05-21 00:00:00,262.0,812640,0,True,2022-05-21
812641,0,0,1,365.594,1,2022-05-21 00:00:00,262.0,812641,0,True,2022-05-21
812642,0,0,2,0.000,0,2022-05-21 00:00:00,262.0,812642,1,True,2022-05-21
812643,0,0,2,13.819,1,2022-05-21 00:00:00,262.0,812643,1,True,2022-05-21
812644,0,0,3,2.205,0,2022-05-21 00:00:00,262.0,812644,2,True,2022-05-21
...,...,...,...,...,...,...,...,...,...,...,...
883387,15,1,0,,1,2022-06-11 23:00:00,,883387,64,False,2022-06-11
883388,15,1,1,,0,2022-06-11 23:00:00,,883388,59,False,2022-06-11
883389,15,1,1,,1,2022-06-11 23:00:00,,883389,59,False,2022-06-11
883390,15,1,3,,0,2022-06-11 23:00:00,,883390,60,False,2022-06-11


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-03 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-06-03 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,443.696000,443.696000,443.69600,59.229182,62.714143,115.237000,206.520000
2022-06-03 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,1.808000,1.808000,1.80800,0.000000,0.000000,0.491565,0.608857
2022-06-03 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,504.632000,504.632000,504.63200,7.910000,7.910000,105.846682,164.748429
2022-06-03 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,2.712000,2.712000,2.71200,0.757909,0.972143,0.513909,0.613429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-19 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,116.450571,182.941042,189.013429,138.25250,405.987375,409.156143,571.295208,376.904857
2022-06-19 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,56.135083,0.248000,0.00000,0.096375,0.000000,3.027542,1.324714
2022-06-19 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.890857,177.423708,148.171143,138.52125,159.427792,125.948429,455.894333,736.178000
2022-06-19 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.579571,51.345667,0.105143,0.00000,3.068958,0.147857,2.973042,0.012143


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.958,0,2022-05-21 00:00:00,262.0,812640,0,True,...,,,,,,,,,,
1,0,0,1,365.594,1,2022-05-21 00:00:00,262.0,812641,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-21 00:00:00,262.0,812642,1,True,...,,,,,,,,,,
3,0,0,2,13.819,1,2022-05-21 00:00:00,262.0,812643,1,True,...,,,,,,,,,,
4,0,0,3,2.205,0,2022-05-21 00:00:00,262.0,812644,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-11 23:00:00,,883389,59,False,...,,22.180333,39.080429,188.886458,142.574000,130.50175,159.548833,74.580857,329.188625,373.098143
70700,15,1,3,,0,2022-06-11 23:00:00,,883390,60,False,...,,307.968917,0.000000,10.277167,0.000000,0.00000,5.470292,9.557000,2.892750,9.444429
70701,15,1,3,,0,2022-06-11 23:00:00,,883390,60,False,...,,285.753750,0.000000,8.007375,0.000000,0.00000,3.153958,8.704857,4.074125,4.272857
70702,15,1,3,,1,2022-06-11 23:00:00,,883391,60,False,...,,374.433792,318.261000,187.984750,154.388571,153.24500,1228.499083,513.681286,773.012250,268.039286


0
Time to process test data: 18.965500593185425
Time to add test to df: 1.9736652374267578
Day: 8
Preds predicted
Time to add test: 0.02090287208557129


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-05 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,6,0_0_1_0
2022-06-05 23:00:00,11,1,3,0.280,0,259.0,799746,48,23,6,11_1_3_0
2022-06-05 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,6,11_1_0_1
2022-06-05 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,6,0_0_1_1
2022-06-05 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,6,0_0_2_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-21 23:00:00,14,1,1,98.738,1,,851219,55,23,1,14_1_1_1
2022-06-21 23:00:00,7,1,1,0.000,0,,851164,29,23,1,7_1_1_0
2022-06-21 23:00:00,1,0,1,5.748,1,,851113,6,23,1,1_0_1_1
2022-06-21 23:00:00,11,1,3,3656.302,1,,851203,48,23,1,11_1_3_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-06 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,6,0_0_1_0
2022-06-06 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,0,0_0_1_0
2022-06-06 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,0,0_0_1_0
2022-06-06 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,0,0_0_1_0
2022-06-06 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,0,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-21 20:00:00,9,1,3,116.610,1,,850645,37,19,1,9_1_3_1
2022-06-21 21:00:00,9,1,3,123.902,1,,850779,37,20,1,9_1_3_1
2022-06-21 22:00:00,9,1,3,94.437,1,,850913,37,21,1,9_1_3_1
2022-06-21 23:00:00,9,1,3,94.907,1,,851047,37,22,1,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-05 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,6,...,,,,,,,,,,
2022-06-05 23:00:00,11,1,3,0.280,0,259.0,799746,48,23,6,...,,,,,,,,,,
2022-06-05 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,6,...,,,,,,,,,,
2022-06-05 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,6,...,,,,,,,,,,
2022-06-05 23:00:00,0,0,2,0.000,0,259.0,799644,1,23,6,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-21 23:00:00,14,1,1,98.738,1,,851219,55,23,1,...,,,100.536,105.151,105.583,100.211,92.691,87.873,111.337,100.417
2022-06-21 23:00:00,7,1,1,0.000,0,,851164,29,23,1,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-21 23:00:00,1,0,1,5.748,1,,851113,6,23,1,...,,,4.436,5.150,9.276,6.458,5.762,4.530,9.532,17.264
2022-06-21 23:00:00,11,1,3,3656.302,1,,851203,48,23,1,...,,,3565.626,3323.135,3017.917,3137.196,3490.744,3663.463,3048.108,2645.422


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
815856,0,0,1,1.243,0,2022-05-22 00:00:00,263.0,815856,0,True,2022-05-22
815857,0,0,1,349.207,1,2022-05-22 00:00:00,263.0,815857,0,True,2022-05-22
815858,0,0,2,0.000,0,2022-05-22 00:00:00,263.0,815858,1,True,2022-05-22
815859,0,0,2,19.784,1,2022-05-22 00:00:00,263.0,815859,1,True,2022-05-22
815860,0,0,3,3.041,0,2022-05-22 00:00:00,263.0,815860,2,True,2022-05-22
...,...,...,...,...,...,...,...,...,...,...,...
886603,15,1,0,,1,2022-06-12 23:00:00,,886603,64,False,2022-06-12
886604,15,1,1,,0,2022-06-12 23:00:00,,886604,59,False,2022-06-12
886605,15,1,1,,1,2022-06-12 23:00:00,,886605,59,False,2022-06-12
886606,15,1,3,,0,2022-06-12 23:00:00,,886606,60,False,2022-06-12


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-05 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-06-05 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,321.824000,321.824000,321.82400,321.824000,321.824000,321.824000,321.824000
2022-06-05 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,0.799500,0.799500,0.79950,0.000000,0.000000,0.799500,0.799500
2022-06-05 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,168.332500,168.332500,168.33250,14.841000,14.841000,168.332500,168.332500
2022-06-05 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,1.808000,1.808000,1.80800,3.825000,3.825000,1.808000,1.808000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-21 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,189.915917,194.694571,157.73450,423.324333,464.416571,391.216625,230.145143
2022-06-21 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,60.164833,0.248000,0.00000,0.000500,0.000000,3.036625,2.284143
2022-06-21 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,192.196958,184.455000,138.06725,207.804500,191.851143,663.464500,178.098857
2022-06-21 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.722429,62.757833,0.248000,0.00000,4.979375,0.105143,2.394583,0.027571


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,1.243,0,2022-05-22 00:00:00,263.0,815856,0,True,...,,,,,,,,,,
1,0,0,1,349.207,1,2022-05-22 00:00:00,263.0,815857,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-22 00:00:00,263.0,815858,1,True,...,,,,,,,,,,
3,0,0,2,19.784,1,2022-05-22 00:00:00,263.0,815859,1,True,...,,,,,,,,,,
4,0,0,3,3.041,0,2022-05-22 00:00:00,263.0,815860,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-12 23:00:00,,886605,59,False,...,,34.882000,39.694143,189.637708,137.421000,174.89300,172.207333,104.964714,617.608333,89.783143
70700,15,1,3,,0,2022-06-12 23:00:00,,886606,60,False,...,,240.479417,0.000000,1.530708,0.000000,0.00000,5.707625,8.741286,2.617208,8.629571
70701,15,1,3,,0,2022-06-12 23:00:00,,886606,60,False,...,,215.482208,0.000000,1.450458,0.000000,0.00000,2.842708,0.769000,0.224292,0.059571
70702,15,1,3,,1,2022-06-12 23:00:00,,886607,60,False,...,,400.030250,314.529714,194.213625,129.079714,122.54125,980.632167,470.322429,677.498792,253.354286


0
Time to process test data: 18.89077377319336
Time to add test to df: 1.964846134185791
Day: 9
Preds predicted
Time to add test: 0.020133495330810547


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-07 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,1,0_0_1_0
2022-06-07 23:00:00,0,1,3,6788.992,1,259.0,799655,5,23,1,0_1_3_1
2022-06-07 23:00:00,7,1,1,142.565,1,259.0,799707,29,23,1,7_1_1_1
2022-06-07 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,1,11_0_3_1
2022-06-07 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,1,11_0_3_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-23 23:00:00,3,1,3,747.461,1,,851133,14,23,3,3_1_3_1
2022-06-23 23:00:00,7,1,0,0.000,0,,851162,28,23,3,7_1_0_0
2022-06-23 23:00:00,5,0,1,32.886,1,,851145,19,23,3,5_0_1_1
2022-06-23 23:00:00,7,0,1,0.622,0,,851156,25,23,3,7_0_1_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-08 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,1,0_0_1_0
2022-06-08 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,2,0_0_1_0
2022-06-08 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,2,0_0_1_0
2022-06-08 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,2,0_0_1_0
2022-06-08 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,2,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-23 20:00:00,9,1,3,116.610,1,,850645,37,19,3,9_1_3_1
2022-06-23 21:00:00,9,1,3,123.902,1,,850779,37,20,3,9_1_3_1
2022-06-23 22:00:00,9,1,3,94.437,1,,850913,37,21,3,9_1_3_1
2022-06-23 23:00:00,9,1,3,94.907,1,,851047,37,22,3,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-07 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,1,...,,,,,,,,,,
2022-06-07 23:00:00,0,1,3,6788.992,1,259.0,799655,5,23,1,...,,,,,,,,,,
2022-06-07 23:00:00,7,1,1,142.565,1,259.0,799707,29,23,1,...,,,,,,,,,,
2022-06-07 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,1,...,,,,,,,,,,
2022-06-07 23:00:00,11,0,3,1.350,0,259.0,799738,45,23,1,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-23 23:00:00,3,1,3,747.461,1,,851133,14,23,3,...,,,739.677,777.866,761.635,754.692,735.991,725.900,731.024,741.122
2022-06-23 23:00:00,7,1,0,0.000,0,,851162,28,23,3,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-23 23:00:00,5,0,1,32.886,1,,851145,19,23,3,...,,,29.603,32.046,36.792,46.232,30.447,21.281,40.684,47.177
2022-06-23 23:00:00,7,0,1,0.622,0,,851156,25,23,3,...,,,0.712,0.143,0.802,0.492,0.098,0.371,0.000,0.000


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
819072,0,0,1,0.433,0,2022-05-23 00:00:00,264.0,819072,0,True,2022-05-23
819073,0,0,1,359.323,1,2022-05-23 00:00:00,264.0,819073,0,True,2022-05-23
819074,0,0,2,0.000,0,2022-05-23 00:00:00,264.0,819074,1,True,2022-05-23
819075,0,0,2,12.089,1,2022-05-23 00:00:00,264.0,819075,1,True,2022-05-23
819076,0,0,3,4.788,0,2022-05-23 00:00:00,264.0,819076,2,True,2022-05-23
...,...,...,...,...,...,...,...,...,...,...,...
889819,15,1,0,,1,2022-06-13 23:00:00,,889819,64,False,2022-06-13
889820,15,1,1,,0,2022-06-13 23:00:00,,889820,59,False,2022-06-13
889821,15,1,1,,1,2022-06-13 23:00:00,,889821,59,False,2022-06-13
889822,15,1,3,,0,2022-06-13 23:00:00,,889822,60,False,2022-06-13


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-07 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-06-07 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,443.696000,443.696000,443.69600,59.229182,62.714143,115.237000,206.520000
2022-06-07 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,1.808000,1.808000,1.80800,0.000000,0.000000,0.534667,0.626429
2022-06-07 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,504.632000,504.632000,504.63200,7.910000,7.910000,105.846682,164.748429
2022-06-07 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,2.712000,2.712000,2.71200,0.917667,0.961000,0.561400,0.626429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-23 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,182.941042,164.425571,192.54475,405.987375,409.156143,598.269875,734.497571
2022-06-23 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,56.135083,0.191143,0.00000,0.145292,0.000000,2.935458,0.000000
2022-06-23 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,181.676625,123.583286,119.00550,193.261708,157.012143,596.070792,385.198571
2022-06-23 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.722429,51.345667,0.105143,0.00000,3.104583,1.472571,2.973500,0.147857


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.433,0,2022-05-23 00:00:00,264.0,819072,0,True,...,,,,,,,,,,
1,0,0,1,359.323,1,2022-05-23 00:00:00,264.0,819073,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-23 00:00:00,264.0,819074,1,True,...,,,,,,,,,,
3,0,0,2,12.089,1,2022-05-23 00:00:00,264.0,819075,1,True,...,,,,,,,,,,
4,0,0,3,4.788,0,2022-05-23 00:00:00,264.0,819076,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-13 23:00:00,,889821,59,False,...,,18.791333,41.081857,128.003458,142.765143,119.862500,159.948000,80.699857,291.449500,178.072143
70700,15,1,3,,0,2022-06-13 23:00:00,,889822,60,False,...,,453.441875,0.150143,9.463583,0.150143,0.350333,2.537750,7.639143,2.284125,7.639143
70701,15,1,3,,0,2022-06-13 23:00:00,,889822,60,False,...,,413.776333,0.150143,6.898167,0.150143,0.262750,2.641125,0.717857,2.393750,0.566857
70702,15,1,3,,1,2022-06-13 23:00:00,,889823,60,False,...,,130.693750,321.849000,133.666083,171.043429,145.673667,784.648667,415.539000,618.226042,230.173286


0
Time to process test data: 18.651245832443237
Time to add test to df: 1.9859628677368164
Day: 10
Preds predicted
Time to add test: 0.015666723251342773


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-09 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,3,0_0_1_0
2022-06-09 23:00:00,14,0,3,0.032,0,259.0,799760,54,23,3,14_0_3_0
2022-06-09 23:00:00,11,1,3,0.280,0,259.0,799746,48,23,3,11_1_3_0
2022-06-09 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,3,11_1_0_1
2022-06-09 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,3,0_0_1_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-25 23:00:00,14,1,1,98.738,1,,851219,55,23,5,14_1_1_1
2022-06-25 23:00:00,7,1,1,0.000,0,,851164,29,23,5,7_1_1_0
2022-06-25 23:00:00,11,1,3,3656.302,1,,851203,48,23,5,11_1_3_1
2022-06-25 23:00:00,13,1,1,19.880,1,,851211,63,23,5,13_1_1_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-10 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,3,0_0_1_0
2022-06-10 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,4,0_0_1_0
2022-06-10 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,4,0_0_1_0
2022-06-10 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,4,0_0_1_0
2022-06-10 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,4,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-25 20:00:00,9,1,3,116.610,1,,850645,37,19,5,9_1_3_1
2022-06-25 21:00:00,9,1,3,123.902,1,,850779,37,20,5,9_1_3_1
2022-06-25 22:00:00,9,1,3,94.437,1,,850913,37,21,5,9_1_3_1
2022-06-25 23:00:00,9,1,3,94.907,1,,851047,37,22,5,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-09 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,3,...,,,,,,,,,,
2022-06-09 23:00:00,14,0,3,0.032,0,259.0,799760,54,23,3,...,,,,,,,,,,
2022-06-09 23:00:00,11,1,3,0.280,0,259.0,799746,48,23,3,...,,,,,,,,,,
2022-06-09 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,3,...,,,,,,,,,,
2022-06-09 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,3,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-25 23:00:00,14,1,1,98.738,1,,851219,55,23,5,...,,,100.536,105.151,105.583,100.211,92.691,87.873,111.337,100.417
2022-06-25 23:00:00,7,1,1,0.000,0,,851164,29,23,5,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-25 23:00:00,11,1,3,3656.302,1,,851203,48,23,5,...,,,3565.626,3323.135,3017.917,3137.196,3490.744,3663.463,3048.108,2645.422
2022-06-25 23:00:00,13,1,1,19.880,1,,851211,63,23,5,...,,,20.531,21.650,21.809,32.243,22.632,15.846,36.493,29.625


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
822288,0,0,1,1.059,0,2022-05-24 00:00:00,265.0,822288,0,True,2022-05-24
822289,0,0,1,300.538,1,2022-05-24 00:00:00,265.0,822289,0,True,2022-05-24
822290,0,0,2,0.000,0,2022-05-24 00:00:00,265.0,822290,1,True,2022-05-24
822291,0,0,2,13.588,1,2022-05-24 00:00:00,265.0,822291,1,True,2022-05-24
822292,0,0,3,3.806,0,2022-05-24 00:00:00,265.0,822292,2,True,2022-05-24
...,...,...,...,...,...,...,...,...,...,...,...
893035,15,1,0,,1,2022-06-14 23:00:00,,893035,64,False,2022-06-14
893036,15,1,1,,0,2022-06-14 23:00:00,,893036,59,False,2022-06-14
893037,15,1,1,,1,2022-06-14 23:00:00,,893037,59,False,2022-06-14
893038,15,1,3,,0,2022-06-14 23:00:00,,893038,60,False,2022-06-14


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-09 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-06-09 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,321.824000,321.824000,321.82400,321.824000,321.824000,321.824000,321.824000
2022-06-09 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,0.799500,0.799500,0.79950,0.000000,0.000000,0.543667,0.543667
2022-06-09 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,168.332500,168.332500,168.33250,14.841000,14.841000,168.332500,168.332500
2022-06-09 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,1.808000,1.808000,1.80800,1.928500,1.928500,1.364000,1.364000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-25 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,116.450571,189.915917,196.919714,157.73450,423.324333,464.416571,396.252000,351.108429
2022-06-25 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,53.938667,0.248000,0.00000,0.083583,0.000000,3.040167,2.284143
2022-06-25 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.890857,192.196958,224.441000,138.06725,175.678542,182.865714,687.943792,595.790571
2022-06-25 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.579571,62.757833,0.248000,0.00000,5.402875,0.155857,2.362208,0.132714


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,1.059,0,2022-05-24 00:00:00,265.0,822288,0,True,...,,,,,,,,,,
1,0,0,1,300.538,1,2022-05-24 00:00:00,265.0,822289,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-24 00:00:00,265.0,822290,1,True,...,,,,,,,,,,
3,0,0,2,13.588,1,2022-05-24 00:00:00,265.0,822291,1,True,...,,,,,,,,,,
4,0,0,3,3.806,0,2022-05-24 00:00:00,265.0,822292,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-14 23:00:00,,893037,59,False,...,,17.717292,40.394571,139.431333,173.428571,188.74725,150.578917,85.529000,398.401083,586.546571
70700,15,1,3,,0,2022-06-14 23:00:00,,893038,60,False,...,,208.191750,0.000000,6.541375,0.000000,0.00000,5.097792,8.227714,2.461708,8.227714
70701,15,1,3,,0,2022-06-14 23:00:00,,893038,60,False,...,,191.697958,0.000000,4.316083,0.000000,0.00000,2.667417,0.706143,0.205958,0.000000
70702,15,1,3,,1,2022-06-14 23:00:00,,893039,60,False,...,,186.832250,333.128667,138.455375,181.853000,151.23900,821.411375,436.358286,676.166292,236.673571


0
Time to process test data: 18.697720766067505
Time to add test to df: 2.0177652835845947
Day: 11
Preds predicted
Time to add test: 0.031336069107055664


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-11 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,5,0_0_1_0
2022-06-11 23:00:00,7,1,0,0.000,0,259.0,799704,28,23,5,7_1_0_0
2022-06-11 23:00:00,0,1,3,6788.992,1,259.0,799655,5,23,5,0_1_3_1
2022-06-11 23:00:00,7,1,1,142.565,1,259.0,799707,29,23,5,7_1_1_1
2022-06-11 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,5,11_0_3_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-27 23:00:00,4,0,1,0.018,0,,851134,15,23,0,4_0_1_0
2022-06-27 23:00:00,3,1,3,747.461,1,,851133,14,23,0,3_1_3_1
2022-06-27 23:00:00,7,1,0,0.000,0,,851162,28,23,0,7_1_0_0
2022-06-27 23:00:00,14,0,3,124.207,1,,851217,54,23,0,14_0_3_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-12 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,5,0_0_1_0
2022-06-12 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,6,0_0_1_0
2022-06-12 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,6,0_0_1_0
2022-06-12 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,6,0_0_1_0
2022-06-12 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,6,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-27 20:00:00,9,1,3,116.610,1,,850645,37,19,0,9_1_3_1
2022-06-27 21:00:00,9,1,3,123.902,1,,850779,37,20,0,9_1_3_1
2022-06-27 22:00:00,9,1,3,94.437,1,,850913,37,21,0,9_1_3_1
2022-06-27 23:00:00,9,1,3,94.907,1,,851047,37,22,0,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-11 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,5,...,,,,,,,,,,
2022-06-11 23:00:00,7,1,0,0.000,0,259.0,799704,28,23,5,...,,,,,,,,,,
2022-06-11 23:00:00,0,1,3,6788.992,1,259.0,799655,5,23,5,...,,,,,,,,,,
2022-06-11 23:00:00,7,1,1,142.565,1,259.0,799707,29,23,5,...,,,,,,,,,,
2022-06-11 23:00:00,11,0,3,342.649,1,259.0,799739,45,23,5,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-27 23:00:00,4,0,1,0.018,0,,851134,15,23,0,...,,,0.001,0.002,0.000,0.001,0.005,0.000,0.002,0.003
2022-06-27 23:00:00,3,1,3,747.461,1,,851133,14,23,0,...,,,739.677,777.866,761.635,754.692,735.991,725.900,731.024,741.122
2022-06-27 23:00:00,7,1,0,0.000,0,,851162,28,23,0,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-27 23:00:00,14,0,3,124.207,1,,851217,54,23,0,...,,,114.868,120.525,124.748,147.345,116.591,106.001,135.410,142.885


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
825504,0,0,1,0.744,0,2022-05-25 00:00:00,266.0,825504,0,True,2022-05-25
825505,0,0,1,261.849,1,2022-05-25 00:00:00,266.0,825505,0,True,2022-05-25
825506,0,0,2,0.000,0,2022-05-25 00:00:00,266.0,825506,1,True,2022-05-25
825507,0,0,2,12.020,1,2022-05-25 00:00:00,266.0,825507,1,True,2022-05-25
825508,0,0,3,2.018,0,2022-05-25 00:00:00,266.0,825508,2,True,2022-05-25
...,...,...,...,...,...,...,...,...,...,...,...
896251,15,1,0,,1,2022-06-15 23:00:00,,896251,64,False,2022-06-15
896252,15,1,1,,0,2022-06-15 23:00:00,,896252,59,False,2022-06-15
896253,15,1,1,,1,2022-06-15 23:00:00,,896253,59,False,2022-06-15
896254,15,1,3,,0,2022-06-15 23:00:00,,896254,60,False,2022-06-15


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-11 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-06-11 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,443.696000,443.696000,443.69600,57.526167,64.561143,112.051833,206.520000
2022-06-11 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,1.808000,1.808000,1.80800,0.000000,0.000000,0.534667,0.602286
2022-06-11 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,504.632000,504.632000,504.63200,7.910000,7.910000,102.931304,164.748429
2022-06-11 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,2.712000,2.712000,2.71200,0.917667,0.961000,0.561400,0.626429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-27 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,116.450571,182.941042,133.941429,192.54475,405.987375,409.156143,642.305958,877.301571
2022-06-27 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,56.135083,0.191143,0.00000,0.179542,0.000000,3.027083,1.324714
2022-06-27 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.890857,181.676625,123.583286,119.00550,193.261708,157.012143,446.434042,340.012571
2022-06-27 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.579571,51.345667,0.191143,0.00000,3.065417,0.135714,2.969500,0.000000


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.744,0,2022-05-25 00:00:00,266.0,825504,0,True,...,,,,,,,,,,
1,0,0,1,261.849,1,2022-05-25 00:00:00,266.0,825505,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-25 00:00:00,266.0,825506,1,True,...,,,,,,,,,,
3,0,0,2,12.020,1,2022-05-25 00:00:00,266.0,825507,1,True,...,,,,,,,,,,
4,0,0,3,2.018,0,2022-05-25 00:00:00,266.0,825508,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-15 23:00:00,,896253,59,False,...,,47.537583,42.132500,181.689958,154.248857,127.130500,145.714750,40.160286,272.538000,220.817286
70700,15,1,3,,0,2022-06-15 23:00:00,,896254,60,False,...,,115.060208,0.393200,7.359292,0.280857,0.655333,1.931417,2.358286,0.756958,1.799429
70701,15,1,3,,0,2022-06-15 23:00:00,,896254,60,False,...,,113.049458,0.327667,6.369458,0.280857,0.491500,1.589500,0.883286,0.856292,0.577571
70702,15,1,3,,1,2022-06-15 23:00:00,,896255,60,False,...,,542.916583,334.505600,186.060292,186.984857,150.529667,1178.320000,474.986571,771.390583,257.943857


0
Time to process test data: 18.884077310562134
Time to add test to df: 2.005831718444824
Day: 12
Preds predicted
Time to add test: 0.01562809944152832


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-13 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,0,0_0_1_0
2022-06-13 23:00:00,12,1,3,1.300,0,259.0,799748,49,23,0,12_1_3_0
2022-06-13 23:00:00,11,1,3,0.280,0,259.0,799746,48,23,0,11_1_3_0
2022-06-13 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,0,11_1_0_1
2022-06-13 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,0,0_0_1_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-29 23:00:00,14,1,1,98.738,1,,851219,55,23,2,14_1_1_1
2022-06-29 23:00:00,7,1,1,0.000,0,,851164,29,23,2,7_1_1_0
2022-06-29 23:00:00,13,1,1,19.880,1,,851211,63,23,2,13_1_1_1
2022-06-29 23:00:00,13,1,3,80.501,1,,851213,52,23,2,13_1_3_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-14 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,0,0_0_1_0
2022-06-14 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,1,0_0_1_0
2022-06-14 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,1,0_0_1_0
2022-06-14 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,1,0_0_1_0
2022-06-14 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,1,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-06-29 20:00:00,9,1,3,116.610,1,,850645,37,19,2,9_1_3_1
2022-06-29 21:00:00,9,1,3,123.902,1,,850779,37,20,2,9_1_3_1
2022-06-29 22:00:00,9,1,3,94.437,1,,850913,37,21,2,9_1_3_1
2022-06-29 23:00:00,9,1,3,94.907,1,,851047,37,22,2,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-13 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,0,...,,,,,,,,,,
2022-06-13 23:00:00,12,1,3,1.300,0,259.0,799748,49,23,0,...,,,,,,,,,,
2022-06-13 23:00:00,11,1,3,0.280,0,259.0,799746,48,23,0,...,,,,,,,,,,
2022-06-13 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,0,...,,,,,,,,,,
2022-06-13 23:00:00,0,0,1,321.824,1,259.0,799643,0,23,0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-29 23:00:00,14,1,1,98.738,1,,851219,55,23,2,...,,,100.536,105.151,105.583,100.211,92.691,87.873,111.337,100.417
2022-06-29 23:00:00,7,1,1,0.000,0,,851164,29,23,2,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-06-29 23:00:00,13,1,1,19.880,1,,851211,63,23,2,...,,,20.531,21.650,21.809,32.243,22.632,15.846,36.493,29.625
2022-06-29 23:00:00,13,1,3,80.501,1,,851213,52,23,2,...,,,642.542,355.383,367.825,489.491,561.300,637.584,491.503,645.005


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
828720,0,0,1,0.387,0,2022-05-26 00:00:00,267.0,828720,0,True,2022-05-26
828721,0,0,1,250.792,1,2022-05-26 00:00:00,267.0,828721,0,True,2022-05-26
828722,0,0,2,0.000,0,2022-05-26 00:00:00,267.0,828722,1,True,2022-05-26
828723,0,0,2,12.951,1,2022-05-26 00:00:00,267.0,828723,1,True,2022-05-26
828724,0,0,3,1.023,0,2022-05-26 00:00:00,267.0,828724,2,True,2022-05-26
...,...,...,...,...,...,...,...,...,...,...,...
899467,15,1,0,,1,2022-06-16 23:00:00,,899467,64,False,2022-06-16
899468,15,1,1,,0,2022-06-16 23:00:00,,899468,59,False,2022-06-16
899469,15,1,1,,1,2022-06-16 23:00:00,,899469,59,False,2022-06-16
899470,15,1,3,,0,2022-06-16 23:00:00,,899470,60,False,2022-06-16


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-13 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-06-13 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,321.824000,321.824000,321.82400,321.824000,321.824000,321.824000,321.824000
2022-06-13 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,0.799500,0.799500,0.79950,0.000000,0.000000,0.799500,0.799500
2022-06-13 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,168.332500,168.332500,168.33250,14.841000,14.841000,168.332500,168.332500
2022-06-13 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,1.808000,1.808000,1.80800,3.825000,3.825000,1.808000,1.808000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-06-29 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,189.915917,196.919714,157.73450,423.324333,464.416571,685.157375,504.344429
2022-06-29 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,60.164833,0.248000,0.00000,0.000500,0.000000,3.040167,2.284143
2022-06-29 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,192.196958,224.441000,138.06725,177.377083,51.152714,624.854458,130.805429
2022-06-29 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.722429,62.757833,0.248000,0.00000,4.970333,0.105143,2.395458,0.105143


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.387,0,2022-05-26 00:00:00,267.0,828720,0,True,...,,,,,,,,,,
1,0,0,1,250.792,1,2022-05-26 00:00:00,267.0,828721,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-26 00:00:00,267.0,828722,1,True,...,,,,,,,,,,
3,0,0,2,12.951,1,2022-05-26 00:00:00,267.0,828723,1,True,...,,,,,,,,,,
4,0,0,3,1.023,0,2022-05-26 00:00:00,267.0,828724,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-16 23:00:00,,899469,59,False,...,,42.468667,40.8296,191.428958,147.253286,128.510750,179.633917,164.657857,711.938542,275.777571
70700,15,1,3,,0,2022-06-16 23:00:00,,899470,60,False,...,,68.465125,0.4835,1.483917,0.276286,0.644667,5.688958,8.900143,2.678083,8.787857
70701,15,1,3,,0,2022-06-16 23:00:00,,899470,60,False,...,,63.638250,0.3868,1.234542,0.276286,0.483500,2.880833,0.498000,0.202750,0.497857
70702,15,1,3,,1,2022-06-16 23:00:00,,899471,60,False,...,,582.513042,340.1350,207.034167,145.784143,155.741000,1129.206000,486.988429,782.481250,258.467857


0
Time to process test data: 19.287684679031372
Time to add test to df: 1.9856066703796387
Day: 13
Preds predicted
Time to add test: 0.01563119888305664


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-15 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,2,0_0_1_0
2022-06-15 23:00:00,5,1,0,1100.700,1,259.0,799693,21,23,2,5_1_0_1
2022-06-15 23:00:00,7,1,0,0.000,0,259.0,799704,28,23,2,7_1_0_0
2022-06-15 23:00:00,0,1,3,6788.992,1,259.0,799655,5,23,2,0_1_3_1
2022-06-15 23:00:00,7,1,1,142.565,1,259.0,799707,29,23,2,7_1_1_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-01 23:00:00,4,0,1,0.018,0,,851134,15,23,4,4_0_1_0
2022-07-01 23:00:00,3,1,3,747.461,1,,851133,14,23,4,3_1_3_1
2022-07-01 23:00:00,14,0,3,124.207,1,,851217,54,23,4,14_0_3_1
2022-07-01 23:00:00,3,0,1,0.000,0,,851126,11,23,4,3_0_1_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-16 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,2,0_0_1_0
2022-06-16 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,3,0_0_1_0
2022-06-16 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,3,0_0_1_0
2022-06-16 03:00:00,0,0,1,0.347,0,260.0,800044,0,2,3,0_0_1_0
2022-06-16 04:00:00,0,0,1,1.361,0,260.0,800178,0,3,3,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-01 20:00:00,9,1,3,116.610,1,,850645,37,19,4,9_1_3_1
2022-07-01 21:00:00,9,1,3,123.902,1,,850779,37,20,4,9_1_3_1
2022-07-01 22:00:00,9,1,3,94.437,1,,850913,37,21,4,9_1_3_1
2022-07-01 23:00:00,9,1,3,94.907,1,,851047,37,22,4,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-15 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,2,...,,,,,,,,,,
2022-06-15 23:00:00,5,1,0,1100.700,1,259.0,799693,21,23,2,...,,,,,,,,,,
2022-06-15 23:00:00,7,1,0,0.000,0,259.0,799704,28,23,2,...,,,,,,,,,,
2022-06-15 23:00:00,0,1,3,6788.992,1,259.0,799655,5,23,2,...,,,,,,,,,,
2022-06-15 23:00:00,7,1,1,142.565,1,259.0,799707,29,23,2,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-01 23:00:00,4,0,1,0.018,0,,851134,15,23,4,...,,,0.001,0.002,0.000,0.001,0.005,0.000,0.002,0.003
2022-07-01 23:00:00,3,1,3,747.461,1,,851133,14,23,4,...,,,739.677,777.866,761.635,754.692,735.991,725.900,731.024,741.122
2022-07-01 23:00:00,14,0,3,124.207,1,,851217,54,23,4,...,,,114.868,120.525,124.748,147.345,116.591,106.001,135.410,142.885
2022-07-01 23:00:00,3,0,1,0.000,0,,851126,11,23,4,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
831936,0,0,1,0.664,0,2022-05-27 00:00:00,268.0,831936,0,True,2022-05-27
831937,0,0,1,282.414,1,2022-05-27 00:00:00,268.0,831937,0,True,2022-05-27
831938,0,0,2,0.000,0,2022-05-27 00:00:00,268.0,831938,1,True,2022-05-27
831939,0,0,2,9.999,1,2022-05-27 00:00:00,268.0,831939,1,True,2022-05-27
831940,0,0,3,2.056,0,2022-05-27 00:00:00,268.0,831940,2,True,2022-05-27
...,...,...,...,...,...,...,...,...,...,...,...
902683,15,1,0,,1,2022-06-17 23:00:00,,902683,64,False,2022-06-17
902684,15,1,1,,0,2022-06-17 23:00:00,,902684,59,False,2022-06-17
902685,15,1,1,,1,2022-06-17 23:00:00,,902685,59,False,2022-06-17
902686,15,1,3,,0,2022-06-17 23:00:00,,902686,60,False,2022-06-17


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-15 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.5990,1.599000,1.599000,1.599000,1.599000
2022-06-15 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,443.696000,443.696000,443.6960,64.372300,66.865714,120.120591,209.609286
2022-06-15 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,1.808000,1.808000,1.8080,0.000000,0.000000,0.510364,0.602286
2022-06-15 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,504.632000,504.632000,504.6320,7.910000,7.910000,110.515667,167.329286
2022-06-15 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,2.712000,2.712000,2.7120,0.917667,0.987000,0.534667,0.602286
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-01 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,116.450571,181.676625,123.583286,119.0055,405.987375,409.156143,585.281250,725.819714
2022-07-01 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,56.135083,0.191143,0.0000,0.170750,0.000000,3.026708,6.572286
2022-07-01 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.890857,201.710417,122.845571,173.1270,159.633250,77.682571,783.123958,64.885000
2022-07-01 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.579571,51.345667,0.191143,0.0000,3.102333,0.147857,2.973042,0.000000


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.664,0,2022-05-27 00:00:00,268.0,831936,0,True,...,,,,,,,,,,
1,0,0,1,282.414,1,2022-05-27 00:00:00,268.0,831937,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-27 00:00:00,268.0,831938,1,True,...,,,,,,,,,,
3,0,0,2,9.999,1,2022-05-27 00:00:00,268.0,831939,1,True,...,,,,,,,,,,
4,0,0,3,2.056,0,2022-05-27 00:00:00,268.0,831940,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-17 23:00:00,,902685,59,False,...,,30.366917,40.29950,190.202208,145.339429,127.539750,166.248167,113.534000,288.635292,318.899286
70700,15,1,3,,0,2022-06-17 23:00:00,,902686,60,False,...,,307.272167,1.33200,8.537875,0.570857,1.332000,4.251792,7.075857,2.149667,6.961714
70701,15,1,3,,0,2022-06-17 23:00:00,,902686,60,False,...,,282.290083,0.99900,5.487083,0.570857,0.999000,4.228542,6.216714,2.208000,0.491429
70702,15,1,3,,1,2022-06-17 23:00:00,,902687,60,False,...,,419.658500,335.76300,189.660292,188.442143,156.769333,907.979250,481.210571,707.040250,254.707857


0
Time to process test data: 19.208908796310425
Time to add test to df: 2.0444960594177246
Day: 14
Preds predicted
Time to add test: 0.03063225746154785


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-17 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,4,0_0_1_0
2022-06-17 23:00:00,10,1,1,0.000,0,259.0,799728,40,23,4,10_1_1_0
2022-06-17 23:00:00,12,1,3,1.300,0,259.0,799748,49,23,4,12_1_3_0
2022-06-17 23:00:00,11,1,3,0.280,0,259.0,799746,48,23,4,11_1_3_0
2022-06-17 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,4,11_1_0_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-03 23:00:00,13,1,3,0.000,0,,851212,52,23,6,13_1_3_0
2022-07-03 23:00:00,14,1,1,98.738,1,,851219,55,23,6,14_1_1_1
2022-07-03 23:00:00,7,1,1,0.000,0,,851164,29,23,6,7_1_1_0
2022-07-03 23:00:00,13,0,1,8.755,1,,851207,50,23,6,13_0_1_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-18 00:00:00,0,0,1,1.599,0,259.0,799642,0,23,4,0_0_1_0
2022-06-18 01:00:00,0,0,1,2.695,0,,896256,0,0,5,0_0_1_0
2022-06-18 01:00:00,0,0,1,0.326,0,260.0,799776,0,0,5,0_0_1_0
2022-06-18 02:00:00,0,0,1,1.384,0,,896390,0,1,5,0_0_1_0
2022-06-18 02:00:00,0,0,1,1.438,0,260.0,799910,0,1,5,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-03 20:00:00,9,1,3,116.610,1,,850645,37,19,6,9_1_3_1
2022-07-03 21:00:00,9,1,3,123.902,1,,850779,37,20,6,9_1_3_1
2022-07-03 22:00:00,9,1,3,94.437,1,,850913,37,21,6,9_1_3_1
2022-07-03 23:00:00,9,1,3,94.907,1,,851047,37,22,6,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-17 23:00:00,0,0,1,1.599,0,259.0,799642,0,23,4,...,,,,,,,,,,
2022-06-17 23:00:00,10,1,1,0.000,0,259.0,799728,40,23,4,...,,,,,,,,,,
2022-06-17 23:00:00,12,1,3,1.300,0,259.0,799748,49,23,4,...,,,,,,,,,,
2022-06-17 23:00:00,11,1,3,0.280,0,259.0,799746,48,23,4,...,,,,,,,,,,
2022-06-17 23:00:00,11,1,0,347.447,1,259.0,799741,67,23,4,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-03 23:00:00,13,1,3,0.000,0,,851212,52,23,6,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-07-03 23:00:00,14,1,1,98.738,1,,851219,55,23,6,...,,,100.536,105.151,105.583,100.211,92.691,87.873,111.337,100.417
2022-07-03 23:00:00,7,1,1,0.000,0,,851164,29,23,6,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-07-03 23:00:00,13,0,1,8.755,1,,851207,50,23,6,...,,,6.397,7.129,11.129,10.394,7.341,5.790,7.256,11.858


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
835152,0,0,1,1.111,0,2022-05-28 00:00:00,269.0,835152,0,True,2022-05-28
835153,0,0,1,287.297,1,2022-05-28 00:00:00,269.0,835153,0,True,2022-05-28
835154,0,0,2,0.000,0,2022-05-28 00:00:00,269.0,835154,1,True,2022-05-28
835155,0,0,2,12.907,1,2022-05-28 00:00:00,269.0,835155,1,True,2022-05-28
835156,0,0,3,1.382,0,2022-05-28 00:00:00,269.0,835156,2,True,2022-05-28
...,...,...,...,...,...,...,...,...,...,...,...
905899,15,1,0,,1,2022-06-18 23:00:00,,905899,64,False,2022-06-18
905900,15,1,1,,0,2022-06-18 23:00:00,,905900,59,False,2022-06-18
905901,15,1,1,,1,2022-06-18 23:00:00,,905901,59,False,2022-06-18
905902,15,1,3,,0,2022-06-18 23:00:00,,905902,60,False,2022-06-18


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-17 23:00:00,0,0,1,1.599,0,259.0,799642,0,0_0_1_0,,...,,1.599000,1.599000,1.599000,1.599000,1.59900,1.599000,1.599000,1.599000,1.599000
2022-06-17 23:00:00,0,0,1,321.824,1,259.0,799643,0,0_0_1_1,,...,,321.824000,321.824000,321.824000,321.824000,321.82400,321.824000,321.824000,321.824000,321.824000
2022-06-17 23:00:00,0,0,2,0.000,0,259.0,799644,1,0_0_2_0,,...,,0.000000,0.000000,0.799500,0.799500,0.79950,0.000000,0.000000,0.799500,0.799500
2022-06-17 23:00:00,0,0,2,14.841,1,259.0,799645,1,0_0_2_1,,...,,14.841000,14.841000,168.332500,168.332500,168.33250,14.841000,14.841000,168.332500,168.332500
2022-06-17 23:00:00,0,0,3,3.825,0,259.0,799646,2,0_0_3_0,,...,,3.825000,3.825000,1.808000,1.808000,1.80800,3.825000,3.825000,1.808000,1.808000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-03 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,189.915917,196.919714,138.10900,423.324333,464.416571,799.801333,504.344429
2022-07-03 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,53.938667,0.248000,0.00000,0.000500,0.000000,3.040167,0.135714
2022-07-03 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,192.196958,224.441000,182.14675,175.678542,182.865714,711.329250,1558.520286
2022-07-03 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.722429,62.757833,0.248000,0.00000,5.365375,0.105143,2.357958,0.105143


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,1.111,0,2022-05-28 00:00:00,269.0,835152,0,True,...,,,,,,,,,,
1,0,0,1,287.297,1,2022-05-28 00:00:00,269.0,835153,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-28 00:00:00,269.0,835154,1,True,...,,,,,,,,,,
3,0,0,2,12.907,1,2022-05-28 00:00:00,269.0,835155,1,True,...,,,,,,,,,,
4,0,0,3,1.382,0,2022-05-28 00:00:00,269.0,835156,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70699,15,1,1,,1,2022-06-18 23:00:00,,905901,59,False,...,,45.592917,40.510000,193.407375,192.782000,213.713250,157.749417,93.154286,486.593292,767.632286
70700,15,1,3,,0,2022-06-18 23:00:00,,905902,60,False,...,,120.223250,2.507000,12.674292,0.835667,1.671333,4.227292,7.162571,2.189042,7.050000
70701,15,1,3,,0,2022-06-18 23:00:00,,905902,60,False,...,,114.092000,1.671333,5.636042,0.716286,1.253500,2.350625,0.617429,0.180083,0.045714
70702,15,1,3,,1,2022-06-18 23:00:00,,905903,60,False,...,,522.664958,344.603000,190.556583,168.000500,166.895667,936.629167,493.790571,714.927000,259.601857


0
Time to process test data: 19.117679119110107
Time to add test to df: 2.067335367202759
Day: 15
Preds predicted
Time to add test: 0.03828835487365723


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-19 00:00:00,15,1,3,300.407,1,,899605,60,0,6,15_1_3_1
2022-06-19 00:00:00,11,1,0,0.000,0,,899570,67,0,6,11_1_0_0
2022-06-19 00:00:00,11,0,3,237.516,1,,899569,45,0,6,11_0_3_1
2022-06-19 00:00:00,11,0,3,0.710,0,,899568,45,0,6,11_0_3_0
2022-06-19 00:00:00,11,0,2,1.674,1,,899567,44,0,6,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-05 23:00:00,3,1,3,747.461,1,,851133,14,23,1,3_1_3_1
2022-07-05 23:00:00,14,0,3,124.207,1,,851217,54,23,1,14_0_3_1
2022-07-05 23:00:00,3,0,1,0.000,0,,851126,11,23,1,3_0_1_0
2022-07-05 23:00:00,5,1,3,569.723,1,,851153,23,23,1,5_1_3_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-19 01:00:00,0,0,1,1.332,0,,899472,0,0,6,0_0_1_0
2022-06-19 02:00:00,0,0,1,1.161,0,,899606,0,1,6,0_0_1_0
2022-06-19 03:00:00,0,0,1,0.794,0,,899740,0,2,6,0_0_1_0
2022-06-19 04:00:00,0,0,1,1.129,0,,899874,0,3,6,0_0_1_0
2022-06-19 05:00:00,0,0,1,1.110,0,,900008,0,4,6,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-05 20:00:00,9,1,3,116.610,1,,850645,37,19,1,9_1_3_1
2022-07-05 21:00:00,9,1,3,123.902,1,,850779,37,20,1,9_1_3_1
2022-07-05 22:00:00,9,1,3,94.437,1,,850913,37,21,1,9_1_3_1
2022-07-05 23:00:00,9,1,3,94.907,1,,851047,37,22,1,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-19 00:00:00,15,1,3,300.407,1,,899605,60,0,6,...,,,,,,,,,,
2022-06-19 00:00:00,11,1,0,0.000,0,,899570,67,0,6,...,,,,,,,,,,
2022-06-19 00:00:00,11,0,3,237.516,1,,899569,45,0,6,...,,,,,,,,,,
2022-06-19 00:00:00,11,0,3,0.710,0,,899568,45,0,6,...,,,,,,,,,,
2022-06-19 00:00:00,11,0,2,1.674,1,,899567,44,0,6,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-05 23:00:00,3,1,3,747.461,1,,851133,14,23,1,...,,,739.677,777.866,761.635,754.692,735.991,725.900,731.024,741.122
2022-07-05 23:00:00,14,0,3,124.207,1,,851217,54,23,1,...,,,114.868,120.525,124.748,147.345,116.591,106.001,135.410,142.885
2022-07-05 23:00:00,3,0,1,0.000,0,,851126,11,23,1,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-07-05 23:00:00,5,1,3,569.723,1,,851153,23,23,1,...,,,576.414,602.576,577.273,582.078,597.997,439.765,636.051,630.875


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
838368,0,0,1,0.347,0,2022-05-29 00:00:00,270.0,838368,0,True,2022-05-29
838369,0,0,1,333.312,1,2022-05-29 00:00:00,270.0,838369,0,True,2022-05-29
838370,0,0,2,0.000,0,2022-05-29 00:00:00,270.0,838370,1,True,2022-05-29
838371,0,0,2,15.525,1,2022-05-29 00:00:00,270.0,838371,1,True,2022-05-29
838372,0,0,3,1.746,0,2022-05-29 00:00:00,270.0,838372,2,True,2022-05-29
...,...,...,...,...,...,...,...,...,...,...,...
909115,15,1,0,,1,2022-06-19 23:00:00,,909115,64,False,2022-06-19
909116,15,1,1,,0,2022-06-19 23:00:00,,909116,59,False,2022-06-19
909117,15,1,1,,1,2022-06-19 23:00:00,,909117,59,False,2022-06-19
909118,15,1,3,,0,2022-06-19 23:00:00,,909118,60,False,2022-06-19


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-19 00:00:00,0,0,1,1.332,0,,899472,0,0_0_1_0,,...,,1.332000,1.332000,1.332000,1.332000,1.332000,0.243625,0.207571,0.377125,0.615857
2022-06-19 00:00:00,0,0,1,277.751,1,,899473,0,0_0_1_1,,...,,277.751000,277.751000,351.227333,351.227333,351.227333,51.506400,53.434000,88.283217,167.093143
2022-06-19 00:00:00,0,0,2,0.000,0,,899474,1,0_0_2_0,,...,,0.000000,0.000000,1.552000,1.552000,1.552000,0.000000,0.000000,0.408875,0.537000
2022-06-19 00:00:00,0,0,2,12.373,1,,899475,1,0_0_2_1,,...,,12.373000,12.373000,387.965500,387.965500,387.965500,7.023500,7.023500,79.671045,130.320857
2022-06-19 00:00:00,0,0,3,3.324,0,,899476,2,0_0_3_0,,...,,3.324000,3.324000,2.328000,2.328000,2.328000,0.714909,0.948143,0.426652,0.539857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-05 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,116.450571,199.429375,123.583286,138.521250,405.987375,409.156143,589.942458,726.425714
2022-07-05 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,56.135083,0.191143,0.000000,0.170750,0.000000,3.030542,6.584429
2022-07-05 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.890857,201.710417,113.343714,118.834750,167.900375,68.269000,783.123958,68.716000
2022-07-05 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.579571,51.345667,0.191143,0.000000,3.104125,0.147857,2.973042,0.000000


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.347,0,2022-05-29 00:00:00,270.0,838368,0,True,...,,,,,,,,,,
1,0,0,1,333.312,1,2022-05-29 00:00:00,270.0,838369,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-29 00:00:00,270.0,838370,1,True,...,,,,,,,,,,
3,0,0,2,15.525,1,2022-05-29 00:00:00,270.0,838371,1,True,...,,,,,,,,,,
4,0,0,3,1.746,0,2022-05-29 00:00:00,270.0,838372,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67663,15,1,1,,1,2022-06-19 23:00:00,,909117,59,False,...,,22.278500,41.466,159.321417,165.211333,209.474500,165.546625,154.656143,763.033458,649.469571
67664,15,1,3,,0,2022-06-19 23:00:00,,909118,60,False,...,,209.738292,1.002,81.497917,0.334000,0.334000,3.107583,0.203286,0.131583,0.239571
67665,15,1,3,,0,2022-06-19 23:00:00,,909118,60,False,...,,209.570917,0.501,56.787292,0.200400,0.250500,3.225542,0.712714,0.236042,0.564000
67666,15,1,3,,1,2022-06-19 23:00:00,,909119,60,False,...,,385.698958,330.582,153.254708,161.317333,161.317333,1028.909292,1245.622714,644.121500,450.559857


0
Time to process test data: 19.061984062194824
Time to add test to df: 2.0173850059509277
Day: 16
Preds predicted
Time to add test: 0.02540874481201172


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-20 00:00:00,15,1,3,349.819,1,,902821,60,0,0,15_1_3_1
2022-06-20 00:00:00,11,1,0,0.000,0,,902786,67,0,0,11_1_0_0
2022-06-20 00:00:00,11,0,3,259.353,1,,902785,45,0,0,11_0_3_1
2022-06-20 00:00:00,11,0,3,1.029,0,,902784,45,0,0,11_0_3_0
2022-06-20 00:00:00,11,0,2,1.250,1,,902783,44,0,0,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-07 23:00:00,13,1,3,0.000,0,,851212,52,23,3,13_1_3_0
2022-07-07 23:00:00,14,1,1,98.738,1,,851219,55,23,3,14_1_1_1
2022-07-07 23:00:00,7,1,1,0.000,0,,851164,29,23,3,7_1_1_0
2022-07-07 23:00:00,12,1,3,570.265,1,,851205,49,23,3,12_1_3_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-20 01:00:00,0,0,1,0.924,0,,902688,0,0,0,0_0_1_0
2022-06-20 02:00:00,0,0,1,0.695,0,,902822,0,1,0,0_0_1_0
2022-06-20 03:00:00,0,0,1,0.484,0,,902956,0,2,0,0_0_1_0
2022-06-20 04:00:00,0,0,1,0.449,0,,903090,0,3,0,0_0_1_0
2022-06-20 05:00:00,0,0,1,1.491,0,,903224,0,4,0,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-07 20:00:00,9,1,3,116.610,1,,850645,37,19,3,9_1_3_1
2022-07-07 21:00:00,9,1,3,123.902,1,,850779,37,20,3,9_1_3_1
2022-07-07 22:00:00,9,1,3,94.437,1,,850913,37,21,3,9_1_3_1
2022-07-07 23:00:00,9,1,3,94.907,1,,851047,37,22,3,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-20 00:00:00,15,1,3,349.819,1,,902821,60,0,0,...,,,,,,,,,,
2022-06-20 00:00:00,11,1,0,0.000,0,,902786,67,0,0,...,,,,,,,,,,
2022-06-20 00:00:00,11,0,3,259.353,1,,902785,45,0,0,...,,,,,,,,,,
2022-06-20 00:00:00,11,0,3,1.029,0,,902784,45,0,0,...,,,,,,,,,,
2022-06-20 00:00:00,11,0,2,1.250,1,,902783,44,0,0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-07 23:00:00,13,1,3,0.000,0,,851212,52,23,3,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-07-07 23:00:00,14,1,1,98.738,1,,851219,55,23,3,...,,,100.536,105.151,105.583,100.211,92.691,87.873,111.337,100.417
2022-07-07 23:00:00,7,1,1,0.000,0,,851164,29,23,3,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-07-07 23:00:00,12,1,3,570.265,1,,851205,49,23,3,...,,,575.473,427.516,452.007,529.867,576.029,545.517,415.084,594.044


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
841584,0,0,1,0.795,0,2022-05-30 00:00:00,271.0,841584,0,True,2022-05-30
841585,0,0,1,322.507,1,2022-05-30 00:00:00,271.0,841585,0,True,2022-05-30
841586,0,0,2,0.000,0,2022-05-30 00:00:00,271.0,841586,1,True,2022-05-30
841587,0,0,2,12.182,1,2022-05-30 00:00:00,271.0,841587,1,True,2022-05-30
841588,0,0,3,1.073,0,2022-05-30 00:00:00,271.0,841588,2,True,2022-05-30
...,...,...,...,...,...,...,...,...,...,...,...
912331,15,1,0,,1,2022-06-20 23:00:00,,912331,64,False,2022-06-20
912332,15,1,1,,0,2022-06-20 23:00:00,,912332,59,False,2022-06-20
912333,15,1,1,,1,2022-06-20 23:00:00,,912333,59,False,2022-06-20
912334,15,1,3,,0,2022-06-20 23:00:00,,912334,60,False,2022-06-20


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-20 00:00:00,0,0,1,0.924,0,,902688,0,0_0_1_0,,...,,0.924000,0.924000,0.924000,0.924000,0.92400,0.252000,0.154429,0.346938,0.442714
2022-06-20 00:00:00,0,0,1,295.605,1,,902689,0,0_0_1_1,,...,,295.605000,295.605000,345.970000,345.970000,345.97000,54.343400,56.485571,90.602217,165.191571
2022-06-20 00:00:00,0,0,2,0.000,0,,902690,1,0_0_2_0,,...,,0.000000,0.000000,0.860000,0.860000,0.86000,0.000000,0.000000,0.333875,0.348714
2022-06-20 00:00:00,0,0,2,16.511,1,,902691,1,0_0_2_1,,...,,16.511000,16.511000,371.152500,371.152500,371.15250,8.880500,8.880500,81.283909,126.111571
2022-06-20 00:00:00,0,0,3,1.656,0,,902692,2,0_0_3_0,,...,,1.656000,1.656000,1.290000,1.290000,1.29000,0.545182,0.640000,0.348391,0.351714
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-07 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,189.915917,196.919714,138.10900,423.324333,464.416571,772.501208,425.718000
2022-07-07 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,53.938667,0.248000,0.00000,0.017250,0.000000,3.040167,1.460429
2022-07-07 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,192.196958,224.441000,182.14675,175.678542,181.577143,884.390458,2002.585143
2022-07-07 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.722429,62.757833,0.248000,0.00000,5.393833,0.155857,2.357625,0.105143


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.795,0,2022-05-30 00:00:00,271.0,841584,0,True,...,,,,,,,,,,
1,0,0,1,322.507,1,2022-05-30 00:00:00,271.0,841585,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-30 00:00:00,271.0,841586,1,True,...,,,,,,,,,,
3,0,0,2,12.182,1,2022-05-30 00:00:00,271.0,841587,1,True,...,,,,,,,,,,
4,0,0,3,1.073,0,2022-05-30 00:00:00,271.0,841588,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67531,15,1,0,,1,2022-06-20 23:00:00,,912331,64,False,...,,121.180667,112.751,137.403458,135.662667,135.662667,319.826833,211.266000,301.165458,195.186857
67532,15,1,1,,0,2022-06-20 23:00:00,,912332,59,False,...,,70.262000,0.000,21.798083,0.000000,0.000000,0.075500,0.000000,2.743958,1.460429
67533,15,1,1,,1,2022-06-20 23:00:00,,912333,59,False,...,,19.119542,36.757,137.555208,147.118500,147.118500,141.726667,90.119750,298.061917,190.586571
67534,15,1,3,,0,2022-06-20 23:00:00,,912334,60,False,...,,152.047667,2.000,16.193750,0.666667,0.666667,2.886917,0.308286,0.131583,0.286143


0
Time to process test data: 19.158453464508057
Time to add test to df: 2.0473251342773438
Day: 17
Preds predicted
Time to add test: 0.024749040603637695


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-21 00:00:00,15,1,3,289.824,1,,906037,60,0,1,15_1_3_1
2022-06-21 00:00:00,11,1,0,0.000,0,,906002,67,0,1,11_1_0_0
2022-06-21 00:00:00,11,0,3,244.368,1,,906001,45,0,1,11_0_3_1
2022-06-21 00:00:00,11,0,3,0.749,0,,906000,45,0,1,11_0_3_0
2022-06-21 00:00:00,11,0,2,1.247,1,,905999,44,0,1,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-09 23:00:00,4,0,1,0.018,0,,851134,15,23,5,4_0_1_0
2022-07-09 23:00:00,3,1,3,747.461,1,,851133,14,23,5,3_1_3_1
2022-07-09 23:00:00,14,0,3,124.207,1,,851217,54,23,5,14_0_3_1
2022-07-09 23:00:00,0,1,2,0.000,0,,851108,61,23,5,0_1_2_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-21 01:00:00,0,0,1,0.590,0,,905904,0,0,1,0_0_1_0
2022-06-21 02:00:00,0,0,1,0.553,0,,906038,0,1,1,0_0_1_0
2022-06-21 03:00:00,0,0,1,0.586,0,,906172,0,2,1,0_0_1_0
2022-06-21 04:00:00,0,0,1,0.589,0,,906306,0,3,1,0_0_1_0
2022-06-21 05:00:00,0,0,1,3.845,0,,906440,0,4,1,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-09 20:00:00,9,1,3,116.610,1,,850645,37,19,5,9_1_3_1
2022-07-09 21:00:00,9,1,3,123.902,1,,850779,37,20,5,9_1_3_1
2022-07-09 22:00:00,9,1,3,94.437,1,,850913,37,21,5,9_1_3_1
2022-07-09 23:00:00,9,1,3,94.907,1,,851047,37,22,5,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-21 00:00:00,15,1,3,289.824,1,,906037,60,0,1,...,,,,,,,,,,
2022-06-21 00:00:00,11,1,0,0.000,0,,906002,67,0,1,...,,,,,,,,,,
2022-06-21 00:00:00,11,0,3,244.368,1,,906001,45,0,1,...,,,,,,,,,,
2022-06-21 00:00:00,11,0,3,0.749,0,,906000,45,0,1,...,,,,,,,,,,
2022-06-21 00:00:00,11,0,2,1.247,1,,905999,44,0,1,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-09 23:00:00,4,0,1,0.018,0,,851134,15,23,5,...,,,0.001,0.002,0.000,0.001,0.005,0.000,0.002,0.003
2022-07-09 23:00:00,3,1,3,747.461,1,,851133,14,23,5,...,,,739.677,777.866,761.635,754.692,735.991,725.900,731.024,741.122
2022-07-09 23:00:00,14,0,3,124.207,1,,851217,54,23,5,...,,,114.868,120.525,124.748,147.345,116.591,106.001,135.410,142.885
2022-07-09 23:00:00,0,1,2,0.000,0,,851108,61,23,5,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
844800,0,0,1,0.660,0,2022-05-31 00:00:00,272.0,844800,0,True,2022-05-31
844801,0,0,1,246.110,1,2022-05-31 00:00:00,272.0,844801,0,True,2022-05-31
844802,0,0,2,0.000,0,2022-05-31 00:00:00,272.0,844802,1,True,2022-05-31
844803,0,0,2,12.608,1,2022-05-31 00:00:00,272.0,844803,1,True,2022-05-31
844804,0,0,3,2.106,0,2022-05-31 00:00:00,272.0,844804,2,True,2022-05-31
...,...,...,...,...,...,...,...,...,...,...,...
915547,15,1,0,,1,2022-06-21 23:00:00,,915547,64,False,2022-06-21
915548,15,1,1,,0,2022-06-21 23:00:00,,915548,59,False,2022-06-21
915549,15,1,1,,1,2022-06-21 23:00:00,,915549,59,False,2022-06-21
915550,15,1,3,,0,2022-06-21 23:00:00,,915550,60,False,2022-06-21


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-21 00:00:00,0,0,1,0.590,0,,905904,0,0_0_1_0,,...,,0.590000,0.590000,0.590000,0.590000,0.590000,0.102625,0.085286,0.325562,0.471571
2022-06-21 00:00:00,0,0,1,270.771,1,,905905,0,0_0_1_1,,...,,270.771000,270.771000,364.840333,364.840333,364.840333,49.944600,51.874000,91.934391,175.896571
2022-06-21 00:00:00,0,0,2,0.000,0,,905906,1,0_0_2_0,,...,,0.000000,0.000000,0.484667,0.484667,0.484667,0.000000,0.000000,0.308667,0.310857
2022-06-21 00:00:00,0,0,2,15.917,1,,905907,1,0_0_2_1,,...,,15.917000,15.917000,411.875000,411.875000,411.875000,8.582000,8.582000,83.805455,139.848000
2022-06-21 00:00:00,0,0,3,0.864,0,,905908,2,0_0_3_0,,...,,0.864000,0.864000,0.727000,0.727000,0.727000,0.598818,0.700429,0.322087,0.314143
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-09 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,116.450571,181.676625,110.410286,138.521250,405.987375,409.156143,608.449500,796.392000
2022-07-09 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,56.135083,0.191143,0.000000,0.186833,0.000000,3.027000,8.720714
2022-07-09 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.890857,201.710417,113.343714,118.834750,170.441250,77.682571,783.123958,64.312714
2022-07-09 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.579571,51.345667,0.191143,0.000000,4.983708,0.135714,2.969500,0.000000


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,0.660,0,2022-05-31 00:00:00,272.0,844800,0,True,...,,,,,,,,,,
1,0,0,1,246.110,1,2022-05-31 00:00:00,272.0,844801,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-05-31 00:00:00,272.0,844802,1,True,...,,,,,,,,,,
3,0,0,2,12.608,1,2022-05-31 00:00:00,272.0,844803,1,True,...,,,,,,,,,,
4,0,0,3,2.106,0,2022-05-31 00:00:00,272.0,844804,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67531,15,1,0,,1,2022-06-21 23:00:00,,915547,64,False,...,,110.585292,104.252,122.388792,148.548,148.548,279.682708,211.917500,293.748417,183.403714
67532,15,1,1,,0,2022-06-21 23:00:00,,915548,59,False,...,,99.786167,0.000,51.458375,0.000,0.000,1.074583,0.000000,2.511208,1.498714
67533,15,1,1,,1,2022-06-21 23:00:00,,915549,59,False,...,,17.226708,32.054,123.052792,170.696,170.696,138.025875,89.650750,291.611083,179.331143
67534,15,1,3,,0,2022-06-21 23:00:00,,915550,60,False,...,,232.920875,2.001,42.841375,0.667,0.667,2.690833,0.340286,0.145500,0.287000


0
Time to process test data: 19.453450679779053
Time to add test to df: 2.0518269538879395
Day: 18
Preds predicted
Time to add test: 0.03124713897705078


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-22 00:00:00,15,1,3,306.971,1,,909253,60,0,2,15_1_3_1
2022-06-22 00:00:00,11,1,0,0.000,0,,909218,67,0,2,11_1_0_0
2022-06-22 00:00:00,11,0,3,279.501,1,,909217,45,0,2,11_0_3_1
2022-06-22 00:00:00,11,0,3,0.753,0,,909216,45,0,2,11_0_3_0
2022-06-22 00:00:00,11,0,2,1.199,1,,909215,44,0,2,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-11 23:00:00,14,1,3,1074.844,1,,851221,56,23,0,14_1_3_1
2022-07-11 23:00:00,13,1,3,0.000,0,,851212,52,23,0,13_1_3_0
2022-07-11 23:00:00,14,1,1,98.738,1,,851219,55,23,0,14_1_1_1
2022-07-11 23:00:00,13,0,1,8.755,1,,851207,50,23,0,13_0_1_1


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-22 01:00:00,0,0,1,1.055,0,,909120,0,0,2,0_0_1_0
2022-06-22 02:00:00,0,0,1,1.080,0,,909254,0,1,2,0_0_1_0
2022-06-22 03:00:00,0,0,1,0.718,0,,909388,0,2,2,0_0_1_0
2022-06-22 04:00:00,0,0,1,0.844,0,,909522,0,3,2,0_0_1_0
2022-06-22 05:00:00,0,0,1,2.081,0,,909656,0,4,2,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-11 20:00:00,9,1,3,116.610,1,,850645,37,19,0,9_1_3_1
2022-07-11 21:00:00,9,1,3,123.902,1,,850779,37,20,0,9_1_3_1
2022-07-11 22:00:00,9,1,3,94.437,1,,850913,37,21,0,9_1_3_1
2022-07-11 23:00:00,9,1,3,94.907,1,,851047,37,22,0,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-22 00:00:00,15,1,3,306.971,1,,909253,60,0,2,...,,,,,,,,,,
2022-06-22 00:00:00,11,1,0,0.000,0,,909218,67,0,2,...,,,,,,,,,,
2022-06-22 00:00:00,11,0,3,279.501,1,,909217,45,0,2,...,,,,,,,,,,
2022-06-22 00:00:00,11,0,3,0.753,0,,909216,45,0,2,...,,,,,,,,,,
2022-06-22 00:00:00,11,0,2,1.199,1,,909215,44,0,2,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-11 23:00:00,14,1,3,1074.844,1,,851221,56,23,0,...,,,887.351,568.845,523.650,801.878,931.881,954.863,497.578,815.446
2022-07-11 23:00:00,13,1,3,0.000,0,,851212,52,23,0,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000
2022-07-11 23:00:00,14,1,1,98.738,1,,851219,55,23,0,...,,,100.536,105.151,105.583,100.211,92.691,87.873,111.337,100.417
2022-07-11 23:00:00,13,0,1,8.755,1,,851207,50,23,0,...,,,6.397,7.129,11.129,10.394,7.341,5.790,7.256,11.858


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['datetime'] = df['datetime'] + dt.timedelta(days=1)


'TRAIN'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,date
848016,0,0,1,1.851,0,2022-06-01 00:00:00,273.0,848016,0,True,2022-06-01
848017,0,0,1,306.098,1,2022-06-01 00:00:00,273.0,848017,0,True,2022-06-01
848018,0,0,2,0.000,0,2022-06-01 00:00:00,273.0,848018,1,True,2022-06-01
848019,0,0,2,14.681,1,2022-06-01 00:00:00,273.0,848019,1,True,2022-06-01
848020,0,0,3,2.490,0,2022-06-01 00:00:00,273.0,848020,2,True,2022-06-01
...,...,...,...,...,...,...,...,...,...,...,...
918763,15,1,0,,1,2022-06-22 23:00:00,,918763,64,False,2022-06-22
918764,15,1,1,,0,2022-06-22 23:00:00,,918764,59,False,2022-06-22
918765,15,1,1,,1,2022-06-22 23:00:00,,918765,59,False,2022-06-22
918766,15,1,3,,0,2022-06-22 23:00:00,,918766,60,False,2022-06-22


'revealed_targets'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,id,target_lag_1h,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-22 00:00:00,0,0,1,1.055,0,,909120,0,0_0_1_0,,...,,1.055000,1.055000,1.055000,1.055000,1.055000,0.326500,0.168857,0.352250,0.408714
2022-06-22 00:00:00,0,0,1,271.659,1,,909121,0,0_0_1_1,,...,,271.659000,271.659000,354.336333,354.336333,354.336333,53.884500,53.712143,93.391957,171.255286
2022-06-22 00:00:00,0,0,2,0.000,0,,909122,1,0_0_2_0,,...,,0.000000,0.000000,1.228333,1.228333,1.228333,0.000000,0.000000,0.354375,0.406429
2022-06-22 00:00:00,0,0,2,12.213,1,,909123,1,0_0_2_1,,...,,12.213000,12.213000,395.675000,395.675000,395.675000,6.706000,6.706000,85.288909,135.379000
2022-06-22 00:00:00,0,0,3,2.630,0,,909124,2,0_0_3_0,,...,,2.630000,2.630000,1.842500,1.842500,1.842500,0.535727,0.649714,0.369783,0.409857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-11 23:00:00,15,1,0,114.072,1,,851227,64,15_1_0_1,,...,124.325,145.376000,117.519429,189.915917,196.919714,138.109000,423.324333,464.416571,799.801333,319.714857
2022-07-11 23:00:00,15,1,1,0.000,0,,851228,59,15_1_1_0,,...,0.000,74.130667,0.000000,60.164833,0.248000,0.000000,0.000375,0.000000,3.036625,0.135714
2022-07-11 23:00:00,15,1,1,36.401,1,,851229,59,15_1_1_1,,...,41.927,42.307917,38.659857,192.196958,224.441000,182.146750,175.678542,260.374571,711.329250,1494.959286
2022-07-11 23:00:00,15,1,3,0.000,0,,851230,60,15_1_3_0,,...,0.000,140.917625,0.722429,62.757833,0.248000,0.000000,5.393833,0.155857,2.358292,0.132714


'MERGED with revealed targets'

Unnamed: 0,county,is_business,product_type,target,is_consumption,datetime,data_block_id,row_id,prediction_unit_id,currently_scored,...,target_lag_288h,target_rolling_avg_24h,target_rolling_avg_hour_7d,target_rolling_allp_avg_24h,target_rolling_allp_avg_hour_7d,target_rolling_allp_avg_hour_hour_day_4w,target_rolling_avg_24h_estonia,target_rolling_avg_hour_7d_estonia,target_rolling_allp_avg_24h_estonia,target_rolling_allp_avg_hour_7d_estonia
0,0,0,1,1.851,0,2022-06-01 00:00:00,273.0,848016,0,True,...,,,,,,,,,,
1,0,0,1,306.098,1,2022-06-01 00:00:00,273.0,848017,0,True,...,,,,,,,,,,
2,0,0,2,0.000,0,2022-06-01 00:00:00,273.0,848018,1,True,...,,,,,,,,,,
3,0,0,2,14.681,1,2022-06-01 00:00:00,273.0,848019,1,True,...,,,,,,,,,,
4,0,0,3,2.490,0,2022-06-01 00:00:00,273.0,848020,2,True,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67531,15,1,0,,1,2022-06-22 23:00:00,,918763,64,False,...,,122.483542,109.079,138.085917,152.465667,152.465667,364.482542,212.082500,346.493875,194.185286
67532,15,1,1,,0,2022-06-22 23:00:00,,918764,59,False,...,,114.813292,0.000,80.508792,0.000000,0.000000,0.661833,0.000000,2.489000,1.479286
67533,15,1,1,,1,2022-06-22 23:00:00,,918765,59,False,...,,19.511083,30.824,139.593500,174.159000,174.159000,182.731625,91.135500,345.046833,188.833143
67534,15,1,3,,0,2022-06-22 23:00:00,,918766,60,False,...,,244.236625,5.225,65.315000,1.741667,1.741667,2.836125,0.800286,0.282208,0.747857


0
Time to process test data: 19.214780569076538
Time to add test to df: 2.0985302925109863
Day: 19
Preds predicted
Time to add test: 0.015624284744262695


'df'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-23 00:00:00,15,1,3,336.545,1,,912469,60,0,3,15_1_3_1
2022-06-23 00:00:00,11,1,0,0.000,0,,912434,67,0,3,11_1_0_0
2022-06-23 00:00:00,11,0,3,256.656,1,,912433,45,0,3,11_0_3_1
2022-06-23 00:00:00,11,0,3,0.674,0,,912432,45,0,3,11_0_3_0
2022-06-23 00:00:00,11,0,2,1.303,1,,912431,44,0,3,11_0_2_1
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-13 23:00:00,4,0,1,9.686,1,,851135,15,23,2,4_0_1_1
2022-07-13 23:00:00,4,0,1,0.018,0,,851134,15,23,2,4_0_1_0
2022-07-13 23:00:00,3,1,3,747.461,1,,851133,14,23,2,3_1_3_1
2022-07-13 23:00:00,14,1,1,0.000,0,,851218,55,23,2,14_1_1_0


'lagged_feature 1'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,id
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-06-23 01:00:00,0,0,1,0.920,0,,912336,0,0,3,0_0_1_0
2022-06-23 02:00:00,0,0,1,0.908,0,,912470,0,1,3,0_0_1_0
2022-06-23 03:00:00,0,0,1,0.807,0,,912604,0,2,3,0_0_1_0
2022-06-23 04:00:00,0,0,1,0.695,0,,912738,0,3,3,0_0_1_0
2022-06-23 05:00:00,0,0,1,2.194,0,,912872,0,4,3,0_0_1_0
...,...,...,...,...,...,...,...,...,...,...,...
2022-07-13 20:00:00,9,1,3,116.610,1,,850645,37,19,2,9_1_3_1
2022-07-13 21:00:00,9,1,3,123.902,1,,850779,37,20,2,9_1_3_1
2022-07-13 22:00:00,9,1,3,94.437,1,,850913,37,21,2,9_1_3_1
2022-07-13 23:00:00,9,1,3,94.907,1,,851047,37,22,2,9_1_3_1


'df 2'

Unnamed: 0_level_0,county,is_business,product_type,target,is_consumption,data_block_id,row_id,prediction_unit_id,hour,day,...,target_lag_23h,target_lag_24h,target_lag_48h,target_lag_72h,target_lag_96h,target_lag_120h,target_lag_144h,target_lag_168h,target_lag_264h,target_lag_288h
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2022-06-23 00:00:00,15,1,3,336.545,1,,912469,60,0,3,...,,,,,,,,,,
2022-06-23 00:00:00,11,1,0,0.000,0,,912434,67,0,3,...,,,,,,,,,,
2022-06-23 00:00:00,11,0,3,256.656,1,,912433,45,0,3,...,,,,,,,,,,
2022-06-23 00:00:00,11,0,3,0.674,0,,912432,45,0,3,...,,,,,,,,,,
2022-06-23 00:00:00,11,0,2,1.303,1,,912431,44,0,3,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-13 23:00:00,4,0,1,9.686,1,,851135,15,23,2,...,,,9.324,9.340,16.628,9.969,8.879,9.171,10.627,9.666
2022-07-13 23:00:00,4,0,1,0.018,0,,851134,15,23,2,...,,,0.001,0.002,0.000,0.001,0.005,0.000,0.002,0.003
2022-07-13 23:00:00,3,1,3,747.461,1,,851133,14,23,2,...,,,739.677,777.866,761.635,754.692,735.991,725.900,731.024,741.122
2022-07-13 23:00:00,14,1,1,0.000,0,,851218,55,23,2,...,,,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000


KeyboardInterrupt: 

## Major time is from processing test data. 

Takes 15s initially, and then already up to 18s after 5 days or so. If this stayed constant, would take 100 minutes to process data. Which is a lot, but is manageable I think. But it grows constantly. So I need a way to cut the time down. I can do this by truncating the test dfs so that they only store 14~18 days of data, and it never grows beyond that. I can also do this by using vaex or something similar (maybe).

In [None]:
# display(test_data)
display( test_data.columns[~test_data.columns.isin(data_processor.df.columns)])