This R Jupyter notebook contains exploratory data analysis and data preprocessing.

Description of dataframes:

* **train**: training dataset containing the series id of each building, the timestamp, the corresponding power consumption, and the surrounding temperature from weather stations
* **meta**: metadata about each building, such as surface area, base temperature, and which days of the week are days off
* **test**: same layout as train
* **submission format**: the format required for the submission CSV. Each series id has prediction windows for which the power consumption is needed on an hourly, daily or bi-weekly basis.

In [1]:
options(warn=-1)  # Turn off warnings

In [2]:
requiredPackages = c('caret', 'e1071','broom',
                     'tidyverse', 'lubridate','mice', 'magrittr',
                     'prophet', 'zoo', 'forecast', 'xts')
for(p in requiredPackages)
{
  if(!require(p,character.only = TRUE)) 
  {
    install.packages(p,dependencies=TRUE, repos='http://cran.rstudio.com/')
    print(paste(p,"installed successfully"))
    library(p,character.only = TRUE)
  }
  else{print(paste(p , "library already installed"))}
}
print("List of the attached packages in your session: " )
search()

Loading required package: caret
Loading required package: lattice
Loading required package: ggplot2


[1] "caret library already installed"


Loading required package: e1071


[1] "e1071 library already installed"


Loading required package: broom


[1] "broom library already installed"


Loading required package: tidyverse
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v tibble  1.4.1     v purrr   0.2.4
v tidyr   0.7.2     v dplyr   0.7.4
v readr   1.1.1     v stringr 1.2.0
v tibble  1.4.1     v forcats 0.2.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter()  masks stats::filter()
x purrr::flatten() masks jsonlite::flatten()
x dplyr::lag()     masks stats::lag()
x purrr::lift()    masks caret::lift()


[1] "tidyverse library already installed"


Loading required package: lubridate

Attaching package: 'lubridate'

The following object is masked from 'package:base':

    date



[1] "lubridate library already installed"


Loading required package: mice

Attaching package: 'mice'

The following object is masked from 'package:tidyr':

    complete

The following objects are masked from 'package:base':

    cbind, rbind



[1] "mice library already installed"


Loading required package: magrittr

Attaching package: 'magrittr'

The following object is masked from 'package:purrr':

    set_names

The following object is masked from 'package:tidyr':

    extract



[1] "magrittr library already installed"


Loading required package: prophet
Loading required package: Rcpp


[1] "prophet library already installed"


Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric



[1] "zoo library already installed"


Loading required package: forecast


[1] "forecast library already installed"


Loading required package: xts

Attaching package: 'xts'

The following objects are masked from 'package:dplyr':

    first, last



[1] "xts library already installed"
[1] "List of the attached packages in your session: "


**Import the datasets**

The advantage of reading with readr::read_csv() (attached as part of the tidyverse) is that it automatically detects the datatype of each column and parses it accordingly. It is also faster than the default read.csv() from base R.
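
As a rough illustration of the type detection described above, the sketch below reads a tiny made-up file (the path and values are hypothetical, not the competition data) and compares guessed column types with an explicit specification. It assumes the readr package is installed.

```r
library(readr)

# Hypothetical toy file standing in for consumption_train.csv
path <- tempfile(fileext = ".csv")
writeLines(c("series_id,timestamp,consumption",
             "103088,2014-12-24 00:00:00,101842.23",
             "103088,2014-12-24 01:00:00,105878.05"), path)

# Let readr guess the column types and inspect what it chose
guessed <- read_csv(path)
spec(guessed)

# Or state the types explicitly so they do not depend on guessing
explicit <- read_csv(path, col_types = cols(
  series_id   = col_integer(),
  timestamp   = col_datetime(format = ""),
  consumption = col_double()
))

# First class of each column in the explicitly typed data frame
sapply(explicit, function(col) class(col)[1])
```

Stating the types explicitly is optional here, but it can make a notebook more robust when the guessed types differ between readr versions.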

In [3]:
train <- read_csv('../datasets/consumption_train.csv')
meta <- read_csv('../datasets/meta.csv')
test <- read_csv('../datasets/cold_start_test.csv')
submission_format <- read_csv('../datasets/submission_format.csv')

Parsed with column specification:
cols(
  X1 = col_integer(),
  series_id = col_integer(),
  timestamp = col_datetime(format = ""),
  consumption = col_double(),
  temperature = col_double()
)
Parsed with column specification:
cols(
  series_id = col_integer(),
  surface = col_character(),
  base_temperature = col_character(),
  monday_is_day_off = col_character(),
  tuesday_is_day_off = col_character(),
  wednesday_is_day_off = col_character(),
  thursday_is_day_off = col_character(),
  friday_is_day_off = col_character(),
  saturday_is_day_off = col_character(),
  sunday_is_day_off = col_character()
)
Parsed with column specification:
cols(
  X1 = col_integer(),
  series_id = col_integer(),
  timestamp = col_datetime(format = ""),
  consumption = col_double(),
  temperature = col_double()
)
Parsed with column specification:
cols(
  pred_id = col_integer(),
  series_id = col_integer(),
  timestamp = col_datetime(format = ""),
  temperature = col_double(),
  consumption = col_double(),

Let's take a look at the **structure** of the data to get an understanding of what the datasets consist of, what their datatypes are, and so on.

In [4]:
# 1 take a look at the structure of the data
str(meta)   # 1383 obs. of  10 variables

Classes 'tbl_df', 'tbl' and 'data.frame':	1383 obs. of  10 variables:
 $ series_id           : int  100003 100004 100006 100008 100010 100012 100017 100020 100021 100025 ...
 $ surface             : chr  "x-large" "x-large" "x-small" "x-small" ...
 $ base_temperature    : chr  "low" "low" "low" "low" ...
 $ monday_is_day_off   : chr  "False" "False" "False" "False" ...
 $ tuesday_is_day_off  : chr  "False" "False" "False" "False" ...
 $ wednesday_is_day_off: chr  "False" "False" "False" "False" ...
 $ thursday_is_day_off : chr  "False" "False" "False" "False" ...
 $ friday_is_day_off   : chr  "False" "False" "False" "False" ...
 $ saturday_is_day_off : chr  "True" "True" "True" "True" ...
 $ sunday_is_day_off   : chr  "True" "True" "True" "True" ...
 - attr(*, "spec")=List of 2
  ..$ cols   :List of 10
  .. ..$ series_id           : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ surface             : list()
  .. .. ..- attr(*, "class")= chr  "collect

In [5]:
str(train)  # 509376 obs. of  5 variables

Classes 'tbl_df', 'tbl' and 'data.frame':	509376 obs. of  5 variables:
 $ X1         : int  0 1 2 3 4 5 6 7 8 9 ...
 $ series_id  : int  103088 103088 103088 103088 103088 103088 103088 103088 103088 103088 ...
 $ timestamp  : POSIXct, format: "2014-12-24 00:00:00" "2014-12-24 01:00:00" ...
 $ consumption: num  101842 105878 91619 94474 96977 ...
 $ temperature: num  NA NA NA NA NA NA NA NA NA NA ...
 - attr(*, "spec")=List of 2
  ..$ cols   :List of 5
  .. ..$ X1         : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ series_id  : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ timestamp  :List of 1
  .. .. ..$ format: chr ""
  .. .. ..- attr(*, "class")= chr  "collector_datetime" "collector"
  .. ..$ consumption: list()
  .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
  .. ..$ temperature: list()
  .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
  ..$ default: list()
  .. ..- attr

In [6]:
str(test)   # 111984 obs. of  5 variables

Classes 'tbl_df', 'tbl' and 'data.frame':	111984 obs. of  5 variables:
 $ X1         : int  0 1 2 3 4 5 6 7 8 9 ...
 $ series_id  : int  102781 102781 102781 102781 102781 102781 102781 102781 102781 102781 ...
 $ timestamp  : POSIXct, format: "2013-02-27 00:00:00" "2013-02-27 01:00:00" ...
 $ consumption: num  15296 15163 15022 15370 15303 ...
 $ temperature: num  17 18.2 18 17 16.9 ...
 - attr(*, "spec")=List of 2
  ..$ cols   :List of 5
  .. ..$ X1         : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ series_id  : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ timestamp  :List of 1
  .. .. ..$ format: chr ""
  .. .. ..- attr(*, "class")= chr  "collector_datetime" "collector"
  .. ..$ consumption: list()
  .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
  .. ..$ temperature: list()
  .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
  ..$ default: list()
  .. ..- attr(*, "class")=

**Summary of the data**

A **summary** of each dataset gives us an idea of the descriptive statistics of the data.

In [6]:
summary(meta)

   series_id        surface          base_temperature   monday_is_day_off 
 Min.   :100003   Length:1383        Length:1383        Length:1383       
 1st Qu.:100947   Class :character   Class :character   Class :character  
 Median :101881   Mode  :character   Mode  :character   Mode  :character  
 Mean   :101855                                                           
 3rd Qu.:102756                                                           
 Max.   :103634                                                           
 tuesday_is_day_off wednesday_is_day_off thursday_is_day_off friday_is_day_off 
 Length:1383        Length:1383          Length:1383         Length:1383       
 Class :character   Class :character     Class :character    Class :character  
 Mode  :character   Mode  :character     Mode  :character    Mode  :character  
                                                                               
                                                                           

In [7]:
summary(train)

       X1           series_id        timestamp                  
 Min.   :     0   Min.   :100003   Min.   :2013-01-02 00:00:00  
 1st Qu.:127344   1st Qu.:100998   1st Qu.:2016-02-01 15:00:00  
 Median :254688   Median :101885   Median :2017-01-05 23:30:00  
 Mean   :254688   Mean   :101851   Mean   :2016-09-10 15:19:52  
 3rd Qu.:382031   3rd Qu.:102697   3rd Qu.:2017-07-02 22:00:00  
 Max.   :509375   Max.   :103634   Max.   :2017-12-29 23:00:00  
                                                                
  consumption       temperature    
 Min.   :      0   Min.   :-13.47  
 1st Qu.:  15421   1st Qu.:  8.45  
 Median :  49862   Median : 15.16  
 Mean   : 107624   Mean   : 15.19  
 3rd Qu.: 135166   3rd Qu.: 21.80  
 Max.   :2085110   Max.   : 44.35  
                   NA's   :228689  

In [8]:
summary(test)

       X1           series_id        timestamp                  
 Min.   :     0   Min.   :100004   Min.   :2012-06-01 00:00:00  
 1st Qu.: 27996   1st Qu.:100876   1st Qu.:2015-05-13 19:45:00  
 Median : 55992   Median :101940   Median :2016-05-17 13:00:00  
 Mean   : 55992   Mean   :101881   Mean   :2016-03-15 21:15:26  
 3rd Qu.: 83987   3rd Qu.:102920   3rd Qu.:2017-03-19 17:15:00  
 Max.   :111983   Max.   :103629   Max.   :2017-12-03 23:00:00  
                                                                
  consumption       temperature    
 Min.   :      0   Min.   :-12.21  
 1st Qu.:  14171   1st Qu.: 10.75  
 Median :  38691   Median : 16.00  
 Mean   : 158589   Mean   : 15.98  
 3rd Qu.: 106759   3rd Qu.: 21.60  
 Max.   :5366167   Max.   : 42.00  
                   NA's   :44916   

**Observations**: From the summaries above, it is evident that:
* there are many missing values in the temperature column (228,689 in train and 44,916 in test),
* the surface column holds categorical data that needs to be converted to dummy columns,
* there are 0 values in the consumption column that need to be replaced with some other values,
* consumption is strongly right-skewed: the mean is well above the median and the maximum is far above the third quartile, which points to extreme high values
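
One way to check how heavy the upper tail of consumption is would be Tukey's 1.5 × IQR rule. The sketch below uses made-up values rather than `train$consumption`, and the rule is an assumption chosen for illustration, not something the notebook applies.

```r
# Made-up, right-skewed values standing in for train$consumption
consumption <- c(0, 12, 15, 18, 20, 22, 25, 30, 35, 400)

# Quartiles and the upper fence of the 1.5 * IQR rule
q <- quantile(consumption, c(0.25, 0.75))
upper_fence <- q[[2]] + 1.5 * (q[[2]] - q[[1]])

# Flag values above the fence and report their share in percent
is_outlier <- consumption > upper_fence
mean(is_outlier) * 100

# A mean above the median is a quick hint of right skew
mean(consumption) > median(consumption)
```

Running the same lines on the real column would give the actual share of flagged values.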

In [9]:
# 3 take a look at the top rows of the data
head(meta)

series_id,surface,base_temperature,monday_is_day_off,tuesday_is_day_off,wednesday_is_day_off,thursday_is_day_off,friday_is_day_off,saturday_is_day_off,sunday_is_day_off
100003,x-large,low,False,False,False,False,False,True,True
100004,x-large,low,False,False,False,False,False,True,True
100006,x-small,low,False,False,False,False,False,True,True
100008,x-small,low,False,False,False,False,False,True,True
100010,x-small,low,False,False,False,False,False,True,True
100012,x-large,low,False,False,False,False,False,True,True


In [10]:
head(train)

X1,series_id,timestamp,consumption,temperature
0,103088,2014-12-24 00:00:00,101842.23,
1,103088,2014-12-24 01:00:00,105878.05,
2,103088,2014-12-24 02:00:00,91619.11,
3,103088,2014-12-24 03:00:00,94473.71,
4,103088,2014-12-24 04:00:00,96976.76,
5,103088,2014-12-24 05:00:00,109154.51,


In [11]:
head(test)

X1,series_id,timestamp,consumption,temperature
0,102781,2013-02-27 00:00:00,15295.74,17.0
1,102781,2013-02-27 01:00:00,15163.21,18.25
2,102781,2013-02-27 02:00:00,15022.26,18.0
3,102781,2013-02-27 03:00:00,15370.42,17.0
4,102781,2013-02-27 04:00:00,15303.1,16.9
5,102781,2013-02-27 05:00:00,14553.15,17.0


In [21]:
sapply(meta[-1], table)  # Taking a look at the distribution of the values 

$surface

   large   medium    small  x-large  x-small xx-large xx-small 
     112      172      146      382      314      108      149 

$base_temperature

high  low 
  35 1348 

$monday_is_day_off

False  True 
 1379     4 

$tuesday_is_day_off

False  True 
 1379     4 

$wednesday_is_day_off

False  True 
 1376     7 

$thursday_is_day_off

False  True 
 1376     7 

$friday_is_day_off

False  True 
 1366    17 

$saturday_is_day_off

False  True 
  125  1258 

$sunday_is_day_off

False  True 
  109  1274 


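**Observations**: Most buildings have Saturday (1258 of 1383) and Sunday (1274 of 1383) as days off, while very few are off on weekdays, so consumption is likely to be higher on weekdays than on weekends. Almost all buildings (1348 of 1383) have a low base temperature.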
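
The day-off counts in the table can also be expressed as percentages per day. A minimal sketch on a hypothetical toy data frame (the column names follow `meta`, the values are invented):

```r
# Toy stand-in for the day-off columns of meta (values are strings, as in the CSV)
meta_toy <- data.frame(
  series_id           = c(1, 2, 3, 4),
  friday_is_day_off   = c("False", "False", "False", "True"),
  saturday_is_day_off = c("True", "True", "False", "True"),
  stringsAsFactors = FALSE
)

# Select the day-off columns by name and compute the share of "True" in percent
day_cols <- grep("_is_day_off$", names(meta_toy), value = TRUE)
pct_off <- sapply(meta_toy[day_cols], function(x) mean(x == "True") * 100)
pct_off
```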

In [17]:
# Checking which series_id values are present in meta but not in train or test
metaButNotTrain_sid <- as.vector(setdiff(meta$series_id, train$series_id))     
length(metaButNotTrain_sid)  # 625 values

In [18]:
metaButNotTest_sid <- as.vector(setdiff(meta$series_id, test$series_id))
length(metaButNotTest_sid)   # 758 values

In [19]:
# Missing value 
sort(colMeans(is.na(train)*100), decreasing = TRUE)  # 45% missing values of temperature
sort(colMeans(is.na(test)*100), decreasing = TRUE)   # 40% missing values of temperature
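
The overall percentage does not show whether the gaps are spread evenly or concentrated in a few series. A hedged sketch of a per-series check, using an invented toy data frame in place of `train`:

```r
# Toy data: series 101 has no temperature at all, 102 has one gap, 103 is complete
toy <- data.frame(
  series_id   = rep(c(101, 102, 103), each = 4),
  temperature = c(NA, NA, NA, NA,  5, 6, NA, 7,  10, 11, 12, 13)
)

# Percentage of missing temperature readings within each series
na_share <- tapply(is.na(toy$temperature), toy$series_id, mean) * 100
na_share
```

If most of the missing values turned out to belong to entirely empty series, that would be one argument for dropping the column rather than imputing it.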

In [26]:
# Looking at the range of temperature and consumption in train data, ignoring NA values
print(paste0("Minimum Temperature: ", min(train$temperature, na.rm = TRUE)))
print(paste0("Maximum Temperature: ", max(train$temperature, na.rm = TRUE)))

[1] "Minimum Temperature: -13.4666666666667"
[1] "Maximum Temperature: 44.35"


In [27]:
print(paste0("Minimum Consumption: ", min(train$consumption, na.rm = TRUE)))
print(paste0("Maximum Consumption: ", max(train$consumption, na.rm = TRUE)))

[1] "Minimum Consumption: 0"
[1] "Maximum Consumption: 2085109.45353471"


**Observations**: 
1. There are a lot of missing values in the temperature column, so we will drop this column in subsequent steps.
2. There are 0 values in the consumption column, which can impact our modeling, so we will impute them later.

In [28]:
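# Note: passing the whole data frame (including timestamp) makes xts coerce
# every column to character, as the output below shows
train_trial_xts <- xts(x = train, order.by = train$timestamp, frequency = 24)
head(train_trial_xts)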

                    X1       series_id timestamp             consumption   
2013-01-02 00:00:00 "270144" "103298"  "2013-01-02 00:00:00" "1.358024e+05"
2013-01-02 00:00:00 "287616" "102642"  "2013-01-02 00:00:00" "2.884351e+04"
2013-01-02 01:00:00 "270145" "103298"  "2013-01-02 01:00:00" "1.355866e+05"
2013-01-02 01:00:00 "287617" "102642"  "2013-01-02 01:00:00" "2.644524e+04"
2013-01-02 02:00:00 "270146" "103298"  "2013-01-02 02:00:00" "1.413574e+05"
2013-01-02 02:00:00 "287618" "102642"  "2013-01-02 02:00:00" "2.867972e+04"
                    temperature    
2013-01-02 00:00:00 "-3.766667e+00"
2013-01-02 00:00:00 "-3.766667e+00"
2013-01-02 01:00:00 "-3.833333e+00"
2013-01-02 01:00:00 "-3.833333e+00"
2013-01-02 02:00:00 "-3.733333e+00"
2013-01-02 02:00:00 "-3.733333e+00"

**Tidying and transforming training, meta and testing data**

In the subsequent steps, we will transform the data so that it is suitable for our modelling. The steps include:
* Creating dummy variables
* Renaming columns for consistency
* Converting numerical data to categorical data
* Converting boolean values to 0s and 1s
* Dropping columns that are not required
* Joining dataframes meta with train and test
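
The dummy-variable step in the list above can also be sketched in base R with `model.matrix()`. This is an alternative shown on invented toy data, not the approach the notebook takes, and the `is_` column names are chosen here only for illustration.

```r
# Toy stand-in for meta with a categorical surface column
meta_toy <- data.frame(
  series_id = c(1, 2, 3),
  surface   = c("x-large", "small", "x-large"),
  stringsAsFactors = FALSE
)

# One 0/1 indicator column per level; "- 1" drops the intercept so every level gets a column
dummies <- model.matrix(~ surface - 1, data = meta_toy)

# Strip the "surface" prefix and make the names syntactic (hypothetical naming scheme)
colnames(dummies) <- paste0("is_", gsub("-", "_", sub("^surface", "", colnames(dummies))))

# Attach the indicators to the id column
meta_encoded <- cbind(meta_toy["series_id"], dummies)
meta_encoded
```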

In [12]:
##### tidying meta data #####
# 1. Converting dummy variable for surface
library(magrittr)
meta %<>% 
  mutate(v = 1, sf = surface) %>% 
  spread(sf, v, fill = 0)

In [13]:
# 2. Renaming dummy columns from surface
meta %<>% rename(is_large = large)        # large
meta %<>% rename(is_medium = medium)      # medium
meta %<>% rename(is_Xlarge = 'x-large')   # xlarge
meta %<>% rename(is_Xsmall = 'x-small')   # x small
meta %<>% rename(is_XXlarge = 'xx-large') # xx-large
meta %<>% rename(is_XXsmall = 'xx-small') # xx-small
meta %<>% rename(is_small = small)        # small
meta %<>% rename(is_baseTemperatureHigh = base_temperature) # base Temperature

In [14]:
# 3. change high = 1 and low = 0 in base_temperature
meta$is_baseTemperatureHigh <- if_else(meta$is_baseTemperatureHigh == 'high', 1, 0) 

In [15]:
# 4. Converting False to 0 and True to 1 in days columns
meta %<>% mutate_at(vars(ends_with('_is_day_off')),
                                 funs(case_when(
                                  . =='False' ~ 0,
                                  . == 'True'~ 1)))

In [16]:
# 5. Converting the newly created dummy var datatype from numeric to factor 
# except the series_id column:
meta %<>% mutate_at(vars(-series_id), funs(as.factor(.)))

# Caution: the alternative below would convert every numeric column,
# including series_id, to a factor
#### meta %<>% mutate_if(is.numeric, as.factor) 

In [17]:
# 6. Dropping column surface
meta$surface <- NULL 

In [None]:
# Save the object to a file
saveRDS(meta, file = "./processed/meta.rds")
# Restore the object
meta <- readRDS(file = "./processed/meta.rds")
##############################################################
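
Base R pairs `saveRDS()` with `readRDS()`. A small round-trip sketch on a toy data frame, using a temporary file (a hypothetical location, not the notebook's `./processed/` directory), suggests why RDS suits this step: column types such as factors come back unchanged.

```r
# Temporary file used only for this illustration
path <- tempfile(fileext = ".rds")

# Toy object with an integer id and a factor column
toy <- data.frame(series_id = c(100003L, 100004L),
                  is_large  = factor(c(0, 1)))

# Write the object, read it back, and compare with the original
saveRDS(toy, file = path)
restored <- readRDS(file = path)
identical(toy, restored)
```

A CSV round trip, by contrast, would return the factor as a plain numeric or character column.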

In [18]:
##### tidying train data #####
# Column(s) to add
# Extracting hour from timestamp and storing it as factor
train$hour <- as.factor(hour(train$timestamp))
test$hour <- as.factor(hour(test$timestamp))

In [19]:
head(meta)

series_id,is_baseTemperatureHigh,monday_is_day_off,tuesday_is_day_off,wednesday_is_day_off,thursday_is_day_off,friday_is_day_off,saturday_is_day_off,sunday_is_day_off,is_large,is_medium,is_small,is_Xlarge,is_Xsmall,is_XXlarge,is_XXsmall
100003,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0
100004,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0
100006,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0
100008,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0
100010,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0
100012,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0


In [20]:
head(train)

X1,series_id,timestamp,consumption,temperature,hour
0,103088,2014-12-24 00:00:00,101842.23,,0
1,103088,2014-12-24 01:00:00,105878.05,,1
2,103088,2014-12-24 02:00:00,91619.11,,2
3,103088,2014-12-24 03:00:00,94473.71,,3
4,103088,2014-12-24 04:00:00,96976.76,,4
5,103088,2014-12-24 05:00:00,109154.51,,5


In [21]:
head(test)

X1,series_id,timestamp,consumption,temperature,hour
0,102781,2013-02-27 00:00:00,15295.74,17.0,0
1,102781,2013-02-27 01:00:00,15163.21,18.25,1
2,102781,2013-02-27 02:00:00,15022.26,18.0,2
3,102781,2013-02-27 03:00:00,15370.42,17.0,3
4,102781,2013-02-27 04:00:00,15303.1,16.9,4
5,102781,2013-02-27 05:00:00,14553.15,17.0,5


In [22]:
# Zero consumption is treated as an implicit missing value and is imputed
# below with the median consumption of the column
# Check summary
summary(train$consumption)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0   15421   49862  107624  135166 2085110 

In [23]:
summary(test$consumption)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0   14171   38691  158589  106759 5366167 

In [24]:
# Count of values that have consumption 0:
train %>% count(consumption == 0)

consumption == 0,n
False,509333
True,43


In [25]:
test %>% count(consumption == 0)

consumption == 0,n
False,111980
True,4


In [26]:
#Converting 0 value of consumption to NA:
train %<>% mutate(consumption = na_if(consumption,0))
test %<>% mutate(consumption = na_if(consumption,0)) 

In [27]:
# Number of missing values after converting 0 to NA:
sum(is.na(train$consumption))

In [28]:
sum(is.na(test$consumption))

In [29]:
# Replacing NA with median values 
train %<>% mutate_at(vars(consumption), funs(ifelse(is.na(.),median(., na.rm = TRUE),.)))

In [30]:
test %<>% mutate_at(vars(consumption), funs(ifelse(is.na(.), median(., na.rm = TRUE), .)))
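
The cells above fill every gap with the median of the whole column. Because consumption levels differ widely between buildings, one possible variation is to use the median of each series instead. The base R sketch below illustrates that idea on invented data; it is not what the notebook does.

```r
# Toy data: two series on very different scales, with zeros as implicit gaps
toy <- data.frame(
  series_id   = c(1, 1, 1, 2, 2, 2),
  consumption = c(10, 0, 30, 1000, 3000, 0)
)

# Treat zeros as missing
toy$consumption[toy$consumption == 0] <- NA

# Median of the observed values within each series, repeated for every row of that series
series_median <- ave(toy$consumption, toy$series_id,
                     FUN = function(x) median(x, na.rm = TRUE))

# Fill the gaps with the median of the matching series
toy$consumption <- ifelse(is.na(toy$consumption), series_median, toy$consumption)
toy
```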

In [31]:
# Check summary
summary(train$consumption)

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
      6.8   15433.6   49864.9  107628.0  135166.1 2085109.5 

In [32]:
summary(test$consumption)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     13   14180   38693  158591  106759 5366167 

In [33]:
# Column(s) to drop
train$X1 <- NULL
test$X1 <- NULL

In [34]:
train$temperature <- NULL
test$temperature <- NULL

In [35]:
# Names of columns of train data
colnames(train)

In [36]:
colnames(test)

In [None]:
######### Save the data so far #############################
# Save an object to a file
saveRDS(train, file = "./processed/train.rds")
# Restore the object
train <- readRDS(file = "./processed/train.rds")

In [None]:
saveRDS(test, file = "./processed/test.rds")
# Restore the object
test <- readRDS(file = "./processed/test.rds")

**Joining data**

In [40]:
# left join between train and meta
train_meta_lj <- left_join(x= train, y = meta, suffix = c(".x", ".m1"))

Joining, by = "series_id"


In [41]:
head(train_meta_lj)

series_id,timestamp,consumption,hour,is_baseTemperatureHigh,monday_is_day_off,tuesday_is_day_off,wednesday_is_day_off,thursday_is_day_off,friday_is_day_off,saturday_is_day_off,sunday_is_day_off,is_large,is_medium,is_small,is_Xlarge,is_Xsmall,is_XXlarge,is_XXsmall
103088,2014-12-24 00:00:00,101842.23,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0
103088,2014-12-24 01:00:00,105878.05,1,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0
103088,2014-12-24 02:00:00,91619.11,2,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0
103088,2014-12-24 03:00:00,94473.71,3,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0
103088,2014-12-24 04:00:00,96976.76,4,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0
103088,2014-12-24 05:00:00,109154.51,5,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0


In [42]:
# left join between test and meta
test_meta_lj <- left_join(x= test, y= meta, suffix = c(".y", ".m2"))

Joining, by = "series_id"


In [43]:
head(test_meta_lj)

series_id,timestamp,consumption,hour,is_baseTemperatureHigh,monday_is_day_off,tuesday_is_day_off,wednesday_is_day_off,thursday_is_day_off,friday_is_day_off,saturday_is_day_off,sunday_is_day_off,is_large,is_medium,is_small,is_Xlarge,is_Xsmall,is_XXlarge,is_XXsmall
102781,2013-02-27 00:00:00,15295.74,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0
102781,2013-02-27 01:00:00,15163.21,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0
102781,2013-02-27 02:00:00,15022.26,2,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0
102781,2013-02-27 03:00:00,15370.42,3,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0
102781,2013-02-27 04:00:00,15303.1,4,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0
102781,2013-02-27 05:00:00,14553.15,5,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0
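
After a left join it can be worth confirming that no rows were duplicated and that every series found a match in the metadata. A hedged base R sketch on toy data frames (names and values are invented; `merge()` with `all.x = TRUE` stands in for `left_join()` here):

```r
# Toy tables: series 9 has no metadata row
train_toy <- data.frame(series_id = c(1, 1, 2, 9), consumption = c(5, 6, 7, 8))
meta_toy  <- data.frame(series_id = c(1, 2, 3), is_large = c(1, 0, 1))

# Left join: keep every row of train_toy (merge sorts the result by the key)
joined <- merge(train_toy, meta_toy, by = "series_id", all.x = TRUE)

# The row count should be unchanged when the key is unique in the metadata
nrow(joined) == nrow(train_toy)

# Series without metadata end up with NA in the joined columns
unmatched <- setdiff(train_toy$series_id, meta_toy$series_id)
unmatched
```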


**Saving the objects**

In [None]:
########### Save the joined objects #########################
# Save an object to a file
saveRDS(train_meta_lj, file = "./processed/train_meta_lj.rds")
# Restore the object
train_meta_lj <- readRDS(file = "./processed/train_meta_lj.rds")

In [None]:
# Save an object to a file
saveRDS(test_meta_lj, file = "./processed/test_meta_lj.rds")
# Restore the object
test_meta_lj <- readRDS(file = "./processed/test_meta_lj.rds")
################################################################

In [None]:
# Write the objects to a file

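# row.names = FALSE avoids writing an extra index column (like the X1 column dropped earlier)
write.csv(train_meta_lj, './processed/consumption_train.csv', row.names = FALSE)
write.csv(test_meta_lj, './processed/cold_start_test.csv', row.names = FALSE)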