# Correlation analysis for the apartments with themselves

<div class="alert alert-success" role="alert">
    <strong>Summary:</strong><br> This notebook calculates the correlation between apartment to form groups with similar consumption patterns. The first section of the notebook determines the correlation between apartments with themselves with a sampling of 10-min then we increase the timestep until the correlation hits a correlation above 0.5. At the same time, we explore the second hypothesis which is that the correlation is higher for the mean value of the three apartments. As before we increase the timestep as long as the threshold is not respected.
</div>

The `os`module is employed to get the names of the entries by manipulating the path. To get more information, you have the option to refer to the library's documentation. Even though, this notebook will provide explanations for the [functions used]( https://docs.python.org/3/library/os.html).  
The `pandas` module facilitates data analysis and the use of dataframes. You can find the library's documentation [there.](https://pandas.pydata.org/docs/)   
The `numpy` module is instrumental in manipulating matrices and tables, serving for numerical calculations. You can find the library's documentation on [there.](https://numpy.org/doc/stable/) 

In [1]:
import os
import numpy as np
import pandas as pd

The required Domestic Hot Water (DHW) files end with "-IECS", this designation indicates data related to DHW.  
The `os` module is utilized to compile a list of DHW data for all apartments.

In [2]:
folder = r"../Data/"
files = os.listdir(folder)
#Get a list of the different files named IECS
list_IECS = [file for file in files if '-IECS' in file]   
list_IECS.sort()

Using the `pandas` module, we read the CSV files containing the listed information from the previous step. Then, we use the `resample` function within the `pandas` module to resample the data to a 10-min interval.

In [3]:
data = {} #Creation of a dictionary
for file in list_IECS:
    df = pd.read_csv(folder + file) #Read the csv file
    ts = df.set_index('0')['Value']     # DataFrame -> TimeSeries
    ts.index = pd.to_datetime(ts.index, unit='s')   # index to secondes
    ts = ts.resample("10Min").mean()    # resample 10 min
    data[file[:-4]] = ts 

We proceed to create a dataframe with the previously acquired data. 

In [4]:
df = pd.DataFrame(data)
df = df[~df.isnull().any(axis=1)]  # remove the row with Nan Value

We employ the `drop` function from the `pandas` module to suppress columns unnecessary for our analysis.  
Then, the `mean` function from the `pandas` module is used to work out the average value across the three remaining apartments.

In [5]:
df=df.drop(columns=['25-IECS', '26-IECS', '27-IECS', '35-IECS', '64-IECS', '65-IECS', '67-IECS', '9-IECS' ])
df['mean'] = df.mean(axis=1) 
#print(df)

Conversion of the water consumption from l/s to l/h.

In [6]:
df=df*3600
print(df)

                     34-IECS  36-IECS  63-IECS   mean
0                                                    
2020-12-01 12:10:00     0.00     0.00     0.00   0.00
2020-12-01 12:20:00     0.00     0.00     0.00   0.00
2020-12-01 12:30:00     0.00     0.00     0.00   0.00
2020-12-01 12:40:00     5.94     0.00     0.00   1.98
2020-12-01 12:50:00     0.00     0.00     5.76   1.92
...                      ...      ...      ...    ...
2021-06-08 14:50:00     0.00    59.40     0.00  19.80
2021-06-08 15:00:00     0.00    11.88     0.00   3.96
2021-06-08 15:10:00     0.00     0.00     0.00   0.00
2021-06-08 15:20:00     0.00    17.64     0.00   5.88
2021-06-08 15:30:00     0.00     0.00     0.00   0.00

[26089 rows x 4 columns]


## Comparing the month of December

<div class="alert alert-info">
<strong>Details :</strong><br>
Working out the correlation of the apartments with themselves and for their mean value. If not sufficient, increase the resampling timestep. For this analysis we need to divide our information into two dataframe to compare them together. We want to compare the data of the month of December so we divide the month into two equal dataset.
</div>

We start by dividing our data into two dataframe, the first analysis is composed of the month of December 2020, we will compare their values to see if ten-minute sampling of first part of December is equivalent to ten-minute sampling of the second part of December.  

### Data preprocessing: 


It is essential to have the same number of columns and rows for this analysis either way the correlation won't work. The dataframe may exhibit varying numbers of rows, even though the period remains the same. This difference arises because we eliminated NaN values from the dataset, and their presence could be anywhere in the dataset.   
To overcome this issue, we need to preprocess our data so we analyse our values by printing the data and looking at the numbers of rows and columns. We can use the `drop` function to suppress the right number of rows. We must change the dataset until the number of rows is equal to the other one. The dataprocessing is essential to compare the dataframe together, either way it won't be possible and an error will appear.

We use the `corrwith` function of the `pandas` module to work out the correlation between two dataframes.  
The name "df_12_corr" mean that we get a dataframe containing correlation values (indicate by the suffixe "corr") comparing the dataframe 1 and 2.

In [7]:
df1=df['2020-12-01 00:00:00' : '2020-12-16 00:00:00']
df2=df['2020-12-16 00:10:00' : '2020-12-30 16:00:00']
print(df1)
print(df2)

                     34-IECS  36-IECS  63-IECS  mean
0                                                   
2020-12-01 12:10:00     0.00      0.0     0.00  0.00
2020-12-01 12:20:00     0.00      0.0     0.00  0.00
2020-12-01 12:30:00     0.00      0.0     0.00  0.00
2020-12-01 12:40:00     5.94      0.0     0.00  1.98
2020-12-01 12:50:00     0.00      0.0     5.76  1.92
...                      ...      ...      ...   ...
2020-12-15 23:10:00     5.94      0.0     0.00  1.98
2020-12-15 23:30:00    17.82      0.0     0.00  5.94
2020-12-15 23:40:00     0.00      0.0     0.00  0.00
2020-12-15 23:50:00    11.88      0.0     0.00  3.96
2020-12-16 00:00:00     0.00      0.0     0.00  0.00

[2004 rows x 4 columns]
                     34-IECS  36-IECS  63-IECS  mean
0                                                   
2020-12-16 00:10:00      0.0      0.0      0.0   0.0
2020-12-16 00:20:00      0.0      0.0      0.0   0.0
2020-12-16 00:30:00      0.0      0.0      0.0   0.0
2020-12-16 00:40:00  

When it is done and we have the same number of rows, we can apply the correlation to the dataset.  
To work out the correlation between two dataframe we use the function `corrwith` that calculate the correlation between two dataframes.  
However, to compare the dataframes it is necessary to have the same name of the different rows, either way the correlation will return a NaN value. To do so, we use the `set_axis` function of the `pandas` module that sets the rows name of the dataframe 2 as the rows name of the dataframe 1.

In [8]:
df_12_corr =df1.corrwith(df2.set_axis(df1.index, axis='index', copy=False)) # Correlation matrice of df1 with df2 changing the index name with the index name of dh1 to compare them
print(df_12_corr)

34-IECS    0.009986
36-IECS   -0.011536
63-IECS   -0.050793
mean      -0.064636
dtype: float64


As the value of the correlation are insufficient, we increase the timeset of the sampling to one hour and drop the needed number of value to get the same number of rows by preprocessing the data.

In [9]:
df3=df1.resample('H').mean()
df4=df2.resample('H').mean()

print(df3)
print(df4)

df4=df4.drop('2020-12-30 13:00:00')
df4=df4.drop('2020-12-30 14:00:00')
df4=df4.drop('2020-12-30 15:00:00')
df4=df4.drop('2020-12-30 16:00:00')

print(df4)

df_34_corr =df3.corrwith(df4.set_axis(df3.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_34_corr)

                     34-IECS  36-IECS  63-IECS    mean
0                                                     
2020-12-01 12:00:00    1.188     0.00    1.152   0.780
2020-12-01 13:00:00    4.950     0.96    5.880   3.930
2020-12-01 14:00:00   10.890     0.00   17.760   9.550
2020-12-01 15:00:00    0.000    14.76   15.780  10.180
2020-12-01 16:00:00    0.000     0.00    0.000   0.000
...                      ...      ...      ...     ...
2020-12-15 20:00:00    5.940     0.96    7.860   4.920
2020-12-15 21:00:00    7.920     0.00    8.820   5.580
2020-12-15 22:00:00    8.910     0.00   26.760  11.890
2020-12-15 23:00:00    7.128     0.00    0.000   2.376
2020-12-16 00:00:00    0.000     0.00    0.000   0.000

[349 rows x 4 columns]
                     34-IECS  36-IECS  63-IECS  mean
0                                                   
2020-12-16 00:00:00     0.00     0.00     0.00  0.00
2020-12-16 01:00:00     0.00     0.00     0.00  0.00
2020-12-16 02:00:00     0.00     0.00     0.00  0

As the value of the correlation are insufficient, we increase the timeset of the sampling to a day. It means that if the correlation works, the apartment will need a tank to cover the hot water supply of one-day.  
We repeat the process of preprocessing the data as done previously.  
We repeat our analysis with the `corrwith` function and the `set_axis` function for the same purpose as mentioned earlier.

In [10]:
df5=df1.resample('D').mean()
df6=df2.resample('D').mean()

print(df5)
print(df6)

df5=df5.drop('2020-12-16')

df_56_corr =df5.corrwith(df6.set_axis(df5.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_56_corr)

             34-IECS   36-IECS    63-IECS      mean
0                                                  
2020-12-01  3.936176  4.500000   7.740000  5.392059
2020-12-02  1.740857  3.648857   4.047429  3.145714
2020-12-03  3.040292  2.267737   5.888759  3.732263
2020-12-04  2.320147  2.459118   7.197353  3.992206
2020-12-05  4.546619  5.630504   6.265036  5.480719
2020-12-06  5.186331  5.876547   8.476835  6.513237
2020-12-07  4.612993  3.804964   5.352701  4.590219
2020-12-08  1.994453  2.435912   6.070073  3.500146
2020-12-09  4.602302  3.426475  11.305036  6.444604
2020-12-10  5.108143  2.638286  10.962000  6.236143
2020-12-11  4.787737  1.968175   9.741022  5.498978
2020-12-12  4.884604  2.566619   9.152806  5.534676
2020-12-13  4.048029  9.417810   7.173723  6.879854
2020-12-14  2.867050  2.152230   5.829928  3.616403
2020-12-15  3.078129  3.286619   6.417842  4.260863
2020-12-16  0.000000  0.000000   0.000000  0.000000
             34-IECS   36-IECS   63-IECS      mean
0            

As the value of the correlation are insufficient except for one apartment, we increase the timeset of the sampling to five days. It means that if the correlation works, the apartment will need a tank to cover the hot water supply of five-days.  
We repeat the process of preprocessing the data as done previously.  
We repeat our analysis with the `corrwith` function and the `set_axis` function for the same purpose as mentioned earlier.

In [11]:
df7=df1.resample('5D').mean()
df8=df2.resample('5D').mean()

print(df7)
print(df8)

df7=df7.drop('2020-12-16')

df_78_corr =df7.corrwith(df8.set_axis(df7.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_78_corr)

             34-IECS   36-IECS   63-IECS      mean
0                                                 
2020-12-01  3.024871  3.620323  6.047419  4.230871
2020-12-06  4.307775  3.637977  8.452717  5.466156
2020-12-11  3.930304  3.867786  7.658466  5.152185
2020-12-16  0.000000  0.000000  0.000000  0.000000
             34-IECS   36-IECS   63-IECS      mean
0                                                 
2020-12-16  4.876701  4.354462  4.238351  4.489838
2020-12-21  4.439464  2.517321  5.789464  4.248750
2020-12-26  4.896662  2.836447  3.832098  3.855069
34-IECS   -0.699922
36-IECS   -0.411685
63-IECS    0.609686
mean      -0.614913
dtype: float64


As we can see the correlation work only for the apartment 63 because the threshold is above 0.5.  
Compare to what we expected the mean is still low and doesn't increase with time.  
We want to shift our analysis to compare one month of data with another one months of data. We compare the month of December with the month of January.

## Correlation of the 3 apartments with themselves and of the mean value of the apartment starting with a comparison between December 2020 and January 2021

<div class="alert alert-info">
<strong>Details :</strong><br>
Working out the correlation of the apartments with themselves and for their mean value. If not sufficient, increase the resampling timestep. For this analysis we need to divide our information into two dataframe to compare them together.
</div>

We start by dividing our data into two dataframe, the second analysis is composed of the month of December 2020 and January 2021, we will compare their values to see if ten-minute sampling of December is equivalent to ten-minute sampling of January.  

### Data preprocessing: 


It is essential to have the same number of columns and rows for this analysis either way the correlation won't work. The dataframe may exhibit varying numbers of rows, even though the period remains the same. This difference arises because we eliminated NaN values from the dataset, and their presence could be anywhere in the dataset.   
To overcome this issue, we need to preprocess our data so we analyse our values by printing the data and looking at the numbers of rows and columns. We must change the dataset until the number of rows is equal to the other one. The dataprocessing is essential to compare the dataframe together, either way it won't be possible and an error will appear.

In [12]:
df9=df['2020-12-01 00:00:00' : '2020-12-31 00:00:00']
df10=df['2021-01-01 00:00:00' : '2021-01-30 13:00:00']
print(df9)
print(df10)

                     34-IECS  36-IECS  63-IECS  mean
0                                                   
2020-12-01 12:10:00     0.00      0.0     0.00  0.00
2020-12-01 12:20:00     0.00      0.0     0.00  0.00
2020-12-01 12:30:00     0.00      0.0     0.00  0.00
2020-12-01 12:40:00     5.94      0.0     0.00  1.98
2020-12-01 12:50:00     0.00      0.0     5.76  1.92
...                      ...      ...      ...   ...
2020-12-30 23:20:00     0.00      0.0     0.00  0.00
2020-12-30 23:30:00     0.00      0.0     0.00  0.00
2020-12-30 23:40:00     0.00      0.0     0.00  0.00
2020-12-30 23:50:00     0.00      0.0     0.00  0.00
2020-12-31 00:00:00     0.00      0.0     0.00  0.00

[4053 rows x 4 columns]
                     34-IECS  36-IECS  63-IECS   mean
0                                                    
2021-01-01 00:00:00      0.0      0.0     0.00   0.00
2021-01-01 00:10:00      0.0      0.0     0.00   0.00
2021-01-01 00:20:00      0.0      0.0     0.00   0.00
2021-01-01 00:30

When it is done and we have the same number of rows, we can apply the correlation to the dataset.  
To work out the correlation between two dataframe we use the `corrwith` function that calculate the correlation between two dataframes.  
However, to compare the dataframes it is necessary to have the same name of the different rows, either way the correlation will return a NaN value. To do so, we use the `set_axis` function of the `pandas` module that sets the rows name of the dataframe 2 as the rows name of the dataframe 1.

In [13]:
df_910_corr =df9.corrwith(df10.set_axis(df9.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_910_corr)

34-IECS   -0.017422
36-IECS   -0.021889
63-IECS   -0.025402
mean      -0.058032
dtype: float64


As the value of the correlation are insufficient, we increase the timeset of the sampling to one hour.  
We repeat the process of preprocessing the data.  
We repeat our analysis with the `corrwith` function and the `set_axis` function for the same purpose as mentioned earlier.

In [14]:
df11=df9.resample('H').mean()
df12=df10.resample('H').mean()

print(df11)
print(df12)

df12=df12.drop('2021-01-30 13:00:00')

print(df4)

df_1112_corr =df11.corrwith(df12.set_axis(df11.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_1112_corr)

                     34-IECS  36-IECS  63-IECS    mean
0                                                     
2020-12-01 12:00:00    1.188     0.00    1.152   0.780
2020-12-01 13:00:00    4.950     0.96    5.880   3.930
2020-12-01 14:00:00   10.890     0.00   17.760   9.550
2020-12-01 15:00:00    0.000    14.76   15.780  10.180
2020-12-01 16:00:00    0.000     0.00    0.000   0.000
...                      ...      ...      ...     ...
2020-12-30 20:00:00    0.000     0.00    2.376   0.792
2020-12-30 21:00:00    1.980     0.96    0.000   0.980
2020-12-30 22:00:00    0.000     0.96    0.000   0.320
2020-12-30 23:00:00    0.000     0.00    0.000   0.000
2020-12-31 00:00:00    0.000     0.00    0.000   0.000

[709 rows x 4 columns]
                     34-IECS  36-IECS  63-IECS    mean
0                                                     
2021-01-01 00:00:00    0.000    0.000     0.00   0.000
2021-01-01 01:00:00    0.000    0.960     0.00   0.320
2021-01-01 02:00:00    0.000    0.000    

As the value of the correlation are insufficient, we increase the timeset of the sampling to 10 days. It means that if the correlation works, the apartment will need a tank to cover the hot water supply of ten-days.  
We repeat the process of preprocessing the data.  
We repeat our analysis with the `corrwith` function and the `set_axis` function for the same purpose as mentioned earlier.

In [15]:
df13=df9.resample('10D').mean()
df14=df10.resample('10D').mean()

print(df13)
print(df14)

df13=df13.drop('2020-12-31')
print(df13)

df_1415_corr =df13.corrwith(df14.set_axis(df13.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_1415_corr)

             34-IECS   36-IECS   63-IECS      mean
0                                                 
2020-12-01  3.701524  3.629634  7.316067  4.882409
2020-12-11  4.396149  4.105996  5.959037  4.820394
2020-12-21  4.758890  2.833192  4.808327  4.133470
2020-12-31  0.000000  0.000000  0.000000  0.000000
             34-IECS   36-IECS   63-IECS      mean
0                                                 
2021-01-01  4.944077  4.309442  2.202747  3.818755
2021-01-11  4.208430  3.687904  7.284388  5.060240
2021-01-21  4.650770  3.005076  6.574079  4.743308
             34-IECS   36-IECS   63-IECS      mean
0                                                 
2020-12-01  3.701524  3.629634  7.316067  4.882409
2020-12-11  4.396149  4.105996  5.959037  4.820394
2020-12-21  4.758890  2.833192  4.808327  4.133470
34-IECS   -0.553395
36-IECS    0.640311
63-IECS   -0.822186
mean      -0.342955
dtype: float64


As we can see the correlation work only for the apartment 36 because the threshold is above 0.5.  
Compare to what we expected the mean is still low and doesn't increase with time.  
We want to shift our analysis to compare six months with another six months.

## Comparing two dataframe of six months


<div class="alert alert-info">
<strong>Details :</strong><br>
We repeat our experimentation with two dataframe with six months of data. Starting by working out the correlation of the apartments with themselves and for their mean value. If not sufficient, increase the resampling timestep. For this analysis we need to divide our information into two equivalent dataframe to compare them together.
</div>

We start by dividing our data into two dataframe, the third analysis is composed of the six months, we will compare their values to see if ten-minute sampling of first six months is equivalent to ten-minute sampling of the second six months period.  

### Data preprocessing:  

It is essential to have the same number of columns and rows for this analysis either way the correlation won't work. The dataframe may exhibit varying numbers of rows, even though the period remains the same. This difference arises because we eliminated NaN values from the dataset, and their presence could be anywhere in the dataset.   
To overcome this issue, we need to preprocess our data so we analyse our values by printing the data and looking at the numbers of rows and columns. We must change the dataset until the number of rows is equal to the other one. The dataprocessing is essential to compare the dataframe together, either way it won't be possible and an error will appear.

In [16]:
df15=df['2020-12-01 00:00:00' : '2021-02-28 00:00:00']
df16=df['2021-03-01 00:00:00' : '2021-05-28 04:00:00']
print(df15)
print(df16)


df_1516_corr =df15.corrwith(df16.set_axis(df15.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_1516_corr)

                     34-IECS  36-IECS  63-IECS  mean
0                                                   
2020-12-01 12:10:00     0.00      0.0     0.00  0.00
2020-12-01 12:20:00     0.00      0.0     0.00  0.00
2020-12-01 12:30:00     0.00      0.0     0.00  0.00
2020-12-01 12:40:00     5.94      0.0     0.00  1.98
2020-12-01 12:50:00     0.00      0.0     5.76  1.92
...                      ...      ...      ...   ...
2021-02-27 23:20:00     0.00      0.0     0.00  0.00
2021-02-27 23:30:00     0.00      0.0     0.00  0.00
2021-02-27 23:40:00     0.00      0.0     0.00  0.00
2021-02-27 23:50:00     0.00      0.0     0.00  0.00
2021-02-28 00:00:00     0.00      0.0     0.00  0.00

[12187 rows x 4 columns]
                     34-IECS  36-IECS  63-IECS  mean
0                                                   
2021-03-01 00:00:00      0.0      0.0      0.0   0.0
2021-03-01 00:10:00      0.0      0.0      0.0   0.0
2021-03-01 00:20:00      0.0      0.0      0.0   0.0
2021-03-01 00:30:00 

As the value of the correlation are insufficient, we increase the timeset of the sampling to one hour. It means that if the correlation works, the apartment will need a tank to cover the hot water supply of one-hour.  
We repeat the process of preprocessing the data.  
We repeat our analysis with the `corrwith` function and the `set_axis` function for the same purpose as mentioned earlier.

In [17]:
df17=df15.resample('H').mean()
df18=df16.resample('H').mean()
print(df17)
print(df18)

df17=df17.drop('2020-12-01 12:00:00')
df17=df17.drop('2020-12-01 13:00:00')
df17=df17.drop('2020-12-01 14:00:00')
df17=df17.drop('2020-12-01 15:00:00')
df17=df17.drop('2020-12-01 16:00:00')
df17=df17.drop('2020-12-01 17:00:00')
df17=df17.drop('2020-12-01 18:00:00')
df17=df17.drop('2020-12-01 19:00:00')

print(df17.shape)

df_1718_corr =df17.corrwith(df18.set_axis(df17.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_1718_corr)

                     34-IECS  36-IECS  63-IECS   mean
0                                                    
2020-12-01 12:00:00    1.188    0.000    1.152   0.78
2020-12-01 13:00:00    4.950    0.960    5.880   3.93
2020-12-01 14:00:00   10.890    0.000   17.760   9.55
2020-12-01 15:00:00    0.000   14.760   15.780  10.18
2020-12-01 16:00:00    0.000    0.000    0.000   0.00
...                      ...      ...      ...    ...
2021-02-27 20:00:00   22.716   82.872   13.032  39.54
2021-02-27 21:00:00    0.000    0.960   26.760   9.24
2021-02-27 22:00:00    0.000    0.000    0.000   0.00
2021-02-27 23:00:00    0.000    0.000    0.000   0.00
2021-02-28 00:00:00    0.000    0.000    0.000   0.00

[2125 rows x 4 columns]
                     34-IECS  36-IECS  63-IECS  mean
0                                                   
2021-03-01 00:00:00      0.0     0.00     0.00  0.00
2021-03-01 01:00:00      0.0     0.00    17.82  5.94
2021-03-01 02:00:00      0.0     0.00     0.00  0.00
2021-03-

As the value of the correlation are insufficient, we increase the timeset of the sampling to one day. It means that if the correlation works, the apartment will need a tank to cover the hot water supply of one-day.  
We repeat the process of preprocessing the data.  
We repeat our analysis with the `corrwith` function and the `set_axis` function for the same purpose as mentioned earlier.

In [18]:
df19=df15.resample('D').mean()
df20=df16.resample('D').mean()
print(df19)
print(df20)

df19=df19.drop('2020-12-01')

print(df19.shape)

df_1920_corr =df19.corrwith(df20.set_axis(df19.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_1920_corr)

             34-IECS    36-IECS    63-IECS       mean
0                                                    
2020-12-01  3.936176   4.500000   7.740000   5.392059
2020-12-02  1.740857   3.648857   4.047429   3.145714
2020-12-03  3.040292   2.267737   5.888759   3.732263
2020-12-04  2.320147   2.459118   7.197353   3.992206
2020-12-05  4.546619   5.630504   6.265036   5.480719
...              ...        ...        ...        ...
2021-02-24  3.719118   3.285000   5.831471   4.278529
2021-02-25  1.326043   3.822734   6.298705   3.815827
2021-02-26  3.528000   3.605333  10.618667   5.917333
2021-02-27  4.655036  11.157372  15.167299  10.326569
2021-02-28  0.000000   0.000000   0.000000   0.000000

[90 rows x 4 columns]
             34-IECS   36-IECS   63-IECS      mean
0                                                 
2021-03-01  2.439712  2.367194  7.730935  4.179281
2021-03-02  5.612374  2.486331  4.389928  4.162878
2021-03-03  3.146143  3.216857  2.365714  2.909571
2021-03-04  1.956350

As the value of the correlation are insufficient, we increase the timeset of the sampling to 10 days. It means that if the correlation works, the apartment will need a tank to cover the hot water supply of ten-days.  
We repeat the process of preprocessing the data.  
We repeat our analysis with the `corrwith` function and the `set_axis` function for the same purpose as mentioned earlier.

In [19]:
df21=df15.resample('10D').mean()
df22=df16.resample('10D').mean()
print(df19)
print(df20)



df_2122_corr =df21.corrwith(df22.set_axis(df21.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_2122_corr)

             34-IECS    36-IECS    63-IECS       mean
0                                                    
2020-12-02  1.740857   3.648857   4.047429   3.145714
2020-12-03  3.040292   2.267737   5.888759   3.732263
2020-12-04  2.320147   2.459118   7.197353   3.992206
2020-12-05  4.546619   5.630504   6.265036   5.480719
2020-12-06  5.186331   5.876547   8.476835   6.513237
...              ...        ...        ...        ...
2021-02-24  3.719118   3.285000   5.831471   4.278529
2021-02-25  1.326043   3.822734   6.298705   3.815827
2021-02-26  3.528000   3.605333  10.618667   5.917333
2021-02-27  4.655036  11.157372  15.167299  10.326569
2021-02-28  0.000000   0.000000   0.000000   0.000000

[89 rows x 4 columns]
             34-IECS   36-IECS   63-IECS      mean
0                                                 
2021-03-01  2.439712  2.367194  7.730935  4.179281
2021-03-02  5.612374  2.486331  4.389928  4.162878
2021-03-03  3.146143  3.216857  2.365714  2.909571
2021-03-04  1.956350

As the value of the correlation are insufficient, we increase the timeset of the sampling to one month. It means that if the correlation works, the apartment will need a tank to cover the hot water supply of one-month.  
We repeat the process of preprocessing the data.  
We repeat our analysis with the `corrwith` function and the `set_axis` function for the same purpose as mentioned earlier.

In [20]:
df23=df15.resample('M').mean()
df24=df16.resample('M').mean()
print(df23)
print(df24)



df_2324_corr =df23.corrwith(df24.set_axis(df23.index, axis='index', copy=False)) # Correlation matrice of df1h with df2 changing the index name with the index name of dh1 to compare them
print(df_2324_corr)

             34-IECS   36-IECS   63-IECS      mean
0                                                 
2020-12-31  4.311243  3.492764  5.880114  4.561374
2021-01-31  4.798633  3.772303  5.606222  4.725719
2021-02-28  3.751156  3.750915  7.158876  4.886982
             34-IECS   36-IECS   63-IECS      mean
0                                                 
2021-03-31  4.609124  3.991454  6.315685  4.972088
2021-04-30  3.969048  3.927811  5.157361  4.351406
2021-05-31  3.264386  4.083513  2.905680  3.417860
34-IECS    0.557498
36-IECS    0.036186
63-IECS   -0.874419
mean      -0.992668
dtype: float64


The correlation value is enough for one apartment the 34 but not for the other.  
The mean value is no better than the apartment by themselves.