# Correlation analysis of each apartment with itself

<div class="alert alert-success" role="alert">
    <strong>Summary:</strong><br> This notebook computes correlations between apartments in order to form groups with similar consumption patterns. The first section correlates each apartment with itself over two periods, starting from a 10-minute sampling interval and increasing the timestep until the correlation exceeds 0.5. In parallel, we test a second hypothesis: that the correlation is higher for the mean value of the selected apartments. As before, the timestep is increased as long as the threshold is not reached.
</div>

The `os` module is used to list the data files by manipulating paths. This notebook explains the functions it uses; for more detail, see the [library's documentation](https://docs.python.org/3/library/os.html).  
The `pandas` module provides dataframes and data-analysis tools. See the [library's documentation](https://pandas.pydata.org/docs/).   
The `numpy` module handles arrays and matrices for numerical calculations. See the [library's documentation](https://numpy.org/doc/stable/).

In [1]:
import os
import numpy as np
import pandas as pd

The required Domestic Hot Water (DHW) files have names ending in "-IECS"; this suffix marks DHW data.  
The `os` module is used to build the list of DHW files for all apartments.

In [2]:
folder = r"../Data/"
files = os.listdir(folder)
#Get a list of the different files named IECS
list_IECS = [file for file in files if '-IECS' in file]   
list_IECS.sort()
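
The filtering step can be tried without the data folder. The sketch below uses a hypothetical list of file names (an assumption, standing in for the result of `os.listdir`) and keeps only the names whose stem ends with "-IECS", which is slightly stricter than the substring test used in the cell above.

```python
from pathlib import Path

# Hypothetical directory listing; the real names come from os.listdir(folder)
files = ["9-IECS.csv", "25-IECS.csv", "25-ECS.csv", "notes.txt"]

# Keep only files whose name (without extension) ends with "-IECS"
list_iecs = sorted(f for f in files if Path(f).stem.endswith("-IECS"))
print(list_iecs)  # ['25-IECS.csv', '9-IECS.csv']
```

The sort is lexicographic, which is why `9-IECS` appears after the two-digit apartments in the outputs of this notebook.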

Using `pandas`, we read each CSV file listed in the previous step. We then use the `resample` method to resample the data to a 10-minute interval, averaging the values within each interval.

In [3]:
data = {} #Creation of a dictionary
for file in list_IECS:
    df = pd.read_csv(folder + file) #Read the csv file
    ts = df.set_index('0')['Value']     # DataFrame -> TimeSeries
    ts.index = pd.to_datetime(ts.index, unit='s')   # convert epoch seconds to datetimes
    ts = ts.resample("10Min").mean()    # resample 10 min
    data[file[:-4]] = ts 
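
To see what the conversion and resampling do, here is a minimal sketch on synthetic readings. The column names `'0'` and `'Value'` follow the cell above; the timestamps and flow values are invented for illustration.

```python
import pandas as pd

# Hypothetical raw readings: epoch seconds in column '0', flow in 'Value' (l/s)
raw = pd.DataFrame({
    "0": [1606824600, 1606824900, 1606825200, 1606825500],  # every 5 minutes
    "Value": [0.0, 0.002, 0.004, 0.0],
})

ts = raw.set_index("0")["Value"]                  # DataFrame -> Series
ts.index = pd.to_datetime(ts.index, unit="s")     # epoch seconds -> datetimes
ts_10min = ts.resample("10min").mean()            # average per 10-minute bin
print(ts_10min)
```

Each 10-minute bin holds the mean of the readings that fall inside it, so two 5-minute readings collapse into one value.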

We proceed to create a dataframe with the previously acquired data. 

In [4]:
df = pd.DataFrame(data)
#print(df)
df = df[~df.isnull().any(axis=1)]  # remove the row with Nan Value
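
The boolean mask used above keeps only the rows in which no column is NaN. As a small check on invented values, the sketch below suggests that `dropna()` gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical values; only the column names are borrowed from this notebook
frame = pd.DataFrame({
    "25-IECS": [0.0, np.nan, 0.002],
    "26-IECS": [0.001, 0.0, 0.0],
})

masked = frame[~frame.isnull().any(axis=1)]   # pattern used in the cell above
dropped = frame.dropna()                      # built-in equivalent
print(masked.equals(dropped))
```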

The `drop` function from `pandas` can remove columns that are not needed for the analysis; in the cell below that call is commented out, so all apartments are kept.  
The `mean` function is then used to compute, for each timestamp, the average value across the remaining apartments.

In [5]:
#df=df.drop(columns=['25-IECS', '26-IECS', '27-IECS', '35-IECS', '64-IECS', '65-IECS', '67-IECS', '9-IECS' ])
df['mean'] = df.mean(axis=1) 
#print(df)

Conversion of the water consumption from l/s to l/h.

In [6]:
df=df*3600
print(df)

                     25-IECS  26-IECS  27-IECS  34-IECS  35-IECS  36-IECS  \
0                                                                           
2020-12-01 12:10:00      0.0     0.00     0.00     0.00     0.00     0.00   
2020-12-01 12:20:00      0.0     0.00     0.00     0.00    23.76     0.00   
2020-12-01 12:30:00      0.0     0.00     5.76     0.00     0.00     0.00   
2020-12-01 12:40:00      0.0     0.00     0.00     5.94    11.88     0.00   
2020-12-01 12:50:00      0.0     0.00     0.00     0.00     0.00     0.00   
...                      ...      ...      ...      ...      ...      ...   
2021-06-08 14:50:00      0.0    11.88     0.00     0.00    11.88    59.40   
2021-06-08 15:00:00      0.0     0.00     0.00     0.00     0.00    11.88   
2021-06-08 15:10:00      0.0     0.00     0.00     0.00     0.00     0.00   
2021-06-08 15:20:00      0.0     0.00     0.00     0.00     0.00    17.64   
2021-06-08 15:30:00      0.0     0.00     0.00     0.00     0.00     0.00   

## Comparing the month of December

<div class="alert alert-info">
<strong>Details :</strong><br>
We compute the correlation of each apartment with itself, and of the mean value, between two periods. If the correlation is insufficient, we increase the resampling timestep. For this analysis we split the data into two dataframes so they can be compared. To compare data within December, we split the month into two equal halves.
</div>

We start by splitting the data into two dataframes. The first analysis covers December 2020: we compare the two halves to see whether the 10-minute samples of the first half of December correlate with those of the second half.  

### Data preprocessing: 


Both dataframes must have the same number of rows and columns; otherwise the correlation cannot be computed. The dataframes may have different numbers of rows even when the periods have the same length, because rows containing NaN values were removed and those rows can be anywhere in the dataset.   
To fix this, we preprocess the data: we print both dataframes, compare their numbers of rows and columns, and use the `drop` function to remove the surplus rows from the longer one until the row counts match. Without this step the dataframes cannot be compared and an error is raised.
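
As a possible alternative to picking the rows to drop by hand, both dataframes can be cut to the length of the shorter one. This is only a sketch: `trim_to_common_length` is a hypothetical helper, not part of this notebook, and it always removes rows from the end, whereas the cells below sometimes drop rows from the start.

```python
import pandas as pd

def trim_to_common_length(left, right):
    """Keep the first n rows of each frame, n being the shorter length."""
    n = min(len(left), len(right))
    return left.iloc[:n], right.iloc[:n]

# Invented frames with 5 and 3 rows
a = pd.DataFrame({"x": range(5)}, index=pd.date_range("2020-12-01", periods=5, freq="D"))
b = pd.DataFrame({"x": range(3)}, index=pd.date_range("2020-12-16", periods=3, freq="D"))

a_trim, b_trim = trim_to_common_length(a, b)
print(a_trim.shape, b_trim.shape)
```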

We use the `corrwith` function of the `pandas` module to compute the column-wise correlation between two dataframes.  
The name `df_12_corr` denotes the correlation values (indicated by the suffix "corr") obtained by comparing dataframes 1 and 2.

In [7]:
df1=df['2020-12-01 00:00:00' : '2020-12-16 00:00:00']
df2=df['2020-12-16 00:10:00' : '2020-12-30 16:00:00']
print(df1)
print(df2)

                     25-IECS  26-IECS  27-IECS  34-IECS  35-IECS  36-IECS  \
0                                                                           
2020-12-01 12:10:00      0.0      0.0     0.00     0.00     0.00      0.0   
2020-12-01 12:20:00      0.0      0.0     0.00     0.00    23.76      0.0   
2020-12-01 12:30:00      0.0      0.0     5.76     0.00     0.00      0.0   
2020-12-01 12:40:00      0.0      0.0     0.00     5.94    11.88      0.0   
2020-12-01 12:50:00      0.0      0.0     0.00     0.00     0.00      0.0   
...                      ...      ...      ...      ...      ...      ...   
2020-12-15 23:10:00      0.0      0.0     0.00     5.94     0.00      0.0   
2020-12-15 23:30:00      0.0      0.0     0.00    17.82     0.00      0.0   
2020-12-15 23:40:00      0.0      0.0     0.00     0.00     0.00      0.0   
2020-12-15 23:50:00      0.0      0.0     0.00    11.88     0.00      0.0   
2020-12-16 00:00:00      0.0      0.0     0.00     0.00    23.76      0.0   
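
Slicing a datetime index with string labels includes both endpoints, which is why the first dataframe ends at `2020-12-16 00:00:00` and the second one starts at `00:10:00`. A small sketch on an invented index, using `.loc` as the explicit form of the same label slicing:

```python
import pandas as pd

# Invented 10-minute index around the split point
idx = pd.date_range("2020-12-15 23:40", periods=5, freq="10min")
frame = pd.DataFrame({"flow": range(5)}, index=idx)

first_half = frame.loc[:"2020-12-16 00:00:00"]    # end label is included
second_half = frame.loc["2020-12-16 00:10:00":]   # starts one step later
print(len(first_half), len(second_half))
```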

Once both dataframes have the same number of rows, we can compute the correlation.  
To do so we use the `corrwith` function, which calculates the correlation between the matching columns of two dataframes.  
However, `corrwith` only compares rows that share the same index label; otherwise it returns NaN. We therefore use the `set_axis` function of the `pandas` module to give dataframe 2 the row labels of dataframe 1.

In [8]:
df_12_corr = df1.corrwith(df2.set_axis(df1.index, axis='index', copy=False))  # correlate df1 with df2 after giving df2 the index of df1
print(df_12_corr)

25-IECS   -0.024167
26-IECS   -0.010448
27-IECS   -0.038886
34-IECS    0.009986
35-IECS   -0.050969
36-IECS   -0.011536
63-IECS   -0.050793
64-IECS   -0.021400
65-IECS   -0.057313
67-IECS    0.009978
9-IECS    -0.008921
mean      -0.096787
dtype: float64
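
The role of `set_axis` can be checked on a toy example. In this sketch the two frames hold invented values on non-overlapping dates: without realignment `corrwith` finds no common rows and returns NaN, while after copying the index it compares the rows by position.

```python
import pandas as pd

idx1 = pd.date_range("2020-12-01", periods=4, freq="D")
idx2 = pd.date_range("2020-12-16", periods=4, freq="D")
first = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [1.0, 3.0, 2.0, 5.0]}, index=idx1)
second = pd.DataFrame({"a": [2.0, 4.0, 6.0, 8.0], "b": [5.0, 2.0, 3.0, 1.0]}, index=idx2)

unaligned = first.corrwith(second)   # no shared index labels -> NaN
aligned = first.corrwith(second.set_axis(first.index, axis="index"))
print(unaligned)
print(aligned)
```

Column `a` of the second frame is an exact multiple of the first, so its aligned correlation is 1, while column `b` moves in the opposite direction and gets a negative value.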


As the correlation values are insufficient, we increase the sampling timestep to one hour and, as part of preprocessing, drop as many rows as needed for both dataframes to have the same number of rows.

In [9]:
df3=df1.resample('H').mean()
df4=df2.resample('H').mean()

print(df3)
print(df4)

df4=df4.drop('2020-12-30 13:00:00')
df4=df4.drop('2020-12-30 14:00:00')
df4=df4.drop('2020-12-30 15:00:00')
df4=df4.drop('2020-12-30 16:00:00')

print(df4)

df_34_corr = df3.corrwith(df4.set_axis(df3.index, axis='index', copy=False))  # correlate df3 with df4 after giving df4 the index of df3
print(df_34_corr)

                     25-IECS  26-IECS  27-IECS  34-IECS  35-IECS  36-IECS  \
0                                                                           
2020-12-01 12:00:00      0.0     0.00    1.152    1.188    7.128     0.00   
2020-12-01 13:00:00      0.0     0.00    0.000    4.950    3.960     0.96   
2020-12-01 14:00:00      0.0     0.00    0.000   10.890    0.000     0.00   
2020-12-01 15:00:00      0.0     0.96    0.000    0.000    5.880    14.76   
2020-12-01 16:00:00      6.0     0.00    0.000    0.000    1.980     0.00   
...                      ...      ...      ...      ...      ...      ...   
2020-12-15 20:00:00      0.0     0.00    0.000    5.940    0.960     0.96   
2020-12-15 21:00:00      0.0     0.00    0.000    7.920    0.000     0.00   
2020-12-15 22:00:00      0.0     0.00    0.000    8.910    0.000     0.00   
2020-12-15 23:00:00      0.0     0.00    0.000    7.128    0.000     0.00   
2020-12-16 00:00:00      0.0     0.00    0.000    0.000   23.760     0.00   

As the correlation values are insufficient, we increase the sampling timestep to one day. If the correlation holds at this timestep, the apartment would need a tank covering one day of hot water supply.  
We preprocess the data as before.  
We then repeat the analysis with the `corrwith` and `set_axis` functions, as described earlier.

In [10]:
df5=df1.resample('D').mean()
df6=df2.resample('D').mean()

print(df5)
print(df6)

df5=df5.drop('2020-12-16')

df_56_corr = df5.corrwith(df6.set_axis(df5.index, axis='index', copy=False))  # correlate df5 with df6 after giving df6 the index of df5
print(df_56_corr)

             25-IECS   26-IECS   27-IECS   34-IECS    35-IECS   36-IECS  \
0                                                                         
2020-12-01  2.917059  0.349412  0.084706  3.936176   2.085882  4.500000   
2020-12-02  4.700571  1.913143  1.350000  1.740857   6.204857  3.648857   
2020-12-03  1.392701  0.959124  1.589781  3.040292   2.459562  2.267737   
2020-12-04  2.684118  0.918529  1.170000  2.320147   3.531176  2.459118   
2020-12-05  0.686331  1.577266  2.561439  4.546619   2.165180  5.630504   
2020-12-06  2.191079  3.193381  1.577266  5.186331   6.130360  5.876547   
2020-12-07  2.075912  0.562336  4.154453  4.612993   3.245255  3.804964   
2020-12-08  1.219270  1.166715  1.776350  1.994453   2.188905  2.435912   
2020-12-09  1.976115  1.714532  0.885755  4.602302   4.874245  3.426475   
2020-12-10  1.229143  0.678857  0.588857  5.108143   4.250571  2.638286   
2020-12-11  2.270365  1.784234  1.684380  4.787737   4.209635  1.968175   
2020-12-12  2.830791  2.7

As the correlation values are insufficient for all but one apartment, we increase the sampling timestep to five days. If the correlation holds at this timestep, the apartment would need a tank covering five days of hot water supply.  
We preprocess the data as before.  
We then repeat the analysis with the `corrwith` and `set_axis` functions, as described earlier.

In [11]:
df7=df1.resample('5D').mean()
df8=df2.resample('5D').mean()

print(df7)
print(df8)

df7=df7.drop('2020-12-16')

df_78_corr = df7.corrwith(df8.set_axis(df7.index, axis='index', copy=False))  # correlate df7 with df8 after giving df8 the index of df7
print(df_78_corr)

             25-IECS   26-IECS   27-IECS   34-IECS    35-IECS   36-IECS  \
0                                                                         
2020-12-01  2.431742  1.237355  1.496323  3.024871   3.433355  3.620323   
2020-12-06  1.738092  1.465491  1.788035  4.307775   4.146243  3.637977   
2020-12-11  3.270217  1.905760  1.342055  3.930304   4.127236  3.867786   
2020-12-16  0.000000  0.000000  0.000000  0.000000  23.760000  0.000000   

             63-IECS   64-IECS   65-IECS   67-IECS    9-IECS      mean  
0                                                                       
2020-12-01  6.047419  0.981871  5.081806  2.088871  1.008000  2.768358  
2020-12-06  8.452717  1.098728  5.283468  1.952688  1.274046  3.195024  
2020-12-11  7.658466  0.871085  5.240579  2.123792  1.231085  3.233488  
2020-12-16  0.000000  0.000000  0.000000  0.000000  0.000000  2.160000  
             25-IECS   26-IECS   27-IECS   34-IECS   35-IECS   36-IECS  \
0                                    

As we can see, only apartment 63 passes, since its correlation is above the 0.5 threshold.  
Contrary to what we expected, the correlation of the mean is still low and does not increase with the timestep.  
We now shift the analysis to compare one month of data with another: December with January.

## Correlation of the apartments with themselves and of their mean value: December 2020 compared with January 2021

<div class="alert alert-info">
<strong>Details :</strong><br>
We compute the correlation of each apartment with itself, and of the mean value, between two periods. If the correlation is insufficient, we increase the resampling timestep. For this analysis we split the data into two dataframes so they can be compared.
</div>

We start by splitting the data into two dataframes. The second analysis covers December 2020 and January 2021: we compare them to see whether the 10-minute samples of December correlate with those of January.  

### Data preprocessing: 


Both dataframes must have the same number of rows and columns; otherwise the correlation cannot be computed. The dataframes may have different numbers of rows even when the periods have the same length, because rows containing NaN values were removed and those rows can be anywhere in the dataset.   
To fix this, we preprocess the data: we print both dataframes and compare their numbers of rows and columns, then remove rows from the longer one until the row counts match. Without this step the dataframes cannot be compared and an error is raised.

In [12]:
df9=df['2020-12-01 00:00:00' : '2020-12-31 00:00:00']
df10=df['2021-01-01 00:00:00' : '2021-01-30 13:00:00']
print(df9)
print(df10)

                     25-IECS  26-IECS  27-IECS  34-IECS  35-IECS  36-IECS  \
0                                                                           
2020-12-01 12:10:00      0.0      0.0     0.00     0.00     0.00      0.0   
2020-12-01 12:20:00      0.0      0.0     0.00     0.00    23.76      0.0   
2020-12-01 12:30:00      0.0      0.0     5.76     0.00     0.00      0.0   
2020-12-01 12:40:00      0.0      0.0     0.00     5.94    11.88      0.0   
2020-12-01 12:50:00      0.0      0.0     0.00     0.00     0.00      0.0   
...                      ...      ...      ...      ...      ...      ...   
2020-12-30 23:20:00      0.0      0.0     0.00     0.00     0.00      0.0   
2020-12-30 23:30:00      0.0      0.0     0.00     0.00     0.00      0.0   
2020-12-30 23:40:00      0.0      0.0     0.00     0.00     0.00      0.0   
2020-12-30 23:50:00      0.0      0.0     0.00     0.00     0.00      0.0   
2020-12-31 00:00:00      0.0      0.0     0.00     0.00     0.00      0.0   

Once both dataframes have the same number of rows, we can compute the correlation.  
To do so we use the `corrwith` function, which calculates the correlation between the matching columns of two dataframes.  
However, `corrwith` only compares rows that share the same index label; otherwise it returns NaN. We therefore use the `set_axis` function of the `pandas` module to give dataframe 2 the row labels of dataframe 1.

In [13]:
df_910_corr = df9.corrwith(df10.set_axis(df9.index, axis='index', copy=False))  # correlate df9 with df10 after giving df10 the index of df9
print(df_910_corr)

25-IECS    0.002023
26-IECS   -0.019589
27-IECS   -0.037482
34-IECS   -0.017422
35-IECS   -0.023995
36-IECS   -0.021889
63-IECS   -0.025402
64-IECS   -0.019413
65-IECS   -0.029530
67-IECS   -0.020714
9-IECS    -0.025183
mean      -0.095351
dtype: float64


As the correlation values are insufficient, we increase the sampling timestep to one hour.  
We preprocess the data as before.  
We then repeat the analysis with the `corrwith` and `set_axis` functions, as described earlier.

In [14]:
df11=df9.resample('H').mean()
df12=df10.resample('H').mean()

print(df11)
print(df12)

df12=df12.drop('2021-01-30 13:00:00')


df_1112_corr = df11.corrwith(df12.set_axis(df11.index, axis='index', copy=False))  # correlate df11 with df12 after giving df12 the index of df11
print(df_1112_corr)

                     25-IECS  26-IECS  27-IECS  34-IECS  35-IECS  36-IECS  \
0                                                                           
2020-12-01 12:00:00      0.0     0.00    1.152    1.188    7.128     0.00   
2020-12-01 13:00:00      0.0     0.00    0.000    4.950    3.960     0.96   
2020-12-01 14:00:00      0.0     0.00    0.000   10.890    0.000     0.00   
2020-12-01 15:00:00      0.0     0.96    0.000    0.000    5.880    14.76   
2020-12-01 16:00:00      6.0     0.00    0.000    0.000    1.980     0.00   
...                      ...      ...      ...      ...      ...      ...   
2020-12-30 20:00:00      0.0     0.00    0.000    0.000    0.000     0.00   
2020-12-30 21:00:00      0.0     0.00    0.000    1.980    0.000     0.96   
2020-12-30 22:00:00      0.0     0.00    0.000    0.000    0.000     0.96   
2020-12-30 23:00:00      0.0     0.00    0.000    0.000    0.000     0.00   
2020-12-31 00:00:00      0.0     0.00    0.000    0.000    0.000     0.00   

As the correlation values are insufficient, we increase the sampling timestep to 10 days. If the correlation holds at this timestep, the apartment would need a tank covering ten days of hot water supply.  
We preprocess the data as before.  
We then repeat the analysis with the `corrwith` and `set_axis` functions, as described earlier.

In [15]:
df13=df9.resample('10D').mean()
df14=df10.resample('10D').mean()

print(df13.shape)
print(df14.shape)

df13=df13.drop('2020-12-31')
print(df13.shape)

df_1314_corr = df13.corrwith(df14.set_axis(df13.index, axis='index', copy=False))  # correlate df13 with df14 after giving df14 the index of df13
print(df_1314_corr)

(4, 12)
(3, 12)
(3, 12)
25-IECS    0.999946
26-IECS   -0.632077
27-IECS    0.387007
34-IECS   -0.553395
35-IECS    0.082207
36-IECS    0.640311
63-IECS   -0.822186
64-IECS   -0.565009
65-IECS   -0.936921
67-IECS    0.628888
9-IECS    -0.395351
mean      -0.282644
dtype: float64


According to the output above, apartments 25, 36 and 67 have a correlation above the 0.5 threshold; note that each series has only three points at this timestep.  
Contrary to what we expected, the correlation of the mean is still low and does not increase with the timestep.  
We now shift the analysis to compare two consecutive three-month periods.
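
With only three rows per dataframe, a correlation coefficient is fragile. The sketch below is not part of the original analysis: it draws pairs of unrelated random series of three points each (an assumption chosen purely for illustration) and measures how often their Pearson correlation exceeds 0.5 by chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_points = 10_000, 3

# Independent random series: any correlation between x and y is accidental
x = rng.normal(size=(n_trials, n_points))
y = rng.normal(size=(n_trials, n_points))

# Row-wise Pearson correlation
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

share_above = np.mean(r > 0.5)
print(f"share of unrelated pairs with r > 0.5: {share_above:.2f}")
```

A sizeable share of unrelated pairs passes the threshold, so values obtained from three points are best read with caution.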

## Comparing two dataframes of three months each


<div class="alert alert-info">
<strong>Details :</strong><br>
We repeat the experiment with two dataframes of three months each, six months of data in total. We start by computing the correlation of each apartment with itself, and of the mean value. If the correlation is insufficient, we increase the resampling timestep. For this analysis we split the data into two equivalent dataframes so they can be compared.
</div>

We start by splitting the data into two dataframes. The third analysis covers two three-month periods: we compare them to see whether the 10-minute samples of the first three months correlate with those of the following three months.  

### Data preprocessing:  

Both dataframes must have the same number of rows and columns; otherwise the correlation cannot be computed. The dataframes may have different numbers of rows even when the periods have the same length, because rows containing NaN values were removed and those rows can be anywhere in the dataset.   
To fix this, we preprocess the data: we print both dataframes and compare their numbers of rows and columns, then remove rows from the longer one until the row counts match. Without this step the dataframes cannot be compared and an error is raised.

In [16]:
df15=df['2020-12-01 00:00:00' : '2021-02-28 00:00:00']
df16=df['2021-03-01 00:00:00' : '2021-05-28 04:00:00']
print(df15)
print(df16)


df_1516_corr = df15.corrwith(df16.set_axis(df15.index, axis='index', copy=False))  # correlate df15 with df16 after giving df16 the index of df15
print(df_1516_corr)

                     25-IECS  26-IECS  27-IECS  34-IECS  35-IECS  36-IECS  \
0                                                                           
2020-12-01 12:10:00      0.0      0.0     0.00     0.00     0.00      0.0   
2020-12-01 12:20:00      0.0      0.0     0.00     0.00    23.76      0.0   
2020-12-01 12:30:00      0.0      0.0     5.76     0.00     0.00      0.0   
2020-12-01 12:40:00      0.0      0.0     0.00     5.94    11.88      0.0   
2020-12-01 12:50:00      0.0      0.0     0.00     0.00     0.00      0.0   
...                      ...      ...      ...      ...      ...      ...   
2021-02-27 23:20:00      0.0      0.0     0.00     0.00     0.00      0.0   
2021-02-27 23:30:00      0.0      0.0     0.00     0.00     0.00      0.0   
2021-02-27 23:40:00      0.0      0.0     0.00     0.00     0.00      0.0   
2021-02-27 23:50:00      0.0      0.0     0.00     0.00     0.00      0.0   
2021-02-28 00:00:00      0.0      0.0     0.00     0.00     0.00      0.0   

As the correlation values are insufficient, we increase the sampling timestep to one hour. If the correlation holds at this timestep, the apartment would need a tank covering one hour of hot water supply.  
We preprocess the data as before.  
We then repeat the analysis with the `corrwith` and `set_axis` functions, as described earlier.

In [17]:
df17=df15.resample('H').mean()
df18=df16.resample('H').mean()
print(df17)
print(df18)

df17=df17.drop('2020-12-01 12:00:00')
df17=df17.drop('2020-12-01 13:00:00')
df17=df17.drop('2020-12-01 14:00:00')
df17=df17.drop('2020-12-01 15:00:00')
df17=df17.drop('2020-12-01 16:00:00')
df17=df17.drop('2020-12-01 17:00:00')
df17=df17.drop('2020-12-01 18:00:00')
df17=df17.drop('2020-12-01 19:00:00')

print(df17.shape)

df_1718_corr = df17.corrwith(df18.set_axis(df17.index, axis='index', copy=False))  # correlate df17 with df18 after giving df18 the index of df17
print(df_1718_corr)

                     25-IECS  26-IECS  27-IECS  34-IECS  35-IECS  36-IECS  \
0                                                                           
2020-12-01 12:00:00    0.000    0.000    1.152    1.188    7.128    0.000   
2020-12-01 13:00:00    0.000    0.000    0.000    4.950    3.960    0.960   
2020-12-01 14:00:00    0.000    0.000    0.000   10.890    0.000    0.000   
2020-12-01 15:00:00    0.000    0.960    0.000    0.000    5.880   14.760   
2020-12-01 16:00:00    6.000    0.000    0.000    0.000    1.980    0.000   
...                      ...      ...      ...      ...      ...      ...   
2021-02-27 20:00:00    1.152    1.152    0.000   22.716    0.000   82.872   
2021-02-27 21:00:00    0.000    0.000    0.000    0.000    4.920    0.960   
2021-02-27 22:00:00    0.000    0.000    0.000    0.000    0.000    0.000   
2021-02-27 23:00:00    0.000    0.000    0.000    0.000    0.000    0.000   
2021-02-28 00:00:00    0.000    0.000    0.000    0.000    0.000    0.000   

As the correlation values are insufficient, we increase the sampling timestep to one day. If the correlation holds at this timestep, the apartment would need a tank covering one day of hot water supply.  
We preprocess the data as before.  
We then repeat the analysis with the `corrwith` and `set_axis` functions, as described earlier.

In [18]:
df19=df15.resample('D').mean()
df20=df16.resample('D').mean()
print(df19)
print(df20)

df19=df19.drop('2020-12-01')

print(df19.shape)

df_1920_corr = df19.corrwith(df20.set_axis(df19.index, axis='index', copy=False))  # correlate df19 with df20 after giving df20 the index of df19
print(df_1920_corr)

             25-IECS   26-IECS   27-IECS   34-IECS   35-IECS    36-IECS  \
0                                                                         
2020-12-01  2.917059  0.349412  0.084706  3.936176  2.085882   4.500000   
2020-12-02  4.700571  1.913143  1.350000  1.740857  6.204857   3.648857   
2020-12-03  1.392701  0.959124  1.589781  3.040292  2.459562   2.267737   
2020-12-04  2.684118  0.918529  1.170000  2.320147  3.531176   2.459118   
2020-12-05  0.686331  1.577266  2.561439  4.546619  2.165180   5.630504   
...              ...       ...       ...       ...       ...        ...   
2021-02-24  2.003824  1.487647  0.780882  3.719118  5.775882   3.285000   
2021-02-25  1.982590  1.538417  1.447770  1.326043  3.532662   3.822734   
2021-02-26  1.501333  0.746667  1.093333  3.528000  3.477333   3.605333   
2021-02-27  6.151533  1.037956  2.543650  4.655036  3.544818  11.157372   
2021-02-28  0.000000  0.000000  0.000000  0.000000  0.000000   0.000000   

              63-IECS   

As the correlation values are insufficient, we increase the sampling timestep to 10 days. If the correlation holds at this timestep, the apartment would need a tank covering ten days of hot water supply.  
We preprocess the data as before.  
We then repeat the analysis with the `corrwith` and `set_axis` functions, as described earlier.

In [19]:
df21=df15.resample('10D').mean()
df22=df16.resample('10D').mean()
print(df19)  # prints the daily frames df19 and df20, not the 10-day frames df21 and df22
print(df20)



df_2122_corr = df21.corrwith(df22.set_axis(df21.index, axis='index', copy=False))  # correlate df21 with df22 after giving df22 the index of df21
print(df_2122_corr)

             25-IECS   26-IECS   27-IECS   34-IECS   35-IECS    36-IECS  \
0                                                                         
2020-12-02  4.700571  1.913143  1.350000  1.740857  6.204857   3.648857   
2020-12-03  1.392701  0.959124  1.589781  3.040292  2.459562   2.267737   
2020-12-04  2.684118  0.918529  1.170000  2.320147  3.531176   2.459118   
2020-12-05  0.686331  1.577266  2.561439  4.546619  2.165180   5.630504   
2020-12-06  2.191079  3.193381  1.577266  5.186331  6.130360   5.876547   
...              ...       ...       ...       ...       ...        ...   
2021-02-24  2.003824  1.487647  0.780882  3.719118  5.775882   3.285000   
2021-02-25  1.982590  1.538417  1.447770  1.326043  3.532662   3.822734   
2021-02-26  1.501333  0.746667  1.093333  3.528000  3.477333   3.605333   
2021-02-27  6.151533  1.037956  2.543650  4.655036  3.544818  11.157372   
2021-02-28  0.000000  0.000000  0.000000  0.000000  0.000000   0.000000   

              63-IECS   

As the correlation values are insufficient, we increase the sampling timestep to one month. If the correlation holds at this timestep, the apartment would need a tank covering one month of hot water supply.  
We preprocess the data as before.  
We then repeat the analysis with the `corrwith` and `set_axis` functions, as described earlier.

In [20]:
df23=df15.resample('M').mean()
df24=df16.resample('M').mean()
print(df23)
print(df24)



df_2324_corr = df23.corrwith(df24.set_axis(df23.index, axis='index', copy=False))  # correlate df23 with df24 after giving df24 the index of df23
print(df_2324_corr)

             25-IECS   26-IECS   27-IECS   34-IECS   35-IECS   36-IECS  \
0                                                                        
2020-12-31  2.584436  1.843444  1.775874  4.311243  4.416933  3.492764   
2021-01-31  2.601626  2.257697  1.433318  4.798633  4.064718  3.772303   
2021-02-28  2.258331  1.357464  1.269535  3.751156  4.134992  3.750915   

             63-IECS   64-IECS   65-IECS   67-IECS    9-IECS      mean  
0                                                                       
2020-12-31  5.880114  0.993112  5.440696  2.756923  1.290780  3.162393  
2021-01-31  5.606222  1.024511  3.885310  2.403820  1.499041  3.031564  
2021-02-28  7.158876  1.048315  7.925104  1.574045  1.363531  3.235660  
             25-IECS   26-IECS   27-IECS   34-IECS   35-IECS   36-IECS  \
0                                                                        
2021-03-31  1.774501  2.313914  1.230475  4.609124  5.074473  3.991454   
2021-04-30  2.654152  2.081092  1.259478  

The correlation is sufficient for one apartment, number 34, but not for the others.  
The mean value performs no better than the individual apartments.
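
The procedure repeated throughout this notebook (resample, equalize the row counts, realign the index, correlate, compare with 0.5) could be condensed into a loop. The sketch below is one possible way to do it, not the method used above: `first_passing_timestep` is a hypothetical helper, the list of timesteps is an assumption, and rows are always trimmed from the end rather than chosen by hand.

```python
import pandas as pd

def first_passing_timestep(first, second, rules=("10min", "60min", "1D"), threshold=0.5):
    """Try each timestep in turn; stop at the first one where any column exceeds the threshold.

    Returns (rule, correlations) for that timestep, or for the last one tried.
    """
    result = None
    for rule in rules:
        a = first.resample(rule).mean()
        b = second.resample(rule).mean()
        n = min(len(a), len(b))              # equalize row counts by trimming the tail
        a, b = a.iloc[:n], b.iloc[:n]
        corr = a.corrwith(b.set_axis(a.index, axis="index"))
        result = (rule, corr)
        if (corr > threshold).any():
            break
    return result
```

Called as `first_passing_timestep(df1, df2)`, it would return the first timestep at which at least one apartment passes, together with the correlation of every column at that timestep.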