# DSLab Homework 1 - Data Science with CO2

## Hand-in Instructions

- __Due: 23.03.2021 23h59 CET__
- `git push` your final version to the master branch of your group's Renku repository before the deadline
- check that `Dockerfile`, `environment.yml` and `requirements.txt` are properly written
- add the necessary comments and discussion to make your code readable

## Carbosense

The project Carbosense establishes a uniquely dense CO2 sensor network across Switzerland to provide near-real time information on man-made emissions and CO2 uptake by the biosphere. The main goal of the project is to improve the understanding of the small-scale CO2 fluxes in Switzerland and concurrently to contribute to a better top-down quantification of the Swiss CO2 emissions. The Carbosense network has a spatial focus on the City of Zurich where more than 50 sensors are deployed. Network operations started in July 2017.

<img src="http://carbosense.wdfiles.com/local--files/main:project/CarboSense_MAP_20191113_LowRes.jpg" width="500">

<img src="http://carbosense.wdfiles.com/local--files/main:sensors/LP8_ZLMT_3.JPG" width="156">  <img src="http://carbosense.wdfiles.com/local--files/main:sensors/LP8_sensor_SMALL.jpg" width="300">

## Description of the homework

In this homework, we will curate a set of **CO2 measurements** taken by cheap but inaccurate sensors that have been deployed in the city of Zurich as part of the Carbosense project. The goal of the exercise is twofold: 

1. Learn how to deal with real world sensor timeseries data, and organize them efficiently using python dataframes.

2. Apply data science tools to model the measurements, and use the learned model to process them (e.g., detect drifts in the sensor measurements). 

The sensor network consists of 46 sites, located in different parts of the city. Each site contains three different sensors measuring (a) **CO2 concentration**, (b) **temperature**, and (c) **humidity**. Besides these measurements, we have the following additional information that can be used to process the measurements: 

1. The **altitude** at which the CO2 sensor is located, and the GPS coordinates (latitude, longitude).

2. A clustering of the city of Zurich into 17 different city **zones**, and the zone to which each sensor belongs. Some characteristic zones are industrial area, residential area, forest, glacier, lake, etc.

## Prior knowledge

The average CO2 concentration in a city is approximately 400 ppm. However, the exact measurement at each site depends on parameters such as the temperature, the humidity, the altitude, and the level of traffic around the site. For example, sensors positioned at high altitude (mountains, forests) are expected to have a much lower and more uniform level of CO2 than sensors positioned in a business area with much higher traffic activity. Moreover, we know that the CO2 measurements depend strongly on temperature and humidity.

Given this knowledge, you are asked to define an algorithm that curates the data, by detecting and removing potential drifts. **The algorithm should be based on the fact that sensors in similar conditions are expected to have similar measurements.** 

## To start with

The following csv files in the `../data/carbosense-raw/` folder will be needed: 

1. `CO2_sensor_measurements.csv`
    
   __Description__: It contains the CO2 measurements `CO2`, the name of the site `LocationName`, a unique sensor identifier `SensorUnit_ID`, and the time at which the measurement was taken `timestamp`.
    
2. `temperature_humidity.csv`

   __Description__: It contains the temperature and the humidity measurements for each sensor identifier, at each timestamp `Timestamp`. For each `SensorUnit_ID`, the temperature and the humidity can be found in the corresponding columns of the dataframe `{SensorUnit_ID}.temperature`, `{SensorUnit_ID}.humidity`.
    
3. `sensors_metadata.csv`

   __Description__: It contains the name of the site `LocationName`, the zone index `zone`, the altitude in meters `altitude`, the longitude `lon`, and the latitude `lat`. 

Import the following python packages:

In [None]:
import pandas as pd
import numpy as np
import sklearn
import plotly.express as px
import plotly.graph_objects as go
import os
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer
from sklearn.linear_model import LinearRegression

In [None]:
pd.options.mode.chained_assignment = None

## PART I: Handling time series with pandas (10 points)

### a) **8/10**

Merge the `CO2_sensor_measurements.csv`, `temperature_humidity.csv`, and `sensors_metadata.csv`, into a single dataframe. 

* The merged dataframe contains:
    - index: the time instance `timestamp` of the measurements
    - columns: the location of the site `LocationName`, the sensor ID `SensorUnit_ID`, the CO2 measurement `CO2`, the `temperature`, the `humidity`, the `zone`, the `altitude`, the longitude `lon` and the latitude `lat`.

| timestamp | LocationName | SensorUnit_ID | CO2 | temperature | humidity | zone | altitude | lon | lat |
|:---------:|:------------:|:-------------:|:---:|:-----------:|:--------:|:----:|:--------:|:---:|:---:|
|    ...    |      ...     |      ...      | ... |     ...     |    ...   |  ... |    ...   | ... | ... |



* For each measurement (CO2, humidity, temperature), __take the average over an interval of 30 min__. 

* If there are missing measurements, __interpolate them linearly__ from measurements that are close by in time.

__Hints__: The following methods could be useful

1. ```python 
pandas.DataFrame.resample()
``` 
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
    
2. ```python
pandas.DataFrame.interpolate()
```
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html
    
3. ```python
pandas.DataFrame.mean()
```
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html
    
4. ```python
pandas.DataFrame.append()
```
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
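
The first three hints can be combined as in the following minimal sketch. It uses a made-up 10-minute series (the values and the gap are invented for illustration, not taken from the Carbosense files) to show how 30-minute averaging leaves empty bins as `NaN` and how linear interpolation then fills them.

```python
import numpy as np
import pandas as pd

# Toy series sampled every 10 minutes with a gap between 00:20 and 01:30.
# The values are made up for illustration only.
idx = pd.to_datetime([
    "2017-10-01 00:00", "2017-10-01 00:10", "2017-10-01 00:20",
    "2017-10-01 01:30", "2017-10-01 01:40", "2017-10-01 01:50",
])
toy = pd.DataFrame({"CO2": [400.0, 402.0, 404.0, 420.0, 422.0, 424.0]}, index=idx)

# 1) Average over 30-minute bins; bins without any sample become NaN.
half_hourly = toy.resample("30min").mean()

# 2) Fill the empty bins linearly from the neighbouring bins.
filled = half_hourly.interpolate(method="linear")

print(half_hourly)
print(filled)
```

Averaging first and interpolating afterwards guarantees that every 30-minute bin ends up with a value, whereas interpolating first only fills gaps that already appear as `NaN` rows in the raw data.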

In [None]:
!git lfs pull

In [None]:
co2_measurements = pd.read_csv("../data/carbosense-raw/CO2_sensor_measurements.csv",
                                sep="\t",
                                parse_dates=['timestamp'])
# rename for consistency of column names across the dataframes
co2_measurements = co2_measurements.rename({'SensorUnit_ID':'sensor'}, axis=1)

In [None]:
co2_measurements.head()

In [None]:
co2_measurements.isnull().any(None) # verify null values

In [None]:
co2_measurements['sensor'] = co2_measurements['sensor'].astype(int)

In [None]:
# function to resample the CO2 measurements dataframe
def co2_measurements_resample(df):
    res = df['CO2'].resample('30min').mean().to_frame()
    # Some periods of 30min have 0 datapoints, therefore we need to interpolate
    res = res.interpolate('linear', axis=0)
    # save the location name for each sensor
    res['LocationName'] = df['LocationName'].values[0]
    return res

co2_measurements = co2_measurements.set_index('timestamp') \
                                   .groupby('sensor') \
                                   .apply(co2_measurements_resample)

In [None]:
co2_measurements.isnull().any(None)

In [None]:
temp_humidity = pd.read_csv("../data/carbosense-raw/temperature_humidity.csv",
                            sep="\t",
                            parse_dates=['Timestamp'])
# rename for consistency of column names across the dataframes
temp_humidity = temp_humidity.rename({'Timestamp':'timestamp'}, axis=1)

In [None]:
temp_humidity.head()

In [None]:
# Melt the dataframe into long format, keeping only the timestamp as identifier
temp_humidity = pd.melt(temp_humidity, id_vars='timestamp', var_name='sensor.temp_humidity', value_name='measurement')
# split the variable column to get the sensor ID and the measurement name in separate columns
temp_humidity[['sensor','temp_humidity']] = temp_humidity['sensor.temp_humidity'].str.split('.', expand=True)
# Finally pivot the dataframe to get it in a desired format
temp_humidity = temp_humidity.pivot(index=['timestamp','sensor'], columns='temp_humidity', values='measurement').reset_index()

In [None]:
temp_humidity

In [None]:
temp_humidity['sensor'] = temp_humidity['sensor'].astype(int)

In [None]:
temp_humidity.isnull().any(None)

In [None]:
# resample the temperature humidity dataframe every 30 min
temp_humidity = temp_humidity.set_index('timestamp') \
                             .groupby('sensor') \
                             .apply(lambda df: df[['temperature', 'humidity']]
                                               .interpolate('linear', axis=0) \
                                               .resample('30min').mean())

In [None]:
temp_humidity.isnull().any(None)

In [None]:
temp_humidity

In [None]:
metadata = pd.read_csv("../data/carbosense-raw/sensors_metadata.csv", sep="\t")

In [None]:
metadata.head()

In [None]:
# merge all three dataframes together
temp_humidity = temp_humidity.reset_index()
co2_measurements = co2_measurements.reset_index()
final_df = pd.merge(temp_humidity, co2_measurements, how='inner', right_on=['sensor','timestamp'], left_on=['sensor','timestamp'])
final_df = pd.merge(final_df, metadata, left_on='LocationName', right_on='LocationName', validate='m:1')

In [None]:
final_df.isnull().any(None)

In [None]:
final_df = final_df.set_index('timestamp')

In [None]:
final_df

### b) **2/10** 

Export the curated and ready to use timeseries to a csv file, and properly push the merged csv to Git LFS.

In [None]:
os.chdir('..')

In [None]:
!git lfs track -l 

In [None]:
save_path = 'data/carbosense-raw/final_df.csv'

In [None]:
!git lfs track data/carbosense-raw/final_df.csv

In [None]:
!git lfs track -l 

In [None]:
final_df.to_csv(save_path, sep='\t')

In [None]:
# !git add  data/carbosense-raw/final_df.csv

In [None]:
# !git commit -m "df csv file"

In [None]:
# !git push

In [None]:
os.chdir('notebooks')

## PART II: Data visualization (15 points)

### a) **5/15** 
Group the sites based on their altitude, by performing K-means clustering. 
- Find the optimal number of clusters using the [Elbow method](https://en.wikipedia.org/wiki/Elbow_method_(clustering)). 
- Write out the formula of the metric you use for the Elbow curve. 
- Perform clustering with the optimal number of clusters and add an additional column `altitude_cluster` to the dataframe of the previous question indicating the altitude cluster index. 
- Report your findings.

__Note__: [Yellowbrick](http://www.scikit-yb.org/) is a very nice Machine Learning Visualization extension to scikit-learn, which might be useful to you. 

In [None]:
metadata

In [None]:
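# take an explicit copy so that adding a column later does not write into a view of metadata
site_data = metadata[['LocationName', 'altitude', 'lat', 'lon']].copy()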

In [None]:
# extracting altitude values of the sites 
X= site_data.altitude.values.reshape(-1,1)

In [None]:
# Here we use the Elbow method to choose the optimal number of clusters
# We use KElbowVisualizer from the Yellowbrick library to fit the model with different K values and choose the optimal one
model= KMeans()
visualizer = KElbowVisualizer(model, k=(2,12), timings = False)
visualizer.fit(X)
visualizer.show()
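
The exercise asks for the formula behind the elbow curve. As described in the Yellowbrick documentation, the default `distortion` metric of `KElbowVisualizer` is the sum of squared distances from each point to its assigned cluster centre, $D(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$, where $C_j$ is the $j$-th cluster and $\mu_j$ its centroid. The sketch below computes this quantity by hand and compares it with scikit-learn's `inertia_`; the altitude values are hypothetical stand-ins for `metadata['altitude']`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical altitudes in metres, standing in for metadata['altitude'].
altitudes = np.array(
    [405.0, 410.0, 420.0, 430.0, 480.0, 490.0, 510.0, 600.0, 620.0, 870.0]
).reshape(-1, 1)


def distortion(X, labels, centers):
    """Sum of squared Euclidean distances of each point to its assigned centroid."""
    return float(((X - centers[labels]) ** 2).sum())


scores = {}
for k in range(1, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(altitudes)
    scores[k] = distortion(altitudes, km.labels_, km.cluster_centers_)
    # The hand-computed value should agree with scikit-learn's inertia_.
    assert np.isclose(scores[k], km.inertia_)

for k, score in scores.items():
    print(k, round(score, 1))
```

The elbow is the value of `k` after which adding another cluster reduces the distortion only marginally.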

In [None]:
# From the visualization in the previous cell, the optimal k by the elbow method is 4
optimal_k = 4

In [None]:
# We fit the model with the optimal number of clusters
model = KMeans(n_clusters= optimal_k)
model.fit(X)

In [None]:
# we add the assigned cluster index to the dataframe
assigned_clusters = model.labels_
site_data['altitude_cluster'] = assigned_clusters

In [None]:
site_data.head()

In [None]:
# We add the altitude_cluster column to the dataframe 
final_df = pd.merge(final_df.reset_index(), site_data[["LocationName", "altitude_cluster"]], on= "LocationName").set_index("timestamp")

In [None]:
final_df

### b) **4/15** 

Use `plotly` (or other similar graphing libraries) to create an interactive plot of the monthly median CO2 measurement for each site with respect to the altitude. 

Add proper title and necessary hover information to each point, and give the same color to stations that belong to the same altitude cluster.
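
A monthly median per site can be computed by grouping on both the site name and the calendar month. The following is a minimal sketch on made-up records; the site names `SITE_A` and `SITE_B` and all values are hypothetical. Since the dataset here covers a single month, grouping by site alone gives the same numbers, but the explicit month key keeps the code valid for longer time ranges.

```python
import pandas as pd

# Made-up records for two hypothetical sites.
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-10-01 00:00", "2017-10-15 12:00", "2017-10-31 23:30",
        "2017-10-01 00:00", "2017-10-15 12:00", "2017-10-31 23:30",
    ]),
    "LocationName": ["SITE_A"] * 3 + ["SITE_B"] * 3,
    "CO2": [400.0, 410.0, 450.0, 380.0, 390.0, 385.0],
})

# Group by site and by calendar month, then take the median CO2 of each group.
monthly_median = (
    toy.groupby(["LocationName", toy["timestamp"].dt.to_period("M")])["CO2"]
       .median()
       .rename("CO2_median")
       .reset_index()
       .rename(columns={"timestamp": "month"})
)
print(monthly_median)
```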

In [None]:
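# restrict to the numeric columns so that the median is not attempted on the timestamp column
df = co2_measurements.groupby(by='LocationName')[['sensor', 'CO2']].median()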

In [None]:
df = df.drop(columns= ["sensor"]).rename(columns= {"CO2" : "CO2_median"})

In [None]:
df

In [None]:
# We merge the two dataframes found
result = pd.merge(df, site_data, on= 'LocationName') 
result["altitude_cluster"] = result["altitude_cluster"].astype(str)

In [None]:
result.head()

In [None]:
# we visualize the monthly median as a function of the altitude colored according to the clustering result
px.scatter(result, x= 'altitude', y= 'CO2_median', color= 'altitude_cluster', title= "Clustering of sensors by altitude", hover_name="LocationName")

### c) **6/15**

Use `plotly` (or other similar graphing libraries) to plot an interactive time-varying density heatmap of the mean daily CO2 concentration for all the stations. Add proper title and necessary hover information.

__Hints:__ Check following pages for more instructions:
- [Animations](https://plotly.com/python/animations/)
- [Density Heatmaps](https://plotly.com/python/mapbox-density-heatmaps/)
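
When computing daily means for the animation frames, note that `.dt.day` returns the day of the month, which is only unambiguous while the data stays within a single month. A sketch of a more general grouping key, using `.dt.floor("D")` on made-up data for a hypothetical site `SITE_A`, could look like this:

```python
import pandas as pd

# Made-up records spanning two months for one hypothetical site.
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-10-01 00:00", "2017-10-01 12:00",
        "2017-10-02 00:00", "2017-11-01 00:00",
    ]),
    "LocationName": ["SITE_A"] * 4,
    "CO2": [400.0, 420.0, 430.0, 500.0],
})

# Floor each timestamp to midnight so that 1 October and 1 November stay separate.
daily = (
    toy.groupby(["LocationName", toy["timestamp"].dt.floor("D")])["CO2"]
       .mean()
       .reset_index()
       .rename(columns={"timestamp": "day"})
)
# A string label is convenient as an animation frame key.
daily["day"] = daily["day"].dt.strftime("%Y-%m-%d")
print(daily)
```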

In [None]:
# We compute the mean daily CO2 measurements for each site 
# only the CO2 column is averaged; the grouping keys become the columns LocationName and timestamp (day of month)
daily_co2_measurements = co2_measurements.groupby(by=["LocationName", co2_measurements['timestamp'].dt.day])[["CO2"]].mean().reset_index()

In [None]:
daily_co2_measurements.head()

In [None]:
site_data.head()

In [None]:
# We merge the two dataframes 
daily_co2_measurements = pd.merge(daily_co2_measurements, site_data, on= ["LocationName"])

In [None]:
daily_co2_measurements.head()

In [None]:
daily_co2_measurements = daily_co2_measurements.rename(columns={'timestamp' : 'day'})


In [None]:
# compute min and max CO2 mean 
min_CO2_mean = daily_co2_measurements.CO2.min()
max_CO2_mean = daily_co2_measurements.CO2.max()

In [None]:
fig = px.density_mapbox(daily_co2_measurements, lat='lat', lon='lon', z='CO2', radius=10,
                        center=dict(lat=daily_co2_measurements.lat.mean(), lon=daily_co2_measurements.lon.mean()), zoom=11,
                        mapbox_style="stamen-terrain", animation_frame='day', animation_group= 'LocationName', 
                        hover_name='LocationName', title= 'Time varying Density heatmap of mean CO2 measurements per day',
                        range_color=[0, 500], height=900, width=900, opacity=0.8)

fig.show()

## PART III: Model fitting for data curation (35 points)

### a) **2/35**

The domain experts in charge of these sensors report that one of the CO2 sensors `ZSBN` is exhibiting a drift on Oct. 24. Verify the drift by visualizing the CO2 concentration of the drifting sensor and compare it with some other sensors from the network. 

In [None]:
# keep only CO2 values from Oct 20 onwards and for a few selected sites in order to compare
df_after_oct_20 = final_df[final_df.index.day >= 20]
df_comparison = df_after_oct_20[df_after_oct_20['LocationName'].isin(['ZSBN','ZLDW','SMHK','ZWCH'])]

In [None]:
df_comparison

In [None]:
fig = px.line(df_comparison,
              y='CO2',
              color='LocationName',
              labels={
                'timestamp':'time',
                'CO2':'CO2 (ppm)',
                'LocationName':'Sensor'
              }, 
              title='CO2 Level (ppm) after October 20th'
             )
fig.update_xaxes(
    dtick=24*60*60*1000
)
fig.update_layout(
    hovermode='x unified'
)
fig.show()

### b) **8/35**

The domain experts ask you if you could reconstruct the CO2 concentration of the drifting sensor had the drift not happened. You decide to:
- Fit a linear regression model to the CO2 measurements of the site, by considering as features the covariates not affected by the malfunction (such as temperature and humidity)
- Create an interactive plot with `plotly` (or other similar graphing libraries):
    - the actual CO2 measurements
    - the values obtained by the prediction of the linear model for the entire month of October
    - the __confidence interval__ obtained from cross validation
- What do you observe? Report your findings.

__Note:__ Cross validation on time series is different from that on other kinds of datasets. The following diagram illustrates the series of training sets (in orange) and validation sets (in blue). For more on time series cross validation, there are a lot of interesting articles available online. scikit-learn provides a nice method [`sklearn.model_selection.TimeSeriesSplit`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html).

![ts_cv](https://player.slideplayer.com/86/14062041/slides/slide_28.jpg)
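
The expanding-window scheme in the diagram can be inspected directly with `TimeSeriesSplit`. The sketch below uses twelve dummy samples instead of the sensor data and prints the indices of each fold: every training set ends before its validation set starts, so the model is never validated on data that precedes its training data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve dummy, time-ordered samples (placeholders for real measurements).
X = np.arange(12).reshape(-1, 1)

folds = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```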

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
import scipy.stats
import statsmodels.api as sm

In [None]:
# Dataframe for regression, only ZSBN data is kept, with the unaffected covariates
reg_df = final_df.loc[final_df['LocationName'] == 'ZSBN', ['temperature', 'humidity', 'CO2']]
reg_df['time'] = reg_df.reset_index().index

In [None]:
# Function that performs rolling cross validation and returns the best model (According to MSE)
# and the used training set
def rolling_cross_validation(X, y):
    # split the dataset
    series_split = TimeSeriesSplit(n_splits=50)
    model = LinearRegression()
    min_mse = np.inf
    final_train_ind = None
    # iterate on the different train-test split in order to pick the best model
    for train_indices, test_indices in series_split.split(X):
        train_x, train_y = X[train_indices], y[train_indices]
        test_x, test_y = X[test_indices], y[test_indices]
        model.fit(train_x, train_y)
        mse = mean_squared_error(test_y, model.predict(test_x))
        if mse <= min_mse:
            min_mse = mse
            final_train_ind = train_indices
    return final_train_ind 

In [None]:
# only keep values before OCT 24 for training
X_train = reg_df.loc[reg_df.index.day < 24 , ['temperature', 'humidity', 'time']].values
y_train = reg_df.loc[reg_df.index.day < 24, 'CO2'].values
X_pred = reg_df[['temperature', 'humidity', 'time']].values

In [None]:
train_indices = rolling_cross_validation(X_train, y_train)
X_train = X_train[train_indices]
y_train = y_train[train_indices]

In [None]:
# We use StatsModels in order to compute the 95% CIs
ols = sm.OLS(y_train, sm.add_constant(X_train))
results = ols.fit()
predictions = results.get_prediction(sm.add_constant(X_pred))
conf_intervals =  predictions.conf_int()
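
The interval above is the analytical one reported by statsmodels. One possible way to derive an interval from cross validation instead, as the question asks, is to pool the out-of-fold residuals of a `TimeSeriesSplit` and use their spread as the band width. The following is only a sketch under strong assumptions: the data is synthetic, and the residuals are assumed to be roughly normal with constant variance, so that `1.96 * sigma` approximates a 95% band.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-ins for temperature and humidity; not the Carbosense data.
X = np.column_stack([rng.normal(12, 4, n), rng.normal(75, 10, n)])
y = 420 - 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 5, n)

# Collect validation residuals over the expanding-window folds.
residuals = []
for train_idx, val_idx in TimeSeriesSplit(n_splits=5).split(X):
    fold_model = LinearRegression().fit(X[train_idx], y[train_idx])
    residuals.append(y[val_idx] - fold_model.predict(X[val_idx]))
residuals = np.concatenate(residuals)

# Band width from the out-of-fold residual spread (normality is an assumption).
sigma = residuals.std(ddof=1)
model = LinearRegression().fit(X, y)
pred = model.predict(X)
lower, upper = pred - 1.96 * sigma, pred + 1.96 * sigma

coverage = np.mean((y >= lower) & (y <= upper))
print(f"sigma={sigma:.2f}, empirical coverage={coverage:.2%}")
```

In the notebook, `X` and `y` would correspond to the pre-drift rows only, and the resulting band would be drawn around the prediction for the entire month.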

In [None]:
reg_df['lower_ci'] = conf_intervals[:, 0]
reg_df['upper_ci'] = conf_intervals[:, 1]
reg_df['predicted'] = predictions.predicted_mean

In [None]:
fig = go.Figure([go.Scatter(x=reg_df.index,
                              y=reg_df['predicted'],
                              mode='lines', name='Predicted'), 
                 go.Scatter(x=reg_df.index,
                            y=reg_df['CO2'],
                            mode='lines', name='Measured'),
                 go.Scatter( x=reg_df.index.append(reg_df.index[::-1]),
                             y=list(reg_df['upper_ci']) + list(reg_df['lower_ci'][::-1]),
                             fill='toself',
                             fillcolor='rgba(0,100,80,0.2)',
                             line=dict(color='rgba(255,255,255,0)'),
                             name='95 % CI')
                 ])

fig.update_xaxes(dtick=24*60*60*1000)
fig.update_layout(hovermode='x unified', title='Measured vs Predicted CO2 levels For ZSBN Sensor in October')
fig.show()

### c) **10/35**

In your next attempt to solve the problem, you decide to exploit the fact that the CO2 concentrations, as measured by the sensors __experiencing similar conditions__, are expected to be similar.

- Find the sensors sharing similar conditions with `ZSBN`. Explain your definition of "similar condition".
- Fit a linear regression model to the CO2 measurements of the site, by considering as features:
    - the information provided by similar sensors
    - the covariates associated with the faulty sensor that were not affected by the malfunction (such as temperature and humidity).
- Create an interactive plot with `plotly` (or other similar graphing libraries):
    - the actual CO2 measurements
    - the values obtained by the prediction of the linear model for the entire month of October
    - the __confidence interval__ obtained from cross validation
- What do you observe? Report your findings.

## Answer

We define "similar conditions" as being at a similar altitude and having a similar average environmental state. Hence, the sensors similar to ZSBN are those in the same altitude cluster whose average temperature and humidity are close to those of ZSBN. 

In [None]:
final_df.head()

In [None]:
# average only the numeric columns per site
aver_conditions = final_df.groupby("LocationName").mean(numeric_only=True)

In [None]:
aver_conditions.head()

In [None]:
result

In [None]:
#aver_conditions = pd.merge(aver_conditions.reset_index(), result, on= "LocationName")[["LocationName", "temperature", "humidity", "altitude_cluster"]]
aver_conditions = aver_conditions.reset_index()[["LocationName", "temperature", "humidity", "altitude_cluster"]]

In [None]:
aver_conditions

In [None]:
# extracting the cluster of ZSBN sensor
zsbn_cluster = aver_conditions[aver_conditions.LocationName == "ZSBN"].altitude_cluster.iloc[0]

In [None]:
# taking only sensors in the same cluster as ZSBN
aver_conditions = aver_conditions[aver_conditions.altitude_cluster == zsbn_cluster]

In [None]:
aver_conditions.head()

In [None]:
# Extracting average humidity and average temperature of ZSBN sensor
zsbn_humidity = aver_conditions[aver_conditions.LocationName == "ZSBN"]["humidity"].iloc[0]
zsbn_temp = aver_conditions[aver_conditions.LocationName == "ZSBN"]["temperature"].iloc[0]

In [None]:
# extracting similar sensors. Here the choice of the constants can be changed
similar = aver_conditions.loc[(np.abs(aver_conditions.temperature - zsbn_temp) <= 0.5) & (np.abs(aver_conditions.humidity - zsbn_humidity) <= 1) ]

In [None]:
# extracting the location name of the sensors as a set
similar_sensors = set(similar[similar["LocationName"] != "ZSBN"].LocationName.values)

In [None]:
similar_sensors

In [None]:
features = final_df[final_df.LocationName == "ZSBN"][["temperature", "humidity"]]

In [None]:
features.head()

In [None]:
X = np.array(features)

In [None]:
X.shape

In [None]:
# constructing the feature vector i.e [ZSBN_temperature, ZSBN_humidity, CO2_measurements_of_similar_sensors]
for sensor in similar_sensors:
    t = final_df[final_df.LocationName == sensor]["CO2"]
    t = np.array(t).reshape(-1,1)
    X = np.concatenate((X, t), axis=1)

In [None]:
X.shape

In [None]:
# constructing the target vector : CO2 measurement of the faulty sensor ZSBN
Y= np.array(final_df[final_df.LocationName == "ZSBN"]["CO2"]).reshape(-1,1)

In [None]:
Y.shape

In [None]:
# fit linear regression model
model= LinearRegression()
model.fit(X, Y)

In [None]:
features["actual_CO2"]= final_df[final_df.LocationName == "ZSBN"]["CO2"]

In [None]:
features["predicted_CO2"]= model.predict(X)
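
For a counterfactual reconstruction, the model would normally be fitted on the pre-drift period only and then used to predict the whole month. The sketch below illustrates that idea on synthetic data with an artificial drift of 40 ppm; the site names `FAULTY`, `NEIGH_1` and `NEIGH_2` are hypothetical. It also builds the neighbour features with a pivot on the timestamp index, which aligns rows by time instead of by position.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
idx = pd.date_range("2017-10-01", "2017-10-31 23:30", freq="30min")
# Synthetic daily cycle shared by all toy sites.
base = 420 + 15 * np.sin(np.arange(len(idx)) * 2 * np.pi / 48)

# Long-format toy frame shaped like final_df (all values are synthetic).
frames = []
for name, offset in [("FAULTY", 0.0), ("NEIGH_1", 5.0), ("NEIGH_2", -3.0)]:
    frames.append(pd.DataFrame({
        "LocationName": name,
        "CO2": base + offset + rng.normal(0, 2, len(idx)),
        "temperature": 12 + rng.normal(0, 1, len(idx)),
        "humidity": 75 + rng.normal(0, 3, len(idx)),
    }, index=idx))
toy = pd.concat(frames).rename_axis("timestamp")

# Inject an artificial drift of -40 ppm into the faulty site from Oct 24 onwards.
drift_start = pd.Timestamp("2017-10-24")
mask = ((toy["LocationName"] == "FAULTY") & (toy.index >= drift_start)).to_numpy()
toy.loc[mask, "CO2"] -= 40

# Features: own temperature/humidity plus neighbour CO2, aligned on timestamp.
target = toy[toy["LocationName"] == "FAULTY"]
neighbours = toy[toy["LocationName"] != "FAULTY"].pivot(columns="LocationName", values="CO2")
design = target[["temperature", "humidity"]].join(neighbours)

# Fit on the pre-drift rows only, then predict the entire month.
train = design.index < drift_start
model = LinearRegression().fit(design[train], target.loc[train, "CO2"])
reconstructed = pd.Series(model.predict(design), index=design.index)

# Average difference between reconstruction and measurement after the drift.
gap = (reconstructed - target["CO2"])[~train].mean()
print(f"mean post-drift gap: {gap:.1f} ppm")
```

In this toy setting the post-drift gap should be close to the injected 40 ppm, whereas a model fitted on the full month would partly absorb the drift.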

### d) **10/35**

Now, instead of feeding the model with all features, you want to do something smarter by using linear regression with fewer features.

- Start with the same sensors and features as in question c)
- Leverage at least two different feature selection methods
- Create similar interactive plot as in question c)
- Describe the methods you choose and report your findings

## First Feature selection method: Univariate selection based on F-score

One way to select the most relevant features is univariate selection. The method scores each feature independently by the strength of its linear relationship with the target variable (here, the CO2 measurement) using an F-test, which for a single feature is a monotone function of the squared correlation. We then keep the k highest-scoring features.
This approach detects only linear relationships between each covariate and the target.
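
The link between the F-score and the correlation can be checked numerically. For a single feature with sample correlation $r$ and $n$ observations, `f_regression` with its default centering computes $F = \frac{r^2}{1 - r^2}(n - 2)$. The sketch below verifies this on synthetic data, which is an illustration only and unrelated to the sensor measurements.

```python
import numpy as np
from sklearn.feature_selection import f_regression

rng = np.random.default_rng(0)
n = 200
# Synthetic features: column 0 is strongly related to y, column 2 is pure noise.
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

f_scores, p_values = f_regression(X, y)

# Recompute the F statistic from the Pearson correlation of each feature with y.
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
f_manual = r**2 / (1 - r**2) * (n - 2)

# Features ordered from most to least relevant according to the F-score.
ranking = np.argsort(f_scores)[::-1]
print(f_scores, f_manual, ranking)
```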

In [None]:
from sklearn.feature_selection import f_regression
from sklearn.feature_selection import SelectKBest

In [None]:
X_new = SelectKBest(f_regression, k=2).fit_transform(X, Y.ravel())

In [None]:
model= LinearRegression()
model.fit(X_new, Y)
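
The question asks for at least two feature selection methods, and the cells above cover only one. A possible second method is recursive feature elimination (RFE), which repeatedly fits the model and discards the feature with the smallest coefficient magnitude. The sketch below runs it on synthetic, comparably scaled features; with real covariates in different units, standardizing them first (for example with `StandardScaler`) would be needed for the coefficients to be comparable.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 300
# Synthetic features with unit variance; only columns 0 and 3 drive the target.
X = rng.normal(size=(n, 5))
y = 4.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.5, size=n)

# Eliminate one feature per iteration until two remain.
selector = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
selected = np.flatnonzero(selector.support_)
print("selected feature indices:", selected)

# Refit the final model on the selected columns only.
model_rfe = LinearRegression().fit(X[:, selected], y)
```

Unlike the univariate F-score, RFE evaluates features jointly, so a feature that is redundant given the others can be dropped even if it correlates well with the target on its own.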

### e) **5/35**

Eventually, you'd like to try something new - __Bayesian Structural Time Series Modelling__ - to reconstruct counterfactual values, that is, what the CO2 measurements of the faulty sensor should have been, had the malfunction not happened on October 24. You will use:
- the information provided by similar sensors - the ones you identified in question c)
- the covariates associated with the faulty sensors that were not affected by the malfunction (such as temperature and humidity).

To answer this question, you can choose between a Python port of the CausalImpact package (such as https://github.com/dafiti/causalimpact) or the original R version (https://google.github.io/CausalImpact/CausalImpact.html) that you can run in your notebook via an R kernel (https://github.com/IRkernel/IRkernel).

Before you start, watch first the [presentation](https://www.youtube.com/watch?v=GTgZfCltMm8) given by Kay Brodersen (one of the creators of the causal impact implementation in R), and this introductory [ipython notebook](http://nbviewer.jupyter.org/github/dafiti/causalimpact/blob/master/examples/getting_started.ipynb) with examples of how to use the python package.

- Report your findings:
    - Is the counterfactual reconstruction of CO2 measurements significantly different from the observed measurements?
    - Can you try to explain the results?

In [None]:
from causalimpact import CausalImpact

In [None]:
d = final_df[final_df.LocationName == 'ZSBN'][['CO2','temperature','humidity']]
d.head()

In [None]:
similar_df = final_df[final_df.LocationName.isin(similar_sensors)][['CO2','LocationName','temperature','humidity']]
similar_df.head()

In [None]:
cols = list(d.columns)
# append the names of the similar sensors to the list of columns
cols.extend(similar_df.LocationName.unique())
cols

In [None]:
observations = d.join(similar_df.pivot(columns='LocationName', values='CO2'))[cols]
observations.head()

In [None]:
ci = CausalImpact(observations,
                  ['2017-10-01 00:00:00', '2017-10-23 23:30:00'],
                  ['2017-10-24 00:00:00', '2017-10-31 23:30:00'])

In [None]:
ci.plot(figsize=(20, 15))

In [None]:
print(ci.summary(output='report'))

# That's all, folks!