## Chennai's quest to quench its thirst

### Chennai & its water sources

Chennai, also known as Madras, is the capital of the Indian state of Tamil Nadu. Located on the Coromandel Coast off the Bay of Bengal, it is the biggest cultural, economic and educational centre of south India. Chennai has a population of close to 9 million and is the 36th largest urban area by population in the world.

Chennai is entirely dependent on ground water resources to meet its water needs. Ground water resources in Chennai are replenished by rain water and the city's average rainfall is 1,276 mm.

Following are the major sources of water supply for Chennai city.

1. Four major reservoirs in Red Hills, Cholavaram, Poondi and Chembarambakkam
2. Cauvery water from Veeranam lake
3. Desalination plants at Nemmeli and Minjur
4. Aquifers in Neyveli, Minjur and Panchetty
5. Tamaraipakkam, Poondi and Minjur Agriculture wells
6. CMWSSB Borewells
7. Retteri lake

The list above is also roughly in descending order of contribution to the city's overall fresh water requirements. In addition, people use borewells and private tankers for their water needs.

Chennai is facing an acute water shortage due to a shortfall in rainfall over the past three years (and we had one of the worst floods in history the year before that!). As a result, the water in these sources is depleting along with the groundwater level. This [video](https://www.youtube.com/watch?v=iaG7kRcSxwA&feature=youtu.be) will give an idea about the current state.

### Content
This dataset has details about the water availability in the four main reservoirs over the last 15 years.
All the measurements are in mcft (million cubic feet).

* Poondi
* Cholavaram
* Redhills
* Chembarambakkam
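
For readers more used to metric units, the conversion from mcft can be sketched as below. The helper name `mcft_to_million_litres` is ours and not part of the dataset; the factor follows from 1 ft = 0.3048 m exactly.

```python
# 1 ft = 0.3048 m exactly, so 1 cubic foot = 0.3048**3 m^3 = 28.316846592 litres.
LITRES_PER_CUBIC_FOOT = 28.316846592


def mcft_to_million_litres(mcft):
    """Convert million cubic feet (mcft) to million litres (ML).

    The 'million' prefix appears on both sides, so only the
    cubic-foot-to-litre factor needs to be applied.
    """
    return mcft * LITRES_PER_CUBIC_FOOT


print(round(mcft_to_million_litres(1000), 1))  # 28316.8
```

So 1 mcft is roughly 28.3 million litres.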


In this notebook, let us explore the data of different water resources available.

## Import libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()
%matplotlib inline

import plotly.offline as py
from plotly import tools
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go

## Read the data

Firstly, we have data about the water availability in four major reservoirs that supply water to Chennai. This data spans from 2004 to 2019. All the measurements are in mcft (million cubic feet). Let us look at the top few lines.

In [2]:
df = pd.read_csv("../data/chennai_reservoir_levels.csv")
df["Date"] = pd.to_datetime(df["Date"], format='%d-%m-%Y')
df.head()

|  | Date | POONDI | CHOLAVARAM | REDHILLS | CHEMBARAMBAKKAM |
|---|---|---|---|---|---|
| 0 | 2004-01-01 | 3.9 | 0.0 | 268.0 | 0.0 |
| 1 | 2004-01-02 | 3.9 | 0.0 | 268.0 | 0.0 |
| 2 | 2004-01-03 | 3.9 | 0.0 | 267.0 | 0.0 |
| 3 | 2004-01-04 | 3.9 | 0.0 | 267.0 | 0.0 |
| 4 | 2004-01-05 | 3.8 | 0.0 | 267.0 | 0.0 |
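
The explicit `format='%d-%m-%Y'` in the cell above matters because the dates are written day first. A small sketch of this, using a made-up two-row sample (`sample_csv`) in the same column layout rather than the real file:

```python
import io

import pandas as pd

# Made-up two-row sample in the same layout as the reservoir file.
sample_csv = io.StringIO(
    "Date,POONDI,CHOLAVARAM,REDHILLS,CHEMBARAMBAKKAM\n"
    "02-01-2004,3.9,0.0,268.0,0.0\n"
    "03-01-2004,3.9,0.0,267.0,0.0\n"
)
sample = pd.read_csv(sample_csv)

# With an explicit day-first format, "02-01-2004" is 2 January 2004,
# not 1 February 2004.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d-%m-%Y")
first = sample["Date"].iloc[0]
print(first.day, first.month, first.year)  # 2 1 2004
```

Stating the format also makes the parse fail loudly if a row does not match it, instead of guessing.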


## Find out and compare the water levels of the 4 major reservoirs over a period of time



In [3]:
def scatter_plot(cnt_srs, color):
    """Return a plotly Scatter trace for a date-indexed series."""
    trace = go.Scatter(
        x=cnt_srs.index[::-1],
        y=cnt_srs.values[::-1],
        showlegend=False,
        marker=dict(
            color=color,
        ),
    )
    return trace

# Index the frame by date once. set_index returns a new frame, so df itself
# is not modified (unlike assigning to the index of a Series taken from df).
dated = df.set_index("Date")

cnt_srs = dated["POONDI"]
trace1 = scatter_plot(cnt_srs, 'red')

cnt_srs = dated["CHOLAVARAM"]
trace2 = scatter_plot(cnt_srs, 'blue')

cnt_srs = dated["REDHILLS"]
trace3 = scatter_plot(cnt_srs, 'green')

cnt_srs = dated["CHEMBARAMBAKKAM"]
trace4 = scatter_plot(cnt_srs, 'purple')

subtitles = ["Water Availability in Poondi reservoir - in mcft",
             "Water Availability in Cholavaram reservoir - in mcft",
             "Water Availability in Redhills reservoir - in mcft",
             "Water Availability in Chembarambakkam reservoir - in mcft"
            ]
fig = tools.make_subplots(rows=4, cols=1, vertical_spacing=0.08,
                          subplot_titles=subtitles)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 3, 1)
fig.append_trace(trace4, 4, 1)
fig['layout'].update(height=1200, width=800, paper_bgcolor='rgb(233,233,233)')
py.iplot(fig, filename='h2o-plots')


plotly.tools.make_subplots is deprecated, please use plotly.subplots.make_subplots instead



**Inference:**

* We can clearly see that every year there is a depletion phase and a replenishment phase (mainly from October to December).
* There was a very bad water scarcity phase in 2004.
* We can also see a bad phase during 2014-15, but there was still water available in two reservoirs (Redhills and Chembarambakkam), which was a savior.
* Coming to recent times, the data shows that there is no water available in any of the four major reservoirs.
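
One way to check the yearly cycle numerically rather than by eye is to average the storage by calendar month and look at the month-over-month change. This is only a sketch on a made-up frame (`toy`), not on the real reservoir data.

```python
import pandas as pd

# Made-up storage readings (mcft), one per month for two years.
toy = pd.DataFrame({
    "Date": pd.date_range("2010-01-01", periods=24, freq="MS"),
    "total": [900, 800, 700, 600, 500, 400, 350, 300, 280, 500, 900, 1000] * 2,
})

# Average storage for each calendar month across all years.
monthly_mean = toy.groupby(toy["Date"].dt.month)["total"].mean()

# Month-over-month change: negative values mark the depletion phase,
# positive values the replenishment phase.
monthly_change = monthly_mean.diff()
print(monthly_change[monthly_change > 0].index.tolist())  # [10, 11, 12]
```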


## Combine the major water reservoirs to get a better picture and note down your observations

In [10]:
# Combining the data of all 4 major reservoirs
df["total"] = df["POONDI"] + df["CHOLAVARAM"] + df["REDHILLS"] + df["CHEMBARAMBAKKAM"]

# Using the scatter function we created
# set_index returns a new frame, so df keeps its original index
cnt_srs = df.set_index("Date")["total"]
trace5 = scatter_plot(cnt_srs, 'red')

# creating the subplot
fig = tools.make_subplots(rows=1, cols=1, vertical_spacing=0.08,
                          subplot_titles=["Total water availability from all four reservoirs - in mcft"])
# adding the desired graph to the subplot
fig.append_trace(trace5, 1, 1)

# Creating the layout
fig['layout'].update(height=400, width=800, paper_bgcolor='rgb(233,233,233)')
py.iplot(fig, filename='h2o-plots')
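
A note on the column sum used above: chained `+` propagates missing values, while `DataFrame.sum(axis=1)` skips them by default. Whether the actual file contains gaps is not checked in this notebook, so the frame below (`toy`) is made up, with one reading deliberately missing.

```python
import numpy as np
import pandas as pd

reservoirs = ["POONDI", "CHOLAVARAM", "REDHILLS", "CHEMBARAMBAKKAM"]

# Made-up rows; the second row has a missing reading on purpose.
toy = pd.DataFrame({
    "POONDI": [3.9, 3.9],
    "CHOLAVARAM": [0.0, np.nan],
    "REDHILLS": [268.0, 268.0],
    "CHEMBARAMBAKKAM": [0.0, 0.0],
})

# Chained "+" propagates NaN: one missing reading makes the total NaN.
total_plus = (toy["POONDI"] + toy["CHOLAVARAM"]
              + toy["REDHILLS"] + toy["CHEMBARAMBAKKAM"])

# sum(axis=1) skips NaN by default; min_count=4 requires all four readings.
total_skipna = toy[reservoirs].sum(axis=1)
total_strict = toy[reservoirs].sum(axis=1, min_count=4)

print(total_plus.isna().tolist(), total_strict.isna().tolist())
```

Which behaviour is preferable depends on whether a partial total is more misleading than a gap in the plot.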


## Rainfall Levels in Reservoir Regions

Now there are two clear facts:

* There is no water in any of the major reservoirs.
* Reservoirs depend on rain for their replenishment.

Next, we can look at the rainfall data in these reservoir regions to see which months bring rain. Let us take the total monthly rainfall across these reservoir regions and plot it.

* Read the data

* Combine the rainfall data for major reservoirs

* Plot the rainfall data

* Note down your observation

Note: Hover over the graph to see the exact values.

In [11]:
# reading the rainfall data
rain_df = pd.read_csv("../data/chennai_reservoir_rainfall.csv")

# converting the dates to date time format for analysis
rain_df["Date"] = pd.to_datetime(rain_df["Date"], format='%d-%m-%Y')

rain_df.head()

|  | Date | POONDI | CHOLAVARAM | REDHILLS | CHEMBARAMBAKKAM |
|---|---|---|---|---|---|
| 0 | 2004-01-01 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2004-01-02 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2004-01-03 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2004-01-04 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2004-01-05 | 0.0 | 0.0 | 0.0 | 0.0 |


In [12]:
# Adding the rainfall data for the areas of reservoirs
rain_df["total"] = rain_df["POONDI"] + rain_df["CHOLAVARAM"] + rain_df["REDHILLS"] + rain_df["CHEMBARAMBAKKAM"]

# function to plot
def bar_plot(cnt_srs, color):
    trace = go.Bar(
        x=cnt_srs.index[::-1],
        y=cnt_srs.values[::-1],
        showlegend=False,
        marker=dict(
            color=color,
        ),
    )
    return trace

# creating the year-month column (first day of each month)
rain_df["YearMonth"] = rain_df["Date"].dt.to_period("M").dt.to_timestamp()

# plotting the rainfall data
cnt_srs = rain_df.groupby("YearMonth")["total"].sum()
trace5 = bar_plot(cnt_srs, 'red')

fig = tools.make_subplots(rows=1, cols=1, vertical_spacing=0.08,
                          subplot_titles=["Total rainfall in all four reservoir regions - in mm"])
fig.append_trace(trace5, 1, 1)


fig['layout'].update(height=400, width=800, paper_bgcolor='rgb(233,233,233)')
py.iplot(fig, filename='h2o-plots')





**Inferences:**

* The city gets some rain in June, July, August and September due to the south-west monsoon.
* Major rainfall happens during October and November every year, due to the north-east monsoon.
* During the initial years, rain from the north-east monsoon is much higher than from the south-west monsoon. In the last few years, the two seem similar (a reduction in rain from the north-east monsoon).
* We have had some good rains in August and September 2019, but the reservoir levels are yet to go up.
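
The comparison between the two monsoons can be made explicit by labelling each month with a season and summing per year. The sketch below uses made-up monthly totals (`toy`) and a hypothetical helper `season`; counting December as part of the north-east monsoon is our assumption.

```python
import pandas as pd

# Made-up monthly rainfall totals (mm) for one year.
toy = pd.DataFrame({
    "Date": pd.date_range("2015-01-01", periods=12, freq="MS"),
    "total": [10, 0, 0, 5, 20, 60, 90, 120, 110, 250, 400, 135],
})


def season(month):
    """Label a month by monsoon season.

    June to September is south-west; October to December is north-east
    (including December here is an assumption).
    """
    if 6 <= month <= 9:
        return "south-west"
    if 10 <= month <= 12:
        return "north-east"
    return "other"


toy["Season"] = toy["Date"].dt.month.map(season)
toy["Year"] = toy["Date"].dt.year

# One row per year, one column per season.
by_season = toy.groupby(["Year", "Season"])["total"].sum().unstack()
print(by_season.loc[2015, "north-east"], by_season.loc[2015, "south-west"])  # 785 380
```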


## Plot the yearly rainfall data and note your observations

In [13]:
rain_df["Year"] = pd.to_datetime(rain_df["Date"].dt.year.astype(str), format='%Y')

cnt_srs = rain_df.groupby("Year")["total"].sum()
trace5 = bar_plot(cnt_srs, 'red')

fig = tools.make_subplots(rows=1, cols=1, vertical_spacing=0.08,
                          subplot_titles=["Total yearly rainfall in all four reservoir regions - in mm"])
fig.append_trace(trace5, 1, 1)


fig['layout'].update(height=400, width=800, paper_bgcolor='rgb(233,233,233)')
py.iplot(fig, filename='h2o-plots')


Rainfall in 2018 was the lowest of any year since 2004.

We are getting some good rains so far in 2019. Hopefully this continues.
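
The driest year can also be read off programmatically instead of from the chart. A minimal sketch, again on made-up numbers (`toy`) rather than the real rainfall file:

```python
import pandas as pd

# Made-up rainfall records (mm) spread over three years.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2016-11-01", "2016-12-01", "2017-11-01",
                            "2017-12-01", "2018-11-01", "2018-12-01"]),
    "total": [300.0, 200.0, 400.0, 250.0, 150.0, 100.0],
})

# Yearly totals, the year with the smallest total, and how that year
# compares with the mean yearly total (as a percentage).
yearly = toy.groupby(toy["Date"].dt.year)["total"].sum()
driest_year = yearly.idxmin()
pct_of_mean = 100 * yearly.min() / yearly.mean()
print(driest_year, round(pct_of_mean, 1))  # 2018 53.6
```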



## Water shortage estimation

Since all the data is available in the public domain, we want to do some analysis and see whether we can estimate this water shortage ahead of time so as to plan for it.

First, let us take a simple step and compare the total water level at the beginning of summer (let us take February 1st of every year). There will not be any replenishment until the next monsoon, so the amount of water stored in the four reservoirs is itself a clear indicator of how long the water can be managed during summer and whether backup plans are needed.



In [14]:
temp_df = df[(df["Date"].dt.month==2) & (df["Date"].dt.day==1)]

# set_index returns a new frame, so the filtered slice is not modified in place
cnt_srs = temp_df.set_index("Date")["total"]
trace5 = bar_plot(cnt_srs, 'red')

fig = tools.make_subplots(rows=1, cols=1, vertical_spacing=0.08,
                          subplot_titles=["Availability of total reservoir water (4 major ones) at the beginning of summer"])
fig.append_trace(trace5, 1, 1)


fig['layout'].update(height=400, width=800, paper_bgcolor='rgb(233,233,233)')
py.iplot(fig, filename='h2o-plots')
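
The exact-date filter above silently drops any year that has no row for 1 February. If that were a concern, one option would be to take the earliest February reading of each year instead. A sketch on made-up readings (`toy`), where 2006 has no row for 1 February:

```python
import pandas as pd

# Made-up readings; 2006 has no row for 1 February.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2005-01-31", "2005-02-01", "2005-02-02",
                            "2006-01-31", "2006-02-02", "2006-02-03"]),
    "total": [5100.0, 5000.0, 4990.0, 7050.0, 7000.0, 6980.0],
})

# Keep February rows in date order, then take the earliest one per year.
feb = toy[toy["Date"].dt.month == 2].sort_values("Date")
snapshot = feb.groupby(feb["Date"].dt.year.rename("Year")).first()
print(snapshot["total"].tolist())  # [5000.0, 7000.0]
```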


This clearly indicates that there was not enough water in the reservoirs at the beginning of summer 2019 to cope with the needs of the city. In fact, this is the second-worst level after 2004 (it is also important to note that the city has grown a lot bigger from 2004 to 2019).

The city had just 1000 mcft of water at the beginning of the summer, which is much worse than the 2017 level of 1500 mcft. So just by looking at the very low water level, the water scarcity could have been forecast without even computing the consumption level per day.
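
If one did want a rough per-day figure, the arithmetic could be sketched as below. The daily demand of 800 million litres per day is a placeholder and not a value from this dataset, and the calculation ignores evaporation, unusable storage and the other water sources listed earlier.

```python
LITRES_PER_CUBIC_FOOT = 28.316846592  # from 1 ft = 0.3048 m


def days_of_supply(storage_mcft, demand_mld):
    """Days the stored volume would last at a constant daily draw.

    storage_mcft: reservoir storage in million cubic feet
    demand_mld:   daily draw in million litres per day (an assumed input)
    """
    storage_ml = storage_mcft * LITRES_PER_CUBIC_FOOT  # million litres
    return storage_ml / demand_mld


# 1000 mcft is the approximate level quoted above; 800 MLD is a placeholder.
print(round(days_of_supply(1000, 800), 1))  # 35.4
```

Under these assumptions, 1000 mcft would cover only about five weeks of draw, which is consistent with the conclusion drawn from the chart.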

## Conclusion

The water scarcity of 2004 brought in Veeranam lake as a new source of water supply for the city.

Hopefully, the current scarcity (July 2019) will bring additional sources of water for the ailing city. The city has grown a lot in the last 15 years and so needs additional water resources to meet its needs.

The city needs to devise better scarcity control methods by estimating the needs ahead of time.

## Activity

### Can you think of a similar large-scale urban problem with a real-time effect that you would like to analyze and solve with the help of data? Note it down and break down the possible ways and steps to solve it.