<h1><center>Understanding: Acea Smart Water Analytics</center></h1>

![](https://storage.googleapis.com/kaggle-competitions/kaggle/24191/logos/header.png)

<a id="top"></a>

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style='color:#ccc; background:#112; border:0' role="tab" aria-controls="home"><center>Quick Navigation</center></h3>

* [0. Brief of Water Types](#0)
    
* [1. Water Type: Aquifer](#1)

* [1a. Auser](#1a)
    
* [1b. Doganella](#1b)
    
* [1c. Luco](#1c)
    
* [1d. Petrignano](#1d)
    
* [2. Water Type: Lake](#2)
    
* [2a. Bilancino](#2a)

* [3. Water Type: River](#3)
    
* [3a. Arno](#3a)

* [4. Water Type: Water Spring](#4)
    
* [4a. Amiata](#4a)
    
* [4b. Lupa](#4b)
    
* [4c. Madonna di Canneto](#4c)

<a id="0"></a>
<h2 style='background:#112; border:0; color:#ccc'><center>Brief of Water Types</center></h2>


- An **aquifer** is an underground layer of rock that holds groundwater. Groundwater is rain or melted snow that has seeped into the ground and is held there. Aquifers fill slowly, so they can dry up when people drain them faster than they can be refilled, a process called aquifer depletion. Aquifers can be drained by man-made wells, or they can flow out naturally in springs.
<img src="https://www.dtn.com/wp-content/uploads/2016/05/aquifer_diagram_crop.jpg">

- A **spring** is the result of an aquifer being filled to the point that the water overflows onto the land surface. Springs range in size from intermittent seeps, which flow only after heavy rain, to huge pools flowing hundreds of millions of gallons daily.
<img width="700" height="500" src="https://www.mysuwanneeriver.com/DocumentCenter/View/11121/Anatomy-of-a-Spring-Cross-Section-JPG?bidId=">

- A **river** is a natural watercourse, usually freshwater, flowing towards an ocean, sea, lake or another river.
<img width="700" height="500" src="https://3.bp.blogspot.com/-I_9DNMewaKo/VrWloP4yBqI/AAAAAAAAA-s/r2MUeOBkrNk/s1600/12400775_926135070809054_8373180960300481606_n.jpg">

- A **lake** is an area filled with water, localized in a basin and surrounded by land, apart from any river or other outlet that serves to feed or drain it.

<left><img  src="https://2.bp.blogspot.com/-cQYDI0xTQDk/UeVaic4e8_I/AAAAAAAAAb4/FtQ0PEmux20/w1200-h630-p-k-no-nu/hydrologic_cycle.gif"></left>
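The depletion mechanism described in the aquifer bullet can be illustrated with a toy water balance: each step, storage changes by recharge minus withdrawal. This is a hypothetical sketch, not a hydrological model; units are arbitrary, and recharge and withdrawal are assumed constant.

```python
def simulate_storage(initial, recharge, withdrawal, steps):
    """Toy water balance: storage changes by (recharge - withdrawal) each step."""
    storage = [initial]
    for _ in range(steps):
        nxt = storage[-1] + recharge - withdrawal
        storage.append(max(nxt, 0.0))  # an aquifer cannot hold negative water
    return storage

# Pumping (30 units/step) faster than recharge (5 units/step) drains the aquifer
print(simulate_storage(initial=100.0, recharge=5.0, withdrawal=30.0, steps=5))
```

When withdrawal equals recharge, storage stays flat; any excess withdrawal lowers it step by step until it is exhausted.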


In [None]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go

from matplotlib.offsetbox import AnchoredText
from mpl_toolkits.axes_grid1 import make_axes_locatable

import seaborn as sns
import warnings
warnings.filterwarnings(action='ignore')
plt.rcParams['figure.dpi'] = 200

<a id="1"></a>
<h2 style='background:#112; border:0; color:#ccc'><center>Water Type: Aquifer</center></h2>

<a id="1a"></a>
<h2 style='background:white; border:0; color:#112'><center>- Auser -</center></h2>

Description: This waterbody consists of two subsystems, called NORTH and SOUTH, where the former partly influences the behavior of the latter. The north subsystem is a water table (or unconfined) aquifer, while the south subsystem is an artesian (or confined) aquifer.

The levels of the NORTH sector are represented by the values of the SAL, PAG, CoS and DIEC wells, while the levels of the SOUTH sector are represented by the LT2 well.
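The sector-to-well mapping above can be encoded once and used to pick the matching columns. This is a sketch assuming the depth columns follow the `Depth_to_Groundwater_<well>` pattern used by the target names below; `SECTOR_WELLS` and `sector_columns` are hypothetical helpers, and the demo uses a toy frame rather than the real CSV.

```python
import pandas as pd

# Sector membership as stated in the Auser description
SECTOR_WELLS = {
    'NORTH': ['SAL', 'PAG', 'CoS', 'DIEC'],
    'SOUTH': ['LT2'],
}

def sector_columns(df, prefix='Depth_to_Groundwater_'):
    """Return {sector: [column names present in df]} for each sector's wells."""
    return {
        sector: [prefix + w for w in wells if prefix + w in df.columns]
        for sector, wells in SECTOR_WELLS.items()
    }

# Hypothetical toy frame standing in for the Auser CSV
toy = pd.DataFrame(columns=['Date', 'Depth_to_Groundwater_LT2',
                            'Depth_to_Groundwater_SAL', 'Depth_to_Groundwater_CoS'])
print(sector_columns(toy))
```

On the real data, `sector_columns(auser)` would list whichever of the five wells are present, so the NORTH and SOUTH series can be compared side by side.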


In [None]:
auser = pd.read_csv("../input/acea-water-prediction/Aquifer_Auser.csv")

In [None]:
auser.shape

In [None]:
auser.columns

In [None]:
auser.head(10)

In [None]:
auser.tail(10)

In [None]:
auser.describe()
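Before looking at distributions, it can help to index the rows by date so the time span of the file is explicit. This sketch assumes the CSV has a `Date` column written day-first (`dd/mm/yyyy`); check `auser.head()` and adjust `fmt` if the format differs. `add_datetime_index` is a hypothetical helper, demonstrated on a toy frame.

```python
import pandas as pd

def add_datetime_index(df, date_col='Date', fmt='%d/%m/%Y'):
    """Return a copy of df indexed by parsed dates, sorted chronologically."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col], format=fmt)
    return out.set_index(date_col).sort_index()

toy = pd.DataFrame({'Date': ['02/01/2020', '01/01/2020'], 'Rainfall': [1.5, 0.0]})
indexed = add_datetime_index(toy)
print(indexed.index.min(), indexed.index.max())
```

Passing an explicit `format` avoids silently swapping day and month for dates such as `02/01/2020`.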

<h3> Features to predict: Depth_to_Groundwater_LT2, Depth_to_Groundwater_SAL, Depth_to_Groundwater_CoS</h3>

<h3>- Depth_to_Groundwater_LT2</h3>

In [None]:
auser['Depth_to_Groundwater_LT2'].isnull().sum()

In [None]:
auser['Depth_to_Groundwater_LT2'].notnull().sum()
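The two counts above can be gathered for several targets at once. A minimal sketch (the helper name `missing_summary` is hypothetical) that also reports the missing fraction, normalized by the length of the frame being summarized:

```python
import pandas as pd

def missing_summary(df, columns):
    """Count and fraction of missing values for each column in `columns`."""
    n_missing = df[columns].isnull().sum()
    return pd.DataFrame({
        'missing': n_missing,
        'present': len(df) - n_missing,
        'fraction_missing': n_missing / len(df),  # divide by this frame's own length
    })
```

For example, `missing_summary(auser, ['Depth_to_Groundwater_LT2', 'Depth_to_Groundwater_SAL', 'Depth_to_Groundwater_CoS'])` returns one row per target.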

In [None]:
auser.loc[auser['Depth_to_Groundwater_LT2'].isnull(), 'flag'] = True
auser.loc[auser['Depth_to_Groundwater_LT2'].notnull(), 'flag'] = False

In [None]:
ds = auser['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]
ds['percent'] /= len(auser)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent',
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_LT2 null values', 
    width=700,
    height=500
)
fig.show()
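The `flag` column plus manual division computes the null/non-null shares. pandas can produce the same proportions directly with `value_counts(normalize=True)`, which avoids adding a helper column to the DataFrame. A sketch (`null_share` is a hypothetical name):

```python
import pandas as pd

def null_share(series):
    """Fraction of null vs non-null entries, indexed by True (null) / False (present)."""
    return series.isnull().value_counts(normalize=True)

s = pd.Series([1.0, None, 3.0, 4.0])
share = null_share(s)
print(share)
```

The result can be passed to `px.pie` as `names=share.index.astype(str), values=share.values`.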

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = auser['Depth_to_Groundwater_LT2'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_LT2 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()
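Because the depth readings are continuous, `value_counts()` yields one bar per distinct reading, which can be sparse and hard to read. A minimal alternative, assuming equal-width bins are acceptable (30 here is an arbitrary choice), is to bin the non-null values first; `binned_counts` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def binned_counts(series, bins=30):
    """Histogram counts of the non-null values, plus the bin edges."""
    values = series.dropna().to_numpy()
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges

counts, edges = binned_counts(pd.Series([0.0, 0.5, 1.0, None]), bins=2)
print(counts, edges)
```

In the plotting cell, the equivalent one-liner is `ax.hist(auser['Depth_to_Groundwater_LT2'].dropna(), bins=30, color='#6ca1ba')`.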

<h3>- Depth_to_Groundwater_SAL</h3>

In [None]:
auser.loc[auser['Depth_to_Groundwater_SAL'].isnull(), 'flag'] = True
auser.loc[auser['Depth_to_Groundwater_SAL'].notnull(), 'flag'] = False

In [None]:
ds = auser['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(auser)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_SAL null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = auser['Depth_to_Groundwater_SAL'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_SAL distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3> - Depth_to_Groundwater_CoS </h3>

In [None]:
auser.loc[auser['Depth_to_Groundwater_CoS'].isnull(), 'flag'] = True
auser.loc[auser['Depth_to_Groundwater_CoS'].notnull(), 'flag'] = False

In [None]:
ds = auser['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(auser)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent',
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_CoS null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = auser['Depth_to_Groundwater_CoS'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_CoS distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<a id="1b"></a>
<h2 style='background:white; border:0; color:#112'><center>- Doganella -</center></h2>

Description: The Doganella wells field is fed by two underground aquifers that are not fed by rivers or lakes but by meteoric infiltration. The upper aquifer is a water table aquifer about 30 m thick. The lower aquifer is a semi-confined artesian aquifer, 50 m thick, located inside lavas and tufa products. These aquifers are accessed through wells called Well 1, ..., Well 9. Approximately 80% of the drainage volumes come from the artesian aquifer. The aquifer levels are influenced by the following parameters: rainfall, humidity, subsoil, temperatures and drainage volumes.

In [None]:
dogan= pd.read_csv("../input/acea-water-prediction/Aquifer_Doganella.csv")

In [None]:
dogan.shape

In [None]:
dogan.columns

In [None]:
dogan.head()

In [None]:
dogan.tail()

In [None]:
dogan.describe()

<h3> Features to predict:</h3>
<h4>Depth_to_Groundwater_Pozzo_1, Depth_to_Groundwater_Pozzo_2, Depth_to_Groundwater_Pozzo_3, Depth_to_Groundwater_Pozzo_4, Depth_to_Groundwater_Pozzo_5, Depth_to_Groundwater_Pozzo_6, Depth_to_Groundwater_Pozzo_7, Depth_to_Groundwater_Pozzo_8, Depth_to_Groundwater_Pozzo_9</h4>
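The nine target names differ only in the well number, so they can be generated rather than typed out, and their missing fractions computed in one pass. A sketch with hypothetical helper names; columns not present in the frame are skipped rather than raising.

```python
import pandas as pd

# Target names listed above: Depth_to_Groundwater_Pozzo_1 ... _Pozzo_9
POZZO_TARGETS = [f'Depth_to_Groundwater_Pozzo_{i}' for i in range(1, 10)]

def null_fraction_by_column(df, columns):
    """Fraction of missing values per column, ignoring names absent from df."""
    present = [c for c in columns if c in df.columns]
    return df[present].isnull().mean()
```

On the real data, `null_fraction_by_column(dogan, POZZO_TARGETS)` gives the same proportions as the nine pie charts below in a single Series.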

<h3>- Depth_to_Groundwater_Pozzo_1</h3>

In [None]:
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_1'].isnull(), 'flag'] = True
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_1'].notnull(), 'flag'] = False

In [None]:
ds1 = dogan['flag'].value_counts().reset_index()
ds1.columns = [
    'isNull', 
    'percent'
]

ds1['percent'] /= len(dogan)

fig = px.pie(
    ds1, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Pozzo_1 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = dogan['Depth_to_Groundwater_Pozzo_1'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Pozzo_1 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Depth_to_Groundwater_Pozzo_2</h3>

In [None]:
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_2'].isnull(), 'flag'] = True
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_2'].notnull(), 'flag'] = False

In [None]:
ds1 = dogan['flag'].value_counts().reset_index()
ds1.columns = [
    'isNull', 
    'percent'
]

ds1['percent'] /= len(dogan)

fig = px.pie(
    ds1, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Pozzo_2 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = dogan['Depth_to_Groundwater_Pozzo_2'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Pozzo_2 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Depth_to_Groundwater_Pozzo_3</h3>

In [None]:
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_3'].isnull(), 'flag'] = True
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_3'].notnull(), 'flag'] = False

In [None]:
ds1 = dogan['flag'].value_counts().reset_index()
ds1.columns = [
    'isNull', 
    'percent'
]

ds1['percent'] /= len(dogan)

fig = px.pie(
    ds1, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Pozzo_3 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = dogan['Depth_to_Groundwater_Pozzo_3'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Pozzo_3 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Depth_to_Groundwater_Pozzo_4</h3>

In [None]:
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_4'].isnull(), 'flag'] = True
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_4'].notnull(), 'flag'] = False

In [None]:
ds1 = dogan['flag'].value_counts().reset_index()
ds1.columns = [
    'isNull', 
    'percent'
]

ds1['percent'] /= len(dogan)

fig = px.pie(
    ds1, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Pozzo_4 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = dogan['Depth_to_Groundwater_Pozzo_4'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Pozzo_4 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Depth_to_Groundwater_Pozzo_5</h3>

In [None]:
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_5'].isnull(), 'flag'] = True
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_5'].notnull(), 'flag'] = False

In [None]:
ds1 = dogan['flag'].value_counts().reset_index()
ds1.columns = [
    'isNull', 
    'percent'
]

ds1['percent'] /= len(dogan)

fig = px.pie(
    ds1, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Pozzo_5 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = dogan['Depth_to_Groundwater_Pozzo_5'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Pozzo_5 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Depth_to_Groundwater_Pozzo_6</h3>

In [None]:
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_6'].isnull(), 'flag'] = True
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_6'].notnull(), 'flag'] = False

In [None]:
ds1 = dogan['flag'].value_counts().reset_index()
ds1.columns = [
    'isNull', 
    'percent'
]

ds1['percent'] /= len(dogan)

fig = px.pie(
    ds1, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Pozzo_6 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = dogan['Depth_to_Groundwater_Pozzo_6'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Pozzo_6 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Depth_to_Groundwater_Pozzo_7</h3>

In [None]:
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_7'].isnull(), 'flag'] = True
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_7'].notnull(), 'flag'] = False

In [None]:
ds1 = dogan['flag'].value_counts().reset_index()
ds1.columns = [
    'isNull', 
    'percent'
]

ds1['percent'] /= len(dogan)

fig = px.pie(
    ds1, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Pozzo_7 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = dogan['Depth_to_Groundwater_Pozzo_7'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Pozzo_7 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Depth_to_Groundwater_Pozzo_8</h3>

In [None]:
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_8'].isnull(), 'flag'] = True
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_8'].notnull(), 'flag'] = False

In [None]:
ds1 = dogan['flag'].value_counts().reset_index()
ds1.columns = [
    'isNull', 
    'percent'
]

ds1['percent'] /= len(dogan)

fig = px.pie(
    ds1, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Pozzo_8 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = dogan['Depth_to_Groundwater_Pozzo_8'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Pozzo_8 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Depth_to_Groundwater_Pozzo_9</h3>

In [None]:
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_9'].isnull(), 'flag'] = True
dogan.loc[dogan['Depth_to_Groundwater_Pozzo_9'].notnull(), 'flag'] = False

In [None]:
ds1 = dogan['flag'].value_counts().reset_index()
ds1.columns = [
    'isNull', 
    'percent'
]

ds1['percent'] /= len(dogan)

fig = px.pie(
    ds1, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Pozzo_9 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = dogan['Depth_to_Groundwater_Pozzo_9'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Pozzo_9 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<a id="1c"></a>
<h2 style='background:white; border:0; color:#112'><center>- Luco -</center></h2>

Description: The Luco wells field is fed by an underground aquifer. This aquifer is not fed by rivers or lakes but by meteoric infiltration at the extremes of the impermeable sedimentary layers. The aquifer is accessed through wells called Well 1, Well 3 and Well 4 and is influenced by the following parameters: rainfall, depth to groundwater, temperature and drainage volumes.

In [None]:
luco = pd.read_csv("../input/acea-water-prediction/Aquifer_Luco.csv")

In [None]:
luco.shape

In [None]:
luco.columns

In [None]:
luco.head()

In [None]:
luco.tail()

In [None]:
luco.describe()

<h3> Feature to predict: Depth_to_Groundwater_Podere_Casetta</h3>

In [None]:
luco.loc[luco['Depth_to_Groundwater_Podere_Casetta'].isnull(), 'flag'] = True
luco.loc[luco['Depth_to_Groundwater_Podere_Casetta'].notnull(), 'flag'] = False

In [None]:
ds = luco['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(luco)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_Podere_Casetta null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = luco['Depth_to_Groundwater_Podere_Casetta'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_Podere_Casetta distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<a id="1d"></a>
<h2 style='background:white; border:0; color:#112'><center>- Petrignano -</center></h2>

Description: The wells field of the alluvial plain between Ospedalicchio di Bastia Umbra and Petrignano is fed by three underground aquifers separated by low-permeability septa. The aquifer can be considered a water table (unconfined) aquifer and is also fed by the Chiascio river. The groundwater levels are influenced by the following parameters: rainfall, depth to groundwater, temperatures, drainage volumes and the level of the Chiascio river.
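The description lists the Chiascio river level among the drivers of groundwater depth. One way to probe such a relationship is a lagged correlation, sketched below under stated assumptions: rows are evenly spaced in time (e.g. daily) and sorted by date, and the lag is counted in rows. The demo uses synthetic data; on the real file you might pass `petri['Depth_to_Groundwater_P24']` and whichever Chiascio hydrometry column appears in `petri.columns`. `lagged_correlations` is a hypothetical helper, and a high correlation at some lag is a hint, not proof of a causal delay.

```python
import numpy as np
import pandas as pd

def lagged_correlations(target, driver, max_lag=30):
    """Pearson correlation between target(t) and driver(t - lag) for lag = 0..max_lag."""
    return pd.Series(
        {lag: target.corr(driver.shift(lag)) for lag in range(max_lag + 1)},
        name='corr',
    )

# Synthetic example: the target follows the driver with a 3-step delay plus small noise
rng = np.random.default_rng(0)
driver = pd.Series(rng.normal(size=200))
target = driver.shift(3) + 0.1 * rng.normal(size=200)
corrs = lagged_correlations(target, driver, max_lag=10)
print(corrs.idxmax())  # 3 for this synthetic series
```

`Series.corr` aligns on the index and ignores pairs with a missing value, so the gaps explored above reduce the effective sample size rather than breaking the computation.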

In [None]:
petri = pd.read_csv("../input/acea-water-prediction/Aquifer_Petrignano.csv")

In [None]:
petri.shape

In [None]:
petri.columns

In [None]:
petri.head()

In [None]:
petri.tail()

In [None]:
petri.describe()

<h3> Features to predict: Depth_to_Groundwater_P24, Depth_to_Groundwater_P25</h3>

<h3>- Depth_to_Groundwater_P24 </h3>

In [None]:
petri.loc[petri['Depth_to_Groundwater_P24'].isnull(), 'flag'] = True
petri.loc[petri['Depth_to_Groundwater_P24'].notnull(), 'flag'] = False

In [None]:
ds = petri['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(petri)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent',
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_P24 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = petri['Depth_to_Groundwater_P24'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_P24 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Depth_to_Groundwater_P25 </h3>

In [None]:
petri.loc[petri['Depth_to_Groundwater_P25'].isnull(), 'flag'] = True
petri.loc[petri['Depth_to_Groundwater_P25'].notnull(), 'flag'] = False

In [None]:
ds = petri['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(petri)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Depth_to_Groundwater_P25 null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = petri['Depth_to_Groundwater_P25'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Depth_to_Groundwater_P25 distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<a id="2"></a>
<h2 style='background:#112; border:0; color:#ccc'><center>Water Type: Lake</center></h2>

<a id="2a"></a>
<h2 style='background:white; border:0; color:#112'><center>- Bilancino -</center></h2>

Description: Bilancino lake is an artificial lake located in the municipality of Barberino di Mugello (about 50 km from Florence). It is used to refill the Arno river during the summer months: the lake is filled up during the winter months, and its water is released into the Arno river during the summer months.

Each waterbody has its own set of features to predict; the targets for Bilancino are listed below.
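The fill-in-winter, release-in-summer cycle described above should show up as a seasonal pattern in `Lake_Level`. A quick check is the mean level per calendar month, pooled over all years. This sketch assumes a `Date` column written day-first (`dd/mm/yyyy`); `monthly_profile` is a hypothetical helper, and whether the real profile rises in winter and falls in summer is something to verify, not a given.

```python
import pandas as pd

def monthly_profile(df, value_col, date_col='Date', fmt='%d/%m/%Y'):
    """Mean of value_col for each calendar month (1-12), pooled over all years."""
    dates = pd.to_datetime(df[date_col], format=fmt)
    return df[value_col].groupby(dates.dt.month).mean()

toy = pd.DataFrame({
    'Date': ['15/01/2020', '15/01/2021', '15/07/2020'],
    'Lake_Level': [250.0, 252.0, 245.0],
})
print(monthly_profile(toy, 'Lake_Level'))
```

On the real data, `monthly_profile(bilan, 'Lake_Level')` returns up to twelve values that can be plotted as a bar chart.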

In [None]:
bilan = pd.read_csv("../input/acea-water-prediction/Lake_Bilancino.csv")

In [None]:
bilan.shape

In [None]:
bilan.columns

In [None]:
bilan.head()

In [None]:
bilan.tail()

In [None]:
bilan.describe()

<h3>Features to predict: Lake_Level, Flow_Rate</h3>

<h3>- Lake_Level</h3>

In [None]:
bilan['Lake_Level'].isnull().sum()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = bilan['Lake_Level'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Lake_Level distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3>- Flow_Rate </h3>

In [None]:
bilan.loc[bilan['Flow_Rate'].isnull(), 'flag'] = True
bilan.loc[bilan['Flow_Rate'].notnull(), 'flag'] = False

In [None]:
ds = bilan['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(bilan)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Flow_Rate null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = bilan['Flow_Rate'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Flow_Rate distribution ', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<a id="3"></a>
<h2 style='background:#112; border:0; color:#ccc'><center>Water Type: River</center></h2>

<a id="3a"></a>
<h2 style='background:white; border:0; color:#112'><center>- Arno -</center></h2>

Description: The Arno is the second-largest river in peninsular Italy and the main waterway in Tuscany. It has a relatively torrential regime due to the nature of the surrounding soils (marl and impermeable clays). The Arno is the main source of water supply for the metropolitan area of Florence-Prato-Pistoia. Water availability for this waterbody is evaluated by checking the hydrometric level of the river at the Nave di Rosano section.

In [None]:
arno = pd.read_csv("../input/acea-water-prediction/River_Arno.csv")

In [None]:
arno.shape

In [None]:
arno.columns

In [None]:
arno.head()

In [None]:
arno.tail()

In [None]:
arno.describe()

<h3>Feature to predict: Hydrometry_Nave_di_Rosano</h3>

In [None]:
arno.loc[arno['Hydrometry_Nave_di_Rosano'].isnull(), 'flag'] = True
arno.loc[arno['Hydrometry_Nave_di_Rosano'].notnull(), 'flag'] = False

In [None]:
ds = arno['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(arno)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Hydrometry_Nave_di_Rosano null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = arno['Hydrometry_Nave_di_Rosano'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Hydrometry_Nave_di_Rosano distribution', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<a id="4"></a>
<h2 style='background:#112; border:0; color:#ccc'><center>Water Type: Water Spring</center></h2>

<a id="4a"></a>
<h2 style='background:white; border:0; color:#112'><center>- Amiata -</center></h2>

Description: The Amiata waterbody is composed of a volcanic aquifer that is not fed by rivers or lakes but by meteoric infiltration. This aquifer is accessed through the Ermicciolo, Arbure, Bugnano and Galleria Alta water springs. The levels and volumes of the four sources are influenced by the following parameters: rainfall, depth to groundwater, hydrometry, temperatures and drainage volumes.

In [None]:
ami = pd.read_csv("../input/acea-water-prediction/Water_Spring_Amiata.csv")

In [None]:
ami.shape

In [None]:
ami.columns

In [None]:
ami.head()

In [None]:
ami.tail()

In [None]:
ami.describe()

<h3>Features to predict: Flow_Rate_Bugnano, Flow_Rate_Arbure, Flow_Rate_Ermicciolo, Flow_Rate_Galleria_Alta</h3>
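A missing-value share does not say *when* a series is observed: a target can be fully recorded but only from a late start date. This sketch reports the first and last dates with data for each target, assuming a `Date` column written day-first (`dd/mm/yyyy`); `observation_windows` is a hypothetical helper demonstrated on a toy frame.

```python
import pandas as pd

def observation_windows(df, columns, date_col='Date', fmt='%d/%m/%Y'):
    """First and last date with a non-null value for each column, plus the valid count."""
    indexed = df.set_index(pd.to_datetime(df[date_col], format=fmt))
    rows = {}
    for col in columns:
        series = indexed[col]
        rows[col] = {
            'first': series.first_valid_index(),
            'last': series.last_valid_index(),
            'n_valid': int(series.notnull().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient='index')

toy = pd.DataFrame({
    'Date': ['01/01/2020', '02/01/2020', '03/01/2020'],
    'Flow_Rate_Bugnano': [None, 1.0, 2.0],
    'Flow_Rate_Arbure': [5.0, None, None],
})
print(observation_windows(toy, ['Flow_Rate_Bugnano', 'Flow_Rate_Arbure']))
```

On the real data, `observation_windows(ami, ['Flow_Rate_Bugnano', 'Flow_Rate_Arbure', 'Flow_Rate_Ermicciolo', 'Flow_Rate_Galleria_Alta'])` shows whether the four springs share a common period suitable for modeling.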

<h3> - Flow_Rate_Bugnano </h3>

In [None]:
ami.loc[ami['Flow_Rate_Bugnano'].isnull(), 'flag'] = True
ami.loc[ami['Flow_Rate_Bugnano'].notnull(), 'flag'] = False

In [None]:
ds = ami['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(ami)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Flow_Rate_Bugnano null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = ami['Flow_Rate_Bugnano'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Flow_Rate_Bugnano distribution', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3> - Flow_Rate_Arbure </h3>

In [None]:
ami.loc[ami['Flow_Rate_Arbure'].isnull(), 'flag'] = True
ami.loc[ami['Flow_Rate_Arbure'].notnull(), 'flag'] = False

In [None]:
ds = ami['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(ami)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Flow_Rate_Arbure null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = ami['Flow_Rate_Arbure'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Restyle the automatically placed tick labels; overwriting them with fixed values mislabels the axes
plt.setp(ax.get_xticklabels() + ax.get_yticklabels(), fontfamily='serif')
fig.text(0.1, 0.95, 'Flow_Rate_Arbure distribution', fontsize=15, fontweight='bold', fontfamily='serif')    
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<h3> - Flow_Rate_Ermicciolo </h3>

In [None]:
ami.loc[ami['Flow_Rate_Ermicciolo'].isnull(), 'flag'] = True
ami.loc[ami['Flow_Rate_Ermicciolo'].notnull(), 'flag'] = False

In [None]:
ds = ami['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(ami)

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent',
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Flow_Rate_Ermicciolo null values', 
    width=700,
    height=500 
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = ami['Flow_Rate_Ermicciolo'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Set the serif font on the existing tick labels instead of overwriting their text
for label in ax.get_xticklabels() + ax.get_yticklabels():
    label.set_fontfamily('serif')
fig.text(0.1, 0.95, 'Flow_Rate_Ermicciolo distribution', fontsize=15, fontweight='bold', fontfamily='serif')
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()
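Besides NaN, a sensor series can contain exact zeros. These may be genuine dry periods or may stand in for missing readings. The notebook does not establish which, so treat this as something to check rather than a known property of the data. A quick count on a made-up series:

```python
import numpy as np
import pandas as pd

flow = pd.Series([0.0, 2.3, 0.0, np.nan, 2.5])  # hypothetical readings

n_zero = int((flow == 0).sum())   # NaN == 0 evaluates to False, so NaNs are not counted here
n_null = int(flow.isnull().sum())
print(f'zeros: {n_zero}, nulls: {n_null}, total: {len(flow)}')
```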

<h3> - Flow_Rate_Galleria_Alta </h3>

In [None]:
ami['flag'] = ami['Flow_Rate_Galleria_Alta'].isnull()  # True where the reading is missing

In [None]:
ds = ami['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(ami)  # fraction of rows in the Amiata dataset

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Flow_Rate_Galleria_Alta null values', 
    width=700,
    height=500 
)

fig.show()
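Instead of drawing one pie per column, you can list the null fraction of every column in a single table. A minimal sketch using only pandas. `toy` stands in for `ami`, `lupa` or `madonna`, and its column names and values are illustrative:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Flow_Rate_Arbure': [1.0, np.nan, np.nan, 1.3],
    'Flow_Rate_Ermicciolo': [2.0, 2.1, np.nan, 2.2],
    'Other_Column': [10.0, 11.0, 12.0, 13.0],
})

# isnull().mean() gives the fraction of missing rows in each column
null_table = toy.isnull().mean().sort_values(ascending=False).rename('null_fraction')
print(null_table)
```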

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = ami['Flow_Rate_Galleria_Alta'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Set the serif font on the existing tick labels instead of overwriting their text
for label in ax.get_xticklabels() + ax.get_yticklabels():
    label.set_fontfamily('serif')
fig.text(0.1, 0.95, 'Flow_Rate_Galleria_Alta distribution', fontsize=15, fontweight='bold', fontfamily='serif')
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<a id="4b"></a>
<h2 style='background:white; border:0; color:#112'><center>- Lupa -</center></h2>

Description: This water spring is located in the Rosciano Valley, on the left side of the Nera River. The water emerges at an altitude of about 375 meters above sea level through a long draining tunnel whose final section crosses predominantly calcareous rock types. The spring supplies drinking water to the city of Terni and the surrounding towns.

In [None]:
lupa = pd.read_csv("../input/acea-water-prediction/Water_Spring_Lupa.csv")
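By default, `read_csv` reads the `Date` column as plain strings. The sketch below parses it right after loading. It assumes day-first `dd/mm/yyyy` dates and uses a made-up in-memory sample in the assumed layout instead of the Kaggle path, so it runs anywhere:

```python
import io

import pandas as pd

# Made-up rows in the assumed layout of the Lupa file (an empty field becomes NaN)
sample = io.StringIO(
    "Date,Flow_Rate_Lupa\n"
    "30/01/2010,\n"
    "31/01/2010,80.5\n"
    "01/02/2010,81.0\n"
)

df = pd.read_csv(sample)
# An explicit format makes the day-first parsing unambiguous
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df = df.set_index('Date')
print(df.index.min(), df.index.max())
```

With a `DatetimeIndex`, time-based operations such as `df.loc['2010']` or `df.resample('M')` become available.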

In [None]:
lupa.shape

In [None]:
lupa.columns

In [None]:
lupa.head()

In [None]:
lupa.tail()

In [None]:
lupa.describe()
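`describe()` summarises the values but says nothing about gaps in the dates themselves. The sketch below finds calendar days that are absent from the file. It assumes `Date` has already been parsed to datetimes and that the series is meant to be daily; both are assumptions. The dates below are hypothetical:

```python
import pandas as pd

# Hypothetical parsed dates with 03/01 and 04/01 missing
dates = pd.to_datetime(pd.Series(['01/01/2020', '02/01/2020', '05/01/2020']),
                       format='%d/%m/%Y')

# Every calendar day between the first and last date, minus the days present
full_range = pd.date_range(dates.min(), dates.max(), freq='D')
missing_days = full_range.difference(pd.DatetimeIndex(dates))
print(list(missing_days.strftime('%Y-%m-%d')))
```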

<h3>Feature to predict: Flow_Rate_Lupa </h3>

In [None]:
lupa['flag'] = lupa['Flow_Rate_Lupa'].isnull()  # True where the reading is missing

In [None]:
ds = lupa['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(lupa)  # fraction of rows in the Lupa dataset

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent', 
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Flow_Rate_Lupa null values', 
    width=700,
    height=500 
)

fig.show()
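The pie shows how many values are missing, not whether they form one long gap or many short ones. The sketch below measures runs of consecutive NaNs using standard pandas operations. `nan_runs` is a hypothetical helper name, and the series is made up:

```python
import numpy as np
import pandas as pd


def nan_runs(s: pd.Series) -> pd.Series:
    """Lengths of each run of consecutive NaN values, in order of appearance."""
    is_null = s.isnull()
    # A new run id starts whenever the null/non-null state changes
    run_id = is_null.astype(int).diff().ne(0).cumsum()
    lengths = is_null.groupby(run_id).sum()
    # Keep only the runs made of NaNs
    is_nan_run = is_null.groupby(run_id).first()
    return lengths[is_nan_run].reset_index(drop=True)


s = pd.Series([1.0, np.nan, np.nan, 2.0, np.nan, 3.0, np.nan, np.nan, np.nan])
runs = nan_runs(s)
print(runs.tolist(), runs.max())
```

Applied to `lupa['Flow_Rate_Lupa']`, `runs.max()` would give the longest gap in rows; translating that into days additionally assumes one row per day.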

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = lupa['Flow_Rate_Lupa'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Set the serif font on the existing tick labels instead of overwriting their text
for label in ax.get_xticklabels() + ax.get_yticklabels():
    label.set_fontfamily('serif')
fig.text(0.1, 0.95, 'Flow_Rate_Lupa distribution', fontsize=15, fontweight='bold', fontfamily='serif')
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()

<a id="4c"></a>
<h2 style='background:white; border:0; color:#112'><center>- Madonna_di_Canneto -</center></h2>

Description: The Madonna di Canneto spring is situated at an altitude of 1010 m above sea level in the Canneto Valley. It is not fed by an aquifer; instead, its source is supplied by the water catchment area of the Melfa River.

In [None]:
madonna = pd.read_csv("../input/acea-water-prediction/Water_Spring_Madonna_di_Canneto.csv")

In [None]:
madonna.shape

In [None]:
madonna.columns

In [None]:
madonna.head()

In [None]:
madonna.tail()

In [None]:
madonna.describe()

<h3> Feature to predict: Flow_Rate_Madonna_di_Canneto </h3>

In [None]:
madonna['flag'] = madonna['Flow_Rate_Madonna_di_Canneto'].isnull()  # True where the reading is missing

In [None]:
ds = madonna['flag'].value_counts().reset_index()
ds.columns = [
    'isNull', 
    'percent'
]

ds['percent'] /= len(madonna)  # fraction of rows in the Madonna di Canneto dataset

fig = px.pie(
    ds, 
    names='isNull', 
    values='percent',
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Flow_Rate_Madonna_di_Canneto null values', 
    width=700,
    height=500,
)

fig.show()

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15, 6))
data_q1 = madonna['Flow_Rate_Madonna_di_Canneto'].value_counts().sort_index()
ax.bar(data_q1.index, data_q1, 
       color='#6ca1ba',
       linewidth=0.7)

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

   
# Set the serif font on the existing tick labels instead of overwriting their text
for label in ax.get_xticklabels() + ax.get_yticklabels():
    label.set_fontfamily('serif')
fig.text(0.1, 0.95, 'Flow_Rate_Madonna_di_Canneto distribution', fontsize=15, fontweight='bold', fontfamily='serif')
ax.grid(axis='y', linestyle='-', alpha=0.4)    
plt.show()