# Venice Astronomical Tide Data Cleaning
In this notebook, we'll consolidate astronomical tide data and readings from several meteorological stations in Venice into a single tidy dataset, drawing on the following sources:
- Data on astronomical tide levels for 2022 and 2023
- Data from the following stations in Venice:
    - Stazione Burano
    - Stazione Diga Sud Lido
    - Stazione Misericordia
    - Stazione Palazzo Cavalli
    - Stazione Piattaforma PG
    - Stazione Piattaforma
    - Stazione Punta Salute (Canal Grande)
    - Stazione San Giorgio

Additional project descriptions and in-depth details on the data can be found in the `README` of this repository.


In [2]:
import warnings
import datetime
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
warnings.filterwarnings('once')
%matplotlib inline

## Importing Datasets

### Astronomical Tides Data

#### Astronomical Tides 2022

In [36]:
# Import astronomical tides data 2022
astro_tide_2022 = pd.read_json('astronomical_tide_2022.json')

astro_tide_2022.rename(columns={'data':'date','valore':'value'}, inplace=True) #Translate column names from Italian
astro_tide_2022['date'] = pd.to_datetime(astro_tide_2022.date) #Convert the date column into a datetime object

print(astro_tide_2022.head())
print(astro_tide_2022.tail())

                 date  value
0 2022-01-01 00:00:00     45
1 2022-01-01 00:05:00     44
2 2022-01-01 00:10:00     43
3 2022-01-01 00:15:00     42
4 2022-01-01 00:20:00     42
                      date  value
105115 2022-12-31 23:35:00     18
105116 2022-12-31 23:40:00     19
105117 2022-12-31 23:45:00     19
105118 2022-12-31 23:50:00     19
105119 2022-12-31 23:55:00     19


In [37]:
# Explore the astronomical tides data
astro_tide_2022.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105120 entries, 0 to 105119
Data columns (total 2 columns):
 #   Column  Non-Null Count   Dtype         
---  ------  --------------   -----         
 0   date    105120 non-null  datetime64[ns]
 1   value   105120 non-null  int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 1.6 MB


In [39]:
# Explore the summary statistics of the tide values
astro_tide_2022.describe()

| | value |
|---|---|
| count | 105120.0 |
| mean | 32.222108 |
| std | 24.179518 |
| min | -34.0 |
| 25% | 15.0 |
| 50% | 34.0 |
| 75% | 50.0 |
| max | 87.0 |


The dataset contains **105,120 observations** across two columns: the **date** and the **value** (the astronomical tide value). That count matches one reading every five minutes for a full non-leap year (365 days × 288 readings per day = 105,120). There are no missing values. The dates range from January 1 to December 31, 2022, and the astronomical tide values range from -34 to 87, with an average of about 32.22.
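
The head and tail above suggest the readings sit on a regular five-minute grid. A minimal sketch of how that completeness could be checked, assuming the `date` column should cover every five-minute step of the calendar year with no gaps or repeats. The helper `check_regular_grid` is hypothetical, and the demo runs on a synthetic frame rather than `astronomical_tide_2022.json`.

```python
import numpy as np
import pandas as pd


def check_regular_grid(df: pd.DataFrame, year: int, freq: str = "5min") -> dict:
    """Hypothetical helper: compare df['date'] with a complete regular grid for one calendar year."""
    start = pd.Timestamp(f"{year}-01-01")
    end = pd.Timestamp(f"{year + 1}-01-01") - pd.Timedelta(freq)  # last step of the year
    expected = pd.date_range(start, end, freq=freq)
    actual = pd.DatetimeIndex(df["date"])
    return {
        "expected_rows": len(expected),
        "actual_rows": len(actual),
        # Grid points that have no matching row
        "missing_timestamps": len(expected.difference(actual)),
        # Timestamps that occur more than once
        "duplicate_timestamps": int(actual.duplicated().sum()),
        "null_values": int(df["value"].isna().sum()),
    }


# Synthetic stand-in for astro_tide_2022: a complete grid, plus a copy with one row removed
dates = pd.date_range("2022-01-01", periods=105_120, freq="5min")
demo = pd.DataFrame({"date": dates, "value": np.zeros(len(dates), dtype=np.int64)})
demo_with_gap = demo.drop(index=100)

print(check_regular_grid(demo, 2022))
print(check_regular_grid(demo_with_gap, 2022))
```

On a complete year, the missing, duplicate and null counts would all be expected to be zero; a single dropped row shows up as one missing timestamp.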

#### Astronomical Tides 2023

In [41]:
# Import astronomical tides data 2023
astro_tide_2023 = pd.read_json('astronomical_tide_2023.json')

astro_tide_2023.rename(columns={'data':'date','valore':'value'}, inplace=True) #Translate column names from Italian
astro_tide_2023['date'] = pd.to_datetime(astro_tide_2023.date) #Convert the date column into a datetime object

print(astro_tide_2023.head())
print(astro_tide_2023.tail())

                 date  value
0 2023-01-01 00:00:00     21
1 2023-01-01 00:05:00     22
2 2023-01-01 00:10:00     22
3 2023-01-01 00:15:00     22
4 2023-01-01 00:20:00     23
                      date  value
105115 2023-12-31 23:35:00     38
105116 2023-12-31 23:40:00     39
105117 2023-12-31 23:45:00     40
105118 2023-12-31 23:50:00     41
105119 2023-12-31 23:55:00     42


In [43]:
astro_tide_2023.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105120 entries, 0 to 105119
Data columns (total 2 columns):
 #   Column  Non-Null Count   Dtype         
---  ------  --------------   -----         
 0   date    105120 non-null  datetime64[ns]
 1   value   105120 non-null  int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 1.6 MB


In [44]:
astro_tide_2023.describe()

| | value |
|---|---|
| count | 105120.0 |
| mean | 32.662072 |
| std | 24.39179 |
| min | -33.0 |
| 25% | 15.0 |
| 50% | 34.0 |
| 75% | 51.0 |
| max | 86.0 |


The dataset has the same structure as `astro_tide_2022`: **105,120 observations** across the same two columns, **date** and **value** (the astronomical tide value), with no missing values. The dates range from January 1 to December 31, 2023, and the astronomical tide values range from -33 to 86, with an average of about 32.66.
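
Since the two yearly frames are meant to be stacked, it can be worth confirming programmatically that they share the same schema (column names, order and dtypes) rather than relying on reading two `.info()` outputs. A minimal sketch, using small synthetic frames in place of `astro_tide_2022` and `astro_tide_2023`; the helper `same_schema` is hypothetical.

```python
import pandas as pd


def same_schema(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Hypothetical helper: True if both frames have identical column names, order and dtypes."""
    if list(a.columns) != list(b.columns):
        return False
    # Columns match, so the two dtype Series are aligned label by label
    return bool((a.dtypes == b.dtypes).all())


# Synthetic stand-ins mirroring the two columns reported by .info()
tide_2022 = pd.DataFrame({
    "date": pd.date_range("2022-12-31 23:50", periods=2, freq="5min"),
    "value": [19, 19],
})
tide_2023 = pd.DataFrame({
    "date": pd.date_range("2023-01-01 00:00", periods=2, freq="5min"),
    "value": [21, 22],
})

print(same_schema(tide_2022, tide_2023))
```

A column left untranslated (for example `valore`) or a `value` column read as floats would make the check return `False` before the frames are concatenated.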

#### Data Consolidation
We'll now combine the two astronomical tide datasets, `astro_tide_2022` and `astro_tide_2023`, into one consolidated dataset covering January 1, 2022 to December 31, 2023.
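
One possible variation, sketched here as an alternative rather than the approach used in this notebook, is to index each frame by timestamp before concatenating so that `pd.concat(..., verify_integrity=True)` raises a `ValueError` if the two years ever share a timestamp. The frames below are small synthetic stand-ins for the yearly datasets.

```python
import pandas as pd

# Synthetic stand-ins for the two yearly frames (a few rows each)
tide_2022 = pd.DataFrame({
    "date": pd.date_range("2022-12-31 23:45", periods=3, freq="5min"),
    "value": [19, 19, 19],
})
tide_2023 = pd.DataFrame({
    "date": pd.date_range("2023-01-01 00:00", periods=3, freq="5min"),
    "value": [21, 22, 22],
})

# Index by timestamp so verify_integrity can detect overlapping dates
combined = pd.concat(
    [tide_2022.set_index("date"), tide_2023.set_index("date")],
    verify_integrity=True,  # raises ValueError if any timestamp appears twice
).sort_index()

# Back to the two-column layout used in the notebook
combined = combined.reset_index()
print(combined)
```

The duplicate check costs an extra pass over the index, which is negligible at this data size.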

In [47]:
# Stack the 2023 rows under the 2022 rows; the two columns are unchanged and the index is renumbered
astro_tide = pd.concat([astro_tide_2022, astro_tide_2023], ignore_index=True)
print(astro_tide.head())
print(astro_tide.tail())

                 date  value
0 2022-01-01 00:00:00     45
1 2022-01-01 00:05:00     44
2 2022-01-01 00:10:00     43
3 2022-01-01 00:15:00     42
4 2022-01-01 00:20:00     42
                      date  value
210235 2023-12-31 23:35:00     38
210236 2023-12-31 23:40:00     39
210237 2023-12-31 23:45:00     40
210238 2023-12-31 23:50:00     41
210239 2023-12-31 23:55:00     42


In [48]:
astro_tide.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 210240 entries, 0 to 210239
Data columns (total 2 columns):
 #   Column  Non-Null Count   Dtype         
---  ------  --------------   -----         
 0   date    210240 non-null  datetime64[ns]
 1   value   210240 non-null  int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 3.2 MB


In [49]:
astro_tide.describe()

| | value |
|---|---|
| count | 210240.0 |
| mean | 32.44209 |
| std | 24.286824 |
| min | -34.0 |
| 25% | 15.0 |
| 50% | 34.0 |
| 75% | 51.0 |
| max | 87.0 |


### Data from Stations
#### Stazione Burano

In [9]:
burano_station = pd.read_csv('Stazione_Burano.csv', delimiter=';')
burano_station.head()

| | Data | Burano Liv.Idrom. medio 1m |
|---|---|---|
| 0 | 2022-11-10 00:00:00 | 0.66 |
| 1 | 2022-11-10 00:05:00 | 0.67 |
| 2 | 2022-11-10 00:10:00 | 0.67 |
| 3 | 2022-11-10 00:15:00 | 0.67 |
| 4 | 2022-11-10 00:20:00 | 0.67 |


In [10]:
burano_station.describe()

| | Burano Liv.Idrom. medio 1m |
|---|---|
| count | 26496.0 |
| mean | 0.375182 |
| std | 0.261048 |
| min | -0.58 |
| 25% | 0.2 |
| 50% | 0.39 |
| 75% | 0.56 |
| max | 0.97 |


In [11]:
burano_station.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26496 entries, 0 to 26495
Data columns (total 2 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Data                        26496 non-null  object 
 1   Burano Liv.Idrom. medio 1m  26496 non-null  float64
dtypes: float64(1), object(1)
memory usage: 414.1+ KB
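
The `.info()` output shows that `Data` was read as a plain `object` (string) column, whereas the astronomical tide frames had their `date` column converted to `datetime64[ns]`. A minimal sketch of doing that conversion at load time with `parse_dates`, and renaming `Data` to `date` to match the tide frames, assuming the file stores timestamps in the format shown above. It reads an in-memory CSV built from the first rows shown above instead of `Stazione_Burano.csv`, and the English label `water_level` is a hypothetical choice.

```python
import io

import pandas as pd

# In-memory stand-in for Stazione_Burano.csv (semicolon-delimited, first rows shown above)
raw = io.StringIO(
    "Data;Burano Liv.Idrom. medio 1m\n"
    "2022-11-10 00:00:00;0.66\n"
    "2022-11-10 00:05:00;0.67\n"
    "2022-11-10 00:10:00;0.67\n"
)

# Parse the timestamp column while reading instead of leaving it as strings
burano = pd.read_csv(raw, delimiter=";", parse_dates=["Data"])

# Match the astronomical tide naming; 'water_level' is a hypothetical English label
burano = burano.rename(columns={"Data": "date", "Burano Liv.Idrom. medio 1m": "water_level"})

burano.info()
```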


#### Stazione Diga Sud Lido

In [12]:
diga_station = pd.read_csv('Stazione_DigaSudLido.csv', delimiter=';')
diga_station.head()

| | Data | D.S. Lido Liv.Idrom. medio 1m | Faro Diga Lido V.Vento med.10m | Faro Diga Lido V.Vento max | Faro Diga Lido D.Vento med. 10m |
|---|---|---|---|---|---|
| 0 | 2022-11-10 00:00:00 | 0.65 | 3.8 | 4.5 | 23 |
| 1 | 2022-11-10 00:05:00 | 0.64 | 3.9 | 4.5 | 31 |
| 2 | 2022-11-10 00:10:00 | 0.64 | 3.6 | 4.3 | 33 |
| 3 | 2022-11-10 00:15:00 | 0.63 | 3.6 | 4.5 | 34 |
| 4 | 2022-11-10 00:20:00 | 0.62 | 3.7 | 4.3 | 38 |


#### Stazione Misericordia

In [13]:
misericordia_station = pd.read_csv('Stazione_Misericordia.csv', delimiter=';')
misericordia_station.head()

| | Data | Misericordia Liv.Idrom. medio 1m | Misericordia D.Vento med. 10m | Misericordia V.Vento med.10m | Misericordia V.Vento max | Misericordia Onda: Alt. sign. | Misericordia Onda: Alt. max |
|---|---|---|---|---|---|---|---|
| 0 | 2022-11-10 00:00:00 | 0.71 | 12 | 3.0 | 3.8 | 0.14 | 0.2 |
| 1 | 2022-11-10 00:05:00 | 0.71 | 10 | 3.1 | 3.8 | 0.12 | 0.22 |
| 2 | 2022-11-10 00:10:00 | 0.71 | 13 | 3.2 | 3.9 | 0.1 | 0.18 |
| 3 | 2022-11-10 00:15:00 | 0.71 | 11 | 3.3 | 4.0 | 0.1 | 0.16 |
| 4 | 2022-11-10 00:20:00 | 0.71 | 29 | 3.3 | 3.8 | 0.1 | 0.18 |


#### Stazione Palazzo Cavalli

In [14]:
palazzo_station = pd.read_csv('Stazione_PalazzoCavalli.csv', delimiter=';')
palazzo_station.head()

| | Data | Palazzo Cavalli Pressione | Palazzo Cavalli Temperatura | Palazzo Cavalli Umidita' | Palazzo Cavalli Radiazione solare | Palazzo Cavalli Pioggia 5m |
|---|---|---|---|---|---|---|
| 0 | 2022-11-10 00:00:00 | 1018.9 | 14.3 | 77 | 0.0 | 0.0 |
| 1 | 2022-11-10 00:05:00 | 1018.9 | 14.1 | 78 | 0.0 | 0.0 |
| 2 | 2022-11-10 00:10:00 | 1018.8 | 14.2 | 78 | 0.0 | 0.0 |
| 3 | 2022-11-10 00:15:00 | 1018.8 | 14.2 | 78 | 0.0 | 0.0 |
| 4 | 2022-11-10 00:20:00 | 1018.7 | 14.2 | 78 | 0.0 | 0.0 |


#### Stazione Piattaforma PG

In [16]:
pg_piattaforma_station = pd.read_csv('Stazione_Piattaforma_pg.csv', delimiter=';')
pg_piattaforma_station.head()

| | Data | Piattaforma CNR Pioggia 5m |
|---|---|---|
| 0 | 2022-11-10 00:00:00 | 0.0 |
| 1 | 2022-11-10 01:00:00 | 0.0 |
| 2 | 2022-11-10 02:00:00 | 0.0 |
| 3 | 2022-11-10 03:00:00 | 0.0 |
| 4 | 2022-11-10 04:00:00 | 0.0 |
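
Unlike the other stations, the first rows of `Stazione_Piattaforma_pg.csv` are one hour apart rather than five minutes. A quick way to confirm each station's sampling interval is to take the most common gap between consecutive timestamps. A minimal sketch on synthetic timestamps; the helper `typical_interval` is hypothetical.

```python
import pandas as pd


def typical_interval(dates: pd.Series) -> pd.Timedelta:
    """Hypothetical helper: most common gap between consecutive timestamps."""
    gaps = pd.to_datetime(dates).sort_values().diff().dropna()
    return gaps.mode().iloc[0]


# Synthetic timestamps shaped like the hourly and five-minute station files
hourly = pd.Series(pd.date_range("2022-11-10 00:00", periods=5, freq="60min"))
five_min = pd.Series(pd.date_range("2022-11-10 00:00", periods=5, freq="5min"))

print(typical_interval(hourly))
print(typical_interval(five_min))
```

Knowing the interval matters before joining stations on timestamp: matching an hourly series against five-minute readings would leave most rows empty unless one side is resampled first.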


#### Stazione Piattaforma

In [17]:
piattaforma_station = pd.read_csv('Stazione_Piattaforma.csv', delimiter=';')
piattaforma_station.head()

| | Data | Piattaforma CNR Liv.Idrom. medio 1m | Piattaforma CNR D.Vento med. 10m | Piattaforma CNR V.Vento med.10m | Piattaforma CNR V.Vento max | Piattaforma CNR Onda: Alt. sign. | Piattaforma CNR Onda: Alt. max | Piattaforma CNR Pressione | Piattaforma CNR Temperatura | Piattaforma CNR Temp. Acqua | Piattaforma CNR Umidita' | Piattaforma CNR Radiazione solare | Piattaforma CNR Pioggia 5m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-11-10 00:00:00 | 0.64 | 31.0 | 6.0 | 7.2 | 0.28 | 0.51 | 1018.5 | 15.6 | 18.5 | 75.0 | NaN | 0.0 |
| 1 | 2022-11-10 00:05:00 | 0.63 | 34.0 | 6.3 | 7.0 | 0.28 | 0.51 | 1018.5 | 15.5 | 18.5 | 75.0 | NaN | 0.0 |
| 2 | 2022-11-10 00:10:00 | 0.63 | 32.0 | 6.6 | 7.6 | 0.28 | 0.51 | 1018.4 | 15.5 | 18.7 | 74.0 | NaN | 0.0 |
| 3 | 2022-11-10 00:15:00 | 0.62 | 36.0 | 6.8 | 7.1 | 0.26 | 0.47 | 1018.3 | 15.4 | 18.4 | 74.0 | NaN | 0.0 |
| 4 | 2022-11-10 00:20:00 | 0.61 | 35.0 | 6.8 | 7.8 | 0.26 | 0.47 | 1018.4 | 15.4 | 18.6 | 76.0 | NaN | 0.0 |


#### Stazione Punta Salute (Canal Grande)

In [18]:
punta_station = pd.read_csv('Stazione_PuntaSalute_CanalGrande.csv', delimiter=';')
punta_station.head()

| | Data | P. Salute Liv.Idrom. medio 1m | P. Salute Temp. Acqua |
|---|---|---|---|
| 0 | 2022-08-10 00:00:00 | 0.32 | 26.9 |
| 1 | 2022-08-10 00:05:00 | 0.3 | 27.8 |
| 2 | 2022-08-10 00:10:00 | 0.27 | 27.9 |
| 3 | 2022-08-10 00:15:00 | 0.26 | 26.9 |
| 4 | 2022-08-10 00:20:00 | 0.24 | 27.8 |


#### Stazione San Giorgio

In [19]:
sangiorgio_station = pd.read_csv('Stazione_SanGiorgio.csv', delimiter=';')
sangiorgio_station.head()

| | Data | San Giorgio D.Vento med. 10m | San Giorgio V.Vento med.10m | San Giorgio V.Vento max | San Giorgio Temperatura | San Giorgio Umidita' | San Giorgio Radiazione solare |
|---|---|---|---|---|---|---|---|
| 0 | 2022-11-10 00:00:00 | 18.0 | 4.0 | 4.9 | 14.0 | 79.0 | 0.0 |
| 1 | 2022-11-10 00:05:00 | 18.0 | 4.3 | 4.9 | 14.0 | 79.0 | 0.0 |
| 2 | 2022-11-10 00:10:00 | 22.0 | 4.5 | 5.2 | 14.0 | 79.0 | 0.0 |
| 3 | 2022-11-10 00:15:00 | 25.0 | 4.3 | 5.4 | 14.0 | 79.0 | 0.0 |
| 4 | 2022-11-10 00:20:00 | 23.0 | 4.6 | 5.2 | 13.9 | 79.0 | 0.0 |
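
The stations each expose a different set of columns, so one possible shape for the single tidy dataset described in the introduction, sketched here as an assumption rather than the notebook's chosen design, is a long table with one row per timestamp, station and measurement. The sketch melts two tiny station frames with `DataFrame.melt` and stacks them; the rows reuse values from the Burano and Punta Salute previews above, and the `station` labels and `date`/`variable`/`value` column names are hypothetical.

```python
import pandas as pd

# Tiny stand-ins with the same column layout as two station files (values from the previews above)
burano = pd.DataFrame({
    "Data": ["2022-11-10 00:00:00", "2022-11-10 00:05:00"],
    "Burano Liv.Idrom. medio 1m": [0.66, 0.67],
})
salute = pd.DataFrame({
    "Data": ["2022-08-10 00:00:00", "2022-08-10 00:05:00"],
    "P. Salute Liv.Idrom. medio 1m": [0.32, 0.3],
    "P. Salute Temp. Acqua": [26.9, 27.8],
})


def to_long(df: pd.DataFrame, station: str) -> pd.DataFrame:
    """Reshape one wide station frame into (date, station, variable, value) rows."""
    long = df.melt(id_vars="Data", var_name="variable", value_name="value")
    long = long.rename(columns={"Data": "date"})
    long["date"] = pd.to_datetime(long["date"])
    long.insert(1, "station", station)  # place the station label right after the date
    return long


tidy = pd.concat(
    [to_long(burano, "Burano"), to_long(salute, "Punta Salute")],
    ignore_index=True,
)
print(tidy)
```

A long layout keeps stations with different variables and sampling intervals in one table without padding empty columns; the trade-off is that the `value` column mixes units, so any analysis would need to filter on `variable` first.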
