# Analyzing River Thames Water Levels
Time series data is everywhere, from watching your stock portfolio to monitoring climate change, and even live-tracking as local cases of a virus become a global pandemic. In this project, you’ll work with a time series that tracks the tide levels of the River Thames. You’ll first load the data and inspect it visually, and then perform calculations on the dataset to generate some summary statistics. You’ll end by reducing the time series to its component attributes and analyzing them.

The original dataset is available from the British Oceanographic Data Centre [here](https://www.bodc.ac.uk/data/published_data_library/catalogue/10.5285/b66afb2c-cd53-7de9-e053-6c86abc0d251), and you can read all about this fascinating archival story in [this article](https://www.nature.com/articles/s41597-022-01223-7) from *Scientific Data*, a Nature Portfolio journal.

Here's a map of the locations of the tidal meters along the River Thames in London.

![](locations.png)


The TXT file contains data for three variables, described in the table below. 

| Variable Name | Description | Format |
| ------------- | ----------- | ------ |
| Date and time | Date and time of measurement in GMT. Note the tide gauge is accurate to one minute. | dd/mm/yyyy hh:mm:ss |
| Water level | High or low water level measured by tide meter. Tide gauges are accurate to 1 centimetre. | metres (Admiralty Chart Datum (CD), Ordnance Datum Newlyn (ODN), or Trinity High Water (THW)) |
| Flag | High water flag = 1, low water flag = 0 | Categorical (0 or 1) |
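
The loading step is not shown in this export. A minimal sketch of how the three documented variables might be read into the `df` used below, assuming a comma-delimited file with one header row. The function name `load_tide_data`, the delimiter, and the sample rows are assumptions for illustration, not taken from the original dataset.

```python
import io
import pandas as pd

def load_tide_data(source):
    """Read the tide file and return a DataFrame with three tidy columns."""
    # Keep only the first three columns in case the file carries extras (assumption).
    df = pd.read_csv(source).iloc[:, :3].copy()
    # Rename positionally so the sketch does not depend on the exact header text.
    df.columns = ["datetime", "water_level", "is_high_tide"]
    # The documented layout is dd/mm/yyyy hh:mm:ss, so parse it explicitly.
    df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M:%S")
    df["water_level"] = df["water_level"].astype("float")
    return df

# Made-up rows in the documented layout, used only to exercise the function.
sample = io.StringIO(
    "Date and time,water level,flag\n"
    "01/05/1911 15:40:00,3.71,1\n"
    "02/05/1911 11:25:00,-2.95,0\n"
)
df_sample = load_tide_data(sample)
```

Passing an explicit `format` avoids any ambiguity between day-first and month-first dates such as `01/05/1911`.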



## Convert the columns to the correct data types

In [37]:
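# Parse the timestamp strings into datetime64 values
df["datetime"] = pd.to_datetime(df["datetime"])
# Store water levels as floats so they can be aggregated
df["water_level"] = df["water_level"].astype("float")
df.dtypes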

datetime        datetime64[ns]
water_level            float64
is_high_tide             int64
dtype: object
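
The grouping steps later in this notebook rely on a `year` column whose creation is not shown in this export (cell 38 is missing). A minimal sketch of one way to derive it from the parsed `datetime` column, using a toy frame with made-up values; the name `toy` is hypothetical.

```python
import pandas as pd

# Toy frame with made-up readings, in the same column layout as `df`.
toy = pd.DataFrame({
    "datetime": pd.to_datetime(["1911-12-31 23:10:00", "1912-01-01 05:30:00"]),
    "water_level": [3.2, -2.4],
    "is_high_tide": [1, 0],
})

# Calendar year of each reading, used as the grouping key further down.
toy["year"] = toy["datetime"].dt.year
```

On the real data the equivalent line would presumably be `df["year"] = df["datetime"].dt.year`.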

## Filtering the River Thames water levels into two categories: high tide and low tide

In [39]:
tide_high = df.loc[df["is_high_tide"]==1, "water_level"]
tide_low = df.loc[df["is_high_tide"]==0, "water_level"]

## Calculate summary statistics for each group

In [40]:
# IQR is not a pandas built-in; it must be a function defined in an earlier cell
high_statistics = tide_high.agg(['mean', 'median', IQR])
low_statistics = tide_low.agg(['mean', 'median', IQR])
high_statistics

mean      3.318373
median    3.352600
IQR       0.743600
Name: water_level, dtype: float64
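
The `IQR` passed to `agg` above is not defined in the cells shown here. Because the output row is labelled `IQR`, it is presumably a user-defined function of that name. A minimal sketch, assuming it computes the interquartile range as the 75th percentile minus the 25th percentile; the toy values are made up.

```python
import pandas as pd

def IQR(column):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return column.quantile(0.75) - column.quantile(0.25)

# Made-up values, only to show the shape of the result.
toy_levels = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="water_level")
toy_stats = toy_levels.agg(["mean", "median", IQR])
```

When a callable is passed in the list, pandas labels the resulting row with the function's name, which is why the output reads `IQR`.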

## Analyze the yearly ratio of very high tides to all high tides

In [41]:
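# Count all high-tide readings per year
all_high_days = df[df['is_high_tide'] == 1].groupby('year')['water_level'].count()
# Count high-tide readings above the 90th percentile of high-tide levels, per year
very_high_days = df[(df['water_level'] > tide_high.quantile(0.90)) & (df['is_high_tide'] == 1)].groupby('year')['water_level'].count()
# Share of very high readings among all high-tide readings, by year
very_high_ratio = (very_high_days / all_high_days).reset_index()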
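
Dividing the two yearly counts aligns them on the `year` index. In this dataset both Series cover all 85 years, but a year with no readings above the threshold would be absent from the numerator and come out as `NaN`. A small sketch with made-up counts shows the behaviour and one possible way to turn such gaps into zeros; the `toy_*` names are hypothetical.

```python
import pandas as pd

# Made-up yearly counts; 1912 has high tides but none above the threshold.
toy_all = pd.Series({1911: 4, 1912: 5, 1913: 10}, name="water_level")
toy_very_high = pd.Series({1911: 1, 1913: 2}, name="water_level")

# Plain division aligns on the index and leaves NaN for the missing year.
raw_ratio = toy_very_high / toy_all

# Reindexing first turns "no very high tides" into an explicit zero.
filled_ratio = toy_very_high.reindex(toy_all.index, fill_value=0) / toy_all
```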

In [42]:
all_high_days

year
1911    244
1912    557
1913    669
1914    687
1915    666
       ... 
1991    706
1992    707
1993    699
1994    705
1995    705
Name: water_level, Length: 85, dtype: int64

In [43]:
very_high_days

year
1911      1
1912     18
1913     55
1914     38
1915     30
       ... 
1991     68
1992     73
1993    102
1994    106
1995    120
Name: water_level, Length: 85, dtype: int64

In [44]:
very_high_ratio

    year  water_level
0   1911     0.004098
1   1912     0.032316
2   1913     0.082212
3   1914     0.055313
4   1915     0.045045
..   ...          ...
80  1991     0.096317
81  1992     0.103253
82  1993     0.145923
83  1994     0.150355
84  1995     0.170213

[85 rows x 2 columns]


## Analyze the yearly ratio of very low tides to all low tides

In [45]:
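# Count all low-tide readings per year
all_low_days = df[df['is_high_tide'] == 0].groupby('year')['water_level'].count()
# Count low-tide readings below the 10th percentile of low-tide levels, per year
very_low_days = df[(df['water_level'] < tide_low.quantile(0.10)) & (df['is_high_tide'] == 0)].groupby('year')['water_level'].count()
# Share of very low readings among all low-tide readings, by year
very_low_ratio = (very_low_days / all_low_days).reset_index()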

solution = {'high_statistics': high_statistics, 'low_statistics': low_statistics, 'very_high_ratio': very_high_ratio, 'very_low_ratio': very_low_ratio}
print(solution)

{'high_statistics': mean      3.318373
median    3.352600
IQR       0.743600
Name: water_level, dtype: float64, 'low_statistics': mean     -2.383737
median   -2.412900
IQR       0.538200
Name: water_level, dtype: float64, 'very_high_ratio':     year  water_level
0   1911     0.004098
1   1912     0.032316
2   1913     0.082212
3   1914     0.055313
4   1915     0.045045
..   ...          ...
80  1991     0.096317
81  1992     0.103253
82  1993     0.145923
83  1994     0.150355
84  1995     0.170213

[85 rows x 2 columns], 'very_low_ratio':     year  water_level
0   1911     0.060606
1   1912     0.066667
2   1913     0.022388
3   1914     0.039017
4   1915     0.033435
..   ...          ...
80  1991     0.150355
81  1992     0.107496
82  1993     0.112696
83  1994     0.106383
84  1995     0.107801

[85 rows x 2 columns]}
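
The very high and very low ratios follow the same pattern: filter by tide flag, take a quantile of that group as the threshold, then divide the yearly count beyond the threshold by the yearly total. A sketch of how the two computations could be folded into one helper. The name `extreme_ratio` and its parameters are hypothetical, and the toy data is made up.

```python
import pandas as pd

def extreme_ratio(df, flag, quantile, above):
    """Yearly share of tides of one type that lie beyond a quantile of that type."""
    tides = df[df["is_high_tide"] == flag]
    threshold = tides["water_level"].quantile(quantile)
    if above:
        extreme = tides[tides["water_level"] > threshold]
    else:
        extreme = tides[tides["water_level"] < threshold]
    all_counts = tides.groupby("year")["water_level"].count()
    extreme_counts = extreme.groupby("year")["water_level"].count()
    # Division aligns on year; reset_index gives year and water_level columns.
    return (extreme_counts / all_counts).reset_index()

# Made-up high-tide readings: ten per year, one clear outlier in each year.
levels = list(range(1, 10)) + [20] + list(range(1, 10)) + [30]
toy = pd.DataFrame({
    "year": [1911] * 10 + [1912] * 10,
    "water_level": [float(v) for v in levels],
    "is_high_tide": 1,
})
toy_ratio = extreme_ratio(toy, flag=1, quantile=0.90, above=True)
```

Under these assumptions, `extreme_ratio(df, flag=1, quantile=0.90, above=True)` and `extreme_ratio(df, flag=0, quantile=0.10, above=False)` would correspond to `very_high_ratio` and `very_low_ratio`.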
