This project was inspired by a factoid I heard: that the high tide on the River Thames in London is the result of two separate high tides. The swell that travels north around Scotland continues down the North Sea, reaching London just as the 'next' high tide swell arrives from the south. I wanted to test the validity of this claim and visualise the effect.

This notebook scrapes high/low tide heights and times for a number of coastal points around the UK and Ireland from [TideTimes](https://www.tidetimes.org.uk/uk-tides).

Data geoprocessing, interpolation, and plotting are performed in separate notebooks.

The first job here is to parse the "All locations" page to generate a list of all available locations.

In [88]:
import numpy as np
import pandas as pd
import requests
from datetime import datetime

In [2]:
from bs4 import BeautifulSoup

In [5]:
url = 'https://www.tidetimes.org.uk/uk-tides'
page = requests.get(url)

print(page.text)

<!DOCTYPE html>
<html lang="en">
<head>
	<!-- Google Tag Manager -->
	<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
	new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
	j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
	'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
	})(window,document,'script','dataLayer','GTM-TSHVD73F');</script>
	<!-- End Google Tag Manager -->

	<title>All Tide Locations | TideTimes</title>

	<meta name="description" content="A list of all the locations covered by TideTimes tidal predictions."/>
	<meta name="keywords"    content="tide,tiems,all,uk,locations,places,ports"/>
	<meta name="robots"      content="index,follow"/>

	<meta name="viewport" content="width=device-width,initial-scale=1.0,maximum-scale=1.0"/>
	<meta charset="utf-8"/>

	<link rel="stylesheet"    type="text/css"     href="/includes/tide-times.css?tidetimes=2022-01-03.01"/> 
	<link rel="stylesheet"  

In [7]:
soup = BeautifulSoup(page.content, 'lxml')

In [10]:
locations_results = soup.find(id='locations')

In [19]:
locations = locations_results.find_all('li')

In [28]:
for location in locations:
    print(location)

<li id=".aberdaron"><a href="/aberdaron-tide-times" title="Aberdaron Tide Times">Aberdaron</a></li>
<li id=".aberdeen"><a href="/aberdeen-tide-times" title="Aberdeen Tide Times">Aberdeen</a></li>
<li id=".aberdovey"><a href="/aberdovey-tide-times" title="Aberdovey Tide Times">Aberdovey</a></li>
<li id=".aberporth"><a href="/aberporth-tide-times" title="Aberporth Tide Times">Aberporth</a></li>
<li id=".aberystwyth"><a href="/aberystwyth-tide-times" title="Aberystwyth Tide Times">Aberystwyth</a></li>
<li id=".albert-bridge"><a href="/albert-bridge-tide-times" title="Albert Bridge Tide Times">Albert Bridge</a></li>
<li id=".aldeburgh"><a href="/aldeburgh-tide-times" title="Aldeburgh Tide Times">Aldeburgh</a></li>
<li id=".allington-lock"><a href="/allington-lock-tide-times" title="Allington Lock Tide Times">Allington Lock</a></li>
<li id=".alloa"><a href="/alloa-tide-times" title="Alloa Tide Times">Alloa</a></li>
<li id=".amble"><a href="/amble-tide-times" title="Amble Tide Times">Amble</

In [29]:
type(locations)

bs4.element.ResultSet

In [30]:
type(locations[0])

bs4.element.Tag

In [46]:
# Display name of each location, taken from the text of its <li> element
locations_list = [location.text for location in locations]

In [47]:
locations_list

['Aberdaron',
 'Aberdeen',
 'Aberdovey',
 'Aberporth',
 'Aberystwyth',
 'Albert Bridge',
 'Aldeburgh',
 'Allington Lock',
 'Alloa',
 'Amble',
 'Amlwch',
 'Annan Waterfoot',
 'Anstruther Easter',
 'Applecross',
 'Appledore',
 'Arbroath',
 'Ardchattan Point',
 'Ardglass',
 'Ardnave Point',
 'Ardrossan',
 'Arklow',
 'Arnside',
 'Arrochar',
 'Ayr',
 'Badcall Bay',
 'Baginbun Head',
 'Balbriggan',
 'Balivanich',
 'Ballinskelligs Bay Castle',
 'Ballycastle Bay',
 'Ballycotton',
 'Ballycrovane Harbour',
 'Ballysadare Bay (Culleenamore)',
 'Baltasound Pier',
 'Baltimore, Ireland',
 'Banff',
 'Bangor',
 'Bantry',
 'Barcaldine Pier',
 'Bardsey Island',
 'Barmouth',
 'Barnstaple',
 'Barra Head',
 'Barra (North Bay)',
 'Barrow (Ramsden Dock)',
 'Barry',
 'Bartlett Creek',
 'Battlesbridge',
 'Bawdsey',
 'Bay Of Laig',
 'Bay Of Quendale',
 'Bays Loch',
 'Beachley (Aust)',
 'Beaumaris',
 'Bee Ness',
 'Belfast',
 'Bembridge Approaches',
 'Bembridge Harbour',
 'Berkeley',
 'Berwick',
 'Bideford',
 'Bla

In [54]:
locations[0].find('a')['href']

'/aberdaron-tide-times'

In [56]:
# Relative link to each location's tide page, taken from the <a> tag's href
location_links_list = [location.find('a')['href'] for location in locations]

In [57]:
location_links_list

['/aberdaron-tide-times',
 '/aberdeen-tide-times',
 '/aberdovey-tide-times',
 '/aberporth-tide-times',
 '/aberystwyth-tide-times',
 '/albert-bridge-tide-times',
 '/aldeburgh-tide-times',
 '/allington-lock-tide-times',
 '/alloa-tide-times',
 '/amble-tide-times',
 '/amlwch-tide-times',
 '/annan-waterfoot-tide-times',
 '/anstruther-easter-tide-times',
 '/applecross-tide-times',
 '/appledore-tide-times',
 '/arbroath-tide-times',
 '/ardchattan-point-tide-times',
 '/ardglass-tide-times',
 '/ardnave-point-tide-times',
 '/ardrossan-tide-times',
 '/arklow-tide-times',
 '/arnside-tide-times',
 '/arrochar-tide-times',
 '/ayr-tide-times',
 '/badcall-bay-tide-times',
 '/baginbun-head-tide-times',
 '/balbriggan-tide-times',
 '/balivanich-tide-times',
 '/ballinskelligs-bay-castle-tide-times',
 '/ballycastle-bay-tide-times',
 '/ballycotton-tide-times',
 '/ballycrovane-harbour-tide-times',
 '/ballysadare-bay-culleenamore-tide-times',
 '/baltasound-pier-tide-times',
 '/baltimore-ireland-tide-times',
 '/

Now that we have the names and href links, we need to scrape each page for High/Low, time, and height. 

We will test on the first page: https://www.tidetimes.org.uk/aberdaron-tide-times-20250101

The URL date is formatted year, month, day: YYYYMMDD (in excellent, non-American form)

Route for time data
- Update URL
- requests get url for page
- soup parse page
- find all tides, then table rows of class vis2
- each row
    - .find('td', class_='tac') - time
    - .find('td', class_='tar') - height, with [:-1] to drop the trailing 'm' unit
    - .find('td', class_='tal') - High/Low
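
The height cell in the route above carries a trailing unit, e.g. `0.72m`. A minimal sketch of stripping it, assuming every height ends in a single `m`; `parse_height` is a hypothetical helper used only for illustration here:

```python
# Sketch only: parse_height is a hypothetical helper for illustration.
# Assumption: height cells look like '0.72m' (a number followed by 'm').

def parse_height(cell_text: str) -> float:
    """Drop the trailing 'm' unit and return the height in metres as a float."""
    text = cell_text.strip()
    if text.endswith('m'):
        text = text[:-1]  # everything except the last character
    return float(text)

sample = '0.72m'
print(sample[-1:])           # the last character only, i.e. the unit
print(sample[:-1])           # the numeric part, still a string
print(parse_height(sample))  # the numeric part converted to a float
```

Note the difference between the two slices: `[-1:]` keeps only the unit, while `[:-1]` keeps everything before it.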

In [64]:
time_url = 'https://www.tidetimes.org.uk/aberdaron-tide-times-20250101'

time_page = requests.get(time_url)

time_soup = BeautifulSoup(time_page.content, 'lxml')

In [77]:
time_soup

<!DOCTYPE html>
<html lang="en">
<head>
<!-- Google Tag Manager -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
	new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
	j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
	'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
	})(window,document,'script','dataLayer','GTM-TSHVD73F');</script>
<!-- End Google Tag Manager -->
<title>Aberdaron Tide Times  for 1st January 2025 | Tide Times</title>
<meta content="Aberdaron Tide Times - free 7 day tide predictions for Aberdaron with historic high and low tides, sunrise, sunset and phases of the moon" name="description"/>
<meta content="aberdaron tide times,aberdaron tide tables,aberdaron high tide,aberdaron low tide,aberdaron tides" name="keywords"/>
<meta content="index,follow" name="robots"/>
<meta content="width=device-width,initial-scale=1.0,maximum-scale=1.0" name="viewport"/>
<meta charset="utf-8"/>


In [86]:
time_soup.find_all(id='tides')[0].find_all('tr', class_='vis2')

[<tr class="vis2">
 <td class="tal">Low</td>
 <td class="tac"><span>03:36</span></td>
 <td class="tar">0.72m</td>
 </tr>,
 <tr class="vis2">
 <td class="tal">High</td>
 <td class="tac"><span>08:49</span></td>
 <td class="tar">4.35m</td>
 </tr>,
 <tr class="vis2">
 <td class="tal">Low</td>
 <td class="tac"><span>15:59</span></td>
 <td class="tar">0.55m</td>
 </tr>,
 <tr class="vis2">
 <td class="tal">High</td>
 <td class="tac"><span>21:06</span></td>
 <td class="tar">4.12m</td>
 </tr>]

In [132]:
time_table = time_soup.find_all(id='tides')[0].find_all('tr', class_='vis2')

time_table[0]

<tr class="vis2">
<td class="tal">Low</td>
<td class="tac"><span>03:36</span></td>
<td class="tar">0.72m</td>
</tr>

In [135]:
len(time_table)

4

In [133]:
time_table[0].find('td', class_='tac').text

'03:36'

What should the dataframe hierarchy be? DateTime - Port - High/Low - Height?

Or Port - DateTime - Height - High/Low?

I think the latter:
    - for each port 
    - go through all the dates
    - each date, determine number of min-max
    - add times, heights, and high/low to dataframe
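
As a rough sketch of the Port - DateTime - Height - High/Low layout, the rows below reuse the Aberdaron values for 1 January 2025 shown earlier. The combined `DateTime` column and the name `tidy` are assumptions made for illustration, and the timestamps are left timezone-naive:

```python
import pandas as pd

# Values taken from the Aberdaron test page for 2025-01-01.
rows = [
    {'Port': 'Aberdaron', 'Date': '2025-01-01', 'Time': '03:36', 'Height': '0.72m', 'HiLo': 'Low'},
    {'Port': 'Aberdaron', 'Date': '2025-01-01', 'Time': '08:49', 'Height': '4.35m', 'HiLo': 'High'},
    {'Port': 'Aberdaron', 'Date': '2025-01-01', 'Time': '15:59', 'Height': '0.55m', 'HiLo': 'Low'},
    {'Port': 'Aberdaron', 'Date': '2025-01-01', 'Time': '21:06', 'Height': '4.12m', 'HiLo': 'High'},
]
df = pd.DataFrame(rows)

# Combine the date and time strings into one timestamp column (hypothetical 'DateTime').
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%Y-%m-%d %H:%M')

# Strip the trailing 'm' unit and store heights as floats.
df['Height'] = df['Height'].str.rstrip('m').astype(float)

# Order columns as Port - DateTime - Height - HiLo and sort within each port by time.
tidy = (df[['Port', 'DateTime', 'Height', 'HiLo']]
        .sort_values(['Port', 'DateTime'])
        .reset_index(drop=True))
print(tidy)
```

Keeping one row per tide event, with the port as the outer grouping, means additional ports can simply be stacked underneath.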
    
Given how frequent tidal swings are, let's just get the data for March 2024. This period recorded the largest tidal swings in the UK for a while, so it should give the best visualisation.

Let's write some functions to get the data, given a url extension for port and date

In [140]:
dates = pd.date_range(start='2024-03-01', end='2024-03-31')
dates

DatetimeIndex(['2024-03-01', '2024-03-02', '2024-03-03', '2024-03-04',
               '2024-03-05', '2024-03-06', '2024-03-07', '2024-03-08',
               '2024-03-09', '2024-03-10', '2024-03-11', '2024-03-12',
               '2024-03-13', '2024-03-14', '2024-03-15', '2024-03-16',
               '2024-03-17', '2024-03-18', '2024-03-19', '2024-03-20',
               '2024-03-21', '2024-03-22', '2024-03-23', '2024-03-24',
               '2024-03-25', '2024-03-26', '2024-03-27', '2024-03-28',
               '2024-03-29', '2024-03-30', '2024-03-31'],
              dtype='datetime64[ns]', freq='D')

In [141]:
dates[0].strftime('%Y%m%d')

'20240301'

In [142]:
url = 'https://www.tidetimes.org.uk'
port = '/aberdaron-tide-times'

# 'https://www.tidetimes.org.uk/aberdaron-tide-times-20250101'

for date in dates:
    print(url+port+'-'+date.strftime('%Y%m%d'))

https://www.tidetimes.org.uk/aberdaron-tide-times-20240301
https://www.tidetimes.org.uk/aberdaron-tide-times-20240302
https://www.tidetimes.org.uk/aberdaron-tide-times-20240303
https://www.tidetimes.org.uk/aberdaron-tide-times-20240304
https://www.tidetimes.org.uk/aberdaron-tide-times-20240305
https://www.tidetimes.org.uk/aberdaron-tide-times-20240306
https://www.tidetimes.org.uk/aberdaron-tide-times-20240307
https://www.tidetimes.org.uk/aberdaron-tide-times-20240308
https://www.tidetimes.org.uk/aberdaron-tide-times-20240309
https://www.tidetimes.org.uk/aberdaron-tide-times-20240310
https://www.tidetimes.org.uk/aberdaron-tide-times-20240311
https://www.tidetimes.org.uk/aberdaron-tide-times-20240312
https://www.tidetimes.org.uk/aberdaron-tide-times-20240313
https://www.tidetimes.org.uk/aberdaron-tide-times-20240314
https://www.tidetimes.org.uk/aberdaron-tide-times-20240315
https://www.tidetimes.org.uk/aberdaron-tide-times-20240316
https://www.tidetimes.org.uk/aberdaron-tide-times-202403

In [152]:
def return_date_URL(dt,
                    url = 'https://www.tidetimes.org.uk',
                    port_href = '/aberdaron-tide-times'):
    
    return url+port_href+'-'+dt.strftime('%Y%m%d')

In [154]:
return_date_URL(dates[0])

'https://www.tidetimes.org.uk/aberdaron-tide-times-20240301'

In [171]:
def get_tide_data(port_name, port_href, date):
    '''Given a port name, the port's href and a date, parse the web page
    and return the scraped tide data in a dataframe'''
    
    date_port_url = return_date_URL(date, port_href = port_href)
#     print(date_port_url)
    
    page = requests.get(date_port_url)
#     print(page.text)
    
    soup = BeautifulSoup(page.content, 'lxml')
    
    table = soup.find_all(id='tides')[0].find_all('tr', class_='vis2')
    
    rows_list = []
    
    for row in table:
        row_dict = {'Port': port_name, 
                    'Date': date, 
                    'Time': row.find('td', class_='tac').text,
                    'Height': row.find('td', class_='tar').text[:-1],
                    'HiLo': row.find('td', class_='tal').text}
        
        rows_list.append(row_dict)
        
    
    return pd.DataFrame(rows_list)

In [172]:
get_tide_data('Aberdaron', '/aberdaron-tide-times', dates[0]) # test for first port and date

Unnamed: 0,Port,Date,Time,Height,HiLo
0,Aberdaron,2024-03-01,05:43,1.04,Low
1,Aberdaron,2024-03-01,11:24,3.97,High
2,Aberdaron,2024-03-01,18:07,1.23,Low
3,Aberdaron,2024-03-01,23:48,3.71,High


Process flow for time data
- Update URL
- requests get url for page
- soup parse page
- find all tides, then table rows of class vis2
- each row
    - .find('td', class_='tac') - time
    - .find('td', class_='tar') - height, with [:-1] to drop the trailing 'm' unit
    - .find('td', class_='tal') - High/Low

In [180]:
# DataFrame.append was removed in pandas 2.0, so build the daily frames and concatenate once
aberdaron_df = pd.concat([get_tide_data('Aberdaron', '/aberdaron-tide-times', date)
                          for date in dates])

aberdaron_df.reset_index()

Unnamed: 0,index,Port,Date,Time,Height,HiLo
0,0,Aberdaron,2024-03-01,05:43,1.04,Low
1,1,Aberdaron,2024-03-01,11:24,3.97,High
2,2,Aberdaron,2024-03-01,18:07,1.23,Low
3,3,Aberdaron,2024-03-01,23:48,3.71,High
4,0,Aberdaron,2024-03-02,06:16,1.31,Low
...,...,...,...,...,...,...
114,2,Aberdaron,2024-03-30,17:39,1.15,Low
115,3,Aberdaron,2024-03-30,23:17,3.94,High
116,0,Aberdaron,2024-03-31,06:57,1.20,Low
117,1,Aberdaron,2024-03-31,12:34,3.81,High


In [182]:
# One frame per port and date, concatenated once (DataFrame.append was removed in pandas 2.0)
df_tides = pd.concat([get_tide_data(port_name, port_href, date)
                      for port_name, port_href in zip(locations_list, location_links_list)
                      for date in dates])

df_tides.head(100)

Unnamed: 0,Port,Date,Time,Height,HiLo
0,Aberdaron,2024-03-01,05:43,1.04,Low
1,Aberdaron,2024-03-01,11:24,3.97,High
2,Aberdaron,2024-03-01,18:07,1.23,Low
3,Aberdaron,2024-03-01,23:48,3.71,High
0,Aberdaron,2024-03-02,06:16,1.31,Low
...,...,...,...,...,...
3,Aberdaron,2024-03-25,20:49,4.17,High
0,Aberdaron,2024-03-26,03:21,0.51,Low
1,Aberdaron,2024-03-26,08:55,4.39,High
2,Aberdaron,2024-03-26,15:45,0.37,Low


In [183]:
df_tides.describe()

Unnamed: 0,Port,Date,Time,Height,HiLo
count,79603,79603,79603,79603.0,79603
unique,705,31,1440,1414.0,2
top,Warsash,2024-03-01 00:00:00,19:24,0.92,High
freq,180,2642,88,277.0,42610
first,,2024-03-01 00:00:00,,,
last,,2024-03-31 00:00:00,,,


In [185]:
df_tides.reset_index().to_csv('UK_tide_data_march2024.csv') # save scraped data externally