# Getting structured tides from hydrographic data

<hr>

## Requirement

Starting with hydrographic HTML for a given location <sub>see [OneNote 08.2024 tide times]</sub>, build structured tide data for a set of dates.  
Against <code>TIDE_TIME_DATA_FILE</code> below, enter the data file for the 7-day tide window.



In [77]:
TIDE_TIME_DATA_FILE = "./tides_002_end_Sep2024.txt"



## Entities

<hr>

### Entity: TideForDay

| property | value |
|:-------------|----|
| Definition | The collection of TideMarks for a given calendar day |
| Key | t_date |
| Cardinalities | 1-M TideMark |

#### Data attributes

| Name | aka | data type | Definition |
|:-------------|----|-------|:-------------|
| t_date | Tide Date | int | UID of the tide for a given day. Strictly an alternate key, but no need for surrogates here. Not named "date", to avoid a keyword conflict. This is the day of the month only; the month and year are added at the output-to-CSV stage |
| tidal_range | Tidal Range | float | The distance between the minimum low water mark, and the maximum high water mark, for that day |

<p>
<hr>
<p>

### Entity: TideMark

| property | value |
|:-------------|----|
| Definition | One of the set of tidemarks for a given day. A tidemark is the data related to the sea reaching a high- or low-water mark.  |
| Key | t_date, t_seq |
| Cardinalities | M-1 with TideForDay |



#### Data attributes

 
| Name | aka | data type | Definition |
|:-------------|----|-------|:-------------|
| t_date | Tide Date | int | The day of the month of the tide, without the month or year. See TideForDay.t_date |
| t_seq | Tide Sequence | int | The position of this tidemark relative to the other tidemarks in the day, counted from zero. The first after midnight has sequence 0 |
| t_type | Tide Type | string | Whether this is a high tide or a low tide. Must be one of 'High' or 'Low' |
| t_height | Tide Height | float | tide height in metres, to 1 decimal place |
| t_time | Tide Time | datetime | The time when the high/low water mark occurs |
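
The attribute table above specifies `t_height` as a float and `t_time` as a time value, while the raw file supplies both as text. A minimal sketch of building a record with those types might look like the following. The names `TypedTideMark` and `make_tide_mark`, and the choice of a time-only `datetime.time` value, are assumptions for illustration and are not part of this notebook.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class TypedTideMark:  # hypothetical name, mirrors the TideMark entity
    t_date: int
    t_seq: int
    t_type: str      # 'High' or 'Low'
    t_height: float  # metres, 1 decimal place
    t_time: time     # time of day only (assumption: no date part)


def make_tide_mark(t_date, t_seq, t_type, raw_height, raw_time):
    """Convert raw text fields into the types listed in the attribute table."""
    if t_type not in ("High", "Low"):
        raise ValueError(f"t_type must be 'High' or 'Low', got {t_type!r}")
    return TypedTideMark(
        t_date=t_date,
        t_seq=t_seq,
        t_type=t_type,
        t_height=round(float(raw_height), 1),
        # strptime raises ValueError if the text is not HH:MM
        t_time=datetime.strptime(raw_time, "%H:%M").time(),
    )


mark = make_tide_mark(28, 0, "High", "2.7", "04:24")
print(mark)
```

Converting at construction time means a malformed height or time fails early, rather than surfacing later when the tidal range is calculated.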



<hr>

In [78]:
# Standard-library imports only; nothing needs to be installed with pip
import re
from pprint import pprint as pp
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TideMark:
    t_date: int
    t_seq: int
    t_type: str # High or Low
    t_height: float
    t_time: datetime

@dataclass
class TideForDay:
    t_date: int
    tidal_range: float 


tide_marks = []
tides = []

with open(TIDE_TIME_DATA_FILE,"r") as infile:
    lines = infile.readlines()



In [79]:
# Data example:
# Sun 04 AugNew moon on this day
# High Water of 3.7 metres, at 19:59.

# Strip leading and trailing whitespace (tabs, newlines), then keep only the lines
# that start with one of the values in ok_search_terms; this also drops blank lines.
# Where a line starts with Sun, Mon, etc., the records that follow are the low- or
# high-water times for that day, until the next day line.
# "Thu" rather than "Thur": the file uses three-letter day names (for example
# "Sat 28 Sep"), and a line starting "Thu " does not start with "Thur".
ok_search_terms = ("Low","High","Fri","Sat","Sun","Mon","Tue","Wed","Thu")
cleaned_lines = [line.strip() for line in lines]
high_lows = [line for line in cleaned_lines if line.startswith(ok_search_terms)]
print(high_lows)

['Low Water\tHigh Water', 'Sat 28 Sep', 'High Water of 2.7 metres, at 04:24.', 'Low Water of 1.4 metres, at 09:48.', 'High Water of 3.0 metres, at 16:40.', 'Low Water of 1.1 metres, at 22:43.\t-', 'Sun 29 Sep', 'High Water of 3.1 metres, at 05:31.', 'Low Water of 1.1 metres, at 10:56.', 'High Water of 3.4 metres, at 17:40.', 'Low Water of 0.8 metres, at 23:35.\t-', 'Mon 30 Sep', 'High Water of 3.4 metres, at 06:14.', 'Low Water of 0.9 metres, at 11:48.', 'High Water of 3.7 metres, at 18:24.\t-\t-']
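
The one-line filter above relies on `str.startswith` accepting a tuple of prefixes. The sketch below shows that behaviour on a few sample lines, and why the length of each prefix matters. The names `DAY_PREFIXES`, `TIDE_PREFIXES` and `keep_line` are hypothetical, and the lines "Tidal predictions for the week" and "Thu 03 Oct" are made up for illustration rather than taken from the data file.

```python
DAY_PREFIXES = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
TIDE_PREFIXES = ("Low", "High")


def keep_line(line):
    """True if the stripped line starts with a day name or a tide type."""
    return line.strip().startswith(DAY_PREFIXES + TIDE_PREFIXES)


sample = [
    "",
    "Sat 28 Sep",
    "High Water of 2.7 metres, at 04:24.",
    "Tidal predictions for the week",  # made-up line that should be dropped
    "Thu 03 Oct",                      # made-up three-letter Thursday line
]
print([line for line in sample if keep_line(line)])

# A prefix longer than the text in the file never matches
print("Thu 03 Oct".startswith("Thur"))
```

A short prefix such as "Thu" matches both "Thu" and "Thurs", so the shortest form that appears in the file is the safer choice.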


In [80]:
for line in high_lows:
    words = line.split()

    # A line that starts with a day name (Sun, Mon, ...) opens a new day, and
    # the lines that follow are that day's high/low water times.
    # Keep word 1 (base 0): the day of the month, without the day name.
    # Month and year are added downstream.
    # startswith() accepts three-letter names as well as longer forms such as "Thurs".
    if words[0].startswith(("Sun","Mon","Tue","Wed","Thu","Fri","Sat")):
        curr_date = int(words[1])
        curr_seq = 0
        continue
    # Otherwise, 7 or more words means a High or Low water sentence.
    # Words 0 (base zero), 3 and 6 are respectively High/Low, tide height, and
    # time. Other words and characters are discarded.
    if len(words) >= 7:
        curr_type = words[0]
        curr_time = words[6].rstrip('.')
        curr_height = words[3]
        # Height and time are kept as the raw strings here; the height is
        # converted to float later, when the tidal range is calculated.
        tide_mark = TideMark(curr_date, curr_seq, curr_type, curr_height, curr_time)
        tide_marks.append(tide_mark)
        curr_seq += 1


pp(tide_marks)


[TideMark(t_date=28, t_seq=0, t_type='High', t_height='2.7', t_time='04:24'),
 TideMark(t_date=28, t_seq=1, t_type='Low', t_height='1.4', t_time='09:48'),
 TideMark(t_date=28, t_seq=2, t_type='High', t_height='3.0', t_time='16:40'),
 TideMark(t_date=28, t_seq=3, t_type='Low', t_height='1.1', t_time='22:43'),
 TideMark(t_date=29, t_seq=0, t_type='High', t_height='3.1', t_time='05:31'),
 TideMark(t_date=29, t_seq=1, t_type='Low', t_height='1.1', t_time='10:56'),
 TideMark(t_date=29, t_seq=2, t_type='High', t_height='3.4', t_time='17:40'),
 TideMark(t_date=29, t_seq=3, t_type='Low', t_height='0.8', t_time='23:35'),
 TideMark(t_date=30, t_seq=0, t_type='High', t_height='3.4', t_time='06:14'),
 TideMark(t_date=30, t_seq=1, t_type='Low', t_height='0.9', t_time='11:48'),
 TideMark(t_date=30, t_seq=2, t_type='High', t_height='3.7', t_time='18:24')]
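
The loop above picks fields by word position. Since `re` is already imported, the same sentence could instead be matched with a regular expression, which also rejects lines that do not fit the expected shape. This is only a sketch: the pattern `TIDE_SENTENCE` and the function name `parse_tide_sentence` are assumptions based on the sample lines shown in this notebook, not on any published format.

```python
import re

# Matches e.g. "High Water of 3.7 metres, at 19:59."
# Trailing text after the full stop (such as "\t-") is ignored.
TIDE_SENTENCE = re.compile(
    r"^(?P<type>High|Low) Water of (?P<height>\d+(?:\.\d+)?) metres, "
    r"at (?P<time>\d{2}:\d{2})\."
)


def parse_tide_sentence(line):
    """Return (type, height, time) for a tide sentence, or None otherwise."""
    match = TIDE_SENTENCE.match(line.strip())
    if match is None:
        return None
    return match["type"], float(match["height"]), match["time"]


print(parse_tide_sentence("High Water of 3.7 metres, at 19:59."))
print(parse_tide_sentence("Low Water of 1.1 metres, at 22:43.\t-"))
print(parse_tide_sentence("Low Water\tHigh Water"))
```

With this approach the table header line is rejected because it does not match the pattern, rather than because it happens to have fewer than 7 words.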


In [81]:
# Get a sorted list of the unique dates for this set of tides.
# Example: [28, 29, 30]
unique_dates = sorted({int(mark.t_date) for mark in tide_marks})
pp(unique_dates)

# For each date, track the lowest low-water and highest high-water heights.
# Start from sentinels (in metres) that any real tide height will replace.
water_marks = {}
for d in unique_dates:
    water_marks[d] = {'Low':100,'High':-100}

pp(tide_marks)

tide_heights = [(int(mark.t_date), mark.t_seq, mark.t_type, float(mark.t_height), mark.t_time) for mark in tide_marks]

pp(tide_heights)

for test_tide in tide_heights:
    test_date = test_tide[0]
    test_type = test_tide[2]
    new_height = test_tide[3]
    
    if  test_type == 'Low':
        if  new_height < water_marks[test_date][test_type]:
            water_marks[test_date][test_type] = new_height
    else: # High
        if  new_height > water_marks[test_date][test_type]:
            water_marks[test_date][test_type] = new_height

pp(water_marks)
    

[28, 29, 30]
[TideMark(t_date=28, t_seq=0, t_type='High', t_height='2.7', t_time='04:24'),
 TideMark(t_date=28, t_seq=1, t_type='Low', t_height='1.4', t_time='09:48'),
 TideMark(t_date=28, t_seq=2, t_type='High', t_height='3.0', t_time='16:40'),
 TideMark(t_date=28, t_seq=3, t_type='Low', t_height='1.1', t_time='22:43'),
 TideMark(t_date=29, t_seq=0, t_type='High', t_height='3.1', t_time='05:31'),
 TideMark(t_date=29, t_seq=1, t_type='Low', t_height='1.1', t_time='10:56'),
 TideMark(t_date=29, t_seq=2, t_type='High', t_height='3.4', t_time='17:40'),
 TideMark(t_date=29, t_seq=3, t_type='Low', t_height='0.8', t_time='23:35'),
 TideMark(t_date=30, t_seq=0, t_type='High', t_height='3.4', t_time='06:14'),
 TideMark(t_date=30, t_seq=1, t_type='Low', t_height='0.9', t_time='11:48'),
 TideMark(t_date=30, t_seq=2, t_type='High', t_height='3.7', t_time='18:24')]
[(28, 0, 'High', 2.7, '04:24'),
 (28, 1, 'Low', 1.4, '09:48'),
 (28, 2, 'High', 3.0, '16:40'),
 (28, 3, 'Low', 1.1, '22:43'),
 (29, 0,

In [82]:
# get the tidal range
for i in water_marks:
    t_high = water_marks[i]['High']
    t_low = water_marks[i]['Low']
    water_marks[i]['TidalRange'] = round(t_high - t_low,2)
pp(water_marks)

{28: {'High': 3.0, 'Low': 1.1, 'TidalRange': 1.9},
 29: {'High': 3.4, 'Low': 0.8, 'TidalRange': 2.6},
 30: {'High': 3.7, 'Low': 0.9, 'TidalRange': 2.8}}
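
The sentinel-based search in the earlier cell and the subtraction here can be expressed more compactly by grouping heights per day and taking `max` and `min`. The sketch below uses plain tuples copied from the output above so that it runs on its own; the function name `tidal_ranges` is hypothetical.

```python
from collections import defaultdict

# (t_date, t_type, t_height) copied from the tide marks shown above
marks = [
    (28, "High", 2.7), (28, "Low", 1.4), (28, "High", 3.0), (28, "Low", 1.1),
    (29, "High", 3.1), (29, "Low", 1.1), (29, "High", 3.4), (29, "Low", 0.8),
    (30, "High", 3.4), (30, "Low", 0.9), (30, "High", 3.7),
]


def tidal_ranges(marks):
    """Map each date to (highest high water - lowest low water)."""
    heights = defaultdict(lambda: {"High": [], "Low": []})
    for t_date, t_type, t_height in marks:
        heights[t_date][t_type].append(t_height)
    return {
        t_date: round(max(by_type["High"]) - min(by_type["Low"]), 2)
        for t_date, by_type in sorted(heights.items())
        # Skip a day that lacks either a high or a low water mark
        if by_type["High"] and by_type["Low"]
    }


print(tidal_ranges(marks))
```

Unlike the sentinel version, a day with no low water (or no high water) is left out of the result instead of producing a range based on the placeholder values.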


In [83]:
# Populate TideForDay objects from the dictionary
tidal_ranges_by_date = [TideForDay(t_date=key, tidal_range=value['TidalRange']) for key, value in water_marks.items()]
pp(tidal_ranges_by_date)
pp(tide_marks)


[TideForDay(t_date=28, tidal_range=1.9),
 TideForDay(t_date=29, tidal_range=2.6),
 TideForDay(t_date=30, tidal_range=2.8)]
[TideMark(t_date=28, t_seq=0, t_type='High', t_height='2.7', t_time='04:24'),
 TideMark(t_date=28, t_seq=1, t_type='Low', t_height='1.4', t_time='09:48'),
 TideMark(t_date=28, t_seq=2, t_type='High', t_height='3.0', t_time='16:40'),
 TideMark(t_date=28, t_seq=3, t_type='Low', t_height='1.1', t_time='22:43'),
 TideMark(t_date=29, t_seq=0, t_type='High', t_height='3.1', t_time='05:31'),
 TideMark(t_date=29, t_seq=1, t_type='Low', t_height='1.1', t_time='10:56'),
 TideMark(t_date=29, t_seq=2, t_type='High', t_height='3.4', t_time='17:40'),
 TideMark(t_date=29, t_seq=3, t_type='Low', t_height='0.8', t_time='23:35'),
 TideMark(t_date=30, t_seq=0, t_type='High', t_height='3.4', t_time='06:14'),
 TideMark(t_date=30, t_seq=1, t_type='Low', t_height='0.9', t_time='11:48'),
 TideMark(t_date=30, t_seq=2, t_type='High', t_height='3.7', t_time='18:24')]


In [84]:
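# Given a day of the month, add the month and year to which it applies.
# Month and year are hard-coded here and must match the data file named in TIDE_TIME_DATA_FILE.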
def add_month_year(t_date):
    month = "09"
    year = "2024"
    return f"{t_date}/{month}/{year}"
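
Because the month and year are hard-coded, the helper above has to be edited for each new data file. One possible variation, shown only as a sketch, takes them as parameters and lets `datetime.date` reject impossible dates such as 31 September. The name `full_date` and its default arguments are assumptions for illustration.

```python
from datetime import date


def full_date(t_date, month=9, year=2024):
    """Format a day of the month as DD/MM/YYYY for the given month and year."""
    # date() raises ValueError for an impossible date
    return date(year, month, t_date).strftime("%d/%m/%Y")


print(full_date(28))
print(full_date(4))
```

Note that `%d` zero-pads the day, so day 4 becomes `04/09/2024`, whereas the f-string helper above would produce `4/09/2024`.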
    
    

In [85]:
# Now that the objects are populated, loop over them to create the CSV
# for importing into a spreadsheet.
# I want 2 tides per day, the "usable" tides, when I have a chance of
# getting down to the sea. There are 3 or 4 water marks in a calendar
# day. Take the first 2 that occur after 06:00.
# Times are zero-padded "HH:MM" strings, so comparing them as strings
# gives the same order as comparing them as times.
# Drive the loop from the parent class.

tides_as_csv = ""
for day in tidal_ranges_by_date:
    tides_as_csv += f"{add_month_year(day.t_date)},{day.tidal_range}"
    usable = [mark for mark in tide_marks if mark.t_date == day.t_date and mark.t_time > "06:00"]
    usable.sort(key=lambda mark: mark.t_seq)
    # Each tide contributes three columns: type, time, height
    for mark in usable[:2]:
        tides_as_csv += f",{mark.t_type},{mark.t_time},{mark.t_height}"
    tides_as_csv += "\n"

pp(tides_as_csv)
# 02,High,2.5,05:53,3.1,Low,11:31,0.9,High,18:18,3.4,
# 03,Low,2.8,00:15,0.8,High,06:56,3.3,Low,12:32,0.8,High,19:13,3.6,
# 04,Low,3.0,01:08,0.7,High,07:46,3.4,Low,13:21,0.7,High,19:59,3.7,
# 05,Low,3.2,01:53,0.6,High,08:28,3.5,Low,14:03,0.6,High,20:37,3.8,
# 06,Low,3.2,02:33,0.6,High,09:01,3.5,Low,14:39,0.6,High,21:08,3.8,
# 07,Low,3.1,03:06,0.6,High,09:28,3.5,Low,15:10,0.6,High,21:33,3.7,
# 08,Low,2.9,03:33,0.6,High,09:49,3.5,Low,15:36,0.6,

('28/09/2024,1.9,Low,09:48,1.4,High,16:40,3.0\n'
 '29/09/2024,2.6,Low,10:56,1.1,High,17:40,3.4\n'
 '30/09/2024,2.8,High,06:14,3.4,Low,11:48,0.9\n')
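
The string concatenation above works for this fixed layout. As an alternative sketch, the standard-library `csv` module can build the same rows and takes care of quoting and line endings. The row data is copied from the output above; `rows_to_csv` is a hypothetical helper name.

```python
import csv
import io

# One row per day: date, tidal range, then (type, time, height) per usable tide
rows = [
    ["28/09/2024", 1.9, "Low", "09:48", "1.4", "High", "16:40", "3.0"],
    ["29/09/2024", 2.6, "Low", "10:56", "1.1", "High", "17:40", "3.4"],
    ["30/09/2024", 2.8, "High", "06:14", "3.4", "Low", "11:48", "0.9"],
]


def rows_to_csv(rows):
    """Serialise rows to CSV text with newline line endings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(rows), end="")
```

To write a file instead of a string, the same `csv.writer` can be given a file opened with `newline=""`, as the `csv` module documentation recommends.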
