<h1>Getting structured tides from hydrographic data</h1>
<hr>
<h2>Requirement</h2>
Starting with hydrographic html for a given location <sub>see [OneNote 08.2024 tide times]</sub>, build structured tide data for a set of dates.  


<hr>
<h2>Entities</h2>
<hr>
<h3>TideForDay</h3>
<h4>Definition</h4> Gathers data for a tide on a given calendar day
<h4>Key</h4> t_date 
<h4>Cardinalities</h4> 1-M TideMark
<h4>Data attributes</h4>
<p></p>

|Name|aka|data type|Definition| 
|:-------------|----|-------|:-------------| 
|
|t_date| Tide Date | int | UID of the tide for a given day. Strictly an alternate key, but no need for surrogates here. Not "date" due to keyword conflict. This is the date, but does not include the month and year. That is added during the output to csv stage |
|tidal_range|Tidal Range | float | The distance between the minimum low water mark, and the maximum high water mark, for that day |





<hr>
<h3>TideMark</h3>
<h4>Definition</h4> An instance of a high or low water mark
<h4>Key</h4> 1. t_date 2. t_seq 
<h4>Cardinalities</h4> M->1 with parent TideForDay
<h4>Data attributes</h4>
<p></p>

|Name|aka|Definition| 
|:-------------|:----|:-------------| 
|
|t_date| Date | foreign key onto the parent TideForDay |
|t_seq| Sequence in Day | Where this tide mark falls in the sequence of tides after midnight. Eg - first tide after midnight has sequence 1|
|t_type| Type | Values are "High" or "Low", referring to whether this is a high or low water mark |
|t_height| Height | the height of the high or low water mark. float to 1 decimal point. The incoming value may be negative, which is to be resolved to zero  |
|t_time| Time | the time of the high or low water mark |






In [1]:
# pip install regex
import re
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TideMark:
    t_date: int
    t_seq: int
    t_type: str # High or Low
    t_height: float
    t_time: datetime

@dataclass
class TideForDay:
    t_date: int
    tidal_range: float # distance between min and max tide_marks


tide_marks = []
tides = []

with open("./hydro_2024.09.01.txt","r") as infile:
    lines = infile.readlines()



In [2]:
# Data example:
# Sun 04 AugNew moon on this day
# High Water of 3.7 metres, at 19:59.

# Remove any blank lines, then I only want lines that start with the values in ok_search_terms
# Where a value is Sun, Mon, etc, subsequent records refer to the low or high-water times for
# that day... until you hit a new day. Etc.
ok_search_terms = ("Low","High","Fri","Sat","Sun","Mon","Tue","Wed","Thur")
#remove non-printing chars - typically tab etc
cleaned_lines = [line.strip() for line in lines]
high_lows = [line for line in cleaned_lines if line.startswith(ok_search_terms)]
print(high_lows)

['Sun 01 Sep', 'Low Water of 0.8 metres, at 00:01.', 'High Water of 3.3 metres, at 06:42.', 'Low Water of 0.8 metres, at 12:15.', 'High Water of 3.6 metres, at 18:54.', 'Mon 02 Sep', 'Low Water of 0.6 metres, at 00:49.', 'High Water of 3.5 metres, at 07:25.', 'Low Water of 0.6 metres, at 13:00.', 'High Water of 3.8 metres, at 19:35.', 'Tues 03 SepNew moon on this day', 'Low Water of 0.5 metres, at 01:30.', 'High Water of 3.6 metres, at 08:01.', 'Low Water of 0.5 metres, at 13:40.', 'High Water of 3.9 metres, at 20:10.', 'Weds 04 Sep', 'Low Water of 0.5 metres, at 02:06.', 'High Water of 3.7 metres, at 08:31.', 'Low Water of 0.5 metres, at 14:14.', 'High Water of 3.9 metres, at 20:39.', 'Thurs 05 Sep', 'Low Water of 0.5 metres, at 02:37.', 'High Water of 3.7 metres, at 08:56.', 'Low Water of 0.5 metres, at 14:44.', 'High Water of 3.9 metres, at 21:03.', 'Fri 06 Sep', 'Low Water of 0.5 metres, at 03:04.', 'High Water of 3.7 metres, at 09:18.', 'Low Water of 0.5 metres, at 15:10.', 'High 

In [26]:
for line in high_lows:
    words = line.split()

    # If the sentence starts with Sun, Mon, etc, then subsequent records are
    # high/low water times. Keep words 1 (base 0)
    # (date, but not day). Month and year will be entered downstream
    if words[0] in ("Sun","Mon","Tues","Weds","Thurs","Fri","Sat"):
        curr_date = int(words[1])
        curr_seq = 0
        # new record required
        
        continue
    # After previous block, 7 or more words means a High or Low water sentence. 
    # Words 0 (base zero), 3 and 6 are respectively High/Low, tide-height, and 
    # time. Other words and characters are discarded
    if len(words) >= 7:
       
        curr_type = words[0]
        curr_time = words[6].rstrip('.')
        curr_height = words[3]
        # dataclass populates here:
        tide_mark = TideMark(curr_date, curr_seq, curr_type, curr_height, curr_time)
        tide_marks.append(tide_mark)
        #print(tide_mark)
        curr_seq += 1
        continue     


for tide in tide_marks:
    print(tide)

#print(tides)

NameError: name 'curr_date' is not defined

In [14]:
def pp(object_to_print):
    pprint.pprint(object_to_print)

In [15]:
import pprint
# Initialize a dictionary to store the highest and lowest heights for each date
heights_by_date = {}

# Get a sorted list of the unique dates for this set of tides. 
# Example: [4, 5, 6, 7]
unique_dates = list(set(int(mark.t_date) for mark in tide_marks))
unique_dates.sort()
pp(unique_dates)
water_marks = {}
for d in unique_dates:
    water_marks[d] = {'Low':100,'High':-100}

pp(tide_marks)

tide_heights = [(int(mark.t_date), mark.t_seq, mark.t_type, float(mark.t_height), mark.t_time) for mark in tide_marks]

pp(tide_heights)

for test_tide in tide_heights:
    test_date = test_tide[0]
    test_type = test_tide[2]
    new_height = test_tide[3]
    
    #print(water_marks[test_date])
    #print(water_marks[test_date][test_type])
    if  test_type == 'Low':
        if  new_height < water_marks[test_date][test_type]:
            water_marks[test_date][test_type] = new_height
    else: # High
        if  new_height > water_marks[test_date][test_type]:
            water_marks[test_date][test_type] = new_height

pprint.pprint(water_marks)
    
pprint.pprint(water_marks)

[]
[]
[]
{}
{}


In [16]:
# get the tidal range
for i in water_marks:
    t_high = water_marks[i]['High']
    t_low = water_marks[i]['Low']
    water_marks[i]['TidalRange'] = round(t_high - t_low,2)
pprint.pprint(water_marks)

{}


In [17]:
# populate the tide_day class from the dictionary
tidal_ranges_by_date = [TideForDay(t_date=key, tidal_range=value['TidalRange']) for key, value in water_marks.items()]
pp(tidal_ranges_by_date)
pp(tide_marks)


[]
[]


In [18]:
# Given a date, add the month and year to which this applies
def add_month_year(t_date):
    month = "08"
    year = "2024"
    return f"{t_date}/{month}/{year}"
    
    

In [20]:
# Now that the objects are populated, we can loop over these to
# create the csv for importing into a spreadsheet
# I want 2 tides per day, the "usable" tides, when I have a chance of
# getting down to the sea. There are 3 or 4 water marks in a calendar
# day. Take the first 2 that occur after say 0600.
# Drive the loop from the parent class

tides_as_csv = ""
x = ""
for i in tidal_ranges_by_date:
    
    #print(i)
    x += f"{add_month_year(i.t_date)},{i.tidal_range}"
    m = [mark for mark in tide_marks if mark.t_date == i.t_date and mark.t_time > "06:00"]
    sorted_marks = sorted(m, key=lambda x: x.t_seq, reverse=False)
    tide_count = 0
    for sorted_mark in sorted_marks:
        if tide_count < 2:
            x += f",{sorted_mark.t_type},{sorted_mark.t_time},{sorted_mark.t_height}"
        tide_count += 1
    x += "\n"
    
    # print("m------------")
    # print(m)
    # print("------------")
    # print(sorted_marks)
    
print(x)    
# 02,High,2.5,05:53,3.1,Low,11:31,0.9,High,18:18,3.4,
# 03,Low,2.8,00:15,0.8,High,06:56,3.3,Low,12:32,0.8,High,19:13,3.6,
# 04,Low,3.0,01:08,0.7,High,07:46,3.4,Low,13:21,0.7,High,19:59,3.7,
# 05,Low,3.2,01:53,0.6,High,08:28,3.5,Low,14:03,0.6,High,20:37,3.8,
# 06,Low,3.2,02:33,0.6,High,09:01,3.5,Low,14:39,0.6,High,21:08,3.8,
# 07,Low,3.1,03:06,0.6,High,09:28,3.5,Low,15:10,0.6,High,21:33,3.7,
# 08,Low,2.9,03:33,0.6,High,09:49,3.5,Low,15:36,0.6,


