## Tide data scraper

This notebook scrapes https://marine.meteoconsult.fr/meteo-marine/horaires-des-marees/pointe-d-agon-944/juillet-2022
to retrieve the tide dates, coefficients and times.


In [118]:
"""Do the necessary imports"""
from bs4 import BeautifulSoup
import requests
import pandas as pd
import os

In [143]:
"""Scrape the page a first time to find all the months for which tide data is available, and build a list
that can be plugged into the url to load every page with scrapable data"""

url = "https://marine.meteoconsult.fr/meteo-marine/horaires-des-marees/pointe-d-agon-944/juillet-2022"

response=requests.get(url)

soup1 = BeautifulSoup(response.content, 'html.parser')

list_dates=[]
list_years=[]

for i in soup1.find_all(class_="month-container"):
    list_dates.append((i.find(class_="name").text))
    list_years.append((i.find(class_="year").text))
    
# Build the url suffixes (e.g. "juillet-2022") by joining each month name and year with a hyphen
month_years_final = [f"{m}-{n}" for m, n in zip(list_dates, list_years)]
    





The following code iterates over each period for which data is available on the website.
For each period, the corresponding web page is scraped to collect the date (two classes are queried because
the website distinguishes weekdays from weekend days), the times of the low and high tides, and
the coefficients of the first and second tides.
Once this data is collected, it is cleaned to remove the \n characters that come with the scraped text.
The times are also split so that each tide of the day gets its own column (4 tides per day in total).
Finally, all the data is grouped in a dataframe, which is then saved as a csv (one dataframe per month).
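
The cleaning and splitting steps described above can be sketched on plain lists, independently of the website. This is a minimal illustration only: the helper names `strip_newlines` and `split_alternating` are hypothetical, the sample values are made up, and the sketch assumes that values alternate strictly between the first and second tide of each day.

```python
def strip_newlines(values):
    """Remove the newline characters left over from the scraped text."""
    return [v.replace("\n", "") for v in values]


def split_alternating(values):
    """Split a flat list into the first and second item of each consecutive pair."""
    return values[0::2], values[1::2]


# Made-up sample values, for illustration only
sample_times = ["\n05:12", "\n17:40", "\n06:01", "\n18:25"]
first, second = split_alternating(strip_newlines(sample_times))
print(first)   # ['05:12', '06:01']
print(second)  # ['17:40', '18:25']
```

Slicing with a step of 2 gives the same result as looping with `enumerate` and testing `index % 2`, in a single expression per column.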


In [229]:
"""Loop on each month and extract the information"""

url = "https://marine.meteoconsult.fr/meteo-marine/horaires-des-marees/pointe-d-agon-944"


for month in month_years_final:
    
    days=[]
    high_tide_time=[]
    low_tide_time=[]
    first_tide_coeff=[]
    second_tide_coeff=[]
      
    req=requests.get(url + f"/{month}")
    soup = BeautifulSoup(req.content, 'html.parser')
    
    for i in soup.find_all(class_= ["tide-date week", "tide-date weekEnd"]):
        days.append(i.text)

    for i in soup.find_all(class_="high-tide"):
        high_tide_time.append(i.find(class_="hour").text)

    for i in soup.find_all(class_="low-tide"):
        low_tide_time.append(i.find(class_="hour").text)

    for index,value in enumerate(soup.find_all(class_=["coef tide-coef-level-1","coef tide-coef-level-2","coef tide-coef-level-3","coef tide-coef-level-4","coef tide-coef-level-5"])):
        if (index%2) == 0:
            first_tide_coeff.append(value.text)
        else:
            second_tide_coeff.append(value.text)
            
    """Do the necessary reshapes"""
    
    days_formated=[]
    for i in days:
        i=i.replace("\n","")
        days_formated.append(i)

    first_tide_coeff_formated=[]
    for i in first_tide_coeff:
        i=i.replace("\n","")
        first_tide_coeff_formated.append(i)

    second_tide_coeff_formated=[]
    for i in second_tide_coeff:
        i=i.replace("\n","")
        second_tide_coeff_formated.append(i)
        
    high_tide_time_1=[]
    high_tide_time_2=[]
    low_tide_time_1=[]
    low_tide_time_2=[]

    for index, value in enumerate(high_tide_time):
        if index%2==0:
            high_tide_time_1.append(value)
        else:
            high_tide_time_2.append(value)
            
    for index, value in enumerate(low_tide_time):
        if index%2==0:
            low_tide_time_1.append(value)
        else:
            low_tide_time_2.append(value)
            
    """Zip the different lists in a unique data set"""
            
    result = zip(days_formated,first_tide_coeff_formated,second_tide_coeff_formated,high_tide_time_1,high_tide_time_2,low_tide_time_1,low_tide_time_2)
    tide_calendar = pd.DataFrame(list(result), columns = ['Date', "Coeff1",'Coeff2',"first_high_tide","second_high_tide","first_low_tide","second_low_tide"])
     
    
    """For each loop save the data set of the corresponding month"""        
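    output_dir = os.path.join(os.getcwd(), "results_tides")
    os.makedirs(output_dir, exist_ok=True)  # to_csv does not create missing directories

    tide_calendar.to_csv(os.path.join(output_dir, f"{month}.csv"))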
        
    
    
    

    

Unfortunately, some of the dataframes will be incorrect because some days have fewer than 4 tides. Since the values are
collected in flat lists and paired by position, a missing tide creates a gap that misaligns the values that follow it.
This happens very infrequently and can therefore be corrected by hand.
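
One possible way to find the months that need a manual correction is to compare the lengths of the scraped lists before building the dataframe. The sketch below is only an illustration: `month_is_complete` is a hypothetical helper, the sample values are made up, and it assumes that a normal day shows two high tides, two low tides and two coefficients.

```python
def month_is_complete(days, high_tide_time, low_tide_time, coefficients):
    """Return True when every list has exactly two entries per scraped day.

    Assumption: a normal day shows two high tides, two low tides and two
    coefficients, which is what the scraping loop relies on.
    """
    expected = 2 * len(days)
    return all(
        len(values) == expected
        for values in (high_tide_time, low_tide_time, coefficients)
    )


# Made-up values: the second day is missing one low tide
days = ["day 1", "day 2"]
high = ["05:12", "17:40", "06:01", "18:25"]
low = ["11:30", "23:55", "12:20"]
coeffs = ["78", "81", "84", "86"]

if not month_is_complete(days, high, low, coeffs):
    print("This month needs a manual check")
```

A check like this matters because `zip` stops at the shortest list without raising an error, so a month with a missing tide would otherwise be saved with no warning.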