# Gridium Interview
This coding exercise is intended to help us get to know each other through code. You get to see a small example of the types of problems our new backend engineer will solve, and we get to see if your approach to coding is a fit for us. We expect this should take no more than 1-2 hours; if it takes much longer, stop and send what you have.

Write a simple Python web scraper to help us visit the tide pools.

Go to https://www.tide-forecast.com/ to get tide forecasts for these locations:

- Half Moon Bay, California
- Huntington Beach, California
- Providence, Rhode Island
- Wrightsville Beach, North Carolina

Load the tide forecast page for each location and extract information on low tides that occur after sunrise and before sunset. Return the time and height for each daylight low tide.

In your response, be sure to include a URL where we can see the code and a description of how to run it (including installing dependencies, if needed).

## Setup & Instructions
1. I used a Jupyter Notebook because it is the easiest format to annotate and share. If needed, it shouldn't be too hard to convert it into a script that writes the results to a text or JSON file.
2. The easiest way to install the right dependencies is with the accompanying `environment.yml`. If you have conda installed, `conda env create -f environment.yml` followed by `conda activate gridium_interview` should be all you need to run the notebook.
3. The requested information is printed at the end of the notebook. Let me know if you'd prefer a different format.
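Step 1 mentions turning the notebook into a script that writes JSON. A minimal sketch of the serialization step follows. It assumes each location's results have already been converted to plain dicts, for example with pandas' `DataFrame.to_dict(orient="records")`. `tidesToJson` is a hypothetical helper and is not part of the notebook.

```python
import datetime
import json


def tidesToJson(tides_by_location: dict[str, list[dict]]) -> str:
    """Serialize {location: [tide records]} to a JSON string.

    Hypothetical helper: each record is assumed to look like
    {"date": str, "date_time": datetime.time, "Height": str}.
    """
    def encode(value):
        # json cannot serialize datetime.time directly, so emit "HH:MM:SS"
        if isinstance(value, datetime.time):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    return json.dumps(tides_by_location, default=encode, indent=2)


# One row taken from the Half Moon Bay output at the end of the notebook
example = {
    "Half Moon Bay, California": [
        {"date": "Tue 31 January", "date_time": datetime.time(13, 48), "Height": "0.03 ft (0.01 m)"},
    ],
}
print(tidesToJson(example))
```

The `default=` hook is called only for objects `json` cannot handle natively. This keeps the time columns readable without first converting the whole frame to strings.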


In [1]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support.ui import Select

import pandas as pd

import datetime
import time

In [2]:
options = Options()
options.add_argument("--headless")  # replaces options.headless = True, which newer Selenium 4 releases deprecate

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=options)

In [3]:
LOCATIONS = [('Half Moon Bay', 'California'), ('Huntington Beach', 'California'), ('Providence', 'Rhode Island'), ('Wrightsville Beach', 'North Carolina')]


## Navigating to Location Page and returning HTML

These functions take the Selenium driver and a city-state pair, and use the dropdown menus at the top of the site to navigate to the page for that location. `getLocationPageHTML` returns the full HTML of the location page. That is perhaps overkill, but for the sake of getting this out the door I figured it wouldn't add much overhead. I had hoped to avoid Selenium, which can be a hassle to install and is slow, but it is comparatively easy to use.


In [4]:
def getPageHTML(driver: webdriver.chrome.webdriver.WebDriver) -> str:
    root_element = driver.find_element(By.XPATH, "//*")
    source_code = root_element.get_attribute("outerHTML")

    return source_code

def navigateToLocationPage(driver: webdriver.chrome.webdriver.WebDriver, city: str, state: str) -> None:
    state_dropdown = 'region_id'
    city_dropdown = 'location_filename_part'
    tide_forecast_url = "https://www.tide-forecast.com/"

    driver.get(tide_forecast_url)

    state_select = Select(driver.find_element(By.ID, state_dropdown))
    state_select.select_by_visible_text(state)

    time.sleep(2) # Allow Cities Dropdown to reload
    
    city_select = Select(driver.find_element(By.ID, city_dropdown))
    city_select.select_by_visible_text(city)
      
def getLocationPageHTML(driver: webdriver.chrome.webdriver.WebDriver, city: str, state: str) -> str: 
    navigateToLocationPage(driver, city, state)

    return getPageHTML(driver)
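The `time.sleep(2)` in `navigateToLocationPage` always waits the full two seconds and can still be too short on a slow connection. Selenium's `WebDriverWait(driver, timeout).until(condition)` is imported above but unused. Instead of sleeping for a fixed time, it polls a condition until the condition returns a truthy value or the timeout expires. The standard-library sketch below only illustrates that polling idea. `waitUntil` and `cityOptionLoaded` are hypothetical names, and this is not Selenium's implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def waitUntil(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call `condition` every `poll` seconds until it returns something truthy.

    Raises TimeoutError if `timeout` seconds pass first. Hypothetical helper
    that mirrors the idea behind Selenium's WebDriverWait.until.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Toy condition standing in for "the city dropdown now lists our city"; true on the third call
calls = {"n": 0}

def cityOptionLoaded() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3

print(waitUntil(cityOptionLoaded, timeout=5, poll=0.01))
```

With Selenium itself, the callable passed to `WebDriverWait(...).until` receives the driver. A condition could therefore re-find the city dropdown and check whether `city` appears among its options' text.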



### Extracting Sunrise, Sunset and Tide Data
The Location Pages have a content block for each day, with two tables, the first for tides, and the second for sunrise, sunset, moonrise and moonset.

The `getSunriseSunset` function takes this sun/moon table, uses regular expressions to extract the sunrise and sunset times, and converts them to `datetime.time` objects for easy comparison later. The site uses a 12-hour clock but writes times in the hour after midnight as, for example, 00:10AM. Python's `%I` directive only accepts hours 01 to 12 (12:10AM), so a leading `00:` is rewritten to `12:` before parsing.
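The midnight fix-up can be seen in isolation with the standard library. `%I` in `strptime` rejects an hour of `00`, so a site value such as `00:10AM` must be rewritten before parsing. `parseSiteTime` is a hypothetical stand-in for what `getSunField` does with pandas. It assumes times are formatted like `7:15AM`.

```python
import datetime
import re


def parseSiteTime(text: str) -> datetime.time:
    """Parse a 12-hour time like '7:15AM' or '00:10AM' into a datetime.time."""
    # Rewrite a leading "00:" (how the site writes the midnight hour) to "12:", which %I expects
    normalized = re.sub(r"^00:", "12:", text.strip())
    return datetime.datetime.strptime(normalized, "%I:%M%p").time()


print(parseSiteTime("00:10AM"))  # 00:10:00
print(parseSiteTime("5:30PM"))   # 17:30:00
```

Anchoring the pattern with `^` ensures that only the hour field is touched, never the minutes.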

The formatTides function produces a cleaned dataframe with extracted dates and times for each tide. With more time I'd clean up some of these magic strings and numbers.
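The regular expressions used by `formatTides` below assume that each tide cell holds a time followed by the date in parentheses. That assumption is inferred from those patterns and has not been checked against the live site. Under it, a cell reads like `1:48 PM(Tue 31 January)`, and a single pattern with named groups can do the same split. `TIDE_CELL` and `splitTideCell` are hypothetical names.

```python
import datetime
import re

# Assumed cell shape: "<h:mm> <AM|PM>(<date text>)", tolerating optional spaces
TIDE_CELL = re.compile(
    r"^\s*(?P<hm>\d{1,2}:\d{2})\s*(?P<ampm>[AP]M)\s*\((?P<date>[^)]*)\)",
    re.IGNORECASE,
)


def splitTideCell(cell: str) -> tuple[str, datetime.time]:
    """Return (date text, time of day) for one tide cell."""
    match = TIDE_CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized tide cell: {cell!r}")
    # Same midnight fix-up as for sunrise/sunset: the site writes 12:xx AM as 00:xx AM
    hm = re.sub(r"^00:", "12:", match["hm"])
    clock = datetime.datetime.strptime(f"{hm} {match['ampm']}", "%I:%M %p").time()
    return match["date"], clock


print(splitTideCell("1:48 PM(Tue 31 January)"))
# ('Tue 31 January', datetime.time(13, 48))
```

Failing loudly on an unrecognized cell makes a change in the site's layout visible, rather than letting it surface later as `NaN` rows.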

In [5]:
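def getSunField(column: pd.Series, pattern: str) -> datetime.time:
    # .str.replace does substring replacement (Series.replace only matches whole values);
    # the site writes the midnight hour as "00:", but %I expects "12:"
    times = column.str.extract(pattern)[0].str.replace(r'^00:', '12:', regex=True)
    return pd.to_datetime(times, format='%I:%M%p').dt.time.iloc[0]
    
def getSunriseSunset(df: pd.DataFrame) -> tuple[datetime.time, datetime.time]:
    sunrise = getSunField(df[0], "Sunrise: (.*)")
    sunset = getSunField(df[1], "Sunset: (.*)")
    
    return sunrise, sunset

def formatTides(df: pd.DataFrame, sunrise: datetime.time, sunset: datetime.time) -> pd.DataFrame:
    df['sunrise'] = sunrise
    df['sunset'] = sunset
    
    # The second column holds the tide time followed by the date in parentheses
    df['date'] = df.iloc[:, 1].str.extract(r"\((.*?)\)")
    df['time'] = df.iloc[:, 1].str.replace(r'^00:', '12:', regex=True).str.extract(r"(.*)\(.*\)")
    df['date_time'] = pd.to_datetime(df['time'], format='%I:%M %p').dt.time
    
    # drop() returns a new frame rather than modifying df in place, so reassign it
    df = df.drop(columns='time')

    return df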



    

## Concatenating Tides and Sunrise/Sunset for all days for a location

These functions loop over the 30-day forecast and produce a single dataframe with the tide times and the sunrise/sunset times for each day.
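The loop in `getTides` below walks the tables returned by `pd.read_html` in steps of two. It pairs each day's tide table with the sun/moon table that follows it. Stripped of pandas, the indexing looks like the sketch below. `pairDayTables` is hypothetical, and the starting offset of 3 is taken from the notebook rather than verified against the page.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def pairDayTables(tables: Sequence[T], start: int = 3) -> list[tuple[T, T]]:
    """Pair tables[start] with tables[start+1], tables[start+2] with tables[start+3], and so on.

    Stops before a trailing table that has no partner, so tables[i + 1]
    is never out of range.
    """
    return [(tables[i], tables[i + 1]) for i in range(start, len(tables) - 1, 2)]


# Toy stand-ins: three leading tables, alternating tide/sun tables, then a stray table
tables = ["hdr0", "hdr1", "hdr2", "tides1", "sun1", "tides2", "sun2", "extra"]
print(pairDayTables(tables))
# [('tides1', 'sun1'), ('tides2', 'sun2')]
```

Ending the range at `len(tables) - 1` is what protects the `i + 1` lookup when the page contains an odd number of tables after the offset.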


In [6]:
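from io import StringIO

def extractTideDf(tides_df: pd.DataFrame, sun_cycle_df: pd.DataFrame) -> pd.DataFrame:
    sunrise, sunset = getSunriseSunset(sun_cycle_df)

    return formatTides(tides_df, sunrise, sunset)

def getTides(location_page: str) -> pd.DataFrame:
    # Newer pandas versions deprecate passing literal HTML to read_html, so wrap it in a file-like object
    dataframes = pd.read_html(StringIO(location_page))

    tide_dfs = []

    # Daily tables start at index 3 and alternate: tides, then sun/moon.
    # Stop one short of the end so dataframes[i + 1] always exists.
    for i in range(3, len(dataframes) - 1, 2):
        tides_df = extractTideDf(tides_df=dataframes[i], sun_cycle_df=dataframes[i + 1])

        tide_dfs.append(tides_df)
    
    return pd.concat(tide_dfs).reset_index()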
    

## Returning only Daylight Low Tides
Because all of the relevant tide and sunrise/sunset times were stored earlier as time fields, filtering the whole dataframe for a location is straightforward. This function could probably be abstracted to take the tide type and the time range as parameters.
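The parameterized version suggested above could be sketched as follows. It works on plain records instead of the notebook's DataFrame. `TideRecord`, `filterTides`, and the sunrise/sunset values in the example are hypothetical. As in `daylightLowTides`, the comparison is strict at both ends.

```python
import datetime
from dataclasses import dataclass
from typing import Callable


@dataclass
class TideRecord:
    date: str
    kind: str               # e.g. "Low Tide" or "High Tide", like the notebook's Tide column
    time: datetime.time
    height: str
    sunrise: datetime.time
    sunset: datetime.time


def filterTides(
    records: list[TideRecord],
    kind: str = "Low Tide",
    start: Callable[[TideRecord], datetime.time] = lambda r: r.sunrise,
    end: Callable[[TideRecord], datetime.time] = lambda r: r.sunset,
) -> list[TideRecord]:
    """Keep records of the given kind whose time falls strictly between start(r) and end(r)."""
    return [r for r in records if r.kind == kind and start(r) < r.time < end(r)]


sunrise, sunset = datetime.time(7, 20), datetime.time(17, 30)  # made-up values for illustration
records = [
    TideRecord("Day 1", "Low Tide", datetime.time(4, 10), "1.0 ft", sunrise, sunset),
    TideRecord("Day 1", "High Tide", datetime.time(10, 5), "5.0 ft", sunrise, sunset),
    TideRecord("Day 1", "Low Tide", datetime.time(13, 48), "0.1 ft", sunrise, sunset),
]
print([r.time for r in filterTides(records)])
```

Making the bounds functions of the record keeps the per-day sunrise/sunset behaviour as the default, while still allowing a fixed window such as `end=lambda r: datetime.time(12)` for morning-only tides.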

In [7]:
def daylightLowTides(tides_df: pd.DataFrame) -> pd.DataFrame:
    return tides_df.loc[
        (tides_df.Tide == 'Low Tide') & 
        (tides_df.date_time > tides_df.sunrise) & 
        (tides_df.date_time < tides_df.sunset)
    ][
        ['date', 'date_time', 'Height']
    ]

## Main Function

Takes a Selenium driver and a list of (city, state) tuples. It should work with any location covered by the tide forecast website. Future work would include handling cities the site doesn't cover, if nothing else through better error handling in the `getLocationPageHTML` function.
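One possible form for that error handling is to catch the failure for a single location, keep going, and report the unsupported locations at the end. In Selenium, `Select.select_by_visible_text` raises `NoSuchElementException` (from `selenium.common.exceptions`) when no option matches, so that is the exception the notebook would pass in. So that it runs without a browser, the sketch replaces the driver with an injected `fetch` callable. `collectByLocation`, `fetch`, and `fakeFetch` are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def collectByLocation(
    locations: list[tuple[str, str]],
    fetch: Callable[[str, str], T],
    expected_errors: tuple[type[BaseException], ...] = (LookupError,),
) -> tuple[dict[str, T], dict[str, str]]:
    """Run fetch(city, state) for each location; return (results, failures)."""
    results: dict[str, T] = {}
    failures: dict[str, str] = {}
    for city, state in locations:
        key = f"{city}, {state}"
        try:
            results[key] = fetch(city, state)
        except expected_errors as exc:
            # Record the problem and move on instead of aborting the whole run
            failures[key] = f"{type(exc).__name__}: {exc}"
    return results, failures


def fakeFetch(city: str, state: str) -> str:
    # Stand-in for getLocationPageHTML + getTides + daylightLowTides
    if city == "Atlantis":
        raise LookupError(f"no option {city!r} in {state!r}")
    return f"tides for {city}"


results, failures = collectByLocation(
    [("Providence", "Rhode Island"), ("Atlantis", "Nowhere")], fakeFetch
)
print(results)
print(failures)
```

In the notebook, `fetch` could wrap the three existing calls, with `expected_errors=(NoSuchElementException,)`. Unexpected exceptions still propagate, so genuine bugs are not silently swallowed.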

In [8]:
def daylightTidesByLocations(driver: webdriver.chrome.webdriver.WebDriver, locations: list[tuple[str, str]]) -> dict[str, pd.DataFrame]:
    daylight_tides = {}

    for city, state in locations:
        location_page_html = getLocationPageHTML(driver, city, state)

        tides = getTides(location_page_html)

        daylight_tides[', '.join([city, state])] = daylightLowTides(tides)
    
    return daylight_tides


## Execute this to get the results

It's quite slow because of Selenium and the two-second pause per location.

In [9]:
tides = daylightTidesByLocations(driver, LOCATIONS)

In [10]:
tides['Half Moon Bay, California']

|    | date | date_time | Height |
|----|------|-----------|--------|
| 2 | Tue 31 January | 13:48:00 | 0.03 ft (0.01 m) |
| 6 | Wed 01 February | 14:34:00 | -0.18 ft (-0.05 m) |
| 10 | Thu 02 February | 15:13:00 | -0.3 ft (-0.09 m) |
| 14 | Fri 03 February | 15:48:00 | -0.36 ft (-0.11 m) |
| 18 | Sat 04 February | 16:19:00 | -0.36 ft (-0.11 m) |
| 22 | Sun 05 February | 16:49:00 | -0.29 ft (-0.09 m) |
| 26 | Mon 06 February | 17:17:00 | -0.16 ft (-0.05 m) |
| 40 | Fri 10 February | 07:24:00 | 1.92 ft (0.59 m) |
| 44 | Sat 11 February | 08:19:00 | 1.61 ft (0.49 m) |
| 48 | Sun 12 February | 09:22:00 | 1.25 ft (0.38 m) |


In [11]:
tides['Huntington Beach, California']

|    | date | date_time | Height |
|----|------|-----------|--------|
| 1 | Tue 31 January | 13:00:00 | 1.33 ft (0.41 m) |
| 5 | Wed 01 February | 13:38:00 | 1.02 ft (0.31 m) |
| 9 | Thu 02 February | 14:10:00 | 0.81 ft (0.25 m) |
| 13 | Fri 03 February | 14:41:00 | 0.66 ft (0.2 m) |
| 17 | Sat 04 February | 15:10:00 | 0.61 ft (0.19 m) |
| 21 | Sun 05 February | 15:38:00 | 0.66 ft (0.2 m) |
| 25 | Mon 06 February | 16:04:00 | 0.82 ft (0.25 m) |
| 29 | Tue 07 February | 16:29:00 | 1.08 ft (0.33 m) |
| 33 | Wed 08 February | 16:52:00 | 1.44 ft (0.44 m) |
| 37 | Thu 09 February | 17:13:00 | 1.9 ft (0.58 m) |


In [12]:
tides['Providence, Rhode Island']

|    | date | date_time | Height |
|----|------|-----------|--------|
| 1 | Tue 31 January | 11:33:00 | 0.87 ft (0.27 m) |
| 5 | Wed 01 February | 12:23:00 | 0.82 ft (0.25 m) |
| 9 | Thu 02 February | 11:09:00 | 0.77 ft (0.23 m) |
| 11 | Thu 02 February | 13:04:00 | 0.76 ft (0.23 m) |
| 15 | Fri 03 February | 11:45:00 | 0.58 ft (0.18 m) |
| 19 | Sat 04 February | 12:25:00 | 0.37 ft (0.11 m) |
| 23 | Sun 05 February | 13:06:00 | 0.17 ft (0.05 m) |
| 27 | Mon 06 February | 13:45:00 | -0.02 ft (-0.01 m) |
| 31 | Tue 07 February | 14:23:00 | -0.16 ft (-0.05 m) |
| 35 | Wed 08 February | 15:00:00 | -0.24 ft (-0.07 m) |


In [13]:
tides['Wrightsville Beach, North Carolina']

|    | date | date_time | Height |
|----|------|-----------|--------|
| 1 | Tue 31 January | 09:58:00 | 0.18 ft (0.05 m) |
| 5 | Wed 01 February | 10:47:00 | 0.15 ft (0.05 m) |
| 9 | Thu 02 February | 11:33:00 | 0.08 ft (0.02 m) |
| 13 | Fri 03 February | 12:16:00 | -0.03 ft (-0.01 m) |
| 17 | Sat 04 February | 12:55:00 | -0.15 ft (-0.05 m) |
| 21 | Sun 05 February | 13:31:00 | -0.26 ft (-0.08 m) |
| 25 | Mon 06 February | 14:05:00 | -0.33 ft (-0.1 m) |
| 29 | Tue 07 February | 14:37:00 | -0.35 ft (-0.11 m) |
| 33 | Wed 08 February | 15:08:00 | -0.32 ft (-0.1 m) |
| 37 | Thu 09 February | 15:39:00 | -0.26 ft (-0.08 m) |
