This coding exercise is intended to help us get to know each other through code. You get to see a small example of the types of problems our new backend engineer will solve, and we get to see if your approach to coding is a fit for us. We expect this should take no more than 1-2 hours; if it takes much longer, stop and send what you have.

Write a simple Python web scraper to help us visit the tide pools.

Go to https://www.tide-forecast.com/ to get tide forecasts for these locations:

Half Moon Bay, California

Huntington Beach, California

Providence, Rhode Island

Wrightsville Beach, North Carolina

Load the tide forecast page for each location and extract information on low tides that occur after sunrise and before sunset. Return the time and height for each daylight low tide.

In your response, be sure to include a URL where we can see the code and a description of how to run it (including installing dependencies, if needed).

In [233]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support.ui import Select

import pandas as pd

import time

In [234]:
TIDE_FORECAST_URL = "https://www.tide-forecast.com/"
LOCATIONS = [('Half Moon Bay', 'California'), ('Huntington Beach', 'California'), ('Providence', 'Rhode Island'), ('Wrightsville Beach', 'North Carolina')]

Get Location Page:

Because the URLs for the location pages don't follow a consistent format, I use Selenium to navigate the site the way a user would: pick the state, then the city, from the select menus at the top of the home page. The resulting page HTML is then passed straight to pandas, whose read_html loads every table on the page into a list of DataFrames.

In [230]:

def getLocationPageHTML(driver, city, state):
    """Return the HTML of the tide forecast page for the given city and state."""
    # Element IDs of the state and city select menus on the home page.
    STATE_DROPDOWN = 'region_id'
    CITY_DROPDOWN = 'location_filename_part'

    driver.get(TIDE_FORECAST_URL)

    select = Select(driver.find_element(By.ID, STATE_DROPDOWN))
    select.select_by_visible_text(state)

    # Give the city menu time to update after the state is selected.
    time.sleep(2)

    select = Select(driver.find_element(By.ID, CITY_DROPDOWN))
    select.select_by_visible_text(city)

    # page_source returns the HTML of the page the browser is currently on.
    return driver.page_source



In [228]:
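def getSunriseSunset(df):
    # Column 0 holds "Sunrise: <time>" and column 1 holds "Sunset: <time>".
    # %I only accepts hours 01-12, so a "00:" hour is rewritten to "12:" first.
    # Series.replace matches whole values only; .str.replace rewrites substrings.
    sunrise = pd.to_datetime(df[0].str.extract("Sunrise: (.*)")[0].str.replace('00:', '12:', regex=False), format='%I:%M%p').dt.time[0]
    sunset = pd.to_datetime(df[1].str.extract("Sunset: (.*)")[0].str.replace('00:', '12:', regex=False), format='%I:%M%p').dt.time[0]

    return sunrise, sunset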

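def formatTides(df, sunrise, sunset):
    df['sunrise'] = sunrise
    df['sunset'] = sunset

    # Column 1 holds strings of the form "<time>(<date>)"; split them apart.
    # Raw strings keep the backslashes in the patterns from being treated as escapes.
    df['date'] = df.iloc[:, 1].str.extract(r"\((.*?)\)", expand=False)
    df['time'] = df.iloc[:, 1].str.replace('00:', '12:', regex=False).str.extract(r"(.*)\(.*\)", expand=False).str.strip()
    # Despite its name, date_time holds only the time of day.
    df['date_time'] = pd.to_datetime(df['time'], format='%I:%M %p').dt.time
    # drop returns a new frame, so the result has to be assigned back.
    df = df.drop(columns='time')

    return df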
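
The "00:" to "12:" rewrite matters because the %I directive only accepts hours 01 through 12, so a string such as "00:15 AM" would otherwise fail to parse. The following is a minimal standard-library sketch of the same normalisation on a single string. The helper name parse_12h and the sample strings are hypothetical and only serve as illustration.

```python
from datetime import datetime, time


def parse_12h(text: str) -> time:
    """Parse a 12-hour clock string such as '12:55 PM' into a datetime.time.

    Hypothetical helper: it mirrors the normalisation used in formatTides,
    but works on one string instead of a pandas column.
    """
    cleaned = text.strip()
    # %I rejects an hour of 00, so treat a leading "00:" as "12:".
    if cleaned.startswith("00:"):
        cleaned = "12:" + cleaned[3:]
    return datetime.strptime(cleaned, "%I:%M %p").time()


print(parse_12h("12:55 PM"))  # 12:55:00
print(parse_12h("00:15 AM"))  # 00:15:00
```

Checking only the start of the string is slightly stricter than a blanket substring replacement, which could in principle touch a "00:" elsewhere in the text.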



    

In [163]:
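from io import StringIO

def getTides(location_page):
    # Recent pandas versions expect a file-like object rather than a raw HTML string.
    dataframes = pd.read_html(StringIO(location_page))

    tide_dfs = []

    # This loop assumes that, from index 3 onward, the tables alternate between
    # a day's tide table and that day's sunrise/sunset table.
    # Stopping one short of the end keeps dataframes[i + 1] in range.
    for i in range(3, len(dataframes) - 1, 2):
        tides_df = dataframes[i]
        sun_cycle_df = dataframes[i + 1]

        sunrise, sunset = getSunriseSunset(sun_cycle_df)

        tides_df = formatTides(tides_df, sunrise, sunset)

        tide_dfs.append(tides_df)

    return pd.concat(tide_dfs).reset_index()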
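
The index arithmetic in getTides is easy to get wrong, so here is a small sketch of the pairing on its own. It uses placeholder strings in place of the DataFrames returned by pd.read_html. The function name pair_tables is hypothetical, and the starting index of 3 and the strict alternation are assumptions carried over from the loop above, not guarantees about the page layout.

```python
def pair_tables(tables, start=3):
    """Pair each tide table with the sun table that follows it.

    Assumes the tables alternate (tides, sun, tides, sun, ...) from `start`.
    """
    # Stop one short of the end so tables[i + 1] always exists.
    return [(tables[i], tables[i + 1]) for i in range(start, len(tables) - 1, 2)]


# Placeholder labels stand in for real DataFrames.
tables = ["nav", "summary", "header", "tides-1", "sun-1", "tides-2", "sun-2", "tides-3"]
print(pair_tables(tables))
# [('tides-1', 'sun-1'), ('tides-2', 'sun-2')]
```

The trailing "tides-3" entry has no sun table after it and is skipped rather than raising an IndexError.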
    

In [235]:
def daylightTides(tides_df):
    return tides_df.loc[
        (tides_df.Tide == 'Low Tide') & 
        (tides_df.date_time > tides_df.sunrise) & 
        (tides_df.date_time < tides_df.sunset)
    ][
        ['date', 'date_time', 'Height']
    ]
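
To make the filtering rule concrete, here is the same condition restated without pandas, which can be handy for a quick unit test. This is a sketch rather than part of the scraper: the Tide tuple, the daylight_lows function, and the sample values are all made up for illustration.

```python
from datetime import time
from typing import NamedTuple


class Tide(NamedTuple):
    kind: str    # "Low Tide" or "High Tide"
    at: time     # time of day of the tide
    height: str  # height text as shown on the page


def daylight_lows(tides, sunrise, sunset):
    """Keep low tides strictly between sunrise and sunset."""
    return [t for t in tides if t.kind == "Low Tide" and sunrise < t.at < sunset]


# Made-up sample values for one day, not scraped data.
sample = [
    Tide("Low Tide", time(5, 10), "1.0 ft"),
    Tide("High Tide", time(11, 30), "5.2 ft"),
    Tide("Low Tide", time(13, 5), "0.3 ft"),
    Tide("High Tide", time(23, 40), "4.8 ft"),
]
result = daylight_lows(sample, sunrise=time(7, 12), sunset=time(17, 30))
print(result)
```

Only the 13:05 low tide survives: the 05:10 low tide falls before sunrise. Comparing bare times of day is sound here because each tide is compared against the sunrise and sunset of its own day.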

In [238]:
def daylightTidesByLocation(driver, locations):
    daylight_tides = {}

    for city, state in locations:
        location_page_html = getLocationPageHTML(driver, city, state)

        tides = getTides(location_page_html)

        daylight_tides[(city, state)] = daylightTides(tides)
    
    return daylight_tides


In [239]:
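options = Options()
# Run Chrome without opening a window.
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

try:
    tides = daylightTidesByLocation(driver, LOCATIONS)
finally:
    # Close the browser even if scraping fails.
    driver.quit()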

In [240]:
tides

{('Half Moon Bay',
  'California'):                 date date_time              Height
 1     Mon 30 January  12:55:00     0.32 ft (0.1 m)
 5     Tue 31 January  13:48:00    0.03 ft (0.01 m)
 9    Wed 01 February  14:34:00  -0.18 ft (-0.05 m)
 13   Thu 02 February  15:13:00   -0.3 ft (-0.09 m)
 17   Fri 03 February  15:48:00  -0.36 ft (-0.11 m)
 21   Sat 04 February  16:19:00  -0.36 ft (-0.11 m)
 25   Sun 05 February  16:49:00  -0.29 ft (-0.09 m)
 29   Mon 06 February  17:17:00  -0.16 ft (-0.05 m)
 43   Fri 10 February  07:24:00    1.92 ft (0.59 m)
 47   Sat 11 February  08:19:00    1.61 ft (0.49 m)
 51   Sun 12 February  09:22:00    1.25 ft (0.38 m)
 55   Mon 13 February  10:32:00    0.82 ft (0.25 m)
 59   Tue 14 February  11:42:00     0.3 ft (0.09 m)
 63   Wed 15 February  12:47:00  -0.24 ft (-0.07 m)
 67   Thu 16 February  13:45:00  -0.73 ft (-0.22 m)
 71   Fri 17 February  14:36:00   -1.1 ft (-0.34 m)
 75   Sat 18 February  15:23:00  -1.29 ft (-0.39 m)
 79   Sun 19 February  16:07: