## Scrape precipitation forecast values from _wunderground.com_

I need the precipitation forecast to make the predictions.

www.wunderground.com seems to have a security feature that blocks known spider/bot user agents (such as the default one sent by Python's ```urllib```). I tried it myself and couldn't get the page source. This makes sense, because they want you to pay for their API.

If you don't want to pay (like me), you have to make your requests look like they come from a known browser user agent (e.g. Chrome).

This is why I use **Selenium WebDriver**. WebDriver drives a browser natively, as a user would.
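For comparison, this is roughly what changing the user agent looks like with plain ```urllib```. It is only a sketch: the helper name `build_browser_request` and the User-Agent string are illustrative assumptions, and I have not verified that this alone gets past the site's checks. The class names in the XPath used further down suggest that the forecast table is rendered in the browser, so the raw HTML may not contain the values even if the request is accepted.

```python
from urllib.request import Request

# Assumption: an example desktop Chrome User-Agent string; any current browser string could be used.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/84.0.4147.89 Safari/537.36"
)


def build_browser_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a urllib Request that announces itself with a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_browser_request(
    "https://www.wunderground.com/hourly/us/ny/new-york-city/date/2020-7-25.html"
)

# urllib stores header names capitalized, so the key to look up is "User-agent".
print(request.get_header("User-agent"))

# Sending the request (not executed here) would be:
# urllib.request.urlopen(request, timeout=30).read()
```

The request is only built here, not sent, so the snippet runs without network access.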

REQUIREMENTS
- Install Selenium: ```!pip install selenium```
- Make sure the location of ```chromedriver.exe``` matches the one specified here:<br>
```driver = webdriver.Chrome(executable_path='./chromedriver.exe', options=options)```
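A quick way to check the second requirement before starting the browser is to test that the file exists. This is a minimal sketch; `resolve_chromedriver` is a hypothetical helper, not part of Selenium.

```python
from pathlib import Path


def resolve_chromedriver(path="./chromedriver.exe"):
    """Return the absolute driver path as a string, or raise if the file is missing."""
    driver_path = Path(path).resolve()
    if not driver_path.is_file():
        raise FileNotFoundError(f"chromedriver not found at {driver_path}")
    return str(driver_path)


# Usage (raises FileNotFoundError if the driver is not next to the notebook):
# driver_location = resolve_chromedriver('./chromedriver.exe')
```

Failing early with a clear message is easier to debug than the exception Selenium raises when it cannot start the driver.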

In [1]:
import numpy as np
import pandas as pd
from datetime import date, timedelta
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Use .format(YYYY, M, D)
lookup_URL = 'https://www.wunderground.com/hourly/us/ny/new-york-city/date/{}-{}-{}.html'

options = webdriver.ChromeOptions()
options.add_argument('--headless')  # run Chrome in the background

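# executable_path is the Selenium 3 style; recent Selenium 4 releases expect a Service object instead
driver = webdriver.Chrome(executable_path='./chromedriver.exe', options=options)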

# Forecast window: from tomorrow up to (but not including) end_date
start_date = date.today() + timedelta(days=1)
end_date = date.today() + timedelta(days=4)

df_prep = pd.DataFrame()

while start_date < end_date:
    print('gathering data from: ', start_date)
    list_prep = []
    formatted_lookup_URL = lookup_URL.format(start_date.year,
                                             start_date.month,
                                             start_date.day)
    
    driver.get(formatted_lookup_URL)
    rows = WebDriverWait(driver, 60).until(EC.visibility_of_all_elements_located((By.XPATH, '//td[@class="mat-cell cdk-cell cdk-column-liquidPrecipitation mat-column-liquidPrecipitation ng-star-inserted"]')))
    for row in rows:
        prep = row.find_element(By.XPATH, './/span[@class="wu-value wu-value-to"]').text
        list_prep.append(prep)
        
    df_prep[str(start_date.day)] = list_prep
    
    start_date += timedelta(days=1)

driver.quit()  # close the headless browser once all days are scraped
df_prep

gathering data from:  2020-07-25
gathering data from:  2020-07-26
gathering data from:  2020-07-27
gathering data from:  2020-07-28
gathering data from:  2020-07-29
gathering data from:  2020-07-30


Unnamed: 0,25,26,27,28,29,30
0,0,0,0,0.0,0,0
1,0,0,0,0.0,0,0
2,0,0,0,0.0,0,0
3,0,0,0,0.0,0,0
4,0,0,0,0.0,0,0
5,0,0,0,0.0,0,0
6,0,0,0,0.0,0,0
7,0,0,0,0.0,0,0
8,0,0,0,0.0,0,0
9,0,0,0,0.0,0,0
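
The values collected above are text taken from the page, so they need to be converted to numbers before they can feed a prediction. Below is a minimal sketch of that step in plain Python. The helpers `parse_precipitation` and `daily_totals` and the sample data are hypothetical; the input has the shape `df_prep.to_dict('list')` would produce.

```python
def parse_precipitation(text):
    """Convert one scraped cell (e.g. '0' or '0.0') to float; return None if it is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def daily_totals(columns):
    """Sum the hourly values of each day, skipping cells that could not be parsed."""
    totals = {}
    for day, cells in columns.items():
        values = [parse_precipitation(cell) for cell in cells]
        totals[day] = sum(value for value in values if value is not None)
    return totals


# Hypothetical sample: day of month -> hourly cell texts
sample = {"25": ["0", "0", "0.1"], "26": ["0.0", "", "0.2"]}
print(daily_totals(sample))
```

With pandas, `df_prep.apply(pd.to_numeric, errors='coerce')` performs a similar conversion on the whole DataFrame, turning unparseable cells into `NaN`.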
