# Purple Air Web Reader

Written by Ross Cheung, South Coast AQMD.  If you have questions, you can contact me at: rcheung@aqmd.gov


   *Last Updated 04-16-2019* 


### Dependencies
For this to work you'll need to install the selenium library, which is not part of an Anaconda installation.  For more on installing selenium on your system, look at: 
http://selenium-python.readthedocs.io/installation.html


You will also need a web browser; I have tested this with Chrome for Windows and Mac, and both work fine.  The Windows version of Chrome required a separate driver executable (ChromeDriver) so that Selenium could control the browser, but this may no longer be necessary with more recent versions of Chrome.  If it is still required, the Selenium website has a continually updated link to web drivers. 

### How to download data

A few cells down, under the "input here" heading, change the variable listed as "keyword_list" to a list of strings of keywords (see examples).  The code will search for all station names that contain that substring (e.g. 'AQMD_NASA' will find stations 'AQMD_NASA 1', 'AQMD_NASA 2', etc.).   You can also set keyword_list to a list of many strings, in which case it'll download any data from any station that contains any of those keywords (see commented out examples for how data throughout the South Coast Basin was downloaded). 
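
To illustrate the matching rule, here is a minimal sketch in plain Python. The station names are made up and `matching_stations` is a hypothetical helper, not part of this notebook; the notebook itself does the matching inside the browser with an XPath `contains()` test, which is case-sensitive in the same way.

```python
def matching_stations(station_names, keyword_list):
    """Return the names that contain at least one keyword as a substring."""
    return [name for name in station_names
            if any(keyword in name for keyword in keyword_list)]

# Made-up station names, for illustration only.
stations = ["AQMD_NASA 1", "AQMD_NASA 2", "RUSD_Library", "Echo Park North"]

print(matching_stations(stations, ["AQMD_NASA"]))          # ['AQMD_NASA 1', 'AQMD_NASA 2']
print(matching_stations(stations, ["RUSD", "Echo Park"]))  # ['RUSD_Library', 'Echo Park North']
print(matching_stations(stations, ["aqmd"]))               # [] because matching is case-sensitive
```

One difference to keep in mind: the notebook loops over the keywords one at a time, so a station whose name matches two keywords gets its download button clicked twice.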

Then change the variables "start_date_text" and "end_date_text" to the first and last date, in string format "MM/DD/YY".  The code will fill in the text fields for "Start Date" and "End Date" respectively. 
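
Since a mistyped date only shows up once the browser is already running, it may help to check the two strings first. The sketch below is one way to do that with the standard library; `parse_mmddyy` is a hypothetical helper and is not used anywhere in this notebook.

```python
from datetime import datetime

def parse_mmddyy(text):
    """Parse 'MM/DD/YY' text into a date, raising ValueError if it does not fit."""
    return datetime.strptime(text, "%m/%d/%y").date()

start = parse_mmddyy("09/01/18")
end = parse_mmddyy("12/31/18")
if start > end:
    raise ValueError("start date is after end date")

print(start.isoformat(), end.isoformat())  # 2018-09-01 2018-12-31
```

A string in another layout, such as "2018-09-01", raises a `ValueError` instead of being typed into the page.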

Then, run the code. A "phantom" browser will open and actions will be carried out. You should see the website https://map.purpleair.org/sensorlist open, the dates being entered, and files for the corresponding stations and dates beginning to download. 

The files will be downloaded to whatever fi

### Note on Browser security settings, and other issues: 

Sometimes the browser freaks out because the code tries to download (depending on what you told it to do) thousands of files at once.  Chrome sometimes gives you a "do you want to let this code download these files" dialog box.  I recommend watching the operation for a bit, the first time you run it, to make sure it is working properly. 

There are other times when the code is run and nothing happens.  I'm not sure why this happens, and I don't know of any solution besides restarting your computer and trying again, but as of 4/19/19 I have tested it and it works. 

Do note that as PurpleAir updates their sensor list page, the code slows down as it has to search through more entries.  Downloading every station associated with the keyword "AQMD_NASA" takes at most a few hours as of the last time this code was run. 
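
For a rough sense of how long a run will take, the pauses in the code further down give a lower bound: 30 seconds after each download click and 3 seconds after each keyword. The sketch below just does that arithmetic. `estimate_runtime_seconds` is a hypothetical helper, the count of 240 stations is made up, and the time spent searching the page is not included.

```python
def estimate_runtime_seconds(buttons_per_keyword, click_pause=30, keyword_pause=3):
    """Lower bound on run time from the fixed pauses alone.

    buttons_per_keyword: one entry per keyword, giving how many stations match it.
    """
    clicks = sum(buttons_per_keyword)
    return clicks * click_pause + len(buttons_per_keyword) * keyword_pause

# Made-up example: one keyword that matches 240 stations.
seconds = estimate_runtime_seconds([240])
print(f"{seconds} s, about {seconds / 3600:.1f} h")  # 7203 s, about 2.0 h
```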



In [1]:

import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys



## Input here

In the next cell are some variables you can change/update to tell this script what to look for. 

keyword_list: add the keywords you want the script to look for to this list.  Example: if "AQMD" is a string in this list, the code will download data from any station that has "AQMD" in its name, like "AQMD_NASA_16", etc. 

start_date_text and end_date_text: the first and last dates you want to download data for, in North American format (MM/DD/YY), for example "12/06/17" for December 6th, 2017. 

In [2]:
keyword_list = ['AQMD_NASA']
#keyword_list = ['RUSD', 'Redlands', 'Yucaipa', 'Highland', 'LomaLinda', 'Mentone']

#,'PCH Calle', 'CCA', 'Colton', 'Hacienda Heights', 'loma linda', 'mentone', 'west Los Angeles',
               # 'CCA Balboa','Motor Parts of America', 'RUSD_', 'USCEHC','PCH Calle', 'Peters House', 'Porter Ranch',
                #'Redlands', 'RIVR_Co','BikeSGV','Echo Park', 'Venice','Santa Monica', 'SBSC','SCSB','UHills']

# Example: download all data from September 1 through December 31, 2018
start_date_text = '09/01/18'
end_date_text = '12/31/18'

# Ideally below this point you won't have to touch this code, other than perhaps updating the location of your chrome driver. 

In [3]:
# Note: As of 9/28, for some reason this only works in Chrome but not PhantomJS.
# You may need to manually watch the browser to see if it complains about many files
# being downloaded; a lot have protections against bots (like this one).

#driver = webdriver.Ie()
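# Passing the driver path as the first argument works with Selenium 3.
# Newer Selenium 4 releases no longer accept it; there, try webdriver.Chrome() instead.
driver = webdriver.Chrome('./chromedriver')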
#driver = webdriver.PhantomJS(executable_path='./phantomjs') 
driver.get("https://map.purpleair.org/sensorlist")
assert "PurpleAir" in driver.title
time.sleep(3)


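# find_element(By.ID, ...) works in Selenium 3 and 4; the older
# find_element_by_id shortcut was removed in later Selenium 4 releases.
from selenium.webdriver.common.by import By

startdate = driver.find_element(By.ID, "startdatepicker")
enddate = driver.find_element(By.ID, "enddatepicker")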
startdate.send_keys(start_date_text, Keys.ENTER)
enddate.send_keys(end_date_text, Keys.ENTER)

In [None]:
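from selenium.webdriver.common.by import By

def click_specific_keyword(keyword):
    """Click 'Download Primary' for every station whose name contains keyword."""
    # The XPath matches the keyword against column 2 of each row and picks the button in column 5.
    all_buttons = driver.find_elements(By.XPATH, "//tr[contains(td[2], '" + keyword + "')]/td[5]/button[text()='Download Primary']")
    for button in all_buttons:
        try:
            button.click()
        except Exception as exc:
            # Skip buttons that cannot be clicked, but report them instead of failing silently.
            print("Could not click a download button for", keyword, ":", exc)
        time.sleep(30)  # pause after each click so downloads do not all start at once


#all_buttons = driver.find_elements(By.XPATH, "//tr[contains(td[2], 'AQMD_')]/td[5]/button[text()='Download Secondary']")
#all_buttons = driver.find_elements(By.XPATH, "//tr[contains(td[2], 'RIVR_Co')]/td[5]/button[text()='Download Primary']")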
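
The XPath in `click_specific_keyword` is built by pasting the keyword between single quotes, so a keyword that itself contains an apostrophe would produce an invalid expression. The sketch below shows one possible way to build the same XPath with safer quoting. `xpath_literal` and `download_button_xpath` are hypothetical helpers, not part of this notebook, and "Peter's House" is a made-up keyword.

```python
def xpath_literal(text):
    """Quote text as an XPath 1.0 string literal (XPath has no escape character)."""
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote types are present: split on apostrophes and rejoin with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join("'" + part + "'" for part in parts) + ")"

def download_button_xpath(keyword, label="Download Primary"):
    """Build the row/button XPath used above for one keyword and button label."""
    return ("//tr[contains(td[2], " + xpath_literal(keyword) + ")]"
            "/td[5]/button[text()=" + xpath_literal(label) + "]")

print(download_button_xpath("AQMD_NASA"))
print(download_button_xpath("Peter's House", label="Download Secondary"))
```

For a plain keyword such as 'AQMD_NASA' the result is the same string the notebook already builds, so the helper could be dropped into the existing call unchanged.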



In [None]:
tic = time.time()

for keyword in keyword_list:
    click_specific_keyword(keyword)
    time.sleep(3)


In [None]:
elapsed = time.time() - tic  # seconds since the download loop started
print(elapsed)
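
Because a run can last hours, the raw number of seconds is awkward to read. A small sketch, assuming you would rather see hours, minutes and seconds; `format_elapsed` is a hypothetical helper and the input values are made up.

```python
from datetime import timedelta

def format_elapsed(seconds):
    """Format a duration in seconds as H:MM:SS, rounded to the nearest second."""
    return str(timedelta(seconds=round(seconds)))

print(format_elapsed(5415.7))  # 1:30:16
print(format_elapsed(59.2))    # 0:00:59
```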

In [None]:
driver.quit()  # close every browser window and end the driver session
