waiting for elements to be available: 
https://stackoverflow.com/questions/59130200/selenium-wait-until-element-is-present-visible-and-interactable

action chaining: 
https://www.selenium.dev/documentation/webdriver/actions_api/

waiting: 
https://www.selenium.dev/documentation/webdriver/waits/

select by visible text: 
https://stackoverflow.com/questions/7867537/how-to-select-a-drop-down-menu-value-with-selenium-using-python

Sample html file for 
- Return A
- County
- Allegheny
- Oct - 2024 - Oct 2025 

Some elements of the page are loaded and unloaded dynamically, such as the option lists for report types and counties. Clicking inside the box loads an HTML element listing the counties; clicking outside hides it. A Save As HTML of the page does not capture this list. 
Copying the HTML from the browser dev tools inspector while the list is open does capture the options. 

The SSRS Report Viewer loads inside an iframe, and the iframe HTML is not saved when you right-click the page and save the HTML. You have to inspect the page manually with browser dev tools, locate the iframe, and copy out its innerHTML. (I did this and added it to the sample HTML file to simulate what Selenium sees during a session.)
- working with iframes: https://www.selenium.dev/documentation/webdriver/interactions/frames/ 
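
Because element lookups are scoped to whichever frame the driver is currently in, it helps to guarantee a switch back to the top-level page once the work inside the report iframe is done. A minimal sketch of that pattern follows; `inside_frame` is a hypothetical helper name, and it relies only on the `driver.switch_to.frame(...)` and `driver.switch_to.default_content()` calls described in the frames documentation linked above.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then always return to the top-level page.

    `frame` can be whatever driver.switch_to.frame accepts
    (a WebElement, a name/id string, or an index).
    """
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # runs even if a wait inside the block times out
        driver.switch_to.default_content()


# Hypothetical usage with a live Selenium session:
# with inside_frame(driver, iframe_element):
#     driver.find_element(By.XPATH, '//table[@title="Export drop down menu"]')
```

The `finally` clause is the point of the design: a timeout raised inside the block still leaves the driver pointed at the main document for the next lookup.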



In [None]:
### Imports & static constants setup - this section shouldn't need to change much at all...
import os 
from datetime import datetime as dt 
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select

### Set some static lists to work with when choosing parameters 
### for reference / future - parameter options: 
report_by_options = ['County','Agency','Tag']
report_type_options = [
    'Law Enforcement Officers Killed or Assaulted',
    'Return A',
    'Supplemental Homicide',
    'Return of Arson Offenses',
    'Supplement Return A',
    'Human Trafficking',
    'ASRE Adult Arrests',
    'ASRE Adult Victims',
    'ASRE Juvenile Arrests',
    'ASRE Juvenile Victims',
    'Person Charged',
]
county_options = [
    '001 - Adams County',
    '002 - Allegheny County',
    '003 - Armstrong County',
    '004 - Beaver County',
    '005 - Bedford County',
    '006 - Berks County',
    '007 - Blair County',
    '008 - Bradford County',
    '009 - Bucks County',
    '010 - Butler County',
    '011 - Cambria County',
    '012 - Cameron County',
    '013 - Carbon County',
    '014 - Centre County',
    '015 - Chester County',
    '016 - Clarion County',
    '017 - Clearfield County',
    '018 - Clinton County',
    '019 - Columbia County',
    '020 - Crawford County',
    '021 - Cumberland County',
    '022 - Dauphin County',
    '023 - Delaware County',
    '024 - Elk County',
    '025 - Erie County',
    '026 - Fayette County',
    '027 - Forest County',
    '028 - Franklin County',
    '029 - Fulton County',
    '030 - Greene County',
    '031 - Huntingdon County',
    '032 - Indiana County',
    '033 - Jefferson County',
    '034 - Juniata County',
    '035 - Lackawanna County',
    '036 - Lancaster County',
    '037 - Lawrence County',
    '038 - Lebanon County',
    '039 - Lehigh County',
    '040 - Luzerne County',
    '041 - Lycoming County',
    '042 - McKean County',
    '043 - Mercer County',
    '044 - Mifflin County',
    '045 - Monroe County',
    '046 - Montgomery County',
    '047 - Montour County',
    '048 - Northampton County',
    '049 - Northumberland County',
    '050 - Perry County',
    '051 - Pike County',
    '052 - Potter County',
    '053 - Schuylkill County',
    '054 - Snyder County',
    '055 - Somerset County',
    '056 - Sullivan County',
    '057 - Susquehanna County',
    '058 - Tioga County',
    '059 - Union County',
    '060 - Venango County',
    '061 - Warren County',
    '062 - Washington County',
    '063 - Wayne County',
    '064 - Westmoreland County',
    '065 - Wyoming County',
    '066 - York County',
    '067 - Philadelphia County',
]

### keys in a dict literal must be unique: a repeated key silently replaces the earlier entry
report_links = {
    "home_root" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/",
    "home" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Home/Index",
    "faq" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/FAQ/SearchResults?searchTerm=",

    "crime_in_pennsylvania" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/CrimePublication/CrimePublicationReports",
    "reports" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/ReportsIndex/List",
    "pccd_data_extracts" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/ExtractFile/ListPCCDExtractFiles",
    "crime_map" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/CrimeAnalytics/index.html",
    "ad-hoc_queries" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/AdvancedSearch/AdvancedSearch",
    "arrest_distribution_breakdown_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/ArrestDrillDown",
    "arrest_distribution_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/ArrestDistribution",
    "arrest_trends_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/ArrestTrends",
    "group_a_offense_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/GroupACrimeReport",
    "hate_crime_incidents_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/HateCrimeByORIReport",
    "leoka_incidents_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/LEOKAByORIReport",
    "murder_incidents_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/MurderCrimeByORIReport",
    "offense_and_arrest_summary_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/AgencySummaryReport",
    "offense_distribution_breakdown_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/DrillDownReports",
    "offense_distribution_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/CrimeDistributionReport",
    "offense_trends_comparison_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/CrimesIndex",
    "offense_trends_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/CrimeTrends",
    "annual_srs_summary_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/SRSReport/AnnualSRSSummary",
    # SRSReport pages whose names collide with a /Report/ key above carry an srs_ prefix
    "srs_arrest_distribution_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/SRSReport/CrimeDistribution",
    "hate_crime_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/HateCrime",
    "offense_density_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/SRSReport/CrimeDensity",
    "offense_trends_comparison" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/SRSReport/CrimesIndex",
    "srs_offense_trends_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/SRSReport/CrimeTrends",
    "ytd_comparison_report" : "https://www.ucr.pa.gov/PAUCRSPUBLIC/Report/UCRSummary",
}
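
Python keeps only the last value when a key is repeated in a dict literal, so a long hand-maintained mapping like `report_links` can lose entries without any error. One way to catch that, sketched here under the assumption that the links are written as a list of `(key, url)` pairs instead of a literal, is a small builder that refuses repeated keys. `dict_without_duplicates` and `sample_links` are hypothetical names used only for illustration.

```python
from collections import Counter


def dict_without_duplicates(pairs):
    """Build a dict from (key, value) pairs, raising if any key repeats."""
    pairs = list(pairs)
    counts = Counter(key for key, _ in pairs)
    repeated = sorted(key for key, n in counts.items() if n > 1)
    if repeated:
        raise ValueError(f"duplicate keys: {repeated}")
    return dict(pairs)


# two entries taken from report_links, written as pairs
sample_links = dict_without_duplicates([
    ("home", "https://www.ucr.pa.gov/PAUCRSPUBLIC/Home/Index"),
    ("reports", "https://www.ucr.pa.gov/PAUCRSPUBLIC/ReportsIndex/List"),
])
print(len(sample_links))
```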



In [4]:
########## SETUP FOR THIS RUN 
### Set download directory for exported report files: 
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
file_download_dir = r'/Users/danwelsh/Projects/CrimeData/PSP_UCR/SRSSummaryReport/MonthlyData'
chrome_options.add_experimental_option('prefs', {'download.default_directory': file_download_dir})

### set the actual parameters you want to use for this loop
report_url = report_links['annual_srs_summary_report']

report_type_selection = 'Return A'
report_by_selection = 'County'

### Time parameters
### years to cover; both period lists below are built from this list
years = ['2025'] # ['2023','2024','2025']
### build list of Jan - Dec parameter tuples for years in years list 
calendar_year_parameters = [(f'Jan - {year}', f'Dec - {year}') for year in years]
###### ::: 2008 - 2024 captured already... 

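### note: 'Nov' is absent from this list, so each year yields 11 monthly periods; add it to cover November
months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Dec']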
# build parameter option format as list of month - year combinations 
month_by_month_parameters = [(f'{month} - {year}', f'{month} - {year}') for year in years for month in months]

### define which time period type you want to use (the last assignment wins, so keep only one active)
# time_pd_parameters = calendar_year_parameters
time_pd_parameters = month_by_month_parameters

### subset counties if needed
counties = county_options

### build a list of dictionaries to use as parameters for the main loop
### srs summary - combination of counties and time periods 
parameters = [ 
    {'county': county, 'start': period[0], 'end': period[1]} for period in time_pd_parameters for county in counties
]

### Parameter override for re-running failures from a previous run 
### (county / time period dictionaries are logged if they fail; reload them from the json file if it exists)
import json 
if os.path.exists('LastRunFailures.json'):
    with open('LastRunFailures.json', 'r') as f:
        prev_failure_parameters = json.load(f)
    for p in prev_failure_parameters:
        print(p)
    ### if the file holds failures, retry only those; otherwise keep the parameters built above
    if len(prev_failure_parameters) > 0:
        parameters = prev_failure_parameters
    
print(dt.now().isoformat())
print(len(parameters), parameters)


2025-10-18T11:01:50.162046
737 [{'county': '001 - Adams County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '002 - Allegheny County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '003 - Armstrong County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '004 - Beaver County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '005 - Bedford County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '006 - Berks County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '007 - Blair County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '008 - Bradford County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '009 - Bucks County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '010 - Butler County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '011 - Cambria County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '012 - Cameron County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}, {'county': '0
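
The count printed above is the product of the two lists: 67 counties times 11 monthly periods gives 737 parameter dictionaries. The same combination can be sketched with `itertools.product`, which makes the nesting order explicit. `build_parameters` and the two-item sample lists are hypothetical names and data for illustration; the dictionary keys match the ones used in this notebook.

```python
from itertools import product


def build_parameters(counties, periods):
    """One dict per (period, county) pair; periods vary slowest, counties fastest."""
    return [
        {'county': county, 'start': start, 'end': end}
        for (start, end), county in product(periods, counties)
    ]


sample_counties = ['001 - Adams County', '002 - Allegheny County']
sample_periods = [('Jan - 2025', 'Jan - 2025'), ('Feb - 2025', 'Feb - 2025')]
sample_parameters = build_parameters(sample_counties, sample_periods)
print(len(sample_parameters), sample_parameters[0])
```

With this ordering every county is run for one period before the next period starts, which is the order shown in the run output.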

In [None]:
### function definition - this gets executed once per loop for each parameter option
def export_iframe_report_to_excel(iframe_element):
    '''Reusable across multiple report types.
    Once the parameters have been selected and the report has been generated on the page,
    find the iframe element in which the report is loaded and pass it here.
    This waits for the report to load fully, then triggers an Excel export.
    '''
    ### switch context to the iframe to find elements within it (switch_to.frame accepts the element itself)
    driver.switch_to.frame(iframe_element)
    if print_debug: print('waiting for report/export...', dt.now().isoformat())
    ### once the iframe is loaded we can wait for elements within it
    # wait for the report viewer to complete the report and show the export button 
    WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH,'//table[@title="Export drop down menu"]')))
    # wait for the "© STATE OF Pennsylvania" text at the bottom of the report; it indicates loading is done, so exporting is safe
    WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, '//span[text()="© STATE OF Pennsylvania"]')))
    if print_debug: print('end of report visibility', dt.now())
    ### EXPORT: this is the onclick event for the Excel export in the drop down; just execute it directly (after all loading is done)
    driver.execute_script("$find('rvSiteMapping').exportReport('EXCELOPENXML');")
    ### return to the top-level document so later lookups are not scoped to the iframe
    driver.switch_to.default_content()

def get_report_data(parameter):
    '''This is specific to the Annual SRS Summary report parameters.
    '''
    # extract individual values from the parameter dict
    county = parameter['county']
    start_pd = parameter['start']
    end_pd = parameter['end']
    # prep log record with start time 
    county_run = {
        'Start':start_pd,
        'County':county,
        'StartTime':dt.now().isoformat()
    }
    if print_debug: print('\n', start_pd, end_pd, county, dt.now(), end='...')
    
    # load page: 
    driver.get(report_url)

    if print_debug: print('getting elements...', dt.now().isoformat())
    ### find all of the relevant parameter objects on page (not all used, but at least validates they're all present as a test)
    start_month_year = driver.find_element(By.ID,'StartMonthYear') # date / text = set values from list 
    end_month_year = driver.find_element(By.ID, 'EndMonthYear') # date / text = set values from list 
    report_by = Select(driver.find_element(By.ID, 'ReportBy')) # option change "selected" value # SET BEFORE AGENCY
    chosen_choices = driver.find_element(By.CLASS_NAME, 'chosen-choices') # need to add one chosen choice (county) per loop 
    generate_report = driver.find_element(By.XPATH,'//*[@title="Generate Report"]')
    report_type = Select(driver.find_element(By.ID, 'SRSSummaryReportType'))
    county_input = driver.find_element(By.CLASS_NAME, 'chosen-choices')
    county_drop_down = driver.find_element(By.CLASS_NAME, 'chosen-results')

    if print_debug: print('setting report by and report type...', dt.now().isoformat())
    ### set report by and report type once: 
    sleep(3) # make sure page is loaded BEFORE changing report type to county...
    report_type.select_by_visible_text(report_type_selection)
    report_by.select_by_visible_text(report_by_selection) 
    sleep(2) # let county options load after changing report by value

    if print_debug: print('set year month parameters...', dt.now().isoformat())
    ### set year month parameters: 
    driver.execute_script(f"document.getElementsByName('StartMonthYear')[0].value= '{start_pd}'")
    driver.execute_script(f"document.getElementsByName('EndMonthYear')[0].value= '{end_pd}'")

    if print_debug: print('County Click and select...')
    ### click county box to load dropdown of choices
    ### if there is a previous choice - need to clear it...
    ### find_elements returns an empty list (no exception) when there is no existing county selection
    existing_county_choices = driver.find_elements(By.CLASS_NAME, 'search-choice-close')
    if existing_county_choices:
        existing_county_choices[0].click()

    ### click input to load drop down selection. 
    county_input.click()
    sleep(1)
    ### wait for the wanted county option to become clickable, then click it (times out if the option list never loads)
    if print_debug: print('first try county...', dt.now().isoformat())
    county_choice = WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH, f"//li[contains(text(), '{county}')]")))
    county_choice.click()

    if print_debug: print('clicking generate...', dt.now().isoformat())
    ### run report: 
    generate_report.click()

    if print_debug: print('waiting for iframe...', dt.now().isoformat())
    ### wait for the iframe to load:
    iframe_element = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.ID, 'frmReport')))
    export_iframe_report_to_excel(iframe_element)

    end_time = dt.now().isoformat()
    if print_debug: print('exported and done', dt.now().isoformat())
    county_run['EndTime'] = end_time
    county_run['Status'] = 'Success'
    with open(log_file, 'a') as f:
        f.write(f',\n{json.dumps(county_run, indent=2)}')
    print('done.', end='')
print(dt.now().isoformat())

2025-10-18T11:02:39.084814
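
`export_iframe_report_to_excel` triggers the export with a script call and returns right away, so nothing in the cell above confirms that the file actually landed in the download directory. A rough sketch of one way to check, using only the standard library: snapshot the directory before the export, then poll for a new file. `wait_for_new_download` is a hypothetical helper, and the `.crdownload` suffix is an assumption about how Chrome names in-progress downloads; other temporary names may need filtering too.

```python
import os
import time


def wait_for_new_download(directory, seen_before, timeout=60.0, poll=0.5):
    """Return the path of the first finished file that is not in `seen_before`.

    Assumption: files still being written end with '.crdownload'.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_names = set(os.listdir(directory)) - set(seen_before)
        finished = sorted(n for n in new_names if not n.endswith('.crdownload'))
        if finished:
            return os.path.join(directory, finished[0])
        time.sleep(poll)
    raise TimeoutError(f'no finished download appeared in {directory} within {timeout}s')


# Hypothetical usage around the export call:
# before = os.listdir(file_download_dir)
# export_iframe_report_to_excel(iframe_element)
# saved_path = wait_for_new_download(file_download_dir, before)
```

Raising `TimeoutError` lets the existing retry logic treat a missing file like any other failure.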


In [None]:
### begin html session 
run_log = []
driver = webdriver.Chrome(options=chrome_options)
driver.implicitly_wait(5)
print(dt.now())

2025-10-18 11:02:51.049487
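
The main loop in the next cell retries each failing parameter set once by nesting a second try/except. As a sketch of the same idea in reusable form, the helper below calls a function up to a fixed number of times and reports whether it ever succeeded. `run_with_retry` is a hypothetical name, and the default of two attempts mirrors the single retry used in this notebook.

```python
def run_with_retry(task, argument, attempts=2):
    """Call task(argument) up to `attempts` times.

    Returns (True, None) on the first success, or (False, last_error)
    if every attempt raised.
    """
    last_error = None
    for _ in range(attempts):
        try:
            task(argument)
            return True, None
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
    return False, last_error


# Hypothetical usage with the notebook's names:
# fails = []
# for parameter in parameters:
#     ok, error = run_with_retry(get_report_data, parameter)
#     if not ok:
#         fails.append(parameter)
```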


In [9]:
#### MAIN DRIVER - 
# enables a bunch of print statements in the process, turn on if troubleshooting a single run - don't print this much in a big loop
print_debug = False 

start_time = dt.now().isoformat()
print(start_time)
log_file = 'RunLog_Annual.json'
fails = []
for n, parameter in enumerate(parameters):
    print(f'\n{n+1}/{len(parameters)} : {parameter}', dt.now().isoformat())
    
    try:
        get_report_data(parameter)
    except Exception as e:
        print(n, e)
        print('trying again...')
        try:
            get_report_data(parameter)
        except Exception as e:
            print(e)
            fails.append(parameter)
print(len(fails), fails)
with open('LastRunFailures.json', 'w') as f:
    f.write(json.dumps(fails, indent=2))
    print('failures logged')


2025-10-18T11:03:07.112152

1/737 : {'county': '001 - Adams County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
2/737 : {'county': '002 - Allegheny County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
3/737 : {'county': '003 - Armstrong County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
4/737 : {'county': '004 - Beaver County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
5/737 : {'county': '005 - Bedford County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
6/737 : {'county': '006 - Berks County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
7/737 : {'county': '007 - Blair County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
8/737 : {'county': '008 - Bradford County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
9/737 : {'county': '009 - Bucks County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
10/737 : {'county': '010 - Butler County', 'start': 'Jan - 2025', 'end': 'Jan - 2025'}
done.
11/737 : {'county': '011 - Cambria County', 's