## Prepare Field Mapping


The dictionary below is a template for the ArcGIS multiline address format. The column headers of your input CSV should match these keys:

```python
arcgis_address_format = {
    "Address": "",
    "Neighborhood": "",
    "City": "",
    "Subregion": "",  # Typically county or equivalent
    "Region": "",  # Typically state or equivalent
    "Postal": "",
    "CountryCode": ""
}
```
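
Before running a large job, it can help to confirm that the CSV headers match the template. The sketch below is one way to do that, not part of the original notebook: `check_headers` and the sample row are hypothetical, and the check assumes exact, case-sensitive matches against the field names listed above.

```python
import csv
import io

# Field names copied from the multiline address template.
ARCGIS_FIELDS = [
    "Address", "Neighborhood", "City", "Subregion",
    "Region", "Postal", "CountryCode",
]

def check_headers(csv_text):
    """Split the CSV's headers into (recognized, unrecognized) lists."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    recognized = [h for h in headers if h in ARCGIS_FIELDS]
    unrecognized = [h for h in headers if h not in ARCGIS_FIELDS]
    return recognized, unrecognized

# Hypothetical sample input, for illustration only.
sample = "Address,City,Region,Postal,zip4\n450 Jane Stanford Way,Stanford,CA,94305,\n"
recognized, unrecognized = check_headers(sample)
print("Recognized:", recognized)
print("Unrecognized:", unrecognized)
```

Columns reported as unrecognized are candidates for renaming before the file is submitted.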

## Imports

In [None]:
# Import necessary libraries

import csv
import requests
import json
import urllib.parse
import time

## Input Parameters

Use the following parameters to control the geocoding job. 

In [None]:

csv_file_path = '/Users/maples/GitHub/Locator-Scripts/Data/oneMillionAddresses.csv' # The path to your input CSV.

output_csv_path = '/Users/maples/GitHub/Locator-Scripts/Data/geocoded_records01.csv' # Rename your output file here, with the full path to the file

arcgis_service_url = 'https://locator.stanford.edu/arcgis/rest/services/geocode/NorthAmerica/GeocodeServer/geocodeAddresses'

jobSize = 'all' # Number of records to process, counted from the top of the file. Use an integer (no quotes) to process only the first N records, or 'all' to process every record.

chunkSize = 20 # Number of address records submitted per request. 20 works well, but you can experiment with other values. Much more than 20 will cause "URI too long" errors, because each batch is sent in the URL of a REST GET request.

outFields = '*' # '*' results in all output fields being included in the output csv. 'none' results in minimal returned output fields (lat & long).

printJob = 'no' # Set to 'yes' to print each GET request URL to the console for debugging.

## Prepare and submit GET Requests from CSV

The `geocode_addresses` function reads addresses from a CSV file and geocodes them in batches using the ArcGIS Server GeocodeAddresses service.

The function takes six parameters: 
- `csv_file_path` is the path to the input CSV file that contains the addresses to be geocoded.
- `arcgis_service_url` is the URL to the ArcGIS GeocodeAddresses service.
- `jobSize` is the total number of addresses to process. It can be 'all' for all addresses or an integer for a specific number of addresses.
- `chunkSize` is the number of addresses to include in each API request.
- `outFields` is a comma-separated list of fields to include in the output.
- `printJob` is a string that, if set to 'yes', will print each GET request URL to the console.

The function starts by reading the CSV file and storing the addresses in a list. It then calculates the total number of records and prepares for batch processing by dividing the addresses into chunks of size `chunkSize`.
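
The record limit and the chunking can be tried in isolation. The following is a minimal sketch using the same slicing logic as the function; the helper names `select_records` and `make_batches` and the generated rows are hypothetical.

```python
def select_records(rows, job_size):
    """Keep the first job_size rows, or every row if job_size is not an int (e.g. 'all')."""
    limit = job_size if isinstance(job_size, int) else None
    return rows[:limit]

def make_batches(rows, chunk_size):
    """Split rows into consecutive lists of at most chunk_size items."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

# 45 made-up address rows, for illustration only.
rows = [{"Address": f"{n} Main St"} for n in range(1, 46)]

all_batches = make_batches(select_records(rows, "all"), 20)
first_30 = make_batches(select_records(rows, 30), 20)
print([len(b) for b in all_batches])  # [20, 20, 5]
print([len(b) for b in first_30])     # [20, 10]
```

The last batch is simply shorter when the record count is not a multiple of `chunkSize`.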

For each batch, the function constructs the 'addresses' parameter for the API request and encodes the parameters into a URL. If `printJob` is 'yes', it prints the request URL. It then sends a GET request to the ArcGIS service.
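
To see what a request looks like, and how quickly the URL grows, the construction step can be sketched on its own. The helper `build_request_url`, the `example.com` endpoint, and the sample rows are assumptions made for illustration. The payload layout mirrors the one built in `geocode_addresses`.

```python
import json
import urllib.parse

def build_request_url(service_url, batch, first_object_id, out_fields="*"):
    """Build the GET URL for one batch of address rows."""
    records = {
        "records": [
            {"attributes": {"OBJECTID": first_object_id + i, **row}}
            for i, row in enumerate(batch)
        ]
    }
    params = {
        "addresses": json.dumps(records),
        "outFields": out_fields,
        "f": "pjson",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{service_url}?{query}"

# Placeholder endpoint and made-up rows, for illustration only.
service_url = "https://example.com/arcgis/rest/services/Locator/GeocodeServer/geocodeAddresses"
batch = [
    {"Address": "1 Main St", "City": "Springfield", "Region": "CA", "Postal": "90001"},
    {"Address": "2 Main St", "City": "Springfield", "Region": "CA", "Postal": "90001"},
]
url = build_request_url(service_url, batch, first_object_id=0)
print(f"URL length for {len(batch)} records: {len(url)} characters")

# Decode the query string again to confirm the payload survives the round trip.
query = urllib.parse.parse_qs(urllib.parse.urlsplit(url).query)
decoded = json.loads(query["addresses"][0])
print(decoded["records"][1]["attributes"]["OBJECTID"])
```

Printing the URL length for a few batch sizes is a quick way to judge how much headroom a given `chunkSize` leaves before the server rejects the request.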

If the response status code is 200, the function reads each entry in the response's `locations` list and appends its `attributes` to `processed_records`, provided `attributes` is a dictionary; otherwise it prints a warning. If the status code is not 200, it prints an error message containing the response text and moves on to the next batch, so the records in that batch will be missing from the output.
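
The response handling can be exercised without calling the service. In this sketch, `extract_attributes` is a hypothetical helper and `sample_payload` is a hand-written stand-in for a parsed response; the keys and values inside each `attributes` dictionary are illustrative and depend on the locator and on `outFields`.

```python
def extract_attributes(payload):
    """Return (records, skipped): attribute dicts, and locations lacking one."""
    records, skipped = [], []
    for location in payload.get("locations", []):
        attributes = location.get("attributes")
        if isinstance(attributes, dict):
            records.append(attributes)
        else:
            skipped.append(location)
    return records, skipped

# Hand-written example payload, not real service output.
sample_payload = {
    "locations": [
        {"address": "1 Main St", "score": 100, "attributes": {"ResultID": 0, "Status": "M"}},
        {"address": "", "score": 0, "attributes": None},
    ]
}

records, skipped = extract_attributes(sample_payload)
print(f"{len(records)} usable record(s), {len(skipped)} skipped")
```

Returning the skipped entries, rather than only printing a warning, makes it easier to count or log them after a long run.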

The function also provides progress reporting. It calculates the number of processed and remaining records, the elapsed time, the estimated total time, and the estimated remaining time, and prints these details.
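
The time estimate is a linear extrapolation: the average time per processed record is multiplied by the total record count, and the elapsed time is subtracted. A small sketch of that arithmetic follows; the function name `estimate_remaining` and the numbers are hypothetical.

```python
import time

def estimate_remaining(elapsed_seconds, processed, total):
    """Extrapolate the remaining seconds from the average pace so far."""
    estimated_total = elapsed_seconds / processed * total
    return estimated_total - elapsed_seconds

# 256 of 1024 records done after 64 seconds: 0.25 s per record on average.
remaining = estimate_remaining(elapsed_seconds=64.0, processed=256, total=1024)
formatted = time.strftime('%H:%M:%S', time.gmtime(remaining))
print(formatted)  # 00:03:12
```

One caveat with this formatting: `%H` shows the hour of the day, so an estimate longer than 24 hours wraps around (90000 seconds prints as `01:00:00`). For very large jobs it may be clearer to report total hours directly.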

Finally, the function collects the fieldnames by iterating over the processed records and adding each record's keys to a set, then writes the results to a new CSV file using a `csv.DictWriter`. The output path is not one of the six parameters: the function reads the module-level `output_csv_path` variable defined under Input Parameters.
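
The effect of taking the union of keys is easiest to see on a small example. The sketch below writes to an in-memory buffer instead of a file. The two records are made up, and sorting the set is a choice made here so that the column order is predictable.

```python
import csv
import io

# Made-up records with different sets of keys.
processed_records = [
    {"ResultID": 0, "X": -122.17, "Y": 37.43},
    {"ResultID": 1, "Status": "U"},
]

# Union of all keys across records.
fieldnames = set()
for record in processed_records:
    fieldnames.update(record.keys())

# A set has no defined order, so sort it to get stable columns.
ordered_fieldnames = sorted(fieldnames)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=ordered_fieldnames)
writer.writeheader()
writer.writerows(processed_records)  # missing keys are written as empty cells

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```

Records that lack a field get an empty cell in that column, so rows with different attributes can share one output file.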

In [None]:


def geocode_addresses(csv_file_path, arcgis_service_url, jobSize, chunkSize, outFields, printJob):
    """
    Processes a CSV file to geocode addresses using the ArcGIS Server GeocodeAddresses service.

    Parameters:
    csv_file_path (str): Path to the input CSV file.
    arcgis_service_url (str): URL to the ArcGIS GeocodeAddresses service.
    jobSize (str|int): The total number of addresses to process ('all' for all addresses or an integer).
    chunkSize (int): Number of addresses to include in each API request.
    outFields (str): Comma-separated list of fields to include in the output.
    printJob (str): If 'yes', print each GET request URL to the console.
    """
    # Read the CSV file
    with open(csv_file_path, mode='r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        addresses = list(reader)[:jobSize if isinstance(jobSize, int) else None]

    total_records = len(addresses)
    processed_records = []

    start_time = time.time()

    # Prepare batches of addresses for chunked processing
    batches = [addresses[i:i + chunkSize] for i in range(0, len(addresses), chunkSize)]

    for batch_index, batch in enumerate(batches):
        # Construct the 'addresses' parameter for the API request
        records = {
            "records": [
                {
                    "attributes": {
                        "OBJECTID": idx + batch_index * chunkSize,
                        **{key: record[key] for key in record}
                    }
                } for idx, record in enumerate(batch)
            ]
        }
        params = {
            'addresses': json.dumps(records),
            'outFields': outFields,
            'f': 'pjson'
        }
        encoded_params = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)

        # Construct the full URL for the GET request
        request_url = f"{arcgis_service_url}?{encoded_params}"

        if printJob.lower() == 'yes':
            print(f"Request URL: {request_url}")

        # Send the GET request
        response = requests.get(request_url)
        if response.status_code == 200:
            # Process each location in the response
            for location in response.json().get('locations', []):
                # Ensure that 'attributes' is a dictionary
                if isinstance(location.get('attributes'), dict):
                    processed_records.append(location['attributes'])
                else:
                    print(f"Warning: Unexpected data format in response: {location}")
        else:
            print(f"Error processing batch {batch_index}: {response.text}")


        # Progress reporting
        processed = batch_index * chunkSize + len(batch)
        remaining = total_records - processed
        elapsed_time = time.time() - start_time
        estimated_total_time = elapsed_time / processed * total_records
        estimated_remaining_time = estimated_total_time - elapsed_time
        print(f"Processed {processed}/{total_records} records. Remaining: {remaining}. Estimated time to finish: {time.strftime('%H:%M:%S', time.gmtime(estimated_remaining_time))}")

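    # Determine fieldnames from the processed records
    fieldnames = set()
    for record in processed_records:
        fieldnames.update(record.keys())
    # Sort the names so the column order is the same on every run (set order is arbitrary)
    fieldnames = sorted(fieldnames)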

    # Output the results to a new CSV file
    # (output_csv_path is the module-level variable set under Input Parameters)
    with open(output_csv_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        for record in processed_records:
            writer.writerow(record)

# Example usage
geocode_addresses(csv_file_path, arcgis_service_url, jobSize, chunkSize, outFields, printJob)
