## Introduction

The entire Tracking the Sun dataset contains over 1 million points of data for solar cell systems in the United States. In order to set up the dataset for plotting, the locations of the solar cell systems need to be geocoded. Using the geocoding service Nominatim in combination with geopy, the coordinates of each solar cell system will be found.

In [1]:
import pandas as pd
import numpy as np
import csv

#geocoding packages and functions
import geopy
from geopy.geocoders import Nominatim
locator = Nominatim(user_agent= 'starczyn@uw.edu')
#nominatim limits geocoding extracts to one per second
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

## Geocoding

In [2]:
#reading in the dataset
data = pd.read_csv('/home/starczyn/Solar-PV/data/TTS_data.csv')

#adding a column to make the location more compatible with geopy
data['city_state_country'] = data['hostCustomerCity'] + ', ' + data['state'] + ', USA'

  data = pd.read_csv('/home/starczyn/Solar-PV/data/TTS_data.csv')


Geocoding the entire dataset line by line would take over 7 days for 1 million datapoints. So, instead of doing this, only the unique locations are geocoded so that computing time/power isn't wasted on duplicate locations.

In [3]:
#create a dataframe of unique locations
unique_data = data['city_state_country'].unique()
geocoded_cities = pd.DataFrame(data = unique_data, columns = ['city_state_country'])
geocoded_cities

Unnamed: 0,city_state_country
0,"Goodyear, AZ, USA"
1,"Buckeye, AZ, USA"
2,"Scottsdale, AZ, USA"
3,"Hereford, AZ, USA"
4,"Dewey, AZ, USA"
...,...
10752,"Little Chute, WI, USA"
10753,"Walworth, WI, USA"
10754,"Ontario, WI, USA"
10755,"Genoa, WI, USA"


In [4]:
geocoded_cities.loc[647]

city_state_country    POMFRET CENTER, CT, USA
Name: 647, dtype: object

A csv is created to store the geocoded values as they are coded and so that data will not be lost in case of interruptions.

In [24]:
geocode_unique_csv = '/home/starczyn/Solar-PV/data/TTS_data_geocoded_sample.csv'

f = open("/home/starczyn/Solar-PV/data/TTS_data_geocoded_sample.csv", "w")
writer = csv.DictWriter(f, fieldnames= ['location', 'geopy location', 'coordinates'])
writer.writeheader()
f.close()

This function saves the geocoding to the csv file as each city is geocoded.

In [25]:
def geocode_save(row, file = '/home/starczyn/Solar-PV/data/TTS_data_geocoded_sample.csv'): #row is pandas series: index in dataframe and value of column at the row

    current_csv = pd.read_csv(file, index_col = 0)
    
    index = row.name
    loc = row['city_state_country']
    
    #length of csv is number of rows that are done
    csv_length = len(current_csv) 
     
    #if the row has already been geocoded the row is returned back     
    if index <= csv_length - 1:
        
        with open(file, 'a') as geopy_csv:
            append = csv.writer(geopy_csv)
            geopy_csv.close()

        return current_csv.iloc[index].values[0]

    #if the row has not been geocoded, the row is geocoded and saved to the csv
    else:

        location = geocode(loc)

        with open(file, 'a') as geopy_csv:
            append = csv.writer(geopy_csv)
            append.writerow(row)
            append.writerow(location)
            geopy_csv.close()

        return row, location, "({},{})".format(location.longitude,location.latitude)

In [27]:
for idx, row in geocoded_cities.iterrows():
    
    location = geocode_save(row)
    #geocoded_cities.loc[idx,'location'] = location
    #point = ast.literal_eval(location)
    #latitude, longitude = point
    
        #geocoded_cities.loc[idx,'latitude'] = latitude
        #geocoded_cities.loc[idx,'longitude'] = longitude
    print(idx)
    

0
1
2
3
4
5
6
7
8


KeyboardInterrupt: 

In [21]:
pd.read_csv('/home/starczyn/Solar-PV/data/TTS_data_geocoded_sample.csv')

Unnamed: 0,location,geopy location,coordinates
0,"Goodyear, AZ, USA",,
1,"Goodyear, Maricopa County, Arizona, 85395, Uni...","(33.4353672, -112.3576)",
2,"Scottsdale, AZ, USA",,
3,"Scottsdale, Maricopa County, Arizona, United S...","(33.4942189, -111.9260184)",
4,"Dewey, AZ, USA",,
5,"Dewey, Prescott Valley, Yavapai County, Arizon...","(34.5300253, -112.2412739)",
6,"Phoenix, AZ, USA",,
7,"Phoenix, Maricopa County, Arizona, 85004-1905,...","(33.4484367, -112.0741417)",
8,"Prescott, AZ, USA",,
9,"Prescott, Yavapai County, Arizona, United States","(34.5399962, -112.4687616)",
