## Introduction

The entire Tracking the Sun dataset contains over 1 million points of data for solar cell systems in the United States. In order to set up the dataset for plotting, the locations of the solar cell systems need to be geocoded. Using the geocoding service Nominatim in combination with geopy, the coordinates of each solar cell system will be found.

In [66]:
import pandas as pd
import numpy as np
import csv
import ast

#geocoding packages and functions
import geopy
from geopy.geocoders import Nominatim
locator = Nominatim(user_agent= 'starczyn@uw.edu')
#nominatim limits geocoding extracts to one per second
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

## Geocoding

In [235]:
#reading in the dataset
data = pd.read_csv('/home/starczyn/Solar Project/data/TTS_data.csv')
data.replace({'Campe Verde': 'Camp Verde'})
#adding a column to make the location more compatible with geopy
data['city_state_country'] = data['hostCustomerCity'] + ', ' + data['state'] + ', USA'

  data = pd.read_csv('/home/starczyn/Solar Project/data/TTS_data.csv')


Geocoding the entire dataset line by line would take over 7 days for 1 million datapoints. So, instead of doing this, only the unique locations are geocoded so that computing time/power isn't wasted on duplicate locations.

In [234]:
#create a dataframe of unique locations
unique_data = data['city_state_country'].unique()
unique_cities = pd.DataFrame(data = unique_data, columns = ['city_state_country'])
unqiue_cities_csv = pd.DataFrame.to_csv(unique_cities, '/home/starczyn/Solar-PV/visualizations/data/unique_cities.csv' )
unique_cities

A csv is created to store the geocoded values as they are coded and so that data will not be lost in case of interruptions.

In [226]:
geocode_unique_csv = '/home/starczyn/Solar-PV/visualizations/data/TTS_data_geocoded.csv'

f = open("/home/starczyn/Solar-PV/visualizations/data/TTS_data_geocoded.csv", "w")
writer = csv.DictWriter(f, fieldnames= ['geopy location', 'coordinates'])
writer.writeheader()
f.close()

This function saves the geocoding to the csv file as each city is geocoded.

In [227]:
def geocode_save(row, file = geocode_unique_csv): #row is pandas series: index in dataframe and value of column at the row

    current_csv = pd.read_csv(file, index_col = 0)
    
    index = row.name
    loc = row['city_state_country']
    
    #length of csv is number of rows that are done
    csv_length = len(current_csv) 
     
    #if the row has already been geocoded the row is returned back     
    if index <= csv_length - 1:
        
        with open(file, 'a') as geopy_csv:
            append = csv.writer(geopy_csv)
            geopy_csv.close()

        return current_csv.iloc[index].values[0]

    #if the row has not been geocoded, the row is geocoded and saved to the csv
    else:

        location = geocode(loc)

        with open(file, 'a') as geopy_csv:
            append = csv.writer(geopy_csv)
            append.writerow(location)
            geopy_csv.close()

        return "({},{})".format(location.longitude,location.latitude)

In [229]:
for idx, row in unique_cities.iterrows():
    location = geocode_save(row)
    print(idx)
    

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74


Error: iterable expected, not NoneType

In [232]:
unique_cities.loc[75]

city_state_country    Campe Verde, AZ, USA
Name: 75, dtype: object

In [168]:
geopy_df = pd.read_csv('/home/starczyn/Solar-PV/visualizations/data/TTS_data_geocoded.csv')

In [169]:
frames = [geocoded_cities, geopy_df]
unique_geo_df = pd.concat(frames, axis=1, join="inner")

In [170]:
unique_geo_df

Unnamed: 0,city_state_country,geopy location,coordinates
0,"Goodyear, AZ, USA","Goodyear, Maricopa County, Arizona, 85395, Uni...","(33.4353672, -112.3576)"
1,"Buckeye, AZ, USA","Buckeye, Maricopa County, Arizona, United States","(33.3703197, -112.583776)"
2,"Scottsdale, AZ, USA","Scottsdale, Maricopa County, Arizona, United S...","(33.4942189, -111.9260184)"
3,"Hereford, AZ, USA","Hereford, Cochise County, Arizona, United States","(31.4384325, -110.0978554)"
4,"Dewey, AZ, USA","Dewey, Dewey-Humboldt, Yavapai County, Arizona...","(34.5275895, -112.2420702)"
5,"Casa Grande, AZ, USA","Casa Grande, Pinal County, Arizona, 86122, Uni...","(32.8795022, -111.757352)"
6,"Phoenix, AZ, USA","Phoenix, Maricopa County, Arizona, 85004-1905,...","(33.4484367, -112.0741417)"
7,"Camp Verde, AZ, USA","Camp Verde, Yavapai County, Arizona, United St...","(34.5636358, -111.854317)"
8,"Prescott, AZ, USA","Prescott, Yavapai County, Arizona, United States","(34.5399962, -112.4687616)"
9,"Bisbee, AZ, USA","Bisbee, Cochise County, Arizona, 85603, United...","(31.4481547, -109.928408)"


In [171]:
unique_geo_df.keys()

Index(['city_state_country', 'geopy location', 'coordinates'], dtype='object')

In [172]:
for idx, row in unique_geo_df.iterrows():
    point = ast.literal_eval(unique_geo_df['coordinates'].iloc[idx])
    latitude, longitude = point
    unique_geo_df.loc[idx,'latitude'] = latitude
    unique_geo_df.loc[idx,'longitude'] = longitude
    
    

0
(33.4353672, -112.3576)
1
(33.3703197, -112.583776)
2
(33.4942189, -111.9260184)
3
(31.4384325, -110.0978554)
4
(34.5275895, -112.2420702)
5
(32.8795022, -111.757352)
6
(33.4484367, -112.0741417)
7
(34.5636358, -111.854317)
8
(34.5399962, -112.4687616)
9
(31.4481547, -109.928408)
10
(34.8679599, -111.7617165)
11
(33.8333333, -111.9508333)
12
(33.8562535, -112.61451849802984)
13
(33.4933796, -112.3581244)
14
(33.6290111, -112.2819337)
15
(34.2406479, -111.3230261)
16
(34.7395165, -111.9058275625)
17
(34.739489, -112.009793)
18
(31.714229, -110.0652028)
19
(33.6657535, -112.35505216920794)
20
(34.78357, -112.43035972408359)
21
(34.6450394, -111.7848411)
22
(33.4255056, -111.9400091)
23
(34.106747, -110.94024227101073)


In [173]:
unique_geo_df.keys()

Index(['city_state_country', 'geopy location', 'coordinates', 'latitude',
       'longitude'],
      dtype='object')

In [215]:
data[0:38]

Unnamed: 0.1,Unnamed: 0,dataProvider1,dataProvider2,program1ProjectID,PTODate_orProxy_,systemSizeInDCSTC_KW_,totalInstalledCost___,Up_FrontCashIncentive___,customerSegment,is_expansion,...,inv_model1_clean,inverterQuantity_1,additionalInverterModels_Y_N_,inv_microinv1,inv_battery_hybrid1,inv_builtin_meter1,inv_outputcapacity1,dc_optimizer,ILR,city_state_country
0,2,Arizona Public Service,-1,3,24-Jan-00,12.025,-1.0,-1.0,RES,0,...,IQ7-60-2-US [240V],-1.0,-1,1,0,0,0.24,0,-1.0,"Goodyear, AZ, USA"
1,4,Arizona Public Service,-1,5,6-Mar-00,8.64,-1.0,-1.0,RES,0,...,SE7600H-US [240V],-1.0,-1,0,0,1,7.616,1,-1.0,"Buckeye, AZ, USA"
2,7,Arizona Public Service,-1,8,7-May-02,2.4,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.16,0,-1.0,"Scottsdale, AZ, USA"
3,9,Arizona Public Service,-1,10,17-Dec-02,2.16,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.16,0,-1.0,"Hereford, AZ, USA"
4,10,Arizona Public Service,-1,11,19-Dec-02,2.52,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.16,0,-1.0,"Dewey, AZ, USA"
5,11,Arizona Public Service,-1,12,10-Mar-03,2.22,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.16,0,-1.0,"Casa Grande, AZ, USA"
6,12,Arizona Public Service,-1,13,23-Jun-03,2.839,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.16,0,-1.0,"Phoenix, AZ, USA"
7,13,Arizona Public Service,-1,14,23-Jun-03,2.004,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.16,0,-1.0,"Phoenix, AZ, USA"
8,14,Arizona Public Service,-1,15,9-Jul-03,36.6,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.16,0,-1.0,"Camp Verde, AZ, USA"
9,17,Arizona Public Service,-1,18,30-Jul-03,2.0,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.16,0,-1.0,"Phoenix, AZ, USA"


In [213]:
data

Unnamed: 0.1,Unnamed: 0,dataProvider1,dataProvider2,program1ProjectID,PTODate_orProxy_,systemSizeInDCSTC_KW_,totalInstalledCost___,Up_FrontCashIncentive___,customerSegment,is_expansion,...,inv_microinv1,inv_battery_hybrid1,inv_builtin_meter1,inv_outputcapacity1,dc_optimizer,ILR,city_state_country,geopy_location,latitude,longitude
0,2,Arizona Public Service,-1,3,24-Jan-00,12.025,-1.00,-1.0,RES,0,...,1,0,0,0.240,0,-1.0,"Goodyear, AZ, USA",,,
1,4,Arizona Public Service,-1,5,6-Mar-00,8.640,-1.00,-1.0,RES,0,...,0,0,1,7.616,1,-1.0,"Buckeye, AZ, USA",,,
2,7,Arizona Public Service,-1,8,7-May-02,2.400,-1.00,-1.0,RES,0,...,0,-1,1,15.160,0,-1.0,"Scottsdale, AZ, USA",,,
3,9,Arizona Public Service,-1,10,17-Dec-02,2.160,-1.00,-1.0,RES,0,...,0,-1,1,15.160,0,-1.0,"Hereford, AZ, USA",,,
4,10,Arizona Public Service,-1,11,19-Dec-02,2.520,-1.00,-1.0,RES,0,...,0,-1,1,15.160,0,-1.0,"Dewey, AZ, USA",,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
617575,1007460,Wisconsin Focus on Energy,-1,1147065,15-Dec-20,16.060,38320.00,2507.5,COM,0,...,-1,-1,-1,-1.000,-1,-1.0,"Chilton, WI, USA",,,
617576,1007461,Wisconsin Focus on Energy,-1,1147212,16-Dec-20,10.370,43039.36,500.0,RES,0,...,-1,-1,-1,-1.000,-1,-1.0,"Pewaukee, WI, USA",,,
617577,1007462,Wisconsin Focus on Energy,-1,1142267,16-Dec-20,13.860,52360.00,1000.0,RES,1,...,-1,-1,-1,-1.000,-1,-1.0,"Oregon, WI, USA",,,
617578,1007466,Wisconsin Focus on Energy,-1,1143039,17-Dec-20,6.820,25583.00,500.0,RES,0,...,-1,-1,-1,-1.000,-1,-1.0,"Marshall, WI, USA",,,


In [216]:
data_samp = data[0:200]
data_samp

Unnamed: 0.1,Unnamed: 0,dataProvider1,dataProvider2,program1ProjectID,PTODate_orProxy_,systemSizeInDCSTC_KW_,totalInstalledCost___,Up_FrontCashIncentive___,customerSegment,is_expansion,...,inv_model1_clean,inverterQuantity_1,additionalInverterModels_Y_N_,inv_microinv1,inv_battery_hybrid1,inv_builtin_meter1,inv_outputcapacity1,dc_optimizer,ILR,city_state_country
0,2,Arizona Public Service,-1,3,24-Jan-00,12.025,-1.0,-1.0,RES,0,...,IQ7-60-2-US [240V],-1.0,-1,1,0,0,0.240,0,-1.0,"Goodyear, AZ, USA"
1,4,Arizona Public Service,-1,5,6-Mar-00,8.640,-1.0,-1.0,RES,0,...,SE7600H-US [240V],-1.0,-1,0,0,1,7.616,1,-1.0,"Buckeye, AZ, USA"
2,7,Arizona Public Service,-1,8,7-May-02,2.400,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.160,0,-1.0,"Scottsdale, AZ, USA"
3,9,Arizona Public Service,-1,10,17-Dec-02,2.160,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.160,0,-1.0,"Hereford, AZ, USA"
4,10,Arizona Public Service,-1,11,19-Dec-02,2.520,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.160,0,-1.0,"Dewey, AZ, USA"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,220,Arizona Public Service,-1,221,30-Jun-06,1.400,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.160,0,-1.0,"Phoenix, AZ, USA"
196,221,Arizona Public Service,-1,222,30-Jun-06,6.400,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.160,0,-1.0,"Waddell, AZ, USA"
197,222,Arizona Public Service,-1,223,5-Jul-06,3.060,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.160,0,-1.0,"Wittmann, AZ, USA"
198,223,Arizona Public Service,-1,224,5-Jul-06,2.000,-1.0,-1.0,RES,0,...,Power Station PS247-15-180 [120V],-1.0,-1,0,-1,1,15.160,0,-1.0,"Phoenix, AZ, USA"


In [180]:
unique_geo_df['city_state_country'].iloc[0]

'Goodyear, AZ, USA'

In [222]:
for idx1, row in data_samp.iterrows():
    try:
        city = row['city_state_country']
        print(city)
        idx2 = unique_geo_df['city_state_country'].str.contains(city, na = False)
        latitude = unique_geo_df.loc[idx2,'latitude'].values[0]
        print(latitude)
        longitude = unique_geo_df.loc[idx2,'longitude'].values[0]
        print(longitude)
        data_samp.loc[idx1, 'latitude'] = latitude
        data_samp.loc[idx1, 'longitude'] = longitude

    except: 
        data_samp.loc[idx1, 'latitude'] = np.nan
        data_samp.loc[idx1, 'longitude'] = np.nan
    
    
    


Goodyear, AZ, USA
33.4353672
-112.3576
Buckeye, AZ, USA
33.3703197
-112.583776
Scottsdale, AZ, USA
33.4942189
-111.9260184
Hereford, AZ, USA
31.4384325
-110.0978554
Dewey, AZ, USA
34.5275895
-112.2420702
Casa Grande, AZ, USA
32.8795022
-111.757352
Phoenix, AZ, USA
33.4484367
-112.0741417
Phoenix, AZ, USA
33.4484367
-112.0741417
Camp Verde, AZ, USA
34.5636358
-111.854317
Phoenix, AZ, USA
33.4484367
-112.0741417
Prescott, AZ, USA
34.5399962
-112.4687616
Phoenix, AZ, USA
33.4484367
-112.0741417
Phoenix, AZ, USA
33.4484367
-112.0741417
Bisbee, AZ, USA
31.4481547
-109.928408
Phoenix, AZ, USA
33.4484367
-112.0741417
Scottsdale, AZ, USA
33.4942189
-111.9260184
Sedona, AZ, USA
34.8679599
-111.7617165
Cave Creek, AZ, USA
33.8333333
-111.9508333
Scottsdale, AZ, USA
33.4942189
-111.9260184
Phoenix, AZ, USA
33.4484367
-112.0741417
Morristown, AZ, USA
33.8562535
-112.61451849802984
Camp Verde, AZ, USA
34.5636358
-111.854317
Cave Creek, AZ, USA
33.8333333
-111.9508333
Litchfield Park, AZ, USA
33.493

In [223]:
data_samp[0:200]

Unnamed: 0.1,Unnamed: 0,dataProvider1,dataProvider2,program1ProjectID,PTODate_orProxy_,systemSizeInDCSTC_KW_,totalInstalledCost___,Up_FrontCashIncentive___,customerSegment,is_expansion,...,additionalInverterModels_Y_N_,inv_microinv1,inv_battery_hybrid1,inv_builtin_meter1,inv_outputcapacity1,dc_optimizer,ILR,city_state_country,longitude,latitude
0,2,Arizona Public Service,-1,3,24-Jan-00,12.025,-1.0,-1.0,RES,0,...,-1,1,0,0,0.240,0,-1.0,"Goodyear, AZ, USA",-112.357600,33.435367
1,4,Arizona Public Service,-1,5,6-Mar-00,8.640,-1.0,-1.0,RES,0,...,-1,0,0,1,7.616,1,-1.0,"Buckeye, AZ, USA",-112.583776,33.370320
2,7,Arizona Public Service,-1,8,7-May-02,2.400,-1.0,-1.0,RES,0,...,-1,0,-1,1,15.160,0,-1.0,"Scottsdale, AZ, USA",-111.926018,33.494219
3,9,Arizona Public Service,-1,10,17-Dec-02,2.160,-1.0,-1.0,RES,0,...,-1,0,-1,1,15.160,0,-1.0,"Hereford, AZ, USA",-110.097855,31.438433
4,10,Arizona Public Service,-1,11,19-Dec-02,2.520,-1.0,-1.0,RES,0,...,-1,0,-1,1,15.160,0,-1.0,"Dewey, AZ, USA",-112.242070,34.527589
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,220,Arizona Public Service,-1,221,30-Jun-06,1.400,-1.0,-1.0,RES,0,...,-1,0,-1,1,15.160,0,-1.0,"Phoenix, AZ, USA",-112.074142,33.448437
196,221,Arizona Public Service,-1,222,30-Jun-06,6.400,-1.0,-1.0,RES,0,...,-1,0,-1,1,15.160,0,-1.0,"Waddell, AZ, USA",,
197,222,Arizona Public Service,-1,223,5-Jul-06,3.060,-1.0,-1.0,RES,0,...,-1,0,-1,1,15.160,0,-1.0,"Wittmann, AZ, USA",,
198,223,Arizona Public Service,-1,224,5-Jul-06,2.000,-1.0,-1.0,RES,0,...,-1,0,-1,1,15.160,0,-1.0,"Phoenix, AZ, USA",-112.074142,33.448437
