# Visually Mapping 1911 Charlotte

### Contents:
1. [Project Overview](https://cases.umd.edu/user/cfmorris/doc/tree/cfmorris/georeferencing/1-Overview.ipynb)
2. [Georeferencing the Data](https://cases.umd.edu/user/cfmorris/doc/tree/cfmorris/georeferencing/2-Georeferencing.ipynb)
3. [Visualizing and Analyzing the Data](https://cases.umd.edu/user/cfmorris/doc/tree/cfmorris/georeferencing/3-VizualizingandAnalyzingtheData.ipynb)


## 2. Georeferencing the Data

First, we install a few packages.  
   1. pandas - a common data manipulation package.
   2. geopy and geopandas - tools that enable us to send georeferencing requests.
   3. folium - a Python map generator.

In [5]:
!pip install geopandas
!pip install geopy
!pip install folium

import geopandas as gpd
import geopy 
import pandas as pd



After installing and importing the necessary packages, we declare a variable, "locator", to be called when making georeferencing requests.  This variable uses geopy to send requests to OpenStreetMap through Nominatim, its geocoding service.  Nominatim requires that users declare a **user_agent** when making requests.  This is just a way to identify all requests belonging to a given project and does not require registration.  We've used "cityDirectory".  In addition, we've added a timeout parameter (2 seconds here) so that our program gives Nominatim enough time to respond to each request.  This can add a significant amount of time to the georeferencing process, but it prevents many errors that may be difficult to troubleshoot later.

Next we send a request to Nominatim using a known address to ensure everything is working before moving on.  This is done through the eiffel variable, which georeferences "Champ de Mars, Paris, France", the park where the Eiffel Tower stands.  Calling print(eiffel) prints the matched address, and eiffel.latitude and eiffel.longitude hold the coordinate data embedded in the eiffel variable.

In [6]:
locator = geopy.geocoders.Nominatim(user_agent="cityDirectory", timeout=2)
eiffel = locator.geocode("Champ de Mars, Paris, France", language="en")

print(eiffel)
print("Latitude = {}, Longitude = {}".format(eiffel.latitude, eiffel.longitude))

Field of Mars, Avenue Joseph Bouvard, Quartier du Gros-Caillou, 7th Arrondissement, Paris, Ile-de-France, Metropolitan France, 75007, France
Latitude = 48.85614465, Longitude = 2.297820393322227
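
geopy's `geocode` returns `None` when no match is found, so reading `.latitude` on the result of a failed lookup raises an `AttributeError`. A minimal sketch of a guarded version of the check above follows. `describe_location`, `FakeLocation`, and `fake_geocode` are hypothetical names used only so the example runs offline; they are not part of geopy.

```python
from collections import namedtuple

# Stand-in for the object a geocoder returns; hypothetical, for illustration only.
FakeLocation = namedtuple("FakeLocation", ["address", "latitude", "longitude"])


def describe_location(geocode, query):
    """Run geocode(query) and return a printable summary, even when nothing matches."""
    location = geocode(query)
    if location is None:
        return "No match for {!r}".format(query)
    return "{} -> Latitude = {}, Longitude = {}".format(
        location.address, location.latitude, location.longitude
    )


def fake_geocode(query):
    """Offline stand-in: knows one place and returns None for everything else."""
    known = {
        "Champ de Mars, Paris, France": FakeLocation("Field of Mars, Paris", 48.856, 2.298)
    }
    return known.get(query)


print(describe_location(fake_geocode, "Champ de Mars, Paris, France"))
print(describe_location(fake_geocode, "Nowhere Street, Atlantis"))
```

With the real service, the same helper would presumably be called as `describe_location(locator.geocode, "Champ de Mars, Paris, France")`.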


Next we load the .csv file containing the addresses that need to be georeferenced into a dataframe using pandas and then use .tail(10) to display the last 10 entries.  This allows us to readily identify the column titles and to verify the total number of entries without leaving the workspace.

In [7]:
dfmin = pd.read_csv("Charlotte1911final_georeferencedv05.csv")
dfmin.tail(10)

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,Orig-copy,Race,Name,Title,Married,Wid,Spouse,BizDir,Job,Company,Company Details,Housing,Address,loc,point,latitude,longitude,altitude
5632,15681,15681,"Younts Chas R, student, h 906 Pine",White,Younts Chas R,,,,,0.0,,,,h 906 Pine,906 Pine,"Albemarle Road, Manchester, Charlotte, Mecklen...","(35.2093507, -80.6746658, 0.0)",35.209351,-80.674666,0.0
5633,15682,15682,"Younts Saml, concrete wkr, h 906 Pine",White,Younts Saml,,,,,0.0,,,,h 906 Pine,906 Pine,"Albemarle Road, Manchester, Charlotte, Mecklen...","(35.2093507, -80.6746658, 0.0)",35.209351,-80.674666,0.0
5634,15683,15683,"Younts Wm E (Eunice), trav slsmn, h 906 Pine",White,Younts Wm E,,,,Eunice,0.0,trav slsmn,,,h 906 Pine,906 Pine,"Albemarle Road, Manchester, Charlotte, Mecklen...","(35.2093507, -80.6746658, 0.0)",35.209351,-80.674666,0.0
5635,15684,15684,"Yow Hampton (Addie), emp Char Cordage Co, h 9 ...",White,Yow Hampton,,,,Addie,0.0,emp,Char Cordage Co,,h 9 Short,9 Short,"Short-cut to UNCC from Colville/Vinca, Univers...","(35.3005741, -80.7276831, 0.0)",35.300574,-80.727683,0.0
5636,15691,15691,"Zeman Frank J (Katherine), foreman Jos Sykes B...",White,Zeman Frank J,,,,Katherine,0.0,foreman,Jos Sykes Bros,,h 911 Pine,911 Pine,"Albemarle Road, Manchester, Charlotte, Mecklen...","(35.2093507, -80.6746658, 0.0)",35.209351,-80.674666,0.0
5637,15692,15692,"Zeman Jos F (Margaret), mchst Jos Sykes Bros, ...",White,Zeman Jos F,,,,Margaret,0.0,mchst,Jos Sykes Bros,,h 200 e Morehead,200 e Morehead,"200, West Morehead Street, South End, Charlott...","(35.222881, -80.853108, 0.0)",35.222881,-80.853108,0.0
5638,15703,15703,"*Zigler Jno, farmer, h Fairview",Black,Zigler Jno,,,,,0.0,,,,h Fairview,Fairview,"Fairview Road, Carmel Park, Charlotte, Mecklen...","(35.1533835, -80.7957751, 0.0)",35.153383,-80.795775,0.0
5639,15706,15706,"Zimmerman Mamie M Miss, h 1802 s Boulevard",White,Zimmerman Mamie M,,Miss,,,0.0,,,,h 1802 s Boulevard,1802 s Boulevard,"South Boulevard, Sterling, Charlotte, Mecklenb...","(35.1015849, -80.8808641, 0.0)",35.101585,-80.880864,0.0
5640,15707,15707,"Zimmerman Purnell P (Rebecca), slsmn M’burg Ir...",White,Zimmerman Purnell P,,,,Rebecca,0.0,slsmn,M'burg Iron Wks,,h 1802 s Boulevard,1802 s Boulevard,"South Boulevard, Sterling, Charlotte, Mecklenb...","(35.1015849, -80.8808641, 0.0)",35.101585,-80.880864,0.0
5641,15708,15708,"*Zion Methodist Church, Biddleville, Rev David...",Black,Zion Methodist Church,,,,,0.0,,,,,,"Freedom Drive, Lewis, Westchester, Charlotte, ...","(35.2583432, -80.9126842, 0.0)",35.258343,-80.912684,0.0
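
The `Unnamed: 0`, `Unnamed: 0.1`, and `Unnamed: 0.2` columns in the output above most likely come from saving the frame with its index and reading it back, repeated over several save/load cycles. A small sketch of one way to avoid that, using an in-memory buffer and made-up sample rows rather than the project file:

```python
import io

import pandas as pd

# Made-up sample rows; not taken from the directory data.
df = pd.DataFrame({"Name": ["A", "B"], "Address": ["1 Main", "2 Main"]})

# Writing with the default index=True adds an unnamed first column on reload.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded = pd.read_csv(with_index)
print(list(reloaded.columns))

# Writing with index=False keeps the column set stable across save/load cycles.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
clean = pd.read_csv(without_index)
print(list(clean.columns))
```

The extra columns are harmless for georeferencing, so this is a tidiness choice rather than a required step.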


Nominatim allows users to make no more than one request each second.  geopy's RateLimiter can enforce this by declaring a minimum delay of 1 second between requests.  Note that the line is commented out in the cell below, so it has no effect as written.

In [9]:
#geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
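
geopy ships this helper as `geopy.extra.rate_limiter.RateLimiter`. To show what a minimum delay between calls does, here is a minimal pure-Python sketch of the idea. It is not geopy's implementation, `min_delay_wrapper` is a hypothetical name, and a string function stands in for the network call.

```python
import time


def min_delay_wrapper(func, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
    """Return a version of func whose calls start at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the start time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            elapsed = clock() - last_call[0]
            if elapsed < min_delay_seconds:
                # Wait out the remainder of the delay before calling again.
                sleep(min_delay_seconds - elapsed)
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper


# Demonstration with a short delay and a stand-in function instead of a network call.
slow_upper = min_delay_wrapper(str.upper, min_delay_seconds=0.05)
start = time.monotonic()
results = [slow_upper(s) for s in ["506 s tryon", "104 central av", "906 pine"]]
elapsed = time.monotonic() - start
print(results, round(elapsed, 2))
```

In the notebook itself, the equivalent would be to add `from geopy.extra.rate_limiter import RateLimiter` and uncomment the line above.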

The last step before making requests is ensuring they are specific to the region we're interested in.  We do this by using a lambda function to add **", Charlotte, NC"** to the end of each request we make.  This ensures that requests like "314 w Hill" don't return results from the UK.

In [10]:
geocode = lambda query: locator.geocode("%s, Charlotte, NC" % query)
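
Some directory entries have no street address (the Zion Methodist Church row in the earlier output has an empty `Address` field), and pandas represents those as `NaN`, which the `%s` formatting would turn into the query "nan, Charlotte, NC". A hedged sketch of a named alternative to the lambda that skips such rows follows. `build_query` and `geocode_in_charlotte` are hypothetical names, and the geocoder is passed in as an argument so the sketch runs offline.

```python
import math


def build_query(address, suffix="Charlotte, NC"):
    """Return 'address, suffix', or None when the address is missing or blank."""
    if address is None:
        return None
    if isinstance(address, float) and math.isnan(address):
        return None  # pandas uses NaN for empty CSV fields
    text = str(address).strip()
    if not text:
        return None
    return "{}, {}".format(text, suffix)


def geocode_in_charlotte(geocode, address):
    """Call geocode only when there is a usable address; otherwise return None."""
    query = build_query(address)
    return geocode(query) if query is not None else None


print(build_query("314 w Hill"))
print(build_query(float("nan")))
```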

At this point we're ready to start running our batch of georeference requests, but because our .csv file contains 15,710 entries, this would take more than four hours at one request per second.  Instead of waiting that long, we redefine dfmin to include only its first 30 entries.  For later exercises we'll load locally saved data.

We use **dfmin\["loc"\] = dfmin\["Address"\].apply(geocode)** to make our requests.  
1. **dfmin\["loc"\]** declares the heading of the column to be filled.  
2. **.apply(geocode)** makes the requests to Nominatim.
3. **dfmin\["Address"\]** names the column whose values populate those requests.

Lastly, we print the results.

In [11]:
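# Keep only the first 30 rows; .copy() makes an independent frame so the
# column assignment below does not trigger pandas' SettingWithCopyWarning.
dfmin = dfmin.head(30).copy()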

dfmin["loc"] = dfmin["Address"].apply(geocode)
dfmin.head()




Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,Orig-copy,Race,Name,Title,Married,Wid,Spouse,BizDir,Job,Company,Company Details,Housing,Address,loc,point,latitude,longitude,altitude
0,2,2,"*Aaron Amelia, domestic 506 s Tryon",Black,Aaron Amelia,,,,,0.0,domestic,,,506 s Tryon,506 s Tryon,"(South Tryon, Sedgefield, Charlotte, Mecklenbu...","(35.1643826, -80.9128, 0.0)",35.164383,-80.9128,0.0
1,3,3,"Abbey Simeon A (Mary A), supt constr Genl Fire...",White,Abbey Simeon A,,,,Mary A,0.0,supt constr,,,h 104 Central av,104 Central av,"(Central Avenue, Elizabeth, Charlotte, Mecklen...","(35.2045559, -80.7492833, 0.0)",35.204556,-80.749283,0.0
2,6,6,"Abbott Margaret Miss, h 1804 s Boulevard",White,Abbott Margaret,,Miss,,,0.0,,,,h 1804 s Boulevard,1804 s Boulevard,"(1804, South Boulevard, South End, Charlotte, ...","(35.1015849, -80.8808641, 0.0)",35.101585,-80.880864,0.0
3,7,7,"Abee Junius A, tel opr Sou Ry, bds 507 n Graham",White,Abee Junius A,,,,,0.0,tel opr,Sou Ry,,bds 507 n Graham,507 n Graham,,"(35.282425546132416, -80.77072058540818, 0.0)",35.282426,-80.770721,0.0
4,9,9,"*Abel Belle, h 14 Boundary al",Black,Abel Belle,,,,,0.0,,,,h 14 Boundary al,14 Boundary al,,"(35.2948349, -80.9605485, 0.0)",35.294835,-80.960549,0.0
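
Several directory entries share one address (three people at 906 Pine in the earlier output, for instance), so each distinct address only needs to be requested once. A minimal sketch of that idea with a dictionary cache follows. `make_cached` is a hypothetical helper, and the counting geocoder stands in for the real service.

```python
def make_cached(geocode):
    """Wrap geocode so repeated queries reuse the first result instead of re-requesting."""
    cache = {}

    def cached(query):
        if query not in cache:
            cache[query] = geocode(query)
        return cache[query]

    return cached


calls = []


def counting_geocode(query):
    """Offline stand-in that records every request it receives."""
    calls.append(query)
    return "result for " + query


cached_geocode = make_cached(counting_geocode)
addresses = ["906 Pine", "906 Pine", "906 Pine", "9 Short", "911 Pine"]
results = [cached_geocode(a) for a in addresses]
print(len(addresses), "lookups,", len(calls), "requests")
```

With pandas, `dfmin["Address"].apply(cached_geocode)` could be used in place of `.apply(geocode)`, assuming `cached_geocode` wraps the real geocoder.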


Next we extract the embedded coordinate data into their own columns.  This must be done prior to saving as a .csv because this coordinate data is lost when the location data is converted into a string.

Only around a third of these entries will return coordinate data, but a null field will cause problems when trying to map the data later.  To avoid this, we use **if loc else (0, 0, 0)** to insert an arbitrary location in null fields.

After this is complete, we export the dataset as a new .csv.

In [12]:

dfmin["point"] = dfmin["loc"].apply(lambda loc: tuple(loc.point) if loc else (0, 0, 0))
dfmin[['latitude', 'longitude', 'altitude']] = pd.DataFrame(dfmin['point'].tolist(), index=dfmin.index)
dfmin.to_csv("Charlotte1911final_example.csv")
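
To check how many rows actually matched, the placeholder can be counted after extraction. A small sketch follows, assuming a location object exposes a `.point` that converts to a `(latitude, longitude, altitude)` tuple as in the cell above. `to_point` and `match_rate` are hypothetical helpers, and `SimpleNamespace` objects stand in for geopy results.

```python
from types import SimpleNamespace

PLACEHOLDER = (0, 0, 0)  # same arbitrary fill value used for unmatched rows


def to_point(loc):
    """Return the coordinate tuple of a location, or the placeholder when loc is None."""
    return tuple(loc.point) if loc else PLACEHOLDER


def match_rate(points):
    """Fraction of points that are real matches rather than the placeholder."""
    if not points:
        return 0.0
    matched = sum(1 for p in points if p != PLACEHOLDER)
    return matched / len(points)


# Stand-in results: one match and two misses (coordinates copied from the Eiffel test).
locs = [SimpleNamespace(point=(48.85614465, 2.297820393322227, 0.0)), None, None]
points = [to_point(loc) for loc in locs]
print(points)
print(match_rate(points))
```

Rows holding the placeholder would plot at latitude 0, longitude 0, so they are worth filtering out or flagging before mapping.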

In [13]:
dfmin.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,Orig-copy,Race,Name,Title,Married,Wid,Spouse,BizDir,Job,Company,Company Details,Housing,Address,loc,point,latitude,longitude,altitude
0,2,2,"*Aaron Amelia, domestic 506 s Tryon",Black,Aaron Amelia,,,,,0.0,domestic,,,506 s Tryon,506 s Tryon,"(South Tryon, Sedgefield, Charlotte, Mecklenbu...","(35.19406535, -80.87804923742664, 0.0)",35.194065,-80.878049,0.0
1,3,3,"Abbey Simeon A (Mary A), supt constr Genl Fire...",White,Abbey Simeon A,,,,Mary A,0.0,supt constr,,,h 104 Central av,104 Central av,"(Central Avenue, Elizabeth, Charlotte, Mecklen...","(35.2217673, -80.8212754, 0.0)",35.221767,-80.821275,0.0
2,6,6,"Abbott Margaret Miss, h 1804 s Boulevard",White,Abbott Margaret,,Miss,,,0.0,,,,h 1804 s Boulevard,1804 s Boulevard,"(1804, South Boulevard, South End, Charlotte, ...","(35.21167430391686, -80.85791636515115, 0.0)",35.211674,-80.857916,0.0
3,7,7,"Abee Junius A, tel opr Sou Ry, bds 507 n Graham",White,Abee Junius A,,,,,0.0,tel opr,Sou Ry,,bds 507 n Graham,507 n Graham,,"(0, 0, 0)",0.0,0.0,0.0
4,9,9,"*Abel Belle, h 14 Boundary al",Black,Abel Belle,,,,,0.0,,,,h 14 Boundary al,14 Boundary al,,"(0, 0, 0)",0.0,0.0,0.0
