## Overview of this Project

Okay, so we're going to write a script that takes a string of coordinate data and removes special characters (i.e. anything that isn't '.', '-', or a digit 0-9).

There is also an instance where two coordinate points were put in the same cell to indicate uncertainty about the location. This will need to be addressed as well: if we just delete the special characters separating the two, the digits run together and we will have nonsense.

There are also coordinates that are NOT in decimal format (they are written as degrees, minutes, and seconds), which need to be converted.



### Code Segment One


In this block of code, we are trying to read in a column from our spreadsheet and turn it into a list 

The code below would work for an Excel document; however, since we are working in a Linux environment, we only have access to .ods files. I am keeping the Excel-compatible code for reference.

In [1]:
#import pandas as pd

#latitude = pd.read_excel('EarlyModernShapeFile.ods', sheet_name=0)
#latlist = latitude['Latitude'].tolist()

Take two! This time we saved the spreadsheet in csv format so it can be read :) 

In [2]:
import pandas as pd

df = pd.read_csv('EarlyModernPreClean.csv')

Just typing in the name df (short for DataFrame) will print out the table of points. Calling its head() method shows only the first five rows.

This allows us to verify our spreadsheet data was ported over properly

In [3]:
df.head()

Unnamed: 0,Site,Latitude,Longitude
0,Boccalama B,45.388939,12.280872
1,Boccalama A,45.388639,12.280894
2,Contarina 1,45.024497,12.217022
3,Culip VI,42.321567,3.310431
4,Les Sorres X,41.277389,1.993044


This takes our columns for latitude and longitude and turns each one into a list of strings. Now we can start accessing the characters.

In [4]:
latList = df['Latitude'].tolist()

longList = df['Longitude'].tolist()

### Code Segment Two

Now we need to look at the list of strings and remove the special characters. Where more than one potential coordinate is listed in a cell, we can't assume the separator will be the same in every instance, so it will be best to replace the special characters with spaces and then check at the end for any numbers separated by a space.

OKAY, there are some instances where the coordinates are NOT in decimal format so we've got to figure out how to work with that -_-

#### Changing to Decimal Format

Alright, we're going to need to be able to isolate each coordinate in the incorrect data format, hold its position in the spreadsheet, alter it, then place it back in position.

In [5]:
latToConvert = []
longToConvert = []

latCounter = []
longCounter = []

counter = 0
for latitude in latList:
    if "N" in latitude or "S" in latitude:
        latToConvert.append(latitude)
        latCounter.append(counter)
    counter += 1
    
counter = 0
for longitude in longList:
    if "E" in longitude or "W" in longitude:
        longToConvert.append(longitude)
        longCounter.append(counter)
    counter += 1
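
As a side note, the manual counter above can also be written with `enumerate`, which hands back each position alongside its value. This is only a minimal sketch: `sampleLatList` is a made-up stand-in for the real `latList`, so the positions it prints are not the ones from the spreadsheet.

```python
# Stand-in data for illustration only; the real latList comes from the spreadsheet.
sampleLatList = ["45.388939", ' 38°31\'23.84"N', "42.321567", ' 21° 8\'47.06"N']

latToConvert = []
latCounter = []

# enumerate() yields (position, value) pairs, so no separate counter is needed
for position, latitude in enumerate(sampleLatList):
    if "N" in latitude or "S" in latitude:
        latToConvert.append(latitude)   # the string that needs converting
        latCounter.append(position)     # where it lives in the list

print(latToConvert)
print(latCounter)
```

The same loop works for the longitude list by checking for "E" or "W" instead.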

In [6]:
print(latToConvert)
print(latCounter)

print(longToConvert)
print(longCounter)

[' 38°31\'23.84"N', ' 21° 8\'47.06"N']
[176, 379]
[' 28°38\'6.89"W', ' 75°49\'23.25"W']
[176, 379]


Alright! Now we've got our list of coordinates in the wrong data format... now all we need to do is convert

In this experiment list, some of the characters separating the degrees/minutes/seconds are incorrect, so I think it'll be best to replace special characters with spaces, collapse multiple spaces into one, and do the calculations based on that :)

In [7]:
import re

convertedLat = []
convertedLong = []

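def converter(coordinate):
    # Pull the degrees, minutes, and seconds out of the string as floats
    temp = []
    data = re.sub(r"[^0-9.-]+", ' ', coordinate)
    for c in data.split():
        try:
            temp.append(float(c))
        except ValueError:
            pass
    # decimal degrees = degrees + minutes/60 + seconds/3600
    # (named "decimal" so the built-in input() is not shadowed)
    decimal = temp[0] + (temp[1]/60) + (temp[2]/3600)
    return decimal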

# North stays positive, South becomes negative
for latitude in latToConvert:
    if "N" in latitude:
        decimal = converter(coordinate = latitude)
        convertedLat.append(decimal)
    if "S" in latitude:
        decimal = -1 * converter(coordinate = latitude)
        convertedLat.append(decimal)
print(convertedLat)

# East stays positive, West becomes negative
for longitude in longToConvert:
    if "E" in longitude:
        decimal = converter(coordinate = longitude)
        convertedLong.append(decimal)
    if "W" in longitude:
        decimal = -1 * converter(coordinate = longitude)
        convertedLong.append(decimal)
print(convertedLong)

[38.523288888888885, 21.146405555555557]
[-28.635247222222223, -75.82312499999999]
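
To spell out the arithmetic: 38°31'23.84" works out to 38 + 31/60 + 23.84/3600, which is roughly 38.5233. The conversion and the hemisphere sign could also live in one function. The sketch below is one possible way to do that, not the code used in this notebook; `dms_to_decimal` is a hypothetical name, and it assumes every string holds exactly degrees, minutes, and seconds in that order.

```python
import re

def dms_to_decimal(coordinate):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    # Replace everything that is not a digit, '.', or '-' with a space,
    # then split on whitespace to get the numeric pieces
    parts = re.sub(r"[^0-9.-]+", " ", coordinate).split()
    # Assumption: the first three pieces are degrees, minutes, seconds
    degrees, minutes, seconds = (float(p) for p in parts[:3])
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention
    if "S" in coordinate or "W" in coordinate:
        decimal = -decimal
    return decimal

print(dms_to_decimal(' 38°31\'23.84"N'))
print(dms_to_decimal(' 28°38\'6.89"W'))
```

If a string has fewer than three numeric pieces, the unpacking line raises a `ValueError`, which is a useful signal that the cell needs a closer look by hand.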


Now that we have the converted coordinate points, it is time to use them to replace the originals in the main list...

In [8]:
counter = 0
for position in latCounter:
    latList[position] = str(convertedLat[counter])
    counter += 1

counter = 0
for position in longCounter:
    longList[position] = str(convertedLong[counter])
    counter += 1
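
The put-it-back step pairs each saved position with its converted value, which is what `zip` is for. A small sketch under the same caveat as before: `sampleLatList` is stand-in data, and only the one converted value shown in the output above is reused.

```python
# Stand-in data for illustration only
sampleLatList = ["45.388939", ' 38°31\'23.84"N', "42.321567"]
latCounter = [1]                      # position of the value that was converted
convertedLat = [38.523288888888885]   # its decimal-degree equivalent

# zip() walks both lists in step: first position with first value, and so on
for position, value in zip(latCounter, convertedLat):
    sampleLatList[position] = str(value)

print(sampleLatList)
```

Storing the value back as a string keeps the list uniform, so the regex cleanup that follows can treat every element the same way.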


#### Deleting extra characters

In [9]:
cleanLatList = []
cleanLongList = []

for latitude in latList:
    cleanLatList.append(re.sub(r"[^0-9.-]+", ' ', latitude))
    
for longitude in longList:
    cleanLongList.append(re.sub(r"[^0-9.-]+", ' ', longitude))
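
To see what that pattern actually does, here it is run on a few cells. The sample strings are invented for illustration; the stray characters in the real spreadsheet may well be different.

```python
import re

# Made-up messy cells for illustration; the separators in the real data may differ
samples = ["45.388939", " 12.280872*", "18.416667 / 18.373354"]

cleaned = []
for cell in samples:
    # Each run of characters that is not a digit, '.', or '-' becomes one space;
    # strip() then drops any space left at the start or end
    cleaned.append(re.sub(r"[^0-9.-]+", " ", cell).strip())

for value in cleaned:
    print(repr(value))
```

Because the `+` matches a whole run at once, " / " collapses to a single space. One thing to watch for: '-' and '.' are on the keep list, so a cell that uses a dash as the separator between two coordinates would keep that dash.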


The print statements below verify our converted coordinates are in the master list

In [10]:
print(cleanLongList[176])
print(cleanLongList[379])

print(cleanLatList[176])
print(cleanLatList[379])

-28.635247222222223
-75.82312499999999
38.523288888888885
21.146405555555557


Okey-dokey, I was hoping we could just look at the data point with 

In [11]:
for point in cleanLongList:
    if " " in point.strip():
        print(point)

18.416667 18.373354
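
How to settle a two-value cell is a judgment call that the data alone can't make. As a rough sketch, the cell can at least be split into its candidate numbers. `split_candidates` is a hypothetical helper, and taking the midpoint is purely an assumption for illustration, not a decision made in this notebook.

```python
def split_candidates(cell):
    """Return every number found in a cleaned coordinate cell."""
    return [float(part) for part in cell.split()]

candidates = split_candidates("18.416667 18.373354")
print(candidates)

# Assumption for illustration: use the midpoint as a single stand-in value.
# Keeping the first value, or keeping both as separate rows, are equally valid.
midpoint = sum(candidates) / len(candidates)
print(midpoint)
```

Whichever rule is chosen, the latitude in the same row needs the same treatment so the pair still describes one location.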


### Code Segment Three

Alright, here we are going to make a brand new csv file with all our data points ready to be uploaded into a map :)
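
One possible shape for that step, using the standard `csv` module. This is a sketch, not the final code: `write_points` is a hypothetical helper, the two rows are copied from the `df.head()` output above as stand-ins, and it writes to an in-memory buffer so nothing on disk is touched.

```python
import csv
import io

# Stand-in rows; site names and values are taken from the df.head() output above
sites = ["Boccalama B", "Boccalama A"]
cleanLatList = ["45.388939", "45.388639"]
cleanLongList = ["12.280872", "12.280894"]

def write_points(handle, sites, latitudes, longitudes):
    """Write one header row, then one row per site, to an open file handle."""
    writer = csv.writer(handle)
    writer.writerow(["Site", "Latitude", "Longitude"])
    for site, lat, lon in zip(sites, latitudes, longitudes):
        # float() fails loudly if a cell still holds two values or stray text
        writer.writerow([site, float(lat), float(lon)])

buffer = io.StringIO()
write_points(buffer, sites, cleanLatList, cleanLongList)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open(path, "w", newline="")` in place of the buffer.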