## Preparation of the Target Location Coordinates Column

The target location coordinates provide critical information for understanding where an air mission took place. The first job is to identify which type of coordinates each record reports, since the data mixes more than one type. The type is denoted in a separate column.

In [3]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import datetime as dt
import time
import os

%matplotlib inline

# Loading the initial cleaned data
data = '/Users/thomas.farrandibm.com/Documents/GitHub/vietnam_air_records/clean_data/initial_clean_data.csv'
df = pd.read_csv(data, dtype='object')

# Removing whitespace from the columns
for column in df.columns:
    df[column] = df[column].str.strip()

In [7]:
# U denotes UTM, L denotes lat/long style coordinates
df['TYPE OF COORDINATES REPORTED'].value_counts()

U    56047
L    53045
Name: TYPE OF COORDINATES REPORTED, dtype: int64

The two types are roughly balanced: about 51% of these records use UTM coordinates and about 49% use lat/long coordinates.

The next step is to convert the UTM values into lat/long figures in the same format as the other entries, so that every record uses a consistent coordinate format.
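As a first step toward that conversion, a grid reference can be split into its letter pair and numeric offsets. The sketch below is a minimal illustration, not the notebook's own method. It assumes the UTM entries are MGRS-style references such as 'XS199408': two letters naming a 100 km square, followed by an even number of digits split equally between easting and northing. `split_grid_reference` is a hypothetical helper name. References of this shape appear to carry no grid zone, so the sketch stops at metres within the square rather than producing lat/long.

```python
import re

# Assumed pattern: two square letters followed by digits only.
GRID_REF = re.compile(r'([A-Z]{2})(\d+)')


def split_grid_reference(ref):
    """Split a reference such as 'XS199408' into (square, easting_m, northing_m).

    Returns None when the string does not fit the assumed pattern.
    """
    match = GRID_REF.fullmatch(ref)
    if match is None:
        return None
    square, digits = match.groups()
    # The digits must divide evenly into an easting half and a northing half.
    if len(digits) % 2 != 0 or not 4 <= len(digits) <= 10:
        return None
    half = len(digits) // 2
    # Scale each half up to five digits, i.e. metres within the 100 km square.
    scale = 10 ** (5 - half)
    easting = int(digits[:half]) * scale
    northing = int(digits[half:]) * scale
    return square, easting, northing
```

Under these assumptions, a six-digit reference resolves to 100 m precision, a four-digit one to 1 km, and an eight-digit one to 10 m. Anything that does not fit the pattern comes back as `None` and can be set aside for manual review.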

In [19]:
utm_coords = df[df['TYPE OF COORDINATES REPORTED'] == 'U']
list(utm_coords['TARGET LOCATION COORDINATES (BEGIN)'].unique())

['XS199408',
 'BT437019',
 'AT970455',
 'AT920412',
 'ZA050765',
 'ZA050762',
 'ZA040770',
 'BQ630670',
 'XS274438',
 'BR678765',
 'ZC166665',
 'ZC205667',
 'BS457939',
 'BT370073',
 'BT365092',
 'BS610925',
 'BS400070',
 'BT421052',
 'BS4549O9',
 'ZC185660',
 'ZC178684',
 'BS440980',
 'AT968501',
 'BT341021',
 'BR832711',
 'BR853705',
 'WR972321',
 'XS535928',
 'YT049036',
 'XS505981',
 'XS195407',
 'WR132504',
 'WS475045',
 'XT277607',
 'BR900864',
 'BR853610',
 'X5513980',
 'WS47045',
 'WS430-040',
 'WS455-070',
 'XT282603',
 'WR972320',
 'WS480045',
 'XT890188',
 'XT880210',
 'XT890190',
 'ZA056751',
 'BS455908',
 'BS338820',
 'ZA035742',
 'ZA038778',
 'BS358825',
 'BS445912',
 'XR635710',
 'XS313450',
 'XS925243',
 'AT929450',
 'BQ545485',
 'YC4985',
 'YD452541',
 'BS8143-6621',
 'BS8220',
 'BR45374525',
 'BR43304335',
 'BR54366156',
 'BR57566256',
 'YS3490',
 'YT4422',
 'YU57086200',
 'YT84608452',
 'XS4930-4917',
 'XS4621-4729',
 'WR2009',
 'XR0633',
 'YT2090',
 'XT5838',
 'BR70
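Several of the values listed above do not follow the common pattern of two letters plus four, six, or eight digits: 'BS4549O9' has a letter O among the digits, 'X5513980' has a digit where the second letter should be, 'WS47045' has an odd number of digits, and 'WS430-040' and 'BS8143-6621' contain hyphens. The sketch below shows one way to label such entries before attempting any conversion. The function name and the labels are hypothetical, and the set of accepted digit counts is an assumption based only on the values visible here.

```python
import re

# Assumed well-formed shape: two letters, then 4, 6 or 8 digits.
WELL_FORMED = re.compile(r'[A-Z]{2}(\d{4}|\d{6}|\d{8})')


def describe_grid_reference(ref):
    """Return 'ok' for a well-formed reference, else a short reason label."""
    if WELL_FORMED.fullmatch(ref):
        return 'ok'
    if '-' in ref:
        return 'contains hyphen'
    if not re.fullmatch(r'[A-Z]{2}', ref[:2]):
        return 'bad square letters'
    if not ref[2:].isdigit():
        return 'non-digit in numeric part'
    return 'unexpected digit count'
```

Applied to the column with something like `utm_coords['TARGET LOCATION COORDINATES (BEGIN)'].dropna().map(describe_grid_reference).value_counts()`, this would give a rough count of how many entries need repair and of which kind.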

In [8]:
list(df.columns)

['Unnamed: 0',
 'MISSION DATE',
 'MISSION IDENTIFIER',
 'TARGET SEQUENCE IDENTIFIER',
 'SORTIE SEQUENCE ID',
 'LAUNCH UNIT IDENTIFIER',
 'MISSION PRIMARY FUNCTION',
 'SORTIE FUNCTION',
 'SECURITY CLASSIFICATION',
 'CORRECTION CODE IDENTIFIER',
 'MISSION ABBREVIATION NUMBER',
 'SERVICE SUPPORTED DESIGNATOR',
 'MISSION ORIGINALLY SCHEDULED',
 'MISSION NICKNAME',
 'OPERATION SUPPORTED',
 'COUNTRY OF ORIGIN',
 'SORTIE LAUNCH BASE OR HULL NO',
 'TYPE/MODEL/SERIES AIRCRAFT',
 'NUMBER OF SORTIE AIRCRAFT',
 'SORTIE FLYING HOURS',
 'TARGET TIME',
 'PERIOD OF DAY',
 'TARGET CONTROL',
 'ELECTRONIC WARFARE SORTIE EFFECTIVENESS',
 'REFUEL INDICATOR',
 'TYPE OF SORTIE IDENTIFIER',
 'MULTI-TARGET/FUNCTION DESIGNATOR',
 'TARGET/OBJECTIVE TYPE CODE',
 'TARGET COUNTRY',
 'TARGET CORPS AREA/ROUTE AREA PACKAGE',
 'PROVINCE CODE',
 'POINT/SEGMENT/AREA TARGET',
 'TYPE OF COORDINATES REPORTED',
 'TARGET LOCATION COORDINATES (BEGIN)',
 'POSTA FIELD IN DEGREES AND THOUSANDTHS',
 'RECONNAISSANCE TERM/TURN POSIT