# Create and Use the Model

This notebook takes the model tuned in the SpyPlane-OptimizingModel.ipynb notebook, trains it on the labeled data, and uses it to classify the remaining planes.

This project is based on the BuzzFeed News article on identifying spy planes found [here](https://www.buzzfeednews.com/article/peteraldhous/hidden-spy-planes), using data and code adapted from their GitHub repository [here](https://github.com/BuzzFeedNews/2017-08-spy-plane-finder).

In [None]:
%matplotlib inline
#import packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#scikit-learn is a library of machine learning algorithms
from sklearn.ensemble import RandomForestClassifier

#package for saving our ML model
import pickle

In [None]:
#read in data
planes_labeled = pd.read_csv("/mnt/data/planes_labeled.csv")

In [None]:
#format data by keeping only the numeric feature columns and factorizing the class
X = planes_labeled[['steer1', 'steer2', 'steer4', 'steer5', 'steer6', 'squawk_1', 'altitude3']]
y = pd.factorize(planes_labeled['class'])[0]
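
`pd.factorize` numbers labels in order of first appearance, so the integer that stands for the surveillance class depends on the row order of the file. The later cells treat code 1 as the surveillance class, so the mapping is worth a quick check. A minimal sketch, assuming hypothetical label values `'other'` and `'surveil'` (the real values in `planes_labeled['class']` may differ):

```python
import pandas as pd

# Hypothetical labels; the real values in planes_labeled['class'] may differ.
labels = pd.Series(['other', 'surveil', 'other', 'surveil'])

# factorize numbers the labels in order of first appearance
codes, uniques = pd.factorize(labels)

# Map each integer code back to its original label
code_to_label = dict(enumerate(uniques))
print(code_to_label)
```

If the surveillance label turns out to map to 0 rather than 1, the later comparisons against 1 and the choice of probability column would need to be flipped.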

In [None]:
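# Create a model based on parameters from the random grid search:
#(n_estimators=1100, max_depth=50, max_features='sqrt', min_samples_split=4, bootstrap=False)
np.random.seed(415)
model_tuned = RandomForestClassifier(n_estimators=1100, max_depth=50, max_features='sqrt',
                                     min_samples_split=4, bootstrap=False)

#train model with the features created above
model_tuned.fit(X, y)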

### Classify all data

#### Gather and format data
First we'll remove all of the training data and the known federal planes from the entire data set (which is in the planes_features file).
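
One common way to drop rows by identifier in pandas is a boolean mask built with `isin` and inverted with `~`. A minimal sketch on a toy table; the `_demo` names and identifier values are made up for illustration and are not part of the real data:

```python
import pandas as pd

# Toy stand-ins for the full planes table and the identifiers to remove
planes_demo = pd.DataFrame({'adshex': ['A1', 'B2', 'C3', 'D4'],
                            'steer1': [0.1, 0.2, 0.3, 0.4]})
remove_demo = ['A1', 'C3']

# isin marks rows whose adshex appears in the list; ~ inverts the mask
classify_demo = planes_demo[~planes_demo['adshex'].isin(remove_demo)]
print(classify_demo.shape)
```

Comparing the row count before and after filtering is a quick sanity check that the expected number of planes was removed.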

In [None]:
#read in all data
feds = pd.read_csv("/mnt/data/feds.csv")
train = pd.read_csv("/mnt/data/train.csv")
planes = pd.read_csv('/mnt/data/planes_features.csv')

In [None]:
#first gather list of federal plane identifiers to remove
fed_ids = list(feds['adshex'])
len(fed_ids)

In [None]:
#next add the training plane identifiers to the list of identifiers to remove
remove = fed_ids + list(train['adshex'])
len(remove)

In [None]:
#remove all the rows with adshex values in the 'remove' list created above
classify = planes[~planes['adshex'].isin(remove)]

In [None]:
#look at number of rows and columns in the data left to classify (classify dataframe)
classify.shape

In [None]:
X_all = classify[['steer1', 'steer2', 'steer4', 'steer5', 'steer6', 'squawk_1', 'altitude3']]

In [None]:
X_all.head()

#### Make Predictions

Then we'll use our model from the previous section to output the planes it flags as potential surveillance planes, and join this data with the [FAA aircraft registration database](https://www.faa.gov/licenses_certificates/aircraft_certification/aircraft_registry/releasable_aircraft_download/), which gives the planes' registration numbers and the organizations they are registered to.
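
A left join keeps every row of the left table and fills in registry columns only where the key matches. A minimal sketch with made-up identifiers and names; the `_demo` tables are illustrative stand-ins, not the real candidate or FAA data:

```python
import pandas as pd

# Toy candidate planes (left table) and a toy registry (right table)
candidates_demo = pd.DataFrame({'adshex': ['A1', 'B2'], 'steer1': [0.9, 0.8]})
registry_demo = pd.DataFrame({'adshex': ['A1', 'Z9'],
                              'n_number': ['111AA', '999ZZ'],
                              'name': ['EXAMPLE LLC', 'OTHER INC']})

# how='left' keeps both candidates; unmatched registry fields become NaN
joined_demo = candidates_demo.merge(registry_demo, on='adshex', how='left')
print(joined_demo)
```

Rows with missing registry fields after the join are candidates whose identifier did not match any registry entry exactly, which can also happen when the key columns differ in case or whitespace.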

In [None]:
real_predictions = model_tuned.predict(X_all)

In [None]:
#look at number of predicted spy planes
sum(real_predictions)

In [None]:
#create data frame with only those potential spy planes
candidates = classify[real_predictions == 1]

In [None]:
#print out first few rows
candidates.head()

In [None]:
#read in FAA data
faa = pd.read_csv("/mnt/data/faa-registration.csv")
faa.head()

In [None]:
#look at the column names in the FAA registration dataframe
faa.columns

In [None]:
#separate out the columns we want to use
plane_info = faa[['N-NUMBER', 'NAME', 'MODE S CODE HEX']].copy()
plane_info.rename(columns = {'N-NUMBER':'n_number', 'NAME':'name', 'MODE S CODE HEX':'adshex'}, inplace = True) 

In [None]:
#combine the candidates dataframe with the plane_info dataframe (use a left join with the candidates as the left table)
spy_candidates = candidates.merge(plane_info, on = 'adshex', how = 'left')


#### Look at predicted probabilities

Here, we'll calculate the probabilities and sort them in descending order.
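
In scikit-learn, `predict_proba` returns one column per class, ordered as in the fitted model's `classes_` attribute, so it helps to look up the column for the class of interest rather than assume its position. A minimal sketch on synthetic data; the `_demo` names and the data are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: class 1 whenever the first feature is positive
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

model_demo = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# One row per sample, one column per class, in the order of model_demo.classes_
proba = model_demo.predict_proba(X_demo[:5])
pos_col = list(model_demo.classes_).index(1)

# Indices of the samples from highest to lowest class-1 probability
order = np.argsort(-proba[:, pos_col])
print(proba[order, pos_col])
```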

In [None]:
#get the predicted class probabilities for each plane (one column per class)
probability_pred = model_tuned.predict_proba(X_all)


In [None]:
#add the surveillance plane probabilities to a copy of the data frame
classify_prob = classify.copy()
classify_prob.loc[:,'spy_prob'] = probability_pred[:,1]

In [None]:
#sort values by the column 'spy_prob' from highest to lowest - make sure to do so 'inplace'
classify_prob.sort_values(by = 'spy_prob', ascending = False, inplace = True)


In [None]:
classify_prob.head()

In [None]:
#merge the 'classify_prob' dataframe with FAA names and registration numbers
classify_prob_faa= classify_prob.merge(plane_info, on = 'adshex', how = 'left')

In [None]:
#separate out only those rows with probabilities greater than 0.5 and the relevant columns
relevant_cols = ['adshex', 'type', 'spy_prob', 'n_number', 'name', 'squawk_1', 'steer1', 'steer2', 'steer4', 'steer5', 'steer6', 'altitude3']
candidates_with_prob = classify_prob_faa.loc[classify_prob_faa['spy_prob'] > 0.5, relevant_cols]

In [None]:
#look at the top 15 results
candidates_with_prob.head(15)

In [None]:
#save the spy candidates data frame to a csv file
candidates_with_prob.to_csv("/mnt/data/spy_candidates.csv", index = False)

In [None]:
#save the confirmed federal surveillance planes with their relevant data to file
feds_data = planes_labeled[planes_labeled['adshex'].isin(fed_ids)]
feds_data = feds_data[['adshex', 'steer1', 'steer2', 'steer4', 'steer5', 'steer6', 'squawk_1', 'altitude3']]
feds_data.to_csv('/mnt/data/feds_data.csv')

In [None]:
# save the model to disk
file_loc = '/mnt/data/SpyPlane-RandomForest.sav'
with open(file_loc, 'wb') as f:
    pickle.dump(model_tuned, f)
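
A saved model can be read back with `pickle.load`. A minimal round-trip sketch using a small stand-in model and a temporary file (the path and `_demo` names are hypothetical, not the notebook's `file_loc`). Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small stand-in model trained on synthetic data
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(50, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)
model_demo = RandomForestClassifier(n_estimators=5, random_state=0).fit(X_demo, y_demo)

# Hypothetical path inside a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo-model.sav')
    with open(demo_path, 'wb') as f:
        pickle.dump(model_demo, f)
    with open(demo_path, 'rb') as f:
        reloaded = pickle.load(f)

# The reloaded model should reproduce the original predictions
same = np.array_equal(model_demo.predict(X_demo), reloaded.predict(X_demo))
print(same)
```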