# Decision Tree and Random Forest Classifier without koi_score as an independent variable
I took out koi_score because I wasn't sure whether it was actually used to assign koi_disposition, and the experiment was easy to run, so I gave it a try. I then looked back at the website and learned that koi_score is a confidence factor, specifically: "A value between 0 and 1 that indicates the confidence in the KOI disposition. For CANDIDATEs, a higher value indicates more confidence in its disposition, while for FALSE POSITIVEs, a higher value indicates less confidence in that disposition."
So it needs to be included, and the results below confirm that.

In [1]:
# Imports
import pandas as pd

from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


In [2]:
exoplanet_complete_kNN = pd.read_csv('exoplanet_complete_kNN.csv')
exoplanet_complete_kNN.head()

Unnamed: 0,koi_disposition,koi_score,koi_period,koi_time0bk,koi_impact,koi_duration,koi_depth,koi_prad,koi_teq,koi_insol,koi_steff,koi_slogg,koi_srad,ra,dec,koi_kepmag
0,1,1.0,9.488036,170.53875,0.146,2.9575,615.8,2.26,793,93.59,5455,4.467,0.927,291.93423,48.141651,15.347
1,1,0.969,54.418383,162.51384,0.586,4.507,874.8,2.83,443,9.11,5455,4.467,0.927,291.93423,48.141651,15.347
2,3,0.0,19.89914,175.850252,0.969,1.7822,10829.0,14.6,638,39.3,5853,4.544,0.868,297.00482,48.134129,15.436
3,3,0.0,1.736952,170.307565,1.276,2.40641,8079.2,33.46,1395,891.96,5805,4.564,0.791,285.53461,48.28521,15.597
4,1,1.0,2.525592,171.59555,0.701,1.6545,603.3,2.75,1406,926.16,6031,4.438,1.046,288.75488,48.2262,15.509


### key for koi_disposition:
1 = CONFIRMED, 
2 = CANDIDATE, 
3 = FALSE POSITIVE
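
The numeric codes can be mapped back to readable labels when inspecting output. This is a minimal sketch, assuming the key above; `DISPOSITION_LABELS` and the small `example` frame are hypothetical names with made-up rows, not part of the original notebook.

```python
import pandas as pd

# Hypothetical lookup table built from the key above.
DISPOSITION_LABELS = {1: "CONFIRMED", 2: "CANDIDATE", 3: "FALSE POSITIVE"}

# Toy frame standing in for the real data (values are illustrative only).
example = pd.DataFrame({"koi_disposition": [1, 1, 3, 3, 2]})

# Add a readable label column next to the numeric code.
example["disposition_label"] = example["koi_disposition"].map(DISPOSITION_LABELS)

# Count how many rows fall in each class.
label_counts = example["disposition_label"].value_counts()
print(label_counts)
```

On the real table, the same `value_counts()` call would show how balanced the three classes are, which helps when reading the accuracy figures further down.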

In [3]:
# Remove koi_score from independent variables
exoplanet_trees = exoplanet_complete_kNN.drop("koi_score", axis=1)
exoplanet_trees.head()

Unnamed: 0,koi_disposition,koi_period,koi_time0bk,koi_impact,koi_duration,koi_depth,koi_prad,koi_teq,koi_insol,koi_steff,koi_slogg,koi_srad,ra,dec,koi_kepmag
0,1,9.488036,170.53875,0.146,2.9575,615.8,2.26,793,93.59,5455,4.467,0.927,291.93423,48.141651,15.347
1,1,54.418383,162.51384,0.586,4.507,874.8,2.83,443,9.11,5455,4.467,0.927,291.93423,48.141651,15.347
2,3,19.89914,175.850252,0.969,1.7822,10829.0,14.6,638,39.3,5853,4.544,0.868,297.00482,48.134129,15.436
3,3,1.736952,170.307565,1.276,2.40641,8079.2,33.46,1395,891.96,5805,4.564,0.791,285.53461,48.28521,15.597
4,1,2.525592,171.59555,0.701,1.6545,603.3,2.75,1406,926.16,6031,4.438,1.046,288.75488,48.2262,15.509


In [4]:
tree_target = exoplanet_trees["koi_disposition"]
tree_target_names = ["1", "2", "3"]

In [5]:
tree_data = exoplanet_trees.drop("koi_disposition", axis=1)
feature_names = tree_data.columns
tree_data.head()

Unnamed: 0,koi_period,koi_time0bk,koi_impact,koi_duration,koi_depth,koi_prad,koi_teq,koi_insol,koi_steff,koi_slogg,koi_srad,ra,dec,koi_kepmag
0,9.488036,170.53875,0.146,2.9575,615.8,2.26,793,93.59,5455,4.467,0.927,291.93423,48.141651,15.347
1,54.418383,162.51384,0.586,4.507,874.8,2.83,443,9.11,5455,4.467,0.927,291.93423,48.141651,15.347
2,19.89914,175.850252,0.969,1.7822,10829.0,14.6,638,39.3,5853,4.544,0.868,297.00482,48.134129,15.436
3,1.736952,170.307565,1.276,2.40641,8079.2,33.46,1395,891.96,5805,4.564,0.791,285.53461,48.28521,15.597
4,2.525592,171.59555,0.701,1.6545,603.3,2.75,1406,926.16,6031,4.438,1.046,288.75488,48.2262,15.509


In [7]:
# Separate data into train and test buckets
X_train, X_test, y_train, y_test = train_test_split(tree_data, tree_target, random_state=42)

In [8]:
# Decision Tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
clf.score(X_test, y_test)

0.6668334167083542
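
An accuracy figure is easier to judge next to a trivial baseline. The sketch below is not from the original notebook: it uses synthetic three-class data from `make_classification` as a stand-in for the exoplanet table, and compares a decision tree with scikit-learn's `DummyClassifier`, which always predicts the most frequent training class.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the exoplanet table (assumption).
X, y = make_classification(
    n_samples=600, n_features=14, n_informative=6, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Baseline that always predicts the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Decision tree fitted on the same split, seeded so the run is repeatable.
tree_clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
tree_acc = tree_clf.score(X_test, y_test)

print(f"baseline: {baseline_acc:.3f}  tree: {tree_acc:.3f}")
```

Running the same comparison on `X_train`, `X_test`, `y_train`, `y_test` from the cell above would show how far the tree's score sits above the majority-class rate.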

In [9]:
# Random Forest Classifier
rf = RandomForestClassifier(n_estimators=200)
rf = rf.fit(X_train, y_train)
rf.score(X_test, y_test)

0.7693846923461731
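
A random forest draws bootstrap samples and feature subsets at random, so without a fixed seed the score can shift slightly from run to run. The sketch below, on synthetic data rather than the exoplanet table, shows that passing `random_state` makes the result repeatable; `forest_score` is a hypothetical helper introduced only for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for the real features (assumption).
X, y = make_classification(
    n_samples=400, n_features=14, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)


def forest_score(seed):
    """Fit a seeded forest and return its accuracy on the held-out split."""
    rf = RandomForestClassifier(n_estimators=50, random_state=seed)
    rf.fit(X_train, y_train)
    return rf.score(X_test, y_test)


# Two fits with the same seed give the same accuracy.
first = forest_score(0)
second = forest_score(0)
print(first == second)
```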

In [10]:
# Sort features based on importance
sorted(zip(rf.feature_importances_, feature_names), reverse=True)

[(0.14854677652727374, 'koi_prad'),
 (0.11561534435296625, 'koi_depth'),
 (0.08831272195687136, 'koi_impact'),
 (0.08218128566167023, 'koi_period'),
 (0.07972332384947041, 'koi_duration'),
 (0.068201762766734, 'koi_insol'),
 (0.06178229424095573, 'koi_teq'),
 (0.05960024870573824, 'koi_time0bk'),
 (0.05348550446280442, 'koi_kepmag'),
 (0.05186083460126996, 'ra'),
 (0.05116965537737496, 'koi_steff'),
 (0.047305292287353404, 'koi_slogg'),
 (0.046850964756433286, 'koi_srad'),
 (0.045363990453083994, 'dec')]
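
scikit-learn normalizes the impurity-based importances of a fitted forest so that they sum to 1, which means each value can be read as a share: the top value above, about 0.149 for koi_prad, is roughly 15% of the total. The sketch below uses synthetic data and hypothetical feature names to show the same ranking held in a pandas Series, which is easier to slice or plot than a list of tuples.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with placeholder feature names (assumption, not the real table).
X, y = make_classification(
    n_samples=300,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=1,
)
names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

# Label each importance with its feature name and sort from largest to smallest.
importances = pd.Series(rf.feature_importances_, index=names).sort_values(
    ascending=False
)
print(importances)
print("total:", importances.sum())
```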

### I got worse results after removing koi_score. The remaining features ranked in almost the same order of importance as when koi_score was included.
# Summary:

### Decision Tree test accuracy: 66.7%
### Random Forest Classifier test accuracy: 76.9%
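
The with-and-without comparison can be wrapped in one helper so both runs share the same split and seed. This sketch does not reproduce the notebook's numbers: it uses synthetic data in which a hypothetical `score_like` column tracks the label closely and the other columns are pure noise. `holdout_accuracy` is likewise an illustrative name.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 600
label = rng.integers(1, 4, size=n)  # class codes 1, 2, 3

# Synthetic frame: one column that follows the label, two that are noise.
df = pd.DataFrame(
    {
        "label": label,
        "score_like": label + rng.normal(0, 0.1, size=n),
        "noise_a": rng.normal(size=n),
        "noise_b": rng.normal(size=n),
    }
)


def holdout_accuracy(frame, drop_cols=()):
    """Fit a seeded forest on all columns except the label and drop_cols."""
    X = frame.drop(columns=["label", *drop_cols])
    y = frame["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


acc_with = holdout_accuracy(df)
acc_without = holdout_accuracy(df, drop_cols=["score_like"])
print(f"with score_like: {acc_with:.3f}  without: {acc_without:.3f}")
```

Applied to the real table, the call would be along the lines of `holdout_accuracy(frame, drop_cols=["koi_score"])`, with `"label"` replaced by `"koi_disposition"` inside the helper.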