# Planetary Stability

The dataset consists of 1500 artificially generated 3-planet systems.  All planets have a mass of $5M_\oplus$, which I think is roughly where the Kepler sample should peak.  Each system was integrated for 1 million orbits of the inner planet (initialized at 0.05 AU), and the Stable column records whether the system survived.  The next two planets were each initialized between 0 and 10 Hill radii farther out.  Each run records the initial orbital parameters of all 3 planets.  Each column is described below:

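* ID: Unique identifier
* Stable: Whether the system was still stable after 1e6 orbits of the inner planet
* Norbits: Number of inner-planet orbits integrated (1e6 for all systems)
* Mplanet: Planet mass (the same for all systems)
* RH/a: The Hill radius, scaled by the semimajor axis.  This depends only on the planet/star mass ratio, so it is the same for all systems.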
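
The `RH/a` value of 0.021544 shown in the table can be reproduced from the `Mplanet` value of 1.5e-05, under two assumptions that are inferred from the numbers rather than stated in the text: that `Mplanet` is the planet-to-star mass ratio, and that the mutual Hill radius of two equal-mass neighbors is meant. A minimal sketch:

```python
# Assumption: Mplanet (1.5e-05 in the table) is the planet/star mass ratio,
# and RH/a is the mutual Hill radius of two equal-mass planets:
#   RH/a = ((m1 + m2) / (3 * Mstar)) ** (1/3)
mass_ratio = 1.5e-05

rh_over_a = (2 * mass_ratio / 3) ** (1 / 3)
print(round(rh_over_a, 6))
```

Under these assumptions the result agrees with the `RH/a` column to the six decimal places shown.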

For each planet (numbered 1,2,3), we record the initial

* a: semimajor axis
* P: orbital period
* e: eccentricity
* pomega: longitude of pericenter (the direction in which an eccentric orbit is oriented; this is Omega + omega in the figure below, and is useful for low-inclination orbits)
* inc: inclination
* Omega: longitude of the node (how the orbital plane is oriented)
* f: true anomaly (how far the planet started from pericenter)
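
Since pomega = Omega + omega, the argument of pericenter omega can be recovered from the two recorded angles. The sketch below assumes all angles are in radians and that wrapping onto [-pi, pi) is acceptable; some tabulated values (for example pomega1 = -5.286663 for system 1496) lie outside that interval. The helper names `wrap_angle` and `arg_pericenter` are made up for this illustration.

```python
import math

def wrap_angle(theta):
    """Map an angle in radians onto [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi

def arg_pericenter(pomega, Omega):
    """omega = pomega - Omega, wrapped onto [-pi, pi)."""
    return wrap_angle(pomega - Omega)

# pomega1 and Omega1 of system 1496, copied from the df.tail() output
pomega1, Omega1 = -5.286663, -3.133337
omega1 = arg_pericenter(pomega1, Omega1)
print(omega1, wrap_angle(pomega1))
```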

The figure below shows these angles.

All planets were started with e and inc drawn uniformly between 0 and 0.01 (radians for inc).
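
A quick way to sanity-check this range is to compare it with recorded values. The sketch below uses only the `e1` and `Stable` entries for IDs 0-4 that appear in the `df.head()` output later in this notebook; the bound `E_MAX` and the variable names are illustrative, and the source does not say how recorded values relate to the sampled ones.

```python
E_MAX = 0.01  # upper bound stated for the initial eccentricities

# e1 and Stable for IDs 0-4, copied from the df.head() output
e1_head = {0: 0.005776, 1: 0.005504, 2: 0.074062, 3: 0.082, 4: 0.004748}
stable_head = {0: True, 1: True, 2: False, 3: False, 4: True}

# IDs whose recorded e1 falls outside the stated range
outside = [i for i, e in e1_head.items() if not 0 <= e <= E_MAX]
for i in outside:
    print(i, e1_head[i], "stable" if stable_head[i] else "unstable")
```

In this five-row sample, the rows above the stated bound are both marked unstable; whether that pattern holds across the full table would need the same check on `df`.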

![orbits](images/orbit.png)

In [33]:
import pandas as pd
import numpy as np
df = pd.read_csv('../data/1e6data.csv', index_col=0)
df.tail()

ID,Stable,Norbits,Mplanet,RH/a,a1,P1,e1,pomega1,inc1,Omega1,...,inc2,Omega2,f2,a3,P3,e3,pomega3,inc3,Omega3,f3
1495,False,1000000.0,1.5e-05,0.021544,0.050077,0.011206,0.042378,-0.36356,0.011013,-0.670975,...,0.003859,3.03118,-1.069728,0.065008,0.016574,0.04362,0.502205,0.005477,0.221792,1.924513
1496,False,1000000.0,1.5e-05,0.021544,0.049419,0.010986,0.031753,-5.286663,0.013705,-3.133337,...,0.008787,-2.197973,-3.304425,0.060634,0.01493,0.040648,1.662655,0.003963,-0.432332,-0.462635
1497,False,1000000.0,1.5e-05,0.021544,0.04948,0.011006,0.069293,2.372936,0.006937,3.068393,...,0.00652,1.42274,-0.526362,0.063639,0.016054,0.058353,-1.045209,0.017926,-0.082765,0.235402
1498,False,1000000.0,1.5e-05,0.021544,0.049251,0.01093,0.016901,2.892971,0.016794,1.484133,...,0.014752,-2.369802,2.47411,0.061253,0.015159,0.030802,0.270703,0.034449,0.287326,-0.0495
1499,False,1000000.0,1.5e-05,0.021544,0.048381,0.010642,0.045662,3.261847,0.008018,1.181873,...,0.002352,1.869814,3.49308,0.064895,0.016531,0.056288,-2.734529,0.019173,-2.52057,0.1505


In [34]:
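def minHillSep(system):
    """Smallest separation between adjacent planet pairs, in Hill radii."""
    def HillSepPair(a1, a2, rha):
        # Hill radius evaluated at the smaller semimajor axis of the pair
        mina = min(a1, a2)
        return abs(a2 - a1) / (mina * rha)
    a12 = HillSepPair(system['a1'], system['a2'], system['RH/a'])
    a23 = HillSepPair(system['a2'], system['a3'], system['RH/a'])
    return min(a12, a23)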
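
To make the calculation concrete, here is a standalone restatement of the same formula applied to a hypothetical system. The semimajor axes and `RH/a` below are chosen for easy arithmetic and are not taken from the dataset; `min_hill_sep` is an illustrative name.

```python
def min_hill_sep(system):
    """Standalone restatement of minHillSep for a dict-like system."""
    def pair(a_in, a_out, rha):
        # separation in units of the Hill radius at the smaller semimajor axis
        return abs(a_out - a_in) / (min(a_in, a_out) * rha)
    rha = system['RH/a']
    return min(pair(system['a1'], system['a2'], rha),
               pair(system['a2'], system['a3'], rha))

# Hypothetical system, not a row of the dataset
toy = {'a1': 1.0, 'a2': 1.1, 'a3': 1.3, 'RH/a': 0.02}
sep = min_hill_sep(toy)
print(sep)
```

The inner pair is about 5 Hill radii apart and the outer pair about 9.1, so the function returns the inner-pair value.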

In [35]:
df['minHillSep'] = df.apply(minHillSep, axis=1)
df.head()

ID,Stable,Norbits,Mplanet,RH/a,a1,P1,e1,pomega1,inc1,Omega1,...,Omega2,f2,a3,P3,e3,pomega3,inc3,Omega3,f3,minHillSep
0,True,1000000.0,1.5e-05,0.021544,0.050005,0.011182,0.005776,0.238565,0.004611,-0.081291,...,1.631981,-5.303298,0.066488,0.017144,0.009993,-1.222611,0.013411,1.838312,5.118309,7.087599
1,True,1000000.0,1.5e-05,0.021544,0.049997,0.011179,0.005504,-1.829833,0.014697,-1.078126,...,-1.144552,-1.314599,0.069424,0.018292,0.011354,3.09872,0.004279,0.624017,-5.264576,8.275992
2,False,1000000.0,1.5e-05,0.021544,0.047969,0.010506,0.074062,2.319163,0.006916,-0.625531,...,-1.288583,-3.124795,0.066854,0.017285,0.112981,-1.368589,0.004592,0.174013,-0.644255,7.711845
3,False,1000000.0,1.5e-05,0.021544,0.047724,0.010425,0.082,2.891776,0.010222,0.869752,...,-2.319534,-2.894826,0.068622,0.017976,0.06725,-2.265014,0.006661,-0.807832,-0.372393,9.184164
4,True,1000000.0,1.5e-05,0.021544,0.049995,0.011179,0.004748,2.438079,0.013462,-0.013445,...,-1.016651,-3.573545,0.067997,0.017731,0.006454,2.426652,0.001658,-0.559482,-3.150335,7.714585


In [36]:
y = df['Stable']
# Features: every column from a1 onward, including the derived minHillSep
X = df.loc[:, 'a1':]
#X = df[['inc1', 'minHillSep']].values

In [37]:
from sklearn.model_selection import train_test_split
# Hold out 30% of the systems for the final evaluation
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.3, random_state=42)
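
For a sense of scale, the split sizes follow directly from the 1500 systems and `test_size=0.3`. The sketch assumes the holdout count is the test fraction rounded up.

```python
import math

n_systems = 1500   # size of the dataset, from the description at the top
test_size = 0.3    # fraction passed to train_test_split

# Assumption: the holdout size is the test fraction rounded up
n_holdout = math.ceil(n_systems * test_size)
n_train = n_systems - n_holdout
print(n_train, n_holdout)

# Change in accuracy caused by a single misclassified holdout system
print(1 / n_holdout)
```

With 450 holdout systems, each misclassification moves the accuracy by about 0.0022, so a holdout accuracy of 449/450 would correspond to exactly one misclassified system.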

In [38]:
from sklearn.model_selection import StratifiedShuffleSplit, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import calibration_curve

# 10 random stratified splits, each holding out 10% of the training set
cv_s = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=42)
rfc = RandomForestClassifier(max_features='sqrt', n_estimators=50)
# The grid holds a single combination, so this is a cross-validated fit of one model
param_grid = {
        'n_estimators': [500],
        'max_features': ['sqrt']}
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, scoring="roc_auc", cv=cv_s, n_jobs=-1)
CV_rfc.fit(X_train, y_train)

GridSearchCV(cv=StratifiedShuffleSplit(labels=[ True False ..., False  True], n_iter=10, test_size=0.1, random_state=42),
       error_score='raise',
       estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_estimators': [500], 'max_features': ['sqrt']},
       pre_dispatch='2*n_jobs', refit=True, scoring='roc_auc', verbose=0)
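
The cross-validation scheme used above draws 10 random splits, each holding out 10% of the training set while preserving the stable/unstable ratio. The pure-Python sketch below illustrates that idea only; it is not scikit-learn's implementation, its per-class rounding is a simplification, and the function name and toy labels are hypothetical.

```python
import random

def stratified_shuffle_split(labels, n_splits=10, test_size=0.1, seed=42):
    """Yield (train_idx, test_idx) pairs that keep the class ratio in each test fold."""
    rng = random.Random(seed)
    # Group sample indices by class label
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    for _ in range(n_splits):
        train_idx, test_idx = [], []
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            # Take the same fraction from every class (at least one sample)
            n_test = max(1, round(len(shuffled) * test_size))
            test_idx.extend(shuffled[:n_test])
            train_idx.extend(shuffled[n_test:])
        yield sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 30 stable and 70 unstable systems
labels = [True] * 30 + [False] * 70
splits = list(stratified_shuffle_split(labels))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

Every test fold in this toy example contains 3 stable and 7 unstable systems, matching the 30/70 ratio of the full label list.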

In [39]:
print("The best parameters are %s with a score of %0.4f" % (CV_rfc.best_params_, CV_rfc.best_score_))

The best parameters are {'n_estimators': 500, 'max_features': 'sqrt'} with a score of 1.0000


In [40]:
from sklearn import metrics

model = CV_rfc.best_estimator_
y_pred = model.predict_proba(X_holdout)  # class probabilities; column 1 is P(Stable == True)
y_pred_acc = model.predict(X_holdout)    # hard True/False predictions
test_score = metrics.roc_auc_score(y_holdout, y_pred[:, 1])
test_score_acc = metrics.accuracy_score(y_holdout, y_pred_acc)
print("AUC score is {0}".format(test_score))
print("Accuracy is {0}".format(test_score_acc))

AUC score is 0.9999602393590584
Accuracy is 0.9977777777777778
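
The AUC score can be read as the probability that a randomly chosen stable system receives a higher predicted probability than a randomly chosen unstable one, with ties counted as one half. The sketch below computes it by brute force over all pairs to show the definition; `pairwise_auc` is an illustrative helper and the labels and scores are made up, not model output.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities
toy_labels = [True, True, False, False]
toy_scores = [0.9, 0.4, 0.5, 0.1]
auc = pairwise_auc(toy_labels, toy_scores)
print(auc)
```

Three of the four stable/unstable pairs are ordered correctly here, giving 0.75. A score near 1, as reported above, means almost every such pair in the holdout set is ordered correctly.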


In [41]:
print("Feature\t\tImportance\n")
for i in reversed(np.argsort(model.feature_importances_)):
    print("%s\t\t%f" % (X.columns[i], model.feature_importances_[i]))

Feature		Importance

e2		0.224009
e1		0.161745
e3		0.151391
a1		0.105570
P1		0.102382
minHillSep		0.089480
a2		0.044251
P2		0.043172
P3		0.034399
a3		0.032032
inc2		0.001725
inc1		0.001426
inc3		0.001200
f3		0.000995
f1		0.000921
f2		0.000898
Omega3		0.000850
pomega3		0.000780
Omega2		0.000752
pomega1		0.000730
pomega2		0.000706
Omega1		0.000585
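
Each semimajor axis and its period receive similar importance in the table above (for example a1 at 0.105570 and P1 at 0.102382). One possible reason is that the two columns carry the same information. The check below uses the (a, P) values of planets 1 and 3 from system 0 in the `df.head()` output and assumes a is in AU, P is in years, and the star has one solar mass, so that Kepler's third law reduces to P = a^1.5. These units are inferred from the numbers, not stated in the text.

```python
# (a, P) pairs for planets 1 and 3 of system 0, copied from the df.head() output
pairs = [(0.050005, 0.011182), (0.066488, 0.017144)]

# Assumption: a in AU, P in years, one-solar-mass star, so P = a ** 1.5
for a, P in pairs:
    print(a, P, round(a ** 1.5, 6))
```

If the relation holds across the table, a and P are redundant features, and a random forest may split the importance of one underlying quantity between them.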
