# Planetary Stability

The dataset consists of 1500 artifically generated 3-planet systems.  All planets have a mass of $5M_\oplus$, which I think is roughly where the Kepler sample should peak.  They were all integrated for 1 million orbits of the inner planet (initialized at 0.05 AU), and the stable column shows whether the system survives or not.  The next two planets were initialized somewhere between 0-10 Hill radii away.  Each run records the initial orbital parameters of each of the 3 planets.  Below is a description of each column, and see below for a figure showing each angle:

* Sim_ID: Unique Identifier
* Stable: Whether system was stable after 1e6 inner orbits
* Norbits_instability: Number of inner-planet orbital periods until instability.
* Norbits: 1e6 for all
* Mplanet: Mass of all planets (in solar masses)
* RH/a: The Hill radius, scaled by the semimajor axis.  This just depeneds on the planet/star mass ratio so is the same for all.

For each planet (numbered 1,2,3--all angles in radians), we record the initial

* a: semimajor axis (AU)
* e: eccentricity
* pomega: longitude of pericenter (what direction an eccentric orbit is oriented, this is Omega + omega in figure below, useful for low inclination orbits)
* inc: inclination
* Omega: longitude of the node (how the orbital plane is oriented)
* f: true anomaly: how far the planet started from pericenter.
* xyz: cartesian positions (in AU)
* vxvyvz: cartesian velocities (in AU/yr)

There are then two final columns I added as checks:

* Rel_Eerr: Relative energy error during the integration
* integ_time: Time it took to run simulation (in sec)

These two columns **SHOULD NOT** be used in the analysis.  Like before when I had final eccentricities, they have information about the outcome (runs with worse energy conservation likely had closer encounters, and the ones with shorter integration times are the ones that went unstable).  

See below for an image showing the angles.

All planets were simply started with e distributed uniformly between 0 and 0.02, and the inclination between 0 and 0.01 radians.  

![orbits](images/orbit.png)

In [1]:
import pandas as pd
import numpy as np
df = pd.read_csv('../data/1e6dataset.csv', index_col=0)
df.tail()

Unnamed: 0_level_0,Stable,Norbits_instability,Norbits,Mplanet,RH/a,a1,P1,e1,pomega1,inc1,...,Omega3,f3,x3,y3,z3,vx3,vy3,vz3,Rel_Eerr,integ_time
Sim_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1995,True,1000000.0,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.009089,-1.079974,0.010988,...,2.967275,-0.849203,-0.067186,0.010875,1.5e-05,-3.71096,-23.86027,0.396331,1.221454e-13,920.464203
1996,False,393070.901091,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.01117,4.751147,0.009788,...,2.553637,1.060472,0.006214,-0.06658,0.000532,24.201111,2.232317,-0.156514,3.073272e-14,415.862748
1997,True,1000000.0,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.004709,1.389463,0.01167,...,0.63978,-0.803942,0.056566,-0.039027,-0.000454,13.578741,19.784498,0.054182,1.261405e-14,884.21979
1998,True,1000000.0,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.002514,-1.615616,0.001747,...,0.170469,-2.806279,0.001103,-0.067801,-0.000308,24.000226,0.476752,-0.016553,3.729223e-14,1005.072442
1999,False,35087.808064,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.002846,-2.724788,0.016104,...,1.76417,-1.827689,-0.029586,0.055161,4.7e-05,-22.018055,-12.027872,0.06137,4.347926e-15,33.730164


In [2]:
def HillSep(system):
    return (system['a2']-system['a1'])/(system['a1']*system['RH/a'])
df['HillSep'] = df.apply(HillSep, axis=1)
df.head()

Unnamed: 0_level_0,Stable,Norbits_instability,Norbits,Mplanet,RH/a,a1,P1,e1,pomega1,inc1,...,f3,x3,y3,z3,vx3,vy3,vz3,Rel_Eerr,integ_time,HillSep
Sim_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,True,1000000.0,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.015159,-3.640656,0.004519,...,1.573978,-0.068236,-0.014064,-0.0004729121,4.567729,-23.360975,0.2676,2.398596e-15,1006.014277,8.377687
1,False,210649.647778,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.016949,4.798937,0.004452,...,-1.749616,-0.062409,0.006367,5.514355e-07,-2.331204,-24.959053,0.000893,5.615807e-14,216.212583,5.537457
2,True,1000000.0,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.018957,0.355323,0.001481,...,-0.03255,-0.062995,0.031855,-0.0004634034,-10.685821,-21.138176,-0.047841,7.017671e-14,983.189257,8.824137
3,False,19982.209941,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.010885,-3.958688,0.01054,...,0.946351,0.063081,-0.005132,0.0004295278,2.220935,24.94565,-0.118977,7.01655e-15,20.751416,5.951859
4,False,46995.390321,1000000.0,1.5e-05,0.021544,0.05,0.01118,0.002063,2.488507,0.002705,...,-0.455684,0.051185,0.037257,-7.717988e-05,-14.78124,20.203958,0.035016,1.052342e-14,49.353019,5.944192


In [11]:
columns = ['HillSep']
for i in ['1', '2', '3']:
    #columns += ['a'+i, 'P'+i, 'e'+i, 'pomega'+i, 'inc'+i, 'Omega'+i, 'f'+i, 'x'+i, 'y'+i, 'z'+i, 'vx'+i, 'vy'+i, 'vz'+i]
    columns += ['P'+i, 'e'+i, 'pomega'+i, 'inc'+i, 'Omega'+i, 'f'+i]

y = df['Stable']
X = df[columns]
X.head()

Unnamed: 0_level_0,HillSep,P1,e1,pomega1,inc1,Omega1,f1,P2,e2,pomega2,inc2,Omega2,f2,P3,e3,pomega3,inc3,Omega3,f3
Sim_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
0,8.377687,0.01118,0.015159,-3.640656,0.004519,-3.070751,2.544276,0.01434,0.015676,-4.377415,0.008318,-2.617688,-0.577344,0.018392,0.010094,-4.512346,0.013191,-2.397858,1.573978
1,5.537457,0.01118,0.016949,4.798937,0.004452,3.11291,-3.45895,0.013239,0.013032,-1.32749,0.001638,0.178112,-1.031918,0.015678,0.008655,4.789547,3.7e-05,2.79845,-1.749616
2,8.824137,0.01118,0.018957,0.355323,0.001481,-1.033591,-1.658949,0.014515,0.013395,-4.347107,0.010576,-2.470538,3.651813,0.018845,0.003168,-3.577208,0.006868,-1.740366,-0.03255
3,5.951859,0.01118,0.010885,-3.958688,0.01054,-2.351669,0.41173,0.013398,0.000263,-1.021212,0.004527,1.472345,-0.027364,0.016056,0.009405,-1.027543,0.008314,-2.267802,0.946351
4,5.944192,0.01118,0.002063,2.488507,0.002705,0.417927,-3.759915,0.013395,0.018359,-1.253795,0.013355,1.394416,3.37206,0.016049,0.005534,1.084883,0.001853,1.347118,-0.455684


In [12]:
from sklearn.cross_validation import train_test_split
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.3, random_state=42)

In [13]:
from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.grid_search import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import calibration_curve
  
cv_s = StratifiedShuffleSplit(y_train,  n_iter=10 , test_size=0.1, random_state=42)
rfc = RandomForestClassifier(max_features= 'auto' ,n_estimators=50) 
param_grid = { 
        'n_estimators': [500],
        'max_features': ['sqrt']}
CV_rfc = GridSearchCV(n_jobs=-1, estimator=rfc, scoring="roc_auc", param_grid=param_grid, cv=cv_s)
CV_rfc.fit(X_train, y_train)

GridSearchCV(cv=StratifiedShuffleSplit(labels=[False False ..., False False], n_iter=10, test_size=0.1, random_state=42),
       error_score='raise',
       estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_estimators': [500], 'max_features': ['sqrt']},
       pre_dispatch='2*n_jobs', refit=True, scoring='roc_auc', verbose=0)

In [14]:
print("The best parameters are {%s} with a score of %0.4f" % (CV_rfc.best_params_, CV_rfc.best_score_))

The best parameters are {{'max_features': 'sqrt', 'n_estimators': 500}} with a score of 0.9769


In [15]:
model = CV_rfc.best_estimator_
y_pred = model.predict_proba(X_holdout) # probability that team0 wins (what Kaggle calls team 1, and wants for submission)
y_pred_acc = model.predict(X_holdout)
from sklearn import metrics
test_score = metrics.roc_auc_score(y_holdout, y_pred[:,1])
test_score_acc = metrics.accuracy_score(y_holdout, y_pred_acc)
print("AUC score is {0}".format(test_score))
print("Accuracy is {0}".format(test_score_acc))

AUC score is 0.965271836354
Accuracy is 0.9


In [16]:
print("Feature\t\tImportance\n")
for i in reversed(np.argsort(model.feature_importances_)):
    print("%s\t\t%f" % (X.columns[i], model.feature_importances_[i]))

Feature		Importance

HillSep		0.285794
P2		0.265495
P3		0.246310
e1		0.017924
e2		0.016860
f2		0.014620
e3		0.014450
pomega3		0.014123
pomega2		0.014059
inc3		0.013820
Omega1		0.012680
inc1		0.012557
inc2		0.012556
Omega3		0.012548
f1		0.011800
f3		0.011598
Omega2		0.011404
pomega1		0.011403
P1		0.000000
