# Data Analysis for Exoplanets

This notebook is meant to perform some data analysis on [exoplanets](https://en.wikipedia.org/wiki/Exoplanet). The dataset is from the [NASA Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=planets).

### Imports

In [24]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [25]:
def load_planets_csv(csvFilename, headerlines=76):
    '''
    Load the planets CSV file into a pandas dataframe and print the header,
    which explains what each of the column fields means.
    '''
    with open(csvFilename, 'r') as csvFile:
        # Print the first `headerlines` lines, which hold the column descriptions
        for linecounter, line in enumerate(csvFile, start=1):
            print(line)
            if linecounter >= headerlines:
                break
    # The row right after the header block holds the column names
    return pd.read_csv(csvFilename, header=headerlines)
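
A possible alternative, sketched here under assumptions rather than taken from the notebook: because every header line in the file starts with `#`, the `comment` parameter of `pd.read_csv` can skip those lines without hardcoding their count. The inline sample text and the names `sample_csv` and `sample_df` are made up for illustration, and the approach assumes no data field itself contains a `#`, since pandas would cut the line off at that character.

```python
import io
import pandas as pd

# Hypothetical miniature of the archive file: '#' header lines, then the CSV table.
sample_csv = """# This file was produced by the NASA Exoplanet Archive
# COLUMN pl_hostname:    Host Name
# COLUMN pl_letter:      Planet Letter
# COLUMN pl_orbper:      Orbital Period [days]
#
pl_hostname,pl_letter,pl_orbper
11 Com,b,326.03
11 UMi,b,516.21997
"""

# comment="#" makes pandas ignore fully commented lines, so the first
# remaining line is used as the column header.
sample_df = pd.read_csv(io.StringIO(sample_csv), comment="#")
print(sample_df)
```

This variant does not print the column descriptions, so it complements the function above rather than replacing it.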

In [26]:
# Load in the datafile and inspect the column descriptions:
planet_df = load_planets_csv("./planets_2020.07.25_10.13.04.csv")

# This file was produced by the NASA Exoplanet Archive  http://exoplanetarchive.ipac.caltech.edu

# Sat Jul 25 10:13:04 2020

#

# COLUMN pl_hostname:    Host Name

# COLUMN pl_letter:      Planet Letter

# COLUMN pl_name:        Planet Name

# COLUMN pl_discmethod:  Discovery Method

# COLUMN pl_controvflag: Controversial Flag

# COLUMN pl_pnum:        Number of Planets in System

# COLUMN pl_orbper:      Orbital Period [days]

# COLUMN pl_orbpererr1:  Orbital Period Upper Unc. [days]

# COLUMN pl_orbpererr2:  Orbital Period Lower Unc. [days]

# COLUMN pl_orbperlim:   Orbital Period Limit Flag

# COLUMN pl_orbsmax:     Orbit Semi-Major Axis [au])

# COLUMN pl_orbsmaxerr1: Orbit Semi-Major Axis Upper Unc. [au]

# COLUMN pl_orbsmaxerr2: Orbit Semi-Major Axis Lower Unc. [au]

# COLUMN pl_orbsmaxlim:  Orbit Semi-Major Axis Limit Flag

# COLUMN pl_orbeccen:    Eccentricity

# COLUMN pl_orbeccenerr1: Eccentricity Upper Unc.

# COLUMN pl_orbeccenerr2: Eccentricity Lower Unc.

# COLUMN pl_orb

In [27]:
# Look at the resulting pandas dataframe
planet_df

Unnamed: 0,pl_hostname,pl_letter,pl_name,pl_discmethod,pl_controvflag,pl_pnum,pl_orbper,pl_orbpererr1,pl_orbpererr2,pl_orbperlim,...,st_mass,st_masserr1,st_masserr2,st_masslim,st_rad,st_raderr1,st_raderr2,st_radlim,rowupdate,pl_facility
0,11 Com,b,11 Com b,Radial Velocity,0,1,326.030000,0.320000,-0.320000,0.0,...,2.70,0.30,-0.30,0.0,19.00,2.00,-2.00,0.0,2014-05-14,Xinglong Station
1,11 UMi,b,11 UMi b,Radial Velocity,0,1,516.219970,3.200000,-3.200000,0.0,...,2.78,0.69,-0.69,0.0,29.79,2.84,-2.84,0.0,2018-09-06,Thueringer Landessternwarte Tautenburg
2,14 And,b,14 And b,Radial Velocity,0,1,185.840000,0.230000,-0.230000,0.0,...,2.20,0.10,-0.20,0.0,11.00,1.00,-1.00,0.0,2014-05-14,Okayama Astrophysical Observatory
3,14 Her,b,14 Her b,Radial Velocity,0,1,1773.400020,2.500000,-2.500000,0.0,...,0.90,0.04,-0.04,0.0,0.93,0.01,-0.01,0.0,2018-09-06,W. M. Keck Observatory
4,16 Cyg B,b,16 Cyg B b,Radial Velocity,0,1,798.500000,1.000000,-1.000000,0.0,...,1.08,0.04,-0.04,0.0,1.13,0.01,-0.01,0.0,2018-09-06,Multiple Observatories
5,18 Del,b,18 Del b,Radial Velocity,0,1,993.300000,3.200000,-3.200000,0.0,...,2.30,,,0.0,8.50,,,0.0,2014-05-14,Okayama Astrophysical Observatory
6,1RXS J160929.1-210524,b,1RXS J160929.1-210524 b,Imaging,0,1,,,,,...,0.85,0.20,-0.10,0.0,,,,,2015-04-01,Gemini Observatory
7,24 Boo,b,24 Boo b,Radial Velocity,0,1,30.350600,0.007800,-0.007700,0.0,...,0.99,0.19,-0.13,0.0,10.64,0.84,-0.59,0.0,2018-04-26,Okayama Astrophysical Observatory
8,24 Sex,b,24 Sex b,Radial Velocity,0,2,452.800000,2.100000,-4.500000,0.0,...,1.54,0.08,-0.08,0.0,4.90,0.08,-0.08,0.0,2014-05-14,Lick Observatory
9,24 Sex,c,24 Sex c,Radial Velocity,0,2,883.000000,32.400000,-13.800000,0.0,...,1.54,0.08,-0.08,0.0,4.90,0.08,-0.08,0.0,2014-05-14,Lick Observatory


From the above cell, we can see that there are 4,197 confirmed planets in this dataset, with each row in the data representing one planet. Let's group the planets by their host star names and see how many confirmed planets each star has.

In [17]:
planet_df.groupby('pl_hostname').size().describe()

count    3115.000000
mean        1.347352
std         0.764953
min         1.000000
25%         1.000000
50%         1.000000
75%         1.000000
max         8.000000
dtype: float64

There are 3,115 distinct [planetary systems](https://en.wikipedia.org/wiki/Planetary_system). Most of them consist of only a single planet: the median and even the 75th percentile are both 1. The average system has about 1.3 planets, and the system with the most planets has 8! This system, known as KOI-351 or [Kepler-90](https://en.wikipedia.org/wiki/Kepler-90), had its [eighth planet](https://en.wikipedia.org/wiki/Kepler-90i) discovered in 2017 [using machine learning methods](https://arxiv.org/abs/1712.05044) developed at Google. Let's inspect that system more closely.
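
To make the "mostly single-planet systems" observation more concrete, one could count how many systems have each number of planets. The following is a minimal sketch on a made-up five-row dataframe (`sample_df` is hypothetical and only reuses host names from the rows shown earlier); on the real data the same two lines would be applied to `planet_df`.

```python
import pandas as pd

# Hypothetical miniature dataset: one row per planet, as in planet_df.
sample_df = pd.DataFrame({
    "pl_hostname": ["11 Com", "14 And", "14 Her", "24 Sex", "24 Sex"],
    "pl_letter": ["b", "b", "b", "b", "c"],
})

# Number of planets per host star
planets_per_system = sample_df.groupby("pl_hostname").size()

# How many systems have 1 planet, 2 planets, ...
size_distribution = planets_per_system.value_counts().sort_index()

# Fraction of systems with exactly one planet
single_fraction = (planets_per_system == 1).mean()

print(size_distribution)
print(f"Single-planet systems: {single_fraction:.0%}")
```

In this toy sample three of the four systems have a single planet; the real fraction has to be read off `planet_df` itself.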

In [29]:
# Sort the systems by number of planets in decreasing order:
planet_df.groupby('pl_hostname').size().sort_values(ascending=False)

pl_hostname
KOI-351       8
TRAPPIST-1    7
Kepler-20     6
Kepler-80     6
HD 10180      6
             ..
Kepler-428    1
Kepler-427    1
Kepler-426    1
Kepler-425    1
11 Com        1
Length: 3115, dtype: int64

In [30]:
# Select the part of the dataframe corresponding to the host KOI-351. This gives us the eight planets around that star
planet_df[planet_df['pl_hostname'] == 'KOI-351']

Unnamed: 0,pl_hostname,pl_letter,pl_name,pl_discmethod,pl_controvflag,pl_pnum,pl_orbper,pl_orbpererr1,pl_orbpererr2,pl_orbperlim,...,st_mass,st_masserr1,st_masserr2,st_masslim,st_rad,st_raderr1,st_raderr2,st_radlim,rowupdate,pl_facility
1405,KOI-351,b,KOI-351 b,Transit,0,8,7.008151,1.9e-05,-1.9e-05,0.0,...,1.2,0.1,-0.1,0.0,1.2,0.1,-0.1,0.0,2014-05-14,Kepler
1406,KOI-351,c,KOI-351 c,Transit,0,8,8.719375,2.7e-05,-2.7e-05,0.0,...,1.2,0.1,-0.1,0.0,1.2,0.1,-0.1,0.0,2014-05-14,Kepler
1407,KOI-351,d,KOI-351 d,Transit,0,8,59.73667,0.00038,-0.00038,0.0,...,1.2,0.1,-0.1,0.0,1.2,0.1,-0.1,0.0,2014-05-14,Kepler
1408,KOI-351,e,KOI-351 e,Transit,0,8,91.93913,0.00073,-0.00073,0.0,...,1.2,0.1,-0.1,0.0,1.2,0.1,-0.1,0.0,2014-05-14,Kepler
1409,KOI-351,f,KOI-351 f,Transit,0,8,124.9144,0.0019,-0.0019,0.0,...,1.2,0.1,-0.1,0.0,1.2,0.1,-0.1,0.0,2014-05-14,Kepler
1410,KOI-351,g,KOI-351 g,Transit,0,8,210.60697,0.00043,-0.00043,,...,1.2,0.1,-0.1,0.0,1.2,0.1,-0.1,0.0,2014-05-14,Kepler
1411,KOI-351,h,KOI-351 h,Transit,0,8,331.60059,0.00037,-0.00037,0.0,...,1.2,0.1,-0.1,0.0,1.2,0.1,-0.1,0.0,2014-05-14,Kepler
3617,KOI-351,i,Kepler-90 i,Transit,0,8,14.44912,0.0002,-0.0002,0.0,...,1.2,0.1,-0.1,0.0,1.2,0.1,-0.1,0.0,2017-12-14,Kepler


In [32]:
# Look at the orbit semi-major axis [au] of each planet in the KOI-351 system
planet_df[planet_df['pl_hostname'] == 'KOI-351']['pl_orbsmax']

1405    0.074
1406    0.089
1407    0.320
1408    0.420
1409    0.480
1410    0.710
1411    1.010
3617      NaN
Name: pl_orbsmax, dtype: float64
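
The `NaN` in the last row means this dataset lists no semi-major axis for the planet with letter i. As a rough cross-check, and not something the notebook itself does, the semi-major axis can be estimated from the orbital period and the stellar mass with Kepler's third law, $a^3 \approx M_\star P^2$ (with $a$ in au, $P$ in years, and $M_\star$ in solar masses). The sketch below assumes the planet's mass is negligible compared to the star's and uses a year of 365.25 days; the function name `semi_major_axis_au` is hypothetical, and the input values are copied from the `pl_orbper` and `st_mass` columns shown above.

```python
def semi_major_axis_au(period_days, stellar_mass_solar):
    """Estimate the semi-major axis [au] via Kepler's third law.

    Assumes the planet mass is negligible relative to the star.
    """
    period_years = period_days / 365.25  # assumed year length in days
    return (stellar_mass_solar * period_years ** 2) ** (1.0 / 3.0)

# KOI-351 b: the catalog lists 0.074 au, which serves as a sanity check
a_b = semi_major_axis_au(7.008151, 1.2)

# Planet i: no catalog value in this dataset, so this is only an estimate
a_i = semi_major_axis_au(14.44912, 1.2)

print(f"b: {a_b:.3f} au, i: {a_i:.3f} au")
```

For planet b the estimate lands within a few percent of the listed 0.074 au, and for planet i it gives roughly 0.12 au, which would place it between planets c and d.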