# <font color='black'>Finding key factors that predict an exoplanet's distance from its host star</font> <a id='q1'></a>

### <font color='black'>Group Name: SPACE!</font> <a id='q1'></a>
Ben Rycroft (s3947135)

Rita Lam Cordeiro (s3471881)

### <font color='black'>Introduction</font> <a id='q1'></a>

**Dataset Source**

The NASA Exoplanet Archive compiles data on all known exoplanets and their host stars, including exoplanet parameters, stellar parameters, and discovery/characterization data.

This archive includes three data sets: "List of All known planets and hosts", "List of all Kepler Objects of Interest (KOIs)", and "List of all Kepler Threshold-Crossing Events (TCEs)". We have chosen to use only "List of All known planets and hosts" for this project.


**Dataset Details**

Each row in the dataset represents an exoplanet. This dataset has _ columns: 

Due to the large number of NaN values, we have decided to split the data into subsets, each of which focuses on correlating a single value with a planet's distance from its host star.
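
To illustrate why per-value subsets keep more rows than a single global `dropna`, here is a minimal sketch on a toy frame. The values and the names `toy`, `complete_rows`, `rad_rows` and `mass_rows` are made up for illustration and are not taken from the archive.

```python
import numpy as np
import pandas as pd

# Toy data (made up): two values with gaps in different rows
toy = pd.DataFrame({
    "pl_rade":    [1.3, np.nan, 2.9, 2.7],
    "pl_masse":   [np.nan, 5.0, np.nan, 8.1],
    "pl_orbsmax": [0.07, 0.09, 0.32, 0.42],
})

# Dropping every row that has any NaN keeps only fully populated rows
complete_rows = toy.dropna()

# Dropping NaNs one value at a time keeps every row where that value is present
rad_rows = toy.dropna(subset=["pl_rade"])[["pl_rade", "pl_orbsmax"]]
mass_rows = toy.dropna(subset=["pl_masse"])[["pl_masse", "pl_orbsmax"]]

print(len(complete_rows), len(rad_rows), len(mass_rows))  # prints: 1 3 2
```

In this toy case only one row survives a global `dropna`, while the radius and mass subsets keep three and two rows respectively.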

In [2]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import io
import requests

pd.set_option('display.max_columns', None) 

###
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 
%config InlineBackend.figure_format = 'retina'
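# The "seaborn" style was renamed "seaborn-v0_8" in Matplotlib 3.6; fall back to the old name on older versions
plt.style.use("seaborn-v0_8" if "seaborn-v0_8" in plt.style.available else "seaborn")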
###

In [4]:
# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/BenRyc/SPACE-/main/orbitDistNoLim.csv")

# Orbital columns shared by every subset: orbital period and semi-major axis,
# each with its upper (err1) and lower (err2) uncertainty
orbit_cols = ['pl_orbper', 'pl_orbpererr1', 'pl_orbpererr2', 'pl_orbsmax', 'pl_orbsmaxerr1', 'pl_orbsmaxerr2']

def subset(name):
    """Keep rows where `name` and both of its error columns are present, alongside the orbital columns."""
    cols = [name, name + 'err1', name + 'err2']
    return df.dropna(subset=cols)[cols + orbit_cols]

# Planet parameters
rad = subset('pl_rade')            # planet radius
mass = subset('pl_masse')          # planet mass
dens = subset('pl_dens')           # planet density
orbeccen = subset('pl_orbeccen')   # orbital eccentricity
insol = subset('pl_insol')         # insolation flux
eqt = subset('pl_eqt')             # equilibrium temperature

# Host star parameters
teff = subset('st_teff')           # stellar effective temperature
radst = subset('st_rad')           # stellar radius
massst = subset('st_mass')         # stellar mass
met = subset('st_met')             # stellar metallicity
agest = subset('st_age')           # stellar age
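
As a rough sketch of how one of these subsets might then be used, the snippet below computes a Pearson correlation between a value and the semi-major axis. It runs on a made-up toy frame shaped like `rad`. The helper name `corr_with_distance` is hypothetical, and Pearson correlation is only one possible choice of measure, not necessarily the one used later in this project.

```python
import pandas as pd

# Toy subset (made-up values) with the same kind of columns as `rad`
toy_rad = pd.DataFrame({
    "pl_rade":    [1.0, 2.0, 3.0, 4.0],
    "pl_orbsmax": [0.1, 0.2, 0.3, 0.4],
})

def corr_with_distance(frame, value_col, dist_col="pl_orbsmax"):
    """Pearson correlation between one value column and the orbital distance column."""
    return frame[value_col].corr(frame[dist_col])

r = corr_with_distance(toy_rad, "pl_rade")
print(round(r, 3))
```

The toy values are perfectly linear, so the coefficient comes out at 1 up to rounding; real subsets would be expected to give weaker values.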



In [5]:
rad

| | pl_rade | pl_radeerr1 | pl_radeerr2 | pl_orbper | pl_orbpererr1 | pl_orbpererr2 | pl_orbsmax | pl_orbsmaxerr1 | pl_orbsmaxerr2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.310 | 0.170 | -0.170 | 7.008151 | 1.900000e-05 | -1.900000e-05 | 0.07400 | 0.01600 | -0.01600 |
| 1 | 1.190 | 0.140 | -0.140 | 8.719375 | 2.700000e-05 | -2.700000e-05 | 0.08900 | 0.01200 | -0.01200 |
| 2 | 2.870 | 0.300 | -0.300 | 59.736670 | 3.800000e-04 | -3.800000e-04 | 0.32000 | 0.05000 | -0.05000 |
| 3 | 2.660 | 0.290 | -0.290 | 91.939130 | 7.300000e-04 | -7.300000e-04 | 0.42000 | 0.06000 | -0.06000 |
| 4 | 2.880 | 0.520 | -0.520 | 124.914400 | 1.900000e-03 | -1.900000e-03 | 0.48000 | 0.09000 | -0.09000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2708 | 12.993 | 0.229 | -0.235 | 1.752224 | 8.000000e-07 | -8.000000e-07 | 0.02340 | 0.00150 | -0.00150 |
| 2709 | 2.043 | 0.069 | -0.069 | 6.001270 | 2.100000e-05 | -2.100000e-05 | 0.05706 | 0.00055 | -0.00055 |
| 2710 | 12.780 | 0.340 | -0.340 | 4.187756 | 6.000000e-07 | -6.000000e-07 | 0.05150 | 0.00050 | -0.00050 |
| 2711 | 23.203 | 2.466 | -2.466 | 3.765001 | 8.100000e-06 | -8.100000e-06 | 0.08150 | 0.00770 | -0.00770 |
