# Exoplanet Habitability

According to [Wikipedia](https://en.wikipedia.org/wiki/Kepler_space_telescope), the key goals of the Kepler Space Telescope are:

- To determine how many Earth-size and larger planets there are in or near the habitable zone (often called "Goldilocks planets") of a wide variety of spectral types of stars.
- To determine the range of size and shape of the orbits of these planets.
- To estimate how many planets there are in multiple-star systems.
- To determine the range of orbit size, brightness, size, mass and density of short-period giant planets.
- To identify additional members of each discovered planetary system using other techniques.
- To determine the properties of those stars that harbor planetary systems.

In [None]:
# Python imports and settings
import numpy  as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn
import sklearn.cluster
import re
from pydash import py_ as _

from src.dataset_koi import koi, koi_columns, koi_column_types
from src.utilities import onehot_encode_comments

# https://stackoverflow.com/questions/11707586/how-do-i-expand-the-output-display-to-see-more-columns-of-a-pandas-dataframe
pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 was deprecated in pandas 1.0
pd.set_option('display.max_rows', 8)  # 8 is required for .describe()
pd.set_option('mode.chained_assignment', None)

%load_ext autoreload
%autoreload 2

In [None]:
dataset = (
    pd.concat([
        koi['archive'],
        koi['KIC'][['ra','dec']],
        koi['transit'],       
        koi['stellar'],
    ], axis=1)
    .sort_values(by='kepler_name')
    .query('koi_disposition == "CONFIRMED"')            
)
dataset

## Calculating Mass and Density

Unfortunately, it is not possible to calculate the mass, and by extension the density, of an exoplanet using the KOI table alone.

Normally, planetary mass is calculated using the Doppler-shift [Radial Velocity](https://www.planetary.org/explore/space-topics/exoplanets/radial-velocity.html) 
method. This method works by observing the gravitational wobble of the star as it orbits the barycenter of the combined exoplanet-star system. 
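
To make the size of this wobble concrete, the sketch below evaluates the standard two-body expression for the star's velocity semi-amplitude, $K = (2\pi G / P)^{1/3} \, m_p \sin i \, / \, ((M_\star + m_p)^{2/3} \sqrt{1 - e^2})$. The function name `rv_semi_amplitude` and the Jupiter-like input values are illustrative assumptions and are not taken from the KOI dataset. In practice $K$ is the measured quantity and the relation is inverted to estimate $m_p \sin i$.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def rv_semi_amplitude(period_s, m_planet_kg, m_star_kg,
                      inclination_rad=math.pi / 2, ecc=0.0):
    """Velocity semi-amplitude K (m/s) of a star orbited by a single planet."""
    return (
        (2 * math.pi * G / period_s) ** (1 / 3)
        * m_planet_kg * math.sin(inclination_rad)
        / (m_star_kg + m_planet_kg) ** (2 / 3)
        / math.sqrt(1 - ecc ** 2)
    )

# Illustrative input: a Jupiter-mass planet on a circular, edge-on,
# 11.862-year orbit around a Sun-mass star (assumed values, not KOI data)
period_s = 11.862 * 365.25 * 86400
k_jupiter = rv_semi_amplitude(period_s, m_planet_kg=1.898e27, m_star_kg=1.989e30)
print(f"K = {k_jupiter:.1f} m/s")
```

For these inputs the wobble is on the order of ten metres per second, and it scales roughly linearly with planet mass, which is why Earth-mass planets are far harder to weigh this way than giant planets.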

Once mass has been computed, density can be calculated as the mass divided by the volume of a sphere: $\rho = M / (\frac{4}{3} \pi r^3)$
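
As a minimal sketch of this calculation, the helper below converts a mass and radius given in Earth units into a mean density in kg/m^3. The name `bulk_density` and the two constants are assumptions introduced for illustration; the KOI table supplies only the radius (`koi_prad`), so the mass would have to come from another source.

```python
import math

EARTH_MASS_KG = 5.972e24   # assumed reference value
EARTH_RADIUS_M = 6.371e6   # assumed reference value (mean radius)

def bulk_density(mass_earths, radius_earths):
    """Mean density (kg/m^3) of a spherical planet, inputs in Earth units."""
    mass_kg = mass_earths * EARTH_MASS_KG
    radius_m = radius_earths * EARTH_RADIUS_M
    volume_m3 = (4 / 3) * math.pi * radius_m ** 3
    return mass_kg / volume_m3

# Sanity check: Earth itself should come out near 5500 kg/m^3
earth_density = bulk_density(1.0, 1.0)
print(f"{earth_density:.0f} kg/m^3")
```

Because volume grows with the cube of the radius, a planet with Earth's mass but twice its radius would have one eighth of Earth's density.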

Using the Kepler Transit Method alone, it is technically possible to measure the mass of exoplanets, but only if multiple exoplanets orbit the same star. 
The planets exert a small gravitational tug on each other, causing a tiny but measurable variation in the transit timings of each orbit 
that depends on the relative positions of the planets. One published example is 
[The mass of the Mars-sized exoplanet Kepler-138 b from transit timing](https://www.nature.com/articles/nature14494), which reports a mass of 0.066 (±0.059) Earth masses. 

## Goldilocks Exoplanets

The first criterion for an Earth-like habitable planet is liquid water, which would require a `koi_teq` Equilibrium Temperature (Kelvin) within the range 273.2 K to 373.2 K.
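
For context, an equilibrium temperature can be estimated from stellar properties alone. The sketch below assumes a blackbody planet with a Bond albedo of 0.3 and heat spread evenly over its surface; whether these match the exact assumptions behind `koi_teq` should be checked against the KOI column documentation. The function names and constants are illustrative, not part of this notebook's `src` package.

```python
import math

R_SUN_M = 6.957e8  # nominal solar radius, metres
AU_M = 1.496e11    # astronomical unit, metres

def equilibrium_temperature(t_star_k, r_star_rsun, sma_au, albedo=0.3):
    """Blackbody equilibrium temperature (K) with uniform heat redistribution."""
    ratio = (r_star_rsun * R_SUN_M) / (2 * sma_au * AU_M)
    return t_star_k * math.sqrt(ratio) * (1 - albedo) ** 0.25

def is_goldilocks_temp(teq_k, low=273.2, high=373.2):
    """True when the temperature lies inside the liquid-water range (inclusive)."""
    return low <= teq_k <= high

# Earth-Sun analogue: 5772 K star, 1 solar radius, 1 AU orbit
earth_teq = equilibrium_temperature(5772, 1.0, 1.0)
print(f"{earth_teq:.0f} K, goldilocks: {is_goldilocks_temp(earth_teq)}")
```

Under these assumptions an Earth-Sun analogue comes out near 255 K, below the 273.2 K threshold, because equilibrium temperature ignores greenhouse warming. The range used here is therefore a first-pass filter rather than a measurement of surface temperature.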

Of the 2303 CONFIRMED exoplanets we start with, 110 have a temperature that is "just right".

In [None]:
# Flag planets whose equilibrium temperature allows liquid water (bounds inclusive)
dataset['goldilocks_temp'] = ((273.2 <= dataset['koi_teq']) & (dataset['koi_teq'] <= 373.2))
goldilocks_counts = {}
goldilocks_counts['temp'] = {
    # Strict inequalities, so planets exactly on a boundary are not counted twice
    "too cold":   dataset.query('koi_teq         <  273.2').shape[0],
    "just right": dataset.query('goldilocks_temp == True' ).shape[0],
    "too hot":    dataset.query('koi_teq         >  373.2').shape[0],
}
for key, value in goldilocks_counts['temp'].items():
    print( 'Exoplanets that are %-10s: %4d (%5.2f%%)' % ( key, value, 100*value/dataset.shape[0] ) )

Lacking a formal density measurement from the KOI table, the closest proxy is `koi_prad` Planetary Radius (Earth radii). Different possible planet types include:

- [Super-Earth](https://en.wikipedia.org/wiki/Super-Earth) below 10$M_e$ with a radius of 0.8-1.25$R_e$ - exactly what we are looking for
- Earth-sized [Ocean planet](https://en.wikipedia.org/wiki/Ocean_planet) would have a much lower density - but could potentially harbour life
- [Carbon planet](https://en.wikipedia.org/wiki/Carbon_planet) low density diamonds in the sky - may lack enough oxygen to have water
- Gaseous [Mini-Neptunes](https://en.wikipedia.org/wiki/Mini-Neptune) require a minimum radius of 1.7$R_e$
- Small [Sub Earths](https://en.wikipedia.org/wiki/Sub-Earth) under 0.8$R_e$, likely lack the gravity and magnetic fields to sustain a habitable atmosphere

The [List of potentially habitable exoplanets](https://en.wikipedia.org/wiki/List_of_potentially_habitable_exoplanets) only lists exoplanets in the range of 0.78-1.63$R_e$, which mostly agrees with the 0.8-1.7$R_e$ range suggested above.
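
The radius bands can also be expressed as a single labelled column. The sketch below is one possible way to do this with `np.select` on a small made-up table (the values are not Kepler data); it keeps both bounds of the 0.8-1.7$R_e$ range inclusive and labels missing radii explicitly rather than letting them fall into a band by default.

```python
import numpy as np
import pandas as pd

# Toy radii in Earth radii, including both boundary values and a missing entry
toy = pd.DataFrame({"koi_prad": [0.5, 0.8, 1.0, 1.7, 2.4, np.nan]})
radius = toy["koi_prad"]

# np.select uses the first condition that matches, so order matters
toy["size_band"] = np.select(
    [radius.isna(), radius < 0.8, radius <= 1.7],
    ["unknown", "too small", "just right"],
    default="too big",
)
print(toy)
```

A labelled column like this can then be counted with `value_counts()`, which guarantees that every planet lands in exactly one band.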

In [None]:
# Flag planets whose radius falls in the Earth-like range (bounds inclusive)
dataset['goldilocks_size'] = ((0.8 <= dataset['koi_prad']) & (dataset['koi_prad'] <= 1.7))
goldilocks_counts['size'] = {
    # Strict inequalities, so planets exactly on a boundary are not counted twice
    "too small":  dataset.query('koi_prad        <  0.8' ).shape[0],
    "just right": dataset.query('goldilocks_size == True').shape[0],
    "too big":    dataset.query('koi_prad        >  1.7' ).shape[0],
}
for key, value in goldilocks_counts['size'].items():
    print( 'Exoplanets that are %-10s: %4d (%5.2f%%)' % ( key, value, 100*value/dataset.shape[0] ) )

The temperature and size limits can then be combined into a single filter:

In [None]:
dataset['goldilocks'] = ((dataset['goldilocks_temp'] == True) & (dataset['goldilocks_size'] == True))
goldilocks_counts['combined'] = {
    "just right": dataset.query('goldilocks==True').shape[0]
}

for key in goldilocks_counts.keys():
    value = goldilocks_counts[key]['just right']
    print( 'Exoplanets that are "just right" %-10s: %4d (%5.2f%%)' % ( key, value, 100*value/dataset.shape[0] ) )    

In [None]:
sns.set(rc={'figure.figsize':(20,10)})
sns.scatterplot(
    data=dataset,        
    x="koi_teq",
    y="koi_prad",
    size="koi_prad", sizes=(20,400),    
    hue="goldilocks_temp", palette="RdBu",
)
plt.title('Confirmed Exoplanets in the Goldilocks Temperature')
plt.xlabel('Temperature (Kelvin)')
plt.ylabel('Radius (Earth Radii)')
plt.show()

In [None]:
sns.set(rc={'figure.figsize':(20,10)})
sns.scatterplot(
    data=dataset.query('goldilocks_temp==True'),        
    x="koi_teq",
    y="koi_prad",
    size="koi_prad", sizes=(20,400),    
    hue="goldilocks", palette="RdBu",
)
plt.title('Confirmed Exoplanets with the Goldilocks Size and Temperature')
plt.xlabel('Temperature (Kelvin)')
plt.ylabel('Radius (Earth Radii)')
plt.show()

## Different Types of Goldilocks Exoplanets

The exoplanets that pass both the temperature and size filters are listed here:

In [None]:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(   "Number of potentially habitable exoplanets: " + str(dataset.query('goldilocks==True').shape[0]) )
    print(   "Names of potentially habitable exoplanets: " + ", ".join( dataset.query('goldilocks==True')['kepler_name'].tolist()) )    
    display( dataset.query('goldilocks==True') )

#### Orbital Distance vs Stellar Mass
Plotting the Goldilocks Exoplanets against Stellar Mass and Orbital Distance.

There appears to be a strong, roughly linear correlation between stellar mass (and by extension surface temperature) and the orbital radius of the habitable zone.

Within the correlation, there still appear to be four distinct clusters, possibly indicating different classes of [Red Dwarfs](https://en.wikipedia.org/wiki/Red_dwarf) and [Main Sequence](https://en.wikipedia.org/wiki/Main_sequence) stars.
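
The choice of four clusters is a visual judgment. One way to sanity-check such a choice is to compare silhouette scores across several values of `k`. The sketch below does this on synthetic points that stand in for (`koi_sma`, `koi_smass`); the blob centres and spread are invented for illustration and are not Kepler data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for (koi_sma, koi_smass): four tight, well-separated blobs
centers = np.array([[0.1, 0.3], [0.3, 0.6], [0.6, 0.9], [0.9, 1.1]])
points = np.vstack([c + rng.normal(scale=0.02, size=(30, 2)) for c in centers])

# Fit KMeans for a range of k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
print("best k:", best_k)
```

On real data the peak is usually much less sharp than on these blobs, so the score is better treated as a guide alongside the scatterplot than as a definitive answer.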

In [None]:
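# Work on a copy so the new column is not written to a slice of `dataset`
df = dataset.query('goldilocks==True').copy()
# random_state fixes the seed so cluster labels are reproducible between runs
df['KMeans_StarType'] = sklearn.cluster.KMeans(n_clusters=4, random_state=0).fit_predict(df[['koi_sma','koi_smass']])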

plot = sns.scatterplot(
    data=df,        
    x="koi_sma",
    y="koi_smass",

    size="koi_prad", sizes=(20,400),    
    hue="KMeans_StarType", palette="Blues",
#     hue="koi_teq", palette="RdBu_r",
)
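# Label each point with its Kepler name; .iloc is positional, so it is safe on the filtered index
for line in range(0,df.shape[0]):
    plot.text(
        df['koi_sma'].iloc[line]+0.005,
        df['koi_smass'].iloc[line],
        df['kepler_name'].iloc[line],
        horizontalalignment='left',
        size='medium',
        color='black',
        # weight='semibold'
    )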

plt.title('Confirmed Goldilocks Exoplanets')
plt.xlabel('Semi Major Axis / Orbital Distance (AU)')
plt.ylabel('Stellar Mass (solar mass)')
plt.show()

#### Planetary Radius vs Stellar Metallicity

Plotting Planetary Radius against Stellar Metallicity may provide insight into groupings by planet composition, 
as high-metal stars are more likely to form rocky planets, rather than water/ice worlds or carbon planets.

In [None]:
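# Work on a copy so the new column is not written to a slice of `dataset`
df = dataset.query('goldilocks==True').copy()
# random_state fixes the seed so cluster labels are reproducible between runs
df['KMeans_PlanetType'] = sklearn.cluster.KMeans(n_clusters=6, random_state=0).fit_predict(df[['koi_smet','koi_prad']])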

plot = sns.scatterplot(
    data=df,        
    x="koi_smet",
    y="koi_prad",

    size="koi_prad", sizes=(20,400),    
    hue="KMeans_PlanetType", palette="Accent",
)
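# Label each point with its Kepler name; .iloc is positional, so it is safe on the filtered index
for line in range(0,df.shape[0]):
    plot.text(
        df['koi_smet'].iloc[line]+0.005,
        df['koi_prad'].iloc[line],
        df['kepler_name'].iloc[line],
        horizontalalignment='left',
        size='medium',
        color='black',
        # weight='semibold'
    )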

plt.title('Confirmed Goldilocks Exoplanets')
plt.xlabel('Stellar Metallicity')
plt.ylabel('Planetary Radius (Earth radii)')
plt.show()

#### Starmap - Where are my habitable exoplanets?

Location of Goldilocks exoplanets in the night sky, plotted by right ascension (`ra`) and declination (`dec`).

In [None]:
display(
    sns.scatterplot(
        data=dataset,
        x="ra", 
        y="dec",
        sizes=(200,20),
        size="goldilocks", 
        hue="goldilocks", palette="hot",
    )
)