# Exoplanets exploration

### Author: Alexander K.

## Overview

This project explores a dataset of all known exoplanets (planets outside our solar system) discovered by NASA's various space missions, ground-based observatories, and other sources.

## Project Goals
The goals of this project are ...

Some questions that are posed:
 - question 1
 - question 2
 - question 3
 - etc.

## Actions

- analyze the data;
- clean up the dataset;
- visualize the data using graphs and charts;
- seek to answer the questions;
- draw conclusions based on the analysis.

## Data

There is one dataset:
1. `nasa_exoplanets.csv` - contains information on all known exoplanets.
The dataset includes information such as the planet's name, mass, radius, distance from its host star, orbital period, and other physical characteristics. The dataset also includes information on the host star, such as its name, mass, and radius.

The dataset can be found [here](https://www.kaggle.com/datasets/adityamishraml/nasaexoplanets/data).


## Analysis

In this section, we will employ descriptive statistics and data visualization methods to gain a deeper understanding of the data. Some of the key metrics that will be calculated include:
1. Frequency distributions
1. Counts
1. Relationships between ...
1. etc.

In [2]:
# importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# setting options
pd.set_option('display.max_columns', None)
# pd.set_option("display.float_format", "{:.2f}".format)
pd.set_option('display.max_colwidth', None)  # None disables truncation of long cell values

In [4]:
# load the dataset

exoplanets = pd.read_csv('nasa_exoplanets.csv')

## Exploratory Data Analysis (EDA)

In [5]:
# preview the first five rows
exoplanets.head()

| | name | distance | stellar_magnitude | planet_type | discovery_year | mass_multiplier | mass_wrt | radius_multiplier | radius_wrt | orbital_radius | orbital_period | eccentricity | detection_method |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11 Comae Berenices b | 304.0 | 4.72307 | Gas Giant | 2007 | 19.4 | Jupiter | 1.08 | Jupiter | 1.29 | 0.892539 | 0.23 | Radial Velocity |
| 1 | 11 Ursae Minoris b | 409.0 | 5.013 | Gas Giant | 2009 | 14.74 | Jupiter | 1.09 | Jupiter | 1.53 | 1.4 | 0.08 | Radial Velocity |
| 2 | 14 Andromedae b | 246.0 | 5.23133 | Gas Giant | 2008 | 4.8 | Jupiter | 1.15 | Jupiter | 0.83 | 0.508693 | 0.0 | Radial Velocity |
| 3 | 14 Herculis b | 58.0 | 6.61935 | Gas Giant | 2002 | 8.13881 | Jupiter | 1.12 | Jupiter | 2.773069 | 4.8 | 0.37 | Radial Velocity |
| 4 | 16 Cygni B b | 69.0 | 6.215 | Gas Giant | 1996 | 1.78 | Jupiter | 1.2 | Jupiter | 1.66 | 2.2 | 0.68 | Radial Velocity |


In [6]:
# number of rows and columns
print(f"The shape of the exoplanets dataset is - {exoplanets.shape}")

The shape of the exoplanets dataset is - (5250, 13)


In [7]:
# column dtypes and non-null counts
exoplanets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5250 entries, 0 to 5249
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   name               5250 non-null   object 
 1   distance           5233 non-null   float64
 2   stellar_magnitude  5089 non-null   float64
 3   planet_type        5250 non-null   object 
 4   discovery_year     5250 non-null   int64  
 5   mass_multiplier    5227 non-null   float64
 6   mass_wrt           5227 non-null   object 
 7   radius_multiplier  5233 non-null   float64
 8   radius_wrt         5233 non-null   object 
 9   orbital_radius     4961 non-null   float64
 10  orbital_period     5250 non-null   float64
 11  eccentricity       5250 non-null   float64
 12  detection_method   5250 non-null   object 
dtypes: float64(7), int64(1), object(5)
memory usage: 533.3+ KB
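
The non-null counts reported by `info()` show that several columns are incomplete (for example, `orbital_radius` has 4961 non-null values out of 5250). A minimal sketch of how the gaps could be counted directly is given below. It uses a small made-up frame in place of `exoplanets`, so the names and values are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `exoplanets`; names and values are made up for illustration only.
toy = pd.DataFrame({
    "name": ["planet A", "planet B", "planet C", "planet D"],
    "distance": [304.0, np.nan, 246.0, 58.0],
    "orbital_radius": [1.29, np.nan, np.nan, 2.77],
})

# Count missing values per column and express them as a share of all rows.
missing_count = toy.isna().sum()
missing_share = (toy.isna().mean() * 100).round(1)

missing_report = pd.DataFrame({"missing": missing_count, "percent": missing_share})
# Keep only the columns that actually have gaps, largest first.
missing_report = missing_report[missing_report["missing"] > 0].sort_values(
    "missing", ascending=False
)
print(missing_report)
```

Applied to the real frame, `exoplanets.isna().sum()` should equal 5250 minus each non-null count in the `info()` output.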


In [8]:
# columns in the dataset
exoplanets.columns.to_list()

['name',
 'distance',
 'stellar_magnitude',
 'planet_type',
 'discovery_year',
 'mass_multiplier',
 'mass_wrt',
 'radius_multiplier',
 'radius_wrt',
 'orbital_radius',
 'orbital_period',
 'eccentricity',
 'detection_method']

There are 5250 rows and 13 columns in the exoplanets dataset:
* `name` - the name of each exoplanet
* `distance` - the distance of each exoplanet from Earth (in light-years)
* `stellar_magnitude` - apparent brightness as seen from Earth (the column name suggests it refers to the host star rather than the planet); the brighter the object, the lower the number
* `planet_type` - type of the planet; the categories are modelled on the planets of our solar system
* `discovery_year` - year in which the planet was discovered
* `mass_multiplier` - mass of the planet, expressed as a multiple of the mass of the reference planet named in `mass_wrt`
* `mass_wrt` - the solar system planet (e.g. Jupiter or Earth) used as the reference unit for `mass_multiplier`
* `radius_multiplier` - radius of the planet, expressed as a multiple of the radius of the reference planet named in `radius_wrt`
* `radius_wrt` - the solar system planet (e.g. Jupiter or Earth) used as the reference unit for `radius_multiplier`
* `orbital_radius` - radius of the planet's orbit around its host star (in AU)
* `orbital_period` - time the planet takes to complete one orbit around its host star or system (the values appear to be in years)
* `eccentricity` - amount by which the orbit of the planet deviates from a perfect circle
* `detection_method` - the method used to detect the exoplanets
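
Because `mass_multiplier` and `radius_multiplier` are expressed relative to different reference planets, values in different rows are not directly comparable. One possible way to put them on a common scale is to convert everything to Earth units. The sketch below assumes the commonly quoted ratios of about 317.8 Earth masses and 11.2 Earth radii for Jupiter. It also assumes the reference columns contain only `Jupiter` or `Earth`, which should be checked against the real data first. The helper `to_earth_units` and the columns `mass_earth` and `radius_earth` are hypothetical names introduced here.

```python
import pandas as pd

# Approximate conversion factors (assumed): each reference planet in Earth units.
MASS_IN_EARTHS = {"Earth": 1.0, "Jupiter": 317.8}
RADIUS_IN_EARTHS = {"Earth": 1.0, "Jupiter": 11.2}


def to_earth_units(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with hypothetical `mass_earth` and `radius_earth` columns."""
    out = df.copy()
    # map() yields NaN for missing or unexpected reference names,
    # so such rows stay NaN instead of being converted silently.
    out["mass_earth"] = out["mass_multiplier"] * out["mass_wrt"].map(MASS_IN_EARTHS)
    out["radius_earth"] = out["radius_multiplier"] * out["radius_wrt"].map(RADIUS_IN_EARTHS)
    return out


# Two rows copied from the outputs shown in this notebook.
sample = pd.DataFrame({
    "name": ["16 Cygni B b", "EPIC 201497682 b"],
    "mass_multiplier": [1.78, 0.26],
    "mass_wrt": ["Jupiter", "Earth"],
    "radius_multiplier": [1.2, 0.692],
    "radius_wrt": ["Jupiter", "Earth"],
})

converted = to_earth_units(sample)
print(converted[["name", "mass_earth", "radius_earth"]])
```

With a single unit per quantity, masses and radii of gas giants and terrestrial planets can be plotted on one axis.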


### Planet Types Analysis

In [13]:
# list and count the unique planet types
planet_types = exoplanets['planet_type'].unique()
print(f"The unique planet types are: {planet_types}")
count_planet_types = exoplanets['planet_type'].nunique()
print(f"There are {count_planet_types} unique planet types.")

The unique planet types are: ['Gas Giant' 'Super Earth' 'Neptune-like' 'Terrestrial' 'Unknown']
There are 5 unique planet types.


In [15]:
# count of planets by planet_type
planet_type_counts = exoplanets['planet_type'].value_counts()
print("Count of planets by planet type:")
print(planet_type_counts)

Count of planets by planet type:
planet_type
Neptune-like    1825
Gas Giant       1630
Super Earth     1595
Terrestrial     195 
Unknown         5   
Name: count, dtype: int64


The `planet_type` column records the category of each exoplanet. `Neptune-like` planets are the most common (1825), followed by `Gas Giant` (1630) and `Super Earth` (1595). `Terrestrial` planets are rare (195), and only 5 planets have an `Unknown` type.
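
The counts are easier to compare as shares of the whole catalogue. As a small sketch, the series below re-creates the counts printed above and converts them to percentages. On the real frame, `exoplanets['planet_type'].value_counts(normalize=True)` should give the same proportions directly. The name `planet_type_share` is introduced here for illustration.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
planet_type_counts = pd.Series(
    {
        "Neptune-like": 1825,
        "Gas Giant": 1630,
        "Super Earth": 1595,
        "Terrestrial": 195,
        "Unknown": 5,
    },
    name="count",
)

# Share of each planet type in percent, rounded to two decimals.
planet_type_share = (planet_type_counts / planet_type_counts.sum() * 100).round(2)
print(planet_type_share)
```

The three largest classes together cover roughly 96% of the catalogue, while `Terrestrial` planets make up under 4%.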

Let's dive deeper into the analysis of planet types and take a closer look at the terrestrial planets.

In [19]:
# keep only the rows whose planet_type is 'Terrestrial'
filter_by_type = exoplanets['planet_type'] == 'Terrestrial'
terrestrial_planets = exoplanets[filter_by_type]

In [20]:
# first five terrestrial rows, sorted by distance (note: head() runs before the sort)
terrestrial_planets.head().sort_values(by='distance', ascending=True)

| | name | distance | stellar_magnitude | planet_type | discovery_year | mass_multiplier | mass_wrt | radius_multiplier | radius_wrt | orbital_radius | orbital_period | eccentricity | detection_method |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 151 | EPIC 206215704 b | 358.0 | 17.83 | Terrestrial | 2019 | 0.972 | Earth | 1.0 | Earth | NaN | 0.006297 | 0.0 | Transit |
| 137 | EPIC 201497682 b | 825.0 | 13.948 | Terrestrial | 2019 | 0.26 | Earth | 0.692 | Earth | NaN | 0.005749 | 0.0 | Transit |
| 142 | EPIC 201833600 c | 840.0 | 14.705 | Terrestrial | 2019 | 0.972 | Earth | 1.0 | Earth | NaN | 0.010951 | 0.0 | Transit |
| 152 | EPIC 206317286 b | 1025.0 | 14.005 | Terrestrial | 2019 | 0.84 | Earth | 0.96 | Earth | NaN | 0.004381 | 0.0 | Transit |
| 141 | EPIC 201757695.02 | 1884.0 | 14.974 | Terrestrial | 2020 | 0.688 | Earth | 0.908 | Earth | 0.0296 | 0.005476 | 0.0 | Transit |


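Among the first five `Terrestrial` rows in the dataset, the nearest exoplanet is `EPIC 206215704 b`, located 358 light-years from Earth. Because `head()` is applied before `sort_values()`, only those five rows are compared, so this is not necessarily the nearest terrestrial planet overall. Sorting the full subset first, as in `terrestrial_planets.sort_values(by='distance').head()`, would answer that question.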
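
To find the nearest terrestrial planet in the whole subset, the sort has to happen before the rows are truncated. The sketch below contrasts the two orderings on a made-up frame (the planet names and distances are hypothetical); `nsmallest` is a pandas shortcut that returns the same rows as sort-then-head.

```python
import pandas as pd

# Hypothetical data: the closest planets sit beyond the first five rows.
toy = pd.DataFrame({
    "name": [f"planet {c}" for c in "ABCDEFG"],
    "distance": [825.0, 840.0, 1884.0, 358.0, 1025.0, 41.0, 12.0],
})

# head() first: only the first five rows are ever considered.
head_then_sort = toy.head().sort_values(by="distance")

# Sort first: the five smallest distances in the whole frame.
sort_then_head = toy.sort_values(by="distance").head()

# nsmallest returns the same rows as sort-then-head.
nearest = toy.nsmallest(5, "distance")

print(head_then_sort.iloc[0]["name"])  # nearest among the first five rows
print(sort_then_head.iloc[0]["name"])  # nearest overall
```

Rows with a missing `distance` are placed last by `sort_values` by default, so they do not displace planets with a known distance.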