<a href="https://colab.research.google.com/github/Kostratana/NASA_project/blob/main/%22NASA_project_1%22.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This project uses data from the NASA Exoplanet Archive (https://exoplanetarchive.ipac.caltech.edu/cgi-bin/nstedAPI/nph-nstedAPI?) to analyze exoplanet candidates and to predict whether they could support life and whether water is present. The code retrieves the candidates through the NASA API, loads them into a DataFrame, and displays the first few rows. It then normalizes the candidates' characteristics, derives new features such as the product of orbital period and stellar insolation, and adds a binary indicator for the possibility of life based on temperature.
The project also builds a classification model with the Random Forest algorithm, trained on data balanced with the SMOTE method to improve predictions. According to the results, some candidates, such as K07849.01 and K03395.02, have a high chance of habitability, while others, such as K07106.01, are not suitable for life. The project also predicts the probability of water on each candidate: K07849.01 has the highest probability (0.60), followed closely by K07106.01 (0.58), which the project reads as a sign of their potential to support life.
Overall, the project shows how NASA data can be used to analyze exoplanet candidates and to predict their suitability for life and the presence of water, a key factor for future research and colonization of other planets.

Links to libraries:
requests: https://docs.python-requests.org/en/latest/
pandas: https://pandas.pydata.org/docs/
io: https://docs.python.org/3/library/io.html
scikit-learn: https://scikit-learn.org/stable/
numpy: https://numpy.org/doc/stable/
imblearn: https://imbalanced-learn.org/stable/
seaborn: https://seaborn.pydata.org/
matplotlib: https://matplotlib.org/stable/contents.html

1-This code retrieves candidate exoplanet data from the NASA Exoplanet Archive API, processes it into a DataFrame, and displays the first few rows of the data if the request is successful.

In [None]:
import requests
import pandas as pd
from io import StringIO

# Base URL for NASA Exoplanet Archive API
base_url = "https://exoplanetarchive.ipac.caltech.edu/cgi-bin/nstedAPI/nph-nstedAPI?"

# Define request parameters
table = "cumulative"  # Select the table with data
where_clause = "koi_disposition like 'CANDIDATE'"  # Condition to select candidates
order_by = "koi_period"  # Sort by orbital period
format_type = "csv"  # Data format

# Formulate the request URL
query_url = f"{base_url}table={table}&where={where_clause}&order={order_by}&format={format_type}"

# Execute the request (the timeout keeps the call from hanging indefinitely)
response = requests.get(query_url, timeout=60)

if response.status_code == 200:
    # Read data into DataFrame
    data = pd.read_csv(StringIO(response.text))

    # Output the first few rows of data
    print(data.head())
else:
    print(f"Error executing request: {response.status_code}")


     kepid kepoi_name  kepler_name koi_disposition koi_pdisposition  koi_score  koi_fpflag_nt  koi_fpflag_ss  koi_fpflag_co  koi_fpflag_ec  koi_period  koi_period_err1  koi_period_err2  koi_time0bk  koi_time0bk_err1  koi_time0bk_err2  koi_impact  koi_impact_err1  koi_impact_err2  koi_duration  koi_duration_err1  koi_duration_err2  koi_depth  koi_depth_err1  koi_depth_err2  koi_prad  koi_prad_err1  koi_prad_err2  koi_teq  koi_teq_err1  koi_teq_err2  koi_insol  koi_insol_err1  koi_insol_err2  koi_model_snr  koi_tce_plnt_num koi_tce_delivname  koi_steff  koi_steff_err1  koi_steff_err2  koi_slogg  koi_slogg_err1  koi_slogg_err2  koi_srad  koi_srad_err1  koi_srad_err2        ra_str       dec_str  koi_kepmag  koi_kepmag_err
0  7582691  K04419.01          NaN       CANDIDATE        CANDIDATE       1.00              0              0              0              0        0.26             0.00            -0.00       131.85              0.00             -0.00        0.97             0.04          
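
The request URL in step 1 embeds a `where` clause that contains spaces and quotes. As a minimal sketch, assuming the same endpoint and parameters as above, the query string can be percent-encoded explicitly with the standard library before it is sent. `build_query_url` is a hypothetical helper introduced here for illustration; it is not part of the NASA API or of the original notebook.

```python
from urllib.parse import urlencode, quote

# Same endpoint as in step 1, without the trailing "?"
BASE_URL = "https://exoplanetarchive.ipac.caltech.edu/cgi-bin/nstedAPI/nph-nstedAPI"


def build_query_url(table, where=None, order=None, fmt="csv"):
    """Hypothetical helper: assemble a percent-encoded query URL."""
    params = {"table": table}
    if where:
        params["where"] = where
    if order:
        params["order"] = order
    params["format"] = fmt
    # quote_via=quote encodes spaces as %20 and single quotes as %27
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_query_url(
    table="cumulative",
    where="koi_disposition like 'CANDIDATE'",
    order="koi_period",
)
print(url)
```

The resulting string can be passed to `requests.get` in place of `query_url`. Building the URL in a function also makes it easy to inspect the encoded query without touching the network.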

2-This code imports necessary libraries, creates a DataFrame with example exoplanet data, normalizes the numerical features using StandardScaler, and outputs a new DataFrame containing the normalized data along with the original planet names.

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assume you already have a DataFrame with data
# df = pd.read_csv('your_data.csv')  # Load your data

# Example data (replace with your data)
data = {
    'koi_period': [0.26, 0.31, 0.34, 0.35, 0.43],
    'koi_teq': [1559.00, 2318.00, 3653.00, 2194.00, 2017.00],
    'koi_prad': [1.22, 0.93, 2.09, 1.00, 0.87],
    'koi_srad': [4.74, 4.51, 3.85, 4.60, 4.57],
    'koi_insol': [0.53, 0.79, 2.18, 0.77, 0.78],
    'planet_names': ['Kepler-22b', 'Kepler-16b', 'Kepler-186f', 'Kepler-442b', 'Kepler-69c']  # Illustrative labels only; the sample values above are not these planets' measured parameters
}

df = pd.DataFrame(data)

# Normalization of data
scaler = StandardScaler()
normalized_data = scaler.fit_transform(df.drop(columns=['planet_names']))  # Exclude planet names from normalization

# Create a new DataFrame with normalized data
normalized_df = pd.DataFrame(normalized_data, columns=df.columns[:-1])  # Exclude planet names from columns

# Add planet names to the new DataFrame
normalized_df['planet_names'] = df['planet_names']

# Output normalized data
print(normalized_df)


   koi_period  koi_teq  koi_prad  koi_srad  koi_insol planet_names
0       -1.40    -1.13     -0.00      0.92      -0.81   Kepler-22b
1       -0.50    -0.04     -0.65      0.18      -0.37   Kepler-16b
2        0.04     1.86      1.93     -1.94       1.97  Kepler-186f
3        0.22    -0.22     -0.49      0.47      -0.40  Kepler-442b
4        1.65    -0.47     -0.78      0.37      -0.39   Kepler-69c
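
To make the normalization step concrete, the sketch below recomputes the `koi_period` column by hand. It assumes only the five example values from the cell above and checks that the manual z-score, (x - mean) / std with the population standard deviation, agrees with what `StandardScaler` returns.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# koi_period values from the example data above
periods = np.array([0.26, 0.31, 0.34, 0.35, 0.43])

# Manual z-score using the population standard deviation (ddof=0)
manual = (periods - periods.mean()) / periods.std(ddof=0)

# StandardScaler expects a 2-D array: five rows, one column
scaled = StandardScaler().fit_transform(periods.reshape(-1, 1)).ravel()

# Rounded to two decimals, this matches the koi_period column printed above
print(np.round(manual, 2))

# Both computations agree
print(np.allclose(manual, scaled))
```

Note that `pandas.Series.std()` uses the sample standard deviation (`ddof=1`) by default, so it gives slightly different z-scores unless `ddof=0` is passed.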


3-The DataFrame printed in step 2 contains the normalized features of the example exoplanets. Each numeric column is the z-score of the corresponding NASA Exoplanet Archive quantity:

koi_period: orbital period (days)
koi_teq: equilibrium temperature (K)
koi_prad: planetary radius (Earth radii)
koi_srad: stellar radius (solar radii)
koi_insol: insolation flux (relative to Earth)
planet_names: planet label, left unscaled


4-This code normalizes various features of exoplanets, calculates a new feature based on the product of koi_period and koi_insol, and adds a binary indicator for the possibility of life based on temperature, while retaining the original planet names.


In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Example data
data = {
    'koi_period': [0.26, 0.31, 0.34, 0.35, 0.43],
    'koi_teq': [1559.00, 2318.00, 3653.00, 2194.00, 2017.00],
    'koi_prad': [1.22, 0.93, 2.09, 1.00, 0.87],
    'koi_srad': [4.74, 4.51, 3.85, 4.60, 4.57],
    'koi_insol': [0.53, 0.79, 2.18, 0.77, 0.78],
    'planet_names': ['Kepler-22b', 'Kepler-16b', 'Kepler-186f', 'Kepler-442b', 'Kepler-69c']  # Illustrative labels only; the sample values above are not these planets' measured parameters
}

# Create DataFrame
df = pd.DataFrame(data)

# Create a new feature: product of koi_period and koi_insol
df['koi_period_insol'] = df['koi_period'] * df['koi_insol']

# Add a binary feature: "life possible" (e.g., if koi_teq < 3000)
df['life_possible'] = np.where(df['koi_teq'] < 3000, 1, 0)

# Normalize only the continuous features; keep the binary flag and the names as they are
feature_cols = [c for c in df.columns if c not in ('planet_names', 'life_possible')]
scaler = StandardScaler()
normalized_data = scaler.fit_transform(df[feature_cols])

# Create a new DataFrame, labeling each column with the feature it was computed from
# (df.columns[:-1] would mislabel the columns here, because planet_names is no longer last)
normalized_df = pd.DataFrame(normalized_data, columns=feature_cols)

# Add the unscaled binary flag and the planet names to the new DataFrame
normalized_df['life_possible'] = df['life_possible']
normalized_df['planet_names'] = df['planet_names']

# Output normalized data
print(normalized_df)


   koi_period  koi_teq  koi_prad  koi_srad  koi_insol  koi_period_insol  life_possible planet_names
0       -1.40    -1.13     -0.00      0.92      -0.81             -1.00              1   Kepler-22b
1       -0.50    -0.04     -0.65      0.18      -0.37             -0.49              1   Kepler-16b
2        0.04     1.86      1.93     -1.94       1.97              1.90              0  Kepler-186f
3        0.22    -0.22     -0.49      0.47      -0.40             -0.37              1  Kepler-442b
4        1.65    -0.47     -0.78      0.37      -0.39             -0.05              1   Kepler-69c
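
Because the engineered columns are added before scaling, it is worth checking that each scaled column still corresponds to the feature it is labeled with. The following is a minimal sanity check, not part of the original notebook, assuming a subset of the example columns: every scaled column should have mean 0 and population standard deviation 1, and `inverse_transform` should recover the original values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Subset of the example data, plus the engineered product feature
df = pd.DataFrame({
    'koi_period': [0.26, 0.31, 0.34, 0.35, 0.43],
    'koi_insol': [0.53, 0.79, 2.18, 0.77, 0.78],
})
df['koi_period_insol'] = df['koi_period'] * df['koi_insol']

# Scale by explicit column list so labels and values stay aligned
feature_cols = list(df.columns)
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df[feature_cols]), columns=feature_cols)

# Each scaled column should have mean ~0 and population std ~1
means_ok = np.allclose(scaled.mean(), 0.0)
stds_ok = np.allclose(scaled.std(ddof=0), 1.0)

# Undoing the scaling should give back the original feature values
restored = pd.DataFrame(scaler.inverse_transform(scaled.to_numpy()), columns=feature_cols)
round_trip_ok = np.allclose(restored, df[feature_cols])

print(means_ok, stds_ok, round_trip_ok)
```

If the column list passed to `pd.DataFrame` does not match the columns that were scaled, the round-trip comparison is the check most likely to expose it.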
