# OrbitalFish Exploratory Analysis
Author: SomeRandomTV (from ZiaTechnica)

Dataset: NASA Exoplanet

Task: Explore the dataset and get familiar with it. \
I want to know what each feature means and how the features correlate with each other.

In [27]:
import pandas as pd
import numpy as np

In [11]:
exoplanets = pd.read_csv('../data/exoplanets.csv')  # load the csv into a dataframe


## Data Exploration

Now that we have the data loaded, let's explore it. \
Some things we need to do are:
- Sanity checks
- Encoding categorical data
- Handling missing values
- Data type conversion
- Duplicate Removal
- Normalizing/scaling data
- Data Visualization
- Univariate Statistics & Distributions
- Outlier Detection & Treatment
- Correlation & Multicollinearity
- Feature–Target Relationships
- Class Imbalance / Target Distribution
- Cardinality & Rare Levels
- Dimensionality Reduction / Clustering Preview
- Feature Engineering Ideation
- Data Leakage Audit
- Sampling & Performance Considerations
- Automated Profiling & Documentation
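
A few of the cheaper items on this list can be tried on a handful of rows before running them on the full dataframe. The sketch below is only an illustration, not the final analysis: `sample` is a hypothetical five-row frame built from the first five rows of the dataset, and the variable names are my own. It covers duplicate detection, the target distribution, and a pairwise correlation.

```python
import pandas as pd

# Hypothetical stand-in for the full dataframe, using the first five rows.
sample = pd.DataFrame({
    "kepoi_name": ["K00752.01", "K00752.02", "K00753.01", "K00754.01", "K00755.01"],
    "koi_disposition": ["CONFIRMED", "CONFIRMED", "CANDIDATE", "FALSE POSITIVE", "CONFIRMED"],
    "koi_slogg": [4.467, 4.467, 4.544, 4.564, 4.438],
    "koi_srad": [0.927, 0.927, 0.868, 0.791, 1.046],
})

# Duplicate removal: count fully duplicated rows and repeated KOI names.
n_duplicate_rows = int(sample.duplicated().sum())
n_duplicate_kois = int(sample.duplicated(subset="kepoi_name").sum())

# Class imbalance: share of each label in the target column.
target_share = sample["koi_disposition"].value_counts(normalize=True)

# Correlation: Pearson correlation between two numeric features.
corr = sample[["koi_slogg", "koi_srad"]].corr()

print(n_duplicate_rows, n_duplicate_kois)
print(target_share)
print(corr)
```

On the real data, the same calls would run on `exoplanets` instead of `sample`; ID-like columns such as `kepid` are probably best left out of the correlation matrix.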

### 1) Dataframe sanity checks

Here is the rough order I follow:
- Dataframe info (features, types, etc.)
- Dataframe shape
- Null values (percentage of them)
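
These three checks can be bundled into one small helper. This is a sketch under assumptions: `sanity_report` is a hypothetical function name of mine (not a pandas API), and `sample` is a toy frame built from the first five rows of the dataset.

```python
import pandas as pd

def sanity_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with dtype, counts and percent missing."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "n_unique": df.nunique(),
        "null_pct": (df.isna().mean() * 100).round(2),  # share of rows that are null
    })

# Toy frame; kepler_name is missing for rows without a confirmed planet name.
sample = pd.DataFrame({
    "kepoi_name": ["K00752.01", "K00752.02", "K00753.01", "K00754.01", "K00755.01"],
    "kepler_name": ["Kepler-227 b", "Kepler-227 c", None, None, "Kepler-664 b"],
    "koi_score": [1.0, 0.969, 0.0, 0.0, 1.0],
})

report = sanity_report(sample)
print(sample.shape)
print(report)
```

Calling `sanity_report(exoplanets)` would give the same summary for the full dataset in a single table.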

After that do this:
- Encode categorical data
- Handle missing values
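
As a rough idea of what those two steps could look like, here is a minimal sketch. The choices in it are assumptions, not decisions made in this notebook: integer codes for `koi_disposition`, median imputation for `koi_score`, and the new column names `koi_disposition_code` and `koi_score_missing` are all hypothetical. One `koi_score` value is set to missing on purpose for the demo.

```python
import pandas as pd

# Toy frame from the first five rows; the None in koi_score is a made-up gap.
df = pd.DataFrame({
    "koi_disposition": ["CONFIRMED", "CONFIRMED", "CANDIDATE", "FALSE POSITIVE", "CONFIRMED"],
    "koi_score": [1.0, 0.969, None, 0.0, 1.0],
})

# Encode: map each label to an integer code (categories are sorted alphabetically).
df["koi_disposition_code"] = df["koi_disposition"].astype("category").cat.codes

# Handle missing values: remember which rows were missing, then fill with the median.
df["koi_score_missing"] = df["koi_score"].isna()
df["koi_score"] = df["koi_score"].fillna(df["koi_score"].median())

print(df)
```

Keeping the `koi_score_missing` flag means the information "this value was imputed" is not lost after filling.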

In [26]:
exoplanets.head()

|  | kepid | kepoi_name | kepler_name | koi_disposition | koi_pdisposition | koi_score | koi_fpflag_nt | koi_fpflag_ss | koi_fpflag_co | koi_fpflag_ec | ... | koi_steff_err2 | koi_slogg | koi_slogg_err1 | koi_slogg_err2 | koi_srad | koi_srad_err1 | koi_srad_err2 | ra | dec | koi_kepmag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10797460 | K00752.01 | Kepler-227 b | CONFIRMED | CANDIDATE | 1.0 | 0 | 0 | 0 | 0 | ... | -81.0 | 4.467 | 0.064 | -0.096 | 0.927 | 0.105 | -0.061 | 291.93423 | 48.141651 | 15.347 |
| 1 | 10797460 | K00752.02 | Kepler-227 c | CONFIRMED | CANDIDATE | 0.969 | 0 | 0 | 0 | 0 | ... | -81.0 | 4.467 | 0.064 | -0.096 | 0.927 | 0.105 | -0.061 | 291.93423 | 48.141651 | 15.347 |
| 2 | 10811496 | K00753.01 |  | CANDIDATE | CANDIDATE | 0.0 | 0 | 0 | 0 | 0 | ... | -176.0 | 4.544 | 0.044 | -0.176 | 0.868 | 0.233 | -0.078 | 297.00482 | 48.134129 | 15.436 |
| 3 | 10848459 | K00754.01 |  | FALSE POSITIVE | FALSE POSITIVE | 0.0 | 0 | 1 | 0 | 0 | ... | -174.0 | 4.564 | 0.053 | -0.168 | 0.791 | 0.201 | -0.067 | 285.53461 | 48.28521 | 15.597 |
| 4 | 10854555 | K00755.01 | Kepler-664 b | CONFIRMED | CANDIDATE | 1.0 | 0 | 0 | 0 | 0 | ... | -211.0 | 4.438 | 0.07 | -0.21 | 1.046 | 0.334 | -0.133 | 288.75488 | 48.2262 | 15.509 |


In [23]:
exoplanets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9564 entries, 0 to 9563
Data columns (total 49 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   kepid              9564 non-null   int64  
 1   kepoi_name         9564 non-null   object 
 2   kepler_name        2359 non-null   object 
 3   koi_disposition    9564 non-null   object 
 4   koi_pdisposition   9564 non-null   object 
 5   koi_score          8054 non-null   float64
 6   koi_fpflag_nt      9564 non-null   int64  
 7   koi_fpflag_ss      9564 non-null   int64  
 8   koi_fpflag_co      9564 non-null   int64  
 9   koi_fpflag_ec      9564 non-null   int64  
 10  koi_period         9564 non-null   float64
 11  koi_period_err1    9110 non-null   float64
 12  koi_period_err2    9110 non-null   float64
 13  koi_time0bk        9564 non-null   float64
 14  koi_time0bk_err1   9110 non-null   float64
 15  koi_time0bk_err2   9110 non-null   float64
 16  koi_impact         9201 

In [24]:
exoplanets.shape

(9564, 49)

In [25]:
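exoplanets.isnull().sum() / 100  # null count divided by 100; this is not a percentage of rows (exoplanets.isnull().mean() * 100 would give that)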

kepid                 0.00
kepoi_name            0.00
kepler_name          72.05
koi_disposition       0.00
koi_pdisposition      0.00
koi_score            15.10
koi_fpflag_nt         0.00
koi_fpflag_ss         0.00
koi_fpflag_co         0.00
koi_fpflag_ec         0.00
koi_period            0.00
koi_period_err1       4.54
koi_period_err2       4.54
koi_time0bk           0.00
koi_time0bk_err1      4.54
koi_time0bk_err2      4.54
koi_impact            3.63
koi_impact_err1       4.54
koi_impact_err2       4.54
koi_duration          0.00
koi_duration_err1     4.54
koi_duration_err2     4.54
koi_depth             3.63
koi_depth_err1        4.54
koi_depth_err2        4.54
koi_prad              3.63
koi_prad_err1         3.63
koi_prad_err2         3.63
koi_teq               3.63
koi_teq_err1         95.64
koi_teq_err2         95.64
koi_insol             3.21
koi_insol_err1        3.21
koi_insol_err2        3.21
koi_model_snr         3.63
koi_tce_plnt_num      3.46
koi_tce_delivname     3.46
k
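
Note that the numbers above are null counts divided by 100, not percentages of the 9564 rows. The sketch below shows the difference for `kepler_name`, which `info()` reports as 2359 non-null out of 9564. The series is a synthetic reconstruction with placeholder values; only the two counts come from the output above.

```python
import numpy as np
import pandas as pd

n_rows, n_non_null = 9564, 2359  # taken from the info() output

# Synthetic column: placeholder strings for named planets, NaN for the rest.
kepler_name = pd.Series(["x"] * n_non_null + [np.nan] * (n_rows - n_non_null))

count_over_100 = kepler_name.isnull().sum() / 100  # what the cell above computes
null_pct = kepler_name.isnull().mean() * 100       # actual percentage of null rows

print(count_over_100)
print(round(null_pct, 2))
```

The two values only agree when the dataframe has exactly 10,000 rows, so `isnull().mean() * 100` is the safer way to get a percentage.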

#### Sanity checks results

There are:
- 48 features (49 columns in total, per `exoplanets.shape`)
-