# NASA Near-Earth Object Analysis

This notebook loads NASA Near-Earth Object (NEO) data, cleans it, and selects numerical features for clustering and pattern discovery.


In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score


In [23]:
df = pd.read_csv('nasa_neo_browse_dataset.csv')

## Data Loading & Initial Inspection

Loading the NASA NEO dataset and examining its basic properties.


In [24]:
df.shape

(110, 19)

In [25]:
df.columns

Index(['id', 'neo_reference_id', 'name', 'nasa_jpl_url',
       'absolute_magnitude_h', 'is_potentially_hazardous_asteroid',
       'close_approach_data', 'is_sentry_object', 'links.self',
       'estimated_diameter.kilometers.estimated_diameter_min',
       'estimated_diameter.kilometers.estimated_diameter_max',
       'estimated_diameter.meters.estimated_diameter_min',
       'estimated_diameter.meters.estimated_diameter_max',
       'estimated_diameter.miles.estimated_diameter_min',
       'estimated_diameter.miles.estimated_diameter_max',
       'estimated_diameter.feet.estimated_diameter_min',
       'estimated_diameter.feet.estimated_diameter_max', 'sentry_data',
       'observation_date'],
      dtype='object')

In [26]:
df.head()

Unnamed: 0,id,neo_reference_id,name,nasa_jpl_url,absolute_magnitude_h,is_potentially_hazardous_asteroid,close_approach_data,is_sentry_object,links.self,estimated_diameter.kilometers.estimated_diameter_min,estimated_diameter.kilometers.estimated_diameter_max,estimated_diameter.meters.estimated_diameter_min,estimated_diameter.meters.estimated_diameter_max,estimated_diameter.miles.estimated_diameter_min,estimated_diameter.miles.estimated_diameter_max,estimated_diameter.feet.estimated_diameter_min,estimated_diameter.feet.estimated_diameter_max,sentry_data,observation_date
0,2436030,2436030,436030 (2009 JO2),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,19.44,False,"[{'close_approach_date': '2025-04-20', 'close_...",False,http://api.nasa.gov/neo/rest/v1/neo/2436030?ap...,0.343997,0.769201,343.997255,769.201245,0.21375,0.477959,1128.599953,2523.626214,,2025-04-20
1,3137735,3137735,(2002 TX59),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,23.9,False,"[{'close_approach_date': '2025-04-20', 'close_...",False,http://api.nasa.gov/neo/rest/v1/neo/3137735?ap...,0.044112,0.098637,44.11182,98.637028,0.02741,0.06129,144.723824,323.612307,,2025-04-20
2,3153509,3153509,(2003 HM),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,22.04,True,"[{'close_approach_date': '2025-04-20', 'close_...",False,http://api.nasa.gov/neo/rest/v1/neo/3153509?ap...,0.103886,0.232295,103.88551,232.295062,0.064551,0.144341,340.831737,762.122933,,2025-04-20
3,3654379,3654379,(2013 XV8),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,21.89,False,"[{'close_approach_date': '2025-04-20', 'close_...",False,http://api.nasa.gov/neo/rest/v1/neo/3654379?ap...,0.111315,0.248909,111.31533,248.908644,0.069168,0.154665,365.207786,816.629435,,2025-04-20
4,3830886,3830886,(2018 SN2),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,24.4,False,"[{'close_approach_date': '2025-04-20', 'close_...",False,http://api.nasa.gov/neo/rest/v1/neo/3830886?ap...,0.035039,0.07835,35.039264,78.350176,0.021772,0.048685,114.958219,257.054393,,2025-04-20


In [27]:
df.isnull().sum()

id                                                        0
neo_reference_id                                          0
name                                                      0
nasa_jpl_url                                              0
absolute_magnitude_h                                      0
is_potentially_hazardous_asteroid                         0
close_approach_data                                       0
is_sentry_object                                          0
links.self                                                0
estimated_diameter.kilometers.estimated_diameter_min      0
estimated_diameter.kilometers.estimated_diameter_max      0
estimated_diameter.meters.estimated_diameter_min          0
estimated_diameter.meters.estimated_diameter_max          0
estimated_diameter.miles.estimated_diameter_min           0
estimated_diameter.miles.estimated_diameter_max           0
estimated_diameter.feet.estimated_diameter_min            0
estimated_diameter.feet.estimated_diamet

## Missing Value Analysis

Checking for null values to inform cleaning decisions.
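One way to turn the null counts into a cleaning decision is to look at the fraction of missing values per column and drop anything above a cutoff. The sketch below uses a tiny made-up frame rather than the real dataset, and the 0.5 threshold is an arbitrary assumption chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the NEO frame (values are illustrative, not the real data).
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "absolute_magnitude_h": [19.44, 23.9, 22.04, 21.89],
    "sentry_data": [np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values per column (0.0 = complete, 1.0 = entirely null).
null_fraction = toy.isnull().mean()

# Hypothetical cutoff: drop columns that are more than half empty.
threshold = 0.5
to_drop = null_fraction[null_fraction > threshold].index.tolist()

cleaned = toy.drop(columns=to_drop)
print(to_drop)
print(cleaned.shape)
```

Using `isnull().mean()` instead of `isnull().sum()` makes the decision independent of the number of rows, which helps if the dataset grows later.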


##### Because `sentry_data` has a lot of null values, I am going to remove that column (along with the identifier columns `id` and `neo_reference_id`)


In [28]:
df.drop(['sentry_data','id','neo_reference_id'],axis=1, inplace=True)

In [29]:
df.isnull().sum()

name                                                    0
nasa_jpl_url                                            0
absolute_magnitude_h                                    0
is_potentially_hazardous_asteroid                       0
close_approach_data                                     0
is_sentry_object                                        0
links.self                                              0
estimated_diameter.kilometers.estimated_diameter_min    0
estimated_diameter.kilometers.estimated_diameter_max    0
estimated_diameter.meters.estimated_diameter_min        0
estimated_diameter.meters.estimated_diameter_max        0
estimated_diameter.miles.estimated_diameter_min         0
estimated_diameter.miles.estimated_diameter_max         0
estimated_diameter.feet.estimated_diameter_min          0
estimated_diameter.feet.estimated_diameter_max          0
observation_date                                        0
dtype: int64

##### Choosing columns for further processing


In [30]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110 entries, 0 to 109
Data columns (total 16 columns):
 #   Column                                                Non-Null Count  Dtype  
---  ------                                                --------------  -----  
 0   name                                                  110 non-null    object 
 1   nasa_jpl_url                                          110 non-null    object 
 2   absolute_magnitude_h                                  110 non-null    float64
 3   is_potentially_hazardous_asteroid                     110 non-null    bool   
 4   close_approach_data                                   110 non-null    object 
 5   is_sentry_object                                      110 non-null    bool   
 6   links.self                                            110 non-null    object 
 7   estimated_diameter.kilometers.estimated_diameter_min  110 non-null    float64
 8   estimated_diameter.kilometers.estimated_diameter_max  110 no

In [31]:
df['is_potentially_hazardous_asteroid'] = df['is_potentially_hazardous_asteroid'].astype(np.int64)
df['is_sentry_object'] = df['is_sentry_object'].astype(np.int64)

In [32]:
df.head()

Unnamed: 0,name,nasa_jpl_url,absolute_magnitude_h,is_potentially_hazardous_asteroid,close_approach_data,is_sentry_object,links.self,estimated_diameter.kilometers.estimated_diameter_min,estimated_diameter.kilometers.estimated_diameter_max,estimated_diameter.meters.estimated_diameter_min,estimated_diameter.meters.estimated_diameter_max,estimated_diameter.miles.estimated_diameter_min,estimated_diameter.miles.estimated_diameter_max,estimated_diameter.feet.estimated_diameter_min,estimated_diameter.feet.estimated_diameter_max,observation_date
0,436030 (2009 JO2),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,19.44,0,"[{'close_approach_date': '2025-04-20', 'close_...",0,http://api.nasa.gov/neo/rest/v1/neo/2436030?ap...,0.343997,0.769201,343.997255,769.201245,0.21375,0.477959,1128.599953,2523.626214,2025-04-20
1,(2002 TX59),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,23.9,0,"[{'close_approach_date': '2025-04-20', 'close_...",0,http://api.nasa.gov/neo/rest/v1/neo/3137735?ap...,0.044112,0.098637,44.11182,98.637028,0.02741,0.06129,144.723824,323.612307,2025-04-20
2,(2003 HM),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,22.04,1,"[{'close_approach_date': '2025-04-20', 'close_...",0,http://api.nasa.gov/neo/rest/v1/neo/3153509?ap...,0.103886,0.232295,103.88551,232.295062,0.064551,0.144341,340.831737,762.122933,2025-04-20
3,(2013 XV8),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,21.89,0,"[{'close_approach_date': '2025-04-20', 'close_...",0,http://api.nasa.gov/neo/rest/v1/neo/3654379?ap...,0.111315,0.248909,111.31533,248.908644,0.069168,0.154665,365.207786,816.629435,2025-04-20
4,(2018 SN2),https://ssd.jpl.nasa.gov/tools/sbdb_lookup.htm...,24.4,0,"[{'close_approach_date': '2025-04-20', 'close_...",0,http://api.nasa.gov/neo/rest/v1/neo/3830886?ap...,0.035039,0.07835,35.039264,78.350176,0.021772,0.048685,114.958219,257.054393,2025-04-20


In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110 entries, 0 to 109
Data columns (total 16 columns):
 #   Column                                                Non-Null Count  Dtype  
---  ------                                                --------------  -----  
 0   name                                                  110 non-null    object 
 1   nasa_jpl_url                                          110 non-null    object 
 2   absolute_magnitude_h                                  110 non-null    float64
 3   is_potentially_hazardous_asteroid                     110 non-null    int64  
 4   close_approach_data                                   110 non-null    object 
 5   is_sentry_object                                      110 non-null    int64  
 6   links.self                                            110 non-null    object 
 7   estimated_diameter.kilometers.estimated_diameter_min  110 non-null    float64
 8   estimated_diameter.kilometers.estimated_diameter_max  110 no

In [34]:
numerical_features = df.select_dtypes(include=[np.number]).columns.tolist()

In [35]:
numerical_features

['absolute_magnitude_h',
 'is_potentially_hazardous_asteroid',
 'is_sentry_object',
 'estimated_diameter.kilometers.estimated_diameter_min',
 'estimated_diameter.kilometers.estimated_diameter_max',
 'estimated_diameter.meters.estimated_diameter_min',
 'estimated_diameter.meters.estimated_diameter_max',
 'estimated_diameter.miles.estimated_diameter_min',
 'estimated_diameter.miles.estimated_diameter_max',
 'estimated_diameter.feet.estimated_diameter_min',
 'estimated_diameter.feet.estimated_diameter_max']

In [36]:
df = df[numerical_features]

In [37]:
df.head()

Unnamed: 0,absolute_magnitude_h,is_potentially_hazardous_asteroid,is_sentry_object,estimated_diameter.kilometers.estimated_diameter_min,estimated_diameter.kilometers.estimated_diameter_max,estimated_diameter.meters.estimated_diameter_min,estimated_diameter.meters.estimated_diameter_max,estimated_diameter.miles.estimated_diameter_min,estimated_diameter.miles.estimated_diameter_max,estimated_diameter.feet.estimated_diameter_min,estimated_diameter.feet.estimated_diameter_max
0,19.44,0,0,0.343997,0.769201,343.997255,769.201245,0.21375,0.477959,1128.599953,2523.626214
1,23.9,0,0,0.044112,0.098637,44.11182,98.637028,0.02741,0.06129,144.723824,323.612307
2,22.04,1,0,0.103886,0.232295,103.88551,232.295062,0.064551,0.144341,340.831737,762.122933
3,21.89,0,0,0.111315,0.248909,111.31533,248.908644,0.069168,0.154665,365.207786,816.629435
4,24.4,0,0,0.035039,0.07835,35.039264,78.350176,0.021772,0.048685,114.958219,257.054393


## Feature Selection Results

The selected numerical features include:
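- Absolute magnitude `absolute_magnitude_h` (intrinsic brightness; lower values mean brighter objects)
- Binary indicators (`is_potentially_hazardous_asteroid`, `is_sentry_object`), converted from bool to 0/1
- Estimated minimum and maximum diameters, repeated in four unit systems (kilometers, meters, miles, feet)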

These features will be used for clustering to identify groups of similar NEOs.
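Before clustering, it is worth checking how independent these features really are. The sketch below assumes the commonly used conversion from absolute magnitude to diameter, D = 1329 / sqrt(p) * 10^(-H/5) km, with albedo p = 0.25 for the minimum estimate and p = 0.05 for the maximum. Those albedo values are an assumption inferred from the rows shown by `df.head()`, not something stated in the dataset, and `estimated_diameter_km` is a hypothetical helper name.

```python
import math

def estimated_diameter_km(h, albedo):
    """Diameter in km from absolute magnitude h and an assumed geometric albedo."""
    return 1329.0 / math.sqrt(albedo) * 10 ** (-0.2 * h)

# Absolute magnitude of the first row shown by df.head(): 436030 (2009 JO2).
h = 19.44

# Assumed albedos: a brighter surface (0.25) gives the smaller diameter,
# a darker surface (0.05) gives the larger one.
d_min = estimated_diameter_km(h, 0.25)
d_max = estimated_diameter_km(h, 0.05)

print(round(d_min, 6), round(d_max, 6))
```

For H = 19.44 this reproduces the first row's kilometer values (0.343997 and 0.769201) to within rounding. If the relation holds for every row, the diameter columns carry no information beyond `absolute_magnitude_h`, which matters when interpreting any clusters found later.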

## Next Steps

1. Remove redundant features (different unit measurements of the same values)
2. Apply feature scaling to normalize the data before clustering
3. Perform dimensionality reduction with PCA
4. Apply KMeans clustering and evaluate with silhouette score
5. Visualize and interpret the results
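Steps 1 to 4 can be sketched as follows. This is a minimal outline under stated assumptions, not the notebook's actual results: it runs on randomly generated stand-in data (the `toy` frame is hypothetical), keeps only the kilometer columns as the single unit system, reduces to two principal components, and tries k from 2 to 5. All of those choices are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric frame (NOT the real NEO data).
rng = np.random.default_rng(0)
km_min = rng.uniform(0.01, 1.0, size=110)
toy = pd.DataFrame({
    "absolute_magnitude_h": rng.uniform(17.0, 28.0, size=110),
    "is_potentially_hazardous_asteroid": rng.integers(0, 2, size=110),
    "estimated_diameter.kilometers.estimated_diameter_min": km_min,
    "estimated_diameter.meters.estimated_diameter_min": km_min * 1000.0,
})

# Step 1: keep one unit system and drop the other diameter columns.
redundant = [
    c for c in toy.columns
    if c.startswith("estimated_diameter.") and ".kilometers." not in c
]
features = toy.drop(columns=redundant)

# Step 2: standardize so no feature dominates the distance computation.
X_scaled = StandardScaler().fit_transform(features)

# Step 3: project onto two principal components.
X_pca = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Step 4: fit KMeans for several k and score each with the silhouette.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_pca)
    scores[k] = silhouette_score(X_pca, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

On the real data, `toy` would be replaced by the `df` built above. The silhouette score ranges from -1 to 1, with higher values indicating better separated clusters, so `best_k` is only a starting point for the visual inspection in step 5.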

