# Classification of Hazardous and Non-Hazardous Near-Earth Objects

### Introduction
A near-Earth object (NEO) is an asteroid or comet whose orbit brings it within 1.3 astronomical units of the Sun (one astronomical unit is approximately 150 million kilometres), and therefore into Earth's orbital neighbourhood.<sup>[1](#1)</sup> One motive for studying NEOs is to prevent collisions with Earth, which can disrupt natural processes on the planet.<sup>[2](#1)</sup> The JPL Center for NEO Studies (CNEOS) conducts research on these objects by computing their orbits and assessing the impact risk of individual NEOs over time.<sup>[3](#1)</sup>

In this project, we want to build on the research done by the JPL. Before assessing the potential threat of a NEO, we must predict its trajectory by studying attributes such as its size, mass, composition and speed.<sup>[2](#1)</sup> However, the data collected is imperfect, because calculations are difficult for irregular objects and the measurements are taken at various times, distances and phase angles.<sup>[3](#1)</sup> Considering this, we want to create a binary classifier that uses the K-nearest neighbours algorithm to predict whether a given NEO is hazardous based on various quantitative measurements. As NEOs are constantly monitored and their threat status changes with updated trajectory predictions, it is more important that our classifier correctly identifies a hazardous NEO as hazardous (i.e., high recall) than that it avoids false positives (i.e., high precision).<sup>[3](#1)</sup> Our question is: what predictors and K value will produce a classifier that predicts the hazardous designation of a NEO with the highest accuracy and recall?
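
To make the difference between these metrics concrete, the short sketch below computes accuracy, precision and recall from a small set of labels. The `truth` and `predicted` vectors are hypothetical values invented for illustration, not results from this project; `TRUE` stands for a hazardous NEO.

```r
# Hypothetical labels for ten NEOs (TRUE = hazardous); illustration only
truth     <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
predicted <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

tp <- sum(truth & predicted)    # hazardous NEOs correctly flagged
fn <- sum(truth & !predicted)   # hazardous NEOs missed by the classifier
fp <- sum(!truth & predicted)   # non-hazardous NEOs flagged as hazardous
tn <- sum(!truth & !predicted)  # non-hazardous NEOs correctly left unflagged

accuracy  <- (tp + tn) / length(truth)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

round(c(accuracy = accuracy, precision = precision, recall = recall), 2)
```

In this toy example one hazardous object in four is missed, so recall is 0.75, while two false alarms bring precision down to 0.6. For our purposes the missed object is the more costly error.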

### NEO Dataset
The dataset we selected compiles the list of NASA-certified asteroids that are classified as NEOs (i.e., near-Earth asteroids, or NEAs).<sup>[4](#1)</sup> It contains the following ten variables:

1. id (unique identifier for each asteroid)
2. name (name of asteroid given by NASA)
3. est_diameter_min (minimum estimated diameter in kilometres)
4. est_diameter_max (maximum estimated diameter in kilometres)
5. relative_velocity (velocity relative to Earth)
6. miss_distance (distance in kilometres by which the asteroid misses Earth)
7. orbiting_body (planet that the asteroid orbits)
8. sentry_object (included in JPL Sentry System - an automated collision monitoring system)
9. absolute_magnitude (describes intrinsic luminosity)
10. hazardous (boolean factor that indicates whether asteroid is harmful or not)

Notes:
- Absolute magnitude (H) is a measure of the asteroid's intrinsic mean brightness. An asteroid reflects sunlight rather than emitting light, so its brightness depends on both its size and its albedo (a), i.e., its average reflectivity. Because the albedo is uncertain, so is any size derived from H. In this dataset, the researchers used an assumed albedo value such that a spherical NEA 1 km in diameter corresponds to H = 17.75. The diameter of an asteroid can be estimated from its H and geometric albedo value.<sup>[5](#1)</sup>
- We could not find a definition of "miss_distance" in the JPL database. We assume that the dataset's author derived it from the close-approach distance columns, i.e., that it describes how close the asteroid comes to Earth.<sup>[6](#1)</sup>
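
The link between absolute magnitude and diameter described in the first note can be sketched with the standard conversion D = (1329 km / sqrt(a)) × 10^(-0.2 H). The helper `estimate_diameter_km` is a hypothetical name introduced here, and the albedo bounds of 0.25 and 0.05 are our assumption: they reproduce the first row of the dataset preview (H = 16.73, diameters of about 1.198 km and 2.679 km), but the dataset's documentation does not state them.

```r
# Hypothetical helper: diameter (km) of a spherical asteroid from its
# absolute magnitude H and geometric albedo, using the standard relation
# D = 1329 / sqrt(albedo) * 10^(-0.2 * H)
estimate_diameter_km <- function(h, albedo) {
  1329 / sqrt(albedo) * 10^(-0.2 * h)
}

# Assumed albedo bounds: a brighter surface implies a smaller object
d_min <- estimate_diameter_km(16.73, albedo = 0.25)
d_max <- estimate_diameter_km(16.73, albedo = 0.05)
round(c(est_diameter_min = d_min, est_diameter_max = d_max), 3)

# H = 17.75 with an assumed albedo of 0.14 gives a diameter close to 1 km
round(estimate_diameter_km(17.75, albedo = 0.14), 3)
```

If this assumption holds, est_diameter_min and est_diameter_max are deterministic functions of absolute_magnitude, so the three columns carry the same information. This is worth keeping in mind when choosing predictors.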

In [3]:
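# Run this cell before continuing
# cowplot is not part of the tidyverse; if library(cowplot) fails,
# install it once with install.packages("cowplot")
library(tidyverse)
library(repr)
library(tidymodels)
library(cowplot) # provides plot_grid(), used to arrange the histograms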


In [4]:
options(repr.plot.width = 10, repr.plot.height = 5)
url <- "https://raw.githubusercontent.com/LongTortue/DSCI100-Group-Project/main/neo.csv"
neo_data <- read_csv(url)

# transform the categorical variables into factors
neo_data <- neo_data |>
  mutate(sentry_object = as_factor(sentry_object), hazardous = as_factor(hazardous))
head(neo_data)

# As the data is unbalanced (roughly 81,000 FALSE to 9,000 TRUE), we create a
# balanced dataset by sampling 400 observations from each class

neo_data_false <- neo_data |>
  filter(hazardous == "FALSE") |>
  sample_n(400)
neo_data_true <- neo_data |>
  filter(hazardous == "TRUE") |>
  sample_n(400)
neo_data_balanced <- rbind(neo_data_false, neo_data_true) # balanced dataset (800 rows)

neo_data_train <- sample_n(neo_data_balanced, 300) # training dataset

neo_count <- neo_data_train |>
  group_by(hazardous) |>
  summarize(count = n())
neo_count # number of observations in each class

neo_count_orbiting <- neo_data_train |>
  group_by(orbiting_body) |>
  summarize(count = n())
neo_count_orbiting # all observations have orbiting bodies of Earth

neo_mean <- neo_data_train |>
  select(-id, -name, -orbiting_body, -sentry_object, -hazardous) |>
  map_df(mean)
neo_mean # mean of predictors

neo_na_count <- sum(is.na(neo_data_train))
neo_na_count # no missing values in our dataset

# histograms of each predictor's distribution, arranged next to each other
neo_plot_max_diameter <- neo_data_train |>
  ggplot(aes(x = est_diameter_max)) +
  geom_histogram() +
  xlab("Estimated Max Diameter (km)")

neo_plot_min_diameter <- neo_data_train |>
  ggplot(aes(x = est_diameter_min)) +
  geom_histogram() +
  xlab("Estimated Min Diameter (km)")

neo_plot_velocity <- neo_data_train |>
  ggplot(aes(x = relative_velocity)) +
  geom_histogram() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Velocity (Relative to Earth)")

neo_plot_miss_distance <- neo_data_train |>
  ggplot(aes(x = miss_distance)) +
  geom_histogram() +
  scale_x_log10(labels = label_comma()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Miss Distance (km)")

neo_plot_absolute_magnitude <- neo_data_train |>
  ggplot(aes(x = absolute_magnitude)) +
  geom_histogram() +
  xlab("Absolute Magnitude")

plot_grid(neo_plot_max_diameter, neo_plot_min_diameter, neo_plot_velocity,
          neo_plot_miss_distance, neo_plot_absolute_magnitude)

Rows: 90836 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): name, orbiting_body
dbl (6): id, est_diameter_min, est_diameter_max, relative_velocity, miss_dis...
lgl (2): sentry_object, hazardous

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| id | name | est_diameter_min | est_diameter_max | relative_velocity | miss_distance | orbiting_body | sentry_object | absolute_magnitude | hazardous |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<fct>` | `<dbl>` | `<fct>` |
| 2162635 | 162635 (2000 SS164) | 1.1982708 | 2.67941497 | 13569.25 | 54839744 | Earth | False | 16.73 | False |
| 2277475 | 277475 (2005 WK4) | 0.2658 | 0.59434687 | 73588.73 | 61438127 | Earth | False | 20.0 | True |
| 2512244 | 512244 (2015 YE18) | 0.72202956 | 1.61450717 | 114258.69 | 49798725 | Earth | False | 17.83 | False |
| 3596030 | (2012 BV13) | 0.09650615 | 0.2157943 | 24764.3 | 25434973 | Earth | False | 22.2 | False |
| 3667127 | (2014 GE35) | 0.25500869 | 0.57021676 | 42737.73 | 46275567 | Earth | False | 20.09 | True |
| 54138696 | (2021 GY23) | 0.03635423 | 0.08129053 | 34297.59 | 40585691 | Earth | False | 24.32 | False |


| hazardous | count |
| --- | --- |
| `<fct>` | `<int>` |
| False | 162 |
| True | 138 |


| orbiting_body | count |
| --- | --- |
| `<chr>` | `<int>` |
| Earth | 300 |


| est_diameter_min | est_diameter_max | relative_velocity | miss_distance | absolute_magnitude |
| --- | --- | --- | --- | --- |
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 0.1900522 | 0.4249696 | 54500.54 | 40590809 | 22.32958 |




### Expected outcomes and significance
Through this data analysis, we expect to be able to classify whether a newly observed object near Earth poses a potential risk to our safety. We will use predictors such as miss distance, size, and relative velocity to evaluate whether a near-Earth object (NEO) is hazardous (i.e., capable of causing significant regional damage) or not.
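
The sketch below shows one way the search for K could be set up with tidymodels. It is not the project's final analysis: the `toy` data frame is synthetic stand-in data, and the choice of two predictors, five folds and the grid of K values are all assumptions made for illustration. It also assumes the kknn package is installed. Because `hazardous` has levels `FALSE` and `TRUE`, the hazardous class is the second level, so `event_level = "second"` is set to make recall refer to hazardous NEOs.

```r
library(tidymodels)

set.seed(1)
# Synthetic stand-in for the balanced NEO data (illustration only)
n <- 200
toy <- tibble(
  absolute_magnitude = c(rnorm(n / 2, mean = 20), rnorm(n / 2, mean = 24)),
  miss_distance = runif(n, min = 1e6, max = 7e7),
  hazardous = factor(rep(c("TRUE", "FALSE"), each = n / 2),
                     levels = c("FALSE", "TRUE"))
)

# K-nearest neighbours model with K left open for tuning
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
  set_engine("kknn") |>
  set_mode("classification")

# Standardize predictors so that neither one dominates the distance
neo_recipe <- recipe(hazardous ~ absolute_magnitude + miss_distance, data = toy) |>
  step_scale(all_predictors()) |>
  step_center(all_predictors())

folds <- vfold_cv(toy, v = 5, strata = hazardous)

results <- workflow() |>
  add_recipe(neo_recipe) |>
  add_model(knn_spec) |>
  tune_grid(
    resamples = folds,
    grid = tibble(neighbors = seq(1, 15, by = 2)),
    metrics = metric_set(accuracy, recall),
    control = control_grid(event_level = "second") # "TRUE" is the event
  ) |>
  collect_metrics()

# K with the highest cross-validated recall
best_k <- results |>
  filter(.metric == "recall") |>
  slice_max(mean, n = 1, with_ties = FALSE)
best_k
```

Repeating this for different predictor sets and comparing the cross-validated accuracy and recall would address both parts of our question.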

### What impact could such findings have?
Such findings bear on the safety of people on Earth. With a reliable classification model, we could determine whether a NEO poses a significant risk to human life and decide whether to take action, such as evacuating areas at risk or attempting to destroy the object. Being able to classify an object's hazardousness from its physical attributes allows us to prepare ahead of time and minimize potential damage.

### What future questions could this lead to?
Future work could examine the types of damage NEOs can cause and their impacts. Researchers could also analyze the effectiveness of the precautions taken and learn how to improve them to minimize potential harm. Additionally, they may identify more predictive attributes of a NEO and develop even better classifiers.

<a id="1"></a> 
### References
1. Keeping an eye on Space Rocks. (n.d.) NASA/JPL Caltech. https://www.jpl.nasa.gov/keeping-an-eye-on-space-rocks
2. NEO Basics. (n.d.) NASA/JPL CNEOS. https://cneos.jpl.nasa.gov/about/target_earth.html
3. Impact Risk. (n.d.) NASA/JPL CNEOS. https://cneos.jpl.nasa.gov/risk/intro.html
4. Vani, S. NASA - Nearest Earth Objects. (n.d.) Kaggle. Retrieved October 26, 2023, from https://www.kaggle.com/datasets/sameepvani/nasa-nearest-earth-objects
5. Discovery Statistics. (n.d.) NASA/JPL CNEOS. https://cneos.jpl.nasa.gov/stats/
6. JPLraw. (2020, April 27). Web Tutorial: How to Navigate the CNEOS Website [Video]. YouTube. https://www.youtube.com/watch?v=UA6voCyCW1g&ab_channel=JPLraw