**Classifying Exoplanets: Exploring NASA's Kepler Space Observatory Dataset**

Space has always beckoned humanity to explore its depths. One of the most fascinating subjects in astronomical research is exoplanets, planets that orbit stars beyond our solar system. Examining exoplanets gives insight into planetary formation and evolution, critical information when searching for habitable planets.

The Kepler Space Observatory was a NASA space telescope designed to find exoplanets, particularly roughly Earth-sized planets located within the habitable zones of their parent stars. From 2009 to 2018, Kepler monitored the brightness of stars for the periodic dips caused by transiting planets. Promising signals were catalogued as Kepler Objects of Interest (KOIs) and, after vetting, labeled as confirmed planets, candidates, or false positives. The mission revealed thousands of planets and revolutionized our understanding of extrasolar systems.

Our primary question is: *Can we accurately classify celestial bodies as exoplanets based on their observed characteristics using the Kepler exoplanet dataset?*

Our project will analyze NASA's cumulative Kepler exoplanet dataset. For each Kepler Object of Interest, this dataset records attributes such as its radius, its transit properties (orbital period, transit depth, transit duration, and impact parameter), and properties of its host star. Using these attributes, we hope to develop a predictive model that distinguishes genuine exoplanets from false positives, furthering our understanding of the universe.
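To see why transit depth and planetary radius carry related information, a simple geometric approximation (assumed here; it ignores limb darkening and grazing transits) relates the fractional drop in starlight $\delta$ to the ratio of the planet's radius $R_p$ to the star's radius $R_\star$. Kepler reports depth in parts per million (ppm):

$$
\delta \approx \left(\frac{R_p}{R_\star}\right)^2, \qquad \delta_{\text{ppm}} = 10^6\,\delta
$$

$$
\text{Earth and Sun: } \frac{6371\ \text{km}}{695{,}700\ \text{km}} \approx 0.00916 \;\Rightarrow\; \delta \approx 8.4\times10^{-5} \approx 84\ \text{ppm}
$$

$$
\text{Jupiter and Sun: } \frac{69{,}911\ \text{km}}{695{,}700\ \text{km}} \approx 0.1005 \;\Rightarrow\; \delta \approx 0.0101 \approx 10{,}100\ \text{ppm}
$$

Read in reverse, depths of 400 to 3,600 ppm correspond to radius ratios of about 0.02 to 0.06, which is why a very deep transit hints at an object much larger than a typical planet.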

In [17]:
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
## Helper scripts: source them only if they exist in the working directory,
## because source() stops with "cannot open the connection" when a file is missing
if (file.exists("tests.R")) source("tests.R")
if (file.exists("cleanup.R")) source("cleanup.R")

── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──

✔ broom        1.0.2     ✔ rsample      1.1.1
✔ dials        1.1.0     ✔ tune         1.0.1
✔ infer        1.0.4     ✔ workflows    1.1.2
✔ modeldata    1.0.1     ✔ workflowsets 1.0.0
✔ parsnip      1.0.3     ✔ yardstick    1.1.0
✔ recipes      1.0.4     

── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()


In [31]:

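## Reading the data
## (the ?token= query string appears to be a temporary GitHub access token; if the link has expired,
## download cumulative.csv from the repository and read it from a local path instead)
exoplanet <- read_csv("https://raw.githubusercontent.com/QuwackJ/dsci-100-group-37/main/Data/cumulative.csv?token=GHSAT0AAAAAACJLMITMUOT2CGMLL6QR2PRAZJ4CN6A")
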
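## Selecting the class label and four transit predictors:
## orbital period (days), transit depth (ppm), transit duration (hours), and impact parameter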
exoplanet_selected <- exoplanet |>
                        select(koi_disposition, koi_period, koi_depth, koi_duration, koi_impact)


head(exoplanet_selected)

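## Splitting into training (75%) and testing (25%) sets, stratified by disposition
set.seed(2023) # any fixed seed makes the random split reproducible; 2023 is arbitrary
exoplanet_split <- initial_split(exoplanet_selected, prop = 0.75, strata = koi_disposition)
training_data <- training(exoplanet_split)
testing_data <- testing(exoplanet_split)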

## Mean of each predictor over the whole training set (all dispositions)
summary_table <- training_data |>
                    select(-koi_disposition) |>
                    map_df(mean, na.rm = TRUE)


## Mean of each predictor for confirmed exoplanets in the training set
summary_table_confirmed <- training_data |>
                    filter(koi_disposition == "CONFIRMED") |>
                    select(-koi_disposition) |>
                    map_df(mean, na.rm = TRUE)

## Mean of each predictor for false positives in the training set
summary_table_false <- training_data |>
                    filter(koi_disposition == "FALSE POSITIVE") |>
                    select(-koi_disposition) |>
                    map_df(mean, na.rm = TRUE)
summary_table
summary_table_confirmed
summary_table_false

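## Scatter plot of two predictors, colored by disposition
## (the axis choice is illustrative; log scales are used because both variables span several orders of magnitude)
summary_plot <- training_data |>
                    ggplot(aes(x = koi_period, y = koi_depth, color = koi_disposition)) +
                    geom_point(alpha = 0.3) +
                    scale_x_log10() +
                    scale_y_log10() +
                    labs(
                        x = "Orbital period (days, log scale)",
                        y = "Transit depth (ppm, log scale)",
                        color = "Disposition")
summary_plot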

Rows: 9564 Columns: 50
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): kepoi_name, kepler_name, koi_disposition, koi_pdisposition, koi_tc...
dbl (43): rowid, kepid, koi_score, koi_fpflag_nt, koi_fpflag_ss, koi_fpflag_...
lgl  (2): koi_teq_err1, koi_teq_err2

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


`head(exoplanet_selected)`:

| koi_disposition | koi_period | koi_depth | koi_duration | koi_impact |
|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| CONFIRMED | 9.488036 | 615.8 | 2.9575 | 0.146 |
| CONFIRMED | 54.418383 | 874.8 | 4.507 | 0.586 |
| FALSE POSITIVE | 19.89914 | 10829.0 | 1.7822 | 0.969 |
| FALSE POSITIVE | 1.736952 | 8079.2 | 2.40641 | 1.276 |
| CONFIRMED | 2.525592 | 603.3 | 1.6545 | 0.701 |
| CONFIRMED | 11.094321 | 1517.5 | 4.5945 | 0.538 |


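`summary_table` (training set, all dispositions):

| koi_period | koi_depth | koi_duration | koi_impact |
|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 62.23215 | 24109.49 | 5.572078 | 0.7553464 |

`summary_table_confirmed` (training set, CONFIRMED only):

| koi_period | koi_depth | koi_duration | koi_impact |
|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 28.12939 | 1187.205 | 4.318154 | 0.426088 |

`summary_table_false` (training set, FALSE POSITIVE only):

| koi_period | koi_depth | koi_duration | koi_impact |
|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 72.85858 | 45638.96 | 6.483819 | 1.004392 |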
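The notebook does not fit a classifier yet. One common choice for this kind of question is k-nearest neighbours; that choice, the value `k = 3`, the query point, and the helper name `knn_predict` are all assumptions made for illustration. The base R sketch below uses only the six rows printed by `head()` above, so it demonstrates the mechanics rather than estimating accuracy. It standardizes each predictor using the training rows' mean and standard deviation first; without this step `koi_depth` (thousands of ppm) would swamp `koi_impact` (around 1) in the distance calculation.

```r
# Six labelled rows copied from the head(exoplanet_selected) output
train <- data.frame(
  koi_disposition = c("CONFIRMED", "CONFIRMED", "FALSE POSITIVE",
                      "FALSE POSITIVE", "CONFIRMED", "CONFIRMED"),
  koi_period   = c(9.488036, 54.418383, 19.89914, 1.736952, 2.525592, 11.094321),
  koi_depth    = c(615.8, 874.8, 10829.0, 8079.2, 603.3, 1517.5),
  koi_duration = c(2.9575, 4.507, 1.7822, 2.40641, 1.6545, 4.5945),
  koi_impact   = c(0.146, 0.586, 0.969, 1.276, 0.701, 0.538),
  stringsAsFactors = FALSE
)

predictors <- c("koi_period", "koi_depth", "koi_duration", "koi_impact")

# Hypothetical helper: majority-vote k-NN on standardized predictors
knn_predict <- function(train, query, k = 3) {
  x <- as.matrix(train[, predictors])
  # Centering and scaling are learned from the training rows only
  centers <- colMeans(x)
  spreads <- apply(x, 2, sd)
  x_std <- scale(x, center = centers, scale = spreads)
  q_std <- scale(as.matrix(query[, predictors]), center = centers, scale = spreads)
  apply(q_std, 1, function(q) {
    # Euclidean distance from this query row to every training row
    d <- sqrt(colSums((t(x_std) - q)^2))
    nearest <- order(d)[seq_len(k)]
    votes <- table(train$koi_disposition[nearest])
    # which.max breaks ties by taking the first label alphabetically
    names(votes)[which.max(votes)]
  })
}

# Hypothetical query: short period, deep transit, high impact parameter
query <- data.frame(koi_period = 5, koi_depth = 9000,
                    koi_duration = 2, koi_impact = 1.1)
knn_predict(train, query, k = 3) # returns "FALSE POSITIVE"
```

With the tidymodels packages loaded at the top of the notebook, the same idea would more typically be written with parsnip's `nearest_neighbor()` (engine `"kknn"`, with `weight_func = "rectangular"` for an unweighted vote) inside a workflow whose recipe centers and scales the predictors, and `k` would be tuned by cross-validation on `training_data` rather than fixed.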
