**Classifying Exoplanets: Exploring NASA's Kepler Space Observatory Dataset**

Space has always beckoned humanity to explore its depths. One of the most fascinating subjects in astronomical research is exoplanets, planets that orbit stars beyond our solar system. Examining exoplanets gives insight into planetary formation and evolution, critical information when searching for habitable planets.

The Kepler Space Observatory was a NASA space telescope built to find exoplanets, especially roughly Earth-sized ones located within the habitable zones of their parent stars. From 2009 to 2018, Kepler revolutionized our understanding of extrasolar systems. Its observations were cross-checked, and each object of interest was labeled as a confirmed planet, a candidate, or a false positive.

Our primary question is: *Can we accurately classify celestial bodies as exoplanets based on their observed characteristics using the Kepler exoplanet dataset?*

Our project analyzes NASA's Kepler exoplanet dataset. The dataset describes celestial objects through attributes such as radius, transit properties, and stellar luminosity. As predictors we use four transit attributes: orbital period (`koi_period`), transit depth (`koi_depth`), transit duration (`koi_duration`), and impact parameter (`koi_impact`). With these we hope to develop a predictive model that distinguishes exoplanets from other extrasolar objects, furthering our understanding of the universe.

In [2]:
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
# Helper scripts are sourced only when they exist in the working directory
if (file.exists('tests.R')) source('tests.R')
if (file.exists('cleanup.R')) source('cleanup.R')

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.2     ✔ purrr   1.0.1
✔ tibble  3.2.1     ✔ dplyr   1.1.1
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──

✔ broom        1.0.2     ✔ rsample      1.1.1
✔ dials        1.1.0     ✔ tune         1.0.1
✔ infer        1.0.4     ✔ workflows    1.1.2



In [3]:

# Read the data
exoplanet <- read_csv("https://raw.githubusercontent.com/QuwackJ/dsci-100-group-37/main/Data/cumulative.csv")

# Select the class label and our four predictors
exoplanet_selected <- exoplanet |>
    select(koi_disposition, koi_period, koi_depth, koi_duration, koi_impact)

tail(exoplanet_selected)

# Split into training (75%) and testing (25%) sets, stratified by class
exoplanet_split <- initial_split(exoplanet_selected, prop = 0.75, strata = koi_disposition)
training_data <- training(exoplanet_split)
testing_data <- testing(exoplanet_split)



# COUNT of all CONFIRMED, FALSE POSITIVE, and CANDIDATE planets

Rows: 9564 Columns: 50
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): kepoi_name, kepler_name, koi_disposition, koi_pdisposition, koi_tc...
dbl (43): rowid, kepid, koi_score, koi_fpflag_nt, koi_fpflag_ss, koi_fpflag_...
lgl  (2): koi_teq_err1, koi_teq_err2

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


koi_disposition,koi_period,koi_depth,koi_duration,koi_impact
<chr>,<dbl>,<dbl>,<dbl>,<dbl>
CANDIDATE,47.1096306,752.2,5.741,1.23
FALSE POSITIVE,8.5898708,87.7,4.806,0.765
FALSE POSITIVE,0.5276985,1579.2,3.2221,1.252
CANDIDATE,1.7398494,48.5,3.114,0.043
FALSE POSITIVE,0.6814016,103.6,0.865,0.147
FALSE POSITIVE,4.8560348,76.7,3.078,0.134
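
The `# COUNT` comment in the cell above has no code behind it. A minimal sketch of how the class counts in a stratified training split might be checked is shown here. It uses a made-up `toy_exoplanets` tibble (hypothetical values, not rows from the Kepler data) and an arbitrary seed, so the numbers say nothing about the real dataset.

```r
library(dplyr)
library(tibble)
library(rsample)

# Hypothetical stand-in for exoplanet_selected: 100 rows, three classes
toy_exoplanets <- tibble(
  koi_disposition = rep(c("CONFIRMED", "FALSE POSITIVE", "CANDIDATE"),
                        times = c(40, 40, 20)),
  koi_period = seq(1, 100)
)

set.seed(2023)  # arbitrary seed, only so this sketch is reproducible
toy_split <- initial_split(toy_exoplanets, prop = 0.75, strata = koi_disposition)
toy_training <- training(toy_split)

# Number and share of rows per class in the training split
class_counts <- toy_training |>
  count(koi_disposition) |>
  mutate(proportion = n / sum(n))
class_counts
```

Applied to the notebook, the same `count()` call on `training_data` would show how balanced the three dispositions are. Without a `set.seed()` call before `initial_split()`, the split and every summary derived from it can change between runs.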



Mean value of each predictor across all classes in the training set:


In [5]:
# Mean of each predictor over the training data (all classes), ignoring missing values
summary_table <- training_data |>
    select(-koi_disposition) |>
    map_df(mean, na.rm = TRUE)
summary_table

koi_period,koi_depth,koi_duration,koi_impact
<dbl>,<dbl>,<dbl>,<dbl>
80.445,23356.05,5.641699,0.7256953
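
Because the means above are computed with `na.rm = TRUE`, missing values are silently dropped. A small sketch of how one might count the missing values per predictor first is shown here. The `toy_predictors` tibble is hypothetical and only mirrors the column names used in this notebook.

```r
library(dplyr)
library(tibble)

# Hypothetical predictors with a few missing values
toy_predictors <- tibble(
  koi_period   = c(10, 20, NA, 40),
  koi_depth    = c(500, NA, NA, 900),
  koi_duration = c(3, 5, 6, 8),
  koi_impact   = c(0.2, 0.4, 0.9, NA)
)

# Count the NA entries in every column
missing_counts <- toy_predictors |>
  summarise(across(everything(), \(x) sum(is.na(x))))
missing_counts
```

If a predictor turned out to be missing for a large share of rows, its mean would rest on fewer observations than the others, which is worth knowing before comparing classes.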


Mean value of each predictor for confirmed exoplanets in the training set:


In [7]:
# Mean of each predictor for confirmed exoplanets in the training data
summary_table_confirmed <- training_data |>
    filter(koi_disposition == "CONFIRMED") |>
    select(-koi_disposition) |>
    map_df(mean, na.rm = TRUE)
summary_table_confirmed

koi_period,koi_depth,koi_duration,koi_impact
<dbl>,<dbl>,<dbl>,<dbl>
26.95055,1140.939,4.309482,0.4208111


Mean value of each predictor for false positives (objects that are not exoplanets) in the training set:


In [9]:
# Mean of each predictor for false positives in the training data
summary_table_false <- training_data |>
    filter(koi_disposition == "FALSE POSITIVE") |>
    select(-koi_disposition) |>
    map_df(mean, na.rm = TRUE)
summary_table_false

koi_period,koi_depth,koi_duration,koi_impact
<dbl>,<dbl>,<dbl>,<dbl>
73.64063,44131.4,6.607775,0.9504213
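
The three summary cells above repeat the same filter, select, and mean pattern. One possible alternative, sketched here under the assumption that a single table with one row per class is acceptable, is `group_by()` followed by `summarise(across(...))`. The `toy_training` tibble is hypothetical and not taken from the Kepler data.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for training_data
toy_training <- tibble(
  koi_disposition = c("CONFIRMED", "CONFIRMED", "FALSE POSITIVE",
                      "FALSE POSITIVE", "CANDIDATE"),
  koi_period   = c(10, 20, 50, NA, 30),
  koi_depth    = c(500, 700, 40000, 20000, 900),
  koi_duration = c(3, 5, 6, 8, 4),
  koi_impact   = c(0.2, 0.4, 0.9, 1.1, 0.5)
)

# One row per class, one mean per predictor, missing values ignored
class_means <- toy_training |>
  group_by(koi_disposition) |>
  summarise(across(everything(), \(x) mean(x, na.rm = TRUE)))
class_means
```

This form also reports the CANDIDATE class, which the separate cells do not summarize, so all three dispositions can be compared side by side.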
