**Classifying Exoplanets: Exploring NASA's Kepler Space Observatory Dataset**

Space has always beckoned humanity to explore its depths. One of the most fascinating subjects in astronomical research is exoplanets, planets that orbit stars beyond our solar system. Examining exoplanets gives insight into planetary formation and evolution, critical information when searching for habitable planets.

The Kepler Space Observatory was a NASA space telescope built to find exoplanets, especially ones that are roughly Earth-sized and located within the habitable zones of their parent stars. From 2009 to 2018, Kepler revolutionized our understanding of extrasolar systems by observing thousands of possible planets. Follow-up analysis of these observations labels each object as a confirmed planet, a candidate, or a false positive.

Our primary question is: *Can we accurately classify celestial bodies as exoplanets based on their observed characteristics using the Kepler exoplanet dataset?*

Our project will analyze the NASA Kepler exoplanet dataset, which was collected by the Kepler telescope. This dataset contains details about observed celestial objects, including their radius, transit properties, stellar luminosity, and other essential attributes. By analyzing this dataset, we hope to develop a predictive model that distinguishes exoplanets from other extrasolar objects, furthering our understanding of the universe.

In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.2     ✔ purrr   1.0.1
✔ tibble  3.2.1     ✔ dplyr   1.1.1
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


In [3]:
exoplanet <- read_csv("https://raw.githubusercontent.com/QuwackJ/dsci-100-group-37/main/Data/cumulative.csv?token=GHSAT0AAAAAACJLMIED7O62XE357P5RDVQ2ZJ3GMBQ")
head(exoplanet)

Rows: 9564 Columns: 50
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): kepoi_name, kepler_name, koi_disposition, koi_pdisposition, koi_tc...
dbl (43): rowid, kepid, koi_score, koi_fpflag_nt, koi_fpflag_ss, koi_fpflag_...
lgl  (2): koi_teq_err1, koi_teq_err2

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| rowid | kepid | kepoi_name | kepler_name | koi_disposition | koi_pdisposition | koi_score | koi_fpflag_nt | koi_fpflag_ss | koi_fpflag_co | ⋯ | koi_steff_err2 | koi_slogg | koi_slogg_err1 | koi_slogg_err2 | koi_srad | koi_srad_err1 | koi_srad_err2 | ra | dec | koi_kepmag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | ⋯ | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 10797460 | K00752.01 | Kepler-227 b | CONFIRMED | CANDIDATE | 1.0 | 0 | 0 | 0 | ⋯ | -81 | 4.467 | 0.064 | -0.096 | 0.927 | 0.105 | -0.061 | 291.9342 | 48.14165 | 15.347 |
| 2 | 10797460 | K00752.02 | Kepler-227 c | CONFIRMED | CANDIDATE | 0.969 | 0 | 0 | 0 | ⋯ | -81 | 4.467 | 0.064 | -0.096 | 0.927 | 0.105 | -0.061 | 291.9342 | 48.14165 | 15.347 |
| 3 | 10811496 | K00753.01 |  | FALSE POSITIVE | FALSE POSITIVE | 0.0 | 0 | 1 | 0 | ⋯ | -176 | 4.544 | 0.044 | -0.176 | 0.868 | 0.233 | -0.078 | 297.0048 | 48.13413 | 15.436 |
| 4 | 10848459 | K00754.01 |  | FALSE POSITIVE | FALSE POSITIVE | 0.0 | 0 | 1 | 0 | ⋯ | -174 | 4.564 | 0.053 | -0.168 | 0.791 | 0.201 | -0.067 | 285.5346 | 48.28521 | 15.597 |
| 5 | 10854555 | K00755.01 | Kepler-664 b | CONFIRMED | CANDIDATE | 1.0 | 0 | 0 | 0 | ⋯ | -211 | 4.438 | 0.07 | -0.21 | 1.046 | 0.334 | -0.133 | 288.7549 | 48.2262 | 15.509 |
| 6 | 10872983 | K00756.01 | Kepler-228 d | CONFIRMED | CANDIDATE | 1.0 | 0 | 0 | 0 | ⋯ | -232 | 4.486 | 0.054 | -0.229 | 0.972 | 0.315 | -0.105 | 296.2861 | 48.22467 | 15.714 |
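
Since `koi_disposition` holds the labels we want to predict, a natural first check is how those labels are distributed. The sketch below is only illustrative: to stay self-contained it rebuilds a small tibble from the six preview rows printed above (a handful of the 50 columns), so its counts describe the preview, not the full 9,564-row dataset. The names `exoplanet_preview`, `disposition_counts`, `n_objects`, and `share` are hypothetical and not part of the original analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `exoplanet`: the six rows shown by head(),
# restricted to a few of the columns visible in the preview.
exoplanet_preview <- tribble(
  ~kepoi_name, ~koi_disposition, ~koi_slogg, ~koi_srad, ~koi_kepmag,
  "K00752.01", "CONFIRMED",      4.467,      0.927,     15.347,
  "K00752.02", "CONFIRMED",      4.467,      0.927,     15.347,
  "K00753.01", "FALSE POSITIVE", 4.544,      0.868,     15.436,
  "K00754.01", "FALSE POSITIVE", 4.564,      0.791,     15.597,
  "K00755.01", "CONFIRMED",      4.438,      1.046,     15.509,
  "K00756.01", "CONFIRMED",      4.486,      0.972,     15.714
)

# Treat the label as a factor, then count rows per class and
# compute each class's share of all rows.
disposition_counts <- exoplanet_preview %>%
  mutate(koi_disposition = as.factor(koi_disposition)) %>%
  count(koi_disposition, name = "n_objects") %>%
  mutate(share = n_objects / sum(n_objects))

disposition_counts
```

Running the same pipeline on the full `exoplanet` data frame instead of `exoplanet_preview` would show whether the classes are balanced, which matters when judging a classifier's accuracy later on.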
