# Introduction

Pulsars are a type of neutron star, the dense remnant left behind when a star considerably more massive than the Sun runs out of fuel and collapses in on itself. A pulsar emits two narrow, steady beams of electromagnetic radiation in opposite directions. Although the beams themselves are steady, a pulsar appears to flicker when observed from Earth because it rotates. The effect resembles a lighthouse seen from the sea: as the pulsar rotates, a beam may sweep across the Earth, swing out of view, and then swing back around, so that to an astronomer on Earth the pulsar appears to blink.

Pulsars allow scientists to study extreme states of matter and a range of cosmic events, so a system that predicts whether a given space object is a pulsar would be valuable. Our question is therefore: can we use the information in the data set to build a model that predicts whether an observed object is a pulsar?

The chosen data set contains nine columns: eight continuous variables and one class variable. The first four variables are basic statistics derived from the integrated pulse profile, which is characteristic of each pulsar, whereas the remaining four are derived from the DM-SNR curve (dispersion measure versus signal-to-noise ratio).

The columns appear in the following order:

1. Mean of the integrated profile.
2. Standard deviation of the integrated profile.
3. Excess kurtosis of the integrated profile.
4. Skewness of the integrated profile.
5. Mean of the DM-SNR curve.
6. Standard deviation of the DM-SNR curve.
7. Excess kurtosis of the DM-SNR curve.
8. Skewness of the DM-SNR curve.
9. Class (0 = not a pulsar, 1 = pulsar).
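
To make the four statistics in the list concrete, here is a minimal sketch in base R of how they could be computed from a numeric vector. The helper `profile_stats()` and the `toy_profile` values are hypothetical and made up for illustration. The sketch assumes population (biased) moment estimators; the estimators actually used to build the data set are not stated here and may differ.

```r
# Hypothetical helper: summarise a numeric vector with the four statistics
# named in the column list. Assumes population (biased) moment estimators,
# which may not match those used to construct the data set.
profile_stats <- function(x) {
  m  <- mean(x)
  m2 <- mean((x - m)^2)  # second central moment (population variance)
  m3 <- mean((x - m)^3)  # third central moment
  m4 <- mean((x - m)^4)  # fourth central moment
  c(mean        = m,
    std_dev     = sqrt(m2),
    xs_kurtosis = m4 / m2^2 - 3,  # excess kurtosis: 0 for a normal distribution
    skewness    = m3 / m2^1.5)
}

# Made-up toy profile: a flat baseline with a single narrow peak.
toy_profile <- c(rep(1, 20), 5, 20, 5, rep(1, 20))
profile_stats(toy_profile)
```

Because most of the toy values sit at the baseline and only a few are large, the sketch yields a positive skewness and a positive excess kurtosis for `toy_profile`.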


In [19]:
# Load libraries into R (tidyverse already attaches dplyr, readr, and forcats):
library(tidyverse)

# Read and tidy dataset:
pulsar_data <- read_csv("data/HTRU_2.csv", col_names = FALSE) |>
    # Add column names:
    rename(mean_int = X1, 
           std_dev_int = X2, 
           xs_kurtosis_int = X3, 
           skewness_int = X4, 
           mean_dmsnr = X5, 
           std_dev_dmsnr = X6, 
           xs_kurtosis_dmsnr = X7, 
           skewness_dmsnr = X8, 
           class = X9) |>
    mutate(class = as_factor(class)) |> # Convert class from dbl to factor (categorical).
    select(class, everything()) # Move class to the first column for readability.


Rows: 17898 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (9): X1, X2, X3, X4, X5, X6, X7, X8, X9

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
