# Introduction

Pulsars emit two beams of electromagnetic radiation in opposite directions. Although the emission itself is steady, pulsars appear to flicker because they rotate: as a pulsar spins, its beams may sweep across the Earth, swinging in and out of view, so that to an astronomer the pulsar appears to blink.
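The blinking described above can be illustrated with a toy "lighthouse" simulation. This is a sketch under heavily simplified assumptions: only one beam crosses the line of sight, it sweeps in a flat plane, the 1-second period and 10-degree beam half-width are made-up values, and the Earth sits at a fixed angle. It counts how many times the beam passes over the Earth in three rotations.

```r
# Toy "lighthouse" model of a pulsar (illustrative assumptions, not real data):
# one beam sweeps around in a plane containing the line of sight, and the
# Earth sits at angle pi in that plane.
period_s   <- 1.0          # hypothetical rotation period (seconds)
half_width <- pi / 18      # hypothetical beam half-width (10 degrees)
earth_at   <- pi           # direction of the Earth (radians)

t <- seq(0, 3, by = 0.001) # three rotations sampled every millisecond

# Beam direction at each time step, wrapped into [0, 2*pi)
beam_angle <- (2 * pi * t / period_s) %% (2 * pi)

# The beam is seen only while it points within half_width of the Earth
visible <- abs(beam_angle - earth_at) < half_width

# A pulse starts wherever visibility switches from FALSE to TRUE
pulse_starts <- which(diff(c(FALSE, visible)) == 1)

length(pulse_starts)   # number of pulses seen
diff(t[pulse_starts])  # time between pulses (close to one rotation period)
```

Even though the beam never switches off in this model, the observer only receives it in short bursts, one per rotation.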

Pulsars are important objects that allow scientists to study extreme states of matter and cosmic events, so a system that predicts whether a particular space object is a pulsar would be valuable. Can we use the information in this data set to build a model that predicts whether a candidate is a pulsar?

The data set contains nine columns: eight continuous variables and one class variable. The first four variables are statistics (mean, standard deviation, excess kurtosis, and skewness) derived from each candidate's integrated pulse profile, which for a real pulsar is characteristic of that pulsar. The last four are the same statistics derived from the candidate's DM-SNR (dispersion measure signal-to-noise ratio) curve.
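To make the first four columns concrete, the sketch below computes a mean, standard deviation, excess kurtosis, and skewness for a synthetic pulse profile. The helper `profile_stats()`, the Gaussian-shaped profile, and the use of population (moment-based) estimators are assumptions for illustration; this section does not state exactly how the values in `HTRU_2.csv` were computed.

```r
# Summary statistics of a (synthetic) integrated pulse profile.
# Assumption: population moment definitions; the estimators actually used to
# build HTRU_2 are not stated in this section.
profile_stats <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))       # population standard deviation
  z <- (x - m) / s                 # standardized values
  c(mean        = m,
    std_dev     = s,
    xs_kurtosis = mean(z^4) - 3,   # excess kurtosis (0 for a normal distribution)
    skewness    = mean(z^3))
}

# Hypothetical profile: 64 phase bins, a narrow Gaussian pulse on a flat baseline
phase   <- seq(0, 1, length.out = 64)
profile <- 10 + 100 * exp(-((phase - 0.5)^2) / (2 * 0.02^2))

prof_stats <- profile_stats(profile)
prof_stats
```

A sharp pulse on a flat baseline yields a strongly positive skewness and excess kurtosis. Applying the same function to a candidate's DM-SNR curve would give values analogous to the last four columns.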



# Load libraries into R (tidyverse attaches readr, dplyr, and forcats):
library(tidyverse)

# Read and tidy the dataset:
pulsar_data <- read_csv("data/HTRU_2.csv", col_names = FALSE) |>
    # Add column names:
    rename(mean_int = X1, 
           std_dev_int = X2, 
           xs_kurtosis_int = X3, 
           skewness_int = X4, 
           mean_dmsnr = X5, 
           std_dev_dmsnr = X6, 
           xs_kurtosis_dmsnr = X7, 
           skewness_dmsnr = X8, 
           class = X9) |>
    mutate(class = as_factor(class)) |> # Convert class from double to factor (categorical).
    select(class, everything()) # Move class to the first column for readability.


Rows: 17898 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (9): X1, X2, X3, X4, X5, X6, X7, X8, X9

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


We expect to be able to predict whether a candidate is a pulsar by using all of the other variables as predictors in the recipe.
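The idea of "all the other variables" can be sketched in base R. As an assumption, base R stands in for the tidymodels recipe here; in tidymodels the equivalent would be roughly `recipe(class ~ ., data = pulsar_data)` followed by `step_center()` and `step_scale()` on `all_predictors()`. Standardizing matters if a distance-based model such as k-nearest neighbors is chosen, which this section does not specify. The data frame `toy` is randomly generated and only mimics the column names of `pulsar_data`.

```r
# Hypothetical stand-in for pulsar_data: random values, same column names.
set.seed(42)
n <- 20
predictor_names <- c("mean_int", "std_dev_int", "xs_kurtosis_int", "skewness_int",
                     "mean_dmsnr", "std_dev_dmsnr", "xs_kurtosis_dmsnr", "skewness_dmsnr")
toy <- data.frame(class = factor(rep(c(0, 1), each = n / 2)))
for (p in predictor_names) toy[[p]] <- rnorm(n)

# `class ~ .` means "predict class from every other column";
# `- 1` drops the intercept column so only the eight predictors remain.
X <- model.matrix(class ~ . - 1, data = toy)

# Center and scale each predictor to mean 0 and standard deviation 1,
# as step_center() + step_scale() on all_predictors() would in a recipe.
X_scaled <- scale(X)

colnames(X_scaled)
```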

With such a model, we could predict whether newly discovered candidates are pulsars, provided we collect the same variables for them that the model was trained on. To test this in future cases, we could take a newly observed candidate and run it through our trained model.
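Classifying one new candidate can be sketched with a hand-written k-nearest-neighbors classifier in base R. This is an assumption: the section does not name a model, the training values below are made up, and only two predictors are used for brevity. The helper `predict_knn()` is hypothetical, not part of any package.

```r
# Minimal k-nearest-neighbors sketch for classifying one new candidate.
# Made-up training data using two of the eight predictors.
train <- data.frame(
  class        = factor(c(0, 0, 0, 0, 1, 1, 1)),
  mean_int     = c(120, 115, 130, 125, 40, 55, 60),
  skewness_int = c(0.1, -0.2, 0.3, 0.0, 6.0, 4.5, 5.2)
)
new_candidate <- data.frame(mean_int = 58, skewness_int = 5.0)

predict_knn <- function(train, new_obs, k = 3) {
  predictors <- setdiff(names(train), "class")
  centers <- sapply(train[predictors], mean)
  spreads <- sapply(train[predictors], sd)
  # Standardize training rows and the new observation with training statistics
  train_z <- scale(train[predictors], center = centers, scale = spreads)
  new_z   <- (unlist(new_obs[predictors]) - centers) / spreads
  # Euclidean distance from the new candidate to every training row
  dists <- sqrt(rowSums(sweep(train_z, 2, new_z)^2))
  # Majority vote among the k nearest training rows
  nearest <- train$class[order(dists)[1:k]]
  names(which.max(table(nearest)))
}

predict_knn(train, new_candidate, k = 3)
```

The predictors are standardized with the training data's means and standard deviations before distances are computed; otherwise a large-valued variable such as `mean_int` would dominate the distance.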

This could lead to further questions such as:

- What minimum or maximum DM-SNR curve values would a candidate need for it to no longer be considered a pulsar?
- What is the average skewness of a pulsar? And for non-pulsars?
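Both questions reduce to per-class summaries. The sketch below assumes a small made-up data frame `toy` in place of `pulsar_data`, and uses `mean_dmsnr` as one example of a DM-SNR curve statistic; on the real data the same calls would use the columns of `pulsar_data`, or the tidyverse equivalent with `group_by(class)` and `summarise()`.

```r
# Made-up values standing in for pulsar_data (class 0 = non-pulsar, 1 = pulsar).
toy <- data.frame(
  class        = factor(c(0, 0, 0, 1, 1, 1)),
  skewness_int = c(0.1, -0.2, 0.4, 6.0, 4.5, 5.1),
  mean_dmsnr   = c(2.1, 3.5, 1.8, 40.2, 25.7, 60.3)
)

# Average skewness of the integrated profile for each class
avg_skew <- sapply(split(toy$skewness_int, toy$class), mean)

# Minimum and maximum of a DM-SNR statistic for each class
dmsnr_range <- lapply(split(toy$mean_dmsnr, toy$class), range)

avg_skew
dmsnr_range
```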