# Group Proposal Group-3

## Introduction

**Pulsars** *(from **pulsa**ting **r**adio source)* are highly magnetized, rotating, compact stars. They are neutron stars (or, in rare cases, white dwarfs) that emit beams of electromagnetic radiation from their magnetic poles. Because the star rotates rapidly, each beam sweeps past Earth once per rotation, so the radiation appears to pulse or flicker when observed from Earth, hence the name.

Pulsars are valuable tools for studying a wide range of physical phenomena, and studying them improves our understanding of how the universe works. Pulsars are mainly detected by analysing the radio signals received by telescopes. However, radio-frequency interference and random noise often contaminate these signals and make a genuine pulsar hard to detect. In this project we aim to build a predictive model that classifies whether a set of received measurements comes from a pulsar or not.

### Dataset and its attributes

Each signal is described by eight continuous variables and a single class variable. The first four are simple statistics obtained from the integrated pulse profile, and the remaining four are the same statistics obtained from the DM-SNR (dispersion measure versus signal-to-noise ratio) curve. These variables are:

1. Mean of the integrated profile.
2. Standard deviation of the integrated profile.
3. Excess kurtosis of the integrated profile.
4. Skewness of the integrated profile.
5. Mean of the DM-SNR curve.
6. Standard deviation of the DM-SNR curve.
7. Excess kurtosis of the DM-SNR curve.
8. Skewness of the DM-SNR curve.
9. Class *(0 if it is not a pulsar star and 1 if it is a pulsar star)*.

The data set shared here contains 17,898 samples in total.
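To make the four per-curve statistics concrete, the sketch below computes them in base R for an arbitrary numeric vector. The helper name `profile_stats` and the example vector are hypothetical, and the moment-based (population) formulas for skewness and excess kurtosis are an assumption; the data set's authors may have used a different estimator.

```r
# Hypothetical helper: summarise a numeric curve with the four statistics
# used as features (mean, standard deviation, excess kurtosis, skewness).
profile_stats <- function(x) {
  m <- mean(x)
  centred <- x - m
  m2 <- mean(centred^2)  # second central moment (population variance)
  m3 <- mean(centred^3)  # third central moment
  m4 <- mean(centred^4)  # fourth central moment
  c(
    mean            = m,
    sd              = sd(x),          # sample standard deviation (n - 1)
    excess_kurtosis = m4 / m2^2 - 3,  # 0 for a normal distribution
    skewness        = m3 / m2^1.5     # 0 for a symmetric distribution
  )
}

# Made-up, symmetric example curve (not taken from the data set)
example_curve <- c(1, 2, 3, 4, 5)
profile_stats(example_curve)
```

Subtracting 3 is what makes the kurtosis "excess": the value is measured relative to a normal distribution.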


### Preliminary exploratory data analysis:

In [3]:
library(tidyverse)
library(tidymodels)

In [4]:
pulsar <- read_csv("data/pulsar_data_train.csv")

Parsed with column specification:
cols(
  `Mean of the integrated profile` = col_double(),
  `Standard deviation of the integrated profile` = col_double(),
  `Excess kurtosis of the integrated profile` = col_double(),
  `Skewness of the integrated profile` = col_double(),
  `Mean of the DM-SNR curve` = col_double(),
  `Standard deviation of the DM-SNR curve` = col_double(),
  `Excess kurtosis of the DM-SNR curve` = col_double(),
  `Skewness of the DM-SNR curve` = col_double(),
  target_class = col_double()
)



In [5]:
head(pulsar)

| Mean of the integrated profile | Standard deviation of the integrated profile | Excess kurtosis of the integrated profile | Skewness of the integrated profile | Mean of the DM-SNR curve | Standard deviation of the DM-SNR curve | Excess kurtosis of the DM-SNR curve | Skewness of the DM-SNR curve | target_class |
|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 121.15625 | 48.37297 | 0.3754847 | -0.01316549 | 3.168896 | 18.39937 | 7.449874 | 65.159298 | 0 |
| 76.96875 | 36.17556 | 0.7128979 | 3.38871856 | 2.399666 | 17.571 | 9.414652 | 102.722975 | 0 |
| 130.58594 | 53.22953 | 0.1334083 | -0.29724164 | 2.743311 | 22.36255 | 8.508364 | 74.031324 | 0 |
| 156.39844 | 48.86594 | -0.2159886 | -0.17129365 | 17.471572 | NA | 2.958066 | 7.197842 | 0 |
| 84.80469 | 36.11766 | 0.8250128 | 3.27412537 | 2.790134 | 20.61801 | 8.405008 | 76.291128 | 0 |
| 121.00781 | 47.17694 | 0.2297081 | 0.09133623 | 2.036789 | NA | 9.546051 | 112.131721 | 0 |


In [6]:
colnames(pulsar) <- c("mean_profile", "sd_profile", "kurtosis_profile", "skew_profile", "mean_dmsnr", "sd_dmsnr", "kurtosis_dmsnr", "skew_dmsnr", "target_class")

In [7]:
pulsar <- pulsar %>%
  mutate(target_class = as_factor(target_class)) %>%  # treat the class label as a factor
  na.omit()                                            # drop rows with any missing value
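Before dropping incomplete rows, it can help to see which columns contain the missing values. The following is a minimal base-R sketch on a toy data frame built from two columns of the six rows printed by `head(pulsar)`; the object names `toy`, `missing_per_column`, and `complete_rows` are hypothetical.

```r
# Toy data frame: two columns from the first six rows of the data
toy <- data.frame(
  mean_dmsnr = c(3.168896, 2.399666, 2.743311, 17.471572, 2.790134, 2.036789),
  sd_dmsnr   = c(18.39937, 17.571, 22.36255, NA, 20.61801, NA)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))
missing_per_column

# Rows that survive na.omit(): any row with at least one NA is removed
complete_rows <- na.omit(toy)
nrow(complete_rows)
```

Running the same `colSums(is.na(...))` call on the full `pulsar` data frame would show how many rows are lost to each column before `na.omit()` is applied.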

In [8]:
glimpse(pulsar)

Rows: 9,273
Columns: 9
$ mean_profile     <dbl> 121.15625, 76.96875, 130.58594, 84.80469, 109.40625,…
$ sd_profile       <dbl> 48.37297, 36.17556, 53.22953, 36.11766, 55.91252, 40…
$ kurtosis_profile <dbl> 0.37548466, 0.71289786, 0.13340829, 0.82501279, 0.56…
$ skew_profile     <dbl> -0.01316549, 3.38871856, -0.29724164, 3.27412537, 0.…
$ mean_dmsnr       <dbl> 3.1688963, 2.3996656, 2.7433110, 2.7901338, 2.797658…
$ sd_dmsnr         <dbl> 18.399367, 17.570997, 22.362553, 20.618009, 19.49652…
$ kurtosis_dmsnr   <dbl> 7.4498741, 9.4146523, 8.5083638, 8.4050084, 9.443282…
$ skew_dmsnr       <dbl> 65.1592977, 102.7229747, 74.0313242, 76.2911279, 97.…
$ target_class     <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…


In [9]:
pulsar_split <- initial_split(pulsar, prop = 0.75, strata = target_class)
pulsar_training <- training(pulsar_split)
pulsar_testing <- testing(pulsar_split)

In [10]:
glimpse(pulsar_training)

Rows: 6,955
Columns: 9
$ mean_profile     <dbl> 76.96875, 130.58594, 84.80469, 109.40625, 95.00781, …
$ sd_profile       <dbl> 36.17556, 53.22953, 36.11766, 55.91252, 40.21981, 47…
$ kurtosis_profile <dbl> 0.71289786, 0.13340829, 0.82501279, 0.56510595, 0.34…
$ skew_profile     <dbl> 3.38871856, -0.29724164, 3.27412537, 0.05624666, 1.1…
$ mean_dmsnr       <dbl> 2.3996656, 2.7433110, 2.7901338, 2.7976589, 2.770066…
$ sd_dmsnr         <dbl> 17.570997, 22.362553, 20.618009, 19.496527, 18.21774…
$ kurtosis_dmsnr   <dbl> 9.4146523, 8.5083638, 8.4050084, 9.4432821, 7.851205…
$ skew_dmsnr       <dbl> 102.7229747, 74.0313242, 76.2911279, 97.3745784, 70.…
$ target_class     <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
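Because `initial_split()` draws rows at random, the split changes on every run unless a seed is set first. The sketch below uses simulated data to fix a seed and to check that stratifying on the class keeps the class proportions similar in both partitions. The seed value, the simulated data frame `sim`, and its 20% positive rate are all assumptions for illustration, not properties of the pulsar data.

```r
library(rsample)  # part of tidymodels; provides initial_split(), training(), testing()

set.seed(2024)  # arbitrary seed so the split is reproducible

# Simulated stand-in for the pulsar data: 1000 rows, 20% labelled "1"
sim <- data.frame(
  feature      = rnorm(1000),
  target_class = factor(rep(c("0", "1"), times = c(800, 200)))
)

# Same call pattern as the pulsar split: 75% training, stratified by class
sim_split    <- initial_split(sim, prop = 0.75, strata = target_class)
sim_training <- training(sim_split)
sim_testing  <- testing(sim_split)

# Proportion of each class in the two partitions
train_props <- prop.table(table(sim_training$target_class))
test_props  <- prop.table(table(sim_testing$target_class))
train_props
test_props
```

Applying the same `prop.table(table(...))` check to `pulsar_training` and `pulsar_testing` would confirm that both partitions contain a comparable share of pulsars.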


### Methods:

### Expected outcomes and significance: