# Do Pulsars Clear Their Lines of Sight of Electrons?

### DSCI 100-005 Group 19
### Brett King, Samantha Malinab, Yinuo Sun, Athena Wong

-----

## The Plan

1. Explain clearly what the variables in the dataset mean, and how we can use them to answer our question. Brett will try to create images to help with the explanation.
2. Load in the dataset, and isolate the variables we need for the analysis.
3. Use K-nearest neighbours (KNN) classification to see how well we can predict pulsars based on these variables.
4. If the classifier performs well, see if we can convincingly argue that this is because the answer to our question is "yes," and not because of something else.
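Step 3 relies on K-nearest neighbours. As a rough illustration of the idea, here is a minimal base-R sketch. It is not the code we will actually run, which would more likely use a dedicated package such as tidymodels. A new observation is assigned the majority class among the `k` training observations closest to it in Euclidean distance. The function name `knn_predict` and the toy data are hypothetical and are not taken from HTRU2.

```r
# Minimal sketch of K-nearest neighbours classification (hypothetical helper).
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  # Euclidean distance from the new point to every training point
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  # Indices of the k closest training points
  nearest <- order(dists)[seq_len(k)]
  # Majority vote among their class labels
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Hypothetical toy data: two well-separated groups with labels "0" and "1"
train_x <- matrix(c(1.0, 1.2,
                    0.8, 1.0,
                    1.1, 0.9,
                    5.0, 5.2,
                    5.3, 4.8,
                    4.9, 5.1), ncol = 2, byrow = TRUE)
train_y <- c("0", "0", "0", "1", "1", "1")

knn_predict(train_x, train_y, c(1, 1), k = 3)  # "0"
knn_predict(train_x, train_y, c(5, 5), k = 3)  # "1"
```

Because KNN is based on distances, predictors measured on very different scales (for example `int_pro_mean` versus `dm_snr_skew`) would normally be standardized first, e.g. with `scale()`, so that no single variable dominates the distance.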

-----

## Introduction

When a massive star dies in a supernova explosion, it leaves its core behind as a super-compressed ball of neutrons called a neutron star. Neutron stars have enormous magnetic fields that produce beams of electromagnetic radiation from their magnetic poles. Neutron stars also spin very rapidly, and because the magnetic axis is usually tilted relative to the rotation axis, these beams trace out cone shapes as the star rotates. When Earth lies on one of these cones, the beam sweeps past Earth once per rotation, and the star appears to pulse. Hence, these objects are called *pulsars*.

![](https://github.com/Kugelblitz64/dcsi100group19/blob/main/PulsarGif.gif?raw=true)

The Crab Nebula pulsar, seen near the centre. GIF from https://www.cloudynights.com/topic/285383-crab-nebulass-pulsar-is-blinking-not-a-joke/

A single pulse from a pulsar is weak and difficult to distinguish from background noise. However, recording the signal over many rotations, lining the measurements up by rotation phase, and averaging them yields a much clearer graph of signal strength over one rotation. This graph is called the pulsar's *integrated profile*; its shape is stable over time and unique to each pulsar.
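The averaging step can be illustrated with a small simulation. The sketch below is hypothetical throughout: the pulse shape, noise level, 64 phase bins and 500 rotations are made up, and the rotations are assumed to be already lined up by phase. It buries a weak pulse in noise and then averages the rotations to form an integrated profile. For independent noise, averaging N rotations shrinks the noise by about a factor of sqrt(N).

```r
# Hypothetical simulation: averaging many aligned rotations ("folding")
# to build an integrated profile. All numbers below are made up.
set.seed(19)

n_bins <- 64          # samples (phase bins) per rotation
n_rotations <- 500    # number of rotations recorded
phase <- seq(0, 1, length.out = n_bins)

# Noise-free pulse shape: a weak, narrow peak at phase 0.5
true_profile <- 0.5 * exp(-(phase - 0.5)^2 / (2 * 0.02^2))

# One row per rotation: the same weak pulse plus unit-variance Gaussian noise.
# The rotations are assumed to be already lined up by phase.
signal <- matrix(true_profile, nrow = n_rotations, ncol = n_bins, byrow = TRUE)
rotations <- signal + matrix(rnorm(n_rotations * n_bins), nrow = n_rotations)

single_pulse <- rotations[1, ]             # one rotation on its own
integrated_profile <- colMeans(rotations)  # average over all rotations

# Noise left over after removing the true shape
noise_single <- sd(single_pulse - true_profile)
noise_integrated <- sd(integrated_profile - true_profile)
cat("Noise in a single rotation:", round(noise_single, 3), "\n")
cat("Noise after averaging:     ", round(noise_integrated, 3), "\n")
```

With 500 rotations the leftover noise should shrink by roughly sqrt(500), about 22 times, so the peak near phase 0.5 stands out clearly in `integrated_profile` even though it is hard to see in `single_pulse`.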

Additionally, each pulsar has a *dispersion measure* (DM), which measures the total amount of free electrons along the line of sight to the pulsar (their density integrated over the path). Free electrons slow low-frequency radio waves more than high-frequency ones, so a pulse arrives "spread out" in time across the observed frequencies, and the DM can be computed from the size of this delay. The DM also changes over time as the pulsar and Earth move through the galaxy. Because a candidate's true DM is not known in advance, its signal is corrected using a range of trial DM values, and the signal-to-noise ratio (SNR) of the result depends on how close each trial value is to the true one. Each candidate is therefore assigned a graph of SNR versus DM, called its DM-SNR curve.
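The "spreading out" can be made concrete with the standard cold-plasma approximation for the extra arrival delay of a lower frequency relative to a higher one: roughly 4.149 ms × DM × (1/f_lo² - 1/f_hi²), with DM in pc cm⁻³ and frequencies in GHz. The sketch below computes this in R; the function name `dispersion_delay_ms` is hypothetical, and the frequencies in the example are chosen only for illustration, not taken from the HTRU survey.

```r
# Dispersion delay (ms) of frequency f_lo relative to f_hi, using the
# cold-plasma approximation: 4.149 ms * DM * (f_lo^-2 - f_hi^-2),
# with DM in pc cm^-3 and frequencies in GHz. Hypothetical helper name.
dispersion_delay_ms <- function(dm, f_lo_ghz, f_hi_ghz) {
  stopifnot(dm >= 0, f_lo_ghz > 0, f_hi_ghz > f_lo_ghz)
  4.149 * dm * (1 / f_lo_ghz^2 - 1 / f_hi_ghz^2)
}

# Illustrative values: DM = 50 pc cm^-3, observed between 1.2 and 1.5 GHz
dispersion_delay_ms(50, 1.2, 1.5)   # about 51.9 ms

# The delay grows linearly with DM, so more electrons mean more spreading
dispersion_delay_ms(100, 1.2, 1.5)  # about 103.7 ms
```

The linear dependence on DM is why the delay can be used to measure DM, and why de-dispersing with the wrong trial DM smears the pulse and lowers its SNR.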

With this knowledge, we will investigate whether true pulsars clear their lines of sight of free electrons. To do this, we will use the HTRU2 dataset, which contains summary statistics of the two graphs discussed above for 17,898 pulsar candidates: 1,639 true pulsars and 16,259 false candidates.

To better visualize the statistics contained in the dataset, we have created representations using Desmos. See the "Extras" section for these.

-----

## Methods and Results

We first load the required libraries and the HTRU2 dataset.

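```r
# Load in the necessary libraries.
library(tidyverse)
library(repr)
options(repr.matrix.max.rows = 6)  # print at most 6 rows of a table in Jupyter

# Load in the HTRU2 dataset. The CSV file has no header row, so we supply the
# column names: four statistics of the integrated profile ("int_pro"), four of
# the DM-SNR curve ("dm_snr"), and the class label (1 = pulsar, 0 = not a pulsar).
pulsar <- read_csv("https://raw.githubusercontent.com/Kugelblitz64/dcsi100group19/main/HTRU_2.csv",
                   col_names = c("int_pro_mean", "int_pro_std", "int_pro_excess_kurtosis", "int_pro_skew",
                                 "dm_snr_mean", "dm_snr_std", "dm_snr_excess_kurtosis", "dm_snr_skew", "class")) |>
  mutate(class = as_factor(class))  # treat the class label as a categorical variable
pulsar
```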

Rows: 17898 Columns: 9

| int_pro_mean | int_pro_std | int_pro_excess_kurtosis | int_pro_skew | dm_snr_mean | dm_snr_std | dm_snr_excess_kurtosis | dm_snr_skew | class |
|---:|---:|---:|---:|---:|---:|---:|---:|:---:|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` |
| 140.5625 | 55.68378 | -0.2345714 | -0.6996484 | 3.199833 | 19.11043 | 7.975532 | 74.24222 | 0 |
| 102.5078 | 58.88243 | 0.4653182 | -0.5150879 | 1.677258 | 14.86015 | 10.576487 | 127.39358 | 0 |
| 103.0156 | 39.34165 | 0.3233284 | 1.0511644 | 3.121237 | 21.74467 | 7.735822 | 63.17191 | 0 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 119.3359 | 59.93594 | 0.1593631 | -0.74302540 | 21.430602 | 58.87200 | 2.499517 | 4.595173 | 0 |
| 114.5078 | 53.90240 | 0.2011614 | -0.02478884 | 1.946488 | 13.38173 | 10.007967 | 134.238910 | 0 |
| 57.0625 | 85.79734 | 1.4063910 | 0.08951971 | 188.306020 | 64.71256 | -1.597527 | 1.429475 | 0 |


## Extras

Here are visualizations we created to help understand the data we used.

<table><tr>
<td> <img src="https://github.com/Kugelblitz64/dcsi100group19/blob/main/IntegratedProfileExample.png?raw=true" style="width: 500px;"/> </td>
<td> <img src="https://github.com/Kugelblitz64/dcsi100group19/blob/main/DMSNRExample.png?raw=true" style="width: 500px;"/> </td>
</tr></table>

The mean of each graph is marked in red; it is the average y value of the graph.

The standard deviation is represented by the blue lines. It indicates how widely the y values are spread around the mean, giving an idea of the "width" of the band in which most values lie.

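The excess kurtosis describes how heavy the tails of the y values are compared to a normal distribution (it is the kurtosis minus 3, so a normal distribution has an excess kurtosis of 0). A large positive value means most values stay close to the mean while a few jump far away from it, as happens when a graph has a sharp, narrow peak.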

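The skew measures how asymmetrically the y values are spread around the mean. Positive skew means the values have a long tail above the mean: most of the graph typically lies slightly below the mean, while a few points (such as a narrow peak) reach far above it. Negative skew is the reverse: most of the graph typically lies slightly above the mean, with a few points reaching far below it.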
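To make these four definitions concrete, the sketch below computes them in base R, treating the y values of a graph as a sample. It assumes the population (moment-based) formulas; the exact estimators used to build HTRU2 may differ slightly. The helper name `profile_stats` and the toy profile are hypothetical.

```r
# Mean, standard deviation, excess kurtosis and skew of a graph's y values,
# using population (moment-based) formulas. Hypothetical helper name.
profile_stats <- function(y) {
  m <- mean(y)                  # mean: average y value
  s <- sqrt(mean((y - m)^2))    # standard deviation (population form)
  c(mean = m,
    std = s,
    excess_kurtosis = mean((y - m)^4) / s^4 - 3,  # 0 for a normal distribution
    skew = mean((y - m)^3) / s^3)                 # > 0: long tail above the mean
}

# Hypothetical toy profile: a flat baseline with one narrow peak.
# Most values sit just below the mean, a few reach far above it,
# so both the skew and the excess kurtosis come out large and positive.
spiky <- c(rep(1, 95), 2, 6, 10, 6, 2)
round(profile_stats(spiky), 2)

# Evenly spread, symmetric values: skew is 0 and excess kurtosis is -1.3
profile_stats(1:5)
```

Comparing the two calls shows the intuition from the paragraphs above: a single sharp peak on a flat background pushes both the skew and the excess kurtosis upward, while evenly spread values give zero skew and negative excess kurtosis.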