**[Machine Learning Home Page](https://www.kaggle.com/learn/intro-to-machine-learning)**

---


# Exercise: Classification

Here you'll get some experience training a classification model yourself. You'll create a model that determines whether radio signals come from a pulsar. Pulsars are a rare type of neutron star that produces radio signals we can detect on Earth. As a pulsar rotates, its beam of radio waves points directly at us, then moves away. This produces a periodic signal that we can use to determine whether a radio signal actually comes from a pulsar or is just noise.

The data itself contains eight measures of this radio signal and a column `target_class` that indicates if the signal is noise (0) or a pulsar (1). Using this data, you'll train a classifier that can identify pulsars from the radio signal data.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
import sklearn.metrics as metrics

# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex8 import *
print("Setup complete")

Load in the data and check out the first few rows to get acquainted with the features.

In [None]:
pulsar_data = pd.read_csv('../input/predicting-a-pulsar-star/pulsar_stars.csv')
pulsar_data.head()

As usual, split the data into training and validation sets.

In [None]:
y = pulsar_data['target_class']
X = pulsar_data.drop('target_class', axis=1)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1, test_size=.2)

## 1. Train the classifier

Now, it's time to create the model and fit it to our training data. Use `RandomForestClassifier` here and fit the model on the training data.
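
As a rough sketch of the define-then-fit pattern, the snippet below trains a `RandomForestClassifier` on a small synthetic dataset rather than the pulsar data. The dataset from `make_classification`, the `toy_` variable names, and `n_estimators=10` are assumptions made for illustration and are not part of the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption for illustration, not the pulsar data):
# 200 samples, 8 features, two classes
toy_X, toy_y = make_classification(n_samples=200, n_features=8, random_state=0)

# Define the model; a fixed random_state makes the result reproducible
toy_model = RandomForestClassifier(n_estimators=10, random_state=0)

# Fit the model on the features and labels
toy_model.fit(toy_X, toy_y)

# After fitting, the model exposes the class labels it saw during training
print(toy_model.classes_)
```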

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Define the model. Set random_state to 1
model = ____

# Fit your model
____

step_1.check()

In [None]:
# The lines below will show you a hint or the solution.
#step_1.hint() 
#step_1.solution()

## 2. Make Predictions

Make predictions using the trained model and the validation features. Then calculate the accuracy of the predictions with `metrics.accuracy_score`, using the validation targets.
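
To make the metric concrete, the sketch below computes accuracy by hand on a few made-up labels: the fraction of predictions that match the true labels, which is the quantity `metrics.accuracy_score` returns by default. The label lists are hypothetical and unrelated to the pulsar data.

```python
# Hypothetical true and predicted labels (0 = noise, 1 = pulsar)
toy_true = [0, 0, 0, 1, 1, 0, 1, 0]
toy_pred = [0, 0, 1, 1, 0, 0, 1, 0]

# Count positions where the prediction matches the true label
n_correct = sum(t == p for t, p in zip(toy_true, toy_pred))

# Accuracy is the fraction of correct predictions
toy_accuracy = n_correct / len(toy_true)
print("Accuracy:", toy_accuracy)
```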

In [None]:
# Get predictions from the trained model using the validation features
pred_y = ____

# Calculate the accuracy of the trained model with the validation targets and predicted targets
accuracy = ____

print("Accuracy: ", accuracy)

step_2.check()

In [None]:
# The lines below will show you a hint or the solution.
#step_2.hint()
#step_2.solution()

## 3. Interpret the results

Finally, calculate the confusion matrix for the classifier. We'll also normalize the confusion matrix to get it in terms of rates.

In [None]:
# Fraction of validation signals that are noise (class 0)
(val_y == 0).mean()

In [None]:
confusion = metrics.confusion_matrix(val_y, pred_y)
print(f"Confusion matrix:\n{confusion}")

# Normalizing by the true label counts to get rates
print("\nNormalized confusion matrix:")
for row in confusion:
    print(row / row.sum())
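
As an alternative to the loop above, recent versions of scikit-learn accept a `normalize` argument to `confusion_matrix`. The sketch below uses a handful of made-up labels, not the pulsar data, to show how the counts and the row-normalized rates relate.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels (0 = noise, 1 = pulsar), not taken from the pulsar data
toy_true = [0, 0, 0, 0, 1, 1, 1, 0]
toy_pred = [0, 0, 1, 0, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes
counts = confusion_matrix(toy_true, toy_pred)

# normalize="true" divides each row by its total, giving per-class rates
rates = confusion_matrix(toy_true, toy_pred, normalize="true")

print(counts)
print(np.round(rates, 3))
```

Each row of `rates` sums to 1. The first row splits the true noise signals into those correctly labeled as noise and those wrongly flagged as pulsars; the second row splits the true pulsars into those missed and those detected.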

Looking at the confusion matrix, do you think the model is doing well at classifying pulsars from radio wave signals? Is the model misclassifying noise as pulsars or missing pulsars in the data?

In [None]:
#step_3.solution()

## Thinking about unbalanced classes

Roughly 91% of this data is made up of noise signals. If it were 99% noise instead, would an accuracy of 98% still be good?
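
One way to reason about questions like this is to compare a model against a trivial baseline that always predicts the most common class. The sketch below computes that baseline accuracy for a hypothetical label list; the 90/10 split is made up for illustration and differs from both scenarios in the question.

```python
from collections import Counter

# Hypothetical labels: 90 noise signals (0) and 10 pulsars (1)
toy_labels = [0] * 90 + [1] * 10

# Find the most common class and how often it occurs
majority_class, majority_count = Counter(toy_labels).most_common(1)[0]

# A classifier that always predicts the majority class gets this accuracy
baseline_accuracy = majority_count / len(toy_labels)
print(f"Always predicting {majority_class} gives accuracy {baseline_accuracy:.2f}")
```

A model's accuracy is only informative when read against this kind of baseline.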

In [None]:
#step_4.solution()
