### <span style = 'color:green'> Create a machine learning model that can predict pulsar stars </span>


**Support Vector Machines (SVM)**
- Support Vector Machines (SVMs) are supervised machine learning algorithms used for classification, regression and outlier detection. An SVM classifier builds a model that assigns new data points to one of the given categories. In its basic form it can be viewed as a non-probabilistic binary linear classifier; kernel functions extend it to non-linear decision boundaries.
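
To make the "non-probabilistic binary linear classifier" description concrete, here is a minimal sketch on made-up toy data (the points and labels are assumptions, not part of the pulsar dataset). It fits a linear `SVC` and shows that the prediction follows the sign of the decision score rather than a probability.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data invented for illustration: two well-separated 2-D clusters.
X_toy = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0],
                  [4.0, 4.0], [4.5, 5.0], [5.0, 4.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel gives a single separating hyperplane.
clf = SVC(kernel="linear")
clf.fit(X_toy, y_toy)

new_points = np.array([[0.2, 0.3], [4.8, 4.6]])
scores = clf.decision_function(new_points)  # signed score, not a probability
labels = clf.predict(new_points)            # class 1 where the score is positive

print(scores)
print(labels)
```

A negative score falls on the class 0 side of the hyperplane and a positive score on the class 1 side.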

**About the dataset**
- Pulsars are a rare type of neutron star that produce radio emission detectable here on Earth. They are of considerable scientific interest as probes of space-time, the interstellar medium, and states of matter. Machine learning tools are now being used to automatically label pulsar candidates to facilitate rapid analysis. Classification systems in particular are being widely adopted, which treat the candidate data sets as binary classification problems.

**Expected output**
- **Missing values should be treated**
- **Perform standardisation and handle outliers**
- **Perform Support Vector Machines and tune the model to improve its performance**

- The dataset is available on Google Drive: <a href="https://drive.google.com/file/d/19d2ocdl8d5rrE8Wc8nkBTFu_QrgtDt3q/view?usp=sharing" title="Google Drive">Click here</a>




In [16]:
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.svm import SVC

In [17]:
data = pd.read_csv("pulsar_dataset.csv")

In [18]:
print(data.head())

    Mean of the integrated profile  \
0                       121.156250   
1                        76.968750   
2                       130.585938   
3                       156.398438   
4                        84.804688   

    Standard deviation of the integrated profile  \
0                                      48.372971   
1                                      36.175557   
2                                      53.229534   
3                                      48.865942   
4                                      36.117659   

    Excess kurtosis of the integrated profile  \
0                                    0.375485   
1                                    0.712898   
2                                    0.133408   
3                                   -0.215989   
4                                    0.825013   

    Skewness of the integrated profile   Mean of the DM-SNR curve  \
0                            -0.013165                   3.168896   
1                        

In [19]:
# Fill missing values in each numeric column with that column's mean
data.fillna(data.mean(), inplace=True)
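
`data.fillna(data.mean())` fills every column, including the label. If the label column had gaps, they would receive a fractional mean rather than a valid class. A minimal sketch of a more cautious variant follows, using a small made-up frame (`feature_a` and `feature_b` are hypothetical column names) and the median as the fill value, which is an assumption rather than the notebook's choice.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the pulsar data.
demo = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 4.0],
    "feature_b": [10.0, 20.0, np.nan, 40.0],
    "target_class": [0.0, 1.0, 0.0, np.nan],
})

# Inspect how many values are missing in each column first.
print(demo.isnull().sum())

# Drop rows whose label is missing instead of inventing a label.
demo = demo.dropna(subset=["target_class"])

# Fill only the feature columns, here with each column's median.
feature_cols = demo.columns.drop("target_class")
demo[feature_cols] = demo[feature_cols].fillna(demo[feature_cols].median())

print(demo)
```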

In [20]:
# Separating features and target variable
X = data.drop('target_class', axis=1)
y = data['target_class']

# Standardization
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [21]:
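# Re-scale the original features with RobustScaler (median and IQR based, so less
# sensitive to outliers). This overwrites the StandardScaler result above, so only
# the RobustScaler output is used downstream.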
robust_scaler = RobustScaler()
X_scaled = robust_scaler.fit_transform(X)
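
`RobustScaler` reduces the influence of outliers on the scaling but leaves the extreme values in the data. One common way to handle them explicitly is to clip each column to the Tukey fences (quartiles plus or minus 1.5 times the IQR). The sketch below is an assumption about how that could be done; `clip_iqr` is a hypothetical helper and the demo values are made up.

```python
import pandas as pd


def clip_iqr(frame: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Clip every column to [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    # axis=1 aligns the per-column bounds with the frame's columns.
    return frame.clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)


# Made-up column with one obvious outlier.
demo = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0]})
clipped = clip_iqr(demo)
print(clipped)
```

Here Q1 is 2 and Q3 is 4, so the upper fence is 7 and the value 100 is pulled down to 7. In a real workflow the quartiles would be computed on the training split only.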

In [30]:
# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
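
Because the scaler above was fitted on all rows before the split, statistics from the test rows leak into the training features. A minimal sketch of one way to avoid this is to split first and let a pipeline fit the scaler on the training rows only. The data here is synthetic (generated with `make_classification`), and `stratify` is an added assumption to keep the class ratio equal in both splits.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the pulsar features (not the real data).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=8, weights=[0.93, 0.07], random_state=42
)

# Split first; stratify keeps the minority-class share similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# The pipeline fits RobustScaler on the training rows only, then applies
# the same transformation to the test rows at prediction time.
pipe = make_pipeline(RobustScaler(), SVC())
pipe.fit(X_tr, y_tr)

test_accuracy = pipe.score(X_te, y_te)
print(test_accuracy)
```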

In [32]:
print(data['target_class'].value_counts())

target_class
0.0    16745
1.0     1153
Name: count, dtype: int64
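
The counts above show a strong class imbalance (roughly 15 non-pulsars for every pulsar). One option scikit-learn offers is class weighting. The sketch below computes the weights that `class_weight="balanced"` would assign for these counts. Whether weighting actually helps on this dataset is not tested here.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Rebuild a label vector with the class counts printed above.
y_counts = np.array([0] * 16745 + [1] * 1153)

# "balanced" uses n_samples / (n_classes * class_count) for each class.
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_counts
)
print(dict(zip([0, 1], weights)))
```

Passing `class_weight="balanced"` to `SVC` applies the same weights during training, so errors on the rare pulsar class are penalised more heavily.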


In [33]:
print(data['target_class'].unique())

[0. 1.]


In [34]:
print(y.isnull().sum())

0


In [36]:
y = y.astype(int)

In [37]:
print(X.dtypes)
print(y.dtypes)

 Mean of the integrated profile                  float64
 Standard deviation of the integrated profile    float64
 Excess kurtosis of the integrated profile       float64
 Skewness of the integrated profile              float64
 Mean of the DM-SNR curve                        float64
 Standard deviation of the DM-SNR curve          float64
 Excess kurtosis of the DM-SNR curve             float64
 Skewness of the DM-SNR curve                    float64
dtype: object
int32


In [40]:
# Convert target variable to integer type
y_train = y_train.astype(int)
y_test = y_test.astype(int)

# SVM classifier with scikit-learn's default settings (RBF kernel, C=1.0)
svm_model = SVC()

# Fitting the model
svm_model.fit(X_train, y_train)

# Predictions
predictions = svm_model.predict(X_test)

# Evaluating the model
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

[[3287   70]
 [  56  167]]
              precision    recall  f1-score   support

           0       0.98      0.98      0.98      3357
           1       0.70      0.75      0.73       223

    accuracy                           0.96      3580
   macro avg       0.84      0.86      0.85      3580
weighted avg       0.97      0.96      0.97      3580
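
The class 1 figures in the report can be reproduced by hand from the confusion matrix printed above. This short sketch does the arithmetic with the four counts copied from that output.

```python
import numpy as np

# Confusion matrix copied from the output above: rows are true classes,
# columns are predicted classes, i.e. [[TN, FP], [FN, TP]].
cm = np.array([[3287, 70],
               [56, 167]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)                # share of predicted pulsars that are real
recall = tp / (tp + fn)                   # share of real pulsars that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / cm.sum()           # share of all predictions that are correct

print(round(float(precision), 2), round(float(recall), 2),
      round(float(f1), 2), round(float(accuracy), 2))
```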



In [None]:
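'''
The confusion matrix is laid out as [[TN, FP], [FN, TP]]: 3287 true negatives, 70 false positives, 56 false negatives and 167 true positives.
Precision, recall and F1-score evaluate the model on each class. For the pulsar class (1), precision is 0.70 and recall is 0.75, noticeably lower than for the majority class (0).
Accuracy (0.96) is the overall share of correct predictions. Because about 94% of the test samples belong to class 0, accuracy alone says little about how well pulsars are detected.
'''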
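
The notebook imports `GridSearchCV` but stops before the tuning step listed under the expected output. A minimal sketch of how that step could look is given below. It runs on synthetic data, and the parameter grid and the F1 scoring choice are illustrative assumptions, not values validated on the pulsar dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the pulsar features (not the real data).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# Scaling lives inside the pipeline so each CV fold fits its own scaler.
pipe = Pipeline([("scale", RobustScaler()), ("svc", SVC())])

# Illustrative grid; the values are assumptions chosen to keep the search small.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.1],
    "svc__class_weight": [None, "balanced"],
}

# F1 on the positive class is used because accuracy is dominated by class 0.
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
search.fit(X_tr, y_tr)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `search.best_estimator_` can be passed to `classification_report` on the held-out test rows in the same way as `svm_model` above.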