# **Abstract** :
Pulsar candidates collected during the High Time Resolution Universe (HTRU) survey. Pulsars are rapidly rotating neutron stars of considerable scientific interest. Candidates must be classified into pulsar and non-pulsar classes to aid discovery.

---

# **Attribute Information :**


---

Each candidate is described by 8 continuous variables, and a single class variable. The first four are simple statistics obtained from the integrated pulse profile (folded profile). This is an array of continuous variables that describe a longitude-resolved version of the signal that has been averaged in both time and frequency. The remaining four variables are similarly obtained from the DM-SNR curve.

These are summarised below:

1. Mean of the integrated profile.
2. Standard deviation of the integrated profile.
3. Excess kurtosis of the integrated profile.
4. Skewness of the integrated profile.
5. Mean of the DM-SNR curve.
6. Standard deviation of the DM-SNR curve.
7. Excess kurtosis of the DM-SNR curve.
8. Skewness of the DM-SNR curve.
9. Class
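
To make the four summary statistics concrete, here is a minimal sketch that computes them for a toy 1-D array. The array `toy_profile` and the helper `profile_statistics` are hypothetical, and the use of population (biased) moment estimators is an assumption; the dataset description does not say which estimators were used.

```python
import numpy as np

def profile_statistics(values):
    """Return (mean, std, excess kurtosis, skewness) of a 1-D array."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    std = x.std()                      # population standard deviation (ddof=0)
    z = (x - mean) / std               # standardised values
    skewness = np.mean(z ** 3)         # third standardised moment
    excess_kurtosis = np.mean(z ** 4) - 3.0  # fourth standardised moment minus 3
    return mean, std, excess_kurtosis, skewness

# Hypothetical folded profile with a single narrow peak
toy_profile = np.array([0.0, 1.0, 2.0, 10.0, 2.0, 1.0, 0.0, 0.0])
mean, std, excess_kurtosis, skewness = profile_statistics(toy_profile)
print(mean, std, excess_kurtosis, skewness)
```

A sharply peaked profile like this one gives positive skewness and positive excess kurtosis, whereas a flat, noise-like profile gives values near or below zero.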

# **HTRU 2 Summary :**

- 17,898 total examples.
- 1,639 positive examples.
- 16,259 negative examples.
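
These counts imply a strong class imbalance, which matters when interpreting accuracy later on. The short calculation below uses only the three numbers above; the variable names are illustrative.

```python
total = 17_898
positives = 1_639
negatives = 16_259

# The two classes should account for every example
assert positives + negatives == total

positive_fraction = positives / total
# Accuracy of a trivial classifier that always predicts "non-pulsar"
majority_baseline_accuracy = negatives / total

print(f"Positive fraction: {positive_fraction:.4f}")
print(f"Majority-class baseline accuracy: {majority_baseline_accuracy:.4f}")
```

Roughly 9% of the candidates are pulsars, so a model has to beat an accuracy of about 91% before it is doing better than always predicting the negative class.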


 STEP 1 : Import the required libraries:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

STEP 2 : Load the dataset:

In [None]:
dataset = np.loadtxt('HTRU_2_AIDATASET.csv', delimiter=",", skiprows=1)


STEP 3 : Print the shape of the dataset:

In [None]:
print(dataset.shape)


STEP 4 : Normalize the data:

In [None]:
data_normalized = (dataset - dataset.min(axis=0)) / (dataset.max(axis=0) - dataset.min(axis=0))
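
The cell above takes the column minima and maxima from the whole dataset, including rows that later end up in the test set. One possible variant, sketched here on random toy data, fits the scaling on the training rows only and reuses it for the test rows. The helpers `fit_min_max` and `apply_min_max` are hypothetical names, not part of the notebook.

```python
import numpy as np

def fit_min_max(train_features):
    """Compute per-column offset and range from the training rows only."""
    col_min = train_features.min(axis=0)
    col_range = train_features.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range

def apply_min_max(features, col_min, col_range):
    """Scale features with statistics computed elsewhere."""
    return (features - col_min) / col_range

# Toy data standing in for the feature columns
rng = np.random.default_rng(0)
toy = rng.normal(size=(100, 3))
toy_train, toy_test = toy[:80], toy[80:]

col_min, col_range = fit_min_max(toy_train)
train_scaled = apply_min_max(toy_train, col_min, col_range)
test_scaled = apply_min_max(toy_test, col_min, col_range)
```

With this approach the training columns span exactly [0, 1], while test values may fall slightly outside that range.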


STEP 5 : Shuffle the dataset:

In [None]:
np.random.seed(42)  # arbitrary seed so the shuffle, and therefore the split, is reproducible
np.random.shuffle(data_normalized)


STEP 6 : Split the dataset into training and testing sets using an 80:20 split:

In [None]:
train_data = data_normalized[:int(0.8 * len(data_normalized))]
test_data = data_normalized[int(0.8 * len(data_normalized)):]
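
Because only about 9% of the rows are positive, a plain 80:20 slice can leave the two sets with slightly different class ratios. The sketch below shows one way to split each class separately so the ratio is preserved. The function `stratified_split` and the toy array are hypothetical, and the code assumes the class label sits in the last column, as in this notebook.

```python
import numpy as np

def stratified_split(data, train_fraction=0.8, seed=0):
    """Split rows so each class keeps the same train/test proportion."""
    rng = np.random.default_rng(seed)
    labels = data[:, -1]
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the row indices of this class, then cut at the train fraction
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(train_fraction * len(idx))
        train_idx.append(idx[:cut])
        test_idx.append(idx[cut:])
    # Shuffle again so the classes are interleaved within each split
    train_idx = rng.permutation(np.concatenate(train_idx))
    test_idx = rng.permutation(np.concatenate(test_idx))
    return data[train_idx], data[test_idx]

# Toy data: 100 rows, 10 of them positive (label in the last column)
toy = np.column_stack([np.arange(100.0), np.r_[np.ones(10), np.zeros(90)]])
toy_train, toy_test = stratified_split(toy)
print(toy_train.shape, toy_test.shape)
```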


STEP 7 : Separate the features and target variable:

In [None]:
X_train = train_data[:, :-1]
y_train = train_data[:, -1]
X_test = test_data[:, :-1]
y_test = test_data[:, -1]


STEP 8 : Build the neural network model using TensorFlow:

In [None]:
def create_model(input_shape):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu', input_shape=input_shape),
        tf.keras.layers.Dense(8, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = create_model((X_train.shape[1],))
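
As a sanity check on the size of this network, the number of trainable parameters can be worked out by hand: each dense layer has one weight per input-output pair plus one bias per output. The helper `dense_param_count` below is a hypothetical name, and the layer sizes assume all 8 features are used as input.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus bias vector
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )

# 8 inputs -> Dense(16) -> Dense(8) -> Dense(1)
total_params = dense_param_count([8, 16, 8, 1])
print(total_params)  # 144 + 136 + 9 = 289
```

This figure should match the total reported by `model.summary()` for the full-feature model.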


STEP 9 : Train the model using the training data, with 100 epochs, a batch size of 32, and a 20% validation split:

In [None]:
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2, verbose=1)
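
The nested splits determine how many rows the optimizer actually sees. The arithmetic below is a rough sketch that assumes the dataset has the 17,898 rows listed in the summary and that Keras holds out the last 20% of the training rows, rounding the split index down; the exact rounding is an assumption about Keras internals.

```python
import math

total_rows = 17_898
outer_train_rows = int(0.8 * total_rows)           # rows passed to model.fit
test_rows = total_rows - outer_train_rows          # rows kept for the final evaluation

fit_rows = math.floor(outer_train_rows * (1 - 0.2))  # rows used for gradient updates
val_rows = outer_train_rows - fit_rows               # rows held out by validation_split

steps_per_epoch = math.ceil(fit_rows / 32)         # batches per epoch at batch_size=32
print(outer_train_rows, test_rows, fit_rows, val_rows, steps_per_epoch)
```

Under these assumptions, each epoch runs a few hundred batches, so 100 epochs is a modest training budget for a network this small.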


STEP 10 : Visualize the training process:

In [None]:
# Plot training and validation loss
plt.figure()
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training and Validation Loss')
plt.legend()

# Plot training and validation accuracy
plt.figure()
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()

plt.show()


STEP 11 : Evaluate the model:

In [None]:
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Accuracy: {accuracy:.4f}")


STEP 12 : Manually calculate precision, recall, and F1-score:

In [None]:
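# model.predict returns shape (n, 1); flatten to (n,) so the comparisons with y_test
# are element-wise instead of broadcasting to an (n, n) matrix
y_pred = (model.predict(X_test) > 0.5).astype("int32").ravel()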

tp = np.sum((y_pred == 1) & (y_test == 1))
tn = np.sum((y_pred == 0) & (y_test == 0))
fp = np.sum((y_pred == 1) & (y_test == 0))
fn = np.sum((y_pred == 0) & (y_test == 1))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")
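
The confusion-matrix counts are only meaningful when predictions and labels have the same shape. A small worked example with hypothetical values shows both the metric arithmetic and the shape pitfall: comparing an `(n, 1)` array with an `(n,)` array broadcasts to `(n, n)`, so the predictions are flattened first.

```python
import numpy as np

# Hypothetical labels and predicted probabilities; probs has shape (8, 1),
# mimicking the output of a Keras model with a single sigmoid unit
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1])
probs = np.array([[0.9], [0.2], [0.4], [0.8], [0.6], [0.1], [0.3], [0.7]])

# Without flattening, the comparison broadcasts to an 8 x 8 matrix
broadcast_shape = ((probs > 0.5) & (y_true == 1)).shape

# Flatten to shape (8,) so every comparison is element-wise
y_hat = (probs > 0.5).astype("int32").ravel()

tp = int(np.sum((y_hat == 1) & (y_true == 1)))
tn = int(np.sum((y_hat == 0) & (y_true == 0)))
fp = int(np.sum((y_hat == 1) & (y_true == 0)))
fn = int(np.sum((y_hat == 0) & (y_true == 1)))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(tp, tn, fp, fn, precision, recall, f1)
```

Here three of the four predicted positives are correct and three of the four true positives are found, so precision, recall, and F1 all equal 0.75.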


In [None]:
# Hold out the last 20% of the training rows as a validation set for feature selection
val_split = int(0.8 * len(train_data))
X_train_val, y_train_val = X_train[:val_split], y_train[:val_split]
X_valid, y_valid = X_train[val_split:], y_train[val_split:]


In [None]:
# Train one small model per input feature and record its validation accuracy
single_feature_accuracies = []

for i in range(X_train_val.shape[1]):
    X_train_single = X_train_val[:, i:i+1]
    X_valid_single = X_valid[:, i:i+1]
    
    model_single_feature = create_model((1,))
    model_single_feature.fit(X_train_single, y_train_val, epochs=100, batch_size=32, verbose=1)
    _, accuracy = model_single_feature.evaluate(X_valid_single, y_valid, verbose=1)
    single_feature_accuracies.append(accuracy)


In [None]:
# Feature indices ordered from highest to lowest single-feature validation accuracy
feature_indices_ranked = np.argsort(single_feature_accuracies)[::-1]
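
To illustrate how the ranking works, here is the same expression applied to a hypothetical list of four accuracies. `np.argsort` returns indices in ascending order of value, and the `[::-1]` slice reverses them so the best feature comes first.

```python
import numpy as np

# Hypothetical single-feature accuracies for features 0..3
toy_accuracies = [0.91, 0.97, 0.93, 0.95]

ranked = np.argsort(toy_accuracies)[::-1]
print(ranked)  # [1 3 2 0]: feature 1 is best, feature 0 is worst
```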


In [None]:
reduced_feature_accuracies = []

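# Stop one short of the feature count so at least one input feature always remains
for num_features_to_remove in range(X_train_val.shape[1] - 1):
    # Drop the (num_features_to_remove + 1) lowest-ranked features
    removed_features = feature_indices_ranked[-(num_features_to_remove+1):]
    selected_features = [i for i in range(X_train_val.shape[1]) if i not in removed_features]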
    
    X_train_reduced = X_train_val[:, selected_features]
    X_valid_reduced = X_valid[:, selected_features]
    
    model_reduced = create_model((len(selected_features),))
    model_reduced.fit(X_train_reduced, y_train_val, epochs=100, batch_size=32, verbose=1)
    _, accuracy = model_reduced.evaluate(X_valid_reduced, y_valid, verbose=1)
    reduced_feature_accuracies.append(accuracy)
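
The elimination schedule is easier to follow on a toy ranking. The sketch below uses a hypothetical ranking of four features and lists which ones are kept at each step, stopping while at least one feature remains; no model is trained here.

```python
import numpy as np

# Hypothetical ranking: feature 1 is best, feature 0 is worst
toy_ranked = np.array([1, 3, 2, 0])
n_features = len(toy_ranked)

kept_per_step = []
for k in range(n_features - 1):
    removed = toy_ranked[-(k + 1):]          # the k + 1 lowest-ranked features
    kept = [i for i in range(n_features) if i not in removed]
    kept_per_step.append(kept)

print(kept_per_step)  # [[1, 2, 3], [1, 3], [1]]
```

Each step removes one more feature from the bottom of the ranking, so the last model is trained on the single best feature only.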


In [None]:
best_reduced_model_idx = np.argmax(reduced_feature_accuracies)
best_reduced_model_accuracy = reduced_feature_accuracies[best_reduced_model_idx]


In [None]:
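# `accuracy` was overwritten inside the loops above, so re-evaluate the full-feature model
_, full_feature_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Accuracy with all input features (test set): {full_feature_accuracy:.4f}")
print(f"Best accuracy with reduced features (validation set): {best_reduced_model_accuracy:.4f}")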
