# Logistic Regression to Classify Gamma-Ray Signals from Background Hadrons
### Performing Logistic Regression on the MAGIC Gamma Telescope project dataset

## Dataset Specifications
### Features:
- fLength: Major axis of the ellipse, measured in millimeters (continuous).
- fWidth: Minor axis of the ellipse, measured in millimeters (continuous).
- fSize: Base-10 logarithm of the sum of the content of all pixels in the telescope's image, in photons (continuous).
- fConc: Ratio of the sum of the two highest pixel values over the total size (continuous).
- fConc1: Ratio of the highest pixel value over the total size (continuous).
- fAsym: Distance from the highest pixel to the center of the ellipse, projected onto the major axis, in millimeters (continuous).
- fM3Long: 3rd root of the third moment along the major axis, in millimeters (continuous).
- fM3Trans: 3rd root of the third moment along the minor axis, in millimeters (continuous).
- fAlpha: Angle of the major axis relative to the vector to the origin, in degrees (continuous).
- fDist: Distance from the origin to the center of the ellipse, in millimeters (continuous).
### Target (class):
- g: Gamma-ray signal (positive class).
- h: Hadron background (negative class).
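
Logistic regression needs numeric targets, so the `g`/`h` labels have to be mapped to 1/0 at some point before training. The sketch below shows one way to do that. The helper name `encode_labels` and the toy label array are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Toy labels in the dataset's format: 'g' (gamma) and 'h' (hadron).
labels = np.array(['g', 'h', 'g', 'g', 'h'])

def encode_labels(labels, positive='g'):
    """Map the positive class to 1 and every other label to 0 (hypothetical helper)."""
    return (np.asarray(labels) == positive).astype(int)

y_encoded = encode_labels(labels)
print(y_encoded)  # [1 0 1 1 0]
```

Treating `g` as 1 matches the positive/negative convention listed above, so a predicted probability near 1 would mean "gamma".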

In [72]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [73]:
def train_test_split(X, y, test_size=0.2, random_state=None):
    """
    Split dataset into training and testing sets using NumPy.
    
    Parameters:
    -----------
    X : numpy array, shape (m_samples, n_features)
        Feature dataset
    y : numpy array, shape (m_samples,)
        Target labels
    test_size : float, default=0.2
        Proportion of the dataset to include in the test split (between 0 and 1)
    random_state : int or None, default=None
        Random seed for reproducibility
    
    Returns:
    --------
    X_train, X_test, y_train, y_test : numpy arrays
        Split datasets
    """
    if random_state is not None:
        np.random.seed(random_state)
    
    # Shuffle indices
    indices = np.random.permutation(len(X))
    
    # Determine split point
    test_split = int(len(X) * test_size)
    
    # Split indices
    test_indices = indices[:test_split]
    train_indices = indices[test_split:]
    
    # Return split data
    X_train, X_test = X[train_indices], X[test_indices]
    y_train, y_test = y[train_indices], y[test_indices]
    
    return X_train, X_test, y_train, y_test
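
The function above seeds NumPy's global random state through `np.random.seed`, which also affects any later code that draws random numbers. A possible alternative, sketched here under the assumption that a local generator is acceptable, uses `np.random.default_rng`. The name `split_with_generator` and the toy arrays are hypothetical, and a given seed will not produce the same shuffle as the `np.random.seed` version.

```python
import numpy as np

def split_with_generator(X, y, test_size=0.2, seed=None):
    """Shuffle-and-split using a local Generator instead of the global seed."""
    rng = np.random.default_rng(seed)      # local RNG, global state untouched
    indices = rng.permutation(len(X))      # shuffled row indices
    n_test = int(len(X) * test_size)       # number of test rows
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 samples, 2 features; row i is [2*i, 2*i + 1] and its label is i.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = split_with_generator(X_demo, y_demo, test_size=0.2, seed=42)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```

Because features and labels are indexed with the same shuffled indices, each row stays paired with its label after the split.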

In [74]:
def feature_scaling(X):
    """
    Scales features to have a mean of 0 and a standard deviation of 1.

    Parameters:
    -----------
    X : numpy array, shape (m_samples, n_features)
        Feature dataset

    Returns:
    --------
    X_scaled : numpy array, shape (m_samples, n_features)
        Scaled feature dataset
    mu : numpy array, shape (n_features,)
        Mean of each feature
    sigma : numpy array, shape (n_features,)
        Standard deviation of each feature
    """
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    X_scaled = (X - mu) / sigma
    return X_scaled, mu, sigma
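
Standardization divides by each column's standard deviation, so a constant feature (standard deviation 0) would produce NaN values. The MAGIC features are continuous and unlikely to be constant, but a guarded variant could look like the sketch below. `safe_feature_scaling` and the toy matrix are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def safe_feature_scaling(X):
    """Standardize columns, leaving constant columns at 0 instead of NaN."""
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    # Replace zero standard deviations with 1 so the division is well defined.
    sigma_safe = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / sigma_safe, mu, sigma_safe

# Toy data: the first column varies, the second column is constant.
X_demo = np.array([[1.0, 5.0],
                   [2.0, 5.0],
                   [3.0, 5.0]])

X_scaled, mu, sigma = safe_feature_scaling(X_demo)
print(X_scaled.mean(axis=0))  # both columns close to 0
print(X_scaled.std(axis=0))   # close to 1 for the varying column, 0 for the constant one
```

As with the original function, the returned `mu` and `sigma` are meant to be reused on the test set, so the test data is scaled with training statistics only.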

In [75]:
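df = pd.read_csv('magic-gemma-telescope.csv')

# All columns except the last are features; the last column is the class label
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)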

print('\n\nTraining Set:')
print('Training Features:', X_train.shape)
print('Training Labels:', y_train.shape)
print('\n\nTesting Set:')
print('Testing Features:', X_test.shape)
print('Testing Labels:', y_test.shape)



Training Set:
Training Features: (15216, 10)
Training Labels: (15216,)


Testing Set:
Testing Features: (3804, 10)
Testing Labels: (3804,)


In [76]:
# Scale the features
X_train_scaled, mu, sigma = feature_scaling(X_train)
X_test_scaled = (X_test - mu) / sigma

print('\n\nScaled Training Set:')
print('Scaled Training Features:', X_train_scaled.shape)
print('\n\nScaled Testing Set:')
print('Scaled Testing Features:', X_test_scaled.shape)



Scaled Training Set:
Scaled Training Features: (15216, 10)


Scaled Testing Set:
Scaled Testing Features: (3804, 10)


In [77]:
def sigmoid(z):
    """
    Calculate the sigmoid of z.

    Parameters:
    -----------
    z : numpy array or float
        Input value(s)

    Returns:
    --------
    sigmoid(z) : numpy array or float
        Sigmoid of the input
    """
    return 1 / (1 + np.exp(-z))
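
For large negative inputs, `np.exp(-z)` in the formula above overflows and NumPy emits a `RuntimeWarning`, even though the result still rounds to 0. A numerically stable variant, sketched here as one possible alternative, only ever exponentiates non-positive numbers. The name `stable_sigmoid` is an assumption for illustration.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow by only computing exp of non-positive values."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so it cannot overflow
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(values)  # approximately 0, 0.5 and 1, with no overflow warning
```

Both branches are algebraically equal to `1 / (1 + np.exp(-z))`, so the outputs match the original function wherever the original does not overflow.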