# GLCM Feature Extraction
The Gray Level Co-occurrence Matrix (GLCM) is a statistical method used in image processing to analyze the spatial distribution of pixel intensities in an image. It is particularly useful for texture analysis and is widely applied in fields such as remote sensing, medical imaging, and computer vision.





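This function calculates a specified texture property from a Gray Level Co-occurrence Matrix (GLCM) and averages it over all distance/angle combinations in the matrix, returning a single number.

In [1]:
def glcm_feature(matrix_coocurrence, featureName):
    # graycoprops returns one value per (distance, angle) pair; average them into a scalar.
    # np and graycoprops are imported in later cells, before this function is first called.
    feature = graycoprops(matrix_coocurrence, featureName)
    result = np.average(feature)
    return result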

## Image Preprocessing Function

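This function preprocesses an input image for GLCM analysis: it thresholds the image, crops it to the bounding box of the largest contour (the ROI), resizes the crop to 400×400 pixels, and returns the result in grayscale.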

In [3]:
width=400
height=400
def preprocessingImage(image):
    # Convert the image from BGR to RGB
    test_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # Convert the RGB image to grayscale
    test_img_gray = cv2.cvtColor(test_img, cv2.COLOR_RGB2GRAY)
    
    # Apply adaptive thresholding to create a binary image
    test_img_thresh = cv2.adaptiveThreshold(test_img_gray, 255, 
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 3)
    
    # Find contours in the thresholded image
    cnts = cv2.findContours(test_img_thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    
    # Sort contours by area in descending order
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
    
    # Use the bounding box of the largest contour as the Region of Interest (ROI);
    # fall back to the full image if no contour was found
    test_img_ROI = test_img
    if cnts:
        x, y, w, h = cv2.boundingRect(cnts[0])
        test_img_ROI = test_img[y:y+h, x:x+w]
    
    # Resize the ROI to a fixed size
    test_img_ROI_resize = cv2.resize(test_img_ROI, (width, height))
    
    # Convert the resized ROI to grayscale
    test_img_ROI_resize_gray = cv2.cvtColor(test_img_ROI_resize, cv2.COLOR_RGB2GRAY)
    
    return test_img_ROI_resize_gray


## Feature Extraction and Labeling for Glaucoma Detection

In our study of glaucoma detection, we extracted four key features from the images using the Gray Level Co-occurrence Matrix (GLCM). These features are chosen based on their effectiveness in characterizing texture, which is crucial for differentiating between healthy and glaucomatous eyes.

## Extracted Features

1. **Contrast**:
   - Measures the intensity contrast between a pixel and its neighbor over the whole image.
   - A higher contrast value indicates a greater difference between pixel values, suggesting more texture.

2. **Homogeneity**:
   - Indicates the closeness of the distribution of elements in the GLCM to the GLCM diagonal.
   - Higher homogeneity values imply that the pixel values are similar, which can signify less texture complexity.

3. **Energy**:
   - Represents the uniformity of the GLCM. In scikit-image's `graycoprops` it is the square root of the sum of squared elements in the matrix (the sum itself is the angular second moment, `ASM`).
   - High energy values suggest a more uniform texture.

4. **Correlation**:
   - Measures how correlated a pixel is to its neighbor in the GLCM.
   - A higher correlation value indicates a more consistent pixel intensity pattern, which may be associated with certain textures.
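
The four properties above can be made concrete with a small NumPy sketch. It builds a symmetric, normalized GLCM for a 4×4 toy image with a horizontal offset of one pixel and evaluates the standard formulas. The helper names (`glcm_symmetric_normalized`, `texture_features`) and the toy image are illustrative assumptions, not part of the notebook, which relies on `graycomatrix` and `graycoprops` instead.

```python
import numpy as np

def glcm_symmetric_normalized(image, d_row, d_col, levels):
    """Count gray-level pairs at one pixel offset, then symmetrize and normalize."""
    glcm = np.zeros((levels, levels), dtype=float)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[image[r, c], image[r2, c2]] += 1
    glcm = glcm + glcm.T        # count each pair in both directions
    return glcm / glcm.sum()    # turn counts into probabilities

def texture_features(p):
    """Contrast, homogeneity, energy and correlation of a normalized GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    return {
        "contrast": np.sum(p * (i - j) ** 2),
        "homogeneity": np.sum(p / (1.0 + (i - j) ** 2)),
        "energy": np.sqrt(np.sum(p ** 2)),
        "correlation": np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j),
    }

# Hypothetical 4-level toy image
toy = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])

p = glcm_symmetric_normalized(toy, d_row=0, d_col=1, levels=4)
features = texture_features(p)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

On real 256-level images the double loop is slow; it is written this way only to keep the counting step readable.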

## Labeling

- **Labeling of Images**:
   - In our dataset, the labels are assigned as follows:
     - **Normal images**: labeled as `0`
     - **Glaucoma images**: labeled as `1`

### Rationale for Selection:
- **Relevance**: The chosen features have been widely recognized in literature for their effectiveness in analyzing medical images, particularly in detecting conditions like glaucoma.
- **Computational Efficiency**: By focusing on a smaller set of features, the model can operate more efficiently, reducing computational costs and improving the speed of analysis without sacrificing performance.
- **Interpretability**: These features offer better interpretability, allowing clinicians to understand the significance of the model's predictions based on texture analysis.

In summary, these four features (contrast, homogeneity, energy, and correlation) were selected for their relevance to glaucoma detection and their ability to capture essential textural information from retinal images.


In [5]:
import numpy as np

# Define image dimensions and GLCM parameters
width, height = 400, 400
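distance = 10  # Pixel distance for GLCM
# Angle for GLCM. Note: graycomatrix interprets angles in radians, so 90 here means
# 90 radians, not 90 degrees; use np.pi / 2 for a 90-degree offset.
teta = 90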

# Initialize a numpy array to store features and labels
data_eye = np.zeros((5, 4000))  # 5 rows (4 features + label), room for up to 4000 images
count = 0
indextable = ['contrast', 'homogeneity', 'energy', 'correlation', 'Label']

# Dataset paths for normal and glaucoma images
normal_dataset_path = 'Combination2/Normal/'
glaucoma_dataset_path = 'Combination2/Glaucoma/'
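
The choice of angle unit matters for the GLCM offset. The sketch below assumes the pixel offset is obtained by rounding `distance * sin(angle)` for rows and `distance * cos(angle)` for columns, which is my understanding of how scikit-image derives it; the exact sign and axis convention should be checked against the installed version. The helper `glcm_offset` is hypothetical and exists only for this illustration.

```python
import math

def glcm_offset(distance, angle):
    """Approximate (row, column) pixel offset for a distance and an angle in radians."""
    return round(math.sin(angle) * distance), round(math.cos(angle) * distance)

# teta = 90 is read as 90 radians, which yields an oblique offset
print(glcm_offset(10, 90))

# 90 degrees expressed in radians yields a purely vertical offset
print(glcm_offset(10, math.pi / 2))
```

Under this assumption, the two calls give different neighbor pairs, so features computed with `teta = 90` describe a different direction than a 90-degree GLCM would.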

## Feature Extraction for Normal Eye Images

This section extracts texture features from normal eye images using the Gray Level Co-occurrence Matrix (GLCM). The extracted features include contrast, homogeneity, energy, and correlation, which are crucial for characterizing the texture of the images, aiding in glaucoma detection.

In [7]:
import os
import cv2
# graycomatrix/graycoprops are the current names of the former greycomatrix/greycoprops
from skimage.feature import graycomatrix, graycoprops

# List all files in the normal dataset directory
allfiles = os.listdir(normal_dataset_path)

# Initialize feature lists and label
for file in allfiles:
    contrast = []
    homogeneity = []
    energy = []
    correlation = []
    label = 0  # Label for normal images

    # Load and preprocess the image
    image = cv2.imread(normal_dataset_path + str(file))
    img = preprocessingImage(image)

    # Calculate the GLCM
    glcm = graycomatrix(img, [distance], [teta], levels=256, symmetric=True, normed=True)

    # Extract features from GLCM
    contrast.append(glcm_feature(glcm, 'contrast'))
    homogeneity.append(glcm_feature(glcm, 'homogeneity'))
    energy.append(glcm_feature(glcm, 'energy'))
    correlation.append(glcm_feature(glcm, 'correlation'))

    # Store extracted features and label in the data array
    data_eye[0, count] = contrast[0]
    data_eye[1, count] = homogeneity[0]
    data_eye[2, count] = energy[0]
    data_eye[3, count] = correlation[0]
    data_eye[4, count] = label

    # Increment the count
    count += 1

## `data_eye` Array Structure

The `data_eye` array is used to store extracted texture features and their corresponding labels for eye images. The array is structured as follows:

- **Shape**: `data_eye` is a 2D array with dimensions `(5, 4000)`: five rows and one preallocated column per image. Columns beyond the number of images actually processed stay at zero.

- **Row Descriptions**:
  1. **Row 0**: Contrast values extracted from the GLCM.
  2. **Row 1**: Homogeneity values extracted from the GLCM.
  3. **Row 2**: Energy values extracted from the GLCM.
  4. **Row 3**: Correlation values extracted from the GLCM.
  5. **Row 4**: Labels (`0` for normal images, `1` for glaucoma images).


- **Usage**: 
  - Each column in the array corresponds to a single eye image, with features and labels stored sequentially.
  - This structured format facilitates further analysis and model training for glaucoma detection.
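
Because the array is preallocated, unused columns are indistinguishable from an image whose features and label are all zero. One way to avoid carrying them into later steps, sketched here on a small hypothetical array rather than the real dataset, is to slice the array down to the running `count` once extraction is finished.

```python
import numpy as np

capacity = 6                      # hypothetical capacity; the notebook uses 4000
data = np.zeros((5, capacity))    # 4 feature rows + 1 label row
count = 0

# Two made-up samples: (contrast, homogeneity, energy, correlation), label
samples = [((1.5, 0.4, 0.1, 0.9), 0), ((2.5, 0.3, 0.2, 0.8), 1)]
for feature_values, label in samples:
    data[:4, count] = feature_values
    data[4, count] = label
    count += 1

# Keep only the columns that were actually filled
filled = data[:, :count]
print(data.shape, filled.shape)
print("label mean with padding:", data[4].mean())
print("label mean without padding:", filled[4].mean())
```

The two label means differ because the padded columns count as extra label-0 samples.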

In [9]:
data_eye

array([[1.29885616e+03, 1.43839006e+03, 1.42922065e+03, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [4.22618168e-01, 4.48600292e-01, 4.41891600e-01, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [1.12158366e-01, 1.21158411e-01, 1.24277141e-01, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [9.50896330e-01, 9.48069288e-01, 9.49321727e-01, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])

## Feature Extraction for Glaucoma Eye Images

This code snippet extracts texture features from glaucoma eye images using the Gray Level Co-occurrence Matrix (GLCM) method.

In [11]:
# List all files in the glaucoma dataset directory
allfiles = os.listdir(glaucoma_dataset_path)

# Initialize feature lists and label for glaucoma images
for file in allfiles:
    contrast = []  # List to store contrast values
    homogeneity = []  # List to store homogeneity values
    energy = []  # List to store energy values
    correlation = []  # List to store correlation values
    label = 1  # Label for glaucoma images

    # Load and preprocess the image
    image = cv2.imread(glaucoma_dataset_path + str(file))
    img = preprocessingImage(image)

    # Calculate the GLCM
    glcm = graycomatrix(img, [distance], [teta], levels=256, symmetric=True, normed=True)

    # Extract features from GLCM
    contrast.append(glcm_feature(glcm, 'contrast'))
    homogeneity.append(glcm_feature(glcm, 'homogeneity'))
    energy.append(glcm_feature(glcm, 'energy'))
    correlation.append(glcm_feature(glcm, 'correlation'))

    # Store extracted features and label in the data array
    data_eye[0, count] = contrast[0]
    data_eye[1, count] = homogeneity[0]
    data_eye[2, count] = energy[0]
    data_eye[3, count] = correlation[0]
    data_eye[4, count] = label

    # Increment the count for the next image
    count += 1


In [13]:
data_eye

array([[1.29885616e+03, 1.43839006e+03, 1.42922065e+03, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [4.22618168e-01, 4.48600292e-01, 4.41891600e-01, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [1.12158366e-01, 1.21158411e-01, 1.24277141e-01, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [9.50896330e-01, 9.48069288e-01, 9.49321727e-01, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])

## Creating a Pandas DataFrame from Extracted Features

This code snippet initializes a Pandas DataFrame using the extracted texture features stored in the `data_eye` array.

In [15]:
import pandas as pd  # Import the pandas library for data manipulation

# Create a DataFrame from the transposed data_eye array, with columns specified by indextable
df = pd.DataFrame(np.transpose(data_eye), columns=indextable)

## Generating Descriptive Statistics for the DataFrame

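The `df.describe()` method provides a summary of statistics for the DataFrame `df`, which contains the extracted texture features from the `data_eye` array. This method is particularly useful for understanding the distribution and central tendencies of the features. Because `data_eye` was preallocated with 4000 columns, the counts below include any columns left unfilled; these appear as all-zero rows and pull the quartiles toward zero.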


In [17]:
# Generate and display descriptive statistics for the DataFrame
df.describe()


| | contrast | homogeneity | energy | correlation | Label |
|---|---|---|---|---|---|
| count | 4000.0 | 4000.0 | 4000.0 | 4000.0 | 4000.0 |
| mean | 283.397271 | 0.154877 | 0.052399 | 0.406527 | 0.1335 |
| std | 473.824334 | 0.185359 | 0.067782 | 0.459581 | 0.340157 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 326.797902 | 0.308051 | 0.101121 | 0.920349 | 0.0 |
| max | 1754.074124 | 0.885988 | 0.420671 | 0.99806 | 1.0 |


## Scaling the Features with MinMaxScaler

The `MinMaxScaler()` is used to scale the features of the DataFrame `df` to a range between 0 and 1. The DataFrame contains various features, and the label column is excluded from scaling. This scaling process ensures that the model treats all features equally, regardless of their original range or magnitude.
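
Min-max scaling maps each column to `(x - min) / (max - min)`. The following NumPy sketch shows that arithmetic on a small made-up matrix; `min_max_scale` is a hypothetical helper written for illustration and is not how the notebook performs the scaling, which uses scikit-learn's `MinMaxScaler`.

```python
import numpy as np

def min_max_scale(X):
    """Scale every column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # constant columns are mapped to 0
    return (X - col_min) / col_range

# Made-up values on very different scales (not taken from the dataset)
raw = np.array([[1200.0, 0.40],
                [1500.0, 0.50],
                [   0.0, 0.00]])

scaled = min_max_scale(raw)
print(scaled)
```

After scaling, a large-valued column such as contrast and a small-valued column such as homogeneity occupy the same numeric range.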


In [19]:

# Import MinMaxScaler from sklearn
from sklearn.preprocessing import MinMaxScaler

# Drop the 'Label' column from the DataFrame to get features
features = df.drop(['Label'], axis='columns')

# Initialize MinMaxScaler
features_scaler = MinMaxScaler()

# Fit and transform the features using MinMaxScaler
features = features_scaler.fit_transform(features)
features

array([[0.74047963, 0.47700227, 0.26661783, 0.95274493],
       [0.82002809, 0.50632787, 0.28801233, 0.9499124 ],
       [0.8148006 , 0.49875588, 0.29542603, 0.95116727],
       ...,
       [0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        ]])

## Data Normalization

A copy of the DataFrame `df` is created for normalization. The feature columns (`contrast`, `homogeneity`, `energy`, `correlation`) are overwritten with the scaled features.

In [21]:
# Create a copy of the original DataFrame for normalization
data_normalization = df.copy()

# Overwrite the feature columns with their scaled values
data_normalization[['contrast', 'homogeneity', 'energy', 'correlation']] = features
data_normalization

| | contrast | homogeneity | energy | correlation | Label |
|---|---|---|---|---|---|
| 0 | 0.740480 | 0.477002 | 0.266618 | 0.952745 | 0.0 |
| 1 | 0.820028 | 0.506328 | 0.288012 | 0.949912 | 0.0 |
| 2 | 0.814801 | 0.498756 | 0.295426 | 0.951167 | 0.0 |
| 3 | 0.800124 | 0.391291 | 0.223333 | 0.943737 | 0.0 |
| 4 | 0.783606 | 0.435871 | 0.271969 | 0.950147 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 3995 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
| 3996 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
| 3997 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
| 3998 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 |


## Generating Descriptive Statistics for Normalized Data

The `data_normalization.describe()` method provides a summary of statistics for the normalized DataFrame `data_normalization`. This method is useful for examining the distribution, central tendencies, and variability of the normalized feature values.

In [23]:
# Generate descriptive statistics for the normalized DataFrame
data_normalization.describe()

| | contrast | homogeneity | energy | correlation | Label |
|---|---|---|---|---|---|
| count | 4000.0 | 4000.0 | 4000.0 | 4000.0 | 4000.0 |
| mean | 0.161565 | 0.174807 | 0.124561 | 0.407317 | 0.1335 |
| std | 0.270128 | 0.209211 | 0.161129 | 0.460474 | 0.340157 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 0.186308 | 0.347692 | 0.240381 | 0.922138 | 0.0 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


## Splitting Features and Labels

In the context of preparing data for machine learning, it's essential to separate the features from the target labels. 

- **Features (`x`)**: The variable `x` is created by dropping the 'Label' column from the `data_normalization` DataFrame. This includes all the normalized features (e.g., contrast, homogeneity, energy, correlation) that the model will use to make predictions.

- **Target Labels (`y`)**: The variable `y` is set to the 'Label' column of the `data_normalization` DataFrame. This column represents the target variable that the model will learn to predict based on the features.

Separating the two lets the model learn the relationship between the features and the corresponding labels during training, and lets the predictions be scored against the known labels afterwards.


In [25]:
# Creating a feature set by dropping the 'Label' column from the DataFrame
x = data_normalization.drop(['Label'], axis='columns')

# Assign the 'Label' column to the target variable
y = data_normalization.Label

## Linear Regression Model with Classification

#### 1. Data Splitting
The dataset is split into training (80%) and testing (20%) using `train_test_split`. 
- `X_train`, `X_test`: Features.
- `y_train`, `y_test`: Target labels.

#### 2. Model Training
A `LinearRegression` model is created and trained on `X_train` and `y_train`.

#### 3. Predictions
The model predicts continuous outputs on `X_test`.

#### 4. Regression Metrics
- **MSE (Mean Squared Error)**: Measures the average squared error.
- **RMSE (Root Mean Squared Error)**: Provides the error in the same units as the target.

#### 5. Classification
The continuous outputs are converted into binary classes using a threshold of 0.5.

#### 6. Classification Metrics
- **Accuracy**: Overall correctness of predictions.
- **Precision**: Ratio of true positives to total predicted positives.
- **Recall**: Ratio of true positives to actual positives.
- **F1-Score**: Balance between precision and recall.

This gives insight into the model's performance for both regression and classification.
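
The six steps above can be sketched without scikit-learn to show what each one computes. The example uses synthetic data and ordinary least squares through `np.linalg.lstsq`; the data, the hold-out split by position, and the helper `add_intercept` are assumptions made for illustration. The metrics here are computed for the positive class only, whereas the notebook cell that follows uses weighted averages.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                    # synthetic stand-in features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic binary labels

# 1. Hold out the last 20% of the rows as a test set
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

def add_intercept(features):
    """Prepend a column of ones so the fit includes a bias term."""
    return np.column_stack([np.ones(len(features)), features])

# 2-3. Fit by ordinary least squares, then predict continuous outputs
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)
y_pred = add_intercept(X_test) @ coef

# 4. Regression metrics
mse = np.mean((y_test - y_pred) ** 2)
rmse = np.sqrt(mse)

# 5. Threshold the continuous outputs at 0.5
y_pred_class = (y_pred >= 0.5).astype(int)

# 6. Classification metrics for the positive class
tp = int(np.sum((y_pred_class == 1) & (y_test == 1)))
fp = int(np.sum((y_pred_class == 1) & (y_test == 0)))
fn = int(np.sum((y_pred_class == 0) & (y_test == 1)))
accuracy = float(np.mean(y_pred_class == y_test))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"MSE={mse:.3f} RMSE={rmse:.3f}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```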


In [27]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score, precision_score, recall_score, f1_score
import numpy as np
from sklearn.linear_model import LinearRegression


# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set (continuous output)
y_pred = model.predict(X_test)

# For regression tasks:
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

print(f'Mean Squared Error (MSE): {mse}')
print(f'Root Mean Squared Error (RMSE): {rmse}')

# For classification (convert regression output to binary class labels):
threshold = 0.5
y_pred_class = np.where(y_pred >= threshold, 1, 0)

# Calculate classification metrics with zero_division=1 to suppress warnings
accuracy = accuracy_score(y_test, y_pred_class)
precision = precision_score(y_test, y_pred_class, average='weighted', zero_division=1)
recall = recall_score(y_test, y_pred_class, average='weighted', zero_division=1)
f1 = f1_score(y_test, y_pred_class, average='weighted', zero_division=1)

print(f'Accuracy: {accuracy}')
print(f'Precision (weighted): {precision}')
print(f'Recall (weighted): {recall}')
print(f'F1-Score (weighted): {f1}')


Mean Squared Error (MSE): 0.10047023227282176
Root Mean Squared Error (RMSE): 0.31697039652437853
Accuracy: 0.85125
Precision (weighted): 0.7765192388122125
Recall (weighted): 0.85125
F1-Score (weighted): 0.7863460625059371
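
When classes are imbalanced, accuracy is easier to interpret next to the score of a trivial classifier that always predicts the majority class. The sketch below rebuilds a label list with the same proportions as the summary statistics shown earlier (a `Label` mean of 0.1335 over 4000 rows, i.e. 534 positives). It describes the full table, not the 20% test split, whose exact class balance is not shown, so the comparison is indicative only. The function name is hypothetical.

```python
def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return max(positives, negatives) / len(labels)

# 0.1335 * 4000 = 534 rows labeled 1, the remaining 3466 labeled 0
labels = [1] * 534 + [0] * 3466
print(majority_baseline_accuracy(labels))
```

A model whose accuracy is close to this baseline may be learning little beyond the class frequencies, so per-class precision and recall are worth inspecting as well.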


# HOG (Histogram of Oriented Gradients) Feature Extraction


HOG is a feature descriptor used in computer vision for object detection and recognition. It captures the following key features:

1. **Gradient Magnitude**: Represents the strength of edges in the image.

2. **Gradient Orientation**: Indicates the direction of edges, providing shape information.

3. **Histograms of Gradients**: For each cell (e.g., 8x8 pixels), a histogram is created to represent the distribution of gradient orientations.

4. **Normalized Histograms**: Histograms are normalized to improve robustness against lighting variations.

5. **Spatial Arrangement**: Captures the spatial distribution of features within blocks of cells, aiding in pattern recognition.

6. **Multi-Scale Representation**: HOG can be computed at different scales to detect objects of various sizes.

7. **Concatenated Feature Vector**: The final output is a high-dimensional vector combining all normalized histograms.
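
Items 1 to 4 can be illustrated for a single cell with NumPy. This is a simplified sketch: it uses `np.gradient` for the derivatives, assigns each pixel to exactly one orientation bin, and normalizes the cell on its own, whereas full HOG implementations typically interpolate between bins and normalize over blocks of cells. The function name and the toy cell are assumptions.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = cell.astype(float)
    gy, gx = np.gradient(cell)                    # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)                  # edge strength
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned angle in [0, 180)
    bin_width = 180.0 / n_bins
    bin_index = np.minimum((orientation // bin_width).astype(int), n_bins - 1)
    hist = np.bincount(bin_index.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist      # L2 normalization

# Toy 8x8 cell whose intensity increases from left to right
cell = np.tile(np.arange(8), (8, 1))
hist = cell_orientation_histogram(cell)
print(hist)
```

For this left-to-right ramp every gradient points along the horizontal axis, so all of the weight falls into the first orientation bin.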



In [30]:
import os
import cv2
from sklearn.decomposition import PCA
import csv
import numpy as np

def extract_features(image_path):
    # Create a HOG descriptor object
    hog = cv2.HOGDescriptor()
    
    # Read the image from the given path
    img = cv2.imread(image_path)
    
    # Resize the image to (64, 128) pixels for HOG feature extraction
    resized = cv2.resize(img, (64, 128), interpolation=cv2.INTER_AREA)
    
    # Compute the HOG features for the resized image
    h = hog.compute(resized)
    
    # Return the HOG features as a flat 1-D vector (compute() may return a
    # column vector depending on the OpenCV version)
    return h.flatten()

def apply_pca(features, num_components):
    # Apply PCA to reduce the dimensionality of the features
    pca = PCA(n_components=num_components)
    
    # Fit and transform the features using PCA
    reduced_features = pca.fit_transform(features)
    
    # Return the reduced feature set
    return reduced_features

def write_to_csv(features, categories, filename):
    # Prepare data to write to a CSV file
    csv_data = []
    for id, line in enumerate(features):
        new_img = line.tolist()  # Convert numpy array to list
        new_img.insert(0, categories[id])  # Insert the corresponding category label
        csv_data.append(new_img)  # Append the data for this image
    
    # Write the data to the specified CSV file
    with open(filename, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows(csv_data)  # Write all rows to the CSV file

def main():
    num_fea = 100  # Number of PCA components to retain
    category = []  # List to store category labels
    hog_array = []  # List to store HOG features

    # Iterate through folders in the training dataset directory
    for folder in os.listdir("split_combination/train"):
        # Skip known entries that are not class folders
        if folder == "my_model2.h5" or folder == "feature_hog":
            continue
            
        # Iterate through files in the current category folder
        for filename in os.listdir(os.path.join("split_combination/train", folder)):
            # Check if the file is an image (png or jpg)
            if filename[-3:] in ["png", "jpg"]:
                image_path = os.path.join("split_combination/train", folder, filename)
                
                # Extract HOG features from the image
                hog_image = extract_features(image_path)
                
                # Append the extracted HOG features to the list
                hog_array.append(hog_image)
                
                # Append the corresponding category label (1 for Glaucoma, 0 for Normal)
                if folder == "Glaucoma":
                    category.append(1)
                elif folder == "Normal":
                    category.append(0)

    # Convert the list of HOG features to a numpy array
    hog_array_np = np.array(hog_array)
    
    # Reshape the HOG array to 2D (one row per image) for PCA processing
    reshaped_hog_array = hog_array_np.reshape(hog_array_np.shape[0], -1)
    
    # Apply PCA to reduce the dimensionality of the HOG features
    reduced_features = apply_pca(reshaped_hog_array, num_fea)
    
    # Write the reduced features and corresponding categories to a CSV file
    write_to_csv(reduced_features, category, 'split_combination/extracted_features.csv')

    print("Done Extracting Features")  # Indicate completion of feature extraction

if __name__ == "__main__":
    main()  # Run the main function


Done Extracting Features
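
The length of the HOG vector produced for each image follows from the descriptor geometry. The sketch below assumes the default `cv2.HOGDescriptor()` parameters (64×128 window, 16×16 blocks, 8×8 block stride, 8×8 cells, 9 bins); the helper function is hypothetical and only reproduces the arithmetic.

```python
def hog_descriptor_length(win=(64, 128), block=(16, 16), stride=(8, 8),
                          cell=(8, 8), nbins=9):
    """Number of values in a HOG descriptor for the given geometry."""
    blocks_x = (win[0] - block[0]) // stride[0] + 1    # block positions across
    blocks_y = (win[1] - block[1]) // stride[1] + 1    # block positions down
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])
    return blocks_x * blocks_y * cells_per_block * nbins

print(hog_descriptor_length())  # 7 * 15 * 4 * 9 = 3780
```

This is why PCA is applied afterwards: each image starts with several thousand HOG values, which are reduced to `num_fea = 100` components.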


## Feature Extraction Analysis with Pandas

This Python script utilizes the Pandas library to analyze the features extracted from images of glaucoma and normal classes.

In [32]:
import pandas as pd

data = pd.read_csv("split_combination/extracted_features.csv", header=None)

# Print descriptive statistics
print(data.describe())

# Optional: specify column names
#data = pd.read_csv("/Users/admin/images1/extracted_features.csv", header=None, names=["feature1", "feature2", ...])

# Optional: handle missing values
# data.fillna(data.mean(), inplace=True)

               0             1             2             3             4    \
count  1230.000000  1.230000e+03  1.230000e+03  1.230000e+03  1.230000e+03   
mean      0.304065  2.665456e-07 -7.624276e-08  1.544744e-07  3.441660e-08   
std       0.460197  1.101019e+00  9.826329e-01  9.073421e-01  7.144700e-01   
min       0.000000 -2.427125e+00 -2.103099e+00 -2.507950e+00 -1.195083e+00   
25%       0.000000 -8.274099e-01 -9.001857e-01 -6.544465e-01 -4.344090e-01   
50%       0.000000 -7.740784e-02  7.693342e-02 -1.641440e-01 -1.253433e-01   
75%       1.000000  8.024688e-01  8.716102e-01  5.125992e-01  2.137835e-01   
max       1.000000  2.749097e+00  2.661562e+00  3.445665e+00  3.145810e+00   

                5             6             7             8             9    \
count  1.230000e+03  1.230000e+03  1.230000e+03  1.230000e+03  1.230000e+03   
mean   1.167220e-08  5.359532e-08 -1.609111e-07 -2.230586e-08  6.449627e-08   
std    5.986914e-01  5.337544e-01  5.130715e-01  4.910246e-0
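
The near-zero means and steadily decreasing standard deviations in the summary above are what PCA output is expected to look like: the data are centered before projection, and components are ordered by explained variance. A minimal NumPy sketch of that behavior, using the singular value decomposition on synthetic data, is shown below. It is an illustration under those assumptions and not the scikit-learn implementation used in `apply_pca`.

```python
import numpy as np

def pca_reduce(features, num_components):
    """Project features onto their leading principal components."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)          # remove column means
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:num_components].T              # scores on top components

rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 10)) + 5.0     # synthetic data with a non-zero mean
reduced = pca_reduce(toy, 3)

print(reduced.shape)
print(reduced.mean(axis=0))   # approximately zero for every component
print(reduced.std(axis=0))    # non-increasing from first to last component
```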

## Data Preparation for Model Training

This step separates the feature set (X) from the target variable (Y) after loading the dataset into a Pandas DataFrame.


- \( X \) represents the feature set, which is created by selecting all rows from the DataFrame (`data`) and the columns indexed from 1 to 100, i.e. the 100 PCA components. The first column (index 0) is excluded because `write_to_csv` stores the category label there.

- \( Y \) represents the target variable, which is the first column of the DataFrame. This column contains the labels (`1` for Glaucoma, `0` for Normal) that the model will learn to predict based on the features in \( X \).

In [34]:
X = data.iloc[:,range(1,101)]
Y = data.iloc[:,0]

## Model Training and Evaluation

This code trains a Linear Regression model and evaluates its performance, using the feature set \( X \) and target variable \( Y \) defined in the previous cell. The dataset is split into training (70%) and testing (30%) sets using `train_test_split`.

A `LinearRegression` model is then initialized and trained on the training set. Predictions are made on the test set, resulting in continuous output values, which are converted to binary class labels using a threshold of 0.5.

The code calculates accuracy, precision, recall, and F1 score for the thresholded predictions, along with mean squared error (MSE) and R-squared (R²) for the continuous predictions. It prints the classification metrics and the MSE; R² is computed but not printed.
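
For reference, the two regression metrics can be written out directly: MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below uses four made-up predictions; `mse_and_r2` is a hypothetical helper, not a function from the notebook.

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Mean squared error and coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    return ss_res / len(y_true), 1.0 - ss_res / ss_tot

# Made-up binary targets and continuous predictions
mse, r2 = mse_and_r2([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9])
print(f"MSE: {mse:.3f}, R2: {r2:.2f}")
```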

In [36]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, mean_squared_error, r2_score

# X and Y were defined in the previous cell, so nothing is loaded here

# Split the dataset into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

# Initialize and train the Linear Regression model
regressor = LinearRegression()
regressor.fit(X_train, Y_train)

# Make predictions on the test set
Y_pred_continuous = regressor.predict(X_test)

# Convert continuous predictions to binary class labels (using a threshold, e.g., 0.5)
threshold = 0.5
Y_pred_class = [1 if pred >= threshold else 0 for pred in Y_pred_continuous]

# Calculate classification metrics
accuracy = accuracy_score(Y_test, Y_pred_class)
precision = precision_score(Y_test, Y_pred_class)
recall = recall_score(Y_test, Y_pred_class)
f1 = f1_score(Y_test, Y_pred_class)

# Calculate regression metrics
mse = mean_squared_error(Y_test, Y_pred_continuous)
r2 = r2_score(Y_test, Y_pred_continuous)

# Print the classification results
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")

# Print the regression metrics
print(f"Mean Squared Error: {mse:.2f}")



Accuracy: 0.68
Precision: 0.47
Recall: 0.21
F1 Score: 0.29
Mean Squared Error: 0.22
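
The F1 score is the harmonic mean of precision and recall, so the reported value can be cross-checked from the other two. The sketch below does this with the rounded figures printed above; because the inputs are already rounded to two decimals, agreement is only expected to that precision. The helper name is an assumption.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded precision and recall from the output above
print(round(f1_from(0.47, 0.21), 2))  # 0.29
```

The low recall dominates the harmonic mean, which is why F1 sits much closer to 0.21 than to 0.47.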
