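The cell below reads the data from the parkinsons.csv file in your Dataset directory and separates the input features (features) from the output class (labels, taken from the status column). Note that we also drop the name column since it is not a useful feature for predicting Parkinson’s disease. We then create a MinMaxScaler object from scikit-learn, fit it on the input features, and use it to transform them, resulting in a normalized version of the data (features_normalized). Finally, the normalized data is split into training, validation, and test sets (roughly 70%, 15%, and 15%), and each split is saved as a .npy file in the Pre_Processed_Data directory. That directory must already exist, because np.save does not create it.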
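
To make the normalization step concrete, here is a minimal sketch of what MinMaxScaler computes with its default feature_range of (0, 1): each column is rescaled as (x - min) / (max - min). The toy array is made up for illustration and is not taken from parkinsons.csv.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix (made-up values, not from parkinsons.csv)
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [4.0, 40.0]])

# Manual min-max scaling, computed independently for each column
col_min = toy.min(axis=0)
col_max = toy.max(axis=0)
manual = (toy - col_min) / (col_max - col_min)

# MinMaxScaler with default settings should produce the same values
scaled = MinMaxScaler().fit_transform(toy)

print(np.allclose(manual, scaled))
```

After scaling, the smallest value in each column maps to 0 and the largest to 1.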

In [2]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import numpy as np
from sklearn.model_selection import train_test_split

# Read in the data from the CSV file
data = pd.read_csv('Dataset/parkinsons.csv')

# Separate the input features and the output class
features = data.drop(['name', 'status'], axis=1)
labels = data['status']

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Fit the scaler on the input features
scaler.fit(features)

# Transform the input features using the scaler
features_normalized = scaler.transform(features)

# Split the data into training (70%), validation (15%), and test (15%) sets:
# first hold out 30% of the data, then split that hold-out set in half
train_features, holdout_features, train_labels, holdout_labels = train_test_split(
    features_normalized, labels, test_size=0.3, random_state=42)
val_features, test_features, val_labels, test_labels = train_test_split(
    holdout_features, holdout_labels, test_size=0.5, random_state=42)

# Save the pre-processed data to files
np.save('Pre_Processed_Data/train_features.npy', train_features)
np.save('Pre_Processed_Data/train_labels.npy', train_labels)
np.save('Pre_Processed_Data/val_features.npy', val_features)
np.save('Pre_Processed_Data/val_labels.npy', val_labels)
np.save('Pre_Processed_Data/test_features.npy', test_features)
np.save('Pre_Processed_Data/test_labels.npy', test_labels)
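
The cell above fits the scaler on the full dataset before splitting, so the minimum and maximum of the validation and test rows influence the scaling. A common alternative is to split first and fit the scaler on the training split only. The sketch below illustrates that ordering on synthetic data. The arrays X and y and the names with an _s suffix are hypothetical stand-ins, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real features and labels (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

# Split first: 70% train, then halve the remaining 30% into validation and test
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=42)

# Fit the scaler on the training split only, then reuse it for the other splits
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_val_s.shape, X_test_s.shape)
```

With this ordering, validation and test values can fall slightly outside the [0, 1] range, since the scaler only saw the training minimum and maximum.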