## Data Exploration and Preprocessing:

Begin by thoroughly exploring your dataset, checking for missing values, outliers, and any inconsistencies in the data.
Perform feature engineering if necessary, creating new features or transforming existing ones to better represent the underlying patterns in the data.
Split your data into training and testing sets so that performance is measured on data the model has not seen during training.
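
The three steps above can be sketched with pandas alone. This is a minimal illustration, not the notebook's own code: the `toy` DataFrame, its values, and the 80/20 ratio are assumptions chosen for the demo.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Age": rng.integers(18, 60, size=200),
    "HeartRate": rng.integers(60, 100, size=200).astype(float),
    "Seizure": rng.choice(["No", "Yes"], size=200, p=[0.8, 0.2]),
})
toy.loc[[3, 17], "HeartRate"] = np.nan  # inject two missing values for the demo

# 1. Count missing values per column
missing = toy.isna().sum()
print(missing)

# 2. Flag outliers in one column with the 1.5 * IQR rule
q1, q3 = toy["Age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = toy[(toy["Age"] < q1 - 1.5 * iqr) | (toy["Age"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} outlier rows in Age")

# 3. Stratified split: sample 20% of each class for testing, keep the rest for training
test = toy.groupby("Seizure").sample(frac=0.2, random_state=0)
train = toy.drop(test.index)
print(len(train), len(test))
```

Sampling within each class keeps the share of `Yes` and `No` rows roughly equal in both sets, which matters when one class is rare.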

## Model Selection:

Start by comparing several machine learning models suited to binary classification: Logistic Regression, Decision Trees, Support Vector Machines (SVM), and ensemble methods such as Random Forests and Gradient Boosting Machines (GBM).
Ensemble methods combine many trees and often improve predictive performance on tabular data, at some cost to interpretability.
Deep learning models are worth exploring only if the data is sequential, for example repeated readings per patient over time. In that case, Recurrent Neural Networks (RNNs) or 1D Convolutional Neural Networks (CNNs) can capture temporal patterns.
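
One way to compare the candidate models is to score each with the same cross-validation folds. The sketch below assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in for the real features; the model settings are illustrative defaults, not tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption): 400 rows, 6 features, about 20% positives
X, y = make_classification(n_samples=400, n_features=6, weights=[0.8, 0.2],
                           random_state=0)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# Mean ROC-AUC over 5 folds for each candidate
cv_auc = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

for name, score in sorted(cv_auc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>20s}  ROC-AUC = {score:.3f}")
```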

## Model Training and Evaluation:

Train your chosen models on the training dataset and evaluate them with metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. When one class is much rarer than the other, accuracy alone can be misleading, so give more weight to precision, recall, and ROC-AUC.
Tune hyperparameters with grid search or random search, scoring each candidate by cross-validation on the training set so that the test set stays untouched.
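
As a rough sketch of tuning and evaluation together, the example below runs a small grid search and then reports the metrics listed above on a held-out split. It assumes scikit-learn; the synthetic data, the Random Forest choice, and the parameter grid are placeholders for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption), split 80/20 with stratification
X, y = make_classification(n_samples=400, n_features=6, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Grid search with 5-fold cross-validation on the training set only
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)

# Evaluate the refitted best model on the held-out test set
best = search.best_estimator_
y_pred = best.predict(X_test)
y_score = best.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name:>9s}: {value:.3f}")
```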

## Model Interpretation and Validation:

After selecting the best-performing model, evaluate it once on the held-out testing dataset to check that it generalizes to unseen data.
Interpret the model's predictions and identify the most influential features contributing to seizure prediction. This step is crucial for understanding the model's decision-making process and gaining insights into the underlying relationships in the data.
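
One model-agnostic way to rank influential features is permutation importance: shuffle one feature at a time and measure how much the test score drops. The sketch below assumes scikit-learn; the synthetic data and the feature names `f0` to `f4` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption) with hypothetical feature names
feature_names = ["f0", "f1", "f2", "f3", "f4"]
X, y = make_classification(n_samples=400, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature 10 times and record the mean drop in ROC-AUC
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                scoring="roc_auc", random_state=0)

ranking = sorted(zip(feature_names, result.importances_mean),
                 key=lambda pair: pair[1], reverse=True)
for name, drop in ranking:
    print(f"{name}: mean ROC-AUC drop = {drop:.3f}")
```

Computing the importances on the test split rather than the training split shows which features matter for generalization, not just for fitting.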

## Deployment and Monitoring:

Once you're satisfied with your model's performance, deploy it in a real-world setting where it can assist in predicting seizures.
Implement a monitoring system to track the model's performance over time and ensure its continued accuracy and reliability. Regularly update the model as new data becomes available or as the underlying patterns in the data change.
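
A very simple form of monitoring is to log each prediction alongside the outcome once it is known, then check accuracy per time window. The sketch below is one possible shape for that check: the function name `flag_degraded_windows`, the column names, the 7-day window, and the 0.8 threshold are all assumptions.

```python
import pandas as pd

def flag_degraded_windows(log, window="7D", min_accuracy=0.8):
    """Return the time windows whose accuracy falls below min_accuracy.

    `log` needs a DatetimeIndex and the columns 'y_true' and 'y_pred'.
    """
    correct = (log["y_true"] == log["y_pred"]).astype(float)
    accuracy = correct.resample(window).mean()
    return accuracy[accuracy < min_accuracy]

# Hypothetical prediction log: 28 daily records, all correct for the first
# 14 days, then alternating wrong and right predictions
index = pd.date_range("2024-01-01", periods=28, freq="D")
y_true = [1] * 28
y_pred = [1] * 14 + [0, 1] * 7
log = pd.DataFrame({"y_true": y_true, "y_pred": y_pred}, index=index)

flagged = flag_degraded_windows(log)
print(flagged)
```

A window that is flagged repeatedly is a reasonable trigger for retraining on more recent data.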

In [21]:
import pandas as pd
import numpy as np

In [22]:
df = pd.read_csv("mat.csv")

In [23]:
df.head()

Unnamed: 0,Age,SystolicBP,DiastolicBP,BS,BodyTemp,HeartRate,RiskLevel
0,25,130,80,15.0,98.0,86,high risk
1,35,140,90,13.0,98.0,70,high risk
2,29,90,70,8.0,100.0,80,high risk
3,30,140,85,7.0,98.0,70,high risk
4,35,120,60,6.1,98.0,76,low risk


In [27]:
# Convert BodyTemp from Fahrenheit to Celsius, rounded to 1 decimal place.
# Run this cell only once: after the rename below, 'BodyTemp' no longer exists.
df['BodyTemp'] = ((df['BodyTemp'] - 32) * 5 / 9).round(1)

# Rename the column to make the unit explicit
df.rename(columns={'BodyTemp': 'BodyTemp_Celsius'}, inplace=True)

# Save the converted data to a new file; the source file mat.csv is left unchanged
df.to_csv('original_dataset.csv', index=False)


In [29]:
df.head(30)

Unnamed: 0,Age,SystolicBP,DiastolicBP,BS,BodyTemp_Celsius,HeartRate,RiskLevel
0,25,130,80,15.0,36.7,86,high risk
1,35,140,90,13.0,36.7,70,high risk
2,29,90,70,8.0,37.8,80,high risk
3,30,140,85,7.0,36.7,70,high risk
4,35,120,60,6.1,36.7,76,low risk
5,23,140,80,7.01,36.7,70,high risk
6,23,130,70,7.01,36.7,78,mid risk
7,35,85,60,11.0,38.9,86,high risk
8,32,120,90,6.9,36.7,70,mid risk
9,42,130,80,18.0,36.7,70,high risk


In [30]:
body_occ = df['BodyTemp_Celsius'].value_counts()
print(body_occ)

BodyTemp_Celsius
36.7    804
38.3     98
38.9     66
37.8     20
39.4     13
37.2     10
36.9      2
37.0      1
Name: count, dtype: int64


In [31]:
# Derive a 'Seizure' label from body temperature alone:
# readings of 36.7 or 36.9 °C are labelled 'No', every other reading 'Yes'
df['Seizure'] = np.where(df['BodyTemp_Celsius'].isin([36.7, 36.9]), 'No', 'Yes')

# Display the updated DataFrame
print(df)


      Age  SystolicBP  DiastolicBP    BS  BodyTemp_Celsius  HeartRate  \
0      25         130           80  15.0              36.7         86   
1      35         140           90  13.0              36.7         70   
2      29          90           70   8.0              37.8         80   
3      30         140           85   7.0              36.7         70   
4      35         120           60   6.1              36.7         76   
...   ...         ...          ...   ...               ...        ...   
1009   22         120           60  15.0              36.7         80   
1010   55         120           90  18.0              36.7         60   
1011   35          85           60  19.0              36.7         86   
1012   43         120           90  18.0              36.7         70   
1013   32         120           65   6.0              38.3         76   

      RiskLevel Seizure  
0     high risk      No  
1     high risk      No  
2     high risk     Yes  
3     high risk    

In [35]:
df.columns

Index(['Age', 'SystolicBP', 'DiastolicBP', 'BS', 'BodyTemp_Celsius',
       'HeartRate', 'RiskLevel', 'Seizure'],
      dtype='object')

In [36]:


# Remove columns that are not needed for the analysis
df.drop(['SystolicBP', 'DiastolicBP', 'BS'], axis=1, inplace=True)

# Add a column of synthetic timestamps:
# build evenly spaced timestamps over the date range, then sample from
# them with replacement (so duplicate timestamps are possible)
start_date = '2024-01-01'
end_date = '2024-05-16'
num_records = len(df)

random_dates = pd.date_range(start=start_date, end=end_date, periods=num_records)
df['RandomDateTime'] = np.random.choice(random_dates, num_records)

# Display the updated DataFrame
print(df)


      Age  BodyTemp_Celsius  HeartRate  RiskLevel Seizure  \
0      25              36.7         86  high risk      No   
1      35              36.7         70  high risk      No   
2      29              37.8         80  high risk     Yes   
3      30              36.7         70  high risk      No   
4      35              36.7         76   low risk      No   
...   ...               ...        ...        ...     ...   
1009   22              36.7         80  high risk      No   
1010   55              36.7         60  high risk      No   
1011   35              36.7         86  high risk      No   
1012   43              36.7         70  high risk      No   
1013   32              38.3         76   mid risk     Yes   

                    RandomDateTime  
0    2024-02-11 01:57:59.170779861  
1    2024-01-15 08:45:57.749259624  
2    2024-02-21 13:17:28.371174728  
3    2024-03-12 00:29:51.115498519  
4    2024-03-12 13:23:09.536031589  
...                            ...  
1009 202
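
The cell above samples from an evenly spaced grid and is not seeded, so every run produces different values. An alternative sketch is shown below, assuming a seeded NumPy `Generator` and uniform random offsets in whole seconds; the seed, the variable names, and the sample size of 5 are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so reruns give the same timestamps

start = pd.Timestamp("2024-01-01")
end = pd.Timestamp("2024-05-16")
n = 5  # use len(df) for the real data

# Draw a uniform random number of seconds in [0, span) for each record
span_seconds = int((end - start).total_seconds())
offsets = rng.integers(0, span_seconds, size=n)

random_times = start + pd.to_timedelta(offsets, unit="s")
print(random_times)
```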

In [43]:
import pandas as pd
import numpy as np

# Add a Gender column with randomly assigned values
genders = ['Male', 'Female']
df['Gender'] = np.random.choice(genders, size=len(df))

# Add a PatientID column with random six-digit numbers
# (the upper bound is exclusive, and IDs are not guaranteed to be unique)
df['PatientID'] = np.random.randint(100000, 999999, size=len(df))

# Add a DateTime column sampled, with replacement, from evenly spaced
# timestamps over the date range
start_date = '2024-01-01'
end_date = '2024-05-16'
num_records = len(df)

random_dates = pd.date_range(start=start_date, end=end_date, periods=num_records)
df['DateTime'] = np.random.choice(random_dates, num_records)

# Display the updated DataFrame
print(df)
# Uncomment to write the file that is read back in the next cell
# df.to_csv('original_dataset1.csv', index=False)



      Age  BodyTemp_Celsius  HeartRate  RiskLevel Seizure  \
0      25              36.7         86  high risk      No   
1      35              36.7         70  high risk      No   
2      29              37.8         80  high risk     Yes   
3      30              36.7         70  high risk      No   
4      35              36.7         76   low risk      No   
...   ...               ...        ...        ...     ...   
1009   22              36.7         80  high risk      No   
1010   55              36.7         60  high risk      No   
1011   35              36.7         86  high risk      No   
1012   43              36.7         70  high risk      No   
1013   32              38.3         76   mid risk     Yes   

                    RandomDateTime  Gender  PatientID  \
0    2024-02-17 02:57:41.401776900    Male     563246   
1    2024-02-29 17:50:24.284304047  Female     556511   
2    2024-05-14 09:20:04.738400790    Male     196109   
3    2024-03-31 02:02:15.044422507    M
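
`np.random.randint` can return the same number twice, so two rows may share a `PatientID`. If unique IDs are needed, one option is to sample without replacement, as in the sketch below. The seed and variable names are assumptions; 1014 matches the number of rows shown in the output above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
n_patients = 1014  # use len(df) for the real data

# Draw distinct six-digit IDs from 100000..999999 inclusive
patient_ids = rng.choice(np.arange(100000, 1000000), size=n_patients, replace=False)
print(patient_ids[:5])
```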

In [45]:
data1 = pd.read_csv("original_dataset1.csv")
data1.head()

Unnamed: 0,Age,BodyTemp_Celsius,HeartRate,RiskLevel,Seizure,RandomDateTime,Gender,PatientID,DateTime
0,25,36.7,86,high risk,No,2024-02-17 02:57:41.401776900,Female,749979,2024-02-28 12:50:27.838104639
1,35,36.7,70,high risk,No,2024-02-29 17:50:24.284304047,Male,549917,2024-04-02 18:28:47.147087857
2,29,37.8,80,high risk,Yes,2024-05-14 09:20:04.738400790,Male,561158,2024-01-17 15:32:31.036525172
3,30,36.7,70,high risk,No,2024-03-31 02:02:15.044422507,Male,483038,2024-03-03 10:16:56.386969397
4,35,36.7,76,low risk,No,2024-04-03 00:55:26.357354392,Female,466512,2024-03-12 00:29:51.115498519


In [46]:
data1.drop(['RandomDateTime', 'RiskLevel'], axis=1, inplace=True)


In [51]:
data_c = data1.copy()  # keep a separate working copy
data1.to_csv('dataset.csv', index=False)
data1.head()  # last expression in the cell, so the notebook displays it


Unnamed: 0,Age,BodyTemp_Celsius,HeartRate,Seizure,Gender,PatientID,DateTime
0,25,36.7,86,No,Female,749979,2024-02-28 12:50:27.838104639
1,35,36.7,70,No,Male,549917,2024-04-02 18:28:47.147087857
2,29,37.8,80,Yes,Male,561158,2024-01-17 15:32:31.036525172
3,30,36.7,70,No,Male,483038,2024-03-03 10:16:56.386969397
4,35,36.7,76,No,Female,466512,2024-03-12 00:29:51.115498519


In [50]:
Seiz_count = df['Seizure'].value_counts()
print(Seiz_count)

Seizure
No     806
Yes    208
Name: count, dtype: int64
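
The counts above mean that a model predicting `No` for every row would already reach about 79.5% accuracy, so the classes are imbalanced. One common response is to weight each class inversely to its frequency. The sketch below computes such weights by hand from the printed counts; passing them to a model is left as an assumption about the later workflow.

```python
import pandas as pd

# Class counts copied from the output above
counts = pd.Series({"No": 806, "Yes": 208})

# Accuracy of always predicting the majority class
baseline_accuracy = counts.max() / counts.sum()
print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")

# 'Balanced' weights: n_samples / (n_classes * class_count)
class_weights = counts.sum() / (len(counts) * counts)
print(class_weights.round(4))
```

Many scikit-learn classifiers accept such a mapping through their `class_weight` parameter, or compute the same weights themselves with `class_weight="balanced"`.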
