# About Dataset
Context

An experiment on the effects of anti-anxiety medicine on memory recall when being primed with happy or sad memories. The participants were done on novel Islanders whom mimic real-life humans in response to external factors.

Drugs of interest (known-as) [Dosage 1, 2, 3]:

A - Alprazolam (Xanax, Long-term) [1mg/3mg/5mg]

T - Triazolam (Halcion, Short-term) [0.25mg/0.5mg/0.75mg]

S- Sugar Tablet (Placebo) [1 tab/2tabs/3tabs]

* Dosages follow a 1:1 ratio to ensure validity
* Happy or Sad memories were primed 10 minutes prior to testing
* Participants tested every day for 1 week to mimic addiction

###Imports

In [51]:
!pip install catboost
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC, SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

import warnings
warnings.filterwarnings(action='ignore')



###Load the Data

In [52]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("steveahn/memory-test-on-drugged-islanders-data")

print("Path to dataset files:", path)

Using Colab cache for faster access to the 'memory-test-on-drugged-islanders-data' dataset.
Path to dataset files: /kaggle/input/memory-test-on-drugged-islanders-data


In [53]:
import os

download_path = "/root/.cache/kagglehub/datasets/steveahn/memory-test-on-drugged-islanders-data/versions/1"
print(os.listdir(download_path))

['Islander_data.csv']


In [54]:
df = pd.read_csv("/root/.cache/kagglehub/datasets/steveahn/memory-test-on-drugged-islanders-data/versions/1/Islander_data.csv")
display(df.head())

Unnamed: 0,first_name,last_name,age,Happy_Sad_group,Dosage,Drug,Mem_Score_Before,Mem_Score_After,Diff
0,Bastian,Carrasco,25,H,1,A,63.5,61.2,-2.3
1,Evan,Carrasco,52,S,1,A,41.6,40.7,-0.9
2,Florencia,Carrasco,29,H,1,A,59.7,55.1,-4.6
3,Holly,Carrasco,50,S,1,A,51.7,51.2,-0.5
4,Justin,Carrasco,52,H,1,A,47.0,47.1,0.1


####Explore the data


In [55]:
df.isnull().sum()

Unnamed: 0,0
first_name,0
last_name,0
age,0
Happy_Sad_group,0
Dosage,0
Drug,0
Mem_Score_Before,0
Mem_Score_After,0
Diff,0


In [56]:
df.duplicated().sum()

np.int64(0)

In [57]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 198 entries, 0 to 197
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   first_name        198 non-null    object 
 1   last_name         198 non-null    object 
 2   age               198 non-null    int64  
 3   Happy_Sad_group   198 non-null    object 
 4   Dosage            198 non-null    int64  
 5   Drug              198 non-null    object 
 6   Mem_Score_Before  198 non-null    float64
 7   Mem_Score_After   198 non-null    float64
 8   Diff              198 non-null    float64
dtypes: float64(3), int64(2), object(4)
memory usage: 14.1+ KB


###Preprocessing

In [58]:
df = df.drop(columns=["first_name","last_name"])

In [59]:

df["Drug"].value_counts()


Unnamed: 0_level_0,count
Drug,Unnamed: 1_level_1
A,67
S,66
T,65


In [60]:
df["Happy_Sad_group"].value_counts()

Unnamed: 0_level_0,count
Happy_Sad_group,Unnamed: 1_level_1
H,99
S,99


In [61]:
df['Happy_Sad_group'] = df['Happy_Sad_group'].replace({'H': 0, 'S': 1})
df['Drug'] = df['Drug'].replace({'A': 0, 'T': 1, 'S': 2})

display(df.head())

Unnamed: 0,age,Happy_Sad_group,Dosage,Drug,Mem_Score_Before,Mem_Score_After,Diff
0,25,0,1,0,63.5,61.2,-2.3
1,52,1,1,0,41.6,40.7,-0.9
2,29,0,1,0,59.7,55.1,-4.6
3,50,1,1,0,51.7,51.2,-0.5
4,52,0,1,0,47.0,47.1,0.1


In [62]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 198 entries, 0 to 197
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   age               198 non-null    int64  
 1   Happy_Sad_group   198 non-null    int64  
 2   Dosage            198 non-null    int64  
 3   Drug              198 non-null    int64  
 4   Mem_Score_Before  198 non-null    float64
 5   Mem_Score_After   198 non-null    float64
 6   Diff              198 non-null    float64
dtypes: float64(3), int64(4)
memory usage: 11.0 KB


### Split df into X and y

In [63]:
y = df['Drug']
x = df.drop('Drug', axis=1)

###Training

In [64]:
X_train, X_test, y_train, y_test = train_test_split(

                                                    x, y,
                                                    train_size=0.7,
                                                    shuffle=True,
                                                    random_state=1

                                                    )

In [65]:
scaler = StandardScaler()

In [66]:
X_train=scaler.fit_transform(X_train)

In [67]:
X_train

array([[-0.30224741,  1.        ,  1.20671599,  0.24918296, -0.15185019,
        -0.59473473],
       [ 0.90925051,  1.        ,  0.02567481,  1.22103768,  1.03847128,
        -0.09220794],
       [ 1.4284639 ,  1.        ,  0.02567481,  0.43850531, -0.68466075,
        -1.70747262],
       [-1.16760306, -1.        , -1.15536637,  0.41326233,  0.69837943,
         0.51800316],
       [-0.21571184,  1.        , -1.15536637,  0.56472021, -0.117841  ,
        -0.98957721],
       [ 0.04389485, -1.        ,  1.20671599, -0.24305514, -0.36157349,
        -0.22681333],
       [ 1.4284639 ,  1.        , -1.15536637, -0.52072791,  0.22791904,
         1.10129318],
       [-0.38878297, -1.        ,  1.20671599,  0.47636978,  3.05068137,
         4.15234867],
       [-0.21571184,  1.        ,  1.20671599, -0.26198737,  0.85142076,
         1.72047797],
       [ 1.1688572 ,  1.        ,  1.20671599, -0.1673262 ,  1.58828643,
         2.75245262],
       [ 0.30350155,  1.        ,  1.20671599, -0.

In [68]:
X_train

array([[-0.30224741,  1.        ,  1.20671599,  0.24918296, -0.15185019,
        -0.59473473],
       [ 0.90925051,  1.        ,  0.02567481,  1.22103768,  1.03847128,
        -0.09220794],
       [ 1.4284639 ,  1.        ,  0.02567481,  0.43850531, -0.68466075,
        -1.70747262],
       [-1.16760306, -1.        , -1.15536637,  0.41326233,  0.69837943,
         0.51800316],
       [-0.21571184,  1.        , -1.15536637,  0.56472021, -0.117841  ,
        -0.98957721],
       [ 0.04389485, -1.        ,  1.20671599, -0.24305514, -0.36157349,
        -0.22681333],
       [ 1.4284639 ,  1.        , -1.15536637, -0.52072791,  0.22791904,
         1.10129318],
       [-0.38878297, -1.        ,  1.20671599,  0.47636978,  3.05068137,
         4.15234867],
       [-0.21571184,  1.        ,  1.20671599, -0.26198737,  0.85142076,
         1.72047797],
       [ 1.1688572 ,  1.        ,  1.20671599, -0.1673262 ,  1.58828643,
         2.75245262],
       [ 0.30350155,  1.        ,  1.20671599, -0.

In [69]:
models = {
    "                   Logistic Regression": LogisticRegression(),
    "                   K-Nearest Neighbors": KNeighborsClassifier(),
    "                         Decision Tree": DecisionTreeClassifier(),
    "Support Vector Machine (Linear Kernel)": LinearSVC(),
    "   Support Vector Machine (RBF Kernel)": SVC(),
    "                        Neural Network": MLPClassifier(),
    "                         Random Forest": RandomForestClassifier(),
    "                     Gradient Boosting": GradientBoostingClassifier(),
    "                               XGBoost": XGBClassifier(eval_metric='mlogloss'),
    "                              LightGBM": LGBMClassifier(),
    "                              CatBoost": CatBoostClassifier(verbose=0)
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name + " trained.")

                   Logistic Regression trained.
                   K-Nearest Neighbors trained.
                         Decision Tree trained.
Support Vector Machine (Linear Kernel) trained.
   Support Vector Machine (RBF Kernel) trained.
                        Neural Network trained.
                         Random Forest trained.
                     Gradient Boosting trained.
                               XGBoost trained.
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000043 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 169
[LightGBM] [Info] Number of data points in the train set: 138, number of used features: 6
[LightGBM] [Info] Start training from score -1.143064
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.056053
                              LightGBM trained.
                              CatBoost trained.


In [70]:
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name + " trained.")

                   Logistic Regression trained.
                   K-Nearest Neighbors trained.
                         Decision Tree trained.
Support Vector Machine (Linear Kernel) trained.
   Support Vector Machine (RBF Kernel) trained.
                        Neural Network trained.
                         Random Forest trained.
                     Gradient Boosting trained.
                               XGBoost trained.
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000041 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 169
[LightGBM] [Info] Number of data points in the train set: 138, number of used features: 6
[LightGBM] [Info] Start training from score -1.143064
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.056053
                              LightGBM trained.
                              CatBoost trained.


In [71]:
for name, model in models.items():
    print(name + ": {:.2f}%".format(model.score(X_test, y_test) * 100))

                   Logistic Regression: 38.33%
                   K-Nearest Neighbors: 38.33%
                         Decision Tree: 41.67%
Support Vector Machine (Linear Kernel): 38.33%
   Support Vector Machine (RBF Kernel): 31.67%
                        Neural Network: 33.33%
                         Random Forest: 41.67%
                     Gradient Boosting: 41.67%
                               XGBoost: 41.67%
                              LightGBM: 41.67%
                              CatBoost: 40.00%
