# Machine Learning Model for Cell Type Classification Using the Weizmann 3CA Dataset

This notebook presents a reference machine learning model for classifying cell types in the Weizmann 3CA dataset, using the Puram et al. 2017 study. It walks through data loading and preparation, training a `RandomForestClassifier`, and evaluating performance with the ROC AUC score.

In [1]:

```python
from aiondata import Weizmann3CA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Load the Puram et al. 2017 data
cells, _, _, exp_data = Weizmann3CA()["Puram et al. 2017"]
# Transpose so that rows are cells (samples) and columns are features, as scikit-learn expects
X = exp_data.T
# One cell-type label per cell
y = cells["cell_type"].to_list()
# Number of cells of each type
cells["cell_type"].value_counts()
```


| cell_type | count |
|---|---|
| Macrophage | 98 |
| B_cell | 138 |
| Mast | 120 |
| Malignant | 2539 |
| Myocyte | 19 |
| Fibroblast | 1440 |
| Endothelial | 260 |
| T_cell | 1237 |
| Dendritic | 51 |
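
The classes are highly imbalanced: Malignant cells outnumber Myocytes by more than 100 to 1. To see what that means for a 20% test split, the sketch below works only from the counts printed above (sorted by count); it does not load the dataset, and the expected test counts assume a purely random split.

```python
# Cell-type counts copied from the value_counts() output above, sorted by count
counts = {
    "Malignant": 2539,
    "Fibroblast": 1440,
    "T_cell": 1237,
    "Endothelial": 260,
    "B_cell": 138,
    "Mast": 120,
    "Macrophage": 98,
    "Dendritic": 51,
    "Myocyte": 19,
}

test_size = 0.2  # same fraction as the train_test_split call below
total = sum(counts.values())

for cell_type, n in counts.items():
    share = n / total
    # Expected number of cells of this type in a random 20% test split
    expected_test = n * test_size
    print(f"{cell_type:<12} {n:>5}  {share:6.2%}  ~{expected_test:.1f} test cells")

print(f"{'Total':<12} {total:>5}")
```

With only about four Myocyte cells expected in the test set, any per-class estimate for the rarest types rests on very few samples. `train_test_split` also accepts a `stratify=y` argument, which keeps class proportions approximately equal between the train and test splits.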


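In [2]:

```python
# Hold out 20% of the cells for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=18)

# Train a random forest with default hyperparameters
classifier = RandomForestClassifier(random_state=18)
classifier.fit(X_train, y_train)

# Class-membership probabilities; columns follow the order of classifier.classes_
y_pred_proba = classifier.predict_proba(X_test)

# One-vs-one ROC AUC, macro-averaged over all class pairs
auc = roc_auc_score(y_test, y_pred_proba, multi_class="ovo", average="macro")

print(f"AUC: {auc:.2f}")
```

```
AUC: 1.00
```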
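
With `multi_class="ovo", average="macro"`, the score is a one-vs-one AUC: for every pair of classes, only the cells belonging to those two classes are scored, the pair's AUC is computed in both directions and averaged, and all pair scores are then averaged with equal weight. The pure-Python sketch below restates that computation on a small made-up example so the logic is visible. It is an illustration, not scikit-learn's implementation; the function names and toy data are hypothetical, and it assumes every class appears at least once in `y_true`.

```python
from itertools import combinations


def binary_auc(is_positive, scores):
    """ROC AUC via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive sample scores higher,
    with ties counted as one half."""
    pos = [s for p, s in zip(is_positive, scores) if p]
    neg = [s for p, s in zip(is_positive, scores) if not p]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ovo_macro_auc(y_true, proba, classes):
    """One-vs-one, macro-averaged AUC.

    proba[i][k] is the predicted probability that sample i belongs to
    classes[k], so classes must be in the same order as the probability columns.
    """
    pair_scores = []
    for a, b in combinations(range(len(classes)), 2):
        # Keep only samples whose true label is class a or class b
        idx = [i for i, y in enumerate(y_true) if y in (classes[a], classes[b])]
        labels = [y_true[i] for i in idx]
        # a vs b scored with column a, and b vs a scored with column b
        auc_a = binary_auc([y == classes[a] for y in labels], [proba[i][a] for i in idx])
        auc_b = binary_auc([y == classes[b] for y in labels], [proba[i][b] for i in idx])
        pair_scores.append((auc_a + auc_b) / 2)
    # Macro average: every class pair counts equally, regardless of class size
    return sum(pair_scores) / len(pair_scores)


# Toy example with three classes (hypothetical scores, not the Puram et al. data)
classes = ["B_cell", "Malignant", "T_cell"]
y_true = ["Malignant", "T_cell", "B_cell", "Malignant", "T_cell", "B_cell"]
proba = [
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.5, 0.4],
    [0.6, 0.1, 0.3],
]
print(f"OvO macro AUC: {ovo_macro_auc(y_true, proba, classes):.3f}")  # prints 0.958
```

Because every class pair counts equally, rare types such as Myocyte and Dendritic weigh as much as Malignant cells in the final score. The sketch's `binary_auc` compares all positive/negative pairs, so its cost grows quadratically with the number of cells and it is meant only for small examples.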
