This notebook loads the embedding dataset provided with the study from a local CSV file and fits a logistic regression model to compute the AUC-ROC for an adverse drug reaction (ADR) endpoint.

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Assumes 'embeddings.csv' is in the working directory and contains SAE-derived
# features (columns f1, f2, ..., fn) plus a binary 'label' column for the ADR outcome.

data = pd.read_csv('embeddings.csv')
features = data.drop(columns=['label'])  # every column except the outcome
labels = data['label']

# Fit a logistic regression classifier on the full dataset
model = LogisticRegression(max_iter=1000)
model.fit(features, labels)

# Predicted probability of the positive class (column 1)
predictions = model.predict_proba(features)[:, 1]

# Note: this AUC is computed on the training rows, so it is an optimistic, in-sample estimate
auc_roc = roc_auc_score(labels, predictions)
print('AUC-ROC:', auc_roc)


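The code above shows how SAE-derived features can be fed to a logistic regression model to classify ADRs. The fitted coefficients can be inspected feature by feature, which is relevant to the transparency discussed in the paper. Because the model is scored on the same rows it was fitted on, the reported AUC-ROC is an in-sample figure and will tend to overstate performance on unseen data, so treat it as a sanity check on the predictive performance discussed in the paper rather than a validation of it.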
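
The preceding cell evaluates the model on its own training rows. A minimal sketch of a less optimistic estimate, using stratified 5-fold cross-validation, is shown here. It is not the paper's evaluation protocol: the fold count, the random seeds, and the synthetic stand-in data generated with `make_classification` are all assumptions made so the example runs without `embeddings.csv`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for 'embeddings.csv' (hypothetical data, not from the study).
# Replace these four lines with pd.read_csv('embeddings.csv') when the file is available.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
data = pd.DataFrame(X, columns=[f"f{i + 1}" for i in range(X.shape[1])])
data["label"] = y

features = data.drop(columns=["label"])
labels = data["label"]

# Stratified folds keep the positive/negative ratio roughly equal in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# Each row's probability comes from a model that never saw that row during fitting
oof_predictions = cross_val_predict(
    model, features, labels, cv=cv, method="predict_proba"
)[:, 1]
cv_auc = roc_auc_score(labels, oof_predictions)

# In-sample AUC for comparison, computed the same way as in the cell above
in_sample_predictions = model.fit(features, labels).predict_proba(features)[:, 1]
in_sample_auc = roc_auc_score(labels, in_sample_predictions)

print(f"Cross-validated AUC-ROC: {cv_auc:.3f}")
print(f"In-sample AUC-ROC:       {in_sample_auc:.3f}")
```

On real data, a large gap between the two numbers suggests the in-sample figure is inflated by overfitting, which becomes more likely as the number of features approaches the number of rows.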

In [None]:
# For visualization, plot the ROC curve using matplotlib
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(labels, predictions)
plt.figure(figsize=(8,6))
plt.plot(fpr, tpr, color='#6A0C76', lw=2, label=f'ROC curve (area = {auc_roc:.3f})')
plt.plot([0, 1], [0, 1], color='grey', lw=1, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve for ADR Classification')
plt.legend(loc='lower right')
plt.show()
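

The "area" shown in the plot legend is the area under the (FPR, TPR) curve. As a small worked check, the sketch below integrates the curve returned by `roc_curve` with the trapezoidal rule and compares the result with `roc_auc_score`. The labels and scores are a hypothetical toy example, not data from the study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy labels and scores (hypothetical values chosen for illustration only)
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Trapezoidal rule: sum of (width along FPR) * (average TPR height) per segment
widths = np.diff(fpr)
heights = (tpr[1:] + tpr[:-1]) / 2.0
area = float(np.sum(widths * heights))

auc = roc_auc_score(y_true, y_score)
print(f"Trapezoidal area: {area:.3f}, roc_auc_score: {auc:.3f}")
```

The two values agree because `roc_auc_score` for binary labels is defined as this same area. Equivalently, it is the fraction of positive/negative pairs in which the positive example receives the higher score.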


This concludes our Jupyter Notebook section for reproducing the SAE-based ADR classification analysis from the paper.






### [Created with BioloGPT](https://biologpt.com/?q=Paper%20Review%3A%20Probing%20Large%20Language%20Model%20Hidden%20States%20for%20Adverse%20Drug%20Reaction%20Knowledge)
***