# Task 4: Human in the loop

For this task, I chose instance_id = 8. The details of the chosen task are listed below.

**"task_inst"** (str): task goal description and output formatting instruction

Perform backward feature selection using logistic regression to identify the most relevant chemical features for predicting signal inhibition from the DKPES dataset. Binarize the signal inhibition values using appropriate threshold. Visualize the accuracy as a function of the number of selected features. Save the plot as "pred_results/dkpes_feature_selection_analysis_pred.png".

**"github_name"** (str): the original github repository each task is adapted from

psa-lab/predicting-activity-by-machine-learning

**"domain_knowledge"** (str): expert-annotated information about the task

Backward feature selection is a feature selection technique where all available features are initially included in the fitted model. Then, features are sequentially removed based on their significance, with the least predictive feature being removed and a new model being fitted with the remaining features at each step. This process continues until the model's performance starts to decline or until some termination condition is met. To perform the backward feature selection, one can use SFS function from `mlxtend` library with appropriate arguments (e.g., set 'forward' to False).

**"src_file_or_path"** (str): source program location in the original github repository that is adapted

code/dkpes_fgroup_analysis.ipynb

**"gold_program_name"** (str): name of annotated program (reference solution) for each task

dkpes_visualization_3.py

**"output_fname"** (str): output location to save the generated program for each task

pred_results/dkpes_feature_selection_analysis_pred.png

**"dataset_preview"** (str): string representation of the first few examples/lines in dataset files used in each task

[START Preview of dkpes/dkpes_train.csv]
index,Signal-inhibition,3-Keto,3-Hydroxy,12-Keto,12-Hydroxy,19-Methyl,18-Methyl,Sulfate-Ester,Sulfate-Oxygens,C4-C5-DB,C6-C7-DB,Sulfur,ShapeQuery,TanimotoCombo,ShapeTanimoto,ColorTanimoto,FitTverskyCombo,FitTversky,FitColorTversky,RefTverskyCombo,RefTversky,RefColorTversky,ScaledColor,ComboScore,ColorScore,Overlap
ZINC04026280,0.24,0,0,0,0,0,1,0,0,0,0,0,DKPES_CSD_MMMF_1_32,1.184,0.708,0.476,1.692,0.886,0.806,1.316,0.779,0.537,0.528,1.235,-5.804,1045.931
ZINC78224296,0.278,0,0,0,0,0,1,0,3,0,0,1,DKPES_CSD_MMMF_1_31,1.063,0.765,0.298,1.346,0.904,0.442,1.31,0.832,0.478,0.48,1.245,-5.278,1122.302
ZINC01532179,0.686,0,0,0,0,0,0,1,3,0,0,1,DKPES_CSD_MMMF_1_16,0.965,0.633,0.332,1.896,1.143,0.752,0.959,0.586,0.373,0.363,0.995,-3.988,770.823
...
[END Preview of dkpes/dkpes_train.csv]
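The backward elimination loop described in the domain knowledge above can be sketched without any library. This is only an illustration of the greedy procedure, not the `mlxtend` implementation: `backward_elimination`, `score_fn`, and `toy_score` are hypothetical names, and the toy score stands in for what would be the cross-validated accuracy of a logistic regression model in this task.

```python
from typing import Callable, Dict, Sequence, Tuple


def backward_elimination(
    features: Sequence[str],
    score_fn: Callable[[Tuple[str, ...]], float],
    min_features: int = 1,
) -> Dict[int, Tuple[Tuple[str, ...], float]]:
    """Greedy backward selection: start with all features, drop one per round."""
    current = tuple(features)
    # history maps "number of features" -> (subset, score of that subset)
    history = {len(current): (current, score_fn(current))}
    while len(current) > min_features:
        # Try removing each remaining feature and keep the best-scoring subset.
        candidates = [
            tuple(f for f in current if f != removed) for removed in current
        ]
        current = max(candidates, key=score_fn)
        history[len(current)] = (current, score_fn(current))
    return history


def toy_score(subset: Tuple[str, ...]) -> float:
    """Hypothetical score: 'a' and 'b' are informative, every feature costs a little."""
    informative = {"a", "b"}
    return 0.3 * len(informative & set(subset)) - 0.01 * len(subset)


history = backward_elimination(["a", "b", "c", "d"], toy_score)
for k in sorted(history, reverse=True):
    subset, score = history[k]
    print(k, subset, round(score, 2))
```

Recording the score for every subset size, rather than stopping at the first drop in performance, is what makes it possible to plot accuracy against the number of selected features afterwards.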

After extracting the key information from the task description, I fed it to ChatGPT to get a general direction first.

In [16]:
# pip install mlxtend

In [17]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
import os

# === Load dataset ===
data = pd.read_csv("dkpes.csv")

# === Data preprocessing ===
# Drop non-feature columns
data = data.drop(columns=["index", "ShapeQuery"])

# Separate features and target
y = data["Signal-inhibition"]
X = data.drop(columns=["Signal-inhibition"])

# === Binarize target variable ===
# Use median as threshold
threshold = y.median()
y_bin = (y >= threshold).astype(int)

# === Split data ===
X_train, X_test, y_train, y_test = train_test_split(
    X, y_bin, test_size=0.2, random_state=42, stratify=y_bin
)

# === Standardize features ===
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# === Logistic Regression model ===
lr = LogisticRegression(max_iter=1000, solver="liblinear")

# === Backward Feature Selection ===
sfs = SFS(
    lr,
    k_features=(1, X.shape[1]),
    forward=False,
    floating=False,
    scoring="accuracy",
    cv=5,
    n_jobs=-1,
    verbose=1
)

sfs = sfs.fit(X_train_scaled, y_train)

# === Extract results ===
# subsets_ is keyed by the number of selected features; sort the keys
# so the curve is plotted in increasing order of feature count.
num_features = sorted(sfs.subsets_)
accuracy = [sfs.subsets_[k]['avg_score'] for k in num_features]

# === Visualization ===
plt.figure(figsize=(8, 6))
plt.plot(num_features, accuracy, marker='o', linestyle='-', color='b')
plt.title("Backward Feature Selection (Logistic Regression)")
plt.xlabel("Number of Selected Features")
plt.ylabel("Cross-Validated Accuracy")
plt.grid(True)

# Create output directory
os.makedirs("pred_results", exist_ok=True)
plt.savefig("pred_results/dkpes_feature_selection_analysis_pred.png", dpi=300)
plt.close()

print("✅ Feature selection analysis complete.")
print("Plot saved as: pred_results/dkpes_feature_selection_analysis_pred.png")


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.
[Parallel(n_jobs=-1)]: Done  11 out of  24 | elapsed:    3.0s remaining:    3.6s
[Parallel(n_jobs=-1)]: Done  24 out of  24 | elapsed:    3.2s finished
Features: 23/1[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.
[Parallel(n_jobs=-1)]: Done   8 out of  23 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  23 out of  23 | elapsed:    1.2s finished
Features: 22/1[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 out of  22 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  22 out of  22 | elapsed:    0.0s finished
Features: 21/1[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of  21 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  21 out of  21 | elapsed:    0.0s finished
Features: 20/1[Parallel(n_jobs=-1)]: Using

✅ Feature selection analysis complete.
Plot saved as: pred_results/dkpes_feature_selection_analysis_pred.png
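
The task only asks for an "appropriate threshold", and the cell above uses the median of the target. As a minimal sketch of that rule, the snippet below applies it to the three `Signal-inhibition` values shown in the dataset preview, using the standard library instead of pandas. Three rows are far too few to say anything about the real class balance; the point is only to show how the `>=` comparison assigns labels.

```python
from statistics import median

# Signal-inhibition values of the three rows shown in the dataset preview.
signal_inhibition = [0.24, 0.278, 0.686]

threshold = median(signal_inhibition)
# Same rule as `(y >= threshold).astype(int)` in the cell above:
# values at or above the median become 1, the rest become 0.
labels = [int(value >= threshold) for value in signal_inhibition]

print(threshold, labels)
```

Because at least half of the values are always at or above the median, this choice keeps the two classes roughly balanced unless many values tie with the threshold.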


To check whether the plot is correct, I ran `python dkpes_visualization_3.py` from ScienceAgentBench in a terminal to produce the gold (reference) result.

This is the reference plot:

<img src="pred_results/dkpes_feature_selection_analysis_pred_GOLD.png" alt="golden" width="600"/>

This is the plot produced by the code that was mainly written by ChatGPT:


<img src="pred_results/dkpes_feature_selection_analysis_pred.png" alt="gpt" width="600"/>


The two plots show a similar trend: in both, the turning point is at roughly 5 features, and the accuracy there is approximately 70%. The code, which was mainly generated by ChatGPT, therefore works well for this task. The task took 30 minutes in total, about two-thirds less time than Task 1.
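
Reading the turning point off a plot is approximate, so a small helper can make the comparison more concrete. The sketch below assumes the scores are available as a plain dictionary mapping the number of features to accuracy, which could be built from the fitted selector as `{k: v['avg_score'] for k, v in sfs.subsets_.items()}`. The function name `smallest_near_best`, the tolerance, and the example numbers are all hypothetical; they are not the values from the run above.

```python
from typing import Dict


def smallest_near_best(scores_by_k: Dict[int, float], tolerance: float = 0.01) -> int:
    """Return the smallest feature count whose score is within `tolerance` of the best."""
    best = max(scores_by_k.values())
    return min(k for k, score in scores_by_k.items() if score >= best - tolerance)


# Hypothetical accuracy curve for illustration only.
example_scores = {1: 0.55, 2: 0.60, 3: 0.64, 4: 0.67, 5: 0.70, 6: 0.70, 7: 0.69}

print(smallest_near_best(example_scores))
```

Running the same helper on the scores from both programs would give two numbers to compare instead of two visual estimates.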