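# Reproducing MRSA Activity Prediction with MindSpore

**Goal**: This notebook calls modules from the `mindspore_chem` package to train, validate, and test a graph neural network model that predicts MRSA activity.

## 1. Import the required libraries and modules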

In [1]:
import pandas as pd
import mindspore
from mindspore import context

from data_pre import split_data
from train import run_training, run_testing

context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU")

print('------------------------')


------------------------


## 2. Define the experiment parameters

All tunable parameters are defined in one place here, so they are easy to change and manage.

In [2]:
class Args:
    # --- Data and path parameters ---
    data_path = 'data/bbbp.csv'
    smiles_column = 'smiles'
    target_column = 'p_np'
    save_dir = 'bbbp_save_model'
    plot_save_dir = 'bbbp_plot'

    # --- Dataset split parameters ---
    split_type = 'random' # either 'random' or 'scaffold'
    split_sizes = [0.8, 0.1, 0.1]

    # --- Model hyperparameters ---
    hidden_size = 300
    depth = 3
    dropout = 0.1

    # --- Training hyperparameters ---
    epochs = 10
    batch_size = 32
    learning_rate = 1e-4

args = Args()
print('------------------------')

------------------------
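
A plain class like `Args` accepts any value without complaint, so a typo in `split_type` or a `split_sizes` list that does not sum to 1 would only surface later, inside the split or training code. As a minimal sketch, the same settings could be wrapped in a dataclass that checks them up front. `ArgsSketch` and its checks are hypothetical and not part of the original code; only a subset of the fields is shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArgsSketch:
    """Hypothetical dataclass variant of Args with basic validation."""
    split_type: str = "random"
    split_sizes: List[float] = field(default_factory=lambda: [0.8, 0.1, 0.1])
    epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 1e-4

    def __post_init__(self):
        # Only the two split types named in the notebook are accepted.
        if self.split_type not in ("random", "scaffold"):
            raise ValueError(f"Unknown split_type: {self.split_type!r}")
        # Expect train/validation/test fractions that sum to 1.
        if len(self.split_sizes) != 3:
            raise ValueError("split_sizes must have three entries")
        if any(s < 0 for s in self.split_sizes):
            raise ValueError("split_sizes must be non-negative")
        if abs(sum(self.split_sizes) - 1.0) > 1e-8:
            raise ValueError("split_sizes must sum to 1")
        if self.epochs <= 0 or self.batch_size <= 0 or self.learning_rate <= 0:
            raise ValueError("epochs, batch_size and learning_rate must be positive")


sketch_args = ArgsSketch()
print(sketch_args)
```

Because validation runs in `__post_init__`, a bad configuration fails when the object is created rather than partway through training.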


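## 3. Run the main workflow

The cell below loads the data and splits it into training, validation, and test sets. Training and testing follow in sections 4 and 5.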

In [3]:
# --- Load the data ---
try:
    df = pd.read_csv(args.data_path)
    print(f"Successfully loaded data from '{args.data_path}'. Total molecules: {len(df)}")
    display(df.head())
except FileNotFoundError:
    print(f"Error: Data file not found at '{args.data_path}'")
    print(f"Please provide a CSV file with '{args.smiles_column}' and '{args.target_column}' columns, and update the `data_path` argument.")

if 'df' in locals():
    # --- Split the dataset ---
    train_data, val_data, test_data = split_data(
        df=df, 
        smiles_column=args.smiles_column, 
        target_column=args.target_column, 
        split_type=args.split_type, 
        split_sizes=args.split_sizes
    )
    
print('------------------------')

Successfully loaded data from 'data/bbbp.csv'. Total molecules: 2050


| Unnamed: 0 | num | name | p_np | smiles |
|---|---|---|---|---|
| 0 | 1 | Propanolol | 1 | `[Cl].CC(C)NCC(O)COc1cccc2ccccc12` |
| 1 | 2 | Terbutylchlorambucil | 1 | `C(=O)(OC(C)(C)C)CCCc1ccc(cc1)N(CCCl)CCCl` |
| 2 | 3 | 40730 | 1 | `c12c3c(N4CCN(C)CC4)c(F)cc1c(c(C(O)=O)cn2C(C)CO...` |
| 3 | 4 | 24 | 1 | `C1CCN(CC1)Cc1cccc(c1)OCCCNC(=O)C` |
| 4 | 5 | cloxacillin | 1 | `Cc1onc(c2ccccc2Cl)c1C(=O)N[C@H]3[C@H]4SC(C)(C)...` |


Splitting data with method: 'random'
Data split sizes: Train=1639, Validation=206, Test=205
------------------------
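
The internals of `split_data` are not shown in this notebook, so the following is only a sketch of what the two `split_type` options commonly mean. A random split shuffles the row indices and cuts them by the requested fractions. A scaffold split keeps every molecule that shares a scaffold in the same subset, usually filling the training set with the largest groups first. Both function names are hypothetical, and the scaffold strings are assumed to be precomputed (for example with RDKit's Murcko scaffold utilities, which are not used here). The rounding in `split_data` may differ, so the subset sizes need not match the output above exactly.

```python
import random
from collections import defaultdict


def random_split_indices(n, split_sizes=(0.8, 0.1, 0.1), seed=0):
    """Shuffle range(n) and cut it into train/val/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(split_sizes[0] * n)
    n_val = int(split_sizes[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


def scaffold_split_indices(scaffolds, split_sizes=(0.8, 0.1, 0.1)):
    """Assign whole scaffold groups to train, then val, then test."""
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest groups first; sorted() is stable, so ties keep first-seen order.
    ordered = sorted(groups.values(), key=len, reverse=True)

    n = len(scaffolds)
    train_cap = split_sizes[0] * n
    val_cap = split_sizes[1] * n
    train, val, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(val) + len(group) <= val_cap:
            val.extend(group)
        else:
            test.extend(group)
    return train, val, test


# Toy scaffold labels (placeholders, not real scaffold SMILES).
toy_scaffolds = ["A", "A", "A", "A", "B", "B", "C", "D", "A", "A"]
print(scaffold_split_indices(toy_scaffolds))
```

The scaffold variant gives a harder, more realistic test set, because the model is evaluated on core structures it never saw during training.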


## 4. Train and get the path to the best model

In [None]:
best_model_path = run_training(args, train_data, val_data)
print('------------------------')
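
`run_training` returns the path of the best model, which suggests it compares validation performance across epochs and keeps the best checkpoint. Its implementation is not shown here, so the loop below is only a sketch of that general pattern. `fit_and_track_best`, `train_one_epoch`, and `evaluate` are hypothetical names, and the validation scores are invented purely for illustration.

```python
def fit_and_track_best(train_one_epoch, evaluate, epochs):
    """Run `epochs` rounds and remember which epoch scored best on validation.

    `train_one_epoch(epoch)` and `evaluate(epoch)` are hypothetical callables
    standing in for the real training and validation steps.
    """
    best_score = float("-inf")
    best_epoch = None
    for epoch in range(1, epochs + 1):
        train_one_epoch(epoch)
        score = evaluate(epoch)  # e.g. a validation metric where higher is better
        if score > best_score:
            best_score, best_epoch = score, epoch
            # A real implementation would save a checkpoint at this point.
    return best_epoch, best_score


# Stand-in callables with invented validation scores (illustration only).
fake_scores = {1: 0.61, 2: 0.70, 3: 0.68}
best_epoch, best_score = fit_and_track_best(lambda epoch: None, fake_scores.get, epochs=3)
print(best_epoch, best_score)
```

Saving only when the validation score improves means the returned checkpoint is the best-validated one rather than simply the last epoch.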

## 5. Test with the best model

In [5]:
run_testing(args, test_data, best_model_path)
print('------------------------')


Starting final testing...
Loaded best model from: bbbp_save_model/best_model.ckpt


Testing: 100%|██████████| 7/7 [00:00<00:00,  9.81it/s]



===== Final Test Results =====
  Test Set AUC:                 0.8083
  Test Set Accuracy:            0.8000
  Test Set Average Precision:   0.9060
Evaluation summary plot saved to bbbp_plot/testset_performance_evaluation_curves.png
------------------------
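
For readers who want to see what the three reported numbers measure, here is a minimal pure-Python sketch of their usual definitions. It is not the code inside `run_testing`, which is not shown. The helper names are hypothetical, the 0.5 decision threshold for accuracy is an assumption, and the toy labels and scores are invented. In practice, scikit-learn's `roc_auc_score`, `accuracy_score`, and `average_precision_score` would normally be used instead.

```python
def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_score, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def average_precision(y_true, y_score):
    """Mean of precision values at the rank of each positive (ties not merged)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive label")
    return total / hits


# Invented toy labels and scores, for illustration only.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
print(roc_auc(toy_labels, toy_scores))
print(accuracy(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

AUC and average precision depend only on how the scores rank the molecules, while accuracy also depends on the chosen threshold, which is why the three values can differ noticeably on the same test set.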
