# Reproducing MRSA Activity Prediction with MindSpore

**Goal**: This notebook calls modules from the `mindspore_chem` package to train, validate, and test a graph neural network model that predicts MRSA activity.

## 1. Import the required libraries and modules

In [1]:
import pandas as pd
import mindspore
from mindspore import context

from data_pre import split_data
from train import run_training, run_testing

context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU")

print('------------------------')



------------------------


## 2. Define the experiment parameters

All tunable parameters are defined here in one place, which makes them easy to modify and manage.

In [2]:
class Args:
    # --- Data and path parameters ---
    data_path = 'data/mrsa.csv'
    smiles_column = 'SMILES'
    target_column = 'ACTIVITY'
    save_dir = 'mindspore_mrsa_model'
    plot_save_dir = 'mrsa_plot'

    # --- Dataset split parameters ---
    split_type = 'scaffold'  # 'random' or 'scaffold'
    split_sizes = [0.8, 0.1, 0.1]  # train / validation / test fractions

    # --- Model hyperparameters ---
    hidden_size = 300
    depth = 3
    dropout = 0.1

    # --- Training hyperparameters ---
    epochs = 10
    batch_size = 32
    learning_rate = 1e-4

args = Args()
print('------------------------')

------------------------
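The `Args` class above works, but every experiment shares one set of mutable class attributes. As a minimal sketch that is not part of the original notebook, the same settings can live in a frozen `dataclass`, so per-experiment overrides are explicit copies and obvious mistakes are caught before training. `ExperimentConfig`, `validate`, and `steps_per_epoch` are hypothetical names; `split_sizes` becomes a tuple to keep the object immutable, and whether `split_data` and `run_training` accept this object in place of `Args` (they would need only these attribute names) is an assumption.

```python
import math
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical immutable counterpart of the notebook's Args class."""
    # Data and path parameters
    data_path: str = 'data/mrsa.csv'
    smiles_column: str = 'SMILES'
    target_column: str = 'ACTIVITY'
    save_dir: str = 'mindspore_mrsa_model'
    plot_save_dir: str = 'mrsa_plot'
    # Dataset split parameters (train / validation / test fractions)
    split_type: str = 'scaffold'
    split_sizes: Tuple[float, float, float] = (0.8, 0.1, 0.1)
    # Model hyperparameters
    hidden_size: int = 300
    depth: int = 3
    dropout: float = 0.1
    # Training hyperparameters
    epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 1e-4

    def validate(self) -> None:
        # Fail fast on obvious mistakes before a long training run starts
        if self.split_type not in ('random', 'scaffold'):
            raise ValueError(f"unknown split_type: {self.split_type!r}")
        if abs(sum(self.split_sizes) - 1.0) > 1e-6:
            raise ValueError(f"split_sizes must sum to 1, got {self.split_sizes}")

    def steps_per_epoch(self, n_samples: int) -> int:
        # Assumes the last partial batch is kept rather than dropped
        return math.ceil(n_samples / self.batch_size)


base = ExperimentConfig()
# dataclasses.replace returns a modified copy; `base` itself is unchanged
random_split_run = replace(base, split_type='random', epochs=30)
for cfg in (base, random_split_run):
    cfg.validate()
```

With `batch_size = 32`, `steps_per_epoch(31344)` is 980 and `steps_per_epoch(3957)` is 124, the same totals shown in the training and testing progress bars of this notebook, which is consistent with the final partial batch being kept.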


## 3. Run the main pipeline

We now run data loading, splitting, training, and testing in order.

In [3]:
# --- Load the data ---
try:
    df = pd.read_csv(args.data_path)
    print(f"Successfully loaded data from '{args.data_path}'. Total molecules: {len(df)}")
    display(df.head())
except FileNotFoundError:
    print(f"Error: Data file not found at '{args.data_path}'")
    print("Please create a CSV file with 'SMILES' and 'ACTIVITY' columns, and update the `data_path` argument.")

if 'df' in locals():
    # --- Split the dataset ---
    train_data, val_data, test_data = split_data(
        df=df, 
        smiles_column=args.smiles_column, 
        target_column=args.target_column, 
        split_type=args.split_type, 
        split_sizes=args.split_sizes
    )
    
print('------------------------')

Successfully loaded data from 'data/mrsa.csv'. Total molecules: 39312


| Unnamed: 0 | SMILES | ACTIVITY |
|---|---|---|
| 0 | `Nc1nnc(o1)-c1ccc(o1)[N+](=O)[O-]` | 1 |
| 1 | `O[C@H]1COC[C@@H]2O[C@H](CC[C@H]2N(C1)C(=O)Nc1c...` | 1 |
| 2 | `CC(C)C[C@@H](N)C(=O)N[C@@H]1[C@H](O)c2ccc(c(c2...` | 1 |
| 3 | `[O-][N+](=O)c1ccc(o1)/C=N/N1CC(=O)NC1=O` | 1 |
| 4 | `Cn1cnc(c1)CCNC(=O)C[C@@H]1CC[C@@H]2[C@H](COC[C...` | 1 |


Splitting data with method: 'scaffold'


[23:47:33] Unusual charge on atom 0 number of radical electrons set to zero
[23:47:40] Explicit valence for atom # 16 Al, 6, is greater than permitted


Data split sizes: Train=31344, Validation=4010, Test=3957
------------------------
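The logged sizes add up to 39,311 (31,344 + 4,010 + 3,957), one fewer than the 39,312 rows loaded. This suggests, although the notebook does not say so, that `split_data` skips the molecule RDKit could not sanitize (the Al valence error above). The internals of `split_data` are not shown, so the code below is only a sketch of one common deterministic scaffold split, similar to the unbalanced `scaffold_split` in chemprop: molecules are grouped by scaffold, the largest groups fill the training set first, and no scaffold is shared between splits. The notebook's validation set (4,010) is larger than 10% of 39,311, so `split_data` evidently does not cap the splits exactly this way. `scaffold_split_indices` is a hypothetical helper, and the scaffold function is passed in so the sketch runs without RDKit; with RDKit installed one could pass `lambda s: MurckoScaffold.MurckoScaffoldSmiles(smiles=s)` from `rdkit.Chem.Scaffolds`.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def scaffold_split_indices(
    smiles: Sequence[str],
    scaffold_of: Callable[[str], str],
    sizes: Tuple[float, float, float] = (0.8, 0.1, 0.1),
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical greedy scaffold split; returns (train, val, test) index lists."""
    # Group molecule indices by their scaffold key
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, s in enumerate(smiles):
        groups[scaffold_of(s)].append(i)

    # Largest scaffold families first; sorted() is stable, so ties keep first-seen order
    ordered = sorted(groups.values(), key=len, reverse=True)

    n = len(smiles)
    train_cap, val_cap = sizes[0] * n, sizes[1] * n
    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for group in ordered:
        # A whole scaffold group always lands in exactly one split
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(val) + len(group) <= val_cap:
            val.extend(group)
        else:
            test.extend(group)
    return train, val, test


# Toy example: the first character stands in for a real Murcko scaffold
toy = ['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'b1', 'b2', 'c1', 'd1']
train_idx, val_idx, test_idx = scaffold_split_indices(toy, lambda s: s[0])
print(train_idx, val_idx, test_idx)
# -> [0, 1, 2, 3, 4, 5, 6, 7] [8] [9]
```

Because whole scaffold groups move together, the test set contains only scaffolds never seen in training, which is what makes a scaffold split a harder check of generalization than a random split.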


## 4. Train the model and get the best model path

In [5]:
best_model_path = run_training(args, train_data, val_data)
print('------------------------')


Starting training...


Epoch 1/10:  46%|████▌     | 452/980 [01:58<02:15,  3.90it/s, loss=0.0108][22:44:26] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 1/10: 100%|██████████| 980/980 [04:14<00:00,  3.85it/s, loss=0.0111]


Epoch 01 | Train Loss: 0.0863 | Val AUC: 0.7873
  -> New best model saved with AUC: 0.7873


Epoch 2/10:  76%|███████▌  | 740/980 [02:52<00:52,  4.54it/s, loss=0.1342][22:49:52] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 2/10: 100%|██████████| 980/980 [03:44<00:00,  4.36it/s, loss=0.0125]


Epoch 02 | Train Loss: 0.0738 | Val AUC: 0.7621


Epoch 3/10:  70%|███████   | 688/980 [02:32<01:07,  4.31it/s, loss=0.1021][22:53:30] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 3/10: 100%|██████████| 980/980 [03:42<00:00,  4.40it/s, loss=0.0067]


Epoch 03 | Train Loss: 0.0707 | Val AUC: 0.8011
  -> New best model saved with AUC: 0.8011


Epoch 4/10:  54%|█████▍    | 530/980 [02:07<01:41,  4.42it/s, loss=0.0225][22:57:00] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 4/10: 100%|██████████| 980/980 [03:47<00:00,  4.31it/s, loss=0.0090]


Epoch 04 | Train Loss: 0.0692 | Val AUC: 0.8205
  -> New best model saved with AUC: 0.8205


Epoch 5/10:  74%|███████▍  | 723/980 [02:51<01:02,  4.13it/s, loss=0.1368][23:01:44] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 5/10: 100%|██████████| 980/980 [03:56<00:00,  4.15it/s, loss=0.0075]


Epoch 05 | Train Loss: 0.0676 | Val AUC: 0.8221
  -> New best model saved with AUC: 0.8221


Epoch 6/10:  54%|█████▍    | 528/980 [02:09<02:06,  3.57it/s, loss=0.0171][23:05:10] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 6/10: 100%|██████████| 980/980 [04:05<00:00,  4.00it/s, loss=0.0200]


Epoch 06 | Train Loss: 0.0669 | Val AUC: 0.8349
  -> New best model saved with AUC: 0.8349


Epoch 7/10:  48%|████▊     | 473/980 [02:01<02:07,  3.97it/s, loss=0.0059][23:09:20] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 7/10: 100%|██████████| 980/980 [04:13<00:00,  3.87it/s, loss=0.0352]


Epoch 07 | Train Loss: 0.0660 | Val AUC: 0.8416
  -> New best model saved with AUC: 0.8416


Epoch 8/10:  85%|████████▌ | 834/980 [03:31<00:37,  3.94it/s, loss=0.0686][23:15:15] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 8/10: 100%|██████████| 980/980 [04:07<00:00,  3.96it/s, loss=0.0156]


Epoch 08 | Train Loss: 0.0642 | Val AUC: 0.8512
  -> New best model saved with AUC: 0.8512


Epoch 9/10:  71%|███████   | 697/980 [02:45<01:06,  4.27it/s, loss=0.1583][23:18:48] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 9/10: 100%|██████████| 980/980 [03:51<00:00,  4.23it/s, loss=0.0090]


Epoch 09 | Train Loss: 0.0634 | Val AUC: 0.8515
  -> New best model saved with AUC: 0.8515


Epoch 10/10:  22%|██▏       | 215/980 [00:50<02:52,  4.43it/s, loss=0.0114][23:20:54] Unusual charge on atom 0 number of radical electrons set to zero
Epoch 10/10: 100%|██████████| 980/980 [03:47<00:00,  4.31it/s, loss=0.0180]


Epoch 10 | Train Loss: 0.0630 | Val AUC: 0.8511
Training finished.
------------------------
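The log shows a checkpoint being written only when the validation AUC beats the best value so far; epochs 2 and 10 are not saved. The source of `run_training` is not shown, so the following is a minimal sketch of that selection rule under the assumption that it works this way. `track_best_epochs` is a hypothetical helper, and the optional `save` callback stands in for the real saving step, which in MindSpore would typically be a call such as `mindspore.save_checkpoint(network, path)`.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def track_best_epochs(
    val_aucs: Sequence[float],
    save: Optional[Callable[[int, float], None]] = None,
) -> Tuple[int, float, List[int]]:
    """Return (best_epoch, best_auc, saved_epochs) for a strict-improvement rule."""
    best_auc = float('-inf')
    best_epoch = 0
    saved: List[int] = []
    for epoch, auc in enumerate(val_aucs, start=1):
        # Only a strictly higher validation AUC replaces the stored checkpoint
        if auc > best_auc:
            best_auc, best_epoch = auc, epoch
            saved.append(epoch)
            if save is not None:
                save(epoch, auc)  # e.g. write the network weights to disk
    return best_epoch, best_auc, saved


# Validation AUCs copied from the training log above
logged_aucs = [0.7873, 0.7621, 0.8011, 0.8205, 0.8221,
               0.8349, 0.8416, 0.8512, 0.8515, 0.8511]
best_epoch, best_auc, saved_epochs = track_best_epochs(logged_aucs)
print(best_epoch, best_auc, saved_epochs)
# -> 9 0.8515 [1, 3, 4, 5, 6, 7, 8, 9]
```

Replaying the logged values reproduces the same "New best model saved" epochs, and the checkpoint kept at the end is the epoch-9 model (validation AUC 0.8515).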


## 5. Evaluate the best model on the test set

In [4]:
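import os
# Prefer the path returned by run_training; otherwise fall back to the default file under args.save_dir
best_model_path = globals().get('best_model_path') or os.path.join(args.save_dir, 'best_model.ckpt')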
run_testing(args, test_data, best_model_path)
print('------------------------')


Starting final testing...
Loaded best model from: mindspore_mrsa_model/best_model.ckpt


Testing: 100%|██████████| 124/124 [00:20<00:00,  6.16it/s]



===== Final Test Results =====
  Test Set AUC:                 0.6448
  Test Set Accuracy:            0.9952
  Test Set Average Precision:   0.0153
Evaluation summary plot saved to mrsa_plot/testset_performance_evaluation_curves.png
------------------------
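The three test metrics measure different things and can disagree sharply when actives are rare, as the gap between accuracy (0.9952) and average precision (0.0153) suggests is the case here. Accuracy depends on a fixed decision threshold (0.5 is assumed below) and can stay close to the fraction of negatives even when actives are ranked poorly, while ROC AUC and average precision (AP) summarize how the scores rank actives against inactives. The pure-Python sketch below computes all three on small made-up data; it is not the notebook's evaluation code, which may instead rely on library functions such as scikit-learn's `roc_auc_score` and `average_precision_score` (an assumption). The AP formula used here agrees with scikit-learn's definition only when scores are untied.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Probability that a random positive outranks a random negative (ties count 0.5)
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Mean of precision@k over the ranks k where a positive appears (assumes untied scores)
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    # Fraction of samples whose thresholded prediction matches the label
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Made-up toy data: 2 actives among 10 molecules, no score reaches 0.5
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
p = [0.30, 0.40, 0.20, 0.10, 0.05, 0.01, 0.02, 0.03, 0.04, 0.06]
print(accuracy(y, p), roc_auc(y, p), round(average_precision(y, p), 4))
# -> 0.8 0.6875 0.4167
```

In the toy case the accuracy of 0.8 equals the share of negatives simply because no molecule crosses the threshold; the same effect at a much lower active rate may explain a high test accuracy sitting alongside a low AP.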
