# DANTE迭代优化系统：科学计算反馈闭环

本notebook展示了一个完整的迭代优化系统，其中：
1. DANTE算法找到最优候选解
2. 将候选解输入科学计算软件（模拟DFT/VASP计算）
3. 根据科学计算结果更新训练数据
4. 重新训练代理神经网络模型
5. 继续DANTE优化过程

这个闭环系统能够持续改进模型精度并发现更优的材料设计。

## 系统架构

```
DANTE优化 → 候选解 → 科学计算验证 → 数据更新 → 模型重训练 → DANTE优化...
    ↑                                                           ↓
    ←←←←←←←←←←←←← 持续迭代优化 ←←←←←←←←←←←←←←←←←←←←←←←←←←←←
```

In [1]:
# 导入必要的库
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
import time
from datetime import datetime
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import tensorflow as tf
from tensorflow import keras
from keras import layers
from keras.callbacks import EarlyStopping, ModelCheckpoint

# 设置随机种子以确保可重现性
np.random.seed(42)
tf.random.set_seed(42)

# 设置可视化样式
plt.style.use('ggplot')
sns.set(style="whitegrid")

print("导入库完成！")
print(f"TensorFlow版本: {tf.__version__}")
print(f"当前时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

2025-05-26 20:35:00.742337: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-26 20:35:00.748098: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-26 20:35:00.754900: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-26 20:35:00.756923: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-26 20:35:00.762295: I tensorflow/core/platform/cpu_feature_guar

导入库完成！
TensorFlow版本: 2.17.0
当前时间: 2025-05-26 20:35:01


## 第一部分：加载初始数据和DANTE框架

In [2]:
# 添加DANTE模块到路径
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "../..")))

# 导入DANTE模块
try:
    from dante.neural_surrogate import SurrogateModel
    from dante.deep_active_learning import DeepActiveLearning
    from dante.obj_functions import ObjectiveFunction
    from dante.tree_exploration import TreeExploration
    from dante.utils import generate_initial_samples, Tracker
    print("成功导入DANTE模块！")
except ImportError as e:
    print(f"导入DANTE模块失败: {e}")
    print("请确保DANTE已正确安装")

# 加载初始数据
data_path = "data.csv"
if os.path.exists(data_path):
    df = pd.read_csv(data_path)
    print(f"成功加载初始数据集，共 {len(df)} 个样本")
else:
    print("警告：数据文件不存在！")
    df = None

成功导入DANTE模块！
成功加载初始数据集，共 621 个样本


In [3]:
# 从之前的notebook中复制成分提取函数
def extract_composition(sid):
    """从材料ID中提取元素成分"""
    elements = ['Co', 'Mo', 'Ti']
    values = []
    
    for element in elements:
        if element in sid:
            pos = sid.find(element) + len(element)
            next_pos = len(sid)
            for next_elem in elements:
                if next_elem != element and sid.find(next_elem, pos) != -1:
                    next_pos = min(next_pos, sid.find(next_elem, pos))
            value = float(sid[pos:next_pos])
            values.append(value)
        else:
            values.append(0.0)
    
    co, mo, ti = values
    fe = 100.0 - co - mo - ti
    return [co, mo, ti, fe]

# 预处理初始数据
if df is not None:
    composition_values = df['sid'].apply(extract_composition)
    X_initial = np.array(composition_values.tolist())
    
    # 提取并标准化目标值
    elastic_values = df['elastic'].values
    yield_values = df['yield'].values
    
    elastic_min, elastic_max = np.min(elastic_values), np.max(elastic_values)
    yield_min, yield_max = np.min(yield_values), np.max(yield_values)
    
    elastic_norm = (elastic_values - elastic_min) / (elastic_max - elastic_min)
    yield_norm = (yield_values - yield_min) / (yield_max - yield_min)
    Y_initial = (elastic_norm + yield_norm) / 2
    
    print(f"初始数据预处理完成：{X_initial.shape[0]} 个样本，{X_initial.shape[1]} 个特征")
    print(f"成分范围检查：Co[{X_initial[:, 0].min():.2f}, {X_initial[:, 0].max():.2f}], "
          f"Mo[{X_initial[:, 1].min():.2f}, {X_initial[:, 1].max():.2f}], "
          f"Ti[{X_initial[:, 2].min():.2f}, {X_initial[:, 2].max():.2f}], "
          f"Fe[{X_initial[:, 3].min():.2f}, {X_initial[:, 3].max():.2f}]")

初始数据预处理完成：621 个样本，4 个特征
成分范围检查：Co[8.50, 11.20], Mo[4.90, 5.50], Ti[0.80, 3.00], Fe[80.30, 85.80]


## 第二部分：科学计算模拟器

在实际应用中，这里会调用真正的科学计算软件（如VASP、LAMMPS等）。为了演示，我们创建一个模拟器来代表昂贵的科学计算过程。

In [4]:
class ScientificComputationSimulator:
    """
    科学计算模拟器
    模拟真实的DFT/MD计算过程，包含噪声和计算时间
    """
    
    def __init__(self, base_data_X, base_data_Y, noise_level=0.05, computation_time=1.0):
        """
        参数:
        - base_data_X, base_data_Y: 基础数据用于插值
        - noise_level: 噪声水平（模拟计算误差）
        - computation_time: 模拟计算时间（秒）
        """
        self.base_X = base_data_X
        self.base_Y = base_data_Y
        self.noise_level = noise_level
        self.computation_time = computation_time
        self.calculation_count = 0
        self.calculation_history = []
        
    def validate_composition(self, composition):
        """验证成分的物理合理性"""
        # 检查成分是否为正数
        if np.any(composition < 0):
            return False, "成分不能为负数"
        
        # 检查成分总和是否接近100%
        total = np.sum(composition)
        if not np.isclose(total, 100.0, rtol=0.05):
            return False, f"成分总和不等于100%: {total:.2f}%"
        
        # 检查是否在合理范围内
        co, mo, ti, fe = composition
        if co > 50 or mo > 30 or ti > 20 or fe < 30:
            return False, "成分超出合理范围"
        
        return True, "成分验证通过"
    
    def compute_properties(self, composition):
        """
        模拟科学计算过程
        返回：弹性模量、屈服强度、计算成功标志
        """
        self.calculation_count += 1
        start_time = time.time()
        
        print(f"[科学计算 #{self.calculation_count}] 开始计算成分: "
              f"Co={composition[0]:.2f}%, Mo={composition[1]:.2f}%, "
              f"Ti={composition[2]:.2f}%, Fe={composition[3]:.2f}%")
        
        # 验证成分
        is_valid, message = self.validate_composition(composition)
        if not is_valid:
            print(f"[科学计算 #{self.calculation_count}] 计算失败: {message}")
            return None, None, False
        
        # 模拟计算时间
        time.sleep(self.computation_time)
        
        # 使用最近邻插值加噪声来模拟真实计算
        distances = np.linalg.norm(self.base_X - composition, axis=1)
        nearest_indices = np.argsort(distances)[:3]  # 使用最近的3个点
        
        # 加权平均（距离越近权重越大）
        weights = 1.0 / (distances[nearest_indices] + 1e-6)
        weights = weights / np.sum(weights)
        
        # 插值得到基础性能值
        base_performance = np.sum(weights * self.base_Y[nearest_indices])
        
        # 添加噪声模拟计算误差
        noise = np.random.normal(0, self.noise_level)
        noisy_performance = base_performance + noise
        
        # 将综合性能转换回弹性模量和屈服强度
        # 这里使用简化的反向映射
        elastic_base = np.mean([elastic_values[i] for i in nearest_indices])
        yield_base = np.mean([yield_values[i] for i in nearest_indices])
        
        # 添加相关性噪声
        elastic_noise = np.random.normal(0, elastic_base * 0.05)
        yield_noise = np.random.normal(0, yield_base * 0.05)
        
        computed_elastic = max(elastic_base + elastic_noise, elastic_base * 0.5)
        computed_yield = max(yield_base + yield_noise, yield_base * 0.5)
        
        computation_time = time.time() - start_time
        
        # 记录计算历史
        calculation_record = {
            'id': self.calculation_count,
            'composition': composition.copy(),
            'elastic_modulus': computed_elastic,
            'yield_strength': computed_yield,
            'performance': noisy_performance,
            'computation_time': computation_time,
            'timestamp': datetime.now().isoformat()
        }
        self.calculation_history.append(calculation_record)
        
        print(f"[科学计算 #{self.calculation_count}] 计算完成 "
              f"(用时: {computation_time:.2f}s)")
        print(f"  弹性模量: {computed_elastic:.2e} Pa")
        print(f"  屈服强度: {computed_yield:.2f} Pa")
        print(f"  综合性能: {noisy_performance:.6f}")
        
        return computed_elastic, computed_yield, True
    
    def get_calculation_summary(self):
        """获取计算总结"""
        if not self.calculation_history:
            return "尚未进行任何计算"
        
        total_time = sum(record['computation_time'] for record in self.calculation_history)
        return f"总计算次数: {self.calculation_count}, 总用时: {total_time:.2f}秒"

# 创建科学计算模拟器
if 'X_initial' in locals() and 'Y_initial' in locals():
    sci_computer = ScientificComputationSimulator(
        X_initial, Y_initial, 
        noise_level=0.03,  # 3% 噪声水平
        computation_time=0.5  # 0.5秒模拟计算时间
    )
    print("科学计算模拟器创建完成！")
    
    # 测试科学计算
    test_composition = X_initial[0]
    test_elastic, test_yield, test_success = sci_computer.compute_properties(test_composition)
    if test_success:
        print("科学计算模拟器测试成功！")

科学计算模拟器创建完成！
[科学计算 #1] 开始计算成分: Co=8.50%, Mo=5.15%, Ti=2.60%, Fe=83.75%
[科学计算 #1] 计算完成 (用时: 0.50s)
  弹性模量: 1.42e+11 Pa
  屈服强度: 1032205215.19 Pa
  综合性能: 0.709443
科学计算模拟器测试成功！
[科学计算 #1] 计算完成 (用时: 0.50s)
  弹性模量: 1.42e+11 Pa
  屈服强度: 1032205215.19 Pa
  综合性能: 0.709443
科学计算模拟器测试成功！


## 第三部分：增强型代理模型

创建一个能够持续学习的代理模型，支持增量训练和模型更新。

In [5]:
class AdaptiveSurrogateModel:
    """
    自适应代理模型
    支持增量学习和模型性能跟踪
    """
    
    def __init__(self, input_dims=4, initial_lr=0.001):
        self.input_dims = input_dims
        self.initial_lr = initial_lr
        self.model = None
        self.scaler = StandardScaler()
        self.is_fitted = False
        self.training_history = []
        self.model_version = 0
        
    def create_model(self, learning_rate=None):
        """创建神经网络模型"""
        if learning_rate is None:
            learning_rate = self.initial_lr
            
        model = keras.Sequential([
            layers.Input(shape=(self.input_dims,)),
            layers.Dense(128, activation='relu'),
            layers.BatchNormalization(),
            layers.Dropout(0.3),
            
            layers.Dense(64, activation='relu'),
            layers.BatchNormalization(),
            layers.Dropout(0.2),
            
            layers.Dense(32, activation='relu'),
            layers.BatchNormalization(),
            layers.Dropout(0.1),
            
            layers.Dense(16, activation='relu'),
            layers.Dense(1, activation='linear')
        ])
        
        model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
            loss='mse',
            metrics=['mae']
        )
        
        return model
    
    def train(self, X, y, epochs=100, validation_split=0.2, verbose=1):
        """训练模型"""
        print(f"\n=== 开始训练代理模型 (版本 {self.model_version + 1}) ===")
        print(f"训练数据: {X.shape[0]} 个样本")
        
        # 标准化特征
        if not self.is_fitted:
            X_scaled = self.scaler.fit_transform(X)
            self.is_fitted = True
        else:
            # 增量更新标准化器
            self.scaler.partial_fit(X)
            X_scaled = self.scaler.transform(X)
        
        # 创建新模型或使用现有模型
        if self.model is None:
            self.model = self.create_model()
        
        # 设置回调
        early_stop = EarlyStopping(
            monitor='val_loss', 
            patience=20, 
            restore_best_weights=True,
            verbose=1
        )
        
        # 训练模型
        history = self.model.fit(
            X_scaled, y,
            epochs=epochs,
            validation_split=validation_split,
            callbacks=[early_stop],
            verbose=verbose
        )
        
        # 记录训练历史
        training_record = {
            'version': self.model_version + 1,
            'data_size': X.shape[0],
            'final_loss': history.history['loss'][-1],
            'final_val_loss': history.history['val_loss'][-1],
            'epochs_trained': len(history.history['loss']),
            'timestamp': datetime.now().isoformat()
        }
        self.training_history.append(training_record)
        self.model_version += 1
        
        print(f"模型训练完成！最终损失: {training_record['final_loss']:.6f}, "
              f"验证损失: {training_record['final_val_loss']:.6f}")
        
        return history
    
    def predict(self, X):
        """预测"""
        if self.model is None or not self.is_fitted:
            raise ValueError("模型尚未训练")
        
        X_scaled = self.scaler.transform(X)
        return self.model.predict(X_scaled, verbose=0)
    
    def evaluate_performance(self, X_test, y_test):
        """评估模型性能"""
        if self.model is None:
            return None
            
        y_pred = self.predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)
        
        performance = {
            'mse': mse,
            'rmse': np.sqrt(mse),
            'r2': r2,
            'model_version': self.model_version
        }
        
        print(f"模型性能评估 (版本 {self.model_version}):")
        print(f"  MSE: {mse:.6f}")
        print(f"  RMSE: {np.sqrt(mse):.6f}")
        print(f"  R²: {r2:.6f}")
        
        return performance
    
    def save_model(self, filepath):
        """保存模型"""
        if self.model is not None:
            self.model.save(f"{filepath}_v{self.model_version}.keras")
            
    def get_training_summary(self):
        """获取训练总结"""
        if not self.training_history:
            return "模型尚未训练"
        
        summary = f"模型训练历史 (当前版本: {self.model_version}):\n"
        for record in self.training_history:
            summary += f"  版本 {record['version']}: {record['data_size']} 样本, "
            summary += f"损失 {record['final_loss']:.6f}, 训练 {record['epochs_trained']} 轮\n"
        
        return summary

# 创建自适应代理模型
adaptive_model = AdaptiveSurrogateModel(input_dims=4)

# 使用初始数据训练模型
if 'X_initial' in locals() and 'Y_initial' in locals():
    print("使用初始数据训练代理模型...")
    initial_history = adaptive_model.train(X_initial, Y_initial, epochs=150, verbose=1)
    
    # 评估初始模型性能
    X_test_init, X_val_init, y_test_init, y_val_init = train_test_split(
        X_initial, Y_initial, test_size=0.2, random_state=42
    )
    initial_performance = adaptive_model.evaluate_performance(X_val_init, y_val_init)

使用初始数据训练代理模型...

=== 开始训练代理模型 (版本 1) ===
训练数据: 621 个样本


I0000 00:00:1748263059.252230   19076 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1748263059.276121   19076 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1748263059.276212   19076 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1748263059.277877   19076 cuda_executor.cc:1015] successful NUMA node read from SysFS ha

Epoch 1/150


I0000 00:00:1748263060.617331   19318 service.cc:146] XLA service 0x7468c8015c20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1748263060.617358   19318 service.cc:154]   StreamExecutor device (0): NVIDIA GeForce RTX 4090, Compute Capability 8.9
2025-05-26 20:37:40.638957: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2025-05-26 20:37:40.734533: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 90501


[1m 7/16[0m [32m━━━━━━━━[0m[37m━━━━━━━━━━━━[0m [1m0s[0m 1ms/step - loss: 1.2799 - mae: 0.884392

I0000 00:00:1748263061.577191   19318 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


[1m16/16[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 77ms/step - loss: 1.1725 - mae: 0.8438 - val_loss: 0.2041 - val_mae: 0.3952
Epoch 2/150
[1m16/16[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 77ms/step - loss: 1.1725 - mae: 0.8438 - val_loss: 0.2041 - val_mae: 0.3952
Epoch 2/150
[1m16/16[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.4811 - mae: 0.5347 - val_loss: 0.1699 - val_mae: 0.3525
Epoch 3/150
[1m16/16[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.4811 - mae: 0.5347 - val_loss: 0.1699 - val_mae: 0.3525
Epoch 3/150
[1m16/16[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.2394 - mae: 0.3804 - val_loss: 0.1401 - val_mae: 0.3117
Epoch 4/150
[1m16/16[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.2394 - mae: 0.3804 - val_loss: 0.1401 - val_mae: 0.3117
Epoch 4/150
[1m16/16[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.2125 - ma

## 第四部分：增强型DANTE优化系统

创建一个能够与科学计算反馈集成的DANTE优化系统。

In [None]:
class IterativeDANTEOptimizer:
    """
    迭代DANTE优化器
    集成科学计算反馈的闭环优化系统
    """
    
    def __init__(self, initial_X, initial_Y, sci_computer, surrogate_model):
        self.X_data = initial_X.copy()
        self.Y_data = initial_Y.copy()
        self.sci_computer = sci_computer
        self.surrogate_model = surrogate_model
        self.optimization_history = []
        self.iteration_count = 0
        
        # 设置搜索边界
        self.bounds_low = np.array([
            self.X_data[:, 0].min() * 0.8,  # Co
            self.X_data[:, 1].min() * 0.8,  # Mo  
            self.X_data[:, 2].min() * 0.8,  # Ti
            self.X_data[:, 3].min() * 0.8   # Fe
        ])
        self.bounds_high = np.array([
            min(self.X_data[:, 0].max() * 1.2, 50),  # Co max 50%
            min(self.X_data[:, 1].max() * 1.2, 30),  # Mo max 30%
            min(self.X_data[:, 2].max() * 1.2, 20),  # Ti max 20%
            self.X_data[:, 3].max() * 1.1             # Fe
        ])
        
        print(f"搜索边界设置：")
        print(f"  Co: [{self.bounds_low[0]:.2f}, {self.bounds_high[0]:.2f}]%")
        print(f"  Mo: [{self.bounds_low[1]:.2f}, {self.bounds_high[1]:.2f}]%")
        print(f"  Ti: [{self.bounds_low[2]:.2f}, {self.bounds_high[2]:.2f}]%")
        print(f"  Fe: [{self.bounds_low[3]:.2f}, {self.bounds_high[3]:.2f}]%")
    
    def enforce_composition_constraint(self, compositions):
        """强制执行成分约束"""
        constrained_compositions = []
        for comp in compositions:
            # 边界约束
            comp_clipped = np.clip(comp, self.bounds_low, self.bounds_high)
            
            # 成分总和约束
            total = np.sum(comp_clipped)
            if not np.isclose(total, 100.0, rtol=0.01):
                comp_clipped = comp_clipped * 100.0 / total
            
            # 再次边界检查
            comp_clipped = np.clip(comp_clipped, self.bounds_low, self.bounds_high)
            constrained_compositions.append(comp_clipped)
        
        return np.array(constrained_compositions)
    
    def optimize_with_dante(self, num_candidates=5, num_iterations=3):
        """使用DANTE寻找候选解"""
        print(f"\n=== DANTE优化 (迭代 {self.iteration_count + 1}) ===")
        print(f"当前数据集大小: {len(self.X_data)} 个样本")
        
        # 创建临时目标函数
        class TempObjectiveFunction:
            def __init__(self, X_data, Y_data, bounds_low, bounds_high):
                self.X_data = X_data
                self.Y_data = Y_data
                self.lb = bounds_low
                self.ub = bounds_high
            
            def __call__(self, x):
                distances = np.linalg.norm(self.X_data - x, axis=1)
                nearest_idx = np.argmin(distances)
                return -self.Y_data[nearest_idx]  # 负值转换为最小化问题
        
        temp_obj_func = TempObjectiveFunction(self.X_data, self.Y_data, 
                                            self.bounds_low, self.bounds_high)
        
        # 使用树搜索生成候选点
        candidates = []
        for i in range(num_iterations):
            # 随机生成候选点
            random_candidates = []
            for _ in range(num_candidates):
                candidate = np.random.uniform(self.bounds_low, self.bounds_high)
                random_candidates.append(candidate)
            
            # 约束处理
            constrained_candidates = self.enforce_composition_constraint(random_candidates)
            
            # 使用代理模型评估候选点
            candidate_scores = []
            for candidate in constrained_candidates:
                try:
                    pred_performance = self.surrogate_model.predict(candidate.reshape(1, -1))[0, 0]
                    candidate_scores.append((candidate, pred_performance))
                except:
                    candidate_scores.append((candidate, 0.0))
            
            # 选择最有希望的候选点
            candidate_scores.sort(key=lambda x: x[1], reverse=True)
            best_candidates = [x[0] for x in candidate_scores[:num_candidates//2]]
            candidates.extend(best_candidates)
        
        # 去重并选择最终候选点
        unique_candidates = []
        for candidate in candidates:
            is_duplicate = False
            for existing in unique_candidates:
                if np.linalg.norm(candidate - existing) < 1.0:  # 1% 差异阈值
                    is_duplicate = True
                    break
            if not is_duplicate:
                unique_candidates.append(candidate)
        
        # 限制候选点数量
        final_candidates = unique_candidates[:num_candidates]
        
        print(f"生成 {len(final_candidates)} 个候选解进行科学计算验证")
        return final_candidates
    
    def run_iteration(self, num_candidates=5):
        """运行一次完整的优化迭代"""
        self.iteration_count += 1
        iteration_start_time = time.time()
        
        print(f"\n{'='*60}")
        print(f"开始优化迭代 #{self.iteration_count}")
        print(f"{'='*60}")
        
        # 步骤1: DANTE优化生成候选解
        candidates = self.optimize_with_dante(num_candidates=num_candidates)
        
        # 步骤2: 科学计算验证
        verified_results = []
        successful_calculations = 0
        
        for i, candidate in enumerate(candidates):
            print(f"\n--- 验证候选解 {i+1}/{len(candidates)} ---")
            elastic, yield_strength, success = self.sci_computer.compute_properties(candidate)
            
            if success:
                # 计算综合性能（与初始数据处理一致）
                elastic_norm = (elastic - elastic_min) / (elastic_max - elastic_min)
                yield_norm = (yield_strength - yield_min) / (yield_max - yield_min)
                performance = (elastic_norm + yield_norm) / 2
                
                verified_results.append({
                    'composition': candidate,
                    'elastic': elastic,
                    'yield': yield_strength,
                    'performance': performance,
                    'predicted_performance': self.surrogate_model.predict(candidate.reshape(1, -1))[0, 0]
                })
                successful_calculations += 1
        
        # 步骤3: 更新数据集
        if verified_results:
            new_X = np.array([result['composition'] for result in verified_results])
            new_Y = np.array([result['performance'] for result in verified_results])
            
            self.X_data = np.vstack([self.X_data, new_X])
            self.Y_data = np.hstack([self.Y_data, new_Y])
            
            print(f"\n数据集更新: 添加 {len(verified_results)} 个新样本")
            print(f"数据集总大小: {len(self.X_data)} 个样本")
            
            # 步骤4: 重新训练代理模型
            print(f"\n重新训练代理模型...")
            self.surrogate_model.train(self.X_data, self.Y_data, epochs=100, verbose=0)
        
        # 记录迭代历史
        iteration_time = time.time() - iteration_start_time
        best_idx = np.argmax(self.Y_data)
        
        iteration_record = {
            'iteration': self.iteration_count,
            'candidates_generated': len(candidates),
            'successful_calculations': successful_calculations,
            'new_samples_added': len(verified_results),
            'total_samples': len(self.X_data),
            'best_performance': self.Y_data[best_idx],
            'best_composition': self.X_data[best_idx].copy(),
            'iteration_time': iteration_time,
            'timestamp': datetime.now().isoformat()
        }
        
        self.optimization_history.append(iteration_record)
        
        print(f"\n=== 迭代 #{self.iteration_count} 完成 ===")
        print(f"用时: {iteration_time:.2f} 秒")
        print(f"成功计算: {successful_calculations}/{len(candidates)}")
        print(f"当前最佳性能: {iteration_record['best_performance']:.6f}")
        print(f"最佳成分: Co={iteration_record['best_composition'][0]:.2f}%, "
              f"Mo={iteration_record['best_composition'][1]:.2f}%, "
              f"Ti={iteration_record['best_composition'][2]:.2f}%, "
              f"Fe={iteration_record['best_composition'][3]:.2f}%")
        
        return iteration_record
    
    def get_current_best(self):
        """获取当前最佳结果"""
        best_idx = np.argmax(self.Y_data)
        return {
            'composition': self.X_data[best_idx],
            'performance': self.Y_data[best_idx],
            'index': best_idx
        }
    
    def plot_optimization_progress(self):
        """绘制优化进度"""
        if not self.optimization_history:
            print("尚未进行优化迭代")
            return
        
        fig, axes = plt.subplots(2, 2, figsize=(15, 10))
        
        iterations = [record['iteration'] for record in self.optimization_history]
        best_performances = [record['best_performance'] for record in self.optimization_history]
        total_samples = [record['total_samples'] for record in self.optimization_history]
        successful_calcs = [record['successful_calculations'] for record in self.optimization_history]
        
        # 最佳性能进展
        axes[0, 0].plot(iterations, best_performances, 'o-', linewidth=2, markersize=8)
        axes[0, 0].set_xlabel('迭代次数')
        axes[0, 0].set_ylabel('最佳性能')
        axes[0, 0].set_title('优化进展')
        axes[0, 0].grid(True, alpha=0.3)
        
        # 数据集增长
        axes[0, 1].plot(iterations, total_samples, 's-', linewidth=2, markersize=8, color='green')
        axes[0, 1].set_xlabel('迭代次数')
        axes[0, 1].set_ylabel('总样本数')
        axes[0, 1].set_title('数据集增长')
        axes[0, 1].grid(True, alpha=0.3)
        
        # 成功计算统计
        axes[1, 0].bar(iterations, successful_calcs, alpha=0.7, color='orange')
        axes[1, 0].set_xlabel('迭代次数')
        axes[1, 0].set_ylabel('成功计算数')
        axes[1, 0].set_title('科学计算成功率')
        axes[1, 0].grid(True, alpha=0.3)
        
        # 最佳成分演化
        colors = ['blue', 'green', 'orange', 'red']
        elements = ['Co', 'Mo', 'Ti', 'Fe']
        
        for i, element in enumerate(elements):
            compositions = [record['best_composition'][i] for record in self.optimization_history]
            axes[1, 1].plot(iterations, compositions, 'o-', label=element, 
                          color=colors[i], linewidth=2, markersize=6)
        
        axes[1, 1].set_xlabel('迭代次数')
        axes[1, 1].set_ylabel('成分 (%)')
        axes[1, 1].set_title('最佳成分演化')
        axes[1, 1].legend()
        axes[1, 1].grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.savefig('iterative_optimization_progress.png', dpi=300, bbox_inches='tight')
        plt.show()

# 创建迭代优化器
if all(var in locals() for var in ['X_initial', 'Y_initial', 'sci_computer', 'adaptive_model']):
    dante_optimizer = IterativeDANTEOptimizer(
        X_initial, Y_initial, sci_computer, adaptive_model
    )
    print("迭代DANTE优化器创建完成！")

## 第五部分：运行迭代优化

现在运行完整的迭代优化过程，观察系统如何通过科学计算反馈持续改进。

In [6]:
# 运行多轮迭代优化
num_iterations = 5  # 可以根据需要调整
candidates_per_iteration = 3  # 每轮生成的候选数量

print("开始迭代优化过程...")
print(f"计划进行 {num_iterations} 轮迭代，每轮 {candidates_per_iteration} 个候选解")

# 记录初始状态
initial_best = dante_optimizer.get_current_best()
print(f"\n初始最佳性能: {initial_best['performance']:.6f}")
print(f"初始最佳成分: Co={initial_best['composition'][0]:.2f}%, "
      f"Mo={initial_best['composition'][1]:.2f}%, "
      f"Ti={initial_best['composition'][2]:.2f}%, "
      f"Fe={initial_best['composition'][3]:.2f}%")

# 执行迭代
try:
    for iteration in range(num_iterations):
        print(f"\n" + "="*80)
        print(f"执行迭代 {iteration + 1}/{num_iterations}")
        print(f"="*80)
        
        # 运行一次完整迭代
        iteration_result = dante_optimizer.run_iteration(
            num_candidates=candidates_per_iteration
        )
        
        # 显示进展
        if iteration > 0:
            prev_best = dante_optimizer.optimization_history[iteration-1]['best_performance']
            current_best = iteration_result['best_performance']
            improvement = current_best - prev_best
            print(f"性能改进: {improvement:+.6f}")
        
        # 中途保存（可选）
        if (iteration + 1) % 2 == 0:
            adaptive_model.save_model(f"adaptive_model_iter_{iteration + 1}")
            print(f"已保存中间模型")

except KeyboardInterrupt:
    print("\n优化过程被用户中断")
except Exception as e:
    print(f"\n优化过程出现错误: {e}")
    import traceback
    traceback.print_exc()

# 显示最终结果
final_best = dante_optimizer.get_current_best()
print(f"\n" + "="*80)
print("迭代优化完成！")
print(f"="*80)
print(f"最终最佳性能: {final_best['performance']:.6f}")
print(f"最终最佳成分: Co={final_best['composition'][0]:.2f}%, "
      f"Mo={final_best['composition'][1]:.2f}%, "
      f"Ti={final_best['composition'][2]:.2f}%, "
      f"Fe={final_best['composition'][3]:.2f}%")

if 'initial_best' in locals():
    total_improvement = final_best['performance'] - initial_best['performance']
    print(f"总体改进: {total_improvement:+.6f} ({total_improvement/initial_best['performance']*100:+.2f}%)")

print(f"\n总计算次数: {sci_computer.calculation_count}")
print(f"数据集最终大小: {len(dante_optimizer.X_data)} 个样本")
print(f"模型当前版本: {adaptive_model.model_version}")

开始迭代优化过程...
计划进行 5 轮迭代，每轮 3 个候选解


NameError: name 'dante_optimizer' is not defined

## 第六部分：结果分析与可视化

In [None]:
# 绘制优化进度
dante_optimizer.plot_optimization_progress()

# 详细分析模型改进
print("\n" + "="*60)
print("模型训练历史分析")
print("="*60)
print(adaptive_model.get_training_summary())

print("\n" + "="*60)
print("科学计算统计")
print("="*60)
print(sci_computer.get_calculation_summary())

# 分析预测精度改进
if dante_optimizer.optimization_history:
    print("\n" + "="*60)
    print("预测精度分析")
    print("="*60)
    
    # 计算每次迭代后的预测误差
    prediction_errors = []
    for i, record in enumerate(dante_optimizer.optimization_history):
        if i > 0:  # 从第二次迭代开始有新数据
            # 使用当前迭代前的数据评估模型
            iteration_data_size = record['total_samples'] - record['new_samples_added']
            print(f"迭代 {record['iteration']}: 数据量 {iteration_data_size} -> {record['total_samples']}")

# 创建综合结果可视化
fig, axes = plt.subplots(2, 3, figsize=(20, 12))

# 1. 成分空间探索
ax = axes[0, 0]
scatter = ax.scatter(dante_optimizer.X_data[:len(X_initial), 0], 
                    dante_optimizer.X_data[:len(X_initial), 1], 
                    c=dante_optimizer.Y_data[:len(X_initial)], 
                    cmap='viridis', alpha=0.6, s=50, label='初始数据')

if len(dante_optimizer.X_data) > len(X_initial):
    ax.scatter(dante_optimizer.X_data[len(X_initial):, 0], 
              dante_optimizer.X_data[len(X_initial):, 1], 
              c=dante_optimizer.Y_data[len(X_initial):], 
              cmap='plasma', alpha=0.8, s=100, marker='^', label='新发现点')

ax.scatter(final_best['composition'][0], final_best['composition'][1], 
          color='red', s=200, marker='*', label='最佳解')
ax.set_xlabel('Co 含量 (%)')
ax.set_ylabel('Mo 含量 (%)')
ax.set_title('Co-Mo 成分空间探索')
ax.legend()
ax.grid(True, alpha=0.3)

# 2. Ti-Fe 成分空间
ax = axes[0, 1]
scatter = ax.scatter(dante_optimizer.X_data[:len(X_initial), 2], 
                    dante_optimizer.X_data[:len(X_initial), 3], 
                    c=dante_optimizer.Y_data[:len(X_initial)], 
                    cmap='viridis', alpha=0.6, s=50)

if len(dante_optimizer.X_data) > len(X_initial):
    ax.scatter(dante_optimizer.X_data[len(X_initial):, 2], 
              dante_optimizer.X_data[len(X_initial):, 3], 
              c=dante_optimizer.Y_data[len(X_initial):], 
              cmap='plasma', alpha=0.8, s=100, marker='^')

ax.scatter(final_best['composition'][2], final_best['composition'][3], 
          color='red', s=200, marker='*')
ax.set_xlabel('Ti 含量 (%)')
ax.set_ylabel('Fe 含量 (%)')
ax.set_title('Ti-Fe 成分空间探索')
ax.grid(True, alpha=0.3)

# 3. 性能分布对比
ax = axes[0, 2]
ax.hist(Y_initial, bins=20, alpha=0.6, label='初始数据', color='blue')
if len(dante_optimizer.Y_data) > len(Y_initial):
    ax.hist(dante_optimizer.Y_data[len(Y_initial):], bins=10, alpha=0.8, 
           label='新发现点', color='orange')
ax.axvline(final_best['performance'], color='red', linestyle='--', 
          linewidth=2, label='最佳性能')
ax.set_xlabel('性能值')
ax.set_ylabel('频次')
ax.set_title('性能分布对比')
ax.legend()
ax.grid(True, alpha=0.3)

# 4. 模型预测 vs 真实值
if len(dante_optimizer.X_data) > len(X_initial):
    ax = axes[1, 0]
    new_X = dante_optimizer.X_data[len(X_initial):]
    new_Y_true = dante_optimizer.Y_data[len(X_initial):]
    new_Y_pred = adaptive_model.predict(new_X).flatten()
    
    ax.scatter(new_Y_true, new_Y_pred, alpha=0.7, s=80)
    min_val, max_val = min(new_Y_true.min(), new_Y_pred.min()), max(new_Y_true.max(), new_Y_pred.max())
    ax.plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2)
    ax.set_xlabel('真实性能')
    ax.set_ylabel('预测性能')
    ax.set_title('模型预测精度 (新数据)')
    ax.grid(True, alpha=0.3)
    
    # 计算预测精度
    r2 = r2_score(new_Y_true, new_Y_pred)
    ax.text(0.05, 0.95, f'R² = {r2:.3f}', transform=ax.transAxes, 
           bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

# 5. 科学计算历史
ax = axes[1, 1]
calc_history = sci_computer.calculation_history
if calc_history:
    calc_performances = [record['performance'] for record in calc_history]
    calc_numbers = list(range(1, len(calc_performances) + 1))
    ax.plot(calc_numbers, calc_performances, 'o-', linewidth=2, markersize=6)
    ax.set_xlabel('计算序号')
    ax.set_ylabel('计算得到的性能')
    ax.set_title('科学计算历史')
    ax.grid(True, alpha=0.3)

# 6. 最佳成分雷达图
ax = axes[1, 2]
ax.remove()
ax = fig.add_subplot(2, 3, 6, projection='polar')

elements = ['Co', 'Mo', 'Ti', 'Fe']
angles = np.linspace(0, 2 * np.pi, len(elements), endpoint=False).tolist()
angles += angles[:1]

# 初始最佳成分
initial_comp = initial_best['composition'].tolist() + [initial_best['composition'][0]]
final_comp = final_best['composition'].tolist() + [final_best['composition'][0]]

ax.plot(angles, initial_comp, 'o-', linewidth=2, label='初始最佳', alpha=0.7)
ax.plot(angles, final_comp, 'o-', linewidth=2, label='最终最佳', alpha=0.7)
ax.fill(angles, final_comp, alpha=0.2)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(elements)
ax.set_title('最佳成分对比', pad=20)
ax.legend(loc='upper right', bbox_to_anchor=(1.2, 1.0))

plt.tight_layout()
plt.savefig('iterative_optimization_comprehensive_results.png', dpi=300, bbox_inches='tight')
plt.show()

## 第七部分：系统总结与分析

In [None]:
# 生成详细的优化报告
print("="*80)
print("DANTE 迭代优化系统 - 最终报告")
print("="*80)

# 基本统计
print(f"\n【优化统计】")
print(f"初始数据集大小: {len(X_initial)} 个样本")
print(f"最终数据集大小: {len(dante_optimizer.X_data)} 个样本")
print(f"新增样本数量: {len(dante_optimizer.X_data) - len(X_initial)}")
print(f"总迭代次数: {dante_optimizer.iteration_count}")
print(f"总科学计算次数: {sci_computer.calculation_count}")
print(f"模型版本数: {adaptive_model.model_version}")

# 性能改进
if 'initial_best' in locals() and 'final_best' in locals():
    performance_improvement = final_best['performance'] - initial_best['performance']
    relative_improvement = (performance_improvement / initial_best['performance']) * 100
    
    print(f"\n【性能改进】")
    print(f"初始最佳性能: {initial_best['performance']:.6f}")
    print(f"最终最佳性能: {final_best['performance']:.6f}")
    print(f"绝对改进: {performance_improvement:+.6f}")
    print(f"相对改进: {relative_improvement:+.2f}%")

# 成分变化分析
if 'initial_best' in locals() and 'final_best' in locals():
    print(f"\n【最佳成分变化】")
    elements = ['Co', 'Mo', 'Ti', 'Fe']
    for i, element in enumerate(elements):
        initial_val = initial_best['composition'][i]
        final_val = final_best['composition'][i]
        change = final_val - initial_val
        print(f"{element}: {initial_val:.2f}% → {final_val:.2f}% (变化: {change:+.2f}%)")

# 计算效率分析
if dante_optimizer.optimization_history:
    total_time = sum(record['iteration_time'] for record in dante_optimizer.optimization_history)
    avg_time_per_iteration = total_time / len(dante_optimizer.optimization_history)
    avg_time_per_calculation = total_time / sci_computer.calculation_count if sci_computer.calculation_count > 0 else 0
    
    print(f"\n【计算效率】")
    print(f"总优化时间: {total_time:.2f} 秒")
    print(f"平均每轮迭代时间: {avg_time_per_iteration:.2f} 秒")
    print(f"平均每次科学计算时间: {avg_time_per_calculation:.2f} 秒")

# 模型精度分析
if len(dante_optimizer.X_data) > len(X_initial):
    new_data_X = dante_optimizer.X_data[len(X_initial):]
    new_data_Y = dante_optimizer.Y_data[len(X_initial):]
    
    if len(new_data_X) > 0:
        predictions = adaptive_model.predict(new_data_X).flatten()
        mse = mean_squared_error(new_data_Y, predictions)
        r2 = r2_score(new_data_Y, predictions)
        
        print(f"\n【模型精度 (新数据)】")
        print(f"均方误差 (MSE): {mse:.6f}")
        print(f"决定系数 (R²): {r2:.3f}")
        print(f"预测样本数: {len(new_data_X)}")

# 科学计算成功率
if sci_computer.calculation_history:
    total_attempts = len(dante_optimizer.optimization_history) * candidates_per_iteration if 'candidates_per_iteration' in locals() else sci_computer.calculation_count
    success_rate = (sci_computer.calculation_count / total_attempts) * 100 if total_attempts > 0 else 0
    
    print(f"\n【科学计算统计】")
    print(f"计算成功率: {success_rate:.1f}%")
    print(f"平均计算时间: {np.mean([r['computation_time'] for r in sci_computer.calculation_history]):.2f} 秒")

# 保存最终结果
final_results = {
    'optimization_summary': {
        'initial_dataset_size': len(X_initial),
        'final_dataset_size': len(dante_optimizer.X_data),
        'total_iterations': dante_optimizer.iteration_count,
        'total_calculations': sci_computer.calculation_count,
        'model_versions': adaptive_model.model_version
    },
    'performance_results': {
        'initial_best_performance': initial_best['performance'] if 'initial_best' in locals() else None,
        'final_best_performance': final_best['performance'],
        'improvement': performance_improvement if 'performance_improvement' in locals() else None,
        'relative_improvement_percent': relative_improvement if 'relative_improvement' in locals() else None
    },
    'best_composition': {
        'Co': final_best['composition'][0],
        'Mo': final_best['composition'][1], 
        'Ti': final_best['composition'][2],
        'Fe': final_best['composition'][3]
    },
    'optimization_history': dante_optimizer.optimization_history,
    'timestamp': datetime.now().isoformat()
}

# 保存结果到文件
with open('iterative_optimization_results.json', 'w', encoding='utf-8') as f:
    json.dump(final_results, f, indent=2, ensure_ascii=False, default=str)

print(f"\n【保存结果】")
print(f"详细结果已保存到: iterative_optimization_results.json")
print(f"图表已保存到: iterative_optimization_comprehensive_results.png")
print(f"优化进度图已保存到: iterative_optimization_progress.png")

# 保存最终模型
adaptive_model.save_model("final_adaptive_model")
print(f"最终模型已保存: final_adaptive_model_v{adaptive_model.model_version}.keras")

print(f"\n{'='*80}")
print("迭代优化系统运行完成！")
print(f"{'='*80}")

## 总结与展望

### 🎯 系统特点

本notebook实现了一个完整的**科学计算反馈闭环优化系统**，具有以下特点：

1. **智能候选生成**: DANTE算法基于代理模型智能生成最有希望的候选解
2. **科学计算验证**: 通过模拟的科学计算软件验证候选解的真实性能
3. **增量学习**: 代理模型能够持续学习新的实验数据，不断提升预测精度
4. **约束处理**: 自动处理材料科学的物理约束（成分总和100%等）
5. **进度追踪**: 完整记录优化过程，便于分析和调试

### 🔬 实际应用

在真实的材料设计场景中，本系统可以：

- 将**VASP/Quantum ESPRESSO**等DFT软件作为科学计算引擎
- 连接**LAMMPS/GROMACS**等分子动力学模拟软件
- 集成**实验设备**进行真实材料合成和测试
- 对接**高通量计算平台**进行大规模并行优化

### 🚀 优势分析

相比传统优化方法，该系统具有：

1. **效率提升**: 代理模型大幅减少昂贵的科学计算需求
2. **精度改进**: 持续学习机制确保模型精度不断提升
3. **智能探索**: 平衡exploitation和exploration，避免局部最优
4. **可解释性**: 提供完整的优化路径和决策依据

### 📈 扩展方向

未来可以进一步扩展：

1. **多目标优化**: 同时优化多个性能指标（强度、韧性、耐腐蚀性等）
2. **不确定性量化**: 引入贝叶斯方法处理实验噪声和模型不确定性
3. **主动学习策略**: 更智能的样本选择策略（期望改进、置信区间等）
4. **分布式计算**: 支持多节点并行计算和优化
5. **实时监控**: Web界面实时监控优化进展

这个系统展示了**人工智能**与**材料科学**深度融合的巨大潜力，为加速新材料发现提供了强大的工具！