# Reinforcement Learning in Material Synthesis Optimization

This simulation combines materials science and reinforcement learning (RL). It provides an interactive environment for exploring how RL can tune the conditions under which a material is synthesized, so that properties such as hardness and conductivity approach chosen target values.

## Understanding RL in This Context

Reinforcement Learning (RL) is a branch of machine learning where an agent learns to make decisions by interacting with an environment to achieve a goal. In the context of material synthesis, the RL agent's objective is to discover the optimal set of synthesis parameters that result in materials with desired properties.

### The RL Framework

- **Agent**: In our simulation, the RL agent represents the decision-making system that iterates over various synthesis conditions.
- **Environment**: The environment is the simulated world of material synthesis, providing feedback on the synthesized material's properties based on the agent's actions.
- **States**: Each state represents a specific set of material properties and synthesis conditions at a given time.
- **Actions**: Actions are the possible adjustments the agent can make to the synthesis parameters.
- **Rewards**: Rewards are given based on how close the synthesized material's properties are to the desired targets.
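
The five roles above fit together in a short interaction loop. The sketch below is a toy illustration rather than the simulation's own code: `ToySynthesisEnv`, `greedy_agent`, the 5-degree temperature step, and the target of 50 are all assumptions chosen for the example, and the hand-written rule stands in for a learned policy.

```python
class ToySynthesisEnv:
    """Environment: one synthesis parameter (temperature) and a target value."""

    def __init__(self, target=50.0, start=25.0):
        self.target = target
        self.start = start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        # Actions: 0 = no change, 1 = raise temperature, 2 = lower temperature.
        self.state += {0: 0.0, 1: 5.0, 2: -5.0}[action]
        reward = -abs(self.state - self.target)  # closer to target = higher reward
        done = abs(self.state - self.target) < 1.0
        return self.state, reward, done


def greedy_agent(state, target):
    """Agent: a hand-written rule standing in for a learned policy."""
    if state < target:
        return 1
    if state > target:
        return 2
    return 0


env = ToySynthesisEnv()
state = env.reset()
trajectory = []
for _ in range(20):  # cap the episode length
    action = greedy_agent(state, env.target)
    state, reward, done = env.step(action)
    trajectory.append((action, state, reward))
    if done:
        break

for action, state, reward in trajectory:
    print(f"action={action} state={state:.1f} reward={reward:.1f}")
```

Each pass through the loop is one state, action, reward cycle; an RL algorithm would replace `greedy_agent` with a policy that is improved from the observed rewards.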

### Core Features

- **Interactive Controls**: Allows you to set desired material properties, offering a hands-on experience with the optimization process.
- **Real-Time Visualization**: Utilizes Plotly for dynamic visualization of the agent's decision-making and the evolving material properties.
- **Step-by-Step Engagement**: Offers a granular view of the RL optimization process, demonstrating the adjustments made towards achieving the synthesis goals.

## Getting Started

To dive into this simulation:

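1. Ensure Python is installed.
2. Install the required libraries: `pip install streamlit plotly numpy pandas matplotlib seaborn`.
3. Run the simulation: `streamlit run material_synthesis_demo.py` (Matplotlib version). The Plotly version is `material_synthesis.py`, and the step-by-step version is `material.py`.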

Streamlit then opens the application in your browser.
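
Before launching, it can help to confirm that the libraries are importable. The following is a minimal sketch using only the standard library; the `REQUIRED` list is an assumption based on the imports that appear in the scripts, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed list, taken from the imports used by the demo scripts.
REQUIRED = ["streamlit", "numpy", "plotly", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```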

## The "Almost Real-Time" Experience

The simulation advances in discrete steps rather than streaming continuously, which is why it is described as "almost real-time". Because each action and its outcome stay on screen until the next step is requested, materials researchers can inspect the iterative nature of RL one decision at a time.


In [None]:
# gymnasium is installed as a dependency of stable-baselines3 (see the install cell below)
import gymnasium as gym
from gymnasium import spaces
import numpy as np

class MaterialSynthesisEnv(gym.Env):
    """
    Custom environment for material synthesis optimization using RL.
    Follows the Gymnasium interface used by stable-baselines3 2.x.
    """
    metadata = {'render_modes': ['console']}

    def __init__(self, target_properties, max_steps=200):
        super().__init__()

        # Actions: 0: no change, 1: increase temp, 2: decrease temp, 3: increase concentration
        self.action_space = spaces.Discrete(4)
        # Observations: [temperature, concentration], each bounded to [0, 100]
        self.observation_space = spaces.Box(low=0.0, high=100.0, shape=(2,), dtype=np.float32)

        self.target_properties = np.asarray(target_properties, dtype=np.float32)
        self.max_steps = max_steps  # Truncate episodes that never reach the target
        self.current_state = None
        self.steps_taken = 0

    def reset(self, seed=None, options=None):
        """
        Reset the environment to its initial state.
        :return: (np.ndarray, dict) initial observation and an empty info dict
        """
        super().reset(seed=seed)
        self.current_state = np.array([25.0, 1.0], dtype=np.float32)  # Example initial condition
        self.steps_taken = 0
        return self.current_state.copy(), {}

    def step(self, action):
        if action == 1:
            self.current_state[0] += 5  # Increase temp
        elif action == 2:
            self.current_state[0] -= 5  # Decrease temp
        elif action == 3:
            self.current_state[1] += 0.1  # Increase concentration

        # Keep the state inside the declared observation bounds
        self.current_state = np.clip(self.current_state, self.observation_space.low, self.observation_space.high)
        self.steps_taken += 1

        # Implement your simulation or experimental data feedback here
        # For simplicity, use a dummy reward: the negative distance to the target
        distance = float(np.linalg.norm(self.current_state - self.target_properties))
        reward = -distance
        terminated = distance < 1
        truncated = self.steps_taken >= self.max_steps

        # Return a copy so callers cannot mutate the internal state
        return self.current_state.copy(), reward, terminated, truncated, {}

    def render(self, mode='console'):
        if mode != 'console':
            raise NotImplementedError()
        # Render the environment to the console
        print(f"Current State: {self.current_state}")

    def close(self):
        pass
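
The dummy reward is the negative Euclidean distance between the current state and the target. As a worked example under the values used in this notebook (initial state `[25.0, 1.0]`, target `[50.0, 0.5]`), the sketch below repeats the "increase temp" action and prints the reward at each step. It uses only the standard library, and `distance` is a hypothetical helper that mirrors the norm computed in `step`.

```python
import math

target = (50.0, 0.5)  # example target used in the training cell
state = [25.0, 1.0]   # initial state set by reset()


def distance(state, target):
    """Euclidean distance, the quantity the dummy reward negates."""
    return math.dist(state, target)


steps = []
for step in range(1, 11):
    state[0] += 5.0  # action 1: increase temperature by 5
    d = distance(state, target)
    steps.append((step, state[0], -d, d < 1.0))
    if d < 1.0:  # same termination rule as step()
        break

for step, temp, reward, done in steps:
    print(f"step {step}: temp={temp:.0f} reward={reward:.3f} done={done}")
```

With these numbers the episode ends after five temperature increases, because the remaining concentration gap of 0.5 is already inside the termination radius of 1.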


In [None]:
!pip install stable-baselines3[extra]


Collecting stable-baselines3[extra]
  Downloading stable_baselines3-2.3.0-py3-none-any.whl (182 kB)
Collecting gymnasium<0.30,>=0.28.1 (from stable-baselines3[extra])
  Downloading gymnasium-0.29.1-py3-none-any.whl (953 kB)
Collecting shimmy[atari]~=1.3.0 (from stable-baselines3[extra])
  Downloading Shimmy-1.3.0-py3-none-any.whl (37 kB)
Collecting autorom[accept-rom-license]~=0.6.1 (from stable-baselines3[extra])
  Downloading AutoROM-0.6.1-py3-none-any.whl (9.4 kB)
Collecting AutoROM.accept-rom-license (from autorom[accept-rom-license]~=0.6.1->stable-baselines3[extra])
  Downloading AutoROM.accept-rom-license-0.6.1.tar.gz (434 kB)

In [None]:
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Assuming the environment MaterialSynthesisEnv is already defined as per the previous code snippet.

# Define the desired target properties
target_properties = np.array([50.0, 0.5])  # Example target: [temperature, concentration]

# Wrap the environment in a vectorized environment (a single copy here).
# The factory builds a fresh instance per copy, so raising n_envs would not share state between copies.
env = make_vec_env(lambda: MaterialSynthesisEnv(target_properties=target_properties), n_envs=1)

# Instantiate the model with Multi-Layer Perceptron policy
model = PPO("MlpPolicy", env, verbose=1)

# Train the model
model.learn(total_timesteps=10000)

# Save the trained model for later use
model.save("ppo_material_synthesis_optimization")



  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")


Using cpu device


  deprecation(


-----------------------------
| time/              |      |
|    fps             | 1274 |
|    iterations      | 1    |
|    time_elapsed    | 1    |
|    total_timesteps | 2048 |
-----------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 966          |
|    iterations           | 2            |
|    time_elapsed         | 4            |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 0.0025590463 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.39        |
|    explained_variance   | -4.99e-05    |
|    learning_rate        | 0.0003       |
|    loss                 | 3.04e+06     |
|    n_updates            | 10           |
|    policy_gradient_loss | -0.00129     |
|    value_loss           | 6.16e+06     |
------------------------------------------
----------------

In [None]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.33.0-py2.py3-none-any.whl (8.1 MB)
Collecting gitpython!=3.1.19,<4,>=3.0.7 (from streamlit)
  Downloading GitPython-3.1.43-py3-none-any.whl (207 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.8.1b0-py2.py3-none-any.whl (4.8 MB)
Collecting watchdog>=2.1.5 (from streamlit)
  Downloading watchdog-4.0.0-py3-none-manylinux2014_x86_64.whl (82 kB)
Collecting gitdb<5,>=4.0.1 (from gitpython!=3.1.19,<4,>=3.0.7->streamlit)
  Downloading gitdb-4

In [None]:
!pip install streamlit matplotlib seaborn



In [None]:
%%writefile material_synthesis_demo.py
import streamlit as st
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Enhanced mock function to simulate model prediction and provide a history of adjustments
def simulate_optimization(target_properties, steps=10):
    history = []
    conditions = np.array([20.0, 20.0])  # Initial conditions

    for _ in range(steps):
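        # Simulate adjustment towards target properties: close a random positive
        # fraction of the remaining gap, so each step moves closer on average
        adjustment = np.abs(np.random.normal(0, 2, size=2))
        conditions += adjustment * (target_properties - conditions) / 10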
        history.append(conditions.copy())

    return conditions, np.array(history)

# Plotting function
def plot_optimization_history(history, target_properties):
    fig, axs = plt.subplots(2, 1, figsize=(10, 8))
    time = np.arange(history.shape[0])

    for i, label in enumerate(["Hardness", "Conductivity"]):
        axs[i].plot(time, history[:, i], label="Optimized Value")
        # Matplotlib axes draw a horizontal reference line with axhline (there is no Axes.hline)
        axs[i].axhline(target_properties[i], color='r', linestyle='--', label="Target Value")
        axs[i].set_ylabel(label)
        axs[i].legend()
        axs[i].grid(True)

    plt.xlabel("Optimization Step")
    st.pyplot(fig)

# Streamlit UI Enhancement
st.title('Advanced Material Synthesis Optimization Demo')

st.markdown("""
This interactive demo simulates the optimization of synthesis conditions for materials using reinforcement learning.
Adjust the sliders to set your target material properties, and watch the algorithm optimize towards these goals.
""")

# Input sliders for target properties
target_hardness = st.slider('Target Hardness', min_value=0.0, max_value=100.0, value=50.0, step=0.5)
target_conductivity = st.slider('Target Conductivity', min_value=0.0, max_value=100.0, value=50.0, step=0.5)

target_properties = np.array([target_hardness, target_conductivity])

# Button to perform optimization
if st.button('Optimize Synthesis Conditions'):
    optimal_conditions, history = simulate_optimization(target_properties)

    st.write("### Optimal Conditions:")
    st.write(f"- Hardness: {optimal_conditions[0]:.2f}")
    st.write(f"- Conductivity: {optimal_conditions[1]:.2f}")

    st.write("### Optimization Process Visualization")
    plot_optimization_history(history, target_properties)

# Additional Information
st.markdown("""
## How It Works
- The algorithm simulates an RL agent's decision-making process to adjust synthesis conditions towards the target properties.
- The optimization process is visualized to show the dynamic adjustments and convergence towards the target conditions.
""")


Writing material_synthesis_demo.py
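
The update in `simulate_optimization` moves each condition by a fraction of its remaining gap to the target. As a simplified illustration, assume that fraction is a fixed positive number `f` instead of a random draw: the gap after `n` steps is then `(1 - f)^n` times the initial gap. The sketch below checks this numerically; `remaining_gap` and the fraction of 0.2 are assumptions chosen for the example.

```python
def remaining_gap(initial, target, fraction, steps):
    """Apply `steps` updates of the form x += fraction * (target - x)."""
    x = initial
    gaps = []
    for _ in range(steps):
        x += fraction * (target - x)
        gaps.append(abs(target - x))
    return gaps


# Assumed values: start at 20 (as in the demo), target 50, fixed fraction 0.2.
gaps = remaining_gap(initial=20.0, target=50.0, fraction=0.2, steps=10)
for n, gap in enumerate(gaps, start=1):
    print(f"after step {n}: gap = {gap:.3f}")
```

Ten steps at a fraction of 0.2 leave about 11% of the original gap, which is why a short run can end visibly short of the target.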


In [None]:
%%writefile material_synthesis.py
import streamlit as st
import numpy as np
import plotly.graph_objects as go

# Enhanced mock function to simulate model prediction and provide a history of adjustments
def simulate_optimization(target_properties, steps=10):
    history = []
    conditions = np.array([20.0, 20.0])  # Initial conditions

    for _ in range(steps):
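        # Simulate adjustment towards target properties: close a random positive
        # fraction of the remaining gap, so each step moves closer on average
        adjustment = np.abs(np.random.normal(0, 2, size=2))
        conditions += adjustment * (target_properties - conditions) / 10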
        history.append(conditions.copy())

    return conditions, np.array(history)

# Plotting function using Plotly
def plot_optimization_history(history, target_properties):
    steps = list(range(history.shape[0]))
    fig = go.Figure()

    # Adding Hardness trace; the target line is a shape, so it is labeled with an annotation
    fig.add_trace(go.Scatter(x=steps, y=history[:, 0], mode='lines+markers', name='Hardness'))
    fig.add_hline(y=target_properties[0], line=dict(color="red", width=2, dash="dash"), annotation_text="Target Hardness")

    # Adding Conductivity trace and its labeled target line
    fig.add_trace(go.Scatter(x=steps, y=history[:, 1], mode='lines+markers', name='Conductivity'))
    fig.add_hline(y=target_properties[1], line=dict(color="green", width=2, dash="dash"), annotation_text="Target Conductivity")

    # Enhancing the layout
    fig.update_layout(title="Optimization Process Visualization",
                      xaxis_title="Optimization Step",
                      yaxis_title="Property Value",
                      legend_title="Properties")
    st.plotly_chart(fig, use_container_width=True)

# Streamlit UI Enhancement
st.title('Advanced Material Synthesis Optimization Demo with Interactive Visuals')

st.markdown("""
This interactive demo simulates the optimization of synthesis conditions for materials using reinforcement learning, visualized with Plotly for an engaging user experience. Adjust the sliders to set your target material properties, and observe how the algorithm optimizes towards these goals.
""")

# Input sliders for target properties
target_hardness = st.slider('Target Hardness', min_value=0.0, max_value=100.0, value=50.0, step=0.5)
target_conductivity = st.slider('Target Conductivity', min_value=0.0, max_value=100.0, value=50.0, step=0.5)

target_properties = np.array([target_hardness, target_conductivity])

# Button to perform optimization
if st.button('Optimize Synthesis Conditions'):
    optimal_conditions, history = simulate_optimization(target_properties)

    st.write("### Optimal Conditions:")
    st.write(f"- Hardness: {optimal_conditions[0]:.2f}")
    st.write(f"- Conductivity: {optimal_conditions[1]:.2f}")

    st.write("### Optimization Process Visualization")
    plot_optimization_history(history, target_properties)

st.markdown("""
## How It Works
- The algorithm simulates an RL agent's decision-making process to adjust synthesis conditions towards the target properties.
- The optimization process is visualized with Plotly, showing the dynamic adjustments and convergence towards the target conditions.
""")


Writing material_synthesis.py


In [None]:
%%writefile material.py
import streamlit as st
import numpy as np
import plotly.graph_objects as go

# Initialize session state if it's not already initialized
if 'history' not in st.session_state:
    st.session_state.history = []
if 'target_properties' not in st.session_state:
    st.session_state.target_properties = np.array([50.0, 50.0])  # Default target properties

# Simulate an optimization step
def optimization_step():
    if len(st.session_state.history) == 0:
        conditions = np.array([20.0, 20.0])  # Initial conditions
    else:
        previous = st.session_state.history[-1]
        # Move 10% of the way towards the target, plus noise to mimic exploration
        drift = 0.1 * (st.session_state.target_properties - previous)
        conditions = previous + drift + np.random.normal(0, 2, size=2)

    st.session_state.history.append(conditions)
    plot_optimization_history()

# Plotting function using Plotly, modified to use session state for history
def plot_optimization_history():
    history = np.array(st.session_state.history)
    steps = list(range(len(history)))
    fig = go.Figure()

    # Adding traces for Hardness and Conductivity
    fig.add_trace(go.Scatter(x=steps, y=history[:, 0], mode='lines+markers', name='Hardness'))
    fig.add_trace(go.Scatter(x=steps, y=history[:, 1], mode='lines+markers', name='Conductivity'))

    # Target lines
    fig.add_hline(y=st.session_state.target_properties[0], line=dict(color="red", width=2, dash="dash"), annotation_text="Target Hardness")
    fig.add_hline(y=st.session_state.target_properties[1], line=dict(color="green", width=2, dash="dash"), annotation_text="Target Conductivity")

    # Layout adjustments
    fig.update_layout(title="Optimization Process Visualization in 'Real-Time'",
                      xaxis_title="Step",
                      yaxis_title="Property Value",
                      legend_title="Properties")
    st.plotly_chart(fig, use_container_width=True)

st.title("Real-Time RL Optimization Simulation")

# Target property inputs
st.session_state.target_properties[0] = st.slider('Target Hardness', min_value=0.0, max_value=100.0, value=50.0, step=0.5)
st.session_state.target_properties[1] = st.slider('Target Conductivity', min_value=0.0, max_value=100.0, value=50.0, step=0.5)

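# Button to perform optimization step
step_clicked = st.button('Perform Optimization Step')

# Reset button to clear the history and start over
if st.button('Reset Optimization'):
    st.session_state.history = []
elif step_clicked:
    optimization_step()  # Appends one step and redraws the chart
elif st.session_state.history:
    # Streamlit reruns the script on every interaction; redraw so the chart
    # does not disappear when a slider is moved
    plot_optimization_history()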


Writing material.py
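
The per-click update in `material.py` can also be written as a pure function, separate from the Streamlit UI, so that it can be exercised outside the browser. The following is a sketch under stated assumptions, not the script's own code: `next_conditions`, `step_fraction`, `noise_sd`, and the drift towards the target are hypothetical choices, and a plain list stands in for `st.session_state.history`.

```python
import random


def next_conditions(history, target, step_fraction=0.1, noise_sd=1.0, rng=random):
    """Return the next [hardness, conductivity] pair given the history so far."""
    if not history:
        return [20.0, 20.0]  # same starting point as material.py
    previous = history[-1]
    return [
        p + step_fraction * (t - p) + rng.gauss(0.0, noise_sd)
        for p, t in zip(previous, target)
    ]


# Plain list standing in for st.session_state.history.
history = []
rng = random.Random(0)  # seeded so reruns follow the same path
for _ in range(5):
    history.append(next_conditions(history, target=[50.0, 50.0], rng=rng))
print(history[-1])
```

Keeping the update free of Streamlit calls means the button handler only has to append the returned value to the session history and redraw the chart.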
