# Reinforcement Learning Integration with NeqSim

This notebook demonstrates how **NeqSim** can be embedded into reinforcement learning (RL) workflows for process control and optimization.

- [Reinforcement learning (Wikipedia)](https://en.wikipedia.org/wiki/Reinforcement_learning)
- [OpenAI Gym](https://www.gymlibrary.dev/)
- [Introductory RL video](https://www.youtube.com/watch?v=2pWv7GOvuf0)


## Birth of Reinforcement Learning

The origins of RL trace back to early work on trial-and-error learning in the 1950s and to the formalization of temporal-difference methods by Sutton and Barto in the 1980s.
The goal of RL is to learn a policy $\pi$ that maximizes the expected discounted return

$$ J(\pi) = \mathbb{E}_{\pi} \left[ \sum_{t=0}^{\infty} \gamma^t r_t \right] $$

where $\gamma \in [0, 1)$ is the discount factor and $r_t$ is the reward received at time step $t$.
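
As a small worked example of the objective above, the discounted sum can be evaluated directly for a finite reward sequence. This is a minimal sketch: the function name `discounted_return` and the reward values are illustrative assumptions, not part of NeqSim or Gym.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * r_t for a finite reward sequence."""
    # gamma = 1 is allowed here only because the sequence is finite.
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Work backwards so that each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Made-up rewards, e.g. negative squared pressure errors over three steps.
rewards = [-1.0, -0.5, -0.25]
print(discounted_return(rewards, gamma=0.9))
```

A smaller $\gamma$ shrinks the contribution of later rewards, so the agent weighs immediate deviations more heavily than future ones.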


## Embedding NeqSim in an RL Environment

By wrapping a NeqSim simulation in the [OpenAI Gym](https://www.gymlibrary.dev/) interface (`reset` and `step`), an agent can interact with the simulated process and learn a control policy from the observations and rewards it receives.


In [ ]:
%%capture
!pip install neqsim gym matplotlib


In [ ]:
import gym
from gym import spaces
import numpy as np
from neqsim import jneqsim

class NeqSimValveEnv(gym.Env):
    """Simple valve pressure control environment using NeqSim."""
    metadata = {'render.modes': []}
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Box(low=0.0, high=100.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=10.0, shape=(1,), dtype=np.float32)
        self.target = 2.0  # pressure setpoint [bar]
        # SRK equation of state at 298.15 K and 10 bara
        fluid = jneqsim.thermo.system.SystemSrkEos(298.15, 10.0)
        fluid.addComponent('methane', 0.9)
        fluid.addComponent('ethane', 0.1)
        fluid.addComponent('n-heptane', 1.0)
        fluid.setMixingRule('classic')
        # Feed stream entering the valve
        feed = jneqsim.process.equipment.stream.Stream('Feed', fluid)
        feed.setFlowRate(50.0, 'kg/hr')
        feed.setPressure(10.0, 'bara')
        # Throttling valve whose opening is the agent's action
        self.valve = jneqsim.process.equipment.valve.ThrottlingValve('valve', feed)
        self.valve.setOutletPressure(1.0)
        self.valve.setCalculateSteadyState(False)
        # Pressure transmitter on the valve outlet provides the observation
        self.trans = jneqsim.process.measurementdevice.PressureTransmitter(self.valve.getOutletStream())
        self.trans.setUnit('bar')
        self.process = jneqsim.process.processmodel.ProcessSystem()
        self.process.add(feed)
        self.process.add(self.valve)
        self.process.add(self.trans)
        self.reset()
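    def step(self, action):
        # Keep the requested opening inside the declared action bounds (0-100 %).
        opening = float(np.clip(action[0], 0.0, 100.0))
        self.valve.setPercentValveOpening(opening)
        self.process.run()
        p = self.trans.getMeasuredValue()
        # Quadratic penalty on the deviation from the pressure setpoint.
        reward = -float((p - self.target) ** 2)
        obs = np.array([p], dtype=np.float32)
        done = False  # no terminal state is defined for this task
        return obs, reward, done, {}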
    def reset(self):
        self.valve.setPercentValveOpening(50.0)
        self.process.run()
        p = self.trans.getMeasuredValue()
        return np.array([p], dtype=np.float32)

# Demonstrate random interaction
env = NeqSimValveEnv()
obs = env.reset()
for _ in range(5):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    print(f'pressure {obs[0]:.2f} bar, reward {reward:.2f}')
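
The random-interaction loop above can be generalized into a small helper that runs any policy for a fixed number of steps and sums the reward. The following is a minimal sketch, assuming the same 4-tuple `step` return as `NeqSimValveEnv`. The names `rollout`, `constant_policy`, and `ToyPressureEnv` are hypothetical, and the linear pressure model in `ToyPressureEnv` is invented so that the snippet runs without NeqSim. Plain lists stand in for the NumPy arrays used by the real environment.

```python
class ToyPressureEnv:
    """Stand-in with the same reset/step interface as NeqSimValveEnv.

    The linear pressure model is made up for illustration only;
    it is not a NeqSim calculation.
    """

    target = 2.0  # pressure setpoint [bar]

    def _pressure(self, opening):
        # Invented relation: outlet pressure grows linearly with opening (%).
        return 10.0 * opening / 100.0

    def reset(self):
        return [self._pressure(50.0)]

    def step(self, action):
        p = self._pressure(float(action[0]))
        reward = -(p - self.target) ** 2
        return [p], reward, False, {}


def rollout(env, policy, n_steps=5):
    """Run `policy` for up to n_steps and return the summed (undiscounted) reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(n_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += float(reward)
        if done:
            break
    return total


def constant_policy(opening):
    """Policy that ignores the observation and holds a fixed valve opening."""
    return lambda obs: [opening]


# Compare two fixed openings on the toy model.
print(rollout(ToyPressureEnv(), constant_policy(20.0)))
print(rollout(ToyPressureEnv(), constant_policy(80.0)))
```

Because `rollout` relies only on `reset` and `step`, passing `NeqSimValveEnv()` instead of the toy environment should work in the same way, which gives a simple baseline against which a learned policy can be compared.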


## Reward Design with Thermodynamic Insights

NeqSim's detailed thermodynamic calculations enable reward functions that capture operational objectives such as energy efficiency, cost, and safety. A generic reward can be expressed as

$$ r = -\alpha E - \beta C + \gamma S $$

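where $E$ is energy usage, $C$ is operating cost, and $S$ is a safety metric, each computed from NeqSim outputs. The weight $\gamma$ here is unrelated to the discount factor $\gamma$ in $J(\pi)$. Choosing the non-negative weights $\alpha$, $\beta$, and $\gamma$ sets the trade-off between the three objectives and guides the RL agent toward efficient and safe operation.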
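
The weighted sum above translates directly into code. The sketch below is illustrative only: `RewardWeights` and `composite_reward` are hypothetical names, and the numeric inputs are made up. How $E$, $C$, and $S$ are extracted from a NeqSim process model depends on the application and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    alpha: float = 1.0  # weight on energy usage E
    beta: float = 1.0   # weight on operating cost C
    gamma: float = 1.0  # weight on safety metric S (not the discount factor)


def composite_reward(energy: float, cost: float, safety: float,
                     w: RewardWeights = RewardWeights()) -> float:
    """Evaluate r = -alpha * E - beta * C + gamma * S."""
    return -w.alpha * energy - w.beta * cost + w.gamma * safety


# Made-up values in arbitrary units, with safety weighted most heavily.
weights = RewardWeights(alpha=0.5, beta=0.25, gamma=2.0)
print(composite_reward(energy=2.0, cost=1.0, safety=0.5, w=weights))
```

In practice the three terms usually have different units and magnitudes, so normalizing them before weighting makes the choice of $\alpha$, $\beta$, and $\gamma$ easier to interpret.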
