### Step 1: Set Up the Environment
First, ensure you have all the required libraries installed. You can do this by running the following command in your terminal, or in a Jupyter Notebook cell with the `%pip` prefix:

`pip install numpy pandas scikit-learn matplotlib`

This command installs the necessary libraries *(numpy, pandas, scikit-learn, and matplotlib)*.
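To confirm that the installation worked before running the rest of the notebook, a quick check like the one below may help. It is a minimal sketch built on the standard library's `importlib.util.find_spec`; `missing_packages` is a hypothetical helper name. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Module names to check; scikit-learn is imported as "sklearn"
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib"]

def missing_packages(names):
    """Hypothetical helper: return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Missing:", missing if missing else "none")
```

An empty result means every library can be imported; anything listed still needs to be installed.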

### Step 2: Generate Synthetic Data
Next, generate the synthetic dataset that will simulate the oil extraction process.

```python
import numpy as np
import pandas as pd

# Set the random seed for reproducibility
np.random.seed(42)

# Number of samples
n_samples = 1000

# Generate synthetic features
temperature = np.random.uniform(50, 100, n_samples)  # Temperature in degrees Celsius
pressure = np.random.uniform(5, 15, n_samples)       # Pressure in MPa
time = np.random.uniform(30, 120, n_samples)         # Time in minutes

# Simulate oil yield as a linear combination of the features plus Gaussian noise (mean 0, std 5)
oil_yield = (0.5 * temperature) + (0.8 * pressure) + (0.3 * time) + np.random.normal(0, 5, n_samples)

# Combine into a DataFrame
data = pd.DataFrame({
    'Temperature': temperature,
    'Pressure': pressure,
    'Time': time,
    'Oil_Yield': oil_yield
})

# Display the first 10 rows of the synthetic data
data.head(10)
```


|   | Temperature | Pressure | Time | Oil_Yield |
|---|---|---|---|---|
| 0 | 68.727006 | 6.851329 | 53.553512 | 61.210302 |
| 1 | 97.535715 | 10.419009 | 52.228092 | 75.856523 |
| 2 | 86.599697 | 13.729458 | 111.562912 | 91.170136 |
| 3 | 79.932924 | 12.322249 | 52.459158 | 58.73223 |
| 4 | 57.800932 | 13.065611 | 54.475475 | 61.755318 |
| 5 | 57.799726 | 11.587834 | 98.345844 | 68.980136 |
| 6 | 52.904181 | 11.922766 | 70.476586 | 55.286893 |
| 7 | 93.308807 | 13.491957 | 99.90395 | 88.136096 |
| 8 | 80.055751 | 7.49668 | 35.882954 | 47.908929 |
| 9 | 85.403629 | 9.89425 | 73.881407 | 74.8249 |


#### What this does:

- Generates random values for temperature (°C), pressure (MPa), and time (minutes).
- Computes a synthetic oil yield as a linear combination of the three features plus Gaussian noise (standard deviation 5) to make the data more realistic.
- Stores the generated data in a pandas DataFrame.
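Because the yield is a linear function of the three features plus zero-mean noise, an ordinary least-squares fit should recover coefficients close to 0.5, 0.8 and 0.3. The sketch below is a sanity check under that assumption; it uses its own `np.random.default_rng` generator (an assumption, not part of the original notebook) so that running it does not change the global random state that later cells depend on.

```python
import numpy as np

rng = np.random.default_rng(0)  # independent generator; leaves np.random's global state untouched
n = 1000

# Same feature ranges as the synthetic dataset: temperature, pressure, time
X = np.column_stack([
    rng.uniform(50, 100, n),
    rng.uniform(5, 15, n),
    rng.uniform(30, 120, n),
])
true_coefs = np.array([0.5, 0.8, 0.3])
y = X @ true_coefs + rng.normal(0, 5, n)

# Least-squares fit with an intercept column appended
A = np.column_stack([X, np.ones(n)])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coefs[:3], 2))  # estimated temperature, pressure, time coefficients
```

With noise of standard deviation 5, the pressure coefficient is the least precise of the three because pressure varies over the narrowest range.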

### Step 3: Preprocess the Data
Rescale the dataset so that every column lies in the range [0, 1]. Putting features on a common scale matters for many machine learning algorithms, particularly those based on distances or gradient descent.

```python
from sklearn.preprocessing import MinMaxScaler

# Initialize the scaler (default feature_range is (0, 1))
scaler = MinMaxScaler()

# Fit the scaler on every column (the features and Oil_Yield) and transform the data
scaled_data = scaler.fit_transform(data)

# fit_transform returns a NumPy array; wrap it back into a DataFrame with the original column names
scaled_data = pd.DataFrame(scaled_data, columns=data.columns)

# Display the first 10 rows of the scaled data
scaled_data.head(10)
```


|   | Temperature | Pressure | Time | Oil_Yield |
|---|---|---|---|---|
| 0 | 0.371735 | 0.182609 | 0.262269 | 0.412167 |
| 1 | 0.950755 | 0.54074 | 0.247509 | 0.633126 |
| 2 | 0.730954 | 0.873049 | 0.908233 | 0.864155 |
| 3 | 0.59696 | 0.731791 | 0.250082 | 0.374781 |
| 4 | 0.152134 | 0.806411 | 0.272535 | 0.420389 |
| 5 | 0.15211 | 0.658069 | 0.761054 | 0.529386 |
| 6 | 0.053716 | 0.69169 | 0.450716 | 0.322803 |
| 7 | 0.865799 | 0.849208 | 0.778404 | 0.818382 |
| 8 | 0.599429 | 0.247391 | 0.065498 | 0.211496 |
| 9 | 0.706915 | 0.488064 | 0.48863 | 0.617563 |


#### What this does:

- Scales every column in the dataset, including the Oil_Yield target, to the range [0, 1] using `MinMaxScaler`.
- Displays the first few rows of the scaled dataset.
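Min-max scaling maps each value to (x - column min) / (column max - column min). The sketch below reproduces that transform by hand with NumPy; `min_max_scale` is a hypothetical name, and unlike scikit-learn, which guards against constant columns, this version would divide by zero if a column had a single repeated value.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1] using that column's own min and max."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

# Two toy columns spanning the temperature and pressure ranges
X = np.array([[50.0, 5.0], [75.0, 10.0], [100.0, 15.0]])
scaled = min_max_scale(X)
print(scaled)  # rows [0, 0], [0.5, 0.5], [1, 1]
```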

### Step 4: Design the Reinforcement Learning Model
Now, we’ll define the action space and a simple reward function. This function will simulate the impact of actions on the oil yield.

```python
import random

import numpy as np

# Define the action space (adjustments in temperature, pressure, and time)
actions = ['increase_temp', 'decrease_temp', 'increase_pressure', 'decrease_pressure', 'increase_time', 'decrease_time']

# Reward function
def reward_function(current_state, action):
    """
    Simulate the effect of an action on the oil yield.

    `current_state` is (temperature, pressure, time). Returns the new state
    and a reward equal to the resulting oil yield, including Gaussian noise.
    """
    temperature, pressure, time = current_state

    # Adjust state based on action
    if action == 'increase_temp':
        temperature += 2
    elif action == 'decrease_temp':
        temperature -= 2
    elif action == 'increase_pressure':
        pressure += 1
    elif action == 'decrease_pressure':
        pressure -= 1
    elif action == 'increase_time':
        time += 5
    elif action == 'decrease_time':
        time -= 5

    # Calculate the new oil yield with the same coefficients as the synthetic data
    oil_yield = (0.5 * temperature) + (0.8 * pressure) + (0.3 * time) + np.random.normal(0, 5)

    # Reward is proportional to oil yield (simplified)
    reward = oil_yield

    return np.array([temperature, pressure, time]), reward
```


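#### What this does:

- Defines the list of possible actions the reinforcement learning agent can take.
- Provides a `reward_function` that applies an action to the current (temperature, pressure, time) state and returns the new state together with a reward equal to the resulting noisy oil yield.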
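The if/elif chain in `reward_function` can also be expressed as a lookup table of per-action deltas. The sketch below is one way to do that, assuming the same step sizes (2 °C, 1 MPa, 5 minutes) and yield coefficients; `ACTION_DELTAS`, `apply_action`, and `expected_yield` are hypothetical names, and the noise term is left out so the result is deterministic.

```python
import numpy as np

# Hypothetical lookup table: change to (temperature, pressure, time) for each action
ACTION_DELTAS = {
    'increase_temp':     np.array([2.0, 0.0, 0.0]),
    'decrease_temp':     np.array([-2.0, 0.0, 0.0]),
    'increase_pressure': np.array([0.0, 1.0, 0.0]),
    'decrease_pressure': np.array([0.0, -1.0, 0.0]),
    'increase_time':     np.array([0.0, 0.0, 5.0]),
    'decrease_time':     np.array([0.0, 0.0, -5.0]),
}

def apply_action(state, action):
    """Return the state after applying `action`; unknown actions raise KeyError."""
    return np.asarray(state, dtype=float) + ACTION_DELTAS[action]

def expected_yield(state):
    """Noise-free oil yield using the same coefficients as the synthetic data."""
    temperature, pressure, time = state
    return 0.5 * temperature + 0.8 * pressure + 0.3 * time

new_state = apply_action([60.0, 10.0, 60.0], 'increase_time')
print(new_state, expected_yield(new_state))  # state (60, 10, 65), yield 57.5
```

One design difference: a misspelled action raises a `KeyError` here, whereas the if/elif chain silently leaves the state unchanged.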

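### Step 5: Train the Model
Implement a basic tabular Q-learning algorithm and train it in the simulated environment. The state ranges used for discretization are the same bounds used to generate the synthetic data, but the training loop does not read the `data` DataFrame itself.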
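Tabular Q-learning needs a finite set of states, so each continuous (temperature, pressure, time) triple has to be mapped to a single row of the Q-table. The standalone sketch below shows range-based binning under the same bounds as the synthetic data; `STATE_MIN`, `STATE_MAX`, `N_BINS`, and `to_bin_index` are hypothetical names. It multiplies by the number of bins and then clips, so every bin is reachable and the top of each range lands in the last bin.

```python
import numpy as np

STATE_MIN = np.array([50.0, 5.0, 30.0])     # lower bounds: temperature, pressure, time
STATE_MAX = np.array([100.0, 15.0, 120.0])  # upper bounds
N_BINS = 10                                 # bins per feature -> 10**3 = 1000 Q-table rows

def to_bin_index(state):
    """Map a continuous state to one row index of a 1000-row Q-table."""
    normalized = (np.asarray(state, dtype=float) - STATE_MIN) / (STATE_MAX - STATE_MIN)
    bins = np.clip((normalized * N_BINS).astype(int), 0, N_BINS - 1)
    return int(np.ravel_multi_index(tuple(bins), (N_BINS, N_BINS, N_BINS)))

print(to_bin_index([50, 5, 30]))     # → 0   (lower corner of every range)
print(to_bin_index([75, 10, 75]))    # → 555 (midpoint of every range)
print(to_bin_index([100, 15, 120]))  # → 999 (upper corner, clipped into the last bin)
```

Multiplying by `N_BINS - 1` instead would leave the last bin reachable only when a feature sits exactly at its maximum.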

Two earlier drafts discretized the raw state with `(state * 0.1).astype(int)`: the first flattened the result into a single index for a 1,000-row Q-table, and the second used it as a tuple index into a Q-table of shape `(10, 10, 10, len(actions))`. Both assume each scaled feature stays within [0, 9], which does not match the features' actual ranges: pressure (5 to 15 MPa) falls into only two of the ten bins, temperature into five, and time (30 to 120 minutes) spans 3 to 12, so any time of 100 minutes or more overflows a 10-bin axis. The final version below normalizes each feature against its known range before binning.

```python
import numpy as np
import random

# Define the action space
actions = ['increase_temp', 'decrease_temp', 'increase_pressure', 'decrease_pressure', 'increase_time', 'decrease_time']

# Initialize the Q-table: one row per discretized state, one column per action
state_size = 10  # Number of bins per feature after discretization
q_table = np.zeros((state_size ** 3, len(actions)))  # 3 features, hence state_size^3 rows

# Set learning parameters
alpha = 0.1  # Learning rate
gamma = 0.9  # Discount factor
epsilon = 0.1  # Exploration rate

# Discretize the state space for Q-learning
def discretize_state(state, state_min, state_max):
    """
    Normalize each feature to [0, 1] using its known range, bin it into
    `state_size` buckets, and flatten the three bin indices into one row index.
    """
    normalized_state = (state - state_min) / (state_max - state_min)
    # Multiply by state_size (not state_size - 1) so every bin is reachable;
    # clipping keeps the range maximum and any out-of-range states in bounds.
    discrete_state = (normalized_state * state_size).astype(int)
    discrete_state = np.clip(discrete_state, 0, state_size - 1)
    return np.ravel_multi_index(discrete_state, (state_size, state_size, state_size))

# Define the state ranges (the same bounds used to generate the synthetic data)
state_min = np.array([50, 5, 30])
state_max = np.array([100, 15, 120])

# Simplified reward function; this replaces the Step 4 reward_function.
# Note: it ignores `action`. Each feature drifts by a random integer in [-2, 2],
# so the agent's choice does not influence the next state.
def reward_function(state, action):
    new_state = state + np.random.randint(-2, 3, size=state.shape)  # Small random change
    reward = -np.sum(np.abs(new_state - np.array([75, 10, 75])))  # Higher (closer to 0) near (75, 10, 75)
    return new_state, reward

# Training loop with reduced iterations
for episode in range(100):  # Reduced from 1000 to 100 episodes
    state = np.array([50, 10, 60])  # Start from a baseline state
    for _ in range(10):  # Reduced from 100 to 10 steps per episode
        state_discrete = discretize_state(state, state_min, state_max)
        # Epsilon-greedy action selection
        if random.uniform(0, 1) < epsilon:
            action_idx = random.randint(0, len(actions) - 1)  # Explore: random action
        else:
            action_idx = int(np.argmax(q_table[state_discrete]))  # Exploit: best known action

        action = actions[action_idx]
        new_state, reward = reward_function(state, action)
        new_state_discrete = discretize_state(new_state, state_min, state_max)

        # Q-learning update rule: blend the old estimate with the bootstrapped target
        td_target = reward + gamma * np.max(q_table[new_state_discrete])
        q_table[state_discrete, action_idx] = (1 - alpha) * q_table[state_discrete, action_idx] + alpha * td_target

        state = new_state  # Move to the new state

    if episode % 10 == 0:
        print(f'Episode {episode} complete.')
```


```text
Episode 0 complete.
Episode 10 complete.
Episode 20 complete.
Episode 30 complete.
Episode 40 complete.
Episode 50 complete.
Episode 60 complete.
Episode 70 complete.
Episode 80 complete.
Episode 90 complete.
```


#### What this does:

- Initializes a Q-table and sets the learning parameters (learning rate `alpha`, discount factor `gamma`, exploration rate `epsilon`).
- Trains the agent with epsilon-greedy Q-learning: in each of 100 episodes it takes 10 actions and updates the Q-table from the rewards received.
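The update applied inside the training loop is the standard tabular Q-learning rule, with learning rate α (`alpha`) and discount factor γ (`gamma`). The worked step below uses illustrative numbers: an all-zero table and a reward of r = -40, roughly what the baseline state (50, 10, 60) earns since it sits 25 + 0 + 15 = 40 units from the target (75, 10, 75).

```latex
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\bigl(r + \gamma \max_{a'} Q(s',a')\bigr)

\alpha = 0.1,\ \gamma = 0.9,\ r = -40,\ Q(s,a) = \max_{a'} Q(s',a') = 0:
\quad Q(s,a) \leftarrow 0.9 \cdot 0 + 0.1\,(-40 + 0.9 \cdot 0) = -4
```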

### Step 6: Test the Model
After training, test the model to evaluate its performance.

```python
# Testing the trained model
state = np.array([55, 12, 75])  # Start from a new initial state
total_reward = 0

for _ in range(10):  # Limit the number of steps to match the training setup
    state_discrete = discretize_state(state, state_min, state_max)  # Same discretization as training
    action_idx = np.argmax(q_table[state_discrete])  # Always exploit during testing (greedy policy)
    action = actions[action_idx]
    new_state, reward = reward_function(state, action)
    total_reward += reward
    state = new_state

print(f"Total reward during testing: {total_reward}")
```


```text
Total reward during testing: -212
```


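#### What this does:

- Tests the model by starting from a new initial state, (55, 12, 75), and following the greedy policy (always the highest-valued action) for 10 steps.
- Outputs the total reward, which is the sum of negative distances from the target state (75, 10, 75), so values closer to zero are better. Because the simplified `reward_function` ignores the chosen action, this total reflects the random drift of the state more than the quality of the learned policy.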
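A single 10-step run is noisy, so one way to compare policies more fairly is to average the total reward over many episodes with a fixed seed. The sketch below assumes a hypothetical `drift_step` that mirrors the simplified environment from Step 5 (random integer drift, action ignored, reward = negative distance to (75, 10, 75)) and a hypothetical `average_return` helper. It draws from its own `np.random.default_rng` so it does not disturb the notebook's global random state.

```python
import numpy as np

TARGET = np.array([75, 10, 75])

def drift_step(state, action, rng):
    """Hypothetical stand-in for the simplified environment: the action is ignored."""
    new_state = state + rng.integers(-2, 3, size=state.shape)  # drift in [-2, 2] per feature
    return new_state, -np.sum(np.abs(new_state - TARGET))

def average_return(policy, step_fn, start, n_episodes=50, n_steps=10, seed=0):
    """Mean total reward of `policy` over several independent episodes."""
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(n_episodes):
        state = np.array(start)
        total = 0
        for _ in range(n_steps):
            action = policy(state)
            state, reward = step_fn(state, action, rng)
            total += reward
        totals.append(total)
    return float(np.mean(totals))

def first_action(state):
    """Baseline policy: always choose action index 0."""
    return 0

print(average_return(first_action, drift_step, start=[55, 12, 75]))
```

Comparing a greedy policy built from `q_table` against a random baseline with the same seed would show whether training changed anything; with an environment that ignores the action, the two averages should come out essentially the same.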