# Load Balancing Optimization

This notebook focuses on optimizing load distribution using Q-learning.

Steps:
1. Set up Q-learning with states, actions, and rewards.
2. Train the Q-learning model for load balancing.
3. Compare Q-learning with a rule-based optimization method.

The notebook I provided in the previous message sets up the **Q-learning model** and **rule-based optimization**, but it **does not yet train the Q-learning model** on your dataset for **load balancing** or **compare** the two approaches under **real-time conditions** (e.g., load surges or EV charging).

Let’s **extend** the notebook to include:

1. **Training the Q-learning model on the dataset** for **load balancing**.
2. **Comparing Q-learning** with **rule-based optimization** under various conditions.
3. **Simulating real-time grid behavior** (load surges, EV charging) and evaluating the performance.

---

### **Updated 3rd Notebook: `03_load_balancing_optimization.ipynb`**

#### **Markdown Cell**:

```markdown
# Load Balancing Optimization

In this notebook, we use Q-learning to optimize the distribution of power load across different sub-metering zones (Sub_metering_1, Sub_metering_2, Sub_metering_3).

Steps:
1. Set up the Q-learning environment with states, actions, and rewards.
2. Train the Q-learning model to optimize load distribution.
3. Compare the performance of Q-learning with a rule-based optimization approach.
4. Test the model under real-time conditions (e.g., load surges, EV charging).
```





In [8]:
!pip install stable-baselines3[extra] gym



In [13]:
# Quote the requirement so the shell does not treat ">" as an output redirect
!pip install "shimmy>=2.0"



In [2]:
import numpy as np
import random
import pandas as pd

# Load dataset and preprocess states (as before)
file_path = 'preprocessed_power_consumption.csv'
df = pd.read_csv(file_path, index_col='Timestamp')

def categorize_voltage(voltage):
    if voltage < 225:
        return 'low_voltage'
    elif 225 <= voltage <= 240:
        return 'normal_voltage'
    else:
        return 'high_voltage'

def categorize_load(active_power):
    if active_power < 3:
        return 'low_load'
    elif 3 <= active_power <= 6:
        return 'medium_load'
    else:
        return 'high_load'

df['Voltage_State'] = df['Voltage'].apply(categorize_voltage)
df['Load_State'] = df['Global_active_power'].apply(categorize_load)
df['State'] = df['Voltage_State'] + '_' + df['Load_State']

# Map states to numerical indices
state_labels = df['State'].unique().tolist()
state_to_idx = {state: idx for idx, state in enumerate(state_labels)}
actions = ['adjust_submetering1', 'adjust_submetering2', 'adjust_submetering3']

# Initialize Q-table
Q = np.zeros((len(state_labels), len(actions)))

# Simulate state transitions
def take_action(state, action):
    current_load = df[df['State'] == state]['Global_active_power'].mean()
    current_voltage = df[df['State'] == state]['Voltage'].mean()
    
    # Hypothetical load adjustment
    if action == 'adjust_submetering1':
        new_load = current_load - 0.5
    elif action == 'adjust_submetering2':
        new_load = current_load + 0.3
    elif action == 'adjust_submetering3':
        new_load = current_load - 0.2
    
    new_voltage = current_voltage - 0.05 * (new_load - current_load)
    new_v_state = categorize_voltage(new_voltage)
    new_l_state = categorize_load(new_load)
    return f"{new_v_state}_{new_l_state}"

# Reward function with terminal state logic
def refined_reward(current_state, next_state):
    if 'normal_voltage' in next_state and 'medium_load' in next_state:
        return 10  # Terminal state: balanced
    elif 'low_voltage' in next_state or 'high_voltage' in next_state:
        return -5  # Penalize voltage extremes
    elif 'high_load' in next_state:
        return -2  # Penalize overload
    else:
        return 1  # Small reward for progress

# Hyperparameters with epsilon decay
epsilon = 1.0  # Start with high exploration
epsilon_min = 0.01
epsilon_decay = 0.995
learning_rate = 0.1
discount_factor = 0.9
episodes = 1000
max_steps_per_episode = 100  # Terminate after 100 steps
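
To see what this exploration schedule implies, the sketch below computes how many episodes it takes for epsilon to reach `epsilon_min` under multiplicative decay. It uses the hyperparameter values from the cell above; the helper name `episodes_until_floor` is illustrative and not part of the notebook.

```python
import math

def episodes_until_floor(epsilon_start, epsilon_min, epsilon_decay):
    """Smallest n such that epsilon_start * epsilon_decay**n <= epsilon_min."""
    return math.ceil(math.log(epsilon_min / epsilon_start) / math.log(epsilon_decay))

# Values taken from the hyperparameter cell
n = episodes_until_floor(1.0, 0.01, 0.995)
print(n)
```

With 1000 episodes, the agent therefore keeps exploring for most of training and acts almost greedily only in roughly the last 80 episodes.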



In [None]:
# Training loop with convergence checks
for episode in range(episodes):
    state = random.choice(state_labels)
    total_reward = 0
    
    for step in range(max_steps_per_episode):
        # Exploration vs. exploitation
        if random.uniform(0, 1) < epsilon:
            action = random.choice(actions)
        else:
            action_idx = np.argmax(Q[state_to_idx[state]])
            action = actions[action_idx]
        
        next_state = take_action(state, action)
        
        # Fall back to the current state if the transition leads to a state with no Q-table row
        if next_state not in state_to_idx:
            next_state = state
            
        reward = refined_reward(state, next_state)
        total_reward += reward
        
        # Update Q-table
        current_idx = state_to_idx[state]
        next_idx = state_to_idx[next_state]
        Q[current_idx, actions.index(action)] = (1 - learning_rate) * Q[current_idx, actions.index(action)] + \
                                                learning_rate * (reward + discount_factor * np.max(Q[next_idx]))
        
        # Check terminal state (balanced condition)
        if 'normal_voltage_medium_load' in next_state:
            break
        
        state = next_state
    
    # Decay epsilon
    epsilon = max(epsilon_min, epsilon * epsilon_decay)
    
    # Print progress every 2 episodes
    if (episode + 1) % 2 == 0:
        print(f"Episode {episode+1}, Total Reward: {total_reward}, Epsilon: {epsilon:.2f}")

print("Training complete. Q-table:")
print(Q)

Episode 2, Total Reward: -500, Epsilon: 0.97
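
The update inside the training loop is the standard tabular Q-learning rule. As a minimal sketch, it can be isolated into a function and checked by hand on a toy table. The function name `q_update` and the 2x3 table are assumptions for illustration; the learning rate and discount factor match the cell above.

```python
def q_update(q_table, state, action, reward, next_state,
             learning_rate=0.1, discount_factor=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q_table[next_state])
    td_target = reward + discount_factor * best_next
    q_table[state][action] += learning_rate * (td_target - q_table[state][action])
    return q_table[state][action]

# Toy table: 2 states x 3 actions, initialised to zero (hypothetical sizes)
q = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# Two consecutive steps that both receive the -5 voltage penalty
first = q_update(q, state=0, action=1, reward=-5, next_state=0)
second = q_update(q, state=0, action=1, reward=-5, next_state=0)
print(first, second)
```

This form is algebraically the same as the `(1 - learning_rate) * Q + learning_rate * target` expression in the loop. Note also that a total reward of -500 over a 100-step episode corresponds to receiving the -5 voltage penalty at every single step.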


In [3]:

import numpy as np
import random
import pandas as pd

# Load the preprocessed dataset
file_path = 'preprocessed_power_consumption.csv'  # Adjust path if needed
df = pd.read_csv(file_path, index_col='Timestamp')


# Define states based on voltage and load
def categorize_voltage(voltage):
    if voltage < 225:
        return 'low_voltage'
    elif 225 <= voltage <= 240:
        return 'normal_voltage'
    else:
        return 'high_voltage'

def categorize_load(active_power):
    if active_power < 3:
        return 'low_load'
    elif 3 <= active_power <= 6:
        return 'medium_load'
    else:
        return 'high_load'

# Create the state column for voltage and load
df['Voltage_State'] = df['Voltage'].apply(categorize_voltage)
df['Load_State'] = df['Global_active_power'].apply(categorize_load)

# Combine both states into one state representation
df['State'] = df['Voltage_State'] + '_' + df['Load_State']

# Define possible actions (adjusting load for each sub-metering)
actions = ['adjust_submetering1', 'adjust_submetering2', 'adjust_submetering3']

# Initialize the Q-table (state-action values)
Q = np.zeros((len(df['State'].unique()), len(actions)))

# Define a reward function for balancing
def reward(state, action):
    if 'low_voltage' in state or 'high_load' in state:
        return -1  # Penalty for low voltage or high load
    return 1  # Reward for balancing the load efficiently

# Fine-tuned reward function
def refined_reward(state, action):
    # Penalizing high load or low voltage, and rewarding balancing
    if 'low_voltage' in state:
        return -5  # Strong penalty for low voltage (voltage sag)
    if 'high_load' in state:
        return -2  # Penalty for high load (inefficient load distribution)
    return 1  # Reward for good load balancing

# Hyperparameters for Q-learning
learning_rate = 0.1
discount_factor = 0.9
epsilon = 0.2  # Exploration rate
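
With a fixed `epsilon = 0.2`, action selection follows an epsilon-greedy rule. The sketch below shows that rule on its own, using only the standard library. The helper `epsilon_greedy` and the example Q-values are hypothetical and only illustrate the selection logic.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Return an action index: random with probability epsilon, otherwise the argmax of q_row."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)

q_row = [0.2, 1.5, -0.3]  # made-up Q-values for one state

# With epsilon = 0 the choice is always the best-valued action (index 1)
greedy = epsilon_greedy(q_row, epsilon=0.0)

# With epsilon = 0.2 the best action is still chosen most of the time
rng = random.Random(0)
picks = [epsilon_greedy(q_row, 0.2, rng) for _ in range(1000)]
share_greedy = picks.count(1) / len(picks)
print(greedy, share_greedy)
```

Because a random draw can also land on the best action, the expected share of greedy picks with three actions is 0.8 + 0.2 / 3, a little under 87%.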

In [9]:
df_sample = df.sample(n=10, random_state=42)

In [1]:
df.columns

NameError: name 'df' is not defined

In [10]:
df_sample.describe()

statistic,Global_active_power,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
count,10.0,10.0,10.0,10.0,10.0,10.0,10.0
mean,0.065034,0.132497,0.575767,0.066448,0.002411,0.011623,0.18535
std,0.047805,0.079399,0.056902,0.044123,0.005094,0.009354,0.242011
min,0.009596,0.033094,0.452989,0.012448,0.0,0.0,0.0
25%,0.032727,0.078777,0.565104,0.03527,0.0,0.003125,0.008065
50%,0.048886,0.114286,0.604847,0.053942,0.0,0.0125,0.032258
75%,0.099711,0.199281,0.613813,0.097654,0.0,0.015299,0.342407
max,0.15535,0.253237,0.620032,0.149378,0.012749,0.025,0.580645
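
The sample statistics above suggest that the columns in the preprocessed file are scaled to roughly [0, 1] (voltage lies between about 0.45 and 0.62 in this sample). The thresholds in `categorize_voltage` and `categorize_load` are in physical units (volts and kW), so if the full dataset is scaled the same way, every row would be labelled `low_voltage_low_load`. One possible remedy, assuming min-max scaling was used and the original bounds are known, is to invert the scaling before categorizing. `V_MIN` and `V_MAX` below are hypothetical placeholders, not values taken from the dataset.

```python
def categorize_voltage(voltage):
    # Same thresholds as in the notebook, expressed in volts
    if voltage < 225:
        return 'low_voltage'
    elif voltage <= 240:
        return 'normal_voltage'
    return 'high_voltage'

def unscale(value, lo, hi):
    """Invert min-max scaling: map a value in [0, 1] back to [lo, hi]."""
    return lo + value * (hi - lo)

# Hypothetical bounds; replace with the ones used during preprocessing
V_MIN, V_MAX = 220.0, 250.0

scaled_mean = 0.575767  # mean Voltage from the sample statistics above
raw_label = categorize_voltage(scaled_mean)
unscaled_label = categorize_voltage(unscale(scaled_mean, V_MIN, V_MAX))
print(raw_label, unscaled_label)
```

An alternative would be to express the thresholds themselves in scaled units, which avoids touching the data but requires the same bounds.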


In [None]:


# Training the Q-learning agent (extended for load balancing)
state_labels = df['State'].unique().tolist()
state_to_idx = {label: idx for idx, label in enumerate(state_labels)}
max_steps = 100  # Cap the episode length; this simplified setup has no terminal state

for episode in range(10):  # Number of training episodes
    state = random.choice(state_labels)  # Initialize random state

    for step in range(max_steps):
        s = state_to_idx[state]

        # Exploration vs. exploitation
        if random.uniform(0, 1) < epsilon:
            action = random.choice(actions)  # Explore a random action
        else:
            action = actions[np.argmax(Q[s])]  # Exploit learned action
        a = actions.index(action)

        # Get reward for the current state-action pair
        r = reward(state, action)

        # Get next state (random for simplicity)
        next_state = random.choice(state_labels)

        # Update Q-table using the Q-learning formula
        Q[s, a] = (1 - learning_rate) * Q[s, a] + \
                  learning_rate * (r + discount_factor * np.max(Q[state_to_idx[next_state]]))

        state = next_state  # Transition to the next state

# Display the trained Q-table after training
print("Trained Q-table:")
print(Q)


In [9]:
import gym
from gym import spaces
import numpy as np

class LoadBalancingEnv(gym.Env):
    def __init__(self, df):
        super().__init__()

        # Keep only numeric feature columns as float32; the string state labels
        # added to df earlier cannot be part of a Box observation
        self.df = df.select_dtypes(include=[np.number]).astype(np.float32)
        self.current_step = 0

        # The action space: Adjust sub-metering load (3 actions)
        self.action_space = spaces.Discrete(3)

        # The observation space: one value per numeric feature (assumes features scaled to [0, 1])
        self.observation_space = spaces.Box(low=0, high=1, shape=(self.df.shape[1],), dtype=np.float32)

    def reset(self, seed=None, options=None):
        # gym 0.26-style API: reset returns (observation, info)
        super().reset(seed=seed)
        self.current_step = 0
        return self.df.iloc[self.current_step].to_numpy(), {}

    def step(self, action):
        # Placeholder: the three actions (adjust sub-metering 1, 2, or 3)
        # do not yet change the simulated load

        # Calculate reward (simplified)
        reward = self.calculate_reward(action)

        # Increment the step counter
        self.current_step += 1

        # The episode ends at the last row of the dataset
        terminated = self.current_step >= len(self.df) - 1

        # gym 0.26-style API: (observation, reward, terminated, truncated, info)
        next_state = self.df.iloc[self.current_step].to_numpy()
        return next_state, reward, terminated, False, {}

    def calculate_reward(self, action):
        # Define reward function (to be refined based on your model's goals)
        if action == 0:
            return 1  # Reward for adjusting sub-metering 1
        elif action == 1:
            return -1  # Penalty for adjusting sub-metering 2
        else:
            return 0  # Neutral reward for adjusting sub-metering 3
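
Because `calculate_reward` depends only on the action index, an agent can maximise it by always choosing action 0, regardless of the grid state. A reward that reflects load balancing needs to look at the observation. The sketch below is one possible shape for such a reward, penalising the spread between the three sub-metering readings. Using the population standard deviation as the imbalance measure is an assumption for illustration, not the notebook's method.

```python
from statistics import pstdev

def imbalance_reward(sub_metering_values):
    """Negative population standard deviation of the zone loads (0 when perfectly balanced)."""
    return -pstdev(sub_metering_values)

# Made-up readings for the three sub-metering zones
balanced = imbalance_reward([0.2, 0.2, 0.2])
skewed = imbalance_reward([0.0, 0.0, 0.6])
print(balanced, skewed)
```

In the environment, such a function would be called with the sub-metering entries of the next observation, after the chosen action has actually modified them.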


In [10]:
# Use Stable-Baselines3 to train a DQN model on the environment defined above.


from stable_baselines3 import DQN

# Initialize the environment
env = LoadBalancingEnv(df)

# Initialize the DQN model
model = DQN("MlpPolicy", env, verbose=1)

# Train the model
model.learn(total_timesteps=10000)  # Adjust the number of timesteps for your needs

# Save the model after training
model.save("dqn_load_balancing_model")

# Optionally, load the model to continue training or for inference
# model = DQN.load("dqn_load_balancing_model")

Using cpu device


ImportError: Missing shimmy installation. You provided an OpenAI Gym environment. Stable-Baselines3 (SB3) has transitioned to using Gymnasium internally. In order to use OpenAI Gym environments with SB3, you need to install shimmy (`pip install 'shimmy>=2.0'`).

In [None]:
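# Evaluate the model
reset_out = env.reset()
# reset() returns (obs, info) in the gym 0.26 API and a bare observation in older versions
obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out
total_reward = 0

for _ in range(len(df)):
    action, _states = model.predict(obs, deterministic=True)
    step_out = env.step(int(action))
    obs, step_reward = step_out[0], step_out[1]
    # The flags between the reward and the info dict are `done`, or `terminated` and `truncated`
    done = any(step_out[2:-1])
    total_reward += step_reward
    if done:
        break

print(f"Total reward from model: {total_reward}")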


In [None]:


### **Comparing Q-learning with Rule-based Optimization**

# Now, we’ll compare the **Q-learning model** with a **rule-based approach**. The rule-based method adjusts the sub-metering load based on **simple voltage or load thresholds**.

#### **Code for Comparison**:


# Rule-based optimization for load balancing (traditional method)
def rule_based_optimization(row):
    # `row` is a single observation (one DataFrame row), so each comparison yields a plain boolean
    if row['Voltage'] < 225:
        action = 'adjust_submetering1'  # Increase sub-metering 1 load (simulate corrective action)
    elif row['Global_active_power'] > 5:
        action = 'adjust_submetering2'  # Distribute load to sub-metering 2
    else:
        action = 'adjust_submetering3'  # Default action

    return action

# Function to compare Q-learning with rule-based decisions
def compare_models(df, Q, actions):
    # Row order of the Q-table follows the order of first appearance of each state
    state_labels = df['State'].unique().tolist()
    row = df.iloc[-1]

    # Testing with Q-learning
    q_learning_action = actions[np.argmax(Q[state_labels.index(row['State'])])]

    # Testing with Rule-based
    rule_based_action = rule_based_optimization(row)

    print(f"State: {row['State']}, Q-learning Action: {q_learning_action}, Rule-based Action: {rule_based_action}")

# Example comparison for the most recent observation in the dataset
compare_models(df, Q, actions)
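
A single comparison says little about how often the two approaches agree. As a rough sketch, the agreement rate over many observations can be computed as follows. The Q-table, state mapping, and rows are made-up toy data, and the helper names are hypothetical; only the rule thresholds are taken from the cell above.

```python
ACTIONS = ['adjust_submetering1', 'adjust_submetering2', 'adjust_submetering3']

def rule_based_action(row):
    # Same thresholds as the rule-based method above
    if row['Voltage'] < 225:
        return ACTIONS[0]
    if row['Global_active_power'] > 5:
        return ACTIONS[1]
    return ACTIONS[2]

def greedy_action(row, q_table, state_to_idx):
    # Pick the action with the highest Q-value for the row's state
    q_row = q_table[state_to_idx[row['State']]]
    return ACTIONS[max(range(len(q_row)), key=q_row.__getitem__)]

def agreement_rate(rows, q_table, state_to_idx):
    """Fraction of rows where both methods choose the same action."""
    matches = sum(rule_based_action(r) == greedy_action(r, q_table, state_to_idx) for r in rows)
    return matches / len(rows)

# Toy data for illustration only
state_to_idx = {'low_voltage_low_load': 0, 'normal_voltage_high_load': 1}
q_table = [[1.0, 0.0, 0.0],   # prefers adjust_submetering1
           [0.0, 0.0, 2.0]]   # prefers adjust_submetering3
rows = [
    {'Voltage': 220.0, 'Global_active_power': 1.0, 'State': 'low_voltage_low_load'},
    {'Voltage': 230.0, 'Global_active_power': 6.5, 'State': 'normal_voltage_high_load'},
]

rate = agreement_rate(rows, q_table, state_to_idx)
print(rate)
```

Agreement alone does not show which method is better; it only indicates where the learned policy departs from the fixed rules, which is where a reward-based comparison is most informative.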



### **Testing the Model Under Real-Time Conditions (e.g., Load Surges, EV Charging)**

To simulate **real-time grid behavior**:

1. We’ll simulate a **load surge** (e.g., increase in **Global\_active\_power** due to **EV charging**).
2. We’ll observe how **Q-learning** reacts and adjusts the load to **balance** the system.



In [None]:
# Script for simulating real-time grid behavior


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Simulate multiple conditions for real-time grid behavior
def simulate_real_time_conditions(df, time_steps=10):
    # Track results over time
    voltage_history = []
    load_history = []
    actions_taken = []

    # Start from the latest observation and work on scalars,
    # so the caller's DataFrame is left unmodified
    state_labels = df['State'].unique().tolist()
    voltage = float(df['Voltage'].iloc[-1])
    load = float(df['Global_active_power'].iloc[-1])

    for i in range(time_steps):
        # Simulate load surge (e.g., EV charging)
        load += np.random.normal(1, 0.5)  # Random surge in active power

        # Voltage drop from increased load
        voltage -= load * 0.05

        # Test the Q-learning model for the adjusted state
        current_state = categorize_voltage(voltage) + '_' + categorize_load(load)
        if current_state in state_labels:
            action = actions[np.argmax(Q[state_labels.index(current_state)])]  # Best action from the Q-table
        else:
            action = None  # State never observed in the data, so the Q-table has no row for it

        # Track results for analysis
        voltage_history.append(voltage)
        load_history.append(load)
        actions_taken.append(action)

        # Simulate a voltage spike (random sudden increase in voltage)
        if np.random.random() < 0.1:  # 10% chance of voltage spike
            voltage += np.random.uniform(5, 15)

        # Print results for analysis
        print(f"Time Step {i+1}: Voltage = {voltage:.2f}, Load = {load:.2f}, Action: {action}")
        
    # Plot the simulation results
    plt.figure(figsize=(12, 6))
    plt.subplot(2, 1, 1)
    plt.plot(voltage_history, label='Voltage (V)', color='b')
    plt.title('Voltage Over Time')
    plt.xlabel('Time Steps')
    plt.ylabel('Voltage (V)')
    plt.legend()
    
    plt.subplot(2, 1, 2)
    plt.plot(load_history, label='Load (kW)', color='r')
    plt.title('Load Over Time')
    plt.xlabel('Time Steps')
    plt.ylabel('Load (kW)')
    plt.legend()
    
    plt.tight_layout()
    plt.show()

    # Return actions taken for further analysis
    return actions_taken

# Simulate 10 time steps of grid behavior (load surge, voltage spike, and real-time adjustments)
simulate_real_time_conditions(df, time_steps=10)
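
The function returns the list of actions taken, which the call above discards. As a small sketch, that list can be summarised into the share of time steps spent on each action. The helper `summarize_actions` and the example list are hypothetical.

```python
from collections import Counter

def summarize_actions(actions_taken):
    """Return each action's share of the simulated time steps."""
    counts = Counter(actions_taken)
    total = len(actions_taken)
    return {action: count / total for action, count in counts.items()}

# Made-up result of a 10-step simulation
example = ['adjust_submetering1'] * 7 + ['adjust_submetering3'] * 3
shares = summarize_actions(example)
print(shares)
```

If one action dominates across very different simulated conditions, that can indicate that the Q-table has collapsed to a single state or that the reward does not distinguish between actions.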




### **Next Steps**:

1. **Train the Q-learning model** for **load balancing** and test its decisions.
2. **Compare the performance** of **Q-learning** and **rule-based optimization** under dynamic grid conditions.
3. **Simulate real-time grid behavior** (e.g., **EV charging** and **load surges**) to evaluate how the models perform in **real-world scenarios**.

