# RL for Resource Optimization

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand how resource allocation can be framed as a reinforcement learning problem
- Apply tabular Q-learning to that problem using Python code examples
- Practice with a small, simulated data center scenario

## 🔗 Prerequisites

- ✅ Basic Python
- ✅ Basic NumPy/Pandas (when applicable)

---

## Official Structure Reference

This notebook supports **Course 09, Unit 5** requirements from `DETAILED_UNIT_DESCRIPTIONS.md`.

---


## AIAT 123 - Reinforcement Learning

## Learning Objectives

- Apply RL to resource allocation problems
- Optimize data center resource usage
- Implement energy management systems
- Compare RL with traditional optimization

## Real-World Context

RL is applied to sequential decision problems such as data center optimization, energy grid management, and resource allocation.

**Industry Impact**: Google DeepMind reported cutting the energy used to cool its data centers by up to 40% with a machine-learning control system.

In [1]:
%pip install numpy matplotlib -q
import numpy as np
import matplotlib.pyplot as plt
print('✅ Setup complete!')

Note: you may need to restart the kernel to use updated packages.


✅ Setup complete!


## Part 1: Resource Allocation Problem


In [2]:
class ResourceAllocationEnv:
    """
    Simplified data center resource allocation environment.
    
    Real-world: Optimizing server allocation for workloads
    """
    def __init__(self, n_servers=5, n_workloads=10):
        self.n_servers = n_servers
        self.n_workloads = n_workloads
        self.reset()
    
    def reset(self):
        """Reset environment"""
        self.server_loads = np.zeros(self.n_servers)
        self.workloads = np.random.randint(1, 10, self.n_workloads)
        self.current_workload = 0
        return self.get_state()
    
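    def get_state(self):
        """Return server loads followed by the next workload size (0 when none remain)"""
        if self.current_workload < len(self.workloads):
            next_size = self.workloads[self.current_workload]
        else:
            next_size = 0  # Episode finished: avoids indexing past the last workload
        return np.concatenate([self.server_loads, [next_size]])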
    
    def step(self, action):
        """
        Allocate workload to server.
        
        Reward: negative load imbalance (standard deviation of server loads)
        minus a penalty of 0.1 times the peak server load
        """
        if self.current_workload >= len(self.workloads):
            return self.get_state(), 0, True, {}
        
        # Allocate workload
        self.server_loads[action] += self.workloads[self.current_workload]
        self.current_workload += 1
        
        # Calculate reward (negative load imbalance)
        load_std = np.std(self.server_loads)
        reward = -load_std - 0.1 * np.max(self.server_loads)  # Penalize overload
        
        done = self.current_workload >= len(self.workloads)
        return self.get_state(), reward, done, {}

print('✅ Resource allocation environment created')

✅ Resource allocation environment created
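
To make the reward concrete, the sketch below recomputes the formula used in `step` (negative standard deviation of the server loads minus 0.1 times the peak load) for two hand-picked load vectors, then scores a random policy and a simple least-loaded heuristic on one workload sequence. The helper names `allocation_reward` and `run_policy` and the seed are illustrative assumptions, not part of the notebook's classes.

```python
import numpy as np

def allocation_reward(server_loads):
    """Same formula as the environment's step(): -std(loads) - 0.1 * max(loads)."""
    loads = np.asarray(server_loads, dtype=float)
    return -np.std(loads) - 0.1 * np.max(loads)

def run_policy(choose_server, workloads, n_servers=5):
    """Place each workload with choose_server(loads) and sum the step rewards."""
    loads = np.zeros(n_servers)
    total = 0.0
    for size in workloads:
        loads[choose_server(loads)] += size
        total += allocation_reward(loads)
    return total

# One workload of size 3 on an empty 5-server cluster vs. a perfectly even spread.
skewed = allocation_reward([3, 0, 0, 0, 0])    # std = 1.2, peak = 3
balanced = allocation_reward([3, 3, 3, 3, 3])  # std = 0.0, peak = 3
print(f"skewed: {skewed:.2f}, balanced: {balanced:.2f}")  # skewed: -1.50, balanced: -0.30

rng = np.random.default_rng(0)            # seed chosen only for reproducibility
workloads = rng.integers(1, 10, size=10)  # sizes 1..9, as in reset()

random_return = run_policy(lambda loads: rng.integers(len(loads)), workloads)
greedy_return = run_policy(lambda loads: int(np.argmin(loads)), workloads)
print(f"random policy return: {random_return:.2f}")
print(f"least-loaded heuristic return: {greedy_return:.2f}")
```

A least-loaded heuristic of this kind is a typical traditional baseline against which a learned allocation policy can be compared.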


## Part 2: Q-Learning for Resource Optimization
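
The `update` method in the following cell applies the standard one-step tabular Q-learning rule. In the notation below (introduced here for illustration), $\alpha$ corresponds to `learning_rate`, $\gamma$ to `discount`, $s$ to the discretized state key, and $a$ to the index of the server chosen for the current workload:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

When the episode ends, the bootstrap term is dropped and the target reduces to $r$. As a worked example with the default values $\alpha = 0.1$ and $\gamma = 0.95$, an unseen pair with $Q(s, a) = 0$, a reward of $r = -1.5$, and $\max_{a'} Q(s', a') = 0$ gives

$$
Q(s, a) \leftarrow 0 + 0.1 \left[ -1.5 + 0.95 \cdot 0 - 0 \right] = -0.15
$$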


In [3]:
# Simple Q-learning for resource allocation
class ResourceOptimizer:
    """RL-based resource optimizer"""
    def __init__(self, n_servers, learning_rate=0.1, discount=0.95, epsilon=0.1):
        self.n_servers = n_servers
        self.q_table = {}
        self.lr = learning_rate
        self.gamma = discount
        self.epsilon = epsilon
    
    def get_state_key(self, state):
        """Discretize state for Q-table"""
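        # Simple discretization into bins of width 10. Workload sizes 1-9
        # all fall into bin 0, so the key effectively reflects server loads only.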
        return tuple((state / 10).astype(int))
    
    def select_action(self, state):
        """Select server using epsilon-greedy"""
        if np.random.random() < self.epsilon:
            return np.random.randint(self.n_servers)
        
        state_key = self.get_state_key(state)
        q_values = [self.q_table.get((state_key, a), 0.0) for a in range(self.n_servers)]
        return np.argmax(q_values)
    
    def update(self, state, action, reward, next_state, done):
        """Update Q-values"""
        state_key = self.get_state_key(state)
        current_q = self.q_table.get((state_key, action), 0.0)
        
        if done:
            target_q = reward
        else:
            next_key = self.get_state_key(next_state)
            max_next_q = max([self.q_table.get((next_key, a), 0.0) for a in range(self.n_servers)])
            target_q = reward + self.gamma * max_next_q
        
        self.q_table[(state_key, action)] = current_q + self.lr * (target_q - current_q)

print('✅ Resource optimizer implemented')

✅ Resource optimizer implemented
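
A training loop is what connects an environment to a Q-learning agent. The sketch below is a self-contained, function-based stand-in that mirrors the notebook's reward, state discretization, epsilon-greedy selection, and update rule rather than reusing the classes above. The names `state_key` and `run_episode`, the seed, and the episode count of 500 are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
N_SERVERS, N_WORKLOADS = 5, 10
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # same defaults as ResourceOptimizer

def state_key(loads, next_size):
    """Discretize (server loads, next workload size) into bins of width 10."""
    state = np.concatenate([loads, [next_size]])
    return tuple((state / 10).astype(int))

def run_episode(q_table, learn=True):
    """Place N_WORKLOADS random workloads and return the summed reward."""
    loads = np.zeros(N_SERVERS)
    workloads = rng.integers(1, 10, size=N_WORKLOADS)  # sizes 1..9
    total = 0.0
    for t, size in enumerate(workloads):
        s = state_key(loads, size)
        # Epsilon-greedy selection (purely greedy when learn=False).
        if learn and rng.random() < EPSILON:
            a = int(rng.integers(N_SERVERS))
        else:
            a = int(np.argmax([q_table.get((s, b), 0.0) for b in range(N_SERVERS)]))
        loads[a] += size
        r = -np.std(loads) - 0.1 * np.max(loads)
        total += r
        done = t == N_WORKLOADS - 1
        if learn:
            target = r
            if not done:
                # Bootstrap from the best known action value in the next state.
                s_next = state_key(loads, workloads[t + 1])
                target += GAMMA * max(q_table.get((s_next, b), 0.0) for b in range(N_SERVERS))
            old = q_table.get((s, a), 0.0)
            q_table[(s, a)] = old + ALPHA * (target - old)
    return total

q_table = {}
returns = [run_episode(q_table) for _ in range(500)]
print(f"mean return, first 100 episodes: {np.mean(returns[:100]):.2f}")
print(f"mean return, last 100 episodes:  {np.mean(returns[-100:]):.2f}")
print(f"Q-table entries learned: {len(q_table)}")
```

Calling `run_episode(q_table, learn=False)` afterwards evaluates the greedy policy without further updates, which gives a cleaner number to set beside a heuristic baseline.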


## Real-World Applications

- **Data Centers**: Google DeepMind reported up to a 40% reduction in cooling energy
- **Energy Grids**: Load balancing and demand response
- **Cloud Computing**: Auto-scaling and resource allocation
- **Manufacturing**: Production line optimization

---

**End of Notebook**