# DeepPack3D Training Notebook for 3D Bin Packing

This notebook trains a reinforcement learning model based on the DeepPack3D implementation and customized for the 3D bin packing problem. The custom environment adds constraints that prevent floating boxes, overlapping boxes, and boxes that extend outside the container.

## Setup and Environment

In [None]:
# Install required packages
!pip install tensorflow==2.10.0
!pip install matplotlib seaborn numpy pandas

In [None]:
# Import necessary libraries
import os
import sys
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
import random
import time
import json
from datetime import datetime
import pandas as pd
from IPython.display import display, clear_output

## Clone and Import DeepPack3D

First, we'll clone the DeepPack3D repository if it's not already available, and then import its modules.

In [None]:
# Check if DeepPack3D is already available, otherwise clone it
if not os.path.exists('DeepPack3D'):
    !git clone https://github.com/zgtcktom/DeepPack3D.git

# Add DeepPack3D to the Python path
sys.path.append('DeepPack3D')

In [None]:
# Import DeepPack3D modules
try:
    from DeepPack3D.env import MultiBinPackerEnv
    from DeepPack3D.agent import Agent, HeuristicAgent
    from DeepPack3D.geometry import Cuboid
    from DeepPack3D.SpacePartitioner import SpacePartitioner
    from DeepPack3D.conveyor import Conveyor, FileConveyor, InputConveyor
    print("Successfully imported DeepPack3D modules")
except ImportError as e:
    print(f"Error importing DeepPack3D modules: {e}")
    print("Please check the repository structure and paths")

## Define Custom Environment

We'll extend the MultiBinPackerEnv to create a custom environment that specifically addresses our requirements:
1. No floating boxes
2. No overlapping boxes
3. No boxes outside the container
4. Optimized space utilization
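
Requirements 2 and 3 reduce to interval comparisons on axis-aligned boxes. The sketch below illustrates those two checks in isolation. It is not DeepPack3D's implementation (the notebook delegates placement validity to `packer.add`), and `Box`, `overlaps`, and `inside` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: corner (x, y, z) plus extents (w, h, d). Hypothetical helper."""
    x: int
    y: int
    z: int
    w: int
    h: int
    d: int


def overlaps(a: Box, b: Box) -> bool:
    """True when the interiors of a and b intersect (touching faces do not count)."""
    return (
        a.x < b.x + b.w and b.x < a.x + a.w
        and a.y < b.y + b.h and b.y < a.y + a.h
        and a.z < b.z + b.d and b.z < a.z + a.d
    )


def inside(box: Box, container: tuple) -> bool:
    """True when box lies entirely within a container of size (W, H, D)."""
    W, H, D = container
    return (
        box.x >= 0 and box.y >= 0 and box.z >= 0
        and box.x + box.w <= W
        and box.y + box.h <= H
        and box.z + box.d <= D
    )


container = (32, 32, 32)
placed = Box(0, 0, 0, 16, 8, 16)
touching = Box(16, 0, 0, 16, 8, 16)      # shares a face with `placed`
clashing = Box(8, 0, 8, 16, 8, 16)       # cuts into `placed`
sticking_out = Box(24, 0, 0, 16, 8, 16)  # x + w = 40 exceeds the container width

print(overlaps(placed, touching), overlaps(placed, clashing))
print(inside(placed, container), inside(sticking_out, container))
```

The strict inequalities in `overlaps` are a deliberate choice: boxes that merely share a face are allowed, which is what dense packing needs.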

In [None]:
class CustomBinPackerEnv(MultiBinPackerEnv):
    def __init__(self, n_bins=1, size=(32, 32, 32), k=5, max_bins=None, max_items=None, 
                 replace='all', verbose=False, prealloc_bins=0, prealloc_items=0, 
                 shuffle=False, use_rotate=True, use_skip=True, 
                 strict_support=True, min_support_percentage=0.5):
        super().__init__(n_bins, size, k, max_bins, max_items, replace, verbose, 
                        prealloc_bins, prealloc_items, shuffle, use_rotate, use_skip)
        
        # Additional parameters for strict stability checking
        self.strict_support = strict_support
        self.min_support_percentage = min_support_percentage
    
    def placeable_coords(self, packer, h_map, size):
        """Enhanced version of placeable_coords that enforces stricter stability requirements"""
        xz = []
        splits = {}
        for split in packer.free_splits:
            if (split.top < self.size[1]) or (not split.fit(size)):
                continue
            x, y, z = split.coord
            xz.append((x, z))
            splits[(x, z)] = split
        xz = set(xz)
        
        w, h, d = size
        xyz = []
        for x, z in xz:
            placement = h_map[z:z + d, x:x + w]
            y = np.amax(placement)
            
            # Calculate support percentage (how much of the bottom face is supported)
            if y > 0 and self.strict_support:  # Only check if not on the ground and strict mode is enabled
                support_count = np.count_nonzero(placement == y)
                support_percentage = support_count / (d * w)
                
                # Only allow placement if support percentage meets minimum requirement
                if support_percentage >= self.min_support_percentage:
                    xyz.append((x, y, z, splits[(x, z)]))
            else:
                # Original condition for ground placement or when strict mode is disabled
                if np.count_nonzero(placement == y) / (d * w) > 0.5:
                    xyz.append((x, y, z, splits[(x, z)]))
        
        return xyz
    
    def step(self, action):
        """Enhanced step function with improved reward calculation"""
        items, h_maps, actions = self.state()
        
        # item, bin, rotation_placement
        i, j, k = action
        _, (x, y, z), (w, h, d), _ = actions[i][j][k]
        
        packer = self.packers[j]
        cuboid = Cuboid(x, y, z, w, h, d)
        if not packer.add(cuboid):
            raise Exception(f'invalid space {cuboid}')
        self.conveyor.grab(i)
        
        next_state = self.state(step=True)
        
        # Enhanced reward shaping
        items, h_maps, actions = next_state
        
        h_map = h_maps[j]
        
        # Volume utilization
        volume = np.sum([split.volume for split in packer.splits])
        
        # Pyramid score (encourages stable configurations)
        pyramid = volume / np.sum(h_map) if np.sum(h_map) > 0 else 0
        
        # Compactness score (encourages dense packing)
        max_height = np.amax(h_map) if h_map.size > 0 else 0
        compactness = volume / np.prod((packer.size[0], max_height, packer.size[2])) if max_height > 0 else 0
        
        # Contact score (encourages contact with walls and the floor)
        contact_score = 0
        if x == 0 or x + w == packer.size[0]:  # Contact with X walls
            contact_score += 0.1
        if z == 0 or z + d == packer.size[2]:  # Contact with Z walls
            contact_score += 0.1
        if y == 0:  # Contact with floor
            contact_score += 0.2
        
        # Combined reward
        reward = 0.3 * pyramid + 0.3 * compactness + 0.4 * contact_score
        
        # Check if we're done
        done = len(self.indices(actions)) == 0
        
        # Handle bin replacement logic (same as original)
        if done:
            if self.max_bins != -1 and self.used_bins + 1 > self.max_bins:
                for i, packer in enumerate(packer for packer in self.packers if packer.space_utilization() != 0):
                    self.used_packers.append(packer)
                    loc = self.packers.index(packer)
                    if self.verbose:
                        print(f'bin {self.used_bins - self.n_bins + i}, loc: {loc}, space util: {packer.space_utilization() * 100:.2f}, packed items: {len(packer.splits)}')
                done = True
            else:
                if self.replace == 'max':
                    loc = np.argmax([packer.space_utilization() for packer in self.packers])
                    packer = self.packers[loc]
                    if self.verbose:
                        print(f'bin {self.used_bins - self.n_bins}, loc: {loc}, space util: {packer.space_utilization() * 100:.2f}, packed items: {len(packer.splits)}')

                    self.used_packers.append(self.packers[loc])
                    self.packers[loc] = SpacePartitioner(self.size)
                    self.packers[loc].reset()
                    added = 1
                    self.used_bins += 1
                elif self.replace == 'all':
                    added = 0
                    while True:
                        loc = np.argmax([packer.space_utilization() for packer in self.packers])
                        packer = self.packers[loc]
                        
                        if packer.space_utilization() == 0:
                            break
                        if self.max_bins != -1 and self.used_bins + 1 > self.max_bins:
                            break
                        if self.verbose:
                            print(f'bin {self.used_bins - self.n_bins}, loc: {loc}, space util: {packer.space_utilization() * 100:.2f}, packed items: {len(packer.splits)}')

                        self.used_packers.append(self.packers[loc])
                        self.packers[loc] = SpacePartitioner(self.size)
                        self.packers[loc].reset()
                        added += 1
                        self.used_bins += 1
                else:
                    raise Exception('not implemented')
                
                next_state = self.state(step=True)
        
        return next_state, reward, done
    
    def indices(self, actions):
        """Helper method to get indices of available actions"""
        return [
            (i, j, k) 
            for i in range(len(actions)) 
            for j in range(len(actions[i])) 
            for k in range(len(actions[i][j]))
        ]
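
To make the reward in `step` concrete, the sketch below recomputes it as a standalone function for one hand-picked placement. The function name `shaped_reward` and its arguments are assumptions made for illustration; the weights and contact bonuses are copied from the method above.

```python
def shaped_reward(packed_volume, height_sum, max_height, bin_size, position, item_size):
    """Recompute the combined reward from plain numbers (illustrative helper)."""
    W, H, D = bin_size
    x, y, z = position
    w, h, d = item_size

    # Pyramid: packed volume relative to the volume under the height map
    pyramid = packed_volume / height_sum if height_sum > 0 else 0.0

    # Compactness: packed volume relative to the bounding slab up to max_height
    compactness = packed_volume / (W * max_height * D) if max_height > 0 else 0.0

    # Contact: bonuses for touching an X wall, a Z wall, and the floor
    contact = 0.0
    if x == 0 or x + w == W:
        contact += 0.1
    if z == 0 or z + d == D:
        contact += 0.1
    if y == 0:
        contact += 0.2

    return 0.3 * pyramid + 0.3 * compactness + 0.4 * contact


# One 16 x 8 x 16 box in the corner of an empty 32^3 bin:
# packed volume 2048, height map sum 16 * 16 * 8 = 2048, max height 8.
reward = shaped_reward(
    packed_volume=2048,
    height_sum=2048,
    max_height=8,
    bin_size=(32, 32, 32),
    position=(0, 0, 0),
    item_size=(16, 8, 16),
)
print(round(reward, 3))
```

Here the pyramid term is 1.0, compactness is 0.25, and the contact score is 0.4, giving 0.3 + 0.075 + 0.16 = 0.535. Since the contact score tops out at 0.4 and the other two terms should not exceed 1 as long as the height map bounds the packed volume, the reward is expected to stay at or below 0.76.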

In [None]:
# Define a custom neural network model for 3D bin packing
def create_packing_model(input_shape=(15,), learning_rate=0.001):
    """
    Create a neural network model for 3D bin packing with the following architecture:
    - Input: State features (container state, item dimensions, etc.)
    - Output: Q-value for each action
    """
    model = tf.keras.Sequential([
        layers.Dense(256, activation='relu', input_shape=input_shape),
        layers.BatchNormalization(),
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.2),
        layers.Dense(64, activation='relu'),
        layers.Dense(1, activation='linear')  # Q-value output
    ])
    
    model.compile(
        optimizer=optimizers.Adam(learning_rate=learning_rate),
        loss='mse'
    )
    
    return model
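
Because the network emits a single Q-value for one state-action feature vector, choosing a move means scoring each candidate placement and keeping the best one. The sketch below shows that selection pattern with a lookup table standing in for the network; `select_action` and `toy_q` are hypothetical names used only here.

```python
import random


def select_action(candidates, score, epsilon, rng=random):
    """Epsilon-greedy choice over a list of candidate actions.

    `score` stands in for the network: it maps one action to one Q-value.
    """
    if not candidates:
        return None
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore
    return max(candidates, key=score)  # exploit


# Toy Q-values keyed by (item, bin, placement) index triples; purely illustrative.
toy_q = {(0, 0, 0): 0.12, (0, 0, 1): 0.47, (1, 0, 0): 0.31}

greedy = select_action(list(toy_q), toy_q.get, epsilon=0.0)
print(greedy)
```

With `epsilon=0.0` the call is fully greedy and returns the triple with the largest toy Q-value.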

In [None]:
class CustomRLAgent:
    def __init__(self, env, model=None, learning_rate=0.001, gamma=0.95, 
                 epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995,
                 batch_size=32, memory_size=10000, target_update_freq=10):
        self.env = env
        self.state_size = 15  # Number of state features
        self.gamma = gamma  # Discount factor
        self.epsilon = epsilon  # Exploration rate
        self.epsilon_min = epsilon_min
        self.epsilon_decay = epsilon_decay
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        self.memory = []  # Experience replay memory
        self.memory_size = memory_size
        self.target_update_freq = target_update_freq
        self.step_count = 0
        
        # Create models
        if model is None:
            self.model = create_packing_model(input_shape=(self.state_size,), 
                                             learning_rate=learning_rate)
        else:
            self.model = model
            
        # Create target model for stable training
        self.target_model = create_packing_model(input_shape=(self.state_size,), 
                                                learning_rate=learning_rate)
        self.update_target_model()
        
        # Metrics tracking
        self.rewards_history = []
        self.space_utilization_history = []
        self.loss_history = []
        
    def update_target_model(self):
        """Update the target model with weights from the primary model"""
        self.target_model.set_weights(self.model.get_weights())
        
    def remember(self, state, action, reward, next_state, done):
        """Store experience in replay memory"""
        if len(self.memory) >= self.memory_size:
            self.memory.pop(0)  # Remove oldest memory if at capacity
        self.memory.append((state, action, reward, next_state, done))
        
    def state_to_features(self, state, action=None):
        """
        Convert environment state and action to feature vector for neural network
        
        Args:
            state: Tuple of (items, h_maps, actions) from environment
            action: Optional tuple of (i, j, k) indices for selected action
            
        Returns:
            Feature vector (numpy array)
        """
        items, h_maps, actions = state
        
        # Container features
        container_volume = np.prod(self.env.size)
        container_dims = np.array(self.env.size) / max(self.env.size)  # Normalized dimensions
        
        # Current state features
        h_map = h_maps[0]  # Use first bin's height map
        avg_height = np.mean(h_map) / self.env.size[1]  # Normalized average height
        max_height = np.max(h_map) / self.env.size[1]  # Normalized maximum height
        height_variance = np.var(h_map) / (self.env.size[1] ** 2)  # Normalized height variance
        
        # If action is provided, get specific action features
        if action is not None:
            i, j, k = action
            _, (x, y, z), (w, h, d), _ = actions[i][j][k]
            
            # Item features
            item = items[i]
            item_volume = w * h * d / container_volume  # Normalized volume
            
            # Position features
            pos_x = x / self.env.size[0]  # Normalized x position
            pos_y = y / self.env.size[1]  # Normalized y position
            pos_z = z / self.env.size[2]  # Normalized z position
            
            # Contact features (walls and floor)
            wall_contact = 0
            if x == 0 or x + w == self.env.size[0]:  # Contact with X walls
                wall_contact += 1
            if z == 0 or z + d == self.env.size[2]:  # Contact with Z walls
                wall_contact += 1
            floor_contact = 1 if y == 0 else 0  # Contact with floor
            
            # Support feature
            placement = h_maps[j][z:z+d, x:x+w]  # Height map of the bin this action targets
            support = np.count_nonzero(placement == y) / (w * d) if y > 0 else 1.0
        else:
            # Default values when no action is provided
            item_volume = 0
            pos_x, pos_y, pos_z = 0, 0, 0
            wall_contact, floor_contact = 0, 0
            support = 0
        
        # Remaining items feature
        remaining_items = len([item for item in items if item is not None]) / self.env.k
        
        # Combine all features
        features = np.array([
            container_dims[0], container_dims[1], container_dims[2],
            avg_height, max_height, height_variance,
            item_volume,
            pos_x, pos_y, pos_z,
            wall_contact / 2, floor_contact,
            support,
            remaining_items,
            self.env.used_bins / max(1, self.env.max_bins if self.env.max_bins != -1 else 10)
        ])
        
        return features
        
    def act(self, state):
        """
        Select an action using epsilon-greedy policy
        
        Args:
            state: Current environment state
            
        Returns:
            Selected action indices (i, j, k)
        """
        items, h_maps, actions = state
        action_indices = self.env.indices(actions)
        
        if not action_indices:  # No valid actions
            return None
        
        # Explore: choose random action (strict < so epsilon=0.0 never explores)
        if np.random.rand() < self.epsilon:
            return random.choice(action_indices)
        
        # Exploit: score every candidate action in a single batched forward pass
        features = np.array([
            self.state_to_features(state, action) for action in action_indices
        ])
        q_values = self.model.predict(features, verbose=0).ravel()
        
        # Select action with highest Q-value
        return action_indices[int(np.argmax(q_values))]
    
    def replay(self):
        """Train the model using experience replay"""
        if len(self.memory) < self.batch_size:
            return 0  # Not enough memories to train
        
        # Sample random batch from memory
        batch = random.sample(self.memory, self.batch_size)
        
        # Prepare training data
        states = []
        targets = []
        
        for state, action, reward, next_state, done in batch:
            # Convert state and action to features
            state_features = self.state_to_features(state, action)
            
            # Calculate target Q-value
            if done:
                target = reward
            else:
                # Get Q-values for all possible next actions
                items, h_maps, actions = next_state
                next_action_indices = self.env.indices(actions)
                
                if not next_action_indices:  # No valid next actions
                    target = reward
                else:
                    # Score all next actions in one batched pass through the target network
                    next_features = np.array([
                        self.state_to_features(next_state, next_action)
                        for next_action in next_action_indices
                    ])
                    next_q_values = self.target_model.predict(next_features, verbose=0)
                    
                    # Use maximum Q-value for next state
                    target = reward + self.gamma * float(np.max(next_q_values))
            
            # Prepare input and target for training
            states.append(state_features)
            targets.append(target)
        
        # Train the model
        states = np.array(states)
        targets = np.array(targets).reshape(-1, 1)
        
        history = self.model.fit(states, targets, epochs=1, verbose=0, batch_size=self.batch_size)
        loss = history.history['loss'][0]
        self.loss_history.append(loss)
        
        # Update target model periodically
        self.step_count += 1
        if self.step_count % self.target_update_freq == 0:
            self.update_target_model()
            
        # Decay epsilon
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
            
        return loss
    
    def train(self, episodes=1000, max_steps=1000, visualize=False, 
              save_freq=100, model_path='bin_packing_model.h5'):
        """
        Train the agent on the environment
        
        Args:
            episodes: Number of episodes to train
            max_steps: Maximum steps per episode
            visualize: Whether to visualize the training
            save_freq: Frequency to save the model
            model_path: Path to save the model
            
        Returns:
            Training history
        """
        training_history = {
            'episode_rewards': [],
            'episode_steps': [],
            'space_utilization': [],
            'loss': []
        }
        
        for episode in range(episodes):
            # Reset environment
            state = self.env.reset()
            total_reward = 0
            steps = 0
            
            # Episode loop
            for step in range(max_steps):
                # Select and perform action
                action = self.act(state)
                if action is None:  # No valid actions
                    break
                    
                next_state, reward, done = self.env.step(action)
                total_reward += reward
                
                # Store experience in memory
                self.remember(state, action, reward, next_state, done)
                
                # Train the model
                loss = self.replay()
                
                # Update state
                state = next_state
                steps += 1
                
                # Visualize if requested
                if visualize and step % 10 == 0:
                    self.visualize_state(state, episode, step)
                
                if done:
                    break
            
            # Calculate space utilization
            space_util = np.mean([packer.space_utilization() for packer in self.env.used_packers]) * 100 if self.env.used_packers else 0
            
            # Update history
            training_history['episode_rewards'].append(total_reward)
            training_history['episode_steps'].append(steps)
            training_history['space_utilization'].append(space_util)
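            # replay() records no loss until the buffer holds a full batch, so guard against an empty slice
            recent_losses = self.loss_history[-steps:] if steps > 0 else []
            training_history['loss'].append(float(np.mean(recent_losses)) if recent_losses else 0)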
            
            # Print progress
            if episode % 10 == 0:
                print(f"Episode {episode}/{episodes}, Reward: {total_reward:.2f}, Steps: {steps}, "
                      f"Space Util: {space_util:.2f}%, Epsilon: {self.epsilon:.4f}")
                
                # Plot training progress
                if episode % 50 == 0:
                    self.plot_training_progress(training_history)
            
            # Save model periodically
            if episode % save_freq == 0 and episode > 0:
                self.model.save(f"{os.path.splitext(model_path)[0]}_{episode}.h5")
                print(f"Model saved at episode {episode}")
        
        # Save final model
        self.model.save(model_path)
        print(f"Final model saved to {model_path}")
        
        return training_history
    
    def visualize_state(self, state, episode, step):
        """Visualize the current state"""
        try:
            items, h_maps, actions = state
            
            # Plot height map
            plt.figure(figsize=(10, 8))
            
            # Plot height map for each bin
            for i, h_map in enumerate(h_maps):
                plt.subplot(1, len(h_maps), i+1)
                sns.heatmap(h_map, cmap='viridis', vmin=0, vmax=self.env.size[1])
                plt.title(f"Bin {i} Height Map")
            
            plt.tight_layout()
            plt.savefig(f"outputs/state_ep{episode}_step{step}.png")
            plt.close()
            
            # Optionally render 3D view if DeepPack3D has visualization
            for i, packer in enumerate(self.env.packers):
                if hasattr(packer, 'render'):
                    fig = packer.render()
                    plt.savefig(f"outputs/3d_ep{episode}_step{step}_bin{i}.png")
                    plt.close()
        except Exception as e:
            print(f"Visualization error: {e}")
    
    def plot_training_progress(self, history):
        """Plot training progress metrics"""
        plt.figure(figsize=(15, 10))
        
        # Plot rewards
        plt.subplot(2, 2, 1)
        plt.plot(history['episode_rewards'])
        plt.title('Episode Rewards')
        plt.xlabel('Episode')
        plt.ylabel('Total Reward')
        
        # Plot steps
        plt.subplot(2, 2, 2)
        plt.plot(history['episode_steps'])
        plt.title('Episode Steps')
        plt.xlabel('Episode')
        plt.ylabel('Steps')
        
        # Plot space utilization
        plt.subplot(2, 2, 3)
        plt.plot(history['space_utilization'])
        plt.title('Space Utilization')
        plt.xlabel('Episode')
        plt.ylabel('Utilization %')
        
        # Plot loss
        plt.subplot(2, 2, 4)
        plt.plot(history['loss'])
        plt.title('Training Loss')
        plt.xlabel('Episode')
        plt.ylabel('Loss')
        
        plt.tight_layout()
        plt.savefig('training_progress.png')
        plt.close()
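
The target that `replay` regresses toward is the one-step Q-learning target: the immediate reward plus the discounted best Q-value available in the next state, or the reward alone when the episode ends or no actions remain. A minimal sketch of that rule follows; the helper name `td_target` is an assumption for illustration.

```python
def td_target(reward, next_q_values, gamma=0.95, done=False):
    """One-step Q-learning target, mirroring the branches in replay()."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


# Non-terminal step: the best next action is worth 0.8
bootstrapped = td_target(0.5, [0.2, 0.8, 0.4])

# Terminal step: next-state values are ignored
terminal = td_target(0.5, [0.2, 0.8, 0.4], done=True)

print(bootstrapped, terminal)
```

In the non-terminal case the target is 0.5 + 0.95 * 0.8, which is approximately 1.26.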

In [None]:
# Create output directory for visualizations
os.makedirs('outputs', exist_ok=True)

# Create environment with strict support requirements
env = CustomBinPackerEnv(
    n_bins=1,
    size=(32, 32, 32),
    k=5,  # Look ahead 5 items
    max_bins=10,
    strict_support=True,
    min_support_percentage=0.6,  # Require 60% support to prevent floating
    verbose=True
)

# Create agent
agent = CustomRLAgent(
    env=env,
    learning_rate=0.001,
    gamma=0.95,
    epsilon=1.0,
    epsilon_min=0.05,
    epsilon_decay=0.995,
    batch_size=32,
    memory_size=10000,
    target_update_freq=10
)

# Train the agent
training_history = agent.train(
    episodes=500,
    max_steps=1000,
    visualize=True,
    save_freq=50,
    model_path='bin_packing_model.h5'
)

# Plot final training results
plt.figure(figsize=(15, 10))

# Plot rewards
plt.subplot(2, 2, 1)
plt.plot(training_history['episode_rewards'])
plt.title('Episode Rewards')
plt.xlabel('Episode')
plt.ylabel('Total Reward')

# Plot steps
plt.subplot(2, 2, 2)
plt.plot(training_history['episode_steps'])
plt.title('Episode Steps')
plt.xlabel('Episode')
plt.ylabel('Steps')

# Plot space utilization
plt.subplot(2, 2, 3)
plt.plot(training_history['space_utilization'])
plt.title('Space Utilization')
plt.xlabel('Episode')
plt.ylabel('Utilization %')

# Plot loss
plt.subplot(2, 2, 4)
plt.plot(training_history['loss'])
plt.title('Training Loss')
plt.xlabel('Episode')
plt.ylabel('Loss')

plt.tight_layout()
plt.savefig('final_training_results.png')
plt.show()
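
Note that `replay` decays epsilon once per training step rather than once per episode, so the exploration settings used above run out quickly. The sketch below counts how many decay steps it takes for epsilon to reach its floor; `steps_until_floor` is a hypothetical helper that mirrors the decay rule in `replay`.

```python
def steps_until_floor(epsilon=1.0, epsilon_min=0.05, epsilon_decay=0.995):
    """Count multiplicative decay steps until epsilon stops changing."""
    if not 0 < epsilon_decay < 1:
        raise ValueError("epsilon_decay must be in (0, 1) for the decay to terminate")
    steps = 0
    while epsilon > epsilon_min:  # same condition as in replay()
        epsilon *= epsilon_decay
        steps += 1
    return steps, epsilon


steps, final_epsilon = steps_until_floor()
print(steps, round(final_epsilon, 4))
```

With these values the floor is reached after roughly 600 training steps, which may be well inside the first few episodes. If longer exploration is wanted, a decay closer to 1 or a per-episode decay are possible alternatives.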

In [None]:
# Function to evaluate the trained model
def evaluate_model(model_path, num_episodes=10, visualize=True):
    """
    Evaluate a trained model on the bin packing environment
    
    Args:
        model_path: Path to the trained model
        num_episodes: Number of episodes to evaluate
        visualize: Whether to visualize the evaluation
        
    Returns:
        Evaluation metrics
    """
    # Load the trained model
    model = tf.keras.models.load_model(model_path)
    
    # Create environment for evaluation
    eval_env = CustomBinPackerEnv(
        n_bins=1,
        size=(32, 32, 32),
        k=5,
        max_bins=10,
        strict_support=True,
        min_support_percentage=0.6,
        verbose=True
    )
    
    # Create agent with loaded model and no exploration
    eval_agent = CustomRLAgent(
        env=eval_env,
        model=model,
        epsilon=0.0,  # No exploration during evaluation
        epsilon_min=0.0,
        epsilon_decay=1.0
    )
    
    # Evaluation metrics
    metrics = {
        'space_utilization': [],
        'items_packed': [],
        'bins_used': [],
        'rewards': []
    }
    
    # Run evaluation episodes
    for episode in range(num_episodes):
        state = eval_env.reset()
        total_reward = 0
        steps = 0
        
        while True:
            # Select action
            action = eval_agent.act(state)
            if action is None:
                break
                
            # Take action
            next_state, reward, done = eval_env.step(action)
            total_reward += reward
            state = next_state
            steps += 1
            
            # Visualize if requested
            if visualize and steps % 5 == 0:
                eval_agent.visualize_state(state, episode, steps)
            
            if done:
                break
        
        # Calculate metrics
        space_util = np.mean([packer.space_utilization() for packer in eval_env.used_packers]) * 100 if eval_env.used_packers else 0
        items_packed = sum(len(packer.splits) for packer in eval_env.used_packers)
        
        # Update metrics
        metrics['space_utilization'].append(space_util)
        metrics['items_packed'].append(items_packed)
        metrics['bins_used'].append(eval_env.used_bins)
        metrics['rewards'].append(total_reward)
        
        print(f"Evaluation Episode {episode+1}/{num_episodes}:")
        print(f"  Space Utilization: {space_util:.2f}%")
        print(f"  Items Packed: {items_packed}")
        print(f"  Bins Used: {eval_env.used_bins}")
        print(f"  Total Reward: {total_reward:.2f}")
    
    # Calculate average metrics
    avg_metrics = {
        'avg_space_utilization': np.mean(metrics['space_utilization']),
        'avg_items_packed': np.mean(metrics['items_packed']),
        'avg_bins_used': np.mean(metrics['bins_used']),
        'avg_reward': np.mean(metrics['rewards'])
    }
    
    print("\nEvaluation Results:")
    print(f"Average Space Utilization: {avg_metrics['avg_space_utilization']:.2f}%")
    print(f"Average Items Packed: {avg_metrics['avg_items_packed']:.2f}")
    print(f"Average Bins Used: {avg_metrics['avg_bins_used']:.2f}")
    print(f"Average Reward: {avg_metrics['avg_reward']:.2f}")
    
    return metrics, avg_metrics

# Evaluate the trained model
eval_metrics, avg_metrics = evaluate_model('bin_packing_model.h5', num_episodes=5, visualize=True)

# Plot evaluation results
plt.figure(figsize=(12, 8))

# Plot space utilization for each evaluation episode
plt.subplot(2, 2, 1)
plt.bar(range(1, len(eval_metrics['space_utilization'])+1), eval_metrics['space_utilization'])
plt.axhline(y=avg_metrics['avg_space_utilization'], color='r', linestyle='--', label=f'Avg: {avg_metrics["avg_space_utilization"]:.2f}%')
plt.title('Space Utilization')
plt.xlabel('Episode')
plt.ylabel('Utilization %')
plt.legend()

# Plot items packed for each evaluation episode
plt.subplot(2, 2, 2)
plt.bar(range(1, len(eval_metrics['items_packed'])+1), eval_metrics['items_packed'])
plt.axhline(y=avg_metrics['avg_items_packed'], color='r', linestyle='--', label=f'Avg: {avg_metrics["avg_items_packed"]:.2f}')
plt.title('Items Packed')
plt.xlabel('Episode')
plt.ylabel('Count')
plt.legend()

# Plot bins used for each evaluation episode
plt.subplot(2, 2, 3)
plt.bar(range(1, len(eval_metrics['bins_used'])+1), eval_metrics['bins_used'])
plt.axhline(y=avg_metrics['avg_bins_used'], color='r', linestyle='--', label=f'Avg: {avg_metrics["avg_bins_used"]:.2f}')
plt.title('Bins Used')
plt.xlabel('Episode')
plt.ylabel('Count')
plt.legend()

# Plot rewards for each evaluation episode
plt.subplot(2, 2, 4)
plt.bar(range(1, len(eval_metrics['rewards'])+1), eval_metrics['rewards'])
plt.axhline(y=avg_metrics['avg_reward'], color='r', linestyle='--', label=f'Avg: {avg_metrics["avg_reward"]:.2f}')
plt.title('Episode Rewards')
plt.xlabel('Episode')
plt.ylabel('Reward')
plt.legend()

plt.tight_layout()
plt.savefig('evaluation_results.png')
plt.show()

In [None]:
# Export the model to TensorFlow.js format for use in your web application
!pip install tensorflowjs

import tensorflowjs as tfjs

# Load the trained model
model = tf.keras.models.load_model('bin_packing_model.h5')

# Create output directory for the TensorFlow.js model
os.makedirs('tfjs_model', exist_ok=True)

# Convert the model to TensorFlow.js format
tfjs.converters.save_keras_model(model, 'tfjs_model')
print("Model converted to TensorFlow.js format and saved to 'tfjs_model' directory")

# Create a simple model info file with metadata
model_info = {
    "name": "3D Bin Packing Model",
    "version": "1.0",
    "description": "Reinforcement learning model for 3D bin packing",
    "input_shape": [15],  # Number of input features
    "output_shape": [1],  # Q-value output
    "date_created": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
}

with open('tfjs_model/model_info.json', 'w') as f:
    json.dump(model_info, f, indent=2)

print("Model info saved to 'tfjs_model/model_info.json'")

In [None]:
# Display instructions for using the trained model in your project
from IPython.display import Markdown

instructions = """
## Using the Trained Model in Your Project

To use the trained model in your TypeScript/JavaScript project:

1. **Copy the exported model files**:
   - Copy the entire `tfjs_model` directory to your project's assets or public directory.

2. **Install TensorFlow.js in your project**:
   ```bash
   npm install @tensorflow/tfjs
   ```
"""

display(Markdown(instructions))

In [None]:
import * as tf from '@tensorflow/tfjs';

// Load the model
async function loadModel() {
  const model = await tf.loadLayersModel('path/to/tfjs_model/model.json');
  return model;
}

In [None]:
// Convert state and action to features
function stateActionToFeatures(state, action) {
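  // Implement the same feature extraction logic as state_to_features() in the Python code.
  // It must define `features`: an array of 15 numbers in the same order used for training.
  // ...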
  
  // Return features as a tensor
  return tf.tensor2d([features], [1, 15]);
}

// Predict Q-value for a state-action pair
async function predictQValue(model, state, action) {
  const features = stateActionToFeatures(state, action);
  const prediction = model.predict(features);
  const qValue = prediction.dataSync()[0];
  features.dispose();
  prediction.dispose();
  return qValue;
}

In [None]:
// Example integration with your existing code
async function packWithRL(items, container) {
  const model = await loadModel();
  
  // Initialize state
  let state = initializeState(items, container);
  
  // Pack items one by one
  while (hasRemainingItems(state)) {
    // Generate valid actions
    const validActions = generateValidActions(state);
    
    // Select best action using the model
    let bestAction = null;
    let bestQValue = -Infinity;
    
    for (const action of validActions) {
      const qValue = await predictQValue(model, state, action);
      if (qValue > bestQValue) {
        bestQValue = qValue;
        bestAction = action;
      }
    }
    
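    // Stop if no action could be selected (e.g. no valid placements remain)
    if (bestAction === null) {
      break;
    }
    
    // Apply the selected action
    state = applyAction(state, bestAction);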
  }
  
  // Return the packed result
  return {
    packedItems: state.packedItems,
    unpackedItems: state.remainingItems,
    // ... other result properties
  };
}

## How to Use This Notebook

1. **Run the notebook in Google Colab**:
   - Upload this notebook to Google Colab
   - Make sure to select a GPU runtime for faster training

2. **Training Process**:
   - The notebook will train a reinforcement learning model specifically designed to prevent floating boxes, overlapping, and boxes extending outside the container
   - Training can take several hours, depending on the number of episodes and the runtime hardware
   - Progress will be visualized and models will be saved periodically

3. **After Training**:
   - The final model will be exported to TensorFlow.js format
   - You can download the `tfjs_model` directory and integrate it with your project
   - Follow the "Using the Trained Model in Your Project" instructions to load the model and run predictions

4. **Customization**:
   - You can adjust the training parameters, model architecture, and environment settings to suit your specific requirements
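   - The `min_support_percentage` parameter is particularly important for preventing floating boxes. Despite its name, it is a fraction between 0 and 1 (for example, 0.6 requires at least 60% of an item's base to rest on the surface beneath it), and it only takes effect when `strict_support=True`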
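
The effect of `min_support_percentage` can be seen on a tiny height map. The sketch below mirrors the support computation in `placeable_coords` using plain Python lists instead of NumPy arrays so it runs without dependencies; `support_ratio` is a hypothetical helper, not part of DeepPack3D.

```python
def support_ratio(h_map, x, z, w, d):
    """Return (resting height, supported fraction) for a w x d footprint at (x, z).

    h_map is a list of rows indexed [z][x], mirroring h_map[z:z + d, x:x + w].
    """
    cells = [h_map[row][col] for row in range(z, z + d) for col in range(x, x + w)]
    y = max(cells)  # the item rests on the highest point under its footprint
    supported = sum(1 for cell in cells if cell == y)
    return y, supported / (w * d)


# 4 x 4 height map: a 2 x 2 stack of height 3 in one corner, empty floor elsewhere.
h_map = [
    [3, 3, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

y_full, full = support_ratio(h_map, x=0, z=0, w=2, d=2)  # sits entirely on the stack
y_half, half = support_ratio(h_map, x=1, z=0, w=2, d=2)  # half of the base overhangs

for threshold in (0.5, 0.6):
    print(threshold, full >= threshold, half >= threshold)
```

The overhanging placement has a support fraction of 0.5, so it passes a threshold of 0.5 but is rejected at the 0.6 used in this notebook.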

This approach yields a specialized model that learns from experience during training. The exported model itself is static, so it improves only when you retrain it on additional packing scenarios.