# Deep RL for inventory control

# Instructions for Using This Notebook

## Overview
This notebook trains a deep reinforcement learning agent for inventory control using Hindsight Differentiable Policy Optimization (HDPO). Customize the problem setup and neural network architecture by editing configuration files before running. 

The notebook currently uses a "raw" dataset, in the sense that we do not preprocess or filter any data. For example, we take the sales data provided in the competition as is, without filtering or replacing missing or censored observations. The neural architecture is also simple: the DataDriven net is a multi-layer perceptron (MLP) that takes one "long" vector and outputs an order quantity. Better results can most likely be obtained by improving the quality of the dataset and using a more powerful neural architecture.

---

## Neural Network Inputs

The neural network receives several types of inputs, all defined in the config files. These inputs represent the state of the inventory system and relevant features.

**Note on dimensions:** You will see a "1" in many tensor dimensions below. This represents the number of stores. In this exercise, we work with a single store (dimension = 1). The sizes below are the sizes of the inputs given to the neural network, not the sizes of the tensors you need to create. For example, the complete Past Demands dataset has size `[batch, 1, total_periods]`, with one entry per period in the data, while the past demand state tensor fed to the network has size `[batch, 1, past_demand_periods]`.
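A minimal NumPy sketch of how the network input is cut from the complete dataset, assuming the dataset is stored as `[batch, 1, total_periods]` (`total_periods` and `past_window` are illustrative names, not taken from the codebase). It mirrors the slicing and left zero-padding that `make_predictions_from_config` applies to `torch` tensors later in this notebook:

```python
import numpy as np


def past_window(full: np.ndarray, t: int, past_periods: int) -> np.ndarray:
    """Return the last `past_periods` observations up to and including period t.

    full: array of shape [batch, stores, total_periods] (the complete dataset).
    Returns shape [batch, stores, past_periods], left-padded with zeros when
    fewer than `past_periods` periods of history exist.
    """
    start = max(0, t - past_periods + 1)
    window = full[:, :, start:t + 1]
    missing = past_periods - window.shape[2]
    if missing > 0:
        pad = np.zeros(window.shape[:2] + (missing,), dtype=full.dtype)
        window = np.concatenate([pad, window], axis=2)
    return window


# Complete dataset: 3 products, 1 store, 10 periods of demand
demands = np.arange(30, dtype=float).reshape(3, 1, 10)
state = past_window(demands, t=9, past_periods=4)
print(demands.shape, state.shape)  # (3, 1, 10) (3, 1, 4)
```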

### Input Tensors

1. **Inventory State** - `[batch, 1, 2]`
   - For each product (and the unique store), the first coordinate specifies the units of inventory on hand, and the second the number of units arriving at the beginning of next week

2. **Past Demands** - `[batch, 1, past_demand_periods]`
   - Historical demand for each product
   - Number of periods tracked set in config

3. **Past Instocks** - `[batch, 1, past_instock_periods]`
   - Historical stock availability (in stock or not)
   - Same structure as past demands

4. **Product Features** - `[batch, 1, number_of_product_features]`
   - Fixed product information (store ID, product ID, etc.)
   - Does not change over time

5. **Time Features** - `[features]`
   - Time-related information common across products
   - Examples: month, day of week, holidays

6. **Time-Product Features** - `[features, batch, 1, past_periods]`
   - Product-specific information that changes over time
   - Examples: promotional periods, seasonal patterns
   - `features` = number of different metrics tracked per product
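The DataDriven net takes one "long" vector per product. One plausible way to build that vector from the inputs listed above is to flatten each per-product block and concatenate them, repeating the shared time features for every product. This NumPy sketch is hypothetical: the sizes, the helper name `flatten_inputs`, and the concatenation order are assumptions, and the real order is set by the network class (presumably in `neural_networks.py`).

```python
import numpy as np

# Illustrative sizes (hypothetical, not read from any config file)
batch, past_d, past_i, n_prod_feat = 5, 4, 4, 3
n_tp_feat, past_tp = 2, 3

inventory = np.zeros((batch, 1, 2))                  # [on hand, arriving next week]
past_demands = np.zeros((batch, 1, past_d))
instocks = np.ones((batch, 1, past_i))
product_features = np.zeros((batch, 1, n_prod_feat))
time_features = np.array([4.0, 1.0])                 # e.g. month, holiday flag; shared by all products
time_product = np.zeros((n_tp_feat, batch, 1, past_tp))


def flatten_inputs(inventory, past_demands, instocks, product_features,
                   time_features, time_product):
    """Concatenate all inputs into one vector per product (hypothetical order)."""
    b = inventory.shape[0]
    parts = [
        inventory.reshape(b, -1),
        past_demands.reshape(b, -1),
        instocks.reshape(b, -1),
        product_features.reshape(b, -1),
        # Time features have no product dimension: repeat them for every product
        np.broadcast_to(time_features, (b, time_features.shape[0])),
        # [features, batch, 1, periods] -> [batch, features * periods]
        np.moveaxis(time_product, 0, 1).reshape(b, -1),
    ]
    return np.concatenate(parts, axis=1)


x = flatten_inputs(inventory, past_demands, instocks, product_features,
                   time_features, time_product)
print(x.shape)  # (5, 21): 2 + 4 + 4 + 3 + 2 + 6 columns
```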

**See example data preparation:** Check `vn2_data_analysis.ipynb` for examples of creating these dataframes and tensors. You can create your own version of these structures by, for example, preprocessing the sales data into a better form. If you do this, remember to update the filenames in the setting config file!

---

## Configuration Files

You need to edit **two config files**:

1. **Setting Config** (inventory problem setup): `config_files/settings/your_setting.yml`
2. **Policy Config** (neural network architecture): `config_files/policies_and_hyperparams/your_policy.yml`

---

## Key Parameters to Customize

### Setting Config (`config_files/settings/`)

#### Time Periods
**Location:** `sample_data_params` and `problem_params`
- Set `periods` for train, dev, and test datasets in `sample_data_params`
- **Important:** Update `periods` in `problem_params` to match the ranges specified in `sample_data_params`

#### Neural Network Inputs

**Past demand/instock history:**
- `observation_params/demand/past_periods` - past demand periods to include
- `observation_params/instock/past_periods` - past instock periods to include

**Time features:**
- `observation_params/time_features_file` - filename with time data (e.g., holidays)
- `observation_params/time_features` - column names to read and pass to neural network

**Product features:**
- `observation_params/product_features/features` - which product feature columns to use
- `store_params/product_features/features` - **must match the above list**
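Since these two lists must match exactly, a quick check right after loading the setting file can catch mistakes before training. A minimal sketch, assuming the config has already been parsed into a dict (e.g. with `yaml.safe_load`, as the notebook does); `check_product_features` and the feature names `store_id`/`product_id` are hypothetical:

```python
def check_product_features(config_setting: dict) -> list:
    """Return the product-feature list after checking both config entries agree."""
    obs = config_setting["observation_params"]["product_features"]["features"]
    store = config_setting["store_params"]["product_features"]["features"]
    # "Match exactly" is interpreted here as same names in the same order
    if list(obs) != list(store):
        raise ValueError(
            f"product feature lists differ: observation_params={obs}, store_params={store}"
        )
    return list(obs)


config = {
    "observation_params": {"product_features": {"features": ["store_id", "product_id"]}},
    "store_params": {"product_features": {"features": ["store_id", "product_id"],
                                          "file_location": "product_features.csv"}},
}
print(check_product_features(config))  # ['store_id', 'product_id']
```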

**Time-Product features:**
- `store_params/time_product_features_tensor/file_location` - filename with the tensor (will use the entire tensor)

#### Data Files
**Location:** `store_params`
- Comment out any parameters you don't need (add `#` at the start of the line). For example, if you don't need `time_product_features_tensor`, comment out all of its entries.

---

### Policy Config (`config_files/policies_and_hyperparams/`)

#### Neural Network Architecture
**Location:** `nn_params`
- `neurons_per_hidden_layer` - layer sizes (e.g., `[64, 64]` for two layers with 64 neurons)
- `inner_layer_activations` - activation functions (e.g., `relu`, `tanh`)
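To see how `neurons_per_hidden_layer` determines the layer shapes, here is a minimal NumPy forward pass for an MLP that maps one input vector per product to a single output (the order quantity). The input size of 21, the random initialization, the single activation shared by all hidden layers, and the helper names are assumptions; the actual network is built by `NeuralNetworkCreator` and may differ, for example in how the output is transformed.

```python
import numpy as np

ACTIVATIONS = {"relu": lambda z: np.maximum(z, 0.0), "tanh": np.tanh}


def build_mlp(input_dim, neurons_per_hidden_layer, output_dim=1, seed=0):
    """Create (weight, bias) pairs for input -> hidden layers -> output."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim, *neurons_per_hidden_layer, output_dim]
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def mlp_forward(layers, x, activation="relu"):
    act = ACTIVATIONS[activation]
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:  # no activation on the output layer
            x = act(x)
    return x


layers = build_mlp(input_dim=21, neurons_per_hidden_layer=[64, 64])
print([w.shape for w, _ in layers])                  # [(21, 64), (64, 64), (64, 1)]
print(mlp_forward(layers, np.ones((5, 21))).shape)   # (5, 1): one output per product
```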

Additionally, you can create your own neural network by creating a class in `neural_networks.py`, adding the class to the `get_architecture` method of `NeuralNetworkCreator`, and creating a config that calls your architecture. See `config_files/policies_and_hyperparams/data_driven_net.yml` for an example of how to do this.

#### Training
**Location:** `trainer_params` and `optimizer_params`
- `trainer_params/epochs` - number of training epochs
- `optimizer_params/learning_rate` - optimizer learning rate
- `params_by_dataset/train/batch_size` - batch size for train set (you can also change for dev and test)
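`epochs` and `batch_size` together fix how many batches a training run processes. With a PyTorch `DataLoader` and its default `drop_last=False` (the notebook creates its loaders this way), an epoch yields ceil(n_samples / batch_size) batches, the last one possibly smaller. In the real-data setting, `n_samples` is the number of products. Whether the trainer takes exactly one optimizer step per batch is not shown here. A small helper with illustrative numbers only:

```python
import math


def batches_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches a DataLoader yields per epoch when drop_last=False."""
    return math.ceil(n_samples / batch_size)


def total_batches(epochs: int, n_samples: int, batch_size: int) -> int:
    return epochs * batches_per_epoch(n_samples, batch_size)


# Illustrative values, not taken from the shipped config files
print(batches_per_epoch(n_samples=600, batch_size=256))          # 3
print(total_batches(epochs=100, n_samples=600, batch_size=256))  # 300
```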

---

## Important Notes

- **Product features consistency:** The `features` list in `observation_params/product_features` and `store_params/product_features` must match exactly
- **Comment out unused parameters:** Add `#` before any line you don't need
- **Refer to README:** For complete parameter descriptions, see the "Populating a config file" section in the main README

# Code 

## Import libraries and create helpful functions

In [1]:
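from trainer import *  # project module; the cells below rely on it for Trainer, Simulator, Scenario, DatasetCreator, DataLoader, DefaultDict, etc.
import os
from datetime import datetime
import numpy as np
import pandas as pd
import torch
import yaml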


In [2]:
def get_best_model_state(model, trainer, optimizer, config_setting, config_hyperparams):
    """
    Get the model state dictionary with the best weights (smallest dev_loss).
    
    Parameters:
    -----------
    model : torch.nn.Module
        The model instance
    trainer : Trainer
        The trainer instance containing best performance data
    optimizer : torch.optim.Optimizer
        The optimizer instance
    config_setting : dict
        Configuration dictionary for settings
    config_hyperparams : dict
        Configuration dictionary for hyperparameters
    
    Returns:
    --------
    dict
        Dictionary containing all model information and best weights
    """
    model_state = {
        'epoch': trainer.best_epoch,
        'model_state_dict': trainer.best_performance_data['model_params_to_save'],
        'optimizer_state_dict': optimizer.state_dict(),  # current optimizer state, not a best-epoch snapshot
        'best_train_loss': trainer.best_performance_data['train_loss'],
        'best_dev_loss': trainer.best_performance_data['dev_loss'],
        'all_train_losses': trainer.all_train_losses,
        'all_dev_losses': trainer.all_dev_losses,
        'all_test_losses': trainer.all_test_losses,
        'warehouse_upper_bound': model.warehouse_upper_bound,
        'config_setting': config_setting,
        'config_hyperparams': config_hyperparams,
        'save_timestamp': datetime.now().isoformat()
    }
    
    return model_state


def save_trained_model(filename, model, trainer, optimizer, config_setting, config_hyperparams):
    """
    Save the trained model that achieved the smallest dev_loss.
    
    Parameters:
    -----------
    filename : str
        The filename to save the model as (without extension)
    model : torch.nn.Module
        The model instance
    trainer : Trainer
        The trainer instance containing best performance data
    optimizer : torch.optim.Optimizer
        The optimizer instance
    config_setting : dict
        Configuration dictionary for settings
    config_hyperparams : dict
        Configuration dictionary for hyperparameters
    
    Returns:
    --------
    str
        The full path where the model was saved
    """
    # Get the best model state
    model_state = get_best_model_state(model, trainer, optimizer, config_setting, config_hyperparams)
    
    # Get today's date in YYYY_MM_DD format
    today = datetime.now().strftime("%Y_%m_%d")
    
    # Create the directory structure: saved_models/{today's date}/
    save_dir = f"saved_models/{today}"
    os.makedirs(save_dir, exist_ok=True)
    
    # Create the full path
    model_path = f"{save_dir}/{filename}.pt"
    
    # Save the model
    torch.save(model_state, model_path)
    
    print(f"Model saved successfully!")
    print(f"Path: {model_path}")
    print(f"Best dev loss: {trainer.best_performance_data['dev_loss']:.6f}")
    print(f"Best epoch: {trainer.best_epoch + 1}")
    
    return model_path
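`save_trained_model` writes checkpoints to `saved_models/<YYYY_MM_DD>/<filename>.pt`. Isolating the path logic in a pure function makes it easy to predict where a checkpoint lands. `checkpoint_path` is a hypothetical helper; it uses `os.path.join` so the separator follows the operating system, and takes the date as an argument instead of calling `datetime.now()`:

```python
import os
from datetime import datetime


def checkpoint_path(filename: str, when: datetime, base_dir: str = "saved_models") -> str:
    """Mirror the directory layout used by save_trained_model."""
    return os.path.join(base_dir, when.strftime("%Y_%m_%d"), f"{filename}.pt")


print(checkpoint_path("model_32_demands", datetime(2025, 10, 6)))
# saved_models/2025_10_06/model_32_demands.pt (on Linux/macOS)
```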

In [3]:
def make_predictions_from_config(model, device, config_setting_file=None, config_hyperparams_file=None, 
                                inventory_state_path=None, data_dir=None, time_period_idx=None, round_predictions=True):
    """
    Make predictions using a trained model by reading configuration files and automatically
    building the observation data structure.
    
    Parameters:
    -----------
    model : torch.nn.Module
        The trained model
    device : str
        Device to run inference on
    config_setting_file : str, optional
        Path to settings config file. If None, uses the current config files from the session.
    config_hyperparams_file : str, optional  
        Path to hyperparams config file. If None, uses the current config files from the session.
    inventory_state_path : str, optional
        Path to inventory state file. If None, uses default path.
    data_dir : str, optional
        Directory containing data files. If None, uses default path.
    time_period_idx : int, optional
        Time period index to use for prediction. If None, uses the last period.
    round_predictions : bool, optional
        Whether to round predictions to the nearest integer. Default is True.
        
    Returns:
    --------
    predictions : dict
        Dictionary containing model predictions
    """
    
    # Use current config if not provided
    if config_setting_file is None:
        config_setting_file = 'config_files/settings/vn2_round_1.yml'
    if config_hyperparams_file is None:
        config_hyperparams_file = 'config_files/policies_and_hyperparams/data_driven_net.yml'
    if inventory_state_path is None:
        inventory_state_path = 'vn2_processed_data/all_data/inventory_state.pt'
    if data_dir is None:
        data_dir = 'vn2_processed_data/all_data/'
    
    # Load configuration files
    print("Loading configuration files...")
    with open(config_setting_file, 'r') as file:
        config_setting = yaml.safe_load(file)
    
    with open(config_hyperparams_file, 'r') as file:
        config_hyperparams = yaml.safe_load(file)
    
    # Extract configuration parameters
    setting_keys = 'seeds', 'test_seeds', 'problem_params', 'params_by_dataset', 'observation_params', 'store_params', 'warehouse_params', 'echelon_params', 'sample_data_params'
    hyperparams_keys = 'trainer_params', 'optimizer_params', 'nn_params'
    seeds, test_seeds, problem_params, params_by_dataset, observation_params, store_params, warehouse_params, echelon_params, sample_data_params = [
        config_setting[key] for key in setting_keys
    ]
    
    observation_params = DefaultDict(lambda: None, observation_params)
    
    # Load data files
    print("Loading data files...")
    inventory_data = torch.load(inventory_state_path, map_location=device)
    
    # Load data files based on store_params configuration
    data_files = {}
    if 'demand' in store_params and 'file_location' in store_params['demand']:
        data_files['demands'] = torch.load(data_dir + store_params['demand']['file_location'].split('/')[-1], map_location=device)
    
    if 'instock' in store_params and 'file_location' in store_params['instock']:
        data_files['instocks'] = torch.load(data_dir + store_params['instock']['file_location'].split('/')[-1], map_location=device)
    
    # Load time-product features tensor if specified
    if 'time_product_features_tensor' in store_params and 'file_location' in store_params['time_product_features_tensor']:
        data_files['time_product_features'] = torch.load(
            data_dir + store_params['time_product_features_tensor']['file_location'].split('/')[-1], 
            map_location=device
        )
    
    # Load time features
    if observation_params['time_features_file']:
        date_features = pd.read_csv(data_dir + observation_params['time_features_file'].split('/')[-1])
        # Create tensor with features in the order specified by config.
        # Columns are selected by name, so this works wherever the 'date' column sits.
        if observation_params['time_features']:
            ordered_features = []
            for feature_name in observation_params['time_features']:
                if feature_name in date_features.columns and feature_name != 'date':
                    ordered_features.append(date_features[feature_name].values)
                else:
                    print(f"Warning: Feature '{feature_name}' not found in date features")
                    ordered_features.append(np.zeros(len(date_features)))
            
            date_features_tensor = torch.tensor(np.column_stack(ordered_features), dtype=torch.float32).to(device)
        else:
            # Fallback to original method if no specific order is specified
            date_features_tensor = torch.tensor(date_features.drop('date', axis=1).values, dtype=torch.float32).to(device)
    
    # Load product features
    if 'product_features' in store_params and 'file_location' in store_params['product_features']:
        product_features = pd.read_csv(data_dir + store_params['product_features']['file_location'].split('/')[-1])
        product_features_tensor = torch.tensor(product_features.values, dtype=torch.float32).to(device)
    
    # Determine time period for demand/instock data
    if time_period_idx is None:
        # Use the last period from demand data for past demands/instocks
        if 'demands' in data_files:
            time_period_idx = data_files['demands'].shape[2] - 1
        else:
            time_period_idx = 0

    # ALWAYS use the last period for time features, regardless of time_period_idx parameter
    if 'date_features_tensor' in locals():
        time_features_idx = len(date_features_tensor) - 1
    else:
        time_features_idx = time_period_idx
    
    # Prepare past demands
    if 'demands' in data_files and observation_params['demand']['past_periods'] > 0:
        past_periods = observation_params['demand']['past_periods']
        start_idx = max(0, time_period_idx - past_periods + 1)
        past_demands = data_files['demands'][:, :, start_idx:time_period_idx+1]
        # Pad with zeros if we don't have enough history
        if past_demands.shape[2] < past_periods:
            padding = torch.zeros(past_demands.shape[0], past_demands.shape[1], past_periods - past_demands.shape[2], device=device)
            past_demands = torch.cat([padding, past_demands], dim=2)
    
    # Prepare past instocks
    if 'instocks' in data_files and 'instock' in observation_params and observation_params['instock']['past_periods'] > 0:
        past_periods = observation_params['instock']['past_periods']
        start_idx = max(0, time_period_idx - past_periods + 1)
        past_instocks = data_files['instocks'][:, :, start_idx:time_period_idx+1]
        # Pad with zeros if we don't have enough history
        if past_instocks.shape[2] < past_periods:
            padding = torch.zeros(past_instocks.shape[0], past_instocks.shape[1], past_periods - past_instocks.shape[2], device=device)
            past_instocks = torch.cat([padding, past_instocks], dim=2)
    
    # Prepare past time-product features
    if 'time_product_features' in data_files and 'time_product_features_tensor' in observation_params and observation_params['time_product_features_tensor']['past_periods'] > 0:
        past_periods = observation_params['time_product_features_tensor']['past_periods']
        start_idx = max(0, time_period_idx - past_periods + 1)
        # Original shape: [features, batch, stores, periods]
        # Extract the relevant time window
        time_product_features_full = data_files['time_product_features'][:, :, :, start_idx:time_period_idx+1]
        
        # Pad with zeros if we don't have enough history
        if time_product_features_full.shape[3] < past_periods:
            n_features, n_batch, n_stores, current_periods = time_product_features_full.shape
            padding = torch.zeros(n_features, n_batch, n_stores, past_periods - current_periods, device=device)
            time_product_features_full = torch.cat([padding, time_product_features_full], dim=3)
    
    # Prepare time features for the current period (use last available period)
    if observation_params['time_features']:
        time_features_expanded = date_features_tensor[time_features_idx:time_features_idx+1].expand(inventory_data.shape[0], -1)
    
    # Prepare product features
    if 'product_features' in store_params and 'features' in store_params['product_features']:
        feature_indices = []
        for feature in store_params['product_features']['features']:
            if feature in product_features.columns:
                feature_indices.append(product_features.columns.get_loc(feature))
        product_features_subset = product_features_tensor[:, feature_indices]
    
    # Create static features from store_params
    static_features = {}
    if observation_params['include_static_features']:
        for feature_name, include in observation_params['include_static_features'].items():
            if include and feature_name in store_params:
                if 'value' in store_params[feature_name]:
                    value = store_params[feature_name]['value']
                    static_features[feature_name] = torch.full(
                        (inventory_data.shape[0], inventory_data.shape[1]), 
                        value, 
                        device=device
                    )
    
    # Build observation dictionary
    observation = {
        'store_inventories': inventory_data,
        'current_period': torch.tensor([time_period_idx], device=device),
    }
    
    # Add past demands
    if 'past_demands' in locals():
        observation['past_demands'] = past_demands
    
    # Add past instocks
    if 'past_instocks' in locals():
        observation['past_instocks'] = past_instocks
    
    # Add time-product features (each feature separately)
    if 'time_product_features_full' in locals():
        # Shape is [n_features, batch, stores, periods]
        n_features = time_product_features_full.shape[0]
        for feature_idx in range(n_features):
            # Extract one feature: [batch, stores, periods]
            feature_data = time_product_features_full[feature_idx]
            observation[f'past_time_product_feature_{feature_idx}'] = feature_data
    
    # Add product features
    if 'product_features_subset' in locals():
        observation['product_features'] = product_features_subset
    
    # Add static features
    observation.update(static_features)
    
    # Add time features
    if observation_params['time_features']:
        for i, feature_name in enumerate(observation_params['time_features']):
            if i < time_features_expanded.shape[1]:
                observation[feature_name] = time_features_expanded[:, i:i+1]
    
    # Add internal data
    internal_data = {}
    if 'demands' in data_files:
        internal_data['demands'] = data_files['demands']
    if 'instocks' in data_files:
        internal_data['instocks'] = data_files['instocks']
    if 'time_product_features' in data_files:
        internal_data['time_product_features_tensor'] = data_files['time_product_features']
    
    internal_data['period_shift'] = observation_params['demand'].get('period_shift', 0)
    observation['internal_data'] = internal_data
    
    # Make prediction
    print("Making prediction...")
    model.eval()
    with torch.no_grad():
        predictions = model(observation)
    
    # Round predictions if requested
    if round_predictions:
        predictions['stores'] = torch.round(predictions['stores'])
    
    # Debug output
    print(f"Time period index (for time features): {time_features_idx}")
    if 'past_demands' in locals():
        print(f"Past demands[0:3]: {past_demands[0:3]}")
    if 'past_instocks' in locals():
        print(f"Past instocks[0:3]: {past_instocks[0:3]}")
    if 'time_product_features_full' in locals():
        print(f"Time-product feature 0 [0:3]: {time_product_features_full[0, 0:3]}")
    print(f"Inventory state[0:3]: {inventory_data[0:3]}")
    if observation_params['time_features']:
        print(f"Time features (common to all products): {time_features_expanded[0]}")
    if 'product_features_subset' in locals():
        print(f"Product features[0:3]: {product_features_subset[0:3]}")
    print(f"Sample predictions (first 3): {predictions['stores'][:3]}")
    
    return predictions
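Time features are shared by all products, so `make_predictions_from_config` takes a single row of the time-features table (always the last one) and expands it across the batch, then stores each feature under its own name as a `[batch, 1]` entry of the observation dictionary. The same steps in NumPy, with hypothetical feature names and values (`time_feature_observation` is an illustrative name):

```python
import numpy as np


def time_feature_observation(date_features: np.ndarray, feature_names: list, batch: int) -> dict:
    """date_features: [n_periods, n_features] table; uses the last row for every product."""
    last = date_features[-1:]                                 # [1, n_features]
    expanded = np.broadcast_to(last, (batch, last.shape[1]))  # [batch, n_features], no copy
    return {name: expanded[:, i:i + 1] for i, name in enumerate(feature_names)}


table = np.array([[3.0, 0.0],
                  [4.0, 1.0]])  # two periods; columns: month, holiday (hypothetical)
obs = time_feature_observation(table, ["month", "holiday"], batch=3)
print(obs["month"].ravel(), obs["holiday"].shape)  # [4. 4. 4.] (3, 1)
```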

In [4]:
# Function to save predictions in submission format
def save_predictions_to_submission_format(predictions, output_filename="predictions_submission.csv", 
                                        reference_data_path="vn2_data/Week 0 - 2024-04-08 - Initial State.csv"):
    """
    Save model predictions in the submission template format.
    
    Parameters:
    -----------
    predictions : dict
        Dictionary containing model predictions with 'stores' key
    output_filename : str
        Name of the output CSV file
    reference_data_path : str
        Path to reference data file to get Store and Product mapping
    """
    import pandas as pd
    
    # Load reference data to get Store and Product mapping
    print(f"Loading reference data from {reference_data_path}...")
    reference_data = pd.read_csv(reference_data_path)
    
    # Extract Store and Product columns
    store_product_mapping = reference_data[['Store', 'Product']].copy()
    
    # Get predictions tensor and convert to numpy
    predictions_tensor = predictions['stores'].cpu().numpy()  # Convert to CPU numpy array
    
    # Flatten predictions to match the number of store-product combinations
    # predictions_tensor shape is [batch_size, n_stores, n_warehouses]
    # We need to flatten it to match the number of rows in reference data
    if predictions_tensor.ndim == 3:
        # If 3D, flatten the last two dimensions
        predictions_flat = predictions_tensor.reshape(-1)
    else:
        # If already 2D or 1D, use as is
        predictions_flat = predictions_tensor.flatten()
    
    # Ensure we have the right number of predictions
    if len(predictions_flat) != len(store_product_mapping):
        print(f"Warning: Number of predictions ({len(predictions_flat)}) doesn't match number of store-product combinations ({len(store_product_mapping)})")
        # Truncate or pad as needed
        if len(predictions_flat) > len(store_product_mapping):
            predictions_flat = predictions_flat[:len(store_product_mapping)]
        else:
            # Pad with zeros if we have fewer predictions
            padding = np.zeros(len(store_product_mapping) - len(predictions_flat))
            predictions_flat = np.concatenate([predictions_flat, padding])
    
    # Add predictions to the dataframe with column name '0'
    store_product_mapping['0'] = predictions_flat.astype(int)
    
    # Save to CSV
    print(f"Saving predictions to {output_filename}...")
    store_product_mapping.to_csv(output_filename, index=False)
    
    print(f"Saved {len(store_product_mapping)} predictions to {output_filename}")
    print(f"Sample predictions:")
    print(store_product_mapping.head(10))
    print(f"Prediction statistics:")
    print(f"  Mean: {store_product_mapping['0'].mean():.2f}")
    print(f"  Min: {store_product_mapping['0'].min()}")
    print(f"  Max: {store_product_mapping['0'].max()}")
    print(f"  Non-zero predictions: {(store_product_mapping['0'] > 0).sum()}")
    
    return store_product_mapping
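Two integer conversions appear in this pipeline: `make_predictions_from_config` applies `torch.round` when `round_predictions=True`, while `save_predictions_to_submission_format` casts with `astype(int)`, which truncates toward zero. If predictions are saved without rounding first, fractional orders are therefore truncated (floored, for non-negative values) rather than rounded. Note also that both `torch.round` and `np.round` round exact halves to the nearest even integer. A NumPy illustration:

```python
import numpy as np

raw = np.array([0.5, 1.5, 2.5, 2.7])

truncated = raw.astype(int)          # astype(int) alone: drop the fractional part
rounded = np.round(raw).astype(int)  # round half to even, then cast

print(truncated.tolist())  # [0, 1, 2, 2]
print(rounded.tolist())    # [0, 2, 2, 3]
```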

## Defining the config files
In this example, we will apply an MLP to a setting of one store under a lost demand assumption.
Go to the respective config files to change the hyperparameters of the neural network or the setting, or create your own configs and call them below.

In [5]:
config_setting_file = 'config_files/settings/vn2_round_1.yml'
config_hyperparams_file = 'config_files/policies_and_hyperparams/data_driven_net.yml' # Multi-layer perceptron
# config_hyperparams_file = 'config_files/policies_and_hyperparams/mean_last_x.yml'

### Here, we create the datasets, trainer, optimizer, model and other instances we will need to train our agent (I do not recommend editing this cell)

In [6]:
with open(config_setting_file, 'r') as file:
    config_setting = yaml.safe_load(file)

with open(config_hyperparams_file, 'r') as file:
    config_hyperparams = yaml.safe_load(file)

setting_keys = 'seeds', 'test_seeds', 'problem_params', 'params_by_dataset', 'observation_params', 'store_params', 'warehouse_params', 'echelon_params', 'sample_data_params'
hyperparams_keys = 'trainer_params', 'optimizer_params', 'nn_params'
seeds, test_seeds, problem_params, params_by_dataset, observation_params, store_params, warehouse_params, echelon_params, sample_data_params = [
    config_setting[key] for key in setting_keys
]

trainer_params, optimizer_params, nn_params = [config_hyperparams[key] for key in hyperparams_keys]
observation_params = DefaultDict(lambda: None, observation_params)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dataset_creator = DatasetCreator()

# For realistic data, train, dev and test sets correspond to the same products, but over disjoint periods.
# We will therefore create one scenario, and then split the data into train, dev and test sets by 
# "copying" all non-period related information, and then splitting the period related information
if sample_data_params['split_by_period']:
    
    scenario = Scenario(
        periods=None,  # period info for each dataset is given in sample_data_params
        problem_params=problem_params, 
        store_params=store_params, 
        warehouse_params=warehouse_params, 
        echelon_params=echelon_params, 
        num_samples=params_by_dataset['train']['n_samples'],  # in this case, num_samples=number of products, which has to be the same across all datasets
        observation_params=observation_params, 
        seeds=seeds
        )
    
    train_dataset, dev_dataset, test_dataset = dataset_creator.create_datasets(
        scenario, 
        split=True, 
        by_period=True, 
        periods_for_split=[sample_data_params[k] for k in ['train_periods', 'dev_periods', 'test_periods']],
        )

# For synthetic data, we will first create a scenario that we will divide into train and dev sets by sample index.
# Then, we will create a separate scenario for the test set, which will be exactly the same as the previous scenario,
# but with different seeds to generate demand traces, and with a longer time horizon.
# One can use this method of generating scenarios to train a model using some specific problem primitives, 
# and then test it on a different set of problem primitives, by simply creating a new scenario with the desired primitives.
else:
    max_periods = max(params_by_dataset['train']['periods'], params_by_dataset['dev']['periods'])
    scenario = Scenario(
        periods=max_periods, 
        # periods=params_by_dataset['train']['periods'], 
        problem_params=problem_params, 
        store_params=store_params, 
        warehouse_params=warehouse_params, 
        echelon_params=echelon_params, 
        num_samples=params_by_dataset['train']['n_samples'] + params_by_dataset['dev']['n_samples'], 
        observation_params=observation_params, 
        seeds=seeds
        )

    train_dataset, dev_dataset = dataset_creator.create_datasets(scenario, split=True, by_sample_indexes=True, sample_index_for_split=params_by_dataset['dev']['n_samples'])

    scenario = Scenario(
        params_by_dataset['test']['periods'], 
        problem_params, 
        store_params, 
        warehouse_params, 
        echelon_params, 
        params_by_dataset['test']['n_samples'], 
        observation_params, 
        test_seeds
        )

    test_dataset = dataset_creator.create_datasets(scenario, split=False)

train_loader = DataLoader(train_dataset, batch_size=params_by_dataset['train']['batch_size'], shuffle=True)
dev_loader = DataLoader(dev_dataset, batch_size=params_by_dataset['dev']['batch_size'], shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=params_by_dataset['test']['batch_size'], shuffle=False)
data_loaders = {'train': train_loader, 'dev': dev_loader, 'test': test_loader}

neural_net_creator = NeuralNetworkCreator
model = neural_net_creator().create_neural_network(scenario, nn_params, device=device)

loss_function = PolicyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=optimizer_params['learning_rate'])

simulator = Simulator(device=device)
trainer = Trainer(device=device)

# We will create a folder for each day of the year, and a subfolder for each model
# When executing with different problem primitives (i.e. instance), it might be useful to create an additional subfolder for each instance
trainer_params['base_dir'] = 'saved_models'
trainer_params['save_model_folders'] = [trainer.get_year_month_day(), nn_params['name']]

# We will simply name the model with the current time stamp
trainer_params['save_model_filename'] = trainer.get_time_stamp()

# Load previous model if load_previous_model is set to True in the config file
if trainer_params['load_previous_model']:
    print(f'Loading model from {trainer_params["load_model_path"]}')
    model, optimizer = trainer.load_model(model, optimizer, trainer_params['load_model_path'])
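For real data, the `split_by_period` branch above keeps the same products in every dataset and splits only the time axis. A minimal NumPy sketch of that idea, assuming half-open `(start, end)` period ranges; the actual format of `train_periods`, `dev_periods` and `test_periods` is whatever `DatasetCreator` expects, which this sketch does not reproduce. A real split may also need to give the dev and test windows access to earlier periods for their past-demand inputs; how `DatasetCreator` handles that is not shown here.

```python
import numpy as np


def split_by_period(demands: np.ndarray, period_ranges: list) -> list:
    """Slice [products, stores, periods] data into disjoint period windows.

    Every split keeps all products (the non-period information is shared);
    only the last axis is cut.
    """
    return [demands[:, :, start:end] for start, end in period_ranges]


# Synthetic demand: 100 products, 1 store, 52 weeks
demands = np.random.default_rng(0).poisson(3.0, size=(100, 1, 52)).astype(float)
train, dev, test = split_by_period(demands, [(0, 36), (36, 44), (44, 52)])
print(train.shape, dev.shape, test.shape)  # (100, 1, 36) (100, 1, 8) (100, 1, 8)
```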



## Training

In [7]:
# optional: load a model to continue training
load_model = False
if load_model:
    model_filename = "saved_models/2025_10_06/model_32_demands.pt"
    model, optimizer = trainer.load_model(model, optimizer, model_filename)


### Start training. You can stop training whenever you want, and you will still have access to the model later on. Training can sometimes get stuck in a bad local minimum, so multiple runs (or a different learning rate) may be necessary.

In [8]:
trainer.train(
    trainer_params['epochs'], 
    loss_function, simulator, 
    model, 
    data_loaders, 
    optimizer, 
    problem_params, 
    observation_params, 
    params_by_dataset, 
    trainer_params)

epoch: 1
Average per-period train loss: 2.802469176057318
Average per-period dev loss: 3.1358658716103416
Best per-period dev loss: 3.1358658716103416
epoch: 2
Average per-period train loss: 2.6335298692710785
Average per-period dev loss: 2.9716613133375445
Best per-period dev loss: 2.9716613133375445
epoch: 3
Average per-period train loss: 2.455733960090179
Average per-period dev loss: 2.7243597155168526
Best per-period dev loss: 2.7243597155168526
epoch: 4
Average per-period train loss: 2.147103785645235
Average per-period dev loss: 2.4199016942652167
Best per-period dev loss: 2.4199016942652167
epoch: 5
Average per-period train loss: 1.8725784153922849
Average per-period dev loss: 2.2618424781730813
Best per-period dev loss: 2.2618424781730813
epoch: 6
Average per-period train loss: 1.776724255748541
Average per-period dev loss: 2.199228091300817
Best per-period dev loss: 2.199228091300817
epoch: 7
Average per-period train loss: 1.686502828565772
Average per-period dev loss: 2.07450

KeyboardInterrupt: 
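Because a single run can stall in a poor local minimum, one option is to repeat training over a few seeds and learning rates and keep the run with the lowest dev loss. The sketch below shows this restart loop. `best_of_restarts` and `train_once` are hypothetical. A real `train_once(lr, seed)` would call `torch.manual_seed(seed)`, rebuild the model and optimizer with that learning rate, run `trainer.train`, and return the best per-period dev loss. Here a stand-in function replaces it so the loop can run on its own.

```python
import random


def best_of_restarts(train_once, learning_rates, seeds):
    """Run train_once for every (lr, seed) pair and keep the lowest dev loss.

    train_once(lr, seed) is a hypothetical callable returning the best
    per-period dev loss of a freshly initialized training run.
    """
    best = None
    for lr in learning_rates:
        for seed in seeds:
            dev_loss = train_once(lr, seed)
            if best is None or dev_loss < best[0]:
                best = (dev_loss, lr, seed)
    return best  # (dev_loss, lr, seed)


# Stand-in for a real training run: a noisy loss that is smallest at lr = 1e-3.
def fake_train_once(lr, seed):
    rng = random.Random(seed)
    return abs(lr - 1e-3) * 1000 + rng.uniform(0.0, 0.1)


dev_loss, lr, seed = best_of_restarts(fake_train_once, [1e-4, 1e-3, 1e-2], seeds=[0, 1, 2])
print(f"best dev loss {dev_loss:.3f} with lr={lr}, seed={seed}")
```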

### Set weights to the version that achieved the smallest dev loss (i.e., we are using early stopping)


In [9]:
best_model_state = get_best_model_state(model, trainer, optimizer, config_setting, config_hyperparams)
model.load_state_dict(best_model_state['model_state_dict'])

<All keys matched successfully>
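The training log reports a "Best per-period dev loss" each epoch, and the cell above restores the weights that achieved it. The sketch below shows that early-stopping bookkeeping. It assumes a copy of the state is kept whenever the dev loss improves. `BestStateTracker` is hypothetical, and a plain dict stands in for `model.state_dict()`. The copy matters because the tensors in a PyTorch `state_dict()` share storage with the live parameters, so later optimizer steps would otherwise overwrite the saved weights.

```python
import copy


class BestStateTracker:
    """Keep a deep copy of the state with the lowest dev loss seen so far."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, dev_loss, state):
        """Store a copy of state if dev_loss improves; return True when it does."""
        if dev_loss < self.best_loss:
            self.best_loss = dev_loss
            self.best_epoch = epoch
            self.best_state = copy.deepcopy(state)  # decouple from the live weights
            return True
        return False


# Dev losses printed in the training log above (epochs 1-6).
dev_losses = [3.1358658716103416, 2.9716613133375445, 2.7243597155168526,
              2.4199016942652167, 2.2618424781730813, 2.199228091300817]

tracker = BestStateTracker()
weights = {"w": 0.0}  # stand-in for model.state_dict()
for epoch, loss in enumerate(dev_losses, start=1):
    weights["w"] += 1.0  # pretend one epoch of training changed the weights
    tracker.update(epoch, loss, weights)

print(tracker.best_epoch, tracker.best_loss, tracker.best_state)
# 6 2.199228091300817 {'w': 6.0}
```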

### Optional: Save the trained model.

In [10]:
# Set save_model = True to write the trained model to disk
save_model = False
if save_model:
    # The function will save the model that achieved the smallest dev_loss during training
    model_filename = "model_32_demands"
    saved_model_path = save_trained_model(model_filename, model, trainer, optimizer, config_setting, config_hyperparams)


### Evaluate on test set

In [32]:
# optional: load a model to evaluate. Otherwise, the model from training is used.
load_model_for_test = False
if load_model_for_test:
    model_filename = "saved_models/2025_10_06/model_32_demands.pt"
    model, optimizer = trainer.load_model(model, optimizer, model_filename)
    print(f"Loaded model from {model_filename}")

In [None]:
average_test_loss, average_test_loss_to_report = trainer.test(
    loss_function, 
    simulator, 
    model, 
    data_loaders, 
    optimizer, 
    problem_params, 
    observation_params, 
    params_by_dataset, 
    discrete_allocation=True # we discretize the allocation to the nearest integer
    )

print(f'Average per-period per-product cost: {average_test_loss_to_report}')

Average per-period per-product cost: 1.6665793636917385
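With `discrete_allocation=True`, order quantities are rounded to integers before costs are computed. The reported figure is then averaged over periods and products. The sketch below shows both steps. It assumes costs are available as a nested list indexed `[product][period]` and that negative orders are clipped to zero. Both helper names are hypothetical, and `trainer.test` may normalize differently, for example in how it averages over batches.

```python
def discretize_orders(orders):
    """Round each order to the nearest integer and clip at zero (assumed behavior).

    Python's round() breaks ties toward the nearest even integer, as torch.round does.
    """
    return [max(0, round(q)) for q in orders]


def average_per_period_per_product_cost(costs):
    """costs[i][t] is the cost of product i in period t; return the mean over all entries."""
    n_products = len(costs)
    n_periods = len(costs[0])
    total = sum(sum(row) for row in costs)
    return total / (n_products * n_periods)


print(discretize_orders([2.4, 2.5, 3.5, -0.7, 7.8]))  # [2, 2, 4, 0, 8]
print(average_per_period_per_product_cost([[1.0, 3.0], [2.0, 2.0]]))  # 2.0
```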


### Create predictions using the model. For this, `config_setting_file` and `config_hyperparams_file` must match the ones used during training, since they determine how the state input for the neural network is built. If they do not match exactly, the call may raise an error or, worse, silently produce predictions from the wrong inputs. `inventory_state_path` is the path to a tensor of size [products, 2] that defines the inventory state of each product
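Since the state input is rebuilt from the config files, a cheap safeguard is to run two checks before calling the model. First, compare the observation settings used at training time with those used for prediction. Second, check that the inventory state has shape [products, 2]. The sketch below does both. It assumes both configs have already been loaded into plain dicts. The keys `past_demand_periods` and `past_instock_periods` mirror the input names described earlier but are hypothetical, as is `check_prediction_inputs`. The value 16 simply matches the length of the past-demand vectors printed below.

```python
def check_prediction_inputs(train_cfg, predict_cfg, inventory_state, keys):
    """Raise ValueError if observation settings differ or the state is not [products, 2]."""
    mismatched = {k: (train_cfg.get(k), predict_cfg.get(k))
                  for k in keys if train_cfg.get(k) != predict_cfg.get(k)}
    if mismatched:
        raise ValueError(f"config mismatch (train, predict): {mismatched}")
    bad_rows = [i for i, row in enumerate(inventory_state) if len(row) != 2]
    if bad_rows:
        raise ValueError(f"inventory state rows {bad_rows[:5]} do not have 2 entries")
    return len(inventory_state)  # number of products


keys = ["past_demand_periods", "past_instock_periods"]
train_cfg = {"past_demand_periods": 16, "past_instock_periods": 16}
predict_cfg = {"past_demand_periods": 16, "past_instock_periods": 16}
n_products = check_prediction_inputs(train_cfg, predict_cfg, [[3, 3], [1, 1], [6, 6]], keys)
print(n_products)  # 3
```

With a loaded tensor, the shape test could instead be `state.dim() == 2 and state.shape[1] == 2`. `state.unsqueeze(1)` would turn a `[products, 2]` tensor into the `[products, 1, 2]` layout printed below.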

In [None]:
# Make sure to use the correct data directory and inventory state path!
# The inputs to the neural network are printed below; check that they are correct

predictions = make_predictions_from_config(
    model, device,
    config_setting_file=config_setting_file,
    config_hyperparams_file=config_hyperparams_file,
    inventory_state_path='vn2_processed_data/new_data/inventory_state.pt',
    data_dir='vn2_processed_data/new_data/',
    round_predictions=True # we round the outputs to the nearest integer
)

# optional: save predictions in submission format
save_the_predictions = True
if save_the_predictions:
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_filename = f"predictions/submission_{timestamp}.csv"
    save_predictions_to_submission_format(
        predictions,
        output_filename=output_filename,
        reference_data_path="vn2_data/Week 0 - 2024-04-08 - Initial State.csv"
    )

Loading configuration files...
Loading data files...
Making prediction...
Time period index (f(for time features): 157
Past demands[0:3]: tensor([[[ 0.,  0.,  2.,  2.,  0.,  0.,  0.,  2.,  2.,  0.,  0.,  0.,  0.,  0.,
           2.,  2.]],

        [[ 1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.,  0.,  0.,  1.,  1.,  0.,  3.,
           1.,  1.]],

        [[ 7., 10.,  5., 11.,  7., 13.,  8., 17.,  6., 11.,  8., 12.,  6.,  7.,
           9.,  7.]]], device='cuda:0')
Past instocks[0:3]: tensor([[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],

        [[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],

        [[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]],
       device='cuda:0')
Inventory state[0:3]: tensor([[[3., 3.]],

        [[1., 1.]],

        [[6., 6.]]], device='cuda:0')
Time features (common to all products): tensor([112.,  15.,   0.,   0.,   0.,   1.,   0.,   0.,   0.,   0.,   0.,   0.,
          0.,   0.], device='cuda:0')
P