In [1]:
import sys 
sys.path.append('..')

from model_discovery import BuildSystem

In [2]:
system = BuildSystem(
    debug_steps=True,
    cache_type="diskcache",  # agent caching method
    temperature=0.1,
    jupyter=True
)

INFO:exec_utils.base.DesignerAgent:Agent name=`designer`, model_details={
    "model_name": "gpt-4o-2024-05-13",
    "max_output_tokens": 1500
}
INFO:exec_utils.base.DesignerAgent:Set up disk caching, loc=/Users/kyler/.cache/42_DesignerAgent_b59c47e9_OpenAIModel_b59c47e9
INFO:exec_utils.base.ReviewerAgent:Agent name=`reviewer`, model_details={}
INFO:exec_utils.base.ReviewerAgent:Set up disk caching, loc=/Users/kyler/.cache/42_ReviewerAgent_c455ec49_OpenAIModel_c455ec49


In [3]:
system(None)

INFO:exec_utils.base.ModelDiscoverySystem:Attempting design, attempt=0


Model authored code block...
```python

# gab.py

import torch
import torch.nn as nn

class GAB(nn.Module):
    """Generalized Autoregressive Block
        Input:        X: (batch, seqlen, embed_dim)
        Output:       Y: (batch, seqlen, embed_dim)
        Constraints:  Causal, differentiable, parameter number, complexity, parallelizable
    """
    def __init__(self, embed_dim: int, layer_idx: int, device=None, dtype=None, **kwargs):
        # kwargs: additional hyperparameters
        factory_kwargs = {"device": device, "dtype": dtype}  # pass these to every nn layer
        super().__init__()
        self.embed_dim = embed_dim
        self.layer_idx = layer_idx
        
        # Define different types of blocks based on layer_idx
        if layer_idx % 3 == 0:
            self.block = nn.Sequential(
                nn.Linear(embed_dim, embed_dim, **factory_kwargs),
                nn.ReLU(),
                nn.Linear(embed_dim, embed_dim, **factory_kwargs)
            )
        el
```

INFO:exec_utils.base.Checker:Model initialization succeed
Number of parameters: 9509376
Layers: 1316864, 219477 per layer
Embedding: 8192000
INFO:exec_utils.base.Checker:Checking causality... It checks the causality by changing the future step X[t+delta] of X[t] and see if Y[t] changes.
Causality test: 100%|██████████| 100/100 [00:04<00:00, 21.83it/s]
INFO:exec_utils.base.Checker:Causality test passed
INFO:exec_utils.base.Checker:Checking differentiability...
INFO:exec_utils.base.Checker:Differentiability test passed
INFO:exec_utils.base.Checker:All tests passed!
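
The causality check described in the log (perturb a future step `X[t+delta]` and see whether `Y[t]` changes) can be sketched roughly as follows. This is not the `Checker` implementation, which is not shown here; `CausalToyBlock`, `LeakyToyBlock`, and `is_causal` are hypothetical names introduced only for illustration, and the toy blocks stand in for a real `GAB`.

```python
import torch
import torch.nn as nn


class CausalToyBlock(nn.Module):
    """Hypothetical stand-in for a GAB: causal running mean, then a linear map."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Mean over positions <= t, so Y[t] depends only on X[:t+1].
        steps = torch.arange(1, x.shape[1] + 1, dtype=x.dtype, device=x.device)
        running_mean = x.cumsum(dim=1) / steps.view(1, -1, 1)
        return self.proj(running_mean)


class LeakyToyBlock(CausalToyBlock):
    """Deliberately non-causal: every position sees the full-sequence mean."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x.mean(dim=1, keepdim=True).expand_as(x))


@torch.no_grad()
def is_causal(block: nn.Module, embed_dim: int, seqlen: int = 16,
              trials: int = 20, atol: float = 1e-5) -> bool:
    """Return False if changing X[t+delta] ever changes Y[t]."""
    block.eval()
    for _ in range(trials):
        x = torch.randn(2, seqlen, embed_dim)
        t = torch.randint(0, seqlen - 1, (1,)).item()      # 0 .. seqlen-2
        delta = torch.randint(1, seqlen - t, (1,)).item()  # t+delta <= seqlen-1
        y_ref = block(x)
        x_perturbed = x.clone()
        x_perturbed[:, t + delta] = torch.randn(2, embed_dim)
        y_new = block(x_perturbed)
        if not torch.allclose(y_ref[:, t], y_new[:, t], atol=atol):
            return False
    return True
```

Under these assumptions, `is_causal(CausalToyBlock(32), 32)` should hold while the leaky variant should fail, because its output at every position depends on the whole sequence. The tolerance `atol` is an arbitrary choice to absorb floating-point noise.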

INFO:exec_utils.base.ModelDiscoverySystem:Now trying to compile self report...


<details><summary>code check</summary>Model initialization succeed
Number of parameters: 9509376
Layers: 1316864, 219477 per layer
Embedding: 8192000
Checking causality... It checks the causality by changing the future step X[t+delta] of X[t] and see if Y[t] changes.
Causality test passed
Checking differentiability...
Differentiability test passed
All tests passed!

</details>
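
The log does not say how differentiability is tested. One plausible reading, assumed here, is that a backward pass must deliver a finite gradient to every trainable parameter. The sketch below follows that assumption; `params_without_grad` and `BlockWithDeadBranch` are hypothetical names, and the MLP mirrors the `layer_idx % 3 == 0` branch of the authored block.

```python
import torch
import torch.nn as nn


def params_without_grad(block: nn.Module, embed_dim: int, seqlen: int = 8) -> list:
    """Names of trainable parameters that get no (or a non-finite) gradient."""
    block.train()
    block.zero_grad(set_to_none=True)
    x = torch.randn(2, seqlen, embed_dim)
    # Reduce the output to a scalar so backward() can be called directly.
    block(x).sum().backward()
    return [
        name
        for name, p in block.named_parameters()
        if p.requires_grad and (p.grad is None or not torch.isfinite(p.grad).all())
    ]


class BlockWithDeadBranch(nn.Module):
    """Hypothetical failing case: `unused` never takes part in forward()."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.used = nn.Linear(embed_dim, embed_dim)
        self.unused = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.used(x)


# Same shape of block as the first branch of the authored GAB.
mlp = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
```

Under this reading, an empty list from `params_without_grad(mlp, 32)` corresponds to a passing check, and `BlockWithDeadBranch` reports the parameters of its unused layer.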
### Report on the Design of the Generalized Autoregressive Block (GAB)

#### Introduction
The Generalized Autoregressive Block (GAB) is a novel autoregressive model block designed to achieve low perplexity, high accuracy, good scalability, and efficiency. The GAB is intended to be used within a larger language model framework, specifically the Generalized Autoregressive Model (GAM). This report explains the design choices made for the GAB, justifies these choices, and discusses how they align with the desired properties of the model.

#### Design Choices

1. **Layer Index-Based Block Differentiation**:
   - The G