<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173_Fall2025/blob/main/AI_Scientist_03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

##### **Module 6: Advanced Topics**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Biology, Health and the Environment](https://sciences.utsa.edu/bhe/), [UTSA](https://www.utsa.edu/)

### Module 6 Material

* Part 6.1: Reenforcement Learning
* **Part 6.2: AI-Scientist**
* Part 6.3: Generative AI
* Part 6.4: Text to Images with Stable Diffusion


## Google CoLab Instructions

You MUST run the following code cell to get credit for this class lesson. By running this code cell, you will map your GDrive to /content/drive and print out your Google GMAIL address. Your Instructor will use your GMAIL address to verify the author of this class lesson.

In [1]:
# You must run this cell first
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    # from google.colab import auth
    # auth.authenticate_user()
    COLAB = True
    print("Note: Using Google CoLab")
    # import requests
    # gcloud_token = !gcloud auth print-access-token
    # gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    # print(gcloud_tokeninfo['email'])
except:
    print("**WARNING**: Your GMAIL address was **not** printed in the output below.")
    print("**WARNING**: You will NOT receive credit for this lesson.")
    COLAB = False

Mounted at /content/drive
Note: Using Google CoLab


Make sure your GMAIL address is included as the last line in the output above.

# **AI-Scientist YouTube Introduction**

If you want to see a YouTube introduction to this lesson, you can run the next 2 code cells. These YouTube videos are optional.

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("BplDEidA6So", width=800, height=450)  # First video

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("hP-IzCZAZDc", width=800, height=450)  # Second video

# **AI-Scientist Project by SakanaAI**

The **AI-Scientist project** by SakanaAI is an ambitious initiative aimed at automating the entire scientific research process. It leverages advanced **Large Language Models (LLMs)** and other AI technologies to independently conduct research, from generating ideas to writing full scientific papers.

### **Below are its key features:**

**1. Automated Research Lifecycle**  
- The AI-Scientist can:
  - Brainstorm novel research ideas.
  - Write code and execute experiments.
  - Analyze results and present findings in a scientific manuscript.

**2. Peer Review and Feedback**  
- Includes an automated peer-review system to:
  - Evaluate the quality of generated papers.
  - Provide constructive feedback.
  - Iteratively improve research outputs.

**3. Applications**
- The system has been applied to various machine learning subfields such as:
  - Machine vision.
  - Diffusion models.
  - Transformers.
  - Grokking phenomena.

**4. Cost-Effectiveness**
- Each research paper can be generated at a cost of approximately **\$15**, making it highly accessible for democratizing scientific research.

**5. Open-Ended Discovery**
- Mimics the human scientific community by:
  - Continuously developing and refining ideas.
  - Creating a growing archive of knowledge.

**Impact**
This project represents a significant step toward fully autonomous scientific discovery, with the potential to accelerate progress across multiple disciplines.



## **AI-Scientist Template: MobileNetV3**

**Description:** The `AI-Scienctist` template `MobileNetV3` investigates transformer-based autoregressive next-token prediction tasks.

This description refers to a type of machine learning task where a transformer-based model is used for autoregressive next-token prediction. Here's what the terms mean:

* **Transformer-Based:** A transformer is a type of deep learning model architecture, most famously used in natural language processing (NLP). Transformers use attention mechanisms to understand and process sequences of data, such as text or code.

* **Autoregressive:** Autoregressive models predict one token (or word) at a time, based on the previous context. In this case, the model generates text by continually predicting the next token, and it uses its own previous predictions as input for subsequent steps.

* **Next-Token Prediction:** The task involves predicting the next token (e.g., word or character) in a sequence, given the preceding tokens. For example, if the input is "The quick brown", the model might predict "fox" as the next token.

In summary, this template is investigating how transformer models can be used to predict the next token in a sequence in an autoregressive manner—an approach commonly seen in language models like GPT. Let me know if you want a deeper dive into any of these concepts!

### **STEP 1: Prepare the Data**

The code in the cell below prepare the `enwik8` dataset for character-level language modeling. Instead of encoding with `GPT-2 BPE tokens`, we just map characters to `ints`. Will save `train.bin`, `val.bin` containing the `ids`, and `meta.pkl` containing the encoder and decoder and some other related info.

In [2]:
# Prepare the data

import os
import pickle
import requests
import numpy as np

__file__ = "/content/data"

# download the enwik8 dataset
input_file_path = os.path.join(os.path.dirname(__file__), 'enwik8')

if not os.path.exists(input_file_path):
    data_url = 'https://biologicslab.co/BIO1173/data/enwik8.zip'
    r = requests.get(data_url)
    with open(os.path.join(os.path.dirname(__file__), 'enwik8.zip'), 'wb') as f:
        f.write(r.content)

    # unzip the enwik8 dataset
    import zipfile
    with zipfile.ZipFile(os.path.join(os.path.dirname(__file__), 'enwik8.zip'), 'r') as zip_ref:
        zip_ref.extractall(os.path.dirname(__file__))

with open(input_file_path, 'r', encoding='latin-1') as f:
    data = f.read()
print(f"length of dataset in characters: {len(data):,}")

# get all the unique characters that occur in this text
chars = sorted(list(set(data)))
vocab_size = len(chars)
print("all the unique characters:", ''.join(chars))
print(f"vocab size: {vocab_size:,}")

# create a mapping from characters to integers
stoi = { ch:i for i,ch in enumerate(chars) }
itos = { i:ch for i,ch in enumerate(chars) }

# Create endcode/decode functions
def encode(s):
    return [stoi[c] for c in s] # encoder: take a string, output a list of integers
def decode(l):
    return ''.join([itos[i] for i in l]) # decoder: take a list of integers, output a string

# create the train, validation, and test splits
n = len(data)
num_test_chars = 5000000

# Split data
train_data = data[: -2 * num_test_chars]
val_data = data[-2 * num_test_chars: -num_test_chars]
test_data = data[-num_test_chars:]

# Encode all splits to integers
train_ids = encode(train_data)
val_ids = encode(val_data)
test_ids = encode(test_data)

# Print results
print(f"train has {len(train_ids):,} tokens")
print(f"val has {len(val_ids):,} tokens")
print(f"test has {len(test_ids):,} tokens")

# Export to integer arrays
train_ids = np.array(train_ids, dtype=np.uint16)
val_ids = np.array(val_ids, dtype=np.uint16)
test_ids = np.array(test_ids, dtype=np.uint16)

# Convert to binary files
train_ids.tofile(os.path.join(os.path.dirname(__file__), 'train.bin'))
val_ids.tofile(os.path.join(os.path.dirname(__file__), 'val.bin'))
test_ids.tofile(os.path.join(os.path.dirname(__file__), 'test.bin'))

# save the meta information as dictionary to later encode/decode
meta = {
    'vocab_size': vocab_size,
    'itos': itos,
    'stoi': stoi,
}

# Save data in the meta diction in Pickel format
with open(os.path.join(os.path.dirname(__file__), 'meta.pkl'), 'wb') as f:
    pickle.dump(meta, f)


length of dataset in characters: 100,000,000
all the unique characters: 	
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÞàáâãäåæçèéêëìíïð
vocab size: 205
train has 90,000,000 tokens
val has 5,000,000 tokens
test has 5,000,000 tokens


In [4]:
%cd /content/AI-Scientist/templates/mobilenetV3
#!python experiment.py

/content/AI-Scientist/templates/mobilenetV3


# **Create Functions**


### **1. `_make_divisible` Function**
- **Purpose**: Adjusts channel numbers to ensure they are divisible by a specified value (commonly 8).
- **Details**:
  - Rounds the channel count (`v`) to the nearest multiple of `divisor`.
  - Ensures that the new value does not deviate by more than 10% below the original.

---

### **2. `SqueezeExcitation` Block**
- **Purpose**: Implements the Squeeze-and-Excitation (SE) mechanism to recalibrate channel-wise feature responses.
- **Components**:
  - **Adaptive Average Pooling**: Aggregates spatial features into global channel-wise descriptors.
  - **Fully Connected Layers (`fc1` and `fc2`)**: Reduce and then expand channel dimensions to learn importance weights.
  - **Activation Functions**: Uses `ReLU` for intermediate activation and `Hardsigmoid` for scaling the weights.
- **Forward Pass**:
  - Computes the importance scale using pooling and fully connected layers.
  - Applies this scale to the input tensor, emphasizing relevant features.

---

### **3. `ConvNormActivation` Block**
- **Purpose**: Combines convolution, normalization, and activation into a single reusable block.
- **Parameters**:
  - Supports flexible configurations for kernel size, stride, padding, dilation, groups, etc.
  - Uses `BatchNorm2d` for normalization and `ReLU` for activation by default.
- **Details**:
  - Dynamically computes padding based on kernel size and dilation.
  - Adds a sequence of convolution, normalization, and activation layers.
- **Utility**:
  - Simplifies the construction of deep learning models by encapsulating common operations.

---

### **4. `InvertedResidualConfig` Class**
- **Purpose**: Encodes configuration details for an inverted residual block, which is a core component of MobileNet-like architectures.
- **Parameters**:
  - Includes channel dimensions (`input_channels`, `expanded_channels`, `out_channels`), kernel size, stride, dilation, activation type, and whether to use SE.
  - Applies width scaling (`width_mult`) using `_make_divisible` to adjust channel counts.
- **Utility**:
  - Provides structured data for constructing and initializing inverted residual blocks.

---

### **Overall Workflow**
These components work together to construct building blocks for resource-efficient neural networks:
1. **Squeeze-and-Excitation** adds feature recalibration.
2. **ConvNormActivation** provides modular convolutional layers with normalization and activation.
3. **InvertedResidualConfig** serves as a blueprint for defining configurations for inverted residual blocks.
4. `_make_divisible` ensures compatibility with hardware constraints by aligning channel dimensions to multiples of 8.


In [3]:
# Create Functions

import argparse
import json
import os
import random
import time
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional, Union, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


# _make_divisible function from torchvision
def _make_divisible(v: float, divisor: int, min_value: Optional[int] = None) -> int:
    """
    This function ensures that all layers have a channel number that is divisible by 8.
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that rounding down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# Squeeze-and-Excitation block
class SqueezeExcitation(nn.Module):
    def __init__(
            self,
            input_channels: int,
            squeeze_channels: int,
            activation: Callable[..., nn.Module] = nn.ReLU,
            scale_activation: Callable[..., nn.Module] = nn.Hardsigmoid,
    ) -> None:
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Conv2d(input_channels, squeeze_channels, 1)
        self.fc2 = nn.Conv2d(squeeze_channels, input_channels, 1)
        self.activation = activation(inplace=True)
        self.scale_activation = scale_activation(inplace=True)

    def _scale(self, input: torch.Tensor) -> torch.Tensor:
        scale = self.avgpool(input)
        scale = self.fc1(scale)
        scale = self.activation(scale)
        scale = self.fc2(scale)
        scale = self.scale_activation(scale)
        return scale

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        scale = self._scale(input)
        return input * scale


# ConvNormActivation block
class ConvNormActivation(nn.Sequential):
    def __init__(
            self,
            in_channels: int,
            out_channels: int,
            kernel_size: Union[int, Tuple[int]] = 3,
            stride: Union[int, Tuple[int]] = 1,
            padding: Optional[Union[int, Tuple[int], str]] = None,
            groups: int = 1,
            norm_layer: Optional[Callable[..., nn.Module]] = nn.BatchNorm2d,
            activation_layer: Optional[Callable[..., nn.Module]] = nn.ReLU,
            dilation: Union[int, Tuple[int]] = 1,
            bias: Optional[bool] = None,
    ) -> None:

        if padding is None:
            if isinstance(kernel_size, int):
                padding = (kernel_size - 1) // 2 * dilation
            else:
                padding = tuple((k - 1) // 2 * d for k, d in zip(kernel_size, dilation))
        if bias is None:
            bias = norm_layer is None

        layers = []
        layers.append(
            nn.Conv2d(
                in_channels,
                out_channels,
                kernel_size,
                stride,
                padding,
                dilation=dilation,
                groups=groups,
                bias=bias,
            )
        )

        if norm_layer is not None:
            layers.append(norm_layer(out_channels))
        if activation_layer is not None:
            layers.append(activation_layer(inplace=True))
        super().__init__(*layers)
        self.out_channels = out_channels


# InvertedResidualConfig class
class InvertedResidualConfig:
    def __init__(
            self,
            input_channels: int,
            kernel: int,
            expanded_channels: int,
            out_channels: int,
            use_se: bool,
            activation: str,
            stride: int,
            dilation: int,
            width_mult: float,
    ):
        self.input_channels = self.adjust_channels(input_channels, width_mult)
        self.kernel = kernel
        self.expanded_channels = self.adjust_channels(expanded_channels, width_mult)
        self.out_channels = self.adjust_channels(out_channels, width_mult)
        self.use_se = use_se
        self.activation = activation
        self.stride = stride
        self.dilation = dilation

    @staticmethod
    def adjust_channels(channels: int, width_mult: float):
        return _make_divisible(channels * width_mult, 8)





# **InvertedResidual block**

### **Overview of the InvertedResidual Block**
- The InvertedResidual block is a **lightweight module** that balances computational efficiency and model performance.
- It features **expansion**, **depthwise convolution**, and **projection** operations, along with optional **squeeze-and-excitation (SE)** mechanisms to improve representational power.

---

### **Key Components of the Code**

#### **1. Initialization (`__init__`)**
- **Parameters**:
  - `cnf` (InvertedResidualConfig): Specifies configuration details like input/output channels, stride, kernel size, activation type, etc.
  - `norm_layer`: Callable defining the normalization layer (e.g., BatchNorm2D).
  - `se_layer`: Callable defining the squeeze-and-excitation layer (default uses `SqueezeExcitation` with `Hardsigmoid` activation).
- **Steps**:
  1. **Residual Connection**:
     - Enabled when stride is `1` **and** input/output channels match.
  2. **Expand Phase**:
     - Expands input channels using a 1x1 pointwise convolution if `expanded_channels` differs from `input_channels`.
  3. **Depthwise Convolution**:
     - Applies depthwise separable convolution (lightweight operation).
     - Convolution groups are equal to the number of input channels, reducing complexity.
  4. **Squeeze-and-Excitation (Optional)**:
     - Enhances important features while suppressing irrelevant ones.
  5. **Projection Phase**:
     - Reduces expanded channels back to the required `out_channels` using another 1x1 pointwise convolution.

---

#### **2. Residual Connection**
- **Purpose**: Combines the input and output of the block (shortcut connection) when:
  - The stride is `1`.
  - The input and output channels match (`use_res_connect` is `True`).
- This aids in learning identity mappings and improves gradient flow.

---

#### **3. Forward Pass (`forward`)**
- **Steps**:
  1. Passes the input through the `self.block` sequence of layers.
  2. If `use_res_connect` is `True`, adds the input tensor to the block's output.
  3. Returns the result.

---

### **Detailed Breakdown of Operations**
1. **Expand Phase**:
   - Uses a 1x1 convolution to increase feature dimensions.
   - Followed by a normalization layer and activation (ReLU or Hardswish based on `cnf.activation`).

2. **Depthwise Convolution**:
   - Performs channel-wise convolution using a kernel size defined in `cnf`.
   - Supports dilation for extended receptive fields.

3. **Squeeze-and-Excitation (Optional)**:
   - Applies the SE mechanism to learn channel-wise importance.
   - The number of squeeze channels is a fraction (1/4) of `expanded_channels`, adjusted to be divisible by 8.

4. **Projection Phase**:
   - Reduces feature dimensions back to `out_channels`.
   - Final 1x1 convolution without an activation function.

---

### **Key Advantages**
- **Efficiency**: Combines depthwise convolution and pointwise convolution for fewer parameters and computations.
- **Flexibility**: The SE mechanism improves performance on a wide range of tasks.
- **Residual Connections**: Simplifies optimization by propagating gradients through shortcut paths.

---

This block is foundational for resource-efficient neural networks like MobileNetV2 and MobileNetV3. Let me know if you'd like to dive deeper into any specific part!


In [4]:
import argparse
import json
import os
import random
import time
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional, Union, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


# InvertedResidual block
class InvertedResidual(nn.Module):
    def __init__(
            self,
            cnf: InvertedResidualConfig,
            norm_layer: Callable[..., nn.Module],
            se_layer: Callable[..., nn.Module] = partial(SqueezeExcitation, scale_activation=nn.Hardsigmoid),
    ):
        super().__init__()
        if not (1 <= cnf.stride <= 2):
            raise ValueError("Illegal stride value")

        self.use_res_connect = cnf.stride == 1 and cnf.input_channels == cnf.out_channels

        layers: List[nn.Module] = []
        activation_layer = nn.Hardswish if cnf.activation == "HS" else nn.ReLU

        # Expand phase
        if cnf.expanded_channels != cnf.input_channels:
            layers.append(
                ConvNormActivation(
                    cnf.input_channels,
                    cnf.expanded_channels,
                    kernel_size=1,
                    norm_layer=norm_layer,
                    activation_layer=activation_layer,
                )
            )

        # Depthwise convolution
        layers.append(
            ConvNormActivation(
                cnf.expanded_channels,
                cnf.expanded_channels,
                kernel_size=cnf.kernel,
                stride=cnf.stride,
                groups=cnf.expanded_channels,
                norm_layer=norm_layer,
                activation_layer=activation_layer,
                dilation=cnf.dilation,
            )
        )

        # Squeeze-and-Excitation
        if cnf.use_se:
            squeeze_channels = _make_divisible(cnf.expanded_channels // 4, 8)
            layers.append(
                se_layer(
                    cnf.expanded_channels,
                    squeeze_channels,
                    activation=nn.ReLU,
                )
            )

        # Project phase
        layers.append(
            ConvNormActivation(
                cnf.expanded_channels,
                cnf.out_channels,
                kernel_size=1,
                norm_layer=norm_layer,
                activation_layer=None,
            )
        )

        self.block = nn.Sequential(*layers)
        self.out_channels = cnf.out_channels
        self.is_strided = cnf.stride > 1

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        result = self.block(input)
        if self.use_res_connect:
            return input + result
        else:
            return result




# **MobileNetV3 Small model**

### **Overview of the Code**
This code defines the `MobileNetV3Small` model, a variant of the MobileNetV3 architecture used for lightweight and efficient deep learning applications, particularly for mobile and embedded devices.

---

### **Key Components and Steps**

#### **1. Initialization (`__init__`)**
- **Purpose**: Sets up the model architecture, consisting of feature extraction, pooling, and classification components.
- **Parameters**:
  - `num_classes`: Number of output classes for the classification task (default is 1000).
  - `width_mult`: Multiplier for scaling the network width, useful for adjusting model size.
  - `dropout`: Dropout rate to prevent overfitting (default is 0.2).
  - `reduced_tail`: Reduces the number of filters in later layers if set to `True`.
  - `dilated`: Uses dilated convolutions for larger receptive fields if set to `True`.
  - `norm_layer`: Defines the normalization layer (default is `BatchNorm2d`).

---

#### **2. Inverted Residual Blocks**
- **Purpose**: Implements the core building blocks of MobileNetV3, known as *Inverted Residuals*.
- **Details**:
  - Uses a series of configurations (`InvertedResidualConfig`) to define the input/output channels, kernel sizes, and other properties for each block.
  - Layers include depthwise convolutions and pointwise convolutions with optional squeeze-and-excitation (SE) mechanisms and activation functions like ReLU (`RE`) or Hardswish (`HS`).

---

#### **3. Feature Extraction**
- The backbone of the network is built with:
  - An initial convolutional layer followed by normalization and activation (`ConvNormActivation`).
  - A sequence of **inverted residual blocks**, defined by the `inverted_residual_setting`.
  - Final layers to extract features with a pointwise convolution.

---

#### **4. Pooling and Classification**
- **Adaptive Average Pooling**:
  - Pools the extracted features into a fixed size.
- **Fully Connected Classifier**:
  - A sequential classifier that includes:
    - A linear layer to reduce dimensions.
    - Hardswish activation for non-linearity.
    - Dropout for regularization.
    - A final linear layer for output predictions (`num_classes`).

---

#### **5. Weight Initialization (`_initialize_weights`)**
- **Purpose**: Initializes weights and biases for different types of layers:
  - Convolution layers: Initialized with Kaiming normalization.
  - BatchNorm layers: Initialized with ones for weights and zeros for biases.
  - Linear layers: Initialized with Gaussian distribution.

---

#### **6. Forward Pass (`forward`)**
- **Input**: A tensor `x` (e.g., an image batch).
- **Steps**:
  1. Passes the input through the feature extraction layers (`self.features`).
  2. Applies average pooling (`self.avgpool`).
  3. Flattens the tensor into a 1D vector.
  4. Feeds the vector into the classifier for predictions.

---

### **Overall Architecture**
- The `MobileNetV3Small` model is highly efficient and lightweight, making it suitable for devices with limited computational resources.
- It uses innovations like inverted residual blocks, squeeze-and-excitation, and Hardswish activations to achieve a good balance of performance and efficiency.

---

Let me know if you'd like more details about any part of the code or the theory behind MobileNetV3!


In [5]:
import argparse
import json
import os
import random
import time
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional, Union, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# MobileNetV3 Small model
class MobileNetV3Small(nn.Module):
    def __init__(
            self,
            num_classes: int = 1000,
            width_mult: float = 1.0,
            dropout: float = 0.2,
            reduced_tail: bool = False,
            dilated: bool = False,
            norm_layer: Optional[Callable[..., nn.Module]] = None,
    ) -> None:
        super().__init__()

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)

        layers: List[nn.Module] = []

        bneck_conf = partial(InvertedResidualConfig, width_mult=width_mult)

        # Build inverted residual setting
        reduce_divider = 2 if reduced_tail else 1
        dilation = 2 if dilated else 1

        inverted_residual_setting = [
            # input_c, kernel, exp_c, out_c, se, nl, s, d
            bneck_conf(16, 3, 16, 16, True, "RE", 2, 1),
            bneck_conf(16, 3, 72, 24, False, "RE", 2, 1),
            bneck_conf(24, 3, 88, 24, False, "RE", 1, 1),
            bneck_conf(24, 5, 96, 40, True, "HS", 2, 1),
            bneck_conf(40, 5, 240, 40, True, "HS", 1, 1),
            bneck_conf(40, 5, 240, 40, True, "HS", 1, 1),
            bneck_conf(40, 5, 120, 48, True, "HS", 1, 1),
            bneck_conf(48, 5, 144, 48, True, "HS", 1, 1),
            bneck_conf(48, 5, 288 // reduce_divider, 96 // reduce_divider, True, "HS", 2, dilation),
            bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1, dilation),
            bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1, dilation),
        ]

        last_channel = _make_divisible(1024 // reduce_divider * width_mult, 8)

        # First layer
        firstconv_output_channels = inverted_residual_setting[0].input_channels
        layers.append(
            ConvNormActivation(
                3,
                firstconv_output_channels,
                kernel_size=3,
                stride=2,
                norm_layer=norm_layer,
                activation_layer=nn.Hardswish,
            )
        )

        # Building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.append(InvertedResidual(cnf, norm_layer))

        # Building last several layers
        lastconv_input_channels = inverted_residual_setting[-1].out_channels
        lastconv_output_channels = _make_divisible(576 * width_mult, 8)
        layers.append(
            ConvNormActivation(
                lastconv_input_channels,
                lastconv_output_channels,
                kernel_size=1,
                norm_layer=norm_layer,
                activation_layer=nn.Hardswish,
            )
        )

        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Linear(lastconv_output_channels, last_channel),
            nn.Hardswish(inplace=True),
            nn.Dropout(p=dropout, inplace=True),
            nn.Linear(last_channel, num_classes),
        )

        # Initialize weights
        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm, nn.SyncBatchNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, mean=0.0, std=0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x


# **Function to create the model and load pretrained weights**

### **Overview of the Code**
This code provides a full pipeline to create, configure, and initialize a `MobileNetV3 Small` deep learning model, alongside data loading and preprocessing for training and testing.

---

### **1. `mobilenet_v3_small` Function**
- **Purpose**: Builds a `MobileNetV3 Small` model and optionally loads pretrained weights.
- **Details**:
  - If `pretrained=True`, it downloads weights for the model from TorchVision's pretrained models.
  - Handles scenarios where the number of classes in the pretrained weights (`1000`) doesn't match the number of output classes defined in the model (`num_classes`).
  - If the classes differ, it discards classifier weights from the pretrained state and updates the current model's state dictionary with the appropriate layers.

---

### **2. `Config` Data Class**
- **Purpose**: Encapsulates all configurable parameters for the training process.
- **Parameters**:
  - **Data settings**: Includes dataset name (`cifar10` or `cifar100`), path, and the number of output classes (`num_classes`).
  - **Model settings**: Specifies the model architecture (e.g., `mobilenet_v3_small`).
  - **Training settings**: Contains hyperparameters like batch size, learning rate, weight decay, and the number of epochs.
  - **System settings**: Specifies the compute device (e.g., `cuda` for GPU or `cpu`) and number of data loading workers.
  - **Logging and output**: Includes intervals for logging and evaluation, and output directories for saving results.
  - **Miscellaneous**: Includes options like random seed for reproducibility and model compilation for faster training.

---

### **3. `get_data_loaders` Function**
- **Purpose**: Prepares DataLoader objects to feed training and testing data to the model.
- **Steps**:
  1. Checks the dataset specified in the `config` (currently supports `cifar10` and `cifar100`).
  2. Applies data augmentation and normalization:
     - **Data augmentation**: Random cropping and horizontal flipping are applied during training for better generalization.
     - **Normalization**: Adjusts data pixel values to have zero mean and unit variance.
  3. Downloads the datasets (CIFAR-10 or CIFAR-100) and applies the transformations.
  4. Creates `DataLoader` objects to load batches of data during training and testing.

---

### **4. Overall Workflow**
1. **Model Creation**:
   - A `MobileNetV3 Small` model is built using the `mobilenet_v3_small` function.
   - Pretrained weights can optionally be loaded and adjusted for specific tasks.
2. **Configuration**:
   - Training and system settings (e.g., device type, batch size) are initialized using the `Config` class.
3. **Data Loading**:
   - Training and testing datasets (e.g., CIFAR-10) are loaded, preprocessed, and batched using the `get_data_loaders` function.

---

This code is modular, enabling easy customization and scalability for training `MobileNetV3 Small` on different datasets. Let me know if you'd like more details on any specific part or need help extending this code!


In [6]:
# Function to create the model and load pretrained weights
import argparse
import json
import os
import random
import time
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional, Union, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def mobilenet_v3_small(pretrained=False, progress=True, **kwargs):
    model = MobileNetV3Small(**kwargs)

    if pretrained:
        # Load the torchvision model with pretrained weights
        from torchvision.models import mobilenet_v3_small as tv_mobilenet_v3_small
        from torchvision.models import MobileNet_V3_Small_Weights

        # Check for number of classes
        if kwargs.get('num_classes', 1000) != 1000:
            # We cannot load the classifier weights (different classes)
            pretrained_model = tv_mobilenet_v3_small(weights=MobileNet_V3_Small_Weights.DEFAULT, progress=progress)
            pretrained_state_dict = pretrained_model.state_dict()
            # Remove classifier weights
            pretrained_state_dict = {k: v for k, v in pretrained_state_dict.items() if not k.startswith('classifier')}
            model_dict = model.state_dict()
            print(model_dict.keys())
            # Update the model dict
            model_dict.update(pretrained_state_dict)
            model.load_state_dict(model_dict)
        else:
            # Load all weights
            pretrained_model = tv_mobilenet_v3_small(weights=MobileNet_V3_Small_Weights.DEFAULT, progress=progress)
            model.load_state_dict(pretrained_model.state_dict())

    return model


@dataclass
class Config:
    # data
    data_path: str = './data'
    dataset: str = 'cifar10'
    num_classes: int = 10
    # model
    model: str = 'mobilenet_v3_small'
    # training
    batch_size: int = 128
    learning_rate: float = 0.01
    weight_decay: float = 1e-4
    epochs: int = 2
    # system
    device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
    num_workers: int = 2
    # logging
    log_interval: int = 100
    eval_interval: int = 1000
    # output
    out_dir: str = 'run_0'
    seed: int = 0
    # compile for SPEED!
    compile_model: bool = False


def get_data_loaders(config):
    if config.dataset == 'cifar10':
        transform_train = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
        ])

        transform_test = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
        ])

        train_dataset = datasets.CIFAR10(root=config.data_path, train=True, download=True, transform=transform_train)
        test_dataset = datasets.CIFAR10(root=config.data_path, train=False, download=True, transform=transform_test)
    elif config.dataset == 'cifar100':
        # Placeholder for CIFAR-100 (for future use)
        transform_train = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize((0.5071, 0.4867, 0.4408),
                                 (0.2675, 0.2565, 0.2761)),
        ])

        transform_test = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.5071, 0.4867, 0.4408),
                                 (0.2675, 0.2565, 0.2761)),
        ])

        train_dataset = datasets.CIFAR100(root=config.data_path, train=True, download=True, transform=transform_train)
        test_dataset = datasets.CIFAR100(root=config.data_path, train=False, download=True, transform=transform_test)
        config.num_classes = 100  # Update number of classes for CIFAR-100
    else:
        raise ValueError(f"Unknown dataset: {config.dataset}")

    train_loader = DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True, num_workers=config.num_workers)
    test_loader = DataLoader(test_dataset, batch_size=config.batch_size, shuffle=False, num_workers=config.num_workers)

    return train_loader, test_loader

# **Train Model**

### **Overview of the Code**

This Python code defines a training pipeline for a deep learning model using PyTorch.

### **Key Components of the Code**

1. **Configuration Input**:
   - The `config` parameter contains settings like random seed, device type (e.g., `cpu` or `cuda`), learning rate, batch size, number of epochs, etc.
   - It ensures reproducibility of results by setting random seeds for `torch`, `numpy`, and Python's `random`.

2. **Model Initialization**:
   - A `MobileNetV3 Small` model is created and moved to the appropriate device (CPU or GPU).
   - If the `config.compile_model` flag is set, it compiles the model for improved performance.

3. **Loss Function and Optimization**:
   - The loss function is `CrossEntropyLoss`, commonly used for classification tasks.
   - The optimizer is `SGD` with momentum and weight decay, combined with a cosine annealing learning rate scheduler for dynamic learning rate adjustment over epochs.

4. **Data Loaders**:
   - The function `get_data_loaders(config)` is invoked to prepare the training and testing datasets. It loads data in batches for efficiency.

5. **Training Loop**:
   - For each epoch:
     - Sets the model to training mode (`model.train()`).
     - Iterates over the training dataset, calculates predictions, computes the loss, and updates model weights using backpropagation (`loss.backward()` and `optimizer.step()`).
     - Logs training loss, accuracy, and learning rate at intervals defined by `config.log_interval`.

6. **Validation**:
   - After training in each epoch, the `evaluate()` function computes loss and accuracy on the validation set.
   - If the validation accuracy improves, the model's state is saved to a file (`best_model.pth`).

7. **Learning Rate Scheduler**:
   - The learning rate scheduler (`CosineAnnealingLR`) adjusts the optimizer's learning rate after each epoch for better convergence.

8. **Return Values**:
   - The function returns:
     - `train_log_info`: A log of training statistics (loss, accuracy, learning rate).
     - `val_log_info`: A log of validation statistics (loss, accuracy).
     - `best_acc`: The best validation accuracy achieved during training.

---

This code is structured for training a deep learning model on image classification tasks. Let me know if you'd like to explore any part of it further!


In [7]:
# Train Model
import argparse
import json
import os
import random
import time
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional, Union, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train(config):
    # Set random seeds
    torch.manual_seed(config.seed)
    np.random.seed(config.seed)
    random.seed(config.seed)
    if config.device == 'cuda':
        torch.cuda.manual_seed_all(config.seed)

    model = mobilenet_v3_small(pretrained=False, progress=True, num_classes=config.num_classes).to(config.device)

    if config.compile_model:
        print("Compiling the model...")
        model = torch.compile(model)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=config.learning_rate, momentum=0.9, weight_decay=config.weight_decay)
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=config.epochs)

    train_loader, test_loader = get_data_loaders(config)

    best_acc = 0.0
    train_log_info = []
    val_log_info = []

    for epoch in range(config.epochs):
        model.train()
        train_loss = 0.0
        train_correct = 0
        train_total = 0

        for batch_idx, (inputs, targets) in enumerate(train_loader):
            inputs, targets = inputs.to(config.device), targets.to(config.device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()
            _, predicted = outputs.max(1)
            train_total += targets.size(0)
            train_correct += predicted.eq(targets).sum().item()

            if batch_idx % config.log_interval == 0:
                train_log_info.append({
                    'epoch': epoch,
                    'batch': batch_idx,
                    'loss': train_loss / (batch_idx + 1),
                    'acc': 100. * train_correct / train_total,
                    'lr': optimizer.param_groups[0]['lr']
                })
                print(f'Epoch: {epoch}, Batch: {batch_idx}, Loss: {train_loss / (batch_idx + 1):.3f}, '
                      f'Acc: {100. * train_correct / train_total:.3f}%, '
                      f'LR: {optimizer.param_groups[0]["lr"]:.6f}')

        val_loss, val_acc = evaluate(model, test_loader, criterion, config)
        val_log_info.append({
            'epoch': epoch,
            'loss': val_loss,
            'acc': val_acc
        })
        print(f'Validation - Loss: {val_loss:.3f}, Acc: {val_acc:.3f}%')

        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), os.path.join(config.out_dir, 'best_model.pth'))

        scheduler.step()

    return train_log_info, val_log_info, best_acc

# **Evaluate and Test Model Functions**

In [8]:
# Evaluate and Test Model Functions

import argparse
import json
import os
import random
import time
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional, Union, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def evaluate(model, dataloader, criterion, config):
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0

    with torch.no_grad():
        for inputs, targets in dataloader:
            inputs, targets = inputs.to(config.device), targets.to(config.device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            val_loss += loss.item()
            _, predicted = outputs.max(1)
            val_total += targets.size(0)
            val_correct += predicted.eq(targets).sum().item()

    val_loss = val_loss / len(dataloader)
    val_acc = 100. * val_correct / val_total

    return val_loss, val_acc


def test(config):
    model = MobileNetV3Small(num_classes=config.num_classes).to(config.device)
    if config.compile_model:
        print("Compiling the model for testing...")
        model = torch.compile(model)
    model.load_state_dict(torch.load(os.path.join(config.out_dir, 'best_model.pth')))
    _, test_loader = get_data_loaders(config)
    criterion = nn.CrossEntropyLoss()

    test_loss, test_acc = evaluate(model, test_loader, criterion, config)
    print(f'Test - Loss: {test_loss:.3f}, Acc: {test_acc:.3f}%')
    return test_loss, test_acc

# **Train Model**

In [9]:
import argparse
import json
import os
import random
import time
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional, Union, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


def main():
    parser = argparse.ArgumentParser(description="Train MobileNetV3 for Image Classification")
    parser.add_argument("--data_path", type=str, default="./data", help="Path to save/load the dataset")
    parser.add_argument("--batch_size", type=int, default=128, help="Batch size")
    parser.add_argument("--learning_rate", type=float, default=0.01, help="Initial learning rate")
    parser.add_argument("--epochs", type=int, default=30, help="Number of epochs to train")
    parser.add_argument("--out_dir", type=str, default="run_0", help="Output directory")

    # Parse arguments and ignore unrecognized ones
    args, unknown = parser.parse_known_args()

    os.makedirs(args.out_dir, exist_ok=True)
    print(f"Outputs will be saved to {args.out_dir}")

    # Define datasets and number of seeds per dataset
    datasets = ['cifar10']  # For now, only CIFAR-10; can add 'cifar100' in the future
    num_seeds = {
        'cifar10': 1  # Change the number of seeds as desired
    }

    all_results = {}
    final_infos = {}

    for dataset in datasets:
        final_info_list = []
        for seed_offset in range(num_seeds[dataset]):
            # Update the config for each run
            config = Config(
                data_path=args.data_path,
                dataset=dataset,
                batch_size=args.batch_size,
                learning_rate=args.learning_rate,
                epochs=args.epochs,
                out_dir=args.out_dir,
                seed=seed_offset  # Set the seed
            )
            os.makedirs(config.out_dir, exist_ok=True)
            print(f"Starting training for {dataset} with seed {seed_offset}")
            start_time = time.time()
            train_log_info, val_log_info, best_acc = train(config)
            total_time = time.time() - start_time

            # Run test after training
            test_loss, test_acc = test(config)

            # Prepare final_info dictionary
            final_info = {
                "best_val_acc": best_acc,
                "test_acc": test_acc,
                "total_train_time": total_time,
                "config": vars(config)
            }
            final_info_list.append(final_info)

            # Store results in all_results
            key_prefix = f"{dataset}_{seed_offset}"
            all_results[f"{key_prefix}_final_info"] = final_info
            all_results[f"{key_prefix}_train_log_info"] = train_log_info
            all_results[f"{key_prefix}_val_log_info"] = val_log_info

            print(f"Training completed for {dataset} seed {seed_offset}. Best validation accuracy: {best_acc:.2f}%, Test accuracy: {test_acc:.2f}%")

        # Aggregate results over seeds
        final_info_dict = {k: [d[k] for d in final_info_list if k in d] for k in final_info_list[0].keys()}
        means = {f"{k}_mean": np.mean(v) for k, v in final_info_dict.items() if isinstance(v[0], (int, float, float))}
        stderrs = {f"{k}_stderr": np.std(v) / np.sqrt(len(v)) for k, v in final_info_dict.items() if isinstance(v[0], (int, float, float))}
        final_infos[dataset] = {
            "means": means,
            "stderrs": stderrs,
            "final_info_dict": final_info_dict
        }

    # Save final_infos to final_info.json
    with open(os.path.join(args.out_dir, "final_info.json"), "w") as f:
        json.dump(final_infos, f, indent=2)

    # Save all_results to all_results.npy
    with open(os.path.join(args.out_dir, "all_results.npy"), "wb") as f:
        np.save(f, all_results)

    print(f"All results saved to {args.out_dir}")

# Run program
if __name__ == "__main__":
    main()

Outputs will be saved to run_0
Starting training for cifar10 with seed 0


100%|██████████| 170M/170M [00:02<00:00, 73.7MB/s]


Epoch: 0, Batch: 0, Loss: 2.305, Acc: 10.156%, LR: 0.010000
Epoch: 0, Batch: 100, Loss: 2.220, Acc: 15.873%, LR: 0.010000
Epoch: 0, Batch: 200, Loss: 2.071, Acc: 20.180%, LR: 0.010000
Epoch: 0, Batch: 300, Loss: 1.974, Acc: 23.915%, LR: 0.010000
Validation - Loss: 2.308, Acc: 10.000%
Epoch: 1, Batch: 0, Loss: 1.691, Acc: 37.500%, LR: 0.009973
Epoch: 1, Batch: 100, Loss: 1.647, Acc: 37.686%, LR: 0.009973
Epoch: 1, Batch: 200, Loss: 1.622, Acc: 38.891%, LR: 0.009973
Epoch: 1, Batch: 300, Loss: 1.598, Acc: 39.942%, LR: 0.009973
Validation - Loss: 1.486, Acc: 45.470%
Epoch: 2, Batch: 0, Loss: 1.549, Acc: 48.438%, LR: 0.009891
Epoch: 2, Batch: 100, Loss: 1.488, Acc: 45.189%, LR: 0.009891
Epoch: 2, Batch: 200, Loss: 1.484, Acc: 45.445%, LR: 0.009891
Epoch: 2, Batch: 300, Loss: 1.478, Acc: 45.824%, LR: 0.009891
Validation - Loss: 1.370, Acc: 50.100%
Epoch: 3, Batch: 0, Loss: 1.435, Acc: 47.656%, LR: 0.009755
Epoch: 3, Batch: 100, Loss: 1.428, Acc: 47.602%, LR: 0.009755
Epoch: 3, Batch: 200, L

# **NanoGPT**

**Description:** This template investigates transformer-based autoregressive next-token prediction tasks.

1. Prepare the data

In [None]:
# Step 1: Prepare the data

!python data/enwik8/prepare.py
!python data/shakespeare_char/prepare.py
!python data/text8/prepare.py

2. Create baseline runs (machine dependent)

In [None]:
%cd /content/AI-Scientist/templates/nanoGPT
!python experiment.py

/content/AI-Scientist/templates/nanoGPT
tokens per iteration will be: 16,384
found vocab_size = 65 (inside ../../data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
  scaler = torch.cuda.amp.GradScaler(enabled=(dtype == "float16"))
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
step 0: train loss 4.2874, val loss 4.2823
iter 0: loss 4.2654, time 33093.79ms
iter 10: loss 3.2457, time 10.59ms
iter 20: loss 2.7914, time 10.98ms
iter 30: loss 2.6356, time 10.61ms
iter 40: loss 2.5776, time 10.62ms
iter 50: loss 2.5276, time 10.68ms
iter 60: loss 2.5195, time 10.78ms
iter 70: loss 2.4966, time 10.57ms
iter 80: loss 2.4972, time 10.53ms
iter 90: loss 2.4686, time 10.68ms
iter 100: loss 2.4580, time 10.47ms
iter 110: loss 2.4629, time 10.80ms
iter 120: loss 2.4277, time 10.49ms
iter 130: loss 2.4116, time 10

## **2D Diffussion**

Description: This template studies improving the performance of diffusion generative models on low-dimensional datasets.

1. Install dependencies:

In [None]:
# Set up 2D Diffusion
!git clone https://github.com/gregversteeg/NPEET.git
!cd NPEET
#!pip install .
!pip install scikit-learn

2. Create baseline runs:

In [None]:
%ls

DatasaurusDozen.tsv  ema_pytorch.py  ideas.json  plot.py      seed_ideas.json
datasets.py          experiment.py   [0m[01;34mlatex[0m/      prompt.json  train_loss.png


In [None]:
# Set up 2D Diffusion baseline run

%cd /content/AI-Scientist/templates/2d_diffusion
#%cd templates/2d_diffusion
!python experiment.py --out_dir run_0
!python plot.py

/content/AI-Scientist/templates/2d_diffusion
Traceback (most recent call last):
  File "/content/AI-Scientist/templates/2d_diffusion/experiment.py", line 10, in <module>
    import npeet.entropy_estimators as ee
ModuleNotFoundError: No module named 'npeet'
Figure(1400x800)
Traceback (most recent call last):
  File "/content/AI-Scientist/templates/2d_diffusion/plot.py", line 81, in <module>
    fig, axs = plt.subplots(num_runs, 4, figsize=(14, 3 * num_runs))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/matplotlib/pyplot.py", line 1776, in subplots
    axs = fig.subplots(nrows=nrows, ncols=ncols, sharex=sharex, sharey=sharey,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/matplotlib/figure.py", line 918, in subplots
    gs = self.add_gridspec(nrows, ncols, figure=self, **gridspec_kw)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

## **Grokking**

Description: This template investigates questions about generalization and learning speed in deep neural networks.

1. Install dependencies:

In [None]:
# Set up Grokking

!pip install einops



2. Create baseline runs:

In [None]:
# Set up 2D Diffusion baseline run

%cd /content/AI-Scientist/templates/grokking

!python experiment.py --out_dir run_0
!python plot.py

/content/AI-Scientist/templates/grokking
Running x_div_y with seed offset 0
{'final_train_loss': 0.0043488978408277035, 'final_val_loss': 0.00575057789683342, 'final_train_acc': 1.0, 'final_val_acc': 1.0, 'step_val_acc_99': 4470}
Running x_div_y with seed offset 1
{'final_train_loss': 0.005255431402474642, 'final_val_loss': 0.006333010271191597, 'final_train_acc': 1.0, 'final_val_acc': 1.0, 'step_val_acc_99': 4200}
Running x_div_y with seed offset 2
{'final_train_loss': 1.6195591688156128, 'final_val_loss': 0.6360082030296326, 'final_train_acc': 0.644335925579071, 'final_val_acc': 0.873779296875, 'step_val_acc_99': 5380}
Running x_minus_y with seed offset 0
{'final_train_loss': 0.004558430518954992, 'final_val_loss': 0.005506541114300489, 'final_train_acc': 1.0, 'final_val_acc': 1.0, 'step_val_acc_99': 4210}
Running x_minus_y with seed offset 1
{'final_train_loss': 0.047082994133234024, 'final_val_loss': 0.054392553865909576, 'final_train_acc': 0.999218761920929, 'final_val_acc': 0.997

In [None]:
# perform_experiments.py

import json
import os.path as osp
import shutil
import subprocess
import sys
from subprocess import TimeoutExpired

MAX_ITERS = 4
MAX_RUNS = 5
MAX_STDERR_OUTPUT = 1500

coder_prompt = """Your goal is to implement the following idea: {title}.
The proposed experiment is as follows: {idea}.
You are given a total of up to {max_runs} runs to complete the necessary experiments. You do not need to use all {max_runs}.

First, plan the list of experiments you would like to run. For example, if you are sweeping over a specific hyperparameter, plan each value you would like to test for each run.

Note that we already provide the vanilla baseline results, so you do not need to re-run it.

For reference, the baseline results are as follows:

{baseline_results}

After you complete each change, we will run the command `python experiment.py --out_dir=run_i' where i is the run number and evaluate the results.
YOUR PROPOSED CHANGE MUST USE THIS COMMAND FORMAT, DO NOT ADD ADDITIONAL COMMAND LINE ARGS.
You can then implement the next thing on your list."""


# RUN EXPERIMENT
def run_experiment(folder_name, run_num, timeout=7200):
    cwd = osp.abspath(folder_name)
    # COPY CODE SO WE CAN SEE IT.
    shutil.copy(
        osp.join(folder_name, "experiment.py"),
        osp.join(folder_name, f"run_{run_num}.py"),
    )

    # LAUNCH COMMAND
    command = [
        "python",
        "experiment.py",
        f"--out_dir=run_{run_num}",
    ]
    try:
        result = subprocess.run(
            command, cwd=cwd, stderr=subprocess.PIPE, text=True, timeout=timeout
        )

        if result.stderr:
            print(result.stderr, file=sys.stderr)

        if result.returncode != 0:
            print(f"Run {run_num} failed with return code {result.returncode}")
            if osp.exists(osp.join(cwd, f"run_{run_num}")):
                shutil.rmtree(osp.join(cwd, f"run_{run_num}"))
            print(f"Run failed with the following error {result.stderr}")
            stderr_output = result.stderr
            if len(stderr_output) > MAX_STDERR_OUTPUT:
                stderr_output = "..." + stderr_output[-MAX_STDERR_OUTPUT:]
            next_prompt = f"Run failed with the following error {stderr_output}"
        else:
            with open(osp.join(cwd, f"run_{run_num}", "final_info.json"), "r") as f:
                results = json.load(f)
            results = {k: v["means"] for k, v in results.items()}

            next_prompt = f"""Run {run_num} completed. Here are the results:
{results}

Decide if you need to re-plan your experiments given the result (you often will not need to).

Someone else will be using `notes.txt` to perform a writeup on this in the future.
Please include *all* relevant information for the writeup on Run {run_num}, including an experiment description and the run number. Be as verbose as necessary.

Then, implement the next thing on your list.
We will then run the command `python experiment.py --out_dir=run_{run_num + 1}'.
YOUR PROPOSED CHANGE MUST USE THIS COMMAND FORMAT, DO NOT ADD ADDITIONAL COMMAND LINE ARGS.
If you are finished with experiments, respond with 'ALL_COMPLETED'."""
        return result.returncode, next_prompt
    except TimeoutExpired:
        print(f"Run {run_num} timed out after {timeout} seconds")
        if osp.exists(osp.join(cwd, f"run_{run_num}")):
            shutil.rmtree(osp.join(cwd, f"run_{run_num}"))
        next_prompt = f"Run timed out after {timeout} seconds"
        return 1, next_prompt


# RUN PLOTTING
def run_plotting(folder_name, timeout=600):
    cwd = osp.abspath(folder_name)
    # LAUNCH COMMAND
    command = [
        "python",
        "plot.py",
    ]
    try:
        result = subprocess.run(
            command, cwd=cwd, stderr=subprocess.PIPE, text=True, timeout=timeout
        )

        if result.stderr:
            print(result.stderr, file=sys.stderr)

        if result.returncode != 0:
            print(f"Plotting failed with return code {result.returncode}")
            next_prompt = f"Plotting failed with the following error {result.stderr}"
        else:
            next_prompt = ""
        return result.returncode, next_prompt
    except TimeoutExpired:
        print(f"Plotting timed out after {timeout} seconds")
        next_prompt = f"Plotting timed out after {timeout} seconds"
        return 1, next_prompt


# PERFORM EXPERIMENTS
def perform_experiments(idea, folder_name, coder, baseline_results) -> bool:
    ## RUN EXPERIMENT
    current_iter = 0
    run = 1
    next_prompt = coder_prompt.format(
        title=idea["Title"],
        idea=idea["Experiment"],
        max_runs=MAX_RUNS,
        baseline_results=baseline_results,
    )
    while run < MAX_RUNS + 1:
        if current_iter >= MAX_ITERS:
            print("Max iterations reached")
            break
        coder_out = coder.run(next_prompt)
        print(coder_out)
        if "ALL_COMPLETED" in coder_out:
            break
        return_code, next_prompt = run_experiment(folder_name, run)
        if return_code == 0:
            run += 1
            current_iter = 0
        current_iter += 1
    if current_iter >= MAX_ITERS:
        print("Not all experiments completed.")
        return False

    current_iter = 0
    next_prompt = """
Great job! Please modify `plot.py` to generate the most relevant plots for the final writeup.

In particular, be sure to fill in the "labels" dictionary with the correct names for each run that you want to plot.

Only the runs in the `labels` dictionary will be plotted, so make sure to include all relevant runs.

We will be running the command `python plot.py` to generate the plots.
"""
    while True:
        _ = coder.run(next_prompt)
        return_code, next_prompt = run_plotting(folder_name)
        current_iter += 1
        if return_code == 0 or current_iter >= MAX_ITERS:
            break
    next_prompt = """
Please modify `notes.txt` with a description of what each plot shows along with the filename of the figure. Please do so in-depth.

Somebody else will be using `notes.txt` to write a report on this in the future.
"""
    coder.run(next_prompt)

    return True

# **Run AI Scientist Paper Generation Experiments**

In [None]:
# Run paper generation

!python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment nanoGPT_lite --num-ideas 2
#!python launch_scientist.py --model "claude-3-5-sonnet-20241022" --experiment nanoGPT_lite --num-ideas

# **Getting an LLM-Generated Paper Review**


In [None]:
!pip install pypdf
!pip install pymupdf
!pip install pymupdf4llm
!pip install backoff

Collecting pymupdf4llm
  Downloading pymupdf4llm-0.0.17-py3-none-any.whl.metadata (4.1 kB)
Downloading pymupdf4llm-0.0.17-py3-none-any.whl (26 kB)
Installing collected packages: pymupdf4llm
Successfully installed pymupdf4llm-0.0.17


In [None]:
!pip install anthropic

Collecting anthropic
  Downloading anthropic-0.49.0-py3-none-any.whl.metadata (24 kB)
Downloading anthropic-0.49.0-py3-none-any.whl (243 kB)
[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/243.4 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m243.4/243.4 kB[0m [31m16.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: anthropic
Successfully installed anthropic-0.49.0


In [None]:
# llm.py

import json
import os
import re

import anthropic
import backoff
import openai
import google.generativeai as genai
from google.generativeai.types import GenerationConfig

MAX_NUM_TOKENS = 4096

AVAILABLE_LLMS = [
    # Anthropic models
    "claude-3-5-sonnet-20240620",
    "claude-3-5-sonnet-20241022",
    # OpenAI models
    "gpt-4o-mini-2024-07-18",
    "gpt-4o-2024-05-13",
    "gpt-4o-2024-08-06",
    "o1-preview-2024-09-12",
    "o1-mini-2024-09-12",
    "o1-2024-12-17",
    # OpenRouter models
    "llama3.1-405b",
    # Anthropic Claude models via Amazon Bedrock
    "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    "bedrock/anthropic.claude-3-haiku-20240307-v1:0",
    "bedrock/anthropic.claude-3-opus-20240229-v1:0",
    # Anthropic Claude models Vertex AI
    "vertex_ai/claude-3-opus@20240229",
    "vertex_ai/claude-3-5-sonnet@20240620",
    "vertex_ai/claude-3-5-sonnet-v2@20241022",
    "vertex_ai/claude-3-sonnet@20240229",
    "vertex_ai/claude-3-haiku@20240307",
    # DeepSeek models
    "deepseek-chat",
    "deepseek-coder",
    "deepseek-reasoner",
    # Google Gemini models
    "gemini-1.5-flash",
    "gemini-1.5-pro",
]


# Get N responses from a single message, used for ensembling.
@backoff.on_exception(backoff.expo, (openai.RateLimitError, openai.APITimeoutError))
def get_batch_responses_from_llm(
        msg,
        client,
        model,
        system_message,
        print_debug=False,
        msg_history=None,
        temperature=0.75,
        n_responses=1,
):
    if msg_history is None:
        msg_history = []

    if model in [
        "gpt-4o-2024-05-13",
        "gpt-4o-mini-2024-07-18",
        "gpt-4o-2024-08-06",
    ]:
        new_msg_history = msg_history + [{"role": "user", "content": msg}]
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_message},
                *new_msg_history,
            ],
            temperature=temperature,
            max_tokens=MAX_NUM_TOKENS,
            n=n_responses,
            stop=None,
            seed=0,
        )
        content = [r.message.content for r in response.choices]
        new_msg_history = [
            new_msg_history + [{"role": "assistant", "content": c}] for c in content
        ]
    elif model == "llama-3-1-405b-instruct":
        new_msg_history = msg_history + [{"role": "user", "content": msg}]
        response = client.chat.completions.create(
            model="meta-llama/llama-3.1-405b-instruct",
            messages=[
                {"role": "system", "content": system_message},
                *new_msg_history,
            ],
            temperature=temperature,
            max_tokens=MAX_NUM_TOKENS,
            n=n_responses,
            stop=None,
        )
        content = [r.message.content for r in response.choices]
        new_msg_history = [
            new_msg_history + [{"role": "assistant", "content": c}] for c in content
        ]
    else:
        content, new_msg_history = [], []
        for _ in range(n_responses):
            c, hist = get_response_from_llm(
                msg,
                client,
                model,
                system_message,
                print_debug=False,
                msg_history=None,
                temperature=temperature,
            )
            content.append(c)
            new_msg_history.append(hist)

    if print_debug:
        print()
        print("*" * 20 + " LLM START " + "*" * 20)
        for j, msg in enumerate(new_msg_history[0]):
            print(f'{j}, {msg["role"]}: {msg["content"]}')
        print(content)
        print("*" * 21 + " LLM END " + "*" * 21)
        print()

    return content, new_msg_history


@backoff.on_exception(backoff.expo, (openai.RateLimitError, openai.APITimeoutError))
def get_response_from_llm(
        msg,
        client,
        model,
        system_message,
        print_debug=False,
        msg_history=None,
        temperature=0.75,
):
    if msg_history is None:
        msg_history = []

    if "claude" in model:
        new_msg_history = msg_history + [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": msg,
                    }
                ],
            }
        ]
        response = client.messages.create(
            model=model,
            max_tokens=MAX_NUM_TOKENS,
            temperature=temperature,
            system=system_message,
            messages=new_msg_history,
        )
        content = response.content[0].text
        new_msg_history = new_msg_history + [
            {
                "role": "assistant",
                "content": [
                    {
                        "type": "text",
                        "text": content,
                    }
                ],
            }
        ]
    elif model in [
        "gpt-4o-2024-05-13",
        "gpt-4o-mini-2024-07-18",
        "gpt-4o-2024-08-06",
    ]:
        new_msg_history = msg_history + [{"role": "user", "content": msg}]
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_message},
                *new_msg_history,
            ],
            temperature=temperature,
            max_tokens=MAX_NUM_TOKENS,
            n=1,
            stop=None,
            seed=0,
        )
        content = response.choices[0].message.content
        new_msg_history = new_msg_history + [{"role": "assistant", "content": content}]
    elif model in ["o1-preview-2024-09-12", "o1-mini-2024-09-12"]:
        new_msg_history = msg_history + [{"role": "user", "content": msg}]
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": system_message},
                *new_msg_history,
            ],
            temperature=1,
            max_completion_tokens=MAX_NUM_TOKENS,
            n=1,
            seed=0,
        )
        content = response.choices[0].message.content
        new_msg_history = new_msg_history + [{"role": "assistant", "content": content}]
    elif model in ["meta-llama/llama-3.1-405b-instruct", "llama-3-1-405b-instruct"]:
        new_msg_history = msg_history + [{"role": "user", "content": msg}]
        response = client.chat.completions.create(
            model="meta-llama/llama-3.1-405b-instruct",
            messages=[
                {"role": "system", "content": system_message},
                *new_msg_history,
            ],
            temperature=temperature,
            max_tokens=MAX_NUM_TOKENS,
            n=1,
            stop=None,
        )
        content = response.choices[0].message.content
        new_msg_history = new_msg_history + [{"role": "assistant", "content": content}]
    elif model in ["deepseek-chat", "deepseek-coder"]:
        new_msg_history = msg_history + [{"role": "user", "content": msg}]
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_message},
                *new_msg_history,
            ],
            temperature=temperature,
            max_tokens=MAX_NUM_TOKENS,
            n=1,
            stop=None,
        )
        content = response.choices[0].message.content
        new_msg_history = new_msg_history + [{"role": "assistant", "content": content}]
    elif model in ["deepseek-reasoner"]:
        new_msg_history = msg_history + [{"role": "user", "content": msg}]
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_message},
                *new_msg_history,
            ],
            n=1,
            stop=None,
        )
        content = response.choices[0].message.content
        new_msg_history = new_msg_history + [{"role": "assistant", "content": content}]
    elif "gemini" in model:
        new_msg_history = msg_history + [{"role": "user", "content": msg}]
        gemini_contents = [{"role": "system", "parts": system_message}]
        for m in new_msg_history:
            gemini_contents.append({"role": m["role"], "parts": m["content"]})
        response = client.generate_content(
            contents=gemini_contents,
            generation_config=GenerationConfig(
                temperature=temperature,
                max_output_tokens=MAX_NUM_TOKENS,
                candidate_count=1,
            ),
        )
        content = response.text
        new_msg_history = new_msg_history + [{"role": "assistant", "content": content}]
    else:
        raise ValueError(f"Model {model} not supported.")

    if print_debug:
        print()
        print("*" * 20 + " LLM START " + "*" * 20)
        for j, msg in enumerate(new_msg_history):
            print(f'{j}, {msg["role"]}: {msg["content"]}')
        print(content)
        print("*" * 21 + " LLM END " + "*" * 21)
        print()

    return content, new_msg_history


def extract_json_between_markers(llm_output):
    # Regular expression pattern to find JSON content between ```json and ```
    json_pattern = r"```json(.*?)```"
    matches = re.findall(json_pattern, llm_output, re.DOTALL)

    if not matches:
        # Fallback: Try to find any JSON-like content in the output
        json_pattern = r"\{.*?\}"
        matches = re.findall(json_pattern, llm_output, re.DOTALL)

    for json_string in matches:
        json_string = json_string.strip()
        try:
            parsed_json = json.loads(json_string)
            return parsed_json
        except json.JSONDecodeError:
            # Attempt to fix common JSON issues
            try:
                # Remove invalid control characters
                json_string_clean = re.sub(r"[\x00-\x1F\x7F]", "", json_string)
                parsed_json = json.loads(json_string_clean)
                return parsed_json
            except json.JSONDecodeError:
                continue  # Try next match

    return None  # No valid JSON found


def create_client(model):
    if model.startswith("claude-"):
        print(f"Using Anthropic API with model {model}.")
        return anthropic.Anthropic(), model
    elif model.startswith("bedrock") and "claude" in model:
        client_model = model.split("/")[-1]
        print(f"Using Amazon Bedrock with model {client_model}.")
        return anthropic.AnthropicBedrock(), client_model
    elif model.startswith("vertex_ai") and "claude" in model:
        client_model = model.split("/")[-1]
        print(f"Using Vertex AI with model {client_model}.")
        return anthropic.AnthropicVertex(), client_model
    elif 'gpt' in model:
        print(f"Using OpenAI API with model {model}.")
        return openai.OpenAI(), model
    elif model in ["o1-preview-2024-09-12", "o1-mini-2024-09-12"]:
        print(f"Using OpenAI API with model {model}.")
        return openai.OpenAI(), model
    elif model in ["deepseek-chat", "deepseek-reasoner"]:
        print(f"Using OpenAI API with {model}.")
        return openai.OpenAI(
            api_key=os.environ["DEEPSEEK_API_KEY"],
            base_url="https://api.deepseek.com"
        ), model
    elif model == "llama3.1-405b":
        print(f"Using OpenAI API with {model}.")
        return openai.OpenAI(
            api_key=os.environ["OPENROUTER_API_KEY"],
            base_url="https://openrouter.ai/api/v1"
        ), "meta-llama/llama-3.1-405b-instruct"
    elif "gemini" in model:
        print(f"Using Google Generative AI with model {model}.")
        genai.configure(api_key=os.environ["GEMINI_API_KEY"])
        client = genai.GenerativeModel(model)
        return client, model
    else:
        raise ValueError(f"Model {model} not supported.")

In [None]:
# perform_writeup

import argparse
import json
import os
import os.path as osp
import re
import shutil
import subprocess
from typing import Optional, Tuple

from ai_scientist.generate_ideas import search_for_papers
from ai_scientist.llm import get_response_from_llm, extract_json_between_markers, create_client, AVAILABLE_LLMS


# GENERATE LATEX
def generate_latex(coder, folder_name, pdf_file, timeout=30, num_error_corrections=5):
    folder = osp.abspath(folder_name)
    cwd = osp.join(folder, "latex")  # Fixed potential issue with path
    writeup_file = osp.join(cwd, "template.tex")

    # Check all references are valid and in the references.bib file
    with open(writeup_file, "r") as f:
        tex_text = f.read()
    cites = re.findall(r"\\cite[a-z]*{([^}]*)}", tex_text)
    references_bib = re.search(
        r"\\begin{filecontents}{references.bib}(.*?)\\end{filecontents}",
        tex_text,
        re.DOTALL,
    )
    if references_bib is None:
        print("No references.bib found in template.tex")
        return
    bib_text = references_bib.group(1)
    cites = [cite.strip() for item in cites for cite in item.split(",")]
    for cite in cites:
        if cite not in bib_text:
            print(f"Reference {cite} not found in references.")
            prompt = f"""Reference {cite} not found in references.bib. Is this included under a different name?
If so, please modify the citation in template.tex to match the name in references.bib at the top. Otherwise, remove the cite."""
            coder.run(prompt)

    # Check all included figures are actually in the directory.
    with open(writeup_file, "r") as f:
        tex_text = f.read()
    referenced_figs = re.findall(r"\\includegraphics.*?{(.*?)}", tex_text)
    all_figs = [f for f in os.listdir(folder) if f.endswith(".png")]
    for figure in referenced_figs:
        if figure not in all_figs:
            print(f"Figure {figure} not found in directory.")
            prompt = f"""The image {figure} not found in the directory. The images in the directory are: {all_figs}.
Please ensure that the figure is in the directory and that the filename is correct. Check the notes to see what each figure contains."""
            coder.run(prompt)

    # Remove duplicate figures.
    with open(writeup_file, "r") as f:
        tex_text = f.read()
    referenced_figs = re.findall(r"\\includegraphics.*?{(.*?)}", tex_text)
    duplicates = {x for x in referenced_figs if referenced_figs.count(x) > 1}
    if duplicates:
        for dup in duplicates:
            print(f"Duplicate figure found: {dup}.")
            prompt = f"""Duplicate figures found: {dup}. Ensure any figure is only included once.
If duplicated, identify the best location for the figure and remove any other."""
            coder.run(prompt)

    # Remove duplicate section headers.
    with open(writeup_file, "r") as f:
        tex_text = f.read()
    sections = re.findall(r"\\section{([^}]*)}", tex_text)
    duplicates = {x for x in sections if sections.count(x) > 1}
    if duplicates:
        for dup in duplicates:
            print(f"Duplicate section header found: {dup}")
            prompt = f"""Duplicate section header found: {dup}. Ensure any section header is declared once.
If duplicated, identify the best location for the section header and remove any other."""
            coder.run(prompt)

    # Iteratively fix any LaTeX bugs
    for i in range(num_error_corrections):
        # Filter trivial bugs in chktex
        check_output = os.popen(f"chktex {writeup_file} -q -n2 -n24 -n13 -n1").read()
        if check_output:
            prompt = f"""Please fix the following LaTeX errors in `template.tex` guided by the output of `chktek`:
{check_output}.

Make the minimal fix required and do not remove or change any packages.
Pay attention to any accidental uses of HTML syntax, e.g. </end instead of \\end.
"""
            coder.run(prompt)
        else:
            break
    compile_latex(cwd, pdf_file, timeout=timeout)


def compile_latex(cwd, pdf_file, timeout=30):
    print("GENERATING LATEX")

    commands = [
        ["pdflatex", "-interaction=nonstopmode", "template.tex"],
        ["bibtex", "template"],
        ["pdflatex", "-interaction=nonstopmode", "template.tex"],
        ["pdflatex", "-interaction=nonstopmode", "template.tex"],
    ]

    for command in commands:
        try:
            result = subprocess.run(
                command,
                cwd=cwd,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                text=True,
                timeout=timeout,
            )
            print("Standard Output:\n", result.stdout)
            print("Standard Error:\n", result.stderr)
        except subprocess.TimeoutExpired:
            print(f"Latex timed out after {timeout} seconds")
        except subprocess.CalledProcessError as e:
            print(f"Error running command {' '.join(command)}: {e}")

    print("FINISHED GENERATING LATEX")

    # Attempt to move the PDF to the desired location
    try:
        shutil.move(osp.join(cwd, "template.pdf"), pdf_file)
    except FileNotFoundError:
        print("Failed to rename PDF.")


per_section_tips = {
    "Abstract": """
- TL;DR of the paper
- What are we trying to do and why is it relevant?
- Why is this hard?
- How do we solve it (i.e. our contribution!)
- How do we verify that we solved it (e.g. Experiments and results)

Please make sure the abstract reads smoothly and is well-motivated. This should be one continuous paragraph with no breaks between the lines.
""",
    "Introduction": """
- Longer version of the Abstract, i.e. of the entire paper
- What are we trying to do and why is it relevant?
- Why is this hard?
- How do we solve it (i.e. our contribution!)
- How do we verify that we solved it (e.g. Experiments and results)
- New trend: specifically list your contributions as bullet points
- Extra space? Future work!
""",
    "Related Work": """
- Academic siblings of our work, i.e. alternative attempts in literature at trying to solve the same problem.
- Goal is to “Compare and contrast” - how does their approach differ in either assumptions or method? If their method is applicable to our Problem Setting I expect a comparison in the experimental section. If not, there needs to be a clear statement why a given method is not applicable.
- Note: Just describing what another paper is doing is not enough. We need to compare and contrast.
""",
    "Background": """
- Academic Ancestors of our work, i.e. all concepts and prior work that are required for understanding our method.
- Usually includes a subsection, Problem Setting, which formally introduces the problem setting and notation (Formalism) for our method. Highlights any specific assumptions that are made that are unusual.
- Note: If our paper introduces a novel problem setting as part of its contributions, it's best to have a separate Section.
""",
    "Method": """
- What we do. Why we do it. All described using the general Formalism introduced in the Problem Setting and building on top of the concepts / foundations introduced in Background.
""",
    "Experimental Setup": """
- How do we test that our stuff works? Introduces a specific instantiation of the Problem Setting and specific implementation details of our Method for this Problem Setting.
- Do not imagine unknown hardware details.
- Includes a description of the dataset, evaluation metrics, important hyperparameters, and implementation details.
""",
    "Results": """
- Shows the results of running Method on our problem described in Experimental Setup.
- Includes statements on hyperparameters and other potential issues of fairness.
- Only includes results that have actually been run and saved in the logs. Do not hallucinate results that don't exist.
- If results exist: compares to baselines and includes statistics and confidence intervals.
- If results exist: includes ablation studies to show that specific parts of the method are relevant.
- Discusses limitations of the method.
- Make sure to include all the results from the experiments, and include all relevant figures.
""",
    "Conclusion": """
- Brief recap of the entire paper.
- To keep going with the analogy, you can think of future work as (potential) academic offspring.
""",
}

error_list = """- Unenclosed math symbols
- Only reference figures that exist in our directory
- LaTeX syntax errors
- Numerical results that do not come from explicit experiments and logs
- Repeatedly defined figure labels
- References to papers that are not in the .bib file, DO NOT ADD ANY NEW CITATIONS!
- Unnecessary verbosity or repetition, unclear text
- Results or insights in the `notes.txt` that have not yet need included
- Any relevant figures that have not yet been included in the text
- Closing any \\begin{{figure}} with a \\end{{figure}} and \\begin{{table}} with a \\end{{table}}, etc.
- Duplicate headers, e.g. duplicated \\section{{Introduction}} or \\end{{document}}
- Unescaped symbols, e.g. shakespeare_char should be shakespeare\\_char in text
- Incorrect closing of environments, e.g. </end{{figure}}> instead of \\end{{figure}}
"""

refinement_prompt = (
    """Great job! Now criticize and refine only the {section} that you just wrote.
Make this complete in this pass, do not leave any placeholders.

Pay particular attention to fixing any errors such as:
"""
    + error_list
)

second_refinement_prompt = (
    """Criticize and refine the {section} only. Recall the advice:
{tips}
Make this complete in this pass, do not leave any placeholders.

Pay attention to how it fits in with the rest of the paper.
Identify any redundancies (e.g. repeated figures or repeated text), if there are any, decide where in the paper things should be cut.
Identify where we can save space, and be more concise without weakening the message of the text.
Fix any remaining errors as before:
"""
    + error_list
)

# CITATION HELPERS
citation_system_msg = """You are an ambitious AI PhD student who is looking to publish a paper that will contribute significantly to the field.
You have already written an initial draft of the paper and now you are looking to add missing citations to related papers throughout the paper.
The related work section already has some initial comments on which papers to add and discuss.

Focus on completing the existing write-up and do not add entirely new elements unless necessary.
Ensure every point in the paper is substantiated with sufficient evidence.
Feel free to add more cites to a particular point if there is only one or two references.
Ensure no paper is cited without a corresponding reference in the `references.bib` file.
Ensure each paragraph of the related work has sufficient background, e.g. a few papers cited.
You will be given access to the Semantic Scholar API, only add citations that you have found using the API.
Aim to discuss a broad range of relevant papers, not just the most popular ones.
Make sure not to copy verbatim from prior literature to avoid plagiarism.

You will be prompted to give a precise description of where and how to add the cite, and a search query for the paper to be cited.
Finally, you will select the most relevant cite from the search results (top 10 results will be shown).
You will have {total_rounds} rounds to add to the references, but do not need to use them all.

DO NOT ADD A CITATION THAT ALREADY EXISTS!"""

citation_first_prompt = '''Round {current_round}/{total_rounds}:

You have written this LaTeX draft so far:

"""
{draft}
"""

Identify the most important citation that you still need to add, and the query to find the paper.

Respond in the following format:

THOUGHT:
<THOUGHT>

RESPONSE:
```json
<JSON>
```

In <THOUGHT>, first briefly reason over the paper and identify where citations should be added.
If no more citations are needed, add "No more citations needed" to your thoughts.
Do not add "No more citations needed" if you are adding citations this round.

In <JSON>, respond in JSON format with the following fields:
- "Description": A precise description of the required edit, along with the proposed text and location where it should be made.
- "Query": The search query to find the paper (e.g. attention is all you need).

Ensure the description is sufficient to make the change without further context. Someone else will make the change.
The query will work best if you are able to recall the exact name of the paper you are looking for, or the authors.
This JSON will be automatically parsed, so ensure the format is precise.'''

citation_second_prompt = """Search has recovered the following articles:

{papers}

Respond in the following format:

THOUGHT:
<THOUGHT>

RESPONSE:
```json
<JSON>
```

In <THOUGHT>, first briefly reason over the search results and identify which citation best fits your paper and the location is to be added at.
If none are appropriate, add "Do not add any" to your thoughts.

In <JSON>, respond in JSON format with the following fields:
- "Selected": A list of the indices of the selected papers to be cited, e.g. "[0, 1]". Can be "[]" if no papers are selected. This must be a string.
- "Description": Update the previous description of the required edit if needed. Ensure that any cites precisely match the name in the bibtex!!!

Do not select papers that are already in the `references.bib` file at the top of the draft, or if the same citation exists under a different name.
This JSON will be automatically parsed, so ensure the format is precise."""


def get_citation_aider_prompt(
        client, model, draft, current_round, total_rounds, engine="semanticscholar"
) -> Tuple[Optional[str], bool]:
    msg_history = []
    try:
        text, msg_history = get_response_from_llm(
            citation_first_prompt.format(
                draft=draft, current_round=current_round, total_rounds=total_rounds
            ),
            client=client,
            model=model,
            system_message=citation_system_msg.format(total_rounds=total_rounds),
            msg_history=msg_history,
        )
        if "No more citations needed" in text:
            print("No more citations needed.")
            return None, True

        ## PARSE OUTPUT
        json_output = extract_json_between_markers(text)
        assert json_output is not None, "Failed to extract JSON from LLM output"
        query = json_output["Query"]
        papers = search_for_papers(query, engine=engine)
    except Exception as e:
        print(f"Error: {e}")
        return None, False

    if papers is None:
        print("No papers found.")
        return None, False

    paper_strings = []
    for i, paper in enumerate(papers):
        paper_strings.append(
            """{i}: {title}. {authors}. {venue}, {year}.\nAbstract: {abstract}""".format(
                i=i,
                title=paper["title"],
                authors=paper["authors"],
                venue=paper["venue"],
                year=paper["year"],
                abstract=paper["abstract"],
            )
        )
    papers_str = "\n\n".join(paper_strings)

    try:
        text, msg_history = get_response_from_llm(
            citation_second_prompt.format(
                papers=papers_str,
                current_round=current_round,
                total_rounds=total_rounds,
            ),
            client=client,
            model=model,
            system_message=citation_system_msg.format(total_rounds=total_rounds),
            msg_history=msg_history,
        )
        if "Do not add any" in text:
            print("Do not add any.")
            return None, False
        ## PARSE OUTPUT
        json_output = extract_json_between_markers(text)
        assert json_output is not None, "Failed to extract JSON from LLM output"
        desc = json_output["Description"]
        selected_papers = json_output["Selected"]
        selected_papers = str(selected_papers)

        # convert to list
        if selected_papers != "[]":
            selected_papers = list(map(int, selected_papers.strip("[]").split(",")))
            assert all(
                [0 <= i < len(papers) for i in selected_papers]
            ), "Invalid paper index"
            bibtexs = [papers[i]["citationStyles"]["bibtex"] for i in selected_papers]
            bibtex_string = "\n".join(bibtexs)
        else:
            return None, False

    except Exception as e:
        print(f"Error: {e}")
        return None, False

    # Add citation to draft
    aider_format = '''The following citations have just been added to the end of the `references.bib` file definition at the top of the file:
"""
{bibtex}
"""
You do not need to add them yourself.
ABSOLUTELY DO NOT ADD IT AGAIN!!!

Make the proposed change to the draft incorporating these new cites:
{description}

Use your judgment for whether these should be cited anywhere else.
Make sure that any citation precisely matches the name in `references.bib`. Change its name to the correct name in the bibtex if needed.
Ensure the citation is well-integrated into the text.'''

    aider_prompt = (
            aider_format.format(bibtex=bibtex_string, description=desc)
            + """\n You must use \cite or \citet to reference papers, do not manually type out author names."""
    )
    return aider_prompt, False


# PERFORM WRITEUP
def perform_writeup(
        idea, folder_name, coder, cite_client, cite_model, num_cite_rounds=20, engine="semanticscholar"
):
    # CURRENTLY ASSUMES LATEX
    abstract_prompt = f"""We've provided the `latex/template.tex` file to the project. We will be filling it in section by section.

First, please fill in the "Title" and "Abstract" sections of the writeup.

Some tips are provided below:
{per_section_tips["Abstract"]}

Before every paragraph, please include a brief description of what you plan to write in that paragraph in a comment.

Be sure to first name the file and use *SEARCH/REPLACE* blocks to perform these edits.
"""
    coder_out = coder.run(abstract_prompt)
    coder_out = coder.run(
        refinement_prompt.format(section="Abstract")
        .replace(r"{{", "{")
        .replace(r"}}", "}")
    )
    for section in [
        "Introduction",
        "Background",
        "Method",
        "Experimental Setup",
        "Results",
        "Conclusion",
    ]:
        section_prompt = f"""Please fill in the {section} of the writeup. Some tips are provided below:
{per_section_tips[section]}

Be sure to use \cite or \citet where relevant, referring to the works provided in the file.
Do not cite anything that is not already in `references.bib`. Do not add any new entries to this.

Keep the experimental results (figures and tables) only in the Results section, and make sure that any captions are filled in.
In this pass, do not reference anything in later sections of the paper.

Before every paragraph, please include a brief description of what you plan to write in that paragraph in a comment.

Be sure to first name the file and use *SEARCH/REPLACE* blocks to perform these edits.
"""
        coder_out = coder.run(section_prompt)
        coder_out = coder.run(
            refinement_prompt.format(section=section)
            .replace(r"{{", "{")
            .replace(r"}}", "}")
        )

    # SKETCH THE RELATED WORK
    section_prompt = f"""Please fill in the Related Work of the writeup. Some tips are provided below:

{per_section_tips["Related Work"]}

For this section, very briefly sketch out the structure of the section, and clearly indicate what papers you intend to include.
Do this all in LaTeX comments using %.
The related work should be concise, only plan to discuss the most relevant work.
Do not modify `references.bib` to add any new citations, this will be filled in at a later stage.

Be sure to first name the file and use *SEARCH/REPLACE* blocks to perform these edits.
"""
    coder_out = coder.run(section_prompt)

    # Fill paper with cites.
    for _ in range(num_cite_rounds):
        with open(osp.join(folder_name, "latex", "template.tex"), "r") as f:
            draft = f.read()
        prompt, done = get_citation_aider_prompt(
            cite_client, cite_model, draft, _, num_cite_rounds, engine=engine
        )
        if done:
            break
        if prompt is not None:
            # extract bibtex string
            bibtex_string = prompt.split('"""')[1]
            # insert this into draft before the "\end{filecontents}" line
            search_str = r"\end{filecontents}"
            draft = draft.replace(search_str, f"{bibtex_string}{search_str}")
            with open(osp.join(folder_name, "latex", "template.tex"), "w") as f:
                f.write(draft)
            coder_out = coder.run(prompt)

    coder_out = coder.run(
        refinement_prompt.format(section="Related Work")
        .replace(r"{{", "{")
        .replace(r"}}", "}")
    )

    ## SECOND REFINEMENT LOOP
    coder.run(
        """Great job! Now that there is a complete draft of the entire paper, let's refine each section again.
First, re-think the Title if necessary. Keep this concise and descriptive of the paper's concept, but try by creative with it."""
    )
    for section in [
        "Abstract",
        "Related Work",
        "Introduction",
        "Background",
        "Method",
        "Experimental Setup",
        "Results",
        "Conclusion",
    ]:
        coder_out = coder.run(
            second_refinement_prompt.format(
                section=section, tips=per_section_tips[section]
            )
            .replace(r"{{", "{")
            .replace(r"}}", "}")
        )

    generate_latex(coder, folder_name, f"{folder_name}/{idea['Name']}.pdf")


if __name__ == "__main__":
    from aider.coders import Coder
    from aider.models import Model
    from aider.io import InputOutput
    import json

    parser = argparse.ArgumentParser(description="Perform writeup for a project")
    parser.add_argument("--folder", type=str)
    parser.add_argument("--no-writing", action="store_true", help="Only generate")
    parser.add_argument(
        "--model",
        type=str,
        default="gpt-4o-2024-05-13",
        choices=AVAILABLE_LLMS,
        help="Model to use for AI Scientist.",
    )
    parser.add_argument(
        "--engine",
        type=str,
        default="semanticscholar",
        choices=["semanticscholar", "openalex"],
        help="Scholar engine to use.",
    )
    args = parser.parse_args()
    client, client_model = create_client(args.model)
    print("Make sure you cleaned the Aider logs if re-generating the writeup!")
    folder_name = args.folder
    idea_name = osp.basename(folder_name)
    exp_file = osp.join(folder_name, "experiment.py")
    vis_file = osp.join(folder_name, "plot.py")
    notes = osp.join(folder_name, "notes.txt")
    model = args.model
    writeup_file = osp.join(folder_name, "latex", "template.tex")
    ideas_file = osp.join(folder_name, "ideas.json")
    with open(ideas_file, "r") as f:
        ideas = json.load(f)
    for idea in ideas:
        if idea["Name"] in idea_name:
            print(f"Found idea: {idea['Name']}")
            break
    if idea["Name"] not in idea_name:
        raise ValueError(f"Idea {idea_name} not found")
    fnames = [exp_file, writeup_file, notes]
    io = InputOutput(yes=True, chat_history_file=f"{folder_name}/{idea_name}_aider.txt")
    if args.model == "deepseek-coder-v2-0724":
        main_model = Model("deepseek/deepseek-coder")
    elif args.model == "llama3.1-405b":
        main_model = Model("openrouter/meta-llama/llama-3.1-405b-instruct")
    else:
        main_model = Model(model)
    coder = Coder.create(
        main_model=main_model,
        fnames=fnames,
        io=io,
        stream=False,
        use_git=False,
        edit_format="diff",
    )
    if args.no_writing:
        generate_latex(coder, args.folder, f"{args.folder}/test.pdf")
    else:
        try:
            perform_writeup(idea, folder_name, coder, client, client_model, engine=args.engine)
        except Exception as e:
            print(f"Failed to perform writeup: {e}")

ModuleNotFoundError: No module named 'aider'

In [None]:
# perform_review.py

import os
import numpy as np
import json
from pypdf import PdfReader
import pymupdf
import pymupdf4llm
from ai_scientist.llm import (
    get_response_from_llm,
    get_batch_responses_from_llm,
    extract_json_between_markers,
)

reviewer_system_prompt_base = (
    "You are an AI researcher who is reviewing a paper that was submitted to a prestigious ML venue."
    "Be critical and cautious in your decision."
)

reviewer_system_prompt_neg = (
    reviewer_system_prompt_base
    + "If a paper is bad or you are unsure, give it bad scores and reject it."
)
reviewer_system_prompt_pos = (
    reviewer_system_prompt_base
    + "If a paper is good or you are unsure, give it good scores and accept it."
)

template_instructions = """
Respond in the following format:

THOUGHT:
<THOUGHT>

REVIEW JSON:
```json
<JSON>
```

In <THOUGHT>, first briefly discuss your intuitions and reasoning for the evaluation.
Detail your high-level arguments, necessary choices and desired outcomes of the review.
Do not make generic comments here, but be specific to your current paper.
Treat this as the note-taking phase of your review.

In <JSON>, provide the review in JSON format with the following fields in the order:
- "Summary": A summary of the paper content and its contributions.
- "Strengths": A list of strengths of the paper.
- "Weaknesses": A list of weaknesses of the paper.
- "Originality": A rating from 1 to 4 (low, medium, high, very high).
- "Quality": A rating from 1 to 4 (low, medium, high, very high).
- "Clarity": A rating from 1 to 4 (low, medium, high, very high).
- "Significance": A rating from 1 to 4 (low, medium, high, very high).
- "Questions": A set of clarifying questions to be answered by the paper authors.
- "Limitations": A set of limitations and potential negative societal impacts of the work.
- "Ethical Concerns": A boolean value indicating whether there are ethical concerns.
- "Soundness": A rating from 1 to 4 (poor, fair, good, excellent).
- "Presentation": A rating from 1 to 4 (poor, fair, good, excellent).
- "Contribution": A rating from 1 to 4 (poor, fair, good, excellent).
- "Overall": A rating from 1 to 10 (very strong reject to award quality).
- "Confidence": A rating from 1 to 5 (low, medium, high, very high, absolute).
- "Decision": A decision that has to be one of the following: Accept, Reject.

For the "Decision" field, don't use Weak Accept, Borderline Accept, Borderline Reject, or Strong Reject. Instead, only use Accept or Reject.
This JSON will be automatically parsed, so ensure the format is precise.
"""

neurips_form = (
    """
## Review Form
Below is a description of the questions you will be asked on the review form for each paper and some guidelines on what to consider when answering these questions.
When writing your review, please keep in mind that after decisions have been made, reviews and meta-reviews of accepted papers and opted-in rejected papers will be made public.

1. Summary: Briefly summarize the paper and its contributions. This is not the place to critique the paper; the authors should generally agree with a well-written summary.
  - Strengths and Weaknesses: Please provide a thorough assessment of the strengths and weaknesses of the paper, touching on each of the following dimensions:
  - Originality: Are the tasks or methods new? Is the work a novel combination of well-known techniques? (This can be valuable!) Is it clear how this work differs from previous contributions? Is related work adequately cited
  - Quality: Is the submission technically sound? Are claims well supported (e.g., by theoretical analysis or experimental results)? Are the methods used appropriate? Is this a complete piece of work or work in progress? Are the authors careful and honest about evaluating both the strengths and weaknesses of their work
  - Clarity: Is the submission clearly written? Is it well organized? (If not, please make constructive suggestions for improving its clarity.) Does it adequately inform the reader? (Note that a superbly written paper provides enough information for an expert reader to reproduce its results.)
  - Significance: Are the results important? Are others (researchers or practitioners) likely to use the ideas or build on them? Does the submission address a difficult task in a better way than previous work? Does it advance the state of the art in a demonstrable way? Does it provide unique data, unique conclusions about existing data, or a unique theoretical or experimental approach?

2. Questions: Please list up and carefully describe any questions and suggestions for the authors. Think of the things where a response from the author can change your opinion, clarify a confusion or address a limitation. This can be very important for a productive rebuttal and discussion phase with the authors.

3. Limitations: Have the authors adequately addressed the limitations and potential negative societal impact of their work? If not, please include constructive suggestions for improvement.
In general, authors should be rewarded rather than punished for being up front about the limitations of their work and any potential negative societal impact. You are encouraged to think through whether any critical points are missing and provide these as feedback for the authors.

4. Ethical concerns: If there are ethical issues with this paper, please flag the paper for an ethics review. For guidance on when this is appropriate, please review the NeurIPS ethics guidelines.

5. Soundness: Please assign the paper a numerical rating on the following scale to indicate the soundness of the technical claims, experimental and research methodology and on whether the central claims of the paper are adequately supported with evidence.
  4: excellent
  3: good
  2: fair
  1: poor

6. Presentation: Please assign the paper a numerical rating on the following scale to indicate the quality of the presentation. This should take into account the writing style and clarity, as well as contextualization relative to prior work.
  4: excellent
  3: good
  2: fair
  1: poor

7. Contribution: Please assign the paper a numerical rating on the following scale to indicate the quality of the overall contribution this paper makes to the research area being studied. Are the questions being asked important? Does the paper bring a significant originality of ideas and/or execution? Are the results valuable to share with the broader NeurIPS community.
  4: excellent
  3: good
  2: fair
  1: poor

8. Overall: Please provide an "overall score" for this submission. Choices:
  10: Award quality: Technically flawless paper with groundbreaking impact on one or more areas of AI, with exceptionally strong evaluation, reproducibility, and resources, and no unaddressed ethical considerations.
  9: Very Strong Accept: Technically flawless paper with groundbreaking impact on at least one area of AI and excellent impact on multiple areas of AI, with flawless evaluation, resources, and reproducibility, and no unaddressed ethical considerations.
  8: Strong Accept: Technically strong paper with, with novel ideas, excellent impact on at least one area of AI or high-to-excellent impact on multiple areas of AI, with excellent evaluation, resources, and reproducibility, and no unaddressed ethical considerations.
  7: Accept: Technically solid paper, with high impact on at least one sub-area of AI or moderate-to-high impact on more than one area of AI, with good-to-excellent evaluation, resources, reproducibility, and no unaddressed ethical considerations.
  6: Weak Accept: Technically solid, moderate-to-high impact paper, with no major concerns with respect to evaluation, resources, reproducibility, ethical considerations.
  5: Borderline accept: Technically solid paper where reasons to accept outweigh reasons to reject, e.g., limited evaluation. Please use sparingly.
  4: Borderline reject: Technically solid paper where reasons to reject, e.g., limited evaluation, outweigh reasons to accept, e.g., good evaluation. Please use sparingly.
  3: Reject: For instance, a paper with technical flaws, weak evaluation, inadequate reproducibility and incompletely addressed ethical considerations.
  2: Strong Reject: For instance, a paper with major technical flaws, and/or poor evaluation, limited impact, poor reproducibility and mostly unaddressed ethical considerations.
  1: Very Strong Reject: For instance, a paper with trivial results or unaddressed ethical considerations

9. Confidence:  Please provide a "confidence score" for your assessment of this submission to indicate how confident you are in your evaluation. Choices:
  5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
  4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
  3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
  2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
  1: Your assessment is an educated guess. The submission is not in your area or the submission was difficult to understand. Math/other details were not carefully checked.
"""
    + template_instructions
)


def perform_review(
    text,
    model,
    client,
    num_reflections=1,
    num_fs_examples=1,
    num_reviews_ensemble=1,
    temperature=0.75,
    msg_history=None,
    return_msg_history=False,
    reviewer_system_prompt=reviewer_system_prompt_neg,
    review_instruction_form=neurips_form,
):
    if num_fs_examples > 0:
        fs_prompt = get_review_fewshot_examples(num_fs_examples)
        base_prompt = review_instruction_form + fs_prompt
    else:
        base_prompt = review_instruction_form

    base_prompt += f"""
Here is the paper you are asked to review:
```
{text}
```"""

    if num_reviews_ensemble > 1:
        llm_review, msg_histories = get_batch_responses_from_llm(
            base_prompt,
            model=model,
            client=client,
            system_message=reviewer_system_prompt,
            print_debug=False,
            msg_history=msg_history,
            # Higher temperature to encourage diversity.
            temperature=0.75,
            n_responses=num_reviews_ensemble,
        )
        parsed_reviews = []
        for idx, rev in enumerate(llm_review):
            try:
                parsed_reviews.append(extract_json_between_markers(rev))
            except Exception as e:
                print(f"Ensemble review {idx} failed: {e}")
        parsed_reviews = [r for r in parsed_reviews if r is not None]
        review = get_meta_review(model, client, temperature, parsed_reviews)

        # take first valid in case meta-reviewer fails
        if review is None:
            review = parsed_reviews[0]

        # Replace numerical scores with the average of the ensemble.
        for score, limits in [
            ("Originality", (1, 4)),
            ("Quality", (1, 4)),
            ("Clarity", (1, 4)),
            ("Significance", (1, 4)),
            ("Soundness", (1, 4)),
            ("Presentation", (1, 4)),
            ("Contribution", (1, 4)),
            ("Overall", (1, 10)),
            ("Confidence", (1, 5)),
        ]:
            scores = []
            for r in parsed_reviews:
                if score in r and limits[1] >= r[score] >= limits[0]:
                    scores.append(r[score])
            review[score] = int(round(np.mean(scores)))

        # Rewrite the message history with the valid one and new aggregated review.
        msg_history = msg_histories[0][:-1]
        msg_history += [
            {
                "role": "assistant",
                "content": f"""
THOUGHT:
I will start by aggregating the opinions of {num_reviews_ensemble} reviewers that I previously obtained.

REVIEW JSON:
```json
{json.dumps(review)}
```
""",
            }
        ]
    else:
        llm_review, msg_history = get_response_from_llm(
            base_prompt,
            model=model,
            client=client,
            system_message=reviewer_system_prompt,
            print_debug=False,
            msg_history=msg_history,
            temperature=temperature,
        )
        review = extract_json_between_markers(llm_review)

    if num_reflections > 1:
        for j in range(num_reflections - 1):
            # print(f"Relection: {j + 2}/{num_reflections}")
            text, msg_history = get_response_from_llm(
                reviewer_reflection_prompt,
                client=client,
                model=model,
                system_message=reviewer_system_prompt,
                msg_history=msg_history,
                temperature=temperature,
            )
            review = extract_json_between_markers(text)
            assert review is not None, "Failed to extract JSON from LLM output"

            if "I am done" in text:
                # print(f"Review generation converged after {j + 2} iterations.")
                break

    if return_msg_history:
        return review, msg_history
    else:
        return review


reviewer_reflection_prompt = """Round {current_round}/{num_reflections}.
In your thoughts, first carefully consider the accuracy and soundness of the review you just created.
Include any other factors that you think are important in evaluating the paper.
Ensure the review is clear and concise, and the JSON is in the correct format.
Do not make things overly complicated.
In the next attempt, try and refine and improve your review.
Stick to the spirit of the original review unless there are glaring issues.

Respond in the same format as before:
THOUGHT:
<THOUGHT>

REVIEW JSON:
```json
<JSON>
```

If there is nothing to improve, simply repeat the previous JSON EXACTLY after the thought and include "I am done" at the end of the thoughts but before the JSON.
ONLY INCLUDE "I am done" IF YOU ARE MAKING NO MORE CHANGES."""


def load_paper(pdf_path, num_pages=None, min_size=100):
    try:
        if num_pages is None:
            text = pymupdf4llm.to_markdown(pdf_path)
        else:
            reader = PdfReader(pdf_path)
            min_pages = min(len(reader.pages), num_pages)
            text = pymupdf4llm.to_markdown(pdf_path, pages=list(range(min_pages)))
        if len(text) < min_size:
            raise Exception("Text too short")
    except Exception as e:
        print(f"Error with pymupdf4llm, falling back to pymupdf: {e}")
        try:
            doc = pymupdf.open(pdf_path)  # open a document
            if num_pages:
                doc = doc[:num_pages]
            text = ""
            for page in doc:  # iterate the document pages
                text = text + page.get_text()  # get plain text encoded as UTF-8
            if len(text) < min_size:
                raise Exception("Text too short")
        except Exception as e:
            print(f"Error with pymupdf, falling back to pypdf: {e}")
            reader = PdfReader(pdf_path)
            if num_pages is None:
                text = "".join(page.extract_text() for page in reader.pages)
            else:
                text = "".join(page.extract_text() for page in reader.pages[:num_pages])
            if len(text) < min_size:
                raise Exception("Text too short")

    return text


def load_review(path):
    with open(path, "r") as json_file:
        loaded = json.load(json_file)
    return loaded["review"]


# get directory of this file
dir_path = os.path.dirname(os.path.realpath(__file__))

fewshot_papers = [
    os.path.join(dir_path, "fewshot_examples/132_automated_relational.pdf"),
    os.path.join(dir_path, "fewshot_examples/attention.pdf"),
    os.path.join(dir_path, "fewshot_examples/2_carpe_diem.pdf"),
]

fewshot_reviews = [
    os.path.join(dir_path, "fewshot_examples/132_automated_relational.json"),
    os.path.join(dir_path, "fewshot_examples/attention.json"),
    os.path.join(dir_path, "fewshot_examples/2_carpe_diem.json"),
]


def get_review_fewshot_examples(num_fs_examples=1):
    fewshot_prompt = """
Below are some sample reviews, copied from previous machine learning conferences.
Note that while each review is formatted differently according to each reviewer's style, the reviews are well-structured and therefore easy to navigate.
"""
    for paper, review in zip(
        fewshot_papers[:num_fs_examples], fewshot_reviews[:num_fs_examples]
    ):
        txt_path = paper.replace(".pdf", ".txt")
        if os.path.exists(txt_path):
            with open(txt_path, "r") as f:
                paper_text = f.read()
        else:
            paper_text = load_paper(paper)
        review_text = load_review(review)
        fewshot_prompt += f"""
Paper:

```
{paper_text}
```

Review:

```
{review_text}
```
"""

    return fewshot_prompt


meta_reviewer_system_prompt = """You are an Area Chair at a machine learning conference.
You are in charge of meta-reviewing a paper that was reviewed by {reviewer_count} reviewers.
Your job is to aggregate the reviews into a single meta-review in the same format.
Be critical and cautious in your decision, find consensus, and respect the opinion of all the reviewers."""


def get_meta_review(model, client, temperature, reviews):
    # Write a meta-review from a set of individual reviews
    review_text = ""
    for i, r in enumerate(reviews):
        review_text += f"""
Review {i + 1}/{len(reviews)}:
```
{json.dumps(r)}
```
"""
    base_prompt = neurips_form + review_text

    llm_review, msg_history = get_response_from_llm(
        base_prompt,
        model=model,
        client=client,
        system_message=meta_reviewer_system_prompt.format(reviewer_count=len(reviews)),
        print_debug=False,
        msg_history=None,
        temperature=temperature,
    )
    meta_review = extract_json_between_markers(llm_review)
    return meta_review


def perform_improvement(review, coder):
    improvement_prompt = '''The following review has been created for your research paper:
"""
{review}
"""

Improve the text using the review.'''.format(
        review=json.dumps(review)
    )
    coder_out = coder.run(improvement_prompt)

NameError: name '__file__' is not defined

In [None]:
#

import openai
from ai_scientist.perform_review import load_paper, perform_review

client = openai.OpenAI()
model = "gpt-4o-2024-05-13"

# Load paper from PDF file (raw text)
paper_txt = load_paper("report.pdf")

# Get the review dictionary
review = perform_review(
    paper_txt,
    model,
    client,
    num_reflections=5,
    num_fs_examples=1,
    num_reviews_ensemble=5,
    temperature=0.1,
)

# Inspect review results
review["Overall"]    # Overall score (1-10)
review["Decision"]   # 'Accept' or 'Reject'
review["Weaknesses"] # List of weaknesses (strings)

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

# **Making Your Own Template**

If there is an area of study you would like The AI Scientist to explore, it is straightforward to create your own templates. In general, follow the structure of the existing templates, which consist of:

* **experiment.py** — This is the main script where the core content is. It takes an argument --out_dir, which specifies where it should create the folder and save the relevant information from the run.
* **plot.py** — This script takes the information from the run folders and creates plots. The code should be clear and easy to edit.
* **prompt.json** — Put information about your template here.
* **seed_ideas.json** — Place example ideas here. You can also try to generate ideas without any examples and then pick the best one or two to put here.
* **latex/template.tex** — We recommend using our LaTeX folder but be sure to replace the pre-loaded citations with ones that you expect to be more relevant.

The key to making new templates work is matching the base filenames and output JSONs to the existing format; everything else is free to change. You should also ensure that the template.tex file is updated to use the correct citation style / base plots for your template.

## **Community-Contributed Templates**

We welcome community contributions in the form of new templates. While these are not maintained by us, we are delighted to highlight your templates to others. Below, we list community-contributed templates along with links to their pull requests (PRs):

* Infectious Disease Modeling (seir) - PR #137
* Image Classification with MobileNetV3 (mobilenetV3) - PR #141
* Sketch RNN (sketch_rnn) - PR #143
* AI in Quantum Chemistry (MACE) - PR#157
* Earthquake Prediction (earthquake-prediction) - PR #167
* Tensorial Radiance Fields (tensorf) - PR #175

# **Template Resources**

We provide three templates, which heavily use code from other repositories, credited below:

* NanoGPT Template uses code from NanoGPT and this PR.
* 2D Diffusion Template uses code from tiny-diffusion, ema-pytorch, and Datasaur.
* Grokking Template uses code from Sea-Snell/grokking and danielmamay/grokking.

We would like to thank the developers of the open-source models and packages for their contributions and for making their work available.

# **Citing The AI Scientist**

If you use The AI Scientist in your research, please cite it as follows:

~~~text
@article{lu2024aiscientist,
  title={The {AI} {S}cientist: Towards Fully Automated Open-Ended Scientific Discovery},
  author={Lu, Chris and Lu, Cong and Lange, Robert Tjarko and Foerster, Jakob and Clune, Jeff and Ha, David},
  journal={arXiv preprint arXiv:2408.06292},
  year={2024}
}
~~~

# **Frequently Asked Questions**

We recommend reading our paper first for any questions you have on The AI Scientist.

**Why am I missing files when running The AI Scientist?**

Ensure you have completed all the setup and preparation steps before the main experiment script.

**Why has a PDF or a review not been generated?**

The AI Scientist finishes an idea with a success rate that depends on the template, the base foundation model, and the complexity of the idea. We advise referring to our main paper. The highest success rates are observed with Claude Sonnet 3.5. Reviews are best done with GPT-4o; all other models have issues with positivity bias or failure to conform to required outputs.

**What is the cost of each idea generated?**

Typically less than $15 per paper with Claude Sonnet 3.5. We recommend DeepSeek Coder V2 for a much more cost-effective approach. A good place to look for new models is the Aider leaderboard.

**How do I change the base conference format associated with the write-ups?**

Change the base template.tex files contained within each template.

**How do I run The AI Scientist for different subject fields?**

Please refer to the instructions for different templates. In this current iteration, this is restricted to ideas that can be expressed in code. However, lifting this restriction would represent exciting future work! :)

**How do I add support for a new foundation model?**

You may modify ai_scientist/llm.py to add support for a new foundation model. We do not advise using any model that is significantly weaker than GPT-4 level for The AI Scientist.

**Why do I need to run the baseline runs myself?**

These appear as run_0 and should be run per machine you execute The AI Scientist on for accurate run-time comparisons due to hardware differences.

**What if I have problems accessing the Semantic Scholar API?**

We use the Semantic Scholar API to check ideas for novelty and collect citations for the paper write-up. You may be able to skip these phases if you don't have an API key or the API is slow to access.