<a href="https://colab.research.google.com/github/KyleAMoore/LLM-UQ-HC/blob/main/LLM_UQ_HC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

All of our survey data comes from the Pew Research Center's American Trends Panel datasets. The surveys range from 10/24/2022 to 1/18/2024, so an LLM trained on recent data should give responses that are consistent with them. This range covers waves 114-132, which can be found at https://www.pewresearch.org/american-trends-panel-datasets/. We searched for opinion-based survey data that exhibits group-level uncertainty in the target population. We avoided surveys and questions with any of the following properties: time-sensitive data that an LLM trained on more recent data would not accurately represent; volatile data that can change from day to day, such as approval ratings of political figures; personal-experience data, such as questions asking "have you ever..."; surveys targeting demographics that are too specific or skewed to represent the whole of the data an LLM is trained on; and questions that separate the survey data by demographic without giving the raw, undifferentiated data.

Running record of prior work:
*   https://arxiv.org/pdf/2307.10236


Self-reported uncertainty:
*   https://aclanthology.org/2024.acl-long.283.pdf - primary inspiration for our method
*   https://arxiv.org/pdf/2207.05221 - earlier method, iterated on by 2024.acl-long.283
*   https://aclanthology.org/2024.emnlp-main.299.pdf - only applies to long-form (multi-sentence) generation; also requires multiple long-form responses to compute the uncertainty for a single response
*   https://aclanthology.org/2024.emnlp-main.1205.pdf - sentence-level, requires multisampling; SAR scores rest on shaky theoretical grounding

# Installs and Imports

1. Can the file be loaded into pandas?
2. Convert the context column into a list of strings instead of a string that looks like a list.
3. Change contexts and completions to question/answer format.
4. Run each row independently instead of grouped.
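
As a minimal sketch of the list-parsing step noted above, `ast.literal_eval` can turn a string that looks like a list into an actual list of strings. The helper name `parse_list_cell` and the sample cell values are hypothetical, and the comma-splitting fallback is an assumption; the real column format in `Data.csv` may differ.

```python
import ast

def parse_list_cell(cell):
    """Convert a cell such as "['Yes', 'No']" into a list of strings.

    Hypothetical helper: falls back to splitting on commas when the cell
    is not a valid Python list or tuple literal.
    """
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        value = None
    if isinstance(value, (list, tuple)):
        return [str(v) for v in value]
    # Assumed fallback: a plain comma-separated string
    return [part.strip() for part in cell.split(",")]

print(parse_list_cell("['Yes', 'No', 'Not sure']"))
print(parse_list_cell("Yes, No, Not sure"))
```

Using `ast.literal_eval` rather than `eval` means only Python literals are accepted, so arbitrary expressions in a CSV cell are never executed.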

In [None]:
import csv
import itertools
import json
import tqdm
import os
import ast
import gc
import numpy as np
import pandas as pd
import scipy.stats  # scipy.stats.mode is used below; import the submodule explicitly
import time

import torch
import transformers as tf

In [None]:
import google.colab.drive
import google.colab.output
import google.colab.userdata

In [None]:
google.colab.drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
pip_output = !pip install PopulationLM@git+https://github.com/JesseTNRoberts/PopulationLM \
                          accelerate
err_lines = [ln for ln in pip_output if 'error' in ln.lower()]

google.colab.output.clear()

if err_lines:
  print(*err_lines, sep='\n')
else:
  print('Libraries successfully installed')



Libraries successfully installed


In [None]:
import PopulationLM as pop

In [None]:
my_token = google.colab.userdata.get('hf_token')

# Loading Functions

In [None]:
def load_data(df):
    # df is the survey DataFrame; return its question and answer columns as plain lists
    contexts = list(df['Questions'])
    completions = list(df['Answers'])

    return contexts, completions


def load_model(model_name, device='cuda', token=None, verbose=False):
    # Fall back to the notebook-level Hugging Face token when none is passed in
    if token is None:
        token = my_token

    config = tf.AutoConfig.from_pretrained(model_name, token=token)
    tokenizer = tf.AutoTokenizer.from_pretrained(model_name, token=token)

    # Reuse EOS as the padding token (padding=True is used when tokenizing)
    tokenizer.pad_token = tokenizer.eos_token

    try:
        transformer = tf.AutoModelForCausalLM.from_pretrained(model_name,
                                                   config=config,
                                                   token=token,
                                                   resume_download=True,
                                                   low_cpu_mem_usage=True,
                                                   device_map=device)
        if verbose:
            print(f'Successfully loaded model: {model_name}')
    except Exception:
        if verbose:
            print(f'[ERROR] Failed to load model: {model_name}')
        raise

    # Flag checked before MC-dropout layers are added (see prepare_model_for_pop)
    transformer.pop_ready = False

    return transformer, tokenizer

# Uncertainty Functions

In [None]:
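def next_token_pdf(model, tokenizer, ctx):
    # Probability distribution over the vocabulary for the token that follows ctx
    tokenized = tokenizer(ctx, return_tensors='pt', padding=True, add_special_tokens=False).to(model.device.type)
    with torch.no_grad():  # inference only, so skip building the autograd graph
        outputs = model(**tokenized)

    return torch.softmax(outputs.logits[0, -1, :], dim=-1).detach()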

def cloze_test(model, tokenizer, pdf, tok_cmplts):
    probs = pdf[tok_cmplts['input_ids'][:,0]]

    return np.argmax(probs.detach().cpu().numpy())

def unc_self_report(model, tokenizer, ctx, cmplts, output_pdf, choice, choice_txt):
    '''
      result is a list of shape (cmplts_count, 2)

      [
        answer choice 1 results = [prob(best), prob(worst)],
        answer choice 2 results = [prob(best), prob(worst)],
        ...
      ]
    '''
    choice_uqs = []
    unc_cmplts = ['best', 'worst']
    uq_cmplts_tok = tokenizer(unc_cmplts, return_tensors='pt', padding=True, add_special_tokens=False)
    for c in cmplts:
        uncertainty_prompt = f'{c}\nThis answer is '
        uq_pdf = next_token_pdf(model, tokenizer, ctx + uncertainty_prompt)
        choice_uqs.append(list(uq_pdf[uq_cmplts_tok['input_ids'][:,0]].detach().cpu().numpy()))
    return choice_uqs

def unc_frequency(model, tokenizer, ctx, cmplts, output_pdf, choice, choice_txt, trials=100):
    tok_cmplts = tokenizer(cmplts, return_tensors='pt', padding=True, add_special_tokens=False)
    choice_uqs = list(output_pdf[tok_cmplts['input_ids'][:,0]].detach().cpu().numpy() * trials)

    return choice_uqs

def unc_cred_int(model, tokenizer, ctx, cmplts, output_pdf, choice, choice_txt, p=0.95):
    sorted_probs, indices = torch.sort(output_pdf, dim=-1, descending=True)
    cum_sum_probs = torch.cumsum(sorted_probs, dim=-1)
    nucleus = torch.count_nonzero(cum_sum_probs < p)

    return nucleus.item()

def unc_entropy(model, tokenizer, ctx, cmplts, output_pdf, choice, choice_txt, choices_only=False):
    # Returns sum(p * log p), i.e. the negative Shannon entropy, so larger values mean more certainty
    if choices_only:
        # Restrict to the answer-choice tokens (their probabilities are not renormalized)
        tok_cmplts = tokenizer(cmplts, return_tensors='pt', padding=True, add_special_tokens=False)
        probs = output_pdf[tok_cmplts['input_ids'][:,0]].detach().cpu().numpy()
    else:
        probs = output_pdf.detach().cpu().numpy()
    return np.sum(probs * np.log(probs))

def unc_choice_entropy(model, tokenizer, ctx, cmplts, output_pdf, choice, choice_txt):
    return unc_entropy(model, tokenizer, ctx, cmplts, output_pdf, choice, choice_txt, choices_only=True)

def unc_topk_entropy(model, tokenizer, ctx, cmplts, output_pdf, choice, choice_txt, k=10):
    topk = torch.topk(output_pdf, k)[0].detach().cpu().numpy()

    return np.sum(topk * np.log(topk))

def get_uncertainty(model, tokenizer, ctx, cmplts, uncertainty_type, output_pdf, choice, choice_txt, **kwargs):
    return unc_funcs[uncertainty_type](model, tokenizer, ctx, cmplts, output_pdf, choice, choice_txt, **kwargs)

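def next_token_pdf_alt(model, tokenizer, ctx):
    # Batched variant: ctx is a list of prompts; returns one next-token distribution per prompt
    tokenized = tokenizer(ctx, return_tensors='pt', padding=True, add_special_tokens=False).to(model.device.type)
    with torch.no_grad():  # inference only, so skip building the autograd graph
        outputs = model(**tokenized)

    return torch.softmax(outputs.logits[:, -1, :], dim=-1).detach()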

def unc_self_report_pop(model, tokenizer, ctx, cmplts, pop_size=30):
    '''
      result is a list of shape (pop_size, cmplts_count, 2)

      [
        pop 1 results = [
          answer choice 1 results = [prob(best), prob(worst)],
          answer choice 2 results = [prob(best), prob(worst)],
          ...
        ],
        pop 2 results = [
          answer choice 1 results = [prob(best), prob(worst)],
          answer choice 2 results = [prob(best), prob(worst)],
          ...
        ],
        ...
      ]
    '''
    if not model.pop_ready:
        prepare_model_for_pop(model)

    unc_cmplts = ['best', 'worst']
    uq_cmplts_tok = tokenizer(unc_cmplts, return_tensors='pt', padding=True, add_special_tokens=False)

    contexts = [ctx + f'{c}\nThis answer is ' for c in cmplts]

    population = pop.generate_dropout_population(model,
                                                 lambda: next_token_pdf_alt(model, tokenizer, contexts),
                                                 committee_size=pop_size)
    pop_results = list(pop.call_function_with_population(model,
                                                    population,
                                                    lambda: next_token_pdf_alt(model, tokenizer, contexts)))

    return [pr[:, uq_cmplts_tok['input_ids'][:,0]].detach().cpu().numpy().tolist() for pr in pop_results]

def gawc_population(model, tokenizer, ctx, cmplts, pop_size=30):
    if not model.pop_ready:
        prepare_model_for_pop(model)

    population = pop.generate_dropout_population(model,
                                                 lambda: next_token_pdf(model, tokenizer, ctx),
                                                 committee_size=pop_size)
    pop_results = list(pop.call_function_with_population(model,
                                                         population,
                                                         lambda: next_token_pdf(model, tokenizer, ctx)))

    tok_cmplts = tokenizer(cmplts, return_tensors='pt', padding=True, add_special_tokens=False)
    choice_pop_probs = [ind_pdf[tok_cmplts['input_ids'][:,0]].detach().cpu().numpy() for ind_pdf in pop_results]
    choice_pop_probs = np.array(choice_pop_probs)

    choices = [cloze_test(model, tokenizer, ind_pdf, tok_cmplts) for ind_pdf in pop_results]
    top_choice = scipy.stats.mode(choices)
    choice_txt = cmplts[top_choice.mode]

    choice_uqs = list(np.std(choice_pop_probs, axis=0))

    return choice_txt, choice_uqs

unc_funcs = {
    'self-report' : unc_self_report,
    'freq'        : unc_frequency,
    'top-p'       : unc_cred_int,
    'entropy'     : unc_entropy,
    'c-ent'       : unc_choice_entropy,
    'top-k-ent'   : unc_topk_entropy,
}
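
To make the distribution-based measures above concrete, here is a toy sketch in plain Python that needs no model. The helper names `neg_entropy` and `nucleus_count` and the two example distributions are illustrative assumptions. The first helper mirrors the sum of p log p returned by `unc_entropy` (the negative of the Shannon entropy, so values closer to zero indicate more certainty), and the second mirrors the count returned by `unc_cred_int`. Unlike the notebook code, the sketch skips zero probabilities, since `0 * log(0)` evaluates to NaN in NumPy.

```python
import math
from itertools import accumulate

def neg_entropy(pdf):
    """Sum of p * log(p), skipping zero-probability entries."""
    return sum(p * math.log(p) for p in pdf if p > 0)

def nucleus_count(pdf, p=0.95):
    """Number of highest-probability tokens whose cumulative mass stays below p."""
    cumulative = accumulate(sorted(pdf, reverse=True))
    return sum(1 for c in cumulative if c < p)

# Illustrative distributions over four tokens (not real model output)
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]

print(neg_entropy(peaked), neg_entropy(uniform))
print(nucleus_count(peaked), nucleus_count(uniform))
```

A count of 0 means the single most likely token already holds at least `p` of the probability mass, while larger counts indicate that the mass is spread over more tokens.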


# Main Functions

In [None]:
def prepare_model_for_pop(model):
    pop.DropoutUtils.add_new_dropout_layers(model)
    pop.DropoutUtils.convert_dropouts(model)
    pop.DropoutUtils.activate_mc_dropout(model, activate=True, random=0.1)

    model.pop_ready = True

def run_experiment(model,
                   tokenizer,
                   data_locs,
                   output_dir,
                   causal=True,
                   device='cuda',
                   token=None,
                   early_exit=False,
                   verbose=False,
                   instr_str = "Following is a question and a selection of answer choices. Provide the label for the answer with which you most agree.\nQuestion: ",
                   query_str = "\nAnswer: ",
                   quest_fmt = "\n {}. {}",
                   **kwargs):
    #copy original table
    df = pd.read_csv(data_locs)
    # df.to_csv(output_dir, mode = 'w')

    if early_exit:
        df = df.head(3)

    # Get the question and answer columns as lists of strings
    contexts, completions = load_data(df)

    # Turn each question into a lettered multiple-choice prompt and replace
    # each answer string with its list of labels (A, B, C, ...)
    for i, (ctx, cmplt) in enumerate(zip(contexts, completions)):
        options = cmplt.split(",")
        for index, option in enumerate(options):
            letter = chr(index + 65)  # 65 is ord('A')
            ctx = ctx + quest_fmt.format(letter, option)
            options[index] = letter
        completions[i] = options
        contexts[i] = instr_str + ctx + query_str

    certainties = {k: [] for k in unc_funcs.keys()}
    answers = []

    for ctx, cmplt in zip(contexts, completions):
        tok_cmplts = tokenizer(cmplt, return_tensors='pt', padding=True, add_special_tokens=False)
        output_pdf = next_token_pdf(model, tokenizer, ctx)
        choice = cloze_test(model, tokenizer, output_pdf, tok_cmplts)
        choice_txt = cmplt[choice]

        answers.append(choice_txt)

        for unc_t in unc_funcs.keys():
            if verbose:
                print(f'\tUncertainty type: {unc_t}                                        ',end='\r')

            unc = get_uncertainty(model, tokenizer, ctx, cmplt, unc_t, output_pdf, choice, choice_txt, **kwargs)

            certainties[unc_t].append(unc)

    df['answers'] = answers

    pop_answers = []
    pop_std_certainties = []
    pop_sr_certainties = []
    for ctx, cmplt in zip(contexts, completions):
        pop_ans, pop_ctnty = gawc_population(model, tokenizer, ctx, cmplt)
        pop_answers.append(pop_ans)
        pop_std_certainties.append(pop_ctnty)

        pop_sr_certainties.append(unc_self_report_pop(model, tokenizer, ctx, cmplt))

    df['pop answers'] = pop_answers

    for unc_t in certainties.keys():
        df[str(unc_t) + ' certainties'] = certainties[unc_t]
    df['pop_std certainties'] = pop_std_certainties
    df['pop_sr certainties'] = pop_sr_certainties

    df.to_csv(output_dir, na_rep = 'No Value',  mode = 'w')
    print(f'results saved to {output_dir}')
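
The prompt-construction loop in `run_experiment` can be illustrated in isolation. This sketch reuses the default `instr_str`, `query_str`, and `quest_fmt` values from above; the function name `build_prompt` and the sample question are hypothetical (the question is not taken from the survey data), and the answer column is assumed to be a comma-separated string.

```python
INSTR_STR = ("Following is a question and a selection of answer choices. "
             "Provide the label for the answer with which you most agree.\nQuestion: ")
QUERY_STR = "\nAnswer: "
QUEST_FMT = "\n {}. {}"

def build_prompt(question, answers_csv):
    """Return (prompt, labels) for one survey row."""
    labels = []
    body = question
    for index, option in enumerate(answers_csv.split(",")):
        letter = chr(ord("A") + index)  # A, B, C, ...
        body += QUEST_FMT.format(letter, option)
        labels.append(letter)
    return INSTR_STR + body + QUERY_STR, labels

# Hypothetical example row
prompt, labels = build_prompt("Is remote work good for society?", "Yes,No,Not sure")
print(prompt)
print(labels)
```

Because the prompt ends with `Answer: `, the model's next-token distribution can be read directly at the answer labels, which is what `cloze_test` does.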


Model families:
*   Llama 3.2 (1B, 3B) and Llama 3.1 (8B)
*   SOLAR
*   Mistral
*   Mistral 2

In [None]:
models = [
    # 'meta-llama/Llama-3.2-1B',
    # 'meta-llama/Llama-3.2-3B',
    # 'meta-llama/Llama-3.1-8B',
    # 'mistralai/Mistral-7B-v0.1',
    # 'mistralai/Mistral-7B-v0.3',
    'meta-llama/Llama-3.2-1B-Instruct',
    'meta-llama/Llama-3.2-3B-Instruct',
    'meta-llama/Llama-3.1-8B-Instruct',
    'mistralai/Mistral-7B-Instruct-v0.1',
    'mistralai/Mistral-7B-Instruct-v0.3',
    # 'upstage/SOLAR-10.7B-v1.0',
    # 'upstage/SOLAR-10.7B-Instruct-v1.0',
]

In [None]:
#ALWAYS PLACE EXPERIMENT TIMES HERE BEFORE RUNNING NEW EXPERIMENT. LABEL WITH EXPERIMENT DETAILS TO AVOID DOUBLE COUNTING
'''
    'meta-llama/Llama-3.2-1B',
    'meta-llama/Llama-3.2-3B',
    'meta-llama/Llama-3.1-8B',
    'mistralai/Mistral-7B-v0.1',
    'mistralai/Mistral-7B-v0.3',
    'meta-llama/Llama-3.2-1B-Instruct',
    'meta-llama/Llama-3.2-3B-Instruct',
    'meta-llama/Llama-3.1-8B-Instruct',
    'mistralai/Mistral-7B-Instruct-v0.1',
    'mistralai/Mistral-7B-Instruct-v0.3',

    2928.920788526535
'''

'''
  reruns (non-instruct, non-solar)
  1894.7749633789062

  (instruct, non-solar)

'''


In [None]:
base_dir = '/content/drive/MyDrive/Research/LLM-UQ/'

os.makedirs(base_dir + 'results', exist_ok=True)

start_time = time.time()

try:
  for model_name in models:
      model, tokenizer = load_model(model_name)
      print('Running experiment on {}                                            '.format(model_name))

      data_loc_in = base_dir + 'Data.csv'
      data_loc_out = base_dir + 'results/' + model_name.replace('/','-') + '.csv'

      # instr_str = 'Following is a question followed by a collection of answer choices. Provide the label of the answer choice you most agree with.'
      # query_str = '. Of the answer choices, the one I most agree with is answer choice '
      # quest_fmt = ' {}. {}'

      run_experiment(model,
                     tokenizer,
                     data_loc_in,
                     data_loc_out,
                     verbose=True,)
                    #  instr_str=instr_str,
                    #  query_str=query_str,
                    #  quest_fmt=quest_fmt)

      print('Experiments completed: {}'.format(model_name))

      del model
      del tokenizer
      torch.cuda.empty_cache()
      gc.collect()
finally:
  # Report elapsed wall-clock time (in seconds) whether or not the run succeeded;
  # any exception still propagates after this block
  stop_time = time.time()
  print('Experiment time: {}'.format(stop_time - start_time))

Running experiment on meta-llama/Llama-3.2-1B-Instruct                                            
[LlamaModel(
  (embed_tokens): Embedding(128256, 2048)
  (layers): ModuleList(
    (0-15): 16 x LlamaDecoderLayer(
      (self_attn): LlamaAttention(
        (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
        (k_proj): Linear(in_features=2048, out_features=512, bias=False)
        (v_proj): Linear(in_features=2048, out_features=512, bias=False)
        (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
      )
      (mlp): LlamaMLP(
        (gate_proj): Linear(in_features=2048, out_features=8192, bias=False)
        (up_proj): Linear(in_features=2048, out_features=8192, bias=False)
        (down_proj): Sequential(
          (0): Linear(in_features=8192, out_features=2048, bias=False)
          (1): StratifiedDropoutMC()
        )
        (act_fn): SiLU()
      )
      (input_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
      (post_attention_layernorm)

Running experiment on meta-llama/Llama-3.2-3B-Instruct                                            
[LlamaModel(
  (embed_tokens): Embedding(128256, 3072)
  (layers): ModuleList(
    (0-27): 28 x LlamaDecoderLayer(
      (self_attn): LlamaAttention(
        (q_proj): Linear(in_features=3072, out_features=3072, bias=False)
        (k_proj): Linear(in_features=3072, out_features=1024, bias=False)
        (v_proj): Linear(in_features=3072, out_features=1024, bias=False)
        (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
      )
      (mlp): LlamaMLP(
        (gate_proj): Linear(in_features=3072, out_features=8192, bias=False)
        (up_proj): Linear(in_features=3072, out_features=8192, bias=False)
        (down_proj): Sequential(
          (0): Linear(in_features=8192, out_features=3072, bias=False)
          (1): StratifiedDropoutMC()
        )
        (act_fn): SiLU()
      )
      (input_layernorm): LlamaRMSNorm((3072,), eps=1e-05)
      (post_attention_layernor

Running experiment on meta-llama/Llama-3.1-8B-Instruct                                            
[LlamaModel(
  (embed_tokens): Embedding(128256, 4096)
  (layers): ModuleList(
    (0-31): 32 x LlamaDecoderLayer(
      (self_attn): LlamaAttention(
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
        (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
        (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): LlamaMLP(
        (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (down_proj): Sequential(
          (0): Linear(in_features=14336, out_features=4096, bias=False)
          (1): StratifiedDropoutMC()
        )
        (act_fn): SiLU()
      )
      (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      (post_attention_layer

Running experiment on mistralai/Mistral-7B-Instruct-v0.1                                            
[MistralModel(
  (embed_tokens): Embedding(32000, 4096)
  (layers): ModuleList(
    (0-31): 32 x MistralDecoderLayer(
      (self_attn): MistralAttention(
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
        (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
        (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): MistralMLP(
        (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (down_proj): Sequential(
          (0): Linear(in_features=14336, out_features=4096, bias=False)
          (1): StratifiedDropoutMC()
        )
        (act_fn): SiLU()
      )
      (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
      (post_atte

Running experiment on mistralai/Mistral-7B-Instruct-v0.3                                            
[MistralModel(
  (embed_tokens): Embedding(32768, 4096)
  (layers): ModuleList(
    (0-31): 32 x MistralDecoderLayer(
      (self_attn): MistralAttention(
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
        (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
        (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): MistralMLP(
        (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (down_proj): Sequential(
          (0): Linear(in_features=14336, out_features=4096, bias=False)
          (1): StratifiedDropoutMC()
        )
        (act_fn): SiLU()
      )
      (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
      (post_atte