## Medical Question Answering (Alpaca Dataset): Prompt Tuning

Dataset Source: https://huggingface.co/datasets/prognosis/medical_qa_alpaca

#### Install Necessary Libraries

In [1]:
%pip install peft transformers datasets
%pip install bitsandbytes
%pip install accelerate -U
%pip install einops

Collecting peft
  Downloading peft-0.4.0-py3-none-any.whl (72 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m72.9/72.9 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting transformers
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.4/7.4 MB[0m [31m17.6 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting datasets
  Downloading datasets-2.14.4-py3-none-any.whl (519 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m519.3/519.3 kB[0m [31m20.3 MB/s[0m eta [36m0:00:00[0m
Collecting accelerate (from peft)
  Downloading accelerate-0.21.0-py3-none-any.whl (244 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m244.2/244.2 kB[0m [31m27.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting safetensors (from peft)
  Downloading safetensors-0.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

#### Enter HuggingFace Access Token

In [2]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


#### Import Necessary Libraries

In [3]:
import os, sys
os.environ['TOKENIZERS_PARALLELISM']='false'

import torch
from torch.utils.data import DataLoader
from torch import nn
torch.cuda.empty_cache()

from tqdm import tqdm

import datasets
from datasets import load_dataset, DatasetDict

import gc
gc.collect()

import transformers
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    default_data_collator,
    get_linear_schedule_with_warmup
)

import peft
from peft import (
    get_peft_config,
    get_peft_model,
    PromptTuningInit,
    PromptTuningConfig,
    TaskType,
    PeftType
)

import bitsandbytes as bnb

!git lfs install

Git LFS initialized.


#### Display Library Versions

In [4]:
library_len = 14
version_len = 12

print(f"+{'-' * (library_len + version_len + 5)}+")
print("|", "Library".rjust(library_len), "|", "Version".ljust(version_len), "|")
print(f"|{'*' * (library_len + version_len + 5)}|")
print("|", "Python".rjust(library_len), "|", sys.version[0:6].ljust(version_len), "|")
print("|", "Torch".rjust(library_len), "|", torch.__version__.ljust(version_len), "|")
print("|", "Datasets".rjust(library_len), "|", datasets.__version__.ljust(version_len), "|")
print("|", "Transformer".rjust(library_len), "|", transformers.__version__.ljust(version_len), "|")
print("|", "PEFT".rjust(library_len), "|", peft.__version__.ljust(version_len), "|")
print(f"+{'-' * (library_len + version_len + 5)}+")

+-------------------------------+
|        Library | Version      |
|*******************************|
|         Python | 3.10.1       |
|          Torch | 2.0.1+cu118  |
|       Datasets | 2.14.4       |
|    Transformer | 4.31.0       |
|           PEFT | 0.4.0        |
+-------------------------------+


#### Basic Values/Constants

In [5]:
DATASET_NAME = "prognosis/medical_qa_alpaca"
MODEL_CKPT = "EleutherAI/gpt-neo-2.7B"

MODEL_NAME = f"{MODEL_CKPT.split('/')[-1]}-Prompt_Tuned-{DATASET_NAME}"
LR = 3e-2

MAX_LENGTH = 500
NUM_OF_EPOCHS = 18

TEXT_COLUMN = "instruction"
LABEL_COLUMN = "output"

BATCH_SIZE = 6
DEVICE = torch.device("cuda")

#### Define Peft Configuration

In [6]:
peft_configuration = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=8,
    prompt_tuning_init_text="Provide Answers for Medical Questions:",
    tokenizer_name_or_path=MODEL_CKPT,
)

#### Load Dataset

In [7]:
data = load_dataset(DATASET_NAME)

# I made sure that all samples had no value in 'input' feature
data = data.remove_columns(['input'])

print(data)

print(data['train'][12])

Downloading readme:   0%|          | 0.00/154 [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/1.60M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['instruction', 'output'],
        num_rows: 1003
    })
})
{'instruction': 'I would test Seum Copper levels and Zinc. Copper deficiency can cause peripheral neuropathy, histamine intolerance (copper is a cofactor of DAO), dysautonomia (copper is a cofactor of DBH, required to convert dopamine to norepinephrine), EKG abnormalities.Copper deficiency can be caused by zinc toxicity, and zinc has become a very popular supplement "to prevent" COVID. Too much zinc supplementation without Copper can lead to copper deficiency.We havent looked into that. She doesnt take any zinc supplements so were not sure what would cause her to have copper deficiency or zinc toxicity. Well write it down to mention to her PCP at her next appointment.She has been checked for carcinoid syndrome. Her GI doctor suggested it and she had both the CgA blood test and 24-hour urine 5-HIAA test that luckily both came back in normal range.', 'output': 'I agree with an

#### Determine Maximum Length of Tokenized Labels

In [8]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_CKPT)

if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

target_max_length = max([len(tokenizer(label)["input_ids"])
                         for label in data['train']['output']])

print(f"The maximum tokenized response length is {target_max_length}")

Downloading (…)okenizer_config.json:   0%|          | 0.00/200 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/90.0 [00:00<?, ?B/s]

The maximum tokenized response length is 1087


#### Trim Overly Lengthy Tokenized Samples

In [9]:
data['train'] = data['train'].select(
    (
        sample for sample in range(len(data['train']))
        if len(tokenizer(data['train'][sample]["output"])['input_ids']) < 501
    )
)

data.shape



{'train': (993, 2)}

#### Create Function to Preprocess Dataset

It will:
- tokenize the entire dataset
- for each example in a batch, pad the labels with the tokenizers pad_token_id
- concatenate the input text & labels to form model_inputs
- create separate attention mask for labels & model_inputs
- loop through each example in the batch again to pad the input_ids, labels, and attention_mask to the max_length & convert them to PyTorch tensors.

In [10]:
def function_to_preprocess_data(examples):
    batch_size = len(examples[TEXT_COLUMN])
    inputs = [f"{TEXT_COLUMN} : {x} Label : " for x in examples[TEXT_COLUMN]]
    targets = [str(x) for x in examples[LABEL_COLUMN]]
    model_inputs = tokenizer(inputs)
    labels = tokenizer(targets)
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i] + [tokenizer.pad_token_id]

        model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
        labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
        model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])

    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i]
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            MAX_LENGTH - len(sample_input_ids)
            ) + sample_input_ids

        model_inputs["attention_mask"][i] = [0] * (
            MAX_LENGTH - len(sample_input_ids)
            ) + model_inputs["attention_mask"][i]

        labels["input_ids"][i] = [-100] * (
            MAX_LENGTH - len(sample_input_ids)) + label_input_ids

        model_inputs["input_ids"][i] = torch.tensor(
            model_inputs["input_ids"][i][:MAX_LENGTH])

        model_inputs["attention_mask"][i] = torch.tensor(
            model_inputs["attention_mask"][i][:MAX_LENGTH])

        labels["input_ids"][i] = torch.tensor(
            labels["input_ids"][i][:MAX_LENGTH])

    model_inputs["labels"] = labels["input_ids"]

    return model_inputs

#### Map Preprocessing Function to Entire Dataset

In [11]:
encoded_data = data.map(
    function_to_preprocess_data,
    batched=True,
    num_proc=1,
    remove_columns=data["train"].column_names,
    load_from_cache_file=False,
    desc="Tokenizing Dataset",
)

del data

Tokenizing Dataset:   0%|          | 0/993 [00:00<?, ? examples/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (2148 > 2048). Running this sequence through the model will result in indexing errors


#### Split Dataset into Training & Evaluation Datasets

In [12]:
train_eval = encoded_data['train'].train_test_split(train_size=0.80)

encoded_ds = DatasetDict({
    'train' : train_eval['train'],
    'eval' : train_eval['test'],
})

del train_eval

print("Training Dataset Shape:", encoded_ds['train'].shape)
print("Evaluation Dataset Shape:", encoded_ds['eval'].shape)

Training Dataset Shape: (794, 3)
Evaluation Dataset Shape: (199, 3)


#### Create DataLoaders for Both Training & Evaluation Datasets

In [13]:
train_ds = encoded_ds['train']
eval_ds = encoded_ds['eval']

del encoded_ds

train_dataloader = DataLoader(train_ds,
                              shuffle=True,
                              collate_fn=default_data_collator,
                              batch_size=BATCH_SIZE,
                              pin_memory=True,
                              )

eval_dataloader = DataLoader(eval_ds,
                             shuffle=True,
                             collate_fn=default_data_collator,
                             batch_size=BATCH_SIZE,
                             pin_memory=True,
                             )

#### Define Model

In [14]:
model = AutoModelForCausalLM.from_pretrained(MODEL_CKPT,
                                             load_in_8bit=True,
                                             trust_remote_code=True,
                                             device_map="auto")
model = get_peft_model(model,
                       peft_configuration)

for param in model.parameters():
    param.requires_grad = False
    if param.ndim == 1:
        param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float16)

model.lm_head = CastOutputToFloat(model.lm_head)

print("Model Trainable Parameters: ")
print(model.print_trainable_parameters())

print("Model Memory Footprint: ")
print(model.get_memory_footprint())

Downloading model.safetensors:   0%|          | 0.00/10.7G [00:00<?, ?B/s]

Some weights of GPTNeoForCausalLM were not initialized from the model checkpoint at EleutherAI/gpt-neo-2.7B and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model Trainable Parameters: 
trainable params: 0 || all params: 2,651,328,000 || trainable%: 0.0
None
Model Memory Footprint: 
2921899072


#### Define Optimizer & Learning Rate Scheduler

In [15]:
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=LR)

lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=(len(train_dataloader) * NUM_OF_EPOCHS),
    )

#### Define Training Loop

In [16]:
model.to(DEVICE)

with torch.autocast("cuda"):
    for epoch in range(NUM_OF_EPOCHS):
        model.train()
        total_loss = 0
        for step, batch in enumerate(tqdm(train_dataloader)):
            batch = {k: v.to(DEVICE) for k, v in batch.items()}
            outputs = model(**batch)
            loss = outputs.loss
            total_loss += loss.detach().float()
            loss.backward()
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()

        model.eval()
        eval_loss = 0
        eval_preds = []
        for step, batch in enumerate(tqdm(eval_dataloader)):
            batch = {k: v.to(DEVICE) for k,v in batch.items()}
            with torch.no_grad():
                outputs = model(**batch)
            loss = outputs.loss
            eval_loss += loss.detach().float()
            eval_preds.extend(
                tokenizer.batch_decode(
                    torch.argmax(outputs.logits, -1).detach().cpu().numpy(),
                    skip_special_tokens=True)
                )

        eval_epoch_loss = eval_loss / len(eval_dataloader)
        eval_ppl = torch.exp(eval_epoch_loss)
        train_epoch_loss = total_loss / len(train_dataloader)
        train_ppl = torch.exp(train_epoch_loss)
        print(f"{epoch=}: {train_ppl=}{train_epoch_loss=}{eval_ppl=}{eval_epoch_loss=}")

  0%|          | 0/133 [00:00<?, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:49<00:00,  1.44s/it]


epoch=0: train_ppl=tensor(48.1720, device='cuda:0')train_epoch_loss=tensor(3.8748, device='cuda:0')eval_ppl=tensor(73.2514, device='cuda:0')eval_epoch_loss=tensor(4.2939, device='cuda:0')


100%|██████████| 133/133 [10:23<00:00,  4.69s/it]
100%|██████████| 34/34 [00:49<00:00,  1.44s/it]


epoch=1: train_ppl=tensor(49.1520, device='cuda:0')train_epoch_loss=tensor(3.8949, device='cuda:0')eval_ppl=tensor(67.9055, device='cuda:0')eval_epoch_loss=tensor(4.2181, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:48<00:00,  1.44s/it]


epoch=2: train_ppl=tensor(50.2388, device='cuda:0')train_epoch_loss=tensor(3.9168, device='cuda:0')eval_ppl=tensor(64.1954, device='cuda:0')eval_epoch_loss=tensor(4.1619, device='cuda:0')


100%|██████████| 133/133 [10:21<00:00,  4.67s/it]
100%|██████████| 34/34 [00:48<00:00,  1.44s/it]


epoch=3: train_ppl=tensor(49.6791, device='cuda:0')train_epoch_loss=tensor(3.9056, device='cuda:0')eval_ppl=tensor(nan, device='cuda:0')eval_epoch_loss=tensor(nan, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:49<00:00,  1.44s/it]


epoch=4: train_ppl=tensor(50.1282, device='cuda:0')train_epoch_loss=tensor(3.9146, device='cuda:0')eval_ppl=tensor(69.1890, device='cuda:0')eval_epoch_loss=tensor(4.2368, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:49<00:00,  1.44s/it]


epoch=5: train_ppl=tensor(49.0691, device='cuda:0')train_epoch_loss=tensor(3.8932, device='cuda:0')eval_ppl=tensor(73.7725, device='cuda:0')eval_epoch_loss=tensor(4.3010, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:49<00:00,  1.44s/it]


epoch=6: train_ppl=tensor(48.6242, device='cuda:0')train_epoch_loss=tensor(3.8841, device='cuda:0')eval_ppl=tensor(66.5782, device='cuda:0')eval_epoch_loss=tensor(4.1984, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:49<00:00,  1.44s/it]


epoch=7: train_ppl=tensor(49.4355, device='cuda:0')train_epoch_loss=tensor(3.9007, device='cuda:0')eval_ppl=tensor(68.3250, device='cuda:0')eval_epoch_loss=tensor(4.2243, device='cuda:0')


100%|██████████| 133/133 [10:21<00:00,  4.68s/it]
100%|██████████| 34/34 [00:48<00:00,  1.44s/it]


epoch=8: train_ppl=tensor(50.3764, device='cuda:0')train_epoch_loss=tensor(3.9195, device='cuda:0')eval_ppl=tensor(60.4538, device='cuda:0')eval_epoch_loss=tensor(4.1019, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:49<00:00,  1.44s/it]


epoch=9: train_ppl=tensor(53.8732, device='cuda:0')train_epoch_loss=tensor(3.9866, device='cuda:0')eval_ppl=tensor(71.3032, device='cuda:0')eval_epoch_loss=tensor(4.2669, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:48<00:00,  1.44s/it]


epoch=10: train_ppl=tensor(50.4724, device='cuda:0')train_epoch_loss=tensor(3.9214, device='cuda:0')eval_ppl=tensor(66.8300, device='cuda:0')eval_epoch_loss=tensor(4.2022, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:48<00:00,  1.44s/it]


epoch=11: train_ppl=tensor(49.6376, device='cuda:0')train_epoch_loss=tensor(3.9047, device='cuda:0')eval_ppl=tensor(62.7634, device='cuda:0')eval_epoch_loss=tensor(4.1394, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:49<00:00,  1.44s/it]


epoch=12: train_ppl=tensor(49.7269, device='cuda:0')train_epoch_loss=tensor(3.9065, device='cuda:0')eval_ppl=tensor(68.5783, device='cuda:0')eval_epoch_loss=tensor(4.2280, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:48<00:00,  1.44s/it]


epoch=13: train_ppl=tensor(47.6926, device='cuda:0')train_epoch_loss=tensor(3.8648, device='cuda:0')eval_ppl=tensor(67.1322, device='cuda:0')eval_epoch_loss=tensor(4.2067, device='cuda:0')


100%|██████████| 133/133 [10:21<00:00,  4.67s/it]
100%|██████████| 34/34 [00:49<00:00,  1.44s/it]


epoch=14: train_ppl=tensor(48.7074, device='cuda:0')train_epoch_loss=tensor(3.8858, device='cuda:0')eval_ppl=tensor(69.0900, device='cuda:0')eval_epoch_loss=tensor(4.2354, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:48<00:00,  1.44s/it]


epoch=15: train_ppl=tensor(48.4929, device='cuda:0')train_epoch_loss=tensor(3.8814, device='cuda:0')eval_ppl=tensor(65.9063, device='cuda:0')eval_epoch_loss=tensor(4.1882, device='cuda:0')


100%|██████████| 133/133 [10:21<00:00,  4.68s/it]
100%|██████████| 34/34 [00:48<00:00,  1.44s/it]


epoch=16: train_ppl=tensor(49.4117, device='cuda:0')train_epoch_loss=tensor(3.9002, device='cuda:0')eval_ppl=tensor(60.8672, device='cuda:0')eval_epoch_loss=tensor(4.1087, device='cuda:0')


100%|██████████| 133/133 [10:22<00:00,  4.68s/it]
100%|██████████| 34/34 [00:48<00:00,  1.44s/it]

epoch=17: train_ppl=tensor(48.3665, device='cuda:0')train_epoch_loss=tensor(3.8788, device='cuda:0')eval_ppl=tensor(74.5205, device='cuda:0')eval_epoch_loss=tensor(4.3111, device='cuda:0')





#### Push Model to Hub

In [24]:
model.push_to_hub(repo_id=f"gpt-neo-2.7B-Prompt_Tuned-prognosis-medical_qa_alpaca", use_auth_token=True)

adapter_model.bin:   0%|          | 0.00/82.7k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/DunnBC22/gpt-neo-2.7B-Prompt_Tuned-prognosis-medical_qa_alpaca/commit/7fe4640c5ad237d18db27052efbdd74154062022', commit_message='Upload model', commit_description='', oid='7fe4640c5ad237d18db27052efbdd74154062022', pr_url=None, pr_revision=None, pr_num=None)

### Notes & Other Takeaways From This Project

****
- While the resulting metric was not what I was hoping for, there is a good reason. In order to prompt tune this model, one needs to use a GPU. The only way for me to use aGPU is via Google Colab. The limits for Google Colab mean that I have to reduce the number of training epochs to complete the project.

- From what I have read, it looks like the preferred duration for this project is about 50 epochs; I trained the above project for 18 epochs.

****

### Citations

- Model Checkpoint
    > @software{gpt-neo, author = {Black, Sid and Leo, Gao and Wang, Phil and Leahy, Connor and Biderman, Stella}, title = {{GPT-Neo: Large Scale Autoregressive Language Modeling with  Mesh-Tensorflow}}, month = mar, year = 2021, note = {{If you use this software, please cite it using these metadata.}}, publisher = {Zenodo}, version = {1.0}, doi = {10.5281/zenodo.5297715}, url = {https://doi.org/10.5281/zenodo.5297715}}

    >@article{gao2020pile, title={The Pile: An 800GB Dataset of Diverse Text for Language Modeling}, author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and others}, journal={arXiv preprint arXiv:2101.00027}, year={2020}}

- Dataset
    > https://huggingface.co/datasets/prognosis/medical_qa_alpaca

- Metric (Perplexity)
    > @article{jelinek1977perplexity, title={Perplexity—a measure of the difficulty of speech recognition tasks}, author={Jelinek, Fred and Mercer, Robert L and Bahl, Lalit R and Baker, James K}, journal={The Journal of the Acoustical Society of America}, volume={62}, number={S1}, pages={S63--S63}, year={1977}, publisher={Acoustical Society of America}}