<a href="https://colab.research.google.com/github/EdBerg21/AI-Based-Fraud-Detection/blob/main/finetune_redpajama.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finetuning RedPajama ("OpenLlama")
by John Robinson 05/15/2023 [Follow @johnrobinsn on Twitter](https://twitter.com/johnrobinsn)

<img src="https://www.storminthecastle.com/img/finetune_redpajama_files/RedPajama_256.png"><br/>

[<img src="https://www.storminthecastle.com/img/github.svg">](https://github.com/johnrobinsn/redpajama/blob/main/finetune_redpajama.ipynb) [<img src="https://www.storminthecastle.com/img/colab.svg">](https://colab.research.google.com/github/johnrobinsn/redpajama/blob/main/finetune_redpajama.ipynb)

[Llama](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/), the large language model from Meta, was [_effectively_ released](https://github.com/facebookresearch/llama/pull/73/files) to the world a little over two months ago, resulting in an explosion of [experimentation](https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor), [exploration](https://lmsys.org/blog/2023-03-30-vicuna/) and [innovation](https://github.com/ggerganov/llama.cpp). Unfortunately, Llama has a fairly restrictive "research-only" license which prohibits commercial use.

[RedPajama](https://www.together.xyz/blog/redpajama) is an effort to create an open-source clone of Llama.  It's a project to create leading open-source language models and starts by reproducing the LLaMA training dataset of over 1.2 trillion tokens. Leveraging this dataset, the team from [together.xyz](http://together.xyz) has just released initial versions of their 3B and 7B RedPajama models. Both of these come in a number of flavors including a [base model](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) and a couple of finetuned variants (an instruction-tuned variant and a chat-tuned variant). **All of these have been released under a liberal Apache-2.0 open source license, making them suitable for commercial applications.**

Out of the box, the base model (given a sequence of words) has been trained to "statistically" predict what word should come next based on the very large dataset described above. As such, the language model has in a way compressed and encoded much of the information contained within that dataset and when prompted can regenerate information that it has learned, one token at a time. These generated sequences echo many of the ideas, concepts and "thoughts" that were encoded in the dataset, reflecting thoughts about society, humanity, general knowledge and more. You can sort of think of the base model as having compressed a large portion of the Internet into a set of a few billion numbers.  

But while predicting what word comes next is a powerful capability, it is fairly limited in practical application. Enter the process of alignment or finetuning. As you might imagine, the base model has been exposed to many different patterns of written language, ranging from freeform prose, to poetry, Q&As, dialogue etc. Finetuning is about bringing one or more of these latent language patterns to the surface, so that the language model performs better at a specific desired behavior or task. A few of the more common downstream tasks that large language models have been proven to be good at include:

* Document Summarization
* Sentiment Analysis
* Chatbot (Human/Bot. Something like a ChatGPT)
* Instruction Following

In this notebook, I'll demonstrate finetuning the RedPJ model into an "Instruction Following" variant using the [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html) dataset. Once RedPJ has been finetuned for the task of Instruction Following, we'll expect our model to be able to take in a natural language instruction and generate a coherent response to it.  My goal here is two-fold: one, to demonstrate the mechanics of how to finetune an LLM on a dataset of your own choosing, and two, to show the profound impact that finetuning can have on the behavior of an LLM.

There are several approaches that we can take to finetuning an LLM. Full finetuning involves using backpropagation to iteratively modify all of the model weights, which can be quite resource intensive from a compute and a memory perspective. Another popular approach to finetuning is through the use of [LoRA adapters](https://arxiv.org/abs/2106.09685).  LoRA is a technique that allows you to finetune a large language model on a much smaller GPU. It does this by freezing the base model weights and dynamically injecting trainable "adapter" layers into the model. The number of trainable parameters in these adapter layers is much smaller than in the base model. We'll also use [int8 quantization](https://arxiv.org/abs/2208.07339) of the base model weights to further reduce the amount of memory needed for training.

One additional benefit: since LoRA adapters are a separate set of weights from the base model and are much smaller, you can keep just one copy of the large base model weights and dynamically load different LoRA adapter weights to support multiple downstream tasks on the same shared base model.
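
To make the adapter idea concrete, here is a minimal pure-Python sketch of a single LoRA-adapted linear layer. It follows the formulation in the LoRA paper linked above, where the frozen weight `W` is augmented by a low-rank product `B A` scaled by `alpha / r`. The helper names (`matvec`, `lora_forward`) and the tiny matrices are made up for illustration and are not part of the peft library.

```python
# Minimal pure-Python sketch of a LoRA-adapted linear layer (no bias).
# All names and values are illustrative; this is not the peft API.

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)                # frozen base projection
    update = matvec(B, matvec(A, x))   # low-rank path: d_in -> r -> d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Frozen base weight with d_out=3, d_in=4
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
r, alpha = 2, 4
A = [[0.5, 0.0, 0.0, 0.0],                      # r x d_in
     [0.0, 0.5, 0.0, 0.0]]
B_init = [[0.0, 0.0] for _ in range(3)]         # d_out x r, zero-initialised
B_task = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # stand-in for a trained adapter

x = [1.0, 2.0, 3.0, 4.0]
print(lora_forward(W, A, B_init, x, alpha, r))  # identical to the base output W x
print(lora_forward(W, A, B_task, x, alpha, r))  # base output plus the adapter update
```

Because `B` starts at zero, a freshly injected adapter leaves the base model's output unchanged, and swapping `B_task` for another adapter's matrices changes the behavior without touching `W`.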

Later in this article, I'll demonstrate interacting with the base model before and after finetuning, so that you can see the effects of finetuning more clearly.

The HuggingFace ecosystem makes downloading, training, and saving models and datasets very quick and easy. This notebook heavily leverages the HuggingFace libraries and platform.

_Note: This notebook should support any of the upcoming RedPajama model releases (bigger ones are coming... assuming you have enough VRAM on your GPU). I've trained and tested this notebook on both the 3B and 7B models that have been released so far._

We can start by defining a few variables that determine which base model we're finetuning (either the 3B or the 7B model).

## Setup
Configure the base model and a few other variables that we'll use later.

In [1]:
model = '3B' #'7B' # Pick your poison

if model == '7B':
    model_name = ("togethercomputer/RedPajama-INCITE-Base-7B-v0.1","togethercomputer/RedPajama-INCITE-Base-7B-v0.1")
    run_name = 'redpj7B-lora-int8-alpaca'
    dataset = 'johnrobinsn/alpaca-cleaned'
    peft_name = 'redpj7B-lora-int8-alpaca'
    output_dir = 'redpj7B-lora-int8-alpaca-results'
else: #3B
    model_name = ("togethercomputer/RedPajama-INCITE-Base-3B-v1","togethercomputer/RedPajama-INCITE-Base-3B-v1")
    run_name = 'redpj3B-lora-int8-alpaca'
    dataset = 'johnrobinsn/alpaca-cleaned'
    peft_name = 'redpj3B-lora-int8-alpaca'
    output_dir = 'redpj3B-lora-int8-alpaca-results'

model_name[1],dataset,peft_name,run_name

('togethercomputer/RedPajama-INCITE-Base-3B-v1',
 'johnrobinsn/alpaca-cleaned',
 'redpj3B-lora-int8-alpaca',
 'redpj3B-lora-int8-alpaca')

Install the required dependencies.

In [2]:
def install_dependencies():
    !pip install -Uqq  git+https://github.com/huggingface/peft.git
    !pip install -Uqq transformers datasets accelerate bitsandbytes
    !pip install -Uqq wandb

# comment out the following line if the required dependencies are already installed
install_dependencies()


__Note: If you just want to do inference you can jump all the way down to the ["Evaluate"](#evaluate) cell and start running from there to download my adapter weights from HF hub and try some prompts through the finetuned model.__

But if you want to train keep going...

## Setting Up Tracking and Monitoring using Weights and Biases

This notebook has support for logging the training run to [weights and biases (wandb)](https://wandb.ai/site).  This makes it very easy to track, monitor and annotate your training sessions from anywhere.  

Run the next cell and follow the directions to authenticate with wandb.

In [3]:
report_to = "wandb" # "none"

if report_to != "none":
    import wandb
    wandb.login()



After authenticating, we have to initialize wandb.  We add a few key-value pairs about the run to the information that will be logged to the wandb dashboard.  

_Note: You can add more key/values if you'd like._

In [4]:
if report_to != "none":  # only initialize a run when wandb logging is enabled
    wandb.init(project=run_name, config={
        "model": model_name[1],
        "dataset": dataset,
    })


After you get training started below, you can revisit the wandb links printed by the cell above to monitor the status of your training run from anywhere with Internet connectivity.

_Note: I like to send the link (View run) to my phone so that I can monitor on the go..._

## Tokenizer
The tokenizer converts words into a list/tensor of numbers so that the model can process them.  Each language model has been trained using a specific tokenizer.  If your base model is already supported by HuggingFace, then the transformers library makes it very easy to load the correct tokenizer for your model.  Just use the AutoTokenizer class and specify the model name to create an instance of the correct tokenizer.

In [5]:
from transformers import AutoTokenizer

print("Loading tokenizer for model: ", model_name[1])
tokenizer = AutoTokenizer.from_pretrained(model_name[1],add_eos_token=True)
tokenizer.pad_token_id = 0

Loading tokenizer for model:  togethercomputer/RedPajama-INCITE-Base-3B-v1



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


One problem that I've found with many of the finetuning scripts and notebooks found online is that the "end-of-stream" handling is not done correctly, so in many cases the finetuned models don't know when to stop emitting tokens and tend to "blabber" on.  Since we are finetuning on an instruction following task, we would like the model to respond to the instruction prompt succinctly and then stop.  There are a number of ways to approach this, but the way I approach it here is to explicitly add a new token to represent end-of-stream, &lt;eos&gt;, and use that eos token during training to teach the model when it should stop. Then during inference, we can use that token to recognize when the model is done responding.

In [6]:
tokenizer.add_special_tokens({'eos_token':'<eos>'})
print('eos_token_id:',tokenizer.eos_token_id)

eos_token_id: 50277


In [7]:
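CUTOFF_LEN = 256  # 256 accounts for about 96% of the data in the alpaca dataset

def tokenize(prompt, tokenizer, add_eos_token=True):
    # Append the end-of-stream token so the model learns where a response ends
    text = (prompt + "<eos>") if add_eos_token else prompt
    result = tokenizer(
        text,
        truncation=True,       # clip prompts longer than CUTOFF_LEN tokens
        max_length=CUTOFF_LEN,
        padding="max_length",  # pad shorter prompts so every row has the same length
    )
    return {
        "input_ids": result["input_ids"],
        "attention_mask": result["attention_mask"],
    }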
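
If you swap in your own dataset, it is worth checking how much of it a given cutoff covers. The sketch below is one way to do that in plain Python; the `coverage` helper and the token counts are hypothetical, and in practice the lengths would come from tokenizing each templated prompt without padding or truncation.

```python
def coverage(lengths, cutoff):
    """Fraction of examples whose token count fits within `cutoff`."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(1 for n in lengths if n <= cutoff) / len(lengths)

# Hypothetical token counts, one per templated prompt
example_lengths = [48, 95, 120, 180, 256, 300, 75, 210, 64, 512]

for cutoff in (128, 256, 512):
    print(cutoff, coverage(example_lengths, cutoff))
```

A larger cutoff keeps more examples intact, but every row is padded to that length, so memory use and training time grow with it.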


Let's give it a quick try and note the &lt;eos&gt; token id (50277) near the end of the sequence.

In [8]:
tokenizer('hi there<eos>')

{'input_ids': [5801, 627, 50277, 0], 'attention_mask': [1, 1, 1, 1]}
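
Here is a small sketch of why a dedicated end-of-stream token that differs from the padding token matters. It assumes right-padding and that padded positions are excluded from the loss by setting their label to -100, which is what `DataCollatorForLanguageModeling` with `mlm=False` does for positions equal to the pad token id. The helper names are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default

def pad_to_length(ids, pad_id, max_len):
    """Truncate to max_len, then right-pad with pad_id."""
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

def make_labels(input_ids, pad_id):
    """Copy input_ids, masking padding positions out of the loss."""
    return [IGNORE_INDEX if t == pad_id else t for t in input_ids]

PAD_ID, EOS_ID = 0, 50277          # values used in this notebook
prompt_ids = [5801, 627, EOS_ID]   # 'hi there<eos>' (trailing 0 omitted)

input_ids = pad_to_length(prompt_ids, PAD_ID, max_len=6)
labels = make_labels(input_ids, PAD_ID)
print(input_ids)  # [5801, 627, 50277, 0, 0, 0]
print(labels)     # [5801, 627, 50277, -100, -100, -100]

# If eos and pad shared one id, the stop token would be masked as well:
print(make_labels(pad_to_length([5801, 627, 0], 0, 6), 0))
# [5801, 627, -100, -100, -100, -100]
```

With separate ids, the &lt;eos&gt; position keeps a real label, so the model is actually trained to emit it.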

## Dataset

When finetuning your model, the dataset that you choose has to be aligned with your downstream task. We're using a popular Instruction Following dataset, called Alpaca. For convenience, I have a copy of the alpaca dataset that has been cleaned and published on the HuggingFace hub. We can just download it and access it from cache using the load_dataset API shown below.

In [9]:
from datasets import load_dataset

# Load dataset from the hub
data = load_dataset(dataset)
data


DatasetDict({
    train: Dataset({
        features: ['output', 'instruction', 'input'],
        num_rows: 51942
    })
})

We can see that the dataset consists of 51,942 rows with the following features ['instruction','input','output'].  Let's take a look at one.

In [10]:
data['train'][5]

{'output': 'Telegram',
 'instruction': 'Identify the odd one out.',
 'input': 'Twitter, Instagram, Telegram'}

We can see an item that includes an 'instruction' to direct our model, an optional 'input' which provides context for the instruction, and an expected 'output' for the model.

Our goal in finetuning is to use this dataset to train our model to "behave" in a similar way: given an instruction, respond with an appropriate response, generalizing from the knowledge already encoded in the base model.

But we can't directly use this JSON object to train our model.  Our model can only process an ordered sequence of tokens that represent words.  So we use a "prompt template" to convert each of these JSON objects in our dataset into a sequence of words.  The prompt template follows a consistent pattern.

In [11]:
def generate_prompt(data_point):
    # sorry about the formatting disaster gotta move fast
    if data_point["input"]:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Input:
{data_point["input"]}

### Response:
{data_point["output"]}"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Response:
{data_point["output"]}"""


Let's see what our example looks like when "templatized".

In [12]:
print(generate_prompt(data['train'][5]))

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Identify the odd one out.

### Input:
Twitter, Instagram, Telegram

### Response:
Telegram


The exact wording of the template is somewhat arbitrary.  It's more of a consistent pattern that after training will drive the model into responding similarly when exposed to a similar prompt.  You should be able to pick out the "instruction", "input", and "output" from the example.  

It is important that the output from the dataset is at the end of the templatized prompt, since at inference time we will only provide the prompt up to **but not including the output**.  We'll expect our model to respond to our instruction on its own.

We now split out a validation dataset from our training dataset, so that we can track how well the finetuning process is learning to generalize to unseen prompts and so that we can keep the checkpoint with the lowest validation loss.

In [13]:
VAL_SET_SIZE = 2000  # we set aside 2000 items from our dataset for validation during training

train_val = data["train"].train_test_split(
    test_size=VAL_SET_SIZE, shuffle=True, seed=42
)
train_data = train_val["train"]
val_data = train_val["test"]

We prepare the training dataset and the validation dataset by running the data through the prompt templating process and then by tokenizing the prompts.

In [14]:
train_data = train_data.shuffle().map(lambda x: tokenize(generate_prompt(x), tokenizer))
val_data = val_data.shuffle().map(lambda x: tokenize(generate_prompt(x), tokenizer))


## Load and Configure the Model for Training

Load the specified RedPajama base model from the HuggingFace hub.

_Note: Llama, Redpajama and other decoder-only models are supported by the AutoModelForCausalLM class.  But for encoder-decoder models such as the [**google/t5**](https://huggingface.co/google/flan-t5-xxl) models you'll need to use the AutoModelForSeq2SeqLM class and the training details are a little bit different.  Here is a [similar notebook](https://github.com/johnrobinsn/flan_ul2/blob/main/train-peft-flan-ul2-int8-alpaca.ipynb) for finetuning t5* models._

In [15]:
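from transformers import AutoModelForCausalLM, BitsAndBytesConfig

print("Loading model for model: ", model_name[0])

# Quantize the base model weights to int8 as they are loaded
model = AutoModelForCausalLM.from_pretrained(
    model_name[0],
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)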

Loading model for model:  togethercomputer/RedPajama-INCITE-Base-3B-v1
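
For intuition about what loading the weights in int8 involves, here is a simplified absmax quantization sketch in plain Python. This is an illustration under my own simplifying assumptions, not the bitsandbytes implementation: the LLM.int8() method from the int8 quantization paper linked earlier scales vector-wise and keeps outlier features in higher precision, both of which are omitted here. The function names are hypothetical.

```python
def quantize_absmax(values):
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    absmax = max(abs(v) for v in values)
    if absmax == 0:
        return [0] * len(values), 1.0
    scale = 127 / absmax
    return [round(v * scale) for v in values], scale

def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c / scale for c in codes]

weights = [0.5, -1.2, 0.25, 2.0]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)

print(codes)     # small integers, one byte each instead of two or four
print(restored)  # close to the original weights, with a small rounding error
```

Each weight is stored in a single byte plus a shared scale, which is where the memory savings come from; the price is a rounding error of at most half a quantization step per value.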



Now, we can prepare our model for the LoRA int-8 training using the HF peft library.

In [20]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType

# Define LoRA Config
lora_config = LoraConfig(
 r= 8,
 lora_alpha=16,
 target_modules=["query_key_value"],
 lora_dropout=0.05,
 bias="none",
 task_type=TaskType.CAUSAL_LM
)

# prepare int-8 model for training
model = prepare_model_for_kbit_training(model)

# add LoRA adapter
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 2,621,440 || all params: 2,778,485,760 || trainable%: 0.0943


_Note: After installing the LoRA adapters into the model, notice the significant reduction in the number of trainable parameters._
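
The trainable parameter count printed above can be reproduced with a little arithmetic. The sketch below assumes the 3B model has a hidden size of 2560 and 32 transformer layers, each with one fused `query_key_value` projection of shape 2560 x 7680; those architecture numbers are not stated in this notebook, so treat them as assumptions that happen to match the printed total. The helper name is hypothetical.

```python
def lora_param_count(d_in, d_out, r, num_layers):
    """Parameters added by LoRA: A is r x d_in and B is d_out x r, per layer."""
    return num_layers * r * (d_in + d_out)

# Assumed architecture of the 3B model (not stated in this notebook)
hidden_size = 2560
num_layers = 32
qkv_out = 3 * hidden_size  # fused query/key/value projection

trainable = lora_param_count(hidden_size, qkv_out, r=8, num_layers=num_layers)
total = 2_778_485_760      # "all params" as printed above

print(f"{trainable:,}")                   # 2,621,440
print(f"{100 * trainable / total:.4f}%")  # 0.0943%
```

Doubling `r` doubles the adapter size, which is still a tiny fraction of the base model.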

We'll leverage the training loop from the transformers library since it does a pretty good job with handling the details.

In [21]:
import transformers
eval_steps = 200
save_steps = 200
logging_steps = 20

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        num_train_epochs=3,
        learning_rate=3e-4,
        logging_steps=logging_steps,
        evaluation_strategy="steps",  # newer transformers releases name this argument eval_strategy
        save_strategy="steps",
        eval_steps=eval_steps,
        save_steps=save_steps,
        output_dir=output_dir,
        report_to=report_to if report_to else "none",
        save_total_limit=3,
        load_best_model_at_end=True,
        push_to_hub=False,
        auto_find_batch_size=True
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
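
To get a feel for how long the run will be, you can estimate the number of optimizer steps up front. This is a rough sketch that assumes a single GPU, no gradient accumulation and a batch size of 8 (the `TrainingArguments` per-device default); since `auto_find_batch_size=True` may pick a smaller batch size if memory runs out, the real number can be higher. The helper name is hypothetical.

```python
import math

def training_steps(num_examples, batch_size, epochs):
    """Optimizer steps with no gradient accumulation on a single device."""
    return math.ceil(num_examples / batch_size) * epochs

num_train = 51_942 - 2_000  # rows left after the validation split
batch_size = 8              # assumed, see the note above
total_steps = training_steps(num_train, batch_size, epochs=3)

print(total_steps)          # 18729
print(total_steps // 200)   # 93 evaluations/checkpoints at eval_steps=save_steps=200
```

With `save_total_limit=3`, only a few of those checkpoints are kept on disk at any time.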



## Train
Run the training loop.

In [None]:
trainer.train()





## Save the Trained Adapter Model to Disk

Now that we've trained the model we'll want to save our weights.  First I demonstrate how to save them to disk.

In [None]:
# Save our LoRA model & tokenizer results
trainer.model.save_pretrained(peft_name)
tokenizer.save_pretrained(peft_name)

# if you want to save the base model to disk call
# trainer.model.base_model.save_pretrained(peft_model_id)

## Push the Trained Adapter Model to the HuggingFace Hub

Even better than saving your trained weights to disk, you can push them up to the HuggingFace Hub.  This makes it super easy to share your trained adapter with others or to set up your model for inference on other devices.

In [None]:
!pip install -Uqq huggingface_hub
import huggingface_hub
huggingface_hub.login()

In [None]:
# If you don't already have the git extensions for large file storage you might have to install it now.
# Here is how you can do this for Linux from the shell.  For other OSs please refer to the git-lfs documentation.
# sudo apt install git-lfs

In [None]:
repo_id = f'{huggingface_hub.whoami()["name"]}/{peft_name}'
trainer.model.push_to_hub(repo_id)
tokenizer.push_to_hub(repo_id)

You should be able to check out HuggingFace and see your LoRA Adapter Model.

## Free Up Memory

Since we likely used a lot of memory during training and we'll need that memory back to try the model out, we take a few steps to free up VRAM here.

In [None]:
import torch
import gc
config = None
model = None
tokenizer=None
trainer=None
gc.collect()
torch.cuda.empty_cache()

## Evaluate
Here we'll try out the model for inference.

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

In [None]:
# load base LLM model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name[0],
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name[1])
tokenizer.pad_token_id = 0
tokenizer.add_special_tokens({'eos_token':'<eos>'})

model.eval()


Here is the prompt template we'll use for inference.

_Note: It's important that it's identical to one we used for training above, but it omits the "output/response" as our model will generate that for us._

In [None]:
def generate_prompt(data_point):
    # sorry about the formatting disaster gotta move fast
    if data_point["input"]:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Input:
{data_point["input"]}

### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Response:"""


Here is a small utility function that lets us easily prompt our model with an instruction and an optional input.  It handles templating the prompt, tokenizing the templatized prompt, decoding the result and then finally stripping off the prompt from the response and just leaving us with the model response.

In [None]:
def generate(instruction,input=None,maxTokens=256):
    prompt = generate_prompt({'instruction':instruction,'input':input})
    input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
    outputs = model.generate(input_ids=input_ids, max_new_tokens=maxTokens,
                             do_sample=True, top_p=0.9,pad_token_id=tokenizer.eos_token_id,
                             forced_eos_token_id=tokenizer.eos_token_id)
    outputs = outputs[0].tolist()
    # Stop decoding when hitting the EOS token
    if tokenizer.eos_token_id in outputs:
        eos_index = outputs.index(tokenizer.eos_token_id)
        decoded = tokenizer.decode(outputs[:eos_index])
        # Don't show the prompt template
        sentinel = "### Response:"
        sentinelLoc = decoded.find(sentinel)
        if sentinelLoc >= 0:
            print(decoded[sentinelLoc+len(sentinel):])
        else:
            print('Warning: Expected prompt template to be emitted.  Ignoring output.')
    else:
        print('Warning: no <eos> detected ignoring output')
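
The post-processing in `generate` can be tried out without a GPU. The sketch below mirrors that logic (cut at the first eos token, then keep only the text after the response marker) but returns the text instead of printing it and also strips surrounding whitespace. `extract_response` and `toy_decode` are hypothetical helpers; `toy_decode` is a stand-in for `tokenizer.decode` that treats each character as one token.

```python
def extract_response(token_ids, eos_id, decode, sentinel="### Response:"):
    """Return the text after `sentinel`, cut at the first eos; None if either is missing."""
    if eos_id not in token_ids:
        return None
    text = decode(token_ids[:token_ids.index(eos_id)])
    start = text.find(sentinel)
    if start < 0:
        return None
    return text[start + len(sentinel):].strip()

def toy_decode(ids):
    """Toy stand-in for tokenizer.decode: one 'token' per character."""
    return "".join(chr(i) for i in ids)

EOS = 0
full_text = "### Instruction:\nSay hi\n\n### Response:\nHi there!"
ids = [ord(c) for c in full_text] + [EOS, ord("x")]  # anything after eos is dropped

print(extract_response(ids, EOS, toy_decode))  # Hi there!
```

Returning `None` rather than printing a warning makes it easier to reuse the helper in scripts or tests.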

### Generating using the Base Model

This demonstrates the behavior of the RedPajama model with no finetuning applied.

**BEFORE FINETUNING**

In [None]:
torch.manual_seed(42)
generate('Write a short story in third person narration about a protagonist who has to make an important career decision.',maxTokens=300)

### Load the LoRA Adapter

As you can see the generated text doesn't seem very responsive to the prompt.  Now let's load the trained LoRA adapter and see what happens.

_Note: Here you can either load up my pretrained Lora adapter from HuggingFace hub.  Or if you trained your own adapter above you can uncomment the specified line below to load your adapter from disk._

In [None]:
peft_model_id = f'johnrobinsn/{peft_name}' # By default use my pretrained adapter weights
#peft_model_id = peft_name # Uncomment to use locally saved adapter weights if you trained above

# Load the LoRA model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"":0})
model.eval()

print("Peft model adapter loaded")

Let's try the same prompt again.

**AFTER FINETUNING**

In [None]:
torch.manual_seed(42)
generate('Write a short story in third person narration about a protagonist who has to make an important career decision.',maxTokens=300)

As you can see, this response is far more responsive to the provided instruction.

### A Few More Prompts

In [None]:
torch.manual_seed(42)
generate('Who was the first man to walk on the moon and tell me where he was born.')

In this example, we provide not only an instruction but also provide some context for the instruction which is a list of possible answers.

In [None]:
torch.manual_seed(42)
generate('Identify the odd one out','Twitter, Instagram, Telegram')

In [None]:
torch.manual_seed(42)
generate('Write a poem about a cat',maxTokens=1000)

Meh.  Not that great... But Llama doesn't seem to be very good at poetry either in my experience.  Would be worthwhile to see how the larger RedPJ models fare here...   Still lots of fun probing the limits of what works well and what doesn't.

### Conclusion

I hope you've enjoyed this quick tour of finetuning.  I've also included a ["cleaned" version](https://github.com/johnrobinsn/redpajama/blob/main/finetune_redpajama_clean.ipynb) of this notebook in the github repo without the blog narrative.

If you'd like to try your hand at a different finetuning task, you could give summarization a try.  Please check out the [samsum](https://huggingface.co/datasets/samsum) summarization dataset on HF.  Primarily, you'll need to adjust the prompt templates during training and inference.
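
As a starting point, a summarization template could look something like the sketch below. It assumes each samsum row has `dialogue` and `summary` fields; the template wording, the function name and the example row are all made up for illustration.

```python
def generate_summarization_prompt(data_point, include_output=True):
    """Alpaca-style template for a dialogue-summarization example."""
    prompt = (
        "Below is a conversation. Write a short summary of it.\n\n"
        "### Conversation:\n"
        f"{data_point['dialogue']}\n\n"
        "### Summary:\n"
    )
    if include_output:
        prompt += data_point["summary"]  # training: the target goes last
    return prompt

# Hypothetical row, shaped like the assumed dataset fields
example = {
    "dialogue": "Amanda: I baked cookies. Do you want some?\nJerry: Sure!",
    "summary": "Amanda baked cookies and will share them with Jerry.",
}

print(generate_summarization_prompt(example))                        # training prompt
print(generate_summarization_prompt(example, include_output=False))  # inference prompt
```

If you go this route, the sentinel used in `generate` to strip the prompt would need to change from "### Response:" to "### Summary:" as well.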

Please **like** my content on twitter, [@johnrobinsn](https://twitter.com/johnrobinsn)

