<a href="https://colab.research.google.com/github/BoxOfCereal/Fine-Tuning-Loop/blob/main/fine_tune_loop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Useful links:
* [Falcon blog post: fine-tuning with PEFT](https://huggingface.co/blog/falcon#fine-tuning-with-peft)
* [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
* [openassistant-guanaco dataset](https://huggingface.co/datasets/timdettmers/openassistant-guanaco)
* [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/)
* https://llm.extractum.io/

# FTL - Fine Tune Loop

## Intro

In this notebook I will attempt to show all the steps necessary to go from data to text generation. The main headings walk through the easiest way to complete a whole training loop: loading data, loading a model from the Hugging Face ecosystem, training the model, benchmarking it, running inference, and finally using the model from a prompt library (in our case, LangChain).



The subsections of each heading contain more in-depth variations of each step. It is my hope that seeing multiple examples that accomplish the same thing will reveal the underlying patterns, so that you not only understand how to collect data, train a model, and run inference on it, but can also adapt the process to your own use case with your own data, model, and inference needs.

## Prerequisites
* A Hugging Face account ([sign up](https://huggingface.co/join))
* A Weights & Biases account ([sign up](https://wandb.ai/login?signup=true))
* Some Python experience
* Some basic experience with large language models (see the [Hugging Face NLP course](https://huggingface.co/learn/nlp-course/chapter0/1?fw=pt))

## Fine tuning a model

## Finetune Falcon-7b on a Google colab

Welcome to this Google Colab notebook, which shows how to fine-tune the recent Falcon-7b model on a single Google Colab instance and turn it into a chatbot.

We will leverage the PEFT library from the Hugging Face ecosystem, as well as QLoRA for more memory-efficient fine-tuning.

### Setup

Run the cells below to set up and install the required libraries. For our experiment we will need `accelerate`, `peft`, `transformers`, `datasets` and TRL to leverage the recent [`SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer). We will use `bitsandbytes` to [quantize the base model into 4bit](https://huggingface.co/blog/4bit-transformers-bitsandbytes). We will also install `einops`, as it is required to load Falcon models.

In [None]:
!pip install -q -U git+https://github.com/lvwerra/trl.git git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/accelerate.git git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb #einops for falcon

### GLOBALS

In [None]:
MODEL_NAME = None
REPO_NAME = None

### Dataset

For our experiment, we will use the Guanaco dataset, which is a clean subset of the OpenAssistant dataset adapted to train general purpose chatbots.

The dataset can be found [here](https://huggingface.co/datasets/timdettmers/openassistant-guanaco)

#### ⭐ "timdettmers/openassistant-guanaco"

In [None]:
from datasets import load_dataset

dataset_name = "timdettmers/openassistant-guanaco"
dataset = load_dataset(dataset_name, split="train")


[know your dataset](https://huggingface.co/docs/datasets/access)

In [None]:
# dataset[0]
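
Each record in this dataset stores a whole conversation as one string in its `text` field, with turns introduced by `### Human:` and `### Assistant:` markers. As a minimal sketch of that layout (the helper names `format_guanaco` and `split_guanaco` are hypothetical and not part of any library), the following builds and parses such a string:

```python
import re


def format_guanaco(turns):
    """Join (role, message) pairs into one Guanaco-style training string."""
    return "".join(f"### {role}: {message}" for role, message in turns)


def split_guanaco(text):
    """Split a Guanaco-style string back into (role, message) pairs."""
    parts = re.split(r"### (Human|Assistant): ", text)
    # parts[0] is whatever precedes the first marker (normally empty);
    # after that, roles and messages alternate.
    return list(zip(parts[1::2], parts[2::2]))


example = format_guanaco([("Human", "Hi"), ("Assistant", "Hello!")])
print(example)
print(split_guanaco(example))
```

Data of your own would need to be brought into the same shape before it can be mixed with this dataset.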

#### "Fredithefish/ShareGPT-Unfiltered-RedPajama-Chat-format"

In [None]:
from datasets import load_dataset

dataset_name = "Fredithefish/ShareGPT-Unfiltered-RedPajama-Chat-format"
dataset = load_dataset(dataset_name, split="train")

#### Custom Dataset (Coming soon!)

In [None]:
import PyPDF2
import os
from transformers import AutoTokenizer, AutoModelForCausalLM

# __file__ is not defined inside a notebook, so use the current working directory
ROOT_DIRECTORY = os.getcwd()

# Read PDF
def read_pdf(file_path):
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = " ".join([page.extract_text() for page in reader.pages])
    return text

# dividing text into smaller chunks:
def divide_text(text, section_size):
    sections = []
    start = 0
    end = section_size
    while start < len(text):
        section = text[start:end]
        sections.append(section)
        start = end
        end += section_size
    return sections

# Create Anki cards
def create_anki_cards(pdf_text):
    # Initialize the tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")

    # Split the text into fixed-size sections so each prompt stays short
    SECTION_SIZE = 1000
    divided_sections = divide_text(pdf_text, SECTION_SIZE)
    generated_flashcards = ' '
    for i, text in enumerate(divided_sections):
        # Generate the prompt
        prompt = f"Create anki flashcards with the provided text using a format: question;answer next line question;answer etc. Keep question and the corresponding answer on the same line {text}"
        # Encode the prompt
        input_ids = tokenizer.encode(prompt, return_tensors="pt")
        # Generate the response
        response = model.generate(input_ids, max_length=2048)
        # Decode the response
        response_from_api = tokenizer.decode(response[0], skip_special_tokens=True)
        generated_flashcards += response_from_api

        # Stop after the first section to keep this demo short; remove to process the whole PDF
        if i == 0:
            break

    # Save the cards to a text file
    with open("flashcards.txt", "w") as f:
        f.write(generated_flashcards)

pdf_text = read_pdf(f'{ROOT_DIRECTORY}/SOURCE_DOCUMENTS/constitution.pdf')
create_anki_cards(pdf_text)
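
The trainer used later in this notebook reads training examples from a single text column (`dataset_text_field="text"`). One possible way to prepare extracted PDF text for that, sketched here under assumptions, is to cut it into overlapping chunks and wrap each chunk in a `{"text": ...}` record. The helpers `chunk_text` and `chunks_to_records` and the overlap value are illustrative and not part of the original notebook.

```python
def chunk_text(text, size, overlap=0):
    """Yield fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    for start in range(0, len(text), step):
        yield text[start:start + size]


def chunks_to_records(text, size=1000, overlap=100):
    """Wrap each chunk in a dict with a single "text" field."""
    return [{"text": chunk} for chunk in chunk_text(text, size, overlap)]


records = chunks_to_records("abcdefghij", size=4, overlap=1)
print(records)
```

A list like this can then be turned into a dataset, for example with `datasets.Dataset.from_list(records)` in recent versions of `datasets`.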


### Loading the model

###    🟥👕 togethercomputer/RedPajama-INCITE-Base-3B-v1

#### ⭐ "ybelkada/falcon-7b-sharded-bf16" in 4bit

In this section we will load the [Falcon 7B model](https://huggingface.co/tiiuae/falcon-7b), quantize it in 4bit and attach LoRA adapters on it. Let's get started!

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "ybelkada/falcon-7b-sharded-bf16"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)
model.config.use_cache = False

Let's also load the tokenizer below

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

Below we will create the configuration for the LoRA model. According to the [QLoRA paper](https://arxiv.org/abs/2305.14314), it is important to consider all linear layers in the transformer block for maximum performance. Therefore we will add the `dense`, `dense_h_to_4h` and `dense_4h_to_h` layers to the target modules, in addition to the mixed query key value layer.

A really good video on QLoRA is [AemonAlgiz](https://www.youtube.com/@AemonAlgiz)'s
[QLoRA Is More Than Memory Optimization. Train Your Models With 10% of the Data for More Performance.](https://youtu.be/v6sf4EF45fI) Warning: he does go into some math, but even if you don't understand it all (which I certainly don't), he explains it in a very satisfying way.

In [None]:
from peft import LoraConfig

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)
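
To get a feel for how small the adapter is compared to the 7B-parameter base model, the sketch below counts the weights LoRA adds with this configuration. It relies on the standard LoRA layout, where each adapted linear layer gains `r * (in_features + out_features)` weights. The Falcon-7b layer shapes and block count are assumptions taken from the public model configuration, so double-check them against `print(model)`.

```python
# LoRA adds two small matrices per adapted linear layer:
# A (r x in_features) and B (out_features x r), i.e. r * (in + out) weights.
LORA_R = 64
NUM_BLOCKS = 32  # assumed number of transformer blocks in Falcon-7b

# (in_features, out_features) per target module; assumed Falcon-7b shapes
TARGET_SHAPES = {
    "query_key_value": (4544, 4672),
    "dense": (4544, 4544),
    "dense_h_to_4h": (4544, 18176),
    "dense_4h_to_h": (18176, 4544),
}


def lora_params(in_features, out_features, r):
    """Number of weights LoRA adds to one linear layer."""
    return r * (in_features + out_features)


per_block = sum(lora_params(i, o, LORA_R) for i, o in TARGET_SHAPES.values())
total = per_block * NUM_BLOCKS
print(f"{per_block:,} adapter weights per block, {total:,} in total")
print(f"about {total * 4 / 1e6:.0f} MB if stored as float32")
```

Under these assumptions the adapter holds roughly 130 million weights, a small fraction of the base model.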

Let's take a look at the model's modules so we can see what we're targeting, and how to find the linear modules in any other architecture we're interested in:

In [None]:
model.modules
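
To find candidate `target_modules` in another architecture, one approach is to walk `model.named_modules()` and collect the last name component of every linear layer. The sketch below matches on the class name so that it also catches quantized variants such as the `Linear4bit` layers from `bitsandbytes`. The helper `linear_module_names` is hypothetical, and the stand-in classes exist only so the example runs without loading a model.

```python
def linear_module_names(named_modules, exclude=("lm_head",)):
    """Collect the leaf names of all linear layers, skipping excluded names."""
    names = set()
    for full_name, module in named_modules:
        leaf = full_name.split(".")[-1]
        if "Linear" in type(module).__name__ and leaf not in exclude:
            names.add(leaf)
    return sorted(names)


# Stand-ins for real layers, used only to demonstrate the helper.
class Linear4bit: ...
class Linear: ...
class LayerNorm: ...


fake_modules = [
    ("transformer.h.0.self_attention.query_key_value", Linear4bit()),
    ("transformer.h.0.self_attention.dense", Linear4bit()),
    ("transformer.h.0.input_layernorm", LayerNorm()),
    ("transformer.h.0.mlp.dense_h_to_4h", Linear4bit()),
    ("transformer.h.0.mlp.dense_4h_to_h", Linear4bit()),
    ("lm_head", Linear()),
]
found = linear_module_names(fake_modules)
print(found)
```

With a loaded model, the call would be `linear_module_names(model.named_modules())`. The output head is excluded by default here because it is usually not adapted; that choice is a judgment call, not a rule.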

### Loading the trainer

#### Supervised Fine-tuning Trainer 1️⃣

Here we will use the [`SFTTrainer` from the TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer), which wraps the transformers `Trainer` to make it easy to fine-tune models on instruction-based datasets using PEFT adapters. Let's first load the training arguments below.

In [None]:
from transformers import TrainingArguments

output_dir = "./results"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 10
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 500
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)
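
A few derived quantities help when reading these arguments. The sketch below computes the effective batch size and the number of sequences seen over the whole run. The warmup line assumes the `Trainer` rounds `warmup_ratio * max_steps` up, and as far as I know the `constant` scheduler does not apply warmup at all (`constant_with_warmup` does), so treat that number as illustrative.

```python
import math

per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 500
warmup_ratio = 0.03
num_gpus = 1  # assumption: a single Colab GPU

# Gradients from several small batches are accumulated before each optimizer step
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
sequences_seen = effective_batch_size * max_steps
warmup_steps = math.ceil(max_steps * warmup_ratio)

print(f"effective batch size: {effective_batch_size}")
print(f"sequences seen in {max_steps} steps: {sequences_seen}")
print(f"warmup steps (if the scheduler uses them): {warmup_steps}")
```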

Then, finally, pass everything to the trainer:

In [None]:
from trl import SFTTrainer

max_seq_length = 512

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)

We will also pre-process the model by upcasting the layer norms to float32 for more stable training.

In [None]:
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module = module.to(torch.float32)

##### Train the model

Now let's train the model! Simply call `trainer.train()`

In [None]:
trainer.train()

During training, the model should converge nicely as follows:

![image](https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/loss-falcon-7b.png)

The `SFTTrainer` also takes care of properly saving only the adapters during training instead of saving the entire model.

### Inference

#### pipeline 1️⃣

### Saving

In [None]:
trainer.save_model()

In [None]:
from huggingface_hub import notebook_login

# Prompts for an access token interactively; never hard-code tokens in a notebook
notebook_login()

In [None]:
model_name = "ybelkada/falcon-7b-sharded-bf16"
repo_name = f"nolestock/{model_name.split('/')[-1]}-finetuned-guanaco-lora"


In [None]:
repo_name

In [None]:
trainer.model.push_to_hub(repo_name)

## loading (RESTART)

### From Huggingface hub (Model and Adapter)

In [None]:
%%capture
!pip install -q -U git+https://github.com/lvwerra/trl.git git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/accelerate.git git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb

In [None]:
model_name = MODEL_NAME or "ybelkada/falcon-7b-sharded-bf16"
repo_name = REPO_NAME or f"nolestock/{model_name.split('/')[-1]}-finetuned-guanaco-lora"

In [None]:
from peft import PeftConfig, PeftModel
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch


tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    load_in_4bit=True,
    device_map="auto",
)
inference_model = PeftModel.from_pretrained(model, repo_name)


### (Adapter only)

In [None]:
# peft_model_id = PEFT_MODEL_ID
peft_model_id = "Bruno/Harpia-7b-guanacoLora"

In [None]:
import torch
import transformers
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig


config = PeftConfig.from_pretrained(peft_model_id)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

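model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
                                             return_dict=True,
                                             quantization_config=bnb_config,
                                             trust_remote_code=True,
                                             device_map={"":0})

# Attach the LoRA adapter to the quantized base model
inference_model = PeftModel.from_pretrained(model, peft_model_id)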

### Inference

#### pipeline

In [None]:
pipeline = transformers.pipeline(
    "text-generation",
    model=inference_model,
    tokenizer=tokenizer,
)

p = """### Human: Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.### Assistant:"""
sequences = pipeline(
    p,
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
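
The pipeline returns the prompt together with the continuation in `generated_text`, and the model may keep going and invent the next `### Human:` turn. A minimal post-processing sketch, with a hypothetical helper `extract_reply`, could look like this:

```python
def extract_reply(generated_text, prompt, stop_marker="### Human:"):
    """Return only the assistant's first reply from a pipeline result."""
    # Drop the echoed prompt if the pipeline included it
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # Cut off anything after the model starts a new human turn
    return generated_text.split(stop_marker)[0].strip()


prompt = "### Human: What is a monopsony?### Assistant:"
raw = prompt + " A market with a single buyer.### Human: Thanks!"
print(extract_reply(raw, prompt))
```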



#### more control


See [behind the pipeline](https://huggingface.co/learn/nlp-course/chapter2/2?fw=pt) page for more info.

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig

peft_model_id = "Bruno/Harpia-7b-guanacoLora"

config = PeftConfig.from_pretrained(peft_model_id)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

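model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
                                             return_dict=True,
                                             quantization_config=bnb_config,
                                             trust_remote_code=True,
                                             device_map={"":0})

# Attach the LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(model, peft_model_id)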


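# Prompt templates. The notebook left these empty; the values below are placeholders
# (an assumption, not the adapter's documented format) that end with the
# "### Respuesta:" marker generate() splits on. Replace them with the real format.
prompt_input = "{instruction}\n\n{input}\n\n### Respuesta:"
prompt_no_input = "{instruction}\n\n### Respuesta:"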

def create_prompt(instruction, input=None):
  if input:
    return  prompt_input.format(instruction=instruction, input=input)
  else:
    return prompt_no_input.format(instruction=instruction)

def generate(
        instruction,
        input=None,
        max_new_tokens=128,
        temperature=0.1,
        top_p=0.75,
        top_k=40,
        num_beams=4,
        **kwargs,
):
    prompt = create_prompt(instruction, input)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].to("cuda")
    attention_mask = inputs["attention_mask"].to("cuda")
    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
        **kwargs,
    )
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens
        )
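    s = generation_output.sequences[0]
    output = tokenizer.decode(s, skip_special_tokens=True)
    # Keep only the text after the response marker (the whole output if the marker is missing)
    return output.split("### Respuesta:")[-1]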

# Portuguese for "Tell me some fun facts about Brazil"
instruction = "Me conte algumas curiosidades sobre o Brasil"

print("Instruction:", instruction)
print("Response:", generate(instruction))


## Merging (OPTIONAL)(RESTART)
[FastChat `apply_lora.py`](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/apply_lora.py)

[Add PEFT checkpoint merger script](https://github.com/artidoro/qlora/pull/173/commits/d1765faa571a5c856bac34f02535544ce19d7b81)

### Model already loaded

In [None]:
%%capture
!pip install -q -U git+https://github.com/lvwerra/trl.git git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/accelerate.git git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb #einops for falcon

In [None]:
model_name = "ybelkada/falcon-7b-sharded-bf16"
adapter_repo = f"nolestock/{model_name.split('/')[-1]}-finetuned-guanaco-lora"

In [None]:
from peft import PeftConfig, PeftModel, LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch


tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # load_in_4bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_repo)

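print("Applying the LoRA")
# Merging needs the base model in full or half precision; on a quantized model it fails with
# "ValueError: Cannot merge LORA layers when the model is loaded in 8-bit mode"
model = model.merge_and_unload()

print("Saving the target model")
merged_dir = "./merged_model"  # hypothetical output directory; pick your own
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)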
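
Conceptually, merging folds each adapter's low-rank update into the frozen weight it sits next to. The following is a toy, pure-Python sketch of that arithmetic under the standard LoRA formulation, not PEFT's actual implementation. The helper names and the tiny matrices are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Fold a LoRA update into the base weight: W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


W = [[1.0, 1.0], [1.0, 1.0]]  # frozen base weight (out x in)
A = [[1.0, 0.0], [0.0, 2.0]]  # LoRA A (r x in), toy rank r = 2
B = [[4.0, 0.0], [0.0, 4.0]]  # LoRA B (out x r)
merged = merge_lora(W, A, B, alpha=0.5, r=2)  # scale 0.25, the same ratio as 16 / 64

# The merged weight gives the same output as base path plus scaled adapter path
x = [[1.0], [1.0]]
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + 0.25 * a[0]] for b, a in zip(base_out, adapter_out)]
print(merged, matmul(merged, x), unmerged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be saved and loaded like an ordinary checkpoint.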




### Pushing Merged model to the hub

## Eval (RESTART)

### Language Model Evaluation Harness

#### Getting a base

In [None]:
model_name = "ybelkada/falcon-7b-sharded-bf16"

In [None]:
!python ./lm-evaluation-harness/main.py \
    --model hf-causal \
    --model_args pretrained={model_name},dtype=float16,trust_remote_code=True,load_in_8bit=True\
    --tasks hendrycksTest-high_school_computer_science \
    --device cuda:0

#### QLoRA

In [None]:
!git clone https://github.com/EleutherAI/lm-evaluation-harness
!cd lm-evaluation-harness && pip install -e ".[auto-gptq]"


In [None]:
%%capture
!pip install -q datasets bitsandbytes einops git+https://github.com/huggingface/peft #wandb

In [None]:
model_name = "ybelkada/falcon-7b-sharded-bf16"
repo_name = f"nolestock/{model_name.split('/')[-1]}-finetuned-guanaco-lora"
repo_name

'nolestock/falcon-7b-sharded-bf16-finetuned-guanaco-lora'

[docs/task_table.md](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md)

bigbench_causal_judgement,hendrycksTest-high_school_computer_science,
hendrycksTest-machine_learning,hendrycksTest-management,

pile_bookcorpus2,pile_gutenberg,wikitext

In [None]:
!python ./lm-evaluation-harness/main.py \
    --model hf-causal-experimental \
    --model_args pretrained=EleutherAI/gpt-j-6b,peft=nomic-ai/gpt4all-j-lora,load_in_4bit=True \
    --tasks hendrycksTest-high_school_computer_science \
    --device cuda:0




In [None]:
!python ./lm-evaluation-harness/main.py \
    --model hf-causal-experimental \
    --model_args pretrained={model_name},peft={repo_name},dtype=float16,trust_remote_code=True,load_in_4bit=True \
    --tasks hendrycksTest-high_school_computer_science \
    --device cuda:0 \
    --output_base_path ./
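
The `--model_args` value is a comma-separated list of `key=value` pairs. When running several evaluations it can be less error-prone to build that string from Python values. A small sketch, where `build_model_args` is a hypothetical helper and not part of the harness:

```python
def build_model_args(**options):
    """Join keyword arguments into the key=value,key=value form used by --model_args."""
    return ",".join(f"{key}={value}" for key, value in options.items())


model_name = "ybelkada/falcon-7b-sharded-bf16"
repo_name = f"nolestock/{model_name.split('/')[-1]}-finetuned-guanaco-lora"

model_args = build_model_args(
    pretrained=model_name,
    peft=repo_name,
    dtype="float16",
    trust_remote_code=True,
    load_in_4bit=True,
)
print(model_args)
```

The resulting string can be interpolated into the shell command with `{model_args}`, the same way `{model_name}` is used above.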



In [None]:
!python ./lm-evaluation-harness/main.py \
    --model hf-causal-experimental \
    --model_args pretrained={model_name},peft={repo_name},dtype=float16,trust_remote_code=True,load_in_4bit=True \
    --tasks hendrycksTest-high_school_computer_science \
    --device cuda:0 \
    --output_base_path ./

## Integration

### Langchain

In [None]:
%%capture
!pip install -q -U git+https://github.com/lvwerra/trl.git git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/accelerate.git git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops langchain

In [None]:
model_name = "ybelkada/falcon-7b-sharded-bf16"
repo_name =  f"nolestock/{model_name.split('/')[-1]}-finetuned-guanaco-lora"

In [None]:
from peft import PeftConfig, PeftModel
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch


tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    load_in_4bit=True,
    device_map="auto",
)
inference_model = PeftModel.from_pretrained(model, repo_name)




In [None]:
pipeline = transformers.pipeline(
    "text-generation",
    model=inference_model,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature = .8
)

Note: creating the pipeline logs a message that the model `PeftModelForCausalLM` is not supported for text-generation, followed by a long list of supported model classes. The pipeline is still created.

In [None]:
from langchain.llms import HuggingFacePipeline

hf = HuggingFacePipeline(pipeline=pipeline,)

In [None]:
from langchain import PromptTemplate, LLMChain

# Multiple inputs example

template = """### Human: Can you write a short introduction about a {adjective} {subject}? ### Assistant:"""
prompt = PromptTemplate(template=template, input_variables=["adjective", "subject"])
llm_chain = LLMChain(prompt=prompt, llm=hf)

llm_chain.predict(adjective="famous", subject="physicist", verbose=True)

### MiniHack

In [None]:
!git clone https://github.com/facebookresearch/minihack
!cd minihack && pip install -e ".[dev]" && pre-commit install


In [None]:
# Python and most build deps
! sudo apt-get install -y build-essential autoconf libtool pkg-config \
    python3-dev python3-pip python3-numpy git flex bison libbz2-dev

# recent cmake version
! wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | sudo apt-key add -
! sudo apt-add-repository 'deb https://apt.kitware.com/ubuntu/ bionic main'
! sudo apt-get update && apt-get --allow-unauthenticated install -y \
    cmake \
    kitware-archive-keyring


In [None]:
!pip install minihack

Note: in this run, `pip install minihack` failed while building the wheel for its `nle` dependency (`subprocess-exited-with-error`, exit code 1).

In [None]:
import gym
import minihack
env = gym.make("MiniHack-River-v0")
env.reset() # each reset generates a new environment instance
env.step(1)  # move agent '@' north
env.render()