## Generating Code with Code Llama

In this quick tutorial, you'll learn:
- how to run Code Llama in free Colab
- how to generate code with Code Llama

*This Notebook is based on the official blog post from Hugging Face - [Code Llama: Llama 2 learns to code](https://huggingface.co/blog/codellama).*

Other useful links:
- [Code Llama Docs](https://huggingface.co/docs/transformers/main/model_doc/code_llama)
- [Code Llama Model on HF](https://huggingface.co/codellama/CodeLlama-7b-hf)

*Note: Run this notebook with a GPU enabled. `Runtime` -> `Change Runtime Type` -> `T4`*
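
Before loading a 7B model, it can help to confirm that the runtime actually sees a GPU. The cell below is a minimal sketch, not part of the original blog post; the helper name `gpu_available` is hypothetical, and it simply reports `False` if PyTorch is not installed.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    # Hypothetical helper: guard the import so the check also works
    # in environments where torch is missing.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


print("GPU available:", gpu_available())
```

If this prints `False` in Colab, switch the runtime type as described in the note above and rerun the cell.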

Let's dive in!

### Installing the Development Version of Hugging Face Transformers

In [1]:
!pip install git+https://github.com/huggingface/transformers.git@main accelerate


### Loading the model and tokenizer

In [3]:
from transformers import AutoTokenizer
import transformers
import torch


model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)


### Preparing the Pipeline

In [5]:
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:14<00:00,  7.48s/it]

### Generating Code

In [None]:
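def generate_code(prompt):
    """Generate a completion for `prompt` and print it."""
    sequences = pipeline(
        prompt,
        do_sample=True,
        temperature=0.1,  # low temperature keeps sampling close to greedy
        top_p=0.9,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=128,  # total length in tokens, prompt included
    )
    # generated_text contains the prompt followed by the completion
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")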

### Testing on Several Queries

In [None]:
generate_code("def fibonacci(")

In [None]:
generate_code("def factorial(")


In [None]:
generate_code("def remove_last_word(")

In [None]:
generate_code("def remove_non_ascii(s: str) -> str:")

### Code Infilling

This section is a work in progress. The cells below are a first attempt at infilling.

In [None]:
import transformers
import torch

# Call transformers.pipeline so the `pipeline` object built earlier is not overwritten
generator = transformers.pipeline(
    "text-generation", model="codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16, device_map="auto",
)
# generator('def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result', max_new_tokens = 128, return_type = 1)

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_id = "codellama/CodeLlama-7b-hf"
tokenizer2 = AutoTokenizer.from_pretrained(model_id)
model2 = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16
).to("cuda")




In [None]:
prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''

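# Use the tokenizer and model loaded in the previous cell (tokenizer2, model2)
input_ids = tokenizer2(prompt, return_tensors="pt")["input_ids"].to("cuda")
output = model2.generate(
    input_ids,
    max_new_tokens=200,
)
output = output[0].to("cpu")

# Decode only the newly generated tokens, skipping the prompt tokens
filling = tokenizer2.decode(output[input_ids.shape[1]:], skip_special_tokens=True)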


In [None]:
print(prompt.replace("<FILL_ME>", filling))
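
To make the role of `<FILL_ME>` clearer, here is a small model-free sketch of the string handling around infilling. As far as I understand, the Code Llama tokenizer splits the prompt at the marker into a prefix and a suffix, and the model generates the middle. The helper names `split_infill_prompt` and `merge_filling` are hypothetical, and `example_filling` is written by hand for illustration; it is not real model output.

```python
FILL_TOKEN = "<FILL_ME>"


def split_infill_prompt(prompt: str) -> tuple[str, str]:
    """Split a prompt into (prefix, suffix) around a single fill marker."""
    if prompt.count(FILL_TOKEN) != 1:
        raise ValueError(f"expected exactly one {FILL_TOKEN} marker")
    prefix, suffix = prompt.split(FILL_TOKEN)
    return prefix, suffix


def merge_filling(prompt: str, filling: str) -> str:
    """Put the generated middle back in place of the fill marker."""
    return prompt.replace(FILL_TOKEN, filling)


example_prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''

prefix, suffix = split_infill_prompt(example_prompt)

# Hand-written stand-in for what the model might generate (assumption)
example_filling = (
    'Remove non-ASCII characters from a string."""\n'
    '    result = "".join(c for c in s if ord(c) < 128)'
)

completed = merge_filling(example_prompt, example_filling)
print(completed)
```

The merge step is the same `prompt.replace("<FILL_ME>", filling)` call used in the cell above; only the filling differs, since here it does not come from the model.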