<a href="https://colab.research.google.com/github/ak2742/mlplay/blob/Fine-Tuning/12)_CodeGemma_Code_completion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CodeGemma 2B & 7B

CodeGemma is a family of code-specialist LLMs by Google, based on the pre-trained 2B and 7B Gemma checkpoints.

The models were further trained on 500 billion tokens of primarily English language data, mathematics, and code.

They leverage the natural language capabilities of their ancestors, improve on logical and mathematical reasoning, and are suitable for code completion and generation.

*Note: make sure you're on a GPU runtime in Colab.*

## Set up the inference environment

In [None]:
!pip install -q --upgrade transformers

## Load the tokenizer and the model

CodeGemma 2B was trained exclusively on the Code Infilling task and is meant for fast code completion and generation, especially in settings where latency and/or privacy are crucial.

CodeGemma 7B training mix includes code infilling data (80%) and natural language. It can be used for code completion, as well as code and language understanding and generation.

CodeGemma 7B Instruct was fine-tuned for instruction following on top of CodeGemma 7B. It's meant for conversational use, especially around code, programming, or mathematical reasoning topics. It's not as powerful as the other versions for code completion.

In [None]:
from transformers import GemmaTokenizer, AutoModelForCausalLM
import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

model_id = "google/codegemma-2b"

# Load the tokenizer and the model weights in half precision
tokenizer = GemmaTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16
).to(device)

`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.


## Prompt for code infilling (fill-in-the-middle)

Code completion can be used for infilling inside code editors. CodeGemma was trained for this task using the fill-in-the-middle (FIM) objective, where you provide a prefix and a suffix as context for the completion. The following tokens are used to separate the different parts of the input:

- `<|fim_prefix|>` precedes the context before the completion we want to run.
- `<|fim_suffix|>` precedes the suffix. You must put this token exactly where the cursor would be positioned in an editor, as this is the location that will be completed by the model.
- `<|fim_middle|>` ends the prompt and signals the model to start generating.

In addition to these, there's also `<|file_separator|>`, which is used to provide multi-file contexts.

Make sure not to add any extra spaces or newlines around the tokens, other than those that would naturally occur in the code fragment you want to complete. Here's an example:


In [None]:
prompt = '''\
<|fim_prefix|>import torch
import torch.nn as nn

def bigramLanguageModel(nn.Module):
    """Create a bigram language model."""
    def __init__<|fim_suffix|><|fim_middle|>\
'''
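
The token layout described above can also be assembled programmatically. The following is a minimal sketch: `build_fim_prompt` is a hypothetical helper (not part of `transformers`), and the example prefix and suffix are made up for illustration.

```python
# FIM control tokens as listed in the text above.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str = "") -> str:
    """Return a fill-in-the-middle prompt (hypothetical helper)."""
    # The tokens are concatenated directly: no extra spaces or newlines
    # are added around them, only what the code fragments already contain.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Illustrative fragment: the model would be asked to fill in the return expression.
example = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
print(example)
```

Passing an empty suffix gives the same shape as the prompt used in this notebook, where `<|fim_suffix|>` is immediately followed by `<|fim_middle|>`.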

## Generate code

We tokenize the inputs and pass them through `model.generate`. You can pass a wide variety of arguments and generation strategies; read more about them [here](https://huggingface.co/docs/transformers/generation_strategies).

In [None]:
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=1200)
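
One argument that can be useful for infilling is `eos_token_id`, which `generate` accepts as a single id or a list of ids. The sketch below collects the ids of the control tokens named earlier so that generation could stop as soon as one of them appears. It assumes the tokenizer knows these four tokens; `build_terminators` is a hypothetical helper, and the `max_new_tokens=128` value in the usage comment is arbitrary.

```python
# Control tokens named in the infilling section of this notebook.
FIM_TOKENS = [
    "<|fim_prefix|>",
    "<|fim_suffix|>",
    "<|fim_middle|>",
    "<|file_separator|>",
]


def build_terminators(tokenizer) -> list:
    """Collect token ids that should end generation (hypothetical helper)."""
    # convert_tokens_to_ids maps a list of token strings to a list of ids.
    ids = list(tokenizer.convert_tokens_to_ids(FIM_TOKENS))
    # Keep the regular end-of-sequence token as a terminator too.
    ids.append(tokenizer.eos_token_id)
    return ids


# Usage with the tokenizer, model, and inputs defined in this notebook
# (assumed to be loaded already, so not executed here):
# outputs = model.generate(
#     **inputs,
#     max_new_tokens=128,
#     eos_token_id=build_terminators(tokenizer),
# )
```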

In [None]:
print(tokenizer.decode(outputs[0][prompt_len:]))

(self):
        super().__init__()
        # each token directly reads off the logits for the next token from a lookup table
        self.token_embedding_table = nn.Embedding(vocab_size, embedding_dim)
        self.position_embedding_table = nn.Embedding(seq_len, embedding_dim)

    def forward(self, idx, targets=None):
        B, T = idx.shape

        # idx and targets are both (B,T) tensor of integers
        tok_emb = self.token_embedding_table(idx) # (B,T,C)
        pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,C)
        x = tok_emb + pos_emb # (B,T,C)
        logits = self.lm_head(x) # (B,T,vocab_size)

        if targets is None:
            loss = None
        else:
            B, T, C = logits.shape
            logits = logits.view(B*T, C)
            targets = targets.view(B*T)
            loss = F.cross_entropy(logits, targets)

        return logits, loss<|file_separator|><eos>
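
The decoded completion above ends with `<|file_separator|><eos>`, which an editor integration would not want to insert. A minimal sketch of a post-processing step follows; `trim_completion` is a hypothetical helper, and the marker list is an assumption based on the tokens that appear in this notebook.

```python
# Markers after which the decoded text is no longer part of the completion.
STOP_MARKERS = (
    "<|file_separator|>",
    "<|fim_prefix|>",
    "<|fim_suffix|>",
    "<|fim_middle|>",
    "<eos>",
)


def trim_completion(text: str) -> str:
    """Cut decoded text at the earliest stop marker (hypothetical helper)."""
    cut = len(text)
    for marker in STOP_MARKERS:
        pos = text.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut]


# Last line of the completion printed above, including the trailing markers.
raw = "        return logits, loss<|file_separator|><eos>"
print(trim_completion(raw))
```

Text without any marker is returned unchanged, so the helper is safe to apply to every completion.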


Let's print the prompt and the completion together:

In [None]:
print(tokenizer.decode(outputs[0]))

<bos><|fim_prefix|>import torch
import torch.nn as nn

def bigramLanguageModel(nn.Module):
    """Create a bigram language model."""
    def __init__<|fim_suffix|><|fim_middle|>(self):
        super().__init__()
        # each token directly reads off the logits for the next token from a lookup table
        self.token_embedding_table = nn.Embedding(vocab_size, embedding_dim)
        self.position_embedding_table = nn.Embedding(seq_len, embedding_dim)

    def forward(self, idx, targets=None):
        B, T = idx.shape

        # idx and targets are


Voilà! You now have your own personal code completion assistant.