<a href="https://colab.research.google.com/github/ak2742/code-completion-using-google-codegemma/blob/main/CodeGemma_Gradio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CodeGemma 2B & 7B

CodeGemma is a family of code-specialist LLMs from Google, built on the pre-trained Gemma 2B and 7B checkpoints.

The base Gemma 2B and 7B models were further trained on 500 billion tokens of primarily English-language data, mathematics, and code.

They leverage the natural language capabilities of their ancestors, improve on logical and mathematical reasoning, and are suitable for code completion and generation.

*Note: make sure you're on a GPU runtime in Colab.*

## Set up the inference environment

In [None]:
!pip install -q --upgrade transformers
!pip install -q --upgrade gradio

## Load the tokenizer and the model

CodeGemma 2B was trained exclusively on the Code Infilling task and is meant for fast code completion and generation, especially in settings where latency and/or privacy are crucial.

The CodeGemma 7B training mix includes code infilling data (80%) and natural language. It can be used for code completion, as well as code and language understanding and generation.

CodeGemma 7B Instruct was fine-tuned for instruction following on top of CodeGemma 7B. It’s meant for conversational use, especially around code, programming, or mathematical reasoning topics. It’s not as powerful as the other versions for code completion.
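
The guidance above can be condensed into a small lookup. This is a minimal sketch: the `pick_model_id` helper and its use-case labels are hypothetical, and the two 7B identifiers are assumed to follow the same Hugging Face naming pattern as the `google/codegemma-2b` ID used in this notebook, so verify them on the Hub before loading.

```python
# Hypothetical lookup from use case to checkpoint ID (labels are illustrative).
MODEL_BY_USE_CASE = {
    "fast_completion": "google/codegemma-2b",            # ID used in this notebook
    "completion_and_generation": "google/codegemma-7b",  # assumed ID
    "chat": "google/codegemma-7b-it",                    # assumed ID
}


def pick_model_id(use_case: str) -> str:
    """Return the checkpoint ID for a use case, or raise ValueError if unknown."""
    try:
        return MODEL_BY_USE_CASE[use_case]
    except KeyError:
        known = ", ".join(sorted(MODEL_BY_USE_CASE))
        raise ValueError(f"Unknown use case {use_case!r}; expected one of: {known}") from None


print(pick_model_id("fast_completion"))
```

The returned string could then be assigned to `model_id` before loading the tokenizer and model.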

In [None]:
from transformers import GemmaTokenizer, AutoModelForCausalLM
import torch

# Use the first GPU when available; otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

model_id = "google/codegemma-2b"

tokenizer = GemmaTokenizer.from_pretrained(model_id)
# Load the weights in half precision to reduce memory use.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to(device)

## Prompt guide for code infilling (fill-in-the-middle)

Code completion can be used for infilling inside code editors. CodeGemma was trained for this task using the fill-in-the-middle (FIM) objective, where you provide a prefix and a suffix as context for the completion. The following tokens are used to separate the different parts of the input:

- `<|fim_prefix|>` precedes the context before the completion we want to run.
- `<|fim_suffix|>` precedes the suffix. You must put this token exactly where the cursor would be positioned in an editor, as this is the location that will be completed by the model.
- `<|fim_middle|>` is the prompt that invites the model to run the generation.

In addition to these, there's also `<|file_separator|>`, which is used to provide multi-file contexts.

Make sure not to add any extra spaces or newlines around the tokens, other than those that would naturally occur in the code fragment you want to complete.
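
As an illustration of the token layout described above, the following sketch assembles an FIM prompt from the code before and after the cursor. The helper name `build_fim_prompt` and the sample code fragment are hypothetical; only the token strings come from the text above.

```python
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str = "") -> str:
    """Join prefix and suffix with the FIM tokens, adding no extra whitespace."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# The cursor sits at the end of `prefix`; `suffix` is the code after the cursor.
prompt = build_fim_prompt(
    prefix="import datetime\ndef calculate_age(birth_year):\n    current_year = datetime.date.today().year\n    ",
    suffix="\n    return age",
)
print(prompt)
```

Note that the only whitespace in the prompt is the indentation and newlines that belong to the code itself.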


## Code generation function

We tokenize the inputs and pass them through `model.generate`. It accepts a wide variety of arguments and decoding strategies; read more about them in the [generation strategies guide](https://huggingface.co/docs/transformers/generation_strategies).

In [None]:
tokenizer.add_special_tokens({"additional_special_tokens": ["<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>", "<|file_separator|>"]})

def generate(prompt, return_full_text=False, max_new_tokens=1200):
    """Generate a completion for `prompt` and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[-1]
    print(f"Input Tokens: {prompt_len}")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)[0]
    if not return_full_text:
        # Drop the prompt tokens so that only the newly generated text remains.
        output_ids = output_ids[prompt_len:]
    return tokenizer.decode(output_ids, skip_special_tokens=True)
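
With an FIM prompt, a model may keep generating after it has filled the gap, for example by emitting another FIM token and continuing. One way to handle this is to cut the generated token IDs at the first terminator. The sketch below is an illustration under stated assumptions, not part of this notebook's pipeline: `trim_at_terminator` is a hypothetical helper, and the IDs in the usage lines are made-up placeholders rather than real vocabulary IDs.

```python
from typing import Iterable, List, Set


def trim_at_terminator(token_ids: Iterable[int], terminator_ids: Set[int]) -> List[int]:
    """Keep token IDs up to, but not including, the first terminator ID."""
    kept: List[int] = []
    for token_id in token_ids:
        if token_id in terminator_ids:
            break
        kept.append(token_id)
    return kept


# Placeholder IDs for illustration only; 900 and 901 stand in for terminators.
generated = [11, 12, 13, 900, 14, 15]
terminators = {900, 901}
print(trim_at_terminator(generated, terminators))  # [11, 12, 13]
```

In this notebook, the terminator IDs could be obtained with `tokenizer.convert_tokens_to_ids` for the four FIM tokens, plus `tokenizer.eos_token_id`, and the trimming applied to the generated IDs (converted to a plain list with `.tolist()`) before decoding.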

In [None]:
#@title Add Gradio UI

import gradio as gr

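def gradio_fn(code, return_full_text=True, max_new_tokens=1200):
    # Wrap the input as an FIM prompt with an empty suffix, so the model
    # continues from the end of the provided code.
    msg = f"<|fim_prefix|>{code}<|fim_suffix|><|fim_middle|>"
    # The slider may deliver a float; generate expects an integer token budget.
    return generate(msg, return_full_text, int(max_new_tokens))

gr.Interface(
    fn=gradio_fn,
    inputs=[
        gr.Textbox(lines=4, label="Input Code"),
        gr.Checkbox(label="Return Full Text", value=True),
        # Start at 100: a budget of 0 new tokens leaves nothing to generate.
        gr.Slider(minimum=100, maximum=2000, step=100, label="Max New Tokens", value=1200),
    ],
    outputs=gr.Textbox(lines=4, label="Output Code"),
    theme="soft",
).launch(share=True, debug=True)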