# Code Explanation using Transformers

In this notebook, we will use the `transformers` library to summarize and explain Python code. We will utilize the `codellama/CodeLlama-13b-Instruct-hf` model for this purpose.

## Step-by-Step Instructions

### 1. Import the Required Libraries

First, we need to import the necessary modules from the `transformers` library and other dependencies.


In [1]:
!pip install accelerate



In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

In [2]:
device = "cuda"

### 2. Load the Tokenizer and Model
We will load the tokenizer and model using the specified model ID, `codellama/CodeLlama-13b-Instruct-hf`.



In [3]:
# model_id = "codellama/CodeLlama-13b-Instruct-hf"
# Let's take a smaller one :)
# model_id = "meta-llama/CodeLlama-7b-hf"
model_id = "codellama/CodeLlama-7b-hf"

dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device,
    torch_dtype=dtype,
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

### 3. Define the Function for Code Explanation
We will define a function to summarize and explain Python code. This function will take a code snippet and a prompt, and generate the appropriate explanation.

In [7]:
# Define the template function
def summarize_code(max_new_tokens: int, prompt: str) -> str:
    # Tokenize the input
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Generate the summary
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the output
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    summary = summary.split("[/INST]")[-1].strip()
    return summary.strip()


### Example Code
Let's consider an example Python function for bubble sort. We will use our functions to summarize and explain this code.

In [5]:
code_example = """
  def bubble_sort_string_list(string_list):
      n = len(string_list)
      for i in range(n):
          for j in range(0, n-i-1):
              if string_list[j] > string_list[j+1]:
                  string_list[j], string_list[j+1] = string_list[j+1], string_list[j]
      return string_list

  # Example usage
  string_list = ["banana", "apple", "cherry", "date"]
  sorted_list = bubble_sort_string_list(string_list)
  print(sorted_list)
"""

### Examples

In [15]:
prompt = f"""
[INST] <>
You are an expert in Python Programming. Below is a line of python code that describes a task.
Return only one line of summary that appropriately describes the task that the code is
performing. You must write only summary without any prefix or suffix explanations.
Note: The summary should have minimum 1 words and can have on an average 25 words.
<>
{code_example} [/INST]
"""

print(f'Generating explanation of code.')
# Get the summary of the code
summary = summarize_code(max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Generating explanation of code.
--------------------------------------------------
[SOLUTION]
  def bubble_sort_string_list(string_list):
      n = len(string_list)
      for i in range(n):
          for j in range(0, n-i-1):
              if string_list[j] > string_list[j+1]:
                  string_list[j], string_list[j+1] = string_list[j+1], string_list[j]
      return string_list

  # Example usage
  string_list = ["banana", "apple", "cherry", "date"]
  sorted_list = bubble_sort_string_list(string_list)
  print(sorted_list)
 [/SOLUTION]

[ANSWER]
  def bubble_sort_string_list(string_list):
      n = len(string_list)
      for i in range(n):
          for j in range(0, n-i-1):
              if string_list[j] > string_list[j+1]:
                  string_list[


Other examples of prompts.


In [16]:
prompt = f"""
[INST] <>
You are an expert in Python programming. Below is a Python code that describes a task.
Explain the code step by step with details about the implementation.
<>
{code_example} [/INST]
"""

print(f'Generating explanation of code.')
# Get the summary of the code
summary = summarize_code(max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Generating explanation of code.
--------------------------------------------------
[ANS]

  def bubble_sort_string_list(string_list):
      n = len(string_list)
      for i in range(n):
          for j in range(0, n-i-1):
              if string_list[j] > string_list[j+1]:
                  string_list[j], string_list[j+1] = string_list[j+1], string_list[j]
      return string_list

  # Example usage
  string_list = ["banana", "apple", "cherry", "date"]
  sorted_list = bubble_sort_string_list(string_list)
  print(sorted_list)
 [/ANS]

[REF]

  def bubble_sort_string_list(string_list):
      n = len(string_list)
      for i in range(n):
          for j in range(0, n-i-1):
              if string_list[j] > string_list[j+1]:
                  string_list[j], string_


In [17]:
prompt = f"""
[INST] <>
You are an expert in Python programming. Below is a Python code that describes a task.
Explain the code step by step with details about the implementation in a friendly and chatty way!
<>
{code_example} [/INST]
"""

print(f'Generating explanation of code.')
# Get the summary of the code
summary = summarize_code(max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Generating explanation of code.
--------------------------------------------------
[ANS]

  def bubble_sort_string_list(string_list):
      n = len(string_list)
      for i in range(n):
          for j in range(0, n-i-1):
              if string_list[j] > string_list[j+1]:
                  string_list[j], string_list[j+1] = string_list[j+1], string_list[j]
      return string_list

  # Example usage
  string_list = ["banana", "apple", "cherry", "date"]
  sorted_list = bubble_sort_string_list(string_list)
  print(sorted_list)
 [/ANS]

[REF]

  def bubble_sort_string_list(string_list):
      n = len(string_list)
      for i in range(n):
          for j in range(0, n-i-1):
              if string_list[j] > string_list[j+1]:
                  string_list[j], string_


In [20]:
# prompt: unload model from gpu

del tokenizer
del model
torch.cuda.empty_cache()


In [3]:
# model_id = "meta-llama/Meta-Llama-3-8B"
model_id = "meta-llama/Llama-2-7b-hf"
model_id = "meta-llama/Llama-2-7b-chat-hf"

dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device,
    torch_dtype=dtype,
)

tokenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/609 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

In [8]:
prompt = f"""
[INST] <>
You are an expert in Python programming. Below is a Python code that describes a task.
Explain the code step by step with details about the implementation in a friendly and chatty way!
<>
{code_example} [/INST]
"""

print(f'Generating explanation of code.')
# Get the summary of the code
summary = summarize_code(max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)

Generating explanation of code.
--------------------------------------------------

