# Code Summarization and Explanation using Transformers

In this notebook, we use the `transformers` library to summarize and explain Python code with the `codellama/CodeLlama-13b-Instruct-hf` model.

## Step-by-Step Instructions

### 1. Import the Required Libraries

First, we need to import the necessary modules from the `transformers` library and other dependencies.


In [None]:
# !pip install -U jupyter ipywidgets

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch




### 2. Load the Tokenizer and Model
We will load the tokenizer and model using the specified model ID, `codellama/CodeLlama-13b-Instruct-hf`.



In [2]:
model_id = "codellama/CodeLlama-13b-Instruct-hf"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu",
    torch_dtype=dtype,
)
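Loading a model of this size on CPU needs a lot of RAM. As a rough back-of-the-envelope sketch, assuming about 13 billion parameters (inferred from the model name, not an exact count) and 2 bytes per parameter for `bfloat16`, the weights alone take roughly 26 GB. The helper name `estimate_weight_memory_gb` is illustrative only.

```python
# Rough estimate of the memory needed just for the model weights.
# Assumption: ~13e9 parameters, inferred from the "13b" in the model name.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def estimate_weight_memory_gb(num_params: float, dtype_name: str) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype_name] / 1e9


print(estimate_weight_memory_gb(13e9, "bfloat16"))  # 26.0
print(estimate_weight_memory_gb(13e9, "float32"))   # 52.0
```

Activations and the generation cache add to this figure, so it is best read as a lower bound.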




### 3. Define the Function for Code Explanation
We will define a function to summarize and explain Python code. The function takes a code snippet, a fully formatted prompt that already embeds that snippet, and a limit on the number of generated tokens, and returns the model's response.

In [2]:
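# Define the template function
def summarize_code(code: str, max_new_tokens: int, prompt: str) -> str:
    """Generate a summary or explanation from `prompt`.

    `prompt` must already contain the code. `code` is kept so that call
    sites stay readable; it is not used inside the function.
    """
    # Tokenize the prompt and place the tensors on the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate the response
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the output and keep only the text after the instruction block
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return summary.split("[/INST]")[-1].strip()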
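For a causal language model, `generate` returns the prompt tokens followed by the new tokens, so the decoded text contains the prompt as well. That is why the function keeps only what follows the last `[/INST]` marker. A minimal sketch of that post-processing step on a made-up decoded string (the helper name `extract_response` and the sample text are illustrative, not real model output):

```python
def extract_response(decoded: str, marker: str = "[/INST]") -> str:
    """Keep only the text after the last instruction marker."""
    return decoded.split(marker)[-1].strip()


# Hypothetical decoded text: the prompt, then the generated continuation.
decoded = "[INST] Summarize this code: print('hi') [/INST]  Prints a greeting. "
print(extract_response(decoded))  # Prints a greeting.
```

If the marker is missing, the whole decoded text is returned with surrounding whitespace removed.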


### 4. Example Code
Let's consider an example Python function for bubble sort. We will use our function to summarize and explain this code.

In [3]:
# Code example

def bubble_sort_string_list(string_list):
    n = len(string_list)
    for i in range(n):
        for j in range(0, n-i-1):
            if string_list[j] > string_list[j+1]:
                string_list[j], string_list[j+1] = string_list[j+1], string_list[j]
    return string_list

# Example usage
string_list = ["banana", "apple", "cherry", "date"]
sorted_list = bubble_sort_string_list(string_list)
print(sorted_list)


['apple', 'banana', 'cherry', 'date']


### 5. Extracting Code from Jupyter Notebook
We will use a utility function, `get_cell_with_comment` from the local `utils` module, to extract a specific code cell from a Jupyter Notebook and then generate explanations for it.



In [4]:
from utils import get_cell_with_comment

# Define the path to your notebook
notebook_path = 'Code Explainer.ipynb'

# Get the cell containing the specific comment
cell = get_cell_with_comment(notebook_path)

if cell:
    print("Found the cell:")
    print(cell['source'])
else:
    print("No cell found with the specified comment.")

Found the cell:
# Code example

def bubble_sort_string_list(string_list):
    n = len(string_list)
    for i in range(n):
        for j in range(0, n-i-1):
            if string_list[j] > string_list[j+1]:
                string_list[j], string_list[j+1] = string_list[j+1], string_list[j]
    return string_list

# Example usage
string_list = ["banana", "apple", "cherry", "date"]
sorted_list = bubble_sort_string_list(string_list)
print(sorted_list)
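The `utils` module is not shown in this notebook. As a rough sketch only, a helper like this could be written with the standard `json` module, assuming the notebook is a standard `.ipynb` JSON file and the marker comment is `# Code example`. The name `find_cell_with_comment`, its `comment` parameter, and its default value are assumptions for illustration, not the actual `utils` implementation.

```python
import json


def find_cell_with_comment(notebook_path, comment="# Code example"):
    """Return the first code cell whose source contains `comment`, else None.

    Hypothetical stand-in for the helper in `utils`; the real one may differ.
    """
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)

    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # On disk, `source` may be a list of lines or a single string.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if comment in source:
            # Return a copy with `source` normalized to one string.
            return {**cell, "source": source}
    return None
```

Restricting the search to code cells avoids matching a Markdown cell that happens to mention the same comment.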



### 6. Examples

In [7]:
prompt = f"""
[INST] <<SYS>>
You are an expert in Python programming. Below is a piece of Python code that performs a task.
Return only a one-line summary that appropriately describes the task that the code is
performing. Write only the summary, without any prefix or suffix explanations.
Note: The summary should have at least 1 word and about 25 words on average.
<</SYS>>
{cell['source']} [/INST]
"""
    
print(f'Generating explanation of code in cell {cell}.')
# Get the summary of the code
summary = summarize_code(code=cell['source'], max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)


Generating explanation of code in cell {'cell_type': 'code', 'execution_count': 19, 'id': '565747b7-0117-463b-b022-7f307751e5db', 'metadata': {}, 'outputs': [{'name': 'stdout', 'output_type': 'stream', 'text': "['apple', 'banana', 'cherry', 'date']\n"}], 'source': '# Code example\n\ndef bubble_sort_string_list(string_list):\n    n = len(string_list)\n    for i in range(n):\n        for j in range(0, n-i-1):\n            if string_list[j] > string_list[j+1]:\n                string_list[j], string_list[j+1] = string_list[j+1], string_list[j]\n    return string_list\n\n# Example usage\nstring_list = ["banana", "apple", "cherry", "date"]\nsorted_list = bubble_sort_string_list(string_list)\nprint(sorted_list)\n'}.


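NameError: name 'tokenizer' is not defined

This `NameError` means that the cell in step 2, which loads the tokenizer and model, was not run successfully in the current kernel session. Run that cell first, then rerun this one.

The following cells show other example prompts.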


In [None]:
prompt = f"""
[INST] <<SYS>>
You are an expert in Python programming. Below is Python code that performs a task.
Explain the code step by step with details about the implementation.
<</SYS>>
{cell['source']} [/INST]
"""
    
print(f'Generating explanation of code in cell {cell}.')
# Get the step-by-step explanation of the code
summary = summarize_code(code=cell['source'], max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)

In [None]:
prompt = f"""
[INST] <<SYS>>
You are an expert in Python programming. Below is Python code that performs a task.
Explain the code step by step with details about the implementation in a friendly and chatty way!
<</SYS>>
{cell['source']} [/INST]
"""
    
print(f'Generating explanation of code in cell {cell}.')
# Get the friendly, step-by-step explanation of the code
summary = summarize_code(code=cell['source'], max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)
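The three prompts above differ only in their system instruction, so they can be generated from one template. A minimal sketch, assuming the Llama 2 chat layout (`[INST] <<SYS>> ... <</SYS>> ... [/INST]`) that CodeLlama Instruct models follow. `build_prompt` and `SYSTEM_PROMPTS` are hypothetical names introduced here, the instructions are shortened paraphrases, and the sample code string stands in for `cell['source']`.

```python
def build_prompt(system: str, code: str) -> str:
    """Wrap a system instruction and a code snippet in the instruction format."""
    return f"[INST] <<SYS>>\n{system.strip()}\n<</SYS>>\n\n{code.strip()} [/INST]"


# Hypothetical collection of system instructions, one per explanation style.
SYSTEM_PROMPTS = {
    "summary": "You are an expert in Python programming. "
               "Return a one-line summary of the code.",
    "detailed": "You are an expert in Python programming. "
                "Explain the code step by step.",
    "chatty": "You are an expert in Python programming. "
              "Explain the code step by step in a friendly and chatty way!",
}

code = "print(sorted(['banana', 'apple']))"  # stand-in for cell['source']
prompts = {name: build_prompt(system, code) for name, system in SYSTEM_PROMPTS.items()}
print(prompts["summary"])
```

Each entry can then be passed to the function defined earlier, for example `summarize_code(code=code, max_new_tokens=256, prompt=prompts["chatty"])`.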