# Code Summarization and Explanation using Transformers

This notebook uses the Hugging Face `transformers` library to summarize and explain Python code with the `codellama/CodeLlama-13b-Instruct-hf` model.

## Step-by-Step Instructions

### 1. Import the Required Libraries

First, we install the dependencies, then import the modules we need from the `transformers` library along with `torch`.


In [None]:
# !pip install -U jupyter ipywidgets
!pip install torch
!pip install transformers
!pip install accelerate

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

In [3]:
# Optional sanity check: uncomment the lines below to confirm that PyTorch is installed correctly by creating a randomly initialized tensor.
# import torch
# x = torch.rand(5, 3)
# print(x)

### 2. Load the Tokenizer and Model
We will load the tokenizer and model using the specified model ID, `codellama/CodeLlama-13b-Instruct-hf`.



In [3]:
model_id = "codellama/CodeLlama-13b-Instruct-hf"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu",
    torch_dtype=dtype,
)
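Before loading the model, it can help to estimate how much memory the weights alone will need. The sketch below is a back-of-the-envelope calculation, assuming roughly 13 billion parameters (as the "13b" in the model ID suggests) and ignoring activations and other overhead. The helper name `estimate_weight_memory_gib` is hypothetical and not part of `transformers`.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: about 13 billion parameters, inferred from "13b" in the model ID.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def estimate_weight_memory_gib(num_params: float, dtype_name: str) -> float:
    """Return the approximate size of the weights in GiB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype_name] / 1024**3


for name in ("float32", "bfloat16"):
    size = estimate_weight_memory_gib(13e9, name)
    print(f"{name}: ~{size:.0f} GiB")
```

Under these assumptions, `bfloat16` halves the footprint compared with `float32`, which is one reason to pass `torch_dtype=dtype` when loading on a machine with limited RAM.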

### 3. Define the Function for Code Explanation
We will define a function that sends a prompt to the model and returns its reply. The function takes the maximum number of new tokens to generate and a prompt that already contains the code to summarize or explain.

In [5]:
# Generate the model's reply for a fully formatted prompt
def summarize_code(max_new_tokens: int, prompt: str) -> str:
    # Tokenize the input
    print('[summarize_code] Your Prompt:', prompt)
    inputs = tokenizer(prompt, return_tensors="pt")
    print('[summarize_code] Tokenized Inputs:', inputs)
    # Generate the summary
    # Passing pad_token_id explicitly suppresses the "Setting pad_token_id to eos_token_id" warning, see
    # https://stackoverflow.com/questions/69609401/suppress-huggingface-logging-warning-setting-pad-token-id-to-eos-token-id
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, pad_token_id=tokenizer.eos_token_id)

    # Decode the output, which contains the prompt followed by the generated text
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text after the last [/INST] marker, i.e. the model's reply
    summary = summary.split("[/INST]")[-1].strip()
    return summary
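The post-processing step in `summarize_code` works because a causal language model returns the prompt tokens followed by the newly generated ones, so the reply is whatever comes after the last `[/INST]` marker. The sketch below isolates that step so it can be tried without loading the model. The helper name `extract_reply` and the sample string are made up for illustration.

```python
def extract_reply(decoded: str, marker: str = "[/INST]") -> str:
    """Return the text after the last occurrence of `marker`, stripped.

    If the marker is absent, the whole (stripped) string is returned.
    """
    return decoded.split(marker)[-1].strip()


# Hypothetical decoded output: the prompt followed by the model's reply.
decoded = "[INST] Summarize: print('hi') [/INST]  Prints a greeting. "
print(extract_reply(decoded))
```

Splitting on the last marker rather than the first keeps the logic correct even if the code being explained happens to contain the string `[/INST]` itself.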


### 4. Example Code
Let's consider an example Python function that sorts a list of strings with bubble sort. This is the kind of code we want the model to summarize and explain.

In [6]:
# Code example

def bubble_sort_string_list(string_list):
    n = len(string_list)
    for i in range(n):
        for j in range(0, n-i-1):
            if string_list[j] > string_list[j+1]:
                string_list[j], string_list[j+1] = string_list[j+1], string_list[j]
    return string_list

# Example usage
string_list = ["banana", "apple", "cherry", "date"]
sorted_list = bubble_sort_string_list(string_list)
print(sorted_list)


['apple', 'banana', 'cherry', 'date']
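To hand a function like this to the model, its source has to be wrapped in an instruction prompt. The sketch below assumes the Llama 2 style `[INST]` / `<<SYS>>` layout that Code Llama Instruct models are documented to use. `build_prompt` is a hypothetical helper, not part of this notebook or of `transformers`, and the final call is commented out because it needs the loaded model.

```python
def build_prompt(system_message: str, code: str) -> str:
    """Wrap a system message and a code snippet in an instruction prompt.

    Assumption: the model expects the Llama 2 style chat layout.
    """
    return f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{code} [/INST]"


# The bubble sort function from the cell above, kept as plain text.
bubble_sort_source = '''def bubble_sort_string_list(string_list):
    n = len(string_list)
    for i in range(n):
        for j in range(0, n-i-1):
            if string_list[j] > string_list[j+1]:
                string_list[j], string_list[j+1] = string_list[j+1], string_list[j]
    return string_list
'''

prompt = build_prompt(
    "You are an expert in Python programming. Summarize the code in one line.",
    bubble_sort_source,
)
print(prompt)

# With the model loaded, the prompt could then be passed on:
# summary = summarize_code(max_new_tokens=64, prompt=prompt)
```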


### 5. Extracting Code from Jupyter Notebook
We will download a Jupyter Notebook as JSON with `requests`, pick one cell by its index, and then generate explanations for it.



In [9]:
import requests
raw_ipynb_url = 'https://raw.githubusercontent.com/C2DH/ai-notebooks-summer-workshop/master/examples/AI_Workshop_Semantic_Search.ipynb'
ipynb_data = requests.get(raw_ipynb_url).json()
# Select the cell to explain by its index in the notebook
cell = ipynb_data['cells'][17]
# 'source' may be stored as a list of lines, so join it before printing
print(''.join(cell['source']))
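In the notebook JSON format, a cell's `source` can be either a single string or a list of line strings, and a hard-coded index can land on a markdown cell. The sketch below shows one way to handle both cases. It runs on a small hand-written notebook dictionary instead of the downloaded file, and the helper names `cell_source_as_text` and `code_cells` are hypothetical.

```python
def cell_source_as_text(cell: dict) -> str:
    """Return a cell's source as one string, whether stored as str or list."""
    source = cell.get("source", "")
    if isinstance(source, str):
        return source
    return "".join(source)


def code_cells(notebook: dict) -> list:
    """Return only the cells whose cell_type is 'code'."""
    return [c for c in notebook.get("cells", []) if c.get("cell_type") == "code"]


# Tiny stand-in for a parsed .ipynb file (made-up content).
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n"]},
        {"cell_type": "code", "source": ["import math\n", "print(math.pi)"]},
    ]
}

first_code_cell = code_cells(notebook)[0]
print(cell_source_as_text(first_code_cell))
```

Joining the lines before building a prompt keeps the code readable for the model, instead of showing it the Python representation of a list.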

### 6. Examples

In [None]:
prompt = f"""
[INST] <<SYS>>
You are an expert in Python programming. Below is a piece of Python code that performs a task.
Return only a one-line summary that appropriately describes the task that the code is
performing. Write only the summary, without any prefix or suffix explanations.
Note: The summary should have at least 1 word and about 25 words on average.
<</SYS>>
{''.join(cell['source'])} [/INST]
"""

print('Generating a summary of the code in the selected cell.')
# Get the summary of the code
summary = summarize_code(max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)


The same cell can be explained with other prompts, for example a step-by-step explanation or a more conversational tone.


In [None]:
prompt = f"""
[INST] <<SYS>>
You are an expert in Python programming. Below is a piece of Python code that performs a task.
Explain the code step by step with details about the implementation.
<</SYS>>
{''.join(cell['source'])} [/INST]
"""

print('Generating an explanation of the code in the selected cell.')
# Get the explanation of the code
summary = summarize_code(max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)

In [None]:
prompt = f"""
[INST] <<SYS>>
You are an expert in Python programming. Below is a piece of Python code that performs a task.
Explain the code step by step with details about the implementation in a friendly and chatty way!
<</SYS>>
{''.join(cell['source'])} [/INST]
"""

print('Generating an explanation of the code in the selected cell.')
# Get the explanation of the code
summary = summarize_code(max_new_tokens=256, prompt=prompt)

print('-'*50)
print(summary)