1: Function Summarization

Objective:
The objective of this task is to analyze and summarize the purpose and functionality of each function within the code repository using a Language Model (LM).
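Summaries are easier to check when each one is tied to the function's name, its line range, and any docstring it already has. Below is a minimal sketch that uses only the standard-library `ast` module. `FunctionRecord` and `extract_function_records` are hypothetical names and are not part of the notebook. `end_lineno` assumes Python 3.8 or later.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FunctionRecord:
    """One function or method found in a source file (hypothetical helper type)."""
    name: str
    lineno: int
    end_lineno: int
    docstring: Optional[str]
    source: str


def extract_function_records(code: str) -> List[FunctionRecord]:
    """Collect every sync and async function (including methods) in `code`."""
    tree = ast.parse(code)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append(
                FunctionRecord(
                    name=node.name,
                    lineno=node.lineno,
                    end_lineno=node.end_lineno,  # requires Python 3.8+
                    docstring=ast.get_docstring(node),
                    source=ast.get_source_segment(code, node) or "",
                )
            )
    # ast.walk visits nodes breadth-first, so sort to restore file order
    return sorted(records, key=lambda record: record.lineno)


# Usage on a small illustrative snippet
sample = '''
def create_connection(db_file):
    """create a database connection"""
    return db_file

class Bot:
    async def reply(self, text):
        return text.upper()
'''
for record in extract_function_records(sample):
    print(record.name, record.lineno, record.end_lineno, record.docstring)
# create_connection 2 4 create a database connection
# reply 7 8 None
```

The `source` field can go to the summarizer. The docstring, when present, is a useful reference to compare each generated summary against.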

2: Search Mechanism Development

Objective:
The objective of this task is to develop a search mechanism capable of navigating through the code repository to find specific elements, such as identifying where a particular data type (e.g., data type 3) is handled or locating the implementation of a particular function.
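Questions such as "where is data type 3 handled?" or "where is function X implemented?" can often be answered without a language model by walking the syntax tree. The sketch below assumes the data type is referred to by a Python name. The constant `DATA_TYPE_3` is a hypothetical stand-in, and `find_definitions` and `find_references` are illustrative helpers. They only see names that appear literally in the code, not values passed around at runtime.

```python
import ast
from typing import List, Tuple


def find_definitions(code: str, name: str) -> List[int]:
    """Line numbers where a function, method, or class called `name` is defined."""
    tree = ast.parse(code)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name == name
    )


def find_references(code: str, identifier: str) -> List[Tuple[str, int]]:
    """(enclosing function, line) pairs where `identifier` is used as a bare
    name (`x`) or an attribute (`obj.x`); module-level uses report '<module>'."""
    tree = ast.parse(code)
    hits = []

    def visit(node: ast.AST, scope: str) -> None:
        for child in ast.iter_child_nodes(node):
            # Entering a function body changes the reported scope
            child_scope = scope
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                child_scope = child.name
            is_name = isinstance(child, ast.Name) and child.id == identifier
            is_attr = isinstance(child, ast.Attribute) and child.attr == identifier
            if is_name or is_attr:
                hits.append((child_scope, child.lineno))
            visit(child, child_scope)

    visit(tree, "<module>")
    return sorted(hits, key=lambda hit: hit[1])


# Usage: DATA_TYPE_3 is a hypothetical constant standing in for "data type 3"
sample = '''
DATA_TYPE_3 = 3

def handle(record):
    if record.kind == DATA_TYPE_3:
        return "type 3"

def other(record):
    return record.kind
'''
print(find_definitions(sample, "handle"))      # [4]
print(find_references(sample, "DATA_TYPE_3"))  # [('<module>', 2), ('handle', 5)]
print(find_references(sample, "kind"))         # [('handle', 5), ('other', 9)]
```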

In [None]:
%pip install GitPython

Collecting GitPython
  Downloading GitPython-3.1.42-py3-none-any.whl (195 kB)
Collecting gitdb<5,>=4.0.1 (from GitPython)
  Downloading gitdb-4.0.11-py3-none-any.whl (62 kB)
Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->GitPython)
  Downloading smmap-5.0.1-py3-none-any.whl (24 kB)
Installing collected packages: smmap, gitdb, GitPython
Successfully installed GitPython-3.1.42 gitdb-4.0.11 smmap-5.0.1


In [None]:
import os
import shutil
import tempfile
import ast
from git import Repo
from transformers import pipeline

# Initialize the summarization pipeline
summarization_pipeline = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# Clone the GitHub repository into a temporary directory
def clone_repo_to_temp(repo_url):
    temp_dir = tempfile.mkdtemp()
    Repo.clone_from(repo_url, temp_dir)
    return temp_dir

# Parse the Python code and extract function definitions (sync and async, including methods)
def extract_functions(code):
    tree = ast.parse(code)
    return [node for node in ast.walk(tree) if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]

# Generate a summary for a function's purpose and functionality
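def summarize_function(function_code):
    # truncation=True cuts inputs longer than the model's 1024-token limit
    # instead of failing on very long functions
    summary = summarization_pipeline(function_code, max_length=50, min_length=10, do_sample=False, truncation=True)[0]['summary_text'].strip()
    return summary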

# Analyze the repository
def analyze_repo(repo_url):
    temp_dir = clone_repo_to_temp(repo_url)
    file_count = 0
    function_summaries = []
    try:
        # Count the number of files
        for root, dirs, files in os.walk(temp_dir):
            for file in files:
                if file.endswith('.py'):
                    file_count += 1
        print(f"Analyzing {file_count} files...")
        for root, dirs, files in os.walk(temp_dir):
            for file in files:
                if file.endswith('.py'):
                    with open(os.path.join(root, file), 'r', encoding='utf-8') as f:
                        code = f.read()
                        functions = extract_functions(code)
                        for function in functions:
                            function_code = ast.get_source_segment(code, function)
                            function_summary = summarize_function(function_code)
                            function_summaries.append((os.path.join(root, file), function.name, function_summary))
        # Print the summaries for each function
        for file_path, function_name, function_summary in function_summaries:
            print(f"File: {file_path}")
            print(f"Function: {function_name}")
            print(f"Summary: {function_summary}\n")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Clean up the temporary directory
        shutil.rmtree(temp_dir)
    print(f"Analyzed {file_count} files.")

# Example usage
repo_url = 'https://github.com/Shivaknt/AI_chatbot'
analyze_repo(repo_url)



Analyzing 2 files...
File: /tmp/tmpwbdtm9w9/langchain_helper.py
Function: create_vector_db
Summary: # Load data from FAQ sheet and create a FAISS instance for vector database from 'data' # create_vector_db() # Save vector database locally . # Save database locally using FAISS .

File: /tmp/tmpwbdtm9w9/langchain_helper.py
Function: get_qa_chain
Summary: The get_qa_chain function is used to generate a chain of answers to a question . The answer is based on the question and the context of the question . If the answer is not found in the context, kindly state "I

Analyzed 2 files.
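In `analyze_repo`, the whole loop sits inside a single `try`. If `ast.parse` rejects one file (for example, a Python 2 script), the run stops and no summaries are printed, because the printing loop comes after the collection loop. The tree is also walked twice just to count files. One possible restructuring walks the tree once and skips bad files individually. It is sketched here with a hypothetical generator, `iter_python_functions`. Each yielded source string could then be passed to `summarize_function`.

```python
import ast
import os
import tempfile
from typing import Iterator, Tuple


def iter_python_functions(root: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (file_path, function_name, function_source) for each function in
    every .py file under `root`, skipping files that cannot be read or parsed."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that never hold project source (edits os.walk in place)
        dirnames[:] = [d for d in dirnames if d not in {".git", ".venv", "venv"}]
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8") as f:
                    code = f.read()
                tree = ast.parse(code)
            except (SyntaxError, ValueError) as exc:
                # UnicodeDecodeError is a ValueError subclass; report and move on
                print(f"Skipping {path}: {exc}")
                continue
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    yield path, node.name, ast.get_source_segment(code, node)


# Usage on a throwaway directory with one valid file and one Python 2 file
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "good.py"), "w", encoding="utf-8") as f:
        f.write("def ok():\n    return 1\n")
    with open(os.path.join(tmp, "legacy.py"), "w", encoding="utf-8") as f:
        f.write("print 'python 2 syntax'\n")
    found = [(os.path.basename(p), name) for p, name, _ in iter_python_functions(tmp)]

print(found)  # [('good.py', 'ok')]
```

Because it is a generator, summaries can be computed and printed as each function is found. A failure in one file no longer discards the results from the others.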


Next, print each function's full source code (which begins with its name) alongside its summary.

In [None]:
# Clone the GitHub repository into a temporary directory
def clone_repo_to_temp(repo_url):
    temp_dir = tempfile.mkdtemp()
    Repo.clone_from(repo_url, temp_dir)
    return temp_dir

# Parse the Python code and extract function definitions (sync and async, including methods)
def extract_functions(code):
    tree = ast.parse(code)
    return [node for node in ast.walk(tree) if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]

# Generate a summary for a function's purpose and functionality
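def summarize_function(function_code):
    # truncation=True cuts inputs longer than the model's 1024-token limit
    # instead of failing on very long functions
    summary = summarization_pipeline(function_code, max_length=100, min_length=10, do_sample=False, truncation=True)[0]['summary_text'].strip()
    return summary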

# Analyze the repository
def analyze_repo(repo_url):
    temp_dir = clone_repo_to_temp(repo_url)
    file_count = 0
    function_details = []
    try:
        # Count the number of files
        for root, dirs, files in os.walk(temp_dir):
            for file in files:
                if file.endswith('.py'):
                    file_count += 1
        print(f"Analyzing {file_count} files...")
        for root, dirs, files in os.walk(temp_dir):
            for file in files:
                if file.endswith('.py'):
                    with open(os.path.join(root, file), 'r', encoding='utf-8') as f:
                        code = f.read()
                        functions = extract_functions(code)
                        for function in functions:
                            function_code = ast.get_source_segment(code, function)
                            function_summary = summarize_function(function_code)
                            function_details.append((os.path.join(root, file), function_code, function_summary))
        # Print the summaries for each function
        for file_path, function_code, function_summary in function_details:
            print(f"File: {file_path}")
            print(f"Function Code:\n{function_code}")
            print(f"Summary: {function_summary}\n")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Clean up the temporary directory
        shutil.rmtree(temp_dir)
    print(f"Analyzed {file_count} files.")

# Example usage
repo_url = 'https://github.com/the-vishal/Fb_Automated_Birthday_Wisher'
analyze_repo(repo_url)


Your max_length is set to 100, but your input_length is only 86. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=43)


Analyzing 2 files...


Your max_length is set to 100, but your input_length is only 86. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=43)
Your max_length is set to 100, but your input_length is only 61. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=30)
Your max_length is set to 100, but your input_length is only 18. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=9)


File: /tmp/tmp4nu_x2t1/Background.py
Function Code:
def create_connection(db_file):
    """ create a database connection to a SQLite database """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
    except Error as e:
        print(e)

    return conn
Summary: Def create_connection(db_file): create a database connection to a database database . Create_connection() is created by creating a new database connection .

File: /tmp/tmp4nu_x2t1/Background.py
Function Code:
def insert_birthday(birthday = ""):
    
    """ function to insert new birthdays in the database """
    conn = create_connection(db_file)
    with conn:
	    day = datetime.datetime.now()
	    day = now.strftime("%d-%m")
	    names = "Testname"
	    birthday = (names, day)
	    sql = ''' INSERT INTO birthdays(name, birthday)VALUES(?,?)'''
	    cur = conn.cursor()
	    cur.execute(sql, birthday)
	    return cur.lastrowid
Summary: Def insert_birthday(birthday) is a function to insert new birthdays in the dat
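The warnings above appear because a fixed `max_length` exceeds the length of the tokenized input. Each warning suggests roughly half the input length (43 for 86 tokens, 30 for 61, 9 for 18). The sketch below derives the bounds per function instead. `summary_length_budget` is a hypothetical helper, and the halving ratio is taken from those suggestions rather than tuned.

```python
from typing import Tuple


def summary_length_budget(input_tokens: int, cap: int = 100, floor: int = 5) -> Tuple[int, int]:
    """Return (min_length, max_length) for a summary of `input_tokens` tokens.

    max_length is half the input length, clamped to [floor, cap]; min_length is
    the notebook's value of 10 unless that would not be below max_length.
    """
    max_length = max(floor, min(cap, input_tokens // 2))
    min_length = min(10, max_length - 1)
    return min_length, max_length


# Token counts taken from the warnings above
for n in (86, 61, 18):
    print(n, summary_length_budget(n))
# 86 (10, 43)
# 61 (10, 30)
# 18 (8, 9)

# Possible wiring into the notebook's pipeline (not executed here):
# n = len(summarization_pipeline.tokenizer(function_code, truncation=True)["input_ids"])
# lo, hi = summary_length_budget(n)
# summarization_pipeline(function_code, min_length=lo, max_length=hi,
#                        do_sample=False, truncation=True)
```

Keeping `min_length` below `max_length` matters for very short functions. With the fixed `min_length=10`, an 18-token input would otherwise get a minimum above its suggested maximum of 9.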

In [None]:
import csv

# Analyze the repository and save the output to a CSV file
def analyze_repo_to_csv(repo_url, output_file):
    temp_dir = clone_repo_to_temp(repo_url)
    file_count = 0
    function_details = []
    try:
        # Count the number of files
        for root, dirs, files in os.walk(temp_dir):
            for file in files:
                if file.endswith('.py'):
                    file_count += 1
        print(f"Analyzing {file_count} files...")
        for root, dirs, files in os.walk(temp_dir):
            for file in files:
                if file.endswith('.py'):
                    with open(os.path.join(root, file), 'r', encoding='utf-8') as f:
                        code = f.read()
                        functions = extract_functions(code)
                        for function in functions:
                            function_code = ast.get_source_segment(code, function)
                            function_summary = summarize_function(function_code)
                            function_details.append((f"Function {len(function_details)+1}", os.path.join(root, file), function_code, function_summary))
        # Save the function details to a CSV file
        with open(output_file, 'w', newline='', encoding='utf-8') as csvfile:
            fieldnames = ['Function number', 'File', 'Function code', 'Summary']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
            for function_number, file_path, function_code, function_summary in function_details:
                writer.writerow({'Function number': function_number, 'File': file_path, 'Function code': function_code, 'Summary': function_summary})
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Clean up the temporary directory
        shutil.rmtree(temp_dir)
    print(f"Analyzed {file_count} files.")

# Example usage
repo_url = 'https://github.com/the-vishal/Fb_Automated_Birthday_Wisher'
output_file = 'function_details.csv'
analyze_repo_to_csv(repo_url, output_file)
print(f"Output saved to {output_file}.")


Analyzing 2 files...


Your max_length is set to 50, but your input_length is only 18. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=9)


Analyzed 2 files.
Output saved to function_details.csv.
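The CSV identifies rows only by a running number, so finding a particular function later means reading its code. One option is to store the function name in its own column. The sketch below does this with made-up sample values, where the summary text is invented for illustration. It also checks that multi-line source code survives a write/read round trip, since `csv.DictWriter` quotes fields that contain newlines or quote characters.

```python
import csv
import io

# Illustrative row; the summary text is made up for this example
rows = [
    {
        "File": "Background.py",
        "Function name": "create_connection",
        "Function code": 'def create_connection(db_file):\n'
                         '    """ create a database connection to a SQLite database """\n'
                         '    conn = None',
        "Summary": "Opens a SQLite connection.",
    },
]

buffer = io.StringIO(newline="")  # newline="" as the csv docs recommend for files
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

# Read the text back and compare with what was written
buffer.seek(0)
restored = list(csv.DictReader(buffer))
print(restored == rows)  # True
```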


In [None]:
import pandas as pd

# Load CSV file
df = pd.read_csv('function_details.csv')

# Display DataFrame
print(df)


  Function number                                  File  \
0      Function 1        /tmp/tmpi0slgw87/Background.py   
1      Function 2        /tmp/tmpi0slgw87/Background.py   
2      Function 3        /tmp/tmpi0slgw87/Background.py   
3      Function 4  /tmp/tmpi0slgw87/Automated_wisher.py   
4      Function 5  /tmp/tmpi0slgw87/Automated_wisher.py   
5      Function 6  /tmp/tmpi0slgw87/Automated_wisher.py   
6      Function 7  /tmp/tmpi0slgw87/Automated_wisher.py   
7      Function 8  /tmp/tmpi0slgw87/Automated_wisher.py   
8      Function 9  /tmp/tmpi0slgw87/Automated_wisher.py   

                                       Function code  \
0  def create_connection(db_file):\n    """ creat...   
1  def insert_birthday(birthday = ""):\n    \n   ...   
2  def select_name(date=k):\n\n    """Get name fo...   
3  def create_connection(db_file):\n    """ creat...   
4  def insert_birthday(names, day):\n    \n    ""...   
5  def select_name(date=k):\n\n    """Get name fo...   
6  def user_detai

Next, search the generated summaries for a query phrase and print the source code of every function whose summary contains it.

In [None]:
# Analyze the repository with search functionality
def analyze_repo_with_search(repo_url, search_query):
    temp_dir = clone_repo_to_temp(repo_url)
    file_count = 0
    function_details = []
    try:
        # Count the number of files
        for root, dirs, files in os.walk(temp_dir):
            for file in files:
                if file.endswith('.py'):
                    file_count += 1
        print(f"Analyzing {file_count} files...")
        for root, dirs, files in os.walk(temp_dir):
            for file in files:
                if file.endswith('.py'):
                    with open(os.path.join(root, file), 'r', encoding='utf-8') as f:
                        code = f.read()
                        functions = extract_functions(code)
                        for function in functions:
                            function_code = ast.get_source_segment(code, function)
                            function_summary = summarize_function(function_code)
                            # Check if search query is in the function summary
                            if search_query.lower() in function_summary.lower():
                                function_details.append((os.path.join(root, file), function_code, function_summary))
        # Print the function code for functions matching the search query
        if function_details:
            print(f"Functions containing '{search_query}':")
            for file_path, function_code, function_summary in function_details:
                print(f"File: {file_path}")
                print(f"Function Code:\n{function_code}")
                print(f"Summary: {function_summary}\n")
        else:
            print(f"No functions containing '{search_query}' found.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Clean up the temporary directory
        shutil.rmtree(temp_dir)
    print(f"Analyzed {file_count} files.")

# Example usage with search functionality
repo_url = 'https://github.com/the-vishal/Fb_Automated_Birthday_Wisher'
search_query = 'create a database connection to a database'
analyze_repo_with_search(repo_url, search_query)


Your max_length is set to 100, but your input_length is only 86. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=43)


Analyzing 2 files...


Your max_length is set to 100, but your input_length is only 86. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=43)
Your max_length is set to 100, but your input_length is only 61. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=30)
Your max_length is set to 100, but your input_length is only 18. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=9)


Functions containing 'create a database connection to a database':
File: /tmp/tmp7ju99lgv/Background.py
Function Code:
def create_connection(db_file):
    """ create a database connection to a SQLite database """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
    except Error as e:
        print(e)

    return conn
Summary: Def create_connection(db_file): create a database connection to a database database . Create_connection() is created by creating a new database connection .

File: /tmp/tmp7ju99lgv/Automated_wisher.py
Function Code:
def create_connection(db_file):
    """ create a database connection to a SQLite database """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
    except Error as e:
        print(e)

    return conn
Summary: Def create_connection(db_file): create a database connection to a database database . Create_connection() is created by creating a new database connection .

Analyzed 2 files.
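The check `search_query.lower() in function_summary.lower()` only matches when the query reproduces the summary's exact wording. The example query appears to have been copied from the generated summary, whereas a paraphrase such as "open a database connection" finds nothing. Each search also re-clones the repository and re-runs the model. An alternative is to build a `(file, name, summary)` index once and then rank entries by shared keywords. In the sketch below, `rank_functions` and the stopword list are hypothetical. Bag-of-words overlap is a deliberately simple substitute for semantic (for example, embedding-based) search.

```python
import re
from typing import List, Set, Tuple

# Small illustrative stopword list; a real one would be longer
STOPWORDS = {"a", "an", "the", "to", "of", "in", "and", "is", "for", "by"}


def keywords(text: str) -> Set[str]:
    """Lower-cased word tokens (letters, digits, underscores) minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if w not in STOPWORDS}


def rank_functions(query: str, index: List[Tuple[str, str, str]], top_k: int = 3):
    """Rank (file, name, summary) entries by the number of query keywords each
    shares with its name plus summary; entries with no overlap are dropped."""
    query_words = keywords(query)
    scored = []
    for file_path, name, summary in index:
        overlap = len(query_words & keywords(name + " " + summary))
        if overlap:
            scored.append((overlap, file_path, name))
    # Highest overlap first; ties broken by file path, then name
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:top_k]


# Built once, e.g. from the (file, name, summary) tuples collected while
# summarizing; these two entries are illustrative
index = [
    ("Background.py", "create_connection",
     "create a database connection to a database database ."),
    ("Background.py", "insert_birthday",
     "function to insert new birthdays in the database"),
]
print(rank_functions("open a database connection", index))
# [(2, 'Background.py', 'create_connection'), (1, 'Background.py', 'insert_birthday')]
print(rank_functions("insert birthday", index))
# [(1, 'Background.py', 'insert_birthday')]
```

Because the index is built separately from the search, many queries can run against one set of summaries without cloning the repository again. Exact word matching still misses inflections such as "birthday" versus "birthdays", which is one reason to consider stemming or embeddings.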
