# Extension 1 - Autocomplete and Suggesting Method Names

In this extension I applied the current Python2Vec model to other software engineering applications, namely autocomplete and method-name suggestion.

The `autocomplete` function:

- Tokenize the prefix string into a list of words.
- Get the top 5 most similar words to each prefix word using the `most_similar` method of the trained Word2Vec model.
- Merge the lists of similar words for all prefix words and remove duplicates.
- Remove any words that already appear in the prefix list.
- Return the top 5 remaining similar words.
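The `autocomplete` cell later in this notebook queries a single token with `topn=10`. The multi-word procedure listed above could be sketched as follows. This is a minimal, hypothetical sketch: the helper name `autocomplete_prefix`, the regex tokenizer, and the `most_similar(word, topn)` callable (standing in for a wrapper around gensim's `model.wv.most_similar`) are assumptions, and a toy lookup table with made-up scores replaces the trained model so the block runs on its own.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumed lookup signature: most_similar(word, topn) -> [(word, score), ...],
# raising KeyError for unknown words (as gensim's KeyedVectors.most_similar does).
SimilarFn = Callable[[str, int], List[Tuple[str, float]]]


def autocomplete_prefix(prefix: str, most_similar: SimilarFn, k: int = 5) -> List[str]:
    """Hypothetical helper following the five autocomplete steps listed above."""
    # 1. Tokenize the prefix; a simple identifier regex stands in for a real tokenizer.
    prefix_words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", prefix)

    # 2-3. Gather the top-k neighbours of every prefix word, merging duplicates
    #      by keeping each candidate's highest score.
    best: Dict[str, float] = {}
    for word in prefix_words:
        try:
            neighbours = most_similar(word, k)
        except KeyError:  # word is not in the model's vocabulary
            continue
        for candidate, score in neighbours:
            if score > best.get(candidate, float("-inf")):
                best[candidate] = score

    # 4. Drop candidates that already appear in the prefix.
    for word in prefix_words:
        best.pop(word, None)

    # 5. Return the k highest-scoring remaining candidates.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [candidate for candidate, _ in ranked[:k]]


# Toy neighbour table standing in for a trained model (made-up scores).
TOY_NEIGHBOURS = {
    "np": [("array", 0.9), ("zeros", 0.8), ("pd", 0.7)],
    "pd": [("DataFrame", 0.95), ("array", 0.6), ("np", 0.7)],
}


def toy_most_similar(word: str, topn: int) -> List[Tuple[str, float]]:
    return TOY_NEIGHBOURS[word][:topn]  # raises KeyError for unknown words


print(autocomplete_prefix("np pd", toy_most_similar))  # ['DataFrame', 'array', 'zeros']
```

With the real model, the callable could be `lambda w, n: model.wv.most_similar(positive=[w], topn=n)`. Keeping each candidate's best score is one simple way to merge duplicate neighbours; summing scores across prefix words would be another.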

The `suggest_method_names` function:

- Tokenize the input code context into a list of words using NLTK's `word_tokenize` function.
- Iterate through the list of words and identify any words that could be method names, based on the following criteria:
  - The word starts with a lowercase letter.
  - The word appears after a period (indicating a method call) or a newline (indicating a function definition).
  - The word is not a reserved Python keyword or built-in function name.
- For each identified method name, get the top 5 most similar words using the `most_similar` method of the trained Word2Vec model, excluding any words that already appear in the context or are reserved keywords or built-in function names.
- Return the top 5 similar words for each identified method name.
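The candidate-detection steps above can be sketched in the same way, assuming the criteria are applied literally: an identifier counts as a candidate if it starts with a lowercase letter, follows a period or begins a line, and is not in `keyword.kwlist` or `dir(builtins)`. The names `candidate_method_names` and `suggest_for_candidates`, the regex used in place of NLTK's `word_tokenize`, the `pool` over-fetch size, and the toy neighbour table are all hypothetical.

```python
import builtins
import keyword
import re
from typing import Callable, Dict, List, Tuple

# Assumed lookup signature, as in gensim's most_similar: (word, topn) -> [(word, score), ...]
SimilarFn = Callable[[str, int], List[Tuple[str, float]]]

# Reserved Python keywords and built-in names (print, len, ...) are never candidates or suggestions.
RESERVED = set(keyword.kwlist) | set(dir(builtins))

# An identifier starting with a lowercase letter that follows a period (group 1)
# or opens a line after optional indentation (group 2).
CANDIDATE = re.compile(r"\.([a-z][A-Za-z0-9_]*)|^[ \t]*([a-z][A-Za-z0-9_]*)", re.MULTILINE)
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def candidate_method_names(code: str) -> List[str]:
    """Hypothetical helper: candidate method names in order of first appearance."""
    names: List[str] = []
    for match in CANDIDATE.finditer(code):
        name = match.group(1) or match.group(2)
        if name not in RESERVED and name not in names:
            names.append(name)
    return names


def suggest_for_candidates(code: str, most_similar: SimilarFn,
                           k: int = 5, pool: int = 50) -> Dict[str, List[str]]:
    """Hypothetical helper: up to k suggestions per candidate, excluding context and reserved words."""
    context_words = set(IDENTIFIER.findall(code))
    suggestions: Dict[str, List[str]] = {}
    for name in candidate_method_names(code):
        try:
            neighbours = most_similar(name, pool)  # over-fetch, since filtering removes some
        except KeyError:  # candidate is not in the model's vocabulary
            continue
        kept = [word for word, _ in neighbours if word not in context_words and word not in RESERVED]
        suggestions[name] = kept[:k]
    return suggestions


SAMPLE = "import numpy as np\nresult = np.zeros(3)\nresult.fill(0)\n"
TOY_NEIGHBOURS = {
    "zeros": [("ones", 0.9), ("np", 0.8), ("empty", 0.7), ("print", 0.6)],
    "fill": [("fill_diagonal", 0.85), ("result", 0.5)],
}

print(candidate_method_names(SAMPLE))  # ['result', 'zeros', 'fill']
# {'zeros': ['ones', 'empty'], 'fill': ['fill_diagonal']}; 'result' is skipped as out-of-vocabulary
print(suggest_for_candidates(SAMPLE, lambda w, n: TOY_NEIGHBOURS[w][:n]))
```

Note that the "starts a line" rule also picks up assignment targets such as `result`, so in practice an extra filter may be needed.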



```python
import os
import subprocess

# Create the 'Python Repositories' directory if it doesn't already exist
os.makedirs('Python Repositories', exist_ok=True)

# Repositories hosted at https://github.com/<name>/<name>
repos = ['matplotlib', 'scikit-learn', 'numpy', 'django', 'scipy', 'ansible', 'scrapy', 'Mailpile', 'sshuttle']

# Clone each repository into 'Python Repositories', skipping any that are already present
for repo in repos:
    target = f'Python Repositories/{repo}'
    if not os.path.exists(target):
        subprocess.run(['git', 'clone', f'https://github.com/{repo}/{repo}.git', target])
    else:
        print(f'{target} already exists')

# Repositories whose owner differs from the repository name, given as full URLs
additional_repos = [
    'https://github.com/pandas-dev/pandas',
    'https://github.com/pallets/flask',
    'https://github.com/psf/requests',
    'https://github.com/getsentry/sentry',
    'https://github.com/saltstack/salt',
    'https://github.com/samuelclay/NewsBlur',
    'https://github.com/beetbox/beets',
]

# Clone these too, naming each local folder after the last URL segment
for url in additional_repos:
    target = f'Python Repositories/{url.split("/")[-1]}'
    if not os.path.exists(target):
        subprocess.run(['git', 'clone', url, target])
    else:
        print(f'{target} already exists')
```


```text
Python Repositories/matplotlib already exists
Python Repositories/scikit-learn already exists
Python Repositories/numpy already exists
Python Repositories/django already exists
Python Repositories/scipy already exists
Python Repositories/ansible already exists
Python Repositories/scrapy already exists
Python Repositories/Mailpile already exists
Python Repositories/sshuttle already exists
Python Repositories/pandas already exists
Python Repositories/flask already exists
Python Repositories/requests already exists
Python Repositories/sentry already exists
Python Repositories/salt already exists
Cloning into 'Python Repositories/NewsBlur'...
Updating files: 100% (6157/6157), done.
Python Repositories/beets already exists
```


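```python
# Importing the required libraries
import gensim
import numpy as np
from nltk import word_tokenize  # may first need NLTK tokenizer data, e.g. nltk.download('punkt')

# Load the trained Python2Vec model (a gensim Word2Vec model saved as 'python2vec.model')
model = gensim.models.Word2Vec.load('python2vec.model')
```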

## Using the trained Python2Vec model for autocomplete and method-name suggestion

```python
# Return the tokens most similar to `token` in the Python2Vec embedding space
def autocomplete(token, topn=10):
    # Tokens missing from the vocabulary would make most_similar raise a KeyError
    if token not in model.wv:
        return []

    # (token, cosine similarity) pairs for the `topn` nearest neighbours
    return model.wv.most_similar(positive=[token], topn=topn)
```
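Because `most_similar` ranks by embedding similarity rather than by string prefix, `autocomplete('np')` returns NumPy-related names such as `hstack` rather than strings that start with `np`. The toy sketch below illustrates that ranking for a single query key: every other vocabulary word is scored by cosine similarity and the best `topn` are kept, with the query itself excluded. It is a simplified pure-Python stand-in (the helper `most_similar_toy` and the 2-D vectors are made up), not gensim's implementation, which performs the same kind of ranking on unit-normalised vectors in bulk.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else sum(a * b for a, b in zip(u, v)) / norm


def most_similar_toy(vectors: Dict[str, Sequence[float]], word: str,
                     topn: int = 10) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank every other vocabulary word by cosine similarity to `word`."""
    if word not in vectors:
        raise KeyError(f"{word!r} not in vocabulary")
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Made-up 2-D vectors; a real Word2Vec model uses far more dimensions.
VECTORS = {
    "np": [1.0, 0.0],
    "hstack": [0.9, 0.1],
    "vstack": [0.8, 0.3],
    "flask": [0.0, 1.0],
}

print(most_similar_toy(VECTORS, "np", topn=2))  # hstack first, then vstack
```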

```python
# Define a function to suggest method names
def suggest_method_names(context, topn=5):
    """
    Given a code context, return up to `topn` suggested names from the trained Python2Vec model.
    """
    # Tokenize and lowercase the context, keeping only purely alphabetic tokens
    # (this drops identifiers containing underscores or digits, such as process_data)
    context_tokens = [token.lower() for token in word_tokenize(context) if token.isalpha()]

    # Represent the context by the average vector of its in-vocabulary tokens
    known_vectors = [model.wv[token] for token in context_tokens if token in model.wv]
    if not known_vectors:
        return []  # nothing to average, so no suggestions
    context_vector = np.mean(known_vectors, axis=0)

    # Find the 10 words closest to the context vector
    similar_words = model.wv.similar_by_vector(context_vector, topn=10)

    # Keep the closest words that aren't already in the context
    return [word for word, similarity in similar_words if word not in context_tokens][:topn]
```
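`suggest_method_names` represents the context by the mean of its token vectors and then looks up the words nearest to that centroid. The pure-Python sketch below mirrors that idea with hypothetical names (`mean_vector`, `suggest_from_context`) and made-up 2-D vectors. Since `similar_by_vector` does not skip context words itself, the cell above fetches 10 neighbours and filters them afterwards, which can leave fewer than 5 suggestions; this sketch instead filters the whole toy vocabulary, and returns an empty list when no context token is known.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else sum(a * b for a, b in zip(u, v)) / norm


def mean_vector(tokens: List[str], vectors: Dict[str, Sequence[float]]) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def suggest_from_context(tokens: List[str], vectors: Dict[str, Sequence[float]],
                         k: int = 5) -> List[str]:
    """Hypothetical helper: k vocabulary words closest to the context centroid, excluding context tokens."""
    centroid = mean_vector(tokens, vectors)
    if centroid is None:
        return []
    ranked = sorted(vectors, key=lambda word: cosine(centroid, vectors[word]), reverse=True)
    return [word for word in ranked if word not in tokens][:k]


# Made-up 2-D vectors standing in for a trained model.
VECTORS = {
    "pd": [1.0, 0.0],
    "dataframe": [0.9, 0.2],
    "merge": [0.8, 0.4],
    "data": [0.7, 0.3],
    "plot": [0.1, 1.0],
}

# 'unknownword' is ignored because it has no vector.
print(suggest_from_context(["pd", "data", "unknownword"], VECTORS, k=2))  # ['dataframe', 'merge']
```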

## Example usage of the autocomplete and suggest_method_names functions

```python
prefix = 'np'
autocomplete_results = autocomplete(prefix)
print(f'Autocomplete suggestions for {prefix}: \n\n{autocomplete_results}')

context = 'import pandas as pd\n\ndef process_data(data):\n df = pd.DataFrame(data)\n # TODO: add more code here\n'
method_suggestions = suggest_method_names(context)
print(f'\nSuggested method names for {context}: \n\n{method_suggestions}')
```

```text
Autocomplete suggestions for np: 

[('full_like', 0.6535319685935974), ('amax', 0.6041274070739746), ('set_numeric_ops', 0.6008948087692261), ('ones_like', 0.5989103317260742), ('unit_impulse', 0.597734272480011), ('asanyarray', 0.5955049395561218), ('hstack', 0.5954834818840027), ('nverts', 0.5910012125968933), ('vstack', 0.5907800197601318), ('logical_not', 0.58856600522995)]

Suggested method names for import pandas as pd

def process_data(data):
 df = pd.DataFrame(data)
 # TODO: add more code here
: 

['testdata', 'notnull', 'from_dict', 'merge_ordered', 'dtl']
```
