GitHub - coinse/cs453-demo-irfl

Information Retrieval (IR) Based Fault Localisation

We will use a tf-idf Vector Space Model (VSM) of documents to measure the similarity between the bug report and each source code file. For this hands-on, we skip most of the usual pre-processing stages and apply only English stop-word filtering.
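To make the idea concrete, here is a minimal sketch of tf-idf vectorisation with English stop-word filtering on three in-memory strings. The strings are hypothetical stand-ins for a bug report and two source files; they are not taken from the repository.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents: one "bug report" and two "source files".
docs = [
    "the parser crashes on an empty string",
    "public class Parser { String parse(String input) { return input; } }",
    "public class Logger { void log(String message) { } }",
]

# stop_words="english" drops common English words such as "the" and "on".
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs)

# One row per document, one column per remaining term.
print(matrix.shape)
print(sorted(vectorizer.vocabulary_))
```

Each row of the resulting sparse matrix is the tf-idf vector of one document, so documents that share rare terms end up pointing in similar directions.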

Dependencies

We will use scikit-learn to implement the vectorization and the similarity measurement.

Instructions

The provided irfl.py file contains a skeleton for implementing the IRFL heuristic. For the tf-idf vectorisation, we will use TfidfVectorizer from the sklearn package (sklearn.feature_extraction.text.TfidfVectorizer). The API documentation is here. Note that you can pass a list of filenames to the vectorizer (by setting input='filename'), which is why step 1 is to collect all filenames. Step 2 is to use TfidfVectorizer to get the vector representations.

Collect all documents (i.e., the bug report and all source files)
Compute the tf-idf vector of each document
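The two steps above could be sketched as follows. This is not the contents of irfl.py: the helper names collect_documents and vectorize, the "*.java" pattern, and the temporary demo files are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer


def collect_documents(report_path, source_dir, pattern="*.java"):
    """Return the bug report path followed by every matching source file path.

    Hypothetical helper; the "*.java" pattern is an assumption.
    """
    sources = sorted(str(p) for p in Path(source_dir).rglob(pattern))
    return [str(report_path)] + sources


def vectorize(filenames):
    """Read each file from disk and return one tf-idf row per file."""
    vectorizer = TfidfVectorizer(
        input="filename",       # the items passed to fit_transform are paths
        stop_words="english",   # English stop-word filtering only
        decode_error="ignore",  # skip bytes that are not valid UTF-8
    )
    return vectorizer.fit_transform(filenames)


# Demo on throwaway files so the sketch runs without the repository data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    report = root / "report.txt"
    report.write_text("createNumber fails on hex strings", encoding="utf-8")
    src = root / "src"
    src.mkdir()
    (src / "NumberUtils.java").write_text(
        "class NumberUtils { Number createNumber(String str) { return null; } }",
        encoding="utf-8",
    )
    (src / "StringUtils.java").write_text(
        "class StringUtils { boolean isEmpty(String str) { return true; } }",
        encoding="utf-8",
    )
    filenames = collect_documents(report, src)
    vectors = vectorize(filenames)

print(vectors.shape)
```

Placing the bug report first in the list means row 0 of the matrix is always the report, which keeps the later ranking step simple.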

Given a matrix (i.e., a vector of vectors), you can use the pairwise cosine_similarity function from sklearn (sklearn.metrics.pairwise.cosine_similarity), whose documentation is here.

Compute the cosine similarity between each pair of vectors
Rank source files using the similarity
Report the top five files
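The remaining steps might look like the sketch below. It assumes that row 0 of the tf-idf matrix is the bug report and that the other rows are source files. As a simplification, it compares only the report row against the source rows rather than computing the full pairwise matrix. The function name rank_sources and the toy documents are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_sources(vectors, names, top_n=5):
    """Rank source files by cosine similarity to the bug report in row 0."""
    # Result has shape (1, number_of_sources); flatten it to a 1-D array.
    scores = cosine_similarity(vectors[0], vectors[1:]).ravel()
    # Sort by descending similarity and keep the best top_n.
    order = np.argsort(-scores)[:top_n]
    # Offset by 1 because names[0] is the bug report itself.
    return [(names[i + 1], float(scores[i])) for i in order]


# Hypothetical toy data standing in for a report and three source files.
names = ["report", "NumberUtils.java", "StringUtils.java", "Logger.java"]
texts = [
    "createNumber fails on hex input",
    "class NumberUtils { Number createNumber(String hex) { return null; } }",
    "class StringUtils { boolean isEmpty(String str) { return true; } }",
    "class Logger { void log(String message) { } }",
]
vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)

ranking = rank_sources(vectors, names)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

Here only NumberUtils.java shares terms with the report, so it is ranked first, while the other files score zero. With fewer than five source files, the function simply returns all of them.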

Folders and files in the repository:
Lang10b
Lang1b
Lang10b_report.txt
Lang1b_report.txt
README.md
irfl.py
requirements.txt
