# Introduction - Using COSINE Metric

In this notebook we demonstrate the use of **LSI (Latent Semantic Indexing)**, an Information Retrieval technique, to recover trace links between features and bug reports.

We model our study as follows:

* The title, summary, and description of each bug report together form a single query.
* The title and description of each feature together form a single document, which should be retrieved for the queries it relates to.
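The query/document setup above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the notebook's pipeline: the feature texts, the bug report text, the whitespace tokenizer, the raw term counts, and `k = 2` are all invented for readability, and the `pdf_zoom` feature is hypothetical.

```python
import numpy as np

# Toy corpus: every text below is invented for illustration only.
features = {
    "pdf_viewer": "pdf viewer page zoom",
    "pdf_zoom": "pdf viewer document render",      # hypothetical feature
    "context_menu": "context menu right click",
}
bug_report = "right click menu"                    # the query

def tokenize(text):
    return text.lower().split()

# Vocabulary is built from the documents (features) only.
vocab = sorted({tok for doc in features.values() for tok in tokenize(doc)})
index = {tok: i for i, tok in enumerate(vocab)}

def to_vector(text):
    """Raw term-count vector; out-of-vocabulary tokens are ignored."""
    vec = np.zeros(len(vocab))
    for tok in tokenize(text):
        if tok in index:
            vec[index[tok]] += 1.0
    return vec

# Rows are documents (features), columns are terms.
doc_term = np.vstack([to_vector(text) for text in features.values()])

# Truncated SVD: keep the k strongest latent dimensions.
k = 2
_, _, vt = np.linalg.svd(doc_term, full_matrices=False)
components = vt[:k]                                # shape (k, n_terms)

# Project documents and the query into the same latent space.
docs_lsi = doc_term @ components.T
query_lsi = to_vector(bug_report) @ components.T

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

scores = {name: cosine(query_lsi, vec) for name, vec in zip(features, docs_lsi)}
best_match = max(scores, key=scores.get)
print(scores, best_match)
```

In this toy corpus the two PDF-related features collapse onto one latent dimension, and the query shares terms only with `context_menu`, so that feature gets the highest cosine score.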

This notebook follows the analysis in **oracle_v2_analysis**, where we obtained a Cohen's kappa score of _0.41_ between the researcher's answers and the volunteers' answers.
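For readers unfamiliar with the statistic, here is a minimal sketch of how Cohen's kappa compares two sets of answers. The function name `cohen_kappa` and both answer lists are hypothetical; the actual answers behind the 0.41 score live in **oracle_v2_analysis** and are not reproduced here.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same, non-zero number of answers")
    n = len(rater_a)
    # Observed agreement: fraction of items labelled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical answers (1 = "link exists", 0 = "no link").
expert = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
volunteers = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohen_kappa(expert, volunteers)
print(round(kappa, 2))  # about 0.58 for this toy data
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most feature/bug report pairs are unlinked and two raters would agree on many of them by default.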

# Import Libraries

In [1]:
from mod_finder_util import mod_finder_util
mod_finder_util.add_modules_origin_search_path()

import pandas as pd

from modules.models_runner.feat_br_runner import Feat_BR_Runner
from modules.utils import aux_functions

from IPython.display import display

# Silence library warnings to keep the notebook output readable.
import warnings
warnings.simplefilter('ignore')

# Running LSI Model

In [2]:
%%time

runner = Feat_BR_Runner()
lsi_model, lsi_eval = runner.run_lsi_model()

Features.shape: (21, 8)
SelectedBugReports2.shape: (93, 22)
Expert and Volunteers Matrix.shape: (21, 93)

Model Evaluation -------------------------------------------
{'Measures': {'Mean FScore of LSI_Model_Feat_BR': 0.07526881720430108,
              'Mean Precision of LSI_Model_Feat_BR': 0.07526881720430108,
              'Mean Recall of LSI_Model_Feat_BR': 0.07526881720430108},
 'Setup': [{'Name': 'LSI_Model_Feat_BR'},
           {'Similarity Measure and Minimum Threshold': ('cosine', 0.8)},
           {'Top Value': 100},
           {'SVD Model': {'algorithm': 'randomized',
                          'n_components': 100,
                          'n_iter': 10,
                          'random_state': 42,
                          'tol': 0.0}},
           {'Vectorizer': {'analyzer': 'word',
                           'binary': False,
                           'decode_error': 'strict',
                           'dtype': <class 'numpy.float64'>,
                           'encoding':

In [3]:
aux_functions.highlight_df(runner.orc.iloc[0:20, 0:9])

| feat_name | BR_1181835_SRC | BR_1248267_SRC | BR_1248268_SRC | BR_1257087_SRC | BR_1264988_SRC | BR_1267480_SRC | BR_1267501_SRC | BR_1269348_SRC | BR_1269485_SRC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| new_awesome_bar | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| windows_child_mode | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| apz_async_scrolling | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| browser_customization | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pdf_viewer | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| context_menu | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| w10_comp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| tts_in_desktop | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| tts_in_rm | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| webgl_comp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [4]:
aux_functions.highlight_df(lsi_model.get_trace_links_df().iloc[0:20, 0:9])

| feat_name | BR_1181835_SRC | BR_1248267_SRC | BR_1248268_SRC | BR_1257087_SRC | BR_1264988_SRC | BR_1267480_SRC | BR_1267501_SRC | BR_1269348_SRC | BR_1269485_SRC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| new_awesome_bar | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| windows_child_mode | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| apz_async_scrolling | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| browser_customization | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pdf_viewer | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| context_menu | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| w10_comp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| tts_in_desktop | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| tts_in_rm | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| webgl_comp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
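To make the comparison between the oracle excerpt (cell 3) and the recovered trace links (cell 4) concrete, the sketch below computes precision, recall, and F-score over two binary matrices. The helper `precision_recall_fscore` is hypothetical and pools all cells together; the runner reports *mean* values whose averaging scheme is not shown in this notebook, so the numbers are not expected to match its output.

```python
import numpy as np

def precision_recall_fscore(oracle, predicted):
    """Pooled precision, recall and F-score for two binary link matrices."""
    oracle = np.asarray(oracle, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    tp = int(np.sum(oracle & predicted))    # links present in both
    fp = int(np.sum(~oracle & predicted))   # predicted but not in the oracle
    fn = int(np.sum(oracle & ~predicted))   # in the oracle but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore

# 10 features x 9 bug reports, as in the displayed excerpts.
oracle = np.zeros((10, 9), dtype=int)
oracle[0, 3] = 1      # new_awesome_bar / BR_1257087_SRC
oracle[5, 0] = 1      # context_menu    / BR_1181835_SRC

predicted = np.zeros((10, 9), dtype=int)
predicted[0, 4] = 1   # new_awesome_bar / BR_1264988_SRC
predicted[5, 1] = 1   # context_menu    / BR_1248267_SRC

p, r, f = precision_recall_fscore(oracle, predicted)
print(p, r, f)
```

On this excerpt none of the predicted links coincide with the oracle links, so all three measures are zero for the slice. The full 21 x 93 matrices would be needed to relate this to the values reported by the runner.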


In [5]:
aux_functions.highlight_df(lsi_model.get_sim_matrix().iloc[0:20, 0:9])

| feat_name | BR_1181835_SRC | BR_1248267_SRC | BR_1248268_SRC | BR_1257087_SRC | BR_1264988_SRC | BR_1267480_SRC | BR_1267501_SRC | BR_1269348_SRC | BR_1269485_SRC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| new_awesome_bar | 0.338848 | 0.320741 | 0.568394 | 0.671144 | 0.931044 | 0.164355 | 0.631198 | 0.181918 | 0.640709 |
| windows_child_mode | 0.430389 | 0.0561108 | 0.406251 | 0.165129 | 0.0772069 | 0.212747 | 0.206565 | 0.126042 | 0.223301 |
| apz_async_scrolling | 0.133358 | 0.00122383 | 0.0295661 | 0.047032 | 0.0630632 | 0.042386 | 0.260099 | 0.00498738 | 0.173511 |
| browser_customization | 0.17833 | 0.0252172 | 0.279508 | 0.057479 | 0.139368 | 0.520398 | 0.0431226 | 0.102766 | 0.0478245 |
| pdf_viewer | 0.0217932 | 0.00557269 | 0.048238 | 0.0127021 | 0.0170318 | 0.193004 | 0.00952956 | 0.0227099 | 0.0105686 |
| context_menu | 0.0792703 | 0.971288 | 0.429624 | 0.44198 | 0.131035 | 0.0496182 | 0.33884 | 0.0514961 | 0.430191 |
| w10_comp | 0.521655 | 0.190804 | 0.318948 | 0.220759 | 0.23969 | 0.433016 | 0.491275 | 0.179211 | 0.514725 |
| tts_in_desktop | 0.27618 | 0.0175157 | 0.202421 | 0.0399245 | 0.0898432 | 0.400903 | 0.59806 | 0.0713803 | 0.528585 |
| tts_in_rm | 0.445137 | 0.0231009 | 0.256051 | 0.0526551 | 0.127672 | 0.476724 | 0.430549 | 0.0941412 | 0.433093 |
| webgl_comp | 0.252951 | 0.0160424 | 0.288438 | 0.0365664 | 0.0822864 | 0.367183 | 0.0274333 | 0.0653764 | 0.0304245 |
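The similarity matrix and the trace links matrix appear to be related through the minimum threshold of 0.8 reported in the model setup. The sketch below applies that threshold to three rows copied from the similarity excerpt. It assumes a link is recovered whenever the cosine similarity is greater than or equal to the threshold, and it ignores the `Top Value` cut, since how the runner combines the two is not shown here.

```python
import numpy as np

bug_reports = [
    "BR_1181835_SRC", "BR_1248267_SRC", "BR_1248268_SRC",
    "BR_1257087_SRC", "BR_1264988_SRC", "BR_1267480_SRC",
    "BR_1267501_SRC", "BR_1269348_SRC", "BR_1269485_SRC",
]
feature_names = ["new_awesome_bar", "context_menu", "w10_comp"]

# Rows copied from the similarity matrix excerpt.
sim = np.array([
    [0.338848, 0.320741, 0.568394, 0.671144, 0.931044,
     0.164355, 0.631198, 0.181918, 0.640709],
    [0.0792703, 0.971288, 0.429624, 0.44198, 0.131035,
     0.0496182, 0.33884, 0.0514961, 0.430191],
    [0.521655, 0.190804, 0.318948, 0.220759, 0.23969,
     0.433016, 0.491275, 0.179211, 0.514725],
])

THRESHOLD = 0.8  # from the 'Similarity Measure and Minimum Threshold' setup entry

# Assumption: similarity >= threshold means a recovered trace link.
links = (sim >= THRESHOLD).astype(int)

recovered = [
    (feature_names[i], bug_reports[j]) for i, j in zip(*np.nonzero(links))
]
print(recovered)
```

Under this assumption the only recovered links in the three rows are `new_awesome_bar` with `BR_1264988_SRC` (0.931044) and `context_menu` with `BR_1248267_SRC` (0.971288), which agrees with the ones shown in the trace links excerpt.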
