# Introduction - Oracle Version 2

Notebook to load the **bug reports** and **test cases** datasets and the **feature matrices** built from the expert and volunteer responses collected in the PyBossa applications, and to create the **oracle** dataset from them.

In this notebook we create a version of the oracle based on the results of the empirical study conducted with volunteers through the PyBossa application. The relationship between bug reports and test cases is established through the Firefox features shared by both artifacts.

This oracle is expected to be more precise than the version created in the previous notebook (__oracle_v1__), since the trace links are created based on the existing relationship between each bug report and a given Firefox feature.

# Load Libraries and Data

In [1]:
from mod_finder_util import mod_finder_util
mod_finder_util.add_modules_origin_search_path()

import pandas as pd
import numpy as np
from joblib import Parallel, delayed  # sklearn.externals.joblib was removed from recent scikit-learn releases
from tqdm import tqdm

from modules.utils import aux_functions
from modules.utils import firefox_dataset_p2 as fd

In [2]:
testcases = fd.read_testcases_df()
bugreports = fd.read_bugreports_df()
features = fd.read_features_df()
expert_matrix = fd.read_expert_matrix_df()
volunteers_matrix = fd.read_volunteers_matrix_df()

TestCases.shape: (207, 12)
BugReports.shape: (93, 19)
Features.shape: (21, 8)
Expert Matrix shape: (93, 21)
Volunteers Matrix shape: (63, 21)


# EDA - Exploratory Data Analysis

In [36]:
testcases.head()

Unnamed: 0,TC_Number,TestDay,Feature_ID,Firefox_Feature,Gen_Title,Crt_Nr,Title,Preconditions,Steps,Expected_Result,tc_name,tc_desc
0,1,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,1,Notification - Popup Block,,1. Launch Firefox\n2. Navigate to http://www.p...,1. Firefox is successfully launched\n9. The al...,TC_1_TRG,1 20181221 20 <notificationbox> and <notificat...
1,2,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,2,Notification - Process Hang,,"1. Launch Firefox\n2. In the URL bar, navigate...",1. Firefox is successfully launched\n2. Firefo...,TC_2_TRG,2 20181221 20 <notificationbox> and <notificat...
2,3,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,3,Verify Notifications appear in RTL Mode,,"1. Launch Firefox\n2. In about:config, change ...",1. Firefox is successfully launched\n2.The for...,TC_3_TRG,3 20181221 20 <notificationbox> and <notificat...
3,4,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,4,Verify Notifications appear in High Contrast M...,,"1. While the browser is in High Contrast Mode,...",1. Firefox has been launched.\n2. Firefox begi...,TC_4_TRG,4 20181221 20 <notificationbox> and <notificat...
4,5,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,5,Verify notifications react to differing Zoom l...,,"1. While the browser is in High Contrast Mode,...",1. Firefox has been launched.\n2. Firefox begi...,TC_5_TRG,5 20181221 20 <notificationbox> and <notificat...


In [37]:
bugreports.head()

Unnamed: 0,Bug_Number,Summary,Platform,Component,Version,Creation_Time,Whiteboard,QA_Whiteboard,First_Comment_Text,First_Comment_Creation_Time,Status,Product,Priority,Resolution,Severity,Is_Confirmed,br_name,br_desc,Features_IDs
0,1181835,Provide a UI for migrating users' add-ons to w...,Unspecified,Extension Compatibility,49 Branch,2015-07-08T23:01:45Z,[UX] triaged,,We're still not exactly sure how this would wo...,2015-07-08T23:01:45Z,RESOLVED,Firefox,P2,FIXED,normal,True,BR_1181835_SRC,1181835 Provide a UI for migrating users' add-...,6.0
1,1248267,"Right click on bookmark item of ""Recently Book...",Unspecified,Bookmarks & History,48 Branch,2016-02-14T17:45:54Z,,,Steps To Reproduce: 1. Open Bookmarks menu 2. ...,2016-02-14T17:45:54Z,VERIFIED,Firefox,P3,FIXED,normal,True,BR_1248267_SRC,"1248267 Right click on bookmark item of ""Recen...",
2,1248268,"Unable to disable ""Recently bookmarked""",All,Bookmarks & History,48 Branch,2016-02-14T17:54:44Z,,,Created attachment 8719295 Firefox Nightly 47_...,2016-02-14T17:54:44Z,VERIFIED,Firefox,P3,FIXED,major,True,BR_1248268_SRC,"1248268 Unable to disable ""Recently bookmarked...",
3,1257087,Middle mouse click on history item would not open,Unspecified,Bookmarks & History,48 Branch,2016-03-16T05:13:47Z,,,[Tracking Requested - why for this release]: r...,2016-03-16T05:13:47Z,VERIFIED,Firefox,P2,FIXED,normal,True,BR_1257087_SRC,1257087 Middle mouse click on history item wou...,1.0
4,1264988,Scrollbar appears for a moment in the new Awes...,All,Address Bar,48 Branch,2016-04-15T15:17:33Z,[fxsearch] [photon-performance],,Created attachment 8741829 Bug.mov User Agent...,2016-04-15T15:17:33Z,VERIFIED,Firefox,P1,FIXED,normal,True,BR_1264988_SRC,1264988 Scrollbar appears for a moment in the ...,


In [3]:
features.head()

Unnamed: 0,Feature_Number,Feature_Shortname,Firefox_Version,Firefox_Feature,Feature_Description,Reference
0,1,new_awesome_bar,48 Branch + 50 Branch,New Awesome Bar,The Firefox address bar displays a page's web ...,https://support.mozilla.org/en-US/kb/awesome-b...
1,2,windows_child_mode,48 Branch,Windows Child Mode,Child mode is a feature of Windows that allows...,https://wiki.mozilla.org/QA/Windows_Child_Mode
2,3,apz_async_scrolling,48 Branch,APZ - Async Scrolling,The Async Pan/Zoom module (APZ) is a platform ...,https://wiki.mozilla.org/Platform/GFX/APZ
3,4,browser_customization,49 Branch,Browser Customization,. Install and Customize Firefox Themes. \n. Cu...,https://support.mozilla.org/en-US/kb/use-theme...
4,5,pdf_viewer,49 Branch,PDF Viewer,"Zoom in, Zoom out, Print and Save PDF Files.",https://support.mozilla.org/en-US/kb/view-pdf-...


In [39]:
expert_matrix.head()

bug_number,new_awesome_bar,windows_child_mode,apz_async_scrolling,browser_customization,pdf_viewer,context_menu,w10_comp,tts_in_desktop,tts_in_rm,webgl_comp,...,pointer_lock_api,webm_eme,zoom_indicator,downloads_dropmaker,webgl2,flac_support,indicator_device_perm,flash_support,notificationbox,update_directory
1181835,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1248267,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1248268,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1257087,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1264988,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [40]:
volunteers_matrix.head()

bug_number,new_awesome_bar,windows_child_mode,apz_async_scrolling,browser_customization,pdf_viewer,context_menu,w10_comp,tts_in_desktop,tts_in_rm,webgl_comp,...,pointer_lock_api,webm_eme,zoom_indicator,downloads_dropmaker,webgl2,flac_support,indicator_device_perm,flash_support,notificationbox,update_directory
1181835,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1248267,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1248268,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1257087,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1264988,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


# Oracle

## Get Features by Bug Report

In [4]:
def get_features(br_id, from_matrix=""):
    features_ids = ""
    for col in volunteers_matrix.columns:
        if from_matrix == "EXPERT_AND_VOLUNTEERS_MATRICES":
            if expert_matrix.at[br_id, col] == 1 and volunteers_matrix.at[br_id, col] == 1:
                if features_ids == "":
                    features_ids = str(expert_matrix.columns.get_loc(col) + 1)
                else:
                    features_ids = features_ids + " " + str(expert_matrix.columns.get_loc(col) + 1)
        elif from_matrix == "EXPERT_MATRIX":
            if expert_matrix.at[br_id, col] == 1:
                if features_ids == "":
                    features_ids = str(expert_matrix.columns.get_loc(col) + 1)
                else:
                    features_ids = features_ids + " " + str(expert_matrix.columns.get_loc(col) + 1)
        elif from_matrix == "VOLUNTEERS_MATRIX":
            if volunteers_matrix.at[br_id, col] == 1:
                if features_ids == "":
                    features_ids = str(volunteers_matrix.columns.get_loc(col) + 1)
                else:
                    features_ids = features_ids + " " + str(volunteers_matrix.columns.get_loc(col) + 1)
            
    return features_ids

matrices_names = [('exp_vol_m','EXPERT_AND_VOLUNTEERS_MATRICES'),
                  ('exp_m','EXPERT_MATRIX'),
                  ('vol_m','VOLUNTEERS_MATRIX')]

bugreports['Features_IDs'] = ""
for br_id in volunteers_matrix.index:
    for idx2, br in bugreports.iterrows():
        if br.Bug_Number == br_id:
            for mat_code,mat_name in matrices_names:
                bugreports.at[idx2, 'Features_IDs_'+mat_code] = get_features(br_id, from_matrix=mat_name)

bugreports[['Bug_Number','Features_IDs_exp_vol_m','Features_IDs_exp_m','Features_IDs_vol_m']].head()

Unnamed: 0,Bug_Number,Features_IDs_exp_vol_m,Features_IDs_exp_m,Features_IDs_vol_m
0,1181835,6.0,6.0,6.0
1,1248267,,,4.0
2,1248268,,,
3,1257087,1.0,1.0,1.0
4,1264988,,,
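
The per-column loop in `get_features` can also be written with boolean masks. The following is a minimal sketch on toy data, assuming both matrices are 0/1 DataFrames indexed by bug number and sharing the same column order. The helper `feature_ids_from_mask` and the toy frames are illustrative names, not part of the notebook's modules.

```python
import pandas as pd

# Toy stand-ins for the expert and volunteers matrices (hypothetical data).
cols = ["new_awesome_bar", "windows_child_mode", "apz_async_scrolling"]
expert = pd.DataFrame([[1, 0, 0], [0, 0, 0], [1, 0, 1]],
                      index=[101, 102, 103], columns=cols)
volunteers = pd.DataFrame([[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                          index=[101, 102, 103], columns=cols)

def feature_ids_from_mask(mask):
    """Return, per row, the space-separated 1-based positions of True columns."""
    return mask.apply(
        lambda row: " ".join(str(pos + 1) for pos, flag in enumerate(row) if flag),
        axis=1,
    )

ids_exp = feature_ids_from_mask(expert == 1)             # expert only
ids_vol = feature_ids_from_mask(volunteers == 1)         # volunteers only
ids_both = feature_ids_from_mask((expert == 1) & (volunteers == 1))  # agreement
print(ids_both.to_dict())
```

Combining the two masks with `&` mirrors the `EXPERT_AND_VOLUNTEERS_MATRICES` branch: a feature is kept only when both sources marked it.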


## Checking Link Condition Function

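The `check_link_condition` function decides whether a given cell in the oracle holds a positive link (1) or a negative link (0): the link is positive when the test case's feature ID appears among the bug report's feature IDs.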

In [5]:
def check_link_condition(br, tc):
    # positive link when the test case's feature is one of the bug report's features
    return str(tc.Feature_ID) in br.Features_IDs.split(" ")

oracle_df = pd.DataFrame(columns=bugreports.br_name, index=testcases.tc_name, data=np.zeros(shape=(len(testcases),len(bugreports))), dtype='int8')
for idx_1,br in tqdm(bugreports.iterrows()):
    for idx_2,tc in testcases.iterrows():
        if check_link_condition(br, tc):
            oracle_df.at[tc.tc_name, br.br_name] = 1
        else:
            oracle_df.at[tc.tc_name, br.br_name] = 0


93it [00:02, 37.11it/s]
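
`check_link_condition` compares string tokens, while the table outputs in this notebook show `Features_IDs` values rendered as floats (for example `6.0`) or left empty. The sketch below shows one way to normalize such cells before comparing. It assumes a cell may hold an empty string, a missing value, a number, or a space-separated list; `parse_feature_ids` and `has_link` are hypothetical helpers, not functions from the notebook's modules.

```python
import math

def parse_feature_ids(raw):
    """Turn a Features_IDs cell into a set of integer feature IDs."""
    # Missing values (None or NaN) mean the bug report has no feature.
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return set()
    # Accept "6", "6.0", 6.0 and space-separated lists such as "1 6".
    return {int(float(token)) for token in str(raw).split()}

def has_link(feature_ids_cell, tc_feature_id):
    """True when the test case's feature is among the bug report's features."""
    return int(tc_feature_id) in parse_feature_ids(feature_ids_cell)

print(has_link("1 6", 6), has_link("", 6))
```

Comparing integers rather than strings avoids a silent mismatch between `"6"` and `"6.0"`.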


## Display Oracle

In [6]:
print('Oracle shape: {}\n'.format(oracle_df.shape))
display(oracle_df.head())

Oracle shape: (207, 93)



tc_name,BR_1181835_SRC,BR_1248267_SRC,BR_1248268_SRC,BR_1257087_SRC,BR_1264988_SRC,BR_1267480_SRC,BR_1267501_SRC,BR_1269348_SRC,BR_1269485_SRC,BR_1270274_SRC,...,BR_1352539_SRC,BR_1353831_SRC,BR_1357085_SRC,BR_1357458_SRC,BR_1365887_SRC,BR_1408361_SRC,BR_1430603_SRC,BR_1432915_SRC,BR_1449700_SRC,BR_1451475_SRC
TC_1_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
TC_2_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
TC_3_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
TC_4_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
TC_5_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Save Oracle

New Oracle Dataset
Dimension: 207 x 93

In [84]:
oracle_df.to_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/trace_matrix_final_emp_study.csv')

# Tests

## Checking Values [0]

### Analyze Entire Oracle Created

In [85]:
oo_df_full = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/trace_matrix_final_emp_study.csv')
oo_df_full.set_index('tc_name', inplace=True)

assert(oo_df_full.loc['TC_13_TRG', 'BR_1257087_SRC'] == 1)
assert(oo_df_full.loc['TC_14_TRG', 'BR_1271607_SRC'] == 1)
assert(oo_df_full.loc['TC_15_TRG', 'BR_1276120_SRC'] == 1)

## Checking Values [1]
Amount of Positive and Negative Links

In [86]:
positive_links = 0
negative_links = 0
for idx,row in oo_df_full.iterrows():
    for col in oo_df_full.columns:
        if row[col] == 1:
            positive_links = positive_links + 1
        else:
            negative_links = negative_links + 1

print("Positive Links Amount: {}".format(positive_links))
print("Negative Links Amount: {}".format(negative_links))

Positive Links Amount: 915
Negative Links Amount: 18336
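
The two counts add up to the full matrix: 915 + 18336 = 19251 = 207 × 93. The same tally can be computed without explicit loops. The following is a minimal sketch on a toy frame, assuming the oracle holds only 0/1 values; `toy_oracle` and its labels are made up for illustration.

```python
import pandas as pd

# Toy oracle: rows are test cases, columns are bug reports (hypothetical labels).
toy_oracle = pd.DataFrame(
    [[1, 0, 0], [0, 0, 1]],
    index=["TC_1_TRG", "TC_2_TRG"],
    columns=["BR_1_SRC", "BR_2_SRC", "BR_3_SRC"],
)

# Count cells equal to 1; every other cell is a negative link.
positive = int((toy_oracle.to_numpy() == 1).sum())
negative = toy_oracle.size - positive

print("Positive Links Amount: {}".format(positive))
print("Negative Links Amount: {}".format(negative))
```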


### Expected Amount of Positive and Negative Links

In [87]:
for idx, row in bugreports.iterrows():
    amount_tcs = 0  # amount of testcases of feature
    for f_id in row.Features_IDs.split(" "):
        if f_id != "":
            amount_tcs = amount_tcs + len(testcases[testcases.Feature_ID == int(f_id)])

    bugreports.at[idx, 'Amount_TCs'] = amount_tcs
    
display(bugreports[['Bug_Number', 'Features_IDs', 'Amount_TCs']].head(10))

positives_amount = bugreports.Amount_TCs.sum()
negatives_amount = len(bugreports) * len(testcases) - positives_amount

print("Total Amount of Expected Positive Links: {}".format(positives_amount))
print("Total Amount of Expected Negative Links: {}".format(negatives_amount))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s


Unnamed: 0,Bug_Number,Features_IDs,Amount_TCs
9192,1181835,6.0,31.0
10621,1248267,,0.0
10622,1248268,,0.0
10904,1257087,1.0,13.0
11150,1264988,,0.0
11229,1267480,3.0,22.0
11232,1267501,,0.0
11331,1269348,3.0,22.0
11339,1269485,,0.0
11375,1270274,6.0,31.0


Total Amount of Expected Positive Links: 915.0
Total Amount of Expected Negative Links: 18336.0


# Analysis

## Kappa - Bug Reports x Features : Volunteers and Expert
Calculate Cohen's Kappa 

Cohen's Kappa score measures the inter-rater agreement for qualitative (categorical) items. It is generally considered a more robust measure than a simple percent-agreement calculation, because κ takes into account the possibility of agreement occurring by chance.

In [55]:
from sklearn.metrics import cohen_kappa_score

expert_answers = []
volunteers_answers = []

for idx,row in volunteers_matrix.iterrows():
    for col in volunteers_matrix.columns:
        volunteers_answers.append(volunteers_matrix.at[idx,col])
        expert_answers.append(expert_matrix.at[idx,col])

print("Expert Answers Length: {}".format(len(expert_answers)))
print("Volunteers Answers Length: {}".format(len(volunteers_answers)))

print("Cohen Kappa Score: {}".format(cohen_kappa_score(expert_answers, volunteers_answers)))

Expert Answers Length: 1323
Volunteers Answers Length: 1323
Cohen Kappa Score: 0.5528894896924637
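
The 1323 paired answers correspond to 63 bug reports × 21 features. To make the chance correction explicit, the sketch below computes κ = (p_o − p_e) / (1 − p_e) by hand, where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label frequencies. It is an illustrative pure-Python version on made-up answers, not the implementation used by `cohen_kappa_score`; the function name `cohen_kappa` and the toy lists are assumptions.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same, non-zero number of answers")
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    # Note: undefined (division by zero) when p_e == 1.
    return (p_o - p_e) / (1 - p_e)

# Made-up answers for eight (bug report, feature) pairs.
toy_expert = [1, 0, 0, 1, 0, 0, 0, 1]
toy_volunteers = [1, 0, 0, 0, 0, 1, 0, 1]
kappa = cohen_kappa(toy_expert, toy_volunteers)
print("Cohen Kappa Score: {}".format(kappa))
```

Because most cells in both matrices are 0, raw percent agreement would be high even for unrelated raters, which is why κ is the more informative figure here.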
