# Introduction - Oracle Version 2

Notebook to load the **bug reports** and **test cases** datasets and the **feature matrices** built from the expert's and volunteers' responses in the PyBossa applications, and to create the **oracle** dataset from them.

In this notebook we create a version of the oracle based on the results of the empirical study conducted with volunteers through the PyBossa application. The relationship between bug reports and test cases is established through the Firefox features shared by both artifacts.

This oracle is expected to be more precise than the version created in the previous notebook (__oracle_v1__), since the trace links are created from the existing relationship between each bug report and a given Firefox feature.

# Load Libraries and Data

In [1]:
from mod_finder_util import mod_finder_util
mod_finder_util.add_modules_origin_search_path()

import pandas as pd
import numpy as np
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
from joblib import Parallel, delayed
from tqdm import tqdm

from modules.utils import aux_functions
from modules.utils import firefox_dataset_p2 as fd

In [3]:
testcases = fd.read_testcases_df()
bugreports = fd.read_bugreports_df()
features = fd.read_features_df()

print()
expert_matrix = fd.read_feat_br_expert_df()
volunteers_matrix = fd.read_feat_br_volunteers_df()

TestCases.shape: (207, 12)
BugReports.shape: (93, 18)
Features.shape: (21, 8)

Expert Matrix shape: (93, 21)
Volunteers Matrix shape: (73, 21)


# Oracles

In this section we create both oracles, Features x Bug Reports and Test Cases x Bug Reports, from the data obtained in the empirical study.

## Data Origins Enum

Enumeration of the data origins used to create the oracles: volunteers only, expert only, or the answers on which both agree.

In [3]:
from enum import Enum

class DataOrigin(Enum):
    VOLUNTEERS = "VOLUNTEERS"
    EXPERT = "EXPERT"
    VOLUNTEERS_AND_EXPERT = "VOLUNTEERS_&_EXPERT"

## Features x Bug Reports Trace Matrix

### Create Features x Bug Reports Trace Matrix

In [4]:
feat_br_matrix = pd.DataFrame(index=bugreports.br_name, 
                              columns=features.feat_name, 
                              data=np.zeros((len(bugreports),len(features)),dtype='int8'))

print('Features x Bug Reports Trace Matrix shape: {}'.format(feat_br_matrix.shape))

# Keep a link only where the expert and the volunteers gave the same answer;
# bug reports without volunteer answers keep their rows at 0
for idx in volunteers_matrix.index:
    for col in volunteers_matrix.columns:
        br_name = 'BR_{}_SRC'.format(idx)
        if expert_matrix.at[idx,col] == volunteers_matrix.at[idx,col]:
            feat_br_matrix.at[br_name,col] = expert_matrix.at[idx,col]
        else:
            feat_br_matrix.at[br_name,col] = 0

print(feat_br_matrix.shape)

Features x Bug Reports Trace Matrix shape: (93, 21)
(93, 21)
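
The cell-by-cell loop above keeps a link only where the expert and the volunteers gave the same answer. As a minimal sketch, assuming both matrices hold only 0/1 values and share the same column labels, the same rule can be written as an element-wise comparison. The toy frames, feature names, and bug report numbers below are hypothetical and are not the study data.

```python
import pandas as pd

# Hypothetical toy answers (1 = feature linked to the bug report)
toy_expert = pd.DataFrame({"feat_a": [1, 0, 1], "feat_b": [0, 1, 1]},
                          index=[101, 102, 103])
toy_volunteers = pd.DataFrame({"feat_a": [1, 1], "feat_b": [0, 1]},
                              index=[101, 102])

# Align the expert answers to the bug reports the volunteers evaluated
expert_aligned = toy_expert.loc[toy_volunteers.index, toy_volunteers.columns]

# Keep the expert value where both agree, 0 otherwise
agreement = expert_aligned.where(expert_aligned == toy_volunteers, 0).astype("int8")

# Bug reports without volunteer answers (103 here) stay at 0
toy_trace = agreement.reindex(toy_expert.index, fill_value=0)
print(toy_trace)
```

This avoids the nested Python loops, at the cost of requiring the two matrices to be aligned on index and columns before the comparison.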


### Save Features x Bug Reports Trace Matrix

In [5]:
feat_br_matrix.T.to_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/firefox_v2/feat_br/br_2_feature_matrix_expert_volunteers.csv', index=True)

## Test Cases x Bug Reports Trace Matrix

### Get Features by Bug Report

In [6]:
def get_features(br_id, data_origin):
    # Collect the 1-based column positions of the features linked to the bug report
    features_ids = []
    for col in volunteers_matrix.columns:
        if data_origin == DataOrigin.VOLUNTEERS_AND_EXPERT:
            # Keep only the features marked by both the expert and the volunteers
            if expert_matrix.at[br_id, col] == 1 and volunteers_matrix.at[br_id, col] == 1:
                features_ids.append(str(expert_matrix.columns.get_loc(col) + 1))
        elif data_origin == DataOrigin.EXPERT:
            if expert_matrix.at[br_id, col] == 1:
                features_ids.append(str(expert_matrix.columns.get_loc(col) + 1))
        elif data_origin == DataOrigin.VOLUNTEERS:
            if volunteers_matrix.at[br_id, col] == 1:
                features_ids.append(str(volunteers_matrix.columns.get_loc(col) + 1))

    # Space-separated string of IDs; empty when no feature is linked
    return " ".join(features_ids)

| Unnamed: 0 | Bug_Number | Features_IDs_exp_vol_m | Features_IDs_exp_m | Features_IDs_vol_m |
|---|---|---|---|---|
| 0 | 1181835 | 6.0 | 6.0 | 6.0 |
| 1 | 1248267 |  |  | 4.0 |
| 2 | 1248268 |  |  |  |
| 3 | 1257087 | 1.0 | 1.0 | 1.0 |
| 4 | 1264988 |  |  |  |
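
The feature IDs produced by `get_features` are the 1-based column positions of the linked features, joined by spaces. A minimal sketch of that encoding on one hypothetical row (the feature names are made up for illustration):

```python
import pandas as pd

# Hypothetical answers for a single bug report (1 = feature linked)
toy_row = pd.Series({"feat_a": 1, "feat_b": 0, "feat_c": 1})

# 1-based positions of the columns marked with 1, joined by spaces
toy_ids = " ".join(
    str(toy_row.index.get_loc(col) + 1)
    for col in toy_row.index
    if toy_row[col] == 1
)
print(toy_ids)  # 1 3
```

A bug report with no linked feature yields an empty string.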


### Get Trace Links between Bug Reports and Features 

**Generate AuxDf**

In [None]:
# TODO

In [None]:
matrices_names = [('exp_vol_m', DataOrigin.VOLUNTEERS_AND_EXPERT),
                  ('exp_m', DataOrigin.EXPERT),
                  ('vol_m', DataOrigin.VOLUNTEERS)]

aux_df = pd.DataFrame(columns=['Bug_Number','Features_IDs_exp_vol_m',
                               'Features_IDs_exp_m','Features_IDs_vol_m'])

aux_df['Bug_Number'] = bugreports.Bug_Number

for br_id in volunteers_matrix.index:
    for idx2, br in bugreports.iterrows():
        if br.Bug_Number == br_id:
            for mat_code,d_origin in matrices_names:
                aux_df.at[idx2, 'Features_IDs_'+mat_code] = get_features(br_id, data_origin=d_origin)

aux_df[['Bug_Number','Features_IDs_exp_vol_m','Features_IDs_exp_m','Features_IDs_vol_m']].head()

### Save Aux_Df Dataframe

In [None]:
aux_df.to_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/firefox_v2/aux_data/aux_df.csv')

### Checking Link Condition Function

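The function below decides whether a given cell in the oracle holds a positive link (1) or a negative link (0): a test case is linked to a bug report when the test case's feature ID appears among the bug report's feature IDs.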

In [7]:
def check_link_condition(br, tc, data_origin):
    col_name = ""
    if data_origin == DataOrigin.VOLUNTEERS:
        col_name = "Features_IDs_vol_m"
    elif data_origin == DataOrigin.EXPERT:
        col_name = "Features_IDs_exp_m"
    elif data_origin == DataOrigin.VOLUNTEERS_AND_EXPERT:
        col_name = "Features_IDs_exp_vol_m"
    
    if str(tc.Feature_ID) in br[col_name].split(" "):
        return True
    return False
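
Splitting the ID string before testing membership matters, because a plain substring test would match partial numbers. The sketch below uses hypothetical values to illustrate the difference:

```python
toy_feature_ids = "11 12"  # hypothetical bug report linked to features 11 and 12
toy_tc_feature = 1         # hypothetical test case that belongs to feature 1

# Substring test: "1" occurs inside "11", so this is a false positive
substring_match = str(toy_tc_feature) in toy_feature_ids

# Token test, as in check_link_condition: compares whole IDs only
token_match = str(toy_tc_feature) in toy_feature_ids.split(" ")

print(substring_match, token_match)  # True False
```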

### Generate Oracles

In [8]:
def generate_oracle(data_origin):
    oracle_df = pd.DataFrame(columns=bugreports.br_name, index=testcases.tc_name, data=np.zeros(shape=(len(testcases),len(bugreports))), dtype='int8')
    for idx_1,br in tqdm(bugreports.iterrows()):
        for idx_2,tc in testcases.iterrows():
            if check_link_condition(br, tc, data_origin):
                oracle_df.at[tc.tc_name, br.br_name] = 1
            else:
                oracle_df.at[tc.tc_name, br.br_name] = 0
    
    return oracle_df

oracle_volunteers_df = generate_oracle(DataOrigin.VOLUNTEERS)
oracle_expert_df = generate_oracle(DataOrigin.EXPERT)
oracle_expert_volunteers_df = generate_oracle(DataOrigin.VOLUNTEERS_AND_EXPERT)

print('oracle_volunteers_df.shape: {}'.format(oracle_volunteers_df.shape))
print('oracle_expert_df.shape: {}'.format(oracle_expert_df.shape))
print('oracle_expert_volunteers_df.shape: {}'.format(oracle_expert_volunteers_df.shape))

93it [00:01, 46.01it/s]
93it [00:01, 47.36it/s]
93it [00:01, 48.03it/s]

oracle_volunteers_df.shape: (207, 93)
oracle_expert_df.shape: (207, 93)
oracle_expert_volunteers_df.shape: (207, 93)
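
For a quick sanity check of the linking rule, the same oracle can be built on a small example, filling one bug report column at a time instead of one cell at a time. This is only a sketch under assumed names: `build_toy_oracle`, the toy frames, and the `Features_IDs` column are hypothetical and do not come from the notebook's modules.

```python
import pandas as pd

# Hypothetical toy data following the column names used in this notebook
toy_testcases = pd.DataFrame({"tc_name": ["TC_1", "TC_2", "TC_3"],
                              "Feature_ID": [1, 2, 2]})
toy_bugreports = pd.DataFrame({"br_name": ["BR_1", "BR_2"],
                               "Features_IDs": ["2", "1 2"]})

def build_toy_oracle(testcases, bugreports, ids_col):
    # Start with no links: test cases as rows, bug reports as columns
    oracle = pd.DataFrame(0, index=testcases.tc_name,
                          columns=bugreports.br_name, dtype="int8")
    tc_feature_ids = testcases.Feature_ID.astype(str)
    for _, br in bugreports.iterrows():
        linked = set(br[ids_col].split(" "))
        # A test case is linked when its feature is among the bug report's features
        oracle[br.br_name] = tc_feature_ids.isin(linked).astype("int8").values
    return oracle

toy_oracle = build_toy_oracle(toy_testcases, toy_bugreports, "Features_IDs")
print(toy_oracle)
```

Here `BR_1` is linked only to the two test cases of feature 2, while `BR_2` is linked to all three test cases.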





### Save Oracles

In [10]:
oracle_volunteers_df.to_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/firefox_v2/tc_br/oracle_volunteers.csv')
oracle_expert_df.to_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/firefox_v2/tc_br/oracle_expert.csv')
oracle_expert_volunteers_df.to_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/firefox_v2/tc_br/oracle_expert_volunteers.csv')
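
When the saved oracles are loaded again, the test case names written by `to_csv` come back as an ordinary first column unless `index_col=0` is passed to `pd.read_csv`. The sketch below shows the round trip on a hypothetical 2 x 2 oracle, using an in-memory buffer in place of the real files:

```python
import io
import pandas as pd

# Hypothetical 2 x 2 oracle: test cases as rows, bug reports as columns
toy_saved = pd.DataFrame([[1, 0], [0, 1]],
                         index=pd.Index(["TC_1", "TC_2"], name="tc_name"),
                         columns=["BR_1", "BR_2"])

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy_saved.to_csv(buffer)

# Without index_col, the test case names are read as a regular column
buffer.seek(0)
with_extra_column = pd.read_csv(buffer)

# With index_col=0, the original layout is restored
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(with_extra_column.shape, restored.shape)  # (2, 3) (2, 2)
```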