# Introduction

This notebook loads the **bug reports** and **test cases** datasets, together with the **feature matrices** built from the expert and volunteer responses collected in the PyBossa applications, and creates the **oracle** dataset from them.

This oracle is expected to be more precise than the version created in the previous notebook (__input_proc_fx_p1__), because the trace links are created from the existing relationship between a bug report and a given Firefox feature.
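
To make the linking rule concrete, the sketch below builds a tiny trace matrix in which a test case is linked to a bug report when both point to the same feature. The data and the names `br_to_features`, `tc_to_feature`, and `build_oracle` are hypothetical and only illustrate the idea; they are not taken from the datasets used in this notebook.

```python
import pandas as pd

# Hypothetical toy data: bug report -> set of feature IDs, test case -> feature ID.
br_to_features = {"BR_1_SRC": {6}, "BR_2_SRC": set(), "BR_3_SRC": {1, 6}}
tc_to_feature = {"TC_1_TRG": 1, "TC_2_TRG": 6, "TC_3_TRG": 20}


def build_oracle(br_to_features, tc_to_feature):
    """Return a test-case x bug-report matrix with 1 where the feature is shared."""
    data = {
        br: [int(feat in feats) for feat in tc_to_feature.values()]
        for br, feats in br_to_features.items()
    }
    return pd.DataFrame(data, index=list(tc_to_feature), dtype="int8")


toy_oracle = build_oracle(br_to_features, tc_to_feature)
print(toy_oracle)
```

A bug report with no associated feature (here `BR_2_SRC`) ends up with an all-zero column, which matches how unlinked bug reports are treated further down.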

## Load Libraries and Data

In [8]:
import pandas as pd
import numpy as np
from joblib import Parallel, delayed  # sklearn.externals.joblib was removed from recent scikit-learn releases
from tqdm import tqdm

from utils import aux_functions

testcases = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/TC/testcases_final.csv')
print('Test Cases Shape: {}'.format(testcases.shape))

bugreports = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/BR/bugreports_final.csv')
print('Bug Reports shape: {}'.format(bugreports.shape))

expert_matrix = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/br_feat_recovery_empirical_study/pybossa-apps/recover_taskruns/br_2_feature_matrix_expert.csv')
expert_matrix.set_index('bug_number', inplace=True)
print('Expert Matrix shape: {}'.format(expert_matrix.shape))

#volunteers_matrix = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/br_feat_recovery_empirical_study/pybossa-apps/recover_taskruns/br_2_feature_matrix_volunteers.csv')
#print('Volunteers Matrix shape: {}'.format(volunteers_matrix.shape))

features = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/Features/features.csv')
print('Features shape: {}'.format(features.shape))

Test Cases Shape: (207, 12)
Bug Reports shape: (35314, 18)
Expert Matrix shape: (93, 21)
Features shape: (21, 5)


## EDA - Exploratory Data Analysis

In [9]:
testcases.head()

Unnamed: 0,TC_Number,TestDay,Feature_ID,Firefox_Feature,Gen_Title,Crt_Nr,Title,Preconditions,Steps,Expected_Result,tc_name,tc_desc
0,1,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,1,Notification - Popup Block,,1. Launch Firefox\n2. Navigate to http://www.p...,1. Firefox is successfully launched\n9. The al...,TC_1_TRG,1 20181221 20 <notificationbox> and <notificat...
1,2,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,2,Notification - Process Hang,,"1. Launch Firefox\n2. In the URL bar, navigate...",1. Firefox is successfully launched\n2. Firefo...,TC_2_TRG,2 20181221 20 <notificationbox> and <notificat...
2,3,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,3,Verify Notifications appear in RTL Mode,,"1. Launch Firefox\n2. In about:config, change ...",1. Firefox is successfully launched\n2.The for...,TC_3_TRG,3 20181221 20 <notificationbox> and <notificat...
3,4,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,4,Verify Notifications appear in High Contrast M...,,"1. While the browser is in High Contrast Mode,...",1. Firefox has been launched.\n2. Firefox begi...,TC_4_TRG,4 20181221 20 <notificationbox> and <notificat...
4,5,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,5,Verify notifications react to differing Zoom l...,,"1. While the browser is in High Contrast Mode,...",1. Firefox has been launched.\n2. Firefox begi...,TC_5_TRG,5 20181221 20 <notificationbox> and <notificat...


In [10]:
bugreports.head()

Unnamed: 0,Bug_Number,Summary,Platform,Component,Version,Creation_Time,Whiteboard,QA_Whiteboard,First_Comment_Text,First_Comment_Creation_Time,Status,Product,Priority,Resolution,Severity,Is_Confirmed,br_name,br_desc
0,506297,Livemarks with null site/feed uris cause sync ...,All,Sync,unspecified,2009-07-24T17:08:43Z,,,2009-07-24 09:54:28 FaultTolerance D...,2009-07-24T17:08:43Z,RESOLVED,Firefox,--,FIXED,normal,True,BR_506297_SRC,506297 Livemarks with null site/feed uris caus...
1,506338,Enhance Crash Recovery to better help the user,All,Session Restore,Trunk,2009-07-24T19:17:21Z,[crashkill][crashkill-metrics],,When our users crash they are pretty much in t...,2009-07-24T19:17:21Z,NEW,Firefox,--,,enhancement,True,BR_506338_SRC,506338 Enhance Crash Recovery to better help t...
2,506507,Dragging multiple bookmarks in the bookmarks s...,x86,Bookmarks & History,Trunk,2009-07-26T06:16:02Z,,,User-Agent: Mozilla/5.0 (Windows; U; Win...,2009-07-26T06:16:02Z,RESOLVED,Firefox,--,WORKSFORME,normal,True,BR_506507_SRC,506507 Dragging multiple bookmarks in the book...
3,506550,Unreliable Back Button navigating nytimes.com,x86,Extension Compatibility,3.5 Branch,2009-07-26T16:12:10Z,[caused by adblock plus][platform-rel-NYTimes],,User-Agent: Mozilla/5.0 (Windows; U; Win...,2009-07-26T16:12:10Z,RESOLVED,Firefox,--,FIXED,normal,False,BR_506550_SRC,506550 Unreliable Back Button navigating nytim...
4,506575,ALT + F4 when dropdown of autocomplete is open...,x86,Address Bar,3.5 Branch,2009-07-26T20:14:54Z,,,Pressing ALT + F4 when the autocomplete dropdo...,2009-07-26T20:14:54Z,NEW,Firefox,P5,,normal,True,BR_506575_SRC,506575 ALT + F4 when dropdown of autocomplete ...


In [11]:
features.head()

Unnamed: 0,Feature_Number,Firefox_Version,Firefox_Feature,Feature_Description,Reference
0,1,48 Branch + 50 Branch,New Awesome Bar,The Firefox address bar displays a page's web ...,https://support.mozilla.org/en-US/kb/awesome-b...
1,2,48 Branch,Windows Child Mode,Child mode is a feature of Windows that allows...,https://wiki.mozilla.org/QA/Windows_Child_Mode
2,3,48 Branch,APZ - Async Scrolling,The Async Pan/Zoom module (APZ) is a platform ...,https://wiki.mozilla.org/Platform/GFX/APZ
3,4,49 Branch,Browser Customization,. Install and Customize Firefox Themes. \n. Cu...,https://support.mozilla.org/en-US/kb/use-theme...
4,5,49 Branch,PDF Viewer,"Zoom in, Zoom out, Print and Save PDF Files.",https://support.mozilla.org/en-US/kb/view-pdf-...


In [12]:
expert_matrix.head()

bug_number,new_awesome_bar,windows_child_mode,apz_async_scrolling,browser_customization,pdf_viewer,context_menu,w10_comp,tts_in_desktop,tts_in_rm,webgl_comp,...,pointer_lock_api,webm_eme,zoom_indicator,downloads_dropmaker,webgl2,flac_support,indicator_device_perm,flash_support,notificationbox,update_directory
1181835,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1248267,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1248268,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1257087,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1264988,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Bug Reports Selection

In [7]:
brs_versions = ['48 Branch', '49 Branch', '50 Branch', '51 Branch']
brs_status = ['RESOLVED','VERIFIED']
brs_priority = ['P1', 'P2', 'P3']
brs_resolutions = ['FIXED']
brs_severities = ['major', 'normal', 'blocker', 'critical']
brs_isconfirmed = [True]
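# Keep only confirmed, fixed bug reports of the targeted versions, priorities and severities.
# .copy() lets columns be added later without a SettingWithCopyWarning.
selected_bugs = bugreports[(bugreports.Version.isin(brs_versions)) &
                           (bugreports.Status.isin(brs_status)) &
                           (bugreports.Priority.isin(brs_priority)) &
                           (bugreports.Resolution.isin(brs_resolutions)) &
                           (bugreports.Severity.isin(brs_severities)) &
                           (bugreports.Is_Confirmed.isin(brs_isconfirmed))
                          ].copy()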
print(selected_bugs.shape)

(93, 18)


## Oracle

#### Add Feature_Name column to features dataset

In [25]:
features['Feature_Name'] = ['new_awesome_bar',
            'windows_child_mode',
            'apz_async_scrolling',
            'browser_customization',
            'pdf_viewer',
            'context_menu',
            'w10_comp', 
            'tts_in_desktop', 
            'tts_in_rm', 
            'webgl_comp',
            'video_and_canvas_render', 
            'pointer_lock_api',
            'webm_eme', 
            'zoom_indicator',
            'downloads_dropmaker',
            'webgl2', 
            'flac_support', 
            'indicator_device_perm',
            'flash_support',  
            'notificationbox',          
            'update_directory']
features.head()

Unnamed: 0,Feature_Number,Firefox_Version,Firefox_Feature,Feature_Description,Reference,Feature_Name
0,1,48 Branch + 50 Branch,New Awesome Bar,The Firefox address bar displays a page's web ...,https://support.mozilla.org/en-US/kb/awesome-b...,new_awesome_bar
1,2,48 Branch,Windows Child Mode,Child mode is a feature of Windows that allows...,https://wiki.mozilla.org/QA/Windows_Child_Mode,windows_child_mode
2,3,48 Branch,APZ - Async Scrolling,The Async Pan/Zoom module (APZ) is a platform ...,https://wiki.mozilla.org/Platform/GFX/APZ,apz_async_scrolling
3,4,49 Branch,Browser Customization,. Install and Customize Firefox Themes. \n. Cu...,https://support.mozilla.org/en-US/kb/use-theme...,browser_customization
4,5,49 Branch,PDF Viewer,"Zoom in, Zoom out, Print and Save PDF Files.",https://support.mozilla.org/en-US/kb/view-pdf-...,pdf_viewer


In [35]:
display(expert_matrix.head())
expert_matrix.iloc[0,5]

bug_number,new_awesome_bar,windows_child_mode,apz_async_scrolling,browser_customization,pdf_viewer,context_menu,w10_comp,tts_in_desktop,tts_in_rm,webgl_comp,...,pointer_lock_api,webm_eme,zoom_indicator,downloads_dropmaker,webgl2,flac_support,indicator_device_perm,flash_support,notificationbox,update_directory
1181835,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1248267,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1248268,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1257087,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1264988,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


1

In [41]:
def get_features_from_expert_matrix(br_id):
    # Collect the column positions of the features the expert linked to this bug report.
    # NOTE: get_loc returns a 0-based position, whereas Feature_ID in the test cases
    # appears to be 1-based; check the offset before comparing the two.
    features_ids = [str(expert_matrix.columns.get_loc(col))
                    for col in expert_matrix.columns
                    if expert_matrix.at[br_id, col] == 1]
    return " ".join(features_ids)

selected_bugs['Features_IDs'] = ""
for br_id in expert_matrix.index:
    for idx2, br in selected_bugs.iterrows():
        if br.Bug_Number == br_id:
            selected_bugs.at[idx2, 'Features_IDs'] = get_features_from_expert_matrix(br_id)

selected_bugs[['Bug_Number','Features_IDs']].head()

Unnamed: 0,Bug_Number,Features_IDs
9192,1181835,5.0
10621,1248267,
10622,1248268,
10904,1257087,0.0
11150,1264988,
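
As an illustration of the step above, the following sketch derives the space-separated feature positions for every row of a binary matrix in one pass, without the nested loop over bug reports. The matrix `toy_matrix` and the helper `feature_positions` are hypothetical stand-ins, not part of the notebook.

```python
import pandas as pd

# Hypothetical 3x4 binary matrix standing in for expert_matrix.
toy_matrix = pd.DataFrame(
    [[0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 1]],
    index=pd.Index([101, 102, 103], name="bug_number"),
    columns=["feat_a", "feat_b", "feat_c", "feat_d"],
)


def feature_positions(row):
    """Join the 0-based positions of the columns whose value is 1."""
    return " ".join(str(pos) for pos, flag in enumerate(row) if flag == 1)


# One string per bug report; rows without any feature yield an empty string.
features_ids = toy_matrix.apply(feature_positions, axis=1)
print(features_ids.to_dict())
```

Under these assumptions, the resulting Series is indexed by bug number, so it could be attached to a bug report table with something like `Series.map` on the bug number column.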


In [43]:
def check_link_condition(br, tc):
    # A test case is linked to a bug report when its feature is among the bug report's features.
    return str(tc.Feature_ID) in br.Features_IDs.split(" ")

oracle_df = pd.DataFrame(columns=bugreports.br_name, index=testcases.tc_name, data=np.zeros(shape=(len(testcases),len(bugreports))), dtype='int8')
for idx_1,br in tqdm(selected_bugs.iterrows()):
    for idx_2,tc in testcases.iterrows():
        if check_link_condition(br, tc):
            oracle_df.at[tc.tc_name, br.br_name] = 1
        else:
            oracle_df.at[tc.tc_name, br.br_name] = 0

oracle_df.to_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/trace_matrix_final_emp_study.csv')

93it [00:01, 47.58it/s]


In [45]:
oracle_df.shape
display(oracle_df.head())

tc_name,BR_506297_SRC,BR_506338_SRC,BR_506507_SRC,BR_506550_SRC,BR_506575_SRC,BR_506729_SRC,BR_506768_SRC,BR_506795_SRC,BR_506820_SRC,BR_506831_SRC,...,BR_1516270_SRC,BR_1516329_SRC,BR_1516358_SRC,BR_1516416_SRC,BR_1516505_SRC,BR_1516547_SRC,BR_1516582_SRC,BR_1516749_SRC,BR_1516792_SRC,BR_1516895_SRC
TC_1_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
TC_2_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
TC_3_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
TC_4_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
TC_5_TRG,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Oracle Functions and Auxiliary Variables

In [8]:
list_fversion_to_testday = [('48 Branch','20160603'),('48 Branch','20160624'),('48 Branch','20160708'),
                            ('49 Branch','20160722'),('49 Branch','20160812'),('49 Branch','20160826'),
                            ('50 Branch','20160909'),('50 Branch','20160930'),('50 Branch','20161014'),
                            ('51 Branch','20161028'),('51 Branch','20161125'),('51 Branch','20170106')]

NUMBER_SUBSETS = 7

def check_link_condition(br, tc):
    for tup in [(br['Version'],tday) for tday in tc['TestDay'].split(' + ')]:
        if tup in list_fversion_to_testday:
            return True
    return False


def create_links(idx, tc_df, br_df):
    oracle_df = pd.DataFrame(columns=br_df.br_name, index=tc_df.tc_name, data=np.zeros(shape=(len(tc_df),len(br_df))), dtype='int8')
    for idx_1,br in tqdm(br_df.iterrows()):
        for idx_2,tc in tc_df.iterrows():
            if check_link_condition(br, tc):
                oracle_df.at[tc.tc_name, br.br_name] = 1
            else:
                oracle_df.at[tc.tc_name, br.br_name] = 0
    
    oracle_df.to_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/part/trace_matrix_{}.csv'.format(idx))

def create_br_dfs_list():
    list_br_dfs = []
    for i in range(0, 35315, 5045):   # 35315 / 5045 == NUMBER_SUBSETS
        list_br_dfs.append(bugreports.iloc[i:i+5045,:])  # bugreports is the frame loaded from bugreports_final.csv
    return list_br_dfs

def create_tc_dfs_list():
    return [testcases.copy() for i in range(NUMBER_SUBSETS)]
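
The chunking done by `create_br_dfs_list` can also be expressed without hard-coded row counts. The sketch below is one possible way to do it, assuming only that the rows should be split into a fixed number of consecutive blocks; `NUM_PARTS`, `split_rows`, and `toy_df` are hypothetical names, and the block sizes it produces are near-equal rather than the fixed 5045-row blocks used above.

```python
import numpy as np
import pandas as pd

NUM_PARTS = 7  # hypothetical stand-in for NUMBER_SUBSETS


def split_rows(df, num_parts):
    """Split df into num_parts consecutive row blocks of near-equal size."""
    bounds = np.linspace(0, len(df), num_parts + 1).astype(int)
    return [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


toy_df = pd.DataFrame({"br_name": ["BR_{}_SRC".format(i) for i in range(20)]})
parts = split_rows(toy_df, NUM_PARTS)
print([len(part) for part in parts])
```

Because the boundaries are derived from `len(df)`, the split keeps covering every row if the number of bug reports changes.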

#### Create Small Size Oracle

In [9]:
br_aux = bugreports[(bugreports.Version == '50 Branch') | (bugreports.Version == '60 Branch')].sample(15, random_state=42)
tc_aux = testcases[(testcases.TestDay.str.contains('20161014')) | (testcases.TestDay.str.contains('20161028'))].sample(10, random_state=1000)

br_aux[br_aux.Version == '50 Branch'].loc[:, ['Bug_Number','Version']].head(100)

Unnamed: 0,Bug_Number,Version
14902,1319983,50 Branch
14763,1318407,50 Branch
12484,1287109,50 Branch
14981,1320548,50 Branch
15367,1325288,50 Branch
12059,1280856,50 Branch


In [10]:
tc_aux[tc_aux.TestDay.str.contains('20161014')].loc[:,['TC_Number','TestDay']].head(100)

Unnamed: 0,TC_Number,TestDay
18,19,20160603 + 20160624 + 20161014
15,16,20160603 + 20160624 + 20161014
14,15,20160603 + 20160624 + 20161014


In [11]:
create_links('small', tc_aux, br_aux)

small_orc = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/part/trace_matrix_small.csv')
aux_functions.highlight_df(small_orc)

15it [00:00, 504.96it/s]


Unnamed: 0,tc_name,BR_1441532_SRC,BR_1319983_SRC,BR_1443343_SRC,BR_1464815_SRC,BR_1318407_SRC,BR_1468122_SRC,BR_1445895_SRC,BR_1459431_SRC,BR_1287109_SRC,BR_1320548_SRC,BR_1469153_SRC,BR_1325288_SRC,BR_1463768_SRC,BR_1469753_SRC,BR_1280856_SRC
0,TC_165_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,TC_19_TRG,0,1,0,0,1,0,0,0,1,1,0,1,0,0,1
2,TC_152_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,TC_16_TRG,0,1,0,0,1,0,0,0,1,1,0,1,0,0,1
4,TC_160_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,TC_169_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,TC_145_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,TC_15_TRG,0,1,0,0,1,0,0,0,1,1,0,1,0,0,1
8,TC_149_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,TC_167_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Create Entire Oracle

In [12]:
tasks = [(idx,tc_df, br_df) for idx,(tc_df,br_df) in enumerate(zip(create_tc_dfs_list(),create_br_dfs_list()))]
results = Parallel(n_jobs=7, verbose=3)(delayed(create_links)(idx,tc_df,br_df) for idx,tc_df,br_df in tasks)

[Parallel(n_jobs=7)]: Using backend LokyBackend with 7 concurrent workers.
[Parallel(n_jobs=7)]: Done   3 out of   7 | elapsed:  4.0min remaining:  5.3min
[Parallel(n_jobs=7)]: Done   7 out of   7 | elapsed:  4.1min finished


#### Analyze Oracle Parts Created

In [13]:
oo_df_2 = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/part/trace_matrix_2.csv')
oo_df_2.set_index('tc_name', inplace=True)

print(oo_df_2.loc['TC_15_TRG', 'BR_1319983_SRC'])
print(oo_df_2.loc['TC_16_TRG', 'BR_1319983_SRC'])
print(oo_df_2.loc['TC_19_TRG', 'BR_1319983_SRC'])

1
1
1


In [14]:
oo_dfs = []
for i in range(NUMBER_SUBSETS):
    df = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/part/trace_matrix_{}.csv'.format(i))
    oo_dfs.append(df)
    print(df.shape)

(207, 5046)
(207, 5046)
(207, 5046)
(207, 5046)
(207, 5046)
(207, 5046)
(207, 5045)


#### Join Oracle Parts

In [15]:
oo_df = pd.DataFrame(index=testcases.tc_name, dtype='int8')
for i in range(NUMBER_SUBSETS):
    aux_df = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/part/trace_matrix_{}.csv'.format(i))
    aux_df.set_index('tc_name', inplace=True)
    oo_df = oo_df.join(aux_df)

print(oo_df.shape)
print(oo_df.info())

(207, 35314)
<class 'pandas.core.frame.DataFrame'>
Index: 207 entries, TC_1_TRG to TC_207_TRG
Columns: 35314 entries, BR_506297_SRC to BR_1516895_SRC
dtypes: int64(35314)
memory usage: 55.8+ MB
None
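
The `info()` output above reports `int64` columns, since the parts are re-read from CSV and the `int8` dtype of the empty starting frame does not carry over. A minimal sketch of joining the parts column-wise and downcasting afterwards is shown below; the frames `part_0` and `part_1` are hypothetical and stand in for the partial trace matrices.

```python
import pandas as pd

# Hypothetical parts: same test-case index, disjoint bug-report columns.
index = pd.Index(["TC_1_TRG", "TC_2_TRG"], name="tc_name")
part_0 = pd.DataFrame({"BR_1_SRC": [0, 1]}, index=index)
part_1 = pd.DataFrame({"BR_2_SRC": [1, 0], "BR_3_SRC": [0, 0]}, index=index)

# Concatenate column-wise, then downcast the 0/1 values to int8.
joined = pd.concat([part_0, part_1], axis=1).astype("int8")
print(joined.shape, joined.dtypes.unique())
```

For a 0/1 matrix, `int8` uses one byte per cell instead of eight, which may matter at the size of the full oracle.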


## Save DataFrames

In [16]:
oo_df.to_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/trace_matrix_final.csv')
bugreports.to_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/BR/bugreports_final.csv', index=False)
testcases.to_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/TC/testcases_final.csv', index=False)

# -----

#### Checking Values [0]

#### Analyze Entire Oracle Created

In [17]:
oo_df_full = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/trace_matrix_final.csv')
oo_df_full.set_index('tc_name', inplace=True)

print(oo_df_full.loc['TC_15_TRG', 'BR_1319983_SRC'])
print(oo_df_full.loc['TC_16_TRG', 'BR_1319983_SRC'])
print(oo_df_full.loc['TC_19_TRG', 'BR_1319983_SRC'])

1
1
1


### -----

#### Checking Values [1]
Mapping between Firefox versions and test days.

In [18]:
ck_df = pd.DataFrame(columns=['testday','f_version','features_released','testcases_list'])
ck_df.testday = ['20160603', '20160624', '20160708', 
                 '20160722', '20160812', '20160826', 
                 '20160909', '20160930', '20161014', 
                 '20161028', '20161125', '20170106']
ck_df.f_version = ['48 Branch', '48 Branch', '48 Branch', 
                  '49 Branch', '49 Branch', '49 Branch', 
                  '50 Branch', '50 Branch', '50 Branch', 
                  '51 Branch', '51 Branch', '51 Branch' ]
ck_df.features_released = [
    "Awesome Bar Search, Awesome Bar Icons - Left, Awesome Bar Icons - Right",
    "Awesome Bar Search, Awesome Bar Icons - Left, Awesome Bar Icons - Right",
    "apz, Scrolling using different devices (wired mouse, wireless mouse, trackpad/touchpad) - where available devices",
    'context menu - exploratory testing, context menu - full functional testing, pdf viewer, browser customization',
    'windows 10 compatibility, text to speech in reader mode, text to speech on desktop',
    'webgl compatibility, exploratory testing',
    '',
    'Pointer Lock API, WebM EME support for Widevine',
    'New Awesome Bar',
    'Zoom indicator, Downloads dropmaker',
    'WebGL2,  FLAC support,  Indicator for device permissions,  Zoom Indicator',
    'WebGL2, Zoom Indicator, Flash support']

ck_df.testcases_list = ""

included = []
for i,tc in testcases.iterrows():
    for j,row in ck_df.iterrows(): 
        if row['testday'] in tc['TestDay']:
            if ck_df.at[j,'testcases_list'] == "":
                ck_df.at[j,'testcases_list'] = str(tc.TC_Number)
            else:
                ck_df.at[j,'testcases_list'] = ck_df.at[j,'testcases_list'] + " " + str(tc.TC_Number)
            if tc.TC_Number not in included:
                included.append(tc.TC_Number)

ck_df.head(20)

Unnamed: 0,testday,f_version,features_released,testcases_list
0,20160603,48 Branch,"Awesome Bar Search, Awesome Bar Icons - Left, ...",13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 2...
1,20160624,48 Branch,"Awesome Bar Search, Awesome Bar Icons - Left, ...",13 14 15 16 17 18 19 20 21 22 23 24 25
2,20160708,48 Branch,"apz, Scrolling using different devices (wired ...",37 38 39 40 41 42 43 44 45 46 47
3,20160722,49 Branch,"context menu - exploratory testing, context me...",59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 7...
4,20160812,49 Branch,"windows 10 compatibility, text to speech in re...",104 105 106 107 108 109 110 111 112 113 114 11...
5,20160826,49 Branch,"webgl compatibility, exploratory testing",120 121 122 123 124
6,20160909,50 Branch,,
7,20160930,50 Branch,"Pointer Lock API, WebM EME support for Widevine",125 126 127 128 129 130 131 132 133 134 135 13...
8,20161014,50 Branch,New Awesome Bar,13 14 15 16 17 18 19 20 21
9,20161028,51 Branch,"Zoom indicator, Downloads dropmaker",142 143 144 145 146 147 148 149 150 151 152 15...


In [19]:
ck_df.to_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/TD_2_FVersion/testday_to_fversion.csv')

#### Analyze the Number of Distinct Test Cases per Firefox Version

In [20]:
testcases.TestDay.value_counts()

20160722                          45
20160603                          22
20161125                          22
20161028                          18
20160930                          17
20160812                          16
20161125 + 20170106               13
20181221                          12
20160603 + 20160708               11
20161028 + 20161125 + 20170106    11
20160603 + 20160624 + 20161014     9
20160826                           5
20160603 + 20160624                4
20170106                           2
Name: TestDay, dtype: int64

In [21]:
testcases.TestDay.value_counts().sum()

207

In [22]:
b48_0 = ck_df.loc[0].testcases_list.split(' ')
b48_1 = ck_df.loc[1].testcases_list.split(' ')
b48_2 = ck_df.loc[2].testcases_list.split(' ')
b48 = b48_0 + b48_1 + b48_2
print('Amount Different Test Cases - 48 Branch: {}'.format(len(set(b48))))

b49_0 = ck_df.loc[3].testcases_list.split(' ')
b49_1 = ck_df.loc[4].testcases_list.split(' ')
b49_2 = ck_df.loc[5].testcases_list.split(' ')
b49 = b49_0 + b49_1 + b49_2
print('Amount Different Test Cases - 49 Branch: {}'.format(len(set(b49))))

#b50_0 = ck_df.loc[6].testcases_list.split(' ')
b50_1 = ck_df.loc[7].testcases_list.split(' ')
b50_2 = ck_df.loc[8].testcases_list.split(' ')
b50 = b50_1 + b50_2
print('Amount Different Test Cases - 50 Branch: {}'.format(len(set(b50))))

b51_0 = ck_df.loc[9].testcases_list.split(' ')
b51_1 = ck_df.loc[10].testcases_list.split(' ')
b51_2 = ck_df.loc[11].testcases_list.split(' ')
b51 = b51_0 + b51_1 + b51_2
print('Amount Different Test Cases - 51 Branch: {}'.format(len(set(b51))))

print('Total Amount TCs Sets Union (len(set(b48) | set(b49) | set(b50) | set(b51))): {}'.format(len(set(b48) | set(b49) | set(b50) | set(b51))))

Amount Different Test Cases - 48 Branch: 46
Amount Different Test Cases - 49 Branch: 66
Amount Different Test Cases - 50 Branch: 26
Amount Different Test Cases - 51 Branch: 66
Total Amount TCs Sets Union (len(set(b48) | set(b49) | set(b50) | set(b51))): 195
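
The per-branch counts above can also be obtained by grouping test cases into one set per Firefox version. The sketch below illustrates this on made-up data; `testday_to_version`, `toy_testcases`, and `tcs_by_version` are hypothetical names, and test days without a known version are simply skipped.

```python
from collections import defaultdict

# Hypothetical mapping from test day to Firefox version.
testday_to_version = {
    "20160603": "48 Branch",
    "20160624": "48 Branch",
    "20161014": "50 Branch",
}
# Hypothetical (TC_Number, TestDay) pairs in the same " + " separated format.
toy_testcases = [
    (13, "20160603 + 20160624 + 20161014"),
    (22, "20160603"),
    (200, "20181221"),
]

tcs_by_version = defaultdict(set)
for tc_number, testdays in toy_testcases:
    for day in testdays.split(" + "):
        version = testday_to_version.get(day)
        if version is not None:
            tcs_by_version[version].add(tc_number)

counts = {version: len(tcs) for version, tcs in tcs_by_version.items()}
print(counts)
```

Using sets means a test case that ran on several test days of the same version is counted once.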


### ---------

#### Checking Values [2]

Check the number of positive and negative links in the oracle.

In [23]:
oracle = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/trace_matrix_final.csv')
oracle.set_index('tc_name', inplace=True, drop=True)

print(oracle.shape)

(207, 35314)


In [24]:
num_pos = 0
num_neg = 0

for i in range(len(testcases)):
    counts = oracle.iloc[i, :].value_counts()
    # A row may contain only zeros (or only ones), so default a missing count to 0.
    num_neg = num_neg + counts.get(0, 0)
    num_pos = num_pos + counts.get(1, 0)
    
print('Num Positive Links: {}'.format(num_pos))
print('Num Negative Links: {}'.format(num_neg))

Num Positive Links: 86144
Num Negative Links: 7223854
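
The same totals can be computed without looping over rows. The sketch below counts the ones and zeros of a small matrix directly on the underlying array; `toy_oracle` is a hypothetical stand-in for the full oracle.

```python
import pandas as pd

# Hypothetical 2x3 trace matrix standing in for the oracle.
toy_oracle = pd.DataFrame(
    [[0, 1, 0], [0, 0, 0]],
    index=pd.Index(["TC_1_TRG", "TC_2_TRG"], name="tc_name"),
    columns=["BR_1_SRC", "BR_2_SRC", "BR_3_SRC"],
)

values = toy_oracle.to_numpy()
num_pos = int((values == 1).sum())  # cells marking a trace link
num_neg = int((values == 0).sum())  # cells without a trace link

print("Num Positive Links: {}".format(num_pos))
print("Num Negative Links: {}".format(num_neg))
```

A quick sanity check is that both counts add up to the number of cells, that is, rows times columns.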


### ---------

#### Checking Values [3]

Check a subset of the oracle.

In [25]:
oracle = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/oracle/output/trace_matrix_final.csv')
oracle.set_index('tc_name', inplace=True, drop=True)

print(oracle.shape)

(207, 35314)


In [26]:
bugreports_subset_df = bugreports[(bugreports.Version == '50 Branch') | (bugreports.Version == '60 Branch')].sample(15, random_state=42)
bugreports_subset_df[bugreports_subset_df.Version == '50 Branch'].loc[:, ['Bug_Number','Version']].head(100)

Unnamed: 0,Bug_Number,Version
14902,1319983,50 Branch
14763,1318407,50 Branch
12484,1287109,50 Branch
14981,1320548,50 Branch
15367,1325288,50 Branch
12059,1280856,50 Branch


In [27]:
testcases_subset_df = testcases[(testcases.TestDay.str.contains('20161014')) | (testcases.TestDay.str.contains('20161028'))].sample(10, random_state=1000)

selected_testcases = ['TC_{}_TRG'.format(tc_num) for tc_num in [13,14,15,16,17,18,19,20,21]]  # should link with 50 Branch
aux_tc = testcases[testcases.tc_name.isin(selected_testcases)]

tc_subset_df = pd.concat([testcases_subset_df, aux_tc])  # DataFrame.append was removed in pandas 2.0
tc_subset_df.drop_duplicates(inplace=True)

tc_subset_df[tc_subset_df.TestDay.str.contains('20161014')].loc[:,['TC_Number','TestDay']].head(100)

Unnamed: 0,TC_Number,TestDay
18,19,20160603 + 20160624 + 20161014
15,16,20160603 + 20160624 + 20161014
14,15,20160603 + 20160624 + 20161014
12,13,20160603 + 20160624 + 20161014
13,14,20160603 + 20160624 + 20161014
16,17,20160603 + 20160624 + 20161014
17,18,20160603 + 20160624 + 20161014
19,20,20160603 + 20160624 + 20161014
20,21,20160603 + 20160624 + 20161014


In [28]:
testcases_names_subset = tc_subset_df.tc_name
bug_reports_names_subset = bugreports_subset_df.br_name
orc_subset_df = oracle.loc[testcases_names_subset, bug_reports_names_subset]

print('TestCases Subset Shape: {}'.format(tc_subset_df.shape))
print('BugReports Subset Shape: {}'.format(bugreports_subset_df.shape))
print('Oracle Subset Shape: {}'.format(orc_subset_df.shape))

TestCases Subset Shape: (16, 12)
BugReports Subset Shape: (15, 18)
Oracle Subset Shape: (16, 15)


In [29]:
aux_functions.highlight_df(orc_subset_df)

tc_name,BR_1441532_SRC,BR_1319983_SRC,BR_1443343_SRC,BR_1464815_SRC,BR_1318407_SRC,BR_1468122_SRC,BR_1445895_SRC,BR_1459431_SRC,BR_1287109_SRC,BR_1320548_SRC,BR_1469153_SRC,BR_1325288_SRC,BR_1463768_SRC,BR_1469753_SRC,BR_1280856_SRC
TC_165_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
TC_19_TRG,0,1,0,0,1,0,0,0,1,1,0,1,0,0,1
TC_152_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
TC_16_TRG,0,1,0,0,1,0,0,0,1,1,0,1,0,0,1
TC_160_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
TC_169_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
TC_145_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
TC_15_TRG,0,1,0,0,1,0,0,0,1,1,0,1,0,0,1
TC_149_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
TC_167_TRG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
