# FlakyPaper

## Business Understanding

Whenever new code is written to develop or update software, a Web page, or an app, it must be tested throughout the development process to ensure that the application does what it is supposed to do when it is released for use. Logically, when subjected to the same test over and over again, the code will produce the same result: the application will either work correctly each time, thus passing the test, or it will not work correctly each time, thus failing the test.

However, seemingly at random, occasionally the same test will produce different results on the same codebase. Sometimes it will show that the code passed the test and the application worked as planned, and sometimes it will show that the code failed the test and did not work as planned. When this happens, the test is considered flaky.

Flaky can be caused by several factors:
1. a problem with the code just written
2. a problem with the test itself
3. some external factors that compromise the test results.

It is not always easy to detect such tests, it may happen that we run a test 10000 times and always have the same result, but if we run the test one more time we will have a different result. The purpose of "FlakyPaper" is...

## Data Engineering

There are two datasets to preprocess. The first is "DatasetGenerale.csv".

In [1]:
import pandas
import os

DATASET_NAME = "DatasetGenerale2.csv"

def loading_data_set(dataset_name):
    current_directory = os.getcwd()
    csv_path = os.path.join(current_directory,dataset_name)
    return pandas.read_csv(csv_path)

dataset = loading_data_set(DATASET_NAME)

### Data Understanding

In [2]:
dataset.head()

Unnamed: 0,idProject,nameProject,testCase,tloc,tmcCabe,assertionDensity,assertionRoulette,mysteryGuest,eagerTest,sensitiveEquality,...,mpc,halsteadVocabulary,halsteadLength,halsteadVolume,classDataShouldBePrivate,complexClass,functionalDecomposition,godClass,spaghettiCode,isFlaky
0,1,Activiti,org.activiti.runtime.api.model.impl.APITaskCan...,0.157329,0.017481,0.003885,0.017481,0.0,0.017481,0.0,...,0.034962,0.419544,0.314658,1.0,0.0,0.0,0.017481,0.0,0.0,0
1,1,Activiti,org.activiti.runtime.api.model.impl.APITaskCan...,0.157329,0.017481,0.003885,0.017481,0.0,0.017481,0.0,...,0.034962,0.419544,0.314658,1.0,0.0,0.0,0.017481,0.0,0.0,0
2,1,Activiti,org.activiti.runtime.api.model.impl.APITaskCon...,0.007551,0.003776,0.0,0.0,0.0,0.0,0.0,...,0.105719,0.558802,0.200112,1.0,0.0,0.0,0.007551,0.0,0.0,0
3,1,Activiti,org.activiti.runtime.api.model.impl.APITaskCon...,0.007551,0.003776,0.0,0.0,0.0,0.0,0.0,...,0.105719,0.558802,0.200112,1.0,0.0,0.0,0.007551,0.0,0.0,0
4,1,Activiti,org.activiti.runtime.api.model.impl.APITaskCon...,0.007551,0.003776,0.0,0.0,0.0,0.0,0.0,...,0.105719,0.558802,0.200112,1.0,0.0,0.0,0.007551,0.0,0.0,0


In [3]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 224826 entries, 0 to 224825
Data columns (total 30 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   idProject                 224826 non-null  int64  
 1   nameProject               224826 non-null  object 
 2   testCase                  224826 non-null  object 
 3   tloc                      224826 non-null  float64
 4   tmcCabe                   224826 non-null  float64
 5   assertionDensity          224826 non-null  float64
 6   assertionRoulette         224826 non-null  float64
 7   mysteryGuest              224826 non-null  float64
 8   eagerTest                 224826 non-null  float64
 9   sensitiveEquality         224826 non-null  float64
 10  resourceOptimism          224826 non-null  float64
 11  conditionalTestLogic      224826 non-null  float64
 12  fireAndForget             224826 non-null  float64
 13  testRunWar                224826 non-null  f

| Features                 | Descrizione                                                                                                        |
|--------------------------|--------------------------------------------------------------------------------------------------------------------|
| Id                       | Project id                                                                                                         |
| NameProject              | Project name                                                                                                       |
| TestCase                 | Test case in exam                                                                                                  |
| tloc                     | Number of lines of code in a test suite                                                                          |
| tmcCabe                  | Sum of cyclomatic complexities of all methods of a class                                                           |
| assertionDensity         | Percentage of assercions in a test suite                                                                           |
| assertionRoulette        | Metric indicating whether the test has more than one undocumented assertion                                        |
| mysteryGuest             | Metric indicating whether the test uses an external resource (e.g., database,file ...)                             |
| eagerTest                | Metric that indicates whether a test invokes several methods of the production object.                             |
| sensitiveEquality        | Indicates whether the toString method and utilizzado in the test                                                   |
| resourceOptimism         | Method that makes optimistic assumptions about the existence of a resource (e.g., file) used within it             |
| conditionalTestLogic     | The test has conditional logic in it that does different things depending on the current environment.              |
| fireAndForget            | Test ending prematurely as it does not wait for responses from external calls                                      |
| loc                      | lines of code including comments                                                                                   |
| lcom2                    | Modified lcom1                                                                                                     |
| lcom5                    | Modified lcom1                                                                                                     |
| cbo                      | Number of dependencies of a class with other classes                                                                |
| wmc                      | Sum of the cyclomatic complexities of all methods of a class                                               |
| rfc                      | Number of methods (including inherited methods) that can be called from other classes                           |
| mpc                      |  numbers of messages passing among objects of the class                                                                                                                  |
| halsteadVocabulary       | Gaussian                                                                                                           |
| halsteadLength           | Total number of distinct operators and operands a function                                                     |
| halsteadVolume           | Memory (in bits) required to store the program                                                           |
| classDataShouldBePrivate | Class that exposes its attributes, violating the principle of information hiding.                                 |
| complexClass             | Cyclomatic complexity of a class, or the number of linearly independent paths within the class |
| functionalDecomposition  | Metric that indicates whether inheritance and polymorphism are used incorrectly in a class.                 |
| godClass                 | Large class size implementing several responsibilities                                                  |
| spaghettiCode            | Class does not possess a consistent structure e.g. an excessively long method that has no parameters   |
| isFlaky                  | Boolean indicating whether the test is flaky or not                                                                   |

In [4]:
dataset.describe(include="all")

Unnamed: 0,idProject,nameProject,testCase,tloc,tmcCabe,assertionDensity,assertionRoulette,mysteryGuest,eagerTest,sensitiveEquality,...,mpc,halsteadVocabulary,halsteadLength,halsteadVolume,classDataShouldBePrivate,complexClass,functionalDecomposition,godClass,spaghettiCode,isFlaky
count,224826.0,224826,224826,224826.0,224826.0,224826.0,224826.0,224826.0,224826.0,224826.0,...,224826.0,224826.0,224826.0,224826.0,224826.0,224826.0,224826.0,224826.0,224826.0,224826.0
unique,,207,201277,,,,,,,,...,,,,,,,,,,
top,,guava,org.apache.flink.cep.nfa.NFAITCase.filter,,,,,,,,...,,,,,,,,,,
freq,,14587,151,,,,,,,,...,,,,,,,,,,
mean,44.508887,,,0.011851,0.00267,0.000147,0.001519,0.000393,0.001034,6.8e-05,...,0.025684,0.617342,0.14997,0.981905,0.000179,0.002192,0.000511,0.002704,0.007061,0.007548
std,24.703146,,,0.032441,0.006041,0.000675,0.007648,0.004093,0.00258,0.000934,...,0.028276,0.183309,0.038452,0.082103,0.001511,0.008483,0.002817,0.039785,0.022127,0.086551
min,1.0,,,1.7e-05,1.2e-05,0.0,0.0,0.0,0.0,0.0,...,0.0,0.030488,0.016053,0.084707,0.0,0.0,0.0,-0.071535,0.0,0.0
25%,23.0,,,0.001807,0.000463,0.0,0.0,0.0,0.0,0.0,...,0.008715,0.483244,0.126667,1.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,44.0,,,0.004597,0.001113,0.0,0.0,0.0,0.000134,0.0,...,0.018736,0.588916,0.144433,1.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,59.0,,,0.011041,0.002636,8e-05,0.001024,0.0,0.001118,0.0,...,0.032696,0.725551,0.167044,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [5]:
dataset.duplicated()

0         False
1         False
2         False
3         False
4         False
          ...  
224821    False
224822    False
224823    False
224824    False
224825    False
Length: 224826, dtype: bool

### Data Cleaning

The dataset presents some duplicates in its rows that should be dropped. Also, the index and samples that have ".setup" or ".teardown" in the "testCase" string should be removed too.

In [6]:
dataset_copy = dataset.copy()
dataset_copy = dataset_copy[dataset_copy['testCase'].str.lower().str.contains('.setup|.teardown') == False]
dataset_copy = dataset_copy.drop_duplicates()
dataset_copy = dataset_copy.reset_index()
dataset_copy = dataset_copy.drop(['idProject','index'],axis=1)

### DataSet Partitioning
Before manipulating the dataset, it will be divided into train-set(80%) and test-set(20%). The train-set will be used to identify a predictive model, while the test-set will be used to test the machine-learning algorithm. For the dataset partitioning, a stratified sampling is adopted to have the same test flaky proportions (False, True) between the test dataset and the training dataset

In [7]:
from sklearn.model_selection import StratifiedShuffleSplit
split=StratifiedShuffleSplit(n_splits=1,test_size=0.2,random_state=42)
for train_index_stratified,test_index_stratified in split.split(dataset_copy,dataset_copy['isFlaky']):
    train_set=dataset_copy.loc[train_index_stratified]
    test_set=dataset_copy.loc[test_index_stratified]

print("Dimensione Train-set ",len(train_set))
print("Dimensione Test-set ",len(test_set))

Dimensione Train-set  163890
Dimensione Test-set  40973


## Feature Selection and Dataset balancing

To optimize the set of feature for the dataset, a feature selection step must be applied. The feature scaling phase will be avoided because the dataset is already normalized. But the dataset is unbalanced, so we need to perform additional operations to balance it.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from matplotlib import pyplot as plt
import numpy as np


def get_object_colum(dataset):
    drop_col = []
    for col in dataset.columns:
        if dataset[col].dtypes == 'object':
            drop_col.append(col)
    return drop_col
train_set_copy = train_set.copy()
train_set_copy = train_set_copy.drop(get_object_colum(train_set_copy), axis=1)
X_train_set = train_set_copy.drop(['isFlaky'], axis=1)
y_train_set = train_set_copy['isFlaky']
columns=X_train_set.columns
X_train_set = X_train_set.to_numpy()
y_train_set = y_train_set.to_numpy()

df=pandas.DataFrame(X_train_set,columns=columns)
rf_fs=RandomForestClassifier(n_estimators=len(X_train_set),random_state=0,n_jobs=-1)
rf_fs.fit(X=X_train_set,y=y_train_set)
importance=rf_fs.feature_importances_
indices=np.argsort(importance)[::-1]
colum_remove=[]
for f in range (X_train_set.shape[1]):
    if importance[indices[f]] < 0.02:
        colum_remove.append(columns[indices[f]])
df=df.drop(colum_remove,axis=1)
X_train_set=df.to_numpy()


ds = pandas.DataFrame(X_train_set)
plt.title('Dataset non bilanciato')
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(ds.iloc[:, 0],ds.iloc[:, 1], marker='o', c=y_train_set,
        s=25, edgecolor='k', cmap=plt.cm.coolwarm)
plt.show()

In [None]:
from imblearn.over_sampling import SMOTE

sm = SMOTE(sampling_strategy='auto', k_neighbors=3, random_state=42)
X_train_set, y_train_set = sm.fit_resample(X_train_set, y_train_set)

ds = pandas.DataFrame(X_train_set)
plt.title('Dataset non bilanciato')
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(ds.iloc[:, 0], ds.iloc[:, 1], marker='o', c=y_train_set,s=25, edgecolor='k', cmap=plt.cm.coolwarm)
plt.show()