The purpose of this notebook is to convert the data (.csv files) from the Android projects into new .csv files. The new CSVs contain the ACR, ACDIF and ACDEN metrics, along with their mean and median. This notebook also compiles all projects into two CSVs: one with all the raw metrics, and another with the averaged metrics.
PS: The geometric mean computed here is unnecessary and is not the one used in our paper.


In [1]:
# All imports needed by this notebook. Make sure the dependencies (pandas, SciPy) are installed.
import pandas as pd

from scipy.stats.mstats import gmean

The first step is to load our data from the original .csv files. These were produced by extracting data from the projects with BOHR (https://github.com/wendellmfm/bohr) and JMetriX (https://github.com/lincolnrocha/JMetriX).

In [2]:
# Forward slashes work on Windows as well as on Linux and macOS
data_atoms_infinity_reddit = pd.read_csv('./Data/reports/infinity-for-reddit-all.csv', sep=';')
data_atoms_discreet_launcher = pd.read_csv('./Data/reports/discreet-launcher-all.csv', sep=';')
data_atoms_open_tracks = pd.read_csv('./Data/reports/opentracks-all.csv', sep=';')
data_atoms_xupdate = pd.read_csv('./Data/reports/xupdate-all.csv', sep=';')
data_atoms_presence_publisher = pd.read_csv('./Data/reports/presence-publisher-all.csv', sep=';')
data_atoms_asteroid_os_sync = pd.read_csv('./Data/reports/asteroid-os-sync-all.csv', sep=';')
data_atoms_unexpected_keyboard = pd.read_csv('./Data/reports/unexpected-keyboard-all.csv', sep=';')
data_atoms_shitter = pd.read_csv('./Data/reports/shitter-all.csv', sep=';')
data_atoms_colorpickerview = pd.read_csv('./Data/reports/colorpickerview-all.csv', sep=';')
data_atoms_gestureviews = pd.read_csv('./Data/reports/gestureviews-all.csv', sep=';')

These functions compute the new metrics (`csv_preparation`) and build the per-period summary tables (`new_csv_creation`) that are written to the new .csv files.

In [3]:
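def csv_preparation(data, name):
    """Add the atom metrics and the project name, reverse the rows and put 'Project' first."""
    data = data.copy()  # avoid mutating the DataFrame passed in
    # Atoms per thousand lines of code
    data['Number of Atoms per LoC (10^-3)'] = data['N.Atoms'] * 1000 / data['LoC']
    # Fraction of classes that contain at least one atom
    data['Atom Diffusion'] = data['Classes w/ Atoms'] / data['Classes Total']
    # Average number of atoms per class that contains atoms
    data['Atom Density'] = data['N.Atoms'] / data['Classes w/ Atoms']
    data['Project'] = name
    # Reverse the row order
    data = data.iloc[::-1]
    # Move 'Project' to the front, keeping the other columns in their original order
    data = data[['Project'] + [c for c in data.columns if c != 'Project']]
    return data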
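A quick way to sanity-check the formulas in `csv_preparation` is to apply them to a tiny, made-up table. The numbers below are hypothetical (not taken from any of the projects), and the sketch assumes that ACR, ACDIF and ACDEN correspond to the 'Number of Atoms per LoC (10^-3)', 'Atom Diffusion' and 'Atom Density' columns, respectively. It repeats the formulas rather than calling `csv_preparation`, so it runs on its own.

```python
import pandas as pd

# Hypothetical snapshot values; only the columns the formulas need
toy = pd.DataFrame({
    'LoC': [20000, 50000],
    'Classes Total': [100, 200],
    'Classes w/ Atoms': [25, 80],
    'N.Atoms': [50, 400],
})

# Same formulas as csv_preparation
toy['Number of Atoms per LoC (10^-3)'] = toy['N.Atoms'] * 1000 / toy['LoC']  # ACR (assumed)
toy['Atom Diffusion'] = toy['Classes w/ Atoms'] / toy['Classes Total']        # ACDIF (assumed)
toy['Atom Density'] = toy['N.Atoms'] / toy['Classes w/ Atoms']                # ACDEN (assumed)

print(toy[['Number of Atoms per LoC (10^-3)', 'Atom Diffusion', 'Atom Density']])
```

For the first row, 50 atoms in 20,000 lines gives 2.5 atoms per thousand lines; 25 of its 100 classes contain atoms (diffusion 0.25), with 2 atoms per such class on average (density 2.0). If a snapshot had zero classes with atoms (and therefore zero atoms), 'Atom Density' would be 0/0, which pandas returns as NaN rather than raising an error.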

In [4]:
def new_csv_creation(data, name):
    """Summarize each metric per CI/CD period (mean, median and geometric mean)."""
    # Metric columns, in the order they appear in each output row
    metrics = ['Classes Total', 'Classes w/ Atoms', 'LoC', 'N.Atoms',
               'Number of Atoms per LoC (10^-3)', 'Atom Diffusion', 'Atom Density']
    periods = {
        'Before CI/CD': data[data['Period'] == 'Before CI/CD'],
        'After CI/CD': data[data['Period'] == 'After CI/CD'],
    }
    stats = [('Mean', pd.Series.mean), ('Median', pd.Series.median), ('Geo-Mean', gmean)]

    new_data = []
    # Row order: Mean, Median, Geo-Mean, each with Before CI/CD followed by After CI/CD
    for label, stat in stats:
        for period, subset in periods.items():
            new_data.append([name, period, label] + [stat(subset[m]) for m in metrics])

    df = pd.DataFrame(new_data)
    # Reuse the prepared column names, dropping positions 7-17 (not summarized here),
    # so the remaining 10 names line up with the 10 values in each row
    df.columns = data.columns.delete([7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
    return df
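The per-period statistics in `new_csv_creation` can also be expressed with `groupby`. The sketch below is an alternative formulation, not the code used to produce the CSVs: `summarize_by_period` and `METRICS` are hypothetical names, the toy data is made up, and the geometric mean is left out (see the note at the top). It returns one row per (statistic, period) pair instead of the exact column layout written above.

```python
import pandas as pd

# The seven metrics summarized by new_csv_creation
METRICS = ('Classes Total', 'Classes w/ Atoms', 'LoC', 'N.Atoms',
           'Number of Atoms per LoC (10^-3)', 'Atom Diffusion', 'Atom Density')


def summarize_by_period(data, metrics=METRICS):
    """Alternative sketch: mean and median of each metric for every 'Period' value."""
    grouped = data.groupby('Period')[list(metrics)]
    # Dict keys become the outer row level, giving a (statistic, Period) index
    return pd.concat({'Mean': grouped.mean(), 'Median': grouped.median()})


# Hypothetical toy data containing only two of the metrics
toy = pd.DataFrame({
    'Period': ['Before CI/CD'] * 3 + ['After CI/CD'] * 2,
    'LoC': [1000, 2000, 6000, 8000, 10000],
    'N.Atoms': [10, 20, 30, 40, 60],
})
summary = summarize_by_period(toy, metrics=['LoC', 'N.Atoms'])
print(summary)
```

Because `groupby` sorts its keys, 'After CI/CD' comes before 'Before CI/CD' in the result. One advantage of this form is that it handles any number of periods without extra variables; the trade-off is that its output would need reshaping to match the existing mean/median CSV layout.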

From here until almost the end, the notebook goes through each Android project: it calculates the metrics, organizes the data, and writes the .csv files needed by the other notebooks.
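The ten project sections below repeat the same four steps, so a loop over a small table of projects may be easier to maintain if more projects are added. This is a hypothetical helper (`run_all_projects` and `PROJECTS` are not part of the original notebook); the report file prefixes, output file keys and display names are copied from the cells in this notebook. It concatenates projects in list order, which differs from the order used in the last two cells, so reorder `PROJECTS` if the row order of the combined files matters.

```python
from pathlib import Path

import pandas as pd

# (report file prefix, output file key, display name), copied from this notebook's cells
PROJECTS = [
    ('infinity-for-reddit', 'infinity_reddit', 'Infinity-For-Reddit'),
    ('discreet-launcher', 'discreet_launcher', 'Discreet Launcher'),
    ('opentracks', 'open_tracks', 'Open Tracks'),
    ('xupdate', 'xupdate', 'XUpdate'),
    ('presence-publisher', 'presence_publisher', 'Presence-Publisher'),
    ('asteroid-os-sync', 'asteroid_os_sync', 'AsteroidOSSync'),
    ('unexpected-keyboard', 'unexpected_keyboard', 'Unexpected-Keyboard'),
    ('shitter', 'shitter', 'Shitter'),
    ('colorpickerview', 'colorpickerview', 'ColorPickerView'),
    ('gestureviews', 'gestureviews', 'GestureViews'),
]


def run_all_projects(prepare, summarize, data_dir=Path('Data')):
    """Hypothetical helper: process every project and write its two CSV files.

    `prepare` and `summarize` are meant to be csv_preparation and
    new_csv_creation; both take (DataFrame, project name).
    Returns the concatenated prepared and summary tables.
    """
    data_dir = Path(data_dir)
    prepared_all, summary_all = [], []
    for slug, key, name in PROJECTS:
        raw = pd.read_csv(data_dir / 'reports' / f'{slug}-all.csv', sep=';')
        prepared = prepare(raw, name)
        summary = summarize(prepared, name)
        prepared.to_csv(data_dir / f'data_atoms_{key}.csv', index=False)
        summary.to_csv(data_dir / f'mean_median_{key}.csv', index=False)
        prepared_all.append(prepared)
        summary_all.append(summary)
    return pd.concat(prepared_all), pd.concat(summary_all)
```

In this notebook it would be called as `raw_all, summary_all = run_all_projects(csv_preparation, new_csv_creation)`, after which the two returned tables could be written to `projects_android.csv` and `projects_android_mean_median.csv`.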

Infinity-For-Reddit

In [5]:
data_infinity_reddit_prepared = csv_preparation(data_atoms_infinity_reddit, 'Infinity-For-Reddit')

In [6]:
new_data_infinity_reddit = new_csv_creation(data_infinity_reddit_prepared, 'Infinity-For-Reddit')

In [7]:
data_infinity_reddit_prepared.to_csv("./Data/data_atoms_infinity_reddit.csv", index=False)

In [8]:
new_data_infinity_reddit.to_csv("./Data/mean_median_infinity_reddit.csv", index=False)

Discreet Launcher

In [9]:
data_discreet_launcher_prepared = csv_preparation(data_atoms_discreet_launcher, 'Discreet Launcher')

In [10]:
new_data_discreet_launcher = new_csv_creation(data_discreet_launcher_prepared, 'Discreet Launcher')

In [11]:
data_discreet_launcher_prepared.to_csv("./Data/data_atoms_discreet_launcher.csv", index=False)

In [12]:
new_data_discreet_launcher.to_csv("./Data/mean_median_discreet_launcher.csv", index=False)

Open Tracks

In [13]:
data_open_tracks_prepared = csv_preparation(data_atoms_open_tracks, 'Open Tracks')

In [14]:
new_data_open_tracks = new_csv_creation(data_open_tracks_prepared, 'Open Tracks')

In [15]:
data_open_tracks_prepared.to_csv("./Data/data_atoms_open_tracks.csv", index=False)

In [16]:
new_data_open_tracks.to_csv("./Data/mean_median_open_tracks.csv", index=False)

XUpdate

In [17]:
data_xupdate_prepared = csv_preparation(data_atoms_xupdate, 'XUpdate')

In [18]:
new_data_xupdate = new_csv_creation(data_xupdate_prepared, 'XUpdate')

In [19]:
data_xupdate_prepared.to_csv("./Data/data_atoms_xupdate.csv", index=False)

In [20]:
new_data_xupdate.to_csv("./Data/mean_median_xupdate.csv", index=False)

Presence-Publisher

In [21]:
data_presence_publisher_prepared = csv_preparation(data_atoms_presence_publisher, 'Presence-Publisher')

In [22]:
new_data_presence_publisher = new_csv_creation(data_presence_publisher_prepared, 'Presence-Publisher')

In [23]:
data_presence_publisher_prepared.to_csv("./Data/data_atoms_presence_publisher.csv", index=False)

In [24]:
new_data_presence_publisher.to_csv("./Data/mean_median_presence_publisher.csv", index=False)

AsteroidOSSync

In [25]:
data_asteroid_os_sync_prepared = csv_preparation(data_atoms_asteroid_os_sync, 'AsteroidOSSync')

In [26]:
new_data_asteroid_os_sync = new_csv_creation(data_asteroid_os_sync_prepared, 'AsteroidOSSync')

In [27]:
data_asteroid_os_sync_prepared.to_csv("./Data/data_atoms_asteroid_os_sync.csv", index=False)

In [28]:
new_data_asteroid_os_sync.to_csv("./Data/mean_median_asteroid_os_sync.csv", index=False)

Unexpected-Keyboard

In [29]:
data_unexpected_keyboard_prepared = csv_preparation(data_atoms_unexpected_keyboard, 'Unexpected-Keyboard')

In [30]:
new_data_unexpected_keyboard = new_csv_creation(data_unexpected_keyboard_prepared, 'Unexpected-Keyboard')

In [31]:
data_unexpected_keyboard_prepared.to_csv("./Data/data_atoms_unexpected_keyboard.csv", index=False)

In [32]:
new_data_unexpected_keyboard.to_csv("./Data/mean_median_unexpected_keyboard.csv", index=False)

Shitter

In [33]:
data_shitter_prepared = csv_preparation(data_atoms_shitter, 'Shitter')

In [34]:
new_data_shitter = new_csv_creation(data_shitter_prepared, 'Shitter')

In [35]:
data_shitter_prepared.to_csv("./Data/data_atoms_shitter.csv", index=False)

In [36]:
new_data_shitter.to_csv("./Data/mean_median_shitter.csv", index=False)

ColorPickerView

In [37]:
data_colorpickerview_prepared = csv_preparation(data_atoms_colorpickerview, 'ColorPickerView')

In [38]:
new_data_colorpickerview = new_csv_creation(data_colorpickerview_prepared, 'ColorPickerView')

In [39]:
data_colorpickerview_prepared.to_csv("./Data/data_atoms_colorpickerview.csv", index=False)

In [40]:
new_data_colorpickerview.to_csv("./Data/mean_median_colorpickerview.csv", index=False)

GestureViews

In [41]:
data_gestureviews_prepared = csv_preparation(data_atoms_gestureviews, 'GestureViews')

In [42]:
new_data_gestureviews = new_csv_creation(data_gestureviews_prepared, 'GestureViews')

In [43]:
data_gestureviews_prepared.to_csv("./Data/data_atoms_gestureviews.csv", index=False)

In [44]:
new_data_gestureviews.to_csv("./Data/mean_median_gestureviews.csv", index=False)

Finally, we create .csv files containing the data of all projects, both raw and averaged.

In [45]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([
    data_infinity_reddit_prepared,
    data_gestureviews_prepared,
    data_colorpickerview_prepared,
    data_shitter_prepared,
    data_open_tracks_prepared,
    data_unexpected_keyboard_prepared,
    data_presence_publisher_prepared,
    data_xupdate_prepared,
    data_discreet_launcher_prepared,
    data_asteroid_os_sync_prepared,
])
df.to_csv("./Data/projects_android.csv", index=False)

In [46]:
# Same concatenation, for the per-project summary tables
df = pd.concat([
    new_data_infinity_reddit,
    new_data_gestureviews,
    new_data_colorpickerview,
    new_data_shitter,
    new_data_open_tracks,
    new_data_unexpected_keyboard,
    new_data_presence_publisher,
    new_data_xupdate,
    new_data_discreet_launcher,
    new_data_asteroid_os_sync,
])
df.to_csv("./Data/projects_android_mean_median.csv", index=False)