The following notebook contains various scripts to obtain the static analysis characteristics of the WatchDog data.
Make sure that you have downloaded the two BSON files from the server, `users.bson` and `events.bson`, and that they are in the same directory as this notebook.

The first step is to load all required packages. This cell rarely needs to be rerun.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
import pymongo
import bson
import json
from datetime import datetime
from collections import Counter,defaultdict,OrderedDict

Read the BSON files, as well as the Eclipse messages dictionary obtained from [the internal Eclipse compiler messages.properties](https://github.com/eclipse/eclipse.jdt.core/blob/efc9b650d8590a5670b5897ab6f8c0fb0db2799d/org.eclipse.jdt.core/compiler/org/eclipse/jdt/internal/compiler/problem/messages.properties).

In [None]:
with open('users.bson', 'rb') as user_file:
    users = bson.decode_all(user_file.read())
with open('events.bson', 'rb') as events_file:
    events = bson.decode_all(events_file.read())
with open('eclipse-messages.json', 'r') as messages_file:
    eclipse_messages = json.load(messages_file)

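The following two functions are available for general visualization. The first one plots a horizontal bar chart of item counts: if `top_n_items` is set (for example to 25), it first plots only the most frequent items, and it always plots the full chart afterwards. Both charts are saved to the `img/` directory, which must already exist. The second function prints a dictionary as the rows of a LaTeX table.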

In [None]:
def plot_counts(ylabel, xlabel, count_list, top_n_items = 0):
    labels, values = zip(*sorted(Counter(count_list).items(), key=lambda tup: tup[1], reverse = True))

    indexes = np.arange(len(labels))
    
    if (top_n_items != 0):
        first_n_indexes = indexes[:top_n_items]
        first_n_values = values[:top_n_items]
        first_n_labels = labels[:top_n_items]

        fig = plt.figure(figsize=(15,20))
        plt.barh(first_n_indexes, first_n_values)
        plt.yticks(first_n_indexes, first_n_labels)
        plt.xlabel(ylabel)
        plt.ylabel(xlabel)
        plt.tight_layout()
        plt.show()
        fig.savefig(('img/top-' + ylabel + xlabel + '.png').replace(' ', '-'))

    fig = plt.figure(figsize=(15,30))
    plt.barh(indexes, values)
    plt.yticks(indexes, labels)
    plt.xlabel(ylabel)
    plt.ylabel(xlabel)
    plt.tight_layout()
    plt.show()
    fig.savefig(('img/' + ylabel + xlabel + '.png').replace(' ', '-'))

def print_dictionary_as_table(header1, header2, dictionary, anonymize=False):
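    # Prints LaTeX table rows; the backslash of \hline is escaped explicitly
    # to avoid an invalid escape sequence in the string literal.
    print(header1 + ' & ' + header2 + ' \\\\ \\hline')
    for index, key in enumerate(dictionary):
        # With anonymize=True the row index replaces the key (e.g. a user ID).
        label = ('{:<' + str(len(header1)) + '}').format(index if anonymize else key)
        # Integer values are printed as-is; for lists the length is printed.
        value = dictionary[key] if isinstance(dictionary[key], int) else len(dictionary[key])
        print(label + ' & ' + str(value) + ' \\\\')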
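To illustrate the table output, the sketch below reimplements the row formatting on a toy dictionary and returns the rows instead of printing them. The helper name `format_table_rows` and the sample data are assumptions made for illustration; they are not part of the notebook.

```python
def format_table_rows(header1, header2, dictionary, anonymize=False):
    """Return LaTeX table rows; list values are reported by their length."""
    rows = [header1 + ' & ' + header2 + ' \\\\ \\hline']
    for index, (key, value) in enumerate(dictionary.items()):
        # With anonymize=True the row index stands in for the real key.
        label = index if anonymize else key
        count = value if isinstance(value, int) else len(value)
        padded = '{:<{width}}'.format(label, width=len(header1))
        rows.append(padded + ' & ' + str(count) + ' \\\\')
    return rows

# Hypothetical sample data: user IDs mapped to lists of resolution times.
sample = {'user-a': [12, 40, 7], 'user-b': [3]}
rows = format_table_rows('User ID', 'Number of events', sample, anonymize=True)
for row in rows:
    print(row)
```

With `anonymize=True`, the user IDs never appear in the output, which is useful when the table is copied into a publication.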

First, obtain the static analysis events we are interested in: those of type `sa-wc` (warning creation) and `sa-wr` (warning removal). An earlier version of WatchDog, deployed for IntelliJ, did not record all the data characteristics we need, and its events lack the `warning` field. We therefore keep only events for which `'warning' in event` holds. Later versions of WatchDog do include this field.

The events are also filtered by `userId`. In previous analyses, one user generated a significant portion of the warnings, which would result in a misrepresentation of the full developer population.

In [None]:
sa_events = list(filter(lambda event: (event['userId'] != '5a08e78c0e305bfcd5865a105ca44fc9f042b1d7') and (event['et'] == 'sa-wc' or event['et'] == 'sa-wr') and ('warning' in event), events))
print('Number of static analysis events: ' + str(len(sa_events)))
print('Number of warning creation events: ' + str(len(list(filter(lambda event: event['et'] == 'sa-wc', sa_events)))))
print('Number of warning removal events: ' + str(len(list(filter(lambda event: event['et'] == 'sa-wr', sa_events)))))
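The effect of the three filter conditions can be checked on a handful of synthetic events. This is a minimal sketch: the function name `is_relevant_sa_event`, the short user IDs, and the sample events are hypothetical, and real events contain many more fields.

```python
def is_relevant_sa_event(event, excluded_user_ids):
    """Keep warning creation/removal events that carry a 'warning' field."""
    return (event['userId'] not in excluded_user_ids
            and event['et'] in ('sa-wc', 'sa-wr')
            and 'warning' in event)

# Hypothetical sample events.
sample_events = [
    {'userId': 'u1', 'et': 'sa-wc', 'warning': {'type': '42'}},  # kept
    {'userId': 'u1', 'et': 'sa-wr'},                             # dropped: no 'warning' field
    {'userId': 'u2', 'et': 'sa-wr', 'warning': {'type': '7'}},   # dropped: excluded user
    {'userId': 'u3', 'et': 'sa-snap', 'warnings': []},           # dropped: other event type
]

kept = [event for event in sample_events if is_relevant_sa_event(event, {'u2'})]
print('Number of static analysis events: ' + str(len(kept)))
```

Passing the excluded IDs as a set makes it straightforward to exclude more than one user if another skew shows up later.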

The first analysis plots a histogram of the warning categories. The y-axis shows the warning that is being generated. Since Eclipse normally uses integers to represent a category, we use the previously loaded `eclipse-messages.json` data to map each integer back to the full message pattern. This makes the graph significantly easier to read.

In [None]:
sa_events_with_classifications = list(filter(lambda event: event['warning']['type'] != 'unknown', sa_events))
plot_counts('Number of events in category', 'Warning category', list(map(lambda warning: eclipse_messages[str(int(warning) - 1)] if warning.isdigit() else warning , map(lambda event: event['warning']['type'], sa_events_with_classifications))), 25)
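The mapping step inside the call above can be read in isolation as follows. This is a sketch under assumptions: the helper name `warning_label` and the contents of `sample_messages` are placeholders, whereas the real dictionary comes from `eclipse-messages.json`.

```python
from collections import Counter

def warning_label(warning_type, messages):
    """Map a numeric Eclipse warning type to its message pattern."""
    if warning_type.isdigit():
        # As in the notebook, the lookup key is the numeric type minus one.
        return messages[str(int(warning_type) - 1)]
    # Non-numeric types are already readable and are used as-is.
    return warning_type

# Hypothetical excerpt of the messages dictionary.
sample_messages = {'1': 'placeholder pattern A', '2': 'placeholder pattern B'}
types = ['2', '3', 'custom-check', '2']

labels = [warning_label(t, sample_messages) for t in types]
counts = Counter(labels)
print(counts.most_common())
```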

The second analysis concerns the location of warnings within a file. The location is relative: we express the line number as a fraction of the total file length. Events for files without length information are disregarded. A second heatmap covers the warning snapshots. It shows the same information, but for unresolved warnings.

In [None]:
def showHeatMap(warnings):
    hist, edges = np.histogram(warnings, np.arange(0, 1.01, 0.01))
    hist=hist[np.newaxis,:]
    plt.imshow(hist, aspect = "auto", cmap="viridis", extent=[0,1,0,100])
    plt.gca().set_yticks([])
    plt.xlabel('Location of warning relative to total file length')
    plt.ylabel('Frequency of occurrence')
    plt.show()

def get_relative_line(event):
    if (event['warning']['doctotal'] == -1):
        return round(event['warning']['line'] / event['doc']['sloc'], 2)
    return round(event['warning']['line'] / event['warning']['doctotal'], 2)

print('Warnings added/removed relative to file')
sa_events_doctotal = filter(lambda event: 'doctotal' in event['warning'] and abs(event['doc']['sloc']) != 1 and event['doc']['sloc'] != 0 and event['warning']['line'] != -1, sa_events)
showHeatMap(list(filter(lambda count: count != 0.5, map(get_relative_line, sa_events_doctotal))))

print('Warning snapshots of all warnings')
sa_snapshots = list(filter(lambda event: event['et'] == 'sa-snap', events))
snapshots_relative_loc = []
for event in sa_snapshots:
    for warning in event['warnings']:
        # Skip files with unknown (-1) or zero length.
        if event['doc']['sloc'] != -1 and event['doc']['sloc'] != 0:
            percentage = round(warning['line'] / event['doc']['sloc'], 2)
            if (percentage < 1):
                snapshots_relative_loc.append(percentage)
showHeatMap(snapshots_relative_loc)
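The relative-location computation and the binning behind the heatmap can be sketched without NumPy. The helper names `relative_line` and `bin_counts` and the sample positions are assumptions for illustration; the binning is a simplified stand-in for `np.histogram`, not an exact reimplementation.

```python
def relative_line(line, total_lines):
    """Return the warning position as a fraction of the file length, or None if unknown."""
    if line == -1 or total_lines in (-1, 0):
        return None
    return round(line / total_lines, 2)

def bin_counts(fractions, bins=100):
    """Count how many fractions fall into each of `bins` equal-width buckets."""
    counts = [0] * bins
    for fraction in fractions:
        if 0 <= fraction <= 1:
            # The inputs are rounded to two decimals, so round() avoids
            # truncation errors such as int(0.29 * 100) == 28.
            index = min(int(round(fraction * bins)), bins - 1)
            counts[index] += 1
    return counts

# Hypothetical (line, file length) pairs; -1 and 0 mark missing information.
positions = [(10, 100), (29, 100), (99, 100), (5, -1), (-1, 80), (3, 0)]
fractions = [f for f in (relative_line(l, t) for l, t in positions) if f is not None]
counts = bin_counts(fractions)
print(fractions)
```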

To get a quick overview of our developer population, plot the number of events per developer. Use this data to spot potential data skews and act accordingly.

In [None]:
plot_counts('Number of events per user', 'User ID', list(map(lambda event: event['userId'] , sa_events)))

Next, we look at warning lifetimes. This data is only available for warning removals, as these events can refer back to a previous creation timestamp. We split the removals into two groups: those that have a creation time and those that do not. We then print the percentage of removals with a known lifetime, to get a sense of how many warnings are resolved without previous information.

In [None]:
# Note: despite its name, this list holds the warning removal ('sa-wr') events.
created_warning_events = list(filter(lambda event: event['et'] == 'sa-wr', sa_events))
life_time_events = list(map(lambda event: event['warning']['diff'], created_warning_events))
has_time_diff = list(filter(lambda time: time != -1, life_time_events))
has_no_time_diff = list(filter(lambda time: time == -1, life_time_events))

number_of_time_diff = len(has_time_diff)
number_of_no_time_diff = len(has_no_time_diff)
print('Number of warnings which have a time diff: ' + str(number_of_time_diff))
print('Number of warnings which do not have a time diff: ' + str(number_of_no_time_diff))
print('Percentage of removals that have a time diff: ' + str(number_of_time_diff / (number_of_time_diff + number_of_no_time_diff) * 100) + ' %')
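As a small worked example of this split, the sketch below runs the same logic on a toy list of lifetimes. The function name `split_by_time_diff` and the sample values are hypothetical. Unlike the cell above, it also guards against an empty input, which is an addition and not something the notebook does.

```python
def split_by_time_diff(diffs):
    """Split lifetimes into known and unknown (-1) and report the known share in percent."""
    with_diff = [d for d in diffs if d != -1]
    without_diff = [d for d in diffs if d == -1]
    total = len(diffs)
    # Guard against division by zero when there are no removal events.
    percentage = 100 * len(with_diff) / total if total else 0.0
    return with_diff, without_diff, percentage

# Hypothetical lifetimes in seconds; -1 marks a removal without a creation time.
sample_diffs = [30, -1, 120, 5, -1]
with_diff, without_diff, percentage = split_by_time_diff(sample_diffs)
print(str(percentage) + ' %')
```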

Next, we print the number of events per user, as well as the time distribution per user. Users that do not have enough data yet (25 or fewer events with a known lifetime) are left out of the plot, to obtain a fair representation of their activity.

In [None]:
life_time_per_user = defaultdict(list)
for event in created_warning_events:
    if event['warning']['diff'] != -1:
        life_time_per_user[event['userId']].append(event['warning']['diff'])

values = list(filter(lambda values: len(values) > 25, [life_time_per_user[user] for user in life_time_per_user.keys()]))

print_dictionary_as_table('{:<40}'.format('User ID'), 'Number of events', life_time_per_user)
print_dictionary_as_table('{:<40}'.format('User ID'), 'Number of events', life_time_per_user, True)

plt.figure(figsize=(10,10))
ax = plt.axes()
bp = plt.boxplot(values, sym='+', vert=False, showfliers=False,notch=False)
plt.ylabel('Distribution of resolution time per user')
plt.xlabel('Time in seconds to resolve a warning')
ax.set_yticklabels(list(filter(lambda user: len(life_time_per_user[user]) > 25, life_time_per_user.keys())))
plt.show()
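The box plot relies on the filtered values and the filtered tick labels being in the same order. One way to make that explicit is to filter once and derive both lists from the result. This is a sketch, assuming a hypothetical helper `users_with_enough_data` and toy data.

```python
from collections import defaultdict

def users_with_enough_data(times_per_user, min_events=25):
    """Return (labels, values) for users with more than min_events samples."""
    selected = {user: times for user, times in times_per_user.items()
                if len(times) > min_events}
    # Both lists come from the same dictionary, so their order matches.
    return list(selected.keys()), list(selected.values())

# Hypothetical lifetimes per user.
sample = defaultdict(list)
sample['user-a'].extend(range(30))
sample['user-b'].extend(range(5))
sample['user-c'].extend(range(26))

labels, values = users_with_enough_data(sample)
print(labels)
```

The resulting `values` could be passed to `plt.boxplot` and `labels` to `ax.set_yticklabels`, as in the cell above.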

Similar to the time distribution per user, we also plot the time distribution per level of programming experience.

In [None]:
unique_users = defaultdict(str)
life_time_per_programming_experience = defaultdict(list)
for event in created_warning_events:
    if event['warning']['diff'] != -1:
        user = list(filter(lambda user: user['id'] == event['userId'], users))
        if (len(user) > 0):
            unique_users[event['userId']] = user[0]['programmingExperience']
            life_time_per_programming_experience[user[0]['programmingExperience']].append(event['warning']['diff'])
        else:
            print('Could not find user with id: ' + str(event['userId']))

# Merge the two spellings of "not available" before deriving the plot data,
# so that the box plot values and their tick labels stay aligned.
life_time_per_programming_experience['N/A'] = life_time_per_programming_experience['N/A'] + life_time_per_programming_experience['NA']
del life_time_per_programming_experience['NA']

counts_per_exp = [life_time_per_programming_experience[exp] for exp in life_time_per_programming_experience.keys()]
values = list(filter(lambda values: len(values) > 25, counts_per_exp))

print_dictionary_as_table('Programming experience', 'Number of events', OrderedDict(sorted(life_time_per_programming_experience.items())))

print()

programming_exp_user_count = defaultdict(int)
for user, exp in unique_users.items():
    programming_exp_user_count[exp] = programming_exp_user_count[exp] + 1

programming_exp_user_count['N/A'] = programming_exp_user_count['N/A'] + programming_exp_user_count['NA']
del programming_exp_user_count['NA']
print_dictionary_as_table('Programming experience', 'Number of users', OrderedDict(sorted(programming_exp_user_count.items())))

plt.figure(figsize=(10,10))
ax = plt.axes()
bp = plt.boxplot(values, sym='+', vert=False, showfliers=False,notch=False)
plt.ylabel('Distribution of resolution time per programming experience')
plt.xlabel('Time in seconds to resolve a warning')
ax.set_yticklabels(list(filter(lambda exp: len(life_time_per_programming_experience[exp]) > 25, life_time_per_programming_experience.keys())))
plt.show()
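The cell above scans the full `users` list for every event. On larger datasets, a dictionary keyed by user ID may be faster. The sketch below groups lifetimes by programming experience that way and merges `'NA'` into `'N/A'` while grouping. The function names and sample records are hypothetical, including the experience label `'1-2 years'`. Unlike the notebook, it silently skips unknown users instead of printing a message.

```python
from collections import defaultdict

def index_users_by_id(users):
    """Build a lookup table from user ID to user record."""
    return {user['id']: user for user in users}

def group_times_by_experience(removal_events, users):
    """Group known warning lifetimes by the user's programming experience."""
    users_by_id = index_users_by_id(users)
    grouped = defaultdict(list)
    for event in removal_events:
        diff = event['warning']['diff']
        user = users_by_id.get(event['userId'])
        if diff == -1 or user is None:
            continue  # no lifetime recorded, or user not found
        experience = user['programmingExperience']
        if experience == 'NA':
            experience = 'N/A'  # treat both spellings as one category
        grouped[experience].append(diff)
    return grouped

# Hypothetical sample records.
sample_users = [
    {'id': 'u1', 'programmingExperience': 'NA'},
    {'id': 'u2', 'programmingExperience': 'N/A'},
    {'id': 'u3', 'programmingExperience': '1-2 years'},
]
sample_removals = [
    {'userId': 'u1', 'warning': {'diff': 10}},
    {'userId': 'u2', 'warning': {'diff': 20}},
    {'userId': 'u3', 'warning': {'diff': -1}},
    {'userId': 'u4', 'warning': {'diff': 5}},
    {'userId': 'u3', 'warning': {'diff': 7}},
]

grouped = group_times_by_experience(sample_removals, sample_users)
print(dict(grouped))
```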