# **Results Analysis**

Author: Artur Kasenõmmn

# **Introduction**

This notebook contains the data analysis for the bachelor's thesis "Energy Matters: Evaluating JavaScript Asynchronous Patterns for Green Development." The thesis investigates whether different JavaScript asynchronous programming patterns (callbacks, promises, and async/await) have significant differences in energy consumption for the two test cases specified in the thesis.

In this notebook we analyze the results from our experiments. We start by loading the CSV data files and organizing them into data structures. Then we test for data normality, and based on the results apply additional tests to determine if there are any significant differences.

## **Imports**

You may need to install some additional libraries that aren't included in the default environment.



In [1]:
!pip install scikit-posthocs

Collecting scikit-posthocs
  Downloading scikit_posthocs-0.11.4-py3-none-any.whl.metadata (5.8 kB)
Downloading scikit_posthocs-0.11.4-py3-none-any.whl (33 kB)
Installing collected packages: scikit-posthocs
Successfully installed scikit-posthocs-0.11.4


In [2]:
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import kruskal
from scipy.stats import shapiro
import matplotlib.pyplot as plt
import seaborn as sns
import scikit_posthocs as sp
import os

## **Load data**

First we need to manually add all our CSV files to the notebook working directory. Then we read them into data structures organized by test case and pattern type, which makes it easier to analyze.

In [3]:
tc1_files = [
  'callback_get_tc1_results.csv', 'promise_get_tc1_results.csv', 'asyncawait_get_tc1_results.csv',
  'callback_post_tc1_results.csv', 'promise_post_tc1_results.csv', 'asyncawait_post_tc1_results.csv'
]
tc2_files = [
  'callback_tc2_results.csv', 'promise_tc2_results.csv', 'asyncawait_tc2_results.csv'
]

tc1_data = {}
tc2_data = {}

for file in tc1_files:
  pattern_parts = file.split('_')
  pattern = pattern_parts[0] + '_' + pattern_parts[1]
  tc1_data[pattern] = pd.read_csv(file)

for file in tc2_files:
  pattern = file.split('_')[0]
  tc2_data[pattern] = pd.read_csv(file)
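
The dictionary keys used throughout the rest of the notebook come from the file names. As a minimal sketch of that derivation in isolation (the helper names `tc1_key` and `tc2_key` are hypothetical and not part of the notebook), the splitting logic can be checked without reading any files:

```python
def tc1_key(filename):
    # TC1 files are named '<pattern>_<method>_tc1_results.csv',
    # so the key combines the first two underscore-separated parts.
    parts = filename.split('_')
    return parts[0] + '_' + parts[1]


def tc2_key(filename):
    # TC2 files are named '<pattern>_tc2_results.csv',
    # so the key is just the first part.
    return filename.split('_')[0]


print(tc1_key('callback_get_tc1_results.csv'))
print(tc2_key('asyncawait_tc2_results.csv'))
```

This is why TC1 data is later filtered with `'get' in k` or `'post' in k`, while TC2 data is keyed by pattern name only.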

## **Data Cleaning**

We did not need to clean the data. The experiment was set up so that all result files were created in a consistent format. This means missing or incorrect values were not expected. We also manually checked the files and confirmed that there are no missing values.
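
The manual check could also be done programmatically. The sketch below is one possible way to do it, not what was run for the thesis: `count_missing` is a hypothetical helper, and the two small in-memory CSVs are made-up stand-ins for the real result files (only the column names `TIME_SECONDS` and `ENERGY_JOULES` come from the notebook).

```python
import io

import pandas as pd


def count_missing(data):
    """Return the number of missing cells per pattern for a dict of DataFrames."""
    return {pattern: int(df.isna().sum().sum()) for pattern, df in data.items()}


# Hypothetical in-memory CSVs standing in for the real result files.
complete = pd.read_csv(io.StringIO(
    "TIME_SECONDS,ENERGY_JOULES\n1.259,3.08\n1.260,3.18\n"))
with_gap = pd.read_csv(io.StringIO(
    "TIME_SECONDS,ENERGY_JOULES\n1.259,3.08\n1.260,\n"))

missing = count_missing({'complete': complete, 'with_gap': with_gap})
print(missing)
```

Applied to the notebook's own data, `count_missing(tc1_data)` and `count_missing(tc2_data)` would be expected to report zero for every pattern, given the manual check described above.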

## **Data Normality Test**

We use the Shapiro-Wilk test with an alpha level of α = 0.05 to check whether the data follows a normal distribution. This helps us decide between parametric or non-parametric statistical tests for comparing the patterns later on.

In [4]:
def perform_shapiro_wilk(data, metric):
  results = {}
  for pattern, df in data.items():
    stat, p = shapiro(df['TIME_SECONDS' if metric == 'time' else 'ENERGY_JOULES'])
    results[pattern] = {
      'Statistic': f"{stat:.3f}",
      'p-value': f"{p:.3f}",
      'Normality': "Normal" if p > 0.05 else "Not Normal"
    }
  return results

def display_normality_results(results, title):
  print(title)
  df = pd.DataFrame.from_dict(results, orient='index')
  df.index.name = 'Pattern'
  print(df)
  print("\n")

### **TC1 Normality Test**

In [5]:
tc1_energy_normality = perform_shapiro_wilk(tc1_data, 'energy')
display_normality_results(tc1_energy_normality, "TC1, Energy Consumption Normality Test")

tc1_time_normality = perform_shapiro_wilk(tc1_data, 'time')
display_normality_results(tc1_time_normality, "TC1, Execution Time Normality Test")

TC1, Energy Consumption Normality Test
                Statistic p-value   Normality
Pattern                                      
callback_get        0.897   0.007  Not Normal
promise_get         0.590   0.000  Not Normal
asyncawait_get      0.899   0.008  Not Normal
callback_post       0.936   0.071      Normal
promise_post        0.855   0.001  Not Normal
asyncawait_post     0.913   0.018  Not Normal


TC1, Execution Time Normality Test
                Statistic p-value   Normality
Pattern                                      
callback_get        0.811   0.000  Not Normal
promise_get         0.865   0.001  Not Normal
asyncawait_get      0.699   0.000  Not Normal
callback_post       0.743   0.000  Not Normal
promise_post        0.714   0.000  Not Normal
asyncawait_post     0.592   0.000  Not Normal




### **TC2 Normality Test**

In [6]:
tc2_energy_normality = perform_shapiro_wilk(tc2_data, 'energy')
display_normality_results(tc2_energy_normality, "TC2, Energy Consumption Normality Test")

tc2_time_normality = perform_shapiro_wilk(tc2_data, 'time')
display_normality_results(tc2_time_normality, "TC2, Execution Time Normality Test")

TC2, Energy Consumption Normality Test
           Statistic p-value   Normality
Pattern                                 
callback       0.970   0.551      Normal
promise        0.776   0.000  Not Normal
asyncawait     0.958   0.271      Normal


TC2, Execution Time Normality Test
           Statistic p-value   Normality
Pattern                                 
callback       0.360   0.000  Not Normal
promise        0.250   0.000  Not Normal
asyncawait     0.974   0.663      Normal




### **Normality Test Results**

The normality tests show that most of our samples do not follow a normal distribution. In TC1, only the callback POST energy data passes the test (p = 0.071). In TC2, only callback energy, async/await energy, and async/await execution time pass. We therefore use non-parametric statistical tests. Specifically, we use the Kruskal-Wallis test, a rank-based non-parametric test that compares multiple groups without requiring normally distributed data. It is often interpreted as a comparison of medians when the group distributions have similar shapes.

## **Statistical Comparison**

In this section, we apply the Kruskal-Wallis test to determine if there are statistically significant differences in energy consumption and execution time between the three patterns. We perform this analysis separately for TC1 and TC2. In TC1 we do the test for both GET and POST request results separately. For each test, we find the test statistic, p-value, and whether the differences are statistically significant using an alpha level of α = 0.05.
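
To make the link between the test statistic and the p-value concrete, here is a small illustrative sketch. It is not the method used in this notebook, which relies on `scipy.stats.kruskal`. It assumes data without ties for the hand-computed H, and it uses the fact that with three groups the statistic is compared against a chi-squared distribution with 2 degrees of freedom, whose upper tail probability is exp(-H/2). The function names and the toy groups are hypothetical.

```python
import math


def kruskal_h_no_ties(groups):
    """Kruskal-Wallis H for groups with no tied values (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    # Rank 1 goes to the smallest pooled value; valid only without ties.
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n_total = len(pooled)
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


def p_value_three_groups(h):
    """Chi-squared upper tail for 2 degrees of freedom, i.e. exactly three groups."""
    return math.exp(-h / 2.0)


# Toy data: three fully separated groups of three values each.
h = kruskal_h_no_ties([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.3f}, p = {p_value_three_groups(h):.3f}")
```

The same formula can be used as a sanity check on the reported results: for example, `p_value_three_groups(7.028)` rounds to 0.030, matching the p-value printed for TC1 GET energy consumption.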

In [7]:
def perform_kruskal_wallis(data):
  data_list = [df.values.flatten() for df in data.values()]
  stat, p = kruskal(*data_list)
  results = {
    'Statistic': f"{stat:.3f}",
    'p-value': f"{p:.3f}",
    'Significant': "No" if p >= 0.05 else "Yes"
  }
  return results

def display_kruskal_wallis_results(results, title):
  print(title)
  df = pd.DataFrame.from_dict(results, orient='index')
  df.index.name = 'Metric'
  print(df)
  print("\n")

### **TC1 Statistical Comparison**

**GET Requests**

In [8]:
tc1_energy_get_patterns = {k: v[['ENERGY_JOULES']] for k, v in tc1_data.items() if 'get' in k}
tc1_energy_get_kruskal = perform_kruskal_wallis(tc1_energy_get_patterns)
display_kruskal_wallis_results({'Energy': tc1_energy_get_kruskal}, "TC1, Energy Consumption Kruskal-Wallis Test (GET)")

tc1_time_get_patterns = {k: v[['TIME_SECONDS']] for k, v in tc1_data.items() if 'get' in k}
tc1_time_get_kruskal = perform_kruskal_wallis(tc1_time_get_patterns)
display_kruskal_wallis_results({'Time': tc1_time_get_kruskal}, "TC1, Execution Time Kruskal-Wallis Test (GET)")

TC1, Energy Consumption Kruskal-Wallis Test (GET)
       Statistic p-value Significant
Metric                              
Energy     7.028   0.030         Yes


TC1, Execution Time Kruskal-Wallis Test (GET)
       Statistic p-value Significant
Metric                              
Time       0.062   0.969          No




**POST Requests**

In [9]:
tc1_energy_post_patterns = {k: v[['ENERGY_JOULES']] for k, v in tc1_data.items() if 'post' in k}
tc1_energy_post_kruskal = perform_kruskal_wallis(tc1_energy_post_patterns)
display_kruskal_wallis_results({'Energy': tc1_energy_post_kruskal}, "TC1, Energy Consumption Kruskal-Wallis Test (POST)")

tc1_time_post_patterns = {k: v[['TIME_SECONDS']] for k, v in tc1_data.items() if 'post' in k}
tc1_time_post_kruskal = perform_kruskal_wallis(tc1_time_post_patterns)
display_kruskal_wallis_results({'Time': tc1_time_post_kruskal}, "TC1, Execution Time Kruskal-Wallis Test (POST)")

TC1, Energy Consumption Kruskal-Wallis Test (POST)
       Statistic p-value Significant
Metric                              
Energy     6.732   0.035         Yes


TC1, Execution Time Kruskal-Wallis Test (POST)
       Statistic p-value Significant
Metric                              
Time       0.850   0.654          No




### **TC2 Statistical Comparison**

In [10]:
tc2_energy_kruskal_data = {pattern: df[['ENERGY_JOULES']] for pattern, df in tc2_data.items()}
tc2_energy_kruskal = perform_kruskal_wallis(tc2_energy_kruskal_data)
display_kruskal_wallis_results({'Energy': tc2_energy_kruskal}, "TC2, Energy Consumption Kruskal-Wallis Test")

tc2_time_kruskal_data = {pattern: df[['TIME_SECONDS']] for pattern, df in tc2_data.items()}
tc2_time_kruskal = perform_kruskal_wallis(tc2_time_kruskal_data)
display_kruskal_wallis_results({'Time': tc2_time_kruskal}, "TC2, Execution Time Kruskal-Wallis Test")

TC2, Energy Consumption Kruskal-Wallis Test
       Statistic p-value Significant
Metric                              
Energy     0.706   0.703          No


TC2, Execution Time Kruskal-Wallis Test
       Statistic p-value Significant
Metric                              
Time      44.927   0.000         Yes




### **Statistical Comparison Test Results**

For TC1, we found significant differences in energy consumption between the three patterns for both GET requests (p = 0.030) and POST requests (p = 0.035). This suggests that the choice of asynchronous pattern has a significant impact on energy efficiency for TC1. However, we did not find significant differences in execution time for either GET requests (p = 0.969) or POST requests (p = 0.654), which tells us that all three patterns perform similarly in terms of execution time for TC1.

For TC2, we observed the opposite pattern. We found no significant differences in energy consumption between the patterns (p = 0.703), suggesting that pattern choice doesn't significantly affect energy consumption for TC2. However, we did find significant differences in execution time (p < 0.001), which tells us that pattern choice does impact execution time for TC2.

## **Post-hoc Analysis: Dunn's Test**

Since the Kruskal-Wallis test revealed significant differences in energy consumption for TC1 and in execution time for TC2, we need to determine which patterns differ from each other. For this purpose, we conduct Dunn's test with Bonferroni correction. Dunn's test makes pairwise comparisons between all three asynchronous patterns, so we can identify which pairs of patterns differ significantly. Note that we run Dunn's test only for the test case and metric combinations that were found to have significant differences in the previous section, i.e. energy consumption for TC1 and execution time for TC2.
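
As we understand it, the Bonferroni adjustment multiplies each raw pairwise p-value by the number of comparisons (three pairs for three patterns) and caps the result at 1. The sketch below illustrates only that adjustment step, with made-up raw p-values. The helper `bonferroni_adjust` is hypothetical and is not how scikit-posthocs is called in this notebook.

```python
def bonferroni_adjust(raw_p_values):
    """Multiply each p-value by the number of comparisons and cap at 1."""
    m = len(raw_p_values)
    return [min(1.0, p * m) for p in raw_p_values]


# Hypothetical raw p-values for the three pairwise comparisons.
raw = [0.009, 0.4, 0.8]
adjusted = bonferroni_adjust(raw)
print([f"{p:.3f}" for p in adjusted])
```

The cap explains why an adjusted p-value can be reported as exactly 1.000: any raw p-value above one third reaches the cap when three comparisons are made.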

In [11]:
def perform_dunn_test(data, metric, title):
  print(f"Dunn's Test - {title} - {metric.capitalize()}")
  data_list = [df['TIME_SECONDS' if metric == 'time' else 'ENERGY_JOULES'].values for df in data.values()]
  labels_array = np.concatenate([[label] * len(d) for label, d in zip(list(data.keys()), data_list)])
  df = pd.DataFrame({metric: np.hstack(data_list), 'pattern': labels_array})
  with pd.option_context('display.float_format', '{:.3f}'.format):
    posthoc_results = sp.posthoc_dunn(df, val_col=metric, group_col='pattern', p_adjust='bonferroni')
    print(posthoc_results)
  print("\n")


### **TC1 Dunn's Test**

In [12]:
tc1_energy_get_patterns_dunn = {k: v[['ENERGY_JOULES']] for k, v in tc1_data.items() if 'get' in k}
perform_dunn_test(tc1_energy_get_patterns_dunn, 'energy', 'TC1, GET Requests')

tc1_energy_post_patterns_dunn = {k: v[['ENERGY_JOULES']] for k, v in tc1_data.items() if 'post' in k}
perform_dunn_test(tc1_energy_post_patterns_dunn, 'energy', 'TC1, POST Requests')


Dunn's Test - TC1, GET Requests - Energy
                asyncawait_get  callback_get  promise_get
asyncawait_get           1.000         0.027        0.282
callback_get             0.027         1.000        1.000
promise_get              0.282         1.000        1.000


Dunn's Test - TC1, POST Requests - Energy
                 asyncawait_post  callback_post  promise_post
asyncawait_post            1.000          0.030         0.380
callback_post              0.030          1.000         0.877
promise_post               0.380          0.877         1.000




### **TC2 Dunn's Test**

In [13]:
tc2_time_kruskal_data_dunn = {pattern: df[['TIME_SECONDS']] for pattern, df in tc2_data.items()}
perform_dunn_test(tc2_time_kruskal_data_dunn, 'time', 'TC2')

Dunn's Test - TC2 - Time
            asyncawait  callback  promise
asyncawait       1.000     0.000    1.000
callback         0.000     1.000    0.000
promise          1.000     0.000    1.000




### **Dunn's Test Results**

For energy consumption in TC1, async/await differs significantly from callback (Bonferroni-adjusted p = 0.027 for GET and p = 0.030 for POST), since both p-values are below our α = 0.05 threshold. The promise pattern did not differ significantly from either of the other two. For GET requests, p = 0.282 against async/await and p = 1.000 against callback. For POST requests, p = 0.380 against async/await and p = 0.877 against callback.

When we look at TC2, the callback pattern acted quite differently. It had a significantly different execution time compared to both async/await (p < 0.001) and promise (p < 0.001). These p-values are much lower than 0.05, showing a strong difference. On the other hand, async/await and promise took about the same amount of time to finish in TC2 (p = 1.000).

In summary, for energy consumption in TC1, the p-values show a statistically significant difference between async/await and callback. For execution time in TC2, the very low p-values show a significant difference between callback and both async/await and promise, while the high p-value shows no significant difference between async/await and promise.

## **Descriptive Statistics**

The Dunn's test identified where the significant differences are. What it did not tell us is what the differences are, i.e. which pattern has lower energy consumption or execution time. To determine that, we will now calculate descriptive statistics (mean, median, standard deviation, min, max) for each pattern and test case to understand the differences.

In [14]:
def calculate_descriptive_stats(series):
  return {
    'mean': series.mean(),
    'median': series.median(),
    'std': series.std(),
    'min': series.min(),
    'max': series.max()
  }
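
One detail worth keeping in mind when reading the `std` values: pandas' `Series.std()` computes the sample standard deviation (`ddof=1`) by default, not the population standard deviation. The following sketch shows the difference on a small made-up list of values (the numbers are illustrative only and do not come from the experiment data).

```python
import statistics

import pandas as pd

# Hypothetical measurements, chosen so the population std is exactly 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_std = pd.Series(values).std()            # pandas default: ddof=1
population_std = pd.Series(values).std(ddof=0)  # divide by n instead of n - 1

print(f"sample std: {sample_std:.3f}")
print(f"population std: {population_std:.3f}")
```

With the sample sizes used in these experiments the two variants give similar values, but the distinction matters when comparing against numbers produced by tools that default to `ddof=0`, such as `numpy.std`.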

### **TC1 Descriptive Statistics**

**GET Requests**

In [15]:
for pattern, df in tc1_data.items():
  if 'get' in pattern:
    print(f"\nPattern: {pattern}")
    print("Energy (Joules):")
    for stat, value in calculate_descriptive_stats(df['ENERGY_JOULES']).items():
      print(f"    {stat}: {value:.3f}")

    print("Time (Seconds):")
    for stat, value in calculate_descriptive_stats(df['TIME_SECONDS']).items():
      print(f"    {stat}: {value:.3f}")


Pattern: callback_get
Energy (Joules):
    mean: 3.180
    median: 3.080
    std: 0.428
    min: 2.610
    max: 4.550
Time (Seconds):
    mean: 1.260
    median: 1.259
    std: 0.003
    min: 1.256
    max: 1.271

Pattern: promise_get
Energy (Joules):
    mean: 3.370
    median: 3.180
    std: 0.728
    min: 2.820
    max: 6.660
Time (Seconds):
    mean: 1.260
    median: 1.259
    std: 0.004
    min: 1.256
    max: 1.269

Pattern: asyncawait_get
Energy (Joules):
    mean: 3.394
    median: 3.305
    std: 0.364
    min: 2.850
    max: 4.620
Time (Seconds):
    mean: 1.260
    median: 1.259
    std: 0.005
    min: 1.254
    max: 1.282


**POST Requests**

In [16]:
for pattern, df in tc1_data.items():
  if 'post' in pattern:
    print(f"Pattern: {pattern}")
    print("Energy (Joules):")
    for stat, value in calculate_descriptive_stats(df['ENERGY_JOULES']).items():
      print(f"{stat}: {value:.3f}")

    print("Time (Seconds):")
    for stat, value in calculate_descriptive_stats(df['TIME_SECONDS']).items():
      print(f"{stat}: {value:.3f}")
    print("\n")

Pattern: callback_post
Energy (Joules):
mean: 3.187
median: 3.220
std: 0.202
min: 2.710
max: 3.470
Time (Seconds):
mean: 1.260
median: 1.260
std: 0.004
min: 1.257
max: 1.273


Pattern: promise_post
Energy (Joules):
mean: 3.270
median: 3.310
std: 0.336
min: 2.820
max: 4.510
Time (Seconds):
mean: 1.260
median: 1.260
std: 0.003
min: 1.255
max: 1.274


Pattern: asyncawait_post
Energy (Joules):
mean: 3.383
median: 3.340
std: 0.297
min: 2.850
max: 4.190
Time (Seconds):
mean: 1.260
median: 1.259
std: 0.005
min: 1.256
max: 1.285




### **TC2 Descriptive Statistics**

In [17]:
for pattern, df in tc2_data.items():
  print(f"Pattern: {pattern}")
  print("Energy (Joules):")
  for stat, value in calculate_descriptive_stats(df['ENERGY_JOULES']).items():
    print(f"{stat}: {value:.3f}")

  print("Time (Seconds):")
  for stat, value in calculate_descriptive_stats(df['TIME_SECONDS']).items():
    print(f"{stat}: {value:.3f}")
  print("\n")

Pattern: callback
Energy (Joules):
mean: 3.207
median: 3.190
std: 0.360
min: 2.570
max: 3.930
Time (Seconds):
mean: 1.246
median: 1.241
std: 0.017
min: 1.238
max: 1.311


Pattern: promise
Energy (Joules):
mean: 3.301
median: 3.105
std: 0.585
min: 2.630
max: 5.300
Time (Seconds):
mean: 1.258
median: 1.252
std: 0.033
min: 1.247
max: 1.434


Pattern: asyncawait
Energy (Joules):
mean: 3.268
median: 3.350
std: 0.316
min: 2.750
max: 3.830
Time (Seconds):
mean: 1.253
median: 1.253
std: 0.003
min: 1.247
max: 1.260




### **Descriptive Statistics Results**

Looking at the descriptive statistics for TC1, we notice that the callback pattern showed the lowest average energy consumption for both GET (around 3.18 J) and POST (around 3.19 J) requests. The median energy consumption for callback was also the lowest for both GET requests (around 3.08 J) and POST requests (around 3.22 J). The standard deviation for callback energy was also relatively low (around 0.43 J for GET and 0.20 J for POST). In contrast, the promise pattern had an average around 3.37 J for GET and 3.27 J for POST, with medians around 3.18 J for GET and 3.31 J for POST. The promise GET request results had the highest standard deviation (0.73 J) and the widest range (2.82 J to 6.66 J), indicating more variability in its energy consumption for GET requests. The async/await pattern had averages around 3.39 J for GET and 3.38 J for POST, with medians around 3.31 J for GET and 3.34 J for POST.

When it comes to execution time in TC1, the data gives a different result. The average (around 1.260 s for all patterns) and median (around 1.259 s for GET and around 1.260 s for POST) execution times for GET and POST requests were similar across all three asynchronous patterns.

For TC2, the key difference appears to be execution time. The callback pattern showed a faster average (around 1.246 s) and median (around 1.241 s) execution time compared to both the promise pattern (average around 1.258 s, median around 1.252 s) and async/await pattern (average around 1.253 s, median around 1.253 s). The callback pattern also showed a larger standard deviation in time (0.017 s) and a wider range (1.238 s to 1.311 s) compared to async/await (standard deviation 0.003 s, range 1.247 s to 1.260 s), which shows more variability in callback's execution time for TC2. Promise had the highest standard deviation in time (0.033 s) and the widest range (1.247 s to 1.434 s). Interestingly, the promise and async/await patterns had quite similar average and median execution times in this scenario. Regarding energy consumption in TC2, the average energy consumption was similar across all three patterns (around 3.21 J for callback, 3.30 J for promise, and 3.27 J for async/await). The standard deviations were 0.36 J for callback, 0.58 J for promise, and 0.32 J for async/await.

In summary, the descriptive statistics suggest that for TC1, callback had the lowest average and median energy consumption for both GET and POST requests. For POST requests, promise showed a slightly higher average and median energy consumption than callback. All patterns performed similarly in terms of execution time in TC1. For TC2, callback had the lowest execution time, although with more variability than async/await. Energy consumption in TC2 did not show a clear advantage for any specific pattern.

## **How big are the differences?**

In [18]:
def calculate_mean_values(data, metric):
  mean_values = {}
  for pattern, df in data.items():
    mean_values[pattern] = df['TIME_SECONDS' if metric == 'time' else 'ENERGY_JOULES'].mean()
  return mean_values

def calculate_percentage_difference(value, baseline):
  return ((value - baseline) / baseline) * 100


# TC1 - Energy Consumption
tc1_energy_means = calculate_mean_values(tc1_data, 'energy')

print("\nTC1 - Energy Consumption:")
callback_get_energy_mean = tc1_energy_means['callback_get']
asyncawait_get_energy_diff = calculate_percentage_difference(tc1_energy_means['asyncawait_get'], callback_get_energy_mean)
print(f"Asyncawait (GET) vs. Callback (GET): Asyncawait consumed {asyncawait_get_energy_diff:.2f}% more energy.")

callback_post_energy_mean = tc1_energy_means['callback_post']
asyncawait_post_energy_diff = calculate_percentage_difference(tc1_energy_means['asyncawait_post'], callback_post_energy_mean)
print(f"Asyncawait (POST) vs. Callback (POST): Asyncawait consumed {asyncawait_post_energy_diff:.2f}% more energy.")

# TC2 - Execution Time
tc2_time_means = calculate_mean_values(tc2_data, 'time')

print("\nTC2 - Execution Time:")
callback_time_mean = tc2_time_means['callback']
promise_time_diff = calculate_percentage_difference(tc2_time_means['promise'], callback_time_mean)
asyncawait_time_diff = calculate_percentage_difference(tc2_time_means['asyncawait'], callback_time_mean)
print(f"Promise vs. Callback: Promise took {promise_time_diff:.2f}% longer.")
print(f"Asyncawait vs. Callback: Async/await took {asyncawait_time_diff:.2f}% longer.")


TC1 - Energy Consumption:
Asyncawait (GET) vs. Callback (GET): Asyncawait consumed 6.75% more energy.
Asyncawait (POST) vs. Callback (POST): Asyncawait consumed 6.15% more energy.

TC2 - Execution Time:
Promise vs. Callback: Promise took 1.01% longer.
Asyncawait vs. Callback: Async/await took 0.56% longer.


For TC1 energy consumption, async/await used approximately 6-7% more energy than callback for both GET and POST requests. For TC2 execution time, promise took about 1% longer than callback, and async/await took about 0.6% longer than callback.
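
As a worked check, the same percentage formula can be applied to the rounded means printed in the descriptive statistics. This is only a sketch for verification. The function name `percentage_difference` is a stand-in for the notebook's `calculate_percentage_difference`, and because the inputs here are rounded to three decimals, the results (roughly 6.73%, 6.15%, 0.96%, and 0.56%) can differ slightly from the values computed on the unrounded means above.

```python
def percentage_difference(value, baseline):
    """Relative difference of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100


# Rounded means copied from the descriptive statistics output.
tc1_get = percentage_difference(3.394, 3.180)         # async/await vs. callback, GET energy
tc1_post = percentage_difference(3.383, 3.187)        # async/await vs. callback, POST energy
tc2_promise = percentage_difference(1.258, 1.246)     # promise vs. callback, time
tc2_asyncawait = percentage_difference(1.253, 1.246)  # async/await vs. callback, time

for label, diff in [("TC1 GET energy", tc1_get), ("TC1 POST energy", tc1_post),
                    ("TC2 promise time", tc2_promise), ("TC2 async/await time", tc2_asyncawait)]:
    print(f"{label}: {diff:.2f}%")
```

In absolute terms these correspond to about 0.2 J per run for TC1 energy and about 0.01 s per run for TC2 execution time.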

Even though our results show some statistically significant differences between the patterns, we need to consider whether these differences are large enough to matter in a real-world situation. Choosing a pattern also depends on other factors, such as how easy the code is to write and understand, and how well it supports error handling. We discuss these factors, and whether it is worth preferring one pattern over another, in the thesis.