In [1]:
# Imports
%matplotlib inline
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import scipy.stats as stats
import io
import requests

# Load data
url="https://ndownloader.figshare.com/files/10185495"
s=requests.get(url).content
df=pd.read_csv(io.StringIO(s.decode('utf-8')))

# Statistical Analysis

## Test for normal distribution

To statistically confirm our hypothesis that the triple pattern type has an impact on the runtime, we first need to check whether the results are normally distributed, since this determines whether a parametric or a non-parametric test is appropriate.

### Test: Normal Distribution

We test whether the samples for each triple pattern type deviate from a normal distribution, separately for both environments and all knowledge graphs.
We use `scipy.stats.mstats.normaltest`, which tests the null hypothesis that a sample comes from a normal distribution. It is based on D’Agostino and Pearson’s test, which combines skew and kurtosis into an omnibus test of normality.

Hypothesis:
$H_0:$ The sample comes from a normal distribution.
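As a minimal illustration of the decision rule, the sketch below applies `scipy.stats.normaltest` (the non-masked counterpart of the `scipy.stats.mstats.normaltest` function used in this notebook) to two synthetic samples: one drawn from a normal distribution and one right-skewed log-normal sample. The samples, seed, and sample sizes are assumptions for illustration only and are not taken from the study data.

```python
import numpy as np
import scipy.stats as stats

# Synthetic samples (assumption: stand-ins for response times, not study data)
rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=100.0, scale=10.0, size=1000)
# Log-normal values are strongly right-skewed
skewed_sample = rng.lognormal(mean=4.0, sigma=0.8, size=1000)

alpha = 0.05
p_values = {}
for name, sample in [("normal", normal_sample), ("log-normal", skewed_sample)]:
    # D'Agostino-Pearson omnibus test: combines the skewness and kurtosis z-scores
    k2, p = stats.normaltest(sample)
    p_values[name] = p
    decision = "reject H0" if p < alpha else "cannot reject H0"
    print(f"{name}: K^2 = {k2:.2f}, p = {p:.3g} -> {decision}")
```

For the normal sample the null hypothesis is usually not rejected (it still is in about 5% of random draws at $\alpha = 0.05$), whereas the skewed sample is rejected with a vanishingly small p-value.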

In [2]:
# Select data for Study 1 only with cold cache
study1 = df[(df['study'] == 1) & (df['Cache'] == False)]
print("Test for normal distribution")
for env in study1['Environment'].unique():
    print("Environment: {0}".format(env))
    for source in study1['Source'].unique(): 
        print("Source: {0}".format(source))
        for category in study1['category'].unique():
            a, p = stats.mstats.normaltest(study1[(study1['category'] == category) 
                                                    & (study1['Source'] == source)
                                                    & (study1['Environment'] == env)]['ms'])
            print("Category: " + str(category) +  "; p-Value: " + str(p))

Test for normal distribution
Environment: Controlled
Source: DBLP
Category: <v,r,r>; p-Value: 1.18476616736e-58
Category: <r,r,v>; p-Value: 0.0
Category: <v,v,v>; p-Value: 5.05781986828e-305
Category: <r,v,r>; p-Value: 0.0
Category: <r,v,v>; p-Value: 4.57033798575e-140
Category: <v,v,r>; p-Value: 2.49535647547e-75
Category: <r,r,r>; p-Value: 0.0
Category: <v,r,v>; p-Value: 0.0724368024255
Source: DBpedia
Category: <v,r,r>; p-Value: 2.72262412116e-55
Category: <r,r,v>; p-Value: 3.84173293512e-223
Category: <v,v,v>; p-Value: 2.84196378922e-124
Category: <r,v,r>; p-Value: 4.05240471342e-288
Category: <r,v,v>; p-Value: 1.87259087176e-10
Category: <v,v,r>; p-Value: 4.35381904155e-41
Category: <r,r,r>; p-Value: 0.0
Category: <v,r,v>; p-Value: 2.36508925397e-43
Source: GeoNames
Category: <v,r,r>; p-Value: 3.04749297303e-31
Category: <r,r,v>; p-Value: 0.0
Category: <v,v,v>; p-Value: 3.59398495489e-148
Category: <r,v,r>; p-Value: 1.80800959123e-297
Category: <r,v,v>; p-Value: 2.74780234928e-280
Category: <v,v,r>; p-Value: 1.18087461651e-239
Category: <r,r,r>; p-Value: 2.24765255975e-265
Category: <v,r,v>; p-Value: 1.17154728246e-05
Source: Wiktionary
Category: <v,r,r>; p-Value: 8.98780694032e-144
Category: <r,r,v>; p-Value: 3.0925437434e-303
Category: <v,v,v>; p-Value: 6.27668528553e-217
Category: <r,v,r>; p-Value: 8.7755759042e-296
Category: <r,v,v>; p-Value: 3.18174727903e-180
Category: <v,v,r>; p-Value: 8.72823422871e-136
Category: <r,r,r>; p-Value: 1.32378417168e-227
Category: <v,r,v>; p-Value: 0.715014510996
Environment: Real-World
Source: DBLP
Category: <v,r,r>; p-Value: 0.0
Category: <r,r,v>; p-Value: 0.0
Category: <v,v,v>; p-Value: 0.0
Category: <r,v,r>; p-Value: 0.0
Category: <r,v,v>; p-Value: 0.0
Category: <v,v,r>; p-Value: 0.0
Category: <r,r,r>; p-Value: 0.0
Category: <v,r,v>; p-Value: 0.140616799544
Source: DBpedia
Category: <v,r,r>; p-Value: 0.0
Category: <r,r,v>; p-Value: 0.0
Category: <v,v,v>; p-Value: 0.0
Category: <r,v,r>; p-Value: 0.0
Category: <r,v,v>; p-Value: 0.0
Category: <v,v,r>; p-Value: 1.23632691433e-271
Category: <r,r,r>; p-Value: 0.0
Category: <v,r,v>; p-Value: 4.25642363424e-28
Source: GeoNames
Category: <v,r,r>; p-Value: 3.90942889033e-293
Category: <r,r,v>; p-Value: 0.0
Category: <v,v,v>; p-Value: 0.0
Category: <r,v,r>; p-Value: 0.0
Category: <r,v,v>; p-Value: 0.0
Category: <v,v,r>; p-Value: 0.0
Category: <r,r,r>; p-Value: 0.0
Category: <v,r,v>; p-Value: 0.101345088483
Source: Wiktionary
Category: <v,r,r>; p-Value: 5.34935998266e-258
Category: <r,r,v>; p-Value: 0.0
Category: <v,v,v>; p-Value: 0.0
Category: <r,v,r>; p-Value: 0.0
Category: <r,v,v>; p-Value: 0.0
Category: <v,v,r>; p-Value: 9.59641

### Result
For the pattern type $\langle v,r,v \rangle$, we cannot reject the null hypothesis at a significance level of $\alpha = 0.05$ for DBLP and Wiktionary in the controlled environment and for DBLP and GeoNames in the real-world environment. In all other cases, we reject the null hypothesis. Given the very low p-values and 60 of 64 rejected null hypotheses, we assume that the samples do not come from a normal distribution. Consequently, we apply a non-parametric test to examine the different pattern types.
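The decision rule can be checked mechanically. The sketch below copies the $\langle v,r,v \rangle$ p-values printed above into a dictionary (the Wiktionary value for the real-world environment is cut off in the output and is therefore left out) and lists the cases in which the null hypothesis cannot be rejected at $\alpha = 0.05$.

```python
# p-values for <v,r,v> copied from the normality-test output above;
# the Wiktionary / Real-World value is truncated there and therefore omitted
p_vrv = {
    ("Controlled", "DBLP"): 0.0724368024255,
    ("Controlled", "DBpedia"): 2.36508925397e-43,
    ("Controlled", "GeoNames"): 1.17154728246e-05,
    ("Controlled", "Wiktionary"): 0.715014510996,
    ("Real-World", "DBLP"): 0.140616799544,
    ("Real-World", "DBpedia"): 4.25642363424e-28,
    ("Real-World", "GeoNames"): 0.101345088483,
}

alpha = 0.05
# H0 (normality) can only be rejected when p < alpha
not_rejected = sorted(key for key, p in p_vrv.items() if p >= alpha)
for env, kg in not_rejected:
    print(f"{env} / {kg}: cannot reject H0 (p = {p_vrv[(env, kg)]:.3f})")
```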

## Triple Pattern Type.

To test whether the samples for the different pattern types differ significantly, we apply the Kruskal-Wallis test.
The test examines whether several groups of samples originate from the same distribution.

### Test: Kruskal Wallis Test

We apply the Kruskal-Wallis test to check whether there is a significant difference between the groups, where the response times for one triple pattern type form one group.

Hypothesis:
$H_0$: There is no difference between the groups, i.e., all groups come from the same distribution.
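The test statistic itself is simple to compute: all observations are ranked jointly, and $H = \frac{12}{N(N+1)} \sum_i n_i \left(\bar{R}_i - \frac{N+1}{2}\right)^2$ measures how far the mean rank $\bar{R}_i$ of each group lies from the overall mean rank. Under $H_0$, $H$ approximately follows a $\chi^2$ distribution with $k-1$ degrees of freedom for $k$ groups. The sketch below computes $H$ without tie correction on hypothetical data (the helper `kruskal_h` is illustrative) and compares it to `scipy.stats.kruskal`; with continuous data there are no ties, so both values agree.

```python
import numpy as np
import scipy.stats as stats

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (without tie correction) from pooled ranks."""
    pooled = np.concatenate(groups)
    ranks = stats.rankdata(pooled)  # ranks over all observations together
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        # Squared deviation of the group's mean rank from the overall mean rank
        h += len(g) * (r.mean() - (n_total + 1) / 2) ** 2
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * h

# Three hypothetical groups of response times (ms); continuous, so no ties
rng = np.random.default_rng(0)
groups = [rng.exponential(scale=s, size=30) for s in (50.0, 60.0, 120.0)]

h_manual = kruskal_h(*groups)
h_scipy, p = stats.kruskal(*groups)
print(f"H (manual) = {h_manual:.4f}, H (scipy) = {h_scipy:.4f}, p = {p:.3g}")
```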

### Response time

In [3]:
# Compare the response times of all triple pattern types (one group per type),
# separately for each environment and knowledge graph
col = 'ms'
cats = study1['category'].unique()
for env in study1['Environment'].unique():
    print("Environment: {0}".format(env))
    for source in study1['Source'].unique():
        print("Knowledge graph: {0}".format(source))
        subset = study1[(study1['Source'] == source) & (study1['Environment'] == env)]
        groups = [subset[subset['category'] == cat][col] for cat in cats]
        stat, p = stats.kruskal(*groups)
        print("Kruskal Wallis Test: p-Value = " + str(p))

Environment: Controlled
Knowledge graph: DBLP
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: DBpedia
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: GeoNames
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: Wiktionary
Kruskal Wallis Test: p-Value = 0.0
Environment: Real-World
Knowledge graph: DBLP
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: DBpedia
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: GeoNames
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: Wiktionary
Kruskal Wallis Test: p-Value = 0.0


### Relative response time

In [4]:
# Same test on the relative response time ('ms_per_res'),
# separately for each environment and knowledge graph
col = 'ms_per_res'
cats = study1['category'].unique()
for env in study1['Environment'].unique():
    print("Environment: {0}".format(env))
    for source in study1['Source'].unique():
        print("Knowledge graph: {0}".format(source))
        subset = study1[(study1['Source'] == source) & (study1['Environment'] == env)]
        groups = [subset[subset['category'] == cat][col] for cat in cats]
        stat, p = stats.kruskal(*groups)
        print("Kruskal Wallis Test: p-Value = " + str(p))

Environment: Controlled
Knowledge graph: DBLP
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: DBpedia
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: GeoNames
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: Wiktionary
Kruskal Wallis Test: p-Value = 0.0
Environment: Real-World
Knowledge graph: DBLP
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: DBpedia
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: GeoNames
Kruskal Wallis Test: p-Value = 0.0
Knowledge graph: Wiktionary
Kruskal Wallis Test: p-Value = 0.0


### Result 

For every knowledge graph in both environments, the Kruskal-Wallis test yields a p-value of 0.0 (i.e., numerically indistinguishable from zero) for both the absolute and the relative response time. Hence, we reject the null hypothesis at a significance level of $\alpha = 0.05$: there is a significant difference between the groups. The next step is a post-hoc analysis to determine which pairs of pattern types differ from each other.
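The notebook does not specify which post-hoc procedure is used. Dunn's test is a common choice after a Kruskal-Wallis test; the sketch below instead uses a simpler, swapped-in alternative: pairwise two-sided Mann-Whitney U tests with a Bonferroni correction. The helper `pairwise_mannwhitney` and the example data are hypothetical. With the eight triple pattern types, this amounts to 28 comparisons, so each p-value would be multiplied by 28.

```python
from itertools import combinations

import numpy as np
import scipy.stats as stats

def pairwise_mannwhitney(groups, alpha=0.05):
    """Pairwise two-sided Mann-Whitney U tests with a Bonferroni correction.

    groups: dict mapping a group label to a 1-D array of observations.
    Returns a list of (label_a, label_b, adjusted_p, significant).
    """
    pairs = list(combinations(groups, 2))
    m = len(pairs)
    results = []
    for a, b in pairs:
        _, p = stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        p_adj = min(1.0, p * m)  # Bonferroni: scale by the number of comparisons
        results.append((a, b, p_adj, p_adj < alpha))
    return results

# Hypothetical data: two similar groups and one clearly slower group
rng = np.random.default_rng(1)
groups = {
    "<v,r,r>": rng.exponential(50.0, 200),
    "<r,r,v>": rng.exponential(50.0, 200),
    "<v,v,v>": rng.exponential(50.0, 200) + 200.0,
}
for a, b, p_adj, sig in pairwise_mannwhitney(groups):
    print(f"{a} vs {b}: adjusted p = {p_adj:.3g}, significant: {sig}")
```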

## Answer Cardinality.

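Next, we study the relation between the answer cardinality and the response time. For this purpose, we conduct a correlation analysis: we compute Pearson's correlation coefficient between the answer cardinality (`total_items`) and the response time (`ms`) for triple patterns whose answer fits on a single page (`r_1`), for those whose answer spans multiple pages (`r_2`), and for all of them (`r_all`). The pattern type $\langle v,v,v \rangle$ is excluded.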

In [5]:
# Gather data
rho = []
for env in study1['Environment'].unique():
    for server in list(study1['Source'].unique()):
        a = study1[(study1['Source'] == server) 
                   & (study1['category'] != "<v,v,v>")
                   & (study1['Environment'] == env)]
        one_page = a[a['one_page'] == True] 
        more_pages = a[a['one_page'] == False] 
        rho_1 = stats.pearsonr(one_page['total_items'], one_page['ms'])[0]
        rho_2 = stats.pearsonr(more_pages['total_items'], more_pages['ms'])[0]
        rho_all = stats.pearsonr(a['total_items'], a['ms'])[0]
        rho.append({"KG": server, "Environment" : env, "r_1" : rho_1, "r_2" : rho_2, "r_all" : rho_all })
# Create Table
cdf = pd.DataFrame(rho)
cdf.sort_values(by=['KG'], inplace=True)
cdf = cdf.transpose()
cdf

| KG | DBLP | DBLP | DBpedia | DBpedia | GeoNames | GeoNames | Wiktionary | Wiktionary |
|---|---|---|---|---|---|---|---|---|
| Environment | Controlled | Real-World | Controlled | Real-World | Controlled | Real-World | Controlled | Real-World |
| r_1 | 0.462653 | 0.0253597 | 0.57095 | 0.0488652 | 0.0611747 | -0.00134887 | 0.317332 | 0.0018773 |
| r_2 | -0.0556256 | -0.0103808 | 0.0361779 | 0.0255433 | -0.0551964 | -0.0163546 | -0.0328858 | -0.0556055 |
| r_all | 0.113472 | -0.000368441 | 0.0919915 | 0.0071217 | 0.101094 | -0.00129091 | 0.178342 | -0.00168576 |
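The coefficients `r_1`, `r_2`, and `r_all` above are Pearson's correlation coefficients as returned by `scipy.stats.pearsonr`. As a sketch, the coefficient can be computed directly as the covariance divided by the product of the standard deviations; the helper `pearson_r` and the example data are hypothetical. Because the normality tests above indicate non-normal samples, the sketch also computes Spearman's rank correlation, a common rank-based robustness check that the notebook itself does not use.

```python
import numpy as np
import scipy.stats as stats

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical stand-ins for 'total_items' (answer cardinality) and 'ms' (response time)
rng = np.random.default_rng(7)
total_items = rng.integers(1, 100, size=500)
ms = 20.0 + 0.5 * total_items + rng.normal(0.0, 10.0, size=500)

r_manual = pearson_r(total_items, ms)
r_scipy, p = stats.pearsonr(total_items, ms)
rho_spearman, _ = stats.spearmanr(total_items, ms)
print(f"Pearson r: {r_manual:.3f} (scipy: {r_scipy:.3f}), Spearman rho: {rho_spearman:.3f}")
```

Values close to 0, as in most cells of the table, indicate no linear relationship between the two variables.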


## Caching.

Again, we conduct a Kruskal-Wallis test to check the statistical significance of our results. This time, we compare, for each environment and each knowledge graph, whether there is a difference between the response times with a cold (uncached) and a warm (cached) cache.
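With only two groups, the Kruskal-Wallis test is equivalent to the two-sided Mann-Whitney U test without continuity correction: $H$ equals the square of the Mann-Whitney $z$-score, so both yield the same p-value. The sketch below illustrates this on hypothetical cold- and warm-cache response times and passes only the numeric response-time samples to the tests. It assumes SciPy 1.7 or later for the `method` argument of `mannwhitneyu`.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical response times (ms) with a cold and a warm cache; not study data
rng = np.random.default_rng(3)
uncached_ms = rng.lognormal(mean=4.5, sigma=0.5, size=300)
cached_ms = rng.lognormal(mean=4.0, sigma=0.5, size=300)

# Pass only the numeric response-time samples, not whole DataFrames
h, p_kw = stats.kruskal(uncached_ms, cached_ms)
# Asymptotic Mann-Whitney U test without continuity correction
u, p_mw = stats.mannwhitneyu(uncached_ms, cached_ms, use_continuity=False,
                             alternative="two-sided", method="asymptotic")
print(f"Kruskal-Wallis p = {p_kw:.3g}, Mann-Whitney U p = {p_mw:.3g}")
```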

### Test: Kruskal Wallis Test

In [6]:
# Select the data including cold and warm cache
study1_1 = df[(df['study'] == 1)]

for env in study1_1['Environment'].unique():
    print("Environment: {0}".format(env))
    for source in study1_1['Source'].unique():
        uncached = study1_1[(study1_1['Environment'] == env) 
                            & (study1_1['Source'] == source) 
                            & (study1_1['Cache'] == False)]
        cached = study1_1[(study1_1['Environment'] == env) 
                            & (study1_1['Source'] == source) 
                            & (study1_1['Cache'] == True)]
        # Compare the response times only, not the whole DataFrames
        stat, p = stats.kruskal(uncached['ms'], cached['ms'])
        print("Knowledge Graph: {0}".format(source))
        print("Kruskal Wallis Test: p-Value = " + str(p))

Environment: Controlled
Knowledge Graph: DBLP
Kruskal Wallis Test: p-Value = 0.0
Knowledge Graph: DBpedia
Kruskal Wallis Test: p-Value = 0.0
Knowledge Graph: GeoNames
Kruskal Wallis Test: p-Value = 0.0
Knowledge Graph: Wiktionary
Kruskal Wallis Test: p-Value = 0.0
Environment: Real-World
Knowledge Graph: DBLP
Kruskal Wallis Test: p-Value = 0.0
Knowledge Graph: DBpedia
Kruskal Wallis Test: p-Value = 0.0
Knowledge Graph: GeoNames
Kruskal Wallis Test: p-Value = 0.0
Knowledge Graph: Wiktionary
Kruskal Wallis Test: p-Value = 0.0


### Result

We find that the difference between the uncached and cached results is statistically significant at a level of $\alpha = 0.05$ for all knowledge graphs in both environments.

## KG / TPF Instance Relation.

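We conduct a Kruskal-Wallis test to check the statistical significance of our results. This time, we compare, for each knowledge graph in the controlled environment with a cold cache, whether the response times differ between a TPF server that hosts only this knowledge graph (single KG) and a TPF server that hosts multiple knowledge graphs (multiple KG).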

### Test: Kruskal Wallis Test

In [7]:
# Select the data
study4 = df[(df['study'] == 4) & (df['Environment'] == "Controlled") & (df['Cache'] == False)]

for source in study4['Source'].unique():
    single_kg = study4[(study4['Source'] == source) & (study4['KGs'] == "Single KG")]
    multiple_kg = study4[(study4['Source'] == source) & (study4['KGs'] == "Multiple KG")]
    # Compare the response times only, not the whole DataFrames
    stat, p = stats.kruskal(single_kg['ms'], multiple_kg['ms'])
    print("Knowledge Graph: {0}".format(source))
    print("Kruskal Wallis Test: p-Value = " + str(p))

Knowledge Graph: DBLP
Kruskal Wallis Test: p-Value = 0.0
Knowledge Graph: DBpedia
Kruskal Wallis Test: p-Value = 0.0
Knowledge Graph: GeoNames
Kruskal Wallis Test: p-Value = 0.0
Knowledge Graph: Wiktionary
Kruskal Wallis Test: p-Value = 0.0


### Result

We find that, for all four knowledge graphs, the response time differs significantly at a level of $\alpha = 0.05$ between a TPF server hosting only a single KG and a TPF server hosting multiple KGs.