Course Human-Centered Data Science ([HCDS](https://www.mi.fu-berlin.de/en/inf/groups/hcc/teaching/winter_term_2020_21/course_human_centered_data_science.html)) - Winter Term 2020/21 - [HCC](https://www.mi.fu-berlin.de/en/inf/groups/hcc/index.html) | [Freie Universität Berlin](https://www.fu-berlin.de/)

***
# Transparency

## [1] General understanding
> What is the model about and who is using it?

* What is your model about?

  * The model predicts whether an article will need speedy deletion, for example because it is spam, vandalism, or an attack page.


* Why is this model useful?

  * Anyone can edit articles, so it is necessary to guard against actions such as spam or vandalism.


* Who is using this model?

  * The model mainly helps Wikipedians find vandalising edits faster.


* Who are the stakeholders or users of ORES?

  * Everyone who uses or edits Wikipedia is a stakeholder of ORES, since malfunctions or biases in the algorithm can prevent people from getting the best and most accurate information possible.

* Why is this model useful to Wikipedia?

  * Wikipedia needs the model because the number of articles edited and added each day is too large for humans alone to review. Compared to its predecessors, it is also friendlier towards new editors, which is in Wikipedia's best interest: more people can contribute, and therefore more knowledge can be collected.

* What applications/projects/... within Wikipedia are using this model?

  * enwiki, testwiki, simplewiki and ptwiki are using this model.

## [2] API

In [None]:
import pandas as pd
import requests
import json
import pprint
import numpy as np

### https://ores.wikimedia.org/v3/scores/

The `scores` API call lists every project together with the models available for it and the version of each model.

In [None]:
call1 = requests.get('https://ores.wikimedia.org/v3/scores/')
scores = call1.json()
c1 = json.dumps(scores, indent = 1)
p1 = json.loads(c1)
pprint.pprint(c1, width= 200)

('{\n'
 ' "arwiki": {\n'
 '  "models": {\n'
 '   "articletopic": {\n'
 '    "version": "1.2.0"\n'
 '   },\n'
 '   "damaging": {\n'
 '    "version": "0.5.0"\n'
 '   },\n'
 '   "goodfaith": {\n'
 '    "version": "0.5.0"\n'
 '   }\n'
 '  }\n'
 ' },\n'
 ' "bnwiki": {\n'
 '  "models": {\n'
 '   "reverted": {\n'
 '    "version": "0.5.0"\n'
 '   }\n'
 '  }\n'
 ' },\n'
 ' "bswiki": {\n'
 '  "models": {\n'
 '   "damaging": {\n'
 '    "version": "0.5.0"\n'
 '   },\n'
 '   "goodfaith": {\n'
 '    "version": "0.5.0"\n'
 '   }\n'
 '  }\n'
 ' },\n'
 ' "cawiki": {\n'
 '  "models": {\n'
 '   "damaging": {\n'
 '    "version": "0.5.1"\n'
 '   },\n'
 '   "goodfaith": {\n'
 '    "version": "0.5.1"\n'
 '   }\n'
 '  }\n'
 ' },\n'
 ' "cswiki": {\n'
 '  "models": {\n'
 '   "articletopic": {\n'
 '    "version": "1.2.0"\n'
 '   },\n'
 '   "damaging": {\n'
 '    "version": "0.6.0"\n'
 '   },\n'
 '   "goodfaith": {\n'
 '    "version": "0.6.0"\n'
 '   }\n'
 '  }\n'
 ' },\n'
 ' "dewiki": {\n'
 '  "models": {\n'
 ' 

In [None]:
# Collect one row per (project, model) pair
rows = []
for project, info in scores.items():
    for model, details in info['models'].items():
        rows.append({'model': model, 'project': project, 'version': details['version']})

# DataFrame.append was removed in pandas 2.0; build the frame from the rows instead
model_df = pd.DataFrame(rows, columns=['model', 'project', 'version'])
model_df[model_df['model'] == 'draftquality']

|    | model        | project    | version |
|---:|:-------------|:-----------|:--------|
| 17 | draftquality | enwiki     | 0.2.1   |
| 71 | draftquality | ptwiki     | 0.2.1   |
| 81 | draftquality | simplewiki | 0.2.1   |
| 95 | draftquality | testwiki   | 0.2.1   |
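
The same nested response can also be queried without building a table. The sketch below is a minimal, offline illustration: `sample_scores` is a hand-copied excerpt of the responses shown in this notebook (no API call is made), and `projects_with_model` is a hypothetical helper name, not part of the ORES API.

```python
# Hand-copied excerpt of the /v3/scores/ response (not a live API call).
sample_scores = {
    "arwiki": {"models": {"articletopic": {"version": "1.2.0"},
                          "damaging": {"version": "0.5.0"},
                          "goodfaith": {"version": "0.5.0"}}},
    "bnwiki": {"models": {"reverted": {"version": "0.5.0"}}},
    "enwiki": {"models": {"damaging": {"version": "0.5.1"},
                          "draftquality": {"version": "0.2.1"}}},
}


def projects_with_model(scores, model):
    """Return {project: version} for every project that serves `model`."""
    return {
        project: info["models"][model]["version"]
        for project, info in scores.items()
        if model in info["models"]
    }


draft_projects = projects_with_model(sample_scores, "draftquality")
damaging_projects = projects_with_model(sample_scores, "damaging")
print(draft_projects)
print(damaging_projects)
```

Passing the full `scores` dictionary from the cell above instead of the excerpt should give the same projects as the filtered table.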


### https://ores.wikimedia.org/v3/scores/?model_info

This API call gives detailed information about:

*   environment
*   params
*   score schema
*   statistics
*   type
*   version

of all available models in each project. 
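
Because the full response runs to thousands of lines, it can help to look at the section names first. This is a minimal sketch, assuming the response keeps the project, `models`, model nesting seen in the other calls. `sample_info` is an abbreviated stand-in with placeholder values, the key spelling `score_schema` is an assumption, and `info_sections` is a hypothetical helper.

```python
# Abbreviated stand-in for a ?model_info response; empty dicts are placeholders.
sample_info = {
    "enwiki": {
        "models": {
            "draftquality": {
                "environment": {},
                "params": {},
                "score_schema": {},  # assumed key spelling
                "statistics": {},
                "type": "GradientBoosting",
                "version": "0.2.1",
            }
        }
    }
}


def info_sections(model_info):
    """Map (project, model) to the sorted names of its info sections."""
    return {
        (project, model): sorted(details)
        for project, info in model_info.items()
        for model, details in info["models"].items()
    }


sections = info_sections(sample_info)
print(sections[("enwiki", "draftquality")])
```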


In [None]:
call2 = requests.get('https://ores.wikimedia.org/v3/scores/?model_info')
model_info = call2.json()
c2 = json.dumps(model_info, indent= 2)
p2 = json.loads(c2)
pprint.pprint(c2, width= 200, depth= 2)


Streaming output truncated to the last 5000 lines.
 '              "c": 0.934,\n'
 '              "km": 0.954,\n'
 '              "sm": 0.961,\n'
 '              "taslak": 0.806\n'
 '            },\n'
 '            "macro": 0.915,\n'
 '            "micro": 0.848\n'
 '          },\n'
 '          "!precision": {\n'
 '            "labels": {\n'
 '              "b": 0.969,\n'
 '              "baslag\\u0131\\u00e7": 0.866,\n'
 '              "c": 0.95,\n'
 '              "km": 0.994,\n'
 '              "sm": 0.995,\n'
 '              "taslak": 0.707\n'
 '            },\n'
 '            "macro": 0.914,\n'
 '            "micro": 0.791\n'
 '          },\n'
 '          "!recall": {\n'
 '            "labels": {\n'
 '              "b": 0.94,\n'
 '              "baslag\\u0131\\u00e7": 0.89,\n'
 '              "c": 0.918,\n'
 '              "km": 0.917,\n'
 '              "sm": 0.93,\n'
 '              "taslak": 0.937\n'
 '            },\n'
 '            "macro": 0.922,\n'
 '         

### https://ores.wikimedia.org/v3/scores/enwiki

This is like the scores API call (showing available models and versions), except that it only shows results for the enwiki project.  

In [None]:
call3 = requests.get('https://ores.wikimedia.org/v3/scores/enwiki')
enwiki = call3.json()
c3 = json.dumps(enwiki, indent = 1)
p3 = json.loads(c3)
pprint.pprint(c3, width= 200)

('{\n'
 ' "enwiki": {\n'
 '  "models": {\n'
 '   "articlequality": {\n'
 '    "version": "0.8.2"\n'
 '   },\n'
 '   "articletopic": {\n'
 '    "version": "1.2.0"\n'
 '   },\n'
 '   "damaging": {\n'
 '    "version": "0.5.1"\n'
 '   },\n'
 '   "draftquality": {\n'
 '    "version": "0.2.1"\n'
 '   },\n'
 '   "drafttopic": {\n'
 '    "version": "1.2.0"\n'
 '   },\n'
 '   "goodfaith": {\n'
 '    "version": "0.5.1"\n'
 '   },\n'
 '   "wp10": {\n'
 '    "version": "0.8.2"\n'
 '   }\n'
 '  }\n'
 ' }\n'
 '}')


In [None]:
pd.DataFrame.from_dict(enwiki['enwiki']['models'])

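|         | articlequality | articletopic | damaging | draftquality | drafttopic | goodfaith | wp10  |
|:--------|:---------------|:-------------|:---------|:-------------|:-----------|:----------|:------|
| version | 0.8.2          | 1.2.0        | 0.5.1    | 0.2.1        | 1.2.0      | 0.5.1     | 0.8.2 |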


### https://ores.wikimedia.org/v3/scores/enwiki?models=draftquality&model_info

This is similar to the `model_info` call above, but it only returns the information for our model, draftquality.

In [None]:
call4 = requests.get('https://ores.wikimedia.org/v3/scores/enwiki?models=draftquality&model_info')
enwiki_model = call4.json()
c4 = json.dumps(enwiki_model, indent = 1)
p4 = json.loads(c4)
pprint.pprint(c4, width= 200)

('{\n'
 ' "enwiki": {\n'
 '  "models": {\n'
 '   "draftquality": {\n'
 '    "environment": {\n'
 '     "machine": "x86_64",\n'
 '     "platform": "Linux-4.9.0-11-amd64-x86_64-with-debian-9.12",\n'
 '     "processor": "",\n'
 '     "python_branch": "",\n'
 '     "python_build": [\n'
 '      "default",\n'
 '      "Sep 27 2018 17:25:39"\n'
 '     ],\n'
 '     "python_compiler": "GCC 6.3.0 20170516",\n'
 '     "python_implementation": "CPython",\n'
 '     "python_revision": "",\n'
 '     "python_version": "3.5.3",\n'
 '     "release": "4.9.0-11-amd64",\n'
 '     "revscoring_version": "2.8.2",\n'
 '     "system": "Linux",\n'
 '     "version": "#1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20)"\n'
 '    },\n'
 '    "params": {\n'
 '     "ccp_alpha": 0.0,\n'
 '     "center": false,\n'
 '     "criterion": "friedman_mse",\n'
 '     "init": null,\n'
 '     "label_weights": null,\n'
 '     "labels": [\n'
 '      "OK",\n'
 '      "spam",\n'
 '      "vandalism",\n'
 '      "attack"\n'
 '     ],\n'
 '    

### https://ores.wikimedia.org/v3/scores/enwiki?models=draftquality&revids=641962088

For this call we used the revision ID that is listed as an example on the Wikimedia GitHub page. With this ID, our model predicts the revision to be spam.

In [None]:
call5 = requests.get('https://ores.wikimedia.org/v3/scores/enwiki?models=draftquality&revids=641962088')
rev = call5.json()
c5 = json.dumps(rev, indent = 1)
p5 = json.loads(c5)
pprint.pprint(c5, width= 200)

('{\n'
 ' "enwiki": {\n'
 '  "models": {\n'
 '   "draftquality": {\n'
 '    "version": "0.2.1"\n'
 '   }\n'
 '  },\n'
 '  "scores": {\n'
 '   "641962088": {\n'
 '    "draftquality": {\n'
 '     "score": {\n'
 '      "prediction": "spam",\n'
 '      "probability": {\n'
 '       "OK": 0.11366053720369897,\n'
 '       "attack": 0.01792559000882414,\n'
 '       "spam": 0.793175595649904,\n'
 '       "vandalism": 0.07523827713757288\n'
 '      }\n'
 '     }\n'
 '    }\n'
 '   }\n'
 '  }\n'
 ' }\n'
 '}')
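
In this response the `prediction` field coincides with the label that has the highest probability. The sketch below reproduces that from the probabilities printed above (copied by hand, no API call) and also reports how far the top label is ahead of the runner-up. `top_label` is a hypothetical helper, not part of ORES.

```python
# Probabilities copied from the response for revision 641962088 above.
probability = {
    "OK": 0.11366053720369897,
    "attack": 0.01792559000882414,
    "spam": 0.793175595649904,
    "vandalism": 0.07523827713757288,
}


def top_label(probabilities):
    """Return the most probable label and its margin over the runner-up."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (best, best_p), (_, second_p) = ranked[0], ranked[1]
    return best, best_p - second_p


label, margin = top_label(probability)
total = sum(probability.values())
print(label, round(margin, 3), round(total, 6))
```

A small margin would suggest a borderline case that deserves a closer human look; here the margin is large.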


### https://ores.wikimedia.org/v3/scores/enwiki/641962088/draftquality?model_info

This call names a specific revision ID and model in the URL path and returns the model information, just as the `model_info` call does.

In [None]:
call6 = requests.get('https://ores.wikimedia.org/v3/scores/enwiki/641962088/draftquality?model_info')
rev = call6.json()
c6 = json.dumps(rev, indent = 1)
p6 = json.loads(c6)
pprint.pprint(c6, width= 200)

('{\n'
 ' "enwiki": {\n'
 '  "models": {\n'
 '   "draftquality": {\n'
 '    "environment": {\n'
 '     "machine": "x86_64",\n'
 '     "platform": "Linux-4.9.0-11-amd64-x86_64-with-debian-9.12",\n'
 '     "processor": "",\n'
 '     "python_branch": "",\n'
 '     "python_build": [\n'
 '      "default",\n'
 '      "Sep 27 2018 17:25:39"\n'
 '     ],\n'
 '     "python_compiler": "GCC 6.3.0 20170516",\n'
 '     "python_implementation": "CPython",\n'
 '     "python_revision": "",\n'
 '     "python_version": "3.5.3",\n'
 '     "release": "4.9.0-11-amd64",\n'
 '     "revscoring_version": "2.8.2",\n'
 '     "system": "Linux",\n'
 '     "version": "#1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20)"\n'
 '    },\n'
 '    "params": {\n'
 '     "ccp_alpha": 0.0,\n'
 '     "center": false,\n'
 '     "criterion": "friedman_mse",\n'
 '     "init": null,\n'
 '     "label_weights": null,\n'
 '     "labels": [\n'
 '      "OK",\n'
 '      "spam",\n'
 '      "vandalism",\n'
 '      "attack"\n'
 '     ],\n'
 '    

### https://ores.wikimedia.org/v3/scores/enwiki/641962088/draftquality?features=true

This gives an overview of the features the model uses, with their values for the given revision ID.

In [None]:
call7 = requests.get('https://ores.wikimedia.org/v3/scores/enwiki/641962088/draftquality?features=true')
rev = call7.json()
c7 = json.dumps(rev, indent = 1)
p7 = json.loads(c7)
pprint.pprint(c7, width= 200)

('{\n'
 ' "enwiki": {\n'
 '  "models": {\n'
 '   "draftquality": {\n'
 '    "version": "0.2.1"\n'
 '   }\n'
 '  },\n'
 '  "scores": {\n'
 '   "641962088": {\n'
 '    "draftquality": {\n'
 '     "features": {\n'
 '      "feature.english.sentiment.revision.negative_polarity": 0.125,\n'
 '      "feature.english.sentiment.revision.positive_polarity": 1.0,\n'
 '      "feature.english.stemmed.revision.stems_length": 93,\n'
 '      "feature.enwiki.main_article_templates": 0.0,\n'
 '      "feature.enwiki.revision.category_links": 0.0,\n'
 '      "feature.enwiki.revision.cite_templates": 0.0,\n'
 '      "feature.enwiki.revision.cn_templates": 0.0,\n'
 '      "feature.enwiki.revision.image_links": 0.0,\n'
 '      "feature.enwiki.revision.infobox_templates": 0.0,\n'
 '      "feature.len(<datasource.english.badwords.revision.matches>)": 0.0,\n'
 '      "feature.len(<datasource.english.dictionary.revision.dict_words>)": 21.0,\n'
 '      "feature.len(<datasource.english.dictionary.revision.non_dict_

### Feature Injection

To inject a feature we first have to check which features were used.

In [None]:
feat_inj = requests.get('https://ores.wikimedia.org/v3/scores/enwiki/21312312/articlequality?features')
fi = feat_inj.json()
c8 = json.dumps(fi, indent = 1)
p8 = json.loads(c8)
pprint.pprint(p8, width= 200)

{'enwiki': {'models': {'articlequality': {'version': '0.8.2'}},
            'scores': {'21312312': {'articlequality': {'features': {'feature.english.stemmed.revision.stems_length': 3926,
                                                                    'feature.enwiki.infobox_images': 0,
                                                                    'feature.enwiki.main_article_templates': 0.0,
                                                                    'feature.enwiki.revision.category_links': 3.0,
                                                                    'feature.enwiki.revision.cite_templates': 0.0,
                                                                    'feature.enwiki.revision.cn_templates': 0.0,
                                                                    'feature.enwiki.revision.image_links': 0.0,
                                                                    'feature.enwiki.revision.image_template': 0.0,
                         

Following the example from https://www.mediawiki.org/wiki/ORES/Feature_injection, the prediction should change, but it does not, so we could not determine what influence the feature injection has.

In [None]:
feat_inj = requests.get('https://ores.wikimedia.org/v3/scores/enwiki/21312312/articlequality?features/feature.enwiki.revision.cite_templates=13.0/feature.enwiki.revision.paragraphs_without_refs_total_length=0.0/feature.wikitext.revision.ref_tags=13.0/feature.wikitext.revision.templates=18.0')
fi = feat_inj.json()
c8 = json.dumps(fi, indent = 1)
p8 = json.loads(c8)
pprint.pprint(p8, width= 200)

{'enwiki': {'models': {'articlequality': {'version': '0.8.2'}},
            'scores': {'21312312': {'articlequality': {'score': {'prediction': 'Start',
                                                                 'probability': {'B': 0.23776933204655176,
                                                                                 'C': 0.13358945258161725,
                                                                                 'FA': 0.006881530486026957,
                                                                                 'GA': 0.009387151285779235,
                                                                                 'Start': 0.5878400472447988,
                                                                                 'Stub': 0.024532486355225997}}}}}}}
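
One possible explanation, offered as an assumption rather than a verified result: the request above joins the injected features with `/`, whereas query parameters are normally separated with `&`, so the service may not have seen any injected feature at all. The sketch below only builds and parses URLs with the standard library (no request is sent). `injection_url` is a hypothetical helper, and whether the prediction changes with the corrected URL still has to be checked against the live service.

```python
from urllib.parse import parse_qsl, urlencode

BASE = "https://ores.wikimedia.org/v3/scores"

# Query string as written in the cell above: features are joined with "/".
original_query = (
    "features/feature.enwiki.revision.cite_templates=13.0"
    "/feature.enwiki.revision.paragraphs_without_refs_total_length=0.0"
    "/feature.wikitext.revision.ref_tags=13.0"
    "/feature.wikitext.revision.templates=18.0"
)
# A standard query parser sees one oddly named parameter, not four features.
original_pairs = parse_qsl(original_query, keep_blank_values=True)

# The same four values, to be sent as separate query parameters.
injected = {
    "feature.enwiki.revision.cite_templates": 13.0,
    "feature.enwiki.revision.paragraphs_without_refs_total_length": 0.0,
    "feature.wikitext.revision.ref_tags": 13.0,
    "feature.wikitext.revision.templates": 18.0,
}


def injection_url(context, revid, model, features):
    """Build a scoring URL with injected features as '&'-separated parameters."""
    return f"{BASE}/{context}/{revid}/{model}?features&{urlencode(features)}"


url = injection_url("enwiki", 21312312, "articlequality", injected)
fixed_pairs = parse_qsl(url.split("?", 1)[1], keep_blank_values=True)
print(len(original_pairs), len(fixed_pairs))
print(url)
```

The resulting `url` can be passed to `requests.get` in place of the hand-written string to test the assumption.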


## [3] ML algorithm and training/test data
> Which machine learning model is underlying and what data is used to build the model?


* Check out `model_info` in detail.
* What does it tell you about the model performance?

  * It reports several metrics, such as precision, accuracy, and match rate. Accuracy, for example, is very high: it is well over 95 % for every category (OK, spam, attack, ...).

* You can visualise and explain your results regarding model performance.
* What data was used to train and test the model?

  * Articles that had already been reviewed and labelled "OK", "spam", "vandalism", or "attack" were used to train the model.

* What machine learning algorithm is your model using? Please explain briefly.

  * Gradient boosting: many weak models (here decision trees) are combined into one stronger model, with each new model trained to correct the mistakes of the ones before it.
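
To make the idea of models learning from each other's mistakes concrete, here is a toy sketch of gradient boosting in plain Python. It is not the ORES model: it assumes a regression task with squared loss, one-split trees ("stumps") and invented data, whereas draftquality is a multi-class classifier. Only the learning rate of 0.1 is borrowed from the model parameters.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a 'stump') to the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds, learning_rate=0.1):
    """Return a predictor built from stumps fitted to successive residuals."""
    base = sum(ys) / len(ys)  # round 0: always predict the mean
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented toy data: a step from about 1 to about 3.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

errors = [mse(gradient_boost(xs, ys, n), xs, ys) for n in (0, 1, 50)]
print(errors)
```

The training error shrinks as rounds are added, because every stump is fitted to what the previous rounds still get wrong.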

In [None]:
call = requests.get('https://ores.wikimedia.org/v3/scores/enwiki?models=draftquality&model_info')
model_info = call.json()

In [None]:
params = model_info['enwiki']['models']['draftquality']['params']

# DataFrame.append was removed in pandas 2.0; build the frame from a list of rows instead
params_df = pd.DataFrame(
    [{'params': key, 'value': value} for key, value in params.items()],
    columns=['params', 'value'],
)


In [None]:
params_df

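|    | params        | value                         |
|---:|:--------------|:------------------------------|
|  0 | ccp_alpha     | 0                             |
|  1 | center        | 0                             |
|  2 | criterion     | friedman_mse                  |
|  3 | init          |                               |
|  4 | label_weights |                               |
|  5 | labels        | [OK, spam, vandalism, attack] |
|  6 | learning_rate | 0.1                           |
|  7 | loss          | deviance                      |
|  8 | max_depth     | 5                             |
|  9 | max_features  | log2                          |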


In [None]:
statistics = model_info['enwiki']['models']['draftquality']['statistics']

# One row per (metric, aggregation) pair, e.g. "!f1 (macro)"
rows = [
    {'metrics': f'{metric} ({aggregation})', 'value': value}
    for metric, aggregations in statistics.items()
    for aggregation, value in aggregations.items()
]
# DataFrame.append was removed in pandas 2.0; build the frame from the rows instead
metrics_df = pd.DataFrame(rows, columns=['metrics', 'value'])


In [None]:
metrics_df

|    | metrics             | value                                             |
|---:|:--------------------|:--------------------------------------------------|
|  0 | !f1 (labels)        | {'OK': 0.656, 'attack': 0.998, 'spam': 0.986, ... |
|  1 | !f1 (macro)         | 0.908                                             |
|  2 | !f1 (micro)         | 0.665                                             |
|  3 | !precision (labels) | {'OK': 0.544, 'attack': 0.998, 'spam': 0.996, ... |
|  4 | !precision (macro)  | 0.884                                             |
|  5 | !precision (micro)  | 0.557                                             |
|  6 | !recall (labels)    | {'OK': 0.824, 'attack': 0.997, 'spam': 0.976, ... |
|  7 | !recall (macro)     | 0.946                                             |
|  8 | !recall (micro)     | 0.829                                             |
|  9 | accuracy (labels)   | {'OK': 0.975, 'attack': 0.995, 'spam': 0.973, ... |
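
To relate these metrics to each other, the sketch below computes precision, recall, F1 and accuracy for a single label from one-vs-rest confusion counts. The counts are invented for illustration and are not taken from ORES. They show how accuracy can stay very high for a rare label even when precision is low, which is why accuracy alone is a weak indicator of model performance.

```python
def one_vs_rest_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Invented counts for a rare label: 10 true cases among 1000 revisions.
metrics = one_vs_rest_metrics(tp=8, fp=12, fn=2, tn=978)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```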


In [None]:
print( "The used model is: ", model_info['enwiki']['models']['draftquality']['type'], ', Version: ', model_info['enwiki']['models']['draftquality']['version'] )

The used model is:  GradientBoosting , Version:  0.2.1


## [4] Features
> Which features are used and which have the greatest influence on the prediction?

* What features is your model using?
  * The used features can be seen in the output below.

* What do they mean?
  * There are several features, all of which describe the revision of the article. The first two, for example, capture whether the sentiment of the revision is more positive or more negative; another is the length of the revision. This information is used to decide whether the article is "OK", spam, or vandalism. For example, if the revision is very short or has only a few references, it is more likely to be spam or another problematic category.

* Which is the most important feature?
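  * The response lists feature values but does not rank the features by importance. For this revision, "feature.wikitext.revision.chars" has the largest value (498512), but a large raw value does not by itself show that this feature has the greatest influence on the prediction.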

* The API call `https://ores.wikimedia.org/v3/scores/enwiki/991379667/draftquality?features=true` is used in this example.

* Do all models (in all language editions of Wikipedia) use the same features?
  * No, since ORES is only available in 14 different languages.
  * In addition, not all projects support the "draftquality" model.

In [None]:
calli = requests.get('https://ores.wikimedia.org/v3/scores/enwiki/991379667/draftquality?features=true')
features = calli.json()

In [None]:
revision_features = features['enwiki']['scores']['991379667']['draftquality']['features']

# DataFrame.append was removed in pandas 2.0; build the frame from a list of rows instead
feature_df = pd.DataFrame(
    [{'feature': name, 'value': value} for name, value in revision_features.items()],
    columns=['feature', 'value'],
)

In [None]:
feature_df

|    | feature                                              | value    |
|---:|:-----------------------------------------------------|---------:|
|  0 | `feature.english.sentiment.revision.negative_po...`  | 689.403  |
|  1 | `feature.english.sentiment.revision.positive_po...`  | 1085.597 |
|  2 | `feature.english.stemmed.revision.stems_length`      | 247938.0 |
|  3 | `feature.enwiki.main_article_templates`              | 48.0     |
|  4 | `feature.enwiki.revision.category_links`             | 42.0     |
|  5 | `feature.enwiki.revision.cite_templates`             | 999.0    |
|  6 | `feature.enwiki.revision.cn_templates`               | 0.0      |
|  7 | `feature.enwiki.revision.image_links`                | 40.0     |
|  8 | `feature.enwiki.revision.infobox_templates`          | 1.0      |
|  9 | `feature.len(<datasource.english.badwords.revis...`  | 57.0     |


For other wiki projects, the features for this model are not available. 
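
Feature sets can also be compared between models with plain set operations. The sketch below compares the feature names of draftquality and articlequality on enwiki, copied by hand from the truncated outputs earlier in this notebook. Both lists are incomplete, so the differences it reports are illustrative only; running the same comparison on the full `features` dictionaries of two responses would give the real picture.

```python
# Feature names copied from the truncated outputs above; both sets are incomplete.
draftquality_features = {
    "feature.english.sentiment.revision.negative_polarity",
    "feature.english.sentiment.revision.positive_polarity",
    "feature.english.stemmed.revision.stems_length",
    "feature.enwiki.main_article_templates",
    "feature.enwiki.revision.category_links",
    "feature.enwiki.revision.cite_templates",
    "feature.enwiki.revision.cn_templates",
    "feature.enwiki.revision.image_links",
    "feature.enwiki.revision.infobox_templates",
}
articlequality_features = {
    "feature.english.stemmed.revision.stems_length",
    "feature.enwiki.infobox_images",
    "feature.enwiki.main_article_templates",
    "feature.enwiki.revision.category_links",
    "feature.enwiki.revision.cite_templates",
    "feature.enwiki.revision.cn_templates",
    "feature.enwiki.revision.image_links",
    "feature.enwiki.revision.image_template",
}

shared = draftquality_features & articlequality_features
only_draftquality = draftquality_features - articlequality_features
only_articlequality = articlequality_features - draftquality_features

print("shared:", sorted(shared))
print("only in draftquality excerpt:", sorted(only_draftquality))
print("only in articlequality excerpt:", sorted(only_articlequality))
```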

***

#### Credits

We release the notebooks under the [Creative Commons Attribution license (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).