Course Human-Centered Data Science ([HCDS](https://www.mi.fu-berlin.de/en/inf/groups/hcc/teaching/winter_term_2020_21/course_human_centered_data_science.html)) - Winter Term 2020/21 - [HCC](https://www.mi.fu-berlin.de/en/inf/groups/hcc/index.html) | [Freie Universität Berlin](https://www.fu-berlin.de/)

***

# A4 - Transparency
Please use the following structure as a starting point. Extend and change the notebook according to your needs. This structure should guide you through your analysis. This notebook is the foundation for condensing your results and writing your reflection at the end, so please first read what we expect from the reflection and structure your analysis accordingly.

## [1] General understanding
> What is the model about and who is using it?

* What is your model about?

In general, the `good-faith` model supports the ORES effort to provide an automated and transparent quality control mechanism for revisions of Wikipedia articles.
Like the whole ORES project, this model is publicly available under the [Creative Commons Zero public domain dedication](https://creativecommons.org/publicdomain/zero/1.0/).
It predicts whether an edit to an article was probably made in good faith.

* Why is this model useful?

It was trained on human judgement to separate edits made by spammers or vandals from edits made by inexperienced or first-time contributors.
With this, every `good-faith` newcomer can then be directed to an appropriate mentoring place.


* Who is using this model?

Editors, WikiProject maintainers, and Wikipedian mentors use it to observe the activity of newly registered users.


* Who are the stakeholders or users of ORES?

Editors, developers from the 'Wiki Artificial Intelligence' team, team members from the Wikimedia Scoring Platform, ...

* Why is this model useful to Wikipedia?

After Wikimedia introduced automated article reviews, they saw a drop in edit requests from newly registered members.
The automated scoring system judged their edits as falling short of Wikimedia's standards for an acceptable edit. The newcomers simply did not know what makes a good article edit. The `good-faith` model was introduced to detect when a user edits an article with good intentions, so that they can be directed to an appropriate mentoring place.

* What applications/projects/... within Wikipedia are using this model?

[Snuggle](https://en.wikipedia.org/wiki/Wikipedia:Snuggle)


## [2] API
> What does the ORES API (v3) tell you about a specific model? What functions does the API offer?

Use the API to investigate your model: https://ores.wikimedia.org/v3/#/. What do the following API calls do and what do they tell you about your model?

* `https://ores.wikimedia.org/v3/scores/`
* `https://ores.wikimedia.org/v3/scores/?model_info`
* `https://ores.wikimedia.org/v3/scores/enwiki`
* `https://ores.wikimedia.org/v3/scores/enwiki?models=YOURMODELNAME&model_info`
* `https://ores.wikimedia.org/v3/scores/enwiki?models=YOURMODELNAME&revids=SOMEIDHERE`
* `https://ores.wikimedia.org/v3/scores/enwiki/REVID/YOURMODELNAME?model_info`
* `https://ores.wikimedia.org/v3/scores/enwiki/REVID/YOURMODELNAME?features=true`

### Feature Injection
Please check out the _feature injection_ feature of ORES: https://www.mediawiki.org/wiki/ORES/Feature_injection

**Example:**

     # Here you can get the prediction for a revision as if the user had been anonymous:
     https://ores.wikimedia.org/v3/scores/enwiki/991397091/damaging?features&feature.revision.user.is_anon=true
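
The example URL above can also be assembled programmatically. The following is a minimal sketch, not part of the ORES tooling: the helper name `injection_url` and the constant `BASE_URL` are assumptions made for illustration. It writes the flag as `features=true` (the form used in the API call list above) rather than the bare `features` of the example URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://ores.wikimedia.org/v3/scores"

def injection_url(wiki, rev_id, model, injected):
    """Build a scoring URL that overrides selected feature values.

    Hypothetical helper for illustration only.
    """
    # 'features' asks ORES to return the feature values next to the score.
    params = {"features": "true"}
    for name, value in injected.items():
        # Write booleans in lowercase, as in the example URL.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    return f"{BASE_URL}/{wiki}/{rev_id}/{model}?{urlencode(params)}"

url = injection_url(
    "enwiki", 991397091, "damaging",
    {"feature.revision.user.is_anon": True},
)
print(url)
```

Comparing the score from this URL with the score from the same URL without the injected feature shows how much that single feature moves the prediction for the revision.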

In [28]:
import requests
import json
import urllib.parse

# Customize these with your own information
headers = {
    'User-Agent': 'https://github.com/Arne117',
    'From': 'arner92@fu-berlin.de'
}

def get_ores_data(path, params):
    """Call an ORES v3 scores endpoint and return the parsed JSON response."""
    # Example: path='/enwiki', params={'models': 'goodfaith', 'revids': rev_id}
    base_url = 'https://ores.wikimedia.org/v3/scores'
    url = base_url + path

    # Append the query string only when parameters are given
    if params:
        url += '?' + urllib.parse.urlencode(params)

    print(url)

    # Send the identifying headers defined above with every request
    api_call = requests.get(url, headers=headers)

    # .json() already returns plain Python objects, so no extra round trip is needed
    return api_call.json()

In [29]:
# https://ores.wikimedia.org/v3/scores/
get_ores_data('/', {})

https://ores.wikimedia.org/v3/scores/


{'arwiki': {'models': {'articletopic': {'version': '1.2.0'},
   'damaging': {'version': '0.5.0'},
   'goodfaith': {'version': '0.5.0'}}},
 'bnwiki': {'models': {'reverted': {'version': '0.5.0'}}},
 'bswiki': {'models': {'damaging': {'version': '0.5.0'},
   'goodfaith': {'version': '0.5.0'}}},
 'cawiki': {'models': {'damaging': {'version': '0.5.1'},
   'goodfaith': {'version': '0.5.1'}}},
 'cswiki': {'models': {'articletopic': {'version': '1.2.0'},
   'damaging': {'version': '0.6.0'},
   'goodfaith': {'version': '0.6.0'}}},
 'dewiki': {'models': {'damaging': {'version': '0.5.1'},
   'goodfaith': {'version': '0.5.1'}}},
 'elwiki': {'models': {'reverted': {'version': '0.5.0'}}},
 'enwiki': {'models': {'articlequality': {'version': '0.8.2'},
   'articletopic': {'version': '1.2.0'},
   'damaging': {'version': '0.5.1'},
   'draftquality': {'version': '0.2.1'},
   'drafttopic': {'version': '1.2.0'},
   'goodfaith': {'version': '0.5.1'},
   'wp10': {'version': '0.8.2'}}},
 'enwiktionary': {'mo

In [30]:
# Task URL: https://ores.wikimedia.org/v3/scores/?model_info
# Note: the call below sends ?models=goodfaith instead (see the printed URL)
get_ores_data('/', { 'models': 'goodfaith' })

https://ores.wikimedia.org/v3/scores/?models=goodfaith


{'arwiki': {'models': {'articletopic': {'version': '1.2.0'},
   'damaging': {'version': '0.5.0'},
   'goodfaith': {'version': '0.5.0'}}},
 'bnwiki': {'models': {'reverted': {'version': '0.5.0'}}},
 'bswiki': {'models': {'damaging': {'version': '0.5.0'},
   'goodfaith': {'version': '0.5.0'}}},
 'cawiki': {'models': {'damaging': {'version': '0.5.1'},
   'goodfaith': {'version': '0.5.1'}}},
 'cswiki': {'models': {'articletopic': {'version': '1.2.0'},
   'damaging': {'version': '0.6.0'},
   'goodfaith': {'version': '0.6.0'}}},
 'dewiki': {'models': {'damaging': {'version': '0.5.1'},
   'goodfaith': {'version': '0.5.1'}}},
 'elwiki': {'models': {'reverted': {'version': '0.5.0'}}},
 'enwiki': {'models': {'articlequality': {'version': '0.8.2'},
   'articletopic': {'version': '1.2.0'},
   'damaging': {'version': '0.5.1'},
   'draftquality': {'version': '0.2.1'},
   'drafttopic': {'version': '1.2.0'},
   'goodfaith': {'version': '0.5.1'},
   'wp10': {'version': '0.8.2'}}},
 'enwiktionary': {'mo

In [31]:
# https://ores.wikimedia.org/v3/scores/enwiki
get_ores_data('/enwiki', {})

https://ores.wikimedia.org/v3/scores/enwiki


{'enwiki': {'models': {'articlequality': {'version': '0.8.2'},
   'articletopic': {'version': '1.2.0'},
   'damaging': {'version': '0.5.1'},
   'draftquality': {'version': '0.2.1'},
   'drafttopic': {'version': '1.2.0'},
   'goodfaith': {'version': '0.5.1'},
   'wp10': {'version': '0.8.2'}}}}
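
The listings returned by these calls are plain nested dictionaries, so they can be filtered directly. The sketch below is an illustration under stated assumptions: `listing` is a hand-copied excerpt of the output shown above (four wikis only), and `wikis_with_model` is a hypothetical helper name.

```python
# Excerpt of the /v3/scores/ listing shown above (not the full response).
listing = {
    "arwiki": {"models": {"articletopic": {"version": "1.2.0"},
                          "damaging": {"version": "0.5.0"},
                          "goodfaith": {"version": "0.5.0"}}},
    "bnwiki": {"models": {"reverted": {"version": "0.5.0"}}},
    "dewiki": {"models": {"damaging": {"version": "0.5.1"},
                          "goodfaith": {"version": "0.5.1"}}},
    "enwiki": {"models": {"articlequality": {"version": "0.8.2"},
                          "damaging": {"version": "0.5.1"},
                          "goodfaith": {"version": "0.5.1"}}},
}

def wikis_with_model(listing, model):
    """Map each wiki that offers `model` to the version it serves."""
    return {
        wiki: info["models"][model]["version"]
        for wiki, info in listing.items()
        if model in info["models"]
    }

goodfaith_wikis = wikis_with_model(listing, "goodfaith")
print(goodfaith_wikis)
```

Applied to the full response, the same helper would show which language editions serve the model and whether they run the same version.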

In [32]:
# https://ores.wikimedia.org/v3/scores/enwiki?models=goodfaith&model_info
get_ores_data('/enwiki', { 'models': 'goodfaith', 'model_info': '' })

https://ores.wikimedia.org/v3/scores/enwiki?models=goodfaith&model_info=


{'enwiki': {'models': {'goodfaith': {'environment': {'machine': 'x86_64',
     'platform': 'Linux-4.9.0-11-amd64-x86_64-with-debian-9.12',
     'processor': '',
     'python_branch': '',
     'python_build': ['default', 'Sep 27 2018 17:25:39'],
     'python_compiler': 'GCC 6.3.0 20170516',
     'python_implementation': 'CPython',
     'python_revision': '',
     'python_version': '3.5.3',
     'release': '4.9.0-11-amd64',
     'revscoring_version': '2.8.0',
     'system': 'Linux',
     'version': '#1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20)'},
    'params': {'ccp_alpha': 0.0,
     'center': True,
     'criterion': 'friedman_mse',
     'init': None,
     'label_weights': {'false': 10},
     'labels': [True, False],
     'learning_rate': 0.01,
     'loss': 'deviance',
     'max_depth': 7,
     'max_features': 'log2',
     'max_leaf_nodes': None,
     'min_impurity_decrease': 0.0,
     'min_impurity_split': None,
     'min_samples_leaf': 1,
     'min_samples_split': 2,
     'min_weight_fr

In [36]:
# https://ores.wikimedia.org/v3/scores/enwiki?models=goodfaith&revids=SOMEIDHERE
get_ores_data('/enwiki', { 'models': 'goodfaith', 'revids': 807483153 })

# GA rev id: 807483153
# C rev id: 807483006
# Stub rev id: 355319463

https://ores.wikimedia.org/v3/scores/enwiki?models=goodfaith&revids=807483153


{'enwiki': {'models': {'goodfaith': {'version': '0.5.1'}},
  'scores': {'807483153': {'goodfaith': {'score': {'prediction': True,
      'probability': {'false': 0.04211422585711333,
       'true': 0.9578857741428867}}}}}}}
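
To work with the score rather than read it off the nested output, the relevant fields can be pulled out by key. The sketch below reuses the response printed above as a literal; `summarize_score` is a hypothetical helper name, not part of the notebook or of ORES.

```python
# The response shown above, copied as a literal so the example runs offline.
response = {
    "enwiki": {
        "models": {"goodfaith": {"version": "0.5.1"}},
        "scores": {
            "807483153": {
                "goodfaith": {
                    "score": {
                        "prediction": True,
                        "probability": {
                            "false": 0.04211422585711333,
                            "true": 0.9578857741428867,
                        },
                    }
                }
            }
        },
    }
}

def summarize_score(response, wiki, rev_id, model):
    """Return (prediction, probability of the 'true' label) for one revision."""
    # Revision ids are string keys in the response, even if passed as int.
    score = response[wiki]["scores"][str(rev_id)][model]["score"]
    return score["prediction"], score["probability"]["true"]

prediction, p_true = summarize_score(response, "enwiki", 807483153, "goodfaith")
print(f"good faith: {prediction} (p = {p_true:.3f})")
```

For this revision the model predicts a good-faith edit with a probability of about 0.958.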

In [37]:
# https://ores.wikimedia.org/v3/scores/enwiki/REVID/goodfaith?model_info
get_ores_data('/enwiki/807483153/goodfaith', { 'model_info': '' })

https://ores.wikimedia.org/v3/scores/enwiki/807483153/goodfaith?model_info=


{'enwiki': {'models': {'goodfaith': {'environment': {'machine': 'x86_64',
     'platform': 'Linux-4.9.0-11-amd64-x86_64-with-debian-9.12',
     'processor': '',
     'python_branch': '',
     'python_build': ['default', 'Sep 27 2018 17:25:39'],
     'python_compiler': 'GCC 6.3.0 20170516',
     'python_implementation': 'CPython',
     'python_revision': '',
     'python_version': '3.5.3',
     'release': '4.9.0-11-amd64',
     'revscoring_version': '2.8.0',
     'system': 'Linux',
     'version': '#1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20)'},
    'params': {'ccp_alpha': 0.0,
     'center': True,
     'criterion': 'friedman_mse',
     'init': None,
     'label_weights': {'false': 10},
     'labels': [True, False],
     'learning_rate': 0.01,
     'loss': 'deviance',
     'max_depth': 7,
     'max_features': 'log2',
     'max_leaf_nodes': None,
     'min_impurity_decrease': 0.0,
     'min_impurity_split': None,
     'min_samples_leaf': 1,
     'min_samples_split': 2,
     'min_weight_fr

In [None]:
# https://ores.wikimedia.org/v3/scores/enwiki/REVID/goodfaith?features=true
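
The cell above is left unexecuted. As a minimal sketch of how the call could be prepared and its result read, the helpers below build the URL and extract the feature values. Both helper names are hypothetical, and the response layout (a `features` dictionary next to `score` under the model key) is an assumption that should be checked against a real response. The `sample` fragment contains a placeholder value, not real API output.

```python
from urllib.parse import urlencode

def features_url(wiki, rev_id, model):
    """Build the scoring URL that also requests the feature values."""
    query = urlencode({"features": "true"})
    return f"https://ores.wikimedia.org/v3/scores/{wiki}/{rev_id}/{model}?{query}"

def extract_features(response, wiki, rev_id, model):
    """Return the feature dictionary for one revision, or {} if absent.

    Assumes the features sit next to 'score' under the model key.
    """
    return response[wiki]["scores"][str(rev_id)][model].get("features", {})

# Hypothetical response fragment with a placeholder feature value.
sample = {
    "enwiki": {
        "scores": {
            "807483153": {
                "goodfaith": {
                    "features": {"feature.revision.user.is_anon": False},
                    "score": {"prediction": True},
                }
            }
        }
    }
}

print(features_url("enwiki", 807483153, "goodfaith"))
print(extract_features(sample, "enwiki", 807483153, "goodfaith"))
```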

## [3] ML algorithm and training/test data
> Which machine learning model is underlying and what data is used to build the model?

* Check out `model_info` in detail.
* What does it tell you about the model performance?
* You can visualise and explain your results regarding model performance.
* What data was used to train and test the model?
* What machine learning algorithm is your model using? Please explain briefly.

## [4] Features
> Which features are used and which have the greatest influence on the prediction?

* What features is your model using?
* What do they mean?
* Which are the most important features?
* `https://ores.wikimedia.org/v3/scores/enwiki/991379667/articlequality?features=true`
* Are all models (in all language versions of Wikipedia) using the same features?

## Sample code

***

#### Credits

We release the notebooks under the [Creative Commons Attribution license (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).