WikiSubarticle

This is the repository for the CSCW 2017 paper: Lin, Y., Yu, B., Hall, A., and Hecht, B. (2017). Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia. Proceedings of the 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2017). New York: ACM Press.

The repo contains three parts:

  1. The Java program WikiSubarticle in ./wikibrain_w_subarticle_plugin/
  2. The Python Flask program that serves the trained Subarticle classifiers in ./flask_classifiers/
  3. The ground truth ratings of subarticle candidates in ./gold_standard_datasets/, which allow you to train your own subarticle classifiers

Requirements

Java >= 1.7
Maven >= 2
Postgres >= 8.6
Python >= 3.5
Flask >= 0.12
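
As a quick sanity check of the Python-side requirements, the following minimal sketch compares version strings against the minimums listed above. The helper names (`parse_version`, `meets_minimum`) are hypothetical and not part of this repo; the Flask lookup uses `importlib.metadata`, which only exists on Python 3.8 and later, so on older interpreters the script simply reports that it could not determine the Flask version.

```python
import re
import sys

# Minimum versions taken from the Requirements list above.
MIN_PYTHON = (3, 5)
MIN_FLASK = (0, 12)


def parse_version(text):
    """Turn the leading numeric part of a version string into a tuple of ints.

    For example, '0.12.2' becomes (0, 12, 2); a suffix such as 'rc1' is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum):
    """Return True if version_text is at least the given minimum tuple."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    print("Python OK:", sys.version_info[:2] >= MIN_PYTHON)
    try:
        # importlib.metadata is available on Python 3.8+ only.
        from importlib.metadata import version
        print("Flask OK:", meets_minimum(version("flask"), MIN_FLASK))
    except ImportError:
        # Raised when importlib.metadata is missing or Flask is not installed.
        print("Flask version could not be determined")
```

Java, Maven, and Postgres are not covered by this sketch; check them with their own command-line tools.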

Instructions

Step 1 - Set up WikiSubarticle

The Java program WikiSubarticle relies on WikiBrain for the technical infrastructure needed to access Wikipedia content. Please follow the instructions on WikiBrain to set up this part.

Note: Currently, WikiSubarticle requires training the MilneWitten Semantic Relatedness module of WikiBrain. Please refer to this page for details on how to train the module.

Quick summary of the essential steps (explanations can be found in the links above):

  1. mvn generate-sources
  2. mvn -f wikibrain-utils/pom.xml clean compile exec:java -Dexec.mainClass=org.wikibrain.utils.ResourceInstaller
  3. screen -S subarticle_ingestions
  4. export JAVA_OPTS="-d64 -Xmx128000M -server"
  5. ./wb-java.sh org.wikibrain.Loader -l en,sv,de,nl,fr,ru,it,es,pl,vi,ja,pt,zh,uk,ca,fa,no,ar,fi,hu,id,ro,cs,ko,sr,simple -s fetchlinks -s download -s dumploader -s redirects -s wikitext -s lucene -s phrases -s concepts -s universal -s wikidata -s spatial -c customized.conf

Step 2 - Set up Python Flask

From ./flask_classifiers/, run python classifiers_server.py. This serves the trained subarticle classifiers through Flask, so you don't need to train your own model.
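
Before moving on to Step 3, it can help to confirm that the Flask server is actually accepting connections. The sketch below only tests whether a TCP port is open; it does not call any classifier endpoint. The host and port (127.0.0.1:5000) are an assumption based on the Flask development server's defaults, and `is_port_open` is a hypothetical helper, so check classifiers_server.py for the address it really binds to.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # ASSUMPTION: 127.0.0.1:5000 is the Flask development-server default;
    # classifiers_server.py may be configured to use a different address.
    if is_port_open("127.0.0.1", 5000):
        print("Classifier server is reachable")
    else:
        print("Nothing is listening on 127.0.0.1:5000")
```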

Step 3 - Run the Subarticle Classifier

./wb-java.sh org.wikibrain.cookbook.core.SubarticleClassifier [main article title] [lang_code] [type of dataset] [rating options] -c [configuration]

Specifications of the parameters:
[main article title]: the program will find the subarticles of this Wikipedia article
[lang_code]: three options "en" "es" "zh"
[type of dataset]: one option (currently) "popular"
[rating options]: three options "2" "2.5" "3"

The meaning of each parameter is explained in the paper.
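
To avoid launching the Java program with an unsupported value, the command can be assembled and validated first. The following is a minimal sketch, not part of this repo: `build_classifier_command` is a hypothetical helper, the allowed values are copied from the parameter list above, and the example title and the use of customized.conf (the configuration file named in Step 1) are illustrative assumptions. The script path ./wb-java.sh assumes you run it from the WikiBrain directory, as in Step 1.

```python
import shlex

# Allowed values, copied from the parameter specification above.
LANG_CODES = ("en", "es", "zh")
DATASET_TYPES = ("popular",)
RATING_OPTIONS = ("2", "2.5", "3")


def build_classifier_command(title, lang_code, dataset_type, rating, config):
    """Validate the parameters and return the command as an argument list."""
    if lang_code not in LANG_CODES:
        raise ValueError("lang_code must be one of %s" % (LANG_CODES,))
    if dataset_type not in DATASET_TYPES:
        raise ValueError("dataset type must be one of %s" % (DATASET_TYPES,))
    if rating not in RATING_OPTIONS:
        raise ValueError("rating option must be one of %s" % (RATING_OPTIONS,))
    return [
        "./wb-java.sh",  # ASSUMPTION: run from the WikiBrain directory
        "org.wikibrain.cookbook.core.SubarticleClassifier",
        title,
        lang_code,
        dataset_type,
        rating,
        "-c",
        config,
    ]


if __name__ == "__main__":
    # Example values only; the title and config file are assumptions.
    cmd = build_classifier_command(
        "Portland, Oregon", "en", "popular", "2.5", "customized.conf"
    )
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a single string keeps titles that contain spaces or commas intact if the command is later passed to `subprocess.run`.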
