UCSD DSC 180A Data Science Capstone, Section on Wikipedia Edit Wars

KengChiChang/DSC180A-Wiki-War

DSC 180A Capstone: Wikipedia Edit Wars

Wikipedia Edit Wars

How can we create a better Internet environment for exchanging knowledge and ideas?

  • TA: Keng-Chi Chang <kechang@ucsd.edu>
  • Discussion Section A02
    • Time: Wednesdays 9:00-9:50am
    • Location: Warren Lecture Hall 2113
  • Lab/Office Hours
    • Time: Fridays 9:00-10:00am or by appointment
    • Sign up here in advance
    • Location: CSE Basement (B250)

Schedule

| Week | Topic | Notes |
|------|-------|-------|
| 1 | Introduction | Slides |
| 2 | Background & Data I: Wisdom of the crowds and bias in articles | Slides |
| 3 | Background & Data II: Edit wars and controversies | Jan 24 Assignment 1 Due |
| 4 | Techniques I | |
| 5 | Techniques II | Slides |
| 6 | Replication Result I | Feb 15 Assignment 2 Due |
| 7 | Replication Result II | |
| 8 | Impacts and Ethics | |
| 9 | Work on Proposals | |
| 10 | Work on Proposals | |

Keywords

  • Tags: Social Media, Online Conflict, Information Control
  • Data: Unstructured text, Webpage metadata
  • Methods: Causal Inference, Natural Language Processing, Network Analysis

Background

Wikipedia is one of the largest collections of human knowledge, featuring open access and open contributions. However, edit wars are common, which has led the Wikimedia Foundation to set up certain policies.

One might argue that leaving edits unrestricted allows the free exchange of ideas and helps the community grow. Others might argue that certain restrictions can produce higher-quality content and attract better editors. The lessons from Wikipedia can point to possible directions for the future of social media.

This project will first replicate a study, conducted before 2014, that quantifies controversy in Wikipedia articles. It will then update the results to 2019, evaluate the measure's performance, and derive other ways to measure controversy.
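To make the idea of quantifying controversy concrete, here is a minimal sketch of one common building block: detecting reverts, where an edit restores the exact text of an earlier revision. This is an illustration under stated assumptions, not necessarily the method of the study being replicated. The `Revision` type, its field names, and `find_reverts` are hypothetical, and real edit histories would need to be parsed into this shape first.

```python
import hashlib
from typing import NamedTuple


class Revision(NamedTuple):
    editor: str  # hypothetical field: user name or IP address of the editor
    text: str    # hypothetical field: full article text after the edit


def find_reverts(history: list[Revision]) -> list[tuple[str, str]]:
    """Return (reverting_editor, reverted_editor) pairs, oldest first.

    Assumes `history` is ordered from oldest to newest revision.
    """
    seen: dict[str, int] = {}  # text digest -> index of latest revision with that text
    reverts: list[tuple[str, str]] = []
    for i, rev in enumerate(history):
        digest = hashlib.sha1(rev.text.encode("utf-8")).hexdigest()
        j = seen.get(digest)
        # A revert restores an earlier version and undoes at least one edit in between.
        if j is not None and j < i - 1:
            for undone in history[j + 1 : i]:
                if undone.editor != rev.editor:  # ignore self-reverts
                    reverts.append((rev.editor, undone.editor))
        seen[digest] = i
    return reverts


history = [
    Revision("A", "v1"),
    Revision("B", "v2"),
    Revision("A", "v1"),  # A restores v1, undoing B
    Revision("B", "v2"),  # B restores v2, undoing A
]
pairs = find_reverts(history)
```

In this toy history, editors A and B revert each other once each. Pairs of editors who revert each other are one plausible raw signal for an edit war, which a controversy score could then weight or aggregate.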

Data

All edit histories are publicly available through Wikimedia.

  • For the first part of the replication, we will use the (cleaned) Wikipedia data released by WikiWarMonitor.
  • For the second part of the replication, we will use the (raw) Wikipedia data released by Wikimedia Data Archives.

Assignments

Possible Projects

Note: Your proposal is NOT limited to edit wars. The replication is meant to give you hands-on experience with the data and help you identify possible directions.

  • Visualize the controversies across languages and across time
  • Combine text and edits to improve controversy detection
  • Detect and prevent systematic coordination to attack Wikipedia
  • Develop algorithms to make automatic judgments on edit wars, evaluate their effectiveness, and compare them with human judgments
  • Evaluate the effectiveness of Wikipedia's policies to combat coordinated attacks
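As a starting point for the "across time" idea above, the following sketch bins revert events by calendar month, producing counts that could be fed to any plotting library. It assumes timestamps in the form `2019-01-24T09:00:00Z`; the function name `reverts_per_month` and the sample data are hypothetical, and other data sources may use a different timestamp format.

```python
from collections import Counter
from datetime import datetime


def reverts_per_month(revert_times: list[str]) -> dict[str, int]:
    """Count revert events per 'YYYY-MM' month, sorted chronologically."""
    counts = Counter(
        # Assumed timestamp format; adjust the pattern for other sources.
        datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m")
        for t in revert_times
    )
    return dict(sorted(counts.items()))


# Made-up timestamps for illustration only.
sample = [
    "2019-02-15T00:00:00Z",
    "2019-01-24T09:00:00Z",
    "2019-01-30T10:30:00Z",
]
monthly = reverts_per_month(sample)
```

Running the same aggregation separately for each language edition would give comparable series for a cross-language view.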

Useful Resources
