Code for importing, graphing and modeling AFT data

Article feedback research

Created to examine the value of project quality assessments (Good Article, Featured Article, etc.) in light of the broad user feedback available from the Article Feedback Tool (AFT).

Eventually we want to model the relationship between marginal changes in project quality assessment and user feedback.

Components

Currently:

  • feed.import.R
    • Ingests the feedback data and the list of project rated articles.
    • Contains three options for sourcing the data: local, remote and initial, explained below.
  • feed.plots.R, feed.Animations.R, feed.ggplot.R
    • For now, most of the substantial code lives here: plots for summary statistics and a few other things.
  • feed.reg.R
    • A simple linear regression, an auxiliary regression and some binomial/multinomial regressions.

Quality

All of this code comes from my personal use. I've tried to comment where possible and make sensible decisions, but I make no guarantees.

It is also not written in the style of an R package. The code expects you to run each script as needed (after the import script) in order to load the functions and objects into your environment. Then you can plot or model as needed using the supplied functions/objects.
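As a rough illustration of that workflow, the sketch below sources a list of scripts in order and stops early if any file is missing. The helper name run_scripts is hypothetical and not part of this repository; the only assumption taken from above is that the import script has to run first.

```r
# Hypothetical helper: source scripts in the given order, import script first.
run_scripts <- function(scripts, dir = ".") {
  paths <- file.path(dir, scripts)
  missing <- scripts[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing script(s): ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    # local = FALSE evaluates in the global environment, so the functions
    # and objects each script defines are available afterwards.
    source(p, local = FALSE)
  }
  invisible(scripts)
}

# Example call (commented out so the sketch runs without the repo files):
# run_scripts(c("feed.import.R", "feed.plots.R", "feed.reg.R"))
```

After this, the functions and objects defined by each script should be available at the console for plotting or modeling.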

License

  • All code is released under a CC-BY-SA license, as described and linked in license.md

Packages

Required

  • ggplot2 for plotting
  • boot for some bootstrapping (not strictly required, but it makes the code easier)
  • animation for saving animations (also requires ImageMagick)
  • MASS for a few regression models and utility functions
  • XML for reading from the MediaWiki API

Optional (used for miscellany, not inside the scripts proper)

  • xtable for printing tables
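
Before running anything, it may help to check which of these packages are installed. This is a minimal sketch using only base R; the package names come from the lists above, and the variable names are just for illustration.

```r
required <- c("ggplot2", "boot", "animation", "MASS", "XML")
optional <- c("xtable")

# requireNamespace() returns TRUE/FALSE without attaching the package.
is_installed <- vapply(c(required, optional), requireNamespace,
                       logical(1), quietly = TRUE)

missing_required <- required[!is_installed[required]]
if (length(missing_required) > 0) {
  message("Missing required packages: ",
          paste(missing_required, collapse = ", "),
          "\nInstall with: install.packages(c(",
          paste0('"', missing_required, '"', collapse = ", "), "))")
}
```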

Individual script notes

feed.import.R

Some things of note:

  • The options to download and create directories (initial) and to load from local files (local) expect a certain set of folders and will fail rather gracelessly if those folders are modified. This should be fine unless you change your working directory after loading the files to disk.
  • Depending on your internet connection, loading the files remotely will take a few minutes. Both remote and initial call the MediaWiki API to enumerate assessment categories and download the feedback CSV.
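
One way to make the folder problem easier to diagnose is to check the layout before loading. The sketch below is not what feed.import.R does; check_layout and the folder name in the example are hypothetical, and the real folder names are whatever the import script creates.

```r
# Hypothetical pre-flight check: stop with a readable message if the folders
# expected by the local option are not under the working directory.
check_layout <- function(dirs, root = getwd()) {
  missing <- dirs[!dir.exists(file.path(root, dirs))]
  if (length(missing) > 0) {
    stop("Expected folder(s) not found under ", root, ": ",
         paste(missing, collapse = ", "),
         ". Check the working directory or rerun the import with initial.")
  }
  invisible(TRUE)
}

# Example call with a placeholder folder name:
# check_layout(c("data"))
```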

feed.ggplot.R

  • Most of these plots use the full dataset without decimation, so they take a while to render.
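
If rendering time is a problem, one option is to plot a random subset of rows while exploring. This is a minimal base R sketch; decimate is a hypothetical helper, not a function from these scripts, and the default sample size is arbitrary.

```r
# Hypothetical helper: return at most n randomly chosen rows of a data frame.
decimate <- function(df, n = 10000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional, for reproducible subsets
  if (nrow(df) <= n) return(df)       # nothing to do for small data
  df[sample.int(nrow(df), n), , drop = FALSE]
}

# Example with made-up data:
demo <- data.frame(x = rnorm(50000), y = rnorm(50000))
small <- decimate(demo, n = 5000, seed = 1)
```

A random subset changes the look of dense scatter plots, so final figures are probably still best drawn from the full data.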

feed.reg.R

  • Two basic linear regressions. The first is a simple regression comparing rating average to other variables. The second is an auxiliary regression meant to back out the influence of article length on rating average.
  • The simple model is checked against a bootstrapped regression for a rough test of error structure.
  • Summary stats and a table of differences between assessed articles are also produced.
  • Two proportional odds models are fitted and bootstrapped to better illustrate how length, count and rating average relate to the likelihood of assessment. Plots are produced in this script because of the overhead of fitting the models repeatedly.
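
The steps above can be sketched on simulated data as follows. This is not the code in feed.reg.R: the data are made up, every column name (log_len, count, rating, assessment) is hypothetical, and the model formulas are assumptions chosen only to show the general shape of a simple regression, an auxiliary regression, a case-resampling bootstrap with boot and a proportional odds fit with MASS::polr.

```r
library(boot)  # boot() for the bootstrap
library(MASS)  # polr() for proportional odds models

set.seed(42)
n <- 500

# Simulated stand-in data; every column name here is hypothetical.
d <- data.frame(
  log_len = rnorm(n),                   # centered log article length
  count   = rpois(n, lambda = 20) + 1   # number of ratings per article
)
d$rating <- 3 + 0.2 * d$log_len + rnorm(n, sd = 0.5)

# Ordered assessment outcome driven by rating and length.
latent <- 0.8 * d$rating + 0.5 * d$log_len + rlogis(n)
d$assessment <- cut(latent,
                    breaks = quantile(latent, c(0, 0.6, 0.9, 1)),
                    labels = c("none", "GA", "FA"),
                    include.lowest = TRUE, ordered_result = TRUE)

# 1. Simple regression of rating average on the other variables.
simple <- lm(rating ~ log_len + count, data = d)

# 2. Auxiliary regression: the residuals are rating with length backed out.
aux <- lm(rating ~ log_len, data = d)
d$rating_adj <- resid(aux)

# 3. Bootstrap the simple model's coefficients by resampling rows.
coef_stat <- function(data, idx) {
  coef(lm(rating ~ log_len + count, data = data[idx, ]))
}
bt <- boot(d, statistic = coef_stat, R = 200)
boot_se <- apply(bt$t, 2, sd)  # bootstrap standard errors, one per coefficient

# 4. Proportional odds model for the likelihood of assessment.
po <- polr(assessment ~ rating + log_len + count, data = d, Hess = TRUE)
```

Comparing boot_se with the standard errors from summary(simple) gives the kind of rough check on error structure mentioned above.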