Module for analyzing contributions to a topic on Wikipedia.
git clone https://github.com/WikiEducationFoundation/TopicContribs.git
cd TopicContribs
python3 setup.py install
> python3 -m topics.cmdline
cmdline

Usage:
    cmdline --dumps=<path_to_dumps> --out=<path_to_output_dir>
            [--apm=<article_project_path>] [--pl=<project_list_path>]
            [--threads=<num_threads>] [--verbose] [<cohort_file> ... ]
    cmdline (-h | --help)

Options:
    --dumps=<path_to_dumps>        Directory containing the metadata dumps
    --out=<path_to_output_dir>     Directory in which to put output files
    --apm=<article_project_path>   Path to a csv of page_id project_name pairs.
    --pl=<project_list_path>       Path to a csv with all project_name's that you
                                   would like to be included in the count.
    --threads=<num_threads>        Number of threads to be used. All available
                                   will be used if not specified.
    <cohort_file>                  File containing usernames of interest.
    -v, --verbose                  Generate verbose output.
The dumps in <path_to_dumps> must be full history dumps.
- For minimal size and maximal parallelization use
- If you want to use a single file
- If you already have the full text history dumps downloaded and you feel like
You can use mwdumps to download the latest set of dumps: https://github.com/kjschiroo/python-mwdumps
python3 -m mwdumps.cmdline --wiki=enwiki -v /path/to/save/dumps
This file provides a map between articles and the projects they are included in. We expect it to be a .csv of page_id, project_name pairs.
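To illustrate how such a map might be consumed, here is a minimal Python sketch that reads page_id, project_name pairs into a dictionary. The function name `load_article_projects`, the comma delimiter, and the absence of a header row are assumptions made for illustration; this is not necessarily how TopicContribs reads the file.

```python
import csv
import io
from collections import defaultdict


def load_article_projects(handle):
    """Map each page_id to the set of project names it belongs to.

    Assumes (hypothetically) two comma-separated columns per row,
    page_id then project_name, with no header row.
    """
    projects_by_page = defaultdict(set)
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        page_id, project_name = row[0].strip(), row[1].strip()
        projects_by_page[page_id].add(project_name)
    return dict(projects_by_page)


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "12,WikiProject Physics\n"
    "12,WikiProject Astronomy\n"
    "34,WikiProject Physics\n"
)
article_projects = load_article_projects(sample)
print(sorted(article_projects["12"]))
```

A page can appear on several rows, so the sketch collects project names per page in a set rather than keeping only the last one seen.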
Generating this file
This file can be produced by running
<user_database> with your user database.
This is a file listing all of the project names we are interested in. The names must match those in the project_name column of the article_project_path file in order for the corresponding pages to be counted.
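The matching requirement can be sketched in Python as follows. The helper `pages_in_projects` is hypothetical, and exact, case-sensitive string comparison is an assumption about what "match" means here, not a confirmed detail of the tool.

```python
def pages_in_projects(article_project_pairs, wanted_projects):
    """Return the page_ids that belong to at least one wanted project.

    Names are compared exactly (an assumption), so a project name that
    differs in case or spacing is treated as a different project.
    """
    wanted = set(wanted_projects)
    return {
        page_id
        for page_id, project_name in article_project_pairs
        if project_name in wanted
    }


# Made-up (page_id, project_name) pairs, for illustration only.
pairs = [
    ("12", "WikiProject Physics"),
    ("34", "WikiProject Chemistry"),
    ("56", "wikiproject physics"),  # differs in case, so it is not selected
]
selected = pages_in_projects(pairs, ["WikiProject Physics"])
print(sorted(selected))
```

Under this assumption, page 56 is silently left out of the count, which is why the names in the two files need to agree exactly.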
A file or set of files listing the usernames of the users we are interested in tracking. If multiple files are given, each is summed separately and written to its own output file.
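As a rough sketch of what "summed separately" could look like, the Python below keeps one running total per cohort. The one-username-per-line file layout, the helper names `load_cohort` and `count_edits_by_cohort`, and the sample data are all assumptions for illustration, not the tool's confirmed behavior.

```python
import io


def load_cohort(handle):
    """Read usernames from a file-like object, one per line (assumed layout)."""
    return {line.strip() for line in handle if line.strip()}


def count_edits_by_cohort(edit_usernames, cohorts):
    """Count edits per cohort; each cohort keeps its own independent total."""
    totals = {cohort_name: 0 for cohort_name in cohorts}
    for username in edit_usernames:
        for cohort_name, members in cohorts.items():
            if username in members:
                totals[cohort_name] += 1
    return totals


# Made-up cohorts and edit authors, for illustration only.
cohorts = {
    "class_a": load_cohort(io.StringIO("Alice\nBob\n")),
    "class_b": load_cohort(io.StringIO("Carol\n")),
}
edits = ["Alice", "Carol", "Alice", "Dave"]
totals = count_edits_by_cohort(edits, cohorts)
print(totals)
```

Here the edit by "Dave" belongs to no cohort, so it would only show up in a total covering all activity.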
We will output one timeseries file for each cohort_file and one extra general file for all activity.
You can use topicutils.tsvToCsv -i <input.tsv> -o <output.csv> to convert .tsv generated by the wmflabs databases to a