
Some more improvements to README files.

1 parent 2b7a7eb commit c03b46c215fc26f085a31e4f7a05b9c4ffbcecc4 @lucadealfaro lucadealfaro committed with thumper Apr 12, 2010
Showing with 47 additions and 12 deletions.
+47 −12 README-batch
@@ -20,29 +20,30 @@ PREREQUISITES:
=============
See the main README file for how to build WikiTrust.
-Do a "make allopt" from the top level, as described there.
+You need to perform step 2) of the installation procedures.
You also need to have a wiki dump file.
+We recommend running all these commands under "screen", so that the
+commands do not terminate if you get disconnected from the host; they
+can take quite a long time for large wikis.
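+For example, a minimal sketch (the session name "wikitrust" is
+arbitrary):
+
+  $ screen -S wikitrust      # start a named screen session
+  ... run the commands described below ...
+
+Press Ctrl-a d to detach, and run "screen -r wikitrust" to reattach
+later.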
+
PROCESSING THE DUMP
===================
Processing the dump used to be a complex process, composed of several
steps. To facilitate the processing, we have written a wrapper file,
util/batch_process.py, which takes care of performing all steps
optimally, using the available multi-processing capabilities of your
-machine. Consequently, all the processing is reduced to two simple
-steps:
+machine. All you need to do is:
-Process the dumps:
-------------------
-cd util
-./batch_process.py --cmd_dir <path>/WikiTrust/analysis --dir <process_dir> <dump_file_name.xml.7z>
+ $ cd util
+ $ ./batch_process.py --cmd_dir <path>/WikiTrust/analysis --dir <process_dir> <dump_file_name.xml.7z>
Where:
-<path> is the path to WikiTrust
+ <path> is the path to WikiTrust
<process_dir> is the name of a directory that will be used.
This directory needs to be sufficiently large; as of September 2009,
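+
+As a concrete illustration (the paths and dump file name here are
+hypothetical), an invocation might look like:
+
+  $ ./batch_process.py --cmd_dir /home/user/WikiTrust/analysis \
+      --dir /data/wiki-process itwiki-20100101-pages-meta-history.xml.7z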
@@ -53,7 +54,7 @@ Notes:
The command batch_process.py has many options, which also allow you to
do the processing in a step-by-step fashion; run
- ./batch_process.py --help
+ $ ./batch_process.py --help
for more information. In particular, batch_process performs the
following phases:
@@ -74,11 +75,45 @@ do_all_it_revs.py to your needs.
Load the data in the db:
------------------------
-./load_db.sh <process_dir>/sql <logfile> | mysql -u dbuser dbname -p
+First, we need to load the Wikipedia data.
+
+* For a single file, do:
+
+ $ cd ../test-scripts
+ $ cp db_access_data.ini.sample db_access_data.ini
+
+ Edit db_access_data.ini to reflect the database information for the
+ wiki you are using. Then, load the revisions into the wiki:
+
+ $ python load_data.py <wiki-xml-file>
+
+ where <wiki-xml-file> is the uncompressed Wikipedia dump.
+ If the wiki is not empty, you can use the --clear_db option to
+ instruct load_data.py to erase any previous data in the wiki.
+ The above command uses mwdumper; see http://www.mediawiki.org/wiki/MWDumper
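+
+ For instance, a hypothetical invocation that also clears old data:
+
+ $ python load_data.py --clear_db itwiki-20100101-pages-meta-history.xml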
+
+* For loading many files, see the command util/load_all_files.sh
+
+Once that has been done, you need to load the SQL generated by
+the batch analysis. You can do that via:
+
+ $ ./load_db.sh <process_dir>/sql <wiki_user> <wiki_db> <wiki_password>
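+
+For instance, with hypothetical values (substitute your own database
+user, name, and password):
+
+ $ ./load_db.sh /data/wiki-process/sql wikiuser wikidb wikipass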
+
+Next, we have to load the user reputations. Do it with the following
+command:
+
+ $ cd ../analysis
+ $ cat <process_dir>/user_reputations.txt | \
+ ./load_reputations -db_user <db_user> -db_pass <db_pass> -db_name <db_name>
+
+Finally, you need to tell the WikiTrust.php Mediawiki extension where
+the annotated revisions are. They are stored in the filesystem,
+rather than in the database, to improve performance. You need to set
+the following variable in LocalSettings.php in the Mediawiki installation:
-You will have to type the mysql user password.
-Here, <logfile> is the name of a file where loading statistics will appear.
+ $wgWikiTrustBlobPath = "<process_dir>/buckets";
+where <process_dir> is the directory you selected when running
+util/batch_process.py.
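+
+For instance, if <process_dir> was /data/wiki-process (a hypothetical
+path), the line would read:
+
+ $wgWikiTrustBlobPath = "/data/wiki-process/buckets";
+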
...AND FINALLY...
=================
