Fast Commons Category Inspection (FastCCI) is an in-memory database for fast Commons category operations such as:
- Loop detection
- Deep traversal
- Category intersection
- Category subtraction
FastCCI can operate without depth limits on categories.
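The operations listed above can be illustrated with a small sketch. This is not FastCCI's actual implementation, only a plain breadth-first traversal over a hypothetical `children` mapping (category pageid to member pageids) with made-up pageids. The visited set is what makes unlimited-depth traversal safe when the category graph contains loops.

```python
from collections import deque

def descendants(children, root, max_depth=None):
    """Return {pageid: depth} for every page reachable from root.

    children is a hypothetical mapping of category pageid -> member pageids.
    max_depth=None means no depth limit.
    """
    seen = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if max_depth is not None and depth >= max_depth:
            continue  # do not descend below the requested depth
        for child in children.get(node, ()):
            if child not in seen:  # skipping visited pages breaks loops
                seen[child] = depth + 1
                queue.append(child)
    return seen

# Toy graph with made-up pageids; the edge 3 -> 1 closes a loop.
children = {1: [2, 3], 2: [4], 3: [1, 4, 5], 10: [4, 6]}

below_1 = set(descendants(children, 1))
below_10 = set(descendants(children, 10))
intersection = below_1 & below_10   # pages below both categories
subtraction = below_1 - below_10    # pages below 1 but not below 10
```

Intersection and subtraction then reduce to ordinary set operations on the two traversal results.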
fastcci_build_db builds the binary database files from an SQL dump of the categorylinks database.
fastcci_server is the database server backend that can be queried through HTTP.
The database is generated from a simple parent-child pageid table that is produced by a short SQL query. On Wikimedia Tool Labs this query can be launched with the mysql command given below. The text output is streamed into the fastcci_build_db command, which parses it and generates a binary database image consisting of the fastcci.cat index file and the fastcci.tree data file. Both files are saved to the current directory.
mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.labsdb commonswiki_p -e 'select /* SLOW_OK */ cl_from, page_id, cl_type from categorylinks,page where cl_type!="page" and page_namespace=14 and page_title=cl_to order by page_id;' --quick --batch --silent | ./fastcci_build_db
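To make the shape of the streamed data concrete, here is a minimal Python sketch that reads the same kind of rows into in-memory maps. It assumes tab-separated rows of cl_from, page_id, cl_type (the usual mysql batch output) and skips any row whose first field is not numeric, such as a possible header line. The function name `read_links` and the sample rows are made up for illustration; this is not how fastcci_build_db stores its data.

```python
import io
from collections import defaultdict

def read_links(stream):
    """Parse rows of 'cl_from<TAB>page_id<TAB>cl_type'.

    cl_from is the member page, page_id is the category that contains it.
    Returns two maps: category -> subcategories and category -> files.
    """
    subcats = defaultdict(list)
    files = defaultdict(list)
    for line in stream:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip blank lines and a possible header row
        child, parent, cl_type = int(fields[0]), int(fields[1]), fields[2]
        target = subcats if cl_type == "subcat" else files
        target[parent].append(child)
    return subcats, files

# Made-up sample rows standing in for the mysql output.
sample = io.StringIO(
    "cl_from\tpage_id\tcl_type\n"
    "20\t10\tsubcat\n"
    "501\t10\tfile\n"
    "502\t20\tfile\n"
)
subcats, files = read_links(sample)
```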
Start the server with ./fastcci_server PORT DATADIR, where PORT is the TCP port the server will listen on, and DATADIR is the path to the directory containing the fastcci.cat and fastcci.tree files.
The server can be queried through HTTP or WebSockets. The URLs are the same in both cases (except for the protocol part). The request string looks like an ordinary HTTP GET URL, so a server started on port 8080 can be queried with a command-line client such as curl.
A JavaScript callback mode delivers the response through the fastcci_callback function. This mode is activated by adding the t=js query parameter and value.
The query parameters are:
- c1: The primary category pageid (integer). This always has to be specified; otherwise the server returns an error 500.
- c2: The secondary category (or file) pageid.
- d1: The primary search depth (defaults to infinity).
- d2: The secondary search depth (defaults to infinity).
- a: The query action. Values can be:
  - and: Perform the intersection between categories c1 and c2.
  - not: Fetch files that are in category c1 but not in category c2.
  - list: List all files in and below category c1.
  - fqv: List all FP, QI, and VI files (in that order) in and below category c1.
  - path: Find the subcategory path from category c1 to file or category c2.
The server performs some sanity checking on the query parameters to make sure that the supplied pageids point to categories (or, where allowed, to files).
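As an illustration of how these parameters combine into a request, the following Python sketch assembles a query URL. The helper name `build_query` and the base URL with a root path are assumptions made for this example; only the parameter names and action values come from the description above.

```python
from urllib.parse import urlencode

ACTIONS = {"and", "not", "list", "fqv", "path"}

def build_query(base_url, c1, c2=None, d1=None, d2=None, a=None, js=False):
    """Build a FastCCI request URL (hypothetical helper).

    c1 is mandatory; parameters left as None are omitted so that the
    server applies its defaults (for example, unlimited depth).
    """
    if a is not None and a not in ACTIONS:
        raise ValueError(f"unknown action: {a!r}")
    params = {"c1": int(c1)}
    if c2 is not None:
        params["c2"] = int(c2)
    if d1 is not None:
        params["d1"] = int(d1)
    if d2 is not None:
        params["d2"] = int(d2)
    if a is not None:
        params["a"] = a
    if js:
        params["t"] = "js"  # switch to the JavaScript callback mode
    return base_url + "?" + urlencode(params)

# Made-up pageids; the base URL path is an assumption.
url = build_query("http://localhost:8080/", 123, c2=456, a="and")
```

For a WebSocket connection the same query string would be used with a different protocol part in `base_url`.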
The response is delivered in a simple text format with multiple lines. Each line starts with a keyword and may be followed by data. The keywords are:
- RESULT: followed by a |-separated list of up to 50 integer triplets of the form pageId,depth,tag. Each triplet stands for one image or category.
- NOPATH: indicates that no path was found for an a=path request.
- OUTOF: followed by an integer that is the total number of items in the calculated result (rather than the number of returned items). This can be either an exact number (for a=list) or an estimate.
- QUEUED: the immediate acknowledgement that the server has queued the current request.
- WAITING: sent to the client with one integer value representing the number of requests that are ahead in the queue and will be processed before the current request.
- WORKING: followed by two integers representing the current number of items found in c1 and c2. This response item is sent to the client every 0.2 s and shows the current state of the ongoing category traversal.
- DONE: indicates the end of the server transmission.
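A client can process this format line by line. The Python sketch below is one possible parser, not part of FastCCI. It assumes that the keyword is separated from its data by a single space and that the two WORKING integers are space-separated, neither of which is stated above. The sample response is made up.

```python
def parse_response(text):
    """Parse a FastCCI text response into a summary dict (sketch)."""
    summary = {"results": [], "outof": None, "nopath": False,
               "done": False, "status": []}
    for line in text.splitlines():
        # Assumption: keyword and data are separated by one space.
        keyword, _, data = line.strip().partition(" ")
        if keyword == "RESULT":
            for triplet in data.split("|"):
                if triplet:
                    page_id, depth, tag = (int(x) for x in triplet.split(","))
                    summary["results"].append((page_id, depth, tag))
        elif keyword == "OUTOF":
            summary["outof"] = int(data)
        elif keyword == "NOPATH":
            summary["nopath"] = True
        elif keyword == "DONE":
            summary["done"] = True
        elif keyword in ("QUEUED", "WAITING", "WORKING"):
            # Progress messages; keep them with their integer arguments.
            summary["status"].append((keyword, [int(x) for x in data.split()]))
    return summary

# Made-up response for illustration only.
sample = "QUEUED\nWAITING 2\nWORKING 10 4\nRESULT 101,0,0|102,1,0\nOUTOF 2\nDONE\n"
parsed = parse_response(sample)
```

Because RESULT lines carry at most 50 triplets each, the parser appends to one list so that results spread over several lines accumulate.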