This repository has been archived by the owner on Nov 25, 2019. It is now read-only.

aardc can break on FreeBSD without --nomp; lack of default semaphore support #8

Open
GreenReaper opened this issue May 15, 2011 · 0 comments

@GreenReaper

I am running FreeBSD 8.1 in a jail on an 8-core machine, with a Python virtualenv set up as described.

Multiprocessing mode fails with the default settings of the python26 port. With --nomp it works fine but takes a while (9:14, 39.5/sec). This is because, by default, the semaphore support that the multiprocessing module relies on does not work on FreeBSD.
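Before running aardc without --nomp, you can check whether a given Python build has usable semaphores by trying to create one directly. The following is a minimal sketch. The helper name `semaphores_work` is hypothetical and not part of aardtools. `multiprocessing.Lock()` imports `multiprocessing.synchronize`, which raises the "sem_open ... see issue 3770" ImportError on builds without sem_open. Where sem_open is compiled in but the kernel or jail refuses to create semaphores, allocating the lock may raise OSError instead.

```python
import multiprocessing


def semaphores_work():
    """Return True if this Python can create multiprocessing semaphores.

    Hypothetical helper for illustration; not part of aardtools.
    """
    try:
        # Lock() imports multiprocessing.synchronize, which raises ImportError
        # when the build lacks a working sem_open. If sem_open exists but the
        # kernel cannot provide semaphores, creating the lock raises OSError.
        lock = multiprocessing.Lock()
    except (ImportError, OSError):
        return False
    # Exercise the lock once so a half-working implementation fails here too.
    lock.acquire()
    lock.release()
    return True


if __name__ == "__main__":
    if semaphores_work():
        print("multiprocessing semaphores OK; aardc should work without --nomp")
    else:
        print("no working semaphores; use aardc --nomp or rebuild Python")
```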

This is not technically an aarddict problem, but it would be a good idea to check whether Pool creation failed and give a clearer error message, such as "You need a working Python multiprocessing module, which may involve enabling pth or kernel semaphores in your config when compiling Python." This could also be mentioned on the website.
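The suggested check might look roughly like the following sketch. This is not aardtools code: `create_worker_pool`, `PoolUnavailable` and `POOL_HINT` are hypothetical names. The initializer and its arguments are passed through unchanged; the traceback shows aardtools passing `initargs=[cdbdir, self.lang, self.rtl]`. Both exceptions that a broken semaphore setup can raise are caught and re-raised as a single error that carries the hint. The hint also mentions --nomp, since that mode is reported to work.

```python
import multiprocessing

# Hypothetical hint text, based on the message proposed in this issue.
POOL_HINT = (
    "You need a working Python multiprocessing module, which may involve "
    "enabling pth or kernel semaphores in your config when compiling Python. "
    "Alternatively, run aardc with --nomp."
)


class PoolUnavailable(RuntimeError):
    """Hypothetical error raised when a worker pool cannot be created."""


def create_worker_pool(processes=None, initializer=None, initargs=(),
                       pool_factory=multiprocessing.Pool):
    """Create a worker pool, turning semaphore failures into a clear error.

    pool_factory defaults to multiprocessing.Pool; it is a parameter only so
    the failure path can be exercised without a broken platform.
    """
    try:
        return pool_factory(processes, initializer, initargs)
    except (ImportError, OSError) as exc:
        # ImportError: the build lacks sem_open (Python issue 3770).
        # OSError: sem_open exists but the kernel refuses to create semaphores.
        raise PoolUnavailable("Could not create a worker pool (%s). %s"
                              % (exc, POOL_HINT))
```

A caller such as `reset_pool` could use this in place of a direct `multiprocessing.Pool(...)` call and print `str(exc)` on `PoolUnavailable`. This replaces the later `AttributeError: 'NoneType' object has no attribute 'close'`.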

With the experimental kernel semaphores option enabled in the port config, I got 94.89/s, though, rather disturbingly, the resulting file was slightly smaller than with --nomp. Using the port's pth support instead (GNU Pth, http://www.gnu.org/software/pth/) gave slightly better performance (100.1/s) and a larger file.

Here's my initial output:

(env-aard)[wikifur@wikifur ~]$ aardc wiki wikifur.en.cdb --siteinfo wikifur.en.json --wiki-lang en
/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/compiler.py:31: DeprecationWarning: Module 'PyICU' is deprecated, import 'icu' instead'
  from PyICU import Locale, Collator
Session dir ./aardc-1305488951-65
texvc: not found
blahtexml: not found
Writing log to ./aardc-1305488951-65/log
Converting wikifur.en.cdb
total: 21911
Traceback (most recent call last):
  File "/usr/home/wikifur/env-aard/bin/aardc", line 8, in <module>
    load_entry_point('aardtools==0.8.3', 'console_scripts', 'aardc')()
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/compiler.py", line 1093, in main
    converter.collect_articles(converter.make_input(input_file), options, compiler)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 250, in collect_articles
    p.parse(input_file)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 463, in parse_mp
    self.pool.close()
AttributeError: 'NoneType' object has no attribute 'close'

My log file contained:

22:10:01 INFO [compiler] Maximum file size is 2147483647 bytes
22:10:01 INFO [compiler] Wikipedia language: en
22:10:01 WARNING [compiler] Dictionary version is not specified and couldn't be guessed from input file name, using 20110515221001
22:10:01 INFO [compiler] Collecting articles
22:10:02 INFO [compiler] Collecting articles in wikifur.en.cdb
22:10:02 WARNING [wiki] No metadata file specified
22:10:02 INFO [wiki] Language: en (en)
22:10:02 INFO [wiki] Creating new worker pool with wiki cdb at wikifur.en.cdb

If I add an "if self.pool:" guard before the line mentioned, I get a clearer error, which led to the solution above:

  File "/usr/home/wikifur/env-aard/bin/aardc", line 8, in <module>
    load_entry_point('aardtools==0.8.3', 'console_scripts', 'aardc')()
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/compiler.py", line 1093, in main
    converter.collect_articles(converter.make_input(input_file), options, compiler)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 250, in collect_articles
    p.parse(input_file)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 409, in parse_mp
    self.reset_pool(f)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 388, in reset_pool
    initargs=[cdbdir, self.lang, self.rtl])
  File "/usr/local/lib/python2.6/multiprocessing/__init__.py", line 227, in Pool
    return Pool(processes, initializer, initargs)
  File "/usr/local/lib/python2.6/multiprocessing/pool.py", line 84, in __init__
    self._setup_queues()
  File "/usr/local/lib/python2.6/multiprocessing/pool.py", line 130, in _setup_queues
    from .queues import SimpleQueue
  File "/usr/local/lib/python2.6/multiprocessing/queues.py", line 22, in <module>
    from multiprocessing.synchronize import Lock, BoundedSemaphore, Semaphore, Condition
  File "/usr/local/lib/python2.6/multiprocessing/synchronize.py", line 33, in <module>
    " function, see issue 3770.")
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.
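Here is why the guard changes the error. The first traceback ends in `parse_mp` at `self.pool.close()` (line 463), but the real failure happens earlier, in `reset_pool` (called from line 409). That pattern fits `close()` sitting in a `finally:` clause. The issue does not show the aardtools source, so this is an assumption. In Python 2, an exception raised inside `finally` replaces the one already propagating, so the ImportError is lost. Python 3 still raises the AttributeError but prints the ImportError as context. The hypothetical `ToyParser` class below is not aardtools code; it reproduces the effect.

```python
class ToyParser(object):
    """Hypothetical stand-in for the aardtools wiki parser."""

    def __init__(self):
        self.pool = None

    def reset_pool(self):
        # Simulate Pool creation failing as on FreeBSD without semaphores;
        # self.pool is never assigned.
        raise ImportError("This platform lacks a functioning sem_open implementation")

    def parse_unguarded(self):
        try:
            self.reset_pool()
        finally:
            # self.pool is still None, so this raises AttributeError,
            # which replaces the ImportError that was propagating.
            self.pool.close()

    def parse_guarded(self):
        try:
            self.reset_pool()
        finally:
            # With the guard, the original ImportError propagates unchanged.
            if self.pool:
                self.pool.close()


for method in ("parse_unguarded", "parse_guarded"):
    try:
        getattr(ToyParser(), method)()
    except Exception as exc:
        print("%s -> %s" % (method, type(exc).__name__))
```

Running it prints `parse_unguarded -> AttributeError` followed by `parse_guarded -> ImportError`.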