Problem on windows #47

Open
moissinac opened this issue Jun 16, 2016 · 5 comments

@moissinac

Hello,
StrepHit looks very interesting.
I've installed it on Windows. Perl and TreeTagger both work and are on the PATH.
When I run the following command:

 python -m strephit extraction process_semistructured -p 1 samples/corpus.jsonlines

I get:

 c:\python.exe: Error while finding spec for 'strephit.__main__' (<class 'ImportError'>: No module named 'annotation'); 'strephit' is a package and cannot be directly executed

Any idea?
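With python -m on a package, Python looks for strephit.__main__, and locating that submodule requires importing the strephit package itself first. So the ImportError for 'annotation' most likely comes from importing the package, not from the command-line arguments. Importing the package directly (for example python -c "import strephit") prints the full traceback, including the file that tried to import the missing module. A minimal sketch of the same check as a helper; the name import_traceback is hypothetical:

```python
import importlib
import traceback


def import_traceback(module_name):
    """Import module_name; return the formatted traceback on ImportError, else None.

    Importing the package directly (instead of through python -m) shows the
    whole chain of imports, including the line that failed.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return traceback.format_exc()
    return None


# Prints None if the package imports cleanly, otherwise the full traceback.
print(import_traceback("strephit"))
```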

@marfox
Member

marfox commented Jun 17, 2016

Thanks @moissinac for the report.
We will investigate the issue.

@burki

burki commented Jun 27, 2016

@moissinac I got this step working on Windows 7 with Python 2.7. I saw similar errors with old numpy/scipy/scikit-learn versions, but they went away after I uninstalled those packages and reinstalled them from the builds at http://www.lfd.uci.edu/~gohlke/pythonlibs/
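To see which numpy, scipy and scikit-learn versions the interpreter actually picks up before and after reinstalling, a small sketch like the one below may help. It assumes the import names numpy, scipy and sklearn (scikit-learn is imported as sklearn); the helper name installed_versions is hypothetical, and the code should run on both Python 2.7 and 3.

```python
import importlib


def installed_versions(module_names):
    """Map each import name to its __version__, "unknown", or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed, or the install is broken
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# scikit-learn is imported under the name "sklearn".
print(installed_versions(["numpy", "scipy", "sklearn"]))
```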

@burki

burki commented Jun 27, 2016

Using TreeTagger gives me an error on Windows:

 python -m strephit commons pos_tag samples/corpus.jsonlines bio en

fails with the error

TypeError: can't pickle thread.lock objects

It seems to be related to how multiprocessing forks worker processes:

  File "strephit\commons\pos_tag.py", line 189, in main
    for i, tagged_document in enumerate(pos_tagger.tag_many(corpus, document_key, pos_tag_key, batch_size)):
  File "strephit\commons\pos_tag.py", line 135, in tag_many
    CHUNKERPROC=self._tokenizer_wrapper
  File "C:\Run\Python27\lib\site-packages\treetaggerpoll.py", line 207, in __init__
    self._build_workers(workerscount, kwargs)
  File "C:\Run\Python27\lib\site-packages\treetaggerpoll.py", line 220, in _build_workers
    p.start()
  File "C:\Run\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Run\Python27\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Run\Python27\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Run\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Run\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Run\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\multiprocessing\forking.py", line 67, in dispatcher
    self.save_reduce(obj=obj, *rv)
  File "C:\Run\Python27\lib\pickle.py", line 401, in save_reduce
    save(args)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 554, in save_tuple
    save(element)
  File "C:\Run\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Run\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Run\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Run\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Run\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Run\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Run\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Run\Python27\lib\pickle.py", line 306, in save
    rv = reduce(self.proto)
TypeError: can't pickle thread.lock objects
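Windows has no fork(), so multiprocessing pickles the Process object (its target and arguments included) and sends it to the child; that is the dump(process_obj, to_child, HIGHEST_PROTOCOL) call in the traceback above. Everything reachable from that object must be picklable, and thread locks are not. A minimal Python 3 reproduction, where TaggerWrapper is a hypothetical stand-in for an object that keeps a lock as internal state (it is not treetaggerwrapper's real code):

```python
import pickle
import threading


class TaggerWrapper:
    """Hypothetical object that holds a thread lock, like many tagger wrappers do."""

    def __init__(self):
        self._lock = threading.Lock()

    def tag(self, text):
        with self._lock:
            return text.split()


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        # Locks raise TypeError; other unpicklable objects may raise PicklingError.
        return False
    return True


wrapper = TaggerWrapper()
print(can_pickle(["plain", "data"]))  # True
print(can_pickle(wrapper))            # False: the lock in its __dict__ cannot be pickled
print(can_pickle(wrapper.tag))        # False: pickling a bound method pickles its instance too
```

On Linux the default start method is fork, which copies the parent's memory instead of pickling the Process object, which is consistent with the same code working there.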

@e-dorigatti
Collaborator

Hello @burki, thank you for the report. The treetaggerwrapper code is not under our control, so the most I could do was catch the exception and fall back to a plain for loop. I would have used our parallel module, but it has not been tested on Windows, so I refrained. When the fallback kicks in, tagging may be much slower, sorry!
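The fallback described above (try to build a worker pool, and on failure tag the items one by one) can be sketched as follows. This is an illustration of the pattern, not StrepHit's actual pos_tag code; the function tag_with_fallback and its parameters tag_one and make_pool are hypothetical, and make_pool defaults to a two-process multiprocessing.Pool.

```python
import logging
import multiprocessing
import multiprocessing.pool

logger = logging.getLogger(__name__)


def tag_with_fallback(items, tag_one, make_pool=lambda: multiprocessing.Pool(2)):
    """Apply tag_one to every item, in parallel when a pool can be built.

    make_pool is a hypothetical factory; on Windows, building a pool whose
    workers need unpicklable state (such as a thread lock) raises TypeError.
    """
    try:
        pool = make_pool()
    except Exception as exc:
        # Log the failure and fall back to a plain, single-process loop.
        logger.warning("failed to initialize process pool (%s), "
                       "falling back to single-process tagging", exc)
        return [tag_one(item) for item in items]
    try:
        return pool.map(tag_one, items)
    finally:
        pool.close()
        pool.join()
```

Passing the pool factory in as a parameter keeps the fallback path easy to exercise: a factory that raises TypeError takes the single-process branch, while a ThreadPool factory takes the parallel branch without any pickling.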

@burki

burki commented Jul 19, 2016

@e-dorigatti Thanks very much, this now works on my Windows 7 machine:

 [WARNING] pos_tag.tag_many #139 - failed to initialize tree tragger process pool, fallback to single-process tagging
 [INFO] io.process_stream #38 - Loaded input file 'E:\Playground\StrepHit\samples\corpus.jsonlines'
 [INFO] pos_tag.main #206 - Done, total tagged items: 19
