Incompatibility with python3 when using threads #13

Closed
dariober opened this issue Apr 8, 2019 · 6 comments

@dariober

dariober commented Apr 8, 2019

Hi, I think there is an incompatibility between gat v1.3.5 and Python 3 when using threads via the -t/--num-threads parameter; it fails even with a single thread (-t 1). Using the test data in the gat source code:

gat-run.py -s gat/test/data/segments_single.bed.gz \
    -a gat/test/data/annotations.bed.gz \
    -w gat/test/data/workspace.bed.gz \
    -n 100 \
    -t 1

# job started at Mon Apr  8 09:51:53 2019 on dario-T7500 -- 6eff5a66-3f9f-4a7c-bba5-3c6bd34e0478
# pid: 40016, system: Linux 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64
# annotation_files                        : ['gat/test/data/annotations.bed.gz']
# annotations_label                       : None
# annotations_to_points                   : None
# bucket_size                             : 0
# cache                                   : None
# conditional                             : unconditional
# conditional_expansion                   : None
# conditional_extension                   : None
# counters                                : []
# enable_split_tracks                     : False
# ignore_segment_tracks                   : True
# input_filename_counts                   : None
# input_filename_descriptions             : None
# input_filename_results                  : None
# isochore_files                          : None
# loglevel                                : 1
# nbuckets                                : 100000
# null                                    : default
# num_samples                             : 100
# num_threads                             : 1
# output_bed                              : []
# output_counts_pattern                   : None
# output_filename_pattern                 : %s
# output_force                            : False
# output_order                            : fold
# output_plots_pattern                    : None
# output_samples_pattern                  : None
# output_stats                            : []
# output_tables_pattern                   : %s.tsv.gz
# overlapping_annotations                 : False
# pseudo_count                            : 1.0
# pvalue_method                           : empirical
# qvalue_lambda                           : None
# qvalue_method                           : BH
# qvalue_pi0_method                       : smoother
# random_seed                             : None
# restrict_workspace                      : False
# sample_files                            : []
# sampler                                 : annotator
# segment_files                           : ['gat/test/data/segments_single.bed.gz']
# shift_expansion                         : 2.0
# shift_extension                         : 0
# short_help                              : None
# stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
# stdin                                   : <_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
# stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
# stdout                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
# timeit_file                             : None
# timeit_header                           : None
# timeit_name                             : all
# truncate_segments_to_workspace          : False
# truncate_workspace_to_annotations       : False
# workspace_files                         : ['gat/test/data/workspace.bed.gz']
## 2019-04-08 09:51:53,468 INFO segments: reading tracks from 1 files
## 2019-04-08 09:51:53,566 INFO segments: read 1 tracks from 1 files
## 2019-04-08 09:51:53,568 INFO annotations: reading tracks from 1 files
## 2019-04-08 09:51:53,786 INFO annotations: read 7 tracks from 1 files
## 2019-04-08 09:51:53,788 INFO workspaces: reading tracks from 1 files
## 2019-04-08 09:51:55,121 INFO workspaces: read 1 tracks from 1 files
## 2019-04-08 09:51:55,134 INFO collapsing workspaces
## 2019-04-08 09:51:55,135 INFO intervals loaded in 1 seconds
## 2019-04-08 09:51:55,144 INFO collecting observed counts
## 2019-04-08 09:51:55,145 INFO starting sampling
## 2019-04-08 09:51:55,145 INFO sampling: merged: 1/1
## 2019-04-08 09:51:55,145 INFO performing unconditional sampling
## 2019-04-08 09:51:55,147 INFO workspace without conditioning: 279844 segments, 2204303400 nucleotides
## 2019-04-08 09:51:55,148 INFO workspace after conditioning: 279844 segments, 2204303400 nucleotides
## 2019-04-08 09:51:55,149 INFO setting up shared data for multi-processing
## 2019-04-08 09:51:55,153 INFO sampling started
## 2019-04-08 09:51:55,154 INFO generating processpool with 1 threads for 100 items
## 2019-04-08 09:51:55,363 INFO 0/100 done ( 0.00)
## 2019-04-08 09:52:11,781 INFO sampling completed
## 2019-04-08 09:52:11,781 INFO retrieving private data
Traceback (most recent call last):
  File "/home/dario/miniconda3/envs/tritume/bin/gat-run.py", line 317, in <module>
    sys.exit(main(sys.argv))
  File "/home/dario/miniconda3/envs/tritume/bin/gat-run.py", line 295, in main
    annotator_results = fromSegments(options, args)
  File "/home/dario/miniconda3/envs/tritume/bin/gat-run.py", line 218, in fromSegments
    num_threads=options.num_threads)
  File "/home/dario/miniconda3/envs/tritume/lib/python3.6/site-packages/gat/__init__.py", line 1010, in run
    track, counts, counters, segs, annotations, workspace, outfiles)
  File "/home/dario/miniconda3/envs/tritume/lib/python3.6/site-packages/gat/__init__.py", line 763, in sample
    annotations.unshare()
  File "gat/Engine.pyx", line 2688, in gat.Engine.IntervalContainer.unshare
TypeError: expected bytes, str found

The error TypeError: expected bytes, str found is probably due to the difference in how Python 2 and Python 3 handle strings.
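
To illustrate the kind of str/bytes mismatch I mean, here is a minimal sketch in plain Python 3. It is not gat's code: the helper as_bytes is hypothetical, and I am only assuming that the failing call in gat/Engine.pyx receives a str where it expects bytes.

```python
# Illustrative only: this is not gat's code, and as_bytes is a hypothetical helper.

def as_bytes(value, encoding="ascii"):
    """Return value as bytes, encoding it first if it is a text string."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


# In Python 3, text (str) and binary data (bytes) are distinct types
# and never compare equal, even when they hold the same characters.
assert "chr1" != b"chr1"

# Mixing the two raises TypeError instead of converting implicitly.
try:
    b"chr" + "1"
except TypeError as error:
    print("TypeError:", error)

# Converting explicitly at the boundary avoids the mismatch.
assert as_bytes("chr1") == b"chr1"
assert as_bytes(b"chr1") == b"chr1"
```

In Python 2 the plain str type was already a byte string, so code written for it could pass names around without any conversion. That would explain why the same call only fails under Python 3.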

Without multithreading things work fine:

gat-run.py -s gat/test/data/segments_single.bed.gz -a gat/test/data/annotations.bed.gz -w gat/test/data/workspace.bed.gz -n 100 -t 0
...
# job finished in 13 seconds at Mon Apr  8 10:04:49 2019 -- 14.57  0.07  0.00  0.00 -- a96d4d32-d60d-420c-b195-8ecc9866f65c

NB: This is using gat installed via conda as:

conda install gat=1.3.5=py36ha92aebf_2

A few weeks ago I submitted a bioconda recipe (v1.3.5-3) that sets the Python version to 2.

@Acribbs

Acribbs commented Jun 14, 2019

I think gat is only compatible with Python 2 at the moment.

@Acribbs

Acribbs commented Jun 14, 2019

I have also found this to be an issue, even though I think changes were made to make it Python 3 compatible.

@AndreasHeger
Owner

I will fix this.

@Acribbs

Acribbs commented Jun 14, 2019

Thanks @AndreasHeger, I really appreciate it. It’s such a useful tool.

@AndreasHeger
Owner

Thanks @dariober and @Acribbs, I have pushed a fix.

@AndreasHeger
Owner

Released a new version, closing this issue. Please reopen if the problem persists.
