Multiprocessing exception at merging ChIP and Input #30

Closed
balwierz opened this issue Aug 8, 2016 · 5 comments
balwierz commented Aug 8, 2016

Unfortunately, I need to report another problem:

Binning input.bedpe.bz2 (File: run_epic, Log level: INFO, Time: Sun, 07 Aug 2016 00:50:10 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, M (File: count_reads_in_windows, Log level: INFO, Time: Sun, 07 Aug 2016 00:50:10 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Sun, 07 Aug 2016 02:31:24 )
Traceback (most recent call last):
  File "/usr/local/bin/epic", line 165, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 37, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 764, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 715, in retrieve
    raise exception
joblib.my_exceptions.JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/usr/local/bin/epic in <module>()
    160     elif not args.effective_genome_length and args.paired_end:
    161         logging.info("Using paired end so setting readlength to 100.")
    162         args.effective_genome_length = get_effective_genome_length(args.genome,
    163                                                                    100)
    164 
--> 165     run_epic(args)
    166 
    167 
    168 
    169 

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py in run_epic(args=Namespace(control=['/mnt/biggles/csc_home/piotr/...Rer7.goodChr.sorted.bedpe.bz2'], window_size=200))
     37 
     38     nb_chip_reads = get_total_number_of_reads(chip_merged_sum)
     39     nb_input_reads = get_total_number_of_reads(input_merged_sum)
     40 
     41     merged_dfs = merge_chip_and_input(chip_merged_sum, input_merged_sum,
---> 42                                       args.number_cores)
        args.number_cores = 8
     43 
     44     score_threshold, island_enriched_threshold, average_window_readcount = \
     45         compute_background_probabilities(nb_chip_reads, args)
     46 

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in merge_chip_and_input(chip_dfs=[       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr2  60300400      4

[421692 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr3  63268800     18

[435867 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr4  62094400      5

[316130 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr5  75682000      1

[535305 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr6  59938600      6

[422058 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr7  77275200      1

[527395 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr8  56184600      8

[388639 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr9  58231800      7

[432048 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr10  46591000      3

[328844 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr11  46661200      3

[323983 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr12  50697000      1

[340626 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr13  54093600      9

[398247 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr14  53733400      2

[392001 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr15  47442200      2

[322950 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr16  58780600      3

[409774 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr17  53983600      2

[393334 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr18  49877000      1

[344906 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr19  50254400      1

[366537 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr20  55951800      1

[395725 rows x 3 columns], ...], input_dfs=[        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr2  60300400      8

[1134549 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr3  63268800     24

[1122376 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr4  62094400      6

[805058 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr5  75682000      1

[1402420 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr6  59938600      1

[1149075 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr7  77275600      1

[1461920 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr8  56184600     10

[1047948 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr9  58232000      7

[1109349 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr10  46591000      8

[866951 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr11  46661200      3

[887944 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr12  50697000      1

[920637 rows x 3 columns],         Chromosome       Bin  Count
0           ...hr13  54093600      1

[1032122 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr14  53733600      2

[983984 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr15  47442200      5

[871981 rows x 3 columns],         Chromosome       Bin  Count
0           ...hr16  58780600      3

[1068652 rows x 3 columns],         Chromosome       Bin  Count
0           ...hr17  53983600      7

[1039873 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr18  49877000      1

[977370 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr19  50254400     12

[973110 rows x 3 columns],         Chromosome       Bin  Count
0           ...hr20  55951800      3

[1054667 rows x 3 columns], ...], nb_cpu=8)
     32     assert len(chip_dfs) == len(input_dfs)
     33 
     34     logging.info("Merging ChIP and Input data.")
     35     merged_chromosome_dfs = Parallel(n_jobs=nb_cpu)(
     36         delayed(_merge_chip_and_input)(chip_df, input_df)
---> 37         for chip_df, input_df in zip(chip_dfs, input_dfs))
        chip_dfs = [       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr2  60300400      4

[421692 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr3  63268800     18

[435867 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr4  62094400      5

[316130 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr5  75682000      1

[535305 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr6  59938600      6

[422058 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr7  77275200      1

[527395 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr8  56184600      8

[388639 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr9  58231800      7

[432048 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr10  46591000      3

[328844 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr11  46661200      3

[323983 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr12  50697000      1

[340626 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr13  54093600      9

[398247 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr14  53733400      2

[392001 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr15  47442200      2

[322950 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr16  58780600      3

[409774 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr17  53983600      2

[393334 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr18  49877000      1

[344906 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr19  50254400      1

[366537 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr20  55951800      1

[395725 rows x 3 columns], ...]
        input_dfs = [        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr2  60300400      8

[1134549 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr3  63268800     24

[1122376 rows x 3 columns],        Chromosome       Bin  Count
0            ... chr4  62094400      6

[805058 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr5  75682000      1

[1402420 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr6  59938600      1

[1149075 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr7  77275600      1

[1461920 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr8  56184600     10

[1047948 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr9  58232000      7

[1109349 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr10  46591000      8

[866951 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr11  46661200      3

[887944 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr12  50697000      1

[920637 rows x 3 columns],         Chromosome       Bin  Count
0           ...hr13  54093600      1

[1032122 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr14  53733600      2

[983984 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr15  47442200      5

[871981 rows x 3 columns],         Chromosome       Bin  Count
0           ...hr16  58780600      3

[1068652 rows x 3 columns],         Chromosome       Bin  Count
0           ...hr17  53983600      7

[1039873 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr18  49877000      1

[977370 rows x 3 columns],        Chromosome       Bin  Count
0           c...chr19  50254400     12

[973110 rows x 3 columns],         Chromosome       Bin  Count
0           ...hr20  55951800      3

[1054667 rows x 3 columns], ...]
     38     return merged_chromosome_dfs
     39 
     40 
     41 def get_total_number_of_reads(dfs):

...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=8), iterable=<generator object <genexpr>>)
    759             if pre_dispatch == "all" or n_jobs == 1:
    760                 # The iterable was consumed all at once by the above for loop.
    761                 # No need to wait for async callbacks to trigger to
    762                 # consumption.
    763                 self._iterating = False
--> 764             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=8)>
    765             # Make sure that we get a last message telling us we are done
    766             elapsed_time = time.time() - self._start_time
    767             self._print('Done %3i out of %3i | elapsed: %s finished',
    768                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Sun Aug  7 02:31:27 2016
PID: 10981                                   Python 2.7.12: /usr/bin/python
...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    122     def __init__(self, iterator_slice):
    123         self.items = list(iterator_slice)
    124         self._size = len(self.items)
    125 
    126     def __call__(self):
--> 127         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _merge_chip_and_input>
        args = (       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns])
        kwargs = {}
        self.items = [(<function _merge_chip_and_input>, (       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns]), {})]
    128 
    129     def __len__(self):
    130         return self._size
    131 

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in _merge_chip_and_input(chip_df=       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns], input_df=        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns])
     13 
     14     chip_df_nb_bins = len(chip_df)
     15     merged_df = chip_df.merge(input_df,
     16                               how="left",
     17                               on=["Chromosome", "Bin"],
---> 18                               suffixes=[" ChIP", " Input"])
     19     merged_df = merged_df[["Chromosome", "Bin", "Count ChIP", "Count Input"]]
     20     merged_df.columns = ["Chromosome", "Bin", "ChIP", "Input"]
     21 
     22     merged_df = merged_df.fillna(0)

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py in merge(self=       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns], right=        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns], how='left', on=['Chromosome', 'Bin'], left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=[' ChIP', ' Input'], copy=True, indicator=False)
   4432               suffixes=('_x', '_y'), copy=True, indicator=False):
   4433         from pandas.tools.merge import merge
   4434         return merge(self, right, how=how, on=on, left_on=left_on,
   4435                      right_on=right_on, left_index=left_index,
   4436                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 4437                      copy=copy, indicator=indicator)
        copy = True
        indicator = False
   4438 
   4439     def round(self, decimals=0, *args, **kwargs):
   4440         """
   4441         Round a DataFrame to a variable number of decimal places.

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in merge(left=       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns], right=        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns], how='left', on=['Chromosome', 'Bin'], left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=[' ChIP', ' Input'], copy=True, indicator=False)
     34           suffixes=('_x', '_y'), copy=True, indicator=False):
     35     op = _MergeOperation(left, right, how=how, on=on, left_on=left_on,
     36                          right_on=right_on, left_index=left_index,
     37                          right_index=right_index, sort=sort, suffixes=suffixes,
     38                          copy=copy, indicator=indicator)
---> 39     return op.get_result()
        op.get_result = <bound method _MergeOperation.get_result of <pandas.tools.merge._MergeOperation object>>
     40 if __debug__:
     41     merge.__doc__ = _merge_doc % '\nleft : DataFrame'
     42 
     43 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in get_result(self=<pandas.tools.merge._MergeOperation object>)
    212     def get_result(self):
    213         if self.indicator:
    214             self.left, self.right = self._indicator_pre_merge(
    215                 self.left, self.right)
    216 
--> 217         join_index, left_indexer, right_indexer = self._get_join_info()
        join_index = undefined
        left_indexer = undefined
        right_indexer = undefined
        self._get_join_info = <bound method _MergeOperation._get_join_info of <pandas.tools.merge._MergeOperation object>>
    218 
    219         ldata, rdata = self.left._data, self.right._data
    220         lsuf, rsuf = self.suffixes
    221 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in _get_join_info(self=<pandas.tools.merge._MergeOperation object>)
    348                                     sort=self.sort)
    349         else:
    350             (left_indexer,
    351              right_indexer) = _get_join_indexers(self.left_join_keys,
    352                                                  self.right_join_keys,
--> 353                                                  sort=self.sort, how=self.how)
        self.sort = False
        self.how = 'left'
    354             if self.right_index:
    355                 if len(self.left) > 0:
    356                     join_index = self.left.index.take(left_indexer)
    357                 else:

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in _get_join_indexers(left_keys=[array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200])], right_keys=[array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1400, ..., 60347800, 60348000, 60348200])], sort=False, how='left')
    541 
    542     # bind `sort` arg. of _factorize_keys
    543     fkeys = partial(_factorize_keys, sort=sort)
    544 
    545     # get left & right join labels and num. of levels at each location
--> 546     llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
        llab = undefined
        rlab = undefined
        shape = undefined
        fkeys = <functools.partial object>
        left_keys = [array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200])]
        right_keys = [array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1400, ..., 60347800, 60348000, 60348200])]
    547 
    548     # get flat i8 keys from label lists
    549     lkey, rkey = _get_join_keys(llab, rlab, shape, sort)
    550 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in _factorize_keys(lk=memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200]), rk=memmap([    1400,     1600,     1400, ..., 60347800, 60348000, 60348200]), sort=False)
    708         lk = com._ensure_object(lk)
    709         rk = com._ensure_object(rk)
    710 
    711     rizer = klass(max(len(lk), len(rk)))
    712 
--> 713     llab = rizer.factorize(lk)
        llab = undefined
        rizer.factorize = <built-in method factorize of pandas.hashtable.Int64Factorizer object>
        lk = memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200])
    714     rlab = rizer.factorize(rk)
    715 
    716     count = rizer.get_count()
    717 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15715)()
    854 
    855 
    856 
    857 
    858 
--> 859 
    860 
    861 
    862 
    863 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:29784)()
    639 
    640 
    641 
    642 
    643 
--> 644 
    645 
    646 
    647 
    648 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:26059)()
    340 
    341 
    342 
    343 
    344 
--> 345 
    346 
    347 
    348 
    349 

ValueError: buffer source array is read-only
___________________________________________________________________________

balwierz commented Aug 8, 2016

I tested it with only 1 CPU (and one chromosome), and I now get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/epic", line 165, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 37, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 754, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 604, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 567, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python2.7/dist-packages/joblib/_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/lib/python2.7/dist-packages/joblib/_parallel_backends.py", line 322, in __init__
    self.results = batch()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 127, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 24, in _merge_chip_and_input
    assert len(merged_df) == chip_df_nb_bins
AssertionError
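
The failing assertion checks that the left merge returned exactly as many rows as `chip_df`. One way that invariant can break, offered here as an assumption rather than a diagnosis confirmed in this thread, is a duplicated (Chromosome, Bin) key on the input side: each duplicate adds a row to the result. The `right_keys` printed in the first traceback begin 1400, 1600, 1400, which would be consistent with that. A minimal sketch with made-up data (the frames and helper names below are hypothetical, not epic code):

```python
import pandas as pd

# Made-up data: the input frame repeats bin 1400, mimicking the
# 1400, 1600, 1400 pattern visible in the first traceback's right_keys.
chip_df = pd.DataFrame({"Chromosome": ["chr1"] * 3,
                        "Bin": [1400, 1600, 1800],
                        "Count": [1, 2, 1]})
input_df = pd.DataFrame({"Chromosome": ["chr1"] * 3,
                         "Bin": [1400, 1600, 1400],
                         "Count": [7, 3, 2]})


def left_merge(chip, inp):
    """Left join on (Chromosome, Bin), as in the traceback."""
    return chip.merge(inp, how="left", on=["Chromosome", "Bin"],
                      suffixes=(" ChIP", " Input"))


def duplicated_keys(df):
    """Return every row whose (Chromosome, Bin) key occurs more than once."""
    return df[df.duplicated(subset=["Chromosome", "Bin"], keep=False)]


# Bin 1400 matches two input rows, so the merge has one more row than chip_df.
merged = left_merge(chip_df, input_df)

# Assumption: summing counts per key is an acceptable way to collapse
# duplicates. Whether that is right for epic is not stated in the thread.
deduped_input = input_df.groupby(["Chromosome", "Bin"],
                                 as_index=False)["Count"].sum()
merged_deduped = left_merge(chip_df, deduped_input)
```

Running `duplicated_keys(input_df)` on the real input frame before merging would show whether this is what happens here.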


endrebak commented Aug 9, 2016

I suspect the first error is a bug in pandas or joblib, or in how the two interact; I have run into it myself. Could you please:

  1. report the current version numbers of those two libraries
  2. try upgrading them, then report whether that succeeded or failed, along with the new version numbers?

Thanks. Will start looking at the AssertionError.
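
For the version report requested above, a minimal sketch. It assumes Python 3.8 or later, where `importlib.metadata` is in the standard library; on the Python 2.7 install shown in the traceback, printing `pandas.__version__` and `joblib.__version__` gives the same information. The helper name `report_versions` is hypothetical.

```python
from importlib import metadata


def report_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print one "name version" line per library of interest.
    for name, version in report_versions(["pandas", "joblib"]).items():
        print(name, version)
```

Running it once for each interpreter (python2 and python3) matters here, since the two can see different copies of the same library.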


endrebak commented Aug 9, 2016

Moving the new question to #31

endrebak closed this as completed Aug 9, 2016

balwierz commented Aug 9, 2016

python-pandas 0.18.1
python3-pandas 0.18.1
python-joblib 0.9.4
python3-joblib 0.9.4
(all latest in Debian Stretch)


balwierz commented Aug 9, 2016

Update: a newer joblib (0.10.0) is installed by pip in python2.7/dist-packages.
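
In the first traceback the join keys reach `Int64Factorizer.factorize` as `memmap` objects, and the error says the buffer is read-only. That pattern is consistent with joblib's automatic memory-mapping of large arguments passed to worker processes, though the thread does not confirm this as the cause. A hedged sketch of one thing to try: passing `max_nbytes=None` to `Parallel`, which joblib documents as disabling memmapping of large arrays. `merge_one` and `merge_all` are hypothetical stand-ins, not the actual epic functions.

```python
import pandas as pd
from joblib import Parallel, delayed


def merge_one(chip_df, input_df):
    """Left join on (Chromosome, Bin), the same kind of merge as in the traceback."""
    return chip_df.merge(input_df, how="left", on=["Chromosome", "Bin"],
                         suffixes=(" ChIP", " Input"))


def merge_all(chip_dfs, input_dfs, nb_cpu=1):
    """Merge each ChIP/Input pair, one joblib task per pair."""
    # max_nbytes=None turns off joblib's automatic memmapping of large
    # arrays, so workers receive ordinary pickled copies instead of
    # memory-mapped ones.
    return Parallel(n_jobs=nb_cpu, max_nbytes=None)(
        delayed(merge_one)(chip_df, input_df)
        for chip_df, input_df in zip(chip_dfs, input_dfs))
```

The trade-off is memory: without memmapping, each worker gets its own copy of the frames it processes, which may matter with frames of around a million rows per chromosome.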
