Incompatible chroms between fasta and annotation #47

Open
wants to merge 8 commits into
base: master

olgabot
Collaborator

@olgabot olgabot commented Nov 4, 2016

If a chromosome was present in the GTF annotation file but absent from the genome FASTA file, outrigger validate failed with the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-73c99f62ab80> in <module>()
----> 1 check_splice_sites.read_splice_sites(bed, genome, fasta)

/home/obotvinnik/workspace-git/outrigger/outrigger/validate/check_splice_sites.pyc in read_splice_sites(bed, genome, fasta, direction)
     61         records = SeqIO.parse(f, 'fasta')
     62         records = pd.Series([str(r.seq) for r in records],
---> 63                             index=[b.name for b in bed])
     64     # import pdb; pdb.set_trace()
     65     return records

/home/obotvinnik/anaconda/envs/outrigger/lib/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
    241                                        raise_cast_failure=True)
    242 
--> 243                 data = SingleBlockManager(data, index, fastpath=True)
    244 
    245         generic.NDFrame.__init__(self, data, fastpath=True)

/home/obotvinnik/anaconda/envs/outrigger/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, block, axis, do_integrity_check, fastpath)
   4045         if not isinstance(block, Block):
   4046             block = make_block(block, placement=slice(0, len(axis)), ndim=1,
-> 4047                                fastpath=True)
   4048 
   4049         self.blocks = [block]

/home/obotvinnik/anaconda/envs/outrigger/lib/python2.7/site-packages/pandas/core/internals.pyc in make_block(values, placement, klass, ndim, dtype, fastpath)
   2662                      placement=placement, dtype=dtype)
   2663 
-> 2664     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
   2665 
   2666 # TODO: flexible with index=None and/or items=None

/home/obotvinnik/anaconda/envs/outrigger/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, values, ndim, fastpath, placement, **kwargs)
   1794 
   1795         super(ObjectBlock, self).__init__(values, ndim=ndim, fastpath=fastpath,
-> 1796                                           placement=placement, **kwargs)
   1797 
   1798     @property

/home/obotvinnik/anaconda/envs/outrigger/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, values, placement, ndim, fastpath)
    108             raise ValueError('Wrong number of items passed %d, placement '
    109                              'implies %d' % (len(self.values),
--> 110                                              len(self.mgr_locs)))
    111 
    112     @property

ValueError: Wrong number of items passed 153907, placement implies 153920

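This happened because fewer sequences were extracted than there were events in the input (153907 sequences versus 153920 BED intervals), so the data and the index passed to pd.Series had different lengths. With this change, only the sequences that were actually found are reported.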
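
To illustrate the idea (this is a minimal sketch, not the code in this PR): instead of assuming one sequence per BED interval, match sequences to intervals by name and keep only those that were found. The function name `match_sequences` and the sample names are hypothetical, and the sketch assumes each FASTA record id equals the name of the BED interval it was extracted from. It uses plain Python rather than pandas and Biopython so that it runs on its own.

```python
def match_sequences(interval_names, records):
    """Pair each interval name with its sequence, skipping intervals
    for which no sequence was extracted.

    interval_names: iterable of str, names of the BED intervals (hypothetical input)
    records: iterable of (record_id, sequence) pairs parsed from the FASTA output
    Returns (found, missing): a dict of name -> sequence, and a list of
    interval names that have no sequence (e.g. chromosome absent from the FASTA).
    """
    sequences = dict(records)
    found = {}
    missing = []
    for name in interval_names:
        if name in sequences:
            found[name] = sequences[name]
        else:
            missing.append(name)
    return found, missing


# Three intervals went in, but only two sequences came back.
names = ["exon1", "exon2", "exon3"]
records = [("exon1", "GT"), ("exon3", "AG")]

found, missing = match_sequences(names, records)
print(found)    # sequences that were actually found, keyed by interval name
print(missing)  # intervals with no sequence
```

Because `found` is keyed by name, building `pd.Series(found)` from it would give data and index of equal length by construction, and `missing` can be logged so that dropped events are not silent.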

@olgabot
Collaborator Author

olgabot commented Nov 4, 2016

Todos:

  • Add test
  • Add test data
