expected output for example? #24

Open
andypohl opened this issue Jun 22, 2017 · 6 comments

Comments

@andypohl

andypohl commented Jun 22, 2017

I've installed the latest FAST-iCLIP and I'm still having a lot of problems. I'm just trying to run the example command on the example data:

$ fasticlip -i rawdata/example_MMhur_R1.fastq rawdata/example_MMhur_R2.fastq --GRCm38 -s docs/GRCm38/GRCm38_STAR/ -n MMhur -o results

I get:

  • a "SettingWithCopyWarning" from pandas.
  • bowtie gives me:
Performing Bowtie...
Result :  
Error :  12466 reads; of these:
  12466 (100.00%) were unpaired; of these:
    12301 (98.68%) aligned 0 times
    56 (0.45%) aligned exactly 1 time
    109 (0.87%) aligned >1 times
1.32% overall alignment rate

Result :  
Error :  100228 reads; of these:
  100228 (100.00%) were unpaired; of these:
    98946 (98.72%) aligned 0 times
    381 (0.38%) aligned exactly 1 time
    901 (0.90%) aligned >1 times
1.28% overall alignment rate

which looks very poor.

  • a more sinister pandas error just after this:
Process mapped data
Traceback (most recent call last):
  File "fasticlip/retroviralMapping.py", line 150, in <module>
    bedR2=readBed(mappedBed[1])
  File "fasticlip/retroviralMapping.py", line 143, in readBed
    bedFile = pd.read_table(path,dtype=str,header=None)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 315, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 523, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5214)
pandas.io.common.EmptyDataError: No columns to parse from file

Perhaps the pandas error is a consequence of the low alignment rate. What is the expected output? I get some output, but no figures are generated because of a matplotlib/Qt error (which I'll try to fix on my end before mentioning it here).

@bdo311

bdo311 commented Jun 23, 2017

Thanks for the comment, and sorry you've been running into all of these issues. I've been contributing less to the newer versions of fasticlip, but I'll try to answer these as best I can.

We use bowtie for mapping reads to exogenous retroviruses and tRNA, and STAR for mapping to endogenous retroviruses and the genome. So these two outputs correspond to viral and tRNA mapping, and a low alignment rate is expected for them.
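For anyone who wants to check these numbers programmatically rather than by eye, here is a minimal sketch that pulls the read counts and overall rate out of a summary in the format pasted above. The function name `parse_alignment_summary` is hypothetical and not part of fasticlip; the sample text is copied from the first summary in this thread.

```python
import re


def parse_alignment_summary(text):
    """Extract read counts and the overall rate from an alignment summary.

    Hypothetical helper (not part of fasticlip). Returns None if the text
    does not look like the summary format shown in this thread.
    """
    total = re.search(r"(\d+) reads; of these:", text)
    zero = re.search(r"(\d+) \([\d.]+%\) aligned 0 times", text)
    rate = re.search(r"([\d.]+)% overall alignment rate", text)
    if not (total and zero and rate):
        return None
    total_reads = int(total.group(1))
    unaligned = int(zero.group(1))
    return {
        "total_reads": total_reads,
        "aligned_reads": total_reads - unaligned,
        "overall_rate_percent": float(rate.group(1)),
    }


# Sample text copied from the first summary reported in this issue.
summary = """12466 reads; of these:
  12466 (100.00%) were unpaired; of these:
    12301 (98.68%) aligned 0 times
    56 (0.45%) aligned exactly 1 time
    109 (0.87%) aligned >1 times
1.32% overall alignment rate
"""

stats = parse_alignment_summary(summary)
print(stats)
```

A low rate from this step does not by itself indicate a broken install, since only viral and tRNA reads are expected to align here.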

The pandas error comes from trying to make plots from the retroviral data. As you said, it is probably complaining because too few reads mapped, so pandas ends up trying to read an empty file and fails. This is likely just an artifact of us providing small test files. Larger files will have enough retroviral reads to build data frames from.
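One possible workaround, sketched under assumptions and not the maintainers' fix: check whether the BED file has any records before handing it to pandas, and skip the retroviral plots when it does not. The helper name `bed_has_records` is hypothetical, and the demo files are made up for illustration.

```python
import os
import tempfile


def bed_has_records(path):
    """Return True if the file exists and has at least one non-blank line.

    Hypothetical helper (not part of fasticlip). A caller could use it to
    skip pd.read_table(path, ...) when there is nothing to parse.
    """
    if not os.path.isfile(path):
        return False
    with open(path) as handle:
        return any(line.strip() for line in handle)


# Demo with made-up files: one empty, one with a single BED record.
with tempfile.TemporaryDirectory() as tmp:
    empty_bed = os.path.join(tmp, "empty.bed")
    open(empty_bed, "w").close()

    full_bed = os.path.join(tmp, "reads.bed")
    with open(full_bed, "w") as out:
        out.write("chr1\t100\t150\tread1\t0\t+\n")

    empty_ok = bed_has_records(empty_bed)
    full_ok = bed_has_records(full_bed)

print(empty_ok, full_ok)
```

Catching the `EmptyDataError` shown in the traceback around the read call would be an alternative; the pre-check above just avoids depending on where that exception lives in a given pandas version.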

@andypohl
Author

Ok. That's why I wanted to know what the expected output is supposed to look like. I won't know it's working unless I can reproduce something. The download of all the genome indexes, etc., was nearly 50 GB. I'm happy to download another 5, 10, 20 GB if it's a better, more realistic example. It's nice to have quick-running toy examples, but I'm more concerned about getting it right than getting it done quickly. Anyway, I'm pleased it might be nearly working.

@frank42195

I get the exact same output trying to run the example file. Have you found a solution? I am completely at a loss on how to get this to work.

@bdo311

bdo311 commented Jun 24, 2017

@frank42195 Unfortunately, this output happens because we updated the script to search for retroviral reads, but our example is too small to include any, so pandas complains about an empty file. We hope to push out an update over the next few days to address this. Sorry for the inconvenience!

@andypohl
Author

Whatever the new example involves, I'll still stress the importance of not just providing the example command, but also providing some sort of summary of the output. I know the program creates a ton of output (many files). But as far as that's concerned, I think priority should be placed on the output that goes to the screen while the program is running. Just having that would give me a lot of peace of mind that I've installed everything correctly. Thanks for your efforts.

@frank42195

frank42195 commented Jun 28, 2017 via email


3 participants