"Exception: Not valid subsetter: 1" while using epic2-df #32

wesleylcai · 2019-09-06T13:08:35Z

I'm trying to analyze knockout and wildtype samples (including input for each) using epic2-df. However I get the following error: "Exception: Not valid subsetter: 1"

Here's the full output:
epic2-output.txt

Here are examples (head -n100 of the input files):
TKO: Sample_2D_KDM2A_me3.mqsd.head100.bedpe.txt
CKO: Sample_2D_KDM2A_input.mqsd.head100.bedpe.txt
TWT: Sample_2D_Arab2_me3.mqsd.head100.bedpe.txt
CWT: Sample_2D_Arab2_input.mqsd.head100.bedpe.txt

Here's my command:
epic2-df --treatment-knockout Sample_2D_KDM2A_me3.mqsd.bedpe --control-knockout Sample_2D_KDM2A_input.mqsd.bedpe --treatment-wildtype Sample_2D_Arab2_me3.mqsd.bedpe --control-wildtype Sample_2D_Arab2_input.mqsd.bedpe --genome hg19 --false-discovery-rate-cutoff 0.01 --false-discovery-rate-comparison 0.01 --bin-size 200 --gaps-allowed 3 --fragment-size 200 --chromsizes hg19.chrom.sizes --output-knockout Sample_2D_KDM2A_me3.mqsd --output-wildtype Sample_2D_Arab2_me3.mqsd;

Interestingly, some of the commands worked (with another set of bedpe files), so it may be an incompatibility between some of my bedpe files. Any assistance would be appreciated!


endrebak · 2019-09-06T13:40:08Z

Is this reproducible with just the head? Will look at it on Monday :) Thanks for bothering to report :)

wesleylcai · 2019-09-06T14:13:45Z

I tried it with the head and also again with head -n100000

Looks like it works for those files... Hmm, so maybe there are some wonky lines in the files? How do you think we can pinpoint the problem?

endrebak · 2019-09-06T14:24:10Z

The error seems to be in my pyranges library. The error message says that the chromosome is an int, but it should always be a string. Dunno why it happens, but I am trying to fix it :)

Can you check your version of pyranges with

$ python
import pyranges as pr
pr.__version__

wesleylcai · 2019-09-06T14:28:08Z

Ahaaaa. I think I might know why... I used bowtie2 to map my fastq and then converted them to bedpe using bedtools. The scaffold names are "1, 2, 3...X, Y, MT", instead of "chr1, chr2, chr3...chrX, chrY, chrM". Indeed I had to use a custom chrom.sizes file that lists the scaffolds as 1,2,3.

Do you think this could be the cause?

endrebak · 2019-09-06T14:28:22Z

The error is in epic2-df after it has successfully run epic on both KO and WT. So the error happens when it works on the result of those epic2 runs.

endrebak · 2019-09-06T14:29:36Z

Do you think this could be the cause?

No, but I wondered why you used a custom genome sizes file for hg19. When I realized why you did it I added a warning message to epic2 when the chromosome size names and chromosome names in the read file are incompatible.
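A check of this kind could look roughly like the sketch below. It is not the code epic2 uses: the helper names `first_column` and `unknown_chromosomes` are made up for illustration, and the inline strings stand in for the real read and chrom.sizes files.

```python
def first_column(text):
    """Return the set of values in the first tab-separated column."""
    return {line.split("\t", 1)[0] for line in text.splitlines() if line.strip()}


def unknown_chromosomes(reads_text, chromsizes_text):
    """List chromosome names used by the reads but absent from chrom.sizes."""
    return sorted(first_column(reads_text) - first_column(chromsizes_text))


# Illustrative stand-ins: reads named "1"/"MT", sizes in both naming styles.
reads = "1\t100\t200\t1\t300\t400\nMT\t5\t50\tMT\t60\t90\n"
ucsc_style_sizes = "chr1\t249250621\nchrM\t16571\n"
custom_sizes = "1\t249250621\nMT\t16569\n"

missing_ucsc = unknown_chromosomes(reads, ucsc_style_sizes)
missing_custom = unknown_chromosomes(reads, custom_sizes)

if missing_ucsc:
    print("Warning: chromosomes not found in chrom.sizes:", ", ".join(missing_ucsc))
```

With the "chr"-prefixed sizes every read chromosome is reported as missing, while the custom sizes file that lists the scaffolds as 1, 2, 3... produces no warning.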

endrebak · 2019-09-06T14:35:33Z

That is okay, I am hoping the error is due to your pyranges being old :)

wesleylcai · 2019-09-06T14:37:59Z

Looks like it's version 0.0.53

$ python
Python 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42)
[GCC 7.3.0] on linux
import pyranges as pr
pr.__version__
'0.0.53'

The error is in epic2-df after it has successfully run epic on both KO and WT. So the error happens when it works on the result of those epic2 runs.

Indeed, the individual outputs work well and I get two files in the output folder. So I agree with your assessment.

endrebak · 2019-09-06T14:48:48Z

That is the latest version. Do you have the opportunity to send the zipped dataset to me via dropbox or google drive? I will treat it as confidential. Then debugging would be easy :)


wesleylcai · 2019-09-06T14:49:58Z

Yes, I can send you a google drive link. Which email should I use?

endrebak · 2019-09-06T14:51:20Z

endrebak85 # gmail.com. Thanks!


wesleylcai · 2019-09-06T17:32:56Z

I have sent you an invite via google drive! Thanks for your help.

endrebak · 2019-09-09T12:41:41Z

I have downloaded the files and am running the analysis now. I have some potential fixes that I will attempt tomorrow :)

endrebak · 2019-09-09T12:45:17Z

I was able to reproduce the error. Hooray! Will continue tomorrow. Thanks for sharing a reproducible example :)

endrebak · 2019-09-09T12:45:28Z

(Did not mean to close)

endrebak · 2019-09-10T12:22:01Z

(Notes to self)

The error seems to be due to the following:

When pandas reads a table it guesses the types of the columns. For our files it guesses that the chromosome column is of type int, since the first rows contain 1, 2, ..., but when it reaches X and Y it changes its mind and decides the type is object/str.

sys:1: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.

So you end up with the following different chromosomes:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y']

So initially, it uses an int for lookup.
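The behaviour can be sketched on a small scale as follows. This is only an illustration, not the actual epic2 or PyRanges fix: the chunked read is approximated by parsing two pieces separately and concatenating them, and the column names and the helper `read_regions` are assumptions made for the example.

```python
import io

import pandas as pd

cols = ["Chromosome", "Start", "End"]


def read_regions(text, **kwargs):
    """Parse tab-separated regions from a string (stand-in for a real file)."""
    return pd.read_csv(io.StringIO(text), sep="\t", header=None, names=cols, **kwargs)


# A piece holding only numeric chromosome names is inferred as int...
numeric_piece = read_regions("1\t100\t200\n2\t150\t250\n")
# ...while a piece that also contains X is inferred as object (str).
later_piece = read_regions("13\t10\t20\nX\t300\t400\n")

# Gluing the pieces together approximates a chunked read: ints and strs coexist.
mixed = pd.concat([numeric_piece, later_piece], ignore_index=True)
value_types = {type(v).__name__ for v in mixed["Chromosome"]}

# Looking up the string "1" fails because the stored key is the integer 1.
found_before = "1" in set(mixed["Chromosome"])

# Declaring the dtype up front keeps every chromosome name a string.
fixed = read_regions("1\t100\t200\n13\t10\t20\nX\t300\t400\n", dtype={"Chromosome": str})
found_after = "1" in set(fixed["Chromosome"])
```

Passing `dtype` for the chromosome column is the option the DtypeWarning itself suggests; converting the column with `.astype(str)` after reading would be another way to get consistent keys.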

I have fixed this in epic2 now, I will also need to find a fix that works for PyRanges in general.

Try pip install epic2==0.0.41. The fix will take a few hours to be out on bioconda.

Feel free to reopen if this did not fix it for you :)

wesleylcai closed this as completed Sep 6, 2019

wesleylcai reopened this Sep 6, 2019

endrebak closed this as completed Sep 9, 2019

endrebak reopened this Sep 9, 2019

endrebak closed this as completed Sep 10, 2019
