Building Index WGBS Error #16

Open
bacantre opened this issue Nov 10, 2017 · 15 comments

Comments

@bacantre

Hello,

I am trying to use BS Seeker 2 to make an index of the Bovine Reference Genome for later alignment with WGBS data.

I ran the code:
python bs_seeker2-build.py -f ~/reference/file/location --aligner=bowtie2 -d ~/output/file/location

I got the error:
Traceback (most recent call last):
File "bs_seeker2-build.py", line 5, in <module>
from bs_index.wg_build import *
ImportError: No module named bs_index.wg_build

I had my university core download BS Seeker2 and Bowtie2 to the server I am using. I have also downloaded the bs_seeker2-build.py, bs_seeker2-align.py, and bs_seeker2-call_methylation.py files from here. Are there other files that I or the university core need to download?

I also assumed the .py files were meant to run as is, so I did not modify them at all, except to save them as .txt files and then rename them to .py before transferring them to the server.

Also, do I need to decompress the reference genome from fa.gz to fa, or will it run as fa.gz?

Thank you,
Bonnie
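One possible cause of this ImportError is that only the three top-level scripts were copied, without the package folders they import from. The thread does not confirm this, so treat the sketch below as a check under that assumption. The folder names `bs_index`, `bs_align` and `bs_utils` are taken from the import line and traceback paths quoted in this thread; the helper name `missing_packages` is hypothetical.

```python
import os

# Package folders referenced by the scripts and tracebacks quoted in this thread.
REQUIRED_PACKAGES = ("bs_index", "bs_align", "bs_utils")


def missing_packages(script_dir):
    """Return the required package folders that are absent from script_dir."""
    return [
        name
        for name in REQUIRED_PACKAGES
        if not os.path.isdir(os.path.join(script_dir, name))
    ]
```

Calling `missing_packages` on the folder that holds `bs_seeker2-build.py` returns an empty list when all three folders sit next to the script.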

@guoweilong
Collaborator

guoweilong commented Nov 10, 2017 via email

@bacantre
Author

bacantre commented Dec 20, 2017 via email

@guoweilong
Collaborator

Thanks for reporting this error message. It was a bug, but one that was rarely reported. It has now been fixed in v2.1.5.
I also suspect you did not give the correct path to genome.fa. The -d option specifies where to store the index directory. To specify the path to genome.fa, use "-f <path_to_folder>/genome.fa".

Best,
Weilong

@bacantre
Author

bacantre commented Dec 21, 2017 via email

@guoweilong
Collaborator

@bacantre
The error message said:
Is your input genome file : /users/ b/a/bacantre/WGBS_Bovine_Brain/reference/UMD3.1_chromosomes.fa a TXT file or a binary file?

It should be a TXT file.

And do you really have a chromosome or contig named "8__fO_"? Please double-check that you specified the right genome file.

Best,
Weilong
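A quick way to tell whether a genome file is plain text or still gzip-compressed, and to see which sequence names it contains, is sketched below. It relies only on the standard gzip magic bytes (`1f 8b`); the function names are hypothetical, and the thread does not say whether BS-Seeker2 itself accepts `.fa.gz` input.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def fasta_headers(path, limit=10):
    """Return up to `limit` sequence names from a plain or gzipped FASTA file."""
    opener = gzip.open if is_gzipped(path) else open
    names = []
    with opener(path, "rt", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
                if len(names) >= limit:
                    break
    return names
```

If `is_gzipped` returns True, the file is not the plain-text FASTA asked for above. An unexpected name such as "8__fO_" in the output of `fasta_headers` is a sign that the wrong file, or a binary file, was passed to `-f`.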

@bacantre
Author

bacantre commented Dec 28, 2017 via email

@guoweilong
Collaborator

Hi @bacantre ,

As you built the index using the following command:

python bs_seeker2-build.py -f ~/WGBS/reference/referenceUMD3.1.1.fa --aligner=bowtie2 -d ~/WGBS/reference/

then you need to specify the folder in the following way:

python bs_seeker2-align.py -1 ~/WGBS/fastQfiles/D2239Amy_1.fq -2 ~/WGBS/fastQfiles/D2239Amy_2.fq --aligner=bowtie2 -o ~/WGBS/alignment/D2239Amy.bam -f bam -g referenceUMD3.1.1.fa -d ~/WGBS/reference/

Please note that, for bs_seeker2-align.py,

  1. parameter "-g" should specify the genome file name, without the path
  2. parameter "-d" should specify the parent directory in which you created the index folder, without the index folder name

Let me know if it still does not work.

Best,
Weilong
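The two rules above can be sketched as a small helper that derives the `-g` and `-d` arguments from the paths used at build time. This is only an illustration: `align_args` is a hypothetical name, and the flags are the ones quoted in the commands in this thread.

```python
import os


def align_args(index_parent_dir, genome_fasta, aligner="bowtie2"):
    """Build the -g/-d/--aligner arguments for bs_seeker2-align.py.

    -g gets the genome file name only (no path); -d gets the parent
    directory that was passed to -d when the index was built.
    """
    return [
        "-g", os.path.basename(genome_fasta),
        "-d", os.path.expanduser(index_parent_dir),  # expand a leading "~"
        "--aligner=" + aligner,
    ]
```

For a genome at `/data/ref/genome.fa` indexed with `-d /data/ref/`, the helper returns `-g genome.fa -d /data/ref/ --aligner=bowtie2`.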

@justinjohns

justinjohns commented Jan 21, 2018

Related issue -- I've tried 2 genome indexes, once in default directory, and once here:

python bs_seeker2-build.py -f /shafer3/lynx_meth/genome/lynx.fa --aligner=bowtie2 -r -c AT-TAAT,ATGCA-T -d /shafer3/lynx_meth/genome/bs2/

Failed alignment, cannot find directory:

python bs_seeker2-align.py -1 /shafer3/lynx_meth/data/raw_fastq/1_R1.fastq -2 /shafer3/lynx_meth/data/raw_fastq/1_R2.fastq --aligner=bowtie2 -o /shafer3/lynx_meth/data/bs_bam/0001.bam -f bam -g lynx.fa -d /shafer3/lynx_meth/genome/bs2/

 BS-Seeker2 v2.1.3 - Oct. 25, 2017

ERROR: Index DIR "lynx.fa.." cannot be found in /shafer3/lynx_meth/genome/bs2/. Please run the bs_seeker2-build.py to create it with the correct parameters for -g, -r, --low, --up and --aligner.

Contents of the indexed directory: lynx.fa_rrbs_ATTAAT-ATGCAT_20_500_bowtie2
index_directory.txt
Head of genome:
head_lynx.txt

Bowtie2 warns "Warning: Encountered reference sequence with only gaps", but I have indexed and aligned successfully with Bismark (using Bowtie2), so I don't see why this isn't working. The genome was originally named ena.fa and I renamed it to lynx.fa; both indexes gave the same result.

My dd enzymes were AseI / NsiI.

Thanks for any tips!
Justin

@guoweilong
Collaborator

guoweilong commented Jan 22, 2018 via email

@christinalrichards

I am having a related issue trying to use BSseeker2 to map PE reads of WGBS to the reference tomato genome (AEKE03.fasta). I did not indicate a specific path with the -d option but used the default for the index with this command:

[clr@rra-login1 BSseeker2-master]$ python bs_seeker2-build.py -f AEKE03.fasta --aligner bowtie2

Which resulted in a directory full of .data files:

C_C2T.1.bt2
ENA_AEKE03000623_AEKE03000623.1.data
ENA_AEKE03001259_AEKE03001259.1.data
ENA_AEKE03001895_AEKE03001895.1.data
ENA_AEKE03002531_AEKE03002531.1.data
C_C2T.2.bt2
ENA_AEKE03000624_AEKE03000624.1.data
ENA_AEKE03001260_AEKE03001260.1.data
ENA_AEKE03001896_AEKE03001896.1.data
ENA_AEKE03002532_AEKE03002532.1.data
C_C2T.3.bt2
ENA_AEKE03000625_AEKE03000625.1.data
etc

and then ran for PE conversion to single end mode:
[clr@rra-login0 BSseeker2-master]$ python bs_seeker2-align.py -1 10_P_1.fq -2 10_P_2.fq -g AEKE03.fasta -o 10_P.bam -u unmapped

got the error for pysam:
[Error] It seems that you haven't install "pysam" package.. Please do it before you run this script.

We installed it with:
[clr@rra-login0 BSseeker2-master]$ module load apps/python/2.7.15-el7
[clr@rra-login0 BSseeker2-master]$ pip freeze | grep pysam

then ran:
[clr@rra-login0 BSseeker2-master]$ python bs_seeker2-align.py -1 10_P_1.fq -2 10_P_2.fq -g AEKE03.fasta -o 10_P.bam -u unmapped

and got this error:

 BS-Seeker2 v2.1.8 - Oct. 30, 2018

ERROR: Index DIR "AEKE03.fasta.." cannot be found in /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes.
Please run the bs_seeker2-build.py to create it with the correct parameters for -g, -r, --low, --up and --aligner.

Does it need -r, --low or --up for WGBS? Or do I need to modify the way that the index is built for PE?

@guoweilong
Collaborator

As you used "--aligner bowtie2" for building the index, you also need to use "--aligner=bowtie2" for the alignment step.
By default, bs_seeker2-align.py will use "bowtie" rather than "bowtie2" for alignment.

Best,
Weilong
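To see why a mismatch in `--aligner` (or `-r`) leads to "Index DIR ... cannot be found", it can help to compute the folder name that the alignment step would have to find. The naming pattern below is inferred only from the two index folder names quoted in this thread (`AEKE03.fasta_bowtie2` and `lynx.fa_rrbs_ATTAAT-ATGCAT_20_500_bowtie2`), not from the BS-Seeker2 source, so treat it as an assumption. The function name and the default values 20 and 500 for the fragment bounds are likewise read off that folder name.

```python
def index_dir_name(genome_name, aligner, rrbs=False, cut_sites=(), low=20, up=500):
    """Guess the index folder name for a given set of build options.

    Pattern assumed from folder names seen in this thread:
      WGBS: <genome>_<aligner>
      RRBS: <genome>_rrbs_<sites>_<low>_<up>_<aligner>
    """
    if not rrbs:
        return "{}_{}".format(genome_name, aligner)
    # Cut sites such as "AT-TAAT" appear in the folder name without the dash.
    sites = "-".join(site.replace("-", "") for site in cut_sites)
    return "{}_rrbs_{}_{}_{}_{}".format(genome_name, sites, low, up, aligner)
```

Under this assumed pattern, an index built with bowtie2 lives in `AEKE03.fasta_bowtie2`, while an alignment run with the default aligner would look for `AEKE03.fasta_bowtie`. Likewise, an index built with `-r` and `-c` gets an `_rrbs_` name that a run without the same `-r`, `-c`, `--low` and `--up` options would not look for.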

@christinalrichards

Thanks so much for your quick response!! It looks like it started with this command:

[clr@rra-login1 BSseeker2-master]$ python bs_seeker2-align.py -1 10_P_1.fq -2 10_P_2.fq -g AEKE03.fasta --aligner=bowtie2 -o 10_P.bam -u unmapped

And ran this far to a new error: OSError: [Errno 2] No such file or directory
BS-Seeker2 v2.1.8 - Oct. 30, 2018

[2019-08-23 09:50:15] Mode: Bowtie2, local alignment
[2019-08-23 09:50:15] Filter for tag XS: #(mCH)/#(all CH)>50.00% and #(mCH)>5
[2019-08-23 09:50:15] Temporary directory: /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-t9gMui
[2019-08-23 09:50:15] Reduced Representation Bisulfite Sequencing: False
[2019-08-23 09:50:15] Pair end
[2019-08-23 09:50:15] Aligner command: None/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x %(reference_genome)s -f -1 %(input_file_1)s -2 %(input_file_2)s -S %(output_file)s
[2019-08-23 09:50:15] ----------------------------------------------
[2019-08-23 09:50:15] Filename for 1st mate: 10_P_1.fq
[2019-08-23 09:50:15] Filename for 2nd mate: 10_P_2.fq
[2019-08-23 09:50:15] The first base (for mapping): 1
[2019-08-23 09:50:15] The last base (for mapping): 200
[2019-08-23 09:50:15] Path for short reads aligner: None/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x %(reference_genome)s -f -1 %(input_file_1)s -2 %(input_file_2)s -S %(output_file)s

[2019-08-23 09:50:15] Reference genome library path: /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes/AEKE03.fasta_bowtie2
[2019-08-23 09:50:15] Directional library
[2019-08-23 09:50:15] Number of mismatches allowed: 4
[2019-08-23 09:50:15] --------------------------------
[2019-08-23 09:52:33] Start reading and trimming the input sequences
Detected data format: fastq
[2019-08-23 09:52:51] Start mapping
[2019-08-23 09:52:51] Starting commands:
[2019-08-23 09:52:51] Launched: None/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes/AEKE03.fasta_bowtie2/W_C2T -f -1 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-t9gMui/Trimed_FCT_1.fa.tmp-6778898 -2 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-t9gMui/Trimed_RGA_2.fa.tmp-6778898 -S /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-t9gMui/W_C2T_fr_m4.mapping.tmp-6778898
[2019-08-23 09:52:51] Launched: None/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes/AEKE03.fasta_bowtie2/C_C2T -f -1 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-t9gMui/Trimed_FCT_1.fa.tmp-6778898 -2 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-t9gMui/Trimed_RGA_2.fa.tmp-6778898 -S /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-t9gMui/C_C2T_fr_m4.mapping.tmp-6778898
Traceback (most recent call last):
File "bs_seeker2-align.py", line 469, in <module>
options.Output_unmapped_hit
File "/shares/pi_clr/BSSeeker/BSseeker2-master/bs_align/bs_pair_end.py", line 799, in bs_pair_end
'output_file' : CG2A_fr} ])
File "/shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/utils.py", line 332, in run_in_parallel
for i, proc in enumerate([subprocess.Popen(args = shlex.split(cmd), stdout = stdout) for cmd, stdout in commands]):
File "/apps/python/2.7.15-el7/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/apps/python/2.7.15-el7/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
[clr@rra-login1 BSseeker2-master]$

@christinalrichards

I may have solved this error by "adding" bowtie2 again since it read (on line 6):
[2019-08-23 09:50:15] Aligner command: None/bowtie2...

now that line says:
[2019-08-23 12:50:12] Aligner command: /apps/bowtie/2.3.4.1/bin/bowtie2...

I had run [clr@rra-login1 ~]$ module add apps/bowtie/2.3.4.1 to create the index, but I guess I have to re-add it for each session?

Now it's reading:
[2019-08-23 12:52:49] Starting commands:
[2019-08-23 12:52:49] Launched: /apps/bowtie/2.3.4.1/bin/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes/AEKE03.fasta_bowtie2/W_C2T -f -1 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_FCT_1.fa.tmp-8477077 -2 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_RGA_2.fa.tmp-8477077 -S /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/W_C2T_fr_m4.mapping.tmp-8477077
[2019-08-23 12:52:49] Launched: /apps/bowtie/2.3.4.1/bin/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes/AEKE03.fasta_bowtie2/C_C2T -f -1 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_FCT_1.fa.tmp-8477077 -2 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_RGA_2.fa.tmp-8477077 -S /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/C_C2T_fr_m4.mapping.tmp-8477077
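The "None/bowtie2" prefix in the earlier log suggests that the aligner was not found on PATH in that session. A small pre-flight check run before starting a long alignment can catch this. The sketch below uses the standard-library `shutil.which` (Python 3.3 or later, whereas the log above shows BS-Seeker2 itself running under Python 2.7); the function names are hypothetical.

```python
import shutil


def find_aligner(name="bowtie2"):
    """Return the absolute path of `name` on PATH, or None if it is not found."""
    return shutil.which(name)


def require_aligner(name="bowtie2"):
    """Return the aligner path, or raise if the module has not been loaded."""
    path = find_aligner(name)
    if path is None:
        raise RuntimeError(
            "{} not found on PATH; load its module in this session first".format(name)
        )
    return path
```

On a cluster that uses environment modules, `module add` only changes the current shell, so the check has to pass in the same session (or job script) that runs bs_seeker2-align.py.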

@christinalrichards

Hi again!

It went this far but I'm not sure it was finished?

[2019-08-23 12:52:49] Launched: /apps/bowtie/2.3.4.1/bin/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes/AEKE03.fasta_bowtie2/W_C2T -f -1 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_FCT_1.fa.tmp-8477077 -2 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_RGA_2.fa.tmp-8477077 -S /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/W_C2T_fr_m4.mapping.tmp-8477077
[2019-08-23 12:52:49] Launched: /apps/bowtie/2.3.4.1/bin/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes/AEKE03.fasta_bowtie2/C_C2T -f -1 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_FCT_1.fa.tmp-8477077 -2 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_RGA_2.fa.tmp-8477077 -S /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/C_C2T_fr_m4.mapping.tmp-8477077
[2019-08-23 13:00:20] Finished: /apps/bowtie/2.3.4.1/bin/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes/AEKE03.fasta_bowtie2/W_C2T -f -1 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_FCT_1.fa.tmp-8477077 -2 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_RGA_2.fa.tmp-8477077 -S /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/W_C2T_fr_m4.mapping.tmp-8477077
[2019-08-23 13:00:20] Finished: /apps/bowtie/2.3.4.1/bin/bowtie2 --local --quiet -D 50 --no-mixed --norc --sam-nohead --no-discordant -k 2 -p 2 -X 500 --fr -x /shares/pi_clr/BSSeeker/BSseeker2-master/bs_utils/reference_genomes/AEKE03.fasta_bowtie2/C_C2T -f -1 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_FCT_1.fa.tmp-8477077 -2 /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/Trimed_RGA_2.fa.tmp-8477077 -S /tmp/bs_seeker2_10_P.bam_-bowtie2-local-TMP-a1eCZi/C_C2T_fr_m4.mapping.tmp-8477077

@guoweilong
Collaborator

Hi @christinalrichards ,

Sorry for the late reply, as I might have missed this message in email.

It takes some time to run if you have lots of data.
Here are some suggestions for improving the performance:
https://github.com/BSSeeker/BSseeker2#1-performance

Best,
Weilong


4 participants