reduced_genome.sh error on #get only sequences from FASTA file #31

trotos · 2017-04-17T10:24:32Z

Hi,
there seem to be two issues with the following step:

#get only sequences from FASTA file
grep -v '^>' ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa | sort | uniq -i -u | grep -xF -f - -B 1 ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa | grep -v '^--' > ${genome}_${enzyme}_flanking_sequences_${fl}_unique.fa
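For reference, here is a minimal toy run of this step. It assumes GNU grep, which reads the pattern list from standard input when given `-f -`, and uses made-up values for `genome`, `enzyme` and `fl`. The headers follow the `>chr:start-end` style, and the sequence `ACGTACGTAA` appears twice, so it should be dropped together with both of its headers.

```shell
#!/bin/sh
# Toy run of the "#get only sequences from FASTA file" step.
# Assumptions: GNU grep ("-f -" = patterns from stdin) and hypothetical
# values for the script's variables.
export LC_ALL=C                      # reproducible sort/uniq behaviour
genome=toy enzyme=dpnii fl=10

# Four flanking fragments; ACGTACGTAA occurs twice, so it is ambiguous.
cat > ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa <<'EOF'
>chr1:0-10
ACGTACGTAA
>chr1:20-30
ttttggggcc
>chr2:5-15
ACGTACGTAA
>chr2:40-50
GGGGCCCCAT
EOF

# 1) drop the header lines, 2) keep sequences seen exactly once (ignoring case),
# 3) print each surviving sequence plus the header line just before it (-B 1),
# 4) remove the "--" separators grep prints between non-adjacent match groups.
grep -v '^>' ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa | sort | uniq -i -u \
  | grep -xF -f - -B 1 ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa \
  | grep -v '^--' > ${genome}_${enzyme}_flanking_sequences_${fl}_unique.fa

# Prints the chr1:20-30 and chr2:40-50 records only.
cat ${genome}_${enzyme}_flanking_sequences_${fl}_unique.fa
```

Because `-B 1` makes grep print each matching sequence together with the line before it, the headers come back without ever being part of the pattern list.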

1st:
removing the ">" lines, sorting, and keeping the unique sequences works and produces something like the following (grep -v '^>' ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa | sort | uniq -i -u):

ggccaaaacaaaggggttacagggcacgtgccctgtaacagaaagcagtc
ggccaaaacaaaggggttataggacccctgcaagtctgaaatccagaagg
ggccaaaacaaagtggctacaggccccatgtgagtccaaaatccagtagg
ggccaaaacaaatgggctattggccccatgtaagtctgaaacccaacagg
ggccaaaacaaattctttacttatgctaattttgccttagacttttattt
GGCCAAAACAAGGAAACACTGAGAAGAGACGCCTGTTTGCCAGGTGCTTG
ggccaaaacaagggggctacaggcccatataagtcagcaatccagcaggg
GGCCAAAACAAGTTCTTTTTCCATTATTGAATACTTTTTGAGCTATAAGT
GGCCAAAACAATTCTTAAATAATTACTGCATTTTTTTTCAAAATGTAAGA
ggccaaaacacctaacccctctgttctctctgcctcagttgactcatgaa
GGCCAAAACAGAAGAGTCTGGCTGTCCCATCTCTGTATCTTTCAGGATCC

However, the next command (grep -xF -f - -B 1 ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa) produces a zero-byte file.
Could you please check whether there is an error?

2nd:
Why do you remove the duplicates? This is a PCR-based technique, and the file is expected to be full of PCR duplicates.

Thank you in advance for your answer.


rr1859 · 2017-04-28T21:12:22Z

Hi,

1) You are missing an underscore here: grep -xF -f - -B 1 ${genome}_${enzyme}_flanking_sequences_${fl}unique_2.fa. It should be: grep -xF -f - -B 1 ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa. Please try that and let me know.

2) The duplicates being removed here are the fragments that we map to (short fragments adjacent to the RE sites). If a sequence occurs more than once in the genome, we cannot properly assign our interactions to these regions.
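The point above is that a repeated fragment is discarded entirely rather than collapsed to one copy. A toy illustration with hypothetical fragments, where `ACGTACGTAA` occurs twice (once soft-masked in lower case). `sort -f` is a substitution for the script's plain `sort`; it is included to show that the case-insensitive match in `uniq -i` only works when `sort` places the case variants next to each other, which depends on the locale.

```shell
#!/bin/sh
# Hypothetical fragments: ACGTACGTAA occurs twice, once in lower case.
export LC_ALL=C
printf '%s\n' ACGTACGTAA GGGGCCCCAT acgtacgtaa TTTTGGGGCC > frags.txt

# Ordinary de-duplication keeps one representative of the repeated fragment.
sort -f frags.txt | uniq -i > collapsed.txt       # 3 lines
# "uniq -u" keeps only fragments that occur once, so the ambiguous one is gone.
sort -f frags.txt | uniq -i -u > unique_only.txt  # GGGGCCCCAT, TTTTGGGGCC
# Without -f, the C locale sorts all upper case before lower case, so the two
# copies are not adjacent and uniq -i never compares them.
sort frags.txt | uniq -i -u > plain_sort.txt      # all 4 lines survive
```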

mindfaraway · 2017-05-03T19:37:18Z

I got the same problem when running reduced_genome.sh.
I only changed line 13 to fl=100 and line 15 to enzyme=dpnii. The "#get only sequences from FASTA file" step created a 0-byte file, mm10_dpnii_flanking_sequences_100_unique.fa, and the script then created a 0-byte file, mm10_dpnii_flanking_sites_100_unique.bed. I ran the script from the Terminal command line on macOS Sierra 10.12.3 and have no idea why the error occurs.

Any suggestion?

rr1859 · 2017-05-04T21:25:34Z

Do you have oligoMatch and bedtools installed?

mindfaraway · 2017-08-08T09:10:49Z

Hi,
I am now using another Mac to run this program.
It returns the error message "grep: -: No such file or directory"
and creates a 0-byte file, "mm10_dpnii_flanking_sequences_50_unique.fa".
The error might be caused by the dash in the command "grep -xF -f - -B 1 ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa" on line 32.
I suspect grep on Linux and grep in the Mac terminal handle this command format differently.
How can I solve this problem?
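The message `grep: -: No such file or directory` suggests that the grep on this Mac treats the `-` after `-f` as a file name rather than as standard input. A minimal workaround, assuming `/dev/stdin` is available (it is on Linux and macOS), is to name standard input explicitly; installing GNU grep is another option. The sketch below changes only that argument and runs it on a hypothetical toy input.

```shell
#!/bin/sh
# Same step as in the script, but with "-f /dev/stdin" instead of "-f -".
# File names and contents are hypothetical.
export LC_ALL=C
genome=toy enzyme=dpnii fl=10
in=${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa
out=${genome}_${enzyme}_flanking_sequences_${fl}_unique.fa

printf '%s\n' '>chr1:0-10' ACGTACGTAA '>chr1:20-30' ttttggggcc \
  '>chr2:5-15' ACGTACGTAA '>chr2:40-50' GGGGCCCCAT > "$in"

# grep opens /dev/stdin as an ordinary path, which refers to the pipe
# carrying the unique sequences, so no special "-" handling is needed.
grep -v '^>' "$in" | sort | uniq -i -u \
  | grep -xF -f /dev/stdin -B 1 "$in" \
  | grep -v '^--' > "$out"
```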

mindfaraway · 2017-08-09T07:03:39Z

Okay...
I replaced line 32 with the code from the issue "reduced_genome.sh error - grep: memory exhausted".
Here's the code I use:
paste - - < ${genome}_${enzyme}_flanking_sequences_${fl}_unique_2.fa | awk '{a[toupper($2)]++; b[$2]=$1}; END {for(n in a) if (a[n] == 1) print b[n]"\n"n}' | sort -k1,1 -k2,2n -k3,3n | awk '{print $1":"$2"-"$3"\n"$4}' > ${genome}_${enzyme}_flanking_sequences_${fl}_unique.fa

Now it generates a final file of about 143 MB.
That seems to be the right way...
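A caveat on the awk replacement as quoted: it counts sequences under `toupper($2)` but stores headers under `$2`, so for a lowercase (soft-masked) sequence the `b[n]` lookup in `END` finds no header and prints an empty line; and because header and sequence are printed on separate lines before `sort`, the sort reorders individual lines rather than records. A minimal sketch that keeps header and sequence under the same key and sorts whole records, assuming two-line FASTA records with `>chr:start-end` headers and hypothetical file names:

```shell
#!/bin/sh
# Sketch only: assumes 2-line FASTA records with ">chr:start-end" headers.
export LC_ALL=C
in=toy_dpnii_flanking_sequences_10_unique_2.fa    # hypothetical names
out=toy_dpnii_flanking_sequences_10_unique.fa

printf '%s\n' '>chr2:40-50' GGGGCCCCAT '>chr1:20-30' ttttggggcc \
  '>chr1:0-10' ACGTACGTAA '>chr2:5-15' acgtacgtaa > "$in"

# paste - - : join each header with its sequence, tab-separated.
# awk #1   : count sequences case-insensitively; header and sequence are stored
#            under the same upper-cased key, so the END lookup always succeeds.
# awk #2   : split "chr:start-end" into three columns for a coordinate sort.
# awk #3   : rebuild two-line FASTA records from the sorted columns.
paste - - < "$in" |
  awk -F'\t' '{ k = toupper($2); n[k]++; hdr[k] = $1; seq[k] = $2 }
       END { for (k in n) if (n[k] == 1) print hdr[k] "\t" seq[k] }' |
  awk -F'\t' '{ split(substr($1, 2), p, "[:-]"); print p[1] "\t" p[2] "\t" p[3] "\t" $2 }' |
  sort -k1,1 -k2,2n -k3,3n |
  awk -F'\t' '{ print ">" $1 ":" $2 "-" $3 "\n" $4 }' > "$out"
```

Duplicates are still detected case-insensitively, as with `uniq -i`, but each surviving sequence is written out with its original capitalisation.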

By the way, I also edited lines 40 and 41, because sed on my Mac requires a backup-file suffix when editing in place. I changed them to:

sed -i .bak 's/>//g' ${genome}_${enzyme}_flanking_sites_${fl}_unique.bed
rm ${genome}_${enzyme}_flanking_sites_${fl}_unique.bed.bak
sed -i .bak 's/:|-/\t/g' ${genome}_${enzyme}_flanking_sites_${fl}_unique.bed
rm ${genome}_${enzyme}_flanking_sites_${fl}_unique.bed.bak
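`sed -i .bak` (with a space) is BSD syntax; GNU sed would read `.bak` as the script, whereas `-i.bak` without a space is accepted by both. Two further portability points about the second command as it appears here (the forum rendering may have dropped a backslash): in a basic regular expression `|` is a literal character, so `s/:|-/\t/g` does not match a lone `:` or `-`, and BSD sed does not turn `\t` in the replacement into a tab. A sketch that avoids `sed -i` altogether, assuming each line of the BED file still looks like `>chr:start-end` and using a hypothetical file name:

```shell
#!/bin/sh
# Sketch: strip ">" and split "chr:start-end" into tab-separated BED columns.
bed=toy_dpnii_flanking_sites_10_unique.bed        # hypothetical name
printf '%s\n' '>chr1:20-30' '>chr2:40-50' > "$bed"

# awk splits on ":" or "-" and prints tab-separated fields; writing to a
# temporary file and moving it back replaces in-place editing with sed -i.
awk -F'[:-]' -v OFS='\t' '{ sub(/^>/, "", $1); print $1, $2, $3 }' "$bed" > "$bed.tmp" &&
  mv "$bed.tmp" "$bed"
cat "$bed"
```

This behaves the same with GNU awk, mawk and the BSD awk shipped with macOS, so the script no longer depends on which sed is installed.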
