string index out of range (when sequence end by Ns) #38

Closed
Juke34 opened this issue Aug 21, 2019 · 4 comments

Collaborator

Juke34 commented Aug 21, 2019

Original question from @Iseez
Just one more question: when I was trying to obtain the EMBL file for a different species, I encountered the following error:

Traceback (most recent call last):
  File "/cm/shared/apps/emblmygff3/1.2.6/bin/EMBLmyGFF3", line 11, in <module>
    load_entry_point('EMBLmyGFF3==1.2.6', 'console_scripts', 'EMBLmyGFF3')()
  File "/cm/shared/apps/emblmygff3/1.2.6/lib/python2.7/site-packages/EMBLmyGFF3-1.2.6-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 1383, in main
    writer.write_all( outfile )
  File "/cm/shared/apps/emblmygff3/1.2.6/lib/python2.7/site-packages/EMBLmyGFF3-1.2.6-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 1179, in write_all
    self._add_mandatory()
  File "/cm/shared/apps/emblmygff3/1.2.6/lib/python2.7/site-packages/EMBLmyGFF3-1.2.6-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 195, in _add_mandatory
    if seq[end] == 'n' :
IndexError: string index out of range

Is the problem due to the files I'm using as input?

Collaborator Author

Juke34 commented Aug 21, 2019

Potentially, yes. Do you get any other error or trace before this one?
If you can send me your files, I can take a more detailed look.

Iseez commented Aug 26, 2019

There is no trace before this one.
The files I was trying to use are attached to this reply.

gff&fasta.zip

Collaborator Author

Juke34 commented Aug 27, 2019

Thank you for reporting this problem. This is indeed a bug.
Line 194 of EMBLmyGFF3.py,
while end:
must be replaced with
while end < len(seq):
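To illustrate why the original condition fails, here is a minimal Python sketch of a loop that scans a run of Ns. It is a hypothetical reconstruction, not the actual `_add_mandatory` code: the helper names `find_n_run_end` and `find_n_run_end_unbounded` are made up, and it assumes the sequence is lowercase (the original check compares against `'n'`). With `while end:`, the loop only stops when it meets a non-N base, so a run of Ns that reaches the end of the sequence makes `seq[end]` point one past the last base.

```python
def find_n_run_end(seq, end):
    """Return the index just past the run of 'n' that starts at `end`.

    Hypothetical helper using the fixed condition `while end < len(seq):`.
    Bounding the loop by len(seq) stops the scan after the last base, so a
    run of Ns reaching the end of the sequence no longer raises IndexError.
    """
    while end < len(seq):
        if seq[end] == 'n':
            end += 1
        else:
            break
    return end


def find_n_run_end_unbounded(seq, end):
    """Same scan with the original `while end:` condition (reproduces the bug).

    A positive integer stays truthy as it is incremented, so the loop only
    stops on a non-N base; if the Ns run to the end of the sequence,
    seq[end] is out of range.
    """
    while end:
        if seq[end] == 'n':
            end += 1
        else:
            break
    return end


print(find_n_run_end("acnnngt", 2))   # 5: the gap covers indices 2-4
print(find_n_run_end("acgtnnn", 4))   # 7: stops at len(seq) instead of crashing
```

Calling `find_n_run_end_unbounded("acgtnnn", 4)` raises the same `IndexError: string index out of range` as in the traceback.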

It will be fixed in a future release.
Until then, you can fix it yourself as follows:

Uninstall EMBLmyGFF3:

pip uninstall EMBLmyGFF3

Clone the repository somewhere convenient:

mkdir ~/git
cd ~/git
git clone https://github.com/NBISweden/EMBLmyGFF3.git
cd EMBLmyGFF3

Replace line 194 of EMBLmyGFF3.py as indicated above (here using the nano text editor, but you can use whichever editor you prefer):
nano EMBLmyGFF3/EMBLmyGFF3.py
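Alternatively, instead of editing the file by hand, the one-line replacement can be scripted. This is a hypothetical helper, not part of EMBLmyGFF3: it needs Python 3.6+, and it assumes the `while end:` line is immediately followed by the `if seq[end] == 'n' :` check shown in the traceback, so it rewrites only that occurrence and leaves the file unchanged if nothing matches.

```python
import re
import sys
from pathlib import Path

# Matches `while end:` only when the very next line is the check that
# raised the IndexError in the traceback (`if seq[end] == 'n' :`).
BUGGY_LOOP = re.compile(
    r"^([ \t]*)while end:[ \t]*\n(?=[ \t]*if seq\[end\] == 'n')",
    re.MULTILINE,
)


def patch_source(text):
    """Return (patched_text, number_of_replacements), keeping indentation."""
    return BUGGY_LOOP.subn(r"\1while end < len(seq):\n", text)


def main(path):
    source = Path(path)
    if not source.is_file():
        print(f"{source} not found")
        return 2
    patched, count = patch_source(source.read_text())
    if count == 0:
        print("No unbounded `while end:` loop found; file left unchanged.")
        return 1
    source.write_text(patched)
    print(f"Patched {count} loop(s) in {source}")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Usage from the repository root, with a hypothetical script name: `python3 patch_loop.py EMBLmyGFF3/EMBLmyGFF3.py`, then install as below.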

Install:

python setup.py install

Or, if you do not have administrative rights on your machine:

python setup.py install --user


Apart from this bug, I can point out two other problems:

  1. The FASTA headers differ from the sequence IDs in column 1 of the GFF file. Make sure the annotation was produced with these FASTA sequences. If so, fix the names: they must be identical in both files, otherwise EMBLmyGFF3 cannot match the features (gene, CDS, etc.) to the proper sequence. As things stand, no features at all will be attached to the sequences.
  2. If you plan to submit the EMBL file created this way, it will be rejected by ENA, which does not accept sequences that start and/or end with Ns (gaps). You have many such cases, and they are also what triggers the current bug. If you trim the Ns from the sequence ends, you will both bypass the current bug in EMBLmyGFF3 and avoid any rejection due to leading or trailing Ns during ENA submission.
    To do so, first fix the names so they match between your GFF and FASTA files, as indicated in point 1; then you can use the script gff3_sp_clipN_seqExtremities_and_fixCoordinates.pl from the GAAS repository.
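For point 1, a quick way to list the mismatched names is to compare the FASTA IDs with column 1 of the GFF. This is a minimal sketch, assuming the ID is the first whitespace-delimited word after `>` (as Biopython's FASTA parser does) and that a `##FASTA` directive, if present, ends the feature section; the function names are hypothetical.

```python
import sys


def fasta_ids(lines):
    """Collect sequence IDs from FASTA headers (first word after '>')."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def gff_seqids(lines):
    """Collect the seqid column (column 1) of GFF3 feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded FASTA section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        ids.add(line.split("\t", 1)[0])
    return ids


def compare_ids(fasta_lines, gff_lines):
    """Return (seqids only in the GFF, IDs only in the FASTA), both sorted."""
    fasta = fasta_ids(fasta_lines)
    gff = gff_seqids(gff_lines)
    return sorted(gff - fasta), sorted(fasta - gff)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as gff:
        unmatched, unused = compare_ids(fa, gff)
    print("GFF seqids with no FASTA header:", unmatched or "none")
    print("FASTA IDs not used in the GFF:", unused or "none")
```

Any name in the first list is a sequence whose features EMBLmyGFF3 cannot attach.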

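For point 2, the idea behind clipping is to strip Ns from both ends and shift every feature left by the number of bases removed from the 5' end. This is only a sketch of that idea, not a port of gff3_sp_clipN_seqExtremities_and_fixCoordinates.pl: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF3, and features overlapping the removed Ns are simply dropped here, which the GAAS script may handle differently.

```python
def clip_n_extremities(seq):
    """Trim leading and trailing N/n from a sequence.

    Returns (clipped_sequence, offset), where offset is the number of
    bases removed from the 5' end.
    """
    left_trimmed = seq.lstrip("Nn")
    offset = len(seq) - len(left_trimmed)
    return left_trimmed.rstrip("Nn"), offset


def shift_feature(start, end, offset, clipped_length):
    """Shift 1-based, inclusive GFF coordinates onto the clipped sequence.

    Returns the new (start, end), or None when the feature overlaps the
    removed Ns and so no longer fits on the clipped sequence.
    """
    new_start, new_end = start - offset, end - offset
    if new_start < 1 or new_end > clipped_length:
        return None
    return new_start, new_end


seq = "NNacgtNN"
clipped, offset = clip_n_extremities(seq)
print(clipped, offset)                            # acgt 2
print(shift_feature(3, 6, offset, len(clipped)))  # (1, 4)
print(shift_feature(1, 4, offset, len(clipped)))  # None
```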
Juke34 added the bug label Aug 27, 2019

Iseez commented Aug 27, 2019

I noticed the header issue after running into a related problem with another species, so the headers will be changed. The EMBL sequences are not going to be submitted; I only needed them for some analyses.
Thank you, I really appreciate your help.

Juke34 changed the title from "string index out of range" to "string index out of range (when sequence end by Ns)" Sep 12, 2019
Juke34 closed this as completed in 5153f53 Oct 8, 2019