Issue generating the annotations from Ensembl #205
Hi @moldach! The ensembl_host is the host server that serves the Ensembl database/API; we keep a copy locally. It took some searching, but I was able to find it in their docs here (it no longer seems easy to find). The Perl script isn't required: as long as you format the final result to match the expected JSON, you can use another tool or script if you prefer (Perl can be difficult to install). I haven't tried it with non-human genomes, but from the Ensembl website instructions it looks like you may need additional Perl modules. You will also need to edit the script itself so that it looks for non-human data. I've linked the relevant line here: https://github.com/bcgsc/mavis/blob/develop/tools/generate_ensembl_json.pl#L136. Assuming the non-human genes are stored in a similar format in their database, it should work from there. I am pinging @mattdoug604 and @calchoo to comment as well, since they have generated non-human genome files for rn6 and mm10 respectively.
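For anyone else landing here: the ensembl_host value is simply the hostname of a MySQL server holding the Ensembl databases, and each species has its own "core" database on that server. The Python sketch below shows how you might check which core database exists for a non-human species before editing the script. It is not part of MAVIS. The connection settings (host ensembldb.ensembl.org, port 3306, user "anonymous", no password), the <species>_core_<release>_<assembly> naming pattern, and the use of the third-party pymysql package are all assumptions to verify against Ensembl's public MySQL documentation.

```python
import re

# Assumed public Ensembl MySQL settings; verify host/port in the Ensembl docs.
PUBLIC_ENSEMBL = {"host": "ensembldb.ensembl.org", "port": 3306, "user": "anonymous"}

# Assumed core DB naming pattern: <species>_core_<release>_<assembly-version>
CORE_DB = re.compile(r"^(?P<species>[a-z0-9_]+?)_core_(?P<release>\d+)_(?P<assembly>\w+)$")


def ensembl_species_name(binomial: str) -> str:
    """Convert e.g. 'Caenorhabditis elegans' to the 'caenorhabditis_elegans' form."""
    return "_".join(binomial.strip().lower().split())


def latest_core_db(db_names, species):
    """Return the core database with the highest release for `species`, or None."""
    best, best_release = None, -1
    for name in db_names:
        m = CORE_DB.match(name)
        if m and m.group("species") == species:
            release = int(m.group("release"))
            if release > best_release:
                best, best_release = name, release
    return best


def list_core_databases(species, conn_args=None):
    """Ask the server for this species' core DBs (requires pymysql and network access)."""
    import pymysql  # imported lazily so the pure helpers above work without it

    conn = pymysql.connect(**(conn_args or PUBLIC_ENSEMBL))
    try:
        with conn.cursor() as cur:
            # '_' is a single-character wildcard in LIKE; latest_core_db filters exactly.
            cur.execute("SHOW DATABASES LIKE %s", (species + "_core_%",))
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```

A typical use would be `latest_core_db(list_core_databases(ensembl_species_name("Caenorhabditis elegans")), "caenorhabditis_elegans")`. Keeping the name-matching logic separate from the network call means it can be checked offline.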
Thanks for the follow-up, @creisle. Can anyone confirm whether the Ensembl Perl API is working now? I tried this yesterday, but I am no longer getting the
Okay, the API is now back up again and I get the
After making changes to
It looks like I have all of the Perl modules, according to the Ensembl website:
Now I get the error:
Okay, so that's progress. You mentioned:
I'm confused as to what you mean here about
Looking inside the
I tried replacing the empty host string with:
I think that
or
This gets rid of the error message. The script has been running for
I would really appreciate your help in sorting this out, @creisle @calchoo @mattdoug604. Thanks!
@moldach, how long it runs probably depends on the species and the connection. It took a couple of hours when we did it for human, but that was against a local instance of the DB, so there was no other traffic and we had maximum connection speed. @mattdoug604, how long did it take when you did this for non-human genes? @moldach, the '.' characters are printed one per gene, so if they are still being written, the script is most likely still running.
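Since the script prints one '.' per gene, a long run can be checked from its log rather than guessed at. The Python sketch below is a hypothetical helper, not part of MAVIS: it assumes the dots appear on lines containing nothing but dots (a dot run appended to other text on the same line is not counted), and the log path and total gene count are values you would supply yourself.

```python
import re
import time
from pathlib import Path

# Assumption: progress output consists of lines made up only of '.' characters.
PROGRESS_LINE = re.compile(r"^\.+$")


def genes_done(log_text: str) -> int:
    """Count progress dots (one per gene) on dot-only lines."""
    total = 0
    for line in log_text.splitlines():
        stripped = line.strip()
        if PROGRESS_LINE.match(stripped):
            total += len(stripped)
    return total


def is_still_running(log_path: str, wait_seconds: float = 60) -> bool:
    """Return True if new dots were written to the log during `wait_seconds`."""
    before = genes_done(Path(log_path).read_text())
    time.sleep(wait_seconds)
    after = genes_done(Path(log_path).read_text())
    return after > before


def eta_hours(done: int, total: int, elapsed_hours: float):
    """Linear estimate of remaining hours, or None before any gene has finished."""
    if done == 0:
        return None
    return (total - done) * elapsed_hours / done
```

The estimate assumes a constant per-gene rate; against a busy public server the rate can drift, so it is only a rough guide.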
This was very helpful! The silver lining is that the process appears to be working. I re-ran the script, and although the top of the log says
In case this is useful for anyone else, the job took 36 hours. The model organism is C. elegans and the job ran on an academic HPC, so it's strange that it took so long.
Glad you were able to get the output! That does seem particularly long, but the retrieval rate is likely affected by the load on the remote Ensembl server. Since this is the main public Ensembl database, it probably sees very heavy usage. The version we have locally was a dump that we host internally, so we were one of only a couple of users running scripts against it at the time.
I'll take a look at the script and see whether anything in it could cause this discrepancy.
I couldn't find anything that would suggest why the
Are there any other outstanding issues here? Is it OK to close this?
No issues here, I guess. Thanks!
MAVIS version: 2.2.6
Python version: 3.8.0
OS: CentOS Linux release 7.5.1804 (Core)
I have downloaded the helper script from MAVIS and followed the installation steps from the Ensembl site to verify the connection.
Test connection to API
Run the perl script help menu
I don't see any reference to --ensembl_host in the MAVIS docs, nor by searching the Ensembl API docs, so I'm wondering what I need to put here. Furthermore, I want to get the Ensembl annotations for C. elegans rather than human, so which parameter do I set in generate_ensembl_json.pl to specify the model organism of interest?
Thank you
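Because generate_ensembl_json.pl depends on the Ensembl Perl API and, for non-human genomes, possibly additional Perl modules (as noted in the first reply), a quick pre-flight check can save a failed run. The Python sketch below is a hypothetical helper, not part of MAVIS or Ensembl: it assumes the script declares its dependencies with top-level `use Module::Name;` statements (modules loaded via `require` or inside `eval` are missed), skips lowercase pragmas such as strict and warnings, and tests each module with `perl -M<module> -e 1`, which exits non-zero when the module cannot be loaded.

```python
import re
import subprocess

# Matches e.g. "use Bio::EnsEMBL::Registry;" or "use Getopt::Long qw(GetOptions);"
# Lowercase pragmas ("use strict;") and version lines ("use 5.010;") are skipped.
USE_LINE = re.compile(r"^\s*use\s+([A-Z][\w:]*)")


def required_modules(perl_source: str) -> list:
    """Collect unique module names from 'use' statements, in order of appearance."""
    modules = []
    for line in perl_source.splitlines():
        m = USE_LINE.match(line)
        if m and m.group(1) not in modules:
            modules.append(m.group(1))
    return modules


def missing_modules(modules, perl: str = "perl") -> list:
    """Return the modules that fail to load; raises FileNotFoundError if perl is absent."""
    missing = []
    for mod in modules:
        result = subprocess.run([perl, f"-M{mod}", "-e", "1"], capture_output=True)
        if result.returncode != 0:
            missing.append(mod)
    return missing
```

Usage would be along the lines of `missing_modules(required_modules(open("generate_ensembl_json.pl").read()))`, run in the same environment (and with the same PERL5LIB) that the job itself will use.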