Database Management for the pipeline. #95
Comments
@grabear status? Closeable? lol |
Not yet lol. @sdhutchins |
Update on this issue.
Scope
Tested Functionality
The following checked items have been tested by changing the parameters in the config file.
Configuration
Archiving
Bugs need to be fixed with the file movement and deletion after archiving.
Deletion
Not Tested
Config File Explanation and Preview
The config file is loaded into Python as a nested dictionary. The top key value pairs such as:
email: "rgilmore@umc.edu"
driver: "sqlite3"
are used for changing the parameters in the BaseDatabaseManagement class. The various strategies for dispatching tasks include the following and are dictionary keys:
['Full', 'Projects', 'NCBI', 'NCBI_blast', 'NCBI_blast_db', 'NCBI_blast_windowmasker_files', 'NCBI_pub_taxonomy', 'NCBI_refseq_release', 'ITIS', 'ITIS_taxonomy']
Some keys are nested in the config file. The concept to note here is that top level keys (or strategies) have flags that control any sub level strategies. So if the configure_flag for 'Full' is True, then the configure_flag for 'Projects', 'NCBI', 'NCBI_blast', 'NCBI_blast_db', 'NCBI_blast_windowmasker_files', 'NCBI_pub_taxonomy', 'NCBI_refseq_release', 'ITIS', and 'ITIS_taxonomy' will also be interpreted as True when the database functions are dispatched.
Below I've added a preview of the entire database_config.yml file for consideration of the above statements.
Database_config:
  email: "rgilmore@umc.edu"
  driver: "sqlite3"
  Full:
    configure_flag: False
    archive_flag: False
    delete_flag: False
    project_flag: False
    _path: "!!python/object/apply:pathlib.Path ['']"
    Projects:
      Project_Name_1:
        configure_flag: True
        archive_flag: False
        delete_flag: False
        _path: "!!python/object/apply:pathlib.Path ['Project_Name_1']"
      Project_Name_2:
        configure_flag: True
        archive_flag: False
        delete_flag: False
        _path: "!!python/object/apply:pathlib.Path ['Project_Name_2']"
      Project_Name_3:
        configure_flag: True
        archive_flag: False
        delete_flag: False
        _path: "!!python/object/apply:pathlib.Path ['Project_Name_3']"
    NCBI:
      configure_flag: False
      archive_flag: False
      delete_flag: False
      _path: "!!python/object/apply:pathlib.Path ['NCBI']"
      NCBI_blast:
        configure_flag: False
        archive_flag: False
        delete_flag: False
        _path: "!!python/object/apply:pathlib.Path ['NCBI', 'blast']"
        NCBI_blast_db:
          configure_flag: False
          archive_flag: False
          delete_flag: False
          _path: "!!python/object/apply:pathlib.Path ['NCBI', 'blast', 'db']"
        NCBI_blast_windowmasker_files:
          configure_flag: False
          archive_flag: False
          delete_flag: False
          _path: "!!python/object/apply:pathlib.Path ['NCBI', 'blast', 'windowmasker_files']"
          taxonomy_ids: ""
      NCBI_pub_taxonomy:
        configure_flag: True
        archive_flag: False
        delete_flag: False
        _path: "!!python/object/apply:pathlib.Path ['NCBI', 'pub', 'taxonomy']"
      NCBI_refseq_release:
        seqtype: "rna" # Other seqtypes are protein and genomic
        seqformat: "gbff"
        collection_subset: "vertebrate_mammalian"
        configure_flag: False
        archive_flag: False
        delete_flag: False
        upload_flag: False
        _path: "!!python/object/apply:pathlib.Path ['NCBI', 'refseq', 'release']"
        upload_list: [1,2,3,4,5,6,7,8,9,10]
    ITIS:
      configure_flag: True
      archive_flag: False
      delete_flag: False
      _path: "!!python/object/apply:pathlib.Path ['ITIS']"
      ITIS_taxonomy:
        configure_flag: True
        archive_flag: False
        delete_flag: False
        _path: "!!python/object/apply:pathlib.Path ['ITIS', 'taxonomy']"
|
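For anyone following along, here is a rough sketch of how the flag inheritance described above could be resolved. It is not the actual dispatch code in database_management.py. It assumes the sub-strategies are nested as dictionaries under their parent strategy, and the helper name effective_flags and the trimmed-down example config are made up for illustration.

```python
# Hypothetical helper: not part of OrthoEvol, only an illustration of the idea.
FLAGS = ("configure_flag", "archive_flag", "delete_flag")


def effective_flags(node, name="Full", inherited=None):
    """Yield (strategy_name, flags) pairs for a strategy and all sub-strategies.

    A flag that is True on a parent strategy is treated as True for every
    strategy nested beneath it, regardless of the child's own setting.
    """
    if inherited is None:
        inherited = {flag: False for flag in FLAGS}
    own = {flag: bool(node.get(flag, False)) or inherited[flag] for flag in FLAGS}
    yield name, own
    for key, child in node.items():
        # Assumption: every nested dict is a sub-strategy.
        if isinstance(child, dict):
            yield from effective_flags(child, key, own)


# Trimmed-down stand-in for the "Full" section of database_config.yml.
config = {
    "Full": {
        "configure_flag": False,
        "archive_flag": False,
        "delete_flag": False,
        "NCBI": {
            "configure_flag": True,
            "archive_flag": False,
            "delete_flag": False,
            "NCBI_blast": {
                "configure_flag": False,
                "archive_flag": False,
                "delete_flag": False,
                "NCBI_blast_db": {
                    "configure_flag": False,
                    "archive_flag": False,
                    "delete_flag": False,
                },
            },
        },
        "ITIS": {
            "configure_flag": False,
            "archive_flag": False,
            "delete_flag": False,
        },
    }
}

resolved = dict(effective_flags(config["Full"]))
for strategy, flags in resolved.items():
    print(strategy, flags)
```

In this example only 'NCBI' has configure_flag set, so 'NCBI_blast' and 'NCBI_blast_db' pick it up while 'Full' and 'ITIS' stay off.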
Current ToDo List:
|
Fix the NCBI_refseq_release database_management functionality: OrthoEvolution/OrthoEvol/Manager/database_management.py Lines 505 to 543 in a487971
|
One question as I test this out, @grabear:
|
@sdhutchins I don't quite understand your question, though. Are you asking how we know whether our data is up to date? Do you still need help with this? |
Things to do:
NCBITaxa(taxdump_file="out_path/taxdump.tar.gz")
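A possible way to wrap the call above, as a sketch only: check that the taxdump archive exists before handing it to ete3. The function name load_ncbi_taxa is hypothetical, and this assumes ete3 is installed and that NCBITaxa is the ete3 class.

```python
from pathlib import Path


def load_ncbi_taxa(taxdump_file):
    """Build an ete3 NCBITaxa object from a local taxdump archive.

    Hypothetical wrapper: fails early with a clear error if the archive
    is missing, instead of failing inside ete3.
    """
    taxdump = Path(taxdump_file)
    if not taxdump.is_file():
        raise FileNotFoundError(f"taxdump archive not found: {taxdump}")
    # ete3 is a third-party package; it is imported here so the path check
    # above still works on machines where ete3 is not installed.
    from ete3 import NCBITaxa

    return NCBITaxa(taxdump_file=str(taxdump))


# Usage (not run here because it needs the real archive):
# ncbi = load_ncbi_taxa("out_path/taxdump.tar.gz")
```

Parsing the archive can take a while the first time, so it is probably worth doing once during configuration rather than on every run.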
|
[ ] gi lists OR should we convert this to accession.version via this or this?
vertebrate_mammalian