add gene annotation to midas_merge_snps #46

Merged on Jan 13, 2021 (16 commits)

@zhaoc1 zhaoc1 commented Jan 13, 2021

This PR adds the gene annotation of the representative genome for a given {species_id} to the midas_merge_snps workflow. Specifically, four columns are added to one of the output files, {species_id}.snps_info.tsv:

  • locus_type: IGR, CDS, tRNA, etc.
  • gene_id: gene ID generated by Prokka, if locus_type is CDS
  • site_type: degeneracy
  • amino_acids: amino acids encoded by the 4 possible alleles
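
To make the last two columns concrete, here is a minimal sketch of how site_type and amino_acids could be derived for one site. It is not taken from the PR: the function name annotate_site, the allele order A, C, G, T, the "nD" labelling (number of alleles that keep the reference amino acid), and the forward-strand-only handling are all assumptions made for illustration.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def annotate_site(codon, offset):
    """Hypothetical helper: annotate one site of a forward-strand codon.

    codon  -- reference codon, e.g. "GAT"
    offset -- 0-based position of the site within the codon
    Returns (site_type, amino_acids), where amino_acids lists the residue
    encoded by each of the alleles A, C, G, T at that site (assumed order).
    """
    ref_aa = CODON_TABLE[codon]
    amino_acids = "".join(
        CODON_TABLE[codon[:offset] + allele + codon[offset + 1:]]
        for allele in "ACGT"
    )
    # Assumed definition: degeneracy = number of alleles that still
    # encode the reference amino acid.
    site_type = f"{amino_acids.count(ref_aa)}D"
    return site_type, amino_acids
```

Under these assumptions, the third position of GCT is fully degenerate ("4D", "AAAA"), while the first position of ATG is not ("1D", "MLVL").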

We also added a subcommand, build_gene_features, which converts the Prokka GFF output into the gene features required by MIDAS. For each genome, the generated output is saved in s3://microbiome-igg/2.0/genes_annotations/{species_id}/{genome_id}.genes. The build_gene_features command can be merged into annotate_genes (issue 45).
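
As a rough illustration of the kind of conversion build_gene_features performs, the sketch below reads the nine standard tab-separated GFF3 columns and keeps a few fields per feature. It is not the PR's implementation: the function name parse_gff_features and the choice of output fields are hypothetical, and the actual MIDAS gene feature format is not described in this thread.

```python
def parse_gff_features(lines):
    """Hypothetical sketch: extract basic gene features from GFF3 lines."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            # A GFF3 file may embed sequences after this directive.
            break
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed rows
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = fields
        # Column 9 holds semicolon-separated key=value attributes.
        attr_map = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        features.append({
            "gene_id": attr_map.get("ID"),
            "contig_id": seqid,
            "start": int(start),  # GFF3 coordinates are 1-based, inclusive
            "end": int(end),
            "strand": strand,
            "gene_type": ftype,
        })
    return features
```

Each returned record could then be written out as one row of a tab-separated file per genome.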

To locate the gene carrying a given genomic position, we first convert the list of gene ranges into a flat, sorted list of half-open gene boundaries, then use binary search to find where the position falls in that list. If the returned index is even, the position lies outside any gene (between two genes) and the locus_type is "IGR"; if the index is odd, the position lies within a gene, and that gene's gene_type is returned as the locus_type.
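
The lookup described above can be sketched with Python's standard bisect module. This is a minimal illustration rather than the code in the PR: the names build_boundaries and locate_position are hypothetical, and it assumes the genes on a contig are sorted, non-overlapping, and given as 1-based inclusive (start, end) coordinates.

```python
from bisect import bisect_right


def build_boundaries(genes):
    """Flatten (start, end, gene_id, gene_type) tuples into half-open boundaries.

    Assumes genes are sorted by start, do not overlap, and use 1-based
    inclusive coordinates, so each gene contributes [start, end + 1).
    """
    boundaries = []
    for start, end, _gene_id, _gene_type in genes:
        boundaries.extend((start, end + 1))
    return boundaries


def locate_position(pos, boundaries, genes):
    """Return (locus_type, gene_id) for a genomic position."""
    idx = bisect_right(boundaries, pos)
    if idx % 2 == 0:
        # Even index: before the first gene, after the last, or between genes.
        return "IGR", None
    # Odd index: inside gene number idx // 2.
    _start, _end, gene_id, gene_type = genes[idx // 2]
    return gene_type, gene_id
```

For example, with genes at 10-20 (CDS) and 50-60 (tRNA), position 20 resolves to the CDS, while position 21 is intergenic. Each lookup costs O(log n) in the number of genes, which matters when annotating every SNP site of a genome.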

Passed the unit test.

python -m iggtools midas_merge_snps --samples_list samples_list.tsv --num_cores 4 merged_midas_output

@zhaoc1 zhaoc1 merged commit 9f6ef8c into master Jan 13, 2021