Kegg annotation #99

Closed
Sisov opened this issue Aug 14, 2017 · 28 comments

Comments

@Sisov

Sisov commented Aug 14, 2017

Dear author:
clusterProfiler is a very useful tool for GO and KEGG annotation. I want to use it for KEGG enrichment, but I only have the KO numbers, so I need to convert them to pathways. Is there a function or method in the package that can do this? Any help will be appreciated.

Thanks

@GuangchuangYu
Member

ko is actually the pathway map. I think you are talking about mapping K numbers to ko pathways.

> bitr_kegg("K00844", "kegg", "Path", "ko")
     kegg    Path
1  K00844 ko00010
2  K00844 ko00051
3  K00844 ko00052
4  K00844 ko00500
5  K00844 ko00520
6  K00844 ko00521
7  K00844 ko00524
8  K00844 ko01100
9  K00844 ko01110
10 K00844 ko01120
11 K00844 ko01130
12 K00844 ko01200
13 K00844 ko04066
14 K00844 ko04910
15 K00844 ko04930
16 K00844 ko04973
17 K00844 ko05230

@Sisov
Author

Sisov commented Aug 14, 2017

Yes, sorry, it's really the K number. I want to obtain the pathway for each K number, like this. Is there any method to achieve it?
1 K00799 Drug metabolism - cytochrome P450

Thanks

@GuangchuangYu
Member

I just wrote a ko2name function for this purpose.

> bitr_kegg("K00799", "kegg", "Path", "ko") -> x
> ko2name(x$Path) -> y
> merge(x, y, by.x='Path', by.y='ko')
     Path   kegg                                         name
1 ko00480 K00799                       Glutathione metabolism
2 ko00980 K00799 Metabolism of xenobiotics by cytochrome P450
3 ko00982 K00799            Drug metabolism - cytochrome P450
4 ko01524 K00799                     Platinum drug resistance
5 ko05204 K00799                      Chemical carcinogenesis
6 ko05418 K00799       Fluid shear stress and atherosclerosis
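
If one row per K number is preferred (all of its pathway names side by side), the long table returned by merge() can be collapsed in base R. A minimal sketch, assuming the table has the kegg and name columns shown above; the six rows are typed in by hand so the snippet runs without querying KEGG:

```r
# Long-format K number -> pathway table; rows copied by hand from the
# merge() output above (in practice bitr_kegg() and ko2name() build it)
ko_paths <- data.frame(
  Path = c("ko00480", "ko00980", "ko00982", "ko01524", "ko05204", "ko05418"),
  kegg = rep("K00799", 6),
  name = c("Glutathione metabolism",
           "Metabolism of xenobiotics by cytochrome P450",
           "Drug metabolism - cytochrome P450",
           "Platinum drug resistance",
           "Chemical carcinogenesis",
           "Fluid shear stress and atherosclerosis"),
  stringsAsFactors = FALSE
)

# Join all pathway names of each K number into a single string
names_by_k <- tapply(ko_paths$name, ko_paths$kegg, paste, collapse = "; ")
collapsed <- data.frame(kegg = names(names_by_k),
                        pathways = as.vector(names_by_k),
                        stringsAsFactors = FALSE)
print(collapsed)
```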

@Sisov
Author

Sisov commented Aug 14, 2017

Well, unfortunately, an error appears when I run
x <- bitr_kegg("K00799", "kegg", "Path", "ko")
Error in match.arg(toType, id_types) :
'arg' should be one of “ncbi-proteinid”, “ncbi-geneid”, “uniprot”, “kegg”

@GuangchuangYu
Member

See the Prerequisites section at https://github.com/GuangchuangYu/clusterProfiler/issues/new.

@GuangchuangYu
Member

BTW: you can use enrichKEGG with K numbers by specifying organism="ko".

@Sisov
Author

Sisov commented Aug 20, 2017

Thanks, it works well. The software is so good!!!

@liuxianghui

Dear GuangChuang:
Thank you very much for supporting ko for the analysis of organisms that are not among the KEGG organisms.
This is cool! I like it very much! I work on bacteria, some of which are not KEGG organisms and have barely any annotations (no GO and no KEGG). Still, I can use it for KEGG pathway enrichment analysis, and biologists like it. The only limitation is when I try to plot the KEGG pathway with pathview: I am unable to put the correct fold-change data on the map. I guess it is because we use K numbers, and multiple genes can have the same K number. Could you kindly suggest a solution for that?
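
One possible workaround, not something suggested in this thread, is to collapse the gene-level fold changes to a single value per K number before plotting, for example keeping the change with the largest magnitude (taking the mean is another choice). A base R sketch with hypothetical gene IDs and values:

```r
# Hypothetical DE results: several genes can share one K number
de <- data.frame(
  gene   = c("gene1", "gene2", "gene3", "gene4"),
  ko     = c("K00844", "K00844", "K00799", NA),
  log2FC = c(1.2, 3.0, -0.8, 2.5),
  stringsAsFactors = FALSE
)

# Drop genes without a K number
de <- de[!is.na(de$ko), ]

# For each K number, keep the log2 fold change with the largest absolute value
pick_max_abs <- function(x) x[which.max(abs(x))]
fc_by_ko <- tapply(de$log2FC, de$ko, pick_max_abs)
fc_by_ko <- setNames(as.vector(fc_by_ko), names(fc_by_ko))
print(fc_by_ko)
```

The result is a numeric vector named by K number. Whether a pathway viewer such as pathview accepts K-number-keyed input, and with which arguments, is an assumption to check in its documentation.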

@MichaelFokinNZ

BIG-BIG-GREAT THANK YOU!!!!

@ShenTTT

ShenTTT commented Apr 30, 2020

Hi @GuangchuangYu, I am working with a non-model organism. First I used KAAS to annotate the genome with K numbers, which gave me a list of genes vs. K numbers. To do KEGG pathway analysis, I need to translate the K numbers to ko numbers. In this case, should I use enrichKEGG or enricher?

If I use enricher, I need to translate all K numbers to pathways first, and eventually get a list of pathways2genes as the TERM2GENE, right?

If I use enrichKEGG, according to your reply I can set organism='ko'. How can this be achieved? There is no way for me to input the gene vs. K number list, right?

I appreciate it if you can clarify this.

Thank you so much

@Stepmata

Stepmata commented Jun 9, 2020

Hi! Did you solve your problem? I'm doing a KEGG enrichment analysis, also with a non-model organism. I used the enrichKEGG() function, but I get this error message:

ca_kegg <- enrichKEGG(ca_list, organism = 'ko', keyType = 'kegg', universe = BBRB_KEGG, pAdjustMethod = "BH")
--> No gene can be mapped....
--> Expected input gene ID: K00895,K01810,K21622,K16370,K15779,K01218
--> return NULL...

Here ca_list is my list of DE gene IDs, and BBRB_KEGG is a data frame of two columns with the gene IDs and the KEGG annotations I got from Trinotate.

How could I solve this problem, and what does "gene can be mapped" mean?
Thank you!
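
For reference, the message lists the kind of IDs enrichKEGG(organism = "ko") expects: plain K numbers such as K00895. A vector of gene IDs like ca_list therefore has to be translated to K numbers first, and the universe should likewise be a character vector of K numbers rather than a data frame. A base R sketch of that translation, assuming a two-column annotation table whose KEGG terms carry a "KO:" prefix (the column names and gene IDs are hypothetical):

```r
# Hypothetical annotation table: one KEGG term per gene, sometimes with a
# "KO:"/"ko:" prefix and sometimes missing
annot <- data.frame(
  gene = c("gene_1", "gene_2", "gene_3", "gene_4"),
  kegg = c("KO:K00895", "ko:K01810", NA, "KO:K01810"),
  stringsAsFactors = FALSE
)
de_genes <- c("gene_1", "gene_3", "gene_4")  # hypothetical DE gene IDs

# Strip the prefix so the terms take the plain "Kxxxxx" form
annot$kegg <- sub("^(KO|ko):", "", annot$kegg)

# Keep only well-formed K numbers (this also drops the NA row)
annot <- annot[!is.na(annot$kegg) & grepl("^K[0-9]{5}$", annot$kegg), ]

# DE genes translated to K numbers, and the background set of K numbers
de_k <- unique(annot$kegg[annot$gene %in% de_genes])
universe_k <- unique(annot$kegg)
print(de_k)
```

de_k and universe_k would then be passed as the gene and universe arguments of enrichKEGG(..., organism = "ko", keyType = "kegg"). Several genes can collapse onto one K number, so the test is effectively run on K numbers rather than on genes.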

@ShenTTT

ShenTTT commented Jun 10, 2020

Hi, I guess you used K numbers instead of ko numbers. I am not familiar with Trinotate, but can you check its output? There should be another column with ko numbers (koxxxxx). Use that number instead of Kxxxxx.

@Stepmata

Actually I'm using ko numbers (Ko:xxxx), but I removed the "KO:" prefix from the KEGG terms; that's why they look like that.

@ShenTTT

ShenTTT commented Jun 10, 2020

Actually, enrichKEGG with organism='ko' never worked in my case, so I switched to the enricher function (setting everything up manually).

enricher(gene_list,TERM2GENE=background,TERM2NAME=kegg2name_data,pvalueCutoff = 1,qvalueCutoff = 1, pAdjustMethod = "BH")

gene_list is my list of genes of interest. background is equivalent to your BBRB_KEGG but with ko numbers as the first column. kegg2name_data is a data frame with 2 columns mapping ko numbers to the corresponding descriptions (this can be skipped if you want the enriched ko numbers rather than the textual descriptions).
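
If the pathway ids come as one comma-separated field per gene (as some annotation tools report them), they first need to be split into one (pathway, gene) pair per row, with the pathway in the first column as described above. A base R sketch, assuming that input layout; the gene IDs are hypothetical:

```r
# Hypothetical annotation: one row per gene, pathway ids comma-separated;
# the same map can appear in both "ko" and "map" form
annot <- data.frame(
  gene     = c("g1", "g2", "g3"),
  pathways = c("ko00010,map00010,ko01100", "ko00010", ""),
  stringsAsFactors = FALSE
)

# Split into one (pathway, gene) pair per row; an empty field yields no rows
ids <- strsplit(annot$pathways, ",", fixed = TRUE)
term2gene <- data.frame(
  term = unlist(ids),
  gene = rep(annot$gene, lengths(ids)),
  stringsAsFactors = FALSE
)

# Keep only the "ko" form so each map is counted once per gene
term2gene <- unique(term2gene[grepl("^ko[0-9]{5}$", term2gene$term), ])
rownames(term2gene) <- NULL
print(term2gene)
```

The result has the layout used for TERM2GENE, e.g. enricher(gene_list, TERM2GENE = term2gene). Keeping the ko form is an arbitrary choice; what matters is using one prefix consistently so that a map is not counted twice.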

@Stepmata

Oh, I see. Is there a way to get the description for every ko number?

@ShenTTT

ShenTTT commented Jun 10, 2020

Just a reminder: I feel that you are still using K numbers instead of KEGG pathways. A KEGG KO entry (ko:Kxxxxx) is a functional ortholog group, e.g. one enzyme, that takes part in pathways. Normally you get one such KO per gene. Here we actually want the pathway id (koxxxxx, without ':', or mapxxxxx; the 'xxxxx' is the same in the ko and map forms). One KEGG KO can map to zero or multiple pathways, so you should get zero or multiple koxxxxx or mapxxxxx ids per gene. I used eggNOG for annotation, so I got both KO and pathway columns. Do check your annotation to see whether you have such pathway ids (koxxxxx or mapxxxxx); this is what you want.

I am not sure whether your 'Ko:xxxx' is a KEGG KO or a pathway. If you got multiple terms per gene, you can probably use them directly, since I assume those are already pathway ids. If you only got one such term per gene, it is more likely just the K number.
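
The same distinction can also be checked directly from the ID strings instead of counting terms per gene. A minimal base R sketch, assuming only the formats described above (Kxxxxx with an optional ko:/KO: prefix versus koxxxxx/mapxxxxx); the function name kegg_id_type is hypothetical:

```r
# Classify KEGG identifiers by their textual pattern:
#   Kxxxxx, ko:Kxxxxx or KO:Kxxxxx -> "K number" (KO entry)
#   koxxxxx or mapxxxxx            -> "pathway"
#   anything else                  -> "unrecognised"
kegg_id_type <- function(x) {
  ifelse(grepl("^((ko|KO):)?K[0-9]{5}$", x), "K number",
         ifelse(grepl("^(ko|map)[0-9]{5}$", x), "pathway", "unrecognised"))
}

ids <- c("K00799", "ko:K00799", "KO:K00799", "ko00982", "map00982", "00982")
print(data.frame(id = ids, type = kegg_id_type(ids)))
```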

Once you have the pathway ids, install the KEGG.db package; you can then get a list mapping all pathway numbers to names using KEGGPATHID2NAME. The pathway numbers are the xxxxx in your pathway ids (koxxxxx or mapxxxxx), NOT the KEGG KO ids (ko:Kxxxxx).
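
Since that lookup is keyed by the bare number, the ko/map prefix has to be stripped before matching names. A base R sketch of building a TERM2NAME-style table; the name lookup is filled by hand here, and that as.list(KEGG.db::KEGGPATHID2NAME) yields an equivalent number-to-name list is an assumption to verify:

```r
# Pathway ids to be named
path_ids <- c("ko00010", "map00480", "ko00982")

# Name lookup keyed by the bare five-digit number (filled by hand here)
path_names <- c("00010" = "Glycolysis / Gluconeogenesis",
                "00480" = "Glutathione metabolism",
                "00982" = "Drug metabolism - cytochrome P450")

# Strip the "ko"/"map" prefix to get the lookup key, then build the table
num <- sub("^(ko|map)", "", path_ids)
term2name <- data.frame(term = path_ids,
                        name = unname(path_names[num]),
                        stringsAsFactors = FALSE)
print(term2name)
```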

If you only got K numbers (Kxxxxx), map them to pathways using the method described earlier in this thread by @GuangchuangYu.

Hope this is helpful. It did take me a long time to figure all this out...

For more information on the different formats of KO and pathway ids, check:
https://www.genome.jp/kegg/ko.html
https://www.genome.jp/kegg/pathway.html

@Stepmata

I can imagine; this is a bit confusing. I'll check that. Thank you very much for all the info.

@Stepmata

Stepmata commented Jun 11, 2020

Thank you so much for taking the time to give me all this information; it was very helpful. My analysis is already done! I will share the information in case anyone else has the same problem! n_n

@ShenTTT

ShenTTT commented Jun 11, 2020

@Stepmata Glad to hear that :)

@edlopez78

Hi. I'm working with a non-model species and Trinotate. Could you please share your solution for setting up the KEGG output from Trinotate for use with enricher?

@Stepmata

Stepmata commented Jun 22, 2020

Hi! To use the enricher function with my KEGG annotation, I first got the pathway IDs (ko numbers) by mapping all my KEGG terms (K numbers) to the KEGG database with the bitr_kegg function. Once I had the pathway IDs, I got the pathway names with the ko2name function. These two functions are from the clusterProfiler R package.
Then, to run enricher, I made two data frames of two columns each: one, which I called "term2gene", with ko numbers in the first column and annotated gene IDs in the second; the other, which I called "term2name", with ko numbers in the first column and pathway names in the second.
To apply enricher you also have to create a vector with all your differentially expressed gene IDs, and that's all: you need all this information to run your KEGG enrichment test! n_n
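
The recipe above can be sketched in base R under stated assumptions: the K number to pathway table has the kegg and Path columns returned by bitr_kegg(), and the name table has the ko and name columns of ko2name(), as in the outputs earlier in this thread. The few rows are typed in by hand so the snippet runs offline, and the gene IDs are hypothetical:

```r
# Hypothetical gene -> K number annotation for a non-model organism
gene2k <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                     kegg = c("K00844", "K00799", "K00844", "K99999"),
                     stringsAsFactors = FALSE)

# K number -> pathway, same columns as bitr_kegg() output ("kegg", "Path")
k2path <- data.frame(kegg = c("K00844", "K00844", "K00799", "K00799"),
                     Path = c("ko00010", "ko00500", "ko00480", "ko00982"),
                     stringsAsFactors = FALSE)

# Pathway -> name, same columns as ko2name() output ("ko", "name")
term2name <- data.frame(ko = c("ko00010", "ko00500", "ko00480", "ko00982"),
                        name = c("Glycolysis / Gluconeogenesis",
                                 "Starch and sucrose metabolism",
                                 "Glutathione metabolism",
                                 "Drug metabolism - cytochrome P450"),
                        stringsAsFactors = FALSE)

# term2gene: pathway in column 1, gene in column 2; genes whose K number
# maps to no pathway drop out of the inner join
m <- merge(gene2k, k2path, by = "kegg")
term2gene <- unique(m[, c("Path", "gene")])

de_genes <- c("g1", "g2")  # hypothetical DE gene IDs
print(term2gene)
```

term2gene and term2name then go to enricher(de_genes, TERM2GENE = term2gene, TERM2NAME = term2name). Genes whose K number maps to no pathway (g4 here) are absent from term2gene and so also from the default background.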

@edlopez78

Hi!! My analysis is already done. Thank you so much for your help and your time!! 👍

@Stepmata

That's nice!! You're welcome!! n_n

@Esteban-Escobar

Hi, I did the ORA analysis on my organism's data with the K numbers and it worked, but it reported human disease pathways, and I'm working with Physcomitrella (moss). Is there any way to get species-specific IDs for the ORA analysis from the K numbers, or another way to obtain them? Thanks.

@tobytaogla

Hi, I have difficulty getting KEGG annotations into the Trinotate annotation file. What software did you use to generate the KEGG annotation? Can you help with this? Many thanks!

@Stepmata

Hi, I used Trinotate to generate all my KEGG annotations. What kind of problem do you have running Trinotate?

@tobytaogla

Thanks for the quick reply. Trinotate does not produce KEGG annotations by default, so I assume you generated the KEGG file yourself. What software did you run to get this file? Sorry for the silly question.

@Stepmata

Stepmata commented Apr 28, 2021 via email

9 participants