Ciao-Sin Chen
cscchen@umich.edu

### Compute the number of pathways of each gene

In [87]:
def get_map_gene_pathways(map_gene_pathway_filename): 
    """
    Compute the map of each gene and the set of pathways the gene is in.

    Parameters:
    map_gene_pathway_filename (str): The file name (.txt) of the mapping information.

    Returns:
    map_gene_pathways (dict): The map of each gene and the set of pathways the gene is in.

    """
    map_gene_pathways = {}
    with open(map_gene_pathway_filename, 'r') as map_gene_pathway_file:
        for gene_pathway in map_gene_pathway_file:
            # Strip the line ending instead of slicing off the last character,
            # so a final line without a trailing newline is not truncated.
            gene_pathway_pair = gene_pathway.rstrip("\r\n").split("\t")
            gene = gene_pathway_pair[0]
            pathway = gene_pathway_pair[1]
            if gene in map_gene_pathways:
                map_gene_pathways[gene].add(pathway)
            else:
                map_gene_pathways[gene] = {pathway}
    return map_gene_pathways

def get_pathway_count(gene_id, map_gene_pathways):
    """
    Compute the number of pathways the given gene is in.

    Parameters:
    gene_id (str): The KEGG ID of the gene.
    map_gene_pathways (dict): The map of each gene and the set of pathways the gene is in. Can be generated by the get_map_gene_pathways function.

    Returns:
    pathway_count (int): The number of pathways the given gene is in.

    """
    pathways = map_gene_pathways[gene_id]
    pathway_count = len(pathways)
    return pathway_count

def get_map_gene_pathway_counts(map_gene_pathways):
    """
    Compute the map of each gene and the number of pathways the gene is in.

    Parameters:
    map_gene_pathways (dict): The map of each gene and the set of pathways the gene is in.

    Returns:
    map_gene_pathway_counts (dict): The map of each gene and the number of pathways the gene is in.

    """
    map_gene_pathway_counts = {}
    for gene in map_gene_pathways:
        map_gene_pathway_counts[gene] = get_pathway_count(gene, map_gene_pathways)
    # Sort by pathway count, highest first; dict() keeps that order.
    map_gene_pathway_counts = dict(sorted(map_gene_pathway_counts.items(), key=lambda kv: kv[1], reverse=True))
    return map_gene_pathway_counts

In [88]:
map_gene_pathways = get_map_gene_pathways("map_gene_pathway.txt")
map_gene_pathway_counts = get_map_gene_pathway_counts(map_gene_pathways)

In [None]:
map_gene_pathway_counts

### Reflections
#### What was your biggest challenge in this project? (regarding writing code and not only)
My biggest challenge in this project was figuring out how to sort a dictionary by its values. The sort method is available only for lists, and the sorted function always returns a list. It took me a while to work out how to combine the items method, the sorted function, and the dict function to create a sorted dictionary.
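
As a minimal sketch of that approach, the snippet below sorts a small dictionary by its values. The gene names and counts are made-up placeholders for illustration, not values from the project data.

```python
# Hypothetical counts, for illustration only (not taken from the project data).
example_counts = {"gene_a": 2, "gene_b": 5, "gene_c": 1}

# items() yields (key, value) pairs; sorted() returns them as a list,
# ordered here by the value (kv[1]) from highest to lowest.
sorted_pairs = sorted(example_counts.items(), key=lambda kv: kv[1], reverse=True)

# dict() rebuilds a dictionary from the pairs. Dictionaries keep insertion
# order in Python 3.7 and later, so the sorted order is preserved.
sorted_counts = dict(sorted_pairs)
print(sorted_counts)
```

This relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.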

#### What did you learn while working on this project? (regarding writing code and not only)
While working on this project, I learned how to process file content line by line using a for loop, how to manipulate strings by slicing, how to create a dictionary and add items to it, and how to sort a dictionary by its values.

#### If you had more time on the project what other question(s) would you like to answer? (at least one question is required)
If I had more time on the project, I would look for the pathways that contain the most genes. These are likely to be overviews of multiple pathways, such as biosynthesis of amino acids (hsa01230). They can help visualize metabolic differences when a study has different groups or time points.
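
One possible starting point for that question, sketched under assumptions: invert the gene-to-pathways map so that each pathway maps to its set of genes, then count and sort. The function name `get_map_pathway_genes` and the toy gene and pathway names are hypothetical and are not part of the code above.

```python
from collections import defaultdict


def get_map_pathway_genes(map_gene_pathways):
    """Invert a gene -> set of pathways map into a pathway -> set of genes map.

    Hypothetical helper, shown only as a sketch.
    """
    map_pathway_genes = defaultdict(set)
    for gene, pathways in map_gene_pathways.items():
        for pathway in pathways:
            map_pathway_genes[pathway].add(gene)
    return dict(map_pathway_genes)


# Toy input with placeholder names; not real KEGG data.
toy_map_gene_pathways = {
    "gene_a": {"pathway_x", "pathway_y"},
    "gene_b": {"pathway_x"},
    "gene_c": {"pathway_x", "pathway_z"},
}

map_pathway_genes = get_map_pathway_genes(toy_map_gene_pathways)

# Count the genes in each pathway and sort from largest to smallest.
map_pathway_gene_counts = dict(
    sorted(
        ((pathway, len(genes)) for pathway, genes in map_pathway_genes.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )
)
print(map_pathway_gene_counts)
```

With the real data, the dictionary returned by `get_map_gene_pathways` could be passed in place of the toy input.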