<h1 align="center">[MCB-163L] Introduction to Allen Mouse Brain Atlas Tools</h1>
<h3 align="center">Estimated Duration: 30 mins</h3>

---
### Getting Help With Jupyter Notebooks
If you run into an error or have a question about this Jupyter notebook, you can receive in-person help from the **Data Peer Consultants**. Located in Moffitt Library (3rd Floor) almost every weekday during the semester, the Data Peer Consultants are available to help with programming, Jupyter, and/or data science questions. See their full drop-in schedule and specialities at [their website](https://data.berkeley.edu/discovery/consulting); if you can't make their drop-in times, you can also schedule an appointment by emailing [ds-peer-consulting@berkeley.edu](mailto:ds-peer-consulting@berkeley.edu).

To try to solve some errors yourself, you can also consult the [Student Debugging Guide](https://docs.google.com/document/d/1kxismvvgjf10tiAqYtAKHTocJTyyM0q3Ey0Lk2UQR0Q/edit?usp=sharing).

---

## Introduction

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #f0f0ff; ">
    In the first part of the lab, we used the Mouse Connectivity Atlas to analyze projection data within regions of the brain. In this part of the lab, we will instead use the Mouse Brain Atlas to explore the gene expression density within specific regions and across the entire structure of the brain.

</div>

## Pre-lab: Importing Data

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #f0f0ff; ">
    
Similar to the first part of the lab, we will be importing libraries for future use. The cell below installs and imports the graphing, data analysis, and mathematical libraries that we'll be using to analyze gene expression. However, unlike the previous lab, we will first take you through an example of gene data analysis with the subiculum and then allow you to repeat this lab with a brain region and gene of your choice. 

</div>

In [None]:
# Run this cell
!pip install -r requirements

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline 
import seaborn as sns


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #f0f0ff; ">
Note: This lab will be split into two rounds. The first round simply takes you through an analysis of the subiculum; in the second, you repeat the analysis with your own data. From here on, instructions are split into **1st round** and **2nd round**. 
    
    1st round: We will read the data that we imported into this lab folder with the read_csv() function, which takes in a .csv file and converts its contents into a table.
    
    2nd round: Replace the file name within the quotation marks in read_csv() with the name of the file you downloaded.
</div>
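
As a rough illustration of what read_csv() does, the sketch below parses a small made-up CSV string instead of a file on disk. The gene names `GeneA` and `GeneB` and all of the numbers are hypothetical placeholders, not lab data; only the column names `gene-symbol` and `fold-change` come from this lab.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk (hypothetical values, not lab data).
toy_csv = io.StringIO(
    "gene-symbol,fold-change\n"
    "GeneA,2.5\n"
    "GeneB,1.5\n"
    "GeneA,3.5\n"
)

# read_csv accepts a file path or a file-like object and returns a table (DataFrame).
toy_table = pd.read_csv(toy_csv)

print(toy_table.shape)          # (number of rows, number of columns)
print(list(toy_table.columns))  # column names taken from the first line of the CSV
```

The first line of the CSV becomes the column names, and every later line becomes one row of the table.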

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #f0f0ff; ">
    Run the next cell block to check if the data was successfully retrieved. You should see a table with a list of 20 experiments for the subiculum. From now on, we'll be using the variable maa to retrieve all data.

</div>

In [None]:
# Reads the CSV file into a table (DataFrame) stored in the variable maa.
maa = pd.read_csv("./data/convertcsv.csv")
maa

## Part 1: Gene Expression Within Brain Regions

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #f0f0ff; ">

In this section, we will begin to explore the relationship between the expression tendencies of specific genes within brain regions.

</div>

## Part 1.1: Selecting a gene

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #f0f0ff; ">

1st round: We'll be analyzing G protein-coupled receptor 161 (Gpr161) and its relevance in the subiculum. Since we don't need the other experiments for now, we'll keep only the experiments whose gene symbol is Gpr161 by checking the gene-symbol column of every row.

2nd round: Replace Gpr161 with the gene of your choice. You may also change the variable name gpr161, if you choose. Then run the cell block. One way to verify that the selection was successful is to count how many rows in the above table have the gene symbol of your choice. You should see the same number of rows below.

</div>
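
The row selection described above can be sketched on a small made-up table. The gene names and values here are hypothetical placeholders; the point is only to show how a comparison on the `gene-symbol` column produces a True/False mask, and how counting rows gives the verification mentioned in the 2nd round.

```python
import pandas as pd

# Toy table with hypothetical gene symbols and values (not lab data).
toy = pd.DataFrame({
    "gene-symbol": ["GeneA", "GeneB", "GeneA", "GeneC"],
    "fold-change": [2.5, 1.5, 3.5, 4.0],
})

# Comparing a column to a value gives one True/False entry per row.
mask = toy["gene-symbol"] == "GeneA"

# Indexing the table with that mask keeps only the rows marked True.
gene_a = toy[mask]

# Verification: count rows per gene symbol in the full table and compare.
counts = toy["gene-symbol"].value_counts()
print(len(gene_a), counts["GeneA"])
```

If the two printed numbers match, the selection kept every row for that gene and nothing else.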

In [None]:
#selecting rows that match the gene symbol Gpr161
gpr161 = maa[maa["gene-symbol"] == "Gpr161"]
gpr161

## Part 1.2 : Calculating gene expression

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #f0f0ff; ">
1st round: To gather some perspective on gene expression, we'll be finding a measure for how much more a gene is expressed in the subiculum than in other brain regions. This measure is called fold-change and is shown in the table above. To extract the fold-change for gpr161, we'll be creating a function called expression. This function takes in one parameter: 
       
       gene: the table of experiments (rows) for the gene currently being analyzed, such as gpr161 

**Expression** returns a numerical value that denotes how much more the gene is expressed in the subiculum relative to other brain regions by taking the average of all fold-change values for the gene.

2nd round: If you changed the variable name gpr161 earlier, replace the name here. Then run the cell block to get the average fold-change of your chosen gene in your brain region.

</div>
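
To make the averaging concrete, here is a minimal sketch on made-up numbers. The function name `mean_fold_change`, the gene name `GeneA`, and the values are hypothetical; the sketch only assumes that the table has a `fold-change` column, as in this lab.

```python
import pandas as pd

# Two hypothetical experiments for one gene (made-up values, not lab data).
toy_gene = pd.DataFrame({
    "gene-symbol": ["GeneA", "GeneA"],
    "fold-change": [2.0, 4.0],
})

def mean_fold_change(gene_rows):
    """Return the average of the fold-change column for the given rows."""
    return gene_rows["fold-change"].mean()

avg = mean_fold_change(toy_gene)
print(avg)  # (2.0 + 4.0) / 2 = 3.0
```

With more experiments for the same gene, the result is still a single number: the sum of the fold-change values divided by the number of rows.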

In [None]:
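def expression(gene):
    # gene: table of experiments (rows) for a single gene symbol
    # Returns the average of that gene's fold-change values.
    average_fold_change = gene["fold-change"].mean()
    return average_fold_change

expression(gpr161)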

## Part 1.3: Graphing gene expression

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #f0f0ff; ">
    
1st round: To ensure you are on the right track, the cell below draws a bar graph of the average fold-change of each gene in the table, indicating which genes are most expressed in the subiculum. Simply run the cell block.

2nd round: The code below reads from the variable maa, so make sure you loaded your own file with read_csv() earlier. Then, run the code below to construct a bar graph showing data relevant to your brain region.

</div>
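
For comparison, the per-gene averages that the bar graph displays could also be computed in one step with a pandas groupby. This is only a sketch on a made-up table (hypothetical gene names and values), not the method used in the cell below; it also sorts the averages so the most expressed genes come first.

```python
import pandas as pd

# Toy table with hypothetical gene symbols and values (not lab data).
toy = pd.DataFrame({
    "gene-symbol": ["GeneA", "GeneB", "GeneA", "GeneC"],
    "fold-change": [2.0, 1.5, 4.0, 5.0],
})

# Group rows by gene symbol, average each group's fold-change,
# then sort so the largest average comes first.
mean_by_gene = (
    toy.groupby("gene-symbol")["fold-change"]
    .mean()
    .sort_values(ascending=False)
)

# Keep only the two genes with the highest average fold-change.
top_two = mean_by_gene.head(2)
print(top_two)
```

Sorting first makes it easy to keep only the top few genes when a table contains many gene symbols.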

In [None]:
def bar_graph():
    listOfGenes = []  # gathering all unique gene symbols
    for g in maa["gene-symbol"]:
        if g not in listOfGenes:
            listOfGenes.append(g)

    expressions = []  # average fold-change for each unique gene symbol
    for gene in listOfGenes:
        geneData = maa[maa["gene-symbol"] == gene]
        expressions.append(expression(geneData))

    plt.bar(listOfGenes, expressions)  # plotting
    plt.xlabel('Gene Symbol', fontsize=12)
    plt.ylabel('Measure of gene expression', fontsize=12)
    plt.xticks(rotation=90)
    plt.tight_layout()  # called last so the rotated labels fit in the figure

bar_graph()

## Part 2: Free Exploration

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #f0f0ff; ">
As described earlier, you'll now get the chance to run through this lab with a brain region and gene of your choosing. In order to start, we have to find and import the specified data. Below are instructions to get you started.
    
          1. Go to the Allen Brain Atlas website and open the Mouse Brain Atlas.
          2. On the left bar about midway down the page, click on differential search. 
          3. Enter your brain region into target structures, and click search. 
          4. Click on XML at the bottom of the page. It should say "this data is available in XML." Save that file in the same folder as your lab. You should be able to open the home page and see both the file and the lab. 
          5. Go to http://convertcsv.com/xml-to-csv.htm to convert your XML file to a CSV file. You are free to choose whatever file name you want, so long as it differs from the one already used in our example (convertcsv.csv). Make sure to save this file in the same folder as your lab. 
          6. You're good to go! Run through this lab again, this time reading instructions for 2nd round. 
          7. After you're done with the 2nd round, please print out the bar graph and paste it in your notebook. You can save this lab immediately to Google Drive or save this as a pdf and print the last page of the notebook. 
          
              
</div> 
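
As an optional alternative to the website in step 5, the XML-to-CSV conversion could in principle be done in Python. The sketch below is an assumption-heavy illustration: the XML string, its `<Response>`, `<objects>`, and `<object>` tags, and the gene names are all made up, and the real downloaded file may use different tag names or nesting, so inspect your file before adapting it.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Made-up XML (hypothetical layout); the real download may be structured differently.
toy_xml = """
<Response>
  <objects>
    <object>
      <gene-symbol>GeneA</gene-symbol>
      <fold-change>2.5</fold-change>
    </object>
    <object>
      <gene-symbol>GeneB</gene-symbol>
      <fold-change>1.5</fold-change>
    </object>
  </objects>
</Response>
""".strip()

root = ET.fromstring(toy_xml)

# Build one dictionary per <object>, mapping each child tag to its text.
rows = [
    {child.tag: child.text for child in record}
    for record in root.iter("object")
]

table = pd.DataFrame(rows)
# XML text is read as strings, so convert the numeric column explicitly.
table["fold-change"] = table["fold-change"].astype(float)

# Produce CSV text; passing a file name to to_csv would write a file instead.
csv_text = table.to_csv(index=False)
print(csv_text)
```

To save a file rather than print text, a call such as `table.to_csv("my_region.csv", index=False)` would write it to the lab folder, where `my_region.csv` is a hypothetical file name of your choosing.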

## References 

* Pandas.read_csv - Pandas 0.24.2 Documentation, pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html.
    
* Allensdk.api.queries.mouse_atlas_api Module - Allen SDK Dev Documentation, allensdk.readthedocs.io/en/latest/allensdk.api.queries.mouse_atlas_api.html.
    
* “Pandas Tutorial 1: Pandas Basics (read_csv, DataFrame, Data Selection, Etc.).” Data36, 16 Jan. 2019, data36.com/pandas-tutorial-1-basics-reading-data-files-dataframes-data-selection/.

**Notebook developed by: Kayli Jiang**