# Creation of links to available images by MGI identifiers

## Requirements
### The pandas Python module
### A CSV file of genes of interest with MGI identifiers (of the form MGI:######, i.e. "MGI:" followed by digits). This can be generated by entering your genes into the batch query tool on the MGI website (https://www.informatics.jax.org/batch).
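
Before building links, it can help to confirm that every value really looks like an MGI identifier. The sketch below is one way to do that with a regular expression. The helper names `is_mgi_id` and `find_invalid_ids` are hypothetical, and the pattern assumes an identifier is simply "MGI:" followed by one or more digits (the IDs in this notebook have between five and seven).

```python
import re

# Assumption: an MGI identifier is the prefix "MGI:" followed by one or more digits.
MGI_ID_PATTERN = re.compile(r"MGI:\d+")


def is_mgi_id(value):
    """Return True if value looks like an MGI identifier such as 'MGI:95388'."""
    return isinstance(value, str) and MGI_ID_PATTERN.fullmatch(value.strip()) is not None


def find_invalid_ids(values):
    """Return the entries that do not look like MGI identifiers."""
    return [v for v in values if not is_mgi_id(v)]


# Two real IDs from the table in this notebook, plus two deliberately bad entries.
example_ids = ["MGI:1338009", "MGI:95388", "Irs4", None]
print(find_invalid_ids(example_ids))
```

Once the CSV is loaded, the same check could be applied to the list of values from the 'MGI Gene/Marker ID' column.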

In [1]:
import pandas as pd # Pandas required for dataframe handling

In [2]:
gene_list = pd.read_csv('') # Path to the CSV file containing the genes for link creation; it must include the MGI identifier column

In [3]:
gene_list.head(5) # Tests that file loaded correctly by printing the first five entries.

Unnamed: 0,Input,Input Type,MGI Gene/Marker ID,Symbol,Name,Feature Type
0,IRS4,current symbol,MGI:1338009,Irs4,insulin receptor substrate 4,protein coding gene
1,ERAS,current symbol,MGI:2665023,Eras,ES cell-expressed Ras,protein coding gene
2,PCSK1N,current symbol,MGI:1353431,Pcsk1n,proprotein convertase subtilisin/kexin type 1 ...,protein coding gene
3,FOXD1,current symbol,MGI:1347463,Foxd1,forkhead box D1,protein coding gene
4,EMX2,current symbol,MGI:95388,Emx2,empty spiracles homeobox 2,protein coding gene


### The following two cells generate links to the collection of images for a given gene in the GXD. MGI GXD links take the form [http://www.informatics.jax.org/gxd/marker/] + [MGI identifier] + [specified conditions]

#### You can get the condition appendix for the link by manually searching for a gene on the GXD database and setting the desired flags from the gray buttons under 'Filter expression by:' 

#### Make sure you are on the images tab, then copy the part of the link after the MGI identifier (the part that begins with '?tab=images...') and use it to replace the text following link_template_end = ' ' in cell [5] below. The default included in this notebook shows wild-type images from Theiler stages 12-14.
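
If you would rather not trim the copied address by hand, the step described above can be sketched in code. The function name `extract_link_appendix` is hypothetical, and the example URL is a shortened, illustrative version of the default appendix used in this notebook, not a full address copied from the site.

```python
def extract_link_appendix(copied_url):
    """Return everything from the first '?' onward in a copied GXD URL."""
    _base, separator, rest = copied_url.partition("?")
    if not separator:
        raise ValueError("No '?' found; copy the URL while on the images tab.")
    return separator + rest


# Shortened, illustrative URL (assumption: a real copied address has the same shape).
copied_url = (
    "http://www.informatics.jax.org/gxd/marker/MGI:102764"
    "?tab=imagestab#gxd=markerMgiId%3DMGI%3A102764%26theilerStageFilter%3D12"
)

link_appendix = extract_link_appendix(copied_url)
print(link_appendix)
```

The returned string is what would be pasted as the value of link_template_end.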

In [None]:
mgi_ids = gene_list['MGI Gene/Marker ID'].to_list() # Extracts the MGI IDs from your CSV as a list to iterate over.

# To check that the MGI IDs are being extracted correctly, uncomment the line below (remove the pound sign at the beginning).
#mgi_ids[0:2]

In [5]:
link_template_begin = 'http://www.informatics.jax.org/gxd/marker/' # This is the generic beginning of any GXD link

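# Paste your conditional link appendix below after link_template_end = (make sure it is wrapped in double [""] or single [''] quote marks)
# Note: the default below was copied from the page for MGI:102764, so that identifier appears inside the appendix itself.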
link_template_end = '?tab=imagestab#gxd=markerMgiId%3DMGI%3A102764%26theilerStage%3D%26assayType%3D%26results%3D25%26startIndex%3D0%26sort%3D%26dir%3Dasc%26tab%3Dimagestab%26theilerStageFilter%3D12%26theilerStageFilter%3D13%26theilerStageFilter%3D14%26wildtypeFilter%3Dwild%20type'


# Build one link per MGI ID: beginning + ID + conditional appendix
links = [link_template_begin + mgi_id + link_template_end for mgi_id in mgi_ids]

#links[0:2] # Uncomment this line to display the first two links of your set and check that they have been generated correctly. NB: There may be no images matching your criteria for those genes.

['http://www.informatics.jax.org/gxd/marker/MGI:1338009?tab=imagestab#gxd=markerMgiId%3DMGI%3A102764%26theilerStage%3D%26assayType%3D%26results%3D25%26startIndex%3D0%26sort%3D%26dir%3Dasc%26tab%3Dimagestab%26theilerStageFilter%3D12%26theilerStageFilter%3D13%26theilerStageFilter%3D14%26wildtypeFilter%3Dwild%20type',
 'http://www.informatics.jax.org/gxd/marker/MGI:2665023?tab=imagestab#gxd=markerMgiId%3DMGI%3A102764%26theilerStage%3D%26assayType%3D%26results%3D25%26startIndex%3D0%26sort%3D%26dir%3Dasc%26tab%3Dimagestab%26theilerStageFilter%3D12%26theilerStageFilter%3D13%26theilerStageFilter%3D14%26wildtypeFilter%3Dwild%20type']
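
In the output above, the path of each link carries the gene's own identifier, but the fragment after '#gxd=' still contains the identifier the appendix was copied from (MGI%3A102764). Whether GXD reads the identifier from the path or from the fragment is not verified here. If the filtered view turns out to show the wrong gene, a minimal sketch like the following swaps the embedded identifier for each gene. The function `build_gxd_link` and the constant names are hypothetical.

```python
from urllib.parse import quote

LINK_TEMPLATE_BEGIN = "http://www.informatics.jax.org/gxd/marker/"

# The notebook's default appendix, split across lines for readability only.
LINK_TEMPLATE_END = (
    "?tab=imagestab#gxd=markerMgiId%3DMGI%3A102764"
    "%26theilerStage%3D%26assayType%3D%26results%3D25%26startIndex%3D0"
    "%26sort%3D%26dir%3Dasc%26tab%3Dimagestab"
    "%26theilerStageFilter%3D12%26theilerStageFilter%3D13%26theilerStageFilter%3D14"
    "%26wildtypeFilter%3Dwild%20type"
)

# The identifier of the gene page the appendix was copied from.
TEMPLATE_ID = "MGI:102764"


def build_gxd_link(mgi_id, template_end=LINK_TEMPLATE_END, template_id=TEMPLATE_ID):
    """Build a GXD link, replacing the ID embedded in the appendix with mgi_id."""
    encoded_old = quote(template_id, safe="")  # percent-encodes ':' as %3A
    encoded_new = quote(mgi_id, safe="")
    return LINK_TEMPLATE_BEGIN + mgi_id + template_end.replace(encoded_old, encoded_new)


example_link = build_gxd_link("MGI:1338009")
print(example_link)
```

With this helper, the list of links could be produced by calling build_gxd_link on each entry of mgi_ids instead of concatenating the three parts directly.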

In [7]:
gene_list['links'] = links # This adds the links as a new column to the pandas DataFrame loaded from your original CSV.

In [8]:
gene_list.head(5) #This displays the first 5 genes of your dataset so you can check that links have been properly appended.

Unnamed: 0,Input,Input Type,MGI Gene/Marker ID,Symbol,Name,Feature Type,links
0,IRS4,current symbol,MGI:1338009,Irs4,insulin receptor substrate 4,protein coding gene,http://www.informatics.jax.org/gxd/marker/MGI:...
1,ERAS,current symbol,MGI:2665023,Eras,ES cell-expressed Ras,protein coding gene,http://www.informatics.jax.org/gxd/marker/MGI:...
2,PCSK1N,current symbol,MGI:1353431,Pcsk1n,proprotein convertase subtilisin/kexin type 1 ...,protein coding gene,http://www.informatics.jax.org/gxd/marker/MGI:...
3,FOXD1,current symbol,MGI:1347463,Foxd1,forkhead box D1,protein coding gene,http://www.informatics.jax.org/gxd/marker/MGI:...
4,EMX2,current symbol,MGI:95388,Emx2,empty spiracles homeobox 2,protein coding gene,http://www.informatics.jax.org/gxd/marker/MGI:...


In [9]:
gene_list.to_csv('') # This saves the modified table. Add the file path where you want it saved; pass index=False as well if you do not want the row index written as an extra column.