# White Matter Anatomy Database (WMAD)

## Introduction
In this notebook we demonstrate how to interact with the White Matter Anatomy Database.  In this preliminary phase, we use a Jupyter notebook run in a [Binder](https://mybinder.org/) environment, likely launched from a [Jupyter Book](https://jupyterbook.org/intro.html) website.

## Database contents
After running this first cell, you should obtain a spreadsheet-based overview of the articles currently curated by the database.  Each of the relevant columns is searchable, so you should be able to locate an article based upon characteristics like title, first author, publication year, number of tracts discussed, and the curator.

Note:  running all of these cells at once will not return a desirable result, as this notebook will attempt to parse and visualize every description and figure entry in the database.  Use the subsequent two interactive spreadsheets to perform a search and thereby narrow your query.

In [1]:
import subprocess
import os

#get top directory path of the current git repository, under the presumption that 
#the notebook was launched from within the repo directory
gitRepoPath=subprocess.check_output(['git', 'rev-parse', '--show-toplevel']).decode('ascii').strip()

#move to the top-level directory of the repository
os.chdir(gitRepoPath)

import json
#load the WMADB JSON
with open(os.path.join('dbStore','WMAnatDB.json')) as json_data:
    WMAnatDB = json.load(json_data)
    
#build components of pandas spreadsheet
import pandas as pd
curators=[WMAnatDB[x]['curator']['1'] for x in WMAnatDB.keys()]
titles=[WMAnatDB[x]['title'][0] for x in WMAnatDB.keys()]
tractNums=[len(WMAnatDB[x]['tractDepictions']) for x in WMAnatDB.keys()]
years=[WMAnatDB[x]['published']['date-parts'][0][0] for x in WMAnatDB.keys()]
authors=[WMAnatDB[x]['author'][0]['given'] + ' ' + WMAnatDB[x]['author'][0]['family'] for x in WMAnatDB.keys()]

#make the spreadsheet
dataStructure={'title': titles, 'firstAuthor': authors, 'year': years, 'numOfTracts': tractNums, 'curator': curators}
overviewFrame=pd.DataFrame(data=dataStructure)

#use qgrid to make it nice and interactive
import qgrid
qgrid_widget= qgrid.show_grid(overviewFrame,show_toolbar=True)
qgrid_widget

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

## Performing an interactive search
Now that we have a sense of the articles included in the database, we can search it for discussions of particular tracts of interest.  In addition to the categories considered previously, we can also search on any of the following information categories:

- **species**: the species of interest in the study.
- **methods**: the investigative approach(es) used in the study.
- **tractTermsUsed**: the terminology used to refer to the structure of interest.

The [qgrid](https://github.com/quantopian/qgrid) interface allows you to narrow the structures under consideration to those meeting the criteria you specify.  These can include any combination of the aforementioned characteristics.  Be sure to click the appropriate boxes in the search fields in order to impose your search criteria.

Once you have completed your query, move on to the subsequent cells to view the relevant portions of text or the figures depicting the structure(s) of interest.

NOTE: should you wish to perform another query, return to the cell below, and select new criteria, then run the subsequent cells once more.
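
The same kind of narrowing can also be expressed directly in pandas, which may help when working outside the widget.  The following is a minimal sketch only: the rows of `demoDF` are invented for illustration (they are not database contents), and the helper `filterRecords` is hypothetical rather than part of this notebook or of qgrid.  Only the column names are taken from the search table.

```python
import pandas as pd

# hypothetical stand-in for the search table; every row is invented for illustration
demoDF = pd.DataFrame({
    'title': ['Example article A', 'Example article B', 'Example article C'],
    'species': ['human', 'macaque', 'human'],
    'methods': ['dissection tractography', 'tracer', 'tractography'],
    'tractTermsUsed': ['IFOF', 'ILF', 'inferior fronto-occipital fasciculus IFOF'],
    'published': [2013, 2015, 2019],
})

def filterRecords(df, species=None, methodContains=None, termContains=None):
    # hypothetical helper: start with every row selected,
    # then narrow by each criterion that was actually supplied
    mask = pd.Series(True, index=df.index)
    if species is not None:
        mask &= df['species'] == species
    if methodContains is not None:
        mask &= df['methods'].str.contains(methodContains, case=False, regex=False)
    if termContains is not None:
        mask &= df['tractTermsUsed'].str.contains(termContains, case=False, regex=False)
    return df[mask]

# combine two criteria: human studies that use the term "IFOF"
selected = filterRecords(demoDF, species='human', termContains='IFOF')
print(selected.index.tolist())
```

As with the widget, the row labels of the filtered result are what matter afterwards, since they identify which records to look up in the full table.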

In [2]:
def unpackEntry(dfEntry):
    #this function unpacks entries in the dataframe for clarity and conciseness
    if type(dfEntry)==dict:
        #for the columns we have specified, we do not care about the keys used, they are just placeholders
        dfEntry=list(dfEntry.values())
    if type(dfEntry)==list and len(dfEntry)>1:
        #if it's a list of multiple items, concat them
        dfEntry=' '.join(dfEntry)
    if type(dfEntry)==list and len(dfEntry)==1:
        #if it's a list of length 1, just unpack it
        dfEntry=dfEntry[0]
    #if it's just a string, we probably want it left the way it is
    return dfEntry

def unpackArticleJSONtoPdDF(articleJSONDict):
    #this function unpacks an article such that there are n rows (with duplicated information)
    #in the output dataframe, where n is the number of tracts in the tractDepictions record
    #unpack highest tier of json dictionary
    dfBaseRow=pd.json_normalize(articleJSONDict,max_level=0)
    #unpack the entries for each tract
    expandedDepictions=pd.json_normalize(dfBaseRow['tractDepictions'],max_level=0)
    #for each tract, expand to terms used, descriptions and figures
    expandedTractDF=[pd.json_normalize(iEntries, max_level=0) for iEntries in expandedDepictions.values.tolist()]
    
    #create a replicated row dataframe to merge these tract entries on to
    toMergeDF=pd.concat([dfBaseRow]*len(expandedDepictions.columns), ignore_index=True)
    #concat them together
    outDF=pd.concat([toMergeDF,expandedTractDF[0]],axis=1)
    #drop the tractDepictions entry as we no longer need this
    outDF=outDF.drop('tractDepictions',axis=1)
    
    #finally, convert the dictionaries in the relevant columns into lists
    #these are the columns we wish to convert
    columnsToConvert=['curator','doi','species','methods','published','title','container-title','tractTermsUsed']
    for iColumnsToConvert in columnsToConvert:   
        outDF[iColumnsToConvert]=outDF[iColumnsToConvert].map(lambda x: unpackEntry(x) ) 
    #convert descriptions and figures to lists
    outDF['descriptions']=outDF['descriptions'].map(lambda x: list(x.values()))
    outDF['figures']=outDF['figures'].map(lambda x: list(x.values()))
    return outDF

#extract the article dictionary JSONs to a list
articleListObjects=[WMAnatDB[x] for x in WMAnatDB.keys()]
#perform the tract expansion on them
unpackedArticleDFs=[unpackArticleJSONtoPdDF(iArticles) for iArticles in articleListObjects]
#merge them in to one dataframe
wholeDBDF=pd.concat(unpackedArticleDFs,axis=0,ignore_index=True)
#just index the year for date
wholeDBDF['published']=wholeDBDF['published'].map(lambda x: x[0][0] )
#import qgrid and use it to view the DB
import qgrid
#show the informative columns
qgrid_widget= qgrid.show_grid(wholeDBDF[['curator','title','species','methods','tractTermsUsed','published']],show_toolbar=True)
qgrid_widget

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

### Viewing text descriptions

Now that we have selected our records of interest, we can view the text descriptions or figure depictions associated with those entries.  To do so, run the subsequent cell.  Remember, if you wish to perform another query, return to the previous cell and select new criteria then rerun the subsequent cells.

NOTE: In some cases a text description may not have been associated with the structure recording, and as such a record may be empty (or an error might occur).
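
One way to keep a single empty or failing record from interrupting the whole table is to wrap the per-entry call in a guard.  The sketch below is an assumption about how this could be done, not the approach used by `pyWMAD`: `isMissing`, `safeApply`, and `fakeExtract` are hypothetical names, and `fakeExtract` merely stands in for a scraping call so that the example runs without network access.

```python
import pandas as pd

def isMissing(value):
    # treat None and scalar NaN/NA as missing; strings, lists and dicts never count as missing
    if isinstance(value, (str, list, dict)):
        return False
    return value is None or bool(pd.isna(value))

def safeApply(func, value, fallback='[no text retrieved]'):
    # hypothetical helper: return the fallback if the value is missing or func raises
    if isMissing(value):
        return fallback
    try:
        return func(value)
    except Exception as err:
        # record which kind of error occurred instead of stopping the whole map
        return f'{fallback} ({type(err).__name__})'

def fakeExtract(url):
    # hypothetical stand-in for a scraping call; fails on an empty link
    if not url:
        raise ValueError('empty link')
    return 'text for ' + url

# invented placeholder links: one usable, one empty, one missing
demoLinks = pd.Series(['https://example.org/a', '', None], dtype=object)
demoText = demoLinks.map(lambda x: safeApply(fakeExtract, x))
print(demoText.tolist())
```

With a guard of this kind, problematic entries show up as labelled placeholders in the output table rather than as an exception.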

In [3]:
#pull out the relevant columns and expand the description entries
descriptionTable=wholeDBDF.loc[qgrid_widget.get_changed_df().index,['title','descriptions']].explode('descriptions')
descriptionTable.dropna(axis=0, inplace=True)

from pyWMAD import scrape
#convert the regex based urls to text
descriptionTable['descriptions']=descriptionTable['descriptions'].map(lambda x: scrape.extractGoogleHighlightLinkText(x))
#set pandas display options so that full descriptions and more rows are shown
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', 500)
descriptionTable

Unnamed: 0,title,descriptions
19,"Anatomo-functional study of the temporo-parieto-occipital region: dissection, tractographic and brain mapping evidence from a neurosurgical perspective","Regex search failed to return match;\nPossible redirect issue, check host publisher's website"
23,"Anatomo-functional study of the temporo-parieto-occipital region: dissection, tractographic and brain mapping evidence from a neurosurgical perspective","Postero-superiorly, we identified the IFOF stem. In the middle part (middle trapezoid), we left in situ the EC (Fig. 4C)"
23,"Anatomo-functional study of the temporo-parieto-occipital region: dissection, tractographic and brain mapping evidence from a neurosurgical perspective","Regex search failed to return match;\nPossible redirect issue, check host publisher's website"
84,Tracing short connections of the temporo-parieto-occipital region in the human brain using diffusion spectrum imaging and fiber dissection,"On lateral view of the hemisphere to observe the endpoints, the MdLF and IFOF were closer to the midline than the TP (Fig. 4 D E)."
65,Elucidation of White Matter Tracts of the Human Amygdala by Detailed Comparison between High-Resolution Postmortem Magnetic Resonance Imaging and Histology,"Figure 2. Coronal panels for detailed anatomical delineation of the amygdala using postmortem diffusion tensor imaging (DTI) and histology (Luxol fast blue with hematoxylin–eosin staining), compared with an example of in vivo T1 and DTI images. From the postmortem DTI data, b0, trace and color-coded orientation maps are shown at 14 coronal slice levels. Panels (A,B) were from a different postmortem coronal slab than (C–N). Panel (A) corresponds to the rostral end of the amygdala that defines 0 mm to indicate the slice separation of the subsequent panels, and panel (N) shows the caudal end (13 mm). The red bounding boxes represent a 24 × 20 mm field of view at the corresponding locations on the in vivo and postmortem images with the MNI coordinates (s, sagittal; a, axial; c, coronal). Colored arrows indicate the locations of the reconstructed white matter structures shown in Figure 3. White arrows indicate the semiannular sulcus (amygdaloid fissure) defining the medial boundary of the amygdala. 
Abbreviations: A, amygdala; BL, basolateral amygdaloid nucleus; BN, basal nucleus diffuse part; BM, basomedial nucleus; Ce, central nucleus; CL, claustrum; CxA, amygdalocortical transition area; EGP, external globus pallidus; Epn, endopiriform nucleus; ERC, entorhinal cortex; GP, globus pallidus; H, hippocampus; HT, hypothalamus; IGP, internal globus pallidus; LA, lateral nucleus; Me, medial nucleus; PedL, peduncle of the lentiform nucleus; Pir, piriform cortex; PL, paralaminar nucleus; Put, putamen; S, subiculum; sCLA, superficial cortex-like amygdala; SN, substantia nigra; TCd, tail of caudate; UN, uncus; VP, ventral putamen; ac, anterior commissure; al, ansa lenticularis; alv, alveus; cp, cerebral peduncle; ifo, inferior fronto-occipital fasciculus; ilf, inferior longitudinal fasciculus; li, intermediate medullary lamina; ll, lateral medullary lamina; lm, medial medullary lamina; ot, optic tract; slic, sublenticular part of the internal capsule; sls, sublenticular stria; st, stria terminalis; tap, tapetum; unc, uncinated fasciculus."
65,Elucidation of White Matter Tracts of the Human Amygdala by Detailed Comparison between High-Resolution Postmortem Magnetic Resonance Imaging and Histology,"Throughout these areas, a limited number of white matter tracts could be clearly identified in the in vivo images, including the anterior commissure (ac), optic tract (ot) and the inferior fronto-occipital fasciculus (ifo)."
66,Elucidation of White Matter Tracts of the Human Amygdala by Detailed Comparison between High-Resolution Postmortem Magnetic Resonance Imaging and Histology,"Figure 2. Coronal panels for detailed anatomical delineation of the amygdala using postmortem diffusion tensor imaging (DTI) and histology (Luxol fast blue with hematoxylin–eosin staining), compared with an example of in vivo T1 and DTI images. From the postmortem DTI data, b0, trace and color-coded orientation maps are shown at 14 coronal slice levels. Panels (A,B) were from a different postmortem coronal slab than (C–N). Panel (A) corresponds to the rostral end of the amygdala that defines 0 mm to indicate the slice separation of the subsequent panels, and panel (N) shows the caudal end (13 mm). The red bounding boxes represent a 24 × 20 mm field of view at the corresponding locations on the in vivo and postmortem images with the MNI coordinates (s, sagittal; a, axial; c, coronal). Colored arrows indicate the locations of the reconstructed white matter structures shown in Figure 3. White arrows indicate the semiannular sulcus (amygdaloid fissure) defining the medial boundary of the amygdala. 
Abbreviations: A, amygdala; BL, basolateral amygdaloid nucleus; BN, basal nucleus diffuse part; BM, basomedial nucleus; Ce, central nucleus; CL, claustrum; CxA, amygdalocortical transition area; EGP, external globus pallidus; Epn, endopiriform nucleus; ERC, entorhinal cortex; GP, globus pallidus; H, hippocampus; HT, hypothalamus; IGP, internal globus pallidus; LA, lateral nucleus; Me, medial nucleus; PedL, peduncle of the lentiform nucleus; Pir, piriform cortex; PL, paralaminar nucleus; Put, putamen; S, subiculum; sCLA, superficial cortex-like amygdala; SN, substantia nigra; TCd, tail of caudate; UN, uncus; VP, ventral putamen; ac, anterior commissure; al, ansa lenticularis; alv, alveus; cp, cerebral peduncle; ifo, inferior fronto-occipital fasciculus; ilf, inferior longitudinal fasciculus; li, intermediate medullary lamina; ll, lateral medullary lamina; lm, medial medullary lamina; ot, optic tract; slic, sublenticular part of the internal capsule; sls, sublenticular stria; st, stria terminalis; tap, tapetum; unc, uncinated fasciculus."
66,Elucidation of White Matter Tracts of the Human Amygdala by Detailed Comparison between High-Resolution Postmortem Magnetic Resonance Imaging and Histology,"Throughout these areas, a limited number of white matter tracts could be clearly identified in the in vivo images, including the anterior commissure (ac), optic tract (ot) and the inferior fronto-occipital fasciculus (ifo)."
10,Associative white matter connecting the dorsal and ventral posterior human cortex,"These canonical tracts have been widely described in other works, with the SLF constrained to the superior-medial white matter of the frontal and parietal lobes, the ILF to the temporal and occipital, the Arc connecting temporal and frontal lobes, and the IFOF spanning the occipital, temporal, and frontal white matter (Fig. 3).Consistent with previous reports (de Schotten et al. 2011; Catani and de Schotten 2012), we show canonical tracts extending posteriorly into the parietal, temporal, and occipital lobes, and anteriorly either toward prefrontal cortex [SLF, Arc (Makris et al. 2004; Martino et al. 2013b); IFOF (Martino et al. 2010a, b)] or into the anterior temporal lobe (ILF; Davis 1921)."


### Viewing Figures

Similar to the previous cell, by running the following cell we can view records associated with **graphical depictions** of the structures of interest.  Because these are **images**, the interface differs slightly from that of the previous cell.  Use the slider to select which image you wish to view.

NOTE: In some cases a figure may not have been associated with the structure recording, and as such a record may be empty (or an error might occur).  Additionally, some publishers implement more stringent access criteria than others, which may prevent access with the methods in use here.
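
To make the slider's range less opaque, the sketch below shows how a column of per-record figure lists turns into one row per figure, and how records without figures drop out.  The data are invented placeholders (not database entries) and the `demo…` names are hypothetical; only `explode` and `dropna` mirror calls used in this notebook.

```python
import pandas as pd

# hypothetical records: each row holds a list of figure links (invented placeholders)
demoDF = pd.DataFrame({
    'title': ['Example article A', 'Example article B', 'Example article C'],
    'figures': [['figA1', 'figA2'], [], ['figC1']],
})

# explode gives one row per list element; an empty list becomes a single NaN row
demoFigureTable = demoDF.explode('figures')
# drop the rows that carry no figure
demoFigureTable = demoFigureTable.dropna(axis=0)

# largest valid positional index for .iloc, i.e. the maximum a slider could take
maxIndex = demoFigureTable.shape[0] - 1
print(demoFigureTable['figures'].tolist(), maxIndex)
```

Note that the exploded rows keep the label of the record they came from, so labels repeat; this is why a positional lookup such as `.iloc` is the natural fit for a slider value.  If no figures survive the filtering, `maxIndex` would be -1, a case worth checking before building a slider.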

In [5]:
#create the figure table
figureTable=wholeDBDF.loc[qgrid_widget.get_changed_df().index,['title','figures']].explode('figures')
figureTable.dropna(axis=0, inplace=True)
#import ipywidgets and make slider widget
from ipywidgets import IntSlider
figureSlider=IntSlider( value=0, min=0, max=figureTable.shape[0]-1,  step=1, description='Figure Index', continuous_update=False)
#define the widget function
def plotTableFig(figTableIndex):
    currentFigURL=figureTable['figures'].iloc[figTableIndex]
    from pyWMAD import scrape
    outFig=scrape.queryImage(currentFigURL)
    display(outFig)
#implement interactive widget
from ipywidgets import interact
interact(plotTableFig,figTableIndex=figureSlider)

interactive(children=(IntSlider(value=0, continuous_update=False, description='Figure Index', max=9), Output()…

<function __main__.plotTableFig(figTableIndex)>