# Why are we talking about JPEGs

## But... why JPEGs?

Although the title of this collection of interactive lessons is WiMSE (White Matter Segmentation Education), and presumably you are here to learn about neuroscience, our first several lessons will have us exploring JPEGs rather than brain images.  **Why is that**?

### Building intuitions from the familiar

Although our ultimate goal will be to develop an intuitive understanding of how to conduct digital investigations of white matter anatomy, it will be helpful to begin with an example of a data structure that everyone (at least, denizens of our modern digital era) is familiar with. This is where JPEGs come in. [JPEG (Joint Photographic Experts Group)](https://jpeg.org/jpeg/) is a standardized two-dimensional (2D) digital image format that we typically encounter dozens of times a day as we browse the internet. There are other image formats (PNG, for example), but JPEG is perhaps the most widely recognized. As such, we will begin exploring concepts related to structured data representation (and in particular, *image* representation) using JPEGs. We will use JPEGs to establish a number of analogies to the primary image type used in neuroimaging research, the [NIfTI (Neuroimaging Informatics Technology Initiative)](https://nifti.nimh.nih.gov/) format. Research has shown that analogies and metaphors are particularly helpful when learning new concepts (Duit, 1991). Indeed, in many cases the comparisons we make will be more akin to logical extensions than metaphors. Regardless, it is **not** expected that individuals using this lesson set are particularly familiar with the NIfTI data format, and so several non-technical features of this data type will be discussed to help build this understanding.

### A reassurance to the initiated

For more experienced users, though, our upcoming discussion of the NIfTI format may seem out of place. NIfTI data objects aren't used to store connectivity data, which is the essence of white matter (and the key feature that data formats for white matter attempt to capture). While this is true, it is necessary to discuss NIfTI data structures for two important reasons: (1) the difficulties in segmenting white matter structures (in connectivity-centric data formats) are well highlighted by comparison to the process of segmenting volumetric structures (in NIfTI/volumetric-type formats), and (2) a good white matter segmentation can make extensive use of information stored in or derived from NIfTI files (for example, [FreeSurfer](https://surfer.nmr.mgh.harvard.edu/) parcellations). Thus, our consideration of white matter segmentation will begin with digital images, proceed to NIfTI-type images (in this case T1 images), and then move on to the topics of tractography and segmentation.

## The big picture

Below you'll find a table that provides an outline of the similarities / analogies between the various data formats that will be discussed. In the coming lessons we'll begin to make the connections between the analogous characteristics of these data formats explicit. At various points we will return to this table and reflect on how the features being discussed relate to features that have been or will be discussed.

|   | **Digital Photography** | **Structural Brain Imaging (T1)** | **Diffusion Imaging (DWI)** | **Tractography** |
| --- | --- | --- | --- | --- |
| _Data Token_ | digital photo image | structural brain image (T1)| diffusion image (DWI) | tractogram |
| _Object represented_ | visual scene | cranium / brain | cranium / brain | white matter of brain |
| _Source system_ | camera | MRI scanner | MRI scanner | Mathematical model  | 
| _Source phenomena_ | reflected light | water / magnetic properties | water movement | orientation interpolation |
| _Property of interest_ | topography | volumetric occupancy | tissue structure | putative axon collection traversal |
| _File extension_ | .jpg, .png ... | .nii, .nii.gz | (dwi) .nii, .nii.gz | .fg, .trk, .tck |
| _Metadata_ | exif | header | header | varies by format |
| _Data size_ | 100s of KB - 1s of MB | ~2.5 - 5 MB | 50 MB - 1.5 GB | 500 MB - 10 GB |
| _Data dimensionality_ | &quot;2D&quot; (3 RGB layers) | 3D | 4D | 1D nested? |
| _Data &quot;atoms&quot;_ | pixels | voxels | voxel-angle | vectors (streamlines) |
| _Data &quot;atom&quot; content_ | integer | float | float | ordered float sequence (nodes) |
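
To make the _Data dimensionality_ and _Data "atoms"_ rows a little more concrete, here is a minimal sketch in plain Python. Everything in it is a toy stand-in invented for illustration: the sizes, the values, and the `shape` helper are assumptions, and real images would normally be loaded with an imaging library rather than built by hand.

```python
def shape(nested):
    """Return the dimensions of a regularly nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Toy "photo": 2 rows x 3 columns of pixels, each pixel holding 3 integers (R, G, B).
photo = [[[255, 0, 0] for _ in range(3)] for _ in range(2)]

# Toy "T1": a 2 x 3 x 4 grid of voxels, each holding a single float.
t1 = [[[0.5 for _ in range(4)] for _ in range(3)] for _ in range(2)]

# Toy "DWI": the same voxel grid, with 5 float measurements per voxel
# (one per hypothetical diffusion direction), giving a fourth dimension.
dwi = [[[[0.5 for _ in range(5)] for _ in range(4)] for _ in range(3)] for _ in range(2)]

# Toy "tractogram": a list of streamlines. Each streamline is an ordered
# sequence of (x, y, z) nodes, and streamlines need not have the same length.
tractogram = [
    [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 1.0, 0.5)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 1.0)],
]

print(shape(photo))                   # (2, 3, 3): "2D" plus 3 color layers
print(shape(t1))                      # (2, 3, 4): 3D
print(shape(dwi))                     # (2, 3, 4, 5): 4D
print([len(s) for s in tractogram])   # [3, 2]: a nested, ragged structure
```

Note that the tractogram does not fit a rectangular grid at all, which is one way to read the "1D nested?" entry in the table.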

## At the heart of it all

Overall, what this guide is designed to do is to take an individual who is presumed to have little familiarity with white matter or white matter segmentations and bring them to a point where they have mastered the "meta-ontology" associated with the practice of white matter segmentation. In essence, "mastery of the meta-ontology" is the state an expert in any given field has achieved when they possess a deep familiarity with the various approaches particular to their field _and_ the systematically structured relations between those approaches. Achieving this is obviously no easy feat. The strategy of this lesson set is to reinforce users' intuitions regarding ontological systems they are already implicitly experts in, and use those as a bridge to understanding ontological systems that are initially utterly mysterious (i.e. digital white matter segmentation). **But what in the heck do "ontology" and "meta-ontology" mean?**

### What's a meta-ontology?  For that matter, what's an ontology?

For our purposes here, we are considering the notion of an ontology in the sense used by [information science](https://en.wikipedia.org/wiki/Ontology_(information_science)). Overall, our main curiosity (for the sake of this lesson set's narrative) is *how ought we go about systematically assigning particular entities we encounter to (presumably) meaningful and useful categories*. Quite simply, the answer to this is the provision or specification of an *ontology*.

For us, in the following lesson sets, the "provision or specification of an ontology" will entail the (either explicit or implicit) [operationalization](https://en.wikipedia.org/wiki/Operationalization) of certain components that are common to all ontologies in a fashion that is specific to the current ontology. Those components are outlined in the following table ([adapted from Wikipedia](https://en.wikipedia.org/wiki/Ontology_components)):

| **Ontology component** | **Rough definition** |
| --- | --- |
| "Individuals" | Those entities which "exist" and are submitted for assignment to "Classes" |
| "Classes" | Meaningful/non-arbitrary groupings of "Individuals" (e.g. labels, categories, etc.) |
| "Attributes" | Properties or characteristics that "Individuals" or "Classes" can have |
| "Relations" | Those manners or dimensions with which "Individuals" and "Classes" can be compared or related |

This is, admittedly, a very abstract overview of how we will be using ontologies. Interestingly though, the analogy table under "The big picture" _already provides_ specifications for several of these characteristics. As it turns out, by decomposing the salient characteristics of the imaging modalities we'll be considering, we appear to have "carved nature at its joints" (to borrow from Plato) and thereby aligned those modalities along their ontologically salient dimensions.

| **Ontology component** | **Rough definition** | **Modality characteristic** |
| --- | --- | --- |
| "Individuals" | Those entities which "exist" and are submitted for assignment to "Classes" | _Data &quot;atoms&quot;_ |
| "Classes" | Meaningful/non-arbitrary groupings of "Individuals" (e.g. labels, categories, etc.) | [depends on intent/usage] |
| "Attributes" | Properties or characteristics that "Individuals" or "Classes" can have | _Data &quot;atom&quot; content_ |
| "Relations" | Those manners or dimensions with which "Individuals" and "Classes" can be compared or related | Mathematical relations for "Individuals" (due to "Attributes"); Class relations depend on Class systemization |

Thus, any time we apply a segmentation (to one of the data modalities or methods), what we'll be doing is systematically providing rules based on the "Attributes" (i.e. quantitative characteristics) of the various "Individuals" (i.e. _Data &quot;atoms&quot;_) of a _Data Token_ in order to divide up those _Data &quot;atoms&quot;_ into "Classes" of interest.
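
As a rough illustration of this idea, the sketch below treats a short list of numbers as a toy _Data Token_ and assigns each "Individual" to a "Class" using a rule on its "Attribute". The intensity values, the thresholds, and the class names are all made up for the example, and real segmentation rules are considerably more involved.

```python
# Toy "Data Token": six "Individuals" (data atoms), each with a single
# "Attribute" (a hypothetical intensity between 0 and 1).
intensities = [0.05, 0.40, 0.90, 0.45, 0.10, 0.85]


def classify(intensity):
    """Assign one atom to a "Class" using illustrative, made-up thresholds."""
    if intensity < 0.2:
        return "background"
    if intensity < 0.6:
        return "gray matter"
    return "white matter"


# Applying the rule to every atom yields a segmentation: one label per atom.
labels = [classify(value) for value in intensities]

for value, label in zip(intensities, labels):
    print(f"{value:.2f} -> {label}")
```

The rule itself is the operationalized part of the ontology: change the thresholds or the class names and the same atoms are divided up differently.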

### A second insight from the analogy framework: the role of representation

The decomposition in the earlier analogy table also offers another interesting insight. Each data modality we are considering is a way of systematically _representing_ something in the world. Because the relations among the various aspects of the data modalities are preserved by the structure of the analogy posed, it is possible to provide a general account of how each of the relevant characteristics from the analogy table helps to engender a representation relation with the associated data object.

Explicitly, each _Data Token_ represents a "real thing in the world" (i.e. _Object represented_ ). That _Data Token_ is generated by the _Source system_ , which systematically measures the _Source phenomena_ , one of several "Attributes" possessed by the _Object represented_ . Furthermore, the _Source phenomena_ (and implicitly, the _Source system_ ) are chosen from among the potential alternatives because the _Data Token_ generated preserves information related to the _Property of interest_ that can be used to ascribe the desired "Classes" to the "Individuals" of the _Data Token_ . Because the _Data Token_ "preserves the relevant information" (and in doing so, instantiates the representation relation with the "real thing in the world"), the "Classes" ascribed to the _Data Token_'s "Individuals" can be mapped back onto the _Object represented_. Indeed, though provided as an account of representation for the modalities in the analogy table, it's possible that this could serve as an account of representation more generally.

The above paragraph also just so happens to highlight why we care about the various image modalities covered by the analogy table.

##  Why we care about image representations generally... and specifically

Inherent in the account of representation above is the fact that we can computationally leverage digital images (in any of their various forms) to label their individual components in accordance with whatever goals we may have. Indeed, this is precisely what our brains do in order to help us interact with the world: they carve the world into categories (and instances of those categories) so that we can make use of that information to guide our behavior. However, digital images and the algorithms we apply to them increasingly (as our computational and algorithmic technologies advance) have two distinct advantages: their formal reliability and their processing speed. So long as we can formulate a sensible set of formal rules to apply to the elements of digital images, we can apply them, with great speed and reliability, to obtain far more nuanced categorizations than we are able to when using our natural sensory capabilities. But therein lies the rub: "So long as we can formulate a sensible set of formal rules". This is at the heart of our difficulties with image labeling algorithms generally, and with our specific goal of performing white matter segmentations.

Because we are using computational approaches, our ability to pose formal rules is fairly decent. Indeed, there are those who have formalized computational languages specific to white matter segmentation (e.g. WMQL) to extend the ability to pose formal white matter segmentation rules as robustly as possible. However, it has yet to be proven that any one of these approaches is sufficiently versatile to delineate any specific white matter structure we could possibly be interested in (perhaps because we don't actually know what is entailed by "any white matter structure we could be interested in"). More to the point though, such endeavors are merely the provision of a [syntax](https://en.wikipedia.org/wiki/Syntax) for describing white matter, and syntax (a formal system of rules for making well-formed expressions) alone isn't enough if our goal is to systematically link our _Data Tokens_ to "objects in the world". What they lack is a [semantics](https://en.wikipedia.org/wiki/Semantics) for describing white matter: a systematic method for mapping meaning (and thus by necessity, representation) to the various components of our data of interest. It isn't at all immediately clear how one could go about this, but this lesson set, as a whole, is offered up as a rough attempt at it.

## A rough attempt at a semantics for digital white matter

Traditionally, one of the greatest challenges for naturalistic accounts of the representation of mental content was finding a compelling account of the link between mental content and its putative associations in the world. One of the more prevalent theories offered was that of [teleosemantics](https://plato.stanford.edu/entries/content-teleological/), which endeavored to explain mental representation in terms of inferred functions or purposes. This paradigm is subject to various forms of rebuttal due to the limitations of our ability to infer the "true" function of a given entity possessing representational properties. However, this isn't the case for digital models of white matter. Because we have developed the imaging devices and the algorithms that transduce the measurements into data objects and derivative models, we don't need to infer.

Instead, what we need is a robust account of how we can systematically modify and transform the representational elements available to us in such a way as to carve digital models of white matter into what we take to be the meaningful subcomponents of the white matter. This means identifying, tracing, and describing the elementary methods used to compose or describe a given white matter structure (in a tractography model). To do this we will need to consider all of the information available to us in our various data objects, and what steps we can take to select a specific and coherent structure of interest. In essence, while we typically think of the establishment of a representation relationship as the forging of the relevant kind of association between a specific thing in the world and some data object, what we're actually doing here is applying an (ideally finite) set of rules to a data object to exclude everything _except_ the specific class (or token) we are interested in. In providing a semantics for this process we will be providing a comprehensive (to the extent that we are able) account of tactics that can be used to eliminate data/representation elements that are inconsistent with the properties of the representational class that corresponds to the "real world" class.
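
The "exclude everything except the class of interest" idea can be sketched with a toy tractogram and a finite list of rules. This is only an illustration under stated assumptions: the streamlines, the 2.0 length cutoff, and the "endpoints on opposite sides of the plane x = 0" criterion are all hypothetical, and this is not the method of any particular segmentation tool.

```python
import math


def length(streamline):
    """Sum of the distances between consecutive nodes of one streamline."""
    return sum(math.dist(a, b) for a, b in zip(streamline, streamline[1:]))


# Toy tractogram: named streamlines, each an ordered sequence of (x, y, z) nodes.
streamlines = {
    "a": [(-2.0, 0.0, 0.0), (0.0, 1.0, 0.0), (2.0, 0.0, 0.0)],  # long, crosses x = 0
    "b": [(1.0, 0.0, 0.0), (1.5, 0.0, 0.0)],                    # short, one side only
    "c": [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (5.0, 0.0, 0.0)],   # long, one side only
    "d": [(-0.1, 0.0, 0.0), (0.1, 0.0, 0.0)],                   # short, crosses x = 0
}

# A finite set of rules. A streamline is kept only if it is consistent with
# every rule; anything that violates a rule is excluded.
rules = [
    lambda s: length(s) >= 2.0,          # hypothetical minimum length
    lambda s: s[0][0] * s[-1][0] < 0,    # endpoints on opposite sides of x = 0
]

survivors = [
    name for name, s in streamlines.items()
    if all(rule(s) for rule in rules)
]

print(survivors)  # ['a']
```

Each rule on its own removes only some of the unwanted streamlines; it is the combination of rules that isolates the single structure of interest.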