In this walkthrough, we're going to start playing around with real scRNA-seq data.  I had no end goal in mind when starting this post; let's see what happens!

# The Dataset

We'll use the [Single Cell Expression Atlas](https://www.ebi.ac.uk/gxa/sc/home) to find a suitable dataset to explore.  The website has a nice design, and there are a lot of species to choose from.

![Species Options](cell-expression-atlas-species.png)

I chose to look at <i style="color:#EB1960">Danio rerio</i> (<b style="color:#EB1960">Zebrafish</b>) because, among the animals I hadn't really heard of before, it was the one with the most experiments^[Apparently it does seem to be one of the 'useful' animals in biology - you learn new things every day!  Shoutout to Sam's blog for introducing me to being able to use footnotes.].  I didn't want to look at plants/fungi/protists because they might require additional considerations I'm not aware of (especially protists).

<details>
    <summary style="color:#C0CF95"><b>&lt;i&gt; vs &lt;em&gt;</b></summary>
    <p>This is very tangential, but in trying to learn the right way to represent scientific names such as <i style="color:#EB1960">Danio rerio</i> I stumbled onto a debate about <span style="color:#757575">&lt;i&gt;</span> vs <span style="color:#757575">&lt;em&gt;</span> (HTML tags for italics), and analogously about <span style="color:#757575">&lt;b&gt;</span> vs <span style="color:#757575">&lt;strong&gt;</span> (for bold).  From what I've read, <span style="color:#757575">&lt;i&gt;</span>/<span style="color:#757575">&lt;b&gt;</span> are meant for text with no semantic emphasis, whereas <span style="color:#757575">&lt;em&gt;</span>/<span style="color:#757575">&lt;strong&gt;</span> carry semantic emphasis [@b-vs-strong].  This unfortunately means I've been using them wrong 😅, since I've just been <span style="color:#757575">&lt;strong&gt;</span>-ing all my bolds even though I basically never bold for emphasis these days.  (I'm not a very emphatic person.)</p>
</details>

I chose the dataset from the paper "Single-cell transcriptional analysis reveals innate lymphoid cell (ILC)-like cells in zebrafish" [@zebrafish-data], because it was one of the most recent but also not exceedingly large (<1000 cells).

## Experimental Design

Before downloading the data, we want to check if it's actually useful to us: is it mRNA, what type of cells are they, were specific genes targeted, etc.  Since the previous blog posts covered the wet-lab generation of the data, let's do a deep dive into what the authors did.  The paper [@zebrafish-data] is freely available online, and the pertinent information is in the <b style="color:#EB1960">Materials and Methods</b> section.  The paper is the source of two datasets on the <b style="color:#EB1960">Single Cell Expression Atlas</b>, so we will have to keep that in mind when reading about the methods.

### Sample Selection

> The aim of this study was to characterised innate and adaptive <strong style="color:#A6A440">lymphocytes</strong> in zebrafish in steady state and following the immune challenge, using scRNA-seq. <strong style="color:#C0CF95">Multiple zebrafish</strong>, either in <strong style="color:#A6A440">steady state</strong> or <strong style="color:#537FBF">exposed to immune challenge</strong>, were used to collect cells for sequencing.
>
> -- <cite><b style="color:#EB1960">Study Design Subsection; Materials and Methods Section;</b> @zebrafish-data </cite>

A <b style="color:#A6A440">lymphocyte</b> is a specific type of cell that is part of your immune system; B, T, and NK cells are all <b style="color:#A6A440">lymphocytes</b>.  We can also see that the experiment was done on multiple zebrafish, rather than just one - and that some were "exposed to immune challenge", which I assume means they were made sick to try to trigger interesting processes in their immune system.

The paper goes into more depth on this immune challenge (it involves <i style="color:#EB1960">Vibrio anguillarum</i>, a "fish pathogen" [details: @vibrio-anguillarum]), but this does not seem to be relevant for our dataset.  The paper seems to refer to our dataset as the <b style="color:#EB1960">Smart-seq2 experiment</b>, and to the other dataset as the <b style="color:#EB1960">10x experiment</b>.  The immune challenge was only applied to the <b style="color:#EB1960">10x experiment</b>.  We can double-check this by comparing the sample characteristics of [our dataset](https://www.ebi.ac.uk/gxa/sc/experiments/E-MTAB-7117/experiment-design) and [the other dataset](https://www.ebi.ac.uk/gxa/sc/experiments/E-MTAB-7159/experiment-design) on the <b style="color:#EB1960">Single Cell Expression Atlas</b>, noting that <b style="color:#757575">infect</b> is not one of the experimental variables for our dataset.
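
If you'd rather not eyeball the two experiment-design pages, the same comparison can be sketched in a few lines of Python.  This is only an illustration: the two tables below are tiny hand-made stand-ins, and their column names (such as `Factor Value[infect]`) and rows are my assumptions about what the downloaded design files might look like, not the real files.

```python
import csv
import io

# Hypothetical, hand-made excerpts standing in for the two experiment-design
# tables; the column names and rows are assumptions for illustration only.
SMARTSEQ2_TSV = (
    "Assay\tSample Characteristic[organism part]\tSample Characteristic[genotype]\n"
    "cell_1\tkidney\twild type genotype\n"
    "cell_2\tspleen\trag1-/-\n"
)
TENX_TSV = (
    "Assay\tSample Characteristic[organism part]\tFactor Value[infect]\n"
    "cell_a\tgut\tnone\n"
    "cell_b\tgut\tVibrio anguillarum\n"
)


def design_columns(tsv_text):
    """Return the set of column names in a tab-separated design table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return set(next(reader))  # the first row is the header


def columns_only_in(first, second):
    """Columns present in the first table but missing from the second."""
    return sorted(design_columns(first) - design_columns(second))


# Which variables does the 10x stand-in have that the Smart-seq2 one lacks?
print(columns_only_in(TENX_TSV, SMARTSEQ2_TSV))
```

With real data you would replace the two strings with the contents of the design files downloaded from each experiment page; the set difference then tells you which variables exist in one experiment but not the other.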

A triple-check, if the above was not convincing enough^[It wasn't really, at least for me - I like to be sure!], can be found when we read the whole paper:

> To allow easy retrieval of sequencing data from zebrafish innate and adaptive lymphocytes we generated a cloud repository (https://www.sanger.ac.uk/science/tools/lymphocytes/lymphocytes/) with transcriptional profiles of over 14,000 single cells collected from healthy and immune challenged zebrafish using 10x genomics and Smart-seq2 methodology <strong style="color:#C0CF96">(please see Explanatory Note in Supplementary Material)</strong>.
>
> -- <cite><b style="color:#EB1960">"Single cell atlas of innate and adaptive lymphocytes in zebrafish" section;</b> @zebrafish-data </cite>

We can then track down this explanatory note^[I found it on [NIH](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6258902/); warning - you'll have to download a 16MB file to read it!] to see what it has to say:

> Available datasets includes <strong style="color:#C0CF96"><b style="color:#EB1960">Smart-seq2 data</b> from kidney, thymus, spleen, guts and gills of healthy, unstimulated <b style="color:#A6A440">wild-type</b> zebrafish</strong> as well as <strong style="color:#C0CF96"><b style="color:#EB1960">10x datasets</b> from gut of unstimulated zebrafish both <b style="color:#A6A440">wild-type</b> and <b style="color:#A6A440">rag1<sup>-/-</sup> mutant</b> and <b style="color:#537FBF">immune-challenged</b> (V. anguillarum- or A. simplex-injected) <b style="color:#A6A440">rag1<sup>-/-</sup> mutant</b></strong>.
>
> -- <cite><b style="color:#EB1960">Supplementary Material; Explanatory Note</b> @zebrafish-data </cite>

We know the <b style="color:#EB1960">10x experiment</b> contains over 10,000 cells, and experienced <b style="color:#537FBF">immune challenges</b>.  Ours doesn't.  Our dataset should be from multiple body parts (corroborated by the sample characteristics of [our dataset](https://www.ebi.ac.uk/gxa/sc/experiments/E-MTAB-7117/experiment-design)) and the <b style="color:#EB1960">10x dataset</b> should only be from the gut (corroborated by the sample characteristics of [the other dataset](https://www.ebi.ac.uk/gxa/sc/experiments/E-MTAB-7159/experiment-design)).
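
The "multiple body parts vs. gut only" check can also be done by tallying a column of the experiment-design table.  Here is a minimal sketch, again using a made-up table: the rows and the column name `Sample Characteristic[organism part]` are assumptions for illustration, and a real check would read the downloaded design file instead.

```python
import csv
import io
from collections import Counter

# Hypothetical rows standing in for a downloaded experiment-design table;
# the column names and values are assumptions for illustration only.
DESIGN_TSV = (
    "Assay\tSample Characteristic[organism part]\tSample Characteristic[genotype]\n"
    "cell_1\tkidney\twild type genotype\n"
    "cell_2\tspleen\trag1-/-\n"
    "cell_3\tkidney\trag1-/-\n"
)


def tally(tsv_text, column):
    """Count how many rows (cells) take each value in the given column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader)


# More than one key here means the cells come from more than one body part.
print(tally(DESIGN_TSV, "Sample Characteristic[organism part]"))
```

The same `tally` function works for any other column in the table, so it is a quick way to see which values a sample characteristic actually takes.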

The above quote also points out that all our zebrafish were wild-type.  Interestingly, the sample characteristics of [our dataset](https://www.ebi.ac.uk/gxa/sc/experiments/E-MTAB-7117/experiment-design) do list multiple genotypes, including <b style="color:#A6A440">rag1<sup>-/-</sup> mutants</b>.  The quote seems to imply that <b style="color:#A6A440">rag1<sup>-/-</sup></b> is not "<b style="color:#A6A440">wild-type</b>", as does the rest of the paper.  However, the rest of the paper does explicitly say that the <b style="color:#EB1960">Smart-seq2 data</b> contains <b style="color:#A6A440">rag1<sup>-/-</sup> mutants</b>, so I assume this was either an oversight by the authors or a minor misinterpretation on my part.  Either way, I'm satisfied with this understanding; we can move on to the tissue preparation.

<details>
    <summary style="color:#C0CF95">About <b>rag1<sup>-/-</sup> mutants</b></summary>
    <p>
        Well, maybe not move on just yet!  TODO: THIS
    </p>
</details>

### Tissue Preparation

> Kidneys from heterozygote transgenic zebrafish either <b style="color:#A6A440">wild-type</b> or <b style="color:#A6A440">rag1-/- mutant</b>, were dissected and processed as previously described [@tissue-preparation].
>
> -- <cite><b style="color:#EB1960">FACS Sorting Subsection; Materials and Methods Section;</b> @zebrafish-data </cite>

Which gives us another paper to read!  TODO: THIS

### Cell Isolation

TODO: THIS