<img src="resources/cropped-SummerWorkshop_Header.png">  

<h1 align="center">{Dataset} Workshop SWDB 2021 </h1> 
<h3 align="center">Tuesday, July 13th, 2021</h3> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>This notebook will introduce you to the presentation format for the data set presentations for the Summer Workshop on the Dynamic Brain. 

<p>We will outline the kind of material we would like you to create and describe the format of the presentations. 

<p>You should consult more complete examples from past years of the course to see what material has been used.  In particular, please look at the 2019 course, as that is the year we adopted the current format.
    
<a href="https://github.com/AllenInstitute/swdb_2019">SWDB 2019 GitHub repo</a>
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">


    
<p>(In addition, this document serves as a guide for formatting the actual jupyter notebooks that we will use.  You can double-click a Markdown cell to see the html formatting.  Just copy this html into your Markdown cells and follow the same conventions.  Please don't forget to use the closing div tags, otherwise GitHub will not render the notebook correctly online.)
</div>
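
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> Since a missing closing div tag is easy to overlook, a small check along the following lines may help before you push a notebook.  This is only a sketch: <code>unbalanced_div_cells</code> and <code>check_notebook_file</code> are hypothetical helper names, and the check simply compares the number of opening and closing div tags in each Markdown cell of the notebook's JSON.
</div>

```python
import json
import re


def unbalanced_div_cells(notebook):
    """Return indices of Markdown cells whose <div> and </div> counts differ."""
    bad = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # The notebook format stores cell source as a string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        opens = len(re.findall(r"<div\b", text, flags=re.IGNORECASE))
        closes = len(re.findall(r"</div\s*>", text, flags=re.IGNORECASE))
        if opens != closes:
            bad.append(index)
    return bad


def check_notebook_file(path):
    """Load an .ipynb file from disk and report its unbalanced Markdown cells."""
    with open(path, encoding="utf-8") as handle:
        return unbalanced_div_cells(json.load(handle))
```

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> An empty list means every Markdown cell has matching tags; otherwise the returned indices point to the cells to inspect.
</div>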

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p> Each dataset presentation takes place in the afternoon of one of the first 3 days of the workshop.  It runs from about 1:00 PM to just before dinner time (6:15 PM), and then reconvenes briefly after dinner.
    
<p> The dataset presentations are designed to introduce students to one of the featured datasets for the course for that year.  They center around jupyter notebooks (like this one!) that use hands-on examples to introduce the Python-based software that we use to access the data.  Each presentation has four parts:
    
<ul>
<li>A short ~30 min lecture with slides 
<li>A brief (~1.5 hours MAX) hands-on walkthrough of a jupyter notebook
<li>An open work time during which students will work on provided problem sets.
<li>A final session after dinner for answering questions, providing solutions, and directing students to longer problems that may form the basis of projects.
</ul>
    
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
<h2> Short Lecture </h2>
    
<p> The initial portion of the presentation will be a ~30 min lecture outlining the data that has been collected so that students will understand the basic experimental setup as well as the organization of the data as it is represented in the tools that we give them (e.g. the AllenSDK).
    
<p>  You should plan for this lecture to be given by one of the people in your group. This is a standard slide-based presentation and is not "hands-on".     
    
<p> It is *very* easy for this presentation to go on interminably.  PLEASE do everything possible to make sure this part of the presentation sticks to the most salient information.  It will be tempting to go over everything in exquisite detail.  Focus on brevity.  Allow time for questions from the students.
    
    
<p> Also, consider the slides for this lecture as a resource for the students. Some important aspects will be too complicated or time consuming to fully unpack in the lecture, yet students will need to reference them once they start working with the data. Touch on these topics lightly so that students know the information is available and have it in mind as they get started. 
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
<h2>Preliminaries</h2>
   
Before we get started with the main material, there is often some kind of preliminary setup or information.  That will go in this section.  Usually these are Python imports (which are pretty straightforward and simple) and setting up access to the data itself (which can be complicated).  We try to set these things up in such a way that we minimize the chaos that can and will arise when no one can access the data and you have to waste precious time during your presentation helping people get to the point where they can even pay attention to what you want to say.
    
Import statements that are not central to the presentation (meaning modules you are not highlighting, like `numpy`) can go in a cell like this:
</div>

In [3]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p> Please keep dependencies to the absolute minimum!  
    
<p> This is both to make it less likely we encounter problems and to minimize the firehose problem for students who are newer to Python.  You might want to use some fancy visualization package or something else, but that's one more thing in the brain of a complete beginner.
    
<p> If you *must* introduce a dependency that is not covered elsewhere, you need to describe it to students so that they know what it is.  We will need to coordinate this across dataset presentations so that we can distribute introductions to useful aspects of Python not covered in the Bootcamp across the days.  Remember that this is a secondary function of these notebooks, though, so don't get bogged down introducing cool Python tricks and tools.
    
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
The more important setup step is making sure the students have access to the data.  If they are working on AWS (through the instances that we provide them), this is easy, and the default setting of the cell below will reflect this.  We also give the students portable hard drives with all (or much) of the data pre-loaded.  Accessing these will depend upon the individual student's platform.  The cell below will have reasonable defaults that *should* work, but we will need to be prepared to help.
    
This cell sets up a variable called `data_root` that you should use in any code below to access the dataset in question (e.g. paths to manifest files for the SDK should be made relative to this variable).
    
(Hint:  If you edit this cell to make your life easier while writing this notebook (e.g. to put in a local path for your machine) PLEASE remember to remove these edits afterward.)
    
</div>

In [4]:
import platform
platstring = platform.platform()

if 'Darwin' in platstring:
    # macOS 
    data_root = "/Volumes/Brain2021/"
elif 'Windows' in platstring:
    # Windows (replace with the drive letter of USB drive)
    data_root = "E:/"
elif 'amzn1' in platstring:
    # then on AWS
    data_root = "/data/"
else:
    # then your own Linux platform
    # EDIT: set this to where you mounted the hard drive
    # ($USERNAME is a placeholder; Python does not expand it)
    data_root = "/media/$USERNAME/Brain2021/"
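
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> Because a misconfigured `data_root` can stall a whole session, it may be worth following the platform cell with a quick check that the directory actually exists.  The sketch below is one possible version; `data_root_status` is a hypothetical helper (not part of the AllenSDK), and the example call uses the current working directory as a stand-in for the real `data_root`.
</div>

```python
import os


def data_root_status(data_root):
    """Return (ok, message) describing whether data_root is an existing directory."""
    if os.path.isdir(data_root):
        return True, "Found data directory: " + data_root
    return False, ("Data directory not found: " + data_root +
                   " (check that the drive is mounted or edit data_root)")


# Stand-in path for illustration; in the notebook, pass the data_root set above.
ok, message = data_root_status(os.getcwd())
print(message)
```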

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
The variable below (`manifest_path`) should point to the location of the data for this notebook, ideally relative to the `data_root` variable.    The example below was adapted from the 2019 Neuropixels notebook.
    
</div>

In [None]:
import os
manifest_path = os.path.join(data_root, "dynamic-brain-workshop/visual_coding_neuropixels/2021/manifest.json")

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
<h2> Dataset walkthrough </h2>

<p>  This portion of the presentation will be jupyter notebook based, and it will be preparation for the students to work through a second notebook containing a problem set (described in the next section).  The idea is to lead the students through the basics of accessing the data in a hands-on fashion, using actual code that they can manipulate and providing examples of the basic functions available for retrieving the data described in the lecture.
    
<p> Your group should plan for this to be ~1.5 hours.  While this presentation is interactive and "hands-on", nothing in this portion of the presentation should be an involved exercise (those come later).
    
<p> This notebook should accomplish several things:
<ul>
<li> Demonstrate the central module or API that your data uses.
<li> Show how to find and identify the data available.
    <ul>
<li> Show how to obtain relevant summaries of the data available (e.g. the experimental sessions) and provide examples of deriving and working with data from these dataframes.
<li> Show the metadata available for the datasets and demonstrate usage (e.g. search for experiments from specific visual areas or Cre lines).  Note: your dataset may have multiple types of dataframes with different metadata.
</ul>
<li> Show an example of loading a specific data set (e.g. an experimental session).  Note: If people are using their hard drives and haven't set up their environments correctly, or the `data_root` is not configured properly, you can grind students' work to a halt at this step because they'll start downloading data.  We want to avoid this!
<li> Demonstrate loading and manipulating the various kinds of data that are in a session (e.g. plot some spike rasters, show some behavioral traces, etc.)  Typically this will include both neural responses AND metadata regarding stimuli or behavioral variables.  This will be the bulk of the notebook.
<li>  Consider leading up to an example analysis (although a simple one!).  Example:  Plot a tuning curve in response to gratings of different orientations.
</ul> 

<p>IMPORTANT!  Make sure your notebook sets students up to accomplish the exercises you will give in the Problem Set notebook!
    
Most of this notebook will be descriptions of actions along with example code that performs the action.
    
For example:
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
The Neuropixels Visual Coding dataset is accessed through the `EcephysProjectCache` class in the AllenSDK.  We begin by importing this class.
    
</div>

In [None]:
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
We instantiate this class using the `manifest_path` variable defined above.
</div>

In [None]:
cache = EcephysProjectCache(manifest=manifest_path)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>and so on.
    
    
<p> Throughout the notebook for this section, most cells will be Markdown explanation (like this one) followed by code that the students can just run along with you as you talk through it.  Every once in a while you might want to be a little more engaging and have the students answer a question on their own.  As an example, you might stop and ask everyone to find the list of experiments in their dataset that were recorded in a specific area.  
    
<p> For the dataset walkthrough, these questions should be *short*, i.e. accomplishable within a minute or so.  Don't stop the class for 15 minutes to have the students work on some problem.  Those kinds of problems come in the next session.
    
<p> We call these short questions Tasks and they look like the following example.  (Note:  make sure the previous cells in the notebook set the students up to complete this as a Task, i.e. within a minute or so.)
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Task 1.1:</b>  Print the number of units recorded in your experimental session.
</div>

In [5]:
# Make sure you leave a blank coding cell for them to work in.  
# Also make sure you've set them up correctly in the preceding work in the notebook.  
# This should be *very short*!
# You might consider adding some code to get them started depending on what you're asking them to do.

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> The notebook for this section should be mostly Markdown explanation and demonstration of the code, with a few "Tasks" sprinkled in to keep things interesting.  
    
<p> Some groups in the past have used the Tasks to coordinate a collective computation that all the students participate in (e.g. look up the orientation tuning for your favorite cell so it can be added to a global plot on the board).  
    
</div>
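
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> As an illustration of the kind of simple analysis a walkthrough might lead up to, the sketch below computes an orientation tuning curve (mean response per grating orientation).  The trial data here are made up for illustration, and `tuning_curve` is a hypothetical helper; in a real notebook the orientations and spike counts would come from the session's stimulus and spike data.
</div>

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (grating orientation in degrees, spike count on that trial).
trials = [
    (0, 2), (0, 4), (45, 7), (45, 9),
    (90, 12), (90, 10), (135, 5), (135, 3),
]


def tuning_curve(trials):
    """Return {orientation: mean spike count}, ordered by orientation."""
    responses = defaultdict(list)
    for orientation, spike_count in trials:
        responses[orientation].append(spike_count)
    return {ori: mean(counts) for ori, counts in sorted(responses.items())}


curve = tuning_curve(trials)
# The preferred orientation is the one with the largest mean response.
preferred = max(curve, key=curve.get)
print(curve)
print("preferred orientation:", preferred)
```

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> Keeping the computation this small leaves the plotting step (for example, passing the orientations and mean responses to `plt.plot`) as a one-liner, so students are not struggling with visualization code.
</div>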

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2> Problem set using the data</h2>
    
<p>This section is a ~2 hour session (around 4:00 - 6:00) during which students will work on a problem set, either alone or in groups, as they please, with TAs helping out and answering questions as needed.
    
<p>  This section will use a second notebook.  The structure of this notebook is very simple.  It will be a list of exercises with whatever explanation is needed for the context of the exercise.
    
<p> For example:
    
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 1.1: Plot the distribution of reaction times for go trials from one session</b>

<p> 1) get the <code>trials</code> dataframe from the session object. 
    
<p> 2) Filter the trials dataframe to get go trials only. 
    
<p> 3) Use the values of the <code>response_latency</code> column to plot a histogram of reaction times. 
    
<p> <code>response_latency</code> is the first lick time, in seconds, relative to the change time. 
    
</div>
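
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> To give a sense of scale, the logic behind an exercise like this one fits in a few lines.  The sketch below uses a made-up list of trial records in place of the real <code>trials</code> dataframe; the <code>go</code> flag is an assumed column name, and <code>go_latencies</code> and <code>latency_histogram</code> are hypothetical helpers.  With a real dataframe, the filter would typically be a boolean mask and the histogram a single plotting call.
</div>

```python
from collections import Counter

# Hypothetical stand-in for the trials dataframe: one record per trial.
# `go` is an assumed column name; `response_latency` is in seconds.
trials = [
    {"go": True, "response_latency": 0.42},
    {"go": False, "response_latency": None},
    {"go": True, "response_latency": 0.38},
    {"go": True, "response_latency": 0.61},
    {"go": True, "response_latency": 0.47},
]


def go_latencies(trials):
    """Keep the response latency of go trials that have a recorded lick."""
    return [t["response_latency"] for t in trials
            if t["go"] and t["response_latency"] is not None]


def latency_histogram(latencies, bin_width=0.1):
    """Count latencies per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(int(value // bin_width) for value in latencies)


counts = latency_histogram(go_latencies(trials))
for bin_index in sorted(counts):
    print(f"{bin_index * 0.1:.1f}-{(bin_index + 1) * 0.1:.1f} s: {counts[bin_index]}")
```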

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> You have some freedom in how you want to format this.  You can set up exercises in multiple parts to lead students stepwise through an analysis and thus have exercises 1.1, 1.2, etc. or you can have independent lists of problems that don't connect.  The important goal is to get students to engage in the dataset in some complex way to demonstrate how to perform an analysis of interest.  As they work through the notebook, they should become more comfortable and more familiar with the dataset.  
    
<p> One of the difficulties you will face but not be aware of until you see students work through your notebook is that your audience has a variety of backgrounds and expertise, particularly with Python itself.  Be careful to target your exercises for this range.  Some should be accessible to relative or complete beginners in Python.
    
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h3> Solution Set </h3>
<p>  You will also need to provide a separate solution set for this notebook, so that students can have a reference or self-study guide.  In addition, save an HTML version of the solution set notebook so that there will be an unalterable version in case students play around with the solution notebook.
    
</div>
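
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> One way to produce the HTML copy is the <code>jupyter nbconvert</code> command-line tool.  The sketch below wraps that command in Python; the helper names and the notebook file name are hypothetical, and it assumes <code>jupyter</code> and <code>nbconvert</code> are installed in the active environment.
</div>

```python
import subprocess


def nbconvert_html_command(notebook_path):
    """Build the `jupyter nbconvert` command that exports a notebook to HTML."""
    return ["jupyter", "nbconvert", "--to", "html", str(notebook_path)]


def export_solution_html(notebook_path):
    """Run the export; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_html_command(notebook_path), check=True)


# Hypothetical file name; replace with your group's solution notebook.
# export_solution_html("ProblemSet_Solutions.ipynb")
```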

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2> Final Session</h2>
   
The last session, which will convene after dinner, serves a few purposes.
    
<ul>
<li> Answering any questions that students have.
<li> Demonstrating any solutions that might be useful for everyone to see or talk through.
<li> Outlining potential longer problems that could be the basis of projects
<li> Listing project ideas
</ul>

    
<p>The hard part of this section is the last two points.  Your group can decide in advance or on the day of the event whether certain solutions should be highlighted.  What you need to prepare are a list of more involved problems (e.g. problems that you didn't include in the Problem Set above) and a set of potential project ideas.  
    
<p> Construct a slide or slides of a few project ideas for your dataset and take a few minutes to talk through these ideas with the students at this session.
    
</div>


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h1> A few more things </h1>
    
<p> Keep things as simple as possible!  If you don't need a module or method to introduce the data, leave it out.  
<p> On the note of simplicity:  be careful not to get students bogged down in complex visualizations where they spend all their time struggling with matplotlib or something.  
    
<p> Be VERY mindful of what might be distracting for a student in both cognitive and technical terms.  It's impossible to hold everyone's attention and harder for people to learn something if TAs have to run around fixing technical issues.
    
<p> If there is a method or tool that you *really* want to show students but isn't part of the main thrust of the notebook, please consider contributing to our library of tutorial notebooks (that cover topics like regression, clustering, etc.)

    
<p> All of this material will be put into a GitHub repo at <a href="https://github.com/AllenInstitute/swdb_2021">SWDB 2021 GitHub repo</a>, including this template (until the course starts).
    
<p> Please work on a fork of this repo for your group.
        
<p>  Please be aware that jupyter notebooks can be finicky with git.  We've had situations in the past where people have accidentally erased other people's work.  Please be mindful of your commits and back up your work just in case.  

</div>