# Data types and flow control
### as commonly used in neuroscience
19 June 2023<br>
NRSC 7657<br>
Daniel J Denman and John Thompson<br>
University of Colorado Anschutz<br>
<br>

## whether defined in native python (`str`, `int`, ...) or by a package (`numpy.int16`, `numpy.ndarray`):
#### many data types are straightforward. for example

In [1]:
#float: any number with decimals
_float = 16.0
_float2 = 16.2

#int: any number without decimals. 
_int = 16
#a trailing decimal point with nothing after it still makes a float
_float_fancy_saving_characters = 16.

#complex numbers are also a type, but we won't have to use them much, if ever, in the normal course of things:
_complex = complex(16,1)

In [3]:
_float2

16.2

Q: why am i naming these `_float` and not `float`?

strings are the way to do things with letters

In [None]:
_string = 'letters'

Q: why are these called "string"? 

finally, native python has several ways to group things: lists and dictionaries. 

In [5]:
_list = [1,2,3,'STRING',_string,12.0]
_dict = {'key1':1,
         'key2':2.,
         'key3':'three'}

In [None]:
_list[4]

note that strings can be indexed like lists

In [None]:
_string[2]

't'

another important native type: `bool`:

In [26]:
_bool = True
_bool2 = False

In [18]:
if _bool2:
    print('something')

let's return to why i put underscores in the variable names. 
<br> let's use `type()`, the python function for checking type, and comparison `==` to see why again

In [None]:
type(_bool)

In [None]:
float == type(_float)

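In [29]:
#bad: this rebinds the built-in name `float` to a number
float = 16.8

In [32]:
#undo: type(16.8) is the built-in float class, so this puts the name back
float = type(float)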

finally, floating point numbers are stored with a finite number of bits, so they have limited precision, and this precision can matter (but usually won't). numpy has all the [numerical types](https://numpy.org/doc/stable/user/basics.types.html) you will need. 

In [33]:
import numpy as np

In [34]:
print(np.int16(_float2))
print(np.uint16(_float2))
print(np.uint8(_float2))
print(np.single(_float2))
print(np.double(_float2))

16
16
16
16.2
16.2
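
a small sketch of where precision shows up, using only plain python floats and the numpy types printed above. the variable names (`single`, `double`, `difference`) are just for illustration:

```python
import numpy as np

# decimal fractions are stored as the nearest binary value, so tiny errors appear
print(0.1 + 0.2 == 0.3)            # an exact comparison can fail
print(np.isclose(0.1 + 0.2, 0.3))  # comparing with a tolerance is safer

# the same number stored with fewer bits is a slightly different number
single = np.float32(16.2)   # 32-bit float (np.single)
double = np.float64(16.2)   # 64-bit float (np.double), same as a python float
difference = abs(float(single) - float(double))
print(difference)           # small, but not zero
```

for most analyses this does not matter, but it is why you compare floats with a tolerance rather than with `==`.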


### This brings up namespaces - the way that variable names are defined. 
when you define something it lives in a *namespace* and has a *scope*. <br>
the kernel namespace is, in fact, a dictionary in the kernel. the fact that it is a dictionary is itself not important, but that it is a simple mapping from names to objects is important.
<br>
in the notebook, variables go in the kernel namespace, but they could also be in a function's or class's namespace. we will return to this as we make functions and classes today.
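
a minimal sketch of the idea, assuming nothing beyond built-in python. the names `x`, `show_local_scope`, and `namespace` are made up for this example:

```python
x = 10  # defined at the top level: lives in the module (in a notebook, kernel) namespace

def show_local_scope():
    x = 99        # a separate name that only exists inside this function
    return x

inner_x = show_local_scope()
print(inner_x)   # the function's own x
print(x)         # the top-level x is untouched

# globals() hands back the top-level namespace, and it is a plain dict
namespace = globals()
print(type(namespace))
```

the same name can point to different things in different namespaces, which is why the `x` inside the function does not overwrite the `x` outside it.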

In [None]:
#in ipython (like jupyter, or like in vscode) green things have a meaning in the namespace that you didn't define

In [None]:
np

In [None]:
float

## Everything in python is an object. 
This means that everything (...just about everything) has associated _properties_ and _methods_ <br>
_properties_ are values

In [40]:
_complex.real

16.0

_methods_ are code, generally functions

In [46]:
_string.startswith('et',2)

False

we will come back to this object-associated code (methods and properties), and how it becomes associated with some object, but here are some examples:

In [None]:
_string.startswith()

### Functions are ways to repeat code snippets that you have validated, without copying and pasting. 
A function can be thought of as a program that operates on input. Functions can have zero input arguments, taking their inputs from a dynamic source, but more typically have 1+ input arguments which are passed in `()`. Functions only run when they are called. <br><br>
Let's recreate `startswith()` as our own function. We'll also go through some python flow control as we do. This will include `for`, `if-else`, `pass` and `break`, and `while` statements.

In [47]:
startswith()

NameError: name 'startswith' is not defined

**--> we just used `_string.startswith('et',2)`, why didn't this work?**

first, let's prototype what we want to accomplish in a cell or some cells
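
the prototype cells are not included here, so this is one possible sketch rather than the version from class. `text`, `prefix`, and `matches` are hypothetical names:

```python
# hypothetical inputs for the prototype
text = 'letters'
prefix = 'let'

matches = True
if len(prefix) > len(text):
    matches = False              # a prefix longer than the text can never match
else:
    for i in range(len(prefix)):
        if text[i] != prefix[i]:
            matches = False
            break                # stop at the first mismatch, no need to keep looking
        else:
            pass                 # characters agree: do nothing and move on
print(matches)
```

the `else: pass` does nothing and could be deleted; it is only there to show what `pass` looks like.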

now, let's put it in a function so it's easier to reuse
<br> 
<br>
_note here: this concept of repeating code by putting it in function is definitely language agnostic and is a nice step towards writing cleaner and easier to maintain code._
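
a sketch of the function version, assuming we only care about matching at the very start of the string. the argument names `text` and `prefix` are my choice, not anything python requires:

```python
def startswith(text, prefix):
    """Return True if `text` begins with `prefix`, otherwise False."""
    if len(prefix) > len(text):
        return False
    # zip pairs up characters and stops at the end of the shorter string (the prefix)
    for text_char, prefix_char in zip(text, prefix):
        if text_char != prefix_char:
            return False
    return True

print(startswith('letters', 'let'))
print(startswith('letters', 'et'))
```

now `startswith` exists in the notebook namespace, so calling it by name no longer raises a `NameError`.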

don't forget default arguments
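
a sketch with a default argument, mimicking the optional second argument we passed to `_string.startswith('et',2)` earlier. `start=0` is the default, used whenever the caller leaves it out; this slicing-based body is a shortcut, not how python implements the real method:

```python
def startswith(text, prefix, start=0):
    """Return True if `text` has `prefix` beginning at index `start`."""
    # slice out a piece of text as long as the prefix and compare directly
    return text[start:start + len(prefix)] == prefix

print(startswith('letters', 'let'))      # start defaults to 0
print(startswith('letters', 'et', 1))    # positional argument
print(startswith('letters', 'et', start=2))  # keyword argument
```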

functions can be defined in a notebook, or imported from a .py file or package. 

other flow control: `try-except`, `while`
<br>also `switch-case` in MATLAB (and now, since python 3.10, `match-case` in python)

In [1]:
age = 120

if age > 90:
    print("You are too old to party, granny.")
elif age < 0:
    print("You're yet to be born")
elif age >= 18:
    print("You are allowed to party")
else: 
    print("You're too young to party")

You are too old to party, granny.
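
besides `if-elif-else`, here is a short sketch of the `try-except` and `while` statements mentioned above. the list `values` is made-up example data:

```python
# try-except: attempt something that might fail, and handle the failure
values = ['16', '16.2', 'sixteen']
parsed = []
for v in values:
    try:
        parsed.append(float(v))   # works for strings that look like numbers
    except ValueError:
        parsed.append(None)       # float('sixteen') raises ValueError, so we land here
print(parsed)

# while: repeat until a condition stops being true
count = 0
while count < 3:
    count += 1
print(count)
```

catching a specific exception (`ValueError`) rather than everything keeps unrelated bugs from being silently hidden.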


## Classes: making your own objects
Classes are a way to make bundles of your own objects, entities with their own properties and methods that you define. 

when you use your class the first time, you are creating a new *instance* of that class. that instance has all of the *attributes* (*properties* and *methods*) of the parent class.
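
the cell that defines `ThingWeMade` is not included here, so this is only a guess at a definition that would behave like the cells below. the `name` property and its default value `'neuron'` are assumptions:

```python
class ThingWeMade:
    def __init__(self, name='neuron'):
        # __init__ runs when an instance is created; `name` becomes a property
        self.name = name

    def fire_a_thing(self, thing):
        # a method: a function that belongs to the instance and can see `self`
        print(self.name + ' fires an ' + thing)
```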

In [102]:
t = ThingWeMade()

In [104]:
t.fire_a_thing('cannon')

neuron fires an cannon


notice that i capitalized and camelcased or CapWords'd `ThingWeMade`. this is the convention according to [PEP8](https://www.python.org/dev/peps/pep-0008/#class-names). it is not required, but it might help you understand someone else's code if that person is following convention. (also PEP means Python Enhancement Proposal and it is where the core python developers propose, debate, and make rules and changes to how core python works.)

Finally, an example of inheritance:
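
the defining cell is again missing, so this is a hypothetical sketch. the class name `ABetterThing` is borrowed from the summary below, and the base class is repeated (as an assumed definition) so the cell runs on its own:

```python
class ThingWeMade:
    # assumed base class, restated here so this example is self-contained
    def __init__(self, name='neuron'):
        self.name = name

    def fire_a_thing(self, thing):
        print(self.name + ' fires an ' + thing)


class ABetterThing(ThingWeMade):
    # inherits __init__ and fire_a_thing from ThingWeMade, and adds one new method
    def make_it_better(self):
        # spacing chosen to match the output shown below
        print(self.name + '  better')


b = ABetterThing()
```

`b` can call `fire_a_thing` even though `ABetterThing` never defines it, because the method is inherited from the parent class.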

In [107]:
b.make_it_better()

neuron  better


In [109]:
b.fire_a_thing('employee')

neuron fires an employee


Classes are generally not something you'd create in a notebook workflow, although you could. You will find classes in more complete codebases, especially in packages, where you make an instance of a class and then pass it your data, so you can then use the class methods to do whatever it is. (`scikit-learn`, a very popular and flexible ML package, works like this: many of its analyses are classes that you train, and you can then use that trained instance to test on other data.) One of my lab's behavioral paradigms is also a monster class. 

### Summary:
Why classes and functions? 
- organization: if you need to fix a bit of code, and it is in a function, you fix it once and then it works wherever it is called. if you copy pastaed...
- reproducibility: your function does the thing it is expected to (if it is correct), and not some slightly different thing than the copy pastaed v4 of the code snippet
- readability: one line of a function call with a descriptive name is a lot more readable than a bunch of code. same for classes, ABetterThing.thing is more readable than many many variables with names like thing_for_ABetterThing. 

# some more commonly encountered data structures

## numpy: ndarray
this is almost so ubiquitous it belongs above, in the basic data types section. the ndarray is the matrix of python, the n-dimensional data structure where most any data are stored and transformed.

In [110]:
import numpy as np

In [115]:
a = np.array([4,1,2,3])

In [116]:
type(a)

numpy.ndarray

In [117]:
a.argmax()

0

why not just use nested `list`s?
<br><br>
note that `list`s can be "ragged" (sometimes also called "jagged"), whereas `ndarray`s (like matrices) are always rectangular
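
a quick sketch of the difference, with made-up numbers. `ragged` and `rect` are illustrative names:

```python
import numpy as np

# a nested list can be ragged: the rows do not need to be the same length
ragged = [[1, 2, 3], [4, 5]]
lengths = [len(row) for row in ragged]
print(lengths)

# an ndarray is rectangular, which is what makes shape and column indexing possible
rect = np.array([[1, 2, 3], [4, 5, 6]])
print(rect.shape)

second_column = rect[:, 1]   # every row, column 1; nested lists cannot do this
doubled = rect * 2           # math applies to every element at once
print(second_column, doubled)
```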

time series are 1D; images are 2D (or potentially 3D if color); movies are 3D (or potentially 4D if color). 
<br><br>
let's do some image work that will also help reinforce indexing
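
the image cells from class are not included here, so this sketch uses a tiny synthetic image instead of a real one. all names (`image`, `crop`, `movie`) are hypothetical:

```python
import numpy as np

# a 2D array is an image: 4 rows (y) by 6 columns (x), all black to start
image = np.zeros((4, 6), dtype=np.uint8)

# slicing is [rows, columns]; paint a bright rectangle in the middle
image[1:3, 2:5] = 255

pixel = image[1, 2]      # a single pixel: row 1, column 2
crop = image[1:3, 2:5]   # a sub-image, same slicing rules
print(pixel, crop.shape)

# stacking frames adds a dimension: a movie is (frames, rows, columns)
movie = np.stack([image, image, image])
print(movie.shape)
```

note the order: the first index is the row (vertical position), the second is the column, which is the reverse of the usual (x, y) habit.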

## pandas and .csv / .xlsx files
#### tables for your tabular data needs
if you were going to do it in Excel, you can do it with pandas, only faster and with more powerful filtering and data selection. 

In [155]:
import pandas as pd

In [156]:
df = pd.read_csv('iris.csv') # this is a classic ML toy data set
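
a sketch of the filtering and selection mentioned above. to keep it runnable without `iris.csv`, it builds a tiny table by hand; the column names and values are made up and are not taken from the real file:

```python
import pandas as pd

# hypothetical stand-in table, NOT the real iris data
df_demo = pd.DataFrame({
    'species': ['setosa', 'setosa', 'virginica', 'virginica'],
    'petal_length': [1.4, 1.3, 6.0, 5.1],
})

# filtering: keep only the rows where a condition is True
long_petals = df_demo[df_demo['petal_length'] > 5.0]
print(long_petals)

# grouping: one summary value per category, like a pivot table in Excel
mean_by_species = df_demo.groupby('species')['petal_length'].mean()
print(mean_by_species)
```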

### HDF5 and .mat files
hdf5 is a Hierarchical Data Format, which has several nice features: good metadata, fast I/O (if implemented well), and good cross-platform portability. <br>
several data formats commonly used in neuroscience are in fact based on hdf5, including .mat files (those saved as MATLAB v7.3) and NWB files (to be talked about later). for MATLAB .mat files, there are several options for loading into python: `scipy.io.loadmat` reads the older (pre-v7.3) formats, and `h5py` reads the hdf5-based v7.3 files:

In [7]:
from scipy.io import loadmat

In [None]:
rez = loadmat('rez.mat')

In [5]:
import h5py

In [None]:
rez = h5py.File('rez.mat')

In [None]:
type(rez)

you can think of hdf5 files as nested dictionaries, with keys that take you down levels in the hierarchy of the hierarchical data format:
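
since `rez.mat` is not available here, this sketch builds a small hdf5 file in memory and walks down it. the group and dataset names (`rez`, `templates`) are placeholders, not the contents of the real file:

```python
import numpy as np
import h5py

# driver='core' with backing_store=False keeps the file in memory (nothing written to disk)
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    group = f.create_group('rez')                      # like a dict inside a dict
    group.create_dataset('templates', data=np.arange(12).reshape(3, 4))

    top_keys = list(f.keys())            # keys at the top level
    inner_keys = list(f['rez'].keys())   # keys one level down

    # indexing a dataset with [:] reads it into a regular numpy array
    templates = f['rez']['templates'][:]
    same_thing = f['rez/templates'][:]   # a path-style key also works

print(top_keys, inner_keys, templates.shape)
```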

once we hit something in that structure, we can work with it based on the data type that it is. for example, a 2D array we can plot like an image

## Neurodata Without Borders
#### a unified format for neuroscience data
![nwb_schematic](https://www.nwb.org/wp-content/uploads/2020/02/nwb_datatypes_overview-1024x491.png)
<br>
the point is to have a useful cross-platform data format that has metadata in specific places, so that analysis tools and data are easier to share across labs. this is what a file format is, same as .xlsx or .jpeg or .tiff, for example. 

<br>
data from DANDI set [000006](https://dandiarchive.org/dandiset/000006?search=svoboda&page=2&sortOption=0&sortDir=-1&showDrafts=true&showEmpty=false&pos=11)


<KeysViewHDF5 ['acquisition', 'analysis', 'file_create_date', 'general', 'identifier', 'intervals', 'processing', 'session_description', 'session_start_time', 'specifications', 'stimulus', 'timestamps_reference_time', 'units']>

finally, note that both John's lab and my lab are using this format, as are many labs (that I know of: at Allen, UCSF, NYU, Janelia)

note: there is also a set of tools for MATLAB: [MATNWB](https://neurodatawithoutborders.github.io/matnwb/)

## xarray
#### pandas functionality with more dimensions. alternatively, ndarrays with labels

In [None]:
import xarray as xr

np.random.seed(123)

xr.set_options(display_style="html")

times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

ds = xr.Dataset(
    {
        "tmin": (("time", "location"), tmin_values),
        "tmax": (("time", "location"), tmax_values),
    },
    {"time": times, "location": ["IA", "IN", "IL"]},
)

ds

In [None]:
df = ds.to_dataframe()
df.head()

### JSON
another common format, kind of like hdf5 but way more general and not for large data. it is often used for configuration files or anything that requires lots of nicely organized metadata. <br>
the `json` package comes with base python and is what to use to interact with JSON type data. like hdf5, it is basically a dictionary, so you can load a JSON file into a dictionary (`json.load`), and if you have a dictionary you can save it into a JSON file (`json.dump`; `json.dumps` makes a string instead)

In [119]:
import json
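
a small round-trip sketch using strings, so nothing is written to disk. the contents of `config` are made up for illustration:

```python
import json

# hypothetical metadata, the kind of thing that ends up in a config file
config = {'subject': 'mouse01', 'sampling_rate_hz': 30000, 'channels': [0, 1, 2]}

# dictionary -> JSON text (json.dump(config, file_handle) would write to a file instead)
text = json.dumps(config, indent=2)
print(text)

# JSON text -> dictionary (json.load(file_handle) would read from a file instead)
restored = json.loads(text)
print(restored == config)
```

the trailing `s` in `dumps` and `loads` stands for string; the versions without it work on open files.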