# Introduction to Python (for Systems Neuro)
## a guided tour of data analysis with python
11 March 2024<br>
NRSC 7610 Systems Neuroscience<br>
Daniel J Denman<br>
University of Colorado Anschutz<br>
<br>

# Important: this is not meant to be a comprehensive guide. 
Use the internet! [Python documentation](https://docs.python.org/3.7/tutorial/index.html), [Stack Overflow](https://stackoverflow.com/), Google, Markdown cheatsheets [(e.g. this one)](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) all are your friends.
### we're going to cover two things in the introduction here:
0. Jupyter notebooks and some pure Python basics
1. Importing useful packages for neuroscience analysis [and doing a few things with them]
<br>[hot tip: python indexing starts with 0!]

# 0. Jupyter notebooks and some pure Python basics
#### Here, we are using a Jupyter notebook environment to run a Python kernel
##### [for posterity: you've had the option to run the Jupyter locally or on Google colab]
##### First, let's get our bearings in a Jupyter notebook<br>
In a Jupyter notebook, we can iteratively explore data, do computations, make plots, and define functions and objects.<br>
The notebook will contain a mix of code, markdown (a simple way to make formatted text) that might explain what is going on in the code, and outputs. The outputs will be in the form of printed statements and plots.
<br>
The fundamental unit of the Jupyter notebook is the cell. Here is an empty code cell:

- You can see the empty brackets on the left; this bracket is empty until the cell is executed
- Cells can be "code", "markdown", or "raw". This cell, for example, is a "markdown cell". When I execute it (by pressing ```Shift + Enter```), it renders the text I have entered. <br>
- In the cell below, a code cell, we will enter some code. To execute it, enter that cell and press ```Shift + Enter```. 

In [None]:
message = 'hello world! time to do some science' #define a variable. this variable is a string, because we put the value in ''
print(message) #print() is a function that is part of core python. it prints text; in the case of notebook, this will appear in the output of this cell, below

Notice that the empty brackets on the left of the cell above have now been filled with a number, which is the order in which the cell was executed. This number keeps incrementing every time a cell is run, until the kernel (or Jupyter notebook) is restarted.
<br>Also, you'll notice I added some text after a ```#```; this is a comment, which is ignored by the interpreter. you can write whatever you want, but the idea is to note what you did, if it is important or not clear from the syntax
<br>
<br>
<br>
So far, this notebook has only the Python kernel. It can only do basic Python things, like define simple variables and do simple operations:
<br>
<br>
with letters

In [None]:
word = 'some letters'
name = 'your name'


with numbers:

In [None]:
a = 12
b = 3. 

python has built in variable types. in this case, we have used ```str```, ```int```, and ```float```. 
<br>```str``` is declared with ```''```
<br>```int``` is declared with any number [no decimal point]
<br>```float``` is declared with a number with a decimal point
<br>
<br>now we can do some things with the variables

In [None]:
print(type(a))
print(type(word))

In [None]:
a + b

In [None]:
word + name
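
a quick sketch of what those two ```+``` operations give back. it just repeats the variables defined above so it can run on its own; ```total``` and ```phrase``` are names made up for this example:

```python
a = 12                   # int
b = 3.                   # float
word = 'some letters'    # str
name = 'your name'       # str

total = a + b            # int + float gives a float
phrase = word + name     # + on two strings concatenates them; no space is added

print(total, type(total))
print(phrase)
print(word + ' ' + name) # add the space yourself if you want one
```

note that ```+``` means different things for different types: addition for numbers, concatenation for strings. mixing the two (e.g. ```a + word```) raises a ```TypeError```.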

here are some other useful standard python variable types:
<br>list:

In [None]:
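empty_list = []
list_of_things = [1, 'a', a]
group_of_things = [a, b, word] #this list is used in the dictionary below and again in the flow control section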

dictionary:

In [50]:
dict_of_things = {'key1':a,
                  'key2':b,
                  'key3':9,
                  'someotherkey':group_of_things}

In [None]:
dict_of_things
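
here is a small sketch of how to get things back out of a list or a dictionary, which is where the "indexing starts with 0" tip comes in. it assumes ```group_of_things``` is the list ```[a, b, word]``` that is printed later in this notebook; the names ```first```, ```last```, ```first_two```, ```value```, and ```nested``` are made up for this example:

```python
a = 12
b = 3.
word = 'some letters'
group_of_things = [a, b, word]   # assumed definition, matching the output shown later
dict_of_things = {'key1': a,
                  'key2': b,
                  'key3': 9,
                  'someotherkey': group_of_things}

first = group_of_things[0]        # lists are indexed by position, starting at 0
last = group_of_things[-1]        # negative indices count from the end
first_two = group_of_things[0:2]  # a slice: starts at 0, stops before 2

value = dict_of_things['key2']              # dictionaries are indexed by key
nested = dict_of_things['someotherkey'][2]  # index into the list stored in the dictionary

print(first, last, first_two, value, nested)
```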

Another key concept for Jupyter is the difference between using the keyboard to add code or markdown, and using the keyboard to change the notebook itself. When you are "in" a cell, you are adding code or markdown. To jump up a level to add a cell, you need to ```esc``` out of that cell. Now at the notebook level, you can use letters on the keyboard to do things like add, delete, copy, and paste cells. To add, press ```a``` (to add above where you are) or ```b``` (to add below). ```c``` for copy, ```x``` for cut, ```v``` for paste. Remember, no ```Shift```s needed. Try adding a cell below this one:

These shortcuts can be found over there on the left in the menu with a palette, or on the internet. Other important ones: ```dd``` to delete, and ```Enter``` to go down a level and into the cell you have selected

--> There are other tricks to notebooks: moving cells, copying/cutting cells, running all or sets of cells, etc. Demonstrate some.

# Analyzing time series
Almost all of the concepts you will hear about in this course involve measurements of neural activity, usually relative to something an experimenter is controlling or measuring. You will see spikes, Ca2+ transients that reflect spikes, field potentials, maybe others. 

#### We are going to start with a reduced case, to focus a bit on the coding in this lecture. 
We'll have one neuron's action potential times and the times of stimuli that may (or may not) affect this neuron. <br><br>
The nature of the neuron and the stimuli aren't critical for this intro; several lectures will get into greater detail with analysis applications later in the course. <br>

In [None]:
#load one cell
#load some stim times

our goal will be to plot the peri-event (also called peri-stimulus in some cases) histogram, or PETH (or PSTH). like so:
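
to make that goal concrete, here is a minimal sketch of the PETH calculation. it uses synthetic spike and stimulus times (made up for illustration, not the course data), and the window, bin size, and all variable names are assumptions. it relies on ```numpy```, which is introduced in the next section:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, made up for illustration only (times in seconds)
stim_times = np.arange(1.0, 60.0, 2.0)               # one stimulus every 2 s
spike_times = np.sort(rng.uniform(0.0, 61.0, 600))   # random spikes, unrelated to the stimuli

window = (-0.5, 1.0)   # look from 0.5 s before to 1.0 s after each stimulus
bin_size = 0.05        # 50 ms bins
n_bins = int(round((window[1] - window[0]) / bin_size))
bin_edges = np.linspace(window[0], window[1], n_bins + 1)

counts = np.zeros(n_bins)
for t in stim_times:
    relative = spike_times - t                        # spike times relative to this stimulus
    counts += np.histogram(relative, bins=bin_edges)[0]

# convert summed counts to a mean firing rate (spikes/s) in each bin
rate = counts / (len(stim_times) * bin_size)
print(rate)
```

because these fake spikes ignore the stimuli, the resulting histogram should look roughly flat; a neuron that responds to the stimulus would show a bump (or dip) after time 0.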

<br><br>
# 1. Importing useful packages
The basic python things are useful, but we will probably need to import some packages to do any kind of data analysis. For most science, numpy and matplotlib (or packages that use matplotlib) are a good place to start. For many "data science" applications and some forms of analysis, pandas is also a very useful package. seaborn goes well with pandas, especially for making "big data" plots

## numpy

In [None]:
import numpy as np

We can now use numpy, an extensive package of quantitative tools (**num**erical **py**thon)
<br>**As with all packages (and objects), you access attributes of the package with the ```.``` notation. The ```.``` means you are "going in something", to get an attribute or function that lives inside of it**
<br>So with that ```import numpy as np``` statement above, we have brought the numpy package and all of its attributes into our notebook. if we want to use a numpy function, we use ```np.name_of_numpy_function_we_want```. For example, numpy has a function called ```save```, which saves an array to disk. You would use this by typing ```np.save(filename, thing_to_save)```. similarly, the numpy ```load``` function is invoked with ```np.load(filename)```
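
a small sketch of a save/load round trip. the file name ```my_array.npy``` and the temporary folder are made up for this example; in practice you would use a path of your own:

```python
import os
import tempfile

import numpy as np

thing_to_save = np.array([1, 2, 3])

# write into a throwaway folder so this example does not clutter your working directory
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'my_array.npy')

np.save(filename, thing_to_save)   # first argument: where to save; second: what to save
loaded = np.load(filename)         # read the array back from disk

print(loaded)
```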

packages also have non-function attributes - strings and floats or whatever else. for example, numpy has pi as an attribute, since one sometimes wants pi, to, you know, do numerical calculation type things:

In [None]:
np.pi

<br>The most important thing is the new variable type: the numpy ndarray. an ndarray is an n-dimensional group of numbers. Here are example one, two, and three dimensional ndarrays:

In [None]:
one_dim = np.array([1,2,3])
two_dim = np.array([[1,2,3,4,5],[1,2,3,4,5]])
3_dim = np.array([[[1,2,3],[1,2,3],[1,2,3]],[[1,2,3],[1,2,3],[1,2,3]]])

ok, here we've done something wrong, and python has given us an error after it tried to do the wrong thing we told it to do. in this case, we tried to name a variable with a digit at the beginning. that's not allowed. 

In [10]:
three_dim = np.array([[[1,2,3],[1,2,3],[1,2,3]],[[1,2,3],[1,2,3],[1,2,3]]])
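
to check how many dimensions an ```ndarray``` has and how big each one is, you can look at its ```ndim``` and ```shape``` attributes. a short sketch, repeating the arrays above so it runs on its own (```element``` is a name made up for this example):

```python
import numpy as np

one_dim = np.array([1, 2, 3])
two_dim = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
three_dim = np.array([[[1, 2, 3], [1, 2, 3], [1, 2, 3]],
                      [[1, 2, 3], [1, 2, 3], [1, 2, 3]]])

print(one_dim.ndim, one_dim.shape)       # 1 (3,)
print(two_dim.ndim, two_dim.shape)       # 2 (2, 5)
print(three_dim.ndim, three_dim.shape)   # 3 (2, 3, 3)

# index with one number per dimension; remember that indexing starts at 0
element = two_dim[0, 4]   # first row, fifth column
print(element)
```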

### note: these ```ndarray```s are basically lists of numbers. 
but, of course, that is the core of what we are doing with *any* kind of quantitative analysis. ```numpy``` has many, many functions to manipulate and do analysis on them. and numpy ```ndarray```s are much faster than pure python lists. ```scipy``` has many other analyses, and expects ```numpy``` ```ndarray```s as input. Machine learning packages like ```scikit-learn```, ```tensorflow```, etc. will also often expect numpy, be faster with them, or at least be compatible. 
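
one practical difference from lists, as a quick sketch: arithmetic on an ```ndarray``` is applied to every element at once, while the same operator on a list does something else. the variable names here are made up for this example:

```python
import numpy as np

a_list = [1, 2, 3]
an_array = np.array(a_list)

repeated_list = a_list * 2                # a list is repeated: [1, 2, 3, 1, 2, 3]
doubled_array = an_array * 2              # an array is multiplied element by element
doubled_list = [v * 2 for v in a_list]    # with a list you need a loop to do the same thing

print(repeated_list)
print(doubled_array)
print(doubled_list)
```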

let's move on to plotting some things

In [3]:
import matplotlib.pyplot as plt

## matplotlib
an extensive package of plotting tools

first, make some numpy nd arrays to plot. these are going to be:
- ```x```: a 1D array of 1000 evenly spaced data points, increasing from 1.0 to 30.0
- ```y```: a 1D array of sin(x), also 1000 data points

In [7]:
x = np.linspace(1,30,1000)
y = np.sin(x)

now we can plot what ```x``` and ```y``` look like:

In [None]:
plt.plot(x)

In [None]:
plt.plot(x,y)
plt.xlabel('x data')
plt.ylabel('y data')

and do some simple calculations on them:

In [None]:
print('mean: '+str(np.mean(y)))
print('s.d.: '+str(np.std(y)))

we can also do more complicated measurements, like finding the area between two parts of a curve.

In [None]:
plt.plot(x,y)
plt.axhline(np.mean(y),color='red')
plt.axhline(np.std(y),color='pink');plt.axhline(np.std(y)*-1,color='pink')
plt.xlabel('x data')
plt.ylabel('y data')

question: in what ranges of x is y above the s.d. of y?

In [None]:
indices = np.where(y>np.std(y))[0]
print(indices)

In [None]:
plt.plot(x,y)

plt.axhline(np.mean(y),color='red')
plt.axhline(np.std(y),color='pink')
plt.axhline(np.std(y)*-1,color='pink')

plt.fill_between(x[indices], y[indices], np.std(y),color='pink')

plt.xlabel('x data')
plt.ylabel('y data')

## Pandas
The core data structure in ```pandas``` is a ```DataFrame```.<br>
A ```DataFrame``` is essentially a nice table - you could think of it like Excel with simpler indexing and the ability to make any kind of plot your heart desires. 

In [11]:
import pandas as pd

in fact we start with a nice function of pandas, ```read_csv```, which imports a csv file (of the type you'd import into excel). note: you can also use ```pd.read_excel``` to just read an .xlsx file!

make a DataFrame, where the rows are each trial and columns are anything you want for that trial
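
as a sketch of what a trial-by-trial ```DataFrame``` could look like: the column names and all of the values below are made up for illustration, not taken from any real experiment.

```python
import pandas as pd

# Made-up trial data, for illustration only: one row per trial
trials = pd.DataFrame({
    'trial': [0, 1, 2, 3],
    'stimulus': ['left', 'right', 'left', 'right'],
    'stim_time': [1.0, 3.0, 5.0, 7.0],    # seconds
    'spike_count': [4, 9, 3, 11],
})

# select only the rows (trials) that match a condition
right_trials = trials[trials.stimulus == 'right']

# summarize a column separately for each stimulus type
mean_by_stim = trials.groupby('stimulus')['spike_count'].mean()

print(right_trials)
print(mean_by_stim)
```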

In [36]:
df = df[df.year < 2020]

## seaborn
for easy/pretty plotting with pandas

In [30]:
import seaborn as sns

In [None]:
sns.lineplot(data=df,x='year',y='count',hue='language')

this seems silly to look back that far - let's limit to after python was developed, 1991:
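
a sketch of that kind of filter, using the same boolean-indexing pattern as the ```df.year < 2020``` cell above. the small table here is a placeholder with the same column names used in this notebook; its values are made up so the example can run on its own:

```python
import pandas as pd

# Placeholder table; the numbers are made up for illustration only
df = pd.DataFrame({
    'year': [1985, 1990, 1991, 2000, 2019],
    'count': [1, 2, 3, 4, 5],
    'language': ['MATLAB', 'MATLAB', 'python', 'python', 'MATLAB'],
})

# keep only the rows from 1991 onward
df = df[df.year >= 1991]
print(df)
```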

also, we'll flip the order:

In [None]:
sns.lineplot(data=df,x='year',y='count',hue='language',hue_order=['MATLAB','python'])

you can of course control every aspect of these plots. for example, colors!

In [None]:
sns.set_palette(sns.color_palette('colorblind')[::-1])
sns.lineplot(data=df,x='year',y='count',hue='language',hue_order=['MATLAB','python'])

## Flow control
To script things (make functions, automate the boring stuff, etc), we need to know the syntax for "flow control". This means things like ```if``` statements and ```for``` loops. We'll do examples of those, as well as put some of that into a function we define.
<br><br>
Let's go back to a variable we had before, ```group_of_things```

In [49]:
group_of_things

[12, 3.0, 'some letters']

Let's say we want to print only the non-string things in ```group_of_things```. We'll make a ```for``` loop with an ```if``` statement
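
one way that loop could look (a sketch; ```non_strings``` is a name made up here so we can also keep what was printed):

```python
group_of_things = [12, 3.0, 'some letters']

non_strings = []
for thing in group_of_things:          # visit each item in the list, one at a time
    if not isinstance(thing, str):     # True for anything that is not a string
        print(thing)
        non_strings.append(thing)
```

note the colons and the indentation: in python, the indented lines are what belong to the ```for``` or the ```if```.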

What if we want to do the same thing for a different list? 

In [None]:
another_list = list(dict_of_things.values())
print(another_list)

we could copy and paste, or we could put the flow control we have already written in a function:
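
a sketch of such a function. the name ```print_non_strings``` and its argument ```things``` are made up for this example, and the lists are repeated with the values shown above so it runs on its own:

```python
def print_non_strings(things):
    """Print every item in `things` that is not a string, and return those items as a list."""
    kept = []
    for thing in things:
        if not isinstance(thing, str):
            print(thing)
            kept.append(thing)
    return kept

group_of_things = [12, 3.0, 'some letters']
another_list = [12, 3.0, 9, group_of_things]   # same contents as list(dict_of_things.values())

# the same code now works on any list we hand it
first_result = print_non_strings(group_of_things)
second_result = print_non_strings(another_list)
```

notice that the nested list inside ```another_list``` is itself not a string, so it gets printed whole, strings and all.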