## Intro


This tutorial is not a step-by-step introduction to Python programming. If you need one, check out the Software Carpentry lessons and join one of their workshops: https://swcarpentry.github.io/python-novice-inflammation/. However:


* Python is meant to resemble a human language
* you can go a long way without knowing the details of Python syntax
* just load some data and start playing with it
* you can use Python to replace your Excel sheets or statistical package, or to create simple graphs to share with colleagues
* you can also use Python as a programmable calculator

## Using JupyterLab

* moving around
* editing mode
* executing cells
* getting help
* keyboard shortcuts:
  - Enter (enter edit mode)
  - Shift-Enter (run cell)
  - Esc (enter command mode)
  - M (markdown, in command mode)
  - X (remove cell, in command mode)
  - B (insert new cell below, in command mode)

## Basic Python

##### expressions:

  ```python
  a = 4
  b = a + 1
  print(f"{a} + 1 = {b}")
  ```

##### data structures

  ```python
  # list
  my_list = [1, 5, 6]
  print(my_list[0])

  # string
  my_string = "hello world"

  # tuple
  my_tuple = (4, 5)
  x, y = my_tuple

  # dictionary
  my_dict = {'a': 1, 'b': 3}
  print(my_dict['a'])
  ```

##### conditionals:

  ```python
  if a > 0:
      print("a is positive")
  ```

##### loops

  ```python
  my_list = [1, 2, 3, 4]
  # iterate over the items directly instead of indexing with range()
  for item in my_list:
      print(item)
  ```

##### functions

  ```python
  def my_function(a):
      return a + 1
  print(my_function(5))
  ```


### Quiz

Name the type of the following data structures:

  a) `var_a = {'k': 0, 'l': 5}`

  b) `var_b = "Paris"`
  
  c) `var_c = ('hello', 'world')`
  
  d) `var_d = [(1, 1), (2, 2), (3, 3)]`

What are the values of the following expressions:

  a) `var_a['k']`
  
  b) `var_b[1]`
  
  c) `var_d[2]`
  
  d) `var_a[1]`

## Importing and exploring data

* importing libraries
* pandas
* `read_csv`, `describe`, `head`

In [None]:
import pandas as pd

In [None]:
#url = 'https://raw.githubusercontent.com/btel/2022-09-21-eitn-school/main/eeg_powers.csv'
url = 'https://bit.ly/3BTE0A1'
df = pd.read_csv(url, index_col=0)

In [None]:
df.to_csv('eeg_data_temp.csv', index=True)
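
To try `read_csv`, `head` and `describe` without a network connection, here is a minimal sketch on a tiny inline table. The column names (`state`, `alpha`, `beta`) and values are made up for illustration and are not claimed to match the workshop dataset.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for the downloaded file;
# column names and values are assumptions, not the real data.
csv_text = """,state,alpha,beta
0,rest,4.0,2.0
1,rest,6.0,3.0
2,task,5.0,7.0
3,task,9.0,8.0
"""

# index_col=0 uses the first column as the row index, as in the cell above
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(demo.head(2))     # first two rows
print(demo.describe())  # count, mean, std, min, quartiles, max of numeric columns
```

`describe` summarizes only the numeric columns by default, so `state` is left out of the summary.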

Definitions of EEG bands:

* delta: 0.5 -- 4 Hz
* alpha: 8 -- 13 Hz
* beta: 13 -- 30 Hz
* gamma: > 30 Hz

For details, see my notebook with feature extraction: https://www.kaggle.com/btelenczuk/eeg-extract-features


## Working with categorical data

* `unique`, `nunique`, `value_counts`
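
A minimal sketch of the three methods listed above, on a hypothetical table whose `subject` and `state` columns are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical categorical columns (not the workshop dataset)
demo = pd.DataFrame({
    "subject": ["s1", "s1", "s2", "s2", "s3"],
    "state": ["rest", "task", "rest", "task", "rest"],
})

states = demo["state"].unique()          # distinct values, in order of appearance
n_subjects = demo["subject"].nunique()   # number of distinct values
counts = demo["state"].value_counts()    # rows per category, most frequent first

print(states)
print(n_subjects)
print(counts)
```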

## Plotting: distributions


* pandas: `hist`
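
As an illustration, `DataFrame.hist` draws one histogram per numeric column. The sketch below uses randomly generated, skewed values in hypothetical `alpha` and `beta` columns rather than the real EEG powers.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import numpy as np
import pandas as pd

# Synthetic, right-skewed values standing in for band powers (assumption)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "alpha": rng.lognormal(mean=0.0, sigma=1.0, size=200),
    "beta": rng.lognormal(mean=0.5, sigma=0.5, size=200),
})

# One subplot per numeric column; bins sets the number of bars
axes = demo.hist(bins=20)
```

In a notebook the figure appears below the cell; `axes` holds the matplotlib axes in case you want to adjust labels or titles.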

## Transforming data

* **Goal**: "normalize" the distribution of powers
* boolean indexing/masking/filtering
* `.apply`
* seaborn: `distplot` (deprecated since seaborn 0.11; `histplot` and `displot` are its replacements)
* interpreting results, building hypotheses

In [None]:
import seaborn as sns
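
A minimal sketch of boolean indexing followed by `.apply`. It assumes that "normalizing" here means a log transform, which is a common choice for skewed power values but is not stated in the outline; the `alpha` column and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical power values spanning several orders of magnitude
demo = pd.DataFrame({"alpha": [0.0, 1.0, 10.0, 100.0, 1000.0]})

# Boolean mask: True for rows where the condition holds
mask = demo["alpha"] > 0

# Keep only the rows selected by the mask (log of 0 is undefined)
positive = demo[mask]

# Apply a function to every value; log10 is an assumed choice of transform
log_alpha = positive["alpha"].apply(np.log10)

print(log_alpha)
```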

## Scatter plots

* **Goal**: identify dependencies between continuous variables (powers)
* `.plot.scatter` or `.plot(kind='scatter', ...)`
* refine hypotheses
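
The two spellings listed above are equivalent. A minimal sketch with hypothetical `alpha` and `beta` columns:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import pandas as pd

# Hypothetical powers in two bands (not the workshop dataset)
demo = pd.DataFrame({
    "alpha": [1.0, 2.0, 3.0, 4.0],
    "beta": [1.5, 2.5, 2.0, 4.5],
})

# Accessor form
ax = demo.plot.scatter(x="alpha", y="beta")

# Equivalent keyword form
ax_kw = demo.plot(kind="scatter", x="alpha", y="beta")
```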

## Compare groups

* **Goal**: 
  - 1) explore dependencies between categorical and continuous variables (CV)
  - 2) identify causes of the underlying variability in the continuous variables (stratification)
* `.groupby`
* transposition, `.T`
* `.plot.bar`
* `sns.boxplot`
* **Exercise**: powers vs electrodes
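
A minimal sketch of `.groupby` and `.T` on a hypothetical table with one categorical column (`state`) and two continuous ones (`alpha`, `beta`); the names and values are assumptions.

```python
import pandas as pd

# Hypothetical data: two states, two band powers
demo = pd.DataFrame({
    "state": ["rest", "rest", "task", "task"],
    "alpha": [4.0, 6.0, 1.0, 3.0],
    "beta": [2.0, 4.0, 5.0, 7.0],
})

# Mean power per state: one row per state, one column per band
means = demo.groupby("state")[["alpha", "beta"]].mean()

# Transpose: one row per band, one column per state
means_t = means.T

print(means)
print(means_t)
```

Calling `means.plot.bar()` groups the bars by state, while `means_t.plot.bar()` groups them by band. `sns.boxplot(data=demo, x="state", y="alpha")` shows the spread within each group rather than only the mean.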

## Tidy data (advanced)

* **Goal**: Combine multiple data dimensions in one graph (2 categorical variables and 1 continuous variable)
* tidy data
* pandas `df.melt`
* `sns.boxplot(data= , x=, y=, hue=)`

To plot different frequency bands and states on the same graph, we need to reformat the data into the [tidy (long) format](https://seaborn.pydata.org/tutorial/data_structure.html#long-form-vs-wide-form-data). For example, we will use it to draw boxplots with multiple box hues: https://seaborn.pydata.org/generated/seaborn.boxplot.html
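
A minimal sketch of the wide-to-long conversion with `melt`. The wide table and the new column names `band` and `power` are hypothetical choices for illustration.

```python
import pandas as pd

# Wide format: one column per frequency band (hypothetical values)
wide = pd.DataFrame({
    "state": ["rest", "task"],
    "alpha": [5.0, 2.0],
    "beta": [3.0, 6.0],
})

# Long (tidy) format: one row per (state, band) observation
tidy = wide.melt(
    id_vars="state",               # columns to keep as identifiers
    value_vars=["alpha", "beta"],  # columns to stack
    var_name="band",               # new column holding the old column names
    value_name="power",            # new column holding the values
)

print(tidy)
```

With the data in this shape, a call such as `sns.boxplot(data=tidy, x="band", y="power", hue="state")` can map each of the three columns to one visual dimension.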

## Splitting data

* **Goal**: split data into distinct groups based on column values (stratification)
* boolean indexing
* `.isin`
* `sns.pairplot`
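
A minimal sketch of splitting rows with `.isin` and a boolean mask. The `electrode` labels and the grouping are hypothetical and only illustrate the mechanics.

```python
import pandas as pd

# Hypothetical recordings from four electrodes
demo = pd.DataFrame({
    "electrode": ["Fz", "Cz", "Pz", "Oz"],
    "alpha": [1.0, 2.0, 3.0, 4.0],
})

# True where the electrode is one of the listed values
selected = demo["electrode"].isin(["Fz", "Cz"])

group_a = demo[selected]    # rows matching the list
group_b = demo[~selected]   # ~ inverts the mask: all remaining rows

print(group_a)
print(group_b)
```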

## Visualising correlations

* **Goal**: Analyse correlations in stratified data
* `sns.pairplot`
* `df.corr`
* `sns.heatmap`
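
A minimal sketch of `df.corr` on made-up columns that are perfectly correlated or anti-correlated, so the expected coefficients are easy to check by eye. Real powers will of course give values between these extremes.

```python
import pandas as pd

# Hypothetical band powers: beta rises with alpha, gamma falls with alpha
demo = pd.DataFrame({
    "alpha": [1.0, 2.0, 3.0, 4.0],
    "beta": [2.0, 4.0, 6.0, 8.0],
    "gamma": [4.0, 3.0, 2.0, 1.0],
})

# Pairwise Pearson correlation between all numeric columns
corr = demo.corr()

print(corr)
```

The resulting square table can be passed directly to `sns.heatmap(corr, annot=True)` to display the coefficients as a colour-coded matrix.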

## Statistics (advanced)

* **Goal**: Look for differences in powers in different states
* statsmodels
* `statsmodels.formula.api.ols`
* `statsmodels.api.stats.anova_lm`

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

## Clustering (advanced)

* **Goal**: unsupervised learning
* sklearn
* `sklearn.cluster.KMeans`
* `.fit`, `labels_`
* plotting multiple graphs in the same axes

In [None]:
from sklearn import cluster
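
A minimal sketch of `KMeans` with `.fit` and `labels_` on six made-up 2D points forming two well-separated groups. The number of clusters and the feature values are assumptions chosen so that the result is easy to verify.

```python
import numpy as np
from sklearn import cluster

# Hypothetical features: two tight groups of three points each
X = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
    [5.0, 5.1], [4.9, 5.0], [5.1, 4.9],
])

# n_clusters is chosen by the user; random_state makes the run repeatable
kmeans = cluster.KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

# One integer label per row of X; the numbering of clusters is arbitrary
labels = kmeans.labels_

print(labels)
```

To draw the clusters in the same axes, one option is to loop over the distinct labels and call `ax.scatter(X[labels == k, 0], X[labels == k, 1])` on a single matplotlib `ax` for each `k`.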

## Groupby revisited

* **Goal**: plot the number of channel recordings belonging to each cluster (pooled across subjects and states)
* `.value_counts`
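
A minimal sketch of counting recordings per cluster. It assumes the cluster labels were stored in a hypothetical `cluster` column; the values are made up.

```python
import pandas as pd

# Hypothetical table with a cluster label per channel recording
demo = pd.DataFrame({
    "state": ["rest", "rest", "rest", "task", "task", "task"],
    "cluster": [0, 1, 0, 0, 1, 0],
})

# Recordings per cluster, ordered by cluster number
per_cluster = demo["cluster"].value_counts().sort_index()

# The same counts obtained with groupby
sizes = demo.groupby("cluster").size()

print(per_cluster)
```

`per_cluster.plot.bar()` turns these counts into a bar chart with one bar per cluster.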