# Overview

We have covered a number of key libraries:

* Using Pandas to organize data, extract subsets, calculate group-level statistics and make plots.
* Numpy for processing multidimensional data (primarily images and electrophysiology).
* Scipy for curve fitting.

A few special-purpose ones:

* PyABF (for reading Axon Binary Format files)
* czifile for reading in confocal data
* PIL for creating and manipulating images
* scanpy for analyzing single cell gene expression
* scikit-learn (sklearn) for machine learning
* statsmodels for linear regression
* nibabel for working with MRI data

And plotting libraries:

* Matplotlib
* Seaborn
* Bokeh

And concepts including:

* Creating functions
* List comprehensions
* Tuple unpacking
* For loops
* Variable assignment
* Copies vs views
* Inspecting objects and learning how to manipulate them
* Looking up documentation

# Numpy vs Pandas. How to choose?

Pandas provides high-level data manipulation tools built on Numpy. As a rule of thumb, if you have data that resembles what you'd store in an Excel spreadsheet (particularly if it's a mix of different types such as numbers and text), you almost certainly want Pandas. If you have homogeneous numeric data, then your choice will depend on what you want to do with the data. If you need to do linear algebra operations (e.g., dot products, affine transforms, etc.) or signal processing (e.g., FFT, resampling, etc.), Numpy is a better choice.

You can easily convert your data back and forth between Pandas and Numpy depending on what you need to do with the data (see Stephen and Charlie's lecture in which they packaged their Numpy arrays into dataframes for use with `statsmodels`). You are not stuck with your initial choice.
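
As a minimal sketch of that round trip (the array contents and column names here are made up for illustration), wrapping an array in a dataframe and pulling the values back out might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 trials x 3 measurements (values and names are invented).
values = np.arange(12, dtype=float).reshape(4, 3)

# Numpy -> Pandas: wrap the array and label the columns.
df = pd.DataFrame(values, columns=["baseline", "stimulus", "recovery"])

# Pandas -> Numpy: pull the underlying values back out.
round_trip = df.to_numpy()

print(df)
print(round_trip.shape)
```

The labels live only in the dataframe. Once you call `to_numpy()`, you are back to positions, so keep track of which column is which.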

## Example
You have a 3D dataset (e.g., 2D images acquired over time). Numpy is typically a good choice for this kind of data (as shown by last week's class which was a series of 3D images acquired over time). In theory, we could coerce 3D and 4D data into a dataframe. Let's take a hypothetical example where we have acquired a series of 2D images over time. This can be represented by a 3D array where each color indicates a 2D slice acquired at a particular timepoint:

![timeseries data as array](timeseries_array.png)

This could also be represented by a dataframe where columns are the X-values and we use an index with hierarchical labels to denote the timepoint and Y values (note that we took advantage of Pandas' timeseries functionality when creating the timepoint labels).

![timeseries data as dataframe](timeseries_dataframe.png)
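
Here is one way the dataframe in the figure could be built. This is a sketch, not the code used to make the figure: the image size, the start time and the one-second interval are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stack: 3 timepoints, each a 2 (Y) x 4 (X) image.
n_time, n_y, n_x = 3, 2, 4
stack = np.arange(n_time * n_y * n_x).reshape(n_time, n_y, n_x)

# Timepoint labels built with Pandas' time series tools.
# The start time and 1 second interval are made up.
timepoints = pd.date_range(
    "2019-01-01 12:00:00", periods=n_time, freq=pd.Timedelta(seconds=1)
)

# Hierarchical row labels: every (timepoint, y) combination.
index = pd.MultiIndex.from_product([timepoints, range(n_y)], names=["time", "y"])

# Flatten time and Y into rows; X stays as the columns.
df = pd.DataFrame(
    stack.reshape(n_time * n_y, n_x),
    index=index,
    columns=pd.RangeIndex(n_x, name="x"),
)

# Label-based selection: the 2D image acquired at the second timepoint.
frame = df.loc[timepoints[1]]

# Going back to a 3D array is a reshape away.
restored = df.to_numpy().reshape(n_time, n_y, n_x)

print(df)
```

Selecting by timestamp with `df.loc` is the kind of label-based access that the plain array cannot offer. With the array you would need to know that the second timepoint is at position 1.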

Your choice of which to use will depend on the analysis you need to perform with the data. For example, do you need to take advantage of the functionality offered by Pandas? This functionality includes:

* Rich time series handling (e.g., dates and times)
* Label-based data extraction and alignment
* Missing data statistics
* Grouping, merging and joining

Or, do you need to do basic numerical calculations on Numpy arrays such as:

* Linear algebra
* Signal processing
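
To make the contrast concrete, here is a small sketch with one task from each list. All of the data are invented, and the column names and sampling rate are assumptions.

```python
import numpy as np
import pandas as pd

# Pandas side: labeled data with a missing value.
df = pd.DataFrame({
    "genotype": ["wt", "wt", "ko", "ko"],
    "threshold": [20.0, np.nan, 35.0, 45.0],
})
# mean() skips NaN by default, so the wt mean comes from a single value.
group_means = df.groupby("genotype")["threshold"].mean()
print(group_means)

# Numpy side: find the dominant frequency of a synthetic 5 Hz sine wave.
fs = 100                      # sampling rate in Hz (assumed)
t = np.arange(fs) / fs        # one second of samples
signal = np.sin(2 * np.pi * 5 * t)
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
peak_freq = freqs[np.argmax(spectrum)]
print(peak_freq)
```

The grouping step relies on labels (`genotype`), which is where Pandas shines. The FFT step only cares about evenly spaced numbers, which is exactly what a Numpy array is.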

# Running code

In this class, we've only used Jupyter notebooks for writing and running Python code. They're not a bad thing to use. I actually use Jupyter notebooks quite a bit in my own data analysis. However, you should know how to run code outside of a Jupyter notebook. If you look at the notebook files, they end in the `.ipynb` extension. Notebooks are a special format that mix Python code and Markdown text. In contrast, a Python script or module is a file that ends in `.py`. Let's test this out.
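
If you're curious what "special format" means, a notebook file is JSON text holding a list of cells. The sketch below builds a tiny hand-written notebook (the cell contents are made up, and a real notebook stores more metadata than this) and extracts just the code cells, which is roughly what a `.py` file would contain.

```python
import json

# A minimal, hand-written notebook in the version 4 format.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 2 + 2\n", "print(x)"],
        },
    ],
}

# This is the text that would be stored in the .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)

# Keep only the code cells to get what a script would contain.
loaded = json.loads(ipynb_text)
script = "\n".join(
    "".join(cell["source"])
    for cell in loaded["cells"]
    if cell["cell_type"] == "code"
)
print(script)
```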

## Exercise

Let's run a Python script from the command-line using the following steps. To obtain a script, download an [example from Seaborn's gallery](https://seaborn.pydata.org/examples/index.html). Remember where you saved the file to.

Next, open the file in a text editor (any text editor is fine). At the very end of the file, add the following two lines. These lines are required when running a Python file as a script to view the plots.

    import matplotlib.pyplot as plt
    plt.show()
    
Finally, open up an Anaconda prompt and run the following sequence of commands. Be sure to substitute the appropriate values for the `<placeholders>` below.

    conda activate NEUS642
    cd <path to folder containing file>
    python <script_name.py>
    
For example, if I were using a Windows laptop and downloaded the `color_palettes.py` script to the Downloads folder, I would enter the following sequence of commands (my username is `bburan`):
    
    conda activate NEUS642
    cd c:/users/bburan/Downloads
    python color_palettes.py
    
If you are successful, you will see a figure pop up in its own window. Close the window to exit the script.

## Exercise

Now, let's create our own really simple script and run it from the command line. Using a text editor of your choice, create a new file named `my_script.py` and enter the following code:

    color = input('What is your favorite Matplotlib color? ')
    print(f'My favorite color is {color}, too!')
    
Now, run the code by entering the following commands in the terminal:

    conda activate NEUS642
    cd <path to folder containing file>
    python my_script.py
    
## Exercise

Let's expand on the previous exercise and make the code a little more complex. Let's make sure that it's a valid Matplotlib color. The function `matplotlib.colors.is_color_like` takes a color name and returns True if it's valid, False otherwise.

    from matplotlib.colors import is_color_like

    while True:
        color = input('What is your favorite Matplotlib color? ')
        if is_color_like(color):
            break
        else:
            print(f'{color} is not a valid Matplotlib color.')

    print(f'My favorite color is {color}, too!')

No cheating by cutting and pasting! Type out the whole thing (including indentation).

# Using a code editor

We haven't discussed this during the class, but Python can be picky about whitespace and indentation. If you used a text editor in the previous exercise that's not Python-aware, you may have run into a few issues. For example, Python does not let you mix the use of tabs and spaces for indentation.
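
You can see the complaint for yourself without touching an editor. In this sketch, the code to be checked is stored in a string so that the tab character is visible as `\t`. The second line is indented with four spaces and the third with a tab.

```python
# Line 2 is indented with four spaces, line 3 with a tab character.
source = "if True:\n    x = 1\n\ty = 2\n"

error_name = None
try:
    # compile() parses the code without running it.
    compile(source, "<mixed-indentation demo>", "exec")
except TabError as error:
    error_name = type(error).__name__
    print(f"{error_name}: {error.msg}")
```

Both lines look identical in many editors, which is why this error can be so confusing the first time you hit it.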

When you were writing code in the Jupyter notebook, Jupyter quietly handled the indentation for you. Recall how you could type `if x > 5:` and hit enter? The cursor would automatically be indented by four spaces on the next line. If you used the tab key instead of the space bar to indent your code, Jupyter quietly converted your tab to four spaces. Your code is also colored using syntax highlighting. Jupyter is an example of a Python-aware text editor.

There are many good code editors out there. A very popular, and free, option is [Spyder](https://www.spyder-ide.org/).

How do we install it?

# Python environments

First, let's take a step back and talk about Python environments. Recall how we set up an environment named `NEUS642` for the class? The reason I asked you to do this is because some of you were already using Python and I wanted to make sure that everyone was working with the *same* version of all the libraries used in the class. Some of you encountered this problem early in the course (e.g., you were using an older version of `seaborn` that didn't have some of the newer functions used in the exercises).

It's not uncommon for packages to implement changes that can break your code. For example, version 1.0.0 of `scipy` removed some statistical functions such as `f_value`. If you had written code using this function, you would not be able to upgrade to the latest version of `scipy` without rewriting that code. What if you wanted to write a new script that uses `scipy.signal.find_peaks`, which was added in version 1.1.0? You can't have two versions of `scipy` installed on your computer ... or, can you?
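
To make the version problem concrete, here is a small sketch that decides whether a given `scipy` version would have `find_peaks`. The helper names (`parse_version`, `has_find_peaks`) are made up for this example, and the parser only handles plain `major.minor.patch` strings.

```python
def parse_version(text):
    """Turn a simple 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Version in which scipy.signal.find_peaks was added (as stated above).
FIND_PEAKS_ADDED = (1, 1, 0)


def has_find_peaks(scipy_version):
    """True if this scipy version is new enough to include find_peaks."""
    return parse_version(scipy_version) >= FIND_PEAKS_ADDED


for version in ["0.19.1", "1.0.0", "1.1.0", "1.2.0"]:
    print(version, has_find_peaks(version))
```

Comparing tuples of integers rather than the raw strings matters: as text, `"1.10.0"` sorts before `"1.9.0"`, which is the wrong answer. Real version strings can also carry suffixes such as `rc1`. The `packaging` library's `Version` class handles those, so treat this parser as a teaching aid only.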

This is where Python environments come in handy. When you installed [Anaconda Python](https://www.anaconda.com/download), it automatically created a base (i.e., default) environment for you. It contains a certain version of Python (e.g., 2.7 or 3.6 depending on which installer you chose) and some basic packages (e.g., `numpy`, `scipy`, etc.). In addition to this base environment, you can create as many additional environments as needed. Each environment can contain a different version of Python and other packages as illustrated by the following cartoon (sourced from [freeCodeCamp](https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c)).

![cartoon of environments](environments.jpeg)

We already created an environment called `NEUS642` for the course. When you first open up an Anaconda prompt, it uses the base environment by default. If you want to use a different environment, you need to specifically activate it:

    conda activate NEUS642
    
To switch back to the base environment:

    conda activate base
    
If you're using an older version of Anaconda on OSX or Linux, you may need to type the following instead:

    source activate NEUS642
    source activate base
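
If you're ever unsure which environment your code is actually running in, you can ask Python itself. This is a small sketch; the fallback text is my own wording.

```python
import os
import sys

# Path to the Python interpreter that is running this code.
print("Interpreter:", sys.executable)

# Root folder of the environment that interpreter belongs to.
print("Environment folder:", sys.prefix)

# conda sets this variable when an environment is activated.
# It is absent if no conda environment is active.
env_name = os.environ.get("CONDA_DEFAULT_ENV", "<no conda environment active>")
print("Active conda environment:", env_name)
```

Run this before and after `conda activate NEUS642` and compare the paths.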
    
The syntax for creating a new environment using the Anaconda prompt is:

    conda create --name my_environment_name python=3.7 numpy scipy=1.2.0
    
Here, we have: 

* Specified the name, `my_environment_name` (which we provide to `conda activate` whenever we wish to switch to that environment).
* Requested a specific version of Python and scipy.
* Requested the latest version of numpy which is compatible with Python 3.7 and scipy 1.2.0.

`conda` will then work to find the packages and download them. By default, packages are downloaded from the default channel on [anaconda.org](https://anaconda.org/anaconda). These packages are maintained by employees at Anaconda (the company responsible for creating the Anaconda Python Distribution). If the packages are not available via the default channel, they may be available via channels maintained by community members. Popular channels include [bioconda](https://anaconda.org/bioconda/), which maintains a series of bioinformatics packages, and [conda-forge](https://anaconda.org/conda-forge), which sometimes contains newer versions of packages than the default channel. If you need to check whether a package is available, you can search [anaconda.org](https://anaconda.org).

Once you've created your new environment, you can install additional packages:

    conda install -n my_environment_name package_name

If you find a package in another channel that you'd like to install, you can specify the channel using the `-c` flag:

    conda install -n my_environment_name -c channel_name package_name
    
Note that this is different from what I asked you to do during class. In class, I asked you to do the following:

    conda activate NEUS642
    conda install package_name
    
Either approach is fine!

What if you can't find the package on [anaconda.org](https://anaconda.org)? If it's available via the [Python package index](https://pypi.org/), then you can install it via `pip` (note that you need to activate the environment first):

    conda activate my_environment_name
    pip install package_name
    
A key difference between `conda` and `pip` packages is that `conda` packages are always compiled for the platform (i.e., Windows, Linux, OSX). This means that they are ready to go out of the box. In contrast, some packages installed via `pip` may contain Fortran, C or C++ code that has to be compiled on your computer. In this event, you will need to install a Fortran, C or C++ compiler before `pip` can install the package. On Linux computers this is generally a very simple process. On Windows (and, possibly, OSX) it can be a pain. Dealing with this is outside the scope of today's exercise. Fortunately, the majority of mainstream scientific Python packages are easily installable via `conda` or `pip`.
 
## Exercise
Create an environment named `NEUS642_final` containing Python 3.6, `numpy` (any version), `scipy` (any version), `palettable` and `NEUS642-fake-package`.

Once you've set this up, create a Python script containing the following:

    from NEUS642_fake_package import print_report
    print_report()
    
Run the script using your new environment. If you've set it up properly, you will see the following output:

    Correct version of Python is installed. You are running 3.6.
    You successfully installed numpy
    You successfully installed scipy
    You successfully installed palettable
    Congratulations, you successfully set up your environment.
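
If you're wondering how a script can check an environment like this, here is a rough sketch of the idea. It is a hypothetical stand-in that I wrote for illustration, not the actual code inside `NEUS642_fake_package`, and the failure messages are invented.

```python
import importlib.util
import sys


def check_environment(required_python, packages):
    """Return report lines describing the Python version and which packages are found."""
    lines = []
    running = f"{sys.version_info.major}.{sys.version_info.minor}"
    if running == required_python:
        lines.append(f"Correct version of Python is installed. You are running {running}.")
    else:
        lines.append(f"Expected Python {required_python} but you are running {running}.")
    for name in packages:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            lines.append(f"You successfully installed {name}")
        else:
            lines.append(f"Could not find {name}")
    return lines


report = check_environment("3.6", ["numpy", "scipy", "palettable"])
print("\n".join(report))
```

What you see depends entirely on the environment you run it in, which is the point of the exercise.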


    # Answer
    # > conda create -n NEUS642_final python=3.6 numpy scipy
    # > conda install -n NEUS642_final -c conda-forge palettable
    # > conda activate NEUS642_final
    # (NEUS642_final) > pip install NEUS642-fake-package

# Installing code editors   

Now that we've discussed Python environments, let's install a code editor. There are many code editors. Some are environment-aware. Some aren't. An environment-aware code editor allows you to switch between the various Python environments when running code.

## Spyder

Right now, Spyder isn't environment-aware. This means that you need to install Spyder into the environment you want to use it with:

    conda install -n my_environment_name spyder
    
To run Spyder:

    conda activate my_environment_name
    spyder
    
If you wish to use Spyder with a different environment, you'll need to install it into that environment.

### Demo of Spyder

Charlie will walk you through some of the features of Spyder.

![spyder screenshot](spyder.png)

## Jupyter

Jupyter Notebook and Jupyter Lab (the next version of Jupyter Notebook) are environment-aware. This means you can install Jupyter into your base environment and then register your new environments with it. For example, to install Jupyter Notebook:

    conda install -n base jupyter
    
To install Jupyter Lab (which is a browser-based code editor):

    conda install -n base -c conda-forge jupyterlab
    
To register your environments so that you can select which one is used for your notebook or script in Jupyter (remember how you could select the kernel in exajupyter back when it worked?), run the following commands (`--display-name "My cool new Python env"` is optional):
    
    conda install -n my_environment_name ipykernel
    conda activate my_environment_name
    python -m ipykernel install --user --display-name "My cool new Python env"
    
If you're wondering about Jupyter, I had you install Jupyter into your `NEUS642` environment at the beginning of the semester. In retrospect, I should have asked you to use the approach above (where you install Jupyter into your base environment and then register NEUS642 with Jupyter).

# Code reuse and sharing

We've discussed the difference between Jupyter notebooks and Python scripts. However, we haven't yet discussed a key topic: what to do when you've written a set of functions that you'd like to package up and share with others. If you look around online, you'll see references to things like `sys.path` and `PYTHONPATH`. **Ignore those.** Although they will work, you are just setting yourself up for a painful transition in a few months as your coding skills develop and your needs expand beyond these.

Let's start with a very basic example of how to re-use functions defined in one Python file in another Python file. Using Spyder, create a folder called `code_reuse` containing the following files and folders:

    code_reuse/
        special/
            functions.py
        data_generator.py
        analysis_script.py

For `code_reuse/special/functions.py`, add the following code:

    def print_value(x):
        print(f'The value is {x}')
        
For `code_reuse/data_generator.py`, add the following code:

    import random
    
    def make_integer():
        return random.randint(0, 10)

For `code_reuse/analysis_script.py`, add the following code:

    from special import functions
    from data_generator import make_integer
    x = make_integer()
    functions.print_value(x)
    
Now, run `analysis_script.py` in Spyder. Note that you can also use the interactive terminal to import your code. Try it out and play with different import options:

    import data_generator
    data_generator.make_integer()
    
    from data_generator import make_integer
    make_integer()
    
    from data_generator import make_integer as mi
    mi()
    
This works because Python's import machinery looks in several places to see if it can find the package you've requested. If you do `import my_module`:

* It first checks the directory containing the script for a `my_module.py` file or a `my_module` folder.
* If that fails, then it checks Python's standard library and all the packages that you've installed using `conda` or `pip` to see if there's a match.
* If that fails, you get an `ImportError`.
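
You can ask the import machinery what it would find without actually importing anything. In this sketch, `describe_import` is a helper I made up, and `my_module` is a placeholder name that will only be found if you happen to have a file or folder with that name.

```python
import importlib
import importlib.util


def describe_import(name):
    """Report where a top-level module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found"
    return f"{name}: found at {spec.origin}"


print(describe_import("json"))       # part of Python's standard library
print(describe_import("my_module"))  # placeholder name

# Importing something that cannot be found raises an ImportError
# (specifically its subclass, ModuleNotFoundError).
caught = False
try:
    importlib.import_module("a_module_that_does_not_exist_12345")
except ImportError as error:
    caught = True
    print(f"{type(error).__name__}: {error}")
```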

This is how you write functions that you can share among several scripts. However, what if you want to be able to make the functions available without having to ensure the script is in the same folder as the files defining the functions? We are starting to get into Python developer territory.
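
To convince yourself that the folder layout is what makes this work (and not something special about Spyder), the sketch below recreates the `code_reuse` files in a temporary folder and runs `analysis_script.py` the same way you would from the command line. The use of a temporary folder is my own choice for the demonstration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The three files from the example above, keyed by their path inside code_reuse/.
files = {
    "special/functions.py": "def print_value(x):\n    print(f'The value is {x}')\n",
    "data_generator.py": (
        "import random\n\n"
        "def make_integer():\n"
        "    return random.randint(0, 10)\n"
    ),
    "analysis_script.py": (
        "from special import functions\n"
        "from data_generator import make_integer\n"
        "x = make_integer()\n"
        "functions.print_value(x)\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "code_reuse"
    for relative_path, content in files.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)

    # Same as typing `python analysis_script.py` inside the code_reuse folder.
    result = subprocess.run(
        [sys.executable, "analysis_script.py"],
        cwd=root,
        capture_output=True,
        text=True,
        check=True,
    )

output = result.stdout.strip()
print(output)
```

If you copied `analysis_script.py` somewhere else and ran it from there, the two imports at the top would fail, because `special` and `data_generator` would no longer be sitting next to the script.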

# Creating a Python package

# Sharing assets

## Code

## Other assets such as notebooks

## Licensing

# GUIs

# Creating EXEs

# Working with large data files