# Overview

We have covered a number of key libraries:

* Using Pandas to organize data, extract subsets, calculate group-level statistics and make plots.
* Numpy for processing multidimensional data (primarily images and electrophysiology).
* Scipy for curve fitting.

A few special-purpose ones:

* PyABF (for reading Axon Binary Format files)
* czifile for reading in confocal data
* PIL for creating and manipulating images
* scanpy for analyzing single cell gene expression
* scikit-learn (sklearn) for machine learning
* statsmodels for linear regression
* nibabel for working with MRI data

And plotting libraries:

* Matplotlib
* Seaborn
* Bokeh

And concepts including:

* Creating functions
* List comprehensions
* Tuple unpacking
* For loops
* Variable assignment
* Copies vs views
* Inspecting objects and learning how to manipulate them
* Looking up documentation

# Numpy vs Pandas. How to choose?

Pandas provides high-level data manipulation tools built on Numpy. As a rule of thumb, if you have data that resembles what you'd store in an Excel spreadsheet (particularly if it's a mix of different types such as numbers and text), you almost certainly want Pandas. If you have homogeneous numeric data, then your choice will depend on what you want to do with the data. If you need to do linear algebra operations (e.g., dot products, affine transforms, etc.) or signal processing (e.g., FFT, resampling, etc.), Numpy is a better choice.

You can easily convert your data back and forth between Pandas and Numpy depending on what you need to do with the data (see Stephen and Charlie's lecture in which they packaged their Numpy arrays into dataframes for use with `statsmodels`). You are not stuck with your initial choice.
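
As a minimal sketch of that round trip (the column names and values are made up for illustration), `DataFrame.to_numpy()` hands you a plain array and the `pd.DataFrame` constructor wraps an array back up with labels:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three measurements of two variables.
array = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

# Numpy -> Pandas: attach column labels to the array.
df = pd.DataFrame(array, columns=['dose', 'response'])

# Pandas -> Numpy: drop the labels and get the raw values back.
round_trip = df.to_numpy()

print(df)
print(np.array_equal(array, round_trip))
```

The labels are the only thing you lose going from Pandas to Numpy, so keep the column names around if you plan to convert back later.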

## Example

You have a 3D dataset (e.g., 2D images acquired over time). Numpy is typically a good choice for this kind of data (as shown by last week's class which was a series of 3D images acquired over time). In theory, we could coerce 3D and 4D data into a dataframe. Let's take a hypothetical example where we have acquired a series of 2D images over time. This can be represented by a 3D array where each color indicates a 2D slice acquired at a particular timepoint:

![timeseries data as array](timeseries_array.png)

This could also be represented by a dataframe where columns are the X-values and we use an index with hierarchical labels to denote the timepoint and Y values (note that we took advantage of Pandas' timeseries functionality when creating the timepoint labels).

![timeseries data as dataframe](timeseries_dataframe.png)
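
Here is one way such a dataframe could be built. This is only a sketch: the array size, the start date and the level names (`timepoint`, `y`, `x`) are assumptions chosen for illustration, not the values used to make the figure.

```python
import numpy as np
import pandas as pd

# Hypothetical movie: 3 timepoints, each a 2 (Y) x 4 (X) image.
n_time, n_y, n_x = 3, 2, 4
movie = np.arange(n_time * n_y * n_x).reshape(n_time, n_y, n_x)

# One row per (timepoint, Y) pair; one column per X value.
timepoints = pd.date_range('2019-01-01', periods=n_time, freq='D')
index = pd.MultiIndex.from_product([timepoints, range(n_y)],
                                   names=['timepoint', 'y'])
df = pd.DataFrame(movie.reshape(n_time * n_y, n_x),
                  index=index,
                  columns=pd.Index(range(n_x), name='x'))

# Pull out the 2D image for the second timepoint using its label.
frame = df.loc[timepoints[1]]
print(frame)
```

With the array you would write `movie[1]` to get the same image. The dataframe version is wordier to set up, but lets you select by date rather than by position.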

Your choice of which to use will depend on the analysis you need to perform with the data. Do you need to take advantage of the functionality offered by Pandas? This functionality includes:

* Rich time series handling (e.g., dates and times)
* Label-based data extraction and alignment
* Missing data statistics
* Grouping, merging and joining

In contrast, Numpy provides: 

* Robust handling of multidimensional data
* Optimized linear algebra algorithms
* Signal processing tools 
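
To make one of these differences concrete, here is a small sketch (with made-up numbers) of how each library treats a missing value when computing a mean:

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with one missing value.
values = [1.0, np.nan, 3.0]

# Pandas skips missing values by default when computing statistics.
pandas_mean = pd.Series(values).mean()

# Numpy propagates the NaN unless you ask for the NaN-aware version.
numpy_mean = np.mean(values)
numpy_nanmean = np.nanmean(values)

print(pandas_mean, numpy_mean, numpy_nanmean)
```

Neither behavior is wrong. Pandas assumes gaps are a normal part of tabular data, while Numpy makes you say explicitly that you want them ignored.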

# Running code

In this class, we've only used Jupyter notebooks for writing and running Python code. They're not a bad thing to use. I actually use Jupyter notebooks quite a bit in my own data analysis. However, you should know how to run code outside of a Jupyter notebook. If you look at the notebook files, they end in the `.ipynb` extension. Notebooks are a special format that mix Python code and Markdown text. In contrast, a Python script or module is a file that ends in `.py`. Let's test this out.
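
First, a rough illustration of the difference. An `.ipynb` file is JSON text that stores a list of cells, each tagged as code or Markdown. The sketch below builds a tiny hand-written stand-in for a notebook (it is not a complete, valid notebook file) and pulls out just the code cells, which is roughly what you would save in a `.py` file.

```python
import json

# A hand-written stand-in for the contents of an .ipynb file. Real notebooks
# carry more metadata, but the "cells" list has this general shape.
notebook_text = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# My analysis\n"]},
        {"cell_type": "code", "source": ["x = 2\n", "print(x * 3)\n"]},
        {"cell_type": "markdown", "source": ["Some notes.\n"]},
        {"cell_type": "code", "source": ["print('done')\n"]},
    ]
})

notebook = json.loads(notebook_text)

# Keep only the code cells and join their lines into one script.
code_cells = [
    "".join(cell["source"])
    for cell in notebook["cells"]
    if cell["cell_type"] == "code"
]
script = "\n".join(code_cells)
print(script)
```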

## Exercise

Let's run a Python script from the command-line using the following steps. To obtain a script, download an [example from Seaborn's gallery](https://seaborn.pydata.org/examples/index.html). Remember where you saved the file to.

Next, open the file in a text editor (any text editor is fine such as Notepad on Windows and TextEdit on OSX). At the very end of the file, add the following two lines. These lines are required when running a Python file as a script to view the plots.

    import matplotlib.pyplot as plt
    plt.show()
    
Finally, open up a terminal or command prompt (on Windows, you may have to open the Anaconda Prompt from the start menu instead of the regular command prompt). Be sure to substitute the appropriate values for the `<placeholders>` below: 

    source activate NEUS642
    cd <path to folder containing file>
    python <script_name.py>
    
For example, if I were using a Windows laptop and had downloaded the `color_palettes.py` script to the Downloads folder, I would enter the following sequence of commands (my username is `bburan`; note that on Windows you type `conda activate` rather than `source activate`):
    
    conda activate NEUS642
    cd c:/users/bburan/Downloads
    python color_palettes.py
    
On OSX, the folder is most likely located at:

    /Users/bburan/Downloads
    
On Linux, it would be:

    /home/bburan/Downloads

If you are successful, you will see a figure pop up in its own window. Close the window to exit the script.

## Exercise

Now, let's create our own really simple script and run it from the command line. Using a text editor of your choice, create a new file named `my_script.py` and enter the following code:

    color = input('What is your favorite Matplotlib color? ')
    print(f'My favorite color is {color}, too!')
    
Now, run the code by entering the following commands in the terminal:

    source activate NEUS642
    cd <path to folder containing file>
    python my_script.py
    
## Exercise

Let's expand on the previous exercise and make the code a little more complex. Let's make sure that it's a valid Matplotlib color. The function `matplotlib.colors.is_color_like` takes a color name and returns True if it's valid, False otherwise. Create a new script, `my_script_verify_color.py` with the following:

    import matplotlib as mp

    while True:
        color = input('What is your favorite Matplotlib color? ')
        if mp.colors.is_color_like(color):
            break
        else:
            print(f'{color} is not a valid Matplotlib color.')

    print(f'My favorite color is {color}, too!')

**No cheating by cutting and pasting! Type out the whole thing (including indentation).**

As an aside, in case you were wondering about the code: it uses a `while` loop. We haven't discussed this type of loop in class (it's less commonly used than `for` loops). The syntax for a while loop is:

    while condition:
        do something
        
Condition is a Python expression that evaluates to True or False. For example, you can have the following code block that increments `x` by 1 while it's less than 5:

    x = 0
    while x < 5:
        x = x + 1
        
In the example I asked you to write, the condition is `True`. That is:

    while True:
        ...
        
That means the loop will *never* exit! Seems pretty strange, doesn't it? It's actually a common approach in Python. You'll often see `while` loops that look like:

    while True:
        ...
        if condition:
            break
            
Here, `break` is a special statement in Python that indicates it's time to exit the `while` loop (you can also use it in `for` loops to terminate the loop early). Looking back at our example above, we are telling the program to loop continuously *until* the user provides a valid Matplotlib color name. Once the user provides a valid Matplotlib color name, the `break` statement is executed and the `while` loop exits. You *could* have written the code as:

    color = input('What is your favorite Matplotlib color? ')
    while not mp.colors.is_color_like(color):
        print(f'{color} is not a valid Matplotlib color.')
        color = input('What is your favorite Matplotlib color? ')

    print(f'My favorite color is {color}, too!')
    
However, note that you had to write the same line twice (once before the loop to initialize the `color` variable and once in the loop to get a new color if the user provided a bad one). Python programmers generally favor the `while True` approach because it removes this redundancy.
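
If you find yourself writing this loop often, you could wrap it in a function. The sketch below is one possible way to do that. The name `ask_until_valid` and its parameters are made up for illustration, and a small set of color names stands in for the Matplotlib check so the example runs without waiting for you to type anything.

```python
def ask_until_valid(prompt, is_valid, ask=input):
    """Keep asking until `is_valid` accepts the answer, then return it."""
    while True:
        answer = ask(prompt)
        if is_valid(answer):
            break
        print(f'{answer} is not valid.')
    return answer


# Stand-in for a user who types two bad answers and then a good one.
canned_answers = iter(['', 'blurple', 'red'])
color = ask_until_valid(
    'What is your favorite color? ',
    is_valid=lambda text: text in {'red', 'green', 'blue'},
    ask=lambda prompt: next(canned_answers),
)
print(color)
```

In a real script you would leave out the `ask` argument so that the function falls back to `input`, and pass `matplotlib.colors.is_color_like` as `is_valid`.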

# Using a code editor

We haven't discussed this during the class, but Python can be picky about whitespace and indentation. If you used a text editor in the previous exercise that's not Python-aware, you may have run into a few issues. For example, Python does not let you mix the use of tabs and spaces for indentation.
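
You can see this for yourself without even saving a file. The sketch below asks Python to compile a short snippet in which one line is indented with a tab and the next with spaces:

```python
# The first indented line uses a tab; the second uses eight spaces.
# They may look aligned in an editor, but Python refuses to guess.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, '<mixed_indentation_demo>', 'exec')
    error_name = None
except TabError as error:
    error_name = type(error).__name__
    print(f'{error_name}: {error}')
```

The error is raised before any of the code runs, which is why a script with mixed indentation fails immediately rather than partway through.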

When you were writing code in the Jupyter notebook, Jupyter quietly handled the indentation for you. Recall how you could type `if x > 5:` and hit enter? The cursor would automatically be indented by four spaces on the next line. If you used the tab key instead of the space bar to indent your code, Jupyter quietly converted your tab to four spaces. Your code also is colored using syntax highlighting. This is an example of a Python-aware text editor.

There are many good code editors out there. A very popular, and free, option is [Spyder](https://www.spyder-ide.org/).

How do we install it?

# Python environments

First, let's take a step back and talk about Python environments. Recall how we set up an environment named `NEUS642` for the class? The reason I asked you to do this is because some of you were already using Python and I wanted to make sure that everyone was working with the *same* version of all the libraries used in the class. Some of you encountered this problem early in the course (e.g., you were using an older version of `seaborn` that didn't have some of the newer functions used in the exercises).

It's not uncommon for packages to implement changes that can break your code. For example, version 1.0.0 of `scipy` removed some statistical functions such as `f_value`. If you had written code using this function, you would not be able to upgrade to the latest version of `scipy` without rewriting that code. What if you wanted to write a new script that uses `scipy.signal.find_peaks`, which was added in version 1.1.0? You can't have two versions of `scipy` installed on your computer ... or, can you?
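
If you're not sure which version you have, you can ask the package itself. The sketch below uses a small helper, `version_tuple` (a made-up name, and deliberately rough), to compare versions as numbers. Comparing the raw strings would wrongly put `'1.10.0'` before `'1.9.3'`.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.2.0' into (1, 2, 0) for comparison."""
    numbers = [int(n) for n in re.findall(r'\d+', version)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad '1.1' out to (1, 1, 0)
    return tuple(numbers)


# scipy may or may not be installed in the environment running this code.
try:
    import scipy
except ImportError:
    scipy = None

if scipy is None:
    print('scipy is not installed in this environment')
elif version_tuple(scipy.__version__) >= (1, 1, 0):
    print(f'scipy {scipy.__version__}: find_peaks should be available')
else:
    print(f'scipy {scipy.__version__}: too old for find_peaks')
```

The helper only handles simple `major.minor.patch` strings. For anything fancier, the `packaging` library deals with the edge cases.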

This is where Python environments come in handy. When you installed [Anaconda Python](https://www.anaconda.com/download), it automatically created a base (i.e., default) environment for you. It contains a certain version of Python (e.g., 2.7 or 3.6 depending on which installer you chose) and some basic packages (e.g., `numpy`, `scipy`, etc.). In addition to this base environment, you can create as many additional environments as needed. Each environment can contain a different version of Python and other packages as illustrated by the following cartoon (sourced from [freeCodeCamp](https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c)).

![cartoon of environments](environments.jpeg)

We already created an environment called `NEUS642` for the course. When you first open up an Anaconda prompt, it uses the base environment by default. If you want to use a different environment, you need to specifically activate it:

    conda activate NEUS642
    
To switch back to the base environment:

    conda activate base
    
If you're using an older version of Anaconda on OSX or Linux, you may need to type the following instead:

    source activate NEUS642
    source activate base
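
If you ever lose track of which environment is active, you can ask Python. This is a minimal sketch, and the exact paths printed will depend on how and where you installed Anaconda:

```python
import os
import sys

# Full path of the Python interpreter that is running this code.
print(sys.executable)

# Root folder of the environment that interpreter belongs to.
print(sys.prefix)

# conda sets this variable when an environment is activated; it is missing
# if Python was started some other way, so fall back to a placeholder.
active_env = os.environ.get('CONDA_DEFAULT_ENV', '(no conda environment detected)')
print(active_env)
```

After `conda activate NEUS642`, the paths should contain `NEUS642` and the last line should print the environment name.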
    
The syntax for creating a new environment using the Anaconda prompt is:

    conda create --name my_environment_name python=3.7 numpy scipy=1.2.0
    
This is just an example! You don't actually have to specify the packages listed above if you don't need them for your program.
    
Here, we have: 

* Specified the name, `my_environment_name` (which we provide to `conda activate` whenever we wish to switch to that environment).
* Requested a specific version of Python and scipy.
* Requested the latest version of numpy which is compatible with Python 3.7 and scipy 1.2.0.

`conda` will then work to find the packages and download them. By default, packages are downloaded from the default channel on [anaconda.org](https://anaconda.org/anaconda). These packages are maintained by employees at Anaconda (the company responsible for creating the Anaconda Python Distribution). If a package is not available via the default channel, it may be available through channels maintained by community members. Popular channels include [bioconda](https://anaconda.org/bioconda/), which maintains a series of bioinformatics packages, and [conda-forge](https://anaconda.org/conda-forge), which sometimes contains newer versions of packages than the default channel. If you need to check whether a package is available, you can search [anaconda.org](https://anaconda.org) (be sure to note the channel the package is available through).

Once you've created your new environment, you can install additional packages:

    conda install -n my_environment_name package_name

If you find a package in another channel that you'd like to install, you can specify the channel using the `-c` flag:

    conda install -n my_environment_name -c channel_name package_name
    
Note that this is different from what I asked you to do during class. In class, I asked you to do the following:

    conda activate NEUS642
    conda install package_name
    
Either approach is fine!

What if you can't find the package on [anaconda.org](https://anaconda.org)? If it's available via the [Python package index](https://pypi.org/), then you can install it via `pip` (note that you need to activate the environment first):

    conda activate my_environment_name
    pip install package_name
    
A key difference between `conda` and `pip` packages is that `conda` packages are always compiled for the platform (i.e., Windows, Linux, OSX). This means that they are ready to go out of the box. In contrast, some packages installed via `pip` may contain Fortran, C or C++ code. In this event, you will need to install the Fortran, C or C++ compiler before `pip` can install the package. On Linux computers this is generally a very simple process. On Windows (and, possibly, OSX) it can be a pain. Dealing with this is outside the scope of today's exercise. Fortunately, the majority of mainstream scientific Python packages are easily installable via `conda` and `pip`.
 

## Exercise

Create an environment named `NEUS642_final` containing Python 3.6, `numpy` (any version), `scipy` (any version), `matplotlib`, `palettable` and `NEUS642-fake-package`.

Once you've set this up, create a Python script containing the following:

    from NEUS642_fake_package import print_report
    print_report()
    
Run the script using your new environment. If you've successfully set it up properly, you will see the following output:

    Correct version of Python is installed. You are running 3.6.
    You successfully installed numpy
    You successfully installed scipy
    You successfully installed palettable
    You successfully installed matplotlib
    Congratulations, you successfully set up your environment.

Answer:

    conda create -n NEUS642_final python=3.6 numpy scipy matplotlib
    conda install -n NEUS642_final -c conda-forge palettable
    conda activate NEUS642_final
    pip install NEUS642-fake-package

# Installing code editors   

Now that we've discussed Python environments, let's install a code editor. There are many code editors. Some are environment-aware. Some aren't. An environment-aware code editor allows you to switch between the various Python environments when running code.

## Spyder

The current version of Spyder isn't environment-aware. This means that you need to install Spyder into the environment you want to use it with:

    conda install -n my_environment_name spyder
    
To run Spyder:

    conda activate my_environment_name
    spyder
    
If you wish to use Spyder with a different environment, you'll need to install it into that environment. For now, let's install it into our `NEUS642_final` environment. How do you do that?

If you get a segmentation fault, you probably need to update the Qt libraries that Spyder uses for creating the GUI. Run:

    conda update -n NEUS642_final qt
    
It should update the libraries from version 5.6 to 5.9.

![spyder screenshot](spyder.png)

There are many options you can set up in order to customize your Spyder environment. We'll just go through a couple useful ones so you can see where they're located. I'd encourage you to look around more on your own and customize Spyder to best suit your needs/preferences.

By default (I think) Spyder will output any figures you make into the IPython console (we'll talk about what this is in a bit). This isn't very convenient if you want to interact with your plots (zoom, save, etc.) 

### Exercise
Navigate to tools -> preferences -> IPython console -> Graphics and make sure the backend is set to Automatic.

When you're done, paste the code below into the large text window on the left and press the green arrow button on the top toolbar. When prompted, save this file as `my_first_spyder_script.py`. You should see a new window pop up with your plot. If not, ask a TA for help.

```
import matplotlib.pyplot as plt
import numpy as np

X = np.arange(0, 50, 0.1)
Y = np.sin(X)

plt.figure()
plt.plot(X, Y, 'o-', markerfacecolor='purple', color='k')
```

## Writing and running a script
Congratulations! You've just created and run your first Python script in Spyder. Notice how Spyder colors the relevant Python syntax (like Jupyter did for us). It also does things like deal with indentations (as Brad talked about earlier) and warns us when we've entered code that looks like an error.

### Mini-exercise
Beneath the plot command, type the following line: `plt.plot(X, Z)`. Wait a second... what do you notice?

**How does Spyder actually run the code?**
So, what exactly happens when you hit the "big green play button"? Spyder takes all of the code in your active script window, and executes it in the IPython console (the window in the bottom right for the default setup). This is like running a code cell in Jupyter. By default, Spyder will save any variables that are created when you run a script. Thus, if you run two different scripts in the same IPython console, you'll have to be careful about what you name your variables, or you could end up with some unexpected errors/outputs. This can get to be a hassle. There are a couple of ways around this. The simplest is to reopen preferences and navigate to "Run" on the left hand side toolbar. In the first options box, check the option to execute code in a NEW console. Another way to think of this is that each console is a separate instance of Python. This will force Spyder to execute your code in a new IPython console every time! Therefore, you never have to worry about naming conflicts between different scripts. 
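
Here is a toy version of the problem. It is only a sketch: plain dictionaries stand in for the consoles and short strings stand in for the scripts (Spyder actually runs your code in IPython kernels), but the leftover-variable effect is the same.

```python
# Two tiny "scripts". The second one forgets to define `data`.
script_one = "data = [1, 2, 3]\ntotal = sum(data)\n"
script_two = "total = sum(data) * 10\n"

# Shared console: both scripts run in the same namespace, so the second
# script silently picks up `data` left behind by the first.
shared_console = {}
exec(script_one, shared_console)
exec(script_two, shared_console)
print(shared_console['total'])

# New console for each run: the missing variable is caught right away.
fresh_console = {}
try:
    exec(script_two, fresh_console)
    outcome = 'ran without error'
except NameError as error:
    outcome = f'NameError: {error}'
print(outcome)
```

In the shared console the second script appears to work, but only by accident. Running it in a fresh console exposes the bug.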

However, this can get a little unwieldy if you repeatedly re-run your script. You'll find that you have many, many IPython windows open simultaneously. This is probably not desired and it will be hard to keep track of all of them. To avoid this, it's often preferable to develop code in the IPython console rather than in the script itself. Then, once you have code you like, paste it into your script and save it. Let's look more into this.

## Using the IPython console

#### Mini-exercise
Type the following command: `whos` into the active console prompt in the bottom right corner of your IDE. What's the output?

This illustrates what we've already discussed above: the active console has stored the variables from the script that was just run. Another way to access this same information is to use the Variable Explorer. In the default Spyder layout, the Variable Explorer lives in the panel just above the IPython console. Notice that the information here is identical to the output of `whos`.

#### Mini-exercise
Now, right-click on the console tab itself (it should say something like `Console 1/A`). From the menu that pops up, select "open an IPython console". Now, type `whos` again. What do you notice? Also, take a look at the Variable Explorer tab (upper right corner). Toggle back and forth between the two consoles.

### Exercise
You can run and test code directly in the IPython console window. In fact, this is really the most convenient place to develop code before you "commit" to adding it into your script. This way you can make sure whatever you write works before you add it into a part of your larger analysis. With this in mind, use the console to figure out the correct code needed in order to plot the derivative of `Y` (the graph we created in the first part of this section). **The point of this exercise is to learn how to use the console, not to learn how to plot a derivative. Please read the hints below if you are feeling stuck.**

**Hints:**
* Navigate back to console 1, where the variables X and Y exist in the workspace.
* Use the function `np.diff`. Remember you can type `np.diff?` to get function documentation.
* Remember that the derivative is the instantaneous rate of change of a function (we'll have to approximate this as the "rise over the run", or the slope, for each pair of points in our discrete data set.)
* Finally, you might notice that `np.diff` will return a vector of length `len(X)-1`. Therefore when you plot the derivative, `yprime`, you'll want to use the following command: `plt.plot(X[:-1], yprime)`.
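
If the "rise over run" hint is unclear, here is a small sketch on a different, simpler function (a straight line with a known slope) so you can see what `np.diff` gives you before trying it on `Y`:

```python
import numpy as np

# A straight line with slope 3, sampled every 0.5 units.
x = np.arange(0, 3, 0.5)
y = 3 * x + 1

# Rise over run between each pair of neighboring points.
slope = np.diff(y) / np.diff(x)

print(len(x), len(slope))
print(slope)
```

Every entry in `slope` should come out as 3, and there is one fewer slope than there are points. That is why the plotting hint above drops the last value of `X`.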

Once you've successfully gotten your plot to display, copy and paste the necessary code into your script so that the result plots both the function `Y` and its derivative on the same axis.

One final note... You may have found that you don't like to type in the IPython console and that you'd prefer to work in the actual text editor (the script panel). If you'd like, you can still do this and run one line or one chunk of code at a time before pressing the green run button to execute all the code at once. To do this, simply type your code in the script window, highlight the section you want to run, right-click and select "Run selection or current line". This will execute only the highlighted code in the active IPython console.

These are just a few basic features that I use all the time in Spyder. There are many others, many of which I'm probably not aware of, that you might find very helpful. I'd encourage you to play around some more on your own to customize your Spyder environment to your liking.

## Jupyter

Jupyter Notebook and Jupyter Lab (the next version of Jupyter Notebook) are environment aware. This means you can install Jupyter into your base environment and then register your new environments with it. For example, to install Jupyter Notebook:

    conda install -n base jupyter
    
To install Jupyter Lab (which is a browser-based code editor):

    conda install -n base -c conda-forge jupyterlab
    
To register your environments so that you can select which one is used for your notebook or script in Jupyter (remember how you could select the kernel in exajupyter back when it worked?), run the following commands (`--name` is the internal label Jupyter uses to tell kernels apart, so use a different one for each environment; `--display-name "My cool new Python env"` is optional):
    
    conda install -n my_environment_name ipykernel
    conda activate my_environment_name
    python -m ipykernel install --user --name my_environment_name --display-name "My cool new Python env"
    
If you're wondering about Jupyter, I had you install Jupyter into your `NEUS642` environment at the beginning of the semester. In retrospect, I should have asked you to use the approach above (where you install Jupyter into your base environment and then register NEUS642 with Jupyter).

# Code reuse and sharing

We've discussed the difference between Jupyter notebooks and Python scripts. However, we haven't even discussed a key topic. You've written a set of functions that you'd like to package up and share with others. If you look around online, you'll see references to things like `sys.path` and `PYTHONPATH`. **Ignore those.** Although they will work, you are just setting yourself up for a painful transition in a few months as your coding skills develop and your needs expand beyond these.

Let's start with a very basic example of how to re-use functions defined in one Python file in another Python file. Using Spyder, create a folder called `code_reuse` containing the following files and folders:

    code_reuse/
        __init__.py
        special/
            __init__.py
            functions.py
        data_generator.py
        analysis_script.py

Both of the `__init__.py` files should be blank. The presence of an `__init__.py` file indicates that the folder is a *package* (i.e., a collection of modules) and, therefore, importable. Loosely defined, a *module* is a single Python source file that can define one or more *functions*, *classes* or *variables*. A *package* is a collection of modules. You *can* define classes and functions in the `__init__.py` file if you wish.

For `code_reuse/special/functions.py`, add the following code:

    def print_value(x):
        print(f'The value is {x}')
        
For `code_reuse/data_generator.py`, add the following code:

    import random
    
    def make_integer():
        return random.randint(0, 10)

For `code_reuse/analysis_script.py`, add the following code:

    from special import functions
    from data_generator import make_integer
    x = make_integer()
    functions.print_value(x)
    
Now, run `analysis_script.py` in Spyder. Note that you can also use the interactive terminal to import your code. Try it out and play with different import options:

    import data_generator
    data_generator.make_integer()
    
    from data_generator import make_integer
    make_integer()
    
    from data_generator import make_integer as mi
    mi()
    
This works because Python's import machinery looks in several places to see if it can find the package you've requested. If you do `import my_module`:

* It first starts at the directory containing the script and checks to see if there's a `my_module.py` or `my_module` folder contained in the same directory as the script.
* If that fails, then it checks all the packages that you've installed using `conda` or `pip` to see if there's a match.
* If that fails, you get an `ImportError`.
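
If you're curious which file an import would pick up, you can ask the import machinery directly without importing anything. In the sketch below, `where_is` is just an illustrative helper name:

```python
import importlib.util


def where_is(module_name):
    """Return the file Python would load for `module_name`, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


print(where_is('json'))                      # a module that ships with Python
print(where_is('surely_not_a_real_module'))  # None: an import would fail
```

Try it with `data_generator` from the console while you're in the `code_reuse` folder, and again after changing to a different folder.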

To figure out what your current working directory is (where Python first looks for modules when you import from the console), just type `pwd` in the IPython console.

This is how you write functions that you can share among several scripts. However, what if you want to be able to make the functions available without having to ensure the script is in the same folder as the files defining the functions? We are starting to get into Python developer territory. Remember how I said to ignore solutions that involve `sys.path`? Well, let's give you an example of what I mean. Let's say you've written a suite of Python modules stored elsewhere on the disk:

    /home/Brad/python/my_core_functions/load_data.py
    /home/Brad/python/my_core_functions/analyze_data.py
    /home/Brad/python/my_core_functions/write_paper.py
    
However, your current analysis code is stored in:

    /home/Brad/documents/analysis_for_Nature_paper
    
How can your scripts stored in `analysis_for_Nature_paper` import functions that are defined in the files under `/home/Brad/python/my_core_functions`? The simplest way is to add, at the top of your analysis scripts:

    import sys
    sys.path.append('/home/Brad/python/my_core_functions')
    
You can then import your modules, functions and classes in your analysis script:

    import load_data
    import analyze_data
    from write_paper import generate_abstract
    generate_abstract()
    
The reason this works is that Python maintains a list of paths where it should look for code that it needs to import. Go ahead and take a look at the code below:

    import sys
    sys.path

Notice how the first entry is an empty string? This means it first looks for modules stored in the same folder as this notebook. If it can't find a file that matches what you want to import, it then moves to the next folder in the list. It keeps looking until it finds a match. If it cannot find a match, it gives up and raises an `ImportError`. Remember how you can `pip install` or `conda install` a package? Those packages get saved to `/home/bburan/bin/miniconda3/envs/python3/lib/python3.6/site-packages` (the actual location will vary depending on how you installed Python, but the general idea remains the same).
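
To watch the search path at work, here is a self-contained sketch that creates a throwaway folder containing one small module, shows that the import fails, then adds the folder to `sys.path` and tries again. The module name `demo_helpers_642` is made up for this demonstration.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway folder holding one small module.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, 'demo_helpers_642.py'), 'w') as fh:
    fh.write("def double(x):\n    return 2 * x\n")

# Python has never heard of this folder, so the import fails.
try:
    import demo_helpers_642
    found_before = True
except ImportError:
    found_before = False

# Add the folder to the search path and try again.
sys.path.append(folder)
importlib.invalidate_caches()
import demo_helpers_642

print(found_before, demo_helpers_642.double(21))
```

Note that the change to `sys.path` only lasts until Python exits (or you restart the kernel), so every script that needs the module has to repeat it.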

## Exercise

Let's go ahead and do this for our `code_reuse` package. In Spyder, first do the following:

* In the file explorer pane, change your current folder to a different directory (e.g., you no longer want to be in the `code_reuse` folder).
* Open up a new IPython console.

Now, try to run the following:

    from special import functions
    
Does it work? Why not? What's in `sys.path`? Let's go ahead and add the folder containing our `code_reuse` package to `sys.path` and try the import again.

What if we added the parent folder instead? For example, instead of `/home/bburan/class/code_reuse`, we added `/home/bburan/class`. Would it work? (If you want to test this, be sure to restart the kernel or try it in a new console so that you have a fresh `sys.path`.)

# Creating a Python package

Now that you've learned a little about how Python finds code that you want to import, it's time to learn the proper way to set up your code for re-use. First, let's reset `sys.path` to its original state. Do this by right-clicking on the console and selecting "restart kernel" from the context menu. Try running:

    from special import functions
    
You should get an import error. Good. Now, let's move on.

Remember how you can run `pip install module_name`?  Wouldn't it be great to do that for your own package? Let's go back to our code reuse exercise. Find the folder you created called `code_reuse`, e.g.:

    /home/bburan/class/code_reuse/
        __init__.py
        special/
            __init__.py
            functions.py
        data_generator.py
        analysis_script.py
    
Now, we need to move `code_reuse` to a folder called `code_reuse_package`:

    /home/bburan/class/code_reuse_package/
        code_reuse/
            __init__.py
            special/
                __init__.py
                functions.py
            data_generator.py
            analysis_script.py

Now, we create a new file called `setup.py` in the `code_reuse_package` folder which gives `pip` instructions on how to install the package. Here's an example of a bare-bones `setup.py`:
    
    from setuptools import find_packages, setup

    setup(
        name='Code Reuse',
        version='0.0.1',
        packages=find_packages(),
    )

A few things to note:

* The `name` and `version` are simply labels. They don't affect how you import the package.
* `packages` is important. It's a list of all the packages in the folder that you want to be available for importing. Fortunately, for very simple projects you can use the `find_packages` function, which will automatically scan for all packages in the `code_reuse_package` folder.
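
To get a feel for what "scanning for packages" means, here is a simplified stand-in written with only the standard library. It is *not* the setuptools implementation, and `find_package_names` is a made-up name. It just shows the basic rule that a folder counts as a package only if it contains an `__init__.py` file.

```python
import os
import tempfile


def find_package_names(root):
    """List dotted names of folders under `root` that contain __init__.py.

    A simplified stand-in for illustration, not the setuptools implementation.
    """
    names = []
    for folder, subfolders, files in os.walk(root):
        if folder == root:
            continue
        if '__init__.py' not in files:
            # Not a package, so do not look any deeper.
            subfolders[:] = []
            continue
        relative = os.path.relpath(folder, root)
        names.append(relative.replace(os.sep, '.'))
    return sorted(names)


# Rebuild the layout from the exercise in a throwaway folder.
root = tempfile.mkdtemp()
for parts in [('code_reuse',), ('code_reuse', 'special'), ('notes',)]:
    os.makedirs(os.path.join(root, *parts))
for parts in [('code_reuse', '__init__.py'),
              ('code_reuse', 'special', '__init__.py'),
              ('code_reuse', 'special', 'functions.py'),
              ('notes', 'todo.txt')]:
    open(os.path.join(root, *parts), 'w').close()

print(find_package_names(root))
```

The `notes` folder is skipped because it has no `__init__.py`. If your own package mysteriously fails to import after installation, a missing `__init__.py` is one of the first things to check.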

Now, you can install your package. Open up a command prompt:

    conda activate NEUS642_fake_environment
    cd /home/bburan/class
    pip install ./code_reuse_package
    
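Windows users, take note that the leading `./` matters: it tells `pip` to install from the local folder rather than search PyPI for a package with that name. If the forward slash gives you trouble, try a backslash instead:

    pip install .\code_reuse_package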
    
You should see the following at the end of the output:

    Successfully installed Code-Reuse-0.0.1

Now, go back to your IPython console. Be sure to reset the console. Now, try running:

    from code_reuse.special import functions
    
Be sure to note where the file lives:

    print(functions.__file__)

Congratulations! You've created your first Python package!

# Sharing assets

## Code

We don't have time to go in-depth on this topic, but you have several ways to share your code. 

### PIP

If you want to make it `pip` installable, you've already taken the first step (i.e., creating a `setup.py` file). You'll need to add a few more entries to the `setup.py` file and then upload your package to `PyPI`. There are a number of steps involved. Once you're ready to share your code, there are some good tutorials online for [fleshing out the setup.py file](https://pythonhosted.org/an_example_pypi_project/setuptools.html) and [publishing to PyPI](https://jonemo.github.io/neubertify/2017/09/13/publishing-your-first-pypi-package/).

### GitHub

You've already seen Git in action when you did `git pull` to get the latest notebooks. Git is a source code management tool that allows you to revert to earlier versions of your code if you make a mistake. It also supports pushing your code to online repositories such as [GitHub](https://github.com). Once you're ready to try out Git, there are many [good tutorials online](https://www.sitepoint.com/git-for-beginners/).

## Other assets such as notebooks

Notebooks and data can be shared on GitHub as well.

## Licensing

A license tells other people what they are allowed to do with the code and resources you are sharing with them. Many Python packages are published under the MIT or BSD license, which are simple, permissive licenses. They allow people to do almost anything they want with your project (including making a closed-source version). In contrast, the GPLv3 license allows people to do almost anything they want with your project (**except** distribute closed-source versions). Need help [choosing a license](https://choosealicense.com)?

For non-source code, there are [other options](https://choosealicense.com/non-software/). For example, the [Creative Commons](https://creativecommons.org/) family of licenses was designed for data, media and text. Need help [choosing a Creative Commons license](https://creativecommons.org/choose/)?

# GUIs

There is no one library available for creating graphical user interfaces. Tkinter comes with Python by default and is good for simple interfaces. If your needs become more complex, look into [PyQt5](https://pypi.org/project/PyQt5/) or [Enaml](https://enaml.readthedocs.io). We don't have time to get into these libraries, but you can find examples online that you can adapt.

# Creating EXEs

Creating stand-alone executables is currently a pain and will require some advanced knowledge. My recommendation is to provide other users with instructions for replicating your conda environment (e.g., tell them what versions of packages to install). Once they've set up their environment, they can obtain a copy of your code and execute it within the environment. If you *must*, you can look into [conda build](https://conda.io/docs/user-guide/tutorials/index.html) for creating conda packages and [constructor](https://github.com/conda/constructor) for creating self-installing programs. [PyNSIST](https://pypi.org/project/pynsist/) is another popular tool.

That said, do yourself a favor and just write instructions on how to install Anaconda Python and use `conda` to get the packages needed to run your code. It'll be much faster.
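
To help with writing those instructions, here is a rough sketch that looks up the installed version of each package you name. It assumes Python 3.8 or newer (for `importlib.metadata`). The package list is only an example, and `installed_versions` is a made-up helper name.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version (None if missing)."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical list: swap in the packages your own code imports.
report = installed_versions(['numpy', 'scipy', 'surely-not-a-real-package'])
for name, version in report.items():
    print(f'{name}={version}' if version else f'{name} is not installed')
```

The `name=version` lines are in the same form you would pass to `conda create`, so you can paste them straight into your instructions.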