# Why Python?

Python is now the second most popular language on GitHub, after only JavaScript.

![GitHub Languages](./img/GitHubLang.png)

* Still growing at a rate of 151%!
* Jupyter notebook use has grown by over 100% every year for the last three years!

[State of the Octoverse, 2019](https://octoverse.github.com)

# Why Python?

![PYPL Languages](./img/PYPLLang.png)

[PYPL rankings](http://pypl.github.io/PYPL.html) of some of the most popular languages for data science.

# Timeline of Python

* 1994: Python 1.0 released
* 1995: First array package: Numeric
* 2003: Matplotlib
* 2005: Numeric and numarray merged into Numpy
* 2008: Pandas introduced
* 2012: The Anaconda Python distribution was born
* 2014: IPython produces the Jupyter project and notebook
* 2016: LIGO's discovery was shown in a Jupyter Notebook, and was written in Python
* 2017: Google releases TensorFlow 1.0
* 2019: Nearly all major machine learning libraries are primarily or exclusively used through Python

# Timeline of Python, key points


## 2005: Numpy
* Merged two competing codebases, created single ecosystem

## 2008: Pandas
* Took on specialized statistics languages (like R) with a *library* in a general purpose language
* Pioneered "Pythonic" shortcuts, breaking down traditional design barriers

## 2014: Jupyter
* The notebook format, with code, outputs, and descriptions interleaved, became multilingual

# Python vs. a compiled language

Python is an interpreted language. When we talk about Python, we usually mean CPython, which is not even Just In Time (JIT) compiled; it's purely interpreted.

TLDR: Python is *slow*.

Hundreds to thousands of times slower than C/C++/Fortran/Go/Swift/Rust/Haskell... You get the point.

Python is like a car. Compiled languages are like a plane.

So why use it?

# A hybrid approach

If you want to get to South America, the fastest way is to take a car to the airport and catch a plane.

Same idea for Python and compiled languages. You can do the big, common, easy tasks in compiled languages, and steer them with Python.

And, as you'll see today, that's easier than you think!

# Mini-courses


## High Performance Python: CPU

* Today's class
* How to make Python code fast *without* fully leaving Python


## High Performance Python: GPU

* The sequel, in a few weeks
* Using accelerators to boost your code



## Compiled code & Python (in development)

* Date TBD, Spring next year.
* How to interface and accelerate with compiled code

# Lessons

* [00 Intro](./00_intro.ipynb): The introduction
* [01 Fractal accelerate](./01_fractal_accelerate.ipynb): A look at a fractal computation, and ways to accelerate it with Numpy changes, numexpr, and numba.
* [02 Temperatures](./02_temperatures.ipynb): A look at reading files and array manipulation in Numpy and Pandas.
* [03 MCMC](./03_mcmc.ipynb): A Markov Chain Monte Carlo generator (and Metropolis generator) in Python and Numba, with a focus on profiling.
* [04 Runge-Kutta](./04_runge_kutta.ipynb): Implementing a popular integration algorithm in Numpy and Numba.
* [05 Distributed](./05_distributed.ipynb): An exploration of ways to break up code (fractal) into chunks for multithreading, multiprocessing, and Dask distribution.
* [06 Tensorflow](./06_tensorflow.ipynb): A look at implementing a Negative Log Likelihood function (used for unbinned fitting) in Numpy and Google's Tensorflow.
* [07 Callables](./07_callables.ipynb): A look at Scipy's LowLevelCallable, and how to implement one with Numba.

We may not go through these in order; I really want to go over LowLevelCallables!

## Survey

Before we finish, please complete [the survey here](https://forms.gle/B8muBQu7WeYZjpNB7). I will give you some time near the end to fill it out.

# Background

Python lists/tuples can contain any Python object, which costs memory and rules out a compact memory layout:

In [None]:
import numpy as np
import math

In [None]:
lst = [1, 'hi', 3.0, '🤣']
lst

*Each* Python object stores *at least* a type and a reference count. Objects can be different sizes, so Python has to chase pointers to reach them. Numpy introduced an array class:

In [None]:
arr = np.array([1,2,3,4])
arr

The array object is a normal Python object (with refcounts and such), but the items *inside it* are stored nicely packed in memory, with a single "dtype" for all the data. You can still use `dtype=object`, but with any other dtype this is much nicer than pure Python for larger amounts of data. All the standard data types are available, rather than only the 64-bit `float` and unlimited-precision `int` that regular Python provides.
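
To make the memory difference concrete, here is a rough sketch comparing a list of integers with an array holding the same values. The variable names are only illustrative, and `sys.getsizeof` reports CPython-specific sizes, so treat the exact numbers as indicative rather than definitive:

```python
import sys
import numpy as np

# A Python list stores pointers to separately allocated int objects.
values = list(range(1000))
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A Numpy array stores the raw values back to back, described by one dtype.
arr = np.arange(1000, dtype=np.int64)

print("list (container + items):", list_bytes, "bytes")
print("array data buffer:       ", arr.nbytes, "bytes")
print("dtype:", arr.dtype, "itemsize:", arr.itemsize)

# Smaller fixed-width types are available when the value range allows it.
small = arr.astype(np.int16)
print("int16 data buffer:       ", small.nbytes, "bytes")
```

The array's data buffer is exactly `size * itemsize` bytes, while the list pays for a pointer plus a full object per element.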

Numpy provides "array" processing, where operations and functions are applied to whole arrays rather than element by element in loops. The looping usually happens in a compiled language, skipping the type lookups and other overhead that Python would incur. To facilitate this, Numpy introduced ufuncs (universal functions), generalized ufuncs, and functions that operate on arrays. Numpy also helped Python develop a memory buffer interface (formalized in Python 3) for communicating the Numpy data structure between libraries without requiring Numpy, as well as an overload system for ufuncs and, later, array functions.

Out of all of that, let's peek at a ufunc:
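
As a small illustration of the buffer interface, the sketch below shares memory between a plain `bytearray` and a Numpy array, and then reads an array's layout through the built-in `memoryview`. The names `raw`, `view`, and `mv` are just placeholders for this example:

```python
import numpy as np

# A plain bytearray knows nothing about Numpy, but it exposes a buffer.
raw = bytearray(8)

# np.frombuffer wraps that memory instead of copying it.
view = np.frombuffer(raw, dtype=np.uint8)
view[0] = 255
print(raw[0])  # the bytearray sees the change made through the array

# In the other direction, memoryview reads an array's layout
# using only the built-in buffer protocol.
arr = np.arange(6, dtype=np.int32).reshape(2, 3)
mv = memoryview(arr)
print(mv.shape, mv.itemsize, mv.ndim)
```

No data is copied in either direction; both objects are views of the same bytes.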

In [None]:
vals = np.linspace(0, np.pi, 9)
print(np.sin(vals))

`np.sin` is a ufunc. It can be called on an array of any dimension, and it returns an array of the same dimensionality, with the function (`sin`, in this case) applied to each element. If it took multiple arguments, each could be N-dimensional, and the output would be the broadcast combination of the inputs (the call fails if the shapes are not compatible). There is a set of standard arguments, such as `out=` (use an existing array for the output), `where=` (mask items), `casting`, `order`, `dtype`, and `subok`. You can also call a set of standard methods, such as `accumulate`, `at`, `outer`, `reduce`, and `reduceat`, though some do not work on all ufuncs. There are some properties, too.

Let's use `out=` to pre-allocate our own output:
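
A few of these features can be sketched with `np.add`, `np.multiply`, and `np.sqrt`. The arrays here are made up for illustration; the calls themselves are standard ufunc behavior:

```python
import numpy as np

a = np.arange(1, 5)            # [1 2 3 4]
col = np.array([[10], [20]])   # shape (2, 1)

# Broadcasting: shapes (2, 1) and (4,) combine into (2, 4).
summed = np.add(col, a)
print(summed.shape)

# Standard ufunc methods.
print(np.add.reduce(a))               # sum of all elements: 10
print(np.add.accumulate(a))           # running sum: [ 1  3  6 10]
print(np.multiply.outer(a, a).shape)  # every pair of elements: (4, 4)

# where= applies the function only where the mask is True;
# the other slots keep whatever out= already held.
values = np.array([4.0, -1.0, 9.0, -1.0])
mask = values >= 0
out = np.zeros(4)
np.sqrt(values, out=out, where=mask)
print(out)

# Some methods only make sense for two-input ufuncs.
try:
    np.sin.reduce(a)
except (TypeError, ValueError) as err:
    print("sin.reduce failed:", err)
```

Note that with `where=`, the masked-out positions are never computed, so the negative inputs above do not trigger a warning.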

In [None]:
vals = np.linspace(0, np.pi, 9)
out = np.empty_like(vals)
np.sin(vals, out=out)
print(out)

The operators on arrays, along with most of the methods on arrays, are actually ufuncs and array functions defined elsewhere in Numpy:

In [None]:
out_simple = vals + vals

out_inplace = np.empty_like(vals)
np.add(vals, vals, out=out_inplace)

np.testing.assert_array_equal(out_simple, out_inplace)

We will treat this simple form, array manipulation with basic operations, as the baseline. There is a "simpler" baseline, or maybe just an older one: loops over arrays. I *think* most people who have learned Python in the last few years start quite early with array programming and find it the most familiar, so we will start there.

In [None]:
# Looping over the array element by element: slow, shown for comparison only

vals = np.linspace(0, np.pi, 9)
out = []
for val in vals:
    out.append(math.sin(val))
print(out)
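
To see why the loop is discouraged, here is a rough timing sketch on a larger input. The helper names `sin_loop` and `sin_array` are invented for this example, and the measured times will vary by machine, so no specific speedup is claimed:

```python
import math
import timeit
import numpy as np

vals = np.linspace(0, np.pi, 100_000)

def sin_loop(values):
    # One Python-level call and one new float object per element.
    return [math.sin(v) for v in values]

def sin_array(values):
    # A single call; the loop over elements runs in compiled code.
    return np.sin(values)

# Both approaches should agree to within floating-point tolerance.
np.testing.assert_allclose(sin_loop(vals), sin_array(vals), atol=1e-12)

t_loop = timeit.timeit(lambda: sin_loop(vals), number=5)
t_array = timeit.timeit(lambda: sin_array(vals), number=5)
print(f"loop:  {t_loop:.4f} s")
print(f"array: {t_array:.4f} s")
```

In a notebook, the `%timeit` magic would be a more convenient way to make the same comparison.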

# Interesting projects

I am part of [Scikit-HEP](http://scikit-hep.org), a project to build tools for High Energy Physicists in Python. Some of the projects are applicable outside of HEP:

* [AwkwardArray](https://github.com/scikit-hep/awkward-array): 
* Vector: A package for 2D, 3D, and Lorentz vectors (used to be HEPVector, in development)
* [boost-histogram](https://github.com/scikit-hep/boost-histogram): A compiled package for powerful, fast histograms in Python
    - hist, a package for fast analysis and plotting of histograms (in development)
* [iMinuit](https://github.com/scikit-hep/iminuit): A powerful minimization package (used in HEP and Astrophysics)

Other projects I work on or know about:

* [Plumbum](https://plumbum.readthedocs.io/en/latest/): A toolkit for bash-like scripting in Python
* [CLI11](https://github.com/CLIUtils/CLI11): A command line parser for C++11

# Further reading

## My Materials

* [ISciNumPy](https://iscinumpy.gitlab.io): My blog, with lots of interesting topics
* [CompClass](https://github.com/henryiii/compclass): A computational physics class that I taught a year ago

## Jim Pivarski's materials

Jim taught earlier iterations of this mini-course, and his materials are great:

* [Mini-course Fall 2018](https://github.com/jpivarski/python-numpy-mini-course)
* [Mini-course Spring 2019](https://github.com/jpivarski/2019-04-08-picscie-numpy)
* [CoDaS HEP Summer 2019](https://github.com/jpivarski/2019-07-23-codas-hep)
* [DPF Summer 2019](https://github.com/jpivarski/2019-07-29-dpf-python)