# Instructor and Administrative Details

## Schedule

1. Intro, NumPy, Pandas, Numba
2. Multiprocessing, Dask, DaskML
3. GPU, Rapids, BlazingSQL, Numba, DL/Custom Loss
4. Bonus Tools and Patterns

## Instructor: Adam Breindel

<img src="https://materials.s3.amazonaws.com/i/med-head.jpg" width=200>

### Contact
* LinkedIn https://www.linkedin.com/in/adbreind
* Email adbreind@gmail.com
* Twitter `@adbreind` (no cat or breakfast pix, pretty much just tech)
* 20+ years building all kinds of systems for startups and large enterprises
* 10+ years teaching (data engineering, AI/ML, frontend, backend, mobile)

### Interesting projects...
* My first full-time job in tech was streaming neural-net fraud scoring
* Realtime & offline analytics for banking
* Music synchronization and licensing for networked jukeboxes

### Industries
* Finance, Insurance, Travel, Media / Entertainment, Government, Energy

# High Performance Python

### Python: A Systems Approach to High-Performance

In 2015, Reynold Xin, Apache Spark's Chief Architect, gave a series of presentations on the second major refactoring of Spark's internals in as many years.

Understanding the new Spark paradigm required substantial background on the changes, so Reynold needed to explain and justify the dramatic overhaul of Spark in the pursuit of higher performance.

*Why couldn't Spark be improved bit by bit, fixing hotspots and bottlenecks, to achieve speed in a less disruptive way?*

For small speed goals, that might work, Mr. Xin explained. But for the substantial speedups being targeted -- 10x or even 100x -- the arithmetic just doesn't work out.

It is difficult to get order-of-magnitude speedups with profiling techniques alone:

* For a 10x improvement, you would need to find the top hotspots that add up to 90% of the runtime and make them instantaneous
* For 100x, 99%
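The arithmetic behind those two bullets is what is usually called Amdahl's law. The sketch below is not from the original talk; the function names `overall_speedup` and `required_hot_fraction` are illustrative assumptions. It shows how much of the runtime must disappear to hit a target, and what happens in the more realistic case where hotspots get faster but not instantaneous.

```python
import math


def overall_speedup(hot_fraction, hot_speedup=math.inf):
    """Whole-program speedup when `hot_fraction` of runtime runs `hot_speedup`x faster."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    remaining = 1.0 - hot_fraction            # part of the runtime left untouched
    accelerated = hot_fraction / hot_speedup  # optimized part (0.0 if instantaneous)
    denominator = remaining + accelerated
    return math.inf if denominator == 0 else 1.0 / denominator


def required_hot_fraction(target_speedup):
    """Fraction of runtime that must become instantaneous to reach the target."""
    return 1.0 - 1.0 / target_speedup


for target in (10, 100):
    print(f"{target}x needs {required_hot_fraction(target):.0%} of runtime eliminated")

# Hotspots that cover 90% of runtime get 10x faster instead of instantaneous.
print(f"90% of runtime made 10x faster: {overall_speedup(0.9, 10):.2f}x overall")
```

Even a 10x win on 90% of the runtime falls well short of 10x overall, which is why hotspot-by-hotspot tuning rarely reaches order-of-magnitude goals.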

Instead, look bottom-up: how fast *should* it run on this hardware?

If the goal is to achieve orders-of-magnitude speedup, and fully exploit modern hardware, it is both more efficient and more effective to refactor the underlying mechanics to exploit that hardware.

__Ideal Philosophy for Python Perf__

Although Spark's refactor was focused on the Scala/JVM ecosystem, Python is even more amenable to that approach.

Why?

Because one of Python's strengths is its easy integration with native code: the SciPy stack is fundamentally built on using Python as a control language (or interface/automation language) while deferring the implementation of expensive work to native code.

While there are a few patterns to improve the speed of typical Python code, that approach is not the primary mechanism to achieve speed.

In [None]:
import math, numba, numpy as np

In [None]:
%%timeit

my_list = list(range(1000000))
out = []

for i in my_list:
    out.append(math.sqrt(i))

In [None]:
%%timeit

out = []

for i in range(1000000):
    out.append(math.sqrt(i))

In [None]:
%%timeit

out = [math.sqrt(i) for i in range(1000000)]

In [None]:
%%timeit

out = np.sqrt(np.arange(1000000))

In [None]:
# nopython=True asks Numba to compile to native code with no Python-object fallback
@numba.jit(nopython=True)
def get_roots(vec):
    return np.sqrt(vec)

In [None]:
%%timeit

get_roots(np.arange(1000000))

While it's always good practice to try to use a language efficiently, Python is not designed to require or heavily exploit syntax- and idiom-level performance tricks.

Instead, Python encourages a straightforward API and idiom placed above a high-performance implementation.
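The same idea can be seen with the standard library alone. The sketch below is an illustration rather than part of the original notebook: it compares a hand-written loop with the built-in `sum`, which is implemented in C in CPython. The names `values` and `loop_sum` are hypothetical, and the `timeit` module is used so the snippet also runs outside a notebook. Exact timings will vary by machine.

```python
import timeit

values = list(range(100_000))


def loop_sum(items):
    """Add the items one at a time in interpreted Python bytecode."""
    total = 0
    for item in items:
        total += item
    return total


# Both approaches must agree before their timings are worth comparing.
assert loop_sum(values) == sum(values)

loop_time = timeit.timeit(lambda: loop_sum(values), number=20)
builtin_time = timeit.timeit(lambda: sum(values), number=20)

print(f"hand-written loop: {loop_time:.4f} s for 20 runs")
print(f"built-in sum:      {builtin_time:.4f} s for 20 runs")
```

The simpler call is typically also the faster one, because the work happens below the Python API rather than in it.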

### What About Higher-Level Tools/APIs?

This pattern -- locating performance-critical functionality, then employing libraries with high-level Python APIs and high-performance, low-level implementations -- is common to many functional domains, not just numeric computation:

* Web/REST frameworks (https://falconframework.org/#sectionBenchmarks)
* Graph analysis (https://graph-tool.skewed.de/performance)
* Structured data I/O (https://www.h5py.org/)
* etc.

### Interpreter "Wars" and Tradeoffs

Over the years, several alternate runtimes have been considered for Python, with various goals.

However, for data-intensive computing, the original Python interpreter ("CPython") is the only practical solution today (2020).

__Why?__

Consider...
* Jython gains some speed from JVM infrastructure 
    * but cannot interoperate with Python modules using C extensions (most of the SciPy stack)
* PyPy is significantly faster than CPython (https://www.pypy.org/) and is always "worth a try"
    * but ... has numerous package incompatibilities: http://packages.pypy.org/
    * issues include fundamental packages like Pandas and Scikit-learn
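Since package compatibility depends on which interpreter is running, it can be useful to check at startup. This is a minimal sketch using the standard library's `platform.python_implementation()`. The helper name `describe_interpreter` and the warning text are assumptions for illustration, not part of the course material.

```python
import platform
import sys


def describe_interpreter():
    """Return (implementation_name, is_cpython) for the running interpreter."""
    name = platform.python_implementation()  # e.g. "CPython", "PyPy", "Jython"
    return name, name == "CPython"


name, is_cpython = describe_interpreter()
version = ".".join(str(part) for part in sys.version_info[:3])
print(f"Running {name} {version}")

if not is_cpython:
    # Hypothetical warning text; adjust it to the packages you depend on.
    print("Warning: packages built on C extensions may not be supported here.")
```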
    
The extensibility issues __may__ be addressed in general in the future. Travis Oliphant and Quansight have come out in support of the `epython` spec ... but for app dev, this is likely many years away: https://openteams.com/projects/epython


Fundamentally, CPython is necessary for most data-intensive workloads today (2020), even though it isn't the fastest interpreter out there.

Luckily, as we'll see, leveraging the proper libraries obviates those issues.