[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jkVzPlJe1indZnIGzG9S45F5O27tppCs?usp=sharing)

*Note: GitHub.com does not render everything when viewing these notebook files, so click the "Open in Colab" button above to see everything as it's intended.*

# Python/Jupyter Review

This course assumes some "basic" familiarity with Python (cf. ["Learn the Basics" at LearnPython.org](https://www.learnpython.org/)).  We're not going to be doing what I'd call "hardcore" Python, and many "hard" things will be taken care of for you via utility routines so you'll often only have to "fill in the blanks" here and there.

Still, there's likely some need for review, as well as to highlight important "tricks" and things you maybe haven't seen before. In particular, we'll be making use of the "science stack" (e.g., `numpy`, `matplotlib`, a little `pandas`) but not the "web stack" (e.g. not Flask or Django).  So, let's make a list of things you'll see in this course...


## Key Points about Python 

### It's superb for "messing around"
> "*Decades of programming in strongly-typed, declarative, pre- and post-allocation-based languages held me back from 'getting' this key fact about Python: It's extremely well-suited for **messing around**, ...and one of the best playgrounds for messing around in is the Jupyter notebook environment.*" - S.H.

Almost everything is mutable, overridable, and extendable. This is a blessing and a curse. For years I was preoccupied with the curse part, but you'll do well to remember the blessing side: you can do what you want (within limits)!
<center><img src="https://d3qdvvkm3r2z1i.cloudfront.net/media/catalog/product/cache/1/thumbnail/85e4522595efc69f496374d01ef2bf13/d/o/dowhatiwant_newthumb-again.png" width="25%"></center>

It's not obvious that you'd want to write production code in Python: it *can* be fast, but it's not secure *at all*; for learning things and rapid prototyping though, it's awesome. 

### There's a Library/Package for Everything
I think this is another key to Python's success. Other languages have some of this (e.g. JavaScript & npm), but with Python there really is already some package that will do much of the heavy lifting for you if you want.

### Writing Fast Python is a 'Habit'

Speed matters for deep learning because we'll be dealing with *gajillions* of calculations, and the difference between 10 microseconds vs. 10 milliseconds per operation will make the difference between you getting an answer to a homework problem in a few minutes vs. a few *days*.  (Again, this is a different mindset than, say, web programming in Python, where speed matters a bit but not nearly as much.)

<center><img src="https://i2.wp.com/comicsandmemes.com/wp-content/uploads/Famous-Movie-Qoutes-1986-Top-Gun-I-feel-the-need-for-speed.jpg?resize=768%2C401&ssl=1" width="40%"></center>

When you're first starting out, it's easy to accidentally write *really slow* code, particularly if you're coming from other programming languages. Learning to think in terms of "vectorized" operations and writing "one-liners" (e.g. "list comprehensions", which are often faster than multi-line implementations) can take some getting used to but eventually becomes a habit. So in what follows, we'll talk about a few speedy ways of "phrasing" things when writing Python code.

## Jupyter notebooks / Colab

You could write raw Python code as a text file and execute it from the command line (I used to do this), or use some IDE like PyCharm, but for this course we'll need access to **other people's computers** that give us access to GPUs (Graphics Processing Units) that we'll use for heavy number-crunching -- again, speed is key. 

Everything for this course is designed to be run on [Google Colab](), which is kind of a Google-flavored version of the Jupyter environment. So, when I say "Jupyter notebooks", I mean something like the file I'm writing right now, regardless of whether it's hosted in an actual Jupyter environment (e.g. on Paperspace Gradient) or on Colab. Generally I'll assume Colab. 

(There are some special things you can do in regular-Jupyter that you can't do on Colab, and probably vice versa, but I'll try to minimize mention of those.) 

* REPL
* Cell navigation
* Special moves: 
    * `!` shell commands 
    * `%` "magic" 
    * `?` documentation tricks 
* Colab vs. Jupyter
* Jupyter vs. IPython?

## Packages: NumPy and PyTorch 
We're going to use the numerical package NumPy a *lot*, and when we compute things in neural networks we'll use PyTorch, which has routines that usually have the same name (though not always the same keyword arguments!) as the corresponding NumPy routines. 


In [115]:
import numpy as np  # Everybody always abbreviates numpy as np
import torch        # the PyTorch package is called 'torch' for short. 


* vectorizing = speed
* slicing 
* broadcasting
* adding & removing axes
* transposing
* `random` and seeds
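
As a rough preview of the items in that list, here is a minimal NumPy-only sketch. The array names and the seed value 42 are arbitrary choices for illustration, and `np.random.default_rng` is just one of several ways to get reproducible random numbers.

```python
import numpy as np

rng = np.random.default_rng(42)      # seeded generator: same numbers on every run
x = rng.random((3, 4))               # 3 rows, 4 columns of uniform randoms in [0, 1)

# vectorizing: operate on the whole array at once, no Python loop
y = 2 * x + 1

# slicing: first two rows, every other column
sub = x[:2, ::2]                     # shape (2, 2)

# broadcasting: a length-4 row of means is stretched across all 3 rows
col_means = x.mean(axis=0)           # shape (4,)
centered = x - col_means             # shape (3, 4)

# adding & removing axes
x3 = x[np.newaxis, :, :]             # shape (1, 3, 4)
x2 = x3.squeeze(axis=0)              # back to shape (3, 4)

# transposing: swap rows and columns
xt = x.T                             # shape (4, 3)

print(sub.shape, centered.shape, x3.shape, x2.shape, xt.shape)
```

Re-running the cell gives identical values in `x` because the generator is seeded; drop the seed and the numbers change each time.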

## Functions & Methods

### lambda is just a function with no name
I'm not going to make you write "lambda" functions, but if you see one don't worry: it's just an unnamed function (i.e. there's no `def some_name():` line). And because functions are ordinary objects in Python, you can *give it* a name by assigning it to a variable. So the following two snippets do the same thing:

In [4]:
def a(x): return x+5
print("a =",a(4))

b = lambda x: x+5 
print(f"b = {b(4)}")

a = 9
b = 9


...Oh yeah, see in that last line I used an "f-string." They've been a feature since Python 3.6. We'll use them a lot because I think they're great. But if you try to use an older Python interpreter (e.g. Python 2.7), you'll get a syntax error.  Actually everything we'll be doing will assume at least Python 3.6. Let's see what version we're running:

In [7]:
import sys
print(f"Python version is {sys.version}")

Python version is 3.7.10 (default, Feb 20 2021, 21:17:23) 
[GCC 7.5.0]
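
A typical place you'll run into a lambda is as a small throwaway argument to another function, for example the `key` of `sorted`. The sketch below uses made-up data (the `scores` list is purely illustrative) and also shows an f-string with a format specifier.

```python
# Hypothetical (name, accuracy) pairs, invented for this example
scores = [("alice", 0.91234), ("bob", 0.87), ("carol", 0.95)]

# The lambda picks out the second element of each pair to sort by
ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)

for name, acc in ranked:
    # ':.2f' inside the braces formats the number with 2 decimal places
    print(f"{name}: {acc:.2f}")
```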


### Generators are not that bad
A [generator](https://www.learnpython.org/en/Generators) is just a function with the word "yield" in it (which works kind of like "return"), and generators can be used as iterators (e.g. in `for` loops & list comprehensions).  Here's a generator. 

In [13]:
def gen(z, step=1):
    count = 0
    for i in range(z):
        count += step
        yield count

print([x for x in gen(10)])
print([x for x in gen(10,step=-2)])

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[-2, -4, -6, -8, -10, -12, -14, -16, -18, -20]


They are often used to supply the next batch of data to something, like we'll do when training our neural networks.  But this will mostly be done for us by our library routines; I don't think you'll need to write your own generators. 
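
To make the "next batch of data" idea concrete, here is a minimal sketch of a batching generator. The function name `batches` and the toy data are hypothetical; in practice the library's data-loading routines play this role for us.

```python
def batches(data, batch_size):
    """Yield successive chunks of `data`, each at most `batch_size` long."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

samples = list(range(10))            # stand-in for a dataset of 10 items
for batch in batches(samples, batch_size=4):
    print(batch)                     # the last batch is shorter: 10 is not a multiple of 4
```

Nothing is computed until the loop asks for the next batch, which is why generators suit datasets too large to hold in memory all at once.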



## Classes 
and namespaces

## Imports

## Modules vs. Packages?

"How do I make my own Python package?" is something we can cover later. 

 

## Data types & Function types


### Lists 

 

### Dictionaries 
[Dictionaries](https://www.learnpython.org/en/Dictionaries) are great, especially as fast "[hashes](https://en.wikipedia.org/wiki/Hash_function)", i.e. as fast look-up functions between two sets of data.  In the Pandas data science library, a DataFrame can be built from a dictionary of columns and is indexed by column name in much the same way. 
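
As a small illustration of the look-up idea, the sketch below maps class labels to integer indices and back. The label names are made up for the example.

```python
labels = ["cat", "dog", "bird"]      # hypothetical class names

# Build look-ups in both directions with dictionary comprehensions
label_to_idx = {label: i for i, label in enumerate(labels)}
idx_to_label = {i: label for label, i in label_to_idx.items()}

print(label_to_idx["dog"])           # look up by key
print(idx_to_label[2])
print(label_to_idx.get("fish", -1))  # .get() returns a default instead of raising KeyError
```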




## Loops
Different ways of writing loops can be faster or slower than others, and it depends on the application and the number of iterations. Generally,
* Try to avoid loops for simple things; instead prefer vectorized NumPy operations, e.g. use `.mean()` on a NumPy array instead of looping over all elements, keeping a running sum, and then dividing by the number of iterations.

In [112]:
arr = np.random.rand(99999999)  # generate lots o' numbers

In [113]:
%%timeit
arr.mean()

10 loops, best of 5: 69.4 ms per loop


In [114]:
%%timeit 
total, n = 0, len(arr)   # 'total' rather than 'sum', to avoid shadowing the built-in sum()
for i in range(n):
    total += arr[i]
mean = total/n

1 loop, best of 5: 28 s per loop


...yeah.  Which one is faster?


* When we write loops, there are 3 main ways we do it. Letting `y` denote some iterable such as `range(n)` or a list, these 3 ways are:
   1. standard loops: `for x in y: _something_involving_x` (and yes, loops can be one-liners or they can span multiple lines)
   2. list comprehensions: `[_something_involving_x for x in y]`
   3. (more advanced) `map` operations: `_some_list_ = list(map(func, y))`

Handy loop iterators:
* range 
* enumerate() & zip() 
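
Here is a quick sketch of those three in action. The layer names and sizes are invented for illustration only.

```python
names = ["conv1", "conv2", "fc"]     # hypothetical layer names
sizes = [64, 128, 10]                # hypothetical layer sizes

# range: count from 0 up to (but not including) n
squares = [i * i for i in range(4)]

# enumerate: get the index along with each item
numbered = [f"{i}: {name}" for i, name in enumerate(names)]

# zip: walk through two sequences in lockstep
paired = {name: size for name, size in zip(names, sizes)}

print(squares)
print(numbered)
print(paired)
```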

If you're looping in order to build up a bunch of data or do a bunch of single operations, then [list comprehensions](https://www.learnpython.org/en/List_Comprehensions) are usually preferable.  

Let's do a comparison:
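
One way to set up such a comparison outside of a `%%timeit` cell is with the standard-library `timeit` module. This is only a sketch: the helper names (`square`, `with_loop`, and so on) and the problem size are arbitrary, and timings vary by machine, so no numbers are quoted here.

```python
import timeit

def square(x):
    return x * x

data = list(range(100_000))

def with_loop():
    out = []
    for x in data:
        out.append(square(x))
    return out

def with_comprehension():
    return [square(x) for x in data]

def with_map():
    return list(map(square, data))

# All three produce the same list; only the speed differs
assert with_loop() == with_comprehension() == with_map()

for fn in (with_loop, with_comprehension, with_map):
    seconds = timeit.timeit(fn, number=5)    # total time for 5 calls
    print(f"{fn.__name__:20s} {seconds / 5 * 1e3:.2f} ms per call")
```

Try changing the size of `data` or the body of `square`: which approach wins can depend on both.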

## Plotting:
* matplotlib 
* Others

## Pandas stuff:
