IS6733: Python Crash Course


## Part 1: Python Background

### Python, the Language

Python is an **interpreted** language.

 - Contrast with **compiled** languages
 - Performance, ease-of-use
 - Modern intertwining and blurring of compiled vs interpreted languages

Python is a very **general** language.

 - Not designed as a specialized language for performing a specific task. Instead, it relies on third-party developers to provide these extras.

## Part 2: Language Basics

The most basic thing possible: Hello, World!

In [None]:
print("Hello, world!")

Hello, world!


Yep, that's all that's needed!

(Take note: the most visible difference between Python 2 and 3 is `print`. In Python 2 it was a *statement* (a language construct) rather than a function, so you didn't need parentheses around the string you wanted printed; in Python 3, it's a full-fledged *function*, and therefore requires parentheses.)

### Variables and Types

Python is dynamically typed, meaning you don't have to declare types when you assign variables; the type belongs to the value and is determined at runtime. Python is also *duck-typed*, a colloquialism that means an object is judged by what it can do (the methods and operations it supports) rather than by its declared type ("if it walks like a duck and quacks like a duck...").
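
As a rough illustration of duck typing, the sketch below calls `len()` on three unrelated types. The names `things` and `lengths` are made up for this example; the point is only that the call works on anything that supports it and fails at runtime on anything that doesn't.

```python
# Three unrelated types that all support len()
things = [[1, 2, 3], "hello", {"a": 1, "b": 2}]
lengths = [len(thing) for thing in things]
print(lengths)  # [3, 5, 2]

# An int has no length, so the same call fails at runtime
try:
    len(5)
    failed = False
except TypeError as err:
    failed = True
    print("TypeError:", err)
```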

In [None]:
x = 5
type(x)

int

In [None]:
y = 5.5
type(y)

float

It's important to note: even though you don't have to specify a type, Python still assigns a type to variables. It would behoove you to know the types so you don't run into tricky type-related bugs!

In [None]:
x = 5 * 5

What's the type for `x`?

In [None]:
type(x)

int

In [None]:
y = 5 / 5

What's the type for `y`?

In [None]:
type(y)

float
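
Here is one way this `int` versus `float` distinction can bite in practice; it is a small sketch with a made-up list called `items`. In Python 3, `/` always produces a float, while `//` (floor division) keeps two integers as an integer, which matters when the result is used as an index.

```python
items = ["a", "b", "c", "d"]

middle = len(items) / 2       # true division always yields a float: 2.0
print(type(middle))           # <class 'float'>

try:
    items[middle]             # list indices must be integers
    index_failed = False
except TypeError as err:
    index_failed = True
    print("TypeError:", err)

middle = len(items) // 2      # floor division of two ints yields an int: 2
print(items[middle])          # c
```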

There are functions you can use to explicitly *cast* a variable from one type to another:

In [None]:
x = 5 / 5
type(x)

float

In [None]:
y = int(x)
type(y)

int

In [None]:
z = str(y)
type(z)

str
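
A few casting edge cases are worth seeing once. The sketch below uses illustrative variable names; it shows that `int()` truncates rather than rounds, and that a string holding a decimal number cannot be cast straight to `int`.

```python
truncated = int(5.9)          # 5: drops the fractional part, does not round
negative = int(-5.9)          # -5: truncation is toward zero
rounded = round(5.9)          # 6: use round() when rounding is intended
parsed = float("5.5")         # 5.5: strings can be parsed into numbers
print(truncated, negative, rounded, parsed)

try:
    int("5.5")                # not a valid integer literal
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("ValueError:", err)

two_step = int(float("5.5"))  # 5: parse as float first, then truncate
print(two_step)
```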

### Data Structures

There are four main types of built-in Python data structures, each similar but ever-so-slightly different:

 1. Lists (the Python workhorse)
 2. Tuples
 3. Sets
 4. Dictionaries

(Note: generators and comprehensions are worthy of mention; definitely look into these as well)
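
As a quick preview of the comprehensions and generators mentioned in the note above, here is a minimal sketch with made-up variable names. Comprehensions build a list, set, or dictionary in a single expression; a generator expression produces its items lazily, one at a time.

```python
squares = [n ** 2 for n in range(5)]             # list comprehension
evens = {n for n in range(10) if n % 2 == 0}     # set comprehension
lengths = {word: len(word) for word in ["a", "bb", "ccc"]}  # dict comprehension
print(squares)   # [0, 1, 4, 9, 16]
print(lengths)   # {'a': 1, 'bb': 2, 'ccc': 3}

# A generator expression computes values on demand instead of storing them
lazy_squares = (n ** 2 for n in range(5))
total = sum(lazy_squares)
print(total)     # 30
```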

Lists are basically your catch-all multi-element data structure; they can hold anything.

In [None]:
some_list = [1, 2, 'something', 6.2, ["another", "list!"], 7371]
print(some_list[3])
type(some_list)

6.2


list

Tuples are like lists, except they're *immutable* once you've built them (and denoted by parentheses, instead of brackets).

In [None]:
some_tuple = (1, 2, 'something', 6.2, ["another", "list!"], 7371)
print(some_tuple[5])
type(some_tuple)

7371


tuple
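
To make the immutability concrete, the sketch below (using a made-up tuple called `point`) tries to reassign an element. Note that the tuple itself cannot change, but a mutable object stored inside it, such as a list, still can.

```python
point = (1, 2, ["a", "b"])

try:
    point[0] = 99             # tuples do not support item assignment
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# The tuple still holds the same list object, but that list is mutable
point[2].append("c")
print(point)                  # (1, 2, ['a', 'b', 'c'])
```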

Sets are probably the most different: they are mutable (can be changed), but are *unordered* and *can only contain unique items* (they automatically drop duplicates you try to add). They are denoted by braces.

In [None]:
some_set = {1, 1, 1, 1, 1, 86, "something", 73}
some_set.add(1)
print(some_set)
type(some_set)

{73, 1, 86, 'something'}


set
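
Sets also support the usual mathematical set operations. The following is a small sketch with made-up sets `a` and `b`; the results are passed through `sorted()` before printing because a set's own print order is not something to rely on.

```python
a = {1, 2, 3}
b = {3, 4}

union = a | b          # items in either set
common = a & b         # items in both sets
only_a = a - b         # items in a but not in b
print(sorted(union))   # [1, 2, 3, 4]
print(sorted(common))  # [3]
print(sorted(only_a))  # [1, 2]

# Membership tests and de-duplicating a list are common uses
print(2 in a)          # True
unique_sorted = sorted(set([3, 1, 3, 2, 1]))
print(unique_sorted)   # [1, 2, 3]
```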

Finally, dictionaries. Other terms that may be more familiar include: maps, hashmaps, or associative arrays. Their *keys* behave like a set (each key is unique), and each key maps to a *value*, which can be any object.

In [None]:
some_dict = {"key": "value", "another_key": [1, 3, 4], 3: ["this", "value"]}
print(some_dict["another_key"])
type(some_dict)

[1, 3, 4]


dict

Dictionaries explicitly set up a mapping from *keys*, which are unique like the items of a set, to *values*, which can be arbitrary objects (including lists). Since Python 3.7, dictionaries also preserve the order in which keys were inserted. These are very powerful structures for data science applications.
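
The everyday dictionary operations can be sketched as follows. The `inventory` dictionary and its contents are invented for this example.

```python
inventory = {"apples": 3, "pears": 0}

inventory["plums"] = 7             # assigning to a new key adds an entry
inventory["apples"] += 1           # assigning to an existing key updates it

# .get() returns a default instead of raising KeyError for a missing key
missing = inventory.get("kiwis", 0)
print(missing)                     # 0
print("pears" in inventory)        # True: membership tests look at keys

# .items() yields (key, value) pairs, in insertion order
for name, count in inventory.items():
    print(name, count)

keys = list(inventory)
print(keys)                        # ['apples', 'pears', 'plums']
```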

### Slicing and Indexing

Ordered data structures in Python are 0-indexed (like C, C++, and Java). This means the first element is at index 0:

In [None]:
print(some_list)

[1, 2, 'something', 6.2, ['another', 'list!'], 7371]


In [None]:
index = 0
print(some_list[index])

1


However, using colon notation, you can "slice out" entire sections of ordered structures.

In [None]:
start = 0
end = 3
print(some_list[start : end])

[1, 2, 'something']


Note that the starting index is *inclusive*, but the ending index is *exclusive*. Also, if you omit the starting index, Python assumes you mean 0 (start at the beginning); likewise, if you omit the ending index, Python assumes you mean "go to the very end".

In [None]:
print(some_list[:end])

[1, 2, 'something']


In [None]:
start = 1
print(some_list[start:])

[2, 'something', 6.2, ['another', 'list!'], 7371]
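
Slicing has a few more tricks that follow the same rules. The sketch below uses a made-up list called `letters` to show negative indices (which count from the end), an optional third "step" value, and the fact that out-of-range slice bounds are clipped rather than raising an error.

```python
letters = ["a", "b", "c", "d", "e", "f"]

last = letters[-1]          # negative indices count from the end
tail = letters[-2:]         # the last two elements
every_other = letters[::2]  # a third value sets the step
backwards = letters[::-1]   # a step of -1 walks the list in reverse
clipped = letters[4:100]    # slice bounds past the end are clipped

print(last)         # f
print(tail)         # ['e', 'f']
print(every_other)  # ['a', 'c', 'e']
print(backwards)    # ['f', 'e', 'd', 'c', 'b', 'a']
print(clipped)      # ['e', 'f']
```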


### Loops

Python supports two kinds of loops: `for` and `while`.

`for` loops in Python are, in practice, closer to *for each* loops in other languages: they iterate through collections of items, rather than incrementing indices.

In [None]:
for item in some_list:
    print(item)

1
2
something
6.2
['another', 'list!']
7371


 - the collection to be iterated through is at the end (`some_list`)
 - the current item being iterated over is given a variable after the `for` statement (`item`)
 - the loop body says what to do in an iteration (`print(item)`)

But if you need to iterate by index, check out the `enumerate` function:

In [None]:
for index, item in enumerate(some_list):
    print("{}: {}".format(index, item))

0: 1
1: 2
2: something
3: 6.2
4: ['another', 'list!']
5: 7371


`while` loops operate as you've probably come to expect: there is some associated boolean condition, and as long as that condition remains `True`, the loop will keep happening.

In [None]:
i = 0
while i < 10:
    print(i)
    i += 2

0
2
4
6
8


**IMPORTANT**: Do not forget to perform the *update* step in the body of the `while` loop! After using `for` loops, it's easy to become complacent and think that Python will update things automatically for you. If you forget that critical `i += 2` line in the loop body, this loop will go on forever...

Another cool looping utility is the `zip()` function, for when you have multiple collections of identical length that you want to loop through simultaneously:

In [None]:
list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = [7, 8, 9]

for x, y, z in zip(list1, list2, list3):
    print("{} {} {}".format(x, y, z))

1 4 7
2 5 8
3 6 9


This "zips" together the lists and picks corresponding elements from each for every loop iteration. Way easier than trying to set up a numerical index to loop through all three simultaneously, but you can even combine this with `enumerate` to do exactly that:

In [None]:
for index, (x, y, z) in enumerate(zip(list1, list2, list3)):
    print("{}: ({}, {}, {})".format(index, x, y, z))

0: (1, 4, 7)
1: (2, 5, 8)
2: (3, 6, 9)
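
The "identical length" caveat matters because `zip()` stops silently at the shortest input. The sketch below uses made-up lists to show this, along with `itertools.zip_longest` from the standard library, which pads the shorter inputs instead.

```python
from itertools import zip_longest

short = [1, 2]
longer = ["a", "b", "c"]

# zip() stops as soon as the shortest input runs out; "c" is dropped
pairs = list(zip(short, longer))
print(pairs)     # [(1, 'a'), (2, 'b')]

# zip_longest() keeps going and fills the gaps with fillvalue
padded = list(zip_longest(short, longer, fillvalue=None))
print(padded)    # [(1, 'a'), (2, 'b'), (None, 'c')]
```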


### Conditionals

Conditionals, or `if` statements, allow you to branch the execution of your code depending on certain circumstances.

In Python, this entails three keywords: `if`, `elif`, and `else`.

In [None]:
grade = 82
if grade > 90:
    print("A")
elif grade > 80:
    print("B")
else:
    print("Something else")

B


A couple of important differences from C/C++/Java:

 - Parentheses around the boolean condition are **not** required (they are allowed, but unidiomatic), and the condition line ends with a colon.
 - It's not "`else if`" or "`elseif`", just "`elif`". It's admittedly weird, but it's Python.

Conditionals, when used with loops, offer a powerful way of slightly tweaking loop behavior with two keywords: `continue` and `break`.

The former is used when you want to skip an iteration of the loop, but nonetheless keep going on to the *next* iteration.

In [None]:
list_of_data = [4.4, 1.2, 6898.32, "bad data!", 5289.24, 25.1, "other bad data!", 52.4]

for x in list_of_data:
    if isinstance(x, str):
        continue
    
    # This stuff gets skipped anytime the "continue" is run
    print(x)

4.4
1.2
6898.32
5289.24
25.1
52.4


`break`, on the other hand, slams the brakes on a loop: it exits the innermost enclosing loop immediately, and execution resumes at the first statement after that loop.

In [None]:
import random

i = 0
iters = 0
while True:
    iters += 1
    i += random.randint(0, 10)
    if i > 1000:
        break

print(iters)

192
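
With nested loops, `break` only leaves the loop it sits in. A small sketch, using made-up loop variables, shows the outer loop carrying on after each inner `break`:

```python
visited = []
for outer in range(3):
    for inner in range(3):
        if inner == 1:
            break                      # leaves only the inner loop
        visited.append((outer, inner))
    # execution resumes here, and the outer loop continues

print(visited)   # [(0, 0), (1, 0), (2, 0)]
```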


### File I/O

Python has built-in file I/O through the `open` function. There are usually dedicated libraries (standard or third-party) that expedite reading certain often-used formats (JSON, XML, binary formats, etc.), but you should still be familiar with input/output handles and how they work:

In [None]:
text_to_write = "I want to save this to a file."
f = open("some_file.txt", "w")
f.write(text_to_write)
f.close()

This code writes the string on the first line to a file named `some_file.txt`. We can read it back:

In [None]:
f = open("some_file.txt", "r")
from_file = f.read()
f.close()
print(from_file)

I want to save this to a file.


Take note of what changed: when writing, we passed `"w"` as the mode argument to `open` (which overwrites any existing file of that name), but when reading we passed `"r"`. Hopefully this is easy to remember.

Also, when reading/writing *binary* files, you have to include a "b": `"rb"` or `"wb"`.
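
In practice, file handles are usually opened in a `with` block, which closes the file automatically so there is no `f.close()` to forget. The following is a minimal sketch; the temporary directory and file contents are invented for the example so that it cleans up after itself.

```python
import os
import tempfile

# A throwaway directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "some_file.txt")

    # The file is closed automatically when the with-block ends,
    # even if an exception is raised inside it
    with open(path, "w") as f:
        f.write("first line\nsecond line\n")

    # Iterating over a file handle yields one line at a time
    with open(path, "r") as f:
        lines = [line.rstrip("\n") for line in f]
    print(lines)        # ['first line', 'second line']

    # Binary mode returns bytes rather than str
    with open(path, "rb") as f:
        raw = f.read()
    print(type(raw))    # <class 'bytes'>

print(f.closed)         # True
```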

### Functions

A core tenet in writing functions is that **functions should do one thing, and do it well**.

Writing good functions makes code *much* easier to troubleshoot and debug, as the code is already logically separated into components that perform very specific tasks. Thus, if your application is breaking, you usually have a good idea where to start looking.

**WARNING**: It's very easy to get caught up writing "god functions": one or two massive functions that essentially do everything you need your program to do. But if something breaks, this design is very difficult to debug.


Functions have a header definition and a body:

In [None]:
def some_function():  # This line is the header
    pass              # Everything after (that's indented) is the body

This function doesn't do anything, but it's perfectly valid. We can call it:

In [None]:
some_function()

Not terribly interesting, but a good outline. To make it interesting, we should add input arguments and return values:

In [None]:
def vector_magnitude(vector):
    d = 0.0
    for x in vector:
        d += x ** 2
    return d ** 0.5

In [None]:
v1 = [1, 1]
d1 = vector_magnitude(v1)
print(d1)

1.4142135623730951


In [None]:
v2 = [53.3, 13.4]
d2 = vector_magnitude(v2)
print(d2)

54.95862079783298


### NumPy Arrays

If you looked at our previous `vector_magnitude` function and thought "there must be an easier way to do this", then you were correct: that easier way is NumPy arrays.

NumPy arrays are like Python lists backed by a ton of compiled C code that makes them *really* efficient. Unlike lists, every element of an array has the same type.

Two areas where they excel: *vectorized programming* and *fancy indexing*.

Vectorized programming is perfectly demonstrated with our previous `vector_magnitude` function: since we're performing the same operation on every element of the vector, NumPy allows us to build code that implicitly handles the loop:

In [None]:
import numpy as np

def vectorized_magnitude(vector):
    return (vector ** 2).sum() ** 0.5

In [None]:
v1 = np.array([1, 1])
d1 = vectorized_magnitude(v1)
print(d1)

1.4142135623730951


In [None]:
v2 = np.array([53.3, 13.4])
d2 = vectorized_magnitude(v2)
print(d2)

54.95862079783298
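
For this particular computation NumPy also ships a ready-made function, `np.linalg.norm`. The sketch below, on a made-up vector `v`, checks it against the hand-written expression and shows that ordinary arithmetic operators act element by element.

```python
import numpy as np

v = np.array([3.0, 4.0])

manual = (v ** 2).sum() ** 0.5       # same expression as vectorized_magnitude
builtin = np.linalg.norm(v)          # NumPy's built-in Euclidean norm
print(manual, builtin)               # 5.0 5.0

# Arithmetic operators apply element by element, with no explicit loop
doubled = v * 2                      # array([6., 8.])
shifted = v + np.array([1.0, 1.0])   # array([4., 5.])
print(doubled, shifted)
```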


We've also seen indexing and slicing before; here, however, NumPy really shines.

Let's say we have a large three-dimensional array:

In [None]:
X = np.random.random((500, 600, 250))

We can take statistics of any dimension or slice we want:

In [None]:
X[:400, 100:200, 0].mean()

0.5001888592259797

In [None]:
X[X < 0.01].std()

0.0028837986611768814

In [None]:
X[:400, 100:200, 0].mean(axis = 1)

array([0.4878191 , 0.46028259, 0.49667229, 0.45756306, 0.51257128,
       0.53048905, 0.50963822, 0.51340647, 0.48215825, 0.50589002,
       0.4886673 , 0.5139198 , 0.44234373, 0.50035936, 0.48160639,
       0.45752935, 0.42551659, 0.48189657, 0.48596965, 0.51624658,
       0.52221284, 0.52593491, 0.49238735, 0.48110024, 0.4913599 ,
       0.47982085, 0.46193525, 0.47952403, 0.51137355, 0.54527014,
       0.5220307 , 0.55954275, 0.44463562, 0.41147046, 0.53053387,
       0.46298719, 0.52452601, 0.5154674 , 0.4990779 , 0.54857314,
       0.52371635, 0.49242696, 0.53536509, 0.49064221, 0.51248257,
       0.4614997 , 0.50810832, 0.48652515, 0.47253665, 0.45422089,
       0.55162731, 0.48968268, 0.49745104, 0.52680833, 0.49167122,
       0.47097084, 0.55001122, 0.49735161, 0.52115446, 0.50438363,
       0.51154354, 0.48603022, 0.47546669, 0.50887932, 0.52319255,
       0.48577708, 0.55447458, 0.49989952, 0.47529532, 0.51035091,
       0.50167409, 0.49601202, 0.44814726, 0.51657128, 0.50609

In [None]:
X[:400, 100:200, 0].mean(axis = 1).shape

(400,)
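
The same operations are easier to follow on an array small enough to check by hand. This is a sketch using a made-up 3-by-4 array `A`: `axis=1` collapses the columns and leaves one value per row, `axis=0` does the opposite, and a boolean mask selects the elements where the condition holds.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
# A is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

row_means = A.mean(axis=1)    # one mean per row:    [1.5, 5.5, 9.5]
col_means = A.mean(axis=0)    # one mean per column: [4., 5., 6., 7.]
print(row_means.shape)        # (3,)
print(col_means.shape)        # (4,)

block = A[:2, 1:3]            # rows 0-1, columns 1-2: [[1, 2], [5, 6]]
large = A[A > 8]              # boolean mask: [ 9, 10, 11]
print(block)
print(large)
```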