# Python Fundamentals

**Contents:**
   1. Python
   2. Sequence Types
        1. Strings
        2. Lists
        3. Tuples
   3. Mappable Types
       1. Dictionaries
       2. Sets
   4. NumPy 
       1. Advantages
       2. Implementation
       3. Vectorization
   5. Pandas

## Part 1: Python
* Python is object oriented; in Python, everything is an object
* The most common implementation of Python is CPython
  * CPython is built primarily in C and Python
  * Many of the built-in functions map directly to C functions, making it relatively easy to understand language implementation details
  * There is also a C API which allows users to call C/C++ code from within Python as well as to extend the Python interpreter!

### The Zen of Python

In [34]:
import this # A Python easter egg!

Some essential functions for exploring Python objects:

In [35]:
l = [1, 3, 5, 7, 11]
type(l) # Return class of object

list

In [36]:
dir(l)[40:45] # See attributes of Python objects

['index', 'insert', 'pop', 'remove', 'reverse']

In [37]:
help(l.pop) # Display class or function documentation

Help on built-in function pop:

pop(index=-1, /) method of builtins.list instance
    Remove and return item at index (default last).
    
    Raises IndexError if list is empty or index is out of range.



## Part 2: Sequences

Sequence types in Python share a set of common operations for subsetting and manipulation. These include:

|||
| --- | --- |
| Indexing | Multiplying |
| Slicing | Membership |
| Concatenating | Iterating |



### Strings
* In Python strings are an _immutable_ type; therefore all string methods return new strings rather than modifying the original

**Declaration**

In [38]:
string1 = "a b c d e f g" # Declare string
string1

'a b c d e f g'

**Indexing**

In [40]:
print(string1[0]) # First item
print(string1[-1]) # Last item

print(string1.replace('a', 'z'))
string1 # Original not modified

a
g
z b c d e f g


'a b c d e f g'

**Slicing**
* Syntactic sugar for indexing
* Takes the form $seq[start:stop:step]$, where $stop$ is exclusive.

In [41]:
print(string1[0:3]) # Index 0, 1 and 2
print(string1[:5]) # Index 0 to 4
print(string1[4:]) # Index 4 onward
print(string1[::2]) # Every second item
print(string1[::-1]) # Reverse

a b
a b c
c d e f g
abcdefg
g f e d c b a


**Membership**

In [42]:
'a' in string1 and 'b' in string1 # Test each item separately; 'a' and 'b' in string1 would only check 'b'

True
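
A pitfall worth sketching here: `and` binds more loosely than `in`, so an expression of the form `'x' and 'b' in s` only tests the last item. The snippet below is a minimal illustration; the names `s`, `misleading`, `explicit` and `with_all` are introduced only for this example.

```python
s = "a b c d e f g"

# Parsed as 'x' and ('b' in s): the non-empty string 'x' is truthy,
# so the result depends only on whether 'b' is in s.
misleading = 'x' and 'b' in s

# Test each item explicitly, or use all() when there are several items.
explicit = 'x' in s and 'b' in s
with_all = all(ch in s for ch in ('a', 'b'))

print(misleading, explicit, with_all)  # True False True
```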

**Iterating**
* Strings (and other sequences) are iterable

In [43]:
for char in string1[::2]:
    print(char)

a
b
c
d
e
f
g


**Concatenation and Repetition**

In [45]:
print(string1 + 'xyz') # The plus operator concatenates strings
print(string1 * 3) # The multiplication operator repeats strings

a b c d e f gxyz
a b c d e f ga b c d e f ga b c d e f g


There are lots of other string methods. See [the string method docs](https://docs.python.org/3/library/stdtypes.html#string-methods) for more information.

### Lists
* Unlike strings, lists are *mutable*! This means they support in-place modification, no copies needed
* Lists are the 'workhorse' of Python; they can store any Python object, from base types to user-implemented objects
* The basic implementation of a Python list is an array of pointers to the memory locations of the objects it contains
  * This makes lists flexible but fairly 'heavy' in terms of memory usage and frequency of allocation
  * Despite this, lists are more than fast enough for most operations
* Lists extend the sequence type, therefore they are *ordered* and *sortable*
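
Because a list holds references rather than the objects themselves, two names can point at the same list, and an in-place change through one name is visible through the other. The following is a small sketch of that behaviour; the variable names are chosen only for illustration.

```python
import sys

original = [1, 2, 3]
alias = original            # A second name for the same list object
shallow = original.copy()   # A new list holding references to the same items

alias.append(4)             # In-place modification through one name...

print(original)             # [1, 2, 3, 4]  ...is visible through the other
print(shallow)              # [1, 2, 3]     the copy is unaffected
print(alias is original)    # True

# Size of the list object itself (essentially the pointer array), in bytes.
# The items it refers to are not included, and the number varies by platform.
print(sys.getsizeof(original))
```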

**Declaration**

In [46]:
list1 = [1, 2, 3, 5, 7] # Lists are declared with square brackets
list1

[1, 2, 3, 5, 7]

**Slicing**

In [47]:
list1 = string1.split(' ') # Split string to list on ' '
print(list1)
print(list1[:3])
print(list1[::2])
print(list1[2::-2])

['a', 'b', 'c', 'd', 'e', 'f', 'g']
['a', 'b', 'c']
['a', 'c', 'e', 'g']
['c', 'a']


**List Methods**

Python provides various methods that modify lists:    

|__method__     |__Description__        |
|:--- |:---|
| list.append(item)    | Append an _item_ to the end of the list.                                                           |
| list.extend(list_1)    | Append the _items_ in the `list_1` to the list.                                       |
| list.pop([index])    | Remove the item with that _index_ from the list or at the end of the list if an index is not given.|
| list.remove(item)    | Remove the first occurrence of the item.                                                     |
| list.reverse( )      | Reverse the list.                                                                           |
| list.insert(index, item)| Insert an item at the given index; subsequent items are shifted to the right. |

In [None]:
list1 = [1,2,4,5,6]

In [49]:
list1.append(5) # Append item to list
print(list1)
list1.extend(['hotdog', 'donut']) # Append list to list
print(list1)
print(list1.pop()) # Remove the last item
print(list1)

[1, 2, 4, 5, 6, 5]
[1, 2, 4, 5, 6, 5, 'hotdog', 'donut']
donut
[1, 2, 4, 5, 6, 5, 'hotdog']


For mutable types, methods (called with dot notation) tend to modify the object *in-place*

In [55]:
#list1.remove('hotdog')
print(list1)
list1 = [1, 2, 3, 4, 5]
list1.sort(reverse=True) # Sort in-place
print(list1)

[1, 2, 4, 5, 6, 5, 'hotdog']
[5, 4, 3, 2, 1]


Functions that *return* a value tend to leave the original untouched and return a modified *copy*

In [56]:
print(sorted(list1)) # Return a sorted copy (ascending order)
print(list1) # First list unmodified

[1, 2, 3, 4, 5]
[5, 4, 3, 2, 1]


In [57]:
print(list1 + [1,2,3]) # Concatenate to list and return a copy

[5, 4, 3, 2, 1, 1, 2, 3]


In [18]:
list1 * 3 # Repeat list and return a copy

[5, 4, 3, 2, 1, 5, 4, 3, 2, 1, 5, 4, 3, 2, 1]

**Membership**

In [58]:
2 in list1

True

**Iterating**

In [59]:
for item in list1:
    if item > 3:
        print(item)
    else:
        print('Nope!')

5
4
Nope!
Nope!
Nope!


#### List Comprehension

* One of the most convenient things about Python lists (and other sequence/mappable types) is a language feature called 'comprehension'

In [62]:
[ x % 2 for x in list1]

[1, 0, 1, 0, 1]

In [22]:
[ y ** 5 for y in list1 if y < 3]

[32, 1]

The general structure of a list comprehension can be written as follows:

    [<output expression> <loop expression <input expression>> <optional predicate expression>]
    
A list comprehension is always enclosed in brackets. It starts with an expression followed by a `for` clause, then zero or more additional `for` or `if` clauses.

The result is a much more compact and clean syntax. Compare:
1. The old way

In [63]:
listA = [15, 35, 76, 83, 910, 1234]
listB = [1234, 234, 83, 3, 4, 5]

new_list = []

for a in listA:
    for b in listB:
        if a == b: 
            new_list.append(a)

print(new_list)

[83, 1234]


2. The new way

In [64]:
[a for a in listA for b in listB if a == b]

[83, 1234]

This syntax can be confusing at first, but with a bit of practice you will rarely want to write a loop any other way. 

It isn't just syntactic sugar, though. Because of how the feature is implemented, a list comprehension is often noticeably faster than the equivalent `for` loop with `append` (up to about two times, depending on the Python version and the work done per item)!

The absolute time saved grows with the size of the input, so the gap between a list comprehension and either a `for` loop or the `map` function becomes more visible on large inputs.
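
Timing claims like this are easy to check on your own machine with the standard-library `timeit` module. The sketch below compares a plain loop with a comprehension on the same data; the function names and the input size are arbitrary choices for illustration, and the measured times will vary with hardware and Python version.

```python
import timeit

data = list(range(10_000))

def with_loop():
    # Build the result by appending inside an explicit for loop
    result = []
    for x in data:
        result.append(x * 2)
    return result

def with_comprehension():
    # Build the same result with a list comprehension
    return [x * 2 for x in data]

# Both approaches must produce identical output before timing them
assert with_loop() == with_comprehension()

loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)
print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```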

### Tuples
* Tuples are very similar to lists, but are immutable!
* As a result they tend to be faster than lists, and should be preferred over lists if the contents aren't expected to change during execution

**Declaration**

In [66]:
tup = tuple('yeiouAEIOU') # Split string into tuple with constructor
tup

('y', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U')

In [67]:
tup = 1, 2, 3, 4, 5 # Can declare tuples as comma separated list
tup

(1, 2, 3, 4, 5)

In [69]:
tup = 2, # You can declare single item tuples
type(tup)

tuple

**Unpacking**
* A useful feature of tuples is unpacking, allowing multiple variables to be assigned to the environment in a single line
* This is particularly important as Python functions can technically only return a single object

In [70]:
x, y, z = 'a', 'b', 'c' # Declare multiple variables
print(x)
print(y)
print(z)

a
b
c


In [71]:
list1 = [1, 2, 3]
a, b, c = list1
print(a)
print(b)
print(c)

1
2
3


In [72]:
list1 = [(x, x**2) for x in range(6)] # Returns a list of tuples
list1

[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
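
Unpacking is also what makes "multiple return values" work: the function returns one tuple and the caller unpacks it. A minimal sketch follows; the helper `min_max` is a hypothetical function written only for this example.

```python
def min_max(values):
    """Return the smallest and largest item as a single tuple."""
    return min(values), max(values)  # The comma builds a tuple

result = min_max([4, 1, 7])
print(result)            # (1, 7)

low, high = result       # Unpack the tuple into two names
print(low, high)         # 1 7

first, *rest = [1, 2, 3, 4]  # Starred unpacking collects the remainder in a list
print(first, rest)       # 1 [2, 3, 4]
```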

**Mutability**
* Confusingly, tuples are immutable, but the objects they contain are **NOT**!

In [74]:
tup = 1, 'a', [1, 3, 4], 'hotdog'
print(tup)
tup[1] = 'b' # Error

(1, 'a', [1, 3, 4], 'hotdog')


TypeError: 'tuple' object does not support item assignment

In [75]:
tup[2][0] = 5
tup

(1, 'a', [5, 3, 4], 'hotdog')

**Zips and Iterators**
* The `zip()` function is an example of a function that returns an iterator of tuples
* This function pairs up the elements from multiple sequences, starting with the first values, then the second, etc.

In [76]:
list_num = [1,2,3,4,5]
list_alpha =['a','b','c','d','e']
zipped = zip(list_num, list_alpha)

In [79]:
type(zipped)

zip

Zips are useful for creating dictionaries, which will be discussed further in subsequent sections.

In [80]:
keys = ["a", "b", "c", "d"]
values = [1, 2, 3, 4]
new_dict = dict(zip(keys, values))
new_dict

{'a': 1, 'b': 2, 'c': 3, 'd': 4}

The object returned by `zip()` is also an **iterator**.

An **iterator** in Python is an object that yields its items one at a time and can be looped over using (for example) a `for` statement. Unlike a list, it cannot be indexed and can only be traversed once. An iterator is created implicitly when we loop over Python lists, tuples, or dictionaries.

In [81]:
hasattr(zipped, '__iter__') and hasattr(zipped, '__next__') # Iterators implement both methods

True

You can create an iterator from any iterable type using the `iter()` function:

In [None]:
iterator = iter(list_num)
iterator

Let's loop over `zipped`:

In [82]:
for i in zipped:
    print(i)

(1, 'a')
(2, 'b')
(3, 'c')
(4, 'd')
(5, 'e')


In [83]:
zipped_list = list(zipped) # Doesn't work because an iterator can only be traversed once!
zipped_list # An empty list; the iterator has been iterated already

[]
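
The single-pass behaviour can be seen step by step with `next()`. The sketch below uses shortened copies of the lists from above; if the pairs are needed more than once, one option is to materialize them into a list first.

```python
list_num = [1, 2, 3]
list_alpha = ['a', 'b', 'c']

zipped = zip(list_num, list_alpha)
print(next(zipped))     # (1, 'a')  consume one item by hand
print(list(zipped))     # [(2, 'b'), (3, 'c')]  only the remaining items
print(list(zipped))     # []  the iterator is now exhausted

try:
    next(zipped)
except StopIteration:
    print("No items left")   # next() signals exhaustion with StopIteration

# Materialize once if the pairs need to be traversed repeatedly
pairs = list(zip(list_num, list_alpha))
print(pairs)            # [(1, 'a'), (2, 'b'), (3, 'c')]
print(pairs)            # [(1, 'a'), (2, 'b'), (3, 'c')]  a list can be reused
```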

## Part 3: Mappable Types
* These types are common to most (all?) programming languages, but the name varies
  * Names: hash, hash map, associative array, map...
* They are implemented similarly to lists, but instead of using sequential indexes, the index at which an object is placed is calculated by a *hash function* of the key for that item
  * E.g., if a key is 'key', a potential (but bad) hash function could be len('key') + 1
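
To make the idea concrete, here is a toy hash table that stores key-value pairs in a list of buckets and picks the bucket from `hash(key)`. It is a teaching sketch only: the class `ToyHashTable` and its methods are hypothetical, it resolves collisions by chaining within a bucket, and CPython's real `dict` uses a different and far more optimized scheme (open addressing).

```python
class ToyHashTable:
    """A toy key-value store using separate chaining (illustration only)."""

    def __init__(self, n_buckets=8):
        # Each bucket is a list of (key, value) pairs
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # The bucket index is calculated from the hash of the key
        return hash(key) % len(self.buckets)

    def set(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # Overwrite an existing key
                return
        bucket.append((key, value))        # Otherwise add a new pair

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable()
table.set('one', 1)
table.set('two', 2)
table.set('one', 11)         # Same key again: the value is replaced
print(table.get('one'))      # 11
print(table.get('two'))      # 2
```

Lookup only has to search one bucket rather than the whole collection, which is why key access stays fast as the table grows.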

### Dictionaries
* In Python the object storing key:value pairs is called a `dict`
* It is technically the only mappable base type implemented in Python as of version 3.8.1, but we will talk about an edge case later

**Declaration**

In [84]:
# There are lots of ways to make a dictionary
a = dict(one=1, two=2, three=3) # Constructor
a

{'one': 1, 'two': 2, 'three': 3}

In [85]:
b = {'one': 1, 'two': 2, 'three': 3} # Literal (curly brace) notation
c = dict(zip(['one', 'two', 'three'], [1, 2, 3])) # Zip of two lists
d = dict([('two', 2), ('one', 1), ('three', 3)]) # List of tuples
e = dict({'three': 3, 'one': 1, 'two': 2}) # Constructor + literal notation

In [86]:
a == b == c == d == e

True

**Basic Operations**

In [87]:
len(d) # Length, number of items

3

In [88]:
'one' in d # Check for keys

True

In [89]:
print(a.pop('one')) # Remove the item with the given key and return its value
a

1


{'two': 2, 'three': 3}

In [90]:
print(a.popitem()) # Removes the most recently inserted item, returns a (key, value) tuple
a

('three', 3)


{'two': 2}

In [91]:
list(d) # Using the list constructor returns a list of the keys

['two', 'one', 'three']

**Subsetting and Assignment**

In [92]:
d['one'] # Select item by key

1

In [93]:
d['two'] = 4 # Assign a new value at key (modifies the dict in place)
d

{'two': 4, 'one': 1, 'three': 3}

In [94]:
d['five'] # Bracket lookup raises KeyError if key is not present (bracket assignment would create the key)

KeyError: 'five'

In [95]:
d.get('five') # Using get, a default value is returned if key is not in dict (default defaults to None)

In [98]:
d.setdefault('five', 5) # Insert key with the default if missing; returns the value stored at the key
# Note: keys must be hashable, so d.setdefault([1, 2, 3], 5) raises TypeError: unhashable type: 'list'

5
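
The difference between `get` and `setdefault` is easiest to see side by side: `get` never changes the dictionary, while `setdefault` inserts the key when it is missing. A short sketch on a fresh dictionary (the name `counts` is only for illustration):

```python
counts = {'one': 1}

print(counts.get('two'))             # None  key missing, no default given
print(counts.get('two', 0))          # 0     explicit default
print(counts)                        # {'one': 1}  get() did not modify the dict

print(counts.setdefault('two', 2))   # 2  key missing: inserted, default returned
print(counts.setdefault('one', 99))  # 1  key present: existing value returned
print(counts)                        # {'one': 1, 'two': 2}
```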

**Dictionary Views**

A dictionary view, similar to a view in SQL, is a Python object composed purely of references (pointers) to the original dictionary. Views let you look at a dictionary's keys, values, or items without copying them, and they stay in sync when the original dictionary changes.

In [99]:
d.values() # We see the returned type is not a dictionary

dict_values([4, 1, 3, 5])

In [100]:
d.keys()

dict_keys(['two', 'one', 'three', 'five'])

In [101]:
print(id(d.values()))
print(id(d.values())) # We can see that every call to d.values returns a different view object
d.values() == d.values()

140335378924240
140335378926864


False

In [103]:
d_vals = d.values()
print(id(d_vals))
print(id(d_vals)) # Now they are the same
d_vals == d_vals

140335378911824
140335378911824


True

In [104]:
view = d.items() # Get a view of the dictionary
view

dict_items([('two', 4), ('one', 1), ('three', 3), ('five', 5)])

In [105]:
d['two'] = 10
view # Modification at key 'two' is also present in the view

dict_items([('two', 10), ('one', 1), ('three', 3), ('five', 5)])

In [106]:
val_list = list(d.values()) # Can coerce to a list, this will copy the values (no longer a view)
val_list

[10, 1, 3, 5]

**Iteration**

In [107]:
for k, v in d.items():
    print("Keys: ", k)
    print("Values: ", v)

Keys:  two
Values:  10
Keys:  one
Values:  1
Keys:  three
Values:  3
Keys:  five
Values:  5


**Dictionary Comprehensions**

In [108]:
empty_dict = {k: 0 for k in ['a', 'b', 'c']}
empty_dict

{'a': 0, 'b': 0, 'c': 0}

In [109]:
modified_values = {k: v**2 for k, v in d.items()}
modified_values # Now squared all the values

{'two': 100, 'one': 1, 'three': 9, 'five': 25}

In [110]:
modified_keys = {k: v for k in ["c", "d", "e", "f"] for v in d.values() if v != 5}
modified_keys # Nested loops: each key is assigned every value != 5 in turn, so only the last one (3) survives

{'c': 3, 'd': 3, 'e': 3, 'f': 3}
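
Two `for` clauses in a comprehension behave like nested loops, so every key is assigned each value in turn and only the last assignment is kept. If the intent is to pair the i-th new key with the i-th value, `zip` is one way to do it. The sketch below uses a stand-in dictionary named `values` with the same contents as `d` above.

```python
values = {'two': 10, 'one': 1, 'three': 3, 'five': 5}
new_keys = ['c', 'd', 'e', 'f']

# Nested loops: each key is overwritten by every value that passes the filter
nested = {k: v for k in new_keys for v in values.values() if v != 5}
print(nested)   # {'c': 3, 'd': 3, 'e': 3, 'f': 3}

# zip pairs keys and values positionally; the filter drops the pair whose value is 5
paired = {k: v for k, v in zip(new_keys, values.values()) if v != 5}
print(paired)   # {'c': 10, 'd': 1, 'e': 3}
```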

## Part 4: NumPy

NumPy is short for Numerical Python, and is essential to most Python workflows. While Python does provide an array type in the `array` module, it is rarely used because it only supports 1D arrays and lacks much of the rich functionality that makes the `ndarray` so fundamental to much of the Python infrastructure.

NumPy provides an efficient representation of arrays, matrices and tensors, with a range of numerical operations defined for the object type.

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numeric), all of the same type and indexed by a tuple of non-negative integers. In NumPy (and Pandas), dimensions are referred to as axes.

In [3]:
import numpy as np
import sys

### Declaration

There are many ways to make an array. One of the most common is converting from a list.

In [4]:
array = np.array([1.5, 3.2, 7.6]) # Make an array from a list
array

array([1.5, 3.2, 7.6])

Another is using the `arange` function. It takes the form **$numpy.arange([start, ]stop, [step, ]dtype=None)$**

**REMEMBER**: Arguments in `[]` are optional! This is standard notation in the Python docs.

In [5]:
range_arr = np.arange(15)
range_arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

The `arange` function has additional value beyond declaring arrays.

You can use it to make numeric code faster! Operations applied to a whole array (for example `np.arange(n) * 2`) move the looping out of base Python and into 'NumPy Space', where the iteration is executed in highly optimized C code. Note that simply writing a Python `for` loop over an `arange` array does not give this benefit; each element still has to pass through the interpreter, and this is typically slower than looping over `range`.

**TL;DR** - When working with a numeric sequence, replace the loop with an operation on an `arange` array where you can; that is where the speedup comes from!
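
A rough way to see where the time goes is to compare three variants of the same computation: a comprehension over `range`, a comprehension over an `arange` array, and a single whole-array operation. This is only a sketch; the function names and the size `n` are arbitrary, and timings depend on the machine and library versions.

```python
import timeit
import numpy as np

n = 100_000

def python_loop():
    # Pure Python: iterate over range and double each value
    return [i * 2 for i in range(n)]

def loop_over_array():
    # Still a Python-level loop, just over NumPy scalars
    return [i * 2 for i in np.arange(n)]

def vectorized():
    # One whole-array operation; the loop runs inside NumPy's C code
    return np.arange(n) * 2

# All variants compute the same values
assert python_loop() == vectorized().tolist()

for func in (python_loop, loop_over_array, vectorized):
    seconds = timeit.timeit(func, number=10)
    print(f"{func.__name__:>16}: {seconds:.4f} s")
```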

There are a number of other declaration methods for different use cases; we will get into them later.

### Array Attributes

The `ndarray` type in NumPy comes with a variety of built-in properties describing the object.

In [8]:
range_arr.ndim    # How many dimensions does the array have?

1

In [9]:
range_arr.shape    # What are the dimensions of the array? (rows, columns, ...)

(15,)

In [10]:
range_arr.dtype    # What datatype are the array elements?

dtype('int64')

In [11]:
range_arr.itemsize    # How big is one array element (in bytes)?

8

In [12]:
range_arr.size    # How many array elements are there?

15

In [13]:
range_arr.nbytes    # How big is the array in bytes?

120

In [15]:
range_arr.data    # Returns a buffer object containing the array data

<memory at 0x7f076b29a390>

### Slicing

The same powerful *slicing* syntax that works for strings, lists and other sequence types also works for `ndarrays`.


**RECALL:** Slicing syntax follows the $start:stop:step$ pattern.

In [61]:
range_arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Suppose you want to retrieve 3, 4, 5. How would you do it?

In [64]:
range_arr[]

array([3, 4, 5])

How about the number 2, 5, 8?

In [None]:
range_arr[]

What about 13, 11, 9, 7?

In [None]:
range_arr[]
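
For reference once you have tried the three exercises, here is one possible set of answers. Other slices give the same results (for example, a different `stop` that falls in the same gap), so treat these as a sketch rather than the only solution.

```python
import numpy as np

range_arr = np.arange(15)

print(range_arr[3:6])       # [3 4 5]        start at 3, stop before 6
print(range_arr[2:9:3])     # [2 5 8]        every third element from index 2
print(range_arr[13:6:-2])   # [13 11  9  7]  negative step walks backwards
```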

The slicing syntax is simple for 1D arrays, but can get complicated (and powerful) when working with multi-dimensional arrays. We will revisit this after discussing different ways to initialize `arrays`.

### Initialization

It is advantageous to preallocate large arrays in situations where you know the size that you need. There are several NumPy functions that make this easy to do!

By preallocating your array, you avoid the repeated memory allocations and copies that take place as the array grows; this speeds up your code, especially for large arrays.
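
The contrast can be sketched as follows: growing an array with `np.append` creates a new array (and copies the existing elements) on every call, whereas a preallocated array is allocated once and filled in place. The size `n` and the variable names are arbitrary choices for this example.

```python
import numpy as np

n = 1_000

# Growing: np.append returns a new array each time, copying all existing elements
grown = np.array([], dtype=np.float64)
for i in range(n):
    grown = np.append(grown, i ** 2)

# Preallocating: one allocation up front, then fill each slot in place
prealloc = np.zeros(n, dtype=np.float64)
for i in range(n):
    prealloc[i] = i ** 2

print(np.array_equal(grown, prealloc))  # True
```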

#### Arrays of zeros: 

$numpy.zeros(shape, dtype=float, order='C')$

**ASIDE:** What does the argument `order` do?


In [23]:
zero_array = np.zeros((2, 5, 3))

zero_array.shape

In [24]:
zero_array

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]])

**\*ASIDE:**
Row vs Column Major Order:
  - Newly created arrays are stored in memory contiguously
  - The 'shape' of an array does not change the representation in memory, it just changes the way the array is indexed
  - There are two organization schemes, row-major and column-major:
  
  ![](https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Row_and_column_major_order.svg/170px-Row_and_column_major_order.svg.png)
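
The two layouts can be inspected directly. The sketch below builds the same 2x3 array in both orders and prints the strides (how many bytes to step along each axis) and the elements in the order they sit in memory. The `dtype` is fixed to `int64` here only so that the stride values are the same on every platform.

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)  # Row-major (order='C'), the default
f = np.asfortranarray(a)                        # Same values, column-major (order='F')

print(np.array_equal(a, f))   # True  indexing gives identical values

print(a.strides)              # (24, 8)  rows are 3 * 8 bytes apart, columns 8 bytes
print(f.strides)              # (8, 16)  rows are 8 bytes apart, columns 2 * 8 bytes

print(a.ravel(order='K'))     # [0 1 2 3 4 5]  memory order: one row after another
print(f.ravel(order='K'))     # [0 3 1 4 2 5]  memory order: one column after another
```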
  
There is an array property that lets us see details about the internal representation of an array:

In [26]:
zero_array.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

Here, *C_CONTIGUOUS* indicates row-major order (the C convention) and *F_CONTIGUOUS* indicates column-major order (the Fortran convention).

#### Arrays of ones:

$numpy.ones(shape, dtype=None, order='C')$

In [66]:
ones_array = np.ones((2, 3, 4), dtype='int8', order='F')

In [67]:
ones_array

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int8)

In [68]:
ones_array.flags    # Now our array is in column-major order

  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

Understanding the internal representation of arrays helps us peel back the simplicity of Python syntax and get a deeper understanding of how the language works. Row vs column major order is utilized in NumPy to perform by-reference transposition of matrix and tensor objects. In this operation the underlying data is not changed, just the order of the index.

In [69]:
print(ones_array.T)

[[[1 1]
  [1 1]
  [1 1]]

 [[1 1]
  [1 1]
  [1 1]]

 [[1 1]
  [1 1]
  [1 1]]

 [[1 1]
  [1 1]
  [1 1]]]


In [70]:
ones_array.T.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

We can see here that a transpose operation (`.T`) inverts row vs column major order. Similar to slicing, it returns a *view* of the object, not a copy. Thus modifying a transposed array will change the original array.

In [71]:
ones_array.T[1, 0, 1] = 2

In [72]:
ones_array

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 2, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int8)

#### Arrays With Custom Values:

$numpy.full(shape, fill\_value, dtype=None, order='C')$

In [76]:
custom_array = np.full((2, 3), 0.+0.j)    # An array with complex numbers

In [75]:
custom_array

array([[0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j]])

In [77]:
custom_array.dtype

dtype('complex128')

We can also replicate the shape (and data type) of other arrays using `np.full_like`.

In [78]:
like_ones_array = np.full_like(ones_array, 4)

In [79]:
like_ones_array

array([[[4, 4, 4, 4],
        [4, 4, 4, 4],
        [4, 4, 4, 4]],

       [[4, 4, 4, 4],
        [4, 4, 4, 4],
        [4, 4, 4, 4]]], dtype=int8)

In [80]:
like_ones_array.shape == ones_array.shape

True

### Random Numbers

NumPy provides native support for random numbers and sampling via the `random` submodule. This makes NumPy useful in a wide range of statistical applications. In fact `SciPy`, one of the major packages for advanced mathematics in Python, is built on top of NumPy.
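
As a first taste, the sketch below draws a few samples using `np.random.default_rng`, the generator interface available in NumPy 1.17 and later. The seed value and the variable names are arbitrary; seeding simply makes the draws repeatable from run to run.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # Seeded generator for repeatable results

uniform = rng.random(3)                                  # Floats in [0, 1)
integers = rng.integers(0, 10, size=5)                   # Integers in [0, 10)
normal = rng.normal(loc=0.0, scale=1.0, size=(2, 2))     # Standard normal samples
sample = rng.choice([1, 3, 5, 7, 11], size=2, replace=False)  # Sample without replacement

print(uniform)
print(integers)
print(normal)
print(sample)
```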