### Introduction

This is Part 1 of Recitation 0 for the Fall 2019 iteration of the course 11-785: Intro to Deep Learning. The recitation is split into three parts: Part 1 introduces relevant Python skills and libraries, Part 2 goes into more depth on vector operations using numpy, and Part 3 covers AWS, SageMaker, IPython, and Jupyter Notebook.

### Python 

All information about Python, from downloads to documentation, can be found at https://www.python.org. We recommend that you use Python 3 for the homeworks in this course.


### Modules

Install external modules using the pip package manager (https://pypi.org/project/pip/). Modules we'll be using in the course include numpy and torch (PyTorch).

#### Importing Modules

In [None]:
import numpy
a = numpy.array([1,2,3])

import numpy as np
a = np.array([1,2,3])

from math import ceil

from math import * 
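
The import styles above differ only in which names they bind in the current namespace. Here is a minimal sketch using the standard-library `math` module (chosen only because it needs no installation; the alias `m` is an arbitrary name picked for illustration):

```python
import math                # binds the name `math`; members are accessed as math.ceil
import math as m           # binds the same module object under the alias `m`
from math import ceil      # binds only the function `ceil` directly

print(math.ceil(2.1))      # 3
print(m.ceil(2.1))         # 3
print(ceil(2.1))           # 3
print(m is math)           # True: an alias does not create a second copy of the module
```

`from math import *` binds every public name of the module at once, which can silently shadow names you already defined, so the explicit forms are usually easier to debug.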

### File Formats and Loading Data 

 - .txt: plain text file
 - .pkl: pickled Python objects (saved using the pickle module)
 - .csv: tabular data - fields separated by commas
 - .npz: zipped archive of npy files
 - .npy: numpy arrays (saved using numpy library)
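
The `.npy` and `.npz` formats from the list above can be written and read with `np.save`, `np.savez`, and `np.load`. This is a minimal sketch; the file names and the keys `x` and `y` are hypothetical, and a temporary directory is used so nothing is left on disk:

```python
import os
import tempfile

import numpy as np

arr = np.arange(6).reshape(2, 3)
labels = np.array([0, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "features.npy")   # hypothetical file name
    npz_path = os.path.join(tmp_dir, "dataset.npz")    # hypothetical file name

    np.save(npy_path, arr)                  # one array per .npy file
    loaded_arr = np.load(npy_path)

    np.savez(npz_path, x=arr, y=labels)     # several named arrays in one .npz archive
    with np.load(npz_path) as archive:
        names = sorted(archive.files)       # the keys the arrays were saved under
        loaded_x = archive["x"]
        loaded_y = archive["y"]

print(loaded_arr)
print(names)
```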

In [13]:
f = open("someRandomFile.txt", "r")

print(f.readlines())

f.close()

['This is the first line of text,\n', 'Followed by the second']


In [14]:
with open("someRandomFile.txt", "r") as f:
    print(f.readlines())

['This is the first line of text,\n', 'Followed by the second']


More about file i/o functions: 
More about file access modes: https://stackoverflow.com/questions/16208206/confused-by-python-file-mode-w 
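
As a rough illustration of the most common access modes, the sketch below writes, appends to, and then reads a small text file. The file name is hypothetical, and a temporary directory keeps the example self-contained:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:   # "w": create the file, or truncate it if it exists
        f.write("first line\n")

    with open(path, "a") as f:   # "a": append to the end, keeping existing content
        f.write("second line\n")

    with open(path, "r") as f:   # "r": read only; raises FileNotFoundError if missing
        lines = f.readlines()

print(lines)  # ['first line\n', 'second line\n']
```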

In [15]:
import pickle

mydict = {"student1": "Alice", "Student2": "Bob", "Student3": "Rachel"}

# use context managers so both file handles are closed after use
with open("store.pkl", "wb") as f:
    pickle.dump(mydict, f)

with open("store.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded)

{'student1': 'Alice', 'Student2': 'Bob', 'Student3': 'Rachel'}


In [16]:
import csv

with open("SacramentocrimeJanuary2006.csv", "r") as f:
    reader = csv.reader(f, delimiter=",")
    
    i = 0 
    
    for row in reader:
        if (i == 10):
            break
        print(row)
        i+=1        

['cdatetime', 'address', 'district', 'beat', 'grid', 'crimedescr', 'ucr_ncic_code', 'latitude', 'longitude']
['1/1/06 0:00', '3108 OCCIDENTAL DR', '3', '3C        ', '1115', '10851(A)VC TAKE VEH W/O OWNER', '2404', '38.55042047', '-121.3914158']
['1/1/06 0:00', '2082 EXPEDITION WAY', '5', '5A        ', '1512', '459 PC  BURGLARY RESIDENCE', '2204', '38.47350069', '-121.4901858']
['1/1/06 0:00', '4 PALEN CT', '2', '2A        ', '212', '10851(A)VC TAKE VEH W/O OWNER', '2404', '38.65784584', '-121.4621009']
['1/1/06 0:00', '22 BECKFORD CT', '6', '6C        ', '1443', '476 PC PASS FICTICIOUS CHECK', '2501', '38.50677377', '-121.4269508']
['1/1/06 0:00', '3421 AUBURN BLVD', '2', '2A        ', '508', '459 PC  BURGLARY-UNSPECIFIED', '2299', '38.6374478', '-121.3846125']
['1/1/06 0:00', '5301 BONNIEMAE WAY', '6', '6B        ', '1084', '530.5 PC USE PERSONAL ID INFO', '2604', '38.52697863', '-121.4513383']
['1/1/06 0:00', '2217 16TH AVE', '4', '4A        ', '957', '459 PC  BURGLARY VEHICLE', '22

### Storing Data

 - lists: generic container - allow for numeric indexing
 - tuples: immutable lists
 - dictionaries: keys act as indices - keys must be unique
 - sets: group of unique elements

In [17]:
same_type_list = [1, 3, 4, 89, 23, 43, 90]
diff_type_list = [1, 3, "hello", 4.9, "c"]

print(same_type_list[3])
print(len(same_type_list))

same_type_list[0] = "I'm new"
print(same_type_list)

print(same_type_list[8])

89
7
["I'm new", 3, 4, 89, 23, 43, 90]


IndexError: list index out of range

In [18]:
new_list = same_type_list + diff_type_list
print(new_list)

["I'm new", 3, 4, 89, 23, 43, 90, 1, 3, 'hello', 4.9, 'c']


In [19]:
new_list_2 = ["hi", "hello"] * 2
print(new_list_2)

['hi', 'hello', 'hi', 'hello']


In [20]:
same_type_tuple = (1, 10, 7)
diff_type_tuple = (1, 2, "foo") 

print(diff_type_tuple[2])
print(same_type_tuple[1])

same_type_tuple[0] = 3

foo
10


TypeError: 'tuple' object does not support item assignment

In [22]:
my_dict = {"student1": "Alice", "student2": "Bob", "student3": "Rachel"}

print(my_dict["student1"])
# print(my_dict["student4"])
print(my_dict.get("student4", "student does not exist"))

Alice
student does not exist


In [23]:
my_dict["student1"] = "Billy"
print(my_dict)

{'student1': 'Billy', 'student2': 'Bob', 'student3': 'Rachel'}
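
A few other dictionary operations come up often: membership tests, inserting new keys, iterating over key/value pairs, and removing entries. The sketch below uses a small hypothetical dictionary rather than `my_dict`, so it can be run on its own:

```python
grades = {"student1": "Alice", "student2": "Bob"}   # hypothetical example data

print("student1" in grades)       # True: `in` tests keys, not values
grades["student3"] = "Rachel"     # assigning to a new key inserts it

for key, value in grades.items():  # iterate over (key, value) pairs
    print(key, value)

removed = grades.pop("student2")   # remove a key and return its value
print(removed, len(grades))        # Bob 2
```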


In [24]:
my_set = {"obj1", "obj2", "obj3"}
print(my_set)

{'obj1', 'obj2', 'obj3'}


In [25]:
print(my_set[1])

TypeError: 'set' object does not support indexing
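
Sets are unordered, which is why positional indexing fails. They are typically used for de-duplication, membership tests, and set algebra instead. A minimal sketch with hypothetical data:

```python
raw = ["obj1", "obj2", "obj1", "obj3", "obj2"]   # hypothetical example data

unique = set(raw)            # duplicates collapse to a single element
print(len(unique))           # 3
print("obj2" in unique)      # True

a = {1, 2, 3}
b = {3, 4}
union = a | b                # elements in either set
common = a & b               # elements in both sets
only_a = a - b               # elements in a but not in b

as_list = sorted(unique)     # convert to a sorted list when positions are needed
print(as_list[1])            # obj2
```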

In [26]:
# some more list/tuple functions

print(max(same_type_tuple))
print(min(same_type_tuple))
print(sorted(same_type_tuple))

10
1
[1, 7, 10]


### Filtering Lists

 - slicing and dicing
 - list comprehensions

In [27]:
# slicing & dicing
# general format: sliced_list = some_list[start_idx : stop_idx : step]
# stop_idx is exclusive, so to include the element at end_idx use stop_idx = end_idx + 1

some_list = [2, 5, 2, 45, 7, 9, 76, 80, 21, 53]
print(some_list[5:])
print(some_list[:3])
print(some_list[3:9:2])

[9, 76, 80, 21, 53]
[2, 5, 2]
[45, 9, 80]
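
Slices also accept negative indices and a negative step, and a slice that runs past the end of the list returns fewer elements rather than raising an error. A short sketch that redefines the same list so it runs on its own:

```python
some_list = [2, 5, 2, 45, 7, 9, 76, 80, 21, 53]

print(some_list[-1])     # 53: negative indices count from the end
print(some_list[-3:])    # [80, 21, 53]: the last three elements
print(some_list[::3])    # [2, 45, 76, 53]: every third element
print(some_list[::-1])   # a reversed copy of the list
print(some_list[20:])    # []: out-of-range slices are empty, not an IndexError
```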


#### Slicing with 2D and 3D arrays

3D arrays are indexed across the 3 dimensions as follows:

![alt text](3d_array.png)

If you think of a 3D array as a stack of matrices, i selects the matrix, j selects the row within that matrix, and k selects the column within that matrix.

In [28]:
# slicing 3D array examples:

three_d_array = np.array([[[10, 11, 12], [13, 14, 15], [16, 17, 18]],
               [[20, 21, 22], [23, 24, 25], [26, 27, 28]],
               [[30, 31, 32], [33, 34, 35], [36, 37, 38]]])




In [29]:
# ------------ selecting a row ------------
# you want to specify the matrix, then the row

print(three_d_array[0,2]) # matrix 0, row 2

[16 17 18]


In [30]:
# ------------ selecting a column ------------
# you want to specify the matrix, ignore the row, and then specify the column

print(three_d_array[1, :, 1]) # matrix 1, column 1

[21 24 27]


In [31]:
# ------------ selecting a matrix ------------

print(three_d_array[2]) # matrix 2

[[30 31 32]
 [33 34 35]
 [36 37 38]]


In [32]:
# ------------ creating a row across matrices ------------
print(three_d_array[:, 1, 2]) # for every matrix, row 1, column 2 

[15 25 35]


In [33]:
# ------------ creating a matrix from rows ------------
print(three_d_array[:, 1]) # for every matrix, row 1

[[13 14 15]
 [23 24 25]
 [33 34 35]]


In [34]:
# ------------ creating a matrix from columns ------------
print(three_d_array[:, :, 1]) # for every matrix, for every row, column 1

[[11 14 17]
 [21 24 27]
 [31 34 37]]


You can also slice within rows, columns and matrices in a 3D array, the same way you would in 1D arrays. 
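
For example, range slices can be combined across dimensions. The sketch below rebuilds the same array so it is self-contained; the variable names are chosen only for illustration:

```python
import numpy as np

three_d_array = np.array([[[10, 11, 12], [13, 14, 15], [16, 17, 18]],
                          [[20, 21, 22], [23, 24, 25], [26, 27, 28]],
                          [[30, 31, 32], [33, 34, 35], [36, 37, 38]]])

first_two_rows = three_d_array[0, :2]      # matrix 0, rows 0 and 1
print(first_two_rows)

corner = three_d_array[1:, 1:, 1:]         # matrices 1-2, rows 1-2, columns 1-2
print(corner.shape)                        # (2, 2, 2)

every_other = three_d_array[::2, 0, ::2]   # matrices 0 and 2, row 0, columns 0 and 2
print(every_other)
```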

In [35]:
# list comprehensions
# general format: new_list = [expression for_loop_one_or_more conditions]

res = [num for num in same_type_list if isinstance(num, int) and num>0]
print(res)


[3, 4, 89, 23, 43, 90]
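
A comprehension can contain more than one `for` clause, and the same syntax extends to dictionaries. The sketch below shows a two-loop comprehension next to the equivalent nested loop; the variable names are illustrative only:

```python
# two for-clauses plus a condition, read left to right like nested loops
pairs = [(x, y) for x in range(3) for y in range(3) if x < y]
print(pairs)  # [(0, 1), (0, 2), (1, 2)]

# the equivalent explicit loops
pairs_loop = []
for x in range(3):
    for y in range(3):
        if x < y:
            pairs_loop.append((x, y))

# dict comprehension: key and value expressions separated by a colon
squares = {n: n * n for n in range(4)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9}
```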


In [36]:
%%timeit 

res2 = []
res2 = [i for i in range(500)]

22.1 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [37]:
%%timeit

res3 = []
for i in range(500):
    res3.append(i)

51.4 µs ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
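
The `%%timeit` magic only works inside IPython or Jupyter. In a plain script, a similar comparison can be sketched with the standard-library `timeit` module. The function names below are hypothetical, and the measured times will vary by machine:

```python
import timeit


def build_with_comprehension():
    return [i for i in range(500)]


def build_with_append():
    res = []
    for i in range(500):
        res.append(i)
    return res


# timeit.timeit returns the total time in seconds for `number` calls
t_comp = timeit.timeit(build_with_comprehension, number=1000)
t_loop = timeit.timeit(build_with_append, number=1000)
print(f"comprehension: {t_comp:.4f} s, append loop: {t_loop:.4f} s")
```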


### Classes

You'll use classes extensively in the homeworks in this course. You will need classes for defining your models as well as your datasets. We'll consider an example of implementing the dataset class here. 

In [None]:
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def __getitem__(self, index):    
        return (self.x[index], self.y[index])
    
    def __len__(self):
        return len(self.x)   
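
To see what the two special methods do, the sketch below uses `PlainDataset`, a hypothetical stand-in with the same interface as `MyDataset` but without the `torch` base class, so it runs without PyTorch installed. The manual batching at the end is only a rough illustration of how a loader could combine `__len__` and `__getitem__`; it is not PyTorch's actual `DataLoader` implementation.

```python
class PlainDataset:
    """Hypothetical stand-in for MyDataset with no torch dependency."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        return (self.x[index], self.y[index])

    def __len__(self):
        return len(self.x)


# made-up features and labels
dataset = PlainDataset([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0, 1, 0])

print(len(dataset))   # len() calls __len__          -> 3
print(dataset[1])     # indexing calls __getitem__   -> ([0.3, 0.4], 1)

# group consecutive samples into batches of at most batch_size
batch_size = 2
batches = [
    [dataset[i] for i in range(start, min(start + batch_size, len(dataset)))]
    for start in range(0, len(dataset), batch_size)
]
print(len(batches))   # 2 batches: one with two samples, one with one
```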
        

### Debugging - pdb

In [None]:
import pdb

for i in range(5):
    pdb.set_trace()
    print("this is iteration " + str(i))

> <ipython-input-38-469cc786a009>(5)<module>()
-> print("this is iteration " + str(i))
(Pdb) next
this is iteration 0
> <ipython-input-38-469cc786a009>(3)<module>()
-> for i in range(5):
(Pdb) next
> <ipython-input-38-469cc786a009>(4)<module>()
-> pdb.set_trace()
(Pdb) next
> <ipython-input-38-469cc786a009>(5)<module>()
-> print("this is iteration " + str(i))
