# Python Introduction Tutorial

<table align="center">
<td align="center"><a target="_blank" href="http://inspiredk.org">
<img align="center" src="https://i.ibb.co/gbKhWJ9m/Inspired-K-org-Logo-No-Whitespace-Small.png" style="padding-bottom:10px;" />InspiredK.org Website</a></td>

<td align="center"><a target="_blank" href="https://colab.research.google.com/github/InspiredK-organization/PythonTutorial/blob/main/Lab0 - Python Introduction Tutorial.ipynb">
<img align="center" src="https://i.ibb.co/2P3SLwK/colab.png" style="padding-bottom:10px;" />Run in Google Colab</a></td>
</table>

In this lab, we will practice some Python basics that are especially helpful for AI, including Python's built-in data types (lists, tuples, and dictionaries), loops, and Python libraries (NumPy).

## 1. Python's Built-in Types

Python has several built-in types that are useful for storing and manipulating data: lists, tuples, and dictionaries.

Here is the official Python documentation on these types (and many others): https://docs.python.org/3/library/stdtypes.html.

### 1.1 Lists

Lists are ordered, changeable (mutable) sequences of items. Items can be added, removed, or replaced after the list is created.

#### 1.1.1 Build a List with Five Names (Zach, Jay, Richard, Abi, Kevin)

In [None]:
# Initialize an example list containing names.
names = ["Zach", "Jay"]

In [None]:
# Access the first name in the list with indexing.
print(names[0])

In [None]:
# Add a new name to the list with the .append method.
names.append("Richard")
print(names)

In [None]:
# Find the length of the list with the len() function.
print(len(names))

In [None]:
# Concatenate two lists with the += operator.
# Here, the += operator gives the same result as names = names + ["Abi", "Kevin"]. Similar shorthand operators (-=, *=, and /=) exist for other types of variables, such as numbers.
names += ["Abi", "Kevin"]
print(names)

#### 1.1.2 Other List Examples

In [None]:
# There are two ways to initialize an empty list.
more_names = [] # One using square brackets.
more_names = list() # One with the list() constructor.

In [None]:
# Lists can contain different data types without issues.
stuff = [1, ["hi", "bye"], -0.12, None] # Integers, inner lists, floats, null values, etc.
print(stuff)

#### 1.1.3 List Slicing 
List slicing accesses specific parts of a list by giving a range of indices. A slice returns a new list and leaves the original list unchanged.

In [None]:
# Initialize an example list of the integers 0 to 6.
# Always remember that Python indexing starts at 0.
numbers = [0, 1, 2, 3, 4, 5, 6]

# Slice the list from a given start index (inclusive) to a given end index (exclusive).
print(numbers[0:3])

In [None]:
# When start index is not specified, it is assumed to be the start of the list.
print(numbers[:3])

In [None]:
# When end index is not specified, it is assumed to be the end of the list.
print(numbers[5:])

In [None]:
# Using ':' on its own creates a (shallow) copy of the whole list. The same ':' syntax appears later with NumPy arrays, where it selects every element along a dimension.
print(numbers[:])

In [None]:
# Negative indices count from the end of the list, so -1 is the last element.
print(numbers[-1])

In [None]:
# Negative indices work in slices too: [-3:] selects the last three elements, and [3:-2] runs from index 3 up to (but not including) the second-to-last element.
print(numbers[-3:])
print(numbers[3:-2])
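
To make the copy behavior of `[:]` concrete, here is a minimal sketch that contrasts plain assignment with a full slice. The names `original`, `alias`, and `copied` are illustrative and not part of the lab.

```python
original = [0, 1, 2, 3, 4, 5, 6]

alias = original      # Plain assignment: both names refer to the same list.
copied = original[:]  # Full slice: a new list holding the same elements.

original[0] = 99      # Change the first element through the original name.

print(alias[0])   # 99, because alias is the same list as original.
print(copied[0])  # 0, because the copy was not affected by the change.
```

This is why a slice is a safe way to keep an unchanged version of a list before modifying it.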

### 1.2 Tuples

Tuples are like lists, but they cannot be changed after they are created: elements cannot be added, removed, or reassigned. Indexing and slicing still work because they only read from the tuple.

In [None]:
# Parentheses are used to define tuples, while lists use square brackets.
names = ("Zach", "Jay")

In [None]:
# To access an element from the tuple or get the length of the tuple, the syntax is the same as with lists.
print(names[0])
print(len(names))

In [None]:
# However, unlike lists, tuples do not allow item changes (this code will raise a TypeError).
names[0] = "Richard"

In [None]:
# In addition to using parentheses for tuple initialization, the tuple() constructor can be used as well.
empty = tuple()
print(empty)

# To create a tuple with a single item, a comma must be added after the item to ensure it is stored as a tuple.
single = (10,)
print(single)
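
As a small sketch of what "cannot be changed" means in practice, the example below reads from a tuple with a slice, catches the error raised by an attempted assignment, and shows why the trailing comma matters for single-item tuples. The names `point` and `not_a_tuple` are illustrative assumptions.

```python
point = (3, 4, 5)  # Hypothetical example tuple.

# Slicing only reads from the tuple, so it works and returns a new tuple.
print(point[:2])  # (3, 4)

# Assigning to an element raises a TypeError, which we catch here.
try:
    point[0] = 10
except TypeError as error:
    print("Cannot modify a tuple:", error)

# Without the trailing comma, the parentheses are just grouping.
not_a_tuple = (10)
single = (10,)
print(type(not_a_tuple))  # <class 'int'>
print(type(single))       # <class 'tuple'>
```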

### 1.3 Dictionaries

Dictionaries store mappings between keys and values. They are often used in AI to map characters to indices.

In [None]:
# There are two ways to initialize an empty dictionary.
phonebook = {} # One with curly brackets.
phonebook = dict() # One with the dict() constructor.

In [None]:
# To create a dictionary with one item, a key-value pair has to be defined.
phonebook = {"Zach": "12-37"} # In this example, it is a string-string pair, but other data types can be used as well.

In [None]:
# To add another item to the dictionary, a given value must be assigned to a given key for the dictionary.
phonebook["Jay"] = "34-23" # Another string-string pair, but dictionary keys or values don't have to be the same types.

In [None]:
# Sometimes, you need to check if a key is in the dictionary before referencing that key.
print("Zach" in phonebook) # Zach is in our example dictionary.
print("Kevin" in phonebook) # But Kevin is not.

In [None]:
# The value for a key can be retrieved by indexing the dictionary with that key in square brackets.
print(phonebook["Jay"]) # What will be outputted?

In [None]:
# The del keyword is used to delete a key-value pair from a dictionary.
del phonebook["Zach"]
print(phonebook) # What will the example dictionary look like now?
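
To illustrate why checking for a key matters, the sketch below shows what happens when a missing key is referenced, along with the built-in `dict.get` method as an alternative to the `in` check. The dictionary name `contacts` is illustrative, chosen so the lab's `phonebook` is left untouched.

```python
contacts = {"Jay": "34-23"}  # Hypothetical dictionary for this example.

# Referencing a key that does not exist raises a KeyError.
try:
    print(contacts["Kevin"])
except KeyError as error:
    print("Missing key:", error)

# The .get() method returns None (or a chosen default) instead of raising an error.
print(contacts.get("Kevin"))             # None
print(contacts.get("Kevin", "unknown"))  # unknown
print(contacts.get("Jay"))               # 34-23
```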

## 2. Loops
Loops are statements that repeat a block of code, often once for each item in a collection. Python has two types of loops: for loops and while loops. Both are important, but for loops are the most commonly used in AI.
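
The rest of this section focuses on for loops. For comparison, here is a minimal sketch of a while loop, which repeats its body for as long as a condition stays true. The variable name `countdown` and its starting value are illustrative.

```python
countdown = 3  # Hypothetical starting value.

# The loop body runs again and again while the condition is True.
while countdown > 0:
    print(countdown)
    countdown -= 1  # Without this line, the condition would never become False.

print("Done")
```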

### 2.1 For Loops

In [None]:
# A for loop iterates over the items of an iterable, such as a list, a dictionary, or a range of numbers.
for i in range(5): # This line is the basis for the for loop - the general format is `for <variable> in <iterable>:`
    # Every line of code inside the loop will start with spaces or a tab, called indentation.
    # This helps tell the computer what is inside the loop and what is not.
    print(i)

#### 2.1.1 Looping Through Lists

In [None]:
# To iterate over a list, the process is similar:
names = ["Zach", "Jay", "Richard"]
for name in names: # Using the same base formatting from before.
    print(name)

In [None]:
# There are two ways to iterate over the values in a list and their indices.

# First, looping through the indices and accessing the corresponding list item for each index.
for i in range(len(names)):
    print(i, names[i])

In [None]:
# Second, using the enumerate() function on the list to get each index and its value at the same time.
for i, name in enumerate(names):
    print(i, name)

#### 2.1.2 Looping Through Dictionaries

In [None]:
# For dictionaries, there are several different ways to iterate over them.
phonebook = {"Zach": "12-37", "Jay": "34-23"}

# To iterate over the keys, use the same format as the loop for the list.
for name in phonebook:
    print(name)

print()

# Another way to iterate over the keys is to call the .keys() method on the dictionary.
for name in phonebook.keys():
    print(name)

In [None]:
# To iterate over the values, the .values() method can be used in the same way as .keys().
for number in phonebook.values():
    print(number)

In [None]:
# To iterate over the keys and values in a dictionary together, the .items() method is used with a format similar to the enumerate() function from list looping.
for name, number in phonebook.items():
    print(name + ': ' + number)
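
Section 1.3 noted that dictionaries are often used in AI to map characters to indices. As a minimal sketch of that idea, a for loop can build such a mapping and then use it to encode a string. The sample string and the names `text`, `char_to_index`, and `encoded` are illustrative assumptions, not part of the lab.

```python
text = "hello"  # Hypothetical input string.

# Give each new character the next unused index.
char_to_index = {}
for char in text:
    if char not in char_to_index:
        char_to_index[char] = len(char_to_index)

print(char_to_index)  # {'h': 0, 'e': 1, 'l': 2, 'o': 3}

# Use the mapping to turn the string into a list of indices.
encoded = []
for char in text:
    encoded.append(char_to_index[char])

print(encoded)  # [0, 1, 2, 2, 3]
```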

## 3. Python Libraries

### 3.1 NumPy
NumPy is a Python library that adds support for large, multi-dimensional arrays and matrices as well as a large collection of optimized, high-level mathematical functions to use with these arrays.

If you are doing this lab on Google Colab, then you can skip these instructions and continue the lab. 

If you are doing this lab locally, you may need to install NumPy first before importing it in the next cell.
There are many ways to manage your packages, but we suggest you use Anaconda. Anaconda creates sections called conda environments on your computer so that anything you install within an environment does not affect anything outside the environment.
 - [Download Anaconda](https://www.anaconda.com/download/success). Create a new conda environment for this lab.
 - Depending on the coding software you are using, there will be different steps to enable your conda environment. To find these steps, search "how to enable a conda environment in ..." for your coding software.

In [None]:
# Import NumPy as np for cleaner code in the future.
import numpy as np

In [None]:
# NumPy arrays are similar to lists but support multiple dimensions and faster operations. They are created with np.array.

x = np.array([1, 2, 3]) # This NumPy array holds the same values as a normal list and is called a vector because it has only one dimension.
y = np.array([[3, 4, 5]]) # This NumPy array is technically 2-D (one row, three columns), but it is still treated as a vector because only one of its dimensions is larger than 1.
z = np.array([[6, 7],[8, 9]]) # This NumPy array is also 2-D, but now it is a matrix because it has two rows and two columns.

# Let's see how NumPy defines these arrays' shapes.
# When working with NumPy arrays, the .shape attribute tells you the shape of a given array. This will be very useful for debugging and inspecting later.
print(x.shape)
print(y.shape)

In [None]:
# Based on the first 2 NumPy arrays' shapes, what will the last one look like?
print(z)
print()
print(z.shape)

Vectors are represented as 1-D arrays with shape (N,) or 2-D arrays with shape (N, 1) or (1, N). However, it is important to note that the shapes (N,), (N, 1), and (1, N) are not the same and may lead to different behaviors when manipulated.

Matrices are generally represented as 2-D arrays of shape (M, N). NumPy arrays can also have more than two dimensions.

The best way to ensure your code has the behavior you expect is to keep track of your array shapes and refer back to documentation when unsure.
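
As a small illustration of how the shapes (N,), (1, N), and (N, 1) differ, the sketch below builds the same three values in each form and inspects them. The names `flat`, `row`, and `column` are illustrative; `.ndim` is the NumPy attribute that reports the number of dimensions.

```python
import numpy as np

flat = np.array([1, 2, 3])          # Shape (3,): a 1-D array.
row = np.array([[1, 2, 3]])         # Shape (1, 3): one row, three columns.
column = np.array([[1], [2], [3]])  # Shape (3, 1): three rows, one column.

print(flat.shape, row.shape, column.shape)
print(flat.ndim, row.ndim, column.ndim)

# Indexing with 0 gives a different kind of result for each shape.
print(flat[0])    # A single number.
print(row[0])     # The whole row, as a 1-D array of three values.
print(column[0])  # The first row, as a 1-D array holding one value.
```

Printing shapes like this before combining arrays is a quick way to catch mismatches early.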

In [None]:
# Create a NumPy array containing the whole numbers from 0 up to (but not including) the given number with the np.arange function.
a = np.arange(10)
print(a)

In [None]:
# Use the .reshape method to change the shape of this array. The new shape must hold the same number of elements (here, 5 * 2 = 10).
b = a.reshape((5, 2))
print(b)

#### 3.1.1 Array Operations

##### 3.1.1.1 The np.max Method
There are many NumPy operations that can be used to manipulate a NumPy array. Let's start with the np.max operation (documentation: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.max.html).

In [None]:
# First, define an example NumPy array to work with.
x = np.array([[1, 2], [3, 4], [5, 6]])
print(x)

In [None]:
# What shape do you think this array will have?
print(x.shape)

In [None]:
# The np.max function can return the highest number along a chosen axis (for each row, each column, etc.) as a new array.
print(np.max(x, axis = 1)) # axis = 0 takes the maximum down each column, axis = 1 takes the maximum across each row, and higher axes work the same way for arrays with more dimensions.

In [None]:
# This new array no longer needs a second dimension since each row was summarized into one value.
print(np.max(x, axis = 1).shape) # What do you think the shape of the array will be?

In [None]:
# When the keepdims parameter is set to True, the result keeps its second dimension, but with size 1.
print(np.max(x, axis = 1, keepdims = True))

In [None]:
print(np.max(x, axis = 1, keepdims = True).shape) # What shape do you think this keepdims array will have?
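
One common reason to use keepdims is that the result can be combined directly with the original array. The sketch below is an illustration under that assumption: it subtracts each row's maximum from that row, which relies on broadcasting (covered in Section 3.1.3). The names `scores`, `row_max`, and `shifted` are illustrative.

```python
import numpy as np

scores = np.array([[1, 2], [3, 4], [5, 6]])  # Shape (3, 2).

# axis = 0 gives one maximum per column; axis = 1 gives one maximum per row.
print(np.max(scores, axis=0))  # [5 6]
print(np.max(scores, axis=1))  # [2 4 6]

# With keepdims=True, the row maxima have shape (3, 1) and line up with the rows.
row_max = np.max(scores, axis=1, keepdims=True)
shifted = scores - row_max
print(shifted)

# Without keepdims, the maxima have shape (3,), which does not match shape (3, 2).
try:
    scores - np.max(scores, axis=1)
except ValueError as error:
    print("Shapes do not match:", error)
```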

##### 3.1.1.2 Matrix Operations
Next, let's look at some matrix operations. First, let's start with an element-wise product (Hadamard product) where the corresponding elements in two matrices are multiplied to form a new matrix.

In [None]:
# Define two NumPy arrays with the same dimensions.
A = np.array([[1, 2], [3, 4]])
B = np.array([[3, 3], [3, 3]])
print(A)
print(B)

# Perform the element-wise product by using the * operator.
print()
print(A * B)

The dot product is another way to manipulate matrices, but it is more commonly used for vectors. The dot product multiplies the corresponding elements of two vectors and sums those products into a single number.

In [None]:
# Define two vectors with the same shape to be compatible for the dot product.
u = np.array([1, 2, 3])
v = np.array([1, 10, 100])

print(np.dot(u, v)) # Perform the dot product calculation with the np.dot method.
# This will calculate (1 * 1) + (2 * 10) + (3 * 100) = 321.

print(u.dot(v)) # Another way to perform the dot product is to call the .dot method of one vector with the other vector as its argument.
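
To connect the definition to the code, here is a minimal sketch that computes the same dot product by hand with a for loop and compares it with np.dot. The names `first`, `second`, and `total` are illustrative.

```python
import numpy as np

first = np.array([1, 2, 3])
second = np.array([1, 10, 100])

# Multiply corresponding elements and add up the products.
total = 0
for i in range(len(first)):
    total += first[i] * second[i]

print(total)                   # 321
print(np.sum(first * second))  # 321, using the element-wise product and a sum.
print(np.dot(first, second))   # 321
```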

Let's try the dot product with two arrays that have different dimensions.

In [None]:
# Define another NumPy array to see what will happen to the dot product.
W = np.array([[1, 2], [3, 4], [5, 6]])
print(v.shape) # This is the shape of the previous vector that we worked with.
print()
print(W.shape) # What do you think the new matrix's shape will be?

In [None]:
# Let's take the dot product of the vector and the matrix to see what happens.
print(np.dot(v, W))
print(np.dot(v, W).shape)

In [None]:
# That dot product worked because the length of the vector (3) matches the number of rows in the matrix (3). The vector is multiplied with the first column and then the second column of the matrix, which creates a new vector with two values.

# This new dot product does not work. Why do you think that is?
print(np.dot(W, v))

In [None]:
# That dot product did not work because the number of columns in the matrix (2) does not match the length of the vector (3).
# This issue can be resolved by transposing W, which swaps its rows and columns.
print(W)
print()
print(W.T)

In [None]:
# What do you expect the output of this dot product to be? What shape will this output have?
print(np.dot(W.T, v))
print(np.dot(W.T, v).shape)

Matrix multiplication combines two matrices by taking the dot product of each row from the first matrix with each column from the second matrix. The dot product of row i and column j becomes the entry at row i, column j of the new matrix.

This is similar to the dot product example we last examined.

In [None]:
# There are two ways to perform matrix multiplication.
print(np.matmul(A, B)) # One with the np.matmul method.
print(A @ B) # One with the @ operator.
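
The row-by-column description can be sketched with explicit loops. This is only an illustration of the definition, not how NumPy computes the product internally; the names `left`, `right`, and `manual` are illustrative.

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])
right = np.array([[3, 3], [3, 3]])

# Entry (i, j) of the product is the dot product of row i of left and column j of right.
manual = np.zeros((2, 2), dtype=int)
for i in range(2):
    for j in range(2):
        manual[i, j] = np.dot(left[i, :], right[:, j])

print(manual)
print(np.array_equal(manual, left @ right))  # True

# Note that this differs from the element-wise product.
print(left * right)
```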

#### 3.1.2 Indexing

Slicing and indexing NumPy arrays extends the Python concept of (list) slicing to N dimensions.

In [None]:
# Initialize a NumPy array with shape 3 x 4 that contains randomly generated real numbers.
x = np.random.random((3, 4))

# A lone ':' selects the entire array.
print(x[:])

In [None]:
# We can index with another NumPy array that contains the values 0 and 2, combined with ':', to select all the elements in the 1st and 3rd rows.
print(x[np.array([0, 2]), :])

In [None]:
# We can also select the row at index 1 (the 2nd row) as a vector and further select its elements at indices 1 and 2 (the 2nd and 3rd elements).
print(x[1, 1:3])

In [None]:
# A Boolean expression can be used as an index to extract every value for which it is True into a new 1-D array.
print(x[x > 0.5])

In [None]:
# A 3-D array with shape (3, 4, 1) can also be created by combining ':' with np.newaxis.
print(x[:, :, np.newaxis])
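
One difference from list slicing is worth seeing in code: slicing a NumPy array returns a view that shares data with the original, while indexing with an integer array or a Boolean expression returns a copy. The sketch below demonstrates this with illustrative names (`grid`, `row_view`, `picked`).

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))  # The integers 0 to 11 in a 3 x 4 array.

# Slicing returns a view, so changing the view also changes the original array.
row_view = grid[1, 1:3]
row_view[0] = -1
print(grid[1, 1])  # -1

# Indexing with an integer array returns a copy, so the original is unaffected.
picked = grid[np.array([0, 2]), :]
picked[0, 0] = 100
print(grid[0, 0])  # 0

# np.shares_memory reports whether two arrays share the same underlying data.
print(np.shares_memory(grid, row_view))  # True
print(np.shares_memory(grid, picked))    # False
```

If an independent copy of a slice is needed, calling `.copy()` on it avoids changing the original by accident.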

#### 3.1.3 Broadcasting

The term broadcasting describes how NumPy manipulates arrays with different shapes in order to correctly perform arithmetic operations on them.

**General Broadcasting Rules** 

When operating on two arrays, NumPy compares their shapes element-wise starting with the rightmost dimension and moving to the left. Two dimensions are compatible when:
- they are equal, or
- one of them is 1 (in which case, the array with size 1 has its values repeated along that dimension to match the other array)

More details can be found at https://numpy.org/doc/stable/user/basics.broadcasting.html
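
The rules above can be sketched as a small pure-Python function. This is a hypothetical helper written for illustration (`broadcast_shape` is not a NumPy function); it also assumes the additional NumPy rule that when one shape has fewer dimensions, the missing leading dimensions are treated as 1.

```python
def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape, or None if incompatible."""
    result = []
    longest = max(len(shape_a), len(shape_b))
    # Compare dimensions starting from the rightmost one and moving left.
    for offset in range(1, longest + 1):
        # A missing dimension is treated as size 1.
        dim_a = shape_a[-offset] if offset <= len(shape_a) else 1
        dim_b = shape_b[-offset] if offset <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            return None  # Neither equal nor 1: the shapes are incompatible.
    result.reverse()
    return tuple(result)

print(broadcast_shape((3, 4), (3, 1)))  # (3, 4)
print(broadcast_shape((3, 4), (1, 4)))  # (3, 4)
print(broadcast_shape((3, 1), (3,)))    # (3, 3)
print(broadcast_shape((3, 4), (3,)))    # None
```

Working through a pair of shapes by hand in this way is a useful check before running an operation.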

In [None]:
# Initialize another NumPy array with shape 3 x 4 that contains randomly generated real numbers.
x = np.random.random((3, 4))

# Initialize two more random NumPy arrays with shape 3 x 1 and 1 x 4, respectively.
y = np.random.random((3, 1))
z = np.random.random((1, 4))

# In the next examples, the last two arrays will be broadcast to match the shape of the first array.

In [None]:
# Let's add the first and second arrays to see how broadcasting works.
s = x + y

print(x.shape)
print(y.shape)

# The second array's single column is broadcast (repeated) across all 4 columns of the first array.
print()
print(s.shape)

In [None]:
# For another example, let's perform an element-wise product for the first and third arrays.
p = x * z

print(x.shape)
print(z.shape)

# The third array's single row is broadcast (repeated) across all 3 rows of the first array.
print()
print(p.shape)

In [None]:
# Let's initialize two new NumPy arrays to further see how broadcasting works.
a = np.zeros((3, 3)) # This array will be a 3 x 3 matrix of zeros.
b = np.array([[1, 2, 3]]) # This array will be a 1 x 3 row vector containing the integers 1, 2, and 3.

# What do you think these arrays will look like?
print(a)
print()
print(b)

In [None]:
# Let's see what happens when we add these arrays together.
print(a+b)

Now that we have covered the basics of broadcasting, let's look at a more complex example.

In [None]:
# Let's define three new NumPy arrays:
a = np.random.random((3, 4)) # 3 x 4
b = np.random.random((3, 1)) # 3 x 1
c = np.random.random((3, )) # 3

print(a)
print()
print(b)
print()
print(c)

While we go through the next few examples, think about these questions:
- What is the expected broadcasting behavior for these operations?
- What do the following operations give us?
- What are the resulting shapes?

In [None]:
# Let's try adding the second array and its transpose.
result1 = b + b.T

# What do you think the shapes of these arrays will be?
print(b.shape)
print(b.T.shape)

In [None]:
# What will the output of this operation look like, and what will its shape be?
print(result1)
print(result1.shape)

In [None]:
# Now, let's try adding the first and third array together. Do you think this will work?
# Remember, the shape of the first array is 3 x 4 and the shape of the third array is only 3.
result2 = a + c

In [None]:
# This operation did not work because the rightmost dimensions of the two arrays (4 and 3) are not equal and neither of them is 1, so NumPy cannot broadcast either array.
# There are two ways to resolve this problem.

# First, you can transpose the first array so that broadcasting can add the third array to each row of the transposed array for the operation.
result2 = a.T + c

# However, the result will have to be transposed back to have the correct dimensionality.
print(result2.T)

In [None]:
# Second, you can add a second dimension to the third array so that its shape becomes 3 x 1.
result2 = a + c[:, np.newaxis]

print(result2)
# Can you come up with any other ways to fix this problem?

In [None]:
# Let's try a final operation by adding the second and third arrays.
result3 = b + c

# Will this operation work? If so, how did broadcasting make this operation work?
# What will the output of this operation look like, and what will its shape be?
print(result3)
print(result3.shape)

#### 3.1.4 Efficient NumPy Code

When working with NumPy arrays, avoid explicit for loops over indices/axes where possible, as they can dramatically slow down your code (often by about 10-100x).

For example, let's compare a for loop with the equivalent NumPy operation. We can time each code cell using the %%timeit cell magic, which is available in Jupyter notebooks and Google Colab.

In [None]:
%%timeit
# This is the explicit for loop that adds 5 to every element in rows 100 through 999 of a 1000 x 1000 NumPy array.
slowX = np.random.rand(1000, 1000)
for i in range(100, 1000):
    for j in range(1000):
        slowX[i, j] += 5

In [None]:
%%timeit
# This code does the exact same thing, but with a single NumPy slicing operation that runs much faster.
fastX = np.random.rand(1000, 1000)
fastX[100:1000, :] += 5

You can see from the outputs of both code cells that the NumPy operation is far faster than the explicit Python for loop.
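
The %%timeit magic only works in notebooks. As a rough sketch for a plain Python script, the same comparison can be made with `time.perf_counter` from the standard library. The array here is smaller than in the lab so the loop finishes quickly, and the names `size`, `loop_data`, and `vector_data` are illustrative. Exact timings depend on your machine, so none are shown.

```python
import time

import numpy as np

size = 200  # Smaller than the lab's 1000 so the loop version runs quickly.
loop_data = np.random.rand(size, size)
vector_data = loop_data.copy()  # Same starting values for a fair comparison.

# Time the explicit for loop over rows 20 and up.
start = time.perf_counter()
for i in range(20, size):
    for j in range(size):
        loop_data[i, j] += 5
loop_seconds = time.perf_counter() - start

# Time the equivalent NumPy slicing operation.
start = time.perf_counter()
vector_data[20:, :] += 5
vector_seconds = time.perf_counter() - start

# Both approaches produce the same array.
print(np.allclose(loop_data, vector_data))  # True
print("Loop seconds:", loop_seconds)
print("NumPy seconds:", vector_seconds)
```

A single measurement like this is noisier than %%timeit, which repeats the code many times and averages the results.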