# **CIS 520: Machine Learning**
## **Python Tutorial Notebook**


- **Content Creators:** Siyun Hu, Di Wu, Ani Cowlagi
- **Content Reviewers:** Lyle Ungar
- **Acknowledgements:** This notebook contains an excerpt from [Google's Python Class](https://developers.google.com/edu/python/) and [DataCamp's free Intro to Python Tutorial](https://www.learnpython.org/). 

- **Objectives:** This worksheet covers features of the Python language, basic data types, and how to manipulate Python objects using simple, readable Python syntax. I highly recommend that you complete the whole worksheet; it should help you prepare quickly for the coding homework later this semester. If you already know Python, I suggest that you work through the questions (there are eight in total), which is a good way to test your foundation and review the language. You can find the solutions by scrolling down to the end of the notebook.

- **Outlines:** You can find the outline by clicking the first button at the upper-left of your screen.  



In [None]:

#@markdown Tell us your thoughts about what you want to learn.
thoughts = '' #@param {type:"string"}
import time
try: t0;
except NameError: t0=time.time()

## **Autograding and the PennGrader**

First, you'll need to set up the PennGrader, which we'll be using throughout the semester to help you with your homeworks and worksheets.

PennGrader is not only **awesome**, but it was built by an equally awesome person: Leo Murri.  Today, Leo works as a data scientist at Amazon!

PennGrader was developed to provide students with *instant* feedback on their answers. You can submit an answer and find out immediately whether it is right or wrong. We then record your most recent answer in our backend database.

### Imports and Setup

 Run but do not modify this Section!

In [None]:
%%capture
!pip install penngrader


In [None]:
import random 
import numpy as np
import pandas as pd
import os
import sys
import matplotlib.pyplot as plt
from numpy.linalg import *
np.random.seed(42)  # don't change this line

import dill
import base64

In [None]:
# For autograder only, do not modify this cell. 
# True for Google Colab, False for autograder
NOTEBOOK = (os.getenv('IS_AUTOGRADER') is None)
if NOTEBOOK:
    print("[INFO, OK] Google Colab.")
else:
    print("[INFO, OK] Autograder.")
    sys.exit()

[INFO, OK] Google Colab.


### Insert PennID here!

In [None]:
# PLEASE ENSURE YOUR PENN-ID IS ENTERED CORRECTLY. IF NOT, THE AUTOGRADER WON'T KNOW WHO
# TO ASSIGN POINTS TO IN OUR BACKEND.
STUDENT_ID = 99999999 # YOUR PENN-ID GOES HERE AS AN INTEGER

In [None]:
import penngrader.grader

grader = penngrader.grader.PennGrader(homework_id = 'CIS_5200_202230_HW_Python_Intro_WS', student_id = STUDENT_ID)

PennGrader initialized with Student ID: 99999999

Make sure this correct or we will not be able to store your grade


In [None]:
# A helper function for grading utils
def grader_serialize(obj):        # A helper function
    '''Dill serializes Python object into a UTF-8 string'''
    byte_serialized = dill.dumps(obj, recurse = True)
    return base64.b64encode(byte_serialized).decode("utf-8")

## **Why Python?**

- Accessible language
- Great scientific libraries and support, especially for machine/deep learning (Scikit-learn)
- Jupyter notebooks are great environments for exploring, visualizing, and sharing data analyses


## **Indentation**

One unusual Python feature is that the whitespace indentation of a piece of code affects its meaning. A logical block of statements such as the ones that make up a function should all have the same indentation. If one of the lines in a group has a different indentation, it is flagged as a syntax error.

Some gotchas about indentation:
- Avoid using TABs as they greatly complicate the indentation scheme 
- According to the [official Python style guide (PEP 8)](https://www.python.org/dev/peps/pep-0008/#indentation), you should indent with 4 spaces


*Tip: In Colab, you can change your preferred indentation under Tools -> Preferences... -> Editor*
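
As a minimal sketch of how indentation groups statements, consider the small function below. The name `describe` is made up for this illustration; the point is only that each nested block is indented 4 spaces further than the line that opens it.

```python
def describe(n):
    # Every line of the function body is indented by 4 spaces.
    if n % 2 == 0:
        # Indented one level further, so this belongs to the "if" block.
        return "even"
    # Back at the function-body level: runs only when the "if" did not return.
    return "odd"

print(describe(4))  # even
print(describe(7))  # odd
```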


## **Basic Data Types**

Standard data types in Python include **Numeric**, **Sequence**, **Boolean**, **Set** and **Dictionary**.

- **Numeric**: a numeric type represents data with a numeric value. The value can be an integer, a floating-point number or a complex number.
- **Sequence**: a sequence is an ordered collection of items of the same or different data types. Sequences let you store multiple values in an organized and efficient fashion. The sequence types include strings, lists and tuples.
- **Dictionary**: a dictionary is a collection of key-value pairs. (Since Python 3.7, dictionaries preserve insertion order; in older versions they were unordered.)
- **Set**: a set is an unordered collection that is iterable, mutable and has no duplicate elements.
- **Boolean**: a boolean takes one of the two built-in values, True or False.

*Tips*: 
- You can use the type() function to determine the type of a value.
- You can use predefined functions like int(), float(), str() to perform explicit type conversion.
- Implicit type conversion is performed automatically by the Python interpreter, for example when an int is combined with a float.

In [None]:
a = 5
print("Type of a: ", type(a)) 

Type of a:  <class 'int'>


In [None]:
b = float(a)
print("Type of b: ", type(b))

Type of b:  <class 'float'>
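
The two cells above show explicit conversion. As a small illustrative sketch (the variable names are chosen only for this example), the snippet below contrasts it with implicit conversion, which happens when an int and a float are combined.

```python
i = 3          # int
f = 1.5        # float
total = i + f  # the int is implicitly converted to a float before adding
print(total, type(total))  # 4.5 <class 'float'>

# Strings are never converted to numbers implicitly, so convert explicitly.
n = int("42") + 1
print(n)  # 43
```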


### *Question 1: Mutable?*

A very important concept in Python is mutability. Everything in Python is an object, and each object is assigned a unique object id when it is created. A mutable object can be changed after it is created. In contrast, an immutable object cannot be changed once it is created.

Therefore, which of the above data types are mutable? (For this worksheet, you can find the solutions for all questions at the bottom of the notebook)
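
To make the idea concrete, here is a small sketch that uses just one list and one string (the variable names are illustrative). It checks the object id before and after a change; the remaining data types are left for you to try yourself.

```python
nums = [1, 2, 3]
before = id(nums)
nums.append(4)                    # changes the existing list object
same_object = id(nums) == before
print(nums, same_object)          # [1, 2, 3, 4] True

word = "abc"
try:
    word[0] = "x"                 # strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)

word = "x" + word[1:]             # builds a new string and rebinds the name
print(word)                       # xbc
```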

## **Basic Operations**

As in other programming languages, the addition, subtraction, multiplication, and division operators can be used with numbers. The main difference you might notice is the division operation.


In [None]:
a = 4/2
print("Type of a: ", type(a)) 

Type of a:  <class 'float'>


Perhaps surprisingly, the example above shows that the `/` operator returns a float even when the divisor is a factor of the dividend.
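
When an integer result is wanted, Python also provides floor division (`//`) and the remainder operator (`%`). The short sketch below compares them with `/`; the variable names are only for illustration.

```python
true_div = 7 / 2     # true division always gives a float
floor_div = 7 // 2   # floor division gives an int for int operands
neg_floor = -7 // 2  # rounds toward negative infinity, not toward zero
remainder = 7 % 2    # remainder of the division

print(true_div)   # 3.5
print(floor_div)  # 3
print(neg_floor)  # -4
print(remainder)  # 1
```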

Another thing worth mentioning is **operator overloading**. Some operators, such as + and *, are defined to work with different types.

In [None]:
s1 = "abc" + "def"
print(s1)

abcdef


In [None]:
l1 = [0] + [1] + [2]
print(l1)

[0, 1, 2]


In [None]:
s2 = "hello"*3
print(s2)

hellohellohello


## **Advanced Operations**
Besides the basic operations mentioned above, each data type has its own built-in operations. Here, we briefly introduce some commonly used methods.

### **List Operations**



A list is a weakly typed array: a single list can hold a mixture of objects of different classes.

**List construction**

Here are several ways to initialize a list.

```python 
a = []
b = [1]
```

We can use the append() method to add an element to the end of a list.
```python
a.append(b)
```

*Tip*: append() is an in-place operation.
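
The sketch below spells out what "in-place" means, reusing the lists `a` and `b` from above (the name `result` is added only for this illustration). Note that appending a list adds it as a single nested element.

```python
a = []
b = [1]

result = a.append(b)  # modifies a directly
print(a)              # [[1]]  (b was added as one element)
print(result)         # None   (append does not return the list)

a.append(2)
print(a)              # [[1], 2]
```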

**List Comprehensions** enable us to create a new list based on another list, in a single, readable line.

#### Basic Syntax

```python
result_list = [output_exp for var in input_list if (condition on var is true)]
```

Example

In [None]:
# create a new list by iteration
nums = range(100000)
squares = []

for num in nums:
  squares.append(num**2)


In [None]:
# create a new list by list comprehension
nums = range(100000)
squares = [num**2 for num in nums]

# multiple of 10
multi10 = [num for num in nums if num%10==0]

**List concatenation/extending** is done by using the add operation (`+`) or the extend() method. 


In [None]:
x = [1,2]
y = [3,4,5]

In [None]:
y.extend(x)
print(y)

[3, 4, 5, 1, 2]


In [None]:
y += x
print(y)

[3, 4, 5, 1, 2, 1, 2]


## Exercise 1: manipulating the list

The cell below defines a function that should list all even natural numbers less than the input number and return them as a list. For example, if the input is 5, the output should be [0,2,4]. Now it's your turn: use the basic syntax and the examples above to finish this function.

Tip: in Python, you can use '%' to compute the remainder of an integer division, e.g. 5%3=2

In [None]:
def get_even_num(i):
    """
    Get the list of even numbers less than the input integer

    Args:
    i: type: int, input number

    Output:
    even_nums: type: list, even numbers that are less than the input integer
    """
    even_nums = []

    # Fill in the even_nums list with the list of even numbers less than i

    return even_nums

print(get_even_num(10))

[0, 2, 4, 6, 8]


**Time to use the autograder for the first time!** Run the cell below to get instant feedback on whether your implementation of ```get_even_num``` is correct! 

Don't modify this line to hardcode answers; graders will manually check for this!

In [None]:
grader.grade(test_case_id = 'test_case_get_even', answer = (get_even_num(10), get_even_num(15), get_even_num(1), get_even_num(-4)))

### *Question 2: Valid list*

[*True or False*] The list defined in the following code is a valid list.

```python
l = ["CIS", 520, True]
```


Do you think the list above is valid? Record your answer below!

In [None]:
is_list_valid =  ## Insert answer here 

grader.grade(test_case_id = 'test_case_valid_list', answer = is_list_valid)

Correct! You earned 1.0/1.0 points. You are a star!

Your submission has been successfully recorded in the gradebook.


### *Question 3*: List Comprehension

```python
nums = range(100000)
odd_squares = []

for num in nums:
  if num % 2 == 1:
    odd_squares.append(num**2)
```

How can we convert the above code into a more concise version, using the idea of list comprehension?

In [None]:
#@markdown Please type your code in the field below:

#@markdown Note: You may test your code before entering it
w3_list_comp = '' #@param {type:"string"}

Also copy and paste your code into the cell below and run the cell!

In [None]:
odd_squares_list_comp =         # INSERT LIST COMPREHENSION CODE HERE!

grader.grade(test_case_id = 'test_case_list_comprehension', answer = (w3_list_comp, odd_squares_list_comp))

### **String Operations**

**Basic String Operations**

There are many powerful string operations, such as len(), count(), index(), isdigit(), strip(), split().

For your information, here are some useful resources I found on basic string operations:
- [Python Documentation](https://docs.python.org/3/library/string.html)
- [Tutorial](https://www.learnpython.org/en/Basic_String_Operations)
- [Programiz](https://www.programiz.com/python-programming/methods/string)
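
Here is a brief sketch of the operations listed above applied to a sample string. The string and variable names are made up for this illustration.

```python
s = "  hello world  "
clean = s.strip()           # remove leading and trailing whitespace

print(len(s))               # 15
print(clean)                # hello world
print(clean.count("o"))     # 2
print(clean.index("w"))     # 6
print(clean.split())        # ['hello', 'world']
print("520".isdigit())      # True
print("5.20".isdigit())     # False
```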

**String formatting**

We can use C-style string formatting to create new, formatted strings. The "%" operator combines a format string with a set of variables enclosed in a "tuple" (a fixed-size list).

Example
```python
# This prints out "Hello, John!"
name = "John"
print("Hello, %s!" % name)
```
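
With more than one value, the variables go in a tuple and are matched to the format specifiers in order. The sketch below uses made-up values purely for illustration: `%s` formats a string, `%d` an integer, and `%.2f` a float with two decimal places.

```python
label = "Cart"
count = 3
cost = 4.5

line = "%s has %d items costing %.2f in total" % (label, count, cost)
print(line)  # Cart has 3 items costing 4.50 in total
```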

### *Question 4*

Given a list of names and a list of ages, print out the required sentences.

Note: This is a practice question and we won't ask you to submit it, but understanding how to do it will be very useful for your future homeworks and project.

```python
Input: name = ['John', 'Jack', 'Mary', 'Tommy'], age = [10, 7, 9, 11]

Output:

"""John is 10 years old.

Jack is 7 years old.

Mary is 9 years old.

Tommy is 11 years old."""
```


*Hint: Please use string formatting*

### **Dictionary Operations**

A dictionary maps keys to values. A valid dictionary should satisfy:
- Keys must be hashable
- Values can be of any data type, including numeric values, dictionaries, tuples, etc. 
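
As a rough illustration of the hashability rule, immutable objects such as strings, integers and tuples of hashable items work as keys, while a list does not. The dictionary `d` and its contents are made up for this sketch.

```python
d = {}
d["name"] = "CIS 520"       # str key
d[520] = "course number"    # int key
d[(1, 2)] = "tuple key"     # a tuple of hashable items is hashable

try:
    d[[1, 2]] = "list key"  # lists are mutable, so they are not hashable
except TypeError as err:
    print("TypeError:", err)

print(len(d))  # 3
```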


**Dictionary Construction**

In [None]:
# create empty dictionary
d1 = {}

# create empty dictionary using the dict() constructor
d2 = dict()

# create a dictionary with integer keys
d3 = {1: 'CIS', 2: '520'} 

# create a dictionary with each item as a Pair
d4 = dict([(1, 'CIS'), (2, '520')]) 

In [None]:
print(d4)

{1: 'CIS', 2: '520'}


We can add new key-value pairs to a dictionary.

Basic syntax
```python
y = {}
y[key] = value
```

**Dict comprehensions** are quite similar to list comprehensions.

In [None]:
nums = range(100000)
odd_dict = {}

for num in nums:
  if num % 2 == 1:
    odd_dict[num] = num **2

In [None]:
nums = range(100000)
odd_dict = {num : num ** 2 for num in nums if num % 2 == 1}

**Dictionary Iterations**
Dictionaries can be iterated over, just like a list. Since Python 3.7, a dictionary keeps the insertion order of its items; in older versions the iteration order was not guaranteed.

Example

In [None]:
phonebook = {"John" : 938477566,"Jack" : 938377264,"Jill" : 947662781}
for name, number in phonebook.items():
    print("Phone number of %s is %d" % (name, number))

Phone number of John is 938477566
Phone number of Jack is 938377264
Phone number of Jill is 947662781


### *Question 5*


Given a dictionary with integer keys, return all keys in descending order.

```python
Input: dict = {1:'SEAS', 2:'Wharton', 3:'SAS'}
Output: list = [3,2,1]
```

In [None]:
def descending_dict(dic):
    """
    Args:
    dic: type: dict, input dictionary

    Output:
    keys: type: list, keys of the input dictionary with descending order
    """
    keys = []
    
    # Fill in the function here
    
    return keys

dic = {1:'SEAS', 2:'Wharton', 3:'SAS'}
print(descending_dict(dic))

[3, 2, 1]


Grade your ```descending_dict``` implementation by running the cell below. Again, do not modify this cell as we will be verifying this manually!

In [None]:
nums = range(10000)
dic2 = {num : str(num ** 2) for num in nums if num % 2 == 1}
reversed_keys = descending_dict(dic2)

grader.grade(test_case_id = 'test_case_descending_dict', answer = (dic2, reversed_keys))

### **Indexing Operations**

Indexing and slicing can be applied to sequential data types, including lists, strings, arrays and tuples.

Here are standard rules of slicing and indexing in Python.
- `i:j:k` syntax corresponds to starting at index `i`, stopping before index `j`, with step size `k`
  - omitting `k` implies a step size of 1
- `i:` selects all indices beginning at index `i`
- `:j` selects all indices up to but not including index `j`
- `:` by itself selects all indices along an axis
- negative indices count from the end of the sequence

Examples

In [None]:
# start with an array of 0 to 9
import numpy as np

vec = np.array(range(10))
print(vec)

[0 1 2 3 4 5 6 7 8 9]


In [None]:
# select all even numbers
print(vec[0:10:2])

[0 2 4 6 8]


In [None]:
# selects indices starting with 5 to the end
print(vec[5:])

[5 6 7 8 9]


In [None]:
# selects indices up to, but not including, index 5
print(vec[:5])

[0 1 2 3 4]


In [None]:
# select the last two elements
vec[-2:]

array([8, 9])

In [None]:
# same idea with 2D arrays, but with two axes
A = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(A)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [None]:
# select the first two rows
print(A[:2, :])

[[1 2 3]
 [4 5 6]]


**Logical Indexing**

We can select according to boolean conditions across axes as well.

Example

In [None]:
np.random.seed(0)
num_animals = 100000
animal_weights = np.random.uniform(0, 50, num_animals)

# Use logical indexing
is_dog = animal_weights > 30
is_cat = animal_weights <= 30
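
A boolean array like `is_dog` can then be used as an index to select the matching entries. The sketch below uses a tiny hand-written array (hypothetical values, not the random data above) so that the result is easy to check by eye.

```python
import numpy as np

weights = np.array([5.0, 42.0, 18.5, 31.0, 30.0])
is_dog = weights > 30             # array([False, True, False, True, False])

dog_weights = weights[is_dog]     # keeps entries where the mask is True
cat_weights = weights[~is_dog]    # ~ inverts a boolean mask
num_dogs = int(is_dog.sum())      # True counts as 1 when summed

print(dog_weights)  # [42. 31.]
print(cat_weights)  # [ 5.  18.5 30. ]
print(num_dogs)     # 2
```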

### *Question 6*
Given the code below, test each expression in a code cell and try to understand its output.

Note: This is a practice question and we won't ask you to submit it, but understanding how to do it will be very useful for your future homeworks and project.
```python
a = np.array(range(5))
# Question 6.1
a[::-1]
# Question 6.2
a[-2:-1]
# Question 6.3
a[:]
```

## **Loop and Condition**

As in other programming languages, Python has for loops and while loops.

**The "for" Loop**

For loops can 
- iterate over a given sequence
- iterate over a sequence of numbers using the "range" function. 


In [None]:
# iterate over list
primes = [2, 3, 5, 7]
for prime in primes:
    print(prime)

2
3
5
7


In [None]:
# iterate over a sequence of numbers 
for x in range(5):
    print(x)

0
1
2
3
4


**"while" loop**

While loops repeat as long as a certain boolean condition is met.


**"continue" and "break" statement**

*break* is used to exit a for loop or a while loop, whereas *continue* is used to skip the rest of the current iteration and return to the "for" or "while" statement. 

*Tip:* while, break and continue work the same way in Python as they do in other programming languages.
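
As a short sketch of these three statements working together, the loop below collects the odd numbers below 10. The variable names are illustrative only.

```python
n = 0
odds = []
while True:
    n += 1
    if n > 9:
        break      # leave the loop entirely
    if n % 2 == 0:
        continue   # skip the rest of this iteration
    odds.append(n)

print(odds)  # [1, 3, 5, 7, 9]
```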

**"else" clause for loops**

Unlike languages such as C and C++, Python lets us attach an "else" clause to a loop. The "else" part is executed when the loop finishes normally, that is, when the "for" sequence is exhausted or the "while" condition becomes false. If a break statement is executed inside the loop, the "else" part is skipped. Note that the "else" part is still executed if the loop contains a continue statement.

In [None]:
count=0
while(count<5):
    print(count)
    count +=1
else:
    print("count value reached %d" %(count))

0
1
2
3
4
count value reached 5


### *Question 7*
Given the code, write down the output.

```python
for i in range(1, 10):
    if(i%5==0):
        break
    print(i)
else:
    print("CIS 520")
```

In [None]:
#@markdown Please type in the output of above code on a single line, with new lines represented by spaces e.g. if the output is 1 then 2, enter "1 2":

#@markdown Tips: you may test it in a code cell to verify your thought
w3_statement = '1 2 3 4' #@param {type:"string"}

**The "in" operator**

The "in" operator checks whether a specified object exists within a container, such as a list or a dictionary. For a dictionary, it checks the keys.

In [None]:
name = "John"
birth_year = {"John": 1998, "Rick": 2000}
if name in birth_year:
    print("Your name is either John or Rick.")

Your name is either John or Rick.
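
The sketch below shows a few more membership tests. For a dictionary, `in` looks at the keys only; to search the values, use `.values()`. The result names are chosen just for this example.

```python
birth_year = {"John": 1998, "Rick": 2000}

key_found = "John" in birth_year             # checks the keys
value_as_key = 1998 in birth_year            # values are not searched
value_found = 1998 in birth_year.values()    # search the values explicitly

print(key_found)       # True
print(value_as_key)    # False
print(value_found)     # True
print(3 in [1, 2, 3])  # True
print("th" in "python")  # True (substring test for strings)
```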


**The "is" operator**

Unlike the double equals operator "==", the "is" operator does not compare the values of the variables, but the instances themselves. 

In [None]:
x = [1,2,3]
y = [1,2,3]
print(x == y) 
print(x is y) 

True
False


**The "not" operator**

Placing "not" before a boolean expression inverts its value.
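
A minimal sketch of "not" in use follows; the variable names are made up. It also shows that "not" can be applied to any object (empty containers count as false) and that "not in" negates a membership test.

```python
is_raining = False
if not is_raining:
    print("No umbrella needed.")

items = []
empty = not items            # an empty list is treated as False
missing = 3 not in [1, 2]    # "not in" negates membership

print(empty)    # True
print(missing)  # True
```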

## **Classes and Objects**

The constructor of every Python class is called 

```python
def __init__(self, ...):
```
The self argument appears in every instance method. It gives access to all of the object's attributes and methods inside the current function, and it must be the first argument of every instance method.


Attributes can be assigned on self in any method, and attributes are not typed.

```python
def __init__(self, ...):
    self.attr = [] 
    # the instance now has an attribute called attr and a list is stored there
```

Example:

```python
class example:
    def __init__(self, x):
        self.x_list = [x]

    def add_to_list(self, x):
        # keep a copy of the list as it was before the new element is added
        self.old_x_list = self.x_list.copy()
        self.x_list.append(x)

ex = example(3)
ex.add_to_list(5)

```
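
To see what the example does, the sketch below repeats the class with consistent indentation and inspects the attributes before and after the call. The `had_old_before` name is added only for this illustration. Note that `old_x_list` does not exist until `add_to_list` has run, since attributes are created when they are first assigned.

```python
class example:
    def __init__(self, x):
        self.x_list = [x]

    def add_to_list(self, x):
        self.old_x_list = self.x_list.copy()  # snapshot before the change
        self.x_list.append(x)

ex = example(3)
had_old_before = hasattr(ex, "old_x_list")
print(had_old_before)  # False

ex.add_to_list(5)
print(ex.x_list)       # [3, 5]
print(ex.old_x_list)   # [3]
```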



### *Question 8*

Create a Dog class that at least fulfills the following requirements:
- Have a class attribute called species with the value "Canis familiaris".
- Each instance has name and age attributes
- Has a get_age() method that returns the age (the test cell below calls it)


Note: This is a practice question and we won't ask you to submit it, but understanding how to do it will be very useful for your future homeworks and project.

In [None]:
class Dog:
  # Fill in missing code here
    
  # missing code end

a = Dog("buddy", 3)
b = Dog("coco", 9)
print(a.species)
print(b.species)
print(a.name)
print(b.get_age())

Canis familiaris
Canis familiaris
buddy
9


Grade your ```Dog``` implementation by running the cell below.

In [None]:
grader.grade(test_case_id = 'test_case_dog_class', answer = Dog)

## Submitting to the Autograder

Now go to the File menu and choose "Download .ipynb".  Go to [Gradescope](https://www.gradescope.com/courses/409970) and:

1. From "File" --> Download both .ipynb and .py files
1. Name these files `Python_Intro_WS.ipynb` and `Python_Intro_WS.py` respectively
1. Sign in using your Penn email address (if you are a SEAS student we recommend using the Google login) and ensure  your class is "CIS 5200"
1. Select **Worksheet: Python Introduction**
1. Upload both files
1. PLEASE CHECK THE AUTOGRADER OUTPUT TO ENSURE YOUR SUBMISSION IS PROCESSED CORRECTLY!

You should be set! Note that this assignment has 10 autograded points that will show up upon submission. Points are awarded based on a combination of correctness and sufficient effort. 

## Solutions

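*Question 1*: Lists, dictionaries and sets are mutable. Numbers, strings, tuples and booleans are immutable.

*Exercise 1*: `get_even_num(10)` returns [0,2,4,6,8]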

*Question 2*: True 

*Question 3*: 
```python
# list comprehensions are (often) more concise
nums = range(100000)
odd_squares = [num ** 2 for num in nums if num % 2 == 1]
```

*Question 4*: 
```python
name = ['John', 'Jack', 'Mary', 'Tommy']
age = [10, 7, 9, 11]

for i in range(len(name)):
    print("%s is %d years old." % (name[i], age[i]))
```


*Question 5*: 
```python
d = {1:'SEAS', 2:'Wharton', 3:'SAS'}
res = []
for k in d.keys():
  res.append(k)
print(sorted(res, reverse=True))  # descending order: [3, 2, 1]
```

*Question 6:*
```python
array([4, 3, 2, 1, 0])
array([3])
array([0, 1, 2, 3, 4])
```

*Question 7:*
1
2
3
4

*Question 8*
```python
class Dog:
    # Class attribute
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age
    def get_name(self):
        return self.name
    def get_age(self):
        return self.age

a = Dog("buddy", 3)
b = Dog("coco", 9)
print(a.species)
print(b.species)
print(a.name)
print(b.get_age())
``` 