
# INFO 212: Data Science Programming 1

## Python Review: Python Basics and Built-in Data Structures
---

## **What you should know about Python**
* Python basics
* tuples are immutable
* list.append() adds an element to the end of a list
* enumerate(list) generates a sequence of (index, element) tuples
* dictionary keys must be immutable (hashable)
* zip combines two or more sequences into a sequence of tuples
* zip(*sequence_of_tuples) unzips a sequence of tuples into separate sequences
* the list comprehension [x for x in sequence if condition] is a compact alternative to a for loop
* a set is like a dictionary with only keys (no values)
---

# Write High Quality Code at Any Point of Your Programming Journey

## Code quality is defined by a set of attributes such as
- Maintainability
- Reusability
- Readability
- Efficiency
- Error proneness
- Modularity.

## Adopt a Coding Convention
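- Use camelCase and PascalCase
  - camelCase:
    - aVariable
    - aFunctionName
  - PascalCase:
    - ClassName
    - ConstantName
    - EnumSet
- Python's official style guide, PEP 8, instead recommends snake_case for variables and functions (e.g. `my_function`) and UPPER_CASE for constants (e.g. `PI`), which is the style used in the examples below; whichever convention you adopt, apply it consistently
- Meaningful names
- Short lines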

## Write comments
- Add comments to explain your functions
- Add comments to explain your logic
- Add comments to explain your thoughts
- Add comments to explain the pre- and post- conditions
- Add comments to explain the usage
- Comments benefit both you and anyone else who reads your code.

## Adopt test-driven development
- Test and verify code generated by AI tools; that is where you separate yourself from a machine.
- Think about all the things that could trip your code up and write tests to ensure that your code handles every scenario.
- Murphy’s Law: anything that can go wrong will go wrong.
- Your tests should aim for 100% coverage.
- Test all extreme cases and boundary conditions.
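As a minimal sketch of testing boundary conditions, assume a hypothetical function `safe_divide` that returns None when the divisor is zero. Plain `assert` statements check the typical case and the edge cases; a test framework such as pytest would normally collect checks like these, but plain asserts keep the example self-contained.

```python
def safe_divide(a, b):
    """Hypothetical example function: return a / b, or None if b is zero."""
    if b == 0:
        return None
    return a / b

def test_safe_divide():
    # Typical case
    assert safe_divide(10, 2) == 5
    # Boundary conditions: zero numerator, zero divisor, negative values
    assert safe_divide(0, 5) == 0
    assert safe_divide(1, 0) is None
    assert safe_divide(-9, 3) == -3

test_safe_divide()  # raises AssertionError if any check fails
```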

# Python Basics

#### Comments
Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter.
This is often used to add comments to code. At times you may also want to
exclude certain blocks of code without deleting them. An easy solution is to comment
out the code.
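For example, here is a full-line comment, an inline comment after code, and a line that has been commented out so the interpreter skips it:

```python
# Sum up a small list of numbers (a full-line comment)
values = [1, 2, 3]
total = sum(values)  # an inline comment after code

# print("debugging output")   <- commented out, so it never runs
print(total)  # 6
```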

#### Functions
A function is a block of code which only runs when it is called. You can pass data, known as parameters, into a function. A function can return data as a result.

In Python a function is defined using the def keyword. To call a function, use the function name followed by parentheses. Information can be passed into functions as arguments.

```
# Example
def my_function(name):
  print("Hello " + name + " from a function")

my_function("John")
```

#### Variables and argument passing

When assigning a variable (or name) in Python, you are creating a reference to the
object on the right-hand side of the equals sign, not a copy of it.

When you pass objects as arguments to a function, new local variables are created referencing
the original objects without any copying. Because nothing is copied, a function can alter
the internals of a mutable argument. However, if you bind a new object to a variable
inside a function, that change will not be reflected in the parent scope. Suppose we had the
following function:
```python
def append_element(some_list, element):
    some_list.append(element)
```

Calling it on a list of integers modifies that list in place:

```
data = [1, 2, 3]

append_element(data, 4)

data   # [1, 2, 3, 4]
```
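To see the difference between mutating an argument and rebinding a parameter name, compare the two functions below (`rebind_list` is a hypothetical illustration). The first changes the caller's list in place; the second only rebinds its own local name, so the caller's list is untouched.

```python
def append_element(some_list, element):
    # Mutates the list object that the caller also references
    some_list.append(element)

def rebind_list(some_list):
    # Binds the local name to a brand-new list; the caller's list is not affected
    some_list = [0, 0, 0]
    return some_list

data = [1, 2, 3]
append_element(data, 4)
print(data)      # [1, 2, 3, 4]

new_list = rebind_list(data)
print(data)      # still [1, 2, 3, 4]
print(new_list)  # [0, 0, 0]
```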

#### Imports
In Python a module is simply a file with the .py extension containing Python code.
Suppose that we had the following module:

```python
# some_module.py
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
```

If we wanted to access the variables and functions defined in some_module.py from
another file in the same directory, we could do:
```
import some_module
result = some_module.f(5)
pi = some_module.PI
```

Or, equivalently, import specific names directly:

```
from some_module import f, g, PI
result = g(5, PI)
```

By using the `as` keyword, you can give imports different local names:

```
import some_module as sm
from some_module import PI as pi, g as gf

r1 = sm.f(pi)
r2 = gf(6, pi)
```

#### Mutable and immutable objects
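Most objects in Python, such as lists, dicts, NumPy arrays, and most user-defined
types (classes), are mutable. This means that the object or values that they contain can
be modified.

Other objects, such as strings and tuples, are immutable: their contents cannot be changed after they are created.

In Python, dictionary keys must be immutable (more precisely, hashable), so tuples whose elements are themselves immutable can be used as dictionary keys.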
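A short demonstration of these rules: a list element can be reassigned in place, a string cannot, and a tuple works as a dictionary key while a list does not. The values used here are arbitrary examples.

```python
a_list = ['foo', 2, [4, 5]]
a_list[2] = (3, 4)             # lists are mutable: this works
print(a_list)                  # ['foo', 2, (3, 4)]

a_string = 'this is a string'
try:
    a_string[10] = 'f'         # strings are immutable: raises TypeError
except TypeError as e:
    print('TypeError:', e)

lookup = {(1, 2): 'a tuple key'}   # a tuple works as a dict key
try:
    lookup[[1, 2]] = 'a list key'  # a list does not: raises TypeError
except TypeError as e:
    print('TypeError:', e)
```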

---



# Python Data Types

### Scalar Types
Python along with its standard library has a small set of built-in types for handling
numerical data, strings, boolean (True or False) values, and dates and times. These
“single value” types are sometimes called scalar types, or simply scalars.

#### Numeric types
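Python's primary numeric types are `int` (integers of arbitrary size) and `float` (double-precision floating-point numbers). A few examples of how they behave with the arithmetic operators:

```python
ival = 17239871
big = ival ** 6          # ints can grow arbitrarily large
fval = 7.243
fval2 = 6.78e-5          # scientific notation

print(3 / 2)    # 1.5: true division always produces a float
print(3 // 2)   # 1: floor division drops the fractional part
print(7 % 3)    # 1: remainder (modulo)
print(type(ival), type(fval))  # <class 'int'> <class 'float'>
```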

#### Strings
Many people use Python for its powerful and flexible built-in string processing capabilities.
You can write string literals using either single quotes ' or double quotes ":

```
a = 'one way of writing a string'
b = "another way"
```

Given a string s, the syntax s[:3] (which selects the first three characters) is called
slicing and is implemented for many kinds of Python sequences.

The backslash character \ is an escape character, meaning that it is used to specify
special characters like newline \n or Unicode characters.
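A few string operations illustrating slicing and escape characters; the raw-string prefix `r`, shown for comparison, tells Python to treat backslashes literally.

```python
s = 'python'
print(s[:3])        # pyt (first three characters)
print(s[3:])        # hon

text = 'line one\nline two'       # \n is a newline
print(text)

path = 'C:\\data\\file.txt'       # \\ is a single literal backslash
raw_path = r'C:\data\file.txt'    # raw string: backslashes are not escapes
print(path == raw_path)           # True
print('caf\u00e9')                # \u00e9 is the Unicode character é
```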

#### Booleans
The two boolean values in Python are written as True and False. Comparisons and
other conditional expressions evaluate to either True or False. Boolean values are
combined with the `and` and `or` keywords and negated with the `not` keyword.
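For example, combining comparison results with `and`, `or`, and `not`:

```python
print(True and True)    # True
print(False or True)    # True
print(not True)         # False

x = 5
in_range = x > 0 and x < 10   # both comparisons must hold
print(in_range)               # True
```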

#### Type casting
The str, bool, int, and float types are also functions that can be used to cast values
to those types.
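A few casts in action. Note that `int()` truncates a float toward zero, and that `int()` cannot parse a string containing a decimal point, which raises `ValueError`:

```python
s = '3.14159'
fval = float(s)      # 3.14159
ival = int(fval)     # 3
print(type(fval), ival)

print(bool(fval))    # True: nonzero numbers are truthy
print(bool(0))       # False
print(str(42))       # 42 (now a string)

try:
    int('3.5')
except ValueError as e:
    print('ValueError:', e)
```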

#### None
None is the Python null value type. If a function does not explicitly return a value, it
implicitly returns None. None is also a common default value for function arguments:

```
def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result
```
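To see both behaviors: a hypothetical function with no `return` statement gives back None, and `add_and_maybe_multiply` only multiplies when `c` is supplied. The function is repeated here so the snippet runs on its own.

```python
def no_return():
    # Hypothetical function with no return statement: calling it evaluates to None
    x = 1

def add_and_maybe_multiply(a, b, c=None):
    result = a + b
    if c is not None:
        result = result * c
    return result

print(no_return() is None)              # True
print(add_and_maybe_multiply(2, 3))     # 5
print(add_and_maybe_multiply(2, 3, 4))  # 20
```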

#### Dates and times
The built-in Python datetime module provides datetime, date, and time types. The
datetime type, as you may imagine, combines the information stored in date and
time and is the most commonly used:

```
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
dt.day      # 29
dt.minute   # 30
```

Given a datetime instance, you can extract the equivalent date and time objects by
calling methods on the datetime of the same name:

```
dt.date()   # datetime.date(2011, 10, 29)
dt.time()   # datetime.time(20, 30, 21)
```

The strftime method formats a datetime as a string:

```
dt.strftime('%m/%d/%Y %H:%M')   # '10/29/2011 20:30'
```

Strings can be converted (parsed) into datetime objects with the strptime function:

```
datetime.strptime('20091031', '%Y%m%d')   # datetime.datetime(2009, 10, 31, 0, 0)
```

When you are aggregating or otherwise grouping time series data, it will occasionally
be useful to replace time fields of a series of datetimes—for example, replacing the
minute and second fields with zero:

```
dt.replace(minute=0, second=0)   # datetime.datetime(2011, 10, 29, 20, 0)
```

The difference of two datetime objects produces a datetime.timedelta type:

```
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
delta         # datetime.timedelta(days=17, seconds=7179)
type(delta)   # <class 'datetime.timedelta'>
```

Adding a timedelta to a datetime produces a new shifted datetime:

```
dt           # datetime.datetime(2011, 10, 29, 20, 30, 21)
dt + delta   # datetime.datetime(2011, 11, 15, 22, 30)
```

### Exercise:
- Print out today's weekday.
- Print out the day of the year for today's date (for example, 1 for January 1).

### Control Flow
Python has several built-in keywords for conditional logic, loops, and other standard
control flow concepts found in other programming languages.

#### if, elif, and else

```
x = -2
if x < 0:
    print("It's negative")   # prints: It's negative
```


```
if x < 0:
    print("It's negative")
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')
```

If any of the conditions is True, no further elif or else blocks will be reached. With
a compound condition using `and` or `or`, conditions are evaluated left to right and will
short-circuit: evaluation stops as soon as the overall result is known.
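For example, in the snippet below the hypothetical helper `expensive_check` would raise an error if it ever ran, yet no error occurs, because the left operand alone already decides each result:

```python
def expensive_check():
    # Hypothetical helper: raises if it is ever called
    raise RuntimeError('this should not be evaluated')

a, b = 5, 7
first = a < b or expensive_check()    # True: `or` stops at the first True operand
second = a > b and expensive_check()  # False: `and` stops at the first False operand
print(first, second)                  # True False
```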

It is also possible to chain comparisons:

```
4 > 3 > 2 > 1   # True; equivalent to 4 > 3 and 3 > 2 and 2 > 1
```

#### for loops
for loops are for iterating over a collection (like a list or tuple) or an iterator. The
standard syntax for a for loop is:

```
for value in collection:
    # do something with value
```

You can advance a for loop to the next iteration, skipping the remainder of the block,
using the continue keyword. Consider this code, which sums up integers in a list and
skips None values:
```
sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value
```

A for loop can be exited altogether with the break keyword. This code sums elements
of the list until a 5 is reached:
```
sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value
```

As we will see in more detail, if the elements in the collection or iterator are sequences
(tuples or lists, say), they can be conveniently unpacked into variables in the for
loop statement:
```
for a, b, c in iterator:
    # do something
```
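A concrete version of this pattern, using an illustrative list of 3-tuples so that each tuple unpacks into three loop variables:

```python
pairs = [(1, 'a', True), (2, 'b', False), (3, 'c', True)]
labels = []
for number, letter, flag in pairs:
    # Each 3-tuple is unpacked into number, letter, and flag
    if flag:
        labels.append(f'{number}{letter}')
print(labels)  # ['1a', '3c']
```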

#### while loops
A while loop specifies a condition and a block of code that is to be executed until the
condition evaluates to False or the loop is explicitly ended with break:

```
x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2
```

#### pass
pass is the “no-op” statement in Python. It can be used in blocks where no action is to
be taken (or as a placeholder for code not yet implemented); it is only required
because Python uses whitespace to delimit blocks:

```
if x < 0:
    print('negative!')
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print('positive!')
```

#### range
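The range function returns a range object, a lazy sequence of evenly spaced
integers. Pass it to list to see all of its values:

```
range(10)          # range(0, 10)
list(range(10))    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A start, end, and step (which may be negative) can be given; the end point is not included:

```
list(range(0, 20, 2))   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
list(range(5, 0, -1))   # [5, 4, 3, 2, 1]
```

A common use of range is iterating through a sequence by index:

```
seq = [1, 2, 3, 4]
for i in range(len(seq)):
    val = seq[i]
```

This snippet sums all numbers from 0 to 99,999 that are multiples of 3 or 5. The name total is used instead of sum so that the built-in sum function is not shadowed:

```
total = 0
for i in range(100000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i
```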

### Exercise:
- Write a for loop that prints each number from 0 to 20 and whether it is even or odd.

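## Exceptions in Python

Errors raised while a program runs, such as trying to open a file that does not exist, can be handled with a try/except block. The code in the except block runs only if the matching exception is raised:

```
fileName = 'empty_lines.txt'
try:
    with open(fileName, 'r') as f:
        lines = f.readlines()
        print(lines)
except FileNotFoundError:
    print("{} doesn't exist".format(fileName))
```

When empty_lines.txt exists, this prints its lines:

```
['this is a Line\n', '\n', 'this is a Line\n', '\n', '\n', '\n', 'this is a Line\n']
```

FileNotFoundError is a subclass of the more general OSError, so catching OSError also handles a missing file:

```
filename = 'empty_lines'
try:
    with open(filename, 'r') as f:
        lines = f.readlines()
        print(lines)
except OSError:
    print("{} doesn't exist".format(filename))
# prints: empty_lines doesn't exist
```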
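Exception handling is not limited to files. As a further sketch, the hypothetical helper `attempt_float` below tries to convert a value to float and returns the input unchanged if a `ValueError` (an unparseable string) or `TypeError` (an unsupported type) is raised. The `else` clause runs only when no exception occurred, and `finally` always runs.

```python
def attempt_float(x):
    """Hypothetical helper: return float(x), or x unchanged if conversion fails."""
    try:
        result = float(x)
    except (ValueError, TypeError):
        # float('abc') raises ValueError; float((1, 2)) raises TypeError
        result = x
    else:
        print('converted successfully')
    finally:
        print('attempted conversion of', repr(x))
    return result

attempt_float('1.5')        # 1.5
attempt_float('something')  # 'something'
attempt_float((1, 2))       # (1, 2)
```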


## Tuple
A tuple is a fixed-length, immutable sequence of Python objects. The easiest way to
create one is with a comma-separated sequence of values, usually enclosed in parentheses (); it is the commas, not the parentheses, that create the tuple.
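A minimal sketch of creating tuples: commas build the tuple, the tuple function converts any sequence, a one-item tuple needs a trailing comma, and a tuple can be unpacked into separate variables.

```python
tup = 4, 5, 6                 # parentheses are optional here
nested = (4, 5, 6), (7, 8)    # a tuple of tuples
from_list = tuple([4, 0, 2])  # (4, 0, 2)
from_str = tuple('string')    # ('s', 't', 'r', 'i', 'n', 'g')
single = (1,)                 # without the comma, (1) is just the int 1

a, b, c = tup                 # unpacking: a=4, b=5, c=6
print(tup[0], nested[1], from_list, single, a + b + c)
```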

## Exercise:
Create a tuple with 3 items. Show that the tuple is immutable.

## List
Lists are variable-length, and their contents can be modified
in-place. You can define them using square brackets [] or using the list type function:

```
ls = [3, 4, 5, 6]
ls[3] = 0   # replace the item at index 3
ls          # [3, 4, 5, 0]
```
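A few common list operations: converting with the list type function, `append` to add at the end, `insert` at a given position, `pop` to remove and return the item at an index, and `remove` to delete the first matching value.

```python
tup = ('foo', 'bar', 'baz')
b_list = list(tup)          # ['foo', 'bar', 'baz']

b_list.append('dwarf')      # ['foo', 'bar', 'baz', 'dwarf']
b_list.insert(1, 'red')     # ['foo', 'red', 'bar', 'baz', 'dwarf']
removed = b_list.pop(2)     # removes and returns 'bar'
b_list.remove('foo')        # ['red', 'baz', 'dwarf']
print(b_list, removed)
print('dwarf' in b_list)    # True
```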

### List Slicing
List items can be accessed with positive indices (counting from 0 at the start) or negative indices (counting from -1 at the end).

The following figure shows a helpful illustration of slicing with positive and negative
integers. In the figure, the indices are shown at the “bin edges” to help show
where the slice selections start and stop using positive or negative indices.
![](https://i.imgur.com/zJA7O16.png)
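Some slicing examples on an illustrative 8-item list. In seq[start:stop], the start index is included and the stop index is excluded; an optional third value is the step.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]     # [2, 3, 7, 5]
seq[:3]      # [7, 2, 3]     omitted start means "from the beginning"
seq[-4:]     # [5, 6, 0, 1]  the last four items
seq[-6:-2]   # [3, 7, 5, 6]
seq[::2]     # [7, 3, 5, 0]  every other item
seq[::-1]    # [1, 0, 6, 5, 7, 3, 2, 7]  reversed
```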

## Exercise:
Given a list of 9 items: What is the positive index of the 6th item? What is the negative index of the 6th item? What is the range of positive indices of the third, fourth, and fifth items? What is the range of negative indices of the third, fourth, and fifth items? What is the range of mixed positive and negative indices for the third, fourth, and fifth items?

### Built-in Sequence Functions
Python has a handful of useful sequence functions that you should familiarize yourself
with and use at any opportunity.

#### enumerate
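enumerate yields (index, element) tuples, which saves you from tracking an index by hand. A minimal sketch with an illustrative list:

```python
collection = ['foo', 'bar', 'baz']
for i, value in enumerate(collection):
    print(i, value)   # 0 foo, then 1 bar, then 2 baz

pairs = list(enumerate(collection))   # [(0, 'foo'), (1, 'bar'), (2, 'baz')]

# Map each value to its position in the list
mapping = {}
for i, v in enumerate(collection):
    mapping[v] = i
print(mapping)   # {'foo': 0, 'bar': 1, 'baz': 2}
```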

#### zip

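zip “pairs” up the elements of a number of lists, tuples, or other sequences to create an
iterator of tuples; passing it to list collects all of the tuples:
```
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)
list(zipped)   # [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
```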
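Two more behaviors worth knowing: zip stops at the shortest input, and zip(*...) reverses the pairing, unzipping a sequence of tuples back into separate tuples. The name pairs below are illustrative.

```python
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
seq3 = [False, True]

short = list(zip(seq1, seq2, seq3))  # [('foo', 'one', False), ('bar', 'two', True)]

people = [('Ada', 'Lovelace'), ('Alan', 'Turing'), ('Grace', 'Hopper')]
first_names, last_names = zip(*people)
print(first_names)  # ('Ada', 'Alan', 'Grace')
print(last_names)   # ('Lovelace', 'Turing', 'Hopper')
```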

## Dictionary
The dictionary (dict) is likely the most important built-in Python data structure. In other
languages it is often called a hash map or associative array. It is a flexibly sized collection
of key-value pairs, where key and value are Python objects. One approach for creating one is to use
curly braces {} and colons to separate keys and values.
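A short tour of basic dict operations on an illustrative dict: adding a pair, looking up a value, testing membership with `in` (which checks keys), using `get` with a default, deleting a key, and listing keys and values. Since Python 3.7, dicts preserve insertion order.

```python
d1 = {'a': 'some value', 'b': [1, 2, 3, 4]}
d1[7] = 'an integer'          # add a new key-value pair
print(d1['b'])                # [1, 2, 3, 4]
print('b' in d1)              # True
print(d1.get('missing', 0))   # 0: the default for an absent key
del d1[7]                     # delete a key and its value
print(list(d1.keys()))        # ['a', 'b']
print(list(d1.values()))      # ['some value', [1, 2, 3, 4]]
```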

#### Creating dicts from sequences
You will often end up with two sequences that you want to pair up
element-wise in a dict. As a first cut, you might write code like this:

```
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
```
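Because a dict is essentially a collection of key-value 2-tuples, the dict function can also build one directly from zip. Here key_list and value_list are given illustrative values:

```python
key_list = ['a', 'b', 'c']
value_list = [1, 2, 3]

mapping = dict(zip(key_list, value_list))
print(mapping)  # {'a': 1, 'b': 2, 'c': 3}
```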

#### Valid dict key types

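Dictionary keys must be hashable, which in practice means immutable objects such as scalars (int, float, string) or tuples whose elements are all immutable. You can check whether an object is hashable with the hash function:
```
hash('string')
hash((1, 2, (2, 3)))
hash((1, 2, [2, 3]))  # raises TypeError: unhashable type: 'list'
```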
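To use the contents of a list as a key, one common approach is to convert it to a tuple first, which is hashable as long as its elements are:

```python
d = {}
d[tuple([1, 2, 3])] = 5
print(d)  # {(1, 2, 3): 5}
```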

## Exercise:
Is `(1, 2, '23')` immutable or mutable? Use code to show it.

## Set

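A set is an unordered collection of unique elements. It can be created with the set function or with a set literal in curly braces; duplicates are dropped either way:
```
set([2, 2, 2, 1, 3, 3])   # {1, 2, 3}
{2, 2, 2, 1, 3, 3}        # {1, 2, 3}
```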

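Sets support mathematical set operations such as union, intersection, difference, and symmetric difference, each available as a method or as an operator:

```
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

a.union(b)                 # {1, 2, 3, 4, 5, 6, 7, 8}
a | b
a.intersection(b)          # {3, 4, 5}
a & b
a.difference(b)            # {1, 2}
a - b
a.symmetric_difference(b)  # {1, 2, 6, 7, 8}
a ^ b
```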

## Exercise:
List the unique values in the list `[2, 2, 2, 1, 3, 9, 0, 0]`.

## List, Set, and Dict Comprehensions

List comprehensions are one of the most-loved Python language features. They allow
you to concisely form a new list by filtering the elements of a collection and transforming
the elements that pass the filter, all in one expression. They take the basic form:
```
[expr for val in collection if condition]
```

This is equivalent to the following for loop:
```
result = []
for val in collection:
    if condition:
        result.append(expr)
```
The filter condition can be omitted, leaving only the expression. For example, given a
list of strings, we could filter out strings with length 2 or less and also convert them to uppercase like this:

```
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]   # ['BAT', 'CAR', 'DOVE', 'PYTHON']
```

Set and dict comprehensions are a natural extension, producing sets and dicts in an
idiomatically similar way instead of lists. A dict comprehension looks like this:
```
dict_comp = {key-expr : value-expr for value in collection if condition}
```

A set comprehension looks like the equivalent list comprehension except with curly
braces instead of square brackets:
```
set_comp = {expr for value in collection if condition}
```

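For example, we could create a set containing just the lengths of the strings in the collection:

```
unique_lengths = {len(x) for x in strings}
unique_lengths   # {1, 2, 3, 4, 6}
```

The same result can be computed with the map function:

```
set(map(len, strings))   # {1, 2, 3, 4, 6}
```

As a dict comprehension example, we could map each string to its location in the list:

```
loc_mapping = {val : index for index, val in enumerate(strings)}
loc_mapping   # {'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}
```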

## Exercise:
Given a list of names `names = ["Alice", "Bob", "Charlie", "David", "Emma", "Frank", "Grace"]`. Use dictionary comprehension to create a dictionary that maps each unique length to the list of names that have the same length.