
# **Python for Data Science**

## *Flow Control*



### **If-elif-else Statements**
In Python, the `if`, `elif`, and `else` statements are used for conditional execution of code. They make decisions based on the truth value of a condition.

```
if condition:
    # code to be executed if condition is True
elif condition2:
    # code to be executed if condition2 is True and condition is False
else:
    # code to be executed if both condition and condition2 are False

```
**Important**
In Python, indentation is used to define the scope of a code block.
+ Code blocks are used in control structures such as `if`, `for`, `while`, `def`, `class`, etc.
+ In order to indicate which lines of code belong to a code block, you must indent them with whitespace (usually 4 spaces, but sometimes a tab is used instead).


In [None]:
x = -2

if x > 0:
    print("x is positive")
elif x < 0:
    print("x is negative")
else:
    print("x is zero")


### **For Loop**
A `for` loop is used to iterate over a collection of elements (iterable, iterator), such as a list, tuple, string, or dictionary.
```
for variable in collection:
    # code to be executed
```


In [None]:
# iterate over a list
for i in [2,3,4,5]:
    print(i)

In [None]:
# iterate over a dictionary
my_dict = {"apple": 1, "banana": 2, "orange": 3}
for key, value in my_dict.items():
    print(key, value)
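
Strings and tuples can be iterated over in the same way as lists. The short sketch below (variable names are illustrative) visits each character of a string and each element of a tuple:

```python
# iterate over a string, one character at a time
letters = []
for ch in "abc":
    letters.append(ch)
print(letters)  # ['a', 'b', 'c']

# iterate over a tuple
total = 0
for n in (10, 20, 30):
    total += n
print(total)  # 60
```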

In [None]:
# Nested for loop
for i in [1,2,3]:
    for j in (4,5):
        print((i, j))

#### *Break and Continue*

The `break` statement breaks out of the innermost enclosing `for` or `while` loop.

In [None]:
fruits = ["apple", "banana", "cherry", "orange", "kiwi"]
for fruit in fruits:
    if fruit == "orange":
        break
    print(fruit)
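
Because `break` only leaves the innermost loop, an enclosing outer loop keeps running. A minimal sketch with illustrative values:

```python
# `break` exits only the inner loop; the outer loop continues
visited = []
for i in [1, 2, 3]:
    for j in [10, 20, 30]:
        if j == 20:
            break  # leaves the `for j` loop only
        visited.append((i, j))

print(visited)
# [(1, 10), (2, 10), (3, 10)]
```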

The `continue` statement skips the rest of the code in the current iteration and continues with the next iteration of the loop:

In [None]:
fruits = ["apple", "banana", "cherry", "orange", "kiwi"]
for fruit in fruits:
    if fruit == "orange":
        continue
    print(fruit)

### While Loop
The `while` statement is used for repeated execution as long as an expression is true:
```
while condition:
    # code to be executed if condition is True
```
**Note:**

Beware of infinite loops! Make sure the loop body eventually makes the condition false (or exits with `break`); otherwise the loop never ends.

In [None]:
count = 1
while count <= 5:
    print(count)
    count += 1
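
One possible safeguard against an accidental infinite loop is an iteration limit. In the sketch below, the condition `count != 0` never becomes false because `count` skips over zero; `max_iterations` is a hypothetical safety limit chosen for illustration:

```python
count = 10
iterations = 0
max_iterations = 100  # hypothetical safety limit

while count != 0:
    count -= 3  # 7, 4, 1, -2, ... never exactly 0
    iterations += 1
    if iterations >= max_iterations:
        print("Stopping: iteration limit reached")
        break

print(iterations)  # 100
```

Writing the condition as `count > 0` would avoid the problem in the first place; the limit is only a fallback.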


### Loop with `else` Clause

Loop statements may have an `else` clause:
+ The `else` clause is executed when the loop terminates through exhaustion of the iterable (with `for`) or when the condition becomes false (with `while`).
+ It is not executed when the loop is terminated by a `break` statement.


In [None]:
fruits = ["apple", "banana", "cherry", "orange", "kiwi"]
for fruit in fruits:
    if fruit == "pineapple":
        print("Pineapple is in the list!")
        break
else:
    print("Pineapple is not in the list.")
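
The same rule applies to `while` loops. In this sketch (the numbers are illustrative), the `else` clause runs because the condition becomes false without a `break` being reached:

```python
n = 7
divisor = 2
result = None

while divisor * divisor <= n:
    if n % divisor == 0:
        result = "found a divisor"
        break  # skips the else clause
    divisor += 1
else:
    # reached only when the while condition becomes false
    result = "no divisor found"

print(result)
# no divisor found
```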


## *Functions*
A function is a block of organized, reusable code that performs a specific task.

Functions provide a way to break down complex programs into smaller, modular pieces, making them easier to understand, reuse, and maintain.



### **Define a Function**
+ Functions are defined using the `def` keyword, followed by the function name, a parenthesized list of parameters (possibly empty), and a colon `:`.
+ The function code block is then indented, typically by four spaces or a tab.
```
def add_numbers(a, b):
    result = a + b
    return result
```
+ The `return` statement is optional. If there is no `return` statement in the body of the function, `None` will be returned when the function is called.

To use this function, you can simply call it with two arguments, like this:
```
sum = add_numbers(3, 5)
print(sum)
```



In [None]:
def add_numbers(a, b):
    result = a + b
    return result

sum = add_numbers(3, 5)
print(sum)

**Question**

What is the value of `sum`?
```
def add_numbers(a, b):
    result = a + b
    
sum = add_numbers(3, 5)
```

Functions are first-class citizens in Python:
  + A function can be assigned to a name in an assignment statement
  ```
  def fcn():
       print("I am a function")
  fcn_1 = fcn  
  ```
  + Functions can be passed to another function as arguments (higher-order function)
  + A function can be returned by another function (higher-order function).
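
The last two points can be sketched as follows. The names `apply_twice` and `make_multiplier` are hypothetical helpers introduced only for illustration:

```python
def apply_twice(func, value):
    """Take a function as an argument and call it twice."""
    return func(func(value))

def make_multiplier(factor):
    """Return a new function that multiplies its input by `factor`."""
    def multiply(x):
        return x * factor
    return multiply

double = make_multiplier(2)    # a function returned by another function
print(double(5))               # 10
print(apply_twice(double, 3))  # 12
```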



#### *Parameter vs Argument*

* When you define a function, the names (object references) in the function definition are called **parameters**

* When the function is called, the values or names (object references) passed to the function are called **arguments**.




In [None]:
def fcn(parameter_1, parameter_2):
    print(parameter_1, parameter_2)
    return 0

print(fcn(20, "abc")) # 20 and "abc" are called arguments.
print(fcn(12, parameter_2 = 20))

##### **Function parameters**
+ Positional-only parameter
+ Positional-keyword parameter
+ Variable-length positional parameter
+ Keyword-only parameter
+ Variable-length keyword parameter
    



###### _positional-only parameter (Python 3.8 and later)_
+ When the function is called, only a positional argument can be used for this type of parameter; a keyword argument cannot be used.

In [None]:
## All parameters defined before / are positional-only
def fcn(p1, /):
    print(p1)

fcn(10) # positional argument
# 10


**Question**

What is the output of the following statements?
```
def fcn(p1, /):
    print(p1)

fcn(p1 = 10)
```

a. `"p1"`

b. 10

c. `/`

d. `TypeError`

A positional-only parameter can have a default value.

It is then called an optional (default) positional-only parameter.


In [None]:
def fcn(p1, p2 = 10, /): # p2 is optional
    print(p1, p2)

fcn(20)
# Output: 20 10
fcn(20, 30)
# Output: 20 30

###### _Positional-keyword parameter_
A positional-keyword parameter can be passed using either a positional or a keyword argument.

In [None]:
# p1 is a positional-keyword parameter, as there is no / in the signature.
def fcn(p1):
    print(p1)

fcn(10)
# 10
fcn(p1 = 10)
# 10

In [None]:
# p1 is positional-only, p2 is positional-keyword parameter.
def fcn(p1, /, p2):
    print(p1, p2)

fcn(10, 20)
# 10 20
fcn(10, p2 = 20)
# 10 20

###### *Variable-length positional parameter*
  


In [None]:
def fcn(*var_p): ## var_p is a variable length positional parameter
    print(var_p)

fcn(10, 30, 50)
# (10, 30, 50)
# notice that var_p is a tuple.

**Question**

What is the output of the statements?
```
def fcn(p1, p2, /, p3, p4, *var_p):
    print(p1, p2, p3, p4, var_p)

print(fcn(1, 2, 3, 4, 5, 6))
```

a. `(p1, p2, p3, p4, var_p)`

b. `1 2 3 4 5 6`

c. `1 2 3 4 (5, 6)`

d. `None of the above`

###### *Keyword-only parameter*
      
+ keyword-only parameter without default value (required keyword-only parameter)
        
+ keyword-only parameter with default value (optional keyword-only parameter)
         

In [None]:
## any parameter after * is keyword-only
def fcn(*, kp1, kp2 =20):
    print(kp1, kp2)

fcn(kp1 = 10)
# 10 20
fcn(kp2 = 30, kp1 = 10)
# 10 30

**Question**

What is the output of the following statements?
```
def fcn(*, kp1, kp2 =20):
    print(kp1, kp2)

fcn(10)
```
a. `10`

b. `10, 20`

c. `None`

d. `TypeError`

###### *Variable-length keyword parameter*


In [None]:
## kwvar_p is a variable-length keyword parameter
def fcn(**kwvar_p):
    print(kwvar_p)

fcn(p1 = 10, p2 = 20)
# {'p1': 10, 'p2': 20}
# notice that kwvar_p is a dictionary.

###### _The order of parameters in a function definition_

  1. Positional-only parameter(s)
  2. Positional-keyword parameter(s)
  3. Variable-length positional parameter
  4. Keyword-only parameter(s)
  5. Variable-length keyword parameter
    
For example:
      
  ```
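  # after *var_pos, every named parameter is keyword-only, so a bare * is not
  # needed (using both *var_pos and a bare * is a SyntaxError)
  def fcn(pos_only, /, pos_keyword, *var_pos, keyword_only, **var_keyword):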
      pass
  ```
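
A runnable sketch of a definition that uses all five kinds of parameters in this order, together with a call. The function name `describe` and the argument values are illustrative. Once a `*var_pos` parameter is present, every named parameter after it is keyword-only, so no separate bare `*` appears in the signature:

```python
def describe(pos_only, /, pos_or_kw, *var_pos, kw_only, **var_kw):
    # return everything so the binding of each argument is visible
    return pos_only, pos_or_kw, var_pos, kw_only, var_kw

result = describe(1, 2, 3, 4, kw_only=5, extra=6)
print(result)
# (1, 2, (3, 4), 5, {'extra': 6})
```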



A non-default (required) positional parameter cannot follow a default (optional) positional parameter.
  

In [None]:
def fcn(p1, p2 = 2,/, p3):
    pass
# SyntaxError: non-default argument follows default argument

def fcn(p2=2, p3):
    pass
# SyntaxError: non-default argument follows default argument

def fcn(p1 , /, p2=2, p3, p4):
    pass

# SyntaxError: non-default argument follows default argument

###### _The order of arguments in a function call_

  1. Positional argument(s)
  2. Keyword argument(s)
  
For example:
```
fcn(10, 'abc', arg1 = 30, arg2 = 50)
```


  

    





When calling a function, any argument after a keyword argument MUST also be a keyword argument.


In [None]:
def fcn(p1, p2): # p1, p2 are positional-keyword parameters
    print(p1, p2)

fcn(p1=2, p2)
# SyntaxError: positional argument follows keyword argument

**Question**

What is the output of the following statements?
```
def fcn(p1, p2):
    print(p1, p2)

fcn(10, 20, p1 = 30)
```
a. `10 20`

b. `10 30`

c. `30 20`

d. `TypeError`

##### _Function call with `*` and `**`_

A function can also be called with 'tuple unpacking' (`*`) and 'dictionary unpacking' (`**`).
  


In [None]:
def fcn(p1, p2, p3, p4):
    print(p1, p2, p3, p4)

tup1 = (1,2)
dict1 = {"p3": 3, "p4": 4}   # the keys must match the names of the function
# parameters

fcn(*tup1, **dict1)
# 1 2 3 4

In [None]:
fcn(**dict1, *tup1)
# SyntaxError: iterable argument unpacking follows keyword argument unpacking

##### *Pass by Object Reference*

In a function call, variables are passed by object reference. This means that when you pass a variable to a function, you are passing a reference to the object in memory that the variable is referencing, rather than a copy of the object itself.

For example, consider the following code:




In [33]:
def my_function(my_list):
    my_list.append(4)
    print("Inside function: my_list =", my_list)

a = [1, 2, 3]
my_function(a)
print("Outside function: a =", a)

Inside function: my_list = [1, 2, 3, 4]
Outside function: a = [1, 2, 3, 4]
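
If the caller's list should stay unchanged, one option is to pass a copy instead of the original object. A minimal sketch, where `append_four` is a hypothetical function name:

```python
def append_four(my_list):
    my_list.append(4)  # mutates whatever object was passed in
    return my_list

a = [1, 2, 3]
b = append_four(a.copy())  # pass a shallow copy, so `a` is not touched

print(a)       # [1, 2, 3]
print(b)       # [1, 2, 3, 4]
print(a is b)  # False: two different objects
```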


**Question**

What is the value of `a` after running the following statements?
```
def my_function(my_list):
    my_list = [5,6,7]
    my_list.append(4)
    print("Inside function: my_list =", my_list)

a = [1, 2, 3]
my_function(a)
```

a. `[1, 2, 3]`

b. `[5, 6, 7]`

c. `[5, 6, 7, 4]`

d. `[1, 2, 3, 4]`

### *Lambda function*

In Python, a lambda function (also known as an anonymous function) is a small, single-expression function that does not have a name.

+ Lambda functions are defined using the `lambda` keyword, followed by the function's parameters (if any), a colon `:`, and the expression to be evaluated.
+ The value of the expression is returned as the result of the function.
```
sum = lambda a, b: a + b
```

The lambda function can be called like a regular function:

```
result = sum(3, 5)
print(result)
```

Lambda functions are often used in combination with higher-order functions such as `map` and `filter` to create concise and expressive code.


In [None]:
numbers = [1, 2, 3, 4, 5]
squares = map(lambda x: x**2, numbers)
print(list(squares))

In [None]:
numbers = [1, 2, 3, 4, 5, 6]
filtered_numbers = filter(lambda x: x % 2 != 0, numbers)
print(list(filtered_numbers))

## *Comprehensions in Python*


### **List Comprehension**

List comprehension is a concise way to create new lists in Python.
+ It allows you to create a new list by applying an **expression** to each element of an existing list (or other iterable object)
+ You can also filter the elements based on a condition.

+ The syntax of a list comprehension is as follows:
```
new_list = [expression for item in iterable if condition]
```
+ `new_list` is the new list that will be created
+ `expression` is the operation to be applied to each item in the iterable
+ `item` is a variable that represents each item in the iterable
+ `iterable` is the existing list or other iterable that the list comprehension is based on
+ `condition` is an optional condition that filters the elements in the iterable


Examples:


In [None]:
# create a list of squares of numbers from 1 to 10
squares = [x**2 for x in range(1, 11)]
print(squares)  # Output: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

# Create a list of even numbers from 1 to 10:
even_numbers = [x for x in range(1, 11) if x % 2 == 0]
print(even_numbers)  # Output: [2, 4, 6, 8, 10]
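
To see what a comprehension does, it can help to write the equivalent explicit loop. The sketch below builds the same list both ways (variable names are illustrative):

```python
# comprehension form: expression, loop, and condition in one line
even_squares = [x**2 for x in range(1, 11) if x % 2 == 0]

# equivalent explicit loop
even_squares_loop = []
for x in range(1, 11):
    if x % 2 == 0:
        even_squares_loop.append(x**2)

print(even_squares)                       # [4, 16, 36, 64, 100]
print(even_squares == even_squares_loop)  # True
```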

### **Dictionary Comprehension**
Dictionary comprehension is similar to list comprehension, but instead of creating a new list, it creates a new dictionary.
+ It applies an expression to each `key-value` pair in an existing dictionary (or other iterable)
+ It can also filter the `key-value` pairs based on a condition.

The syntax of a dictionary comprehension is as follows:
```
new_dict = {key_expression: value_expression for item in iterable if condition}
```

Examples:


In [None]:
# Create a dictionary of squares of numbers from 1 to 5
squares = {x: x**2 for x in range(1, 6)}
print(squares)  # Output: {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
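
A dictionary comprehension can also start from an existing dictionary and filter its key-value pairs. A small sketch with illustrative data:

```python
my_dict = {"apple": 1, "banana": 2, "orange": 3}

# keep pairs whose value is at least 2, upper-case the key, scale the value
filtered = {key.upper(): value * 10 for key, value in my_dict.items() if value >= 2}
print(filtered)
# {'BANANA': 20, 'ORANGE': 30}
```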


## *Packages and Modules*

### **Module**
A Python module is a Python code file (`.py` file)  
+ It is a way to organize code into reusable, standalone components that can be easily imported and used in other Python programs.
+ Modules allow you to break up your code into logical, reusable pieces that can be shared and reused by other developers.
+ Python modules can define functions, classes, variables, and constants that can be used in other Python programs.
+ Modules can also include executable code that runs when the module is imported, allowing you to define initialization logic or other setup code that should be executed when the module is loaded.

### **Package**

A Python package is a way to organize related modules into a single namespace, making it easier to manage and import modules in your Python program.
+ A package is simply a directory containing Python modules, along with a special file called `__init__.py` that is executed when the package is imported.
+ Packages can contain sub-packages, which are themselves packages nested within the main package. This allows for a hierarchical structure of packages and modules that can be organized in a logical way.
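
As a rough illustration, the sketch below builds a tiny package in a temporary directory and imports a module from it. The package name `demo_pkg`, the module `helpers`, and the function `double` are all hypothetical; in practice the files would be created in an editor rather than from code:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"  # hypothetical package directory
    pkg_dir.mkdir()
    # __init__.py runs when the package is first imported
    (pkg_dir / "__init__.py").write_text("print('initializing demo_pkg')\n")
    # helpers.py is a module inside the package
    (pkg_dir / "helpers.py").write_text("def double(x):\n    return 2 * x\n")

    sys.path.insert(0, tmp)        # make the package importable
    importlib.invalidate_caches()  # make sure the new files are seen
    try:
        from demo_pkg import helpers
        answer = helpers.double(21)
    finally:
        sys.path.remove(tmp)

print(answer)  # 42
```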



### **Import Packages/Modules**


In [40]:
# Use import statement
import pandas  # binds the name `pandas` to the pandas package object

import pandas as pd  # binds only the name `pd`; on its own, this statement
# does not define the name `pandas`

pd.Series([1,2])
pandas.Series([1,2])  # works here only because of the earlier `import pandas`;
# with `import pandas as pd` alone: NameError: name 'pandas' is not defined

# Use from ... import ... statement
from math import sqrt, cos # sqrt and cos will be in the namespace, can be
# directly called. BUT math is not in the namespace

You can access the functions, classes, and constants in the module using dot notation, for example:
```
pd.DataFrame([[1,2], [3,4]])
```
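
To check which names each import form adds to the namespace, one option is to run the statement in a fresh, empty namespace and list what it contains afterwards. This is only an inspection sketch; `names_bound_by` is a hypothetical helper, and `exec` is used here purely for demonstration:

```python
def names_bound_by(statement):
    """Run an import statement in an empty namespace and list the names it binds."""
    namespace = {}
    exec(statement, namespace)
    # exec adds __builtins__ automatically, so leave it out
    return sorted(name for name in namespace if name != "__builtins__")

print(names_bound_by("import math"))                 # ['math']
print(names_bound_by("import math as m"))            # ['m']
print(names_bound_by("from math import sqrt, cos"))  # ['cos', 'sqrt']
```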

## *Unicode and UTF-8 Encoding*

Unicode and UTF-8 are related but distinct concepts in computer science. Unicode defines what each character means, while UTF-8 defines how each character is represented in binary form.

+ Unicode is a standard character encoding system that assigns **a unique numeric code point to every character** in the world's writing systems.
https://en.wikipedia.org/wiki/List_of_Unicode_characters

+ UTF-8 is a variable-length character encoding system that can **represent all Unicode code points using one to four bytes**. UTF-8 is the most widely used encoding for the World Wide Web and other modern computing applications.
https://en.wikipedia.org/wiki/UTF-8

+ Besides UTF-8, there are other encodings, such as ASCII (which covers only 128 characters), UTF-16, and UTF-32.
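
The difference between a code point and its UTF-8 bytes can be inspected with the built-in `ord()` function and the `encode()` string method. The sample characters below are chosen so that their UTF-8 encodings take one, two, three, and four bytes:

```python
lengths = []
for ch in ["A", "é", "世", "😀"]:
    code_point = ord(ch)             # Unicode code point as an integer
    utf8_bytes = ch.encode("utf-8")  # UTF-8 representation of that code point
    lengths.append(len(utf8_bytes))
    print(ch, f"U+{code_point:04X}", utf8_bytes, len(utf8_bytes))

print(lengths)  # [1, 2, 3, 4]
```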




In Python, Unicode strings are represented using the `str` data type, which can store characters from any writing system in the world.

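In Python 3, source files are assumed to be UTF-8 encoded by default, and `str.encode()` and `bytes.decode()` use UTF-8 when no encoding is given.
+ A string literal written with quotes (either single or double) creates a `str` object, which is a sequence of Unicode code points; a specific encoding such as UTF-8 only comes into play when the string is converted to bytes. For example: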

```
my_string = "Hello, 世界!"
print(my_string)
```

You can also create Unicode strings using escape sequences that represent Unicode code points in hexadecimal notation. For example:


In [None]:
#  The escape sequences use the \u prefix followed by a four-digit hexadecimal
# code point for each character.

my_string = "\u0048\u0065\u006C\u006C\u006F, \u4E16\u754C!"
print(my_string)

### **Encode() and Decode()**

*Encode() string method*

The `encode()` method is used to convert a string (a sequence of Unicode code points) to bytes using a specified character encoding.

It returns a bytes object that contains the encoded version of the string. Here's an example:


In [None]:
string = "Hello, world!"
encoded_string = string.encode("utf-8")
print(encoded_string)

*Decode() byte string method*

The `decode()` method is used to convert a bytes object back to a string, interpreting the bytes according to a specified character encoding. Here's an example:


In [None]:
encoded_string = b'Hello, world!'
print(encoded_string)

decoded_string = encoded_string.decode("utf-8")
print(decoded_string)

**Note**

It's important to note that when using `encode()` and `decode()`, the same character encoding must be used for both methods to ensure that the data is converted correctly.
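
A short sketch of what can go wrong when the encodings differ. The sample text is illustrative; `latin-1` and `ascii` are standard codec names:

```python
text = "世界"
data = text.encode("utf-8")

same = data.decode("utf-8")       # same encoding: round trip succeeds
garbled = data.decode("latin-1")  # wrong encoding: no error, but wrong text

print(same == text)     # True
print(garbled == text)  # False

try:
    data.decode("ascii")  # these bytes are outside the ASCII range
except UnicodeDecodeError as err:
    print("decode failed:", err)
```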

### **Byte String**

In Python, a byte string (a `bytes` object) is an immutable sequence of bytes, that is, integers from 0 to 255. Byte strings are often used to store binary data or data that is not in human-readable format.


+ Byte strings can be used to represent non-textual data, such as images, audio files, and other binary data.
+ They are also used for network communication, file input/output, and other operations where data needs to be transmitted in a raw byte format.

It's important to note that byte strings are not the same as Unicode strings, which represent human-readable text. While Unicode strings can be encoded into byte strings using a specific character encoding, byte strings cannot be correctly decoded into Unicode strings unless the original encoding is known.


In [None]:
byte_string = b"hello world" # create a byte string using the b prefix
# create a byte string using the bytes() constructor and a list of byte values
byte_string1 = bytes([0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64])
print(byte_string1)
print(byte_string1.decode('utf8'))
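
A few basic properties of byte strings, sketched with illustrative values: indexing returns an integer, slicing returns another byte string, and `bytes` objects cannot be modified in place (`bytearray` is the mutable counterpart):

```python
data = b"hello"
print(data[0])     # 104, indexing returns an int
print(data[1:3])   # b'el', slicing returns bytes
print(list(data))  # [104, 101, 108, 108, 111]

try:
    data[0] = 72   # bytes objects are immutable
except TypeError as err:
    print("cannot modify bytes:", err)

mutable = bytearray(data)  # mutable copy of the same bytes
mutable[0] = 72
print(bytes(mutable))      # b'Hello'
```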