### Useful Notes
1. Adding a cell below: "Alt + Enter" (runs the current cell, then inserts a new one below it) or "Esc + b"
2. Adding a cell above: "Esc + a"
3. Running a cell: "Shift + Enter"
4. To switch between windows (e.g., PPT and Jupyter): open the two windows one after the other, then switch between them using "Alt + Tab"
5. Use "#" to include comments.
6. Do not indent (i.e., add spaces at the beginning of a line) when writing code, except for comments. Python uses indentation to structure code (e.g., the body of a function).
7. Check for typos.

# Section 0: Package Check and Importing
An advantage of using Anaconda is that it comes with many useful packages (e.g., numpy, pandas, matplotlib, scikit-learn, ...) already installed. Below, we run a couple of import statements to check that these packages have been installed successfully.

In [5]:
import numpy # importing numpy package

In [6]:
import pandas

In the cells above, we imported two packages/libraries: numpy and pandas. A package must always be imported before its functions/methods can be used. There are two types of imports: general import and selective import.

### Section 0.1: General Import
A general import makes all functionality of a package available to you. As we did above, we use **import x** to import package x. More frequently, we use **import xyz as x** to import package xyz and refer to it by the abbreviation "x".

In [7]:
import numpy as np
import pandas as pd

### Section 0.2: Selective Import
Instead of loading everything in a package, we can make the import more selective. For example, we may need only a specific part of a package (e.g., a specific function, a piece of data, a variable, ...). In this case, we use **from A import B**, where B is a component of A.

In [8]:
from sklearn.datasets import load_iris
from matplotlib import pyplot as plt
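
The difference between the two import styles can be seen side by side in a small sketch. It uses the standard-library `math` module (an assumption made here so that the example runs without any extra installs; the notebook itself imports numpy, pandas, sklearn, and matplotlib), and the variable names are illustrative only.

```python
# General import: the whole module is available through its alias.
import math as m

# Selective import: only the named component is brought into scope.
from math import sqrt

general_result = m.sqrt(16)    # accessed through the alias "m"
selective_result = sqrt(16)    # accessed directly, no prefix needed

print(general_result, selective_result)
```

Both calls reach the same function; the import style only changes the name you type to get to it.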

# Section 1: Basic Data Type, Assignment, and Operators

## Section 1.1 Basic data types in Python
 - **int, or integer:** a number without a fractional part, e.g., 1, 2, 100, ...
 - **float, or floating point:** a number that has both an integer and a fractional part, separated by a point, e.g., 3.14, 5.7, ...
 - **str, or string:** a type to represent text. Use single or double quotes to build a string, e.g., 'Texas', 'Lucy', ...
 - **bool, or boolean:** a type to represent logical values. Can ONLY be True or False. *Note: no quotation marks, no abbreviations, and the values are case sensitive.*

We can use the built-in function **type(x)** to obtain the data type for x, if x is a scalar. 



In [14]:
# type(3)
type(True)

bool

In [9]:
type(3),type(3.14), type('Hello World!'), type(True)

(int, float, str, bool)

In [15]:
3>7, type(3>7)

(False, bool)

**Practice:** 

Without running the code, answer the questions.
- What is the output of **type('True')** and **type ('3')**?
- What is the output of **type(3.0)**? 
- What is the output of **type(true)**?


In [20]:
type(True),type(False)

(bool, bool)

In [17]:
type('True'),type ('3')

(str, str)
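
The two remaining practice questions can be checked with a short sketch. The `try`/`except` wrapper is an addition for illustration only, so that the cell finishes instead of stopping at the error.

```python
# A decimal point makes the literal a float, even when the fractional part is 0.
float_type = type(3.0)
print(float_type)

# Lowercase `true` is not a Python keyword, so Python treats it as an
# undefined variable name and raises NameError.
try:
    type(true)
    error_name = None
except NameError as err:
    error_name = type(err).__name__
    print("NameError:", err)
```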

## Section 1.2 Assignment
Python variables do not need an explicit declaration to reserve memory space. The declaration happens automatically when you assign a value to a variable. Use the equal sign (=) to assign values to variables. You can assign a number, a string, or a boolean. Use single or double quotes (i.e., ' ' or " ") for strings.

In [21]:
# Assign an integer:
a = 5
type(a)

int

In [22]:
a+1

6

In [23]:
# Assign a float:
b = 5.2
type(b)

float

In [24]:
# Assign a string (use '')
c = 'John'
type(c)

str

In [25]:
# Assign a Bool
d = False
type(d)

bool

### Multiple Assignments: 
- You can assign a single value to multiple variables simultaneously. 
- You can also assign multiple different values to different variables simultaneously (more formally, "unpacking"). For this kind of assignment, make sure the number of variables on the left matches the number of values on the right.

In [26]:
## Assigning one (same) value to multiple variables.
x = y = z = 1 
x,y,z 

(1, 1, 1)

In [30]:
## Assigning multiple values to multiple variables simultaneously.
x, y, z = 7, 4, "Apple" 
z

'Apple'

In [32]:
x, y,x1 = 7, 4, "Apple" 
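
As a minimal sketch of what happens when the two sides of an unpacking assignment do not match, the cell below tries to unpack three values into two names. All variable names are hypothetical, and the `try`/`except` is added only so the error can be printed without stopping the notebook.

```python
# Matching counts: three names on the left, three values on the right.
first, second, third = 7, 4, "Apple"

# Mismatched counts: two names but three values raises ValueError.
try:
    p, q = 7, 4, "Apple"
    unpack_error = None
except ValueError as err:
    unpack_error = str(err)
    print("ValueError:", err)
```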

## Section 1.3 Operators on Numbers
We can apply operators to numbers (i.e., int and float) for basic calculations. Some operators that may be unfamiliar to you are:
- "**": Exponentiation. (Two multiplication signs)
- "/": Division. The result is always a float.
- "//": Floor division. The result is rounded down to the nearest whole number, so only the integer part is kept for positive numbers.

In [38]:
## Below, do some basic calculations
1+2, 1-2, 2*3, 2**3, 2/3, 2//3

(3, -1, 6, 8, 0.6666666666666666, 0)

In [43]:
## Print the result using the print() function. When printing multiple elements, separate them with commas
print("The output of 4 over 2 (single slash) is:", 4/2, "And the data type is:", type(4/2))
print("The output of 4 over 2 (double slash) is:", 4//2, "And the data type is:", type(4//2))

The output of 4 over 2 (single slash) is: 2.0 And the data type is: <class 'float'>
The output of 4 over 2 (double slash) is: 2 And the data type is: <class 'int'>
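
A few more cases may help show how the double slash behaves. This is a small sketch with illustrative variable names; the remainder operator `%` is not covered in the text above and is included only as a companion to `//`.

```python
positive_floor = 7 // 2     # 3
negative_floor = -7 // 2    # -4: rounded down, not truncated toward zero
float_floor = 7.0 // 2      # 3.0: a float operand gives a float result
remainder = 7 % 2           # 1: what is left over after floor division

print(positive_floor, negative_floor, float_floor, remainder)
```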


# Section 2: Lists

A list contains items separated by commas and enclosed within square brackets (**[ ]**). In a list, you can store different data types (e.g., string, integer, bool, ...). 

In [44]:
My_list = [3, "Texas", True]

### Section 2.1 List Operators
- The plus (+) sign is the list concatenation operator (i.e., combine two lists). 
- The asterisk (*) is the repetition operator.

In [45]:
Add_list = [123]

In [46]:
## Concatenating
My_list + Add_list

[3, 'Texas', True, 123]

In [47]:
Add_list + Add_list

[123, 123]

In [49]:
## Repetition
My_list * 2 + Add_list

[3, 'Texas', True, 3, 'Texas', True, 123]
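
One detail worth checking is that `+` and `*` build a new list and leave the original lists untouched. The sketch below uses hypothetical names (`base`, `extra`) rather than the notebook's variables.

```python
base = [3, "Texas", True]
extra = [123]

combined = base + extra   # new list holding the items of both lists
repeated = extra * 3      # new list repeating the items three times

print(combined)
print(repeated)
print(base)               # the original list is unchanged
```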

### Section 2.2 List Indexing and Slicing
The values stored in a list can be accessed by indexing (**[ ]**) and slicing (**[:]**). 

#### Indexing:
- The length of a list is the number of elements. It can be obtained using built-in function **len(x)**, where x is the list of interest.
- The indices start at **0** at the beginning of the list and work their way to the end. The last element can therefore be obtained through index **len(x) - 1**.
- We can also use negative indices, which refer to positions relative to the end. The last element has index **-1**.


#### Slicing: Use ":"
- **i:j** stands for: from the element at index i (included) up to the element at index j (not included)
- If i is not specified, the slice starts from the first element by default
- If j is not specified, the slice ends at the last element (inclusive) by default


In [54]:
# Get the 1st element, last element, 3rd element from My_list.
print(My_list)
print(len(My_list))
# My_list[1]

[3, 'Texas', True]
3


In [64]:
My_list[0:3:2] # slicing with a step

## Extension: [i:j:s] starts at index i (included), stops before index j (not included),
## and advances the index by s each time

[3, True]

In [57]:
My_list[-1],My_list[2]

(True, True)

In [63]:
# Obtain the 2nd through the last element of My_list
My_list[1:] # j is omitted, so the slice runs through the last element

['Texas', True]
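
The indexing and slicing rules can be tried on a slightly longer list, where the effect of each form is easier to see. The list `letters` is a hypothetical example, not part of the notebook's data.

```python
letters = ["a", "b", "c", "d", "e"]   # hypothetical example list

first_item = letters[0]                  # indices start at 0
last_item = letters[len(letters) - 1]    # same element as letters[-1]
middle = letters[1:3]                    # indices 1 and 2; index 3 is excluded
head = letters[:2]                       # i omitted: start from the first element
tail = letters[2:]                       # j omitted: run through the last element
every_other = letters[::2]               # step of 2 over the whole list

print(first_item, last_item, letters[-1])
print(middle, head, tail, every_other)
```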

# Section 3. Numpy and Numpy Arrays
NumPy is the fundamental package for scientific computing with Python. It is great for vector arithmetic. 

A Python list and a 1D NumPy array are similar in terms of indexing and slicing. Use **np.array()** to convert a list to a NumPy array.

In [65]:
import numpy as np
x = [1,2,3,4,5] # x is a list
y = np.array(x) # y is a numpy array
print(x[1:3], y[1:3])

[2, 3] [2 3]


Compared with regular Python lists, however, some things are different.
- Numpy arrays **cannot** contain elements with different types. If you try to build such an array, some of the elements' types are changed to end up with a homogeneous array.
- The typical arithmetic operators, such as + and *, have a different meaning for regular Python lists and numpy arrays. For numpy arrays, arithmetic operators work arithmetically, either (1) element by element (for two arrays), or (2) on each element of the array (for one array and one scalar).
- You can use **logical values** (True, False) as indices for numpy arrays (known as "logical indexing"). With logical indexing, only the elements whose index value is True are selected. This is frequently used in data cleaning, subsampling, visualization, and so on.

In [66]:
# Example: Numpy array accepts only one type.
x = [1,2,'a']
y = np.array(x)
print(y) 

['1' '2' 'a']
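
To see which single type numpy settled on, one option is to inspect the array's `dtype` attribute. The sketch below assumes numpy is installed (as it is with Anaconda) and uses illustrative variable names.

```python
import numpy as np

mixed_text = np.array([1, 2, "a"])   # numbers and text: everything becomes a string
mixed_float = np.array([1, 2.5])     # int and float: everything becomes a float

print(mixed_text, mixed_text.dtype.kind)   # kind "U" means Unicode string
print(mixed_float, mixed_float.dtype)
```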


In [70]:
# Example: Arithmetic operators
x = np.array([1,2,3,4,5])
y = [1,2,3,4,5]

x + 1, x*2 # array with a scalar: applied to every single element
x > 3

array([False, False, False,  True,  True])

In [74]:
x1 = np.array([1,2,3,4,5])
x  = np.array([1,2,3,4,5])
x + x1, x - x1 # two arrays interact: element by element interaction. Dimensions must be the same

(array([ 2,  4,  6,  8, 10]), array([0, 0, 0, 0, 0]))

In [75]:
x2 = np.array([1,2,3,4])
x + x2

ValueError: operands could not be broadcast together with shapes (5,) (4,) 

In [72]:
# y + 1
y + [1]

[1, 2, 3, 4, 5, 1]
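
The contrast between list operators and array operators can be put in one cell. This is a minimal sketch with hypothetical names; the `try`/`except` is added only so the list error is printed rather than stopping the cell.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array(numbers_list)

list_times_two = numbers_list * 2     # repetition: [1, 2, 3, 1, 2, 3]
array_times_two = numbers_array * 2   # element-wise multiplication: [2 4 6]
array_plus_one = numbers_array + 1    # scalar added to every element: [2 3 4]

# A list cannot be added to a number; + on lists only concatenates lists.
try:
    numbers_list + 1
    list_error = None
except TypeError as err:
    list_error = type(err).__name__
    print("TypeError:", err)

print(list_times_two, array_times_two, array_plus_one)
```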

In [77]:
# Example: logic values
print (x)
# Pick 1, 2, and 5

# original array: [1, 2, 3, 4, 5]
#                [True, True, False, False, True]

my_index = np.array([True, True, False, False, True])

x[my_index]

[1 2 3 4 5]


array([1, 2, 5])

In [81]:
# Logic values cont: print elements greater than 3. 
## Hint: compare numpy array with a number == compare each element in the array with the number. 

my_high_index = x > 3
x[my_high_index]

array([4, 5])
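
Logical indexing also works with more than one condition. The sketch below goes one step beyond the examples above by combining comparisons with numpy's element-wise `&` (and) and `|` (or) operators; each comparison needs its own parentheses. The array `values` is a hypothetical example.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Elements strictly between 1 and 5.
middle_values = values[(values > 1) & (values < 5)]

# Elements equal to 1 or equal to 5.
end_values = values[(values == 1) | (values == 5)]

print(middle_values, end_values)
```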

### 2D Numpy Arrays
Suppose we have the height and weight of 6 baseball players. The height (in inches) of each player is: 72, 78, 69, 71, 76, 79; and the corresponding weight (in pounds) is: 180, 215, 210, 188, 176, 209.

<img src="Table.png">

In [82]:
## Constructing 2d array
list_2d = [[72,78, 69,71,76, 79],[180, 215, 210, 188, 176, 209]] ## row by row
np_2d = np.array(list_2d)
np_2d

array([[ 72,  78,  69,  71,  76,  79],
       [180, 215, 210, 188, 176, 209]])


**Indexing and Slicing**
- Indexing: For a 2-D array x, x[i,j] is the element in row i and column j (both counted from 0)
- Slicing: Use ":"
    - i:j stands for from the element at index i (included) to the element at index j (not included)
    - if i and j are not specified, then all elements along that dimension are extracted

In [86]:
print(np_2d[0:2,0:3])
print(np_2d[:2, :3])
print(np_2d[:,2]) # collect all rows, for column index 2
print(np_2d[0,:])

[[ 72  78  69]
 [180 215 210]]
[[ 72  78  69]
 [180 215 210]]
[ 69 210]
[72 78 69 71 76 79]
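
A few more lookups on the same data may help connect the [row, column] notation to the table. The sketch rebuilds the array under the hypothetical name `players` so that it runs on its own; `shape` is the numpy attribute that reports (rows, columns).

```python
import numpy as np

# Row 0 holds heights (inches), row 1 holds weights (pounds).
players = np.array([[72, 78, 69, 71, 76, 79],
                    [180, 215, 210, 188, 176, 209]])

array_shape = players.shape       # (2, 6): 2 rows, 6 columns
third_weight = players[1, 2]      # row 1, column 2: weight of the third player
second_player = players[:, 1]     # all rows, column 1: height and weight of player 2
last_player = players[:, -1]      # negative indices work here too

print(array_shape, third_weight, second_player, last_player)
```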


### Practice
1. Multiply the players' weights by 0.454 to convert from pounds to kilograms. Store the resulting numpy array as weight_kg.
2. Use height and weight_kg to calculate the BMI of each player and save the resulting numpy array as bmi. BMI can be obtained as follows. (Note: use the approximation 1 meter ≈ 40 inches; the exact value is about 39.37 inches.)

$$\text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}$$

3. Report the weight (in kg) of players with bmi greater than 25.


In [None]:
import numpy as np
weight_kg = np_2d[1]*0.454 # weight in kg
height = np_2d[0]/40       # height in meters
bmi = weight_kg/(height*height) # get bmi
print(bmi)

In [None]:
# now get players with bmi > 25, and their corresponding weights
high = bmi>25    # players with bmi > 25
print(high) 
print(weight_kg[high])  # collect the weight
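
For reference, the whole practice can also be written as one self-contained cell. This is only a sketch of one possible solution: it uses the approximations given in the practice text (0.454 kg per pound, 40 inches per meter), and the names `height_in`, `weight_lb`, and `height_m` are introduced here for illustration.

```python
import numpy as np

height_in = np.array([72, 78, 69, 71, 76, 79])        # heights in inches
weight_lb = np.array([180, 215, 210, 188, 176, 209])  # weights in pounds

weight_kg = weight_lb * 0.454    # pounds to kilograms (approximation from the text)
height_m = height_in / 40        # inches to meters (approximation from the text)
bmi = weight_kg / height_m ** 2  # element-wise BMI for each player

high = bmi > 25                  # boolean array: True where BMI exceeds 25
print(np.round(bmi, 2))
print(high)
print(weight_kg[high])           # weights (kg) of the players with BMI > 25
```

With these approximations, the first four players come out above 25. Because the 40-inch shortcut slightly understates each height in meters, values close to the threshold could change if the exact conversion were used.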