# Workbook 0a: Intro to Python
## Summary
This workbook will cover:
- Arithmetic operators
- Objects
- Data types
    - Integers
    - Floats
    - Strings
    - Booleans
    - Casting
- Data structures
    - Lists
    - Sets
    - Dictionaries
- Programming concepts
    - Loops
    - Functions
- Packages
    - NumPy
    - Pandas
    - OS

## Arithmetic Operators
Python can be used to perform operations like a calculator. These operations include:

| Operator | Definition |
| --- | --- |
| + | addition | 
| - | subtraction |
| * | multiplication |
| / | division |
| // | floor division |
| % | modulo |
| ** | exponentiation |

In [20]:
# operations
# print the result using the print() function

print(15 + 5) # addition
print(25 - 5) # subtraction
print(2 * 10) # multiplication
print(40 / 2) # division
print(41 / 2) # division
print(41 // 2) # floor division: divide, then round down to a whole number
print(41 % 2) # modulo: remainder after dividing
print(2 ** 3) # exponentiation

20
20
20
20.0
20.5
20
1
8
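
Floor division and modulo are two halves of the same division: the quotient and the remainder. The short sketch below checks that relationship and shows that // rounds down, which matters for negative numbers. The names a, b, quotient, and remainder are illustrative and not part of the workbook.

```python
a = 41
b = 2

quotient = a // b    # floor division: 20
remainder = a % b    # modulo: 1

# the quotient and remainder recombine into the original number
print(quotient * b + remainder == a)  # True

# floor division rounds down (toward negative infinity), not toward zero
print(-41 // 2)  # -21
print(-41 % 2)   # 1
```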


## Objects
The results of the operations above were printed but not stored, so they cannot be referenced later. Creating **objects** allows values or other elements (e.g., text, boolean data) to be stored. An object is defined by assigning an element to a case-sensitive name, and the element can be retrieved later by calling that name. 

Suppose we want to measure a desk that is 105 cm X 50 cm X 75 cm (length X width X height). Create 3 objects named length, width, and height. The values are assigned to these names using = . Once defined, the value of a given object can be displayed by typing its name as the last line of a cell or by using print().

*Note*: Many Python tutorials refer to objects as variables. In this workbook a distinction is made between objects and variables because when we start working with data frames we will often use the word variable to refer to the column of a dataset. 

In [1]:
# objects

# define the objects
length = 105
width = 50
height = 75.0

width
print(width)

50
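
As a small illustration of case-sensitive names, the sketch below defines two separate objects whose names differ only in capitalization, then reassigns one of them. The names depth and Depth are hypothetical and chosen only for this example.

```python
depth = 50   # lowercase name
Depth = 60   # a different object: names are case-sensitive

print(depth)  # 50
print(Depth)  # 60

# assigning to an existing name replaces the stored value
depth = depth + 5
print(depth)  # 55
```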


To calculate the surface area of the desktop (i.e., length * width) we can use the objects we've created. We can store the result of this operation as a new object called sqfoot. We can also use this new object, sqfoot, to convert the area from square centimeters to square inches. Because 1 inch is 2.54 cm, 1 square inch is 2.54 ** 2 square centimeters.

In [22]:
# surface area in square centimeters
sqfoot = length * width

print(sqfoot)

# convert cm^2 > in^2 (rounded to 2 decimal places)
print(round(sqfoot / 2.54 ** 2, 2))


5250
813.75


## Data Types
### Numeric data
To look at the data type of an object, use type(). width returns int, short for **integer**, which is one type of **numeric** data. An integer is a whole number, positive or negative, without decimals. height returns float. A **float** is another type of numeric data: a number, positive or negative, with a decimal point.


In [23]:
# numeric data
print(type(width))
print(type(height))

<class 'int'>
<class 'float'>
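
The two numeric types can be mixed in one operation. The sketch below, using the hypothetical names whole and decimal, shows which type Python returns in a few common cases.

```python
whole = 50       # int
decimal = 75.0   # float

print(type(whole + whole))    # <class 'int'>
print(type(whole + decimal))  # <class 'float'>: mixing int and float gives a float
print(type(whole / 2))        # <class 'float'>: / always returns a float
print(type(whole // 2))       # <class 'int'>: // on two integers returns an integer
```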


### Text data
**Text** data are represented as **strings**. Strings can be defined using single or double quotation marks.


In [24]:
# text data
hello = 'hello'
hi = "hi"

print(hello)
print(hi)

print(type(hello))
print(type(hi))

hello
hi
<class 'str'>
<class 'str'>


Multi-line strings can also be defined by enclosing the text in three double quotation marks (three single quotation marks also work).

In [25]:
# multiline string
multiline = """This
is 
a 
multiline
string"""

print(multiline)

print(type(multiline))

This
is 
a 
multiline
string
<class 'str'>
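
Having two quotation styles is useful when the text itself contains a quotation mark. A minimal sketch, with the hypothetical names apostrophe and quotation:

```python
apostrophe = "it's a desk"       # double quotes let the string contain '
quotation = 'she said "hello"'   # single quotes let the string contain "

print(apostrophe)
print(quotation)

# the quotation style does not change the value of the string
print('hi' == "hi")  # True
```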


### Boolean data
Python can also evaluate comparisons and logical expressions, for example, to check if 5 is greater than 7. The result of a comparison is a **boolean**, a data type whose value is either True or False. Unlike strings, booleans are written without quotation marks, and they are case-sensitive: True and False must be capitalized. 

In [26]:
greater1 = 5 > 7
greater2 = False
greater3 = "False"

print(greater1)
print(greater2)
print(greater3)

print(type(greater1))
print(type(greater2))
print(type(greater3)) # note that even though these all look the same upon printing, greater3 is not the same type as greater1 and greater2

False
False
False
<class 'bool'>
<class 'bool'>
<class 'str'>


Other comparison operators that return boolean data include:

| Operator| Definition |
| --- | --- |
| == | Equal to |
| != | Not equal to |
| > | Greater than |
| >= | Greater than or equal to |
| < | Less than |
| <= | Less than or equal to |

There are also logical operators:

| Operator| Definition |
| --- | --- |
| and | True only if both operands are True |
| or | True if at least one of the operands is True |
| not | True if the operand is False |

And membership operators:

| Operator| Definition |
| --- | --- |
| in | True if the element is found in the sequence |
| not in | True if the element is not found in the sequence |

In [46]:
# both comparison and logical operators return boolean data
# we can also combine comparison and logical operators 
print(5 > 2 and 6 > 2)

print(5 < 2 and 6 > 2)

print(5 < 2 or 6 > 2)

True
False
True
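
The membership operators, the not operator, and the equality operators from the tables above can be tried the same way. This is a short sketch; the list fruits is a made-up example.

```python
fruits = ["apple", "banana", "cherry"]  # hypothetical example list

print("apple" in fruits)      # True
print("mango" in fruits)      # False
print("mango" not in fruits)  # True

print(not 5 > 7)   # True: 5 > 7 is False, and not flips it
print(5 == 5.0)    # True: == compares values, so the integer 5 equals the float 5.0
print("5" != 5)    # True: a string is never equal to a number
```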


### Casting
Changing the data type of an object is called **casting**. For example, we assigned 105 to length and, because this is a whole number, it was stored as an integer. It can be cast to a float. The casting functions include:

| Function | Definition |
| --- | --- |
| int(x) | casts x to an integer |
| float(x) | casts x to a float |
| str(x) | casts x to a string |
| bool(x) | casts x to a boolean |

In [27]:
# casting
print(type(length)) # data type before casting

length = float(length) # casting

print(type(length)) # data type after casting

<class 'int'>
<class 'float'>
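
The other casting functions behave in ways that are worth checking once. The sketch below uses literal values only, so it does not change any object defined earlier.

```python
print(int(75.9))      # 75: int() drops the decimal part, it does not round
print(int("105"))     # 105: a string of digits can be cast to an integer
print(float("2.54"))  # 2.54
print(str(105))       # 105 (now a string)

print(bool(0))        # False: zero is False
print(bool(105))      # True: any non-zero number is True
print(bool(""))       # False: an empty string is False
print(bool("False"))  # True: any non-empty string is True, whatever it says
```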


## Data Structures

### Lists
**Lists** are ordered collections of values of any data type, e.g., numeric, text, or boolean. Typically a list contains values of the same data type, but it can also contain a mix of types. Lists are defined by enclosing comma-separated values in []. 

In [28]:
list1 = ["1", "2", "3"]
list2 = [1, 2, 3]
list3 = [True, True, False]
list4 = [True, 2, "3"] # we can have different data types in a single list

print(list1)
print(list2)
print(list3)
print(list4)

print(type(list1))
print(type(list2))
print(type(list3))
print(type(list4))

print(list3[2])

['1', '2', '3']
[1, 2, 3]
[True, True, False]
[True, 2, '3']
<class 'list'>
<class 'list'>
<class 'list'>
<class 'list'>
False


Lists can be **indexed**, which returns the value located at a given position. Python begins indexing at 0 so, to find the first value in list1, we would index using list1[0], which returns the string "1". Python also accepts negative indexes, which count backward from the end of the list. To get the last item in a list, index at -1.

In [31]:
print(list1[0]) # index the first element in a list
print(type(list1[0]))

print(list1[-1]) # index the last element in a list

1
<class 'str'>
3
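
To see how positive and negative indexes line up, the sketch below uses a hypothetical list called letters. It also shows that an index can be used to replace a value.

```python
letters = ["a", "b", "c", "d"]

print(len(letters))                # 4: the number of items in the list
print(letters[0])                  # a: first item
print(letters[len(letters) - 1])   # d: last item, counted from the start
print(letters[-1])                 # d: last item, counted from the end
print(letters[-2])                 # c: second-to-last item

letters[0] = "z"                   # replace the first item
print(letters)                     # ['z', 'b', 'c', 'd']
```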


Lists can be edited with functions that belong to the list, called methods. To use them we write name.function(). These functions include:

| Function | Definition |
| --- | --- |
| append(x) | add x to the end of a list |
| extend([y]) | add y values to the end of a list |
| insert(i, x) | add x at position i in a list |
| remove(x) | remove the first x in a list |
| pop(i) | remove the item in the i position |
| index(x) | return the position of x in a list |
| count(x) | count the number of times x appears in a list |
| sort() | sort a list in place, in ascending order by default |



In [30]:
list2.append(4) # add a single value
print(list2)

list2.extend([5, 6, 7]) # add a list of values
print(list2)

list2.insert(0, 4) # in position 0, add a 4
print(list2)
print(list2.count(4)) # how many 4's are there in the list?

list2.remove(4) # remove the first 4 in the list
print(list2)

[1, 2, 3, 4]
[1, 2, 3, 4, 5, 6, 7]
[4, 1, 2, 3, 4, 5, 6, 7]
2
[1, 2, 3, 4, 5, 6, 7]
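
The remaining functions from the table (index, pop, and sort) can be sketched the same way. The list scores is a hypothetical example, kept separate from list2.

```python
scores = [30, 10, 20, 10]

print(scores.index(10))   # 1: position of the first 10

removed = scores.pop(0)   # remove the item in position 0 and return it
print(removed)            # 30
print(scores)             # [10, 20, 10]

scores.sort()             # sorts the list itself, ascending by default
print(scores)             # [10, 10, 20]

scores.sort(reverse=True) # descending order
print(scores)             # [20, 10, 10]
```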


### Sets
**Sets** are similar to lists; they also contain collections of values. The difference is that sets cannot contain duplicate data: each value in a set is unique. Sets are also unordered, so they cannot be indexed by position. Sets are defined by enclosing comma-separated values in {}. Like lists, sets can accommodate values of the same data type or mixed data types.

In [36]:
ids = {100, 101, 102, 103, 104, 104, 104} # note when we print this set, only one 104 is returned

print(ids)
print(type(ids))

{100, 101, 102, 103, 104}
<class 'set'>
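
Because sets drop duplicates, casting a list to a set with set() is a quick way to find its unique values. A minimal sketch, with the hypothetical names responses and unique_responses:

```python
responses = ["yes", "no", "yes", "yes", "no"]

unique_responses = set(responses)   # cast the list to a set

print(len(responses))             # 5: the list keeps every value
print(len(unique_responses))      # 2: the set keeps one "yes" and one "no"
print("yes" in unique_responses)  # True
```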


Sets can be edited with their own functions, which are used as name.function(). These functions include:

| Function | Definition |
| --- | --- |
| add(x) | add x to a set |
| update(y) | add each value in y to a set |
| remove(x) | remove x from a set (raises an error if x is not in the set) |

In [42]:
ids.add(105)
print(ids)

ids.update([106, 107, 108])
print(ids)

ids.remove(108)
print(ids)


{100, 101, 102, 103, 104, 105}
{100, 101, 102, 103, 104, 105, 106, 107, 108}
{100, 101, 102, 103, 104, 105, 106, 107}


### Dictionaries
**Dictionaries** are collections of key-value pairs. Each key defined in a dictionary maps onto a value, much like a word maps onto its definition. When a dictionary is indexed with a key, Python returns the value stored for that key. The example below creates a dictionary defining the numbers 1 - 5 as either even or odd. Therefore, we can look up a number in our dictionary and check if it is defined as even or odd.

In [34]:
odd_even = {1 : "odd", 2 : "even", 3 : "odd", 4 : "even", 5 : "odd"} # create dictionary
print(odd_even)

print(odd_even[1]) # check if a value is even or odd

# expand the dictionary
odd_even[6] = "even"

print(odd_even)

{1: 'odd', 2: 'even', 3: 'odd', 4: 'even', 5: 'odd'}
odd
{1: 'odd', 2: 'even', 3: 'odd', 4: 'even', 5: 'odd', 6: 'even'}
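
A few other common ways to query a dictionary are sketched below. The dictionary capitals is a made-up example; get() is a dictionary function that returns None (or a fallback you supply) instead of raising an error when a key is missing.

```python
capitals = {"France": "Paris", "Japan": "Tokyo"}  # hypothetical example

print("France" in capitals)   # True: in checks the keys
print("Paris" in capitals)    # False: values are not searched

print(capitals.get("Japan"))            # Tokyo
print(capitals.get("Peru"))             # None: the key is missing
print(capitals.get("Peru", "unknown"))  # unknown: fallback value

print(list(capitals.keys()))    # ['France', 'Japan']
print(list(capitals.values()))  # ['Paris', 'Tokyo']
```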


## Programming Concepts
### Loops
**Loops** are for repeating code. For example, if you need to calculate a sum for multiple objects, you can loop through a list of the objects and calculate the sum for each. There are **for loops** and **while loops**, which are distinguished by how the loops terminate. 

For loops are used when you know how many times you'd like the loop to iterate. These loops run once for each item in a sequence, such as a list. 

While loops are used when you do not know how many times you'd like the loop to iterate. These loops will iterate while a condition is met and will terminate when the condition is unmet.

In [None]:
# calculate the sum of two objects
# create the objects
x = [21, 12, 35, 54, 25, 106, 17]
y = [102, 111, 80, 93, 103, 99, 131]

# manually print the sum of x and y
print(sum(x))
print(sum(y))

xy_list = [x, y]

# for loop
for i in xy_list:
    print(sum(i))

In [None]:
# print the numbers 0 through 5
# while loop
i = 0
while i < 6:
    print(i)
    i = i + 1
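
The difference between the two kinds of loop can be sketched with two small tasks. The first uses the built-in range() function to repeat a fixed number of times; the second repeats until a condition stops being true, so the number of iterations is not written anywhere in the code. The names running_total, value, and steps are illustrative.

```python
# for loop: range(1, 6) produces 1, 2, 3, 4, 5 (the stop value 6 is excluded)
running_total = 0
for number in range(1, 6):
    running_total = running_total + number
print(running_total)  # 15

# while loop: keep halving a value until it drops below 1
value = 100
steps = 0
while value >= 1:
    value = value / 2
    steps = steps + 1
print(steps)  # 7
```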

### Functions
Functions are blocks of reusable code that perform a task. There are **built-in functions** that come with Python, such as print(), which we have been using in this workbook. Python users can also create their own functions, i.e., **user-defined functions**. For example, a user-defined function can be created to add any two numbers.

Functions have inputs and outputs. Inputs are the information a function needs to run, while outputs are what the function produces when it is run. 

In [None]:
# user-defined function to add 2 numbers
# def begins the function definition
# addition is the name of the function, just as print is the name of the print() function
# x and y are the inputs that need to be supplied to run the function
# total is a new object that stores x plus y (naming it sum would hide the built-in sum() inside the function)
# return sends the output back when the function is run
def addition(x, y): 
    total = x + y
    return total

# test the function
addition(6, 7)

# the function needs both x & y inputs or it cannot run
#addition(6) # <- if we uncomment and run addition(6) the error returned is that y is missing

When creating our addition function, our inputs (x & y) are positional arguments, meaning the order they're supplied in is important. When we supply the inputs 6, 7, Python automatically assigns 6 to the x position and 7 to the y position, which matches the order in which we placed x and y when we defined our function. 

Let's create a new function to return the value in the y position. The goal is to see how the output changes when we supply our values in different orders. 

In [None]:
# create another function called return_y
# this function outputs the value supplied as y 
def return_y(x, y):
    return y

print(return_y(6, 7)) # 7 is in the y position

print(return_y(7, 6)) # 7 is in the x position

To supply the inputs in a different order than the one in the function definition, assign each input by name. Inputs supplied this way are called keyword arguments.

In [None]:
# assign each input by name
print(return_y(x = 6, y = 7))

# now inputs can be supplied out of order
print(return_y(y = 7, x = 6))
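
Naming inputs is most helpful when a function takes several of them. As a sketch, the hypothetical function desk_volume below multiplies the three desk measurements from earlier in the workbook; all three calls give the same result.

```python
def desk_volume(length, width, height):
    """Return the volume of a box with the given dimensions."""
    return length * width * height

# inputs supplied by position, in the order of the definition
print(desk_volume(105, 50, 75))                      # 393750

# inputs supplied by name, in any order
print(desk_volume(height=75, length=105, width=50))  # 393750

# positional inputs first, then named inputs
print(desk_volume(105, height=75, width=50))         # 393750
```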

To learn more about a function you can use help().

In [None]:
help(sum)

## Packages
**Packages** are collections of functions, objects, etc. written by other users that you can download and use. Packages we will use in this course include NumPy and Pandas.

These packages can be installed from the terminal using pip3 install package_name. They are then loaded into Python using import package_name or import package_name as package_abbreviation. While a package only needs to be installed once, it needs to be imported in every new Python session in which you plan to use it. 

In [47]:
# install the packages below in terminal
#pip3 install numpy
#pip3 install pandas

# load packages
import numpy as np
import pandas as pd
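
The two import styles can be tried without installing anything, because Python ships with a standard library of modules. The sketch below uses math and statistics from that library; the abbreviation stats is an arbitrary choice for this example.

```python
import math                 # import package_name
import statistics as stats  # import package_name as package_abbreviation

# functions are called with the package name (or abbreviation) as a prefix
print(math.sqrt(16))               # 4.0
print(stats.median([25, 26, 27]))  # 26
```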

### NumPy
NumPy is a Python package for scientific computing. The complete user guide can be found [here](https://numpy.org/devdocs/user/index.html).

One feature of this package is **arrays**, which are similar to lists. However, unlike a list, an array can only hold a single data type. 

In [51]:
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array(["1", "2", "3", "4", "5", "6"])
z = np.array([1, 2, 3, "4", "5", "6"])

print(x)
print(y)
print(z) # note all the elements in z were coerced to strings

[1 2 3 4 5 6]
['1' '2' '3' '4' '5' '6']
['1' '2' '3' '4' '5' '6']
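
An array's single data type is stored in its dtype attribute. The sketch below inspects dtype.kind, a one-letter code for the general type; the exact dtype name can differ between platforms, so only the kind is shown. The names numbers and mixed are illustrative.

```python
import numpy as np

numbers = np.array([1, 2, 3])
mixed = np.array([1, 2, "3"])   # one string forces every element to a string

print(numbers.dtype.kind)  # i: integer
print(mixed.dtype.kind)    # U: Unicode string
print(mixed[0] == "1")     # True: the integer 1 was stored as the string "1"
```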


Lists and sets are both one-dimensional. NumPy's array() function can also hold two-dimensional data, i.e., **matrices**. 

In [57]:
matrix = np.array([[1, 2], [3, 4], [5, 6]])

print(matrix)

# now we index with 2 numbers, 1 for each dimension (row X column)
print(matrix[2, 1]) # print the value in the 3rd row, 2nd column (remember indexing starts at 0)

[[1 2]
 [3 4]
 [5 6]]
6
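
Beyond single values, a two-dimensional array can return its dimensions, a whole row, or a whole column. A brief sketch, using a hypothetical array named grid with the same values as the matrix above:

```python
import numpy as np

grid = np.array([[1, 2], [3, 4], [5, 6]])

print(grid.shape)     # (3, 2): 3 rows and 2 columns
print(grid[0])        # [1 2]: the first row
print(grid[:, 1])     # [2 4 6]: every row, second column
print(grid[-1, -1])   # 6: negative indexes count from the end, as with lists
```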


### Pandas
Pandas is a Python package for data analysis and manipulation. The complete user guide can be found [here](https://pandas.pydata.org/docs/user_guide/index.html#user-guide).

One feature of this package is **data frames**, which are like a matrix (i.e., two-dimensional array) but with added features like column names and row labels. These column names refer to **variables** in a data set, e.g., Name, Age, MathScore. Each variable contains data of a single type. 

In [63]:
data = pd.DataFrame(
    {
        "Name": ["Person 1", "Person 2", "Person 3"],
        "Age": [25, 26, 27],
        "MathScore": [10.2, 12.4, 15],
    }
)

print(data)

# we can index using variable names
print(data["Name"])

       Name  Age  MathScore
0  Person 1   25       10.2
1  Person 2   26       12.4
2  Person 3   27       15.0
0    Person 1
1    Person 2
2    Person 3
Name: Name, dtype: object
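
To check the data type of each variable, and to summarize one, a data frame offers the dtypes attribute and column functions such as mean(). The sketch below rebuilds the same small data set under the hypothetical name students so that it runs on its own. The printed dtype names can vary with the Pandas version, so no output is shown for that line.

```python
import pandas as pd

students = pd.DataFrame(
    {
        "Name": ["Person 1", "Person 2", "Person 3"],
        "Age": [25, 26, 27],
        "MathScore": [10.2, 12.4, 15],
    }
)

print(students.shape)          # (3, 3): 3 rows and 3 variables
print(students.dtypes)         # the data type of each variable
print(students["Age"].mean())  # 26.0: the average of one variable
```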


In [60]:
# we can also take the matrix we made earlier and convert it to a data frame
data2 = pd.DataFrame(matrix)

print(data2)

   0  1
0  1  2
1  3  4
2  5  6
