# What is this about?
This notebook is the basis of the Week Zero Python bootcamp for all students participating in **OIT248**. 

To start, try running the code cell in the **Comments** section below. Specifically:

 1. Because the cell is "hidden", let's first open it. You can do this either by clicking the " > " immediately to the left of the section title (**Comments**) or by clicking the grey bar with the text _"1 cell hidden"_  

 2. Let's now run the code inside the cell. You can do this by:
    
  (i) hovering with your mouse on top of the code cell and clicking the little "Play button" that will appear on the left or <br>
  (ii) clicking inside the code cell and hitting `Shift`+`Enter` on your keyboard.

If everything works properly, you should see a welcome message displayed immediately below the code cell that also includes a calculation of the total number of minutes in a leap year...

# Comments
Comments in Python start with a `#`, as in the cell directly below.

In [None]:
# this line and the line below it are comments and Python will ignore them
# print("Something interesting")

# instead, the line below this will print a welcome message
print("Welcome to OIT 248! The number of minutes in a leap year is:", 366*24*60)

## In a Jupyter notebook, it's easy to add comments and notes, like this one. This particular type of cell is a Markdown cell, which allows us to comment and document our notebooks.
### You can also see the effect of using one or multiple # signs: these create sections, sub-sections, sub-sub-sections, etc.

#### Click the Run button above or hit `Shift+Enter` on your keyboard to finish this cell and move on to the next one.

#### Here are a few reminders as we get started using Python through Jupyter:
* Don't forget to save your work by hitting the save button above (far left) periodically
* We will mostly be using "Markdown" and "Code" cells - you can switch between them with the pulldown menu above
* If you want to clear all of the output and re-run a notebook, go to the "Kernel" menu above, and select "Restart and clear output"
* We will mostly just be using the Run button today, but some of the other buttons will be useful to you as we progress: Stop, Copy, Paste, etc.

<font color=blue>**NOTE**: Throughout this class, we will not always code something in the most efficient way possible, and there are almost always multiple ways of achieving the same thing. We'll try to suggest a combination of best-practice and intuitive/simple code, but don't hesitate to ask us to check your work if you decided to take a different approach.</font>

In [None]:
# This is a code cell - if you want to add a comment in a code cell, just put a pound sign in front of it
# Let's do some simple calculations
6*9 + 41

In [None]:
2**10

In [None]:
600/34 - 5

# Variables
Let's create some simple variables. The equals sign `=` **assigns** a value to the variable.
We will also use a double equals sign `==` later, which **tests for equality**.
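
As a small illustration of the difference between the two, here is a minimal sketch. The variable names `count`, `is_three`, and `is_four` are hypothetical and are not used elsewhere in this notebook.

```python
# `=` assigns: after this line, the variable count holds the value 3
count = 3

# `==` tests for equality and produces a boolean (True or False)
is_three = (count == 3)
is_four = (count == 4)

print(is_three)   # True, because count is equal to 3
print(is_four)    # False, because count is not equal to 4
```

Note that `count == 4` does not change `count`; it only asks a question about its current value.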

In [None]:
# Assign the value 10 to the variable NumStudents
NumStudents = 10

In [None]:
# IMPORTANT NOTE: Variable names are case-sensitive
# This works:
NumStudents

In [None]:
# This will give you an error, because the n is not capitalized:
numStudents

<font color=blue>**Remark.** As you are executing code cells, note how the number next to each input code cell and corresponding output is increasing: your first cell had `In [1]` and `Out[1]`, then `In[2]` and `Out[2]`, and so on. That counter keeps track of the latest instruction that was run. In a Jupyter Notebook, you can actually go back and run older segments of code or even jump ahead and skip some code, so the progression is not sequential. To see this, try running the first line of code again! Assuming that you did not run something twice so far, you should see the counter changing to `In[7]` and `Out[7]`. Being able to jump back and forth is useful because it allows you to play/test different things, without having to run the entire code linearly from top to bottom. **But you have to be careful, because running an older instruction or skipping code could result in unintended things.** For instance, consider the piece of code below: </font>

In [None]:
NumStudents = NumStudents + 1

NumStudents

<font color=blue>This increases the value of the `NumStudents` variable by one. When you run it for the first time, it should print 11 (because that variable was initially assigned a value of 10 in `In[4]`). But if you run this code again, it will print 12, and then 13, etc. </font><br>
    
<font color=blue>**Useful tip**: sometimes, it is helpful to re-run all the instructions in the notebook from top to bottom up to some point. (For instance, suppose we want to re-run everything from the top to this section of the notebook). To do so:<br></font>
  1. <font color=blue> restart the Kernel: either from **"Kernel > Restart"** or by clicking the button with the little clockwise turning arrow that says "restart the kernel (with dialog)"</font>
  2. <font color=blue> select the cell immediately below the point where you want the code to stop </font>
  3. <font color=blue> from the **"Cell"** menu, select **"Run All Above"**</font>

<font color=blue> Feel free to try this right now! Select this cell of text, and execute the steps above! You will see that the notebook will re-run, but the counter will not make it all the way here! The reason is the (intentional) error in `In[6]`: **the execution thread will stop when it encounters an error!** So you will need to run everything after that manually.</font>

Note that above, we printed the value of a variable by simply typing its name, i.e., `NumStudents`. That works fine if it's the last instruction in the code. But if you have some other instructions (like new variable assignments) following that, the printing would not work. For instance, consider this piece of code:

In [None]:
NumStudents

aux = 5

As you can see, the code above is not displaying the value of `NumStudents` anymore! In this case, to display the value, use the function `print`, like below:

In [None]:
print(NumStudents)
aux = 5

Above, `print` is a **function** that simply displays its given **argument** (i.e., the variable `NumStudents`). We will see many more functions as we go along. Note that the arguments to a function are always surrounded by parentheses `(...)`.

Now let's set the number of students back to 10, because that's useful later!

In [None]:
NumStudents = 10
print(NumStudents)

# Data Types
The main kinds of data that we'll be using in OIT 248 are `int` (integers), `float` (floating point values), `str` (strings), `bool` (booleans), and some more advanced Python-specific data types (lists and dictionaries, which we'll discuss a bit later).
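
If you are ever unsure which data type a value has, Python's built-in `type` function will tell you. The sketch below uses hypothetical variable names purely for illustration.

```python
# type(.) returns the data type of its argument
int_type = type(10)         # an integer
float_type = type(4.5)      # a floating point number
str_type = type("hello")    # a string
bool_type = type(True)      # a boolean

print(int_type)
print(float_type)
print(str_type)
print(bool_type)
```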

In [None]:
# integers (int)
x = 10

# floating point numbers (float)
z = 4.5

# boolean (bool)
is_OIT248_amazing = True

You can create strings either using single quotation `'` or using double-quotation `"`. Each of these is useful depending on the case: you should use double quotes if there is an apostrophe in the string and single quotes if there is a double-quote in the string...

In [None]:
# strings (str) can be created with either '' or ""
one_string = "Why does Arbuckle not have salmon today?"
another_string = 'I LOVE THE GSB!'

# if a string has an apostrophe in it, the simplest option is to use " when creating it
a_string_with_apostrophe = "Jack's IPhone"
print(a_string_with_apostrophe)

# similarly, if a string has a double quotation mark in it, the simplest option is to use ' when creating it
a_string_with_quotation = 'An example of string is "Jack"'
print(a_string_with_quotation)

# Printing
We already saw the `print` function, which allows printing variables. Let's dive a bit deeper and print some more complex messages. 

In [None]:
print("We have 10 students in our class.")
# Better approach:
print("We have %d students in our class." % NumStudents)

The `%` symbol is used to insert the value of a variable into a string when printing. It may look familiar to those of you with a C/C++ or MATLAB background.

The letter afterwards specifies the type of data: %d (for integers), %s (for strings), %f (for "floating point" values)
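
To see the three specifiers side by side, here is a short sketch. The variable names and values are made up for illustration. When a string has several placeholders, the values are supplied in parentheses after the `%`, in the same order as the placeholders.

```python
course = "OIT 248"      # a string   -> %s
num_sessions = 18       # an integer -> %d (hypothetical value)
avg_score = 91.456      # a float    -> %f (hypothetical value)

# %.1f means "a float shown with 1 digit after the decimal point"
message = "The course %s has %d sessions and an average score of %.1f" % (course, num_sessions, avg_score)
print(message)
```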

<font color=blue>__Remark.__ Although the % approach works, a better way to print in Python is using **`f-strings`**. Here is how the code immediately above would look with an f-string:</font>

In [None]:
# print the same thing with f-strings
print(f"We have {NumStudents:d} students in our class.")

With the f-string approach, you can print many types of variables and combine text and variables. 

For instance, let's print several values with different formatting.

In [None]:
name = "Linwei"        # a string
age = 32               # an integer
gpa = 3.92             # a float
income = 245894.242    # a large float

print(f"The person named {name:<10s} with age {age} and GPA {gpa:.2f} has income of {income:,.3f}")
print(f"The person named {'Jonathan':<10s} with age {age*2} and GPA {gpa:.2f} has income of {income*3:,.3f}")

# If-Else Statements
You can implement "if-else statements" in Python using the following syntax:
> if `logical_condition_1`:<br>
> $ \qquad$ first instruction if logical_condition_1 is True<br>
> $ \qquad$ second instruction if logical_condition_1 is True<br>
> elif `logical_condition_2`:<br>
> $ \qquad$ instructions if logical_condition_1 is False and logical_condition_2 is True <br>
> ...<br>
> else:<br>
> $ \qquad$ instructions if all logical conditions above are False

Some examples:

In [None]:
a = 15
b = 17
if (a > b):
    print("a is bigger than b")
elif (a == b):
    print("a is equal to b")
else:
    print("a is smaller than b")

<font color=blue>**A few critical things to note in the code for `if-else` above:** </font><br>
 <font color=blue>1. the colon `:` is critical on the first line. </font><br> 
 <font color=blue>2. the indentation on the second line (and for any instructions in that block) is critical; you can use as many spaces as you like or use tabs, as long as you are consistent within the block </font>

The colon and indentation are how Python knows what to do. The colon indicates the start of the instructions corresponding to the case when the `if` statement is true, and all following indented lines will be considered part of that code block. As soon as you unindent a line, it is no longer part of that block.

For instance, consider the following code:

In [None]:
# an if statement to show the effect of indentation
if a > b:
    print("I am only printing this if a is bigger than b")
print("I am printing this regardless of a and b")

Also, forgetting the colon or the indentation can result in errors:

In [None]:
# forgetting the colon :
if (a > b)
    print("a is bigger than b")

In [None]:
# forgetting the indentation :
if (a > b):
print("a is bigger than b")

The logical conditions that appear in the `if-else` statement are constructed with comparison operators like `<=`, `>=`, `<`, `>`, `==`, `!=` and 
boolean operators `and`, `or`, `not` that allow you to combine conditions. We'll practice with those soon...
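
As a quick preview, here is a minimal sketch that combines comparisons with `and`, `or`, and `not`. The variable names and values are hypothetical.

```python
temperature = 72      # hypothetical value
is_raining = False    # hypothetical value

# `and` is True only if every condition is True; `not` flips a boolean
nice_weather = (temperature >= 65) and (temperature <= 85) and (not is_raining)

# `or` is True if at least one condition is True
stay_inside = (temperature < 40) or is_raining

print(nice_weather)
print(stay_inside)

if nice_weather:
    print("Let's have class outside!")
```

The parentheses around each comparison are optional here, but they make the grouping easier to read.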

# For Loops
A `for` loop is used to iterate over a sequence. Syntax:
> for `variable` in `sequence`:<br>
> $ \qquad$ instructions line 1<br>
> $ \qquad$ instructions line 2

<font color=blue>**Just like with `if` statements, the colon `:` and indentation are critical in a `for` loop.**</font>

For now, we don't have very nice examples, but they are coming up shortly...

# Ranges
The `range` function in Python returns a sequence of numbers. Its syntax is:
>  `range(start,stop,step)`

- `start` is optional and is an **integer** that specifies at which position to start. If you omit this, the default value is 0.
- `stop` is required and is an **integer** that specifies at which position to end.  This value will not be included, so the range will actually end with the value `stop-1`.
- `step` is an optional **integer** and specifies the increment. If you omit it, the default is 1.
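
The three arguments can be sketched as follows. Wrapping each range in `list(.)` is just a convenient way to see all of its values at once, and the variable names are hypothetical.

```python
first_five = list(range(5))         # start omitted, so it defaults to 0
five_to_ten = list(range(5, 11))    # stop = 11 is NOT included
evens = list(range(0, 10, 2))       # step = 2 skips every other number
countdown = list(range(5, 0, -1))   # a negative step counts down; stop = 0 is NOT included

print(first_five)    # [0, 1, 2, 3, 4]
print(five_to_ten)   # [5, 6, 7, 8, 9, 10]
print(evens)         # [0, 2, 4, 6, 8]
print(countdown)     # [5, 4, 3, 2, 1]
```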

This function will be very useful, so please familiarize yourself with the examples below.

Create a range with a single argument.

In [None]:
range(5)

By itself, this is not terribly useful. But we can loop through it using a `for` statement.

In [None]:
for i in range(5):
    print(i)

Now to play with the other versions of `range`

In [None]:
print("Here are all the integers from 5 to 10, INCLUDING 10")
for i in range(5,11):
     print(i)

print("\nAnd here are all the even numbers less than 10")
for i in range(0,10,2):
     print(i)

# Lists

## Fundamentals
Lists are used to store multiple items in a single variable. Lists can be created using square brackets `[.]`.

Let's create a list of student names...

In [None]:
# List of 10 student names
StudentName = ["Ann", "Bob", "Carl", "Dan", "Eva", "Fiona", "Gabe", "Hal", "Irene", "Jack"]

# print it out
print(StudentName)

The square brackets and commas are what define this as a list. 

<font color=blue>__Remark.__ **In Python, a list can contain many kinds of data.** This might be surprising to someone with coding background in other languages... Here's an example of a list that contains a string, a float, another list, and some numbers. </font>

In [None]:
# crazy list containing a string, a float, another list (!), and several 5s
crazy_list = ["a string", 4.75, ['Federer', 'Nadal', 'Djokovic'], 5, 5, 5]
print(crazy_list)

You can check the **length** of the list (i.e., how many elements are in the list) using the `len` function

In [None]:
# Check the length of the list
len(StudentName)

The elements in the list are ordered and you can recover elements at specific locations in the list using the bracket indexing. 

Python uses "zero-indexing", which means that the first element has index 0 and the last one has index `len(list)-1`.

Let's print out the first element in the list:

In [None]:
StudentName[0]

You can also access several contiguous items in the list using a **range** of indices specified with `:`

In [None]:
StudentName[1:3]

<font color=blue>**Important!** The syntax `1:3` works just like `range(1,3)`, so it creates the values `1,2`. In other words, starting at the first number and up to but __not__ including the last number.</font> 

In [None]:
# This one will print Ann, Bob, and Carl
StudentName[0:3]

In [None]:
# If you want to go all the way to the end of the list, just leave out the last index
StudentName[6:]

To change an element at a specific location of a list, you can just use the `=` sign with the right indexing. 

For instance, let's print the list and then change the element in the second position and print it again to see the results:

In [None]:
print(StudentName)
StudentName[1] = "Bobby"
print(StudentName)

<font color=red>**You should be careful with the _assignment_ operator `=` for a list!**<br></font>
<font color=red>The assignment operator will **NOT** create a copy of a list; rather, it will create a new/alternative name for the list</font>

In [None]:
# are we creating a copy of `StudentName` stored in the new variable `new_list`?
new_list = StudentName

# print both lists
print("Here are the two lists")
print(StudentName)
print(new_list)

# let's change the first element in this "new" list
new_list[0] = "Georgia"

# print both lists
print("Here are the two lists after changing")
print(StudentName)
print(new_list)
# Note how BOTH lists are changing (because the new list just points to the old one)

If you want to create a genuine **copy** of a list, you can use the `copy` method. Details under the list methods section.
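
As a brief preview, here is a minimal sketch contrasting plain assignment with `copy`. It uses a small hypothetical list so that it does not modify `StudentName`.

```python
original = ["Ann", "Bob", "Carl"]   # hypothetical list

alias = original                # just another name for the SAME list
real_copy = original.copy()     # a new, independent list with the same elements

# changing the alias also changes the original...
alias[0] = "Georgia"

# ...but the copy is unaffected
print(original)    # ['Georgia', 'Bob', 'Carl']
print(real_copy)   # ['Ann', 'Bob', 'Carl']
```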

## Looping through lists
There are several ways to loop through lists

**Option 1.**<br>
If you just care about the elements in the list **but not** their indices/locations, you can use a `for` loop through the elements themselves

In [None]:
# loop through the elements in `crazy_list` and store them in 'v'
for v in crazy_list:
    # `v` now stores an element from the list; let's print `v`
    print(v)

<font color=blue>**Remark**. The `for` loop above may seem surprising if you have a background in other programming languages. Note that we are actually **looping through the elements of the list directly**, without any need to index them numerically. This kind of flexibility is what makes Python powerful and we encourage you to get used to it quickly! </font>

Instead, you can loop using a numeric index. To loop through **all** the elements in the list, you need to create the numeric indices 0, 1, 2, ..., len(list)-1. You can do that using `range`, as follows.

**Option 2.**<br>
If you need the elements in the list **as well as** their indices, you can write a "classic" for loop. Specifically, for a list, we actually know what the indices are: they are 0, 1, 2, ..., number of elements-1. So we can get these using the `range(.)` and `len(.)` functions:

In [None]:
# calculate the number of elements in the list
num_elements_in_list = len(crazy_list)

# produce the range 0 .. num_elements_in_list - 1
indices = range(num_elements_in_list)

# and now let's loop through the elements, printing them as well as their index
for i in indices:
    print("At location", i, "we can find:", crazy_list[i])

Normally, you would not define all of those variables above and instead use this compact form:

In [None]:
# let's loop through the elements, printing them as well as their index
for i in range(len(crazy_list)):
    print("At location", i, "we can find:", crazy_list[i])

**Option 3.**<br>
The cleanest option that gives you access to both the index of the elements and the values themselves is to use the **enumerate** function:

In [None]:
for idx, val in enumerate(crazy_list):
    print(idx, val)

## Basic Operations

To calculate the length of a list or the minimum or maximum values in the list, use the `len(.)`, `min(.)` and `max(.)` functions.

In [None]:
# create a list of numbers
list_of_numbers = [3, 6, 9, 1, -5, 34, 23, 99]

# print the length
print(len(list_of_numbers))

# print the smallest value
print(min(list_of_numbers))

# print the largest value
print(max(list_of_numbers))

Now let's give each of the students an ID number.

In [None]:
# Give each of the students an ID number from 1000 up
StudentID = list(range(1000,1010))

In [None]:
# What did we just create?
print(StudentID)

Let's say we also wanted to create a variable for which section each student is in. 

In [None]:
# If all 10 were in the same section:
Section = ['Section 1'] * len(StudentName)

In [None]:
Section

In [None]:
# If the first 5 were in Section 1, and the next 5 were in Section 2
Section = ['Section 1'] * 5 + ['Section 2'] * 5

In [None]:
Section

<font color=blue>**The examples above show how to use `+` and `*` with a list.** If you want to create a list with repetitions of the same value, you can use `*` **applied to a list** containing just the one value. The operator `+` can be applied to lists: it will just concatenate the list elements.</font>

Note that you need to make sure you operate with **lists**. To understand that, note that adding a string to the list of names creates an error:

In [None]:
# adding a string to the list of names:
StudentName = StudentName + "Joe"

But adding a **list** with the string "Joe" works:

In [None]:
# adding a list that contains the string to the list of names:
StudentName = StudentName + ["Joe"]
print(StudentName)

Now let's get rid of the last element in the list, because we only want 10 students... You can remove an element by simply reassigning the list with the right indexing:

In [None]:
StudentName = StudentName[0:len(StudentName)-1]
print(StudentName)

Now let's create some salary figures for these 10 students (in thousands of dollars...)

In [None]:
Salaries = [175, 189, 168, 196, 182, 188, 198, 162, 191, 143]

Let's find out who makes the most and who makes the least...

In [None]:
# Maximum value
max(Salaries)

In [None]:
min(Salaries)

What if we wanted to know which students the maximum/minimum salaries belong to? We can use the `index` function.

In [None]:
Salaries.index(198)

In [None]:
Salaries.index(143)

The `index` function simply returns the index (i.e., location) in the list corresponding to a given value.

<font color=blue>Note that the `index` "function" is called a bit differently than what we did so far. We are using the variable name `Salaries` and the dot `.` and then using the function `index`. This kind of function is called a `method`. It's really like any other function, but the key difference is that it "lives" inside the variable that appears before the dot `.` symbol. So when it is called, it is applied to and acts upon that variable. (This has to do with object-oriented programming, but we will not dive into that more deeply because it's not critical for our class...) </font>
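
To make the distinction concrete, here is a small sketch that calls one regular function and a few methods. The variables `scores` and `greeting` are hypothetical; `count` and `upper` are standard methods of lists and strings, respectively.

```python
scores = [88, 92, 75, 92]   # hypothetical list
greeting = "hello"          # hypothetical string

# a regular function: the data is passed in as an argument
length = len(scores)

# methods: called with a dot on the variable they act upon
position = scores.index(75)   # location of the value 75 in the list
how_many = scores.count(92)   # number of times 92 appears in the list
shout = greeting.upper()      # strings have their own methods, too

print(length, position, how_many, shout)
```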

Now let's find out which student has the highest salary...

In [None]:
# Which STUDENT has the max salary?
StudentName[Salaries.index(max(Salaries))]

## List Comprehensions
One of the most frequent operations that we'll deal with is to create one list based on some other information. 

For instance, suppose we want to create a list of the students whose names start with the letter 'G'. We can do that using a `for` loop:

In [None]:
students_with_g = []                              # create an empty list  

for s in StudentName:                             # loop through all the student names
    if s[0] == 'G':                               # check if the string 's' starts with the letter 'G'
        students_with_g = students_with_g + [s]   # add 's' to the list

print(students_with_g)

The most elegant way to do that in Python is using a **list comprehension**. This can make life a lot easier and it is worth getting used to list comprehensions!

In [None]:
# create a list using a list comprehension
students_with_g = [s for s in StudentName if s[0]=='G']

**List comprehensions** offer a very simple way to create a new list based on some existing lists. The syntax is:

> `newlist = [`_expression_ `for` _item_ `in` _iterable_ `if` _condition_ `== True]`

_iterable_ can be another list or a range (more broadly, any iterable type). The return value is a new list, leaving the old list unchanged.
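
For example, here is a minimal sketch where the iterable is a range and the expression does a calculation. The variable names are hypothetical, and `%` here is the remainder operator, so `n % 2 == 0` checks whether `n` is even.

```python
# the condition is optional: without it, every item is used
squares = [n**2 for n in range(6)]

# with a condition, only the items that satisfy it are used
even_squares = [n**2 for n in range(6) if n % 2 == 0]

print(squares)        # [0, 1, 4, 9, 16, 25]
print(even_squares)   # [0, 4, 16]
```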

In [None]:
# let's create a list with some fruits
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# now suppose we want to create a list with all the fruit names ** except apple **
fruits_no_apple = [v for v in fruits if v!= "apple"]
print(fruits_no_apple)

If you want to embed an `if-else` condition, you need to switch the order of the `if` and the `for` loop, as follows:

In [None]:
# a copy of the original list where every occurrence of *apple* is replaced with *walnut*
fruits_apple_walnut = [v if v!= "apple" else "walnut" for v in fruits]
print(fruits_apple_walnut)

## EXERCISE.
**<font color=green>1. Create and then print a list with all the students with salaries of 150 or below.</font>**<br>
**<font color=green>2. Create and then print a list where all the students with salaries of 150 or below have their salaries changed to a value 999.</font>**

# Tuples
Tuples allow storing several items in a single variable. They are defined using round brackets `(.)`.

In [None]:
# define a tuple with a string, a float, and a list
my_tuple = ("apples", 3.14, [1, 2, 3])

print(my_tuple)

You might think that a tuple is quite similar to a list, but the fundamental difference is that tuples are **unchangeable**: once you have created a tuple, you cannot change its contents.<br>

We won't be using tuples directly but many functions in Python return tuples, so you should not be surprised to see them!
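
Both points can be sketched in a few lines. The tuple `point` is hypothetical, `divmod` is a built-in function that returns a tuple, and `try`/`except` is a construct we have not covered: it simply lets the cell keep running after an error.

```python
point = (3, 4)        # hypothetical tuple
x_coord = point[0]    # reading an element works just like with a list

# trying to change an element raises a TypeError
try:
    point[0] = 10
    changed = True
except TypeError as err:
    changed = False
    print("Cannot modify a tuple:", err)

# divmod(17, 5) returns the tuple (quotient, remainder)
quotient, remainder = divmod(17, 5)
print(quotient, remainder)
```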

# Dictionaries
Dictionaries are a widely used data structure in Python. A dictionary is just a set of key/value pairs, and the data is organized so that we can look up the value based on the key. As such, the `keys` have to be unique (but the values can be repeated). For instance, a dictionary could be used to implement a phonebook: you would have the `name` as a key, and the `value` could be the phone number (and other contact information for that person). 

Let's construct a dictionary in our case. Recall that we had 10 student names, 10 ID numbers, and the corresponding salaries: 

In [None]:
print(StudentName)
print(StudentID)
print(Salaries)

What if we want to create a dictionary where we can look up students based on the name and recover the salary?

We could do something like this:

In [None]:
Salary_dict = {
    StudentName[0] : 175,
    StudentName[1] : 189,
    StudentName[2] : 168,
    StudentName[3] : 196,
    StudentName[4] : 182,
    StudentName[5] : 188,
    StudentName[6] : 198,
    StudentName[7] : 162,
    StudentName[8] : 191,
    StudentName[9] : 143
}

The curly braces `{...}` tell Python that we are creating a dictionary, and the way we are doing it above is through (key,value) pairs separated by a colon `:`

Let's print this out for a look:

In [None]:
# print it
print(Salary_dict)

We can now retrieve the salary for a given name quite quickly! Let's print Dan's salary:

In [None]:
# retrieve Dan's salary
Salary_dict["Dan"]

You can also change the value associated with a key with a simple assignment.

In [None]:
# change Dan's salary
Salary_dict["Dan"] = 1000000
print(Salary_dict)

Of course, the way we created the dictionary above was very manual and error prone. 

The best way would be to do it programmatically - something like this:

In [None]:
# A different way to create this dictionary:

# Initialize the dictionary to be empty
Salary_dict2 = {}

# Add the elements with a for loop
for i in range(len(StudentName)):
    Salary_dict2[StudentName[i]] = Salaries[i]

print(Salary_dict2)

There are many other functions in Python that directly create dictionaries from lists of keys and values, or from other data structures. For our purposes, the above should pretty much be enough.
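
For the curious, one such shortcut can be sketched with the built-in functions `zip` (which pairs up the elements of two lists) and `dict`. The lists below are hypothetical stand-ins so that the sketch is self-contained.

```python
names = ["Ann", "Bob", "Carl"]   # hypothetical keys
pay = [175, 189, 168]            # hypothetical values

# zip pairs each name with the value at the same position; dict turns the pairs into a dictionary
pay_by_name = dict(zip(names, pay))
print(pay_by_name)

# the same thing written as a "dictionary comprehension"
pay_by_name_2 = {n: p for n, p in zip(names, pay)}
print(pay_by_name_2)
```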

## EXERCISE.
<font color=green>**Suppose we are worried that two students have the same name... Create a dictionary with keys corresponding to IDs and values corresponding to a list containing the name and the student's salary. Once done, print out the dictionary.**</font>

# Functions
Functions allow organizing the code in blocks that can be called separately (and many times). A function can take several arguments and can return data as a result. In Python, functions are defined using the `def` keyword.

Let's define several functions, with different levels of sophistication.

In [None]:
# a function without any arguments
def hello():
    print("Hello again!")

# test this
hello()

Define another function that adds its two arguments and returns the sum.

In [None]:
def my_smart_addition(a,b):
    return a+b

# this function returns something; let's store and print the result!
result = my_smart_addition(4,7)
print(result)

When calling the function, it is possible to assign the variables using their names. These are called `keyword arguments`.

In [None]:
print(my_smart_addition(a=4, b=7))

# you can even mix keyword arguments with positional arguments, but positional arguments must go first
print(my_smart_addition(4, b=7))

It is possible to return multiple values. For instance, we define a function that returns two values and another function that packs the values in a list and returns the list.

In [None]:
# a function returning the sum and the difference of its arguments
def return_sum_and_difference(a,b):
    a_plus_b = my_smart_addition(a,b)
    a_minus_b = a-b
    return a_plus_b, a_minus_b

# this returns TWO things; we can store these in a single variable, which will be a tuple
result = return_sum_and_difference(2,5)
print(result)

# you can also assign the two values separately
sum_v, diff_v = return_sum_and_difference(2,5)
print(f"The sum is {sum_v} and the difference is {diff_v}")

# we can also have the function return a list
def return_sum_and_difference_as_list(a,b):
    a_plus_b = my_smart_addition(a,b)
    a_minus_b = a-b
    return [a_plus_b, a_minus_b]

# test
print(return_sum_and_difference_as_list(2,5))

Lastly, it is possible to assign default values to some arguments and then omit them from the function call.

In [None]:
# Lastly, define a function whose arguments take default values
def a_function_with_default_arguments(a, b=20):
    return a+b

print(a_function_with_default_arguments(5))

<font color=blue>A few important things to remember about defining functions:</font>
 - <font color=blue> The colon `:` is critical in the syntax (just like with `if` and `for`)</font>
 - <font color=blue> Functions can take arguments</font><br>
   <font color=blue> _for instance, the second function takes as arguments two things a, b. These could be any data type._</font>
 - <font color=blue> The keyword `return` tells the function what value to return </font><br>
   <font color=blue>_for instance, the second function returns the sum of its arguments, a + b_</font>
 - <font color=blue>You can return several values; you should be careful to have the correct number of variables to match these return values!</font><br>
 - <font color=blue>You can use keyword arguments when calling the function.</font>
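
To illustrate the point about matching the number of variables to the number of return values, here is a minimal sketch. The function `min_and_max` is a hypothetical helper, and `try`/`except` is used only so that the error does not stop the cell.

```python
# a hypothetical helper that returns TWO values
def min_and_max(values):
    return min(values), max(values)

# correct: two variables for two return values
lowest, highest = min_and_max([3, 9, 1])
print(lowest, highest)

# incorrect: three variables for two return values raises a ValueError
try:
    a, b, c = min_and_max([3, 9, 1])
    unpack_ok = True
except ValueError as err:
    unpack_ok = False
    print("Unpacking failed:", err)
```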

# Importing modules
A module is essentially a library with lots of functions. By "importing" a module with the `import` statement, you can use all the functions inside it.

For instance, in this class we will use the `pandas` module a lot (for the reasons why, see the separate section covering it!). To import it, you could do any of the following:

> ``import pandas``

This imports the "pandas" module, and allows us to use the functions it contains; to use the function `read_csv()`, we would have to use the syntax ``pandas.read_csv()``.

Because this requires typing the word `pandas` all the time, we can assign it a 'short name' as follows:

> ``import pandas as pd``

This also imports the full "pandas" module, but now we could call the `read_csv` function using ``pd.read_csv()``. Saving a few characters could mean a lot if you're typing this thousands of times :-)

Lastly, there is one more option that we could use:

> ``from pandas import *``

This imports everything in the pandas module and makes it so that we can just refer to the function using ``read_csv()``. This is useful if the functions are specific enough that you don't think the same name might be defined/used elsewhere, but it could be dangerous if you think there might be overlap.
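
The same three styles work for any module. As a sketch, here they are applied to Python's built-in `math` module and its `sqrt` (square root) function. The variable names are hypothetical, and the third style imports just one named function rather than `*`, which is a common and safer variant.

```python
# Style 1: import the module and use its full name as a prefix
import math
root_1 = math.sqrt(16)

# Style 2: import the module under a short name
import math as m
root_2 = m.sqrt(16)

# Style 3: import a specific function so that no prefix is needed
from math import sqrt
root_3 = sqrt(16)

print(root_1, root_2, root_3)
```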

# Pandas module
Pandas is a Python library used for working with data sets. It has very useful functions for analyzing, cleaning, exploring, and manipulating data, and we will be using it a lot throughout our class. (And in case you're wondering, the name is **not** about an animal -- it's actually short for "panel data"!) Our coverage here will be very brief, but for more details check this resource: <a href="https://www.w3schools.com/python/pandas/default.asp">https://www.w3schools.com/python/pandas/default.asp.</a>

First, let's import the pandas library.

In [None]:
# import the module
import pandas as pd

In `pandas`, data is organized and stored as **DataFrames**. You can think of a DataFrame in close analogy with a table in Excel: it is a two-dimensional table that has data on its columns, and each column may have a header/name.

Normally we read DataFrames from files, but here we will just create a DataFrame from a dictionary. 

Let's first create a dictionary with some data.

In [None]:
# create a dictionary that stores some data
dictionary_with_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Juan', 'Chenxi'],
    'Age': [25, 30, 22, 28, 27, 30],
    'Id' : [10001, 10002, 10003, 10004, 10005, 10006],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Guadalajara', 'Singapore']
}

print(dictionary_with_data)

Now, let's turn this into a DataFrame where the `keys` of the dictionary will be the column names and each list of `values` fills the corresponding column, one entry per row.

In [None]:
# create a DataFrame from the dictionary
df = pd.DataFrame(dictionary_with_data)

# display it
display(df)

As you can see, the DataFrame is basically a table with rows and columns. 

The column labels/names here are "Name", "Age", "Id", and "City". 

The rows are labeled with 0, 1, 2, 3, 4, 5, which is simply a unique identifier/label that pandas assigned to each row. This is called an `index` and this `index` is not really a part of our DataFrame, so **the first actual column of data is "Name"**. 

The function `display(.)`, which we used above, is great for visualizing DataFrames. In a large DataFrame, you may want to only display a few rows, which you can do with the `head(.)` method:

In [None]:
# display the first 2 rows
df.head(2)

To find out the number of rows, use `len(.)`

In [None]:
# print the number of rows in the DataFrame
print(len(df))

If you need both the number of rows and the number of columns, use `shape`

In [None]:
df.shape
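
Since `shape` is a plain tuple of the form `(number_of_rows, number_of_columns)`, it can be unpacked into two variables. Here is a minimal sketch; it builds a small stand-alone table under the hypothetical name `small_df` so that it runs on its own:

```python
import pandas as pd

# a small stand-alone table (hypothetical name and data, for illustration only)
small_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
})

# shape is an attribute (no round brackets) holding (rows, columns)
n_rows, n_cols = small_df.shape
print(f"{n_rows} rows and {n_cols} columns")
```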

### Column operations

You can see a specific column using the syntax `df[column_name]`:

In [None]:
# check out the column "Age"
df["Age"]
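
The same square-bracket syntax also accepts a list of column names, and a single column comes with handy summary methods such as `mean()` and `max()`. The sketch below uses a small stand-alone table with the hypothetical name `people`, so the numbers are for illustration only:

```python
import pandas as pd

# a small stand-alone table (hypothetical name and data)
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago'],
})

# a list of column names inside the square brackets returns a smaller DataFrame
name_and_age = people[["Name", "Age"]]
print(name_and_age)

# a single column is a pandas Series, which has methods such as mean() and max()
average_age = people["Age"].mean()
oldest_age = people["Age"].max()
print("Average age:", average_age)
print("Largest age:", oldest_age)
```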

To obtain the names of all the columns, use the attribute `df.columns`.

In [None]:
# get all the column names
df.columns

As you can see, this returns an `Index` object (which we have not discussed), but you can iterate through it with a usual `for` loop or you can transform it into a regular Python List with the function `list`.

In [None]:
# let's iterate through the names of the columns with a for loop
for c in df.columns:
     print(c)

# let's store the columns in a list and display the list
column_names = list(df.columns)
print(column_names)

<font color=red>**Warning!**</font>`df.columns` **is not a method**! If you use round brackets, you will get an error.

In [None]:
# the following would generate an error!
df.columns()
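
If you would like to see the error without interrupting the notebook, one option is to wrap the call in `try`/`except`. This is only a sketch on a tiny stand-alone table (the name `tiny_df` is made up for the example):

```python
import pandas as pd

# a tiny stand-alone table (hypothetical name and data)
tiny_df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

error_message = None
try:
    # columns is an attribute, so "calling" it with () raises a TypeError
    tiny_df.columns()
except TypeError as err:
    error_message = str(err)
    print("Calling columns() failed:", error_message)

# without the round brackets, everything works as expected
print(list(tiny_df.columns))
```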

### Row operations

Each row in a dataframe has a unique identifier or label, which makes it possible to refer to specific elements in the dataframe. 

You can get all the row identifiers using `df.index`.

In [None]:
# get all the row labels
df.index

This also returns an object (a `RangeIndex`), but you can readily loop through it with a `for` or store it as a list.

In [None]:
# let's iterate through the indices of all the rows
for i in df.index:
     print(i)

# let's store the row labels in a list and display the list
row_idx = list(df.index)
print(row_idx)

Sometimes, the `index` used for the DataFrame may not be 0,1,2... 

For instance, let's change the row identifiers. You can do that using a python list with **unique** values, by simply setting the `index`:

In [None]:
# change the index from 0, 1, ... to the (unique) letters 'a', 'b', ...
df.index = ['a', 'b', 'c', 'd', 'e', 'f']

display(df)
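
Another common way to change the row labels is to promote an existing column to be the index with `set_index`, and you can check that the labels are unique with `index.is_unique`. The sketch below assumes a small stand-alone table named `roster` (a hypothetical name) with an `Id` column:

```python
import pandas as pd

# a small stand-alone table (hypothetical name and data)
roster = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Id': [10001, 10002, 10003],
})

# set_index returns a NEW DataFrame whose row labels come from the "Id" column
roster_by_id = roster.set_index("Id")

# check that every row label appears only once
print(roster_by_id.index.is_unique)

# the Id values can now be used as row labels
print(roster_by_id.loc[10002, "Name"])
```

Note that `set_index` does not modify `roster` itself, and that `Id` is no longer a regular column in `roster_by_id`.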

### Retrieve elements
One of the most important operations with a DataFrame is to retrieve an element located at a certain row and column.

If you know the row and column labels, there are two approaches:
 1. use `df[column_label][row_label]`
 2. use `df.loc[row_label, column_label]`

The first time you see option (1) above, it might look a bit confusing because in mathematics, we write $M_{i,j}$ or $M[i,j]$ for row $i$ and column $j$ of a matrix $M$, so that syntax seems to reverse the order of rows and columns. But recall that `df[column_label]` actually returns the entire column named `column_label` (so all the rows). Then, applying `[row_label]` to that simply returns the element at the `[row_label]` location. 

In contrast, option (2) is the (arguably more natural) approach of indexing first with the row and then the column. Let's see them in action:

In [None]:
# let's get the element on row 'c' and column "Name"
print(df["Name"]['c'])       # approach 1
print(df.loc['c',"Name"])    # approach 2

`df.loc` actually allows you to recover several rows and columns (i.e., an entire sub-table) of the DataFrame:

In [None]:
df.loc['c':'e',["Name", "City"]]
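
One detail worth remembering: a label slice in `loc` **includes both endpoints**, unlike the usual Python slicing of lists. A minimal sketch on a stand-alone table with letter labels (the name `letters_df` is made up for this example):

```python
import pandas as pd

# a small stand-alone table with letters as row labels (hypothetical name and data)
letters_df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Juan'],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Guadalajara'],
    },
    index=['a', 'b', 'c', 'd', 'e'],
)

# rows 'b' through 'd' (both included!) and only the "Name" column
sub_table = letters_df.loc['b':'d', ["Name"]]
print(sub_table)
print(list(sub_table.index))
```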

Another option is to index using **entirely numeric indices**, using `iloc`, with syntax:
> `df.iloc[numeric_row_index, numeric_column_index]`

This is similar to what we do in math when we write $M_{i,j}$ or $M[i,j]$ for a matrix $M$.

To not get confused here, remember that **Python uses 0-based indexing** and **the very first column that has the index does not count as a proper column** (so in other words, column 0 is the one immediately to the right of the index!)

In [None]:
# let's retrieve the element in the first row and first column (numeric position 0, 0)
df.iloc[0, 0]
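
`iloc` also accepts slices and negative positions, following the usual Python rules: the end of a slice is **excluded**, and `-1` means "the last one". A short sketch on a stand-alone table (the name `positions_df` is hypothetical):

```python
import pandas as pd

# a small stand-alone table (hypothetical name and data)
positions_df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago'],
    },
    index=['a', 'b', 'c', 'd'],
)

# rows at positions 0 and 1 (position 2 is excluded), first column only
first_two = positions_df.iloc[0:2, 0:1]
print(first_two)

# negative positions count from the end: last row, first column
last_name = positions_df.iloc[-1, 0]
print(last_name)
```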

### Looping
To loop through the elements in a row or a column (or a subsection of the dataframe), you can just use a regular `for` loop

In [None]:
# let's loop through the entire DataFrame on columns and print the name of the column and the contents:
for c in list(df.columns):
    # for every column
    print(f"Column '{c}' contains:")
    for r in list(df.index):
        # for every row
        print(f"In location {r} : {df.loc[r,c]}")
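
Pandas also offers two helper methods for looping: `items()` goes through the DataFrame one column at a time, and `iterrows()` goes through it one row at a time. The sketch below uses a tiny stand-alone table (the names `pairs_df`, `columns_seen`, and `rows_seen` are made up for the example):

```python
import pandas as pd

# a tiny stand-alone table (hypothetical name and data)
pairs_df = pd.DataFrame(
    {'Name': ['Alice', 'Bob'], 'Age': [25, 30]},
    index=['a', 'b'],
)

# items() yields (column name, column contents) pairs
columns_seen = []
for column_name, column_values in pairs_df.items():
    columns_seen.append(column_name)
    print(f"Column '{column_name}' contains: {list(column_values)}")

# iterrows() yields (row label, row contents) pairs
rows_seen = []
for row_label, row in pairs_df.iterrows():
    rows_seen.append((row_label, row["Name"]))
    print(f"Row '{row_label}': {row['Name']} is {row['Age']} years old")
```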

## Reading data files
We will read data files using pandas' `read_csv` or `read_excel` functions. The former reads files with `.csv` (Comma-Separated-Values) extension, whereas the latter reads `.xlsx` (Excel) files.

Let's read in a csv file, and store it as the data frame Grades.

<span style="font-size: 2em;">**NAVIGATE TO THE FOLLOWING SITE TO DOWNLOAD THE FILE:**</span>
<span style="font-size: 2em;">https://tinyurl.com/3ac9rwz5</span>

 

In [None]:
# Read in our csv file and store it as the data frame Grades
Grades = pd.read_csv("Gradebook.csv")

Note the syntax for reading CSV files. Above, we used 
> `df = pd.read_csv(full_file_name)`<br>

where `full_file_name` is the complete filename, including a path if needed.

In [None]:
# Let's take a look at the first 5 rows of our data frame Grades
display(Grades.head(5))

Note that here, Pandas automatically assigned an `index` to the DataFrame that goes from 0 to 49. 

But in this case, a better index would be the `StudentID`, which presumably is unique. We could change the index after reading the file, but we can also tell Pandas, when reading the file, to use a specific column as the index. The syntax is:
> `df = pd.read_csv(full_file_name, index_col=column_number)`<br>

where `column_number` is the **numeric index** of the column to use to construct the index (i.e., the row labels).

Let's try it again, using the `StudentID` (which has column index 0) as our row labels: 

In [None]:
# Read in our csv file with StudentID as index
Grades = pd.read_csv("Gradebook.csv", index_col=0)

# display the first 5 rows
Grades.head(5)
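
The `index_col` argument also accepts the **name** of a column instead of its number. The sketch below shows this without needing any file on disk: it reads a few lines of made-up CSV text through `io.StringIO`. The column names and values here are hypothetical and may not match the real `Gradebook.csv`.

```python
import io
import pandas as pd

# made-up file contents; the real Gradebook.csv may have different columns
csv_text = """StudentID,Name,Midterm
101,Ana,88
102,Ben,92
103,Cal,75
"""

# io.StringIO lets read_csv treat a string as if it were a file
grades_demo = pd.read_csv(io.StringIO(csv_text), index_col="StudentID")
print(grades_demo)

# the StudentID values are now the row labels
print(grades_demo.loc[102, "Midterm"])
```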

If our data was instead in an Excel file (.xlsx), we would just use the ``read_excel`` function instead of the ``read_csv`` function. We'll see an example of this later.

## EXERCISE.
<font color=green>**1. Calculate and print the average grade on the midterm for the entire class.**<br>
<font color=green>**2. Calculate and print the GPA (equally-weighted average of the midterm, homework, and participation) for the following students: Xin, Zeb, Iris.**<br>
<font color=green>**3. Calculate and print the GPA assuming a weight of 35% for the midterm, 45% for the final, and 30% for participation, for all the students in the class.**<br>
<font color=green>**4. Under the weighting in #3, who is the student with the largest GPA in the class?**<br>