<a href="https://colab.research.google.com/github/dan-a-iancu/OIT248/blob/main/Cheat_Sheet_for_OIT248.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What is this about?
This document is intended as a Python “cheat-sheet” for all students participating in **OIT248**. You can reference a specific section using the Table of contents on the left. The document is organic and will grow as we keep adding more Python knowledge to our repertoire. If there is a specific thing that you found useful (e.g., a mnemonic rule, an idea, etc.), please let us know and we would be happy to add it below!



To start, try running the code cell in the **Comments** section below. Specifically:

 1. Because the cell is "hidden", let's first open it. You can do this either by clicking the " > " immediately to the left of the section title (**Comments**) or by clicking the grey bar with the text _"1 cell hidden"_  

 2. Let's now run the code inside the cell. You can do this by:
    
  (i) hovering with your mouse on top of the code cell and clicking the little "Play button" that will appear on the left or <br>
  (ii) clicking inside the code cell and hitting `Shift`+`Enter` on your keyboard.

If everything works properly, you should see the message `Welcome to OIT 248!` displayed immediately below the code cell.

# Comments
Comments in Python start with a `#`, as in the cell directly below.

In [None]:
# this and the line below it are comments and Python will ignore them
# print("Something interesting")

# instead, the line below this will print a welcome message
print("Welcome to OIT 248! The number of minutes in a non-leap year is:", 365*24*60)

# Variables
A **variable** in Python is a container for storing data. The variable is created the moment you first assign a value to it.

In [None]:
# create a variable named 'x' that stores the value 10
x = 10

# another variable named 'y' that stores the string "Mary"
y = "Mary"

Once the variable is created, you can use it in code. For instance, let's calculate $x^2$ <br>

In [None]:
x**2

Variable names are case-sensitive

In [None]:
little_john = 10

# the next line would create an error if you uncomment and run it!
# Little_john

# Data Types (int, float, bool, list, etc.)


Each variable stores a certain type of data. You can read more about data types <a href="https://www.w3schools.com/python/python_datatypes.asp">here</a>. The data types that we'll encounter most often in OIT 248 are: integers, floats, strings, booleans, lists, tuples, dictionaries.

In [None]:
# integers (int)
x = 10

# floating point numbers (float)
z = 4.5

# strings (str) - can be created with either '' or ""
one_string = "Why does Arbuckle not have salmon today?"
another_string = 'I LOVE THE GSB!'

# boolean (bool)
is_OIT248_amazing = True

# list - can be declared in several ways:
my_list = list(("apples","oranges","bananas"))   # a list with strings
another_list = [1,2,3,4]    # a list with integers

# dictionary (dict)
tennis_trophies = {}
tennis_trophies["Federer"] = 20
tennis_trophies["Nadal"] = 22          # at least as of 2023...
tennis_trophies["Djokovic"] = 24       # at least as of 2023...

If you ever need to (unlikely), you can find out the type of a variable with the command `type(.)`

In [None]:
type(z)

If needed, you can convert a data type into another type by casting it as below

In [None]:
int(z)

We will cover the more complex data types (lists, dictionaries) separately.  <br>
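
To make casting a bit more concrete, here is a minimal sketch of a few common conversions. The variable names (`as_int`, `as_float`, etc.) are just made up for this illustration.

```python
# casting between the basic data types
z = 4.5

as_int = int(z)            # float -> int drops the decimal part (it does NOT round)
as_float = float("3.25")   # str -> float
as_str = str(10)           # int -> str
as_bool = bool(0)          # 0 converts to False; any other number converts to True

print(as_int, as_float, as_str, as_bool)
```

Note in particular that `int(.)` simply cuts off the decimals, so `int(4.5)` gives 4.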

# Operators
We have already been using several operators above. The most common ones you will see in OIT 248 are:

## Arithmetic (+,-,*, etc.)

In [None]:
print(2 + 2)       # addition
print(3 - 2)       # subtraction
print(2 * 6)       # multiplication
print(66 / 2)      # division
print(8 % 3)       # modulus
print(2**4)        # exponentiation
print(10 // 3)     # floor division

## Assignment (=, +=, -=, etc.)


In [None]:
x = 5               # simple assignment
x += 3              # this is the same as x = x + 3
x -= 3              # this is the same as x = x - 3
x *= 3              # this is the same as x = x * 3
x /= 3              # this is the same as x = x / 3
x **= 3             # this is the same as x = x**3

## Comparison (==, !=, >=, etc.)

In [None]:
print(5 == 6)        #  a==b checks if a is equal to b
print(5 != 6)        #  a!=b checks if a is different from b
print(5 > 5)         #  a > b checks if a is strictly bigger than b
print(5 < 9)         #  a < b checks if a is strictly less than b
print(5 >= 5)        #  a >= b checks if a is at least as large as b
print(5 <= 6)        #  a <= b checks if a is at most as large as b

## Logical (and, or, not)

In [None]:
print((5 < 6) and (7 < 5))  #  'a and b' returns True if both 'a' and 'b' are true
print((5 < 6) or (7 < 5))   #  'a or b' returns True if at least one of 'a','b' is true
print(not (5 < 6))          #  'not a' reverses 'a'

## Membership

In [None]:
print(5 in [3,4,5])        #  'a in b' returns True if 'a' is contained in 'b'
print(5 not in [3,4,5])    #  'a not in b' returns True if 'a' is not contained in 'b'

# Ranges
The `range` function in Python returns a sequence of numbers. Its syntax is:
>  `range(start,stop,step)`

- `start` is optional and is an integer that specifies at which position to start. If you omit this, the default value is 0.
- `stop` is required and is an integer that specifies at which position to end. This value will not be included, so with the default `step` of 1 the range will actually end with the value `stop-1`.
- `step` is optional and specifies the increment. If you omit it, the default is 1.

This function will be quite useful, so it would be good to familiarize yourself with the examples below.

Create a range with a single argument.

In [None]:
range(5)

By itself, this is not terribly useful. But we can loop through it using a `for` statement (if you're unfamiliar with `for` loops, check the later section)

In [None]:
for i in range(5):
    print(i)

Now to play with the other versions of `range`

In [None]:
print("Here are all the integers from 5 to 10, INCLUDING 10")
for i in range(5,11):
     print(i)

print("\nAnd here are all the even numbers less than 10")
for i in range(0,10,2):
     print(i)
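
If you just want to see which numbers a range contains without writing a loop, one option is to convert it into a list with `list(.)`. The sketch below does this for the ranges used above, plus one with a negative step (the variable names are made up for this illustration).

```python
# wrap each range in list(.) to see the numbers it contains
first_five = list(range(5))               # start defaults to 0, step defaults to 1
five_to_ten = list(range(5, 11))          # stop=11 is not included, so this ends at 10
evens_below_ten = list(range(0, 10, 2))   # step=2 skips every other number
countdown = list(range(10, 0, -3))        # a negative step counts downwards

print(first_five)
print(five_to_ten)
print(evens_below_ten)
print(countdown)
```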

# For Loops
A `for` loop is used to iterate over a sequence (that could be a range, a list, a dictionary, etc.). Syntax:
> for `variable` in `sequence`:<br>
> $ \qquad$ instructions line 1<br>
> $ \qquad$ instructions line 2

In [None]:
# use a for loop to iterate through a numeric range
for i in range(5):
    print(i)

<font color=red>**Two elements are worth emphasizing in the syntax above:**</font>
- the colon `:` is critical on the first line
- the indentation is also critical, and must start on the second line (you can use as many spaces as you like, as long as you use the same number for every line in the block!)

For instance, the following code would generate errors if you uncomment and run it!

In [None]:
# # forgetting the colon :
#for i in range(5)
#    print(i)

In [None]:
# # forgetting the indentation :
#for i in range(5):
#print(i)

In [None]:
# you can, however, put everything on one line! (this looks ugly and is NOT advised!)
for i in range(5): print(i)

Nested for loops are quite straightforward:

In [None]:
for i in range(1,3,1):
    for j in range(99,102,1):
        print(i,j)

# If-Else Statements
An "if statement" allows implementing a logical condition. Syntax:
> if `logical_condition_1`:<br>
> $ \qquad$ instructions if logical_condition_1 is True<br>
> elif `logical_condition_2`:<br>
> $ \qquad$ instructions if logical_condition_1 is False and logical_condition_2 is True <br>
> ...<br>
> else:<br>
> $ \qquad$ instructions if all logical conditions above are False

In [None]:
a = 15
b = 17
if (a > b):
    print("a is bigger than b")
elif (a == b):
    print("a is equal to b")
else:
    print("a is smaller than b")

<font color=red>**As with "for" loops, the colon `:` and the indentation are critical.**</font><br>
The following code would lead to errors if uncommented:

In [None]:
# # forgetting the colon - this would create an error if uncommented!
#if (a > b)
#   print("a")

In [None]:
# # forgetting the indentation
#if (a>b) :
#print("a")

In [None]:
# the parentheses around the logical condition are not required, but they are highly recommended
if a > b :
    print("a")
else:
    print("b")

# Lists
Lists are used to store multiple items in a single variable. Lists can be created using square brackets `[.]`.

## Basics


Let's create some simple lists.


In [None]:
# a list with integers, created with the brackets [ ]
my_list = [1,2,3,4]

The data types contained in a list do not have to be the same and you can repeat the values several times

In [None]:
# crazy list containing a string, a float, another list (!), and several 5s
crazy_list = ["a string", 4.75, [1,2,3,4], 5, 5, 5]
print(crazy_list)

The items in a list are ordered. You can access the first item using the index `[0]`, the second item using the index `[1]`, etc.

In [None]:
crazy_list[0]

You can also access several items in the list using a **range** of indices specified with `:`

In [None]:
#  using i:j returns the elements at index i,i+1,...,j-1  (j is NOT included!)
crazy_list[1:3]

You can change the value at a certain location in a list using the correct index

In [None]:
crazy_list[2] = "take out the list from the list"
print(crazy_list)

You can even change several items using the range again

In [None]:
# note that the right-hand-side in this assignment is a list - elements will be matched location-wise
crazy_list[0:2] = ['Barcelona','Madrid']
print(crazy_list)

If your range has fewer indices than the items you want to change, this will expand the list

In [None]:
crazy_list[0:2] = [1,2,3]
print(crazy_list)

Conversely, if the range has more indices than the items you are inserting, this will shrink the list

In [None]:
crazy_list[0:5] = ["Hey", 3.3333]
print(crazy_list)

You can concatenate two lists using the `+` operator (this is also an easy way to add more elements at the end of a given list)

In [None]:
crazy_list_with_plus = crazy_list + ["mango", "kiwi"]
print(crazy_list_with_plus)

You can also duplicate a list many times using the `*` operator

In [None]:
# create a list with the string "We love Dan!" repeated 5 times
my_list = ["We love Dan!"] * 5

print(my_list)

**You should be careful with the _assignment_ operator `=` for a list!**<br>
The assignment operator will **NOT** create a copy of a list; rather, it will create a new/alternative name for the list

In [None]:
# are we creating a copy of `crazy_list` stored in the new variable `new_list`?
new_list = crazy_list

# print both lists
print("Here are the two lists")
print(crazy_list)
print(new_list)

# let's change the first element in this "new" list
new_list[0] = "I am changing the start"

# print both lists
print("Here are the two lists after changing")
print(crazy_list)
print(new_list)
# Note how BOTH lists are changing (because the new list just points to the old one)

If you want to create a genuine **copy** of a list, you can use the `copy` method. Details under the list methods section.

## The length, min, max of a list
To calculate the length of a list or the minimum or maximum values in the list, use the `len(.)`, `min(.)` and `max(.)` functions.

In [None]:
# create a list of numbers
list_of_numbers = [3, 6, 9, 1, -5, 34, 23, 99]

# print the length
print(len(list_of_numbers))

# print the smallest value
print(min(list_of_numbers))

# print the largest value
print(max(list_of_numbers))

## Fancy methods for a list
The list data structure already has a couple of pre-defined useful `methods` that allow you to conduct specific manipulations.

The full list of list (sic!) methods is:
- `append()`	Adds an element at the end of the list
- `clear()`	Removes all the elements from the list
- `copy()`	Returns a copy of the list
- `count()`	Returns the number of elements with the specified value
- `extend()`	Add the elements of a list (or any iterable), to the end of the current list
- `index()`	Returns the index of the first element with the specified value
- `insert()`	Adds an element at the specified position
- `pop()`	Removes the element at the specified position
- `remove()`	Removes the item with the specified value
- `reverse()`	Reverses the order of the list
- `sort()`	Sorts the list

For our purposes, the most useful methods are `index` and `copy`, which we exemplify below.


The `index` method allows you to retrieve the location of a specific element. It returns the first location where the element appears, or raises a `ValueError` if the element is not found.

In [None]:
# create another list to play with
crazy_list = ["apples", "oranges", "bananas", "OIT248" ]

# figure out the index where the element 'apples' appears
index_for_apples = crazy_list.index("apples")
print(f"The index where the string `apples` appears: {index_for_apples}")

# now try to find some element that doesn't exist -- note the error!
index_for_smth_new = crazy_list.index("The Wire Show")
print(index_for_smth_new)
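
If you would rather avoid the error, one simple approach is to check membership with the `in` operator before calling `index`. The sketch below assumes we are happy to store `None` when the element is missing; that convention (and the names `item` and `position`) is just for illustration.

```python
crazy_list = ["apples", "oranges", "bananas", "OIT248"]

# the element we are looking for (it is NOT in the list)
item = "The Wire Show"

# only call index(.) if the element is actually in the list
if item in crazy_list:
    position = crazy_list.index(item)
else:
    position = None   # our own convention for "not found"

print(position)
```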

The `copy` method does what the name suggests...

In [None]:
# let's create an actual copy of `crazy_list` stored in a new variable `new_list`
new_list = crazy_list.copy()

# print both lists
print("Here are the two lists")
print(crazy_list)
print(new_list)

# To see that this is a copy, let's change the first element in the "new" list and print both
new_list[0] = "Michael Jackson"

print("Here are the lists after changing")
print(crazy_list)
print(new_list)

We likely won't use a lot of these methods in OIT 248, but they are useful to know about. For a detailed coverage, you can check <a href="https://www.w3schools.com/python/python_lists_methods.asp">this link</a>.
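
In case you are curious how some of the other methods behave, here is a quick sketch that applies several of them one after another to a small list. The list `fruits` and the variables `removed` and `how_many` are made up for this illustration; the comments show the list after each step.

```python
fruits = ["apples", "oranges"]

fruits.append("bananas")           # ['apples', 'oranges', 'bananas']
fruits.extend(["kiwi", "mango"])   # ['apples', 'oranges', 'bananas', 'kiwi', 'mango']
fruits.insert(1, "cherries")       # ['apples', 'cherries', 'oranges', 'bananas', 'kiwi', 'mango']
removed = fruits.pop(0)            # removes AND returns the element at index 0 ('apples')
fruits.remove("kiwi")              # ['cherries', 'oranges', 'bananas', 'mango']
fruits.sort()                      # ['bananas', 'cherries', 'mango', 'oranges']
fruits.reverse()                   # ['oranges', 'mango', 'cherries', 'bananas']
how_many = fruits.count("mango")   # how many times 'mango' appears

print(fruits)
print(removed, how_many)
```

Note that all of these methods (except `count`) change the list itself rather than creating a new one.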

## Looping through lists
There are several ways to loop through lists

**Option 1.**<br>
If you just care about the elements in the list **but not** their indices/locations, the most elegant way is to use a `for` loop through the elements themselves

In [None]:
# loop through the elements in `crazy_list` and store them in 'v'
for v in crazy_list:
    # 'v' now stores an element from the list; let's print 'v'
    print(v)

**Option 2.**<br>
An alternative, also useful when you just care about the elements, is to use a "list comprehension".

In [None]:
[print(v) for v in crazy_list];

This is a very compact syntax, but it might confuse you a bit at first so don't worry if you don't fully get it on the first try!...

**Option 3.**<br>
If you need the elements in the list **as well as** their indices, you can write a "classic" for loop. Specifically, for a list, we actually know what the indices are: they are 0, 1, 2, ..., number of elements-1. So we can get these using the `range(.)` and `len(.)` functions:

In [None]:
# calculate the number of elements in the list
num_elements_in_list = len(crazy_list)

# produce the range 0 .. num_elements_in_list - 1
indices = range(num_elements_in_list)

# and now let's loop through the elements, printing them as well as their index
for i in indices:
    print("At location", i, "we can find:", crazy_list[i])

Normally, you would not define all of those variables above and instead use this compact form:

In [None]:
# let's loop through the elements, printing them as well as their index
for i in range(len(crazy_list)):
    print("At location", i, "we can find:", crazy_list[i])
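
Another option, which is not covered above but which you may see in other people's code, is the built-in `enumerate(.)` function: it hands you the index and the element together at every step. A minimal sketch (the list `pairs` is only there to store what the loop produces):

```python
crazy_list = ["apples", "oranges", "bananas", "OIT248"]

pairs = []
# at each step, 'i' stores the index and 'v' stores the element at that index
for i, v in enumerate(crazy_list):
    print("At location", i, "we can find:", v)
    pairs.append((i, v))
```

This prints the same messages as the loop with `range(len(crazy_list))`, so feel free to use whichever you find easier to read.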

## List comprehensions
List comprehensions offer a very simple way to create a new list based on some existing lists. The syntax is:

> `newlist = [`_expression_ `for` _item_ `in` _iterable_ `if` _condition_ `== True]`

_iterable_ can be another list or a range (more broadly, any iterable type). The return value is a new list, leaving the old list unchanged.

In [None]:
# let's create a list with some fruits
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# now suppose we want to create a list with all the fruit names *except apple*
fruits_no_apple = [v for v in fruits if v!= "apple"]
print(fruits_no_apple)

If you want to embed an `if-else` condition, you need to switch the order of the `if` and the `for` loop, as follows:

In [None]:
# a copy of the original list where every occurrence of *apple* is replaced with *walnut*
fruits_apple_walnut = [v if v!= "apple" else "walnut" for v in fruits]
print(fruits_apple_walnut)
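
Since the iterable can also be a range, list comprehensions are handy for building lists of numbers. Here is a small sketch showing the three patterns side by side: no condition, an `if` filter, and an `if-else` expression (the variable names are made up for this illustration).

```python
# squares of the integers 1, 2, ..., 5
squares = [i**2 for i in range(1, 6)]

# same, but keep only the even integers (the `if` comes AFTER the `for`)
even_squares = [i**2 for i in range(1, 6) if i % 2 == 0]

# label each integer as "even" or "odd" (the `if-else` comes BEFORE the `for`)
labels = ["even" if i % 2 == 0 else "odd" for i in range(1, 4)]

print(squares)
print(even_squares)
print(labels)
```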

# Tuples
Tuples allow storing several items in a single variable. They are defined using round brackets `(.)`.

In [None]:
# define a tuple with a string, a float, and a list
my_tuple = ("apples", 3.14, [1, 2, 3])

print(my_tuple)

You might think that the tuple is quite similar to a list, but the fundamental difference is that tuples are **unchangeable**: once you have created a tuple, you cannot change its contents.<br>

We won't be using tuples directly but many functions in Python return tuples, so you should not be surprised to see them!
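
As a small illustration, the sketch below reads an item from a tuple and "unpacks" a tuple returned by a function. We use the built-in `divmod(.)` here simply as one example of a function that returns a tuple; the variable names are made up.

```python
my_tuple = ("apples", 3.14, [1, 2, 3])

# reading from a tuple works just like for a list
first_item = my_tuple[0]
print(first_item)

# divmod(a, b) returns the tuple (a // b, a % b); we can "unpack" it into two variables
quotient, remainder = divmod(17, 5)
print(quotient, remainder)

# # the next line would create an error if you uncomment and run it (tuples are unchangeable)!
# my_tuple[0] = "oranges"
```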

# Printing
We have already been printing messages and variables above. The typical syntax of the `print` command that we'll use most often in OIT 248 is:
> `print`(_object(s)_, sep=_separator_)
 - _object(s)_ : one or more objects to print; each object will be converted to a string before printing
 - _sep_ : optional, specifies what separator to use between the objects

Let's see a few simple examples

In [None]:
# print an integer variable
x = 5
print(x)

# print some text and an integer
print("The value of the integer is:", x)

 The most important thing when printing is how to convert the objects into a string that satisfies the criteria you want.<br>

 There are many ways to do this in Python and all will be roughly equally good for our purposes!

## "Old-style" Formatting with %
This uses the `%` operator and will look familiar to those of you with C coding experience!

In [None]:
# let's define a few different variables
name = "Linwei"        # a string
age = 32               # an integer
gpa = 3.92             # a float
income = 245894.242    # a large float

print("The person named %s with age %d has a GPA of %f and income of %e" \
      % (name, age, gpa, income))

The example above already shows the data types that we're most likely to print in OIT 248:
 - `'d'` for a signed integer (decimal, i.e., base 10)
 - `'f'` for a floating point (decimal)
 - `'e'` for a floating point in exponential format
 - `'s'` for a string

In addition to these, you can also adjust the padding and how many digits of precision to print.

In [None]:
# allocate 20 characters to the name, 3 characters to the age, and print 2 digits after the decimal point for the GPA
print("The person named %20s with age %3d has a GPA of %.2f." \
      % (name, age, gpa))

# and print a more exotic example to see the difference
print("The person named %20s with age %3d has a GPA of %.2f." \
      % ("Jiawei Luo", 104, 55242.435))

Floating-point numbers use the format `%a.bf`. Here, `a` is the minimum total number of characters in the resulting string (counting the decimal point and the digits after it; the string is padded with white space on the left if the number is shorter), and `b` is the number of digits to display after the decimal point.
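
To see the effect of the width and precision, here is a short sketch that stores the formatted strings and prints them between `|` characters so that the padding is visible. The variable names are made up for this illustration.

```python
gpa = 3.92

padded = "%8.2f" % gpa        # at least 8 characters in total, 2 digits after the decimal point
unpadded = "%.1f" % gpa       # no minimum width, 1 digit after the decimal point
too_narrow = "%4.1f" % 55242.5   # the width is only a MINIMUM: longer numbers are not cut off

print("|" + padded + "|")
print("|" + unpadded + "|")
print("|" + too_narrow + "|")
```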

Unless you have very strong biases, **we encourage you to consider using one of the next two formatting options, which are more flexible and robust.**

## The `format(.)` method
The second option uses the string `format` method.

In [None]:
# let's again define a few different variables
name = "Linwei"        # a string
age = 32               # an integer
gpa = 3.92             # a float

print("{} has age {} and a GPA of {}".format(name, age, gpa))

So you just need to put braces `{}` for any object inside the string and then use `.format()` at the end of the string to include all the objects. Just as with the `%` method, you can specify a format for the strings inside the `{...}`, with the only distinction that you should use a colon `:` instead of `%`. See below.

In [None]:
# allocate 20 characters to the name, 3 characters to the age, and print 2 digits after the decimal point for the GPA
print("{:20s} has age {:3d} and GPA of {:.2f}.".format(name, age, gpa))

# and print a more exotic example to see the difference
print("{:20s} has age {:3d} and GPA of {:,.2f}.".format("Jiawei Luo", 104, 55242.435))

Note that the second statement uses the format `,.2f` to print the nice comma-delimiter `,` for thousands, which is useful when printing large numbers. For more examples of formatting, check out <a href="https://www.pythoncheatsheet.org/cheatsheet/string-formatting">this reference</a>.

## The `f-string` method
The last option uses "formatted" strings, a.k.a. f-strings. This is the most modern and most compact approach, so if you're learning Python for the first time, you might want to use this!

In [None]:
# let's again define a few different variables
name = "Linwei"        # a string
age = 32               # an integer
gpa = 3.92             # a float

print(f"{name} has age {age} and a GPA of {gpa}")

Here, you just need to put the character `f` before the string, and then enclose in braces `{}` the actual object. Formatting is done inside the `{...}` in a similar way to the `.format()` method, using the colon `:`.

In [None]:
# allocate 20 characters to the name, 3 characters to the age, and print 2 digits after the decimal point for the GPA
print(f"{name:20s} has age {age:3d} and GPA of {gpa:.2f}.")

# and print a more exotic example to see the difference
second_name = "Jiawei Luo"
print(f"{second_name:20s} has age {104:3d} and GPA of {55242.435:,.2f}.")

# Dictionaries
Dictionaries store data in pairs consisting of a lookup "key" and a corresponding "value". They are defined using curly brackets `{}`.

In [None]:
GS_trophies = {}               # define an empty dictionary
GS_trophies["Federer"] = 20    # add entry with key "Federer" and value 20
GS_trophies["Nadal"] = 22      # at least as of 2023...
GS_trophies["Djokovic"] = 24   # at least as of 2023...

print(GS_trophies)

You can retrieve or change a value from a dictionary using the key

In [None]:
# how many trophies does Nadal have now?
print(f"Nadal has: {GS_trophies['Nadal']} trophies")

# increase Nadal's trophy count by 1
GS_trophies["Nadal"] += 1
print(GS_trophies)

The keys and values do not have to have the same data type (just as with lists!)

In [None]:
# add a new entry with a numeric key and a string value
GS_trophies[3.14] = "Something"
print(GS_trophies)

The code above also shows how to add a new item in a dictionary. To remove an item based on its key, you can use the `pop(.)` method:

In [None]:
# remove the weird item that we added above, with key 3.14
GS_trophies.pop(3.14)
print(GS_trophies)

Loop through all the keys in the dictionary, one by one, with a `for` loop:

In [None]:
for key in GS_trophies:
    print(key)

To loop through the values, you can loop through the keys and get the values:

In [None]:
for key in GS_trophies:
    print(f"{key} has {GS_trophies[key]} Grand Slam trophies.")
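
Dictionaries also have the methods `values()` and `items()`, which let you loop through the values directly or through the key-value pairs together. A minimal sketch, re-creating a small dictionary so that the example runs on its own (the variable `total` is made up for this illustration):

```python
GS_trophies = {"Federer": 20, "Nadal": 22, "Djokovic": 24}

# loop through the values only
for value in GS_trophies.values():
    print(value)

# loop through the keys and the values at the same time
for key, value in GS_trophies.items():
    print(f"{key} has {value} Grand Slam trophies.")

# values() also works with functions like sum(.)
total = sum(GS_trophies.values())
print(total)
```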

To copy a dictionary, you need to use the `copy(.)` method, just like for a list. (Using `=` will **not** create a copy of the dictionary; it will simply give the dictionary a second name.) To see all the keys of a dictionary at once, you can use the `keys(.)` method:

In [None]:
GS_trophies.keys()
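
To see the difference between `=` and `copy(.)` for a dictionary, here is a short sketch that mirrors what we did for lists. The names `second_name` and `real_copy` are made up for this illustration.

```python
GS_trophies = {"Federer": 20, "Nadal": 22, "Djokovic": 24}

second_name = GS_trophies        # NOT a copy: just another name for the same dictionary
real_copy = GS_trophies.copy()   # a genuine copy

second_name["Nadal"] = 23        # this also changes GS_trophies
real_copy["Federer"] = 0         # this does NOT change GS_trophies

print(GS_trophies)
print(real_copy)
```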

For more details on dictionaries (including the methods available), check <a
href="https://www.w3schools.com/python/python_dictionaries.asp">this</a>.

# Functions
Functions allow organizing the code in blocks that can be called separately (and many times). A function can take several arguments and can return data as a result. In Python, functions are defined using the `def` keyword.

Define a function with no arguments that prints a message.

In [None]:
# define a function without any arguments
def hello():
    print("Hello again!")

# now define another function that adds its two arguments
def my_smart_addition(a,b):
    return a+b

# and an even smarter function that returns a+b and a-b as a list
def my_smartest_function(a,b):
    print("This is the smartest function.")
    a_plus_b = my_smart_addition(a,b)
    a_minus_b = a-b
    return [a_plus_b, a_minus_b]

# let's test our functions
hello()

# this function returns something; let's store and print the result!
result = my_smart_addition(2,5)
print(result)

# this function also returns something; let's store and print the result!
result = my_smartest_function(2,5)
print(result)

# # the following would give an error if uncommented!
# Hello()

A few things to note:
 - the colon `:` is critical in the syntax (just like with `if` and `for`)
 - functions can take arguments<br>
   _For instance, the second function takes as arguments two things a, b. These could be any data type._
 - the keyword `return` tells the function what value to return <br>
   _For instance, the second function returns the sum of its arguments, a + b_
 - you can call functions inside other functions<br>
   _Like we did in the "smart" function, which calls the function that adds_
 - you can return several values<br>
   _Just package them in a list and return that list, like in the third example_
 - function names are case-sensitive!
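
When a function returns several values packaged in a list, you can store each of them in its own variable in a single line. The sketch below uses a made-up function `sum_and_difference` (a shorter version of the third function above) to illustrate this.

```python
# a hypothetical function that returns a+b and a-b as a list
def sum_and_difference(a, b):
    return [a + b, a - b]

# "unpack" the returned list into two separate variables
total, difference = sum_and_difference(2, 5)

print(total)
print(difference)
```

The number of variables on the left must match the number of items in the returned list.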

# Importing modules
A module is essentially a library with lots of functions. By "importing" a module with the `import` statement, you can use all the functions inside it.

For instance, in this class we will use the `pandas` module a lot (for the reasons why, see the separate section covering it!). To import it, you could do any of the following:

> ``import pandas``

This imports the "pandas" module, and allows us to use the functions it contains; to use the function `read_csv()`, we would have to use the syntax ``pandas.read_csv()``.

Because this requires typing the word `pandas` all the time, we can assign it a 'short name' as follows:

> ``import pandas as pd``

This also imports the full "pandas" module, but now we could call the `read_csv` function using ``pd.read_csv()``. Saving a few characters could mean a lot if you're typing this thousands of times :-)

Lastly, there is one more option that we could use:

> ``from pandas import *``

This imports everything in the pandas module and makes it so that we can just refer to the function using ``read_csv()``. This is useful if the functions are specific enough that you don't think the same name might be defined/used elsewhere, but it could be dangerous if you think there might be overlap.
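
To try the different import styles side by side, here is a small sketch. Note that it uses Python's built-in `math` module and its `sqrt` function instead of `pandas` (our choice, just to keep the example tiny), and that it imports a single function by name rather than using `*`.

```python
# option 1: import the module and use its full name
import math
print(math.sqrt(16))

# option 2: import the module under a short name
import math as m
print(m.sqrt(16))

# option 3: import one specific function and use it without any prefix
from math import sqrt
print(sqrt(16))
```

All three calls run exactly the same function; only the way we refer to it changes.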

# Pandas module
Pandas is a Python library used for working with data sets. It has very useful functions for analyzing, cleaning, exploring, and manipulating data, and we will be using it a lot throughout our class. (And in case you're wondering, the name is **not** about an animal, for a change.... It's actually short for "panel data"!) Our coverage here will be very brief, but if you want more, see, e.g., <a href="https://www.w3schools.com/python/pandas/default.asp">this resource.</a>

In [None]:
# let's first import the module
import pandas as pd

## Series
The most basic object in `pandas` is a `Series`, which can be created with the `Series(.)` function. You can think of it as a column in a table (think of a column in Excel!)

In [None]:
my_list = [24, 56, 99]            # create a simple list
my_series = pd.Series(my_list)    # now create a series from the list

print(my_series)                  # print the series

Note that `pandas` assigns a label (called an **index**) to each element in the series. If you don't specify anything, these labels/indices will be the integers 0, 1, 2, ... You can also specify your own custom labels with the `index` argument, as below.

In [None]:
# create a series with a given index
my_series = pd.Series(my_list, index = ["a", "b", "c"])
print(my_series)                  # print the series

By themselves, the Series are not terribly useful, but they form the basis of DataFrames, so it's good to understand them separately.

## DataFrames
DataFrames are the fundamental way in which data is organized in `pandas`. You can think of a DataFrame in close analogy with a table in Excel: it is a two-dimensional table that has Series as its columns.

### Basics

Normally we read DataFrames from files, but here, we will show you two ways to create a DataFrame, either from a list of lists or from a dictionary. These might be useful in their own right.

First, this is how to create a DataFrame from a dictionary. The keys correspond to the column names, and the values correspond to the values of the column (for every row).

In [None]:
# a dictionary that stores some data
dictionary_with_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'Id' : [10001, 10002, 10003, 10004],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}

# create a DataFrame from the dictionary
df = pd.DataFrame(dictionary_with_data)

# display it
display(df)

Next, you can create a DataFrame also from a list of lists. The outer list contains one element for each **row** of the dataframe; each element of the outer list is itself a list, that contains the values for each column (corresponding to a row). We use the same example as above for clarity.

In [None]:
# a list of lists (!) to store the data
list_of_lists_of_data = [ ['Alice', 25, 10001, 'New York'], \
 ['Bob', 30, 10002, 'San Francisco'], \
 ['Charlie', 22, 10003, 'Los Angeles'], \
 ['David', 28, 10004, 'Chicago'] ]

# create a DataFrame from the list of lists, specifying the column names
df = pd.DataFrame(list_of_lists_of_data, columns = ["Name", "Age", "Id", "City"])

# display it
display(df)

As you can see, the DataFrame is basically a table with rows and columns. The column labels/names here are "Name", "Age", "Id", and "City". The rows are labeled with 0, 1, 2, 3, which you can see displayed in the first column (to be clear, that column is not really part of our DataFrame, so the first actual column is "Name"). Just like for a `Series`, `pandas` assigned these labels automatically when we created the DataFrame, and it used 0, 1, 2, 3 because we didn't specify custom ones (with the `index` argument).<br>

The function `display(.)`, which we already used above, is great for visualizing DataFrames. In a large DataFrame, you may want to only display a few rows, which you can do with the `head(.)` method:

In [None]:
# display the first 2 rows
df.head(2)

To find out the number of rows, use `len(.)`

In [None]:
# print the number of rows in the DataFrame
print(len(df))

If you need both the number of rows and the number of columns, use `shape`

In [None]:
df.shape

### Column operations

You can see a specific column using the syntax `df[column_name]`:

In [None]:
# check out the column "Age" : this is a Series, with integer values
df["Age"]

To rename a column, use the `rename(.)` DataFrame method

In [None]:
# rename the "City" column as 'Location' and the 'Age' as 'Years'
df.rename(columns={'City': 'Location', 'Age': 'Years'}, inplace=True)
display(df)

The argument `inplace=True` is needed to make sure that the DataFrame is actually changed (otherwise, the `rename` method just returns a new DataFrame, without changing the existing one).
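
To see what happens without `inplace=True`, here is a minimal sketch on a tiny, made-up DataFrame (so that it does not interfere with the `df` above). The names `small_df` and `renamed` are just for illustration.

```python
import pandas as pd

# a tiny DataFrame used only for this illustration
small_df = pd.DataFrame({"Name": ["Alice", "Bob"], "City": ["New York", "Chicago"]})

# without inplace=True, rename(.) returns a NEW DataFrame that we store in 'renamed'
renamed = small_df.rename(columns={"City": "Location"})

print(list(small_df.columns))   # the original DataFrame still has the old column names
print(list(renamed.columns))    # the new DataFrame has the new column names
```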

To obtain the names of all the columns, use the attribute `df.columns`.

In [None]:
# get all the column names
df.columns

As you can see, this returns an `Index` object (which we have not discussed), but you can iterate through it with a usual `for` loop or you can transform it into a regular Python List with the function `list`.

In [None]:
# let's iterate through the names of the columns with a for loop
for c in df.columns:
     print(c)

# let's store the columns in a list and display the list
column_names = list(df.columns)
print(column_names)

<font color=red>**Warning!**</font>`df.columns` **is not a method**! If you use round brackets, you will get an error:

In [None]:
# # the following would generate an error!
# df.columns()

### Row operations

To get all the row labels, use `df.index`.

In [None]:
# get all the row labels
df.index

This also returns an object (a `RangeIndex`), but you can readily loop through it with a `for` or store it as a list.

In [None]:
# let's iterate through the indices of all the rows
for i in df.index:
     print(i)

# let's store the row labels in a list and display the list
row_idx = list(df.index)
print(row_idx)

To change the row labels used for the DataFrame:
 - to set these to the values of an existing column, use the `set_index` method
 - to set these to some Python list, assign the `index` attribute

**Warning!** To avoid ambiguities, the labels should always contain unique values.
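
If you are not sure whether the labels are unique, one way to check is the `is_unique` attribute, which exists for both a column (a `Series`) and an index. The sketch below uses a small made-up DataFrame called `demo` (an assumption for illustration only) and shows what goes wrong when labels repeat.

```python
import pandas as pd

# a small hypothetical DataFrame, used only for this illustration
demo = pd.DataFrame({"Id": [101, 102, 103], "Name": ["Ann", "Bob", "Cy"]})

# check that the column has no repeated values before using it for the row labels
print(demo["Id"].is_unique)     # True

demo = demo.set_index("Id")
print(demo.index.is_unique)     # True

# assigning a list with repeated labels is allowed, but the labels become ambiguous
demo.index = ["a", "a", "b"]
print(demo.index.is_unique)     # False

# asking for label 'a' now returns two rows instead of one
print(len(demo.loc["a"]))       # 2
```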

In [None]:
# use the column 'Id', which in our case has unique id's, as the row labels
df.set_index('Id', inplace=True)

display(df)

As you can see, the row labels are now given by the "Id" column. Now let's set these to a, b, c, ...

In [None]:
# change the row labels to 'a', 'b', 'c', 'd'
df.index = ['a','b','c','d']

display(df)

### Retrieve elements
One of the most important operations with a DataFrame is to retrieve an element located at a certain row and column.

If you know the row and column labels, there are two approaches:
 1. use `df[column_label][row_label]`
 2. use `df.loc[row_label, column_label]`

The first time you see this, it might look a bit confusing, especially because the order of rows and columns is switched! Recall that `df[column_label]` from approach (1) returns the `Series` corresponding to the column named `column_label`, so applying `[row_label]` to that simply returns the element at the `row_label` location. In contrast, (2) is the (arguably more natural) approach of indexing first with the row and then with the column. Let's see them in action:

In [None]:
# let's get the element on row 'c' and column "Years"
print(df["Years"]['c'])   # approach 1
print(df.loc['c',"Years"])    # approach 2

`df.loc` also allows you to retrieve several rows and columns (i.e., an entire sub-table) of the DataFrame:

In [None]:
# rows 'c' through 'd' (label slices include both endpoints), columns "Name" and "Location"
df.loc['c':'d',["Name", "Location"]]

Another option is to index into the DataFrame using entirely numeric indices, using `iloc`, with syntax:
> `df.iloc[numeric_row_index, numeric_column_index]`

To not get confused here, remember that Python uses 0-based indexing (and the very first column, which lists the row labels, does not count as a proper column).
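
To make the contrast between `loc` (labels) and `iloc` (positions) concrete, here is a minimal sketch on a small made-up DataFrame (`demo` and its values are assumptions for illustration only). Note one difference that is easy to miss: label slices with `loc` include both endpoints, while positional slices with `iloc` exclude the end, just like regular Python lists.

```python
import pandas as pd

# a small hypothetical DataFrame with row labels 'a', 'b', 'c', 'd'
demo = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy", "Dee"], "Years": [31, 45, 27, 52]},
    index=["a", "b", "c", "d"],
)

# label-based: row 'c', column "Years"
print(demo.loc["c", "Years"])     # 27

# position-based: third row (position 2), second column (position 1)
print(demo.iloc[2, 1])            # 27

# label slices include both endpoints: rows 'b' and 'c'
print(len(demo.loc["b":"c"]))     # 2

# positional slices exclude the end: only the row at position 1
print(len(demo.iloc[1:2]))        # 1
```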

In [None]:
# let's retrieve the element in the first row and first column (numeric indices 0 and 0)
df.iloc[0, 0]

### Looping
To loop through the elements in a row or a column (or a sub-table), you can just use a regular `for` loop

In [None]:
# let's loop through the entire DataFrame on columns
for c in list(df.columns):
    # for every column
    for r in list(df.index):
        # for every row
        print(df.loc[r,c])

In [None]:
# now let's loop on rows, and using numeric indexing
(num_rows, num_cols) = df.shape
for r in range(num_rows):
    # for every row
    for c in range(num_cols):
        # for every column
        print(df.iloc[r,c])

Depending on what calculation you need, looping with `for` loops like above may not be the best approach; instead, we may want to use a list comprehension (something we will do often together) or one of the built-in DataFrame functions. DataFrames, and more broadly the `pandas` module, have a lot of powerful functionality built in, and you will likely explore more of that in your D&D class. For now, if you want more details, you can check out one of the many tutorials available online (e.g., <a href="https://www.datacamp.com/tutorial/pandas-tutorial-dataframe-python">the DataCamp one</a>) or <a href="https://pandas.pydata.org/docs/reference/frame.html">the official pandas documentation</a>.
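
As a rough illustration of these three options, the sketch below computes the same quantity with a `for` loop, with a list comprehension, and with a built-in method. The DataFrame `demo` and its values are made up for this example.

```python
import pandas as pd

# a small hypothetical DataFrame, used only for this illustration
demo = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy"], "Years": [31, 45, 27]},
    index=["a", "b", "c"],
)

# option 1: an explicit for loop over the row labels
total = 0
for r in demo.index:
    total += demo.loc[r, "Years"]
print(total)                           # 103

# option 2: a list comprehension over the values of a column
print(sum([y for y in demo["Years"]])) # 103

# a list comprehension is also handy to build a new list from a column
years_plus_one = [y + 1 for y in demo["Years"]]
print(years_plus_one)                  # [32, 46, 28]

# option 3: a built-in method of the column (a Series)
print(demo["Years"].sum())             # 103
```

For simple calculations like sums or averages, the built-in methods (`sum`, `mean`, `max`, ...) are usually the shortest to write.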

## Reading data files
We will read data files using the pandas functions `read_csv` and `read_excel`.

This assumes that you imported the pandas module with the command `import pandas as pd`. For reading CSV files, the most common syntax we use is:
> `df = pd.read_csv(full_file_name, index_col=index_col)`<br>
where:
 - `full_file_name`: complete filename, including path if needed
 - `index_col` : the name or numeric index of the column to use to construct the row labels (this is optional, and if you don't specify it, Python will use 0, 1, 2, ...)
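
Here is a minimal sketch of the effect of `index_col`. To keep it self-contained, the CSV content is made up and stored in a string (wrapped in `io.StringIO`, which `read_csv` accepts in place of a file name); in practice you would pass the name of your own file instead.

```python
import io
import pandas as pd

# hypothetical CSV content; in practice this would live in a file on disk
csv_text = """Id,Name,Age
101,Ann,31
102,Bob,45
"""

# without index_col, the row labels are 0, 1, 2, ...
df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.index))       # [0, 1]

# with index_col, the values of the chosen column become the row labels
df_by_id = pd.read_csv(io.StringIO(csv_text), index_col="Id")
print(list(df_by_id.index))         # [101, 102]
print(df_by_id.loc[102, "Name"])    # Bob
```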

For reading Excel files, the most common syntax we use is:
> `df = pd.read_excel(full_file_name, sheet_name=sheet_name, index_col=index_col)`<br>
where:
 - `full_file_name`: complete filename, including path if needed
 - `sheet_name`: the name of the sheet
 - `index_col`: the name or numeric index of the column to use to construct the row labels; (optional, and if you don't specify it, Python will use 0, 1, 2, ...)

# Plotting
We will do most of our plotting with the `pyplot` module, which is part of the `matplotlib` library.

In [None]:
# import pyplot library from the matplotlib library
import matplotlib.pyplot as plt

For line plots, we use the `plot` function

In [None]:
# let's plot the function 3*x+2 for x in 0, 1, ..., 10
xpoints = list(range(11))               # the values of x
ypoints = [3*x+2 for x in xpoints]      # a list with the values of y = 3x+2

plt.plot(xpoints, ypoints)
plt.show()

For a scatter plot, use `scatter`

In [None]:
# let's plot the same function as above
plt.scatter(xpoints, ypoints)
plt.show()

Many things can be adjusted in a plot. <a href="https://matplotlib.org/2.0.2/users/pyplot_tutorial.html">This tutorial</a> could be a useful reference, but we also recommend leveraging AI bots for this: with some suitable prompting and a few iterations, they can typically get you pretty nice-looking plots!
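
As a starting point, here is a small sketch of a few common adjustments (line color and style, axis labels, a title, a legend and a grid). The specific colors and text are arbitrary choices made for this illustration.

```python
import matplotlib.pyplot as plt

# the same data as in the line plot above
xpoints = list(range(11))
ypoints = [3*x + 2 for x in xpoints]

# a dashed red line with round markers; 'label' is the text shown in the legend
plt.plot(xpoints, ypoints, color="red", linestyle="--", marker="o", label="y = 3x + 2")

plt.xlabel("x")                     # label for the horizontal axis
plt.ylabel("y")                     # label for the vertical axis
plt.title("A simple line plot")     # title displayed above the plot
plt.legend()                        # display the legend
plt.grid(True)                      # add grid lines
plt.show()
```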