# Data Science for Social Justice Workshop: Module 1

## Python Fundamentals

Welcome! In Module 1, we aim to cover Python fundamentals.

Specifically, in this notebook, we'll go over basic syntax and data structures when interacting with Python.

If this is your first time working with a programming language, this first module may be overwhelming. **That is totally normal**. Learning how to program well takes months, if not years. Our goal is to help put you in a position to learn more in the future. So, things might not always make sense, or be totally clear, and that's fine! Lean into the discomfort, and ask questions whenever something doesn't make sense.

This notebook is designed to help you:

* be comfortable running **Jupyter Notebooks**,
* know what **variables**, **functions**, and **methods** are,
* know how to use **`for` loops**, and
* know how to use **`if` and `else` statements**.

Let's get started.

## Python and Jupyter Notebooks

Python is the name of a programming language created by [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum) in the early 1990s. It is one of many programming languages available. Python is a **general purpose programming language**. This means that it can be used in a wide range of settings.

Python has become arguably the main programming language used for data science and machine learning. One of the most common ways to run Python - in general, but specifically when conducting data-driven work - is with a **Jupyter Notebook**. This is what we are interacting with right now.

A Jupyter Notebook is powerful because it allows a user to use **cells** to interact with the programming language. These cells allow you to run blocks of code one at a time, increasing the interactivity. You can also have other types of cells, like Markdown cells (this one) which improve readability. The combination of code and Markdown cells makes Jupyter Notebooks a powerful tool for creating data-driven narratives.

Code is the vehicle we use to make our ideas "go", via the "kernel" (the computational engine that runs the code). Markdown is used to organize our work and research narrative. This text is essential to introduce the reader/audience to the problem or topic being investigated, our research question/hypothesis, materials and methods, results, discussion, and conclusions. 

[Click here to check out the Jupyter Notebooks beginner guide](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/What%20is%20the%20Jupyter%20Notebook.html).

### Run a Cell

Press `Shift + Enter` to render a cell of Markdown or a cell of code and advance to the next cell.

Press `Control + Enter` to run the cell but not advance to the next cell.

### Hello, World!

Even if programming is a [comparatively young art](https://en.wikipedia.org/wiki/History_of_programming_languages), it still has its traditions. One of these traditions is the `Hello, World!` program which is the first program that most people ever write whilst learning how to program.

In the cell below, you will now write your own `Hello, World!` program. Click on the block below this one, then run it by either pressing `Shift + Enter` or using the button marked <i class="fa-step-forward fa"></i> `Run`. What happens?

In [None]:
print('Hello, World!')

### Exercise 1

Now, write your own line of code here. Use the `print()` function to print out any line of text (don't forget to use quotation marks!).

In [None]:
# YOUR CODE HERE


### Navigating Your Notebook

Congratulations, you're now a programmer! Before we go on though, a quick word about Python itself, and how notebooks will work throughout this course.

Every cell has to be <i class="fa-step-forward fa"></i> `Run` to work. For more complex programs, it will be important to run the cells in the right order, too (i.e., run the cell at the top first, then the next one down etc.). 

You can also insert cells using the `Insert` menu, and move them up and down relative to one another using the <i class="fa-arrow-up fa"></i> and <i class="fa-arrow-down fa"></i> buttons.

Finally, notebooks like these run on what's called a **kernel**. Sometimes this kernel breaks or stops functioning. In such cases, you'll want to restart it. Do this by clicking the `Kernel` menu and clicking the `Restart` button.

### Comments

You'll also find that a lot of cells have comments in them. A comment is a note we leave for ourselves when writing code, explaining what we're thinking or doing. Python ignores these comments entirely, so we can write in human languages.

A comment is anything that starts with the hash (`#`) symbol. It can take up an entire line, or just the rest of a line containing code. See the comments below.

In [None]:
# Here's a dog
print('Woof!')
print('Meow!')  # And that was a cat
# But let's hide the mouse
# print('Peep!')

## Variables and Data Types

In Python, we call something we literally write into the code a **literal**. This is as opposed to a **variable**, which is simply a name or placeholder for something that can take any value.

In Python, we create a variable by **assigning** it a value. We do this using the **assignment operator**, otherwise known as the equals (`=`) symbol. For example, to assign the value `"woof"` to the variable `dog` we would write `dog = "woof"`.

### Exercise 2

In the cell below, write code which assigns a text value to a variable, then print the variable.

In [None]:
# YOUR CODE HERE


Variables in Python have **types**. So far we have only come across one type: strings (which are what we call text, i.e., strings of characters). You can tell that it is a string as it is in between quotation marks. 

Other data types are integers (whole numbers), floats (numbers with a decimal point), lists (a mutable, or changeable, ordered sequence of elements) and dictionaries (a changeable collection of values, each indexed by a key).

These data types are referred to in Python as follows:
* `str`: strings, or text
* `int`: whole numbers
* `float`: numbers with a decimal point e.g., 1.0 or 1.5
* `list`: mutable, or changeable, ordered sequence of elements
* `dict`: changeable collection of values, each indexed by a key (since Python 3.7, dictionaries keep their items in insertion order)

To find out a variable's type, we can use a **function** called `type()`. Try the example below.

In [None]:
print(type("Hello, World!"))

### Exercise 3

In the cell below there are some variables. Using `print()` statements, print out the type of each variable.

In [None]:
lumberjack = "okay"
my_age = 30
i_deserve = 10000.00
my_list = [1, 2, 3]
my_dict = {"eggs": 12, "cheese wheels": 3}

# YOUR CODE HERE



Variables are fundamental to programming and you will use them throughout the course and during the rest of your programming career.

One way to use them is as follows. Make sure you understand how this works!

In [None]:
pronoun = "He"
occupation = "lumberjack"
judgement = "okay!"
print(pronoun, "is a", occupation, "and", pronoun, "is", judgement)

## Functions and Methods

**Functions** are blocks of code that run whenever they are called, so we can reuse them as often as we like. We've already used one called `print()`, which is built into Python. Functions usually accept some input and often have some output.

**Methods** are functions that only work on certain data types. Two much-used methods to work with strings are `lower()`, which turns a given string into lowercase, and `split()`, which splits a string into a list. We distinguish methods from other functions because we use **dot notation** to call them. For example, the `print()` function exists on its own. But something like `lower()` needs to be called *on* a string, and we do so by using a dot between the variable and the method call:

In [None]:
test = "some Kind of String"
test.lower()

In [None]:
dashes = "I-am-a-string"
dashes.split('-')
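
One detail worth noting is that string methods such as `lower()` and `split()` return a new value and leave the original string untouched. The sketch below illustrates this; the variable names `shout`, `quiet`, and `words` are made up for this example.

```python
# Hypothetical example variables, not used elsewhere in this notebook
shout = "SOME Kind of String"

# lower() returns a new, lowercased string...
quiet = shout.lower()
print(quiet)

# ...while the original string is left unchanged
print(shout)

# split() with no argument splits on whitespace and returns a list
words = quiet.split()
print(words)
```

To keep the result of a method like `lower()`, assign it to a variable, as done with `quiet` here.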

## Lists: Ordered Collections of Data

Let's discuss the list, which is an ordered collection of data. We tell Python to create a list by using brackets:

In [None]:
shopping_list = ["bread", "eggs", "milk"]
print(shopping_list)

Lists are **mutable**, meaning that list values can be changed without reassigning the name.

Lists also have methods, some of which are `append()`, `count()`, `pop()` or `index()`. The most common method you'll likely use is `append()`:

In [None]:
shopping_list.append("cereal")
print(shopping_list)

As you see, when used like this, this method adds an element to the back of the list.
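
The other list methods named above can be tried out in the same way. The sketch below uses a separate, made-up list called `fruits` so that `shopping_list` stays as it is. Note that positions are counted from zero, which the next paragraph explains.

```python
# A separate, made-up list so that shopping_list is not changed
fruits = ["apple", "banana", "apple", "cherry"]

# count() tells us how many times a value appears
apple_count = fruits.count("apple")
print(apple_count)

# index() gives the position of the first match
banana_position = fruits.index("banana")
print(banana_position)

# pop() removes the last element and hands it back to us
last_fruit = fruits.pop()
print(last_fruit)
print(fruits)
```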

To access values in a list, use square brackets along with the **index** of the value you want (a range of indices gives a **slice**, which we'll see in a moment). For example, to get the first item we use brackets along with an index value. Python is **zero-indexed**, which means that the first entry has index zero:

In [None]:
# Python starts counting indices at zero
shopping_list[0]

In [None]:
# More indices
print(shopping_list[1])
print(shopping_list[2])

We can use this index notation to update an element in a list:

In [None]:
shopping_list[0] = 'donuts'
print(shopping_list)

You can use **colon notation** to create a slice of indices, which will allow you to select multiple values from the list:

In [None]:
shopping_list[2:4]
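
Colon notation has a few variations that are easy to miss. The sketch below, which uses a made-up list called `letters`, shows that a slice stops just before its second index, and that either index can be left out.

```python
# A made-up list for illustration
letters = ["a", "b", "c", "d", "e"]

# The slice runs from the first index up to, but not including, the second
middle = letters[1:3]
print(middle)

# Leaving out the start means "from the beginning"
first_two = letters[:2]
print(first_two)

# Leaving out the end means "to the end of the list"
from_third = letters[2:]
print(from_third)
```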

Finally, let's `.sort()` our list (by default in ascending order).

In [None]:
shopping_list.sort()
shopping_list

As you can see, lists have these special functions attached to them which allow us to change them easily (e.g. how we used `append()` and `sort()` above). 

Let's recap on this terminology. 

* `shopping_list` is a *variable name* pointing to a *list*.
* Lists have a **method** called `sort`. It can be *accessed* as an **attribute** of any list, using a dot: `shopping_list.sort`
* We can **call** this function using a pair of parentheses.

So, this becomes: `shopping_list.sort()`

Some list methods, such as `sort()` and `append()`, act *in place*: rather than *returning* a new list, they *change the list itself*. In our previous example, `shopping_list` was sorted in place.
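
To make the difference between changing a list in place and returning a new one concrete, here is a small sketch. It relies on the built-in `sorted()` function, which this notebook has not introduced, and on made-up variable names (`numbers`, `ordered_copy`, `sort_result`).

```python
# A made-up list for illustration
numbers = [3, 1, 2]

# sorted() is a built-in function that returns a new, sorted list
ordered_copy = sorted(numbers)
print(ordered_copy)
print(numbers)  # the original list is unchanged

# sort() changes the list itself and returns None
sort_result = numbers.sort()
print(sort_result)
print(numbers)
```

Because `sort()` returns `None`, writing something like `numbers = numbers.sort()` would replace the list with `None`, which is a common early mistake.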

Familiarise yourself with the [Python List Methods](https://docs.python.org/3/tutorial/datastructures.html).

## Dictionaries: Associated Collections of Data

Dictionaries consist of pairs of _keys_ and _values_. A _key_ is used to retrieve a _value_. For example, if I have a dictionary called `shopping_dict`, it will look like this: 

In [None]:
shopping_dict = {"apples": 3, "eggs": 12, "cheese wheels": 19}
shopping_dict

As you can see, dictionary items are presented in `key:value` pairs, and can be referred to by using the key name (using square brackets). Note that dictionaries cannot have two items with the same key!

In [None]:
shopping_dict["apples"]

We can easily add something new to a dictionary:

In [None]:
shopping_dict["sandwiches"] = 5
print(shopping_dict)
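
The same square-bracket assignment also updates an existing item, which is how dictionaries avoid duplicate keys. The sketch below uses a made-up dictionary called `pantry` so that `shopping_dict` is not affected.

```python
# A made-up dictionary, separate from shopping_dict
pantry = {"rice": 1, "beans": 2}

# Assigning to a new key adds a new item
pantry["pasta"] = 4

# Assigning to an existing key replaces its value instead of adding a duplicate
pantry["rice"] = 10

print(pantry)

# len() is a built-in function that counts the items
item_count = len(pantry)
print(item_count)
```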

Getting a list of all the keys in a dict is as easy as running the `.keys()` method.

In [None]:
shopping_dict.keys()

...and the same goes for its values:

In [None]:
shopping_dict.values()

## Operators

We can do things with variables, and sometimes change their values, using operators. So far, we've only covered one operator, the assignment operator (`=`). Python actually has lots of different operators.

Here are some basic operators:

| Symbol  | Name              | Example  | Used For                                                           |
|---------|-------------------|----------|--------------------------------------------------------------------|
| `=`     | Assignment        | `a = 1`  | Assigning the value on the right to the variable on the left       |
| `+`     | Addition          | `1 + 2`  | Returns the sum of the right and left hand sides                   |
| `-`     | Subtraction       | `3 - 1`  | Returns the left hand side minus the right hand side               |
| `*`     | Multiplication    | `2 * 3`  | Returns the product of the left and right hand sides               |
| `**`    | Power             | `2 ** 2` | Returns the left hand side to the power of the right hand side     |
| `+=`    | In place addition | `a += 1` | Sums the left and right hand sides, assigns sum to left hand side  |

Most of these operators will not change the value of a variable, but rather return a new value. For example, check out the code below.

In [None]:
a = 10 # Note: we don't put integers in quotes!
b = 20 
print('a * b =', a * b)
print('a =', a)
print('b =', b)

If we want to keep the value returned by an operator like `+` or `*`, we need to use it in conjunction with the assignment operator to assign the value to a new variable, or to an old variable. Check out the examples below:

In [None]:
c = 20
d = c + 10
print('d is', d)
print('and c is still', c)
f = 100
print('f is currently', f)
f = f * d
print('but we multiplied it by d and assigned the product to f, now f is', f)
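
The table above also lists `+=` and `**`, which the cells so far have not used. Here is a minimal sketch of both, with made-up variable names (`count` and `squared`).

```python
count = 5

# In place addition: add 1 to count and store the result back in count
count += 1
print('count is now', count)

# The line above is shorthand for this longer form
count = count + 1
print('count is now', count)

# The power operator returns a new value and leaves count unchanged
squared = count ** 2
print('count squared is', squared)
print('and count is still', count)
```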

### Exercise 4

So far, we've only used operators on integers. However, you can also use some operators on strings, too.

In the cells below, try using the addition and multiplication operators on strings. What do you expect to happen?


In [None]:
# Define a string variable
dog = "woof!"
cat = "meow!"

# Multiply the variable 'dog' by two and print the result
# YOUR CODE HERE

# Add the variables 'dog' and 'cat' and print the result
# YOUR CODE HERE

## Casting: Converting between Types

We've defined several data types thus far. There are scenarios where we might want to take a variable in one data type, and express it in another. For example, we might have a string variable that could be interpreted as a number: `num = '2'`. How could we change this into the integer 2?

The term we use for this is **casting**. Python comes with built-in functions capable of performing casting for us. So, we need to **cast** the **string** value into an **integer**.

Casting doesn't always work the way you expect. It's easy to see how the string `"2"` can be converted to the integer `2`, or how the integer `123` can be converted to the string `"123"`, but how would you convert the string `"woof"` into an integer? Hint - you can't.

If a casting function can't convert a value, it will let you know by **raising an error**. We'll go more into detail about errors at the end of this lesson.

Here are some examples of casting: 

In [None]:
# String to integer
int('22')

In [None]:
# Integer to string
str(100)

In [None]:
# Float to integer
int(24.5)

In [None]:
# Integer to float
float(24)

In [None]:
# Float to string
str(3.1415)

In [None]:
# Raise an error
int("twenty two")
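
One reason casting matters is that the same operator can behave differently depending on the type of its inputs. The sketch below reuses the `num = '2'` example from the start of this section; the other variable names are made up for illustration.

```python
num = '2'

# With strings, + joins the text together
joined = num + '3'
print(joined)

# After casting to an integer, + adds the numbers
added = int(num) + 3
print(added)

# Casting a float to an integer drops the decimal part rather than rounding
truncated = int(24.9)
print(truncated)
```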

## Loops

One strength of computers is their speed, which we can leverage by repeating computations with loops. In programming, there are generally two kinds of loops: **for loops** and **while loops**. We will only focus on for loops in this section, because they are the more common choice in Python. We will not use while loops throughout these notebooks.

A for loop tells Python to execute some statements once for each value in a list, a character string, or some other set of values. Specifically, we structure our computation as: "for each thing in this group, do these operations".

The syntax for a for loop is shown below:

In [None]:
this_list = [0, 1, 2, 3]

In [None]:
for each_thing in this_list:
    # Do something with each thing
    print(each_thing)
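
A for loop works the same way on a character string, visiting one character at a time. The sketch below uses a made-up string, `word`, and counts its characters along the way.

```python
word = "eggs"  # a made-up example string
letter_count = 0

for letter in word:
    # This indented block runs once per character
    print(letter)
    letter_count += 1

# This line is not indented, so it runs once, after the loop has finished
print("The word has", letter_count, "letters")
```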

### Looping over Lists and Dictionaries

We have already seen a for loop in action, but let's take a closer look at operating on lists and dictionaries:

In [None]:
shopping_list = ["bread", "eggs", "milk"]
for item in shopping_list:
    print("We need to get", item)

In [None]:
shopping_dict = {"apples": 3, "eggs": 12, "cheese wheels": 19}

for key in shopping_dict.keys(): 
    val = shopping_dict[key]
    print(key, "costs", val)

As you see, looping over `.keys()` retrieved each of the **keys**, which we then used to look up the matching value.

In [None]:
for j in shopping_dict.values():
    print(j)

...and this retrieved the **values**.

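Here, we loop over both keys and values of the dict using the `items()` method, then print out both while keeping a running total. Note that we don't have to convert the value, which is an integer, into a string: `print()` accepts several arguments of different types, separated by commas.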

In [None]:
total = 0

for key, val in shopping_dict.items():
    total += val
    print(key, "costs", val)

print("The total cost is", total)

### Exercise 5

Create a dictionary containing different items (keys) and their quantity (values) in this classroom, and write some code that loops over this dictionary and prints out both these keys and values.

In [None]:
# YOUR CODE HERE


## Creating a Custom Function

We've learned a lot about different functions such as `print()`. The great thing about functions is that you can create them yourself, to do whatever you want. All we do when we create a function is use the `def` keyword to **define** a new function and give it a **name** (and specify what arguments it will take, if any). We then write the code for the function below the `def` statement. 

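In Python, functions make use of **indentation**: the code that belongs to a function is indented, by convention with four spaces. Most editors, including Jupyter, insert this indentation for you when you press the tab key, which on most keyboards is towards the top left: `⇥`. Then, whenever we **call** the function, the indented block of code is executed. 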

The `def` keyword is special as it allows us to define a block of code and give it a name so we can use it again and again. This is the point of defining a function.

See the examples below.

In [None]:
# Let's create our first function
def my_first_function():  # Notice the colon! this is part of the syntax
    # Notice how this code is indented
    print('Thanks for calling me!')

# The code is no longer indented, so we say it is in a different 'block'

# Let's call our function
my_first_function()

# And, let's see what kind of type it is
print(type(my_first_function))

### Arguments in Functions

Often, we'll want a function to take arguments. To do this, we put a list of variable names inside the parentheses when we define our function. Now, whenever you call the function, you must supply it with arguments, and inside the function these variables will have the values the function was called with. 

See the example below.

In [None]:
# Create our function with arguments
def say_hello(x):  # x is an argument here
    # Inside this block, x is whatever we called the function with
    print('Hello,', x)
    
# Let's call our function
say_hello("Berkeley")
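
A function can take more than one argument, separated by commas. In the sketch below, `describe_item()` is a hypothetical function made up for this example; the values in the call are matched to the variable names in order.

```python
# A hypothetical function with two arguments
def describe_item(item, quantity):
    # item and quantity take the values supplied in the call, in order
    print('We need', quantity, 'of', item)

# 'eggs' becomes item, 12 becomes quantity
describe_item('eggs', 12)
describe_item('bread', 1)
```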

### Exercise 6

In the code cell below, write two functions: one (called `multiply_together()`) which takes three arguments, multiplies them together and prints out the result, and another (called `add_together()`) which takes two arguments, adds them together and prints the result. When you have finished and run the cell without any errors, the code in the next cell should work and print out the correct values. 

In [None]:
# YOUR CODE HERE
 

In [None]:
multiply_together(2, 3, 4)  # Should print 24
add_together(10, 90)  # Should print 100 

Functions which return a value are generally much more useful than ones that just print it out, because returning a value allows us to assign it to a variable.

Unless a function explicitly returns a value using the `return` keyword, the value returned is simply `None`. `None` is a special value that has its own type, `NoneType`. 

To demonstrate this let's look at the `say_hello()` function from the earlier example *(tip: this cell will only work if you have run the cell above which defined the `say_hello()` function)*.

In [None]:
val = say_hello('student')
print('say_hello() returned ', val)
print(val, 'has the type', type(val))

Returning `None` in this context isn't very useful. Instead, as we said above, it is better generally to avoid printing things in our functions, and instead `return` a value which we can then do whatever we want with (if we like, we can pass it as an argument to the `print()` function to print it out). The `return` keyword will **return** a value from the function, and then **stop executing the function**. 

For example, see the new, improved function `return_hello()`. Make sure you understand the difference!

In [None]:
# Define the function
def return_hello(name):
    # This is a new block
    return 'Hello, ' + name
    # Any code below here (but in the same block) will never run
    
greeting = return_hello('Berkeley')
print(greeting)
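
Returned values can also be combined or reused in later calculations, which printed output cannot. A minimal sketch, using a hypothetical function called `double()`:

```python
# A hypothetical function that returns its result instead of printing it
def double(number):
    return number * 2

# The returned values can be stored, combined, or passed to other functions
first = double(4)
combined = double(4) + double(5)

print(first)
print(combined)
```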

## Conditional Execution

Conditionals, such as `if` and `else`, rely on **Boolean logic** (also called **Boolean algebra**), formalized by [George Boole](https://en.wikipedia.org/wiki/George_Boole). Python has a special type, called `bool`, which can only ever have one of two values: `True` or `False`.

**Conditional execution** allows us to control the flow of a program: we write out two blocks of code, and then execute only one of them, depending on some **condition**. For example, let's imagine we have a variable called `shop_has_eggs`, which was somehow set to whether a shop had eggs, so it is either `True` or `False`.

This is where two keywords come in: `if`, and `else` (you can think of `else` as simply meaning "otherwise"). We use these like below. Run the code and change the value of `shop_has_eggs` to see how the execution differs.

In [None]:
# First, define our "shop has eggs" variable
shop_has_eggs = False  # Change this to True and see what happens when you run it again

# Everything above will always execute
print('I am going to the shop.')

# Conditional execution
if shop_has_eggs:
    # When shop_has_eggs is True, this block executes
    print('The shop has eggs. Therefore, I will buy eggs.')
else:
    # When it's False, this block executes
    print('The shop has no eggs. Therefore, I will buy 1 loaf of bread.')
    
# Everything below will always execute
print('Now I am walking home.')

Here we can see that the `if` block requires a Boolean value. If the value is `True`, it will execute the block of code below it; if it is `False`, it will skip the block below it and not execute it. 

Furthermore, `if` can be combined with `else`, so that if the condition passed to `if` is `False`, then the block below `if` won't execute, but the block below `else` will. **You can use `if` on its own without `else`, but you can never use `else` on its own without `if`; this is because `if` must take a condition, but `else` can't.**
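
Here is a minimal sketch of `if` used on its own, without `else`. The variable names `is_raining` and `packed` are made up for this example.

```python
is_raining = True  # a made-up example variable; try changing it to False
packed = ['keys']

print('I am leaving the house.')

if is_raining:
    # This block is skipped entirely when is_raining is False
    packed.append('umbrella')
    print('I will take an umbrella.')

# Everything below always executes
print('I packed:', packed)
```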

### Exercise 7

Change the code below so that it always says it's buying a loaf of bread, but will buy either 6 eggs or no eggs depending on whether `shop_has_eggs` is `True` or `False`.

In [None]:
# First, define our "shop has eggs" variable
shop_has_eggs = False  # Change this to True and see what happens when you run it again

# Everything above will always execute
print('I am going to the shop.')

# Conditional execution
if shop_has_eggs:
    # When shop_has_eggs is True, this block executes
    print('The shop has eggs. Therefore, I will buy eggs.')
else:
    # When it's False, this block executes
    print('The shop has no eggs. Therefore, I will buy 1 loaf of bread.')
    
# Everything below will always execute
print('Now I am walking home.')

### Comparison Operations

In a real-life database management system, it would be much more probable that there was a variable called something like `egg_count`, which tells us the *number* of eggs in stock, rather than simply *whether* there are eggs or not. If that number is 0, there are no eggs; if it's 1 or more, then there are eggs.

So, we need a way to **compare** values (in this case an integer number of eggs) to evaluate to `True` or `False`. This is where Python's [comparison operators](https://www.tutorialspoint.com/python/python_basic_operators.htm) come in.

Comparison operators are like arithmetic operators, in that they take the value on the left, and compare it to the value on the right, and then, depending on the result of the comparison, return `True` or `False`. We can either assign the Boolean value to a variable, or, more commonly, just pass the condition to `if`. 

Perhaps the simplest of all these operators is `==`, or the **equality** operator. Note the two equals signs, which is intentional, as it means Python can understand the difference between **assignment** and **equality**.

The code below is an example which uses both the assignment operator and the equality operator.

In [None]:
# Test variable
test_var = 9

# Check if test_var is equal to 10
if test_var == 10:
    print('test_var is equal to ten')
else:
    print('test_var is not equal to ten')
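
The other option mentioned above, assigning the result of a comparison to a variable, might look like the sketch below. The names `eggs_in_stock` and `has_eggs` are made up for this example.

```python
eggs_in_stock = 3  # a made-up stock level

# The comparison produces a Boolean, which we can store in a variable
has_eggs = eggs_in_stock > 0
print(has_eggs)
print(type(has_eggs))

# The stored Boolean can then be passed to if
if has_eggs:
    print('There are eggs in stock.')
```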

The equality operator is just one of Python's comparison operators:


| Symbol | Name                     | Example: `True`    | Example: `False`   |
|:------:|:------------------------:|:------------------:|:------------------:|
| `==`   | Equality                 | `'woof' == 'woof'`   | `23 == 20`           |
| `!=`   | Inequality               | `'woof' != 'meow'`   | `23 != 23`           |
| `>`    | Greater than             | `123 > 12.3`         | `100 > 1000`         |
| `<`    | Less than                | `1000 < 10000`       | `1 < 0.1`            |
| `>=`   | Greater than or equal to | `10 >= 10`           | `100 >= 1000`        |
| `<=`   | Less than or equal to    | `10 <= 100`          | `101 <= 100.0`       |
| `is`   | Identity                 | `None is None`       | `[1] is [1]`         |


Take a moment to go over these in your head so you are confident you understand them. 

### Else if

Finally, what do we do when we want to apply several `if` statements? We use `elif`.
`elif`, like `else`, can only be used after an `if` statement, and like `else`, it is also optional. So, the following combinations are allowed:

* `if` on its own
* `if` and `elif` with no `else`
* `if`, `elif`, and `else`

You may also include as many `elif` statements as desired. See below, and make sure you understand!

In [None]:
# Get the value from the shop
egg_count = 5  # change this value to whatever you like, and see how the execution changes

# How many eggs do we want?
number_of_eggs_wanted = 6

# Does the shop have eggs?
if egg_count >= number_of_eggs_wanted:
    # Yes
    print('I will buy', number_of_eggs_wanted, 'eggs.')
elif egg_count > 2:
    print('I will buy only', egg_count, 'eggs.')
elif egg_count > 1:
    print('I will buy the last pair of eggs.')
elif egg_count > 0:
    print('Lucky me, I got the last egg!')
else:
    # No eggs
    print('No eggs for me today.')

## Errors in Python

So far, we have seen a couple of examples where things go wrong, and the code generates an error. Understanding errors in Python, and how you can use them to debug your code, is a skill in itself. Part of that skill is parsing the error messages. Python has the difficult task of telling you what went wrong in a flexible way, no matter how your code looks. A consequence of this is that sometimes it can be hard to understand what an error message is trying to tell you.

Let's look at a few types of errors you might run into in Python:

### NameErrors

`NameError`s usually occur when something is referenced that Python doesn't recognize. For example, try printing the following variable:

In [None]:
print(does_not_exist)

The red background indicates that Python **raised** an error. In this case, Python raised a `NameError`. That means it's trying to find something, but it can't find the name in its memory. The error report, called a **traceback**, points to the name that it can't identify, and Python informs you that it's not defined.

### SyntaxErrors

A syntax error occurs when Python encounters characters that don't make sense in the context of the surrounding code. This might happen if you don't use the right symbol in the right place - you can think of it as a spelling or grammar error in an essay.

Why does the following line of code raise a syntax error?

In [None]:
print('Oh, I'm a lumberjack, and I'm okay,')

### TypeErrors
 
Type errors occur when Python doesn't know what to do with variables of a provided data type. Often this occurs when you try and apply an operator or function to a data type it was not designed or meant to handle. Consider the following example:

In [None]:
# Define two variables
a = '2'
b = 'hello'
# Multiply them together
result = a * b
# And finally print them out
print(a, '*', b, '=', result)

Uh oh! We ran into a `TypeError`. The program is complaining that it "can't multiply sequence by non-int of type 'str'". What this means is that the `*` operator is not defined for two strings.
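
One possible fix, assuming the intent was to repeat the string `'hello'` twice, is to cast `a` to an integer first. This is only a sketch of one way to resolve the error; the right fix always depends on what the code was meant to do.

```python
# The same two strings as in the failing example
a = '2'
b = 'hello'

# Assumption: we want b repeated a times, so we cast a to an integer
result = int(a) * b
print(a, '*', b, '=', result)
```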

# Well done! 

That was a lot to get through! If not everything made sense, that's totally fine! You might need to go through this notebook multiple times, or look for additional resources, before things begin to click. As you work through your project and these exercises, you'll begin to internalize the concepts, making the process easier the next time.