# Workshop 1 - Module Overview and Introduction to Python

Welcome to Data Programming in Python! This module will introduce you to the Python programming language - from variables and if statements to statistical testing and graph plotting - with the aim of preparing you for the other modules on the course and handling data in a work setting. **No need to be shy - if you need help, just ask for it!**

- [Jupyter Notebook](#Jupyter-Notebook)
- [Variables](#Variables)
    - [Data Types and Operators](#Data-Types-and-Operators)
        - [Basic Data Types](#Basic-Data-Types)
        - [Collection Data Types](#Collection-Data-Types)

## Jupyter Notebook
First - since we'll be using it for both the lectures and workshops - let's take a quick look at how Jupyter Notebook works.

A Jupyter Notebook is made up of "cells". There are two primary types of cells you need to know about.

The cell you are currently reading is called a markdown cell. It contains text written in a syntax called Markdown that - when the cell is run - produces formatted text (including headers, tables of contents, etc.). Try double-clicking this cell (or any other markdown cell) to see the raw, unrendered cell contents. Feel free to experiment with the cell's contents!

The following cell is a "code cell". With the cell highlighted, the code within - which is Python code - can be executed using Shift+Enter or by selecting Run -> Run Selected Cell on the top menu bar.

In [None]:
# This is a code cell.

1 + 2

As you can see, when the cell is executed, the output of the code appears below the cell (if there is a valid output). The line prefixed with a hash symbol (#) is not executed as, in Python, this is a comment line used for labeling your code. In your assignments, you can use it to reference any snippets used!

You can also see a simple addition of two values being executed (1 + 2) and the result being displayed, but we don't "save" these values in any way for later use. In the cell above, modify the arithmetic statement - maybe try using multiplication or division instead. What happens if you try to divide by zero?

Very often, you'll want to store a value for use later in the code. This is where variables come in handy!

## Variables

A variable is a piece of computer memory containing some information which has been assigned a symbolic name as part of your code.

Here's an example:


In [None]:
add_sum = 1 + 2

Here we've declared a variable, with the symbolic name "add_sum", that contains the value of 1 + 2.

When declaring a variable, you should ensure that the symbolic name is representative of what the assigned value represents. You should try to adhere to the [PEP8 style guide](https://peps.python.org/pep-0008/#function-and-variable-names) when declaring your variables (they should be lowercase, with words separated by underscores as necessary to improve readability).
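
As a small illustration of the naming advice above, here is a sketch comparing a descriptive snake_case name with two alternatives that run but are harder to read. All three names are made up for this example.

```python
# Descriptive, lowercase, words separated by underscores (PEP 8 style)
total_price = 3 * 1.5

# These also run, but do not follow the PEP 8 advice for variable names
TotalPrice = 3 * 1.5   # CapWords is conventionally used for class names
tp = 3 * 1.5           # too short to say what the value represents

print(total_price)     # 4.5
```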

Try creating a variable in the code cell above that stores the result of some calculation using more values - maybe try to involve other arithmetic operators (multiply, divide, subtract)!

In [None]:
dog_name = 'Woofers'
owner_name = 'Steven'
years_adopted = 4

Here are some more variables. You'll notice that, when we ran the code cells, no output values were displayed. We are only declaring variables and assigning values to them; as covered in the lecture, we need to call a variable to see its value.
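
A quick sketch of the difference between assigning and displaying. In a notebook, a code cell shows the value of its last line only if that line is an expression, while `print()` can show a value anywhere in the cell. The variable `cat_name` is just an example name.

```python
cat_name = 'Whiskers'   # assignment only: nothing is displayed

print(cat_name)         # print() displays the value wherever it appears
cat_name                # as the last line of a notebook cell, this displays the value too
```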

In [None]:
add_sum

In [None]:
dog_name

In [None]:
owner_name

In [None]:
years_adopted

There we go. So far we've stored numbers and a couple of pieces of text, but what else exists in Python and what else can we assign to variables?

### Data Types and Operators

Python is both a ***strongly typed*** and a ***dynamically typed*** programming language. This makes Python very approachable for newer programmers.

***Strongly typed*** means that every value has a type, and that type determines which operations can be performed on it. Python checks this and will not silently convert between unrelated types (for example, adding a number to a piece of text raises an error), but you won't need to worry about this yet. We will cover data types and operations shortly.

***Dynamically typed*** means that data types do not need to be explicitly stated when declaring variables; they are determined from the assigned value at runtime.
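
The two ideas can be sketched as follows. This is only an illustration: the name `value` is arbitrary, and `try`/`except` is used purely so the example keeps running after the error (don't worry about that syntax yet).

```python
# Dynamic typing: no type is declared, and the same name can hold different types
value = 6
print(type(value))      # <class 'int'>

value = 'six'
print(type(value))      # <class 'str'>

# Strong typing: Python will not silently mix unrelated types
try:
    result = 'six' + 6
except TypeError as error:
    print('TypeError:', error)

# An explicit conversion makes the intent clear
result = 'six' + str(6)
print(result)           # six6
```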

In [None]:
# in some programming languages, you would need to specify a type when declaring a variable
# here is an integer declaration in C#/C++
# int dogAge = 6;
dog_age = 6  # valid Python: no type keyword or semicolon, and a snake_case name

If you uncomment the C#/C++ declaration in the cell above and run it, you'll notice there's an error! Try modifying the code so that it declares an integer variable in valid Python code, and make sure the symbolic name is changed to follow Python naming conventions.

Python also features operators that allow us to perform five main types of operations.

***Arithmetic Operators***

| Operator | Name           | Example |
|:--------:|:--------------:|:-------:|
| +        | Addition       | x + y   |
| -        | Subtraction    | x - y   |
| *        | Multiplication | x * y   |
| /        | Division       | x / y   |
| %        | Modulus        | x % y   |

***Assignment Operators***

| Operator | Example | Equivalent |
|:--------:|:-------:|:----------:|
|     =    |  x = 5  |     N/A    |
|    +=    |  x += 5 |  x = x + 5 |
|    -=    |  x -= 5 |  x = x - 5 |
|    *=    |  x *= 5 |  x = x * 5 |
|    /=    |  x /= 5 |  x = x / 5 |

***Comparison Operators***

| Operator |           Name           | Example |
|:--------:|:------------------------:|:-------:|
|    ==    |         Equal to         |  x == y |
|    !=    |       Not equal to       |  x != y |
|     >    |       Greater than       |  x > y  |
|     <    |         Less than        |  x < y  |
|    >=    | Greater than or equal to |  x >= y |
|    <=    |   Less than or equal to  |  x <= y |

***Logical Operators***

| Operator |                      Description                     |     Example     |
|:--------:|:----------------------------------------------------:|:---------------:|
|    and   |     Evaluates to True if both statements are True    | x < 5 and x > 3 |
|    or    | Evaluates to True if one or both statements are True | x < 5 or x == 7 |
|    not   |      Evaluates to True if the statement is False     |    not(x < 5)   |

***Membership Operators***

| Operator |                              Description                             |   Example  |
|:--------:|:--------------------------------------------------------------------:|:----------:|
|    in    |     Evaluates to True if a sequence contains the specified value     |   x in y   |
|  not in  | Evaluates to True if a sequence does not contain the specified value | x not in y |

Don't worry about remembering all of these; they'll start to come naturally with practice, and we'll go through them in the following section!
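
As a quick taster, here is a minimal sketch that uses one or two operators from each of the five groups in the tables above. The values of `x` and `y` are arbitrary.

```python
x = 7
y = 2

print(x % y)             # arithmetic: 1, the remainder of 7 divided by 2
x += 5                   # assignment: same as x = x + 5, so x is now 12
print(x)                 # 12
print(x >= y)            # comparison: True
print(x < 5 or x == 12)  # logical: True, because the second comparison is True
print(not (x < 5))       # logical: True, because x < 5 is False
print(y in [1, 2, 3])    # membership: True
```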

So now we know that we can use these operators to perform operations on various data types, but what data types exist for us to use in Python? There are quite a few, so let's break it down into groups.

#### Basic Data Types

Python can be thought of as having two categories of data types: a set of basic data types and a set of collection data types.

The four basic types are ***integers***, ***floats***, ***strings***, and ***booleans***.

***Integers*** are used to represent whole numbers and, as you might imagine, can be used to represent a wide variety of things.

In [None]:
age = 23
songs = 173
upvotes = 493
heart_rate = 74

Let's look at how integers interact with the operators we covered. 
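
One detail worth watching for when you run the cell below: `/` always produces a float, even when both values are integers and the result is a whole number. Python also has a floor-division operator, `//`, which is not listed in the tables above and which keeps the result an integer when both values are integers. A minimal sketch, with `total` as an example name:

```python
total = 30

print(total / 5)         # 6.0 (a float)
print(type(total / 5))   # <class 'float'>
print(total // 5)        # 6 (an int)
print(type(total // 5))  # <class 'int'>
```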

In [None]:
# arithmetic (addition, subtraction, multiplication, division)
print(age + songs)
print(age - songs)
print(age * songs)
print(age / songs)

# assignment (=, +=, -=, *=, /=)
new_int = 30
print(new_int)
new_int += 5
print(new_int)
new_int -= 5
print(new_int)
new_int *= 5
print(new_int)
new_int /= 5
print(new_int)

# comparison (==, !=, >, <, >=, <=)
print(new_int == heart_rate)
print(new_int != heart_rate)
print(new_int > heart_rate)
print(new_int < heart_rate)
print(new_int >= heart_rate)
print(new_int <= heart_rate)

# logical (and, or)
print(age > 18 and songs > 150)
print(songs > 150 or upvotes > 500)

# membership (in, not in)
print(age in [16, 18, 21, 23])
print(heart_rate not in [40, 50, 60, 70, 80, 90, 100])

***Floats*** are used to represent decimal (or floating-point) numbers.
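
A caution when comparing floats with `==`: most decimal fractions cannot be stored exactly, so tiny rounding errors can make two values that look equal compare as different. The sketch below shows the effect and one common way around it, `math.isclose` from the standard library, which compares within a small tolerance.

```python
import math

total = 0.1 + 0.2

print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True
```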

In [None]:
avg_age = 24.8
coord_x = 38.693
cube_vol = 87.4548
temperature = 98.6

Let's look at how floats interact with the operators we covered. In the code cell below, write one statement for each type of operator discussed above (use the code comment for ease). Which operators are not compatible with floats? Make note of them with a comment (you can also place comments on the same line as code, so long as the comment is after the code).

In [None]:
# arithmetic (addition, subtraction, multiplication, division)
print(avg_age + coord_x)
print(avg_age - coord_x)
print(avg_age * coord_x)
print(avg_age / coord_x)

# assignment (=, +=, -=, *=, /=)
new_age = 30.0
print(new_age)
new_age += 5.0
print(new_age)
new_age -= 5.0
print(new_age)
new_age *= 5.0
print(new_age)
new_age /= 5.0
print(new_age)

# comparison (==, !=, >, <, >=, <=)
print(new_age == avg_age)
print(new_age != avg_age)
print(new_age > avg_age)
print(new_age < avg_age)
print(new_age >= avg_age)
print(new_age <= avg_age)

# logical (and, or)
print(avg_age > 18 and new_age > 18)
print(avg_age > 18 or new_age > 18)

# membership (in, not in)
print(avg_age in [16, 18, 21, 25])
print(avg_age not in [16, 18, 21, 25])

***Strings*** are used to represent text and can be thought of as a sequence of characters. Some programming languages have a data type specifically for individual characters; Python does not.
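
To see what "a sequence of characters" means in practice, here is a minimal sketch. It uses square-bracket indexing, which has not been introduced yet, to pick out one character and show that it is itself a string.

```python
name = 'Steven'

first_letter = name[0]      # positions are counted from 0
print(first_letter)         # S
print(type(first_letter))   # <class 'str'>: a single character is still a string
print(len(name))            # 6
```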

In [59]:
name = 'Steven'
full_name = 'Steven Smith'

Let's look at how strings interact with the operators we covered. In the code cell below, write one statement for each type of operator discussed above (use the code comment for ease). Which operators are not compatible with strings? Make note of them with a comment (you can also place comments on the same line as code, so long as the comment is after the code).

In [None]:
# arithmetic (addition, subtraction, multiplication, division)
name + full_name
# name - full_name # subtraction is not supported
name * 2
# name / 2 # division is not supported

# assignment (=, +=, -=, *=, /=)
new_name = 'Eddie'
new_name += ' Smith'
# new_name -= ' Smith' # -= is not supported
new_name *= 2 # *= with an integer repeats the string
# new_name /= ' Smith' # /= is not supported

# comparison (==, !=, >, <, >=, <=)
name == 'Steven'
name != 'Eddie'
name > 'Eddie' # two strings are compared character by character
name < 'Eddie'
name >= 'Eddie'
name <= 'Eddie'
# name > 5 # comparing a string with a number using >, <, >= or <= is not supported

# membership (in, not in)
'Ste' in name
'Ed' not in name

***Booleans*** are used when we want to represent a binary outcome, in other words an outcome that has two possible results. These are often used, for example, to represent true/false or yes/no.
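
In Python, booleans behave like the integers 1 and 0 in arithmetic, which is worth knowing before you try the operators on them. A minimal sketch, where the list `passed` is a made-up example:

```python
print(True == 1)              # True
print(False == 0)             # True
print(isinstance(True, int))  # True: bool is a subtype of int

# A practical consequence: adding up booleans counts how many are True
passed = [True, False, True, True]
print(sum(passed))            # 3
```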

In [89]:
is_adult = True
is_cold = False

Let's look at how booleans interact with the operators we covered. In the code cell below, write one statement for each type of operator discussed above (use the code comment for ease). Which operators are not compatible with booleans? Make note of them with a comment (you can also place comments on the same line as code, so long as the comment is after the code).

In [None]:
# note: boolean values are evaluated as integers, so a surprising number of these work!
# arithmetic (addition, subtraction, multiplication, division)
is_adult + is_cold
is_adult - is_cold
is_adult * 2
is_adult / 2

# assignment (=, +=, -=, *=, /=)
is_available = True
is_available += 1
is_available -= 1
is_available *= 2
is_available /= 2

# comparison (==, !=, >, <, >=, <=)
is_adult == is_cold
is_adult != is_cold
is_adult > 1
is_adult < 1
is_adult >= 1
is_adult <= 1

# membership (in, not in)
# is_adult in is_cold # in is not supported
# is_adult not in is_cold # not in is not supported

#### Collection Data Types

There are also four data types that are used to hold ***collections*** of other values/variables. Other than syntax, their differences lie in three properties: whether the collection is ***ordered***; whether the collection ***allows duplicates***; and whether the collection is ***changeable***.

***Tuples*** are the first of the four data types for storing collections. Tuples are ***ordered*** and ***allow duplicates*** but are ***not changeable***.
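
A minimal sketch of what these three properties mean for a tuple. The name `point` is just an example, and `try`/`except` is used only so the example keeps running after the error.

```python
point = (3, 4, 4)      # duplicates are allowed

print(point[0])        # 3: order is kept, so items can be read by position

try:
    point[0] = 10      # attempting to change an item
except TypeError as error:
    print('TypeError:', error)

print(point)           # (3, 4, 4): the tuple is unchanged
```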

In [68]:
fruit = ('Apple', 'Banana', 'Cherry')
stuff = ('Banana', 4.3, 19, True)
more_stuff = ('Cherry', 6.4, 852, False, 'Cherry')

Let's look at how tuples interact with the operators we covered.

In the code cell below, declare a tuple variable (try to declare a tuple with various data types contained within, bonus points if your tuple contains another collection data type!), then write one statement for each type of operator discussed above (use the code comment for ease). Which operators are not compatible with tuples? Make note of them with a comment (you can also place comments on the same line as code, so long as the comment is after the code).

In [None]:
# declare a tuple, bonus points if your tuple contains another collection data type!
items = ('dog', 0.34, 14, True)

# arithmetic (addition, subtraction, multiplication, division)
fruit + items
# fruit - items # subtraction is not supported
fruit * 2
# fruit / 2 # division is not supported

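# assignment (=, +=, -=, *=, /=)
more_items = ('cat', 0.66, 86, False) # round brackets make a tuple (curly brackets would make a set)
more_items += ('dog',) # += works: it builds a new, longer tuple and reassigns the name
# more_items -= ('cat',) # -= is not supported
more_items *= 2 # *= works: it repeats the items
# more_items /= 2 # /= is not supported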

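# comparison (==, !=, >, <, >=, <=)
items == ('dog', 0.34, 14, True)
items != ('dog', 0.34, 14, True)
items > ('cat', 0.66, 86, False) # two tuples are compared item by item; 'dog' > 'cat' decides here
items < ('cat', 0.66, 86, False)
items >= ('cat', 0.66, 86, False)
items <= ('cat', 0.66, 86, False)
# items > 5 # comparing a tuple with a number using >, <, >= or <= is not supported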

# membership (in, not in)
'dog' in items
'cat' not in items

***Lists*** are the second of the four data types for storing collections. Lists are ***ordered***, ***allow duplicates*** and are ***changeable***.
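
A minimal sketch of what "changeable" means for a list. The name `scores` is just an example, and `append` is a list method not covered in this section.

```python
scores = [70, 85, 85]      # duplicates are allowed

scores[0] = 75             # change an existing item
scores.append(90)          # add a new item to the end

print(scores)              # [75, 85, 85, 90]
```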

In [19]:
fruit = ['Apple', 'Banana', 'Cherry']
stuff = ['Banana', 4.3, 19, True]
more_stuff = ['Cherry', 67.4, 852, False, 'Cherry']

Let's look at how lists interact with the operators we covered.

In the code cell below, declare a list variable (try to declare a list with various data types contained within, bonus points if your list contains another collection data type!), then write one statement for each type of operator discussed above (use the code comment for ease). Which operators are not compatible with lists? Make note of them with a comment (you can also place comments on the same line as code, so long as the comment is after the code).

In [None]:
# declare a list, bonus points if your list contains another collection data type!
items = ['dog', 0.34, 14, True]

# arithmetic (addition, subtraction, multiplication, division)
items + ['cat']
# items - 'dog' # subtraction is not supported
items * 2
# items / 2 # division is not supported

# assignment (=, +=, -=, *=, /=)
more_items = ['cat', 0.66, 86, False]
more_items += ['dog']
# more_items -= ['cat'] # -= is not supported
more_items *= 2
# more_items /= 2 # /= is not supported

# comparison (==, !=, >, <, >=, <=)
items == more_items
items != more_items
items > more_items
items < more_items
items >= more_items
items <= more_items

# membership (in, not in)
'dog' in items
'cat' not in items

***Sets*** are the third of the four data types for storing collections. Sets are ***not ordered*** and ***don't allow duplicates***. Individual items are ***not changeable*** in place, although items can be added to and removed from a set.
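
A minimal sketch of these properties. Note that although an item in a set cannot be modified in place (and there is no position to index by), the set itself can grow and shrink. The name `colours` is just an example; `add` and `discard` are set methods not covered in this section.

```python
colours = {'red', 'green', 'red'}   # the duplicate 'red' is dropped

print(len(colours))        # 2

colours.add('blue')        # items can be added...
colours.discard('green')   # ...and removed

print('blue' in colours)   # True
print('green' in colours)  # False
```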

In [2]:
fruit = {'Apple', 'Banana', 'Cherry'}
stuff = {'Banana', 4.3, 19, True}

Let's look at how sets interact with the operators we covered.

In the code cell below, declare a set variable (try to declare a set with various data types contained within, bonus points if your set contains another collection data type!), then write one statement for each type of operator discussed above (use the code comment for ease). Which operators are not compatible with sets? Make note of them with a comment (you can also place comments on the same line as code, so long as the comment is after the code).

In [None]:
# declare a set, bonus points if your set contains another collection data type!
items = {'dog', 0.34, 14, True}

# arithmetic (addition, subtraction, multiplication, division)
# fruit + items # addition is not supported
fruit - items
# fruit * items # multiplication is not supported
# fruit / items # division is not supported

# assignment (=, +=, -=, *=, /=)
more_items = {'cat', 0.66, 86, False}
# more_items += {'dog'} # += is not supported
more_items -= {'cat'} # -= works with another set: it removes those items
# more_items *= 2 # *= is not supported
# more_items /= 2 # /= is not supported

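# comparison (==, !=, >, <, >=, <=)
items == {'dog', 0.34, 14, True}
items != {'dog', 0.34, 14, True}
items > {'dog'} # between two sets, > asks "is this a proper superset?"
items < {'dog'} # < asks "is this a proper subset?"
items >= {'dog'} # superset (or equal)
items <= {'dog'} # subset (or equal)
# items > 5 # comparing a set with a number using >, <, >= or <= is not supported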

# membership (in, not in)
'dog' in items
'cat' not in items

***Dictionaries*** are the fourth of the four data types for storing collections. Dictionaries store key-value pairs; they are ***ordered***, don't ***allow duplicate*** keys, and are ***changeable***.

*Dictionaries historically were not ordered, but this was changed in Python 3.7.*
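
A minimal sketch of these properties, using a made-up `stock` dictionary: assigning to an existing key replaces its value rather than creating a duplicate, and new keys keep the order in which they were added.

```python
stock = {'Apple': 10, 'Cherry': 4}

stock['Apple'] = 12        # existing key: the value is replaced
stock['Banana'] = 7        # new key: added at the end

print(stock)               # {'Apple': 12, 'Cherry': 4, 'Banana': 7}
print(list(stock))         # ['Apple', 'Cherry', 'Banana']
print(len(stock))          # 3
```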

In [None]:
prices = {'Apple': 0.85, 'Cherry': 1.25}
prices['Cherry'] = 0.95  # change the value stored under an existing key

Let's look at how dictionaries interact with the operators we covered.

In the code cell below, declare a dictionary variable (try to declare a dictionary with various data types contained within, bonus points if your dictionary contains another collection data type!), then write one statement for each type of operator discussed above (use the code comment for ease). Which operators are not compatible with dictionaries? Make note of them with a comment (you can also place comments on the same line as code, so long as the comment is after the code).

In [None]:
# declare a dictionary, bonus points if your dictionary contains another collection data type!
my_dict = {'list': ['cheese', 1.4, 23], 'num': 42, 'bool': True} # named my_dict so it does not hide Python's built-in dict

# arithmetic (addition, subtraction, multiplication, division)
# my_dict + prices # addition is not supported
# my_dict - prices # subtraction is not supported
# my_dict * prices # multiplication is not supported
# my_dict / prices # division is not supported

# assignment (=, +=, -=, *=, /=)
# my_dict += 'hi' # += is not supported
# my_dict -= 'hi' # -= is not supported
# my_dict *= 'hi' # *= is not supported
# my_dict /= 'hi' # /= is not supported

# comparison (==, !=, >, <, >=, <=)
print(my_dict == prices)
print(my_dict != prices)
# print(my_dict > prices) # > is not supported
# print(my_dict < prices) # < is not supported
# print(my_dict >= prices) # >= is not supported
# print(my_dict <= prices) # <= is not supported

# membership (in, not in): these check the keys, not the values
print('list' in my_dict)
print('potato' not in my_dict)

For your final task - in the code cell below - try to combine the knowledge you've gained so far to write a small piece of code that creates three variables representing the current temperature, whether it is sunny or not, and yesterday's probability of rain. Your code needs to determine the probability of rain today based on this data. If it is sunny, the probability of rain is fixed at 5%. If it is not sunny, today's probability is 20% higher than yesterday's if the temperature is above 20 °C, and 45% lower than yesterday's if the temperature is 20 °C or below.
Add appropriate comments to your code (you can also add inline comments).

In [None]:
## CASE 1 ##
# Create your variables
current_temperature = 21
sunny = True
yesterday_p_rain = 50

# Determine today's probability of rain
today_p_rain = 5 # 5% chance of rain

In [None]:
## CASE 2 ##
current_temperature = 21
sunny = False
yesterday_p_rain = 50

# Determine today's probability of rain
today_p_rain = yesterday_p_rain + (yesterday_p_rain * 0.2) # increase by 20%

In [None]:
## CASE 3 ##
current_temperature = 19
sunny = False
yesterday_p_rain = 50

# Determine today's probability of rain
today_p_rain = yesterday_p_rain - (yesterday_p_rain * 0.45) # decrease by 45%

In [16]:
## Putting all together ##
current_temperature = 21
sunny = False
yesterday_p_rain = 20

# Part 1: Determine whether the temperature is higher than 20 degrees
temp_higher = (current_temperature > 20)

# Part 2: Determine today's probability of rain in case temp_higher is True or False
today_p_rain_true = yesterday_p_rain + (yesterday_p_rain * 0.2)
today_p_rain_false = yesterday_p_rain - (yesterday_p_rain * 0.45)

# Part 3: Combine everything 
today_p_rain = sunny * 5 + (1 - sunny) * (temp_higher * today_p_rain_true + (not temp_higher) * today_p_rain_false) # remember that True evaluates to 1 and False to 0

print("Result:", today_p_rain)


# We can also combine all the parts into a single expression
today_p_rain = sunny * 5 + (1 - sunny) * ((current_temperature > 20) * (yesterday_p_rain + (yesterday_p_rain * 0.2)) + (current_temperature <= 20) * (yesterday_p_rain - (yesterday_p_rain * 0.45)))
# NOTE: the expression above is quite complex and rarely written this way in practice, but it shows how you can combine different data types and operators in a single expression.

print("Result:", today_p_rain)


Result: 24.0
Result: 24.0


**Bonus task**: in the code cell below, adapt the previous code so that the data - the temperature, whether it's sunny or not, and the probabilities of rain - is stored in a dictionary rather than in individual variables, and perform the operations using this collection.

In [19]:
# Create a dictionary for the variables
data = {
    'current_temperature': 21,
    'sunny': False,
    'yesterday_p_rain': 20
}

# Part 1: Determine whether the temperature is higher than 20 degrees
temp_higher = (data['current_temperature'] > 20)

# Part 2: Determine today's probability of rain in case temp_higher is True or False
today_p_rain_true = data['yesterday_p_rain'] + (data['yesterday_p_rain'] * 0.2)
today_p_rain_false = data['yesterday_p_rain'] - (data['yesterday_p_rain'] * 0.45)

# Part 3: Combine everything 
data['today_p_rain'] = data['sunny'] * 5 + (1 - data['sunny']) * (temp_higher * today_p_rain_true + (not temp_higher) * today_p_rain_false)

print("Result:", data['today_p_rain'])

Result: 24.0
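
For comparison, here is a sketch of the same logic written with `if`/`elif`/`else` statements, which this workshop has not introduced yet. It assumes the same example values as the cells above and is only a preview of a more readable style; the task itself can be solved with the operators covered so far.

```python
current_temperature = 21
sunny = False
yesterday_p_rain = 20

if sunny:
    # fixed at 5% when it is sunny
    today_p_rain = 5
elif current_temperature > 20:
    # not sunny and warmer than 20 degrees: 20% higher than yesterday
    today_p_rain = yesterday_p_rain + (yesterday_p_rain * 0.2)
else:
    # not sunny and 20 degrees or cooler: 45% lower than yesterday
    today_p_rain = yesterday_p_rain - (yesterday_p_rain * 0.45)

print("Result:", today_p_rain)   # Result: 24.0
```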
