# Working with Data: Introduction to Python
<font color=indigo>Live Workshop</font>

---

* This document is a technical brainstorm (like whiteboarding with code). 
    * I'll give you this and any other assets over the break. 
    * No need to follow along live.
* If you don't have access to a Jupyter installation, you can try the following link:
    * https://jupyterlite.readthedocs.io/en/latest/_static/retro/notebooks/index.html
    * Press `File > New`
    * **this may take a few minutes to load**
---
 

## Lesson Plan

* Introductions (15 min)
* Part 0: Review (30 min)
    * Python Warm Up (15 min)
    * Individual Challenge (15 min)
* BREAK & ADMIN
* Part 1: Basic Syntax (45 min)
    * Review (20 min)
    * Individual Challenges (15 min)
* Part 2: Looping (45 min)
    * Review (20 min)
    * Group Challenges (25 min)
* BREAK
* Part 3: Advanced Syntax (45 min)
    * Review (20 min)
    * Individual Challenges (25 min)
* Review & End (5 min)

## Introductions
* Name
* Background/Role
* Prior Experience
* Hobby

---

## Part 0: Review

### Variables & Operations

In [2]:
name = "Michael Burgess"
age = 33
height = 1.81
is_tired = True

What's the type of this data?

In [3]:
type(name), type(age), type(height), type(is_tired)

(str, int, float, bool)

### Printing

In [4]:
print("Hello my name is Michael")

Hello my name is Michael


**Aside**: You can include the value of variables using `{}` (braces) if you prefix the *string* with the letter `f`,

In [5]:
print(f"Hello my name is {name} and my age is {age}")

Hello my name is Michael Burgess and my age is 33


...`f` for *format*

### Data Structures & Conversions

In [9]:
hobbies = ["Sailing", "Ceramics", "Arguing"]

What's the type of `hobbies`?

In [33]:
type(hobbies)

list

The first element of `hobbies` is a string,

In [8]:
hobbies[0]

'Sailing'

The *indexes* of a list are (somewhat like) page numbers, in that they identify the position of some data, 

In [16]:
page = 0

In [17]:
print(hobbies[page])

Sailing


In [15]:
print(hobbies[0], hobbies[1])

Sailing Ceramics


---

#### Aside
Python has other complex data structures. Eg., consider a *dictionary* which relates *keys* to *values*,

In [11]:
register = {
    # KEY : VALUE
    'Jaz': True,
    'Sara': True,
    'Sherlock': False
}

In [12]:
type(hobbies), type(register)

(list, dict)

A dictionary allows you to use a named index (rather than just a *sequential* index, ie., a counted one), 

In [18]:
register['Jaz']

True
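
A minimal sketch of a few other common dictionary operations, reusing the `register` example above. The name `'Mycroft'` is made up purely for illustration.

```python
register = {'Jaz': True, 'Sara': True, 'Sherlock': False}

# Look up a value by its key
print(register['Sherlock'])       # False

# Add a new key, or overwrite an existing one, by assignment
register['Mycroft'] = True        # hypothetical extra attendee
register['Sherlock'] = True

# Test whether a key is present before looking it up
print('Jaz' in register)          # True
print(len(register))              # 4
```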

---

### Importing

In [21]:
import statistics

How would I run the `mean` function *inside* the `statistics` module?

In [23]:
statistics.mean # the name of the function

<function statistics.mean(data)>

Running the function,

In [26]:
statistics.mean([2, 4, 7]) # to run the function we write () parentheses after the name

4.333333333333333

### Summarizing Functions

Which built-in functions (and functions from the `statistics` module) would help me describe a dataset?

* `len`
* `sum`
* `statistics.mean`
* `statistics.median`
* `statistics.mode`
* `statistics.stdev`

There are `6` elements in the following list,

In [30]:
cities = [0, 2, 2, 2, 2, 4]

len(cities)

6
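
As a rough sketch, here are all six summary functions applied to the same list. The variable name `data` is just a placeholder, and the standard deviation is rounded only to keep the printout short.

```python
import statistics

data = [0, 2, 2, 2, 2, 4]

print("count :", len(data))
print("total :", sum(data))
print("mean  :", statistics.mean(data))
print("median:", statistics.median(data))
print("mode  :", statistics.mode(data))
print("stdev :", round(statistics.stdev(data), 2))
```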

---

### Challenges (<= 15min)

#### 0. Summarizing Data 

* Create a new notebook in your jupyter application
* Consider a problem of your own choosing
    * eg., in retail, finance, health, ....
    * consider a numerical dataset (ie., a quantitative variable = float)
        * define a variable which is a list of numbers with **7** entries
    * consider a categorical dataset (ie., a qualitative variable = string)
        * define a variable which is a list of strings with **7** entries
* Run the above summary functions on your lists
    * STRETCH: neatly print out the results

In [111]:
# SOLUTION

    
#### Solution 0. Summarizing Data

In [91]:
caffeine = [300, 600, 100, 200, 150]
sleep = ["GOOD", "BAD", "EXCELLENT", "GOOD", "OK"]

print("CAFFEINE REPORT")
print("No. Entries: ", len(caffeine))
print("Total: ", sum(caffeine))
print("Mean: ", statistics.mean(caffeine))
print("Mode: ", statistics.mode(caffeine))
print("Median: ", statistics.median(caffeine))
print() # an empty line

print("SLEEP REPORT")
print("No. Entries: ", len(sleep))
print("Mode: ", statistics.mode(sleep))
print("Median: ", statistics.median(sleep)) # for text, this is the middle entry in alphabetical order


CAFFEINE REPORT
No. Entries:  5
Total:  1350
Mean:  270
Mode:  300
Median:  200

SLEEP REPORT
No. Entries:  5
Mode:  GOOD
Median:  GOOD


---

## Part 1: Basic Syntax

### Lists

In [44]:
prices = [1.20, 2.40, 5.5]

Indexes start at zero and count upwards (start page: 0),

In [47]:
prices[0], prices[1], prices[2]

(1.2, 2.4, 5.5)

We can also count backwards from the end of a list, *starting* at `-1` (end page: 2, also: -1),

In [49]:
prices[-1], prices[-2]

(5.5, 2.4)
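
A small sketch of how the two numbering schemes line up: a negative index refers to the same element as a position counted from `len(prices)`. The helper variable `last` is introduced here only for illustration.

```python
prices = [1.20, 2.40, 5.5]

last = len(prices) - 1                  # index of the final element: 2

print(prices[-1] == prices[last])       # True: both are 5.5
print(prices[-2] == prices[last - 1])   # True: both are 2.4
print(prices[-3] == prices[0])          # True: both are 1.2
```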

How do you add elements to a list?

In [70]:
prices.append(6.55)

In [51]:
prices

[1.2, 2.4, 5.5, 6.55]

How do you add an element *at a position*?

In [52]:
prices.insert(0, 0.51)

In [53]:
prices

[0.51, 1.2, 2.4, 5.5, 6.55]

Recall `sleep`,

In [54]:
sleep

['GOOD', 'BAD', 'EXCELLENT', 'GOOD', 'OK']

In [56]:
sleep.count("GOOD")

2

In the syntax `o.m()` we refer to the dataset `o` as an *object* and the operation which is applied to the data, a *method*. 
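
A brief sketch of the `o.m()` pattern on two different kinds of object. The variables `greeting` and `numbers` are made up for this example; `.upper()`, `.index()` and `.sort()` are standard string and list methods.

```python
greeting = "hello"
numbers = [3, 1, 2]

# object.method(): the method runs on the data held by the object
shout = greeting.upper()      # strings have an .upper() method
position = numbers.index(1)   # lists have an .index() method

numbers.sort()                # some methods change the object in place

print(shout, position, numbers)
```

Note that `.upper()` hands back a new string, whereas `.sort()` rearranges the existing list.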

### Syntax Errors

Learning python takes years.

* learning to program (years)
    * <font color=indigo>**learning syntax**</font> (important early skill)
        * adjusting your *reading* to technical code 
        * not like mathematics, or literature -- both ambiguous
        * highly precise
    * learning algorithms
        * and their behaviour
    * learning phrasing
* learning python
    * specifics of Python: syntax, ....


##### Equality & Assignment

In mathematics: $x = y$ is the same as $y = x$

In Python, `=` means *assignment*: the name on the *LEFT* receives the value on the *RIGHT*, so the two sides cannot be swapped,

In [66]:
name = "Michael"

In [67]:
"Michael" = name

SyntaxError: cannot assign to literal (2134918280.py, line 1)

* What is the symbol for equality?

In [68]:
5 == (1 + 4)

True

In [69]:
(1 + 4) == 5

True

##### Quotes

Quotes need to be exact,

In [57]:
" Hello   '

SyntaxError: EOL while scanning string literal (3403272755.py, line 1)

##### Brackets

In [59]:
prices(0)

TypeError: 'list' object is not callable

Rather than *parentheses*, I should use..

In [60]:
prices[0]

0.51

* `()` parentheses
* `[]` square brackets
* `{}` braces
* double quote, single quote, colon, newline, ... 


The machine will not "infer" what you mean; you have to say it *exactly*. 

---

##### Aside: Examples of Brackets in Python

Running a function,

In [61]:
print("Running a function")

Running a function


Grouping mathematical operations,

In [62]:
(2 * 3) + 10

16

*Tuples* are like lists and are usually written in parentheses. They do most of what lists do, *except* they can't be modified,

In [63]:
pair = (1, 2)
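
A short sketch of what "can't be modified" means in practice. It borrows `try`/`except` (covered under Error Handling in Part 3) only so the cell keeps running after the error.

```python
pair = (1, 2)
print(pair[0])          # reading by index works, as with a list

try:
    pair[0] = 99        # attempting to change an element
except TypeError as error:
    print("Not allowed:", error)

print(pair)             # the tuple is unchanged
```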

In Python, square brackets written after a name are indexes (on their own they create a list, as in `hobbies = [...]`),

In [64]:
hobbies[0]

'Sailing'

There are two uses of braces (dictionaries, and sets),

In [65]:
eg = {"Ham": 10, "Bread": 3}

people = {"Alice", "Eve", "Bob"}
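
A quick sketch of what makes a set different from a list: it keeps only unique entries. The `visitors` list is invented for illustration.

```python
visitors = ["Alice", "Eve", "Alice", "Bob", "Eve"]

unique_visitors = set(visitors)   # duplicates are dropped

print(len(visitors))              # 5
print(len(unique_visitors))       # 3
print("Bob" in unique_visitors)   # True

empty_dict = {}                   # empty braces make a dictionary...
empty_set = set()                 # ...so an empty set needs set()
print(type(empty_dict), type(empty_set))
```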

----

### Comparisons & Tests

In [72]:
print(name, age, height, is_tired)

Michael 33 1.81 True


What would I write to determine if I were an adult?

In [75]:
age >= 18 # operator = symbol

True

How would I determine if my name starts with an `M` ?

In [76]:
name.startswith("M") # method = name of an operation

True

How would I determine if my age is an odd number?

In [78]:
age % 2 # The remainder when age is divided by 2

1

In [79]:
age % 2 == 1

True

We usually write whether `age % 2` *is not* `0`, ie., is not even,

In [80]:
age % 2 != 0

True

### Logical Combinations of Comparisons (Tests)

We can ask for *both* conditions to be true,

In [98]:
(age >= 18) and (age % 2 == 0) # both that age is >=18    AND    that age is even

False

Or either,

In [99]:
(age >= 18) or (age % 2 == 0) # either that age is >=18    OR    that age is even

True

Or that they aren't true,

In [101]:
not ((age >= 18) and (age % 2 == 0)) # not that both...

True
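
One way to read the `not (... and ...)` test: it gives the same answer as saying that at least one of the two conditions fails. A small sketch, with the intermediate names `is_adult` and `is_even` added only for readability,

```python
age = 33

is_adult = age >= 18        # True
is_even = age % 2 == 0      # False

# "not (A and B)" is the same as "(not A) or (not B)"
print(not (is_adult and is_even))
print((not is_adult) or (not is_even))
```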

### Decision-Making

```python
if ...:
    action
```

What are the other keywords I can use with `if`?

```python
if ...:
    ...
elif ...:
    ...
else:
    ...
```

Suppose I want to perform some number of actions based on some tests. Eg., suppose 4 different actions. 

```python

if ...:
    ACTION1
elif ...:
    ACTION2
elif ...:
    ACTION3
else:
    ACTION4
```

Only one group of statements is performed. Statements are *grouped* using indentation (*whitespace*),

In [90]:
# What action will be performed?
age = 81

if (age >= 18) and (age <= 80):
    # What age am I here?
    print("I am at least 18")
    print("I am at least 18")
    print("I am at least 18")
    # these three statements run together (when the test passes) because they are indented to the same level
elif age > 80:
    # What age am I here?
    print("I am at least 81")
else:
    # What age am I here?
    print("I am less than 18")


I am at least 81
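
Because only the first passing test runs, the order of the tests matters. A minimal sketch, using a made-up `score` and `grade`, where the second test can never be reached,

```python
score = 95

# Tests are tried top to bottom; only the first one that passes runs
if score >= 50:
    grade = "pass"
elif score >= 90:
    grade = "distinction"   # never reached: 95 already passed the test above
else:
    grade = "fail"

print(grade)   # pass
```

Putting the most specific test (`score >= 90`) first would fix this.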


Note, `else` and other clauses are optional,

In [165]:
print("Hello people!")
if age >= 18:
    print("DISPENSE drink!")
print("Bye people!")

Hello people!
DISPENSE drink!
Bye people!


---

### Challenges (20 min)

* copy the code to your own notebook
    * and complete the challenge

#### Challenge 1a. Syntax Errors

Fix the syntax in the following example,

In [93]:
ages = [12, 41, 61 18)

print(len[ages])
print(ages[-1])

SyntaxError: invalid syntax (924352493.py, line 1)

#### Challenge 1b. Comparisons & Tests: Odd Numbers

In [95]:
prices = [1, 3, 4, 5, 9]

Select the first price,

In [None]:
# SOLUTION

Determine if the price is odd

In [None]:
# SOLUTION

#### Challenge 1c. Comparisons & Tests: Prefixes

In [96]:
address = "London, United Kingdom"

Using `.startswith` and `.endswith`, determine if the address starts with "London" and ends with "Texas",

In [None]:
# SOLUTION

#### Challenge 1d. Decision-Making

* If the address starts with "London" *and* ends with "United Kingdom", print "Welcome to the UK capital!".
* Otherwise if it ends with "Texas", print "Welcome to Kimble!"
* Otherwise, print "Hi, wherever you are!"

In [None]:
# SOLUTION

#### Challenge 1e. STRETCH CHALLENGE

If you finish the above challenges quickly, modify your summary report (completed earlier) and use various tests and decision-making syntax (if, else, etc.) to expand the report. (Eg., test the means, modes, ...)

### Solutions

In [115]:
#1.              COMMA   BRACKET
ages = [12, 41, 61, 18]

#2.    PARENTHESES
print(len(ages))

#3. NO ISSUE
print(ages[-1]) # -1 is an index which refers to the *last* element

4
18


In [95]:
prices = [1, 3, 4, 5, 9]

In [116]:
prices[0]

1

In [120]:
prices[0] % 2 != 0

True

In [117]:
address = "London, United Kingdom"

In [122]:
address.startswith("London")

True

In [123]:
address.endswith("London")

False

In [124]:
address.startswith("London") and address.endswith("Texas")

False

If you get a working if/else, great... the actual solution doesn't matter,

In [126]:
if address.startswith("London")   and    address.endswith("United Kingdom"):
    print("Welcome to the UK")
elif address.startswith("London") and    address.endswith("Texas"):
    print("Welcome to Texas")
else:
    print("Hi, wherever you are!")

Welcome to the UK


---

## Part 2: Looping 

#### Looping

So far we've applied operations to single entries in lists,

In [152]:
cities = ["London, UK", "Manchester, UK", "Paris, FR"]
cities

['London, UK', 'Manchester, UK', 'Paris, FR']

How do I apply an operation to *every* entry? (What's the *keyword*...)

In [153]:
for c in cities:
    print(c)

London, UK
Manchester, UK
Paris, FR


Recall, indentation is important,

In [154]:
print("BEFORE")
for city in cities:
    print(city)
    print(city)
print("AFTER")

BEFORE
London, UK
London, UK
Manchester, UK
Manchester, UK
Paris, FR
Paris, FR
AFTER


How do you *read* this syntax? What does it do?

We could manually write it out,

In [155]:
# SET the variable p to be the first city
p = cities[0]
print(p) # REPEAT

# then the second
p = cities[1]
print(p)  # REPEAT

# until the last
p = cities[2]
print(p) # REPEAT


London, UK
Manchester, UK
Paris, FR


In [156]:
#   c = cities[0] ... cities[2]
for c in cities:
    print(c)

London, UK
Manchester, UK
Paris, FR


#### Looping & Filtering

How do I filter elements of a dataset as I loop over them?

In [158]:
for city in cities:
    if city.endswith("UK"):
        print(city)

London, UK
Manchester, UK


### Further Examples of Looping

In [160]:
ratings = [4, 7, 7, 8, 9]

Suppose I want to count the number of elements above `5`, 

In [162]:
count = 0

for r in ratings:
    if r > 5:
        # SET count =  old-count + 1
        count       =  count     + 1

print(count)

4
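
The same count can be written with the `+=` operator, which is shorthand for "add to the existing value". A minimal sketch,

```python
ratings = [4, 7, 7, 8, 9]

count = 0
for r in ratings:
    if r > 5:
        count += 1    # shorthand for: count = count + 1

print(count)   # 4
```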


---

### Challenges (20 min)

#### Challenge 2a. Looping: Even Numbers

* define `dinner_guests` in which each entry represents a group arriving for dinner at a restaurant
* print() each number which is even

In [107]:
# SOLUTION

#### Challenge 2b. Looping: Averages
* compute the total of all the numbers using a loop
* report the total and the mean after the loop
    * divide the total by the `len()` of the list to get the mean


In [108]:
# SOLUTION

#### Challenge 2c. Looping: Filtering
* compute a total *of just* the odd numbers
* compute a total *of just* the even numbers
* report their averages separately

In [109]:
# SOLUTION

#### Challenge 2d. STRETCH: Looping: Maximums & Minimums

If you get the above challenges done quickly, find the maximum and minimum group size using a loop (ie., not using `min`, `max`, etc.).

HINT: You'll need a variable for each defined before the loop (eg., `min_entry`, `max_entry`) and these should be initialized to a useful starting value...

In [110]:
# SOLUTION

---

## Part 3: Advanced Syntax

#### Functions

#### Error Handling

#### Comprehensions

### Challenges (20 min)

#### Challenge 3a: Functions

* define a summary function 
    * which requires two *arguments*: numerical, categorical
    * and prints a statistical summary of both

#### Challenge 3b: Error Handling: Conversions

* modify your above code to `try` to perform the summary functions which are likely to fail
    * eg., `mode`
    * rather than fail, just print "could not perform" that action
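
A rough sketch of the `try`/`except` pattern, assuming the failing action is taking a mean of text. Which functions fail, and with which error, can vary between Python versions, so both `TypeError` and `statistics.StatisticsError` are caught here.

```python
import statistics

sleep = ["GOOD", "BAD", "EXCELLENT", "GOOD", "OK"]

try:
    print("Mean:", statistics.mean(sleep))   # a mean of text is not defined
except (TypeError, statistics.StatisticsError):
    print("Mean: could not perform")

print("Mode:", statistics.mode(sleep))       # mode works on text
```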

#### Challenge 3c: Comprehensions

Consider the following comprehension, which computes a predictor variable $y$ from each $x$ in the range $0 \text{ to } 9$ (`range(0, 10)` stops before $10$),

In [103]:
y = [2 * x + 1 for x in range(0, 10)]
y

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

Rather than using `range(0,10)`, use your own dataset,

* define a variable called, eg., `heart_rates`, to be a numerical list
* replace `range(0, 10)` in the example with your variable (ie., `heart_rates`)

In [106]:
# SOLUTION

Suppose you want to ignore some data in your list based on a test. Consider the following,

In [104]:
y = [2 * x + 1 for x in range(0, 10) if x > 5]
y

[13, 15, 17, 19]
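
It can help to read a filtered comprehension as a compact form of the loop-and-`if` pattern from Part 2. A sketch of the equivalence, where `y_loop` is a name introduced only for the comparison,

```python
# Comprehension form
y = [2 * x + 1 for x in range(0, 10) if x > 5]

# Equivalent loop form
y_loop = []
for x in range(0, 10):
    if x > 5:
        y_loop.append(2 * x + 1)

print(y == y_loop)   # True
```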

Filter your own dataset,
* replace `range(0, 10)` 
* and the test `x>5` 
* to create a `y` more meaningful to your case.

Eg., suppose we ignore HRs `< 30` and `> 200` on the basis that they aren't likely to be real data.

In [105]:
# SOLUTION

---

## Review & End

---

## Appendix

* syntax
    * keywords `for`, `while`
    * operators `+=`
    * variables `name`
    * functions `names()` 
        * these are programmer-given names

#### Dictionaries (Examples)

In [19]:
boats = {
    "Happy": "120ft",
    "Big": "300ft"
}

In [31]:
boats["Big"]

'300ft'

In [20]:
groceries = {
    "Michael": ["Eggs", "Ham"],
    "Alice": ["Eggs", "Ham"],
}

In [32]:
groceries["Michael"][0]

'Eggs'
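
Dictionaries can also be looped over. A small sketch using the `.items()` method, which gives each key together with its value (the loop variable names `person` and `items` are arbitrary),

```python
groceries = {
    "Michael": ["Eggs", "Ham"],
    "Alice": ["Eggs", "Ham"],
}

# .items() yields each key together with its value
for person, items in groceries.items():
    print(person, "is buying", len(items), "items:", items)
```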