# Working with Data: Introduction to Python
<font color=indigo>Live Workshop</font>

---

* This document is a technical brainstorm (like whiteboarding with code). 
    * I'll give you this and any other assets over the break. 
    * No need to follow along live.
* If you don't have access to a jupyter installation, you can try the following link, 
    * https://jupyterlite.readthedocs.io/en/latest/_static/retro/notebooks/index.html
    * Press `File > New`
    * **this may take a few minutes to load**
---
 

## Lesson Plan

* <font color=green>Introductions (20 min)</font>
* <font color=green>Part 0: Review (30 min)</font>
    * Python Warm Up (15 min)
    * <font color=blue>Individual Challenge (15 min)</font>
* <font color=green>Break & Admin (20 min)</font>
    * Our Values, Learning Objectives, Menti Self-Assess, ...
* <font color=green>Part 1: Basic Syntax (40 min)</font>
    * Discussion (20 min)
    * <font color=blue>Individual Challenges (20 min)</font>
* <font color=green>Part 2: Looping (40 min)</font>
    * Discussion (10 min)
    * <font color=blue>Group Challenges (30 min)</font>
* <font color=green>Break (10 min)</font>
* <font color=green>Part 3: Advanced Syntax (40 min)</font>
    * Discussion (20 min)
    * <font color=blue>Individual Challenges (20 min)</font>
* <font color=green>Review & End (>5 min)</font>

## Introductions
* Name
* Background/Role
* Prior Experience
* Expectations
* Hobby

* Michael Burgess
    * michael.burgess@decoded.com
* Head of Technical Solutions
    * recently joined decoded
    * defence, telephony, ... physics
* Arguing, Philosophy...

---

## Part 0: Review

### Variables & Operations

In [5]:
# tag "name" is tagged to the value "Michael Burgess"
name = "Michael Burgess"
age = 33 
height = 1.81
is_tired = True

In [6]:
name

'Michael Burgess'

Q. What symbol *assigns* a value to a variable?

Equals sign... `=`

Q. What's the type of this data?

In [4]:
type(name), type(age), type(height), type(is_tired)

(str, int, float, bool)

### Printing

In [8]:
print(name, age, height, is_tired)

Michael Burgess 33 1.81 True


With an `f` prefix (for *format*) before the quotes, Python *formats* the contents of the string instead of treating them as plain text,

In [10]:
print(f"{name} is {age} years old") # {name} substituted for "Michael Burgess"

Michael Burgess is 33 years old


**Aside**: You can include the value of variables using `{}` (braces) if you prefix the *string* with an `f` symbol,

...`f` for *format*

### Data Structures & Conversions

In [17]:
hobbies = [
    #  0          1            2          3
    "running", "abseiling", "arguing", "..."
    # -4         -3           -2         -1
]

Q. What's the type of `hobbies`?

In [12]:
type(hobbies)

list

Q. What index is first?

In [18]:
hobbies[0]

'running'

**Aside: ... what's the index of the last entry?**

In [19]:
hobbies[3]

'...'

In [20]:
hobbies[-1]

'...'

### Importing

How would I import a module *called* `statistics` ?

Q. What keyword *imports* a module?

In [21]:
import statistics

Q. How would I alias (rename) the module for convenience, e.g., as `st`?

In [22]:
import statistics as st

Q. How would I run the `mean` function ( defined *inside* the `statistics` module)?

In [25]:
from statistics import mean 

In [26]:
mean([1, 2, 3])

2

Suppose that the `statistics` module provides us a *mean* function, 

In [27]:
statistics.mean([1, 2, 3, 4])

2.5

`module.fn_part([1, 2,])`

`object.method([1, 2, 3])`

### Summarizing Functions

Q. What built-in functions would help me describe a data-set?

#### Brainstorm... what functions allow us to describe a dataset?

* without importing we can use...
    * `min`, `max`
    * `len`
    * `sum`
* `statistics`
    * `mean`, `mode`, `median`, `stdev`
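The functions brainstormed above can be tried together on one small list. This is a minimal sketch; the `scores` data is a hypothetical example and not part of the workshop material.

```python
import statistics as st

# Hypothetical example data, made up for illustration
scores = [2, 4, 4, 6, 9]

# Built-in functions: no import needed
smallest = min(scores)
largest = max(scores)
count = len(scores)
total = sum(scores)

# Functions from the statistics module
average = st.mean(scores)
middle = st.median(scores)
most_common = st.mode(scores)
spread = st.stdev(scores)  # sample standard deviation

print(f"min={smallest}, max={largest}, n={count}, sum={total}")
print(f"mean={average}, median={middle}, mode={most_common}, stdev={round(spread, 2)}")
```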

#### Function Calling

Running `len` (operation, function, behaviour... algorithm) *on* `"Michael"`...

In [29]:
len("Michael")

7

... what are the inputs to functions? **Arguments**: this term always means the input to a function, whereas `input` is a bit ambiguous.

---

### Challenges (<= 15min)

#### 0. Summarizing Data 

* Create a new notebook in your jupyter application
* Consider a problem of your own choosing
    * eg., in retail, finance, health, ....
    * consider a numerical dataset (ie., a quantitative variable = float)
        * define a variable which is a list of numbers with **7** entries
    * consider a categorical dataset (ie., a qualitative variable = string)
        * define a variable which is a list of strings with **7** entries
* Run the above summary functions on your lists
    * construct a statistical report using the syntax & ideas above
    * STRETCH: neatly print out the results

In [111]:
# SOLUTION

In [32]:
measures = [1.81, 1.75, 1.6]
labels = ["London", "London", "Leeds"]

print(measures, labels)

[1.81, 1.75, 1.6] ['London', 'London', 'Leeds']


In [33]:
print(measures[-1], labels[0])

1.6 London


In [35]:
temp_list = [15.2, 18.3, 24.3, 26.4, 27.8, 32.2, 35.0]
print(temp_list)


british_sayings = ["meh", "Nice!", "Bit warm!", "blooming lovely!", "Its Hot!", "Its well Hot!","costa del london!"]
print(british_sayings)

print(max(temp_list))
print(british_sayings[6])

[15.2, 18.3, 24.3, 26.4, 27.8, 32.2, 35.0]
['meh', 'Nice!', 'Bit warm!', 'blooming lovely!', 'Its Hot!', 'Its well Hot!', 'costa del london!']
35.0
costa del london!


In [37]:
ages =  [23, 47, 73, 71, 65, 82, 74]
locs = ["London", "Leeds", "Glasgow", "Inverness", "London", "Chichester", "London"]

Eliminate duplicates,

In [42]:
set( locs )

{'Chichester', 'Glasgow', 'Inverness', 'Leeds', 'London'}

In [39]:
locset = set(locs)

In [47]:
print(f"""
run { name }
across  { len(hobbies )}
multiple
lines
""")


run Michael Burgess
across  4
multiple
lines



In [48]:
import statistics as st

print(f"""The youngest person is {min(ages)}.
        \nThe average age is {round(st.mean(ages), 1)}.
        \nThe median age is {st.median(ages)}.
        \nThe oldest person is {max(ages)}.
        \nI've found that the people come from: {locset}.""")

The youngest person is 23.
        
The average age is 62.1.
        
The median age is 71.
        
The oldest person is 82.
        
I've found that the people come from: {'Leeds', 'Inverness', 'London', 'Chichester', 'Glasgow'}.




----
safeguarding@decoded.com

---

## Part 1: Basic Syntax

### Dictionaries

In [51]:
record = {
    "TitleOfBook": "Contents",
    
}
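A dictionary pairs each *key* with a *value*, and a value is looked up by its key using square brackets. The following is a minimal sketch that reuses the `record` above; the `"Author"` key and its value are hypothetical additions for illustration.

```python
record = {
    "TitleOfBook": "Contents",
}

# Look up a value by its key, using square brackets
title = record["TitleOfBook"]

# Add a new key-value pair; "Author" and "Unknown" are made up for illustration
record["Author"] = "Unknown"

print(title)
print(len(record))  # number of key-value pairs
```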

### Lists

Lists are **defined** with square brackets,

In [50]:
names = ["Michael", "Sherlock"] # [  ,  ]

Q. ...where else do we use square brackets?

In [53]:
names[0] #  variable_name[  index   ] 

'Michael'

Q. How do you add elements to a list?

In [56]:
names.extend([
    "Watson", 
    "Irene"
])

In [57]:
names

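['Michael', 'Sherlock', 'Watson', 'Irene', 'Watson', 'Irene']

(`Watson` and `Irene` appear twice because the `extend` cell was run more than once during the session; each run adds the entries again.)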

In [59]:
names.append("Hound")

In [60]:
names

['Michael', 'Sherlock', 'Watson', 'Irene', 'Watson', 'Irene', 'Hound']

Q. How do you add an element *at a position*?

In [67]:
names.insert(0, "Alice")

In [68]:
names

['Alice',
 'Alice',
 'Alice',
 'Alice',
 'Alice',
 'Michael',
 'Sherlock',
 'Watson',
 'Irene',
 'Watson',
 'Irene',
 'Hound']

Q. How do you count the number of entries which match a given value?

In [69]:
names.count("Alice")

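5

(`Alice` appears several times because the `insert` cell was run more than once during the session; each run inserts another `'Alice'` at position 0.)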

In the syntax `o.m()`, we call the data `o` an *object* and the operation `m` which is applied to it a *method*.

---

### Syntax Errors

##### Equality & Assignment

* $ 3 = 1 + 4 $
* $1 + 4 = 3$

Q. What is the symbol for equality?

In [70]:
name = "Michael"

In [71]:
"Michael" = name

SyntaxError: cannot assign to literal (2134918280.py, line 1)

In [72]:
name == "Michael"

True

##### Quotes

In [73]:
"hello"

'hello'

In [74]:
'the same thing'

'the same thing'

In [75]:
"that's all folks!"

"that's all folks!"

In [76]:
# 'that's all folks!'

In [78]:
print(""" 
write 
on multiple
lines
""")

 
write 
on multiple
lines



##### Brackets

* `()` parentheses
* `[]` square brackets
* `{}` braces

When asking for help, we don't run the function (no parentheses after its name); we just pass the name,

In [81]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



In [79]:
len("Michael")

7

Mathematical grouping,

In [83]:
2 + 1*3

5

In [82]:
(2 + 1) * 3

9

### Comparisons & Tests

In [84]:
age = 33

Q. What would I write to determine whether I am an adult?

In [87]:
age >= 18 # symbols

True

Q. How would I determine if my name starts with an `M` ?

In [88]:
name.startswith("M") # method

True

In [89]:
"Michael".endswith("l")

True

Q. How would I determine if my age is an odd number?

In [97]:
age % 2 == 0 #even

False

In [98]:
age % 2 != 0 #odd

True

In [100]:
not ( age % 2 == 0 )

True

### Logical Combinations of Comparisons (Tests)

* calculate
* logic
    * and
    * not 
    * or

We can ask for *both* conditions to be true,

In [103]:
(age < 30) and (name.startswith("M"))

False

Or either,

In [105]:
(age < 30) or (name.startswith("M"))

True

Or that they aren't true,

In [107]:
not ( (age < 30) or (name.startswith("M")) )

False

In [108]:
not (age > 18)

False

### Decision-Making

Q. What are the other keywords I can use to specialize or expand an `if`?

three keywords that enable us to define decision-making...

```python
if test1:
    action1 # GROUP
    action2 # GROUP
    action3 # GROUP
elif test2:
    action4
    action5
elif test3:
    ...
else:
    action99
```

Q. Why, in Python, do we indent code?

Statements are *grouped* using indentation (*whitespace*), and only one group of statements is performed,

Note, `else` and the other clauses are optional,

In [111]:
british_sayings

['meh',
 'Nice!',
 'Bit warm!',
 'blooming lovely!',
 'Its Hot!',
 'Its well Hot!',
 'costa del london!']

In [112]:
locs

['London', 'Leeds', 'Glasgow', 'Inverness', 'London', 'Chichester', 'London']

In [116]:
# british_sayings[0] is "meh", so .endswith("!") is False
# locs[0] is "London", so locs[0] == "London" is True
# False or True  ->  True, so the first group runs
if british_sayings[0].endswith("!") or locs[0] == "London":
    print("AWWWRIGHT M8")
    print("AWWWRIGHT M8")
    print("AWWWRIGHT M8")
elif british_sayings[1].endswith("!"): # True
    print(locs[1]) # second
else:
    print("Hmm, how'dy partner!")

AWWWRIGHT M8
AWWWRIGHT M8
AWWWRIGHT M8


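Q. Using logical keywords only (`not`, `and`, `or`), how could I modify the first condition so that it passes? With `and` the first test fails (`False and True` is `False`); with `or`, as in the cell above, it passes.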

In [117]:
name = # "Michael"

SyntaxError: invalid syntax (35455971.py, line 1)

In [118]:
# on their own line
name = "Michael" # end of line

---

### Challenges 

* copy the code to your own notebook
    * and complete the challenge

#### Challenge 1a. Syntax Errors

Fix the syntax in the following example,

In [93]:
ages = [12, 41, 61 18)

print(len[ages])
print(ages[-1])

SyntaxError: invalid syntax (924352493.py, line 1)

**Solution**

In [None]:
# SOLUTION

In [119]:
ages = [12, 41, 61, 18]

print(len(ages)) # parentheses
print(ages[-1])

4
18


#### Challenge 1b. Comparisons & Tests: Odd Numbers

In [121]:
prices = [1, 3, 4, 5, 9]

Select the first price,

In [122]:
# SOLUTION

In [123]:
prices[0]

1

Determine if the price is odd

In [None]:
# SOLUTION

In [127]:
prices[0] %2 != 0

True

In [125]:
prices[0] %2 == 1

True

#### Challenge 1c. Comparisons & Tests: Prefixes

In [128]:
address = "London, United Kingdom"

Using `.startswith` and `.endswith`, determine if the address starts with "London" and ends with "Texas",

In [129]:
# SOLUTION

In [133]:
address.startswith("London")   and    address.endswith("Texas")

False

#### Challenge 1d. Decision-Making

* If the address starts with "London" *and* ends with "United Kingdom", print "Welcome to the UK capital!".
* Otherwise if it ends with "Texas", print "Welcome to Kimble!"
* Otherwise, print "Hi, wherever you are!"

In [134]:
# SOLUTION

In [135]:
if address.startswith("London") and address.endswith("United Kingdom"):
    print("Welcome to the UK capital!")
elif address.endswith("Texas"):
    print("Welcome to Kimble!")
else:
    print("Hi, wherever you are!")

Welcome to the UK capital!


If you get a working if/else, great... the exact wording of the solution doesn't matter.

#### Challenge 1e. STRETCH CHALLENGE

If you finish the above challenges quickly, modify your summary report (completed earlier) and use various tests and decision-making syntax (if, else, etc.) to expand the report. (Eg., test the means, modes, ...)

---

## Part 2: Looping 

In [141]:
locs

['London', 'Leeds', 'Glasgow', 'Inverness', 'London', 'Chichester', 'London']

#### Looping

Q. How do I apply an operation to *every* entry? (What's the *keyword*...)

In [142]:
for l in locs:
    print(l)

London
Leeds
Glasgow
Inverness
London
Chichester
London


Recall, indentation is important.

### Reading Loops

How do you *read* this syntax? What does it do?

```
`for` each entry, call it `l`, `in` the `locs` list:
    REPEAT:
        print(l)

```

```python
for l in locs:
    print(l)
```

#### Looping & Filtering

Q. How do I filter elements of data in a dataset, as I loop over them?

In [150]:
locs

['London', 'Leeds', 'Glasgow', 'Inverness', 'London', 'Chichester', 'London']

In [153]:
"London".upper()

'LONDON'

In [156]:
"london".upper().startswith("L")

True

In [146]:
for l in locs:
    if l.startswith("L"):
        print(l)

London
Leeds
London
London


In [160]:
(2 * 3) + 7

13

In [161]:
for l in locs:
    if l.lower().startswith("l"):
        print(l)

London
Leeds
London
London


In [164]:
"London".startswith("L")

True

In [166]:
("Michael".lower()).startswith("m")

True

### Further Examples of Looping: Counting
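A loop can also keep a running count. The sketch below is one possible example of counting with a loop and a test, using the `locs` list from earlier; the variable name `count` is an assumption for illustration.

```python
locs = ["London", "Leeds", "Glasgow", "Inverness", "London", "Chichester", "London"]

count = 0  # start the tally at zero, before the loop

for l in locs:
    if l == "London":
        count = count + 1  # add one for each matching entry

print(count)
```

This should agree with the list method `locs.count("London")`.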

---

### Challenges

#### Challenge 2a. Looping: Even Numbers

* define `dinner_guests` in which each entry represents a group arriving for dinner at a restaurant
* print() each number which is even
    * aside: why would a restaurant want to know which groups were even?
    * possible answer: seating, table arrangement, etc.

In [107]:
# SOLUTION

In [137]:
dinner_guests = [2, 3, 4, 5]


# guest = dinner_guests[0]
#  print(guest)

# guest = dinner_guests[1]
#  print(guest)

for guest in dinner_guests:
    print(guest)

2
3
4
5


In [139]:
dinner_guests = [2, 3, 4, 5]

# each time round the loop, guest is assigned (=) the next entry of dinner_guests
for guest in dinner_guests:
    if guest % 2 == 0:
        print(guest)

2
4


In [140]:
for x in [1, 2, 3]:
    print(x)

1
2
3


#### Challenge 2b. Looping: Averages
* compute the total of all the numbers **using a loop**
* report the total and the mean after the loop
    * using `len()` with the total


In [175]:
total = 0

for guest in dinner_guests:
    #NEW LHS:  set total to be 
    #OLD RHS:       get total add guest
    total = total + guest 
    
total/len(dinner_guests)

3.5

In [110]:
# SOLUTION

#### Challenge 2c. Looping: Filtering
* compute a total *of just* the odd numbers
* compute a total *of just* the even numbers
* report their totals separately

In [110]:
# SOLUTION

In [177]:
dinner_guests

[2, 3, 4, 5]

In [176]:
total_even = 0
total_odd  = 0

for guest in dinner_guests:
    if guest % 2 == 0:
        total_even = total_even + guest
    else:
        total_odd = total_odd + guest

print(total_even, total_odd)

6 8


#### Challenge 2d. STRETCH: Looping: Maximums & Minimums

If you get the above challenges done quickly, find the maximum and minimum group size using a loop (ie., not using `min`, `max`, etc.).

HINT: You'll need a variable for each defined before the loop (eg., `min_entry`, `max_entry`) and these should be initialized to a useful starting value...

In [110]:
# SOLUTION
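One possible approach to the stretch challenge, following the hint: both variables start at the first entry, so any later entry can replace them. This is a sketch, not the only valid solution.

```python
dinner_guests = [2, 3, 4, 5]

# Start both at the first entry, a useful starting value for either comparison
min_entry = dinner_guests[0]
max_entry = dinner_guests[0]

for guest in dinner_guests:
    if guest < min_entry:
        min_entry = guest  # found a smaller group
    if guest > max_entry:
        max_entry = guest  # found a larger group

print(min_entry, max_entry)
```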

---

## Part 3: Advanced Syntax

### Error Handling

**Aside:** Error handling can get more complex and specialized. Eg., it's possible to handle different types of errors, *differently*,
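A minimal sketch of `try`/`except`, with two kinds of error handled differently. The `entries` list is hypothetical data made up for illustration.

```python
entries = ["12", "7.5", "abc", "0"]  # hypothetical raw inputs, as text
results = []

for entry in entries:
    try:
        number = float(entry)        # may raise ValueError
        results.append(10 / number)  # may raise ZeroDivisionError
    except ValueError:
        print(f"could not convert {entry} to a number")
    except ZeroDivisionError:
        print(f"cannot divide by {entry}")

print(results)
```

The loop keeps going after each failure, so only the entries that could be converted and divided end up in `results`.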

---

### Functions

Q. How do we define our own functions? (What's the keyword..)

Q. What do we put between the parentheses?

*Defining* a function does not mean *running* any operation. 

When we define a function we're providing a *template* of an algorithm,
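As a sketch of the difference between defining and running, consider a hypothetical `greet` function (the name and its argument are assumptions for illustration):

```python
# Defining: nothing is printed when this part runs
def greet(name):
    print(f"Hello, {name}!")

# Calling: the template is now run with name set to "Michael"
greet("Michael")
```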

Functions can *return* values: they can place values *in-memory* when the function is finished running. 

*Printing* means *outputting to the screen* and **never places values back into memory**,

Q. How do we *return* a value? How do we modify the definition?
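A short sketch contrasting printing with returning. Both function names here are hypothetical, chosen for illustration.

```python
def show_double(x):
    print(x * 2)   # shows the result on screen only

def double(x):
    return x * 2   # hands the result back to the caller

shown = show_double(4)   # prints 8, but nothing is saved
saved = double(4)        # prints nothing, the result is saved

print(shown, saved)
```

Because `show_double` has no `return`, the variable `shown` holds `None`, while `saved` holds the computed value.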

#### Aside: scope

*Scope* is the visible area/region of the code where a variable is defined. Each function is its own *scope* meaning it's a self-contained world of variables, which won't "leak out". Variables defined inside functions, stay in functions. 

Why can't I just define variables I want to save inside functions?
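The sketch below illustrates the point about scope, using a hypothetical `compute_total` function: the variable `subtotal` exists only inside the function, so the value has to be *returned* to be kept.

```python
def compute_total():
    subtotal = 10 + 5   # exists only while the function runs
    return subtotal

total = compute_total()  # the returned value is saved outside the function

try:
    print(subtotal)      # subtotal is not visible out here
except NameError:
    print("subtotal is not defined outside the function")

print(total)
```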

### Challenges 

#### Challenge 3a: Functions

* define a summary function 
    * which requires two *arguments*: numerical, categorical
    * and prints a statistical summary of both

In [245]:
# SOLUTION
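One possible solution sketch for this challenge. The function name `summarize` and the choice of statistics are assumptions; any working summary is fine.

```python
import statistics as st

def summarize(numerical, categorical):
    """Print a short statistical summary of two lists."""
    print(f"Count: {len(numerical)}")
    print(f"Min: {min(numerical)}, Max: {max(numerical)}")
    print(f"Mean: {round(st.mean(numerical), 1)}")
    print(f"Median: {st.median(numerical)}")
    print(f"Most common label: {st.mode(categorical)}")
    print(f"Distinct labels: {len(set(categorical))}")

# Data reused from the earlier review section
ages = [23, 47, 73, 71, 65, 82, 74]
locs = ["London", "Leeds", "Glasgow", "Inverness", "London", "Chichester", "London"]

summarize(ages, locs)
```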

#### Challenge 3b: Error Handling: Conversions

* modify your above code to `try` to perform the summary functions which are likely to fail
    * eg., `mode`
    * rather than fail, just print "could not perform" that action

In [244]:
# SOLUTION
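A possible sketch for this challenge, assuming a hypothetical `safe_summary` function. Each step that might fail is wrapped in its own `try`, so one failure does not stop the rest of the report.

```python
import statistics as st

def safe_summary(data):
    """Print the mode and mean of data, skipping any step that fails."""
    try:
        print(f"Mode: {st.mode(data)}")
    except st.StatisticsError:
        # raised, for example, when data is empty
        print("could not perform mode")

    try:
        print(f"Mean: {st.mean(data)}")
    except (TypeError, st.StatisticsError):
        # TypeError: text cannot be averaged; StatisticsError: no data
        print("could not perform mean")

safe_summary([1, 2, 2, 3])                   # both steps work
safe_summary(["London", "London", "Leeds"])  # mode works, mean fails
safe_summary([])                             # both steps fail
```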

---

## Review & End

* starting place with python
    * basics
* takes a long time to learn
    * keep practicing

---

# STRETCH

### Comprehensions

It's very common to derive a new list from an old one,

In [166]:
old = [2, 4, 6]
new = []

for element in old:
    new.append(element + 1)

new

[3, 5, 7]

In Python, *comprehensions* allow you to do this in one go,

In [167]:
[e + 1 for e in old]

[3, 5, 7]

#### Challenge: Comprehensions

Consider the following comprehension, which computes a predictor variable $y$ from an $x$ drawn from the range $0 \text{ to } 9$ (`range(0, 10)` stops before 10),

In [103]:
y = [2 * x + 1 for x in range(0, 10)]
y

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

Rather than using `range(0,10)`, use your own dataset,

* define a variable called, eg., `heart_rates`, to be a numerical list
* replace `range(0, 10)` in the example with your variable (ie., `heart_rates`)

In [106]:
# SOLUTION

Suppose you want to ignore some data in your list based on a test. Consider the following,

In [104]:
y = [2 * x + 1 for x in range(0, 10) if x > 5]
y

[13, 15, 17, 19]

Filter your own dataset,
* replace `range(0, 10)` 
* and the test `x>5` 
* to create a `y` more meaningful to your case.

Eg., suppose we ignore heart rates below 30 and above 200, on the basis that they aren't likely to be real data.

In [105]:
# SOLUTION
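A possible sketch for the filtering challenge. The `heart_rates` values are hypothetical, and the `2 * hr + 1` formula is simply carried over from the example above.

```python
# Hypothetical heart-rate readings in beats per minute
heart_rates = [72, 15, 88, 240, 65, 110]

# Keep only plausible readings (30 to 200), then apply the same formula
y = [2 * hr + 1 for hr in heart_rates if 30 <= hr <= 200]

print(y)
```

The readings 15 and 240 fail the test and are left out, so `y` has four entries.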

---

## Appendix

### Terminology

#### Syntax
* `def` define *a function*
    * more generally, *define* can mean assigning a variable with `=`, or giving a name to anything else
* keywords `for`, `while`
* operators `+=`
* variables `name`
* functions `names()` 
    * these are programmer-given names

### Help

In [211]:
help(statistics.mean)

Help on function mean in module statistics:

mean(data)
    Return the sample arithmetic mean of data.
    
    >>> mean([1, 2, 3, 4, 4])
    2.8
    
    >>> from fractions import Fraction as F
    >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
    Fraction(13, 21)
    
    >>> from decimal import Decimal as D
    >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
    Decimal('0.5625')
    
    If ``data`` is empty, StatisticsError will be raised.



---