# Working with Data: Introduction to Python
<font color=indigo>Live Workshop</font>

---

* This document is a technical brainstorm (like whiteboarding with code). 
    * I'll give you this and any other assets over the break. 
    * No need to follow along live.
* If you don't have access to a Jupyter installation, you can try the following link,
    * https://jupyterlite.readthedocs.io/en/latest/_static/retro/notebooks/index.html
    * Press `File > New`
    * **this may take a few minutes to load**
---
 

## Lesson Plan

* <font color=green>Introductions (20 min)</font>
* <font color=green>Part 0: Review (30 min)</font>
    * Python Warm Up (15 min)
    * <font color=blue>Individual Challenge (15 min)</font>
* <font color=green>Break & Admin (20 min)</font>
    * Our Values, Learning Objectives, Menti Self-Assess, ...
* <font color=green>Part 1: Basic Syntax (40 min)</font>
    * Discussion (20 min)
    * <font color=blue>Individual Challenges (20 min)</font>
* <font color=green>Part 2: Looping (40 min)</font>
    * Discussion (10 min)
    * <font color=blue>Group Challenges (30 min)</font>
* <font color=green>Break (10 min)</font>
* <font color=green>Part 3: Advanced Syntax (40 min)</font>
    * Discussion (20 min)
    * <font color=blue>Individual Challenges (20 min)</font>
* <font color=green>Review & End (>5 min)</font>

## Introductions
* Name
* Background/Role
* Prior Experience
* Hobby

---

## Part 0: Review

### Variables & Operations

Q. What symbol *assigns* a value to a variable?

Q. What's the type of this data?
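A minimal sketch of both questions. The names and values here are made up for illustration,

```python
# Hypothetical values, chosen only for illustration.
name = "Michael"   # `=` assigns the value on the right to the name on the left
age = 31
height = 1.81

# type() reports what kind of data a variable holds
print(type(name))    # <class 'str'>
print(type(age))     # <class 'int'>
print(type(height))  # <class 'float'>
```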

### Printing

**Aside**: You can include the value of variables using `{}` (braces) if you prefix the *string* with an `f` symbol,

...`f` for *format*
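For example, assuming two illustrative variables `name` and `age`,

```python
# Hypothetical values, chosen only for illustration.
name = "Michael"
age = 31

# The f prefix makes Python replace each {...} with the value inside it
message = f"{name} is {age} years old"
print(message)                    # Michael is 31 years old

# Any expression can go inside the braces
print(f"Next year: {age + 1}")    # Next year: 32
```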

### Data Structures & Conversions

Q. What's the type of `hobbies`?

Q. What index is first?
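A small sketch of a list and of converting between types. The contents of `hobbies` are an assumption, since the original cell isn't shown here,

```python
# Hypothetical list contents, chosen only for illustration.
hobbies = ["reading", "cycling", "chess"]

print(type(hobbies))   # <class 'list'>
first_hobby = hobbies[0]   # the first index is 0
print(first_hobby)     # reading

# Conversions: int(), float() and str() convert between types
age = int("31")            # text -> whole number
label = str(age) + " years"  # number -> text, so it can be joined to text
print(label)           # 31 years
```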

### Importing

Q. What keyword *imports* a module?

Q. How would I run the `mean` function (defined *inside* the `statistics` module)?
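One way to answer both questions, using an illustrative list called `ages`,

```python
import statistics   # `import` makes the module available

# Hypothetical data, chosen only for illustration.
ages = [18, 21, 30, 45]

# module.function(...) runs a function defined inside the module
average = statistics.mean(ages)
print(average)   # 28.5
```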

### Summarizing Functions

Q. What built-in functions would help me describe a data-set?
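A possible answer, sketched on made-up data. `len`, `min`, `max`, `sum` and `sorted` are built in, while `mean`, `median` and `mode` come from the `statistics` module,

```python
import statistics

# Hypothetical data, chosen only for illustration.
prices = [2.5, 4.0, 3.5, 4.0, 1.0]

# Built-in functions
print(len(prices))      # 5
print(min(prices))      # 1.0
print(max(prices))      # 4.0
print(sum(prices))      # 15.0
print(sorted(prices))   # [1.0, 2.5, 3.5, 4.0, 4.0]

# Functions from the statistics module
print(statistics.mean(prices))     # 3.0
print(statistics.median(prices))   # 3.5
print(statistics.mode(prices))     # 4.0
```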

---

### Challenges (<= 15min)

#### 0. Summarizing Data 

* Create a new notebook in your Jupyter application
* Consider a problem of your own choosing
    * e.g., in retail, finance, health, ...
    * consider a numerical dataset (i.e., a quantitative variable = float)
        * define a variable which is a list of numbers with **7** entries
    * consider a categorical dataset (i.e., a qualitative variable = string)
        * define a variable which is a list of strings with **7** entries
* Run the above summary functions on your lists
    * STRETCH: neatly print out the results

```python
# SOLUTION
```

    
#### Solution 0. Summarizing Data

---

## Part 1: Basic Syntax

### Lists

Lists are defined with square brackets,

indexes start at zero and increase by one,

We can also index from the end of a list, where the *last* entry is at `-1`,
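The indexing rules above, sketched on an illustrative three-entry list called `pages`,

```python
# Hypothetical data, chosen only for illustration.
pages = [10, 20, 30]   # square brackets define a list

first = pages[0]       # indexes start at zero...
second = pages[1]      # ...and increase by one
last = pages[-1]       # negative indexes count back from the end

print(first, second, last)     # 10 20 30
print(pages[2] == pages[-1])   # True: in a 3-entry list, index 2 is also index -1
```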

Q. How do you add elements to a list?

Q. How do you add an element *at a position*?

Q. How do you count the number of entries which match a given value?

In the syntax `o.m()` we refer to the dataset `o` as an *object*, and to the operation `m` which is applied to the data as a *method*.
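A short sketch of the three list methods asked about above. Here the list `guests` is the *object*, and `append`, `insert` and `count` are *methods*. The names are illustrative,

```python
# Hypothetical data, chosen only for illustration.
guests = ["Ada", "Grace"]

guests.append("Alan")        # add an element to the end
guests.insert(1, "Edsger")   # add an element at position 1
guests.append("Ada")

ada_count = guests.count("Ada")   # count entries equal to "Ada"

print(guests)      # ['Ada', 'Edsger', 'Grace', 'Alan', 'Ada']
print(ada_count)   # 2
```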

### Syntax Errors

##### Equality & Assignment

Q. What is the symbol for equality?
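A quick illustration of the difference, using a made-up `age`,

```python
age = 18                   # one `=` assigns: age now holds 18

is_eighteen = age == 18    # two `==` compares: the result is True or False
print(is_eighteen)         # True
print(age == 21)           # False
```

Writing `=` where `==` is needed (eg., inside an `if`) is a common cause of a `SyntaxError`.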

##### Quotes

##### Brackets

* `()` parentheses
* `[]` square brackets
* `{}` braces

Q. Rather than *parentheses*, we should use..?
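A sketch of where each kind of bracket has appeared so far in these notes. The list `group_sizes` is illustrative,

```python
# Hypothetical data, chosen only for illustration.
group_sizes = [2, 4, 3]          # [] square brackets define a list

count = len(group_sizes)         # () parentheses call a function
last = group_sizes[-1]           # [] square brackets also index into a list

# {} braces insert values into an f-string
report = f"{count} groups, last has {last}"
print(report)   # 3 groups, last has 3
```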

### Comparisons & Tests

Q. What would I write to determine if I were an adult?

Q. How would I determine if my name starts with an `M` ?

Q. How would I determine if my age is an odd number?
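Possible answers to the three questions, assuming illustrative values and taking 18 as the age of adulthood,

```python
# Hypothetical values, chosen only for illustration.
name = "Michael"
age = 31

is_adult = age >= 18               # assumption: adulthood starts at 18
starts_with_m = name[0] == "M"     # name.startswith("M") also works
is_odd = age % 2 == 1              # % gives the remainder after division

print(is_adult, starts_with_m, is_odd)   # True True True
```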

### Logical Combinations of Comparisons (Tests)

We can ask for *both* conditions to be true,

Or either,

Or that they aren't true,
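The three combinations use the keywords `and`, `or` and `not`. A sketch with made-up values,

```python
# Hypothetical values, chosen only for illustration.
name = "Ada"
age = 31

both = age >= 18 and name.startswith("M")     # True only if both tests are true
either = age >= 18 or name.startswith("M")    # True if at least one test is true
not_adult = not (age >= 18)                   # flips True to False, and vice versa

print(both, either, not_adult)   # False True False
```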

### Decision-Making

```python
if ...:
    action
```

Q. What are the other keywords I can use to specialize or expand an `if`?

Only one group of statements is performed; statements are *grouped* using indentation (*whitespace*),

Note, `else` and other clauses are optional,
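A minimal sketch of `if`, `elif` and `else` together. The age bands are an assumption made for the example,

```python
age = 15   # hypothetical value

# Tests are tried top to bottom; only the first matching group runs
if age >= 18:
    category = "adult"
elif age >= 13:
    category = "teenager"
else:
    category = "child"

print(category)   # teenager

# `elif` and `else` are optional: an `if` can stand alone
if age < 18:
    print("needs a guardian")   # needs a guardian
```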

---

### Challenges 

* copy the code to your own notebook
    * and complete the challenge

#### Challenge 1a. Syntax Errors

Fix the syntax in the following example,

```python
ages = [12, 41, 61 18)

print(len[ages])
print(ages[-1])
```

```
SyntaxError: invalid syntax (924352493.py, line 1)
```

**Solution**

```python
# SOLUTION
```

#### Challenge 1e. STRETCH CHALLENGE

If you finish the above challenges quickly, modify your summary report (completed earlier) and use various tests and decision-making syntax (`if`, `else`, etc.) to expand the report. (E.g., test the means, modes, ...)

---

## Part 2: Looping 

#### Looping

Q. How do I apply an operation to *every* entry? (What's the *keyword*...)

Recall, indentation is important.
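A small sketch of a `for` loop over an illustrative list,

```python
# Hypothetical data, chosen only for illustration.
ages = [12, 41, 61, 18]
next_year = []

# Read as: "for each age in ages, do the indented statements"
for age in ages:
    next_year.append(age + 1)   # indented: runs once per entry

print(next_year)   # not indented: runs once, after the loop -> [13, 42, 62, 19]
```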

### Reading Loops

How do you *read* this syntax? What does it do?

#### Looping & Filtering

Q. How do I filter elements of data in a dataset, as I loop over them?
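One way to do this is to put an `if` inside the loop. A sketch on made-up data,

```python
# Hypothetical data, chosen only for illustration.
ages = [12, 41, 61, 18]
adults = []

for age in ages:
    if age >= 18:            # the test decides which entries are kept
        adults.append(age)   # indented twice: inside the loop *and* the if

print(adults)   # [41, 61, 18]
```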

### Further Examples of Looping: Counting
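A sketch of counting with a loop: start a counter at zero before the loop and increase it whenever an entry passes a test. The data is illustrative,

```python
# Hypothetical data, chosen only for illustration.
hobbies = ["chess", "cycling", "chess", "reading"]

chess_count = 0                 # defined *before* the loop

for hobby in hobbies:
    if hobby == "chess":
        chess_count += 1        # += adds to the existing value

print(chess_count)   # 2
```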

---

### Challenges

#### Challenge 2a. Looping: Even Numbers

* define `dinner_guests` in which each entry represents a group arriving for dinner at a restaurant
* print() each number which is even
    * aside: why would a restaurant want to know which groups were even?
    * possible answer: seating, table arrangement, etc.

```python
# SOLUTION
```

#### Challenge 2b. Looping: Averages
* compute the total of all the numbers using a loop
* report the total and the mean after the loop
    * using `len()` with the total


```python
# SOLUTION
```

#### Challenge 2c. Looping: Filtering
* compute a total *of just* the odd numbers
* compute a total *of just* the even numbers
* report their totals separately

```python
# SOLUTION
```

#### Challenge 2d. STRETCH: Looping: Maximums & Minimums

If you get the above challenges done quickly, find the maximum and minimum group size using a loop (i.e., not using `min`, `max`, etc.).

HINT: You'll need a variable for each, defined before the loop (e.g., `min_entry`, `max_entry`), and these should be initialized to a useful starting value...

```python
# SOLUTION
```

---

## Part 3: Advanced Syntax

### Error Handling

**Aside:** Error handling can get more complex and specialized. E.g., it's possible to handle different types of errors, *differently*,
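A sketch of `try` with two `except` clauses, each handling a different type of error. The datasets and the replacement messages are made up for the example,

```python
import statistics

# Hypothetical datasets: one valid, one empty, one with the wrong type
datasets = [[1.0, 2.0, 3.0], [], ["a", "b"]]
results = []

for data in datasets:
    try:
        results.append(statistics.mean(data))   # may fail
    except statistics.StatisticsError:
        results.append("no data")               # raised for an empty list
    except TypeError:
        results.append("not numbers")           # raised for non-numeric entries

print(results)   # [2.0, 'no data', 'not numbers']
```

Rather than stopping at the first failure, the loop carries on and records what went wrong.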

---

### Functions

Q. How do we define our own functions? (What's the keyword..)

Q. What do we put between the parentheses?

*Defining* a function does not mean *running* any operation. 

When we define a function we provide a *template* of an algorithm,

Functions can *return* values: they can place values *in-memory* when the function is finished running. 

*printing* means *outputting to the screen* and **never places values back into memory**,

Q. How do we *return* a value? How do we modify the definition?
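A sketch contrasting a function which *returns* with one which only *prints*. Both function names are hypothetical,

```python
def add_tax(price, rate):
    """Return the price with tax added."""
    total = price * (1 + rate)
    return total                  # hands the value back to the caller


def show_tax(price, rate):
    """Print the price with tax added; nothing is returned."""
    print(price * (1 + rate))


# Defining ran nothing; calling runs the template with real values
saved = add_tax(100, 0.5)
print(saved)        # 150.0

nothing = show_tax(100, 0.5)      # prints 150.0
print(nothing)      # None: printing placed no value back into memory
```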

#### Aside: scope

*Scope* is the region of the code where a variable is visible. Each function is its own *scope*, meaning it's a self-contained world of variables which won't "leak out". Variables defined inside functions stay in functions.

Why can't I just define variables I want to save inside functions?
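A sketch of what happens if we try. The function name is hypothetical,

```python
def compute_total():
    total = 10 + 20    # `total` exists only inside this function
    return total


saved = compute_total()   # returning is how a value gets out
print(saved)              # 30

# Outside the function, the name `total` was never defined
try:
    print(total)
    leaked = True
except NameError:
    print("total is not defined outside the function")
    leaked = False
```

So to keep a value computed inside a function, `return` it and assign the result to a variable outside.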

### Challenges 

#### Challenge 3a: Functions

* define a summary function 
    * which requires two *arguments*: numerical, categorical
    * and prints a statistical summary of both

```python
# SOLUTION
```

#### Challenge 3b: Error Handling: Conversions

* modify your above code to `try` to perform the summary functions which are likely to fail
    * e.g., `mode`
    * rather than fail, just print "could not perform" that action

```python
# SOLUTION
```

---

## Review & End

* starting place with python
    * basics
* takes a long time to learn
    * keep practicing

---

# STRETCH

### Comprehensions

It's very common to derive a new list from an old one,

```python
old = [2, 4, 6]
new = []

for element in old:
    new.append(element + 1)

new  # [3, 5, 7]
```

In Python, *comprehensions* allow you to do this in one go,

```python
old = [2, 4, 6]

[e + 1 for e in old]  # [3, 5, 7]
```

#### Challenge: Comprehensions

Consider the following comprehension, which computes a predictor variable $y$ from an $x$ drawn from `range(0, 10)`, i.e. the whole numbers $0 \text{ to } 9$,

```python
y = [2 * x + 1 for x in range(0, 10)]
y  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```

Rather than using `range(0, 10)`, use your own dataset,

* define a variable called, e.g., `heart_rates`, to be a numerical list
* replace `range(0, 10)` in the example with your variable (i.e., `heart_rates`)

```python
# SOLUTION
```

Suppose you want to ignore some data in your list based on a test. Consider the following,

```python
y = [2 * x + 1 for x in range(0, 10) if x > 5]
y  # [13, 15, 17, 19]
```

Filter your own dataset,
* replace `range(0, 10)` 
* and the test `x>5` 
* to create a `y` more meaningful to your case.

E.g., suppose we ignore heart rates `< 30` and `> 200` on the basis that they aren't likely to be real data.

```python
# SOLUTION
```

---

## Appendix

### Terminology

#### Syntax
* `def` defines *a function*
    * more generally, to *define* means to introduce a name: assigning a variable with `=`, or anything else (e.g., a function)
* keywords `for`, `while`
* operators `+=`
* variables `name`
* functions `names()` 
    * these are programmer-given names

### Help

```python
import statistics

help(statistics.mean)
```

```
Help on function mean in module statistics:

mean(data)
    Return the sample arithmetic mean of data.

    >>> mean([1, 2, 3, 4, 4])
    2.8

    >>> from fractions import Fraction as F
    >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
    Fraction(13, 21)

    >>> from decimal import Decimal as D
    >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
    Decimal('0.5625')

    If ``data`` is empty, StatisticsError will be raised.
```



---