## Data, Analytics & AI
# <font color=indigo> Programming Principles</font>


---
<small>QA Ltd. owns the copyright and other intellectual property rights of this material and asserts its moral rights as the author. All rights reserved.</small>

### Overview: Programming Principles (2 hr)
* How do you structure code for business applications?
    * What is a Business Object?
    * How do `class`es express business objects?
    * How does inheritance express business categories?
    * How does polymorphism provide consistency of behaviour across objects?
* How do you structure code for data applications?
    * What is functional programming?
    * How do functions express data transformations?
    * What are the primitive functional operations?
        * How do I write and use comprehensions?
        * How do I use `map`, `filter`, `reduce`?
        * How do I use generators and lazy data structures?
* EXTRA: How do you write and profile algorithms?
    * What is an algorithm and procedure?
    * How do you state the performance of algorithms?
        * What is time complexity?
        * What is space complexity?
* EXERCISE
    * Compare and critique an object-oriented and functional program
    * Extend both programs to include additional functionality
    

#### Schedule
* Overall time: 1.5 hr
    * Lesson: 1 hr
    * Lab: 0.5 hr 
    
#### References & Next Steps
* Books
* Videos

## Lesson Plan

* TODO

---

## What are the limitations of basic functions?

All the functions below use the same datasets, and we keep track of the functions *separately* from the data.

In [7]:
my_journal = "I am feeling happy today!"

happy_words = ['happy', 'joyful']
sad_words = ['sad', 'upset']
ignore_characters = "!"

In [8]:
def journal_words(journal):
    return journal.split()

In [9]:
def journal_process(journal, ignore):
    return journal.lower().replace(ignore, "")

In [10]:
def journal_advice(journal, happy, sad):
    emotion = 0

    for word in journal:
        if word in happy:
            emotion += 1 # add 1 to emotion
        elif word in sad:
            emotion -= 1 # subtract 1 from emotion
      
    if emotion >= 0:
        print("Well done!")
    else:
        print("Sorry to hear that!")
        

In [11]:
j_processed = journal_process(my_journal, ignore_characters)
j_words = journal_words(j_processed)

journal_advice(j_words, happy_words, sad_words)

Well done!


In [13]:
## data set
my_journal = "I am feeling Happy! today!"
happy_words = ['happy', 'joyful']
sad_words = ['sad', 'upset']
ignore_words = ['i', 'am']

## it is possible to change the data above before it is used
sad_words.append("happy") # MISTAKE!

## use the functions 
j_processed = journal_process(my_journal, ignore_characters)
j_words     = journal_words(j_processed)

journal_advice(j_words, happy_words, sad_words)

Well done!


With this system, data is grouped in one place and functions in another, so it is possible to modify the data before the functions use it, in a way that breaks the functions.

It's also not clear what data is *required*: that information is spread out across the arguments of all the functions.
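
To see the problem concretely, the sketch below changes a shared list between two calls and gets a different answer for the same journal. The `advice` helper is a hypothetical variant of `journal_advice` that returns its message instead of printing it, so the two results can be compared.

```python
def advice(words, happy, sad):
    # same scoring rule as journal_advice, but returning the message
    emotion = 0
    for word in words:
        if word in happy:
            emotion += 1
        elif word in sad:
            emotion -= 1
    return "Well done!" if emotion >= 0 else "Sorry to hear that!"

words = "i am feeling sad today".split()
happy_words = ['happy', 'joyful']
sad_words = ['sad', 'upset']

before = advice(words, happy_words, sad_words)
sad_words.clear()  # an accidental change made somewhere else in the program
after = advice(words, happy_words, sad_words)

print(before)  # Sorry to hear that!
print(after)   # Well done!
```

Nothing in `advice` changed between the two calls; only the data it depends on did.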

## How does python solve this problem?

Notice that python has two conventions for running a function on data:

* function convention
* method convention

In [45]:
len("Michael")

7

In [46]:
"Michael".upper()

'MICHAEL'

...the situations for using the first kind are mostly historical/conventional. 

In [47]:
"Michael".upper()

'MICHAEL'

In [48]:
"Michael".lower()

'michael'

In [50]:
"Michael".replace("M", "")

'ichael'

Notice here that `"Michael"` behaves like a library:

In [51]:
import statistics

statistics.mean([1, 2, 3])

2

The data "Michael" works like a library in the sense that it "provides" different functions after the `.`, all of which *operate* on the same data, `"Michael"`.

In [52]:
# ask "Michael" to make itself UpperCase
"Michael".upper()

'MICHAEL'

This bundle combines data and behaviour together, and we can access parts of the bundle with `.`. 

This "bundle" is technically called **an object**, and its behaviours, eg., `s.upper()`, are called **methods**. 

(NB., a method is basically the same thing as a function, but we use a different term when it is accessed through an object, ie., with `.`). 
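
As a small illustration of how close the two conventions are, the sketch below calls the same behaviour both ways. The variable `name` is just an example value.

```python
name = "Michael"

# function convention: the data is passed in as an argument
print(len(name))        # 7

# method convention: the data comes first, then `.`
print(name.upper())     # MICHAEL

# `len` hands over to a method on the object...
print(name.__len__())   # 7

# ...and a method can be called as a plain function on its type
print(str.upper(name))  # MICHAEL
```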

## How do I group data and behaviour together?

In the design of our programs so far, we have used the function style, ie., `len(x)`, `print(x)`. How do we bundle functions together so that we can write `journal.get()`, `journal.advice()`, etc.?

## What does an object-oriented program look like?

In [29]:
# the name of the kind (, type) of data we are dealing with
class Journal:
    # defining the data that we need
    def __init__(self, message, positive_words, negative_words):
        self.message = message 
        self.positive = positive_words
        self.negative = negative_words
        
    # grouping together all the behaviour we need
    def process(self):
        print("process")
        
    def words(self):
        print("words")
        
    def advice(self):
        print("advice")
        
        

In [30]:
# my_thoughts <-   group all datasets together
my_thoughts = Journal("I am feeling upset!", ["yay", "wow"], ["upset", "boo"])

# on dataset run process, words, advice
my_thoughts.process()
my_thoughts.words()
my_thoughts.advice()

process
words
advice


Comparing the above with below, observe that below there are lots of variables; whereas above, we use one `my_thoughts` which groups everything together.

And that each operation uses `my_thoughts` as the dataset, and we do not need to pass anything to each function (/method). And note that these behaviours are grouped after the `.`. 

In [23]:
j_processed = journal_process(my_journal, ignore_characters)
j_words     = journal_words(j_processed)

journal_advice(j_words, happy_words, sad_words)

Well done!


We call this `.`-heavy style of programming "object-oriented" because it is oriented around (an) object.
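
The `Journal` methods above only print their own names. As a sketch of where this is heading, the bodies of the earlier `journal_*` functions could be moved into the class roughly as follows. The `ignore` argument and the choice to return values rather than print them are assumptions made for this illustration.

```python
class Journal:
    def __init__(self, message, positive_words, negative_words, ignore="!"):
        self.message = message
        self.positive = positive_words
        self.negative = negative_words
        self.ignore = ignore  # assumed extra field: characters to strip out

    def process(self):
        # was journal_process(journal, ignore)
        return self.message.lower().replace(self.ignore, "")

    def words(self):
        # was journal_words(journal)
        return self.process().split()

    def advice(self):
        # was journal_advice(journal, happy, sad)
        emotion = 0
        for word in self.words():
            if word in self.positive:
                emotion += 1
            elif word in self.negative:
                emotion -= 1
        return "Well done!" if emotion >= 0 else "Sorry to hear that!"

my_thoughts = Journal("I am feeling upset!", ["yay", "wow"], ["upset", "boo"])
result = my_thoughts.advice()
print(result)  # Sorry to hear that!
```

Each method reads what it needs from `self`, so nothing has to be passed between the steps by hand.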

## When is this style of programming relevant?

Certainly, on large projects, 10,000 lines and beyond. 

On a small data analysis project you probably do not need to be able to create your own objects, ie., to define classes. 

However as python is an object-oriented language, everything is provided in this "grouped" way... so you need to understand some minimal amount -- in order to understand documentation, work with libraries etc. 

Eg., imagine you were given `Journal` above: could you use it effectively?


NB. This is really just a syntax for thinking about how to design a larger program. However it is very common because the syntax is quite helpful for thinking about design. 

In [None]:
# the name of the kind (, type) of data we are dealing with
class Journal:
    # defining the data that we need
    def __init__(self, message, positive_words, negative_words):
        self.message = message 
        self.positive = positive_words
        self.negative = negative_words
        
    # grouping together all the behaviour we need
    def process(self):
        print("processing")
        
    def ignore(self):
        print("ignoring")
        
    def advice(self):
        print("advicing")
        

In [1]:
# my_thoughts <-   group all datasets together
my_thoughts = Journal("I am feeling upset!", ["yay", "wow"], ["upset", "boo"])

# on dataset run process, ignore, advice
my_thoughts.process()
my_thoughts.ignore()
my_thoughts.advice()

processing
ignoring
advicing


The aim of this section is to understand *how* we get to an object-oriented program. 

## What is a class definition?

A class definition defines a specification or "template" for creating an object. 

In [2]:
class Journal:
    pass

## How does a class create objects?

Once a template is defined, we can use its name (eg., `Journal()`) to create a new object which matches that template.

In [3]:
j = Journal()

## How do you set fields (attributes) on an object?

In [4]:
j1 = Journal()
j2 = Journal()

In [7]:
j1.name = "Michael's Journal"
j1.page = "I am feeling happy!"

j2.name = "Alice's Journal"

In [6]:
j1.name

"Michael's Journal"

In [8]:
j1.page

'I am feeling happy!'
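
One consequence of setting fields by hand is that two objects of the same type can end up with different fields. The sketch below checks this with the built-ins `hasattr` and `vars`, using the same `j1` and `j2` as above.

```python
class Journal:
    pass

j1 = Journal()
j2 = Journal()

j1.name = "Michael's Journal"
j1.page = "I am feeling happy!"
j2.name = "Alice's Journal"  # note: no page is ever set on j2

print(hasattr(j1, "page"))  # True
print(hasattr(j2, "page"))  # False
print(vars(j2))             # {'name': "Alice's Journal"}
```

Asking for `j2.page` at this point would raise an `AttributeError`.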

## How do you define an automatic initializer?

In general we almost always want every object of the same type to have the same fields.

We can use an initializer (often called a "constructor") to do this for us:

In [15]:
class Journal:
    def __init__(new_journal,  name_arg, page_arg):
        new_journal.name = name_arg
        new_journal.page = page_arg

When you create a new object (eg., using `Journal()`) python will automatically call the `__init__` for you. 

Python will pass each argument for you... the first argument is always passed implicitly by python, this is the new journal we are creating. Each additional argument corresponds to the arguments you give `Journal()`. 

In [16]:
j = Journal("Michael's Journal", "I'm feeling happy!")

In [17]:
print(j.name, ':', j.page)

Michael's Journal : I'm feeling happy!
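
Because `__init__` now lists the fields every journal must have, leaving one out is reported straight away. A minimal sketch, reusing the class above; the exact wording of the error message depends on the Python version, so it is printed rather than assumed.

```python
class Journal:
    def __init__(new_journal, name_arg, page_arg):
        new_journal.name = name_arg
        new_journal.page = page_arg

try:
    Journal("Michael's Journal")  # page_arg is missing
    created = True
except TypeError as error:
    created = False
    print("TypeError:", error)
```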


## How do I think about designing a class using `__init__`?

When you're thinking about some object your application needs to solve a problem, each *part* (or field) of that object is captured by the initializer method.

eg., 
* Cat 
    * paws
    * tail
    
```python

class Cat:
    def __init__(newcat, paws, tail):
        newcat.paws = paws
        newcat.tail = tail
        
```
    
    

### the `self` convention

In python every method which works on an object receives the object *as the first argument*, by convention, called `self`.

In [22]:
"Michael".upper()

'MICHAEL'

Compare with:

In [24]:
self = "Michael"
str.upper(self)

'MICHAEL'

In [21]:
class Journal:
    def __init__(self,  name_arg, page_arg):
        self.name = name_arg
        self.page = page_arg
        
j1 = Journal("Michael's", "happy!") # self = j1
j2 = Journal("Even's", "sad!") # self = j2

In [19]:
j1.name

"Michael's"

In [20]:
j2.name

"Even's"

## How do you define additional behaviour?

You can add additional behaviour to templates:

In [25]:
class Journal:
    def __init__(self,  name_arg, page_arg):
        self.name = name_arg
        self.page = page_arg
        
    def words(self):
        return self.page.split()

In [39]:
j1 = Journal("Michael's Journal", "I am happy")
j2 = Journal("Alice's Journal", "I am sad!")

In [40]:
j1.page

'I am happy'

In [27]:
j1.page.split()

['I', 'am', 'happy']

Consider `j1.page.split()` defined as `self.page.split()`:

In [28]:
j1.words()

['I', 'am', 'happy']

And now, on `j2` we see `self` has become `j2`:

In [41]:
j2.words()

['I', 'am', 'sad!']
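
The same `str.upper(self)` trick from earlier works on our own class, which makes it visible that `self` is simply the object in front of the `.`. A short sketch using the `Journal` defined above:

```python
class Journal:
    def __init__(self, name_arg, page_arg):
        self.name = name_arg
        self.page = page_arg

    def words(self):
        return self.page.split()

j2 = Journal("Alice's Journal", "I am sad!")

print(j2.words())         # method convention: python passes j2 as self
print(Journal.words(j2))  # function convention: we pass j2 as self ourselves
```

Both lines print `['I', 'am', 'sad!']`.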

## How do you read a class definition?

In [44]:
# every Journal object
class Journal:
    # automatically
    def __init__(self,  name_arg, page_arg):
        # has the fields
        self.name = name_arg # name
        self.page = page_arg # page
        
    # ...and can...
    
    # provide its page's words
    def words(self):
        return self.page.split()

## How do you design with this system?

* The `__init__`: What attributes does a person have?
* The methods:  What can a person do / what can we do with a person?

```
* Person
    * has:
        * hr
        * bp
    * can:
        * predict_prognosis()
```


In [36]:
class Person:
    def __init__(self, hr, bp):
        self.heart_rate = hr
        self.blood_pressure = bp
        
    def predict_prognosis(self):
        return 0.1 * self.heart_rate + 0.1 * self.blood_pressure + 5
    

In [37]:
me = Person(60, 120)
me.predict_prognosis()

23.0

In [38]:
me = Person(160, 190)
me.predict_prognosis()

40.0

# Functional Programming

## Overview

* What is a programming paradigm?
    * What is OO?
    * What is functional programming?
* What are functions (vs., procedures)?
* How do I use...
    * functions
    * functions as arguments
    * functions as transforms
    * comprehensions
* Why is functional programming useful?

## What is a Programming Paradigm?

def., a way of thinking about how we write code...

How do I translate a problem domain into an application domain?

Problem Domain:
* concepts
* processes
* objects
* ...
    
Application Domain:
* loops, if/else
* class, def...

### Classical Object-Orientation

* What are the concrete objects in our problem?
    * Object -> Object
* What do they have in common? (class)
    * What fixed properties are in common? (self.attribute)
    * What behaviours are in common? (self.behaviour())
* How do they differ?
    * Are these differences *specializations*?
        * inheritance & polymorphism

* Application Domain:
    * `class`, `obj.name`, `obj.say()`
    * `class Worker(Person):`
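
Inheritance and polymorphism can be sketched with the `Person` class from earlier. The `Athlete` subclass and its adjustment of `2` are hypothetical, invented only for this illustration.

```python
class Person:
    def __init__(self, hr, bp):
        self.heart_rate = hr
        self.blood_pressure = bp

    def predict_prognosis(self):
        return 0.1 * self.heart_rate + 0.1 * self.blood_pressure + 5

# inheritance: an Athlete is a specialised kind of Person
class Athlete(Person):
    # same method name, specialised behaviour (hypothetical adjustment)
    def predict_prognosis(self):
        return super().predict_prognosis() - 2

# polymorphism: the calling code treats every object the same way
people = [Person(60, 120), Athlete(60, 120)]
scores = [p.predict_prognosis() for p in people]
print(scores)  # [23.0, 21.0]
```

`Athlete` reuses `__init__` from `Person` unchanged and only replaces the behaviour that differs.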
    

### Functional Programming

We use *functions* as the main element of our program. A function is a `def` which *only* returns; or: it's an operation which takes *some input* to *some output* without doing anything else. 

* IN -> f() -> OUT

* What are the *observations* (, examples) in our problem -- what is the *data **set***?
* What datasets do we start with?
* What datasets do we want to get to?
* What processes are involved in this transformation?
    * Can these processes be expressed as a simple sequence of transformations?
    
* Application Domain:
    * `def`, `return`
    * `[ ... ]`
    * `map`, `flatMap`, `reduce`, ...
    

### How do I choose?

Object-orientation is preferred by large-scale software development projects, as *class* structure helps structure a large program. And, often, code is only dealing with one object at a time (eg., a user). 

Functional programming is preferred by data applications, as we're often dealing with whole datasets and *thinking* in terms of how we can transform them. 

## The Syntax and Ideas of Functional Programming

### Functions vs. Procedures

Functions are not "functions".

"Function" tends to mean *any old behaviour*. In FP, *function* has a technical definition which is... 

def., any operation which could be a mathematical formula 

def., pure transformation

def., pure *means* having no side-effects

def., side-effects are any effect of a function which cannot be captured by its return value
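
File and screen output are the usual examples of side-effects, but changing an argument in place is one too. A minimal sketch contrasting the two; the function names `shout_impure` and `shout_pure` are made up for this example.

```python
def shout_impure(messages):
    # side-effect: changes the caller's list and returns nothing
    for i in range(len(messages)):
        messages[i] = messages[i].upper()

def shout_pure(messages):
    # no side-effect: everything it does is in the return value
    return [m.upper() for m in messages]

names = ["Michael", "Adrian"]

loud = shout_pure(names)
names_after_pure = list(names)  # snapshot, to show names is untouched
print(names_after_pure, loud)

shout_impure(names)
print(names)  # the original list has now been modified
```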

### Procedure
A sequence of actions which affect the world, typically, by modifying a device (eg., screen/disk).

(NB. *world* here, roughly, means anything external to the running program).

In [1]:
def record_message(name):
    # writing to disk is the side-effect; `with` closes the file so the text is flushed
    with open('messages', 'w') as f:
        print(name, '!', file=f)

In [2]:
record_message("Michael")

In [3]:
print(open('messages').read())

Michael !



Consider replacing `record_message` with its return value, `None`...

In [11]:
result = record_message("Michael")

In [10]:
result is None

True

You would change the behaviour of the program...

In [12]:
result = None

...this does not save anything to a file!

### Function
An input-output relationship, which may be *represented* as an algorithm which does not perform i/o or other noticeable effects on the world. 

In [4]:
def reformat(messages):
    fmt = []
    for m in messages:
        fmt.append(m.upper() + "!")
    
    return fmt

In [6]:
old = ["Michael", "Adrian", "Tina"]
new = reformat(old)

In [7]:
new

['MICHAEL!', 'ADRIAN!', 'TINA!']

Consider replacing `reformat` with its return value...

In [13]:
old = ["Michael", "Adrian", "Tina"]
new = ['MICHAEL!', 'ADRIAN!', 'TINA!']

No difference...

In [14]:
old, new

(['Michael', 'Adrian', 'Tina'], ['MICHAEL!', 'ADRIAN!', 'TINA!'])

In mathematics *function* is just a relationship between input and output... (aka. old -> new)

$f(x_1, x_2) = x_1 + x_2$ 

In [15]:
def f(x1, x2):
    return x1 + x2

In [16]:
f_relationship = {
    (0, 0): 0,
    (0, 1): 1,
    (1, 0): 1,
    (1, 1): 2,
    #...
}

In [17]:
f(1, 1)

2

In [18]:
f_relationship[(1, 1)]

2

We can see `reformat` the same way:

In [1]:
reformat_relationship = {
    "Michael" : "MICHAEL!",
    "Adrian": "ADRIAN!",
    #etc.
}

In [2]:
reformat_relationship["Michael"]

'MICHAEL!'

### Why is this definition important?

Procedures are "genuine" operations *on the world* which change it in ways that are **hard** to understand. Most procedures are i/o, ie., they access and modify input-output devices (eg., hard disk, screen, printer...). 

Procedure, def., a sequence of actions *which change the world* which must come in a particular order. 

Consider... building a report:
* print() a page
* send() an email
* print() the email reply
* staple() everything together

Notice:
* any reordering is broken
* *code* doesn't really express this necessary order
    * nor the time each action (/procedure) takes
    
    

Contrast this with, eg., baking a cake:
* ASSUME: get_ingredients() <- actions / impure / procedures
* ingredients -> mixture -> prepare -> bake -> cool -> slice
* ASSUME: hand_slice() <- action

The "report building actions" are hard to reason about because their necessary order is not expressible in code, and because they modify the world outside the code: we cannot see those changes in the code, nor be very sure they work the way we think.

## Using Functions

#### A simple case

Sketching a functional program...

In [None]:
# ingredients : List of Strings
ingredients = ['flour', 'sugar', 'eggs']

# ingredients -> mixture 
# list of str -> list of int 
def mix(ing):
    pass

# mixture -> bowl 
#  list of int -> int
def prep(mixture):
    pass

# bowl -> cake
def bake(bowl):
    pass

# cake -> cool cake
def cool(cake):
    pass

# cool cake -> slice
def slice(cake):
    pass

In [3]:
print( # this is impure, it modifies the screen
    
    slice(cool(bake(prep(mix(ingredients))))) # pure, this is *the same as* its return value
    
)

None


In [1]:
# ingredients -> mixture
def mix(ing):
    amounts = []
    for i in ing:
        amounts.append(len(i))
        
    return amounts

# mixture -> bowl
def prep(mixture):
    return sum(mixture)

# bowl -> cake
def bake(bowl):
    return 0.8*bowl

# cake -> cool cake
def cool(cake):
    return 0.9*cake

# cool cake -> slice
def slice(cake):
    return cake/6.0

In [21]:
# ingredients : List of Strings
ingredients = ['flour', 'sugar', 'eggs']


print( # this is impure, it modifies the screen
    
    slice(cool(bake(prep(mix(ingredients))))) # pure, this is *the same as* its return value
    
)

1.6800000000000004


In [4]:
(0.9*0.8*sum( [len('flour'), len('sugar'), len('eggs')]))/6.0

1.6800000000000004
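
Since each step only turns its input into an output, the nested call can also be written as a list of steps applied in order. The sketch below assumes a small helper, `pipeline`, which is not part of python or of this lesson's code; `slice` is renamed to `slice_cake` here so that python's built-in `slice` is not shadowed.

```python
from functools import reduce

def mix(ing):
    return [len(i) for i in ing]

def prep(mixture):
    return sum(mixture)

def bake(bowl):
    return 0.8 * bowl

def cool(cake):
    return 0.9 * cake

def slice_cake(cake):
    return cake / 6.0

def pipeline(value, steps):
    # feed the output of each step into the next one, left to right
    return reduce(lambda current, step: step(current), steps, value)

ingredients = ['flour', 'sugar', 'eggs']
steps = [mix, prep, bake, cool, slice_cake]

nested = slice_cake(cool(bake(prep(mix(ingredients)))))
piped = pipeline(ingredients, steps)
print(piped == nested)  # True
```

The list of steps reads in the same order as the arrows, ingredients -> mixture -> bowl -> cake -> slice, whereas the nested call reads inside-out.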

---

### Python Functional Data Processing

In base python we often want to repeat a process for each entry in a collection...

In [12]:
prices = [1, 4, 65, 13]

In [13]:
for p in prices:
    print(p)

1
4
65
13


When processing data, more specifically, we want to create a new collection *derived* from this existing one...

In [14]:
profits = []
profit_percent = 0.05

for p in prices:
    profits.append( p * profit_percent)

In [15]:
profits

[0.05, 0.2, 3.25, 0.65]

Above, we understand obtaining `profits` from `prices` *algorithmically*: as a sequence of steps.

```
1. obtain first price
2. multiply it by a ratio
3. append it to a list
4. next price
5. ....
```

Thinking this way is useful for software engineering but less natural for data analysis, which is more mathematical.

We would really like "multiply all prices by 0.05"... who cares how it's done. 

In [16]:
[ p * 0.05   for p in prices ]

[0.05, 0.2, 3.25, 0.65]

This syntax is called a "comprehension" and is essentially python syntax for the SQL `SELECT` command.

```python
    NEW_LIST = [  CHANGE(ELEMENT)  for ELEMENT in OLD_LIST ]
```
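
The same idea is available through the functions `map` and `filter` (built in) and `reduce` (from the standard `functools` module). The sketch below reuses `prices` from above; the threshold of `10` is an arbitrary value chosen for the example.

```python
from functools import reduce

prices = [1, 4, 65, 13]

# map: apply one function to every element (same result as the comprehension)
profits_comp = [p * 0.05 for p in prices]
profits_map = list(map(lambda p: p * 0.05, prices))
print(profits_comp == profits_map)  # True

# filter: keep only the elements for which the function returns True
expensive = list(filter(lambda p: p > 10, prices))
print(expensive)  # [65, 13]

# reduce: combine all elements into a single value
total = reduce(lambda a, b: a + b, prices)
print(total)  # 83
```

`map` and `filter` return lazy objects, which is why they are wrapped in `list(...)` before being printed or compared.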

## How do I write a comprehension?

Comprehensions are often best written (and read) right-to-left:

* write in the original collection
```python
[  ...  prices ]
```

* name each element
```python
[  ... for p in  prices ]
```

* write the "transformation" (ie., the operation which computes the *new* element)
```python
[  p * 0.05 for p in  prices ]
```

* NB. this does not change prices, just like SELECT, we get a new collection returned

**Good idea to start with a comprehension which doesn't change anything (ie., SELECT *)**
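
A sketch of that build-up, starting from a comprehension that changes nothing. The final step adds an `if` condition, which plays roughly the role of SQL's `WHERE`; the doubling and the threshold of `3` are arbitrary choices for the example.

```python
prices = [1, 4, 65, 13]

# 1. change nothing (SELECT *)
same = [p for p in prices]
print(same)     # [1, 4, 65, 13]

# 2. add the transformation
doubled = [p * 2 for p in prices]
print(doubled)  # [2, 8, 130, 26]

# 3. optionally keep only some elements (like WHERE)
large = [p * 2 for p in prices if p > 3]
print(large)    # [8, 130, 26]

# prices itself is never modified
print(prices)   # [1, 4, 65, 13]
```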

# <center>END</center>