# Lecture 3: OOP, classes, scripts and some other issues

* Everything is an object, so... everything!
* OOP
* Classes
* Some brief notes on other language features (extra material)
* Sets and choice of data structures
* Scripting
* Lab 5 information
* Presentation (and additional material) by Anders Märak Leffler
* Attribution: slightly extends work by Johan Falkenjack.
* License: [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)

# Course status report

* Lab grading
* Labs online
* General comments
    * Style going forward.
    * `last_idx`

# Philosophy 101: OOP in principle

* Paradigms - ideals about how code should be structured (and execution models).
    * How do we choose to abstract the world?
    * Previous: Imperative.
    * Previous: Functional.

* Now: OOP (exercises in lab 5)
    * Rough resemblance between objects in the world, and the code.
    * Keeping related data and behaviour together.
        * Roughly: a bunch of data and the way they interact with the world.

## Some motivating examples

### Grouping related data together is useful

In [154]:
# Most extreme (somewhat strawmannish) contrast: entirely ungrouped values.

car_1_maker = "Volvo"
car_1_model = 240
car_1_colour = "black"
car_1_cost = 90000

car_2_maker = "Toyota"
car_2_model = "Camry"
car_2_colour = "red"
car_2_cost = 100000

# ...

# How do we pass a list of cars to a function which calculates eg if Toyotas are more expensive?

In [155]:
# Grouping data together. How?
# (car_1_maker, ...)
# Anything we've seen before?

# Simple class syntax.
class Car:
    pass

car1 = Car()
car2 = Car()
car1.maker = "volvo"
car1.cost  = 9000
car2.maker = "toyota"
car2.cost = 12333
print(car1.maker)

volvo


* Not only grouping, but now we can change the cost of a _specific_ car.

In [156]:
car1.cost = 123131231231231239999
car1.cost

123131231231231239999

Addendum: fewer issues when writing in Notebooks (did I re-run cell X last, or cell Y? What is "data" currently?).

## Defining classes in Python

### Writing the class definition in Python

In [157]:
# Defining the car class. With an initializer, which is run when you create the object.

"""
car_2_maker = "Toyota"
car_2_model = "Camry"
car_2_colour = "red"
car_2_cost = 100000
"""

#Car(string name, model_t model,...)
class Car:
    def __init__(self, maker, model, colour, cost):
        self.maker = maker
        self.model = model
        self.colour = colour
        self.cost = cost
        # additional setup calculations might be good here
        # function calls here OK
        
car1 = Car("saab", model = "9000", colour = "black", cost = 1)
car1.maker

'saab'

In [158]:
cars = [Car(maker = "Volvo", model = 240, colour = "black", cost = 90000), 
       Car(maker = "Toyota", model = "Camry", colour = "red", cost = 10000)]
cars

[<__main__.Car at 0x7f7c42f7b160>, <__main__.Car at 0x7f7c42f7bb38>]

In [159]:
# Adding how a car is presented.

class Car:
    def __init__(self, maker, model, colour, cost):
        self.maker = maker
        self.model = model
        self.colour = colour
        self.cost = cost
        
    def __repr__(self):
        return f"A {self.colour} {self.maker} of model {self.model}."

print(Car("saab", model = "9000", colour = "black", cost = 1))

A black saab of model 9000.


In [160]:
# With a data source such as an API connection which gives us data in a certain format, a CSV file handle,
# we can automate this.

# Nothing specific to OOP so far, but noteworthy feature.

some_magic_data_iterable = [("Volvo", "black", 240, 90000), ("Toyota", "red", "Camry", 10000)]

cars = [Car(maker = maker, colour = colour, model = model, cost = cost) 
        for maker, colour, model, cost in some_magic_data_iterable]

Notes:
* Couldn't we do this via some other abstraction (namedtuple, or tuples + a function which picks out the right parts, such as `get_maker(car) == car[0]`)...?
* Now we have objects with state. What about "behaviours"?
    * (Philosophical question for those so inclined: structs vs objects.)
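
To make the first note concrete, here is a minimal sketch of the same car data as a `collections.namedtuple`. The name `CarTuple` is made up for this illustration and is not part of the lecture code.

```python
from collections import namedtuple

# Hypothetical name, chosen so that it does not clash with the Car class above.
CarTuple = namedtuple("CarTuple", ["maker", "model", "colour", "cost"])

volvo = CarTuple(maker="Volvo", model=240, colour="black", cost=90000)

# Fields can be read by name or by position...
print(volvo.maker, volvo[0])

# ...but namedtuples are immutable: "changing" a field builds a new tuple.
cheaper_volvo = volvo._replace(cost=80000)
print(cheaper_volvo.cost, volvo.cost)
```

Grouping works fine this way. What we do not get out of the box is mutable state, or behaviour attached to the data.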

### Idea: Objects behave as we tell them (acting on messages, the concept of interface)

* (Public) interface: What messages does an object accept, and what does it do or return?

* Example: What can you tell a list to do?

In [161]:
seq = [1,2,3]
other_seq = [4,5,6]
seq + other_seq    # get a new list

seq.append(999)    # Tell exactly the list called seq to append 999.
print(seq)
print(other_seq)

[1, 2, 3, 999]
[4, 5, 6]


## Ideal: Encapsulation. Objects should be isolated, and hide implementation

* Carry their own data, or references to where to get it. 
```
    classifier1 = NaiveProjection(training_data = ..., k = 3)   # Trained with max dimension k = 3
    classifier2 = NaiveProjection(training_data = ..., k = 15)
    # Now classifier1 and classifier2 have different data inside.    
```
* Carry their own behaviours.
    * (Corollary) Avoids dependence on other objects' implementation.
    * Ex: 
    
    ```
    classifier3 = SomeSmartMethod(training_data = ...)
    
    classifier_1.classification(image)  # uses one of the classifiers
    classifier_2.classification(image)  # uses the other
    classifier_3.classification(image)  # something else entirely - we only know that .classification will return a label!
    ```
    
    (This as opposed to `naive_projection_classification(training_data1, image), naive_projection_classification(training_data2, image), someother_classification(training_data3, image)` where the behaviour is in the function and the data somewhere else.)
        
[//]: <> ( `classify classifier_1, image`, ie an outside function which needs data from classifier_1.)

Corollary: other parts of the program shouldn't need to know a lot about _how_ your object does things. And if you change how it does things, their code shouldn't break.
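
A minimal sketch of this idea, using two made-up classes (`AlwaysSameLabel` and `ThresholdLabel` are hypothetical and have nothing to do with `NaiveProjection` above). The calling code relies only on the shared `.classification(...)` method.

```python
class AlwaysSameLabel:
    """Hypothetical classifier: ignores the input entirely."""
    def __init__(self, label):
        self.label = label

    def classification(self, item):
        return self.label


class ThresholdLabel:
    """Hypothetical classifier: compares the input to a stored threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def classification(self, item):
        return "big" if item >= self.threshold else "small"


classifiers = [AlwaysSameLabel(label="small"), ThresholdLabel(threshold=10)]

# The loop never asks *how* a classifier works, only for its answer.
labels = [clf.classification(42) for clf in classifiers]
print(labels)
```

If `ThresholdLabel` later changes how it stores its threshold, the loop does not need to change.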

# Classes in Python

* We use `class` to create classes.
    * `class Cat: ...` (which defines what it means to be a `Cat` object).
    * This should be common to every single Cat.
* We call the classes to create *instances*. The `__init__` method is called.
    * `Cat(name = "Alonzo")` to create a cat instance.  This is just a value, a single cat.
    * Usually bind this to save value for later (`alonzo = Cat(name="Alonzo")`).

Design note: a particular cat is an instance, what is *common to all cats* belongs in the class.

## Creating simple classes and instances

In [162]:
# Defining what is common to all cats.

# Should have a name.
# Should have a description.
# Should be able to greet (print a greeting string).

# class Cat(object)
class Cat:
    # More data goes here!
    def __init__(self, name, description = "Adorable"):
        self.name = name
        self.description = description
        
    def greet(self):
        print(f"Hello, I am {self.name}.")
    
alonzo = Cat(name = "Alonzo the cat")
alonzo.greet()
dir(alonzo)

Hello, I am Alonzo the cat.


['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'description',
 'greet',
 'name']

The _class_ is what is common for all cats, an _instance_ is a specific cat.

In [163]:
# Creating some instances.
alonzo = Cat(name = "Alonzo")
zeno = Cat(name = "Zeno")

# Calling a method (function attribute).
alonzo.greet()
zeno.greet()

Hello, I am Alonzo.
Hello, I am Zeno.


* Objects as namespaces. Which attributes does `alonzo` have above?

In [164]:
dir(alonzo)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'description',
 'greet',
 'name']

* What is the `self`? Why do we pass it along?

In [165]:
# Calling member functions/methods and using the class.

alonzo.greet()
type(alonzo)        # returns Cat
Cat.greet(alonzo)   # alonzo is the self we get data from

Hello, I am Alonzo.
Hello, I am Alonzo.


In [166]:
def jama(self):
    print("Meow")
    
Cat.meow = jama
alonzo.meow()

Meow


In [167]:
dir(alonzo)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'description',
 'greet',
 'meow',
 'name']

In [168]:
forex = Cat(name = "Forex the cat")
forex.meow()

Meow


In [169]:
forex.special_power = lambda self : 5
forex.special_power(123123123)
forex.__dict__  # What's inherent in this instance? Cf dir

{'name': 'Forex the cat',
 'description': 'Adorable',
 'special_power': <function __main__.<lambda>(self)>}

C++/Java users: _somewhat_ like `this`.
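
The cells above hint at where `self` comes from: functions found on the *class* get the instance filled in as first argument, while functions stored directly on the *instance* do not. A small sketch, using a throwaway class `Owl` (made up for this example):

```python
class Owl:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hoot, I am {self.name}."


hedwig = Owl(name="Hedwig")

# Looked up via the class: Python fills in self for us.
via_instance = hedwig.greet()
via_class = Owl.greet(hedwig)

# Stored on the instance: a plain function, nothing is filled in.
hedwig.shout = lambda text: text.upper()
shouted = hedwig.shout("hoot")

print(via_instance == via_class, shouted)
```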

* Corollary: calling our own methods?

In [170]:
class Bat:
    def __init__(self, name = "Vlad"):
        self.name = name
        
    def spooky_name(self):
        return f"~~~~~~~~~~~~~~~{self.name}~~~~~~~~~~~~~~~"
    
    def greet(self):
        
        # To use this method we need to call another Bat-method.
        print(f"Hi, my name is.... {self.spooky_name()}!")

In [171]:
dracula = Bat()
dracula.spooky_name()

'~~~~~~~~~~~~~~~Vlad~~~~~~~~~~~~~~~'

In [172]:
dracula.greet()

Hi, my name is.... ~~~~~~~~~~~~~~~Vlad~~~~~~~~~~~~~~~!


C++/Java users: we can't expect objects to behave in the same way with respect to scopes. Good habit: `self.<method>`.

* Can we access and check for the presence of attributes in other ways than trying with .-access as above?

In [173]:
help(getattr)

Help on built-in function getattr in module builtins:

getattr(...)
    getattr(object, name[, default]) -> value
    
    Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y.
    When a default argument is given, it is returned when the attribute doesn't
    exist; without it, an exception is raised in that case.



In [174]:
getattr(dracula, "name")   

'Vlad'

In [175]:
help(setattr)

Help on built-in function setattr in module builtins:

setattr(obj, name, value, /)
    Sets the named attribute on the given object to the specified value.
    
    setattr(x, 'y', v) is equivalent to ``x.y = v''



In [176]:
help(hasattr)

Help on built-in function hasattr in module builtins:

hasattr(obj, name, /)
    Return whether the object has an attribute with the given name.
    
    This is done by calling getattr(obj, name) and catching AttributeError.



[More to come.]
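
A short sketch of the three built-ins in use. It redefines a stripped-down `Bat` so that the block runs on its own. The attribute names `cape` and `age` are invented for the example.

```python
class Bat:
    def __init__(self, name="Vlad"):
        self.name = name


dracula = Bat()

# hasattr: check before use.
has_name = hasattr(dracula, "name")
has_cape = hasattr(dracula, "cape")

# getattr with a default avoids the AttributeError for missing attributes.
cape = getattr(dracula, "cape", "no cape")

# setattr: useful when the attribute name is only known at run time.
for attribute, value in [("cape", "black"), ("age", 500)]:
    setattr(dracula, attribute, value)

print(has_name, has_cape, cape, dracula.cape, dracula.age)
```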

* We can generate "more of the same". A note on `type`.

In [177]:
type(alonzo) #  gives us the class/constructor!

__main__.Cat

In [178]:
# Accessing the class of an object in a different way.
alonzo.__class__

__main__.Cat

In [179]:
# Using it.
zeno = type(alonzo)(name = "Zeno the cat")
zeno

<__main__.Cat at 0x7f7c41718f98>

* Bonus: we can generate classes as well.

In [180]:
help(type)

Help on class type in module builtins:

class type(object)
 |  type(object_or_name, bases, dict)
 |  type(object) -> the object's type
 |  type(name, bases, dict) -> a new type
 |  
 |  Methods defined here:
 |  
 |  __call__(self, /, *args, **kwargs)
 |      Call self as a function.
 |  
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |  
 |  __dir__(...)
 |      __dir__() -> list
 |      specialized __dir__ implementation for types
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __instancecheck__(...)
 |      __instancecheck__() -> bool
 |      check if an object is an instance
 |  
 |  __new__(*args, **kwargs)
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __prepare__(...)
 |      __prepare__() -> dict
 |      used to create the namespace for the class statement
 |  
 

In [181]:
# Bonus: creating using type.

CC = type("CoolCat", (Cat, object), { "a" : 5, "zoo" : lambda self : self.a })   # More about the (Cat, ) later!
CC
tomomalley = CC( name = "Tom")
tomomalley.zoo()

5

(Can be used to create types on the fly.)

* Can we have several initialisers (like several constructors in C++)? [Without some fun-but-not-to-be-used trickery.]

In [182]:
# Overloading on the signature (several __init__ variants, as in C++) is not possible in Python!
class Car:
    def __init__(self, maker, model, colour, cost):
        pass

    def __init__(self, readymadecar):
        pass

# Only the last definition survives: Car("Volvo", 240, "black", 90000) now raises a TypeError.
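
One common way to get the effect of several constructors is a `@classmethod` that builds the instance and returns it (default arguments are another). The sketch below is only an illustration: the class name `SimpleCar` and the method `from_tuple` are made up here.

```python
class SimpleCar:
    def __init__(self, maker, model):
        self.maker = maker
        self.model = model

    @classmethod
    def from_tuple(cls, maker_and_model):
        # cls is the class itself, so subclasses get the right type too.
        maker, model = maker_and_model
        return cls(maker=maker, model=model)


car_a = SimpleCar("Volvo", 240)
car_b = SimpleCar.from_tuple(("Toyota", "Camry"))
print(car_a.maker, car_b.maker)
```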

* What about destructors? (For those used to C++).
    * Main takeaway: **Garbage collected language! No "need" for delete, delete[], free,... due to memory.**
    * But sometimes other kinds of cleanup (eg closing API connections) are useful.
    * If this is needed (not in this course!): the method `__del__`.

# Philosophy 201: has-a, is-a

## Inheritance: every _X_ is-a _Y_

* Conceptually: *every X is also a Y* (and can do everything it can).
    * Adding capabilities.
    * Specialisation.
* Operationally: add properties to the objects at the right level.
* Issue: how do we handle conflicts?
* Terminology: superclass/parent, subclass/child.

* Example: every dog is an animal. Superclass: Animal, subclass: Dog (or parent/child)

In [186]:
class Animal:
    
    def __init__(self, name, master):
        self.name = name
        self.master = master
    
    def scratch(self):
        print(f"I, {self.name}, scratch myself!")
        
    def greet(self):
        print(f"I am a generic animal!")
        
class Dog(Animal):
    def __init__(self, name):
        
        # The superclass doesn't initialise itself.
        # super().__init__(name = "Fenrisulven", master = "Loki")
        Animal.__init__(self, name = "Fenrisulven", master = "Loki")
        self.name = name
        
    def greet(self):
        print(f"Woof! I am {self.name}.")
    

In [187]:
# What happens here?
bella = Dog(name = "Bella the dog")
bella.greet()      # Crash? Succeed and print what?
bella.scratch()  # Crash? Succeed and print what?

Woof! I am Bella the dog.
I, Bella the dog, scratch myself!


In [188]:
# What happens when we uncomment the super bit?

bella.master


'Loki'

**Note**: `super()` is easy to use here, but check out the documentation if you have multiple inheritance!
Additional reading (for understanding, but don't take the hype at face value): [realpython: Supercharge Your Classes With Python super()](https://realpython.com/python-super/). Cf [Python's super considered harmful](https://fuhm.net/super-harmful/).
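
For the single-inheritance case, a sketch of the same idea written with `super()`. The classes are renamed (`Pet` and `Puppy`, both hypothetical) so that they do not overwrite `Animal` and `Dog` above.

```python
class Pet:
    def __init__(self, name, master):
        self.name = name
        self.master = master

    def scratch(self):
        return f"I, {self.name}, scratch myself!"


class Puppy(Pet):
    def __init__(self, name, master="Loki"):
        # Let the superclass set up its part; no need to name it explicitly.
        super().__init__(name=name, master=master)
        self.tricks = []


rex = Puppy(name="Rex")
print(rex.name, rex.master, rex.scratch())
```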

* Toy example above. Used in many cases, eg subclassing numpy arrays to change behaviour/add features.

### Issue: Multiple inheritance

* Every _X_ is also a _Y_, a _Z_, a...

In [189]:
class Animal:
    def __init__(self, name, master):
        self.name = name
        self.master = master
    
    def scratch(self):
        print(f"I, {self.name}, scratch myself!")
        
    def greet(self):
        print(f"I am a generic animal!")
        
class SuperHero:
    def __init__(self, name):
        self.name = name
        
    def fly(self):
        print("I am flying!")
        
    def greet(self):
        print(f"With great power comes great responsibility.")

        
# Tigers should be *both* Animals and Superheroes.

class Tiger(Animal, SuperHero):
    def __init__(self, name):
        #SuperHero.__init__(self, name = name)
        #Animal.__init__(self, name = name)
        
        self.name = name
        
    # No specialised greet this time around.


In [190]:
# What will this print? Will we be using Animal.greet or SuperHero.greet?

tigger = Tiger(name = "Tigger")
tigger.greet()

I am a generic animal!


* How do we figure out the order here?

In [191]:
# Using inspect.
import inspect
inspect.getmro(tigger.__class__)

(__main__.Tiger, __main__.Animal, __main__.SuperHero, object)

In [192]:
# Directly
tigger.__class__.__mro__

(__main__.Tiger, __main__.Animal, __main__.SuperHero, object)

In [193]:
#Addition:

SuperHero.greet(tigger)

With great power comes great responsibility.


* Possibly more complex: what happens if we have several layers of this?
* Where does this Method Resolution Order come from? Extracurricular: check out [The Python 2.3 Method resolution order](https://www.python.org/download/releases/2.3/mro/).
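
A small sketch of the layered case, the classic "diamond". All four class names are invented for the example. Printing the MRO shows the order in which Python searches for `greet`.

```python
class Base:
    def greet(self):
        return "Base"


class Left(Base):
    def greet(self):
        return "Left"


class Right(Base):
    def greet(self):
        return "Right"


class Bottom(Left, Right):
    pass   # no greet of its own


mro_names = [cls.__name__ for cls in Bottom.__mro__]
print(mro_names)
print(Bottom().greet())
```

Note that `Base` comes after *both* `Left` and `Right` in the order, even though it is the parent of `Left`.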

In [196]:
# statics/class variables. Added after the lecture. 
# Not sure what went wrong (possibly duplicate def.)
# (See also prepared info below)

class Test:
    number_instances = 0
    def __init__(self):
        Test.number_instances += 1
        
t1 = Test()
t2 = Test()
print(t1.number_instances)

2


## Concept: composition (has-a)

* Adding capabilities by giving an object other objects of its own, which do part of the work.
* Below: every `DataSource` instance has an `APIConnection` instance of its own. 
    * Design choice: don't make giant DataSource class which has all the APIConnection methods inside.
    * Delegating that part of the work to a specialised class.
* See Lutz ch 31.

In [195]:
# Basic idea (in minimal, somewhat artificial example)

class DataSource:
    def __init__(self, url):
        self.connection = APIConnection(url = url)         # Every DataSource has its own APIConnection
        self.parser     = StreamParser(self.connection)    # Every DataSource has its own StreamParser
        #...
    
    def get_data(self, field):
        current_data = self.connection.get()
        parsed_data = self.parser.parse(current_data)   # parse what we just fetched
        return parsed_data[field]                       # the requested field
    
    
# Instead of inheriting from APIConnection, StreamParser.

Corollary: mixins where we inherit simple classes to add their capabilities (`DataSource(APIConnection, StreamParser)`).
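
To try the composition idea without a real API, here is a sketch with stand-in collaborators. `FakeConnection`, `KeyValueParser` and `Source` are invented for this example (they are not the `APIConnection` and `StreamParser` referred to above), and the URL is a placeholder.

```python
class FakeConnection:
    """Stand-in for a real connection: always returns the same text."""
    def __init__(self, url):
        self.url = url

    def get(self):
        return "temperature=21;humidity=40"


class KeyValueParser:
    """Turns 'a=1;b=2' into {'a': '1', 'b': '2'}."""
    def parse(self, text):
        return dict(pair.split("=") for pair in text.split(";"))


class Source:
    def __init__(self, url):
        # has-a: each Source owns its own helper objects.
        self.connection = FakeConnection(url=url)
        self.parser = KeyValueParser()

    def get_data(self, field):
        raw = self.connection.get()
        return self.parser.parse(raw)[field]


source = Source(url="https://example.invalid/api")
print(source.get_data("humidity"))
```

Swapping in a different parser only touches `Source.__init__`; callers of `get_data` are unaffected.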

## Finding out what kind of objects we're working with

Test if it is a direct instance.

In [197]:
# Can we test if alonzo is a Cat?
type(alonzo) is Cat

True

Test which also takes the class hierarchy (superclasses) into account.

In [198]:
# Can we test if tigger is an Animal?
isinstance(tigger, Animal)

True

Note: **this is probably what you want to use**!
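
A sketch of the difference between the two tests, with two hypothetical classes (`Creature` and `Hound`, named so that they do not overwrite `Animal` and `Dog`).

```python
class Creature:
    pass


class Hound(Creature):
    pass


fido = Hound()

exact_hound = type(fido) is Hound            # direct instance
exact_creature = type(fido) is Creature      # type() ignores the hierarchy
is_creature = isinstance(fido, Creature)     # subclasses count
is_subclass = issubclass(Hound, Creature)    # same question, asked of the classes

print(exact_hound, exact_creature, is_creature, is_subclass)
```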

## Conventions for "hiding" data

<img src="https://imgs.xkcd.com/comics/workflow.png"/>

Dependencies, in [xkcd 1172](https://xkcd.com/1172/).

* Many languages have public/private member distinctions.
* Pessimistic note above. ([Hyrum's law](http://www.hyrumslaw.com/)).
* ...but let's try to hide things.
* By default everything in Python is public.

In [200]:
class SecretKeeper():
    def __init__(self, hidden):
        # self.hidden = hidden    # Initial code.
        self.__hidden = hidden    # Mangle name to "~hide"

    def print_secret(self):
        print(f"My secret: {self.__hidden}")

c1 = SecretKeeper(hidden = "supersecret")
c1.hidden

AttributeError: 'SecretKeeper' object has no attribute 'hidden'

In [201]:
c1.print_secret()

My secret: supersecret


Can we hide it away somewhat?

In [202]:
dir(c1)

['_SecretKeeper__hidden',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'print_secret']

Can we find it anyway?

In [None]:
# Yes, if you know name mangling. But take it as a signal from the programmer that you shouldn't.

# Left as exercise.

In [None]:
# And yes, you can reach the instance variables even if you don't know name mangling. 
# Still something to avoid.

Conclusion: follow the guidelines. **Assume that non-hidden attributes are public.**

* ...but we can tailor access by `__getattr__`, `__setattr__` and `__getattribute__`.

In [203]:
# Bonus task (outside the scope of this course): using custom methods to steer access.

class Mirror:
    def __init__(self, age):
        self.age = age
        
    # Catch all .attributes which do not exist.
    def __getattr__(self, name):
        
        # This function might do something special, eg pass the message on
        # over the internet, to an object on a remote server.
        
        return "<<< {} >>>".format(name)
    
val = Mirror(age = 100)
print("Attribute age exists: ", val.age)
print("Attribute python doesn't, so __getattr__ is run: ", val.python)

Attribute age exists:  100
Attribute python doesn't, so __getattr__ is run:  <<< python >>>


In [204]:
# Bonus task (outside the scope of this course): using custom methods to steer access.

class SuperMirror:
    def __init__(self, age):
        self.age = age
    
    # Catch all .attributes, including those which exist.
    def __getattribute__(self, name):
        if name.lower() in ["age", "the_age"]:
            return object.__getattribute__(self, "age") * 999
        
        return "You wanted {}, you say?".format(name)
    
val = SuperMirror(age = 1)
print(val.fish)
print(val.age)
dir(val)  # not a lot here, _it seems_ (but why?)

You wanted fish, you say?
999


[]

In [None]:
# Extra task for the curious: write code that prints val.age. Ask me in the labs if you have any questions.

## Most decisions are made by the objects

In [205]:
# Somewhat contrived example (chosen so that we have no prior intuition about commutativity).

class Snake():
    def __add__(self, other):
        return "Snake!"

class Ladder():
    def __add__(self, other):
        return "Hello, this is Ladder."

# Apart from possible issues with inheriting from int, will it commute?
p1 = Snake()
p2 = Ladder()
p1 + p2

'Snake!'

Why useful to know? 
* Python methods carried by objects do the heavy lifting, even when there is no `.method()` in the call.
* Following conventions is good. If you implement your own classes, you might want `p1 + p2 == p2 + p1` to hold, even if Python doesn't force you.
* Builtins are hard to avoid. Shows why your nice `my_clever_vector * 5` might be different from `5 * my_clever_vector`.

In [206]:
class MyVector:
    def __init__(self, vals):
        self.vals = tuple(vals)
        
    def __mul__(self, c):
        return tuple(c*v for v in self.vals)
    
    def __rmul__(self, c):
        # Added to handle 5 * u.
        return self.__mul__(c)
    
    
u = MyVector(vals = [1,2,3])
u * 5
5 * u

(5, 10, 15)
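
For contrast, a sketch of what happens when `__rmul__` is left out. `OneSidedVector` is a made-up name for this illustration.

```python
class OneSidedVector:
    def __init__(self, vals):
        self.vals = tuple(vals)

    def __mul__(self, c):
        return tuple(c * v for v in self.vals)


w = OneSidedVector(vals=[1, 2, 3])
left_result = w * 5          # handled by OneSidedVector.__mul__

try:
    right_result = 5 * w     # int cannot handle it, and there is no __rmul__
except TypeError as error:
    right_result = None
    print("5 * w failed:", error)
```

Python first asks the left operand (`int`), which declines. Only then does it look for `__rmul__` on the right operand.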

## With great power comes great responsibility

* Default: Python will let you.

## Getters? Setters? Interesting feature: properties

* Controlling getting, assignment, deletion of value.
* What happens if we write `myobj.x = 5`? Can be controlled via functions, like in eg C#.

In [207]:
class GetterTestClass():
    def __init__(self, n = 0):
        self.__n = n
   
    # When someone tries to get instance.n, we should run this.
    def __get_n(self):
        return self.__n

    # When someone tries to set instance.n = 123, we should run this
    def __set_n(self, new_val):
        if new_val >= 0:
            self.__n = new_val
        else:
            raise ValueError("n must be non-negative!")
            
    n = property(fget=__get_n, fset = __set_n)
    
    
    # Left out: fdel. When someone deletes an attribute.
    
    # When someone tries to get n, __get_n will be called. (Etc)
    
    
c1 = GetterTestClass(n = 100)
c1.n
c1.n = 3
#c1.n

Note: `@property`-syntax also available.

* Slightly weird. Sidesteps the usual rule that `=` on an attribute simply rebinds that name to a new value.
* **Know that it's there, don't (ab)use**.
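
A sketch of the same kind of validation written with the `@property` decorator syntax. The class name `Counter` and the backing attribute `_n` are choices made for this example only.

```python
class Counter:
    def __init__(self, n=0):
        self.n = n              # goes through the setter below

    @property
    def n(self):
        return self._n

    @n.setter
    def n(self, new_val):
        if new_val < 0:
            raise ValueError("n must be non-negative!")
        self._n = new_val


c = Counter(n=100)
c.n = 3

try:
    c.n = -1
except ValueError:
    rejected = True
else:
    rejected = False

print(c.n, rejected)
```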

## Additional concepts: abc:s, class members

* Promises that a class should implement some behaviour (eg "this class should support sequence methods"), that can be checked by the system.

[Abstract Base Classes (ABC:s)](https://docs.python.org/3/glossary.html#term-abstract-base-class).

In [None]:
import collections.abc   # import the submodule explicitly
isinstance([1,2,3], collections.abc.Sequence)

See also the `abc` module in standard library.
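
A minimal sketch of defining such a promise yourself with the `abc` module. The class names (`Greeter`, `PoliteCat`, `SilentCat`) are invented for the example.

```python
from abc import ABC, abstractmethod


class Greeter(ABC):
    @abstractmethod
    def greet(self):
        """Return a greeting string."""


class PoliteCat(Greeter):
    def greet(self):
        return "Good evening."


class SilentCat(Greeter):
    pass   # forgets to implement greet


print(PoliteCat().greet())

try:
    SilentCat()
except TypeError as error:
    print("Cannot instantiate:", error)
```

The check happens when an instance is created, not when the class is defined.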

* Class and instance attribute conventions.

In [None]:
class MyVal:
    common_to_all_myvals = 99
    def __init__(self, val):
        self.val = val
        

one_val = MyVal(val = 1)
two_val = MyVal(val = 2)
one_val.val # in the instance
one_val.common_to_all_myvals # in the class

In [None]:
one_val.common_to_all_myvals = "new value. Is this only in one_val?"
two_val.common_to_all_myvals # Actually, just a local name in one_val.

Note: you may have class methods which are reached via `<Class>.method()`, as we might reach `MyVal.common_to_all_myvals` above.
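
A sketch that puts the two ideas side by side: a class method reached via the class, and an instance attribute shadowing a class attribute. `Tally` and `how_many` are hypothetical names.

```python
class Tally:
    created = 0                      # class attribute, shared

    def __init__(self):
        Tally.created += 1           # update the shared value via the class

    @classmethod
    def how_many(cls):
        return cls.created


first, second = Tally(), Tally()
count_before = Tally.how_many()

# Assigning through an instance creates a new instance attribute
# which shadows the class attribute for that instance only.
first.created = "shadowed"

print(count_before, first.created, second.created, Tally.created)
print("created" in first.__dict__, "created" in second.__dict__)
```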

# Other interesting *language* details

* Decorators. Transforming Python functions. (Telltale sign in code: the `@` sign. `@something`, such as `@property`, `@dataclass`.)
* Annotations. [PEP3107](https://www.python.org/dev/peps/pep-3107/)

In [None]:
def sq(n : int, 
       otherarg : "annotations can be anything" = ""):
    """Return the value n^2."""
    return n*n

sq.__annotations__   # can be used by some code analysis tools, documentation tools etc.

* Consequence: type information might be useful eg
    * for code analysis. See eg [mypy](http://www.mypy-lang.org/).
    * for IDE:s.

* Dataclasses. Structuring data by "dummy" classes (with type information) etc. Since Python 3.7, we have [dataclasses](https://docs.python.org/3/library/dataclasses.html) in the standard library. (For Python 3.6, a backport can be pip-installed.)
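
A small sketch which combines a decorator, annotations and a dataclass. The class `CarRecord` and its fields are made up, loosely following the car example from the start of the lecture.

```python
from dataclasses import dataclass


@dataclass
class CarRecord:
    maker: str
    model: str
    colour: str = "black"
    cost: int = 0


volvo = CarRecord(maker="Volvo", model="240", cost=90000)
same_volvo = CarRecord(maker="Volvo", model="240", cost=90000)

print(volvo)                 # a readable __repr__ is generated for us
print(volvo == same_volvo)   # so is a field-by-field __eq__
```

Note that the annotations are not enforced at run time; they describe intent and feed tools such as mypy.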

# Useful intuitions about some built-in types (introducing sets)

* Why do we care?
* So far:
    * `list`. Mutable, quick access by index. Finding in general slow.
    * `tuple`. Immutable. Fast, efficient, quick access by index. No need for copying - efficient in some situations. But: finding in general slow (as with lists).
    * `dict`. Mapping, based on hash tables. Very fast access by key, finding by value slow. Membership test (on keys) very fast. Slight memory overhead. Requires keys to be hashable.
    * Yet another useful type: `set`. Based on hash tables, with **very fast membership tests**. Mathematical-set methods.

In [None]:
import random, timeit

N = 9999999
vals_list = list(range(N)) 
vals_set  = set(range(N))

# Pick some random element, just for demonstration purposes.
needle = random.randrange(N)   # always one of the stored values 0..N-1

print(f"Timing {needle} in vals_list")
print(timeit.timeit(f"{needle} in vals_list", number = 100, globals = globals()))

print(f"Timing {needle} in vals_set")
print(timeit.timeit(f"{needle} in vals_set", number = 100, globals = globals()))

Conclusion: if you need to perform lots of membership tests or lookups by key, hash-based types such as `set` and `dict` might be useful.

In [None]:
words = set(["cat", "snape", "doge"])
animals = set(["cat"])

# What are the commonalities?
words.intersection(animals)


In [None]:
# Which words are not in animals?
words.difference(animals)

In [None]:
# How would we add a word?
animals.add("giraffe")
animals.add("giraffe")  # Redundant - sets contain no duplicates.
animals

* Immutable version: `frozenset`. Can be used as a key in a dictionary.
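
A sketch of a `frozenset` as dictionary key. The data (city pairs and prices) is invented for the example.

```python
# Hypothetical data: prices for unordered pairs of cities.
ticket_price = {
    frozenset(["Linköping", "Stockholm"]): 300,
    frozenset(["Linköping", "Malmö"]): 450,
}

# Order does not matter for sets, so either direction finds the same key.
price = ticket_price[frozenset(["Stockholm", "Linköping"])]
print(price)

try:
    {set(["a", "b"]): 1}     # ordinary sets are mutable, hence unhashable
except TypeError as error:
    print("set as key failed:", error)
```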

**Why do we care about this?**

# Modules (once more)

* Those `import math`, `import my_own_module`...
* Where do they come from? [[reference](https://docs.python.org/3.7/tutorial/modules.html#the-module-search-path)]
    * Beginner's note: some installation system usually takes care of this for you. But useful to know if something breaks.
* When you write `my_module.py` you can do `import my_module` at least from the same directory (see search path above).
* You will see `__init__.py` around. This concerns [packages](https://docs.python.org/3.7/tutorial/modules.html#packages).
* Namespaces.
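
A small sketch of two of the points above: where Python searches for modules, and modules as namespaces. It only uses the standard library; what `sys.path` contains depends on your installation.

```python
import math
import sys

# Directories (and zip files) searched, in order, when importing.
for entry in sys.path[:3]:
    print(repr(entry))

# Every module is a namespace with a name of its own.
print(math.__name__)
print("math" in sys.modules)   # already-imported modules are cached here
```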

In [None]:
# Example package: sklearn, and its datasets
import sklearn
import sklearn.datasets
help(sklearn.datasets)

# Scripts

* Back to Python as a glue.
* Take us outside Notebooks.

* Telling the (Linux/Mac) system how the file should be interpreted:

In [None]:
#!/usr/bin/env python3

# As the first line.

* Telling the Linux/Mac system that you should be able to run myscript.py:

$ `chmod u+x myscript.py`. 

(In a terminal.)

* Making the script behave differently if it was imported:

In [None]:
if __name__ == "__main__":
    print("Running as main.")
else:
    print("I was imported!")

* Several useful modules to check out. For instance `sys`, `subprocess` (OS commands), `argparse`.
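
A sketch of what a small script using `argparse` might look like. The script's purpose (counting words) and all names in it are invented for this example. In a real script file, the final call would normally be `main()` inside the `if __name__ == "__main__":` guard shown above; here it is called with an explicit list so that the block also runs in a Notebook.

```python
#!/usr/bin/env python3
"""Hypothetical script: count the words in the given strings."""
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Count words.")
    parser.add_argument("texts", nargs="+", help="one or more strings")
    parser.add_argument("--verbose", action="store_true",
                        help="print one line per string")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:]; a list is handy for testing.
    args = build_parser().parse_args(argv)
    counts = [len(text.split()) for text in args.texts]
    if args.verbose:
        for text, count in zip(args.texts, counts):
            print(f"{count}\t{text}")
    print(sum(counts))
    return sum(counts)


# Simulates running: ./wordcount.py --verbose "one two" "three"
total = main(["--verbose", "one two", "three"])
```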

# Pointers to noteworthy data science packages

* [matplotlib](https://matplotlib.org/) (home page with tutorials)
* [scikit-learn](https://scikit-learn.org/stable/index.html) (import as sklearn)
* [numpy](http://www.numpy.org/). Arrays!
* [scipy](https://www.scipy.org/) Note: sparse matrices.
* [pandas](https://pandas.pydata.org/)

* Entire courses.

# Wrapping up (and handwaving intro to SVD)