# Functions and object-oriented programming

Functions allow you to group together related operations in such a way that you can abstract away details in your program. Two main use cases of functions come to mind: 
1. Avoiding repetition and the bugs that can come from inconsistent code;
2. Grouping together operations used elsewhere (like in list comprehensions and equality comparisons).
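
As a small sketch of the second use case, the helper below is written once and then reused in both a list comprehension and an equality comparison. The name `normalise` and the sample data are assumptions made up for illustration.

```python
# Hypothetical helper for illustration: tidy up a name before comparing it.
def normalise(name):
    return name.strip().lower()

names = ["  Ada", "GRACE ", "ada"]

# Reused inside a list comprehension...
clean_names = [normalise(n) for n in names]
print(clean_names)  # ['ada', 'grace', 'ada']

# ...and inside an equality comparison.
print(normalise(names[0]) == normalise(names[2]))  # True
```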

## Defining a function 

We have already seen a few functions such as `print()` and `len()`. Building your own functions is a crucial part of coding. Without user-defined functions, you are left with code that is literally just one command after another. With functions you can abstract away the common parts, code them once inside the function, and then send the unique or novel parts to the function as __arguments__.

Below is an example of some repetitive code followed by an example of a function that factors out the parts of the repetitive code. A function starts with `def` (for 'define'), followed by a __name__, and an __argument__ on the same line. Then "inside" the function (which is also denoted visually with indentation) is the code that is run each time the function is called. This code will use the values sent in (those are the arguments) and then typically `return` some object as output.  

In [1]:
# Try to identify the repetition
list_result = []

x = 5
if x %2 == 1:     list_result.append(x * 2)
else:             list_result.append(x)

x = 7
if x %2 == 1:     list_result.append(x * 2)
else:             list_result.append(x)

x = 12
if x %2 == 1:     list_result.append(x * 2)
else:             list_result.append(x)

print(list_result)        

[10, 14, 12]


In [1]:
def doubleIfOdd(num):
    if num % 2 == 1: 
        return num * 2
    else: 
        return num

print([doubleIfOdd(x) for x in [5,7,12]])

[10, 14, 12]


To build that function we need to do the four things specified above: 

- Name
- Inputs
- Calculations
- Outputs

The name was `doubleIfOdd`, the inputs in this case referred to a single variable called `num`, the calculations referred to the `if-else` statement, and the output referred to what we returned, namely `num` or `num*2`. Below we can see a similar function, except it doubles numbers if they are even.  

In [3]:
def doubleTheNumberIfEven (input_number): 
    if input_number%2==0:
        return input_number * 2
    else:
        return input_number
    
numbers = [1,4,6,7,9,14,17]

new_numbers = [doubleTheNumberIfEven(i) for i in numbers]

print(new_numbers)

[1, 8, 12, 7, 9, 28, 17]


## Variables have a 'scope'
A variable that is created inside of a function is not the same as the one created outside of that function even if they have the same name. This is because the variable inside the function is a __local__ variable. Variables created in Jupyter are typically treated as __global__ variables if they are created in a cell but not if they are created inside a function. To be global means that they can be used anywhere in the code. Local variables are created and destroyed within their local context. You can watch this behavior with a code snippet. 

In [4]:
# Local / Global scope example 1: Variable in the function stays in there.

def multiplyTheValue(input_number):
    x = input_number * 2
    print("Value of x inside the function",x)
    return x 

x = 4 
output_number = multiplyTheValue(x)
print("Result from the function:",output_number)
print("Value of x after the function:",x)


Value of x inside the function 8
Result from the function: 8
Value of x after the function: 4


But ```x``` wasn't the argument, ```input_number``` was. So what if we change ```input_number``` inside the function? 

In [5]:
# Local / Global scope example 2: Argument sent to function doesn't escape the function.

def multiplyTheValue(input_number):
    print("Inside the function",input_number)
    return input_number 

x = 4 
output_number = multiplyTheValue(x)
print("After the function",input_number)
print("Value of X after the function:",x)

Inside the function 4


NameError: name 'input_number' is not defined

We sent `x` to the function, at which point it became the value for the `input_number` parameter. So we could use `input_number` inside the function, but when we try to use it outside the function Python raises a `NameError`. Making a variable available outside the function is not an advised code pattern, but it is possible by using the `global` keyword.

In [6]:
# Local / Global scope example 3: Declaring a variable as global makes it available outside the function.

def multiplyTheValue(input_number):
    global x
    x = input_number * 2
    print("Value of x inside the function",x,id(x))
    return x 

x = 4
print("Value of x before the function",x,id(x))
output_number = multiplyTheValue(x)
print("Value of x after the function",x,id(x))
print("After the function",output_number)

Value of x before the function 4 4336056768
Value of x inside the function 8 4336056896
Value of x after the function 8 4336056896
After the function 8


In this third example, we can see that when we declare `x` as a global variable inside the function, the value assigned there becomes the value outside of the function. We double `x` inside the function, and when we later print `x` it is no longer 4; it retains the value it had inside the function. The changed `id(x)` shows that the global name `x` now points to a different object.
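
Since `global` is not an advised pattern, here is a minimal sketch of the usual alternative: the function only returns a value, and the calling code decides whether to overwrite `x`. The function name `doubleTheValue` is an assumption chosen for this illustration.

```python
# No `global` needed: return the result and let the caller reassign.
def doubleTheValue(input_number):
    return input_number * 2

x = 4
x = doubleTheValue(x)  # the caller explicitly overwrites x
print("Value of x after the function", x)  # Value of x after the function 8
```

The effect on `x` is the same as in example 3, but the change is now visible at the place where the function is called rather than hidden inside it.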

## There are all kinds of ways of passing data to a function. 

A function usually has some _parameters_. Parameters are like another word for options or settings. When you define a function, it is parameters that you write between the parentheses. But when you are coding you are more interested in the values of these parameters. These are _arguments_. So in the function: 

In [7]:
def tinyexample(word):
    print("Tiny examples!", word)

tinyexample("Big ideas!")

Tiny examples! Big ideas!


`word` is the parameter, `"Big ideas!"` is the argument. That said, most people use these terms interchangeably.

There are a number of different kinds of parameters. Some of these allow a function to take in a flexible number of arguments, others define the type of argument that the parameter will permit. Parameters can take default values. If the parameter has a default value, then one does not need to send an argument when running the function. 

Note that since a function can have a combination of different parameter types, the ones without defaults come first. Let's see how some different functions take multiple arguments below: 

In [8]:
# Example 1. Just a single positional argument
def example1(just_name):
    print(just_name)

example1("example 1 argument")

example 1 argument


In [2]:
# Example 2. A positional argument with a default value
def example2(arg_name, setting1 = True, setting2 = True ):
    if setting1:
        print(arg_name)
        return
    elif setting2:
        print(arg_name.upper())
        return
    else:
        print(f"{arg_name} You have disabled the settings")

In [3]:
example2("Example 2. Take 1.")

example2("Example 2. Take 2a.", setting2 = False)

example2("Example 2. Take 2b.", True, False)

example2("Example 2. Take 3.", False, False)

Example 2. Take 1.
Example 2. Take 2a.
Example 2. Take 2b.
Example 2. Take 3. You have disabled the settings
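
The rule that parameters without defaults come first can be checked directly. The sketch below uses the built-in `compile()` to test a deliberately wrong definition without stopping the cell; the function name `greet` and its parameters are assumptions for illustration.

```python
# A definition with a default BEFORE a non-default parameter is not valid Python.
bad_definition = "def greet(greeting='Hello', name): pass"

try:
    compile(bad_definition, "<example>", "exec")
    is_valid = True
except SyntaxError as error:
    is_valid = False
    print("SyntaxError:", error.msg)

# The valid ordering: required parameter first, parameter with a default second.
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

print(greet("Ada"))                      # Hello, Ada!
print(greet("Ada", greeting="Welcome"))  # Welcome, Ada!
```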


In [11]:
# Example 3. Positional arguments passed but not defined ahead of time
def example3(just_name, *args):
    if len(args) > 0:
        for i in args: print(i)

example3("some data","Maybe","more data")

# Below, why does it not print 'some data'?

Maybe
more data
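
One way to answer the question in Example 3 is to print both names inside the function. In the sketch below (the function name `showArguments` is an assumption for illustration), the first positional argument is bound to `just_name` and only the remaining ones are collected into the tuple `args`.

```python
def showArguments(just_name, *args):
    # The first positional argument is bound to just_name...
    print("just_name:", just_name)
    # ...and everything after it is collected into a tuple called args.
    print("args:", args)
    return just_name, args

result = showArguments("some data", "Maybe", "more data")
# just_name: some data
# args: ('Maybe', 'more data')
```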


In [12]:
# Example 4. Keyword arguments passed but not defined ahead of time
def example4(just_name,**kwargs):
    if len(kwargs) > 0:
        for i,j in kwargs.items(): 
            print("var name:",i,"\tvalue:",j)

example4("example",
         var1="some data from v1",
         var3="Maybe it's v3?",
         var2="v2's valuedata")

var name: var1 	value: some data from v1
var name: var3 	value: Maybe it's v3?
var name: var2 	value: v2's valuedata


In [13]:
# Example 5. Showing the possibilities (and dangers) of fragile code and dynamically typed variables.

def MakeDouble(value):
    try: 
        output = value*2
    except TypeError:
        output = None
        
    return output

print( MakeDouble(2)  )
print( MakeDouble("Double")  )
print( MakeDouble(["2"]))
print( MakeDouble({1:4}))

4
DoubleDouble
['2', '2']
None
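
If doubling a string or a list is not what you intended, one possible guard is to check the type before calculating. This is only a sketch: the name `makeDoubleStrict` and the decision to raise a `TypeError` (rather than return `None`) are assumptions, not part of the example above.

```python
from numbers import Number

def makeDoubleStrict(value):
    # Only double numeric values; bool is excluded because True * 2 == 2 is rarely intended.
    if isinstance(value, Number) and not isinstance(value, bool):
        return value * 2
    raise TypeError(f"Expected a number, got {type(value).__name__}")

print(makeDoubleStrict(2))    # 4
print(makeDoubleStrict(2.5))  # 5.0

try:
    makeDoubleStrict("Double")
except TypeError as error:
    print("Rejected:", error)  # Rejected: Expected a number, got str
```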


## A function always returns, but it might be nothing at all.

Your function always stops at the return statement. You can have multiple return statements for different conditions (like saying if...return one thing and else...return another). After a return statement runs, the remaining code in the function will not be evaluated. But if your function does not have a return statement, Python will still return `None` (which, if you remember from above, is treated as `False` in a condition). Just try it for yourself.

In [14]:
def noReturn():
    pass

print(noReturn())

if noReturn(): 
    print("Did it work?")
else:
    print("Oh right, None evaluates to false.")

None
Oh right, None evaluates to false.
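
The other half of the claim, that nothing after a `return` runs, can be sketched the same way. The function `describeNumber` below is a made-up example with several return statements and one line that can never be reached.

```python
def describeNumber(num):
    if num < 0:
        return "negative"   # the function stops here for negative numbers
    if num == 0:
        return "zero"
    return "positive"
    print("This line is never reached")  # unreachable: it sits after a return

print(describeNumber(-3))  # negative
print(describeNumber(0))   # zero
print(describeNumber(7))   # positive
```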


# Classes and Objects 

Classes are a means of grouping together related variables and methods under a single name. A class then becomes the template for some kind of object. Stated differently, every object is an instance of some _class_. Object-oriented programming is one of many paradigms of programming, and not all programs need to be object-oriented. Regardless, Python is predominantly object-oriented (as are Java, C++, Swift, Objective-C, and Ruby, for example).

To say that a program is object-oriented means that it uses _objects_ as a part of its processing. An object is the generic term for any data structure that can be created by a program. A nice feature of an object is that it can contain other objects unless it is a 'primitive'. In many languages a character is a primitive but a string is a collection of characters (Python has no separate character type; a single character is simply a string of length one). We can also have a collection of strings (like a list of strings or a dictionary of key:value pairs). Objects have specialised methods. For example, the string object has `.upper()` and `.lower()` methods.

We would say that objects of the same type are _instances_ of the same _class_. So, above, when I said "the string object has...", that was shorthand. More specifically, I could have said "any instantiated object of the string class can use the..."
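
A quick sketch of what "instance of a class" means in practice, using only built-in functions: `type()` reports the class of an object and `isinstance()` checks membership.

```python
word = "hello"

print(type(word))             # <class 'str'>
print(isinstance(word, str))  # True

# Methods are defined on the class and shared by every instance.
print(word.upper())           # HELLO
print("goodbye".upper())      # GOODBYE
print(str.upper(word))        # HELLO (the same method, called via the class)
```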

You can create a class from scratch or extend an existing class.

## Creating classes using `__init__`

To create an object, you first need to define the class name and then provide an internal method called `__init__`. This method will automatically run every time you create (or "initialise") a new object of that type. So if you had a class called `Pizza` which you know creates pizza objects, then you would probably initialise it with a few relevant variables such as `toppings = []`, `sauce = 'tomato'`, and `base = 'classic'`. You can then modify the pizza object. This would be a basic Pizza class:

In [15]:
class Pizza: 
    def __init__ (self):
        self.toppings = []
        self.base = 'classic'
        self.sauce = 'tomato'
        
p = Pizza()
z = Pizza()
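
To see what "modify the pizza object" looks like, here is a short sketch that repeats the class so it runs on its own. The topping and base values are made up; the point is that each instance keeps its own data.

```python
class Pizza:
    def __init__(self):
        self.toppings = []
        self.base = 'classic'
        self.sauce = 'tomato'

p = Pizza()
z = Pizza()

# Modify only the first pizza.
p.toppings.append("red peppers")
p.base = "thin and crispy"

print(p.toppings, p.base)  # ['red peppers'] thin and crispy
print(z.toppings, z.base)  # [] classic
```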

Now we can consider the pizza object as a combination of multiple other objects that all work together. A shopping cart, for example, might be a class that includes a list of items, a discount code, and an identifier for the customer that owns the shopping cart. Admittedly, for something like pizza or a shopping cart we can also get away with just using a dictionary. That is, we could have simply written: 

~~~ python
pizza = {"toppings": [], "base": "classic", "sauce": "tomato"}

pizza["toppings"].append("red peppers")
pizza["base"] = "thin and crispy"
~~~

So what is the advantage of using a class rather than this structure? It depends on the purpose. For simple data transfer, actually it is nice to just keep it as dictionaries and lists. Later when we look at JSON files from the web we will see how they are essentially just collections of lists and dictionaries. But when programming, it is useful to be able to have a _structure_ to the various objects that are related to each other. This structure can give some sense to the objects as well as ensure that they all work in sync. For example, what if we want to manage two pizza orders? Will we create another variable called `pizza2`? 

Below I will show two approaches to printing off a receipt. Compare how I would do it for a dictionary like above, and then for a class: 

In [16]:
cart = {"items":[], "code":None, "customer":None} 

cart["items"] = ["Turntable","Microphone","Keyboard"]
cart["code"] = "HAPPY2020"
cart["customer"] = "Tom"

print(f'Welcome {cart["customer"]}\n\nYour items:\n{" ".join(cart["items"])}\nDiscount code:{cart["code"]}')

Welcome Tom

Your items:
Turntable Microphone Keyboard
Discount code:HAPPY2020


In [17]:
class Cart: 
    def __init__(self):
        self.items = []
        self.code = None
        self.customer = None
        
    def receipt(self):
        message = f"Welcome {self.customer}\n\nYour items:\n"
        message += "\n".join(self.items) + "\n"
        
        if self.code:
            message += f"Discount code:{self.code} applied"
        return message
            

So above is just the class definition. From here we can see a couple of differences. The first is that the `receipt` function is inside the `Cart` class, which makes it a method. The second is that when we refer to variables that belong to a `Cart` object inside of the class definition, we refer to them as `self.<name>`. Note that `__init__` is never really called directly: you never say `x = Cart.__init__()`; instead you initialise by saying `x = Cart()`, which then automatically runs the `__init__` method. In this case, it will create three internal variables, `self.items`, `self.code`, and `self.customer`, and give them some values. Although this seems a little overkill compared to the dictionary, it creates more of a structure to work with. Then we can create multiple cart instances, as can be seen below.

In [18]:
x = Cart()

x.items = ["Turntable","Microphone","Mixer"]
x.code = "HAPPYSPINNING"
x.customer = "Chuck"

print(x.receipt())

Welcome Chuck

Your items:
Turntable
Microphone
Mixer
Discount code:HAPPYSPINNING applied


Compare how the receipt was printed this time with the code above. We abstracted away the details of printing to the `receipt()` method of the `Cart` class, which we defined elsewhere. We were still able to access the objects in the `Cart` class, but instead of `self.items`, we first instantiated an object called `x`, and then used `x.items`. Some classes can be fussy and expect you to use a dedicated method to get these objects, like `x.get_items()`. Other times classes allow you to access the objects directly. It's a bit of trial and error as well as checking in on the docs for a particular package. 
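
As a sketch of the "fussy" style, the hypothetical class below (the names `StrictCart`, `add_item`, and `get_items` are assumptions for illustration) keeps its list behind a leading underscore and hands out copies through a dedicated method.

```python
class StrictCart:
    def __init__(self):
        # The leading underscore is a convention meaning "internal, please don't touch".
        self._items = []

    def add_item(self, item):
        self._items.append(item)

    def get_items(self):
        # Return a copy so callers cannot change the internal list by accident.
        return list(self._items)

cart = StrictCart()
cart.add_item("Turntable")

items = cart.get_items()
items.append("Mixer")      # changes only the copy

print(cart.get_items())    # ['Turntable']
```

Python does not enforce the underscore; it is a signal to other programmers that the attribute is meant to be reached through the class's methods.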

Below I will create a second object just to demonstrate how we can have separate `Cart` objects and use them together in a `print()` statement.

In [19]:
y = Cart()
y.items = ["808 Drum Machine", "Keyboard", "Laptop"]
y.customer = "Caterina"

print(y.receipt(),x.receipt(),sep="\n###########\n")

Welcome Caterina

Your items:
808 Drum Machine
Keyboard
Laptop

###########
Welcome Chuck

Your items:
Turntable
Microphone
Mixer
Discount code:HAPPYSPINNING applied


## Extending classes and inheriting values

There are cases in both data access and machine learning where a library provides a class that is almost fit for purpose but is missing a few key methods. You can then 'extend' this class with your own versions of these methods.

One example concerns Twitter data. There is a way to get Python to listen for new tweets based on some criteria, such as when someone tweets `#BLM`. In the `tweepy` library (versions before 4.0) this is done using the `StreamListener` class. When you instantiate a stream listener, it will handle many of the details automatically, like connecting to Twitter and receiving data according to your search parameters. However, it simply listens and does not do anything with the data it receives. For you to do something with the data, you need to _extend_ the `StreamListener` class. This extension, perhaps called `CustomStreamListener`, will _inherit_ all the methods and objects in the `StreamListener` class, but you can add your own additional methods. One method that it will look for is called `on_data`. This method will be called any time a tweet matches your search terms, and then you get to define what to do with that data. For example, you could fill the `on_data()` method with instructions such as "check for hate speech" or "store in a database" or "reply automatically with a message".
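
The pattern can be sketched without connecting to Twitter at all. In the block below, `FakeStreamListener` and its `listen` method are hypothetical stand-ins invented for this illustration; they are not the real library's API. Only the idea of overriding `on_data` in a subclass is taken from the description above.

```python
# Hypothetical stand-in base class; NOT the real Twitter library.
class FakeStreamListener:
    def on_data(self, data):
        # The base class receives data but does nothing with it.
        pass

    def listen(self, incoming):
        # Pretend each item in `incoming` is a tweet arriving from the stream.
        for data in incoming:
            self.on_data(data)

class CustomStreamListener(FakeStreamListener):
    def __init__(self):
        self.stored = []

    def on_data(self, data):
        # Our own behaviour: keep only the tweets that mention the hashtag.
        if "#BLM" in data:
            self.stored.append(data)

listener = CustomStreamListener()
listener.listen(["hello world", "marching today #BLM", "lunch time"])
print(listener.stored)  # ['marching today #BLM']
```

The subclass never defines `listen`; it inherits it, and the inherited method ends up calling the subclass's own `on_data`.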

Here is a simple example building on the `Cart` above. Notice that in the `Trolley` class (which is what you would often call a cart in the UK), we do not define `self.items` or `self.customer` ourselves, since they are set up by the _inherited_ `Cart.__init__` method. But now we will add a user `post_code`, since in the UK addresses have post codes.

In [20]:
class Trolley(Cart): 
    def __init__(self): 
        Cart.__init__(self) # observe what happens if you remove this!
        self.post_code = "OX1 3JS"
        
    def delivery(self):
        message = "Your basket currently includes:\n"
        message += "\n".join(self.items) + "\n"
        message += "It will be delivered to " + self.post_code

        return message

In [21]:
z = Trolley()
z.items = ["Cables","Cassette Player"]

z.placename = "OII"

print(z.receipt())

print(z.delivery()) 

Welcome None

Your items:
Cables
Cassette Player

Your basket currently includes:
Cables
Cassette Player
It will be delivered to OX1 3JS
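
The comment in the `Trolley` class invites you to remove the call to `Cart.__init__(self)`. The sketch below shows what to expect, using small stand-in classes (`MiniCart`, `ForgetfulTrolley`, and `CarefulTrolley` are names invented here so that the `Cart` defined earlier is not overwritten).

```python
class MiniCart:
    def __init__(self):
        self.items = []

class ForgetfulTrolley(MiniCart):
    def __init__(self):
        # The call to MiniCart.__init__(self) is deliberately missing here.
        self.post_code = "OX1 3JS"

class CarefulTrolley(MiniCart):
    def __init__(self):
        super().__init__()  # equivalent to MiniCart.__init__(self) in this case
        self.post_code = "OX1 3JS"

try:
    ForgetfulTrolley().items
    outcome = "items exists"
except AttributeError as error:
    outcome = "AttributeError"
    print("AttributeError:", error)

print(CarefulTrolley().items)  # []
```

Methods are inherited automatically, but the instance variables only exist once the parent's `__init__` has actually run.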


## Reasons to use a class

Part of the reason for showing a class is that it helps us understand the basis of objects, as each object is necessarily an __instance__ of some _class_ of object. Later, when we work with data, we will be using DataFrames, which are tables that contain data. These `DataFrame` objects have their own methods, but also _inherit_ methods. We will want to know how to create an instance of a `DataFrame` object, what it means to send different arguments, and how to query for parts of the `DataFrame`. Observe the code below:

~~~ python
import pandas as pd 

df = pd.DataFrame(columns=["name","age"]) 
~~~

You can already notice that `pandas` is a library. In this library, which we have imported under the name `pd` for short, is a class called `DataFrame`. By calling `df = pd.DataFrame()` we are creating an __instance__ of the `DataFrame` class called `df`. By using `columns=["name","age"]` we are sending these two values to the `DataFrame.__init__` method. Thus, when it initialises the DataFrame object `df`, we will have a table with two columns, `name` and `age`. See below (notice that it may run slowly the first time you `import pandas`).

In [22]:
import pandas as pd 

# An empty table with two columns and five rows
df = pd.DataFrame(columns=["name","age"],index=range(5)) 
df

  name  age
0  NaN  NaN
1  NaN  NaN
2  NaN  NaN
3  NaN  NaN
4  NaN  NaN


The DataFrame in this case is now an empty table. Creating a table with data or manipulating data is outside the scope of this book. Rather, it is where we start off in the book "From Social Science to Data Science".

# Conclusion

Now we can see how programming can become pretty complicated, with objects referring to other objects and other functions or methods all over the place. When I'm trying something new with programming, I often have to check the documentation or print a lot to get a sense of what methods an object has available, or simply to determine what type of object was returned from some method or function. Being able to understand how to query an object or manipulate it will be an important skill moving forward in Python and in giving your scripts some structure. This structure is not merely for its own sake. It helps to create code that is more reusable and robust. By structuring our code we are structuring our ideas about data. That is good if we want to do something repeatedly or consistently across many cases.
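
As a minimal sketch of that "print a lot" habit, the built-in functions `type()`, `dir()`, and `help()` are usually enough to find out what you are holding. The variable names here are just for illustration.

```python
value = "some text".split()

# What type of object did split() return?
print(type(value))  # <class 'list'>

# Which public methods does it offer? dir() lists every attribute name.
public_methods = [name for name in dir(value) if not name.startswith("_")]
print(public_methods)

# help(value.append) would print the documentation for a single method.
```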

The next and final chapter does not expand our basic programming knowledge much. Instead, it will focus on how to get out of Jupyter by writing Python scripts as well as by learning the basics of how to read and write files.