# Functions

We've been using functions all along. They allow us to write a generic block of code that can be reused for all kinds of different purposes. If we write code to add tax to a sale, combine first and last names, or convert between inches and cm, the actual functionality of that code is always the same, just with different details, or inputs. A function allows us to write generic code that accepts inputs, does its job, and spits out an output. 

## Defining Functions

Creating a function is pretty simple. There are a few things we need to think of:
<ul>
<li> Function name - as with variables, the name can be almost anything, but we want one that makes sense. </li>
<li> Inputs/Parameters - these are the values that are supplied by the person <i>using</i> the function. These change (potentially) every time the function runs. </li>
<li> Body - the code that we want to reuse over and over.</li>
<li> Output/Return Value - this is the value that the function "returns" or the end result. Not every function needs to return a value, some just do stuff. </li>
</ul>

![Functions](../images/function.png "Functions")

For example, if we have a function to add the tax to a price, our parts break down as:
<ul>
<li> Function name - addTax. </li>
<li> Parameters - the price of the object and the tax rate. These can change with every call. </li>
<li> Body - calculate the tax and new price. </li>
<li> Return value - the newly calculated total price.</li>
</ul>

We have to define our function first, then we can use it. 

In [9]:
def addTax(price, taxRate):
    total = price + (price * taxRate)
    return total

#### Calling a Function

We can call, or use, a function by typing the name and supplying the parameters. Note that the parameters that we supply can be either in order, or we can specify which parameter is which. If we specify which parameter is which, we can supply them in any order.

<b>Note:</b> we commonly see functions that we import using a full "path" to the function, like <code>math.sqrt()</code>. This is because the function is part of a module, and we need to tell Python where to find it inside the "math" module. Everything else is the same. 

<b>Rule of Thumb:</b>
<ul>
<li> If you are using a function that you wrote, you can just use the name of the function. </li>
<li> If you are using a function that is part of a module, you need to use the full path to the function. </li>
    <ul>
    <li> I.e. [Thing that was imported].[function name] </li>
    </ul>
</ul>

This can be a little confusing at first, but it is rarely a real problem. One thing that may be confusing is that people import things differently, so you may see <code>import math</code> or <code>from math import sqrt</code>. The first imports the whole math module, so we call <code>math.sqrt()</code>; the second imports only the sqrt function, so we call <code>sqrt()</code> directly. These differences don't change what the function does, so we just need to know how to make sense of what to call. 
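
As a small illustration of the two import styles (using only the standard-library math module), both calls below reach the same function and both print 4.0; only the name we type differs:

```python
import math              # import the whole module
from math import sqrt    # import just one function from it

# Module imported: use the "full path" to the function
print(math.sqrt(16))

# Function imported directly: use its bare name
print(sqrt(16))
```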

In [10]:
print(addTax(79, 0.12))

obj_price = 112
gst_rate = 0.07

print(addTax(obj_price, gst_rate))

88.48
119.84


In [11]:
print(addTax(taxRate=gst_rate, price=obj_price))

119.84


## Void vs. Fruitful Functions

The function we wrote above is an example of a "fruitful" function, as the book labels it: a function that returns some value. We can also have a "void" function, which does not return a value. These functions are used to do something, like print, but not return anything.

When using a fruitful function, the function call in the code will "become" the return value, so we can set a variable to the result of that function call. This is commonly what we want to do with such functions. 

In [12]:
price_with_tax = addTax(obj_price, gst_rate)
print(price_with_tax)

119.84


#### Void Function Example

We can make a similar function to the tax one above, but void. Since we aren't returning the "answer" as a return value, we need to use that value for whatever we need it for inside the function. Note that when we attempt to set a variable to the result of the function, it doesn't work as before, because the function doesn't return anything. 

We can have a function that returns a value, but we don't have to use it. We can just call the function and ignore the return value. This approach is probably better in most cases where we might write a void function: do the "thing" and also return the results. If the user wants to ignore them, no problem. We also sometimes have functions which do some action, then return a status code, like 1 for success, 0 for failure.

In [13]:
def addTaxVoid(price, taxRate):
    total = price + (price * taxRate)
    print(total)

In [20]:
tmp = addTaxVoid(79, 0.12)
print(tmp)

88.48
None
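
A minimal sketch of the "do the thing and also return the result" approach mentioned above. The name addTaxPrintAndReturn is made up for this illustration:

```python
def addTaxPrintAndReturn(price, taxRate):
    total = price + (price * taxRate)
    print(total)      # the "action", as in the void version
    return total      # also hand the result back to the caller

# We can ignore the return value...
addTaxPrintAndReturn(100, 0.05)

# ...or keep it and use it later
saved_total = addTaxPrintAndReturn(100, 0.05)
print(saved_total * 2)
```

Callers who only care about the printed value lose nothing, and callers who need the number can still get it.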


### Default Argument Values

We can also specify default values for our parameters, so that if the user doesn't supply them, we can use the default. This is useful for optional parameters, or parameters that are commonly used. For our taxation example, we can insert a default of 5% for the GST - that can be overridden by the user by just supplying a different value, but if they don't, we'll use 5%. This is commonly seen when we use many of the functions that are built-in or from the common libraries. For example, the sort method that we can call on a list sorts in ascending order by default, but we can specify descending if we want.

In [5]:
forward_list = [1, 2, 3, 4, 5]
print(forward_list)
forward_list.sort(reverse=True)
print(forward_list)

[1, 2, 3, 4, 5]
[5, 4, 3, 2, 1]


#### Making Named Arguments

Our tax function works well with a default. We can create a default for the tax rate by assigning a value to the parameter in the function definition. One thing to note is that any parameters without a default must come before any parameters with a default, so our price needs to come before the tax rate. Within each group the order doesn't matter to Python, but parameters are generally listed from the most important to the least important.

In [6]:
def addTaxDefault(price, taxRate=0.05):
    total = price + (price * taxRate)
    return total
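
To illustrate the ordering rule, here is a sketch with one required parameter and two defaults. The function addTaxAndTip and its tipRate parameter are hypothetical, added only for this example:

```python
# Parameters without defaults come first, then the ones with defaults
def addTaxAndTip(price, taxRate=0.05, tipRate=0.0):
    total = price + (price * taxRate) + (price * tipRate)
    return total

print(addTaxAndTip(100))                # uses both defaults
print(addTaxAndTip(100, tipRate=0.10))  # keeps the default taxRate, overrides tipRate

# Putting a default first is not allowed; uncommenting the next line
# causes a SyntaxError before the code even runs:
# def badOrder(taxRate=0.05, price): ...
```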

Note that when we supply the optional parameter, we can either pass it in order or specify the name. If we specify the name, we can supply the parameters in any order; if we don't specify the names, we need to stick to the order. 

<b>Note:</b> while learning, using descriptive argument names along with specifying the names when calling the function may be helpful, as it makes everything very explicit and probably easier to interpret.

In [9]:
base_price = 100
tax_free = addTaxDefault(base_price, taxRate=0)
print(tax_free)
taxed = addTaxDefault(base_price)
print(taxed)
overtaxed = addTaxDefault(base_price, 0.15)
print(overtaxed)
supertaxed = addTaxDefault(taxRate=0.25, price=base_price)
print(supertaxed)

100
105.0
115.0
125.0


## Exercise

Define a function that takes in two arguments - a price and a discount rate. The function should return the new price after the discount is applied. Think about what the function should be called, what the parameters should be, what the return value should be, and what may be tested to ensure that it works correctly. 

If this is easy, return a tuple of new price and discount amount and, in your function call, print the two results on two separate lines. If that's easy, apply a discount and tax, and return the final price, discount amount, and tax amount.

In [14]:
# Create discount function

### Variable Scope

One thing to be aware of with functions is the availability of variables that we declare inside the function; this is called the "scope" of the variable. We'll look more at scope later, but functions are a good direct example of it. Scope refers to where a variable exists. Most of the variables we've declared so far exist everywhere in the notebook. For example, if we declare a variable in a cell, we can use it in any other cell.

Inside the functions we create, we can declare variables that are only available inside the function. This is a good thing: it means we can use the same variable names inside functions without worrying about them conflicting with other variables. In essence, the innards of the function are a separate world from the rest of the notebook. 

#### Scope Levels

We can define the different scope levels as:
<ul>
<li> Global - these variables are available everywhere in the notebook. Things you declare "outside" of any functions are global. </li>
<li> Enclosing - if we have nested functions, things on the outer layers are available on the inside, but the reverse is not true. </li>
<li> Local - these variables are only available inside the function where they are declared </li>
</ul>

![Scope](../images/var_scope.webp "Scope")

Once we have functions, we need to take scope into consideration and be clear about which variables are available where. When our code is looking for a variable, it works from the inside out, or bottom up on our list: it looks in the local scope, then the enclosing scope, then the global scope. We can reuse variable names in different scopes without them conflicting, but we should still consider if it makes sense logically. It is common to reuse names in different scopes from function to function, especially generically named ones like loop counters or parameters. We should be careful of overlapping names where the scope resolution can be confusing - for example, the reuse of x in the image above works, but is hard to understand. 
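
The lookup order can be sketched with a small nested function. The variable and function names here are invented for illustration:

```python
message = "global"            # global: declared outside any function

def outer():
    label = "enclosing"       # enclosing: belongs to outer(), visible inside inner()

    def inner():
        detail = "local"      # local: exists only while inner() runs
        # message is found in the global scope, label in the enclosing scope
        return message, label, detail

    return inner()

def shadow():
    message = "local copy"    # a new local variable; the global one is untouched
    return message

print(outer())
print(shadow())
print(message)                # still "global"
```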

## What to do in a Function?

Where, why, and when should we use functions? The answer is pretty simple: whenever we have a block of code that we want to reuse. More aggressively, if we have a specific thing we want to accomplish, we can write a function to do it. Placing our code into functions is never strictly necessary, but it's usually a good idea for a few reasons:
<ul>
<li> It makes our code more readable. Rather than having all details, the main portion of our code can have higher level abstractions - e.g. "len" rather than code that counts the number of letters in a string. </li>
<li> It makes our code more reusable. If our code is in a function, we can easily use it in many places with little effort. </li>
<li> It makes our code more modular. Individual functions can be added, removed, rearranged, or replaced without impacting the rest of the code. </li>
<li> It makes our code more testable. We can verify that a function does what it should, then trust in that when creating larger applications. </li>
</ul>

It is possible to write code that does everything in one big block, but it's not a good idea. It's much better to break things down into smaller pieces that are easier to understand and test. It is a good rule of thumb to put anything that we might want to do more than once, anything that requires many steps to do one "thing", or anything with a defined input and output into a function. You'll rarely go wrong by putting code into a function.
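
As a small sketch of the reuse and testability points, here is the inches-to-cm conversion from the introduction wrapped in a function, with a couple of quick checks. The name inchesToCm is our own choice:

```python
def inchesToCm(inches):
    # 1 inch is defined as exactly 2.54 cm
    return inches * 2.54

# Quick checks: once these pass, we can trust the function elsewhere
assert inchesToCm(0) == 0
assert abs(inchesToCm(10) - 25.4) < 0.0001

# Reuse: the same code works for any input
for length in [1, 12, 36]:
    print(length, "inches is", inchesToCm(length), "cm")
```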

## Exercise

Create a function that takes in a string and returns the first, middle, and last characters of the string in a new string. Think in particular about testing when writing this, as there are definitely design decisions and edge cases to consider.

In [15]:
# Write function


In [None]:
# Sample Test
#test_string = "I am a testing string for testing purposes"
#firstMidLast(test_string, len(test_string), len(test_string)/2)

## Exercise - Slightly More Complex

<b>Note: it is a good idea to pseudocode this one out first to see what the process is. Think about what the input is, what the output is, and what we'd need to do/call step-by-step to translate the input to the output.</b>

Create a function called federalTax that takes in an income, and returns the amount of federal tax owed. The tax rates are as follows:
<ul>
<li> 15% on the first $53,359 of taxable income</li>
<li> 20.5% on taxable income over $53,359 up to $106,717</li>
<li> 26% on taxable income over $106,717 up to $165,430</li>
<li> 29% on taxable income over $165,430 up to $235,675</li>
<li> 33% on any taxable income over $235,675</li>
</ul>
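
Before writing the function, it may help to work one case by hand. The sketch below taxes a made-up income of $63,359, using the $53,359 threshold from the second bracket. Only the part of the income that falls inside each bracket is taxed at that bracket's rate:

```python
income = 63359   # example income, chosen so the second slice is a round number

# Slice 1: everything up to the first threshold is taxed at 15%
first_slice = 53359
tax_on_first = first_slice * 0.15

# Slice 2: only the amount above the threshold is taxed at 20.5%
second_slice = income - 53359
tax_on_second = second_slice * 0.205

total_tax = tax_on_first + tax_on_second
print(round(total_tax, 2))
```

A general function would repeat this slicing for every bracket the income reaches.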

If you get that function down, create another function called albertaTax that takes in an income and returns the amount of tax owed. The tax rates are as follows:
<ul>
<li> 10% on the first $142,292 of taxable income</li>
<li> 12% on taxable income over $142,292 up to $170,751</li>
<li> 13% on taxable income over $170,751 up to $227,668</li>
<li> 14% on taxable income over $227,668 up to $341,502</li>
<li> 15% on taxable income over $341,502</li>
</ul>

If those work, create a function called taxationLoad, that takes in an income and returns a tuple of federal tax owed, provincial tax owed, total tax owed, and the effective tax rate.

<b>Finally, if you feel up to it, change the taxationLoad to allow people to be from different provinces, calculate their tax accordingly, and return the tuple above.</b> The arguments will need to be expanded to accept a province, and tax rates from other provinces are listed here: https://www.wealthsimple.com/en-ca/learn/tax-brackets-canada#provincial_territorial_tax_bracket_rates_2023 This is a good self check at this point - it is likely that you might not be able to do this entirely off the top of your head, but you should be able to work through it. 

As a super bonus, create a function that takes in a province and income, and tells you which province will save you the most in taxes, and how much that savings is. 

In [None]:
# Start taxing

## Importing Outside Functions

<b>(This is mostly for context. We need to import stuff frequently, but the details are not generally critical.)</b>

When we are using things like len(), type(), or pretty much anything else, we are using a function that was made by someone else and borrowed by us. We can also reuse our own functions in a similar way. 

In general, we want to put things that we want to repeat into functions, as it makes our code easier to maintain, understand, and debug. If we figure out some code to perform a calculation or print a set of charts, we can wrap that code in a function to allow us to repurpose it - we can perform the same calculation with any set of values, or print charts from any set of data. As well, as our code gets more complex, functions allow us to write the complex part only once, so if we ever need to modify it or correct some error, that change applies everywhere the function is used and we don't need to hunt down multiple corrections. 

### Import Statements

Import statements allow us to load in other files of code, or entire libraries of code, to use their functions without having to write them ourselves. We can import almost anything, but the most common things that we may pull in are:
<ul>
<li> Premade Libraries</li>
<li> Selfmade Helpers</li>
</ul>

Importing either just needs the import statement, along with the name of what we're importing. For the libraries that we use for most things, we can just search for the correct name; for example, we can import the "math" library to do some math-y stuff like take a square root. We can also put our own functions in a Python (.py) file that is in the same folder as our file, and import that; I've made the tax function above into addMyTax in the helper.py file. 

When using larger libraries that have many types of functions, it is common to import only certain parts or packages from the library. This will be very common with the large data science packages, and is fairly common in examples online. For example, sklearn is a very common data science library that we'll use all the time later on, and it has many packages inside it to do different types of things. In the example below, I imported only the "metrics" package from the sklearn library, then used the "accuracy_score" function from that package. This is not super common for us now, but it will be later on and in stats. 

In [16]:
import sklearn.metrics as ms
ms.accuracy_score([0,0,1,1], [1,1,1,1])

0.5

In [17]:
import math
print(math.sqrt(9))

3.0
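
The same alias pattern works with standard-library modules. As a sketch, the example below imports the statistics module under a short name, and separately imports a single function from it. The alias st is an arbitrary choice:

```python
import statistics as st          # whole module, under a shorter alias
from statistics import median    # just one function

scores = [70, 85, 90, 100]

print(st.mean(scores))   # called through the alias
print(median(scores))    # called by its bare name
```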


## Reading Files

There are several ways to read files in Python; here we'll look at the more basic ways. 

### Reading a File 

We can read a file in Python by first using the open() function. This function takes in the name of the file, and a mode. The mode is a string that tells Python what we want to do with the file. The most common modes are:
<ul>
<li> r - read, this is the default mode, it allows us to read the file. </li>
<li> w - write, this mode allows us to write to the file, but it will overwrite anything that is already there. </li>
<li> a - append, this mode allows us to add to the end of the file, but not overwrite anything that is already there. </li>
</ul>

There are several other modes, for the most part we can just look up the correct one using something like this flow chart when needed:

![File Access Modes](../images/file_access.png "File Access Modes")

The open() function returns a file object, which we can then use to read or write to the file. So we open the file, then interact with that object like we would a list or a string, asking it to read, write, etc. When we are done, we close that file connection. 

One thing to keep in mind is that files can be large, and reading the entirety may take time. On one project I worked on, a task that checked ~175 server logs to see if a backup failed overnight went from 6+ hours to ~10 minutes simply by rewriting the code to start reading each log file at the end rather than the beginning. These log files were sometimes several GB in size, so that is a lot of reading (as well, this was when RAM was more limited, so it could get bogged down).
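
The "start at the end" idea can be sketched with the file object's seek() and tell() methods. This is not the original log-checking code, just a minimal illustration; the function read_tail and the file backup.log are made up for the example:

```python
import os
import shutil
import tempfile

def read_tail(file_name, num_bytes=100):
    """Return roughly the last num_bytes of a file without reading all of it."""
    file = open(file_name, 'rb')              # binary mode lets us jump to any byte
    file.seek(0, os.SEEK_END)                 # move to the end of the file
    size = file.tell()                        # position at the end = size in bytes
    file.seek(max(size - num_bytes, 0))       # step back, but not past the start
    tail = file.read().decode('utf-8', errors='replace')
    file.close()
    return tail

# Build a small stand-in log file to try it on
folder = tempfile.mkdtemp()
log_path = os.path.join(folder, 'backup.log')
log = open(log_path, 'w')
for i in range(1000):
    log.write("line " + str(i) + ": ok\n")
log.write("backup FAILED\n")
log.close()

# Only the last 50 bytes are read, however long the file is
tail_text = read_tail(log_path, 50)
print("FAILED" in tail_text)

shutil.rmtree(folder)   # remove the temporary folder again
```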

### Closing a File

When we are done with a file, we should close it. We can do this by calling the close() method on the file object. If we don't close the file, it will be closed automatically when the program ends, but it is good practice to close it ourselves. If we have a program that runs on a server and isn't shut down regularly, maybe for months at a time, leaving file connections open forever can cause problems. Not closing files can stop them from being used by other programs, interfere with system tasks that organize them, or keep resources tied up, much like a memory leak: the memory holding the file stays in use for as long as we keep it open. 

In [None]:
!wget https://raw.githubusercontent.com/AkeemSemper/ML_for_Non_DS_Students/67558709fba3eab63af6373a61bd3cbf21113d94/data/output.txt

In [None]:
def read_file(file_name, mode):
    file = open(file_name, mode)
    file_content = file.read()
    file.close()
    return file_content

my_content = read_file('output.txt', 'r')

In [None]:
my_content

In [None]:
my_content.split('\n')

### Writing to a File

We can write to a file using the write() method of the file object. It takes in a string, and writes it to the file. We can also use the writelines() method, which takes in a list of strings and writes them to the file. Neither one adds line breaks for us, so each string needs its own "\n" wherever we want a new line.

In data science work, one of the common things that we might need to use writing for is either to write a log of what's happening or errors that occur - especially if we have something that is pulling data from many sources to feed our model, we might want to write a log of what happened so we can see if something failed. There are dedicated logging packages, but the basic idea is always the same - write a string with "what's happening" to a file. We may also need to write out data to disk as it is in the middle of processing - some actions take a long time with lots of data, so we may want to do all the "clean up", then write the data back to disk ready to use. Again, we often use dedicated packages for this as well. 
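
As a minimal sketch of the logging idea, the helper below appends one line per message using the "a" mode from the list above. The function append_log, the file name, and the messages are all hypothetical; a real project would more likely use a dedicated logging package:

```python
import os
import shutil
import tempfile

def append_log(file_name, message):
    """Append one line to a log file (hypothetical helper for illustration)."""
    log_file = open(file_name, 'a')   # 'a' adds to the end, keeping earlier entries
    log_file.write(message + "\n")    # write() does not add a newline for us
    log_file.close()

folder = tempfile.mkdtemp()
log_path = os.path.join(folder, "pipeline.log")

append_log(log_path, "Loaded source A")
append_log(log_path, "ERROR: source B timed out")

# Read the log back to confirm both entries were kept
check = open(log_path, 'r')
log_lines = check.read().split("\n")
check.close()
print(log_lines)

shutil.rmtree(folder)   # remove the temporary folder again
```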

## Exercise (May be Better After)

Write a write_file function that takes in arguments of a file name, and "content". The content might be either a string, or a list of strings. If the content is a string, use the write() function to write it to the file. If the content is a list of strings, use the writelines() function to write them to the file. Close the file when you are done. Return either "string", "list", or "error" depending on what was written to the file.

In [2]:
## Try without scrolling too far if you can
#
#
#
#
#
#
#
#

In [None]:
def write_file(file_name, content):
    retval = "error"
    file = open(file_name, 'w')
    if isinstance(content, str):
        file.write(content)
        retval = "string"
    elif isinstance(content, list):
        file.writelines(content)
        retval = "list"
    file.close()
    return retval

status_write_1 = write_file('output1.txt', "I'm a little teapot, short and stout\nHere is my handle, here is my spout\nWhen I get all steamed up, hear me shout\nTip me over and pour me out!")
status_write_2 = write_file('output2.txt', ["I'm a little teapot, short and stout\n", "Here is my handle, here is my spout\n", "When I get all steamed up, hear me shout\n", "Tip me over and pour me out!"])
status_write_1, status_write_2

### With Statement and Easy File Reading

We can use the with statement to open a file and have it closed for us automatically when the indented block ends. This is a good way to do it, as it ensures that the file is closed when we are done with it, even if an error occurs partway through. In most cases, we can use "with open" along with readlines() to read a file. The function below is a pretty reusable solution. 

In [None]:
# This is generally what you can use
def read_file_lines(file_name):
    with open(file_name, 'r') as file:
        lines = file.readlines()
    return lines

In [None]:
text_by_lines = read_file_lines('output2.txt')

for line in text_by_lines:
    # Each line already ends with "\n", so stop print from adding a second one
    print(line, end="")
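
To see the automatic closing in action, the sketch below checks the file object's closed attribute inside and after a with block. It prints False, then True. The file name and temporary folder are just for the demonstration:

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()
demo_path = os.path.join(folder, 'demo.txt')

with open(demo_path, 'w') as demo_file:
    demo_file.write("hello\n")
    print(demo_file.closed)   # still open inside the block

print(demo_file.closed)       # closed automatically once the block ends

shutil.rmtree(folder)         # remove the temporary folder again
```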

## Exercise

Create a function to read the file data_dictionary.txt that is downloaded with the command below. Collect all the section headers into two lists - one for the title, and one for the description. For example, the first line of the text is "MSSubClass: Identifies the type of dwelling involved in the sale." So the parts of this line are:
<ul>
<li> Title - MSSubClass </li>
<li> Description - Identifies the type of dwelling involved in the sale. </li>
</ul>

Ignore any lines that are not a title line. (There is a hint below if you need it.)

Setup the function so that the file name is an argument, and the two lists are returned. Return the lists as a tuple. 

### Challenge Exercise 1

Update the function to write the headers back to their own file, one header per line.

### Challenge Exercise 2

If this was easy for you, try to change 

In [1]:
!wget https://raw.githubusercontent.com/AkeemSemper/ML_for_Non_DS_Students/main/data/data_dictionary.txt




In [None]:
# Code it up












#### Hint

To check if a line is a title, you can check if the first word of the line ends in a colon. This is a good chance to practice Googling for solutions: "python check word ends in character" will likely get you there. You may want to try splitting the line into words, then checking. 