*If you are new to Jupyter notebooks, each gray cell is a piece of code. To run the code, click inside the gray cell and either click the triangle button up top, or press shift+return (or shift+enter) on your keyboard. If you are using Google Colab, shift+return should also work.*

# <br><br>Using functions for automation

First, let's review how to run a Python script from the command line.
<br><br>Yesterday we wrote a notebook to sort email addresses by their domain names, according to an ordered list of preferred domain names.
<br><br>I've turned the second version we wrote into a Python script, sortEmails2.py. Let's open it in a text editor and take a look:
1. If you are using Jupyter Lab on your own computer, you can double click the script from the file tree on the left. Jupyter Lab has its own text editor.
2. If you are using Google Colab online or Jupyter Notebook on your own computer, you will need to open the file in the text editor on your own computer:
- On a PC: Notepad
- On a Mac: TextEdit
- Or another plain text editor of your choice

### <br><br>Running a script on the command line

1. Open up your command line shell. In Jupyter Lab, go to File>New>Terminal. You can also open Terminal on a Mac, or Anaconda Prompt (or PowerShell) on a PC.

2. Navigate to the folder you downloaded for today's workshop. (Jupyter Lab should open up in the same folder you're working in now.)

3. Type `python sortEmails2.py`

4. The script will create the file preferredEmails2.csv.

<br>As a reminder, a Python script is a plain text file that ends in .py. Scripts run from top to bottom.

### <br><br>Breaking up code into functions

**Function** - a chunk of code that you give a name to
<br><br>After you name the code, whenever you want to use it, you can call it by name instead of typing or copying and pasting the entire chunk of code.
<br><br>This is called a **function definition**. While we will be looking at some function definitions, we won't be talking about the syntax of function definitions (that is covered in the Python Fundamentals Bootcamp or other intro level tutorials).

<br>By default, scripts run code from top to bottom. However, when you **run** a function definition:

In [1]:
def makeFancy(a_string):
    new_list = a_string.upper().split()
    new_string = "**".join(new_list)
    return new_string

...your computer only stores it in memory, ready to use it when it eventually gets called:

In [2]:
makeFancy("Today is Tuesday.")

'TODAY**IS**TUESDAY.'

<br><br>We're going to break up the sortEmails2.py script into functions to make it more readable. It might also be useful in other ways - let's see!

To create functions, look for:
- code that is repeated anywhere in the script
- loops within loops within loops – can you pull out an inner loop and define it as a function?
- a chunk of code that performs a clear task that you can name – "remove headers", "create data dictionary", "calculate stat", etc.
- code that you might want to use in other scripts
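
As a small illustration of the second and third bullets, here is a sketch in which an inner loop is pulled out and given a name. The sample data and the names `count_matches`, `emails`, and `domains` are made up for this example; they are not taken from sortEmails2.py.

```python
def count_matches(emails, domain):
    """Count how many email addresses end with the given domain."""
    count = 0
    for email in emails:
        if email.endswith("@" + domain):
            count += 1
    return count


# Hypothetical sample data, just for this sketch
emails = ["ana@school.edu", "ana@mail.com", "bo@school.edu"]
domains = ["school.edu", "mail.com"]

# The outer loop no longer contains a second loop; it calls the function by name
for domain in domains:
    print(domain, count_matches(emails, domain))
```

The inner loop still runs, but the outer loop is now shorter and easier to read, and `count_matches()` could be reused somewhere else.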


*Take a minute and look at the sortEmails2.py script. Do you see any code chunks that could be turned into functions?*

#### <br><br>Defining a good function

Here's one piece of code in the script that performs a clear task that would be easy to name:

In [None]:
noDups = []
[noDups.append(item) for item in ordered_emails if item not in noDups]

*Note that this code won't work here because it refers to a list called `ordered_emails` that is created elsewhere in the code.*

<br>This code chunk removes duplicate email addresses from the `ordered_emails` list. Let's define it as a function:

In [None]:
def removeDups(ordered_emails):
    noDups = []
    [noDups.append(item) for item in ordered_emails if item not in noDups]
    return noDups

In this example, I used the temporary argument name `ordered_emails` which matches how the code gets called in our script.
<br><br>However, wouldn't this code remove duplicates in any list? It doesn't do anything specific to emails. Let's change that argument name:

In [1]:
def removeDups(a_list):
    noDups = []
    [noDups.append(i) for i in a_list if i not in noDups]
    return noDups

Run the cell above and then let's call it on a sample list:

In [2]:
sample_list = ["a", "b", "a", 1, 2, 1]
print(removeDups(sample_list))

['a', 'b', 1, 2]
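
A side note on style: the list comprehension inside `removeDups()` is used only for its side effect of appending to `noDups`. Here is a sketch of an equivalent version written with a plain `for` loop, which some readers find easier to follow. The name `removeDupsLoop` is made up for this comparison and is not part of the workshop script.

```python
def removeDupsLoop(a_list):
    noDups = []
    for item in a_list:
        # Only keep the first occurrence of each item
        if item not in noDups:
            noDups.append(item)
    return noDups


sample_list = ["a", "b", "a", 1, 2, 1]
print(removeDupsLoop(sample_list))
```

Both versions return the same list for the same input, so which one you use is a matter of readability.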


<br>Finally, add a good comment to the function to remind yourself what it does. We're going to use the **docstring** format of using triple quotes:

In [None]:
def removeDups(a_list):
    '''removes duplicates from a list while keeping the order'''
    noDups = []
    [noDups.append(i) for i in a_list if i not in noDups]
    return noDups
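
A docstring is more than a comment: Python stores it on the function itself. As a quick sketch, the text can be read back through the function's `__doc__` attribute (calling `help(removeDups)` shows the same text).

```python
def removeDups(a_list):
    '''removes duplicates from a list while keeping the order'''
    noDups = []
    [noDups.append(i) for i in a_list if i not in noDups]
    return noDups


# The docstring travels with the function wherever it is imported
print(removeDups.__doc__)
```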

<br><br>This function definition can now be added to our script. In the body of the script, instead of running the code, we will just call the new function with the correct list. First, though, let's talk about the body of the script.

#### <br><br>The `main()` function

The body of the script should get defined in a function called `main()` that you call at the very end of the script. 
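
A minimal sketch of that layout is below. The email addresses and the body of `main()` are placeholders made up for illustration; the real script does more work than this.

```python
def removeDups(a_list):
    '''removes duplicates from a list while keeping the order'''
    noDups = []
    [noDups.append(i) for i in a_list if i not in noDups]
    return noDups


def main():
    '''the body of the script lives here'''
    # Placeholder data; the real script creates this list elsewhere
    ordered_emails = ["ana@school.edu", "bo@mail.com", "ana@school.edu"]
    unique_emails = removeDups(ordered_emails)
    print(unique_emails)
    return unique_emails


# Call main() at the very end, after every function has been defined
main()
```

Because the call to `main()` comes last, every function it needs has already been defined by the time it runs.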

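<br>Let's make these two changes in our sortEmails2.py script: add the `removeDups()` function definition, and move the body of the script into a `main()` function. First, save a new copy of the script as sortEmailsMain.py. Then, we'll walk through the changes and run the new script on the command line.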

### <br><br>Exercise: Defining more functions

Take a look at the sortEmailsMain.py script. Can you see any other chunks of code that would be easy to name and define as functions?

<br><br><br>How about the code that writes our dictionary to a csv file? Try to write a good function here that runs that code:

If you finish your function, save a new version of the sortEmailsMain.py script called sortEmailsFunctions.py. Add your new function to the file and replace that code inside the `main()` function.

### <br><br>Saving and Using Your Own Functions

If you have functions that you want to use in multiple scripts or notebooks, you can save them. Here are a few times when you might want to do that.
<br><br>For reusability:
- You wrote code to create custom visualizations
- You frequently work with the same file type and you have code to clean it or to extract certain data from it
- You have code to check a file or dataset for certain qualities before you use it
- You coded equations or algorithms
- You find yourself frequently doing the exact same thing

For organization/presentation:
- Your notebook is getting long and slow and messy and you want to clean it up by saving the function definitions in a different file
- You want to use your notebook to display visualizations or other output to an advisor or colleague and they don't need to see all the code behind the scenes

<br><br>Functions can be saved in a Python script. We've already saved a function in the sortEmailsMain.py script. Some of you may have also saved another function in the sortEmailsFunctions.py script. 
<br><br>In order to be able to use a Python script as a module from which we can import individual functions, we have to change the last lines of the script. Instead of just calling the `main()` function, we call it like this:

In [None]:
if __name__ == "__main__":
    main()

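<br>A name with two underscores on either side, like `__name__`, is called a **dunder** (double underscore) name. Python sets `__name__` to `"__main__"` when the file is run directly as a script, and to the module's name when the file is imported, so `main()` only runs in the first case.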

<br>I've already added this line to a new version of the sortEmails.py script called sortEmails4.py. Let's take a look at it.

Because it has that special line at the end, if we run the script from the command line, it will run from top to bottom as planned, AND we can also import individual functions from the script into another notebook or script.
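
To see both behaviors side by side, here is a self-contained sketch. It writes a tiny throwaway script to a temporary folder, runs it as a script, and then imports it. The file name `demoModule.py` and the functions inside it are made up for this demo and have nothing to do with the sortEmails scripts.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Source code for a tiny throwaway script (hypothetical, for this demo only)
source = '''
def shout(text):
    return text.upper() + "!"

def main():
    print("running as a script:", shout("hello"))

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "demoModule.py").write_text(source)

    # 1. Run it as a script: __name__ is "__main__", so main() runs
    result = subprocess.run(
        [sys.executable, "demoModule.py"],
        cwd=folder, capture_output=True, text=True,
    )
    script_output = result.stdout.strip()
    print(script_output)

    # 2. Import it: __name__ is "demoModule", so main() does not run
    sys.path.insert(0, folder)
    import demoModule
    sys.path.remove(folder)
    module_name = demoModule.__name__
    print(module_name, demoModule.shout("hello"))
```

When the file is imported, nothing is printed by `main()`, but `shout()` is still available to call.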

### <br><br>Loading custom functions into a notebook

We can import the `removeDups()` function from the sortEmails4.py script:

In [3]:
from sortEmails4 import removeDups

In [4]:
sample_list = ["c", "c", 4, 5, 5, "d"]
print(removeDups(sample_list))

['c', 4, 5, 'd']


### <br><br>BONUS: Saving functions as modules

**If you are using Google Colab, first run the code cell below. DO NOT run this cell if you are not using Google Colab.**

In [None]:
!wget https://raw.githubusercontent.com/aGitHasNoName/savingFunctions/main/sentences.py

<br><br>When your script only contains functions for importing into other scripts and notebooks, you can refer to it as a **module**!

Let's take a look at a Python script with functions we can import. Open the script "sentences.py" in your text editor.

<br>Let's use the functions in the *sentences* module.
<br>We can import the whole module:

In [5]:
import sentences

To see what functions are available in an imported module, we can use the `dir()` function:

In [6]:
dir(sentences)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'mean',
 'median',
 'printMean',
 'printMedian',
 'printSum']

Notice that the functions imported from the `statistics` module, `mean` and `median`, are also available to us through the new script. Always be careful with naming your functions so that they don't duplicate other functions you're calling.
<br><br>It is still best to use the `mean` and `median` functions directly from `statistics`; you don't want to start relying on functions included in scripts included in other scripts included in other scripts... Professional packages will resolve these dependency redundancies.
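
If you are ever unsure where a function really comes from, its `__module__` attribute records the module in which it was defined. The sketch below uses the standard `statistics` module as a stand-in so that it does not depend on the workshop files; it also filters the `dir()` output to hide the dunder names.

```python
import statistics

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(statistics) if not name.startswith("_")]
print("mean" in public_names, "median" in public_names)

# __module__ records where a function was originally defined
print(statistics.mean.__module__)
```

The same check on a function that a script merely imported would report the original module's name, not the name of the script you reached it through.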

<br>Let's try out our imported custom functions:

In [7]:
num_list = [84, 89, 90, 55, 68, 83, 90, 92, 85]

In [8]:
sentences.printMean(num_list)

'The mean of the numbers provided is 81.78.'

<br>**Exercise.** Run printMedian on the num_list:

In [9]:
sentences.printMedian(num_list)

'The median of the numbers provided is 85.'

<br><br>You can also import only a single function from a script:

In [10]:
from sentences import printSum

In [11]:
printSum(num_list)

'The sum of the numbers provided is 736.'

<br>If you ever forget which arguments a function requires, you can type `?` after its name in a Jupyter or Colab notebook:

In [12]:
printSum?

Signature: printSum(a_list)
Docstring: Returns the sum of a list in a complete sentence.
File:      ~/Documents/workshops/PythonForAutomation/sentences.py
Type:      function


This shows why it is important to name arguments logically and include a good docstring.
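
The `?` shortcut only works in Jupyter and similar notebook environments. In a plain Python script, the standard `inspect` module can retrieve the same information. In the sketch below, the body of `printSum()` is a stand-in written for this example (an assumption based on the output shown earlier), since the actual code in sentences.py is not reproduced here.

```python
import inspect


def printSum(a_list):
    '''Returns the sum of a list in a complete sentence.'''
    # Stand-in body, assumed for this example
    s = sum(a_list)
    return f"The sum of the numbers provided is {s}."


# The parameter list, as text
print(inspect.signature(printSum))

# The docstring, with indentation cleaned up
print(inspect.getdoc(printSum))
```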

### <br>Exercise. Write your own script with functions.

1. Open up a new empty plain text file in your text editor.
2. Save the file as "moreSentences.py". If you're using Jupyter on your own computer, save it in the same location as sentences.py. If you're using Google Colab, save it somewhere you can find it.
3. Import the `mode` function from the `statistics` module at the top of your script.
4. Include the three new function definitions listed below.
5. Add informative docstrings to all three functions.
6. Include the "if name equals main" code at the end.
7. It's ok to copy/paste from and look at sentences.py for guidance.
8. Save your work!

    def printMode(a_list):
        m = mode(a_list)
        return f"The mode of the numbers provided is {m}."
        
    def printMax(a_list):
        m = max(a_list)
        return f"The max of the numbers provided is {m}."
        
    def printMin(a_list):
        m = min(a_list)
        return f"The min of the numbers provided is {m}."

<br><br>If you are using Google Colab, you will now need to **upload** your script into your Colab workspace. On the left, you should see some icons. Click on the file icon. Once the file tree appears, click on the upload icon, which looks like a piece of paper with an arrow pointing up. Choose the file moreSentences.py from your computer.

### <br><br>Exercise
Write code to import your new custom module into your notebook:

Use the dir() function to see what your module contains:

Test out your functions on the num_list:

### <br><br>Where to save your custom modules

For now, it is perfectly okay to keep your functions in the main folder for your project, like we're doing here - the notebook and modules are saved in the same directory.

If you want to create a subfolder for storing your modules, you can, with a couple of tweaks. Once you have a folder containing multiple modules (scripts), you officially have a **package**. To make sure Python treats the folder as a regular package, add a special file to the package folder before importing modules from inside it. The file must be saved as "\_\_init__.py" (it's a dunder, so double underscores on either side). The file can be empty!

If you create the init file, you can import modules from your subfolder like this: 

    import folder.script
    
    from folder.script import function
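
To try this out without touching your project, here is a self-contained sketch that builds a small package in a temporary folder and imports from it. The names `my_tools`, `shapes`, and `square` are made up for this demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project:
    # Build: project/my_tools/__init__.py and project/my_tools/shapes.py
    package_dir = Path(project, "my_tools")
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")  # the empty init file
    (package_dir / "shapes.py").write_text(
        "def square(n):\n"
        "    return n * n\n"
    )

    # Make the temporary project folder importable for this demo
    sys.path.insert(0, project)
    importlib.invalidate_caches()

    # Same pattern as: from folder.script import function
    from my_tools.shapes import square

    sys.path.remove(project)

    result = square(4)
    print(result)
```

In a real project you would not need the `sys.path` lines, because the folder containing your notebook or script is already searched for imports.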

You can also keep a main folder outside of your project directory to hold all your packages and modules. For more info on how to set up custom packages on your computer and call them from anywhere, go to: https://python-102.readthedocs.io/en/latest/packaging.html#packages
