<a href="https://colab.research.google.com/github/coding-dojo-data-science/python-basics-notebooks/blob/main/Functions_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Functions 2:
## Combining everything so far

Let's combine what you've learned about Python programming to build a function to process some data.

In this problem, an insurance brokerage wants some statistics about its brokers' sales and has hired you to help sort through its data and answer some questions about seller performance.

In the cell below, we create a dictionary of insurance salespeople. The keys are the first name of each salesperson, and the values are lists of monthly sales figures. The numbers represent how many insurance policies each seller sold in each of the last six months.

In [None]:
sellers = {'Amrita': [20, 10, 40, 50, 20, 50],
           'Joseph': [15, 80, 45, 38, 15, 83],
           'Xian': [55, 89, 22, 34, 45, 32],
           'Gustav': [102, 12, 32, 63, 87, 34],
           'Terracita': [13, 21, 2, 87, 5, 37],
           'Faith': [65, 34, 82, 12, 3, 54]}

# Question 1: What is the average number of sales for each salesperson over the last 6 months?

Let's break down the problem a little.  

a. Python does not have a built-in function for calculating mean averages, so we will create a function for that. A custom function can call other custom functions we've created; functions used this way are called 'helper functions'.

b. We will want to loop over the sellers in the dictionary and calculate the average for each list of sales. We will use a `for` loop for that, since we are iterating over a collection. Iterating over a dictionary directly gives us its keys; we can make this explicit with the `dict.keys()` method, which is what we'll use here.

c. We need a way to store the result.  It seems natural to create a new dictionary where the keys will be the sellers and the values will be the average monthly sales for each seller.
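To make point b concrete, here is a minimal sketch on a two-seller excerpt of the data (the `sample` dictionary is illustrative, not part of the original notebook). Looping over a dictionary directly yields its keys in insertion order, just like looping over `dict.keys()`:

```python
# Two-seller excerpt of the sales data (illustrative sample)
sample = {'Amrita': [20, 10, 40, 50, 20, 50],
          'Joseph': [15, 80, 45, 38, 15, 83]}

# Looping over the dictionary itself yields its keys
names = []
for seller in sample:
  names.append(seller)

# Looping over dict.keys() yields the same keys in the same order
names_from_keys = []
for seller in sample.keys():
  names_from_keys.append(seller)

print(names)            # ['Amrita', 'Joseph']
print(names_from_keys)  # ['Amrita', 'Joseph']

# Each key can then be used to look up that seller's list of sales
print(sample['Joseph'][0])  # 15
```

Both forms behave the same way; writing `.keys()` simply makes the intent explicit.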

## `mean_avg()` Function

We will create a helper function to calculate mean averages. This should come in handy going forward.

In [None]:
def mean_avg(num_list):
  # Return the mean average of a list
  return sum(num_list) / len(num_list)

## Testing

If our function is working as intended, it will return `2.0` as the mean average of the list `[1, 2, 3]`.

In [49]:
mean_avg([1,2,3])

2.0
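As an optional cross-check, the standard library's `statistics.mean()` computes the same value. This sketch redefines `mean_avg()` so the block runs on its own, then compares the two on Amrita's sales.

```python
from statistics import mean

def mean_avg(num_list):
  # Return the mean average of a list (same helper as above)
  return sum(num_list) / len(num_list)

amrita_sales = [20, 10, 40, 50, 20, 50]

ours = mean_avg(amrita_sales)
stdlib = mean(amrita_sales)

print(ours)    # 31.666666666666668
print(stdlib)  # the same average, computed by the standard library
```

One difference worth knowing for later: `statistics.mean([])` raises `statistics.StatisticsError` instead of `ZeroDivisionError`.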

## `sales_avgs()` Function

In [None]:
def sales_avgs(sales_data):
  # Create a new, empty dictionary to store the sales averages in
  averages = {}

  # Iterate over the keys in the sales_data dictionary
  for seller in sales_data.keys():
  
    # Add the seller name and the average of their sales to the averages dictionary
    averages[seller] = mean_avg(sales_data[seller])

  # Return the averages dictionary
  return averages

## Testing

We always want to test our function to make sure it works as intended.

If this function is working correctly it should return a single number for each seller that represents their average monthly sales.

In [None]:
# Test the function
sales_avgs(sellers)

{'Amrita': 31.666666666666668,
 'Faith': 41.666666666666664,
 'Gustav': 55.0,
 'Joseph': 46.0,
 'Terracita': 27.5,
 'Xian': 46.166666666666664}
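The same loop can also be written as a dictionary comprehension over `dict.items()`, which yields each seller's name and list of sales together. This is a sketch of an equivalent formulation; `sales_avgs_compact` is a hypothetical name. The loop version above is easier to extend with the extra logic added in the next sections.

```python
def mean_avg(num_list):
  # Return the mean average of a list (same helper as above)
  return sum(num_list) / len(num_list)

def sales_avgs_compact(sales_data):
  # Hypothetical compact version: one {seller: average} entry per (name, sales) pair
  return {seller: mean_avg(sales) for seller, sales in sales_data.items()}

sellers = {'Amrita': [20, 10, 40, 50, 20, 50],
           'Joseph': [15, 80, 45, 38, 15, 83],
           'Xian': [55, 89, 22, 34, 45, 32],
           'Gustav': [102, 12, 32, 63, 87, 34],
           'Terracita': [13, 21, 2, 87, 5, 37],
           'Faith': [65, 34, 82, 12, 3, 54]}

compact = sales_avgs_compact(sellers)
print(compact['Gustav'])     # 55.0
print(compact['Terracita'])  # 27.5
```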

# Question 2: Who is the top salesperson, by average monthly sales?

This may sound like a simple question, but finding the key with the highest value in a dictionary takes a little more thought than finding the largest number in a list.

Let's plan this out:

a. We already have a function that finds the average monthly sales for each seller.  We can use that to help us answer this question.

b. We need a way to keep track of the best seller so far in each iteration of our loop. As we calculate each seller's average, we will check whether it beats the previous best.

c. We can add an argument with a default value to our current function to change its behavior and reuse it. If we call the function with `return_best=True`, it will return the name of the best seller and their sales average as a tuple. Otherwise (the default, `return_best=False`), it will return the dictionary of all the sellers' averages.
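For comparison, Python's built-in `max()` accepts a `key` function. Once an averages dictionary exists, `max(averages, key=averages.get)` iterates over the keys and ranks each one by its value, giving the top seller in one line. This sketch builds the averages inline so it runs on its own; the explicit loop used in this notebook shows the "best so far" bookkeeping step by step instead.

```python
sellers = {'Amrita': [20, 10, 40, 50, 20, 50],
           'Joseph': [15, 80, 45, 38, 15, 83],
           'Xian': [55, 89, 22, 34, 45, 32],
           'Gustav': [102, 12, 32, 63, 87, 34],
           'Terracita': [13, 21, 2, 87, 5, 37],
           'Faith': [65, 34, 82, 12, 3, 54]}

# Average monthly sales per seller
averages = {seller: sum(sales) / len(sales) for seller, sales in sellers.items()}

# max() iterates over the dict's keys; key=averages.get compares them by value
best_seller = max(averages, key=averages.get)

print(best_seller, averages[best_seller])  # Gustav 55.0
```

If several sellers tie for the top average, `max()` returns the first one it encounters.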

In [None]:
def sales_avgs(sales_data, return_best=False):
  # Create a new, empty dictionary to store the sales averages in
  averages = {}
  # Track the best seller seen so far and their average
  best_seller = None
  best_avg = 0
  
  # Iterate over the keys in the sales_data dictionary
  for seller in sales_data.keys():
  
    # Add the seller name and the average of their sales to the averages dictionary
    averages[seller] = mean_avg(sales_data[seller])

    # Check to see if the current seller is better than the previous best
    if averages[seller] > best_avg:
      # If so, set the best seller to be the current seller
      best_seller = seller
      # Update the best sales avg to be the current seller's avg sales
      best_avg = averages[seller]

  # If the return_best argument is True, return the name of the best seller
  # and their monthly sales average
  if return_best:
    return best_seller, averages[best_seller]

  else:
    # If the return_best argument is False (default),
    # return the dictionary of seller averages
    return averages

## Testing

Remember, we always want to take the time to test our functions to make sure they are working as intended.

The function should still return the dictionary of sellers if `return_best=False`, but return just one seller and their monthly sales average if `return_best=True`.

In [None]:
# Test our function

# Test default
print(f'All seller monthly averages are\n {sales_avgs(sellers)}')

# Test return_best

print(f'\nThe Best Seller is: {sales_avgs(sellers, return_best=True)}')

All seller monthly averages are
 {'Amrita': 31.666666666666668, 'Joseph': 46.0, 'Xian': 46.166666666666664, 'Gustav': 55.0, 'Terracita': 27.5, 'Faith': 41.666666666666664}

The Best Seller is: ('Gustav', 55.0)


# Question 3: What would the averages look like if we exclude the best and worst months for each seller?

Sometimes sellers have outliers: exceptionally low or high months that are not necessarily representative of their typical productivity. Our client would like the option to exclude the highest and lowest months from the calculations our function makes.

Let's plan this out:

1. We can reuse the function we've already made, since it already does much of the work.  
2. We need to add some extra arguments with default values. To make our function more flexible, we will create separate arguments for excluding the best month and the worst month. If our client wants to exclude both, they can set both to `True`.
3. Python has built-in functions, `max()` and `min()`, to find the maximum and minimum values in a collection. We can use those along with the list method `list.remove()` to remove one or both of those values before calculating the averages.
4. We want to remove the highest or lowest values, but we don't want to mutate the original dictionary. `list.remove()` changes a list in place, so we will make a copy of each seller's list of sales before we change it. We can't do this just by assigning the existing list to a new variable, because the new variable would point to the same list as the dictionary entry, not to a copy. **It's possible to have two variables that point to the same Python object, so that a change made through one shows up in the other.** This is NOT what we want. We will use `list.copy()` to create a copy and avoid mutating the original.
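Here is a quick sketch of steps 3 and 4 on a single seller's list. One detail matters with real data: `list.remove()` deletes only the first matching value, so when a seller's best month is tied (Amrita sold 50 policies in two different months), only one of those months is dropped.

```python
# Amrita's monthly sales: 50 and 20 each appear twice
amrita = [20, 10, 40, 50, 20, 50]

# Work on a copy so the original list is not mutated
trimmed = amrita.copy()

# max() finds the best month; remove() deletes only its FIRST occurrence
trimmed.remove(max(trimmed))

# min() finds the worst month in what is left
trimmed.remove(min(trimmed))

print(trimmed)                      # [20, 40, 20, 50]
print(amrita)                       # [20, 10, 40, 50, 20, 50]
print(sum(trimmed) / len(trimmed))  # 32.5
```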

## Aside:  Variables as Pointers

In [47]:
# Example of 2 variables pointing to the same object:

# Declare list1
list1 = [1, 2, 3, 4, 5]

# Declare list2 and point it to list1
list2 = list1

# Remove 3 from list2
list2.remove(3)

# Check the contents of list1.  3 has been removed from list1 as well
# list1 and list2 point to the SAME list
print(list1)

[1, 2, 4, 5]


In [48]:
# Use the .copy() method to create a copy of the list:

# Declare list3
list3 = [1, 2, 3, 4, 5]

# Declare list4 and point it to a copy of list3
list4 = list3.copy()

# Remove 3 from list4
list4.remove(3)

# Check the contents of list3.  List3 has not changed
print(list3)

[1, 2, 3, 4, 5]
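The `is` operator tests whether two names refer to the very same object, which makes the difference above easy to check. Note also that `.copy()` makes a *shallow* copy: copying a dictionary of lists creates a new dictionary but reuses the same inner lists, which is why it helps to copy each seller's list individually. The standard `copy` module's `deepcopy()` copies nested objects too. The `team` dictionaries below are illustrative only.

```python
import copy

list1 = [1, 2, 3, 4, 5]
list2 = list1         # a second name for the same list
list3 = list1.copy()  # a new list with the same contents

print(list2 is list1)  # True: same object
print(list3 is list1)  # False: different objects
print(list3 == list1)  # True: equal contents

# A shallow copy of a dictionary shares its inner lists
team = {'Penny': [12]}
shallow = team.copy()
shallow['Penny'].append(40)
print(team['Penny'])   # [12, 40]: the original changed too

# deepcopy() also copies the inner lists
team2 = {'Penny': [12]}
deep = copy.deepcopy(team2)
deep['Penny'].append(40)
print(team2['Penny'])  # [12]: the original is unchanged
```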


## Back to the function!

In [None]:
def sales_avgs(sales_data, return_best=False, remove_best=False,
               remove_worst=False):
  # Create a new, empty dictionary to store the sales averages in
  averages = {}
  best_seller = None
  best_avg = 0
  
  # Iterate over the keys in the sales_data dictionary
  for seller in sales_data.keys():
    
    # Copy the list of sales for this seller to avoid mutating the original dictionary
    sales = sales_data[seller].copy()

    # If remove_best = True, remove the best month from the list of sales
    if remove_best:
      best = max(sales)
      sales.remove(best)
    
    # If remove_worst = True, remove the worst month from the list of sales
    if remove_worst:
      worst = min(sales)
      sales.remove(worst)

    # Add the seller name and the average of their sales to the averages dictionary
    averages[seller] = mean_avg(sales)

    if return_best:
      # Check to see if the current seller is better than the previous best
      if averages[seller] > best_avg:
        # If so, set the best seller to be the current seller
        best_seller = seller
        # Update the best sales avg to be the current seller's avg sales
        best_avg = averages[seller]

  # If the return_best argument is True, return the name of the best seller
  # and their monthly sales average
  if return_best:
    return best_seller, averages[best_seller]

  else:
    # If the return_best argument is False (default),
    # return the dictionary of seller averages
    return averages

## Testing

It's necessary to test our functions before using them in our code to ensure they work as intended.  We don't want our functions to introduce errors later in our code!

In order to test our new arguments, we will:
1. Print out the averages of the sellers with no values removed, as a baseline.
2. Print out the monthly averages for the sellers with the highest month removed. We expect all of the averages to go down.
3. Print out the monthly averages with the lowest month removed. We expect all of the averages to go up.
4. Print out the monthly averages with both the highest and lowest months removed, to make sure there are no errors.
5. Print out the best seller, taking all months into account.
6. Print out the best seller with the highest and lowest months removed.

In [None]:
# Test our function

# Test function defaults
print('All seller monthly averages are')
print(sales_avgs(sellers))

# Test remove_best
print('\nAll seller monthly averages discounting the best month are') 
print(sales_avgs(sellers, remove_best=True))

# Test remove_worst
print('\nAll seller monthly averages discounting the worst month are') 
print(sales_avgs(sellers, remove_worst=True))

# Test remove_best and remove_worst together
print('\nAll seller monthly averages discounting the best and worst months are') 
print(sales_avgs(sellers, remove_worst=True, remove_best=True))

# Test return_best
print('\nThe best seller, taking all months into account is:')
print(sales_avgs(sellers, return_best=True))

# Test all arguments at once
print('\nThe best seller discounting the best and worst months is') 
print(sales_avgs(sellers, remove_worst=True, remove_best=True, return_best=True))

All seller monthly averages are
{'Amrita': 31.666666666666668, 'Joseph': 46.0, 'Xian': 46.166666666666664, 'Gustav': 55.0, 'Terracita': 27.5, 'Faith': 41.666666666666664}

All seller monthly averages discounting the best month are
{'Amrita': 28.0, 'Joseph': 38.6, 'Xian': 37.6, 'Gustav': 45.6, 'Terracita': 15.6, 'Faith': 33.6}

All seller monthly averages discounting the worst month are
{'Amrita': 36.0, 'Joseph': 52.2, 'Xian': 51.0, 'Gustav': 63.6, 'Terracita': 32.6, 'Faith': 49.4}

All seller monthly averages discounting the best and worst months are
{'Amrita': 32.5, 'Joseph': 44.5, 'Xian': 41.5, 'Gustav': 54.0, 'Terracita': 19.0, 'Faith': 41.25}

The best seller, taking all months into account is:
('Gustav', 55.0)

The best seller discounting the best and worst months is
('Gustav', 54.0)


Our function above seems to work as expected!

# 💪 Your Turn!

## What is a case where this function might raise an error?

It's important to think about edge cases. In our example, each seller has been selling insurance for at least 6 months. But what if a newly hired seller has only one or two months of sales records, or none at all?

The function below is the same as the function above. As the test further down shows, it raises an error if a seller has too few monthly sales records.

Change the function so that if the seller's sales history is empty, or if removing the best or worst months would make it empty, the value for that seller in the returned dictionary of averages is `None` instead of a number.

- Hint: create some conditional statements to check the length of each seller's monthly sales history, then code the correct behavior for that length.
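One way to start on the hint is a guard clause that checks the length before dividing. The helper `safe_mean()` below is hypothetical (it is not part of the notebook) and only covers an empty list; you still need to decide what happens before `max()` or `min()` is called on a short list.

```python
def safe_mean(num_list):
  # Hypothetical helper: return None instead of dividing by zero
  if len(num_list) == 0:
    return None
  return sum(num_list) / len(num_list)

print(safe_mean([34, 87]))  # 60.5
print(safe_mean([]))        # None
```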

You can use the dictionary of seller monthly sales figures below to test your code.

In [None]:
sellers2 = {'Amrita': [20, 10, 40, 50, 20, 50],
            'Joseph': [15, 80, 45, 38, 15, 83],
            'Xian': [55, 89, 22, 34, 45, 32],
            'Gustav': [102, 12, 32, 63, 87, 34],
            'Terracita': [13, 21, 2, 87, 5, 37],
            'Faith': [65, 34, 82, 12, 3, 54],
            'Penny': [12],
            'Rosa': [34, 87],
            'Ikemba': []}
sellers2

{'Amrita': [20, 10, 40, 50, 20, 50],
 'Faith': [65, 34, 82, 12, 3, 54],
 'Gustav': [102, 12, 32, 63, 87, 34],
 'Ikemba': [],
 'Joseph': [15, 80, 45, 38, 15, 83],
 'Penny': [12],
 'Rosa': [34, 87],
 'Terracita': [13, 21, 2, 87, 5, 37],
 'Xian': [55, 89, 22, 34, 45, 32]}

In [None]:
def sales_avgs2(sales_data, return_best=False, remove_best=False,
                remove_worst=False):
  # Create a new, empty dictionary to store the sales averages in
  averages = {}
  best_seller = None
  best_avg = 0
  
  # Iterate over the keys in the sales_data dictionary
  for seller in sales_data.keys():
    
    # Copy the list of sales for this seller to avoid mutating the original dictionary
    sales = sales_data[seller].copy()

    # If remove_best = True, remove the best month from the list of sales
    if remove_best:
      best = max(sales)
      sales.remove(best)
    
    # If remove_worst = True, remove the worst month from the list of sales
    if remove_worst:
      worst = min(sales)
      sales.remove(worst)

    # Add the seller name and the average of their sales to the averages dictionary
    averages[seller] = mean_avg(sales)

    if return_best:
      # Check to see if the current seller is better than the previous best
      if averages[seller] > best_avg:
        # If so, set the best seller to be the current seller
        best_seller = seller
        # Update the best sales avg to be the current seller's avg sales
        best_avg = averages[seller]

  # If the return_best argument is True, return the name of the best seller
  # and their monthly sales average
  if return_best:
    return best_seller, averages[best_seller]

  else:
    # If the return_best argument is False (default),
    # return the dictionary of seller averages
    return averages

## Testing

Use the code cell below to test your new version of the function. If you have done this correctly, it will run with no errors.

The current error says that the code in the `mean_avg()` function is trying to divide by 0. This is because `len([])` is zero, so `sum([]) / len([])` evaluates `0 / 0`, and Python raises a `ZeroDivisionError` because no number can be divided by 0.

In [None]:
# Test your function

# Test function defaults
print('All seller monthly averages are')
print(sales_avgs2(sellers2))

# Test remove_best
print('\nAll seller monthly averages discounting the best month are') 
print(sales_avgs2(sellers2, remove_best=True))

# Test remove_worst
print('\nAll seller monthly averages discounting the worst month are') 
print(sales_avgs2(sellers2, remove_worst=True))

# Test remove_best and remove_worst together
print('\nAll seller monthly averages discounting the best and worst months are') 
print(sales_avgs2(sellers2, remove_worst=True, remove_best=True))

# Test return_best
print(f'\nThe best seller, taking all months into account is:')
print(sales_avgs2(sellers2, return_best=True))

# Test all arguments at once
print('\nThe best seller discounting the best and worst months are') 
print(sales_avgs2(sellers2, remove_worst=True, remove_best=True, return_best=True))

All seller monthly averages are

ZeroDivisionError: division by zero
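You can reproduce the failures in isolation. `mean_avg([])` raises the `ZeroDivisionError` shown above, but calling `max()` or `min()` on an empty list raises a `ValueError` before any division happens. For example, with both removal options on, Penny's single month is removed by `max()`, and `min()` then receives an empty list. This sketch catches each exception and records its type.

```python
def mean_avg(num_list):
  # Same helper as above: no guard against empty lists
  return sum(num_list) / len(num_list)

caught = []

# Averaging an empty list evaluates 0 / 0
try:
  mean_avg([])
except ZeroDivisionError as err:
  caught.append(type(err).__name__)

# Removing months from a too-short list fails earlier, in max() or min()
try:
  min([])
except ValueError as err:
  caught.append(type(err).__name__)

print(caught)  # ['ZeroDivisionError', 'ValueError']
```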

# Summary

In this notebook you saw examples of how to use loops, conditionals, and default arguments in functions to create flexible functions that can solve complex problems.  You also learned that it is important to test your functions after creating them to ensure they behave the way you expect.