# Review module

**Instructions**

In order to complete this review module, we recommend you follow these instructions:

1. Complete the functions provided to you in this notebook, but do **not** change the name of the function or the name(s) of the argument(s). If you do that, the autograder will fail and you will not receive any points.
2. Run all the function-definition cells before you run the testing cells. The functions must exist before they are graded!
3. Read the function docstrings carefully. They contain additional information about how the code should look (a [docstring](https://www.datacamp.com/community/tutorials/docstrings-python) is the stuff that comes between the triple quotes).
4. Some functions may require several outputs (the docstrings tell you which ones). Make sure they are returned in the right order.
5. Remove from each function the code `raise NotImplementedError()` and replace it with your implementation.

For this review module, we will be using the same database we introduced in the main lecture. This database consists of three tables: `agent` (a list of the call center agents), `customer` (a list of the customers and their demographic data), and `call` (all the calls that the agents made and whether they resulted in a purchase or not).

Run the cells below to load the SQLite database and see the names of the columns of each table:

In [31]:
%%capture
!pip install ipython-sql sqlalchemy pandas
import sqlalchemy
sqlalchemy.create_engine("sqlite:///call_center_database.db")
%load_ext sql
%sql sqlite:///call_center_database.db

In [32]:
%%sql
select * from agent limit 0

 * sqlite:///call_center_database.db
Done.


AgentID,Name


In [33]:
%%sql
select * from customer limit 0

 * sqlite:///call_center_database.db
Done.


CustomerID,Name,Occupation,Email,Company,PhoneNumber,Age


In [34]:
%%sql
select * from call limit 0

 * sqlite:///call_center_database.db
Done.


CallID,AgentID,CustomerID,PickedUp,Duration,ProductSold


### Exercise 1

Calculate the average age across all customers. The result of your query should have only one column (called `average_age`) and one row (the actual average numeric age).

In [43]:
def age_customers():
    """
    Calculate the mean age of customers.
    
    Arguments: None
    
    Output:
    query: The SQL query. A string (please refer to the example below)
    
    Example:
    In this function, you only need to write the query inside quotes.
    For instance, if you wanted to see the first 5 rows of the `call` table, this
    should be your code:
    
    query = "SELECT * FROM call LIMIT 5"
    
    If you need to write your query in more than one line (ie. with line breaks),
    you can use triple quotes: https://www.tutorialspoint.com/triple-quotes-in-python
    
    """
    
    # YOUR CODE HERE
    query = "SELECT AVG(age) AS average_age FROM customer"

    return query

### Exercise 2

Calculate the average age of customers who actually made a purchase as a result of a call. The result of your query should have only one column (called `average_age_purchased`) and one row (the actual average numeric age).

**Hint:** Join the `customer` and `call` tables, keep only the calls that resulted in a sale, and then compute the average age. It is true that a single customer may appear several times in the result (which would mean they were contacted more than once and made a purchase more than once). Ignore this and simply average the ages across all sales.

In [67]:
def age_customers_purchased():
    """
    Calculate the mean age of customers who made purchases.
    
    Arguments: None
    
    Output:
    query: The SQL query. A string (please refer to the example below)
    
    Example:
    In this function, you only need to write the query inside quotes.
    For instance, if you wanted to see the first 5 rows of the `call` table, this
    should be your code:
    
    query = "SELECT * FROM call LIMIT 5"
    
    If you need to write your query in more than one line (ie. with line breaks),
    you can use triple quotes: https://www.tutorialspoint.com/triple-quotes-in-python
    
    """
    
    # YOUR CODE HERE
    query = """SELECT AVG(age) AS average_age_purchased
    FROM customer
    WHERE CustomerID IN (
      SELECT CustomerID
      FROM call
      WHERE result = 'sale')"""

    return query

### Exercise 3

Determine the 6 companies which are most highly represented among the customers. Your result should have two columns (`Company` and `number_customers`) and one row.

**Hint:** Count the number of customers who have links to each company (group by company), sort the result in descending order, and finally retrieve only the first 6 rows.

In [61]:
def most_frequent_companies():
    """
    Find which 6 companies are the most frequent among customers.
    
    Arguments: None
    
    Output:
    query: The SQL query. A string (please refer to the example below)
    
    Example:
    In this function, you only need to write the query inside quotes.
    For instance, if you wanted to see the first 5 rows of the `call` table, this
    should be your code:
    
    query = "SELECT * FROM call LIMIT 5"
    
    If you need to write your query in more than one line (ie. with line breaks),
    you can use triple quotes: https://www.tutorialspoint.com/triple-quotes-in-python
    
    """
    
    # YOUR CODE HERE
    query =  """SELECT company AS Company, COUNT(*) AS number_customers 
    FROM customer 
    GROUP BY company 
    ORDER BY number_customers DESC
    LIMIT 6 """


    return query

### Exercise 4

Retrieve all the customers that have Gmail email accounts. Your result must have one column (`Email`) and 79 rows (the 79 customers who have Gmail accounts).

In [49]:
def gmail_accounts():
    """
    Retrieve all the customers who have Gmail accounts.
    
    Arguments: None
    
    Output:
    query: The SQL query. A string (please refer to the example below)
    
    Example:
    In this function, you only need to write the query inside quotes.
    For instance, if you wanted to see the first 5 rows of the `call` table, this
    should be your code:
    
    query = "SELECT * FROM call LIMIT 5"
    
    If you need to write your query in more than one line (ie. with line breaks),
    you can use triple quotes: https://www.tutorialspoint.com/triple-quotes-in-python
    
    """
    
    # YOUR CODE HERE
    query = "SELECT email AS Email FROM customer WHERE email LIKE '%@gmail.com'"
    
    return query

## Testing Cells

Run the below cells to check your answers. Make sure you run your solution cells first before running the cells below, otherwise you will get a `NameError` when checking your answers.

In [52]:
# Ex 1
import pandas as pd
ava = pd.read_sql(age_customers(), con="sqlite:///call_center_database.db")
assert ava.columns == ["average_age"], "Ex. 1 - Your result should only have one column, and it should be named 'average_age'!"
assert len(ava) == 1, "Ex. 1 - Your result should have only one row!"
assert round(ava["average_age"][0],3) == round(24.435, 3), "Ex. 1 - Please check the aggregation function you used. Did you calculate the average age of all customers?"
print("Exercise 1 seems correct!")

Exercise 1 seems correct!


In [68]:
# Ex 2
import pandas as pd
avap = pd.read_sql(age_customers_purchased(), con="sqlite:///call_center_database.db")
assert avap.columns == ["average_age_purchased"], "Ex. 2 - Your result should only have one column, and it should be named 'average_age_purchased'!"
assert len(avap) == 1, "Ex. 2 - Your result should have only one row!"
assert round(avap["average_age_purchased"][0],3) == round(24.708253358925145, 3), "Ex. 2 - Please check the aggregation function you used. Did you calculate the average age of all customers who made a purchase?"
print("Exercise 2 seems correct!")

OperationalError: (sqlite3.OperationalError) no such column: result
[SQL: SELECT AVG(age) AS average_age_purchased
    FROM customer
    WHERE CustomerID IN (
      SELECT CustomerID
      FROM call
      WHERE result = 'sale')]
(Background on this error at: http://sqlalche.me/e/e3q8)

In [62]:
# Ex 3
import pandas as pd
mcc = pd.read_sql(most_frequent_companies(), con="sqlite:///call_center_database.db")
assert set(mcc.columns) == set(["Company", "number_customers"]), "Ex. 3 - Your result should have two columns, named 'Customer' and 'number_customers'!"
assert len(mcc) == 6, "Ex. 3 - Your result should have 6 rows! Remember to group by company and limit your results to only 6 companies!"
assert set(["Romero and Sons","Mitchell and Sons","Miller Group","Kelly Inc","Jones PLC","Hernandez and Sons"]) == set(mcc["Company"].to_list()), "Ex. 3 - Your result does not include all the companies in the top 6! Did you sort the results in descending order?"
print("Exercise 3 seems correct!")

Exercise 3 seems correct!


In [56]:
# Ex 4
import pandas as pd
gm = pd.read_sql(gmail_accounts(), con="sqlite:///call_center_database.db")
assert set(gm.columns) == set(["Email"]), "Ex. 4 - Your result should have only one column, 'Email'!"
assert len(gm) == 79, "Ex. 4 - Your result should have 79 rows! Remember to filter your results using WHERE and the LIKE operator!"
assert gm["Email"].str.count("gmail").sum() == 79, "Ex. 4 - Your result contains some customers whose Email is not Gmail!"
print("Exercise 4 seems correct!")

Exercise 4 seems correct!
