<a href="2. Generate Sales Data.ipynb">&lt;- go back to previous notebook</a>

# Step 3: Add salesperson data.

In the last section, we generated sales data and aggregated it by state.  Now we want some additional insight into our sales personnel.  Let's start by adding some data for our salespeople.  Each state has a different salesperson.

<img src="images/sample-salesperson.png">

## 3.1 Create the 'salesperson' table

In [None]:
from my_connect import my_connect

connection = my_connect()
cursor = connection.cursor()
q = """
CREATE TABLE IF NOT EXISTS salesperson (
    state VARCHAR(2) PRIMARY KEY,
    name VARCHAR(200)
)
"""
cursor.execute(q)
connection.commit()

## 3.2 Insert salesperson names for all states

In [None]:
import pandas
from faker import Faker
from my_connect import my_connect

fake = Faker()
connection = my_connect()
cursor = connection.cursor()

# Get the list of states from the fips table we created earlier
q = "SELECT DISTINCT state FROM fips ORDER BY state ASC"
df = pandas.read_sql_query(q, connection)
states = df['state'].tolist()

# Generate a fake salesperson name for each state.
# ON CONFLICT DO NOTHING skips states that already have a salesperson,
# so re-running this cell keeps the existing names.
q = "INSERT INTO salesperson (name, state) VALUES (%s, %s) ON CONFLICT DO NOTHING;"
for state in states:
    # Pass the values as query parameters so psycopg2 handles the quoting
    cursor.execute(q, (fake.name(), state))
connection.commit()

# Print 10 rows to verify
df = pandas.read_sql_query("SELECT * FROM salesperson LIMIT 10", connection)
print(df)
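
The insert above depends on `ON CONFLICT DO NOTHING` to leave existing rows alone. The sketch below shows that behavior in isolation. It assumes an in-memory SQLite database as a stand-in for the PostgreSQL connection returned by `my_connect()`, so that it runs on its own. SQLite accepts the same clause in version 3.24 and later, and the two names are made-up placeholders.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the tutorial's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salesperson (state VARCHAR(2) PRIMARY KEY, name VARCHAR(200))"
)

insert = "INSERT INTO salesperson (name, state) VALUES (?, ?) ON CONFLICT DO NOTHING"
conn.execute(insert, ("First Name", "TX"))   # inserted: TX has no row yet
conn.execute(insert, ("Second Name", "TX"))  # skipped: TX already has a row
conn.commit()

rows = conn.execute("SELECT state, name FROM salesperson").fetchall()
print(rows)  # [('TX', 'First Name')]
```

The second insert is silently ignored rather than raising a primary key error, which is why the cell can be run repeatedly without changing the names already stored.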

### Data quality concern: flaws in our salesperson table

There are a few things to note about the dataset we just created that make it less than ideal for a real-world situation:
- Each state can have one and only one salesperson, because the primary key constraint is on the 'state' column and does not allow more than one record per state
- If a salesperson were to handle two states, you'd have to duplicate their name (not normalized)
- We only have the name of the salesperson, no details (e.g. e-mail address)

While the dataset is OK for the faked-up example we're doing here, it's important to think about how your data will be used by others.  In this case, the dataset is pretty limiting, and you'd probably have to do a bunch of rework later on if something changed (e.g. to allow a backup salesperson per state).  Sometimes a small change to the dataset at the beginning makes later changes a lot easier.  Examples in this case:
- Adding an integer ID field as the primary key instead, allowing multiple records per state
- Creating a separate table for salesperson details, including name, e-mail, phone, etc., with an integer ID as the primary key
- Storing the salesperson detail ID in this table instead of the salesperson name, which normalizes it

Doing a little work up front can make it much easier to share data across your organization.  The changes above would make the dataset more resilient to reasonable changes, without requiring a bunch of schema changes that would break downstream apps, reports, or dashboards.
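
As a rough illustration of the suggestions above, here is a minimal sketch of what a normalized layout could look like. The table names (`salesperson_detail`, `state_salesperson`), the column names, and the sample people are all hypothetical and are not used elsewhere in this tutorial. It also assumes an in-memory SQLite database so it runs on its own. In PostgreSQL you would likely use a serial or identity column for the integer IDs.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesperson_detail (
    id INTEGER PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    email VARCHAR(200)
);
CREATE TABLE state_salesperson (
    id INTEGER PRIMARY KEY,
    state VARCHAR(2) NOT NULL,
    salesperson_id INTEGER NOT NULL REFERENCES salesperson_detail(id),
    UNIQUE (state, salesperson_id)
);
""")

# Each person is stored once, with their contact details.
conn.executemany(
    "INSERT INTO salesperson_detail (id, name, email) VALUES (?, ?, ?)",
    [(1, "Pat Example", "pat@example.com"), (2, "Sam Example", "sam@example.com")],
)
# Pat covers two states; TX also gets Sam as a backup salesperson.
conn.executemany(
    "INSERT INTO state_salesperson (state, salesperson_id) VALUES (?, ?)",
    [("TX", 1), ("OK", 1), ("TX", 2)],
)

rows = conn.execute("""
SELECT ss.state, d.name
FROM state_salesperson AS ss
JOIN salesperson_detail AS d ON d.id = ss.salesperson_id
ORDER BY ss.state, d.name
""").fetchall()
print(rows)  # [('OK', 'Pat Example'), ('TX', 'Pat Example'), ('TX', 'Sam Example')]
```

With this layout, adding a backup salesperson or reassigning a state is a row insert or update rather than a schema change.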

## 3.3 Join the salesperson info in with the sales data

In [None]:
import pandas
from my_connect import my_connect

connection = my_connect()

# Total sales per state, joined to that state's salesperson (top 10 by total)
q = """
SELECT SUM(sales.amount) AS total, salesperson.state, salesperson.name
FROM sales
INNER JOIN fips ON sales.fips = fips.fipstxt
INNER JOIN salesperson ON fips.state = salesperson.state
GROUP BY salesperson.state, salesperson.name
ORDER BY total DESC
LIMIT 10;
"""

df = pandas.read_sql_query(q, connection)
print(df)


Run the above and you should get something like this (the names and totals will differ, since both are randomly generated):
```
        total state             name
0  3312283.72    TX  William Bernard
1  3173485.84    CA    Larry Morales
2  2113674.22    NY    Kendra Ingram
3  2017619.26    FL   Deborah Walker
4  1627246.43    GA    Justin Medina
5  1380399.77    IL    Robert Hughes
6  1106723.17    OH     Andrew Moore
7  1036475.05    MA      James Scott
8  1023290.36    MO  Tyler Henderson
9  1003630.76    MI    Logan Watkins
```
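
To make the two joins easier to trace, here is a small sketch that runs the same kind of query on a handful of rows. It assumes an in-memory SQLite database in place of the PostgreSQL connection, and the column types, amounts, county codes, and names are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (fips TEXT, amount REAL);
CREATE TABLE fips (fipstxt TEXT, state TEXT);
CREATE TABLE salesperson (state TEXT PRIMARY KEY, name TEXT);
INSERT INTO sales VALUES ('48001', 100.0), ('48003', 50.0), ('06001', 75.0);
INSERT INTO fips VALUES ('48001', 'TX'), ('48003', 'TX'), ('06001', 'CA');
INSERT INTO salesperson VALUES ('TX', 'Pat Example'), ('CA', 'Sam Example');
""")

# sales -> fips maps each sale to a state; fips -> salesperson adds the name.
report = conn.execute("""
SELECT SUM(sales.amount) AS total, salesperson.state, salesperson.name
FROM sales
INNER JOIN fips ON sales.fips = fips.fipstxt
INNER JOIN salesperson ON fips.state = salesperson.state
GROUP BY salesperson.state, salesperson.name
ORDER BY total DESC
LIMIT 10
""").fetchall()
print(report)  # [(150.0, 'TX', 'Pat Example'), (75.0, 'CA', 'Sam Example')]
```

The two TX sales (100.0 and 50.0) collapse into a single row of 150.0 because they share a state and a salesperson, which is the same aggregation the full report performs.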

# Next notebook: visualization

Next we will see a couple of examples of visualizing the report above as a bar graph.

<a href="4. Visualization.ipynb">Go to the next notebook -&gt;</a>


*Contents © Copyright 2020 HP Development Company, L.P. SPDX-License-Identifier: MIT*