<a href="0. Introduction.ipynb">&lt;- Go back to introduction notebook</a>

# Step 1: Import public reference data for US counties.

We'll need this data to match up <a href="https://en.wikipedia.org/wiki/FIPS_county_code">FIPS codes</a> (which some of the Covid data uses) to states, which is how our salespeople are assigned.  A county FIPS code is a 5-digit number that identifies a county within a state, or an area within a territory.
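
As a small illustration of that structure: the first two digits of a county FIPS code identify the state, and the last three identify the county within it. The helper below is only a sketch and is not used elsewhere in these notebooks; the name ```split_fips``` is hypothetical, and it assumes codes are held as zero-padded strings.

```python
def split_fips(fips_code):
    """Split a 5-digit county FIPS code into (state_part, county_part)."""
    code = str(fips_code).strip()
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a 5-digit FIPS code, got {fips_code!r}")
    # First two digits: state; last three digits: county within that state
    return code[:2], code[2:]

print(split_fips("01000"))  # ('01', '000')
```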

I've already grabbed some <a href="https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/">USDA data</a> (<a href="https://data.nal.usda.gov/access-policy">license</a>).

I already have Postgres installed and running locally, so let's create a table and insert the CSV data from the USDA.    

I have saved the data into ```data/usda_county_pop_2019.csv```.  

The CSV file looks like this:
```
FIPStxt,State,Area_Name,POP_ESTIMATE_2019
01000,AL,Alabama,4903185
...
```
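
Note that the FIPS code in the sample row starts with a zero. The sketch below uses only the standard library, with the two sample lines above inlined as a string rather than read from disk. It shows that reading the code as text keeps the leading zero, while converting it to an integer drops it, which is presumably why the table created later in this notebook stores the code as text.

```python
import csv
import io

# The header and first data row from the sample above, inlined for illustration
sample = "FIPStxt,State,Area_Name,POP_ESTIMATE_2019\n01000,AL,Alabama,4903185\n"

rows = list(csv.DictReader(io.StringIO(sample)))
first = rows[0]

print(first["FIPStxt"])       # as text the leading zero is kept: 01000
print(int(first["FIPStxt"]))  # as an integer the zero is lost: 1000
```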

<img src="images/sample-fips.png">

We can use the Postgres COPY command to import the CSV file into a Postgres table.  

## 1.1 Create a little helper function to connect to the database.

The ```CONFIG_FILE``` referenced below contains my Postgres username and password, in this format:

```
[database]
login=sales
password=<password>
```
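
To check that a file in this format parses as expected, here is a minimal sketch that reads the same layout from an in-memory string instead of from disk. The password value ```example-password``` is a made-up placeholder, not a real credential.

```python
import configparser

# Same layout as the file above; the password here is a made-up placeholder
sample_config = """
[database]
login=sales
password=example-password
"""

config = configparser.RawConfigParser()
config.read_string(sample_config)

# These are the same two lookups the connection helper performs
db_username = config.get("database", "login")
db_password = config.get("database", "password")
print(db_username)
```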

We'll use ```psycopg2``` as our database access library, but there are others, such as SQLAlchemy.  For a good overview of accessing Postgres from Python, see <a href="https://www.learndatasci.com/tutorials/using-databases-python-postgres-sqlalchemy-and-alembic/">this article</a>.

In [None]:
# This will install the prerequisite packages we'll use
!pip install psycopg2 pandas

In [None]:
import configparser
import psycopg2

CONFIG_FILE = r'c:\keys\sales.properties'

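def my_connect():
    """Open a connection to the local 'sales' database using credentials from CONFIG_FILE."""
    config = configparser.RawConfigParser()
    # config.read() silently skips missing files, so check that the file was actually loaded
    if not config.read(CONFIG_FILE):
        raise FileNotFoundError(CONFIG_FILE)
    db_username = config.get('database', 'login')
    db_password = config.get('database', 'password')

    connection = psycopg2.connect(user=db_username, password=db_password,
                                  host='localhost', port=5432, database='sales')
    return connection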

### Reusing Code Within These Notebooks

I'll copy this to a file called ```my_connect.py``` and put it in the same directory as these notebooks.  Then we can do:
```
from my_connect import my_connect

connection = my_connect()
```

We'll do that in the subsequent notebooks.  (Note that if you changed the location of your ```sales.properties``` file above, you'll also need to change it in the ```my_connect.py``` file in this directory.  You can just use a text editor.)

## 1.2 Create the fips table

In [None]:
connection = my_connect()
cursor = connection.cursor()
q = """
CREATE TABLE IF NOT EXISTS fips (
    fipstxt VARCHAR(5) PRIMARY KEY,
    state VARCHAR(5),
    area_name VARCHAR(100),
    pop_estimate_2019 INTEGER
)
"""
cursor.execute(q)
connection.commit()

## 1.3 Import the data from the CSV file
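
One caveat about the cell below: ```COPY ... FROM``` with a file name is executed by the Postgres server, so the server process itself must be able to read the file, and the database role typically needs elevated privileges. That works here because Postgres is running locally. If it does not apply in your setup, a possible alternative is to stream the file from the client with psycopg2's ```cursor.copy_expert``` and ```COPY ... FROM STDIN```. The following is a sketch under that assumption, not the approach this notebook uses. The name ```copy_csv_from_client``` is hypothetical, the UTF-8 encoding is an assumption about the CSV file, and the function takes an already-open cursor as a parameter.

```python
# Stream the CSV from the client side instead of having the server open the file
COPY_SQL = (
    "COPY fips(fipstxt, state, area_name, pop_estimate_2019) "
    "FROM STDIN WITH (FORMAT csv, HEADER true)"
)

def copy_csv_from_client(cursor, csv_path):
    """Send the CSV at csv_path to the fips table through an open psycopg2 cursor."""
    # Assumption: the file is UTF-8 encoded; adjust if your copy of the data differs
    with open(csv_path, "r", encoding="utf-8") as f:
        cursor.copy_expert(COPY_SQL, f)
```

It could be called as ```copy_csv_from_client(cursor, CSV_FILE)``` followed by ```connection.commit()```, in place of the ```COPY``` statement in the cell below.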

In [None]:
import psycopg2
import psycopg2.sql as sql
import os

connection = my_connect()
cursor = connection.cursor()

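# The path is resolved by the Postgres server process, so build an absolute path
# to the file saved earlier under data/
CSV_FILE = os.path.join(os.getcwd(), "data", "usda_county_pop_2019.csv")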

q2 = sql.SQL("""
DELETE FROM fips;
COPY fips(fipstxt, state, area_name, pop_estimate_2019) FROM {} CSV HEADER;
""")

cursor.execute(q2.format(sql.Literal(CSV_FILE)))
connection.commit()

## 1.4 Show a few rows to validate

In [None]:
import pandas

connection = my_connect()
cursor = connection.cursor()
cursor.execute("SELECT COUNT(*) FROM fips")
result = cursor.fetchone()
print(f"FIPS rows: {result[0]}")
# Load a small sample into a DataFrame to eyeball the columns
df = pandas.read_sql_query("SELECT * FROM fips LIMIT 5;", connection)
print(df.head())
connection.close()


# Next notebook: generating sales data

<a href="2. Generate Sales data.ipynb">Go to the next notebook -&gt;</a>


*Contents © Copyright 2020 HP Development Company, L.P. SPDX-License-Identifier: MIT*