# Holistic Use of Data for Better Business Insights

In this module, we'll demonstrate some typical phases of the data lifecycle.  This will include:

- Bringing data in
- Cleaning it up
- Enriching it with other data to provide context
- Joining it with other data sources to provide answers to business questions
- Using visualization tools to gain insight into your business

This folder contains several Jupyter notebooks with Python 3 code.  They use a local PostgreSQL
database to demonstrate the phases listed above.

# Overview

In this lab, we'll pretend we're a company selling products.  We'll have data sets representing sales and salespeople, and we'll be able to analyze the top-selling regions and salespeople.  

Next, we'll examine the impact of Covid-19 on our sales.  We have the following business questions that data can help us answer:


- Are there any new outbreaks (weekly increases of Covid-19 cases greater than 20%)?
- Are any of our top 100 sales regions (by county) affected?
- Do I have to notify any of my salespeople about these issues?  Who are they?
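
To make the three questions concrete, here is a minimal sketch in plain Python of how they chain together.  The county identifiers, case counts, and salesperson names are made up for illustration, and the function names are hypothetical; the notebooks themselves work with real data in pandas and PostgreSQL.

```python
def weekly_increase(previous, current):
    """Return the fractional week-over-week change in case counts."""
    if previous <= 0:
        # No baseline to compare against: treat any new cases as an unbounded increase.
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def find_outbreaks(weekly_cases, threshold=0.20):
    """Return the counties whose weekly increase is strictly above the threshold.

    weekly_cases maps a county identifier to (previous_week, current_week) counts.
    """
    return sorted(
        county
        for county, (previous, current) in weekly_cases.items()
        if weekly_increase(previous, current) > threshold
    )


# Made-up sample values, not real Covid-19 or sales data.
weekly_cases = {
    "county_a": (100, 125),  # 25% increase
    "county_b": (200, 210),  # 5% increase
    "county_c": (50, 60),    # exactly 20%, so not "greater than 20%"
}
top_sales_regions = {"county_a", "county_b"}
salespeople_by_county = {"county_a": ["Alex", "Sam"], "county_b": ["Jo"]}

# Question 1: where are the new outbreaks?
outbreaks = find_outbreaks(weekly_cases)

# Question 2: which of our top sales regions are affected?
affected_regions = [county for county in outbreaks if county in top_sales_regions]

# Question 3: who needs to be notified?
to_notify = sorted(
    {person for county in affected_regions for person in salespeople_by_county.get(county, [])}
)

print(outbreaks, affected_regions, to_notify)
```

The same week-over-week calculation can be expressed in pandas with `pct_change`; plain dictionaries are used here only to keep the sketch self-contained.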

We'll use this Python-based Jupyter notebook to automate and document the steps along the way, and a local PostgreSQL database to hold the data.

In some cases, we'll create simulated data.  We'll also bring in some public data sets for enrichment and reference (ZIP code, city/state, timezone, etc.) to put the answers we need in context.

# Some of the things you will learn

- How to perform database queries to PostgreSQL from Python
- How to use pandas, a very popular data manipulation library
- How to import data from a CSV file in an Amazon S3 bucket into a PostgreSQL table
- How to import reference data into in-memory lists for faster use
- How to use Jupyter notebooks to write, test, and document your code
- How to reuse code across notebooks
- How to go from a set of business questions and basic enterprise data to insights into your business
- Some data quality pitfalls when importing, designing, and reusing datasets
- How to create a chart from your data directly inside the Jupyter notebook

# Setup

To run these notebooks, you'll need:
- PostgreSQL installed locally (I'm using v10)
- Anaconda (Python 3.7 or 3.8 is fine), which comes with Python and Jupyter

Follow the normal installation process for each; nothing special is required.  **See the README.md at the root of this repo for
detailed instructions on setting everything up.**

- Install PostgreSQL locally and get it running.  Use the included pgAdmin tool for the next steps.
- Create a role called 'sales' and set a password.
- Create a properties file with the login and password (see 1.1 below for the correct format).
- Create a database called 'sales'.  You can leave it empty.
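
As a rough illustration of how the properties file and the 'sales' database fit together, the sketch below parses simple `key=value` text and builds a libpq-style connection string.  The key names `user` and `password`, the sample password, and both function names are assumptions made for this example; use the format the notebooks specify for your actual file.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def build_dsn(props, host="localhost", dbname="sales"):
    """Build a keyword/value connection string for the local 'sales' database.

    Assumes the values contain no spaces or quotes, which would need escaping.
    """
    return (
        f"host={host} dbname={dbname} "
        f"user={props['user']} password={props['password']}"
    )


# Hypothetical file contents; the key names are assumptions for this sketch.
sample_text = """
# login for the local sales database
user = sales
password = change-me
"""

props = parse_properties(sample_text)
dsn = build_dsn(props)
print(dsn)
```

A string in this form can be passed to a PostgreSQL driver such as psycopg2 via `psycopg2.connect(dsn)`, assuming that driver is installed.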

# Next notebook: importing reference data

Open the notebook titled <a href="1. FIPS Code and Population Data.ipynb">1. FIPS Code and Population Data</a> to continue.


*Contents © Copyright 2020 HP Development Company, L.P. SPDX-License-Identifier: MIT*
