generate predictions to help prioritize a list of sales prospects

domfelipe/prospecting

 
 

⚒ Prospecting ⚒

This project started as an effort to predict a 'prospect score' for each business in a list of current and (predominantly) potential customers.

The initial goal was to provide a list that helps prioritize sales opportunities (e.g. rank-ordering prospects by state). I also had some ideas about tying in Google Sheets to support my typical ML workflow (data profiling > clean/transform > performance reporting > delivery of final predictions > revisiting column treatments >> etc.). The SheetsApi and DriveApi classes authenticate with the Google APIs using OAuth 2.0, and the usual Sheets sharing options are available if you want to invite collaborators.
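
As a rough illustration of the "rank order prospects by state" idea, the sketch below groups scored prospects by state and ranks them by descending score within each state. The field names (`name`, `state`, `score`), the function name, and the sample rows are made up for this example; they are not taken from the project's code or data.

```python
from collections import defaultdict


def rank_within_state(prospects):
    """Return prospects ordered by state, then by descending score, with a per-state rank."""
    by_state = defaultdict(list)
    for prospect in prospects:
        by_state[prospect["state"]].append(prospect)

    ranked = []
    for state in sorted(by_state):
        # Highest score first within each state.
        group = sorted(by_state[state], key=lambda p: p["score"], reverse=True)
        for rank, prospect in enumerate(group, start=1):
            ranked.append({**prospect, "rank": rank})
    return ranked


# Made-up example rows; "score" stands in for a predicted prospect score.
prospects = [
    {"name": "Acme", "state": "OH", "score": 0.42},
    {"name": "Birch", "state": "CA", "score": 0.91},
    {"name": "Cedar", "state": "OH", "score": 0.77},
    {"name": "Dune", "state": "CA", "score": 0.15},
]

ranked = rank_within_state(prospects)
for row in ranked:
    print(row["state"], row["rank"], row["name"], row["score"])
```

In practice the same ordering could be produced with a dataframe group-by; plain dictionaries are used here only to keep the sketch dependency-free.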

I'll be updating this README and the documentation in general. In the interim, as an example of how Google Sheets is used, the following table outlines the spreadsheets and tabs that I have found useful. While I cannot share my original prospecting dataset, I used an old InnoCentive challenge dataset as an example.

| spreadsheet | sheet | note |
| --- | --- | --- |
| projectname_metadata | metadata | Control logic for column processing treatments; used by Python to decide how each column is processed. The functions in process.py rely on information from this tab. |
| | raw_descr | Descriptive information about the raw data (df_raw) |
| | clean_descr | Descriptive information about the cleaned dataset (df_clean) |
| projectname_model_reporting | session_report | Summarizes model performance; I plan to make this the main performance tab. A "session" is an instance of the "ModelSession" class, which is used to share access to train/test sets. |
| | cv_results | If GridSearchCV is used, the GridSearchCV.cv_results_ reports are saved here (shows performance by fold for each parameter set evaluated) |
| | model_types | A simple lookup table, used by the Python script as a reference when building the report for the session_report tab |
| | _plots | Plots: performance report, performance report subset |
| projectname_predictions | predictions | Final predictions, with probabilities |
| | lookupmaster | A lookup table with the master list of prospects / entities of interest, or misc. information to join with predictions |
| | README | Intended as an FYI tab that gives an overview of the health of the predictions made (e.g. number of correct/incorrect predictions) |
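
To make the role of the metadata tab more concrete, here is a minimal sketch of metadata-driven column processing. It is not the code in process.py: the rule fields (`column`, `keep`, `dtype`, `fill_missing`), the `clean_row` helper, and the sample rows are all assumptions chosen for illustration. The idea is only that each row of the metadata tab describes how one column should be treated, and the cleaning step reads those rules instead of hard-coding them.

```python
# Hypothetical metadata rows, shaped as they might look after being read
# from a "metadata" tab (one rule per column). Field names are assumptions.
METADATA = [
    {"column": "revenue", "keep": True, "dtype": "float", "fill_missing": "0"},
    {"column": "state", "keep": True, "dtype": "str", "fill_missing": "unknown"},
    {"column": "internal_id", "keep": False, "dtype": "str", "fill_missing": ""},
]

# Map the dtype label stored in the sheet to a Python conversion function.
CASTERS = {"float": float, "int": int, "str": str}


def clean_row(raw_row, metadata):
    """Apply the per-column rules to one raw record and return the cleaned record."""
    clean = {}
    for rule in metadata:
        if not rule["keep"]:
            continue  # column is dropped from the cleaned dataset
        value = raw_row.get(rule["column"])
        if value is None or value == "":
            value = rule["fill_missing"]  # fill before casting
        clean[rule["column"]] = CASTERS[rule["dtype"]](value)
    return clean


# Made-up raw records (strings, as they might arrive from a CSV or a sheet).
raw_rows = [
    {"revenue": "1250.5", "state": "OH", "internal_id": "a1"},
    {"revenue": "", "state": "", "internal_id": "b2"},
]

clean_rows = [clean_row(row, METADATA) for row in raw_rows]
print(clean_rows)
```

Keeping the rules in a sheet means a column treatment can be revisited by editing a cell rather than the code, which matches the "revisiting column treatments" step of the workflow described above.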

Overview

  • The project directory contains:
        .
        ├── .dockerignore
        ├── .gitattributes              # For CRLF correction
        ├── .gitignore
        ├── credentials/                # Not necessarily best practice, but convenient
        │   ├── README.md
        │   └── certs/
        │       └── README.md
        ├── data/
        │   ├── README.md
        │   └── tmp/                    # Logs saved here
        │       ├── README.md
        │       └── joblib/             # Used by scikit-learn when running in a Docker container
        │           └── README.md
        ├── Dockerfile                  # See README_detail.md for more info
        ├── LICENSE.md
        ├── jupyter_notebook_config.py  # See README_detail.md for more info
        ├── mplimporthook.py            # Used by Dockerfile
        ├── notebooks/                  # Jupyter Notebooks
        ├── prospecting/
        │   ├── __init__.py
        │   ├── env.py                  # Check here for environment variables required
        │   ├── utils.py
        │   ├── api.py                  # Google Sheets and Google Drive API classes
        │   ├── process.py              # Data cleaning functions, relies on info in metadata tab
        │   ├── model.py
        │   ├── report.py
        │   ├── errors.py
        │   └── version.py
        ├── README.md
        ├── requirements_nonconda.txt   # Used by Dockerfile
        ├── scripts/
        │   └── hash_jupyter_pw.py      # Create hashed password to use with Docker container
        ├── start-notebook.sh           # Used by Dockerfile
        ├── start.sh                    # Used by Dockerfile
        └── start-singleuser.sh         # Used by Dockerfile
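
Since env.py is the place to check for required environment variables, a quick pre-flight check along the following lines can save a failed run. This is only a sketch: the variable names below are placeholders invented for the example, and the real list should be taken from prospecting/env.py.

```python
import os

# Placeholder names for illustration only; see prospecting/env.py for the real ones.
REQUIRED_VARS = ("EXAMPLE_CREDENTIALS_DIR", "EXAMPLE_DATA_DIR")


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Pass a plain dict to try the check without touching the real environment.
missing = missing_env_vars(REQUIRED_VARS, environ={"EXAMPLE_DATA_DIR": "/data"})
print(missing)
```

Calling `missing_env_vars(REQUIRED_VARS)` with no second argument checks the actual process environment.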

⚒ ⚒ ⚒
