# NameX Six-Month Stats

**!pip** (for example, `!pip install pandas`) can be run in a notebook cell to install any library that was not installed when the environment was created.

This notebook assumes you have installed the packages in requirements.txt (`pip install -r requirements.txt`) before launching Jupyter.

The contents of requirements.txt should be:

- `jupyter`
- `psycopg2-binary`
- `sqlalchemy`
- `ipython-sql`
- `simplejson`
- `pandas`
- `matplotlib`
- `spacy`
- `papermill`
- `schedule`
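Before launching Jupyter, it can help to confirm that every package listed above is actually installed. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `missing_packages` and `REQUIREMENTS` are illustrative names, not part of the notebook:

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in requirements.txt
REQUIREMENTS = [
    "jupyter", "psycopg2-binary", "sqlalchemy", "ipython-sql", "simplejson",
    "pandas", "matplotlib", "spacy", "papermill", "schedule",
]


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_packages(REQUIREMENTS) or "none")
```

Anything reported as missing can then be installed from a cell with `!pip install <name>`.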

Load these libraries into the notebook so that we can query, load, manipulate, and view the data.

In [12]:
import os
import sqlalchemy
import simplejson
import pandas as pd
import matplotlib
# from ..NotebookScheduler import NotebookScheduler
from datetime import datetime, timedelta
# IPython.display is the public module for HTML and display()
from IPython.display import HTML, display

%load_ext sql
%config SqlMagic.displaylimit = 5

Read in the connection details for DEV, TEST, or PROD, depending on which database you want to run the stats against. Leave only the cell for the target environment uncommented.

In [13]:
# Local Credentials
# with open("creds-dev-forward.json.nogit") as fh:
#     creds = simplejson.loads(fh.read())

In [14]:
# DEV Credentials
with open("creds-dev.json.nogit") as fh:
    creds = simplejson.loads(fh.read())

In [None]:
# TEST Credentials
# with open("creds-test.json.nogit") as fh:
#     creds = simplejson.loads(fh.read())

In [None]:
# PROD Credentials
# with open("creds-prod.json.nogit") as fh:
#     creds = simplejson.loads(fh.read())
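The credential cells above differ only in the file they read. A minimal sketch of a hypothetical `load_creds` helper that picks the file by environment name and fails early if a key the connection cell needs is missing. It assumes the `creds-<env>.json.nogit` naming used above and uses the standard `json` module, whose `loads` behaves like `simplejson.loads` for this purpose:

```python
import json
from pathlib import Path

# Keys the connection cell reads from the credentials file
REQUIRED_KEYS = ("username", "password", "hostname", "port_num", "db_name")


def load_creds(env, directory="."):
    """Load creds-<env>.json.nogit (e.g. env='dev', 'test', 'prod') and check its keys."""
    path = Path(directory) / f"creds-{env}.json.nogit"
    if not path.is_file():
        raise FileNotFoundError(f"No credentials file for {env!r}: {path}")
    creds = json.loads(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"{path} is missing keys: {', '.join(missing)}")
    return creds
```

With this helper, switching databases is a matter of calling `creds = load_creds("prod")` instead of commenting and uncommenting cells; the local file maps to `load_creds("dev-forward")`.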

Build the connection string, then connect the `%sql` magic to the database.

In [15]:
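import logging
from urllib.parse import quote_plus

# Build the connection URL; quote_plus escapes special characters in the password
connect_to_db = (f"postgresql://{creds['username']}:{quote_plus(creds['password'])}"
                 f"@{creds['hostname']}:{creds['port_num']}/{creds['db_name']}")
# Log the target database without exposing the password
logging.debug("connect_to_db in namex-six-month-report is postgresql://%s:***@%s:%s/%s",
              creds['username'], creds['hostname'], creds['port_num'], creds['db_name'])
%sql $connect_to_db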

Run the simplest possible query to confirm that the libraries are loaded and the database connection works.

In [None]:
%%sql 
select now() AT TIME ZONE 'PST' as current_date

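Six-month totals per examiner (excluding user ID 1): requests approved, rejected, or conditionally approved in the six months before the notebook is run.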

In [None]:
%%sql stat_six_month_completed  <<
SELECT r.user_id     
     , (select username from users u where u.id=r.user_id) AS EXAMINER
     , count(r.*) FILTER (WHERE r.state_cd = 'APPROVED')  AS APPROVED
     , count(r.*) FILTER (WHERE r.state_cd = 'REJECTED')  AS REJECTED
     , count(r.*) FILTER (WHERE r.state_cd = 'CONDITIONAL')  AS CONDITIONAL     
     , count(r.*) FILTER (WHERE r.priority_cd = 'Y')  AS PRIORITIES
     , count(r.*) + count(r.*) FILTER (WHERE r.priority_cd = 'Y')   AS total      
FROM requests r
where r.user_id != 1
AND date(r.last_update AT TIME ZONE 'PST') > current_date - interval '6 months' 
and r.state_cd in ('APPROVED','REJECTED','CONDITIONAL')
group by r.user_id
order by r.user_id
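To make the query's arithmetic explicit, here is a plain-Python sketch that mirrors it on a list of request records. It is illustrative only: the field names follow the `requests` columns used in the query, the six-month date filter is left out, and `six_month_stats` is a hypothetical helper. Note how `total` adds the priority count on top of the request count, as `count(r.*) + count(r.*) FILTER (WHERE r.priority_cd = 'Y')` does:

```python
from collections import defaultdict

COMPLETED = {"APPROVED", "REJECTED", "CONDITIONAL"}


def six_month_stats(requests):
    """Per-user counts mirroring the SQL: one point per completed request,
    plus an extra point for each priority request (priority_cd == 'Y')."""
    stats = defaultdict(lambda: {"approved": 0, "rejected": 0, "conditional": 0,
                                 "priorities": 0, "count": 0})
    for r in requests:
        if r["user_id"] == 1 or r["state_cd"] not in COMPLETED:
            continue  # same exclusions as the WHERE clause (date filter omitted)
        row = stats[r["user_id"]]
        row[r["state_cd"].lower()] += 1
        row["count"] += 1
        if r["priority_cd"] == "Y":
            row["priorities"] += 1

    result = {}
    for user_id, row in sorted(stats.items()):  # ORDER BY r.user_id
        total = row["count"] + row["priorities"]  # count(r.*) + priority count
        result[user_id] = {
            **{k: v for k, v in row.items() if k != "count"},
            "total": total,
            "approved_%": round((row["approved"] + row["conditional"]) / total * 100, 1),
            "rejected_%": round(row["rejected"] / total * 100, 1),
        }
    return result
```

With one priority approval, one rejection, and one conditional approval, an examiner's total is 4 rather than 3, so `approved_%` is 50.0 and `rejected_%` is 25.0; the two shares sum to less than 100 whenever priority requests are present.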

In [None]:
edt = stat_six_month_completed.DataFrame()
# Strip the 'idir/' prefix from usernames; a literal match, not a regex
edt['examiner'] = edt['examiner'].str.replace('idir/', '', regex=False)

In [None]:
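# Conditional approvals count toward approved_%. The query's total counts priority requests twice,
# so approved_% + rejected_% is below 100 for examiners who handled priority requests.
edt['approved_%'] = ((edt.approved + edt.conditional) / edt.total * 100).round(1)
edt['rejected_%'] = (edt.rejected / edt.total * 100).round(1)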

with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(HTML(edt.to_html()))
    print('grand total', edt['total'].sum())

Save the results to a CSV file named with today's date.

In [None]:
filename = f"six_month_totals_before_{datetime.now():%Y-%m-%d}.csv"
edt.to_csv(filename, sep=',', encoding='utf-8', index=False)
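The filename records the run date, while the start of the window comes from `current_date - interval '6 months'` in the query. The sketch below recomputes that cutoff in plain Python, for example to sanity-check which requests fall inside a report. It assumes PostgreSQL's month arithmetic, which clamps the day to the last day of a shorter month, and it works on dates already converted to Pacific time as the query does; `months_before` and `in_window` are hypothetical helpers, not part of the notebook.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Subtract whole months, clamping the day to the target month's length
    (assumed to match PostgreSQL's `date - interval 'N months'`)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def in_window(last_update: date, today: date) -> bool:
    """Mirror of: date(last_update) > current_date - interval '6 months'."""
    return last_update > months_before(today, 6)
```

For a run on 2024-08-31 the cutoff is 2024-02-29, so only requests last updated on or after 2024-03-01 are counted; the cutoff day itself is excluded because the query uses `>`.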