<a href="https://colab.research.google.com/github/cincinnatilibrary/collection-analysis/blob/master/reports/Overdue%20Checkin%20Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overdue Items Based on Checkin

This data is derived from the `import_ils_circ_trans.ipynb` script, which in turn builds on the `ils-analytics` scripts that gather circulation transaction data from the Sierra ILS each week.

A few things to keep in mind about this data:

* based on circulation transactions recorded to the Sierra SQL Database
* transactions date back to 2019-01
* checkin operations _do_ have a due_date associated with them
* checkin operations _may be_ backdated, which is a particular concern, as noted below
* checkout operations may be missing, or may not be matchable to a later checkin operation, because data from before 2019-01 (where the checkout would appear) is not available
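
Because each checkin carries its own due date, an overdue figure can be computed from the checkin row alone, without a matching checkout. The definition of `overdue_days` in `derived_overdue_circ_trans` is not shown in this notebook; the following is a minimal sketch, assuming it is the checkin time minus the due date expressed in fractional days. The function name and sample dates are hypothetical.

```python
from datetime import datetime

SECONDS_PER_DAY = 86_400


def overdue_days(checkin: datetime, due: datetime) -> float:
    """Days between the due date and the checkin (assumed definition).

    Negative values mean the item came back before it was due.
    """
    return (checkin - due).total_seconds() / SECONDS_PER_DAY


# Hypothetical sample transactions, not taken from the data set.
early = overdue_days(datetime(2019, 1, 10, 12, 0), datetime(2019, 1, 20, 12, 0))
late = overdue_days(datetime(2019, 1, 25, 12, 0), datetime(2019, 1, 20, 12, 0))
print(early, late)
```

Under this assumed definition, a negative average over many checkins would mean that items are, on average, returned ahead of their due date.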

NOTE: backdated checkin transactions were found to have a number of issues, especially in relation to re-opening after the lengthy shutdowns that started in March 2020.

1. Prior to 2020-06, circulation transaction retention was set to 30 days, while data was collected from the ILS every 7 days. This left ample buffer to record the data for later analysis _before_ it was purged from the production ILS.
1. Backdated checkins are treated as if the transaction happened on the backdate that was applied.

   Suppose for example:
   
   1. retention for circulation transaction data is set to 30 days
   1. the current date is **2020-05-01**, and an item is checked in with the checkin backdated to **2020-03-01**
   1. several days pass (retention purges run daily and remove transactions dated earlier than the purge date minus the retention period)
   1. circulation data is collected from the ILS for later analysis at the beginning of the following week
   
   
   In this example, the checkin operation would have been purged from the system before it could be recorded, which was the situation for many of the checkin operations happening after re-opening.
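
The purge scenario above can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of how Sierra implements retention: it assumes a purge runs once per day from the recording date through the collection date, and that each purge removes any transaction dated earlier than the purge date minus the retention period. The function name and the collection date of 2020-05-04 (the Monday after 2020-05-01) are hypothetical.

```python
from datetime import date, timedelta


def is_purged_before_collection(
    effective: date,        # date the ILS attributes to the transaction (the backdate)
    recorded: date,         # date the checkin was actually performed
    collected: date,        # date the weekly collection runs
    retention_days: int = 30,
) -> bool:
    """Return True if any assumed daily purge removes the row before collection."""
    day = recorded
    while day <= collected:
        cutoff = day - timedelta(days=retention_days)
        if effective < cutoff:
            return True
        day += timedelta(days=1)
    return False


# Checkin performed 2020-05-01 but backdated to 2020-03-01: lost before collection.
print(is_purged_before_collection(date(2020, 3, 1), date(2020, 5, 1), date(2020, 5, 4)))
# Same checkin without a backdate: still present when the collection runs.
print(is_purged_before_collection(date(2020, 5, 1), date(2020, 5, 1), date(2020, 5, 4)))
```

With a 30-day retention and weekly collection, only transactions whose effective date is already more than 30 days old at recording time are at risk, which is why backdated checkins are the ones affected.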

In [None]:
!pwd
!pip install -U pip > /dev/null
!pip install -U duckdb > /dev/null
!pip install -U duckdb-engine > /dev/null
!pip install -U pandas > /dev/null
!pip install -U sqlalchemy > /dev/null
!pip install -U altair > /dev/null

/home/ray/Documents/plch-data-warehouse/jupyter/ils-analytics-import


In [None]:
import duckdb  # no `sql` alias: that name is reused for query strings in later cells
from sqlalchemy import create_engine
import pandas as pd
import altair as alt

In [None]:
import pandas as pd
from sqlalchemy import create_engine

# Open the working SQLite database read-only (SQLite URI filename with mode=ro)
engine = create_engine('sqlite:///file:./circ_trans_working.db?mode=ro&uri=true')

In [None]:
sql = """\
with checkin_data as (
	SELECT
	-- dct.id,
-- 	ct.call_number_id,
	CASE
		WHEN ct.call_number_id IN (
			SELECT
			id
			FROM
			call_number
			WHERE
			value like 'fiction'
		) then 'fiction'
		else 'non-fiction'
	END AS simplified_callnumber,
	strftime('%Y-%m', dct.checkin_transaction_gmt, 'localtime') as checkin_transaction_month,
	dct.itype_code_num,
	dct.ptype_code, -- 
-- 	dct.overdue_days -- ,
	count(*) as count_checkins,
	count(CASE WHEN overdue_days > 0.0 THEN 1 ELSE NULL END) as count_overdue,
	count(CASE WHEN overdue_days <= 0.0 THEN 1 ELSE NULL END) as count_not_overdue,
	round( avg(overdue_days), 2) as avg_overdue_days
	FROM
	derived_overdue_circ_trans as dct
	JOIN circ_trans as ct on ct.id = dct.id
	WHERE
	checkin_transaction_gmt >= '2019-01-01'
-- 	AND checkin_transaction_gmt < '2020-01-01'
-- 	AND ptype_code < 196
	AND dct.ptype_code = 0
	AND dct.itype_code_num = 0
	GROUP BY 1,2,3,4
)
SELECT
simplified_callnumber,
checkin_transaction_month,
itype_code_num,
(
    SELECT
    name
    FROM
    item_type
    WHERE
    item_type.code = itype_code_num
) as item_format,
ptype_code,
count_checkins,
count_overdue,
round( 
    ( (count_overdue * 1.0) / (count_checkins * 1.0) ) * 100.0, 
    2
) as pct_overdue,
count_not_overdue,
avg_overdue_days
FROM 
checkin_data
"""

df = pd.read_sql(sql=sql, con=engine)

df

| | simplified_callnumber | checkin_transaction_month | itype_code_num | item_format | ptype_code | count_checkins | count_overdue | pct_overdue | count_not_overdue | avg_overdue_days |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | fiction | 2019-01 | 0 | Book | 0 | 65362 | 2158 | 3.30 | 63204 | -10.13 |
| 1 | fiction | 2019-02 | 0 | Book | 0 | 59087 | 1701 | 2.88 | 57386 | -11.57 |
| 2 | fiction | 2019-03 | 0 | Book | 0 | 61634 | 1895 | 3.07 | 59739 | -8.30 |
| 3 | fiction | 2019-04 | 0 | Book | 0 | 62887 | 1868 | 2.97 | 61019 | -10.52 |
| 4 | fiction | 2019-05 | 0 | Book | 0 | 63401 | 1997 | 3.15 | 61404 | -9.99 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 77 | non-fiction | 2022-01 | 0 | Book | 0 | 32393 | 4809 | 14.85 | 27584 | -10.89 |
| 78 | non-fiction | 2022-02 | 0 | Book | 0 | 28949 | 4161 | 14.37 | 24788 | -12.99 |
| 79 | non-fiction | 2022-03 | 0 | Book | 0 | 33379 | 4649 | 13.93 | 28730 | -10.54 |
| 80 | non-fiction | 2022-04 | 0 | Book | 0 | 31961 | 4583 | 14.34 | 27378 | -10.41 |
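
As a quick sanity check on the aggregate columns, the percentage arithmetic from the query can be reproduced by hand. The sketch below uses the counts from the first row of the output above (fiction, 2019-01); the helper name `pct_overdue` is hypothetical and simply mirrors the `round(count_overdue / count_checkins * 100, 2)` expression in the SQL.

```python
def pct_overdue(count_overdue: int, count_checkins: int) -> float:
    """Share of checkins that were overdue, as a percentage rounded to 2 places."""
    return round(count_overdue / count_checkins * 100.0, 2)


# Counts copied from the first row of the query output (fiction, 2019-01).
row = {"count_checkins": 65362, "count_overdue": 2158, "count_not_overdue": 63204}

# The two conditional counts partition the checkins only when no row has a
# NULL overdue_days; a NULL would be excluded from both counts.
counts_add_up = row["count_overdue"] + row["count_not_overdue"] == row["count_checkins"]

print(counts_add_up, pct_overdue(row["count_overdue"], row["count_checkins"]))
```

The result agrees with the `pct_overdue` value of 3.30 reported for that row.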


In [None]:
alt.Chart(df).mark_rect().encode(
    x='checkin_transaction_month:O',
    y='item_format:O',
    color=alt.Color(
        'pct_overdue:Q',
        scale=alt.Scale(domain=(0, 100))       
    ),
    facet=alt.Facet(
        'simplified_callnumber',
        columns=1
    ),
    tooltip=['checkin_transaction_month', 'simplified_callnumber', 'pct_overdue']
)

In [None]:
# SELECT
# itype_code_num
# FROM
# circ_trans
# WHERE
# op_code = 'o'
# GROUP BY 
# 1
# ORDER BY
# COUNT(*) DESC
# LIMIT 30

sql = """
with checkin_data as (
	SELECT
	strftime('%Y-%m', checkin_transaction_gmt, 'localtime') as checkin_transaction_month,
    strftime('%Y', checkin_transaction_gmt, 'localtime') as checkin_transaction_year,
	itype_code_num,
	ptype_code,
	count(*) as count_checkins,
	count(CASE WHEN overdue_days > 0.0 THEN 1 ELSE NULL END) as count_overdue,
	count(CASE WHEN overdue_days <= 0.0 THEN 1 ELSE NULL END) as count_not_overdue,
	round( avg(overdue_days), 2) as avg_overdue_days
	FROM
	derived_overdue_circ_trans
	WHERE
	checkin_transaction_gmt >= '2019-01-01'
-- 	AND checkin_transaction_gmt < '2020-01-01'
-- 	AND ptype_code < 196
	AND ptype_code = 0
	AND itype_code_num IN (
        2,
        0,
        101,
        100,
        77,
        20,
        4,
        30,
        70,
        230,
        113,
        231,
        6,
        71,
        90,
        105,
        78,
        200,
        31,
        91
	)
	GROUP BY 1,2,3,4
)
SELECT
checkin_transaction_month,
checkin_transaction_year,
itype_code_num,
(
	SELECT
	name
	FROM
	item_type
	WHERE
	item_type.code = itype_code_num
) as item_format,
ptype_code,
count_checkins,
count_overdue,
round( 
	( (count_overdue * 1.0) / (count_checkins * 1.0) ) * 100.0, 
	2
) as pct_overdue,
count_not_overdue,
avg_overdue_days
FROM 
checkin_data
"""

df = pd.read_sql(con=engine, sql=sql)

In [None]:
df.columns

Index(['checkin_transaction_month', 'checkin_transaction_year',
       'itype_code_num', 'item_format', 'ptype_code', 'count_checkins',
       'count_overdue', 'pct_overdue', 'count_not_overdue',
       'avg_overdue_days'],
      dtype='object')

In [None]:
alt.Chart(df[df['checkin_transaction_year']=='2019']).mark_rect().encode(
    x='checkin_transaction_month:O',
    y='item_format:O',
    color=alt.Color(
        'pct_overdue:Q',
        scale=alt.Scale(domain=(0, 100))       
    )
) #.facet(
    # column='checkin_transaction_year'
# )

In [None]:
alt.Chart(df[df['checkin_transaction_year']=='2020']).mark_rect().encode(
    x='checkin_transaction_month:O',
    y='item_format:O',
    color=alt.Color(
        'pct_overdue:Q',
        scale=alt.Scale(domain=(0, 100))       
    )
) #.facet(
    # column='checkin_transaction_year'
# )

In [None]:
alt.Chart(df[df['checkin_transaction_year']=='2021']).mark_rect().encode(
    x='checkin_transaction_month:O',
    y='item_format:O',
    color=alt.Color(
        'pct_overdue:Q',
        scale=alt.Scale(domain=(0, 100))       
    )
) #.facet(
    # column='checkin_transaction_year'
# )

In [None]:
# Build one heatmap per year so that individual years can be compared side by side.
charts = []
for year in ['2019', '2020', '2021', '2022']:
    print(year)
    chart = alt.Chart(df[df['checkin_transaction_year'] == year]).mark_rect().encode(
        x='checkin_transaction_month:O',
        y='item_format:O',
        color=alt.Color(
            'pct_overdue:Q',
            scale=alt.Scale(domain=(0, 100))
        ),
        tooltip=['item_format', 'checkin_transaction_month', 'pct_overdue']
    )
    charts.append(chart)

2019
2020
2021
2022


In [None]:
# Show the 2019 and 2022 heatmaps side by side (horizontal concatenation)
charts[0] | charts[3]