# Exploring inconsistencies between AllStatesAndTerritories files

This notebook shows inconsistencies in the *LESO Property Transferred to Participating Agencies* files over the first three fiscal quarters of 2020.

As of October 2020, three DISP_AllStatesAndTerritories files have been collected from the [DLA LESO Public Data](https://www.dla.mil/DispositionServices/Offers/Reutilization/LawEnforcement/PublicInformation/) website for the quarters ending March 31, June 30, and September 30 of 2020.

In [None]:
#    Libraries used by this notebook.

import pandas as pd
import sys

#!python --version    #Python 3.8.5
# sys standard module
#pd.__version__       #1.1.2

sys.path.insert(0, "../../scripts/") # make the scripts folder importable
from notebookfunctions import make_dataframe, count_by_time, differences_by_time

#    VARIABLES THAT CAN BE CUSTOMIZED

#    Enter the path to the folder containing all the data files.
path_datafiles = "../../data/"

#    Get the 'LESO Property Transferred to Participating Agencies' file from 
#        Defense Logistics Agency Law Enforcement Support Office Public Information
#    The original name of the data file should be in the form:
#        DISP_AllStatesAndTerritories_mmddyyyy.xlsx  
#
#    Enter the name of the LESO files to be checked.
LESO1_file = "DISP_AllStatesAndTerritories_03312020.xlsx"
LESO2_file = "DISP_AllStatesAndTerritories_06302020.xlsx"
LESO3_file = "DISP_AllStatesAndTerritories_09302020.xlsx"
#LESO4_file = "DISP_AllStatesAndTerritories_12312020.xlsx"

#    Read the data from the XLSX files.
#    transfers#_dict is a dictionary of all sheets in the LESO#_file
#         keys are full state/territory names
#         values are a single dataframe of all transfers for that state/territory
#    The records may be cumulative up to this quarter.
transfers1_dict = pd.read_excel('file:' + path_datafiles + LESO1_file, sheet_name=None)
transfers2_dict = pd.read_excel('file:' + path_datafiles + LESO2_file, sheet_name=None)
transfers3_dict = pd.read_excel('file:' + path_datafiles + LESO3_file, sheet_name=None)
#transfers4_dict = pd.read_excel('file:' + path_datafiles + 'DISP_AllStatesAndTerritories_12312020.xlsx', sheet_name=None)

#    Flatten each dictionary of states into its own dataframe
transfers1_df = make_dataframe(transfers1_dict, 'Ship Date')
transfers2_df = make_dataframe(transfers2_dict, 'Ship Date')
transfers3_df = make_dataframe(transfers3_dict, 'Ship Date')
#transfers4_df = make_dataframe(transfers4_dict, 'Ship Date')
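
The `make_dataframe` helper is imported from the scripts folder and its implementation is not shown in this notebook. As a rough sketch of what such a flattening step might look like, assuming it stacks the per-state sheets and indexes the result by the date column, the hypothetical `flatten_sheets` function below works on toy data. The function name, the toy sheets, and the `State` column are illustrative assumptions, not part of the LESO files or the repository.

```python
import pandas as pd

def flatten_sheets(sheets: dict, date_column: str) -> pd.DataFrame:
    """Hypothetical stand-in for make_dataframe: stack all sheets, index by date."""
    # Stack the per-state dataframes into a single dataframe.
    combined = pd.concat(list(sheets.values()), ignore_index=True)
    # Parse the date column so that date-based .loc slices work on the index.
    combined[date_column] = pd.to_datetime(combined[date_column])
    # Keep the date as a column too; drop the index name to avoid ambiguity in groupby.
    return combined.set_index(date_column, drop=False).rename_axis(None).sort_index()

# Toy stand-in for the dictionary returned by pd.read_excel(..., sheet_name=None).
toy_sheets = {
    "Alabama": pd.DataFrame({"State": ["AL", "AL"],
                             "Ship Date": ["2019-05-01", "2020-01-15"]}),
    "Alaska": pd.DataFrame({"State": ["AK"],
                            "Ship Date": ["2018-07-04"]}),
}
toy_df = flatten_sheets(toy_sheets, "Ship Date")
print(toy_df.index.min(), toy_df.index.max())
```

Sorting the index is what allows slices such as `.loc['2020-04-01':'2020-04-21']` later in the notebook; whether the real helper sorts is not confirmed here.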

In [None]:
from typing import Tuple

def split_demil(a_df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split the dataframe into (controlled, noncontrolled) records."""

    #    DEMIL Code Q is controlled only when DEMIL IC is 3
    allQ = a_df[a_df['DEMIL Code'] == 'Q']
    ncQ = allQ[allQ['DEMIL IC'] != 3]
    cQ = allQ[allQ['DEMIL IC'] == 3]

    noncontrolled = a_df[a_df['DEMIL Code'] == 'A']
    controlled = a_df[a_df['DEMIL Code'].isin(['B', 'C', 'D', 'E', 'F', 'G'])]

    #    pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    return pd.concat([controlled, cQ]), pd.concat([noncontrolled, ncQ])

### What is the range of dates on records in each file?

Under a section called *LESO Property Transferred to Participating Agencies*, the website says this file "... is the most recent quarterly update of the accountable property held by participating agencies." Looking at the 'Ship Date' in each file, one finds the following:

In [None]:
print('LESO file transfers1 index between',transfers1_df.index.min(),'and',transfers1_df.index.max())
print('LESO file transfers2 index between',transfers2_df.index.min(),'and',transfers2_df.index.max())
print('LESO file transfers3 index between',transfers3_df.index.min(),'and',transfers3_df.index.max())

In [None]:
transfer1_late_dates = transfers1_df.loc['2020-04-01':'2020-04-21'].groupby('Ship Date')['Ship Date'].count()
transfer2_late_dates = transfers2_df.loc['2020-04-01':'2020-04-21'].groupby('Ship Date')['Ship Date'].count()
transfer3_late_dates = transfers3_df.loc['2020-04-01':'2020-04-21'].groupby('Ship Date')['Ship Date'].count()
april_df = pd.DataFrame({'Transfers1':transfer1_late_dates,
                         'Transfers2':transfer2_late_dates,
                         'Transfers3':transfer3_late_dates}).fillna(0).astype(int)
ax = april_df.plot.bar(rot=90,figsize=(10,5))

In [None]:
april_differences = april_df.diff(axis=1).iloc[:, 1:]
april_differences.columns = ['Difference2-1','Difference3-2']
april_differences[(april_differences['Difference2-1'] != 0) | (april_differences['Difference3-2'] != 0)]

While the data for these April ship dates is stable from the second quarter to the third quarter, records were both gained and lost from the first quarter to the second quarter.
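
A net change in counts can hide records that were gained and lost at the same time. One way to separate the two is an outer merge with `indicator=True`. The sketch below uses toy data, and the `Item ID` key is a hypothetical column chosen for illustration; the real LESO files may contain duplicate rows, which would need to be handled before merging.

```python
import pandas as pd

# Toy snapshots; 'Item ID' is a hypothetical key, not a confirmed LESO column.
q1 = pd.DataFrame({"Item ID": [1, 2, 3],
                   "Ship Date": ["2020-04-01", "2020-04-02", "2020-04-02"]})
q2 = pd.DataFrame({"Item ID": [2, 3, 4],
                   "Ship Date": ["2020-04-02", "2020-04-02", "2020-04-03"]})

# The _merge column records whether each row came from q1 only, q2 only, or both.
merged = q1.merge(q2, how="outer", on=["Item ID", "Ship Date"], indicator=True)
lost = merged[merged["_merge"] == "left_only"]     # present in q1, absent from q2
gained = merged[merged["_merge"] == "right_only"]  # absent from q1, present in q2

print(len(lost), "lost;", len(gained), "gained")
```

In this toy case both snapshots hold three records, so a simple count comparison would report no change even though one record was lost and one was gained.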

### How many total records are there in each file?

In [None]:
ax = pd.Series({'Transfers1': transfers1_df.shape[0],
                'Transfers2': transfers2_df.shape[0],
                'Transfers3': transfers3_df.shape[0]}).plot(rot=90,figsize=(10,5))

In [None]:
trans1_record_count = int(transfers1_df.shape[0])
trans2_record_count = int(transfers2_df.shape[0])
trans3_record_count = int(transfers3_df.shape[0])
print('Record count in transfers1:',trans1_record_count)
print('Record count in transfers2:',trans2_record_count)
print('Record count in transfers3:',trans3_record_count)

The number of records declines from file to file even though the time range grows. The DLA LESO Public Data FAQ states that *LESO Property Transferred to Participating Agencies* files are snapshots of the DLA LESO inventory of items held by law enforcement agencies as of that quarter. It also says that after one year, non-controlled items are removed from the DLA LESO inventory and become the property of local law enforcement agencies.

[Controlled](https://www.dla.mil/Portals/104/Documents/DispositionServices/LESO/DISP_ControlledPropertyDefinition_062019.pdf) items are those with a DEMIL Code of B, C, D, E, F, or G, plus those with a DEMIL Code of Q combined with a DEMIL IC value of 3. Note that [10 U.S.C. § 2576a(f)](https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title10-section2576a&num=0&edition=prelim) defines controlled property as having DEMIL Codes of B, C, D, E, G, and Q. For the purpose of this notebook, the first definition is used.
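
The first definition can be expressed as a single boolean mask. The following is a minimal sketch on toy data; the `is_controlled` helper and the `CONTROLLED_CODES` constant are names introduced here for illustration and are not part of the repository's scripts.

```python
import pandas as pd

# DEMIL Codes that are always controlled under the first definition.
CONTROLLED_CODES = ["B", "C", "D", "E", "F", "G"]

def is_controlled(df: pd.DataFrame) -> pd.Series:
    """Return True for rows that are controlled under the first definition."""
    code_match = df["DEMIL Code"].isin(CONTROLLED_CODES)
    # DEMIL Code Q counts as controlled only with a DEMIL IC of 3.
    q3_match = (df["DEMIL Code"] == "Q") & (df["DEMIL IC"] == 3)
    return code_match | q3_match

toy = pd.DataFrame({"DEMIL Code": ["A", "D", "Q", "Q", "Q"],
                    "DEMIL IC": [1, 1, 3, 6, None]})
toy["Controlled"] = is_controlled(toy)
print(toy)
```

Under this sketch a Q record with a missing DEMIL IC is treated as non-controlled, because a missing value never compares equal to 3.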

In [None]:
demilcode_qNON_count = pd.DataFrame([
    transfers1_df[(transfers1_df['DEMIL Code'] == 'Q') & (transfers1_df['DEMIL IC'] != 3)].\
                    groupby('DEMIL Code')['DEMIL Code'].count(),
    transfers2_df[(transfers2_df['DEMIL Code'] == 'Q') & (transfers2_df['DEMIL IC'] != 3)].\
                    groupby('DEMIL Code')['DEMIL Code'].count(),
    transfers3_df[(transfers3_df['DEMIL Code'] == 'Q') & (transfers3_df['DEMIL IC'] != 3)].\
                    groupby('DEMIL Code')['DEMIL Code'].count()])
demilcode_qNON_count.index = ['transfers1', 'transfers2', 'transfers3']
demilcode_qNON_count.columns = ['QNON']


demilcode_q3_count = pd.DataFrame([
    transfers1_df[(transfers1_df['DEMIL Code'] == 'Q') & (transfers1_df['DEMIL IC'] == 3)].\
                    groupby('DEMIL Code')['DEMIL Code'].count(),
    transfers2_df[(transfers2_df['DEMIL Code'] == 'Q') & (transfers2_df['DEMIL IC'] == 3)].\
                    groupby('DEMIL Code')['DEMIL Code'].count(),
    transfers3_df[(transfers3_df['DEMIL Code'] == 'Q') & (transfers3_df['DEMIL IC'] == 3)].\
                    groupby('DEMIL Code')['DEMIL Code'].count()])
demilcode_q3_count.index = ['transfers1', 'transfers2', 'transfers3']
demilcode_q3_count.columns = ['Q3']


demilcode_count = pd.concat([transfers1_df.groupby('DEMIL Code')['Ship Date'].count(),
                             transfers2_df.groupby('DEMIL Code')['Ship Date'].count(),
                             transfers3_df.groupby('DEMIL Code')['Ship Date'].count()], axis=1)
demilcode_count.columns = ['transfers1', 'transfers2', 'transfers3']
#    pd.concat replaces DataFrame.append, which was removed in pandas 2.0
demilcode_count = pd.concat([demilcode_count, demilcode_q3_count.T, demilcode_qNON_count.T])

In [None]:
demilcode_count


In [None]:
diff_t1t2all = trans1_record_count - trans2_record_count
print('There are', diff_t1t2all, 'fewer records in transfers2 than in transfers1.')

#    reindex tolerates DEMIL Codes that are absent from the counts table
noncontrolled_codes = ['A', 'QNON']
t1nc = demilcode_count['transfers1'].reindex(noncontrolled_codes).sum()
t2nc = demilcode_count['transfers2'].reindex(noncontrolled_codes).sum()
diff_t1t2nc = int(t1nc - t2nc)
print('Of these,', diff_t1t2nc, 'are non-controlled.')

#    G is included to match the controlled definition used in split_demil
controlled_codes = ['B', 'C', 'D', 'E', 'F', 'G', 'Q3']
t1c = demilcode_count['transfers1'].reindex(controlled_codes).sum()
t2c = demilcode_count['transfers2'].reindex(controlled_codes).sum()
diff_t1t2c = int(t1c - t2c)
print('Of these,', diff_t1t2c, 'are controlled.')

### What patterns appear in controlled and non-controlled items over the years?

In [None]:
controlled1, noncontrolled1 = split_demil(transfers1_df)
controlled2, noncontrolled2 = split_demil(transfers2_df)
controlled3, noncontrolled3 = split_demil(transfers3_df)

In [None]:
between_t1c_and_t2c = differences_by_time('Y', '2000-01-01', '2020-09-30', controlled1, controlled2)
between_t2c_and_t3c = differences_by_time('Y', '2000-01-01', '2020-09-30', controlled2, controlled3)

compare_controlled = between_t1c_and_t2c.merge(between_t2c_and_t3c, how='outer',
                          left_index=True, right_index=True, suffixes=['_first','_second'])
compare_controlled.columns = ['t1count1','t2count2','t2count1','t3count2']
compare_controlled = compare_controlled.fillna(0).astype(int)
ax = compare_controlled.plot.bar(rot=90,figsize=(18,9))

In [None]:
between_t1nc_and_t2nc = differences_by_time('Y', '2000-01-01', '2020-09-30', noncontrolled1, noncontrolled2)
between_t2nc_and_t3nc = differences_by_time('Y', '2000-01-01', '2020-09-30', noncontrolled2, noncontrolled3)

compare_noncontrolled = between_t1nc_and_t2nc.merge(between_t2nc_and_t3nc, how='outer',
                          left_index=True, right_index=True, suffixes=['_first','_second'])
compare_noncontrolled.columns = ['t1count1','t2count2','t2count1','t3count2']
compare_noncontrolled = compare_noncontrolled.fillna(0).astype(int)
ax = compare_noncontrolled.plot.bar(rot=90,figsize=(18,9))

In [None]:
compare_noncontrolled

As expected, 2019 shows a large decrease in non-controlled items. In each file, however, there is also some change in both controlled and non-controlled items shipped more than one year earlier. There is no documentation explaining why this is so. If there were a way to tie requests and cancellations to this data, some insight might be gained. This is why the Check and Merge notebooks in this repository have been developed.
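
If non-controlled items really do leave the inventory after one year, a snapshot should contain few or no non-controlled records shipped more than a year before its date. A quick check along those lines might look like the sketch below. It assumes the dataframe is indexed by ship date, as the date slicing elsewhere in this notebook suggests, and it reads "one year" as exactly one calendar year before the snapshot date. The `stale_noncontrolled` helper and the toy data are hypothetical.

```python
import pandas as pd

def stale_noncontrolled(noncontrolled: pd.DataFrame, snapshot_date: str) -> pd.DataFrame:
    """Return non-controlled records shipped more than one year before the snapshot."""
    cutoff = pd.Timestamp(snapshot_date) - pd.DateOffset(years=1)
    return noncontrolled[noncontrolled.index < cutoff]

# Toy non-controlled records indexed by ship date.
toy_nc = pd.DataFrame(
    {"DEMIL Code": ["A", "A", "A"]},
    index=pd.to_datetime(["2018-11-05", "2019-08-20", "2020-02-14"]),
)
stale = stale_noncontrolled(toy_nc, "2020-03-31")
print(len(stale), "non-controlled record(s) older than one year at the snapshot date")
```

Applied to `noncontrolled1`, `noncontrolled2`, and `noncontrolled3` with their respective quarter-end dates, a check like this would show how many records fall outside the documented one-year window.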

### Exploring 2018 differences

Exploring the differences month by month in 2018 shows no apparent pattern in the data. Also, records are not only disappearing but also being added back over time.

In [None]:
month_between_t1c_and_t2c = differences_by_time('M', '2018-01-01', '2018-12-31', controlled1, controlled2)
month_between_t2c_and_t3c = differences_by_time('M', '2018-01-01', '2018-12-31', controlled2, controlled3)

month_compare_controlled = month_between_t1c_and_t2c.merge(month_between_t2c_and_t3c, how='outer',
                          left_index=True, right_index=True, suffixes=['_first','_second'])
month_compare_controlled.columns = ['t1count1','t2count2','t2count1','t3count2']
month_compare_controlled = month_compare_controlled.fillna(0).astype(int)
ax = month_compare_controlled.plot.bar(rot=90,figsize=(18,9))

In [None]:
month_between_t1nc_and_t2nc = differences_by_time('M', '2018-01-01', '2018-12-31', noncontrolled1, noncontrolled2)
month_between_t2nc_and_t3nc = differences_by_time('M', '2018-01-01', '2018-12-31', noncontrolled2, noncontrolled3)

month_compare_noncontrolled = month_between_t1nc_and_t2nc.merge(month_between_t2nc_and_t3nc, how='outer',
                          left_index=True, right_index=True, suffixes=['_first','_second'])
month_compare_noncontrolled.columns = ['t1count1','t2count2','t2count1','t3count2']
month_compare_noncontrolled = month_compare_noncontrolled.fillna(0).astype(int)
ax = month_compare_noncontrolled.plot.bar(rot=90,figsize=(18,9))
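
The comparisons above rely on `differences_by_time`, whose implementation lives in the scripts folder and is not shown here. As a rough illustration of the general idea only, the sketch below counts records per period in two toy snapshots and computes the change. The `counts_by_period` helper and the toy data are assumptions and may differ from what the real function does.

```python
import pandas as pd

def counts_by_period(df: pd.DataFrame, freq: str) -> pd.Series:
    """Count records per period ('M' for month, 'Y' for year) using the date index."""
    return df.groupby(df.index.to_period(freq)).size()

# Toy snapshots indexed by ship date.
old = pd.DataFrame({"Item": ["a", "b", "c"]},
                   index=pd.to_datetime(["2018-01-10", "2018-01-20", "2018-03-05"]))
new = pd.DataFrame({"Item": ["a", "c", "d"]},
                   index=pd.to_datetime(["2018-01-10", "2018-03-05", "2018-03-09"]))

compare = pd.DataFrame({"old": counts_by_period(old, "M"),
                        "new": counts_by_period(new, "M")}).fillna(0).astype(int)
# Negative values are months that lost records; positive values gained records.
compare["change"] = compare["new"] - compare["old"]
print(compare)
```

In the toy data January loses a record while March gains one, the same mix of disappearing and reappearing records described for 2018.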