# Get Started

Before running this code, you'll need to clone the repo into your home directory:

* `cd ~`
* `git clone https://github.com/code-for-charlottesville/LAJC-expungement.git`
  
Next, you'll do the following in R:

* Set your working directory to the repo
* Set your library paths
* Load required libraries
* Load the helper functions
* Load the classifier function

In [65]:
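# work from the repo root so that here::here() resolves paths correctly
setwd("~/LAJC-expungement/")
# point R at your package library (adjust for your R version and platform)
.libPaths("~/R/x86_64-pc-linux-gnu-library/4.0/")
# load the database helper functions
source(here::here("code", "helper-functions-db.R"))
# load the classifier function, classify_ex2()
source(here::here("code", "expunge_classifier2.R"))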

## Preview the data

Here we use the helper functions we just loaded to do the following:

* Check which IDs are in a table of interest. There are four tables of raw data. The first three are random samples from the fourth, useful for developing and iterating without running on the full data.
  * `data_1k_sample`
  * `data_10k_sample`
  * `data_100k_sample`
  * `expunge` (the full data, about 3 million IDs)
* Load the raw data for a sample ID
* Classify a sample ID and look at the result

In [8]:
.table <- "data_1k_sample"
head(get_ids_from_table(.table))

In [9]:
.id <- "127051000000102"
read_person_file_db(.id, .table)

person_id,HearingDate,CodeSection,ChargeType,Class,DispositionCode,Plea,Race,Sex,fips
<chr>,<date>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>
127051000000102,2008-12-01,4.1-308,Misdemeanor,4.0,Nolle Prosequi,,White Caucasian (Non-Hispanic),Female,1
127051000000102,2012-01-30,A.18.2-266,Misdemeanor,1.0,Not Guilty,,White Caucasian(Non-Hispanic),Female,1
127051000000102,2012-01-30,46.2-000,Misdemeanor,,Dismissed,,White Caucasian(Non-Hispanic),Female,1
127051000000102,2012-09-27,C.46.2-894,Felony,5.0,Guilty,Guilty,White Caucasian (Non-Hispanic),Female,1


In [17]:
xdf <- read_person_file_db(.id, .table) %>%
  classify_ex2() 

select(xdf, person_id, HearingDate, expungable, reason)

“Unknown levels in `f`: No Indictment Presented, Not True Bill, Dismissed/Other, Not Guilty/Acquitted, Guilty In Absentia”


person_id,HearingDate,expungable,reason
<chr>,<date>,<fct>,<chr>
127051000000102,2008-12-01,Petition,"Dismissal of misdemeanor charges with no arrests or charges in the past 3 years, but with prior convictions on the person's record"
127051000000102,2012-01-30,Petition,"Dismissal of misdemeanor charges with no arrests or charges in the past 3 years, but with prior convictions on the person's record"
127051000000102,2012-01-30,Petition,"Dismissal of misdemeanor charges with no arrests or charges in the past 3 years, but with prior convictions on the person's record"
127051000000102,2012-09-27,Not eligible,"Conviction or deferred dismissal of felony charges that are not ones listed under 19.2-392.12, with no felonies within the last 10 years, no class 3 or 4 felony conviction within the past 20 years, no class 1 or 2 felony or any other felony punishable by imprisonment for life, and no convictions of another kind since disposition date. However, because the disposition date is within 10 years of the current date, the record is not yet eligible for expungement; HOWEVER, the outcome is changed to not eligible because the lifetime limit of two expungements has been exceeded"


# Classify a full table

This section does the main work:

* Classify all rows in a given table
* Write the classified rows to a new table
* Preview the results

Note: **you can pass in a customized classifier function** in place of `classify_ex2()`. If you are tweaking the classifier, save your new function, source it as we do above, and then pass it to `classify_table()` through the `classifier_func` argument.
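
As a rough illustration of what a customized classifier could look like, here is a minimal sketch. It assumes, based only on how `classify_ex2()` is used above, that a classifier takes one person's rows as a data frame and returns them with `expungable` and `reason` columns added. The function name `classify_dismissals_only()` and its rule are hypothetical toy examples, not the project's legal logic.

```r
# Hypothetical toy classifier (NOT the real expungement rules).
# Assumed contract: one person's rows in, same rows out with
# `expungable` and `reason` columns added.
classify_dismissals_only <- function(person_df) {
  non_conviction <- person_df$DispositionCode %in%
    c("Dismissed", "Nolle Prosequi", "Not Guilty")
  person_df$expungable <- ifelse(non_conviction, "Petition", "Not eligible")
  person_df$reason <- ifelse(
    non_conviction,
    "Non-conviction disposition (toy rule)",
    "Conviction (toy rule)"
  )
  person_df
}

# Small made-up input shaped like the raw data shown earlier
example_person <- data.frame(
  person_id = "127051000000102",
  HearingDate = as.Date(c("2008-12-01", "2012-09-27")),
  DispositionCode = c("Nolle Prosequi", "Guilty"),
  stringsAsFactors = FALSE
)

classified <- classify_dismissals_only(example_person)
print(classified[, c("HearingDate", "expungable", "reason")])

# With the repo's helpers loaded, it would be passed like this:
# classify_table(
#   input_table = .table,
#   output_table = .nt,
#   classifier_func = classify_dismissals_only,
#   update_every = 100
# )
```

Testing a new classifier on a single person's rows first, as in the preview section, is likely quicker than running it over a whole table.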

In [53]:
# define new table name for classified observations
# NOTE: this will append to this table, NOT overwrite it
# so give it a unique name, unless you're intending to append new rows to an existing table
.nt <- gsub("[^A-Za-z0-9]", "_", glue("test-table-{Sys.getenv('USER')}"))
print(.nt)

test_table_jupyter_seth127
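
The `gsub()` call above replaces every character that is not a letter or digit with an underscore, which is why the hyphens in the user name became underscores. The sketch below shows the same substitution in base R; the helper name `make_table_name()` is hypothetical and `sprintf()` stands in for `glue()` so the block runs without extra packages.

```r
# Hypothetical helper mirroring the table-name logic above, using base R only
make_table_name <- function(user) {
  raw_name <- sprintf("test-table-%s", user)
  # replace anything that is not a letter or digit with "_"
  gsub("[^A-Za-z0-9]", "_", raw_name)
}

table_name <- make_table_name("jupyter-seth127")
print(table_name)
```

Because the resulting name contains only letters, digits, and underscores, it can be used in a query without quoting.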


In [52]:
# you can check the currently existing tables with this
dbListTables(DB_CON)

In [66]:
# now run the classifier
classify_table(
    input_table = .table, 
    output_table = .nt,
    classifier_func = classify_ex2,
    update_every = 100
)

Classifying 1000 ID's from data_1k_sample and writing to test_table_jupyter_seth127

  Finished 100 ID's in 0.2 minutes.

  Finished 200 ID's in 0.4 minutes.

  Finished 300 ID's in 0.6 minutes.

  Finished 400 ID's in 0.8 minutes.

  Finished 500 ID's in 1 minutes.

  Finished 600 ID's in 1.2 minutes.

  Finished 700 ID's in 1.4 minutes.

  Finished 800 ID's in 1.6 minutes.

  Finished 900 ID's in 1.8 minutes.

  Finished 1000 ID's in 1.9 minutes.

All done, in 1.9 minutes.



# Analysis

You can now do analysis queries on your new table. Use the `DB_CON` object (loaded from `helper-functions-db.R`) to connect to the database, and any `DBI` functions to query it. [This page](https://dbi.r-dbi.org/) has some helpful examples.

In [68]:
# get all the columns for the first 5 rows
res <- dbSendQuery(DB_CON, glue('SELECT * FROM {.nt} LIMIT 5'))
res_df <- dbFetch(res) # get result and assign to R data.frame
dbClearResult(res) # tell the db to close this query

res_df %>% select(person_id, HearingDate, expungable, reason)

person_id,HearingDate,expungable,reason
<chr>,<date>,<chr>,<chr>
127051000000102,2008-12-01,Petition,"Dismissal of misdemeanor charges with no arrests or charges in the past 3 years, but with prior convictions on the person's record"
127051000000102,2012-01-30,Petition,"Dismissal of misdemeanor charges with no arrests or charges in the past 3 years, but with prior convictions on the person's record"
127051000000102,2012-01-30,Petition,"Dismissal of misdemeanor charges with no arrests or charges in the past 3 years, but with prior convictions on the person's record"
127051000000102,2012-09-27,Not eligible,"Conviction or deferred dismissal of felony charges that are not ones listed under 19.2-392.12, with no felonies within the last 10 years, no class 3 or 4 felony conviction within the past 20 years, no class 1 or 2 felony or any other felony punishable by imprisonment for life, and no convictions of another kind since disposition date. However, because the disposition date is within 10 years of the current date, the record is not yet eligible for expungement; HOWEVER, the outcome is changed to not eligible because the lifetime limit of two expungements has been exceeded"
270120000000917,2007-11-14,Automatic,Conviction of misdemeanor charges listed in 19.2-392.6 B with no convictions within 7 years from disposition date


In [22]:
# see all the column names
names(res_df)
# note that if you want to query postgres with case-sensitive column names,
# you have to put them in double quotes, which is very annoying.
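
One way to avoid typing the double quotes by hand is to let `DBI` quote the identifiers. The sketch below is only an illustration: it uses `DBI::ANSI()` as a stand-in connection so it runs without a database, and the table name `test_table_example` is hypothetical. With a live connection you would pass `DB_CON` instead, and the exact quoting may depend on the driver.

```r
library(DBI)

# ANSI() is a dummy connection that applies standard SQL quoting rules;
# it is used here only so the example runs without a database.
cols <- c("Race", "Sex")
quoted_cols <- as.character(dbQuoteIdentifier(ANSI(), cols))
print(quoted_cols)

# Hypothetical table name, for illustration only
quoted_table <- as.character(dbQuoteIdentifier(ANSI(), "test_table_example"))

# Build the same kind of grouped count query as in this section
col_list <- paste(quoted_cols, collapse = ", ")
query <- sprintf(
  "SELECT %s, COUNT(expungable) FROM %s GROUP BY %s ORDER BY %s",
  col_list, quoted_table, col_list, col_list
)
cat(query, "\n")
```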

In [26]:
# get some counts
res <- dbSendQuery(DB_CON, glue('
SELECT "Race", "Sex", COUNT(expungable) 
FROM {.nt}
GROUP BY "Race", "Sex"
ORDER BY "Race", "Sex"
'))
res_df <- dbFetch(res) # get result and assign to R data.frame
dbClearResult(res) # tell the db to close this query
res_df

Race,Sex,count
<chr>,<chr>,<int64>
American Indian,Male,3
Asian Or Pacific Islander,Female,11
Asian Or Pacific Islander,Male,16
Black,Female,2
Black,Male,8
Black (Non-Hispanic),Female,69
Black (Non-Hispanic),Male,242
Black(Non-Hispanic),Female,277
Black(Non-Hispanic),Male,587
Hispanic,Female,11


In [69]:
# if you want to delete your test table, uncomment this

# dbRemoveTable(DB_CON, .nt)