ROTA

Rapid Offense Text Autocoder

Criminal justice research often requires converting free-text offense descriptions into overall charge categories to aid analysis. For example, the free-text offense "eluding a police vehicle" would be coded to the charge category "Obstruction - Law Enforcement". Because free-text offense descriptions aren't standardized and often need to be categorized in large volumes, coding them is typically a manual, time-intensive process for researchers. ROTA is a machine learning model for converting offense text into offense codes.

Currently, ROTA predicts the Charge Category of a given offense text. A charge category is one of the headings for offense codes in the 2009 NCRP Codebook: Appendix F.

The model was trained on publicly available data from a crosswalk containing offenses from all 50 states combined with three additional hand-labeled offense text datasets.

Charge Category Example

Data Preprocessing

The input text is standardized through a series of preprocessing steps. The text is first passed through a sequence of 500+ case-insensitive regular expressions that identify common misspellings and abbreviations and expand them into fuller, correct English text. Some data-specific prefixes and suffixes are then removed from the text -- e.g. some states included a statute as part of the text. Finally, punctuation (excluding dollar signs) is removed from the input, runs of multiple spaces between words are collapsed, and the text is lowercased.
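The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the four expansion rules and the statute-prefix pattern are hypothetical stand-ins (the actual 500+ expressions are not listed here), and punctuation is replaced with a space rather than deleted outright so that adjacent words do not fuse.

```python
import re

# Hypothetical expansion rules, for illustration only. The real list has
# 500+ case-insensitive patterns that are not reproduced here.
EXPANSIONS = [
    (re.compile(r"\bposs\b", re.IGNORECASE), "possession"),
    (re.compile(r"\bw/\s*", re.IGNORECASE), "with "),
    (re.compile(r"\bcont\s+subs?\b", re.IGNORECASE), "controlled substance"),
    (re.compile(r"\bveh\b", re.IGNORECASE), "vehicle"),
]

# Hypothetical data-specific prefix: a leading statute number such as "13-1204:".
STATUTE_PREFIX = re.compile(r"^\s*\d+(?:[-.]\d+)+\s*[:\-]?\s*")

# Anything that is not a letter, digit, underscore, whitespace, or dollar sign.
PUNCTUATION = re.compile(r"[^\w\s$]")
MULTISPACE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Standardize one offense description (illustrative sketch)."""
    # 1. Expand abbreviations and misspellings.
    for pattern, replacement in EXPANSIONS:
        text = pattern.sub(replacement, text)
    # 2. Strip a data-specific prefix (here: a leading statute number).
    text = STATUTE_PREFIX.sub("", text)
    # 3. Drop punctuation except dollar signs, collapse spaces, lowercase.
    text = PUNCTUATION.sub(" ", text)
    text = MULTISPACE.sub(" ", text).strip()
    return text.lower()


print(preprocess("13-1204: POSS  OF CONT SUBS, W/ INTENT"))
print(preprocess("THEFT OVER $200."))
```

Keeping the dollar sign matters for descriptions such as "theft over $200", where the amount distinguishes one charge category from another.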

Cross-Validation Performance

This model was evaluated using 3-fold cross-validation. Except where noted, the numbers presented below are the mean value across the 3 folds.
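As a rough illustration of what "mean value across the 3 folds" means, the sketch below splits item indices into k folds, scores a model on each held-out fold, and averages the scores. The helper names (`k_fold_indices`, `cross_validate`), the shuffling, and the `fit`/`score` callables are assumptions made for this example; how ROTA's folds were actually built (for instance, whether they were stratified by category) is not stated here.

```python
import random
from statistics import mean


def k_fold_indices(n_items, k=3, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(items, labels, fit, score, k=3):
    """Return the mean of score(model, held-out fold) over k folds.

    `fit(train_items, train_labels)` returns a model and
    `score(model, test_items, test_labels)` returns a number.
    Both are supplied by the caller.
    """
    fold_scores = []
    for train, test in k_fold_indices(len(items), k):
        model = fit([items[i] for i in train], [labels[i] for i in train])
        fold_scores.append(
            score(model, [items[i] for i in test], [labels[i] for i in test])
        )
    return mean(fold_scores)
```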

The model in this repository is trained on all available data. Because of this, you can typically expect production performance to be somewhat better than the numbers presented below, although by how much cannot be measured, since no held-out data remains.

Overall Metrics

Metric	Value
Accuracy	0.934
MCC	0.931
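For readers unfamiliar with MCC (Matthews correlation coefficient), the sketch below computes accuracy and the commonly used multiclass generalization of MCC from lists of true and predicted labels. It is an illustration only: how the values in the table above were actually computed is not stated, and the function names are made up for this example.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    With s = number of samples, c = number of correct predictions,
    t_k = times class k truly occurs and p_k = times class k is predicted:
        MCC = (c*s - sum_k p_k*t_k)
              / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    cross = sum(true_counts[k] * pred_counts[k] for k in true_counts)
    numerator = c * s - cross
    denominator = sqrt(
        (s * s - sum(v * v for v in pred_counts.values()))
        * (s * s - sum(v * v for v in true_counts.values()))
    )
    # By convention, return 0 when the denominator vanishes
    # (e.g. every prediction is the same class).
    return numerator / denominator if denominator else 0.0
```

Unlike accuracy, MCC accounts for how predictions are distributed across categories, which makes it a useful companion metric when category sizes are as uneven as they are here.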

Metric	precision	recall	f1-score
macro avg	0.811	0.786	0.794

Note: Each value is the mean of the per-fold values, so macro avg is the mean across folds of each fold's macro average over all categories.
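The averaging order described in the note can be sketched as follows: compute precision, recall, and F1 per category within a fold, take the unweighted (macro) mean over categories, then take the mean of those macro values across folds. The function names are hypothetical, and treating an undefined ratio as 0 is an assumption of this sketch.

```python
from collections import Counter
from statistics import mean


def per_category_metrics(y_true, y_pred):
    """Return {category: (precision, recall, f1)} for one fold."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted category gets a false positive
            fn[t] += 1  # true category gets a false negative
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Assumption: undefined ratios (zero denominator) count as 0.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of each metric over all categories in one fold."""
    return tuple(mean(column) for column in zip(*metrics.values()))


def mean_macro_across_folds(folds):
    """`folds` is a list of (y_true, y_pred) pairs, one per fold."""
    per_fold = [macro_average(per_category_metrics(t, p)) for t, p in folds]
    return tuple(mean(column) for column in zip(*per_fold))
```

Because the macro average weights every category equally, small categories with weak scores pull it well below overall accuracy.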

Per-Category Metrics

Category	precision	recall	f1-score	support
AGGRAVATED ASSAULT	0.954	0.954	0.954	4085
ARMED ROBBERY	0.961	0.955	0.958	1021
ARSON	0.946	0.954	0.95	344
ASSAULTING PUBLIC OFFICER	0.914	0.905	0.909	588
AUTO THEFT	0.962	0.962	0.962	1660
BLACKMAIL/EXTORTION/INTIMIDATION	0.872	0.871	0.872	627
BRIBERY AND CONFLICT OF INTEREST	0.784	0.796	0.79	216
BURGLARY	0.979	0.981	0.98	2214
CHILD ABUSE	0.805	0.78	0.792	139
COCAINE OR CRACK VIOLATION OFFENSE UNSPECIFIED	0.827	0.815	0.821	47
COMMERCIALIZED VICE	0.818	0.788	0.802	666
CONTEMPT OF COURT	0.982	0.987	0.984	2952
CONTRIBUTING TO DELINQUENCY OF A MINOR	0.544	0.333	0.392	50
CONTROLLED SUBSTANCE - OFFENSE UNSPECIFIED	0.864	0.791	0.826	280
COUNTERFEITING (FEDERAL ONLY)	0	0	0	2
DESTRUCTION OF PROPERTY	0.97	0.968	0.969	2560
DRIVING UNDER INFLUENCE - DRUGS	0.567	0.603	0.581	34
DRIVING UNDER THE INFLUENCE	0.951	0.946	0.949	2195
DRIVING WHILE INTOXICATED	0.986	0.981	0.984	2391
DRUG OFFENSES - VIOLATION/DRUG UNSPECIFIED	0.903	0.911	0.907	3100
DRUNKENNESS/VAGRANCY/DISORDERLY CONDUCT	0.856	0.861	0.858	380
EMBEZZLEMENT	0.865	0.759	0.809	100
EMBEZZLEMENT (FEDERAL ONLY)	0	0	0	1
ESCAPE FROM CUSTODY	0.988	0.991	0.989	4035
FAMILY RELATED OFFENSES	0.739	0.773	0.755	442
FELONY - UNSPECIFIED	0.692	0.735	0.712	122
FLIGHT TO AVOID PROSECUTION	0.46	0.407	0.425	38
FORCIBLE SODOMY	0.82	0.8	0.809	76
FORGERY (FEDERAL ONLY)	0	0	0	2
FORGERY/FRAUD	0.911	0.928	0.919	4687
FRAUD (FEDERAL ONLY)	0	0	0	2
GRAND LARCENY - THEFT OVER $200	0.957	0.973	0.965	2412
HABITUAL OFFENDER	0.742	0.627	0.679	53
HEROIN VIOLATION - OFFENSE UNSPECIFIED	0.879	0.811	0.843	24
HIT AND RUN DRIVING	0.922	0.94	0.931	303
HIT/RUN DRIVING - PROPERTY DAMAGE	0.929	0.918	0.923	362
IMMIGRATION VIOLATIONS	0.84	0.609	0.697	19
INVASION OF PRIVACY	0.927	0.923	0.925	1235
JUVENILE OFFENSES	0.928	0.866	0.895	144
KIDNAPPING	0.937	0.93	0.933	553
LARCENY/THEFT - VALUE UNKNOWN	0.955	0.945	0.95	3175
LEWD ACT WITH CHILDREN	0.775	0.85	0.811	596
LIQUOR LAW VIOLATIONS	0.741	0.768	0.755	214
MANSLAUGHTER - NON-VEHICULAR	0.626	0.802	0.701	139
MANSLAUGHTER - VEHICULAR	0.79	0.853	0.819	117
MARIJUANA/HASHISH VIOLATION - OFFENSE UNSPECIFIED	0.741	0.662	0.699	62
MISDEMEANOR UNSPECIFIED	0.63	0.243	0.347	57
MORALS/DECENCY - OFFENSE	0.774	0.764	0.769	412
MURDER	0.965	0.915	0.939	621
OBSTRUCTION - LAW ENFORCEMENT	0.939	0.947	0.943	4220
OFFENSES AGAINST COURTS, LEGISLATURES, AND COMMISSIONS	0.881	0.895	0.888	1965
PAROLE VIOLATION	0.97	0.953	0.962	946
PETTY LARCENY - THEFT UNDER $200	0.965	0.761	0.85	139
POSSESSION/USE - COCAINE OR CRACK	0.893	0.928	0.908	68
POSSESSION/USE - DRUG UNSPECIFIED	0.624	0.535	0.572	189
POSSESSION/USE - HEROIN	0.884	0.852	0.866	25
POSSESSION/USE - MARIJUANA/HASHISH	0.977	0.97	0.973	556
POSSESSION/USE - OTHER CONTROLLED SUBSTANCES	0.975	0.965	0.97	3271
PROBATION VIOLATION	0.963	0.953	0.958	1158
PROPERTY OFFENSES - OTHER	0.901	0.87	0.885	446
PUBLIC ORDER OFFENSES - OTHER	0.7	0.721	0.71	1871
RACKETEERING/EXTORTION (FEDERAL ONLY)	0	0	0	2
RAPE - FORCE	0.842	0.873	0.857	641
RAPE - STATUTORY - NO FORCE	0.707	0.55	0.611	140 ...

Releases: RTIInternational/rota

2021.05.18.15

2021.05.17.14