## ODOC Public Inmate Data


To use this notebook, download the data published [here](http://doc.publishpath.com/odoc-public-inmate-data). Unzip the file and place the files in a subdirectory called 'data'.

The set of files includes a ReadMe.txt that describes each file and its fixed-width format. The first sections of this notebook show the layout of each file and how to import it into a pandas DataFrame. NOTE: the `widths` lists differ slightly from the ReadMe descriptions to accommodate some differences in the actual data.
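As a minimal sketch of why the widths matter, the snippet below reads a tiny in-memory fixed-width sample with `pd.read_fwf`. The two records and the three-column layout are invented for illustration and are not taken from the ODOC files. The second call deliberately uses a width that is one character too short for the first column.

```python
import io

import pandas as pd

# Invented records: an 11-character right-aligned number,
# a 10-character left-aligned name, and a 1-character flag.
records = [(400000, "DOE", "M"), (400019, "ROE", "F")]
sample = "".join(f"{num:>11}{name:<10}{sex}\n" for num, name, sex in records)

names = ["DOC_NUM", "LAST_NAME", "SEX"]

# Widths that match the layout of the sample.
good = pd.read_fwf(io.StringIO(sample), header=None, widths=[11, 10, 1], names=names)

# One character too few for DOC_NUM: its last digit leaks into LAST_NAME.
bad = pd.read_fwf(io.StringIO(sample), header=None, widths=[10, 11, 1], names=names)

print(good)
print(bad)
```

A leading digit on a text column, or numbers that look one digit short, is a useful hint that a width is off by one.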

In [24]:
import pandas as pd

## Schedule A - Profile Data Layout
 ```
 =======================================================
 Name                            Null?    Type
 ------------------------------- -------- ----
 DOC_NUM                         NOT NULL NUMBER(10)
 LAST_NAME                                VARCHAR2(30)
 FIRST_NAME                               VARCHAR2(30)
 MIDDLE_NAME                              VARCHAR2(30)
 SUFFIX                                   VARCHAR2(5)
 LAST_MOVE_DATE                           DATE 'DD-MMM-YY' (9)
 FACILITY                                 VARCHAR2(40)
 BIRTH_DATE                               DATE 'DD-MMM-YY' (9)
 SEX                                      VARCHAR2(1)
 RACE                                     VARCHAR2(40)
 HAIR                                     VARCHAR2(40)
 HEIGHT_FT                                VARCHAR2(2)
 HEIGHT_IN                                VARCHAR2(2)
 WEIGHT                                   VARCHAR2(4)
 EYE                                      VARCHAR2(40)
 STATUS                                   VARCHAR2(10)
 ```

In [25]:
file = 'data/Vendor_Profile_Sample_Text.dat'

# uncomment this line to use the full dataset
# file = 'data/Vendor_Profile_Extract_Text.dat'

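names = [
    "DOC_NUM",
    "LAST_NAME",
    "FIRST_NAME",
    "MIDDLE_NAME",
    "SUFFIX",
    "LAST_MOVE_DATE",
    "FACILITY",
    "BIRTH_DATE",
    "SEX",
    "RACE",
    "HAIR",
    "HEIGHT_FT",
    "HEIGHT_IN",
    "WEIGHT",
    "EYE",
    "STATUS",
]

# Field widths in characters, in the same order as `names`.
widths = [
    11,  # DOC_NUM (one wider than the NUMBER(10) in the layout)
    30,  # LAST_NAME
    30,  # FIRST_NAME
    30,  # MIDDLE_NAME
    5,   # SUFFIX
    9,   # LAST_MOVE_DATE
    40,  # FACILITY
    9,   # BIRTH_DATE
    1,   # SEX
    40,  # RACE
    40,  # HAIR
    2,   # HEIGHT_FT
    2,   # HEIGHT_IN
    4,   # WEIGHT
    40,  # EYE
    10,  # STATUS
]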

df = pd.read_fwf(file, 
    header=None,
    widths=widths,
    names=names)
                 
df

Unnamed: 0,DOC_NUM,LAST_NAME,FIRST_NAME,MIDDLE_NAME,SUFFIX,LAST_MOVE_DATE,FACILITY,BIRTH_DATE,SEX,RACE,HAIR,HEIGHT_FT,HEIGHT_IN,WEIGHT,EYE,STATUS
0,400000,KANNADY,CARL,,,,INACTIVE,06-SEP-56,M,WHITE,BROWN,6,0,210,GREEN,Inactive
1,400001,SMITH,ADOLPHUS,HOWARD,,,INACTIVE,09-APR-71,M,BLACK,BLACK,5,7,155,BROWN,Inactive
2,400002,LE,DAVID,SMITH,,,INACTIVE,26-AUG-80,M,ASIAN,BLACK,5,6,150,BROWN,Inactive
3,400003,PHILLIPS,CLINT,ALLEN,,,INACTIVE,24-AUG-75,M,WHITE,BROWN,5,7,162,BROWN,Inactive
4,400005,COLEMAN,JOEL,,,,INACTIVE,15-JAN-67,M,WHITE,BLONDE,5,7,150,BROWN,Inactive
5,400006,SMITH,DANIEL,DUANE,,30-MAY-08,INACTIVE,14-DEC-74,M,WHITE,BROWN,6,3,205,BROWN,Inactive
6,400007,HILL,MICHAEL,,,,INACTIVE,24-OCT-55,M,WHITE,RED,5,8,180,BROWN,Inactive
7,400009,MCLAIN,JOSEPH,,,,INACTIVE,12-DEC-76,M,WHITE,RED,5,3,145,BLUE,Inactive
8,400014,BLEDSOE,TONY,LEE,,13-MAY-02,INACTIVE,24-APR-77,M,WHITE,BLONDE,6,0,150,HAZEL,Inactive
9,400019,BEVENS,KAREN,ANNETTE,,,INACTIVE,07-OCT-61,F,WHITE,BROWN,5,2,140,BLUE,Inactive
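The date columns arrive as `DD-MMM-YY` strings. A minimal sketch of converting them follows. The helper name `parse_doc_date` and the `latest` cutoff are assumptions made for illustration, not part of the ODOC files or the ReadMe. The cutoff is needed because `%y` maps two-digit years 00-68 to 2000-2068, which would place a 1956 birth date in 2056.

```python
import pandas as pd


def parse_doc_date(values, latest="2030-01-01"):
    """Parse DD-MMM-YY strings; dates after `latest` are moved back 100 years."""
    parsed = pd.to_datetime(values, format="%d-%b-%y", errors="coerce")
    cutoff = pd.Timestamp(latest)
    # Two-digit years that landed in the future belong to the previous century.
    return parsed.mask(parsed > cutoff, parsed - pd.DateOffset(years=100))


# Example input in the same format as BIRTH_DATE; missing values stay missing.
births = pd.Series(["06-SEP-56", "26-AUG-80", None])
print(parse_doc_date(births))
```

With the profile frame loaded, this could be applied as `df["BIRTH_DATE"] = parse_doc_date(df["BIRTH_DATE"])`. A single cutoff is a simplification: a data set spanning more than 100 years cannot be disambiguated this way.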


## Schedule B - Alias Data Layout
```
=======================================================
 Name                            Null?    Type
 ------------------------------- -------- ----
 DOC_NUM                         NOT NULL NUMBER(10)
 LAST_NAME                                VARCHAR2(30)
 FIRST_NAME                               VARCHAR2(30)
 MIDDLE_NAME                              VARCHAR2(30)
 SUFFIX                                   VARCHAR2(5)
```

In [27]:
file = 'data/Vendor_Alias_Sample_Text.dat'

# uncomment this line to use the full dataset
# file = 'data/Vendor_Alias_Extract_Text.dat'

names = ["DOC_NUM", "LAST_NAME", "FIRST_NAME", "MIDDLE_NAME", "SUFFIX"]

widths = [11, 30, 30, 30, 5]

df = pd.read_fwf(file, 
    header=None,
    widths=widths,
    names=names)
                 
df

Unnamed: 0,DOC_NUM,LAST_NAME,FIRST_NAME,MIDDLE_NAME,SUFFIX
0,400000,KANNADY,CARL,,
1,400001,SMITH,ADOLPHUS,HOWARD,
2,400002,LE,DAVID,SMITH,
3,400003,PHILLIPS,CLINT,ALLEN,
4,400005,COLEMAN,JOEL,,
5,400006,SMITH,DANIEL,DUANE,JR
6,400006,SMITH,DANIEL,DUANE,
7,400006,SMITH,DANIEL,DUANEJR,
8,400007,HILL,MICHAEL,,
9,400009,MCLAIN,JOSEPH,,


## Schedule C - Sentence Data Layout
```
=======================================================
Incarcerated_Term_In_Years = 9999 indicates a death sentence

Incarcerated_Term_In_Years = 8888 indicates a life without parole sentence

Incarcerated_Term_In_Years = 7777 indicates a life sentence
=======================================================
 Name                            Null?    Type
 ------------------------------- -------- ----
 DOC_NUM                         NOT NULL NUMBER(10)
 STATUTE_CODE                    NOT NULL VARCHAR2(40)
 SENTENCING_COUNTY                        VARCHAR2(40)
 JS_DATE                                  DATE 'YYYYMMDD' (8)
 CRF_NUMBER                               VARCHAR2(40)
 INCARCERATED_TERM_IN_YEARS               NUMBER(10,2)
 PROBATION_TERM_IN_YEARS                  NUMBER(10,2)
 ```

In [30]:
file = 'data/Vendor_sentence_Sample_Text.dat'

# uncomment this line to use the full dataset
# file = 'data/Vendor_sentence_Extract_Text.dat'


names =[
    "DOC_NUM",                   
    "STATUTE_CODE",              
    "SENTENCING_COUNTY",         
    "JS_DATE",                   
    "CRF_NUMBER",                
    "INCARCERATED_TERM_IN_YEARS",
    "PROBATION_TERM_IN_YEARS"
]

# DOC_NUM takes 11 characters here, as in the profile and alias files. With a
# width of 10 its last digit ends up at the front of STATUTE_CODE, so
# STATUTE_CODE is narrowed to 39 to leave the other column boundaries unchanged.
widths = [
    11,  # DOC_NUM
    39,  # STATUTE_CODE
    40,  # SENTENCING_COUNTY
    10,  # JS_DATE (stored as DD-MMM-YY, not the YYYYMMDD given in the layout)
    40,  # CRF_NUMBER
    13,  # INCARCERATED_TERM_IN_YEARS
    13,  # PROBATION_TERM_IN_YEARS
]

df = pd.read_fwf(file, 
    header=None,
    widths=widths,
    names=names)
                 
df


Unnamed: 0,DOC_NUM,STATUTE_CODE,SENTENCING_COUNTY,JS_DATE,CRF_NUMBER,INCARCERATED_TERM_IN_YEARS,PROBATION_TERM_IN_YEARS
0,400000,21-1123.A,ARKANSAS JURISDICTION,05-FEB-97,1996-2007,,9.95
1,400001,21-1435,COLORADO JURISDICTION,30-APR-01,00CR668-18,,3.00
2,400002,21-1731,OKLAHOMA COUNTY COURT,15-JUN-01,2000-6699,,1.00
3,400003,21-797,TEXAS JURISDICTION,30-APR-01,"B28,824",,3.00
4,400005,21-1713,MUSKOGEE COUNTY COURT,19-JUN-01,1999-1078,,3.00
5,400006,21-1544,WASHINGTON COUNTY COURT,21-JUN-01,1999-171,,5.00
6,400006,21-1544,WASHINGTON COUNTY COURT,19-JUN-06,1999-171,,5.00
7,400006,15-765.3,TULSA COUNTY COURT,27-OCT-04,2004-2092,8.0,
8,400006,15-765.3,TULSA COUNTY COURT,27-OCT-04,2004-2182,8.0,
9,400006,21-1542,WASHINGTON COUNTY COURT,21-JUN-01,99-171,2.0,3.00
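The Schedule C layout reserves 9999, 8888 and 7777 in `INCARCERATED_TERM_IN_YEARS` for death, life without parole and life sentences. Left in place, those values would distort any sum or average of term lengths. The sketch below separates them from real year counts. The helper name `split_special_terms`, the output column names and the label strings are assumptions chosen for illustration.

```python
import pandas as pd

# Sentinel values from the Schedule C layout; the label text is our own choice.
SPECIAL_TERMS = {
    9999.0: "DEATH",
    8888.0: "LIFE WITHOUT PAROLE",
    7777.0: "LIFE",
}


def split_special_terms(terms):
    """Return real terms in years and a label for the sentinel values."""
    terms = pd.to_numeric(terms, errors="coerce").astype("float64")
    kind = terms.map(SPECIAL_TERMS)
    # Blank out the sentinels so they do not count as year values.
    years = terms.mask(kind.notna())
    return pd.DataFrame({"TERM_YEARS": years, "SENTENCE_TYPE": kind})


# Invented terms: one ordinary sentence, two sentinels, one missing value.
example = pd.Series([8.0, 9999, 7777, None])
print(split_special_terms(example))
```

With the sentence frame loaded, `split_special_terms(df["INCARCERATED_TERM_IN_YEARS"])` returns two columns that can be joined back onto `df`.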


## Schedule D - Offense Codes Layout
 ```
 =======================================================
 Name                            Null?    Type
 ------------------------------- -------- ----
 STATUTE_CODE                    NOT NULL VARCHAR2(38)
 DESCRIPTION                     NOT NULL VARCHAR2(40)
 VIOLENT                                  VARCHAR2(1)
```

In [6]:
file = 'data/Vendor_Offense_Extract_Text.dat'

names = ["STATUTE_CODE",
         "DESCRIPTION", 
         "VIOLENT",
]

widths = [
    38,
    40,
    1
]

df = pd.read_fwf(file, 
    header=None,
    widths=widths,
    names=names)
                 
df

Unnamed: 0,STATUTE_CODE,DESCRIPTION,VIOLENT
0,0-0,UNKNOWN - FOR WARRANTS ONLY,N
1,10-1144,ACTS CAUSING JUVENILE DELINQUENCY,N
2,10-1627,DEPRIVATION OF LAWFUL CUSTODY,N
3,10-26,UNLAWFUL ASSUMPTION OF CUSTODY OF CHILD,N
4,10-404.1,SEX OFFENDER PROVIDING SERVICES TO CHILD,N
5,10-410,OPERATING CHILD CARE FACILITY W/O LICENS,N
6,10-71,PATERNITY COMPLAINT,N
7,10-7103,FAILURE TO REPORT CHILD,N
8,10-7115,CHILD ABUSE,Y
9,10-7303-1.1B,REFUSE TO ASSUME CUSTODY OF CHILD IN DET,N
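The offense codes can serve as a lookup table for the sentence data, since both carry `STATUTE_CODE`. The sketch below shows one way to attach the description and the violent flag. The sentence rows and the unmatched code are invented, and the `IS_VIOLENT` column name is an assumption. Whether every code in the sentence file has a match in the offense file is not established by this notebook, so a left join is used to keep unmatched rows.

```python
import pandas as pd

# Invented sentence rows; "99-0000" is a made-up code with no match.
sentences = pd.DataFrame({
    "DOC_NUM": [1, 2, 3],
    "STATUTE_CODE": ["10-7115", "10-71", "99-0000"],
})

# Two rows in the shape of the offense table.
offenses = pd.DataFrame({
    "STATUTE_CODE": ["10-71", "10-7115"],
    "DESCRIPTION": ["PATERNITY COMPLAINT", "CHILD ABUSE"],
    "VIOLENT": ["N", "Y"],
})

# validate raises if a statute code appears more than once in `offenses`.
merged = sentences.merge(
    offenses, on="STATUTE_CODE", how="left", validate="many_to_one"
)

# Y/N becomes True/False; unmatched codes stay missing.
merged["IS_VIOLENT"] = merged["VIOLENT"].map({"Y": True, "N": False})
print(merged)
```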
