# Using Machine Learning to Predict the Miami Crime Rate for July 2019
 
![alt text](https://www.visitflorida.com/content/dam/visitflorida/en-us/images/cities/miami/Downtown2011_08.jpg.1280.500.rendition)



When you picture a vacation where you relax and have a good time, you are describing Miami!

Miami is one of the most visited cities in the US, not only for its good weather and 249 sunny days per year (according to [Current Results weather and science facts](https://www.currentresults.com/Weather/Florida/annual-days-of-sunshine.php)), but also for its beaches, parties, and outdoor activities.

Because Miami attracts so many tourists and has a quickly growing population, crime should be one of the government's top priorities.

Whether you are thinking of moving to Miami or vacationing there,

**<center> Have you ever wondered what crime is like there? </center>**

What if you could know whether crime will increase, and what if you could find the area with the least crime? The truth is that YOU CAN!

This project uses Machine Learning to forecast crime in the Miami area for the coming months. Based on jail bookings from May 2015 through June 2019, we will try to predict the rate for July 2019, by crime type.

Along the way, you'll see different techniques and tools to shape and analyze the data, leading to clear and informative visualizations that we hope will be useful to anyone wondering about crime: the police deciding where a station should be located, a family searching for a safe place to live with kids, or a tourist planning to party.

#### Welcome to the future!
<hr>

## Extract and Transform 

The dataset was obtained from the [Miami Dade County Open Data Hub](https://gis-mdc.opendata.arcgis.com/datasets/jail-bookings-may-29-2015-to-current/geoservice).

In [None]:
# Import Dependencies
import pandas as pd

In [2]:
# Read the dataset
csv_path = "Resources/Jail_Bookings__May_29_2015_to_current.csv"                    
df0 = pd.read_csv(csv_path)
df0.head()



Unnamed: 0,BookDate,Defendant,Address,CityStateZip,DOB,ChargeCode1,Charge1,ChargeCode2,Code2,ChargeCode3,Charge3,Zip,Filler,City,State,Zip1,ObjectId
0,2015-05-29T04:00:00.000Z,"ABAD, MICHEL CARLOS",6130 W 21ST CT,"HIALEAH, FL 330162681",1989-07-19T04:00:00.000Z,7840300,BATTERY,7840300.0,BATTERY,,,33016.0,,HIALEAH,FL,330162681.0,1
1,2015-05-29T04:00:00.000Z,"ALBA, RAQUEL",982 SW 3 ST,"MIAMI, FL 33130",1967-01-16T05:00:00.000Z,81201402C,GRAND THEFT 3RD DEG,,,,,33130.0,,MIAMI,FL,33130.0,2
2,2015-05-29T04:00:00.000Z,"ALBERTE, FEDERICO",SUPERI 4351 CAPIT FEDER,"BUENOS AIRES, AT",1984-02-28T05:00:00.000Z,81201403A,PETIT THEFT,,,,,,,BUENOS AIRES,AT,,3
3,2015-05-29T04:00:00.000Z,"ALBERTE, NICOLAS",SUPERI 4351 CAPITAL FED,"BUENOS AIRES, AT",1993-06-16T04:00:00.000Z,81201403A,PETIT THEFT,,,,,,,BUENOS AIRES,AT,,4
4,2015-05-29T04:00:00.000Z,"ALFONSO, DAILYN",14427 SW 187TH AVE,"MIAMI, FL 33196",1984-11-06T05:00:00.000Z,,BENCH WARRANT,,,,,33196.0,,MIAMI,FL,33196.0,5
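
Columns such as `ChargeCode1` mix purely numeric codes (`7840300`) with alphanumeric ones (`81201402C`), which is a common reason for pandas to complain about mixed types when reading a large CSV. The following is a minimal sketch, not the notebook's own code: it uses a tiny in-memory sample with made-up rows in place of the real file, and assumes that reading code-like columns as strings is acceptable for this analysis.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the real CSV (rows are illustrative only)
sample_csv = io.StringIO(
    'BookDate,Defendant,ChargeCode1,Charge1,Zip,State\n'
    '2015-05-29T04:00:00.000Z,"DOE, JOHN",7840300,BATTERY,33016,FL\n'
    '2015-05-29T04:00:00.000Z,"ROE, JANE",81201402C,GRAND THEFT 3RD DEG,,FL\n'
)

# Reading code-like columns as strings keeps "7840300" and "81201402C"
# in one consistent type instead of a mix of numbers and text.
sample = pd.read_csv(sample_csv, dtype={"ChargeCode1": str, "Zip": str})
print(sample.dtypes)
```

Keeping `Zip` as text also avoids the trailing `.0` that appears when ZIP codes are parsed as floats.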


In [3]:
# Observe column names
df0.columns

Index(['BookDate', 'Defendant', 'Address', 'CityStateZip', 'DOB',
       'ChargeCode1', 'Charge1', 'ChargeCode2', 'Code2', 'ChargeCode3',
       'Charge3', 'Zip', 'Filler', 'City', 'State', 'Zip1', 'ObjectId'],
      dtype='object')

In [4]:
# Filter by State of Florida only (rows with a missing State are dropped too)
# .copy() gives an independent DataFrame, so later in-place edits are safe
df = df0.loc[df0['State'] == "FL", :].copy()

In [5]:
df.head()

Unnamed: 0,BookDate,Defendant,Address,CityStateZip,DOB,ChargeCode1,Charge1,ChargeCode2,Code2,ChargeCode3,Charge3,Zip,Filler,City,State,Zip1,ObjectId
5,2015-05-29T04:00:00.000Z,"AMADOR, PEDRO",UNKNOWN,"MIAMI, FL",1996-10-12T04:00:00.000Z,81213502A,RBRY/HM INV/FA - PBL,89313001A1C,COKE/SELL/DEL/W/INT,89313006A1,COCAINE/POSSESSION,,,MIAMI,FL,,6
6,2015-05-29T04:00:00.000Z,"ANDREW, GRACE",UNKNOWN,"MIAMI, FL",1963-09-26T04:00:00.000Z,81201402C,GRAND THEFT 3RD DEG,9959970,HOLD FOR IMMIGRATION,,,,,MIAMI,FL,,7
15,2015-05-29T04:00:00.000Z,"BENOIT, ERNSON",UNKNOWN,"MIAMI, FL",1989-10-26T04:00:00.000Z,,,,,,,,,MIAMI,FL,,16
21,2015-05-29T04:00:00.000Z,"BRADLEY, HENRY L",15821 NW 28TH PL,"MIAMI, FL",1986-08-21T04:00:00.000Z,81213002C,ROBBERY/STRONGARM,,,,,,,MIAMI,FL,,22
22,2015-05-29T04:00:00.000Z,"BRANDT, JAMES",HOMELESS,"MIAMI, FL",1972-04-11T04:00:00.000Z,89313006A1,COCAINE/POSSESSION,89313002A1B,COCAINE/PUR/ATTEMPT,91813001A,TAMPER/PHYS EVIDENCE,,,MIAMI,FL,,23
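
A boolean filter returns an object that pandas may treat as a slice of the original frame, so modifying it in place can trigger a `SettingWithCopyWarning`. The sketch below uses a made-up three-row frame (the names `raw` and `florida` are hypothetical) to show one way of filtering into an explicit copy before dropping columns.

```python
import pandas as pd

# Made-up rows, only to illustrate the pattern
raw = pd.DataFrame({
    "Defendant": ["A", "B", "C"],
    "State": ["FL", "AT", None],
    "Filler": [None, None, None],
})

# A missing State compares as False, so those rows are filtered out too
florida = raw.loc[raw["State"] == "FL"].copy()

# The copy can be modified freely without touching `raw`
florida = florida.drop(columns=["Filler"])
print(florida)
```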


In [6]:
# Drop columns that we will not use 
df.drop(['Filler', "ChargeCode2", "ChargeCode3", "ObjectId", "Zip1","CityStateZip"], axis=1, inplace=True)



In [7]:
# look for missing values
df.count()

BookDate       32934
Defendant      32934
Address        32931
DOB            32934
ChargeCode1    25135
Charge1        31646
Code2          13539
Charge3         6236
Zip             5187
City           32926
State          32934
dtype: int64
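
`count()` reports the non-missing values per column, so gaps have to be read off by comparing against the row total. As a small sketch on made-up data (the frame `bookings` is hypothetical), the missing values can also be counted directly:

```python
import pandas as pd

# Made-up rows, only to illustrate the pattern
bookings = pd.DataFrame({
    "Charge1": ["BATTERY", None, "PETIT THEFT"],
    "Zip": [33016.0, None, None],
})

# Number and share of missing values per column
missing = bookings.isna().sum()
share = bookings.isna().mean().round(2)
print(pd.DataFrame({"missing": missing, "share": share}))
```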

In [8]:
# Drop empty values from "Charge1" column
df = df.dropna(subset=['Charge1'])

In [9]:
# Checking values again
df.count()

BookDate       31646
Defendant      31646
Address        31643
DOB            31646
ChargeCode1    25135
Charge1        31646
Code2          13539
Charge3         6236
Zip             4959
City           31638
State          31646
dtype: int64

In [10]:
# Checking types of column values
df.dtypes

BookDate        object
Defendant       object
Address         object
DOB             object
ChargeCode1     object
Charge1         object
Code2           object
Charge3         object
Zip            float64
City            object
State           object
dtype: object

In [11]:
# Rename column
df.rename(columns={"Code2":"Charge2"}, inplace = True)

In [12]:
df.head()

Unnamed: 0,BookDate,Defendant,Address,DOB,ChargeCode1,Charge1,Charge2,Charge3,Zip,City,State
5,2015-05-29T04:00:00.000Z,"AMADOR, PEDRO",UNKNOWN,1996-10-12T04:00:00.000Z,81213502A,RBRY/HM INV/FA - PBL,COKE/SELL/DEL/W/INT,COCAINE/POSSESSION,,MIAMI,FL
6,2015-05-29T04:00:00.000Z,"ANDREW, GRACE",UNKNOWN,1963-09-26T04:00:00.000Z,81201402C,GRAND THEFT 3RD DEG,HOLD FOR IMMIGRATION,,,MIAMI,FL
21,2015-05-29T04:00:00.000Z,"BRADLEY, HENRY L",15821 NW 28TH PL,1986-08-21T04:00:00.000Z,81213002C,ROBBERY/STRONGARM,,,,MIAMI,FL
22,2015-05-29T04:00:00.000Z,"BRANDT, JAMES",HOMELESS,1972-04-11T04:00:00.000Z,89313006A1,COCAINE/POSSESSION,COCAINE/PUR/ATTEMPT,TAMPER/PHYS EVIDENCE,,MIAMI,FL
36,2015-05-29T04:00:00.000Z,"DANIELS, DORIAN LAFRANCE",157 NE 67TH ST 2,1988-08-29T04:00:00.000Z,,BENCH WARRANT,,,,MIAMI,FL


### Cleaning Dates

In [13]:
# Give BookDate column date format to plot
df['BookDate'] = df['BookDate'].astype(str)
df['Booking_year'] = df['BookDate'].str.split('-').str.get(0)
df['Booking_month'] = df['BookDate'].str.split('-').str.get(1)
df['Booking_day'] = df['BookDate'].str.split('-').str.get(2).str[:2]

In [14]:
df['Booking_Date'] = df['Booking_year'].astype(str) + "-" + df['Booking_month'].astype(str) + "-" + df['Booking_day'].astype(str)
# df.head()

In [15]:
df['Booking_year_month'] = df['Booking_year'].astype(str) + "-" + df['Booking_month'].astype(str)

In [16]:
# Give DOB (date of birth of Defendant) column date format to plot
df['DOB'] = df['DOB'].astype(str)
df['DOB_year'] = df['DOB'].str.split('-').str.get(0)
df['DOB_month'] = df['DOB'].str.split('-').str.get(1)
df['DOB_day'] = df['DOB'].str.split('-').str.get(2).str[:2]

df['Date_of_birth'] = df['DOB_year'].astype(str) + "-" + df['DOB_month'].astype(str) + "-" + df['DOB_day'].astype(str)
# df.head()

In [17]:
# Drop unnecessary columns
df.drop(['DOB_year','DOB_day','Booking_year','Booking_day'], axis=1, inplace=True)
# df.head()
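
The date cells above split the ISO timestamp strings by hand. A possible alternative, sketched here on two made-up rows, is to parse the date part once with `pd.to_datetime` and derive the other fields through the `.dt` accessor. Note that `Booking_month` comes out as an integer in this sketch, not as a zero-padded string.

```python
import pandas as pd

# Made-up rows in the same timestamp format as BookDate
dates = pd.DataFrame({
    "BookDate": ["2015-05-29T04:00:00.000Z", "2019-06-30T04:00:00.000Z"],
})

# Keep only the YYYY-MM-DD part, as the cells above do, then parse it
dates["Booking_Date"] = pd.to_datetime(dates["BookDate"].str[:10])

# Derive the month, a year-month label and the weekday from the parsed date
dates["Booking_month"] = dates["Booking_Date"].dt.month
dates["Booking_year_month"] = dates["Booking_Date"].dt.strftime("%Y-%m")
dates["day_of_week"] = dates["Booking_Date"].dt.day_name()
print(dates)
```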

In [18]:
# Add Age column. Age is measured against the date the notebook is run (not the booking date),
# so the values change whenever the code is re-run
dob = pd.to_datetime(df['Date_of_birth'])

now = pd.Timestamp('now')

# Whole years elapsed; newer pandas versions reject the astype('<m8[Y]') cast
df['Age'] = (now - dob).dt.days // 365.2425

df['Age'] = df['Age'].astype(int)
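
The Age cell measures age against the date the notebook is run. If the age at the time of booking is the quantity of interest (an assumption, since the notebook does not say so), one possible sketch is below. The frame `people` and the column `Age_at_booking` are hypothetical names.

```python
import pandas as pd

# Made-up rows, only to illustrate the calculation
people = pd.DataFrame({
    "Date_of_birth": pd.to_datetime(["1996-10-12", "1963-05-29"]),
    "Booking_Date": pd.to_datetime(["2015-05-29", "2015-05-29"]),
})

born, booked = people["Date_of_birth"], people["Booking_Date"]

# True when the birthday has not yet occurred in the booking year
before_birthday = (booked.dt.month < born.dt.month) | (
    (booked.dt.month == born.dt.month) & (booked.dt.day < born.dt.day)
)

# Year difference, minus one if the birthday is still ahead
people["Age_at_booking"] = booked.dt.year - born.dt.year - before_birthday.astype(int)
print(people)
```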

In [19]:
df['Booking_Date'] = pd.to_datetime(df['Booking_Date'])

df['day_of_week'] = df['Booking_Date'].dt.day_name()

In [20]:
df.head()

Unnamed: 0,BookDate,Defendant,Address,DOB,ChargeCode1,Charge1,Charge2,Charge3,Zip,City,State,Booking_month,Booking_Date,Booking_year_month,DOB_month,Date_of_birth,Age,day_of_week
5,2015-05-29T04:00:00.000Z,"AMADOR, PEDRO",UNKNOWN,1996-10-12T04:00:00.000Z,81213502A,RBRY/HM INV/FA - PBL,COKE/SELL/DEL/W/INT,COCAINE/POSSESSION,,MIAMI,FL,5,2015-05-29,2015-05,10,1996-10-12,22,Friday
6,2015-05-29T04:00:00.000Z,"ANDREW, GRACE",UNKNOWN,1963-09-26T04:00:00.000Z,81201402C,GRAND THEFT 3RD DEG,HOLD FOR IMMIGRATION,,,MIAMI,FL,5,2015-05-29,2015-05,9,1963-09-26,55,Friday
21,2015-05-29T04:00:00.000Z,"BRADLEY, HENRY L",15821 NW 28TH PL,1986-08-21T04:00:00.000Z,81213002C,ROBBERY/STRONGARM,,,,MIAMI,FL,5,2015-05-29,2015-05,8,1986-08-21,32,Friday
22,2015-05-29T04:00:00.000Z,"BRANDT, JAMES",HOMELESS,1972-04-11T04:00:00.000Z,89313006A1,COCAINE/POSSESSION,COCAINE/PUR/ATTEMPT,TAMPER/PHYS EVIDENCE,,MIAMI,FL,5,2015-05-29,2015-05,4,1972-04-11,47,Friday
36,2015-05-29T04:00:00.000Z,"DANIELS, DORIAN LAFRANCE",157 NE 67TH ST 2,1988-08-29T04:00:00.000Z,,BENCH WARRANT,,,,MIAMI,FL,5,2015-05-29,2015-05,8,1988-08-29,30,Friday


In [21]:
# df.to_csv("df_1_clean_dates.csv",index=True,header=True)

### Cleaning Charges 

In [22]:
# Assign a ChargeCode1 from the Charge1 text (this also fills rows whose code is missing).
# Rules are checked in order and the first match wins.
warrants = ("BENCH WARRANT", "PROBATION WARRANT", "OUT-OF-CNTY/WARRANT",
            "ARREST WARRANT", "FUG WARR/OUT O STATE")

for index, row in df["Charge1"].items():
    charge = str(row)
    if "BATTERY" in charge or "ASSAULT" in charge or "ASSLT" in charge:
        code = "7840300"   # chapter 784: assault and battery
    elif charge[:3] == "DUI":
        code = "31619301"  # chapter 316: driving under the influence
    elif "THEFT" in charge:
        code = "812"
    elif charge in warrants or "CAPIAS" in charge:
        code = "WARRANT"
    elif charge == "BEV/DRK IN PUBLC":
        code = "40100"
    elif "ALC" in charge:
        code = "708" # ALCOHOL PROHIBITION category: ALC/BEV CONSUME/SERV, ALCOHOL/PUB/MIA BCH, ALC BEV/EST/SOLICIT, ALCOHOL/CONSUM/STORE, ALCOHOL/POSN/STORE, ALCOHOL/POSN/MINOR, ALC/CONS IN PUBLIC, ALC/OPN CONT/VEHICLE, ALCOHOL VIOLATION, ALCOHOL/CURB DRNKNG
    elif charge in ("TAG NOT ASSIGNED VEH", "NO VALID DRIVERS LIC"):
        code = "322"
    elif "TRESPASS" in charge:
        code = "0218100A5"
    elif "BATT/DOM/STRANGLE" in charge:
        code = "99600"
    elif ("CANNABIS" in charge or "COCAINE" in charge
          or charge in ("CANN/SELL/DEL/PSNW/I", "DRUG PARAPHERNA/POSN")):
        code = "893"
    elif charge == "BURGLARY TOOLS/POSN":
        code = "810"
    else:
        continue  # no rule matched: keep the existing code
    df.loc[index, 'ChargeCode1'] = code
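
Row-by-row loops become slow as the table grows. As a hedged sketch of the same idea (ordered keyword rules, first match wins), the assignment can be expressed with vectorized string tests. The rule list and the frame `charges` below are a hypothetical three-rule subset on made-up rows, not the full mapping.

```python
import pandas as pd

# Made-up rows, only to illustrate the pattern
charges = pd.DataFrame({
    "Charge1": ["AGG BATTERY", "PETIT THEFT", "BENCH WARRANT", "LOITERING"],
    "ChargeCode1": [None, "81201403A", None, "85602100"],
})

# (keyword, code) pairs in priority order: earlier rules win
rules = [("BATTERY", "7840300"), ("THEFT", "812"), ("WARRANT", "WARRANT")]

text = charges["Charge1"].astype(str)
code = charges["ChargeCode1"].copy()

# Apply the rules from last to first so that the earliest rule is written last
for pattern, value in reversed(rules):
    code = code.mask(text.str.contains(pattern, regex=False), value)

# Rows that match no rule keep their existing code
charges["ChargeCode1"] = code
print(charges)
```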

In [23]:
# New columns that will hold categories (Crime_Family2 is the broadest category)
df["Crime_Family"] = ''
df["Crime_Family1"] = ''
df["Crime_Family2"] = ''

In [24]:
# Categories by code 
for index, row in df["ChargeCode1"].items():
    if "775" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "GENERAL PENALTIES; REGISTRATION OF CRIMINALS"
    elif "777" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "PRINCIPAL; ACCESSORY; ATTEMPT; SOLICITATION; CONSPIRACY"        
    elif "782" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "HOMICIDE"
    elif "784" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "ASSAULT; BATTERY; CULPABLE NEGLIGENCE"
    elif "787" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "KIDNAPPING; CUSTODY OFFENSES; HUMAN TRAFFICKING; AND RELATED OFFENSES"
    elif "790" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "WEAPONS AND FIREARMS"
    elif "794" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "SEXUAL BATTERY"
    elif "796" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "PROSTITUTION"
    elif "798" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "ADULTERY; LEWD AND LASCIVIOUS BEHAVIOR"
    elif "800" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "LEWDNESS; INDECENT EXPOSURE"
    elif "806" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "ARSON AND CRIMINAL MISCHIEF"
    elif "810" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif "812" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "THEFT, ROBBERY, AND RELATED CRIMES"
    elif "815" == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "COMPUTER-RELATED CRIMES"
    elif "817" in (str(row)):
        df.loc[index, 'Crime_Family'] = "FRAUDULENT PRACTICES"
    elif "823" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "PUBLIC NUISANCES"
    elif "825" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "ABUSE, NEGLECT, AND EXPLOITATION OF ELDERLY PERSONS AND DISABLED ADULTS"
    elif "827" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "ABUSE OF CHILDREN"
    elif "828" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "ANIMALS: CRUELTY; SALES; ANIMAL ENTERPRISE PROTECTION"
    elif "831" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "FORGERY AND COUNTERFEITING"
    elif "832" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "VIOLATIONS INVOLVING CHECKS AND DRAFTS"
    elif "836" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "DEFAMATION; LIBEL; THREATENING LETTERS AND SIMILAR OFFENSES"
    elif "837" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "PERJURY"
    elif "838" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "BRIBERY; MISUSE OF PUBLIC OFFICE"
    elif "839" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "OFFENSES BY PUBLIC OFFICERS AND EMPLOYEES"
    elif "843" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "OBSTRUCTING JUSTICE"
    elif "847" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "OBSCENITY"
    elif "849" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "GAMBLING"
    elif "856" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "DRUNKENNESS; OPEN HOUSE PARTIES; LOITERING; PROWLING; DESERTION"
    elif "859" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "POISONS; ADULTERATED DRUGS"
    elif "860" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "OFFENSES CONCERNING AIRCRAFT, MOTOR VEHICLES, VESSELS, AND RAILROADS"
    elif "861" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "OFFENSES RELATED TO PUBLIC ROADS, TRANSPORT, AND WATERS"
    elif "865" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "VIOLATIONS OF CERTAIN COMMERCIAL RESTRICTIONS"
    elif "870" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "AFFRAYS; RIOTS; ROUTS; UNLAWFUL ASSEMBLIES"
    elif "872" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "OFFENSES CONCERNING DEAD BODIES AND GRAVES"
    elif "874" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "CRIMINAL GANG ENFORCEMENT AND PREVENTION"
    elif "876" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "CRIMINAL ANARCHY, TREASON, AND OTHER CRIMES AGAINST PUBLIC ORDER"
    elif "877" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "MISCELLANEOUS CRIMES"
    elif "893" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif "895" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "OFFENSES CONCERNING RACKETEERING AND ILLEGAL DEBTS"
    elif "896" == (str(row)[:3]):
            df.loc[index, 'Crime_Family'] = "OFFENSES RELATED TO FINANCIAL TRANSACTIONS"
    elif '316' == (str(row)[:3]): # Here starts special code cases (316 also covers RECKLESS DRIVING, DRAG RACING/HWY, OBSTRUCT TRF/SOLICIT)
        df.loc[index, 'Crime_Family'] = "DUI (DRIVING UNDER THE INFLUENCE)"
    elif '90136' == (str(row)[:5]):
        df.loc[index, 'Crime_Family'] = "IDENTITY THEFT"
    elif '90104' == (str(row)[:5]):
        df.loc[index, 'Crime_Family'] = "WARRANT"
    elif 'WARRANT' == str(row):
        df.loc[index, 'Crime_Family'] = "WARRANT"
    elif '590125' == (str(row)[:6]):
        df.loc[index, 'Crime_Family'] = "WRONFUL BURNING"
    elif '401' == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "DRINK IN PUBLIC"
    elif '142' == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "PANHANDLING"  # aggressive panhandling and begging, with or without obstruction, on public property
    elif '70700' in (str(row)):
        df.loc[index, 'Crime_Family'] = "PANHANDLING"
    elif '376000' in (str(row)):
        df.loc[index, 'Crime_Family'] = "PANHANDLING" 
    elif '378000' in (str(row)):
        df.loc[index, 'Crime_Family'] = "PANHANDLING" 
    elif '747800' in (str(row)):
        df.loc[index, 'Crime_Family'] = "PANHANDLING"
    elif '142510' in (str(row)):
        df.loc[index, 'Crime_Family'] = "PANHANDLING"
    elif '02131004B' == (str(row)):
        df.loc[index, 'Crime_Family'] = "PANHANDLING"       
    elif '322' == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE" #FALSE ID, FALSE INFO IN DRIVER LICENSE, EXPIRED LICENCE +6MONTHS, FALSE AFFIDAVIT, NON RESIDENT REQUIREMENT
    elif '704' in (str(row)):
        df.loc[index, 'Crime_Family'] = "PUBLIC DISORDERLY CONDUCT" # URINATE/DEFE/PUB PLA, CAMPING PROHIBITED, PEDDLING/SELL MERCH
    elif '0708' == (str(row)[:4]):
        df.loc[index, 'Crime_Family'] = "ALCOHOL PROHIBITION"
    elif '708' == (str(row)[:3]):
        df.loc[index, 'Crime_Family'] = "ALCOHOL PROHIBITION"
    elif '943' in (str(row)):
        df.loc[index, 'Crime_Family'] = "SEX OFFENDER"
    elif '9444000' == (str(row)):
        df.loc[index, 'Crime_Family'] = "SEX OFFENDER"
    elif '985481510' == (str(row)):
        df.loc[index, 'Crime_Family'] = "SEX OFFENDER"
    elif '996' in (str(row)):
        df.loc[index, 'Crime_Family'] = "PHYSICAL ATTACK" # DOMESTIC VIOLENCE, DOMESTIC STRANGLE, BODILY ATTACK
    elif 'DRUG ALT TREAT' in (str(row)):
        df.loc[index, 'Crime_Family'] = "COURT DECISION"
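
Most branches in the cell above test the first three characters of the charge code against a statute chapter number. A minimal sketch of that lookup as a dictionary is shown below. The dictionary `family_by_prefix` is a hypothetical three-entry subset, and the special cases that match longer prefixes or substrings would still need their own handling.

```python
import pandas as pd

# A few of the chapter prefixes used above; a full table would list them all
family_by_prefix = {
    "784": "ASSAULT; BATTERY; CULPABLE NEGLIGENCE",
    "812": "THEFT, ROBBERY, AND RELATED CRIMES",
    "893": "DRUG ABUSE PREVENTION AND CONTROL",
}

codes = pd.Series(["7840300", "81201402C", "89313006A1", "WARRANT"])

# Look up the first three characters; prefixes not in the table become NaN
family = codes.str[:3].map(family_by_prefix)
print(family)
```

Keeping the table in one dictionary makes duplicated or unreachable branches easier to spot than in a long `elif` chain.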

In [25]:
# Adding Categories to column Crime_Family (first category group, smaller)
for index, row in df["Charge1"].items():
    if (str(row)) == "BOATING UNDER INFLU":
        df.loc[index, 'Crime_Family'] = "ALCOHOL PROHIBITION"
    elif (str(row)) == 'BOND SURRENDER':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"       
    elif str(row) == 'BURG/ASS/BAT/>7/1/01':
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif str(row) == 'BURG/OCC/DWELL/ATT':
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif str(row) == 'BURG/UNOCC CONVY/ATT':
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif 'BURGLARY' in str(row):
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif str(row) == 'CHILD NEG NO GR HARM':
        df.loc[index, 'Crime_Family'] = "ABUSE OF CHILDREN"
    elif str(row) == 'CONCEALED F/A /CARRY':
        df.loc[index, 'Crime_Family'] = "WEAPONS AND FIREARMS"
    elif str(row) == 'CONTEMPT OF COURT':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"
    elif str(row) == 'CREDIT CARD/0-300':
        df.loc[index, 'Crime_Family'] = "FINANCIAL CRIME"
    elif str(row) == 'DIS COND/ESTAB':
        df.loc[index, 'Crime_Family'] = "PUBLIC BEHAVIOR PROHIBITION"
    elif str(row) == 'DIS ORD/ESTB/RESIST':
        df.loc[index, 'Crime_Family'] = "PUBLIC BEHAVIOR PROHIBITION"
    elif str(row) == 'DISORDERL CONDUCT':
        df.loc[index, 'Crime_Family'] = "PUBLIC BEHAVIOR PROHIBITION"
    elif str(row) == 'DISORDERLY INTOX':
        df.loc[index, 'Crime_Family'] = "PUBLIC BEHAVIOR PROHIBITION"
    elif str(row) == 'DOM VIO/INJ VIOLATIO':
        df.loc[index, 'Crime_Family'] = "PHYSICAL ATTACK"
    elif str(row) == 'DOM VIOL/INJUNC VIOL':
        df.loc[index, 'Crime_Family'] = "PHYSICAL ATTACK"
    elif str(row) == 'DOMESTIC VIOL WARRNT':
        df.loc[index, 'Crime_Family'] = "PHYSICAL ATTACK"
    elif str(row) == 'DRUG PARA/SALE/PROH':
        df.loc[index, 'Crime_Family'] = 'DRUG ABUSE PREVENTION AND CONTROL'
    elif str(row) == 'DRUGS/HARMFUL/POSN':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'DRUGS/NEW/W/O PRESCR':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'DRUGS/TRAFFICKING':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'DWLS/3RD & SUBS OFFN':
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE"
    elif str(row) == 'DWLS/HABITUAL':
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE"
    elif str(row) == 'DWLS/KNOWINGLY':
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE"
    elif str(row) == 'EMER/COMM/911/MISUSE':
        df.loc[index, 'Crime_Family'] = "EMERGENCY SERVICE MISUSE" #MISUSE OF911, FALSE REPORT TO POLICE 
    elif str(row) == 'FA/WEAP/POS/FEL/ATT':
        df.loc[index, 'Crime_Family'] = "WEAPONS AND FIREARMS"
    elif str(row) == 'FALSE RPT TO POLICE':
        df.loc[index, 'Crime_Family'] = "EMERGENCY SERVICE MISUSE/MISCONDUCT"    
    elif str(row) == 'HOLD FOR AGENCY':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"
    elif str(row) == 'HOLD FOR IMMIGRATION':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"
    elif str(row) == 'HOLD FOR MAGISTRATE':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"
    elif str(row) == 'HOLD FOR MARSHALL':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"
    elif str(row) == 'INDECENT EXPOSURE':
        df.loc[index, 'Crime_Family'] = "OBSCENITY"
    elif str(row) == 'MARIJUANA/POSSESSION':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'OBSTRUCT POLICE OFF':
        df.loc[index, 'Crime_Family'] = "WARRANT"
    elif str(row) == 'OBSTRUCT/FREE PASS':
        df.loc[index, 'Crime_Family'] = "WARRANT"
    elif str(row) == 'OUT ON FELONY BOND':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"
    elif str(row) == 'OUT ON PROBATION':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"
    elif str(row) == 'PANHANDLING/BEG/SOL':
        df.loc[index, 'Crime_Family'] = "PANHANDLING"
    elif str(row) == 'WRIT OF ATTACHMENT':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"    
    elif str(row) == 'WRIT/BODILY ATTACH':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"    
    elif str(row) == 'WINDOW WASH/STREET':
        df.loc[index, 'Crime_Family'] = "PUBLIC SELLING/SERVICE"    
    elif str(row) == 'WINDOW WASHING':
        df.loc[index, 'Crime_Family'] = "PUBLIC SELLING/SERVICE" 
    elif str(row) == 'VEH REG/NOT HAVE':
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE" 
    elif str(row) == 'VEH/ALT ID/POSN/SALE':
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE" 
    elif str(row) == 'UTTERING FORGED BILL':
        df.loc[index, 'Crime_Family'] = "VIOLATIONS INVOLVING CHECKS AND DRAFTS" 
    elif str(row) == 'PARK/ENT AFT HRS':
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE" 
    elif str(row) == 'PROBATION VIOLATION':
        df.loc[index, 'Crime_Family'] = "COURT DECISION"   
    elif str(row) == 'PROST/COMMIT/ENGAGE':
        df.loc[index, 'Crime_Family'] = "PROSTITUTION" 
    elif str(row) == 'PRTRL REL/DOM VIOL':
        df.loc[index, 'Crime_Family'] = "PHYSICAL ATTACK" 
    elif str(row) == 'RECKLESS DRIVING':
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE"
    elif str(row) == 'RESIST ARR W/O VIOL':
        df.loc[index, 'Crime_Family'] = "EMERGENCY SERVICE MISUSE/MISCONDUCT"
    elif str(row) == 'RESIST OFF W/VIOL':
        df.loc[index, 'Crime_Family'] = "EMERGENCY SERVICE MISUSE/MISCONDUCT"
    elif str(row) == 'ROBBERY/ARMED/WEAPON':
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif str(row) == 'ROBBERY/STRONGARM':
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif str(row) == 'SHOPPING CART/POSN':
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif str(row) == 'ST RD/UNLAW USE':
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE" 
    elif str(row) == 'TAMPER W/EVIDENCE/AT':
        df.loc[index, 'Crime_Family'] = "ASSAULT; BATTERY; CULPABLE NEGLIGENCE" 
    elif str(row) == 'TAMPER/PHYS EVIDENCE':
        df.loc[index, 'Crime_Family'] = "ASSAULT; BATTERY; CULPABLE NEGLIGENCE" 
    elif str(row) == 'CONT SUBS/POSN/>10GR':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'CON/SELL TRAVEL/REG':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'CONT SUB/PLC/PRP/TRF':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'CONT SUB/PUR/POS W/I':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'CONT SUB/SALE/1K/CON':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'CONT SUB/SELL/DEL':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'CONT SUBS/IMIT/SALE':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif str(row) == 'CONT SUBS/POSS':
        df.loc[index, 'Crime_Family'] = "DRUG ABUSE PREVENTION AND CONTROL"
    elif "CREDIT" in str(row):
        df.loc[index, 'Crime_Family'] = "FINANCIAL CRIME"
    elif str(row) == 'DISORDERLY CONDUCT':
        df.loc[index, 'Crime_Family'] = "PUBLIC DISORDERLY CONDUCT"
    elif str(row) == 'DL/EXPIRED 6 MTHS+':
        df.loc[index, 'Crime_Family'] = "MOTHOR VEHICLE, DRIVER LICENSE"
    elif str(row) == 'RESIST OFF W/O VIOL':
        df.loc[index, 'Crime_Family'] = "EMERGENCY SERVICE MISUSE/MISCONDUCT"
    elif str(row) == 'TRES PROP/AFTER WARN':
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif str(row) == 'TRES VIO POST RESTRI':
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
    elif 'TRESPASS' in str(row):
        df.loc[index, 'Crime_Family'] = "BURGLARY AND TRESPASS"
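
Nearly every branch in this cell is an exact match on the charge text, which maps naturally onto a dictionary lookup. The sketch below uses a hypothetical three-entry dictionary `family_by_charge` and made-up rows. Charges that are absent from the dictionary keep whatever family they already have.

```python
import pandas as pd

# A few of the exact-match rules from the cell above
family_by_charge = {
    "BOND SURRENDER": "COURT DECISION",
    "CONTEMPT OF COURT": "COURT DECISION",
    "INDECENT EXPOSURE": "OBSCENITY",
}

# Made-up rows, only to illustrate the pattern
rows = pd.DataFrame({
    "Charge1": ["BOND SURRENDER", "BATTERY", "INDECENT EXPOSURE"],
    "Crime_Family": ["", "ASSAULT; BATTERY; CULPABLE NEGLIGENCE", ""],
})

# Exact-match lookup; unmatched charges come back as NaN
mapped = rows["Charge1"].map(family_by_charge)

# Fall back to the existing family where no rule applied
rows["Crime_Family"] = mapped.fillna(rows["Crime_Family"])
print(rows)
```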

In [26]:
# Delete empty rows, based on Crime_Family column 
import numpy as np

# Assign the result back instead of replacing in place on a column selection
df['Crime_Family'] = df['Crime_Family'].replace('', np.nan)
df = df.dropna(subset=['Crime_Family'])
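
Dropping every row without a category silently discards any charge the rules did not cover. As a small sketch on made-up rows (the frame `labelled` is hypothetical), the uncategorized charges can be listed before they are removed, which helps decide whether more rules are needed:

```python
import numpy as np
import pandas as pd

# Made-up rows, only to illustrate the pattern
labelled = pd.DataFrame({
    "Charge1": ["BATTERY", "UNKNOWN CHARGE", "PETIT THEFT"],
    "Crime_Family": [
        "ASSAULT; BATTERY; CULPABLE NEGLIGENCE",
        "",
        "THEFT, ROBBERY, AND RELATED CRIMES",
    ],
})

# Inspect what would be discarded before dropping it
unlabelled = labelled.loc[labelled["Crime_Family"] == "", "Charge1"].value_counts()
print(unlabelled)

# Turn empty strings into NaN, then drop those rows
labelled["Crime_Family"] = labelled["Crime_Family"].replace("", np.nan)
labelled = labelled.dropna(subset=["Crime_Family"])
```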

In [27]:
# Save for debugging purpose 
# df.to_csv("df_2_crime_categorie.csv",index=False,header=True)

In [28]:
# Adding Categories to column Crime_Family1 (2nd category group, middle size)
for index, row in df["Crime_Family"].items():
    if str(row) == "GENERAL PENALTIES; REGISTRATION OF CRIMINALS":
        df.loc[index, 'Crime_Family1'] = 'SEXUAL MISCONDUCT'
    elif str(row) == "PRINCIPAL; ACCESSORY; ATTEMPT; SOLICITATION; CONSPIRACY":
        df.loc[index, 'Crime_Family1'] = "TREASON"        
    elif str(row) == "HOMICIDE":
        df.loc[index, 'Crime_Family1'] = "HOMICIDE"
    elif str(row) == "ASSAULT; BATTERY; CULPABLE NEGLIGENCE":
        df.loc[index, 'Crime_Family1'] = "ASSAULT & BATTERY"
    elif str(row) == "KIDNAPPING; CUSTODY OFFENSES; HUMAN TRAFFICKING; AND RELATED OFFENSES":
        df.loc[index, 'Crime_Family1'] = "KIDNAPPING"
    elif str(row) == "WEAPONS AND FIREARMS":
        df.loc[index, 'Crime_Family1'] = "TREASON"
    elif str(row) == "SEXUAL BATTERY":
        df.loc[index, 'Crime_Family1'] = "SEXUAL MISCONDUCT"
    elif str(row) == "PROSTITUTION":
        df.loc[index, 'Crime_Family1'] = "SEXUAL MISCONDUCT"
    elif  str(row) == "ADULTERY; LEWD AND LASCIVIOUS BEHAVIOR":
        df.loc[index, 'Crime_Family1'] = "SEXUAL MISCONDUCT"
    elif str(row) == "LEWDNESS; INDECENT EXPOSURE":
        df.loc[index, 'Crime_Family1'] = "SEXUAL MISCONDUCT"
    elif str(row) == "ARSON AND CRIMINAL MISCHIEF":
        df.loc[index, 'Crime_Family1'] = "FIRE"
    elif str(row) == "BURGLARY AND TRESPASS":
        df.loc[index, 'Crime_Family1'] = "BURGLARY"
    elif str(row) == "THEFT, ROBBERY, AND RELATED CRIMES":
        df.loc[index, 'Crime_Family1'] = "ASSAULT & BATTERY"
    elif str(row) == "COMPUTER-RELATED CRIMES":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "FRAUDULENT PRACTICES":
        df.loc[index, 'Crime_Family1'] = "WHITE COLLAR"
    elif str(row) == "PUBLIC NUISANCES":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "ABUSE, NEGLECT, AND EXPLOITATION OF ELDERLY PERSONS AND DISABLED ADULTS":
        df.loc[index, 'Crime_Family1'] = "ABUSE"
    elif str(row) == "ABUSE OF CHILDREN":
        df.loc[index, 'Crime_Family1'] = "ABUSE"
    elif str(row) == "ANIMALS: CRUELTY; SALES; ANIMAL ENTERPRISE PROTECTION":
        df.loc[index, 'Crime_Family1'] = "ABUSE"
    elif str(row) == "FORGERY AND COUNTERFEITING":
        df.loc[index, 'Crime_Family1'] = "WHITE COLLAR"
    elif str(row) == "VIOLATIONS INVOLVING CHECKS AND DRAFTS":
        df.loc[index, 'Crime_Family1'] = "WHITE COLLAR"
    elif str(row) == "DEFAMATION; LIBEL; THREATENING LETTERS AND SIMILAR OFFENSES":
        df.loc[index, 'Crime_Family1'] = "WHITE COLLAR"
    elif str(row) == "PERJURY":
        df.loc[index, 'Crime_Family1'] = "TREASON"
    elif str(row) == "BRIBERY; MISUSE OF PUBLIC OFFICE":
        df.loc[index, 'Crime_Family1'] = "WHITE COLLAR"
    elif str(row) == "OFFENSES BY PUBLIC OFFICERS AND EMPLOYEES":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "OBSTRUCTING JUSTICE":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "OBSCENITY":
        df.loc[index, 'Crime_Family1'] = "SEXUAL MISCONDUCT"
    elif str(row) == "GAMBLING":
        df.loc[index, 'Crime_Family1'] = "GAMBLING"
    elif str(row) == "DRUNKENNESS; OPEN HOUSE PARTIES; LOITERING; PROWLING; DESERTION":
        df.loc[index, 'Crime_Family1'] = "DRUGS & ALCOHOL"
    elif str(row) == "POISONS; ADULTERATED DRUGS":
        df.loc[index, 'Crime_Family1'] = "DRUGS & ALCOHOL"
    elif str(row) == "OFFENSES CONCERNING AIRCRAFT, MOTOR VEHICLES, VESSELS, AND RAILROADS":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "OFFENSES RELATED TO PUBLIC ROADS, TRANSPORT, AND WATERS":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "VIOLATIONS OF CERTAIN COMMERCIAL RESTRICTIONS":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "AFFRAYS; RIOTS; ROUTS; UNLAWFUL ASSEMBLIES":
        df.loc[index, 'Crime_Family1'] = "PROTEST"
    elif str(row) == "OFFENSES CONCERNING DEAD BODIES AND GRAVES":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "CRIMINAL GANG ENFORCEMENT AND PREVENTION":
        df.loc[index, 'Crime_Family1'] = "CRIMINAL GANG"
    elif str(row) == "CRIMINAL ANARCHY, TREASON, AND OTHER CRIMES AGAINST PUBLIC ORDER":
        df.loc[index, 'Crime_Family1'] = "TREASON"
    elif str(row) == "MISCELLANEOUS CRIMES":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "DRUG ABUSE PREVENTION AND CONTROL":
        df.loc[index, 'Crime_Family1'] = "DRUGS & ALCOHOL"
    elif str(row) == "OFFENSES CONCERNING RACKETEERING AND ILLEGAL DEBTS":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "OFFENSES RELATED TO FINANCIAL TRANSACTIONS":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == 'DUI (DRIVING UNDER THE INFLUENCE)': 
        df.loc[index, 'Crime_Family1'] = "DRUGS & ALCOHOL"
    elif str(row) == 'IDENTITY THEFT':
        df.loc[index, 'Crime_Family1'] = "WHITE COLLAR"
    elif str(row) == 'WARRANT':
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == 'WRONFUL BURNING':
        df.loc[index, 'Crime_Family1'] = "FIRE"
    elif str(row) == 'DRIVING PENALTIES':
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES" 
    elif str(row) == 'DRINK IN PUBLIC':
        df.loc[index, 'Crime_Family1'] = "DRUGS & ALCOHOL"
    elif str(row) == 'PANHANDLING':
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"  
    elif str(row) == 'MOTHOR VEHICLE, DRIVER LICENSE':
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == 'PUBLIC DISORDERLY CONDUCT':
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == 'ALCOHOL PROHIBITION':
        df.loc[index, 'Crime_Family1'] = "DRUGS & ALCOHOL"
    elif str(row) == 'SEX OFFENDER':
        df.loc[index, 'Crime_Family1'] = "SEXUAL MISCONDUCT"
    elif str(row) == 'PHYSICAL ATTACK':
        df.loc[index, 'Crime_Family1'] = "ASSAULT & BATTERY"
    elif str(row) == "WAITING FOR COURT'S DECISION":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "EMERGENCY SERVICE MISUSE/MISCONDUCT":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "COURT DECISION":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "PUBLIC BEHAVIOR PROHIBITION":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "FINANCIAL CRIME":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "EMERGENCY SERVICE MISUSE":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"
    elif str(row) == "PUBLIC SELLING/SERVICE":
        df.loc[index, 'Crime_Family1'] = "MISCELLANEOUS CRIMES"

In [29]:
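# Add the third, broadest category level in column Crime_Family2
for index, row in df["Crime_Family1"].items():  # .items() works where .iteritems() was removed (pandas 2.0+)
    if str(row) == "CRIMINAL GANG ":  # trailing space: never matches "CRIMINAL GANG", which is handled by the last branch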
        df.loc[index, 'Crime_Family2'] = 'OTHER'
    elif str(row) == "GAMBLING":
        df.loc[index, 'Crime_Family2'] = "OTHER"        
    elif str(row) == "MISCELLANEOUS CRIMES":
        df.loc[index, 'Crime_Family2'] = "OTHER"
    elif str(row) == "TREASON":
        df.loc[index, 'Crime_Family2'] = "OTHER"
    elif str(row) == "WHITE COLLAR":
        df.loc[index, 'Crime_Family2'] = "OTHER"
    elif str(row) == "ABUSE":
        df.loc[index, 'Crime_Family2'] = "PERSONAL CRIME"
    elif str(row) == "ASSAULT & BATTERY":
        df.loc[index, 'Crime_Family2'] = "PERSONAL CRIME"
    elif str(row) == "HOMICIDE":
        df.loc[index, 'Crime_Family2'] = "PERSONAL CRIME"
    elif str(row) == "KIDNAPPING":
        df.loc[index, 'Crime_Family2'] = "PERSONAL CRIME"
    elif str(row) == "SEXUAL MISCONDUCT":
        df.loc[index, 'Crime_Family2'] = "PERSONAL CRIME"
    elif str(row) == "BURGLARY":
        df.loc[index, 'Crime_Family2'] = "PERSONAL CRIME"
    elif str(row) == "FIRE":
        df.loc[index, 'Crime_Family2'] = "PERSONAL CRIME"
    elif str(row) == "PROTEST":
        df.loc[index, 'Crime_Family2'] = "PERSONAL CRIME"
    elif str(row) == "DRUGS & ALCOHOL":
        df.loc[index, 'Crime_Family2'] = "STATUTORY CRIME"
    elif str(row) == "CRIMINAL GANG":
        df.loc[index, 'Crime_Family2'] = "PERSONAL CRIME"
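
The category cells above assign labels one row at a time. A dictionary lookup with `Series.map` is a more compact way to express the same idea and tends to run faster on large frames. The sketch below is only an illustration: `family2_lookup` and the toy frame `demo` are hypothetical names, and only a handful of the labels used above are included.

```python
import pandas as pd

# Hypothetical lookup table: a subset of the Crime_Family1 -> Crime_Family2 pairs
family2_lookup = {
    "GAMBLING": "OTHER",
    "WHITE COLLAR": "OTHER",
    "ABUSE": "PERSONAL CRIME",
    "HOMICIDE": "PERSONAL CRIME",
    "DRUGS & ALCOHOL": "STATUTORY CRIME",
}

# Toy data standing in for the real bookings frame
demo = pd.DataFrame(
    {"Crime_Family1": ["ABUSE", "GAMBLING", "DRUGS & ALCOHOL", "NOT LISTED"]}
)

# map() looks up every value at once; labels missing from the dict become NaN
demo["Crime_Family2"] = demo["Crime_Family1"].map(family2_lookup)

# Labels that matched no key, handy for spotting typos or stray whitespace
unmapped = demo.loc[demo["Crime_Family2"].isna(), "Crime_Family1"].unique()

print(demo)
print("Unmapped labels:", list(unmapped))
```

One difference from the loop: unmatched rows end up as NaN in the new column, so they are easy to list and review.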

In [31]:
# df.to_csv("df_3_crime_categories_clean.csv",index=True,header=True)

### Cleaning City names and Address

In [32]:
# Cleaning Cities column 

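# When the Address marks the person as homeless, fill City with "HOMELESS"
for index, row in df["Address"].items():  # .items() works where .iteritems() was removed (pandas 2.0+)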
    if "HOMLESS" == str(row):
        df.loc[index, 'City'] = "HOMELESS"
    elif "HOMELESS" == str(row):
        df.loc[index, 'City'] = "HOMELESS"        
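
The same fill can be written without a loop by using a boolean mask. This is a minimal sketch on a toy frame (`demo` and `is_homeless` are hypothetical names), assuming the two spellings checked above are the only ones that matter at this step.

```python
import pandas as pd

# Toy data standing in for the real bookings frame
demo = pd.DataFrame({
    "Address": ["HOMLESS", "123 MAIN ST", "HOMELESS"],
    "City": [None, "MIAMI", None],
})

# True where Address is exactly one of the two homeless spellings
is_homeless = demo["Address"].isin(["HOMLESS", "HOMELESS"])

# Set City for all matching rows in a single assignment
demo.loc[is_homeless, "City"] = "HOMELESS"

print(demo)
```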

In [33]:
# City names abbreviated or misspelled
for index, row in df["City"].items():  # .items() works where .iteritems() was removed (pandas 2.0+)
    if "AV" == str(row):
        df.loc[index, 'City'] = "AVENTURA"
    elif "HARBOR" in str(row):
        df.loc[index, 'City'] = "BAL HARBOUR"   
    elif "BAL HARBOR" == str(row):
        df.loc[index, 'City'] = "BAL HARBOUR"  
    elif "BAY HARBOR IS" == str(row):
        df.loc[index, 'City'] = "BAL HARBOUR"
    elif "BOYTON BEACH" == str(row):
        df.loc[index, 'City'] = "BOYNTON BEACH"
    elif "CUTLUR BAY" == str(row):
        df.loc[index, 'City'] = "CUTLER BAY"     
    elif "DANIA BCH" == str(row):
        df.loc[index, 'City'] = "DANIA BEACH"        
    elif "DEERFIELD" == str(row):
        df.loc[index, 'City'] = "DEERFIELD BEACH"
    elif "FL LAUDERDALE" == str(row):
        df.loc[index, 'City'] = "FORT LAUDERDALE"

In [34]:
# Collapse city names into two groups in a new column: MIAMI (local) or FOREIGN (everywhere else)
df["CityRN"] = df["City"]

df["CityRN"] = df["CityRN"].replace({
    "CHILE": "FOREIGN", 
    "CILLIE": "FOREIGN", 
    "BUENOS AIRES": "FOREIGN", 
    "CINCINNATI": "FOREIGN",
    "LAS VEGAS": "FOREIGN",
    "GA": "FOREIGN", 
    "SAN DIEGO": "FOREIGN", 
    "BRAZIL": "FOREIGN", 
    "NOR": "FOREIGN", 
    "RUSSIA": "FOREIGN", 
    "MEDELLIN": "FOREIGN",
    "BROOKLYN": "FOREIGN", 
    "N": "FOREIGN",
    "P C BEACH": "FOREIGN",
    "IND CRK VLG": "FOREIGN", 
    "BOGATA": "FOREIGN", 
    "SAN ANTONIO": "FOREIGN",
    "UNK":"FOREIGN", 
    "NLA":"FOREIGN",                             
    'ALAMONTE SPRING': "FOREIGN",
    'AVENTURA': "MIAMI",
    'BAL HARBOUR': "FOREIGN",
    'BAY COUNTY': "FOREIGN",
    'BIG PINE FL': "FOREIGN",
    'BIG PINE KEY': "FOREIGN",
    'BOCA RATON': "FOREIGN",
    'BOYNTON BEACH': "FOREIGN",
    'BRICKELL': "MIAMI",
    'CHATTAHOOCHEE': "FOREIGN",
    'CLEWISTON': "FOREIGN",
    'COCONUT CREEK': "MIAMI",
    'COCONUT GROVE': "MIAMI",
    'COLEMAN': "FOREIGN",
    'CORAL GABLES': "MIAMI",
    'CORAL SPRINGS': "MIAMI",
    'CPE CANAVERAL': "FOREIGN",
    'CUTLER BAY': "MIAMI",
    'DANIA': "FOREIGN",
    'DANIA BEACH': "FOREIGN",
    'DAVENPORT': "FOREIGN",
    'DAVIE': "FOREIGN",
    'DE LEON SPRINGS': "FOREIGN",
    'DEERFIELD BEACH': "FOREIGN",
    'DELRAY BEACH': "FOREIGN",
    'DORAL': "MIAMI",
    'E BAY VILLAGE': "FOREIGN",
    'EL PORTAL': "MIAMI",
    'FL': "FOREIGN",
    'FLORIDA CITY': "MIAMI",
    'FORT LAUDERDALE': "FOREIGN",
    'FORT MYERS': "FOREIGN",
    'FORT PIERCE': "FOREIGN",
    'FORT WALTON BEACH': "FOREIGN",
    'FRISCO': "FOREIGN",
    'FRNG': "FOREIGN",
    'HAINES CITY': "FOREIGN",
    'HALLANDALE BEACH': "FOREIGN",
    'HAMILTON': "FOREIGN",
    'HIALEAH': "MIAMI",
    'HIALEAH GARDENS': "MIAMI",
    'HILLSBOROUGH': "FOREIGN",
    'HOLLYWOOD': "FOREIGN",
    'HOMESTEAD': "MIAMI",
    'JACKSONVILLE': "FOREIGN",
    'JASPER': "FOREIGN",
    'JAX BEACH': "FOREIGN",
    'JUPITER': "FOREIGN",
    'KALAMATA': "FOREIGN",
    'KENDALL': "MIAMI",
    'KEY BISCAYNE': "MIAMI",
    'KEY COLONY BEACH': "FOREIGN",
    'KEY LARGO': "FOREIGN",
    'KEY WEST': "FOREIGN",
    'KISSIMMEE': "FOREIGN",
    'LAKE CITY': "FOREIGN",
    'LAKE WORTH': "FOREIGN",
    'LAUDERDALE LAKES': "FOREIGN",
    'LAUDERHILL': "FOREIGN",
    'LEEHIGH ACRES': "FOREIGN",
    'LOS ANGELES': "FOREIGN",
    'LOVINGTON': "FOREIGN",
    'LTL TORCH KEY': "FOREIGN",
    'MARATHON': "FOREIGN",
    'MARGATE': "FOREIGN",
    'MIAMI': "MIAMI",
    'MIAMI BEACH': "MIAMI",
    'MIAMI DADE': "MIAMI",
    'MIAMI GARDENS': "MIAMI",
    'MIAMI LAKES': "MIAMI",
    'MIAMI SPRINGS': "MIAMI",
    'MIAMIGO': "MIAMI",
    'MIAQMI': "MIAMI",
    'MILTON': "FOREIGN",
    'MIRAMAR': "FOREIGN",
    'MOUNT DORA': "FOREIGN",
    'N LAUDERDALE': "FOREIGN",
    'NAPLES': "FOREIGN",
    'NEW PORT RICHEY': "FOREIGN",
    'NEW SMYRNA': "FOREIGN",
    'NO MIAMI BEACH': "MIAMI",
    'NORTH BAY VILLAGE': "MIAMI",
    'NORTH BEACH': "MIAMI",
    'NORTH FORT MYERS': "FOREIGN",
    'NORTH LAUDERDALE': "FOREIGN",
    'NORTH MIAMI': "MIAMI",
    'NORTH MIAMI BEACH': "MIAMI",
    'NORTH PORT': "FOREIGN",
    'OAK LAND PARK': "FOREIGN",
    'OAKLAND': "FOREIGN",
    'OAKLAND PARK': "FOREIGN",
    'OKEECHOBEE': "FOREIGN",
    'OLD TOWN': "FOREIGN",
    'OPA-LOCKA': "MIAMI",
    'ORANGE CTY': "FOREIGN",
    'ORLANDO': "FOREIGN",
    'PALM BEACH': "FOREIGN",
    'PALMETTO BAY': "MIAMI",
    'PANAMA CITY': "FOREIGN",
    'PEMBROKE PINES': "FOREIGN",
    'PLANTATION': "FOREIGN",
    'PLANTION': "FOREIGN",
    'POMPANO BEACH': "FOREIGN",
    'PORT CHARLOTTE': "FOREIGN",
    'PORT ST LUCIE': "FOREIGN",
    'QUINCY': "FOREIGN",
    'RESCUE MISSION': "MIAMI",
    'SAFETY HARBOUR': "FOREIGN",
    'SAN MATEO': "MIAMI",
    'SEBRING': "FOREIGN",
    'SILVER SPRINGS': "FOREIGN",
    'SINGER ISLAND': "FOREIGN",
    'SOUTH MIAMI': "MIAMI",
    'SOUTHWEST RANCH': "FOREIGN",
    'SPRING HILL': "FOREIGN",
    'ST AUGUSTINE': "FOREIGN",
    'ST CLOUD': "FOREIGN",
    'ST JAMES CITY': "FOREIGN",
    'ST PETE BEACH': "FOREIGN",
    'ST PETERSBURG': "FOREIGN",
    'STUART': "FOREIGN",
    'SUNNY ISLES': "MIAMI",
    'SUNNY ISLES BEACH': "MIAMI",
    'SUNRISE': "FOREIGN",
    'SURFSIDE': "MIAMI",
    'SW RANCHES': "FOREIGN",
    'TALLAHASSEE': "FOREIGN",
    'TAMARAC': "FOREIGN",
    'TAMPA': "FOREIGN",
    'TITUSVILLE': "FOREIGN",
    'WALES': "FOREIGN",
    'WEST FORT MYERS': "FOREIGN",
    'WEST PALM BEACH': "FOREIGN",
    'WEST PARK': "FOREIGN",
    'WILLOUGHBY': "FOREIGN",
    'WINTER PARK': "FOREIGN",
    'UNKNOWN': "FOREIGN"
    })
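
With a mapping this long, it is worth checking that every city landed in one of the expected groups. The sketch below runs on a toy Series (`city_rn` is a hypothetical stand-in for `df["CityRN"]`, and "ORLANDOO" is a made-up unmapped value). The `expected` set is our assumption about which labels the column should contain.

```python
import pandas as pd

# Toy stand-in for df["CityRN"]; "ORLANDOO" is a made-up value with no mapping
city_rn = pd.Series(["MIAMI", "FOREIGN", "HOMELESS", "ORLANDOO", "MIAMI"])

# Assumed set of labels the grouped column is meant to contain
expected = {"MIAMI", "FOREIGN", "HOMELESS"}

# Count each label, then list anything outside the expected groups
print(city_rn.value_counts())
leftovers = sorted(set(city_rn.dropna()) - expected)
print("Unexpected CityRN values:", leftovers)
```

Any value printed in the last line is either a city missing from the dictionary or a misspelled target label.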

In [35]:
# Drop empty categories (Crime_Family)
df = df.dropna(subset=['Crime_Family'])

In [36]:
# Normalize abbreviated and misspelled city names in the City column
df["City"] = df["City"].replace({"AV": "AVENTURA", "HARBOR":"BAL HARBOUR", "BAL HARBOR":"BAL HARBOUR", "BAY HARBOR IS":"BAL HARBOUR", \
                                 "BAY HARBOR":"BAL HARBOUR", "BAY HARBOR ISL":"BAL HARBOUR", "BAY HARBOR ISLA":"BAL HARBOUR", "BAY HARBOR ISLD":"BAL HARBOUR", \
                                 "BOYTON BEACH":"BOYNTON BEACH", "BRICKLE":"BRICKELL","CUTLUR BAY": "CUTLER BAY", "DANIA BCH":"DANIA BEACH", "DEERFIELD": "DEERFIELD BEACH",\
                                 "FL CITY":"FLORIDA CITY","FL LAUDERDALE": "FORT LAUDERDALE", "FT":"FORT LAUDERDALE", "FT LAAUDERDALE":"FORT LAUDERDALE", \
                                 "FT LADERDALE":"FORT LAUDERDALE", "FT LAUDARDALE":"FORT LAUDERDALE","FT LAUDEDALE":"FORT LAUDERDALE",\
                                 "FT LAUDERDALE":"FORT LAUDERDALE","FT LAUDREDALE":"FORT LAUDERDALE", "FT LAUERDALE":"FORT LAUDERDALE",\
                                 "FT LAUNDERDALE":"FORT LAUDERDALE","FT LUADERDALE":"FORT LAUDERDALE", 'FT. LAUD':"FORT LAUDERDALE", \
                                 'FT. LAUDALE':"FORT LAUDERDALE", 'FT. LAUDERDAL':"FORT LAUDERDALE",'FT. LAUDERDALE':"FORT LAUDERDALE",\
                                 'FT. LUADERDALE':"FORT LAUDERDALE","FTLAUDERDALE":"FORT LAUDERDALE", "FT MYERS":"FORT MYERS", \
                                 "FT MYERS BCH":"FORT MYERS" , "FT MYERS BEACH":"FORT MYERS",\
                                 'FT. MEYERS':"FORT MYERS", 'FT. MYERS':"FORT MYERS", "FT.MYERS":"FORT MYERS", "FT PIERCE": "FORT PIERCE","FT. PIERCE":"FORT PIERCE",\
                                 "FT WALTON BCH":"FORT WALTON BEACH","FT. WALTON BEAC":"FORT WALTON BEACH", "HALLANDALE":"HALLANDALE BEACH",\
                                 "HALLANDALE BCH":"HALLANDALE BEACH","HALLANDALE BEAC": "HALLANDALE BEACH","HALLENDALE":"HALLANDALE BEACH",\
                                 "HALLENDALE BEAC":"HALLANDALE BEACH","HAMILTEN":"HAMILTON","HOLLLYWOOD":"HOLLYWOOD","HOLLWOOD":"HOLLYWOOD",\
                                 "HOLLYWWOD":"HOLLYWOOD","HOMESTEAN":"HOMESTEAD","KENDAL":"KENDALL","KEY COLONY BCH":"KEY COLONY BEACH",
                                "LAUDERDALE BY S":"SINGER ISLAND","LAUDERDALE LAKE":"LAUDERDALE LAKES","LAUDERLAKES":"LAUDERDALE LAKES",\
                                 "LAURDALE LAKES":"LAUDERDALE LAKES","LAURDERDALE LAK":"LAUDERDALE LAKES","LEE HIGH ACRES":"LEEHIGH ACRES",\
                                 "LEHIGH ACRES":"LEEHIGH ACRES","MI":"MIAMI","MIAM":"MIAMI","MIAMIWN":"MIAMI","MIAAMI":"MIAMI","MAIMI":"MIAMI",\
                                 "MIAI":"MIAMI","MIAIMI":"MIAMI","MIAMI":"MIAMI",\
                                 "MIAMI BCH":"MIAMI BEACH","MIAMI FL":"MIAMI","MIAMI GARDENJS":"MIAMI GARDENS","MIAMI MIAMI":"MIAMI","MIOAMI":"MIAMI",\
                                 "MIRMAR":"MIRAMAR","N BAY VILLAGE":"NORTH BAY VILLAGE","N FT MYERS":"NORTH FORT MYERS","N LAUDERDALE":"NORTH LAUDERDALE",\
                                 "N LAUDERHILL":"N LAUDERDALE","N MIAMI":"NORTH MIAMI","N MIAMI BCH":"NORTH MIAMI BEACH","N MIAMI BEACH":"NORTH MIAMI BEACH",\
                                 "N MIAMI FL":"NORTH MIAMI","N. BAY VILLAGE":"NORTH BAY VILLAGE","N. LAUDERDALE":"NORTH LAUDERDALE","N. MIAMI":"NORTH MIAMI",\
                                 "N. MIAMI AVE":"NORTH MIAMI","N. MIAMI BCH":"NORTH MIAMI BEACH","N. MIAMI BEACH":"NORTH MIAMI BEACH",\
                                 "N. MIAMI BEACHF":"NORTH MIAMI BEACH","NEW PRT RCHY":"NEW PORT RICHEY","NEW SHYRNA":"NEW SMYRNA",\
                                 "NEW SMYRNA BEAC":"NEW SMYRNA","NORTH LAUDERDAL":"NORTH LAUDERDALE","NORTH MIA BCH":"NORTH MIAMI BEACH",\
                                 "NORTH MIAMI BCH":"NORTH MIAMI BEACH","NORTH MIAMI BEA":"NORTH MIAMI BEACH","OPA LOCKA":"OPA-LOCKA","OP":"OPA-LOCKA",\
                                 "OPA LOCKA BLVD":"OPA-LOCKA","OPALOCKA":"OPA-LOCKA","PALM BAY":"PALM BEACH","PALMETTO BY":"PALMETTO BAY",\
                                 "PEMBROKE":"PEMBROKE PINES","PLANTATION":"PLANTATION","POMPANO":"POMPANO BEACH", "POMPANO BCH":"POMPANO BEACH",\
                                 "POPANO BEACH":"POMPANO BEACH","PORT LUCIE":"PORT ST LUCIE","PORT SAINT LUCI":"PORT ST LUCIE","PORT ST LUICE":"PORT ST LUCIE",\
                                 "PT CHARLOTTE":"PORT CHARLOTTE","PT SAINT LUCIE":"PORT ST LUCIE","PT ST LUCIE":"PORT ST LUCIE","S MIAMI":"SOUTH MIAMI",\
                                 "SAN DAGO":"SAN DIEGO","ST. AUGUSTINE":"ST AUGUSTINE","ST. CLOUD":"ST CLOUD","ST. PETERSBURG":"ST PETERSBURG",\
                                 "ST.CLOUD":"ST CLOUD","SUNNY ISLES BEA":"SUNNY ISLES BEACH","SUN ISLE BCH":"SUNNY ISLES BEACH","TAMPA FL":"TAMPA",\
                                 "W. PALM BEACH":"WEST PALM BEACH","W PALM BEACH":"WEST PALM BEACH","WEST PALM BCH":"WEST PALM BEACH","JAX BCH":"JAX BEACH",
                                "W FORT MYERS":"WEST FORT MYERS"})

In [37]:
# Normalize placeholder values in the Address column (homeless, unknown, no local address)
df["Address"] = df["Address"].replace({
    "HOMELESSS":"HOMELESS",
    "HOMELESS (DEF. CLAIMS HE":"HOMELESS",
    "HOMELESS REFUSED TO PROV":"HOMELESS",
    "HOMELESS IN MIAMI BEACH":"HOMELESS",
    "NLA": "FOREIGN", 
    "NO LOCAL": "FOREIGN", 
    "N/L/A": "FOREIGN", 
    "ADDRESS UNAVAILABLE":"UNKNOWN", 
    "NONE": "UNKNOWN", 
    "NO LOCAL ADDRESS": "FOREIGN", 
    "ADDRESS UVKNOWN": "UNKNOWN", 
    "N L A": "FOREIGN", 
    "ADDRESS": "UNKNOWN", 
    "ADDRESS UMKNOWN": "UNKNOWN", 
    "UKNOWN": "UNKNOWN", 
    "HOMELESS TRANSIENT": "HOMELESS", 
    "LOS LITRES 111090, CHILE": "FOREIGN", 
    "UNKNOWN MIAMI DADE COUNT":"MIAMI",
    "UNK": "UNKNOWN", 
    "UNKNOWN ADDRESS": "UNKNOWN", 
    "UNKNOWNW": "UNKNOWN", 
    "UNKNOWN  Q": "UNKNOWN", 
    "UNKNOWN  801": "UNKNOWN",
    "UNKNOWN  4D": "UNKNOWN",
    "UNKNOWNOW": "UNKNOWN",
    "UNK ADDRESS": "UNKNOWN",
    "UNKNOWN  1068": "UNKNOWN",
    "UNKNOWNOWB": "UNKNOWN",
    "UNKNOWNAL A": "UNKNOWN", 
    "UNKNOWNT": "UNKNOWN"})

In [38]:
# Drop unnecessary columns
df.drop(['BookDate','DOB','ChargeCode1','Charge1','Charge2','Charge3'], axis=1, inplace=True)

In [39]:
df.count()

Defendant             31255
Address               31252
Zip                    4881
City                  31255
State                 31255
Booking_month         31255
Booking_Date          31255
Booking_year_month    31255
DOB_month             31255
Date_of_birth         31255
Age                   31255
day_of_week           31255
Crime_Family          31255
Crime_Family1         31255
Crime_Family2         31255
CityRN                31255
dtype: int64

### Save final clean file

In [40]:
df.to_csv("Resources/data1_extract.csv",index=True,header=True)
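
Because the file is saved with `index=True`, the index is written as the first column of the CSV. A quick round trip confirms the file reads back as expected. This is a sketch on a toy frame written to a temporary folder, so `demo`, `path` and `restored` are hypothetical names and not part of the pipeline above.

```python
import os
import tempfile

import pandas as pd

# Toy data standing in for the cleaned bookings frame
demo = pd.DataFrame({"City": ["MIAMI", "HIALEAH"], "Age": [31, 45]})

# Write to a temporary folder here; the notebook writes to Resources/ instead
path = os.path.join(tempfile.mkdtemp(), "demo_extract.csv")
demo.to_csv(path, index=True, header=True)

# The index was stored as the first column, so read it back as the index
restored = pd.read_csv(path, index_col=0)

print(restored)
```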