# US CRIME - Effect of Punishment Regimes on Crime Rates

Author: **Artur Chiaperini Grover**   
**Company**'s entrance exam      

-------   

In this exercise, I build a regression model to predict the crime rate, denoted by the variable `Crime`, at the following point:
- `M = 14.0`
- `So = 0`
- `Ed = 10.0`
- `Po1 = 12.0`
- `Po2 = 15.5`
- `LF = 0.640`
- `M.F = 94`
- `Pop = 150`
- `NW = 1.1`
- `U1 = 0.120`
- `U2 = 3.6`
- `Wealth = 3200`
- `Ineq = 20.1`
- `Prob = 0.04`
- `Time = 39.0` 
- `Crime = ???` 
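
As a minimal sketch, the point above can be held as a one-row DataFrame so that it can later be passed to whatever regression model is fitted. The name `query_point` is hypothetical, and the imprisonment-probability value is stored under `Prob`, the column name used in the dataset.

```python
import pandas as pd

# Predictor values copied from the list above.
# `query_point` is an illustrative name, not part of the original notebook.
query_point = pd.DataFrame(
    [{
        "M": 14.0, "So": 0, "Ed": 10.0, "Po1": 12.0, "Po2": 15.5,
        "LF": 0.640, "M.F": 94, "Pop": 150, "NW": 1.1, "U1": 0.120,
        "U2": 3.6, "Wealth": 3200, "Ineq": 20.1, "Prob": 0.04, "Time": 39.0,
    }]
)

# One row, 15 predictors; `Crime` is the unknown to be predicted.
print(query_point.shape)
```

Keeping the column names identical to those in the dataset means the row can be aligned with the training columns by name rather than by position.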

-------    

# Data Description

The dataset is available [here](http://www.statsci.org/data/general/uscrime.html), and the description below is quoted from that page.

>In order to investigate the effect of punishment regimes on crime rates, criminologists used the aggregated data of 47 states of the USA for 1960. The variables considered in this study are the following:   
>
>| Variable | Description |
>|:---------|:------------|
>| M		| percentage of males aged 14–24 in total state population |
>| So	    | indicator variable for a southern state |
>| Ed	    | mean years of schooling of the population aged 25 years or over |
>| Po1	    | per capita expenditure on police protection in 1960 |
>| Po2	    | per capita expenditure on police protection in 1959 |
>| LF	    | labour force participation rate of civilian urban males in the age-group 14–24 |
>| M.F	    | number of males per 100 females |
>| Pop	    | state population in 1960 in hundred thousands |
>| NW	    | percentage of nonwhites in the population |
>| U1       | unemployment rate of urban males 14–24 |
>| U2	    | unemployment rate of urban males 35–39 |
>| Wealth   | wealth: median value of transferable assets or family income |
>| Ineq	    | income inequality: percentage of families earning below half the median income |
>| Prob	    | probability of imprisonment: ratio of number of commitments to number of offenses |
>| Time	    | average time in months served by offenders in state prisons before their first release |
>| Crime	| crime rate: number of offenses per 100,000 population in 1960 |   

From here on, we investigate how the variables above contribute to the crime rate.



In [2]:
import pandas as pd
import numpy as np

In [3]:
df_crimes = pd.read_csv("../dataset/uscrime.txt", sep="\t")

In [4]:
df_crimes.head(5)

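|   | M | So | Ed | Po1 | Po2 | LF | M.F | Pop | NW | U1 | U2 | Wealth | Ineq | Prob | Time | Crime |
|:--|:--|:---|:---|:----|:----|:---|:----|:----|:---|:---|:---|:-------|:-----|:-----|:-----|:------|
| 0 | 15.1 | 1 | 9.1 | 5.8 | 5.6 | 0.51 | 95.0 | 33 | 30.1 | 0.108 | 4.1 | 3940 | 26.1 | 0.084602 | 26.2011 | 791 |
| 1 | 14.3 | 0 | 11.3 | 10.3 | 9.5 | 0.583 | 101.2 | 13 | 10.2 | 0.096 | 3.6 | 5570 | 19.4 | 0.029599 | 25.2999 | 1635 |
| 2 | 14.2 | 1 | 8.9 | 4.5 | 4.4 | 0.533 | 96.9 | 18 | 21.9 | 0.094 | 3.3 | 3180 | 25.0 | 0.083401 | 24.3006 | 578 |
| 3 | 13.6 | 0 | 12.1 | 14.9 | 14.1 | 0.577 | 99.4 | 157 | 8.0 | 0.102 | 3.9 | 6730 | 16.7 | 0.015801 | 29.9012 | 1969 |
| 4 | 14.1 | 0 | 12.1 | 10.9 | 10.1 | 0.591 | 98.5 | 18 | 3.0 | 0.091 | 2.0 | 5780 | 17.4 | 0.041399 | 21.2998 | 1234 |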
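
Before modelling, it can help to confirm that the loaded frame matches the data description: 16 named columns and 47 rows. The sketch below is one possible check, not part of the original notebook; `check_uscrime` and `EXPECTED_COLUMNS` are hypothetical names, and the demonstration uses only the first two rows shown above, not the full file.

```python
import pandas as pd

# Column names as listed in the data description table.
EXPECTED_COLUMNS = [
    "M", "So", "Ed", "Po1", "Po2", "LF", "M.F", "Pop", "NW",
    "U1", "U2", "Wealth", "Ineq", "Prob", "Time", "Crime",
]


def check_uscrime(df, expected_rows=47):
    """Hypothetical helper: return a list of problems found in the frame."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    if df.isna().any().any():
        problems.append("missing values present")
    return problems


# Demonstration on the first two rows of the output above.
sample = pd.DataFrame(
    [
        [15.1, 1, 9.1, 5.8, 5.6, 0.51, 95.0, 33, 30.1, 0.108, 4.1,
         3940, 26.1, 0.084602, 26.2011, 791],
        [14.3, 0, 11.3, 10.3, 9.5, 0.583, 101.2, 13, 10.2, 0.096, 3.6,
         5570, 19.4, 0.029599, 25.2999, 1635],
    ],
    columns=EXPECTED_COLUMNS,
)

# An empty list means no problems were detected for this two-row sample.
print(check_uscrime(sample, expected_rows=2))
```

On the real data the call would be `check_uscrime(df_crimes)`, which uses the default of 47 rows stated in the data description.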
