![image.png](attachment:image.png)

# Programming and Scripting Project 2020 - Simulating a Dataset

## Aaron Donnelly                             G00299531

### Introduction

#### In this project I will simulate a dataset on the likelihood of recidivism among inmates released in the United States over a 9-year period. The simulation is based on statistics from an official U.S. government study of recidivism [1]. I believe three variables influence the likelihood of an individual returning to prison: race, gender and the number of years since release. Of course, many other factors should also be taken into account, such as the number of years spent in prison and location within the U.S. For this project, however, I will only use these three variables, because the study provides statistics for each of them. Recidivism can be defined as the tendency of a convicted criminal to re-offend. To simulate the dataset I will use a very simple function from NumPy's `numpy.random` module, `choice`, to randomly generate both the race and the gender of the convicted criminals [2]. I will then use the same function to randomly select a year between 1 and 9. Based on these randomly generated variables, the final column will give the likelihood of re-offending, calculated from the statistics in the government study.
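#### The simulation code later in this notebook draws the year with `np.random.choice` over the strings '1' to '9'. If whole numbers were preferred, `np.random.randint` is an alternative. The sketch below is illustrative only and is not part of the original notebook. Its key detail is that `randint`'s upper bound is exclusive.

```python
import numpy as np

# Draw 100 whole-number "years since release". The upper bound of randint is
# exclusive, so randint(1, 10) can return 1, 2, ..., 9 but never 10.
years_int = np.random.randint(1, 10, size=100)

# The notebook keys its probability map on the strings '1'..'9', so integer
# draws would need converting to strings before any .map() lookup.
years_str = years_int.astype(str)

print(years_int[:5], years_str[:5])
```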

#### The United States accounts for about 4.2% of the world’s population but for 25% of all individuals incarcerated globally. It has the highest incarceration rate on the planet, surpassing both North Korea and Russia. Between 1984 and 2014, incarceration rates increased by as much as 400%. New policies introduced during this period increased the minimum sentence an individual could receive. Together with the ongoing war on drugs, these policies caused a spike in incarcerations that now costs America around 60 billion dollars per year [3]. One such policy is the 1994 crime bill.

### The 1994 Crime Bill

#### The 1994 crime bill was signed by President Bill Clinton to curb the rising crime rate in the United States, which was heavily linked to drug use and in particular to the use of crack cocaine. The bill banned a large number of semi-automatic rifles. It also provided funding for 100,000 extra police officers to increase community policing in ‘problem’ areas. One of its most significant outcomes was the provision of 12 billion dollars to pay states to increase their prison capacity and, if necessary, build additional prisons. The aim was to have sentenced convicts serve 85% of their sentences behind bars. Another significant outcome was the introduction of habitual offender laws. These included a new ‘3 strikes’ rule: anyone convicted three times, with at least one conviction for a violent crime, would receive a mandatory life sentence. In recent years this has caused considerable controversy over the government's intentions at the time, since the federal government was effectively paying states to incarcerate more people. Many senior members of both political parties now regard the bill as a complete failure, especially given what is now known about the effects of incarceration on individuals and on the likelihood of re-offending [4].

#### In the United States, black inmates make up nearly 40% of the prison population despite accounting for only 13.4% of the population of America. There are many reasons for this. One is discrimination against African Americans by police officers. The ACLU found that black people are 3.7 times more likely than white people to be arrested for possession of marijuana, and in some states this figure is as high as 6. Unemployment rates among African Americans are also comparatively high. Many African Americans come from broken homes where the parents are no longer together. Children from such homes are more susceptible to joining gangs and subsequently getting caught up in criminal activity. Often, when individuals are arrested, they cannot afford the bail needed to be released from jail. This can increase the likelihood of them picking up an additional charge while inside [5]. The Bureau of Prisons reported that a federal inmate costs the taxpayer approximately 36,299.25 dollars per year. To put that in perspective, the average American salary in the same year was 48,251.57 dollars [6].

### Simulating a Dataset

##### I have decided to generate a dataset based on a report from the U.S. Department of Justice [1]. This report follows the recidivism of convicted felons over a 9-year period, from 2005 to 2014. It gives the total probability of a convicted felon returning to prison within the 9 years and then breaks this likelihood down into year-by-year probabilities. The report also shows the probability of returning to prison by gender and race. The code below randomly simulates 100 individuals, each with a randomly assigned race, gender and length of time out of prison. A fourth variable is then calculated from the other three to give the likelihood of that individual returning to prison in that particular year. The `random.choice` function from the NumPy package is used to randomly generate the individuals and their attributes.
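##### By default, `numpy.random.choice` gives every option the same probability. The simulated population is therefore roughly half male and half female, whatever the report's figures say. The sketch below contrasts that default with the optional `p` argument, weighted here with the report's 89.3% / 10.7% release shares. The weighting is a hypothetical variation, not what this notebook does, and the seed value is arbitrary.

```python
import numpy as np

# Fixed (arbitrary) seed so repeated runs give the same draws.
rng = np.random.default_rng(seed=0)
genders = ['Male', 'Female']

# Default behaviour: each option is equally likely.
uniform = rng.choice(genders, size=100)

# Hypothetical variation: weight the draw by the report's release shares.
# The values passed to p must be non-negative and sum to 1.
weighted = rng.choice(genders, size=100, p=[0.893, 0.107])

# Fraction of males in each sample.
print((uniform == 'Male').mean(), (weighted == 'Male').mean())
```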

##### Relationship between the first three variables

##### From the government document used to simulate this dataset, I used three variables: Gender, Race and Number of Years Released (1-9). The document provides statistics for each of these variables. Of all the prisoners released, 89.3% were male and 10.7% were female. I adjusted these statistics to account for the fact that only 83% of the group actually re-offended, using the 'and' rule of probability. The same document shows that, of all the individuals released, 39.7% were White, 40.1% were Black/African American, 17.7% were Hispanic/Latino and 2.4% were classed as 'Other'. I made the same adjustment to these figures to account for the 83% re-offence rate. The document then breaks down re-offences over the 9-year period: 43.9% re-offended in year 1, 16.2% in year 2, 8.3% in year 3, 5.1% in year 4, 3.5% in year 5, 2.3% in year 6, 1.7% in year 7, 1.3% in year 8 and 1.0% in year 9.
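##### The 'and' rule is the multiplication rule, P(A and B) = P(A) × P(B). This holds when the two events are independent, so applying it here assumes that gender (or race) and re-offending are independent. The sketch below reproduces the adjusted figures used in the simulation code. As a consistency check, it also sums the yearly figures quoted above, which come to about 83%.

```python
# Overall share of released prisoners who re-offended within 9 years, per the report.
REOFFEND_RATE = 0.83

gender_share = {'Male': 0.893, 'Female': 0.107}
race_share = {'White': 0.397, 'Black': 0.401, 'Hispanic/Latino': 0.177, 'Other': 0.024}

# Multiplication ("and") rule: P(group and re-offend) = P(group) * P(re-offend),
# which assumes the two events are independent.
adj_gender = {k: round(v * REOFFEND_RATE, 4) for k, v in gender_share.items()}
adj_race = {k: round(v * REOFFEND_RATE, 4) for k, v in race_share.items()}

print(adj_gender)  # {'Male': 0.7412, 'Female': 0.0888}
print(adj_race)    # {'White': 0.3295, 'Black': 0.3328, 'Hispanic/Latino': 0.1469, 'Other': 0.0199}

# Yearly re-offence percentages as quoted in the text (years 1 to 9).
yearly = [43.9, 16.2, 8.3, 5.1, 3.5, 2.3, 1.7, 1.3, 1.0]
print(round(sum(yearly), 1))  # 83.3
```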

##### Given the above statistics, the individual most at risk of re-offending would be a Black/African American male in his first year after release, with a probability of re-offending of 0.7412 × 0.3328 × 0.439 ≈ 10.8%.
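##### As a check, the sketch below scores every gender, race and year combination with the same product used in the simulation code and picks the largest. The dictionaries are copied from that code. `itertools.product` is simply a convenient way to enumerate the 2 × 4 × 9 = 72 combinations.

```python
from itertools import product

# Adjusted probabilities, copied from the notebook's mapping dictionaries.
chance_gender = {'Male': 0.7412, 'Female': 0.0888}
chance_race = {'White': 0.3295, 'Black': 0.3328, 'Hispanic/Latino': 0.1469, 'Other': 0.0199}
chance_year = {'1': 0.439, '2': 0.162, '3': 0.083, '4': 0.051, '5': 0.035,
               '6': 0.024, '7': 0.017, '8': 0.013, '9': 0.010}

# Score every (gender, race, year) combination as a percentage.
scores = {
    (g, r, y): chance_gender[g] * chance_race[r] * chance_year[y] * 100
    for g, r, y in product(chance_gender, chance_race, chance_year)
}

best = max(scores, key=scores.get)
print(best, round(scores[best], 1))  # ('Male', 'Black', '1') 10.8
```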

```python
# Firstly, the numpy and pandas packages are imported [7].
import numpy as np
import pandas as pd

# pandas' set_option function is used to ensure the full dataset can be viewed.
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

# The initial three variables are defined as lists of possible values.
Gender = ['Male', 'Female']
Race = ['White', 'Black', 'Hispanic/Latino', 'Other']
Yr = ['1', '2', '3', '4', '5', '6', '7', '8', '9']

# Random values are selected for the first three variables using numpy.random.choice.
x = np.random.choice(Gender, size=100)
y = np.random.choice(Race, size=100)
z = np.random.choice(Yr, size=100)

df = pd.DataFrame({'Gender': x, 'Race': y, 'Years Released': z})

# Each category is mapped to its (adjusted) probability, giving one Series per variable.
chance_gender = df['Gender'].map({'Male': 0.7412, 'Female': 0.0888})
chance_race = df['Race'].map({'White': 0.3295, 'Black': 0.3328,
                              'Hispanic/Latino': 0.1469, 'Other': 0.0199})
chance_years_released = df['Years Released'].map({'1': 0.439, '2': 0.162, '3': 0.083,
                                                  '4': 0.051, '5': 0.035, '6': 0.024,
                                                  '7': 0.017, '8': 0.013, '9': 0.010})

# The fourth column is the product of the three probabilities, expressed as a percentage.
df['Probability of Being Arrested this Year (%)'] = (
    chance_gender * chance_race * chance_years_released * 100
)

df
```





    


| | Gender | Race | Years Released | Probability of Being Arrested this Year (%) |
|---|---|---|---|---|
| 0 | Male | Other | 4 | 0.075224 |
| 1 | Male | Hispanic/Latino | 1 | 4.779932 |
| 2 | Female | Hispanic/Latino | 3 | 0.108271 |
| 3 | Female | Black | 7 | 0.050239 |
| 4 | Male | Other | 2 | 0.238948 |
| 5 | Female | Hispanic/Latino | 6 | 0.031307 |
| 6 | Female | White | 7 | 0.049741 |
| 7 | Female | Black | 5 | 0.103434 |
| 8 | Male | Other | 7 | 0.025075 |
| 9 | Female | Hispanic/Latino | 5 | 0.045657 |
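##### No seed is set in the simulation code, so every run produces a different set of 100 individuals and the rows shown will not be reproduced exactly. A minimal sketch of a reproducible variant follows. It assumes NumPy's `default_rng` Generator is acceptable in place of the legacy `np.random` functions. `simulate` is a hypothetical helper name and the seed 2020 is arbitrary, so its output will not match the table above.

```python
import numpy as np
import pandas as pd

# Adjusted probabilities, copied from the simulation code above.
CHANCE_GENDER = {'Male': 0.7412, 'Female': 0.0888}
CHANCE_RACE = {'White': 0.3295, 'Black': 0.3328, 'Hispanic/Latino': 0.1469, 'Other': 0.0199}
CHANCE_YEAR = {'1': 0.439, '2': 0.162, '3': 0.083, '4': 0.051, '5': 0.035,
               '6': 0.024, '7': 0.017, '8': 0.013, '9': 0.010}
PROB_COL = 'Probability of Being Arrested this Year (%)'


def simulate(seed, n=100):
    """Hypothetical helper: build the simulated dataset from a fixed seed."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        'Gender': rng.choice(list(CHANCE_GENDER), size=n),
        'Race': rng.choice(list(CHANCE_RACE), size=n),
        'Years Released': rng.choice(list(CHANCE_YEAR), size=n),
    })
    # Same calculation as the notebook: product of the three mapped probabilities, in %.
    df[PROB_COL] = (df['Gender'].map(CHANCE_GENDER)
                    * df['Race'].map(CHANCE_RACE)
                    * df['Years Released'].map(CHANCE_YEAR)
                    * 100)
    return df


first = simulate(seed=2020)
second = simulate(seed=2020)
print(first.equals(second))  # True: the same seed gives the same dataset
print(first.head())
```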


### Conclusion

##### The above dataset shows 100 convicted felons who were released from prison at some point within a 9-year period. The fourth column gives the probability, as a percentage, of each individual re-offending in the current year, based on the 2018 Update on Prisoner Recidivism from the U.S. Department of Justice. This simulated dataset is purely fictional and omits many other key variables that would produce a dataset more closely matching reality. However, for the purpose of the exercise, it demonstrates how a dataset can be simulated by applying a simple calculation to known statistics to generate another variable.

### References

##### [1] Alper, M., Durose, M. and Markman, J., 2018. 2018 Update on Prisoner Recidivism: A 9-Year Follow-up Period (2005-2014). U.S. Department of Justice, Office of Justice Programs, Bureau of Justice Statistics, pp. 1-24.
##### [2] Numpy.org. 2020. Random sampling (numpy.random). [ONLINE] Available at: https://numpy.org/doc/stable/reference/random/index.html. [Accessed 9 December 2020].
##### [3] Prisonpolicy.org. 2020. Prison Policy Initiative. [ONLINE] Available at: https://www.prisonpolicy.org/. [Accessed 9 December 2020].
##### [4] NBC News. 2019. What Is The 1994 Crime Bill? (NBC News Now). [ONLINE VIDEO] 14 August 2019. Available at: https://www.youtube.com/watch?v=0DcN6wNKxZA. [Accessed 12 December 2020].
##### [5] Huffpost.com. 2016. 40 Reasons Why Our Jails Are Full of Black and Poor People. [ONLINE] Available at: https://www.huffpost.com/entry/40-reasons-why-our-jails-are-full-of-black-and-poor-people_b_7492902. [Accessed 12 December 2020].
##### [6] Gobankingrates.com. 2020. How Much Do Prisons Cost Taxpayers? [ONLINE] Available at: https://www.gobankingrates.com/taxes/filing/wont-believe-much-prison-inmates-costing-year/. [Accessed 13 December 2020].
##### [7] Pandas.pydata.org. 2020. pandas. [ONLINE] Available at: https://pandas.pydata.org/. [Accessed 13 December 2020].