Capstone Project By: Zhengzheng Wang, Ajanthan Mathialagan, Zihui Yu, Mingming Wei and Harbrinder Bhullar
The “Arrests” dataset describes the police treatment of individuals arrested in Toronto for possession of small amounts of marijuana from 1997 to 2002. It is a subset of a larger data set discussed in a series of articles in the Toronto Star. The dataset contains 5226 observations of the 8 variables listed below.
released: Whether the arrested person was released with a summons (Yes or No)
colour: The arrested person's race (Black or White)
year: Year of the arrest, from 1997 to 2002
age: The age of the arrested person in years
sex: Gender of the arrested person (Male or Female)
employed: Whether the arrested person was employed (Yes or No)
citizen: Whether the arrested person was a citizen (Yes or No)
checks: Number of police databases (of previous arrests, previous convictions, parole status, etc.) on which the arrested person's name appeared, up to a maximum of 6
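As a rough illustration of how the variables described above might be prepared for modelling, the sketch below stores one record and encodes its two-level fields as 0/1 indicators. The `Arrest` class, the `to_features` helper, and the example record are hypothetical and made up for this illustration; they are not taken from the dataset or from the project's own code.

```python
from dataclasses import dataclass


@dataclass
class Arrest:
    """One row of the dataset (hypothetical container for illustration)."""
    released: bool   # released with a summons?
    colour: str      # "Black" or "White"
    year: int        # 1997 to 2002
    age: int         # age in years
    sex: str         # "Male" or "Female"
    employed: bool
    citizen: bool
    checks: int      # number of police databases listing the person


def to_features(a: Arrest) -> dict:
    """Encode a record numerically: two-level fields become 0/1 indicators."""
    return {
        "released": int(a.released),
        "colour_black": int(a.colour == "Black"),
        "year": a.year,
        "age": a.age,
        "sex_male": int(a.sex == "Male"),
        "employed": int(a.employed),
        "citizen": int(a.citizen),
        "checks": a.checks,
    }


# A made-up record, NOT an actual observation from the dataset.
example = Arrest(released=True, colour="White", year=2000, age=23,
                 sex="Male", employed=True, citizen=True, checks=2)
features = to_features(example)
print(features)
```

Which level of each two-level variable is coded as 1 is an arbitrary choice here; it only changes the sign of the corresponding coefficient in a later model.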
In this dataset, “released” is the dependent (response) variable y, and the remaining variables are the independent (explanatory) variables. In this project, we build two logistic regression models, compare them in several ways, and identify the factors that significantly influence “released”, in order to explore patterns of discrimination in the dataset.
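The general idea of fitting two logistic regression models and comparing them can be sketched as follows. This is a minimal illustration under several assumptions, not the models built in this project: the data are synthetic (generated with made-up coefficients), the two predictors `employed` and `checks` are chosen arbitrarily, `checks` is assumed to range from 0 to 6, the fitting routine is a plain Newton-Raphson implementation, and AIC is used as just one possible comparison criterion.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Maximum-likelihood fit by Newton-Raphson; each row of X starts with 1.0."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            mu = sigmoid(sum(b * x for b, x in zip(beta, row)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood of the data under coefficients beta."""
    ll = 0.0
    for row, yi in zip(X, y):
        mu = sigmoid(sum(b * x for b, x in zip(beta, row)))
        mu = min(max(mu, 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu)
    return ll


# Synthetic data with made-up coefficients (NOT the Arrests data).
random.seed(0)
employed, checks, released = [], [], []
for _ in range(500):
    e = 1 if random.random() < 0.7 else 0
    c = random.randint(0, 6)  # assumed range of the checks variable
    prob = sigmoid(1.0 + 1.0 * e - 0.5 * c)
    employed.append(e)
    checks.append(c)
    released.append(1 if random.random() < prob else 0)

# Model 1: released ~ employed.  Model 2: released ~ employed + checks.
X_small = [[1.0, e] for e in employed]
X_full = [[1.0, e, c] for e, c in zip(employed, checks)]
beta_small = fit_logistic(X_small, released)
beta_full = fit_logistic(X_full, released)

ll_small = log_likelihood(X_small, released, beta_small)
ll_full = log_likelihood(X_full, released, beta_full)
aic_small = 2 * len(beta_small) - 2 * ll_small
aic_full = 2 * len(beta_full) - 2 * ll_full

print("Model 1 coefficients:", beta_small, "AIC:", aic_small)
print("Model 2 coefficients:", beta_full, "AIC:", aic_full)
print("Likelihood-ratio statistic:", 2 * (ll_full - ll_small))
```

A lower AIC indicates a better trade-off between fit and number of parameters. Because the two models are nested, the likelihood-ratio statistic could also be compared against a chi-squared distribution with one degree of freedom.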