Data Science Final Project Proposal

11/7/2018

Group Members

Amin Yakubu (ay2146, GitHub: aminyakubu), Jyoti Ankam (jva2106, GitHub: jyotiankam), Yaa Asantewaa Klu (ykk2116, GitHub: yaaklu), Jacky Choi (jmc2392, GitHub: jmc2392)

Tentative Project Title

Factors Associated With Crime Distribution in New York City

Motivation for Project: We chose a dataset of New York Police Department (NYPD) crime complaints because it contains a wealth of information across many variables. We are particularly interested in studying these variables longitudinally, as well as by discrete categories such as borough and type of crime. The dataset also provides both victim and suspect demographic information, which allows us to link these demographics to the type of crime, to the location of the crime, or to each other. Lastly, the dataset provides ample opportunities to explore our research hypotheses graphically.

We also intend to look for associations between crime and time, such as day of the week and season, as well as other trends.

Intended Final Products: As stated on the course website, the intended final products include a written report, webpage overview, explanatory video, peer assessments, and an in-class presentation.

Anticipated Data Sources

We anticipate using data collected by the NYPD. Specifically, we will use the NYPD Historic Complaint dataset, which provides longitudinal information on complaints filed with the NYPD, including the type of crime, suspect demographics, victim demographics, location of the crime, date and time of the crime, and other variables.

The link to the dataset is here: https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i

Planned Analyses/Visualizations/Coding Challenges

For our analyses, we plan to use both graphical elements (e.g., ggplot, plotly, shiny, flexdashboard) and data frames/tables to show longitudinal trends and trends by type of crime or location. We will use flexdashboard and shiny app controls to stratify results by different variables. The challenges we anticipate include complex geospatial analysis and the use of shiny for visualization.
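As a rough illustration of the longitudinal summaries described above, the sketch below counts complaints per month within a grouping variable such as borough or offense type. It is only a minimal sketch: the column names `CMPLNT_FR_DT`, `BORO_NM`, and `OFNS_DESC`, the `MM/DD/YYYY` date format, the sample rows, and the helper name `monthly_counts` are all assumptions made for illustration, and it is written in Python for brevity rather than with the R tools named in this proposal.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows. The column names are assumed to match the
# dataset export and should be checked against its data dictionary.
SAMPLE_ROWS = [
    {"CMPLNT_FR_DT": "01/15/2017", "BORO_NM": "BRONX", "OFNS_DESC": "ROBBERY"},
    {"CMPLNT_FR_DT": "01/20/2017", "BORO_NM": "BRONX", "OFNS_DESC": "PETIT LARCENY"},
    {"CMPLNT_FR_DT": "02/03/2017", "BORO_NM": "QUEENS", "OFNS_DESC": "ROBBERY"},
    {"CMPLNT_FR_DT": "", "BORO_NM": "MANHATTAN", "OFNS_DESC": "BURGLARY"},
]


def monthly_counts(rows, group_field="BORO_NM"):
    """Count complaints per (year-month, group) pair.

    Rows whose complaint date is missing or not in MM/DD/YYYY form
    (an assumed format) are skipped rather than guessed at.
    """
    counts = Counter()
    for row in rows:
        raw_date = row.get("CMPLNT_FR_DT") or ""
        try:
            when = datetime.strptime(raw_date, "%m/%d/%Y")
        except ValueError:
            continue  # missing or malformed date
        group = row.get(group_field) or "UNKNOWN"
        counts[(when.strftime("%Y-%m"), group)] += 1
    return counts


if __name__ == "__main__":
    # Print one line per month and borough, in chronological order.
    for (month, group), n in sorted(monthly_counts(SAMPLE_ROWS).items()):
        print(month, group, n)
```

Passing `group_field="OFNS_DESC"` gives the same monthly series by type of crime. A table of this shape could then feed a line plot or a dashboard control that switches the grouping variable.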

Planned Timeline

We will adhere to the timeline in the final project instructions posted on our course website: meeting with the teaching team to discuss our project between November 12 and 16; submitting a written report, a webpage overview with a short explanatory video, and peer assessments by December 6; and discussing our project in class on December 11.

We plan to meet frequently during November to narrow down the project variables we wish to study, assign tasks and responsibilities to each group member, address issues as they arise, and review progress on each task.