# Final Project Part 2
## Group members: Rocky Ma, Carol Lu, Yifan Wang

## Introduction

Crime is a significant issue that affects communities worldwide, and the city of Los Angeles is no exception. According to the Los Angeles Police Department (LAPD), there were over 210,000 reported crimes in the city in 2020 alone, ranging from theft and robbery to assault and homicide. The impact of crime on communities is far-reaching, from personal trauma to economic costs, and it is crucial to understand its distribution and trends to address it effectively.

To this end, our team has undertaken a data visualization project on crime in Los Angeles City, utilizing a dataset representing the incidents of crime in the city dating back to 2020. To make the data more manageable, we narrowed our focus to the years 2023 and beyond, reducing the number of rows from 700,000 to 60,000. Through our project, we aim to provide insights into the distribution and patterns of crime in LA City, allowing for better decision-making and policy implementation.

## Dashboard

Our dashboard comprises two main components: the left section showcases a plot that displays the age, descent, and sex of victims population, while the right section presents the count of criminal records grouped by area names of the city.

The left section of the dashboard focuses on the demographics of crime victims, providing an insight into who is most vulnerable to crime with different colors representing different demographic groups. By selecting a particular area, users can view the specific number of criminal records connected to the selected victim group, and distributions in different areas. This interactivity makes it easier to compare the demographics of victims in different ages, descent, gender between different regions of the city.

### Dataset Size
The original dataset includes the crime data from LAPD from 2020 to present and is about 178MB. However, our group decide to focus on the crime data in 2023 to provide the latest analysis, so the actual dataset is only about 17MB. We don't have to worry about the Github upload limit. 

In [2]:
import pandas as pd
import altair as alt
import seaborn as sns
import geopandas

In [3]:
la_crime = pd.read_csv("https://data.lacity.org/api/views/2nrs-mtv8/rows.csv?accessType=DOWNLOAD")
la_crime.head()

Unnamed: 0,DR_NO,Date Rptd,DATE OCC,TIME OCC,AREA,AREA NAME,Rpt Dist No,Part 1-2,Crm Cd,Crm Cd Desc,...,Status,Status Desc,Crm Cd 1,Crm Cd 2,Crm Cd 3,Crm Cd 4,LOCATION,Cross Street,LAT,LON
0,10304468,01/08/2020 12:00:00 AM,01/08/2020 12:00:00 AM,2230,3,Southwest,377,2,624,BATTERY - SIMPLE ASSAULT,...,AO,Adult Other,624.0,,,,1100 W 39TH PL,,34.0141,-118.2978
1,190101086,01/02/2020 12:00:00 AM,01/01/2020 12:00:00 AM,330,1,Central,163,2,624,BATTERY - SIMPLE ASSAULT,...,IC,Invest Cont,624.0,,,,700 S HILL ST,,34.0459,-118.2545
2,200110444,04/14/2020 12:00:00 AM,02/13/2020 12:00:00 AM,1200,1,Central,155,2,845,SEX OFFENDER REGISTRANT OUT OF COMPLIANCE,...,AA,Adult Arrest,845.0,,,,200 E 6TH ST,,34.0448,-118.2474
3,191501505,01/01/2020 12:00:00 AM,01/01/2020 12:00:00 AM,1730,15,N Hollywood,1543,2,745,VANDALISM - MISDEAMEANOR ($399 OR UNDER),...,IC,Invest Cont,745.0,998.0,,,5400 CORTEEN PL,,34.1685,-118.4019
4,191921269,01/01/2020 12:00:00 AM,01/01/2020 12:00:00 AM,415,19,Mission,1998,2,740,"VANDALISM - FELONY ($400 & OVER, ALL CHURCH VA...",...,IC,Invest Cont,740.0,,,,14400 TITUS ST,,34.2198,-118.4468


In [4]:
la_crime['Date Rptd'] =  pd.to_datetime(la_crime['Date Rptd'])
la_crime = la_crime[(la_crime['Date Rptd'] > '2023-01-01') & (la_crime['Vict Sex']!='H')]

In [5]:
la_crime.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 71493 entries, 148620 to 708083
Data columns (total 28 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   DR_NO           71493 non-null  int64         
 1   Date Rptd       71493 non-null  datetime64[ns]
 2   DATE OCC        71493 non-null  object        
 3   TIME OCC        71493 non-null  int64         
 4   AREA            71493 non-null  int64         
 5   AREA NAME       71493 non-null  object        
 6   Rpt Dist No     71493 non-null  int64         
 7   Part 1-2        71493 non-null  int64         
 8   Crm Cd          71493 non-null  int64         
 9   Crm Cd Desc     71493 non-null  object        
 10  Mocodes         61694 non-null  object        
 11  Vict Age        71493 non-null  int64         
 12  Vict Sex        62131 non-null  object        
 13  Vict Descent    62129 non-null  object        
 14  Premis Cd       71493 non-null  float64       
 

In [None]:
la_crime.nunique()

In [None]:
# Disable Max Rows check
alt.data_transformers.disable_max_rows()

In [None]:
la_status_chart = alt.Chart(la_crime).mark_line().encode(
    alt.X('Date Rptd:T', title = 'Date Reported'),
    alt.Y('count():Q'),
    alt.Color('Status Desc:N',title = 'Status Description')
)

In [None]:
# Save chart
#la_status_chart.save('la_status_chart.json')

In [None]:
# Build Dashboard
brush = alt.selection_interval(encodings=['x','y'])

area = alt.Chart(la_crime).mark_bar().encode(
    x = alt.X('AREA NAME:N'),
    y = alt.Y('count():Q'),
).transform_filter(
    brush
)

vict = alt.Chart(la_crime).mark_point(shape='stroke').encode(
    alt.X('Vict Descent:N', title = 'Victim Descent'),
    alt.Y('Vict Age:Q', title = 'Victim Age'),
    color='Vict Sex'
).properties(
    width=550,
    height=300
).add_selection(
brush
)

dash = vict.properties(width=400) | area.properties(width=400)

dash

In [None]:
# Save Dashboard
#dash.save('dash.json')

In [None]:
# Save transformed dataset
#la_crime.to_csv('la_crime_2023.csv')

## Contextual Dataset
URL: https://data.cityofnewyork.us/api/views/833y-fsy8/rows.csv?accessType=DOWNLOAD
Name of the dataset: NYPD Shooting Incident Data

This dataset contains information on all shooting incidents that have taken place in New York City from 2006 until the end of the previous calendar year. The data is collected quarterly and reviewed by the Office of Management Analysis and Planning before being made available on the NYPD website. Each record in the dataset represents a specific shooting incident and includes details about when and where it occurred. It also includes information about the demographics of both the victim and the suspect. This dataset can be used by the public to gain insights into patterns and trends related to criminal activity involving firearms. 

We have two plots below. The first one is also interactive and contains two parts. The heat map shows the relationship between victims and perpetrators and categorize them in different race. When you click the square, it show the area of the incidents happening. The second one shows the number of shootings from 2006 to 2022 by areas in New York City.

In [None]:
# Load contextual dataset
nyc_shoot = pd.read_csv("https://data.cityofnewyork.us/api/views/833y-fsy8/rows.csv?accessType=DOWNLOAD")
nyc_shoot.head()

In [None]:
nyc_shoot.info()

In [None]:
nyc_shoot.nunique()

In [None]:
click = alt.selection_single(encodings=['x','y'])

nyc_area = alt.Chart(nyc_shoot).mark_bar().encode(
    alt.Y('BORO:N', title = 'Area'),
    alt.X('count():Q'),
    color = alt.Color('BORO:N', title = 'Area')
).transform_filter(
    click
)

race = alt.Chart(nyc_shoot).mark_rect().encode(
    alt.X('PERP_RACE:N', title = 'Perpetrator Race'),
    alt.Y('VIC_RACE:N', title = 'Victim Race'),
    color='count():Q'
).add_selection(
    click
)

nyc_dash = nyc_area.properties(height = 150) & race.properties(width = 250, height = 250)
nyc_dash

In [None]:
# Save Dashboard
#nyc_dash.save('nyc_dash.json')

In [None]:
# Extract Year
nyc_shoot['OCCUR_DATE'] =  pd.to_datetime(nyc_shoot['OCCUR_DATE'])
nyc_shoot['Year'] = nyc_shoot['OCCUR_DATE'].dt.year

In [None]:
nyc_year_chart = alt.Chart(nyc_shoot).encode(
    alt.X('Year:N'),
    alt.Y('count():Q'),
    alt.Color('BORO:N', title='Area')
).mark_bar().properties(title='Number of Shoots from 2006 to 2022 by Area').properties(
    width = 600,
    height = 400
)

In [None]:
nyc_year_chart

In [None]:
# Save nyc_year_chart
#nyc_year_chart.save('nyc_year_chart.json')

## Conclusion

When people are thinking where to live, safety is usually considered to be one of the most important factors. So, we found out a way to evaluate whether a city is dangerous or not through crime data. We chose LA city data as an example to make an interactive visualization plot showing the crime situation of the city. The result showed many useful information and let us learn the importance of data visualization in real world.