# Crime in Birmingham 

## Introduction

This report identifies the areas of Birmingham with the most recorded crime and the types of venues located near them. The aim is to show which areas are safer to live in, or to open a business in.

## The Data

I will be using data from the National Police database. The data contains the road names and coordinates of each recorded crime, which I will use with the Foursquare API to analyse the areas surrounding the places where crime happens most. With this data I plan to look for any links or patterns that could explain why the crime rate is so high.

### Importing Libraries 

In [1]:
import numpy as np
import pandas as pd
import folium
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import requests
import json 

### Creating the Dataframe

In [59]:
df = pd.read_csv('/Users/oliver/Desktop/projects/Course Final Project/2019-01-west-midlands-street.csv')

#### The data is spread across 12 monthly CSV files, so the following code appends them all into one DataFrame.

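frames = [df]
for month in range(2, 13):
    # Filenames use zero-padded months (2019-02 ... 2019-12)
    df1 = pd.read_csv(f'/Users/oliver/Desktop/projects/Course Final Project/2019-{month:02d}-west-midlands-street.csv')
    frames.append(df1)
# DataFrame.append was removed in pandas 2.0; pd.concat joins all months in one step
df = pd.concat(frames, ignore_index=True)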
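
The file paths for the whole year can also be generated in one place. A minimal sketch, assuming every month follows the `2019-MM-west-midlands-street.csv` naming pattern used above; the helpers `monthly_paths` and `load_year` and the `force` parameter are hypothetical names, not part of pandas.

```python
from pathlib import Path

import pandas as pd


def monthly_paths(base_dir, year=2019, force='west-midlands'):
    """Paths of the 12 monthly street-level files, e.g. 2019-01-west-midlands-street.csv."""
    # {month:02d} zero-pads single-digit months, replacing the manual `a = 0` branch
    return [Path(base_dir) / f'{year}-{month:02d}-{force}-street.csv'
            for month in range(1, 13)]


def load_year(base_dir, year=2019, force='west-midlands'):
    """Read every monthly file and stack them into one DataFrame with a fresh 0..n-1 index."""
    frames = [pd.read_csv(path) for path in monthly_paths(base_dir, year, force)]
    return pd.concat(frames, ignore_index=True)


paths = monthly_paths('/Users/oliver/Desktop/projects/Course Final Project')
print(paths[0].name)
print(paths[-1].name)
```

With the files in place, `df = load_year('/Users/oliver/Desktop/projects/Course Final Project')` would replace both the first `read_csv` call and the loop.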

## Removing the columns/rows with no data

In [61]:
df = df.drop(['Context'], axis=1)

In [62]:
df = df.dropna()

## Selecting all the Data for Birmingham from the dataframe

In [63]:
df = df[df['LSOA name'].str.contains('Birmingham')].reset_index(drop=True)

#### Every value in the Location column begins with 'On or near' before the road name; the following code removes that prefix so the road names can be used on their own

In [64]:
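# regex=False treats the prefix literally; strip() removes the space left behind
location = df['Location'].str.replace('On or near', '', regex=False).str.strip()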

In [65]:
df['Location'] = location
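
As the location names printed later in the notebook show (for example `' A4040'`), removing only the words `On or near` leaves a leading space in every name. A small sketch with made-up values in the same format, showing the difference that a trailing `str.strip()` makes.

```python
import pandas as pd

# Toy values in the same format as the police data's Location column
locations = pd.Series(['On or near A4040', 'On or near Albert Road', 'On or near Hospital'])

# str.replace removes the words but keeps the space that followed them
replaced = locations.str.replace('On or near', '', regex=False)
print(replaced.tolist())  # [' A4040', ' Albert Road', ' Hospital']

# Stripping afterwards gives clean names that match exactly in groupby and lookups
cleaned = replaced.str.strip()
print(cleaned.tolist())  # ['A4040', 'Albert Road', 'Hospital']
```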

In [67]:
df.shape

(114372, 11)

## Finding Nearby Venues with the Foursquare API

In [18]:
# Keep real credentials out of shared notebooks
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 1 # maximum number of venues returned per location
radius = 500 # search radius in metres
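
The `radius` sent to Foursquare is a distance in metres around each crime location. To sanity-check that a returned venue really lies within that radius, one option is the haversine great-circle distance. A minimal sketch; the function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions, not part of the Foursquare API.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# A4040 crime location (from the filtered table) and a point 0.001 degrees further north
d = haversine_m(52.521681, -1.844205, 52.522681, -1.844205)
print(round(d, 1))
print(d <= 500)  # is the point inside the 500 m search radius?
```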

In [57]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
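
`getNearbyVenues` assumes that every response contains `response.groups[0].items` and that every venue has at least one category, so a location with no results or an uncategorised venue would raise an error. A sketch of a more defensive parser, tested against a hand-made payload that only mirrors the fields the function above reads (it is not a real API response); `parse_explore_items` and the `'Unknown'` fallback label are hypothetical.

```python
def parse_explore_items(name, lat, lng, payload):
    """Flatten one explore response into rows matching the getNearbyVenues columns.

    Returns an empty list instead of raising when groups, items or categories are missing.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or [{}]  # empty list -> one empty category
        rows.append((
            name,
            lat,
            lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            categories[0].get('name', 'Unknown'),  # placeholder label for uncategorised venues
        ))
    return rows


# Hand-made payload shaped like the fields read in getNearbyVenues (not real API output)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Pub', 'location': {'lat': 52.47, 'lng': -1.88},
               'categories': [{'name': 'Pub'}]}},
    {'venue': {'name': 'Unlabelled Venue', 'location': {'lat': 52.48, 'lng': -1.89},
               'categories': []}},
]}]}}

rows = parse_explore_items('Alcester Street', 52.471519, -1.884615, sample)
print(rows)
print(parse_explore_items('Nowhere', 0.0, 0.0, {'response': {}}))  # []
```

Inside the loop, `venues_list.append(parse_explore_items(name, lat, lng, requests.get(url).json()))` could take the place of the two indexing steps.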

## Keeping only the necessary columns and the locations with more than 100 crimes

In [77]:
# Keep only the coordinates and the location name of each crime
df1 = df[['Longitude', 'Latitude', 'Location']].copy()
# Number of crimes recorded at each location
df1['Count'] = df.groupby('Location')['Location'].transform('count')
df1 = df1.sort_values('Location')
# One row per location
df1 = df1.drop_duplicates(subset='Location', keep='first')
df1 = df1[df1['Count'] > 100]
df1

| | Longitude | Latitude | Location | Count |
|---|---|---|---|---|
| 39414 | -1.844205 | 52.521681 | A4040 | 141 |
| 48624 | -1.857833 | 52.518624 | Albert Road | 248 |
| 93992 | -1.884615 | 52.471519 | Alcester Street | 128 |
| 100100 | -1.812045 | 52.463299 | Ash Tree Drive | 112 |
| 15413 | -1.941253 | 52.445435 | Aston Webb Boulevard | 121 |
| ... | ... | ... | ... | ... |
| 58428 | -1.857453 | 52.508555 | Tyburn Road | 131 |
| 35404 | -1.905922 | 52.427657 | Vicarage Road | 234 |
| 108532 | -1.813344 | 52.483277 | Victoria Road | 146 |
| 97355 | -1.941102 | 52.503636 | Winsham Grove | 121 |
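
The count-and-deduplicate steps above can also be written as two `groupby` aggregations joined on the location name. A minimal sketch on a toy DataFrame with the same column names (all values made up); the threshold is lowered to 1 so the toy data returns rows, and `groupby(...).first()` keeps one set of coordinates per location, similar in spirit to `keep="first"`.

```python
import pandas as pd

# Toy crimes with the same column names as the cleaned police data (values are made up)
crimes = pd.DataFrame({
    'Location': ['Albert Road', 'Albert Road', 'Hospital', 'Hospital', 'Hospital', 'Devon Way'],
    'Latitude': [52.5186, 52.5187, 52.4700, 52.4701, 52.4702, 52.4500],
    'Longitude': [-1.8578, -1.8579, -1.9000, -1.9001, -1.9002, -1.8000],
})

threshold = 1  # the notebook uses 100; lowered so the toy data returns rows

# Number of crimes per location
counts = crimes.groupby('Location').size().rename('Count')
# Coordinates of the first crime recorded at each location
coords = crimes.groupby('Location')[['Latitude', 'Longitude']].first()

per_location = coords.join(counts).reset_index()
hotspots = (per_location[per_location['Count'] > threshold]
            .sort_values('Count', ascending=False)
            .reset_index(drop=True))
print(hotspots)
```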


## Getting Nearby Locations

In [78]:
birmingham_venues = getNearbyVenues(names=df1['Location'],
                                   latitudes=df1['Latitude'],
                                   longitudes=df1['Longitude'])

 A4040
 Albert Road
 Alcester Street
 Ash Tree Drive
 Aston Webb Boulevard
 Aubrey Road
 Bagot Street
 Blackmoor Croft
 Boundary Place
 Bristol Road South
 Broad Street
 Bus/Coach Station
 Chapel Lane
 Church Road
 College Road
 Conference/Exhibition Centre
 Coventry Road
 Devon Way
 Digbeth
 Dunsink Road
 Fowey Road
 Frances Road
 Further/Higher Educational Building
 Hagley Road
 High Street
 Hob Moor Close
 Holly Road
 Holyhead Road
 Hospital
 Hurst Street
 Kingsfield Road
 Mere Road
 New Meeting Street
 New Street
 Nightclub
 Old Horns Crescent
 Park/Open Space
 Parking Area
 Pedestrian Subway
 Petrol Station
 Police Station
 Powick Road
 Prison
 Queens Drive
 Reddings Lane
 Reservoir Road
 Ridley Street
 Salwarpe Grove
 Shopping Area
 South Road
 Sports/Recreation Area
 Spring Hill
 Station Road
 Stratford Road
 Supermarket
 Taywood Drive
 Temple Row West
 Theatre/Concert Hall
 Tyburn Road
 Vicarage Road
 Victoria Road
 Winsham Grove
 York Road


## Analyse Each Neighbourhood

In [80]:
birmingham_onehot = pd.get_dummies(birmingham_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
birmingham_onehot['Neighborhood'] = birmingham_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [birmingham_onehot.columns[-1]] + list(birmingham_onehot.columns[:-1])
birmingham_onehot = birmingham_onehot[fixed_columns]

birmingham_onehot.head()

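| | Neighborhood | Café | Chinese Restaurant | Climbing Gym | Coffee Shop | Construction & Landscaping | Department Store | Discount Store | Donut Shop | Dumpling Restaurant | ... | Restaurant | Sandwich Place | Shopping Plaza | Soccer Stadium | Street Food Gathering | Supermarket | Tapas Restaurant | Theater | Track Stadium | Vietnamese Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A4040 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Albert Road | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Alcester Street | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Ash Tree Drive | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Aston Webb Boulevard | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |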


In [81]:
birmingham_grouped = birmingham_onehot.groupby('Neighborhood').mean().reset_index()
birmingham_grouped

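| | Neighborhood | Café | Chinese Restaurant | Climbing Gym | Coffee Shop | Construction & Landscaping | Department Store | Discount Store | Donut Shop | Dumpling Restaurant | ... | Restaurant | Sandwich Place | Shopping Plaza | Soccer Stadium | Street Food Gathering | Supermarket | Tapas Restaurant | Theater | Track Stadium | Vietnamese Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A4040 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Albert Road | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Alcester Street | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Ash Tree Drive | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Aston Webb Boulevard | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 58 | Tyburn Road | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 59 | Vicarage Road | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 60 | Victoria Road | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 61 | Winsham Grove | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |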
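
Taking the mean of the one-hot columns for each neighbourhood gives the share of that location's returned venues in each category. With `LIMIT = 1`, each real location has at most one venue, so every frequency in the table above is 0 or 1. A toy sketch of the same `get_dummies` plus `groupby(...).mean()` pattern, using made-up rows with two venues for one location to show fractional frequencies.

```python
import pandas as pd

# Toy venue rows: two venues returned for one location, one for another (made up)
venues = pd.DataFrame({
    'Neighborhood': ['Hurst Street', 'Hurst Street', 'Digbeth'],
    'Venue Category': ['Pub', 'Nightclub', 'Pub'],
})

# One 0/1 column per category, named after the category itself
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of the 0/1 columns = share of each location's venues in that category
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```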


### Birmingham's Coordinates

In [82]:
address = 'Birmingham, UK'

geolocator = Nominatim(user_agent="birmingham_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Birmingham are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Birmingham are 52.4796992, -1.9026911.



## Top 5 venues near the locations where crimes occur most

In [88]:
num_top_venues = 5

for hood in birmingham_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = birmingham_grouped[birmingham_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- A4040----
            venue  freq
0  Discount Store   1.0
1            Café   0.0
2      Restaurant   0.0
3   Movie Theater   0.0
4       Multiplex   0.0


---- Albert Road----
                       venue  freq
0                  Pet Store   1.0
1                       Café   0.0
2  Middle Eastern Restaurant   0.0
3              Movie Theater   0.0
4                  Multiplex   0.0


---- Alcester Street----
                       venue  freq
0                        Pub   1.0
1                       Café   0.0
2  Middle Eastern Restaurant   0.0
3              Movie Theater   0.0
4                  Multiplex   0.0


---- Ash Tree Drive----
           venue  freq
0    Coffee Shop   1.0
1           Café   0.0
2     Restaurant   0.0
3  Movie Theater   0.0
4      Multiplex   0.0


---- Aston Webb Boulevard----
                       venue  freq
0                       Park   1.0
1                       Café   0.0
2  Middle Eastern Restaurant   0.0
3              Movie Theater   0.0
4                  Multiplex   0.0


---- Queens Drive----
                       venue  freq
0        Indie Movie Theater   1.0
1  Middle Eastern Restaurant   0.0
2              Movie Theater   0.0
3                  Multiplex   0.0
4                     Museum   0.0


---- Reddings Lane----
               venue  freq
0  Indian Restaurant   1.0
1               Café   0.0
2                Pub   0.0
3      Movie Theater   0.0
4          Multiplex   0.0


---- Reservoir Road----
            venue  freq
0  Sandwich Place   1.0
1            Café   0.0
2             Pub   0.0
3   Movie Theater   0.0
4       Multiplex   0.0


---- Ridley Street----
           venue  freq
0          Hotel   1.0
1           Café   0.0
2            Pub   0.0
3  Movie Theater   0.0
4      Multiplex   0.0


---- Salwarpe Grove----
                  venue  freq
0  Fast Food Restaurant   1.0
1            Restaurant   0.0
2         Movie Theater   0.0
3             Multiplex   0.0
4                Museum   0.0



In [90]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
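
Because `LIMIT = 1` returns at most one venue per location, only the first category in each row has a non-zero frequency. The remaining "most common" venues are ties at 0, so their order reflects the sort rather than the data. A sketch of a variant that keeps only categories that actually occur, padding with empty strings so the row still fits the fixed-width columns used below; `return_present_venues` is a hypothetical name.

```python
import pandas as pd


def return_present_venues(row, num_top_venues):
    """Top categories by frequency, ignoring categories with a frequency of 0.

    Pads with '' so the result always has num_top_venues entries.
    """
    row_categories = row.iloc[1:].astype(float)  # skip the Neighborhood column
    present = row_categories[row_categories > 0].sort_values(ascending=False)
    names = list(present.index[:num_top_venues])
    return names + [''] * (num_top_venues - len(names))


# One row of a toy frequency table (Neighborhood first, then category frequencies)
row = pd.Series({'Neighborhood': 'Alcester Street', 'Café': 0.0, 'Pub': 1.0, 'Gastropub': 0.0})
print(return_present_venues(row, 3))  # ['Pub', '', '']
```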

# Methodology 

In this project I find the areas in Birmingham with more than 100 recorded crimes. This focuses the analysis on the most affected areas rather than all 6000 areas, and brings the total down to 63, which is much more manageable.

In the first step I collected the data for every crime recorded within the West Midlands Police force's jurisdiction. I then focused the data on the UK's second largest city: Birmingham.

In the second step of our analysis, we filtered the data and ran it through the Foursquare API. This allowed us to see what kinds of venues are in the surrounding areas; these venues would be the most affected by the crime.

For our third and final step, we plotted the areas onto a map so we can see where in the city the most crimes happen. This gives us an understanding of where more police should be posted, as these are more at-risk areas. The map can also act as a guide to where it would be better to live or work.

# Analysis

## Top 10 venues near the locations where crimes occur most

In [121]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # indicators only covers 1st-3rd; everything after uses "th"
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = birmingham_grouped['Neighborhood']

for ind in np.arange(birmingham_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(birmingham_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A4040 | Discount Store | Vietnamese Restaurant | Fast Food Restaurant | Health & Beauty Service | Gym Pool | Gym / Fitness Center | Grocery Store | Go Kart Track | Gastropub | Fish & Chips Shop |
| 1 | Albert Road | Pet Store | Vietnamese Restaurant | Fast Food Restaurant | Health & Beauty Service | Gym Pool | Gym / Fitness Center | Grocery Store | Go Kart Track | Gastropub | Fish & Chips Shop |
| 2 | Alcester Street | Pub | English Restaurant | Health & Beauty Service | Gym Pool | Gym / Fitness Center | Grocery Store | Go Kart Track | Gastropub | Fish & Chips Shop | Fast Food Restaurant |
| 3 | Ash Tree Drive | Coffee Shop | Vietnamese Restaurant | Fast Food Restaurant | Health & Beauty Service | Gym Pool | Gym / Fitness Center | Grocery Store | Go Kart Track | Gastropub | Fish & Chips Shop |
| 4 | Aston Webb Boulevard | Park | Vietnamese Restaurant | Fast Food Restaurant | Health & Beauty Service | Gym Pool | Gym / Fitness Center | Grocery Store | Go Kart Track | Gastropub | Fish & Chips Shop |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 58 | Tyburn Road | Mini Golf | Vietnamese Restaurant | Fast Food Restaurant | Health & Beauty Service | Gym Pool | Gym / Fitness Center | Grocery Store | Go Kart Track | Gastropub | Fish & Chips Shop |
| 59 | Vicarage Road | Pub | English Restaurant | Health & Beauty Service | Gym Pool | Gym / Fitness Center | Grocery Store | Go Kart Track | Gastropub | Fish & Chips Shop | Fast Food Restaurant |
| 60 | Victoria Road | Gym Pool | Vietnamese Restaurant | Indian Restaurant | Health & Beauty Service | Gym / Fitness Center | Grocery Store | Go Kart Track | Gastropub | Fish & Chips Shop | Fast Food Restaurant |
| 61 | Winsham Grove | Coffee Shop | Vietnamese Restaurant | Fast Food Restaurant | Health & Beauty Service | Gym Pool | Gym / Fitness Center | Grocery Store | Go Kart Track | Gastropub | Fish & Chips Shop |
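
The `indicators` list covers the first three ordinals and falls back to "th" for the rest. That is correct for up to 20 columns, but it would produce "21th" or "22th" if `num_top_venues` were raised further. A general ordinal helper, as a sketch; `ordinal` is a hypothetical name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th are exceptions to the last-digit rule
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns[1], '|', columns[-1])
```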


# Birmingham Crime Map: locations with more than 100 crimes, January 2019 to December 2019

In [120]:
crime_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a red circle marker for each location with more than 100 recorded crimes
for lat, lon, poi, count in zip(df1['Latitude'], df1['Longitude'], df1['Location'], df1['Count']):
    # folium popups take text, so build a string instead of passing the integer count
    label = folium.Popup('{}: {} crimes'.format(poi, count), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='#FF0000',
        fill=True,
        fill_color='#FF0000',
        fill_opacity=0.7).add_to(crime_map)

crime_map

# Results and Discussion

As we can see from the above data, many of the locations where crimes were committed have similar nearby venues. For example, if we look at the first five areas, almost all of them have pubs somewhere in their list. This could indicate that alcohol is a cause of some of the crimes.

Fast food restaurants also appear frequently. Fast food is cheap, which could suggest that these areas are used by people from lower-income households, and lower income is often linked with higher crime rates. The discount store could be another example of this.

Many of the areas have fast food restaurants, fish and chip shops, pubs and discount stores. These could all be considered typical of low-income areas, where people are struggling the most. I believe this suggests that the more of these restaurants and shops an area has, the higher its crime rate will be; in other words, there may be a direct correlation between the two.
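
A correlation like this could be checked numerically by pairing, for each location, the number of nearby venues of these types with its crime count. A minimal sketch of Pearson's correlation coefficient in plain Python (pandas offers the same through `Series.corr`). The two input lists are hypothetical placeholders, not figures from this analysis. With `LIMIT = 1`, each location contributes at most one venue, so a larger `LIMIT` would be needed for the venue counts to vary much. A coefficient near +1 indicates a strong linear association, not a cause.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two sequences of equal length >= 2')
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical inputs: venues of the chosen types near each location, and crimes per location
venue_counts = [0, 1, 2, 3, 4]
crime_counts = [110, 125, 140, 155, 170]
print(pearson_r(venue_counts, crime_counts))  # 1.0 for perfectly linear toy data
```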

I would also like to point out that most of the areas with the highest crime rates are in the city centre, where the population is higher and more people are socialising throughout the day and night. This suggests not only that a higher population brings a higher crime rate, but also that a higher population density does too.

# Conclusion

In conclusion, I believe our data suggests a direct correlation between low-income areas and crime rate. Birmingham is known as one of the biggest industrial cities in the UK, where jobs like factory work and manual labour are very common.

As the second largest city in the UK, Birmingham had a population of 1,137,100 people as of 2017. Cities are already known to have higher crime rates because of their larger populations, and my data also suggests that population density is another factor to take into account.

