# Capstone Project - IBM Data Science by Coursera

## Report: London Crime Rates

---

## Introduction: Business Problem

How do you go about choosing where to live? If you are raising a family, one of the most important considerations is likely to be how _safe_ the area is. With that in mind, this report aims to identify the safest borough in _London_ according to the total number of documented crimes, and then applies k-means clustering to the neighbourhoods within that borough.

This report will then explore the 10 most common venues within each neighbourhood of the safest borough, with the ultimate aim of assisting an individual in the selection of a suitable neighbourhood for raising a family.

---

## Data

This project will extract data from 2 different sources, Kaggle and Wikipedia, while also making use of Foursquare and Google Maps for locational data. 

- __Kaggle:__ London crimes [Data](https://www.kaggle.com/jboysen/london-crime) from 2008 to 2016, detailing the statistics of crime within each borough of London.


- __Wikipedia:__ Additional [Data](https://en.wikipedia.org/wiki/List_of_London_boroughs) related to the different boroughs in London.


- __Foursquare:__ Used to provide locational details of each venue within each neighbourhood.


- __Google Maps:__ Used for API geocoding to obtain neighbourhood coordinates.


---

## Methodology

1) Preprocessing:
- Downloaded, unzipped, and read London Crime (Kaggle) data from 2008 to 2016 as CSV files using Pandas.
- Cleaned and manipulated the data.
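
The preprocessing step can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the column names (`borough`, `major_category`, `value`, `year`) are assumed to match the Kaggle CSV, and the rows are invented toy values, not real crime figures.

```python
import pandas as pd

# Toy rows standing in for the Kaggle CSV. The numbers are invented for
# illustration only; the column names are an assumption about the dataset.
crime = pd.DataFrame(
    {
        "borough": [
            "Camden",
            "Camden",
            "Kingston upon Thames",
            "Kingston upon Thames",
            "Camden",
        ],
        "major_category": [
            "Theft and Handling",
            "Burglary",
            "Theft and Handling",
            "Burglary",
            "Burglary",
        ],
        "value": [5, 2, 1, 0, 3],
        "year": [2016, 2016, 2016, 2016, 2015],
    }
)


def crimes_per_borough(df: pd.DataFrame, year: int) -> pd.Series:
    """Return total recorded crimes per borough for one year, lowest first."""
    in_year = df[df["year"] == year]
    return in_year.groupby("borough")["value"].sum().sort_values()


totals_2016 = crimes_per_borough(crime, 2016)
safest = totals_2016.idxmin()  # borough with the fewest recorded crimes

print(totals_2016)
print("Fewest recorded crimes:", safest)
```

With the real data, `crime` would instead come from `pd.read_csv(...)` on the unzipped Kaggle file.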

2) Scraping:
- Used BeautifulSoup to scrape London borough details from a Wikipedia page.
- Cleaned and manipulated the data.
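
As a rough sketch of the scraping step, the snippet below parses an HTML table with BeautifulSoup. To keep it runnable offline it uses a small inline HTML string with placeholder rows instead of the live Wikipedia page; the `wikitable` class, the helper name `parse_table`, and the column headers are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the Wikipedia page. The rows are placeholders,
# not real borough data.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Borough</th><th>Area (sq mi)</th></tr>
  <tr><td>Example Borough A</td><td>1.0</td></tr>
  <tr><td>Example Borough B</td><td>2.0</td></tr>
</table>
"""


def parse_table(html: str) -> list[dict[str, str]]:
    """Turn the first 'wikitable' in the HTML into a list of row dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = table.find_all("tr")
    # The first row is assumed to hold the column headers.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Skip malformed rows whose cell count does not match the header.
        if len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
    return records


boroughs = parse_table(SAMPLE_HTML)
print(boroughs)
```

For the live page, the HTML string would be fetched first (for example with `requests.get(url).text`) and the resulting records loaded into a Pandas DataFrame for cleaning.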

3) Exploratory Data Analysis: 
- Visualised crime rates in the London boroughs to identify which is the safest.
- Extracted the neighbourhoods within the safest borough to determine the 10 most common venues in each neighbourhood.
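
One way to rank the most common venues per neighbourhood is sketched below. It assumes a table with one row per venue and two columns, here called `Neighbourhood` and `Venue Category` (hypothetical names), and uses toy rows rather than real Foursquare results.

```python
import pandas as pd

# Toy venue listing: one row per venue. Names and categories are
# placeholders, not real Foursquare data.
venues = pd.DataFrame(
    {
        "Neighbourhood": ["A", "A", "A", "B", "B", "B"],
        "Venue Category": ["Cafe", "Cafe", "Pub", "Gym", "Gym", "Park"],
    }
)


def most_common_venues(df: pd.DataFrame, top_n: int = 10):
    """Return (frequency table, {neighbourhood: top categories})."""
    # Share of each venue category within each neighbourhood (rows sum to 1).
    freq = pd.crosstab(
        df["Neighbourhood"], df["Venue Category"], normalize="index"
    )
    top = {}
    for hood, row in freq.iterrows():
        # Keep only categories that actually occur, most frequent first.
        present = row[row > 0].sort_values(ascending=False)
        top[hood] = list(present.index[:top_n])
    return freq, top


freq, top = most_common_venues(venues, top_n=10)
print(top)
```

The frequency table `freq` is also the natural input for the clustering step, since each row describes a neighbourhood by its mix of venue categories.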

4) Modelling: 
- Used K-means to cluster similar neighbourhoods, clustering 15 neighbourhoods into 5 clusters.
- Used folium to portray the clusters on a map.
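
The clustering step might look something like the sketch below, which groups 15 neighbourhoods into 5 clusters with scikit-learn's `KMeans`. The feature matrix here is randomly generated as a stand-in for the real venue-frequency table, so the resulting labels carry no meaning; the number of venue categories (6) is also an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in feature matrix: 15 hypothetical neighbourhoods described by
# 6 hypothetical venue categories. Random values, for illustration only.
rng = np.random.default_rng(0)
counts = rng.integers(0, 10, size=(15, 6)).astype(float)
counts[:, 0] += 1  # guarantee that no row sums to zero

# Normalise each row so it resembles a venue-frequency profile.
freq = counts / counts.sum(axis=1, keepdims=True)

# Cluster the neighbourhoods; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(freq)

print(labels)  # one cluster index (0 to 4) per neighbourhood
```

Each label can then be attached to its neighbourhood's coordinates and used to colour the markers on a folium map.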

5) Analysis:
- Created tables so that people can shortlist the area of their interests based on the nearest venues around each neighbourhood.

---

## Results

![image.png](attachment:image.png)

---

![image.png](attachment:image.png)

- 'Kingston upon Thames' is the __safest__ borough in London according to the 2016 data, which show it has the lowest number of recorded crimes.

---

![image.png](attachment:image.png)

- Theft is the most common crime in 'Kingston upon Thames', which comes as no surprise since theft is very prominent in London.


- The high count of violent crime is alarming.


- London is clearly not free of crime, but the data suggest that this borough is a lot safer than many others in London.

---

#### Map of Clusters:

![image.png](attachment:image.png)

---

#### Cluster 1:

![image.png](attachment:image.png)

- This cluster is best for people that want to be near a __Grocery Store__.

---

#### Cluster 2:

![image.png](attachment:image.png)

- This cluster is best for people that want to be near a __Health & Beauty Service__.

---

#### Cluster 3:

![image.png](attachment:image.png)

- This cluster is best for people that want to be near a __Train Station__.

---

#### Cluster 4:

![image.png](attachment:image.png)

- This cluster is best for people that want to be near __Indian Restaurants__ and __Coffee Shops__.

---

#### Cluster 5:

![image.png](attachment:image.png)

- This cluster is best for people that want to be near a __Gym__.

---

## Discussion

The results indicate that a person seeking to raise a family in London would likely be interested in living in the borough of 'Kingston upon Thames', since the most recent data (2016) suggest that it has the lowest crime rate and is therefore the safest. However, with high levels of theft and violence, there are much safer places to raise a family outside of London. Might I suggest Cambridge? It is a good compromise, and a short commute from London.

So, you want to live in __'Kingston upon Thames'__, but whereabouts within the borough? Here is a summary to help you choose:

- *Cluster 1*: Most common venue: __Grocery stores__. These are very common in London, and therefore you may want to prioritise other venues.


- *Cluster 2*: Most common venue: __Health & Beauty Service__. This is a less common type of venue, so this cluster would suit you if you love getting your hair and nails done while wearing a mud mask!


- *Cluster 3*: Most common venue: __Train Station__. In my opinion, this is where you want to be. A nearby train station can open a lot of doors in London, but watch out, the house prices are steeper!


- *Cluster 4*: Most common venues: __Indian Restaurants__ and __Coffee Shops__. Are you a foodie? Excellent! This cluster offers a good mix for your social life.


- *Cluster 5*: Most common venue: __Gym__. If you are looking to lift weights, it may well be worth picking this cluster. The closer the gym, the weaker the excuses. P.S. If you are running on a treadmill, I suggest you run outside instead. It's free, and much more fun!



---

## Conclusion

'Kingston upon Thames' has the lowest crime rate of any borough in London, and therefore it is likely the best place to raise a London-based family.

Theft and violent crime are alarmingly high, and so this must be considered. A more extensive comparison of data would provide greater confidence in the obtained results, but this report has been successful in establishing indicators that aid in making a better decision about where to live within London.

There is a wide variety of venues in London, which is emphasised by the fact that each of the 5 clusters reported a different most common venue. More cross-referencing of data would improve the reliability of the results.