# Battle of Neighborhoods: Best Places to Relocate in Mecklenburg County

## Introduction

When a person or a family is looking for the best place to live, it is a good idea to compare cities and, if possible, neighborhoods to see whether they suit your taste. After all, when you buy a car, a house, or another big-ticket item, you usually try out a couple of models or visit a couple of homes before you decide. The same approach applies to finding the right place to live. It is also best to do this before you start planning your move, to narrow down your choices further.

Many factors come into play when you compare cities, towns, or neighborhoods. Some of these include:

- **Overall Comparison:** This is a comparison of the same factors for each city, resulting in a general overview of the two cities. Some popular factors include population, cost of living, average rent, crime rate, tax rate and air quality.

- **Crime Rates:** This comparison looks at the crime rates in two towns and measures them against the national figures.

- **Cost of Living and Salary Comparison:** This weighs wages against living costs in each city. Factors in this comparison include data on food, housing, services, transport, and more. It is a useful way to find out how far your salary will go in the new city.

- **Compare Schools:** This helps identify the right school in the area by comparing various places. It mostly takes into account test scores and student-teacher ratios, as well as the teaching experience at the schools in the city of your choice.

- **Neighborhood Comparison:** This compares neighborhoods and helps one choose the best place to live in any given city. Neighborhood comparison sites let you see some interesting facts about the different communities.


The data set includes the coordinates of cities and neighborhoods in the USA. However, it does not include the venues within these locations. If we had information about those venues, we could easily find out more about the neighborhoods. For example, how many restaurants, parks, or theaters are there? What about banks or grocery stores? With this knowledge, we can make a better-informed decision about whether to relocate. The goal of this project is therefore to find an algorithmic way to use location coordinates and assign each data point to a community in Mecklenburg County. The algorithm used is k-means clustering. The main idea is to identify neighborhoods with venues clustered close to each other, so that one can choose the right neighborhood based on the proximity of amenities and venues.

## Data

### Background Data

The dataset for this project consists of information on cities in the USA gathered from https://simplemaps.com/data/us-cities. Specifically, the data includes: City Name, County Code, County Name, Population, ID, Latitude, Longitude, Source, State Address, State Name, and Time Zone. The table was used to obtain the coordinates of each location. The data was then exported to a file, read into a pandas DataFrame, and sliced to the Mecklenburg County rows for use in the project. In addition, the Foursquare API was used to gather venues near each community so that cluster analysis could be conducted on the results.
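The slicing step can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the column names (`city`, `county_name`, `state_id`, `density`, `lat`, `lng`) are assumptions about the downloaded file, and the third sample row is a made-up placeholder. In practice the full table would be read from the downloaded file instead of being typed in.

```python
import pandas as pd

# Toy stand-in for the cities table. Column names are assumed, and the
# last row is a made-up placeholder so the filter has something to drop.
cities = pd.DataFrame(
    {
        "city": ["Paw Creek", "Davidson", "Example Town"],
        "county_name": ["Mecklenburg", "Mecklenburg", "Wake"],
        "state_id": ["NC", "NC", "NC"],
        "density": [533.0, 835.0, 100.0],
        "lat": [35.2749, 35.4861, 35.8],
        "lng": [-80.9384, -80.8272, -78.6],
    }
)

# Keep only Mecklenburg County, NC, then rename columns to the report's labels.
mask = (cities["county_name"] == "Mecklenburg") & (cities["state_id"] == "NC")
column_labels = {
    "city": "Neighborhood",
    "county_name": "County",
    "density": "Density",
    "lat": "Latitude",
    "lng": "Longitude",
    "state_id": "State",
}
mecklenburg = (
    cities.loc[mask, list(column_labels)]
    .rename(columns=column_labels)
    .reset_index(drop=True)
)
print(mecklenburg)
```

Filtering on both county and state guards against counties with the same name in other states.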

## Methodology

### Exploratory Data Analysis

Exploratory analysis was performed by examining tables and plots of the downloaded data. This was used to:
- Segment the data for Mecklenburg County, North Carolina.
- Identify missing values and verify the quality of the data.
- Determine which modelling approaches are likely to yield good clustering.
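As a rough sketch of the quality checks listed above, the snippet below counts missing values per column and flags rows whose coordinates are missing or out of range. The small table is hypothetical, with one latitude deliberately left blank to show what the check catches.

```python
import numpy as np
import pandas as pd

# Toy table with one deliberately missing latitude (hypothetical data).
df = pd.DataFrame(
    {
        "Neighborhood": ["Paw Creek", "Derita", "Pineville"],
        "Latitude": [35.2749, 35.2938, np.nan],
        "Longitude": [-80.9384, -80.7976, -80.8915],
    }
)

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Coordinates must fall inside valid latitude/longitude ranges;
# a missing value fails the range test and is flagged as invalid.
valid_coords = df["Latitude"].between(-90, 90) & df["Longitude"].between(-180, 180)

print(missing_per_column)
print(df[~valid_coords])
```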

### Filtering and Visualizing the data

A thorough set of variables in the available data is an important aspect of cluster modeling. The analysis requires that the Foursquare API be used to gather venue information. It is therefore essential that the data set includes the coordinates of the cities to be studied. Fields used in the study data include: Community Name, County Name, Population, Latitude, Longitude, and State Name.

Folium was used to display the sliced data for both counties. Folium is a Python library that builds on the data wrangling power of the Python ecosystem and the mapping power of the Leaflet.js library. Generally, the data is manipulated in Python and then visualized on a Leaflet map via Folium. To represent the data in Folium, the location coordinates of Wake County were collected, and the remaining communities were then looped over and plotted to show their locations on the map. The same was done for the data from Mecklenburg County.

### Neighborhood Exploration and Cluster - Mecklenburg County

The Foursquare API was used for neighborhood exploration. A GET request was sent to the Foursquare API to retrieve the venue categories, with results limited to 100 venues within a 500-meter radius. Because the aim of the project is to cluster venues in the neighborhoods, one-hot encoding was performed on the venue categories to get dummy variables for each venue. In other words, the venue categories were coded into 0s and 1s. The result was then grouped by neighborhood, taking the mean frequency of occurrence of each category.
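The encoding and grouping steps can be sketched with pandas as follows. The venue rows are hypothetical stand-ins for what a parsed API response might contain, and the column name `Venue Category` is an assumption.

```python
import pandas as pd

# Hypothetical venue rows, as they might look after parsing the API response.
venues = pd.DataFrame(
    {
        "Neighborhood": ["Derita", "Derita", "Davidson", "Derita"],
        "Venue Category": ["Sandwich Place", "Bank", "Cosmetics Shop", "Sandwich Place"],
    }
)

# One column per category: 1 where the venue belongs to it, 0 otherwise.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the 0/1 columns is the share of each category
# among the venues found in a neighborhood.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Taking the mean rather than the sum normalizes for the number of venues returned, so a neighborhood with many venues is compared to a small one by its mix of categories and not by its venue count.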

### Cluster of Neighborhoods in Mecklenburg County

K-means clustering was used to cluster the venue categories in the communities, grouping the area into four clusters. K-means is an unsupervised machine learning technique that searches for a predetermined number of clusters within an unlabeled multidimensional dataset. It rests on a simple definition of what the ideal clustering looks like:

- The cluster center is the arithmetic mean of all the points belonging to the cluster.
- Each point is closer to its own cluster center than the other cluster centers in the dataset.

These two assumptions are the basis of the k-means model.

To produce clusters and visualize them on a map, the sliced Mecklenburg County data was merged with the grouped venue data, so that the coordinates in the sliced data could be used to place the clusters on the map.
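To make the two k-means assumptions concrete, here is a minimal NumPy sketch of the standard alternating procedure (often called Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its members. It is an illustration only, not the implementation used in this project, and the function name `kmeans` and the four sample points are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Alternate nearest-center assignment and mean updates until stable."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New center = arithmetic mean of its members (keep the old one if empty).
        new_centers = np.array(
            [
                points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ]
        )
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so both properties now hold
        centers = new_centers
    return centers, labels

# Two obvious groups of two points each.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

When the loop stops, each center equals the mean of the points assigned to it, and each point carries the label of its nearest center, which are exactly the two properties listed above.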

## Result

The .json source data contained a total of 36,651 rows and 11 columns. The sliced data for Mecklenburg County came out to 10 rows and 6 columns, as shown below, which makes the data easy to analyze.

|   | index | Neighborhood  | County      | Density | Latitude | Longitude | State |
|---|-------|---------------|-------------|---------|----------|-----------|-------|
| 0 | 7182  | Paw Creek     | Mecklenburg | 533.0   | 35.2749  | -80.9384  | NC    |
| 1 | 7183  | Hickory Grove | Mecklenburg | 992.4   | 35.2288  | -80.7206  | NC    |
| 2 | 7184  | Derita        | Mecklenburg | 1123.7  | 35.2938  | -80.7976  | NC    |
| 3 | 32556 | Pineville     | Mecklenburg | 500.0   | 35.0864  | -80.8915  | NC    |
| 4 | 32557 | Davidson      | Mecklenburg | 835.0   | 35.4861  | -80.8272  | NC    |
| 5 | 32558 | Mint Hill     | Mecklenburg | 424.0   | 35.1781  | -80.6538  | NC    |
| 6 | 32559 | Cornelius     | Mecklenburg | 951.0   | 35.4733  | -80.8833  | NC    |
| 7 | 32560 | Matthews      | Mecklenburg | 710.0   | 35.1196  | -80.7101  | NC    |
| 8 | 32561 | Huntersville  | Mecklenburg | 530.0   | 35.4055  | -80.8741  | NC    |
| 9 | 32562 | Charlotte     | Mecklenburg | 1065.0  | 35.2080  | -80.8308  | NC    |

Querying the Foursquare API for the Mecklenburg County communities returned 71 rows and 7 columns of venue data. One-hot encoding produced 71 rows and 53 columns. The table below shows the most common venue categories for communities in Mecklenburg County.

|   | Neighborhood  | 1st Most Common Venue      | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---------------|----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|------------------------|
| 0 | Charlotte     | Pizza Place                | Chinese Restaurant    | Park                  | Fast Food Restaurant  | Sandwich Place        | Convenience Store     | American Restaurant   | Italian Restaurant    | Bakery                | Furniture / Home Store |
| 1 | Cornelius     | American Restaurant        | Athletics & Sports    | Pet Store             | Mexican Restaurant    | Sports Bar            | Grocery Store         | Donut Shop            | Diner                 | Cosmetics Shop        | Deli / Bodega          |
| 2 | Davidson      | Construction & Landscaping | Cosmetics Shop        | Women's Store         | Deli / Bodega         | Department Store      | Dessert Shop          | Diner                 | Discount Store        | Donut Shop            | Dry Cleaner            |
| 3 | Derita        | Sandwich Place             | Home Service          | Chinese Restaurant    | Video Store           | Supermarket           | Bank                  | Fried Chicken Joint   | Donut Shop            | Scenic Lookout        | Pharmacy               |
| 4 | Hickory Grove | Convenience Store          | Basketball Court      | Dry Cleaner           | Cosmetics Shop        | Deli / Bodega         | Department Store      | Dessert Shop          | Diner                 | Discount Store        | Donut Shop             |
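A ranking table like the one above can be derived from per-neighborhood category frequencies. The sketch below is one way to do it, assuming a hypothetical frequency table and a helper named `top_venues`; the numbers are invented for illustration and only the top two categories are kept.

```python
import pandas as pd

# Hypothetical frequency table: one row per neighborhood, one column per category.
grouped = pd.DataFrame(
    {
        "Neighborhood": ["Derita", "Davidson"],
        "Bank": [0.2, 0.0],
        "Sandwich Place": [0.5, 0.1],
        "Cosmetics Shop": [0.3, 0.9],
    }
)

def top_venues(row, n):
    """Return the n category names with the highest frequency in this row."""
    return row.sort_values(ascending=False).index[:n].tolist()

freqs = grouped.set_index("Neighborhood")
top_n = 2
ranked = pd.DataFrame(
    [top_venues(freqs.loc[name], top_n) for name in freqs.index],
    index=freqs.index,
    columns=[f"Most Common Venue #{i + 1}" for i in range(top_n)],
).reset_index()
print(ranked)
```

Note that categories with equal frequency, including many zeros, are ordered arbitrarily by the sort, so the lower-ranked columns of such a table should be read with care.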


Since the purpose of the project is to cluster the neighborhoods, the k-means algorithm was applied to the one-hot encoded venue dataset, assuming 4 clusters. The table below shows each neighborhood and the cluster label assigned to it by k-means. Labels are zero-based: label '0' is the 1st cluster and '2' is the 3rd cluster. The map below shows the neighborhoods, with each cluster drawn using a different marker.

| index        | Neighborhood  | County       | Density     | Latitude  | Longitude | State          | Cluster Labels        | 1st Most Common Venue      | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue  | 10th Most Common Venue |
|--------------|---------------|--------------|-------------|-----------|-----------|----------------|-----------------------|----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|------------------------|------------------------|
| 7182         | Paw Creek     | Mecklenburg  | 533.0       | 35.2749   | -80.9384 | NC             | 0                     | Discount Store             | Convenience Store     | Pizza Place           | Restaurant            | Coffee Shop           | Cosmetics Shop        | Deli / Bodega         | Department Store      | Dessert Shop           | Diner         |
| 7183         | Hickory Grove | Mecklenburg  | 992.4       | 35.2288   | -80.7206 | NC             | 2                     | Convenience Store          | Basketball Court      | Dry Cleaner           | Cosmetics Shop        | Deli / Bodega         | Department Store      | Dessert Shop          | Diner                 | Discount Store         | Donut Shop    |
| 7184         | Derita        | Mecklenburg  | 1123.7      | 35.2938   | -80.7976 | NC             | 0                     | Sandwich Place             | Home Service          | Chinese Restaurant    | Video Store           | Supermarket           | Bank                  | Fried Chicken Joint   | Donut Shop            | Scenic Lookout         | Pharmacy      |
| 32556        | Pineville     | Mecklenburg  | 500.0       | 35.0864   | -80.8915 | NC             | 0                     | Golf Course                | Indian Restaurant     | Motorcycle Shop       | Deli / Bodega         | Electronics Store     | Mexican Restaurant    | Chinese Restaurant    | Coffee Shop           | Convenience Store      | Grocery Store |
| 32557        | Davidson      | Mecklenburg  | 835.0       | 35.4861   | -80.8272 | NC             | 1                     | Construction & Landscaping | Cosmetics Shop        | Women's Store         | Deli / Bodega         | Department Store      | Dessert Shop          | Diner                 | Discount Store        | Donut Shop             | Dry Cleaner   |
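A table of this shape can be produced by fitting k-means on the frequency columns and merging the labels back onto the coordinate table. The following is a sketch using scikit-learn's `KMeans`; the neighborhoods `N1` to `N4`, their frequencies, and their coordinates are all hypothetical, and the toy uses 2 clusters because it has only four rows, whereas the report uses 4.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-neighborhood category frequencies (each row sums to 1).
grouped = pd.DataFrame(
    {
        "Neighborhood": ["N1", "N2", "N3", "N4"],
        "Restaurant": [0.9, 0.8, 0.1, 0.0],
        "Park": [0.1, 0.2, 0.9, 1.0],
    }
)

# Hypothetical coordinate table standing in for the sliced county data.
coords = pd.DataFrame(
    {
        "Neighborhood": ["N1", "N2", "N3", "N4"],
        "Latitude": [35.1, 35.2, 35.3, 35.4],
        "Longitude": [-80.9, -80.8, -80.7, -80.6],
    }
)

# Fit k-means on the frequency columns only (the name column is dropped).
features = grouped.drop(columns="Neighborhood")
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
grouped["Cluster Labels"] = model.labels_

# Attach the labels to the coordinates so clusters can be drawn on a map.
merged = coords.merge(
    grouped[["Neighborhood", "Cluster Labels"]], on="Neighborhood", how="inner"
)
print(merged)
```

An inner join keeps only neighborhoods that appear in both tables, so any neighborhood for which no venues were returned would drop out of the merged result. The numeric label values themselves carry no meaning beyond telling clusters apart.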

![image.png](attachment:image.png)

Examining each cluster for the communities studied, it was found that certain venue types distinguish the clusters from one another. Based on these groupings, names were assigned to each cluster. Although 10 common venues were defined in this work, the assigned names were based only on the first 2, for ease of naming.

## Discussion

Many neighborhoods in Mecklenburg County fall into the red cluster. Looking at the red cluster, the first two most common venues in these neighborhoods cover a broad mix of amenities: Fast Food Restaurant, Pizza Place, Park, Basketball Court, Gas Station, Health & Beauty Service, Dance Studio, American Restaurant, Pharmacy, and Spa.

Ultimately, the decision is left to the individual looking to relocate. In general, although all these analyses are useful, there is nothing like visiting the city, seeing the neighborhoods, and talking to the residents. If possible, an in-person visit is highly recommended before making a big move.

## Conclusion

The aim of this work is to provide the information needed to help people decide on the best place to live or move to. Using datasets available online, I was able to examine a few variables by evaluating communities within Mecklenburg County, North Carolina, based on the spatial distribution of venues in the chosen neighborhoods. My analysis has shown that, with the Folium Python library for quick interactive data visualization and the Foursquare API for community venue data, it is possible to cluster city data using an established machine learning technique, the k-means algorithm. These results must be considered constrained by the nature of the data set used, since no information is provided about its origins. The results will be of interest to people who want to compare neighborhoods when thinking about relocating or vacationing in a different environment, given the ease of access to a number of venues within a clustered setting.

There is definitely plenty of room for improvement. For example, more communities than the existing ones could be collected in order to evaluate and cluster a wider variety of geographical locations. Crime data, which is freely accessible for all counties, could also be used to better support decisions about where to move. This information can be particularly helpful because most people would not choose to live in a crime-ridden community. Although the approach used here may not be rigorous enough, it nevertheless shows the usefulness of neighborhood data analysis.

