# **COURSERA CAPSTONE PROJECT – THE BATTLE OF NEIGHBORHOODS**
## **MOSCOW FLATS**

### **INTRODUCTION**

##### This project will help people who want to buy a one- or two-room flat in Moscow to find the district which will match their personal needs and lifestyle. From this study you can see:

- Which districts have lower flat prices and what the average price per square meter is in each district
- Which venues are most popular in each district, which can help you choose the district that matches your preferences
- Which other districts in Moscow have the same infrastructure as your preferred one, so you can widen your search and find the best option

### **DATA**

The data on the flats' sizes, prices, number of rooms and addresses was collected by scraping CIAN, one of the most popular local websites with apartment listings. The coordinates of each flat were found using the Yandex API (as the addresses on the website are in Russian, Yandex gave more accurate results than geopy). Then, the data on the closest venues (parks, cafes, hotels, gyms, supermarkets, etc.) for each district was collected using the Foursquare API.

### **METHODOLOGY**

As a first step, I scraped the data (price, number of rooms, size and address) for one- and two-room flats in Moscow from the local apartment listings website CIAN using the BeautifulSoup package and put it into the Pandas data frame below:

![image.PNG](attachment:image.png)
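
The scraping step can be sketched roughly as follows. This is only an illustration, not the parser used in the project: the HTML snippet, the class names (`card`, `rooms`, `area`, `price`, `address`) and the addresses are made up, since CIAN's real markup is different and changes over time.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical listing markup (assumption): CIAN's real HTML looks different.
SAMPLE_HTML = """
<div class="card">
  <span class="rooms">1</span><span class="area">38.5</span>
  <span class="price">9 500 000</span>
  <span class="address">Москва, ул. Профсоюзная, 10</span>
</div>
<div class="card">
  <span class="rooms">2</span><span class="area">54</span>
  <span class="price">14 200 000</span>
  <span class="address">Москва, ул. Тверская, 5</span>
</div>
"""


def parse_listings(html: str) -> pd.DataFrame:
    """Extract rooms, size, price and address from every listing card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="card"):
        def text(cls: str) -> str:
            return card.find("span", class_=cls).get_text(strip=True)

        rows.append({
            "rooms": int(text("rooms")),
            "size_m2": float(text("area")),
            # Prices are written with spaces as thousands separators.
            "price_rub": int(text("price").replace(" ", "")),
            "address": text("address"),
        })
    return pd.DataFrame(rows)


flats = parse_listings(SAMPLE_HTML)
print(flats)
```

In a real run, the HTML would come from the downloaded listing pages instead of a hard-coded string.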

I have also added a calculated price per square meter column to compare prices across districts:

![image.png](attachment:image.png)
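
A minimal sketch of the calculated column, assuming the data frame has columns named `price_rub` and `size_m2` (these names and the sample values are illustrative, not taken from the project notebook):

```python
import pandas as pd

# Illustrative values only.
flats = pd.DataFrame({
    "price_rub": [9_500_000, 14_200_000, 7_800_000],
    "size_m2": [38.0, 54.0, 40.0],
})

# Price per square meter makes flats of different sizes comparable.
flats["price_per_m2"] = (flats["price_rub"] / flats["size_m2"]).round(0)
print(flats)
```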

After removing the outliers, which you can see on the plot below, the final dataset contains 1178 flats.

![image.png](attachment:image.png)
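
The text does not say which rule was used to detect outliers. One common option, shown here purely as an assumption, is the 1.5 × IQR rule applied to the price per square meter; the helper name `drop_iqr_outliers` and the sample values are hypothetical.

```python
import pandas as pd


def drop_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask].reset_index(drop=True)


# Illustrative values: the last flat is far more expensive than the rest.
prices = pd.DataFrame({
    "price_per_m2": [180_000, 200_000, 210_000, 220_000, 230_000, 250_000, 900_000]
})
clean = drop_iqr_outliers(prices, "price_per_m2")
print(len(prices), "->", len(clean))
```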

From the boxplot below you can see how the price per square meter is distributed among Moscow regions. The first region on the plot is the central part of Moscow. Troytsky region has the lowest price per square meter - it is a part of 'New Moscow' and was incorporated into Moscow only a couple of years ago.

![image.png](attachment:image.png)

Using the Yandex API, I found the coordinates of each flat in the analyzed dataset and of the district each flat belongs to:

![image.png](attachment:image.png)
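
The geocoding step might look like the sketch below. It assumes the HTTP Geocoder of the Yandex Maps API with a JSON response, in which the position is returned as a `"longitude latitude"` string; the function names and the sample payload are illustrative, and a valid API key is needed for a real request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODER_URL = "https://geocode-maps.yandex.ru/1.x/"


def build_geocode_url(address: str, api_key: str) -> str:
    """Build the request URL for a single address."""
    query = urlencode({"apikey": api_key, "geocode": address, "format": "json"})
    return f"{GEOCODER_URL}?{query}"


def extract_lat_lon(payload: dict):
    """Return (lat, lon) of the first match, or None if nothing was found."""
    members = payload["response"]["GeoObjectCollection"]["featureMember"]
    if not members:
        return None
    # The geocoder returns "longitude latitude", so the order is swapped here.
    lon, lat = members[0]["GeoObject"]["Point"]["pos"].split()
    return float(lat), float(lon)


def geocode(address: str, api_key: str):
    """Query the geocoder over the network (not called in this sketch)."""
    with urlopen(build_geocode_url(address, api_key), timeout=10) as resp:
        return extract_lat_lon(json.load(resp))


# Trimmed-down sample response, for illustration only.
sample = {"response": {"GeoObjectCollection": {"featureMember": [
    {"GeoObject": {"Point": {"pos": "37.617698 55.755864"}}}
]}}}
print(extract_lat_lon(sample))
```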

We can use a Folium map to see the flats from the final dataset on the map of Moscow:

![image.png](attachment:image.png)

Using a choropleth map, we can also see the average price per square meter for each district.

![image.png](attachment:image.png)

Having the district coordinates, we can find the most popular venues for each district using the Foursquare API:

![image.png](attachment:image.png)
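
Once the venues are downloaded, ranking them per district can be sketched as below. The API call itself is omitted; the sketch assumes a table with one row per venue and hypothetical columns `district` and `category`, and uses one-hot encoding to get the share of each category in a district.

```python
import pandas as pd

# Illustrative venue table: one row per venue found near a district.
venues = pd.DataFrame({
    "district": ["A", "A", "A", "B", "B", "B"],
    "category": ["Park", "Park", "Cafe", "Cafe", "Hotel", "Cafe"],
})

# One-hot encode the categories, then average per district to get frequencies.
onehot = pd.get_dummies(venues["category"], dtype=float)
onehot.insert(0, "district", venues["district"])
freq = onehot.groupby("district").mean()


def top_venues(freq: pd.DataFrame, n: int = 3) -> dict:
    """Map each district to its n most frequent venue categories."""
    return {district: row.nlargest(n).index.tolist() for district, row in freq.iterrows()}


top = top_venues(freq, n=2)
print(top)
```

The `freq` table (districts in rows, category frequencies in columns) is also a natural input for clustering.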

Finally, with all the collected data we can run k-means clustering to group the districts. I generated 11 clusters, as k = 11 appears to be the best value in this case. Because the locations are highly diverse, 7 clusters contain only one district. For example, one district with a zoo was identified as a separate cluster (red dot on the map below), and another one, which has one of the most popular food markets, was also allocated to a separate cluster (green dot). The most interesting for analysis are the 4 clusters that contain many districts, so let's look at each of them.

![image.png](attachment:image.png)
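
The clustering step can be sketched with scikit-learn's `KMeans`. The frequency table below is a toy example with four districts and k = 2, chosen only to keep the sketch small; the project itself used k = 11 on the real venue frequencies.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy venue-frequency table: districts in rows, venue categories in columns.
freq = pd.DataFrame(
    {
        "Park": [0.6, 0.5, 0.1, 0.0],
        "Cafe": [0.1, 0.2, 0.5, 0.6],
        "Hotel": [0.3, 0.3, 0.4, 0.4],
    },
    index=["A", "B", "C", "D"],
)

# Fit k-means; random_state makes the run reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq.to_numpy())

# Attach the cluster label to each district.
clusters = pd.Series(kmeans.labels_, index=freq.index, name="cluster")
print(clusters)
```

Districts with similar venue profiles (here the park-heavy A and B versus the cafe-heavy C and D) end up in the same cluster.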

The first cluster is represented by blue dots on the map above. The most popular venues there are parks. By analyzing the data, we can see that these are the most 'green' districts with plenty of places for playing sports and buying healthy groceries. If you are looking for the quietest and most natural place to live in Moscow, I would recommend one of the districts from the 'blue' cluster:

![image.png](attachment:image.png)

The second cluster, where the dots are colored brown, looks more urban than the previous one. It still has a lot of parks, but there are also many other places, such as a variety of eateries and coffee shops. These districts are also perfect for doing sports, but there you will feel a more 'in town' vibe with plenty of facilities.

![image.png](attachment:image.png)

The next cluster, marked with pink dots, mostly consists of districts located quite far from the town center - many of them are close to airports and have 'Trail' as the most common venue, which means that there are quite a lot of forest areas inside the districts. These places are good for those who do not need to be close to the center of Moscow (for example, those who do not have to commute to the center every day), who have jobs in such remote areas as airports or who just want to live in more ‘natural’ places, but still near the town.

![image.png](attachment:image.png)

The last cluster, labeled with light blue dots on the map, represents the central part of Moscow with a great variety of cafes, restaurants and hotels (yoga studios are also quite popular there) and the most vibrant life. If you want to always feel 'in the center', this is most likely the best (though, obviously, the most expensive) place to live.

![image.png](attachment:image.png)

### **RESULTS**

The research identified four main clusters in Moscow, each reflecting different needs and suiting a different lifestyle. The Blue cluster, as discussed above, is most suitable for people who love parks and nature and need a lot of sports facilities, but still want to be closer to the center and have little demand for venues such as cafes and restaurants or for more modern infrastructure near their home. The Brown cluster is more urban than the Blue one: it also contains a lot of sports facilities, but offers a wider range of venues. The Pink cluster is mostly quite far from the center and has more forest areas. You will probably need a car to live there and comfortably reach all the necessary venues in the district. The Light Blue cluster is the town center with the usual pros and cons of such places: a high diversity of nearby venues, but high prices and almost no ‘green’ areas.

You can see all the clusters described above on the final visualization together with the average prices per square meter in Moscow. Many districts fall under the same cluster but have different prices per square meter. Using this map, you can find the districts with the infrastructure type you want and then look for similar ones in a different price range.

![image.png](attachment:image.png)
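
The same lookup can be done on the data behind the map. The sketch below assumes a table with hypothetical columns `district`, `cluster` and `price_per_m2` and illustrative values; the helper `cheaper_alternatives` is not part of the original notebook.

```python
import pandas as pd

# Illustrative table: one row per district with its cluster and average price.
districts = pd.DataFrame({
    "district": ["A", "B", "C", "D"],
    "cluster": [0, 0, 0, 1],
    "price_per_m2": [300_000.0, 220_000.0, 260_000.0, 400_000.0],
})


def cheaper_alternatives(df: pd.DataFrame, district: str) -> list:
    """Districts from the same cluster with a lower average price, cheapest first."""
    target = df.loc[df["district"] == district].iloc[0]
    same_cluster = df["cluster"] == target["cluster"]
    cheaper = df["price_per_m2"] < target["price_per_m2"]
    return df[same_cluster & cheaper].sort_values("price_per_m2")["district"].tolist()


print(cheaper_alternatives(districts, "A"))
```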

### **DISCUSSION AND CONCLUSION**

Using the flats data parsed from the CIAN website, geodata and Foursquare venues data, we looked at the flat price ranges across different districts in Moscow and their top venues, and clustered the districts according to their most popular venues. This allowed us to reveal the most suitable areas to live for different lifestyles and needs. Of course, this study does not provide a complete picture for deciding whether to buy a flat in a certain location: you must also consider such factors as the condition of the house, the ecological situation, the transportation system, maybe even the crime level of a certain district and a lot more. This study, however, helps to get a first overall impression of each district and of what you can expect in terms of price and facilities if you decide to buy your flat there.

### **REFERENCES**

1. https://www.cian.ru/ – flats data
2. https://tech.yandex.com/maps/ – flats coordinates
3. https://gis-lab.info/qa/moscow-atd.html – districts geodata
4. https://github.com/FUlyankin/Parsers/blob/master/sems/2_CIAN/2.1%20CIAN_parser.ipynb – CIAN parsing recommendations
5. https://github.com/kosticn/The-Battle-of-Neighborhoods/blob/master/The%20Battle%20of%20Neighborhoods.ipynb – scripting tips and ideas for analysis
6. https://developer.foursquare.com/docs/ – venues data
