# Home Away From Home: 
### An Analysis of Potential Cities to Call Home 
##### From the Perspective of the Author, Alan Lee

**Alan Lee**

### Introduction

I, Alan Lee, have lived in the city of Atlanta for more than 25 years. While I was not born a native son of Atlanta, it bore me as its child and showed me what wonders there are to enjoy in life. It has been more than just my home: it has been my mentor for life's experiences, my foster parent, my dearest friend. 

Atlanta has much to pursue within its boundaries: it has a rich and colorful history as its foundation, growing into a multicultural mélange that today welcomes visitors and potential homeowners from across the globe into its travel hub, the busiest airport in the world, Hartsfield-Jackson International; it has internationally recognized colleges and universities that rival some of the best in the nation in medicine, engineering, and business; and it is quickly evolving into a new hub for small businesses and Fortune 500 headquarters alike, thanks in part to new business-friendly legislation. Indeed, it is a fair and beautiful city, full of industrious and lively individuals.

But perhaps it is time to say goodbye to its temperate climes and friendly faces. Perhaps it is time to bid adieu to this city I have for so long called home. 

Now, as a professional in the software industry, I wish to see new cities. I wish to discover new subcultures, new ways of life. I wish to experience more of life and, in a way, the welcoming arms of Atlanta can often feel like the all-too-comforting embrace of a beloved parent. I am too comfortable here.

Please consider then the below analysis, dear reader, through my perspective, through the lens of someone seeking a new home not too unfamiliar and yet still bold and fresh and exciting. Please consider the below analysis as a personal exploration of multiple settings both near and abroad. 

### Background

But what cities to explore? There are so many with different histories and backgrounds and cultures; such varied locales greet me with a variety of factors to measure and interpret. To best determine which cities to dissect requires me to expound on what I'm most seeking in a new home:

* Having come from a fairly liberal background, I would like my prospective home to lean left politically. 
* With my current salary, I can budget a modest $1,600 per month for mortgage payments.
* Since Atlanta is often rated among the worst cities for commuting, I would prefer a city with less than 45 minutes of travel time per trip.

Let us use these criteria to generate a list of cities that have potential.

##### Political Leanings
---

![alt text](http://www.270towin.com/2018-house-election/xpq9bEp.png "2018 House Election Map by District")

---
The above map shows how voters in each district leaned politically. Districts colored a deeper hue of blue voted more liberally, whereas those colored a deeper hue of red leaned more conservatively. Atlanta, surprisingly, is split between north and south, with its northern, more affluent districts voting red and its southern, less affluent districts voting blue.

With an eye towards bluer districts, we can see that there are significant bastions of liberal voters along the West Coast and in the Northeast, stretching from Delaware to New Hampshire and Vermont. There are other pockets in the Midwest, dotted in other states, and in Hawaii, but we will focus on the aforementioned regions for city seeking.

With that in mind, we generate the following list of cities:

* San Diego, CA
* Los Angeles, CA
* San Francisco, CA
* Portland, OR
* Olympia, WA
* Annapolis, MD
* Dover, DE
* Hartford, CT
* New York City, NY
* Providence, RI
* Boston, MA
* Montpelier, VT
* Concord, NH

Let us next compare this list to a map of average monthly mortgage payments.

##### Mortgage Budget
---

![alt text](https://cdn.howmuch.net/articles/americas-mortgage-map-2019-3267.png "2019 Monthly Mortgage Payments by City")

---
The above map shows the average monthly mortgage payments for some cities across the United States in 2019. While we don't have exact numbers for every city on our list, we can make some educated guesses based on the map. For example, unless I marry into a rich family, strike it rich playing the lottery, or significantly lower my standard of living, I will be unlikely to afford even an average house in most any city in California. I can also confirm that New York City is unlikely to fit my lifestyle or standard of living, so that can be safely removed from the list.

For fair comparison, I can confirm that Atlanta has an average monthly mortgage payment of around $1,200-1,300.

After some comparison, we can pare down the list to something like this:

* Annapolis, MD
* Dover, DE
* Hartford, CT
* Providence, RI
* Montpelier, VT
* Concord, NH

Lastly, let us compare this list to an infographic listing the average (one-way) commute times for each state in the United States.

##### Average Commute Time
-----

![alt text](http://2oqz471sa19h3vbwa53m33yj-wpengine.netdna-ssl.com/wp-content/uploads/2018/04/average-commute-to-work-by-state-city.jpg "2018 Average Commute Times by City")

------
The above infographic shows the average commute time to work for each state. Georgia as a whole is consistently rated near the top of this list, but states like Maryland still exceed even Georgia's state average.

Using the infographic to remove the cities whose states' average commute times exceed Georgia's, we are left with the following cities:

* Dover, DE
* Hartford, CT
* Providence, RI
* Montpelier, VT

It is interesting to note that the cities that remain on the list seem to be clustered around the snowy Northeast rather than being more uniformly spread out across the country. Perhaps the Northeast region as a whole might have some hidden gems not seen on this list?

For the sake of this report, let us compare and contrast **Dover, DE** with **Hartford, CT** with a basis in **Atlanta, GA**.

### Data

To analyze the neighborhoods of each city, we will primarily be using Foursquare location data. This is the location data as provided by the Foursquare API: it deals with the venues in a region as determined by latitude and longitude and will be primarily used to understand the changes in a community from a venue perspective during the process of gentrification.

An example of Foursquare location data looks like this:

```json
{
  "meta": {
    "code": 200,
    "requestId": "5ac51d7e6a607143d811cecb"
  },
  "response": {
    "venues": [
      {
        "id": "5642aef9498e51025cf4a7a5",
        "name": "Mr. Purple",
        "location": {
          "address": "180 Orchard St",
          "crossStreet": "btwn Houston & Stanton St",
          "lat": 40.72173744277209,
          "lng": -73.98800687282996,
          "labeledLatLngs": [
            {
              "label": "display",
              "lat": 40.72173744277209,
              "lng": -73.98800687282996
            }
          ],
          "distance": 8,
          "postalCode": "10002",
          "cc": "US",
          "city": "New York",
          "state": "NY",
          "country": "United States",
          "formattedAddress": [
            "180 Orchard St (btwn Houston & Stanton St)",
            "New York, NY 10002",
            "United States"
          ]
        },
        "categories": [
          {
            "id": "4bf58dd8d48988d1d5941735",
            "name": "Hotel Bar",
            "pluralName": "Hotel Bars",
            "shortName": "Hotel Bar",
            "icon": {
              "prefix": "https://ss3.4sqi.net/img/categories_v2/travel/hotel_bar_",
              "suffix": ".png"
            },
            "primary": true
          }
        ],
        "venuePage": {
          "id": "150747252"
        }
      }
    ]
  }
}
```

This JSON object response contains an array of venues located within a set distance from the user, as determined by the latitude, longitude, and search distance in meters found in the request query parameters. Within each venue object in the array is a number of attributes that describe the venue, such as its name, address, exact latitudinal and longitudinal coordinates, and designated category of venue.
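
As a minimal sketch of how such a response could be flattened into rows for analysis, the snippet below walks the `venues` array and keeps only the name, coordinates, postal code, and primary category. The function name `flatten_venues` and the trimmed `SAMPLE_RESPONSE` are illustrative assumptions, not part of the Foursquare API or of the original analysis code.

```python
import json

# Trimmed copy of the sample response shown above (only the fields used here).
SAMPLE_RESPONSE = """
{
  "meta": {"code": 200},
  "response": {
    "venues": [
      {
        "id": "5642aef9498e51025cf4a7a5",
        "name": "Mr. Purple",
        "location": {
          "lat": 40.72173744277209,
          "lng": -73.98800687282996,
          "postalCode": "10002",
          "city": "New York",
          "state": "NY"
        },
        "categories": [
          {"name": "Hotel Bar", "primary": true}
        ]
      }
    ]
  }
}
"""


def flatten_venues(payload):
    """Turn a venue-search payload into a list of flat dicts, one per venue."""
    rows = []
    for venue in payload.get("response", {}).get("venues", []):
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Prefer the category flagged as primary; fall back to the first one.
        primary = next((c for c in categories if c.get("primary")), None)
        if primary is None and categories:
            primary = categories[0]
        rows.append({
            "name": venue.get("name"),
            "lat": location.get("lat"),
            "lng": location.get("lng"),
            "zipcode": location.get("postalCode"),
            "category": primary["name"] if primary else None,
        })
    return rows


venues = flatten_venues(json.loads(SAMPLE_RESPONSE))
print(venues)
```

Using `.get` with defaults throughout means a venue that lacks a postal code or a category still produces a row (with `None` in that field) instead of raising an error.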

In addition to Foursquare location data, I also used data from https://public.opendatasoft.com/ to determine which zipcodes constitute part of each city. This, used in conjunction with the Foursquare location data, allowed me to analyze each city specifically and with granularity.

### Methodology

I initially looked at the datasets for each city and noticed that there was a stark difference in the size of the datasets when comparing Atlanta, GA and Hartford, CT versus Dover, DE. Dover is a significantly smaller city than both Atlanta and Hartford, and thus I predicted that Dover would have far fewer venues as a whole than either Atlanta or Hartford, though I did not know how this would impact the statistical analysis of the cities. I reasoned that, since all three are major cities in their respective states, each should have roughly the same *density of venues per neighborhood*, and thus the difference in size should have no impact on the statistical analysis.

To confirm this, I collected the total dataset of venues for each city from Foursquare. I then grouped each dataset individually by zipcode and aggregated those values into a mean value of venues per zipcode. The results were as follows:

| City          | Venue Density (per zipcode)      |
| ------------- |:--------------------------------:|
| Atlanta       | 17.518519                        | 
| Dover         | 12.666667                        |  
| Hartford      | 8.770833                         | 

I was very surprised to find that Dover, despite it being the smallest of the three cities by far, actually had a greater venue density per neighborhood than Hartford. Ultimately, since the comparison is between Dover and Hartford, I felt I could ignore the high venue density of Atlanta, which made the gulf in comparison of venue density between Dover and Hartford a bit less expansive. I decided to carry on with the analysis and determine after the fact if the difference in venue density would account for any observations in the results.
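
The per-zipcode aggregation behind the table above could be sketched roughly as follows. This is a plain-Python illustration under the assumption that each venue carries exactly one zipcode; the function name `venue_density` and the toy zipcodes are hypothetical and are not the data used in the report.

```python
from collections import Counter
from statistics import mean


def venue_density(venue_zipcodes):
    """Mean number of venues per zipcode, given one zipcode entry per venue."""
    counts = Counter(venue_zipcodes)  # number of venues found in each zipcode
    return mean(counts.values())


# Hypothetical toy input: five venues spread over two zipcodes.
toy_zipcodes = ["30303", "30303", "30303", "30309", "30309"]
density = venue_density(toy_zipcodes)
print(density)  # 2.5
```

Note that a zipcode with no venues at all never appears in the input, so it is not counted in the average; a measure computed this way can overstate density for cities with many empty zipcodes.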

During the analysis, I discovered that the Foursquare location data did not return the same types of venues for each city; for example, Atlanta's location data did not include an "Art Museum" column (despite the fact that there are a handful of art museums in Atlanta), while both Dover's and Hartford's did. I determined that Foursquare was returning only one category per location, even though some locations could have multiple qualifiers. Continuing with the Art Museum example, Atlanta's High Museum of Art was being encoded as a **"Museum"** rather than an **"Art Museum"**. Ultimately there was little that I could do to fix this data, since I know of the discrepancy only from personal experience and the categorization is set on Foursquare's side. To accommodate the missing data, however, I collated each city's dataset into a single dataset and filled in any missing data with zeroes to fit the one-hot pattern.
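
To illustrate the collation step, here is a small plain-Python sketch, assuming each city's venues have already been reduced to a list of category names. The helper names `one_hot` and `combine_with_zero_fill` and the toy categories are hypothetical; the original analysis may well have used a dataframe library instead.

```python
def one_hot(categories):
    """One row per venue: 1 in the column of its category, 0 elsewhere."""
    columns = sorted(set(categories))
    return [{col: int(col == cat) for col in columns} for cat in categories]


def combine_with_zero_fill(tables):
    """Stack per-city one-hot tables, filling columns a city lacks with 0."""
    all_columns = sorted(
        {col for rows in tables.values() for row in rows for col in row}
    )
    combined = []
    for city, rows in tables.items():
        for row in rows:
            filled = {col: row.get(col, 0) for col in all_columns}
            combined.append({"city": city, **filled})
    return all_columns, combined


# Hypothetical toy data: Atlanta has no "Art Museum" category of its own.
tables = {
    "Atlanta": one_hot(["Museum", "Coffee Shop"]),
    "Dover": one_hot(["Art Museum", "Coffee Shop"]),
    "Hartford": one_hot(["Art Museum", "Pawn Shop"]),
}
all_columns, combined = combine_with_zero_fill(tables)
for row in combined:
    print(row)
```

After the fill, every row has the same set of columns, so Atlanta's rows carry an explicit `0` under "Art Museum" rather than a missing value.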

Since there are a sizeable number of factors that go into determining the similarity of two cities, I decided to use K-means clustering to compare Atlanta to Hartford and Dover. My understanding was that, using a k-value of 2 for two clusters, the clustering would eventually align Atlanta with one of the two cities. To prepare the data for the K-means clustering algorithm, I determined that I needed to convert it into a one-hot pattern. To do this for each city as a whole, I determined that I could drop the Zipcode data and instead encode each city's location by adding three columns to each city's venue dataset, one for each city. That way, the clustering algorithm would be able to tell me if Atlanta *as a whole* was more like Dover or Hartford, rather than determining this comparison for *each individual zipcode*.
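
For readers unfamiliar with the algorithm, the following is a bare-bones sketch of K-means (Lloyd's algorithm) with k = 2, written in plain Python so it runs without extra libraries. It is not the implementation used for this report; the function names, the seed, and the toy points are assumptions for illustration, and in practice the input rows would be the one-hot venue rows described above.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, max_iter=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k points picked at random from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: two well-separated groups of points.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group ends up labelled `0` and which `1` depends on the random starting centroids, so only the grouping itself, not the label numbers, should be interpreted.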

### Results

The K-means clustering algorithm returned the following results, which I filtered to include only the cities by cluster:

| Cluster         | Atlanta | Dover | Hartford |
| --------------- |:-------:|:-----:|:--------:|
| 0               | 0.9     | 0.1   | 0.0      |
| 1               | 0.0     | 0.0   | 1.0      |

As you can see, this suggests that Atlanta, GA is more similar to Dover, DE than Hartford, CT. 
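
One way to read a table like this, assuming its values are the share of each cluster's rows that come from each city (which is what the cluster mean of a one-hot city column would represent), is sketched below. The function name `city_share_by_cluster` and the toy labels are hypothetical and chosen only to reproduce the shape of the table, not its underlying data.

```python
from collections import Counter, defaultdict


def city_share_by_cluster(cities, labels):
    """For each cluster, the fraction of its rows that belong to each city."""
    per_cluster = defaultdict(Counter)
    for city, label in zip(cities, labels):
        per_cluster[label][city] += 1
    shares = {}
    for label, counts in per_cluster.items():
        total = sum(counts.values())
        shares[label] = {city: n / total for city, n in counts.items()}
    return shares


# Hypothetical toy assignment: 15 rows, each tagged with a city and a cluster.
cities = ["Atlanta"] * 9 + ["Dover"] + ["Hartford"] * 5
labels = [0] * 10 + [1] * 5
shares = city_share_by_cluster(cities, labels)
print(shares)
```

Under this reading, a row such as `0.9 / 0.1 / 0.0` says that nine in ten of the rows in that cluster are Atlanta rows and the remainder are Dover rows, which is why the two cities are treated as grouped together.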

Below, I have created a choropleth map using Folium to show the clusters by color:

![Choropleth](https://i.imgur.com/dOVd4GD.jpg)

### Discussion

I was surprised to find that Atlanta, GA is more similar to Dover, DE than Hartford, CT. To my naive first impressions, Hartford and Dover are very similar in that they are both located in the predominantly liberal Northeast and have similar climates. Comparing Atlanta, a warmer and more conservative city, to Hartford and Dover seemed like a difficult task, since I expected there would be more differences between Atlanta and the Northeast than I could find between Hartford and Dover. And yet, it was to my delight to find that data science can pick up on even the finest details and draw conclusions from them.

Since Hartford and Atlanta were of similar sizes, I naively thought that the two would share more similarities in their neighborhoods. But as I dig through the data manually, I see that the location data does contain some small amount of economic data encoded into it; borrowing terms from economics, zipcodes with higher numbers of what I call "luxury locations" like museums, art galleries, and coffee shops tend to show more gentrified locations, locations with more affluence, though sometimes at the cost of racial diversity; locations with higher numbers of "normal locations" like pawn shops, gun stores, and liquor stores tend to be less gentrified and less affluent.

As noted in the Methodology, the Foursquare location data returned only one category per venue, even though some venues could carry several qualifiers. This is why Atlanta's dataset had no "Art Museum" column while Dover's and Hartford's did: Atlanta's High Museum of Art was encoded as a **"Museum"** rather than an **"Art Museum"**. There was little I could do to fix this, since the categorization is set on Foursquare's side. In the future, I would like to use a more refined location dataset so that I can apply multiple location categories to each venue to better understand the layout of a neighborhood.

In the future, I would try to refine this process by drawing in further datasets that include economic and geopolitical data for each city. It was tragically quite difficult to collect good data and I would imagine I could have a more refined understanding of the differences and similarities amongst the three cities if I could include different types of data that could affect my impressions of each city, like the aforementioned economic data, instead of relying on the infographics that I displayed early in the report.

### Conclusion

Atlanta, GA has been a beautiful city to me; it has been like a third parent: it has raised me, taught me how to approach life and how to make friends, given me my first steps through life, and put me on my career path. And it is still growing: businesses and movie studios alike are flocking here, bringing with them more jobs, more people, more economic growth. I doubt many could fault me for wanting to plant my roots here and settle down.

And yet, I have wanderlust. Whether it's the folly of my youth or the spirit of adventure that stirs within me, I seek a new city, with new names, new faces, new stories to learn and to tell. This analysis was just the first step of many to better understand the cities that call to me. This is more than just a report or a project; this is the first step on my journey to find my home away from home.