# Introduction/Business Problem

ABC Breweries has multiple locations on the East Coast. Around each of these breweries, a main source of revenue for ABC is selling their beer to nearby local bars.

ABC is planning on starting their expansion across the country by adding 1 more location outside of the East Coast. They have done a variety of market research, and have narrowed their options down to Dallas, TX and Boise, ID. 

The executives of ABC Breweries would like to learn more about how important the "bar scene" is to each of these cities. They want to make sure to pick a city where an adequate portion of the most popular destinations are bars. This will help them be confident that the bars that they are selling their beer to will continue to thrive (and continue to purchase their beer), due to being in a generally favorable environment for bar owners.

They've also found that they generally sell more beer to the bars that are closest in proximity to their East Coast locations, so whichever city they choose, they want to make sure to build their brewery in a location that is reasonably close to some of the most popular bars.

As data scientists, our job is to help the stakeholders (ABC's executives) use data to compare whether Dallas or Boise may have an environment that is more favorable for bars, and therefore for ABC's continued revenue growth. We also want to show them where some of the most popular bars are, so that they can best choose a specific location to have their brewery.

# Data

To solve this problem for the stakeholders, we will be relying on Foursquare location data and other geospatial data. We will need to make separate calls to the Foursquare API: one set of calls for Dallas's data, and another set of calls for Boise's data. 

We will be using Foursquare's "Explore" endpoint to find 100 recommended venues in the vicinity of Dallas, and will do the same for Boise.

This data will need to be cleaned and formatted into a more digestible dataframe so that we can better interpret it and eventually report back to the executives.

For each city, we will focus on the Categories of the 100 recommended venues (e.g. Bar, Hotel, Park, etc.).

We will want to filter the Dallas and Boise dataframes down to only show the venues with a Category including the word "bar". We want to make sure this includes "Hotel Bars", "Dive Bars", and so on, in addition to strictly "Bars".

These filtered results, compared to the original list of 100 recommended venues, will show us the proportion of recommended venues in the city that are "bars" of some sort.

We can then compare the proportion in Dallas vs the proportion in Boise and report back to the stakeholders with the city that seems to have a more favorable environment for bars.

Finally, for each of the cities, we'd like to visualize our filtered results by plotting each of the top recommended bars onto a map, so that the stakeholders can determine the ideal spot to have the newest location of ABC Breweries.

# Methodology

The data requirements above call for installing and importing a number of libraries, along with a few individual functions and classes:

- requests
- pandas
- numpy
- random
- geopy
- json_normalize
- Image
- HTML
- folium

We then need to initialize a geolocator object and use it to look up the latitude/longitude coordinates of Dallas, TX.
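The lookup could be sketched as follows, assuming geopy's `Nominatim` geocoder is the one in use (the report lists geopy but does not name the geocoder). The `user_agent` string and both function names are hypothetical.

```python
def make_geolocator(user_agent="abc_breweries_explorer"):
    """Create a Nominatim geolocator (the user_agent value is a made-up identifier)."""
    # Imported here so lookup_coordinates can be tried without geopy installed.
    from geopy.geocoders import Nominatim

    return Nominatim(user_agent=user_agent)


def lookup_coordinates(place, geolocator):
    """Return (latitude, longitude) for a place name such as "Dallas, TX"."""
    location = geolocator.geocode(place)
    if location is None:
        # geocode() returns None when the service finds no match.
        raise ValueError(f"Could not geocode {place!r}")
    return location.latitude, location.longitude
```

Calling `lookup_coordinates("Dallas, TX", make_geolocator())` performs a network request, and the same call with `"Boise, ID"` would cover the second city.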

Following this, we need to define our Foursquare credentials as a set of variables, along with "radius" and "LIMIT" variables, so that we can make calls to the Foursquare API.

Along with the credentials, we must define a URL that uses the "explore" endpoint and our credentials. The "explore" endpoint will allow us to see 100 recommended venues in each respective city. We will then manipulate these data sets to understand what portion of the recommended venues are bars.
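As a rough illustration, the URL could be assembled like this. The sketch assumes the legacy Foursquare v2 `venues/explore` endpoint. The credentials are placeholders, and the version date, radius, and coordinates are illustrative values, not the ones used in this project.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "explore" endpoint.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version,
                      latitude, longitude, radius, limit):
    """Build an explore URL around a latitude/longitude pair."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                      # API version date, YYYYMMDD
        "ll": f"{latitude},{longitude}",   # centre point of the search
        "radius": radius,                  # search radius in metres
        "limit": limit,                    # maximum number of venues requested
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and illustrative values only.
url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
    32.7767, -96.7970, radius=10000, limit=100,
)
```

The JSON results would then typically be fetched with `requests.get(url).json()`.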

We will receive the results as a lengthy JSON response, much of which is irrelevant to our purposes. We will handle this by dropping all information except the items of interest to us. This will be done by creating a "filtered columns" variable and applying it to the JSON data after it has been normalized into a dataframe.

We will be left with a dataframe with:

- venue ID
- venue name
- venue location address
- venue location latitude
- venue location longitude
- venue location city
- venue category
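A minimal sketch of the normalize-and-filter step follows. It assumes the response nests venues under `response.groups[0].items`, as the legacy v2 explore endpoint does, and that the dotted column names are the ones `pandas.json_normalize` produces from that layout. The sample payload and the function name are made up for illustration.

```python
import pandas as pd

# The "filtered columns": the only fields kept from the normalized response.
FILTERED_COLUMNS = [
    "venue.id",
    "venue.name",
    "venue.location.address",
    "venue.location.lat",
    "venue.location.lng",
    "venue.location.city",
    "venue.categories",
]


def venues_to_dataframe(results):
    """Flatten an explore-style response and keep only the columns of interest."""
    items = results["response"]["groups"][0]["items"]
    flat = pd.json_normalize(items)
    # reindex keeps the listed columns and fills any missing ones with NaN.
    return flat.reindex(columns=FILTERED_COLUMNS)


# Hypothetical two-venue payload, shaped like an explore response.
sample_results = {"response": {"groups": [{"items": [
    {
        "reasons": {"count": 0},
        "venue": {
            "id": "hypothetical-id-1",
            "name": "Example Taproom",
            "location": {"address": "1 Main St", "lat": 32.78, "lng": -96.78,
                         "city": "Dallas", "postalCode": "75201"},
            "categories": [{"name": "Dive Bar"}],
        },
    },
    {
        "venue": {
            "id": "hypothetical-id-2",
            "name": "Example Park",
            "location": {"lat": 32.79, "lng": -96.80, "city": "Dallas"},
            "categories": [{"name": "Park"}],
        },
    },
]}]}}

venues = venues_to_dataframe(sample_results)
```

Using `reindex` instead of plain column selection avoids a `KeyError` when no venue in the response carries an optional field such as the address.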

More cleanup will need to be done, as the venue category output from the Foursquare API is not provided in an easily readable format. We can fix this by defining a function to extract the relevant information from the venue category output, so that we are left with a "cleaner" category.
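One way such a function might look, assuming each venue's category arrives as a list of dictionaries with a `name` key (the layout used by the legacy v2 API); the function name and the sample values are hypothetical:

```python
def get_category_name(categories):
    """Return the name of the first category in a Foursquare-style list, or None."""
    if isinstance(categories, list) and categories:
        return categories[0].get("name")
    # Missing or empty category data yields None rather than an error.
    return None


# Made-up raw category values, as they might appear in the normalized dataframe.
raw_categories = [
    [{"id": "hypothetical", "name": "Dive Bar", "shortName": "Dive Bar"}],
    [{"name": "Park"}],
    [],
]
clean_categories = [get_category_name(c) for c in raw_categories]
```

On a dataframe, the same function could be applied with something like `venues["venue.categories"].apply(get_category_name)`, where the column name is again an assumption.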

We next want to filter down our dataset of 100 recommended venues so that we can see the ones that are bars. We will apply a filter to only show rows where the category contains the word "bar". Our output will show us a variety of types of bars:

- Bar
- Hotel Bar
- Dive Bar
- Sports Bar
- Cocktail Bar
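The filter could be sketched as below. This is an assumption about the implementation rather than the exact code used: it matches "bar" as a whole word, case-insensitively, so that "Bar" with a capital B is caught while a category such as "Barbershop" is not. The column name `category` and the sample rows are hypothetical.

```python
import pandas as pd


def filter_bars(venues, category_column="category"):
    """Keep only rows whose category contains the word "bar"."""
    is_bar = venues[category_column].str.contains(
        r"\bbar\b", case=False, regex=True, na=False
    )
    return venues[is_bar]


# Made-up venues covering bar and non-bar categories.
venues = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "category": ["Bar", "Hotel Bar", "Dive Bar", "Sports Bar",
                 "Cocktail Bar", "Barbershop", "Park", None],
})
bars = filter_bars(venues)
```

A category such as "Juice Bar" would still pass this filter, so the filtered rows are worth a manual check before counting them.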

The number of results from this filter, divided by 100, gives the proportion of the city's top 100 recommended venues that are bars of some sort. This gives us a rough measure of how "bar-friendly" the city is (i.e. how interested its population is in bars). A higher proportion gives us greater confidence that bars in that city will be likely to thrive, which means they could make more frequent purchases of ABC's beer and help to grow ABC's revenues.
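The proportion and a per-type breakdown could be computed in one step, for example as follows. The function name, the whole-word match, and the ten sample categories are illustrative assumptions.

```python
import pandas as pd


def summarize_bars(categories):
    """Return (share of venues that are bars, counts per bar type)."""
    is_bar = categories.str.contains(r"\bbar\b", case=False, regex=True, na=False)
    share = float(is_bar.mean())            # mean of booleans = proportion of True
    by_type = categories[is_bar].value_counts()
    return share, by_type


# Ten made-up venue categories, three of which are bars.
categories = pd.Series([
    "Dive Bar", "Park", "Hotel", "Bar", "Museum",
    "Dive Bar", "Coffee Shop", "Pizza Place", "Gym", "Bookstore",
])
share, by_type = summarize_bars(categories)
```

The share is taken over all venues returned, so it remains meaningful if the API returns fewer than 100 venues for a city.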

Finally, we will want to visualize where these recommended bars are actually located. We will create a map centered on each city and loop over the bars to plot a point for each one, based on its latitude and longitude. We will label each point with the bar name and the type of bar, so that ABC's executives can easily focus on certain types of bars (and ignore others) if they'd like. This visualization will help ABC decide on a specific area of that city in which to open their brewery, since proximity to local bars has also been shown to correlate with increased revenues at its other locations.
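A sketch of the mapping loop with folium follows. The column names (`lat`, `lng`, `name`, `category`), the marker styling, and the zoom level are assumptions chosen for illustration.

```python
def bar_label(name, category):
    """Popup text showing the bar name and its type."""
    return f"{name} ({category})"


def build_bar_map(bars, center, zoom_start=13):
    """Return a folium map centred on `center` with one marker per bar."""
    # Imported here so bar_label can be reused without folium installed.
    import folium

    bar_map = folium.Map(location=list(center), zoom_start=zoom_start)
    for lat, lng, name, category in zip(
        bars["lat"], bars["lng"], bars["name"], bars["category"]
    ):
        folium.CircleMarker(
            location=[lat, lng],
            radius=6,
            popup=folium.Popup(bar_label(name, category), parse_html=True),
            color="blue",
            fill=True,
            fill_opacity=0.7,
        ).add_to(bar_map)
    return bar_map
```

In a notebook the returned map renders inline, and `build_bar_map(bars, (lat, lng)).save("bars.html")` would write it to a file that can be shared with the executives.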

Once we complete the above process for Dallas, we will repeat it for Boise and compare the proportions of the 100 recommended venues that are bars in each city, as well as the geographical layout of those top bar venues. These comparisons, along with ABC's other market research, will help ABC make a decision on which city to expand to.




# Results

Based on the above methodology, we were able to determine that 13% of Foursquare's 100 recommended venues in Dallas were bars of some sort. More specifically, there were:

- 5 "bars"
- 1 "hotel bar"
- 3 "dive bars"
- 1 "sports bar"
- 3 "cocktail bars"

Geographically speaking, the Deep Ellum and Downtown neighborhoods featured 6 of the 13 recommended bars, with the other 7 somewhat scattered across the city.

For Boise, 8% of Foursquare's 100 recommended venues were bars of some sort. More specifically, there were:

- 4 "bars"
- 1 "sports bar"
- 1 "hotel bar"
- 2 "dive bars"

Geographically, 4 of the 8 recommended bars are clustered right in the heart of Boise, in the Downtown area just north of Boise State University, while the other 4 are more scattered across the city.
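The percentages above follow directly from the per-type counts. A short check, using only the counts reported in this section (the variable names are illustrative):

```python
TOTAL_VENUES = 100  # recommended venues requested per city

# Counts by bar type, as reported above.
dallas = {"bar": 5, "hotel bar": 1, "dive bar": 3, "sports bar": 1, "cocktail bar": 3}
boise = {"bar": 4, "sports bar": 1, "hotel bar": 1, "dive bar": 2}


def share(counts, total=TOTAL_VENUES):
    """Fraction of the recommended venues that are bars of any type."""
    return sum(counts.values()) / total


for city, counts in (("Dallas", dallas), ("Boise", boise)):
    print(f"{city}: {sum(counts.values())} bars, {share(counts):.0%} of recommended venues")
```

The gap between the two cities is 5 bars out of 100 venues, or 5 percentage points.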

# Discussion

Strictly based on these datasets, Dallas would appear to be the front runner for the next city for ABC to expand to. Bars account for 13% of Foursquare's 100 recommended venues for the area, compared to 8% for Boise. 

Furthermore, there is a favorable cluster of 6 bars in Deep Ellum and Downtown, the midpoint of which could make for an excellent place to open the brewery and begin a close business relationship with ABC's new bar-neighbors.

However, real estate prices in Dallas may be prohibitive compared to Boise. Furthermore, we've found that Downtown Boise offers its own favorable cluster of bars, with 4 appearing close together. This cluster is also close to Boise State University, where college students may be especially interested in trying new beers.

Another thing ABC's executives need to consider is whether they can rely on "cocktail bars" as potential customers to sell their beer to. They need to look at their East Coast operations and determine whether these types of bars usually account for much revenue for them. If not, Dallas is not quite as appealing as it may look at first glance, since 3 of the 13 recommended bars are in fact cocktail bars (including 1 of the bars in the Deep Ellum cluster).

On the other hand, none of the 4 bars in the Downtown Boise cluster are "cocktail bars". So, if ABC determines that cocktail bars are not good potential buyers of their product, then the Deep Ellum and Downtown cluster in Dallas really only features 5 relevant bars, while the cluster in Boise has 4.
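The effect of setting cocktail bars aside can be worked through with the counts already reported (the dictionary layout and names are illustrative):

```python
# Reported counts: all recommended bars, bars in the main cluster,
# and how many of each are cocktail bars.
cities = {
    "Dallas": {"bars": 13, "cocktail": 3, "cluster": 6, "cluster_cocktail": 1},
    "Boise": {"bars": 8, "cocktail": 0, "cluster": 4, "cluster_cocktail": 0},
}

adjusted = {
    city: {
        "bars": c["bars"] - c["cocktail"],
        "cluster": c["cluster"] - c["cluster_cocktail"],
    }
    for city, c in cities.items()
}

for city, counts in adjusted.items():
    print(f"{city}: {counts['bars']} non-cocktail bars, {counts['cluster']} in the main cluster")
```

Without cocktail bars, Dallas drops from 13 to 10 recommended bars and from 6 to 5 in its cluster, while Boise stays at 8 and 4, so the gap narrows but does not close.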

We recommend that ABC's executives weigh the data from this project alongside potentially major differences in real estate prices between Dallas and Boise. Given the attractive cluster of highly recommended bars in Downtown Boise, plus the proximity to college students, who are typically more frequent beer drinkers, we believe that Boise could be the better option for ABC if Dallas real estate prices are prohibitive. However, in a vacuum (i.e. if real estate prices prove to be similar in the 2 cities), Dallas remains the more favorable option of the two, especially if ABC believes that it can sell effectively to cocktail bars. ABC may want to research the cocktail bars in the Dallas data set to determine whether they sell beer.

# Conclusion

ABC's executives need to weigh these data sets and cluster maps along with the other market research that they've done. A higher proportion of Foursquare's 100 recommended venues for Dallas are bars, compared to Boise, but there are some complicating factors, such as whether (and how much) beer can be sold to "cocktail bars" and whether real estate prices will be significantly different between the two cities. If the answers to these questions are favorable for Dallas, then that city would appear to have a slight edge over Boise. If not, we believe that Boise (particularly the Downtown cluster) is by no means a significant downgrade from Dallas.

# Sample Blog Post

At SJ Data Science, we are always finding ways to help our clients solve problems. Today we wanted to share a story about one of our closest clients, ABC Breweries.

For many years, ABC Breweries has had multiple locations on the East Coast. Around each of these breweries, a main source of revenue for ABC is selling their beer to nearby local bars.

Last year, ABC came to us to let us know they were planning on starting their expansion across the country by adding 1 more location outside of the East Coast. They had done a variety of market research, and had narrowed their options down to Dallas, TX and Boise, ID.

The executives of ABC Breweries wanted to learn more about how important the "bar scene" is to each of these cities. They wanted to make sure to pick a city where an adequate portion of the most popular destinations are bars. They figured this would help them be confident that the bars that they are selling their beer to will continue to thrive (and continue to purchase their beer), due to being in a generally favorable environment for bar owners.

They also let us know that they generally sell more beer to the bars that are closest in proximity to their East Coast locations, so whichever city they choose for the expansion, they wanted to make sure to build their brewery in a location that is reasonably close to some of the most popular bars.

We were excited to help ABC by using data to compare whether Dallas or Boise may have an environment that is more favorable for bars, and therefore for ABC's continued revenue growth. We also wanted to show them where some of the most popular bars are, so that they could best choose a specific location to have their brewery.

So how did we help them?

To solve this problem for ABC, we were able to use Foursquare location data for both cities. 

We used Foursquare's "Explore" feature to focus on the Categories of Foursquare's 100 recommended venues for each city (e.g. Bar, Hotel, Park, etc.).

We were able to take this data and filter the Dallas and Boise dataframes down to only show the venues with a Category including the word "bar". This included "Hotel Bars", "Dive Bars", and so on, in addition to strictly "Bars".

We compared these filtered results to the original list of 100 recommended venues to find the proportion of recommended venues in each city that are "bars" of some sort.

We were then able to compare the proportion in Dallas vs the proportion in Boise and report back to ABC with the city that seems to have a more favorable environment for bars.

For each of the cities, we were also able to visualize our filtered results by plotting each of the top recommended bars onto a map, so that ABC could determine the ideal spot to have the newest location of ABC Breweries.

We had some interesting results from this project. 13% of Dallas's 100 recommended venues were bars, while only 8% of Boise's were. However, a closer look at the data showed us that Dallas featured several "cocktail bars" in these recommended venues. We were concerned that these types of bars might not be great clients for ABC, so we had them check what proportion of their East Coast revenues came from bars like this.

It turned out that ABC indeed does not collect much revenue from cocktail bars, so Dallas lost a bit of its luster. Plus, real estate prices were materially higher in Dallas compared to Boise. The final deciding factor was that our Boise map showed us that there was an attractive cluster of bars in Downtown Boise, none of which were "cocktail bars". So, ABC was confident that if they opened in that area, their close proximity to these new potential bar customers (and to Boise State University's beer-drinking students) would pay off.

In the end, ABC Breweries went with Boise for their next expansion, and we're excited to see how their opening goes next year!