#### The criteria to be met:
Designers like to go to design talks and share knowledge. There must be some nearby companies that also do design.

30% of the company staff have at least 1 child.

Developers like to be near successful tech startups that have raised at least 1 Million dollars.

Executives like Starbucks A LOT. Ensure there's a starbucks not too far.

Account managers need to travel a lot.

Everyone in the company is between 25 and 40, give them some place to go party.

The CEO is vegan.

If you want to make the maintenance guy happy, a basketball stadium must be around 10 Km.

The office dog—"Dobby" needs a hairdresser every month. Ensure there's one not too far away.

#### Importing the Libraries

In [98]:
#!pip install folium

In [99]:
import numpy as np
from pymongo import MongoClient
import pandas as pd
import time
import os
import requests
import json
from dotenv import load_dotenv
import geopandas as gpd
import geopy.distance

In [100]:
import folium
from folium import Choropleth, Circle, Marker, Icon, Map
from folium.plugins import HeatMap, MarkerCluster

In [101]:
from cartoframes.viz import Map, Layer, popup_element

In [102]:
client = MongoClient("localhost:27017")
db = client["Ironhack"]
companies = db.get_collection("Companies")

#### I want to get a perspective on what information does my collection have

In [103]:
companies.find_one().keys()

dict_keys(['_id', 'name', 'permalink', 'crunchbase_url', 'homepage_url', 'blog_url', 'blog_feed_url', 'twitter_username', 'category_code', 'number_of_employees', 'founded_year', 'founded_month', 'founded_day', 'deadpooled_year', 'tag_list', 'alias_list', 'email_address', 'phone_number', 'description', 'created_at', 'updated_at', 'overview', 'image', 'products', 'relationships', 'competitions', 'providerships', 'total_money_raised', 'funding_rounds', 'investments', 'acquisition', 'acquisitions', 'offices', 'milestones', 'video_embeds', 'screenshots', 'external_links', 'partners'])

#### I assume a good starting point is to narrow it down by category and eventually by founding year, to be in the same vibe as more recent companies

In [104]:
#what different Categories do we have?
distinct_categories = companies.distinct("category_code")

for category in distinct_categories:
    print(category)


None
advertising
analytics
automotive
biotech
cleantech
consulting
design
ecommerce
education
enterprise
fashion
finance
games_video
government
hardware
health
hospitality
legal
local
manufacturing
medical
messaging
mobile
music
nanotech
network_hosting
news
nonprofit
other
photo_video
public_relations
real_estate
search
security
semiconductor
social
software
sports
transportation
travel
web


#### OK, I've got my eye on tech companies: 
categories: analytics, ecommerce, games_video (OBV!), software and web

In [105]:
#let's find all the companies 
condition_1 = {"category_code": "analytics"}
condition_2 = {"category_code": "ecommerce"}
condition_3 = {"category_code": "games_video"}
condition_4 = {"category_code": "software"}
condition_5 = {"category_code": "web"}

projection = {"_id": 0, "name":1,"category_code":1, "founded_year":1}

print(f"there are {len(list(companies.find(condition_1, projection)))} companies in Analytics")
print(f"there are {len(list(companies.find(condition_2, projection)))} companies in ecommerce")
print(f"there are {len(list(companies.find(condition_3, projection)))} companies in games_video")
print(f"there are {len(list(companies.find(condition_4, projection)))} companies in software")
print(f"there are {len(list(companies.find(condition_5, projection)))} companies in web")

there are 66 companies in Analytics
there are 688 companies in ecommerce
there are 1083 companies in games_video
there are 2736 companies in software
there are 3787 companies in web


### Hot damn, that's a lotta companies.

In [122]:
condition_1 = {"category_code": "analytics"}
condition_2 = {"category_code": "ecommerce"}
condition_3 = {"category_code": "games_video"}
condition_4 = {"category_code": "software"}
condition_5 = {"category_code": "web"}
minimum_founding_year = 2010
condition_founded_year = {"founded_year": {"$gt":minimum_founding_year}}

projection = {"_id": 0, "name":1,"category_code":1, "founded_year":1}

print(f"there are {len(list(companies.find({'$and': [condition_1, condition_founded_year]}, projection)))} companies in Analytics founded after {minimum_founding_year}")
print(f"there are {len(list(companies.find({'$and': [condition_2, condition_founded_year]},projection)))} companies in ecommerce founded after {minimum_founding_year}")
print(f"there are {len(list(companies.find({'$and': [condition_3, condition_founded_year]},projection)))} companies in games_video founded after {minimum_founding_year}")
print(f"there are {len(list(companies.find({'$and': [condition_4, condition_founded_year]},projection)))} companies in software founded after {minimum_founding_year}")
print(f"there are {len(list(companies.find({'$and': [condition_5, condition_founded_year]},projection)))} companies in web founded after {minimum_founding_year}")

print(f"total of {len(list(companies.find({'$and': [condition_1, condition_founded_year]}, projection))) + len(list(companies.find({'$and': [condition_2, condition_founded_year]},projection))) + len(list(companies.find({'$and': [condition_3, condition_founded_year]},projection))) + len(list(companies.find({'$and': [condition_4, condition_founded_year]},projection))) + len(list(companies.find({'$and': [condition_5, condition_founded_year]},projection)))} companies")

there are 2 companies in Analytics founded after 2010
there are 4 companies in ecommerce founded after 2010
there are 10 companies in games_video founded after 2010
there are 7 companies in software founded after 2010
there are 21 companies in web founded after 2010
total of 44 companies


#### let's work with that

In [107]:
condition_1 = {"category_code": "analytics"}
condition_2 = {"category_code": "ecommerce"}
condition_3 = {"category_code": "games_video"}
condition_4 = {"category_code": "software"}
condition_5 = {"category_code": "web"}
minimum_founding_year = 2010
condition_founded_year = {"founded_year": {"$gt":minimum_founding_year}}

projection = {"_id": 0, "name":1,"category_code":1, "founded_year":1, "offices.latitude":1, "offices.longitude":1}

conditions = [
    {"$and": [condition_1, condition_founded_year]},
    {"$and": [condition_2, condition_founded_year]},
    {"$and": [condition_3, condition_founded_year]},
    {"$and": [condition_4, condition_founded_year]},
    {"$and": [condition_5, condition_founded_year]},]

# Create the final query with the $or operator
the_5_different_categories = {"$or": conditions}

# Find companies that meet any of the specified conditions
list_narrowed_companies_to_benchmark = list(companies.find(the_5_different_categories, projection))

#### Let's go and try start plotting these dots on the map:

In [240]:
pd.options.display.max_columns = None

In [108]:
#1. Let's create a dataframe: 
companies_to_benchmark = pd.DataFrame(list_narrowed_companies_to_benchmark)
companies_to_benchmark.sample(5)
#well, nice, some of them don't even have the coords. Cool that we've kept a medium sized list

Unnamed: 0,name,category_code,founded_year,offices
28,Newstree,analytics,2012,"[{'latitude': None, 'longitude': None}]"
5,Social Gaming Network,games_video,2011,"[{'latitude': 37.446823, 'longitude': -122.161..."
22,FamilyDen,software,2011,[]
43,DocASAP,web,2012,"[{'latitude': None, 'longitude': None}]"
37,Shopseen,ecommerce,2013,"[{'latitude': 37.772323, 'longitude': -122.214..."


In [109]:
#the coordinates are fuc*ed up all together in the same col. Let's split them:
def extract_latitude(office):
    if office !=[]:
        return office[0].get('latitude', None)
    else:
        return None

def extract_longitude(office):
    if office !=[]:
        return office[0].get('longitude', None)
    else:
        return None

companies_to_benchmark['latitude'] = companies_to_benchmark['offices'].apply(extract_latitude)
companies_to_benchmark['longitude'] = companies_to_benchmark['offices'].apply(extract_longitude)

#don't need the offices anymore, thank you for your service:
companies_to_benchmark.drop('offices',axis=1,inplace=True)
companies_to_benchmark.sample(5)


Unnamed: 0,name,category_code,founded_year,latitude,longitude
5,Social Gaming Network,games_video,2011,37.446823,-122.161523
0,Mokitown,web,2011,37.09024,-95.712891
15,Bling Easy,web,2012,,
29,VisualOn,software,2011,37.270518,-121.955879
33,Topify,web,2012,,


#### not all the companies have coordinates, let's drop them. Thank you for your service:

In [110]:
companies_to_benchmark_with_coordinates = companies_to_benchmark.dropna()
companies_to_benchmark_with_coordinates.head()

Unnamed: 0,name,category_code,founded_year,latitude,longitude
0,Mokitown,web,2011,37.09024,-95.712891
1,headr,web,2012,52.501345,13.410907
2,Fixya,web,2013,37.566879,-122.323895
4,RazorGator,ecommerce,2011,34.047312,-118.445243
5,Social Gaming Network,games_video,2011,37.446823,-122.161523


In [111]:
companies_to_benchmark_with_coordinates.shape
# we've got 19 companies to benchmark now

(19, 5)

In [112]:
#let's plot them in the map. I am using the mean() instead of a specific point
#from the dataframe as we've seen in class. Basically I'm calculating what's the avergage "center" in the map, so to say

#https://fontawesome.com/search?q=work&o=r for my icons
companies_to_benchmark_map = folium.Map(location=[companies_to_benchmark_with_coordinates['latitude'].mean(), companies_to_benchmark_with_coordinates['longitude'].mean()], zoom_start=10)

for index, row in companies_to_benchmark_with_coordinates.iterrows():
        icon = Icon(
        icon="person-digging", #person digging because I thought funny as we're going to "build" a new HQ, whtvs
        prefix="fa",
        color="blue",
        icon_color="red",
    )
        
        marker = Marker(
        location=[row["latitude"], row["longitude"]],
        tooltip=row["name"],
        icon=icon,
    )
        folium.Marker([row['latitude'], row['longitude']], tooltip=row['name']).add_to(companies_to_benchmark_map)

        marker.add_to(companies_to_benchmark_map)

companies_to_benchmark_map


#### I now want to start addressing the distance of interest points to the markers

In [113]:
import os
from dotenv import load_dotenv
load_dotenv() # load_env

True

In [114]:
cc = os.getenv("credit_card")
token = os.getenv("token")

### 1. Designers like to go to design talks and share knowledge. There must be some nearby companies that also do design.

## to get the different possible queries from foursquare:
### https://location.foursquare.com/places/docs/categories

In [155]:
# what design companies are there nearby?
#let's find all the companies 
condition_1 = {"category_code": "design"}
projection = {"_id": 0, "name":1,"category_code":1}

print(f"there are {len(list(companies.find(condition_1, projection)))} companies in Design in the database")

there are 4 companies in Design in the database


#### uff 4 companies, that's not a lot. And 3 of them don't have coordinates 

In [156]:
#maybe if we ask foursquare? Let's take as an example the first company 
mokitown_lat = 37.09024
mokitown_long = -95.712891

lat = mokitown_lat
lon = mokitown_long

In [157]:
def requests_for_foursquare (query, lat, lon, radius=50000, limit=10):

    url = f"https://api.foursquare.com/v3/places/search?query={query}&ll={lat}%2C{lon}&radius={radius}&limit={limit}"

    headers = {
        "accept": "application/json",
        "Authorization": token
    }
    
    try:
        return requests.get(url, headers=headers).json()
    except:
        print("no :(")

In [158]:
results_design_studios = requests_for_foursquare ("design studio", lat, lon, radius=50000, limit=10)
results_design_studios

{'results': [{'fsq_id': '52c9dac711d21fde707c5e8f',
   'categories': [{'id': 11138,
     'name': 'Photographer',
     'short_name': 'Photographer',
     'plural_name': 'Photographers',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/default_',
      'suffix': '.png'}}],
   'chains': [],
   'closed_bucket': 'LikelyOpen',
   'distance': 47692,
   'geocodes': {'main': {'latitude': 37.340346, 'longitude': -95.27499},
    'roof': {'latitude': 37.340346, 'longitude': -95.27499}},
   'link': '/v3/places/52c9dac711d21fde707c5e8f',
   'location': {'address': '2522 Main St',
    'census_block': '200999502001017',
    'country': 'US',
    'dma': 'Joplin-Pittsburg',
    'formatted_address': '2522 Main St, Parsons, KS 67357',
    'locality': 'Parsons',
    'postcode': '67357',
    'region': 'KS'},
   'name': 'Mandi Lever Photography',
   'related_places': {},
   'timezone': 'America/Chicago'},
  {'fsq_id': '4d8cc9e81716a143ec4108f7',
   'categories': [{'id': 11094,
     'na

In [159]:
venues = results_design_studios.get('results', [])

venues_with_coordinates = []

for venue in venues:
    venue_name = venue.get('name', '')  

    geocodes = venue.get('geocodes', {}).get('main', {})
    venue_latitude = geocodes.get('latitude', None)
    venue_longitude = geocodes.get('longitude', None)

    if venue_name and venue_latitude is not None and venue_longitude is not None:
        venues_with_coordinates.append({
            'name': venue_name,
            'latitude': venue_latitude,
            'longitude': venue_longitude
        })

print(venues_with_coordinates)


[{'name': 'Mandi Lever Photography', 'latitude': 37.340346, 'longitude': -95.27499}, {'name': 'Sherwin-Williams', 'latitude': 36.746054, 'longitude': -95.936165}, {'name': 'Bleacher Gear', 'latitude': 37.34023, 'longitude': -95.262227}]


In [160]:
companies_to_benchmark_with_coordinates.head()

Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Starbucks nearby
0,Mokitown,web,2011,37.09024,-95.712891,"[{'name': 'Mandi Lever Photography', 'latitude...",
1,headr,web,2012,52.501345,13.410907,"[{'name': 'Zalando Customer Care', 'latitude':...",
2,Fixya,web,2013,37.566879,-122.323895,[{'name': 'Facebook Analog Research Laboratory...,
4,RazorGator,ecommerce,2011,34.047312,-118.445243,"[{'name': 'Smashbox Studios', 'latitude': 34.0...",
5,Social Gaming Network,games_video,2011,37.446823,-122.161523,[{'name': 'Facebook Analog Research Laboratory...,


In [161]:
# In my dataframe I want a new column to store the nearby design studios
companies_to_benchmark_with_coordinates['Design Studios nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("design studio", lat, lon, radius=50000, limit=10)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Design Studios nearby'] = venues_with_coordinates

print(companies_to_benchmark_with_coordinates)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Design Studios nearby'] = None


                     name category_code  founded_year   latitude   longitude  \
0                Mokitown           web          2011  37.090240  -95.712891   
1                   headr           web          2012  52.501345   13.410907   
2                   Fixya           web          2013  37.566879 -122.323895   
4              RazorGator     ecommerce          2011  34.047312 -118.445243   
5   Social Gaming Network   games_video          2011  37.446823 -122.161523   
8                    Fuzz   games_video          2011  37.760524 -122.387799   
9                Carfeine      software          2012  38.989124  -77.026676   
10                 Ziippi           web          2011  37.444098 -122.161287   
13             Pixelmatic   games_video          2011  49.263588 -123.138565   
14                 Gimigo     analytics          2013  44.859587  -93.226503   
17                  Kidos   games_video          2011  40.768058  -73.956599   
18                  Kidos   games_video 

In [394]:
companies_to_benchmark_with_coordinates.sample(3)

Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Design Studios Count,Starbucks nearby,Starbucks Count,Daycare nearby,Daycare Count,Airport nearby,Airport Count,Train nearby,Train Count,Metro Station nearby,Metro Station Count,Night Club nearby,Night Club Count,Strip Club nearby,Strip Club Count,Cocktail Bar nearby,Cocktail Bar Count,Vegan and Vegetarian Restaurant nearby,Vegan and Vegetarian Restaurant Count,Basketball Stadium nearby,Basketball Stadium Count,Pet Grooming Service nearby,Pet Grooming Service Count,Average Daycare distance,Average Starbucks distance,Average design_studios distance,Average airport distance,Average train distance,Average metro distance,Average night_club distance,Average strip_coordinates distance,Average cocktail_bar distance,Average vegan_rest distance,Average basket_stadium distance,Average pet_grooming distance,Total Points,Average Starbucks distance Points,Average design_studios distance Points,Average Daycare distance Points,Average airport distance Points,Average train distance Points,Average metro distance Points,Average night_club distance Points,Average strip_coordinates distance Points,Average cocktail_bar distance Points,Average vegan_rest distance Points,Average basket_stadium distance Points,Average pet_grooming distance Points
2,Unison Technologies,software,2011,40.764577,-73.979901,"[{'name': 'Mociun', 'latitude': 40.717913, 'lo...",10,"[{'name': 'Starbucks', 'latitude': 40.764052, ...",10,"[{'name': 'Dawning Village Daycare', 'latitude...",10,"[{'name': 'Airport', 'latitude': 40.752971, 'l...",10,"[{'name': 'J Train', 'latitude': 40.725666, 'l...",10,[{'name': 'MTA - 57th St/7th Ave Subway Statio...,10,"[{'name': 'Lavo', 'latitude': 40.76294, 'longi...",20,"[{'name': 'Flashdancers NYC', 'latitude': 40.7...",5,"[{'name': 'Tanner Smith's', 'latitude': 40.764...",20,"[{'name': 'Beyond Sushi', 'latitude': 40.76321...",10,"[{'name': 'Madison Square Garden', 'latitude':...",9,"[{'name': 'Finishing Touches by Stephanie', 'l...",1,8480.245954,1852.431026,1869.887198,2259.79118,4776.329954,1689.447766,1883.006582,1922.993525,1671.92974,875.320133,3297.203261,2612.870312,0.790858,3.8e-05,5.3e-05,1.8e-05,4.4e-05,2.7e-05,8.9e-05,8e-05,7.8e-05,9e-05,0.000171,4.5e-05,5.7e-05
14,Fuzz,games_video,2011,37.760524,-122.387799,"[{'name': 'Stamen Design', 'latitude': 37.7647...",10,"[{'name': 'Starbucks', 'latitude': 37.767121, ...",10,"[{'name': 'Little Bee Daycare & Preschool', 'l...",10,[{'name': '1st Classic Limousine & Car Service...,10,"[{'name': 'Train', 'latitude': 37.788149, 'lon...",10,[{'name': 'Yerba Buena/Moscone MUNI Metro Stat...,10,"[{'name': 'The Great Northern', 'latitude': 37...",20,"[{'name': 'Gold Club', 'latitude': 37.785979, ...",5,"[{'name': 'Third Rail', 'latitude': 37.760692,...",20,"[{'name': 'Cha-Ya', 'latitude': 37.760671, 'lo...",10,"[{'name': 'Chase Center Stadium', 'latitude': ...",9,[],0,3157.244744,6387.630477,4911.313438,10375.748201,5840.256174,14681.182386,2682.974478,2756.572502,3033.934155,3334.22964,5345.89951,,0.353755,1.1e-05,2e-05,4.8e-05,1e-05,2.2e-05,1e-05,5.6e-05,5.4e-05,4.9e-05,4.5e-05,2.8e-05,0.0
4,Social Gaming Network,games_video,2011,37.446823,-122.161523,[{'name': 'Facebook Analog Research Laboratory...,10,"[{'name': 'Starbucks', 'latitude': 37.443647, ...",10,"[{'name': 'Fredy's DayCare', 'latitude': 37.42...",10,"[{'name': 'Palo Alto Airport (PAO)', 'latitude...",10,"[{'name': 'Philz Coffee', 'latitude': 37.44222...",10,"[{'name': 'Homer Ave Ped/Bike Tunnel', 'latitu...",2,"[{'name': 'Friday Night Waltz', 'latitude': 37...",3,[],0,[{'name': 'San Agus Cocina Urbana & Cocktails'...,20,"[{'name': 'Wildseed', 'latitude': 37.438956, '...",10,"[{'name': 'Maples Pavilion', 'latitude': 37.42...",1,[],0,5231.284079,9186.655261,15342.038079,9549.241891,5518.431911,1371.68473,891.904677,,501.448314,1760.416627,1030.915122,,0.884217,8e-06,7e-06,2.9e-05,1e-05,2.4e-05,0.000109,0.000168,0.0,0.000299,8.5e-05,0.000146,0.0


In [164]:
# Add a new column 'Design Studios Count' to store the count of design studios for each company
companies_to_benchmark_with_coordinates['Design Studios Count'] = companies_to_benchmark_with_coordinates['Design Studios nearby'].apply(lambda x: len(x) if x is not None else 0)

companies_to_benchmark_with_coordinates.head()


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Design Studios Count'] = companies_to_benchmark_with_coordinates['Design Studios nearby'].apply(lambda x: len(x) if x is not None else 0)


Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Starbucks nearby,Design Studios Count
0,Mokitown,web,2011,37.09024,-95.712891,"[{'name': 'Mandi Lever Photography', 'latitude...",,2
1,headr,web,2012,52.501345,13.410907,"[{'name': 'Zalando Customer Care', 'latitude':...",,10
2,Fixya,web,2013,37.566879,-122.323895,[{'name': 'Facebook Analog Research Laboratory...,,10
4,RazorGator,ecommerce,2011,34.047312,-118.445243,"[{'name': 'Smashbox Studios', 'latitude': 34.0...",,10
5,Social Gaming Network,games_video,2011,37.446823,-122.161523,[{'name': 'Facebook Analog Research Laboratory...,,10


#### it looks like all (except 1) the companies have at leat 10 design studios within a 50km radius. Let's not filter it more for now as we will check for some other criteria

### 2. 30% of the company staff have at least 1 child.

In [165]:
#I'm really just gonna make use of my previous code but replacing where needed. 

# In my dataframe I want a new column to store the nearby daycare
companies_to_benchmark_with_coordinates['Daycare nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Daycare", lat, lon, radius=50000, limit=10)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Daycare nearby'] = venues_with_coordinates

print(companies_to_benchmark_with_coordinates.head())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Daycare nearby'] = None


                    name category_code  founded_year   latitude   longitude  \
0               Mokitown           web          2011  37.090240  -95.712891   
1                  headr           web          2012  52.501345   13.410907   
2                  Fixya           web          2013  37.566879 -122.323895   
4             RazorGator     ecommerce          2011  34.047312 -118.445243   
5  Social Gaming Network   games_video          2011  37.446823 -122.161523   

                               Design Studios nearby  Starbucks nearby  \
0  [{'name': 'Mandi Lever Photography', 'latitude...               NaN   
1  [{'name': 'Zalando Customer Care', 'latitude':...               NaN   
2  [{'name': 'Facebook Analog Research Laboratory...               NaN   
4  [{'name': 'Smashbox Studios', 'latitude': 34.0...               NaN   
5  [{'name': 'Facebook Analog Research Laboratory...               NaN   

   Design Studios Count                                     Daycare nearby  
0  

In [166]:
# Add a new column 'Daycare Count' to store the count of design studios for each company
companies_to_benchmark_with_coordinates['Daycare Count'] = companies_to_benchmark_with_coordinates['Daycare nearby'].apply(lambda x: len(x) if x is not None else 0)

companies_to_benchmark_with_coordinates.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Daycare Count'] = companies_to_benchmark_with_coordinates['Daycare nearby'].apply(lambda x: len(x) if x is not None else 0)


Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Starbucks nearby,Design Studios Count,Daycare nearby,Daycare Count
0,Mokitown,web,2011,37.09024,-95.712891,"[{'name': 'Mandi Lever Photography', 'latitude...",,2,"[{'name': 'Rainbow Palace Day Care', 'latitude...",10
1,headr,web,2012,52.501345,13.410907,"[{'name': 'Zalando Customer Care', 'latitude':...",,10,"[{'name': 'Småland', 'latitude': 52.469804, 'l...",10
2,Fixya,web,2013,37.566879,-122.323895,[{'name': 'Facebook Analog Research Laboratory...,,10,"[{'name': 'Rita's Family Day Care', 'latitude'...",10
4,RazorGator,ecommerce,2011,34.047312,-118.445243,"[{'name': 'Smashbox Studios', 'latitude': 34.0...",,10,"[{'name': 'Lala Land Daycare', 'latitude': 33....",10
5,Social Gaming Network,games_video,2011,37.446823,-122.161523,[{'name': 'Facebook Analog Research Laboratory...,,10,"[{'name': 'Fredy's DayCare', 'latitude': 37.42...",10


#### bro, these areas really be offering lots of services, all of them having at least 10 daycare centers too. 

### 3. Developers like to be near successful tech startups that have raised at least 1 Million dollars.

In [167]:
#let's find all the companies that raised at least 1M dollars and no older than 5yo
!(Reference for startup age)[https://www.eu-startups.com/2021/03/when-is-a-startup-no-longer-a-startup/]
condition_1 = {"total_money_raised": {"$gte":1}}
condition_2 = {"founded_year": {"$gte":2016}}

projection = {"_id": 0, "name":1, "founded_year":1,"total_money_raised":1}

print(f"there are {len(list(companies.find(condition_2, projection)))} startups that raised more than 1M")


there are 0 startups that raised more than 1M


[https://www.eu-startups.com/2021/03/when-is-a-startup-no-longer-a-startup/] was unexpected at this time.


#### deeply sorry developers, but apparently in my companies collection there are no companies founded after 2016 let alone in the last 5 years. Really don't know how to work these around.

### 4. Executives like Starbucks A LOT. Ensure there's a starbucks not too far.

#### Executives be lookin like 20yo's posh ratchet gurls posting pics on their IG from a party they went to just to show they are social, but oh well, they be Executives and big fish. Foursquare has got chains codes, and Strabucks' is to be found here: https://location.foursquare.com/places/docs/chains

In [171]:
#practical example let's test the code with a company I saw is in Palo Alto so I assume there's lots o offer there:
lat = 37.444098
lon = -122.161287

In [172]:
def requests_for_foursquare_starbucks(query, lat, lon, chains, radius=5000, limit=10):
    chain_code_starbucks = 'ab4c54c0-d68a-012e-5619-003048cad9da'

    url = f"https://api.foursquare.com/v3/places/search?query={query}&ll={lat}%2C{lon}&radius={radius}&chains={chain_code_starbucks}&limit={limit}"

    headers = {
        "accept": "application/json",
        "Authorization": token
    }

    try:
        return requests.get(url, headers=headers).json()
    except:
        print("no :(")


In [174]:
chain_code_starbucks = 'ab4c54c0-d68a-012e-5619-003048cad9da'
requests_for_foursquare_starbucks(query, lat, lon, chains=chain_code_starbucks, radius=5000, limit=10)

{'results': [{'fsq_id': '49e3d406f964a520e2621fe3',
   'categories': [{'id': 13035,
     'name': 'Coffee Shop',
     'short_name': 'Coffee Shop',
     'plural_name': 'Coffee Shops',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/coffeeshop_',
      'suffix': '.png'}}],
   'chains': [{'id': 'ab4c54c0-d68a-012e-5619-003048cad9da',
     'name': 'Starbucks'}],
   'closed_bucket': 'VeryLikelyOpen',
   'distance': 933,
   'geocodes': {'main': {'latitude': 37.443647, 'longitude': -122.171784},
    'roof': {'latitude': 37.443647, 'longitude': -122.171784}},
   'link': '/v3/places/49e3d406f964a520e2621fe3',
   'location': {'address': '79 Stanford Mall',
    'address_extended': 'Stanford Shopping Center',
    'census_block': '060855116091017',
    'country': 'US',
    'cross_street': 'Quarry Rd',
    'dma': 'San Francisco-Oakland-San Jose',
    'formatted_address': '79 Stanford Mall (Quarry Rd), Palo Alto, CA 94304',
    'locality': 'Palo Alto',
    'postcode': '94304',
   

In [176]:
companies_to_benchmark_with_coordinates['Starbucks nearby'] = None
chain_code_starbucks = 'ab4c54c0-d68a-012e-5619-003048cad9da'

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare_starbucks("Starbucks", lat, lon, chains=chain_code_starbucks, radius=5000, limit=10)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Starbucks nearby'] = venues_with_coordinates

print(companies_to_benchmark_with_coordinates.head())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Starbucks nearby'] = None


                    name category_code  founded_year   latitude   longitude  \
0               Mokitown           web          2011  37.090240  -95.712891   
1                  headr           web          2012  52.501345   13.410907   
2                  Fixya           web          2013  37.566879 -122.323895   
4             RazorGator     ecommerce          2011  34.047312 -118.445243   
5  Social Gaming Network   games_video          2011  37.446823 -122.161523   

                               Design Studios nearby  \
0  [{'name': 'Mandi Lever Photography', 'latitude...   
1  [{'name': 'Zalando Customer Care', 'latitude':...   
2  [{'name': 'Facebook Analog Research Laboratory...   
4  [{'name': 'Smashbox Studios', 'latitude': 34.0...   
5  [{'name': 'Facebook Analog Research Laboratory...   

                                    Starbucks nearby  Design Studios Count  \
0                                                 []                     2   
1  [{'name': 'Starbucks', 'latit

In [178]:
# Add a new column 'Starbucks Count' to store the count of design studios for each company
companies_to_benchmark_with_coordinates['Starbucks Count'] = companies_to_benchmark_with_coordinates['Starbucks nearby'].apply(lambda x: len(x) if x is not None else 0)

companies_to_benchmark_with_coordinates.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Starbucks Count'] = companies_to_benchmark_with_coordinates['Starbucks nearby'].apply(lambda x: len(x) if x is not None else 0)


Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Starbucks nearby,Design Studios Count,Daycare nearby,Daycare Count,Starbucks Count
0,Mokitown,web,2011,37.09024,-95.712891,"[{'name': 'Mandi Lever Photography', 'latitude...",[],2,"[{'name': 'Rainbow Palace Day Care', 'latitude...",10,0
1,headr,web,2012,52.501345,13.410907,"[{'name': 'Zalando Customer Care', 'latitude':...","[{'name': 'Starbucks', 'latitude': 52.511063, ...",10,"[{'name': 'Småland', 'latitude': 52.469804, 'l...",10,9
2,Fixya,web,2013,37.566879,-122.323895,[{'name': 'Facebook Analog Research Laboratory...,"[{'name': 'Starbucks', 'latitude': 37.562883, ...",10,"[{'name': 'Rita's Family Day Care', 'latitude'...",10,10
4,RazorGator,ecommerce,2011,34.047312,-118.445243,"[{'name': 'Smashbox Studios', 'latitude': 34.0...","[{'name': 'Starbucks', 'latitude': 34.047568, ...",10,"[{'name': 'Lala Land Daycare', 'latitude': 33....",10,10
5,Social Gaming Network,games_video,2011,37.446823,-122.161523,[{'name': 'Facebook Analog Research Laboratory...,"[{'name': 'Starbucks', 'latitude': 37.443647, ...",10,"[{'name': 'Fredy's DayCare', 'latitude': 37.42...",10,10


### 5. Account managers need to travel a lot. (let's try and give them airports, metro and bus)

### Airport:

In [180]:
companies_to_benchmark_with_coordinates['Airport nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Airport", lat, lon, radius=5000, limit=10)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Airport nearby'] = venues_with_coordinates

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Airport nearby'] = None


In [181]:
companies_to_benchmark_with_coordinates['Airport Count'] = companies_to_benchmark_with_coordinates['Airport nearby'].apply(lambda x: len(x) if x is not None else 0)

companies_to_benchmark_with_coordinates.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Airport Count'] = companies_to_benchmark_with_coordinates['Airport nearby'].apply(lambda x: len(x) if x is not None else 0)


Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Starbucks nearby,Design Studios Count,Daycare nearby,Daycare Count,Starbucks Count,Airport nearby,Airport Count
0,Mokitown,web,2011,37.09024,-95.712891,"[{'name': 'Mandi Lever Photography', 'latitude...",[],2,"[{'name': 'Rainbow Palace Day Care', 'latitude...",10,0,[],0
1,headr,web,2012,52.501345,13.410907,"[{'name': 'Zalando Customer Care', 'latitude':...","[{'name': 'Starbucks', 'latitude': 52.511063, ...",10,"[{'name': 'Småland', 'latitude': 52.469804, 'l...",10,9,"[{'name': 'Möbelentsorgung', 'latitude': 52.52...",10
2,Fixya,web,2013,37.566879,-122.323895,[{'name': 'Facebook Analog Research Laboratory...,"[{'name': 'Starbucks', 'latitude': 37.562883, ...",10,"[{'name': 'Rita's Family Day Care', 'latitude'...",10,10,"[{'name': '888 Airport IP', 'latitude': 37.562...",10
4,RazorGator,ecommerce,2011,34.047312,-118.445243,"[{'name': 'Smashbox Studios', 'latitude': 34.0...","[{'name': 'Starbucks', 'latitude': 34.047568, ...",10,"[{'name': 'Lala Land Daycare', 'latitude': 33....",10,10,"[{'name': 'Santa Monica Airport (SMO)', 'latit...",10
5,Social Gaming Network,games_video,2011,37.446823,-122.161523,[{'name': 'Facebook Analog Research Laboratory...,"[{'name': 'Starbucks', 'latitude': 37.443647, ...",10,"[{'name': 'Fredy's DayCare', 'latitude': 37.42...",10,10,"[{'name': 'Palo Alto Airport (PAO)', 'latitude...",10


### Train:

In [190]:
companies_to_benchmark_with_coordinates['Train nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Train", lat, lon, radius=5000, limit=10)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Train nearby'] = venues_with_coordinates

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Train nearby'] = None


In [186]:
companies_to_benchmark_with_coordinates['Train Count'] = companies_to_benchmark_with_coordinates['Train nearby'].apply(lambda x: len(x) if x is not None else 0)

companies_to_benchmark_with_coordinates.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Train Count'] = companies_to_benchmark_with_coordinates['Train nearby'].apply(lambda x: len(x) if x is not None else 0)


Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Starbucks nearby,Design Studios Count,Daycare nearby,Daycare Count,Starbucks Count,Airport nearby,Airport Count,Train nearby,Train Count
0,Mokitown,web,2011,37.09024,-95.712891,"[{'name': 'Mandi Lever Photography', 'latitude...",[],2,"[{'name': 'Rainbow Palace Day Care', 'latitude...",10,0,[],0,[],0
1,headr,web,2012,52.501345,13.410907,"[{'name': 'Zalando Customer Care', 'latitude':...","[{'name': 'Starbucks', 'latitude': 52.511063, ...",10,"[{'name': 'Småland', 'latitude': 52.469804, 'l...",10,9,"[{'name': 'Möbelentsorgung', 'latitude': 52.52...",10,[{'name': 'German Museum of Technology (Deutsc...,10
2,Fixya,web,2013,37.566879,-122.323895,[{'name': 'Facebook Analog Research Laboratory...,"[{'name': 'Starbucks', 'latitude': 37.562883, ...",10,"[{'name': 'Rita's Family Day Care', 'latitude'...",10,10,"[{'name': '888 Airport IP', 'latitude': 37.562...",10,"[{'name': 'Central Park', 'latitude': 37.56169...",10
4,RazorGator,ecommerce,2011,34.047312,-118.445243,"[{'name': 'Smashbox Studios', 'latitude': 34.0...","[{'name': 'Starbucks', 'latitude': 34.047568, ...",10,"[{'name': 'Lala Land Daycare', 'latitude': 33....",10,10,"[{'name': 'Santa Monica Airport (SMO)', 'latit...",10,"[{'name': 'Equinox', 'latitude': 34.05789, 'lo...",10
5,Social Gaming Network,games_video,2011,37.446823,-122.161523,[{'name': 'Facebook Analog Research Laboratory...,"[{'name': 'Starbucks', 'latitude': 37.443647, ...",10,"[{'name': 'Fredy's DayCare', 'latitude': 37.42...",10,10,"[{'name': 'Palo Alto Airport (PAO)', 'latitude...",10,"[{'name': 'Philz Coffee', 'latitude': 37.44222...",10


### Metro Station:

In [192]:
companies_to_benchmark_with_coordinates['Metro Station nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Metro Station", lat, lon, radius=5000, limit=10)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Metro Station nearby'] = venues_with_coordinates

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Metro Station nearby'] = None


In [193]:
companies_to_benchmark_with_coordinates['Metro Station Count'] = companies_to_benchmark_with_coordinates['Metro Station nearby'].apply(lambda x: len(x) if x is not None else 0)

companies_to_benchmark_with_coordinates.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Metro Station Count'] = companies_to_benchmark_with_coordinates['Metro Station nearby'].apply(lambda x: len(x) if x is not None else 0)


Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Starbucks nearby,Design Studios Count,Daycare nearby,Daycare Count,Starbucks Count,Airport nearby,Airport Count,Train nearby,Train Count,Metro Station nearby,Metro Station Count
0,Mokitown,web,2011,37.09024,-95.712891,"[{'name': 'Mandi Lever Photography', 'latitude...",[],2,"[{'name': 'Rainbow Palace Day Care', 'latitude...",10,0,[],0,[],0,[],0
1,headr,web,2012,52.501345,13.410907,"[{'name': 'Zalando Customer Care', 'latitude':...","[{'name': 'Starbucks', 'latitude': 52.511063, ...",10,"[{'name': 'Småland', 'latitude': 52.469804, 'l...",10,9,"[{'name': 'Möbelentsorgung', 'latitude': 52.52...",10,[{'name': 'German Museum of Technology (Deutsc...,10,"[{'name': 'U-Bahnhof Moritzplatz', 'latitude':...",10
2,Fixya,web,2013,37.566879,-122.323895,[{'name': 'Facebook Analog Research Laboratory...,"[{'name': 'Starbucks', 'latitude': 37.562883, ...",10,"[{'name': 'Rita's Family Day Care', 'latitude'...",10,10,"[{'name': '888 Airport IP', 'latitude': 37.562...",10,"[{'name': 'Central Park', 'latitude': 37.56169...",10,"[{'name': 'Caltrain', 'latitude': 37.57249, 'l...",2
4,RazorGator,ecommerce,2011,34.047312,-118.445243,"[{'name': 'Smashbox Studios', 'latitude': 34.0...","[{'name': 'Starbucks', 'latitude': 34.047568, ...",10,"[{'name': 'Lala Land Daycare', 'latitude': 33....",10,10,"[{'name': 'Santa Monica Airport (SMO)', 'latit...",10,"[{'name': 'Equinox', 'latitude': 34.05789, 'lo...",10,"[{'name': 'MTA Expo Line - 26th St/ Bergamot',...",8
5,Social Gaming Network,games_video,2011,37.446823,-122.161523,[{'name': 'Facebook Analog Research Laboratory...,"[{'name': 'Starbucks', 'latitude': 37.443647, ...",10,"[{'name': 'Fredy's DayCare', 'latitude': 37.42...",10,10,"[{'name': 'Palo Alto Airport (PAO)', 'latitude...",10,"[{'name': 'Philz Coffee', 'latitude': 37.44222...",10,"[{'name': 'Homer Ave Ped/Bike Tunnel', 'latitu...",2


### 6. Everyone in the company is between 25 and 40, give them some place to go party.

#### This time I'm going to increase the limit cause we want lots of fun. Let's check night clubs.

In [194]:
companies_to_benchmark_with_coordinates['Night Club nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Night Club", lat, lon, radius=5000, limit=20)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Night Club nearby'] = venues_with_coordinates

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Night Club nearby'] = None


In [197]:
companies_to_benchmark_with_coordinates['Night Club Count'] = companies_to_benchmark_with_coordinates['Night Club nearby'].apply(lambda x: len(x) if x is not None else 0)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Night Club Count'] = companies_to_benchmark_with_coordinates['Night Club nearby'].apply(lambda x: len(x) if x is not None else 0)


#### uuuh there's strip clubs!

In [199]:
#let's limit to 5 cause come on
companies_to_benchmark_with_coordinates['Strip Club nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Strip Club", lat, lon, radius=5000, limit=5)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframeStrip Club Club nearby'] = venues_with_coordinates
    companies_to_benchmark_with_coordinates.at[index, 'Strip Club nearby'] = venues_with_coordinates

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Strip Club nearby'] = None


In [200]:
companies_to_benchmark_with_coordinates['Strip Club Count'] = companies_to_benchmark_with_coordinates['Strip Club nearby'].apply(lambda x: len(x) if x is not None else 0)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Strip Club Count'] = companies_to_benchmark_with_coordinates['Strip Club nearby'].apply(lambda x: len(x) if x is not None else 0)


#### Cocktail bars

In [198]:
companies_to_benchmark_with_coordinates['Cocktail Bar nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Cocktail Bar", lat, lon, radius=5000, limit=20)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Cocktail Bar nearby'] = venues_with_coordinates

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Cocktail Bar nearby'] = None


In [201]:
companies_to_benchmark_with_coordinates['Cocktail Bar Count'] = companies_to_benchmark_with_coordinates['Cocktail Bar nearby'].apply(lambda x: len(x) if x is not None else 0)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Cocktail Bar Count'] = companies_to_benchmark_with_coordinates['Cocktail Bar nearby'].apply(lambda x: len(x) if x is not None else 0)


### 7. Of course the CEO had to be vegan. (b*tch I aint placing a HQ near a restaurant just cause you vegan, when I'm to do the weights, I'm for sure gonna weight this one down).

In [204]:
companies_to_benchmark_with_coordinates['Vegan and Vegetarian Restaurant nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Vegan and Vegetarian Restaurant", lat, lon, radius=5000, limit=10)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Vegan and Vegetarian Restaurant nearby'] = venues_with_coordinates

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Vegan and Vegetarian Restaurant nearby'] = None


In [205]:
companies_to_benchmark_with_coordinates['Vegan and Vegetarian Restaurant Count'] = companies_to_benchmark_with_coordinates['Vegan and Vegetarian Restaurant nearby'].apply(lambda x: len(x) if x is not None else 0)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Vegan and Vegetarian Restaurant Count'] = companies_to_benchmark_with_coordinates['Vegan and Vegetarian Restaurant nearby'].apply(lambda x: len(x) if x is not None else 0)


### 8. If you want to make the maintenance guy happy, a basketball stadium must be around 10 Km.

In [207]:
#I got U bro

In [216]:
companies_to_benchmark_with_coordinates['Basketball Stadium nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Basketball Stadium", lat, lon, radius=10000, limit=10)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Basketball Stadium nearby'] = venues_with_coordinates

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Basketball Stadium nearby'] = None


In [217]:
companies_to_benchmark_with_coordinates['Basketball Stadium Count'] = companies_to_benchmark_with_coordinates['Basketball Stadium nearby'].apply(lambda x: len(x) if x is not None else 0)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Basketball Stadium Count'] = companies_to_benchmark_with_coordinates['Basketball Stadium nearby'].apply(lambda x: len(x) if x is not None else 0)


### 9. The office dog—"Dobby" needs a hairdresser every month. Ensure there's one not too far away.

In [215]:
#I mean, Galgos have little fur, how much can you need it? Trivia: It's in my bucketlist to one day rescue a Galgo <3

In [218]:
companies_to_benchmark_with_coordinates['Pet Grooming Service nearby'] = None

# Iterate through the DataFrame jsut like the previous code
for index, row in companies_to_benchmark_with_coordinates.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    results = requests_for_foursquare("Pet Grooming Service", lat, lon, radius=5000, limit=10)
    venues = results.get('results', [])

    venues_with_coordinates = []

    for venue in venues:
        venue_name = venue.get('name', '') 

        geocodes = venue.get('geocodes', {}).get('main', {})
        venue_latitude = geocodes.get('latitude', None)
        venue_longitude = geocodes.get('longitude', None)

        if venue_name and venue_latitude is not None and venue_longitude is not None:
            venues_with_coordinates.append({
                'name': venue_name,
                'latitude': venue_latitude,
                'longitude': venue_longitude
            })

    #I am placing these results in my dataframe 
    companies_to_benchmark_with_coordinates.at[index, 'Pet Grooming Service nearby'] = venues_with_coordinates

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Pet Grooming Service nearby'] = None


In [219]:
companies_to_benchmark_with_coordinates['Pet Grooming Service Count'] = companies_to_benchmark_with_coordinates['Pet Grooming Service nearby'].apply(lambda x: len(x) if x is not None else 0)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates['Pet Grooming Service Count'] = companies_to_benchmark_with_coordinates['Pet Grooming Service nearby'].apply(lambda x: len(x) if x is not None else 0)


## Ok, so now we've covered most of the points we should ahve addressed. Some of the points go over the limit I set (10 for example) so to reach a proper ranking I will instead assess how far away are these points from the average of the company.

In [322]:
#let's just reset the index so it's easier to work with and the columns, let's reorder them and remove duplicates
companies_to_benchmark_with_coordinates.reset_index(inplace=True,drop=True)
new_column_order = ["name","category_code","founded_year","latitude","longitude","Design Studios nearby",
                    "Design Studios Count","Starbucks nearby","Starbucks Count","Daycare nearby",
                    "Daycare Count","Airport nearby","Airport Count","Train nearby","Train Count",
                    "Metro Station nearby","Metro Station Count","Night Club nearby","Night Club Count",
                    "Strip Club nearby","Strip Club Count","Cocktail Bar nearby","Cocktail Bar Count",
                    "Vegan and Vegetarian Restaurant nearby","Vegan and Vegetarian Restaurant Count",
                    "Basketball Stadium nearby","Basketball Stadium Count","Pet Grooming Service nearby",
                    "Pet Grooming Service Count"]
companies_to_benchmark_with_coordinates = companies_to_benchmark_with_coordinates[new_column_order]
companies_to_benchmark_with_coordinates.drop_duplicates(subset=['name'],inplace=True)
companies_to_benchmark_with_coordinates.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  companies_to_benchmark_with_coordinates.drop_duplicates(subset=['name'],inplace=True)


Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Design Studios Count,Starbucks nearby,Starbucks Count,Daycare nearby,Daycare Count,Airport nearby,Airport Count,Train nearby,Train Count,Metro Station nearby,Metro Station Count,Night Club nearby,Night Club Count,Strip Club nearby,Strip Club Count,Cocktail Bar nearby,Cocktail Bar Count,Vegan and Vegetarian Restaurant nearby,Vegan and Vegetarian Restaurant Count,Basketball Stadium nearby,Basketball Stadium Count,Pet Grooming Service nearby,Pet Grooming Service Count
0,Kidos,games_video,2011,40.768058,-73.956599,"[{'name': 'Mociun', 'latitude': 40.717913, 'lo...",10,"[{'name': 'Starbucks', 'latitude': 40.772469, ...",10,"[{'name': 'Smart Start Academy', 'latitude': 4...",10,"[{'name': 'Airport', 'latitude': 40.752971, 'l...",10,"[{'name': '7 Train', 'latitude': 40.748703, 'l...",10,"[{'name': 'MTA Subway - 72nd St', 'latitude': ...",10,"[{'name': 'Lavo', 'latitude': 40.76294, 'longi...",20,"[{'name': 'Flashdancers NYC', 'latitude': 40.7...",5,"[{'name': 'NR', 'latitude': 40.770027, 'longit...",20,"[{'name': 'Beyond Sushi', 'latitude': 40.76321...",10,"[{'name': 'Madison Square Garden', 'latitude':...",9,"[{'name': 'Finishing Touches by Stephanie', 'l...",1
1,Clowdy,web,2013,53.483707,-2.243949,"[{'name': 'Tyi', 'latitude': 53.483578, 'longi...",10,"[{'name': 'Starbucks', 'latitude': 53.485087, ...",10,"[{'name': 'Little Learning Ladder', 'latitude'...",10,"[{'name': 'Ezybook', 'latitude': 53.448744, 'l...",10,"[{'name': 'Train Manchester', 'latitude': 53.4...",10,"[{'name': 'Platform 4b', 'latitude': 53.487636...",10,"[{'name': '42nd Street', 'latitude': 53.478402...",20,"[{'name': 'Victorias Gentlemens Club', 'latitu...",4,"[{'name': 'The Alchemist', 'latitude': 53.4801...",20,"[{'name': 'Eighth Day Cafe', 'latitude': 53.47...",10,[],0,[],0
2,Unison Technologies,software,2011,40.764577,-73.979901,"[{'name': 'Mociun', 'latitude': 40.717913, 'lo...",10,"[{'name': 'Starbucks', 'latitude': 40.764052, ...",10,"[{'name': 'Dawning Village Daycare', 'latitude...",10,"[{'name': 'Airport', 'latitude': 40.752971, 'l...",10,"[{'name': 'J Train', 'latitude': 40.725666, 'l...",10,[{'name': 'MTA - 57th St/7th Ave Subway Statio...,10,"[{'name': 'Lavo', 'latitude': 40.76294, 'longi...",20,"[{'name': 'Flashdancers NYC', 'latitude': 40.7...",5,"[{'name': 'Tanner Smith's', 'latitude': 40.764...",20,"[{'name': 'Beyond Sushi', 'latitude': 40.76321...",10,"[{'name': 'Madison Square Garden', 'latitude':...",9,"[{'name': 'Finishing Touches by Stephanie', 'l...",1
3,Ziippi,web,2011,37.444098,-122.161287,[{'name': 'Facebook Analog Research Laboratory...,10,"[{'name': 'Starbucks', 'latitude': 37.443647, ...",10,"[{'name': 'Fredy's DayCare', 'latitude': 37.42...",10,"[{'name': 'Palo Alto Airport (PAO)', 'latitude...",10,"[{'name': 'Philz Coffee', 'latitude': 37.44222...",10,"[{'name': 'Homer Ave Ped/Bike Tunnel', 'latitu...",2,"[{'name': 'Friday Night Waltz', 'latitude': 37...",3,[],0,[{'name': 'San Agus Cocina Urbana & Cocktails'...,20,"[{'name': 'Wildseed', 'latitude': 37.438956, '...",10,"[{'name': 'Maples Pavilion', 'latitude': 37.42...",1,[],0
4,Social Gaming Network,games_video,2011,37.446823,-122.161523,[{'name': 'Facebook Analog Research Laboratory...,10,"[{'name': 'Starbucks', 'latitude': 37.443647, ...",10,"[{'name': 'Fredy's DayCare', 'latitude': 37.42...",10,"[{'name': 'Palo Alto Airport (PAO)', 'latitude...",10,"[{'name': 'Philz Coffee', 'latitude': 37.44222...",10,"[{'name': 'Homer Ave Ped/Bike Tunnel', 'latitu...",2,"[{'name': 'Friday Night Waltz', 'latitude': 37...",3,[],0,[{'name': 'San Agus Cocina Urbana & Cocktails'...,20,"[{'name': 'Wildseed', 'latitude': 37.438956, '...",10,"[{'name': 'Maples Pavilion', 'latitude': 37.42...",1,[],0


In [324]:
#let's go step by step, from the classes, I'm stealing a function that gets me the distance from things to the companies:

import math

def haversine(coord1, coord2):
    lon1, lat1 = coord1
    lon2, lat2 = coord2

    R = 6371000  # radius of Earth in meters
    phi_1 = math.radians(lat1)
    phi_2 = math.radians(lat2)

    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)

    a = math.sin(delta_phi / 2.0) ** 2 + math.cos(phi_1) * math.cos(phi_2) * math.sin(delta_lambda / 2.0) ** 2

    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    meters = R * c  # output distance in meters
    return meters

#Example with RazorGator's coordinates
razorgator_coordinates = (34.047312, -118.445243)

#list of daycares nearby RazorGator
daycares = [{'name': 'Lala Land Daycare',
  'latitude': 33.997818,
  'longitude': -118.450388},
 {'name': 'Inspire Martial Arts & Fitness',
  'latitude': 34.180761,
  'longitude': -118.309185},
 {'name': 'Lotfizadeh Family WeeCare',
  'latitude': 34.034936,
  'longitude': -118.423998},
 {'name': 'Big and Tiny', 'latitude': 34.013146, 'longitude': -118.466244},
 {'name': 'Maple Tree Academy', 'latitude': 34.0179, 'longitude': -118.478919},
 {'name': "A Kid's Place", 'latitude': 34.004001, 'longitude': -118.432424},
 {'name': 'Happy Nanny Happy Child',
  'latitude': 34.023317,
  'longitude': -118.399662},
 {'name': 'Trinity Baptist Childrens Center',
  'latitude': 34.025778,
  'longitude': -118.49399},
 {'name': 'A-list Montessori',
  'latitude': 34.005019,
  'longitude': -118.420855},
 {'name': 'Rover Kennels', 'latitude': 34.005845, 'longitude': -118.487926}]

total_distance = 0
num_daycares = len(daycares)

# Calculate the total distance to all daycares
for daycare in daycares:
    daycare_coordinates = (daycare['latitude'], daycare['longitude'])
    distance = haversine(razorgator_coordinates, daycare_coordinates)
    total_distance += distance

# Calculate the average distance
average_distance = total_distance / num_daycares

print("Average distance to daycares:", average_distance, "meters")


Average distance to daycares: 5104.928734612003 meters


#### Nice, we've got something for RazorGator. Let's do this but for every company:

In [325]:
#doing this for every company:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    daycares = row['Daycare nearby']
    total_distance = 0

    for daycare in daycares:
        daycare_coordinates = (daycare['latitude'], daycare['longitude'])
        distance = haversine(company_coordinates, daycare_coordinates)
        total_distance += distance

    if daycares:
        average_distance = total_distance / len(daycares)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average Daycare distance'] = average_distances


#### Nice, now let's do this but for every criteria:

### Starbucks:

In [326]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    starbucks = row['Starbucks nearby']
    total_distance = 0

    for starbucks in starbucks:
        starbucks_coordinates = (starbucks['latitude'], starbucks['longitude'])
        distance = haversine(company_coordinates, starbucks_coordinates)
        total_distance += distance

    if starbucks:
        average_distance = total_distance / len(starbucks)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average Starbucks distance'] = average_distances


### Design Studios:

In [327]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    design_studios = row['Design Studios nearby']
    total_distance = 0

    for studio in design_studios:
        studio_coordinates = (studio['latitude'], studio['longitude'])
        distance = haversine(company_coordinates, studio_coordinates)
        total_distance += distance

    if design_studios:
        average_distance = total_distance / len(design_studios)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average design_studios distance'] = average_distances


### Airport:

In [328]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    airport = row['Airport nearby']
    total_distance = 0

    for airport in airport:
        airport_coordinates = (airport['latitude'], airport['longitude'])
        distance = haversine(company_coordinates, airport_coordinates)
        total_distance += distance

    if airport:
        average_distance = total_distance / len(airport)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average airport distance'] = average_distances


### Train:

In [329]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    train = row['Train nearby']
    total_distance = 0

    for train in train:
        train_coordinates = (train['latitude'], train['longitude'])
        distance = haversine(company_coordinates, train_coordinates)
        total_distance += distance

    if train:
        average_distance = total_distance / len(train)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average train distance'] = average_distances


### Metro Station:

In [330]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    metro_station = row['Metro Station nearby']
    total_distance = 0

    for metro in metro_station:
        metro_coordinates = (metro['latitude'], metro['longitude'])
        distance = haversine(company_coordinates, metro_coordinates)
        total_distance += distance

    if metro_station:
        average_distance = total_distance / len(metro_coordinates)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average metro distance'] = average_distances


### Night Club:

In [331]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    night_club = row['Night Club nearby']
    total_distance = 0

    for club in night_club:
        club_coordinates = (club['latitude'], club['longitude'])
        distance = haversine(company_coordinates, club_coordinates)
        total_distance += distance

    if night_club:
        average_distance = total_distance / len(night_club)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average night_club distance'] = average_distances


### Strip Club:

In [332]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    strip_club = row['Strip Club nearby']
    total_distance = 0

    for strip in strip_club:
        strip_coordinates = (strip['latitude'], strip['longitude'])
        distance = haversine(company_coordinates, strip_coordinates)
        total_distance += distance

    if strip_club:
        average_distance = total_distance / len(strip_club)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average strip_coordinates distance'] = average_distances


### Cocktail Bar

In [333]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    cocktail_bar = row['Cocktail Bar nearby']
    total_distance = 0

    for cocktail in cocktail_bar:
        cocktail_coordinates = (cocktail['latitude'], cocktail['longitude'])
        distance = haversine(company_coordinates, cocktail_coordinates)
        total_distance += distance

    if cocktail_bar:
        average_distance = total_distance / len(cocktail_bar)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average cocktail_bar distance'] = average_distances


### Vegan and Vegetarian Restaurant

In [334]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    vegan_rest = row['Vegan and Vegetarian Restaurant nearby']
    total_distance = 0

    for vegan in vegan_rest:
        vegan_coordinates = (vegan['latitude'], vegan['longitude'])
        distance = haversine(company_coordinates, vegan_coordinates)
        total_distance += distance

    if vegan_rest:
        average_distance = total_distance / len(vegan_rest)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average vegan_rest distance'] = average_distances


### Basketball Stadium

In [335]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    basket_stadium = row['Basketball Stadium nearby']
    total_distance = 0

    for basket in basket_stadium:
        basket_coordinates = (basket['latitude'], basket['longitude'])
        distance = haversine(company_coordinates, basket_coordinates)
        total_distance += distance

    if basket_stadium:
        average_distance = total_distance / len(basket_stadium)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average basket_stadium distance'] = average_distances


### Pet Grooming Service

In [336]:
average_distances = []

for index, row in companies_to_benchmark_with_coordinates.iterrows(): #I'm getting the coordinates of the company
    company_coordinates = (row['latitude'], row['longitude'])
    pet_grooming = row['Pet Grooming Service nearby']
    total_distance = 0

    for pet in pet_grooming:
        pet_coordinates = (pet['latitude'], pet['longitude'])
        distance = haversine(company_coordinates, pet_coordinates)
        total_distance += distance

    if pet_grooming:
        average_distance = total_distance / len(pet_grooming)
    else:
        average_distance = None
    average_distances.append(average_distance)

companies_to_benchmark_with_coordinates['Average pet_grooming distance'] = average_distances


## OK so now we've got a dataframe with the points of interest, their count and their average distance.

### Let's export the Dataframe:

In [290]:
companies_to_benchmark_with_coordinates.to_csv('../Data/benchmarking_companies.csv')

### I want to rank the several companies based on the proximity to their points of interest (lower prox --> more points) and find a winning company. I'll start by attributing weights to the several criteria

In [338]:
criteria_weights = {"Average Starbucks distance":0.07,"Average design_studios distance":0.11,"Average Daycare distance":0.16,
                   "Average airport distance":0.1,"Average train distance":0.13,"Average metro distance":0.15,
                   "Average night_club distance":0.05,"Average strip_coordinates distance":0.03,
                    "Average cocktail_bar distance":0.04,"Average vegan_rest distance":0.06,"Average basket_stadium distance":0.04,
                   "Average pet_grooming distance":0.06}
criteria_weights.values()

dict_values([0.07, 0.11, 0.16, 0.1, 0.13, 0.15, 0.05, 0.03, 0.04, 0.06, 0.04, 0.06])

In [339]:
sum_criteria=0
for i in criteria_weights.values():
    sum_criteria+=i
sum_criteria

1.0000000000000002

In [357]:
#mega function with some explanation on each thing it's doing:

In [346]:
import pandas as pd
import numpy as np

criteria_weights = {
    "Average Starbucks distance": 0.07,
    "Average design_studios distance": 0.1,
    "Average Daycare distance": 0.15,
    "Average airport distance": 0.1,
    "Average train distance": 0.13,
    "Average metro distance": 0.15,
    "Average night_club distance": 0.15,
    "Average strip_coordinates distance": 0.15,
    "Average cocktail_bar distance": 0.15,
    "Average vegan_rest distance": 0.15,
    "Average basket_stadium distance": 0.15,
    "Average pet_grooming distance": 0.15}

companies_to_benchmark_with_coordinates['Total Points'] = 0

# Loop through each criterion
for criterion, weight in criteria_weights.items():
    #I want companies that are closer to points of interest to receive more points for each criterion:
    companies_to_benchmark_with_coordinates = companies_to_benchmark_with_coordinates.sort_values(by=criterion, ascending=True)

    # I am creating a new column that's called basically
    #the name of the criterion and add "points" at the end of it.
    #I then calculate points for each company based on the inverse of the 
    #distance to the specific point of interest. Companies with shorter distances
    #receive higher points, while companies with longer distances receive fewer points
    companies_to_benchmark_with_coordinates[criterion + ' Points'] = 1 / companies_to_benchmark_with_coordinates[criterion]

    # replacing NaN  with 0 points cause these companies don't have this point of interest nearby
    companies_to_benchmark_with_coordinates[criterion + ' Points'] = companies_to_benchmark_with_coordinates[criterion + ' Points'].fillna(0)

    # Apply the weight to the points
    companies_to_benchmark_with_coordinates[criterion + ' Points'] = companies_to_benchmark_with_coordinates[criterion + ' Points'] * weight

    # Accumulate the criterion points to the Total Points (I multiply by 1000 cause otherwise the inverse of the distance 
    #would give me very little points)
    companies_to_benchmark_with_coordinates['Total Points'] += companies_to_benchmark_with_coordinates[criterion + ' Points']*1000

#this is the ordered ranking by points!
companies_to_benchmark_with_coordinates = companies_to_benchmark_with_coordinates.sort_values(by='Total Points', ascending=False)

#top 1 company should be this:
best_company = companies_to_benchmark_with_coordinates.iloc[0]

# Print the name of the best company
print(f"The best company is: {best_company['name']} with a total of {best_company['Total Points']} points!")

The best company is: Kidos with a total of 2.935753497633534 points!


In [361]:
companies_to_benchmark_with_coordinates.sample(3)

Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Design Studios Count,Starbucks nearby,Starbucks Count,Daycare nearby,Daycare Count,Airport nearby,Airport Count,Train nearby,Train Count,Metro Station nearby,Metro Station Count,Night Club nearby,Night Club Count,Strip Club nearby,Strip Club Count,Cocktail Bar nearby,Cocktail Bar Count,Vegan and Vegetarian Restaurant nearby,Vegan and Vegetarian Restaurant Count,Basketball Stadium nearby,Basketball Stadium Count,Pet Grooming Service nearby,Pet Grooming Service Count,Average Daycare distance,Average Starbucks distance,Average design_studios distance,Average airport distance,Average train distance,Average metro distance,Average night_club distance,Average strip_coordinates distance,Average cocktail_bar distance,Average vegan_rest distance,Average basket_stadium distance,Average pet_grooming distance,Total Points,Average Starbucks distance Points,Average design_studios distance Points,Average Daycare distance Points,Average airport distance Points,Average train distance Points,Average metro distance Points,Average night_club distance Points,Average strip_coordinates distance Points,Average cocktail_bar distance Points,Average vegan_rest distance Points,Average basket_stadium distance Points,Average pet_grooming distance Points
14,Fuzz,games_video,2011,37.760524,-122.387799,"[{'name': 'Stamen Design', 'latitude': 37.7647...",10,"[{'name': 'Starbucks', 'latitude': 37.767121, ...",10,"[{'name': 'Little Bee Daycare & Preschool', 'l...",10,[{'name': '1st Classic Limousine & Car Service...,10,"[{'name': 'Train', 'latitude': 37.788149, 'lon...",10,[{'name': 'Yerba Buena/Moscone MUNI Metro Stat...,10,"[{'name': 'The Great Northern', 'latitude': 37...",20,"[{'name': 'Gold Club', 'latitude': 37.785979, ...",5,"[{'name': 'Third Rail', 'latitude': 37.760692,...",20,"[{'name': 'Cha-Ya', 'latitude': 37.760671, 'lo...",10,"[{'name': 'Chase Center Stadium', 'latitude': ...",9,[],0,3157.244744,6387.630477,4911.313438,10375.748201,5840.256174,14681.182386,2682.974478,2756.572502,3033.934155,3334.22964,5345.89951,,0.353755,1.1e-05,2e-05,4.8e-05,1e-05,2.2e-05,1e-05,5.6e-05,5.4e-05,4.9e-05,4.5e-05,2.8e-05,0.0
1,Clowdy,web,2013,53.483707,-2.243949,"[{'name': 'Tyi', 'latitude': 53.483578, 'longi...",10,"[{'name': 'Starbucks', 'latitude': 53.485087, ...",10,"[{'name': 'Little Learning Ladder', 'latitude'...",10,"[{'name': 'Ezybook', 'latitude': 53.448744, 'l...",10,"[{'name': 'Train Manchester', 'latitude': 53.4...",10,"[{'name': 'Platform 4b', 'latitude': 53.487636...",10,"[{'name': '42nd Street', 'latitude': 53.478402...",20,"[{'name': 'Victorias Gentlemens Club', 'latitu...",4,"[{'name': 'The Alchemist', 'latitude': 53.4801...",20,"[{'name': 'Eighth Day Cafe', 'latitude': 53.47...",10,[],0,[],0,12244.61641,3586.643548,604.871066,6458.693686,4162.418303,8009.468439,749.496059,755.430162,1079.109747,1181.461799,,,0.927196,2e-05,0.000165,1.2e-05,1.5e-05,3.1e-05,1.9e-05,0.0002,0.000199,0.000139,0.000127,0.0,0.0
3,Ziippi,web,2011,37.444098,-122.161287,[{'name': 'Facebook Analog Research Laboratory...,10,"[{'name': 'Starbucks', 'latitude': 37.443647, ...",10,"[{'name': 'Fredy's DayCare', 'latitude': 37.42...",10,"[{'name': 'Palo Alto Airport (PAO)', 'latitude...",10,"[{'name': 'Philz Coffee', 'latitude': 37.44222...",10,"[{'name': 'Homer Ave Ped/Bike Tunnel', 'latitu...",2,"[{'name': 'Friday Night Waltz', 'latitude': 37...",3,[],0,[{'name': 'San Agus Cocina Urbana & Cocktails'...,20,"[{'name': 'Wildseed', 'latitude': 37.438956, '...",10,"[{'name': 'Maples Pavilion', 'latitude': 37.42...",1,[],0,5205.327398,9638.422587,15415.982403,9496.999223,5342.459355,1323.734989,877.507037,,472.625457,1501.219436,867.888431,,0.951812,7e-06,6e-06,2.9e-05,1.1e-05,2.4e-05,0.000113,0.000171,0.0,0.000317,0.0001,0.000173,0.0


## Ladies and gentlement, we've got a winner!

In [358]:
company_to_benchmark = companies_to_benchmark_with_coordinates.head(1)
company_to_benchmark

Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Design Studios Count,Starbucks nearby,Starbucks Count,Daycare nearby,Daycare Count,Airport nearby,Airport Count,Train nearby,Train Count,Metro Station nearby,Metro Station Count,Night Club nearby,Night Club Count,Strip Club nearby,Strip Club Count,Cocktail Bar nearby,Cocktail Bar Count,Vegan and Vegetarian Restaurant nearby,Vegan and Vegetarian Restaurant Count,Basketball Stadium nearby,Basketball Stadium Count,Pet Grooming Service nearby,Pet Grooming Service Count,Average Daycare distance,Average Starbucks distance,Average design_studios distance,Average airport distance,Average train distance,Average metro distance,Average night_club distance,Average strip_coordinates distance,Average cocktail_bar distance,Average vegan_rest distance,Average basket_stadium distance,Average pet_grooming distance,Total Points,Average Starbucks distance Points,Average design_studios distance Points,Average Daycare distance Points,Average airport distance Points,Average train distance Points,Average metro distance Points,Average night_club distance Points,Average strip_coordinates distance Points,Average cocktail_bar distance Points,Average vegan_rest distance Points,Average basket_stadium distance Points,Average pet_grooming distance Points
0,Kidos,games_video,2011,40.768058,-73.956599,"[{'name': 'Mociun', 'latitude': 40.717913, 'lo...",10,"[{'name': 'Starbucks', 'latitude': 40.772469, ...",10,"[{'name': 'Smart Start Academy', 'latitude': 4...",10,"[{'name': 'Airport', 'latitude': 40.752971, 'l...",10,"[{'name': '7 Train', 'latitude': 40.748703, 'l...",10,"[{'name': 'MTA Subway - 72nd St', 'latitude': ...",10,"[{'name': 'Lavo', 'latitude': 40.76294, 'longi...",20,"[{'name': 'Flashdancers NYC', 'latitude': 40.7...",5,"[{'name': 'NR', 'latitude': 40.770027, 'longit...",20,"[{'name': 'Beyond Sushi', 'latitude': 40.76321...",10,"[{'name': 'Madison Square Garden', 'latitude':...",9,"[{'name': 'Finishing Touches by Stephanie', 'l...",1,8825.362979,2177.403362,2988.194938,7938.50292,5181.482425,4182.094352,3842.354047,2731.183723,2602.396323,2232.94495,3219.030371,59.660709,2.935753,3.2e-05,3.3e-05,1.7e-05,1.3e-05,2.5e-05,3.6e-05,3.9e-05,5.5e-05,5.8e-05,6.7e-05,4.7e-05,0.002514


In [359]:
company_to_benchmark.iloc[0]["Design Studios nearby"]

[{'name': 'Mociun', 'latitude': 40.717913, 'longitude': -73.962519},
 {'name': 'Manhattan Wardrobe Supply',
  'latitude': 40.748713,
  'longitude': -73.994591},
 {'name': 'Vandervoort Studio',
  'latitude': 40.715914,
  'longitude': -73.934378},
 {'name': 'D & D Building', 'latitude': 40.760967, 'longitude': -73.96637},
 {'name': 'La-Z-Boy', 'latitude': 40.754239, 'longitude': -73.981791},
 {'name': 'The Color House NY',
  'latitude': 40.720503,
  'longitude': -73.998413},
 {'name': 'Steelcase', 'latitude': 40.767431, 'longitude': -73.982875},
 {'name': 'Rooq Fine Art & Framing',
  'latitude': 40.728244,
  'longitude': -73.993435},
 {'name': 'The Future Perfect',
  'latitude': 40.726545,
  'longitude': -73.992408},
 {'name': 'Gagosian Shop', 'latitude': 40.77475, 'longitude': -73.963584}]

### Let's plot it on the map!

In [397]:
# Company's coordinates and the map
company_latitude = company_to_benchmark.iloc[0]["latitude"]
company_longitude = company_to_benchmark.iloc[0]["longitude"]
company_map = folium.Map(location=[company_latitude, company_longitude], zoom_start=15)

# here I am customizing the company's icon
company_icon = folium.Icon(
    icon="building-flag",
    prefix="fa",
    icon_color="black",
    color="white",
    icon_size=(40, 40)
)

# and here I am actually creating it
company_marker = folium.Marker(
    location=[company_latitude, company_longitude],
    icon=company_icon,
    popup='Our company!'
)
company_marker.add_to(company_map)

# These are the criteria I used in my dataframe classification and for each criteria I'll customize a marker
criteria_to_customize = [
    'Design Studios nearby',
    'Starbucks nearby',
    'Daycare nearby',
    'Airport nearby',
    'Train nearby',
    'Metro Station nearby',
    'Night Club nearby',
    'Strip Club nearby',
    'Cocktail Bar nearby',
    'Vegan and Vegetarian Restaurant nearby',
    'Basketball Stadium nearby',
    'Pet Grooming Service nearby'
]

for criterion in criteria_to_customize:
    # Retrieve the data associated with the current criterion
    places = company_to_benchmark.iloc[0][criterion]

    # Loop through the places for the current criterion
    for place in places:
        location = [place['latitude'], place['longitude']]
        name = place['name']

        # Customize the icon based on the criterion
        icon = None  # Initialize as None

        if criterion == 'Design Studios nearby':
            icon = folium.Icon(
                icon='pencil',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )
        elif criterion == 'Starbucks nearby':
            icon = folium.Icon(
                icon='mug-hot',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )
        elif criterion == 'Daycare nearby':
            icon = folium.Icon(
                icon='child',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )            
        elif criterion == 'Airport nearby':
            icon = folium.Icon(
                icon='plane-departure',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )              
        elif criterion == 'Train nearby':
            icon = folium.Icon(
                icon='train',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )       
        elif criterion == 'Metro Station nearby':
            icon = folium.Icon(
                icon='m',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )                   
        elif criterion == 'Night Club nearby':
            icon = folium.Icon(
                icon='moon',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )              
        elif criterion == 'Strip Club nearby':
            icon = folium.Icon(
                icon='eye-slash',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )  
        elif criterion == 'Cocktail Bar nearby':
            icon = folium.Icon(
                icon='martini-glass',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )       
        elif criterion == 'Vegan and Vegetarian Restaurant nearby':
            icon = folium.Icon(
                icon='seedling',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )                
        elif criterion == 'Basketball Stadium nearby':
            icon = folium.Icon(
                icon='basketball',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )                
        elif criterion == 'Pet Grooming Service nearby':
            icon = folium.Icon(
                icon='dog',
                prefix="fa",
                icon_color="white",
                color="darkblue"
            )                  
            
            
        if icon is not None:
            place_marker = folium.Marker(
                location=location,
                icon=icon,
                popup=name
            )
            place_marker.add_to(company_map)

# Display the map
company_map


In [398]:
company_map.save("../Figures/company_map.html")


In [370]:
company_to_benchmark

Unnamed: 0,name,category_code,founded_year,latitude,longitude,Design Studios nearby,Design Studios Count,Starbucks nearby,Starbucks Count,Daycare nearby,Daycare Count,Airport nearby,Airport Count,Train nearby,Train Count,Metro Station nearby,Metro Station Count,Night Club nearby,Night Club Count,Strip Club nearby,Strip Club Count,Cocktail Bar nearby,Cocktail Bar Count,Vegan and Vegetarian Restaurant nearby,Vegan and Vegetarian Restaurant Count,Basketball Stadium nearby,Basketball Stadium Count,Pet Grooming Service nearby,Pet Grooming Service Count,Average Daycare distance,Average Starbucks distance,Average design_studios distance,Average airport distance,Average train distance,Average metro distance,Average night_club distance,Average strip_coordinates distance,Average cocktail_bar distance,Average vegan_rest distance,Average basket_stadium distance,Average pet_grooming distance,Total Points,Average Starbucks distance Points,Average design_studios distance Points,Average Daycare distance Points,Average airport distance Points,Average train distance Points,Average metro distance Points,Average night_club distance Points,Average strip_coordinates distance Points,Average cocktail_bar distance Points,Average vegan_rest distance Points,Average basket_stadium distance Points,Average pet_grooming distance Points
0,Kidos,games_video,2011,40.768058,-73.956599,"[{'name': 'Mociun', 'latitude': 40.717913, 'lo...",10,"[{'name': 'Starbucks', 'latitude': 40.772469, ...",10,"[{'name': 'Smart Start Academy', 'latitude': 4...",10,"[{'name': 'Airport', 'latitude': 40.752971, 'l...",10,"[{'name': '7 Train', 'latitude': 40.748703, 'l...",10,"[{'name': 'MTA Subway - 72nd St', 'latitude': ...",10,"[{'name': 'Lavo', 'latitude': 40.76294, 'longi...",20,"[{'name': 'Flashdancers NYC', 'latitude': 40.7...",5,"[{'name': 'NR', 'latitude': 40.770027, 'longit...",20,"[{'name': 'Beyond Sushi', 'latitude': 40.76321...",10,"[{'name': 'Madison Square Garden', 'latitude':...",9,"[{'name': 'Finishing Touches by Stephanie', 'l...",1,8825.362979,2177.403362,2988.194938,7938.50292,5181.482425,4182.094352,3842.354047,2731.183723,2602.396323,2232.94495,3219.030371,59.660709,2.935753,3.2e-05,3.3e-05,1.7e-05,1.3e-05,2.5e-05,3.6e-05,3.9e-05,5.5e-05,5.8e-05,6.7e-05,4.7e-05,0.002514


In [396]:
#final database to export:
companies_to_benchmark_with_coordinates.to_csv('../Data/benchmarking_companies.csv')