Importamos los datos necesarios para nuestro proyecto. Nos conectamos a MongoDB. 

In [1]:
#Importamos los datos

from pymongo import MongoClient
import pandas as pd
client = MongoClient("mongodb://localhost:27017/")
db = client.db_companies

In [2]:
companies = db.companies.find()
data_companies = pd.DataFrame(companies)
data_companies.head()

Unnamed: 0,_id,name,permalink,crunchbase_url,homepage_url,blog_url,blog_feed_url,twitter_username,category_code,number_of_employees,...,video_embeds,screenshots,external_links,partners,founded_month,founded_day,deadpooled_month,deadpooled_day,deadpooled_url,ipo
0,52cdef7c4bab8bd675297d8b,AdventNet,abc3,http://www.crunchbase.com/company/adventnet,http://adventnet.com,,,manageengine,enterprise,600.0,...,[],"[{'available_sizes': [[[150, 94], 'assets/imag...",[],[],,,,,,
1,52cdef7c4bab8bd675297d8a,Wetpaint,abc2,http://www.crunchbase.com/company/wetpaint,http://wetpaint-inc.com,http://digitalquarters.net/,http://digitalquarters.net/feed/,BachelrWetpaint,web,47.0,...,[],"[{'available_sizes': [[[150, 86], 'assets/imag...",[{'external_url': 'http://www.geekwire.com/201...,[],10.0,17.0,,,,
2,52cdef7c4bab8bd675297d8c,Zoho,abc4,http://www.crunchbase.com/company/zoho,http://zoho.com,http://blogs.zoho.com/,http://blogs.zoho.com/feed,zoho,software,1600.0,...,"[{'embed_code': '<object width=""430"" height=""2...",[],[{'external_url': 'http://www.online-tech-tips...,[],9.0,15.0,,,,
3,52cdef7c4bab8bd675297d90,Postini,postini,http://www.crunchbase.com/company/postini,http://postini.com,,,,web,,...,[],[],[],[],6.0,2.0,,,,
4,52cdef7c4bab8bd675297d8d,Digg,digg,http://www.crunchbase.com/company/digg,http://www.digg.com,http://blog.digg.com/,http://blog.digg.com/?feed=rss2,digg,news,60.0,...,"[{'embed_code': '<embed src=""http://blip.tv/pl...","[{'available_sizes': [[[117, 150], 'assets/ima...",[{'external_url': 'http://www.sociableblog.com...,[],10.0,11.0,,,,


Contabilizamos los tipos de empresas que aparecen en el dataframe para extraer con facilidad las empresas que nos interesan. No olvidemos que nos hemos especializado en el sector de los videojuegos y, por tanto, los intereses de nuestros trabajadores serán cercanos al sector al que pertenecen. 

In [3]:
data_companies['category_code'].value_counts()

web                 3787
software            2736
games_video         1083
mobile              1018
other                986
advertising          928
enterprise           742
ecommerce            688
consulting           637
network_hosting      626
public_relations     533
search               394
biotech              373
hardware             368
cleantech            305
semiconductor        167
security             156
analytics             66
social                49
finance               49
news                  48
education             36
music                 33
messaging             30
travel                25
legal                 25
medical               25
photo_video           23
health                23
manufacturing         19
sports                13
real_estate           10
fashion               10
automotive             9
hospitality            8
transportation         7
nanotech               5
design                 4
nonprofit              4
government             1


Filtramos los datos que nos interesan acerca de las empresas de las que ya conocemos el sector. Evitamos las empresas con bajos ingresos. 

In [4]:
companies_flt = db.companies.find({
    "founded_year" : {"$gte" : 2006},
    "total_money_raised" : {
        '$nin': ["$100k", "$0", "$500k"]},
    "category_code":{
        "$in":["design", 'software', 'web', 'games video', 'hadware', 'mobile']}})

companies_flt = pd.DataFrame(companies_flt)
companies_flt.shape

(749, 42)

Creamos el geopoint necesario para cada empresa extrayendo una longitud y una latitud. Excluimos los NaN.

In [5]:
import numpy as np

def geopoint(x): 
    for row_companies in x:
        
        latitude = row_companies["latitude"]
        longitude = row_companies["longitude"]
        if latitude is None:
            return np.NaN
        return {"type": "Point", "coordinates": [longitude,latitude]}


companies_flt['geo']=companies_flt['offices'].apply(geopoint)
companies_flt.dropna(subset=['geo'], inplace = True)

In [6]:
comp_cl = companies_flt[["acquisition","category_code","created_at","description",
"founded_year","name","number_of_employees","offices","overview","products","total_money_raised", "geo"]]

comp_cl.to_json(r'dataset_def.json', orient='records')


In [7]:
companies_flt['total_money_raised'].value_counts()

$1M       16
$4M       14
$5M       14
$2M       12
$200k      9
          ..
$4.95M     1
$3.11M     1
$890k      1
$990k      1
$11.8M     1
Name: total_money_raised, Length: 319, dtype: int64

Cargamos la base de datos a la que previamente hemos creado un geoíndice en MongoDB. 

In [8]:
companies_2 = db.companies_def.find()
dc2 = pd.DataFrame(companies_2)
dc2.shape


(565, 13)

Estudiamos las ciudades más interesantes para crear nuestra empresa

In [9]:
dc2["cities"] = [element[0]["city"] for element in dc2["offices"]]
dc2["cities"].value_counts()

San Francisco    89
New York         41
Palo Alto        19
Austin           15
London           14
                 ..
Encino            1
Pittsburgh,       1
IDYLLWILD         1
La Jolla          1
Tulsa             1
Name: cities, Length: 207, dtype: int64

In [10]:
#integrando la fn geonear

def geonear(geopoint, maxdistance=1000):
    return db.companies_def.find({
        "geo":{
            "$near":{
                "$geometry":geopoint,
                "$maxDistance":maxdistance
            }}})



In [11]:
# Con el near, quiero pasarle un geopoint y me devuelve una lista de empresas 
#elijo San Francisco porque tiene una potente empresa de diseño en la ciudad

sanfrancisco = {'type': 'Point', 'coordinates': [-122.431297, 37.773972]}
sanfranciscogp = pd.DataFrame(geonear(sanfrancisco, 10000))
sanfranciscogp



Unnamed: 0,_id,acquisition,category_code,created_at,description,founded_year,name,number_of_employees,offices,overview,products,total_money_raised,geo
0,5d862a24752696fefdd05463,,software,Mon Sep 08 22:16:49 UTC 2008,Screen Writing Software Company,2007,Scripped,4.0,"[{'description': None, 'address1': None, 'addr...",<p>Scripped provides web-based screenwriting s...,[],$700k,"{'type': 'Point', 'coordinates': [-122.4194155..."
1,5d862a24752696fefdd05483,,software,Mon Sep 08 22:16:49 UTC 2008,Screen Writing Software Company,2007,Scripped,4.0,"[{'description': None, 'address1': None, 'addr...",<p>Scripped provides web-based screenwriting s...,[],$700k,"{'type': 'Point', 'coordinates': [-122.4194155..."
2,5d862a24752696fefdd0533a,"{'price_amount': None, 'price_currency_code': ...",web,Mon Oct 08 10:17:18 UTC 2007,Social software applications,2007,Seesmic,13.0,"[{'description': '', 'address1': '1550 Bryant ...",<p>Seesmic is a powerful suite of social media...,"[{'name': 'Seesmic', 'permalink': 'seesmic'}, ...",$16M,"{'type': 'Point', 'coordinates': [-122.419204,..."
3,5d862a24752696fefdd05441,,web,Wed Jul 30 21:45:37 UTC 2008,Rich Media Internet Communications,2008,Zorap,2.0,"[{'description': '', 'address1': '', 'address2...",<p>Zorap enables real-time participatory event...,"[{'name': 'Zorap.com', 'permalink': 'zorap-web...",$2.25M,"{'type': 'Point', 'coordinates': [-122.419204,..."
4,5d862a24752696fefdd054d3,,web,Tue Dec 30 22:18:00 UTC 2008,Lifestyle Brand Site Hosting,2007,Fuego Nation,,"[{'description': '', 'address1': '', 'address2...",<p>Fuego Nation was founded to build a social ...,[],$1.5M,"{'type': 'Point', 'coordinates': [-122.419204,..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...
74,5d862a24752696fefdd05458,"{'price_amount': None, 'price_currency_code': ...",web,Wed Aug 27 06:32:49 UTC 2008,Rating and review site,2007,GoodGuide,10.0,"[{'description': '', 'address1': '', 'address2...",<p>GoodGuide.com provides free and easy access...,[],$14.2M,"{'type': 'Point', 'coordinates': [-122.4000032..."
75,5d862a24752696fefdd0547c,"{'price_amount': None, 'price_currency_code': ...",web,Wed Aug 27 06:32:49 UTC 2008,Rating and review site,2007,GoodGuide,10.0,"[{'description': '', 'address1': '', 'address2...",<p>GoodGuide.com provides free and easy access...,[],$14.2M,"{'type': 'Point', 'coordinates': [-122.4000032..."
76,5d862a24752696fefdd05525,,web,Sun May 03 22:10:19 UTC 2009,,2008,Kidlandia,,"[{'description': '', 'address1': '1360 Montgom...",<p>Kidlandia is a new web destination where pa...,[],$3.23M,"{'type': 'Point', 'coordinates': [-122.4043215..."
77,5d862a24752696fefdd053c1,,software,Thu Mar 13 21:01:28 UTC 2008,software company,2007,Elastra,,"[{'description': 'HQ', 'address1': '160 Pacifi...","<p><a href=""http://www.elastra.com"" title=""Ela...","[{'name': 'Cloud Server', 'permalink': 'cloud-...",$14.6M,"{'type': 'Point', 'coordinates': [-122.391843,..."


Recurro a la API de Google para cargar las cafeterías de la ciudad elegida con cuidado de no incluir la key de la API en mi proyecto y hacerla visible a cualquier persona. 

In [12]:
from pymongo import MongoClient
import pandas as pd
import numpy as np
import requests
import os
#from dotenv import load_dotenv

In [13]:
from dotenv import load_dotenv
load_dotenv()
key = os.getenv('key')
response = requests.get(key)
sb = response.json()

In [14]:
def starbucks(lugares):
    names=[]
    lats=[]
    long=[]
    for place in lugares["results"]:
        names.append(place["name"])
        lats.append(place["geometry"]["location"]["lat"])
        long.append(place["geometry"]["location"]["lng"])
    dictstarbucks = {"name":names, "latitude":lats, "longitude":long}
    return pd.DataFrame(dictstarbucks)
datastarbucks = starbucks(sb)
datastarbucks.head()

Unnamed: 0,name,latitude,longitude
0,Starbucks,37.778262,-122.415038
1,Starbucks,37.806565,-122.420425
2,Starbucks,37.797007,-122.398107
3,Starbucks,37.786146,-122.409094
4,Starbucks,37.784124,-122.407653


Trato de encontrar un punto idóneo desde donde cargará el mapa creando una coordenada producto de las coordenadas medias de todas las cafeterías

In [15]:
dl = datastarbucks['latitude'].mean()
dlon =  datastarbucks['longitude'].mean()
location_point= dl, dlon
location_point

(37.785889055000005, -122.41686530499999)

Creo una columna a la que nombro 'coordenadas' para tener la latitud y la longitud en un formato más accesible

In [16]:
def latcomp(latitude):
    return latitude.get('coordinates')
    
    
sanfranciscogp['coordenadas']=sanfranciscogp['geo'].apply(latcomp)
sanfranciscogp

Unnamed: 0,_id,acquisition,category_code,created_at,description,founded_year,name,number_of_employees,offices,overview,products,total_money_raised,geo,coordenadas
0,5d862a24752696fefdd05463,,software,Mon Sep 08 22:16:49 UTC 2008,Screen Writing Software Company,2007,Scripped,4.0,"[{'description': None, 'address1': None, 'addr...",<p>Scripped provides web-based screenwriting s...,[],$700k,"{'type': 'Point', 'coordinates': [-122.4194155...","[-122.4194155, 37.7749295]"
1,5d862a24752696fefdd05483,,software,Mon Sep 08 22:16:49 UTC 2008,Screen Writing Software Company,2007,Scripped,4.0,"[{'description': None, 'address1': None, 'addr...",<p>Scripped provides web-based screenwriting s...,[],$700k,"{'type': 'Point', 'coordinates': [-122.4194155...","[-122.4194155, 37.7749295]"
2,5d862a24752696fefdd0533a,"{'price_amount': None, 'price_currency_code': ...",web,Mon Oct 08 10:17:18 UTC 2007,Social software applications,2007,Seesmic,13.0,"[{'description': '', 'address1': '1550 Bryant ...",<p>Seesmic is a powerful suite of social media...,"[{'name': 'Seesmic', 'permalink': 'seesmic'}, ...",$16M,"{'type': 'Point', 'coordinates': [-122.419204,...","[-122.419204, 37.775196]"
3,5d862a24752696fefdd05441,,web,Wed Jul 30 21:45:37 UTC 2008,Rich Media Internet Communications,2008,Zorap,2.0,"[{'description': '', 'address1': '', 'address2...",<p>Zorap enables real-time participatory event...,"[{'name': 'Zorap.com', 'permalink': 'zorap-web...",$2.25M,"{'type': 'Point', 'coordinates': [-122.419204,...","[-122.419204, 37.775196]"
4,5d862a24752696fefdd054d3,,web,Tue Dec 30 22:18:00 UTC 2008,Lifestyle Brand Site Hosting,2007,Fuego Nation,,"[{'description': '', 'address1': '', 'address2...",<p>Fuego Nation was founded to build a social ...,[],$1.5M,"{'type': 'Point', 'coordinates': [-122.419204,...","[-122.419204, 37.775196]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
74,5d862a24752696fefdd05458,"{'price_amount': None, 'price_currency_code': ...",web,Wed Aug 27 06:32:49 UTC 2008,Rating and review site,2007,GoodGuide,10.0,"[{'description': '', 'address1': '', 'address2...",<p>GoodGuide.com provides free and easy access...,[],$14.2M,"{'type': 'Point', 'coordinates': [-122.4000032...","[-122.4000032, 37.7983181]"
75,5d862a24752696fefdd0547c,"{'price_amount': None, 'price_currency_code': ...",web,Wed Aug 27 06:32:49 UTC 2008,Rating and review site,2007,GoodGuide,10.0,"[{'description': '', 'address1': '', 'address2...",<p>GoodGuide.com provides free and easy access...,[],$14.2M,"{'type': 'Point', 'coordinates': [-122.4000032...","[-122.4000032, 37.7983181]"
76,5d862a24752696fefdd05525,,web,Sun May 03 22:10:19 UTC 2009,,2008,Kidlandia,,"[{'description': '', 'address1': '1360 Montgom...",<p>Kidlandia is a new web destination where pa...,[],$3.23M,"{'type': 'Point', 'coordinates': [-122.4043215...","[-122.4043215, 37.8017938]"
77,5d862a24752696fefdd053c1,,software,Thu Mar 13 21:01:28 UTC 2008,software company,2007,Elastra,,"[{'description': 'HQ', 'address1': '160 Pacifi...","<p><a href=""http://www.elastra.com"" title=""Ela...","[{'name': 'Cloud Server', 'permalink': 'cloud-...",$14.6M,"{'type': 'Point', 'coordinates': [-122.391843,...","[-122.391843, 37.791137]"


In [17]:
def latitudcol(latcol):
    return latcol[0]
sanfranciscogp['lat_col']=sanfranciscogp['coordenadas'].apply(latitudcol)

def longcol(lcol):
    return lcol[1]
sanfranciscogp['long_col']=sanfranciscogp['coordenadas'].apply(longcol)

sanfranciscogp

Unnamed: 0,_id,acquisition,category_code,created_at,description,founded_year,name,number_of_employees,offices,overview,products,total_money_raised,geo,coordenadas,lat_col,long_col
0,5d862a24752696fefdd05463,,software,Mon Sep 08 22:16:49 UTC 2008,Screen Writing Software Company,2007,Scripped,4.0,"[{'description': None, 'address1': None, 'addr...",<p>Scripped provides web-based screenwriting s...,[],$700k,"{'type': 'Point', 'coordinates': [-122.4194155...","[-122.4194155, 37.7749295]",-122.419415,37.774929
1,5d862a24752696fefdd05483,,software,Mon Sep 08 22:16:49 UTC 2008,Screen Writing Software Company,2007,Scripped,4.0,"[{'description': None, 'address1': None, 'addr...",<p>Scripped provides web-based screenwriting s...,[],$700k,"{'type': 'Point', 'coordinates': [-122.4194155...","[-122.4194155, 37.7749295]",-122.419415,37.774929
2,5d862a24752696fefdd0533a,"{'price_amount': None, 'price_currency_code': ...",web,Mon Oct 08 10:17:18 UTC 2007,Social software applications,2007,Seesmic,13.0,"[{'description': '', 'address1': '1550 Bryant ...",<p>Seesmic is a powerful suite of social media...,"[{'name': 'Seesmic', 'permalink': 'seesmic'}, ...",$16M,"{'type': 'Point', 'coordinates': [-122.419204,...","[-122.419204, 37.775196]",-122.419204,37.775196
3,5d862a24752696fefdd05441,,web,Wed Jul 30 21:45:37 UTC 2008,Rich Media Internet Communications,2008,Zorap,2.0,"[{'description': '', 'address1': '', 'address2...",<p>Zorap enables real-time participatory event...,"[{'name': 'Zorap.com', 'permalink': 'zorap-web...",$2.25M,"{'type': 'Point', 'coordinates': [-122.419204,...","[-122.419204, 37.775196]",-122.419204,37.775196
4,5d862a24752696fefdd054d3,,web,Tue Dec 30 22:18:00 UTC 2008,Lifestyle Brand Site Hosting,2007,Fuego Nation,,"[{'description': '', 'address1': '', 'address2...",<p>Fuego Nation was founded to build a social ...,[],$1.5M,"{'type': 'Point', 'coordinates': [-122.419204,...","[-122.419204, 37.775196]",-122.419204,37.775196
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
74,5d862a24752696fefdd05458,"{'price_amount': None, 'price_currency_code': ...",web,Wed Aug 27 06:32:49 UTC 2008,Rating and review site,2007,GoodGuide,10.0,"[{'description': '', 'address1': '', 'address2...",<p>GoodGuide.com provides free and easy access...,[],$14.2M,"{'type': 'Point', 'coordinates': [-122.4000032...","[-122.4000032, 37.7983181]",-122.400003,37.798318
75,5d862a24752696fefdd0547c,"{'price_amount': None, 'price_currency_code': ...",web,Wed Aug 27 06:32:49 UTC 2008,Rating and review site,2007,GoodGuide,10.0,"[{'description': '', 'address1': '', 'address2...",<p>GoodGuide.com provides free and easy access...,[],$14.2M,"{'type': 'Point', 'coordinates': [-122.4000032...","[-122.4000032, 37.7983181]",-122.400003,37.798318
76,5d862a24752696fefdd05525,,web,Sun May 03 22:10:19 UTC 2009,,2008,Kidlandia,,"[{'description': '', 'address1': '1360 Montgom...",<p>Kidlandia is a new web destination where pa...,[],$3.23M,"{'type': 'Point', 'coordinates': [-122.4043215...","[-122.4043215, 37.8017938]",-122.404321,37.801794
77,5d862a24752696fefdd053c1,,software,Thu Mar 13 21:01:28 UTC 2008,software company,2007,Elastra,,"[{'description': 'HQ', 'address1': '160 Pacifi...","<p><a href=""http://www.elastra.com"" title=""Ela...","[{'name': 'Cloud Server', 'permalink': 'cloud-...",$14.6M,"{'type': 'Point', 'coordinates': [-122.391843,...","[-122.391843, 37.791137]",-122.391843,37.791137


Pinto los datos extraídos hasta la fecha en el mapa utilizando Folium

In [22]:
import folium
import pandas as pd
from folium.plugins import HeatMap
import requests
from folium.plugins import MeasureControl


map_starbucks = folium.Map(location=location_point, width=750, height=500, zoom_start=15)

for index, row in datastarbucks.iterrows():
    folium.CircleMarker([row['latitude'], row['longitude']],
                        radius=9,
                        popup="City: {}, latitude {}, longitude {}".format(row['name'], row['latitude'], row['longitude']),
                        fill_color="#F35C50",
                       ).add_to(map_starbucks)

latitude_companies = [i[0] for i in sanfranciscogp.coordenadas]
longitude_companies = [i[1] for i in sanfranciscogp.coordenadas]
    
for coordinate in list(zip(longitude_companies,latitude_companies)):
    
    folium.Marker(coordinate,
                        radius=5,
                        icon=folium.Icon(icon='cloud'),
                        fill_color="#F45C1", 
                       ).add_to(map_starbucks)

    '''                   
    folium.Circle(location=[40.42, -3.7],
                    radius=100
                   ).add_to(map_starbucks)
    map_starbucks.add_child(MeasureControl()) 
 '''

    #map_starbucks.save('./map_starbucks.html') # Podemos guardar nuestro mapa como un html
    folium.Marker(location=[37.789349 , -122.403774], icon=folium.Icon(color='lightgray', icon='home', prefix='fa')).add_to(map_starbucks)
map_starbucks

The chosen point (represented with a grey house) is (-122.403774, 37.789349) near the companies of the sector (blue icons with cloud) with more than 1M of turnover and less than 10 years of experience, near a design company and next to an area of coffee shops Starbucks (represented by blue circles filled with green).