# Office location selection

First, let's run the Python script that defines the functions we will work with:

In [51]:
%run -i 'python_scripts/mongo_connection.py'

Now let's connect to our MongoDB database, using a function defined in the mongo_connection.py script.

In [52]:
c = connect_mongo("ironhack","companies")
c

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'ironhack'), 'companies')

Our connection was successful! Now let's query for tech startups and design companies. Again we will use functions defined in the mongo_connection.py script, since the queries are quite long.

In [53]:
tech = get_tech_startups(1000000,2008)

Query returned  274  companies.


In [54]:
design = get_design_companies()

Query returned  965  companies.
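The queries themselves live in mongo_connection.py and are not shown here. As a rough sketch only, assuming the two arguments to `get_tech_startups` are a minimum amount raised and an earliest founding year, a filter of this kind could be built as a plain dictionary. The helper name `build_tech_startup_filter`, the field names (`category_code`, `founded_year`, `funding_rounds.raised_amount`) and the list of categories are all assumptions about the collection's schema, not the actual query.

```python
def build_tech_startup_filter(min_raised, min_founded_year):
    """Return a MongoDB filter document (a plain dict) for tech startups.

    Hypothetical helper: every field name and category below is an
    assumption about the schema, not taken from mongo_connection.py.
    """
    return {
        # Assumed category codes that count as "tech".
        "category_code": {"$in": ["web", "software", "mobile"]},
        # Founded in min_founded_year or later.
        "founded_year": {"$gte": min_founded_year},
        # At least one funding round that raised min_raised or more.
        "funding_rounds": {
            "$elemMatch": {"raised_amount": {"$gte": min_raised}}
        },
    }


query = build_tech_startup_filter(1_000_000, 2008)
print(query)
```

A dictionary like this is what would be passed to the collection's `find` method in PyMongo.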


Now that we have our company data, let's process it: we drop the offices without coordinates and turn the lists into DataFrames with one row per office.

In [55]:
%run -i 'python_scripts/data_process.py'


In [56]:
tech_data = get_offices_location(tech)

Received data for  333  companies. 
 202  companies with full location information left.


In [57]:
design_data = get_offices_location(design)

Received data for  1094  companies. 
 708  companies with full location information left.
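The processing step can be pictured roughly as follows. This is a minimal sketch in plain Python, not the code in data_process.py: it assumes each company document has a `name` and an `offices` list whose entries carry `description`, `latitude` and `longitude`, and the helper names and sample documents are made up for illustration. The output keys match the column names of the DataFrames used in this notebook.

```python
def flatten_offices(companies):
    """Turn a list of company documents into one row (dict) per office."""
    rows = []
    for company in companies:
        # Treat a missing or null "offices" field as an empty list.
        for office in company.get("offices") or []:
            rows.append({
                "name": company.get("name"),
                "office description": office.get("description"),
                "office latitude": office.get("latitude"),
                "office longitude": office.get("longitude"),
            })
    return rows


def drop_missing_coordinates(rows):
    """Keep only the rows that have both a latitude and a longitude."""
    return [
        row for row in rows
        if row["office latitude"] is not None
        and row["office longitude"] is not None
    ]


# Hypothetical sample documents, for illustration only.
sample = [
    {"name": "Example Co", "offices": [
        {"description": "HQ", "latitude": 37.77, "longitude": -122.42},
        {"description": "Remote", "latitude": None, "longitude": None},
    ]},
    {"name": "No Office Inc", "offices": []},
]

rows = flatten_offices(sample)
located = drop_missing_coordinates(rows)
print(len(rows), "offices received,", len(located), "with coordinates")
```

A list of dictionaries like `located` can be passed directly to `pandas.DataFrame` to get the tabular form.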


Now that we have data for all tech and design companies that fit our criteria, let's make a heatmap of their locations to see which cities have the most of both.

In [58]:
%run -i 'python_scripts/mapping.py'
map_heatmap(tech_data, design_data)

Browsing this heatmap, it looks like our offices should be in San Francisco, Los Angeles, New York, Miami, Paris or Manchester.

Let's get GeoJSONs for these cities. Rather than looking for predefined polygons with the borders of each city, I created GeoJSON files of circles encompassing them. I made the circles wide enough to cover nearby towns, so that offices there are also counted as part of the city.
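One way such a circular GeoJSON polygon could be produced is sketched below, using only the standard library. The function name `circle_feature`, the spherical-Earth approximation, and the centre and radius in the example are assumptions for illustration; they are not the values used for the files loaded in this notebook.

```python
import json
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def circle_feature(name, lat, lon, radius_km, n_points=72):
    """Return a GeoJSON Feature approximating a circle around (lat, lon)."""
    lat1, lon1 = math.radians(lat), math.radians(lon)
    d = radius_km / EARTH_RADIUS_KM  # angular distance in radians
    ring = []
    for i in range(n_points):
        # Step counterclockwise from due north, the orientation
        # RFC 7946 recommends for exterior rings.
        bearing = -2 * math.pi * i / n_points
        lat2 = math.asin(
            math.sin(lat1) * math.cos(d)
            + math.cos(lat1) * math.sin(d) * math.cos(bearing)
        )
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * math.sin(lat2),
        )
        # GeoJSON positions are [longitude, latitude].
        ring.append([math.degrees(lon2), math.degrees(lat2)])
    ring.append(ring[0])  # a polygon ring must end where it starts
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Illustrative centre and radius only.
paris = circle_feature("paris", 48.8566, 2.3522, 25.0)
print(json.dumps(paris)[:80])
```

Writing the returned dictionary with `json.dump` gives a file that GeoJSON readers can load.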

In [59]:
city_names = ["san_francisco", "los_angeles", "new_york", "miami", "paris", "manchester"]
cities = load_cities(city_names)
cities

{'San_Francisco': <POLYGON ((-122.241 38.23, -122.312 38.227, -122.383 38.219, -122.452 38.205...>,
 'Los_Angeles': <POLYGON ((-118.113 34.81, -118.207 34.807, -118.3 34.795, -118.392 34.776, ...>,
 'New_York': <POLYGON ((-73.977 41.103, -74.021 41.102, -74.064 41.097, -74.107 41.089, -...>,
 'Miami': <POLYGON ((-80.271 26.121, -80.305 26.12, -80.338 26.115, -80.371 26.108, -8...>,
 'Paris': <POLYGON ((2.354 49.066, 2.322 49.065, 2.292 49.062, 2.261 49.057, 2.232 49....>,
 'Manchester': <POLYGON ((-2.244 53.593, -2.263 53.593, -2.282 53.591, -2.3 53.589, -2.318 ...>}

Now that we have our GeoJSONs loaded, let's add the city name, based on these GeoJSONs, to our DataFrames.

This step checks every office in a DataFrame: if its location falls inside one of the cities defined by the GeoJSONs, that city name is added to the row.

In [61]:
tech_data = add_city_name(tech_data, cities)
# We will also be adding a "tech" identifier in a new column called "type". This will be useful in a future step, where we will consolidate all of our data in a single DataFrame.
tech_data["type"] = "tech"
tech_data.head()

Unnamed: 0,name,office description,office latitude,office longitude,city,type
0,Movirtu,Headquarter,51.549971,-0.1816,,tech
2,Movirtu,India Office,28.58212,77.326699,,tech
4,GitHub,,37.775196,-122.419204,San_Francisco,tech
5,Gridstore,Gridstore,37.418907,-122.088429,San_Francisco,tech
7,WEEZEVENT,French Office,48.895271,2.447633,Paris,tech


In [62]:
# Now we will do the same thing for our design companies
design_data = add_city_name(design_data, cities)
design_data["type"] = "design"
design_data.head()

Unnamed: 0,name,office description,office latitude,office longitude,city,type
0,Technorati,,37.779558,-122.393041,San_Francisco,design
1,AddThis,HQ - Virginia,38.926172,-77.245195,,design
2,AddThis,New York Office,40.724604,-73.996876,New_York,design
3,AddThis,Los Angeles Office,34.026302,-118.380954,Los_Angeles,design
6,AddThis,Michigan Office,42.557958,-83.167884,,design
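The city lookup behind `add_city_name` is a point-in-polygon test. The sketch below shows the idea with a plain ray-casting check and no external libraries; it is not the implementation in the project scripts. The `cities` values shown earlier appear to be Shapely polygons, in which case the equivalent test would be `polygon.contains(Point(longitude, latitude))`. The helper names and the rectangular ring standing in for the San Francisco circle are assumptions for illustration; the two test coordinates are taken from the tech table above.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count how many ring edges lie to the east of the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def city_for(lat, lon, city_rings):
    """Return the first city whose ring contains the point, else None."""
    for city, ring in city_rings.items():
        if point_in_ring(lon, lat, ring):
            return city
    return None


# Hypothetical rectangle of (longitude, latitude) corners,
# standing in for the circular San Francisco polygon.
city_rings = {
    "San_Francisco": [(-123.0, 37.0), (-121.5, 37.0),
                      (-121.5, 38.3), (-123.0, 38.3)],
}

print(city_for(37.775196, -122.419204, city_rings))  # GitHub office
print(city_for(51.549971, -0.1816, city_rings))      # Movirtu headquarters
```

Offices that fall outside every ring get no city, which matches the empty `city` values in the tables above.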
