# Data selection 

### As a data engineer, you have asked all employees for their preferences on where to place the new office. Your goal is to place the new company offices in the best place for the company to grow. You have to find a place that more or less covers all the following requirements (note that it's impossible to cover all of them, so you have to prioritize at your own discretion):

#### - Designers like to go to design talks and share knowledge. There must be some nearby companies that also do design.
#### - 30% of the company staff have at least 1 child.
#### - Developers like to be near successful tech startups that have raised at least 1 Million dollars.
#### - Executives like Starbucks A LOT. Ensure there's a Starbucks not too far away.
#### - Account managers need to travel a lot.
#### - Everyone in the company is between 25 and 40, give them some place to go party.
#### - The CEO is vegan.
#### - If you want to make the maintenance guy happy, there must be a basketball stadium within about 10 km.
#### - The office dog, "Dobby", needs a hairdresser every month. Ensure there's one not too far away.

In [1]:
%run -i 'src/mongo.py'


In [2]:
c = mongo("ironhack","companies")
c

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'ironhack'), 'companies')

In [3]:
tech = startups(1000000,2009)


88  companies.
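
The source of `startups` is not shown in this notebook. As a rough sketch of the kind of MongoDB filter it might build, the snippet below assumes that the second argument is a minimum founding year and that the collection has `founded_year` and `funding_rounds.raised_amount` fields. The field names, the projection, and the helper name `build_startup_query` are all assumptions, not the notebook's actual code.

```python
def build_startup_query(min_raised, min_year):
    """Return a MongoDB filter for companies founded in or after `min_year`
    that have at least one funding round of `min_raised` or more.

    The field names are assumptions about the collection schema.
    """
    return {
        "founded_year": {"$gte": min_year},
        "funding_rounds": {
            "$elemMatch": {"raised_amount": {"$gte": min_raised}}
        },
    }


# Projection keeping only the fields used later (assumed names as well).
PROJECTION = {"_id": 0, "name": 1, "offices": 1}

query = build_startup_query(1_000_000, 2009)
# With a pymongo collection `c`, the filter would be used as:
#     results = list(c.find(query, PROJECTION))
print(query)
```

Matching on a single round of at least the threshold is one possible reading of "raised at least 1 Million dollars"; summing all rounds per company would be another, and would need an aggregation pipeline rather than a plain filter.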


In [4]:
design1 = design()



37  companies.


In [5]:
%run -i 'src/locations.py'


In [6]:
tech_location = offices_location(tech)


117  companies. 
 58  full location information.


In [7]:
design_location = offices_location(design1)


45  companies. 
 29  full location information.
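
The two counts printed by `offices_location` (117 then 58 for tech, 45 then 29 for design) suggest that it first produces one row per office and then keeps only the rows with coordinates. That is one plausible reading, since the function's source is not shown. A minimal pure-Python sketch of that idea follows; the `offices`, `description`, `latitude`, and `longitude` keys are assumed, and the sample documents are invented for illustration.

```python
def flatten_offices(companies):
    """Turn a list of company documents into one row per office."""
    rows = []
    for company in companies:
        # A company may have no "offices" key, or an empty/None value.
        for office in company.get("offices") or []:
            rows.append({
                "name": company.get("name"),
                "office description": office.get("description"),
                "office latitude": office.get("latitude"),
                "office longitude": office.get("longitude"),
            })
    return rows


def with_coordinates(rows):
    """Keep only the rows that have both a latitude and a longitude."""
    return [
        r for r in rows
        if r["office latitude"] is not None and r["office longitude"] is not None
    ]


# Invented sample documents, only to exercise the two helpers.
sample = [
    {"name": "Example A", "offices": [
        {"description": "HQ", "latitude": 40.0, "longitude": -74.0},
        {"description": "Branch", "latitude": None, "longitude": None},
    ]},
    {"name": "Example B", "offices": []},
]

all_rows = flatten_offices(sample)
located = with_coordinates(all_rows)
print(len(all_rows), "offices,", len(located), "with full location information.")
```

Because a company can have several offices, the first count can exceed the number of companies returned by the query, which would explain 117 rows from 88 tech companies.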


In [8]:
df = pd.concat([design_location, tech_location], axis = 0)
df_unique = df.drop_duplicates(subset = ["name"])
df_unique


Unnamed: 0,name,office description,office latitude,office longitude
0,SmugMug,,37.390056,-122.067692
1,Clipmarks,,40.757929,-73.985506
3,BeFunky,,37.774929,-122.419415
6,Youku,,31.200657,121.438470
8,Gilt Groupe,New York Office,40.747270,-73.980064
...,...,...,...,...
104,Z2Live,Z2Live,47.610301,-122.339978
107,Gridstore,Gridstore,37.418907,-122.088429
108,HyperWeek,,46.212146,6.150324
109,Meez,,37.785271,-122.397582


In [9]:
%run -i 'src/maps.py'


In [10]:
heat_map = create_heatmap(df_unique)
heat_map

In [11]:
city_names = ["San_Francisco", "Los_Angeles", "New_York"]
cities = cities_loc(city_names)
cities

{'San_Francisco': <POLYGON ((-122.241 38.23, -122.312 38.227, -122.383 38.219, -122.452 38.205...>,
 'Los_Angeles': <POLYGON ((-118.113 34.81, -118.207 34.807, -118.3 34.795, -118.392 34.776, ...>,
 'New_York': <POLYGON ((-73.977 41.103, -74.021 41.102, -74.064 41.097, -74.107 41.089, -...>}

In [12]:
tech_data_clean = add_city_name(tech_location, cities)
tech_data_clean["type"] = "tech"
tech_data_clean

Unnamed: 0,name,office description,office latitude,office longitude,city,type
0,Sofa Labs,,37.564605,-122.322924,San_Francisco,tech
2,Brandsclub,,-23.548943,-46.638818,,tech
3,Clovis Oncology,HQ,40.026,-105.259041,,tech
4,GameChanger Media,New York Office,40.707834,-74.013661,New_York,tech
5,Moblica,HQ,32.0554,34.7595,,tech
6,ticketea,Office,40.445515,-3.706176,,tech
7,Althea Systems,HQ,12.93496,77.613685,,tech
8,Ykone,Ykone Headquarters,48.856667,2.350987,,tech
12,travelmob,HQ,21.303049,-157.78907,,tech
13,Skydeck,,37.564538,-122.32547,San_Francisco,tech


In [13]:
design_data_clean = add_city_name(design_location, cities)
design_data_clean["type"] = "design"
design_data_clean

Unnamed: 0,name,office description,office latitude,office longitude,city,type
0,SmugMug,,37.390056,-122.067692,San_Francisco,design
1,Clipmarks,,40.757929,-73.985506,New_York,design
3,BeFunky,,37.774929,-122.419415,San_Francisco,design
6,Youku,,31.200657,121.43847,,design
8,Gilt Groupe,New York Office,40.74727,-73.980064,New_York,design
9,Smilebox,,47.676378,-122.122155,,design
10,Howcast,New York City,40.646166,-73.889492,New_York,design
11,99designs,United States (HQ),37.795531,-122.400598,San_Francisco,design
12,99designs,Australia,-37.802659,144.986855,,design
13,99designs,Europe,52.49862,13.446903,,design


In [14]:
cities_counts_tech = tech_data_clean['city'].value_counts()
cities_counts_tech

city
San_Francisco    13
New_York          7
Los_Angeles       4
Name: count, dtype: int64

In [15]:
cities_counts_design = design_data_clean['city'].value_counts()
cities_counts_design

city
New_York         6
San_Francisco    5
Los_Angeles      2
Name: count, dtype: int64

In [16]:
final_data_tech = tech_data_clean[tech_data_clean['city'].isin(city_names)]
final_data_tech


Unnamed: 0,name,office description,office latitude,office longitude,city,type
0,Sofa Labs,,37.564605,-122.322924,San_Francisco,tech
4,GameChanger Media,New York Office,40.707834,-74.013661,New_York,tech
13,Skydeck,,37.564538,-122.32547,San_Francisco,tech
17,PeekYou,,40.757929,-73.985506,New_York,tech
22,VisualOn,Headquarters,37.270518,-121.955879,San_Francisco,tech
25,ChallengePost,,40.740804,-74.00717,New_York,tech
26,Factery,,37.448491,-122.180281,San_Francisco,tech
39,Magento,,34.052187,-118.243425,Los_Angeles,tech
40,VistaGen Therapeutics,,37.665648,-122.384349,San_Francisco,tech
48,ScaleMP,,37.322973,-122.038579,San_Francisco,tech


In [17]:
final_data_design = design_data_clean[design_data_clean['city'].isin(city_names)]
final_data_design

Unnamed: 0,name,office description,office latitude,office longitude,city,type
0,SmugMug,,37.390056,-122.067692,San_Francisco,design
1,Clipmarks,,40.757929,-73.985506,New_York,design
3,BeFunky,,37.774929,-122.419415,San_Francisco,design
8,Gilt Groupe,New York Office,40.74727,-73.980064,New_York,design
10,Howcast,New York City,40.646166,-73.889492,New_York,design
11,99designs,United States (HQ),37.795531,-122.400598,San_Francisco,design
14,EatLime,,37.774929,-122.419415,San_Francisco,design
17,Non-Member Films,West Coast Office,33.989029,-118.462421,Los_Angeles,design
20,Stylesight,NY (Headquarters),40.730763,-74.000827,New_York,design
21,Non-Member Films,West Coast Office,33.989029,-118.462421,Los_Angeles,design


In [18]:
final_data = pd.concat([final_data_tech, final_data_design], axis=0)
top_cities = final_data.groupby(['city', 'type'])['name'].agg('count').reset_index()
top_cities

Unnamed: 0,city,type,name
0,Los_Angeles,design,2
1,Los_Angeles,tech,4
2,New_York,design,6
3,New_York,tech,7
4,San_Francisco,design,5
5,San_Francisco,tech,13


In [19]:
final_data2 = final_data[final_data['city'].isin(['Los_Angeles', 'New_York', 'San_Francisco'])]
final_data2

Unnamed: 0,name,office description,office latitude,office longitude,city,type
0,Sofa Labs,,37.564605,-122.322924,San_Francisco,tech
4,GameChanger Media,New York Office,40.707834,-74.013661,New_York,tech
13,Skydeck,,37.564538,-122.32547,San_Francisco,tech
17,PeekYou,,40.757929,-73.985506,New_York,tech
22,VisualOn,Headquarters,37.270518,-121.955879,San_Francisco,tech
25,ChallengePost,,40.740804,-74.00717,New_York,tech
26,Factery,,37.448491,-122.180281,San_Francisco,tech
39,Magento,,34.052187,-118.243425,Los_Angeles,tech
40,VistaGen Therapeutics,,37.665648,-122.384349,San_Francisco,tech
48,ScaleMP,,37.322973,-122.038579,San_Francisco,tech


In [20]:
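# Store the result under a different name so the top3_map function is not overwritten
# and the cell can be re-run.
top3 = top3_map(final_data2)
top3

In [21]:
# Zoom to the San Francisco Bay Area; the two points are opposite corners as [lat, lon].
top3.fit_bounds([[37.8,-122.4],[37.5,-122.1]], padding=(5,5))
top3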
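
The corner coordinates passed to `fit_bounds` are hard-coded. As a hedged sketch, they could instead be derived from the data by taking the minimum and maximum latitude and longitude of the selected offices. The helper name `bounds_for` and the 0.05 degree margin are hypothetical; the sample points are three San_Francisco rows from the tables above.

```python
def bounds_for(points, margin=0.05):
    """Return [[south, west], [north, east]] enclosing the (lat, lon) points.

    `margin` is extra space, in degrees, added on every side.
    """
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [
        [min(lats) - margin, min(lons) - margin],
        [max(lats) + margin, max(lons) + margin],
    ]


# A few San_Francisco offices taken from the tables above.
sf_points = [
    (37.564605, -122.322924),  # Sofa Labs
    (37.774929, -122.419415),  # BeFunky
    (37.390056, -122.067692),  # SmugMug
]

sf_bounds = bounds_for(sf_points)
# A folium map object `m` could then be zoomed with: m.fit_bounds(sf_bounds)
print(sf_bounds)
```

Deriving the bounds from the rows keeps the view in sync with the data if the city filter changes.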
