# Step 1: finding candidates

### Our client has a series of requirements that we have summarized as follows:

- It is a web company that wants to be near similar companies.
- 30% of the company staff have at least 1 child.
- Executives like Starbucks A LOT. Ensure there's a Starbucks not too far away.
- Everyone in the company is between 25 and 40; give them some place to go party.
- The CEO is vegan.

#### For the first request, we look for web-sector companies in the Crunchbase(R) database:
(we will evaluate the other four requirements later)

In [1]:
from pymongo import MongoClient
import pandas as pd

In [2]:
from src import api_functions as af

In [3]:
client = MongoClient("localhost:27017")

In [4]:
db = client.get_database("IronHack")

In [5]:
c = db.get_collection("companies")

In [6]:
proj = {"_id":0, "name":1, "category_code":1, "offices":1}

In [7]:
compa = list(c.find({"category_code": "web"},proj))

In [8]:
df=pd.DataFrame(compa)

In [9]:
df.shape

(3787, 3)

In [10]:
df["city"]=df.apply(lambda fila : fila.offices[0]["city"] if len(fila.offices)>0 else "nulo" , axis=1)

In [11]:
df["zip_code"]=df.apply(lambda fila : fila.offices[0]["zip_code"] if len(fila.offices)>0 else "nulo" , axis=1)
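The two cells above repeat the same "first office or placeholder" logic. A minimal sketch of that logic as a reusable helper follows; `first_office_field` is a hypothetical name, not part of the original notebook, and unlike the lambdas above it also falls back to the placeholder when the requested key is missing from the office record.

```python
def first_office_field(offices, field, default="nulo"):
    """Return `field` from the first office, or `default` if unavailable.

    Hypothetical helper: `offices` is assumed to be a list of dicts,
    as in the "offices" column built from the companies collection.
    """
    if not offices:
        # No offices recorded for this company
        return default
    # Fall back to the placeholder if the key itself is absent
    return offices[0].get(field, default)
```

Assuming the `offices` column holds lists as above, it could be applied with `df["city"] = df["offices"].apply(first_office_field, field="city")`, and likewise for `zip_code`.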

### and we get the 10 most repeated zip codes within those companies:

In [12]:
top = df.zip_code.value_counts().head(12)

(we take 12 values because the two most frequent ones are the "nulo" placeholder and blank strings, which we skip)

In [13]:
zips=list(top.index[2:])

In [14]:
zips

['94107',
 '94301',
 '94111',
 '10003',
 '94103',
 '94105',
 '10016',
 '10011',
 '94041',
 '98104']
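Skipping the first two entries by position works only while the placeholder and the blank string happen to be the two most frequent values. A sketch of a variant that filters them out by value instead is shown below; `top_zip_codes` is a hypothetical function, written with the standard library only so it can be read independently of pandas.

```python
from collections import Counter


def top_zip_codes(zip_codes, n=10, placeholders=("nulo", "")):
    """Return the n most frequent zip codes, ignoring placeholders and blanks."""
    # Keep only strings and trim surrounding whitespace
    cleaned = (z.strip() for z in zip_codes if isinstance(z, str))
    # Count everything that is not a placeholder value
    counts = Counter(z for z in cleaned if z not in placeholders)
    return [z for z, _ in counts.most_common(n)]
```

In pandas, a comparable filter would be `df.zip_code[~df.zip_code.isin(["nulo", ""])].value_counts().head(10)`.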

### Then, we get the coords for those zip codes from the US postal service API:

(we use the get_coord_from_zip function from our api_functions module)

In [15]:
coords = [af.get_coord_from_zip(elem)  for elem in zips]
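Because each zip code triggers one API request, repeated runs can exhaust a daily quota. The following is a minimal sketch of a disk cache around the lookup. `cached_lookup` and the cache file path are hypothetical, and it assumes the lookup function (here passed in as `fetch`) returns something JSON-serializable, such as a dict of city, state and coordinates.

```python
import json
from pathlib import Path


def cached_lookup(zip_code, fetch, cache_path):
    """Return the result for zip_code, calling `fetch` only on a cache miss.

    `fetch` is any callable taking a zip code; results are stored as JSON
    in `cache_path` so later runs do not repeat the request.
    """
    path = Path(cache_path)
    # Load previously stored results, if any
    cache = json.loads(path.read_text()) if path.exists() else {}
    if zip_code not in cache:
        cache[zip_code] = fetch(zip_code)
        path.write_text(json.dumps(cache))
    return cache[zip_code]
```

With the notebook's names, this could be used as `coords = [cached_lookup(z, af.get_coord_from_zip, "DATA/zip_cache.json") for z in zips]`, where the cache file name is only an example.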

In [16]:
df = pd.DataFrame(coords)
df.drop("state_fullname", axis=1,inplace=True)
df

|   | city | state | latitude | longitude |
|---|------|-------|----------|-----------|
| 0 | San Francisco | CA | 37.76785 | -122.392861 |
| 1 | Palo Alto | CA | 37.44296 | -122.151198 |
| 2 | San Francisco | CA | 37.798853 | -122.398599 |
| 3 | New York | NY | 40.731392 | -73.9884 |
| 4 | San Francisco | CA | 37.775504 | -122.41292 |
| 5 | San Francisco | CA | 37.788543 | -122.393872 |
| 6 | New York | NY | 40.744594 | -73.978088 |
| 7 | New York | NY | 40.74406 | -74.004592 |
| 8 | Mountain View | CA | 37.388022 | -122.07431 |
| 9 | Seattle | WA | 47.602134 | -122.328431 |


### and we export this data to a csv:

In [17]:
df.to_csv("DATA/candidates.csv")

## Bonus: showing candidates on a map:

In [18]:
df = pd.read_csv("DATA/candidates.csv")          # in case the API's daily limit causes problems
df.drop("Unnamed: 0", axis=1,inplace=True)

In [19]:
import folium

In [20]:
mymap = folium.Map(location=[38.75,-98.79], zoom_start=4, tiles="cartodbpositron")

In [21]:
for i in range(10):
    mark = folium.Marker(location=[df.iloc[i].latitude, df.iloc[i].longitude], tooltip = f"{zips[i]}")
    mark.add_to(mymap)
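The loop above hard-codes 10 iterations and indexes `df` and `zips` in parallel. A sketch of a variant that pairs coordinates with labels directly is given below; `marker_specs` is a hypothetical helper that only builds the keyword arguments, so it does not depend on folium itself.

```python
def marker_specs(rows, labels):
    """Pair each (latitude, longitude) row with its label.

    Returns a list of dicts with "location" and "tooltip" keys.
    zip() stops at the shorter input, so no fixed count is needed.
    """
    return [
        {"location": [lat, lon], "tooltip": str(label)}
        for (lat, lon), label in zip(rows, labels)
    ]
```

With the notebook's variables, each spec could then be passed on as `folium.Marker(**spec).add_to(mymap)` for `spec in marker_specs(zip(df.latitude, df.longitude), zips)`.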

In [24]:
mymap

### and saving it:

In [23]:
mymap.save("DATA/candidates_map")  # writes an HTML document; no file extension is added automatically