I like cafes a bunch. They often offer a nice combination of ambient noise and coffee that's conducive to good work. I have, however, fallen into a bit of a rut: I visit the same two or three places. I know there are more out there, but without an explicit recommendation or walking past them day in, day out, it's difficult to find them.

To get around this blind spot, I thought it would be a good idea to leverage the data available on the Chicago Open Data Portal. I wanted to see whether I could build groups of unvisited coffee places to explore in an afternoon, so that I can find a few more work spaces for evenings and weekends.

In [1]:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points
from sklearn.neighbors import NearestNeighbors

neighborhoods = gpd.read_file('data/chicago/v3 08192017/ChicagoNeighborhoods.shp')
transit = gpd.read_file('data/chicago/v3 08192017/transit_upd081917.shp')
coffee = gpd.read_file('data/chicago/v3 08192017/coffee_upd081917.shp')

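Here I set the home and work coordinates and define a helper function that returns the distance from a place to its nearest neighbor in a set of points.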

In [2]:
### setting up variables and prelim processing
home = Point(-87.659349,41.9880054)
work = Point(-87.6438878,41.884193)

# set up functions
def getNearestPoint(pt,searchPts):
    # return the distance from a row's geometry to the nearest point in searchPts
    pt = pt['geometry']
    nearest = nearest_points(pt,searchPts.geometry.unary_union)[1]
    nearest = gpd.GeoDataFrame(searchPts[searchPts.geometry == nearest])
    y = nearest.geometry.distance(pt)
    if y.iloc[0] == 0.0:
        # a zero distance means the point matched itself: drop that row and search again
        searchPts = searchPts.loc[[i for i in searchPts.index if i != nearest.index[0] ],:]
        nearest = nearest_points(pt,searchPts.geometry.unary_union)[1]
        nearest = gpd.GeoDataFrame(searchPts[searchPts.geometry == nearest])
        y = nearest.geometry.distance(pt)
    return float(y.iloc[0])
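
To make the behaviour of getNearestPoint concrete, here is a dependency-free sketch of roughly the same idea: for one point, find the distance to the closest other point, ignoring matches to itself. The name nearest_other_distance and the sample coordinates are made up for illustration. The notebook itself relies on shapely's nearest_points and only drops the first zero-distance match, whereas this sketch skips every identical point.

```python
import math

def nearest_other_distance(pt, search_pts):
    """Distance from pt to the closest point in search_pts that is not pt itself.

    Points are (x, y) tuples; distances are planar, in coordinate units.
    Hypothetical helper, not part of the original notebook.
    """
    distances = [math.dist(pt, other) for other in search_pts if other != pt]
    if not distances:
        raise ValueError("no other points to compare against")
    return min(distances)

# made-up sample coordinates (longitude, latitude)
cafes = [(-87.64, 41.88), (-87.64, 41.89), (-87.66, 41.88)]
nearest = nearest_other_distance(cafes[0], cafes)
```

The list comprehension scans every candidate, which is fine for a few hundred cafes but would get slow on much larger point sets.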

In [4]:
# clean the business data
coffee.dropna(subset=['LONGITUDE'],inplace=True)
coffee = coffee.drop_duplicates(subset=['LONGITUDE','LATITUDE'])
coffee = coffee[coffee['DOING BUSI'].str.contains('STARBUCK') == False]

# get distance from home and from work
coffee['homeDist'] = coffee['geometry'].apply(lambda x: home.distance(x))
coffee['workDist'] = coffee['geometry'].apply(lambda x: work.distance(x))

# get distance to the nearest other coffee place and the nearest CTA train stop
coffee['nearestCoffeeDist'] = coffee.apply(lambda x: getNearestPoint(x,coffee),axis=1)
coffee['nearestTransitDist'] = coffee.apply(lambda x: getNearestPoint(x,transit),axis=1)
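
One thing to keep in mind: the distances above are computed directly on longitude/latitude coordinates, so they come out in degrees rather than in metres or miles. That is fine for ranking nearby places against each other, but if a real-world distance is needed, a haversine calculation is one option. The sketch below is an assumption on my part rather than something the notebook does; the helper name haversine_km and the Earth radius constant are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# home and work coordinates from the notebook (longitude, latitude)
home_lon, home_lat = -87.659349, 41.9880054
work_lon, work_lat = -87.6438878, 41.884193
home_to_work_km = haversine_km(home_lon, home_lat, work_lon, work_lat)
```

Another route would be to reproject the GeoDataFrames to a projected coordinate system before measuring, which keeps the rest of the code unchanged.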

In [5]:
# get designated neighborhood and reduce columns
neighborhoods['area'] = neighborhoods['geometry'].area
neighborhoods.rename(columns={'pri_neigh':'neighborhood'},inplace=True)

joinedData = gpd.tools.sjoin(coffee,neighborhoods,op='within',how='inner')
joinedData = joinedData[['neighborhood','DOING BUSI', 
            'ADDRESS','CITY', 'STATE', 'ZIP CODE',
            'LONGITUDE','LATITUDE',
            'nearestTransitDist','workDist','homeDist','nearestCoffeeDist','geometry'
            ]].reset_index(drop=True)
joinedData.reset_index(inplace=True)

In [6]:
joinedData.head()

Unnamed: 0,index,neighborhood,DOING BUSI,ADDRESS,CITY,STATE,ZIP CODE,LONGITUDE,LATITUDE,nearestTransitDist,workDist,homeDist,nearestCoffeeDist,geometry
0,0,Old Town,"EVA'S COFFEE, INC.",1447 N SEDGWICK ST 1ST,CHICAGO,IL,60610,-87.638375,41.908857,0.001808,0.025272,0.08188,0.014932,POINT (-87.63837516299999 41.908856726)
1,1,West Loop,ARTURO EXPRESS,130 S CANAL ST,CHICAGO,IL,60606,-87.639765,41.879616,0.004256,0.00616,0.110144,0.001574,POINT (-87.639764523 41.879616268)
2,2,West Loop,MEDDLE COFFEE BAR,601 W JACKSON BLVD 1 A,CHICAGO,IL,60661,-87.642609,41.877889,0.002857,0.006433,0.111382,0.003328,POINT (-87.642609175 41.877888529)
3,3,West Loop,PEET'S COFFEE & TEA,222 S RIVERSIDE PLZ 1ST,CHICAGO,IL,60606,-87.638579,41.878582,0.003879,0.007725,0.111378,0.001574,POINT (-87.63857866799999 41.878581561)
4,4,West Loop,GROUNDSWELL COFFEE ROASTERS,1168 W MADISON ST 1ST 2,CHICAGO,IL,60607,-87.656987,41.881729,0.006153,0.013329,0.106303,0.003664,POINT (-87.65698670899999 41.881728772)


At this point I saved the processed data and manually categorized each location as visited or not visited (the visitInd column). After that, I reloaded the data.
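
I did the tagging by hand in a spreadsheet, but the same flag could be set in pandas. The sketch below is only an illustration of that alternative: the places frame is a toy stand-in for joinedData using names from the preview above, and the visited set is hypothetical.

```python
import pandas as pd

# toy stand-in for joinedData (names taken from the preview above)
places = pd.DataFrame({
    'DOING BUSI': ["EVA'S COFFEE, INC.", 'ARTURO EXPRESS', 'MEDDLE COFFEE BAR'],
    'neighborhood': ['Old Town', 'West Loop', 'West Loop'],
})

# hypothetical set of places already visited; the original was tagged by hand
visited = {'MEDDLE COFFEE BAR'}

# visitInd is True for visited places, False otherwise
places['visitInd'] = places['DOING BUSI'].isin(visited)

# keep only the places still to explore
unvisited = places[~places['visitInd']]
```

Matching on the business name assumes names are unique; with several branches of the same cafe, matching on name plus address would be safer.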

In [7]:
# writer = pd.ExcelWriter('./NSB_preprocessed_upd082317.xlsx')
# pd.DataFrame(joinedData.drop('geometry',axis=1)).to_excel(writer,index=False)
# writer.close()

In [None]:
NSBData_processed = pd.read_excel('NSB_preprocessed_upd082317.xlsx')
NSBData_processed = NSBData_processed[NSBData_processed['visitInd']==False]
NSBData_processed.head()

In [None]:
# split places into those closer to home and those closer to work
NSBData_processed['closerTo'] = NSBData_processed.apply(lambda x: 'home' if x.workDist > x.homeDist else 'work',axis=1)
closerToHome = NSBData_processed[NSBData_processed['closerTo']=='home']
closerToWork = NSBData_processed[NSBData_processed['closerTo']=='work']

closerToHomePoints = closerToHome[['LONGITUDE','LATITUDE']]
closerToWorkPoints = closerToWork[['LONGITUDE','LATITUDE']]

In [None]:
# fit a 3-nearest-neighbor model on the work-side coordinates and show the first group
neighborsWork = NearestNeighbors(n_neighbors=3).fit(closerToWorkPoints)
distancesWork,clustersWork = neighborsWork.kneighbors(closerToWorkPoints)
closerToWork.iloc[clustersWork[0]]

In [None]:
# same for the home-side coordinates
neighborsHome = NearestNeighbors(n_neighbors=3).fit(closerToHomePoints)
distancesHome,clustersHome = neighborsHome.kneighbors(closerToHomePoints)
closerToHome.iloc[clustersHome[0]]
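
Because kneighbors is queried with the same points the model was fitted on, each row of indices normally contains the place itself plus its two closest neighbors, and the rows overlap heavily. One possible way to turn them into non-overlapping afternoon outings is a greedy pass over the rows. This is a sketch under my own assumptions, not part of the analysis above: the coordinates are made up, and the greedy rule can leave some places ungrouped.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# made-up (longitude, latitude) pairs forming two loose clumps
points = np.array([
    [-87.640, 41.880], [-87.641, 41.881], [-87.639, 41.879],
    [-87.660, 41.990], [-87.661, 41.991], [-87.659, 41.989],
])

model = NearestNeighbors(n_neighbors=3).fit(points)
_, neighbor_idx = model.kneighbors(points)

# greedily form groups: accept a place and its neighbors only if none is already used
used, groups = set(), []
for row in neighbor_idx:
    members = [int(i) for i in row]
    if used.isdisjoint(members):
        groups.append(sorted(members))
        used.update(members)
```

Each entry in groups is a list of row positions, so it can be passed to .iloc on the matching DataFrame in the same way as clustersHome[0] above.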