I like cafes a lot. They often offer a nice combination of ambient noise and coffee that's conducive to good work. I have, however, fallen into a bit of a rut: I visit the same two or three places. I know there are more out there, but without an explicit recommendation or walking past them day in, day out, it's difficult to find them.

To get around this blind spot, I thought it would be a good idea to use the data available on the Chicago Open Data Portal. I wanted to see whether I could build groups of unvisited coffee places to explore in an afternoon, so that I can find a few more workspaces for evenings and weekends.

In [1]:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points
from sklearn.neighbors import NearestNeighbors

neighborhoods = gpd.read_file('data/chicago/v3 08192017/ChicagoNeighborhoods.shp')
transit = gpd.read_file('data/chicago/v3 08192017/transit_upd081917.shp')
coffee = gpd.read_file('data/chicago/v3 08192017/coffee_upd081917.shp')

I first want to establish a few relationships. Because time and ease of access are prime factors for me, I want to consider the distances between the following pairs of points.

- home to coffee and work to coffee -> these help sort out which clusters are easier to reach after work vs. on a weekend.
- home to CTA (train) and work to CTA (train) -> distance to a given stop; serves a similar function to the home/work to coffee metric.
- coffee to CTA (train) -> distance from a coffee place to the nearest train stop.



In [2]:
### setting up variables and prelim processing
home = Point(-87.659349,41.9880054)
work = Point(-87.6438878,41.884193)

# set up functions
def getNearestPoint(pt,searchPts):
    # return the distance from pt to its nearest point in searchPts;
    # if the nearest point is pt itself (distance 0), drop it and search again
    pt = pt['geometry']
    nearest = nearest_points(pt,searchPts.geometry.unary_union)[1]
    nearest = gpd.GeoDataFrame(searchPts[searchPts.geometry == nearest])
    y = nearest.geometry.distance(pt)
    if y.iloc[0] == 0.0:
        searchPts = searchPts.loc[[i for i in searchPts.index if i != nearest.index[0] ],:]
        nearest = nearest_points(pt,searchPts.geometry.unary_union)[1]
        nearest = gpd.GeoDataFrame(searchPts[searchPts.geometry == nearest])
        y = nearest.geometry.distance(pt)
    return float(y.iloc[0])

Process the data, building the relationships outlined above.

In [3]:
# clean the business data: drop rows without coordinates, duplicate locations, and Starbucks
coffee.dropna(subset=['LONGITUDE'],inplace=True)
coffee = coffee.drop_duplicates(subset=['LONGITUDE','LATITUDE'])
coffee = coffee[coffee['DOING BUSI'].str.contains('STARBUCK') == False]

# get distance from home and work
coffee['homeDist'] = coffee['geometry'].apply(lambda x: home.distance(x))
coffee['workDist'] = coffee['geometry'].apply(lambda x: work.distance(x))

# get distance to the nearest other coffee place and to the nearest CTA train stop
coffee['nearestCoffeeDist'] = coffee.apply(lambda x: getNearestPoint(x,coffee),axis=1)
coffee['nearestTransitDist'] = coffee.apply(lambda x: getNearestPoint(x,transit),axis=1)

# get designated neighborhood and reduce columns
neighborhoods['area'] = neighborhoods['geometry'].area
neighborhoods.rename(columns={'pri_neigh':'neighborhood'},inplace=True)

joinedData = gpd.tools.sjoin(coffee,neighborhoods,op='within',how='inner')
joinedData = joinedData[['neighborhood','DOING BUSI', 
            'ADDRESS','CITY', 'STATE', 'ZIP CODE',
            'LONGITUDE','LATITUDE',
            'nearestTransitDist','workDist','homeDist','nearestCoffeeDist','geometry'
            ]].reset_index(drop=True)
joinedData.reset_index(inplace=True)
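
One thing worth keeping in mind: the points here are longitude/latitude pairs, so the `distance` values computed above are planar distances in degrees, not in miles or kilometers. That is fine for ranking places relative to each other, but harder to read as travel distance. A minimal sketch of one way to get approximate kilometers is the haversine formula, assuming a spherical Earth with a 6371 km radius. The `haversine_km` helper and the `EARTH_RADIUS_KM` constant are hypothetical names and are not part of the original notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def haversine_km(lon1, lat1, lon2, lat2):
    """Approximate great-circle distance in km between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# same home and work coordinates as in the notebook, as (lon, lat)
home_lonlat = (-87.659349, 41.9880054)
work_lonlat = (-87.6438878, 41.884193)

home_to_work_km = haversine_km(*home_lonlat, *work_lonlat)
print(round(home_to_work_km, 2))
```

The same helper could be applied to the `LONGITUDE` and `LATITUDE` columns if distances in kilometers are preferred over degrees.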

At this point I saved the processed data and manually categorized each location as visited or not visited. After that, I reloaded the data.

In [4]:
NSBData_processed = pd.read_excel('NSB_preprocessed_upd082317.xlsx')
# keep only unvisited places; copy so later column assignments don't act on a slice
NSBData_processed = NSBData_processed[NSBData_processed['visitInd']==False].copy()
NSBData_processed.head()

Unnamed: 0,neighborhood,visitInd,DOING BUSI,ADDRESS,CITY,STATE,ZIP CODE,LONGITUDE,LATITUDE,nearestTransitDist,workDist,homeDist,nearestCoffeeDist,medianPrice
0,Little Village,False,ACE COFFEE BAR,2650 S CALIFORNIA AVE 2ND,CHICAGO,IL,60608,-87.695309,41.842996,0.011126,0.065889,0.149402,0.024981,935.0
1,Grand Boulevard,False,ACE COFFEE BAR INC.,5001 S MICHIGAN AVE 1,CHICAGO,IL,60615,-87.622545,41.803731,0.004377,0.083244,0.187914,0.025786,1299.5
2,West Loop,False,"ACE COFFEE BAR, INC.",120 N SANGAMON ST,CHICAGO,IL,60607,-87.651067,41.883715,0.002184,0.007195,0.104619,0.00607,3034.0
3,Archer Heights,False,"ACE COFFEE BAR, INC.",3642 W 47TH ST,CHICAGO,IL,60632,-87.715395,41.808085,0.011644,0.10443,0.188447,0.029281,882.5
4,"Little Italy, UIC",False,Ace ICRE Roosevelt,1950 W ROOSEVELT RD BASEMENT,CHICAGO,IL,60608,-87.675813,41.866871,0.007835,0.036322,0.122248,0.004607,1800.0


Split the data by whether each place is closer to home or to work.

In [5]:
# split groups into places that are closer to home, or work
NSBData_processed['closerTo'] = NSBData_processed.apply(lambda x: 'home' if x.workDist > x.homeDist else 'work',axis=1)
closerToHome = NSBData_processed[NSBData_processed['closerTo']=='home']
closerToWork = NSBData_processed[NSBData_processed['closerTo']=='work']

closerToHomePoints = closerToHome[['LONGITUDE','LATITUDE']]
closerToWorkPoints = closerToWork[['LONGITUDE','LATITUDE']]
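
As a side note, the row-wise `apply` above can also be written as a vectorized comparison. The sketch below is one possible alternative, not the notebook's method; it assumes `numpy` is available and uses a small hypothetical `places` frame. The first two rows reuse distances from the tables in this post, and the third row is a made-up tie. Ties go to `'work'`, matching the lambda above.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for NSBData_processed
places = pd.DataFrame({
    'DOING BUSI': ['ACE COFFEE BAR', 'ADRIANA COFFEE SHOP', 'TIE EXAMPLE'],
    'workDist': [0.065889, 0.125836, 0.05],
    'homeDist': [0.149402, 0.041141, 0.05],
})

# 'home' only when home is strictly closer; otherwise 'work'
places['closerTo'] = np.where(places['workDist'] > places['homeDist'], 'home', 'work')

# copy each subset so it can be modified independently later
closer_to_home = places[places['closerTo'] == 'home'].copy()
closer_to_work = places[places['closerTo'] == 'work'].copy()

print(places[['DOING BUSI', 'closerTo']])
```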

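Get clusters of three unvisited locations for 1) places closer to work, and 2) places closer to home. Each cluster is a place plus its two nearest neighbors; the output below shows the cluster for the first row only.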

In [6]:
neighborsWork = NearestNeighbors(n_neighbors=3).fit(closerToWorkPoints)
distancesWork,clustersWork = neighborsWork.kneighbors(closerToWorkPoints)
closerToWork.iloc[clustersWork[0]]

Unnamed: 0,neighborhood,visitInd,DOING BUSI,ADDRESS,CITY,STATE,ZIP CODE,LONGITUDE,LATITUDE,nearestTransitDist,workDist,homeDist,nearestCoffeeDist,medianPrice,closerTo
0,Little Village,False,ACE COFFEE BAR,2650 S CALIFORNIA AVE 2ND,CHICAGO,IL,60608,-87.695309,41.842996,0.011126,0.065889,0.149402,0.024981,935.0,work
68,"Little Italy, UIC",False,HOPE COFFEEHOUSE,2431 W ROOSEVELT RD 1ST,CHICAGO,IL,60608,-87.68683,41.866494,0.009127,0.046446,0.12458,0.011023,1625.0,work
99,Lower West Side,False,NITECAP COFFEE BAR LLC,1738 W 18TH ST 1 1,CHICAGO,IL,60608,-87.670284,41.857846,0.001138,0.037295,0.130618,0.002397,1872.5,work


In [7]:
neighborsHome = NearestNeighbors(n_neighbors=3).fit(closerToHomePoints)
distancesHome,clustersHome = neighborsHome.kneighbors(closerToHomePoints)
closerToHome.iloc[clustersHome[0]]

Unnamed: 0,neighborhood,visitInd,DOING BUSI,ADDRESS,CITY,STATE,ZIP CODE,LONGITUDE,LATITUDE,nearestTransitDist,workDist,homeDist,nearestCoffeeDist,medianPrice,closerTo
6,West Ridge,False,ADRIANA COFFEE SHOP,6345 N CALIFORNIA AVE 1ST,CHICAGO,IL,60659,-87.699476,41.997086,0.031115,0.125836,0.041141,0.013344,1250.0,home
34,West Ridge,False,CAFE ZIPO,5645 N LINCOLN AVE 1ST,CHICAGO,IL,60659,-87.696492,41.98408,0.018123,0.112892,0.03735,0.008693,1272.5,home
13,North Park,False,BABA'S COFFEE,5544-5546 N KEDZIE AVE 1,CHICAGO,IL,60625,-87.708996,41.982582,0.015234,0.117981,0.049942,0.011598,1450.0,home
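
To make the two outputs above easier to read, here is a small pure-Python sketch of what a three-nearest-neighbors query returns when it is run on the same points it was fitted on. It is a brute-force stand-in, not scikit-learn's implementation, and the `k_nearest_indices` helper and toy coordinates are hypothetical. The point it illustrates is that each row of indices starts with the point itself, followed by its two closest neighbors.

```python
import math

def k_nearest_indices(points, k=3):
    """For each point, return the indices of its k closest points (itself included)."""
    result = []
    for x1, y1 in points:
        # rank every point by straight-line distance to (x1, y1)
        ranked = sorted(
            range(len(points)),
            key=lambda j: math.hypot(points[j][0] - x1, points[j][1] - y1),
        )
        result.append(ranked[:k])
    return result

# toy coordinates: three points near the origin, two points far away
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
toy_clusters = k_nearest_indices(toy_points, k=3)

for i, cluster in enumerate(toy_clusters):
    print(i, cluster)
```

Because every point gets its own group of three, these groups overlap, and a sparse area can pull in a distant third member (as the two far-away toy points do). Looking only at index `[0]`, as in the cells above, shows one such group out of many.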


Next up, start visiting some of these places.  
To be continued...