#### CSCE 670 :: Information Storage and Retrieval :: Texas A&M University :: Spring 2017


# Project: Different Faces of a City


Team members: 
- Tae Jun Jeon
- Donghwa Shin
- Jingyu Hong
- Phakpoom Chinpruttthiwong

Project description here...

## Introduction and Problem Statement

Municipal boundaries set by governments have traditionally been used to divide the cities we live in, and these boundaries have often been strong indicators of social class. However, they tend to stay fixed and do not accurately represent social dynamics, which change constantly. To address this problem, we implement a clustering model on a data set from the social network service Foursquare, linking geographical locations with the users who visit them to better understand a city. Because clusters are hard to evaluate objectively, we observe the social activity of populated cities that we are familiar with, namely Houston and Austin.



## Related Works

The problem of understanding the social dynamics of a city has been studied in previous work. Lynch approached it by observing the structure and function of cities [1], and Milgram approached it by highlighting social interactions and labeling them with local characters [2]. These works were based on data obtained before the expansion of social media. Cranshaw et al., in contrast, made a map illustrating the hidden structures of a city using machine learning and data from a social network service, Foursquare [3]. This newer methodology lets us discern how people actually use a city, such as the activities they do at night. Their map of New York City is a good example. Hence, we focus on this recent paper, which applies a novel clustering method to widely available social data, so that we can capture the dynamics of our local data set.

## Core Algorithm

To discover clusters in our data set, we implement a variation of the spectral clustering method introduced by Ng et al. (2001). Cranshaw et al. adapt spectral clustering with a new design for the affinity matrix [3]. We extend the idea by treating time as an additional variable: we suspect that social patterns differ with the types of venues people check in to at different hours. We look at the four most common venue types among the check-ins and group the hours that share them into one time period. This leads to 4 different time periods: 0-10, 10-18, 19-22, and 23-0. The change in social clustering can thus be observed based on check-in times.

In short, our goal is to compute affinity matrices for 4 different time periods as described below.

Affinity matrix A: A(i, j) = cosine_sim(c_i, c_j) + alpha if venue j is among the n nearest venues of venue i (in this work, n = 20 and alpha = 0.1); otherwise A(i, j) = 0.

Here c_i and c_j are the vectors of per-user check-in frequencies at venues i and j, respectively.

The affinity matrix is therefore Nv x Nv, where Nv is the number of venues. Most entries are zero: a row has nonzero entries only for the venues nearest to it (the implementation in the Clustering Algorithm section also fills A(i, j) when i is among the nearest venues of j, which keeps the matrix symmetric).

Then we use spectral clustering on these matrices to compute the clusters of each city in each time period and compare the results.
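
As a rough illustration of what spectral clustering does with such a matrix: the method of Ng et al. first rescales the affinity matrix as L = D^(-1/2) A D^(-1/2), where D is the diagonal matrix of row sums, and then clusters the rows of the leading eigenvectors of L. The sketch below shows only the rescaling step, in plain Python, on a made-up 3 x 3 matrix. The name `normalize_affinity` and the toy values are assumptions for illustration; the library call used in the Implementation section performs its own version of these steps.

```python
import math

def normalize_affinity(A):
    """Return D^(-1/2) A D^(-1/2) for a square affinity matrix given as nested lists."""
    degrees = [sum(row) for row in A]  # row sums, i.e. the diagonal of D
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Rows with zero degree are left as zeros to avoid dividing by zero
            if degrees[i] > 0 and degrees[j] > 0:
                L[i][j] = A[i][j] / math.sqrt(degrees[i] * degrees[j])
    return L

# Toy affinity matrix (illustrative values only): venues 0 and 1 are strongly
# related, venues 1 and 2 only receive the alpha = 0.1 baseline.
toy_A = [
    [0.0, 1.1, 0.0],
    [1.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
toy_L = normalize_affinity(toy_A)
```

The rescaling keeps the matrix symmetric and keeps zero entries at zero, so the sparsity pattern of the affinity matrix is preserved.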

## Implementation 

### Dataset

The dataset comes from Yang et al. [4]. It includes long-term, global-scale check-in data collected from Foursquare over approximately 18 months, from April 2012 to September 2013.

#### Data preprocessing and filtering

Because the original dataset is enormous and we focus our evaluation on only two cities, Houston and Austin, we first filter the data, as shown in the code below.

First, we define the Houston and Austin areas by latitude and longitude coordinates as follows:
- Austin is a rectangular box from coordinate (30.097979, -98.038402) to (30.51, -97.555003).
- Houston is a rectangular box from coordinate (29.488986, -95.810704) to (30.139553, -95.022435).

In [1]:
# Imports and global variables

import string
import glob, os
import re
import math
from collections import defaultdict
from math import radians, cos, sin, asin, sqrt

import operator
import json
import numpy as np
import matplotlib.pyplot as plt

from sklearn.feature_extraction import image
from sklearn.cluster import SpectralClustering

AUSTIN_LAT_MIN = 30.097979
AUSTIN_LAT_MAX = 30.51
AUSTIN_LNG_MIN = -98.038402
AUSTIN_LNG_MAX = -97.555003
    
HOUSTON_LAT_MIN = 29.488986
HOUSTON_LAT_MAX = 30.139553
HOUSTON_LNG_MIN = -95.810704
HOUSTON_LNG_MAX = -95.022435

Then, we filter the data to include only venues inside the specified coordinates and write them to new files for further processing.

The filter_venue function filters the venue data, which includes venue id, latitude, longitude, and type, to keep only venues in Austin and Houston. After filtering, there are 8585 venues for Austin and 10350 venues for Houston.

The filter_checkin function filters the check-in data, which includes user id, venue id, and check-in time, to keep only check-ins in Austin and Houston. After filtering, there are 71369 check-ins for Austin and 65460 check-ins for Houston.

In [3]:
# Filter venue information to only include Austin and Houston venues
# Input: Filename of venue data
# Return: Venue information {venueid : {"lat" : latitude, "lng" : longitude}, ...}

def filter_venue(filename):
    ret = {}

    # Open the outputs in write mode (and close them on exit) so that
    # re-running the cell does not append duplicate lines
    with open(filename, 'r') as fp, \
         open("austin_venue.txt", 'w') as fp_austin, \
         open("houston_venue.txt", 'w') as fp_houston:
        for line in fp:
            arr = line.split("\t")

            venueid = arr[0]
            lat = float(arr[1])
            lng = float(arr[2])

            venue_info = {"lat" : lat, "lng" : lng}

            if AUSTIN_LAT_MIN < lat < AUSTIN_LAT_MAX and AUSTIN_LNG_MIN < lng < AUSTIN_LNG_MAX:
                fp_austin.write(line)
                ret[venueid] = venue_info

            if HOUSTON_LAT_MIN < lat < HOUSTON_LAT_MAX and HOUSTON_LNG_MIN < lng < HOUSTON_LNG_MAX:
                fp_houston.write(line)
                ret[venueid] = venue_info

    return ret

austin_houston_venue = filter_venue("dataset_TIST2015_POIs.txt")

In [5]:
# Filter checkin information to only include Austin and Houston checkins
# Input: filename of checkin data, venue information {venueid : {"lat" : latitude, "lng" : longitude}, ...}
# Output: To files austin_checkin.txt and houston_checkin.txt

def filter_checkin(filename, venue):
    # Write mode avoids duplicated lines when the cell is run more than once
    with open(filename, 'r') as fp, \
         open("austin_checkin.txt", 'w') as fp_austin, \
         open("houston_checkin.txt", 'w') as fp_houston:
        for line in fp:
            arr = line.split("\t")

            venueid = arr[1]

            # Skip check-ins at venues outside both cities
            if venueid not in venue:
                continue

            lat = venue[venueid]["lat"]
            lng = venue[venueid]["lng"]

            if AUSTIN_LAT_MIN < lat < AUSTIN_LAT_MAX and AUSTIN_LNG_MIN < lng < AUSTIN_LNG_MAX:
                fp_austin.write(line)

            if HOUSTON_LAT_MIN < lat < HOUSTON_LAT_MAX and HOUSTON_LNG_MIN < lng < HOUSTON_LNG_MAX:
                fp_houston.write(line)

filter_checkin("dataset_TIST2015_Checkins.txt", austin_houston_venue)
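
Both filter functions repeat the same bounding-box comparison. A minimal sketch of how the test could be factored into one helper follows (written for Python 3). The names `CITY_BOXES` and `city_of` are hypothetical and not part of the notebook; the coordinates are the ones defined above.

```python
# Hypothetical helper: each city box is (lat_min, lat_max, lng_min, lng_max),
# using the coordinates defined earlier in this section.
CITY_BOXES = {
    "austin": (30.097979, 30.51, -98.038402, -97.555003),
    "houston": (29.488986, 30.139553, -95.810704, -95.022435),
}

def city_of(lat, lng, boxes=CITY_BOXES):
    """Return the name of the first box that strictly contains the point, or None."""
    for city, (lat_min, lat_max, lng_min, lng_max) in boxes.items():
        if lat_min < lat < lat_max and lng_min < lng < lng_max:
            return city
    return None

# Example points (approximate city centers, for illustration only)
austin_point = city_of(30.27, -97.74)
houston_point = city_of(29.76, -95.37)
outside_point = city_of(32.78, -96.80)
```

With such a helper, each filter function would only need to look up the output file for the returned city name.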

Then, we precompute the n nearest venues of each venue. The original algorithm uses a kd-tree for efficiency; our implementation is simpler and less efficient, so we save the processed data in a file and do not have to recompute it on every run.

Our method is simple. For each venue, we compute its distance to every other venue, sort the distances, and save the closest 1000 venues. We pick 1000 because that number should cover all venues in the same area, and because there are roughly ten thousand venues per city, keeping more than 1000 would make the files too large.

The output is saved one venue per line, with neighbors ranked from closest to farthest:

venueid:closest_venueid1,closest_venueid2,...,closest_venueid1000

and so on ...


In [12]:
# Helper function to compute the distance between two points using lat/long coordinate system
# Taken from Michael Dunn's answer at http://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points
# Input: Two points {"latitude": latitude, "longitude": longitude}
# Return: Distance between two points in float

def distance(v1, v2):
    lon1 = float(v1["longitude"])
    lat1 = float(v1["latitude"])
    lon2 = float(v2["longitude"])
    lat2 = float(v2["latitude"])

    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

In [None]:
# Compute the 1000 nearest neighbors of each venue for later computation
# Input: filename, venue information {venueid : {"latitude" : latitude, "longitude" : longitude, "type" : type}, ...}
# Return: None
# Output: To file: venueid:1st_nearest_venueid,2nd_nearest_venueid,...

def preprocess_nearest_venues(filename, info):
    n = 1000
    l = len(info)
    step = max(1, l // 100)  # progress interval; avoids a modulo by zero when l < 100

    with open(filename, 'w') as fp:
        for c, v1 in enumerate(info):
            if c % step == 0:
                print(str(c*100.0/l) + "%")

            # Distance from v1 to every other venue
            dist = []
            for v2 in info:
                if v1 == v2:
                    continue
                dist.append((v2, distance(info[v1], info[v2])))
            dist.sort(key = lambda elem: elem[1])

            # Keep only the ids of the n closest venues
            nearest_ids = [e[0] for e in dist[:n]]
            fp.write(v1 + ":" + ",".join(nearest_ids) + "\n")

# austin_venue and houston_venue are produced by get_venue_info in the Data extraction section
preprocess_nearest_venues("nearest_neighbors_austin_1000.txt", austin_venue)
preprocess_nearest_venues("nearest_neighbors_houston_1000.txt", houston_venue)
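
Sorting the full distance list costs O(N log N) per venue even though only the closest n entries are kept. A minimal sketch of selecting them with `heapq.nsmallest` from the standard library follows (written for Python 3). `haversine_km` restates the distance helper above so that the block runs on its own; `nearest_ids` and the toy venues are hypothetical names and data, not part of the notebook.

```python
import heapq
from math import radians, cos, sin, asin, sqrt

def haversine_km(v1, v2):
    """Great-circle distance in kilometers between two {"latitude", "longitude"} points."""
    lon1, lat1, lon2, lat2 = map(radians, [
        float(v1["longitude"]), float(v1["latitude"]),
        float(v2["longitude"]), float(v2["latitude"]),
    ])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 6371  # 6371 km: radius of the earth

def nearest_ids(info, v1, n):
    """Return the ids of the n venues closest to v1, ordered from closest to farthest."""
    others = (v2 for v2 in info if v2 != v1)
    return heapq.nsmallest(n, others, key=lambda v2: haversine_km(info[v1], info[v2]))

# Toy venues on one meridian, at increasing distance from "a" (illustrative data)
toy_venues = {
    "a": {"latitude": "30.2700", "longitude": "-97.7400"},
    "b": {"latitude": "30.2710", "longitude": "-97.7400"},
    "c": {"latitude": "30.3000", "longitude": "-97.7400"},
    "d": {"latitude": "30.5000", "longitude": "-97.7400"},
}
closest_to_a = nearest_ids(toy_venues, "a", 2)
```

This still computes every pairwise distance, so it is not a substitute for a kd-tree; it only removes the full sort.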

#### Data extraction

Here we extract the data needed for clustering.

First, we extract the check-in information of each venue in the given city. Each venue maps to a list of its users' check-in times (the hour of each check-in).

In [6]:
# Get venue's checkin information of given city data
# Input: Filename of checkin data
# Return: Venue's checkin information {venueid : [{userid : hour}, {userid : hour}, ...], ... }

def get_venue_checkin(filename):
    ret = {}

    with open(filename, 'r') as fp:
        for line in fp:
            arr = line.split("\t")

            userid = arr[0]
            venueid = arr[1]
            time = arr[2].split(' ')
            hour = time[3][0:2]  # two-digit hour, kept as a string

            # One {userid : hour} entry per check-in
            ret.setdefault(venueid, []).append({userid : hour})

    return ret

austin_venue_checkin = get_venue_checkin("austin_checkin.txt")
houston_venue_checkin = get_venue_checkin("houston_checkin.txt")
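
The hour is taken by slicing the fourth space-separated field of the timestamp. As an alternative, the timestamp could be parsed with `datetime.strptime`, which fails loudly on malformed lines. The sketch below (Python 3) assumes timestamps look like `Tue Apr 03 18:00:06 +0000 2012`; that layout is consistent with the slicing above but is an assumption, as are the names `TIME_FORMAT` and `checkin_hour`.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "Tue Apr 03 18:00:06 +0000 2012"
TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def checkin_hour(timestamp):
    """Parse a check-in timestamp and return its hour (0-23) as an int."""
    return datetime.strptime(timestamp.strip(), TIME_FORMAT).hour

# Made-up sample in the assumed layout
sample_hour = checkin_hour("Tue Apr 03 18:00:06 +0000 2012")
```

The returned hour is in whatever time zone the timestamp itself uses, exactly like the slicing approach.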

Then we extract the venue information. Each venue has a latitude, a longitude, and a venue type.

In [7]:
# Get venue information
# Input: Filename of venue data
# Return: Venue information {venueid : {"latitude" : latitude, "longitude" : longitude, "type" : type}, ...}
#         (latitude and longitude are kept as strings)

def get_venue_info(filename):
    ret = {}

    with open(filename, 'r') as fp:
        for line in fp:
            arr = line.split("\t")

            venue_info = {}

            venueid = arr[0]
            venue_info["latitude"] = arr[1]
            venue_info["longitude"] = arr[2]
            venue_info["type"] = arr[3]

            ret[venueid] = venue_info

    return ret

austin_venue = get_venue_info("austin_venue.txt")
houston_venue = get_venue_info("houston_venue.txt")

As our additional contribution to the original work, we observe how the cities differ at different times of day. Thus, in addition to the check-in information for the whole day, we also extract it for each hour. Each hour maps to its venues, and each venue maps to how often each user checked in during that hour.

In [9]:
# Get checkin frequency of venues based on checkin time in hour
# Input: Filename of checkin data
# Return: Checkin information {hour : {venueid : {userid : checkin_freq, ...}, ...}, ...}

def get_checkin_freq_by_hour(filename):
    ret = defaultdict(lambda : defaultdict(dict))

    with open(filename, 'r') as fp:
        for line in fp:
            arr = line.split('\t')

            userid = arr[0]
            venueid = arr[1]
            time = arr[2].split(' ')
            hour = int(time[3][0:2])

            # The nested defaultdicts create missing hour and venue entries on access
            users = ret[hour][venueid]
            users[userid] = users.get(userid, 0) + 1

    return ret

austin_checkin_freq = get_checkin_freq_by_hour("austin_checkin.txt")
houston_checkin_freq = get_checkin_freq_by_hour("houston_checkin.txt")

In [27]:
# Merge the hourly check-in frequencies into whole-day frequencies
# Input: Checkin information {hour : {venueid : {userid : checkin_freq, ...}, ...}, ...}
# Return: {group : {venueid : {userid : checkin_freq, ...}, ...}, ...}
#         Only group 0 is filled; groups 1-3 stay empty so the layout matches the categorized data

def all_day_checkin_freq(checkin_freq):
    ret = dict()

    for group in range(0,4):
        ret[group] = dict()

    for hours, values in checkin_freq.items():
        for venueid, id_value in values.items():
            users = ret[0].setdefault(venueid, dict())
            for userid, count in id_value.items():
                users[userid] = users.get(userid, 0) + count
    return ret

all_houston_checkin_freq = all_day_checkin_freq(houston_checkin_freq)
all_austin_checkin_freq = all_day_checkin_freq(austin_checkin_freq)

After we have the check-in frequencies for each hour, we group the hours into four time periods, chosen separately for each of the two cities, Austin and Houston.

For Houston, we separate the hours into the following time periods:
- 23-0
- 1-9
- 10-21
- 22

For Austin, we separate the hours into the following time periods:
- 0-7
- 8-9
- 10-21
- 22-23

In [28]:
def categorized_hoston_data(checkin_freq):
    ret = dict()
    for group in range(0,4):
        ret[group] = dict()

    for hours, values in checkin_freq.items():
        # hours 23 and 0 -> group 0, 1-9 -> group 1, 10-21 -> group 2, 22 -> group 3
        if hours == 0 or hours == 23:
            group = 0
        elif 1 <= hours <= 9:
            group = 1
        elif 10 <= hours <= 21:
            group = 2
        else:
            group = 3

        # Add this hour's counts to the per-user totals of its group
        for venueid, id_value in values.items():
            users = ret[group].setdefault(venueid, dict())
            for userid, count in id_value.items():
                users[userid] = users.get(userid, 0) + count
    return ret

houston_checkin_freq_categorized = categorized_hoston_data(houston_checkin_freq)

In [29]:
def categorized_austin_data(checkin_freq):
    ret = dict()
    for group in range(0,4):
        ret[group] = dict()

    for hours, values in checkin_freq.items():
        # hours 0-7 -> group 0, 8-9 -> group 1, 10-21 -> group 2, 22-23 -> group 3
        if 0 <= hours <= 7:
            group = 0
        elif 8 <= hours <= 9:
            group = 1
        elif 10 <= hours <= 21:
            group = 2
        else:
            group = 3

        # Add this hour's counts to the per-user totals of its group
        for venueid, id_value in values.items():
            users = ret[group].setdefault(venueid, dict())
            for userid, count in id_value.items():
                users[userid] = users.get(userid, 0) + count
    return ret

austin_checkin_freq_categorized = categorized_austin_data(austin_checkin_freq)
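
The two categorizing functions differ only in which hours belong to which group. A minimal sketch of a table-driven version follows (Python 3). The names `HOUSTON_PERIODS`, `AUSTIN_PERIODS`, and `regroup` are hypothetical; the hour tables mirror the branches of the two functions above, and the toy data is made up.

```python
# Hypothetical period tables: entry g lists the hours that belong to group g,
# mirroring the if/elif branches of the two functions above.
HOUSTON_PERIODS = [[23, 0], list(range(1, 10)), list(range(10, 22)), [22]]
AUSTIN_PERIODS = [list(range(0, 8)), [8, 9], list(range(10, 22)), [22, 23]]

def regroup(freq_by_hour, periods):
    """Merge {hour: {venueid: {userid: freq}}} into {group: {venueid: {userid: freq}}}."""
    hour_to_group = {h: g for g, hours in enumerate(periods) for h in hours}
    ret = {g: {} for g in range(len(periods))}
    for hour, venues in freq_by_hour.items():
        group = hour_to_group[hour]
        for venueid, users in venues.items():
            merged = ret[group].setdefault(venueid, {})
            for userid, count in users.items():
                merged[userid] = merged.get(userid, 0) + count
    return ret

# Toy hourly frequencies (illustrative data only)
toy_freq = {
    0: {"v1": {"u1": 2}},
    23: {"v1": {"u1": 1, "u2": 1}},
    22: {"v2": {"u3": 4}},
}
houston_groups = regroup(toy_freq, HOUSTON_PERIODS)
austin_groups = regroup(toy_freq, AUSTIN_PERIODS)
```

Keeping the periods in a table makes it easy to try other splits without copying the merging logic.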

#### Visualizing data

Here we create a check-in histogram and a pie chart of venue types to better understand the check-in trends.

This function creates a histogram of user check-ins per hour, which shows when users are most active.

In [8]:
# Create hour histogram
# Input: Filename of checkin data, period in hours
# Return: [(bucket, freq), ...] sorted by bucket, where bucket = hour // period

def create_histogram(filename, period):
    ret = {}

    with open(filename, 'r') as fp:
        for line in fp:
            arr = line.split('\t')

            time = arr[2].split(' ')
            bucket = int(time[3][0:2]) // period

            ret[bucket] = ret.get(bucket, 0) + 1

    # Sorting the (bucket, freq) pairs orders them by bucket
    return sorted(ret.items())

austin_histogram = create_histogram("austin_checkin.txt", 1)
houston_histogram = create_histogram("houston_checkin.txt", 1)

This function computes the pie chart data of venue types for each hour, which shows how the most popular venue types change from hour to hour.

In [10]:
# Create pie chart data of venue types for each hour
# Input: Checkin information {hour : {venueid : {userid : checkin_freq, ...}, ...}, ...},
#        Venue information {venueid : {"latitude" : latitude, "longitude" : longitude, "type" : type}, ...}
# Return: pie chart {hour : {venue_type : freq, ...}, ...}

def create_pie_chart(freq, info):
    ret = {}

    for hour, venues in freq.items():
        type_counts = ret.setdefault(hour, {})

        for venueid, users in venues.items():
            venue_type = info[venueid]["type"]

            # Total check-ins at this venue during this hour
            checkin_total = sum(users.values())

            type_counts[venue_type] = type_counts.get(venue_type, 0) + checkin_total

    return ret

austin_pie_chart = create_pie_chart(austin_checkin_freq, austin_venue)
houston_pie_chart = create_pie_chart(houston_checkin_freq, houston_venue)
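
The Core Algorithm section relies on the four most common venue types. A minimal sketch of reading them off the pie chart data with `collections.Counter` follows (Python 3). The name `top_venue_types` and the venue type labels in the toy data are made up for illustration.

```python
from collections import Counter

def top_venue_types(pie_chart, k=4):
    """Return {hour: [the k most frequent venue types, most frequent first]}."""
    return {
        hour: [venue_type for venue_type, _ in Counter(types).most_common(k)]
        for hour, types in pie_chart.items()
    }

# Toy pie chart data in the {hour : {venue_type : freq}} layout (illustrative labels)
toy_pie = {
    9: {"Coffee Shop": 12, "Office": 30, "Gym": 5, "Bar": 1, "Park": 3},
    22: {"Bar": 40, "Coffee Shop": 2},
}
toy_top = top_venue_types(toy_pie)
```

Comparing these lists across consecutive hours is one way to see where the dominant venue types change.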

### Clustering Algorithm

After extracting the necessary information (user check-ins per hour and venue information), we compute the affinity matrix and run the spectral clustering algorithm.

First, we normalize each check-in vector to a unit vector for further computation.

In [11]:
# Helper function for cosine similarity. Create a unit vector.
# Input: Vector for user checkin frequency {userid : freq, ...}
# Return: Unit vector for user checkin frequency {userid : normalized_freq, ...}

def normalized(vect):
    ret = {}
    
    total = 0
    
    for userid, freq in vect.iteritems():
        total += freq**2
    
    normalizer = math.sqrt(total)
    
    for userid, freq in vect.iteritems():
        ret[userid] = float(float(freq)/float(normalizer))
        
    return ret

Then we load the precomputed nearest venues, which are needed to compute the affinity matrix.

In [25]:
# Get the n nearest venues from precomputed nearest venue list file
# Input: filename, number of nearest venues
# Return: n nearest venues for each venue {venueid : [venueid, ...], ...}

def get_nearest_venues(filename, n):
    ret = {}

    with open(filename, 'r') as fp:
        for line in fp:
            # Strip the trailing newline so it does not end up in the last venue id
            arr = line.strip().split(':')

            venueid = arr[0]
            nearest_list = arr[1].split(',')

            ret[venueid] = nearest_list[0:n]

    return ret

austin_nearest = get_nearest_venues("nearest_neighbors_austin_1000.txt", 20)
houston_nearest = get_nearest_venues("nearest_neighbors_houston_1000.txt", 20)

Finally, we compute the cosine similarity and affinity matrix as described in the Core Algorithm section.

In [26]:
# Compute cosine similarity between two vectors
# Input: Two user checkin frequency vectors {userid : freq, ...}, {userid : freq, ...}
# Return: similarity score

def cosine_sim(vec1, vec2):
    ret = 0
    
    v1 = normalized(vec1)
    v2 = normalized(vec2)
    
    common_keys = list(set(v1.keys()) & set(v2.keys()))
    
    for k in common_keys:
        ret += v1[k]*v2[k]
    
    return ret
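
As a small worked example of the similarity on sparse check-in vectors, the sketch below (Python 3) restates the computation in one function and applies it to made-up venues. The name `cosine` and the toy vectors are assumptions; the guard for empty vectors is an addition that the notebook's version does not have.

```python
import math

def cosine(vec1, vec2):
    """Cosine similarity of two sparse vectors given as {userid: freq} dicts."""
    dot = sum(vec1[k] * vec2[k] for k in set(vec1) & set(vec2))
    norm1 = math.sqrt(sum(f * f for f in vec1.values()))
    norm2 = math.sqrt(sum(f * f for f in vec2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # added guard: an empty vector is similar to nothing
    return dot / (norm1 * norm2)

# Toy check-in vectors (illustrative data)
venue_a = {"u1": 3, "u2": 4}
venue_b = {"u1": 3, "u2": 4}   # same visitors with the same frequencies as venue_a
venue_c = {"u3": 1}            # no visitor in common with venue_a
venue_d = {"u1": 1}            # shares only u1 with venue_a

same_users = cosine(venue_a, venue_b)
no_overlap = cosine(venue_a, venue_c)
partial = cosine(venue_a, venue_d)
```

Only users who visited both venues contribute to the dot product, so venues with disjoint visitors get a similarity of 0 and rely on alpha alone in the affinity matrix.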

In [None]:
# Compute the affinity matrix described in the Core Algorithm section
# Input: {venueid : {userid : checkin_freq, ...}, ...},
#        {venueid : [venueid, ...], ...}
# Return: Nv x Nv Affinity Matrix; Nv = # venues
#         {venueid : {venueid : score, ...}, ...}

def create_affinity_matrix(venues, nearest):
    alpha = 0.1

    ret = {}

    # Sets make the membership tests below O(1) instead of scanning a list
    nearest_sets = dict((v, set(nearest[v])) for v in venues)

    for v1 in venues:
        ret[v1] = {}

        for v2 in venues:
            # Neighbors in either direction get an entry, so the matrix is symmetric
            if v2 in nearest_sets[v1] or v1 in nearest_sets[v2]:
                ret[v1][v2] = cosine_sim(venues[v1], venues[v2]) + alpha
            else:
                ret[v1][v2] = 0

    return ret

# houston_affinity = create_affinity_matrix(houston_checkin_freq_categorized[0], houston_nearest)
# houston_affinity = create_affinity_matrix(all_houston_checkin_freq[0], houston_nearest)
# austin_affinity = create_affinity_matrix(austin_checkin_freq_categorized[3], austin_nearest)
austin_affinity = create_affinity_matrix(all_austin_checkin_freq[0], austin_nearest)
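
The function above visits all Nv x Nv venue pairs, although only pairs that are near each other can be nonzero. A minimal sketch that visits only the neighbor pairs and stores just the nonzero entries follows (Python 3). The name `sparse_affinity` is hypothetical, the similarity is passed in as a parameter so the block runs on its own, and `overlap_sim` is a stand-in for the demo, not the cosine similarity used in this work.

```python
def sparse_affinity(venues, nearest, sim, alpha=0.1):
    """Build {venueid: {venueid: score}} containing only the nonzero entries.

    venues:  {venueid: {userid: freq}}
    nearest: {venueid: [nearest venueids]}
    sim:     similarity function taking two {userid: freq} dicts
    """
    ret = {v: {} for v in venues}
    for v1 in venues:
        for v2 in nearest.get(v1, []):
            # Skip neighbors without check-in data and self-pairs
            if v2 not in venues or v2 == v1:
                continue
            score = sim(venues[v1], venues[v2]) + alpha
            # Fill both directions so the result is symmetric
            ret[v1][v2] = score
            ret[v2][v1] = score
    return ret

def overlap_sim(users1, users2):
    # Stand-in similarity for the demo: fraction of shared users (not cosine)
    shared = set(users1) & set(users2)
    return len(shared) / float(len(set(users1) | set(users2)))

# Toy data (illustrative only)
toy_checkins = {"a": {"u1": 1}, "b": {"u1": 2}, "c": {"u2": 1}}
toy_nearest = {"a": ["b"], "b": ["a"], "c": ["b"]}
toy_affinity = sparse_affinity(toy_checkins, toy_nearest, overlap_sim)
```

Missing entries are implicit zeros, so a dense matrix for the clustering step can still be built by looking up each pair with a default of 0.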

After we have the affinity matrices, we apply a general version of the spectral clustering algorithm to obtain the final results.


In [1]:
# General version of Spectral clustering algorithm
# Input: venue information {venueid : {"latitude" : latitude, "longitude" : longitude, ...}, ...},
#        affinity matrix {venueid : {venueid : score, ...}, ...},
#        minimum longitude of the venues to plot
# Output: scatter plot of the clusters; cluster coordinates written to coords_list.json

def spectralClustering(venue, precomputed_affinity, threshold):

    # Fix one venue order and use it for both rows and columns, so the
    # matrix does not depend on dict iteration order
    venue_ids = list(precomputed_affinity)
    affinity_matrix = np.array([[precomputed_affinity[r][c] for c in venue_ids]
                                for r in venue_ids])

    print(affinity_matrix.shape)

    sc = SpectralClustering(n_clusters = 25, n_jobs = -1, affinity = 'precomputed')
    sc.fit(affinity_matrix)

    # Map each venue id to its cluster label
    cluster = dict(zip(venue_ids, sc.labels_))

    # Count the venues in each cluster; the largest cluster is treated as noise
    cluster_count = {}

    for v in venue:
        if v in cluster:
            cluster_count[cluster[v]] = cluster_count.get(cluster[v], 0) + 1

    max_index = max(cluster_count.items(), key=operator.itemgetter(1))[0]

    x = []
    y = []
    t = []
    data = {}

    print("Total # of venues: %d" % len(precomputed_affinity))
    print("Noise cluster: %s" % max_index)
    print("Cluster size: %d" % cluster_count[max_index])

    for v in venue:
        if v in cluster:
            # Coordinates are stored as strings, so convert them before comparing and plotting
            lng = float(venue[v]['longitude'])
            lat = float(venue[v]['latitude'])

            if lng > threshold and cluster[v] != max_index:
                x.append(lng)
                y.append(lat)
                t.append(cluster[v])

                # Group the coordinates by cluster for the JSON output
                temp = {'longitude': venue[v]['longitude'], 'latitude': venue[v]['latitude']}
                data.setdefault(cluster[v], []).append(temp)

    plt.figure(figsize=(10, 10))
    plt.scatter(np.array(x), np.array(y), c = np.array(t))
    plt.show()

    # One list of coordinates per cluster
    with open('coords_list.json', 'w') as json_file:
        json_file.write(json.dumps(list(data.values()), indent = 1))

    print("Clustering completed")
    
#spectralClustering(houston_venue, houston_affinity, -96.0)
#spectralClustering(austin_venue, austin_affinity, -98.0)
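
The function treats the largest cluster as noise and leaves it out of the plot. A minimal sketch of that step in isolation follows (Python 3); the name `drop_largest_cluster` and the toy labels are hypothetical.

```python
from collections import Counter

def drop_largest_cluster(labels_by_venue):
    """Return (label of the largest cluster, {venueid: label} without that cluster)."""
    sizes = Counter(labels_by_venue.values())
    noise_label, _ = sizes.most_common(1)[0]
    kept = {v: c for v, c in labels_by_venue.items() if c != noise_label}
    return noise_label, kept

# Toy cluster assignment (illustrative data): cluster 0 is the largest
toy_labels = {"v1": 0, "v2": 0, "v3": 0, "v4": 1, "v5": 2, "v6": 2}
noise, kept_labels = drop_largest_cluster(toy_labels)
```

Whether the largest cluster really is noise depends on the data, so it is worth inspecting its venues before discarding them.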


## Evaluation



## References

[1] Lynch, K., 1992. The Image of the City. MIT Press.

[2] Milgram, S., 1977. The Individual in a Social World: Essays and Experiments. London: Longman Education.

[3] Cranshaw, J., Schwartz, R., Hong, J.I. and Sadeh, N., 2012. The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City. International AAAI Conference on Weblogs and Social Media.

[4] Yang, D., et al., 2015. NationTelescope: Monitoring and Visualizing Large-Scale Collective Behavior in LBSNs. Journal of Network and Computer Applications, 55, pp. 170-180.