# Applications of Search


## Exercise - Web tracking
We maintain a symbol table of symbol tables. In the first ST, we store users as keys. Associated with each user is another symbol table, in which we store visited websites as keys and the number of past visits as values. We use our previous BST realisation of a symbol table (copied below).

In [1]:
class BSTNode: 
    def __init__(self, key, value): 
        self.key = key
        self.value = value
        self.left = None 
        self.right = None 


    def get(self, key):
        if self.key == key: 
            return self.value 
        elif key < self.key and self.left != None:
            return self.left.get(key) 
        elif key > self.key and self.right != None: 
            return self.right.get(key) 
        else: 
            return None

    def put(self, key, value):
        if key == self.key:
            self.value = value
        elif key < self.key:
            if self.left is None:
                self.left = BSTNode(key, value)
            else:
                self.left.put(key, value)
        elif key > self.key:
            if self.right is None:
                self.right = BSTNode(key, value)
            else:
                self.right.put(key, value) 

In [2]:
class WebTracker:
    def __init__(self, user, website):
        webST = BSTNode(website,1)
        self.wt = BSTNode(user, webST)
    
    def recordVisit(self, user, website):
        webST = self.wt.get(user)
        if webST == None:
            webST = BSTNode(website, 1)
            self.wt.put(user, webST)
        else:
            freq = webST.get(website)
            if freq is None:
                webST.put(website, 1)
            else:
                webST.put(website, freq+1)
            
            
    def hasVisited(self, user, website):
        webST = self.wt.get(user)
        if webST== None:
            return False
        else:
            return (webST.get(website) is not None)
        
    def getVisitFrequency(self, user, website):
        webST = self.wt.get(user)
        if webST is None:
            return 0
        # a website the user never visited has no entry: report 0, not None
        freq = webST.get(website)
        return 0 if freq is None else freq


In [3]:
# driver code

# read website visits one by one from 3-webtrack.txt (format: user,website)
# and record each visit in the WebTracker
with open('3-webtrack.txt','r') as f:
    line = f.readline().rstrip("\n")
    trackedVisit = line.split(",")
    webTracker = WebTracker(trackedVisit[0], trackedVisit[1])

    line = f.readline().rstrip("\n")
    while line != '':
        trackedVisit = line.split(",")
        webTracker.recordVisit(trackedVisit[0], trackedVisit[1])         
        line = f.readline().rstrip("\n")
        
print("Number of visits of user 1 to url 1 (should be 2)", webTracker.getVisitFrequency("user 1", "url 1"))
print("Number of visits of user 1 to url 2 (should be 5)", webTracker.getVisitFrequency("user 1", "url 2"))
print("Number of visits of user 1 to url 3 (should be 3)", webTracker.getVisitFrequency("user 1", "url 3"))

Number of visits of user 1 to url 1 (should be 2) 2
Number of visits of user 1 to url 2 (should be 5) 5
Number of visits of user 1 to url 3 (should be 3) 3
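
For comparison, the same "symbol table of symbol tables" idea can be sketched with Python's built-in `dict` in place of the BST. This is only an illustrative alternative: the class name `DictWebTracker` and its method names are hypothetical and not part of the exercise, and the visits recorded at the end are made-up sample data rather than the contents of `3-webtrack.txt`.

```python
class DictWebTracker:
    """Nested symbol table: user -> (website -> number of visits)."""

    def __init__(self):
        # outer symbol table, initially empty (no first visit required)
        self.wt = {}

    def record_visit(self, user, website):
        # fetch the user's inner table, creating it on first visit
        web_st = self.wt.setdefault(user, {})
        web_st[website] = web_st.get(website, 0) + 1

    def has_visited(self, user, website):
        return website in self.wt.get(user, {})

    def get_visit_frequency(self, user, website):
        # unknown users and unvisited websites both yield 0
        return self.wt.get(user, {}).get(website, 0)


# made-up sample visits (hypothetical data)
tracker = DictWebTracker()
for user, website in [("user 1", "url 1"), ("user 1", "url 2"), ("user 1", "url 1")]:
    tracker.record_visit(user, website)

print(tracker.get_visit_frequency("user 1", "url 1"))  # 2
print(tracker.get_visit_frequency("user 2", "url 1"))  # 0
```

A `dict` gives expected constant-time lookups but does not keep keys in order, whereas the BST keeps keys ordered at the cost of lookups proportional to the tree height.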


## Exercise - IMDB Search
To support the first part of the API, we can use a symbol table where we store movie titles as keys and a list of actor names as values. We could build an unordered list of actors as we read the input file (in the constructor) and sort the names on demand when serving the "list performers alphabetically" API, but we would then pay the cost of sorting every time the API is called. Instead, we sort the actor lists alphabetically at construction time. We can do this in two ways: (1) by first reading the whole input file into unsorted actor lists, then sorting them once and for all; (2) by building sorted lists as part of file processing. We follow option (1) below.
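
Option (2) is not implemented in this notebook, but a minimal sketch of it might look as follows, assuming the same `title/actor/actor/...` line format. The helper name `parse_movie_line` is hypothetical, and the input line is made-up sample data. The sketch also strips the trailing newline, which is an extra assumption about how the line should be cleaned.

```python
import bisect


def parse_movie_line(line):
    """Split 'title/actor/actor/...' into a title and a sorted actor list."""
    fields = line.rstrip("\n").split("/")
    title = fields[0]
    actors = []
    for name in fields[1:]:
        # insert each name at its sorted position, keeping the list ordered
        bisect.insort(actors, name)
    return title, actors


# made-up sample line (hypothetical data)
title, actors = parse_movie_line("Some Movie (2000)/Smith, Ann/Brown, Bob/Jones, Cy\n")
print(title)   # Some Movie (2000)
print(actors)  # ['Brown, Bob', 'Jones, Cy', 'Smith, Ann']
```

Each `bisect.insort` call finds the position by binary search but still shifts list elements, so building a list of n names this way costs O(n^2) in the worst case, compared with O(n log n) for sorting once at the end as in option (1).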

To begin with, we extend the BST implementation with some auxiliary functions.

In [4]:
class BSTNode: 
    def __init__(self, key, value): 
        self.key = key
        self.value = value
        self.left = None 
        self.right = None 

    def get(self, key):
        if self.key == key: 
            return self.value 
        elif key < self.key and self.left != None:
            return self.left.get(key) 
        elif key > self.key and self.right != None: 
            return self.right.get(key) 
        else: 
            return None

    def put(self, key, value):
        if key == self.key:
            self.value = value
        elif key < self.key:
            if self.left is None:
                self.left = BSTNode(key, value)
            else:
                self.left.put(key, value)
        elif key > self.key:
            if self.right is None:
                self.right = BSTNode(key, value)
            else:
                self.right.put(key, value) 
                
    # auxiliary function to retrieve all keys in BST             
    def keys(self):
        allKeys = []
        self.addKeys(self, allKeys)
        return allKeys
    
    def addKeys(self, node, allKeys):
        if node is None:
            return
        allKeys.append(node.key)
        self.addKeys(node.left, allKeys)
        self.addKeys(node.right, allKeys)
        
    # auxiliary function to compute size of BST             
    def size(self):
        return self.computeSize(self, 0)
    
    def computeSize(self, node, count):
        if node is None:
            return count
        count = self.computeSize(node.left, count + 1)
        count = self.computeSize(node.right, count)
        return count


In [5]:
class IMDBSearch:
    def __init__(self):
        # process the imdb file
        with open('3-imdbtest.txt','r') as f:
            line = f.readline()
            movie = line.split("/")
            title = movie[0]
            actors = []
            for i in range(1, len(movie)):
                actors.append(movie[i])
            self.movieST = BSTNode(title, actors)
            self.sizeMovies = 1
               
            line = f.readline()
    
            while line != '':
                movie = line.split("/")
                title = movie[0]
                actors = []
                for i in range(1, len(movie)):
                    actors.append(movie[i])
                self.movieST.put(title, actors)
                self.sizeMovies = self.sizeMovies + 1
                
                line = f.readline()
                
        # we now sort the actors in alphabetical order
        # we use Python's sorted() but could equally have used one of the sorting algorithms
        # we previously implemented
        allTitles = self.movieST.keys()
        # iterate over the stored titles directly (safe even if a title repeats in the file)
        for title in allTitles:
            sortedList = sorted(self.movieST.get(title))
            self.movieST.put(title, sortedList)

    # the following method satisfies both "list performers" and "list performers alphabetically" API
    def performers(self, movie):
        return self.movieST.get(movie)     


In [6]:
# driver code processing first 5 movies in test file
testMovies = []

with open('3-imdbtest.txt','r') as f:
    for i in range(5):
        line = f.readline()
        movie = line.split("/")
        testMovies.append(movie[0])

myIMDBclient = IMDBSearch()
p = myIMDBclient.performers(testMovies[0])
print("In movie:", testMovies[0], "actors are:")
print(p)

p = myIMDBclient.performers(testMovies[2])
print("In movie:", testMovies[2], "actors are:")
print(p)

In movie: 'Breaker' Morant (1980) actors are:
['Ball, Ray (I)', 'Ball, Vincent (I)', 'Bell, Wayne (I)', 'Bernard, Hank', 'Brown, Bryan (I)', 'Cassell, Alan (I)', 'Cisse, Halifa', 'Cornish, Bridget', 'Currer, Norman', 'Dick, Judy', 'Donovan, Terence (I)', 'Erskine, Ria\n', 'Fitz-Gerald, Lewis', 'Gray, Ian (I)', 'Haywood, Chris (I)', 'Henderson, Dick (II)', 'Horseman, Sylvia', 'Kiefel, Russell', 'Knez, Bruno', 'Lovett, Alan', 'Mann, Trevor (I)', 'Meagher, Ray', 'Mullinar, Rod', 'Nicholls, Jon', 'Osborn, Peter', 'Peterson, Ron', 'Pfitzner, John', 'Procanin, Michael', 'Quin, Don', 'Radford, Elspeth', 'Reed, Maria', 'Rodger, Ron', 'Seidel, Nellie', 'Smith, Chris (I)', 'Steele, Rob (I)', 'Thompson, Jack (I)', "Tingwell, Charles 'Bud'", 'Walton, Laurie (I)', 'Waters, John (III)', 'West, Barbara', 'Wilson, Frank (II)', 'Woodward, Edward']
In movie: 'Crocodile' Dundee II (1988) actors are:
['Alexander, Jace', 'Ali, Tatyana', 'Andrews, Jose', 'Arriaga, Luis', 'Arvanites, Steven', 'Asai, Hisayo',

To support the API "Did actor <tt>a</tt> perform in movie <tt>m</tt>?" we extend our previous implementation of IMDBSearch: we call <tt>performers(m)</tt> to retrieve the actor list, then perform a binary search on the alphabetically sorted list.
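
As a point of reference, the membership test on a sorted list can also be sketched with the standard library's `bisect` module instead of a hand-written binary search. The function name `contains_sorted` is hypothetical, and the short list of names is sample data chosen for illustration.

```python
import bisect


def contains_sorted(sorted_list, item):
    """Return True if item occurs in sorted_list (which must be sorted)."""
    # bisect_left returns the leftmost position where item could be inserted
    i = bisect.bisect_left(sorted_list, item)
    # item is present only if that position is in range and holds item
    return i < len(sorted_list) and sorted_list[i] == item


names = ["Ball, Ray (I)", "Ball, Vincent (I)", "Woodward, Edward"]
print(contains_sorted(names, "Ball, Vincent (I)"))  # True
print(contains_sorted(names, "Licia Capra"))        # False
```

Note the bounds check `i < len(sorted_list)`: an item greater than every element yields an insertion position one past the last valid index, which must not be dereferenced.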


In [7]:
class IMDBSearch:
    def __init__(self):
        # process the imdb file
        with open('3-imdbtest.txt','r') as f:
            line = f.readline()
            movie = line.split("/")
            title = movie[0]
            actors = []
            for i in range(1, len(movie)):
                actors.append(movie[i])
            self.movieST = BSTNode(title, actors)
            self.sizeMovies = 1
               
            line = f.readline()
    
            while line != '':
                movie = line.split("/")
                title = movie[0]
                actors = []
                for i in range(1, len(movie)):
                    actors.append(movie[i])
                self.movieST.put(title, actors)
                self.sizeMovies = self.sizeMovies + 1
                
                line = f.readline()
                
        # we now sort the actors in alphabetical order
        # we use Python's sorted() but could equally have used one of the sorting algorithms
        # we previously implemented
        allTitles = self.movieST.keys()
        # iterate over the stored titles directly (safe even if a title repeats in the file)
        for title in allTitles:
            sortedList = sorted(self.movieST.get(title))
            self.movieST.put(title, sortedList)

    # the following method satisfies both "list performers" and "list performers alphabetically" 
    def performers(self, movie):
        return self.movieST.get(movie)     

    
    # new support function
    def binSearch(self, aList, item, lo, hi):    
        if lo > hi:
            return False

        mid = (lo + hi) // 2
        if item == aList[mid]:
            return True

        if item < aList[mid]:
            return self.binSearch(aList, item, lo, mid-1)
        else:
            return self.binSearch(aList, item, mid+1, hi)
    
    # did actor a perform in movie m?
    def hasPerformed(self, movie, actor):
        movieList = self.performers(movie)
        if movieList is None:
            return False
        else:
            # hi must be the last valid index, hence len(movieList) - 1
            return self.binSearch(movieList, actor, 0, len(movieList) - 1)


In [8]:
# driver code, as before we process first 5 movies only
testMovies = []

with open('3-imdbtest.txt','r') as f:
    for i in range(5):
        line = f.readline()
        movie = line.split("/")
        testMovies.append(movie[0])

myIMDBclient = IMDBSearch()
p = myIMDBclient.performers(testMovies[0])
print("In movie:", testMovies[0], "actors are:")
print(p)

b = myIMDBclient.hasPerformed(testMovies[0], p[0])
print("Has actor", p[0], "performed in movie", testMovies[0], "?", b)
b = myIMDBclient.hasPerformed(testMovies[0], p[1])
print("Has actor", p[1], "performed in movie", testMovies[0], "?", b)
b = myIMDBclient.hasPerformed(testMovies[0], "Licia Capra")
print("Has actor", "Licia Capra", "performed in movie", testMovies[0], "?", b)



In movie: 'Breaker' Morant (1980) actors are:
['Ball, Ray (I)', 'Ball, Vincent (I)', 'Bell, Wayne (I)', 'Bernard, Hank', 'Brown, Bryan (I)', 'Cassell, Alan (I)', 'Cisse, Halifa', 'Cornish, Bridget', 'Currer, Norman', 'Dick, Judy', 'Donovan, Terence (I)', 'Erskine, Ria\n', 'Fitz-Gerald, Lewis', 'Gray, Ian (I)', 'Haywood, Chris (I)', 'Henderson, Dick (II)', 'Horseman, Sylvia', 'Kiefel, Russell', 'Knez, Bruno', 'Lovett, Alan', 'Mann, Trevor (I)', 'Meagher, Ray', 'Mullinar, Rod', 'Nicholls, Jon', 'Osborn, Peter', 'Peterson, Ron', 'Pfitzner, John', 'Procanin, Michael', 'Quin, Don', 'Radford, Elspeth', 'Reed, Maria', 'Rodger, Ron', 'Seidel, Nellie', 'Smith, Chris (I)', 'Steele, Rob (I)', 'Thompson, Jack (I)', "Tingwell, Charles 'Bud'", 'Walton, Laurie (I)', 'Waters, John (III)', 'West, Barbara', 'Wilson, Frank (II)', 'Woodward, Edward']
Has actor Ball, Ray (I) performed in movie 'Breaker' Morant (1980) ? True
Has actor Ball, Vincent (I) performed in movie 'Breaker' Morant (1980) ? True
Has a

To support the remainder of the API, we could call <tt>hasPerformed()</tt> on all movies stored in IMDB, as shown below.


In [9]:
class IMDBSearch:
    def __init__(self):
        # process the imdb file
        with open('3-imdbtest.txt','r') as f:
            line = f.readline()
            movie = line.split("/")
            title = movie[0]
            actors = []
            for i in range(1, len(movie)):
                actors.append(movie[i])
            self.movieST = BSTNode(title, actors)
            self.sizeMovies = 1
               
            line = f.readline()
    
            while line != '':
                movie = line.split("/")
                title = movie[0]
                actors = []
                for i in range(1, len(movie)):
                    actors.append(movie[i])
                self.movieST.put(title, actors)
                self.sizeMovies = self.sizeMovies + 1
                
                line = f.readline()
                
        # we now sort the actors in alphabetical order
        # we use Python's sorted() but could equally have used one of the sorting algorithms
        # we previously implemented
        allTitles = self.movieST.keys()
        # iterate over the stored titles directly (safe even if a title repeats in the file)
        for title in allTitles:
            sortedList = sorted(self.movieST.get(title))
            self.movieST.put(title, sortedList)

    # the following method satisfies both "list performers" and "list performers alphabetically"
    def performers(self, movie):
        return self.movieST.get(movie)     

    
    # support function
    def binSearch(self, aList, item, lo, hi):    
        if lo > hi:
            return False

        mid = (lo + hi) // 2
        if item == aList[mid]:
            return True

        if item < aList[mid]:
            return self.binSearch(aList, item, lo, mid-1)
        else:
            return self.binSearch(aList, item, mid+1, hi)
    
    # Did actor a perform in movie m?
    def hasPerformed(self, movie, actor):
        movieList = self.performers(movie)
        if movieList is None:
            return False
        else:
            # hi must be the last valid index, hence len(movieList) - 1
            return self.binSearch(movieList, actor, 0, len(movieList) - 1)
        
        
    # NEW: In how many movies did actor a perform?
    def howManyPerformed(self, actor):
        allTitles = self.movieST.keys()
        performedMovies = 0
        for i in range(self.sizeMovies):
            actorList = self.movieST.get(allTitles[i])
            found = self.binSearch(actorList, actor, 0, len(actorList) - 1)
            if found:
                performedMovies = performedMovies + 1
        return performedMovies

    
    # NEW: Return all movies in which actor a performed 
    def performed(self, actor):
        allTitles = self.movieST.keys()
        performedMovies = []
        for i in range(self.sizeMovies):
            actorList = self.movieST.get(allTitles[i])
            found = self.binSearch(actorList, actor, 0, len(actorList) - 1)
            if found:
                performedMovies.append(allTitles[i])
        return performedMovies
                

In [15]:
# driver code
testMovies = []

with open('3-imdbtest.txt','r') as f:
    for i in range(5):
        line = f.readline()
        movie = line.split("/")
        testMovies.append(movie[0])

myIMDBclient = IMDBSearch()

p = myIMDBclient.performers(testMovies[2])
print("In movie:", testMovies[2], "actors are:")
print(p)

b = myIMDBclient.hasPerformed(testMovies[2], p[0])
print("Has actor", p[0], "performed in movie", testMovies[2], "?", b)

c = myIMDBclient.howManyPerformed(p[0])
print("In how many movies has actor", p[0], "performed?", c)

l = myIMDBclient.performed(p[0])
print("In what movies has actor", p[0], "performed?", l)


In movie: 'Crocodile' Dundee II (1988) actors are:
['Alexander, Jace', 'Ali, Tatyana', 'Andrews, Jose', 'Arriaga, Luis', 'Arvanites, Steven', 'Asai, Hisayo', 'Batten, Tom (I)', 'Blinco, Maggie', 'Bobbit, Betty', 'Boutsikaris, Dennis', 'Carrasco, Carlos', 'Castle, Angela', 'Cerullo, Al', 'Cooper, Jim (I)', 'Cooper, Sam (I)', 'Cox, Hannah', 'Creighton, Rhett', 'Crittenden, Dianne\n', 'Crivello, Anthony (I)', 'Dingo, Ernie', 'Dutton, Charles S.', 'Essman, Susie', 'Fernández, Juan (I)', 'Folger, Mark', 'Guzmán, Luis (I)', 'Hogan, Paul (I)', 'Holt, Jim (I)', 'Jbara, Gregory', 'Jerosa, Vincent', 'Kozlowski, Linda', 'Krivak, Bryan', 'Lane, Rita', 'Maldonado, Edwin', 'Meillon, John', 'Mercurio, Gus', 'Quinn, Colin', 'Rackman, Steve', 'Ramsey, John (I)', 'Rios, Julio', 'Rockafellow, Stacey', 'Rogers, Maria Antoinette', 'Root, Stephen (I)', 'Ruiz, Anthony', 'Sandy, Bill', 'Saunders, Mark (I)', 'Segura, Fernando', 'Serbagi, Roger', 'Shams, Homay', 'Skilton, Gerry', 'Skinner, Doug', 'Sokol, Marily

What is the run-time cost of the above? Consider extending the IMDB data structure to maintain a second symbol table, with actor names as keys and the list of movies they performed in as values. Consider the pros and cons of the two approaches in terms of time and space complexity.
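
One possible shape for the second symbol table is sketched below, using a plain `dict` for brevity; a BST keyed on actor names would work the same way. The function name `build_actor_index` and the sample movies are hypothetical and used only for illustration.

```python
def build_actor_index(movies):
    """Invert a title -> actors mapping into an actor -> sorted titles mapping."""
    actor_index = {}
    for title, actors in movies.items():
        for actor in actors:
            # append this title to the actor's list, creating it if needed
            actor_index.setdefault(actor, []).append(title)
    for titles in actor_index.values():
        titles.sort()
    return actor_index


# made-up sample data (hypothetical)
movies = {
    "Movie B": ["Actor X", "Actor Y"],
    "Movie A": ["Actor X"],
}
index = build_actor_index(movies)
print(index["Actor X"])               # ['Movie A', 'Movie B']
print(len(index.get("Actor Z", [])))  # 0
```

As a rough guide, with M movies and at most A actors per movie, the scanning approach performs M symbol-table lookups and M binary searches of cost O(log A) for every query. With the index, a query is a single lookup, at the price of extra space proportional to the total number of (movie, actor) pairs and a one-off construction pass over all cast lists.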