In [1]:
import csv
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize, regexp_tokenize
import pandas as pd
from collections import Counter
import numpy as np
import heapq
import ast
import re
import math

# functions is the file with our reading vocabulary, inv_indx and inv_indx_tfidf
import functions

These are all the paths used in the notebook.

In [2]:
path_all_files = r'C:\Users\alice\Desktop\ADM_HW3\file_unique'
path_all_files_doc = r'C:\Users\alice\Desktop\ADM_HW3\file_unique\doc_'
path_vocabulary = r'C:\Users\alice\Desktop\ADM_HW3\vocabulary.txt'
path_inverted_indx = r'C:\Users\alice\Desktop\ADM_HW3\inverted_indx.txt'
path_inv_indx_tfid = r'C:\Users\alice\Desktop\ADM_HW3\inverted_indx_tfid.txt'

# Search Engine 1

The **preprocess** function normalizes a text and returns its list of tokens. It works by:
* converting to lower case
* removing '\n'
* removing punctuation
* filtering out the stopwords
* stemming each remaining word

In [3]:
def preprocess(text):
    # converting to lower case
    text = text.lower()
    # replacing the literal two-character sequence '\n' with a space
    text = text.replace('\\n', ' ')
    # removing punctuation: keep only runs of word characters and '$'
    tokens = regexp_tokenize(text, r"[\w\$]+")
    # filtering out the stopwords (a set makes each lookup fast)
    stop_words = set(stopwords.words('english'))
    filtered = [w for w in tokens if w not in stop_words]
    # stemming each remaining word
    ps = PorterStemmer()
    filtered = [ps.stem(word) for word in filtered]
    # it returns the list of stemmed tokens
    return filtered
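
To make the steps concrete, here is a minimal sketch that mirrors `preprocess` using only the standard library. The tiny `TOY_STOPWORDS` set and the name `toy_preprocess` are assumptions made for illustration, and stemming is left out, so the tokens are not identical to what NLTK's `PorterStemmer` would produce.

```python
import re

# Hypothetical, deliberately tiny stopword set (NLTK's English list is much longer).
TOY_STOPWORDS = {"a", "with", "the", "is", "to", "from"}

def toy_preprocess(text):
    """Lower-case, drop literal '\\n' sequences, tokenize and remove stopwords."""
    text = text.lower().replace('\\n', ' ')
    # Same pattern as in preprocess: runs of word characters or '$'.
    tokens = re.findall(r"[\w$]+", text)
    return [t for t in tokens if t not in TOY_STOPWORDS]

print(toy_preprocess("A beautiful house\\nwith beach!"))  # ['beautiful', 'house', 'beach']
```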

The **findTheBestDocuments** function searches for the documents that contain all the words of a query.

In [4]:
def findTheBestDocuments(docs_list, pre_query):
    # docs_list holds one list of doc_id's per query word
    # doc_counts stores, for each doc_id, in how many of those lists it appears
    doc_counts = {}
    for sublist in docs_list:
        for doc_id in sublist:
            # start from 0 if the doc_id has not been seen yet
            doc_counts[doc_id] = doc_counts.get(doc_id, 0) + 1
    # it returns only the documents which contain all the query words
    return [doc_id for doc_id in doc_counts if doc_counts[doc_id] == len(pre_query)]
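
The same conjunctive (AND) filter can also be expressed as a set intersection. The sketch below is an alternative formulation rather than the notebook's implementation; `docs_with_all_terms` and the toy posting lists are hypothetical.

```python
def docs_with_all_terms(posting_lists):
    """Return the sorted doc_id's that appear in every posting list."""
    if not posting_lists:
        return []
    common = set(posting_lists[0])
    for postings in posting_lists[1:]:
        common &= set(postings)
    return sorted(common)

# One toy posting list per query word.
toy_lists = [[1, 2, 5, 7], [2, 3, 5], [5, 2, 9]]
print(docs_with_all_terms(toy_lists))  # [2, 5]
```

Unlike the counting version, this one does not depend on the number of query words, only on the posting lists it is given.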

**SearchEngine** is a function with the arguments:
* query - the input text from the user
* vocabulary - the vocabulary dictionary (saved in the 'vocabulary.txt' file)
* inv_indx - the inverted index dictionary (saved in the 'inverted_indx.txt' file)

The **output** is a list of the doc_id's of the documents that contain all the query words.

In [5]:
def SearchEngine(query, vocabulary, inv_indx):
    pre_query = preprocess(query)
    word_list = []
    for item in pre_query:
        if item not in vocabulary:
            # a query word that is not in the vocabulary cannot match any document
            print('No documents found')
            return []
        word_id = vocabulary[item]
        word_list.append(word_id)
    # word_list contains the id's of the words according to the vocabulary file
    
    result_list = []
    for term_id in word_list:
        result_list.append(inv_indx[term_id])
    # result_list contains, for each query word, the list of id's of the documents which contain it
    
    best_docs = findTheBestDocuments(result_list, pre_query)
    # best_docs is the list of the id's of documents which contain all the words of the query
    
    return best_docs

The function that opens and reads *vocabulary.txt* is in the **functions.py** script. We call it and save the vocabulary as a dictionary.

In [6]:
vocabulary = functions.read_vocabulary('vocabulary.txt')

We call the function that opens *inverted_indx.txt* and save the inv_indx dictionary.

In [7]:
inv_indx = functions.read_inv_indx('inverted_indx.txt')
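
The format of the two files is handled by `functions.py` and is not shown here. Judging from how the dictionaries are used in `SearchEngine`, `vocabulary` maps each stemmed word to a term id, and `inv_indx` maps each term id to the list of doc_id's containing that word. Here is a minimal sketch of building such structures from already preprocessed toy documents; `build_index` and the toy data are assumptions.

```python
def build_index(tokenized_docs):
    """Build (vocabulary, inverted index) from {doc_id: [token, ...]}."""
    vocabulary = {}   # token -> term_id
    inv_indx = {}     # term_id -> list of doc_id's containing the token
    for doc_id in sorted(tokenized_docs):
        for token in set(tokenized_docs[doc_id]):
            term_id = vocabulary.setdefault(token, len(vocabulary))
            inv_indx.setdefault(term_id, []).append(doc_id)
    return vocabulary, inv_indx

# Toy documents, already lower-cased and stemmed.
toy_docs = {
    0: ["beauti", "hous", "beach"],
    1: ["hous", "lake"],
    2: ["beach", "hous", "pool"],
}
toy_vocabulary, toy_inv_indx = build_index(toy_docs)
print(toy_inv_indx[toy_vocabulary["hous"]])   # [0, 1, 2]
print(toy_inv_indx[toy_vocabulary["beach"]])  # [0, 2]
```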

**Test of the SearchEngine:**

Take the user query as input.

In [11]:
query = input()

a beautiful house with beach


In [12]:
test = SearchEngine(query, vocabulary, inv_indx)

We use *pandas* to show the best documents for the user query: we obtain a table in which every row is a document with all the information needed.

In [15]:
docs_files = []
for i in test:
    docs_files.append(pd.read_csv(path_all_files_doc + str(i) + '.tsv', sep = '\t'))
docs_list = [[row for row in doc_i] for doc_i in docs_files]
cols = ['1', '2', 'City', '4', 'Description', '6', '7', 'Title', 'Url']
pd.set_option('max_colwidth', 500)
df = pd.DataFrame(docs_list, columns = cols)[['Title', 'Description', 'City', 'Url']]
df.head()

Unnamed: 0,Title,Description,City,Url
0,Bethel Blue,Bethel Blue is a cozy 2 bedroom 1.5 bath house about 5 blocks from the beautiful beach. It is decorated nicely and equipped with everything your family needs for a wonderful beach getaway. Military and Senior citizens receive a 10% discount!,Bolivar Peninsula,https://www.airbnb.com/rooms/12253684?location=Bolivar%20Peninsula%2C%20TX\r\n
1,Monroe Manor,Come relax on the water in our beautiful beach house style river home. You'll love our shiplap walls but the view is even better!,Houston,https://www.airbnb.com/rooms/19273801?location=Channelview%2C%20TX\r\n
2,MarBella House Near the Galveston Beach,Location Location Location - Large 3 bdrm house can walk 2 blocks to the beach - or 1/4 mile to pleasure pier - or several area restaurants to walk to. Rehabilitated beautiful 100 year old home. Lots of parking space - large backyard deck with BBQ grill for the family hangout. Central A/C units separated up and down for your comfort.\n\nDon't be fooled by the cottage or bungalow - that means small.... This is a 2000 square foot large home all to yourselves with plenty of parking.,Galveston,https://www.airbnb.com/rooms/7459305?location=Bayou%20Vista%2C%20TX\r\n
3,Beautiful 4 Bedroom Beach House,"Feel the sun, sand & water on your feet at rock bottom prices!.\n\nA stone's throw away from the beach, experience the Gulf Of Mexico like never before in a beautiful condo that is so big and luxurious you will be amazed. \n\n",South Padre Island,https://www.airbnb.com/rooms/2846134?location=Brownsville%2C%20TX\r\n
4,SPI Golf Course 1st class #4,"The South Padre Island Golf Course offers first class amenities. Premiere golf, tennis courts, heated swimming pools, hiking trails, a club house with an on-site restaurant & bar, 24 hour gated security.\n\nLocated within close driving distance, 8 or 10 miles, you are just 15 minutes away from the entertainment, shopping and restaurants in Port Isabel and on South Padre Island. Dine out every night or if you want, cook in. You can enjoy a kitchen that is fully equipped or you can cook ou...",Laguna Vista,https://www.airbnb.com/rooms/924616?location=Brownsville%2C%20TX\r\n


# Search Engine 2

We call the function that opens 'inverted_indx_tfid.txt' and save the inv_indx_tfidf dictionary.

In [16]:
inv_indx_tfidf = functions.read_inv_indx_tfidf(file = 'inverted_indx_tfid.txt')
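
The code that builds this index lives in `functions.py` and is not shown. From the way `get_tfidf` reads it, each value is expected to be a list of `(doc_id, tfidf_score)` tuples. The sketch below shows one common way to compute such scores, with the relative term frequency as $tf$ and $idf = \log(N / df)$. The exact weighting used for the notebook's file is not known, so the formula, `build_tfidf_index` and the toy data are all assumptions.

```python
import math
from collections import Counter

def build_tfidf_index(tokenized_docs, vocabulary):
    """Return {term_id: [(doc_id, tfidf), ...]} for {doc_id: [token, ...]}."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for toks in tokenized_docs.values() for tok in set(toks))
    index = {}
    for doc_id in sorted(tokenized_docs):
        tokens = tokenized_docs[doc_id]
        for tok, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[tok])
            index.setdefault(vocabulary[tok], []).append((doc_id, round(tf * idf, 4)))
    return index

toy_docs = {0: ["beach", "hous", "beach"], 1: ["hous", "lake"]}
toy_vocabulary = {"beach": 0, "hous": 1, "lake": 2}
toy_index = build_tfidf_index(toy_docs, toy_vocabulary)
print(toy_index[0])  # 'beach' occurs only in document 0
print(toy_index[1])  # 'hous' occurs in every document, so its idf is 0
```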

We use the **numpy** library to compute the cosine distance between two vectors, defined as 1 minus their cosine similarity.

In [17]:
def cosine_dist(x, y):
    dot_product = np.dot(x, y)
    norm_x = np.linalg.norm(x)
    norm_y = np.linalg.norm(y)
    return 1 - (dot_product / (norm_x * norm_y))
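
As a quick sanity check of the definition $cos\_dist = 1 - \frac{x \cdot y}{\|x\| \|y\|}$, here is a standard-library sketch equivalent to `cosine_dist`. The name `cosine_dist_py` and the choice to return the maximum distance of 1 for a zero vector are assumptions; the formula itself is undefined in that case because of the division by zero.

```python
import math

def cosine_dist_py(x, y):
    """Cosine distance 1 - cos(x, y) for two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1 - dot / (norm_x * norm_y)

print(cosine_dist_py([1, 0], [0, 1]))  # orthogonal vectors: 1.0
print(cosine_dist_py([1, 2], [2, 4]))  # parallel vectors: very close to 0
```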

The function **query_tfidf** builds the query vector: for each word in the preprocessed query it returns the number of times that word occurs in the query. This raw term count is used here in place of a full TFIDF score:

In [18]:
def query_tfidf(prep_query_list):
    return [prep_query_list.count(word) for word in prep_query_list]

The function **get_tfidf** returns the TFIDF score for a word in the (doc_id) document.

ARGS:
* arg_list - the value (a list of tuples) for a given word (the key of the dict) in the 'inv_indx_tfidf' dictionary
* doc_id - the id of the document for which the function returns the TFIDF score

In [19]:
def get_tfidf(arg_list, doc_id):
    for tuple_ in arg_list:
        # tuple format: (doc_id, tfidf_score)
        if tuple_[0] == doc_id:
            return tuple_[1]

The function **get_top_n** returns the n best documents sorted by cosine similarity. We use a **heap** structure: we work on the **cos_dist** values and take the n smallest ones, which correspond to the n biggest **cos_similarity** values, since $cos\_similarity = 1 - cos\_dist$.

In [20]:
def get_top_n(n, score_doc_list):
    heap = []
    for tup in score_doc_list:
        # pushing each (score, doc_id) tuple onto the heap
        heapq.heappush(heap, tup)
    # the n tuples with the smallest score, in ascending order
    return heapq.nsmallest(n, heap)
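
A small usage sketch with made-up `(cos_dist, doc_id)` tuples shows why taking the n smallest distances yields the n most similar documents. The tuples are hypothetical. Note that tuples are compared element by element, so ties on the distance are broken by the smaller doc_id.

```python
import heapq

# Hypothetical (cos_dist, doc_id) tuples.
toy_scores = [(0.30, 12), (0.05, 7), (0.20, 3), (0.05, 4)]

top_2 = heapq.nsmallest(2, toy_scores)
print(top_2)  # [(0.05, 4), (0.05, 7)]

# Turning distances back into similarities for display.
similarities = [(round(1 - dist, 6), doc_id) for dist, doc_id in top_2]
print(similarities)  # [(0.95, 4), (0.95, 7)]
```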

**SearchEngine_new** is a function with the arguments:

* query - the input text from the user
* vocabulary - the vocabulary dictionary (saved in the 'vocabulary.txt' file)
* inv_indx - the inverted index dictionary (saved in the 'inverted_indx.txt' file)
* inv_indx_tfidf - the inverted index with the TFIDF scores (saved in the 'inverted_indx_tfid.txt' file)
* n - the number of documents we want returned

The output is a list of (cosine distance, doc_id) tuples for the n best documents, sorted by increasing distance.

In [21]:
def SearchEngine_new(query, vocabulary, inv_indx, inv_indx_tfidf, n = 20):
    query_list = preprocess(query)
    word_list = []
    for item in query_list:
        if item not in vocabulary:
            # a query word that is not in the vocabulary cannot match any document
            print('No documents found')
            return []
        word_id = vocabulary[item]
        word_list.append(word_id)
        
    result_list = []
    for term_id in word_list:
        result_list.append(inv_indx[term_id])
    selected_docs = findTheBestDocuments(result_list, query_list)
    
    # the query vector does not depend on the document, so we build it once
    query_vector = query_tfidf(word_list)
    
    # calculating the cosine distances
    cos_dist_list = []
    for doc_id in selected_docs:
        # creating the TFIDF vector for a document
        tfid_vector = []
        for word in word_list: 
            g = get_tfidf(inv_indx_tfidf[word], doc_id)
            tfid_vector.append(g)    
        # creating the list of (cos_dist, doc_id) tuples
        cos_dist_list.append((round(cosine_dist(query_vector, tfid_vector), 6), doc_id))
    
    # selecting the n best documents (an empty list if no document matched)
    return get_top_n(n, cos_dist_list)
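
To illustrate the ranking step in isolation, the sketch below scores two toy documents against a two-word query using only the standard library. All names and numbers are hypothetical. As in `SearchEngine_new`, each document vector is restricted to the query terms. One consequence is that a single-word query gives every matching document a distance of 0, since two one-dimensional positive vectors always point in the same direction.

```python
import heapq
import math

def cos_dist(x, y):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1 - dot / (math.hypot(*x) * math.hypot(*y))

# Hypothetical TFIDF index: term_id -> [(doc_id, tfidf_score), ...]
toy_tfidf = {0: [(10, 0.4), (11, 0.1)], 1: [(10, 0.4), (11, 0.9)]}
query_terms = [0, 1]     # term ids of the preprocessed query
query_vector = [1, 1]    # each term occurs once in the query

scored = []
for doc_id in (10, 11):  # documents containing all the query terms
    doc_vector = [dict(toy_tfidf[t])[doc_id] for t in query_terms]
    scored.append((round(cos_dist(query_vector, doc_vector), 6), doc_id))

ranking = heapq.nsmallest(2, scored)
print(ranking)  # document 10 comes first: its vector is parallel to the query
```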

**Test of the SearchEngine_new:**

In [22]:
query = input()

a beautiful house with beach


In [23]:
top_n_list = SearchEngine_new(query, vocabulary, inv_indx, inv_indx_tfidf)

This function creates the table with all the information from the selected files, sorted by the **cosine similarity**:

In [24]:
def display_df_with_cos(top_n_list, path_all_files_doc):
    # converting each cosine distance k into the similarity 1-k
    cos_dist = [(1-k, v) for k, v in top_n_list]
    cos_dataframe = pd.DataFrame(cos_dist)
    cos_dataframe.columns = ['Similarity', 'Doc_id']
    cos_dataframe_values = [v for k, v in cos_dist]
    docs_files = []
    for i in cos_dataframe_values:
        docs_files.append(pd.read_csv(path_all_files_doc + str(i) + '.tsv', sep = '\t'))
    docs_list = [[row for row in doc_i] for doc_i in docs_files]
    cols = ['1', '2', 'City', '4','Description', '6', '7', 'Title', 'Url']

    z = pd.DataFrame(docs_list, columns = cols)[['Title', 'Description', 'City', 'Url']]
    df = pd.concat([z, cos_dataframe], axis = 1)
    
    # we don't want to show the 'Doc_id' column
    df.drop(columns = ['Doc_id'], inplace = True)
    
    return df.style.set_table_styles([{'selector': '.row_heading, .blank', 'props': [('display', 'none;')]}])

In [33]:
display_df_with_cos(top_n_list, path_all_files_doc)

Unnamed: 0,Title,Description,City,Url,Similarity
0,Beautiful Creekside Country Home,"Edelweiss is a lovely country home on the banks of beautiful Smith Creek, yet only minutes from the fabulous shopping and dining of downtown Wimberley. This spacious house is perfect for family and friend gatherings. Take a stroll through the 12 acres of creek side nature (2 private, 10 shared with the owners), watch the deer graze as you enjoy your morning coffee, strum your guitar beside the fire pit, challenge your friends to a game of pool, watch your kiddos play in the tree house, have a family bbq on the gorgeous back deck, snuggle up by the fireplace with a cup of hot cocoa, and feel the stress of your daily life melt away.\n\nThe name Edelweiss belongs to a delicate white flower that grows in the Alps, made famous in The Sound of Music. From the morning songs of the sweet, chirping birds to the evening orchestra courtesy of the frogs and toads along the creek, we like to say that the Hill Country is alive with the sound of music! Here you get the privacy and serenity of a country home with all the amenities of modern life, including wifi. During the house remodel the kitchen was built from scratch, and some of the furniture was hand made by the owner, a professional woodworker. The house has two and a half baths, three bedrooms (king, queen, and three twin beds), plus a large game room with two queen futons, games for kids of all ages, and a pool table. Add that to the spacious living room with a fold-out couch, dining room with a table for 8, and the large deck with a beautiful hand-made picnic table, and you’ve got the perfect space for your family and friend retreat! At Edelweiss the owners are environmentally conscious. They offer full recycling, use environmentally friendly dish soap, provide recycled paper products, offer a reverse-osmosis filtration system to eliminate the need for bottled water, and landscape minimally with native low-water plants. 
\n\nIf you plan to visit in the summer with children, you might also be interested in enrolling in one of the owner’s fun art classes next door. Jennifer, a TX certified art teacher, offers the best creative summer camps for small, peaceful groups in her creek side studio. Check out her offerings for kids ages 6+ at Agua Fresca Studios. \n\nWe are confident that Edelweiss will exceed your expectations. Just read some of the comments from the previous guests. The owners really go out of their way to add those special touches to make your trip complete, such as keeping the kitchen stocked with condiments and basics that you may have forgotten. When their chickens can keep up you’ll even find fresh, organic eggs in the fridge. You won’t want to leave Edelweiss, but you can always come back for another visit!\n\n\nPet policy:\nPets allowed at $50 per pet and we ask that you leave cash or check for the pet fee with the keys upon checkout. Pets must be declared to the owners in advance, preferably at the time of booking. We ask that if dogs are left unattended at the home that they be crated and that they remain on leash when outside the fenced yard if they might be inclined to chase a cat, chicken or child. Thanks!\n\nThis home sits on two private acres and we allow our guests as much privacy as possible. We also invite guests to explore our 10 shared acres next door, and are happy to strike up a conversation in passing. We love meeting new people!\n\nThere is a lovely creek view from the backyard, but the bank is steeper here and access down to the water is not super easy. There are multiple easy creek access points on the shared 10 acres, though. While our creek is a year 'round creek, our extended Texas drought has taken its toll on area springs and water levels can fluctuate during hot, dry spells.",Wimberley,https://www.airbnb.com/rooms/434915?location=Canyon%20Lake%2C%20TX,0.998618
1,Canyon Lake Hideaway- 3 Relaxing Acres next to Potter's Creek Park!,"CANYON LAKE HIDEAWAY - a SkyRun Texas Property \n\n\nNestled on the north side of Canyon Lake with easy access to Potter's Creek Park Canyon Lake Hideaway is a great place for your next lake getaway.\n\nBuilt in 2013 the Hideaway boasts three bedrooms and three acres of relaxation, Hill Country Style. Indoor-outdoor living is the name of the game at the Hideaway. Relax inside with friends and family in the charming and well-appointed common spaces or step outside and enjoy the Hill Country on over 800 square feet of deck space.\n\nThe open and spacious kitchen/dining and living area with tall, vaulted ceilings is perfect for gathering, visiting, and taking it easy. Enjoy the open layout and joy of cooking a meal while catching up with friends in the adjoining dining and living areas. The kitchen is fully stocked with all the essentials needed to make that perfect vacation feast, and the living room has an HDTV with cable and a unique electric fireplace to help put you in vacation mode from day one.\n\nThe master suite is comprised of a queen bed and a private master bathroom. So if you're traveling with couples be sure to be the one who does the booking! The second bedroom also hosts a queen-sized bed. The third bedroom is perfect for the kids as it houses a bunk bed and a daybed with a trundle for a total of four twin beds.\n\nA Hill Country Getaway isn't complete without some time spent outside. Don't miss the large outdoor deck with comfortable seating for all, a charcoal grill, and a beverage cooler. It's the perfect spot to enjoy the nearly three acres of peace and quiet that's yours for the stay ensuring that only the friendly deer are likely to interrupt your morning cup of coffee or evening gahtering after a day on Canyon Lake. \n\nPotter's Creek Park is just down the road with great lake swimming areas, boat ramp access, and much more. 
Come on out and enjoy Canyon Lake Hideaway.\n\nYou must be 25 years or older to rent this property. Maximum occupancy at all times for this home is 8 guests. We manage several nearby rental properties and therefore often have staff in the area. Please come and have a good time, but be aware that behavior that is disrespectful or disruptive to surrounding neighbors will result in you forfeiting the remainder of your stay. \n\nNo smoking or pets are allowed on this property.There is a limit of 3 vehicles unless otherwise approved.Enjoy your stay at beautiful Canyon Lake!\n\nWORD Permit L1311",Canyon Lake,https://www.airbnb.com/rooms/16313157?location=Canyon%20Lake%2C%20TX,0.998274
2,*REDUCED* F1 RENTAL 1.5 MI TO TRACK,"BEAUTIFUL 3 bedroom 2 bathroom home is just 1.5 miles from the Circuit of Americas track. Our home is in the perfect location for you to enjoy the events of Formula 1 without the hassel of finding transportation or having to spend hours in traffic. Comfortably sleeps 6 (1 king, 1 queen, 2 twins) but can accommodate 10 with futon, couch, & blow-up mattress. House comes fully furnished with Wi-Fi, cable/surround sound, stocked food and beverages, brand new linens throughout, & gaming consoles (Playstation 3 & Wii). **Special requests can be made at an additional charge that is negotiable between host and renter.**\r\n\r\nThis home is conveniently located just 20 minutes from Downtown Austin and less than 10 minutes from Austin-Bergstrom International Airport. Downtown Austin boasts an array of restaurants, bars, clubs, & outdoor attractions fun for all ages. Adults can enjoy Austin's popular nightlife in the Warehouse District and the world famous 6th Street.\r\n\r\n**Renters will be provided a full list of amenities, attractions, local hot spots, & more useful information to make staying in Austin an easy, fun, & memorable experience!**\r\n\r\nMORE PICTURES TO COME: MASTER BEDROOM & 3RD BEDROOM",Austin,https://www.airbnb.com/rooms/674706?location=Cedar%20Creek%2C%20TX,0.997459
3,"Park Manor Guest House, lovely!","Beautiful country retreat located on historical Independence Trail! Charming shops, superb restaurants, antiques fair in Round Top, and home to Blue Bell Ice Cream! 2 bedroom (both with antique double/full beds), fully self contained cottage, complete kitchen and bath. Decorated in antiques and front porch with rockers to enjoy the fresh air and gorgeous sunsets. Washington on the Brazos State Park close by and just 30 minutes to A&M University campus! Perfect as a little getaway!",Brenham,https://www.airbnb.com/rooms/18657931?location=Chappell%20Hill%2C%20TX,0.997268
4,Spacious Living Near Lake Houston,"My house is in a Master-Planned community w/lots of green space, parks, playgrounds, jogging trail & ponds. The living space is modern, spacious, w/an open floor plan, beautifully furnished w/1 king sized bed & 3 queen sized beds. Close proximity to city attractions including 25 mins from Downtown Houston, 10 mins from Bush intercontinental airport & many energy companies along beltway 8. There are several attractions; restaurants, movie theater, bowling alley, shopping center, & grocery stores.",Humble,https://www.airbnb.com/rooms/19196476?location=Cleveland%2C%20TX,0.997268
5,The Lake House at Knob Hill,"Beautiful quiet lakefront home away from home....with lake views from every room. Bring your kids and grandkids to make many great memories. Relax or dine on the 80ft covered porch, or take the kayaks, canoe or paddle boat and drop a line or have a leisure float in the serene cove. Fully equipped kitchen, just bring your groceries...or choose from many restaurants close by. WiFi, Netflix, movie library for all ages, ping pong, air hockey, karaoke, and board games of all kinds. You will remember this getaway forever!",Little Elm,https://www.airbnb.com/rooms/16160041?location=Corinth%2C%20TX,0.997268
6,"Just outside ATX, 2 beds private rm","You will have one private room with two twin beds upstairs in my beautiful house, access to two sitting areas, two bathrooms, living room w/ Netflix, cool backyard patio/fire pit, kitchen, dining room, wifi, laundry.\n\nPerfect for business or tourism travelers. Close to Dell, Samsung, Apple, etc. An easy drive to downtown Austin. Great for ACL, SXSW, and Formula 1. Close to the Domain, Georgetown, Round Rock (and RR Outlets), Pflugerville, Leander, Cedar Park, Austin, Manor, etc. !",Pflugerville,https://www.airbnb.com/rooms/13462160?location=Coupland%2C%20TX,0.997268
7,Formula 1 Private Getaway-2+ acres!,"Home available to discriminating clientele for F-1 race. \r\n\r\nBeautiful 2/2 on 2.5 acres. Located 15 minutes to track, 20 minutes to downtown.\r\nAirport, F-1 and house ALL east of Austin! \r\n\r\nyou don't ever have to deal with Austin traffic if you choose!\r\n\r\n$4,500.00 for the entire week!\r\n\r\nIncluded: Full amenities, Laundry, Garage, Grill and grilling area,\r\n2 decks, one covered and one with outdoor shower.\r\nA private, wooded getaway of your own to enjoy. \r\n\r\nVERY clean, secure, safe country home with modern features- \r\n\r\nWe also have connections to private transport to racetrack!\r\n\r\n",Elgin,https://www.airbnb.com/rooms/739245?location=Cedar%20Creek%2C%20TX,0.997268
8,"Entire home-Peaceful, relaxing, safe neighborhood","Relax and enjoy yourself in my Texas Hill Country style home located on a safe & quiet street 15-20 mins from downtown, the airport & both Medical centers. There are two spacious decks to enjoy the beautiful outdoor spaces, and guests can use the whole house including the gourmet kitchen, laundry, living room, office with desk, breakfast nook, pool table, two outside dining areas, two indoor dining areas, a hot tub, and more! Stone Oak is one of the nicest and safest suburbs in San Antonio.",San Antonio,https://www.airbnb.com/rooms/5041349?location=Bulverde%2C%20TX,0.997268
9,Granbury lakehouse for Lrg Families w pool/wifi,"Private home on 1/3 acre on quiet canal opens up to the large part of the lake. The perfect place for families & friends to experience the ultimate lake vacation. With 5 bedrooms/4 baths, this spacious house sleeps up to 23. Home is completely equipped with all linens, towels, kitchen wares, etc. Watch sunrises, cook smores, play board games, do puzzles, watch TV and RELAX! Kitchen, living area and gameroom all offer huge windows with beautiful views of the lake. PLEASE NOTE: Families only.",Granbury,https://www.airbnb.com/rooms/16056772?location=Cleburne%2C%20TX,0.997054


# New Scoring Function

Textual query from the user:

In [25]:
query = input()

a beautiful house with beach


We request additional information in order to refine our search:

In [26]:
city_user = input('Please insert the city: ')
beds_user = int(input('Please insert the number of bedrooms: '))
price_user = int(input('Please insert the price: '))

Please insert the city: San Antonio
Please insert the number of bedrooms: 2
Please insert the price: 100


The **scoringFunction** calculates the new score for one document. We give the lowest score to the best 'match' because we use a heap structure with the **nsmallest** function. We know that there is some missing or even wrong information inside our documents, so we use **try** and **except** to handle these cases and give the affected field the worst partial score ($1$).

We give a weight in the range $(0,1)$ to each piece of information we have: $0.65$ to the city_score, $0.25$ to the bed_score and $0.1$ to the price_score. Each partial score is $0$ when the document matches the user's request exactly, for example when the city of the document is the one required by the user, and similarly for the bed_score and price_score. Furthermore, we give an intermediate score in cases where the difference is (in our opinion) not so relevant, for example when the document price exceeds the user's one by at most 10. The total score is the weighted sum of the three partial scores.

In [27]:
def scoringFunction(city_user, beds_user, price_user, document_id):
    with open(path_all_files_doc + str(document_id) + '.tsv', 'r', encoding = 'utf8') as csvfile:
        file1 = csv.reader(csvfile, delimiter = '\t')
        
        # inside the price column we don't need the $ symbol
        remove_dol = re.compile(r'[^\d.,]+')
        
        # we read the documents (formatted by '\t') as columns and select those of the city, price and bedrooms number
        columns = [i for i in file1]
        
        city = columns[0][2].lower()
        if city == city_user: 
            city_score = 0
        else: city_score = 1
            
        try:
            beds = int(columns[0][1])
            if beds == int(beds_user): 
                beds_score = 0
            elif beds < beds_user:
                beds_score = 0.9
            else:
                beds_score = 0.1
        except:
            beds_score = 1
        
        try:
            price = int(remove_dol.sub('', columns[0][0]))
            if price == int(price_user): 
                price_score = 0
            elif price < price_user:
                price_score = 0.1
            elif price - price_user <= 10:
                price_score = 0.3
            else:
                price_score = 1
        except:
            price_score = 1
        
        tot_score = city_score * 0.65 + price_score * 0.1 + beds_score * 0.25
        
        return round(tot_score, 4)
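
As a worked example of the weighting, the sketch below recombines three partial scores exactly as in the last lines of `scoringFunction`. The helper name `weighted_score` is hypothetical.

```python
def weighted_score(city_score, beds_score, price_score):
    """Weighted sum of the partial scores; lower is better."""
    return round(city_score * 0.65 + price_score * 0.1 + beds_score * 0.25, 4)

# Right city, more bedrooms than requested (0.1), price up to 10 above the request (0.3).
print(weighted_score(0, 0.1, 0.3))  # 0.055
# Wrong city but exact bedrooms and price: the city weight alone dominates.
print(weighted_score(1, 0, 0))      # 0.65
# Perfect match.
print(weighted_score(0, 0, 0))      # 0.0
```

With these weights, any document in the requested city scores at most $0.35$, so it always ranks ahead of every document from another city, whose score is at least $0.65$.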

The **sorted_documents** function returns the list of the top n (score, doc_id) tuples, sorted from the lowest score, that is, from the best match.

In [28]:
def sorted_documents(selected_documents, city_user, beds_user, price_user, n):
    sorted_list = []
    for doc_id in selected_documents:
        tot_score = scoringFunction(city_user, beds_user, price_user, doc_id)
        sorted_list.append((tot_score, doc_id))
    top_n_list = get_top_n(n, sorted_list)
    
    return top_n_list

In [29]:
selected_documents = SearchEngine(query, vocabulary, inv_indx)

In [30]:
top_n_list = sorted_documents(selected_documents, city_user.lower(), beds_user, price_user, n = 20)

# we use the heap function 'nsmallest', so we convert each score k to (1-k) in order to have the highest score for the best document
top_n_list = [(1-k, v) for (k,v) in top_n_list]

In [31]:
def display_df_with_rank(top_n_list, path_all_files_doc):
    doc_id_list = [v for k, v in top_n_list]
    docs_files = []
    for i in doc_id_list:
        docs_files.append(pd.read_csv(path_all_files_doc + str(i) + '.tsv', sep = '\t'))
    docs_list = [[row for row in doc_i] for doc_i in docs_files]
    cols = ['1', '2', 'City', '4','Description', '6', '7', 'Title', 'Url']
    pd.set_option('max_colwidth', 500)
    z = pd.DataFrame(docs_list, columns = cols)[['Title', 'Description', 'City', 'Url']]

    # the default index already runs from 0, so index + 1 gives the ranking
    z.insert(0, 'Ranking', z.index + 1)

    return z.style.set_table_styles([{'selector': '.row_heading, .blank', 'props': [('display', 'none;')]}])

In [32]:
display_df_with_rank(top_n_list, path_all_files_doc)

Unnamed: 0,Ranking,Title,Description,City,Url
0,1,"2 STORY SPACIOUS HOUSE, NOT TO FAR FROM DOWNTOWN!","My place is close to Restaurants and shopping centers, the beach (Sylvan Beach) and 20min from downtown houston (depending on traffic). You’ll love my place because of Its a beautiful 2 story house, very spacious, and the neighborhood is friendly.. My place is good for couples, solo adventurers, and business travelers.",La Porte,https://www.airbnb.com/rooms/15875030?location=Beach%20City%2C%20TX
1,2,Private Coastal Retreat,"Guesthouse sits in the back of a two acre beautiful oak tree covered lot, less than 2 miles from water. Includes a private patio with wood or gas grill for outdoor entertainment.\n Seven miles to Port Aransas beach ferry and ten minutes to Rockport shopping and dining. Fishing, duck hunting and bird watching paradise. Hummingbirds average stay 2 weeks. (Mar 10-? / September 10-?). Boat ramps and all amenities needed are 5 minutes from house. Street and alley entrances, plenty of parking space.",Aransas Pass,https://www.airbnb.com/rooms/15141506?location=Corpus%20Christi%2C%20TX
2,3,Beach Front Beauty - House,"This home is a beach front property. This is one of the few residential homes that is a waterfront property on North Beach/Corpus Christi Beach. This is close proximity to the Texas State Aquarium, Lexington Museum, Hurricane Alley Water Park and CC Hooks Baseball Field. There is a walking/jogging trail from outside the home all the way down to the Aquarium, Lexington, gift shops and restaurants. Fishing area is walking distance. Great place to enjoy the sun and the beach !",Corpus Christi,https://www.airbnb.com/rooms/13921867?location=Corpus%20Christi%2C%20TX
3,4,Beautiful Baby Beach House SPI TX.,"2 Bedroom- Nice comfortable spacious condo centrally located walking distance from popular restaurants, shops, bars and beach access. Pool, showers and BarBQ area to host your guest plus more. Join the fun and vacation at South Padre Island!",South Padre Island,https://www.airbnb.com/rooms/6710775?location=Brownsville%2C%20TX
4,5,Bethel Blue,Bethel Blue is a cozy 2 bedroom 1.5 bath house about 5 blocks from the beautiful beach. It is decorated nicely and equipped with everything your family needs for a wonderful beach getaway. Military and Senior citizens receive a 10% discount!,Bolivar Peninsula,https://www.airbnb.com/rooms/12253684?location=Bolivar%20Peninsula%2C%20TX
5,6,SPI Golf Course 1st class #4,"The South Padre Island Golf Course offers first class amenities. Premiere golf, tennis courts, heated swimming pools, hiking trails, a club house with an on-site restaurant & bar, 24 hour gated security.\n\nLocated within close driving distance, 8 or 10 miles, you are just 15 minutes away from the entertainment, shopping and restaurants in Port Isabel and on South Padre Island. Dine out every night or if you want, cook in. You can enjoy a kitchen that is fully equipped or you can cook out on the BBQ grill. Dish washer, also washer and dryer and garage add convenience to make this the perfect location for your stress free South Padre Vacation.\n\nExcursions include the beautiful South Padre Island beach on the Gulf of Mexico. Additionally Mexico is just a few miles away and features fabulous food, shopping and entertainment. There is a zoo in Brownsville, great for kids and adults alike. You may want to indulge in horseback ridding on the beach, the turtle rescue is a site to see, deep sea and bay fishing, the Schlitterbahn water park, dolphin tours, bird watching, sailing, snorkeling, surfing, parasailing, windsurfing and jet skiing. \n\nWith a Queen size bed in the master and two twins in the second bedroom, this two bedroom, two bathroom home located directly on the golf course is perfect for two couples to share. \n\nBeautifully furnished with contemporary furnishings already and Lots of rich color, this home features a full living and dining area as well as a screened patio. Designer touches and with three TV's with Directv to complete the package.\n",Laguna Vista,https://www.airbnb.com/rooms/924616?location=Brownsville%2C%20TX
6,7,SPI Golf Course 1st class #20,"The South Padre Island Golf Course offers first class amenities. Premiere golf, tennis courts, heated swimming pools, hiking trails, a club house with an on-site restaurant & bar, 24 hour gated security.\n\nLocated within close driving distance, 8 or 10 miles, you are just 15 minutes away from the entertainment, shopping and restaurants in Port Isabel and on South Padre Island. Dine out every night or if you want, cook in. You can enjoy a kitchen that is fully equipped or you can cook out on the BBQ grill. Dish washer, also washer and dryer and garage add convenience to make this the perfect location for your stress free South Padre Vacation. \n\nExcursions include the beautiful South Padre Island beach on the Gulf of Mexico. Additionally Mexico is just a few miles away and features fabulous food, shopping and entertainment. There is a zoo in Brownsville, great for kids and adults alike. You may want to indulge in horseback ridding on the beach, the turtle rescue is a site to see, deep sea and bay fishing, the Schlitterbahn water park, dolphin tours, bird watching, sailing, snorkeling, surfing, parasailing, windsurfing and jet skiing. \n\nWith a Queen size bed in the master and two twins in the second bedroom, this two bedroom, two bathroom home located directly on the golf course is perfect for two couples to share. \n\nBeautifully furnished with contemporary furnishings already and Lots of rich color, this home features a full living and dining area as well as a screened patio. Designer touches and with three TV's with Directv to complete the package.\n",Laguna Vista,https://www.airbnb.com/rooms/923151?location=Brownsville%2C%20TX
7,8,MarBella House Near the Galveston Beach,Location Location Location - Large 3 bdrm house can walk 2 blocks to the beach - or 1/4 mile to pleasure pier - or several area restaurants to walk to. Rehabilitated beautiful 100 year old home. Lots of parking space - large backyard deck with BBQ grill for the family hangout. Central A/C units separated up and down for your comfort.\n\nDon't be fooled by the cottage or bungalow - that means small.... This is a 2000 square foot large home all to yourselves with plenty of parking.,Galveston,https://www.airbnb.com/rooms/7459305?location=Bayou%20Vista%2C%20TX
8,9,Spacious Beach House,"Large, multi-level home 4 bedrooms and 3.5 bath. Two large common areas, big dining room, and gourmet kitchen. Beautiful deck and backyard with sand box for small children. Well furnished and appointed. Washer/Dryer. Great for large groups. Port Aransas Short-Term Rental #270309.",Port Aransas,https://www.airbnb.com/rooms/13012107?location=Corpus%20Christi%2C%20TX
9,10,The Beach House Too,"Beautiful new house with all the amenities of home!\n3 bedrooms, 2 baths -- Stunning open floor plan , beautiful kitchen with bar area. Central Air/Heat, Microwave, Dishwasher, Washer/Dryer. No carpet , all tile floor . Features 4 flat screen tvs and a dvd player. Large covered decks front and back,downstairs outdoor shower , short walk to the beach, House is located in quiet area of beach that is wider and has less traffic. Great family house and area",Bolivar Peninsula,https://www.airbnb.com/rooms/18568358?location=Anahuac%2C%20TX
