# Homework 3 - Find the perfect place to stay in Texas!

## Group 31 :  Livia Lilli, Edoardo Gallo, Amirhossein Rajabi Shizari 



The following are all the libraries that we need. So let's import them!

In [1]:
import pandas
import csv
from pandas.core.frame import DataFrame
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
import nltk
nltk.download('stopwords')
import string
import numpy as np
import math
import sklearn
from IPython.display import display, HTML
from scipy import spatial
import json
import heapq
import folium
from geopy import distance

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/livialilli/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [2]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/livialilli/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

# Step 1: Data

For this step, we just download the data!

In [3]:
rentals_file = pandas.read_csv(("files/Airbnb_Texas_Rentals.csv"), sep =",", delimiter= None, header = "infer", names = None, index_col = None,
                       encoding="ISO-8859-1")

In [4]:
rentals_file.head(3)

Unnamed: 0.1,Unnamed: 0,average_rate_per_night,bedrooms_count,city,date_of_listing,description,latitude,longitude,title,url
0,1,$27,2,Humble,May 2016,Welcome to stay in private room with queen bed...,30.020138,-95.293996,2 Private rooms/bathroom 10min from IAH airport,https://www.airbnb.com/rooms/18520444?location...
1,2,$149,4,San Antonio,November 2010,"Stylish, fully remodeled home in upscale NW â...",29.503068,-98.447688,Unique Location! Alamo Heights - Designer Insp...,https://www.airbnb.com/rooms/17481455?location...
2,3,$59,1,Houston,January 2017,'River house on island close to the city' \nA ...,29.829352,-95.081549,River house near the city,https://www.airbnb.com/rooms/16926307?location...


# Step 2: Create documents

We create a .tsv file for each record of the dataset.

Then we store the documents in a directory with inside one file per house review. You should name each file as <b><i>doc_i.tsv</i></b> where <b>i</b> is the dataframe index the document.

In [5]:
#creating tsv files
i = 0
for r in range(len(rentals_file)):
    record = rentals_file.loc[[r]]
    name = "files/doc_" + str(i) +".tsv"
    record.to_csv(path_or_buf = name, sep='\t')
    
    i += 1
    

KeyboardInterrupt: 

We create a function to read the tsv file: it will be really usefull in future, so that we won't have to write everytime the same code.


In [5]:
#function to read the tsv file
def read(file_name):
    read_file = pandas.read_csv(file_name, sep ="\t", delimiter= None, header = "infer", names = None, index_col = None, usecols = None,
                       encoding="ISO-8859-1")
    return read_file
    

In [6]:
col_names = ["average_rate_per_night","bedrooms_count", "city", "date_of_listing", "description", "latitude", "longitude","title","url"]

# Step 3: Search Engine

## Preprocessing

Now, we want to create two different Search Engines that, given as input a <b>query</b>, return the houses that match the query.

As a first common step, we must <b>preprocess</b> the documents by

* Removing stopwords
* Removing punctuation
* Stemming
* Anything else we think it's needed

So, we create our functions:

#### stopWords

This function takes as input a dataframe column and returns a list with all the words not contained in the stopwords set.

#### punctuation

This function takes as input a list (we suppose is the list returned by the stopWords function) and it returns another list with just the elements of the input which are not in the punctuation set.

#### stemming

This function has got as input a list (we suppose is the list returned by the stopWords function) and it returns an other list with all the words where the <i>stemming</i> was applied.

#### all_col

Finally there is this function that applies all the preproccessing functions. It infact takes as input a read file and for every column of the file it applies all the precedent functions. 

But pay attention! As we can see from the read file, there are many new line characters "\n" which, maybe for a mistake, have been written attached to other words, so that the the punctuation function could not find them. 
For this reason we have to remove them before that it's too late! We do this with a easy replace string method.

The all_col function returnes a dictionary where, for every column name (key), we have all the correspondent cleaned text in form of a list.


In [7]:
def stopWords(column_file):
    
    stopWords = set(stopwords.words('english'))
    words = word_tokenize(str(column_file))
    wordsFiltered = []

    for w in words:
        if w not in stopWords:
            wordsFiltered.append(w)
    return wordsFiltered
    

In [8]:
def punctuation(list):
    exclude = set(string.punctuation)
    for el in list:
        if el in exclude:
            list.remove(el)
    return list

In [9]:
def stemming(list):
    ps = PorterStemmer()
    output = []
    for word in list:
        stem_word = ps.stem(word)
        output.append(stem_word)
    return output
#it returns a list (of the current column) with all the stem-words


In [10]:
#On ALL the COLUMNS
#applyng stopWords,punctuation and stemming functions for every column of the dataframe
#it returns a dictionary, where keys are the columns names, and for every key there is the list of cleaned words.
def all_col(read_file):
    dic = {}
    for name in col_names:
        s = str(read_file[name][0]).replace("\\n", " ")
        l = stopWords(s)
        l = punctuation(l)
        result = stemming(l)
        dic[name] = result
        #print(result)
    return dic

## 3.1 Conjunctive query

At this moment, we narrow out interest on the <b>description</b> and <b>title</b> of each document. 

It means that the first <b>Search Engine</b> will evaluate queries with respect to the aforementioned information.

For this reason we create a function to filter, everytime we need, the columns we need. In particular it takes as input a dictionary (we suppose that it is the one returned by the <b>all_col</b> function) and it returns an other one with just the keys we need (description and title). 

In [11]:
#we have to consider just description and title columns


def filter_keys(dictionary):
    keys = ["description", "title"]
    return {x: dictionary[x] for x in dictionary if x in keys}



### 3.1.1 Create your index

We create some dictionaries that will be usefull to our <b>Search Engine</b>.

#### vocabulary

For our goal we decide to create a dictionary called <b> vocabulary</b> where for each word, is given a list of all documents that contain that word.

Because of the long processing time, we save it on a json file, so that in the future we can just open the file without running the code again.

#### indexing

Then we create another dictionary. At this time we have as keys the words and as values an <b>ID term</b>, assigned to each word.

#### file_voc

We also have the inverse: this dictionary contains infact the Id terms as keys, and the correspondent words as values.



In [11]:
#in future we can move this two blocks of code in an othe file, and upload them when we need
#for now, for convention we leave them here

In [12]:
#for every tsv file, I apply the preprocess, I filter the columns that I need, and then I build the vocabulary
vocabulary = {}
for i in range(len(rentals_file)):
    name = "files/doc_" + str(i) +".tsv"
    df = read(name)
    clean_d = all_col(df)
    filtered_d = filter_keys(clean_d)
    values = filtered_d.values()
    for l in values:
        for el in l:
            if el not in vocabulary.keys():
                vocabulary[el] = [name]
            else:
                vocabulary[el].append(name)
    
    
#it has got as keys the words, as values all the docs where words are in 
    

In [13]:
jv = json.dumps(vocabulary)
f0 = open("myVoc.json","w")
f0.write(jv)
f0.close()

In [12]:
with open("myVoc.json") as f0:
    vocabulary = json.load(f0)

In [13]:
#index assignment
#for every key-word, we have the index
keys = vocabulary.keys()
list_keys = list(keys)
indexing = {}
for i in range(len(list_keys)):
    term = "term_id_" + str(i)
    
    indexing[list_keys[i]] = term

In [14]:
file_voc = {}

for i in range(len(list_keys)):
    term = "term_id_" + str(i)
    
    file_voc[term] = list_keys[i]

#it has as keys the id terms, as values, the corrispondent words

In [15]:
file_voc_keys = list(file_voc.keys())

#### Inverted Index

... But the most important dictionary we need is the <i>Inverted Index</i>.

 It will be a dictionary of this format:

`{term_id_1:[document_1, document_2, document_4], term_id_2:[document_1, document_3, document_5, document_6], ...}`

Also in this case we need to save the dictionary on a json file to avoid to run always this part of code, that takes long time.


In [18]:
#SAVE ON OTHER FILE
#this has as keys the term id, as values the docs where id's word is in
inv_index = {}
for i in range(len(file_voc_keys)):
    new_key = file_voc_keys[i]
    old_key = file_voc[new_key]
    inv_index[new_key] = vocabulary[old_key]

In [19]:
j = json.dumps(inv_index)
f = open("vocabulary.json","w")
f.write(j)
f.close()

In [16]:
with open("vocabulary.json") as fh:
    inv_index = json.load(fh)

### 3.1.2 Execute the query

Given a query, that we let the user enter, the Search Engine is supposed to return a list of documents.

<b> What documents do we want?</b>

Since we are dealing with <b>conjunctive queries (AND)</b>, each of the returned documents should contain <b>all the words in the query</b>.

The final output of the query must return, if present, the following information for each of the selected documents:

* Title
* Description
* City
* Url


So, to make our Search Query, we le the user enter a query. Our first goal is to "clean" it with preprocessing.

The second goal is to be sure that all the words of the (preprocessed) query are contained in our vocabulary. Infact if a word is not in our vocabulary, it means that there can be documents that contain <b>all</b> the words of the query! 

Then we create a dictionary called <b>docs_query</b> where for every word (considered with its ID term), it gives a list of all the docs which contain it.
So making an <b>intersection</b> of all these lists of documents, we find the ones in common among all the query's words.

What does it mean?

The result of the intersection is a set with all the documents that contains <b>ALL</b> the words of the query.
The very last goal is to display the files returned by the search engine, with just the columns required above.


In [17]:
def user_query():
    query = input()
    query = stopWords(query)
    query = punctuation(query)
    query = stemming(query)
    return query

In [24]:
query = user_query()

f = True
count = 0
while f:
    for word in query:
        if word not in vocabulary.keys():
            count = count
            query = user_query()
        else:
            count += 1
        
    if count == len(query):
        f = False
        
 

print(query)

with garden and near to the airport
['garden', 'near', 'airport']


In [25]:
docs_query = {}
for word in query:
    index = indexing[word]
    docs_query[index] = inv_index[index]
    
    

In [26]:
v = docs_query.values()

In [27]:
v = list(v)

In [28]:
intersection = set(v[0]).intersection(*v)

In [29]:
for file in intersection:
    file = read(file)
    
    result = file[["title","description","city","url"]]
    display(result.style)
    #print(result.to_html())

Unnamed: 0,title,description,city,url
0,Peaceful garage apt. close in!,"Quiet street, neat neighborhood near bus/train service to downtown, hospitals, conventions, sports. Equidistant to both airports. \nView onto tranquil garden area. Full kitchen and basic cable.\n\nThe light rail is 1/2 block from the apartment, 3 minute walk. \n We are at the LINDALE PARK stop.\nSee this link for a map of the lines, present and proposed: \n",Houston,https://www.airbnb.com/rooms/5781472?location=Atascocita%2C%20TX


Unnamed: 0,title,description,city,url
0,Family Home near Austin Airport,"Spacious home with wood floors, and fire place. Kitchen has full available appliances and living area has recliner, LED TV, local channels, & Wifi all over the house. Master bedroom has King size bed, double sink, separate shower and a garden tub.",Austin,https://www.airbnb.com/rooms/8191636?location=Cedar%20Creek%2C%20TX


Unnamed: 0,title,description,city,url
0,1000 sq ft 1bed 1ba near DFW airport/Six Flags,"Depending on your type of stay, I can accommodate for what you are needing. Its on the third floor with only stair access, garden tub, and upgraded appliances. Super close to Six Flags, DFW Airport, and 30mins from Downtown Dallas",Grand Prairie,https://www.airbnb.com/rooms/19214169?location=Cedar%20Hill%2C%20TX


Unnamed: 0,title,description,city,url
0,BREATHE DEEPLY A Cozy Austin Cabin,"A stone throw south of the Austin City Limits, this cabin is situated in 2.5 wooded acres. Luxurious and quiet. It is inhabited by deer, raccoon, rabbits, owls, lizards and butterflies. There is an old stone well filled by an aquifer (legend states the well was built by Ben McCullough; the civil war hero). A large back yard is used for walks, bonfires and golfing/batting balls. Lounge in the dappled sun on the patio near a running pond surrounded with gardens. Prepare a leisurely bar-b-que. Talk late into the night by the fire pit. \n\nThis 100+ yr old cabin has been revived and furnished with love. Time seems to stop while soaking in the deep old claw tub filled with endless hot spring water . Drifting to sleep, become aware of crackling in the wood burning stove, a train whistle and the trickling pond. Breakfast choices are street tacos or waffles...with pecans? \n\nIts possible you'll be tempted to stay all day. However, the center of Austin or San Marcos, with music, swimming, fishing and kayaking are just 15-25 minutes from this retreat. \nSound like a fit for you? More info:\nCircuit of the Americas is 18 miles through country roads. It is possible to attend this event and never enter Austin. A toll road will deliver you from the airport to the cabin and the event without the Austin traffic. Also close and south is an after race restaurant popular with race participants. \n\nNote for SXSW and other major events: Consider being in the crowds day and night then retreating by the fire at your quiet abode just 15 minutes south from all the ruckus of Central Austin. Invite your friends to hang with you and perhaps make your own music. Guests under 21 welcomed. \n\n\nIncluded in price:\nheating with wood burner &/or electric heater. Host can prep the fire. Firewood complementary AC for the summer/fall plus ceiling fan\nNO TV\nwireless internet - fast\nwasher/dryer available\nclaw tub with shower\nkitchen equipped with small refrigerator, toaster oven, small microwave, coffee maker\nlarge BBQ pit available\nof course all towels, quality cotton sheets, feathered quilt, pillows, utensils, dishes, cups, coffee, teas and toiletries are ready for you cleaning when requested. \nup to 3 parking spaces\n3 fire pits to lounge by\nprivate patio\n\nExtra charges:\ntransportation: Round trip transportation to the airport or elsewhere offered by host\npublic bus stop 3 miles from the cabin, it is easy to grab a ride to the bus stop from the host- no charge.\nweekend reservations are a minimum of 2 nights. If you can only stay 1 night, please ask about an exception so that I can attempt to make it work.\n\nGuest have exclusive access to the patio. You are welcome to wander all the grounds with the exception of course, of the private residence.\nComing soon: a sauna is being converted into a cedar silo sauna.\nFormula 1 rate includes stocked local wines and beer. Toll road to F1 is less that 2 miles from the cabin and provides a direct shot to the venue, without having to deal with Austin traffic. \nWEDDING PACKAGE: Evening before and night of wedding. This facilitates hosting out of town guests the night before and the day of. (note weekends are a two night minimum. this generally is not practical for the newly weds.) However, the events prior to the big event serves as an informal staging area and a place for your guests who need a meet up, changing and rest area prior to the wedding. Before the bride and groom return from the wedding, host will tidy the cabin and light a fire and candles for their special arrival.\n\nI interact with guests as much as they want. Usually, an initial orientation to the cabin and grounds, then texting for the morning \",Manchaca,https://www.airbnb.com/rooms/728502?location=Buda%2C%20TX


Unnamed: 0,title,description,city,url
0,Peaceful home near airport & downtown,Our home is filled with warmth from lots of natural light. The original wood floors and earthy decor create a peaceful environment for anyone. We have a big backyard & garden that is very kid and pet friendly as well as many toys to share in our boys room. The house is 5 minutes away from the airport as well as 10 minutes away from downtown. We are in the middle of San Antonio which is a great location for accessing all types of fun.,San Antonio,https://www.airbnb.com/rooms/19014109?location=Converse%2C%20TX


Unnamed: 0,title,description,city,url
0,Master Suite Gem Dallas ~ home access,"Master Suite w/bath, walkin closet, cafe table. 3 mi.from White Rock Lake, 4 mi.from Dallas Arboretum & Botanical Gardens w/Concerts, close to Casa Linda restaurants, shops & movies, many things 2 do. Downtown (9 mi). Elite Northpark mall and Galleria mall near by: 6/8 miles away.\nAirprt: DFW/LOVE. Ask about airport rides!!\nAmenities: laundry room, kitchen & backyard oasis!!",Dallas,https://www.airbnb.com/rooms/11010416?location=Balch%20Springs%2C%20TX


Unnamed: 0,title,description,city,url
0,Home near Galveston Beach and other attractions,"Nice 4 Bedroom 2 Bath home in a great location close to freeways. \n5 min to Tanger Outlet Mall\n15 min from Galveston Beach/Schlitterbahn/Moody Gardens / Texas City Dike\n18 minutes to NASA\n23 min Kema Boardwalk\n35 min Houston Hobby Airport\n50 min to Houston (Minute Maid Park, NRG Stadium, Zoo and Musems\n60 min to George Bush Intercontinel Airport",Texas City,https://www.airbnb.com/rooms/18076465?location=Bayou%20Vista%2C%20TX


Unnamed: 0,title,description,city,url
0,BREATHE DEEPLY A Cozy Austin Cabin,"A stone throw south of the Austin City Limits, this cabin is situated in 2.5 wooded acres. Luxurious and quiet. It is inhabited by deer, raccoon, rabbits, owls, lizards and butterflies. There is an old stone well filled by an aquifer (legend states the well was built by Ben McCullough; the civil war hero). A large back yard is used for walks, bonfires and golfing/batting balls. Lounge in the dappled sun on the patio near a running pond surrounded with gardens. Prepare a leisurely bar-b-que. Talk late into the night by the fire pit. \n\nThis 100+ yr old cabin has been revived and furnished with love. Time seems to stop while soaking in the deep old claw tub filled with endless hot spring water . Drifting to sleep, become aware of crackling in the wood burning stove, a train whistle and the trickling pond. Breakfast choices are street tacos or waffles...with pecans? \n\nIts possible you'll be tempted to stay all day. However, the center of Austin or San Marcos, with music, swimming, fishing and kayaking are just 15-25 minutes from this retreat. \nSound like a fit for you? More info:\nCircuit of the Americas is 18 miles through country roads. It is possible to attend this event and never enter Austin. A toll road will deliver you from the airport to the cabin and the event without the Austin traffic. Also close and south is an after race restaurant popular with race participants. \n\nNote for SXSW and other major events: Consider being in the crowds day and night then retreating by the fire at your quiet abode just 15 minutes south from all the ruckus of Central Austin. Invite your friends to hang with you and perhaps make your own music. Guests under 21 welcomed. \n\n\nIncluded in price:\nheating with wood burner &/or electric heater. Host can prep the fire. Firewood complementary AC for the summer/fall plus ceiling fan\nNO TV\nwireless internet - fast\nwasher/dryer available\nclaw tub with shower\nkitchen equipped with small refrigerator, toaster oven, small microwave, coffee maker\nlarge BBQ pit available\nof course all towels, quality cotton sheets, feathered quilt, pillows, utensils, dishes, cups, coffee, teas and toiletries are ready for you cleaning when requested. \nup to 3 parking spaces\n3 fire pits to lounge by\nprivate patio\n\nExtra charges:\ntransportation: Round trip transportation to the airport or elsewhere offered by host\npublic bus stop 3 miles from the cabin, it is easy to grab a ride to the bus stop from the host- no charge.\nweekend reservations are a minimum of 2 nights. If you can only stay 1 night, please ask about an exception so that I can attempt to make it work.\n\nGuest have exclusive access to the patio. You are welcome to wander all the grounds with the exception of course, of the private residence.\nComing soon: a sauna is being converted into a cedar silo sauna.\nFormula 1 rate includes stocked local wines and beer. Toll road to F1 is less that 2 miles from the cabin and provides a direct shot to the venue, without having to deal with Austin traffic. \nWEDDING PACKAGE: Evening before and night of wedding. This facilitates hosting out of town guests the night before and the day of. (note weekends are a two night minimum. this generally is not practical for the newly weds.) However, the events prior to the big event serves as an informal staging area and a place for your guests who need a meet up, changing and rest area prior to the wedding. Before the bride and groom return from the wedding, host will tidy the cabin and light a fire and candles for their special arrival.\n\nI interact with guests as much as they want. Usually, an initial orientation to the cabin and grounds, then texting for the morning \",Manchaca,https://www.airbnb.com/rooms/728502?location=Colorado%20River%2C%20TX


Unnamed: 0,title,description,city,url
0,Master Suite Gem Dallas ~ home access,"Master Suite w/bath, walkin closet, cafe table. 3 mi.from White Rock Lake, 4 mi.from Dallas Arboretum & Botanical Gardens w/Concerts, close to Casa Linda restaurants, shops & movies, many things 2 do. Downtown (9 mi). Elite Northpark mall and Galleria mall near by: 6/8 miles away.\nAirprt: DFW/LOVE. Ask about airport rides!!\nAmenities: laundry room, kitchen & backyard oasis!!",Dallas,https://www.airbnb.com/rooms/11010416?location=Arlington%2C%20TX


Unnamed: 0,title,description,city,url
0,Family Home near Austin Airport,"Spacious home with wood floors, and fire place. Kitchen has full available appliances and living area has recliner, LED TV, local channels, & Wifi all over the house. Master bedroom has King size bed, double sink, separate shower and a garden tub.",Austin,https://www.airbnb.com/rooms/8191636?location=Bastrop%20County%2C%20TX


Unnamed: 0,title,description,city,url
0,1000 sq ft 1bed 1ba near DFW airport/Six Flags,"Depending on your type of stay, I can accommodate for what you are needing. Its on the third floor with only stair access, garden tub, and upgraded appliances. Super close to Six Flags, DFW Airport, and 30mins from Downtown Dallas",Grand Prairie,https://www.airbnb.com/rooms/19214169?location=Bedford%2C%20TX


Unnamed: 0,title,description,city,url
0,R & M Roadhouse,"Center point is conveniently located near: the Alamo, San Antonio River Walk; Schlitterbahn Waterpark; John Newcombe Tennis Ranch; historic Fredericksburg for local wines and the National Museum of the Pacific War; Camp Verde historic post office and general store; Coming King Sculpture prayer garden. \n\nNightlife: Live music at Gruene Hall, the oldest dance hall in Texas; Live music and dining at John T. Floore Country Store in Helotes, Tx.\n\nOne hour to San Antonio International Airport.",Center Point,https://www.airbnb.com/rooms/15966243?location=Center%20Point%2C%20TX


Unnamed: 0,title,description,city,url
0,Spacious two bedroom apt @ 360North,"Clean & Comfortable two bedroom apt is great for your stay in Grand Prairie, Tx! Located only 20 minutes from downtown Dallas. This unit features high vaulted ceilings, new steel appliances, & more available for use by my guests. My apartment is about 15 minutes from DFW airport & The Parks Mall in Arlington. About 5 minutes away from Six Flags over Texas, AT&T stadium, Globe Life stadium (Texas Rangers) and Restaurants such as BJ's, TGI Fridays, Olive Garden etc. I don't have a private parking spot, but there is always plenty of open spaces near my apt.",Grand Prairie,https://www.airbnb.com/rooms/17655499?location=Cedar%20Hill%2C%20TX


Unnamed: 0,title,description,city,url
0,SURROUND YOURSELF WITH CHARM,"Clean & Comfortable two bedroom apt is great for your stay in Grand Prairie, TX! Located only 15 minutes from downtown Dallas. My apartment is about 10 minutes from DFW airport & The Parks Mall in Arlington. About 15 minutes away from Six Flags over Texas, AT&T stadium, Globe Life stadium (Texas Rangers) and Restaurants such as BJ's, TGI Fridays, Olive Garden etc. I don't have a private parking spot, but there is always plenty of open spaces near my apt.",Grand Prairie,https://www.airbnb.com/rooms/18807470?location=Cedar%20Hill%2C%20TX


## 3.2 Conjunctive query & Ranking score

In the new Search Engine, given a query, we want to get the <b>top-k</b> (the choice of k it's up to us!) documents related to the query. In particular:

* We have to find all the documents that contains all the words in the query (<b>as before...</b>).
    
* We sort them by their <b>similarity</b> with the query.

* The search engine returns in output <b>k</b> documents, or all the documents with non-zero similarity with the query when the results are less than k. We must use a <b>heap data structure</b> (you can use Python libraries) for maintaining the top-k documents.

To solve this task, we need <b>tfIdf score</b>, and the <b>Cosine similarity</b>. 



### 3.2.1 Inverted index

Our second Inverted Index must be of this format:

`
{
term_id_1:[(document1, tfIdf_{term,document1}), (document2, tfIdf_{term,document2}), (document4, tfIdf_{term,document4}), ...],
term_id_2:[(document1, tfIdf_{term,document1}), (document3, tfIdf_{term,document3}), (document5, tfIdf_{term,document5}), (document6, tfIdf_{term,document6}), ...],
...}
`


Let's save also here the dictionary on a separated document!


Practically, for each word we want the list of documents in which it is contained in, and the relative tfIdf score.
The tfIdf values are invariant with respect to the query, for this reason we can precalculate them!


At first, let's make an introduction to our scores!

#### An introduction to TF-IDF

<b>TF-IDF</b> stands for <i>“Term Frequency — Inverse Data Frequency”</i>. 

At first, we let's see what this term means mathematically.

<b>Term Frequency</b> (tf): it gives us the frequency of the <i>i</i>-word in each <i>j</i>-document. It is the ratio of number of times the word appears in a document compared to the total number of words in that document. 
It <b>increases</b> as the number of occurrences of that word within the document increases. 
<b>Each document has its own TF</b>.
    
$$ tf_{i,j} = \frac{n_{i,j}}{\sum_{i}{n_{i,j}}} $$

Where:
* $n_{i,j}$ = number of times the <i>i</i>-word appears in <i>j</i>-document;
* $\sum_{i}{n_{i,j}}$ = total of words in the <i>j</i>-document.



<b>Inverse Data Frequency</b> (idf): it is used to compute the weight of words across all documents. 
The words that occur <b>rarely</b> in the corpus have a <b>high</b> IDF score. 
It is given by the equation below.
 $$ idf_i = \lg{\frac{N}{df_{i}}} $$
 
Where 
* N = total of documents;
* $df_i$ = number documents containing <i>i</i>-word.


Combining these two we come up with the <b>TF-IDF score</b>:

$$ TF-IDF_{i,j} = TF_{i,j} * IDF_i $$



In [24]:
N = len(rentals_file)

In [26]:
inv_index2 = {}
for i in range(N):
    name_d = "files/doc_" + str(i) +".tsv"
    dataframe = read(name_d)
    clean_df = all_col(dataframe) #pulisce e restituisce diz: ad ogni colonna, lista di parole
    filtered_df = filter_keys(clean_df) 
    values = list(filtered_df.values()) #this is a list of two values list (one for title, one for description)
    doc_words = values[0] + values[1]
    distinct_words = set(doc_words)
    for word in distinct_words:
        freq_word = doc_words.count(word)
        TF = freq_word/ float(len(doc_words))
        IDF = math.log((float(N)) / len(vocabulary[word]))
        TFIDF = TF * IDF
        index = indexing[word]
        if index not in inv_index2.keys():
            inv_index2[index] = [(name_d, TFIDF)]
        else:
            inv_index2[index].append((name_d, TFIDF))

In [27]:
js2 = json.dumps(inv_index2)
f2 = open("vocabulary2.json","w", encoding = "utf-8")
f2.write(js2)
f2.close()

In [25]:
with open("vocabulary2.json","r", encoding = "utf-8") as f2:
    inv_index2 = json.load(f2)

### 3.2.2 Execute the query

In [35]:
query = user_query()

f = True
count = 0
while f:
    for word in query:
        if word not in vocabulary.keys():
            count = count
            query = user_query()
        else:
            count += 1
        
    if count == len(query):
        f = False
        
 

print(query)








with garden and netflix
['garden', 'netflix']


In [36]:
TF_query = {}
for word in query:
    freq_word = query.count(word) 
    TF = freq_word/ float(len(query))
    TF_query[word] = TF


In [37]:
TF_arr = np.asarray(list(TF_query.values()))

In [38]:
#key = word, value = idf
dicIDF = {}
for word in query:
    l = vocabulary.get(word)
    l = set(l)
    n_docs = len(l)
    if n_docs > 0:
        idf = math.log((float(N)) / n_docs)
    else:
        idf = 0
    dicIDF[word] = idf

In [39]:
IDF_arr = np.asarray(list(dicIDF.values()))

In [40]:
TFIDF_query = TF_arr * IDF_arr

In [41]:
#for every id, I search for the list of its docs
#so I build a list of lists , every list has docs of a query word.
#doc_score for every doc, gives a list of TFIDF values
all_docs = []
doc_score = {}
for word in query:
    index = indexing[word]
    l_tupl = list(inv_index2[index]) #lista delle tuple corrispondenti ad index
    l_docs = []
    for i in range(len(l_tupl)):
        l_docs.append(l_tupl[i][0])
        d = l_tupl[i][0]
        if d not in doc_score:
            doc_score[d] = [l_tupl[i][1]]
        else:
            doc_score[d].append(l_tupl[i][1])
    all_docs.append(l_docs)

In [42]:
intersection = set(all_docs[0]).intersection(*all_docs)
#this are all docs in common among the query words
#this means that they have TFID values != 0

In [43]:
intersection

{'files/doc_5454.tsv'}

In [44]:
#for every doc of intersection, I have the list of the three TFIDF
ds = {}
for doc in intersection:
    l_score = doc_score[doc]
    ds[doc] = l_score

In [45]:
#dict cos: for every doc of intersection, I compute its cosine sim with the query in terms of TFIDF
#then I create a dic where for every doc I have the value of cos sim
cos_d = {}
for el in ds:
    
    wanted = np.asarray(list(ds[el]))
    
    cos = 1 - spatial.distance.cosine(wanted, TFIDF_query)
    cos_d[el] = cos
        

In [46]:
heap = []
for el in cos_d:
    t = [el, cos_d[el]]
    heapq.heappush(heap, t)





In [47]:
heap

[['files/doc_5454.tsv', 0.9534807107342951]]

In [48]:
first_k = heapq.nlargest(3, heap)

In [49]:
first_k

[['files/doc_5454.tsv', 0.9534807107342951]]

In [50]:
for tup in first_k:
    print(tup[0])
    print(cos_d[tup[0]])

files/doc_5454.tsv
0.9534807107342951


In [51]:
for tupla in first_k:
    name_doc = tupla[0] 
    document = read(name_doc)
    document["similarity score"] = [cos_d[name_doc]]
    document = document[["title","description","city","url", "similarity score"]]
    display(document.style)
    
    

Unnamed: 0,title,description,city,url,similarity score
0,2 blocks to Rainey! Walk downtown!,"HOLIDAYS, LOCAL EVENTS, HIGH DEMAND WEEKENDS, DATES WITH LOW AVAILABILITY MAY BE SUBJECT TO HIGHER RATES, NIGHT MINIMUMS, AND/OR HIGHER FEES. AVAILABILITY SHOWN ON THIS CALENDAR'S WEBSITE IS NOT ALWAYS 100% ACCURATE. IT IS ALWAYS BEST TO INQUIRE WITH THE HOST/MANAGER/HOMEOWNER TO VERIFY AVAILABILTY AND A SPECIFIC QUOTE FOR YOUR STAY. QUOTES MAY DIFFER FROM THE BASE RATES, BASE FEES, NIGHT MINIMUMS AND/OR QUOTES PROVIDED BY THIS WEBSITE AND/OR DISPLAYED ON CALENDAR\n\nPLEASE READ ALL HOUSE RULES AVAILABLE ON WEBSITE AND/OR PROVIDED BEFORE BOOKING. ALL GUESTS ON RESERVATION MUST UNDERSTAND AND FOLLOW ALL RULES.\n\nTaylor House is in a residential neighborhood and all Guests will need to acknowledge they agree to respect my neighbors and the surrounding community to be accepted. Noise levels will be closely monitored. PARTIES ARE NOT ALLOWED. See below for more information.\n\nWelcome to my home! I look forward to hosting you!\n\nThis is the perfect vacation rental home in downtown Austin! I strive to provide Guests with the perfect home base with all the creature comforts, in the perfect location for exploring what Austin has to offer. I have lived in Austin my entire life and I want to help make your trip one-of-a-kind! IÃ¢ÂÂm available for suggestions of trendy new hot spots, and unique, only in Austin activities!\n\nYour perfect stay in the perfect location for the perfect vacation!\n\nA must stay! 2 blocks from the Rainey Street entertainment district! This area is the go-to spot for the Austin young professional nightlife scene. A short walk anywhere else downtown, including the convention center. The East 6th Street entertainment district is blocks away. Walk 2 blocks to Lady Bird Lake. Walk/Run/Bike the scenic Hike and Bike Trail or Kayak/Paddle Board on the lake as the sun sets. Kayak/Paddle Board/Small sailing and paddle boat rentals available blocks away. Walking distance to 6th Street, Congress, Paramount Theater, restaurants, bars, music venues, Texas Longhorn tailgating, DKR stadium, museums, a public pool, grocery store, coffee shops, comedy shows, improv shows, movie theaters (Alamo Ritz and Violet Crown), breweries (Hops & Grain), theater, art galleries, and other nightlife. The perfect location for the vacationer who wants to walk to everything unique that Austin has to offer. The home has tons of upgrades. Real wood floors. Large master bedroom with big bathroom. Modern kitchen with granite counters and stainless steel appliances. High ceilings. Huge granite island. The perfect fenced in yard for hanging out/re-cooperation. Ping pong table, a cornhole set, a washer set, tons of patio chairs, a Weber grill and tons of games are also included.\n\nEverything you need! (More about the home)\n\n-The home is very well maintained and very clean. \n-Tons of windows to let in light for a very open feel. \n-1st floor: Big fully furnished living room with a LED TV. Amazon Fire Stick with streaming Netflix subscription and other Apple TV content. There is an antenna that picks up the major networks and some other channels, but this is not guaranteed. There is no cable TV. Great place to recoup while watching some Netflix or grabbing a provided local magazine to find out more about what Austin has to offer. A large granite counter with several stools divides the kitchen and living area. Perfect for meal preparation, dining, a game of cards or just socializing. A state of the art kitchen complete with stainless steel appliances, gas stove, refrigerator/freezer, garbage disposal, dishwasher, microwave, toaster, blender, and coffee maker. Large surfaces for cooking and serving. All needed basic cooking ware and dining ware provided. Stained concrete flooring throughout. Large one half bath.\n-2nd floor: Master bedroom with private large bathroom. It has a double vanity and a large garden tub/shower. Some soap, shower gel, shampoo and conditioner typically provided. Hair dryer provided. Wood flooring on the stairs and landing. Energy efficient, front-loading Washer/Dryer. \n-3rd floor: All wood flooring. 2nd bedroom. Deck overlooking the neighborhood and the downtown skyline. Deck has plenty of seating and a table. \n-Yard: The perfect yard with privacy fence for rest, relaxation and socializing. Corhole set, washer set, tons of chairs, ping pong table, and a Weber grill.\n-Big front patio with chairs.\n-Central Energy Efficient HVAC. \n-1 parking spot on the lot with plenty of free street parking. Please do not park in front of neighboring homes.\n-High speed WIFI. \n-Fresh towels provided. \n-Cards and tons of games. \n-Fresh coffee. \n-BBQ tools. (Bring your own charcoal, lighter fluid and matches)\n-Hair Dryer. \n-Laundry detergent, softener and bleach provided for on-site laundry. \n-Iron and ironing board. \n\nEasy to Find!\n\nArriving by car: 4 blocks off of Cesar Chavez. Very easy to find with plenty of parking in the area.\n\nArriving by plane: Capitol Metro Bus Route '100-AIRPORT FLYER' drop offs downtown. 15 minute cab ride from the airport. Remember, Austin cabs only take 4 persons at one time. Uber/Lyft/Car2go all available in Austin.\n\n*15% LOCAL AND STATE TAXES ARE TYPICALLY INCLUDED IN THE ONLINE QUOTE .\n\nAustin City Code - Chapt Ã¢ÂÂ Ã¢ÂÂExcept as otherwise provided in this section, not more than six unrelated adult may reside in a dwelling unit.Ã¢ÂÂ (Ordinance No. ). THIS APPLIES TO ALL SHORT TERM/VACATION RENTALS IN AUSTIN. Please inquire further if you have any questions or concerns.\n\n***Furnishings and amenities will be furnished as described whenever possible. In the event that an item is damaged/broken/lost/consumed the Host will restock as soon as possible. All items in description/pictures are not guaranteed.***\n\nLet me help you have the perfect vacation! Thank you for looking! Have a great day!\n\nOL #",Austin,https://www.airbnb.com/rooms/1250575?location=Colorado%20River%2C%20TX,0.953481


# Step 4: Define a new score!    (?)

### Prices

In [62]:
min_input = input("Insert min avarage-rate: ")
max_input = input("Insert max avarage-rate: ")





min_input = stopWords(min_input)
max_input = stopWords(max_input)


min_input = punctuation(min_input)
max_input = punctuation(max_input)





min_input = stemming(min_input)
max_input = stemming(max_input)






Insert min avarage-rate: 50 
Insert max avarage-rate: 120


In [63]:
print(min_input, max_input)

['50'] ['120']


In [64]:
request = [min_input[0], max_input[0]]
request

['50', '120']

In [65]:
float(request[0])

50.0

In [77]:
d = read(d)

In [80]:
d = all_col(d)
d

{'average_rate_per_night': ['150'],
 'bedrooms_count': ['1'],
 'city': ['corpu'],
 'date_of_listing': ['octob'],
 'description': ['profess'],
 'latitude': ['27.600363829481'],
 'longitude': ['-97.21563909697399'],
 'title': ['luxuri'],
 'url': ['http']}

In [84]:
d["average_rate_per_night"][0]

'150'

In [85]:
#key = price in the interval wanted by user, value = list of docs
our_docs = {}
for i in range(N):
    name_d = "files/doc_" + str(i) +".tsv"
    dataframe = read(name_d)
    clean_df = all_col(dataframe)
    price = clean_df["average_rate_per_night"][0]
    if float(price) >= float(request[0]) and float(price) <= float(request[1]):
        if price not in our_docs:
            our_docs[price] = [name_d]
        else:
            our_docs[price].append(name_d)
    else:
        pass

In [67]:
our = json.dumps(our_docs)
f3 = open("ourDocs.json","w")
f3.write(our)
f3.close()

In [68]:
with open("ourDocs.json") as f3:
    our_docs = json.load(f3)

In [136]:
#prezzo:dist da min range

distances= {}
for p in our_docs.keys():
    dist = abs(float(p)-float(request[0]))
    distances[p] = dist

In [137]:
#doc: price's distance

dd= {}
for p in our_docs.keys():
    docs = our_docs[p] #list of docs
    for d in docs:
        dd[d] = distances[p]
        
        


In [138]:
heap1 = []
for document in dd:
    ll = [document, dd[document]]
    heapq.heappush(heap1, ll)



In [139]:
first_n = heapq.nsmallest(10, heap1)

In [140]:
first_n

[['files/doc_10.tsv', 15.0],
 ['files/doc_100.tsv', 49.0],
 ['files/doc_10000.tsv', 49.0],
 ['files/doc_10001.tsv', 0.0],
 ['files/doc_10002.tsv', 29.0],
 ['files/doc_10007.tsv', 49.0],
 ['files/doc_10010.tsv', 49.0],
 ['files/doc_10012.tsv', 40.0],
 ['files/doc_1002.tsv', 40.0],
 ['files/doc_10021.tsv', 0.0]]

In [141]:
#this are the first lowest ones choosen by the heap algorithm

In [142]:
for l in first_n:
    d = l[0]
    d = read(d)
    d = d[["title","description","city","url"]]
    display(d.style)
    
    

Unnamed: 0,title,description,city,url
0,Cozy 1 bedroom/bathroom with pool,My cool and comfortable bedroom apartment feels like home. It comfortably fits 2 people and centrally located on a quiet street. Enjoy a gourmet kitchen and easy access to all major highways.,Irving,https://www.airbnb.com/rooms/7276294?location=Colleyville%2C%20TX


Unnamed: 0,title,description,city,url
0,Room at the Lake,"Room at the Lake, whether your just traveling through and need a room for your stay or looking for a weekend get-a-way at the Lake. Just a short Stroll through the Countryside at Cedar Creek Lake.",Mabank,https://www.airbnb.com/rooms/15370783?location=Cedar%20Creek%20Reservoir%2C%20TX


Unnamed: 0,title,description,city,url
0,*Zen private studio in the heart of South Austin*,"The peaceful private backyard studio is close to everything - downtown, Lady Bird Lake, South Congress, Barton Springs, Zilker Park, Auditorium Shores, Palmer Auditorium, minutes from East Austin. YouÃ¢ÂÂll love the place because of the unique space. Tucked under sprawling Southern Live Oaks trees it has incredible light, a lush queen-sized bed, comfy fold-out leather sofa bed. My place is ideal for couples, solo adventurers, business travelers, families (with kids), and it's pet friendly.",Austin,https://www.airbnb.com/rooms/16157135?location=Brazos%20River%2C%20TX


Unnamed: 0,title,description,city,url
0,Private Room near Fiesta Texas,"Lovely quiet neighborhood just outside San Antonio in Helotes only 10 minutes away from Fiesta Texas & UTSA, and 15 minute to Sea World. Two full bathrooms available for use. You're welcome to use the kitchen, deck, TV room, and washer & dryer.",Helotes,https://www.airbnb.com/rooms/6360252?location=Boerne%2C%20TX


Unnamed: 0,title,description,city,url
0,Good Vibe Vineyards Retreat,GVV Retreat is located in the heart of downtown San Saba. Enjoy the view of the town and beyond from one of our two spacious balconies while sipping a nice glass of your favorite wine from the two outstanding wineries located across the street.,San Saba,https://www.airbnb.com/rooms/10381801?location=Brady%2C%20TX


Unnamed: 0,title,description,city,url
0,Dignowity Hill Backyard Bungalow,"My place is close to parks, eats and drinks. Walking distance to Lockwood park,The Pearl, Dignowity meats, Panchos and Gringos, Tucker's Kozy Korner, Alamo Brewery and Burleson Yard Beergarden.\n\n5-10 minute bike-ride to the Alamo, Hemisfair park, Historic Market square, Southtown and more.. YouÃ¢ÂÂll love my place because of the location, the outdoor space, and privacy of a small bungalow in walking distance of the city. My place is good for couples, solo adventurers, and business travel.",San Antonio,https://www.airbnb.com/rooms/13435858?location=Bulverde%2C%20TX


Unnamed: 0,title,description,city,url
0,Comfy NW Austin Suburb Apt near Metro Rail to Aus,"This apartment is: \n* Just outside NW Austin city limits in Cedar Park\n* Convenient to all things Austin\n* Minutes from Lake Travis\n* Located a few minutes from the Austin Metro Rail\n* In an area where Uber and Lyft are available! \n* Walking distance to local parks, restaurants and dining\n* Down the street from one of the area's largest malls\n* A popular place because of location \n* Good for couples, solo adventurers, and business travelers\n* FREE WIFI",Cedar Park,https://www.airbnb.com/rooms/13153903?location=Bertram%2C%20TX


Unnamed: 0,title,description,city,url
0,Mi Casita Hideaway,"FREE-STANDING BEDROOM SUITE WITH SEPARATE ENTRANCE. Experience Tuscan flavor peace & quiet centrally located between San Antonio and Austin; at The Bandit Golf Club on the banks of the Guadalupe River. Only minutes to marvelous food and live entertainment in Gruene; family fun at Schlitterbaun water park; river floating; Outlet Malls; wineries, and San Antonio and Austin.",New Braunfels,https://www.airbnb.com/rooms/1124535?location=Brazos%20River%2C%20TX


Unnamed: 0,title,description,city,url
0,Comfy guest house close to it all!,"Get everywhere in 5 minutes- TCU, cultural district, medical district, downtown, west 7th.",Fort Worth,https://www.airbnb.com/rooms/751987?location=Benbrook%2C%20TX


Unnamed: 0,title,description,city,url
0,Fish Pond,It's a house!!!!,Harker Heights,https://www.airbnb.com/rooms/17847394?location=Bell%20County%2C%20TX


### Coordinates

In [10]:
latitude=float(input('Enter latitude:'))
longitude=float(input('Enter longitude:'))
r=float(input('Enter radius in km:'))
coord=[latitude, longitude]


Enter latitude:30.020138
Enter longitude:-95.293996
Enter radius in km:100


In [11]:
coord

[30.020138, -95.293996]

In [13]:
r


100.0

In [14]:
df4 = rentals_file.filter(items = ['latitude','longitude'])

In [34]:
#doc:dist from input coord
llc = {}
for i in range(len(df4)):
    docu = "files/doc_" + str(i) +".tsv"
    try:
        
        all_coord = [df4.latitude[i], df4.longitude[i]]
        dis = distance.distance(coord, all_coord).kilometers #we want km values
        if dis < r:
            llc[docu] = dis
    except:
        pass


In [35]:
heap2 = []
for document in llc:
    uu = [document, llc[document]]
    heapq.heappush(heap2, uu)
heap2


[['files/doc_0.tsv', 8.883104225691304e-06],
 ['files/doc_10032.tsv', 34.51318632926681],
 ['files/doc_1006.tsv', 58.554857565262544],
 ['files/doc_10077.tsv', 39.564766552630175],
 ['files/doc_1281.tsv', 94.39667699272606],
 ['files/doc_1430.tsv', 58.90400133293887],
 ['files/doc_1007.tsv', 35.48269175297866],
 ['files/doc_10078.tsv', 37.508753636586945],
 ['files/doc_1148.tsv', 32.058987075944536],
 ['files/doc_1283.tsv', 94.07468351979479],
 ['files/doc_1359.tsv', 66.16160626551425],
 ['files/doc_1431.tsv', 65.79534435585958],
 ['files/doc_1530.tsv', 30.1956491111698],
 ['files/doc_1610.tsv', 30.317901524341327],
 ['files/doc_1008.tsv', 93.24841593599811],
 ['files/doc_10079.tsv', 37.93680786829855],
 ['files/doc_1075.tsv', 62.846657831941634],
 ['files/doc_1154.tsv', 68.88836639572379],
 ['files/doc_1239.tsv', 60.188197237830686],
 ['files/doc_1285.tsv', 29.629246185517623],
 ['files/doc_1325.tsv', 29.320329999008592],
 ['files/doc_1360.tsv', 54.88562290295608],
 ['files/doc_1389.t

In [39]:
first_m = heapq.nsmallest(10, heap2)
first_m

[['files/doc_0.tsv', 8.883104225691304e-06],
 ['files/doc_10032.tsv', 34.51318632926681],
 ['files/doc_1006.tsv', 58.554857565262544],
 ['files/doc_1007.tsv', 35.48269175297866],
 ['files/doc_10077.tsv', 39.564766552630175],
 ['files/doc_10078.tsv', 37.508753636586945],
 ['files/doc_10079.tsv', 37.93680786829855],
 ['files/doc_1008.tsv', 93.24841593599811],
 ['files/doc_10101.tsv', 86.18226327338421],
 ['files/doc_1012.tsv', 57.23889027711029]]

In [41]:
for l in first_m:
    d = l[0]
    d = read(d)
    d = d[["title","description","city","url"]]
    display(d.style)

Unnamed: 0,title,description,city,url
0,2 Private rooms/bathroom 10min from IAH airport,Welcome to stay in private room with queen bed and detached private bathroom on the second floor. Another private bedroom with sofa bed is available for additional guests. 10$ for an additional guest.\n10min from IAH airport\nAirport pick-up/drop off is available for $10/trip.,Humble,https://www.airbnb.com/rooms/18520444?location=Cleveland%2C%20TX


Unnamed: 0,title,description,city,url
0,Luxurious High Rise Apartment by the Galleria,"Galleria apartment with beautiful views of city. Walking distance to the Galleria Houston, plenty of shopping, and Whole Foods. High speed Internet TV cable, pool, 24/7 club house, fitness center. This room is about 1,000 SQ. FT.",Houston,https://www.airbnb.com/rooms/16933713?location=Bellaire%2C%20TX


Unnamed: 0,title,description,city,url
0,33' of Freedom Sailboat,"This is best for one person or two people sleeping separate as the beds are singles. A fun antique store is close by, as well as good food- such as Skipper's and Joe's. All in all, a walkable area. The Johnson Space Center and the Space Center Houston Visitor's Center are about 15-20 minutes away, either way around Clear Lake.",Kemah,https://www.airbnb.com/rooms/18581948?location=Beach%20City%2C%20TX


Unnamed: 0,title,description,city,url
0,"Look no further! MODERN, LUXURY 1 (URL HIDDEN)","Look no further! Your perfect stay in houston is a click away. Modern one bedroom apartment, brand NEW complex, clean, quiet, and let's talk about LOCATION (minutes from the galleria and easy access to the highway). You will not be disappointed. TV in the living room, comfortable mattress, WIFI, washer and dryer in the unit.",Houston,https://www.airbnb.com/rooms/18874368?location=Bellaire%2C%20TX


Unnamed: 0,title,description,city,url
0,10 MIN WALK TO NRG: Private Room! Near Med Center,"Located less than a mile from NRG Stadium, this gated community is perfect for attending events at the NRG! Park your car here for free with no worries (gated parking) and walk over to the stadium.\n\nWe are also a short drive or bus ride from the Texas Medical Center (2 miles)!\n\nYou'll have a private bedroom - with a TV and work desk, private walk-in closet, private 0.5 bathroom (sink and toilet), and full access to the shared bathroom (shower/bathtub) living room, kitchen, and patio.",Houston,https://www.airbnb.com/rooms/16649137?location=Bellaire%2C%20TX


Unnamed: 0,title,description,city,url
0,Luxury Medical Center NRG Stadium 615,"Beautiful fully furnished 1 bed 1 bath centrally located in the heart of Houston. One mile from MD Anderson and NRG stadium. Close to museums, theaters and the zoo. Access to pool and weight room. Unlike others taxes are included in my price.",Houston,https://www.airbnb.com/rooms/8822609?location=Bellaire%2C%20TX


Unnamed: 0,title,description,city,url
0,$45 Comfortable Apartment,"Very comfortable and clean apartment. There are 6 pools on the entire property. There is a convenient store inside the premises, 20 ft. away from unit. The Galleria Mall of Houston is less than 10 mins away. Everything you need is surrounding this apt. Very convenient and quiet neighborhood.",Houston,https://www.airbnb.com/rooms/18823920?location=Bellaire%2C%20TX


Unnamed: 0,title,description,city,url
0,The Stuttgart Room (Schaefer Haus),"This is the Stuttgart Room of the Schaefer Haus. You will have a private balcony and bathroom. You have full access to the kitchen and parlor. All the restaurants, shops, and bars located in The Strand are only a five minute walk away.",Galveston,https://www.airbnb.com/rooms/16917205?location=Bayou%20Vista%2C%20TX


Unnamed: 0,title,description,city,url
0,Ben's apartment C.,One of three apartments located on private Canal facing intercoastal waterway nice front porch fishing area under structure parking for boat under apartment Great sunsets and 1/2 a mile from beachtruly a fisherman's paradise cross the canal to East Bay 30 minutes to gulf of Mexico,Bolivar Peninsula,https://www.airbnb.com/rooms/18839303?location=Bolivar%20Peninsula%2C%20TX


Unnamed: 0,title,description,city,url
0,Resort Style Waterfront Super Bowl Rental,"Enjoy breathtaking sunsets while sitting poolside in the backyard of this gorgeous 5 bedroom 3.5 bathroom home with theatre room, study, and outdoor kitchen located on the Peninsula at Clear Lake. Make yourself at home while staying in Houston for Super Bowl LI. Take an Uber for a short commute to NRG. While in town visit the famous Kemah Boardwalk which offers exquisite fresh gulf seafood, amusement rides, exciting night life, and beautiful views of Galveston Bay!",League City,https://www.airbnb.com/rooms/16899678?location=Baytown%2C%20TX


# Bonus Step: Make a nice visualization!

In [149]:
latitude=float(input('Enter latitude:'))
longitude=float(input('Enter longitude:'))
radius=float(input('Enter radius in km:'))
coord=[latitude, longitude]

Enter latitude:28.503068
Enter longitude:-99.447688
Enter radius in km:100


In [172]:
coord

[30.020138, -95.293996]

In [173]:
df_map = rentals_file.filter(items = ['latitude','longitude', 'url', 'average_rate_per_night'])

In [174]:
df_map.head(3)

Unnamed: 0,latitude,longitude,url,average_rate_per_night
0,30.020138,-95.293996,https://www.airbnb.com/rooms/18520444?location...,$27
1,29.503068,-98.447688,https://www.airbnb.com/rooms/17481455?location...,$149
2,29.829352,-95.081549,https://www.airbnb.com/rooms/16926307?location...,$59


In [175]:
df_map['latitude'][0]

30.0201379199512

In [176]:
#there are NA values of coord in df, so we use try
accomodations = []
for i in range(len(rentals_file)):
    try:
        acc_coord = [df_map.latitude[i], df_map.longitude[i]]
        dist = distance.distance(coord, acc_coord) #we want km values
        if dist < radius:
            accomodations.append(i)
    except:
        pass


In [179]:
[df_map.latitude[0], df_map.longitude[0]]

[30.0201379199512, -95.29399600425128]

In [156]:
inside_r = df_map.iloc[accomodations]

In [159]:
m = folium.Map(location=coord, zoom_start=10, tiles = "OpenStreetMap")

folium.Marker(
    location=coord,
    popup='Location',
    icon=folium.Icon(color='blue')
).add_to(m)

folium.Circle( 
    location=coord,
    radius=radius*1000,
    color='#6a5acd',
    fill=True,
    fill_color='#7a6dd0'
).add_to(m)

for acc in inside_r.T: 
    house_coord = inside_r['latitude'][acc], inside_r['longitude'][acc] #house coord
    folium.Marker(
    location=house_coord,
    

    
    popup=folium.Popup('<a href=' + inside_r['url'][acc] + '>'  + inside_r['average_rate_per_night'][acc] + ' </a>'), #airbnb page of house
    icon = folium.Icon(color='darkpurple', icon='home')
).add_to(m)
   
m.save("bonusMap.html")

m