## Overview

This notebook shows how to create and query a table or DataFrame from a file that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html), the Databricks File System, lets you store data for querying inside Databricks. The notebook assumes that the file you want to read is already in DBFS.

This notebook is written in **Python**, so the default cell type is Python. You can use a different language in a cell with the `%LANGUAGE` syntax; Python, Scala, SQL, and R are all supported.

In [0]:
# File location and type
file_location = "/FileStore/tables/airbnb_clean.csv"
file_type = "csv"
 
# CSV options
infer_schema = "true"
first_row_is_header = "true"
delimiter = ","
 
# The applied options are for CSV files. For other file types, these will be ignored.
airbnb_clean = spark.read.format(file_type) \
  .option("escape",';') \
  .option("inferSchema", infer_schema) \
  .option("multiLine","true") \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)
 
display(airbnb_clean)

listing_id,Name,Host ID,Host Name,Host Response Rate,Host Is Superhost,Host total listings count,City,Neighbourhood cleansed,State,Country,Property type,Room type,Accommodates,Bathrooms,Bedrooms,Amenities,Price,Number of reviews,Review Scores Rating,Review Scores Accuracy,Review Scores Cleanliness,Review Scores Checkin,Review Scores Communication,Review Scores Location,Review Scores Value,Reviews per month,id,date,reviewer_id,reviewer_name,comments
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,1107.0,2009-03-27,9193.0,Holly,Stephanie's offered all the most important things: a warm welcome into a comfortable home; a comfortable bed in a quiet room; fresh & clean towels & blankets; and easy access to Manhattan. Finding myself travelling to NYC in the future I feel I already have a open invitation to make home away from home through Stephanie's generousity.
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,7350.0,2009-08-13,26718.0,Greg,awesome couldn't have been better.
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,137500.0,2010-11-12,185050.0,Gerry,"""We stayed at the 111th Street apartment with Stephanie and her family. As an alternative to staying in an """"Overpriced"""" New York tourist hotel"
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,235192.0,2011-04-22,456184.0,Nicola,"""Very conveniently located just North of Central Park,out of the """" hussle and bussle """" and affordable"
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,244065.0,2011-04-29,351855.0,Leonella,"We had a great stay with Stephanie and her family : very welcome, good times shared together....The location is very well situated in a quiet neighborhood, close to public transport and Central park A strongly advise !! Bises à Spiderman :D Leonella et Karine"
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,564350.0,2011-09-25,363433.0,Tom,"We stayed at the Chez Chic budget room for a week. Stephanie was a great host, she respected our privacy and she provided us with information about the neighbourhood. The room is quite spacious en very clean. The bathroom is shared with other guests, but although the other room was constantly occupied, we practically never had to wait for the bathroom to be free. The wireless internet worked just fine, just as the television with digital channels. There are two subway stations at approximately 4-5 min walking distance. Lines 2,3, B and C depart form these stations. We felt safe in the neighourhood. Overall, we would definately recommend to stay at this apartment!"
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,2230438.0,2012-09-08,3150890.0,Bruno,"We had such an amazing experience staying over stephanie's place. We didn't have a specific time to arrive because we came from abroad, but she was very flexible with this. Arriving there, she was very kind by giving us a map from the subway and all the restaurants around and giving us tips about New York. She showed us the house and made us very comfortable staying there. We could use the kitchen, the fridge and the phone. We had wi-fi that worked normally (thank God!!!) The room was perfectly clean and organized, with AC, TV, towels, clean sheets, fan, etc. The bathroom was very clean too. Not to mention that there are two subway stations less than a block away, which makes everything so easy. We bought a 7-day pass and in 10 minutes from the place, we were downtown. In a nutshell, I don't have anything bad to say about the place at all. Actually, the opposite. I deeply recommend staying there and if I have to go to NYC again, that's the place I will stay."
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,15404006.0,2014-07-08,14072657.0,Lukas,"Stephanie, her husband and the kids were great. We really enjoyed our stay and would recommend it to anyone who plans to go to New York. The room, the bathroom and the kitchen were exactely like in the pictures. The information that Stephanie sent us beforehand made it easy to find the appartement and were a great guide to our first days with all the recommendations, a Subway map and much more! We would defenitely come back and stay there again. It's a great deal for a good prize!"
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,15899473.0,2014-07-17,15394739.0,Vinícius,The reservation was canceled the day before arrival. This is an automated posting.
2515,Sunny Private Room,16286162,Pat,1.0,False,4.0,Bronx,Allerton,NY,United States,House,Private room,1.0,1.0,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout,43.0,66.0,96.0,10.0,9.0,10.0,10.0,9.0,10.0,1.77,29622567.0,2015-04-10,29112884.0,Jessamyn,Stephanie had an emergency situation and couldn't host us at the last minute so we never met her. She made arrangements for us to stay at a budget hotel near by. The hotel was less than ideal but I was grateful to not have to make new arangments myself with one days notice.


In [0]:
# Drop every row that has a null in any column
airbnb = airbnb_clean.dropna()

In [0]:
# Tokenize the text
from pyspark.ml.feature import RegexTokenizer
# gaps=False: the pattern matches the tokens themselves, not the separators between them
regexTokenizer = RegexTokenizer(gaps = False, pattern = r'\w+', inputCol = 'Amenities', outputCol = 'amenities_token')
amenities_token = regexTokenizer.transform(airbnb)
amenities_token.show(3)
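
To make the tokenization step concrete, here is a minimal plain-Python sketch of roughly what `RegexTokenizer(gaps=False, pattern=r'\w+')` does to one `Amenities` string. The helper name `tokenize_amenities` and the sample string are illustrative assumptions, and Python's `\w` is only an approximation of the Java regex Spark uses (the two can differ on non-ASCII text).

```python
import re

def tokenize_amenities(text):
    # RegexTokenizer lowercases its input by default (toLowercase=True),
    # so the sketch lowercases before matching.
    # With gaps=False every match of the pattern becomes one token.
    return re.findall(r"\w+", text.lower())

# Illustrative sample built from amenities that appear in the dataset
sample = "Cable TV;Wireless Internet;Buzzer/wireless intercom"
tokens = tokenize_amenities(sample)
print(tokens)
```

Note that multi-word amenities such as "Cable TV" are split into separate tokens, so the later steps work on individual words rather than on whole amenity names.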

In [0]:
# Remove stopwords
from pyspark.ml.feature import StopWordsRemover
swr = StopWordsRemover(inputCol = 'amenities_token', outputCol = 'amenities_sw_removed')
amenities_swr = swr.transform(amenities_token)
amenities_swr.show(3)
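
As a rough illustration of the stop-word step, the sketch below filters a token list against a small set. `STOP_WORDS` here is a hypothetical, tiny subset chosen for the example; `StopWordsRemover` ships its own, much longer default English list, and the helper name `remove_stop_words` is also an assumption.

```python
# Hypothetical, tiny stop-word set for illustration only;
# StopWordsRemover uses its own default English list.
STOP_WORDS = {"on", "and", "in", "or", "before"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    # Matching is case-insensitive, mirroring StopWordsRemover's
    # default caseSensitive=False.
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["free", "parking", "on", "premises", "extra", "pillows", "and", "blankets"]
filtered = remove_stop_words(tokens)
print(filtered)
```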

In [0]:
# Word Term Frequency
from pyspark.ml.feature import CountVectorizer
cv = CountVectorizer(inputCol="amenities_sw_removed", outputCol="tf")
cv_model = cv.fit(amenities_swr)
amenities_cv = cv_model.transform(amenities_swr)
amenities_cv.show(3)
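
The term-frequency step can be sketched without Spark. The example below builds a vocabulary ordered by corpus-wide count and turns each token list into a count vector, which is the idea behind `CountVectorizer`. The toy documents, the helper `to_count_vector`, and the alphabetical tie-break are assumptions made for the sketch; Spark stores the result as a sparse vector instead of a list.

```python
from collections import Counter

# Toy token lists standing in for the 'amenities_sw_removed' column
docs = [
    ["wireless", "internet", "heating"],
    ["kitchen", "heating"],
    ["heating", "kitchen", "wireless", "internet", "internet"],
]

# Vocabulary ordered by total count across the corpus, most frequent first.
# Ties are broken alphabetically here only to keep the sketch deterministic.
corpus_counts = Counter(token for doc in docs for token in doc)
vocab = sorted(corpus_counts, key=lambda term: (-corpus_counts[term], term))
index = {term: i for i, term in enumerate(vocab)}

def to_count_vector(doc):
    # One slot per vocabulary term, holding how often it occurs in the document
    vec = [0] * len(vocab)
    for token in doc:
        vec[index[token]] += 1
    return vec

tf = [to_count_vector(doc) for doc in docs]
print(vocab)
print(tf)
```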

In [0]:
# TF-IDF
from pyspark.ml.feature import IDF
idf = IDF(inputCol="tf", outputCol="features")
idf_model = idf.fit(amenities_cv)
amenities_tfidf = idf_model.transform(amenities_cv)
amenities_tfidf.show(3)
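
For intuition about the weights in the `features` column, the sketch below recomputes TF-IDF by hand using the smoothed formula documented for Spark MLlib, `idf(t) = log((N + 1) / (df(t) + 1))`, where `N` is the number of documents and `df(t)` is the number of documents containing term `t`. The toy documents and the helper `tfidf` are assumptions for illustration.

```python
import math

# Toy token lists; "heating" appears in every document
docs = [
    ["heating", "kitchen"],
    ["heating", "internet"],
    ["heating", "pool"],
]
n_docs = len(docs)

# Document frequency: in how many documents each term occurs
doc_freq = {}
for doc in docs:
    for term in set(doc):
        doc_freq[term] = doc_freq.get(term, 0) + 1

# Smoothed inverse document frequency
idf = {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}

def tfidf(doc):
    # Raw term count multiplied by the term's IDF weight
    counts = {}
    for term in doc:
        counts[term] = counts.get(term, 0) + 1
    return {term: count * idf[term] for term, count in counts.items()}

print(idf)
print(tfidf(["heating", "kitchen", "kitchen"]))
```

A term that occurs in every document, like "heating" here, gets an IDF of zero, so very common amenities contribute nothing to the weighted vector.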

In [0]:
# Index the rating score as a label column (preparation for predicting it; no model is trained here)
from pyspark.ml.feature import StringIndexer
stringIdx = StringIndexer(inputCol="Review Scores Rating", outputCol="label")
final = stringIdx.fit(amenities_tfidf).transform(amenities_tfidf)
final.show(3)
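
It may help to see what the `label` column contains. With its default ordering, `StringIndexer` treats each distinct value as a category and assigns index 0 to the most frequent one, so the label is a category index and not the rating itself. The sketch below mimics that on a made-up list of ratings; the sample values and the alphabetical tie-break are assumptions.

```python
from collections import Counter

# Made-up ratings standing in for the 'Review Scores Rating' column
ratings = [96.0, 96.0, 100.0, 96.0, 100.0, 80.0]

# Numeric input is handled as strings; categories are ordered by
# descending frequency (ties broken alphabetically in this sketch).
counts = Counter(str(r) for r in ratings)
ordered = sorted(counts, key=lambda label: (-counts[label], label))
label_index = {label: float(i) for i, label in enumerate(ordered)}

labels = [label_index[str(r)] for r in ratings]
print(label_index)
print(labels)
```

Because the indices follow frequency, a label of 0.0 here means "the most common rating", which matters if the column is later fed to a regression model instead of a classifier.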

In [0]:
# Fit & Train Word2Vec
from pyspark.ml.feature import Word2Vec
 
# Create an average word vector for each document
word2vec = Word2Vec(vectorSize = 100, minCount = 5, inputCol = 'amenities_sw_removed', outputCol = 'result')
model = word2vec.fit(amenities_swr)
amenities_w2v = model.transform(amenities_swr)
amenities_w2v.show(3)
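
The "average word vector" idea can be sketched in a few lines. The word vectors below are made up and two-dimensional, and `average_vector` is a hypothetical helper. The sketch divides by the total number of tokens, which is my understanding of how Spark's `Word2VecModel.transform` averages; treat that detail as an assumption. It does not affect cosine similarity, which ignores vector length.

```python
def average_vector(tokens, word_vectors, size):
    # Collect the vectors of tokens the model knows about
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not tokens or not known:
        # No usable tokens: fall back to the zero vector
        return [0.0] * size
    # Sum per dimension, then divide by the total token count
    # (assumption: unknown tokens still count toward the denominator)
    return [sum(column) / len(tokens) for column in zip(*known)]

# Made-up 2-dimensional word vectors for illustration
word_vectors = {"heating": [1.0, 0.0], "kitchen": [0.0, 1.0]}

print(average_vector(["heating", "kitchen"], word_vectors, 2))
print(average_vector(["heating", "unknown"], word_vectors, 2))
print(average_vector([], word_vectors, 2))
```

Tokens that occur fewer than `minCount` times in the corpus are not in the model's vocabulary, so a listing made up only of rare tokens ends up with a zero vector.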

In [0]:
# Find listings similar to a reference listing
listing_id = amenities_w2v.select('listing_id').take(1)[0][0]
input_vec = amenities_w2v.select('result').filter(amenities_w2v['listing_id'] == listing_id).collect()[0][0]   
 
# Calculate cosine similarity between two vectors 
import numpy as np
from pyspark.sql.functions import udf
@udf("float")
def cossim_udf(v1): 
    v2 = input_vec
    similarity = np.dot(v1, v2) / np.sqrt(np.dot(v1, v1)) / np.sqrt(np.dot(v2, v2)) 
    return float(similarity)
similarity = amenities_w2v.select('listing_id', cossim_udf('result').alias("similarity"), "Amenities")
similarity = similarity.orderBy("similarity", ascending = False)
display(similarity)

listing_id,similarity,Amenities
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
2515,1.0,Cable TV;Internet;Wireless Internet;Kitchen;Free parking on premises;Buzzer/wireless intercom;Heating;Smoke detector;Carbon monoxide detector;First aid kit;Safety card;Fire extinguisher;Essentials;Shampoo;Lock on bedroom door;Hangers;Hair dryer;Iron;Laptop friendly workspace;translation missing: en.hosting_amenity_49;translation missing: en.hosting_amenity_50;Private living room;Private entrance;Hot water;Bed linens;Extra pillows and blankets;Microwave;Refrigerator;Dishes and silverware;Cooking basics;Stove;Garden or backyard;Luggage dropoff allowed;Long term stays allowed;Cleaning before checkout
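
The similarity used in the cell above is the cosine of the angle between two vectors: their dot product divided by the product of their norms. Below is a minimal stand-alone sketch in plain Python. The function name `cosine_similarity` and the choice to return `None` for a zero vector are assumptions; the notebook's UDF has no such guard and would produce NaN in that case.

```python
import math

def cosine_similarity(v1, v2):
    # Dot product and Euclidean norms of the two vectors
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Cosine similarity is undefined for a zero vector
        return None
    return dot / (norm1 * norm2)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine_similarity([0.0, 0.0], [1.0, 0.0]))  # undefined
```

Rows with identical amenities produce identical vectors and therefore a similarity of 1.0, which is why the reference listing appears repeatedly at the top of the output when the data has one row per review.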


In [0]:
# Test similarity between words
synonyms = model.findSynonyms('parking', 5)
synonyms.show(5) 

In [0]:
# Recommend listings based on a keyword
key_word = "heating"
docvecs = amenities_w2v
x = spark.createDataFrame([('newlistingid', key_word)]).\
    withColumnRenamed('_1', 'listing_id').\
    withColumnRenamed('_2', 'Amenities')
x.show()
x_token = regexTokenizer.transform(x)
x_swr = swr.transform(x_token)
input_vec = model.transform(x_swr)
input_vec = input_vec.select('result').collect()[0][0]
 
# Calculate cosine similarity between two vectors 
from pyspark.sql.functions import udf
@udf("float")
def cossim_udf(v1): 
    v2 = input_vec
    similarity = np.dot(v1, v2) / np.sqrt(np.dot(v1, v1)) / np.sqrt(np.dot(v2, v2)) 
    return float(similarity)
similarity2 = amenities_w2v.select('listing_id', cossim_udf('result').alias("similarity"), 'Amenities').distinct()
similarity2 = similarity2.orderBy("similarity", ascending = False)
display(similarity2)

listing_id,similarity,Amenities
846799,0.68240005,Internet;Kitchen;Heating
17779504,0.6786232,Wireless Internet;Heating
245366,0.6786232,Wireless Internet;Heating
16661916,0.67059827,Kitchen;Heating;Essentials
2677677,0.6684881,Air conditioning;Heating;Family/kid friendly;Essentials
2107939,0.6595231,Air conditioning;Heating;Smoke detector;Essentials
7512294,0.6506098,TV;Kitchen;Heating;Essentials;Shampoo
828540,0.6491058,Wireless Internet;Kitchen;Heating;Washer;Dryer
15422844,0.64836335,TV;Wireless Internet;Kitchen;Elevator in building;Heating;Family/kid friendly;Smoke detector;Essentials;Shampoo
1451415,0.64800614,TV;Wireless Internet;Kitchen;Heating;Family/kid friendly;Washer;Dryer;Essentials;Shampoo


In [0]:
# Recommend listings based on a keyword
key_word = "parking"
docvecs = amenities_w2v
x = spark.createDataFrame([('newlistingid', key_word)]).\
    withColumnRenamed('_1', 'listing_id').\
    withColumnRenamed('_2', 'Amenities')
x.show()
x_token = regexTokenizer.transform(x)
x_swr = swr.transform(x_token)
input_vec = model.transform(x_swr)
input_vec = input_vec.select('result').collect()[0][0]
 
# Calculate cosine similarity between two vectors 
from pyspark.sql.functions import udf
@udf("float")
def cossim_udf(v1): 
    v2 = input_vec
    similarity = np.dot(v1, v2) / np.sqrt(np.dot(v1, v1)) / np.sqrt(np.dot(v2, v2)) 
    return float(similarity)
similarity2 = amenities_w2v.select('listing_id',cossim_udf('result').alias("similarity"), 'Amenities').distinct()
similarity2 = similarity2.orderBy("similarity", ascending = False)
display(similarity2)


listing_id,similarity,Amenities
6005217,0.7339025,Free Parking on Premises;Heating;Dryer
9709076,0.7291008,Wireless Internet;Free parking on premises
8521307,0.7048495,Wireless Internet;Pool;Kitchen;Free parking on premises;Heating;Dryer
11568447,0.70141625,Kitchen;Free parking on premises;Heating;Essentials
1713741,0.69261247,Wireless Internet;Kitchen;Free parking on premises;Pets allowed;Family/kid friendly
1267991,0.6904249,Wireless Internet;Free parking on premises;Heating;Essentials
2337605,0.68917555,TV;Kitchen;Free parking on premises;Heating
8736623,0.6786475,Wireless Internet;Free parking on premises;Essentials;Shampoo
14418483,0.6783424,Kitchen;Free parking on premises;Essentials;Shampoo
912129,0.67810655,Wireless Internet;Pool;Free parking on premises;Hot tub;Washer;Dryer;Shampoo
