# Airbnb Topic Modeling Summary

```python
import pandas as pd
import numpy as np
import glob

from gensim.models.ldamulticore import LdaMulticore
```



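```python
def display_results(results):
    """Print each topic's index followed by its top words.

    `results` is the list of (topic_id, topic_string) pairs returned by
    LdaMulticore.print_topics(), where each topic string looks like
    '0.012*"word" + 0.010*"other" + ...'. Splitting on '"' and taking
    every second piece keeps only the quoted words.
    """
    for index, topic in results:
        words = topic.split('"')[1::2]
        print(f"{index}: {', '.join(words)}")
```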
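`display_results` discards the per-word weights. When those weights matter, for example to see how strongly a name dominates a topic, a regular expression can recover them. This is a minimal sketch that assumes the default `print_topics()` string format `weight*"word" + ...`. `parse_topic` is a hypothetical helper, and the weights in the example are made up for illustration.

```python
import re

# Matches one weighted term such as 0.012*"ben" in a gensim topic string.
TERM_PATTERN = re.compile(r'([0-9.]+)\*"([^"]*)"')


def parse_topic(topic_string):
    """Return (word, weight) pairs from a print_topics()-style string."""
    return [(word, float(weight)) for weight, word in TERM_PATTERN.findall(topic_string)]


# Hypothetical (topic_id, topic_string) pair; the weights are invented,
# not taken from the models in this notebook.
example = (0, '0.012*"ben" + 0.010*"tower" + 0.008*"greg"')
index, topic = example
print(index, parse_topic(topic))
```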

# Full Review Tokens

## 50 Topics, 10 Words, 50 Passes

This model was trained on the full set of tokens from the Airbnb reviews. There do seem to be some useful topic clusters among the 50, but many of the top words are names of people or places, which are not relevant to this project, so we will likely try to remove them to get a more focused result.
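LDA is trained on bag-of-words input: each tokenized review becomes a list of (token_id, count) pairs, the shape gensim's `Dictionary.doc2bow` produces. The notebook does not show its training code, so the following is only a stdlib sketch of that conversion, assuming a gensim-style pipeline. It illustrates why, with the full token set, every person or place name becomes a feature the model can build topics around. `build_vocabulary` and `to_bow` are hypothetical helpers, and ids are assigned in order of first appearance, which need not match gensim's numbering.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to an integer id in order of first appearance."""
    vocab = {}
    for tokens in documents:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(tokens, vocab):
    """Return sorted (token_id, count) pairs, skipping tokens not in vocab."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


# Toy tokenized reviews, not taken from the Airbnb data.
reviews = [
    ["great", "place", "great", "host"],
    ["quiet", "place", "near", "city"],
]
vocab = build_vocabulary(reviews)
corpus = [to_bow(doc, vocab) for doc in reviews]
print(corpus)
```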

```python
# Load Model
ldamodel_full = LdaMulticore.load('../models/ldam_reviews_50topics_10words_50passes_full.model')

# Print Topics
full_review_results = ldamodel_full.print_topics(num_topics=50, num_words=10)

# Display Results
display_results(full_review_results)
```

```
0: ben, tower, greg, debbie, vista, siempre, nikki, venue, coit, cocina
1: und, die, sehr, ist, wir, der, war, man, mit, das
2: n't, place, would, room, night, bit, stay, noise, one, nice
3: day, back, garden, morning, night, sunset, lovely, loved, one, wonderful
4: city, place, quiet, great, perfect, neighborhood, space, spot, studio, stay
5: 10/10, charm, barbara, lady, face, chip, recommand, address, conforme, painted
6: touch, breakfast, coffee, snack, thoughtful, wine, even, morning, left, provided
7: great, gave, local, tip, recommendation, host, city, area, helpful, provided
8: really, enjoyed, stay, thank, much, hospitality, cat, appreciated, thanks, staying
9: san, francisco, fran, visit, perfect, trip, time, visiting, fransisco, explore
10: room, bathroom, private, bedroom, clean, kitchen, living, space, bed, shared
11: house, people, place, n't, meet, friendly, really, get, time, great
12: per, casa, joyce, con, molto, zona, non, muito, una, com
13: detail, michelle, gem, hi
```

# Non-Named Entities

## 50 Topics, 10 Words, 50 Passes

For this model I removed all named entities to reduce the noise from people and place names and focus on the feedback itself. At first glance, this seems to be the best option.
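A minimal sketch of the filtering step, assuming each token has already been tagged with an entity label (for example by an NER tagger such as spaCy, whose English models use labels like `PERSON` and `GPE`). The notebook does not show which labels were removed, so `NAME_LABELS` is an assumption, and `drop_named_entities` is a hypothetical helper. The labels in the toy review are written by hand.

```python
# Entity labels treated as "names"; this set is an assumption based on
# spaCy's English labels, not the notebook's actual preprocessing.
NAME_LABELS = {"PERSON", "GPE", "LOC", "FAC", "ORG", "NORP"}


def drop_named_entities(tagged_tokens, labels=NAME_LABELS):
    """Keep tokens whose entity label is empty or not in `labels`.

    `tagged_tokens` is a list of (token, entity_label) pairs, with ""
    for tokens that are not part of a named entity.
    """
    return [token for token, label in tagged_tokens if label not in labels]


# Toy tagged review; labels assigned by hand for illustration.
review = [("greg", "PERSON"), ("was", ""), ("a", ""), ("great", ""),
          ("host", ""), ("near", ""), ("coit", "FAC"), ("tower", "FAC")]
print(drop_named_entities(review))
# ['was', 'a', 'great', 'host', 'near']
```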

```python
# Load Model
ldamodel_non_ne = LdaMulticore.load('../models/ldam_reviews_50topics_10words_50passes_no_ner.model')

# Print Topics
non_ne_review_results = ldamodel_non_ne.print_topics(num_topics=50, num_words=10)

# Display Results
display_results(non_ne_review_results)
```

```
0: home, feel, felt, like, host, beautiful, stay, made, wonderful, comfortable
1: good, excellent, value, always, location, host, help, available, accommodation, new
2: parking, car, street, find, easy, spot, park, free, garage, rental
3: house, day, 's, one, amazing, full, kitchen, time, fun, meal
4: easy, public, access, transportation, get, close, transport, city, transit, around
5: make, sure, made, went, way, stay, everything, beyond, expectation, comfortable
6: ..., per, di, molto, il, non, ed, che, judy, ad
7: rent, og, unexpected, particular, waited, till, til, mine, delay, zone
8: back, time, come, next, stay, definitely, place, sf, would, visit
9: apartment, could, better, picture, n't, asked, even, spacious, location, beat
10: suite, square, union, pier, wharf, fisherman, 39, fall, gute, restaurants
11: noise, bit, night, issue, unit, little, door, hear, street, good
12: recommand, cuisine, chambres, mail, terrasse, temp, 5min, fonctionnel, tions, auto
13: communication, qui
```

# Non-Named Entities Plus Adjectives

## 50 Topics, 10 Words, 50 Passes

This model adds adjectives to the non-named-entity tokens to emphasize the descriptive parts of the reviews.
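The notebook does not show how the adjectives were added. One plausible reading is that named entities are dropped and adjectives are repeated so they carry more weight in the bag of words; the sketch below implements that reading only. It assumes tokens tagged with Universal POS tags (`ADJ` for adjectives) and entity labels, and `non_ne_plus_adjectives`, `NAME_LABELS`, and `adj_weight` are hypothetical names.

```python
# Assumed set of entity labels treated as names (hypothetical).
NAME_LABELS = {"PERSON", "GPE", "LOC", "FAC", "ORG", "NORP"}


def non_ne_plus_adjectives(tagged_tokens, adj_weight=2):
    """Drop named-entity tokens and repeat adjectives `adj_weight` times.

    `tagged_tokens` holds (token, pos, entity_label) triples using
    Universal POS tags ("ADJ" for adjectives) and "" for non-entities.
    """
    tokens = []
    for token, pos, label in tagged_tokens:
        if label in NAME_LABELS:
            continue  # skip people, places, and other names
        repeats = adj_weight if pos == "ADJ" else 1
        tokens.extend([token] * repeats)
    return tokens


# Toy tagged review; tags assigned by hand for illustration.
review = [("cozy", "ADJ", ""), ("room", "NOUN", ""), ("near", "ADP", ""),
          ("mission", "PROPN", "GPE"), ("quiet", "ADJ", "")]
print(non_ne_plus_adjectives(review))
# ['cozy', 'cozy', 'room', 'near', 'quiet', 'quiet']
```

Repeating a token doubles its count in the review's bag of words, which is one simple way to push descriptive words up the topic word lists.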

```python
# Load LDA
ldam_no_ne_plus_adj = LdaMulticore.load('../models/ldam_reviews_50topics_10words_50passes_no_ner_plus_adj.model')

# Print Topics
results_non_ne_plus_adj = ldam_no_ne_plus_adj.print_topics(num_topics=50, num_words=10)

# Display Results
display_results(results_non_ne_plus_adj)
```

```
0: short, main, walk, cosy, handy, professional, brilliant, away, ride, attraction
1: light, noise, bad, upstairs, noisy, loud, night, hear, bit, natural
2: small, new, incredible, roomy, parent, unbeatable, mary, smart, solid, brand
3: lovely, welcoming, warm, home, interesting, charming, host, open, house, inviting
4: cool, whole, full, mission, fun, enough, place, district, really, house
5: beautiful, wonderful, view, host, home, gorgeous, stay, gracious, house, city
6: public, close, happy, transportation, accessible, transport, restaurant, calm, city, easily
7: kitchen, bed, extra, comfy, room, bathroom, space, well, coffee, towel
8: little, responsive, cozy, communicative, cheap, question, knowledgeable, space, host, spot
9: und, die, ist, sehr, wir, war, der, mit, man, auch
10: rent, i, wi-fi, och, var, till, dollar, med, det, som
11: easy, check-in, check, access, get, made, communication, location, communicate, place
12: attentive, muy, que, una, con, casa, para, un, est, n
13
```

# Adjectives

## 50 Topics, 5 Words, 50 Passes

Of all the part-of-speech token sets that were extracted and modeled, the adjective tokens were the only ones that produced useful topics. Notably, compared with every other topic model, the adjective model surfaced cleanliness most clearly as a topic. It seems that the best approach would be to remove named entities and emphasize adjectives.
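The adjective, noun, and verb token sets differ only in which part-of-speech tag is kept. A minimal sketch, assuming tokens already tagged with Universal POS tags (`ADJ`, `NOUN`, `VERB`, ...); `tokens_with_pos` is a hypothetical helper and the tags in the toy review are assigned by hand. It shows why the adjective set isolates descriptive words such as "clean": everything else in the sentence is discarded.

```python
def tokens_with_pos(tagged_tokens, wanted_pos):
    """Return tokens whose part-of-speech tag is in `wanted_pos`.

    `tagged_tokens` is a list of (token, pos) pairs; the tags are assumed
    to follow the Universal POS scheme (ADJ, NOUN, PROPN, VERB, ...).
    """
    wanted = set(wanted_pos)
    return [token for token, pos in tagged_tokens if pos in wanted]


# Toy tagged review; tags assigned by hand for illustration.
review = [("the", "DET"), ("room", "NOUN"), ("was", "AUX"), ("clean", "ADJ"),
          ("and", "CCONJ"), ("quiet", "ADJ"), ("we", "PRON"), ("loved", "VERB"),
          ("it", "PRON")]
print(tokens_with_pos(review, {"ADJ"}))   # ['clean', 'quiet']
print(tokens_with_pos(review, {"NOUN"}))  # ['room']
print(tokens_with_pos(review, {"VERB"}))  # ['loved']
```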

```python
# Load Model
ldamodel_adj1 = LdaMulticore.load('../models/ldam_reviews_50topics_5words_50passes_adjectives.model')

# Print Topics
adj1_review_results = ldamodel_adj1.print_topics(num_topics=50, num_words=5)

# Display Results
display_results(adj1_review_results)
```

```
0: able, cheap, tricky, living, active
1: cool, true, communal, interested, o
2: nice, clean, comfortable, small, few
3: 10-15, smart, attractive, lively, upper
4: large, comfy, second, open, different
5: awesome, ok, exceptional, handy, general
6: est, un, nous, une, n
7: responsive, great, clean, helpful, comfortable
8: due, difficult, una, smooth, es
9: stylish, clean, soft, reasonable, comfortable
10: super, amazing, sweet, clean, adorable
11: helpful, friendly, great, clean, welcoming
12: happy, busy, front, fast, expensive
13: lovely, such, simple, basic, thorough
14: accessible, communicative, walkable, ideal, affordable
15: light, advertised, automated, functional, furnished
16: cozy, peaceful, clean, accommodating, tidy
17: great, clean, comfortable, many, few
18: beautiful, enough, comfortable, delightful, clean
19: check-in, flexible, prompt, less, muni
20: quiet, safe, close, clean, comfortable
21: private, more, own, clean, comfortable
22: other, most, only, same, special
```


# Nouns

Nouns produced the least relevant topics so far: no topic centers on accuracy or cleanliness, and "clean" and "cleanliness" appear only as lower-ranked words in topics 1 and 7. Using only nouns as tokens also amplified the problem of people and place names and gave little detail about the actual feel or message of the reviews.
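Claims such as "no topic related to cleanliness" can be checked directly against the printed topic lists. A minimal sketch, assuming the `index: word, word, ...` lines that `display_results` prints; `topics_mentioning` is a hypothetical helper, and the three topic lines are copied verbatim from the noun model output below.

```python
def topics_mentioning(topic_lines, keywords):
    """Return indices of topics whose word list contains any keyword.

    `topic_lines` are strings in the "index: word, word, ..." format
    printed by display_results().
    """
    hits = []
    for line in topic_lines:
        index, _, words = line.partition(": ")
        if set(keywords) & {w.strip() for w in words.split(",")}:
            hits.append(int(index))
    return hits


# Three of the noun topics from the output below, copied verbatim.
noun_topics = [
    "1: place, location, host, sf, stay, everything, clean, time, nice, neighborhood",
    "2: beach, square, union, wharf, building, location, fisherman, car, minute, cable",
    "7: shower, water, bed, review, thing, heater, towel, sheet, cleanliness, lock",
]
print(topics_mentioning(noun_topics, {"clean", "cleanliness"}))  # [1, 7]
```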

```python
# Load Model
ldamodel_nouns = LdaMulticore.load('../models/ldam_reviews_50topics_10words_50passes_nouns.model')

# Print Topics
noun_review_results = ldamodel_nouns.print_topics(num_topics=50, num_words=10)

# Display Results
display_results(noun_review_results)
```

```
0: apartment, stay, everything, host, location, time, sf, thanks, neighborhood, question
1: place, location, host, sf, stay, everything, clean, time, nice, neighborhood
2: beach, square, union, wharf, building, location, fisherman, car, minute, cable
3: mission, district, castro, heart, park, location, brian, street, dolores, distance
4: space, view, city, hill, neighborhood, host, deck, patio, top, plenty
5: s, tr, le, la, et, bien, d, dans, de, avon
6: photo, mike, flat, amy, tony, communicator, email, thanks, pete, fast
7: shower, water, bed, review, thing, heater, towel, sheet, cleanliness, lock
8: room, bathroom, bed, bedroom, kitchen, living, space, host, area, location
9: studio, work, valley, noe, week, conference, center, brand, hayes, tom
10: unit, check-in, flight, door, luggage, breeze, hour, bag, stair, key
11: michael, t, response, process, question, star, awesome, wa, booking, quick
12: description, will, %, sara, accommodating, wait, superb, self, equipment, china
13: h
```

# Verbs

## 50 Topics, 10 Words, 50 Passes

Verbs were the least useful token set. Without surrounding context, the topics defined by the model showed no real trends.

```python
# Load Model
ldamodel_verb1 = LdaMulticore.load('../models/ldam_reviews_50topics_10words_50passes_verbs.model')

# Print Topics
verb1_review_results = ldamodel_verb1.print_topics(num_topics=50, num_words=10)

# Display Results
display_results(verb1_review_results)
```

```
0: welcome, explain, anticipate, unwind, concern, straight, comfort, deserve, crawl, would
1: help, share, describe, invite, play, sightsee, read, serve, introduce, noise
2: enter, mind, block, respect, disturb, become, warn, label, separate, hand
3: check, visit, meet, spend, forget, reccomend, mesmerize, be, transportation.., aa
4: realize, reserve, bottle, await, spark, recomended, revisit, responsive, overprice, always
5: think, experience, sparkle, save, understand, bake, outfit, airport, s., pop
6: din, breathtaking, w, pull, crowd, microwave, name, cool, spoil, tourist
7: do, meet, want, picture, spend, mean, require, interact, tire, assure
8: set, exceed, chat, pass, tour, def, transport, complain, borrow, e
9: book, surprise, attach, ve, depict, stream, soak, recomend, surpass, combine
10: be, have, expect, appreciate, spend, surround, bus, face, arrange, interest
11: prepare, hide, wash, des, die, med, f, top, uns, et
12: find, park, rent, drive, size, worry, land, limit, man
```