Date: 02.11.21

This script matches the existing review-response dataset with the corresponding TripAdvisor URLs.

Scripts used to rescue URLs from SQLite DB files (extracted from original XMLs):
- data_prep/from_SQLite/rescue_review_urls.sh
- data_prep/from_SQLite/merge_review_response_pairs_with_url_info.sh


In [1]:
import random
from tqdm.notebook import trange, tqdm
import pandas as pd
pd.set_option('display.max_columns', None)

In [3]:
lang = 'en'
df_new = pd.read_pickle(f'/srv/scratch6/kew/trip_data_to_map/{lang}_rrgen.pkl')
if lang == 'en':
    df_old = pd.read_pickle(f'/mnt/storage/clwork/projects/readvisor/RESPONSE_GENERATION/hospo_respo_datasets/{lang}/{lang}_rrgen.alphasys.scored.rg.pkl')
else:
    df_old = pd.read_pickle(f'/mnt/storage/clwork/projects/readvisor/RESPONSE_GENERATION/hospo_respo_datasets/{lang}/{lang}_rrgen.alphasys.scored.pkl')

print(f'new dataframe contains {len(df_new)} entries. Columns:')
print(df_new.columns)
print(f'old dataframe contains {len(df_old)} entries. Columns:')
print(df_old.columns)


new dataframe contains 3327603 entries. Columns:
Index(['reviewid', 'grpid', 'domain', 'platformid_rev', 'rating', 'url',
       'platformrating', 'review_author', 'response_author', 'review_clean',
       'review_pp', 'review_lang', 'response_clean', 'response_pp',
       'response_lang', 'sentiment', 'source', 'db_internal_id',
       'establishment', 'trip_id', 'trip_url', 'country', 'xml_file_name',
       'city', 'trip_city_id', 'trip_review_id', 'trip_review_url',
       'authoridplatform', 'authorurlplatform'],
      dtype='object')
old dataframe contains 3327603 entries. Columns:
Index(['reviewid', 'grpid', 'domain', 'platformid_rev', 'rating', 'url',
       'platformrating', 'review_author', 'response_author', 'review_clean',
       'review_pp', 'review_lang', 'response_clean', 'response_pp',
       'response_lang', 'sentiment', 'source', 'db_internal_id',
       'establishment', 'trip_id', 'trip_url', 'country', 'split',
       'sentiment_alpha_system_1', 'score:review_respon

In [6]:
df_new = df_new[
        ['xml_file_name', # new
         'city', # new
         'trip_city_id', # new
         'trip_review_id', # new
         'trip_review_url', # new
         'authoridplatform', # new
         'authorurlplatform', # new
         'db_internal_id', # should match
         'establishment', # should match
         'review_clean', # should match
         'response_clean', # should match
         'review_author', # should match
         'response_author', # should match
         'source' # should match
        ]
    ]

print(df_new.columns)
print(df_old.columns)

Index(['xml_file_name', 'city', 'trip_city_id', 'trip_review_id',
       'trip_review_url', 'authoridplatform', 'authorurlplatform',
       'db_internal_id', 'establishment', 'review_clean', 'response_clean',
       'review_author', 'response_author', 'source'],
      dtype='object')
Index(['reviewid', 'grpid', 'domain', 'platformid_rev', 'rating', 'url',
       'platformrating', 'review_author', 'response_author', 'review_clean',
       'review_pp', 'review_lang', 'response_clean', 'response_pp',
       'response_lang', 'sentiment', 'source', 'db_internal_id',
       'establishment', 'trip_id', 'trip_url', 'country', 'split',
       'sentiment_alpha_system_1', 'score:review_response_length_ratio',
       'score:response_sentence_length', 'score:genericness_semantic_avg',
       'score:genericness_length_ratio', 'score:review_response_wmd',
       'response_pp_rg', 'split_imrg_compat'],
      dtype='object')
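Before merging on the "should match" columns, it can help to confirm that their combination identifies each row uniquely, since a 1:1 merge depends on that. The following is a minimal sketch on made-up toy data; the helper name `count_duplicate_keys` and the toy values are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Columns that are expected to agree between the old and new dataframes.
KEY_COLS = ['db_internal_id', 'establishment', 'review_clean',
            'response_clean', 'review_author', 'response_author', 'source']

def count_duplicate_keys(df, key_cols=KEY_COLS):
    """Return the number of rows whose key combination occurs more than once."""
    return int(df.duplicated(subset=key_cols, keep=False).sum())

# Toy stand-in for df_old / df_new (values are invented for illustration).
toy = pd.DataFrame({
    'db_internal_id': [1, 2, 2],
    'establishment': ['A', 'B', 'B'],
    'review_clean': ['good', 'bad', 'bad'],
    'response_clean': ['thanks', 'sorry', 'sorry'],
    'review_author': ['x', 'y', 'y'],
    'response_author': ['m', 'n', 'n'],
    'source': ['trip', 'trip', 'trip'],
})

print(count_duplicate_keys(toy))  # rows 2 and 3 share the same key
```

A non-zero count on either dataframe would suggest that the key columns need extending, or the duplicates need resolving, before a 1:1 merge can succeed.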


In [7]:
# Left-join the new URL columns onto the old dataframe.
# validate='1:1' raises a MergeError if the key combination is not unique on both sides.
df_m = pd.merge(left=df_old, right=df_new, how='left',
                on=[
                    'db_internal_id', # should match
                    'establishment', # should match
                    'review_clean', # should match
                    'response_clean', # should match
                    'review_author', # should match
                    'response_author', # should match
                    'source' # should match
                    ],
                validate='1:1')

print(len(df_m))
print(df_old['review_clean'].equals(df_m['review_clean']))
print(df_old['response_clean'].equals(df_m['response_clean']))

3327603
True
True
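The checks above confirm that the left merge preserved the row count and order of the old dataframe. They do not show how many rows actually found a partner in the new dataframe, because unmatched rows are kept with missing values in the new columns. One way to measure that is the `indicator` option of `pd.merge`. This is a sketch on invented toy data; the helper `merge_with_coverage` and the example.org URLs are hypothetical.

```python
import pandas as pd

def merge_with_coverage(old, new, key_cols):
    """Left-merge `new` onto `old` and count matched and unmatched rows."""
    merged = pd.merge(old, new, how='left', on=key_cols,
                      validate='1:1', indicator=True)
    counts = merged['_merge'].value_counts()
    matched = int(counts.get('both', 0))         # key found in both frames
    unmatched = int(counts.get('left_only', 0))  # key only in the old frame
    return merged.drop(columns='_merge'), matched, unmatched

# Toy data: the third row of `old` has no counterpart in `new`.
old = pd.DataFrame({'db_internal_id': [1, 2, 3],
                    'review_clean': ['a', 'b', 'c']})
new = pd.DataFrame({'db_internal_id': [1, 2],
                    'review_clean': ['a', 'b'],
                    'trip_review_url': ['https://example.org/r1',
                                        'https://example.org/r2']})

merged, matched, unmatched = merge_with_coverage(
    old, new, ['db_internal_id', 'review_clean'])
print(matched, unmatched)
```

On the real data, counting `df_m['trip_review_url'].isna()` would give a comparable picture, assuming that column has no missing values in the new dataframe itself.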


In [9]:
df_m.columns

Index(['reviewid', 'grpid', 'domain', 'platformid_rev', 'rating', 'url',
       'platformrating', 'review_author', 'response_author', 'review_clean',
       'review_pp', 'review_lang', 'response_clean', 'response_pp',
       'response_lang', 'sentiment', 'source', 'db_internal_id',
       'establishment', 'trip_id', 'trip_url', 'country', 'split',
       'sentiment_alpha_system_1', 'score:review_response_length_ratio',
       'score:response_sentence_length', 'score:genericness_semantic_avg',
       'score:genericness_length_ratio', 'score:review_response_wmd',
       'response_pp_rg', 'split_imrg_compat', 'xml_file_name', 'city',
       'trip_city_id', 'trip_review_id', 'trip_review_url', 'authoridplatform',
       'authorurlplatform'],
      dtype='object')

In [10]:
df_m.to_pickle(f'/mnt/storage/clwork/projects/readvisor/RESPONSE_GENERATION/hospo_respo_datasets/{lang}/{lang}_rrgen.urls.pkl')


In [8]:
# Sanity check: print randomly sampled reviews next to their matched URLs.
idx = random.sample(range(len(df_m)), 1000)

for i in idx:
    row = df_m.iloc[i]
    print(i, row['review_clean'], row['trip_review_url'])

1079279 Valentines stay ---SEP--- We chose this location as it’s a short walk away from one of our favourite restaurants- peace and loaf. Parking was simple, check in was quick. We booked a superior room.. The room was quite large, the bed was huge. The only issue I had was when we first walked in there appeared to be a sewer problem..we opened the windows for a short while and the smell disappeared. Nice touch to have kit kats and free drink if your a member. Would definitely recommend and stay again https://www.tripadvisor.co.uk/ShowUserReviews-g186394-d1974359-r657010061-Holiday_Inn_Newcastle_Jesmond-Newcastle_upon_Tyne_Tyne_and_Wear_England.html
1276511 OMG!! ---SEP--- This place is the best steakhouse I have been to! And Maria, the waitress was so nice :) Everything was just perfect! https://www.tripadvisor.co.uk/ShowUserReviews-g186338-d952591-r723829704-Gaucho_Piccadilly-London_England.html
1954502 Pleasantly surprised ---SEP--- We went for drinks during happy hour and had El Pa

2416682 Lovely hotel in enviable location ---SEP--- Just returned from an overnight break to Brighton and stay at The Old Ship. Lovely old hotel in an ideal location right on the seafront road, fronting The Lanes. Check in was easy with the friendliest of staff and even though we had arrived early, our room was ready enabling us to enjoy a cold but glorious day. The hallways/corridors of the hotel were a bit tatty and tired but I understand that it is being refurbished and the reception/bar/main areas were neat and tastefully decorated. Old Otis lift to floors but you would still need to use stairs to access some rooms. Our room was lovely, no view (you could pay to upgrade to a sea view.) Bed comfy enough and again nicely decorated. I imagine that it would be noisy if you were on the front but our room was quiet and tucked away on the 2nd floor to the side. Only downside was the bathroom which was FREEZING! Only heating in there was a small heated towel rail which did nothing. It was 

1644897 Family friendly and great location. ---SEP--- Everything was as it should be, tea coffee biscuits etc. The staff were all friendly and efficient, lovely touch when a handbag was returned after our lunch in the red lion. We would return to Ballarat the city has a lot to offer. https://www.tripadvisor.com.au/ShowUserReviews-g255346-d737221-r474506063-Sovereign_Park_Motor_Inn-Ballarat_Victoria.html
515847 40th Birthday Stay ---SEP--- Hotel was lovely, staff are very welcoming from the moment we arrived at the door, nothing was too much trouble. Room was extremely clean and had everything you would require. The bathroom was out of this world and the bath was very relaxing. We had breakfast included and it was delicious, all good quality, fresh ingredients. Spa was very well looked after and clean with a lovely ambiance, perfect for relaxing and unwinding. Overall an amazing stay, couldn’t fault it. https://www.tripadvisor.co.uk/ShowUserReviews-g186338-d1845678-r741339455-Corinthia_

1155369 Room with a view ---SEP--- Stayed in the superior room with a view just for the one night and it was perfect! We'd forgotten we booked the superior room so it was a lovely surprise. Parking underneath. Check in/out was easy and the couple of staff we did speak to were friendly and helpful. The only downside was that we arrived late on the Friday evening and checked out Saturday morning so didn't get to spend much time in the hotel but that's because of our plans so nothing to do with the hotel! Mini bar stocked and included in the room, nespresso machine and kettle. Lovely spacious room with the perfect view of the pier (old and new). We're already looking to go back in the next couple of months and take full advantage of all the hotel has to offer this time! Can't wait! https://www.tripadvisor.co.uk/ShowUserReviews-g186273-d192531-r521121586-Jurys_Inn_Brighton_Waterfront-Brighton_East_Sussex_England.html
2431477 Raymond James Concert Trip ---SEP--- We recently stayed at this E

2520280 It's a local hang out ---SEP--- Great place to meet friends in the bar .staff were helpful But lacked knowledge of wines .ordered a chard got sav told it was same diff.We had bar food have had better.Sevice was slow one girl behind the bar To be fair It was a quiz night for locals . https://www.tripadvisor.com.au/ShowUserReviews-g255069-d2419879-r280032713-Salt_House-Cairns_Cairns_Region_Queensland.html
2068508 Business Trip ---SEP--- Stayed there Feb 13th on business trip. I booked a suite. It was small, cramped, loud and smelled badly. Definitely not worth the extra money. Pool area was awesome. Seemed like a great place for kids. https://www.tripadvisor.ca/ShowUserReviews-g154995-d182029-r348360304-Best_Western_Plus_Lamplighter_Inn_Conference_Centre-London_Ontario.html
610460 Worst place we have ever stayed ---SEP--- We were booked in here by friends, and we had not visited the place until we arrived for a wedding. The first room they gave us was very dirty, with mould growi

2193637 Beautiful lobby ---SEP--- Nice hotel. Rooms are a bit dated but very clean. Lobby is modern and beautiful. Bathroom was large and bright. I liked that there are shops and places to eat right in the building . Convenient on a rainy day. Would stay there again. https://www.tripadvisor.ca/ShowUserReviews-g154976-d182763-r274861907-The_Barrington_Hotel-Halifax_Halifax_Regional_Municipality_Nova_Scotia.html
1748803 Could have been better ---SEP--- The hotel itself is very nice and clean. It’s in a great location. The room was nice. King bed with a sitting area, we had plenty of room. The problem is we stayed 3 nights and they never refilled our coffee/ tea for the room. The 2nd day, I had to call down to the front desk and inform them that at 5pm they still had not Bennett there to clean our room. They informed me we had opted for our room not to be cleaned. We never requested that and we were never asked to forgo the cleaning of our room. The breakfast was free but nothing spectacu

917798 Great Property, Great Location ---SEP--- The Marriott Courtyard is a great place to stay when your flying in our out of the Denver Airport. The rooms are spacious and comfortable, the lounge is relaxing and inviting, and the staff is friendly and helpful. I travel often and have come to expect a lot from Marriott properties; this one does not disappoint. I'm sure you'll be just as satisfied. https://www.tripadvisor.com/ShowUserReviews-g33388-d85372-r214520278-Courtyard_by_Marriott_Denver_Airport-Denver_Colorado.html
1069000 "A Restaurant who value Customer Service as much as we do". ---SEP--- We had dinner here on our last night in surfers, it was delicious, The waiter and all Staff were amazing, very entertaining and very professional, the restaurant was exceptionally clean, the food was very fresh, they had no problem changing a couple of ingredients for us, the location was great also, We cant wait to return on our next visit to surfers, https://www.tripadvisor.com.au/ShowUse

2710760 Pampered ---SEP--- You feel like a queen/king staying here. The service is unbelievable. If you meet any of the staff they ask you how you are doing, if you need anything, do you need assistance. The room was charming with historical feel. Highly updated with electronic gizmos which was surprising. Lovely turn down service. The breakfast buffet has anything you could want but you can also order from a menu. There are so many activities that you could be busy for a week just trying them all. Walking the grounds is a treat. The gardens are extensive and it is like you have a huge city park to yourself. The three restaurants cover anything you might be looking for. https://www.tripadvisor.ie/ShowUserReviews-g663560-d246452-r514361759-Ashford_Castle-Cong_County_Mayo_Western_Ireland.html
2271596 Great hotel,lovely retreat. ---SEP--- Great room,number 30,large space great sized bed,a wee small bathroom though but comfortable enough for my birthday treat.The reception staff on hearing

1969630 Buggers are fabulous ---SEP--- Lunch special Monday to Friday, $18 which includes a beer or wine! Taste amazing, highly recommend. Made to Maha excellent standards, tasty and great ingredients. Make sure you order the chips, as they are also fabulishious! https://www.tripadvisor.com.au/ShowUserReviews-g255100-d1050531-r269439826-Maha-Melbourne_Victoria.html
538277 Amazing food ---SEP--- Food came out fast and it was delicious. Our waitress, Angel had awesome customer service and was very prompt and courteous. Will definitely come back. https://www.tripadvisor.com/ShowUserReviews-g45963-d3293526-r593083837-Senor_Frog_s_Las_Vegas-Las_Vegas_Nevada.html
681902 Very pleased ---SEP--- My family and I have been guests at this hotel twice and I'm sure we'll be back. The beds are very comfortable and the rooms are clean. The location is great for our needs. The staff is very helpful and friendly, making sure your needs are met! https://www.tripadvisor.com/ShowUserReviews-g34678-d87687-r

1093406 Cabin Cosy room ---SEP--- Was looking forward to a lovely few days away with a friend, Room was far to small, bunk beds were comfortable and room clean,but unable to move around in the room. Bathroom clean but again tiny. Really far too small to be offered as a twin room. I was expecting a small room with the word cosy in the title but didn't think it was possible to pay for such a tiny room. Would be ideal as a single room or for children. Also had a flicking light so couldn't really use the lights with out getting s headache. This was tried to be sorted but after one was fixed the other light started, and that was after the maintenance guys had left, Very glad we only booked for two nights as I don't think I could have stayed much longer, it was nice weather outside but not too hot and room was very stuffy, window left open at all times, the little fan given helped but was the first thing you put on when you entered. Spa was beautifully set out but was set at a far too high t

1940952 A good find ---SEP--- I reached Armidale in the late afternoon on a chilly August day, and used Google+TripAdvisor to find somewhere to stay. My chosen location was full, and, driving away, I found the New England Motor Inn by accident: I was glad I did. It's a pretty standard motel, with all the things you'd expect - including adequate (though not great) heating, and wifi that worked fine for me. It was quiet and I got a good night's sleep; it was good and clean, and the front desk service was helpful and fine. The motel is next door to "The Wicklow Hotel" - which is a well-appointed pub with good food on offer. I managed to enjoy a few beers and a meal there - going back to my room in the interim! It's also well-placed for the other delights of downtown Armidale. https://www.tripadvisor.com.au/ShowUserReviews-g255315-d1645480-r140833667-New_England_Motor_Inn-Armidale_New_South_Wales.html
1657841 Great food - great service!! ---SEP--- Excellent food and service (thanks Macie!)

3066817 Well worth a visit! ---SEP--- I don’t often visit Maghull but I will definitely be coming back to this gorgeous pub. Me and my partner wandered across it on Sunday afternoon and thought we would pop in for a drink. My partner was impressed with the beer on tap - he opted for a Carling and I got a gin and tonic - plenty of gins to select from and was served how a gin should. We had no intention of eating but the smell of the food and seeing how delicious it looked we ordered some nachos with pulled pork and chicken wings. I could not fault a single thing, served piping hot and absolutely DELICIOUS! Would like to thank the staff and management for such an enjoyable afternoon. Worth a visit! https://www.tripadvisor.co.uk/ShowUserReviews-g186337-d2190231-r609319673-Hare_and_Hounds-Liverpool_Merseyside_England.html
3061587 a bad birthday party ---SEP--- I gave a birthday luncheon for a relative who turned 80 years of age. The contacts with the restaurant staff in charge of group eve

1438469 Wonderful ---SEP--- Date night with the boy, never been here before and we was not disappointed. Really good food and the staff are lovely. Our Waitess Danielle was brilliant! Thankyou https://www.tripadvisor.co.uk/ShowUserReviews-g1897554-d15190186-r689473108-Chamuyo-Brighton_and_Hove_East_Sussex_England.html
3189360 Family stay ---SEP--- Our family had 3 rooms in your hotel for 4 days. The actual hotel was great, very clean, staff was kind & helpful, breakfast was well rounded. The only complaint we had was when we checked in 2 of the 3 room reeked of marijuana. It’s not the staffs fault but there is a no smoking policy in place for a reason. & you could smell it from the hall. We had to go out and buy air freshener & took till the second day for the smell to leave. https://www.tripadvisor.com/ShowUserReviews-g37209-d6557701-r669284054-Sleep_Inn_Suites_And_Conference_Center_Downtown-Indianapolis_Indiana.html
174933 Worst bday ever ---SEP--- We booked this place for 15 persons
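Reading 1000 printed rows by eye is slow, so part of the sanity check could be automated by parsing the matched URLs. The printed URLs follow the pattern `ShowUserReviews-g<digits>-d<digits>-r<digits>-...`. The sketch below extracts those three numbers. The function name and field names are hypothetical, and whether the numbers correspond to columns such as `trip_city_id`, `trip_id`, or `trip_review_id` is an assumption that would need checking against the data.

```python
import re

# Pattern taken from the URLs printed in the sanity check above.
URL_PATTERN = re.compile(r'/ShowUserReviews-g(\d+)-d(\d+)-r(\d+)-')

def parse_review_url(url):
    """Return the g/d/r numbers from a review URL, or None if it does not parse."""
    if not isinstance(url, str):
        return None  # e.g. NaN for rows without a matched URL
    m = URL_PATTERN.search(url)
    if m is None:
        return None
    g_id, d_id, r_id = m.groups()
    return {'g_id': g_id, 'd_id': d_id, 'r_id': r_id}

example = ('https://www.tripadvisor.co.uk/ShowUserReviews-g186338-d952591-'
           'r723829704-Gaucho_Piccadilly-London_England.html')
print(parse_review_url(example))
```

Applied to the merged dataframe, for example with `df_m['trip_review_url'].map(parse_review_url)`, rows that return `None` would be candidates for manual inspection.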