# PureGym – Customer Review Topic & Emotion Analysis (Google + Trustpilot)

## Business problem
PureGym receives a high volume of customer reviews across platforms. Manually reading reviews does not scale, and it makes it difficult to identify recurring operational issues by location.

## Objective
Build a repeatable NLP pipeline that:
- Cleans and standardises review text across sources
- Surfaces recurring issues via topic modelling (BERTopic; benchmark with LDA)
- Adds an emotion signal to better prioritise high-frustration feedback
- Produces location-level insights to support operational prioritisation

## Data
Two sources of English-language reviews:
- Google reviews (rating, location, comment text)
- Trustpilot reviews (stars, location, review content)
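
Because the two sources label the same fields differently, they need a shared layout before they can be analysed together. The sketch below shows one way to do that with pandas. The column names (`rating`, `stars`, `comment`, `review_content`), the toy rows, and the helper `to_common_schema` are illustrative assumptions, not the names used in the real exports.

```python
import pandas as pd

# Hypothetical column names and toy rows: the real exports may differ.
google = pd.DataFrame({
    "rating": [1, 5],
    "location": ["Leeds City Centre South", "Dunfermline"],
    "comment": ["Changing rooms smell bad.", "Great gym."],
})
trustpilot = pd.DataFrame({
    "stars": [2],
    "location": ["Sunderland"],
    "review_content": ["Staff have their hands tied."],
})


def to_common_schema(df, rating_col, location_col, text_col, source):
    """Rename source-specific columns to one shared layout and tag the source."""
    out = df.rename(columns={
        rating_col: "rating",
        location_col: "location",
        text_col: "text",
    })[["rating", "location", "text"]].copy()
    out["source"] = source
    return out


# Stack both sources into a single frame with identical columns.
reviews = pd.concat(
    [
        to_common_schema(google, "rating", "location", "comment", "google"),
        to_common_schema(trustpilot, "stars", "location", "review_content", "trustpilot"),
    ],
    ignore_index=True,
)
print(reviews)
```

Keeping a `source` column makes it possible to compare Google and Trustpilot feedback later without splitting the data again.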

## Scope and assumptions
- Ratings < 3 are treated as negative reviews for both sources.
- Location names are used as the join key; minor naming inconsistencies may reduce matches.
- Topic models provide directional insights and require human validation before action.
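
The first two assumptions can be sketched in a few lines of plain Python. The threshold mirrors the "ratings < 3" rule above. The normalisation rules in `normalise_location` (lower-casing, dropping punctuation, collapsing whitespace) are an assumption about how naming inconsistencies might be reduced, not the project's actual cleaning logic.

```python
import re

NEGATIVE_THRESHOLD = 3  # ratings strictly below this count as negative


def is_negative(rating):
    """Return True when a star rating falls under the negative threshold."""
    return rating < NEGATIVE_THRESHOLD


def normalise_location(name):
    """Make near-identical location names compare equal.

    Hypothetical rules: lower-case, replace punctuation with spaces,
    then collapse repeated whitespace.
    """
    name = name.lower().strip()
    name = re.sub(r"[^a-z0-9\s]", " ", name)  # punctuation -> space
    return re.sub(r"\s+", " ", name).strip()


print(is_negative(2), is_negative(3))
print(normalise_location("  Leeds City-Centre  South "))
```

Joining on the normalised name instead of the raw one should recover some matches that differ only in case, hyphens, or spacing. Genuinely different spellings would still need a manual mapping.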


# 1. Large Language Model from Hugging Face

## 1.1 Importing packages and data





In [None]:
!pip install -U -q plotly==5.19.0

In [None]:
import time
# Start time to track the total execution time of the notebook
start_time = time.time()

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
import json

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
file_path = "/content/drive/MyDrive/Colab Notebooks/Course 3/Project/all_bad_reviews.pkl"
print("File size:", os.path.getsize(file_path), "bytes")

File size: 2269337 bytes


In [None]:
# Load the saved "all_bad_reviews.pkl" file from Google Drive (path held in file_path)
import pickle

with open(file_path, "rb") as f:
    data = pickle.load(f)

all_bad_reviews = data["all_bad_reviews"]
all_bad_reviews

Unnamed: 0,Review Content,Review Stars,Location Name,processed_comments
0,"Extremely busy, no fresh air.",1,Sutton Times Square,extremely busy fresh air
1,The men’s changing rooms smell bad. They need ...,2,Leeds City Centre South,men changing room smell bad deep clean sort sm...
2,No one was cleaning the equipment after use. C...,1,Dunfermline,cleaning equipment cleaning station hidden awa...
3,Not the best experience at 7am on a week day. ...,1,Bristol Harbourside,best experience bought pas received receipt ac...
4,Staff have their hands tied but surely head of...,1,Sunderland,staff hand tied surely head office bring rule ...
...,...,...,...,...
3942,"Equipment 7/10\n\nThey're pretty good, over th...",2,London Leytonstone,equipment pretty year attachment added variety...
3943,"""Alright for the price. Some members are a bit...",2,London Bermondsey,alright price member bit loud probably include...
3944,"Disorganised, cramped, and untidy. The old fre...",1,Bristol Union Gate,disorganised cramped untidy old free weight ar...
3945,"music always too loud, especially when cycling...",1,Bristol Union Gate,music loud especially cycling studio class com...
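
With the negative reviews in a DataFrame, a first location-level view is a simple group-by. The sketch below reuses the `Location Name` and `Review Stars` column names and a few values from the output above. The toy frame `sample` and the summary names `n_reviews` and `mean_stars` are illustrative assumptions.

```python
import pandas as pd

# Toy rows reusing column names and a few values from the output above.
sample = pd.DataFrame({
    "Location Name": [
        "Sutton Times Square",
        "Leeds City Centre South",
        "Dunfermline",
        "Bristol Union Gate",
        "Bristol Union Gate",
    ],
    "Review Stars": [1, 2, 1, 1, 1],
})

# Count negative reviews and average their star rating per location.
by_location = (
    sample.groupby("Location Name")["Review Stars"]
    .agg(n_reviews="count", mean_stars="mean")
    .sort_values("n_reviews", ascending=False)
)
print(by_location)
```

Applied to the full `all_bad_reviews` frame, the same pattern would rank locations by volume of negative feedback. Raw counts would still need to be read against each site's total review volume.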


In [None]:
# Load the saved "all_bad_reviews_list_clean.pkl" file from Google Drive
import pickle

with open("/content/drive/MyDrive/Colab Notebooks/Course 3/Project/all_bad_reviews_list_clean.pkl", "rb") as f:
    data = pickle.load(f)

all_bad_reviews_list_clean = data["all_bad_reviews_list_clean"]


## 1.2 Initialize LLM

In [None]:
!pip install -U transformers accelerate -q

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

In [None]:
import transformers
print(transformers.__version__)

4.56.2


In [None]:
model_id = "microsoft/Phi-4-mini-instruct"

# Load tokenizer & model into Colab local cache (default path: /root/.cache/huggingface)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto"  # uses GPU if available, else CPU
)




In [None]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

Device set to use cuda:0


In [None]:
len(all_bad_reviews_list_clean)

3947

In [None]:
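generation_args_1 = {
    "max_new_tokens": 500,      # upper bound on generated tokens per review
    "return_full_text": False,  # return only the generated reply, not the prompt
    "do_sample": False,         # greedy decoding for repeatable outputs
}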
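
The prompt in this section asks the model to reply with a Python-style list written as text, so each reply has to be parsed back into a real list before topics can be counted. The sketch below is one tolerant way to do that. It assumes replies arrive in the `[{'generated_text': ...}]` shape produced by the text-generation pipeline. The helper `parse_topics`, the lower-casing step, and the malformed example reply are illustrative assumptions.

```python
import ast
from collections import Counter


def parse_topics(generated_text):
    """Parse a reply such as "['a', 'b']" into a list of strings.

    Returns an empty list when the reply is not a valid list literal.
    """
    try:
        parsed = ast.literal_eval(generated_text.strip())
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(parsed, list):
        return []
    return [str(topic).strip() for topic in parsed]


# Replies in the shape the pipeline returns; the last one is a
# hypothetical malformed reply added to show the fallback.
outputs = [
    [{"generated_text": "['Crowded facilities', 'air quality', 'customer experience']"}],
    [{"generated_text": "['Ventilation', 'Air quality', 'Disinfectant use']"}],
    [{"generated_text": "Sorry, I cannot help with that."}],
]

# Lower-case topics so 'air quality' and 'Air quality' count together.
topic_counts = Counter(
    topic.lower()
    for output in outputs
    for topic in parse_topics(output[0]["generated_text"])
)
print(topic_counts.most_common(3))
```

Unlike swapping quote characters and calling `json.loads`, `ast.literal_eval` reads the list as Python syntax, so the quote style of the reply does not need rewriting first.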

In [None]:
import ast
from tqdm import tqdm

topics_llm = []

for review in tqdm(all_bad_reviews_list_clean[0:100]):  # subset for speed
    if len(str(review)) >= 1000:  # skip very long reviews
        continue

    # One worked example (few-shot) followed by the review to analyse
    messages_2 = [
        {"role": "system", "content": "You work as a data analyst insights guru for a large gym company in the UK and you want to find topics for improvements from customer reviews. You should return these in an array of strings only ['topic 1', 'topic 2', 'topic 3', ...]"},
        {"role": "user", "content": "In the following customer review interaction pick out a maximum of 3 main topics and return them as an array of topics: The showers are disgusting"},
        {"role": "assistant", "content": "['Shower cleanliness', 'odour', 'customer discomfort']"},
        {"role": "user", "content": f"In the following customer review interaction pick out the main 3 topics and return them as an array of topics: {review}"},
    ]

    print(review)
    output = pipe(messages_2, **generation_args_1)
    print("Output:" + str(output))

    # The reply is a Python-style list literal, so parse it with literal_eval;
    # replacing quotes and using json.loads breaks on apostrophes inside a topic.
    try:
        topic_list = ast.literal_eval(output[0]["generated_text"].strip())
    except (ValueError, SyntaxError):
        continue  # skip replies that are not a valid list

    topics_llm.append(topic_list)



  0%|          | 0/100 [00:00<?, ?it/s]The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


extremely busy fresh air


  1%|          | 1/100 [00:08<13:46,  8.34s/it]

Output:[{'generated_text': "['Crowded facilities', 'air quality', 'customer experience']"}]
men changing room smell bad deep clean sort smell management toilet pleasant put using facility regular cleaning enough top


  2%|▏         | 2/100 [00:16<13:18,  8.15s/it]

Output:[{'generated_text': "['Men changing room cleanliness','smell management', 'toilet cleanliness']"}]
cleaning equipment cleaning station hidden away corner empty can bottle lying floor using equipment properly staff around ensure going injured


  3%|▎         | 3/100 [00:26<14:19,  8.86s/it]

Output:[{'generated_text': "['Equipment cleanliness', 'Accessibility of cleaning station', 'Staff presence and assistance', 'Safety concerns']"}]
best experience bought pas received receipt access code walked delivery guy find anyone work help walked three round without result tried leave possible leave needed code money spare urge everyone pay bit service another


  4%|▍         | 4/100 [00:32<12:39,  7.91s/it]

Output:[{'generated_text': "['Delivery process', 'Payment issues', 'Customer service']"}]
staff hand tied surely head office bring rule email member using weight bench sit standing bicep curl unacceptable reserving machine sport bottle whatever unacceptable scary atmosphere monitor phone surely notice asking member consider long sit watching video making call sat equipment lot ask feel problem head office staff must scary time case pile stack high attitude


  5%|▌         | 5/100 [00:38<11:31,  7.28s/it]

Output:[{'generated_text': "['Staff behavior', 'equipment availability', 'gym atmosphere']"}]
manager showed code without giving support code access immediately manager available question left reachable anymore


  6%|▌         | 6/100 [00:45<10:57,  7.00s/it]

Output:[{'generated_text': "['Support accessibility', 'Manager responsiveness', 'Code access']"}]
sunderland wild west bring dinner feel free eat sat machine see video footage leg curl machine feel free leave drink machine note pad another machine leave vape third machine men wandering around top staff turn blind eye powerless anything fear repetition member head office intimidating place pitty staff following order rule aint rule sucker


  7%|▋         | 7/100 [00:52<10:52,  7.02s/it]

Output:[{'generated_text': "['Staff behavior', 'Customer experience', 'Facility cleanliness and maintenance']"}]
croydon purley romford puregyms around town however none stuffy smelly sydenham smell horrible smell proper ventilation mean sure enough entire wherever whichever machine smell sweat air window useful help bring fresh air felt stuffy machine felt bit sweaty thats partially expected put disinfectant spray paper towel covid time


  8%|▊         | 8/100 [00:59<10:55,  7.12s/it]

Output:[{'generated_text': "['Ventilation', 'Air quality', 'Disinfectant use']"}]
blocked access take work meet idiot manager want lose money job recommend anyone found touched green box height door handle found touched green box right took money account remember call bank request refund account refund registered office address town centre house merrion centre leeds


  9%|▉         | 9/100 [01:05<10:20,  6.82s/it]

Output:[{'generated_text': "['Blocked access', 'Manager issues', 'Account problems']"}]
air con seems big issue uncomfortable partake class hiit training due air con unit either functioning blowing warm air reported time reply customer service every time regard temperature air conditioning unit heat air drop cool air go higher keeping temperature within acceptable level regularly checked ensure unit operating across gym


 10%|█         | 10/100 [01:13<10:40,  7.12s/it]You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Output:[{'generated_text': "['Air conditioning unit performance', 'temperature control', 'customer service responsiveness']"}]
terrible experience recommend went induction cancelled membership car park leaving waste time money


 11%|█         | 11/100 [01:22<11:35,  7.81s/it]

Output:[{'generated_text': "['Cancellation process', 'Induction experience', 'Car park management', 'Value for money']"}]
awful experience joining let buddy membership tried bolt friend website useless telling issue bought expensive pas end freaked multiple time payment went email pin took age send absolutely awful discourages greatly using different gym chain email sent write review demanding tone rude


 12%|█▏        | 12/100 [01:30<11:10,  7.62s/it]

Output:[{'generated_text': "['Membership joining process', 'Payment issues', 'Customer service experience']"}]
staff allowed blame annoying hoxton second time working guy clear brown hair work start blame dressing heavy metal shirt singer band criminal past employee blaming wonder guy make circus hip hop rock shirt amy winehouse shirt staff bother telling drug addicted point suggest better choose page wage guy ruined mood work


 13%|█▎        | 13/100 [01:36<10:29,  7.24s/it]

Output:[{'generated_text': "['Staff behaviour', 'Work environment', 'Customer experience']"}]
full impossible find available equipment relaxed workout insufficient number locker sufficiently air conditioned warm stuffy smelly inside toilet smell disgusting


 14%|█▍        | 14/100 [01:48<12:29,  8.71s/it]

Output:[{'generated_text': "['Equipment availability', 'Workout environment', 'Locker room conditions', 'Air conditioning', 'Temperature control', 'Sanitation']"}]
worse leicester


 15%|█▌        | 15/100 [01:54<11:17,  7.97s/it]

Output:[{'generated_text': "['Service quality', 'food quality', 'customer satisfaction']"}]
smelt bad sweaty lot smaller expected every way feel gyum expecting sseing recently opened


 16%|█▌        | 16/100 [01:59<09:58,  7.12s/it]

Output:[{'generated_text': "['Smell', 'Size', 'Expectation']"}]
staff available greet left outside fumbling code website recognise log pin reset waiting far long reset still arrived happy take money worked fine customer service total rubbish experiencing min first impression count please look away bottom line turn focus customer journey improve


 17%|█▋        | 17/100 [02:06<09:45,  7.05s/it]

Output:[{'generated_text': "['Staff responsiveness', 'Website usability', 'Customer service quality']"}]
awful first morning member arrive site find drove hour work eastbourne


 18%|█▊        | 18/100 [02:13<09:25,  6.90s/it]

Output:[{'generated_text': "['Facility accessibility', 'Navigation difficulties', 'Long wait times']"}]
trained late sat night weight floor dumbbell put back correctly rack lacked atmosphere equipment need upgrading


 19%|█▉        | 19/100 [02:22<10:13,  7.57s/it]

Output:[{'generated_text': "['Equipment maintenance', 'late training sessions', 'lack of atmosphere', 'equipment quality']"}]
disgusting everything dirty damaged happy


 20%|██        | 20/100 [02:28<09:19,  6.99s/it]

Output:[{'generated_text': "['Cleanliness', 'Damage', 'Customer satisfaction']"}]
first help main door problem enter reception know woman section bathroom overall great


 21%|██        | 21/100 [02:35<09:31,  7.23s/it]

Output:[{'generated_text': "['Main door problem','reception assistance', 'bathroom facilities']"}]
cleanliness male changing area deteriorated last month many occasion enter changing room see dirty floor toilet period day time currently shower cubicle order past week fixed


 22%|██▏       | 22/100 [02:42<09:10,  7.05s/it]

Output:[{'generated_text': "['Maintenance and cleanliness', 'Facility condition', 'Customer experience']"}]
missing cleaning station


 23%|██▎       | 23/100 [02:48<08:46,  6.84s/it]

Output:[{'generated_text': "['Cleaning station availability','maintenance', 'customer inconvenience']"}]
said looked young felt little


 24%|██▍       | 24/100 [02:55<08:33,  6.76s/it]

Output:[{'generated_text': "['Aesthetic appeal', 'age perception', 'customer satisfaction']"}]
bought pas never got sent code


 25%|██▌       | 25/100 [03:01<08:17,  6.64s/it]

Output:[{'generated_text': "['Order processing', 'Delivery issues', 'Communication problems']"}]
member exeter little fine number occasion toilet paper hot water shower going think hot water toilet paper basic business let alone please sort guy keep getting information getting personal trainer tip workout meal let sort basic first shall hot water start want cancel membership thing continue hill choice sad enjoying staff seem care


 26%|██▌       | 26/100 [03:09<08:45,  7.10s/it]

Output:[{'generated_text': "['Customer service experience', 'product quality', 'personal training and meal services']"}]
pas pin failed work worker present assist consequently endure two hour wait outside


 27%|██▋       | 27/100 [03:17<08:46,  7.22s/it]

Output:[{'generated_text': "['Security system failure', 'Worker assistance', 'Extended waiting time']"}]
extremely grotty small complaint contact manager wait contact info manager old phone email reach previous discrumpled recommend simply bedford


 28%|██▊       | 28/100 [03:24<08:29,  7.07s/it]

Output:[{'generated_text': "['Customer service experience', 'communication issues', 'product quality']"}]
registered blind paralympian treated badly told using couldnt seen manager though ive used gym hundred time although button asking help disabled access ask cant speak anyone phone think disabled consulted access colour contrast disappointed


 29%|██▉       | 29/100 [03:31<08:24,  7.11s/it]

Output:[{'generated_text': "['Accessibility issues', 'Communication barriers', 'Discrimination and disrespect']"}]
pay jym workout reached temporarily closed tell confirming confirmation


 30%|███       | 30/100 [03:37<08:00,  6.86s/it]

Output:[{'generated_text': "['Gym closure', 'Payment issues', 'Confirmation process']"}]
dark inside equipment limited freeweight area organised weight bar left floor equipment tired went machine tried hold handle hand covered sticky solution meant hold grip bike went foot strap snapped toilet roll woman toilet visited bradford idle branch around


 31%|███       | 31/100 [03:45<08:09,  7.10s/it]

Output:[{'generated_text': "['Equipment maintenance', 'Limited free weight area', 'Customer safety concerns']"}]
beckton staff aggressive violent thought safe place protected staff case staff one charge square face wanting fight avoid cost protected staff complaint procedure bad book think twice attack threaten violence exactly happened tuesday day ago since management team say still investigating confront trainer client asked hogging machine whilst taking minute break though waiting patiently already around min staff got angry rushed face wanting fight emailed management team ray marshall asking update got liner back saying looking due uncertainty happen question safe train weekend ignored unbelievable big company deal serious complaint nature show customer empathy lack customer care seems urgency finding solution happy customer safe avoid staff know react management care


 32%|███▏      | 32/100 [03:52<08:11,  7.23s/it]

Output:[{'generated_text': "['Staff behaviour', 'Safety concerns', 'Customer service and communication']"}]
extremely dirty cleaning night quiet cleaned properly bottle dirty toilet paper place day


 33%|███▎      | 33/100 [04:02<08:44,  7.83s/it]

Output:[{'generated_text': "['Cleaning quality', 'nighttime service', 'toilet paper quality', 'place cleanliness']"}]
got charge reason sure money back


 34%|███▍      | 34/100 [04:08<08:08,  7.41s/it]

Output:[{'generated_text': "['Refund policy', 'Billing issue', 'Customer satisfaction']"}]
construction purchased


 35%|███▌      | 35/100 [04:14<07:35,  7.02s/it]

Output:[{'generated_text': "['Construction quality', 'customer satisfaction', 'product durability']"}]
suitable amount using get far crowded lack equipment especially weight section rack suitable deadlifts lucky available shortage plate facility unclean feel lot empty space upper floor better used equipment


 36%|███▌      | 36/100 [04:21<07:17,  6.84s/it]

Output:[{'generated_text': "['Crowding', 'Equipment availability', 'Facility cleanliness']"}]
choice selection equipment great gym busy changing room facility best newly joined tried day ended cancelling membership happy pay extra better better shower changing facility


 37%|███▋      | 37/100 [04:28<07:13,  6.88s/it]

Output:[{'generated_text': "['Membership cancellation', 'Gym facilities', 'Customer satisfaction and experience']"}]
joined next flooded day idea happened email customer imagine someone paid journey


 38%|███▊      | 38/100 [04:35<07:15,  7.02s/it]

Output:[{'generated_text': "['Customer service communication', 'Service disruption', 'Payment processing issues']"}]
treadmill face wall window look whilst running speaker quite loud difficult hear music went afternoon clear staff wearing uniform quite small shower smell mouldy need bleached locker seemed occupied belonging without padlock best experience worth price pay bit nicer


 39%|███▉      | 39/100 [04:46<08:29,  8.35s/it]

Output:[{'generated_text': "['Treadmill design and comfort','staff interaction and uniform size','shower cleanliness and locker maintenance']"}]
equipment unfriendly staff


 40%|████      | 40/100 [04:52<07:40,  7.67s/it]

Output:[{'generated_text': "['Equipment usability','staff friendliness', 'customer experience']"}]
impossibly busy spent min left


 41%|████      | 41/100 [04:59<07:09,  7.28s/it]

Output:[{'generated_text': "['Staff efficiency', 'time management', 'customer satisfaction']"}]
appreciated class run help much care safe manual asked help fix stuff look unhappy help bad experience east sheen


 42%|████▏     | 42/100 [05:12<08:42,  9.01s/it]

Output:[{'generated_text': "['Class run assistance','manual clarity', 'customer dissatisfaction','staff helpfulness','safety concerns', 'poor experience']"}]
carry induction minute managed find someone show around problem booking system


 43%|████▎     | 43/100 [05:19<08:00,  8.42s/it]

Output:[{'generated_text': "['Induction process', 'Booking system issues', 'Customer assistance']"}]
robroyston stair stair laltely diwnstair closed dont kniw upstairs available make busy machine wait long


 44%|████▍     | 44/100 [05:29<08:17,  8.89s/it]

Output:[{'generated_text': "['Stairway accessibility','signage clarity', 'equipment availability', 'long wait times']"}]
upgraded core membership purely sport water got never worked booked class noone turned twice fortunatley pt gave instead email told reply nothing cleaned though ppl meant clean quite hard solo machine meat head hogging needless say paying better away


 45%|████▌     | 45/100 [05:36<07:31,  8.21s/it]

Output:[{'generated_text': "['Membership upgrade issues', 'Service quality', 'Equipment maintenance']"}]
grateful please young fit able client stop using three disabled parking space side blue badge holder yr old mobility problem numerous occasion unable park uncaring young choose disabled space put notice remind selfish client thank


 46%|████▌     | 46/100 [05:46<08:00,  8.90s/it]

Output:[{'generated_text': "['Disabled parking space misuse', 'parking difficulties for disabled individuals', 'lack of consideration for disabled parking']"}]
worst ever first never cold muscle grow cold full blast reason matter winter summer high fan speed full blast power pump ice cold air annoying absolutely unprofessional train wear layer cloth avoid cold reason join lower price gym worth penny


 47%|████▋     | 47/100 [06:02<09:36, 10.89s/it]

Output:[{'generated_text': "['Muscle growth experience', 'climate control in the gym', 'customer service and professionalism', 'clothing and comfort', 'gym pricing and value']"}]
great location lot equipment staff temperature however let massively say website fully air conditioned lie main section hot feel oppressive soon walk useless want sort cardio side better go show issue super busy much pick different time come sort temperature


 48%|████▊     | 48/100 [06:09<08:37,  9.96s/it]

Output:[{'generated_text': "['Website usability', 'Temperature control', 'Staff availability and customer service']"}]
parking explained know enter reg start arcadian birmingham explained anywhere worried fine punch bag


 49%|████▉     | 49/100 [06:15<07:28,  8.79s/it]

Output:[{'generated_text': "['Parking information', 'Registration process', 'Customer concerns']"}]
turned code let still nobody contacted


 50%|█████     | 50/100 [06:22<06:49,  8.20s/it]

Output:[{'generated_text': "['Customer service', 'communication issues', 'unresolved concerns']"}]
fullest ever visited went around basically free equipment two flat bench bench pressing crazy huge moving around equipment pain everything stuffed together money making machine sell way contract capacity


 51%|█████     | 51/100 [06:28<06:10,  7.56s/it]

Output:[{'generated_text': "['Equipment quality', 'contract terms', 'facility layout']"}]
take way long fix thing double door going blackburn worked dat


 52%|█████▏    | 52/100 [06:35<05:45,  7.21s/it]

Output:[{'generated_text': "['Door functionality','repair time', 'customer satisfaction']"}]
unable correct profile website


 53%|█████▎    | 53/100 [06:41<05:22,  6.87s/it]

Output:[{'generated_text': "['Website functionality', 'Profile management', 'User experience']"}]
membership wanted access list contacted customer support said give temporary access action request told buy passbfor delay responding issue cost money time despite trying proactive getting access early


 54%|█████▍    | 54/100 [06:49<05:35,  7.29s/it]

Output:[{'generated_text': "['Membership access', 'Customer support response time', 'Proactive access measures']"}]
purchased pas wrong nobody available fix issue waiting refunded heard anyone reached email facebook messenger


 55%|█████▌    | 55/100 [06:56<05:18,  7.07s/it]

Output:[{'generated_text': "['Product quality', 'customer service response','refund process']"}]
lot equipment broken time


 56%|█████▌    | 56/100 [07:02<05:02,  6.86s/it]

Output:[{'generated_text': "['Equipment maintenance', 'equipment quality', 'customer satisfaction']"}]
staff clearly unsafe saw kid confirmed actually looked young using heavy equipment without supervision adult staff worrying dangerous


 57%|█████▋    | 57/100 [07:08<04:45,  6.63s/it]

Output:[{'generated_text': "['Staff supervision', 'equipment safety', 'child safety']"}]
confirmation booked found spam folder email attend rebook gutted sat thinking accepted booking etc realised spam folder missed induction sadly


 58%|█████▊    | 58/100 [07:15<04:36,  6.58s/it]

Output:[{'generated_text': "['Email communication', 'Booking confirmation', 'Customer frustration']"}]
wanted bring friend staff equipment low adjusting free weight bench tiny way low


 59%|█████▉    | 59/100 [07:22<04:34,  6.71s/it]

Output:[{'generated_text': "['Friend policy', 'equipment size', 'free weight bench adjustment']"}]
say joining fee took pound


 60%|██████    | 60/100 [07:28<04:27,  6.68s/it]

Output:[{'generated_text': "['Membership fee', 'payment method', 'cost concerns']"}]
went newport south wale couple hour pas visited previously couple year back unfortunately filthy deplorable ever changing room absolutely disgusting firstly took trying locker found locked standard padlock toilet toilet paper two urinal inaccessible due litre urine sink dirty cleaned month incredibly dirty floor matted due hair machine old rusting unsafe handle machine human skin stuck little cleaning station cleaner staff around mention anything saw bed bug carpet reported council bench free weight area cheapest unsafe bench ever come across dumbbell chest press feel safe living body feel safe bench almost cushion instability subscribed regular basis another location visit newport seen never subscribe attend another ever


 61%|██████    | 61/100 [07:36<04:38,  7.14s/it]

Output:[{'generated_text': "['Locker accessibility', 'Sanitation and cleanliness', 'Equipment safety and maintenance']"}]
session friend pin work door entry exit staff said common issue


 62%|██████▏   | 62/100 [07:44<04:33,  7.19s/it]

Output:[{'generated_text': "['Staff communication', 'common issues', 'entry/exit procedures']"}]
female cleaner male changing room woman react male cleaner non stop female changing room


 63%|██████▎   | 63/100 [07:50<04:20,  7.03s/it]

Output:[{'generated_text': "['Staff interaction', 'cleaning quality', 'customer experience']"}]
cold water dirty floor missing equipment


 64%|██████▍   | 64/100 [07:57<04:05,  6.82s/it]

Output:[{'generated_text': "['Water temperature', 'Floor cleanliness', 'Equipment availability']"}]
attended guest pin work machine example jogger rower disinfected fact covid cured paper towel gel placed view customer clean machine advised


 65%|██████▌   | 65/100 [08:05<04:11,  7.18s/it]

Output:[{'generated_text': "['Machine cleanliness', 'COVID-19 information', 'Customer service and advice']"}]
app login pin


 66%|██████▌   | 66/100 [08:11<03:59,  7.03s/it]

Output:[{'generated_text': "['App login security', 'user authentication', 'password management']"}]
manager gone reason obvious


 67%|██████▋   | 67/100 [08:18<03:43,  6.79s/it]

Output:[{'generated_text': "['Staff management', 'customer service','manager accountability']"}]
smell unbearable start using deodorant


 68%|██████▊   | 68/100 [08:25<03:41,  6.91s/it]

Output:[{'generated_text': "['Deodorant effectiveness','smell', 'customer dissatisfaction']"}]
pin number work access paid access access frustrating start honest


 69%|██████▉   | 69/100 [08:36<04:11,  8.13s/it]

Output:[{'generated_text': "['Pin number functionality', 'access control', 'frustration with system', 'honesty in service']"}]
far busy enough machine variety suprised let many quite clearly accommodate amount waiting equipment pleasant work maybe better time


 70%|███████   | 70/100 [08:42<03:45,  7.52s/it]

Output:[{'generated_text': "['Machine variety', 'waiting time', 'equipment quality']"}]
made mistake package opted tried changing sent many email correct acknowledgement received corrected package tried change fact think error expensive package listed asking changed accommodate matter thank


 71%|███████   | 71/100 [08:49<03:32,  7.33s/it]

Output:[{'generated_text': "['Package handling errors', 'customer communication', 'price discrepancies']"}]
went morning first time code pin code working pushed button assistance came help


 72%|███████▏  | 72/100 [08:55<03:18,  7.10s/it]

Output:[{'generated_text': "['Customer assistance', 'code entry process', 'initial experience']"}]
overall equipment bad thing bugged fact pin woman changing room walk straight anyone shower glass door yeah portion frosted absolutely privacy


 73%|███████▎  | 73/100 [09:02<03:06,  6.89s/it]

Output:[{'generated_text': "['Equipment quality', 'Privacy concerns', 'Design issues']"}]
list bothered staff first got stuck quiet long time qrcode working new client scanning regular client held long time assistant unresponsive unprofessional


 74%|███████▍  | 74/100 [09:08<02:56,  6.80s/it]

Output:[{'generated_text': "['Staff responsiveness', 'equipment functionality', 'customer service quality']"}]
manager arrogant rude


 75%|███████▌  | 75/100 [09:15<02:47,  6.69s/it]

Output:[{'generated_text': "['Manager behaviour', 'customer service','management training']"}]
wrexham joke alot underage kid wearing school uniform pissing going anything besides taking space manager walk around see going say anything bad


 76%|███████▌  | 76/100 [09:21<02:36,  6.53s/it]

Output:[{'generated_text': "['Facility cleanliness', 'age restrictions','staff behavior']"}]
way class cancelled member turn class appalling structure place say thing though kieth lloyd lucy current trainer heart soul place class amazing make work potential trainners leave reason leaf accommodate class aswell individual client class need invest trainners cancel class last minute unprofessional business cause inconvenience member take time attend pay money membership bearing mind birmingham west compared highest rate membership charge


 77%|███████▋  | 77/100 [09:34<03:14,  8.43s/it]

Output:[{'generated_text': "['Cancellation policy', 'Trainer professionalism', 'Value for money', 'Customer convenience', 'Membership fees comparison', 'Trainer availability']"}]
super busy expensive better equipped puregyms small enough equipment amount user


 78%|███████▊  | 78/100 [09:43<03:12,  8.77s/it]

Output:[{'generated_text': "['Facility size', 'equipment quality', 'cost', 'cleanliness', 'user experience']"}]
close although claiming bangor northern ireland find news update regarding long joined purely case


 79%|███████▉  | 79/100 [09:49<02:47,  7.97s/it]

Output:[{'generated_text': "['Customer satisfaction','service quality', 'location awareness']"}]
joined online arrived found discount first joining fee


 80%|████████  | 80/100 [09:56<02:33,  7.66s/it]

Output:[{'generated_text': "['Online joining process', 'discount availability', 'joining fee']"}]
inform buddy program enough new immigrant thought access find four time


 81%|████████  | 81/100 [10:06<02:35,  8.19s/it]

Output:[{'generated_text': "['Buddy program effectiveness', 'immigrant integration', 'accessibility issues', 'program scheduling']"}]
great far actual staff kit plenty choose place packed car park full air con keep demand along terrible hygiene place stink etiquette enforce rule bag floor trip walking around barefoot cheap pay


 82%|████████▏ | 82/100 [10:12<02:19,  7.74s/it]

Output:[{'generated_text': "['Staff attitude', 'Facilities cleanliness', 'Parking and amenities']"}]
pin code sent work phone number call someone assist assistant button work


 83%|████████▎ | 83/100 [10:20<02:11,  7.74s/it]

Output:[{'generated_text': "['Pin code delivery', 'Phone number verification', 'Customer support assistance']"}]
filthy


 84%|████████▍ | 84/100 [10:26<01:53,  7.12s/it]

Output:[{'generated_text': "['Cleanliness', 'Maintenance', 'Customer dissatisfaction']"}]
congested


 85%|████████▌ | 85/100 [10:32<01:43,  6.88s/it]

Output:[{'generated_text': "['Facility cleanliness', 'air quality', 'customer discomfort']"}]
wanted join daughter visit school realised sevenoaks twice expensive else looked justified look daughter going reply comment sorry


 86%|████████▌ | 86/100 [10:38<01:33,  6.70s/it]

Output:[{'generated_text': "['Cost justification', 'Family visit', 'School location']"}]
asked somebody show treadmill machine work try told nobody available machine easy second attempt manager came show turn machine asking different program setting work told idea treadmill shower disgusting dirty smelling badley first shower went work large amount different equipment see


 87%|████████▋ | 87/100 [10:46<01:31,  7.02s/it]

Output:[{'generated_text': "['Treadmill availability', 'equipment cleanliness', 'customer service experience']"}]
mess weight everywhere expand cope amount member


 88%|████████▊ | 88/100 [10:56<01:32,  7.74s/it]

Output:[{'generated_text': "['messiness', 'weight management', 'coping with weight gain','membership satisfaction']"}]


 89%|████████▉ | 89/100 [11:03<01:23,  7.55s/it]

Output:[{'generated_text': "['Staff assistance', 'Policy understanding', 'Customer discouragement']"}]
equipment dated came expecting able training per coached plan equipment basic baffled none staff heard watt bike machine synch smart watch tell watt complete training based guess work said staff found equipment disappointing said price membership reflects quality however paid visit nearly much membership back home swim class


 90%|█████████ | 90/100 [11:10<01:15,  7.56s/it]

Output:[{'generated_text': "['Equipment quality', 'Staff training and guidance', 'Value for money']"}]
absolute madness booked class went attend conduct class


 91%|█████████ | 91/100 [11:17<01:05,  7.23s/it]

Output:[{'generated_text': "['Booking process', 'Class conduct', 'Customer experience']"}]
member year three time usually morning excellent training option facility clean maintained temperature controlled however going rave class complained manager dismissive explained cut several session short due excessive noise become ridiculous arguably detrimental member hearing looking alternative update contacted response comment advised speak manager regarding noise level welcome response company policy maximum decibel level relying individual manager randomly monitor noise level subjective unless using proper decibel monitor intention pain problem affect lot member complain directly


 92%|█████████▏| 92/100 [11:23<00:55,  6.94s/it]

Output:[{'generated_text': "['Noise control', 'Manager communication', 'Policy enforcement']"}]
booked onto class minute due start instructor turned enquired working client take class understandable said thought cancelled earlier emailed member service response day saw eating sandwich whilst working client professionalism appear something care


 93%|█████████▎| 93/100 [11:32<00:52,  7.48s/it]

Output:[{'generated_text': "['Class scheduling and communication', 'Instructor professionalism and behavior', 'Member service responsiveness']"}]
receipt say contact customer support issue signing


 94%|█████████▍| 94/100 [11:39<00:44,  7.38s/it]

Output:[{'generated_text': "['Customer support contact', 'issue resolution','signing process']"}]
review free pas messaged company see move date poorly still waiting reply day later


 95%|█████████▌| 95/100 [11:46<00:35,  7.18s/it]

Output:[{'generated_text': "['Communication issues', 'Unmet expectations', 'Delayed response']"}]
firstly staff entrance upon first visit barcode working nobody help floor floor working air con partner began feel light headed dizzy certain workout due overwhelming heat lack air con floor closed unable yet still paying full price membership think quite unfair come equipment different equipment focus different area body scattered around different floor think make much sense back tricep equipment floor leg equipment floor etc finally drink machine floor order yet signed membership allow pay extra part monthly membership


 96%|█████████▌| 96/100 [11:53<00:28,  7.11s/it]

Output:[{'generated_text': "['Staff assistance', 'equipment organization', 'air conditioning issues']"}]
paid month access several unit travel lot work far able access unit disappointing review sad


 97%|█████████▋| 97/100 [12:01<00:22,  7.54s/it]

Output:[{'generated_text': "['Payment issues', 'Access to units', 'Travel distance', 'Customer satisfaction']"}]
yellow blue code wall work manually entered web address signed normal email access via mobile paid pas find pin accessible via email changed email account setting access asked two separate pin reminder waited minute never came awful experience took business elsewhere anytime fitness road greeted actual human went great workout date yet figured refund pas able


 98%|█████████▊| 98/100 [12:10<00:15,  7.92s/it]

Output:[{'generated_text': "['Customer service experience', 'payment and refund process', 'accessibility and convenience']"}]
look cleaner site toilet clean smelly water flooded floor sink terrible experience


 99%|█████████▉| 99/100 [12:21<00:08,  8.81s/it]

Output:[{'generated_text': "['Site cleanliness', 'toilet condition', 'flooding','sink condition', 'overall experience']"}]
went open eve greet show around etc promotion video ensure great experience extremely busy pretty putting comfortable walked round asked inwhich reply never normally staff reply open eve event reply right completely clueless intimating best time experience welcome new member need staffing holding event


100%|██████████| 100/100 [12:33<00:00,  7.53s/it]

Output:[{'generated_text': "['Staff responsiveness', 'Event organization', 'Customer experience', 'Staff training', 'Event timing', 'Event capacity']"}]
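
Each `generated_text` value above is a string that merely looks like a Python list, so it has to be parsed before the topics can be flattened and clustered. The following is a minimal sketch of one way to do that with the standard library. The helper name `parse_topic_list` is hypothetical and is not taken from the notebook, and the fallback of returning an empty list for malformed output is an assumption about how such cases should be handled.

```python
import ast


def parse_topic_list(generated_text):
    """Parse a string such as "['A', 'B']" into a list of stripped strings.

    Returns an empty list when the text is not a valid Python list literal,
    so one malformed generation does not stop the whole loop.
    """
    try:
        parsed = ast.literal_eval(generated_text.strip())
    except (ValueError, SyntaxError):
        return []
    if not isinstance(parsed, (list, tuple)):
        return []
    return [str(item).strip() for item in parsed if str(item).strip()]


# One of the model outputs shown above, copied verbatim.
sample = "['Manager behaviour', 'customer service','management training']"
print(parse_topic_list(sample))
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text produced by a model.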





##1.3 BERTopic

In [None]:
!pip install bertopic -q


In [None]:
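from bertopic import BERTopic

# Flatten the nested list of LLM-extracted topics (one sublist per review)
# into a single flat list of topic strings
topic_string_array = [topic for sublist in topics_llm for topic in sublist]

# Fit BERTopic on the short topic strings rather than on the raw reviews
topic_model = BERTopic(language="english")
topics, probs = topic_model.fit_transform(topic_string_array)

# Show the flattened input that was clustered
print("topic_string_array:", topic_string_array)

# Inspect the largest topics
print(topic_model.get_topic_info().head())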


topic_string_array: ['Crowded facilities', 'air quality', 'customer experience', 'Men changing room cleanliness', 'smell management', 'toilet cleanliness', 'Equipment cleanliness', 'Accessibility of cleaning station', 'Staff presence and assistance', 'Safety concerns', 'Delivery process', 'Payment issues', 'Customer service', 'Staff behavior', 'equipment availability', 'gym atmosphere', 'Support accessibility', 'Manager responsiveness', 'Code access', 'Staff behavior', 'Customer experience', 'Facility cleanliness and maintenance', 'Ventilation', 'Air quality', 'Disinfectant use', 'Blocked access', 'Manager issues', 'Account problems', 'Air conditioning unit performance', 'temperature control', 'customer service responsiveness', 'Cancellation process', 'Induction experience', 'Car park management', 'Value for money', 'Membership joining process', 'Payment issues', 'Customer service experience', 'Staff behaviour', 'Work environment', 'Customer experience', 'Equipment availability', 'Work
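
The printed list contains case and spelling variants of the same label, for example 'Staff behavior' and 'Staff behaviour', or 'air quality' and 'Air quality'. A minimal sketch of counting labels after a light normalisation step is given below. The function `normalise_label` and the `SPELLING_VARIANTS` map are hypothetical and are not part of the notebook, and the short `labels` list is an illustrative subset rather than the full data.

```python
from collections import Counter

# Hypothetical US -> UK spelling map; extend it after inspecting the real labels.
SPELLING_VARIANTS = {"behavior": "behaviour"}


def normalise_label(label):
    """Lower-case a label and map known spelling variants to one form."""
    words = label.lower().split()
    return " ".join(SPELLING_VARIANTS.get(word, word) for word in words)


# Illustrative subset of labels that appear in the printed list above.
labels = ["Staff behavior", "Staff behaviour", "air quality", "Air quality", "Payment issues"]

counts = Counter(normalise_label(label) for label in labels)
print(counts.most_common())
```

Counting rather than de-duplicating keeps the frequency information, which is what makes a recurring complaint stand out.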

In [None]:
topic_model.get_topic_info().head()

| | Topic | Count | Name | Representation | Representative_Docs |
|---|---|---|---|---|---|
| 0 | -1 | 13 | -1_service_age_quality_policy | [service, age, quality, policy, crowded, hones... | [service quality, Service quality, Service qua... |
| 1 | 0 | 51 | 0_staff_manager_training_responsiveness | [staff, manager, training, responsiveness, beh... | [Staff responsiveness, Staff responsiveness, S... |
| 2 | 1 | 46 | 1_process_payment_access_membership | [process, payment, access, membership, issues,... | [Membership access, Membership joining process... |
| 3 | 2 | 39 | 2_cleanliness_facility_cleaning_smell | [cleanliness, facility, cleaning, smell, toile... | [Facility cleanliness, Facility cleanliness, F... |
| 4 | 3 | 29 | 3_concerns_safety_dissatisfaction_issues | [concerns, safety, dissatisfaction, issues, co... | [Safety concerns, Safety concerns, Customer sa... |


In [None]:
topic_model.get_topic(-1)

[('service', np.float64(0.25713531062751366)),
 ('age', np.float64(0.2429426701187254)),
 ('quality', np.float64(0.22481338191844447)),
 ('policy', np.float64(0.18097834385410227)),
 ('crowded', np.float64(0.1456263372823471)),
 ('honesty', np.float64(0.1456263372823471)),
 ('expectation', np.float64(0.1456263372823471)),
 ('enforcement', np.float64(0.1456263372823471)),
 ('restrictions', np.float64(0.1456263372823471)),
 ('understanding', np.float64(0.1456263372823471))]
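
In BERTopic, topic `-1` collects the documents that were not assigned to any cluster, so its keywords (as listed above) tend to be a mix of unrelated terms. A quick check on how much of the input ended up there can be sketched as follows. The helper `outlier_share` is hypothetical, and the example ids are made up for illustration; in the notebook the input would be the `topics` list returned by `fit_transform`.

```python
def outlier_share(topic_ids):
    """Return the fraction of documents assigned to the outlier topic (-1)."""
    if not topic_ids:
        return 0.0
    outliers = sum(1 for topic_id in topic_ids if topic_id == -1)
    return outliers / len(topic_ids)


# Made-up topic ids for illustration only; not the notebook's real assignments.
example_topics = [-1, 0, 0, 1, 2, 0, -1, 1]
print(f"Outlier share: {outlier_share(example_topics):.1%}")
```

A high share would suggest that the clusters describe only part of the feedback and that the outlier labels deserve a manual look.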

In [None]:
#topic_model.get_topic(1)

##1.4 Get Actionable Insight from LLM

In [None]:
# Few-shot prompt: system role, one example exchange, then the real request.
messages_3 = [
    {"role": "system", "content": "You work as a data analyst insights guru for a large gym company in the UK and you want to compress a given list into collated topics. You should return these in a numbered list as business insights that can be used to improve the business"},
    # Example user turn. Note: this string is not an f-string, so the model sees
    # the literal text "{topic_string_array}" here rather than the actual list.
    {"role": "user", "content": "In the following list containing the main extracted topics from customer reviews, group or compress the topics and return them with actionable insights in a numbered list: {topic_string_array}"},
    # Example assistant turn showing the desired output format.
    {"role": "assistant", "content": "1. Customer Support: Improve response times and staff availability at reception and helplines to enhance member satisfaction. \n 2. Facilities & Maintenance: Implement a proactive equipment maintenance schedule and regular cleanliness checks.\n ..."},
    # Real request: the f-string interpolates the flattened topic list.
    {"role": "user", "content": f"In the following list containing the main extracted topics from customer reviews, group or compress the topics and return them with actionable insights in a numbered list: {topic_string_array}"},
]



In [None]:
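# Run the text-generation pipeline with the generation settings defined earlier
output = pipe(messages_3, **generation_args_1)

# Keep only the generated text of the first (and only) returned sequence
insights = output[0]['generated_text']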

In [None]:
insights

'1. Facility Cleanliness and Maintenance: Implement a proactive cleaning schedule, improve ventilation, and ensure regular maintenance of equipment and facilities to enhance customer satisfaction and safety.\n2. Customer Service and Staff Behavior: Improve staff responsiveness, provide better training, and ensure a friendly and professional attitude to enhance customer experience and satisfaction.\n3. Equipment Availability and Maintenance: Regularly maintain and clean equipment, ensure availability, and address any issues promptly to improve customer satisfaction and safety.\n4. Customer Experience and Satisfaction: Improve the overall customer experience by addressing issues related to cleanliness, staff behavior, and service quality, and ensure prompt resolution of customer concerns.\n5. Payment and Billing Issues: Streamline the payment process, address billing issues promptly, and ensure clear communication regarding membership fees and cancellation policies.\n6. Website Usability

In [None]:
print(insights)

1. Facility Cleanliness and Maintenance: Implement a proactive cleaning schedule, improve ventilation, and ensure regular maintenance of equipment and facilities to enhance customer satisfaction and safety.
2. Customer Service and Staff Behavior: Improve staff responsiveness, provide better training, and ensure a friendly and professional attitude to enhance customer experience and satisfaction.
3. Equipment Availability and Maintenance: Regularly maintain and clean equipment, ensure availability, and address any issues promptly to improve customer satisfaction and safety.
4. Customer Experience and Satisfaction: Improve the overall customer experience by addressing issues related to cleanliness, staff behavior, and service quality, and ensure prompt resolution of customer concerns.
5. Payment and Billing Issues: Streamline the payment process, address billing issues promptly, and ensure clear communication regarding membership fees and cancellation policies.
6. Website Usability and A
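
The numbered output follows a regular "number, theme, colon, action" pattern, which makes it straightforward to turn into structured records for reporting. A minimal sketch under that assumption is shown below. The function `parse_insights` is hypothetical and not part of the notebook, and the sample text is an abbreviated copy of the output above. Lines that do not match the pattern, such as a final item cut off by the generation limit, are skipped rather than guessed at.

```python
import re

# Matches lines such as "5. Payment and Billing Issues: Streamline the ..."
INSIGHT_PATTERN = re.compile(r"\s*(\d+)\.\s*([^:]+):\s*(.+)")


def parse_insights(text):
    """Split numbered 'N. Theme: action' lines into a list of dicts."""
    records = []
    for line in text.splitlines():
        match = INSIGHT_PATTERN.match(line)
        if match:
            number, theme, action = match.groups()
            records.append(
                {"number": int(number), "theme": theme.strip(), "action": action.strip()}
            )
    return records


# Abbreviated sample; the last line imitates a truncated item.
sample_insights = (
    "5. Payment and Billing Issues: Streamline the payment process.\n"
    "6. Website Usability and A"
)
for record in parse_insights(sample_insights):
    print(record["number"], "|", record["theme"], "|", record["action"])
```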

This LLM output reframes the granular complaint clusters from the earlier Runs 1–3 as strategic improvement themes.

The core issues are consistent with those runs (cleanliness, equipment, staff, payments, parking, atmosphere), but the LLM lifts them into broader categories such as "Customer Safety and Security" and "Website Usability".

This abstraction introduces some redundancy (e.g. multiple customer service/satisfaction categories), but also surfaces new actionable angles:

- Safety as a distinct management focus (lockers, overcrowding, fire alarms).

- Website usability as part of the access/joining/cancellation journey.

- Communication as an organisational capability, not just staff politeness.


Compared with BERTopic’s raw clusters, this output is less diagnostic but more strategic, which makes it better suited to senior stakeholders and action planning.
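
The redundancy noted above can be checked roughly by measuring word overlap between theme titles. The sketch below uses Jaccard similarity on the title words. The helper names, the stop-word set and the 0.15 threshold are all assumptions chosen for illustration, and the four themes are copied from the printed LLM output.

```python
from itertools import combinations

# Assumed stop words; only connective words that appear in theme titles.
STOPWORDS = {"and", "of", "the"}


def title_tokens(theme):
    """Return the lower-cased content words of a theme title."""
    return {word for word in theme.lower().split() if word not in STOPWORDS}


def overlapping_themes(themes, threshold=0.15):
    """Return theme pairs whose Jaccard word overlap meets the threshold."""
    pairs = []
    for first, second in combinations(themes, 2):
        tokens_a, tokens_b = title_tokens(first), title_tokens(second)
        union = tokens_a | tokens_b
        score = len(tokens_a & tokens_b) / len(union) if union else 0.0
        if score >= threshold:
            pairs.append((first, second, round(score, 2)))
    return pairs


themes = [
    "Facility Cleanliness and Maintenance",
    "Customer Service and Staff Behavior",
    "Equipment Availability and Maintenance",
    "Customer Experience and Satisfaction",
]
for first, second, score in overlapping_themes(themes):
    print(f"{first} <-> {second}: {score}")
```

Title overlap is only a coarse signal. Pairs flagged this way still need a human decision on whether to merge them.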

In [None]:
# End time to track the total execution time of the notebook
end_time = time.time()
total_time = end_time - start_time
# Convert to hours, minutes, and seconds
mins, secs = divmod(total_time, 60)
hrs, mins = divmod(mins, 60)

print(f"Total execution time: {int(hrs)}h {int(mins)}m {int(secs)}s")

Total execution time: 0h 22m 34s
