# Transfer Learning for Aspect-Based Sentiment Analysis (ABSA) on Hotel Reviews

__Motivation:__

Hotel reviews on sites like TripAdvisor are a treasure trove of feedback that hotels can use to improve. However, each review consists only of lengthy free text and one overall rating, which does not help a hotel identify areas for improvement quickly, and reading through every review is too time-consuming.

Instead, we can define categories such as Food, Cost, Cleanliness and Service, assign each review a sentiment (positive, neutral or negative) for every category it mentions, and then average the sentiment per category over all reviews.


PyABSA is an open-source framework for Aspect-Based Sentiment Analysis (ABSA); see its [GitHub repo](https://github.com/yangheng95/PyABSA). The model used here is pre-trained on restaurant data, a domain that, like hotels, belongs to the service industry. Using transfer learning, we fine-tune this model on the limited labelled hotel data, which worked surprisingly well.

## I. Model Training

### Install pyabsa

In [None]:
!pip install pyabsa

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pyabsa
  Downloading pyabsa-2.3.1-py3-none-any.whl (526 kB)
     526.1/526.1 kB 34.7 MB/s eta 0:00:00
Collecting findfile>=2.0.0 (from pyabsa)
  Downloading findfile-2.0.0-py3-none-any.whl (9.3 kB)
Collecting autocuda>=0.16 (from pyabsa)
  Downloading autocuda-0.16-py3-none-any.whl (5.1 kB)
Collecting metric-visualizer>=0.9.6 (from pyabsa)
  Downloading metric_visualizer-0.9.7-py3-none-any.whl (24 kB)
Collecting boostaug>=2.3.5 (from pyabsa)
  Downloading boostaug-2.3.5-py3-none-any.whl (16 kB)
Collecting seqeval (from pyabsa)
  Downloading seqeval-1.2.2.tar.gz (43 kB)
     43.6/43.6 kB 6.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting update-checker (from pyabsa)
  Download

### Reformat labelled data for model

In [None]:
import pandas as pd
annotated_reviews = pd.read_excel("/content/labelled_hotel_reviews.xlsx")

In [None]:
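# Without parentheses this displays the bound method's repr (the whole frame);
# call annotated_reviews.head() to show only the first five rows.
annotated_reviews.head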

<bound method NDFrame.head of           rid          id  OutOfScope  \
0    73821_30  73821_30#0         NaN   
1    73821_30  73821_30#1         NaN   
2    73821_30  73821_30#1         NaN   
3    73821_30  73821_30#2         NaN   
4    73821_30  73821_30#2         NaN   
..        ...         ...         ...   
373  73718_19  73718_19#2         NaN   
374  73718_19  73718_19#2         NaN   
375  73718_19  73718_19#2         NaN   
376  73718_19  73718_19#3         1.0   
377  73718_19  73718_19#4         NaN   

                                                  text  Opinion       target  \
0                                  Room Was Acceptable      NaN         Room   
1    The room was nice and the furnishings were com...      NaN         room   
2    The room was nice and the furnishings were com...      NaN  furnishings   
3    The food from the restaurant was very good but...      NaN         food   
4    The food from the restaurant was very good but...      NaN         food 

In [None]:
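def format_text(text, begin, end, target, polarity):
    """Build one APC sample: sentence with the target masked as $T$, then target, then polarity."""
    try:
        begin = int(begin)
        end = int(end)
        new_text = text[:begin] + "$T$" + text[end:]
        final_text = new_text + "\n" + target + "\n" + polarity + "\n"
        return final_text
    except (TypeError, ValueError):
        # Skip rows without a target span (e.g. "overall" sentences), where the fields are NaN
        return None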
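
To make the target format concrete, the sketch below re-declares the same masking logic and applies it to one hand-made row. The sentence, character offsets and label are made-up illustrations, not rows from the labelled dataset. Each sample becomes three lines: the sentence with the aspect replaced by `$T$`, the aspect term, and its polarity.

```python
def format_text(text, begin, end, target, polarity):
    """Same logic as the cell above: mask the aspect span with $T$."""
    begin, end = int(begin), int(end)
    masked = text[:begin] + "$T$" + text[end:]
    return masked + "\n" + target + "\n" + polarity + "\n"

# Hypothetical annotation: "staff" spans characters 4..9 of the sentence.
sample = format_text("The staff was friendly", 4, 9, "staff", "Positive")
print(sample)
# The $T$ was friendly
# staff
# Positive
```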

In [None]:
annotated_reviews["reformated"] = annotated_reviews.apply(
    lambda x: format_text(x["text"], x["from"], x["to"], x["target"], x["polarity"]), axis=1)

In [None]:
# Drop the rows that format_text skipped (they are None/NaN)
formated_df = annotated_reviews["reformated"].dropna()

In [None]:
with open("/content/hotel.train.txt", "w") as txtfile:
    for line in formated_df:
        txtfile.write(line)
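
Because every sample must occupy exactly three lines, a review that contains a stray line break would silently shift all later samples. The following is a minimal sanity check, not part of PyABSA; `check_apc_file` is a hypothetical helper name, and the demo runs on a temporary file with two made-up samples rather than on `/content/hotel.train.txt`.

```python
import os
import tempfile

def check_apc_file(path):
    """Return the number of samples, raising ValueError on malformed ones."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    if len(lines) % 3 != 0:
        raise ValueError(f"expected a multiple of 3 lines, got {len(lines)}")
    for i in range(0, len(lines), 3):
        sentence, target, polarity = lines[i:i + 3]
        if "$T$" not in sentence:
            raise ValueError(f"sample starting at line {i + 1} has no $T$ placeholder")
        if not target or not polarity:
            raise ValueError(f"sample starting at line {i + 1} has an empty field")
    return len(lines) // 3

# Self-contained demo on a temporary file with two made-up samples.
demo = "The $T$ was friendly\nstaff\nPositive\nThe $T$ was cold\nfood\nNegative\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(demo)
n_samples = check_apc_file(tmp.name)
os.remove(tmp.name)
print(n_samples)  # 2
```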

In [None]:
# Sometimes the import statement crashes the runtime. Just re-run it until it works.
from pyabsa import convert_apc_set_to_atepc_set
convert_apc_set_to_atepc_set("/content/hotel.train.txt")
# convert_apc_set_to_atepc_set("/content/hotel.test.txt")



[2023-06-12 08:04:33] (2.3.1) PyABSA(2.3.1): If your code crashes on Colab, please use the GPU runtime. Then run "pip install pyabsa[dev] -U" and restart the kernel.
Or if it does not work, you can use v1.16.27

[New Feature] Aspect Sentiment Triplet Extraction since v2.1.0 (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_sentiment_triplet_extration)
[New Feature] Aspect CategoryOpinion Sentiment Quadruple Extraction since v2.2.0 (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_opinion_sentiment_category_extraction)



### Train the model
- Requires GPU
- Before training, create a folder with the structure shown below, and rename the txt files generated in the previous step to `hotel.train.txt.atepc` and `hotel.test.txt.atepc` respectively.
```
integrated_datasets
|__ /atepc_datasets
    |__ /10.Hotel
        |__ hotel.train.txt.atepc
        |__ hotel.test.txt.atepc
```
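
The folder layout above can also be created programmatically. This is a sketch under assumptions: `place_atepc_file` is a hypothetical helper, and the name of the file produced by the conversion step is passed in explicitly because it is not stated here. The demo runs inside a temporary directory so it does not touch `/content`.

```python
import shutil
import tempfile
from pathlib import Path

def place_atepc_file(converted_file, split, root="integrated_datasets"):
    """Copy a converted file to <root>/atepc_datasets/10.Hotel/hotel.<split>.txt.atepc."""
    target_dir = Path(root) / "atepc_datasets" / "10.Hotel"
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / f"hotel.{split}.txt.atepc"
    shutil.copy(converted_file, destination)
    return destination

# Demo with a placeholder file; the source file name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "converted_train_file.txt"
    source.write_text("placeholder\n", encoding="utf-8")
    dest = place_atepc_file(source, "train", root=Path(tmp) / "integrated_datasets")
    relative = dest.relative_to(tmp).as_posix()
print(relative)  # integrated_datasets/atepc_datasets/10.Hotel/hotel.train.txt.atepc
```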

In [None]:
from pyabsa import AspectTermExtraction as ATEPC, DeviceTypeOption, ModelSaveOption

[2023-04-14 02:34:24] (2.2.2) PyABSA(2.2.2): 
[New Feature] Aspect Sentiment Triplet Extraction from v2.1.0 test version (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_sentiment_triplet_extration)
[New Feature] Aspect CategoryOpinion Sentiment Quadruple Extraction from v2.2.0 test version (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_opinion_sentiment_category_extraction)

If you find any problems, please report them on GitHub. Thanks!
The v2.x versions are not compatible with Google Colab. Please downgrade to 1.16.27.



  _warn(f"unclosed running multiprocessing pool {self!r}",


In [None]:
# Define the configuration
config = ATEPC.ATEPCConfigManager.get_atepc_config_english()
config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC
config.evaluate_begin = 0
config.num_epoch = 1
config.log_step = -1

# Fine-tune from the pre-trained English checkpoint on the hotel dataset, then load the trained model
dataset = "10.Hotel"
aspect_extractor = ATEPC.ATEPCTrainer(
    config=config,
    dataset=dataset,
    from_checkpoint="english",
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    auto_device=DeviceTypeOption.AUTO,
    path_to_save="content"
    ).load_trained_model()

[2023-04-14 02:34:30] (2.2.2) Set Model Device: cuda:0
[2023-04-14 02:34:30] (2.2.2) Device Name: Tesla T4


## II. Performance Metrics

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


The snippet below unzips the checkpoints of the model trained on restaurant data. The zip file was taken from the author's folder at https://huggingface.co/spaces/yangheng/Multilingual-Aspect-Based-Sentiment-Analysis/tree/main/checkpoint/Multilingual/ATEPC and is read here from the mounted Google Drive.

In [None]:
!unzip /content/drive/MyDrive/fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96.zip

Archive:  /content/drive/MyDrive/fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96.zip
   creating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/
  inflating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.state_dict  
  inflating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.args.txt  
  inflating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.tokenizer  
  inflating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.config  


In [None]:
# If import crashes runtime,  retry until import succeeds
from pyabsa import AspectTermExtraction as ATEPC


In [None]:
aspect_extractor = ATEPC.AspectExtractor('fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96',
                                         auto_device=True,  # False means load model on CPU
                                         cal_perplexity=True,
                                         )

[2023-04-14 13:13:08] (2.2.2) Load aspect extractor from fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96
[2023-04-14 13:13:08] (2.2.2) config: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.config
[2023-04-14 13:13:08] (2.2.2) state_dict: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.state_dict
[2023-04-14 13:13:08] (2.2.2) model: None
[2023-04-14 13:13:08] (2.2.2) tokenizer: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.tokenizer
[2023-04-14 13:13:09] (2.2.2) Set Model Device: cuda:0
[2023-04-14 13:13:09] (2.2.2) Device Name: Tesla T4


Downloading (…)lve/main/config.json:   0%|          | 0.00/579 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/371M [00:00<?, ?B/s]

Some weights of the model checkpoint at microsoft/deberta-v3-base were not used when initializing DebertaV2Model: ['lm_predictions.lm_head.bias', 'mask_predictions.LayerNorm.weight', 'mask_predictions.dense.weight', 'mask_predictions.classifier.weight', 'lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.dense.bias', 'mask_predictions.classifier.bias', 'mask_predictions.dense.bias', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.LayerNorm.bias']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Downloading (…)okenizer_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

Downloading spm.model:   0%|          | 0.00/2.46M [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
# inference 
inference_source = "hotel_test.dat.apc.inference"
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_source, 
                          save_result=True,
                          print_result=True,  # print the result
                          pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                          )

[2023-04-14 13:19:12] (2.2.2) loading: hotel_test.dat.apc.inference



[2023-04-14 13:19:17] (2.2.2) The results of aspect term extraction have been saved in /content/Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json
[2023-04-14 13:19:17] (2.2.2) Example 0: " If you get a suite with a <stove:Negative Confidence:0.9048> dont cook except <dishes:Negative Confidence:0.8782> , <pots:Negative Confidence:0.8976> and <pans:Negative Confidence:0.903> they do not provide them you have to buy your own . "
[2023-04-14 13:19:17] (2.2.2) Example 1: " Great <hotel:Positive Confidence:0.9957> , great <location:Positive Confidence:0.9983> , great <price:Positive Confidence:0.9976> "
[2023-04-14 13:19:17] (2.2.2) Example 2: " I had a great <time:Positive Confidence:0.9983> at this <hotel:Positive Confidence:0.9983> and did not experience any of what the bad reviews are about . "
[2023-04-14 13:19:17] (2.2.2) Example 3: " The <staff:Positive Confidence:0.9894> was very friendly and helpful , the <rooms:Positive Confidence:0.9979> were large and 
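
The log above shows individual predictions. To summarise performance on the test split, the predicted polarities can be compared with the gold labels. The sketch below computes accuracy and macro-F1 in plain Python; the `gold` and `pred` lists are made-up placeholders, and it assumes the predicted aspects have already been aligned one-to-one with the annotated ones, which this notebook does not do.

```python
def accuracy(gold, pred):
    """Fraction of positions where the predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the labels that occur in gold or pred."""
    scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Placeholder labels for illustration only.
gold = ["Positive", "Negative", "Neutral", "Positive"]
pred = ["Positive", "Negative", "Positive", "Positive"]
acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(acc, round(f1, 3))
```

Macro-F1 weights every class equally, so it penalises a model that ignores the rare Neutral class even when accuracy looks high.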


## III. Inference on Britannia Hotel

### Britannia Hotel Reviews Preprocessing for Model Inference

In [None]:
import pandas as pd
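data = pd.read_csv('Hotel_Reviews.csv')  # CHANGE TO CORRECT FILE PATH
# .copy() yields an independent frame, so adding a column later does not raise SettingWithCopyWarning
reviews_df = data[data['Hotel_Name'] == 'Britannia International Hotel Canary Wharf'].copy()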


ParserError: ignored

In [None]:
cols = ["Negative_Review", "Positive_Review"]
reviews_df["Review"] = reviews_df[cols].apply(lambda row: '. '.join(row.values.astype(str)), axis=1)
mod_reviews_df = reviews_df[["Review"]]


In [None]:
# As before, .head without parentheses displays the whole frame via the method's repr
mod_reviews_df.head


<bound method NDFrame.head of                                                  Review
0      The car park was small and unpleasant People ...
1      We weren t told that the only spa facility op...
2      I asked how far the O2 was and got told a 7 m...
3      Hot stuffy room air con not working properly ...
4      Although the price seems like it is cheap you...
...                                                 ...
4784   Long wait for check in arrived at 6 30pm and ...
4785   concierge was uninformed .  good value for money
4786   I had no complaints.  Good location easy to p...
4787   Really shabby and run down hotel Needs a tota...
4788   Stains on the carpet peeling wallpaper scruff...

[4789 rows x 1 columns]>

In [None]:
reviews_list = mod_reviews_df["Review"].tolist()


In [None]:
with open("/content/hotel.dat.apc.inference","w") as outfile:
    for review in reviews_list:
        try:
            outfile.write(f"'{review}'" + "\n")
        except Exception as e:
            print(e)
            continue

### Load Model Checkpoints into Model

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!unzip /content/drive/MyDrive/fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96.zip

Archive:  /content/drive/MyDrive/fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96.zip
   creating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/
  inflating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.state_dict  
  inflating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.args.txt  
  inflating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.tokenizer  
  inflating: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.config  


In [None]:
# If import crashes runtime,  retry until import succeeds
from pyabsa import AspectTermExtraction as ATEPC


In [None]:
aspect_extractor = ATEPC.AspectExtractor('fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96',
                                         auto_device=True,  # False means load model on CPU
                                         cal_perplexity=True,
                                         )

[2023-04-13 16:45:34] (2.2.2) Load aspect extractor from fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96
[2023-04-13 16:45:34] (2.2.2) config: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.config
[2023-04-13 16:45:34] (2.2.2) state_dict: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.state_dict
[2023-04-13 16:45:34] (2.2.2) model: None
[2023-04-13 16:45:34] (2.2.2) tokenizer: fast_lcf_atepc_custom_dataset_cdw_apcacc_85.6_apcf1_77.7_atef1_80.96/fast_lcf_atepc.tokenizer
[2023-04-13 16:45:34] (2.2.2) Set Model Device: cuda:0
[2023-04-13 16:45:34] (2.2.2) Device Name: Tesla T4



In [None]:
inference_source = "hotel.dat.apc.inference"
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_source,  #
                          save_result=True,
                          print_result=True,  # print the result
                          pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                          )

[2023-04-13 16:49:05] (2.2.2) loading: hotel.dat.apc.inference


preparing ate inference dataloader: 100%|██████████| 4765/4765 [00:10<00:00, 450.14it/s] 
extracting aspect terms: 100%|██████████| 149/149 [00:40<00:00,  3.69it/s]
preparing apc inference dataloader: 100%|██████████| 8758/8758 [00:24<00:00, 360.61it/s]
classifying aspect sentiments: 100%|██████████| 274/274 [01:17<00:00,  3.54it/s]


[2023-04-13 16:51:41] (2.2.2) The results of aspect term extraction have been saved in /content/Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json
[2023-04-13 16:51:42] (2.2.2) Example 0: ' The <car park:Negative Confidence:0.9954> was small and unpleasant People with Mercedes and BMWs took over 2 spaces We were lucky to get a space after driving around about 10 times . The <location:Positive Confidence:0.9938> was excellent for getting to the O2 '
[2023-04-13 16:51:42] (2.2.2) Example 1: ' We weren t told that the only spa facility open was the <pool:Neutral Confidence:0.9878> and the <sauna:Neutral Confidence:0.9855> but we had already paid and had to find out ourselves when entering the spa area . The <house keeping lady:Positive Confidence:0.9983> made my boyfriends day with how funny she was '
[2023-04-13 16:51:42] (2.2.2) Example 2: ' I asked how far the O2 was and got told a 7 minute <walk:Negative Confidence:0.9961> no no way it was 2 trains away bein

### Clean inference data to achieve final output

In [None]:
import json

# Map each sentiment label in the result file to its counter key
SENTIMENT_KEYS = {"Positive": "pos", "Neutral": "neut", "Negative": "neg"}

aspect_dict = {}
# Use a context manager so the result file is closed after loading
with open('/content/Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json') as results:
    data = json.load(results)

for sentence in data:
    # Skip sentences in which no aspect term was extracted
    if not sentence["aspect"]:
        continue
    for aspect, sentiment in zip(sentence["aspect"], sentence["sentiment"]):
        aspect = aspect.lower()
        # Create the counters the first time an aspect term is seen
        counts = aspect_dict.setdefault(
            aspect, {"freq": 0, "pos": 0, "neut": 0, "neg": 0})
        if sentiment in SENTIMENT_KEYS:
            counts[SENTIMENT_KEYS[sentiment]] += 1
        counts["freq"] += 1
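
Lower-casing alone leaves singular and plural forms as separate keys, as the `bed`/`beds` and `view`/`views` entries in the output further down show. One possible refinement, sketched here as an assumption rather than as part of the original pipeline, folds a plural key into its singular when both exist. `merge_plurals` is a hypothetical helper and the naive "strip a trailing s" rule will miss irregular plurals; the counts in the demo are copied from the notebook output.

```python
def merge_plurals(aspect_counts):
    """Fold a plural key such as 'beds' into 'bed' when both keys exist."""
    # Copy the inner dicts so the input is left untouched
    merged = {k: dict(v) for k, v in aspect_counts.items()}
    for aspect in list(merged):
        singular = aspect[:-1]
        if aspect.endswith("s") and singular in merged:
            for key, value in merged.pop(aspect).items():
                merged[singular][key] += value
    return merged

counts = {
    "bed": {"freq": 326, "pos": 143, "neut": 6, "neg": 177},
    "beds": {"freq": 123, "pos": 30, "neut": 1, "neg": 92},
    "wifi": {"freq": 255, "pos": 34, "neut": 13, "neg": 208},
}
merged = merge_plurals(counts)
print(merged["bed"])
# {'freq': 449, 'pos': 173, 'neut': 7, 'neg': 269}
```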

In [None]:
# Sort aspects by how often they were mentioned, most frequent first
asc_dict = sorted(aspect_dict.items(), key=lambda x: x[1]["freq"], reverse=True)

In [None]:
def extract_freq_aspects(item_tuple):
    # Keep only aspects mentioned more than 50 times
    return item_tuple[1]["freq"] > 50

In [None]:
new_asc_dict = list(filter(extract_freq_aspects, asc_dict))


In [None]:
new_asc_dict

[('staff', {'freq': 912, 'pos': 534, 'neut': 3, 'neg': 375}),
 ('location', {'freq': 803, 'pos': 664, 'neut': 14, 'neg': 125}),
 ('room', {'freq': 747, 'pos': 358, 'neut': 19, 'neg': 370}),
 ('breakfast', {'freq': 447, 'pos': 142, 'neut': 77, 'neg': 228}),
 ('bed', {'freq': 326, 'pos': 143, 'neut': 6, 'neg': 177}),
 ('wifi', {'freq': 255, 'pos': 34, 'neut': 13, 'neg': 208}),
 ('rooms', {'freq': 240, 'pos': 97, 'neut': 5, 'neg': 138}),
 ('price', {'freq': 220, 'pos': 120, 'neut': 15, 'neg': 85}),
 ('view', {'freq': 166, 'pos': 149, 'neut': 2, 'neg': 15}),
 ('food', {'freq': 162, 'pos': 77, 'neut': 16, 'neg': 69}),
 ('decor', {'freq': 153, 'pos': 25, 'neut': 1, 'neg': 127}),
 ('bathroom', {'freq': 146, 'pos': 39, 'neut': 10, 'neg': 97}),
 ('beds', {'freq': 123, 'pos': 30, 'neut': 1, 'neg': 92}),
 ('service', {'freq': 114, 'pos': 48, 'neut': 3, 'neg': 63}),
 ('hotel', {'freq': 103, 'pos': 43, 'neut': 0, 'neg': 60}),
 ('bar', {'freq': 94, 'pos': 39, 'neut': 28, 'neg': 27}),
 ('reception', 

In [None]:
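def get_aspect_average(amenities_list):
    """Return the mean sentiment of a category, scoring each mention as
    -1 (negative), 0 (neutral) or +1 (positive)."""
    total_freq = 0
    neg_freq = 0
    neut_freq = 0
    pos_freq = 0
    # Pool the counts of every aspect term in the category
    for item in amenities_list:
        total_freq += item[1]["freq"]
        neg_freq += item[1]["neg"]
        neut_freq += item[1]["neut"]
        pos_freq += item[1]["pos"]
    num = (neg_freq * -1) + (neut_freq * 0) + (pos_freq * 1)
    avg = num / total_freq
    return avg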
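
As a worked example of this score, the sketch below recomputes the Cost category by hand from the `price` and `value` counts in the notebook output. `net_sentiment` is a hypothetical helper that mirrors the same arithmetic; unlike the function above it returns 0.0 for an empty category instead of raising a division error, which is a design choice of this sketch.

```python
def net_sentiment(pos, neut, neg):
    """Mean of +1 (positive), 0 (neutral) and -1 (negative) votes."""
    total = pos + neut + neg
    return (pos - neg) / total if total else 0.0

# 'price': 120 pos, 15 neut, 85 neg; 'value': 48 pos, 0 neut, 8 neg
cost_score = net_sentiment(pos=120 + 48, neut=15 + 0, neg=85 + 8)
print(round(cost_score, 5))  # 0.27174
```

The score ranges from -1 (every mention negative) to +1 (every mention positive); neutral mentions pull it towards 0.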

In [None]:
"""
From new_asc_dict, manually sort the aspects into categories pre-defined by me or by the
hotel's specifications.

A future improvement: once enough aspect data has accumulated, very frequent words
could be identified and sorted into categories automatically.
"""

# SERVICE
service = [('staff', {'freq': 912, 'pos': 534, 'neut': 3, 'neg': 375}),
('service', {'freq': 114, 'pos': 48, 'neut': 3, 'neg': 63}),
('reception', {'freq': 81, 'pos': 28, 'neut': 11, 'neg': 42}),
('hotel', {'freq': 103, 'pos': 43, 'neut': 0, 'neg': 60})]

# COST
cost = [('price', {'freq': 220, 'pos': 120, 'neut': 15, 'neg': 85}),
('value', {'freq': 56, 'pos': 48, 'neut': 0, 'neg': 8})]

# ROOM QUALITY
room_quality = [('bed', {'freq': 326, 'pos': 143, 'neut': 6, 'neg': 177}),
('bathroom', {'freq': 146, 'pos': 39, 'neut': 10, 'neg': 97}),
('beds', {'freq': 123, 'pos': 30, 'neut': 1, 'neg': 92}),
('wifi', {'freq': 255, 'pos': 34, 'neut': 13, 'neg': 208}),
('decor', {'freq': 153, 'pos': 25, 'neut': 1, 'neg': 127}), 
('air conditioning', {'freq': 56, 'pos': 2, 'neut': 0, 'neg': 54}),
('facilities', {'freq': 64, 'pos': 33, 'neut': 2, 'neg': 29}),
('shower', {'freq': 78, 'pos': 22, 'neut': 1, 'neg': 55}),
('windows', {'freq': 68, 'pos': 4, 'neut': 0, 'neg': 64}),
('furniture', {'freq': 75, 'pos': 11, 'neut': 1, 'neg': 63})]

# LOCATION
location = [('location', {'freq': 803, 'pos': 664, 'neut': 14, 'neg': 125}),
('view', {'freq': 166, 'pos': 149, 'neut': 2, 'neg': 15}),
('views', {'freq': 64, 'pos': 63, 'neut': 0, 'neg': 1}),
('place', {'freq': 68, 'pos': 37, 'neut': 0, 'neg': 31})]

# FOOD
food = [('food', {'freq': 162, 'pos': 77, 'neut': 16, 'neg': 69}),
('bar', {'freq': 94, 'pos': 39, 'neut': 28, 'neg': 27}),
('breakfast', {'freq': 447, 'pos': 142, 'neut': 77, 'neg': 228})]
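
The manual grouping above could be replaced by a lookup table once the frequent terms are known. The following is a minimal sketch of that idea, not the method used in this notebook: `CATEGORY_OF` and `group_by_category` are hypothetical names, and the table only covers a few of the terms listed above.

```python
# Hypothetical lookup table; extend it with the terms that matter to the hotel.
CATEGORY_OF = {
    "staff": "Service", "service": "Service", "reception": "Service",
    "price": "Cost", "value": "Cost",
    "bed": "Room Quality", "bathroom": "Room Quality", "wifi": "Room Quality",
    "location": "Location", "view": "Location",
    "food": "Food", "breakfast": "Food", "bar": "Food",
}

def group_by_category(aspect_items, category_of=CATEGORY_OF):
    """Group (aspect, counts) tuples by category; unknown aspects go to 'Uncategorised'."""
    groups = {}
    for aspect, counts in aspect_items:
        category = category_of.get(aspect, "Uncategorised")
        groups.setdefault(category, []).append((aspect, counts))
    return groups

# Three entries copied from the notebook output; 'decor' is deliberately not in the table.
items = [
    ("price", {"freq": 220, "pos": 120, "neut": 15, "neg": 85}),
    ("value", {"freq": 56, "pos": 48, "neut": 0, "neg": 8}),
    ("decor", {"freq": 153, "pos": 25, "neut": 1, "neg": 127}),
]
groups = group_by_category(items)
print({category: [a for a, _ in members] for category, members in groups.items()})
# {'Cost': ['price', 'value'], 'Uncategorised': ['decor']}
```

Each list in `groups` has the same shape as the hand-written category lists, so it can be passed to `get_aspect_average` unchanged. The "Uncategorised" bucket makes new terms visible so they can be added to the table.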


In [None]:
aspect_names = ["Service", "Cost", "Room Quality", "Location", "Food"]
aspects = [service, cost, room_quality, location, food]
averages = list(map(get_aspect_average,aspects))
for name, average in zip(aspect_names, averages):
    print(f"{name}: {round(average, 5)} \n")

Service: 0.09339 

Cost: 0.27174 

Room Quality: -0.46354 

Location: 0.67302 

Food: -0.09388 
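
To turn these scores into the quick overview described in the motivation, the categories can be ranked from weakest to strongest. The sketch below uses the scores printed above; the threshold of 0 for flagging a category is an assumption chosen for illustration.

```python
# Scores copied from the printed output above.
scores = {
    "Service": 0.09339,
    "Cost": 0.27174,
    "Room Quality": -0.46354,
    "Location": 0.67302,
    "Food": -0.09388,
}

# Sort ascending so the weakest categories come first.
ranked = sorted(scores.items(), key=lambda item: item[1])
for name, score in ranked:
    flag = "needs attention" if score < 0 else "ok"
    print(f"{name:<13}{score:>9.5f}  {flag}")
```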

