# DATA PRE-PROCESSING

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

df = pd.read_csv("data/BA_reviews.csv")
df.sample(5)

Unnamed: 0.1,Unnamed: 0,reviews
725,725,"✅ Trip Verified | Johannesburg to London. I booked a seat without luggage. Firstly one cannot make one booking for two passengers if only one has luggage. This caused an issue when trying to seat together. When the passenger without luggage tried to move seats (normally a free option for standard seats), you get advised that you have to pay. This was not disclosed when the booking was made. It was irritating because one of us was ill."
153,153,"Not Verified | London to Cairo. First, on this 5 hour mid morning flight the only complimentary food and drink were a tiny bag of pretzels and a small bottle of water. Even Southwest is more generous. When unable to connect my phone to order food, I hit the FA call button with no response for more than an hour. When the FA came to collect garbage I had to show him the call light and he gruffly asked me what was the matter. He used his phone to place the order."
770,770,✅ Trip Verified | The booking process was easy enough but they have reduced the baggage allowance in terms of the seats one can book. The cost of including a bag added considerably to the cheapest fare. Check-in was well managed and it was good to see passengers with too much luggage required to check their bags in. Departure was well managed and although there was a delay the captain communicated well. My gripe is that my seat was dirty and the window was also dirty. Some of the to buy M&S food items were not available and the lavatory I used was dirty. The staff were very professional and pleasant and the drop-down route map was nice especially for the younger passengers to enjoy. Hardworking crew.
10,10,"✅ Trip Verified | I wasn't going to bother reviewing this flight as I seem to be on a perpetual downer with BA but the airport experience convinced me otherwise. After having our flight class reduced from First to Business, then offered an alternative route on the outward leg in First to make up for the disappointment, they then reneged on this. As it was a special anniversary it sucked. Flying back we checked in online to our chosen window seats, at the gate we had to show our passports as we passport before boarding as we were hand luggage only. Trying to board we had our passports checked twice more and were issued new boarding cards for centre seats. Unbelievable! The flight itself was very quiet and only half the normal crew for this aircraft was unavailable. This meant no pre-departure drinks and a limited selection of food. The crew was great but what an earth is going on at BA, they have this unique knack to snatch defeat from the jaws of victory every time. I say this as the new club suites are great."
875,875,"✅ Trip Verified | Johannesburg to London . For supper I asked for warm / hot water to make milky drink as I don't drink tea or coffee for health reasons. Given with no problems. For breakfast I requested the same and I was refused. I explained I don't drink tea or coffee but milk and water."" We don't keep hot water, I can get it once we have finished serving all other customers"". That never came until the lady had to be called to take care of the unwell passenger. I kept my sugar, milk and stirring stick and asked another male cabin crew who was collecting trays after breakfast for the warm water, explaining I have been waiting. He said he will bring it in 5 seconds and he never came back. I felt like a nuisance, deliberately ignored and couldnt bother anyone and the flight started being bumpy in preparation for landing."


In [2]:
df.shape

(1000, 2)
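
Every sampled review opens with a verification flag (`✅ Trip Verified |` or `Not Verified |`) ahead of the actual text. A minimal sketch of separating that flag from the body before cleaning is shown here. The helper name `split_verification` is hypothetical and not part of the original notebook, and the pattern assumes that only the two flag spellings visible in the sample occur in the data.

```python
import re

# Hypothetical helper (not in the original notebook): matches the two flag
# spellings seen in the sample, with an optional leading check mark.
_FLAG = re.compile(r'^\s*(?:✅\s*)?(Trip Verified|Not Verified)\s*\|\s*(.*)$', re.DOTALL)

def split_verification(review):
    """Return (verified, body); verified is None when no flag is found."""
    match = _FLAG.match(review)
    if match is None:
        return None, review.strip()
    return match.group(1) == 'Trip Verified', match.group(2).strip()
```

Applied with `df['reviews'].apply(split_verification)`, this would yield one pair per row that could be stored in two new columns, which would keep the words `Trip` and `Verified` out of the cleaned text.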

# Text Cleaning

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [4]:
import pandas as pd
import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

def clean_text(text):
    # Only str is accepted: the str regex patterns below raise TypeError on bytes
    if isinstance(text, str):
        # Collapse runs of whitespace into single spaces
        text = re.sub(r'\s+', ' ', text).strip()

        # Remove URLs
        text = re.sub(r'http\S+', '', text)

        # Remove HTML entities such as &amp;
        text = re.sub(r'&\w+;', '', text)

        # Remove @usernames
        text = re.sub(r'@\w+', '', text)

        # Replace punctuation and emojis with spaces (letters, digits, and '#' are kept)
        text = re.sub(r'[^a-zA-Z0-9#]', ' ', text)

        # Remove words of 3 characters or fewer
        text = ' '.join([word for word in text.split() if len(word) > 3])

        # Remove stopwords (the NLTK list is lowercase, so capitalized words such as 'This' are kept)
        stop_words = set(stopwords.words('english'))
        text = ' '.join([word for word in text.split() if word not in stop_words])

        # Lemmatize words (treated as nouns by default)
        lemmatizer = WordNetLemmatizer()
        text = ' '.join([lemmatizer.lemmatize(word) for word in text.split()])

        # Remove duplicate words (a set does not preserve the original word order)
        text = ' '.join(set(text.split()))

    else:
        text = ''

    return text

# Apply the clean_text function to the 'reviews' column of the dataframe
df['clean_sentence_training'] = df['reviews'].apply(clean_text)
df[['reviews','clean_sentence_training']].sample(5)

Unnamed: 0,reviews,clean_sentence_training
330,"✅ Trip Verified | British Airways is taking reservations and then cancelling flights without giving any reasons. We had further plans that all needed to be cancelled because of the flights. BA did nothing to ease the process and offered absolutely no explanation of why the flight was cancelled, I am not going to travel on BA again and may not travel again this year at all.",giving going flight travel reason ease British Airways without process plan needed nothing year reservation Trip absolutely cancelled Verified taking explanation offered cancelling
740,"✅ Trip Verified | London to Cape Town in First and our first taste of the new 'soft' product. Still, unfortunately, a 30 year-old 747 that hadn't had its refurbishment and was showing its age. A wardrobe door that's falling off its hinges, a non-touch touch screen and a cupboard door in the washroom that won't stay shut don't pass muster in First and blemish the impression of the positive changes in the on-board service. The young mixed fleet crew did a reasonable job and served drinks and food efficiently without the wait other reviewers have complained about. Care was also taken in preparing and plating up the food. For once, nothing was overcooked and even the beef was decent. Bedding is also improved, with a good quality duvet and large pillow. The IFE remains the same low-res screen quality and the new headphones are uncomfortable worn for more than the duration of a single film. The goodies bag is much better and worth taking home. Overall, a pretty good flight, but it remains to be seen what BA will do with its long-in-the-tooth First Class when their new Club Suites are launched.",remains large reasonable Overall door tooth crew wait taste soft worth plating year pillow change fleet First unfortunately duvet goody Cape single Still washroom seen mixed cupboard shut food touch Care impression hinge launched quality pretty service Class stay efficiently showing refurbishment product flight drink film board reviewer much falling without good young screen first Suites overcooked complained even headphone duration Trip wardrobe also decent improved taken preparing Town home served beef muster worn nothing Bedding Verified taking blemish Club positive London pas uncomfortable better long
258,"Not Verified | Shout out to the help desk at Heathrow, we arrived to late to make our flight and the lady (with 32 years exp) was great, she had a tough crowd to deal with and was very kind and helpful in getting us a place to stay, food to eat and a flight home the next day.",desk make flight arrived tough home place food lady kind year getting Verified late Shout helpful next Heathrow crowd stay deal help great
921,✅ Trip Verified | Bangkok to London. Seating and interior old with a very small IFE screen. I did not like the meals. Had a short connection and i was rebooked from another partner. Upon arrival in LHR others for the same connecting flight were awaited but not me. I was told there would be service agents in purple everywhere but I found none. The fast track security was non existant. Total experience was bad.,partner Total security everywhere agent flight awaited none rebooked experience would arrival Seating like existant meal short screen Bangkok found another small Trip Verified told connection connecting others London purple service Upon interior track fast
604,"Not Verified | We have had some torrid experiences with BA - which we have not been shy to report. So when we flew yesterday with them from Heathrow to Austin and had a great flight in every way, it felt right to feedback on that. The staff in the upstairs business class cabin were, frankly, wonderful. Cheerful, efficient and calm. What more could you ask? The food was pretty nice and the champagne lovely. The beds were very comfortable and sleeping was easy. I literally cannot think of any real negatives. The luggage lockers were quite small but then it’s an older model of the plane so that probably explains it. If only BA could deliver this kind of experience on every occasion, they would once again be the “world’s favourite airline’. Sadly, the current CEO doesn’t seem able to inspire and motivate his staff to be consistently delightful. Perhaps once he moves on we can expect something better. Although the flight was a bit delayed on departure it arrives bang on time. As we entered the (literally empty) immigration area, we were met by the most charming customs officer you can imagine, who ended up giving us tourist tips and ideas for bars! Surreal, as anyone who regularly travels to the States, will testify! It felt as if we were living in the Truman Show at one point but in the nicest possible way. Get yourselves down to good ole Austin now that BA fly there direct. 
Fantastic.",giving feedback every report yesterday inspire direct imagine comfortable able Perhaps probably testify calm negative would cabin Surreal Although regularly bed small think area Show living world delayed champagne possible travel move current food entered plane wonderful felt literally pretty expect delightful Heathrow departure What tip charming sleeping nicest right flight favourite airline empty experience arrives Sadly frankly good Austin business locker cannot model could nice something States motivate occasion consistently deliver ended bar idea staff upstairs time torrid older Fantastic immigration class real tourist Truman flew efficient point kind luggage seem Verified Cheerful lovely quite custom better easy officer bang great explains anyone
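
The cleaned text in the output above no longer follows the order of the original review because `set` does not preserve insertion order. If word order matters downstream, one possible alternative is `dict.fromkeys`, which keeps the first occurrence of each word in place. This is a sketch of a swapped-in technique, not what the notebook's `clean_text` does, and `dedupe_keep_order` is a hypothetical name.

```python
def dedupe_keep_order(text):
    """Drop repeated words while keeping the first occurrence of each in place."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return ' '.join(dict.fromkeys(text.split()))

print(dedupe_keep_order('flight cancelled flight delayed cancelled'))
# flight cancelled delayed
```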


In [5]:
import nltk

nltk.download('omw-1.4')

[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/jeffersoncostales/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True

In [6]:
df["clean_sentence_training"] = df["reviews"].apply(clean_text)

In [7]:
df.sample(10)

Unnamed: 0.1,Unnamed: 0,reviews,clean_sentence_training
945,945,✅ Trip Verified | Heraklion to Gatwick. Left my luggage behind on the belt by mistake. 30 mins after I realised I went back to the airport but no British Airways staff was willing to help. Asked me to ring a phone from the arrival hall pillar. Dialed for BA baggage enquiry and staff picked up the phone and hung up on purpose and never answered again. This is so frustrating.,pillar frustrating mistake Gatwick min back Heraklion baggage Dialed British went staff Airways arrival enquiry hung picked purpose realised answered Asked This never luggage airport Trip Left Verified hall ring phone belt willing help behind
554,554,"✅ Trip Verified | Tirana to London Gatwick. I fly British Airways around six times a month thanks to my work, and while I almost always have a pleasant experience, this past flight definitely outdid all the others. We boarded and departed exactly on time, with a very friendly and welcoming crew. While the flight was fully booked, my travel companion and I were lucky that no one was sitting on our row so that we could have an empty middle seat, and therefore more space. The best surprise of this flight by far was the availability of Wi-Fi which I have never had on any other British Airways flight within Europe. In fact, the Wi-Fi package I got which included only texting, was unlimited and incredibly fast - as good as my home connection! This truly made this flight more enjoyable as it allowed me to continue my business even during the flight which is extremely valuable to me. Something that British Airways can improve on would be leg space, which in this flight I noticed was particularly tight, although that is a common problem for me as I am 1.88 cm tall. Also, for unknown reasons, no food or snacks were served on this 2 hours 45 minute flight, only drinks. This was a slight struggle as I had skipped breakfast, but I managed well. The flight however arrived ahead of schedule and we de-boarded quickly. 
I always recommend BA - and will continue to do so.",well included skipped best texting ahead reason friendly fully however valuable breakfast would Airways problem crew common exactly space recommend pleasant booked others always noticed continue companion struggle tight travel Also arrived boarded tall slight fact While food enjoyable allowed surprise This snack thanks around schedule work truly Gatwick flight empty improve drink availability extremely experience British Tirana definitely therefore good business package within quickly made could unlimited unknown even Trip month hour Europe past welcoming outdid seat almost home minute sitting served time although never incredibly particularly departed Verified connection London lucky managed middle fast Something
627,627,"✅ Trip Verified | London to Abu Dhabi. This is the daytime flight from London. A very good flight. The food was excellent for economy (particularly in view of Etihad‘s disastrous changes in food service). The children’s meals were excellent. The entertainment system was great with a good selection. It makes a big difference having the entertainment system on when you board and not turn it off until the plane has reached the gate, especially when flying with children (Etihad, why don’t you do this?). The cabin crew were good. Economy in the 787 is cramped but probably no worse than many. Good value flight, especially as the only other carrier to Abu Dhabi is Etihad.",entertainment reached difference Dhabi make flight cramped worse system child probably board good Good Etihad crew economy food meal gate plane disastrous cabin This many particularly daytime flying Trip Economy Verified carrier change London service especially selection turn value excellent view great
654,654,"✅ Trip Verified | London Heathrow to Inverness. Having previously written a review about the shockingly appalling experience with BA so far this summer, I felt the need to update with a new review as the final flight home I was rebooked on - after short-notice cancellation, which I had to wait over a day for - was also cancelled due to a fault with the aircraft. We were sat on the plane for over an hour whilst the crew were waiting to hear from ground crew what was happening and where to go. Eventually got off the plane to utter chaos - mixed messages being given by ground crew. Some people given details about rebooking, others given nothing. Some people were sent texts with rebooking options, others (including me) received nothing. Vouchers handed out for hotel accommodation and shuttle bus to and from the hotels - however some people were only given one bus voucher, meaning they had no means to return to the airport the next day! Some people rebooked on a flight in two days time and told they’d have to return to the airport the next morning anyway to get any further refreshment or accommodation vouchers. Utterly appalling. Now I’m hoping against hope I might make it home today, 3 days after setting off!",today previously summer review text day however aircraft option crew wait short detail whilst sent hope final Eventually Vouchers others Some anyway mean handed fault Having hear waiting mixed people plane meaning written felt utter message refreshment Heathrow return shuttle shockingly flight hotel cancellation notice experience Inverness airport voucher morning Trip also cancelled appalling told hour Utterly next including ground received make rebooking rebooked might need home time happening nothing accommodation hoping Verified chaos London update setting given
509,509,"✅ Trip Verified | London Heathrow to Rio De Janeiro. Not the usual aircraft for this route but it is one of my favourites, this aircraft today has a First Class but was flown empty as it usually only offers Club World. Flight not full, Seated in 7F aisle seat, no one in the middle seat so very pleasant. It is away from the Toilet over at 7A/B but next to galley area which was great as it is a day flight. Food (pre-ordered) was very good as usual, plenty of drinks offered crew very keen to assist, again nothing any trouble at all. I did watch some films but I had already been on a previous BA Long Haul so was not really looking for much. I had some rest as it was a long day and I getting a cold! Arrived late due to a technical problem at Heathrow, they managed to fix the problem on the taxiway to the runway but after an hour I must say I didn't really need to be told they were trying to fix the problem rather than return to the departure gate. All in all another very good flight.",Janeiro today Arrived runway aisle seat favourite flight empty drink film rest taxiway Seated aircraft keen usual cold assist must offer Flight full much Haul good problem need rather crew technical gate World away watch trouble usually nothing previous plenty another trying Food looking getting Trip flown galley already Verified late Club pleasant area Long told Toilet London ordered hour next route offered really First Heathrow Class long managed return middle departure great
355,355,"✅ Trip Verified | I have just returned from an amazing holiday and felt compelled to write a review. The flight is generally a torment for me being a nervous flyer, however, on this occasion it was an absolute delight. On both inward and outward journeys, I was impressed by the professionalism, friendliness and kindness of the cabin crew. I was in economy but was treated with the respect and service I would expect in the first class cabin. My particular thanks must go to Amir, who was absolute joy and a credit to British Airways. I cannot wait to fly again with you and that from me is huge, Thank you so so much.",flight journey credit inward Thank review absolute however British generally flyer must nervous much would Airways compelled professionalism friendliness crew economy wait cannot delight class cabin felt first treated huge outward Trip Verified occasion respect expect kindness impressed service Amir returned torment particular thanks write amazing holiday
572,572,"✅ Trip Verified | Frankfurt to London. Flight attendants very kind when flying in. Return flight was canceled at London City Airport and it took almost three hours to get rebooking, hotel voucher and transportation arranged. That could be done far more efficiently. No water, nothing, no necessities.",Frankfurt three flight hotel canceled almost City rebooking Flight done arranged necessity Airport kind nothing could voucher flying attendant Trip Verified London took hour water Return efficiently transportation That
883,883,"✅ Trip Verified | Bucharest to London. They are not giving free food and drinks anymore, you have to pay on short haul. In Bucharest they did not park at the terminal, they shuttle us to and from the plane in the bus. They are charging for checked luggage and ask people to check their hand luggage because 'the flights are too busy'. Avios loyalty reward points will buy you a lot less than it used to. I have no problem with BA turning into a budget airline, but at least charge accordingly, otherwise, it feels like we're being scammed.",feel giving le anymore terminal flight airline charge drink checked accordingly Avios Bucharest budget otherwise people problem like turning scammed check reward used food charging short plane least haul point They luggage busy Trip Verified hand park London free shuttle loyalty
818,818,"✅ Trip Verified | London to Athens. Classic BA love and hate relationship where one flight is perfect and another one is a total disaster. Problems in December 2018 at Heathrow check-in when BA check-in and baggage management IT systems went down for at least 90 minutes. Total chaos with people like myself arrived at the airport 2:30 hours before the flight and queuing while others arriving 45 minutes prior to their departure and given priority. Even though we were checked in at some point, the flight was delayed by 2+ hours on top of the problems. Then flight does not have enough food onboard. At least the crew clearly attempted to reduce flight time to the minimum.",Total Then though Even flight relationship attempted arrived system checked Classic baggage clearly queuing minute went people problem like time management total check crew food Athens reduce least point perfect prior another hate airport Trip love onboard disaster Verified 2018 chaos others London hour December priority Heathrow delayed enough departure arriving minimum Problems given
432,432,✅ Trip Verified | Tokyo to London. 12 hours without anything to eat because the sandwich I was given was inedible. There was no bar service and soft drinks were pitiful with orange running out after a matter of hours. Total waste of money having paid for Premium Economy. I am aware it was the beginning of the Covid19 issues but the situation was appalling and has been made worse by a total lack of follow up service from British Airways. Their attitude is unacceptable - it is all about money with them.,Total Covid19 worse drink British Their without Airways waste attitude total anything running inedible soft matter made Tokyo orange beginning Trip pitiful issue Economy Verified appalling lack London hour Premium situation unacceptable service aware follow money There paid given sandwich
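
Capitalized function words such as `This`, `They`, and `There` survive in the cleaned column above because NLTK's English stopword list is lowercase and `clean_text` compares words case-sensitively. A minimal sketch of a case-insensitive filter follows. `remove_stopwords` is a hypothetical helper, and the small stopword set stands in for `stopwords.words('english')` so that the example runs without NLTK data.

```python
def remove_stopwords(text, stop_words):
    """Drop words whose lowercase form is in stop_words; other words keep their casing."""
    return ' '.join(word for word in text.split() if word.lower() not in stop_words)

# Stand-in stopword set for illustration only
demo_stop_words = {'this', 'they', 'there'}
print(remove_stopwords('This flight They cancelled There', demo_stop_words))
# flight cancelled
```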


In [8]:
from textblob import TextBlob

# Define a function to get the sentiment polarity score
def get_sentiment(text):
    return TextBlob(text).sentiment.polarity

# Apply the function to the 'clean_sentence_training' column and round the result to 2 decimal places
df['textblob_polarity'] = df['clean_sentence_training'].apply(get_sentiment).round(2)

# Define a function to categorize the sentiment polarity score into 5 categories
def categorize_sentiment(score):
    if score >= 0.5:
        return 'Positive'
    elif 0.05 <= score < 0.5:
        return 'Partially Positive'
    elif -0.05 < score < 0.05:
        return 'Neutral'
    elif -0.5 < score <= -0.05:
        return 'Partially Negative'
    else:
        return 'Negative'

# Apply the categorize_sentiment function to the 'textblob_polarity' column
df['sentiment_textblob'] = df['textblob_polarity'].apply(categorize_sentiment)

# Select the relevant columns for display
df[['clean_sentence_training', 'textblob_polarity', 'sentiment_textblob']].sample(5)

Unnamed: 0,clean_sentence_training,textblob_polarity,sentiment_textblob
731,occasionally entertainment seat Gatwick derisory flight maximum system Helpful rubbish touching correctly without time magazine crew Lima touch cabin screen condition failed poor courteous sound warning small Trip movie Verified increased unavailable handset uncomfortable selection shown respond level Movies,-0.02,Neutral
56,Supposed best give able text Overall show would option halfway Rebooked They consider story issue already Eventually unable thing really said working anyway dreadful behind fault sensible worse Arrive centre unless Avoid telling apps anything boarding Only 13th taxi took Return intact departure help despite helpfully happened minus showing Quickly flight Rotterdam email check rating screen care card could even March drop cancelled told automated onto landed desk seat agent call guarantee rebooked printed need Rebook nothing hide Verified confirmed Desk standby stop overbooked train similar moved,0.0,Neutral
881,entertainment though seat Gatwick control Avoid system last different day looked mark remember rate crumb Poor class cabin year carpet flying Trip dirty cleaned Verified finger service route allows Barbados First avoided disgusting third properly,-0.22,Partially Negative
717,Boeing retro seat high almost comfortable stood Flight gave good friendliness wide food storage BOAC portion plenty nice Trip Verified average colour hour service delayed level generous,0.27,Partially Positive
721,seat right flight properly placed bigger place travelling came asked cabin instead luggage space attendant Trip asking keep Verified Somebody remaining kept When oversized laptop elsewhere help overhead moved fair,0.24,Partially Positive
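
The five bands used by `categorize_sentiment` are symmetric around zero: a magnitude below 0.05 is neutral, a magnitude of 0.5 or more is fully positive or negative, and anything in between is partial. The sketch below restates the same thresholds in that form and prints the label at each band edge. `categorize_by_magnitude` is a hypothetical name, and it assumes a finite score (it does not reproduce the original's handling of `NaN`).

```python
def categorize_by_magnitude(score):
    """Map a finite polarity score in [-1, 1] to one of five labels."""
    magnitude = abs(score)
    if magnitude < 0.05:
        return 'Neutral'
    strength = '' if magnitude >= 0.5 else 'Partially '
    direction = 'Positive' if score > 0 else 'Negative'
    return strength + direction

# Band edges, using the same inclusive/exclusive bounds as categorize_sentiment
for score in (0.5, 0.05, 0.0, -0.05, -0.5):
    print(score, categorize_by_magnitude(score))
```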


In [9]:
df.to_csv('data/data_pre_processing.csv', index=False)