# Data Extraction Pipeline

> processing pipeline with documentation

## 1. Imports

In [1]:
import numpy as np
import pandas as pd
import re
from pathlib import Path

## 2. Reading Data from file

In [None]:
data = (Path.cwd() / "4U_Reviews.txt").read_text(encoding="latin-1")

In [4]:
data[:100]

'"they have been unresponsive"\nBrian McNamee (Canada) 16th October 2015\n2\nWe flew with Germanwings (o'

The file is read as one long string rather than as structured records. Consecutive reviews are separated by a blank line, so splitting on "\n\n" yields one string per review.

In [6]:
print(data.split("\n\n")[0])

"they have been unresponsive"
Brian McNamee (Canada) 16th October 2015
2
We flew with Germanwings (or tried to) on September 30th. After two lengthy delays passengers were notified that the flight was cancelled and to return to the check-in counter for further information. After waiting at the counter for another 30 minutes or so, staff began issuing refunds or providing alternate travel options to passengers. By this point it was quite late in the evening with limited options (plus we were 5+ hours after planned departure). We managed to get a voucher with 3 other passenger to get a flight from Stuttgart to Zurich with an ongoing connection to Amsterdam (flights were with Swiss). We made the flight and the connector (which was held back waiting for about 70 of us) however luggage did not make it. Short of this is the airline was ill-prepared to manage this. They offered little information or options to passengers. since returning home I have been attempting to contact them via email a

## 3. Data Exploration & Transformation

### Extracting structured fields and the free-text body from customer reviews
#### The reviews do not all follow the same layout, so the data has to be processed before it can be used

In [21]:
reviews = data.strip().split("\n\n")

In [98]:
type(reviews), len(reviews)

(list, 128)

In [100]:
rev_arr = reviews[0].split("\n")
rev_arr

title = rev_arr[0]
name_country_date = rev_arr[1]
score = rev_arr[2]


title, name_country_date, score

('"they have been unresponsive"',
 'Brian McNamee (Canada) 16th October 2015',
 '2')

In [101]:
name_country_date

'Brian McNamee (Canada) 16th October 2015'

In [102]:
name_country_date.split("(")[1].split(")")[0]



'Canada'

In [103]:
def get_score_from_review(review):
    split_review = review.split("\n")
    score = split_review[2]
    return int(score) if score!="na" else score

In [104]:
def get_country_from_review(review):
    try:
        split_review = review.split("\n")
        country = split_review[1].split("(")[1].split(")")[0]
        return country
    except IndexError:
        # The review has no second line, or that line has no "(country)" part
        return "unknown"

In [2]:
def get_recommendation_from_review(review):
    try:
        split_review = review.split("\n")
        for split in split_review:
            if split.startswith("Recommended"):
                recommended = split.split("\t")[1]
                return recommended
        return "na"
    except:
        return "na"

In [3]:
def get_cabinflown_from_review(review):
    try:
        split_review = review.split("\n")
        for split in split_review:
            if split.startswith("Cabin Flown"):
                cabinflown = split.split("\t")[1].strip()
                return cabinflown
        return "unknown"
    except Exception as error:
        print(f"Error processing review: {error}")
        return "unknown"

In [13]:
def get_traveltype_from_review(review):
    try:
        split_review = review.split("\n")
        for split in split_review:
            if split.startswith("Type Of Traveller"):  
                traveltype = split.split("\t")[1].strip()
                print(traveltype)
                if traveltype == "Business":
                    return traveltype
                else:
                    traveltype = "Leisure"
                    return traveltype
        return "unknown"
    except:
        return "unknown"


In [108]:
def get_text_from_review(review):
    split_review = review.split("\n")
    text = split_review[3]
    return text

In [109]:
def get_year_from_review(review):
    try:
        split_review = review.split("\n")
        year = split_review[1].split(" ")[-1]
        return int(year)
    except (IndexError, ValueError):
        # Missing header line, or its last token is not a number
        return "unknown"

In [111]:
scores = []
countries = []
recommendations = []
cabinflowns = []
traveltypes = []
texts = []
years = []

for review in reviews:
    score = get_score_from_review(review)
    scores.append(score)

    country = get_country_from_review(review)
    countries.append(country)

    recommended = get_recommendation_from_review(review)
    recommendations.append(recommended)
    
    cabinflown = get_cabinflown_from_review(review)
    cabinflowns.append(cabinflown)

    traveltype = get_traveltype_from_review(review)
    traveltypes.append(traveltype)

    text = get_text_from_review(review)
    texts.append(text)

    year = get_year_from_review(review)
    years.append(year)

In [112]:
df = pd.DataFrame({"score": scores,
                   "country": countries,
                   "recommended": recommendations,
                   "cabinflown": cabinflowns,
                   "traveltype": traveltypes,
                   "text": texts,
                   "year": years
                   })
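The per-field helpers above each split the review again. As a minimal sketch, the same extraction can be done in one pass per review. The function name `parse_review` is hypothetical, and the layout it relies on (title, then name/country/date, then score, then text, followed by tab-separated `Key<TAB>Value` lines) is an assumption based on the helpers above, which match keys with `startswith` rather than exactly.

```python
import re


def parse_review(review: str) -> dict:
    """Parse one raw review block into a flat record (layout is an assumption)."""
    lines = review.split("\n")
    header = lines[1] if len(lines) > 1 else ""
    raw_score = lines[2].strip() if len(lines) > 2 else "na"

    # Collect every "Key<TAB>Value" line after the text into a lookup table.
    fields = {}
    for line in lines[4:]:
        key, sep, value = line.partition("\t")
        if sep:
            fields[key.strip()] = value.strip()

    country = re.search(r"\(([^)]*)\)", header)
    year = re.search(r"(\d{4})\s*$", header)
    return {
        "score": int(raw_score) if raw_score.isdigit() else "na",
        "country": country.group(1) if country else "unknown",
        "recommended": fields.get("Recommended", "na"),
        "cabinflown": fields.get("Cabin Flown", "unknown"),
        "traveltype": fields.get("Type Of Traveller", "unknown"),
        "text": lines[3] if len(lines) > 3 else "",
        "year": int(year.group(1)) if year else "unknown",
    }


sample = (
    '"some title"\n'
    "Jane Doe (Canada) 16th October 2015\n"
    "2\n"
    "We flew from Stuttgart.\n"
    "Cabin Flown\tEconomy\n"
    "Recommended\tno"
)
record = parse_review(sample)
```

With such a function, `pd.DataFrame([parse_review(r) for r in reviews])` would build the table without the seven parallel lists.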

In [113]:
df.head(5)

| | score | country | recommended | cabinflown | traveltype | text | year |
|---|---|---|---|---|---|---|---|
| 0 | 2 | Canada | no | Economy | Couple Leisure | We flew with Germanwings (or tried to) on Sept... | 2015 |
| 1 | 3 | United Kingdom | no | Economy | Solo Leisure | I am less than impressed with Germanwings serv... | 2015 |
| 2 | 3 | Germany | no | Economy | Couple Leisure | Flew from Palma de Mallorca to Cologne with Ge... | 2015 |
| 3 | 10 | Germany | yes | unknown | Business | Good flight from Berlin-Tegel to London Heathr... | 2015 |
| 4 | 4 | Germany | no | Economy | Business | I don't get why Germanwings is always late and... | 2015 |


In [114]:
df.score.value_counts()

score
4     15
2     14
9     14
3     13
5     13
na    13
10    12
7     11
8     11
1      7
6      5
Name: count, dtype: int64

In [115]:
df.year.value_counts()

year
2014    27
2013    24
2015    23
2012    17
2010    13
2011    10
2009     7
2008     7
Name: count, dtype: int64

In [116]:
df.recommended.value_counts()

recommended
no     66
yes    62
Name: count, dtype: int64

In [117]:
df.cabinflown.value_counts()

cabinflown
Economy        113
unknown         14
First Class      1
Name: count, dtype: int64

In [118]:
df.traveltype.value_counts()

traveltype
unknown            114
Business             5
Couple Leisure       4
Solo Leisure         2
FamilyLeisure        2
Couple Leis ure      1
Name: count, dtype: int64

In [120]:
df.to_excel("data.xlsx", index=False)

In [121]:
df.to_csv("data.csv", index=False)

In [122]:
df.score.value_counts()

score
4     15
2     14
9     14
3     13
5     13
na    13
10    12
7     11
8     11
1      7
6      5
Name: count, dtype: int64

In [124]:
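df_clean = df[df.score != "na"].copy()
# Scores 1-4 count as bad and 6-10 as good; reviews with a score of exactly 5 land in neither group.
df_bad = df_clean[df_clean.score < 5]
df_good = df_clean[df_clean.score >= 6]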


In [125]:
df_good.score.value_counts()


score
9     14
10    12
7     11
8     11
6      5
Name: count, dtype: int64

In [126]:
list_of_words = " ".join(df_good.text.values).replace(",", "").replace(".", "").replace('"', "").replace("!", "").split(" ")

In [127]:
df_words = pd.DataFrame({"words": [x for x in list_of_words if len(x)>3]})

In [128]:
words_filtered = df_words.words.value_counts().index.values

frequencies = df_words.words.value_counts().values

In [129]:
wc_df_good = pd.DataFrame({"words": words_filtered, "frequencies": frequencies})

In [130]:
wc_df_good.head(3)

| | words | frequencies |
|---|---|---|
| 0 | flight | 49 |
| 1 | time | 37 |
| 2 | were | 36 |


In [131]:
wc_df_good.to_excel("wc_good.xlsx")
wc_df_good.to_csv("wc_good.csv")
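The counts above still include filler words such as "were". The following is a rough sketch of the same word count with `collections.Counter` and a small stop-word set. The stop-word list is an illustrative assumption rather than part of the original notebook, and unlike the cells above this version lowercases the text before counting.

```python
import re
from collections import Counter

# Illustrative stop words only; extend as needed (assumption, not from the notebook).
STOP_WORDS = {"with", "were", "from", "that", "have", "this"}


def word_frequencies(texts, min_length=4, stop_words=STOP_WORDS):
    """Count lowercase words of at least `min_length` letters, skipping stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if len(word) >= min_length and word not in stop_words:
                counts[word] += 1
    return counts


freq = word_frequencies(["The flight was late.", "Late flight, with delays!"])
```

`freq.most_common(20)` returns the top words as `(word, count)` pairs, which can be passed straight to `pd.DataFrame(..., columns=["words", "frequencies"])`.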

In [132]:
df_bad.score.value_counts()

score
4    15
2    14
3    13
1     7
Name: count, dtype: int64

In [133]:
list_of_words = " ".join(df_bad.text.values).replace(",", "").replace(".", "").replace('"', "").replace("!", "").split(" ")

In [134]:
df_words = pd.DataFrame({"words": [x for x in list_of_words if len(x)>3]})

In [135]:
words_filtered = df_words.words.value_counts().index.values

frequencies = df_words.words.value_counts().values

In [136]:
wc_df_bad = pd.DataFrame({"words": words_filtered, "frequencies": frequencies})

In [137]:
wc_df_bad.head(20)

| | words | frequencies |
|---|---|---|
| 0 | flight | 69 |
| 1 | with | 53 |
| 2 | were | 43 |
| 3 | Germanwings | 42 |
| 4 | from | 38 |
| 5 | that | 35 |
| 6 | late | 24 |
| 7 | have | 24 |
| 8 | this | 24 |
| 9 | time | 24 |


In [138]:
wc_df_bad.to_excel("wc_bad.xlsx")
wc_df_bad.to_csv("wc_bad.csv")

In [139]:

BASE_PROMPT =   """
                /nothinking

                I'll give you a review for an airline. Do the following:
                Scan the review for information on the route that the customer took, like Hamburg to Cologne, HAM-LHR etc. If there is route information return the information as a python dictionary where the key represents the airport code and the value represents the real city name.
                {"HAM": "Hamburg", "JFK": "New York City"}

                review:
                """
FULL_PROMPT = BASE_PROMPT + "\n" + reviews[99]

    
    


In [140]:
import requests
base_url = "http://127.0.0.1:1234"

# Define the full endpoint path (v1/models)
endpoint = "/v1/models"

# Construct the full URL
url = base_url + endpoint

# Set headers to mimic an OpenAI API request (optional, but good practice)
headers = {
    "Authorization": "lmstudio",  # Replace with your actual API key if needed
    "Content-Type": "application/json"
}

# Make the GET request to list models
response = requests.get(url, headers=headers)

In [141]:
from openai import OpenAI


def query_lmstudio(prompt: str, url: str = "http://localhost:1234/v1"):
    # Create OpenAI client pointing to LM Studio
    client = OpenAI(
        base_url=url,
        api_key="lm-studio"  # Dummy key required by SDK, ignored by LM Studio
    )
    try:
        response = client.chat.completions.create(
            model="google/gemma-3-4b",  # Replace with your LM Studio model name
            messages=[
                {"role": "user", "content": prompt}
            ],
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {e}"

In [142]:
from tqdm import tqdm

BASE_PROMPT =   """
                Task: Extract the flight route from the following review.
                Identify all cities in the text that were part of the flight route.


                
                Do not guess if the code is missing or ambiguous.
                Example Review:
                "Flew from Hamburg to Berlin and back. Great experience overall!"
                Expected Output:
                ["Hamburg", "Berlin"]

                RETURN ONLY AN ARRAY OF CITIES. DONT WRITE ANY CODE

                here is the review:
                
                """

In [143]:
import re

def extract_iata_codes(text):
    # Match patterns like "LHR", "JFK", etc. (three uppercase letters)
    return re.findall(r'\b[A-Z]{3}\b', text)

In [144]:
extract_iata_codes(reviews[122])

['STR', 'LIS', 'STR']

In [145]:
code_mapping = pd.read_excel("airport_codes.xlsx")

In [146]:
codes = code_mapping.code.values
cities = code_mapping.city.values

In [147]:
mapping = dict(zip(codes, cities))

In [148]:
def replace_airport_codes_in_text(text):
    codes_found = extract_iata_codes(text=text)
    for code in codes_found:
        try:
            text = text.replace(code, mapping[code])
        except KeyError:
            # Three-letter token that is not in airport_codes.xlsx (e.g. a currency or airline code)
            print(f"there was an error with {code}")
    return text
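One caveat with `str.replace` is that it also rewrites a code wherever it occurs inside a longer word. A possible alternative, sketched here with a hypothetical helper name `replace_known_codes` and a made-up two-entry mapping, is to let `re.sub` do the replacement with a callback, so only standalone three-letter tokens are touched and unknown ones (such as "EUR") are left as they are.

```python
import re


def replace_known_codes(text, code_to_city):
    """Replace standalone three-letter codes that appear in `code_to_city`."""

    def _swap(match):
        code = match.group(0)
        # Unknown tokens (currencies, airline codes, ...) are returned unchanged.
        return code_to_city.get(code, code)

    return re.sub(r"\b[A-Z]{3}\b", _swap, text)


# Hypothetical mapping for illustration; the notebook builds `mapping` from airport_codes.xlsx.
demo_mapping = {"STR": "Stuttgart", "LIS": "Lisbon"}
replaced = replace_known_codes("STR-LIS for 50 EUR, STRONG wind at STR", demo_mapping)
```

Here "STRONG" stays intact because the pattern only matches whole three-letter tokens.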

In [149]:
answers = []

for review in tqdm(reviews):
    review = replace_airport_codes_in_text(text=review)
    prompt = BASE_PROMPT + review
    answer = query_lmstudio(prompt=prompt)
    answers.append(answer)

there was an error with EUR
there was an error with EUR
there was an error with EUR
there was an error with CRJ
there was an error with CRJ
there was an error with EUR
there was an error with APP
there was an error with ZHR
there was an error with LCC
there was an error with FTL
there was an error with FTL
there was an error with FTL
there was an error with FTL
there was an error with ILS
there was an error with GGN
there was an error with TUI
there was an error with TUI
there was an error with EUR
there was an error with EUR
there was an error with EUR
there was an error with PAX
there was an error with EUR
there was an error with KLM

100%|██████████| 128/128 [04:19<00:00,  2.03s/it]


In [228]:
import pandas as pd

In [286]:
def process_answer(answer):
    # The first quoted string in the model's answer is taken as the departure city.
    try:
        return answer.split('"')[1]
    except (IndexError, AttributeError):
        # No quoted string in the answer, or the answer is not a string
        return "unknown"

In [287]:
# process_answer already falls back to "unknown", so no extra try/except is needed here.
processed_answers = [process_answer(answer=answer) for answer in answers]
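Taking the first quoted string keeps only the departure city. Since the prompt asks the model for an array of cities, the whole route could be recovered by parsing that array. The sketch below uses a hypothetical helper `parse_city_list` and assumes the answer contains a Python-style list literal somewhere in the text; anything else yields an empty list.

```python
import ast
import re


def parse_city_list(answer):
    """Return the list of city names found in a model answer, or [] if none."""
    match = re.search(r"\[.*?\]", answer or "", flags=re.DOTALL)
    if not match:
        return []
    try:
        parsed = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return []
    if not isinstance(parsed, list):
        return []
    return [city.strip() for city in parsed if isinstance(city, str)]


route = parse_city_list('Here you go:\n["Hamburg", "Berlin"]')
departure = route[0] if route else "unknown"
```

The first element then plays the role of the departure city, and the remaining elements are available if the destination is needed later.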

In [312]:
len(processed_answers)

128

In [313]:
processed_answers

['Stuttgart',
 'Hamburg',
 'Palma de Mallorca',
 'Berlin-Tegel',
 'Köln',
 'Istanbul',
 'Düsseldorf',
 'Dusseldorf',
 'Duesseldorf',
 'Cologne',
 'Hamburg',
 'Stansted',
 'Rome',
 'Hamburg',
 'Prague',
 'Duesseldorf',
 'Hamburg',
 'Duesseldorf',
 'Köln',
 'Bologna',
 'Duesseldorf',
 'Heathrow',
 'London',
 'Berlin',
 'London',
 'London',
 'Hamburg',
 'Stuttgart',
 'Manchester',
 'Cologne/Bonn',
 'London Heathrow',
 'Hanover',
 'Hamburg',
 'London Heathrow',
 'Vienna',
 'London',
 'Duesseldorf',
 'Brussels',
 'Dusseldorf',
 'Köln',
 'Hamburg',
 'Stansted',
 'Prague',
 'London',
 'Stansted',
 'Berlin',
 'Cologne',
 'Köln',
 'Brussels',
 'Barcelona',
 'Köln',
 'Helsinki',
 'Birmingham',
 'London Stansted',
 'Budapest',
 'Heathrow',
 'Vienna',
 'Köln',
 'United Kingdom',
 'Köln',
 'London',
 'Cologne',
 'Bremen',
 'Verona',
 'Hamburg',
 'Hamburg',
 'Cologne',
 'Split',
 'Manchester',
 'Hamburg',
 'London',
 'Köln',
 'Cologne',
 'Treviso',
 'London',
 'London',
 'Stansted',
 'Stuttgart',
 '

In [531]:
def clean_city_name(answer):
    answer = answer.replace("Dusseldorf", "Duesseldorf").replace("Düsseldorf", "Duesseldorf")
    if "Köln" in answer or "Cologne" in answer:
        answer = "Cologne/Bonn"
    if "Heathrow" in answer: 
        answer = "London Heathrow"
    if "Muenchen" in answer:
        answer = "Munich"
    if "Stansted" in answer:
        answer = "London Stansted"
    return answer 
 

In [532]:
departure_cities = [clean_city_name(answer) for answer in processed_answers]

In [533]:
df["departure_city"] = departure_cities

In [534]:
df.departure_city.value_counts()

departure_city
Cologne/Bonn         24
London               13
Hamburg              12
London Stansted      11
Duesseldorf           8
Berlin                6
Stuttgart             6
unknown               4
London Heathrow       4
Manchester            4
Prague                3
Istanbul              2
Brussels              2
Dublin                2
Munich                2
Palma de Mallorca     2
Vienna                2
Bologna               2
Germany               1
Zagreb                1
Strasbourg            1
Treviso               1
Edinburgh             1
Ireland               1
Dubrovnik             1
Barcelona             1
Split                 1
Verona                1
Bremen                1
United Kingdom        1
Budapest              1
Birmingham            1
Helsinki              1
Hanover               1
Rome                  1
Berlin-Tegel          1
Bucharest             1
Name: count, dtype: int64

In [3]:
import pandas as pd
df = pd.read_csv("data.csv")

In [5]:
df

| | score | country | recommended | cabinflown | traveltype | year | departure_city | delay_mention | foodanddrink_mention | service_mention | seat_mention | checkin_mention | text | money |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Canada | no | Economy | Couple Leisure | 2015 | Stuttgart | True | False | False | False | True | We flew with Germanwings (or tried to) on Sept... | False |
| 1 | 3 | United Kingdom | no | Economy | Solo Leisure | 2015 | Hamburg | False | True | True | False | False | I am less than impressed with Germanwings serv... | False |
| 2 | 3 | Germany | no | Economy | Couple Leisure | 2015 | Palma de Mallorca | True | True | True | True | True | Flew from Palma de Mallorca to Cologne with Ge... | False |
| 3 | 10 | Germany | yes | unknown | Business | 2015 | Berlin-Tegel | True | False | False | False | False | Good flight from Berlin-Tegel to London Heathr... | False |
| 4 | 4 | Germany | no | Economy | Business | 2015 | Cologne/Bonn | True | False | True | False | False | I don't get why Germanwings is always late and... | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 123 | 4 | unknown | no | unknown | unknown | 2008 | London Stansted | True | False | False | False | True | STN-CGN-STN. Looking forward to renewing my ac... | False |
| 124 | 4 | unknown | no | unknown | unknown | 2008 | London Stansted | True | False | True | True | False | STN-CGN-STN first time with Germanwings. Some ... | False |
| 125 | 3 | unknown | no | unknown | unknown | 2008 | Berlin | False | False | False | False | False | The last-minute addition of ï¿½13 for use of a... | False |
| 126 | 1 | unknown | no | unknown | unknown | 2008 | London Stansted | True | False | True | False | False | STN-CGN-STN. My second time on GermanWings. Th... | False |


In [None]:
df[df.text.str.contains("delayed", case=False) & (df.score != "na")]

In [510]:
texts = df.text
text = texts.iloc[-1]  # check the keywords against a single review first
keywords = ["delayed", "late", "delay"]

for word in keywords:
    if word in text:
        print(f"{word} is in the sentence")

In [511]:
def delay_mentioned(text):
    keywords = ["delayed", "late", "delay"]
    text_lower = text.lower()
    for word in keywords:
        if word in text_lower:
            return True
    return False
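Plain substring checks also fire on unrelated words: "late" is contained in "plate" and "later". A stricter variant, sketched here under the assumption that only the whole words "delay", "delays", "delayed" and "late" should count, uses a regular expression with word boundaries. The name `delay_mentioned_strict` is hypothetical.

```python
import re

# Whole-word match for delay/delays/delayed/late, case-insensitive.
DELAY_PATTERN = re.compile(r"\b(?:delay(?:ed|s)?|late)\b", re.IGNORECASE)


def delay_mentioned_strict(text):
    """True if the text contains one of the delay keywords as a whole word."""
    return bool(DELAY_PATTERN.search(text))


hit = delay_mentioned_strict("The flight was Delayed by two hours")
miss = delay_mentioned_strict("I ate from a plate and slept later")
```

The trade-off is that inflections not listed in the pattern, such as "delaying", are no longer matched, so the keyword list has to be chosen more deliberately.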


In [512]:
df["delay_mention"] = df.text.apply(delay_mentioned)

In [513]:
df.iloc[20:50].delay_mention.values[10]


False

In [514]:
df[df.delay_mention & (df.score != "na")].score.mean()


4.491525423728813

In [515]:
df.iloc[10:30].delay_mention.values[10]

False

In [516]:
def foodanddrink_mentioned(text):
    keywords = ["food", "drink"]
    text_lower = text.lower()
    for word in keywords:
        if word in text_lower:
            return True
    return False

In [517]:
df["foodanddrink_mention"] = df.text.apply(foodanddrink_mentioned)

In [518]:
df.foodanddrink_mention

0      False
1       True
2       True
3      False
4      False
       ...  
123    False
124    False
125    False
126    False
127    False
Name: foodanddrink_mention, Length: 128, dtype: bool

In [519]:
def service_mentioned(text):
    keywords = ["service"]
    text_lower = text.lower()
    for word in keywords:
        if word in text_lower:
            return True
    return False

In [520]:
df["service_mention"] = df.text.apply(service_mentioned)

In [521]:
df.iloc[10:30].delay_mention

10    False
11    False
12     True
13    False
14     True
15     True
16    False
17     True
18    False
19     True
20    False
21    False
22     True
23     True
24     True
25    False
26     True
27     True
28    False
29     True
Name: delay_mention, dtype: bool

In [522]:
def seat_mentioned(text):
    keywords = ["seat", "seating", "seats"]
    text_lower = text.lower()
    for word in keywords:
        if word in text_lower:
            return True
    return False

In [523]:
df["seat_mention"] = df.text.apply(seat_mentioned)

In [524]:
def checkin_mentioned(text):
    keywords = ["check-in", "check in"]
    text_lower = text.lower()
    for word in keywords:
        if word in text_lower:
            return True
    return False

In [525]:
df["checkin_mention"] = df.text.apply(checkin_mentioned)
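The five `*_mentioned` functions differ only in their keyword lists. As a sketch, they could be driven by a single table of topics. The names `TOPIC_KEYWORDS`, `mentions_any` and `topic_flags` are hypothetical, and the lists drop keywords that are already covered by a shorter substring ("delayed" by "delay", "seats" and "seating" by "seat"), which should leave the substring results unchanged.

```python
# Column name -> keywords; substring matching, as in the functions above.
TOPIC_KEYWORDS = {
    "delay_mention": ["delay", "late"],
    "foodanddrink_mention": ["food", "drink"],
    "service_mention": ["service"],
    "seat_mention": ["seat"],
    "checkin_mention": ["check-in", "check in"],
}


def mentions_any(text, keywords):
    """True if any keyword occurs in the lowercased text."""
    text_lower = text.lower()
    return any(keyword in text_lower for keyword in keywords)


def topic_flags(text, topics=TOPIC_KEYWORDS):
    """Return one boolean per topic for a single review text."""
    return {name: mentions_any(text, keywords) for name, keywords in topics.items()}


flags = topic_flags("Check-in was slow and the seats were narrow.")
```

Applied to the DataFrame, a loop over `TOPIC_KEYWORDS.items()` with `df[name] = df.text.apply(lambda t, kws=keywords: mentions_any(t, kws))` would create all five columns in one place.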

In [None]:
# delay_mention =[]
# for value in df["delay_mention"]:
#     delay_mention.append(value)

# foodanddrink_mention = []
# for value in df["foodanddrink_mention"]:
#     foodanddrink_mention.append(value)

# service_mention = []
# for value in df["service_mention"]:
#     service_mention.append(value)

# seat_mention = []
# for value in df["seat_mention"]:
#     seat_mention.append(value)

# checkin_mention = []
# for value in df["checkin_mention"]:
#     checkin_mention.append(value)

In [None]:
# df = pd.DataFrame({
#     "score": scores,
#     "country": countries,
#     "recommended": recommendations,
#     "cabinflown": cabinflowns,
#     "traveltype": traveltypes,
#     "year": years,
#     "departure_city": departure_cities,
#     "delay_mention": delay_mention,
#     "foodanddrink_mention": foodanddrink_mention,
#     "service_mention": service_mention,
#     "seat_mention": seat_mention,
#     "checkin_mention": checkin_mention,
#     "text": texts
# })

In [528]:
df.head(10).T

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| score | 2 | 3 | 3 | 10 | 4 | 1 | 7 | 2 | 2 | 5 |
| country | Canada | United Kingdom | Germany | Germany | Germany | United Kingdom | Germany | United Kingdom | Gibraltar | Australia |
| recommended | no | no | no | yes | no | no | yes | no | no | no |
| cabinflown | Economy | Economy | Economy | unknown | Economy | Economy | Economy | Economy | Economy | Economy |
| traveltype | Couple Leisure | Solo Leisure | Couple Leisure | Business | Business | Couple Leisure | Solo Leisure | FamilyLeisure | Business | Couple Leisure |
| year | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 |
| departure_city | Stuttgart | Hamburg | Palma de Mallorca | Berlin-Tegel | Cologne/Bonn | Istanbul | Duesseldorf | Duesseldorf | Duesseldorf | Cologne/Bonn |
| delay_mention | True | False | True | True | True | True | False | True | True | False |
| foodanddrink_mention | False | True | True | False | False | False | True | False | False | True |
| service_mention | False | True | True | False | True | False | False | False | False | False |


In [26]:
df.to_excel("data.xlsx", index=False)
df.to_csv("data.csv", index=False)

In [7]:
import pandas as pd

In [8]:
df = pd.read_excel("data.xlsx")

In [22]:
df["traveltype_binary"] = [get_traveltype_from_review(review) for review in reviews]

Couple Leisure
Solo Leisure
Couple Leisure
Business
Business
Couple Leisure
Solo Leisure
FamilyLeisure
Business
Couple Leisure
Business
FamilyLeisure
Business
Couple Leis ure


In [28]:
df.departure_city.value_counts()

departure_city
Cologne/Bonn         24
London               13
Hamburg              12
London Stansted      11
Duesseldorf           8
Berlin                6
Stuttgart             6
unknown               4
London Heathrow       4
Manchester            4
Prague                3
Istanbul              2
Brussels              2
Dublin                2
Munich                2
Palma de Mallorca     2
Vienna                2
Bologna               2
Germany               1
Zagreb                1
Strasbourg            1
Treviso               1
Edinburgh             1
Ireland               1
Dubrovnik             1
Barcelona             1
Split                 1
Verona                1
Bremen                1
United Kingdom        1
Budapest              1
Birmingham            1
Helsinki              1
Hanover               1
Rome                  1
Berlin-Tegel          1
Bucharest             1
Name: count, dtype: int64

In [27]:
df.head()

| | score | country | recommended | cabinflown | traveltype | year | departure_city | delay_mention | foodanddrink_mention | service_mention | seat_mention | checkin_mention | text | traveltype_binary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Canada | no | Economy | Couple Leisure | 2015 | Stuttgart | True | False | False | False | True | We flew with Germanwings (or tried to) on Sept... | Leisure |
| 1 | 3 | United Kingdom | no | Economy | Solo Leisure | 2015 | Hamburg | False | True | True | False | False | I am less than impressed with Germanwings serv... | Leisure |
| 2 | 3 | Germany | no | Economy | Couple Leisure | 2015 | Palma de Mallorca | True | True | True | True | True | Flew from Palma de Mallorca to Cologne with Ge... | Leisure |
| 3 | 10 | Germany | yes | unknown | Business | 2015 | Berlin-Tegel | True | False | False | False | False | Good flight from Berlin-Tegel to London Heathr... | Business |
| 4 | 4 | Germany | no | Economy | Business | 2015 | Cologne/Bonn | True | False | True | False | False | I don't get why Germanwings is always late and... | Business |
