# Keeping it descriptive

To better understand travelers' experiences at San Francisco Airport, the quality assurance department sent a qualitative questionnaire to every traveler who gave the airport the worst score in every category. The goal of the questionnaire is to identify common patterns in what travelers say about the airport.

Their responses are stored in the `survey_response` column (named `Survey_Response` in the code below). On closer inspection, you notice that a few of the answers are very short and have little substance. In this exercise, you will isolate the responses with more than ***40*** characters, and use an `assert` statement to confirm that every response in your new DataFrame is longer than ***40*** characters.

The `airlines` DataFrame is in your environment, and `pandas` is imported as `pd`.

In [3]:
import pandas as pd
import numpy as np
from faker import Faker

path=r'Z:/'
file='airlines_final.csv'

airlines = pd.read_csv(path+file)
print(airlines.head(),'\n')

# Set up Faker with a specific seed for reproducibility
fake = Faker()
Faker.seed(0)

# Number of rows in the DataFrame
num_rows = airlines.shape[0]

# Generate random names and titles
data = {
    'Title': [fake.random_element(elements=('Dr.', 'Mr.', 'Ms.', 'Miss')) for _ in range(num_rows)],
    'First_Name': [fake.first_name() for _ in range(num_rows)],
    'Last_Name': [fake.last_name() for _ in range(num_rows)],
}

# Create DataFrame
df = pd.DataFrame(data)

# Combine columns to create a 'full_name' column
airlines['full_name'] = df['Title'] + ' ' + df['First_Name'] + ' ' + df['Last_Name']

# Display the DataFrame
print(airlines.head())

   Unnamed: 0    id        day      airline        destination    dest_region  \
0           0  1351    Tuesday  UNITED INTL             KANSAI           Asia   
1           1   373     Friday       ALASKA  SAN JOSE DEL CABO  Canada/Mexico   
2           2  2820   Thursday        DELTA        LOS ANGELES        West US   
3           3  1157    Tuesday    SOUTHWEST        LOS ANGELES        West US   
4           4  2992  Wednesday     AMERICAN              MIAMI        East US   

  dest_size boarding_area   dept_time  wait_min     cleanliness  \
0       Hub  Gates 91-102  2018-12-31     115.0           Clean   
1     Small   Gates 50-59  2018-12-31     135.0           Clean   
2       Hub   Gates 40-48  2018-12-31      70.0         Average   
3       Hub   Gates 20-39  2018-12-31     190.0           Clean   
4       Hub   Gates 50-59  2018-12-31     559.0  Somewhat clean   

          safety        satisfaction  
0        Neutral      Very satisfied  
1      Very safe      Very satis

In [4]:

responses = ['It was terrible', "I didn't like the flight", 'I hate this ',
             'Not a fan', 'Bad', 'Horrible', 'Very poor', 'Unacceptable flight',
             'It was awful', 'My flight was really unpleasant',
             'I am not a fan', 'I had a bad flight', 'It was very bad',
             'It was horrible', 'Terrible', 'It was substandard',
             'I did not enjoy the flight',
             'The airport personnel forgot to alert us of delayed flights, the bathrooms could have been cleaner',
             'The food in the airport was really expensive - also no automatic escalators!',
             'One of the other travelers was really loud and talkative and was making a scene and no one did anything about it',
             "I don't remember answering the survey with these scores, my experience was great! ",
             'The airport personnel kept ignoring my requests for directions ',
             'The chair I sat in was extremely uncomfortable, I still have back pain to this day! ',
             'I wish you were more like other airports, the flights were really disorganized! ',
             'I was really unsatisfied with the wait times before the flight. It was unacceptable.',
             "The flight was okay, but I didn't really like the number of times I had to stop at security",
             'We were really slowed down by security measures, I missed my flight because of it! ',
             'There was a spill on the aisle next to the bathroom and it took hours to clean!',
             'I felt very unsatisfied by how long the flight took to take off.'
             ]

# Generate a random 'Survey_Response' column with negative responses
airlines['Survey_Response'] = np.random.choice(responses, size=len(airlines))


# Display the DataFrame
print(airlines.head())

   Unnamed: 0    id        day      airline        destination    dest_region  \
0           0  1351    Tuesday  UNITED INTL             KANSAI           Asia   
1           1   373     Friday       ALASKA  SAN JOSE DEL CABO  Canada/Mexico   
2           2  2820   Thursday        DELTA        LOS ANGELES        West US   
3           3  1157    Tuesday    SOUTHWEST        LOS ANGELES        West US   
4           4  2992  Wednesday     AMERICAN              MIAMI        East US   

  dest_size boarding_area   dept_time  wait_min     cleanliness  \
0       Hub  Gates 91-102  2018-12-31     115.0           Clean   
1     Small   Gates 50-59  2018-12-31     135.0           Clean   
2       Hub   Gates 40-48  2018-12-31      70.0         Average   
3       Hub   Gates 20-39  2018-12-31     190.0           Clean   
4       Hub   Gates 50-59  2018-12-31     559.0  Somewhat clean   

          safety        satisfaction            full_name  \
0        Neutral      Very satisfied   Miss Colle

* Using the `airlines` DataFrame, store the length of each response in the `survey_response` column in `resp_length` by using `.str.len()`.
* Isolate the rows of `airlines` where `resp_length` is greater than `40` and store them in `airlines_survey`.
* Assert that the smallest `survey_response` length in `airlines_survey` is now greater than `40`.
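
The three steps above can be tried first on a small stand-in DataFrame. This is a minimal sketch, not the exercise solution: the `toy` DataFrame and its four responses are hypothetical and exist only to show how `.str.len()`, boolean filtering, and `assert` fit together.

```python
import pandas as pd

# Hypothetical toy data standing in for the airlines DataFrame
toy = pd.DataFrame({
    "survey_response": [
        "Bad",
        "It was terrible",
        "The airport personnel kept ignoring my requests for directions",
        "The food in the airport was really expensive - also no automatic escalators!",
    ]
})

# Step 1: character count of every response
resp_length = toy["survey_response"].str.len()

# Step 2: keep only the rows whose response is longer than 40 characters
toy_survey = toy[resp_length > 40]

# Step 3: the assert passes silently if the filter worked,
# and raises AssertionError otherwise
assert toy_survey["survey_response"].str.len().min() > 40

print(toy_survey)
```

Only the two long responses survive the filter. An `assert` that passes prints nothing, so the absence of an error is the confirmation.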


In [7]:
# Store the character length of each response in the Survey_Response column
resp_length = airlines['Survey_Response'].str.len()

# Keep the rows in airlines where resp_length > 40
airlines_survey = airlines[resp_length > 40]

# Assert that the minimum Survey_Response length is > 40
assert airlines_survey['Survey_Response'].str.len().min() > 40

# Print the filtered Survey_Response column
print(airlines_survey['Survey_Response'])

2       There was a spill on the aisle next to the bat...
5       We were really slowed down by security measure...
6       The airport personnel kept ignoring my request...
8       The airport personnel forgot to alert us of de...
10      One of the other travelers was really loud and...
                              ...                        
2464    The airport personnel kept ignoring my request...
2466    The airport personnel forgot to alert us of de...
2467    One of the other travelers was really loud and...
2469    We were really slowed down by security measure...
2473    I was really unsatisfied with the wait times b...
Name: Survey_Response, Length: 1060, dtype: object
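
One detail worth keeping in mind: `.str.len()` counts every character, including trailing spaces, and several entries in the `responses` list above end with a space. The sketch below uses made-up strings (the Series `s` is hypothetical, not taken from the dataset) to show how stripping whitespace before measuring can change which rows pass the `> 40` filter. Whether to strip first is a judgment call the exercise itself does not address.

```python
import pandas as pd

# Hypothetical strings chosen so the lengths are easy to verify
s = pd.Series([
    "a" * 40 + " ",   # 40 letters plus one trailing space
    "b" * 45,         # 45 letters, no padding
    "Bad ",           # short response with a trailing space
])

# Raw lengths include the trailing spaces
raw_len = s.str.len()

# Lengths after removing leading and trailing whitespace
stripped_len = s.str.strip().str.len()

# The first string passes the raw filter but not the stripped one
kept_raw = s[raw_len > 40]
kept_stripped = s[stripped_len > 40]

print(len(kept_raw), len(kept_stripped))
```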
