# Removing titles and taking names

While collecting survey respondent metadata in the `airlines` DataFrame, the full name of each respondent was saved in the `full_name` column. On closer inspection, however, you found that many of the names are prefixed by honorifics such as `"Dr."`, `"Mr."`, `"Ms."` and `"Miss"`.

Your ultimate objective is to create two new columns, `first_name` and `last_name`, containing the first and last names of respondents respectively. Before doing so, however, you need to remove the honorifics.

The `airlines` DataFrame is in your environment, alongside `pandas` as `pd`.

In [1]:
import pandas as pd
import numpy as np

path=r'Z:/'
file='airlines_final.csv'

airlines = pd.read_csv(path+file)
print(airlines.head(),'\n')

   Unnamed: 0    id        day      airline        destination    dest_region  \
0           0  1351    Tuesday  UNITED INTL             KANSAI           Asia   
1           1   373     Friday       ALASKA  SAN JOSE DEL CABO  Canada/Mexico   
2           2  2820   Thursday        DELTA        LOS ANGELES        West US   
3           3  1157    Tuesday    SOUTHWEST        LOS ANGELES        West US   
4           4  2992  Wednesday     AMERICAN              MIAMI        East US   

  dest_size boarding_area   dept_time  wait_min     cleanliness  \
0       Hub  Gates 91-102  2018-12-31     115.0           Clean   
1     Small   Gates 50-59  2018-12-31     135.0           Clean   
2       Hub   Gates 40-48  2018-12-31      70.0         Average   
3       Hub   Gates 20-39  2018-12-31     190.0           Clean   
4       Hub   Gates 50-59  2018-12-31     559.0  Somewhat clean   

          safety        satisfaction  
0        Neutral      Very satisfied  
1      Very safe      Very satis

In [6]:
print(airlines.shape[0])

2477


In [9]:
import pandas as pd
from faker import Faker

# Set up Faker with a specific seed for reproducibility
fake = Faker()
Faker.seed(0)

# Number of rows in the DataFrame
num_rows = airlines.shape[0]

# Generate random names and titles
data = {
    'Title': [fake.random_element(elements=('Dr.', 'Mr.', 'Ms.', 'Miss')) for _ in range(num_rows)],
    'First_Name': [fake.first_name() for _ in range(num_rows)],
    'Last_Name': [fake.last_name() for _ in range(num_rows)],
}

# Create DataFrame
df = pd.DataFrame(data)

# Combine the generated columns into a 'full_name' column on airlines
airlines['full_name'] = df['Title'] + ' ' + df['First_Name'] + ' ' + df['Last_Name']

# Display the DataFrame
print(airlines.head())


   Unnamed: 0    id        day      airline        destination    dest_region  \
0           0  1351    Tuesday  UNITED INTL             KANSAI           Asia   
1           1   373     Friday       ALASKA  SAN JOSE DEL CABO  Canada/Mexico   
2           2  2820   Thursday        DELTA        LOS ANGELES        West US   
3           3  1157    Tuesday    SOUTHWEST        LOS ANGELES        West US   
4           4  2992  Wednesday     AMERICAN              MIAMI        East US   

  dest_size boarding_area   dept_time  wait_min     cleanliness  \
0       Hub  Gates 91-102  2018-12-31     115.0           Clean   
1     Small   Gates 50-59  2018-12-31     135.0           Clean   
2       Hub   Gates 40-48  2018-12-31      70.0         Average   
3       Hub   Gates 20-39  2018-12-31     190.0           Clean   
4       Hub   Gates 50-59  2018-12-31     559.0  Somewhat clean   

          safety        satisfaction            Full_Name            full_name  
0        Neutral      Very sa

* Remove `"Dr."`, `"Mr."`, `"Miss"` and `"Ms."`, in that order, from `full_name` by replacing each with an empty string `""`.
* Run the `assert` statement using `.str.contains()` that tests whether `full_name` still contains any of the honorifics.
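
One thing to watch with these two steps: `.str.contains()` interprets its pattern as a regular expression by default, so an unescaped `.` matches any character. The sketch below uses a small made-up Series (not the `airlines` data) to show how a pattern like `Dr.` can still match an ordinary name such as "Drew" after the honorifics are gone, and how escaping the dot avoids that. The variable names are illustrative only.

```python
import pandas as pd

# Toy data, invented for illustration; not taken from the airlines DataFrame
names = pd.Series(["Dr. Drew Smith", "Ms. Ana Lopez", "Miss Kim Park"])

# Remove the honorifics literally (regex=False treats "." as a plain character)
cleaned = names
for honorific in ["Dr.", "Mr.", "Miss", "Ms."]:
    cleaned = cleaned.str.replace(honorific, "", regex=False)

# Unescaped: "Dr." is a regex meaning "Dr" followed by any character,
# so it still matches the first name "Drew"
loose_match = bool(cleaned.str.contains("Ms.|Mr.|Miss|Dr.").any())

# Escaped: "Dr\." only matches the literal text "Dr."
strict_match = bool(cleaned.str.contains(r"Ms\.|Mr\.|Miss|Dr\.").any())

print(loose_match, strict_match)
```

With the unescaped pattern, the check can report a leftover honorific even when every title has been removed, which is worth keeping in mind when reading an `AssertionError` from such a test.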

In [11]:
# Replace "Dr." with empty string ""
# regex=False makes the "." a literal dot rather than "any character"
airlines['full_name'] = airlines['full_name'].str.replace("Dr.", "", regex=False)

# Replace "Mr." with empty string ""
airlines['full_name'] = airlines['full_name'].str.replace("Mr.", "", regex=False)

# Replace "Miss" with empty string ""
airlines['full_name'] = airlines['full_name'].str.replace("Miss", "", regex=False)

# Replace "Ms." with empty string ""
airlines['full_name'] = airlines['full_name'].str.replace("Ms.", "", regex=False)

# Assert that full_name has no honorifics
# .str.contains() treats its pattern as a regex, so the dots are escaped;
# unescaped, 'Dr.' would also match ordinary names such as "Drew"
assert not airlines['full_name'].str.contains(r'Ms\.|Mr\.|Miss|Dr\.').any()
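
For the ultimate objective stated at the top, creating `first_name` and `last_name`, here is a minimal sketch on made-up data. It assumes every name has the shape "Honorific First Last" with single spaces, which holds for names generated as above but may not hold for real survey data. The `people` DataFrame and the `pattern` variable are hypothetical names introduced for illustration.

```python
import pandas as pd

# Toy data, invented for illustration; not taken from the airlines DataFrame
people = pd.DataFrame(
    {"full_name": ["Dr. Drew Smith", "Ms. Ana Lopez", "Miss Kim Park"]}
)

# Remove a leading honorific and the space after it in one pass;
# the dots are escaped so they match literally
pattern = r"^(?:Dr\.|Mr\.|Ms\.|Miss)\s+"
people["full_name"] = people["full_name"].str.replace(pattern, "", regex=True)

# Split on the first space only: the part before it becomes the first name,
# the remainder becomes the last name
name_parts = people["full_name"].str.split(" ", n=1, expand=True)
people["first_name"] = name_parts[0]
people["last_name"] = name_parts[1]

print(people)
```

Anchoring the pattern with `^` limits the removal to the start of the string, so a name that merely contains one of the honorifics (for example "Missy") is left alone. Splitting with `n=1` keeps multi-word last names together in `last_name`.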