# Modifying DataFrames
## Adding a Column I

Sometimes, we want to add a column to an existing DataFrame. We might want to add new information or perform a calculation based on the data that we already have.<br>
<br>
One way that we can add a new column is by assigning a list with the same length as the existing DataFrame, one value per row.<br>
<br>
Suppose we own a hardware store called The Handy Woman and have a DataFrame containing inventory information:
<table>
    <tr align="right">
        <th>Product ID</th>
        <th>Product Description</th>
        <th>Cost to Manufacture</th>
        <th>Price</th>
    </tr>
    <tr align="right">
        <td>1</td>
        <td>3 inch screw</td>
        <td>0.50</td>
        <td>0.75</td>
    </tr>
    <tr align="right">
        <td>2</td>
        <td>2 inch nail</td>
        <td>0.10</td>
        <td>0.25</td>
    </tr>
    <tr align="right">
        <td>3</td>
        <td>hammer</td>
        <td>3.00</td>
        <td>5.50</td>
    </tr>
    <tr align="right">
        <td>4</td>
        <td>screwdriver</td>
        <td>2.50</td>
        <td>3.00</td>
    </tr>
</table>

In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)
df

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price
0,1,3 inch screw,0.5,0.75
1,2,2 inch nail,0.1,0.25
2,3,hammer,3.0,5.5
3,4,screwdriver,2.5,3.0


It looks like the actual quantity of each product in our warehouse is missing!<br>
<br>
Let us use the following code to add that information to our DataFrame.

In [3]:
df['Quantity'] = [100, 150, 50, 35]

Our new DataFrame looks like this:

In [4]:
df

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price,Quantity
0,1,3 inch screw,0.5,0.75,100
1,2,2 inch nail,0.1,0.25,150
2,3,hammer,3.0,5.5,50
3,4,screwdriver,2.5,3.0,35
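
The list has to supply exactly one value per row. As a minimal sketch of what happens otherwise (the three-item list and the variable `length_error` are made up for this illustration), pandas raises a `ValueError` and leaves the DataFrame without the new column:

```python
import pandas as pd

df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)

# Hypothetical mistake: only three quantities for four products
try:
    df['Quantity'] = [100, 150, 50]
    length_error = None
except ValueError as error:
    length_error = error
    print('Could not add column:', error)

# The failed assignment did not add the column
print('Quantity' in df.columns)
```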


***

### Exercise

The DataFrame `df` contains information on products sold at a hardware store. Add a column to `df` called `'Sold in Bulk?'`, which indicates if the product is sold in bulk or individually. The final table should look like this:
<table>
    <tr align="right">
        <th>Product ID</th>
        <th>Product Description</th>
        <th>Cost to Manufacture</th>
        <th>Price</th>
        <th>Sold in Bulk?</th>
    </tr>
    <tr align="right">
        <td>1</td>
        <td>3 inch screw</td>
        <td>0.50</td>
        <td>0.75</td>
        <td>Yes</td>
    </tr>
    <tr align="right">
        <td>2</td>
        <td>2 inch nail</td>
        <td>0.10</td>
        <td>0.25</td>
        <td>Yes</td>
    </tr>
    <tr align="right">
        <td>3</td>
        <td>hammer</td>
        <td>3.00</td>
        <td>5.50</td>
        <td>No</td>
    </tr>
    <tr align="right">
        <td>4</td>
        <td>screwdriver</td>
        <td>2.50</td>
        <td>3.00</td>
        <td>No</td>
    </tr>
</table>

In [5]:
df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)

In [6]:
df['Sold in Bulk?'] = ['Yes', 'Yes', 'No', 'No']

In [7]:
print(df)

   Product ID   Description  Cost to Manufacture  Price Sold in Bulk?
0           1  3 inch screw                  0.5   0.75           Yes
1           2   2 inch nail                  0.1   0.25           Yes
2           3        hammer                  3.0   5.50            No
3           4   screwdriver                  2.5   3.00            No


***

## Adding a Column II

We can also add a new column that is the same for all rows in the DataFrame. Let us return to our inventory example:
<table>
    <tr align="right">
        <th>Product ID</th>
        <th>Product Description</th>
        <th>Cost to Manufacture</th>
        <th>Price</th>
    </tr>
    <tr align="right">
        <td>1</td>
        <td>3 inch screw</td>
        <td>0.50</td>
        <td>0.75</td>
    </tr>
    <tr align="right">
        <td>2</td>
        <td>2 inch nail</td>
        <td>0.10</td>
        <td>0.25</td>
    </tr>
    <tr align="right">
        <td>3</td>
        <td>hammer</td>
        <td>3.00</td>
        <td>5.50</td>
    </tr>
    <tr align="right">
        <td>4</td>
        <td>screwdriver</td>
        <td>2.50</td>
        <td>3.00</td>
    </tr>
</table>

In [8]:
df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)

Suppose we know that all of our products are currently in stock. We can add a column that says this:

In [9]:
df['In Stock?'] = True

Now all of the rows have a column called `In Stock?` with value `True`.

In [10]:
df

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price,In Stock?
0,1,3 inch screw,0.5,0.75,True
1,2,2 inch nail,0.1,0.25,True
2,3,hammer,3.0,5.5,True
3,4,screwdriver,2.5,3.0,True


***

### Exercise

Add a column to `df` called `Is taxed?`, which indicates whether or not to collect sales tax on the product. It should be `'Yes'` for all rows.

In [11]:
df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)

In [12]:
df['Is taxed?'] = 'Yes'

In [13]:
df

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price,Is taxed?
0,1,3 inch screw,0.5,0.75,Yes
1,2,2 inch nail,0.1,0.25,Yes
2,3,hammer,3.0,5.5,Yes
3,4,screwdriver,2.5,3.0,Yes


***

## Adding a Column III

Finally, you can add a new column by performing a calculation on the existing columns.<br>
<br>
Maybe we want to add a column to our inventory table with the amount of sales tax that we need to charge for each item. The following code multiplies each `Price` by `0.075`, the sales tax for our state:

In [14]:
df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)

In [15]:
df['Sales Tax'] = df.Price * 0.075

Now our table has a column called `Sales Tax`:

In [16]:
df

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price,Sales Tax
0,1,3 inch screw,0.5,0.75,0.05625
1,2,2 inch nail,0.1,0.25,0.01875
2,3,hammer,3.0,5.5,0.4125
3,4,screwdriver,2.5,3.0,0.225


***

### Exercise

Add a column to `df` called `'Margin'`, which is equal to the difference between the `Price` and the `Cost to Manufacture`.

In [17]:
df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)

In [18]:
df['Margin'] = df['Price'] - df['Cost to Manufacture']

In [19]:
df

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price,Margin
0,1,3 inch screw,0.5,0.75,0.25
1,2,2 inch nail,0.1,0.25,0.15
2,3,hammer,3.0,5.5,2.5
3,4,screwdriver,2.5,3.0,0.5


***

## Performing Column Operations

In the previous exercise, we learned how to add columns to a DataFrame.<br>
<br>
Often, the column that we want to add is related to existing columns, but requires a calculation more complex than multiplication or addition.<br>
<br>
For example, imagine that we have the following table of customers.<br>
<table>
    <tr align="left">
        <th>Name</th>
        <th>Email</th>
    </tr>
    <tr>
        <td>JOHN SMITH</td>
        <td>john.smith@gmail.com</td>
    </tr>
    <tr>
        <td>Jane Doe</td>
        <td>jdoe@yahoo.com</td>
    </tr>
    <tr>
        <td>joe schmo</td>
        <td>joeschmo@hotmail.com</td>
    </tr>
</table>

In [20]:
df = pd.DataFrame([
  ['JOHN SMITH', 'john.smith@gmail.com'],
  ['Jane Doe', 'jdoe@yahoo.com'],
  ['joe schmo', 'joeschmo@hotmail.com']
],
columns=['Name', 'Email'])

It is a little annoying that the capitalization is different for each row. Perhaps we would like to make it more consistent by making all of the letters uppercase.<br>
<br>
We can use the `apply` method to apply a function to every value in a particular column. For example, this code overwrites the existing `'Name'` column by applying the `str.upper` method to every row in `'Name'`.

In [21]:
df['Name'] = df.Name.apply(str.upper)

The result:

In [22]:
df

Unnamed: 0,Name,Email
0,JOHN SMITH,john.smith@gmail.com
1,JANE DOE,jdoe@yahoo.com
2,JOE SCHMO,joeschmo@hotmail.com


***

Apply `str.lower` to all names in the `'Name'` column of `df`. Assign these new names to a new column of `df` called `'Lowercase Name'`. The final DataFrame should look like this:<br>
<table>
    <tr align="left">
        <th>Name</th>
        <th>Email</th>
        <th>Lowercase Name</th>
    </tr>
    <tr>
        <td>JOHN SMITH</td>
        <td>john.smith@gmail.com</td>
        <td>john smith</td>
    </tr>
    <tr>
        <td>Jane Doe</td>
        <td>jdoe@yahoo.com</td>
        <td>jane doe</td>
    </tr>
    <tr>
        <td>joe schmo</td>
        <td>joeschmo@hotmail.com</td>
        <td>joe schmo</td>
    </tr>
</table>

In [23]:
df = pd.DataFrame([
  ['JOHN SMITH', 'john.smith@gmail.com'],
  ['Jane Doe', 'jdoe@yahoo.com'],
  ['joe schmo', 'joeschmo@hotmail.com']
],
columns=['Name', 'Email'])

In [24]:
df['Lowercase Name'] = df.Name.apply(str.lower)

In [25]:
df

Unnamed: 0,Name,Email,Lowercase Name
0,JOHN SMITH,john.smith@gmail.com,john smith
1,Jane Doe,jdoe@yahoo.com,jane doe
2,joe schmo,joeschmo@hotmail.com,joe schmo


***

## Reviewing Lambda Function

A <i>lambda function</i> is a way of defining a function in a single line of code. Usually, we would assign them to a variable.<br>
<br>
For example, the following lambda function multiplies a number by $2$ and then adds $3$:

In [26]:
mylambda = lambda x: (x * 2) + 3
print(mylambda(5))

13


Lambda functions work with all types of variables, not just integers! Here is an example that takes in a string, assigns it to the temporary variable `x`, and then converts it into lowercase:

In [27]:
stringlambda = lambda x: x.lower()
print(stringlambda("Oh Hi Mark!"))

oh hi mark!


***

### Exercise

Create a lambda function `mylambda` that returns the first and last letters of a string, assuming the string is at least 2 characters long. For example,<br>
<br>
`print(mylambda('This is a string'))`<br>
<br>
should produce:<br>
<br>
`Tg`

In [28]:
mylambda = lambda x: x[0] + x[-1]

In [29]:
print(mylambda('This is a string'))

Tg


***

## Reviewing Lambda Function: If Statements

We can make our lambdas more complex by using a modified form of an if statement.<br>
<br>
Suppose we want to pay workers time-and-a-half for overtime (any work above 40 hours per week). The following function will convert the number of hours into time-and-a-half hours using an if statement:

In [30]:
def myfunction(x):
    if x > 40:
        return 40 + (x - 40) * 1.50
    else:
        return x

Below is a lambda function that does the same thing:

In [31]:
myfunction = lambda x: 40 + (x - 40) * 1.50 if x > 40 else x

In general, the syntax for an if statement in a lambda function is:<br>
`lambda x: [OUTCOME IF TRUE] if [CONDITIONAL] else [OUTCOME IF FALSE]`
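
As a quick sanity check, here is a small sketch that runs the `def` version and the lambda version side by side. The names `overtime_hours` and `overtime_lambda` are chosen only for this illustration, and the parentheses around the "true" outcome are optional but make the grouping explicit:

```python
# Regular function using an if statement
def overtime_hours(x):
    if x > 40:
        return 40 + (x - 40) * 1.50
    else:
        return x

# Equivalent lambda; the if/else applies to the whole expression before it
overtime_lambda = lambda x: (40 + (x - 40) * 1.50) if x > 40 else x

for hours in [30, 40, 45]:
    print(hours, overtime_hours(hours), overtime_lambda(hours))
# 30 30 30
# 40 40 40
# 45 47.5 47.5
```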

***

### Exercise

You are managing the webpage of a somewhat violent video game and you want to check that each user's age is 13 or greater when they visit the site.<br>
<br>
Write a lambda function that takes an inputted age and either returns `Welcome to BattleCity!` if the user is 13 or older or `You must be over 13` if they are younger than 13. Your lambda function should be called `mylambda`.

In [32]:
mylambda = lambda x: 'Welcome to BattleCity!' if x >= 13 else 'You must be over 13'

***

## Applying a Lambda to a Column

In Pandas, we often use lambda functions to perform complex operations on columns. For example, suppose that we want to create a column containing the email provider for each email address in the following table:

In [33]:
df = pd.DataFrame([
  ['JOHN SMITH', 'john.smith@gmail.com'],
  ['Jane Doe', 'jdoe@yahoo.com'],
  ['joe schmo', 'joeschmo@hotmail.com']
],
columns=['Name', 'Email'])

df

Unnamed: 0,Name,Email
0,JOHN SMITH,john.smith@gmail.com
1,Jane Doe,jdoe@yahoo.com
2,joe schmo,joeschmo@hotmail.com


We could use the following code with a lambda function and the string method `.split()`:

In [34]:
df['Email Provider'] = df.Email.apply(
    lambda x: x.split('@')[-1]
    )

The result is:

In [35]:
df

Unnamed: 0,Name,Email,Email Provider
0,JOHN SMITH,john.smith@gmail.com,gmail.com
1,Jane Doe,jdoe@yahoo.com,yahoo.com
2,joe schmo,joeschmo@hotmail.com,hotmail.com
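
To see why `[-1]` picks out the provider, it can help to look at what `.split('@')` returns for a single address. This is a small sketch in plain Python; the variable names `email`, `parts`, and `provider` are only for illustration:

```python
email = 'john.smith@gmail.com'

# Splitting on '@' gives a list: the part before it and the part after it
parts = email.split('@')
print(parts)      # ['john.smith', 'gmail.com']

# Index -1 selects the last item in the list, which is the provider
provider = parts[-1]
print(provider)   # gmail.com
```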


***

### Exercise

Create a lambda function `get_last_name` which takes a string with someone's first and last name (i.e., `John Smith`), and returns just the last name (i.e., `Smith`).

In [36]:
get_last_name = lambda x: x.split()[-1]

In [37]:
print(get_last_name('John Smith'))

Smith


***

The DataFrame `df` represents the hours worked by different employees over the course of the week. It contains the following columns:

* `'name'`: The employee’s name
* `'hourly_wage'`: The employee’s hourly wage
* `'hours_worked'`: The number of hours worked this week

Use the lambda function `get_last_name` to create a new column `last_name` with only the employees' last name.

In [38]:
df = pd.read_csv('employees.csv')
df.head()

Unnamed: 0,id,name,hourly_wage,hours_worked
0,10310,Lauren Durham,19,43
1,18656,Grace Sellers,17,40
2,61254,Shirley Rasmussen,16,30
3,16886,Brian Rojas,18,47
4,89010,Samantha Mosley,11,38


In [39]:
df['last_name'] = df.name.apply(
    get_last_name
)

df.head()

Unnamed: 0,id,name,hourly_wage,hours_worked,last_name
0,10310,Lauren Durham,19,43,Durham
1,18656,Grace Sellers,17,40,Sellers
2,61254,Shirley Rasmussen,16,30,Rasmussen
3,16886,Brian Rojas,18,47,Rojas
4,89010,Samantha Mosley,11,38,Mosley


***

## Applying a Lambda to a Row

We can also operate on multiple columns at once. If we use `apply` without specifying a single column and add the argument `axis=1`, the input to our lambda function will be an entire row, not a column. To access particular values of the row, we use the syntax `row.column_name` or `row['column_name']`.<br>
<br>
Suppose we have a table representing a grocery list:

In [40]:
df = pd.DataFrame([
  ['Apple', 1.00, 'No'],
  ['Milk', 4.20, 'No'],
  ['Paper Towels', 5.00, 'Yes'],
  ['Light Bulbs', 3.75, 'Yes']
],
columns=['Item', 'Price', 'Is taxed?'])

df

Unnamed: 0,Item,Price,Is taxed?
0,Apple,1.0,No
1,Milk,4.2,No
2,Paper Towels,5.0,Yes
3,Light Bulbs,3.75,Yes


If we want to add in the price with tax for each line, we'll need to look at two columns: `Price` and `Is taxed?`.<br>
<br>
If `Is taxed?` is `Yes`, then we'll want to multiply `Price` by `1.075` (for $7.5\%$ sales tax).<br>
<br>
If `Is taxed?` is `No`, we'll just have `Price` without multiplying it.<br>
<br>
We can create this column using a lambda function and the keyword `axis=1`:

In [41]:
df['Price with Tax'] = df.apply(lambda row:
     row['Price'] * 1.075
     if row['Is taxed?'] == 'Yes'
     else row['Price'],
     axis=1
)

df

Unnamed: 0,Item,Price,Is taxed?,Price with Tax
0,Apple,1.0,No,1.0
1,Milk,4.2,No,4.2
2,Paper Towels,5.0,Yes,5.375
3,Light Bulbs,3.75,Yes,4.03125
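
The following sketch, using a shortened version of the grocery table, shows what the lambda receives when `axis=1` is set: each row arrives as a pandas Series indexed by column name. It also illustrates that `row.column_name` only works for names that are valid Python identifiers, so a column like `Is taxed?` needs the bracket syntax. The names `row_types` and `labels` are made up for this example:

```python
import pandas as pd

df = pd.DataFrame([
  ['Apple', 1.00, 'No'],
  ['Paper Towels', 5.00, 'Yes']
],
columns=['Item', 'Price', 'Is taxed?'])

# Each row is passed to the lambda as a Series
row_types = df.apply(lambda row: type(row).__name__, axis=1)
print(row_types.tolist())   # ['Series', 'Series']

# row.Item works; 'Is taxed?' contains a space and '?', so use brackets
labels = df.apply(lambda row: row.Item + ': ' + row['Is taxed?'], axis=1)
print(labels.tolist())      # ['Apple: No', 'Paper Towels: Yes']
```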


***

### Exercise

If an employee worked for more than $40$ hours, she needs to be paid overtime ($1.5$ times the normal hourly wage).<br>
<br>
For instance, if an employee worked for $43$ hours and made \$$10$/hour, she would receive \$$400$ for the first $40$ hours that she worked, and an additional \$$45$ for the $3$ hours of overtime, for a total of \$$445$.<br>
<br>
Create a lambda function `total_earned` that accepts an input row with keys `hours_worked` and `hourly_wage` and uses an if statement to calculate the total amount earned.

In [42]:
total_earned = lambda row: (row['hours_worked'] * row['hourly_wage']) if row['hours_worked'] < 40 \
                           else \
                           (40 * row['hourly_wage'] + ((row['hours_worked'] - 40) * row['hourly_wage'] * 1.5))
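
Before applying the lambda to a DataFrame, we can check it against the worked example of $43$ hours at \$$10$/hour. This sketch passes a plain dictionary (the hypothetical `example_row`) in place of a DataFrame row, which works here because the lambda only uses `row[...]` lookups:

```python
total_earned = lambda row: (row['hours_worked'] * row['hourly_wage']) if row['hours_worked'] < 40 \
    else (40 * row['hourly_wage'] + ((row['hours_worked'] - 40) * row['hourly_wage'] * 1.5))

# Hypothetical stand-in for a row: 43 hours at $10/hour
example_row = {'hours_worked': 43, 'hourly_wage': 10}
print(total_earned(example_row))   # 445.0

# No overtime: 30 hours at $16/hour
print(total_earned({'hours_worked': 30, 'hourly_wage': 16}))   # 480
```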

Use the lambda function `total_earned` and `apply` to add a column `total_earned` to `df` with the total amount earned by each employee.

In [43]:
df = pd.read_csv('employees.csv')

df.head()

Unnamed: 0,id,name,hourly_wage,hours_worked
0,10310,Lauren Durham,19,43
1,18656,Grace Sellers,17,40
2,61254,Shirley Rasmussen,16,30
3,16886,Brian Rojas,18,47
4,89010,Samantha Mosley,11,38


In [44]:
df['total_earned'] = df.apply(total_earned, axis=1)

df.head()

Unnamed: 0,id,name,hourly_wage,hours_worked,total_earned
0,10310,Lauren Durham,19,43,845.5
1,18656,Grace Sellers,17,40,680.0
2,61254,Shirley Rasmussen,16,30,480.0
3,16886,Brian Rojas,18,47,909.0
4,89010,Samantha Mosley,11,38,418.0


***

## Renaming Columns

When we get our data from other sources, we often want to change the column names. For example, we might want all of the column names to follow variable name rules, so that we can use `df.column_name` (which tab-completes) rather than `df['column_name']` (which takes up extra space).<br>
<br>
You can change all of the column names at once by setting the `.columns` property to a different list. This is convenient when every column needs a new name, but be careful! You can easily mislabel columns if you get the ordering wrong. Here’s an example:

In [45]:
df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})
df.columns = ['First Name', 'Age']
df

Unnamed: 0,First Name,Age
0,John,23
1,Jane,29
2,Sue,21
3,Fred,18
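
To illustrate the ordering pitfall, here is a sketch of what happens if the new names are listed in the wrong order (a deliberate mistake for demonstration). pandas does not complain; it simply attaches each label to the column in that position:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})

# Deliberately wrong order: 'Age' now labels the names column
df.columns = ['Age', 'First Name']
print(df)

print(df['Age'].tolist())          # ['John', 'Jane', 'Sue', 'Fred']
print(df['First Name'].tolist())   # [23, 29, 21, 18]
```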


***

### Exercise

The DataFrame `df` contains data about movies from IMDb.<br>
<br>
We want to present this data to some film producers. Right now, our column names are in lower case, and are not very descriptive. Let us modify `df` using the `.columns` attribute to make the following changes to the columns:
<table width=20%>
    <tr align="left">
        <th>Old</th>
        <th>New</th>
    </tr>
    <tr>
        <td>id</td>
        <td>ID</td>
    </tr>
    <tr>
        <td>name</td>
        <td>Title</td>
    </tr>
    <tr>
        <td>genre</td>
        <td>Category</td>
    </tr>
    <tr>
        <td>year</td>
        <td>Year Released</td>
    </tr>
    <tr>
        <td>imdb_rating</td>
        <td>Rating</td>
    </tr>
</table>

In [46]:
df = pd.read_csv('imdb.csv')

df.columns = ['ID', 'Title', 'Category', 'Year Released', 'Rating']

df.head()

Unnamed: 0,ID,Title,Category,Year Released,Rating
0,1,Avatar,action,2009,7.9
1,2,Jurassic World,action,2015,7.3
2,3,The Avengers,action,2012,8.1
3,4,The Dark Knight,action,2008,9.0
4,5,Star Wars: Episode I - The Phantom Menace,action,1999,6.6


## Renaming Columns II

You also can rename individual columns by using the `.rename` method. Pass a dictionary like the one below to the `columns` keyword argument:

In [47]:
name_dict = {'old_column_name1': 'new_column_name1', 'old_column_name2': 'new_column_name2'}

Here's an example:

In [48]:
df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})
df.rename(columns={
    'name': 'First Name',
    'age': 'Age'},
    inplace=True)

The code above will rename `name` to `First Name` and `age` to `Age`.<br>
<br>
Using `rename` with only the `columns` keyword will create a <b>new</b> DataFrame, leaving your original DataFrame unchanged. That's why we also passed in the keyword argument <b>`inplace=True`</b>. Using `inplace=True` lets us edit the original DataFrame.<br>
<br>
There are several reasons why `.rename` is preferable to `.columns`:
* You can rename just one column
* You can be specific about which column names are getting changed (with `.columns` you can accidentally switch column names if you are not careful)

<i>Note:</i> If you misspell one of the original column names, this command will not fail. It just will not change anything.
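
The sketch below illustrates both behaviors described above: `rename` without `inplace=True` returns a new DataFrame, and a misspelled key (the made-up `'nmae'`) is silently ignored. It also shows the `errors='raise'` argument of `rename`, which is not covered in this lesson but makes pandas raise a `KeyError` for unknown names. The variable names are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})

# Without inplace=True, the original DataFrame keeps its column names
renamed = df.rename(columns={'name': 'First Name'})
print(list(df.columns))         # ['name', 'age']
print(list(renamed.columns))    # ['First Name', 'age']

# A misspelled old name is ignored, so nothing changes
unchanged = df.rename(columns={'nmae': 'First Name'})
print(list(unchanged.columns))  # ['name', 'age']

# With errors='raise', the same typo raises a KeyError instead
try:
    df.rename(columns={'nmae': 'First Name'}, errors='raise')
    rename_error = None
except KeyError as error:
    rename_error = error
    print('Rename failed:', error)
```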

***

### Exercise
If we did not know that `df` was a table of movie ratings, the column name `name` might be confusing.<br>
<br>
To clarify, let us rename `name` to `movie_title`.<br>
<br>
Use the keyword `inplace=True` so that you modify `df` rather than creating a new DataFrame!

In [49]:
df = pd.read_csv('imdb.csv')

df.head()

Unnamed: 0,id,name,genre,year,imdb_rating
0,1,Avatar,action,2009,7.9
1,2,Jurassic World,action,2015,7.3
2,3,The Avengers,action,2012,8.1
3,4,The Dark Knight,action,2008,9.0
4,5,Star Wars: Episode I - The Phantom Menace,action,1999,6.6


In [50]:
df.rename(columns={
    'name': 'movie_title'},
    inplace=True)

df.head()

Unnamed: 0,id,movie_title,genre,year,imdb_rating
0,1,Avatar,action,2009,7.9
1,2,Jurassic World,action,2015,7.3
2,3,The Avengers,action,2012,8.1
3,4,The Dark Knight,action,2008,9.0
4,5,Star Wars: Episode I - The Phantom Menace,action,1999,6.6


***

## Review

Great job! In this lesson, you learned how to modify an existing DataFrame. Some of the skills you have learned include:
* Adding columns to a DataFrame
* Using lambda functions to calculate complex quantities
* Renaming columns

Let’s practice what you just learned!


***

Once more, you will be the data analyst for ShoeFly.com, a fictional online shoe store.<br>
<br>
More messy order data has been loaded into the variable `orders`. Examine the first 5 rows of the data using `print` and `head`.

In [51]:
orders = pd.read_csv('shoefly.csv')
orders.head(5)

Unnamed: 0,id,first_name,last_name,gender,email,shoe_type,shoe_material,shoe_color
0,54791,Rebecca,Lindsay,female,RebeccaLindsay57@hotmail.com,clogs,faux-leather,black
1,53450,Emily,Joyce,female,EmilyJoyce25@gmail.com,ballet flats,faux-leather,navy
2,91987,Joyce,Waller,female,Joyce.Waller@gmail.com,sandles,fabric,black
3,14437,Justin,Erickson,male,Justin.Erickson@outlook.com,clogs,faux-leather,red
4,79357,Andrew,Banks,male,AB4318@gmail.com,boots,leather,brown


Many of our customers want to buy vegan shoes (shoes made from materials that <i>do not</i> come from animals). Add a new column called `shoe_source`, which is `vegan` if the material is not `leather` and `animal` otherwise.

In [52]:
orders['shoe_source'] = orders.shoe_material.apply(lambda x:
                        'animal' if x == 'leather' else 'vegan'
)
orders.head(5)

Unnamed: 0,id,first_name,last_name,gender,email,shoe_type,shoe_material,shoe_color,shoe_source
0,54791,Rebecca,Lindsay,female,RebeccaLindsay57@hotmail.com,clogs,faux-leather,black,vegan
1,53450,Emily,Joyce,female,EmilyJoyce25@gmail.com,ballet flats,faux-leather,navy,vegan
2,91987,Joyce,Waller,female,Joyce.Waller@gmail.com,sandles,fabric,black,vegan
3,14437,Justin,Erickson,male,Justin.Erickson@outlook.com,clogs,faux-leather,red,vegan
4,79357,Andrew,Banks,male,AB4318@gmail.com,boots,leather,brown,animal


Our marketing department wants to send out an email to each customer. Using the columns `last_name` and `gender`, create a column called `salutation` which contains `Dear Mr. <last_name>` for `men` and `Dear Ms. <last_name>` for `women`.

In [53]:
orders['salutation'] = orders.apply(lambda row:
    f'Dear Mr. {row.last_name}' if row.gender == 'male' else f'Dear Ms. {row.last_name}',
    axis=1
)
orders.head(5)

Unnamed: 0,id,first_name,last_name,gender,email,shoe_type,shoe_material,shoe_color,shoe_source,salutation
0,54791,Rebecca,Lindsay,female,RebeccaLindsay57@hotmail.com,clogs,faux-leather,black,vegan,Dear Ms. Lindsay
1,53450,Emily,Joyce,female,EmilyJoyce25@gmail.com,ballet flats,faux-leather,navy,vegan,Dear Ms. Joyce
2,91987,Joyce,Waller,female,Joyce.Waller@gmail.com,sandles,fabric,black,vegan,Dear Ms. Waller
3,14437,Justin,Erickson,male,Justin.Erickson@outlook.com,clogs,faux-leather,red,vegan,Dear Mr. Erickson
4,79357,Andrew,Banks,male,AB4318@gmail.com,boots,leather,brown,animal,Dear Mr. Banks
