# Modifying DataFrames

### Add a column to a DataFrame

1. assign a list with the same length as the number of rows to a new column name on an existing `DataFrame`

```py
# setting the column name to `Quantity`
df['Quantity'] = [100, 150, 50, 35]
```

In [1]:
import pandas as pd

df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)
print(df)

   Product ID   Description  Cost to Manufacture  Price
0           1  3 inch screw                  0.5   0.75
1           2   2 inch nail                  0.1   0.25
2           3        hammer                  3.0   5.50
3           4   screwdriver                  2.5   3.00


In [2]:
df['Sold in Bulk?'] = ['Yes', 'Yes', 'No', 'No']
print(df)

   Product ID   Description  Cost to Manufacture  Price Sold in Bulk?
0           1  3 inch screw                  0.5   0.75           Yes
1           2   2 inch nail                  0.1   0.25           Yes
2           3        hammer                  3.0   5.50            No
3           4   screwdriver                  2.5   3.00            No
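
The list must contain exactly one value per row. The following is a minimal sketch of what happens otherwise, assuming a cut-down version of the product table; the `Weight` column and its values are made up for illustration. Assigning a list of the wrong length raises a `ValueError` and leaves the DataFrame unchanged.

```py
import pandas as pd

df = pd.DataFrame({
    'Product ID': [1, 2, 3, 4],
    'Price': [0.75, 0.25, 5.50, 3.00],
})

# hypothetical column: only 3 values for a 4-row DataFrame
length_error = None
try:
    df['Weight'] = [0.1, 0.2, 0.3]
except ValueError as err:
    length_error = err
    print('Could not add column:', err)

# a list with one value per row works
df['Weight'] = [0.1, 0.2, 0.3, 0.4]
print(df)
```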


2. add a new column that is the same for all rows in the DataFrame.

```py
df['In Stock?'] = True
```

In [3]:
df['Is taxed?'] = 'Yes'
print(df)

   Product ID   Description  Cost to Manufacture  Price Sold in Bulk?  \
0           1  3 inch screw                  0.5   0.75           Yes   
1           2   2 inch nail                  0.1   0.25           Yes   
2           3        hammer                  3.0   5.50            No   
3           4   screwdriver                  2.5   3.00            No   

  Is taxed?  
0       Yes  
1       Yes  
2       Yes  
3       Yes  


3. add a column computed from existing columns, e.g. a column holding the sales tax to be charged for each item

```py
df['Sales Tax'] = df.Price * 0.075
```

In [4]:
df['Revenue'] = df['Price'] - df['Cost to Manufacture']
print(df)

   Product ID   Description  Cost to Manufacture  Price Sold in Bulk?  \
0           1  3 inch screw                  0.5   0.75           Yes   
1           2   2 inch nail                  0.1   0.25           Yes   
2           3        hammer                  3.0   5.50            No   
3           4   screwdriver                  2.5   3.00            No   

  Is taxed?  Revenue  
0       Yes     0.25  
1       Yes     0.15  
2       Yes     2.50  
3       Yes     0.50  
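
As a rough sketch of the two snippets above run together, assuming the same product table: arithmetic on a column is applied to every row at once, and the dot syntax (`df.Price`) only works for column names that are valid Python identifiers, so `Cost to Manufacture` needs brackets.

```py
import pandas as pd

df = pd.DataFrame({
    'Description': ['3 inch screw', '2 inch nail', 'hammer', 'screwdriver'],
    'Cost to Manufacture': [0.5, 0.10, 3.00, 2.50],
    'Price': [0.75, 0.25, 5.50, 3.00],
})

# multiply every price by the 7.5% tax rate in one step
df['Sales Tax'] = df.Price * 0.075

# column names containing spaces need the bracket syntax
df['Revenue'] = df['Price'] - df['Cost to Manufacture']
print(df)
```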


### Performing column operations

We can use the `apply()` method to apply a function to every value in a particular column, e.g. make all the descriptions uppercase:

In [5]:
df['Description'] = df.Description.apply(str.upper)
print(df)

   Product ID   Description  Cost to Manufacture  Price Sold in Bulk?  \
0           1  3 INCH SCREW                  0.5   0.75           Yes   
1           2   2 INCH NAIL                  0.1   0.25           Yes   
2           3        HAMMER                  3.0   5.50            No   
3           4   SCREWDRIVER                  2.5   3.00            No   

  Is taxed?  Revenue  
0       Yes     0.25  
1       Yes     0.15  
2       Yes     2.50  
3       Yes     0.50  


We can pass a `lambda` function to `apply()` when performing column operations, e.g. retrieve the email provider from each user's email address:

```py
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])
```
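
A self-contained sketch of the same idea, assuming a small DataFrame with an `Email` column; the names and addresses here are hypothetical sample data:

```py
import pandas as pd

# hypothetical sample data
df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cy'],
    'Email': ['ana@example.com', 'ben@example.org', 'cy@example.com'],
})

# keep only the part after the '@' for each address
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])
print(df)
```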

#### Example

```py
# split each full name on spaces and keep the last piece
get_last_name = lambda x: x.split(' ')[-1]
df['last_name'] = df.name.apply(get_last_name)
print(df)

# Resulting DataFrame:
#       id               name  hourly_wage  hours_worked  last_name
# 0  10310      Lauren Durham           19            43     Durham
# 1  18656      Grace Sellers           17            40    Sellers
# 2  61254  Shirley Rasmussen           16            30  Rasmussen
# 3  16886        Brian Rojas           18            47      Rojas
# 4  89010    Samantha Mosley           11            38     Mosley
# 5  87246       Louis Guzman           14            39     Guzman
```


### Performing operations on a row

To perform operations on multiple columns at once we operate on the entire row by passing the `axis=1` argument to `apply()`. The input to the lambda function will be the entire row instead of an individual column field. We can then access individual fields within our lambda.

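To access particular column values in a row, use the syntax `row.column_name` or `row['column_name']`. The bracket form is required when the column name contains spaces or punctuation, e.g. `row['Is taxed?']`.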

```
Item          Price  Is taxed?
Apple         1.00   No
Milk          4.20   No
Paper Towels  5.00   Yes
Light Bulbs   3.75   Yes
```

We want to add a new column that includes the tax where required.

If `Is taxed?` is `Yes`, then we’ll want to multiply `Price` by 1.075 (for 7.5% sales tax).

If `Is taxed?` is `No`, we’ll just have `Price` without multiplying it.

We can create this column using a lambda function and the keyword `axis=1`:

```py
df['Price with Tax'] = df.apply(lambda row:
     row['Price'] * 1.075
     if row['Is taxed?'] == 'Yes'
     else row['Price'],
     axis=1
)
```
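
To try this end to end, here is a minimal sketch that rebuilds the table above as a DataFrame and applies the same row-wise lambda. The construction of `df` is an assumption about how the table was created.

```py
import pandas as pd

df = pd.DataFrame({
    'Item': ['Apple', 'Milk', 'Paper Towels', 'Light Bulbs'],
    'Price': [1.00, 4.20, 5.00, 3.75],
    'Is taxed?': ['No', 'No', 'Yes', 'Yes'],
})

# axis=1 passes each row to the lambda, so several columns can be read at once
df['Price with Tax'] = df.apply(
    lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price'],
    axis=1,
)
print(df)
```

Only the taxed rows (Paper Towels and Light Bulbs) change; the other prices are copied over as they are.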

#### Example

If an employee worked for more than 40 hours, she needs to be paid overtime (1.5 times the normal hourly wage) for the hours beyond 40.

For instance, if an employee worked for 43 hours and made $10/hour, she would receive $400 for the first 40 hours that she worked, and an additional $45 for the 3 hours of overtime, for a total of $445.

Create a lambda function `total_earned` that accepts an input row with keys `hours_worked` and `hourly_wage` and uses a conditional expression to calculate the total pay.

```py
# if it were a regular function
def total_earned(row):
    if row['hours_worked'] <= 40:
        # no overtime: regular pay for every hour
        return row['hours_worked'] * row['hourly_wage']
    else:
        # first 40 hours at the regular rate, the rest at 1.5x
        return (40 * row['hourly_wage']) \
            + (row['hours_worked'] - 40) * (row['hourly_wage'] * 1.50)
```

Solution:

```py
total_earned = lambda row: (
    row['hours_worked'] * row['hourly_wage']
    if row['hours_worked'] <= 40
    else (40 * row['hourly_wage'])
    + ((row['hours_worked'] - 40) * row['hourly_wage'] * 1.5)
)

df['total_earned'] = df.apply(total_earned, axis=1)
```
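
A runnable sketch of the solution, assuming the employee table from the earlier example is built by hand (only the columns needed here are included):

```py
import pandas as pd

df = pd.DataFrame({
    'name': ['Lauren Durham', 'Grace Sellers', 'Shirley Rasmussen',
             'Brian Rojas', 'Samantha Mosley', 'Louis Guzman'],
    'hourly_wage': [19, 17, 16, 18, 11, 14],
    'hours_worked': [43, 40, 30, 47, 38, 39],
})

# regular pay up to 40 hours, 1.5x the hourly wage for every hour beyond that
total_earned = lambda row: (
    row['hours_worked'] * row['hourly_wage']
    if row['hours_worked'] <= 40
    else (40 * row['hourly_wage'])
    + ((row['hours_worked'] - 40) * row['hourly_wage'] * 1.5)
)

df['total_earned'] = df.apply(total_earned, axis=1)
print(df)
```

Only the two employees above 40 hours (43 and 47 hours) receive overtime; everyone else is paid `hours_worked * hourly_wage`.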