# Modifying DataFrames

In [1]:
import pandas as pd

df_tools = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)
df_tools

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price
0,1,3 inch screw,0.5,0.75
1,2,2 inch nail,0.1,0.25
2,3,hammer,3.0,5.5
3,4,screwdriver,2.5,3.0


### Add a column to a DataFrame

1. Assign a list with the same length as the `DataFrame` (one value per row) to a new column name.

In [2]:
df_tools['Sold in Bulk?'] = ['Yes', 'Yes', 'No', 'No']
df_tools

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price,Sold in Bulk?
0,1,3 inch screw,0.5,0.75,Yes
1,2,2 inch nail,0.1,0.25,Yes
2,3,hammer,3.0,5.5,No
3,4,screwdriver,2.5,3.0,No
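
The list must supply exactly one value per row. As a minimal sketch of what happens otherwise (using a small hypothetical `df_demo` rather than `df_tools`), pandas raises a `ValueError` when the lengths differ:

```python
import pandas as pd

# Hypothetical four-row DataFrame, used only for this illustration
df_demo = pd.DataFrame({'Product ID': [1, 2, 3, 4]})

try:
    # Only two values are supplied for four rows
    df_demo['Sold in Bulk?'] = ['Yes', 'No']
    length_error = None
except ValueError as err:
    length_error = err

print(type(length_error).__name__)
```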


2. Add a new column that has the same value in **ALL** rows by assigning a single value.

In [3]:
df_tools['Is taxed?'] = 'Yes'
df_tools

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price,Sold in Bulk?,Is taxed?
0,1,3 inch screw,0.5,0.75,Yes,Yes
1,2,2 inch nail,0.1,0.25,Yes,Yes
2,3,hammer,3.0,5.5,No,Yes
3,4,screwdriver,2.5,3.0,No,Yes


3. Add a column by performing an operation on existing columns, e.g. add a column for the sales tax to be charged on each item.

```py
df['Sales Tax'] = df.Price * 0.075
```

In [4]:
df_tools['Revenue'] = df_tools['Price'] - df_tools['Cost to Manufacture']
df_tools

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price,Sold in Bulk?,Is taxed?,Revenue
0,1,3 inch screw,0.5,0.75,Yes,Yes,0.25
1,2,2 inch nail,0.1,0.25,Yes,Yes,0.15
2,3,hammer,3.0,5.5,No,Yes,2.5
3,4,screwdriver,2.5,3.0,No,Yes,0.5


We can also use 'dot' notation to read columns whose names follow variable naming conventions, e.g.

```py
orders['remaining_inventory'] = orders.initial_inventory - orders.number_sold
```
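
As a runnable sketch of the snippet above, the example below builds a small `orders` table. The values are hypothetical sample data; only the column names come from the snippet.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the snippet above
orders = pd.DataFrame({
    'initial_inventory': [10, 20, 15],
    'number_sold': [3, 20, 5]
})

# Read existing columns with dot notation, create the new one with brackets
orders['remaining_inventory'] = orders.initial_inventory - orders.number_sold

# Once the column exists, both notations return the same data
same_column = orders.remaining_inventory.equals(orders['remaining_inventory'])
print(orders)
```

Bracket notation is used on the left-hand side because assigning to a new name with dot notation sets a plain attribute on the object rather than adding a column.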

### Performing column operations

We can use the `apply()` method to apply a function to every value in a particular column, e.g. make all the descriptions uppercase. We can use this technique to replace the values in an existing column, or to add a new column.

In [6]:
df_tools['Description'] = df_tools.Description.apply(str.upper)
df_tools

Unnamed: 0,Product ID,Description,Cost to Manufacture,Price,Sold in Bulk?,Is taxed?,Revenue
0,1,3 INCH SCREW,0.5,0.75,Yes,Yes,0.25
1,2,2 INCH NAIL,0.1,0.25,Yes,Yes,0.15
2,3,HAMMER,3.0,5.5,No,Yes,2.5
3,4,SCREWDRIVER,2.5,3.0,No,Yes,0.5
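
For common string operations, pandas also provides vectorized methods under the `.str` accessor. The sketch below, which uses a standalone `descriptions` Series as an assumption for illustration, suggests that `.str.upper()` gives the same result as `apply(str.upper)`:

```python
import pandas as pd

# Standalone copy of the original descriptions, for illustration only
descriptions = pd.Series(['3 inch screw', '2 inch nail', 'hammer', 'screwdriver'])

via_apply = descriptions.apply(str.upper)  # calls str.upper on each value
via_str = descriptions.str.upper()         # vectorized string method

print(via_str.tolist())
```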


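We can pass a `lambda` function to `apply()` when performing column operations, e.g. retrieve the email provider from each user's email address.

When `apply()` is called on a single column, the lambda function receives each value in that column in turn. (When it is called on a whole `DataFrame`, `axis=0` is the default and the function receives each column.)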

```py
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])
```
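
To try the snippet above, here is a small self-contained sketch. The names and email addresses are made up for illustration.

```python
import pandas as pd

# Hypothetical users; the names and addresses are made up for illustration
df = pd.DataFrame({
    'Name': ['JOHN SMITH', 'Jane Doe', 'joe schmo'],
    'Email': ['john.smith@gmail.com', 'jdoe@yahoo.com', 'joeschmo@hotmail.com']
})

# Keep the part of each address after the '@'
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])
print(df)
```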

In [7]:
headers = ['id', 'name', 'hourly_wage', 'hours_worked', 'last_name']

users = [
    [10310, 'Lauren Durham', 19, 43,'Durham'],
    [18656, 'Grace Sellers', 17, 40, 'Sellers'],
    [61254, 'Shirley Rasmussen', 16, 30, 'Rasmussen'],
    [16886, 'Brian Rojas', 18, 47, 'Rojas'],
    [89010, 'Samantha Mosley', 11, 38, 'Mosley'],
    [87246, 'Louis Guzman', 14, 39, 'Guzman']
]
df_workers = pd.DataFrame(users, columns=headers)
df_workers

Unnamed: 0,id,name,hourly_wage,hours_worked,last_name
0,10310,Lauren Durham,19,43,Durham
1,18656,Grace Sellers,17,40,Sellers
2,61254,Shirley Rasmussen,16,30,Rasmussen
3,16886,Brian Rojas,18,47,Rojas
4,89010,Samantha Mosley,11,38,Mosley
5,87246,Louis Guzman,14,39,Guzman


In [8]:
# get_last_name = lambda x: x.split(' ')[-1]
get_first_name = lambda x: x.split(' ')[0]
# create new col with results
df_workers['first_name'] = df_workers.name.apply(get_first_name)
df_workers

Unnamed: 0,id,name,hourly_wage,hours_worked,last_name,first_name
0,10310,Lauren Durham,19,43,Durham,Lauren
1,18656,Grace Sellers,17,40,Sellers,Grace
2,61254,Shirley Rasmussen,16,30,Rasmussen,Shirley
3,16886,Brian Rojas,18,47,Rojas,Brian
4,89010,Samantha Mosley,11,38,Mosley,Samantha
5,87246,Louis Guzman,14,39,Guzman,Louis


### Performing operations on a row

To perform operations on multiple columns at once we operate on the entire row by passing the `axis=1` argument to `apply()`. The input to the lambda function will be the entire row instead of an individual column field. We can then access individual fields within our lambda.

To access particular column values in a row, use the syntax `row.column_name` or `row['column_name']`

```
Item          Price  Is taxed?
Apple         1.00   No
Milk          4.20   No
Paper Towels  5.00   Yes
Light Bulbs   3.75   Yes
```

We want to add a new column that includes the tax where required.

If `Is taxed?` is `Yes`, then we’ll want to multiply `Price` by 1.075 (for 7.5% sales tax).

If `Is taxed?` is `No`, we’ll just have `Price` without multiplying it.

We can create this column using a lambda function and the keyword `axis=1`:

```py
df['Price with Tax'] = df.apply(lambda row:
     row['Price'] * 1.075
     if row['Is taxed?'] == 'Yes'
     else row['Price'],
     axis=1
)
```
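
As a self-contained sketch, the same row-wise lambda can be run against the four-item table shown earlier. The `DataFrame` is rebuilt here from those values.

```python
import pandas as pd

# Rebuild the small example table from the values listed above
df = pd.DataFrame({
    'Item': ['Apple', 'Milk', 'Paper Towels', 'Light Bulbs'],
    'Price': [1.00, 4.20, 5.00, 3.75],
    'Is taxed?': ['No', 'No', 'Yes', 'Yes']
})

# axis=1 passes each row to the lambda, so several columns can be read at once
df['Price with Tax'] = df.apply(
    lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price'],
    axis=1
)
print(df)
```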

#### Continuing the Workers example

If an employee worked for more than 40 hours, they need to be paid overtime (1.5 times the normal hourly wage) for the extra hours.

For instance, if an employee worked for 43 hours and made $10 per hour, they would receive $400 for the first 40 hours, and an additional $45 for the 3 hours of overtime, for a total of $445.
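
The arithmetic in that example can be checked with a few lines of plain Python. The helper name `overtime_pay_breakdown` and its `threshold` and `multiplier` parameters are hypothetical, chosen only for this sketch.

```python
def overtime_pay_breakdown(hours_worked, hourly_wage, threshold=40, multiplier=1.5):
    """Return (regular pay, overtime pay, total pay) for one worker."""
    regular_hours = min(hours_worked, threshold)
    overtime_hours = max(hours_worked - threshold, 0)
    regular_pay = regular_hours * hourly_wage
    overtime_pay = overtime_hours * hourly_wage * multiplier
    return regular_pay, overtime_pay, regular_pay + overtime_pay

# 43 hours at $10 per hour, as in the example above
regular, overtime, total = overtime_pay_breakdown(43, 10)
print(regular, overtime, total)  # 400 45.0 445.0
```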

Create a lambda function `total_earned` that accepts an input row with keys `hours_worked` and `hourly_wage` and uses an if statement to calculate the total amount earned.

Using a regular function:

In [11]:
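def total_earned(row):
    # No overtime: every hour is paid at the normal rate
    if row['hours_worked'] <= 40:
        return row['hours_worked'] * row['hourly_wage']
    # First 40 hours at the normal rate, the remaining hours at 1.5 times the rate
    regular_pay = 40 * row['hourly_wage']
    overtime_pay = (row['hours_worked'] - 40) * row['hourly_wage'] * 1.5
    return regular_pay + overtime_pay
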
df_workers['total_earned_fn'] = df_workers.apply(total_earned, axis=1)
df_workers

Unnamed: 0,id,name,hourly_wage,hours_worked,last_name,first_name,total_earned_fn
0,10310,Lauren Durham,19,43,Durham,Lauren,845.5
1,18656,Grace Sellers,17,40,Sellers,Grace,680.0
2,61254,Shirley Rasmussen,16,30,Rasmussen,Shirley,480.0
3,16886,Brian Rojas,18,47,Rojas,Brian,909.0
4,89010,Samantha Mosley,11,38,Mosley,Samantha,418.0
5,87246,Louis Guzman,14,39,Guzman,Louis,546.0


Using a `lambda` function:

In [13]:
total_earned = lambda row: row['hours_worked'] * row['hourly_wage'] if row['hours_worked'] <= 40 else (40 * row['hourly_wage']) + ((row['hours_worked'] - 40) * row['hourly_wage'] * 1.5)

df_workers['total_earned_lm'] = df_workers.apply(total_earned, axis=1)
df_workers

Unnamed: 0,id,name,hourly_wage,hours_worked,last_name,first_name,total_earned_fn,total_earned_lm
0,10310,Lauren Durham,19,43,Durham,Lauren,845.5,845.5
1,18656,Grace Sellers,17,40,Sellers,Grace,680.0,680.0
2,61254,Shirley Rasmussen,16,30,Rasmussen,Shirley,480.0,480.0
3,16886,Brian Rojas,18,47,Rojas,Brian,909.0,909.0
4,89010,Samantha Mosley,11,38,Mosley,Samantha,418.0,418.0
5,87246,Louis Guzman,14,39,Guzman,Louis,546.0,546.0
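
The same totals can also be computed without `apply()`, using column arithmetic. The sketch below is one possible alternative, not part of the original exercise. It uses `Series.clip()` to split hours into regular and overtime parts, and the `workers` table and the `total_earned_vec` column name are assumptions for illustration.

```python
import pandas as pd

# Wages and hours copied from the workers table above
workers = pd.DataFrame({
    'hourly_wage': [19, 17, 16, 18, 11, 14],
    'hours_worked': [43, 40, 30, 47, 38, 39]
})

# Hours up to 40 are regular; anything above 40 is overtime
regular_hours = workers['hours_worked'].clip(upper=40)
overtime_hours = (workers['hours_worked'] - 40).clip(lower=0)

workers['total_earned_vec'] = (
    regular_hours * workers['hourly_wage']
    + overtime_hours * workers['hourly_wage'] * 1.5
)
print(workers)
```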


### Renaming columns

Renaming columns is common practice, especially when you're creating a DataFrame from imported data and you want column names that are more descriptive, or that follow variable naming conventions so you can use `df.column_name` as opposed to `df['column name']`.

1. You can rename all of the columns at once by setting the `.columns` property to a different list. Make sure the list has one name per column, in the same order as the existing columns.

```py
df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})
df.columns = ['First Name', 'Age']
```

2. Rename individual columns using the `.rename()` method. The method takes a dictionary that maps each old name (key) to its new name (value). You can rename one or more columns.

```py
df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})

df.rename(columns={
    'name': 'First Name',
    'age': 'Age'},
    inplace=True)
```

Note:

Using `rename` with only the `columns` keyword will create a new DataFrame, leaving your original DataFrame unchanged. Adding `inplace=True` edits the original DataFrame.
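
A short sketch of that behaviour, reusing the same sample data. The variable names `renamed`, `original_columns` and `renamed_columns` are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})

# Without inplace=True, rename returns a new DataFrame
renamed = df.rename(columns={'name': 'First Name'})

original_columns = list(df.columns)      # the original is unchanged
renamed_columns = list(renamed.columns)  # only 'name' was renamed
print(original_columns, renamed_columns)
```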

#### Example

The `orders` table has the columns `id`, `first_name`, `last_name`, `email`, `shoe_type`, `shoe_material`, `shoe_color`.

```py
# add a column showing whether each shoe is made from materials that do not come from animals
not_from_animals = lambda x: 'vegan' if x != 'leather' else 'animal'
orders['shoe_source'] = orders.shoe_material.apply(not_from_animals)
print(orders)
```

```py
# create the proper greeting based on the user's gender (assumes orders also has a gender column)
salutation = lambda row: 'Dear Mr. ' + row['last_name'] if row['gender'] == 'male' else 'Dear Ms. ' + row['last_name']
orders['salutation'] = orders.apply(salutation, axis=1)
```

#### Example

```py

# First rows of inventory.csv:
#    location       product_type  product_description  quantity  price
# 0  Staten Island  seeds         daisy                4         6.99
# 1  Staten Island  seeds         calla lily           46        19.99
# 2  Staten Island  seeds         tomato               85        13.99
# 3  Staten Island  garden tools  rake                 4         13.99
# 4  Staten Island  garden tools  wheelbarrow          0         89.99

import pandas as pd
inventory = pd.read_csv('inventory.csv')
inventory.head(10)

staten_island = inventory.iloc[:10]
product_request = staten_island.product_description

# select all rows where location == 'Brooklyn' & product_type == 'seeds'
seed_request = inventory[(inventory.location == 'Brooklyn') & (inventory.product_type == 'seeds')]

# add a column to inventory called in_stock which is True if quantity > 0 and False if quantity == 0
check_stock = lambda x: x > 0
inventory['in_stock'] = inventory.quantity.apply(check_stock)
print(inventory)

# add a column called total_value that is equal to price multiplied by quantity
total = lambda row: row.price * row.quantity if row.in_stock else 0
inventory['total_value'] = inventory.apply(total, axis=1)
print(inventory)

# generate a product description
combine_lambda = lambda row: '{} - {}'.format(row.product_type, row.product_description)
inventory['full_description'] = inventory.apply(combine_lambda, axis=1)
print(inventory)
```
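
Since `inventory.csv` is not included here, the sketch below rebuilds the five rows shown above in memory and derives the same three columns with column arithmetic instead of `apply()`. It is an alternative formulation for illustration, not the original solution.

```python
import pandas as pd

# The five rows shown above, rebuilt in memory instead of read from inventory.csv
inventory = pd.DataFrame({
    'location': ['Staten Island'] * 5,
    'product_type': ['seeds', 'seeds', 'seeds', 'garden tools', 'garden tools'],
    'product_description': ['daisy', 'calla lily', 'tomato', 'rake', 'wheelbarrow'],
    'quantity': [4, 46, 85, 4, 0],
    'price': [6.99, 19.99, 13.99, 13.99, 89.99]
})

# True where at least one unit is available
inventory['in_stock'] = inventory.quantity > 0

# price * quantity is already 0 for out-of-stock rows
inventory['total_value'] = inventory.price * inventory.quantity

# Join two text columns with a separator
inventory['full_description'] = (
    inventory.product_type + ' - ' + inventory.product_description
)
print(inventory)
```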