# Data Processing in Pandas - II

*This is the second part of a two-part series: [Part 1](https://www.kaggle.com/ctxplorer/data-processing-in-pandas-i), Part 2*

#### Contents:
1. Adding a column
2. Performing column operations
3. Renaming columns

In [1]:
import pandas as pd

In [2]:
# Data
df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)
print(df)

   Product ID   Description  Cost to Manufacture  Price
0           1  3 inch screw                  0.5   0.75
1           2   2 inch nail                  0.1   0.25
2           3        hammer                  3.0   5.50
3           4   screwdriver                  2.5   3.00


## 1. Adding a column

#### By assigning a list of values to a new column

In [3]:
df['Sold in Bulk?'] = ['Yes', 'Yes', 'No', 'No']
print(df)

   Product ID   Description      ...        Price  Sold in Bulk?
0           1  3 inch screw      ...         0.75            Yes
1           2   2 inch nail      ...         0.25            Yes
2           3        hammer      ...         5.50             No
3           4   screwdriver      ...         3.00             No

[4 rows x 5 columns]
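
One detail worth knowing about this approach: the list has to contain exactly one value per row. The following is a minimal sketch on a cut-down frame; the column name `Too Short` and the flag `mismatch_rejected` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Product ID': [1, 2, 3, 4]})

# A list with one value per row is accepted.
df['Sold in Bulk?'] = ['Yes', 'Yes', 'No', 'No']

# A list whose length differs from the number of rows raises a ValueError.
try:
    df['Too Short'] = ['Yes', 'No']  # hypothetical column, only 2 values for 4 rows
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)
```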


#### By assigning the same value to every row

In [4]:
df['Is taxed?'] = 'Yes'
print(df)

   Product ID   Description    ...      Sold in Bulk?  Is taxed?
0           1  3 inch screw    ...                Yes        Yes
1           2   2 inch nail    ...                Yes        Yes
2           3        hammer    ...                 No        Yes
3           4   screwdriver    ...                 No        Yes

[4 rows x 6 columns]


#### By performing an operation on existing columns

In [5]:
# Note: price minus manufacturing cost is strictly the per-unit margin, stored here as 'Revenue'
df['Revenue'] = df['Price'] - df['Cost to Manufacture']
print(df)

   Product ID   Description   ...     Is taxed?  Revenue
0           1  3 inch screw   ...           Yes     0.25
1           2   2 inch nail   ...           Yes     0.15
2           3        hammer   ...           Yes     2.50
3           4   screwdriver   ...           Yes     0.50

[4 rows x 7 columns]
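
The same kind of derived column can also be created with `DataFrame.assign`, which returns a new DataFrame instead of modifying the existing one. This is only a sketch of an alternative, not the method used in this notebook; the column name `Margin` and the variable `with_margin` are assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Cost to Manufacture': [0.5, 0.10, 3.00, 2.50],
    'Price': [0.75, 0.25, 5.50, 3.00],
})

# assign() leaves df untouched and returns a copy with the extra column.
# 'Margin' is a hypothetical name; keyword arguments must be valid Python identifiers.
with_margin = df.assign(Margin=df['Price'] - df['Cost to Manufacture'])

print(list(df.columns))
print(list(with_margin.columns))
```

Because the original frame is not changed, `assign` is convenient when chaining several steps together.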


## 2. Performing column operations

In [6]:
df = pd.DataFrame([
  ['JOHN SMITH', 'john.smith@gmail.com'],
  ['Jane Doe', 'jdoe@yahoo.com'],
  ['joe schmo', 'joeschmo@hotmail.com']
],
  columns=['Name', 'Email']
)
print(df)

         Name                 Email
0  JOHN SMITH  john.smith@gmail.com
1    Jane Doe        jdoe@yahoo.com
2   joe schmo  joeschmo@hotmail.com


#### By applying a lambda to a column

In [7]:
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])
print(df)

         Name                 Email Email Provider
0  JOHN SMITH  john.smith@gmail.com      gmail.com
1    Jane Doe        jdoe@yahoo.com      yahoo.com
2   joe schmo  joeschmo@hotmail.com    hotmail.com
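
For simple string operations like this one, the `.str` accessor is a possible alternative to `apply` with a lambda. The sketch below rebuilds only the `Email` column and compares both approaches; `via_apply` and `via_str` are names invented for the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'Email': ['john.smith@gmail.com', 'jdoe@yahoo.com', 'joeschmo@hotmail.com']
})

# Lambda version, as used in the cell above.
via_apply = df['Email'].apply(lambda x: x.split('@')[-1])

# .str version: split every value on '@', then take the last piece of each split.
via_str = df['Email'].str.split('@').str[-1]

print(via_str.tolist())
print(via_apply.tolist() == via_str.tolist())
```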


#### By applying a lambda to each row

In [8]:
df['Message'] = df.apply(
    lambda row: row.Name + ' uses gmail'
            if row['Email Provider'] == 'gmail.com'
            else row.Name + ' uses ' + row['Email Provider'],
    axis=1
)
print(df)

         Name             ...                                 Message
0  JOHN SMITH             ...                   JOHN SMITH uses gmail
1    Jane Doe             ...                 Jane Doe uses yahoo.com
2   joe schmo             ...              joe schmo uses hotmail.com

[3 rows x 4 columns]
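
Row-wise `apply` calls the lambda once per row. When the logic is a simple either/or choice, one possible alternative is to build both candidate columns and pick between them with `numpy.where`. This is a sketch under the assumption that the frame has the `Name` and `Email Provider` columns shown above; `is_gmail` is a helper name introduced here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['JOHN SMITH', 'Jane Doe', 'joe schmo'],
    'Email Provider': ['gmail.com', 'yahoo.com', 'hotmail.com'],
})

# Boolean mask: True where the provider is gmail.com.
is_gmail = df['Email Provider'] == 'gmail.com'

# Where the mask is True take the first expression, otherwise the second.
df['Message'] = np.where(
    is_gmail,
    df['Name'] + ' uses gmail',
    df['Name'] + ' uses ' + df['Email Provider'],
)

print(df['Message'].tolist())
```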


## 3. Renaming columns

#### Using **.columns**

In [9]:
df.columns = ['Full Name', 'Email Address', 'Email Provider', 'Message']
df.info()  # info() prints its report itself and returns None, so no print() is needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
Full Name         3 non-null object
Email Address     3 non-null object
Email Provider    3 non-null object
Message           3 non-null object
dtypes: object(4)
memory usage: 176.0+ bytes
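
Assigning to `.columns` replaces every column name at once, so the new list needs one name per existing column. A small sketch of what happens otherwise, using a two-column frame built just for this example (`accepted` is an illustrative flag):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jane Doe'], 'Email': ['jdoe@yahoo.com']})

# Only one name is supplied for two columns, so pandas raises a ValueError.
try:
    df.columns = ['Full Name']
    accepted = True
except ValueError:
    accepted = False

print(accepted)
print(list(df.columns))
```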


#### Using **.rename**

In [10]:
df.rename(columns={
    'Full Name': 'Full_Name',
    'Email Address': 'Email_Address',
    'Email Provider': 'Email_Provider'
}, inplace=True)
df.info()  # info() prints its report itself and returns None, so no print() is needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
Full_Name         3 non-null object
Email_Address     3 non-null object
Email_Provider    3 non-null object
Message           3 non-null object
dtypes: object(4)
memory usage: 176.0+ bytes
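
`rename` also accepts a function in place of a dictionary, and without `inplace=True` it returns a new DataFrame rather than changing the existing one. The sketch below uses an empty frame with the column names from the `.columns` step above; `renamed` is a variable name chosen for the example.

```python
import pandas as pd

# Empty frame: only the column labels matter for this example.
df = pd.DataFrame(columns=['Full Name', 'Email Address', 'Email Provider', 'Message'])

# The function is called once per column label; here it swaps spaces for underscores.
renamed = df.rename(columns=lambda name: name.replace(' ', '_'))

print(list(renamed.columns))
print(list(df.columns))  # the original frame keeps its old names
```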


### That is all for now. Hope it helped you!