Remove Rows and Columns

Sample Data (Pandas DataFrame)


In [1]:
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}

df = pd.DataFrame(data)
print(df)

   ID     Name  Age  Salary Department
0   1    Alice   25   50000         HR
1   2      Bob   30   60000         IT
2   3  Charlie   35   70000         IT
3   4    David   40   80000    Finance
4   5      Eve   45   90000         HR


1. Remove Rows Where Age is Greater Than 35
   
❇️ Understanding df[df['Age'] <= 35]
❇️ df['Age'] <= 35 creates a Boolean condition:
❇️ If the age is less than or equal to 35, it returns True.
❇️ If the age is greater than 35, it returns False.

In [2]:
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}

# df = pd.DataFrame(data)
# print(df)
df_filtered = df[df['Age'] <= 35]
print(df_filtered)

   ID     Name  Age  Salary Department
0   1    Alice   25   50000         HR
1   2      Bob   30   60000         IT
2   3  Charlie   35   70000         IT


Why This Works:

❇️ The condition df['Age'] <= 35 acts as a filter.
❇️ The DataFrame keeps only the rows where the condition is True.
❇️ This is called Boolean Indexing in Pandas.
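To make the filter concrete, the sketch below builds the Boolean mask as a separate step and inspects it before using it. The names `sample`, `mask`, `kept`, and `dropped` are illustrative and do not appear in the notebook above.

```python
import pandas as pd

# Small illustrative frame with the same Age values as the sample data
sample = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
})

# The comparison returns a Boolean Series, one value per row
mask = sample['Age'] <= 35
print(mask.tolist())  # [True, True, True, False, False]

# Indexing with the mask keeps only the rows where it is True
kept = sample[mask]

# The same rows can be removed explicitly by dropping the index labels
# of the rows where Age is greater than 35
dropped = sample.drop(index=sample[sample['Age'] > 35].index)

print(kept.equals(dropped))  # True
```

Both forms return a new DataFrame and leave `sample` untouched. Boolean indexing is usually the shorter of the two.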

2. Remove a Column (e.g., Salary)

❇️ Understanding df.drop(columns=['Salary'])
❇️ df.drop(columns=['Salary']) tells Pandas to remove the column named Salary.
❇️ The columns= argument specifies that we're dropping a column (not a row).
❇️ drop() returns a new DataFrame and leaves the original unchanged, so the result must be assigned to a variable to be kept.

In [3]:
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}

# df = pd.DataFrame(data)
# print(df)

df_dropped_col = df.drop(columns=['Salary'])
print(df_dropped_col)

   ID     Name  Age Department
0   1    Alice   25         HR
1   2      Bob   30         IT
2   3  Charlie   35         IT
3   4    David   40    Finance
4   5      Eve   45         HR


Why This Works:

❇️ drop(columns=['Salary']) tells Pandas which column to remove.
❇️ It returns a new DataFrame and leaves the original untouched; to update df itself, assign the result back (df = df.drop(columns=['Salary'])).
❇️ This is useful when we want to remove unnecessary data before analysis.
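The short sketch below checks that `drop()` leaves the original frame alone, and shows two variations: dropping several columns at once and ignoring labels that are not present. The frame `frame` and the column name `Bonus` are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Salary': [50000, 60000],
})

# drop() returns a new DataFrame; `frame` still has the Salary column
without_salary = frame.drop(columns=['Salary'])
print(list(frame.columns))           # ['Name', 'Age', 'Salary']
print(list(without_salary.columns))  # ['Name', 'Age']

# Several columns can be dropped in one call
names_only = frame.drop(columns=['Age', 'Salary'])
print(list(names_only.columns))      # ['Name']

# errors='ignore' skips labels that do not exist instead of raising KeyError
# ('Bonus' is a hypothetical column that is not in the frame)
safe = frame.drop(columns=['Bonus'], errors='ignore')
print(list(safe.columns))            # ['Name', 'Age', 'Salary']
```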

3. Add a New Column Based on a Condition (e.g., Senior or Junior)

Step-by-Step Breakdown:

❇️ Understanding df['Age'].apply(lambda x: 'Senior' if x > 30 else 'Junior')
❇️ The .apply() function applies a transformation to each value in the Age column.
❇️ The lambda x: function checks:
❇️ If x (Age) is greater than 30, assign 'Senior'.
❇️ Otherwise, assign 'Junior'.

In [4]:
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}

# df = pd.DataFrame(data)
# print(df)

df['Seniority'] = df['Age'].apply(lambda x: 'Senior' if x > 30 else 'Junior')
print(df)

   ID     Name  Age  Salary Department Seniority
0   1    Alice   25   50000         HR    Junior
1   2      Bob   30   60000         IT    Junior
2   3  Charlie   35   70000         IT    Senior
3   4    David   40   80000    Finance    Senior
4   5      Eve   45   90000         HR    Senior


Why This Works

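❇️ .apply() calls the function once for each value in the column and returns a new Series with the results; the Age column itself is not modified.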
❇️ lambda x: provides a quick inline condition without needing a separate function.
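For a simple two-way condition like this one, a vectorized alternative is `numpy.where`, which evaluates the condition on the whole column at once. This is a sketch of an alternative, not the approach used in the notebook; the names `ages`, `with_apply`, and `with_where` are illustrative.

```python
import numpy as np
import pandas as pd

ages = pd.DataFrame({'Age': [25, 30, 35, 40, 45]})

# Element-by-element version, as in the cell above
with_apply = ages['Age'].apply(lambda x: 'Senior' if x > 30 else 'Junior')

# Vectorized version: pick 'Senior' where the condition holds, else 'Junior'
with_where = np.where(ages['Age'] > 30, 'Senior', 'Junior')

print(with_apply.tolist())
# ['Junior', 'Junior', 'Senior', 'Senior', 'Senior']
print(with_where.tolist() == with_apply.tolist())  # True
```

`np.where` returns a NumPy array, which can be assigned to a new column in the same way as the Series returned by `.apply()`.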

4. Map Values in a Column (e.g., Rename Departments)

❇️ department_mapping = {...}
❇️ Defines a dictionary to map old values to new values.
❇️ df['Department'] selects the Department column.
❇️ .map(department_mapping)
❇️ Looks up each value in the dictionary.
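❇️ If the value exists as a key in department_mapping, it is replaced by the corresponding dictionary value.
❇️ Otherwise, the result is NaN: .map() does not keep unmapped values. Use .replace(department_mapping) if unmapped values should stay as they are.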

In [5]:
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}

# df = pd.DataFrame(data)
# print(df)

department_mapping = {'HR': 'Human Resources', 'IT': 'Information Tech', 'Finance': 'Accounting'}
df['Department'] = df['Department'].map(department_mapping)
print(df)

   ID     Name  Age  Salary        Department Seniority
0   1    Alice   25   50000   Human Resources    Junior
1   2      Bob   30   60000  Information Tech    Junior
2   3  Charlie   35   70000  Information Tech    Senior
3   4    David   40   80000        Accounting    Senior
4   5      Eve   45   90000   Human Resources    Senior


Why This Works

❇️ .map() is well suited to replacing values in a column from a dictionary when the dictionary covers every value in the column.
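The sketch below shows what happens when the dictionary does not cover every value. The Series `departments`, the value `'Legal'`, and the dictionary `partial_mapping` are made up for illustration.

```python
import pandas as pd

departments = pd.Series(['HR', 'IT', 'Legal'])
# Deliberately incomplete: there is no key for 'Legal'
partial_mapping = {'HR': 'Human Resources', 'IT': 'Information Tech'}

# .map() turns values without a matching key into NaN
mapped = departments.map(partial_mapping)
print(mapped.isna().tolist())  # [False, False, True]

# .replace() leaves values without a matching key unchanged
replaced = departments.replace(partial_mapping)
print(replaced.tolist())  # ['Human Resources', 'Information Tech', 'Legal']

# Alternatively, fill the gaps left by .map() with the original values
filled = departments.map(partial_mapping).fillna(departments)
print(filled.tolist())    # ['Human Resources', 'Information Tech', 'Legal']
```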

5. Change Column Names

Step-by-Step Breakdown:

❇️ .rename(columns={...})
❇️ The columns={} argument specifies which columns to rename.
❇️ Dictionary keys are the old names ('ID', 'Name').
❇️ Dictionary values are the new names ('Employee_ID', 'Employee_Name').
❇️ The new DataFrame is stored in df_renamed.

In [6]:
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}

# df = pd.DataFrame(data)
# print(df)

df_renamed = df.rename(columns={'ID': 'Employee_ID', 'Name': 'Employee_Name'})
print(df_renamed)

   Employee_ID Employee_Name  Age  Salary        Department Seniority
0            1         Alice   25   50000   Human Resources    Junior
1            2           Bob   30   60000  Information Tech    Junior
2            3       Charlie   35   70000  Information Tech    Senior
3            4         David   40   80000        Accounting    Senior
4            5           Eve   45   90000   Human Resources    Senior


Why This Works

❇️ .rename() is a built-in Pandas method designed for changing column names.
❇️ It allows renaming multiple columns at once.
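Like `drop()`, `rename()` returns a new DataFrame. The sketch below confirms that, and shows two behaviours worth knowing: keys that match no column are ignored by default, and a function can be passed instead of a dictionary. The frame `frame` and the key `'Missing'` are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})

# rename() returns a new DataFrame; `frame` keeps its original column names
renamed = frame.rename(columns={'ID': 'Employee_ID', 'Name': 'Employee_Name'})
print(list(frame.columns))    # ['ID', 'Name']
print(list(renamed.columns))  # ['Employee_ID', 'Employee_Name']

# Keys that match no existing column are ignored by default
unchanged = frame.rename(columns={'Missing': 'Whatever'})
print(list(unchanged.columns))  # ['ID', 'Name']

# A function is applied to every column name
lowered = frame.rename(columns=str.lower)
print(list(lowered.columns))  # ['id', 'name']
```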

6. Create a Tax Column (Assuming a 20% Tax Rate)

❇️ We'll calculate an estimated tax amount based on salary.

Step-by-Step Breakdown:

❇️ df['Salary'] selects the Salary column.
❇️ * 0.20 multiplies each salary by 0.20 (20%).
❇️ The result is stored in a new column "Tax".

In [7]:
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}

# df = pd.DataFrame(data)
# print(df)

df['Tax'] = df['Salary'] * 0.20
# Print df, not df_renamed: the Tax column was added to df
print(df)

   ID     Name  Age  Salary        Department Seniority      Tax
0   1    Alice   25   50000   Human Resources    Junior  10000.0
1   2      Bob   30   60000  Information Tech    Junior  12000.0
2   3  Charlie   35   70000  Information Tech    Senior  14000.0
3   4    David   40   80000        Accounting    Senior  16000.0
4   5      Eve   45   90000   Human Resources    Senior  18000.0


Why This Works

❇️ Pandas supports vectorized operations, meaning df['Salary'] * 0.20 automatically applies the calculation to each row.
❇️ This is typically faster than .apply(), which calls a Python function once per value.
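As a small check, the sketch below computes the tax both ways and confirms the results match, then derives a net salary from the new column. The frame `pay`, the constant `TAX_RATE`, and the columns `Tax_apply` and `Net` are illustrative names, not part of the notebook above.

```python
import pandas as pd

TAX_RATE = 0.20  # assumed flat rate, as in the section heading

pay = pd.DataFrame({'Salary': [50000, 60000, 70000]})

# Vectorized: the multiplication is applied to the whole column at once
pay['Tax'] = pay['Salary'] * TAX_RATE

# Element-by-element equivalent, shown only for comparison
pay['Tax_apply'] = pay['Salary'].apply(lambda s: s * TAX_RATE)

# Columns can be combined with each other in the same vectorized way
pay['Net'] = pay['Salary'] - pay['Tax']

print(pay['Tax'].tolist())  # [10000.0, 12000.0, 14000.0]
print(pay['Tax'].tolist() == pay['Tax_apply'].tolist())  # True
print(pay['Net'].tolist())  # [40000.0, 48000.0, 56000.0]
```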