# Data Manipulation in Pandas

This notebook covers common data manipulation operations in Pandas: sorting, filtering, and transforming data.

In [2]:
# Import necessary libraries
import pandas as pd
import numpy as np

print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

Pandas version: 2.2.3
NumPy version: 2.2.4


## Sample Data

Let's create a sample DataFrame for demonstrating data manipulation operations.

In [3]:
# Create sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
    'Salary': [50000, 60000, 70000, 55000, 65000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR']
}

df = pd.DataFrame(data)
print("Sample DataFrame:")
print(df)

Sample DataFrame:
      Name  Age      City  Salary Department
0    Alice   25  New York   50000         HR
1      Bob   30    London   60000         IT
2  Charlie   35     Paris   70000    Finance
3    Diana   28     Tokyo   55000         IT
4      Eve   32    Sydney   65000         HR


## Sorting Data

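Pandas provides two sorting methods: `sort_values()` orders rows by the values in one or more columns, and `sort_index()` orders them by index labels. Both return a new DataFrame and leave the original unchanged.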

In [4]:
# Sort by single column
print("Sort by Age (ascending):")
print(df.sort_values('Age'))

# Sort by single column (descending)
print("\nSort by Salary (descending):")
print(df.sort_values('Salary', ascending=False))

# Sort by multiple columns
print("\nSort by Department then by Age:")
print(df.sort_values(['Department', 'Age']))

# Sort by index
df_indexed = df.set_index('Name')
print("\nDataFrame with Name as index:")
print(df_indexed)

print("\nSort by index:")
print(df_indexed.sort_index())

Sort by Age (ascending):
      Name  Age      City  Salary Department
0    Alice   25  New York   50000         HR
3    Diana   28     Tokyo   55000         IT
1      Bob   30    London   60000         IT
4      Eve   32    Sydney   65000         HR
2  Charlie   35     Paris   70000    Finance

Sort by Salary (descending):
      Name  Age      City  Salary Department
2  Charlie   35     Paris   70000    Finance
4      Eve   32    Sydney   65000         HR
1      Bob   30    London   60000         IT
3    Diana   28     Tokyo   55000         IT
0    Alice   25  New York   50000         HR

Sort by Department then by Age:
      Name  Age      City  Salary Department
2  Charlie   35     Paris   70000    Finance
0    Alice   25  New York   50000         HR
4      Eve   32    Sydney   65000         HR
3    Diana   28     Tokyo   55000         IT
1      Bob   30    London   60000         IT

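DataFrame with Name as index:
         Age      City  Salary Department
Name                                     
Alice     25  New York   50000         HR
Bob       30    London   60000         IT
Charlie   35     Paris   70000    Finance
Diana     28     Tokyo   55000         IT
Eve       32    Sydney   65000         HR

Sort by index:
         Age      City  Salary Department
Name                                     
Alice     25  New York   50000         HR
Bob       30    London   60000         IT
Charlie   35     Paris   70000    Finance
Diana     28     Tokyo   55000         IT
Eve       32    Sydney   65000         HR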
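
Beyond the calls in the cell above, `sort_values()` also accepts a per-column `ascending` list, a `key` function, and `ignore_index`. The block below is a minimal sketch of those options; the `staff` frame is a hypothetical stand-in for the sample data, re-created here so the block runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the sample DataFrame, so this block is self-contained
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Salary': [50000, 60000, 70000, 55000, 65000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

# Department ascending, then Salary descending within each department
mixed = staff.sort_values(['Department', 'Salary'], ascending=[True, False])

# key= receives the whole column as a Series; here rows are ordered by name length.
# kind='stable' keeps tied rows in their original relative order.
by_len = staff.sort_values('Name', key=lambda s: s.str.len(), kind='stable')

# ignore_index=True renumbers the result 0..n-1 instead of keeping the old labels
top = staff.sort_values('Salary', ascending=False, ignore_index=True)

print(mixed)
print(by_len)
print(top.head(2))
```

A stable sort is requested for `by_len` because several names share the same length, and the default algorithm does not promise any particular order for ties.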

## Filtering Data

Filtering selects the rows that satisfy one or more conditions, either with a boolean mask (`df[condition]`) or with the `query()` method.

In [5]:
# Filter by single condition
print("Employees older than 30:")
print(df[df['Age'] > 30])

# Filter by multiple conditions
print("\nIT employees with salary > 55000:")
print(df[(df['Department'] == 'IT') & (df['Salary'] > 55000)])

# Filter using isin()
print("\nEmployees in HR or Finance:")
print(df[df['Department'].isin(['HR', 'Finance'])])

# Filter using string methods
print("\nNames starting with 'A' or 'B':")
print(df[df['Name'].str.startswith(('A', 'B'))])

# Filter using query() method
print("\nUsing query method - Age > 28 and Department == 'IT':")
print(df.query('Age > 28 and Department == "IT"'))

Employees older than 30:
      Name  Age    City  Salary Department
2  Charlie   35   Paris   70000    Finance
4      Eve   32  Sydney   65000         HR

IT employees with salary > 55000:
  Name  Age    City  Salary Department
1  Bob   30  London   60000         IT

Employees in HR or Finance:
      Name  Age      City  Salary Department
0    Alice   25  New York   50000         HR
2  Charlie   35     Paris   70000    Finance
4      Eve   32    Sydney   65000         HR

Names starting with 'A' or 'B':
    Name  Age      City  Salary Department
0  Alice   25  New York   50000         HR
1    Bob   30    London   60000         IT

Using query method - Age > 28 and Department == 'IT':
  Name  Age    City  Salary Department
1  Bob   30  London   60000         IT
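
The filters above combine conditions with `&`. As a rough sketch of a few related patterns (OR, NOT, range checks, `.loc`, and local variables inside `query()`), the block below uses a hypothetical `staff` frame that mirrors the sample data; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the sample DataFrame, so this block is self-contained
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Salary': [50000, 60000, 70000, 55000, 65000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

# OR uses |, NOT uses ~; each comparison needs its own parentheses
young_or_finance = staff[(staff['Age'] < 28) | (staff['Department'] == 'Finance')]
not_it = staff[~(staff['Department'] == 'IT')]

# between() includes both endpoints by default
mid_salary = staff[staff['Salary'].between(55000, 65000)]

# .loc filters rows and selects columns in a single step
older_names = staff.loc[staff['Age'] > 30, ['Name', 'Age']]

# query() can read local Python variables when they are prefixed with @
min_age = 28
it_over_min = staff.query('Age > @min_age and Department == "IT"')

print(young_or_finance)
print(mid_salary)
print(it_over_min)
```

The parentheses matter because `&`, `|`, and `~` bind more tightly than comparison operators in Python, so `df['Age'] > 30 & df['Salary'] > 1` does not mean what it appears to.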


## Transforming Data

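Data transformation modifies values with functions such as `Series.apply()`, `Series.map()`, and `DataFrame.map()`. `DataFrame.map()` was added in pandas 2.1 as the replacement for `applymap()`, which is now deprecated.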

In [7]:
# Apply function to a column
print("Convert names to uppercase:")
df_transformed = df.copy()
df_transformed['Name'] = df_transformed['Name'].apply(str.upper)
print(df_transformed)

# Apply a lambda function to a numeric column
print("\nAdd 1000 bonus to salaries:")
df_transformed['Salary'] = df_transformed['Salary'].apply(lambda x: x + 1000)
print(df_transformed[['Name', 'Salary']])

# Map values
print("\nMap departments to codes:")
dept_mapping = {'HR': 'H', 'IT': 'I', 'Finance': 'F'}
df_transformed['Dept_Code'] = df_transformed['Department'].map(dept_mapping)
print(df_transformed[['Name', 'Department', 'Dept_Code']])

# Apply a function element-wise to the entire DataFrame (DataFrame.map requires pandas >= 2.1)
print("\nApply function to entire DataFrame (string columns to lowercase):")
df_lower = df.map(lambda x: x.lower() if isinstance(x, str) else x)
print(df_lower)

Convert names to uppercase:
      Name  Age      City  Salary Department
0    ALICE   25  New York   50000         HR
1      BOB   30    London   60000         IT
2  CHARLIE   35     Paris   70000    Finance
3    DIANA   28     Tokyo   55000         IT
4      EVE   32    Sydney   65000         HR

Add 1000 bonus to salaries:
      Name  Salary
0    ALICE   51000
1      BOB   61000
2  CHARLIE   71000
3    DIANA   56000
4      EVE   66000

Map departments to codes:
      Name Department Dept_Code
0    ALICE         HR         H
1      BOB         IT         I
2  CHARLIE    Finance         F
3    DIANA         IT         I
4      EVE         HR         H

Apply function to entire DataFrame (string columns to lowercase):
      Name  Age      City  Salary Department
0    alice   25  new york   50000         hr
1      bob   30    london   60000         it
2  charlie   35     paris   70000    finance
3    diana   28     tokyo   55000         it
4      eve   32    sydney   65000         hr
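
For simple cases like the ones above, vectorized operations usually do the same job as `apply()` without calling a Python function per element. The sketch below shows those equivalents and what `map()` does with a value that has no entry in the dictionary. The `staff` frame and the `'Legal'` department are made up for illustration and are not part of the sample data.

```python
import pandas as pd

# Hypothetical data; 'Legal' is deliberately absent from the mapping below
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'IT', 'Legal'],
})

dept_mapping = {'HR': 'H', 'IT': 'I', 'Finance': 'F'}

# assign() returns a new DataFrame and leaves `staff` untouched
result = staff.assign(
    Name=staff['Name'].str.upper(),   # vectorized equivalent of apply(str.upper)
    Salary=staff['Salary'] + 1000,    # vectorized equivalent of apply(lambda x: x + 1000)
    # map() yields NaN for values missing from the dict; fillna() supplies a fallback
    Dept_Code=staff['Department'].map(dept_mapping).fillna('?'),
)

print(result)
```

Without the `fillna('?')` call, the unmapped row would hold `NaN` in `Dept_Code`, which is easy to miss in a large frame.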


## Summary

You have learned key data manipulation operations in Pandas:

- **Sorting**: Using `sort_values()` and `sort_index()` to order data
- **Filtering**: Selecting rows based on conditions using boolean indexing and `query()`
- **Transforming**: Modifying data using `Series.apply()`, `Series.map()`, and `DataFrame.map()` (formerly `applymap()`)

These operations allow you to order, filter, and transform your data for analysis.
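
Because each of these methods returns a new DataFrame, they can be chained into a single expression. The following is one possible sketch of that style, not the only way to structure it; the `staff` frame repeats the sample data and `report` is an illustrative name.

```python
import pandas as pd

# Same values as the sample DataFrame, re-created so this block is self-contained
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Salary': [50000, 60000, 70000, 55000, 65000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

report = (
    staff
    .query('Salary >= 55000')                             # filter rows
    .assign(Dept_Code=lambda d: d['Department'].str[0])   # transform: first letter of department
    .sort_values('Salary', ascending=False)               # sort by salary, highest first
    .reset_index(drop=True)                               # renumber rows 0..n-1
)

print(report[['Name', 'Salary', 'Dept_Code']])
```

Passing a lambda to `assign()` makes the new column depend on the already filtered frame rather than on the original `staff`.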