# Data Manipulation in Pandas

This notebook covers three core data manipulation operations in Pandas: sorting, filtering, and transforming data.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np

print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

## Sample Data

Let's create a sample DataFrame for demonstrating data manipulation operations.

In [None]:
# Create sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
    'Salary': [50000, 60000, 70000, 55000, 65000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR']
}

df = pd.DataFrame(data)
print("Sample DataFrame:")
print(df)

## Sorting Data

Pandas provides `sort_values()` to sort by column values and `sort_index()` to sort by index labels. Both return a new DataFrame by default and leave the original unchanged.

In [None]:
# Sort by single column
print("Sort by Age (ascending):")
print(df.sort_values('Age'))

# Sort by single column (descending)
print("\nSort by Salary (descending):")
print(df.sort_values('Salary', ascending=False))

# Sort by multiple columns
print("\nSort by Department then by Age:")
print(df.sort_values(['Department', 'Age']))

# Sort by index
df_indexed = df.set_index('Name')
print("\nDataFrame with Name as index:")
print(df_indexed)

print("\nSort by index:")
print(df_indexed.sort_index())
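
When sorting by several columns, each column can have its own direction. The sketch below rebuilds a trimmed copy of the sample data (the variable names `by_dept_salary` and `ranked` are illustrative, not part of the notebook) and passes a list to `ascending`, one entry per sort column.

```python
import pandas as pd

# Trimmed copy of the notebook's sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Salary': [50000, 60000, 70000, 55000, 65000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

# Department A-Z, then highest salary first within each department
by_dept_salary = df.sort_values(['Department', 'Salary'], ascending=[True, False])

# Sorting keeps the original index labels; reset_index(drop=True)
# renumbers the rows 0..n-1 in their new order
ranked = by_dept_salary.reset_index(drop=True)
print(ranked)
```

Because `sort_values()` returns a new object, `df` keeps its original row order here; reassign the result if you want the sorted version to persist.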

## Filtering Data

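Filtering selects the rows that satisfy a condition, most often by indexing with a boolean mask. Combine conditions with `&` (and) and `|` (or), wrapping each condition in parentheses.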

In [None]:
# Filter by single condition
print("Employees older than 30:")
print(df[df['Age'] > 30])

# Filter by multiple conditions
print("\nIT employees with salary > 55000:")
print(df[(df['Department'] == 'IT') & (df['Salary'] > 55000)])

# Filter using isin()
print("\nEmployees in HR or Finance:")
print(df[df['Department'].isin(['HR', 'Finance'])])

# Filter using string methods
print("\nNames starting with 'A' or 'B':")
print(df[df['Name'].str.startswith(('A', 'B'))])

# Filter using query() method
print("\nUsing query method - Age > 28 and Department == 'IT':")
print(df.query('Age > 28 and Department == "IT"'))
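
A few more filtering patterns may be useful alongside the ones above: range checks with `between()`, negating a mask with `~`, selecting rows and columns together with `.loc`, and referencing a Python variable inside `query()` with `@`. This is a minimal sketch on a trimmed copy of the sample data; the names `mid_age`, `not_it`, `high_paid`, `min_age`, and `older_it` are illustrative.

```python
import pandas as pd

# Trimmed copy of the notebook's sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Salary': [50000, 60000, 70000, 55000, 65000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

# between() includes both endpoints by default
mid_age = df[df['Age'].between(28, 32)]

# ~ inverts a boolean mask: everyone who is NOT in IT
not_it = df[~(df['Department'] == 'IT')]

# .loc filters rows and picks columns in one step
high_paid = df.loc[df['Salary'] >= 60000, ['Name', 'Salary']]

# Inside query(), @ refers to a variable from the surrounding Python scope
min_age = 28
older_it = df.query('Age > @min_age and Department == "IT"')

print(mid_age, not_it, high_paid, older_it, sep="\n\n")
```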

## Transforming Data

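Data transformation modifies values using methods such as `apply()` and `map()` on a Series, and the element-wise `DataFrame.map()` on a whole DataFrame (called `applymap()` before pandas 2.1, where the old name was deprecated).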

In [None]:
# Apply function to a column
print("Convert names to uppercase:")
df_transformed = df.copy()
df_transformed['Name'] = df_transformed['Name'].apply(str.upper)
print(df_transformed)

# Apply a lambda to a single numeric column
print("\nAdd 1000 bonus to salaries:")
df_transformed['Salary'] = df_transformed['Salary'].apply(lambda x: x + 1000)
print(df_transformed[['Name', 'Salary']])

# Map values
print("\nMap departments to codes:")
dept_mapping = {'HR': 'H', 'IT': 'I', 'Finance': 'F'}
df_transformed['Dept_Code'] = df_transformed['Department'].map(dept_mapping)
print(df_transformed[['Name', 'Department', 'Dept_Code']])

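# Apply a function to every element of the DataFrame
# DataFrame.map replaces applymap, which is deprecated since pandas 2.1
print("\nApply function to entire DataFrame (string columns to lowercase):")
elementwise = df.map if hasattr(df, 'map') else df.applymap
df_lower = elementwise(lambda x: x.lower() if isinstance(x, str) else x)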
print(df_lower)
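
For simple transformations, vectorized operations are usually a more idiomatic alternative to `apply()`. The sketch below repeats the uppercase and bonus steps that way, and also shows what `Series.map()` does with a value missing from the mapping dictionary. It assumes a trimmed copy of the sample data; `out` and the `Dept_Code_Filled` column are illustrative names, and `'Finance'` is left out of the mapping on purpose.

```python
import pandas as pd

# Trimmed copy of the notebook's sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Salary': [50000, 60000, 70000, 55000, 65000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

out = df.copy()

# The .str accessor uppercases the whole column without apply()
out['Name'] = out['Name'].str.upper()

# Arithmetic on a Series is applied element by element
out['Salary'] = out['Salary'] + 1000

# 'Finance' is deliberately omitted: map() yields NaN for unmapped values
dept_mapping = {'HR': 'H', 'IT': 'I'}
out['Dept_Code'] = out['Department'].map(dept_mapping)

# fillna() supplies a placeholder for the unmapped rows
out['Dept_Code_Filled'] = out['Dept_Code'].fillna('?')

print(out)
```

Checking for missing values after `map()` is a cheap way to catch categories that the mapping dictionary does not cover.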

## Summary

You have learned key data manipulation operations in Pandas:

- **Sorting**: Using `sort_values()` and `sort_index()` to order data
- **Filtering**: Selecting rows based on conditions using boolean indexing and `query()`
- **Transforming**: Modifying data using `apply()`, `map()`, and `applymap()` (renamed `DataFrame.map()` in pandas 2.1)

These operations allow you to sort, filter, and transform your data for analysis.