# Data Transformation and Manipulation in Pandas

Data transformation and manipulation are essential steps in preparing data for analysis.  
In pandas, you can reshape, modify, and compute new values from your datasets with a few method calls.

Common transformations include:
- **Renaming columns**: To make them more descriptive or consistent.
- **Sorting data**: By one or more columns, in ascending or descending order.
- **Creating new columns**: Based on existing data.
- **Applying functions**: With `apply()` or `map()` to transform values.
- **Replacing values**: Using `replace()` for corrections or standardization.
- **Changing data types**: With `astype()` to ensure correct formats.
- **Dropping columns or rows**: To remove unnecessary data.

Together, these operations turn raw data into a clean, consistent table that is easier to analyze.
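Most of the operations listed above return a new object instead of modifying the original, so they can also be chained into one expression. The sketch below assumes the same sample dataset used in this notebook. It uses `assign()` (not covered in the cells below) to add a column inside the chain, and it leaves `df` itself unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Hayley", "Taylor", "Claire", "Aurora", "Evangeline"],
    "Age": [25, 30, 35, 40, 45],
    "Salary": [50000, 60000, 70000, 80000, 90000],
    "Sector": ["Music", "Music", "IT", "Finance", "HR"],
})

# Each step returns a new DataFrame, so the calls can be chained.
result = (
    df.rename(columns={"Salary": "Annual Salary"})
      # assign() with a lambda sees the DataFrame produced by the previous step
      .assign(**{"Salary (K)": lambda d: d["Annual Salary"] / 1000})
      # nested dict: {column: {old_value: new_value}}
      .replace({"Sector": {"HR": "Human Resources", "IT": "Information Technology"}})
      .sort_values(by="Age", ascending=False)
      .drop(columns=["Annual Salary"])
)
print(result)
```

Chaining keeps intermediate variables out of the namespace; splitting the steps into separate cells, as done below, is easier to inspect one step at a time.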


```python
import pandas as pd
```

```python
# Sample dataset
data = {
    "Name": ["Hayley", "Taylor", "Claire", "Aurora", "Evangeline"],
    "Age": [25, 30, 35, 40, 45],
    "Salary": [50000, 60000, 70000, 80000, 90000],
    "Sector": ["Music", "Music", "IT", "Finance", "HR"]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
```

```
Original DataFrame:
          Name  Age  Salary   Sector
0      Hayley   25   50000    Music
1      Taylor   30   60000    Music
2      Claire   35   70000       IT
3      Aurora   40   80000  Finance
4  Evangeline   45   90000       HR
```


```python
# 1. Rename columns
df_renamed = df.rename(columns={"Salary": "Annual Salary"})
print("\nRenamed Columns:\n", df_renamed)
```

```
Renamed Columns:
          Name  Age  Annual Salary   Sector
0      Hayley   25          50000    Music
1      Taylor   30          60000    Music
2      Claire   35          70000       IT
3      Aurora   40          80000  Finance
4  Evangeline   45          90000       HR
```
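Besides a dictionary, `rename()` also accepts a function, which it applies to every column label. This is handy for making names consistent in one step. The sketch below assumes a snake_case convention (lower case, underscores instead of spaces) purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Hayley", "Taylor"],
    "Annual Salary": [50000, 60000],
})

# The callable receives each column label and returns the new label.
# Here: trim whitespace, lower-case, and replace spaces with underscores.
df_snake = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
print(df_snake.columns.tolist())  # ['name', 'annual_salary']
```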


```python
# 2. Sort by Age (descending)
df_sorted = df.sort_values(by="Age", ascending=False)
print("\nSorted by Age (Descending):\n", df_sorted)
```

```
Sorted by Age (Descending):
          Name  Age  Salary   Sector
4  Evangeline   45   90000       HR
3      Aurora   40   80000  Finance
2      Claire   35   70000       IT
1      Taylor   30   60000    Music
0      Hayley   25   50000    Music
```
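`sort_values()` can also sort by several columns, each with its own direction, by passing lists to `by` and `ascending`. Note that sorting keeps the original index labels (visible in the output above); `ignore_index=True` (available in pandas 1.0 and later) renumbers the rows instead. A minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Hayley", "Taylor", "Claire", "Aurora", "Evangeline"],
    "Age": [25, 30, 35, 40, 45],
    "Salary": [50000, 60000, 70000, 80000, 90000],
    "Sector": ["Music", "Music", "IT", "Finance", "HR"],
})

# Sector A-Z first; within the same sector, Salary from high to low.
# ignore_index=True resets the row labels to 0..n-1 after sorting.
df_multi = df.sort_values(
    by=["Sector", "Salary"],
    ascending=[True, False],
    ignore_index=True,
)
print(df_multi)
```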


```python
# 3. Create a new column (Salary in thousands)
df["Salary (K)"] = df["Salary"] / 1000
print("\nNew Column 'Salary (K)':\n", df)
```

```
New Column 'Salary (K)':
          Name  Age  Salary   Sector  Salary (K)
0      Hayley   25   50000    Music        50.0
1      Taylor   30   60000    Music        60.0
2      Claire   35   70000       IT        70.0
3      Aurora   40   80000  Finance        80.0
4  Evangeline   45   90000       HR        90.0
```
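Assigning with `df["..."] = ...` modifies `df` in place. When the original should stay untouched, `assign()` returns a new DataFrame with the extra column instead. In the sketch below, the `Bonus` and `Total` columns and the 10% rate are hypothetical, chosen only to show element-wise column arithmetic.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Hayley", "Taylor", "Claire"],
    "Salary": [50000, 60000, 70000],
})

# assign() returns a copy with the new column; df keeps its original columns.
# "Bonus" and the 10% rate are illustrative assumptions.
df_bonus = df.assign(Bonus=df["Salary"] * 0.10)

# Arithmetic between columns is applied row by row (element-wise).
df_bonus["Total"] = df_bonus["Salary"] + df_bonus["Bonus"]
print(df_bonus)
```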


```python
# 4. Apply a function to transform a column
df["Age Category"] = df["Age"].apply(lambda x: "Young" if x < 35 else "Senior")
print("\nAdded 'Age Category' Column:\n", df)
```

```
Added 'Age Category' Column:
          Name  Age  Salary   Sector  Salary (K) Age Category
0      Hayley   25   50000    Music        50.0        Young
1      Taylor   30   60000    Music        60.0        Young
2      Claire   35   70000       IT        70.0       Senior
3      Aurora   40   80000  Finance        80.0       Senior
4  Evangeline   45   90000       HR        90.0       Senior
```
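The cell above uses `apply()`; the list at the top also mentions `map()`. On a Series, `map()` accepts a function too and produces the same labels here. For a simple two-way condition, a vectorized `np.where` (from NumPy, not covered elsewhere in this notebook) is a common alternative that avoids calling a Python function per element. A sketch comparing the three:

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 30, 35, 40, 45], name="Age")

# apply(): calls the Python function once per element.
via_apply = ages.apply(lambda x: "Young" if x < 35 else "Senior")

# Series.map(): with a function, it is also applied element-wise.
via_map = ages.map(lambda x: "Young" if x < 35 else "Senior")

# np.where(): evaluates the condition on the whole column at once,
# which is typically faster on large data.
via_where = pd.Series(np.where(ages < 35, "Young", "Senior"), name="Age")

print(via_apply.tolist())
print(via_apply.tolist() == via_map.tolist() == via_where.tolist())  # True
```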


```python
# 5. Replace sector names
df["Sector"] = df["Sector"].replace({"HR": "Human Resources", "IT": "Information Technology"})
print("\nUpdated Sector Names:\n", df)
```

```
Updated Sector Names:
          Name  Age  Salary                  Sector  Salary (K) Age Category
0      Hayley   25   50000                   Music        50.0        Young
1      Taylor   30   60000                   Music        60.0        Young
2      Claire   35   70000  Information Technology        70.0       Senior
3      Aurora   40   80000                 Finance        80.0       Senior
4  Evangeline   45   90000         Human Resources        90.0       Senior
```
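`replace()` only changes the values listed in the mapping and leaves every other value as it was, which is why "Music" and "Finance" survive above. Passing the same dictionary to `map()` behaves differently: every value is looked up, and values missing from the dictionary become NaN. A small sketch of the contrast:

```python
import pandas as pd

sectors = pd.Series(["Music", "IT", "HR", "Finance"])
mapping = {"HR": "Human Resources", "IT": "Information Technology"}

# replace(): unmatched values ("Music", "Finance") are kept unchanged.
replaced = sectors.replace(mapping)

# map() with a dict: unmatched values become NaN (missing).
mapped = sectors.map(mapping)

print(replaced.tolist())
print(mapped.tolist())
```

Use `replace()` for partial corrections and `map()` when the dictionary is meant to cover every valid value, so that unexpected entries show up as missing.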


```python
# 6. Change data type of Age to float
df["Age"] = df["Age"].astype(float)
print("\nChanged Age to float:\n", df)
```

```
Changed Age to float:
          Name   Age  Salary                  Sector  Salary (K) Age Category
0      Hayley  25.0   50000                   Music        50.0        Young
1      Taylor  30.0   60000                   Music        60.0        Young
2      Claire  35.0   70000  Information Technology        70.0       Senior
3      Aurora  40.0   80000                 Finance        80.0       Senior
4  Evangeline  45.0   90000         Human Resources        90.0       Senior
```
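`astype()` raises an error if any value cannot be converted, which is common when numbers arrive as text. `pd.to_numeric(..., errors="coerce")` converts what it can and marks the rest as NaN. `astype()` also accepts a `{column: dtype}` dictionary to convert several columns at once. The raw values below, including the malformed `"n/a"`, are hypothetical.

```python
import pandas as pd

# Hypothetical raw column: numbers stored as text, one entry malformed.
raw = pd.Series(["50000", "60000", "n/a", "80000"])

# raw.astype(float) would raise ValueError because of "n/a".
# to_numeric with errors="coerce" turns unparseable entries into NaN.
salaries = pd.to_numeric(raw, errors="coerce")
print(salaries.dtype)         # float64
print(salaries.isna().sum())  # 1

# Convert several columns in one call with a {column: dtype} mapping.
df = pd.DataFrame({"Age": [25, 30], "Salary": [50000, 60000]})
df = df.astype({"Age": "float64", "Salary": "int32"})
print(df.dtypes)
```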


```python
# 7. Drop a column
df_dropped = df.drop(columns=["Salary (K)"])
print("\nDropped 'Salary (K)' Column:\n", df_dropped)
```

```
Dropped 'Salary (K)' Column:
          Name   Age  Salary                  Sector Age Category
0      Hayley  25.0   50000                   Music        Young
1      Taylor  30.0   60000                   Music        Young
2      Claire  35.0   70000  Information Technology       Senior
3      Aurora  40.0   80000                 Finance       Senior
4  Evangeline  45.0   90000         Human Resources       Senior
```
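The overview also mentions dropping rows. `drop(index=...)` removes rows by their index labels. To drop rows that match a condition, you can either pass the labels of the matching rows to `drop()` or, more commonly, keep only the rows that do not match with a boolean mask. A sketch on the sample data, where the "older than 40" condition is only an example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Hayley", "Taylor", "Claire", "Aurora", "Evangeline"],
    "Age": [25, 30, 35, 40, 45],
    "Sector": ["Music", "Music", "IT", "Finance", "HR"],
})

# Drop rows by index label.
without_first_two = df.drop(index=[0, 1])

# Drop rows by condition: collect the labels of matching rows, then drop them.
over_40 = df[df["Age"] > 40].index
without_over_40 = df.drop(index=over_40)

# Equivalent boolean-mask form: keep the rows that do NOT match.
kept = df[df["Age"] <= 40]

print(without_first_two["Name"].tolist())  # ['Claire', 'Aurora', 'Evangeline']
print(without_over_40["Name"].tolist())    # ['Hayley', 'Taylor', 'Claire', 'Aurora']
```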


# Real-World Analogy: Preparing a Recipe

Imagine your dataset is like a recipe:
- **Renaming**: Changing ingredient names for clarity.
- **Sorting**: Organizing ingredients by amount or category.
- **Creating new columns**: Adding a "calories per serving" column.
- **Applying functions**: Converting quantities from grams to ounces.
- **Replacing**: Updating old ingredient names to new ones.
- **Changing data types**: Making sure quantities are stored as numbers, not text.
- **Dropping**: Removing unneeded steps or ingredients.

Just as you prepare a recipe before you start cooking, transforming data ensures it’s ready for analysis.
