# Row Operations in Pandas

### What Are Row Operations?

Row operations in Pandas involve working with the rows of a DataFrame: **adding new rows**, **removing rows**, **selecting or reordering specific rows**, and **sampling random rows**. Each row typically represents one observation, such as a single passenger on the Titanic, so manipulating rows directly affects the shape, content, and meaning of our dataset.

These operations are often used in:

- **Cleaning** data (removing unwanted records),
- **Exploring** subsets (filtering by conditions),
- **Training ML models** (splitting and shuffling), and
- **Augmenting** datasets (adding new records).

Row operations are performed using `iloc`, `loc`, `drop()`, `pd.concat()`, `sample()`, boolean filtering, and slicing. They help us get our data into the right shape and form for analysis or modeling.

### Why Row Operations Are Important

Row-level operations come up in nearly every data science project:

- **Cleaning**: Drop irrelevant, null, or duplicate rows.
- **Exploration**: Filter rows by age, survival, or fare.
- **Testing**: Sample small random subsets.
- **Reordering**: Shuffle for cross-validation or visual sanity checks.
- **Expansion**: Add test cases or new rows programmatically.

In real-world projects, row manipulations help transform messy or unstructured data into clean, usable tables that work well with ML algorithms.

### Common Row Operations and Syntax

1. **Adding Rows (`pd.concat()` instead of `.append()`)**
    
    `DataFrame.append()` was **deprecated in Pandas 1.4** and **removed in Pandas 2.0**, so use `pd.concat()` to add rows.

In [1]:
import pandas as pd

df = pd.read_csv("data/train.csv")

# Create a new row (same column structure)
new_row = pd.DataFrame([{
    'PassengerId': 999,
    'Survived': 0,
    'Pclass': 3,
    'Name': 'Test, Mr. Dummy',
    'Sex': 'male',
    'Age': 30,
    'SibSp': 0,
    'Parch': 0,
    'Ticket': '000000',
    'Fare': 7.25,
    'Cabin': None,
    'Embarked': 'S'
}])

# Add the row with pd.concat; ignore_index=True renumbers the index 0..n-1
df = pd.concat([df, new_row], ignore_index=True)

print(df.tail())

     PassengerId  Survived  Pclass                                      Name  \
887          888         1       1              Graham, Miss. Margaret Edith   
888          889         0       3  Johnston, Miss. Catherine Helen "Carrie"   
889          890         1       1                     Behr, Mr. Karl Howell   
890          891         0       3                       Dooley, Mr. Patrick   
891          999         0       3                           Test, Mr. Dummy   

        Sex   Age  SibSp  Parch      Ticket   Fare Cabin Embarked  
887  female  19.0      0      0      112053  30.00   B42        S  
888  female   NaN      1      2  W./C. 6607  23.45   NaN        S  
889    male  26.0      0      0      111369  30.00  C148        C  
890    male  32.0      0      0      370376   7.75   NaN        Q  
891    male  30.0      0      0      000000   7.25  None        S  

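When several records need to be added, it is generally cheaper to collect them first and call `pd.concat()` once than to concatenate inside a loop, since every `concat` call builds a new DataFrame. A minimal sketch, assuming a small made-up frame with only three of the Titanic columns (names and values are illustrative, not from `train.csv`); a field missing from a record simply becomes `NaN`:

```python
import pandas as pd

# Small illustrative frame standing in for the Titanic data
df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Name": ["Alpha, Mr. A", "Beta, Mrs. B"],
    "Age": [22.0, 38.0],
})

# Collect the new (hypothetical) records first
records = [
    {"PassengerId": 3, "Name": "Gamma, Miss. C", "Age": 5.0},
    {"PassengerId": 4, "Name": "Delta, Mr. D"},  # no "Age" key -> NaN
]
new_rows = pd.DataFrame(records)

# A single concat call; ignore_index=True rebuilds a clean 0..n-1 index
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```

Building the list of dicts is cheap; the one `concat` at the end does the copying once.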

2. **Removing Rows**
    
    We can remove rows by their index label using `.drop()`:

In [2]:
df = df.drop(index=0)  # Remove the row labelled 0 (the first row here)
df = df.drop(index=[1, 2])  # Remove the rows labelled 1 and 2
print(df.head())

   PassengerId  Survived  Pclass  \
3            4         1       1   
4            5         0       3   
5            6         0       3   
6            7         0       1   
7            8         0       3   

                                           Name     Sex   Age  SibSp  Parch  \
3  Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0   
4                      Allen, Mr. William Henry    male  35.0      0      0   
5                              Moran, Mr. James    male   NaN      0      0   
6                       McCarthy, Mr. Timothy J    male  54.0      0      0   
7                Palsson, Master. Gosta Leonard    male   2.0      3      1   

   Ticket     Fare Cabin Embarked  
3  113803  53.1000  C123        S  
4  373450   8.0500   NaN        S  
5  330877   8.4583   NaN        Q  
6   17463  51.8625   E46        S  
7  349909  21.0750   NaN        S  

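Note that `drop(index=...)` works on index *labels*, not positions; after earlier drops the two no longer line up. A minimal sketch, assuming an illustrative frame whose labels start at 3 (as ours now do), showing removal by label, by position (translating positions to labels through `df.index`), and by condition:

```python
import pandas as pd

# Illustrative frame whose labels no longer start at 0
df = pd.DataFrame(
    {"Name": ["A", "B", "C", "D", "E"], "Age": [4.0, 35.0, None, 54.0, 2.0]},
    index=[3, 4, 5, 6, 7],
)

# By label: removes the row labelled 5 (the third row)
by_label = df.drop(index=5)

# By position: positions 0 and 1 are translated to labels 3 and 4 first
by_position = df.drop(index=df.index[[0, 1]])

# By condition: drop every row whose Age is below 10 (NaN < 10 is False, so C stays)
by_condition = df.drop(index=df[df["Age"] < 10].index)

print(by_label.index.tolist())      # [3, 4, 6, 7]
print(by_position.index.tolist())   # [5, 6, 7]
print(by_condition.index.tolist())  # [4, 5, 6]
```

On this frame, `df.drop(index=0)` would raise a `KeyError`, because no row carries the label 0.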

3. **Selecting/Reordering Rows**

In [3]:
# First 5 rows
df_first_5 = df.iloc[:5]
print(df_first_5)

# Reversed DataFrame
df_reversed = df.iloc[::-1]
print(df_reversed)

# Filter: Only female passengers
df_females = df[df['Sex'] == 'female']
print(df_females)

   PassengerId  Survived  Pclass  \
3            4         1       1   
4            5         0       3   
5            6         0       3   
6            7         0       1   
7            8         0       3   

                                           Name     Sex   Age  SibSp  Parch  \
3  Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0   
4                      Allen, Mr. William Henry    male  35.0      0      0   
5                              Moran, Mr. James    male   NaN      0      0   
6                       McCarthy, Mr. Timothy J    male  54.0      0      0   
7                Palsson, Master. Gosta Leonard    male   2.0      3      1   

   Ticket     Fare Cabin Embarked  
3  113803  53.1000  C123        S  
4  373450   8.0500   NaN        S  
5  330877   8.4583   NaN        Q  
6   17463  51.8625   E46        S  
7  349909  21.0750   NaN        S  
     PassengerId  Survived  Pclass  \
891          999         0       3   
890          891

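After the drops in step 2 our labels start at 3, so `iloc` (positional) and `loc` (label-based) select different rows, and reordering by a column's values is done with `sort_values()`. A minimal sketch on an illustrative four-row frame (the values are made up for the example):

```python
import pandas as pd

# Illustrative frame with labels that do not match positions
df = pd.DataFrame(
    {"Name": ["A", "B", "C", "D"], "Fare": [53.1, 8.05, 8.46, 51.86]},
    index=[3, 4, 5, 6],
)

# iloc is positional: the first two rows (labels 3 and 4)
first_two = df.iloc[:2]

# loc is label-based and includes the end label: labels 3, 4 and 5
labels_3_to_5 = df.loc[3:5]

# Reorder rows by Fare, highest first; each row keeps its original label
by_fare = df.sort_values("Fare", ascending=False)

print(first_two.index.tolist())      # [3, 4]
print(labels_3_to_5.index.tolist())  # [3, 4, 5]
print(by_fare["Name"].tolist())      # ['A', 'D', 'C', 'B']
```
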
4. **Sampling Rows (Random Subset)**

In [4]:
# Sample 5 random rows
df_sample = df.sample(n=5, random_state=42)
print(df_sample[['PassengerId', 'Name', 'Sex']])

     PassengerId                                   Name     Sex
283          284             Dorking, Mr. Edward Arthur    male
437          438  Richards, Mrs. Sidney (Emily Hocking)  female
42            43                    Kraeff, Mr. Theodor    male
420          421                 Gheorgheff, Mr. Stanio    male
587          588       Frolicher-Stehli, Mr. Maxmillian    male


Random sampling is useful for previewing data and for building small subsets to test code or ML models on; fixing `random_state` makes the sample reproducible.

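A minimal sketch of the two ways to size a sample and of what `random_state` buys us, assuming a toy ten-row frame: `n` asks for a fixed number of rows, `frac` for a fraction of them, and the same seed returns the same rows every time:

```python
import pandas as pd

# Toy frame with ten passengers
df = pd.DataFrame({"PassengerId": range(1, 11)})

# Same random_state -> identical rows on every run
a = df.sample(n=3, random_state=42)
b = df.sample(n=3, random_state=42)

# frac picks a fraction of the rows instead of a fixed count
half = df.sample(frac=0.5, random_state=0)

print(a.equals(b))  # True
print(len(half))    # 5
```
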
### Real-World Use Cases

| Scenario | Row Operation |
| --- | --- |
| Remove rows with missing values | `df.dropna()` |
| Remove duplicate rows | `df.drop_duplicates()` |
| Filter passengers over 60 | `df[df['Age'] > 60]` |
| Random 80% sample for training | `df.sample(frac=0.8, random_state=42)` |
| Shuffle dataset | `df.sample(frac=1, random_state=42)` |
| Add test data | `pd.concat([df, test_df], ignore_index=True)` |

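Several rows of the table above combine naturally into a small cleaning-and-splitting pass. A minimal sketch, assuming an illustrative frame with a duplicated record and a missing `Age`; `subset=` limits which columns `dropna()` checks, and the test set is whatever remains after dropping the training rows' labels:

```python
import pandas as pd

# Illustrative data: PassengerId 2 appears twice, PassengerId 3 has no Age
df = pd.DataFrame({
    "PassengerId": [1, 2, 2, 3, 4, 5],
    "Age": [22.0, 38.0, 38.0, None, 61.0, 70.0],
})

# Drop exact duplicate rows, keeping the first occurrence
df = df.drop_duplicates(keep="first")

# Drop rows whose Age is missing; other columns are not checked
df = df.dropna(subset=["Age"])

# 80/20 split: sample the training rows, then drop their labels for the test set
train = df.sample(frac=0.8, random_state=42)
test = df.drop(index=train.index)

print(len(df), len(train), len(test))
```

Because `test` is built from the labels `train` did not take, the two sets never overlap; this relies on the index being unique, which is another reason to `reset_index(drop=True)` after concatenating frames.
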
### Best Practices

| Task | Best Practice |
| --- | --- |
| Adding rows | Use `pd.concat()`, not `.append()` (removed in Pandas 2.0) |
| Dropping rows | Use `.drop(index=X)` with index labels, or conditional filtering |
| Random sampling | Always use `random_state` for reproducibility |
| After drops or shuffles | Use `.reset_index(drop=True)` if a clean 0..n-1 index is needed |
| Avoid `inplace` confusion | Reassign results (`df = df.drop(...)`) and use `.copy()` before modifying a filtered subset |

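The last two rows of the table, sketched on an illustrative frame: filtered rows keep their old labels until `reset_index(drop=True)` renumbers them, and taking `.copy()` of a filtered subset makes it an independent DataFrame, so adding a column to it leaves the original untouched:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Age": [4.0, 35.0, 2.0, 54.0]})

# Filtering keeps the original labels (here 1 and 3)
adults = df[df["Age"] >= 10]
print(adults.index.tolist())  # [1, 3]

# Renumber to 0..n-1; drop=True discards the old labels instead of adding a column
adults = adults.reset_index(drop=True)
print(adults.index.tolist())  # [0, 1]

# Explicit copy: changes to the subset never touch df
kids = df[df["Age"] < 10].copy()
kids["Group"] = "child"
print("Group" in df.columns)  # False
```
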
### Exercises

Q1. Add a dummy row to the Titanic dataset.

In [5]:
dummy = pd.DataFrame([{
    'PassengerId': 1000,
    'Survived': 1,
    'Pclass': 2,
    'Name': 'AI, Mr. Synthetic',
    'Sex': 'male',
    'Age': 33,
    'SibSp': 0,
    'Parch': 0,
    'Ticket': 'TEST1000',
    'Fare': 12.5,
    'Cabin': None,
    'Embarked': 'C'
}])
df = pd.concat([df, dummy], ignore_index=True)
print(df.tail())

     PassengerId  Survived  Pclass                                      Name  \
885          889         0       3  Johnston, Miss. Catherine Helen "Carrie"   
886          890         1       1                     Behr, Mr. Karl Howell   
887          891         0       3                       Dooley, Mr. Patrick   
888          999         0       3                           Test, Mr. Dummy   
889         1000         1       2                         AI, Mr. Synthetic   

        Sex   Age  SibSp  Parch      Ticket   Fare Cabin Embarked  
885  female   NaN      1      2  W./C. 6607  23.45   NaN        S  
886    male  26.0      0      0      111369  30.00  C148        C  
887    male  32.0      0      0      370376   7.75   NaN        Q  
888    male  30.0      0      0      000000   7.25  None        S  
889    male  33.0      0      0    TEST1000  12.50  None        C  


Q2. Drop all rows where Age < 10

In [6]:
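# Keep rows with Age >= 10. NaN >= 10 evaluates to False,
# so rows with a missing Age are dropped as well
df = df[df['Age'] >= 10]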
print(df.head())

   PassengerId  Survived  Pclass  \
0            4         1       1   
1            5         0       3   
3            7         0       1   
5            9         1       3   
6           10         1       2   

                                                Name     Sex   Age  SibSp  \
0       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
1                           Allen, Mr. William Henry    male  35.0      0   
3                            McCarthy, Mr. Timothy J    male  54.0      0   
5  Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  female  27.0      0   
6                Nasser, Mrs. Nicholas (Adele Achem)  female  14.0      1   

   Parch  Ticket     Fare Cabin Embarked  
0      0  113803  53.1000  C123        S  
1      0  373450   8.0500   NaN        S  
3      0   17463  51.8625   E46        S  
5      2  347742  11.1333   NaN        S  
6      0  237736  30.0708   NaN        C  

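As the output shows (Moran, whose `Age` is `NaN`, is gone), the `>= 10` filter also removes passengers with an unknown age. If the intent is strictly "drop ages under 10" while keeping unknown ages, one option is to negate the `< 10` mask instead. A minimal sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Age": [35.0, 4.0, None, 14.0]})

# NaN >= 10 is False, so row C (missing Age) is dropped too
strict = df[df["Age"] >= 10]

# ~(Age < 10) is True for NaN, so row C is kept
keep_unknown = df[~(df["Age"] < 10)]

print(strict["Name"].tolist())        # ['A', 'D']
print(keep_unknown["Name"].tolist())  # ['A', 'C', 'D']
```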

Q3. Shuffle all rows

In [7]:
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(df.head())

   PassengerId  Survived  Pclass  \
0          880         1       1   
1          371         1       1   
2          363         0       3   
3          688         0       3   
4          109         0       3   

                                            Name     Sex   Age  SibSp  Parch  \
0  Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)  female  56.0      0      1   
1                    Harder, Mr. George Achilles    male  25.0      1      0   
2                Barbara, Mrs. (Catherine David)  female  45.0      0      1   
3                              Dakic, Mr. Branko    male  19.0      0      0   
4                                Rekic, Mr. Tido    male  38.0      0      0   

   Ticket     Fare Cabin Embarked  
0   11767  83.1583   C50        C  
1   11765  55.4417   E50        C  
2    2691  14.4542   NaN        C  
3  349228  10.1708   NaN        S  
4  349249   7.8958   NaN        S  


Q4. Sample 5 random rows showing Name and Fare

In [8]:
# No random_state here, so a different set of rows appears on each run
print(df.sample(n=5)[['Name', 'Fare']])

                                                  Name     Fare
272  Louch, Mrs. Charles Alexander (Alice Adelaide ...  26.0000
638                          Madsen, Mr. Fridtjof Arne   7.1417
436                          Carr, Miss. Helen "Ellen"   7.7500
285                    Reuchlin, Jonkheer. John George   0.0000
361                  Beane, Mrs. Edward (Ethel Clarke)  26.0000


Q5. Remove the last 2 rows

In [9]:
df = df.iloc[:-2]  # Positional slice: every row except the last 2
print(df.tail())

     PassengerId  Survived  Pclass                               Name     Sex  \
644          848         0       3                 Markoff, Mr. Marin    male   
645           36         0       1     Holverson, Mr. Alexander Oskar    male   
646          107         1       3   Salkjelsvik, Miss. Anna Kristine  female   
647          150         0       2  Byles, Rev. Thomas Roussel Davids    male   
648          379         0       3                Betros, Mr. Tannous    male   

      Age  SibSp  Parch  Ticket     Fare Cabin Embarked  
644  35.0      0      0  349213   7.8958   NaN        C  
645  42.0      1      0  113789  52.0000   NaN        S  
646  21.0      0      0  343120   7.6500   NaN        S  
647  42.0      0      0  244310  13.0000   NaN        S  
648  20.0      0      0    2648   4.0125   NaN        C  

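Slicing is not the only option. A minimal sketch on a toy frame comparing three equivalent ways to drop the last two rows: a positional `iloc` slice, `head()` with a negative `n` (which returns all rows except the last `|n|`), and `drop()` with the labels of the last two rows:

```python
import pandas as pd

df = pd.DataFrame({"PassengerId": [10, 20, 30, 40, 50]})

a = df.iloc[:-2]                     # positional slice
b = df.head(-2)                      # all rows except the last 2
c = df.drop(index=df.tail(2).index)  # drop by the labels of the last 2 rows

print(a["PassengerId"].tolist())    # [10, 20, 30]
print(a.equals(b) and a.equals(c))  # True
```

The `drop()` variant assumes unique labels: with duplicated labels it would also remove any earlier rows sharing those labels, whereas the positional forms would not.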

### Summary

Row operations form the backbone of real-world data preprocessing. Every row represents one observation (like a person, event, or transaction), and being able to **select, remove, add, or shuffle** these rows efficiently is key to clean and accurate analysis.

Adding rows helps in augmenting datasets or including synthetic test data. In modern Pandas (v2.0+), `pd.concat()` is the way to do this: the `.append()` method was deprecated in Pandas 1.4 and removed in 2.0.

Removing rows is common during cleaning, to eliminate outliers, rows with nulls, or duplicate records. Conditional filters (`df[df['col'] > X]`) or `.drop(index=...)` let us do this precisely.

Another crucial operation is **random sampling** with `.sample()`, which is well suited to small previews, test sets, or splitting data into training and validation sets. Always set `random_state` to make the results reproducible.

Reordering and slicing (`iloc`, `loc`) are useful for inspection, feature testing, and batch processing. Don't forget to call `.reset_index(drop=True)` after shuffling or dropping rows if we need a clean index.

These simple but powerful row techniques appear in nearly every data science or ML project — from loading data to exporting final predictions. Whether we're cleaning Titanic data or preparing million-row datasets, row-level control gives us the precision we need to build reliable and scalable data pipelines.