---
title: "Text Manipulation Methods in pandas"
author: "Mohammed Adil Siraju"
date: "2025-09-23"
categories: [pandas, dataframe, text-manipulation]
description: "Comprehensive guide to string methods in pandas for text data manipulation, including case conversion, searching, regex, and splitting."
---
# Text Manipulation Methods in pandas

This notebook explores pandas string methods, accessed through the `.str` accessor, for manipulating text data in DataFrames. It covers case conversion, substring search, regular expressions, replacement, and splitting.

## Introduction

Pandas provides vectorized string operations through the `.str` accessor. These methods apply element-wise to a Series of strings, so a whole column can be processed in a single call without an explicit loop.

In [1]:
import pandas as pd

## Sample Data

We'll use a simple DataFrame with text data to demonstrate string methods.

In [3]:
data = {
    'TextData': ['Hello','World','Python', 'Pandas', 'Data Science']
}

df = pd.DataFrame(data)
df

|   | TextData |
|---|----------|
| 0 | Hello |
| 1 | World |
| 2 | Python |
| 3 | Pandas |
| 4 | Data Science |


## Case Conversion

Convert text to lowercase or uppercase using `.str.lower()` and `.str.upper()`.

In [4]:
df['LowerCase'] = df['TextData'].str.lower()
df

|   | TextData | LowerCase |
|---|----------|-----------|
| 0 | Hello | hello |
| 1 | World | world |
| 2 | Python | python |
| 3 | Pandas | pandas |
| 4 | Data Science | data science |


In [5]:
df['UpperCase'] = df['TextData'].str.upper()
df

|   | TextData | LowerCase | UpperCase |
|---|----------|-----------|-----------|
| 0 | Hello | hello | HELLO |
| 1 | World | world | WORLD |
| 2 | Python | python | PYTHON |
| 3 | Pandas | pandas | PANDAS |
| 4 | Data Science | data science | DATA SCIENCE |
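Beyond `.str.lower()` and `.str.upper()`, the `.str` accessor also offers `.str.title()`, `.str.capitalize()`, and `.str.swapcase()`. The sketch below is illustrative only: it rebuilds the sample values as a standalone Series named `s` (a name not used in the notebook) so that it runs on its own.

```python
import pandas as pd

# Standalone copy of the TextData values (the name `s` is an assumption)
s = pd.Series(['Hello', 'World', 'Python', 'Pandas', 'Data Science'])

lower = s.str.lower()

# title(): upper-case the first letter of every word
title_case = lower.str.title()

# capitalize(): upper-case only the first character of the whole string
capitalized = lower.str.capitalize()

# swapcase(): invert the case of every character
swapped = s.str.swapcase()

print(title_case.tolist()[4])   # Data Science
print(capitalized.tolist()[4])  # Data science
print(swapped.tolist()[0])      # hELLO
```

The difference between `title()` and `capitalize()` only shows up on multi-word strings such as the last row.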


## Searching in Text

Check whether each string contains a substring with `.str.contains()`. The pattern is interpreted as a regular expression by default. Pass `case=False` for a case-insensitive search.

In [8]:
df['Contains'] = df['TextData'].str.contains('O', case=False)
df

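|   | TextData | LowerCase | UpperCase | Contains |
|---|----------|-----------|-----------|----------|
| 0 | Hello | hello | HELLO | True |
| 1 | World | world | WORLD | True |
| 2 | Python | python | PYTHON | True |
| 3 | Pandas | pandas | PANDAS | False |
| 4 | Data Science | data science | DATA SCIENCE | False |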
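Because `.str.contains()` treats its pattern as a regular expression unless told otherwise, characters such as `.` can match more than intended. The following sketch contrasts the default behaviour with `regex=False`. The `versions` Series and its values are hypothetical and chosen only to make the difference visible.

```python
import pandas as pd

# Hypothetical values, not part of the notebook's sample data
versions = pd.Series(['v1.0', 'v1x0', 'v2.0'])

# Default: the pattern is a regex, so '.' matches any single character
as_regex = versions.str.contains('1.0')

# regex=False: the pattern is matched as literal text
as_literal = versions.str.contains('1.0', regex=False)

print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [True, False, False]
```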


## Regular Expressions (Regex)

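Use regular expressions with methods like `.str.findall()` to find patterns. Here the pattern `'o'` returns every lowercase 'o' in each string as a list. Strings without a match produce an empty list.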

In [11]:
df['Matches'] = df['TextData'].str.findall('o')
df

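|   | TextData | LowerCase | UpperCase | Contains | Matches |
|---|----------|-----------|-----------|----------|---------|
| 0 | Hello | hello | HELLO | True | [o] |
| 1 | World | world | WORLD | True | [o] |
| 2 | Python | python | PYTHON | True | [o] |
| 3 | Pandas | pandas | PANDAS | False | [] |
| 4 | Data Science | data science | DATA SCIENCE | False | [] |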
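A single literal character does not show much of what a regex can do. As a small illustration, the sketch below uses character classes with `.str.findall()` and `.str.count()`. The Series name `s` and the result names are assumptions introduced for this example.

```python
import pandas as pd

# Standalone copy of the TextData values (the name `s` is an assumption)
s = pd.Series(['Hello', 'World', 'Python', 'Pandas', 'Data Science'])

# All lowercase vowels in each string, returned as a list per row
vowels = s.str.findall(r'[aeiou]')

# Number of matches per row, without building the lists
vowel_counts = s.str.count(r'[aeiou]')

# All uppercase ASCII letters in each string
capitals = s.str.findall(r'[A-Z]')

print(vowels.tolist()[0])     # ['e', 'o']
print(vowel_counts.tolist())  # [2, 1, 1, 2, 5]
print(capitals.tolist()[4])   # ['D', 'S']
```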


## Replacement and Splitting

Replace substrings with `.str.replace()` and split strings with `.str.split()`.

In [16]:
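# regex=False makes the literal replacement explicit; the default has differed between pandas versions
df['Replaced'] = df['TextData'].str.replace('o', 'x', regex=False)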
df

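|   | TextData | LowerCase | UpperCase | Contains | Matches | Replaced |
|---|----------|-----------|-----------|----------|---------|----------|
| 0 | Hello | hello | HELLO | True | [o] | Hellx |
| 1 | World | world | WORLD | True | [o] | Wxrld |
| 2 | Python | python | PYTHON | True | [o] | Pythxn |
| 3 | Pandas | pandas | PANDAS | False | [] | Pandas |
| 4 | Data Science | data science | DATA SCIENCE | False | [] | Data Science |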


In [19]:
df['Split'] = df['TextData'].str.split(' ')
df

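|   | TextData | LowerCase | UpperCase | Contains | Matches | Replaced | Split |
|---|----------|-----------|-----------|----------|---------|----------|-------|
| 0 | Hello | hello | HELLO | True | [o] | Hellx | [Hello] |
| 1 | World | world | WORLD | True | [o] | Wxrld | [World] |
| 2 | Python | python | PYTHON | True | [o] | Pythxn | [Python] |
| 3 | Pandas | pandas | PANDAS | False | [] | Pandas | [Pandas] |
| 4 | Data Science | data science | DATA SCIENCE | False | [] | Data Science | [Data, Science] |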
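A column of lists is often awkward to work with. One option, sketched here under the assumption that separate columns are wanted, is `expand=True`, which spreads the pieces across a DataFrame. The column names `First` and `Second` are hypothetical labels chosen for this example.

```python
import pandas as pd

# Standalone copy of the TextData values (the name `s` is an assumption)
s = pd.Series(['Hello', 'World', 'Python', 'Pandas', 'Data Science'])

# expand=True returns a DataFrame with one column per piece;
# rows with fewer pieces are padded with missing values
parts = s.str.split(' ', expand=True)
parts.columns = ['First', 'Second']  # hypothetical column names

# Alternatively, keep the lists and pick one element with .str[index]
first_word = s.str.split(' ').str[0]

print(parts.loc[4, 'Second'])  # Science
print(first_word.tolist()[4])  # Data
```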


## Best Practices

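- Handle missing values: `.str` methods pass missing values through as missing instead of raising an error. When the result of `.str.contains()` is used as a boolean mask, pass `na=False` so missing entries count as non-matches.
- For complex regex, test patterns separately (for example with Python's `re` module) before applying them to a whole column.
- Prefer vectorized `.str` methods to explicit Python loops: they are more concise and treat missing values consistently.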
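The missing-value point can be checked with a short sketch. The Series below is hypothetical and contains one `None` entry to stand in for missing text.

```python
import pandas as pd

# Hypothetical data with one missing entry
s = pd.Series(['Hello', None, 'Data Science'])

# The missing entry stays missing; no exception is raised
upper = s.str.upper()

# na=False turns the missing entry into False so the mask is purely boolean
mask = s.str.contains('o', na=False)
matches = s[mask]

print(upper.isna().tolist())  # [False, True, False]
print(mask.tolist())          # [True, False, False]
print(matches.tolist())       # ['Hello']
```

Without `na=False`, the mask would contain a missing value in the second position, which cannot be used directly for boolean indexing.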

## Summary

This notebook covered essential pandas string methods for text manipulation. Experiment with real datasets to master these techniques!