Link to Medium blog post: https://blog.devgenius.io/string-operations-on-pandas-dataframe-88af220439d1

# String Operations on Pandas DataFrame

## 1. Converting to uppercase/lowercase/title case

Example 1: Converting strings in a column in pandas dataframe to uppercase

In [2]:
# create an example DataFrame s1 

import pandas as pd

s1 = pd.DataFrame({'Place': ['Charlotte,NC', 'Atlanta,georgia', 'Houston,texas', 'Richmand,Virginia'],
                     'Salary': ['$1,000','1500$','$2000$', '$$2500']})

s1.head()

Unnamed: 0,Place,Salary
0,"Charlotte,NC","$1,000"
1,"Atlanta,georgia",1500$
2,"Houston,texas",$2000$
3,"Richmand,Virginia",$$2500


In [3]:
s1['Place']=s1.Place.str.upper()
s1.head()

Unnamed: 0,Place,Salary
0,"CHARLOTTE,NC","$1,000"
1,"ATLANTA,GEORGIA",1500$
2,"HOUSTON,TEXAS",$2000$
3,"RICHMAND,VIRGINIA",$$2500


Example 2: Converting strings in a column in pandas dataframe to lowercase

In [4]:
s1['Place']=s1.Place.str.lower()
s1.head()

Unnamed: 0,Place,Salary
0,"charlotte,nc","$1,000"
1,"atlanta,georgia",1500$
2,"houston,texas",$2000$
3,"richmand,virginia",$$2500


Example 3: Converting strings in a column in pandas dataframe to title case

In [5]:
s1['Place']=s1.Place.str.title()
s1.head()

Unnamed: 0,Place,Salary
0,"Charlotte,Nc","$1,000"
1,"Atlanta,Georgia",1500$
2,"Houston,Texas",$2000$
3,"Richmand,Virginia",$$2500
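
As a side-by-side comparison of the three methods above, here is a minimal sketch on a small Series. The variable names and the two sample values are assumptions chosen for illustration. Note that `str.title()` lowercases every letter after the first one in each word, so "NC" becomes "Nc".

```python
import pandas as pd

# Hypothetical sample values mirroring the "Place" column
places = pd.Series(['Charlotte,NC', 'Atlanta,georgia'])

upper = places.str.upper()   # every letter upper-cased
lower = places.str.lower()   # every letter lower-cased
title = places.str.title()   # first letter of each word upper-cased, the rest lower-cased

print(pd.DataFrame({'upper': upper, 'lower': lower, 'title': title}))
```

Each method returns a new Series, so the result has to be assigned back to the column to keep it.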


## 2. strip, lstrip, rstrip

Example 1: Let’s strip both the leading and the trailing ‘$’ signs from the “Salary” column.

strip → removes leading and trailing whitespace, or any leading and trailing characters from the set passed as an argument.

In [6]:
s1['Salary']=s1.Salary.str.strip('$')
s1.head()

Unnamed: 0,Place,Salary
0,"Charlotte,Nc","1,000"
1,"Atlanta,Georgia",1500
2,"Houston,Texas",2000
3,"Richmand,Virginia",2500



Example 2: Let’s strip leading ‘$’ from the “Salary” column.

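lstrip → removes leading whitespace, or the leading characters passed as an argument. The column was already cleaned in Example 1, so the output below is unchanged.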

In [7]:
s1['Salary']=s1.Salary.str.lstrip('$')
s1.head()

Unnamed: 0,Place,Salary
0,"Charlotte,Nc","1,000"
1,"Atlanta,Georgia",1500
2,"Houston,Texas",2000
3,"Richmand,Virginia",2500



Example 3: Let’s strip the trailing ‘$’ symbol from the “Salary” column.

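rstrip → removes trailing whitespace, or the trailing characters passed as an argument. The column was already cleaned in Example 1, so the output below is unchanged.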

In [8]:
s1['Salary']=s1.Salary.str.rstrip('$')
s1.head()

Unnamed: 0,Place,Salary
0,"Charlotte,Nc","1,000"
1,"Atlanta,Georgia",1500
2,"Houston,Texas",2000
3,"Richmand,Virginia",2500
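
Because the three cells above run one after another on the same column, the difference between the methods is hard to see. The sketch below applies each method to a fresh copy of the original salary strings instead. The variable names are assumptions for illustration.

```python
import pandas as pd

# The original, uncleaned salary strings
salary = pd.Series(['$1,000', '1500$', '$2000$', '$$2500'])

both = salary.str.strip('$')    # removes '$' from both ends
left = salary.str.lstrip('$')   # removes '$' from the start only
right = salary.str.rstrip('$')  # removes '$' from the end only

print(pd.DataFrame({'original': salary, 'strip': both,
                    'lstrip': left, 'rstrip': right}))
```

Only characters at the ends are removed, so the comma inside "1,000" is left in place by all three methods.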



## 3. split

If we need to split a column in the dataframe into two columns based on a delimiter string, we can use the str.split() function with expand=True, which returns the pieces as separate columns.

Example 1: Splitting “Place” column into “City” and “State” columns based on delimiter string “,”

In [9]:
s1[['City','State']]=s1.Place.str.split(',',expand=True)
s1.head()


Unnamed: 0,Place,Salary,City,State
0,"Charlotte,Nc","1,000",Charlotte,Nc
1,"Atlanta,Georgia",1500,Atlanta,Georgia
2,"Houston,Texas",2000,Houston,Texas
3,"Richmand,Virginia",2500,Richmand,Virginia
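
To show what `expand=True` changes, here is a minimal sketch comparing the default return value with the expanded one. It also uses the `n` parameter to limit the number of splits. The sample values and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample values mirroring the "Place" column
place = pd.Series(['Charlotte,Nc', 'Atlanta,Georgia'])

# Default: each row becomes a Python list of pieces
as_lists = place.str.split(',')

# expand=True: the pieces become DataFrame columns labelled 0, 1, ...
as_columns = place.str.split(',', expand=True)

# n=1 splits at the first delimiter only, leaving the rest intact
limited = pd.Series(['a,b,c']).str.split(',', n=1, expand=True)

print(as_lists)
print(as_columns)
print(limited)
```

Limiting the splits with `n=1` is one way to guarantee exactly two output columns when some values may contain more than one delimiter.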


## 4. replace

If we need to replace a substring in a column in the pandas dataframe, we can use the str.replace() function.

Example: Replacing “Nc” with “North Carolina” in the “State” column

In [11]:
s1['State']=s1.State.str.replace('Nc','North Carolina')
s1.head()

Unnamed: 0,Place,Salary,City,State
0,"Charlotte,Nc","1,000",Charlotte,North Carolina
1,"Atlanta,Georgia",1500,Atlanta,Georgia
2,"Houston,Texas",2000,Houston,Texas
3,"Richmand,Virginia",2500,Richmand,Virginia
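
Two details of `str.replace()` are worth a small sketch: the match is case-sensitive, and the pattern can be treated either as a regular expression or as literal text. The default for `regex` has changed across pandas releases, so the example passes it explicitly. The variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample values mirroring the "State" column
state = pd.Series(['Nc', 'Georgia', 'Texas', 'Virginia'])

# Literal, case-sensitive replacement: only the exact text 'Nc' matches
replaced = state.str.replace('Nc', 'North Carolina', regex=False)

# Lower-case 'nc' does not match 'Nc', so nothing changes here
unchanged = state.str.replace('nc', 'North Carolina', regex=False)

print(replaced.tolist())
print(unchanged.tolist())
```

Passing `regex=False` also avoids surprises when the text to replace contains characters such as `$` or `.` that have a special meaning in regular expressions.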


## 5. startswith, endswith, contains

startswith

str.startswith(“prefix”) → Returns True if the string starts with the given “prefix”.
Applied to a column in a pandas dataframe, it produces a boolean mask that we can use to keep only the rows whose value in that column starts with the prefix.

Example 1: Filtering rows whose “Place” value starts with “C”



In [12]:
s2=s1.loc[s1.Place.str.startswith("C")]
s2

Unnamed: 0,Place,Salary,City,State
0,"Charlotte,Nc","1,000",Charlotte,North Carolina
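
When the column contains missing values, the mask returned by `str.startswith()` may contain missing entries too, which cannot be used for filtering. A minimal sketch, assuming a hypothetical column with one missing value, uses the `na` parameter to treat missing values as non-matches.

```python
import pandas as pd

# Hypothetical column with one missing value
place = pd.Series(['Charlotte,Nc', 'Atlanta,Georgia', None])

# na=False makes missing values count as "does not start with"
mask = place.str.startswith('C', na=False)

# The boolean mask selects the matching rows
starts_with_c = place[mask]
print(starts_with_c.tolist())
```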


endswith

str.endswith(“suffix”) → Returns True if the string ends with the given “suffix”.
Applied to a column in a pandas dataframe, it produces a boolean mask that we can use to keep only the rows whose value in that column ends with the suffix.

Example 2: Filtering rows whose “Place” value ends with “as”

In [13]:
s2=s1.loc[s1.Place.str.endswith("as")]
s2

Unnamed: 0,Place,Salary,City,State
2,"Houston,Texas",2000,Houston,Texas
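
Since `str.endswith()` returns a boolean Series, the same mask can also be inverted with `~` to select the rows that do not match. The small DataFrame and variable names below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical two-row version of the example DataFrame
df = pd.DataFrame({'Place': ['Charlotte,Nc', 'Houston,Texas'],
                   'Salary': ['1,000', '2000']})

ends_with_as = df['Place'].str.endswith('as')  # boolean mask, one value per row

matching = df.loc[ends_with_as]    # rows whose Place ends with "as"
others = df.loc[~ends_with_as]     # all remaining rows

print(matching)
print(others)
```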


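contains

str.contains(“pattern”) → Returns True if the string contains the given “pattern”, which is treated as a regular expression by default.

Example 3: Filtering rows that contain the substring “lotte” in the “Place” column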

In [14]:
s2=s1.loc[s1.Place.str.contains("lotte")]
s2

Unnamed: 0,Place,Salary,City,State
0,"Charlotte,Nc","1,000",Charlotte,North Carolina
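
`str.contains()` accepts a few parameters that change how the match is made. The sketch below, using hypothetical sample values, shows a plain substring search with `regex=False` and a case-insensitive search with `case=False`.

```python
import pandas as pd

# Hypothetical sample values mirroring the "Place" column
place = pd.Series(['Charlotte,Nc', 'Atlanta,Georgia', 'Houston,Texas'])

# regex=False searches for the literal text rather than a regular expression
literal = place.str.contains('lotte', regex=False)

# case=False ignores upper/lower case when matching
any_case = place.str.contains('TEXAS', case=False, regex=False)

print(place[literal].tolist())
print(place[any_case].tolist())
```

Using `regex=False` is a reasonable choice whenever the search text comes from user input or contains punctuation, because no character is then interpreted as a regular-expression operator.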
