# String manipulations in Pandas DataFrame

### String manipulation is the process of changing, parsing, splicing, pasting, or analyzing strings. Raw string data is often not in a form that is suitable for analysis or for describing the data, and Python is well known for its string-handling abilities. Pandas builds on this with a set of built-in string methods, exposed through the `.str` accessor, that let us modify and process the text values in a Series or in a DataFrame column. The sections below cover the most commonly used ones.

In [5]:
# Importing the necessary libraries
import pandas as pd
import numpy as np
  
# Despite the name, df is a pandas Series (a single column of names), not a DataFrame
df = pd.Series(['Gulshan kumar', 'Shashank Sharma', 'Bablu Verma',
                'Abhishek Lal', 'Anand', np.nan, 'Pratap'])
  
print(df)

0      Gulshan kumar
1    Shashank Sharma
2        Bablu Verma
3       Abhishek Lal
4              Anand
5                NaN
6             Pratap
dtype: object
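The `.str` accessor belongs to a Series, so with a real DataFrame you apply it to one column at a time (a DataFrame itself has no `.str` attribute). A minimal sketch, assuming an illustrative DataFrame named `people` with a text column `name`:

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame: one text column, including a missing value
people = pd.DataFrame({'name': ['Gulshan kumar', 'Anand', np.nan]})

# Select the column (a Series), use .str on it, and store the result as a new column
people['name_upper'] = people['name'].str.upper()
print(people)
```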


## Let’s have a look at the string methods pandas provides through the `.str` accessor.

### lower(): Converts all uppercase characters in each string of the Series to lowercase and returns the lowercased strings; missing values (NaN) stay NaN.


In [6]:
print(df.str.lower())


0      gulshan kumar
1    shashank sharma
2        bablu verma
3       abhishek lal
4              anand
5                NaN
6             pratap
dtype: object
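A common use of `lower()` is case-insensitive matching: normalize the case first, then search. The sketch below rebuilds the same names as an illustrative Series called `names` and keeps the rows that mention "sharma" in any capitalization; `na=False` makes missing values count as non-matches so the result can be used as a boolean mask.

```python
import numpy as np
import pandas as pd

# Same data as df above, under an illustrative name
names = pd.Series(['Gulshan kumar', 'Shashank Sharma', 'Bablu Verma',
                   'Abhishek Lal', 'Anand', np.nan, 'Pratap'])

# Lower-case every name, then look for the lower-case substring
is_sharma = names.str.lower().str.contains('sharma', na=False)
print(names[is_sharma])
```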


### upper(): Converts all lowercase characters in each string of the Series to uppercase and returns the uppercased strings in the result; NaN values are passed through unchanged.


In [7]:
print(df.str.upper())


0      GULSHAN KUMAR
1    SHASHANK SHARMA
2        BABLU VERMA
3       ABHISHEK LAL
4              ANAND
5                NaN
6             PRATAP
dtype: object
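The `.str` methods are meant for string values. On an object Series that mixes strings with other types, non-string elements come back as NaN instead of raising an error, as this small sketch with illustrative data shows:

```python
import numpy as np
import pandas as pd

# Illustrative mixed-type Series: a string, an integer, and a missing value
mixed = pd.Series(['abc', 12, np.nan])

# Only the string is upper-cased; the integer and the NaN both come back as NaN
result = mixed.str.upper()
print(result)
```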


### strip(): Removes leading and trailing whitespace from each string in the Series. It does not touch spaces inside a string, so repeated spaces between words are kept. None of the names here have surrounding spaces, so the output below is unchanged.


In [8]:
print(df.str.strip())


0      Gulshan kumar
1    Shashank Sharma
2        Bablu Verma
3       Abhishek Lal
4              Anand
5                NaN
6             Pratap
dtype: object
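Since the names above have no surrounding spaces, here is a sketch with deliberately padded, illustrative strings. It also shows the one-sided variants `lstrip()` and `rstrip()`, and, because `strip()` never touches interior spaces, a regex `replace()` that collapses runs of whitespace to a single space:

```python
import pandas as pd

# Illustrative strings with leading, trailing, and repeated interior spaces
padded = pd.Series(['  Gulshan kumar', 'Anand   ', '  Bablu   Verma  '])

both = padded.str.strip()    # trims both ends
left = padded.str.lstrip()   # trims the left end only
right = padded.str.rstrip()  # trims the right end only

# Collapse any run of whitespace inside the string to a single space
tidy = both.str.replace(r'\s+', ' ', regex=True)
print(tidy.tolist())  # ['Gulshan kumar', 'Anand', 'Bablu Verma']
```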


### split(' '): Splits each string on the given separator and stores the resulting pieces in a list, one list per element; missing values stay NaN.


In [9]:
print(df.str.split(' '))


0      [Gulshan, kumar]
1    [Shashank, Sharma]
2        [Bablu, Verma]
3       [Abhishek, Lal]
4               [Anand]
5                   NaN
6              [Pratap]
dtype: object
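Lists inside a Series are awkward to work with, so two follow-ups are common: `expand=True` spreads the pieces into DataFrame columns, and `.str.get(i)` (or `.str[i]`) pulls out one piece per row. A sketch on a shortened, illustrative copy of the names:

```python
import numpy as np
import pandas as pd

# Illustrative subset of the names, including a one-word name and a missing value
names = pd.Series(['Gulshan kumar', 'Shashank Sharma', 'Anand', np.nan])

# expand=True returns a DataFrame with one column per piece; short rows are padded
parts = names.str.split(' ', expand=True)
print(parts)

# .str.get(0) picks the first element of each list (NaN stays NaN)
first = names.str.split(' ').str.get(0)
print(first.tolist())
```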


### len(): Computes the length (number of characters, including spaces) of each string in the Series. Missing values give NaN, which is why the result below has dtype float64 rather than an integer dtype.


In [10]:
print(df.str.len())


0    13.0
1    15.0
2    11.0
3    12.0
4     5.0
5     NaN
6     6.0
dtype: float64
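If float lengths are inconvenient, they can be converted to pandas' nullable integer dtype `Int64`, which represents the missing entry as `<NA>`. The lengths can also drive a filter directly, because comparing NaN with a number yields False. A sketch on the same names (held in an illustrative Series `names`):

```python
import numpy as np
import pandas as pd

names = pd.Series(['Gulshan kumar', 'Shashank Sharma', 'Bablu Verma',
                   'Abhishek Lal', 'Anand', np.nan, 'Pratap'])

lengths = names.str.len()
print(lengths.dtype)  # float64, because of the NaN

# Nullable integer dtype: whole numbers, with <NA> for the missing entry
lengths_int = lengths.astype('Int64')
print(lengths_int.tolist())

# Keep only names longer than 10 characters (the NaN row compares as False)
long_names = names[lengths > 10]
print(long_names.tolist())
```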


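### startswith(pattern): Returns True if a string in the Series starts with the given pattern and False otherwise. The pattern is matched literally, not as a regular expression, and missing values give NaN, which is why the result below has dtype object.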


In [11]:
print(df.str.startswith('G'))


0     True
1    False
2    False
3    False
4    False
5      NaN
6    False
dtype: object
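Because of the NaN, this result cannot be used directly to select rows: pandas raises a `ValueError` when asked to mask with a boolean array that contains missing values. Passing `na=False` treats missing entries as non-matches and gives a plain boolean Series, as in this sketch (the Series name `names` is illustrative):

```python
import numpy as np
import pandas as pd

names = pd.Series(['Gulshan kumar', 'Shashank Sharma', 'Bablu Verma',
                   'Abhishek Lal', 'Anand', np.nan, 'Pratap'])

# na=False: a missing value counts as "does not start with 'A'"
starts_a = names.str.startswith('A', na=False)
print(starts_a.dtype)            # bool
print(names[starts_a].tolist())  # ['Abhishek Lal', 'Anand']
```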


### endswith(pattern): Returns True if a string in the Series ends with the given pattern and False otherwise; as with startswith(), missing values give NaN.


In [13]:
print(df.str.endswith('r'))


0     True
1    False
2    False
3    False
4    False
5      NaN
6    False
dtype: object
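In recent pandas versions, `startswith()` and `endswith()` also accept a tuple of strings and return True if any of them matches. A sketch that checks two surname endings at once (the endings and the name `names` are illustrative choices):

```python
import numpy as np
import pandas as pd

names = pd.Series(['Gulshan kumar', 'Shashank Sharma', 'Bablu Verma',
                   'Abhishek Lal', 'Anand', np.nan, 'Pratap'])

# True when a name ends with either 'ma' or 'al'; NaN is treated as no match
ends = names.str.endswith(('ma', 'al'), na=False)
print(names[ends].tolist())  # ['Shashank Sharma', 'Bablu Verma', 'Abhishek Lal']
```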


### replace(a, b): Replaces each occurrence of a with b in every string. In the example below, ‘Gulshan’ is replaced by ‘Chintu’.


In [15]:
print(df.str.replace('Gulshan', 'Chintu'))


0       Chintu kumar
1    Shashank Sharma
2        Bablu Verma
3       Abhishek Lal
4              Anand
5                NaN
6             Pratap
dtype: object
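Whether the first argument of `replace()` is treated as a regular expression depends on the `regex` parameter, and its default changed from True to False in pandas 2.0, so passing it explicitly is safer. The sketch below contrasts the two modes on a `.` (which, as a regex, matches any character) and then uses a regex with capture groups to put surnames first; the data and variable names are illustrative:

```python
import numpy as np
import pandas as pd

dots = pd.Series(['a.b'])
literal = dots.str.replace('.', '-', regex=False)  # 'a-b': only the dot is replaced
as_regex = dots.str.replace('.', '-', regex=True)  # '---': '.' matches every character

names = pd.Series(['Gulshan kumar', 'Shashank Sharma', 'Bablu Verma',
                   'Abhishek Lal', 'Anand', np.nan, 'Pratap'])

# \1 and \2 refer to the two captured words; one-word names do not match and are unchanged
swapped = names.str.replace(r'^(\w+) (\w+)$', r'\2, \1', regex=True)
print(swapped.tolist())
```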


### find(pattern): Returns the lowest index at which the substring occurs in each string, or -1 if it does not occur. In the example below, it returns the position of the first ‘n’ in each string; the NaN entry gives NaN, so the result is float64.


In [16]:
print(df.str.find('n'))


0    6.0
1    6.0
2   -1.0
3   -1.0
4    1.0
5    NaN
6   -1.0
dtype: float64
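`find()` has a right-to-left counterpart, `rfind()`, which returns the position of the last occurrence instead of the first. Both are case-sensitive, so searching for an uppercase letter gives different positions than the lowercase one. A sketch on the same names (held in an illustrative Series `names`):

```python
import numpy as np
import pandas as pd

names = pd.Series(['Gulshan kumar', 'Shashank Sharma', 'Bablu Verma',
                   'Abhishek Lal', 'Anand', np.nan, 'Pratap'])

# Last position of 'n' in each string (-1 if absent); 'Anand' gives 3 instead of 1
last_n = names.str.rfind('n')
print(last_n.tolist())

# Case-sensitive search: only names containing a capital 'A' give a non-negative index
capital_a = names.str.find('A')
print(capital_a.tolist())
```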
