# String Functions (for object or string dtype columns only)
- `df['column'].str.contains()`: Check whether each string contains a given substring or regex pattern.
- `df['column'].str.startswith()`, `df['column'].str.endswith()`: Check whether each string starts or ends with a given value.
- `df['column'].str.lower()`, `df['column'].str.upper()`: Convert text to lower or upper case.
- `df['column'].str.replace()`: Replace occurrences of a pattern with another string.
- `df['column'].str.split()`: Split text on a specified delimiter.
- `df['column'].str.strip()`: Remove leading and trailing whitespace (or other specified characters).


In [1]:
import pandas as pd
import numpy as np

In [38]:
data = { 'Fruit': ['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry'], 'Quantity': [10, 15, 7, 3, 12], 'Price': [1.2, 0.8, 2.5, 3.0, 1.5] }
df = pd.DataFrame(data)
df

        Fruit  Quantity  Price
0       Apple        10    1.2
1      Banana        15    0.8
2      Cherry         7    2.5
3        Date         3    3.0
4  Elderberry        12    1.5


# pandas.Series.str.contains

`Series.str.contains(pat, case=True, flags=0, na=None, regex=True)`
Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.

## Parameters

- **`pat`**: str
  Character sequence or regular expression.

- **`case`**: bool, default True
  If True, case sensitive.

- **`flags`**: int, default 0 (no flags)
  Flags to pass through to the re module, e.g. re.IGNORECASE.

- **`na`**: scalar, optional
  Fill value for missing values. The default depends on dtype of the array. For object-dtype, numpy.nan is used. For StringDtype, pandas.NA is used.

- **`regex`**: bool, default True
  If True, assumes the `pat` is a regular expression. If False, treats the `pat` as a literal string.

## Returns

- **Series or Index of boolean values**
  A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element of the Series or Index.


In [4]:
df['Fruit'].str.contains('a')

0    False
1     True
2    False
3     True
4    False
Name: Fruit, dtype: bool

In [6]:
# Set row 3 to a missing value; df.at[row, column] avoids chained assignment
df.at[3, 'Fruit'] = np.nan

In [8]:
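# By default the missing value propagates as NaN, so the result has dtype object
print(df['Fruit'].str.contains('a'))
# na=False treats missing values as "no match" and yields a clean boolean Series
df['Fruit'].str.contains('a', na=False)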

0    False
1     True
2    False
3      NaN
4    False
Name: Fruit, dtype: object


0    False
1     True
2    False
3    False
4    False
Name: Fruit, dtype: bool

In [16]:
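# Matching is case sensitive by default, so 'A' only matches the capital letter
print(df['Fruit'].str.contains('A', na=False))
# case=False ignores case, so 'Banana' matches as well
df['Fruit'].str.contains('A', case=False, na=False)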

0     True
1    False
2    False
3    False
4    False
Name: Fruit, dtype: bool


0     True
1     True
2    False
3    False
4    False
Name: Fruit, dtype: bool
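
The `regex` parameter described above is not exercised by the cells so far. The following is a minimal sketch of the difference between a regular-expression pattern and a literal one, and of using the resulting mask to filter. The Series `fruits` and the patterns `'an|rr'` and `'.'` are chosen for illustration only and are not part of the original notebook.

```python
import pandas as pd

# Illustrative data: the notebook's fruit names with one missing value
fruits = pd.Series(['Apple', 'Banana', 'Cherry', None, 'Elderberry'], name='Fruit')

# regex=True (the default): 'an|rr' means "contains 'an' OR contains 'rr'"
regex_mask = fruits.str.contains('an|rr', na=False)

# regex=False: '.' is matched as a literal dot, not as "any character"
literal_mask = fruits.str.contains('.', regex=False, na=False)

# A boolean mask can be used directly to select the matching rows
matches = fruits[regex_mask]

print(regex_mask.tolist())
print(literal_mask.tolist())
print(matches.tolist())
```

Passing `na=False` keeps the mask purely boolean, which is what boolean indexing needs.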

In [14]:
df

        Fruit  Quantity  Price
0       Apple        10    1.2
1      Banana        15    0.8
2      Cherry         7    2.5
3         NaN         3    3.0
4  Elderberry        12    1.5


# pandas.Series.str.startswith

`Series.str.startswith(pat, na=None)`
Test if the start of each string element matches a pattern.

# pandas.Series.str.endswith

`Series.str.endswith(pat, na=None)`
Test if the end of each string element matches a pattern.



## Parameters

- **`pat`**: str or tuple[str, …]
  Character sequence or tuple of strings. Regular expressions are not accepted.

- **`na`**: object, default NaN
  Object shown if element tested is not a string. The default depends on dtype of the array. For object-dtype, numpy.nan is used. For StringDtype, pandas.NA is used.

## Returns

- **Series or Index of bool**
  A Series of booleans indicating whether the given pattern matches the start (for `startswith`) or the end (for `endswith`) of each string element.


In [23]:
df['Fruit'].str.startswith('A')

0     True
1    False
2    False
3      NaN
4    False
Name: Fruit, dtype: object

In [24]:
df['Fruit'].str.endswith('a')

0    False
1     True
2    False
3      NaN
4    False
Name: Fruit, dtype: object
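
The parameter list above says `pat` may also be a tuple of strings, and that `na` controls what is returned for missing values. A short sketch of both follows, assuming a pandas version recent enough to accept a tuple for `pat`. The Series `fruits` is illustrative and mirrors the notebook's data.

```python
import pandas as pd

# Illustrative data: the notebook's fruit names with one missing value
fruits = pd.Series(['Apple', 'Banana', 'Cherry', None, 'Elderberry'], name='Fruit')

# A tuple matches if the string starts with ANY of the given prefixes
starts_a_or_b = fruits.str.startswith(('A', 'B'), na=False)

# na=False reports missing values as False instead of NaN
ends_y = fruits.str.endswith('y', na=False)

print(starts_a_or_b.tolist())
print(ends_y.tolist())
```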

In [42]:
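# Side note (not a .str method): keep a value only if it contains any character from 'aAbvB'.
# Judging by the output, df was re-created (cell In [38]) before this ran; a NaN would make `char in x` raise a TypeError.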
df['Fruit'].apply(lambda x: x if any(char in x for char in 'aAbvB') else None)


0         Apple
1        Banana
2          None
3          Date
4    Elderberry
Name: Fruit, dtype: object
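
The same filter can probably be written with the `.str` accessor instead of `apply`. The sketch below is one possible equivalent, not the notebook author's method: it uses the regex character class `[aAbvB]` together with `Series.where`. It differs from the cell above in two ways: non-matching rows become `NaN` rather than `None`, and missing values do not raise an error.

```python
import pandas as pd

# Illustrative data: the notebook's original fruit names
fruits = pd.Series(['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry'], name='Fruit')

# '[aAbvB]' matches any single one of the characters a, A, b, v, B
mask = fruits.str.contains('[aAbvB]', na=False)

# where() keeps values where the mask is True and fills the rest with NaN
kept = fruits.where(mask)

print(kept)
```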

# pandas.Series.str.lower

`Series.str.lower()`
Convert strings in the Series/Index to lowercase.

# pandas.Series.str.upper

`Series.str.upper()`
Convert strings in the Series/Index to uppercase.



## Returns

- **Series or Index of object**


In [43]:
df['Fruit'].str.upper()

0         APPLE
1        BANANA
2        CHERRY
3          DATE
4    ELDERBERRY
Name: Fruit, dtype: object

In [44]:
df['Fruit'].str.lower()

0         apple
1        banana
2        cherry
3          date
4    elderberry
Name: Fruit, dtype: object
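
A common use of `str.lower()` is to normalise case before comparing values. The sketch below shows this with made-up, inconsistently cased data; the Series `labels` is an assumption for illustration and does not come from the notebook.

```python
import pandas as pd

# Illustrative data with inconsistent capitalisation
labels = pd.Series(['Apple', 'BANANA', 'banana', 'Cherry'], name='Fruit')

# Normalise to lower case, then compare against a lower-case target
normalized = labels.str.lower()
is_banana = normalized == 'banana'

print(normalized.tolist())
print(is_banana.tolist())
```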

# pandas.Series.str.replace

`Series.str.replace(pat, repl, n=-1, case=None, flags=0, regex=False)`
Replace each occurrence of pattern/regex in the Series/Index.

Equivalent to `str.replace()` or `re.sub()`, depending on the `regex` value.

## Parameters

- **`pat`**: str or compiled regex
  String can be a character sequence or regular expression.

- **`repl`**: str or callable
  Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See `re.sub()`.

- **`n`**: int, default -1 (all)
  Number of replacements to make from start.

- **`case`**: bool, default None
  Determines if replace is case sensitive:
  - If True, case sensitive (the default if `pat` is a string).
  - Set to False for case insensitive.
  - Cannot be set if `pat` is a compiled regex.

- **`flags`**: int, default 0 (no flags)
  Regex module flags, e.g. `re.IGNORECASE`. Cannot be set if `pat` is a compiled regex.

- **`regex`**: bool, default False
  Determines if the passed-in pattern is a regular expression:
  - If True, assumes the passed-in pattern is a regular expression.
  - If False, treats the pattern as a literal string.
  - Cannot be set to False if `pat` is a compiled regex or `repl` is a callable.

## Returns

- **Series or Index of object**
  A copy of the object with all matching occurrences of `pat` replaced by `repl`.

## Raises

- **ValueError**
  - if `regex` is False and `repl` is a callable or `pat` is a compiled regex
  - if `pat` is a compiled regex and `case` or `flags` is set



In [49]:
df['Fruit'].str.replace('a','A',n=2)

0         Apple
1        BAnAna
2        Cherry
3          DAte
4    Elderberry
Name: Fruit, dtype: object
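
The cell above uses a literal pattern. The sketch below illustrates the two other modes described in the parameter list: a regular expression with `regex=True`, and a callable `repl` that receives the match object. The patterns and the lambda are chosen for illustration only.

```python
import pandas as pd

# Illustrative data: the notebook's original fruit names
fruits = pd.Series(['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry'], name='Fruit')

# regex=True with case=False: replace every vowel, upper or lower case, with '*'
masked = fruits.str.replace('[aeiou]', '*', case=False, regex=True)

# Callable repl: upper-case each run of one or more 'r' characters
shouted = fruits.str.replace(r'r+', lambda m: m.group(0).upper(), regex=True)

print(masked.tolist())
print(shouted.tolist())
```

A callable `repl` requires `regex=True`, as noted in the Raises section above.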

# pandas.Series.str.split

`Series.str.split(pat=None, *, n=-1, expand=False, regex=None)`
Split strings around given separator/delimiter.

Splits the string in the Series/Index from the beginning, at the specified delimiter string.

## Parameters

- **`pat`**: str or compiled regex, optional
  String or regular expression to split on. If not specified, split on whitespace.

- **`n`**: int, default -1 (all)
  Limit number of splits in output. None, 0 and -1 will be interpreted as return all splits.

- **`expand`**: bool, default False
  Expand the split strings into separate columns.
  - If True, return DataFrame/MultiIndex expanding dimensionality.
  - If False, return Series/Index, containing lists of strings.

- **`regex`**: bool, default None
  Determines if the passed-in pattern is a regular expression:
  - If True, assumes the passed-in pattern is a regular expression.
  - If False, treats the pattern as a literal string.
  - If None and `pat` length is 1, treats `pat` as a literal string.
  - If None and `pat` length is not 1, treats `pat` as a regular expression.
  - Cannot be set to False if `pat` is a compiled regex.
  - Added in version 1.4.0.

## Returns

- **Series, Index, DataFrame or MultiIndex**
  Type matches caller unless `expand=True`, in which case a Series yields a DataFrame and an Index yields a MultiIndex.


In [52]:
df['Fruit'].str.split('a',n=1,expand=True)

            0     1
0       Apple  None
1           B  nana
2      Cherry  None
3           D    te
4  Elderberry  None
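
For comparison, the sketch below shows the default `expand=False` behaviour, where each element becomes a list, and how `.str[0]` can pick one piece out of those lists. The Series `names` contains made-up multi-word values, since the notebook's fruit names contain no whitespace.

```python
import pandas as pd

# Illustrative multi-word data (not from the notebook)
names = pd.Series(['red apple', 'yellow banana', 'dark red cherry'])

# No pat given: split on whitespace, one list per element
parts = names.str.split()

# .str[0] indexes into each list and returns the first word
first_word = parts.str[0]

# n=1 with expand=True: at most one split, spread over two columns
two_cols = names.str.split(' ', n=1, expand=True)

print(parts.tolist())
print(first_word.tolist())
print(two_cols)
```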


# pandas.Series.str.strip

`Series.str.strip(to_strip=None)`
Remove leading and trailing characters.

Strip whitespaces (including newlines) or a set of specified characters from each string in the Series/Index from left and right sides. Replaces any non-strings in Series with NaNs. Equivalent to `str.strip()`.

## Parameters

- **`to_strip`**: str or None, default None
  Specifying the set of characters to be removed. All combinations of this set of characters will be stripped. If None then whitespaces are removed.

## Returns

- **Series or Index of object**

## See also

- **Series.str.strip**
  Remove leading and trailing characters in Series/Index.

- **Series.str.lstrip**
  Remove leading characters in Series/Index.

- **Series.str.rstrip**
  Remove trailing characters in Series/Index.


In [56]:
df['Fruit'].str.strip('aA')

0          pple
1         Banan
2        Cherry
3          Date
4    Elderberry
Name: Fruit, dtype: object
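
The cell above strips specific characters. The more common case is the default `to_strip=None`, which removes surrounding whitespace. The sketch below compares `strip`, `lstrip` and `rstrip` on made-up values with stray spaces, a tab and a newline; the Series `raw` is illustrative only.

```python
import pandas as pd

# Illustrative data with leading/trailing whitespace
raw = pd.Series(['  Apple', 'Banana  ', '\tCherry\n'])

clean = raw.str.strip()        # both sides
left_only = raw.str.lstrip()   # leading whitespace only
right_only = raw.str.rstrip()  # trailing whitespace only

print(clean.tolist())
print(left_only.tolist())
print(right_only.tolist())
```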