### What Are Vectorized String Operations?

Vectorized string operations in Pandas let you apply a string manipulation to every element of a column (a Series) in a single call. They replace explicit loops, which are more verbose and break easily when the data contains missing values.

### Why Use Pandas for String Operations?

In plain Python, you would use a loop or a list comprehension to manipulate a collection of strings, but that approach is verbose and raises errors as soon as it meets a missing value. Pandas provides a cleaner way to handle strings through the `str` accessor, which exposes string methods on a Series and passes missing values through untouched.

### Basic Example of Vectorized String Operations

In [2]:
import pandas as pd
import numpy as np

In [6]:
data = pd.Series(['Aardra','Elzu','Celin','Jewel','Kochu'])

#### Capitalizing Strings

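In plain Python, a list comprehension capitalizes each name:

In [31]:
data = ['Aardra', 'Elzu', 'none', 'Celin', 'Jewel', 'Kochu']
[s.capitalize() for s in data]

['Aardra', 'Elzu', 'None', 'Celin', 'Jewel', 'Kochu']

This works only because every entry is a string; here `'none'` is ordinary text, not a missing value. If the list held a real `None`, the comprehension would raise an `AttributeError`, because `None` has no `capitalize` method.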

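With Pandas, the same operation goes through the `str` accessor, which leaves missing values as missing instead of raising an error:

In [34]:
data = pd.Series(['Aardra', 'Elzu', 'none', 'Celin', 'Jewel', 'Kochu'])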

In [36]:
data.str.capitalize()

0    Aardra
1      Elzu
2      None
3     Celin
4     Jewel
5     Kochu
dtype: object
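The cells above use the string `'none'`, so they never hit a missing value. As a minimal sketch of the difference, assuming a hypothetical list `names` that contains a real `None`, the comprehension fails while the `str` accessor keeps going:

```python
import pandas as pd

# Hypothetical data with a genuine missing value (None), not the text 'none'.
names = ['Aardra', 'Elzu', None, 'Celin']

try:
    capitalized = [s.capitalize() for s in names]
except AttributeError as exc:
    # None has no .capitalize(), so the comprehension stops here.
    capitalized = None
    print(f"list comprehension failed: {exc}")

# The str accessor skips the missing entry and leaves it missing.
result = pd.Series(names).str.capitalize()
print(result)
```

The missing entry stays missing in `result`, so it can be handled later with `isna()` or `fillna()`.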

### Methods similar to Python string methods
Nearly all of Python's built-in string methods are mirrored by a Pandas vectorized string method. Here is a list of Pandas ``str`` methods that mirror Python string methods:

|             |                  |                  |                  |
|-------------|------------------|------------------|------------------|
|``len()``    | ``lower()``      | ``translate()``  | ``islower()``    | 
|``ljust()``  | ``upper()``      | ``startswith()`` | ``isupper()``    | 
|``rjust()``  | ``find()``       | ``endswith()``   | ``isnumeric()``  | 
|``center()`` | ``rfind()``      | ``isalnum()``    | ``isdecimal()``  | 
|``zfill()``  | ``index()``      | ``isalpha()``    | ``split()``      | 
|``strip()``  | ``rindex()``     | ``isdigit()``    | ``rsplit()``     | 
|``rstrip()`` | ``capitalize()`` | ``isspace()``    | ``partition()``  | 
|``lstrip()`` |  ``swapcase()``  |  ``istitle()``   | ``rpartition()`` |

Notice that these have various return values. Some, like ``lower()`` and ``upper()``, return a Series of strings, while ``len()`` returns integers and ``startswith()`` returns Booleans:

In [168]:
s = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
               'Eric Idle', 'Terry Jones', 'Michael Palin'])

In [170]:
s.str.upper()

0    GRAHAM CHAPMAN
1       JOHN CLEESE
2     TERRY GILLIAM
3         ERIC IDLE
4       TERRY JONES
5     MICHAEL PALIN
dtype: object

In [172]:
s.str.lower()

0    graham chapman
1       john cleese
2     terry gilliam
3         eric idle
4       terry jones
5     michael palin
dtype: object

In [174]:
s.str.len()

0    14
1    11
2    13
3     9
4    11
5    13
dtype: int64

In [176]:
s.str.ljust(15)

0    Graham Chapman 
1    John Cleese    
2    Terry Gilliam  
3    Eric Idle      
4    Terry Jones    
5    Michael Palin  
dtype: object

In [178]:
s.str.rjust(15)

0     Graham Chapman
1        John Cleese
2      Terry Gilliam
3          Eric Idle
4        Terry Jones
5      Michael Palin
dtype: object

In [180]:
s.str.center(10)

0    Graham Chapman
1       John Cleese
2     Terry Gilliam
3        Eric Idle 
4       Terry Jones
5     Michael Palin
dtype: object

In [182]:
s.str.startswith('T')

0    False
1    False
2     True
3    False
4     True
5    False
dtype: bool
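The table also lists methods that the cells above do not exercise. The following sketch, which uses a shortened, illustrative version of the names Series, shows three more return types: lists from `split()`, integer positions from `find()`, and Booleans from `istitle()`.

```python
import pandas as pd

s = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam'])

# split() returns a Series whose elements are lists of words.
parts = s.str.split()

# find() returns the position of the first match in each string (-1 if absent).
first_space = s.str.find(' ')

# istitle() returns a Boolean Series.
is_title = s.str.istitle()

print(parts)
print(first_space)
print(is_title)
```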

### Miscellaneous methods
Finally, there are some miscellaneous methods that enable other convenient operations:

| Method | Description |
|--------|-------------|
| ``get()`` | Index each element |
| ``slice()`` | Slice each element |
| ``slice_replace()`` | Replace slice in each element with passed value |
| ``cat()`` | Concatenate strings |
| ``repeat()`` | Repeat values |
| ``normalize()`` | Return Unicode form of string |
| ``pad()`` | Add whitespace to left, right, or both sides of strings |
| ``wrap()`` | Split long strings into lines with length at most a given width |
| ``join()`` | Join strings in each element of the Series with passed separator |
| ``get_dummies()`` | Extract dummy variables as a DataFrame |

### 1. `get()`
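- **Description**: `str.get(i)` extracts the element at position `i` from each string (or list) in a Series. It is not the same as `Series.get(key, default)`, used in the cell below, which looks up a single label in the Series index and returns `default` when the label is absent.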
- **Example**:
 

In [184]:
import pandas as pd
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])
print(s.get(20))
print(s.get(40, 'Not Found'))

b
Not Found
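The miscellaneous-methods table describes `get()` as indexing each element. A minimal sketch of that vectorized form, `Series.str.get`, on an illustrative Series of fruit names:

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry'])

# Position 0 of every string.
first = s.str.get(0)    # 'a', 'b', 'c'

# Negative positions count from the end, as in plain Python indexing.
last = s.str.get(-1)    # 'e', 'a', 'y'

# A position past the end of a string yields a missing value, not an error.
sixth = s.str.get(5)    # missing for 'apple', then 'a', 'y'

print(first, last, sixth, sep='\n')
```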



### 2. `slice()`
- **Description**: Slices each string in a Series, like `string[start:stop:step]` applied to every element.
- **Example**:
  
  
 

In [106]:
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry'])
print(s.str.slice(1, 4))

0    ppl
1    ana
2    her
dtype: object
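The `str` accessor also accepts Python's bracket slicing syntax, and `slice()` takes an optional `step`. A short sketch of both on the same Series:

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry'])

# Bracket syntax on the accessor is equivalent to str.slice(1, 4).
bracket = s.str[1:4]

# step=2 keeps every second character of each string.
every_other = s.str.slice(step=2)

print(bracket)
print(every_other)
```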


### 3. `slice_replace()`
- **Description**: Replaces a slice of each string in a Series with a given value.
- **Example**:
  



In [112]:
import pandas as pd
s = pd.Series(['apple', 'banana', 'cherry'])
print(s.str.slice_replace(1, 4, 'X'))

0     aXe
1    bXna
2    cXry
dtype: object


### 4. `cat()`
- **Description**: Concatenates strings in a Series with an optional separator.
- **Example**:
  
  

In [118]:
import pandas as pd
s = pd.Series(['apple', 'banana', 'cherry'])
print(s.str.cat(sep=' ')) 

apple banana cherry
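Besides collapsing one Series into a single string, `cat()` can combine two sequences element by element through its `others` argument, and `na_rep` controls how missing values appear. The sketch below uses illustrative data that is not part of the original notebook:

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry'])

# Element-wise concatenation with a second list of the same length.
paired = s.str.cat(['red', 'yellow', 'dark red'], sep=': ')

# With a missing value: skipped by default, shown as na_rep when given.
gappy = pd.Series(['a', None, 'c'])
skipped = gappy.str.cat(sep=' ')
filled = gappy.str.cat(sep=' ', na_rep='?')

print(paired)
print(skipped)
print(filled)
```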


### 5. `repeat()`
- **Description**: Repeats each string in a Series a specified number of times.
- **Example**:
  
 

In [127]:
import pandas as pd
s = pd.Series(['apple', 'banana', 'cherry'])
print(s.str.repeat(5)) 

0         appleappleappleappleapple
1    bananabananabananabananabanana
2    cherrycherrycherrycherrycherry
dtype: object
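`repeat()` also accepts a sequence of counts, one per element, instead of a single integer. A minimal sketch:

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry'])

# One repeat count per row: 1 for 'apple', 2 for 'banana', 3 for 'cherry'.
varied = s.str.repeat([1, 2, 3])
print(varied)
```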


### 6. `normalize()`
- **Description**: Returns the Unicode normalization form of each string.
- **Example**:
  

In [132]:
import pandas as pd
s = pd.Series(['café', 'naïve'])
print(s.str.normalize('NFC'))

0     café
1    naïve
dtype: object
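The printed result looks identical to the input because normalization changes the underlying code points, not the rendered text. One way to see the effect, sketched here with explicit escape sequences so the starting form is unambiguous, is to compare string lengths under `NFC` (composed) and `NFD` (decomposed):

```python
import pandas as pd

# '\u00e9' is precomposed e-acute; '\u00ef' is precomposed i-diaeresis.
s = pd.Series(['caf\u00e9', 'na\u00efve'])

nfc = s.str.normalize('NFC')   # accents stay as single code points
nfd = s.str.normalize('NFD')   # accents split into base letter + combining mark

print(nfc.str.len().tolist())  # [4, 5]
print(nfd.str.len().tolist())  # [5, 6]
```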


### 7. `pad()`
- **Description**: Adds whitespace to the left, right, or both sides of strings.
- **Example**:
 
 


In [137]:
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry'])
print(s.str.pad(width=10, side='left'))  
print(s.str.pad(width=10, side='both')) 

0         apple
1        banana
2        cherry
dtype: object
0      apple   
1      banana  
2      cherry  
dtype: object
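Whitespace is only the default fill. `pad()` takes a `fillchar` argument, which makes the padding visible; a short sketch:

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry'])

# Pad on the right with '*' until each string is 10 characters wide.
starred = s.str.pad(width=10, side='right', fillchar='*')
print(starred)
```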


### 8. `wrap()`
- **Description**: Wraps long strings into lines with a maximum length.
- **Example**:
  

In [141]:
import pandas as pd
s = pd.Series(['This is a very long string that needs to be wrapped.'])
print(s.str.wrap(width=20))

0    This is a very long\nstring that needs to\nbe ...
dtype: object
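The Series display truncates the wrapped string, so the line breaks are hard to see. Printing the element itself shows them; a minimal sketch using the same sentence:

```python
import pandas as pd

s = pd.Series(['This is a very long string that needs to be wrapped.'])
wrapped = s.str.wrap(width=20)

# Pull out the single element and print it so '\n' renders as a line break.
text = wrapped.iloc[0]
print(text)
```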


### 9. `join()`
- **Description**: Joins the elements inside each entry of a Series with a specified separator. When the entries are strings, as below, each character counts as an element.
- **Example**:
 
 

In [143]:
import pandas as pd
s = pd.Series(['apple', 'banana', 'cherry'])
print(s.str.join('-')) 

0      a-p-p-l-e
1    b-a-n-a-n-a
2    c-h-e-r-r-y
dtype: object
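`join()` is more commonly applied to a Series whose entries are lists, for example the result of `str.split()`. A sketch with illustrative list data:

```python
import pandas as pd

# Each entry is a list of strings rather than a single string.
lists = pd.Series([['a', 'b', 'c'], ['x', 'y']])

# The separator is placed between the items of each list.
joined = lists.str.join('-')
print(joined)
```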


### 10. `get_dummies()`
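- **Description**: `str.get_dummies(sep)` splits each string on a separator (`'|'` by default) and returns a DataFrame of indicator columns, one per distinct token. The first cell below uses the related top-level function `pd.get_dummies`, which converts a categorical column into dummy/indicator columns.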
- **Example**:
 

In [152]:
import pandas as pd
df = pd.DataFrame({'color': ['red', 'blue', 'green']})
print(pd.get_dummies(df, columns=['color']))

   color_blue  color_green  color_red
0       False        False       True
1        True        False      False
2       False         True      False


In [156]:
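# Define the names Series used for the 'name' column.
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])
full_monte = pd.DataFrame({'name': monte,
                           'info': ['B|C|D', 'B|D', 'A|C',
                                    'B|D', 'B|C', 'B|C|D']})
full_monte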

             name   info
0  Graham Chapman  B|C|D
1     John Cleese    B|D
2   Terry Gilliam    A|C
3       Eric Idle    B|D
4     Terry Jones    B|C
5   Michael Palin  B|C|D


In [158]:
full_monte['info'].str.get_dummies('|')

   A  B  C  D
0  0  1  1  1
1  0  1  0  1
2  1  0  1  0
3  0  1  0  1
4  0  1  1  0
5  0  1  1  1


In [166]:
data = pd.Series(['Apple','Orange','Kiwi'])
data.str.get_dummies()

   Apple  Kiwi  Orange
0      1     0       0
1      0     0       1
2      0     1       0
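The indicator columns are usually more useful next to the data they describe. As a sketch, assuming a two-row subset of the name/info table, the result of `str.get_dummies` can be joined back onto the original frame by index:

```python
import pandas as pd

# Illustrative two-row subset of the name/info data.
people = pd.DataFrame({'name': ['Graham Chapman', 'John Cleese'],
                       'info': ['B|C|D', 'B|D']})

# One indicator column per distinct code found in 'info'.
flags = people['info'].str.get_dummies('|')

# join() aligns on the shared row index.
combined = people[['name']].join(flags)
print(combined)
```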
