#### Pandas Tutorial - Part 53

This notebook covers various Series string methods including:
- Converting to title case with `str.title()`
- Translating characters with `str.translate()`
- Converting to uppercase with `str.upper()`
- String validation methods: `str.isdecimal()`, `str.isdigit()`, `str.isnumeric()`, `str.isspace()`, `str.islower()`, `str.isupper()`, `str.istitle()`, `str.isalpha()`, and `str.isalnum()`

In [1]:
import pandas as pd
import numpy as np

##### Converting to Title Case with `str.title()`

The `str.title()` method uppercases the first letter of each word and lowercases the remaining letters. A word is any run of consecutive letters, so a space, hyphen, or other non-letter character starts a new word.

In [2]:
# Create a Series with strings of different cases
s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
print("Original Series:")
print(s)

Original Series:
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object


In [3]:
# Convert to title case
result = s.str.title()
print("Result of title():")
print(result)

Result of title():
0                 Lower
1              Capitals
2    This Is A Sentence
3              Swapcase
dtype: object


In [4]:
# Create a Series with more complex strings
s_complex = pd.Series([
    'the quick brown fox',
    'HELLO WORLD',
    'python is fun',
    'data-science and machine-learning'
])
print("Series with complex strings:")
print(s_complex)

Series with complex strings:
0                  the quick brown fox
1                          HELLO WORLD
2                        python is fun
3    data-science and machine-learning
dtype: object


In [5]:
# Convert to title case
result_complex = s_complex.str.title()
print("Result of title():")
print(result_complex)

Result of title():
0                  The Quick Brown Fox
1                          Hello World
2                        Python Is Fun
3    Data-Science And Machine-Learning
dtype: object

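One caveat is worth a quick sketch: because `str.title()` treats every run of letters as a word, an apostrophe also starts a new word. The sample strings below are illustrative and are not part of the data used above.

```python
import pandas as pd

# Illustrative strings (not from the examples above)
s_apostrophe = pd.Series(["they're here", "o'neil"])

# The letter that follows an apostrophe is capitalized as well
titled = s_apostrophe.str.title()
print(titled)
```

If that behavior is unwanted for names or contractions, a custom capitalization rule is usually needed instead of `str.title()`.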

##### Translating Characters with `str.translate()`

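The `str.translate()` method maps each character of a string through a translation table, typically built with Python's `str.maketrans()`. Characters mapped to `None` are deleted, and characters missing from the table are left unchanged.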

In [6]:
# Create a Series with strings
s = pd.Series(['hello', 'world', 'python'])
print("Original Series:")
print(s)

Original Series:
0     hello
1     world
2    python
dtype: object


In [7]:
# Create a translation table to replace 'o' with '0' and 'l' with '1'
translation_table = str.maketrans({'o': '0', 'l': '1'})
result = s.str.translate(translation_table)
print("Result of translate():")
print(result)

Result of translate():
0     he110
1     w0r1d
2    pyth0n
dtype: object


In [8]:
# Create a translation table to delete 'o' and 'l'
delete_table = str.maketrans('', '', 'ol')
result_delete = s.str.translate(delete_table)
print("Result of translate() with deletion:")
print(result_delete)

Result of translate() with deletion:
0       he
1      wrd
2    pythn
dtype: object


In [9]:
# Create a translation table to replace multiple characters
complex_table = str.maketrans({
    'h': 'H',
    'e': 'E',
    'l': 'L',
    'o': 'O',
    'w': 'W',
    'r': 'R',
    'd': 'D',
    'p': 'P',
    'y': 'Y',
    't': 'T',
    'n': 'N'
})
result_complex = s.str.translate(complex_table)
print("Result of translate() with complex mapping:")
print(result_complex)

Result of translate() with complex mapping:
0     HELLO
1     WORLD
2    PYTHON
dtype: object

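As a small sketch of the other `str.maketrans()` call forms, the table can also be built from two equal-length strings, with an optional third string listing characters to delete. The variable names here are illustrative.

```python
import pandas as pd

s = pd.Series(['hello', 'world', 'python'])

# Two equal-length strings: each character in the first maps to the
# character at the same position in the second
pair_table = str.maketrans('ol', '01')
result_pairs = s.str.translate(pair_table)

# The dictionary form used earlier produces the same result
dict_table = str.maketrans({'o': '0', 'l': '1'})
result_dict = s.str.translate(dict_table)

# A third argument lists characters to delete: replace 'o', drop 'l'
mixed_table = str.maketrans('o', '0', 'l')
result_mixed_table = s.str.translate(mixed_table)

print(result_pairs)
print(result_mixed_table)
```

The two-string form is compact for one-to-one swaps, while the dictionary form is needed when one character should map to a longer replacement string.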

##### Converting to Uppercase with `str.upper()`

The `str.upper()` method converts all characters in the string to uppercase.

In [10]:
# Create a Series with strings of different cases
s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
print("Original Series:")
print(s)

Original Series:
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object


In [11]:
# Convert to uppercase
result = s.str.upper()
print("Result of upper():")
print(result)

Result of upper():
0                 LOWER
1              CAPITALS
2    THIS IS A SENTENCE
3              SWAPCASE
dtype: object


In [12]:
# Create a Series with mixed strings and non-strings
s_mixed = pd.Series(['hello', 123, 'world', np.nan])
print("Series with mixed types:")
print(s_mixed)

Series with mixed types:
0    hello
1      123
2    world
3      NaN
dtype: object


In [13]:
# Convert to uppercase; non-string elements (123 and NaN) come back as NaN
result_mixed = s_mixed.str.upper()
print("Result of upper() with mixed types:")
print(result_mixed)

Result of upper() with mixed types:
0    HELLO
1      NaN
2    WORLD
3      NaN
dtype: object

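If the non-string values should be kept rather than turned into `NaN`, one possible approach is to convert the non-missing elements to strings first. This is a sketch only; the `to_text` step is an assumption about what the desired behavior might be, not something the original notebook does.

```python
import numpy as np
import pandas as pd

s_mixed = pd.Series(['hello', 123, 'world', np.nan], dtype=object)

# Default behavior: elements that are not strings become missing values
upper_default = s_mixed.str.upper()

# Convert every non-missing element to str, leaving NaN untouched
to_text = s_mixed.map(lambda v: str(v) if pd.notna(v) else v)
upper_all = to_text.str.upper()

print(upper_default)
print(upper_all)
```

With this conversion, `123` survives as the string `'123'`, while the missing value stays missing.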

##### String Validation Methods

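Pandas exposes Python's string predicate methods through the `.str` accessor. For a Series of strings, each one returns a boolean Series with one result per element, and an empty string yields `False` for every check shown here.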

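###### Checking for Numeric Characters

The three numeric checks differ in how much of Unicode they accept. `isdecimal()` accepts only characters that can form base-10 numbers, `isdigit()` also accepts digit characters such as superscripts, and `isnumeric()` additionally accepts fractions and numerals from other writing systems.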

In [14]:
# Create a Series with various numeric strings
s = pd.Series(['23', '3 ', '²', '½', '三', ''])
print("Original Series:")
print(s)

Original Series:
0    23
1    3 
2     ²
3     ½
4     三
5      
dtype: object


In [15]:
# Check if strings are decimal (base 10 digits)
result_decimal = s.str.isdecimal()
print("Result of isdecimal():")
print(result_decimal)

Result of isdecimal():
0     True
1    False
2    False
3    False
4    False
5    False
dtype: bool


In [16]:
# Check if strings are digits (includes superscripts and subscripts)
result_digit = s.str.isdigit()
print("Result of isdigit():")
print(result_digit)

Result of isdigit():
0     True
1    False
2     True
3    False
4    False
5    False
dtype: bool


In [17]:
# Check if strings are numeric (includes digits, fractions, and other numeric symbols)
result_numeric = s.str.isnumeric()
print("Result of isnumeric():")
print(result_numeric)

Result of isnumeric():
0     True
1    False
2     True
3     True
4     True
5    False
dtype: bool

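To compare the three checks at a glance, the results can be placed side by side in one DataFrame. This is a small sketch using the same sample values; the `comparison` name is illustrative.

```python
import pandas as pd

s = pd.Series(['23', '3 ', '²', '½', '三', ''])

# One column per check, aligned with the original values
comparison = pd.DataFrame({
    'value': s,
    'isdecimal': s.str.isdecimal(),
    'isdigit': s.str.isdigit(),
    'isnumeric': s.str.isnumeric(),
})
print(comparison)

# The checks are nested: decimal implies digit, and digit implies numeric
decimal_implies_digit = (~comparison['isdecimal'] | comparison['isdigit']).all()
digit_implies_numeric = (~comparison['isdigit'] | comparison['isnumeric']).all()
```

For validating plain integers typed by users, `isdecimal()` is usually the strictest and safest of the three.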

###### Checking for Whitespace

In [18]:
# Create a Series with various whitespace strings
s = pd.Series([' ', '\t\r\n ', '', 'a ', ' b'])
print("Original Series:")
print(s)

Original Series:
0           
1    \t\r\n 
2           
3         a 
4          b
dtype: object


In [19]:
# Check if strings consist only of whitespace (an empty string returns False)
result = s.str.isspace()
print("Result of isspace():")
print(result)

Result of isspace():
0     True
1     True
2    False
3    False
4    False
dtype: bool

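Because `isspace()` returns `False` for an empty string, it does not catch every "blank" entry on its own. A minimal sketch of one way to flag both empty and whitespace-only strings is to strip first and compare with the empty string; the `is_blank` name is illustrative.

```python
import pandas as pd

s = pd.Series([' ', '\t\r\n ', '', 'a ', ' b'])

# Whitespace-only strings become '' after stripping, so this flags
# both empty and whitespace-only entries
is_blank = s.str.strip() == ''
print(is_blank)
```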

###### Checking for Character Case

In [20]:
# Create a Series with strings of different cases
s = pd.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
print("Original Series:")
print(s)

Original Series:
0         leopard
1    Golden Eagle
2           SNAKE
3                
dtype: object


In [21]:
# Check if strings are lowercase
result_lower = s.str.islower()
print("Result of islower():")
print(result_lower)

Result of islower():
0     True
1    False
2    False
3    False
dtype: bool


In [22]:
# Check if strings are uppercase
result_upper = s.str.isupper()
print("Result of isupper():")
print(result_upper)

Result of isupper():
0    False
1    False
2     True
3    False
dtype: bool


In [23]:
# Check if strings are titlecase
result_title = s.str.istitle()
print("Result of istitle():")
print(result_title)

Result of istitle():
0    False
1     True
2    False
3    False
dtype: bool

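The case checks look only at cased characters, so digits and spaces are ignored, but a string needs at least one cased character to return `True`. The following sketch uses illustrative strings that are not part of the data above.

```python
import pandas as pd

# Illustrative strings mixing letters with digits and spaces
s_case = pd.Series(['abc123', '123', 'SNAKE 42', 'Golden Eagle 2', 'Golden eagle'])

# Digits do not affect the result, but '123' has no cased characters at all
lower_check = s_case.str.islower()
upper_check = s_case.str.isupper()

# Title case requires every word to start with an uppercase letter
title_check = s_case.str.istitle()

print(pd.DataFrame({
    'value': s_case,
    'islower': lower_check,
    'isupper': upper_check,
    'istitle': title_check,
}))
```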

###### Checking for Alphanumeric Characters

In [24]:
# Create a Series with various strings
s = pd.Series(['abc123', 'abc', '123', 'abc_123', 'abc 123'])
print("Original Series:")
print(s)

Original Series:
0     abc123
1        abc
2        123
3    abc_123
4    abc 123
dtype: object


In [25]:
# Check if strings are alphabetic
result_alpha = s.str.isalpha()
print("Result of isalpha():")
print(result_alpha)

Result of isalpha():
0    False
1     True
2    False
3    False
4    False
dtype: bool


In [26]:
# Check if strings are alphanumeric
result_alnum = s.str.isalnum()
print("Result of isalnum():")
print(result_alnum)

Result of isalnum():
0     True
1     True
2     True
3    False
4    False
dtype: bool

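Since `isalnum()` rejects underscores, identifier-like strings such as `abc_123` fail the check. If underscores should be allowed, one possible sketch is to remove them before testing. This rule is an assumption for illustration, not a pandas feature.

```python
import pandas as pd

s = pd.Series(['abc123', 'abc', '123', 'abc_123', 'abc 123'])

# Drop underscores (literal match, no regex), then test what remains
alnum_or_underscore = s.str.replace('_', '', regex=False).str.isalnum()
print(alnum_or_underscore)
```

Note that a string made up only of underscores would become empty after the replacement and therefore return `False`.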

##### Practical Applications

In [27]:
# Create a DataFrame with customer data
data = {
    'name': ['John Smith', 'JANE DOE', 'robert johnson', 'Emily Wilson'],
    'phone': ['123-456-7890', '(987) 654-3210', '555.123.4567', '123 456 7890'],
    'email': ['john@example.com', 'JANE@EXAMPLE.COM', 'robert.johnson@example.com', 'emily_wilson@example.com'],
    'age': ['25', '30', '35', 'forty']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Original DataFrame:
             name           phone                       email    age
0      John Smith    123-456-7890            john@example.com     25
1        JANE DOE  (987) 654-3210            JANE@EXAMPLE.COM     30
2  robert johnson    555.123.4567  robert.johnson@example.com     35
3    Emily Wilson    123 456 7890    emily_wilson@example.com  forty


In [28]:
# Standardize names to title case
df['name'] = df['name'].str.title()
print("DataFrame with standardized names:")
print(df)

DataFrame with standardized names:
             name           phone                       email    age
0      John Smith    123-456-7890            john@example.com     25
1        Jane Doe  (987) 654-3210            JANE@EXAMPLE.COM     30
2  Robert Johnson    555.123.4567  robert.johnson@example.com     35
3    Emily Wilson    123 456 7890    emily_wilson@example.com  forty


In [29]:
# Standardize emails to lowercase
df['email'] = df['email'].str.lower()
print("DataFrame with standardized emails:")
print(df)

DataFrame with standardized emails:
             name           phone                       email    age
0      John Smith    123-456-7890            john@example.com     25
1        Jane Doe  (987) 654-3210            jane@example.com     30
2  Robert Johnson    555.123.4567  robert.johnson@example.com     35
3    Emily Wilson    123 456 7890    emily_wilson@example.com  forty


In [30]:
# Clean phone numbers by deleting the separator characters: ( ) - . and space
df['clean_phone'] = df['phone'].str.translate(str.maketrans('', '', '()-. '))
print("DataFrame with cleaned phone numbers:")
print(df)

DataFrame with cleaned phone numbers:
             name           phone                       email    age  \
0      John Smith    123-456-7890            john@example.com     25   
1        Jane Doe  (987) 654-3210            jane@example.com     30   
2  Robert Johnson    555.123.4567  robert.johnson@example.com     35   
3    Emily Wilson    123 456 7890    emily_wilson@example.com  forty   

  clean_phone  
0  1234567890  
1  9876543210  
2  5551234567  
3  1234567890  

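The translation table above deletes only the separators it lists. As an alternative sketch, a regular expression can remove every character that is not a digit, which also covers separators that were not anticipated. The `digits_only` name is illustrative.

```python
import pandas as pd

phone = pd.Series(['123-456-7890', '(987) 654-3210', '555.123.4567', '123 456 7890'])

# \D matches any character that is not a digit
digits_only = phone.str.replace(r'\D', '', regex=True)
print(digits_only)
```

The `translate()` version is stricter in one sense: unexpected characters such as letters stay visible in the output instead of being removed without notice.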

In [31]:
# Flag ages that consist entirely of decimal digits
df['is_valid_age'] = df['age'].str.isdecimal()
print("DataFrame with age validation:")
print(df)

DataFrame with age validation:
             name           phone                       email    age  \
0      John Smith    123-456-7890            john@example.com     25   
1        Jane Doe  (987) 654-3210            jane@example.com     30   
2  Robert Johnson    555.123.4567  robert.johnson@example.com     35   
3    Emily Wilson    123 456 7890    emily_wilson@example.com  forty   

  clean_phone  is_valid_age  
0  1234567890          True  
1  9876543210          True  
2  5551234567          True  
3  1234567890         False  

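When the goal is to use the ages as numbers rather than only flag them, a common alternative is `pd.to_numeric()` with `errors='coerce'`, which turns unparseable values into `NaN`. This is a sketch of that approach, not the method used in the cells above.

```python
import pandas as pd

age = pd.Series(['25', '30', '35', 'forty'])

# Values that cannot be parsed become NaN instead of raising an error
age_numeric = pd.to_numeric(age, errors='coerce')

# A value is valid when the conversion succeeded
is_valid_age = age_numeric.notna()

print(age_numeric)
print(is_valid_age)
```

Unlike `isdecimal()`, this also accepts strings such as `'25.5'` or `'-3'`, so the right choice depends on which inputs should count as valid.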

In [32]:
# Filter DataFrame to only include rows with valid ages
valid_df = df[df['is_valid_age']]
print("DataFrame with only valid ages:")
print(valid_df)

DataFrame with only valid ages:
             name           phone                       email age clean_phone  \
0      John Smith    123-456-7890            john@example.com  25  1234567890   
1        Jane Doe  (987) 654-3210            jane@example.com  30  9876543210   
2  Robert Johnson    555.123.4567  robert.johnson@example.com  35  5551234567   

   is_valid_age  
0          True  
1          True  
2          True  

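After filtering, the `age` column still holds strings. A minimal sketch of a follow-up step is to take an explicit copy of the filtered rows and cast the column to integers. The trimmed-down DataFrame below is illustrative and contains only the columns needed for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Robert Johnson', 'Emily Wilson'],
    'age': ['25', '30', '35', 'forty'],
})

# .copy() makes the filtered result an independent DataFrame, so the
# assignment below does not trigger a chained-assignment warning
valid_df = df[df['age'].str.isdecimal()].copy()

# Every remaining age is a digit string, so the cast is safe
valid_df['age'] = valid_df['age'].astype(int)
print(valid_df)
```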

##### Conclusion

In this notebook, we've explored various Series string methods in pandas:

1. `str.title()`: Converts the first character of each word to uppercase and the remaining characters to lowercase.
2. `str.translate()`: Maps characters in the string through a given mapping table, allowing for character replacement and deletion.
3. `str.upper()`: Converts all characters in the string to uppercase.
4. String validation methods:
   - `str.isdecimal()`, `str.isdigit()`, and `str.isnumeric()`: Check for different types of numeric characters.
   - `str.isspace()`: Checks if all characters are whitespace.
   - `str.islower()`, `str.isupper()`, and `str.istitle()`: Check for character case.
   - `str.isalpha()` and `str.isalnum()`: Check for alphabetic and alphanumeric characters.

Together, these methods cover the data-cleaning steps shown above: normalizing case, deleting or replacing unwanted characters, and flagging values that fail a format check before filtering them out.