# Pandas Tutorial - Part 53

This notebook covers various Series string methods including:
- Converting to title case with `str.title()`
- Translating characters with `str.translate()`
- Converting to uppercase with `str.upper()`
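- String validation methods: `str.isdecimal()`, `str.isdigit()`, `str.isnumeric()`, `str.isspace()`, `str.islower()`, `str.isupper()`, `str.istitle()`, `str.isalpha()`, and `str.isalnum()`
- Applying these methods to clean and validate a small customer DataFrame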

In [None]:
import pandas as pd
import numpy as np

## Converting to Title Case with `str.title()`

The `str.title()` method converts the first character of each word to uppercase and the remaining characters to lowercase. A word is any run of consecutive letters, so a hyphen, apostrophe, or digit starts a new word.

In [None]:
# Create a Series with strings of different cases
s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
print("Original Series:")
print(s)

In [None]:
# Convert to title case
result = s.str.title()
print("Result of title():")
print(result)

In [None]:
# Create a Series with more complex strings
s_complex = pd.Series([
    'the quick brown fox',
    'HELLO WORLD',
    'python is fun',
    'data-science and machine-learning'
])
print("Series with complex strings:")
print(s_complex)

In [None]:
# Convert to title case
result_complex = s_complex.str.title()
print("Result of title():")
print(result_complex)
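
The word-boundary rule has a side effect worth seeing once. The sketch below uses two made-up strings (not part of the data above) and assumes pandas follows Python's built-in `str.title()` here: the letter after a hyphen or an apostrophe is capitalized.

```python
import pandas as pd

# Hypothetical inputs chosen to show where title() starts a new word
s_edge = pd.Series(["data-science", "they're here"])
titled = s_edge.str.title()

# A hyphen or apostrophe ends the current word, so the next letter is capitalized
print(titled.tolist())  # ['Data-Science', "They'Re Here"]
```

If names with apostrophes matter in your data, `str.title()` alone may not be enough and a custom rule may be needed.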

## Translating Characters with `str.translate()`

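The `str.translate()` method replaces or deletes individual characters according to a mapping table. The table is usually built with Python's `str.maketrans()`, which accepts either a dictionary or two or three strings.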

In [None]:
# Create a Series with strings
s = pd.Series(['hello', 'world', 'python'])
print("Original Series:")
print(s)

In [None]:
# Create a translation table to replace 'o' with '0' and 'l' with '1'
translation_table = str.maketrans({'o': '0', 'l': '1'})
result = s.str.translate(translation_table)
print("Result of translate():")
print(result)

In [None]:
# Create a translation table to delete 'o' and 'l'
delete_table = str.maketrans('', '', 'ol')
result_delete = s.str.translate(delete_table)
print("Result of translate() with deletion:")
print(result_delete)

In [None]:
# Map each lowercase letter that occurs in the Series to its uppercase form
complex_table = str.maketrans({
    'h': 'H',
    'e': 'E',
    'l': 'L',
    'o': 'O',
    'w': 'W',
    'r': 'R',
    'd': 'D',
    'p': 'P',
    'y': 'Y',
    't': 'T',
    'n': 'N'
})
result_complex = s.str.translate(complex_table)
print("Result of translate() with complex mapping:")
print(result_complex)
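
As a small sketch of two other ways to build a table: `str.maketrans()` also accepts two equal-length strings (character by character), and a dictionary value may be a longer string or `None` to delete the character. The variable names below are illustrative only.

```python
import pandas as pd

s = pd.Series(['hello', 'world', 'python'])

# Two equal-length strings: 'o' -> '0' and 'l' -> '1'
pair_table = str.maketrans('ol', '01')
replaced = s.str.translate(pair_table)
print(replaced.tolist())  # ['he110', 'w0r1d', 'pyth0n']

# Dictionary form: a value can be several characters, or None to delete
mixed_table = str.maketrans({'o': '00', 'l': None})
expanded = s.str.translate(mixed_table)
print(expanded.tolist())  # ['he00', 'w00rd', 'pyth00n']
```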

## Converting to Uppercase with `str.upper()`

The `str.upper()` method converts all characters in the string to uppercase.

In [None]:
# Create a Series with strings of different cases
s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
print("Original Series:")
print(s)

In [None]:
# Convert to uppercase
result = s.str.upper()
print("Result of upper():")
print(result)

In [None]:
# Create a Series with mixed strings and non-strings
s_mixed = pd.Series(['hello', 123, 'world', np.nan])
print("Series with mixed types:")
print(s_mixed)

In [None]:
# Convert to uppercase; non-string elements (123 and NaN) come back as NaN
result_mixed = s_mixed.str.upper()
print("Result of upper() with mixed types:")
print(result_mixed)
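
The mixed-type case is easy to misread in printed output, so here is a minimal check of which elements were converted. It assumes an object-dtype Series like the one above.

```python
import numpy as np
import pandas as pd

s_mixed = pd.Series(['hello', 123, 'world', np.nan])
upper = s_mixed.str.upper()

# Only real strings are converted; the integer and the NaN both become NaN
print(upper.isna().tolist())  # [False, True, False, True]

# Keep just the elements that were converted
print(upper.dropna().tolist())  # ['HELLO', 'WORLD']
```

Note that the integer `123` is silently replaced by NaN rather than raising an error, so it is worth checking for unexpected missing values after a `.str` call on mixed data.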

## String Validation Methods

Pandas provides `str.is*()` methods that test a property of each string and return a boolean Series with one result per element. An empty string always yields `False`.

### Checking for Numeric Characters

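The three numeric checks are nested: every string that passes `isdecimal()` also passes `isdigit()`, and every string that passes `isdigit()` also passes `isnumeric()`. Let's see where they differ.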

In [None]:
# Create a Series with various numeric strings
s = pd.Series(['23', '3 ', '²', '½', '三', ''])
print("Original Series:")
print(s)

In [None]:
# Check if strings are decimal (base 10 digits)
result_decimal = s.str.isdecimal()
print("Result of isdecimal():")
print(result_decimal)

In [None]:
# Check if strings are digits (includes superscripts and subscripts)
result_digit = s.str.isdigit()
print("Result of isdigit():")
print(result_digit)

In [None]:
# Check if strings are numeric (includes digits, fractions, and other numeric symbols)
result_numeric = s.str.isnumeric()
print("Result of isnumeric():")
print(result_numeric)
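
To compare the three checks side by side, the results can be collected into one DataFrame. This sketch pins `dtype=object` as an assumption, so that the results follow Python's built-in string methods; other string backends may classify some of the non-ASCII characters differently.

```python
import pandas as pd

# dtype=object is an assumption made so the built-in Python rules apply
s = pd.Series(['23', '3 ', '²', '½', '三', ''], dtype=object)

comparison = pd.DataFrame({
    'value': s,
    'isdecimal': s.str.isdecimal(),
    'isdigit': s.str.isdigit(),
    'isnumeric': s.str.isnumeric(),
})
print(comparison)

# '23' passes all three; '²' fails only isdecimal();
# '½' and '三' pass only isnumeric(); '3 ' and '' fail all three
```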

### Checking for Whitespace

In [None]:
# Create a Series with various whitespace strings
s = pd.Series([' ', '\t\r\n ', '', 'a ', ' b'])
print("Original Series:")
print(s)

In [None]:
# Check if strings consist only of whitespace (the empty string returns False)
result = s.str.isspace()
print("Result of isspace():")
print(result)
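
Because `isspace()` returns `False` for an empty string, it does not flag every "blank" value on its own. One possible approach, shown here only as a sketch, is to strip whitespace and compare against the empty string.

```python
import pandas as pd

s = pd.Series([' ', '\t\r\n ', '', 'a ', ' b'])

only_whitespace = s.str.isspace()
# Stripping first treats empty and whitespace-only strings the same way
blank = s.str.strip().eq('')

print(only_whitespace.tolist())  # [True, True, False, False, False]
print(blank.tolist())            # [True, True, True, False, False]
```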

### Checking for Character Case

In [None]:
# Create a Series with strings of different cases
s = pd.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
print("Original Series:")
print(s)

In [None]:
# Check if strings are lowercase
result_lower = s.str.islower()
print("Result of islower():")
print(result_lower)

In [None]:
# Check if strings are uppercase
result_upper = s.str.isupper()
print("Result of isupper():")
print(result_upper)

In [None]:
# Check if strings are titlecase
result_title = s.str.istitle()
print("Result of istitle():")
print(result_title)
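
The case checks only look at cased characters, so digits are ignored as long as at least one letter is present. The strings below are made up to illustrate this and are not part of the data above.

```python
import pandas as pd

# Hypothetical inputs: letters plus digits, digits only, and two capitalization styles
s_case = pd.Series(['abc123', '123', 'Hello World', 'Hello world'])

lower_flags = s_case.str.islower()
title_flags = s_case.str.istitle()

# 'abc123' counts as lowercase, but '123' has no cased characters at all
print(lower_flags.tolist())  # [True, False, False, False]
# Every word must start with an uppercase letter to count as title case
print(title_flags.tolist())  # [False, False, True, False]
```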

### Checking for Alphabetic and Alphanumeric Characters

In [None]:
# Create a Series with various strings
s = pd.Series(['abc123', 'abc', '123', 'abc_123', 'abc 123'])
print("Original Series:")
print(s)

In [None]:
# Check if strings are alphabetic
result_alpha = s.str.isalpha()
print("Result of isalpha():")
print(result_alpha)

In [None]:
# Check if strings are alphanumeric
result_alnum = s.str.isalnum()
print("Result of isalnum():")
print(result_alnum)
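
A common use of `isalnum()` is to find values that contain anything other than letters and digits. The sketch below negates the mask to select those values; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(['abc123', 'abc', '123', 'abc_123', 'abc 123'])

# True where the string contains something other than letters and digits
has_other_chars = ~s.str.isalnum()
flagged = s[has_other_chars]

print(flagged.tolist())  # ['abc_123', 'abc 123']
```

The underscore and the space are what cause the last two values to fail the check.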

## Practical Applications

In [None]:
# Create a DataFrame with customer data
data = {
    'name': ['John Smith', 'JANE DOE', 'robert johnson', 'Emily Wilson'],
    'phone': ['123-456-7890', '(987) 654-3210', '555.123.4567', '123 456 7890'],
    'email': ['john@example.com', 'JANE@EXAMPLE.COM', 'robert.johnson@example.com', 'emily_wilson@example.com'],
    'age': ['25', '30', '35', 'forty']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

In [None]:
# Standardize names to title case
df['name'] = df['name'].str.title()
print("DataFrame with standardized names:")
print(df)

In [None]:
# Standardize emails to lowercase
df['email'] = df['email'].str.lower()
print("DataFrame with standardized emails:")
print(df)

In [None]:
# Clean phone numbers by deleting the separator characters: ( ) - . and space
df['clean_phone'] = df['phone'].str.translate(str.maketrans('', '', '()-. '))
print("DataFrame with cleaned phone numbers:")
print(df)
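
The translation table only deletes the separators it lists, so any other stray character (a `+` or a letter, for example) would survive. As an alternative sketch, `str.replace()` with the regular expression `\D` removes every non-digit character. The Series below repeats the phone values from the DataFrame so the example runs on its own.

```python
import pandas as pd

phone = pd.Series(['123-456-7890', '(987) 654-3210', '555.123.4567', '123 456 7890'])

# \D matches any character that is not a digit
digits_only = phone.str.replace(r'\D', '', regex=True)
print(digits_only.tolist())
# ['1234567890', '9876543210', '5551234567', '1234567890']

# Optional sanity check: every cleaned number has 10 digits
has_ten_digits = digits_only.str.len().eq(10)
print(has_ten_digits.all())  # True
```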

In [None]:
# Flag ages that consist only of decimal digits ('forty' fails the check)
df['is_valid_age'] = df['age'].str.isdecimal()
print("DataFrame with age validation:")
print(df)

In [None]:
# Filter DataFrame to only include rows with valid ages
valid_df = df[df['is_valid_age']]
print("DataFrame with only valid ages:")
print(valid_df)
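
After filtering, the age values are still strings. Two possible follow-ups are sketched here: cast the validated values to integers, or let `pd.to_numeric()` with `errors='coerce'` turn unparseable values into NaN. The Series repeats the age column so the example runs on its own.

```python
import pandas as pd

age = pd.Series(['25', '30', '35', 'forty'])

# Option 1: keep only all-digit strings, then cast them to integers
valid_age = age[age.str.isdecimal()].astype(int)
print(valid_age.tolist())  # [25, 30, 35]

# Option 2: convert everything and mark failures as NaN
coerced = pd.to_numeric(age, errors='coerce')
print(coerced.isna().tolist())  # [False, False, False, True]
```

The two options are not equivalent in general: `pd.to_numeric()` also accepts values such as `'-5'` or `'3.5'`, which `isdecimal()` rejects.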

## Conclusion

In this notebook, we've explored various Series string methods in pandas:

1. `str.title()`: Converts the first character of each word to uppercase and the remaining characters to lowercase.
2. `str.translate()`: Maps characters in the string through a given mapping table, allowing for character replacement and deletion.
3. `str.upper()`: Converts all characters in the string to uppercase.
4. String validation methods:
   - `str.isdecimal()`, `str.isdigit()`, and `str.isnumeric()`: Check for different types of numeric characters.
   - `str.isspace()`: Checks if all characters are whitespace.
   - `str.islower()`, `str.isupper()`, and `str.istitle()`: Check for character case.
   - `str.isalpha()` and `str.isalnum()`: Check for alphabetic and alphanumeric characters.

Together, these methods cover common data-cleaning steps in pandas: standardizing case, replacing or removing individual characters, and flagging values that do not match the expected format before converting or filtering them.