# Exploring More Python String Methods

This notebook covers additional Python string methods beyond the foundational ones. They provide more specialized tools for padding and alignment, splitting and searching from the right, translation, encoding, and inspection.

## 1. `center(width[, fillchar])`

Returns a string centered in a string of length `width`. Padding is done using the specified `fillchar` (default is an ASCII space). The original string is returned if `width` is less than or equal to `len(s)`.

In [None]:
text = "Python"
centered_text_space = text.center(20)
centered_text_char = text.center(20, '*')

print(f"Original: '{text}'")
print(f"Centered (space): 'END{centered_text_space}END'")
print(f"Centered ('*'):  'END{centered_text_char}END'")

short_center = text.center(3)
print(f"Centered (width < len): '{short_center}'")

## 2. `ljust(width[, fillchar])`

Returns the string left justified in a string of length `width`. Padding is done using the specified `fillchar` (default is an ASCII space). The original string is returned if `width` is less than or equal to `len(s)`.

In [None]:
text = "Python"
ljust_text_space = text.ljust(20)
ljust_text_char = text.ljust(20, '-')

print(f"Original: '{text}'")
print(f"Left Justified (space): 'END{ljust_text_space}END'")
print(f"Left Justified ('-'):  'END{ljust_text_char}END'")

## 3. `rjust(width[, fillchar])`

Returns the string right justified in a string of length `width`. Padding is done using the specified `fillchar` (default is an ASCII space). The original string is returned if `width` is less than or equal to `len(s)`.

In [None]:
text = "Python"
rjust_text_space = text.rjust(20)
rjust_text_char = text.rjust(20, '=')

print(f"Original: '{text}'")
print(f"Right Justified (space): 'END{rjust_text_space}END'")
print(f"Right Justified ('='):  'END{rjust_text_char}END'")
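
The three alignment methods are often combined to print fixed-width columns. The sketch below assumes some made-up inventory rows and a hypothetical `format_row` helper; the column widths (8, 5, 7) are arbitrary choices for the example.

```python
# Hypothetical sample data, used only for illustration.
rows = [("apple", 3, "ok"), ("kiwi", 12, "low"), ("fig", 120, "ok")]

def format_row(name, qty, status):
    # Left-align text, right-align numbers, center the status flag.
    return name.ljust(8) + str(qty).rjust(5) + status.center(7)

header = format_row("item", "qty", "status")
table_lines = [header, "-" * len(header)]
table_lines += [format_row(*row) for row in rows]
print("\n".join(table_lines))
```

Because every field is shorter than its column width, each line comes out exactly 20 characters wide. A longer value would be returned unpadded and push the later columns to the right.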

## 4. `zfill(width)`

Pads the string on the left with ASCII `'0'` digits to fill `width`. A leading sign prefix (`+` or `-`) is handled by inserting the padding after the sign character rather than before it. The original string is returned if `width` is less than or equal to `len(s)`.

In [None]:
num_str1 = "42"
num_str2 = "-123"
num_str3 = "+5000"
text_str = "abc"

print(f"'{num_str1}'.zfill(5): '{num_str1.zfill(5)}'")
print(f"'{num_str2}'.zfill(5): '{num_str2.zfill(5)}'") # Zero is inserted after the sign: '-0123'
print(f"'{num_str3}'.zfill(5): '{num_str3.zfill(5)}'") # Already 5 characters long, so returned unchanged
print(f"'{num_str1}'.zfill(1): '{num_str1.zfill(1)}'") # Width too small
print(f"'{text_str}'.zfill(5): '{text_str.zfill(5)}'") # Works on non-numeric, but typically used for numbers
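
As a point of comparison, the same zero padding can be produced from a number with a format specification. This is a small sketch, not a recommendation of one over the other; the variable names are illustrative.

```python
n = -42

# zfill works on strings; the format spec works on numbers directly.
padded_with_zfill = str(n).zfill(6)
padded_with_format = f"{n:06d}"

print(padded_with_zfill)
print(padded_with_format)

# zfill treats only a leading '+' or '-' as a sign; other text is just padded.
print("abc".zfill(5))
print("3.5".zfill(6))
```

Both approaches count the sign as part of the total width.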

## 5. `swapcase()`

Returns a copy of the string with uppercase characters converted to lowercase and vice versa. Non-alphabetic characters are unchanged.

In [None]:
text1 = "Hello World"
text2 = "PYTHON is FUN 123!"

print(f"'{text1}'.swapcase(): '{text1.swapcase()}'")
print(f"'{text2}'.swapcase(): '{text2.swapcase()}'")

## 6. `expandtabs(tabsize=8)`

Returns a copy of the string where all tab characters (`\t`) are replaced by one or more spaces, depending on the current column and the given `tabsize`. Tab stops are every `tabsize` characters (default is 8).

In [None]:
tabbed_string = "col1\tcol2\tcol3"
data_string = "a\tb\tcde\tf"

print(f"Original: '{tabbed_string}'")
print(f"Expanded (default tabsize 8): '{tabbed_string.expandtabs()}'")
print(f"Expanded (tabsize 4): '{tabbed_string.expandtabs(4)}'")
print(f"Expanded (tabsize 10): '{tabbed_string.expandtabs(10)}'")

print(f"\nOriginal data: '{data_string}'")
print(f"Data expanded (tabsize 4): '{data_string.expandtabs(4)}'")
print(f"Compare with fixed spaces: 'a   b   cde f'") # For visual check
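
To make the tab-stop rule concrete, here is a minimal re-implementation written only for illustration. `expand_tabs_manual` is a hypothetical helper, and it assumes `tabsize` is positive.

```python
def expand_tabs_manual(s, tabsize=8):
    """Re-implement the tab-stop rule for illustration (assumes tabsize > 0)."""
    out = []
    column = 0
    for ch in s:
        if ch == "\t":
            # Pad up to the next multiple of tabsize (always at least one space).
            spaces = tabsize - (column % tabsize)
            out.append(" " * spaces)
            column += spaces
        elif ch in "\n\r":
            # A line break resets the column counter.
            out.append(ch)
            column = 0
        else:
            out.append(ch)
            column += 1
    return "".join(out)

sample = "a\tb\tcde\tf"
print(repr(expand_tabs_manual(sample, 4)))
print(expand_tabs_manual(sample, 4) == sample.expandtabs(4))
```

The key point is that a tab does not expand to a fixed number of spaces: `cde` ends at column 11, so the tab after it becomes a single space.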

## 7. `rsplit(sep=None, maxsplit=-1)`

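Similar to `split()`, but splits from the right. Without `maxsplit`, the result is the same as `split()`; the direction only matters when the number of splits is limited.
*   If `sep` is not specified or `None`, splits by whitespace from the right.
*   `maxsplit` is the maximum number of splits to do (from the right).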

In [None]:
text = "apple,banana,cherry,date"
print(f"Original: '{text}'")
print(f"rsplit(','): {text.rsplit(',')}")
print(f"rsplit(',', 1): {text.rsplit(',', 1)}") # Splits off only the last field (same pattern as separating a filename from its extension)
print(f"rsplit(',', 2): {text.rsplit(',', 2)}")

path = "/usr/local/bin/python"
print(f"\nPath: '{path}'")
print(f"path.rsplit('/', 1): {path.rsplit('/', 1)}") # [directory, filename]
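
The difference from `split()` shows up only when `maxsplit` limits the number of splits. A short side-by-side sketch, using a made-up log line and filename:

```python
record = "2024-01-15:INFO:disk usage: 91%"   # hypothetical log line

# With maxsplit=1, split keeps the tail intact; rsplit keeps the head intact.
from_left = record.split(":", 1)
from_right = record.rsplit(":", 1)
print(from_left)
print(from_right)

filename = "archive.tar.gz"
stem, extension = filename.rsplit(".", 1)
print(stem, extension)
```

Note that the two-name unpacking on the last line assumes the separator is present; a name without a dot would produce a one-element list and raise `ValueError`.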

## 8. `splitlines(keepends=False)`

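Returns a list of the lines in the string, breaking at line boundaries. Recognized boundaries include `\n`, `\r`, and `\r\n`, as well as several others such as `\v`, `\f`, and `\u2028`. Line breaks are not included in the resulting list unless `keepends` is true.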

In [None]:
multi_line_text = "Hello\nWorld\nThis is Python\r\nAnd NLTK!\rLast line."
print(f"Original Text:\n{multi_line_text}")

lines_no_ends = multi_line_text.splitlines()
print(f"\nsplitlines(): {lines_no_ends}")

lines_with_ends = multi_line_text.splitlines(keepends=True)
print(f"\nsplitlines(keepends=True): {lines_with_ends}")

empty_string = ""
print(f"\n'{empty_string}'.splitlines(): {empty_string.splitlines()}")
string_with_final_newline = "Line1\nLine2\n"
print(f"{string_with_final_newline!r}.splitlines(): {string_with_final_newline.splitlines()}") # !r shows the newlines escaped; the result has no trailing empty string
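
It can help to compare `splitlines()` with `split("\n")`, since the two are easy to confuse. A brief sketch of where they differ:

```python
text = "Line1\nLine2\n"

print(text.splitlines())          # no empty string for the final newline
print(text.split("\n"))           # trailing '' because of the final newline

print("".splitlines())            # empty list
print("".split("\n"))             # list containing one empty string

# split("\n") leaves the carriage return of a Windows line ending behind.
print("one\r\ntwo".split("\n"))
print("one\r\ntwo".splitlines())
```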

## 9. `rfind(sub[, start[, end]])`

Returns the highest index in the string where substring `sub` is found, such that `sub` is contained within `s[start:end]`. Returns -1 if `sub` is not found. Similar to `find()`, but searches from the right.

In [None]:
text = "banana bandana"
print(f"String: '{text}'")
print(f"rfind('na'): {text.rfind('na')}") # Output: 12
print(f"find('na'): {text.find('na')}")   # Output: 2
print(f"rfind('na', 0, 10): {text.rfind('na', 0, 10)}") # Searches in 'banana ban', finds 'na' at index 4
print(f"rfind('xyz'): {text.rfind('xyz')}")

## 10. `rindex(sub[, start[, end]])`

Like `rfind()` but raises `ValueError` if the substring `sub` is not found.

In [None]:
text = "banana bandana"
print(f"String: '{text}'")
print(f"rindex('na'): {text.rindex('na')}")
try:
    print(f"rindex('xyz'): {text.rindex('xyz')}")
except ValueError as e:
    print(f"Error finding 'xyz': {e}")
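
The choice between `rfind()` and `rindex()` matters most when the result feeds directly into a slice, because `-1` is itself a valid index. The two helpers below are hypothetical and exist only to show the pitfall.

```python
def extension_with_rfind(name):
    # Unsafe on purpose: when '.' is missing, rfind returns -1,
    # so the slice starts at 0 and silently returns the whole string.
    return name[name.rfind(".") + 1:]

def extension_with_rindex(name):
    # rindex raises ValueError, so the missing case must be handled explicitly.
    try:
        return name[name.rindex(".") + 1:]
    except ValueError:
        return ""

for name in ["report.pdf", "README"]:
    print(name, repr(extension_with_rfind(name)), repr(extension_with_rindex(name)))
```

With `rfind()`, an explicit `!= -1` check before slicing avoids the same problem.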

## 11. `rpartition(sep)`

Splits the string at the last occurrence of `sep`, and returns a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, returns a 3-tuple containing two empty strings, followed by the original string.

In [None]:
text = "user@example.com@backup"
print(f"String: '{text}'")
print(f"rpartition('@'): {text.rpartition('@')}")
print(f"partition('@'): {text.partition('@')}") # Compare with partition

no_sep_text = "nodomain"
print(f"\nString: '{no_sep_text}'")
print(f"rpartition('@'): {no_sep_text.rpartition('@')}")
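
Because `rpartition()` always returns three items, it can be unpacked safely even when the separator is missing, which is not true of `rsplit(sep, 1)`. The sketch below uses a hypothetical `split_host_port` helper, and the default port of `"80"` is an assumption made for the example.

```python
def split_host_port(address, default_port="80"):
    # Hypothetical helper; default_port is an assumption for the example.
    host, sep, port = address.rpartition(":")
    if not sep:
        # Separator missing: the whole input lands in the last slot.
        return port, default_port
    return host, port

print(split_host_port("example.com:8080"))
print(split_host_port("example.com"))

# rsplit returns a shorter list instead, so two-name unpacking would fail.
print("example.com".rsplit(":", 1))
```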

## 12. `maketrans(x[, y[, z]])` and `translate(table)`

*   `str.maketrans(x[, y[, z]])`: This static method returns a translation table usable for `str.translate()`.
    *   If one argument: it must be a dictionary mapping Unicode ordinals (integers) or single characters to Unicode ordinals, strings of any length, or `None`. Character keys are converted to ordinals.
    *   If two arguments: they must be strings of equal length, and in the resulting dictionary, each character in `x` will be mapped to the character at the same position in `y`.
    *   If three arguments: each character in `z` will be mapped to `None` (characters in `z` are deleted).
*   `translate(table)`: Returns a copy of the string where each character has been mapped through the given translation `table`.

In [None]:
# Example 1: Two arguments (character replacement)
table1 = str.maketrans("aeiou", "12345")
text1 = "hello beautiful world"
translated_text1 = text1.translate(table1)
print(f"Original: '{text1}'")
print(f"Translated (vowels to numbers): '{translated_text1}'")

# Example 2: Three arguments (replacement and deletion)
table2 = str.maketrans("aeiou", "AEIOU", "lrd") # Replace vowels, delete 'l', 'r', 'd'
text2 = "hello beautiful world"
translated_text2 = text2.translate(table2)
print(f"\nOriginal: '{text2}'")
print(f"Translated (vowels to uppercase, l,r,d deleted): '{translated_text2}'")

# Example 3: One argument (dictionary mapping)
mapping_dict = {ord('a'): 'APPLE', ord(' '): '_'}
table3 = str.maketrans(mapping_dict)
text3 = "a cat and a bat"
translated_text3 = text3.translate(table3)
print(f"\nOriginal: '{text3}'")
print(f"Translated (dictionary mapping): '{translated_text3}'")
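
A common use of the three-argument form is deleting a whole set of characters in one pass. The sketch below strips ASCII punctuation using `string.punctuation`; the sample sentence is made up.

```python
import string

# Empty first two arguments: nothing is replaced, everything in the third is deleted.
remove_punctuation = str.maketrans("", "", string.punctuation)

sentence = "Hello, world! (It's 9:30 a.m.)"
cleaned = sentence.translate(remove_punctuation)
print(cleaned)

# The table is a plain dict keyed by ordinal; deleted characters map to None.
print(remove_punctuation[ord("!")])
```

Only the ASCII punctuation listed in `string.punctuation` is removed; typographic quotes and other Unicode punctuation would pass through unchanged.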

## 13. `encode(encoding='utf-8', errors='strict')`

Returns an encoded version of the string as a `bytes` object. 
*   `encoding`: The encoding to use (e.g., 'utf-8', 'ascii', 'latin-1'). Default is 'utf-8'.
*   `errors`: Specifies how encoding errors are handled. 
    *   `'strict'` (default): Raises `UnicodeEncodeError`.
    *   `'ignore'`: Ignores unencodable characters.
    *   `'replace'`: Replaces with a replacement marker (often '?').
    *   `'xmlcharrefreplace'`: Replaces with XML character references.
    *   `'backslashreplace'`: Replaces with Python's backslashed escape sequences.

The counterpart `bytes.decode()` converts bytes back to a string.

In [None]:
text = "Python is fun! 😊"

utf8_encoded = text.encode('utf-8')
print(f"Original text: '{text}'")
print(f"UTF-8 encoded: {utf8_encoded}")
print(f"Decoded back: {utf8_encoded.decode('utf-8')}")

try:
    ascii_encoded_strict = text.encode('ascii') # Will fail due to emoji
    print(f"ASCII encoded (strict): {ascii_encoded_strict}")
except UnicodeEncodeError as e:
    print(f"\nError with ASCII strict encoding: {e}")

ascii_encoded_replace = text.encode('ascii', errors='replace')
print(f"ASCII encoded (replace): {ascii_encoded_replace}")
print(f"Decoded from ASCII (replace): {ascii_encoded_replace.decode('ascii')}")

ascii_encoded_ignore = text.encode('ascii', errors='ignore')
print(f"ASCII encoded (ignore): {ascii_encoded_ignore}")
print(f"Decoded from ASCII (ignore): {ascii_encoded_ignore.decode('ascii')}")

ascii_encoded_xml = text.encode('ascii', errors='xmlcharrefreplace')
print(f"ASCII encoded (xmlcharrefreplace): {ascii_encoded_xml}")
print(f"Decoded from ASCII (xmlcharrefreplace): {ascii_encoded_xml.decode('ascii')}") # Note: decode() leaves the reference as literal text (&#128522;); it does not turn it back into the emoji
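
The `'backslashreplace'` handler is listed above but not shown, and the XML references produced by `'xmlcharrefreplace'` need a separate step to become characters again. The following sketch covers both; using `html.unescape` for the second step is one possible choice, not the only one.

```python
import html

text = "fun 😊"

# backslashreplace keeps the information as a Python-style escape sequence.
escaped = text.encode("ascii", errors="backslashreplace")
print(escaped)

# xmlcharrefreplace yields plain ASCII text containing a numeric reference.
xml_bytes = text.encode("ascii", errors="xmlcharrefreplace")
as_text = xml_bytes.decode("ascii")
print(as_text)

# Resolving the reference is a separate step from decoding the bytes.
print(html.unescape(as_text))

# One character can occupy several bytes in UTF-8.
print(len(text), len(text.encode("utf-8")))
```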

## 14. `isidentifier()`

Returns `True` if the string is a valid identifier according to the language definition, section [Identifiers and keywords](https://docs.python.org/3/reference/lexical_analysis.html#identifiers). For example, it must start with a letter or underscore, followed by letters, digits, or underscores (non-ASCII letters are allowed). Reserved keywords such as `class` also return `True`, so use `keyword.iskeyword()` to test for them separately.

In [None]:
print(f"'variable_name'.isidentifier(): {'variable_name'.isidentifier()}")
print(f"'_my_var'.isidentifier(): {'_my_var'.isidentifier()}")
print(f"'var123'.isidentifier(): {'var123'.isidentifier()}")
print(f"'class'.isidentifier(): {'class'.isidentifier()}") # 'class' is a keyword, but also a valid identifier syntactically
import keyword
print(f"'class' is a keyword: {keyword.iskeyword('class')}")

print(f"'123var'.isidentifier(): {'123var'.isidentifier()}")
print(f"'var-name'.isidentifier(): {'var-name'.isidentifier()}")
print(f"'var name'.isidentifier(): {'var name'.isidentifier()}")
print(f"''.isidentifier(): {''.isidentifier()}")
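
Since `isidentifier()` accepts keywords, checking whether a string can actually be used as a variable name takes two tests. The helper name `is_usable_name` below is hypothetical, and the candidate list is made up.

```python
import keyword

def is_usable_name(name):
    # Hypothetical helper: syntactically valid and not a reserved keyword.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["total", "class", "2fast", "naïve", "_"]:
    print(f"{candidate!r}: {is_usable_name(candidate)}")
```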

## 15. `isprintable()`

Returns `True` if all characters in the string are printable or the string is empty. Non-printable characters are those characters defined in the Unicode character database as "Other" or "Separator", excepting the ASCII space (0x20) which is considered printable.

In [None]:
printable_str = "Hello World 123!@#"
empty_str = ""
non_printable_str1 = "Hello\nWorld" # newline character \n
non_printable_str2 = "Hello\tWorld" # tab character \t
non_printable_str3 = "Hello\x0bWorld" # vertical tab \x0b

print(f"'{printable_str}'.isprintable(): {printable_str.isprintable()}")
print(f"'{empty_str}'.isprintable(): {empty_str.isprintable()}")
print(f"'Hello World'.isprintable(): {'Hello World'.isprintable()}") # Space is printable
print(f"{non_printable_str1!r}.isprintable(): {non_printable_str1.isprintable()}") # !r shows the escape instead of emitting a raw newline
print(f"{non_printable_str2!r}.isprintable(): {non_printable_str2.isprintable()}")
print(f"{non_printable_str3!r}.isprintable(): {non_printable_str3.isprintable()}")
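
One practical use of `isprintable()` is filtering control characters out of untrusted text before displaying it. This is a rough sketch with a hypothetical helper; whether dropping characters outright is appropriate depends on the application.

```python
def strip_nonprintable(s):
    # Hypothetical helper: keep only characters that str.isprintable() accepts.
    return "".join(ch for ch in s if ch.isprintable())

raw = "Hello\tWorld\x00!\n"
print(repr(strip_nonprintable(raw)))

# Invisible format characters such as ZERO WIDTH SPACE are also non-printable.
print("\u200b".isprintable())
```

Tabs and newlines are removed as well, so text that relies on them for layout would need them handled separately.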

## 16. Differentiating `isdecimal()`, `isdigit()`, `isnumeric()`

These methods check if a string consists of certain types of numeric characters. Their distinctions are subtle and relate to Unicode character properties.

*   `isdecimal()`: Returns `True` if all characters in the string are decimal characters and there is at least one character. Decimal characters are those that can be used to form numbers in base 10, i.e. characters in the Unicode general category "Nd" (e.g., U+0660 ARABIC-INDIC DIGIT ZERO). This is the narrowest of the three checks.
*   `isdigit()`: Returns `True` if all characters in the string are digits and there is at least one character. This is a broader category than `isdecimal()`. It includes decimal characters and also digits that need special handling, such as the compatibility superscript digits (e.g., U+00B2 SUPERSCRIPT TWO).
*   `isnumeric()`: Returns `True` if all characters in the string are numeric characters and there is at least one character. This is the broadest category. It includes digit characters and all characters that have the Unicode numeric value property (e.g., U+2155 VULGAR FRACTION ONE FIFTH and Roman numeral characters such as U+2163 ROMAN NUMERAL FOUR).

In [None]:
s1 = "12345"       # Basic digits
s2 = "\u00B2"      # Superscript two (²)
s3 = "\u2155"      # Vulgar fraction one fifth (⅕)
s4 = "\u0661\u0662" # Arabic-Indic digits one and two (١٢)
s5 = "-1.23"       # Contains non-numeric characters (period, minus)
s6 = "Ⅳ"          # Roman numeral four

def check_numeric_types(s, description):
    print(f"\nString: '{s}' ({description})")
    print(f"  isdecimal(): {s.isdecimal()}")
    print(f"  isdigit():   {s.isdigit()}")
    print(f"  isnumeric(): {s.isnumeric()}")

check_numeric_types(s1, "Basic digits")
check_numeric_types(s2, "Superscript two")
check_numeric_types(s3, "Vulgar fraction")
check_numeric_types(s4, "Arabic-Indic digits")
check_numeric_types(s5, "Floating point number string")
check_numeric_types(s6, "Roman numeral")

print("\nSummary of True conditions for these examples:")
print("s.isdecimal(): Only s1 ('12345') and s4 ('١٢')")
print("s.isdigit():   s1 ('12345'), s2 ('²'), and s4 ('١٢')")
print("s.isnumeric(): s1 ('12345'), s2 ('²'), s3 ('⅕'), s4 ('١٢'), and s6 ('Ⅳ')")
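
The three categories line up with lookups in the standard `unicodedata` module, which can make the distinctions easier to see. The sketch below prints the decimal, digit, and numeric value of a few characters (with `None` where a property is absent) and then checks which strings `int()` accepts.

```python
import unicodedata

# Columns: code point, decimal value, digit value, numeric value.
for ch in ["7", "\u00b2", "\u2155", "\u2163"]:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

# int() accepts decimal characters from any script...
print(int("\u0661\u0662"))

# ...but not characters that are merely digits or numeric.
try:
    int("\u00b2")
except ValueError as exc:
    print(f"int() rejected superscript two: {exc}")
```

In practice this means `isdecimal()` is the check that best predicts whether `int()` will succeed on an unsigned string.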

This notebook has covered a range of additional string methods that provide more fine-grained control over string manipulation, formatting, and validation in Python.