# Strings

A string is a sequence of characters drawn from some alphabet. In Python, the
built-in str class represents strings based on the **Unicode international character
set**, which assigns a numeric code point to each character and covers most written languages. Unicode is not a fixed 16-bit encoding: code points range from U+0000 to U+10FFFF, and encodings such as UTF-8 or UTF-16 define how they are stored as bytes. Unicode is
a superset of the 7-bit ASCII character set, which includes the basic Latin alphabet, numerals, and common symbols.
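
As a small illustration of how characters, code points, and encoded bytes relate, the following sketch uses the built-in ord() and chr() functions together with str.encode(). The sample characters are arbitrary choices made for this example.

```python
# Each character corresponds to an integer code point.
letter = 'A'
snake = '\U0001F40D'  # SNAKE emoji, a code point above 0xFFFF

code_letter = ord(letter)  # same value as in ASCII
code_snake = ord(snake)

print(code_letter)                  # prints 65
print(hex(code_snake))              # prints 0x1f40d
print(len(snake))                   # prints 1 (a single character)
print(len(snake.encode('utf-8')))   # prints 4 (four bytes in UTF-8)
print(chr(65))                      # prints A
```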

We can create a Python string using single quotes, double quotes, triple quotes, or the str() function. The following code snippet shows each option:

```python
# A single quote string
single_quote = 'a'  # This is an example of a character in other programming languages. It is a string in Python

# Another single quote string
another_single_quote = 'Programming teaches you patience.'

# A double quote string
double_quote = "aa"

# Another double-quote string
another_double_quote = "It is impossible until it is done!"

# A triple quote string
triple_quote = '''aaa'''

# Also a triple quote string
another_triple_quote = """Welcome to the Python programming language. Ready, 1, 2, 3, Go!"""

# Using the str() function
string_function = str(123.45)  # str() converts float data type to string data type
```

## String properties in Python

### Immutability
Strings are immutable: we cannot update the characters of an existing string. For example, we cannot delete an element from a string or assign a new element at any of its index positions. If we try to update the string, Python raises a ```TypeError```:


```python
immutable_string = "Accountability"

# Assign a new element at index 0
immutable_string[0] = 'B'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_42292/2351953155.py in <module>
      2 
      3 # Assign a new element at index 0
----> 4 immutable_string[0] = 'B'

TypeError: 'str' object does not support item assignment
```

We can, however, reassign a string to the immutable_string variable, but we should note that they aren’t the same string because they don’t point to the same object in memory. Python doesn’t update the old string object; it creates a new one, as we can see by the ids:

```python
immutable_string = "Accountability"
print(id(immutable_string)) # prints 1545476899056

immutable_string = "Bccountability"
print(id(immutable_string)) # prints 1545460751408

test_immutable = immutable_string
print(id(test_immutable)) # prints 1545460751408
```
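
When a changed version of a string is needed, the usual approach is to build a new string from pieces of the old one. A minimal sketch, reusing the example word from above (the variable names are illustrative):

```python
original = "Accountability"

# Slicing and concatenation create a new string; the original is untouched.
modified = "B" + original[1:]

print(original)  # prints Accountability
print(modified)  # prints Bccountability
```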


### Consequences of immutability

Assume that we have a large string named document, and our
goal is to produce a new string, letters, that contains only the alphabetic characters
of the original string (e.g., with spaces, numbers, and punctuation removed). It may
be tempting to compose a result through repeated concatenation, as follows.

```python
document = "01:01:40. This is a wonderful time to learn some more Python code!"
# WARNING: do not do this
letters = ''  # start with empty string
for c in document:
    if c.isalpha():
        letters += c  # concatenate alphabetic character
```

While the preceding code fragment accomplishes the goal, it may be terribly
inefficient. Because strings are immutable, the command letters += c would
presumably compute the concatenation letters + c as a new string instance and
then reassign the identifier letters to that result. Constructing that new string
requires time proportional to its length. If the final result has n characters, the
series of concatenations takes time proportional to the familiar sum $1 + 2 + 3 + \cdots + n = n(n+1)/2$,
and therefore its time complexity is $O(n^2)$. Some later implementations of the
Python interpreter include an optimization that allows such code to complete
in linear time, but this is not guaranteed for all Python implementations.
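
The quadratic growth can be checked with a small counting model. This is only a sketch under the assumption that every `+=` writes a brand-new string; the function name is hypothetical and the model ignores any interpreter optimization.

```python
def chars_written_by_repeated_concat(n):
    """Count characters written if each += builds a brand-new string."""
    total = 0
    length = 0
    for _ in range(n):
        length += 1      # the new string is one character longer
        total += length  # and all of its characters are written again
    return total

print(chars_written_by_repeated_concat(10))    # prints 55
print(chars_written_by_repeated_concat(1000))  # prints 500500
```

Both results match $n(n+1)/2$, which grows as $O(n^2)$.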


A more standard Python idiom to guarantee linear time composition of a string
is to use a temporary list to store individual pieces, and then to rely on the join
method of the str class to compose the final result. Using this technique with our
previous example would appear as follows:


```python
document = "01:01:40. This is a wonderful time to learn some more Python code!"

temp = []  # Start with an empty list

for c in document:
    if c.isalpha():
        temp.append(c)      # Append an alphabetic character to the temporary list
letters = ''.join(temp)     # Compose overall result
```

Another alternative is to pass a generator expression directly to join:

```python
letters = ''.join(c for c in document if c.isalpha())
```

## String operators

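- **Concatenation**: Creates a **new string** by joining the two provided strings. Time complexity ~ $O(n+m)$, where $n$ and $m$ are the lengths of the two strings.
```python
    a = "this"
    b = "is a concatenated string"
    c = a + b # c is "thisis a concatenated string" (no space is inserted)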

```

- **Indexing and slicing**: These work the same way as for a Python list.
```python
    a = "The same problem over and over"

    print(a[4:8]) # prints "same"

```

- ***in* operator**: Returns ```True``` if the first operand is contained within the second, and ```False``` otherwise. CPython's implementation is based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on top. For more background, see: https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm. Typical time complexity ~ $O(n)$, where $n$ is the length of the second operand.
```python
    >>> s = 'foo'

    >>> s in 'That\'s food for thought.'
    True
    >>> s in 'That\'s good for now.'
    False
```
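
A few more cases of the operators above, as a short sketch. The sample sentence is the one used earlier; the variable names are illustrative.

```python
a = "The same problem over and over"

first = a[0]               # first character
last = a[-1]               # negative indices count from the end
prefix = a[:3]             # slice from the start
tail = a[-4:]              # slice up to the end
backwards = a[4:8][::-1]   # a step of -1 reverses the slice

print(first, last)          # prints T r
print(prefix)               # prints The
print(tail)                 # prints over
print(backwards)            # prints emas
print("problem" in a)       # prints True
print("solution" not in a)  # prints True
```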


## String Methods

The str class provides, among others, the following methods:

### Searching for Substrings
- **s.count(pattern, start=0, end=len(s))**: Return the number of non-overlapping occurrences of pattern
- **s.find(pattern, start=0, end=len(s))**: Return the index starting the leftmost occurrence of pattern; else -1
- **s.index(pattern, start=0, end=len(s))**: Similar to find, but raise ValueError if not found
- **s.rfind(pattern, start=0, end=len(s))**: Return the index starting the rightmost occurrence of pattern; else -1
- **s.rindex(pattern, start=0, end=len(s))**: Similar to rfind, but raise ValueError if not found
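
The search methods can be compared on a short word. This is a minimal sketch; the sample string is an arbitrary choice.

```python
s = "banana"

print(s.count("an"))    # prints 2
print(s.count("ana"))   # prints 1 (occurrences may not overlap)
print(s.find("an"))     # prints 1
print(s.rfind("an"))    # prints 3
print(s.find("an", 2))  # prints 3 (search starts at index 2)
print(s.find("x"))      # prints -1

# index behaves like find, but signals a missing pattern with an exception.
try:
    s.index("x")
    found = True
except ValueError:
    found = False
print(found)            # prints False
```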


### Constructing Related Strings

- **s.replace(old, new)**: Return a copy of s with all occurrences of old replaced by new
- **s.capitalize( )**: Return a copy of s with its first character uppercased and the remaining characters lowercased
- **s.upper( )**: Return a copy of s with all alphabetic characters in uppercase
- **s.lower( )**: Return a copy of s with all alphabetic characters in lowercase
- **s.center(width)**: Return a copy of s, padded to width, centered among spaces
- **s.ljust(width)**: Return a copy of s, padded to width with trailing spaces
- **s.rjust(width)**: Return a copy of s, padded to width with leading spaces
- **s.zfill(width)**: Return a copy of s, padded to width with leading zeros
- **s.strip( )**: Return a copy of s, with leading and trailing whitespace removed
- **s.lstrip( )**: Return a copy of s, with leading whitespace removed
- **s.rstrip( )**: Return a copy of s, with trailing whitespace removed
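
The following sketch applies several of these methods to one sample value. repr() is used so that leading and trailing spaces stay visible; the sample strings are arbitrary.

```python
raw = "  hello World  "
stripped = raw.strip()

print(repr(stripped))               # prints 'hello World'
print(repr(raw.lstrip()))           # prints 'hello World  '
print(repr(raw.rstrip()))           # prints '  hello World'
print(stripped.capitalize())        # prints Hello world
print(stripped.upper())             # prints HELLO WORLD
print(stripped.replace("l", "L"))   # prints heLLo WorLd
print(repr("hello".center(9)))      # prints '  hello  '
print(repr("hello".ljust(8)))       # prints 'hello   '
print(repr("hello".rjust(8)))       # prints '   hello'
print("42".zfill(5))                # prints 00042
```

Every call returns a new string; raw itself is never modified.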

### Testing Boolean Conditions
- **s.startswith(pattern)**: Return True if pattern is a prefix of string s
- **s.endswith(pattern)**: Return True if pattern is a suffix of string s
- **s.isspace( )**: Return True if all characters of nonempty string are whitespace
- **s.isalpha( )**: Return True if all characters of nonempty string are alphabetic
- **s.islower( )**: Return True if there are one or more alphabetic characters, all of which are lowercased
- **s.isupper( )**: Return True if there are one or more alphabetic characters, all of which are uppercased
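- **s.isdigit( )**: Return True if all characters of nonempty string are digits; this covers the decimal characters accepted by isdecimal plus digits that need special handling, such as superscript digits
- **s.isdecimal( )**: Return True if all characters of nonempty string are decimal characters, i.e., characters that can be used to form base-10 numbers (0–9 and their Unicode equivalents)
- **s.isnumeric( )**: Return True if all characters of nonempty string are numeric Unicode characters (e.g., 0–9, equivalents, superscript digits, fraction characters)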
- **s.isalnum( )**: Return True if all characters of nonempty string are either alphabetic or numeric (as per above definitions)
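
The three numeric tests differ only on non-ASCII characters. The sketch below compares them on three sample characters chosen for illustration, followed by a few of the other predicates.

```python
five = "5"
superscript_two = "\u00b2"  # SUPERSCRIPT TWO
one_half = "\u00bd"         # VULGAR FRACTION ONE HALF

for ch in (five, superscript_two, one_half):
    print(ch.isdecimal(), ch.isdigit(), ch.isnumeric())
# prints:
# True True True
# False True True
# False False True

print("hello 123".islower())  # prints True (digits and spaces are ignored)
print("Hello".islower())      # prints False
print("abc123".isalnum())     # prints True
print("".isalpha())           # prints False (empty string)
print("report.pdf".endswith(".pdf"))  # prints True
```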

### Splitting and Joining Strings

- **sep.join(strings)**: Return the composition of the given sequence of strings, inserting sep as delimiter between each pair
```python
    print(' and '.join(['red', 'blue', 'yellow'])) # prints 'red and blue and yellow'
    print(''.join(['red','blue','yellow'])) # prints 'redblueyellow'
```
- **s.splitlines( )**: Return a list of substrings of s, as delimited by newlines
- **s.split(sep, count)**: Return a list of substrings of s, as delimited by the first count occurrences of sep (in Python's signature this parameter is named maxsplit). If count is not specified, split on all occurrences. If sep is not specified, use whitespace as delimiter.
- **s.rsplit(sep, count)**: Similar to split, but using the rightmost occurrences of sep
- **s.partition(sep)**: Return (head, sep, tail) such that s = head + sep + tail, using leftmost occurrence of sep, if any; else return (s, '', '')
- **s.rpartition(sep)**: Return (head, sep, tail) such that s = head + sep + tail, using rightmost occurrence of sep, if any; else return ('', '', s)
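
A short sketch of how the splitting methods differ on the same input. The sample strings are made up for this example.

```python
line = "a,b,c,d"

print(line.split(","))            # prints ['a', 'b', 'c', 'd']
print(line.split(",", 1))         # prints ['a', 'b,c,d']
print(line.rsplit(",", 1))        # prints ['a,b,c', 'd']
print("  two   words ".split())   # prints ['two', 'words']
print("one\ntwo\n".splitlines())  # prints ['one', 'two']

pair = "key=value=x"
print(pair.partition("="))        # prints ('key', '=', 'value=x')
print(pair.rpartition("="))       # prints ('key=value', '=', 'x')

# When the separator is missing, the whole string lands in one slot.
print("novalue".partition("="))   # prints ('novalue', '', '')
print("novalue".rpartition("="))  # prints ('', '', 'novalue')
```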

# String Formatting

## Format templates

The format method of the str class composes a string that includes one or more formatted arguments. The method is invoked with a syntax ```s.format(arg0, arg1, ...)```,
where s serves as a formatting string that expresses the desired result with one
or more placeholders in which the arguments will be substituted.

```python
template = "Hi {name}!, nice to see you again. Today is {day}, how is your {day} going, {name}?"

print(template.format(name='Manuel', day='Tuesday'))
# prints "Hi Manuel!, nice to see you again. Today is Tuesday, how is your Tuesday going, Manuel?"
```
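
Placeholders do not have to be named. As a brief sketch, they can also be filled by position, either implicitly (`{}`) or with an explicit index (`{0}`, `{1}`); the sample values are arbitrary.

```python
implicit = "{} + {} = {}".format(1, 2, 1 + 2)
indexed = "{1}, {0}!".format("world", "Hello")
repeated = "{0}{1}{0}".format("ab", "-")

print(implicit)  # prints 1 + 2 = 3
print(indexed)   # prints Hello, world!
print(repeated)  # prints ab-ab
```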


## Formatted string literals (f-strings)

```python
x = 12
y = 4

print(f"x+y = {x+y}, x^y = {x**y}") # prints "x+y = 16, x^y = 20736"
```
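
An f-string is a string literal prefixed with `f` whose replacement fields contain Python expressions evaluated at run time. The sketch below shows a few kinds of expressions that can appear in a field; the variable names and values are illustrative.

```python
name = "Ada"
items = ["pen", "book"]
price = 2.5
qty = 3

greeting = f"Hello, {name.upper()}! You have {len(items)} items."
quoted = f"{name!r}"                 # !r applies repr() to the value
total = f"Total: {price * qty:.2f}"  # a format spec follows the colon

print(greeting)  # prints Hello, ADA! You have 2 items.
print(quoted)    # prints 'Ada'
print(total)     # prints Total: 7.50
```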


## Format Specification Mini-Language

“Format specifications” are used within replacement fields contained within a format string to define how individual values are presented. They can also be passed directly to the built-in format() function. Each formattable type may define how the format specification is to be interpreted.

Most built-in types implement the following options for format specifications, although some of the formatting options are only supported by the numeric types.

A general convention is that an empty format specification produces the same result as if you had called str() on the value. A non-empty format specification typically modifies the result.
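
As a quick check of the points above, the same specification string can be used in an f-string, in str.format(), and with the built-in format() function. This is a minimal sketch with arbitrary sample values.

```python
value = 3.14159

via_builtin = format(value, ".2f")
via_method = "{:.2f}".format(value)
via_fstring = f"{value:.2f}"

print(via_builtin, via_method, via_fstring)  # prints 3.14 3.14 3.14

# An empty format specification gives the same result as str().
print(format(42, "") == str(42))        # prints True
print(format(value, "") == str(value))  # prints True
```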


```
format_spec     ::=  [[fill]align][sign][z][#][0][width][grouping_option][.precision][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
```

### Alignment

'<' Forces the field to be left-aligned within the available space (this is the default for most objects).
```python
    s = "Python"
    print(f"This is {s:~<12}") # prints "This is Python~~~~~~"
```
'>' Forces the field to be right-aligned within the available space (this is the default for numbers).
```python
    s = "Python"
    print(f"This is {s:~>12}") # prints "This is ~~~~~~Python"
```
'=' Forces the padding to be placed after the sign (if any) but before the digits. This is used for printing fields in the form ‘+000000120’. This alignment option is only valid for numeric types. It becomes the default for numbers when ‘0’ immediately precedes the field width.

```python
    print(f"{-58.123:0=10}") # prints "-00058.123"
```

'^' Forces the field to be centered within the available space.
```python
    s = "Python"
    print(f"This is {s: ^12}") # prints "This is    Python   "
    print(f"This is {s:0^12}") # prints "This is 000Python000"
```

### Sign

The sign option is only valid for number types, and can be one of the following:

- '+' indicates that a sign should be used for both positive as well as negative numbers.
```python
    print(f"{3.1351:+}") # prints "+3.1351"
    print(f"{-3.1351:+}") # prints "-3.1351"
```
- '-' indicates that a sign should be used only for negative numbers (this is the default behavior).
```python
    print(f"{3.1351:-}") # prints "3.1351"
    print(f"{-3.1351:-}") # prints "-3.1351"
```
- space indicates that a leading space should be used on positive numbers, and a minus sign on negative numbers.
```python
    print(f"{3.1351: }") # prints " 3.1351"
    print(f"{-3.1351: }") # prints "-3.1351"
```

### The rest

width is a decimal integer defining the minimum total field width, including any prefixes, separators, and other formatting characters. If not specified, then the field width will be determined by the content.

When no explicit alignment is given, preceding the width field by a zero ('0') character enables sign-aware zero-padding for numeric types. This is equivalent to a fill character of '0' with an alignment type of '='.

Changed in version 3.10: Preceding the width field by '0' no longer affects the default alignment for strings.

The precision is a decimal integer indicating how many digits should be displayed after the decimal point for presentation types 'f' and 'F', or before and after the decimal point for presentation types 'g' or 'G'. For string presentation types the field indicates the maximum field size - in other words, how many characters will be used from the field content. The precision is not allowed for integer presentation types.

Finally, the type determines how the data should be presented. The available integer presentation types are:


- 'b': Binary format. Outputs the number in base 2.
- 'c': Character. Converts the integer to the corresponding unicode character before printing.
- 'd': Decimal Integer. Outputs the number in base 10.
- 'o': Octal format. Outputs the number in base 8.
- 'x': Hex format. Outputs the number in base 16, using lower-case letters for the digits above 9.
- 'X': Hex format. Outputs the number in base 16, using upper-case letters for the digits above 9. In case '#' is specified, the prefix '0x' will be upper-cased to '0X' as well.
- 'n': Number. This is the same as 'd', except that it uses the current locale setting to insert the appropriate number separator characters.
- None: The same as 'd'.


```python
    import math

    # Default behaviour
    print(math.pi) # prints "3.141592653589793"

    # Some examples
    print(f"{math.pi:.5}")      # prints "3.1416"
    print(f"{math.pi:0<10.5}")  # prints "3.14160000"
    print(f"{math.pi: >+10.5}") # prints "   +3.1416"


    # More examples
    print(f"{13}")   # prints "13"
    print(f"{13:b}") # prints "1101"
    print(f"{13:X}") # prints "D"
```
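
The remaining options of the mini-language (zero-padding, grouping, the '#' prefix, precision, and a few presentation types) can be sketched in the same way. The values are arbitrary samples.

```python
padded = f"{42:05}"                  # sign-aware zero-padding
padded_negative = f"{-42:05}"
grouped_comma = f"{1234567:,}"       # ',' as thousands separator
grouped_underscore = f"{1234567:_}"  # '_' as thousands separator
prefixed_hex = f"{255:#x}"           # '#' adds the base prefix
prefixed_upper = f"{255:#X}"
truncated = f"{'Python':.3}"         # precision limits string length
fixed = f"{1234567.891:,.2f}"        # grouping combined with precision
percent = f"{0.256:.1%}"
char = f"{65:c}"

print(padded, padded_negative)            # prints 00042 -0042
print(grouped_comma, grouped_underscore)  # prints 1,234,567 1_234_567
print(prefixed_hex, prefixed_upper)       # prints 0xff 0XFF
print(truncated)                          # prints Pyt
print(fixed)                              # prints 1,234,567.89
print(percent)                            # prints 25.6%
print(char)                               # prints A
```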