String Concatenation

In [2]:
# Concatenation using +
s1 = "Hello"
s2 = "World"
combined = s1 + " " + s2
print(combined)  # Output: Hello World

# Concatenation using join
words = ["Hello", "World"]
combined = " ".join(words)
print(combined)  # Output: Hello World


Hello World
Hello World
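
When there are many pieces to combine, the two approaches behave differently: strings are immutable, so every `+` builds a new string, while `join` builds the result in one pass. The sketch below is only an illustration; the names `parts`, `joined` and `plus_built` are made up for this example.

```python
# Build one string from several pieces.
parts = [str(n) for n in range(5)]   # join needs str items, hence str()

# One pass with join.
joined = ",".join(parts)
print(joined)

# The same result with repeated +; each += creates a new string object.
plus_built = ""
for i, piece in enumerate(parts):
    plus_built += piece if i == 0 else "," + piece
print(plus_built)
```

Both lines print `0,1,2,3,4`; `join` is usually the more idiomatic choice for lists of strings.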


String Internal Representation

In [3]:
# Internally, Python strings are represented as arrays of Unicode code points (characters). 
# This allows Python to handle multi-language text and special characters.

s = "Hello, 世界"
for char in s:
    print(char)  # Outputs each character including the Chinese characters


H
e
l
l
o
,
 
世
界
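
A short sketch of what "code points" means in practice: `len()` counts code points, not bytes, and `ord()` exposes the numeric value of each one. The byte count depends on the encoding chosen (UTF-8 is assumed here).

```python
s = "Hello, 世界"

code_points = len(s)                  # counts characters (code points)
utf8_bytes = len(s.encode("utf-8"))   # counts bytes after encoding to UTF-8
print(code_points, utf8_bytes)

# ord() gives the code point of a single character; chr() is the inverse.
points = [hex(ord(ch)) for ch in "世界"]
print(points)
print(chr(0x4E16))
```

The string has 9 code points but needs 13 bytes in UTF-8, because each of the two Chinese characters takes 3 bytes.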


String Methods

In [4]:
# Original string
text = "Hello, this is a Python string example."

# Convert the string to uppercase
upper_text = text.upper()
print("Uppercase:", upper_text)
# Output: HELLO, THIS IS A PYTHON STRING EXAMPLE.

# Convert the string to lowercase
lower_text = text.lower()
print("Lowercase:", lower_text)
# Output: hello, this is a python string example.

# Replace part of the string
replaced_text = text.replace("Python", "sample")
print("Replaced:", replaced_text)
# Output: Hello, this is a sample string example.

# Split the string into a list of words
split_text = text.split()
print("Split:", split_text)
# Output: ['Hello,', 'this', 'is', 'a', 'Python', 'string', 'example.']

# Find the position of a substring
position = text.find("Python")
print("Position of 'Python':", position)
# Output: 17 (the starting index of the substring "Python")

# Check if the string starts with "Hello"
starts_with_hello = text.startswith("Hello")
print("Starts with 'Hello':", starts_with_hello)
# Output: True

# Check if the string ends with "example."
ends_with_example = text.endswith("example.")
print("Ends with 'example.':", ends_with_example)
# Output: True


Uppercase: HELLO, THIS IS A PYTHON STRING EXAMPLE.
Lowercase: hello, this is a python string example.
Replaced: Hello, this is a sample string example.
Split: ['Hello,', 'this', 'is', 'a', 'Python', 'string', 'example.']
Position of 'Python': 17
Starts with 'Hello': True
Ends with 'example.': True


### f-strings - formatted strings
1. They are expressions, not constants.
2. They are therefore evaluated at runtime, using the values in scope at that point.
3. Older ways had limitations and were cumbersome.
4. This is the most recent and preferred way.
5. More here: https://peps.python.org/pep-0498/
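
A minimal sketch of points 1 and 2, assuming nothing beyond plain Python: the f-string is evaluated at the moment the line runs, so later changes to a variable do not affect a string that was already built. The variable names are illustrative.

```python
name = "Guido"
greeting = f"Hello {name}"      # evaluated right here, while name is "Guido"
name = "Barry"                  # rebinding name later does not change greeting
print(greeting)

# Any expression is allowed, including method calls and arithmetic.
shout = f"{name.upper()} has {2 * 21} items"
print(shout)
```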


#### %-formatting
1. This is the oldest way, modeled on C's `printf`.

In [9]:
name = 'Guido Rossum'
print('Good Morning %s' %name )

# A tuple as the value to format
name = ('Guido', 'Rossum')
# "Good Morning %s" % name  # fails: the tuple is treated as the list of arguments
print('Good Morning %s' % (name,))  # so it has to be wrapped in a one-element tuple

# with f-string
print(f'Good Morning {name}') # much cleaner.

Good Morning Guido Rossum
Good Morning ('Guido', 'Rossum')
Good Morning ('Guido', 'Rossum')


#### str.format()
1. This construct was added to remove the limitations and issues of %-formatting.
2. Allowed the use of multiple parameters
3. Extensible through the `__format__()` method on objects.
4. f-strings use much of the functionality of this method while being less verbose.


In [26]:
class A:
    def __format__(self, format_spec): # receives the format specifier, e.g. '^20'
        return format('Hello I am A', format_spec) # delegate to the built-in format() function
a = A()
'ok {value:^20}'.format(value=a) # This is verbose compared to an f-string

'ok     Hello I am A    '
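
For point 2 above, a small sketch of how `str.format()` handles multiple parameters: by position, by index (which allows reuse), and by keyword. The values are made up for illustration.

```python
# Positional, in order.
in_order = "{} and {}".format("spam", "eggs")

# By index: arguments can be reordered and reused.
by_index = "{1} before {0}, then {1} again".format("a", "b")

# By keyword, with a format specifier after the colon.
by_name = "{name:>8}|{price:.2f}".format(name="tea", price=3.5)

print(in_order)
print(by_index)
print(by_name)
```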

In [1]:
name = "Guido"
age = 24
greeting = f"My name is {name} and I am {age} years old."
print(greeting)  # Output: My name is Guido and I am 24 years old.


My name is Guido and I am 24 years old.


# How f-strings are translated to code

In [61]:
expr1 = 2.342345+57.989789 #Any expression
spec1 = '.2f'

print(f'str1 {expr1:{spec1}}') # The spec can be a nested field holding any expression. Keep the two closing braces adjacent: a space between them would become part of the spec.

format(expr1, '.2f') # The above f-string is equivalent to this.

str1 60.33


'60.33'
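
To make the translation concrete, the sketch below shows three spellings that should produce the same text for a single replacement field: the f-string, the built-in `format()`, and the `__format__` method looked up on the type. The variable names are illustrative.

```python
value = 60.332134
spec = ".2f"

via_fstring = f"{value:{spec}}"                    # replacement field in an f-string
via_format = format(value, spec)                   # built-in format()
via_dunder = type(value).__format__(value, spec)   # what format() calls underneath

print(via_fstring, via_format, via_dunder)
```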

### f-string format
`f '` `<text>` `{` `<expression>` `<optional !s, !r, or !a>` `<optional : format specifier>` `}`  `<text>` ...`'`

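1. The expression part is then formatted using the `__format__` protocol.
2. Outside of strings, parentheses, brackets, or braces, expressions cannot contain `:` or `!`. The exception is `!=`, which is allowed as a special case.
3. Before Python 3.12, a backslash could not appear anywhere in the expression portion. From Python 3.12 on, it can appear inside a string literal within the expression, as in the cell below. It can always be used in the string portion of the f-string.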

In [40]:
print(f'{"\" This is allowed \" "}') # Works on Python 3.12+: the backslash is inside a string literal within the expression.
# f'{\"quoted string\"}' # This won't work, as the backslash is outside any string literal in the expression.

" This is allowed " 


4. The right way to have a literal brace in the output is to double it:

In [4]:
print(f'{{ --{3-1}--}}') # The opening {{ produces a literal {, and the closing }} produces a literal }
print(f'hey {{"name"}}, how are you?')

{ --2--}
hey {"name"}, how are you?


5. Raw strings do not process escape sequences. The `r` prefix can be combined with `f`.

In [45]:
print(r'This is a \n raw string')
print('This not a \n raw string')

print(fr'This is a raw {3+2} \n expression')
print(f"This is not a raw {3+2} \n expression")

This is a \n raw string
This not a 
 raw string
This is a raw 5 \n expression
This is not a raw 5 
 expression


6. It is best to use `'''` as the delimiter when the f-string needs to contain quote characters.

In [46]:
f'''This is a {3*1} 'quote' string'''

"This is a 3 'quote' string"

7. Evaluation order of expressions is left to right

In [63]:
val = [10]
def f(lst):
    lst[0] += 1
    return lst


f'{f(val)}  {f(val)}'

'[11]  [12]'

### Using Lambda in expression

In [66]:
f'{(lambda x: x**2)(4)}' # The lambda squares any value passed to it, and it is invoked with 4. The parentheses are needed because of the ':'.

'16'

Replacement Field {} Specification

In [6]:
value = 1234.56789
formatted_value = f"Formatted number: {value:.2f}"  # Limit to 2 decimal places
print(formatted_value)  # Output: Formatted number: 1234.57


Formatted number: 1234.57


### Optional Conversion Operators:
1. We can use `!` to make a quick conversion.
2. The conversion is applied first; the format specifier after `:` (if any) is then applied to the converted string.
3. `!s` calls `str(object)`, the user-friendly representation of the object.
4. The above is equivalent to: `str(object)` -> `type(object).__str__(object)`.
5. If the type does not define `__str__()`, the fallback is `repr(object)`.
6. `!r` calls `repr()`, which ideally returns a string that `eval(string)` can turn back into the object.
7. `!a` calls `ascii()`, which is the same as `repr()` with all non-ASCII characters escaped.

- They are entirely optional, as we can write `str(object)`, `repr(object)` or `ascii(object)` in the expression instead.


In [7]:
# Defining some variables
text = "Hello\nWorld"       # Contains a newline character
unicode_text = "Héllo"      # Contains a non-ASCII character

# Using !r to show the raw (debug) representation of the string
print(f"repr of text: {text!r}")  # Output: 'Hello\nWorld' (shows the newline escape sequence)

# Using !s to show the regular string representation (default behavior)
print(f"str of text: {text!s}")   # Output: Hello
                                  #         World  (newline is printed)

# Using !a to escape non-ASCII characters
print(f"ascii of unicode_text: {unicode_text!a}")  # Output: 'H\xe9llo' (non-ASCII character é is escaped)


repr of text: 'Hello\nWorld'
str of text: Hello
World
ascii of unicode_text: 'H\xe9llo'
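
The difference between `!s` and `!r` is easier to see on a class that defines both methods. The sketch below uses a hypothetical `Point` class, invented only for this illustration. It also shows that the conversion happens before the format specifier is applied.

```python
class Point:
    """Hypothetical class used only to illustrate !s and !r."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

    def __str__(self):
        return f"({self.x}, {self.y})"


p = Point(1, 2)
as_default = f"{p}"       # no conversion: falls back to str(p)
as_repr = f"{p!r}"        # repr(p)
padded = f"{p!r:>20}"     # repr(p) first, then '>20' is applied to that string
print(as_default, as_repr, padded, sep="\n")

# Without a conversion, a non-empty spec goes to Point.__format__, which
# is inherited from object and rejects it.
try:
    f"{p:>20}"
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```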


### Format Specification
format_spec ::= [[fill]align][sign]["z"]["#"]["0"][width][grouping_option]
                ["." precision][type]

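fill        ::= any character used for padding; a literal { or } needs a nested field\
align       ::= '<' left, '>' right, '^' center, '=' padding between sign and digits\
sign        ::= '+', '-' or space; controls the sign shown for numbers\
'z'         ::= coerces negative zero to positive zero after rounding to the precision (floats, Python 3.11+)\
'#'         ::= selects the alternate form for the conversion\
'0'         ::= sign-aware zero padding; same as fill char 0 and alignment =\
grouping    ::= '_' or ',' as the thousands separator\
.precision  ::= digits after the decimal point for 'f' and 'e'; significant digits for 'g'\
type        ::= presentation type, such as 'd', 'x', 'f', 'e', 'g' or '%'\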

#### Fill, align, Sign, Width

##### Fill
1. Fill specifies a single character used to pad the value on the left, on the right, or on both sides.
2. The resulting string is at least `width` characters long.
3. Curly braces cannot be written directly as the fill character.
4. Nested fields can be used to overcome this limitation.
5. If a fill character is provided, then an alignment must be provided too.


##### Align
1. `<` for left; `>` for right; `^` for center; `=` puts the fill characters between the sign and the digits (numeric types only)

#### Width
1. The minimum width of the resulting field.
2. Fill and align have a visible effect only when the width is specified and is greater than the width of the formatted value.

#### Sign
1. `+` indicates that a sign should be shown for both positive and negative numbers.
2. `-` indicates that a sign should be shown only for negative numbers (the default).
3. `space` indicates that a leading space should be shown for positive numbers and a minus sign for negative numbers.

In [21]:
# Width 10; fill with *; align center. When the padding is uneven, the extra character goes on the right.
print(f'{-67.9:*^10}') # the simplest example
print(f"{-67.9:^10}")  # Default fill char is 'space'.
print(f"{67.9:#^ 10}")  # fill #, centered, with the space sign option. Note the leading space for the positive float.

print(f'Simple string {"Guido":->30}')  # fill -; align right; width 30
print(f"Number with sign {-89.78:_=-30}")  # fill _ between the sign and the digits; the - option shows a sign only for negative numbers
print(f"Bools are supported {False:~^+30}")  # bool is a subclass of int: False is 0 and True is 1

**-67.9***
  -67.9   
## 67.9###
Simple string -------------------------Guido
Number with sign -________________________89.78
Bools are supported ~~~~~~~~~~~~~~+0~~~~~~~~~~~~~~


### Nested Fields
1. Nested fields allow for providing dynamic format specifications.
2. The format specification can be computed at runtime.
3. Let's say that the fill char comes from a config file, so it is not known when the code is written.

In [22]:
fillChar = '{' # as read from config file
print(f'This number is left padded with dynamic fill char: {100:{fillChar}>30}')

fillSpec = '->30' # fill, alignment and width all from config file
print(f'All from config {100:{fillSpec}}')

This number is left padded with dynamic fill char: {{{{{{{{{{{{{{{{{{{{{{{{{{{100
All from config ---------------------------100


### Grouping Options
1. The grouping option specifies the thousands separator.
2. In the example below, the format specifier contains only the grouping option: there is no 'fill-align-width' sequence.
3. There is no sign option either.
4. '#', 'z' and '0' are also absent.
5. ',' and '_' are the only allowed separators, so Python recognizes the lone '_' as the grouping option and uses it.

In [26]:
value = 1000000
print(f"{value:_}")  # '_' as the thousands separator; ',' can be used the same way


1_000_000
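
The grouping option can also be combined with the other parts of the format specification. The sketch below uses made-up values. It relies on the documented behaviour that `_` is inserted every 4 digits for the `b`, `o`, `x` and `X` types, while `,` is only valid for decimal output.

```python
value = 1234567.891

with_commas = f"{value:,.2f}"        # grouping plus precision
with_underscores = f"{value:_.2f}"
aligned = f"{1000000:>12,}"          # align and width come before the grouping option
hex_grouped = f"{0xDEADBEEF:_x}"     # '_' groups hex digits in fours

print(with_commas)
print(with_underscores)
print(aligned)
print(hex_grouped)
```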


### The 0 (Zero) Character
1. When no explicit alignment is specified, preceding the width with a ‘0’ (zero) character enables sign-aware zero padding.
2. This is the same as fill ‘0’ and alignment ‘=’.
3. This is a very handy shorthand, as can be seen below.

In [7]:
print(f"With zero: {42:05}")  # width 5, padded with 0 between the sign (if any) and the digits
# {42:05} specifies:
#  - 0 enables sign-aware zero padding.
#  - 5 is the minimum total width, which means the output will be at least 5 characters wide.


print(f"Without zero: {42:0=-5}")
# {42:0=-5} is the less common, explicit form of the same thing.
# 0 as the first character is the fill character.
# = specifies that the padding should come between the sign (if any) and the digits.
# - is the sign option (a sign only for negative numbers, the default), and 5 is the width.


print(f'With zero: {-45.01:010}')
# {-45.01:010} specifies:
# 0 for sign-aware zero padding.
# 10 is the minimum total width, including the sign and the decimal point.


print(f"With zero: {-45.01:0=-10}") 
# 0 is the fill character.
# = to indicate where the padding should be applied, which is between the sign and the digits.
# - is the sign option, and 10 is the total width.

With zero: 00042
Without zero: 00042
With zero: -000045.01
With zero: -000045.01


### Presentation Types & the Character #
1. The `#` option makes Python use the “alternate form” for the conversion.
2. The alternate form is defined differently for different types. 
3. This option is only valid for integer, float and complex types. 
4. Integers: when binary [b], octal [o], or hexadecimal [x, X] output is used, this option adds the respective prefix '0b', '0o', '0x', or '0X' to the output value. Decimal [d] output has no prefix, so it is unchanged.

For float and complex the alternate form causes the result of the conversion to always contain a decimal-point character, even if no digits follow it. Normally, a decimal-point character appears in the result of these conversions only if a digit follows it. In addition, for 'g' and 'G' conversions, trailing zeros are not removed from the result.

In [40]:
# The input literal can be in any base; the presentation type picks the output base
print(f"{255:#o}")   # Input is decimal 255. Output is in octal
print(f"{0xAB:#d}")  # Input is hex AB. Output is in decimal (no prefix)
print(f'{0b1101:#X}') # Input is binary 1101. Output is in uppercase hex


0o377
171
0XD
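
A side-by-side sketch of the same values with and without `#`, to show what the alternate form adds for integers and for the `g` float type. The values are arbitrary.

```python
n = 255

plain_bin = f"{n:b}"         # no prefix
prefixed_bin = f"{n:#b}"     # 0b prefix
lower_hex = f"{n:#x}"        # 0x prefix, lowercase digits
upper_hex = f"{n:#X}"        # 0X prefix, uppercase digits
padded_hex = f"{n:#010x}"    # width 10 includes the prefix; zeros go after it
decimal = f"{n:#d}"          # decimal has no prefix

print(plain_bin, prefixed_bin, lower_hex, upper_hex, padded_hex, decimal)

# For floats with 'g', '#' keeps the trailing zeros and the decimal point.
short_g = f"{3.0:g}"
alt_g = f"{3.0:#g}"
print(short_g, alt_g)
```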


### Presentation with floats
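1. For floats, `#` keeps the decimal point, and the presentation type selects fixed-point (`f`), scientific (`e`), general (`g`) or percent (`%`) output.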

In [86]:
print(f'{121:#e}') # represent 121 in scientific notation (default precision of 6)
print(f'{121.23:#f}') # uses default precision of 6
print(f"{121.23:#.1f}")  # uses 1 digit after the decimal point
print(f"{121.23:#.0f}")  # with #, the decimal point is retained even with no digits after it
print(f"{121.23:.0f}")  # Without #, the decimal point is dropped.


# The g format specifier is for "general" format, which chooses either fixed-point or scientific notation based on the value.
# With precision p and exponent exp, fixed-point is used when -4 <= exp < p.
print(f"{1232.2:#.3g}")  # g with p = 3; exp = 3
# The condition -4 <= 3 < 3 is False. Therefore, it is formatted in scientific notation with p - 1 = 2 digits after the point.
print(f"{1232.2:#.2e}")



print(f"{1232.2:#.4g}")  # g with p = 4; exp = 3
# Since -4 <= 3 < 4 is True, the number is formatted in fixed-point notation.
print(f"{1232.2:#.3e}")  # for comparison: scientific notation with 3 digits after the point

print(f'{.34:#.2%}') # Multiplies by 100 and appends %, with 2 decimal places.

1.210000e+02
121.230000
121.2
121.
121
1.23e+03
1.23e+03
1232.
1.232e+03
34.00%
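
The rule that `g` applies can be checked with a small helper. This is only a sketch of the documented condition `-4 <= exp < p`, not how Python implements it internally. The function name `g_uses_fixed_point` is hypothetical, and it ignores special values such as infinity and NaN.

```python
def g_uses_fixed_point(value, precision=6):
    """Sketch: would the 'g' type pick fixed-point notation for this value?"""
    p = precision or 1   # a precision of 0 is treated as 1
    # Format with p significant digits in 'e' notation and read the exponent back.
    exp = int(f"{value:.{p - 1}e}".split("e")[1])
    return -4 <= exp < p


for value, p in [(1232.2, 3), (1232.2, 4), (0.0001, 6), (0.00001, 6)]:
    predicted = g_uses_fixed_point(value, p)
    actual = f"{value:.{p}g}"
    print(value, p, predicted, actual)
```

Whenever the helper returns `False`, the corresponding `g` output should contain an `e` exponent.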


### Template Strings
1. Provides a simpler mechanism for string substitution, using `$`-based placeholders.
2. The primary use case for template strings is internationalization (i18n).

In [93]:
from string import Template

# Simple example.
s = Template('$name is wishing $guest a great day!')
print(s.substitute(name = 'Guido', guest='Robert'))

# Values can also be provided by a dict
d = dict(name='Guido', guest='Dict')
print(s.substitute(d))



Guido is wishing Robert a great day!
Guido is wishing Dict a great day!
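
A few more `Template` features, sketched with a made-up template: `${name}` separates a placeholder from the text that follows it, `$$` produces a literal dollar sign, and `safe_substitute()` leaves unknown placeholders in place where `substitute()` raises `KeyError`.

```python
from string import Template

t = Template("${item}s cost $$${price}; ask $who")

# safe_substitute leaves the unknown placeholder ($who) untouched.
partial = t.safe_substitute(item="apple", price=3)
print(partial)

# substitute is strict and raises KeyError for a missing placeholder.
try:
    t.substitute(item="apple", price=3)
    missing = None
except KeyError as err:
    missing = err.args[0]
print(missing)
```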
