## builtins module: the str class

The name ```str``` is an abbreviation of string. The string class is an immutable, ordered collection of Unicode characters. Immutable means that once an instance has been instantiated, it cannot later be modified.

## Initialisation Signature

Typing the class name ```str``` followed by an open parenthesis in a new code cell displays the initialisation signature as a popup balloon. Note that Jupyter Notebook and JupyterLab may require the additional keypresses shift ```⇧``` and tab ```↹``` to invoke the popup balloon:

Alternatively, using ```?``` to query the ```str``` class displays the docstring in the IPython cell output:

In [1]:
str?

Init signature: str(self, /, *args, **kwargs)
Docstring:
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
Type:           type
Subclasses:     StrEnum, DeferredConfigString, FoldedCase, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ...

The purpose of the initialisation signature is to provide the data required to initialise a new instance. This was covered in the previous notebook which discussed the ```object``` class and introduced the concept of object orientated programming (OOP).

To recap, during instantiation the datamodel static method ```__new__```, the constructor, is used to create the new instance ```self```. The datamodel instance method ```__init__```, whose parameters form the initialisation signature, is then invoked to initialise this instance with instance data.
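The order of these two datamodel calls can be seen in a minimal sketch. The class ```Greeting``` below is hypothetical and exists only to record each datamodel method as it runs. For the immutable ```str``` class, CPython supplies the instance data to ```__new__``` directly, which can also be called explicitly:

```python
call_order = []


class Greeting:
    """Hypothetical class used only to record the order of datamodel calls."""

    def __new__(cls, value):
        call_order.append('__new__')
        return super().__new__(cls)  # object.__new__ creates the bare instance self

    def __init__(self, value):
        call_order.append('__init__')
        self.value = value  # initialise the new instance with instance data


instance = Greeting('hello')
print(call_order)  # ['__new__', '__init__']

# For the immutable str class, the value is fixed when __new__ creates the instance
text = str.__new__(str, 'hello')
print(text)  # hello
```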

For the string class, the initialisation signature shows alternative ways of supplying instance data for a string.

If the first way is examined:

* In Python parenthesis ```( )``` are used to call a function and supply any necessary input arguments.
* The comma ```,``` is used as a delimiter to separate any input arguments.
* In Python, ```self``` is used to denote *this instance*. In other words, a string can be constructed from an existing string instance; this is a special case, as a string is a fundamental datatype.
* Any input argument before a ```/``` must be provided positionally.
* ```*args``` indicates a variable number of additional positional input arguments. These are typically not used for the string class.
* ```**kwargs``` indicates a variable number of additional named input arguments. These are typically not used for the string class.

The parameter ```self```, an existing string instance, cannot be provided using a named input argument:
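A short sketch of what happens when ```self``` is supplied by name: it is not one of the keywords that ```str``` accepts, so a ```TypeError``` is raised (the exact wording of the message may vary between Python versions):

```python
raised = False
try:
    str(self='hello')  # self cannot be supplied as a named input argument
except TypeError as error:
    raised = True
    print(type(error).__name__, error)
```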

A string instance can be instantiated by supplying an existing string instance ```self``` to the string class:

In [2]:
str('hello')

'hello'

However, because the string is a fundamental datatype, it can also be instantiated using the following shorthand:

In [3]:
'hello'

'hello'

The characters in a string must be enclosed in quotation marks. These distinguish a string of characters from an instance name.

Notice the difference in the syntax colour highlighting between the string instance above and the instance name below. If the content of the raw cell is run in the code cell below, a ```NameError``` will display.

<img src='./images/img_001.png' alt='img_001' width='100'/>

The Variables should look like the following:

<img src='./images/img_002.png' alt='img_002' width='500'/>
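The same difference can be reproduced in code. Without quotation marks, ```hello``` is treated as a name to look up; assuming no instance called ```hello``` has been assigned, a ```NameError``` is raised:

```python
raised = False
try:
    hello  # no quotation marks: Python looks up an instance name called hello
except NameError as error:
    raised = True
    print(error)  # name 'hello' is not defined

print('hello')  # with quotation marks: a str instance
```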

If the following code is input:

In [4]:
'hello'

'hello'

Notice that the value ```'hello'``` is displayed in the cell output. The value is displayed because it has not been assigned to an instance name. This Python string instance has no reference (a reference is also known as a variable name or instance name) and therefore cannot be selected again later. Conceptualise a reference as a label which is used to select the string instance.

A string instance can be assigned to a variable name during instantiation.

In [5]:
greeting = 'hello'

Notice that the cell now has no output. Instead, the instance is stored under the instance name ```greeting``` and Variables is updated:

<img src='./images/img_003.png' alt='img_003' width='500'/>

The value of the string instance can be referenced via the variable name or instance name:

In [6]:
greeting

'hello'

In the above case, the Python interpreter recognised the instance name (variable name) and used it as a reference to retrieve the value, which is shown in the cell output above. If the instance is assigned to another instance name, for example:

In [7]:
greeting2 = greeting

Then in the Variable Explorer, two instance names are shown with the same value:

<img src='./images/img_004.png' alt='img_004' width='500'/>

This instance has two instance names, which act as aliases of one another. If an instance name is conceptualised as a label, then this instance has two labels. If either instance name is used, the same value is retrieved:

In [8]:
greeting

'hello'

In [9]:
greeting2

'hello'

The equality operator ```==``` checks whether the values retrieved from each instance name are equal to one another. Because both instance names refer to the same instance, the values are equal and the boolean ```True``` is returned:

In [10]:
greeting == greeting2

True

Each object in Python has a unique identity, which the builtin function ```id``` returns as an integer; no two objects that exist at the same time share an id. If the id of the object corresponding to each instance name is checked:

In [11]:
id(greeting)

2176289384496

In [12]:
id(greeting2)

2176289384496

Then the id is the same, because both these instance names are references or labels to the same physical instance in memory. Therefore the following is ```True```:

In [13]:
greeting is greeting2

True

Recall that this is shorthand for:

In [14]:
id(greeting) == id(greeting2)

True
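The difference between comparing values with ```==``` and comparing identities with ```is``` can be sketched as follows. The instance ```built``` is assembled at runtime from two pieces; whether it ends up being the same object as the literal ```'hello'``` is a CPython implementation detail (string interning), so only the equivalence of ```is``` and the ```id``` comparison is relied on:

```python
greeting = 'hello'
greeting2 = greeting            # a second label on the same instance
built = ''.join(['hel', 'lo'])  # an equal value, built at runtime

print(greeting == built)        # True: the values are equal
print(greeting is greeting2)    # True: both labels select the same instance

# is compares identities, so it always agrees with comparing id values
print((greeting is built) == (id(greeting) == id(built)))  # True
```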

The delete statement ```del``` can be used to delete an instance name. Conceptualise this as deleting the label, leaving the instance unchanged:

In [15]:
del greeting

Notice that the instance name ```greeting``` is deleted i.e. this label is removed. However the label ```greeting2``` is still present and the instance ```'hello'``` is unaltered.

<img src='./images/img_005.png' alt='img_005' width='500'/>

If ```del``` is used to also delete the instance name ```greeting2```:

In [16]:
del greeting2

<img src='./images/img_006.png' alt='img_006' width='500'/>

Then there are no instance names for the string instance ```'hello'```. Recall that instance names are used to reference an instance; an instance with no instance name cannot be referenced and is considered orphaned. Orphaned instances are automatically cleaned up by Python's garbage collection.

If a new instance is created:

In [17]:
greeting = 'Hello World'

Then the instance name displays in Variables:

<img src='./images/img_007.png' alt='img_007' width='500'/>

If a reassignment is carried out:

In [18]:
greeting = 'hi'

The instance name still displays in Variables, but the instance it is associated with has changed. Recall that the instance name can be conceptualised as a label; the label ```greeting``` has been peeled off the old instance ```'Hello World'``` and placed on the new instance ```'hi'```. The old instance now has no label, therefore has no reference and is orphaned. Because it is orphaned, it is cleaned up by Python's garbage collection.

<img src='./images/img_008.png' alt='img_008' width='500'/>

Reassignment **moves the instance name** from the old instance to the new instance and does not change the value of an instance. Strings are **immutable** which means that once they are instantiated they can never be modified.
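A minimal sketch of the distinction: attempting to modify a character of an existing string in place raises a ```TypeError```, while building a new string and reassigning the instance name to it works:

```python
greeting = 'hello'

try:
    greeting[0] = 'H'  # attempt to modify the existing instance in place
except TypeError as error:
    print(error)  # 'str' object does not support item assignment

greeting = 'H' + greeting[1:]  # build a new instance and move the label to it
print(greeting)  # Hello
```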

The initialisation signature of the string class shows instantiation using a named keyword input argument ```object``` which has a default value of an empty string:

This named keyword input argument can be assigned to a custom value:

In [19]:
str(object='hello')

'hello'

If not assigned, it takes on its default value which returns an empty string:

In [20]:
str()

''
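The docstring also lists a second form, ```str(bytes_or_buffer[, encoding[, errors]])```. A short sketch of both forms: other objects supplied to ```object``` are converted using their informal representation, while a ```bytes``` instance is only decoded when an ```encoding``` is given:

```python
print(str(object=42))                   # 42
print(str(3.5))                         # 3.5
print(str(b'hello'))                    # b'hello' (no encoding: repr of the bytes)
print(str(b'hello', encoding='utf-8'))  # hello (the bytes are decoded)
```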

The string can be instantiated by supplying the named keyword input argument in the initialisation signature and can be assigned to a variable name. Notice the subtlety in spacing in the line below, which follows Python's PEP8 styling convention:

* The assignment to the instance name ```greeting``` is subtly emphasised using spaces around the assignment operator.
* Within the parentheses of the function call, the assignment of a value to the named parameter ```object``` is not spaced. Spacing within a function call is instead typically used after the ```,``` separator to visually separate input arguments from one another.

Python code will work with or without the spacing, but the PEP8 compliant styling is preferred:

In [21]:
greeting = str(object='hello')

The following are not PEP8 compliant:

greeting=str(object='hello')

greeting = str(object = 'hello')

If using the VSCode IDE, ```Ctrl```, ```⇧``` and ```p``` can be used to open the command palette. The Format Document command can then be used to format the code in the notebook with PEP8 compliant spacing. This requires the autopep8 extension by Microsoft to be installed.

## String Quotations

In Python, either single or double quotation marks can be used to enclose a string. However, single quotation marks are preferred by default, as seen in the cell output:

In [166]:
"Hello World!"

'Hello World!'

In [167]:
'Hello World!'

'Hello World!'

Having both quotation styles available makes a string that contains a quoted string literal more readable:

In [168]:
"The string is 'Hello World!'"

"The string is 'Hello World!'"

When the string is printed, the enclosing quotation marks are not displayed:

In [169]:
print("The string is 'Hello World!'")

The string is 'Hello World!'
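The formal representation chooses its quotation marks based on the content. A short sketch of the CPython behaviour: single quotation marks are used unless the string contains a single quotation mark and no double quotation marks; when both are present, the single quotation mark is escaped with a backslash:

```python
print(repr("It's"))          # "It's"
print(repr('He said "hi"'))  # 'He said "hi"'
print(repr('It\'s "hi"'))    # 'It\'s "hi"'
```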


The Python community is somewhat...

## Identifiers

Two string instances can be instantiated:

In [22]:
greeting = 'hello'
farewell = 'bye'

<img src='./images/img_009.png' alt='img_009' width='500'/>

If one of the instance names is typed followed by a dot ```.``` then a list of identifiers that are available from the instance should display as a popup balloon. Some IDEs such as Jupyter Notebook and JupyterLab may require the keypress tab ```↹``` to invoke the popup balloon. Input the contents of the raw cell into the code cell below:

Notice that the same list of identifiers displays for the other instance:

These are the same identifiers and originate from the string class ```str```:

## Data Model Identifiers and the object Base Class

In the previous tutorial the ```object``` class was stated to be the base class of every other class. If the method resolution order of the ```str``` class is examined, it returns the following list:

In [23]:
str.mro()

[str, object]

If ```help``` is used on each class, details about the methods are shown:

In [24]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

In [25]:
help(object)

Help on class object in module builtins:

class object
 |  The base class of the class hierarchy.
 |  
 |  When called, it accepts no arguments and returns a new featureless
 |  instance that has no instance attributes and cannot be given any.
 |  
 |  Built-in subclasses:
 |      anext_awaitable
 |      async_generator
 |      async_generator_asend
 |      async_generator_athrow
 |      ... and 91 other subclasses
 |  
 |  Methods defined here:
 |  
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |  
 |  __dir__(self, /)
 |      Default dir() implementation.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Default object formatter.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getstate__(self, /)
 |      Helper for pickle.
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(sel

From the method resolution order, the ```str``` class can access ```object``` based identifiers such as ```__dir__``` (*dunder dir*), which is defined only in the ```object``` class. Therefore the ```builtins``` function ```dir``` can be used on a ```str``` instance:

In [26]:
greeting = 'hello'

In [27]:
dir(greeting)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

Sometimes there is an implementation of a method in both ```str``` and ```object``` such as the formal string representation ```__repr__``` (*dunder repr*):

In [28]:
generic = object()

In [29]:
repr(generic)

'<object object at 0x000001FAB6161590>'

In [30]:
greeting = 'hello'

In [31]:
repr(greeting)

"'hello'"

In other words, instead of using the default formal representation from the ```object``` class, which specifies the location of the ```object``` in memory, the definition of ```__repr__``` in the child class ```str``` is used, because ```str``` comes before ```object``` in the method resolution order:

In [32]:
str.mro()

[str, object]

Printing the formal string representation shows the literal syntax preferred for instantiating the string:

In [33]:
print(repr(greeting))

'hello'
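A minimal sketch of where each implementation lives: ```vars``` returns the namespace of a class, so it shows that ```str``` defines its own ```__repr__``` but inherits ```__dir__``` from ```object```. ```ast.literal_eval``` is used here as a safe way to check that the formal representation is valid literal syntax:

```python
import ast

greeting = 'hello'

# str defines its own __repr__, found before object.__repr__ in the MRO
print('__repr__' in vars(str))          # True
print(str.__repr__ is object.__repr__)  # False

# __dir__ is not defined in str, so it is inherited from object
print('__dir__' in vars(str))           # False
print(str.__dir__ is object.__dir__)    # True

formal = repr(greeting)                         # includes the quotation marks
print(formal)                                   # 'hello'
print(ast.literal_eval(formal) == greeting)     # True
```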


The string class also has datamodel identifiers that do not appear in the ```object``` class. These correspond to functions in the ```operator``` module:

In [34]:
import operator

The ```help``` function can be used on ```operator``` and shows the operator used for each associated datamodel method:

In [35]:
help(operator)

Help on module operator:

NAME
    operator - Operator interface.

MODULE REFERENCE
    https://docs.python.org/3.11/library/operator.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module exports a set of functions implemented in C corresponding
    to the intrinsic operators of Python.  For example, operator.add(x, y)
    is equivalent to the expression x+y.  The function names are those
    used for special methods; variants without leading and trailing
    '__' are also provided for convenience.

CLASSES
    builtins.object
        attrgetter
        itemgetter
        methodcaller
    
    class attrgetter(builtins.object)
     |  attrgetter(attr, ...) --> attrgetter object
     |  

The ```str``` class is a collection and uses a subset of these operators. The following helper function can be imported from the custom module ```helper_module```:

In [36]:
from helper_module import print_identifier_group

This can be used to separate the identifiers into groups, starting with the standard datamodel attributes:

In [37]:
print_identifier_group(str, kind='datamodel_attribute')

['__doc__']


The standard datamodel methods shared with the ```object``` class:

In [38]:
print_identifier_group(str, kind='datamodel_method', second=object, show_only_intersection_identifiers=True)

['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']


The subset of ```operator``` datamodel methods that are used for a collection:

In [39]:
print_identifier_group(str, kind='datamodel_method', second=operator, show_only_intersection_identifiers=True)

['__add__', '__contains__', '__eq__', '__ge__', '__getitem__', '__gt__', '__le__', '__lt__', '__mod__', '__mul__', '__ne__']
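Each of these datamodel methods is what the matching ```operator``` function, and the operator itself, calls on a ```str``` instance. A short sketch pairing each function with its operator syntax:

```python
import operator

greeting = 'hello'

print(operator.add(greeting, ' world'))    # hello world  (greeting + ' world')
print(operator.mul(greeting, 2))           # hellohello   (greeting * 2)
print(operator.contains(greeting, 'ell'))  # True         ('ell' in greeting)
print(operator.getitem(greeting, 0))       # h            (greeting[0])
print(operator.mod('%s!', greeting))       # hello!       ('%s!' % greeting)
print(operator.lt('apple', 'banana'))      # True         ('apple' < 'banana')
```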


No additional attributes:

In [40]:
print_identifier_group(str, kind='attribute')

[]


The following ```str``` methods:

In [41]:
print_identifier_group(str, kind='method')

['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


When ```help``` is used the identifiers are grouped into:

* (Instance) Methods
* Class Methods\*
* Static Methods
* Data Descriptors\*

Methods are functions and are called; to call a function and provide input arguments, parentheses are used. Data descriptors are other Python instances and are not called.

Most methods are instance methods and require an instance which has instance data to work on. Other input arguments may be required.

Class methods are instead bound to a class and usually used for the purpose of alternative constructors.

Static methods are regular functions that are bound to neither the instance nor the class, but are found within the namespace of the class for convenience.

\* Not present in the string class.
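A minimal sketch contrasting an instance method with the static method ```maketrans```, which needs no instance data. Inspecting the class namespace with ```vars``` shows the kind of callable each identifier is (the type names are those reported by CPython):

```python
# Instance method: looked up on the instance, so self is supplied automatically
print('hello'.upper())  # HELLO

# Static method: maketrans needs no instance data, so it is called on the class
table = str.maketrans('el', 'ip')  # map 'e' to 'i' and 'l' to 'p'
print('hello'.translate(table))    # hippo

# The class namespace records what kind of callable each identifier is
print(type(vars(str)['capitalize']).__name__)           # method_descriptor
print(isinstance(vars(str)['maketrans'], staticmethod))  # True
```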

### Instance Methods

The docstring of the ```capitalize``` method can be examined from an instance:

In [42]:
greeting.capitalize?

Signature: greeting.capitalize()
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower
case.
Type:      builtin_function_or_method

And from the class itself:

In [43]:
str.capitalize?

Signature: str.capitalize(self, /)
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower
case.
Type:      method_descriptor

N.B. capitali**z**e uses American English and not British English capitali**s**e. 

Notice the difference between:

When the method ```capitalize``` is called from an instance, it has access to the instance data. This method requires no additional data to operate, which is why the parentheses are empty.

In contrast, when the method is called from the class itself, it has no instance data to work on, so an instance must be provided. In Python, ```self``` means *this instance*. Because ```self``` appears before ```/```, it must be provided positionally.

A string is immutable, meaning that it cannot be modified after it is instantiated. Therefore string methods typically return a new instance, usually a new instance of the string class. The docstring uses the word *return* to describe the return value.

If the method is called from an instance:

In [44]:
greeting.capitalize()

'Hello'

The new capitalised string instance displays in the cell output. This is a new instance and does not modify the original which can be seen to be unchanged in Variables.

<img src='./images/img_009.png' alt='img_009' width='500'/>

Since this new instance is not assigned an instance name, it has no references and is automatically removed by Python's garbage collection. It can be assigned an instance name using:

In [45]:
capgreeting = greeting.capitalize()

Notice no cell output as the instance is assigned to the instance name. This can be seen in Variables:

<img src='./images/img_010.png' alt='img_010' width='500'/>

If the instance method is invoked from a class, the instance ```self``` must be provided positionally as the first input argument:

In [46]:
str.capitalize(farewell)

'Bye'

Failure to supply an instance will result in a ```TypeError```.
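A short sketch of both points: calling the method from the instance and from the class gives the same result, while calling it from the class with no instance raises a ```TypeError``` (the message wording varies between Python versions):

```python
farewell = 'bye'

# The instance supplies self implicitly; the class call supplies it positionally
print(farewell.capitalize() == str.capitalize(farewell))  # True

raised = False
try:
    str.capitalize()  # no instance supplied for self
except TypeError as error:
    raised = True
    print(error)
```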

## builtins Module

Recall that ```__builtins__``` (*dunder builtins*) is automatically imported, allowing access to ```builtins``` identifiers:

In [47]:
__builtins__

<module 'builtins' (built-in)>

Identifiers can be viewed from the module by inputting the module name followed by a dot ```.```.

And the str class can be accessed using:

In [48]:
__builtins__.str

str

Which is the same as:

In [49]:
str

str

And:

In [50]:
__builtins__.str is str

True

Recall that ```builtins``` has the following classes and functions which are used in preference to the datamodel identifiers:

In [51]:
print_identifier_group(__builtins__, kind='class')

['bool', 'bytearray', 'bytes', 'classmethod', 'complex', 'dict', 'enumerate', 'filter', 'float', 'frozenset', 'int', 'list', 'map', 'memoryview', 'object', 'property', 'range', 'reversed', 'set', 'slice', 'staticmethod', 'str', 'super', 'tuple', 'type', 'zip']


In [52]:
print_identifier_group(__builtins__, kind='method')

['abs', 'aiter', 'all', 'anext', 'any', 'ascii', 'bin', 'breakpoint', 'callable', 'chr', 'compile', 'copyright', 'credits', 'delattr', 'dir', 'display', 'divmod', 'eval', 'exec', 'execfile', 'format', 'get_ipython', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'locals', 'max', 'min', 'next', 'oct', 'open', 'ord', 'pow', 'print', 'repr', 'round', 'runfile', 'setattr', 'sorted', 'sum', 'vars']


And the datamodel methods map to collection based operators:

In [53]:
print_identifier_group(str, kind='datamodel_method', second=operator, show_only_intersection_identifiers=True)

['__add__', '__contains__', '__eq__', '__ge__', '__getitem__', '__gt__', '__le__', '__lt__', '__mod__', '__mul__', '__ne__']


## Case Methods

The ```str``` case method ```capitalize``` has already been examined:

In [54]:
greeting.capitalize?

Signature: greeting.capitalize()
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower
case.
Type:      builtin_function_or_method

In [55]:
greeting.capitalize()

'Hello'

There are associated identifiers such as:

* ```lower```
* ```casefold```
* ```upper```
* ```title```
* ```swapcase```

The docstrings of these can all be examined:

In [56]:
greeting.lower?

Signature: greeting.lower()
Docstring: Return a copy of the string converted to lowercase.
Type:      builtin_function_or_method

In [57]:
greeting.casefold?

Signature: greeting.casefold()
Docstring: Return a version of the string suitable for caseless comparisons.
Type:      builtin_function_or_method

In [58]:
greeting.upper?

Signature: greeting.upper()
Docstring: Return a copy of the string converted to uppercase.
Type:      builtin_function_or_method

In [59]:
greeting.title?

Signature: greeting.title()
Docstring:
Return a version of the string where each word is titlecased.

More specifically, words start with uppercased characters and all remaining
cased characters have lower case.
Type:      builtin_function_or_method

In [60]:
greeting.swapcase?

Signature: greeting.swapcase()
Docstring: Convert uppercase characters to lowercase and lowercase characters to uppercase.
Type:      builtin_function_or_method


All of these case identifiers only require instance data and return a new string:

In [62]:
'hEllo wOrld'.lower()

'hello world'

In [63]:
'hEllo wOrld'.casefold()

'hello world'

In [64]:
'hEllo wOrld'.upper()

'HELLO WORLD'

In [65]:
'hEllo wOrld'.swapcase()

'HeLLO WoRLD'

In [66]:
'hEllo wOrld'.title()

'Hello World'

```casefold``` is similar to ```lower``` but is more aggressive, as it is intended for caseless comparisons. This is seen with the German characters, where ```ẞ``` and ```ß``` are folded to ```ss```, and with the Greek characters, where the final sigma ```ς``` is folded to ```σ```:

In [67]:
'ÄäÜüÖöẞß'.lower()

'ääüüöößß'

In [68]:
'ÄäÜüÖöẞß'.casefold()

'ääüüöössss'

In [69]:
'ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΛλΜμΝνΞξΟοΠπΡρΣσςΤτΥυΦφΧχΨψΩω'.lower()

'ααββγγδδεεζζηηθθιικκλλμμννξξοοππρρσσςττυυφφχχψψωω'

In [70]:
'ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΛλΜμΝνΞξΟοΠπΡρΣσςΤτΥυΦφΧχΨψΩω'.casefold()

'ααββγγδδεεζζηηθθιικκλλμμννξξοοππρρσσσττυυφφχχψψωω'
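This is why ```casefold``` is preferred for caseless comparisons. A short sketch comparing two spellings of the German word for street:

```python
a = 'STRASSE'
b = 'straße'

print(a.lower() == b.lower())        # False: 'strasse' vs 'straße'
print(a.casefold() == b.casefold())  # True: both fold to 'strasse'
```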

## Boolean Identifiers

A number of identifiers are used to examine a specific property of a string and return a boolean of ```True``` if it has that property and ```False``` otherwise: 

In [71]:
greeting.isupper?

Signature: greeting.isupper()
Docstring:
Return True if the string is an uppercase string, False otherwise.

A string is uppercase if all cased characters in the string are uppercase and
there is at least one cased character in the string.
Type:      builtin_function_or_method

In [72]:
greeting.islower?

Signature: greeting.islower()
Docstring:
Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and
there is at least one cased character in the string.
Type:      builtin_function_or_method

In [73]:
greeting.istitle?

Signature: greeting.istitle()
Docstring:
Return True if the string is a title-cased string, False otherwise.

In a title-cased string, upper- and title-case characters may only
follow uncased characters and lowercase characters only cased ones.
Type:      builtin_function_or_method

For example:

In [74]:
'HELLO'.isupper()


True

In [75]:
'Hello'.isupper()


False

In [76]:
'hello'.islower()

True

In [77]:
'Hello'.islower()

False

In [78]:
'Hello'.istitle()

True
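The docstrings mention cased and uncased characters, which matters for edge cases. A short sketch: digits and spaces are uncased and are ignored, but at least one cased character is required:

```python
print('HELLO 123'.isupper())    # True: digits and spaces are uncased
print('123'.isupper())          # False: there is no cased character
print(''.islower())             # False: an empty string has no cased character
print('Hello World'.istitle())  # True
print('Hello world'.istitle())  # False: 'world' starts with a lowercase letter
```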

## Valid Identifier Names

The method ```isidentifier``` checks whether a string is a valid identifier name. This can be useful to check before using the string as an instance name:

In [79]:
greeting.isidentifier?

Signature: greeting.isidentifier()
Docstring:
Return True if the string is a valid Python identifier, False otherwise.

Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
such as "def" or "class".
Type:      builtin_function_or_method

This means that the following identifier name with lower case letters is acceptable:

In [80]:
'hello'.isidentifier()

True

A space is not acceptable, and attempting to use such a name as an identifier will give a ```SyntaxError```:

In [81]:
'hello world'.isidentifier()

False

An underscore is acceptable and identifier names generally use ```snake_case```:

In [82]:
'hello_world'.isidentifier()

True

Digits are acceptable:

In [83]:
'hello_world2'.isidentifier()

True

However, an identifier cannot begin with a digit, and attempting to use such a name as an identifier will give a ```SyntaxError```:

In [84]:
'2hello_world'.isidentifier()

False

Other special characters such as ```-``` are not acceptable, and attempting to use such a name as an identifier will give a ```SyntaxError```:

In [85]:
'hello-world2'.isidentifier()

False

Upper case is acceptable but generally ```PascalCase``` is reserved for a class name:

In [86]:
'PascalCase'.isidentifier()

True

All capitals is also acceptable but this is generally reserved for a constant:

In [87]:
'ALL_CAPS'.isidentifier()

True

An instance name shouldn't match any of the identifiers in ```__builtins__```, otherwise it will override the builtin (until the name is deleted or the kernel is restarted), which leads to confusion when the builtin is later used. Some identifiers are reserved as keywords; these can be seen by importing the ```keyword``` module:

In [88]:
import keyword

The keywords from the module can be listed by inputting:

In [89]:
for kw in keyword.kwlist:
    print(kw, end=' ')

False None True and as assert async await break class continue def del elif else except finally for from global if import in is lambda nonlocal not or pass raise return try while with yield 

In [90]:
for skw in keyword.softkwlist:
    print(skw, end=' ')

_ case match 
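Because ```isidentifier``` returns ```True``` for keywords such as ```class```, the two checks can be combined. The helper ```is_usable_name``` below is hypothetical, written only for illustration. Soft keywords such as ```match``` can still be used as names, so they pass; the helper does not check for shadowing of builtins such as ```str```:

```python
import keyword


def is_usable_name(candidate):
    """Hypothetical helper: True if candidate can be assigned to as an instance name."""
    return candidate.isidentifier() and not keyword.iskeyword(candidate)


for candidate in ['hello_world', '2hello_world', 'class', 'match']:
    print(candidate, is_usable_name(candidate))
# hello_world True
# 2hello_world False
# class False
# match True
```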

If a keyword is assigned to, a ```SyntaxError``` will display:

Particular care should be taken not to assign a ```str``` instance to the name ```str```, as this overrides the builtin class. This is a mistake often made by beginners. For example, the following is very bad practice:

In [91]:
str = 'hello'

In [92]:
str

'hello'

To rectify this, the import can be made from the ```builtins``` module:

In [93]:
from builtins import str

In [94]:
str()

''
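An alternative sketch, assuming the bad assignment was made at module level (as in a notebook): deleting the shadowing name with ```del``` removes the label, so the lookup of ```str``` falls back to the ```builtins``` module again:

```python
import builtins

str = 'hello'               # bad practice: the name str now shadows the builtin class
print(str is builtins.str)  # False

del str                     # remove the shadowing label
print(str is builtins.str)  # True: the lookup falls back to builtins
print(str(42))              # 42
```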

## The American Standard Code for Information Interchange (ASCII)

Early computers were based on a typewriter, which essentially prints English characters onto a sheet of paper. To achieve such a task, a number of non-printable commands are required, such as the carriage return (moving the carriage back to the left), the line feed (moving the piece of paper up by one line) and the form feed (advancing the paper to the next page), as well as printable characters such as the English letters, numbers and whitespace.

<img src='./images/img_011.png' alt='img_011' width='800'/>

Each command has to be mapped physically into the computer's memory. Fundamentally, the computer can only store data in the form of bits, where a bit is essentially a digital switch.

A single switch has the possible values ```0``` and ```1```, which is ```2 ** 1``` combinations, a total of ```2```. Note the combination ```0``` is included, so ```0:2``` is inclusive of the lower bound ```0``` and exclusive of the upper bound ```2```.

<img src='./images/img_012.png' alt='img_012' width='400'/>

More typically ```8``` of these switches are combined into a single logical unit called a byte. A byte has ```2 ** 8``` combinations, a total of ```256```. Note the combination ```0``` is included, so ```0:256``` is inclusive of the lower bound ```0``` and exclusive of the upper bound ```256```.

<img src='./images/img_013.png' alt='img_013' width='400'/>
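
The counting above can be checked in Python, where ```**``` is the power operator and ```range``` follows the same convention of including the lower bound and excluding the upper bound. A short sketch:

```python
# A single bit (switch) has 2 ** 1 combinations: 0 and 1
bit_values = list(range(2 ** 1))
print(bit_values)                        # [0, 1]

# A byte of 8 bits has 2 ** 8 combinations: 0 up to but excluding 256
byte_values = range(2 ** 8)
print(len(byte_values))                  # 256
print(byte_values[0], byte_values[-1])   # 0 255
```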

One of the most popular sets of commands was developed in America and is known as the American Standard Code for Information Interchange (ASCII). The first ```32``` combinations (```0``` to ```31```) correspond to non-printable control characters, such as the carriage return and form feed previously discussed, in addition to a number of other hardware related commands. These are followed by the space at combination ```32```.

Each bit can be ```0``` or ```1```, and the byte sequence corresponds to the physical position of the ```8``` switches. As binary is not very human readable, the hexadecimal system is also used, which has ```16``` characters: ```0```, ```1```, ```2```, ```3```, ```4```, ```5```, ```6```, ```7```, ```8```, ```9```, ```a```, ```b```, ```c```, ```d```, ```e```, ```f```. As ```2 ** 4``` is ```16``` combinations, each half of a byte (```4``` bits) is represented by its own hexadecimal character. These numbering systems are shown alongside the number in decimal:


|byte|hex|num|command|
|---|---|---|---|
|00000000|00|000|null|
|00000001|01|001|start of heading|
|00000010|02|002|start of text|
|00000011|03|003|end of text|
|00000100|04|004|end of transmission|
|00000101|05|005|enquiry|
|00000110|06|006|acknowledge|
|00000111|07|007|bell|
|00001000|08|008|**backspace**|
|00001001|09|009|**horizontal tab**|
|00001010|0a|010|**new line**|
|00001011|0b|011|**vertical tab**|
|00001100|0c|012|**form feed**|
|00001101|0d|013|**carriage return**|
|00001110|0e|014|shift out|
|00001111|0f|015|shift in|
|00010000|10|016|data link escape|
|00010001|11|017|device control 1|
|00010010|12|018|device control 2|
|00010011|13|019|device control 3|
|00010100|14|020|device control 4|
|00010101|15|021|negative acknowledge|
|00010110|16|022|synchronous idle|
|00010111|17|023|end of transmission block|
|00011000|18|024|cancel|
|00011001|19|025|end of medium|
|00011010|1a|026|substitute|
|00011011|1b|027|**escape**|
|00011100|1c|028|file separator|
|00011101|1d|029|group separator|
|00011110|1e|030|record separator|
|00011111|1f|031|unit separator|
|00100000|20|032|**space**|
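
The byte, hex and num columns of the table above can be generated with format specifications: ```08b``` gives 8 binary digits, ```02x``` gives 2 lowercase hexadecimal digits and ```03d``` gives 3 decimal digits, each padded with leading zeros. A minimal sketch, where the helper ```ascii_row``` is hypothetical:

```python
def ascii_row(code):
    """Hypothetical helper: return the byte, hex and num columns for a code as a table row."""
    return f'|{code:08b}|{code:02x}|{code:03d}|'

for code in (0, 9, 10, 13, 32):
    print(ascii_row(code))
```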

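The remaining combinations up to ```127```, which is half of the ```256``` combinations of a byte as ASCII only uses ```7``` bits, contain the characters most commonly used in the English language, followed by the final control character delete (DEL):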

|byte|hex|num|character|
|---|---|---|---|
|00100001|21|033|!|
|00100010|22|034|"|
|00100011|23|035|#|
|00100100|24|036|$|
|00100101|25|037|%|
|00100110|26|038|&|
|00100111|27|039|'|
|00101000|28|040|(|
|00101001|29|041|)|
|00101010|2a|042|*|
|00101011|2b|043|+|
|00101100|2c|044|,|
|00101101|2d|045|-|
|00101110|2e|046|.|
|00101111|2f|047|/|
|00110000|30|048|0|
|00110001|31|049|1|
|00110010|32|050|2|
|00110011|33|051|3|
|00110100|34|052|4|
|00110101|35|053|5|
|00110110|36|054|6|
|00110111|37|055|7|
|00111000|38|056|8|
|00111001|39|057|9|
|00111010|3a|058|:|
|00111011|3b|059|;|
|00111100|3c|060|<|
|00111101|3d|061|=|
|00111110|3e|062|>|
|00111111|3f|063|?|
|01000000|40|064|@|
|01000001|41|065|A|
|01000010|42|066|B|
|01000011|43|067|C|
|01000100|44|068|D|
|01000101|45|069|E|
|01000110|46|070|F|
|01000111|47|071|G|
|01001000|48|072|H|
|01001001|49|073|I|
|01001010|4a|074|J|
|01001011|4b|075|K|
|01001100|4c|076|L|
|01001101|4d|077|M|
|01001110|4e|078|N|
|01001111|4f|079|O|
|01010000|50|080|P|
|01010001|51|081|Q|
|01010010|52|082|R|
|01010011|53|083|S|
|01010100|54|084|T|
|01010101|55|085|U|
|01010110|56|086|V|
|01010111|57|087|W|
|01011000|58|088|X|
|01011001|59|089|Y|
|01011010|5a|090|Z|
|01011011|5b|091|[|
|01011100|5c|092|\\|
|01011101|5d|093|]|
|01011110|5e|094|^|
|01011111|5f|095|_|
|01100000|60|096|`|
|01100001|61|097|a|
|01100010|62|098|b|
|01100011|63|099|c|
|01100100|64|100|d|
|01100101|65|101|e|
|01100110|66|102|f|
|01100111|67|103|g|
|01101000|68|104|h|
|01101001|69|105|i|
|01101010|6a|106|j|
|01101011|6b|107|k|
|01101100|6c|108|l|
|01101101|6d|109|m|
|01101110|6e|110|n|
|01101111|6f|111|o|
|01110000|70|112|p|
|01110001|71|113|q|
|01110010|72|114|r|
|01110011|73|115|s|
|01110100|74|116|t|
|01110101|75|117|u|
|01110110|76|118|v|
|01110111|77|119|w|
|01111000|78|120|x|
|01111001|79|121|y|
|01111010|7a|122|z|
|01111011|7b|123|{|
|01111100|7c|124|\||
|01111101|7d|125|}|
|01111110|7e|126|~|
|01111111|7f|127|DEL|


As each character maps to a number, the characters are ordered and each has an ordinal. The builtins ordinal function ```ord``` returns the ordinal of a character (its Unicode code point) as a decimal integer:

In [95]:
ord?

[1;31mSignature:[0m [0mord[0m[1;33m([0m[0mc[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m Return the Unicode code point for a one-character string.
[1;31mType:[0m      builtin_function_or_method

For example the ordinal value of the string ```'3'``` can be checked:

In [96]:
ord('3')

51

Notice the difference in syntax highlighting between the string of the number ```'3'``` and the number ```51```. This number can be converted into a binary string or hex string using the builtins ```bin``` and ```hex``` functions respectively:

In [97]:
bin?

[1;31mSignature:[0m [0mbin[0m[1;33m([0m[0mnumber[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return the binary representation of an integer.

>>> bin(2796202)
'0b1010101010101010101010'
[1;31mType:[0m      builtin_function_or_method

In [98]:
hex?

[1;31mSignature:[0m [0mhex[0m[1;33m([0m[0mnumber[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return the hexadecimal representation of an integer.

>>> hex(12648430)
'0xc0ffee'
[1;31mType:[0m      builtin_function_or_method

For example:

In [99]:
bin(ord('3'))

'0b110011'

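This can be conceptualised as the following, with the leading zeros that fill all ```8``` bits of the byte:

In [100]:
# removeprefix removes the exact '0b' prefix, whereas lstrip('0b') would strip any leading 0 and b characters
'0b' + bin(ord('3')).removeprefix('0b').zfill(8)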

'0b00110011'

The prefix ```0b``` indicates a binary number; note that ```bin``` does not display the two leading zeros. The hexadecimal form can be obtained using ```hex```:

In [101]:
hex(ord('3'))

'0x33'

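The prefix ```0x``` indicates a hexadecimal number. As each hexadecimal character represents ```4``` bits, ```16``` is the first number that requires a fifth bit: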

In [102]:
bin(16)

'0b10000'
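
The conversions can also be made in the other direction. A short sketch: a format specification pads a binary or hexadecimal string directly, ```int``` with a base parses such a string back into a number, and the builtins character function ```chr``` is the inverse of ```ord```:

```python
code = ord('3')                  # 51

# Format specifications pad directly, without stripping the prefix and using zfill
print(format(code, '08b'))       # 00110011
print(format(code, '#010b'))     # 0b00110011, the prefix counts towards the width of 10
print(format(code, '02x'))       # 33

# int with a base parses the strings back into the number
print(int('00110011', 2))        # 51
print(int('33', 16))             # 51

# chr is the inverse of ord
print(chr(code))                 # 3

# Lowercase letters are 32 (hex 20) after their uppercase counterparts in the ASCII table
print(ord('a') - ord('A'))       # 32
print(chr(ord('h') - 32))        # H
```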

The string module contains a number of useful strings which group characters. It can be imported using:

In [103]:
import string

The identifiers from this module can be viewed in a cell output using the custom function ```print_identifier_group```. This module mainly has attributes. The standard datamodel attributes for a module are present:

In [104]:
print_identifier_group(string, kind='datamodel_attribute')

['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']


In [105]:
print_identifier_group(string, kind='datamodel_method')

[]


And the main components of the module are attributes, in this case mainly ```str``` instances:

In [106]:
print_identifier_group(string, kind='attribute')

['_re', '_sentinel_dict', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']


In [107]:
print_identifier_group(string, kind='method')

['_ChainMap', 'capwords']


```ascii_letters``` is a ```str``` instance containing all English letters:

In [108]:
string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

This can be split into lowercase and uppercase using the ```str``` instances ```ascii_lowercase``` and ```ascii_uppercase``` respectively: 

In [109]:
string.ascii_lowercase

'abcdefghijklmnopqrstuvwxyz'

In [110]:
string.ascii_uppercase

'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

```digits``` is a ```str``` instance that contains the ```10``` digits used in the decimal system:

In [111]:
string.digits

'0123456789'

```hexdigits``` is a ```str``` instance that contains the characters that can be used for hexadecimal digits. Note that ```a``` and ```A``` (and likewise the other letters) are interchangeable in hexadecimal, so both cases are included:

In [112]:
string.hexdigits

'0123456789abcdefABCDEF'
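
Because both cases are included, ```hexdigits``` has ```22``` characters rather than ```16```, and ```int``` parses either case. A short sketch, where the helper ```is_hex``` is hypothetical:

```python
import string

print(len(string.hexdigits))                 # 22
print(int('ff', 16), int('FF', 16))          # 255 255

def is_hex(text):
    """Hypothetical helper: True if text is non-empty and only contains hexadecimal characters."""
    return len(text) > 0 and all(char in string.hexdigits for char in text)

print(is_hex('3b1'), is_hex('3g1'))          # True False
```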

```printable``` is a ```str``` instance that contains the digits, letters, punctuation and whitespace characters:

In [113]:
string.printable

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

```punctuation``` is a ```str``` instance that contains all the punctuation characters:

In [114]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

```whitespace``` is a ```str``` instance containing the whitespace characters:

In [115]:
string.whitespace

' \t\n\r\x0b\x0c'

With the exception of the space, these are shown using escape sequences, which will be explored further in a moment.

Now that the ASCII table and the character groupings in the ```string``` module have been seen, the remaining boolean identifiers can be examined. These all act upon instance data and return a ```bool```. Their docstrings are:

In [116]:
greeting.isprintable?

[1;31mSignature:[0m [0mgreeting[0m[1;33m.[0m[0misprintable[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is printable, False otherwise.

A string is printable if all of its characters are considered printable in
repr() or if it is empty.
[1;31mType:[0m      builtin_function_or_method

In [117]:
greeting.isascii?

[1;31mSignature:[0m [0mgreeting[0m[1;33m.[0m[0misascii[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if all characters in the string are ASCII, False otherwise.

ASCII characters have code points in the range U+0000-U+007F.
Empty string is ASCII too.
[1;31mType:[0m      builtin_function_or_method

In [118]:
greeting.isalnum?

[1;31mSignature:[0m [0mgreeting[0m[1;33m.[0m[0misalnum[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is an alpha-numeric string, False otherwise.

A string is alpha-numeric if all characters in the string are alpha-numeric and
there is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [119]:
greeting.isalpha?

[1;31mSignature:[0m [0mgreeting[0m[1;33m.[0m[0misalpha[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is an alphabetic string, False otherwise.

A string is alphabetic if all characters in the string are alphabetic and there
is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [120]:
greeting.isspace?

[1;31mSignature:[0m [0mgreeting[0m[1;33m.[0m[0misspace[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a whitespace string, False otherwise.

A string is whitespace if all characters in the string are whitespace and there
is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [121]:
greeting.isdecimal?

[1;31mSignature:[0m [0mgreeting[0m[1;33m.[0m[0misdecimal[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a decimal string, False otherwise.

A string is a decimal string if all characters in the string are decimal and
there is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [122]:
greeting.isdigit?

[1;31mSignature:[0m [0mgreeting[0m[1;33m.[0m[0misdigit[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a digit string, False otherwise.

A string is a digit string if all characters in the string are digits and there
is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [123]:
greeting.isnumeric?

[1;31mSignature:[0m [0mgreeting[0m[1;33m.[0m[0misnumeric[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a numeric string, False otherwise.

A string is numeric if all characters in the string are numeric and there is at
least one character in the string.
[1;31mType:[0m      builtin_function_or_method

For example:

In [124]:
'hello Γειά σου 123'.isprintable()

True

In [125]:
'hello Γειά σου 123'.isascii()

False

In [126]:
'hello 123 !'.isascii()

True

In [127]:
'hello 123 !'.isalnum()

False

In [128]:
'hello123'.isalnum()

True

In [129]:
'hello123'.isalpha()

False

In [130]:
'hello'.isalpha()

True

In [131]:
'hello'.isspace()

False
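
These methods can be cross-checked against the groupings in the ```string``` module. Note that ```string.printable``` includes whitespace characters such as the new line, which ```isprintable``` does not treat as printable, and that the empty string behaves differently depending on the method, as stated in the docstrings above. A short sketch:

```python
import string

print(string.ascii_letters.isalpha())          # True
print(string.digits.isdecimal())               # True
print(string.whitespace.isspace())             # True

# \t, \n, \r, \x0b and \x0c are in string.printable but are not printable in repr()
print(string.printable.isprintable())          # False
print(' '.isprintable(), '\n'.isprintable())   # True False

# The empty string is ASCII and printable, but not alphabetic or whitespace
print(''.isascii(), ''.isprintable())          # True True
print(''.isalpha(), ''.isspace())              # False False
```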

The boolean identifiers for numeric strings have subtle differences. These can be seen by examining the result of each identifier for the following groups of numeric characters:

In [132]:
numeric_groups = {'ascii': '0123456789', 
                  'font1': '𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿', 
                  'font2': '𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵', 
                  'font3': '𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡', 
                  'subscript': '₀₁₂₃₄₅₆₇₈₉',
                  'superscript': '⁰¹²³⁴⁵⁶⁷⁸⁹',
                  'circled1': '➀➁➂➃➄➅➆➇➈',
                  'circled2': '➉',
                  'fractions': '½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉', 
                  'asciihex': '0123456789abcdef', }

In [133]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isdecimal())
    

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ False
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ False
circled1 ➀➁➂➃➄➅➆➇➈ False
circled2 ➉ False
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ False
asciihex 0123456789abcdef False


In [134]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isdigit())

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ True
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ True
circled1 ➀➁➂➃➄➅➆➇➈ True
circled2 ➉ False
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ False
asciihex 0123456789abcdef False


In [135]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isnumeric())

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ True
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ True
circled1 ➀➁➂➃➄➅➆➇➈ True
circled2 ➉ True
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ True
asciihex 0123456789abcdef False


In [136]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isalnum())

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ True
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ True
circled1 ➀➁➂➃➄➅➆➇➈ True
circled2 ➉ True
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ True
asciihex 0123456789abcdef True
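
The outputs above show a hierarchy: every decimal string is also a digit string and every digit string is also numeric, while ```isalnum``` additionally accepts letters, which is why the asciihex group passes. Roughly, ```isdecimal``` accepts characters used to write base 10 numbers, ```isdigit``` also accepts digits such as superscripts and subscripts, and ```isnumeric``` also accepts any character with a numeric value, such as a fraction. A minimal sketch using the ```unicodedata``` module, whose ```decimal```, ```digit``` and ```numeric``` functions return the corresponding value of a character or a supplied default when it has none. The helper ```numeric_values``` is hypothetical:

```python
import unicodedata

def numeric_values(char):
    """Hypothetical helper: return the (decimal, digit, numeric) values of char, None where missing."""
    return (unicodedata.decimal(char, None),
            unicodedata.digit(char, None),
            unicodedata.numeric(char, None))

for char in '3²➀➉½a':
    print(char, numeric_values(char))
```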


The boolean identifiers are often used for checks. An example of such a check was demonstrated earlier, in the for loop used to group the identifiers from builtins.

## Escape Characters

The ```\``` is a special symbol used to begin an escape sequence, which inserts a character that is difficult to type directly. The most commonly used escape sequences are:

In [138]:
print('|  |') # no escape character

|  |


In [139]:
print('| \t |') # the tab

| 	 |


In [140]:
print('| \n |') # the new line

| 
 |


In [141]:
print('| \\ |') # the backslash itself

| \ |


In [142]:
print('| \' |') # the single quotation

| ' |


In [143]:
print('| \" |') # the double quotation

| " |


Any character whose ordinal fits within a single byte (```0``` to ```255```), which includes every ASCII character, can be inserted using the escape sequence ```\x``` followed by 2 hexadecimal digits:

In [144]:
hex(ord('!')) 

'0x21'

In [146]:
'\x21' # a byte (2 hexadecimal digits)

'!'

In [149]:
print('| \x09 |') # the tab as a byte (2 hexadecimal digits)

| 	 |


Note that exactly two hexadecimal digits have to be provided. A literal with only one, such as ```'\x9'```, gives a ```SyntaxError```.

Most commonly used Unicode characters have an ordinal that fits within 2 bytes (up to ```0xffff```) and can therefore be inserted using ```\u``` followed by 4 hexadecimal digits. For example:

In [150]:
hex(ord('α'))

'0x3b1'

In [None]:
'\u03b1' # a Unicode character (4 hexadecimal digits, 2 hexadecimal digits × 2 bytes)

Again exactly four hexadecimal digits have to be provided, so the ordinal ```0x3b1``` is padded with a leading zero. A literal with only three, such as ```'\u3b1'```, gives a ```SyntaxError```.
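
Since a literal with a truncated escape sequence cannot be run, the errors can be sketched by passing the literal as source code to the builtins ```compile``` function inside a ```try``` block. The raw string prefix ```r``` keeps the backslashes in the source text unprocessed, and the helper ```literal_error``` is hypothetical:

```python
def literal_error(source):
    """Hypothetical helper: compile a literal given as source code and return the error message, if any."""
    try:
        compile(source, '<cell>', 'eval')
    except SyntaxError as error:
        return error.msg
    return None

print(literal_error(r"'\x9'"))     # the message mentions a truncated \xXX escape
print(literal_error(r"'\u3b1'"))   # the message mentions a truncated \uXXXX escape
print(literal_error(r"'\u03b1'"))  # None, 4 hexadecimal digits are valid
print(eval(r"'\u03b1'"))           # α
```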

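Note that the formal representation normally uses the hex form for a byte that is not printable; only the tab, new line and carriage return are shown with a shorter escape sequence. This can be seen when the 6 whitespace characters are examined in more detail: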

In [151]:
string.whitespace

' \t\n\r\x0b\x0c'

|name|literal|hex literal|
|---|---|---|
|space|' '|'\\x20'|
|tab|'\\t'|'\\x09'|
|new line|'\\n'|'\\x0a'|
|carriage return|'\\r'|'\\x0d'|
|vertical tab|'\\v'|'\\x0b'|
|form feed|'\\f'|'\\x0c'|

In [152]:
' ' == '\x20'

True

In [153]:
'\t' == '\x09'

True

In [154]:
'\n' == '\x0a'

True

In [155]:
'\r' == '\x0d'

True
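
Python also accepts ```\v``` and ```\f``` as escape sequences for the vertical tab and form feed, but the formal representation displays these two characters using the hex form. A short sketch:

```python
print('\v' == '\x0b')       # True
print('\f' == '\x0c')       # True

# repr uses \t, \n and \r but falls back to the hex form for the other two
print(repr('\t\n\r\v\f'))   # '\t\n\r\x0b\x0c'
```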

Although it is not common to do so, every character in a string can be inserted as an escape sequence:

In [156]:
'\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21'

'hello world!'
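
Such a string does not need to be typed by hand. A minimal sketch, assuming every character has an ordinal below ```256``` so that 2 hexadecimal digits suffice: it builds the ```\x``` escape sequence for each character with a format specification, then reverses the process by encoding the text as ASCII bytes and decoding it with the ```'unicode_escape'``` codec:

```python
text = 'hello world!'

# Build the \xhh escape sequence for each character (the backslash is written as \\)
escaped = ''.join(f'\\x{ord(char):02x}' for char in text)
print(escaped)            # \x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21

# Processing the escape sequences gives back the original text
decoded = escaped.encode('ascii').decode('unicode_escape')
print(decoded == text)    # True
```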

The ```unicodedata``` module can be imported:

In [157]:
import unicodedata

The version of the Unicode database it uses can be checked with:

In [158]:
unicodedata.unidata_version

'14.0.0'

Any Unicode character, including those with an ordinal beyond ```0xffff```, can be inserted using ```\U``` followed by 8 hexadecimal digits:

In [159]:
'\U0000303a'

'〺'
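
The ```unicodedata``` module can also convert between a character and its Unicode name using ```name``` and ```lookup```, and the same name can be inserted directly into a string with the ```\N{...}``` escape sequence. A short sketch using the Greek letter alpha from earlier:

```python
import unicodedata

print(unicodedata.name('α'))                           # GREEK SMALL LETTER ALPHA
print(unicodedata.lookup('GREEK SMALL LETTER ALPHA'))  # α

# Three equivalent ways of inserting the same character
print('\u03b1' == '\U000003b1' == '\N{GREEK SMALL LETTER ALPHA}')  # True
```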

A translation table can be created for use with the instance method ```translate```:

In [160]:
greeting.translate?

[1;31mSignature:[0m [0mgreeting[0m[1;33m.[0m[0mtranslate[0m[1;33m([0m[0mtable[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Replace each character in the string using the given translation table.

  table
    Translation table, which must be a mapping of Unicode ordinals to
    Unicode ordinals, strings, or None.

The table must implement lookup/indexing via __getitem__, for instance a
dictionary or list.  If this operation raises LookupError, the character is
left untouched.  Characters mapped to None are deleted.
[1;31mType:[0m      builtin_function_or_method

```maketrans``` is a static method, which is essentially a function that is bound to neither the instance nor the class. It merely exists in the namespace of the class (conceptualise the class as a Python module), as this is the most logical place to find it:

In [161]:
str.maketrans?

[1;31mDocstring:[0m
Return a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode
ordinals (integers) or characters to Unicode ordinals, strings or None.
Character keys will be then converted to ordinals.
If there are two arguments, they must be strings of equal length, and
in the resulting dictionary, each character in x will be mapped to the
character at the same position in y. If there is a third argument, it
must be a string, whose characters will be mapped to None in the result.
[1;31mType:[0m      builtin_function_or_method

In [162]:
greektolatin = str.maketrans('αβγδε', 'abcde')
greektolatin

{945: 97, 946: 98, 947: 99, 948: 100, 949: 101}

The keys and values of the translation table are ordinals in decimal. Converting them to hexadecimal shows that ```945``` is the ordinal of ```α``` and ```97``` is the ordinal of ```a```:

In [163]:
hex(945)

'0x3b1'

In [164]:
hex(97)

'0x61'

This translation table can be used on the example string to replace the Greek letters (keys) with the Latin letters (values):

In [165]:
'αββγγγδδδδεεεεε'.translate(greektolatin)

'abbcccddddeeeee'
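
As the docstring above describes, ```maketrans``` also accepts a third string whose characters are mapped to ```None``` and therefore deleted, or a single dictionary whose values may be longer strings. A short sketch reusing the Greek letters:

```python
# Third argument: characters mapped to None, so translate deletes them
drop_gamma = str.maketrans('αβ', 'ab', 'γ')
print('αβγ'.translate(drop_gamma))    # ab

# Single dictionary argument: character keys are converted to ordinals
spelled = str.maketrans({'α': 'alpha', 'β': 'beta'})
print(spelled)                        # {945: 'alpha', 946: 'beta'}
print('αβ'.translate(spelled))        # alphabeta
```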

## Formal and Informal Strings

The \\ is a special character used to begin an escape sequence. To insert a \\ itself, two have to be used: the first \\ begins the escape sequence and the second \\ specifies that the character to insert is \\. The \\ is used as the separator in a file path on Windows:

In [None]:
file_path = 'C:\\Users\\Philip'

Compare the cell output with the output of a print statement:

In [None]:
file_path

In [None]:
print(file_path)

Notice that the former uses the formal representation, which matches what needs to be input to instantiate the string. The latter uses the informal representation, which displays the file path with the escape sequences processed. Under the hood these are controlled by the data model identifiers \_\_repr\_\_ and \_\_str\_\_ respectively:

In [None]:
? file_path.__repr__

In [None]:
? file_path.__str__

These data model identifiers follow the object design pattern and, as their docstrings indicate, it is more common to use the builtins function repr and class str respectively:

In [None]:
? repr

In [None]:
? str

The str class will cast an existing object into a str. When the object is already a str it will be unchanged:

In [None]:
file_path

In [None]:
str(file_path)

The repr function, on the other hand, returns a str instance that includes the quotations enclosing the string and shows the escape sequences as they would be input:

In [None]:
repr(file_path)

Notice that when this is printed:

In [None]:
print(repr(file_path))

The cell output matches the formal form:

In [None]:
file_path

And the escape characters are processed when the string is printed:

In [None]:
print(str(file_path))

Instances of the other builtins classes in Python also have a string representation. When the formal representation does not need enclosing quotations or escape sequences, the formal and informal representations are often identical, for example:

In [None]:
4

In [None]:
repr(4)

In [None]:
str(4)
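
The same data model methods can be defined in a user-defined class. In the hypothetical ```Point``` class below, ```__repr__``` returns a string resembling the code needed to recreate the instance and ```__str__``` returns a friendlier form for display. A minimal sketch:

```python
class Point:
    """A hypothetical 2D point used to illustrate __repr__ and __str__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # formal representation: resembles the code used to instantiate it
        return f'Point({self.x}, {self.y})'

    def __str__(self):
        # informal representation: intended for display
        return f'({self.x}, {self.y})'

point = Point(1, 2)
print(repr(point))    # Point(1, 2)
print(point)          # (1, 2), print uses the informal representation
```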

## Raw Strings

In Windows the file path is of the form:

In [None]:
# file_path = 'C:\Users\Philip'

<span style="color:red">SyntaxError:</span> (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The error occurs because \\U begins an escape sequence for a Unicode character, which must be followed by 8 hexadecimal digits. For the file path to be recognised as a Python string, each \\ has to be written as \\\\, the escape sequence for a single \\:

In [None]:
file_path = 'C:\\Users\\Philip'

For convenience there is also a raw string, which uses the prefix r. In a raw string the \\ does not begin an escape sequence and is instead recognised as part of the string:

In [None]:
raw_file_path = r'C:\Users\Philip'

In [None]:
raw_file_path
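
The raw string and the string with doubled backslashes have the same value. One limitation is that a raw string cannot end in a single backslash, as the backslash still prevents the closing quotation from ending the string. A sketch using the builtins ```compile``` function so the error can be caught; the helper ```compiles``` is hypothetical:

```python
raw_file_path = r'C:\Users\Philip'
file_path = 'C:\\Users\\Philip'

print(raw_file_path == file_path)    # True
print(len(raw_file_path))            # 15, each backslash is a single character

def compiles(source):
    """Hypothetical helper: return True if the source code compiles."""
    try:
        compile(source, '<cell>', 'eval')
    except SyntaxError:
        return False
    return True

print(compiles(r"r'C:\Users\'"))     # False, the final backslash escapes the quotation
print(compiles(r"r'C:\Users\\'"))    # True, but the value then ends in two backslashes
```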

## The print function

The print function has been used previously; now that escape sequences are understood, its docstring can be examined:

In [None]:
? print

The *args indicates that a variable number of positional input arguments can be supplied. file and flush are for advanced purposes, for example when the print stream is to be directed to a file instead of a cell output. sep and end are named input arguments which have default values of a space and a new line respectively:

In [None]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')

In [None]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', sep='')

In [None]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')

In [None]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', end='')
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')
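
The effect of ```sep``` and ```end``` can be inspected as a single string by directing the print stream, via the ```file``` argument, to an in-memory text buffer from the standard library ```io``` module. A short sketch:

```python
import io

buffer = io.StringIO()
print('the', 'lazy', 'dog', file=buffer)                    # defaults: sep=' ', end='\n'
print('the', 'lazy', 'dog', sep='', file=buffer)
print('the', 'lazy', 'dog', sep='-', end='!', file=buffer)

print(repr(buffer.getvalue()))    # 'the lazy dog\nthelazydog\nthe-lazy-dog!'
```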

## Formatted Strings

Suppose the string body has the form:

In [None]:
body = 'The string to 0 is 1 2!'

And there are three variables:

In [None]:
var0 = 'print'
var1 = 'hello'
var2 = 'world'

The objective of a formatted string is to insert these variables into the string (in this case without additional format specifications) so the following string is returned:

In [None]:
'The string to print is hello world!'

If the docstring of the string method format is examined:

In [None]:
? body.format

Then it can be seen that substitutions are identified by braces, so body should be:

In [None]:
body = 'The string to {0} is {1} {2}!'

Notice the changes in the syntax highlighting for the placeholders. *args represents a variable number of positional input arguments. To insert variables into strings the number of positional input arguments should match the number of placeholders in the string body:

In [None]:
body.format(var0, var1, var2)

The string can alternatively be set up to contain named placeholders:

In [None]:
body = 'The string to {var0_} is {var1_} {var2_}!'

**kwargs represents a variable number of named keyword input arguments which should match the named keyword input arguments in the string body:

In [None]:
body.format(var0_=var0, var1_=var1, var2_=var2)

If the two lines are combined into one:

In [None]:
'The string to {var0_} is {var1_} {var2_}!'.format(var0_=var0, var1_=var1, var2_=var2)

More commonly the placeholders are given the same name as the variables to be inserted:

In [None]:
'The string to {var0} is {var1} {var2}!'.format(var0=var0, var1=var1, var2=var2)

Notice how each instance name is used 3 times, which is cumbersome. A shorthand way of writing the expression above is a formatted string literal, which uses the prefix f:

In [None]:
f'The string to {var0} is {var1} {var2}!'

Classes from builtins have the \_\_format\_\_ data model identifier:

In [None]:
? object.__format__

When this data model method is defined in a class, format specification options can be used in a formatted string. This can be demonstrated with other str instances and numeric (int and float) instances.

The format specifier for a string (s) can be used:

In [None]:
f'The string to {var0:s} is {var1} {var2}!'

In [None]:
f'The string to {var0:10s} is {var1} {var2}!'

In [None]:
f'The string to {var0:010s} is {var1:s} {var2:s}!'

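The number specifies the minimum width of the field. A string is left-aligned, so the padding is added after it as trailing spaces. If the width is prefixed with 0, the padding is displayed using 0 instead of spaces.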

Numbers are commonly inserted:

In [None]:
num1 = 1
num2 = 0.0000123456789
num3 = 12.3456789

In [None]:
f'The numbers are {num1}, {num2} and {num3}.' 

The format specifier for an integer decimal (d) can be used:

In [None]:
f'The numbers are {num1:d}, {num2} and {num3}.' 

In [None]:
f'The numbers are {num1:5d}, {num2} and {num3}.' 

In [None]:
f'The numbers are {num1:05d}, {num2} and {num3}.' 

In [None]:
f'The numbers are {num1: 05d}, {num2} and {num3}.' 

Again the width, the number of characters the number should occupy, can be specified. Unlike a string, a number is right-aligned, so the padding is leading as opposed to trailing. If the width is prefixed with 0, the padding is shown as 0. If a space is added before the width, it is the sign option: one character of the width is reserved for the sign, which is a space for a positive number.
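
The integer examples can be compared directly by displaying their formal representation, which makes the padding visible. The last line, an addition to the examples above, shows that the space option reserves the position used by the minus sign of a negative number:

```python
num1 = 1

print(repr(f'{num1:5d}'))     # '    1', right-aligned in a width of 5
print(repr(f'{num1:05d}'))    # '00001', padded with leading zeros
print(repr(f'{num1: 05d}'))   # ' 0001', the space is a sign placeholder
print(repr(f'{-num1: 05d}'))  # '-0001', a negative number shows its minus sign instead
```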

In [None]:
f'The numbers are {num1}, {num2:g} and {num3:g}.' 

In [None]:
f'The numbers are {num1}, {num2:e} and {num3:e}.' 

In [None]:
f'The numbers are {num1}, {num2:0.3e} and {num3:0.3e}.' 

In [None]:
f'The numbers are {num1}, {num2:f} and {num3:f}.' 

In [None]:
f'The numbers are {num1}, {num2:0.3f} and {num3:0.3f}.' 

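Floating point numbers can use the general (g), exponential (e) and fixed point (f) format specifiers. In 0.3, the .3 is the precision, which for e and f specifies rounding to 3 digits past the decimal point; the leading 0 is the zero padding option, which has no effect here as no width is given.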
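
The float specifiers can be compared side by side using the builtins ```format``` function, which applies the same format specification as the text after the colon in a placeholder. A short sketch for the two example numbers, with the printed lines shown as comments:

```python
num2 = 0.0000123456789
num3 = 12.3456789

for spec in ('g', 'e', '0.3e', 'f', '0.3f'):
    print(f'{spec:>5}', format(num2, spec), format(num3, spec))

#     g 1.23457e-05 12.3457
#     e 1.234568e-05 1.234568e+01
#  0.3e 1.235e-05 1.235e+01
#     f 0.000012 12.345679
#  0.3f 0.000 12.346
```

The general format switches to exponent notation for the small number and to fixed point notation for the larger one, rounding both to 6 significant figures by default.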


The keys of a mapping such as a dictionary can be included in the placeholder, alongside an optional format specifier:

In [None]:
numbers = {'num1': 1, 'num2': 0.0000123456789, 'num3': 12.3456789}

In [None]:
body = 'The numbers are {num1:d}, {num2:.3e} and {num3:.3e}.'

The colon is used in both the dictionary and format specifier. Spacing to the right of the colon is often present in a dictionary and does not change the value. Spacing does change the format specifier and should only be added if a space is to be included in the format specification.

The format_map method can be used to insert values into a body, provided the placeholders match the keys in the dictionary:

In [None]:
? body.format_map

In [None]:
body.format_map(numbers)

The older style of formatted strings uses the data model identifier \_\_mod\_\_, which controls the behaviour of the % operator. In older style string formatting the % also marks a placeholder, as opposed to the braces {}:

In [None]:
body = 'The numbers are %d, %0.3f and %0.3g.' 
nums = (1, 0.0000123456789, 12.3456789)

In [None]:
? body.__mod__

In [None]:
body % nums
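
With the tuple above, ```%0.3f``` rounds to 3 decimal places and ```%0.3g``` to 3 significant figures. A mapping can also be used with the older style, with each key written in parentheses after the %. A short sketch:

```python
body = 'The numbers are %d, %0.3f and %0.3g.'
nums = (1, 0.0000123456789, 12.3456789)
print(body % nums)    # The numbers are 1, 0.000 and 12.3.

numbers = {'num1': 1, 'num2': 0.0000123456789, 'num3': 12.3456789}
print('The numbers are %(num1)d, %(num2).3e and %(num3).3e.' % numbers)
# The numbers are 1, 1.235e-05 and 1.235e+01.
```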

## Single VS Double Quotations

In Python a string can be enclosed in single or double quotations:

In [None]:
"hello world!"

Notice that the string returned in the cell output is automatically enclosed in single quotations. This is because the Python interpreter has a **preference for single quotations**. This can also be seen when the builtins character function chr is used:

In [None]:
chr(97)

The main purpose of double quotations is to enclose a string that itself contains single quotations, such as a string literal, without having to escape them. For example:

In [None]:
"var = 'hello world!'"

In [None]:
'var = \'hello world!\''
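
The preference for single quotations only applies when it avoids escaping. If a string contains a single quotation and no double quotation, the formal representation switches to double quotations; if it contains both, single quotations are kept and the inner single quotation is escaped. A short sketch:

```python
print(repr('hello world!'))     # 'hello world!'
print(repr("it's"))             # "it's"
print(repr('say "hi"'))         # 'say "hi"'
print(repr('it\'s "hi"'))       # 'it\'s "hi"'
```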

Note that some programming languages use single quotations for a character and double quotations for a string of characters. In Python a character is simply a single element string; however, some programmers who work with multiple languages carry these habits over when using Python.

## Multiline Strings

A string can be displayed over multiple lines using triple quotations:

In [None]:
multiline = '''the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog'''

In [None]:
multiline

In [None]:
print(multiline)

Note that any spacing added will be incorporated into the multiline string:

In [None]:
multiline = '''
            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            '''

In [None]:
multiline

In [None]:
print(multiline)
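
The indentation can be removed with the ```dedent``` function from the standard library ```textwrap``` module, which removes the whitespace common to the start of every line. A short sketch, using ```strip``` to also remove the surrounding new lines:

```python
import textwrap

multiline = '''
            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            '''

# dedent removes the common leading whitespace; strip removes the leading and trailing new lines
cleaned = textwrap.dedent(multiline).strip()
print(cleaned)
```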

Multiline strings are most commonly used for docstrings. Because docstrings are likely to include string literals enclosed in single quotations, triple double quotations are normally preferred:

In [None]:
? print

In [None]:
doc = """Prints the values

sep
  string inserted between values, default a space ' '.
end
  string appended after the last value, default a newline '\\n'."""

In [None]:
print(doc)

## Data Model Attributes

There are a number of data model identifiers that are attributes. The attribute \_\_class\_\_ returns the class:

In [None]:
greeting.__class__

This is normally determined using the builtins class type:

In [None]:
? type

In [None]:
type(greeting)

The \_\_doc\_\_ returns the docstring as a string instance:

In [None]:
greeting.__doc__

This is normally looked up, along with some other information, using the ? operator:

In [None]:
? greeting

The \_\_sizeof\_\_ data model identifier is used to return the size of the string object in memory, in bytes:

In [None]:
? greeting.__sizeof__

In [None]:
greeting.__sizeof__()

To get the size, it is more common to use the getsizeof function from the system module sys, which under the hood uses this data model method:

In [None]:
import sys

In [None]:
? sys.getsizeof

In [None]:
sys.getsizeof(greeting)
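
In CPython the two values are equal for a str, as a str is not tracked by the garbage collector, so sys.getsizeof adds no extra overhead. The size is a fixed overhead for the object plus the memory for the characters, which depends on the widest character in the string. A sketch; the exact byte counts are CPython implementation details and may differ between versions:

```python
import sys

greeting = 'hello'
print(sys.getsizeof(greeting) == greeting.__sizeof__())    # True in CPython

# Each extra ASCII character adds one byte
print(sys.getsizeof('a' * 10) - sys.getsizeof('a' * 5))    # 5

# A non-ASCII character such as α needs more memory per character
print(sys.getsizeof('α' * 10) > sys.getsizeof('a' * 10))  # True
```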

## Immutable Ordered Collection ABC Design Pattern

Earlier it was seen that the str is a subclass of the object and therefore follows the design pattern of an object. The str class also follows the design patterns of a number of abstract base classes.

An abstract base class is a conceptual class that isn't instantiated directly but is used as a design pattern for numerous Python classes so that their behaviour is consistent.
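
The abstract base classes are available in the standard library ```collections.abc``` module, and ```isinstance``` can be used to check that a str follows each design pattern. A str is a ```Sequence``` (an ordered collection) but not a ```MutableSequence```, as it is immutable. A short sketch:

```python
from collections.abc import (Container, Hashable, Sized, Iterable,
                             Collection, Sequence, MutableSequence)

greeting = 'hello'
for base_class in (Container, Hashable, Sized, Iterable, Collection, Sequence):
    print(base_class.__name__, isinstance(greeting, base_class))    # True for each

print(isinstance(greeting, MutableSequence))   # False, str is immutable
print(isinstance([1, 2], MutableSequence))     # True, list is mutable
```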

The str has a Container abstract base class, which means it has the data model method \_\_contains\_\_:

In [None]:
? greeting.__contains__

Instead of using the data model identifier, the keyword in is typically used:

In [None]:
'hello'.__contains__('h')

In [None]:
'h' in 'hello'

In [None]:
'll' in 'hello'

In [None]:
'H' in 'hello'

In [None]:
'H'.casefold() in 'hello'.casefold()

The string is also Hashable, which means it has the data model method \_\_hash\_\_, which maps to the builtins function hash and returns the integer hash value:

In [None]:
? greeting.__hash__

In [None]:
? hash

The hash value corresponds to an integer. This integer will be the same for strings that are equal in value:

In [None]:
greeting = 'hello'

In [None]:
hash(greeting)

In [None]:
hash('hello')

And differ otherwise:

In [None]:
hash('Hello')

Immutable builtins such as str are hashable, while mutable builtins such as list are not. Only hashable values can be used as keys in mappings such as dictionaries:

In [None]:
colors = {'red': '#ff0000', 
          'green': '#00b050',
          'blue': '#0070c0'}

In [None]:
hash('red')

In [None]:
colors['red']
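
As a sketch of why hashability matters here: a string built at runtime that is equal to 'red' has the same hash, so it finds the same dictionary entry, whereas a mutable list is unhashable and cannot be used as a key:

```python
colors = {'red': '#ff0000',
          'green': '#00b050',
          'blue': '#0070c0'}

# A string constructed at runtime is equal to 'red', so it has the same hash
key = ''.join(['r', 'e', 'd'])
print(hash(key) == hash('red'), colors[key])

# A list is mutable and therefore unhashable
try:
    colors[['r', 'e', 'd']] = '#ff0000'
except TypeError as error:
    print('TypeError:', error)
```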

## Iteration and Indexing

The str is Sized and therefore has the data model identifier \_\_len\_\_ which controls the behaviour of the builtins len function and returns the number of characters in a string:

In [None]:
? greeting.__len__

In [None]:
? len

The string 'hello' has 5 characters as expected:

In [None]:
len('hello')

The str is also Iterable, which means it has the data model method \_\_iter\_\_ that controls the behaviour of the builtins iter function:

In [None]:
? greeting.__iter__

In [None]:
? iter

This is used to construct an iterator from the str. An iterator is an object that returns the values of a collection one at a time on demand, keeping track of its current position:

In [None]:
greeting

In [None]:
forward = iter(greeting)

In [None]:
forward

Note that an iterator does not display its values and has no len:

In [None]:
# len(forward)

<span style="color:red">TypeError:</span> object of type 'str_ascii_iterator' has no len()

This is because the iterator instance forward has no data model method \_\_len\_\_, and therefore the len function has no instruction to follow:

In [None]:
print(dir(forward))

The iterator has the data model \_\_next\_\_ which maps to the builtins next function:

In [None]:
? forward.__next__

In [None]:
? next

next can be called on the iterator to return each individual character of the original string that the iterator was constructed from, one at a time:

In [None]:
next(forward)

In [None]:
next(forward)

In [None]:
next(forward)

In [None]:
next(forward)

In [None]:
next(forward)

Using next on an iterator that has been exhausted raises a StopIteration exception:

In [None]:
# next(forward)

<span style="color:red">StopIteration</span>: 

The iterator above was an ASCII iterator as each character in the string was ASCII. In CPython, a slightly different str iterator type is created when the string contains non-ASCII characters:

In [None]:
greek_greeting = 'Γειά σου Κόσμε!'

In [None]:
forward = iter(greek_greeting)

In [None]:
forward

A Unicode character is returned when next is used:

In [None]:
next(forward)

In [None]:
next(forward)

In [None]:
next(forward)

And so on... 

When a for loop is used with a str, under the hood an iterator is created and consumed by the for loop. A simple for loop can be made which prints each letter twice:

In [None]:
word = 'hello'

In [None]:
for letter in word:
    print(letter, sep='', end='')
    print(letter, sep='', end='')
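
Under the hood, the for loop above behaves roughly like the following while loop, which builds an iterator with iter and calls next until StopIteration is raised. This is a simplified sketch of the iteration protocol rather than the interpreter's actual implementation, and the function name double_letters is hypothetical:

```python
def double_letters(word):
    """Mimic a for loop: build an iterator and call next until StopIteration."""
    result = []
    forward = iter(word)  # same as word.__iter__()
    while True:
        try:
            letter = next(forward)  # same as forward.__next__()
        except StopIteration:
            break  # the iterator is exhausted, so the loop ends
        result.append(letter * 2)
    return ''.join(result)

print(double_letters('hello'))  # hheelllloo
```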

The string can be enumerated in a for loop, giving on each iteration a tuple of the numeric index and the corresponding letter:

In [None]:
for (index, letter) in enumerate(word):
    print(index, letter)

A string also follows the Collection abstract base class, which combines all the properties of the simpler abstract base classes Sized, Iterable and Container seen above.

A string also follows the Sequence abstract base class, which means it supports \_\_getitem\_\_, \_\_len\_\_, \_\_contains\_\_, \_\_iter\_\_ and reversal with the builtins reversed function, alongside the methods index and count.

The data model method \_\_getitem\_\_ defines the behaviour when indexing into a Sequence using square brackets [ ]:

In [None]:
? greeting.__getitem__

Note that Python uses zero-based indexing, so the value at index 1 is the second value:

In [None]:
'hello'[1]

The first value is at index 0:

In [None]:
'hello'[0]

The last value has an index of the length of the string minus 1:

In [None]:
'hello'[len('hello')-1]

It is also possible to use a negative index, which counts back from the end of the string:

In [None]:
for index, letter in enumerate(word):
    print(index-len(word), letter)

The last value, for example, has the index -1, the position one before index 0:

In [None]:
'hello'[-1]

This becomes clearer when the negative and positive indexes are examined together:

In [None]:
for index, letter in enumerate(word):
    print(index-len(word), letter)
for index, letter in enumerate(word):
    print(index, letter)

Conceptualise the word as being printed on a label that is rolled into a loop: stepping back one position from index 0 wraps around to the last letter at index -1.
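
The relationship between the two indexing schemes can be checked directly: for a string of length n, the positive index i and the negative index i - n refer to the same character:

```python
word = 'hello'
n = len(word)

for index in range(n):
    negative = index - n  # the same position counted back from the end
    print(index, negative, word[index])
    assert word[index] == word[negative]
```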

The builtins slice class can be used to create a slice object, which selects a substring when it is used as an index:

In [None]:
? slice

If start, stop and step values are created, with start and step set to their default values:

In [None]:
start = 0 # default value
stop = 3
step = 1 # default value

The slice can then take any of the following forms:

In [None]:
'hello'[slice(start, stop, step)]

In [None]:
'hello'[slice(start, stop)]

In [None]:
'hello'[slice(stop)]

This is typically shortened by using colons inside the square brackets instead:

In [None]:
'hello'[start:stop:step]

In [None]:
'hello'[start:stop]

In [None]:
'hello'[:stop]

In [None]:
'hello world!'

In [None]:
len('hello world!')

In [None]:
'hello world!'[:5]

In [None]:
'hello world!'[6:]

In [None]:
'hello world!'[6:11]

In [None]:
'hello world!'[::2]

In [None]:
'hello world!'[1::2]

In [None]:
'hello world!'[::-1]

In [None]:
'hello world!'[-8:-len('hello world!')-1:-1]
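
A slice with a negative step can be hard to read. The slice object's indices method converts start, stop and step into the concrete values used for a string of a given length, which makes the last example above easier to follow:

```python
text = 'hello world!'
reverse_slice = slice(-8, -len(text) - 1, -1)

# indices() resolves negative and out-of-range values for this length
start, stop, step = reverse_slice.indices(len(text))
print(start, stop, step)  # 4 -1 -1

# range with the same values lists the positions the slice selects
positions = list(range(start, stop, step))
print(positions)  # [4, 3, 2, 1, 0]
print(''.join(text[i] for i in positions))  # olleh
print(text[reverse_slice])  # olleh
```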

The string class doesn't define the data model method \_\_reversed\_\_. However, because it supports \_\_len\_\_ and integer indexing with \_\_getitem\_\_ (the sequence protocol), the builtins function reversed can still be used on a string instance to create a reverse iterator:

In [None]:
greeting

In [None]:
backward = reversed(greeting)
backward

In [None]:
next(backward)

In [None]:
next(backward)

In [None]:
next(backward)

And so on...
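
The fallback that reversed uses can be seen with a class that only defines \_\_len\_\_ and \_\_getitem\_\_. The class Letters below is hypothetical and exists only to illustrate the sequence protocol:

```python
class Letters:
    """Hypothetical class with __len__ and __getitem__ but no __reversed__."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        return self._text[index]

letters = Letters('hello')
print(hasattr(letters, '__reversed__'))  # False
print(''.join(reversed(letters)))  # olleh: reversed indexes from len - 1 down to 0
```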

In [None]:
word = 'hello world!'

for index, letter in enumerate(word):
    print(index, letter)

The index and find methods perform the reverse operation, retrieving the positive index of the first occurrence of a character or substring:

In [None]:
? word.find

In [None]:
? word.index

These two methods behave identically upon success:

In [None]:
word.find('l')

In [None]:
word.index('l')

However, upon failure find returns -1 whereas index raises a ValueError:

In [None]:
word.find('L')

In [None]:
# word.index('L')

<span style="color:red">ValueError:</span> substring not found

These methods also take optional positional start and stop arguments, which restrict the search to that range of indexes. For example, to find the index of every 'l', each search can begin one position after the previous match:

In [None]:
word.find('l')

In [None]:
word.find('l', 2+1)

In [None]:
word.find('l', 3+1)

In [None]:
word.find('l', 9+1)
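
The repeated searches above can be wrapped in a loop. The helper find_all below is hypothetical, a minimal sketch built only on find and its start argument:

```python
def find_all(text, sub):
    """Return every index at which sub starts in text, using str.find."""
    positions = []
    position = text.find(sub)
    while position != -1:
        positions.append(position)
        position = text.find(sub, position + 1)  # resume one character after the match
    return positions

print(find_all('hello world!', 'l'))  # [2, 3, 9]
```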

A substring can also be searched for, as opposed to a single character; the index of its first character is returned:

In [None]:
word.find('world')

In [None]:
word.find('w')

The index and find methods search the string from left to right. There are also the reverse find and reverse index methods, rfind and rindex respectively, which search from right to left and so return the index of the last occurrence:

In [None]:
word.rfind('l')

In [None]:
word.rfind('l', 0, 9)

In [None]:
word.rfind('l', 0, 3)

In [None]:
word.rfind('l', 0, 2)

In [None]:
word.rfind('l')

The count method returns the number of times a substring is found in a string:

In [None]:
word.count('l')
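
count also accepts optional start and stop indexes, and it counts non-overlapping occurrences. The overlapping count below is built by hand from slicing, one possible approach rather than a built-in method:

```python
word = 'hello world!'
print(word.count('l', 0, 5))  # 2: only 'hello' is searched

text = 'aaaa'
print(text.count('aa'))  # 2: matches do not overlap

# Checking for a match starting at every index allows overlaps
overlapping = sum(1 for i in range(len(text)) if text[i:i + 2] == 'aa')
print(overlapping)  # 3
```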

The string methods startswith and endswith return a bool indicating whether a string starts or ends with a substring. These also take optional start and stop indexes as input arguments:

In [None]:
? word.startswith

In [None]:
? word.endswith

In [None]:
'hello world!'.startswith('hello')

In [None]:
'hello world!'.startswith('hello', 1)

In [None]:
'hello world!'.endswith('!')

In [None]:
'hello world!'.endswith('!', 0, 11)
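
startswith and endswith also accept a tuple of substrings and return True if any of them matches. The filenames below are hypothetical:

```python
filenames = ['notes.txt', 'data.csv', 'script.py', 'README']

# endswith checks each suffix in the tuple
text_like = [name for name in filenames if name.endswith(('.txt', '.csv'))]
print(text_like)  # ['notes.txt', 'data.csv']
```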

The replace method returns a copy of the string with each occurrence of the substring old replaced by the substring new. It has an optional argument count with a default value of -1, which means all occurrences are replaced by default. The / in the signature once again indicates that the input arguments before it are to be supplied positionally (the exact position of the / differs slightly between Python versions):

In [None]:
? word.replace

In [None]:
'hello world!'.replace('hello', 'bye')

In [None]:
'hello hello world!'.replace('hello', 'bye')

In [None]:
'hello hello world!'.replace('hello', 'bye', 1)

## Center and Justify

A string can be centred and justified using the center, ljust and rjust string methods, which pad the string to a given width with a fill character (a space by default):

In [None]:
? greeting.center

In [None]:
? greeting.ljust

In [None]:
? greeting.rjust

In [None]:
len(greeting)

In [None]:
'hello'.center(8)

In [None]:
'hello'.center(8, 'X')

In [None]:
'hello'.ljust(8, 'X')

In [None]:
'hello'.rjust(8, 'X')
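
ljust and rjust are handy for lining text up in columns. This sketch reuses the colors dictionary from earlier; note that a width smaller than the length of the string returns the string unchanged:

```python
colors = {'red': '#ff0000',
          'green': '#00b050',
          'blue': '#0070c0'}

# Pad every name to the length of the longest name
width = max(len(name) for name in colors)
for name, code in colors.items():
    print(name.ljust(width, '.') + code.rjust(10))

print('hello'.center(3))  # hello: the width is smaller than len('hello')
```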

The opposite operation can be carried out using the string methods left strip and right strip, lstrip and rstrip respectively, which remove leading and trailing whitespace by default, or any of the characters in a specified string of characters:

In [None]:
padded = 'hello'.center(20)
padded

In [None]:
? padded.lstrip

In [None]:
? padded.rstrip

In [None]:
padded.lstrip()

In [None]:
padded.rstrip()

In [None]:
padded.lstrip().rstrip()

In [None]:
padded = 'hello'.center(20, 'X')
padded

In [None]:
padded.lstrip('X').rstrip('X')

There are also the associated methods removeprefix and removesuffix (added in Python 3.9), which are more precise and will only remove one exact occurrence of a specified prefix or suffix:

In [None]:
? padded.removeprefix

In [None]:
? padded.removesuffix

In [None]:
padded

In [None]:
padded.removeprefix('X')
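
The difference is that lstrip, rstrip and strip treat their argument as a set of characters to remove, whereas removeprefix and removesuffix match an exact substring once. The 'www.example.com' string follows the example in the Python documentation for str.strip:

```python
padded = 'hello'.center(20, 'X')

print(padded.lstrip('X'))        # every leading 'X' is removed
print(padded.removeprefix('X'))  # only one leading 'X' is removed

# strip removes any combination of the given characters from both ends
url = 'www.example.com'
print(url.strip('cmowz.'))                             # example
print(url.removeprefix('www.').removesuffix('.com'))  # example
```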

Earlier the ordinal value of the string '3' was examined. Its binary representation returned by bin has the prefix '0b', which can be removed using removeprefix:

In [None]:
string3 = bin(ord('3'))
string3

In [None]:
string3 = bin(ord('3')).removeprefix('0b')
string3

There is also the string method zfill, which pads a string on the left with zeros up to a given width and is mainly intended for strings of numeric values:

In [None]:
? string3.zfill

Since a byte consists of 8 bits, the width can be set to 8:

In [None]:
string3 = string3.zfill(8)
string3
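
The same zero-padded 8-bit binary string can also be produced in one step with the builtins format function and the format specification '08b' (zero padded, width 8, binary):

```python
string3 = bin(ord('3')).removeprefix('0b').zfill(8)
formatted = format(ord('3'), '08b')

print(string3, formatted, string3 == formatted)  # 00110011 00110011 True
print('-42'.zfill(5))  # -0042: zfill keeps a leading sign in front of the zeros
```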

## Binary Operators

The string is an ordered immutable Sequence, as previously discussed. An immutable Sequence often has the data model methods addition \_\_add\_\_ and multiplication \_\_mul\_\_, which control the behaviour of the operators + and * respectively. For an immutable sequence these perform concatenation and replication by an integer respectively. The reflected multiplication \_\_rmul\_\_ is also typically defined; it is used when the integer is multiplied by the string instead of the string by the integer, and gives the same result.

Note that these data model methods, alongside the previously examined \_\_mod\_\_ data model method which controls the behaviour of the % operator, behave differently in numeric data types, where they perform numeric operations instead.

The \_\_add\_\_ data model method implements a binary operator, as it requires two instances: the string instance self and the string instance value. The \_\_mul\_\_ data model method is configured to use two instances of different types: the string instance and an int instance. The \_\_mod\_\_ data model method is configured to use the string instance and a tuple instance of values to be inserted.

In [None]:
? greeting.__add__

In [None]:
'hello' + 'world'

In [None]:
'hello' + ' ' + 'world'

In [None]:
? greeting.__mul__

In [None]:
greeting * 3

In [None]:
? greeting.__rmul__

In [None]:
3 * greeting
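
When the int comes first, Python first asks int.\_\_mul\_\_ to handle the string; it returns the special value NotImplemented, so Python falls back to the reflected method str.\_\_rmul\_\_. This sketch calls the two methods directly to show both steps:

```python
# Step 1: int does not know how to multiply by a str
print((3).__mul__('hello'))  # NotImplemented

# Step 2: Python then tries the reflected method on the str
print('hello'.__rmul__(3))  # hellohellohello

print(3 * 'hello' == 'hello' * 3)  # True
```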

In [None]:
? greeting.__mod__

In [None]:
'%s %s%s' % ('hello', 'world', '!')

It is common to use a binary operator and reassign the output to the same instance name. Recall that the operation on the right-hand side is carried out first using the original string; the instance name, which can be conceptualised as a label, then points to the new object:

In [None]:
greeting

In [None]:
greeting = greeting + ' world!'

In [None]:
greeting

This can be written in shorthand by combining the binary operator, for example addition +, with the assignment operator =, resulting in +=:

In [None]:
greeting += ' world!'

In [None]:
greeting

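The augmented assignment operator more clearly indicates that an addition involving the old value is carried out before the name is reassigned to the new value. Because a str is immutable, a new string object is created and the original string is not modified.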
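
Keeping a second name for the original object shows that += rebinds the name rather than modifying the string. This sketch rebinds greeting to 'hello' first:

```python
greeting = 'hello'
original = greeting          # a second label for the same object
greeting += ' world!'        # builds a new str and rebinds greeting

print(original)              # hello: the original object is unchanged
print(greeting)              # hello world!
print(greeting is original)  # False: greeting now labels a different object
```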

## Binary Comparison Operators

The binary data model method \_\_eq\_\_ checks whether two instances are equal and returns a bool. This is configured in the object class alongside its opposite \_\_ne\_\_, and these control the behaviour of the is equal to == and not equal to != operators. For plain object instances, equality defaults to identity, so only names bound to the same instance compare equal:

In [None]:
object1 = object()
object2 = object()
object1a = object1

In [None]:
? object1.__eq__

In [None]:
? object1.__ne__

In [None]:
object1 == object2

In [None]:
object1 == object1a

In [None]:
object1 != object2

For ordered values there are the additional binary data model comparison methods less than \_\_lt\_\_ and greater than \_\_gt\_\_, which control the behaviour of the < and > operators. Recall that a string is made up of characters, each of which has an ordinal value:

In [None]:
ord('A')

In [None]:
ord('a')

Therefore the string has the methods:

In [None]:
? greeting.__lt__

In [None]:
? greeting.__gt__

Strings are compared character by character using these ordinal values, so:

In [None]:
'a' > 'A'

In [None]:
'hello' > 'Hello World!'
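
The comparison is lexicographic: characters are compared pairwise by ordinal value until the first difference, and if one string is a prefix of the other, the shorter string is smaller. The function less_than is a hypothetical sketch of that rule, checked against the builtin < operator:

```python
def less_than(left, right):
    """Hypothetical sketch of how str < compares two strings."""
    for left_char, right_char in zip(left, right):
        if left_char != right_char:
            # The first differing character decides the result
            return ord(left_char) < ord(right_char)
    # All compared characters match, so the shorter string is smaller
    return len(left) < len(right)

pairs = [('a', 'A'), ('hello', 'Hello World!'), ('app', 'apple'), ('Zebra', 'apple')]
for left, right in pairs:
    print(repr(left), '<', repr(right), less_than(left, right), left < right)
```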

There are also the additional binary data model comparison methods less than or equal to \_\_le\_\_ and greater than or equal to \_\_ge\_\_, which control the behaviour of the <= and >= operators.

In [None]:
? greeting.__le__

In [None]:
? greeting.__ge__

In [None]:
'a' >= 'b'

In [None]:
'a' < 'a'

In [None]:
'a' == 'a'

In [None]:
'a' <= 'a'

In [None]:
'a' >= 'a'

The keyword is also performs a comparison, which checks whether two objects are the same object in memory, i.e. whether they have the same id:

In [None]:
? id

In [None]:
id('a')

In [None]:
id('b')

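Two strings that are equal in value are not guaranteed to be the same object, so is and == can give different results for equal strings. Because strings are immutable, CPython may reuse a single object for some equal strings, such as short literals, but this is an implementation detail and == should be used to compare string values. Newer versions of Python also emit a SyntaxWarning when is is used with a literal. Here the two strings differ, so both comparisons give False: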

In [None]:
'a' is 'b'

In [None]:
id('a') == id('b')
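
Equal strings are not necessarily the same object. In this sketch a string built at runtime compares equal to a literal but is typically a separate object in CPython; sys.intern returns a single shared object for equal string values:

```python
import sys

literal = 'hello world!'
built = ''.join(['hello', ' ', 'world!'])  # constructed at runtime

print(literal == built)  # True: the values are equal
print(literal is built)  # typically False in CPython: two separate objects

# sys.intern returns one canonical object for each distinct string value
print(sys.intern(literal) is sys.intern(built))  # True
```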

## Splitting and Joining Strings

The string has a number of methods which are used for splitting and joining a string. These generally involve converting to or from a Python collection, such as a tuple of strings or a list of strings.

For example, the methods partition and right partition, rpartition, split a string at the first or last occurrence of a separator respectively, returning a three-element tuple containing the substring before the separator, the separator itself and the substring after the separator. To make this more obvious, the following string will be instantiated:

In [None]:
greeting = 'hello|world|!'

In [None]:
? greeting.partition

In [None]:
greeting.partition('|')

In [None]:
? greeting.rpartition

In [None]:
greeting.rpartition('|')
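
If the separator is not found, partition and rpartition still return a three-element tuple, so the result can always be unpacked into three names:

```python
print('hello'.partition('|'))   # ('hello', '', '')
print('hello'.rpartition('|'))  # ('', '', 'hello')

# Unpacking works whether or not the separator is present
key, separator, value = 'color=red'.partition('=')
print(key, value)  # color red
```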

More generally the split and join identifiers can be used to split a string into a list of strings or join a list of strings up into a single string. For example if the following sentence is created:

In [None]:
sentence = 'the fat black cat sat on the mat!'

The identifier split can be examined:

In [None]:
? sentence.split

Since the words are separated by whitespace, the input arguments can be left unspecified and take their default values. This gives a list of strings:

In [None]:
words = sentence.split()
words

There is also a right split, rsplit. The difference is subtle: the two methods only behave differently when maxsplit is assigned a value, as rsplit then performs its splits starting from the right:

In [None]:
? sentence.rsplit

In [None]:
words = sentence.rsplit()
words

In [None]:
'a b c d e'.split(' ', maxsplit=2)

In [None]:
'a b c d e'.rsplit(' ', maxsplit=2)
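
Leaving sep as its default of None is not the same as passing a single space: None splits on runs of any whitespace and discards empty strings, whereas an explicit ' ' splits at every single space:

```python
text = ' a  b\tc '

print(text.split())     # ['a', 'b', 'c']
print(text.split(' '))  # ['', 'a', '', 'b\tc', '']
```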

To join the words, the join method is called on a delimiter string instance and supplied with the list of strings:

In [None]:
delimiter = ' '

In [None]:
? delimiter.join

In [None]:
delimiter.join(words)

In [None]:
' '.join(words)

In [None]:
'|'.join(words)
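
join only accepts an iterable of strings, so other values need converting first, for example with map and str. Building a string with join is also usually preferred to repeated += in a loop, which can create a new intermediate string on each iteration:

```python
numbers = [1, 2, 3]

try:
    '-'.join(numbers)  # fails: the items are int, not str
except TypeError as error:
    print('TypeError:', error)

print('-'.join(map(str, numbers)))  # 1-2-3
```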

If a multiline string is created:

In [None]:
paragraph = '''The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog'''

In [None]:
paragraph

There is an associated string method splitlines, which splits the string into a list at each line break. It has an input argument keepends which defaults to False and therefore excludes the line break characters:

In [None]:
? paragraph.splitlines

In [None]:
paragraph.splitlines()

['The quick brown fox jumps over the lazy dog',
 'The quick brown fox jumps over the lazy dog',
 'The quick brown fox jumps over the lazy dog',
 'The quick brown fox jumps over the lazy dog']
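
splitlines differs from split('\n') in two ways: a trailing line break does not produce a final empty string, and other line boundaries such as '\r\n' are recognised. Passing keepends=True keeps the line breaks:

```python
text = 'first\r\nsecond\nthird\n'

print(text.splitlines())               # ['first', 'second', 'third']
print(text.split('\n'))                # ['first\r', 'second', 'third', '']
print(text.splitlines(keepends=True))  # ['first\r\n', 'second\n', 'third\n']
```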

If the multiline string is created with tabs:

In [None]:
paragraph = '''\tThe quick brown fox jumps over the lazy dog
\tThe quick brown fox jumps over the lazy dog
\tThe quick brown fox jumps over the lazy dog
\tThe quick brown fox jumps over the lazy dog'''

The tabs can be replaced by spaces using the string method expandtabs, where tabsize specifies the spacing of the tab stops:

In [None]:
? paragraph.expandtabs

Signature:  paragraph.expandtabs(tabsize=8)
Docstring:
Return a copy where all tab characters are expanded using spaces.

If tabsize is not given, a tab size of 8 characters is assumed.
Type:      builtin_function_or_method

In [None]:
paragraph.expandtabs(4)

'    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog'

In [None]:
print(paragraph)

	The quick brown fox jumps over the lazy dog
	The quick brown fox jumps over the lazy dog
	The quick brown fox jumps over the lazy dog
	The quick brown fox jumps over the lazy dog


In [None]:
print(paragraph.expandtabs(4))

    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
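
expandtabs does not simply replace each tab with a fixed number of spaces: each tab advances to the next tab stop, a column that is a multiple of tabsize. In the paragraph above every tab sits at the start of a line, so each happens to become exactly 4 spaces:

```python
text = 'a\tbc\td'
expanded = text.expandtabs(4)

print(repr(expanded))  # 'a   bc  d': 3 spaces, then 2 spaces
print(expanded.index('b'), expanded.index('d'))  # 4 8: both start on a tab stop
```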


## Get Attribute

Typing the instance name followed by a dot . and pressing tab ↹ displays a list of the identifiers available from the string instance greeting:

In [None]:
# greeting.

If one of the methods is referenced and not called, details about the function object are displayed:

In [None]:
greeting.center

<function str.center(width, fillchar=' ', /)>

In [None]:
greeting.center(20)

'   hello|world|!    '

The data model method \_\_getattribute\_\_ controls the behaviour of attribute access, both with the dot . and with the builtins getattr function:

In [None]:
? greeting.__getattribute__

Signature:       greeting.__getattribute__(name, /)
Call signature:  greeting.__getattribute__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__getattribute__' of str object at 0x00000216F3BDCAF0>
Docstring:       Return getattr(self, name).

In [None]:
? getattr

Docstring:
getattr(object, name[, default]) -> value

Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y.
When a default argument is given, it is returned when the attribute doesn't
exist; without it, an exception is raised in that case.
Type:      builtin_function_or_method

This is used to get an attribute using its name as a string:

In [None]:
getattr(greeting, 'center')

<function str.center(width, fillchar=' ', /)>

In [None]:
getattr(greeting, 'center')(20)

'   hello|world|!    '

This is useful when the attribute name is only known as a string at runtime, for example when looping over the names returned by dir:

In [None]:
dir(greeting)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

In [None]:
for attribute in dir(greeting):
    if attribute[0:2] == 'is':
        print(attribute.ljust(len('isidentifier')), end='\t\t')
        print(getattr(greeting, attribute))

isalnum     		<built-in method isalnum of str object at 0x00000216F3BDCAF0>
isalpha     		<built-in method isalpha of str object at 0x00000216F3BDCAF0>
isascii     		<built-in method isascii of str object at 0x00000216F3BDCAF0>
isdecimal   		<built-in method isdecimal of str object at 0x00000216F3BDCAF0>
isdigit     		<built-in method isdigit of str object at 0x00000216F3BDCAF0>
isidentifier		<built-in method isidentifier of str object at 0x00000216F3BDCAF0>
islower     		<built-in method islower of str object at 0x00000216F3BDCAF0>
isnumeric   		<built-in method isnumeric of str object at 0x00000216F3BDCAF0>
isprintable 		<built-in method isprintable of str object at 0x00000216F3BDCAF0>
isspace     		<built-in method isspace of str object at 0x00000216F3BDCAF0>
istitle     		<built-in method istitle of str object at 0x00000216F3BDCAF0>
isupper     		<built-in method isupper of str object at 0x00000216F3BDCAF0>


In [None]:
for attribute in dir(greeting):
    if attribute[0:2] == 'is':
        print(attribute.ljust(len('isidentifier')), end='\t\t')
        print(getattr(greeting, attribute)())

isalnum     		False
isalpha     		False
isascii     		True
isdecimal   		False
isdigit     		False
isidentifier		False
islower     		True
isnumeric   		False
isprintable 		True
isspace     		False
istitle     		False
isupper     		False
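
The numeric predicates differ in how broad they are: isdecimal accepts only decimal digits, isdigit also accepts characters such as superscript digits, and isnumeric additionally accepts characters such as vulgar fractions. This sketch compares them using the same getattr approach as above:

```python
samples = ['3', '²', '½', 'three']
methods = ['isdecimal', 'isdigit', 'isnumeric']

print('sample'.ljust(8), *(name.ljust(10) for name in methods))
for sample in samples:
    # Look each method up by name and call it on the sample string
    results = [str(getattr(sample, name)()) for name in methods]
    print(sample.ljust(8), *(result.ljust(10) for result in results))
```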


In [None]:
? str.maketrans

Docstring:
Return a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode
ordinals (integers) or characters to Unicode ordinals, strings or None.
Character keys will be then converted to ordinals.
If there are two arguments, they must be strings of equal length, and
in the resulting dictionary, each character in x will be mapped to the
character at the same position in y. If there is a third argument, it
must be a string, whose characters will be mapped to None in the result.
Type:      builtin_function_or_method

In [None]:
str.maketrans('αβγ', 'abc')

{945: 97, 946: 98, 947: 99}
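
The table returned by maketrans is intended to be passed to the string method translate. Characters in the optional third argument of maketrans are mapped to None, which deletes them:

```python
table = str.maketrans('αβγ', 'abc')
print('αβγ γβα'.translate(table))  # abc cba

# Map nothing, but delete every '!' and '|'
delete_table = str.maketrans('', '', '!|')
print('hello|world|!'.translate(delete_table))  # helloworld
```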

In [None]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

## Bytes Related Identifiers

In [None]:
? greeting.encode

Signature:  greeting.encode(encoding='utf-8', errors='strict')
Docstring:
Encode the string using the codec registered for encoding.

encoding
  The encoding in which to encode the string.
errors
  The error handling scheme to use for encoding errors.
  The default is 'strict' meaning that encoding errors raise a
  UnicodeEncodeError.  Other possible values are 'ignore', 'replace' and
  'xmlcharrefreplace' as well as any other name registered with
  codecs.register_error that can handle UnicodeEncodeErrors.
Type:      builtin_function_or_method

In [None]:
greeting.encode()

b'hello|world|!'

In [None]:
greek_greeting = 'Γειά σου Κόσμε!'

In [None]:
greek_greeting.encode()

b'\xce\x93\xce\xb5\xce\xb9\xce\xac \xcf\x83\xce\xbf\xcf\x85 \xce\x9a\xcf\x8c\xcf\x83\xce\xbc\xce\xb5!'
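
Encoding converts characters to bytes, so a string and its encoded form can have different lengths: in UTF-8 each of these Greek letters takes two bytes. The bytes method decode reverses the operation, and the errors argument controls what happens when a character cannot be encoded:

```python
greek_greeting = 'Γειά σου Κόσμε!'
encoded = greek_greeting.encode()  # UTF-8 by default

print(len(greek_greeting), len(encoded))   # characters versus bytes
print(encoded.decode() == greek_greeting)  # True: decoding round-trips

# ASCII cannot represent Greek letters; 'replace' substitutes b'?'
print(greek_greeting.encode('ascii', errors='replace'))
```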