## The str class

The class name str is an abbreviation for string. A string is an immutable ordered collection of Unicode characters. Immutable means that once an instance has been created it cannot later be modified.
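As a minimal sketch of what immutable means in practice (the variable names word and message are only illustrative), attempting to assign to a character of a string raises a TypeError and leaves the original value unchanged:

```python
word = 'hello'

try:
    # Strings do not support item assignment.
    word[0] = 'j'
except TypeError as error:
    message = str(error)

print(message)   # the reason reported by Python
print(word)      # the string still has its original value
```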

## Initialisation Signature

Inputting str( will display the initialisation signature docstring of the string class as a popup balloon:

In [1]:
str()

''

When a <span style="color:green">#</span> is inserted before Python code it becomes a comment. Cells in this notebook that would raise an error are commented out so that all IPython cells can be run without interruption. To view the docstring in the cell above, delete the ( and reinsert it and the docstring should display as a popup balloon.

Some IDEs such as JupyterLab may require the keypress shift ⇧ and tab ↹ to invoke the popup balloon.

Alternatively inputting <span style="color:#C00000">?</span> <span style="color:#0070C0">str</span> in an ipython cell will display the docstring:

In [2]:
? str

Init signature: str(self, /, *args, **kwargs)
Docstring:
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
Type:           type
Subclasses:     StrEnum, DeferredConfigString, FoldedCase, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ...

The purpose of the initialisation signature is to provide the data required to initialise a new instance. 

During construction of a new instance under the hood, two data model identifiers are used:

* \_\_new\_\_ - the constructor
* \_\_init\_\_ - the initialiser

When a class is called, \_\_new\_\_ creates a new instance and \_\_init\_\_ is then invoked to initialise this instance with instance data. 

Python uses Object Oriented Programming (OOP) and every class is based on the object class. A mutable class typically redefines the \_\_init\_\_ data model method so a new instance has the required instance data. An immutable class such as str instead redefines \_\_new\_\_, because the value has to be supplied when the instance is created, and inherits \_\_init\_\_ unchanged from the object class.
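The order in which the two data model methods run can be sketched with a small custom class. The class Label and the list calls are hypothetical names used only for illustration; they are not part of the str class:

```python
calls = []

class Label:
    """Hypothetical class used only to record the construction steps."""

    def __new__(cls, *args, **kwargs):
        calls.append('__new__')
        # object.__new__ creates the bare instance.
        return super().__new__(cls)

    def __init__(self, text):
        calls.append('__init__')
        # The initialiser attaches the instance data.
        self.text = text

label = Label('hello')
print(calls)
print(label.text)
```

Calling the class runs the constructor first and the initialiser second, which is the sequence described above.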

For the string class, the initialisation signature shows alternative ways of supplying instance data for a string.

If the first way is examined:

* In Python parentheses ( ) are used to call a function and supply any necessary input arguments.
* The comma , is used as a delimiter to separate input arguments.
* In Python *self* is used to denote *this instance*. (In other words a string can be constructed from an existing string instance).
* The forward slash / indicates that the preceding input arguments must be provided positionally (first cell below). If the input argument is instead named (second cell below), an error message will display.

In [3]:
str('hello')

'hello'

In [4]:
# str(self='hello')

<span style="color:red">---------------------------------------------------------------------------</span>

<span style="color:red">TypeError:</span> 'self' is an invalid keyword argument for str()

* \*args indicates a variable number of additional positional input arguments. These are typically not used for the str class.
* \*\*kwargs indicates a variable number of additional named input arguments. These are typically not used for the str class.
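The meaning of /, \*args and \*\*kwargs can be sketched with a user-defined function that has the same shape of signature. The function describe and the name failed are hypothetical and exist only for this illustration:

```python
def describe(text, /, *args, **kwargs):
    """Hypothetical function with the shape (text, /, *args, **kwargs)."""
    return text, args, kwargs

# text is supplied positionally; nothing extra is collected.
print(describe('hello'))

# Extra positional arguments go to args, extra named arguments to kwargs.
print(describe('hello', 1, 2, sep='-'))

failed = False
try:
    # text is positional-only, so naming it does not fill the parameter.
    describe(text='hello')
except TypeError:
    failed = True

print(failed)
```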

A string instance can be instantiated using the initialisation signature:

In [5]:
str('hello')

'hello'

Because the string is a fundamental datatype it can also be instantiated using the following:

In [6]:
'hello'

'hello'

Notice the difference in syntax highlighting between the string instance above and the object name below. Once again remove the <span style="color:green">#</span> as the following line of code is commented out.

In [7]:
# hello

Running the cell above will give an error similar to the following:

<span style="color:red">---------------------------------------------------------------------------</span>

<span style="color:red">NameError:</span> name 'hello' is not defined

In other words Python is looking for the object hello which does not exist.

In VSCode open the Variables tab by pressing the Variables button at the top:

<img src='./images/img_001.png' alt='img_001' width='100'/>

The Variables should look like the following:

<img src='./images/img_002.png' alt='img_002' width='500'/>

If the following code is input:

In [8]:
'hello'

'hello'

Notice the value 'hello' is displayed in the cell output. This value is a Python string instance that has no reference (otherwise known as a variable name or instance name) and cannot be reselected. Conceptualise a reference as a label which is used to select the string instance.

A string instance can be assigned to a variable name during instantiation.

In [9]:
greeting = 'hello'

Notice the cell has no output. Instead the string instance is referenced via the variable name or instance name.

Notice that the Variables will be updated:

<img src='./images/img_003.png' alt='img_003' width='500'/>

And the string instance can be accessed from the variable or instance name. Conceptualise the instance name as a label which points to the instance.

In [10]:
greeting

'hello'

The Python interpreter recognises the instance name or variable name and uses it as a reference to retrieve the value, which is shown in the cell output above. If the instance is assigned to another instance name, for example:

In [11]:
greeting2 = greeting

Then in the Variable Explorer, two instance names are shown with the same value:

<img src='./images/img_004.png' alt='img_004' width='500'/>

Conceptualise the instance name as a label which points to the instance. This instance has two instance names which act as aliases of one another. If either instance name is used, the same value is retrieved:

In [12]:
greeting

'hello'

In [13]:
greeting2

'hello'

And if a check is made to see if the two values are equal, the boolean True is returned:

In [14]:
greeting == greeting2

True

Each object in Python has a unique id. If the id of the object corresponding to each instance name is checked:

In [15]:
id(greeting)

2038944621104

In [16]:
id(greeting2)

2038944621104

Then the id is the same, because both these instance names are references or labels to the same physical instance in memory. Therefore the following is True:

In [17]:
greeting is greeting2

True
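The difference between equal values (==) and the same instance (is) can be sketched as follows. The name rebuilt is an assumption added for illustration; it holds a string assembled at run time so that it has the same value as greeting without being assigned from it:

```python
greeting = 'hello'
greeting2 = greeting                  # second label on the same instance
rebuilt = ''.join(['hel', 'lo'])      # equal value, assembled at run time

print(greeting == greeting2, greeting is greeting2)
print(greeting == rebuilt)

# Whether two equal strings share one instance is an implementation
# detail, so the result of this check should not be relied upon.
print(greeting is rebuilt)
```

For comparing string values == is the appropriate check; is only answers whether two labels point to the same instance.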

The del statement can be used to delete an instance name. Conceptualise this as deleting a label:

In [18]:
del greeting

Notice that the instance name greeting is deleted i.e. this label is removed. However the label greeting2 is still present and the instance value 'hello' is unaltered.

<img src='./images/img_005.png' alt='img_005' width='500'/>

If del is used to also delete the instance name or label greeting2:

In [19]:
del greeting2

<img src='./images/img_006.png' alt='img_006' width='500'/>

Then there are no instance names (references) for the string instance which has the value 'hello'. When an instance has no instance name or reference it is considered orphaned and is cleaned up by Python's garbage collection. 

If a new instance is created:

In [20]:
greeting = 'Hello World'

Then the instance name displays in Variables:

<img src='./images/img_007.png' alt='img_007' width='500'/>

If a reassignment is carried out:

In [21]:
greeting = 'hi'

Then the instance name displays in Variables but the value is changed. Recall that the instance name can be conceptualised as a label; what has happened is the label greeting has been removed from the old instance with the value 'Hello World' and placed on the new instance with the value 'hi'. The old instance now has no label, i.e. no reference, and is cleaned up by Python's garbage collection.

<img src='./images/img_008.png' alt='img_008' width='500'/>

Reassignment **moves the instance name** from the old instance to the new instance and does not change the value of an instance. Strings are **immutable** which means that once they are instantiated they can never be modified.
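That reassignment moves a label rather than changing a value can be sketched by keeping a second label on the old instance. The name original is an assumption introduced for this illustration:

```python
greeting = 'Hello World'
original = greeting      # second label on the same instance
greeting = 'hi'          # moves the label greeting to a new instance

print(greeting)          # the label now points to the new instance
print(original)          # the old instance is unchanged
```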

The initialisation signature of the string class shows instantiation using a named keyword input argument object, which has a default value of an empty string.

This named keyword input argument can be assigned to a custom value:

In [22]:
str(object='hello')

'hello'

If not assigned, it takes on its default value which returns an empty string:

In [23]:
str()

''

The string can be instantiated by supplying the named keyword input argument in the initialisation signature and can be assigned to a variable name. Notice the subtlety in spacing in the line below, which follows Python's PEP 8 styling convention. The assignment operator is surrounded by spaces, which subtly emphasises the assignment to the instance name. The keyword argument within the function call has no spacing around the =, as spacing within a function call is instead used after the , separator to visually separate input arguments from one another. Python code will work with or without the spacing but the form below is preferred:

In [24]:
greeting = str(object='hello')
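The two forms listed in the docstring can be sketched with objects other than strings. The variable names below are illustrative only; the first two calls use the str(object='') form and the third uses the str(bytes_or_buffer, encoding) form:

```python
number_text = str(42)                        # from an int instance
float_text = str(3.5)                        # from a float instance
decoded = str(b'hello', encoding='utf-8')    # from bytes plus an encoding

print(number_text, float_text, decoded)
```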

## Identifiers

Two string instances can be instantiated to instance names:

In [25]:
greeting = 'hello'
farewell = 'bye'

<img src='./images/img_009.png' alt='img_009' width='500'/>

If one of the instance names is typed followed by a dot . then a list of identifiers displays. 

Some IDEs such as JupyterLab may require the keypress tab ↹ to invoke the list of identifiers:

In [26]:
# greeting.

Notice that the same list of identifiers displays for the other instance:

In [27]:
# farewell.

These are the same identifiers and originate from the string class str:

In [28]:
# str.

A Python object such as the str instance behaves similarly to a directory (or folder) within Windows Explorer. In Windows Explorer each directory can contain other directories or files.

In Python each instance can contain identifiers. Identifiers is the term for any objects belonging to another object. These objects can be functions (functions which are defined in a class are known as methods) or instances of another Python class.

The identifiers can be viewed as a list using the directory function dir.

In [29]:
? dir

Docstring:
dir([object]) -> list of strings

If called without an argument, return the names in the current scope.
Else, return an alphabetized list of names comprising (some of) the attributes
of the given object, and of attributes reachable from it.
If the object supplies a method named __dir__, it will be used; otherwise
the default dir() logic is used and returns:
  for a module object: the module's attributes.
  for a class object:  its attributes, and recursively the attributes
    of its bases.
  for any other object: its attributes, its class's attributes, and
    recursively the attributes of its class's base classes.
Type:      builtin_function_or_method

In [30]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

The help function can be used to view details about each identifier:

In [31]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

Notice that the identifiers are grouped into:

* (Instance) Methods
* Class Methods\*
* Static Methods
* Data Descriptors\*

Methods are functions and are called. Data Descriptors are other Python instances and are not callable. To call a function and provide input arguments, parentheses are used.

Most methods are instance methods and require an instance which has instance data to work on. Other input arguments may be required.

Class methods are instead bound to a class and are usually used as alternative constructors.

Static methods are regular functions which are bound to neither the instance nor the class but are found within the namespace of the class for convenience.

\* Not present in the string class.
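As a short sketch of the difference, the str class has a static method maketrans and an instance method translate. The names table and result are assumptions used only for this example:

```python
# maketrans is a static method: it needs no instance data and is
# called here directly from the class.
table = str.maketrans('el', 'ip')

# translate is an instance method: it works on the instance data
# of the string it is called from.
result = 'hello'.translate(table)

print(table)
print(result)
```

The table maps the Unicode code point of each character in the first string to the code point of the character at the same position in the second string.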

### Instance Methods

The docstring of the capitalize method can be examined from an instance:

In [32]:
? greeting.capitalize

Signature: greeting.capitalize()
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower
case.
Type:      builtin_function_or_method

And from the class itself:

In [33]:
? str.capitalize

Signature: str.capitalize(self, /)
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower
case.
Type:      method_descriptor

Notice the difference between the two signatures.

When the method capitalize is called from an instance, it has access to the instance data. This method requires no additional data to operate, which is why the parentheses are empty. 

In contrast, when the method is called from the class itself, it has no instance data to work from, so an instance must be provided. In Python *self* means *this instance*. Recall that the / indicates that all prior input arguments should be provided positionally and not as named input arguments.

A string is immutable, meaning that it cannot be modified after it is instantiated. Therefore string methods return a new instance, usually a new instance of the string class. The docstring contains the word *return*, giving details about the return value. 

In [34]:
# Docstring:
# Return a capitalized version of the string.

If the method is called from an instance:

In [35]:
greeting.capitalize()

'Hello'

The new capitalised string instance displays in the cell output. This is a new instance and does not modify the original which can be seen to be unchanged in Variables.

<img src='./images/img_009.png' alt='img_009' width='500'/>

Since this new instance is not assigned an instance name, it has no references and is removed by Python's garbage collection. It can be assigned an instance name using:

In [36]:
capgreeting = greeting.capitalize()

Notice no cell output as the instance is assigned to the instance name. This can be seen in Variables:

<img src='./images/img_010.png' alt='img_010' width='500'/>

If the instance method is invoked from a class, the instance self must be provided as the first input argument:

In [37]:
str.capitalize(farewell)

'Bye'

Failure to supply an instance will result in an error:

In [38]:
# str.capitalize()

<span style="color:red">TypeError</span>: unbound method str.capitalize() needs an argument
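One situation where calling a method from the class is convenient is when it is handed to another function that supplies the instance. The sketch below assumes a hypothetical list named words and uses the builtins function map, which passes each string to str.capitalize as self:

```python
words = ['hello', 'bye']

# Calling from an instance and calling from the class give the same result.
print(words[1].capitalize() == str.capitalize(words[1]))

# map supplies each string as the instance self.
capitalised = list(map(str.capitalize, words))
print(capitalised)
```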

## Data Model Identifiers and the object Base Class

Many of the identifiers in the str class begin and end with a **d**ouble **under**score. These are data model identifiers which are sometimes colloquially called dunder identifiers and generally follow the design pattern of the object base class. 

The method resolution order function will return the method resolution order of the string class:

In [39]:
? str.mro

Signature: str.mro()
Docstring: Return a type's method resolution order.
Type:      builtin_function_or_method

In [40]:
str.mro()

[str, object]

Notice that it returns a Python list of str and object. This means that str (the child class) is based upon object (the parent class), and object is the base class for everything in Python's object oriented programming. A str instance can use methods from both the object class and the str class, but methods that are redefined in the str class are preferred. If help is used on the object class, a list of identifiers displays:

In [41]:
help(object)

Help on class object in module builtins:

class object
 |  The base class of the class hierarchy.
 |  
 |  When called, it accepts no arguments and returns a new featureless
 |  instance that has no instance attributes and cannot be given any.
 |  
 |  Built-in subclasses:
 |      anext_awaitable
 |      async_generator
 |      async_generator_asend
 |      async_generator_athrow
 |      ... and 93 other subclasses
 |  
 |  Methods defined here:
 |  
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |  
 |  __dir__(self, /)
 |      Default dir() implementation.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Default object formatter.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getstate__(self, /)
 |      Helper for pickle.
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(sel

In [42]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

Notice that the identifiers are grouped into:

* (Instance) Methods
* Class Methods
* Static Methods
* Data Descriptors

Notice that most of these begin and end with a double underscore.

Recall during construction of a new instance under the hood, two data model identifiers are used:

* \_\_new\_\_ - the constructor
* \_\_init\_\_ - the initialiser

\_\_new\_\_ is redefined in the str class, so it is preferred over the \_\_new\_\_ in the object class. This is because a string is immutable and its value must be supplied when the instance is created. The \_\_init\_\_ data model method is instead inherited directly from the object class.

In [43]:
? str

Init signature: str(self, /, *args, **kwargs)
Docstring:
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
Type:           type
Subclasses:     StrEnum, DeferredConfigString, FoldedCase, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ...

In [44]:
# instance_name = str(self='hello')

<span style="color:red">TypeError</span>: 'self' is an invalid keyword argument for str()

Notice that the object class has the identifier \_\_dir\_\_. The data model identifier is not typically used directly but instead maps to a corresponding builtins function:

In [45]:
? object.__dir__

Signature: object.__dir__(self, /)
Docstring: Default dir() implementation.
Type:      method_descriptor

In [46]:
? str.__dir__

Signature: str.__dir__(self, /)
Docstring: Default dir() implementation.
Type:      method_descriptor

The docstring of the inbuilt function dir can be examined. Note that it takes an object as an input argument. This can be the object class itself or an instance of another class, such as a str instance, which recall is based upon object.

In [47]:
? dir

Docstring:
dir([object]) -> list of strings

If called without an argument, return the names in the current scope.
Else, return an alphabetized list of names comprising (some of) the attributes
of the given object, and of attributes reachable from it.
If the object supplies a method named __dir__, it will be used; otherwise
the default dir() logic is used and returns:
  for a module object: the module's attributes.
  for a class object:  its attributes, and recursively the attributes
    of its bases.
  for any other object: its attributes, its class's attributes, and
    recursively the attributes of its class's base classes.
Type:      builtin_function_or_method

The function dir can be used on a string instance:

In [48]:
print(dir(greeting), end=' ')

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'] 
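Because dir returns an ordinary list of strings, it can be filtered with the string methods themselves. The sketch below separates the data model identifiers from the regular identifiers; the names datamodel and regular are assumptions chosen for this example:

```python
greeting = 'hello'

# Data model identifiers begin and end with a double underscore.
datamodel = [name for name in dir(greeting)
             if name.startswith('__') and name.endswith('__')]

# Regular identifiers do not begin with an underscore.
regular = [name for name in dir(greeting) if not name.startswith('_')]

print(len(datamodel), len(regular))
print(regular[:5])
```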

If used without an input argument it instead examines the local scope:

In [49]:
dir()

['In',
 'Out',
 '_',
 '_1',
 '_10',
 '_12',
 '_13',
 '_14',
 '_15',
 '_16',
 '_17',
 '_22',
 '_23',
 '_3',
 '_30',
 '_35',
 '_37',
 '_40',
 '_5',
 '_6',
 '_8',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '__vsc_ipynb_file__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i18',
 '_i19',
 '_i2',
 '_i20',
 '_i21',
 '_i22',
 '_i23',
 '_i24',
 '_i25',
 '_i26',
 '_i27',
 '_i28',
 '_i29',
 '_i3',
 '_i30',
 '_i31',
 '_i32',
 '_i33',
 '_i34',
 '_i35',
 '_i36',
 '_i37',
 '_i38',
 '_i39',
 '_i4',
 '_i40',
 '_i41',
 '_i42',
 '_i43',
 '_i44',
 '_i45',
 '_i46',
 '_i47',
 '_i48',
 '_i49',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'capgreeting',
 'exit',
 'farewell',
 'get_ipython',
 'greeting',
 'open',
 'quit']

Note the inclusion of \_\_builtins\_\_, the module which contains all the builtin classes. In Python a module can be imported using:

In [50]:
import builtins

Identifiers can be viewed from the module by inputting the module name followed by a dot .

In [51]:
# builtins.

And the str class can be accessed using:

In [52]:
builtins.str

str

Which is the same as:

In [53]:
str

str

And:

In [54]:
builtins.str is str

True

An individual object such as the string class can be imported from a module using:

In [55]:
from builtins import str

And multiple objects can be imported by using a comma as a delimiter:

In [56]:
from builtins import object, str

The \* can be used to import all objects from a module, although this is not normally recommended:

In [57]:
# from builtins import *

It is worthwhile examining the objects from builtins in groupings. 

Pascal Case classes are generally reserved for exceptions (error and warning classes); the NameError, for example, has already been encountered:

In [58]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (isfunction and isclass and isupper and not isdatamodel):
        print(identifier, end=' ')



Pascal Case identifiers that are not callable are constants. Constants are instances of inbuilt classes that have a constant value:

In [59]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (not isfunction and not isclass and isupper and not isdatamodel):
        print(identifier, end=' ')

Ellipsis False None NotImplemented True 

Lower case classes include the object and the str class:

In [60]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (isfunction and isclass and not isupper and not isdatamodel):
        print(identifier, end=' ')

bool bytearray bytes classmethod complex dict enumerate filter float frozenset int list map memoryview object property range reversed set slice staticmethod str super tuple type zip 

Lower case functions are typically used in place of the data model identifiers, for example dir is used in place of the data model identifier \_\_dir\_\_:

In [61]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (isfunction and not isclass and not isupper and not isdatamodel):
        print(identifier, end=' ')

abs aiter all anext any ascii bin breakpoint callable chr compile copyright credits delattr dir display divmod eval exec execfile format get_ipython getattr globals hasattr hash help hex id input isinstance issubclass iter len license locals max min next oct open ord pow print repr round runfile setattr sorted sum vars 

In [62]:
? abs

Signature: abs(x, /)
Docstring: Return the absolute value of the argument.
Type:      builtin_function_or_method

In [63]:
? bool

Init signature: bool(self, /, *args, **kwargs)
Docstring:
bool(x) -> bool

Returns True when the argument x is true, False otherwise.
The builtins True and False are the only two instances of the class bool.
The class bool is a subclass of the class int, and cannot be subclassed.
Type:           type
Subclasses:

There are also callable identifiers that begin and end with a double underscore. For example \_\_import\_\_ is invoked when a module is imported using the Python keyword import:

In [64]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (isfunction and not isclass and not isupper and isdatamodel):
        print(identifier, end=' ')

__build_class__ __import__ 

And there are also data model identifiers that are instances of other classes, typically strings. For example, \_\_name\_\_ is the module's name, \_\_package\_\_ is the package the module is contained in and \_\_doc\_\_ is the module's docstring:

In [65]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (not isfunction and not isclass and not isupper and isdatamodel):
        print(identifier, end=' ')

__IPYTHON__ __debug__ __doc__ __name__ __package__ __spec__ 
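The groupings above apply the same checks repeatedly with different conditions. As a hedged alternative, the sketch below makes a single pass over builtins and stores every identifier under a (style, kind) key. The dictionary groups and the labels 'pascal', 'lower', 'datamodel', 'class', 'function' and 'instance' are assumptions chosen for this example, and the exact contents depend on the Python environment:

```python
import builtins
from collections import defaultdict

groups = defaultdict(list)

for identifier in dir(builtins):
    obj = getattr(builtins, identifier)    # look the object up once

    # Same tests as in the cells above: class, other callable, or instance.
    if type(obj) == type:
        kind = 'class'
    elif callable(obj):
        kind = 'function'
    else:
        kind = 'instance'

    # Naming style taken from the first character of the identifier.
    if identifier[0] == '_':
        style = 'datamodel'
    elif identifier[0].isupper():
        style = 'pascal'
    else:
        style = 'lower'

    groups[style, kind].append(identifier)

for key in sorted(groups):
    print(key, len(groups[key]))
```

A single grouping such as groups['lower', 'class'] can then be printed to reproduce one of the listings above.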

The name is builtins:

In [66]:
builtins.__name__

'builtins'

The package is an empty string as it is a module builtin to Python:

In [67]:
builtins.__package__

''

The docstring is as follows:

In [68]:
builtins.__doc__

"Built-in functions, types, exceptions, and other objects.\n\nThis module provides direct access to all 'built-in'\nidentifiers of Python; for example, builtins.len is\nthe full name for the built-in function len().\n\nThis module is not normally accessed explicitly by most\napplications, but can be useful in modules that provide\nobjects with the same name as a built-in value, but in\nwhich the built-in of that name is also needed."

## Case Methods

The str case method capitalize has already been examined:

In [69]:
? greeting.capitalize

Signature: greeting.capitalize()
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower
case.
Type:      builtin_function_or_method

In [70]:
greeting.capitalize()

'Hello'

There are associated identifiers such as:

* lower
* casefold
* upper
* title
* swapcase

The docstrings of these can all be examined:

In [71]:
? greeting.lower

Signature: greeting.lower()
Docstring: Return a copy of the string converted to lowercase.
Type:      builtin_function_or_method

In [72]:
? greeting.casefold

Signature: greeting.casefold()
Docstring: Return a version of the string suitable for caseless comparisons.
Type:      builtin_function_or_method

In [73]:
? greeting.upper

Signature: greeting.upper()
Docstring: Return a copy of the string converted to uppercase.
Type:      builtin_function_or_method

In [74]:
? greeting.title

Signature: greeting.title()
Docstring:
Return a version of the string where each word is titlecased.

More specifically, words start with uppercased characters and all remaining
cased characters have lower case.
Type:      builtin_function_or_method

In [75]:
? greeting.swapcase

Signature: greeting.swapcase()
Docstring: Convert uppercase characters to lowercase and lowercase characters to uppercase.
Type:      builtin_function_or_method

All of these case identifiers only require instance data and return a new string:

In [77]:
'hEllo wOrld'.lower()

'hello world'

In [78]:
'hEllo wOrld'.casefold()

'hello world'

In [79]:
'hEllo wOrld'.upper()

'HELLO WORLD'

In [80]:
'hEllo wOrld'.swapcase()

'HeLLO WoRLD'

In [81]:
'hEllo wOrld'.title()

'Hello World'

casefold is similar to lower but is more aggressive, as it is intended for caseless comparisons and has more support for non-English characters. For example the German ß is converted to ss and the Greek final sigma ς is converted to σ, whereas lower leaves both unchanged:

In [82]:
'ÄäÜüÖöẞß'.lower()

'ääüüöößß'

In [83]:
'ÄäÜüÖöẞß'.casefold()

'ääüüöössss'

In [84]:
'ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΛλΜμΝνΞξΟοΠπΡρΣσςΤτΥυΦφΧχΨψΩω'.lower()

'ααββγγδδεεζζηηθθιικκλλμμννξξοοππρρσσςττυυφφχχψψωω'

In [85]:
'ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΛλΜμΝνΞξΟοΠπΡρΣσςΤτΥυΦφΧχΨψΩω'.casefold()

'ααββγγδδεεζζηηθθιικκλλμμννξξοοππρρσσσττυυφφχχψψωω'
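
As a minimal sketch of why this matters, two spellings of the same German word can be compared using both methods. The instance names word_lower and word_upper are only chosen for illustration:

```python
# Two spellings of the German word for street
word_lower = 'Straße'
word_upper = 'STRASSE'

# lower leaves ß unchanged so the two strings do not match
lower_match = word_lower.lower() == word_upper.lower()

# casefold converts ß to ss so the two strings match
casefold_match = word_lower.casefold() == word_upper.casefold()

print(lower_match, casefold_match)
```

This is why casefold is generally preferred over lower when comparing strings without regard to case.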

## Boolean Identifiers

A number of identifiers are used to examine a specific property of a string and return a boolean of True if it has that property and False otherwise: 

In [86]:
? greeting.isupper

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misupper[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is an uppercase string, False otherwise.

A string is uppercase if all cased characters in the string are uppercase and
there is at least one cased character in the string.
[1;31mType:[0m      builtin_function_or_method

In [87]:
? greeting.islower

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0mislower[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and
there is at least one cased character in the string.
[1;31mType:[0m      builtin_function_or_method

In [88]:
? greeting.istitle

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0mistitle[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a title-cased string, False otherwise.

In a title-cased string, upper- and title-case characters may only
follow uncased characters and lowercase characters only cased ones.
[1;31mType:[0m      builtin_function_or_method

For example:

In [89]:
'HELLO'.isupper()


True

In [90]:
'Hello'.isupper()


False

In [91]:
'hello'.islower()

True

In [92]:
'Hello'.islower()

False

In [93]:
'Hello'.istitle()

True
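
The docstrings above mention cased characters. A short sketch of what this means in practice is shown below, the instance names are only for illustration:

```python
# Digits, spaces and punctuation are uncased so they are ignored by the check
mixed = 'HELLO 123!'.isupper()

# A string with no cased characters is neither upper case nor lower case
no_cased_upper = '123'.isupper()
no_cased_lower = '123'.islower()

# Every word has to begin with an upper case letter for title case
title_all = 'Hello World'.istitle()
title_one = 'Hello world'.istitle()

print(mixed, no_cased_upper, no_cased_lower, title_all, title_one)
```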

## Valid Identifier Names

The method isidentifier will check to see if a possible string is valid for an identifier name. This can be useful to check before assignment of an instance to an object name:

In [94]:
? greeting.isidentifier

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misidentifier[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a valid Python identifier, False otherwise.

Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
such as "def" or "class".
[1;31mType:[0m      builtin_function_or_method

Lower case letters are acceptable:

In [95]:
'hello'.isidentifier()

True

In [96]:
# hello = 'hello'

A space is not acceptable:

In [97]:
'hello world'.isidentifier()

False

In [98]:
# hello world = 'hello'

An underscore is acceptable and instance names are generally named using snake_case:

In [99]:
'hello_world'.isidentifier()

True

In [100]:
# hello_world = 'hello'

Numbers are acceptable:

In [101]:
'hello_world2'.isidentifier()

True

However an identifier cannot begin with a number:

In [102]:
'2hello_world'.isidentifier()

False

Other special characters are not acceptable:

In [103]:
'hello-world2'.isidentifier()

False

In [104]:
# hello-world

Upper case is acceptable but generally PascalCase is reserved for a class name:

In [105]:
'PascalCase'.isidentifier()

True

All capitals is also acceptable but this is generally reserved for a constant:

In [106]:
'ALL_CAPS'.isidentifier()

True

An instance name shouldn't match any of the identifiers in builtins, otherwise it will shadow the builtin (until the instance name is deleted or the kernel is restarted), which leads to confusion when the builtin is later needed. There are also some identifiers which are reserved as keywords. These can be seen by importing the keyword module:

In [107]:
import keyword

In [108]:
# keyword.

In [109]:
for kw in keyword.kwlist:
    print(kw, end=' ')

False None True and as assert async await break class continue def del elif else except finally for from global if import in is lambda nonlocal not or pass raise return try while with yield 

In [110]:
for skw in keyword.softkwlist:
    print(skw, end=' ')

_ case match 

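If a keyword is reassigned an error message will display. An identifier from builtins such as str is not a keyword, so it can be reassigned without an error. The instance name then shadows the builtin class until it is imported again from builtins: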

In [111]:
# with = 'hello'

    with = 'hello'

         ^

<span style="color:red">SyntaxError</span>: invalid syntax

In [112]:
str = 'hello'

In [113]:
str

'hello'

In [114]:
from builtins import str

In [115]:
str()

''

In [116]:
# import module

In [117]:
# module.str

In [118]:
# builtins.str
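
The checks in this section can be combined into a single function. The following is only a sketch, and the function name is_safe_name is hypothetical rather than something provided by Python:

```python
import builtins
import keyword


def is_safe_name(name):
    """Return True if name is a valid identifier that is not a keyword
    and does not shadow an identifier from builtins (hypothetical helper)."""
    return (name.isidentifier()
            and not keyword.iskeyword(name)
            and not hasattr(builtins, name))


for candidate in ('hello_world', '2hello_world', 'with', 'str'):
    print(candidate, is_safe_name(candidate))
```

Note that soft keywords such as match are not rejected by keyword.iskeyword, as they are only reserved in specific contexts.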

## The American Standard Code for Information Interchange (ASCII)

Early computers were based on a typewriter that essentially prints English characters onto a sheet of paper. In order to achieve such a task a number of non-printable commands such as the carriage return (moving the carriage back to the left), the line feed or new line (moving the piece of paper up by the height of a line) and the form feed (advancing to the next sheet of paper) are required, as well as the printable characters such as the English letters, numbers, and whitespace.

<img src='./images/img_011.png' alt='img_011' width='800'/>

Each command has to be mapped physically into the computer's memory. Fundamentally the computer can only store data in the form of a bit, which is essentially a digital switch.

A single switch has the possible values 0, 1 which is 2 ** 1 = 2 combinations.

More typically 8 of these switches are combined into a single unit which makes up a byte and gives 2 ** 8 = 256 combinations.

Note the combination 0 is included, so the range 0:256 is inclusive of the lower bound 0 and exclusive of the upper bound 256.

<img src='./images/img_012.png' alt='img_012' width='400'/>

One of the most popular sets of commands was developed in the United States and is known as ASCII. The first 32 combinations correspond to non-printable commands such as the carriage return and form feed as previously discussed, in addition to a number of additional hardware related commands. The 33rd combination is the space.

Each bit can be 0 or 1 and the byte sequence corresponds to the physical position of the 8 switches. As binary is not human readable, the hexadecimal system is used, which has the 16 characters 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f. 2 ** 4 is 16 combinations and therefore each half of the byte is represented by its own hexadecimal character. These numbering systems are shown alongside the number in decimal.


|byte|hex|num|command|
|---|---|---|---|
|00000000|00|000|null|
|00000001|01|001|start of heading|
|00000010|02|002|start of text|
|00000011|03|003|end of text|
|00000100|04|004|end of transmission|
|00000101|05|005|enquiry|
|00000110|06|006|acknowledge|
|00000111|07|007|bell|
|00001000|08|008|**backspace**|
|00001001|09|009|**horizontal tab**|
|00001010|0a|010|**new line**|
|00001011|0b|011|**vertical tab**|
|00001100|0c|012|**form feed**|
|00001101|0d|013|**carriage return**|
|00001110|0e|014|shift out|
|00001111|0f|015|shift in|
|00010000|10|016|data link escape|
|00010001|11|017|device control 1|
|00010010|12|018|device control 2|
|00010011|13|019|device control 3|
|00010100|14|020|device control 4|
|00010101|15|021|negative acknowledge|
|00010110|16|022|synchronous idle|
|00010111|17|023|end of transmission block|
|00011000|18|024|cancel|
|00011001|19|025|end of medium|
|00011010|1a|026|substitute|
|00011011|1b|027|**escape**|
|00011100|1c|028|file separator|
|00011101|1d|029|group separator|
|00011110|1e|030|record separator|
|00011111|1f|031|unit separator|
|00100000|20|032|**space**|

The remaining combinations, up to 127 which is half the range of a byte, contain the characters most commonly used in the English language and end with the delete command.

|byte|hex|num|character|
|---|---|---|---|
|00100001|21|033|!|
|00100010|22|034|"|
|00100011|23|035|#|
|00100100|24|036|$|
|00100101|25|037|%|
|00100110|26|038|&|
|00100111|27|039|'|
|00101000|28|040|(|
|00101001|29|041|)|
|00101010|2a|042|*|
|00101011|2b|043|+|
|00101100|2c|044|,|
|00101101|2d|045|-|
|00101110|2e|046|.|
|00101111|2f|047|/|
|00110000|30|048|0|
|00110001|31|049|1|
|00110010|32|050|2|
|00110011|33|051|3|
|00110100|34|052|4|
|00110101|35|053|5|
|00110110|36|054|6|
|00110111|37|055|7|
|00111000|38|056|8|
|00111001|39|057|9|
|00111010|3a|058|:|
|00111011|3b|059|;|
|00111100|3c|060|<|
|00111101|3d|061|=|
|00111110|3e|062|>|
|00111111|3f|063|?|
|01000000|40|064|@|
|01000001|41|065|A|
|01000010|42|066|B|
|01000011|43|067|C|
|01000100|44|068|D|
|01000101|45|069|E|
|01000110|46|070|F|
|01000111|47|071|G|
|01001000|48|072|H|
|01001001|49|073|I|
|01001010|4a|074|J|
|01001011|4b|075|K|
|01001100|4c|076|L|
|01001101|4d|077|M|
|01001110|4e|078|N|
|01001111|4f|079|O|
|01010000|50|080|P|
|01010001|51|081|Q|
|01010010|52|082|R|
|01010011|53|083|S|
|01010100|54|084|T|
|01010101|55|085|U|
|01010110|56|086|V|
|01010111|57|087|W|
|01011000|58|088|X|
|01011001|59|089|Y|
|01011010|5a|090|Z|
|01011011|5b|091|[|
|01011100|5c|092|\|
|01011101|5d|093|]|
|01011110|5e|094|^|
|01011111|5f|095|_|
|01100000|60|096|`|
|01100001|61|097|a|
|01100010|62|098|b|
|01100011|63|099|c|
|01100100|64|100|d|
|01100101|65|101|e|
|01100110|66|102|f|
|01100111|67|103|g|
|01101000|68|104|h|
|01101001|69|105|i|
|01101010|6a|106|j|
|01101011|6b|107|k|
|01101100|6c|108|l|
|01101101|6d|109|m|
|01101110|6e|110|n|
|01101111|6f|111|o|
|01110000|70|112|p|
|01110001|71|113|q|
|01110010|72|114|r|
|01110011|73|115|s|
|01110100|74|116|t|
|01110101|75|117|u|
|01110110|76|118|v|
|01110111|77|119|w|
|01111000|78|120|x|
|01111001|79|121|y|
|01111010|7a|122|z|
|01111011|7b|123|{|
|01111100|7c|124|\||
|01111101|7d|125|}|
|01111110|7e|126|~|
|01111111|7f|127|DEL|


As each character maps to a numeric byte sequence it is ordinal. The builtins function ord will return the ordinal value of a character as a number in decimal:

In [119]:
? ord

[1;31mSignature:[0m  [0mord[0m[1;33m([0m[0mc[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m Return the Unicode code point for a one-character string.
[1;31mType:[0m      builtin_function_or_method

For example the ordinal value of the string '3' can be checked:

In [120]:
ord('3')

51

Notice the difference between the string of the number '3' and the number 51. This number can be converted into a binary string or hex string using the builtins bin and hex functions respectively:

In [121]:
? bin

[1;31mSignature:[0m  [0mbin[0m[1;33m([0m[0mnumber[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return the binary representation of an integer.

>>> bin(2796202)
'0b1010101010101010101010'
[1;31mType:[0m      builtin_function_or_method

In [122]:
? hex

[1;31mSignature:[0m  [0mhex[0m[1;33m([0m[0mnumber[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return the hexadecimal representation of an integer.

>>> hex(12648430)
'0xc0ffee'
[1;31mType:[0m      builtin_function_or_method

For example:

In [123]:
bin(ord('3'))

'0b110011'

This can be conceptualised as the following with the leading zeros:

In [124]:
'0b' + bin(ord('3')).lstrip('0b').zfill(8)

'0b00110011'

Note the prefix 0b indicates a binary number and that the two leading zeros are not displayed by bin. The hexadecimal string is:

In [125]:
hex(ord('3'))

'0x33'

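Note the prefix 0x indicates a hexadecimal number. Each hexadecimal character represents 4 bits, so the number 16 is the first to require a fifth bit in binary: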

In [126]:
bin(16)

'0b10000'
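
The rows of the ASCII table can be rebuilt from these functions. The sketch below assumes a hypothetical helper called ascii_row and uses the builtins function format to keep the leading zeros. It also uses int and chr, which are not examined above, to convert in the opposite direction:

```python
def ascii_row(character):
    """Return the table row for a single character (hypothetical helper)."""
    number = ord(character)
    byte = format(number, '08b')         # 8 binary digits, leading zeros kept
    hexadecimal = format(number, '02x')  # 2 hexadecimal digits
    decimal = format(number, '03d')      # 3 decimal digits
    return f'|{byte}|{hexadecimal}|{decimal}|{character}|'


print(ascii_row('3'))
print(ascii_row('A'))

# int converts a binary or hexadecimal string back into the number
from_binary = int('0b110011', 2)
from_hex = int('0x33', 16)

# chr is the inverse of ord
character = chr(from_hex)
print(from_binary, from_hex, character)
```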

The string module contains a number of useful strings which group characters. It can be imported using:

In [127]:
import string

The identifiers can be viewed by inputting:

In [128]:
# string.

These are mainly string instances.

ascii_letters gives all English letters:

In [129]:
string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

This can be split into lowercase and uppercase using ascii_lowercase and ascii_uppercase respectively: 

In [130]:
string.ascii_lowercase

'abcdefghijklmnopqrstuvwxyz'

In [131]:
string.ascii_uppercase

'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

digits gives the 10 digits used in the decimal system:

In [132]:
string.digits

'0123456789'

hexdigits gives the characters that can be used as the 16 hexadecimal digits. Note the letters are listed in both cases as hexadecimal is not case sensitive, for example a and A both represent the number 10:

In [133]:
string.hexdigits

'0123456789abcdefABCDEF'

printable gives the printable characters:

In [134]:
string.printable

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

punctuation gives the punctuation:

In [135]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
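
These groupings are related to one another and can be used for simple checks. The following is a sketch only, the function name is_hex_string is hypothetical, and the whitespace grouping it uses is the one examined next:

```python
import string

# ascii_letters is the lower case letters followed by the upper case letters
letters_match = string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase

# printable is made up from four of the other groupings
printable_match = string.printable == (string.digits + string.ascii_letters
                                       + string.punctuation + string.whitespace)


def is_hex_string(text):
    """Return True if text is not empty and only contains hexadecimal digits."""
    return len(text) > 0 and all(character in string.hexdigits for character in text)


print(letters_match, printable_match)
print(is_hex_string('c0ffee'), is_hex_string('coffee'))
```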

whitespace gives the whitespace characters:

In [136]:
string.whitespace

' \t\n\r\x0b\x0c'

With the exception of the space, these are shown using escape sequences which will be explored in a moment. Now that ASCII and the other groupings seen within the string module are understood the additional boolean identifiers can be examined.

These boolean identifiers all act upon instance data and return a bool. Their docstrings are as follows:

In [137]:
? greeting.isprintable

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misprintable[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is printable, False otherwise.

A string is printable if all of its characters are considered printable in
repr() or if it is empty.
[1;31mType:[0m      builtin_function_or_method

In [138]:
? greeting.isascii

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misascii[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if all characters in the string are ASCII, False otherwise.

ASCII characters have code points in the range U+0000-U+007F.
Empty string is ASCII too.
[1;31mType:[0m      builtin_function_or_method

In [139]:
? greeting.isalnum

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misalnum[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is an alpha-numeric string, False otherwise.

A string is alpha-numeric if all characters in the string are alpha-numeric and
there is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [140]:
? greeting.isalpha

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misalpha[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is an alphabetic string, False otherwise.

A string is alphabetic if all characters in the string are alphabetic and there
is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [141]:
? greeting.isspace

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misspace[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a whitespace string, False otherwise.

A string is whitespace if all characters in the string are whitespace and there
is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [142]:
? greeting.isdecimal

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misdecimal[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a decimal string, False otherwise.

A string is a decimal string if all characters in the string are decimal and
there is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [143]:
? greeting.isdigit

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misdigit[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a digit string, False otherwise.

A string is a digit string if all characters in the string are digits and there
is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [144]:
? greeting.isnumeric

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misnumeric[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a numeric string, False otherwise.

A string is numeric if all characters in the string are numeric and there is at
least one character in the string.
[1;31mType:[0m      builtin_function_or_method

For example:

In [145]:
'hello Γειά σου 123'.isprintable()

True

In [146]:
'hello Γειά σου 123'.isascii()

False

In [147]:
'hello 123 !'.isascii()

True

In [148]:
'hello 123 !'.isalnum()

False

In [149]:
'hello123'.isalnum()

True

In [150]:
'hello123'.isalpha()

False

In [151]:
'hello'.isalpha()

True

In [152]:
'hello'.isspace()

False

The boolean numeric string identifiers have subtle differences. These can be seen by examining the response of the identifier for the following number groupings:

In [153]:
numeric_groups = {'ascii': '0123456789', 
                  'font1': '𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿', 
                  'font2': '𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵', 
                  'font3': '𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡', 
                  'subscript': '₀₁₂₃₄₅₆₇₈₉',
                  'superscript': '⁰¹²³⁴⁵⁶⁷⁸⁹',
                  'circled1': '➀➁➂➃➄➅➆➇➈',
                  'circled2': '➉',
                  'fractions': '½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉', 
                  'asciihex': '0123456789abcdef', }

In [154]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isdecimal())
    

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ False
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ False
circled1 ➀➁➂➃➄➅➆➇➈ False
circled2 ➉ False
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ False
asciihex 0123456789abcdef False


In [155]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isdigit())

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ True
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ True
circled1 ➀➁➂➃➄➅➆➇➈ True
circled2 ➉ False
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ False
asciihex 0123456789abcdef False


In [156]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isnumeric())

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ True
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ True
circled1 ➀➁➂➃➄➅➆➇➈ True
circled2 ➉ True
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ True
asciihex 0123456789abcdef False


In [157]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isalnum())

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ True
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ True
circled1 ➀➁➂➃➄➅➆➇➈ True
circled2 ➉ True
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ True
asciihex 0123456789abcdef True


The boolean identifiers are often used for checks. An example of such a check was demonstrated when grouping identifiers from builtins when using a for loop.
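
As a further sketch of such a check, isdecimal can be used before casting a string to an int, as characters which are only digits or only numeric are not accepted by int. The function name parse_int is hypothetical, and the unicodedata module is used to show the value behind the other characters:

```python
import unicodedata


def parse_int(text):
    """Return int(text) if text is a decimal string, otherwise None."""
    return int(text) if text.isdecimal() else None


print(parse_int('123'))  # a decimal string
print(parse_int('²'))    # a digit but not a decimal
print(parse_int('½'))    # numeric but not a digit

# unicodedata can retrieve the value behind a digit or numeric character
print(unicodedata.digit('²'), unicodedata.numeric('½'))
```

Note that a sign is not a decimal character, so this sketch returns None for a string such as '-1' even though int would accept it.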

## Escape Characters

The \\ is a special symbol used to insert an escape character. The most commonly used escape characters have the form:

In [158]:
print('|  |') # no escape character

|  |


In [159]:
print('| \t |') # tab

| 	 |


In [160]:
print('| \n |') # new line

| 
 |


In [161]:
print('| \\ |') # backslash itself

| \ |


In [162]:
print('| \' |') # single quotation

| ' |


In [163]:
print('| \" |') # double quotation

| " |


An ASCII character, or any other character within the range of a single byte, can be inserted using the escape character \\x followed by 2 hexadecimal digits:

In [164]:
hex(ord('!'))

'0x21'

In [165]:
'\x21' # a byte (2 hexadecimal digits)

'!'

In [166]:
print('| \x09 |') # tab

| 	 |


In [167]:
# '\x09'

In [168]:
# '\x9'

<span style="color:red">SyntaxError</span>: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-2: truncated \xXX escape

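The most commonly used Unicode characters span over 2 bytes and can therefore be inserted using the escape character \\u followed by 4 hexadecimal digits. As with \\x, all of the digits are required, so a leading zero cannot be omitted: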

In [169]:
hex(ord('α'))

'0x3b1'

In [170]:
'\u03b1' # 2 bytes (4 hexadecimal digits)

'α'

In [171]:
# '\u3b1'

<span style="color:red">SyntaxError</span>: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-4: truncated \uXXXX escape

Note that the hex form is normally used for a byte that is not printable. If the 6 whitespace characters are examined in more detail:

In [172]:
string.whitespace

' \t\n\r\x0b\x0c'

|name|character|byte|
|---|---|---|
|space|' '|'\\x20'|
|tab|'\\t'|'\\x09'|
|new line|'\\n'|'\\x0a'|
|carriage return|'\\r'|'\\x0d'|
|vertical tab|'\\v'|'\\x0b'|
|form feed|'\\f'|'\\x0c'|

In [173]:
' ' == '\x20'

True

In [174]:
'\t' == '\x09'

True

In [175]:
'\n' == '\x0a'

True

In [176]:
'\r' == '\x0d'

True

It is not common to do so, however each character in a string can also be inserted as an escape character:

In [177]:
'\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21'

'hello world!'
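
The escaped form above does not need to be typed by hand. The sketch below assumes a hypothetical helper called to_hex_escapes and uses the unicode_escape codec to process the escape characters again:

```python
def to_hex_escapes(text):
    """Return text written using 2 digit hexadecimal escapes (hypothetical helper).

    Only characters within the range of a single byte can be written this way."""
    if any(ord(character) > 0xff for character in text):
        raise ValueError('every character must fit in a single byte')
    return ''.join(f'\\x{ord(character):02x}' for character in text)


escaped = to_hex_escapes('hello world!')
print(escaped)

# The unicode_escape codec processes the escape characters again
restored = escaped.encode('ascii').decode('unicode_escape')
print(restored)
```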

In [178]:
import unicodedata

In [179]:
# unicodedata.

In [180]:
unicodedata.unidata_version

'14.0.0'

In [181]:
'\U0000303a'

'〺'
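
The cells above use the unicodedata module and the escape character \\U, which is followed by 8 hexadecimal digits. A short sketch of how these relate to the shorter forms is given below. The instance names are only for illustration:

```python
import unicodedata

alpha = '\u03b1'

# Each character has an official name in the Unicode database
name = unicodedata.name(alpha)
print(name)

# The name can be used to look up or to insert the character
from_lookup = unicodedata.lookup('GREEK SMALL LETTER ALPHA')
from_name_escape = '\N{GREEK SMALL LETTER ALPHA}'

# The 8 digit form can also be used for a character that fits in 4 digits
from_long_escape = '\U000003b1'
print(alpha == from_lookup == from_name_escape == from_long_escape)

# Some characters are beyond the range of 4 hexadecimal digits
grinning_face = '\U0001f600'
print(hex(ord(grinning_face)))
```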

A translation table can be created using the static method maketrans for use with the instance method translate:

In [182]:
? greeting.translate

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0mtranslate[0m[1;33m([0m[0mtable[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Replace each character in the string using the given translation table.

  table
    Translation table, which must be a mapping of Unicode ordinals to
    Unicode ordinals, strings, or None.

The table must implement lookup/indexing via __getitem__, for instance a
dictionary or list.  If this operation raises LookupError, the character is
left untouched.  Characters mapped to None are deleted.
[1;31mType:[0m      builtin_function_or_method

In [183]:
? str.maketrans

[1;31mDocstring:[0m
Return a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode
ordinals (integers) or characters to Unicode ordinals, strings or None.
Character keys will be then converted to ordinals.
If there are two arguments, they must be strings of equal length, and
in the resulting dictionary, each character in x will be mapped to the
character at the same position in y. If there is a third argument, it
must be a string, whose characters will be mapped to None in the result.
[1;31mType:[0m      builtin_function_or_method

maketrans is a static method, which is essentially a function that is bound to neither the instance nor the class. This function merely exists in the namespace of the class (conceptualise the class as a Python module) as this is the most logical place to find it:

In [184]:
greektolatin = str.maketrans('αβγδε', 'abcde')
greektolatin

{945: 97, 946: 98, 947: 99, 948: 100, 949: 101}

In [185]:
hex(945)

'0x3b1'

In [186]:
hex(97)

'0x61'

This translation table can be used on the example string to replace the Greek letters (keys) with the Latin letters (values):

In [187]:
'αββγγγδδδδεεεεε'.translate(greektolatin)

'abbcccddddeeeee'
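
The docstring of maketrans also describes a form with one argument and a form with three arguments. A minimal sketch of both follows, where the instance names eszett and remove_punctuation are only for illustration:

```python
import string

# One argument: a dictionary, where a value can be a string of any length
eszett = str.maketrans({'ß': 'ss'})
print(eszett)
print('Straße'.translate(eszett))

# Three arguments: characters in the third string are mapped to None and deleted
remove_punctuation = str.maketrans('', '', string.punctuation)
print('Hello, World!'.translate(remove_punctuation))
```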

## Formal and Informal Strings

The \\ is a special character used to insert an escape character. To insert a \\ itself, two have to be used: the first \\ specifies that an escape character is to be inserted and the second \\ specifies that the escape character is \\ itself. The \\ is used as the separator in a file path within Windows:

In [188]:
file_path = 'C:\\Users\\Philip'

Compare the cell output with the output from a print statement:

In [189]:
file_path

'C:\\Users\\Philip'

In [190]:
print(file_path)

C:\Users\Philip


Notice that the former uses the formal representation, which matches what needs to be input to instantiate the string. The latter uses the informal representation, which displays the file path with the escape characters processed. Under the hood these are controlled by the data model identifiers \_\_repr\_\_ and \_\_str\_\_ respectively:

In [191]:
? file_path.__repr__

[1;31mSignature:[0m       [0mfile_path[0m[1;33m.[0m[0m__repr__[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mCall signature:[0m  [0mfile_path[0m[1;33m.[0m[0m__repr__[0m[1;33m([0m[1;33m*[0m[0margs[0m[1;33m,[0m [1;33m**[0m[0mkwargs[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mType:[0m           method-wrapper
[1;31mString form:[0m    <method-wrapper '__repr__' of str object at 0x000001DABBF67320>
[1;31mDocstring:[0m      Return repr(self).

In [192]:
? file_path.__str__

[1;31mSignature:[0m       [0mfile_path[0m[1;33m.[0m[0m__str__[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mCall signature:[0m  [0mfile_path[0m[1;33m.[0m[0m__str__[0m[1;33m([0m[1;33m*[0m[0margs[0m[1;33m,[0m [1;33m**[0m[0mkwargs[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mType:[0m           method-wrapper
[1;31mString form:[0m    <method-wrapper '__str__' of str object at 0x000001DABBF67320>
[1;31mDocstring:[0m      Return str(self).

These data model identifiers are defined in the object class, which every other class is based on. As their docstrings indicate, it is more common to use the builtins function repr and the class str respectively:

In [193]:
? repr

[1;31mSignature:[0m  [0mrepr[0m[1;33m([0m[0mobj[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return the canonical string representation of the object.

For many object types, including most builtins, eval(repr(obj)) == obj.
[1;31mType:[0m      builtin_function_or_method

In [194]:
? str

[1;31mInit signature:[0m  [0mstr[0m[1;33m([0m[0mself[0m[1;33m,[0m [1;33m/[0m[1;33m,[0m [1;33m*[0m[0margs[0m[1;33m,[0m [1;33m**[0m[0mkwargs[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m     
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
[1;31mType:[0m           type
[1;31mSubclasses:[0m     StrEnum, DeferredConfigString, FoldedCase, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ...

The str class will cast an existing object into a str. When the object is already a str it will be unchanged:

In [195]:
file_path

'C:\\Users\\Philip'

In [196]:
str(file_path)

'C:\\Users\\Philip'

The repr function on the other hand will return a str instance that contains the quotations that enclose the string and shows insertion of the escape characters:

In [197]:
repr(file_path)

"'C:\\\\Users\\\\Philip'"

Notice that when this is printed:

In [198]:
print(repr(file_path))

'C:\\Users\\Philip'


The cell output matches the formal form:

In [199]:
file_path

'C:\\Users\\Philip'

And the escape characters are processed when the string is printed:

In [200]:
print(str(file_path))

C:\Users\Philip


All the other builtins classes in Python also have a string representation. When the formal representation does not include additional quotations or escape characters, the formal and informal representations often return identical strings, for example:

In [201]:
4

4

In [202]:
repr(4)

'4'

In [203]:
str(4)

'4'
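
The two representations can also differ for classes other than str. As an illustration only, the sketch below uses the date class from the standard library datetime module, which is not otherwise used in this guide:

```python
import datetime

date = datetime.date(2024, 1, 31)

formal = repr(date)   # unambiguous form that shows how to instantiate the date
informal = str(date)  # readable form that is used by print

print(formal)
print(informal)

# For this class, evaluating the formal representation recreates the instance
print(eval(formal) == date)
```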

## Raw Strings

In Windows the file path is of the form:

In [204]:
# file_path = 'C:\Users\Philip'

<span style="color:red">SyntaxError:</span> (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

For the file path to be recognised as a Python string each \\ has to be converted into a \\\\ as \\ is used to insert an escape character and is an escape character itself:

In [205]:
file_path = 'C:\\Users\\Philip'

For convenience there is also a raw string which uses the prefix r. In a raw string the \\ is not used to insert an escape character and instead the \\ is recognised as being part of the string:

In [206]:
raw_file_path = r'C:\Users\Philip'

In [207]:
raw_file_path

'C:\\Users\\Philip'
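
A brief sketch comparing the two ways of writing the file path is shown below. The instance name escaped_file_path is only for illustration:

```python
raw_file_path = r'C:\Users\Philip'
escaped_file_path = 'C:\\Users\\Philip'

# Both forms instantiate the same string
print(raw_file_path == escaped_file_path)

# In a raw string \n is two characters, a backslash followed by the letter n
print(len('\n'), len(r'\n'))
```

Note that a raw string cannot end in a single \\ as it would be taken to escape the closing quotation.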

## The print function

The print function has previously been used, however now that escape characters are understood its docstring can be examined:

In [208]:
? print

[1;31mSignature:[0m  [0mprint[0m[1;33m([0m[1;33m*[0m[0margs[0m[1;33m,[0m [0msep[0m[1;33m=[0m[1;34m' '[0m[1;33m,[0m [0mend[0m[1;33m=[0m[1;34m'\n'[0m[1;33m,[0m [0mfile[0m[1;33m=[0m[1;32mNone[0m[1;33m,[0m [0mflush[0m[1;33m=[0m[1;32mFalse[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Prints the values to a stream, or to sys.stdout by default.

sep
  string inserted between values, default a space.
end
  string appended after the last value, default a newline.
file
  a file-like object (stream); defaults to the current sys.stdout.
flush
  whether to forcibly flush the stream.
[1;31mType:[0m      builtin_function_or_method

The *args indicates that a variable number of positional input arguments are used. file and flush are for advanced purposes when the print stream is to be directed for example to a file instead of a cell output.

sep and end are named input arguments which have a default value of a space and a new line respectively:

In [209]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')

the brown fox jumps over the lazy dog


In [210]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', sep='')

thebrownfoxjumpsoverthelazydog


In [211]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')

the brown fox jumps over the lazy dog
the brown fox jumps over the lazy dog


In [212]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', end='')
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')

the brown fox jumps over the lazy dogthe brown fox jumps over the lazy dog
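
The file input argument can be sketched without writing to disk by using io.StringIO from the standard library, which behaves like a text file held in memory. The instance names buffer and captured are only for illustration:

```python
import io

# A StringIO instance behaves like a text file held in memory
buffer = io.StringIO()

print('the', 'brown', 'fox', sep='-', end='!\n', file=buffer)
print('jumps', 'over', sep='\t', file=buffer)

# The formal representation shows the separators and endings that were written
captured = buffer.getvalue()
print(repr(captured))
```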


## Formatted Strings

Supposing the string body has the form:

In [213]:
body = 'The string to 0 is 1 2!'

And there are three variables:

In [214]:
var0 = 'print'
var1 = 'hello'
var2 = 'world'

The objective of a formatted string is to insert these variables into the string (in this case without additional format specifications) so the following string is returned:

In [215]:
'The string to print is hello world!'

'The string to print is hello world!'

If the docstring of the string method format is examined:

In [216]:
? body.format

Docstring:
S.format(*args, **kwargs) -> str

Return a formatted version of S, using substitutions from args and kwargs.
The substitutions are identified by braces ('{' and '}').
Type:      builtin_function_or_method

Then it can be seen that substitutions are identified by braces, so body should be:

In [217]:
body = 'The string to {0} is {1} {2}!'

Notice the changes in the syntax highlighting for the placeholders. *args represents a variable number of positional input arguments. To insert variables into strings the number of positional input arguments should match the number of placeholders in the string body:

In [218]:
body.format(var0, var1, var2)

'The string to print is hello world!'
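
The index inside each placeholder refers to the position of the input argument, so an index can be repeated or used out of order. A small sketch of this, where the names repeated, swapped and automatic are illustrative:

```python
var0, var1, var2 = 'print', 'hello', 'world'

# the same positional argument can be inserted more than once
repeated = '{0} {1} {0}'.format(var0, var1)

# the placeholders do not need to appear in ascending order
swapped = '{2} {1}'.format(var0, var1, var2)

# empty braces are numbered automatically from left to right
automatic = 'The string to {} is {} {}!'.format(var0, var1, var2)

print(repeated)
print(swapped)
print(automatic)
```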

The string can alternatively be set up to contain named placeholders:

In [219]:
body = 'The string to {var0_} is {var1_} {var2_}!'

**kwargs represents a variable number of named keyword input arguments, which should match the named placeholders in the string body:

In [220]:
body.format(var0_=var0, var1_=var1, var2_=var2)

'The string to print is hello world!'

If the two lines are combined into one:

In [221]:
'The string to {var0_} is {var1_} {var2_}!'.format(var0_=var0, var1_=var1, var2_=var2)

'The string to print is hello world!'

More commonly the placeholders are given the same name as the variables to be inserted:

In [222]:
'The string to {var0} is {var1} {var2}!'.format(var0=var0, var1=var1, var2=var2)

'The string to print is hello world!'

Notice how each instance name is used 3 times which is pretty cumbersome. A shorthand way of writing the expression above is to use the prefix f:

In [223]:
f'The string to {var0} is {var1} {var2}!'

'The string to print is hello world!'

Classes from builtins have the \_\_format\_\_ data model identifier:

In [224]:
? object.__format__

Signature:  object.__format__(self, format_spec, /)
Docstring: Default object formatter.
Type:      method_descriptor

When this data model method is configured in a class it gives the ability to use format specification options in a formatted string. This can be demonstrated with other str instances and numeric (int and float) instances.

The format specifier for a string (s) can be used:

In [225]:
f'The string to {var0:s} is {var1} {var2}!'

'The string to print is hello world!'

In [226]:
f'The string to {var0:10s} is {var1} {var2}!'

'The string to print      is hello world!'

In [227]:
f'The string to {var0:010s} is {var1:s} {var2:s}!'

'The string to print00000 is hello world!'

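A width of 10 pads the string to 10 characters. A str is left-aligned by default, so the padding is trailing. If the width is prefixed with 0, the trailing padding is displayed using 0 instead of a space.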
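
The alignment can also be requested explicitly. As a brief sketch, the format specification accepts an optional fill character followed by an alignment option (< for left, > for right and ^ for center) before the width. The names left, right and centered are illustrative:

```python
var0 = 'print'

left = f'{var0:<10s}'        # left-aligned, padded with trailing spaces
right = f'{var0:>10s}'       # right-aligned, padded with leading spaces
centered = f'{var0:*^11s}'   # centered, padded with the fill character *

print(repr(left))
print(repr(right))
print(repr(centered))
```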

Numbers are commonly inserted:

In [228]:
num1 = 1
num2 = 0.0000123456789
num3 = 12.3456789

In [229]:
f'The numbers are {num1}, {num2} and {num3}.' 

'The numbers are 1, 1.23456789e-05 and 12.3456789.'

The format specifier for an integer decimal (d) can be used:

In [230]:
f'The numbers are {num1:d}, {num2} and {num3}.' 

'The numbers are 1, 1.23456789e-05 and 12.3456789.'

In [231]:
f'The numbers are {num1:5d}, {num2} and {num3}.' 

'The numbers are     1, 1.23456789e-05 and 12.3456789.'

In [232]:
f'The numbers are {num1:05d}, {num2} and {num3}.' 

'The numbers are 00001, 1.23456789e-05 and 12.3456789.'

In [233]:
f'The numbers are {num1: 05d}, {num2} and {num3}.' 

'The numbers are  0001, 1.23456789e-05 and 12.3456789.'

Again the number of characters that the number should occupy in the string can be specified. Unlike a string, a number is right-aligned by default, so the padding is leading as opposed to trailing. If the width is prefixed with a 0, the padding is shown as 0. Note that a space before the width is a sign option: it reserves one of the characters for the sign, which displays as a space for a positive number and as - for a negative number.
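
The effect of the sign option is easier to see when a positive and a negative number are formatted side by side. A short sketch, where the names positive, negative and the three tuples are illustrative:

```python
positive, negative = 1, -1

# a space reserves a character for the sign of a positive number
space_sign = (f'{positive: 05d}', f'{negative: 05d}')

# a plus always shows the sign
plus_sign = (f'{positive:+05d}', f'{negative:+05d}')

# by default only a negative sign is shown
default_sign = (f'{positive:05d}', f'{negative:05d}')

print(space_sign)
print(plus_sign)
print(default_sign)
```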

In [234]:
f'The numbers are {num1}, {num2:g} and {num3:g}.' 

'The numbers are 1, 1.23457e-05 and 12.3457.'

In [235]:
f'The numbers are {num1}, {num2:e} and {num3:e}.' 

'The numbers are 1, 1.234568e-05 and 1.234568e+01.'

In [236]:
f'The numbers are {num1}, {num2:0.3e} and {num3:0.3e}.' 

'The numbers are 1, 1.235e-05 and 1.235e+01.'

In [237]:
f'The numbers are {num1}, {num2:f} and {num3:f}.' 

'The numbers are 1, 0.000012 and 12.345679.'

In [238]:
f'The numbers are {num1}, {num2:0.3f} and {num3:0.3f}.' 

'The numbers are 1, 0.000 and 12.346.'

Floating point numbers can use the general (g), exponential (e) and fixed (f) format specifiers. With e and f, the precision 0.3 specifies rounding to 3 digits past the decimal point.
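
With the general (g) format specifier, the precision counts significant digits rather than digits past the decimal point. A small sketch comparing the three specifiers with the same precision (the result names are illustrative):

```python
num2 = 0.0000123456789
num3 = 12.3456789

fixed = f'{num3:0.3f}'          # 3 digits past the decimal point
exponential = f'{num3:0.3e}'    # 3 digits past the decimal point of the mantissa
general = f'{num3:0.3g}'        # 3 significant digits
general_small = f'{num2:0.3g}'  # g switches to an exponent for small values

print(fixed, exponential, general, general_small)
```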


The keys of a mapping such as a dictionary can be used as the placeholders, alongside an optional format specifier:

In [239]:
numbers = {'num1': 1, 'num2': 0.0000123456789, 'num3': 12.3456789}

In [240]:
body = 'The numbers are {num1:d}, {num2:.3e} and {num3:.3e}.'

The colon is used in both the dictionary and format specifier. Spacing to the right of the colon is often present in a dictionary and does not change the value. Spacing does change the format specifier and should only be added if a space is to be included in the format specification.

The format_map method can be used to insert values into a body provided the placeholders match the keys in the dictionary:

In [241]:
? body.format_map

Docstring:
S.format_map(mapping) -> str

Return a formatted version of S, using substitutions from mapping.
The substitutions are identified by braces ('{' and '}').
Type:      builtin_function_or_method

In [242]:
body.format_map(numbers)

'The numbers are 1, 1.235e-05 and 1.235e+01.'
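
The same result can be obtained by unpacking the dictionary into format with **. One difference is that format_map uses the mapping directly, so a dict subclass can control what happens when a key is missing. A minimal sketch, where the class name DefaultPlaceholder and the '?' marker are illustrative assumptions:

```python
class DefaultPlaceholder(dict):
    """dict subclass which returns '?' for any missing key."""

    def __missing__(self, key):
        return '?'


numbers = {'num1': 1, 'num2': 2}

# unpacking the dictionary supplies each key as a named input argument
unpacked = 'The numbers are {num1} and {num2}.'.format(**numbers)

# format_map looks up each placeholder in the mapping itself,
# so the missing key num3 is handled by __missing__
body = 'The numbers are {num1}, {num2} and {num3}.'
filled = body.format_map(DefaultPlaceholder(numbers))

print(unpacked)
print(filled)
```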

The older style of formatted strings uses the data model identifier \_\_mod\_\_, which controls the behaviour of the operator %. Older style string formatting also uses % as the placeholder prefix, as opposed to the braces {}:

In [243]:
body = 'The numbers are %d, %0.3f and %0.3g.' 
nums = (1, 0.0000123456789, 12.3456789)

In [244]:
? body.__mod__

Signature:       body.__mod__(value, /)
Call signature:  body.__mod__(*args, **kwargs)
Type:           method-wrapper
String form:    <method-wrapper '__mod__' of str object at 0x000001DABBEEF870>
Docstring:      Return self%value.

In [245]:
body % nums

'The numbers are 1, 0.000 and 12.3.'
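
The older style also accepts a mapping on the right hand side of the % operator, in which case each placeholder names its key in parentheses. A short sketch reusing the numbers from above (the name result is illustrative):

```python
numbers = {'num1': 1, 'num2': 0.0000123456789, 'num3': 12.3456789}

# the key is placed in parentheses between the % and the format specifier
body = 'The numbers are %(num1)d, %(num2)0.3e and %(num3)0.3f.'

result = body % numbers
print(result)
```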

## Single VS Double Quotations

In Python a string can be enclosed in single or double quotations:

In [246]:
"hello world!"

'hello world!'

Notice that the string returned in the cell output is automatically enclosed in single quotations. This is because the Python interpreter has a **preference for single quotations**. This can also be seen when the builtins character function chr is used:

In [247]:
chr(97)

'a'

The main purpose of double quotations is to enclose a string that itself contains single quotations, such as a string literal, without having to escape them. For example:

In [248]:
"var = 'hello world!'"

"var = 'hello world!'"

In [249]:
'var = \'hello world!\''

"var = 'hello world!'"

Note some programming languages use single quotations for a character and double quotations for a string of characters. In Python a character is a string of length 1; however, some programmers who work with multiple languages carry these habits over to Python.
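
The choice of quotations shown in a cell output comes from the formal representation of the string, which can be examined using the builtins function repr. A brief sketch of the three cases (the names single, double and both are illustrative):

```python
# contains a single quotation, so double quotations enclose the representation
single = repr("it's")

# contains double quotations, so the preferred single quotations are used
double = repr('say "hi"')

# contains both, so single quotations are used and the inner one is escaped
both = repr('''it's "hi"''')

print(single)
print(double)
print(both)
```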

## Multiline Strings

A string can span multiple lines using triple quotations:

In [250]:
multiline = '''the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog'''

In [251]:
multiline

'the quick brown fox jumps over the lazy dog\nthe quick brown fox jumps over the lazy dog\nthe quick brown fox jumps over the lazy dog\nthe quick brown fox jumps over the lazy dog'

In [252]:
print(multiline)

the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog


Note that any spacing added will be incorporated into the multiline string:

In [253]:
multiline = '''
            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            '''

In [254]:
multiline

'\n            the quick brown fox jumps over the lazy dog\n            the quick brown fox jumps over the lazy dog\n            the quick brown fox jumps over the lazy dog\n            the quick brown fox jumps over the lazy dog\n            '
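
If this indentation is unwanted, it can be removed afterwards. A minimal sketch using textwrap.dedent from the standard library, with a shorter illustrative string and the assumed names indented and cleaned:

```python
import textwrap

indented = '''
    the quick brown fox
    jumps over the lazy dog
    '''

# dedent removes the leading whitespace common to every line,
# strip then removes the leading and trailing newlines
cleaned = textwrap.dedent(indented).strip()

print(repr(cleaned))
```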

In [255]:
print(multiline)


            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            the quick brown fox jumps over the lazy dog
            


Multiline strings are most commonly used for docstrings. Because docstrings are likely to include string literals, triple double quotations are normally preferred:

In [256]:
? print

Signature:  print(*args, sep=' ', end='\n', file=None, flush=False)
Docstring:
Prints the values to a stream, or to sys.stdout by default.

sep
  string inserted between values, default a space.
end
  string appended after the last value, default a newline.
file
  a file-like object (stream); defaults to the current sys.stdout.
flush
  whether to forcibly flush the stream.
Type:      builtin_function_or_method

In [257]:
doc = """Prints the values

sep
  string inserted between values, default a space ' '.
end
  string appended after the last value, default a newline '\\n'."""

In [258]:
print(doc)

Prints the values

sep
  string inserted between values, default a space ' '.
end
  string appended after the last value, default a newline '\n'.


## Data Model Attributes

There are a number of data model identifiers that are attributes. The attribute \_\_class\_\_ returns the class:

In [259]:
greeting.__class__

str

This is normally determined using the builtins class type:

In [260]:
? type

Init signature:  type(self, /, *args, **kwargs)
Docstring:
type(object) -> the object's type
type(name, bases, dict, **kwds) -> a new type
Type:           type
Subclasses:     ABCMeta, EnumType, _AnyMeta, NamedTupleMeta, _TypedDictMeta, _DeprecatedType, _ABC, MetaHasDescriptors, PyCStructType, UnionType, ...

In [261]:
type(greeting)

str

The \_\_doc\_\_ attribute returns the docstring as a string instance:

In [262]:
greeting.__doc__

"str(object='') -> str\nstr(bytes_or_buffer[, encoding[, errors]]) -> str\n\nCreate a new string object from the given object. If encoding or\nerrors is specified, then the object must expose a data buffer\nthat will be decoded using the given encoding and error handler.\nOtherwise, returns the result of object.__str__() (if defined)\nor repr(object).\nencoding defaults to sys.getdefaultencoding().\nerrors defaults to 'strict'."

This is normally looked up alongside some other information using the ? operator:

In [263]:
? greeting

Type:        str
String form: hello
Length:      5
Docstring:
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.

The \_\_sizeof\_\_ data model identifier is used to return the size of a string in bytes:

In [264]:
? greeting.__sizeof__

Signature:  greeting.__sizeof__()
Docstring: Return the size of the string in memory, in bytes.
Type:      builtin_function_or_method

In [265]:
greeting.__sizeof__()

54

To get the size, it is more common to use the getsizeof function from the system module sys, which under the hood uses this data model method:

In [266]:
import sys

In [267]:
? sys.getsizeof

Docstring:
getsizeof(object [, default]) -> int

Return the size of object in bytes.
Type:      builtin_function_or_method

In [268]:
sys.getsizeof(greeting)

54
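
The size includes a fixed overhead for the object itself plus the storage for the characters. As a sketch of this, assuming the CPython implementation (where the storage per character depends on the widest character in the string), the growth in size for ten additional characters can be compared. The names ascii_growth and greek_growth are illustrative:

```python
import sys

# ten extra ASCII characters
ascii_growth = sys.getsizeof('a' * 20) - sys.getsizeof('a' * 10)

# ten extra Greek characters, which need more storage per character in CPython
greek_growth = sys.getsizeof('Γ' * 20) - sys.getsizeof('Γ' * 10)

print(ascii_growth, greek_growth)
```

The exact numbers are an implementation detail and may differ between Python implementations and versions.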

## Immutable Ordered Collection ABC Design Pattern

Earlier it was seen that str is a subclass of object and therefore follows the design pattern of an object. The design pattern of the str class is also based on a number of abstract base classes. 

An abstract base class is a conceptual class, that isn't instantiated directly but used as a design pattern for numerous Python classes so their behaviour is consistent.

The str has a Container abstract base class, which means it has the data model method \_\_contains\_\_:

In [269]:
? greeting.__contains__

Signature:       greeting.__contains__(key, /)
Call signature:  greeting.__contains__(*args, **kwargs)
Type:           method-wrapper
String form:    <method-wrapper '__contains__' of str object at 0x000001DABA91E230>
Docstring:      Return key in self.

Instead of using the data model identifier, the keyword in is typically used:

In [270]:
'hello'.__contains__('h')

True

In [271]:
'h' in 'hello'

True

In [272]:
'll' in 'hello'

True

In [273]:
'H' in 'hello'

False

In [274]:
'H'.casefold() in 'hello'.casefold()

True

The string is also Hashable, which means it has the data model method \_\_hash\_\_, which maps to builtins hash and returns the integer hash value:

In [275]:
? greeting.__hash__

Signature:       greeting.__hash__()
Call signature:  greeting.__hash__(*args, **kwargs)
Type:           method-wrapper
String form:    <method-wrapper '__hash__' of str object at 0x000001DABA91E230>
Docstring:      Return hash(self).

In [276]:
? hash

Signature:  hash(obj, /)
Docstring:
Return the hash value for the given object.

Two objects that compare equal must also have the same hash value, but the
reverse is not necessarily true.
Type:      builtin_function_or_method

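The hash value corresponds to an integer. Within a single Python session, this integer will be the same for strings that are equal in value (string hashes are salted by default, so the integers shown here will differ between sessions):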

In [277]:
greeting = 'hello'

In [278]:
hash(greeting)

8283226730398163945

In [279]:
hash('hello')

8283226730398163945

And differ otherwise:

In [280]:
hash('Hello')

-7813639786837616734

A builtins collection has to be immutable to be hashable, and hashable values can be used as keys in mappings such as dictionaries:

In [281]:
colors = {'red': '#ff0000', 
          'green': '#00b050',
          'blue': '#0070c0'}

In [282]:
hash('red')

7942303018215054294

In [283]:
colors['red']

'#ff0000'
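
The contrast with a mutable builtins collection can be sketched as follows. A list is not hashable, so it cannot be used as a key, whereas a tuple of strings can. The names mutable_key, immutable_key and pairs are illustrative:

```python
mutable_key = ['red', 'green']
immutable_key = ('red', 'green')

# hashing a list raises a TypeError
try:
    hash(mutable_key)
    list_is_hashable = True
except TypeError as error:
    list_is_hashable = False
    message = str(error)

# a tuple of strings is hashable and can be used as a key
pairs = {immutable_key: 'traffic lights'}

print(list_is_hashable, message)
print(pairs[('red', 'green')])
```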

## Iteration and Indexing

The str is Sized and therefore has the data model identifier \_\_len\_\_ which controls the behaviour of the builtins len function and returns the number of characters in a string:

In [284]:
? greeting.__len__

Signature:       greeting.__len__()
Call signature:  greeting.__len__(*args, **kwargs)
Type:           method-wrapper
String form:    <method-wrapper '__len__' of str object at 0x000001DABA91E230>
Docstring:      Return len(self).

In [285]:
? len

Signature:  len(obj, /)
Docstring: Return the number of items in a container.
Type:      builtin_function_or_method

The string 'hello' has 5 characters as expected:

In [286]:
len('hello')

5

The str is an Iterable which has the data model method \_\_iter\_\_, that controls the behaviour of the builtins iter:

In [287]:
? greeting.__iter__

Signature:       greeting.__iter__()
Call signature:  greeting.__iter__(*args, **kwargs)
Type:           method-wrapper
String form:    <method-wrapper '__iter__' of str object at 0x000001DABA91E230>
Docstring:      Implement iter(self).

In [288]:
? iter

Docstring:
iter(iterable) -> iterator
iter(callable, sentinel) -> iterator

Get an iterator from an object.  In the first form, the argument must
supply its own iterator, or be a sequence.
In the second form, the callable is called until it returns the sentinel.
Type:      builtin_function_or_method

This is used to construct an iterator from the str. An iterator is a data type that yields only a single value at a time:

In [289]:
greeting

'hello'

In [290]:
forward = iter(greeting)

In [291]:
forward

<str_ascii_iterator at 0x1dabbea2e00>

Note that an iterator only yields a single value at a time and has no len:

In [292]:
# len(forward)

<span style="color:red">TypeError:</span> object of type 'str_ascii_iterator' has no len()

This is because the iterator instance forward has no data model method \_\_len\_\_ and therefore the len function has no instruction to follow:

In [293]:
print(dir(forward), sep='')

['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__length_hint__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__']


The iterator has the data model \_\_next\_\_ which maps to the builtins next function:

In [294]:
? forward.__next__

Signature:       forward.__next__()
Call signature:  forward.__next__(*args, **kwargs)
Type:           method-wrapper
String form:    <method-wrapper '__next__' of str_ascii_iterator object at 0x000001DABBEA2E00>
Docstring:      Implement next(self).

In [295]:
? next

Docstring:
next(iterator[, default])

Return the next item from the iterator. If default is given and the iterator
is exhausted, it is returned instead of raising StopIteration.
Type:      builtin_function_or_method

The next function can be used on the iterator to display each individual character in the original string that the iterator was constructed from:

In [296]:
next(forward)

'h'

In [297]:
next(forward)

'e'

In [298]:
next(forward)

'l'

In [299]:
next(forward)

'l'

In [300]:
next(forward)

'o'

Using next on an iterator that has been exhausted raises StopIteration:

In [301]:
# next(forward)

<span style="color:red">StopIteration</span>: 
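
As the docstring of next states, a default can be given which is returned instead of raising StopIteration. A minimal sketch, where the short string 'hi' and the name values are illustrative:

```python
forward = iter('hi')

# next is called more times than there are characters,
# once the iterator is exhausted the default None is returned
values = [next(forward, None) for _ in range(4)]

print(values)
```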

The iterator above was an ASCII iterator as each character was ASCII. A slightly different str iterator is created when the string contains non-ASCII characters:

In [302]:
greek_greeting = 'Γειά σου Κόσμε!'

In [303]:
forward = iter(greek_greeting)

In [304]:
forward

<str_iterator at 0x1dabbea3910>

A Unicode character is returned when next is used:

In [305]:
next(forward)

'Γ'

In [306]:
next(forward)

'ε'

In [307]:
next(forward)

'ι'

And so on... 

When a for loop is used with a str, under the hood an iterator is created and consumed by the for loop. A simple for loop can be made which prints each letter twice:

In [308]:
word = 'hello'

In [309]:
for letter in word:
    print(letter, sep='', end='')
    print(letter, sep='', end='')

hheelllloo
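
What the for loop does under the hood can be sketched by hand with iter, next and a while loop. This is only an approximation of the behaviour for illustration, and the names iterator and doubled are assumptions:

```python
word = 'hello'
doubled = ''

# the for loop constructs an iterator from the string
iterator = iter(word)

while True:
    try:
        # each cycle of the loop requests the next value
        letter = next(iterator)
    except StopIteration:
        # an exhausted iterator ends the loop
        break
    doubled += letter * 2

print(doubled)
```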

The string can be enumerated to give a tuple of the numeric index and the corresponding letter from a for loop:

In [310]:
for (index, letter) in enumerate(word):
    print(index, letter)

0 h
1 e
2 l
3 l
4 o


A string has an abstract base class Collection. The Collection abstract base class has all the properties from the simpler abstract base classes Sized, Iterable and Container seen. 

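A string also has an abstract base class Sequence. The Sequence design pattern is based on the data model identifiers \_\_getitem\_\_, \_\_len\_\_, \_\_contains\_\_, \_\_iter\_\_ and \_\_reversed\_\_ alongside the identifiers index and count. The str class defines all of these except \_\_reversed\_\_; the builtins function reversed falls back to \_\_len\_\_ and \_\_getitem\_\_.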
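
The abstract base classes mentioned so far are available in the collections.abc module of the standard library, so the relationships can be checked with isinstance. A short sketch, where MutableSequence is included for contrast and the names abcs and checks are illustrative:

```python
from collections.abc import (Container, Hashable, Sized, Iterable,
                             Collection, Sequence, MutableSequence)

abcs = (Container, Hashable, Sized, Iterable,
        Collection, Sequence, MutableSequence)

# map the name of each abstract base class to the result of the check
checks = {abc.__name__: isinstance('hello', abc) for abc in abcs}

for name, result in checks.items():
    print(name, result)
```

Every check gives True except MutableSequence, which is consistent with the str being immutable.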

The data model identifier \_\_getitem\_\_ defines the behaviour when indexing into a Sequence using square brackets [ ]:

In [311]:
? greeting.__getitem__

Signature:       greeting.__getitem__(key, /)
Call signature:  greeting.__getitem__(*args, **kwargs)
Type:           method-wrapper
String form:    <method-wrapper '__getitem__' of str object at 0x000001DABA91E230>
Docstring:      Return self[key].

Note that Python uses zero-based indexing so the value at index 1 is the second value:

In [312]:
'hello'[1]

'e'

The first value is at index 0:

In [313]:
'hello'[0]

'h'

The last value has an index of the length of the string minus 1:

In [314]:
'hello'[len('hello')-1]

'o'

It is also possible to use a negative index:

In [315]:
for index, letter in enumerate(word):
    print(index-len(word), letter)

-5 h
-4 e
-3 l
-2 l
-1 o


The last value for example is the value 1 before 0 which is -1:

In [316]:
'hello'[-1]

'o'

This becomes more clear when the negative and positive indexes are examined together:

In [317]:
for index, letter in enumerate(word):
    print(index-len(word), letter)
for index, letter in enumerate(word):
    print(index, letter)

-5 h
-4 e
-3 l
-2 l
-1 o
0 h
1 e
2 l
3 l
4 o


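Conceptualise the word being printed on a label that is rolled up so that its end meets its beginning: stepping back one place from index 0 reaches the last character at index -1.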
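
The relationship between the two sets of indexes, and the limit of the valid range, can be sketched as follows (the names same, out_of_range and message are illustrative):

```python
word = 'hello'

# each positive index has a negative counterpart offset by the length
same = all(word[index] == word[index - len(word)]
           for index in range(len(word)))

# an index outside the range -len(word) to len(word)-1 raises an IndexError
try:
    word[len(word)]
    out_of_range = False
except IndexError as error:
    out_of_range = True
    message = str(error)

print(same, out_of_range, message)
```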

The builtins slice class can be used to create a substring by slicing:

In [318]:
? slice

Init signature:  slice(self, /, *args, **kwargs)
Docstring:
slice(stop)
slice(start, stop[, step])

Create a slice object.  This is used for extended slicing (e.g. a[0:10:2]).
Type:           type
Subclasses:

If start, stop and step values are created, with the start and the step set to values equivalent to their defaults:

In [319]:
start = 0 # default value
stop = 3
step = 1 # default value

The form of the slice can be:

In [320]:
'hello'[slice(start, stop, step)]

'hel'

In [321]:
'hello'[slice(start, stop)]

'hel'

In [322]:
'hello'[slice(stop)]

'hel'

This is typically shortened by using a colon instead:

In [323]:
'hello'[start:stop:step]

'hel'

In [324]:
'hello'[start:stop]

'hel'

In [325]:
'hello'[:stop]

'hel'

In [326]:
'hello world!'

'hello world!'

In [327]:
len('hello world!')

12

In [328]:
'hello world!'[:5]

'hello'

In [329]:
'hello world!'[6:]

'world!'

In [330]:
'hello world!'[6:11]

'world'

In [331]:
'hello world!'[::2]

'hlowrd'

In [332]:
'hello world!'[1::2]

'el ol!'

In [333]:
'hello world!'[::-1]

'!dlrow olleh'

In [334]:
'hello world!'[-8:-len('hello world!')-1:-1]

'olleh'
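
Two details of the slices above can be sketched explicitly. A value omitted around a colon corresponds to None in the slice instance, and unlike indexing, a slice that extends beyond the string is clamped instead of raising an IndexError. The indices method of a slice shows the values it resolves to for a given length. The result names are illustrative:

```python
text = 'hello world!'

# an omitted value corresponds to None, this is the same as text[:5]
omitted = text[slice(None, 5, None)]

# a stop beyond the end of the string is clamped to the end
clamped = text[6:100]

# a slice entirely out of range gives an empty string
empty = text[20:30]

# the start, stop and step that a slice resolves to for this length
resolved = slice(-8, -len(text) - 1, -1).indices(len(text))

print(repr(omitted), repr(clamped), repr(empty), resolved)
```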

The string class doesn't define the data model identifier \_\_reversed\_\_. However, because it is an ordered Sequence with \_\_len\_\_ and \_\_getitem\_\_ defined, the builtins function reversed can be used on a string instance to create a reversed iterator:

In [335]:
greeting

'hello'

In [336]:
backward = reversed(greeting)
backward

<reversed at 0x1dabbea0e20>

In [337]:
next(backward)

'o'

In [338]:
next(backward)

'l'

In [339]:
next(backward)

'l'

And so on...
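
The reversed iterator can be consumed in one step by the string method join, which gives the same result as slicing with a step of -1. A brief sketch (the names from_iterator and from_slice are illustrative):

```python
greeting = 'hello'

# join concatenates every character yielded by the reversed iterator
from_iterator = ''.join(reversed(greeting))

# slicing with a negative step creates the reversed string directly
from_slice = greeting[::-1]

print(from_iterator, from_slice)
```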

In [340]:
word = 'hello world!'

for index, letter in enumerate(word):
    print(index, letter)

0 h
1 e
2 l
3 l
4 o
5  
6 w
7 o
8 r
9 l
10 d
11 !


The index and find identifiers perform the reverse operation and retrieve the positive index corresponding to the first occurrence of a character or substring:

In [341]:
? word.find

Docstring:
S.find(sub[, start[, end]]) -> int

Return the lowest index in S where substring sub is found,
such that sub is contained within S[start:end].  Optional
arguments start and end are interpreted as in slice notation.

Return -1 on failure.
Type:      builtin_function_or_method

In [342]:
? word.index

Docstring:
S.index(sub[, start[, end]]) -> int

Return the lowest index in S where substring sub is found,
such that sub is contained within S[start:end].  Optional
arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.
Type:      builtin_function_or_method

These two methods behave identically upon success:

In [343]:
word.find('l')

2

In [344]:
word.index('l')

2

However, they return -1 and raise ValueError respectively upon failure:

In [345]:
word.find('L')

-1

In [347]:
# word.index('L')

<span style="color:red">ValueError:</span> substring not found

These methods take optional start and end positional input arguments, which can be assigned to index values to constrain the search. For example, to find the index of each occurrence of 'l':

In [348]:
word.find('l')

2

In [349]:
word.find('l', 2+1)

3

In [350]:
word.find('l', 3+1)

9

In [351]:
word.find('l', 9+1)

-1
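
The manual steps above can be sketched as a loop which stops when find returns -1. The function name find_all is a hypothetical helper for illustration and is not a string method:

```python
def find_all(text, sub):
    """Return a list of every index in text where sub is found."""
    indices = []
    start = text.find(sub)
    while start != -1:
        indices.append(start)
        # resume the search one place after the previous match
        start = text.find(sub, start + 1)
    return indices


word = 'hello world!'
positions = find_all(word, 'l')
print(positions)
```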

A substring of multiple Unicode characters can also be searched for, as opposed to a single Unicode character:

In [352]:
word.find('world')

6

In [353]:
word.find('w')

6

The index and find methods search the string for a substring from the left to the right. There is also the reverse find and reverse index, rfind and rindex respectively which search from right to left:

In [354]:
word.rfind('l')

9

In [355]:
word.rfind('l', 0, 9)

3

In [356]:
word.rfind('l', 0, 3)

2

In [357]:
word.rfind('l', 0, 2)

-1

In [358]:
word.rfind('l')

9

The count method returns the number of times a substring is found in a string:

In [359]:
word.count('l')

3
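
Like find, count accepts an optional start and end index. The occurrences that are counted do not overlap. A short sketch (the result names are illustrative):

```python
word = 'hello world!'

total = word.count('l')
# only the characters at index 0 up to but not including 5 are searched
first_word = word.count('l', 0, 5)

# the two matches are at index 0 and 2, the overlapping match at 1 is not counted
non_overlapping = 'aaaa'.count('aa')

print(total, first_word, non_overlapping)
```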

The boolean string identifiers startswith and endswith return a bool indicating whether a string starts or ends with a substring. These also take a start and end index as input arguments:

In [360]:
? word.startswith

Docstring:
S.startswith(prefix[, start[, end]]) -> bool

Return True if S starts with the specified prefix, False otherwise.
With optional start, test S beginning at that position.
With optional end, stop comparing S at that position.
prefix can also be a tuple of strings to try.
Type:      builtin_function_or_method

In [361]:
? word.endswith

Docstring:
S.endswith(suffix[, start[, end]]) -> bool

Return True if S ends with the specified suffix, False otherwise.
With optional start, test S beginning at that position.
With optional end, stop comparing S at that position.
suffix can also be a tuple of strings to try.
Type:      builtin_function_or_method

In [362]:
'hello world!'.startswith('hello')

True

In [363]:
'hello world!'.startswith('hello', 1)

False

In [364]:
'hello world!'.endswith('!')

True

In [365]:
'hello world!'.endswith('!', 0, 11)

False
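
The docstrings also mention that the prefix or suffix can be a tuple of strings to try. A minimal sketch of this, using a list of made-up file names for illustration:

```python
# hypothetical file names used only for this example
filenames = ['notes.txt', 'photo.png', 'script.py', 'logo.jpg']

# True is returned if the string ends with any suffix in the tuple
images = [name for name in filenames if name.endswith(('.png', '.jpg'))]

# the same applies to a tuple of prefixes
greets = 'hello world!'.startswith(('bye', 'hello'))

print(images, greets)
```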

The replace method can be used to replace an old substring old with a new substring new. It has an optional argument count which has a default value of -1 and this means it allows for all replacements by default. The / trailing the input arguments once again indicates that the input arguments are to be supplied positionally:

In [366]:
? word.replace

Signature:  word.replace(old, new, count=-1, /)
Docstring:
Return a copy with all occurrences of substring old replaced by new.

  count
    Maximum number of occurrences to replace.
    -1 (the default value) means replace all occurrences.

If the optional argument count is given, only the first count occurrences are
replaced.
Type:      builtin_function_or_method

In [367]:
'hello world!'.replace('hello', 'bye')

'bye world!'

In [368]:
'hello hello world!'.replace('hello', 'bye')

'bye bye world!'

In [369]:
'hello hello world!'.replace('hello', 'bye', 1)

'bye hello world!'

## Center and Justify

A string can be centered and justified using the center, ljust and rjust string methods:

In [370]:
? greeting.center

Signature:  greeting.center(width, fillchar=' ', /)
Docstring:
Return a centered string of length width.

Padding is done using the specified fill character (default is a space).
Type:      builtin_function_or_method

In [371]:
? greeting.ljust

Signature:  greeting.ljust(width, fillchar=' ', /)
Docstring:
Return a left-justified string of length width.

Padding is done using the specified fill character (default is a space).
Type:      builtin_function_or_method

In [372]:
? greeting.rjust

Signature:  greeting.rjust(width, fillchar=' ', /)
Docstring:
Return a right-justified string of length width.

Padding is done using the specified fill character (default is a space).
Type:      builtin_function_or_method

In [373]:
len(greeting)

5

In [374]:
'hello'.center(8)

' hello  '

In [375]:
'hello'.center(8, 'X')

'XhelloXX'

In [376]:
'hello'.ljust(8, 'X')

'helloXXX'

In [377]:
'hello'.rjust(8, 'X')

'XXXhello'
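
A typical use of these methods is lining up text in columns. A small sketch using the colors dictionary from earlier, with column widths of 8 and 9 chosen arbitrarily for illustration:

```python
colors = {'red': '#ff0000',
          'green': '#00b050',
          'blue': '#0070c0'}

# left-justify each name with dots and right-justify each code with spaces
lines = [name.ljust(8, '.') + code.rjust(9) for name, code in colors.items()]

for line in lines:
    print(line)

# a width smaller than the length of the string leaves it unchanged
unchanged = 'hello'.center(3)
```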

The opposite operation can be carried out using the string identifiers left strip and right strip, lstrip and rstrip respectively. These strip whitespace by default, or otherwise any of the characters in a specified set of characters:

In [378]:
padded = 'hello'.center(20)
padded

'       hello        '

In [379]:
? padded.lstrip

Signature:  padded.lstrip(chars=None, /)
Docstring:
Return a copy of the string with leading whitespace removed.

If chars is given and not None, remove characters in chars instead.
Type:      builtin_function_or_method

In [380]:
? padded.rstrip

Signature:  padded.rstrip(chars=None, /)
Docstring:
Return a copy of the string with trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.
Type:      builtin_function_or_method

In [381]:
padded.lstrip()

'hello        '

In [382]:
padded.rstrip()

'       hello'

In [383]:
padded.lstrip().rstrip()

'hello'

In [384]:
padded = 'hello'.center(20, 'X')
padded

'XXXXXXXhelloXXXXXXXX'

In [385]:
padded.lstrip('X').rstrip('X')

'hello'
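The chars argument is treated as a set of characters rather than as a substring to match. A minimal sketch, using a made-up string text and the related method strip, which strips both ends:

```python
# chars is a set of characters to strip, not a prefix or suffix to match
text = 'xyxhelloyxy'

stripped_left = text.lstrip('xy')    # removes leading 'x' and 'y' in any order
stripped_right = text.rstrip('xy')   # removes trailing 'x' and 'y' in any order
stripped_both = text.strip('xy')     # strip works on both ends at once

print(stripped_left)    # helloyxy
print(stripped_right)   # xyxhello
print(stripped_both)    # hello
```

Stripping stops at the first character that is not in chars, so characters in the middle of the string are never removed.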

The associated identifiers removeprefix and removesuffix are more precise. Each removes at most one occurrence of the exact prefix or suffix specified:

In [386]:
? padded.removeprefix

Signature:  padded.removeprefix(prefix, /)
Docstring:
Return a str with the given prefix string removed if present.

If the string starts with the prefix string, return string[len(prefix):].
Otherwise, return a copy of the original string.
Type:      builtin_function_or_method

In [387]:
? padded.removesuffix

Signature:  padded.removesuffix(suffix, /)
Docstring:
Return a str with the given suffix string removed if present.

If the string ends with the suffix string and that suffix is not empty,
return string[:-len(suffix)]. Otherwise, return a copy of the original
string.
Type:      builtin_function_or_method

In [388]:
padded

'XXXXXXXhelloXXXXXXXX'

In [389]:
padded.removeprefix('X')

'XXXXXXhelloXXXXXXXX'
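The difference between stripping a set of characters and removing an exact prefix is easiest to see side by side. The sketch below uses a hypothetical file name and assumes Python 3.9 or later, where removeprefix and removesuffix are available:

```python
filename = 'test_tests.py'

# lstrip treats 'test_' as the set of characters 't', 'e', 's' and '_'
over_stripped = filename.lstrip('test_')

# removeprefix removes the exact substring 'test_' once
without_prefix = filename.removeprefix('test_')

# removesuffix does the same at the end of the string
without_suffix = filename.removesuffix('.py')

# the string is returned unchanged when the prefix is absent
unchanged = filename.removeprefix('demo_')

print(over_stripped)    # .py
print(without_prefix)   # tests.py
print(without_suffix)   # test_tests
print(unchanged)        # test_tests.py
```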

Earlier the ordinal value of the string '3' was examined. The prefix '0b' can be removed using removeprefix:

In [390]:
string3 = bin(ord('3'))
string3

'0b110011'

In [391]:
string3 = bin(ord('3')).removeprefix('0b')
string3

'110011'

There is also the string method zfill which can be used to zero fill a string and is mainly intended for strings of numeric values:

In [392]:
? string3.zfill

Signature:  string3.zfill(width, /)
Docstring:
Pad a numeric string with zeros on the left, to fill a field of the given width.

The string is never truncated.
Type:      builtin_function_or_method

Since a byte is made up of 8 bits, the width can be set to 8:

In [393]:
string3 = string3.zfill(8)
string3

'00110011'
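As a cross-check, the same 8 digit binary string can be produced with the builtin format function and a format specification. The sketch below also shows how zfill handles a leading sign. The variable names are illustrative only:

```python
code_point = ord('3')    # 51

# remove the '0b' prefix and pad with zeros on the left
via_zfill = bin(code_point).removeprefix('0b').zfill(8)

# format specification '08b': zero padded, width 8, binary
via_format = format(code_point, '08b')

# zfill keeps a leading sign in front of the zeros
signed = '-42'.zfill(6)

print(via_zfill)     # 00110011
print(via_format)    # 00110011
print(signed)        # -00042
```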

## Binary Operators

The string is an ordered immutable Sequence as previously discussed. An immutable Sequence often has the data model identifiers addition \_\_add\_\_ and multiplication \_\_mul\_\_ which control the behaviour of the operators + and * respectively. For an immutable Sequence these perform concatenation and replication by an integer respectively. The reverse multiplication \_\_rmul\_\_ is also typically defined. It is used when the integer is multiplied by the string instead of the string by the integer and gives the same result.

Note that these data model identifiers, alongside the previously examined \_\_mod\_\_ data model identifier which controls the behaviour of the % operator, behave differently in numeric data types where they instead perform a numeric operation.

The \_\_add\_\_ data model identifier is called a binary operator as it requires two instances: the string instance self and the string instance value. The \_\_mul\_\_ data model identifier is configured to use two instances of different types: the string instance and the int instance. The \_\_mod\_\_ data model identifier is also configured to use two instances of different types: the string instance and the tuple instance of values to be inserted.

In [394]:
? greeting.__add__

Signature:       greeting.__add__(value, /)
Call signature:  greeting.__add__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__add__' of str object at 0x000001DABA91E230>
Docstring:       Return self+value.

In [395]:
'hello' + 'world'

'helloworld'

In [396]:
'hello' + ' ' + 'world'

'hello world'

In [397]:
? greeting.__mul__

Signature:       greeting.__mul__(value, /)
Call signature:  greeting.__mul__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__mul__' of str object at 0x000001DABA91E230>
Docstring:       Return self*value.

In [398]:
greeting * 3

'hellohellohello'

In [399]:
? greeting.__rmul__

Signature:       greeting.__rmul__(value, /)
Call signature:  greeting.__rmul__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__rmul__' of str object at 0x000001DABA91E230>
Docstring:       Return value*self.

In [400]:
3 * greeting

'hellohellohello'

In [401]:
? greeting.__mod__

Signature:       greeting.__mod__(value, /)
Call signature:  greeting.__mod__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__mod__' of str object at 0x000001DABA91E230>
Docstring:       Return self%value.

In [402]:
'%s %s%s' % ('hello', 'world', '!')

'hello world!'
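The operators +, * and % give different results for strings and for numbers because each class defines its own data model identifiers. A short sketch comparing the two, with illustrative variable names:

```python
# the same operator is handled by a different method depending on the type
text_sum = '2' + '3'            # concatenation
number_sum = 2 + 3              # numeric addition
text_product = '2' * 3          # replication
number_product = 2 * 3          # numeric multiplication
text_mod = '%s items' % 7       # string formatting
number_mod = 7 % 3              # numeric modulo

print(text_sum, number_sum)            # 23 5
print(text_product, number_product)    # 222 6
print(text_mod, number_mod)            # 7 items 1

# the operator and the explicit method call give the same result
same_add = ('hello' + 'world') == 'hello'.__add__('world')
same_rmul = (3 * 'ab') == 'ab'.__rmul__(3)

# adding an int to a str is not defined and raises a TypeError
try:
    'hello' + 3
except TypeError as error:
    message = str(error)
```

To combine a string with a number, the number is normally cast to a string first, for example 'hello' + str(3).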

It is common to use a binary operator and reassign the output to the instance name. Recall that the operation on the right is carried out first using the original string. The instance name, which can be conceptualised as a label, then points to the new object:

In [403]:
greeting

'hello'

In [404]:
greeting = greeting + ' world!'

In [405]:
greeting

'hello world!'

This can be done in shorthand by combining a binary operator, for example addition +, with the assignment operator = resulting in +=:

In [406]:
greeting += ' world!'

In [407]:
greeting

'hello world! world!'

The augmented assignment operator += more clearly indicates that an addition involving the old value is carried out before the name is reassigned to the new value.
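Because a string is immutable, += cannot modify the original string. It creates a new string and moves the label. A minimal sketch, where alias is a second, hypothetical label kept on the original object:

```python
greeting = 'hello'
alias = greeting             # a second label pointing at the same object
id_before = id(greeting)

greeting += ' world!'        # builds a new string and rebinds the name greeting

print(greeting)              # hello world!
print(alias)                 # hello
print(id(greeting) == id_before)    # False
```

The label alias still points to the original 'hello', which shows that the original string was not changed.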

## Binary Comparison Operators

The binary data model identifier \_\_eq\_\_ checks whether two instances are equal and returns a bool. This is configured in the object class alongside its opposite \_\_ne\_\_ and these control the behaviour of the is equal to == and not equal to != operators. For a plain object instance, two instances are only equal if they are the same object:

In [408]:
object1 = object()
object2 = object()
object1a = object1

In [409]:
? object1.__eq__

Signature:       object1.__eq__(value, /)
Call signature:  object1.__eq__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__eq__' of object object at 0x000001DABBB3D610>
Docstring:       Return self==value.

In [410]:
? object1.__ne__

Signature:       object1.__ne__(value, /)
Call signature:  object1.__ne__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__ne__' of object object at 0x000001DABBB3D610>
Docstring:       Return self!=value.

In [411]:
object1 == object2

False

In [412]:
object1 == object1a

True

In [413]:
object1 != object2

True

For ordinal values there are the additional binary data model comparison identifiers less than \_\_lt\_\_ and greater than \_\_gt\_\_ which control the behaviour of the < and > operators. Recall that a string is made up of characters, each of which has an ordinal value:

In [414]:
ord('A')

65

In [415]:
ord('a')

97

Therefore the string has the methods:

In [416]:
? greeting.__lt__

Signature:       greeting.__lt__(value, /)
Call signature:  greeting.__lt__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__lt__' of str object at 0x000001DABBF0DA20>
Docstring:       Return self<value.

In [417]:
? greeting.__gt__

Signature:       greeting.__gt__(value, /)
Call signature:  greeting.__gt__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__gt__' of str object at 0x000001DABBF0DA20>
Docstring:       Return self>value.

And due to the ordinal values:

In [418]:
'a' > 'A'

True

In [419]:
'hello' > 'Hello World!'

True
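Strings are compared character by character using the ordinal values, and the first differing character decides the result. A short sketch of this, plus sorting with the str.casefold method as a key when case should be ignored. The list words is made up for illustration:

```python
# the first characters differ, so they decide the comparison
first_character_larger = ord('h') > ord('H')    # 104 > 72
string_larger = 'hello' > 'Hello World!'

# if one string is a prefix of the other, the shorter string is smaller
prefix_smaller = 'hell' < 'hello'

words = ['banana', 'Cherry', 'apple']
by_ordinal = sorted(words)                       # upper case sorts before lower case
ignoring_case = sorted(words, key=str.casefold)  # compares casefolded copies

print(by_ordinal)       # ['Cherry', 'apple', 'banana']
print(ignoring_case)    # ['apple', 'banana', 'Cherry']
```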

There are also the additional binary data model comparison identifiers less than or equal to \_\_le\_\_ and greater than or equal to \_\_ge\_\_ which control the behaviour of the <= and >= operators:

In [420]:
? greeting.__le__

Signature:       greeting.__le__(value, /)
Call signature:  greeting.__le__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__le__' of str object at 0x000001DABBF0DA20>
Docstring:       Return self<=value.

In [421]:
? greeting.__ge__

Signature:       greeting.__ge__(value, /)
Call signature:  greeting.__ge__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__ge__' of str object at 0x000001DABBF0DA20>
Docstring:       Return self>=value.

In [422]:
'a' >= 'b'

False

In [423]:
'a' < 'a'

False

In [424]:
'a' == 'a'

True

In [425]:
'a' <= 'a'

True

In [426]:
'a' >= 'a'

True

The keyword is also performs a comparison. It checks whether two objects are the same object in memory i.e. whether they have the same id:

In [427]:
? id

Signature:  id(obj, /)
Docstring:
Return the identity of an object.

This is guaranteed to be unique among simultaneously existing objects.
(CPython uses the object's memory address.)
Type:      builtin_function_or_method

In [428]:
id('a')

140728134243464

In [429]:
id('b')

140728133671496

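The is keyword compares identity while is equal to == compares value. Two strings with different values are always different objects, so both comparisons return False here. Two equal strings are not guaranteed to be the same object, so == should be used to compare string values: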

In [430]:
'a' is 'b'

False

In [431]:
id('a') == id('b')

False
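Equal strings can be different objects. In the sketch below, second is built at run time with join. Whether the two names then share one object depends on the Python implementation, so only the value comparison should be relied upon:

```python
first = 'hello world!'
second = ' '.join(['hello', 'world!'])    # built at run time

same_value = first == second      # True, the characters match
same_object = first is second     # implementation dependent, typically False in CPython

print(same_value)
print(same_object == (id(first) == id(second)))    # is always agrees with comparing ids
```

Identity comparison is normally reserved for singletons such as None.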

## Splitting and Joining Strings

The string has a number of identifiers which are used for splitting and joining a string. These generally involve casting to a Python collection such as a tuple of strings or a list of strings.

For example the identifiers partition and right partition rpartition will partition a string into a three element tuple containing the substring before the separator, the separator itself and the substring after the separator respectively. partition uses the first occurrence of the separator and rpartition uses the last. To make it more obvious the following string will be instantiated:

In [432]:
greeting = 'hello|world|!'

In [433]:
? greeting.partition

Signature:  greeting.partition(sep, /)
Docstring:
Partition the string into three parts using the given separator.

This will search for the separator in the string.  If the separator is found,
returns a 3-tuple containing the part before the separator, the separator
itself, and the part after it.

If the separator is not found, returns a 3-tuple containing the original string
and two empty strings.
Type:      builtin_function_or_method

In [434]:
greeting.partition('|')

('hello', '|', 'world|!')

In [435]:
? greeting.rpartition

Signature:  greeting.rpartition(sep, /)
Docstring:
Partition the string into three parts using the given separator.

This will search for the separator in the string, starting at the end. If
the separator is found, returns a 3-tuple containing the part before the
separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing two empty strings
and the original string.
Type:      builtin_function_or_method

In [436]:
greeting.rpartition('|')

('hello|world', '|', '!')
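Since the tuple always has three elements, it can be unpacked directly into three names. A minimal sketch using a hypothetical setting string, which also covers the case where the separator is not found:

```python
setting = 'colour=dark=blue'

key, sep, value = setting.partition('=')       # splits at the first '='
head, sep_r, tail = setting.rpartition('=')    # splits at the last '='

missing = 'colour'.partition('=')              # separator not found
missing_r = 'colour'.rpartition('=')

print(key, value)     # colour dark=blue
print(head, tail)     # colour=dark blue
print(missing)        # ('colour', '', '')
print(missing_r)      # ('', '', 'colour')
```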

More generally the split and join identifiers can be used to split a string into a list of strings or join a list of strings up into a single string. For example if the following sentence is created:

In [437]:
sentence = 'the fat black cat sat on the mat!'

The identifier split can be examined:

In [438]:
? sentence.split

Signature:  sentence.split(sep=None, maxsplit=-1)
Docstring:
Return a list of the substrings in the string, using sep as the separator string.

  sep
    The separator used to split the string.

    When set to None (the default value), will split on any whitespace
    character (including \\n \\r \\t \\f and spaces) and will discard
    empty strings from the result.
  maxsplit
    Maximum number of splits (starting from the left).
    -1 (the default value) means no limit.

Note, str.split() is mainly useful for data that has been intentionally
delimited.  With natural text that includes punctuation, consider using
the regular expression module.
Type:      builtin_function_or_method

Since the words are separated by whitespace, the input arguments can be left at their default values. This gives a list of strings:

In [439]:
words = sentence.split()
words

['the', 'fat', 'black', 'cat', 'sat', 'on', 'the', 'mat!']

There is also a right split rsplit. The difference is subtle and the two methods behave differently only when maxsplit is assigned a new value:

In [440]:
? sentence.rsplit

Signature:  sentence.rsplit(sep=None, maxsplit=-1)
Docstring:
Return a list of the substrings in the string, using sep as the separator string.

  sep
    The separator used to split the string.

    When set to None (the default value), will split on any whitespace
    character (including \\n \\r \\t \\f and spaces) and will discard
    empty strings from the result.
  maxsplit
    Maximum number of splits (starting from the left).
    -1 (the default value) means no limit.

Splitting starts at the end of the string and works to the front.
Type:      builtin_function_or_method

In [441]:
words = sentence.rsplit()
words

['the', 'fat', 'black', 'cat', 'sat', 'on', 'the', 'mat!']

In [442]:
'a b c d e'.split(' ', maxsplit=2)

['a', 'b', 'c d e']

In [443]:
'a b c d e'.rsplit(' ', maxsplit=2)

['a b c', 'd', 'e']
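Leaving sep at its default of None is not the same as passing a single space. The sketch below uses a made-up string with runs of spaces to show the difference:

```python
spaced = 'a  b   c'    # two and then three spaces between the letters

default_split = spaced.split()        # a run of whitespace counts as one separator
explicit_split = spaced.split(' ')    # every single space is a separator

print(default_split)     # ['a', 'b', 'c']
print(explicit_split)    # ['a', '', 'b', '', '', 'c']
```

With an explicit separator, the empty strings between adjacent separators are kept in the result.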

To join the words, the string join method can be called from a delimiter string instance:

In [444]:
delimiter = ' '

In [445]:
? delimiter.join

Signature:  delimiter.join(iterable, /)
Docstring:
Concatenate any number of strings.

The string whose method is called is inserted in between each given string.
The result is returned as a new string.

Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
Type:      builtin_function_or_method

In [446]:
delimiter.join(words)

'the fat black cat sat on the mat!'

In [447]:
' '.join(words)

'the fat black cat sat on the mat!'

In [448]:
'|'.join(words)

'the|fat|black|cat|sat|on|the|mat!'
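The join method only accepts an iterable of strings, so other data types need to be cast to str first. A short sketch with a made-up list of numbers, followed by a split and join round trip:

```python
numbers = [1, 2, 3]

# joining a list of int instances raises a TypeError
try:
    ', '.join(numbers)
except TypeError as error:
    message = str(error)

# cast each number to a string before joining
joined = ', '.join(str(number) for number in numbers)

# split and join with the same delimiter reverse each other
delimited = 'the|fat|black|cat'
round_trip = '|'.join(delimited.split('|'))

print(joined)        # 1, 2, 3
print(round_trip)    # the|fat|black|cat
```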

If a multiline string is created:

In [449]:
paragraph = '''The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog'''

In [450]:
paragraph

'The quick brown fox jumps over the lazy dog\nThe quick brown fox jumps over the lazy dog\nThe quick brown fox jumps over the lazy dog\nThe quick brown fox jumps over the lazy dog'

There is an associated string method splitlines, which splits the string into a list at line boundaries. It has an input argument keepends which defaults to False and therefore excludes the newline characters:

In [451]:
? paragraph.splitlines

Signature:  paragraph.splitlines(keepends=False)
Docstring:
Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and
true.
Type:      builtin_function_or_method

In [452]:
paragraph.splitlines()

['The quick brown fox jumps over the lazy dog',
 'The quick brown fox jumps over the lazy dog',
 'The quick brown fox jumps over the lazy dog',
 'The quick brown fox jumps over the lazy dog']
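The sketch below compares splitlines with and without keepends against split('\n'), using a made-up string that mixes line endings and ends with a newline:

```python
text = 'first\nsecond\r\nthird\n'

lines = text.splitlines()                     # line endings removed
with_ends = text.splitlines(keepends=True)    # line endings kept
by_newline = text.split('\n')                 # only splits on '\n'

print(lines)         # ['first', 'second', 'third']
print(with_ends)     # ['first\n', 'second\r\n', 'third\n']
print(by_newline)    # ['first', 'second\r', 'third', '']
```

splitlines recognises '\r\n' as a single line boundary and does not produce a trailing empty string, which makes it the safer choice for text of unknown origin.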

If the multiline string is created with tabs:

In [453]:
paragraph = '''\tThe quick brown fox jumps over the lazy dog
\tThe quick brown fox jumps over the lazy dog
\tThe quick brown fox jumps over the lazy dog
\tThe quick brown fox jumps over the lazy dog'''

The tabs can be replaced by a specified number of spaces using the string method expandtabs:

In [454]:
? paragraph.expandtabs

Signature:  paragraph.expandtabs(tabsize=8)
Docstring:
Return a copy where all tab characters are expanded using spaces.

If tabsize is not given, a tab size of 8 characters is assumed.
Type:      builtin_function_or_method

In [455]:
paragraph.expandtabs(4)

'    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog'

In [456]:
print(paragraph)

	The quick brown fox jumps over the lazy dog
	The quick brown fox jumps over the lazy dog
	The quick brown fox jumps over the lazy dog
	The quick brown fox jumps over the lazy dog


In [457]:
print(paragraph.expandtabs(4))

    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
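When a tab is at the start of a line, expandtabs looks the same as replacing the tab with a fixed number of spaces. Elsewhere it pads to the next tab stop, i.e. the next column that is a multiple of tabsize. A minimal sketch with a made-up row of text:

```python
row = 'a\tbc\tdef'

# pads each field to the next multiple of 4 columns
expanded = row.expandtabs(4)

# replace always inserts exactly 4 spaces
replaced = row.replace('\t', ' ' * 4)

print(expanded)    # a   bc  def
print(replaced)    # a    bc    def
```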


## Get Attribute

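A list of identifiers can be viewed by inputting the string instance greeting followed by a dot . and tab ↹. Running the incomplete expression in a cell gives a SyntaxError: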

In [458]:
greeting.

SyntaxError: invalid syntax (252053849.py, line 1)

If one of the methods is referenced and not called, details about the function object display:

In [459]:
greeting.center

<function str.center(width, fillchar=' ', /)>

In [460]:
greeting.center(20)

'   hello|world|!    '

The data model identifier \_\_getattribute\_\_ is invoked when an attribute is accessed and controls the behaviour of the getattr function:

In [461]:
? greeting.__getattribute__

Signature:       greeting.__getattribute__(name, /)
Call signature:  greeting.__getattribute__(*args, **kwargs)
Type:            method-wrapper
String form:     <method-wrapper '__getattribute__' of str object at 0x000001DABBDAA3F0>
Docstring:       Return getattr(self, name).

In [462]:
? getattr

Docstring:
getattr(object, name[, default]) -> value

Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y.
When a default argument is given, it is returned when the attribute doesn't
exist; without it, an exception is raised in that case.
Type:      builtin_function_or_method

This is used to get an attribute using a string:

In [463]:
getattr(greeting, 'center')

<function str.center(width, fillchar=' ', /)>

In [464]:
getattr(greeting, 'center')(20)

'   hello|world|!    '
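The optional default argument described in the docstring avoids an exception when the attribute does not exist. In the sketch below, 'shout' is a made-up name that is not a string method:

```python
greeting = 'hello|world|!'

method_name = 'upper'                          # name chosen at run time
shouted = getattr(greeting, method_name)()     # same as greeting.upper()

fallback = getattr(greeting, 'shout', None)    # default returned instead of an error

# without a default an AttributeError is raised
try:
    getattr(greeting, 'shout')
except AttributeError as error:
    message = str(error)

print(shouted)     # HELLO|WORLD|!
print(fallback)    # None
```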

This can be useful when using something like a for loop:

In [465]:
dir(greeting)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

In [466]:
for attribute in dir(greeting):
    if attribute[0:2] == 'is':
        print(attribute.ljust(len('isidentifier')), end='\t\t')
        print(getattr(greeting, attribute))

isalnum     		<built-in method isalnum of str object at 0x000001DABBDAA3F0>
isalpha     		<built-in method isalpha of str object at 0x000001DABBDAA3F0>
isascii     		<built-in method isascii of str object at 0x000001DABBDAA3F0>
isdecimal   		<built-in method isdecimal of str object at 0x000001DABBDAA3F0>
isdigit     		<built-in method isdigit of str object at 0x000001DABBDAA3F0>
isidentifier		<built-in method isidentifier of str object at 0x000001DABBDAA3F0>
islower     		<built-in method islower of str object at 0x000001DABBDAA3F0>
isnumeric   		<built-in method isnumeric of str object at 0x000001DABBDAA3F0>
isprintable 		<built-in method isprintable of str object at 0x000001DABBDAA3F0>
isspace     		<built-in method isspace of str object at 0x000001DABBDAA3F0>
istitle     		<built-in method istitle of str object at 0x000001DABBDAA3F0>
isupper     		<built-in method isupper of str object at 0x000001DABBDAA3F0>


In [467]:
for attribute in dir(greeting):
    if attribute[0:2] == 'is':
        print(attribute.ljust(len('isidentifier')), end='\t\t')
        print(getattr(greeting, attribute)())

isalnum     		False
isalpha     		False
isascii     		True
isdecimal   		False
isdigit     		False
isidentifier		False
islower     		True
isnumeric   		False
isprintable 		True
isspace     		False
istitle     		False
isupper     		False


In [468]:
? str.maketrans

Docstring:
Return a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode
ordinals (integers) or characters to Unicode ordinals, strings or None.
Character keys will be then converted to ordinals.
If there are two arguments, they must be strings of equal length, and
in the resulting dictionary, each character in x will be mapped to the
character at the same position in y. If there is a third argument, it
must be a string, whose characters will be mapped to None in the result.
Type:      builtin_function_or_method

In [469]:
str.maketrans('αβγ', 'abc')

{945: 97, 946: 98, 947: 99}
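The translation table is a dictionary of ordinal values and is intended to be supplied to the string method translate. A minimal sketch covering the two argument, three argument and dictionary forms described in the docstring, with made-up input strings:

```python
# two arguments: map each character in the first string to the second
table = str.maketrans('αβγ', 'abc')
latin = 'βαγ'.translate(table)

# three arguments: characters in the third string are deleted
table_delete = str.maketrans('αβγ', 'abc', '!')
cleaned = 'αβγ!'.translate(table_delete)

# one argument: a dictionary, where a value can be a longer string
table_names = str.maketrans({'α': 'alpha'})
named = 'αβ'.translate(table_names)

print(latin)      # bac
print(cleaned)    # abc
print(named)      # alphaβ
```

Characters that do not appear in the table are left unchanged.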

In [470]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

## Bytes Related Identifiers

In [471]:
? greeting.encode

Signature:  greeting.encode(encoding='utf-8', errors='strict')
Docstring:
Encode the string using the codec registered for encoding.

encoding
  The encoding in which to encode the string.
errors
  The error handling scheme to use for encoding errors.
  The default is 'strict' meaning that encoding errors raise a
  UnicodeEncodeError.  Other possible values are 'ignore', 'replace' and
  'xmlcharrefreplace' as well as any other name registered with
  codecs.register_error that can handle UnicodeEncodeErrors.
Type:      builtin_function_or_method

In [472]:
greeting.encode()

b'hello|world|!'

In [473]:
greek_greeting = 'Γειά σου Κόσμε!'

In [474]:
greek_greeting.encode()

b'\xce\x93\xce\xb5\xce\xb9\xce\xac \xcf\x83\xce\xbf\xcf\x85 \xce\x9a\xcf\x8c\xcf\x83\xce\xbc\xce\xb5!'
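The bytes instance can be converted back to a string using the bytes method decode, which also defaults to 'utf-8'. The sketch below shows the round trip and the effect of the errors argument with an encoding that cannot represent Greek letters:

```python
greek_greeting = 'Γειά σου Κόσμε!'

encoded = greek_greeting.encode()    # UTF-8 by default
decoded = encoded.decode()           # back to a str

# the Greek letters need more than one byte each in UTF-8
longer_as_bytes = len(encoded) > len(greek_greeting)

# ASCII cannot represent the Greek letters, so 'strict' raises an error
try:
    greek_greeting.encode('ascii')
except UnicodeEncodeError:
    # 'replace' substitutes a question mark for each unencodable character
    replaced = greek_greeting.encode('ascii', errors='replace')

print(decoded == greek_greeting)    # True
print(longer_as_bytes)              # True
print(replaced)
```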