## The str class

str is an abbreviation for string. The str class is an immutable, ordered collection of Unicode characters. Immutable means that once an instance has been created, it cannot be modified.

## Initialisation Signature

Inputting str( will display the initialisation signature docstring of the string class as a popup balloon:

In [1]:
#str(

When a <span style="color:green">#</span> is inserted before Python code, the code becomes a comment. The line above is commented out to prevent errors when all IPython cells are run. To see the docstring, remove the <span style="color:green">#</span> to uncomment the line, then delete the ( and retype it; the docstring should display as a popup balloon.

Some IDEs such as JupyterLab may require the keypress shift ⇧ and tab ↹ to invoke the popup balloon.

Alternatively inputting <span style="color:#C00000">?</span> <span style="color:#0070C0">str</span> in an ipython cell will display the docstring:

In [2]:
? str

Init signature:  str(self, /, *args, **kwargs)
Docstring:
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
Type:           type
Subclasses:     StrEnum, DeferredConfigString, FoldedCase, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ...

The purpose of the initialisation signature is to provide the data required to initialise a new instance. 

During construction of a new instance under the hood, two data model identifiers are used:

* \_\_new\_\_ - the constructor
* \_\_init\_\_ - the initialiser

When a class is called, \_\_new\_\_ creates the new instance and \_\_init\_\_ is then invoked to initialise that instance with instance data.

Python uses Object Orientated Programming (OOP) and every class is based on the object class. Both data model methods are defined in the object class. A child class typically redefines \_\_init\_\_ so that a new instance has the required instance data. An immutable class such as str is the exception: its value has to be fixed when the instance is created, so str redefines \_\_new\_\_ and inherits \_\_init\_\_ unchanged from object.
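Which class actually supplies each of these two methods can be checked directly. The following is a minimal sketch, assuming CPython; vars returns the namespace of a class, so a name appears in it only when that class defines the identifier itself:

```python
# Names defined by a class itself appear in its own namespace (vars).
defines_new = '__new__' in vars(str)
defines_init = '__init__' in vars(str)

print(defines_new)    # True: str supplies its own constructor
print(defines_init)   # False: str does not supply its own initialiser

# Attribute lookup on str therefore falls back to the object class.
inherits_init = str.__init__ is object.__init__
print(inherits_init)  # True
```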

For the string class, the initialisation signature shows alternative ways of supplying instance data for a string.

In [3]:
# str(self, /, *args, **kwargs)
# str(object='') -> str
# str(bytes_or_buffer[, encoding[, errors]]) -> str

If the first way is examined:

In [4]:
# str(self, /, *args, **kwargs)

* In Python parentheses ( ) are used to call a function and supply any necessary input arguments.
* The comma , is used as a delimiter to separate out any input arguments.
* In Python *self* is used to denote *this instance*, in this case the string instance being created.
* The forward slash / indicates that the input arguments preceding it must be provided positionally (first cell below). If such an input argument is instead named (second cell below), an error message will display.

In [5]:
# str('hello')

In [6]:
# str(self='hello')

* \*args indicates a variable number of additional positional input arguments. These are typically not used for the str class.
* \*\*kwargs indicates a variable number of additional named input arguments. These are typically not used for the str class.
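The positional-only rule can be seen by catching the error instead of commenting the line out. This is a small sketch; the variable names positional and named_ok are illustrative only:

```python
# Supplying the value positionally works.
positional = str('hello')
print(positional)  # hello

# Supplying it under the name self is rejected with a TypeError.
try:
    str(self='hello')
    named_ok = True
except TypeError as error:
    named_ok = False
    print(type(error).__name__)  # TypeError

print(named_ok)  # False
```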

A string instance can be instantiated using the initialisation signature:

In [7]:
str('hello')

'hello'

Because the string is a fundamental datatype, it can also be instantiated directly as a literal by enclosing the characters in quotations:

In [8]:
'hello'

'hello'

Notice the difference in syntax highlighting between the string instance above and the object name below. Once again remove the <span style="color:green">#</span> as the following line of code is commented out.

In [9]:
# hello

Running the cell above will give an error similar to the following:

<span style="color:red">---------------------------------------------------------------------------</span>
NameError

<span style="color:red">NameError:</span> name 'hello' is not defined

In other words Python is looking for the object hello which does not exist.

In VSCode open the Variables tab by pressing the Variables button to the top:

<img src='./images/img_001.png' alt='img_001' width='100'/>

The Variables should look like the following:

<img src='./images/img_002.png' alt='img_002' width='500'/>

If the following code is input:

In [10]:
'hello'

'hello'

Notice the value 'hello' is displayed in the cell output. This value is a Python string instance that has no reference (otherwise known as a variable name or instance name) and cannot be reselected. Conceptualise a reference as a label which is used to select the string instance.

A string instance can be assigned to a variable name during instantiation.

In [11]:
greeting = 'hello'

Notice the cell has no output. Instead it is referenced via the variable name or instance name.

Notice that the Variables will be updated:

<img src='./images/img_003.png' alt='img_003' width='500'/>

And the string instance can be accessed from the variable or instance name. Conceptualise the instance name as a label which points to the instance.

In [12]:
greeting

'hello'

The Python interpreter recognises the instance name or variable name. This is used as a reference to retrieve the value, which is shown in the cell output above. If the instance is assigned to another instance name, for example:

In [13]:
greeting2 = greeting

Then in the Variable Explorer, two instance names are shown with the same value:

<img src='./images/img_004.png' alt='img_004' width='500'/>

Conceptualise the instance name as a label which points to the instance. This instance has two instance names which act as aliases of one another. If either instance name is used, the same value is retrieved:

In [14]:
greeting

'hello'

In [15]:
greeting2

'hello'

And if a check is made to see if the two values are equal, the boolean True is returned:

In [16]:
greeting == greeting2

True

Each object in Python has a unique id. If the id of the object corresponding to each instance name is checked:

In [17]:
id(greeting)

1652598155952

In [18]:
id(greeting2)

1652598155952

Then the id is the same, because both these instance names are references or labels to the same physical instance in memory. Therefore the following is True:

In [19]:
greeting is greeting2

True
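The difference between == (equal value) and is (same instance) only becomes visible when two separate instances hold the same characters. A minimal sketch, in which the second string is assembled at runtime with join so that it is a distinct instance; the names alias and rebuilt are illustrative:

```python
greeting = 'hello'
alias = greeting                   # second label on the same instance
rebuilt = ''.join(['hel', 'lo'])   # equal value, assembled at runtime

print(greeting == alias, greeting is alias)  # True True
print(greeting == rebuilt)                   # True
print(greeting is rebuilt)                   # typically False in CPython
```

For this reason == is the appropriate check when comparing string values, and is should be reserved for checking whether two names label the same instance.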

The del statement can be used to delete an instance name. Conceptualise this as deleting a label:

In [20]:
del greeting

Notice that the instance name greeting is deleted i.e. this label is removed. However the label greeting2 is still present and the instance value 'hello' is unaltered.

<img src='./images/img_005.png' alt='img_005' width='500'/>

If del is used to also delete the instance name or label greeting2:

In [21]:
del greeting2

<img src='./images/img_006.png' alt='img_006' width='500'/>

Then there are no instance names (references) for the string instance which has the value 'hello'. When an instance has no instance name or reference, it is considered orphaned and is cleaned up by Python's garbage collection.

If a new instance is created:

In [22]:
greeting = 'Hello World'

Then the instance name displays on variables:

<img src='./images/img_007.png' alt='img_007' width='500'/>

If a reassignment is carried out:

In [23]:
greeting = 'hi'

Then the instance name displays on Variables but the value is changed. Recall that the instance name can be conceptualised as a label; what has happened is that the label greeting has been removed from the old instance with the value 'Hello World' and placed on the new instance with the value 'hi'. The old instance now has no label, i.e. no reference, and is cleaned up by Python's garbage collection.

<img src='./images/img_008.png' alt='img_008' width='500'/>

Reassignment **moves the instance name** from the old instance to the new instance and does not change the value of an instance. Strings are **immutable** which means that once they are instantiated they can never be modified.
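Both halves of this statement can be checked in a short sketch. The name original is an extra label added here purely so the old instance stays reachable for comparison:

```python
greeting = 'Hello World'
original = greeting            # extra label kept on the old instance

# Attempting to change a character in place is rejected.
try:
    greeting[0] = 'J'
    modified = True
except TypeError:
    modified = False
print(modified)                # False

# Reassignment moves the label greeting to a different instance.
greeting = 'hi'
print(greeting is original)    # False
print(original)                # Hello World
```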

The initialisation signature of the string class shows instantiation using a named keyword input argument object which has a default value of an empty string:

In [24]:
# str(object='') -> str

This named keyword input argument can be assigned to a custom value:

In [25]:
str(object='hello')

'hello'

If not assigned, it takes on its default value which returns an empty string:

In [26]:
str()

''

The string can be instantiated by supplying the named keyword input argument from the initialisation signature and can be assigned to a variable name. Notice the subtlety in spacing in the line below, which follows Python's PEP 8 styling convention. The assignment operator is surrounded by spaces, which subtly emphasises the assignment to the instance name. The keyword argument within the function call has no spaces around its =, as spacing within a function call is instead typically used after the , separator to visually separate input arguments from one another. Python code will work with or without the spacing but the form below is preferred:

In [27]:
greeting = str(object='hello')
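The second form in the docstring, str(bytes_or_buffer[, encoding[, errors]]), is not exercised above. As a brief sketch (the variable names are illustrative), supplying an encoding decodes the bytes, whereas omitting it falls back to the formal representation of the bytes instance:

```python
data = b'hello'

# With an encoding, the bytes are decoded into characters.
decoded = str(data, encoding='utf-8')
print(decoded)   # hello

# Without an encoding, the result is the representation of the bytes instance.
as_text = str(data)
print(as_text)   # b'hello'
```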

## Identifiers

Two string instances can be instantiated to instance names:

In [28]:
greeting = 'hello'
farewell = 'bye'

<img src='./images/img_009.png' alt='img_009' width='500'/>

If one of the instance names is typed followed by a dot . then a list of identifiers displays. 

Some IDEs such as JupyterLab may require the keypress tab ↹ to invoke the list of identifiers:

In [29]:
# greeting.

Notice that the same list of identifiers displays for the other instance:

In [30]:
# farewell.

These are the same identifiers and originate from the string class str:

In [31]:
# str.

A Python object such as the str instance behaves similarly to a directory (or folder) within Windows Explorer. In Windows Explorer each directory can contain other directories or files.

In Python each instance can contain identifiers. Identifier is the term for any object belonging to another object. These objects can be functions, which are known as methods when they are defined in a class, or instances of another Python class.

The list of identifiers can be viewed as a list using the directory dir function.

In [32]:
? dir

Docstring:
dir([object]) -> list of strings

If called without an argument, return the names in the current scope.
Else, return an alphabetized list of names comprising (some of) the attributes
of the given object, and of attributes reachable from it.
If the object supplies a method named __dir__, it will be used; otherwise
the default dir() logic is used and returns:
  for a module object: the module's attributes.
  for a class object:  its attributes, and recursively the attributes
    of its bases.
  for any other object: its attributes, its class's attributes, and
    recursively the attributes of its class's base classes.
Type:      builtin_function_or_method

In [33]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

The help function can be used to view details about each identifier:

In [34]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

Notice that the identifiers are grouped into:

* (Instance) Methods
* Class Methods\*
* Static Methods
* Data Descriptors\*

Methods are functions and are called. Data Descriptors are other Python instances and are not callable. To call a function and provide input arguments, parentheses are used.

Most methods are instance methods and require an instance which has instance data to work on. Other input arguments may be required.

Class methods are instead bound to a class and usually used for the purpose of alternative constructors.

Static methods are regular functions which are bound to neither the instance nor the class but are found within the namespace of the class for convenience.

\* Not present in the string class.
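The grouping can also be checked programmatically. The following sketch, assuming CPython, looks each identifier up in the namespace of the class, so the raw object is seen before any binding takes place; maketrans is the static method of the string class and needs no instance:

```python
namespace = vars(str)

# An instance method is stored as a method descriptor.
print(type(namespace['capitalize']).__name__)  # method_descriptor

# A static method is stored wrapped in staticmethod.
print(type(namespace['maketrans']).__name__)   # staticmethod

# Being static, maketrans is called from the class without an instance.
table = str.maketrans('lo', '01')
translated = 'hello'.translate(table)
print(translated)  # he001
```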

### Instance Methods

The docstring of the capitalize method can be examined from an instance:

In [35]:
? greeting.capitalize

Signature:  greeting.capitalize()
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower
case.
Type:      builtin_function_or_method

And from the class itself:

In [36]:
? str.capitalize

Signature:  str.capitalize(self, /)
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower
case.
Type:      method_descriptor

Notice the difference between:

In [37]:
# greeting.capitalize()

In [38]:
# str.capitalize(self, /)

When the method capitalize is called from an instance, it has access to the instance data. This method requires no additional data to operate, which is why the parentheses are empty.

In contrast, when the method is called from the class itself, it has no instance data to work from, so an instance must be provided. In Python *self* means *this instance*. Recall that the / indicates that all prior input arguments should be provided positionally and not as named input arguments.

A string is immutable, meaning that it cannot be modified after it is instantiated. Therefore string methods typically return a new instance, usually a new instance of the string class. The docstring contains the word *return*, giving details about the return value.

In [39]:
# Docstring:
# Return a capitalized version of the string.

If the method is called from an instance:

In [40]:
greeting.capitalize()

'Hello'

The new capitalised string instance displays in the cell output. This is a new instance and does not modify the original which can be seen to be unchanged in Variables.

<img src='./images/img_009.png' alt='img_009' width='500'/>

Since this new instance is not assigned an instance name, it has no references and is removed by Python's garbage collection. It can be assigned an instance name using:

In [41]:
capgreeting = greeting.capitalize()

Notice no cell output as the instance is assigned to the instance name. This can be seen in Variables:

<img src='./images/img_010.png' alt='img_010' width='500'/>

If the instance method is invoked from a class, the instance self must be provided as the first input argument:

In [42]:
str.capitalize(farewell)

'Bye'

Failure to supply an instance will result in an error:

In [43]:
# str.capitalize()

<span style="color:red">---------------------------------------------------------------------------</span>

<span style="color:red">TypeError</span> Traceback (most recent call last)
Cell In[53], line 1
----> 1 str.capitalize()

TypeError: unbound method str.capitalize() needs an argument
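The two calling styles can be placed side by side in one runnable sketch, with the failing call wrapped in try and except so the cell does not stop; the names from_instance, from_class and raised are illustrative:

```python
farewell = 'bye'

# Called from the instance, self is supplied automatically.
from_instance = farewell.capitalize()

# Called from the class, the instance is supplied as the first argument.
from_class = str.capitalize(farewell)
print(from_instance, from_class)  # Bye Bye

# Called from the class without an instance, a TypeError is raised.
try:
    str.capitalize()
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```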

## Data Model Identifiers and the object Base Class

Many of the identifiers in the str class begin and end with a **d**ouble **under**score. These are data model methods which are sometimes colloquially called dunder methods and generally follow the design pattern of the object base class. 

The method resolution order function will return the method resolution order of the string class:

In [44]:
? str.mro

Signature:  str.mro()
Docstring: Return a type's method resolution order.
Type:      builtin_function_or_method

In [45]:
str.mro()

[str, object]

Notice that it returns a Python list of str and object. This means that str (child class) is based upon object (parent class), and object is the base class for everything in Python's object orientated programming. A str instance can use methods from both the object class and the str class, but the version in the str class is preferred when a method is redefined there. If help is used on the object class, a list of identifiers displays:
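The same preference rule applies one level further down. As a sketch, LoudStr below is a hypothetical child class of str invented for illustration; it redefines capitalize and leaves every other method to be found further along the method resolution order:

```python
class LoudStr(str):
    """Hypothetical child class of str, used only for illustration."""

    def capitalize(self):
        # Reuse the str version, then append an exclamation mark.
        return super().capitalize() + '!'


shout = LoudStr('hello')
order = [cls.__name__ for cls in LoudStr.mro()]
print(order)               # ['LoudStr', 'str', 'object']
print(shout.capitalize())  # Hello!  (redefined in LoudStr)
print(shout.upper())       # HELLO   (found in str)
```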

In [46]:
help(object)

Help on class object in module builtins:

class object
 |  The base class of the class hierarchy.
 |  
 |  When called, it accepts no arguments and returns a new featureless
 |  instance that has no instance attributes and cannot be given any.
 |  
 |  Built-in subclasses:
 |      anext_awaitable
 |      async_generator
 |      async_generator_asend
 |      async_generator_athrow
 |      ... and 108 other subclasses
 |  
 |  Methods defined here:
 |  
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |  
 |  __dir__(self, /)
 |      Default dir() implementation.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Default object formatter.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getstate__(self, /)
 |      Helper for pickle.
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(se

Notice that the identifiers are grouped into:

* (Instance) Methods
* Class Methods
* Static Methods
* Data Descriptors

Notice that most of these begin and end with a double underscore.

Recall that during construction of a new instance under the hood, two data model identifiers are used:

* \_\_new\_\_ - the constructor
* \_\_init\_\_ - the initialiser

As an immutable class, str redefines \_\_new\_\_, so its version is preferred over the \_\_new\_\_ in the object class. The \_\_init\_\_ data model method is instead inherited directly from the object class.

Notice that the object class has the identifier \_\_dir\_\_. The data model identifier is not typically used directly but instead maps to a corresponding builtins function:

In [47]:
? object.__dir__

Signature:  object.__dir__(self, /)
Docstring: Default dir() implementation.
Type:      method_descriptor

In [48]:
? str.__dir__

Signature:  str.__dir__(self, /)
Docstring: Default dir() implementation.
Type:      method_descriptor

The docstring of the builtin function dir can be examined. Note that it takes an object as an input argument, which is usually an instance of object or an instance of another class such as a str instance, which, recall, is based upon object.

In [49]:
? dir

Docstring:
dir([object]) -> list of strings

If called without an argument, return the names in the current scope.
Else, return an alphabetized list of names comprising (some of) the attributes
of the given object, and of attributes reachable from it.
If the object supplies a method named __dir__, it will be used; otherwise
the default dir() logic is used and returns:
  for a module object: the module's attributes.
  for a class object:  its attributes, and recursively the attributes
    of its bases.
  for any other object: its attributes, its class's attributes, and
    recursively the attributes of its class's base classes.
Type:      builtin_function_or_method

The function dir can be used on a string instance:

In [50]:
dir(greeting)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

If used without an input argument it instead examines the local scope:

In [51]:
dir()

['In',
 'Out',
 '_',
 '_10',
 '_12',
 '_14',
 '_15',
 '_16',
 '_17',
 '_18',
 '_19',
 '_25',
 '_26',
 '_33',
 '_40',
 '_42',
 '_45',
 '_50',
 '_7',
 '_8',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '__vsc_ipynb_file__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i18',
 '_i19',
 '_i2',
 '_i20',
 '_i21',
 '_i22',
 '_i23',
 '_i24',
 '_i25',
 '_i26',
 '_i27',
 '_i28',
 '_i29',
 '_i3',
 '_i30',
 '_i31',
 '_i32',
 '_i33',
 '_i34',
 '_i35',
 '_i36',
 '_i37',
 '_i38',
 '_i39',
 '_i4',
 '_i40',
 '_i41',
 '_i42',
 '_i43',
 '_i44',
 '_i45',
 '_i46',
 '_i47',
 '_i48',
 '_i49',
 '_i5',
 '_i50',
 '_i51',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'capgreeting',
 'exit',
 'farewell',
 'get_ipython',
 'greeting',
 'open',
 'quit']

Note the inclusion of \_\_builtins\_\_, the module which contains all the builtin classes. In Python a module can be imported using:

In [52]:
import builtins

Identifiers can be viewed from the module by inputting the module name followed by a dot .

In [53]:
# builtins.

And the str class can be accessed using:

In [54]:
builtins.str

str

Which is the same as:

In [55]:
str

str

And:

In [56]:
builtins.str is str

True

An individual object such as the string class can be imported from a module using:

In [57]:
from builtins import str

And multiple objects can be imported by using a comma as a delimiter:

In [58]:
from builtins import object, str

The \* can be used to import all objects from a module, although this is not normally recommended:

In [59]:
# from builtins import *

It is worthwhile examining the objects from builtins in groupings. 

Pascal Case classes are generally reserved for exceptions (errors and warnings); the NameError, for example, has already been encountered:

In [60]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (isfunction and isclass and isupper and not isdatamodel):
        print(identifier, end=' ')



Pascal Case identifiers that are not callable are constants. Constants are instances of inbuilt classes that have a constant value:

In [61]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (not isfunction and not isclass and isupper and not isdatamodel):
        print(identifier, end=' ')

Ellipsis False None NotImplemented True 

Lower case classes include the object and the str class:

In [62]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (isfunction and isclass and not isupper and not isdatamodel):
        print(identifier, end=' ')

bool bytearray bytes classmethod complex dict enumerate filter float frozenset int list map memoryview object property range reversed set slice staticmethod str super tuple type zip 

Lower case functions are typically used in place of the data model identifiers, for example dir is used in place of the data model identifier \_\_dir\_\_:

In [63]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (isfunction and not isclass and not isupper and not isdatamodel):
        print(identifier, end=' ')

abs aiter all anext any ascii bin breakpoint callable chr compile copyright credits delattr dir display divmod eval exec execfile format get_ipython getattr globals hasattr hash help hex id input isinstance issubclass iter len license locals max min next oct open ord pow print repr round runfile setattr sorted sum vars 
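As an illustration of builtin functions and operators standing in for data model methods, the following sketch compares each preferred form with the underlying call. Calling the data model method directly is shown only for comparison and is not the recommended style:

```python
greeting = 'hello'

# len maps to __len__
print(len(greeting), greeting.__len__())                # 5 5

# the in operator maps to __contains__
print('ell' in greeting, greeting.__contains__('ell'))  # True True

# the + operator maps to __add__
print(greeting + '!', greeting.__add__('!'))            # hello! hello!

# dir maps to __dir__ and additionally sorts the names
same_names = dir(greeting) == sorted(greeting.__dir__())
print(same_names)                                       # True
```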

There are also data model identifiers that are callable. For example, \_\_import\_\_ is used when a module is imported using the Python keyword import:

In [64]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (isfunction and not isclass and not isupper and isdatamodel):
        print(identifier, end=' ')

__build_class__ __import__ 

And there are also data model identifiers that are instances of other classes, typically strings. For example, \_\_name\_\_ is the module's name, \_\_package\_\_ is the package the module is contained in and \_\_doc\_\_ is the module's docstring:

In [65]:
for identifier in dir(builtins):
    isfunction = callable(getattr(builtins, identifier))
    isclass = type(getattr(builtins, identifier))==type
    isupper = identifier[0].isupper()
    isdatamodel = identifier[0] == '_'
    if (not isfunction and not isclass and not isupper and isdatamodel):
        print(identifier, end=' ')

__IPYTHON__ __debug__ __doc__ __name__ __package__ __spec__ 
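The loops above all apply the same four tests and differ only in the final condition. As a sketch, they can be folded into a single pass that groups every identifier by its tuple of test results; the name groups and the tuple key are choices made here for illustration, and the exact contents will vary with the Python version and whether IPython is running:

```python
import builtins
from collections import defaultdict

groups = defaultdict(list)
for identifier in dir(builtins):
    obj = getattr(builtins, identifier)
    key = (
        callable(obj),            # isfunction in the cells above
        type(obj) is type,        # isclass
        identifier[0].isupper(),  # isupper
        identifier[0] == '_',     # isdatamodel
    )
    groups[key].append(identifier)

# Each group is then selected by its combination of test results.
print(groups[(False, False, True, False)])               # the constants
print('NameError' in groups[(True, True, True, False)])  # True
print('str' in groups[(True, True, False, False)])       # True
print('dir' in groups[(True, False, False, False)])      # True
```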

The name is builtins:

In [66]:
builtins.__name__

'builtins'

The package is an empty string as it is a module builtin to Python:

In [67]:
builtins.__package__

''

The docstring is as follows:

In [68]:
builtins.__doc__

"Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices."

## Case Methods

The str case method capitalize has already been examined:

In [69]:
? greeting.capitalize

Signature:  greeting.capitalize()
Docstring:
Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower
case.
Type:      builtin_function_or_method

In [70]:
greeting.capitalize()

'Hello'

There are associated identifiers such as:

* lower
* casefold
* upper
* title
* swapcase

The docstrings of these can all be examined:

In [71]:
? greeting.lower

Signature:  greeting.lower()
Docstring: Return a copy of the string converted to lowercase.
Type:      builtin_function_or_method

In [72]:
? greeting.casefold

Signature:  greeting.casefold()
Docstring: Return a version of the string suitable for caseless comparisons.
Type:      builtin_function_or_method

In [73]:
? greeting.upper

Signature:  greeting.upper()
Docstring: Return a copy of the string converted to uppercase.
Type:      builtin_function_or_method

In [74]:
? greeting.title

Signature:  greeting.title()
Docstring:
Return a version of the string where each word is titlecased.

More specifically, words start with uppercased characters and all remaining
cased characters have lower case.
Type:      builtin_function_or_method

In [75]:
? greeting.swapcase

Signature:  greeting.swapcase()
Docstring: Convert uppercase characters to lowercase and lowercase characters to uppercase.
Type:      builtin_function_or_method

All of these case identifiers only require instance data and return a new string:

In [77]:
'hEllo wOrld'.lower()

'hello world'

In [78]:
'hEllo wOrld'.casefold()

'hello world'

In [79]:
'hEllo wOrld'.upper()

'HELLO WORLD'

In [80]:
'hEllo wOrld'.swapcase()

'HeLLO WoRLD'

In [81]:
'hEllo wOrld'.title()

'Hello World'
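capitalize and title give the same result for a single word, so the difference is easier to see on the two-word string used above. The sketch also shows a known quirk of title, which treats an apostrophe as a word boundary:

```python
text = 'hEllo wOrld'

capitalized = text.capitalize()   # only the first character is upper case
titled = text.title()             # every word starts with upper case
print(capitalized)  # Hello world
print(titled)       # Hello World

# An apostrophe counts as a word boundary for title.
quirk = "they're here".title()
print(quirk)        # They'Re Here
```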

casefold is similar to lower but has more support for non-English characters, as seen with the additional German characters and the Greek characters where some of the lower case characters have variants:

In [82]:
'ÄäÜüÖöẞß'.lower()

'ääüüöößß'

In [83]:
'ÄäÜüÖöẞß'.casefold()

'ääüüöössss'

In [84]:
'ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΛλΜμΝνΞξΟοΠπΡρΣσςΤτΥυΦφΧχΨψΩω'.lower()

'ααββγγδδεεζζηηθθιικκλλμμννξξοοππρρσσςττυυφφχχψψωω'

In [85]:
'ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΛλΜμΝνΞξΟοΠπΡρΣσςΤτΥυΦφΧχΨψΩω'.casefold()

'ααββγγδδεεζζηηθθιικκλλμμννξξοοππρρσσσττυυφφχχψψωω'
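The practical consequence shows up in caseless comparisons. In this sketch two spellings of the same German word are compared; the variable names are illustrative:

```python
left = 'Straße'
right = 'STRASSE'

# lower leaves ß in place, so the two spellings do not match.
match_lower = left.lower() == right.lower()
print(match_lower)     # False

# casefold expands ß to ss, so the two spellings match.
match_casefold = left.casefold() == right.casefold()
print(match_casefold)  # True
```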

## Boolean Identifiers

A number of identifiers are used to examine a specific property of a string and return a boolean of True if it has that property and False otherwise: 

In [86]:
? greeting.isupper

Signature:  greeting.isupper()
Docstring:
Return True if the string is an uppercase string, False otherwise.

A string is uppercase if all cased characters in the string are uppercase and
there is at least one cased character in the string.
Type:      builtin_function_or_method

In [87]:
? greeting.islower

Signature:  greeting.islower()
Docstring:
Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and
there is at least one cased character in the string.
Type:      builtin_function_or_method

In [88]:
? greeting.istitle

Signature:  greeting.istitle()
Docstring:
Return True if the string is a title-cased string, False otherwise.

In a title-cased string, upper- and title-case characters may only
follow uncased characters and lowercase characters only cased ones.
Type:      builtin_function_or_method

For example:

In [89]:
'HELLO'.isupper()


True

In [90]:
'Hello'.isupper()


False

In [91]:
'hello'.islower()

True

In [92]:
'Hello'.islower()

False

In [93]:
'Hello'.istitle()

True
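The docstrings above mention cased characters, which gives some results that may be unexpected. A small sketch, using strings chosen purely for illustration:

```python
# digits and spaces are uncased, so they are ignored by the check
upper_with_digits = 'HELLO 123'.isupper()
print(upper_with_digits)

# with no cased characters at all, both checks return False
digits_upper = '123'.isupper()
digits_lower = '123'.islower()
print(digits_upper, digits_lower)

# every word has to begin with an upper case character to be title-cased
both_words = 'Hello World'.istitle()
first_word_only = 'Hello world'.istitle()
print(both_words, first_word_only)
```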

## Valid Identifier Names

The method isidentifier checks whether a string is valid as an identifier name. This can be useful to check before assigning an instance to an object name:

In [94]:
? greeting.isidentifier

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misidentifier[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a valid Python identifier, False otherwise.

Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
such as "def" or "class".
[1;31mType:[0m      builtin_function_or_method

Lower case letters are acceptable:

In [95]:
'hello'.isidentifier()

True

A space is not acceptable:

In [96]:
'hello world'.isidentifier()

False

An underscore is acceptable and instance names are generally named using snake_case:

In [97]:
'hello_world'.isidentifier()

True

Numbers are acceptable:

In [98]:
'hello_world2'.isidentifier()

True

However an identifier cannot begin with a number:

In [99]:
'2hello_world'.isidentifier()

False

Other special characters are not acceptable:

In [100]:
'hello_world2!'.isidentifier()

False

Upper case is acceptable but generally PascalCase is reserved for a class name:

In [101]:
'PascalCase'.isidentifier()

True

All capitals is also acceptable but this is generally reserved for a constant:

In [102]:
'ALL_CAPS'.isidentifier()

True

An instance name shouldn't match any of the identifiers in builtins, otherwise it will override the builtin (until the name is deleted or the kernel is restarted), which leads to confusion when an attempt is made to use the builtin. Some identifiers are reserved as keywords. These can be seen by importing the keyword module:

In [103]:
import keyword

In [104]:
for kw in keyword.kwlist:
    print(kw, end=' ')

False None True and as assert async await break class continue def del elif else except finally for from global if import in is lambda nonlocal not or pass raise return try while with yield 

In [105]:
for skw in keyword.softkwlist:
    print(skw, end=' ')

_ case match 

If an attempt is made to assign an instance to a keyword, an error message will display:

In [106]:
# with = 'hello'

    with = 'hello'

         ^

<span style="color:red">SyntaxError</span>: invalid syntax
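The three checks discussed in this section can be combined. The following is a minimal sketch, where the function name is_usable_name is a hypothetical helper. It assumes that a name is only wanted when it is a valid identifier, is not a keyword and does not match an identifier in builtins:

```python
import builtins
import keyword

def is_usable_name(name):
    """Return True if name is a valid identifier that is neither a
    keyword nor an identifier in builtins (hypothetical helper)."""
    if not name.isidentifier():
        return False
    if keyword.iskeyword(name):
        return False
    return not hasattr(builtins, name)

for name in ('hello_world', '2hello_world', 'with', 'print'):
    print(name, is_usable_name(name))
```

Note that 'with'.isidentifier() returns True on its own, which is why the docstring recommends the additional check with keyword.iskeyword. The soft keywords are not rejected by this sketch.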

## The American Standard Code for Information Interchange (ASCII)

Early computers were based on a typewriter that essentially prints English characters onto a sheet of paper. In order to achieve such a task, a number of non-printable commands are required, such as the carriage return (moving the carriage back to the left), the line feed (moving the piece of paper up by the height of a line) and the form feed (moving the paper on to the next page), as well as the printable characters such as the English letters, numbers, and whitespace.

<img src='./images/img_011.png' alt='img_011' width='800'/>

Each command has to be mapped physically into the computer's memory. Fundamentally the computer can only store data in the form of a bit, which is essentially a digital switch.

A single switch has the possible values 0 or 1, which is 2 ** 1 = 2 combinations.

More typically 8 of these switches are combined into a single unit which makes up a byte and gives 2 ** 8 = 256 combinations.

Note the combination 0 is included, so the range 0:256 is inclusive of the lower bound 0 and exclusive of the upper bound 256.

<img src='./images/img_012.png' alt='img_012' width='400'/>
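The number of combinations can be checked with a short sketch. The instance names below are chosen for illustration only:

```python
# number of combinations for 1 switch, 4 switches and the 8 switches of a byte
for switches in (1, 4, 8):
    print(switches, 2 ** switches)

# the 256 values of a byte run from 0 up to, but excluding, 256
byte_values = range(0, 256)
print(len(byte_values), byte_values[0], byte_values[-1])

# all 8 switches set to 1 corresponds to the largest value
largest = int('11111111', base=2)
print(largest)
```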

One of the most popular sets of commands was developed in the United States and is known as ASCII. The first 33 combinations (0 to 32) correspond to non-printable characters, such as the carriage return and form feed as previously discussed, to a number of additional hardware related commands, and to the space.

Each bit can be 0 or 1 and the byte sequence corresponds to the physical position of the 8 switches. As binary is not human readable, the hexadecimal system is used, which has 16 characters 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f. 2 ** 4 is 16 combinations and therefore each half of the byte is represented by its own hexadecimal character. These numbering systems are shown alongside the number in decimal.


|byte|hex|num|command|
|---|---|---|---|
|00000000|00|000|null|
|00000001|01|001|start of heading|
|00000010|02|002|start of text|
|00000011|03|003|end of text|
|00000100|04|004|end of transmission|
|00000101|05|005|enquiry|
|00000110|06|006|acknowledge|
|00000111|07|007|bell|
|00001000|08|008|**backspace**|
|00001001|09|009|**horizontal tab**|
|00001010|0a|010|**new line**|
|00001011|0b|011|**vertical tab**|
|00001100|0c|012|**form feed**|
|00001101|0d|013|**carriage return**|
|00001110|0e|014|shift out|
|00001111|0f|015|shift in|
|00010000|10|016|data link escape|
|00010001|11|017|device control 1|
|00010010|12|018|device control 2|
|00010011|13|019|device control 3|
|00010100|14|020|device control 4|
|00010101|15|021|negative acknowledge|
|00010110|16|022|synchronous idle|
|00010111|17|023|end of transmission block|
|00011000|18|024|cancel|
|00011001|19|025|end of medium|
|00011010|1a|026|substitute|
|00011011|1b|027|**escape**|
|00011100|1c|028|file separator|
|00011101|1d|029|group separator|
|00011110|1e|030|record separator|
|00011111|1f|031|unit separator|
|00100000|20|032|**space**|

The remaining combinations, spanning up to 127 which is half the range of a byte, contain the characters most commonly used in the English language.

|byte|hex|num|character|
|---|---|---|---|
|00100001|21|033|!|
|00100010|22|034|"|
|00100011|23|035|#|
|00100100|24|036|$|
|00100101|25|037|%|
|00100110|26|038|&|
|00100111|27|039|'|
|00101000|28|040|(|
|00101001|29|041|)|
|00101010|2a|042|*|
|00101011|2b|043|+|
|00101100|2c|044|,|
|00101101|2d|045|-|
|00101110|2e|046|.|
|00101111|2f|047|/|
|00110000|30|048|0|
|00110001|31|049|1|
|00110010|32|050|2|
|00110011|33|051|3|
|00110100|34|052|4|
|00110101|35|053|5|
|00110110|36|054|6|
|00110111|37|055|7|
|00111000|38|056|8|
|00111001|39|057|9|
|00111010|3a|058|:|
|00111011|3b|059|;|
|00111100|3c|060|<|
|00111101|3d|061|=|
|00111110|3e|062|>|
|00111111|3f|063|?|
|01000000|40|064|@|
|01000001|41|065|A|
|01000010|42|066|B|
|01000011|43|067|C|
|01000100|44|068|D|
|01000101|45|069|E|
|01000110|46|070|F|
|01000111|47|071|G|
|01001000|48|072|H|
|01001001|49|073|I|
|01001010|4a|074|J|
|01001011|4b|075|K|
|01001100|4c|076|L|
|01001101|4d|077|M|
|01001110|4e|078|N|
|01001111|4f|079|O|
|01010000|50|080|P|
|01010001|51|081|Q|
|01010010|52|082|R|
|01010011|53|083|S|
|01010100|54|084|T|
|01010101|55|085|U|
|01010110|56|086|V|
|01010111|57|087|W|
|01011000|58|088|X|
|01011001|59|089|Y|
|01011010|5a|090|Z|
|01011011|5b|091|[|
|01011100|5c|092|\\|
|01011101|5d|093|]|
|01011110|5e|094|^|
|01011111|5f|095|_|
|01100000|60|096|`|
|01100001|61|097|a|
|01100010|62|098|b|
|01100011|63|099|c|
|01100100|64|100|d|
|01100101|65|101|e|
|01100110|66|102|f|
|01100111|67|103|g|
|01101000|68|104|h|
|01101001|69|105|i|
|01101010|6a|106|j|
|01101011|6b|107|k|
|01101100|6c|108|l|
|01101101|6d|109|m|
|01101110|6e|110|n|
|01101111|6f|111|o|
|01110000|70|112|p|
|01110001|71|113|q|
|01110010|72|114|r|
|01110011|73|115|s|
|01110100|74|116|t|
|01110101|75|117|u|
|01110110|76|118|v|
|01110111|77|119|w|
|01111000|78|120|x|
|01111001|79|121|y|
|01111010|7a|122|z|
|01111011|7b|123|{|
|01111100|7c|124|\||
|01111101|7d|125|}|
|01111110|7e|126|~|
|01111111|7f|127||


As each character maps to a numeric byte sequence it is ordinal. The builtins ordinal function ord will return the numeric value of the character in decimal:

In [107]:
? ord

[1;31mSignature:[0m  [0mord[0m[1;33m([0m[0mc[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m Return the Unicode code point for a one-character string.
[1;31mType:[0m      builtin_function_or_method

For example the ordinal value of the string '3' can be checked:

In [108]:
ord('3')

51

Notice the difference between the string of the number '3' and the number 51. This number can be converted into a binary string or hex string using the builtins bin and hex functions respectively:

In [109]:
? bin

[1;31mSignature:[0m  [0mbin[0m[1;33m([0m[0mnumber[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return the binary representation of an integer.

>>> bin(2796202)
'0b1010101010101010101010'
[1;31mType:[0m      builtin_function_or_method

In [110]:
? hex

[1;31mSignature:[0m  [0mhex[0m[1;33m([0m[0mnumber[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return the hexadecimal representation of an integer.

>>> hex(12648430)
'0xc0ffee'
[1;31mType:[0m      builtin_function_or_method

For example:

In [111]:
bin(ord('3'))

'0b110011'

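This can be conceptualised as the following with the leading zeros:

In [112]:
# removeprefix only removes the prefix, lstrip('0b') would also strip any leading 0 of the number
'0b' + bin(ord('3')).removeprefix('0b').zfill(8)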

'0b00110011'

Note the prefix 0b indicates a binary number and that bin does not display the two leading zeros. The hexadecimal string is:

In [113]:
hex(ord('3'))

'0x33'

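Note the prefix 0x indicates a hexadecimal number. Each hexadecimal character covers 4 bits, so the number 16 is the first number that requires a fifth bit: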

In [114]:
bin(16)

'0b10000'
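The three numbering systems in the tables above can be reproduced for any ASCII character. The following is a minimal sketch, where the function name ascii_row is a hypothetical helper:

```python
def ascii_row(character):
    """Return the byte, hex and num columns of the table for a
    single character (hypothetical helper)."""
    number = ord(character)
    byte = bin(number).removeprefix('0b').zfill(8)          # 8 binary digits
    hexadecimal = hex(number).removeprefix('0x').zfill(2)   # 2 hexadecimal digits
    decimal = str(number).zfill(3)                          # 3 decimal digits
    return byte, hexadecimal, decimal

for character in ('!', '3', 'A', 'a'):
    print(character, ascii_row(character))
```

The values returned for 'A' are '01000001', '41' and '065', which matches the row in the table.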

The string module contains a number of useful strings which group characters. It can be imported using:

In [115]:
import string

The identifiers can be viewed by inputting:

In [116]:
# string.

These are mainly string instances.

ascii_letters gives all English letters:

In [117]:
string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

This can be split into lowercase and uppercase using ascii_lowercase and ascii_uppercase respectively: 

In [118]:
string.ascii_lowercase

'abcdefghijklmnopqrstuvwxyz'

In [119]:
string.ascii_uppercase

'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

digits gives the 10 digits used in the decimal system:

In [120]:
string.digits

'0123456789'

hexdigits gives the characters that can be used for hexadecimal. Note there are 22 characters opposed to 16, as the letters a to f and A to F are aliases of one another:

In [121]:
string.hexdigits

'0123456789abcdefABCDEF'
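These groupings are convenient for checking user input. The following is a minimal sketch, where the function name is_hex_string is a hypothetical helper:

```python
import string

def is_hex_string(text):
    """Return True if text is not empty and only contains
    hexadecimal characters (hypothetical helper)."""
    return len(text) > 0 and all(character in string.hexdigits for character in text)

print(is_hex_string('c0ffee'), is_hex_string('C0FFEE'), is_hex_string('coffee'))

# lower case and upper case hexadecimal characters give the same number
lower_value = int('c0ffee', base=16)
upper_value = int('C0FFEE', base=16)
print(lower_value == upper_value)
```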

printable gives the printable characters:

In [122]:
string.printable

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

punctuation gives the punctuation:

In [123]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

whitespace gives the whitespace characters:

In [124]:
string.whitespace

' \t\n\r\x0b\x0c'

With the exception of the space, these are shown using escape sequences which will be explored in a moment. Now that ASCII and the other groupings seen within the string module are understood, the additional boolean identifiers can be examined.

These boolean identifiers all act upon instance data and return a bool. Their docstrings are as follows:

In [133]:
? greeting.isprintable

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misprintable[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is printable, False otherwise.

A string is printable if all of its characters are considered printable in
repr() or if it is empty.
[1;31mType:[0m      builtin_function_or_method

In [128]:
? greeting.isascii

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misascii[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if all characters in the string are ASCII, False otherwise.

ASCII characters have code points in the range U+0000-U+007F.
Empty string is ASCII too.
[1;31mType:[0m      builtin_function_or_method

In [126]:
? greeting.isalnum

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misalnum[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is an alpha-numeric string, False otherwise.

A string is alpha-numeric if all characters in the string are alpha-numeric and
there is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [127]:
? greeting.isalpha

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misalpha[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is an alphabetic string, False otherwise.

A string is alphabetic if all characters in the string are alphabetic and there
is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [134]:
? greeting.isspace

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misspace[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a whitespace string, False otherwise.

A string is whitespace if all characters in the string are whitespace and there
is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [130]:
? greeting.isdecimal

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misdecimal[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a decimal string, False otherwise.

A string is a decimal string if all characters in the string are decimal and
there is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [131]:
? greeting.isdigit

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misdigit[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a digit string, False otherwise.

A string is a digit string if all characters in the string are digits and there
is at least one character in the string.
[1;31mType:[0m      builtin_function_or_method

In [132]:
? greeting.isnumeric

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0misnumeric[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return True if the string is a numeric string, False otherwise.

A string is numeric if all characters in the string are numeric and there is at
least one character in the string.
[1;31mType:[0m      builtin_function_or_method

For example:

In [147]:
'hello Γειά σου 123'.isprintable()

True

In [148]:
'hello Γειά σου 123'.isascii()

False

In [141]:
'hello 123 !'.isascii()

True

In [142]:
'hello 123 !'.isalnum()

False

In [143]:
'hello123'.isalnum()

True

In [144]:
'hello123'.isalpha()

False

In [145]:
'hello'.isalpha()

True

In [149]:
'hello'.isspace()

False

The boolean numeric string identifiers have subtle differences. These can be seen by examining the response of the identifier for the following number groupings:

In [179]:
numeric_groups = {'ascii': '0123456789', 
                  'font1': '𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿', 
                  'font2': '𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵', 
                  'font3': '𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡', 
                  'subscript': '₀₁₂₃₄₅₆₇₈₉',
                  'superscript': '⁰¹²³⁴⁵⁶⁷⁸⁹',
                  'circled1': '➀➁➂➃➄➅➆➇➈',
                  'circled2': '➉',
                  'fractions': '½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉', 
                  'asciihex': '0123456789abcdef', }


In [186]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isdecimal())
    

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ False
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ False
circled1 ➀➁➂➃➄➅➆➇➈ False
circled2 ➉ False
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ False
asciihex 0123456789abcdef False


In [189]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isdigit())

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ True
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ True
circled1 ➀➁➂➃➄➅➆➇➈ True
circled2 ➉ False
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ False
asciihex 0123456789abcdef False


In [190]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isnumeric())

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ True
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ True
circled1 ➀➁➂➃➄➅➆➇➈ True
circled2 ➉ True
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ True
asciihex 0123456789abcdef False


In [191]:
for group in numeric_groups:
    print(group, numeric_groups[group], numeric_groups[group].isalnum())

ascii 0123456789 True
font1 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 True
font2 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 True
font3 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 True
subscript ₀₁₂₃₄₅₆₇₈₉ True
superscript ⁰¹²³⁴⁵⁶⁷⁸⁹ True
circled1 ➀➁➂➃➄➅➆➇➈ True
circled2 ➉ True
fractions ½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ True
asciihex 0123456789abcdef True


The boolean identifiers are often used for checks. An example of such a check was demonstrated when grouping identifiers from builtins when using a for loop.
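A check of this kind can be sketched as follows. The function name to_int_or_none is a hypothetical helper and the strings are chosen to show one character from several of the groupings above:

```python
def to_int_or_none(text):
    """Convert text to an int only when every character is a
    decimal character (hypothetical helper)."""
    if text.isdecimal():
        return int(text)
    return None

# isdecimal is the strictest check and isnumeric the most permissive
for text in ('2024', '²', '½', '12a', ''):
    print(repr(text), text.isdecimal(), text.isdigit(), text.isnumeric(), to_int_or_none(text))
```

isdecimal is used for the check because a superscript such as '²' satisfies isdigit but is not accepted by the int class.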

## Escape Characters

The \\ is a special symbol used to insert an escape character. The most commonly used escape characters have the form:

In [200]:
print('|  |') # no escape character

|  |


In [202]:
print('| \t |') # tab

| 	 |


In [201]:
print('| \n |') # new line

| 
 |


In [206]:
print('| \\ |') # backslash itself

| \ |


In [205]:
print('| \' |') # single quotation

| ' |


In [207]:
print('| \" |') # double quotation

| " |


An ASCII character, or a character spanning over the range of a single byte, can be inserted using the escape character \\x followed by 2 hexadecimal digits:

In [209]:
hex(ord('!'))

'0x21'

In [213]:
'\x21' # a byte (2 hexadecimal digits)

'!'

The most commonly used Unicode characters span over 2 bytes and can therefore be inserted using the escape character \\u followed by 4 hexadecimal digits:

In [215]:
hex(ord('α'))

'0x3b1'

In [216]:
'\u03b1' # 2 bytes (4 hexadecimal digits)

'α'
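For completeness, Python has further ways to insert the same character which are not used elsewhere in this section. The following sketch shows the 8 hexadecimal digit form, the Unicode name form and the builtins function chr, which is the inverse of ord:

```python
alpha = 'α'

from_two_bytes = '\u03b1'                    # 4 hexadecimal digits
from_four_bytes = '\U000003b1'               # 8 hexadecimal digits
from_name = '\N{GREEK SMALL LETTER ALPHA}'   # Unicode character name
from_ordinal = chr(0x3b1)                    # inverse of ord

print(alpha == from_two_bytes == from_four_bytes == from_name == from_ordinal)
```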

Note that the hex form is normally used for a byte that is not printable. If the 6 whitespace characters are examined in more detail:

In [237]:
string.whitespace

' \t\n\r\x0b\x0c'

|name|escape|byte|
|---|---|---|
|space|' '|'\\x20'|
|tab|'\\t'|'\\x09'|
|new line|'\\n'|'\\x0a'|
|carriage return|'\\r'|'\\x0d'|
|vertical tab|'\\v'|'\\x0b'|
|form feed|'\\f'|'\\x0c'|

In [238]:
' ' == '\x20'

True

In [239]:
'\t' == '\x09'

True

In [240]:
'\n' == '\x0a'

True

In [241]:
'\r' == '\x0d'

True

It is not common to do so, however each character in a string can also be inserted as an escape character:

In [242]:
'\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21'

'hello world!'

## \_\_repr\_\_ and \_\_str\_\_

The \\ is a special character used to insert an escape character. To insert a \\ itself, two have to be used, the first \\ specifying that an escape character is to be inserted and the second \\ specifying that the escape character is \\. The \\ is used for a file path within Windows:

In [217]:
file_path = 'C:\\Users\\Philip'

Compare the cell output with the output from a print statement:

In [227]:
file_path

'C:\\Users\\Philip'

In [228]:
print(file_path)

C:\Users\Philip


Notice that the former uses the formal representation and the output displays the file path that matches what needs to be input to instantiate the string. The latter displays the informal representation and displays the file path when the escape characters are inserted. Under the hood these are controlled by the data model identifiers \_\_repr\_\_ and \_\_str\_\_ respectively:

In [218]:
? file_path.__repr__

[1;31mSignature:[0m       [0mfile_path[0m[1;33m.[0m[0m__repr__[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mCall signature:[0m  [0mfile_path[0m[1;33m.[0m[0m__repr__[0m[1;33m([0m[1;33m*[0m[0margs[0m[1;33m,[0m [1;33m**[0m[0mkwargs[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mType:[0m           method-wrapper
[1;31mString form:[0m    <method-wrapper '__repr__' of str object at 0x00000180C834F410>
[1;31mDocstring:[0m      Return repr(self).

In [219]:
? file_path.__str__

[1;31mSignature:[0m       [0mfile_path[0m[1;33m.[0m[0m__str__[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mCall signature:[0m  [0mfile_path[0m[1;33m.[0m[0m__str__[0m[1;33m([0m[1;33m*[0m[0margs[0m[1;33m,[0m [1;33m**[0m[0mkwargs[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mType:[0m           method-wrapper
[1;31mString form:[0m    <method-wrapper '__str__' of str object at 0x00000180C834F410>
[1;31mDocstring:[0m      Return str(self).

These data model identifiers are based on the object design pattern and from the docstring it is more common to use the builtins function repr and class str respectively:

In [220]:
? repr

[1;31mSignature:[0m  [0mrepr[0m[1;33m([0m[0mobj[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Return the canonical string representation of the object.

For many object types, including most builtins, eval(repr(obj)) == obj.
[1;31mType:[0m      builtin_function_or_method

In [221]:
? str

[1;31mInit signature:[0m  [0mstr[0m[1;33m([0m[0mself[0m[1;33m,[0m [1;33m/[0m[1;33m,[0m [1;33m*[0m[0margs[0m[1;33m,[0m [1;33m**[0m[0mkwargs[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m     
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
[1;31mType:[0m           type
[1;31mSubclasses:[0m     StrEnum, DeferredConfigString, FoldedCase, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ...

The str class will cast an existing object into a str. When the object is already a str it will be unchanged:

In [229]:
file_path

'C:\\Users\\Philip'

In [223]:
str(file_path)

'C:\\Users\\Philip'

The repr function on the other hand will return a str instance that contains the quotations that enclose the string and shows insertion of the escape characters:

In [222]:
repr(file_path)

"'C:\\\\Users\\\\Philip'"

Notice that when this is printed:

In [225]:
print(repr(file_path))

'C:\\Users\\Philip'


The cell output matches the str form:

In [230]:
str(file_path)

'C:\\Users\\Philip'

And the escape characters are processed when the string is printed:

In [226]:
print(str(file_path))

C:\Users\Philip


All the other builtins classes in Python have a string representation. When the formal representation does not include additional quotations or escape characters, the formal and informal representations often return identical strings, for example:

In [232]:
repr(4)

'4'

In [231]:
str(4)

'4'
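The two data model identifiers can also be defined in a custom class. The following is a minimal sketch using a hypothetical Point class, which is not part of builtins, where the formal representation shows how to instantiate the instance and the informal representation is intended for printing:

```python
class Point:
    """Hypothetical class used to illustrate __repr__ and __str__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # formal representation: matches the code used for instantiation
        return 'Point(' + repr(self.x) + ', ' + repr(self.y) + ')'

    def __str__(self):
        # informal representation: used by print and the str class
        return '(' + str(self.x) + ', ' + str(self.y) + ')'

point = Point(1, 2)
print(repr(point))
print(str(point))
print(point)
```

If \_\_str\_\_ is not defined in a class, the str class falls back to \_\_repr\_\_, as stated in the docstring of str.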

## Raw Strings

In Windows the file path is of the form:

In [None]:
# file_path = 'C:\Users\Philip'

For the file path to be recognised as a Python string each \\ has to be converted into a \\\\ as \\ is used to insert an escape character and is an escape character itself:

In [None]:
file_path = 'C:\\Users\\Philip'

For convenience there is also a raw string which uses the prefix r. In a raw string the \\ is not used to insert an escape character and instead the \\ is recognised as being part of the string:

In [235]:
raw_file_path = r'C:\Users\Philip'

In [236]:
raw_file_path

'C:\\Users\\Philip'
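The raw string and the escaped string are two ways of writing the same str instance. A small sketch, with instance names chosen for illustration:

```python
escaped_path = 'C:\\Users\\Philip'
raw_path = r'C:\Users\Philip'

# both forms instantiate an identical string
print(escaped_path == raw_path)

# '\n' is a single new line character, r'\n' is a \ followed by an n
new_line_length = len('\n')
raw_length = len(r'\n')
print(new_line_length, raw_length)

# a raw string cannot end with a single \, so a trailing \ is concatenated
folder = r'C:\Users' + '\\'
print(folder)
```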

## The print function

The print function has previously been used, however now that escape characters are understood its docstring can be examined:

In [243]:
? print

[1;31mSignature:[0m  [0mprint[0m[1;33m([0m[1;33m*[0m[0margs[0m[1;33m,[0m [0msep[0m[1;33m=[0m[1;34m' '[0m[1;33m,[0m [0mend[0m[1;33m=[0m[1;34m'\n'[0m[1;33m,[0m [0mfile[0m[1;33m=[0m[1;32mNone[0m[1;33m,[0m [0mflush[0m[1;33m=[0m[1;32mFalse[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
Prints the values to a stream, or to sys.stdout by default.

sep
  string inserted between values, default a space.
end
  string appended after the last value, default a newline.
file
  a file-like object (stream); defaults to the current sys.stdout.
flush
  whether to forcibly flush the stream.
[1;31mType:[0m      builtin_function_or_method

The *args indicates that a variable number of positional input arguments are used. sep and end are named input arguments which have a default value of a space and a new line respectively:

In [None]:
# print(*args, sep=' ', end='\n', file=None, flush=False)

file and flush are for advanced purposes when the print stream is to be directed for example to a file instead of a cell output.

In [244]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')

the brown fox jumps over the lazy dog


In [245]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', sep='')

thebrownfoxjumpsoverthelazydog


In [246]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')

the brown fox jumps over the lazy dog
the brown fox jumps over the lazy dog


In [247]:
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', end='')
print('the', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')

the brown fox jumps over the lazy dogthe brown fox jumps over the lazy dog
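The named input arguments can be combined. In the following sketch an io.StringIO instance is assumed as a stand-in for a file, so that the characters inserted by sep and end can be seen in the formal representation:

```python
import io

buffer = io.StringIO()   # in-memory text stream standing in for a file

# sep is inserted between the values and end is appended after the last value
print('the', 'lazy', 'dog', sep='-', end='!\n', file=buffer)
print('next line', file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```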


## Formatted Strings

Supposing the string body has the form:

In [248]:
body = 'The string to 0 is 1 2!'

And there are three variables:

In [249]:
var0 = 'print'
var1 = 'hello'
var2 = 'world'

The objective of a formatted string is to insert these variables into the string (in this case without additional format specifications) so the following string is returned:

In [250]:
'The string to print is hello world!'

'The string to print is hello world!'

If the docstring of the string method format is examined:

In [251]:
? body.format

[1;31mDocstring:[0m
S.format(*args, **kwargs) -> str

Return a formatted version of S, using substitutions from args and kwargs.
The substitutions are identified by braces ('{' and '}').
[1;31mType:[0m      builtin_function_or_method

Then it can be seen that substitutions are identified by braces so body should be:

In [252]:
body = 'The string to {0} is {1} {2}!'

Notice the changes in the syntax highlighting for the placeholders. *args represents a variable number of positional input arguments. To insert variables into strings the number of positional input arguments should match the number of placeholders in the string body:

In [253]:
body.format(var0, var1, var2)

'The string to print is hello world!'

The string can alternatively be setup to contain named variables:

In [254]:
body = 'The string to {var0_} is {var1_} {var2_}!'

**kwargs represents a variable number of named keyword input arguments which should match the named keyword input arguments in the string body:

In [255]:
body.format(var0_=var0, var1_=var1, var2_=var2)

'The string to print is hello world!'

If the two lines are combined into one:

In [256]:
'The string to {var0_} is {var1_} {var2_}!'.format(var0_=var0, var1_=var1, var2_=var2)

'The string to print is hello world!'

More commonly the placeholders are given the same name as the variables to be inserted:

In [257]:
'The string to {var0} is {var1} {var2}!'.format(var0=var0, var1=var1, var2=var2)

'The string to print is hello world!'

Notice how each instance name is used 3 times which is pretty cumbersome. A shorthand way of writing the expression above is to use the prefix f:

In [258]:
f'The string to {var0} is {var1} {var2}!'

'The string to print is hello world!'
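A placeholder in a formatted string is not restricted to an instance name. As a short sketch, an expression can be placed in the braces and the conversions !s and !r select the informal and formal representations seen earlier:

```python
word = 'hello'

informal = f'{word!s}'   # informal representation, the default
formal = f'{word!r}'     # formal representation, includes the quotations
expression = f'{word.upper()} has {len(word)} characters'

print(informal)
print(formal)
print(expression)
```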

Classes from builtins have the \_\_format\_\_ data model identifier:

In [259]:
? object.__format__

[1;31mSignature:[0m  [0mobject[0m[1;33m.[0m[0m__format__[0m[1;33m([0m[0mself[0m[1;33m,[0m [0mformat_spec[0m[1;33m,[0m [1;33m/[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m Default object formatter.
[1;31mType:[0m      method_descriptor

When this data model method is configured in a class it gives the ability to use format specification options in a formatted string. This can be demonstrated with other str instances and numeric (int and float) instances.
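Before looking at the builtins classes, the following is a minimal sketch of a custom class that configures this data model method. The Temperature class and its F suffix are hypothetical and only illustrate that the text after the colon is passed to \_\_format\_\_ as format_spec:

```python
class Temperature:
    """Hypothetical class storing a temperature in degrees Celsius."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, format_spec):
        if format_spec.endswith('F'):
            # custom suffix F: convert to Fahrenheit, then use fixed format
            fahrenheit = self.celsius * 9 / 5 + 32
            return format(fahrenheit, format_spec[:-1] + 'f') + ' °F'
        # otherwise pass the format specification on to the float
        return format(self.celsius, format_spec or '.1f') + ' °C'

temp = Temperature(25.0)
print(f'{temp}')
print(f'{temp:.2f}')
print(f'{temp:.1F}')
```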

The format specifier for a string (s) can be used:

In [268]:
f'The string to {var0:s} is {var1} {var2}!'

'The string to print is hello world!'

In [269]:
f'The string to {var0:10s} is {var1} {var2}!'

'The string to print      is hello world!'

In [265]:
f'The string to {var0:010s} is {var1:s} {var2:s}!'

'The string to print00000 is hello world!'

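The number 10 is the width, which is the minimum number of characters the inserted string should occupy. A string is padded with trailing spaces. If the width is prefixed with 0, the trailing spaces are displayed using 0.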

Numbers are commonly inserted:

In [266]:
num1 = 1
num2 = 0.0000123456789
num3 = 12.3456789

In [270]:
f'The numbers are {num1}, {num2} and {num3}.' 

'The numbers are 1, 1.23456789e-05 and 12.3456789.'

The format specifier for an integer decimal (d) can be used:

In [271]:
f'The numbers are {num1:d}, {num2} and {num3}.' 

'The numbers are 1, 1.23456789e-05 and 12.3456789.'

In [272]:
f'The numbers are {num1:5d}, {num2} and {num3}.' 

'The numbers are     1, 1.23456789e-05 and 12.3456789.'

In [273]:
f'The numbers are {num1:05d}, {num2} and {num3}.' 

'The numbers are 00001, 1.23456789e-05 and 12.3456789.'

In [274]:
f'The numbers are {num1: 05d}, {num2} and {num3}.' 

'The numbers are  0001, 1.23456789e-05 and 12.3456789.'

Again the number of characters that the number should occupy can be specified. Unlike the string, the padding is leading opposed to trailing. If the width is prefixed with a 0, the padding will be shown as 0. Note if a space is added before the 0, one of these characters is reserved for the sign. The space is the sign option of the format specifier, so a positive number is shown with a leading space and a negative number with a leading minus sign.

In [275]:
f'The numbers are {num1}, {num2:g} and {num3:g}.' 

'The numbers are 1, 1.23457e-05 and 12.3457.'

In [276]:
f'The numbers are {num1}, {num2:e} and {num3:e}.' 

'The numbers are 1, 1.234568e-05 and 1.234568e+01.'

In [278]:
f'The numbers are {num1}, {num2:0.3e} and {num3:0.3e}.' 

'The numbers are 1, 1.235e-05 and 1.235e+01.'

In [279]:
f'The numbers are {num1}, {num2:f} and {num3:f}.' 

'The numbers are 1, 0.000012 and 12.345679.'

In [280]:
f'The numbers are {num1}, {num2:0.3f} and {num3:0.3f}.' 

'The numbers are 1, 0.000 and 12.346.'

Floating point numbers can use the general (g), exponential (e) and fixed (f) format specifiers. The 0.3 before the e or f specifies rounding to 3 digits past the decimal point and can also be written as .3.
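The format specification also has alignment and sign options which are not shown in the cells above. The following is a short sketch of these options, using a width of 11 and instance names chosen for illustration:

```python
word = 'print'

left = f'{word:<11s}'       # left aligned, the default for a string
right = f'{word:>11s}'      # right aligned, the default for a number
centre = f'{word:^11s}'     # centred
starred = f'{word:*^11s}'   # centred with * as the fill character
print(repr(left), repr(right), repr(centre), repr(starred))

plus = f'{42:+d}'           # always show the sign
space_pos = f'{42: d}'      # space reserved for the sign of a positive number
space_neg = f'{-42: d}'     # negative number uses the minus sign instead
print(repr(plus), repr(space_pos), repr(space_neg))
```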


The keys of a mapping such as a dictionary can be included in the placeholder, alongside an optional format specifier:

In [281]:
numbers = {'num1': 1, 'num2': 0.0000123456789, 'num3': 12.3456789}

In [282]:
body = 'The numbers are {num1:d}, {num2:.3e} and {num3:.3e}.'

The colon is used in both the dictionary and format specifier. Spacing to the right of the colon is often present in a dictionary and does not change the value. Spacing does change the format specifier and should only be added if a space is to be included in the format specification.

The format map can be used to insert values into a body provided the placeholders match the keys in the dictionary:

In [283]:
? body.format_map

[1;31mDocstring:[0m
S.format_map(mapping) -> str

Return a formatted version of S, using substitutions from mapping.
The substitutions are identified by braces ('{' and '}').
[1;31mType:[0m      builtin_function_or_method

In [285]:
body.format_map(numbers)

'The numbers are 1, 1.235e-05 and 1.235e+01.'
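For a dictionary the same result can be obtained by unpacking the dictionary into format. format_map is mainly useful when the mapping is a dict subclass. The following is a minimal sketch, where KeepMissing is a hypothetical subclass that leaves a placeholder in place when its key is missing:

```python
numbers = {'num1': 1, 'num2': 0.0000123456789, 'num3': 12.3456789}
body = 'The numbers are {num1:d}, {num2:.3e} and {num3:.3e}.'

# unpacking the dictionary into format gives the same string as format_map
same = body.format(**numbers) == body.format_map(numbers)
print(same)

class KeepMissing(dict):
    """Hypothetical dict subclass that returns the placeholder
    itself when a key is missing."""

    def __missing__(self, key):
        return '{' + key + '}'

partial = '{num1} and {num4}'.format_map(KeepMissing(numbers))
print(partial)
```

With a plain dictionary the missing key num4 would instead give a KeyError.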

## Data Model Attributes

In [287]:
greeting.__class__

str

In [291]:
type(greeting)

str

In [286]:
greeting.__doc__

"str(object='') -> str\nstr(bytes_or_buffer[, encoding[, errors]]) -> str\n\nCreate a new string object from the given object. If encoding or\nerrors is specified, then the object must expose a data buffer\nthat will be decoded using the given encoding and error handler.\nOtherwise, returns the result of object.__str__() (if defined)\nor repr(object).\nencoding defaults to sys.getdefaultencoding().\nerrors defaults to 'strict'."

In [292]:
? greeting

[1;31mType:[0m        str
[1;31mString form:[0m hello
[1;31mLength:[0m      5
[1;31mDocstring:[0m  
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.

In [290]:
? greeting.__sizeof__

[1;31mSignature:[0m  [0mgreeting[0m[1;33m.[0m[0m__sizeof__[0m[1;33m([0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m Return the size of the string in memory, in bytes.
[1;31mType:[0m      builtin_function_or_method
