<h1>Expressions and output</h1>

<h2>Expressions, operators and data types</h2>

Numeric operands tower:

*complex

*float

*fractions.Fraction

*decimal.Decimal

*int

The idea behind the tower is that many arithmetic operators coerce operands up the
tower, from integer to float to complex.
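
The coercion described above can be sketched with a few mixed-type additions. The variable names are illustrative only; the point is the type of each result.

```python
from fractions import Fraction

# int combined with float is coerced up to float
int_plus_float = 2 + 3.5

# float combined with complex is coerced up to complex
float_plus_complex = 1.5 + (2 + 1j)

# int combined with Fraction stays exact and yields a Fraction
int_plus_fraction = 1 + Fraction(1, 3)

# Fraction combined with float is coerced up to float
fraction_plus_float = Fraction(1, 2) + 0.25

for result in (int_plus_float, float_plus_complex,
               int_plus_fraction, fraction_plus_float):
    print(type(result).__name__, result)
```

Note that decimal.Decimal does not mix freely with float: adding a Decimal to a float raises a TypeError, so the conversion has to be made explicitly.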

We can group operators into a number of categories.

• Arithmetic: +, -, *, **, /, //, %

• Bit-oriented: <<, >>, &, |, ^, ~

• Comparison: <, >, <=, >=, ==, !=
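
As a quick illustration of the three categories, here is one small integer example per operator. The operand values are arbitrary choices for this sketch.

```python
# Arithmetic
true_division = 7 / 2        # always produces a float
floor_division = 7 // 2      # rounds down to an integer
remainder = 7 % 2
power = 2 ** 10

# Bit-oriented
shifted = 1 << 4
masked = 0b1100 & 0b1010     # bits set in both operands
combined = 0b1100 | 0b1010   # bits set in either operand
toggled = 0b1100 ^ 0b1010    # bits set in exactly one operand
inverted = ~5

# Comparison
in_order = 3 <= 7
different = 3 != 7

print(true_division, floor_division, remainder, power)
print(shifted, masked, combined, toggled, inverted)
print(in_order, different)
```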

<h2>Using operators on non-numeric data</h2>

In [1]:
# We can apply some of the arithmetic operators to strings, bytes, and tuples.

"hello " + "world"

'hello world'

In [2]:
"<+++>"*9

'<+++><+++><+++><+++><+++><+++><+++><+++><+++>'
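
The cells above show only strings. A minimal sketch of the same two operators applied to bytes and tuples, with made-up sample values:

```python
# + concatenates and * repeats, just as with str
joined_bytes = b"hello " + b"world"
repeated_bytes = b"-=" * 3

joined_tuple = (1, 2) + (3,)
repeated_tuple = ("a", "b") * 2

print(joined_bytes, repeated_bytes)
print(joined_tuple, repeated_tuple)
```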

<h2>The print() function</h2>

In [3]:
'''The strings are combined with a default separator of ' ' and printed with a default 
line ending of '\n'. We can change the separator and line ending characters.

Note that the sep and end parameters must be provided by name; these are called
keyword arguments'''

print("value", 355/113)

value 3.1415929203539825


In [4]:
print("value", 355/113, sep='=')

value=3.1415929203539825


In [7]:
print("value", 355/113, sep='=', end='!\n')

value=3.1415929203539825!


In [13]:
'''We've imported the sys module. This contains definitions of sys.stderr and
sys.stdout for the standard output files. By using the file= keyword parameter,
we can direct a specific line of output to the stderr file instead of the default
of stdout.'''

import sys
print("This is an Error Message", file=sys.stderr)

This is an Error Message
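
The file= parameter accepts any object with a write() method, not only sys.stderr and sys.stdout. As a sketch, assuming we want to capture the printed text in memory, we can pass an io.StringIO object (the name buffer is just for illustration):

```python
import io

# An in-memory text file that collects whatever is written to it
buffer = io.StringIO()

# sep, end and file are all keyword arguments
print("value", 355/113, sep="=", end="!\n", file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```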


<h2>Examining syntax rules</h2>

There are nine fundamental syntax rules in section 2.1 of the Python Language
Reference:
1. There are two species of statements: simple and compound.
2. A physical line ends with \n.
3. A comment starts with # and continues to the end of the physical line.
4. A special comment can be used to annotate the file encoding.
5. Physical lines can be joined explicitly into a logical line using the \ as an
escape character in front of the physical end-of-line character.
6. Physical lines can be joined implicitly into a logical line using (), [], or {};
these must pair properly for the logical line to be complete.
7. Blank lines contain only spaces, tabs and newlines.
8. Leading whitespace is required to properly group statements inside the
clauses of compound statements. Consistency is essential. A four-space indent
is widely used and strongly encouraged.
9. Except at the beginning of the line, where it determines the nesting of
compound statements, whitespace can be used freely between tokens.
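
A short sketch of rules 5, 6 and 8 in practice. The names total, values and result are placeholders chosen for this example.

```python
# Rule 5: a backslash joins two physical lines into one logical line
total = 1 + 2 + \
        3 + 4

# Rule 6: brackets join physical lines implicitly until they pair up
values = [
    1, 2,   # a comment may end any physical line inside the brackets
    3, 4,
]

# Rule 8: consistent indentation groups the statements inside each clause
if total == sum(values):
    result = "equal"
else:
    result = "different"

print(total, values, result)
```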


<h2>Splitting, partitioning and joining strings</h2>
The split() method is used to locate repeated list-like structures within a string. 

The partition() method is used to separate the head and tail of a string.

In [14]:
text="mynumber=38889,mynumber2=999887"
text.split(",")

['mynumber=38889', 'mynumber2=999887']

In [15]:
# references output of last cell
items=_

In [16]:
items[0].partition("=")

('mynumber', '=', '38889')

In [17]:
items[1].partition("=")

('mynumber2', '=', '999887')

In [18]:
'''We used the partition("=") method on each item in the items variable to break
the assignment down into name, =, and value.

The join() method is the inverse of the split() method.'''

# Create a sequence of three strings and assign it to a variable named options,
# then use the string "|" to join the items in the options sequence.
# The result is a longer string with the items separated by the given string.
options = ("x", "y", "z")
"|".join(options)

'x|y|z'
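
The three methods can be combined. This is a minimal sketch, assuming every item has the form name=integer as in the sample text above; the names settings and rebuilt are introduced only for this example.

```python
text = "mynumber=38889,mynumber2=999887"

settings = {}
for item in text.split(","):
    # partition() always returns a 3-tuple: head, separator, tail
    name, separator, value = item.partition("=")
    settings[name] = int(value)

# join() reverses the split once the items are rebuilt
rebuilt = ",".join(f"{name}={value}" for name, value in settings.items())

print(settings)
print(rebuilt == text)
```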

<h2>Using the format() method to make more readable output</h2>

In [20]:
'''Sophisticated string creation can be done with the format() method. We create
a template string and values which can be plugged into the template.'''

c = 42
"{0:d}°C is {1: .1f}°F".format(c, 32+9*c/5)

'42°C is  107.6°F'
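
Templates can also refer to values by name and can set alignment and width. A small sketch with a hypothetical two-column row:

```python
# {name:<10} left-aligns in 10 columns; {value:>8.2f} right-aligns a float
# in 8 columns with 2 digits after the decimal point
row = "{name:<10}|{value:>8.2f}|".format(name="pi", value=355/113)

print(row)
```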

<h2>Summary of the standard string libraries</h2>
Python's standard library offers a number of modules with additional string
processing features:
1. string: The string module contains constants that decompose the ASCII
characters into letters, digits, whitespace, and so on. It also contains the full
definition of the formatter that is used by the str.format() method.
2. re: The regular expression library allows us to define a pattern that can be
used to parse input strings.
3. difflib: The difflib module is used to compare sequences of strings,
typically from text files.
4. textwrap: We can use the textwrap module to format large blocks of text.
5. unicodedata: The unicodedata module provides functions for determining
what kind of Unicode character is present.
6. stringprep: This is an implementation of RFC 3454, which prepares
Unicode text strings in order to support sensible string comparisons.
<h2>Using the re module to parse strings</h2>

Regular expressions give us a simple way to specify a set of related strings by
describing the pattern they have in common. In set-theoretic terms, a regular
expression defines the (possibly infinite) set of all strings that fit the pattern.

When we use the re module, we generally do three things. Firstly, we specify
the pattern string. Secondly, we compile the pattern into an object that efficiently
determines if and where a given string matches the pattern. Finally, we repeatedly
use the pattern object to efficiently match, search, or parse the given input strings.

We may perform any of the following three common kinds of processing:

• A matching regular expression might be Birth Date:\s+\d+/\d+/\d+. The
\s+ subexpression means one or more whitespace characters. The \d+
subexpression means one or more digits. A match pattern is usually designed
to match the whole string.

• A searching regular expression might be \d+/\d+/\d+. This search pattern
includes one or more digits, \d+, and literal punctuation, /. This expression
describes a substring that can be found somewhere within the given string.

• A parsing pattern separates the various digit groups from the surrounding
context. This is a slight modification to one of the previous examples to
include (), which specify what to capture. We might use (\d+)/(\d+)/(\d+)
to show that the digit groups should be extracted for further processing.
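
The three kinds of processing can be sketched side by side. The sample line is invented for illustration; the patterns are the ones quoted above.

```python
import re

line = "Birth Date: 1/2/1990"   # hypothetical input

# Matching: the pattern is applied from the beginning of the string
matched = re.match(r"Birth Date:\s+\d+/\d+/\d+", line)

# Searching: the pattern may be found anywhere within the string
found = re.search(r"\d+/\d+/\d+", line)

# Parsing: () captures the digit groups for further processing
parsed = re.search(r"(\d+)/(\d+)/(\d+)", line)
digit_groups = parsed.groups()

print(matched is not None)
print(found.group())
print(digit_groups)
```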

<h2>Using regular expressions</h2>

The general recipe for using regular expressions in a Python program has three
essential steps:
    
1. Define the pattern string.

2. Evaluate the re.compile() function to create a pattern object.

3. Use the compiled pattern object to match or search the candidate strings.

In [39]:
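import re
# Step 1: define the pattern string.
# "." matches any single character except a newline.
pattern_text = r"Birth Date:\s+(.*)"
# Step 2: compile the pattern string into a pattern object.
date_pattern = re.compile(pattern_text)
# Step 3: use the compiled pattern object to match a candidate string.
match = date_pattern.match("Birth Date: ??????")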

In [40]:
# here, the given string did match the regular expression pattern, and a Match object was created.
match

<_sre.SRE_Match object; span=(0, 18), match='Birth Date: ??????'>
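
Once we have a Match object, we usually want the captured text. A minimal sketch, reusing the same pattern; the second candidate string is made up to show what happens when nothing matches.

```python
import re

date_pattern = re.compile(r"Birth Date:\s+(.*)")

match = date_pattern.match("Birth Date: ??????")
if match:
    value = match.group(1)   # the text captured by (.*)
    span = match.span()      # start and end offsets of the whole match

# match() returns None when the pattern does not fit
no_match = date_pattern.match("Death Date: ??????")

print(value, span, no_match)
```

Because a failed match is None, testing the result with if before calling group() avoids an AttributeError.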

<h2>Creating a regular expression string</h2>

Here's a pattern we might use to parse some input: r"(\w+)\s*[=:]\s*(.*)"

This regular expression is a sequence of five smaller regular expressions.

• The characters (\w+) make a regular expression, \w, with a + suffix enclosed
in (). This matches any sequence of one or more word characters.

• \s* is a regular expression. It's a simple expression \s with a suffix of *.
It matches zero or more whitespace characters. This means that spaces
are optional after the initial word. If spaces are present, any number may
be used.

• [=:] is a regular expression built from two single-character expressions, =
and :. It matches either one of the two characters.
    
• \s* is used a second time to permit any number of whitespace characters between the = or : and the value.
    
• The final regular expression is (.*) which matches any sequence of characters.
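
To see the five pieces working together, we can apply the pattern to a few sample lines. The input lines here are hypothetical.

```python
import re

pattern = re.compile(r"(\w+)\s*[=:]\s*(.*)")

results = []
for line in ["size = 42", "color:red", "name =  Jane Doe"]:
    match = pattern.match(line)
    if match:
        # groups() returns the text captured by (\w+) and (.*)
        results.append(match.groups())

print(results)
```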

<h2>Working with Unicode, ASCII, and bytes</h2>

The re module works with bytes as well as Unicode strings. We must provide
proper pattern literals depending on which kind of string we're working with.
With Unicode, we use pattern literals with the r prefix: r"\w+". With bytes, we use
the rb prefix, rb"\w+"; the rb means raw bytes instead of raw Unicode characters.
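
A small sketch of the difference, using invented sample data. The pattern and the string being searched must be of the same kind; mixing them raises a TypeError.

```python
import re

# Unicode pattern on a Unicode string: \w also matches accented letters
text_words = re.findall(r"\w+", "caf\u00e9 au lait")

# Bytes pattern on a bytes object
byte_words = re.findall(rb"\w+", b"plain ascii")

# A str pattern cannot be applied to bytes
try:
    re.findall(r"\w+", b"plain ascii")
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(text_words, byte_words, mixed_ok)
```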