## 1.4:  Strings of Pearls

### 1.4.1  strings

A string is a sequence of characters:  letters, digits, and other symbols.

A string can be defined with either a pair of double quotation marks, "hi", or a pair of single quotation marks, 'hi'; however, the opening and closing marks must match.

In [None]:
print("Hello Python!")
print('Hello, World!')

If you try to mix quotation marks, you'll get an error message.  For example,

&nbsp; &nbsp; &nbsp; print("this will not work')

will result in an error message such as the following (Python 3.10 and later word it as "unterminated string literal"):

&nbsp; &nbsp; &nbsp; EOL while scanning string literal<br />

This means that the Python interpreter got to the End Of Line (EOL - effectively the end of the statement) before it got to the end of the string entity (a string <i>literal</i>) that it thought it was reading ("scanning").  This is because when it encountered an opening double quotation mark it started to interpret a string, but did not find a closing double quotation mark to tell it when to stop.

A <i>literal</i> is a fixed representation of an entity.  For example, x = 4 means the value of the literal 4 is assigned to variable x.  4 is a fixed representation of an integer with the mathematical value of four.  In the statement y = 2+2, the right side is not a literal; it is an expression that contains the literal 2 twice.  The expression evaluates to an integer, but no literal representation of that integer is in the statement y = 2+2.

A string literal is any sequence of characters enclosed in a consistent pair of quotation marks.  "ABC" is a string literal, as is the right side of the assignment statement x = "word".

In [None]:
print("this will not work')

print('don't try this at home')
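
The second error above comes from the apostrophe ending the string early.  A minimal sketch of one way around it, assuming you are free to choose the enclosing quotation marks (the variable names `warning` and `quote` are only for illustration):

```python
# An apostrophe is an ordinary character inside double quotation marks.
warning = "don't try this at home"
print(warning)

# Likewise, double quotation marks are ordinary characters inside single quotation marks.
quote = 'She said "hi" and left.'
print(quote)
```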

Two or more strings can be combined, or <i>concatenated</i>, to make a new string.

In [None]:
x = "Tues" "day"
print(x)
x = "Can" "you" "read" "this?"
print(x)

However, for usefulness and readability use the <i>concatenation operator</i>, +, since it is more explicit and works with variables.

In [None]:
word = "day"
x = "Tues" + word
print(x)
y = "Wednes" + word
print(y)

print("Thurs" + word)

#print("Thurs" word)   # This will make Python unhappy

Care must be exercised when trying to combine strings and numbers.

In [None]:
print("Tuesday the " + 15 + "th")

You need to convert the number (of type int) to a string first.  Use the str() function for this conversion.

You may recall from previous tutorials that sometimes strings and ints can be operands in the same expression, such as for "hi"&ast;3.  This depends on how the operator is <i>overloaded</i>.

In computer programming, an operator can be made to behave differently depending on the types of its arguments.  This is why 1+1 results in 2, but 1+1.0 results in 2.0:  an int plus an int results in an int, while an int plus a float results in a float.  A string times an int (and an int times a string) results in the string repeated that integer number of times.  However, a string times a float will result in an error, similar to how a string plus an integer resulted in an error:  those operators are not overloaded in those ways.

Operator overloading may seem like unnecessary complication, but even in formal mathematics operators are widely overloaded, and not every combination of operands is defined.  One can add scalars, and one can add vectors -- two mathematically different operations that are called "addition".  But we can't add a scalar to a vector.  And yet, just as we can multiply a scalar with a scalar, we can multiply a scalar with a vector.  But there are two common ways to multiply a vector with a vector:  dot products and cross products.  And how to "multiply" matrices is definitely not what someone would expect without a course in matrix algebra.

We saw another example of operator overloading when we discussed the unary "-" operator for negation and the binary "-" operator for subtraction.  Operator overloading is a form of "polymorphism", a term we will discuss later in the course when we introduce "object oriented programming".

In [None]:
print("Tuesday the " + str(15) + "th")

print("hi!"*3)     # this works
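
The overloading rules described in this section can be checked directly.  The sketch below uses `type()` to show the result types, and a `try`/`except` block (used here only so the cell keeps running after the error) to show what happens with a string times a float.  The variable names are only for illustration.

```python
int_sum = 1 + 1            # int plus int gives an int
float_sum = 1 + 1.0        # int plus float gives a float
print(type(int_sum), type(float_sum))

repeated = "hi" * 3        # string times int repeats the string
also_repeated = 3 * "hi"   # int times string does the same
print(repeated, also_repeated)

try:
    print("hi" * 3.0)      # string times float is not defined
except TypeError as error:
    print("TypeError:", error)
```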

### 1.4.2   escape characters

Sometimes you need to “escape” to access special characters.

An <i>escape character</i> is a character which signals that the following character is to be treated differently.  In Python the backslash is used to access certain characters that would otherwise be ambiguous or unavailable.

<table>
    <tr>
        <th>character</th>
        <th>description</th>
    </tr>
    <tr>
        <td>\n</td>
        <td>new line ("return" character)</td>
    </tr>
    <tr>
        <td>\\</td>
        <td>when you need the backslash itself</td>
    </tr>
    <tr>
        <td>\'</td>
        <td>when you need a single quotation mark</td>
    </tr>
    <tr>
        <td>\"</td>
        <td>when you need a double quotation mark</td>
    </tr>
    <tr>
        <td>\t</td>
        <td>when you need a tab (a single character; most terminals display it by advancing to the next multiple of 8 columns)</td>
    </tr>
    <tr>
        <td>\xhh</td>
        <td>character with hex code hh (ASCII and the accented Latin characters just beyond it, such as \xE9)</td>
    </tr>
    <tr>
        <td>\uhhhh</td>
        <td>16-bit Unicode character with hex code hhhh</td>
    </tr>
    <tr>
        <td>\Uhhhhhhhh</td>
        <td>32-bit Unicode character with hex code hhhhhhhh</td>
    </tr>
</table>
    

In [None]:
print("He said \"I don't need that backslash...\"")
print('She said "I don\'t need those backslashes..."')
print()
print("Roses are red,\nViolets are... violet.\nIt\'s right in the name.\nDon\'t try to deny it!")
print()
print("Hello, clich\xE9!")
print()
print("e\u02E3 dy/dx\ne\u02E3 dx\ncos sec tan sin\n3.14159\n\u221A \u221B log(\u03C0)\ndis-integrate them RPI")
print()
print("Python? \U0001F40D")
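
The tab and backslash escapes from the table are not used in the cell above.  A short sketch of both follows; `repr()` is used only to display the escape sequence rather than the character it produces, and the variable name `tabbed` is just for illustration.

```python
print("name\tvalue")       # \t inserts a single tab character
print("x\t1")
print("A backslash: \\")   # \\ produces one backslash

tabbed = "a\tb"
print(tabbed)              # the tab is shown as white space
print(repr(tabbed))        # repr() shows the escape sequence instead
```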

### 1.4.3  string length

The length of a string can be determined using the `len()` function.  The `len()` function returns an integer.  This can be converted to a string using the `str()` function.

Note the use of blanks in concatenated strings to make sure words and phrases are properly separated when printed.

The characters used to escape a character do not count toward the length of the string.  A single Unicode character has a length of 1 even though it took ten characters to generate it.

The last example shows how complex strings can be built using the += augmented assignment operator.

In [None]:
x = len("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print("The standard English alphabet has", x, "characters")
print("(though in Old English there were more).")
print()

print("There are " + str(len("0123456789ABCDEF")) + " valid digits in hexadecimal.")
print()

char = "\U0001F40D"     # change the hex code at will
string_to_print  = "The 32-bit unicode character "
string_to_print += char
string_to_print += " has a length of "
string_to_print += str(len(char))
string_to_print += "."
print(string_to_print)
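
To confirm that escape sequences count as single characters, here is a minimal sketch that applies `len()` to a few of the escapes from Section 1.4.2.  The list name `lengths` is only for illustration.

```python
lengths = [
    len("\n"),          # a newline is one character
    len("\\"),          # an escaped backslash is one character
    len("clich\xE9"),   # c, l, i, c, h, and one accented e
    len("\u221A"),      # the square root sign is one character
]
print(lengths)
```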

### 1.4.4  raw strings

Sometimes you may not want to risk having a string containing a backslash misinterpreted as an escape character.  This is most common in specifying directory paths in the Windows operating system.

Python allows "raw" strings by prepending the string with r or R:<br />
&nbsp; &nbsp; r'This is a raw string,'<br />
&nbsp; &nbsp; R"and so is this."<br />

There is one exception to raw strings:  they may not be terminated with a single backslash (more precisely, with an odd number of backslashes).  Thus,<br />
&nbsp; &nbsp; r'C:\NuMPE' &nbsp; &nbsp; &nbsp; &nbsp; is a valid raw string<br />
&nbsp; &nbsp; r'C:\NuMPE\\' &nbsp; &nbsp; &nbsp; &nbsp; is not valid<br />

In [None]:
path = 'C:\noodle'               # not a raw string; \n gives a new line
print(path)
print()

# use the escape character
file1 = 'C:\\NuMPE\\Homework\\hw01.py'
print(file1)

# or use a raw string
file2 = r'C:\NuMPE\Homework\hw02.py'
print(file2)

#path = r'C:\NuMPE\Homework\'   # the terminating backslash will generate an error
#print(path)
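
The difference between a regular string and a raw string also shows up in their lengths.  A small sketch, reusing the `C:\noodle` example (the variable names `regular` and `raw` are only for illustration):

```python
regular = 'C:\noodle'    # \n is read as a single newline character
raw = r'C:\noodle'       # \n is a backslash followed by the letter n

print(len(regular))      # counts the newline as one character
print(len(raw))          # counts the backslash and the n separately
print(raw == 'C:\\noodle')   # the raw string equals the escaped version
```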

### 1.4.5  triple quotation mark strings (used for docstrings)

One last type of Python string is enclosed in triple quotation marks (using either single or double quotation marks).  Such a string can span several lines.  It can be used as a multi-line comment in Python code (strictly speaking it is a string whose value is simply discarded), though IDEs will generally not highlight it as a comment in the same way as text preceded by #.

When we learn about <i>docstrings</i> we will find out that comments in triple quotation marks are interpreted in special ways, and so are discouraged except for those special purposes.

In [None]:
"""
This is a multiline comment
at the top of the code
"""

a = 1
b = 2

'''This is
a multiline comment
in the middle
of the code'''

print(a, b)

# This is the preferred way
# to have multiple lines of
# comments within code.

print(a+b)

string = '''Does this act like a comment or a string?'''
print(string)
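
A triple-quoted string that is assigned to a variable behaves like any other string, and the line breaks typed inside it are stored as newline characters.  A minimal sketch (the variable name `verse` is only for illustration):

```python
verse = """Roses are red,
Violets are... violet."""

print(verse)
print("\n" in verse)     # the line break is stored as a newline character
print(verse == "Roses are red,\nViolets are... violet.")
```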

### 1.4.6  strings are immutable

Strings can be indexed in much the same way as lists.  However, unlike lists where you can change any of the elements, strings are <i>immutable</i>.  This means that once a string is created, its characters cannot be modified.

New strings can be created with combinations of elements of strings, slices of strings, and whole strings, but once created the new string is also immutable.

In [None]:
name = "Harry Potter"      # the right side is an immutable string literal, assigned to the variable `name`

initials = name[0]+name[6] # you can build new strings, but once built the new string is also immutable
print(initials)
gossip = initials + " fancies GW"
print(gossip)
print()

first = name[:5]
last = name[6:]
len(name)            # returns the number of characters in name (the value is not used here)
print("First name:", first)
print("Last name is " + last + "\t" + "Length of last name is " + str(len(last)) + " characters.")
print()

#initials[1] = "L"   # this will not work; strings are immutable
#print(initials)
#name[:5] = "Larry"  # this will not work; strings are immutable
#print(name)
new_name = "Larry " + name[6:]   # this will work
print(new_name)
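
To see the error without stopping the cell, the attempted change can be wrapped in a `try`/`except` block.  This is only a sketch; `try`/`except` is used here for convenience and is not needed for ordinary string work.

```python
name = "Harry Potter"

try:
    name[0] = "L"                 # attempt to change the first character in place
except TypeError as error:
    print("TypeError:", error)

print(name)                       # the original string is unchanged

new_name = "L" + name[1:]         # build a new string instead
print(new_name)
```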

### 1.4.7  pearls before swine:  ways that Python can squeeze you

All computer languages have idiosyncrasies, and Python is no exception.  This section will teach you little to nothing, but it may amuse you to see that even the inventors of computer languages don't get everything right.  It's not just you.

#### 1.4.7.1 unmatched quotation marks

How can this work?  But then how can it not work?

In [None]:
print("Gotcha!""")
print('This does not look like it should work.  But it does.''')

In [None]:
print("""So then this should work... but it does not!  Why?")

Recall from Section 1.4.1 that Python allows implicit concatenation as a convenience, though we cautioned that in general you should instead use the "+" concatenation operator.

In [None]:
x = "Tues" "day"
print('implicitly concatenated: ', x)

y = "Tues" + "day"
print('concatenated using "+":  ', y)

The statement `print("Gotcha!""")` is using implicit concatenation.  Python sees the first quotation mark and reads the string Gotcha! until it finds the second quotation mark, terminating that string.  It then finds the third quotation mark followed by the fourth -- another string, but this string is empty.  It implicitly concatenates Gotcha! and the empty string and successfully prints that to the screen.

This doesn't work when the three quotation marks come first.  Python sees the three quotation marks and begins to read the triple-quoted string... and it won't stop until it finds the terminating three quotation marks.  But Python doesn't find them before the code runs out.  Instead Python reaches the End Of File ("EOF" -- the end of all of the statements) while "scanning" (reading) the string, and says so in its error message (Python 3.10 and later instead report an unterminated triple-quoted string literal).
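
The implicit concatenation with an empty string can be checked directly.  A small sketch (the variable name `gotcha` is only for illustration):

```python
gotcha = "Gotcha!"""               # the string Gotcha! followed by an empty string
print(gotcha == "Gotcha!" + "")    # same result as explicit concatenation
print(len(gotcha))                 # the empty string adds no characters
```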

#### 1.4.7.2  no last \ in a raw string

In Section 1.4.4 we noted that there is one exception to raw strings:  they may not be terminated with a single backslash.  Thus,<br />
&nbsp; &nbsp; r'C:\NuMPE' &nbsp; &nbsp; &nbsp; &nbsp; is a valid raw string<br />
&nbsp; &nbsp; r'C:\NuMPE\\' &nbsp; &nbsp; &nbsp; &nbsp; is not valid<br />

But it's not unusual to want to build a directory path to a file by starting with a string that ends in a backslash and then concatenating that with a filename.  Why is that not allowed?

In [None]:
# This works as intended
file = r'\DangerNoodle.py'
path = r'C:\NuMPE\Homework'
print(path + file)

In [None]:
# but this is prohibited
file = r'DangerNoodle.py'
path = r'C:\NuMPE\Homework\'   # the terminating backslash in the raw string will generate an error
print(path + file)

This doesn't work because raw strings were not designed to create directory paths.  (Indeed, in Windows you can use forward slashes as in macOS, Linux, and most other operating systems.)

In a raw string the backslash represents itself, but it still keeps a quotation mark that immediately follows it from ending the string (both the backslash and the quotation mark stay in the string).  Thus, when a raw string appears to end in a single backslash, Python reads the closing quotation mark as part of the string and then runs out of line before finding a terminating quotation mark.
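
One possible workaround, sketched below, is to add the final backslash as an ordinary escaped string.  The second part shows that a backslash followed by a quotation mark inside a raw string keeps both characters.  The variable names `folder` and `odd` are only for illustration.

```python
folder = r'C:\NuMPE\Homework' + '\\'   # raw string plus one escaped backslash
file = 'DangerNoodle.py'
print(folder + file)

odd = r'\''              # a raw string holding a backslash and a quotation mark
print(odd, len(odd))
```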

#### 1.4.7.3  you need a babel fish

It seems we can't change "The Answer to the Ultimate Question of Life, The Universe, and Everything".

In [None]:
meaning_of_life = 42
meaning_of_lifе =  1
print(meaning_of_life)

In [None]:
# look deeper into life... especially that second life
print("meaning_of_life" == "meaning_of_lifе", "   <-- shouldn't this be True?!")

In [None]:
# look deeper still... at the last e in each variable that looks like "meaning_of_life"
print("\u0065")      # e, the fifth letter of the English alphabet
print("\u0435")      # е, a Cyrillic small letter pronounced like the "e" in "yes"
print("Can you tell the difference?  Your computer can!")
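
If you suspect a look-alike character, the built-in `ord()` function and the standard library's `unicodedata.name()` can tell the two apart.  A minimal sketch (the variable names are only for illustration):

```python
import unicodedata

latin_e = "\u0065"
cyrillic_e = "\u0435"

for char in (latin_e, cyrillic_e):
    # ord() gives the code point; unicodedata.name() gives the official Unicode name
    print(char, hex(ord(char)), unicodedata.name(char))

print(latin_e == cyrillic_e)    # the two characters are not equal
```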

#### 1.4.7.4  I've got your nose

This is a scam that programmers have been inflicting on themselves since programming began.  Don't fall for this when you write code.

You paid for fourteen words (count them!), but you only got thirteen.

In [None]:
words = ["I",
         "will",
         "sell",
         "you",
         "fourteen",
         "lovely",
         "words",
         "but",
         "you",
         "only",
         "got",
         "an",
         "unlucky"
         "thirteen"]

print("You only got", len(words), "words! ")

#print(words)

It's another case of implicit concatenation.  Print the words and see what's missing.
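
A shorter version of the same trap, with the fix alongside it.  This is only a sketch; the list names `words` and `fixed` are for illustration.

```python
words = ["an", "unlucky" "thirteen"]     # missing comma after "unlucky"
print(len(words), words[-1])             # two elements; the last two words are fused

fixed = ["an", "unlucky", "thirteen"]    # comma restored
print(len(fixed), fixed[-1])
```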

You can find more of these (though presented irreverently, and most well beyond what we will cover in this course) here:<br>
https://github.com/satwikkansal/wtfpython<br>
and some explanations about the design of Python here:<br>
https://docs.python.org/3/faq/design.html

<br><br>
When your code doesn't work properly it is almost always your fault.  Way, way closer to 100% of the time than you think.  And yet, you can never really be sure.