# Introduction
In this module, we will expand on the earlier module on variables and introduce the string data type. Though more accurately described as a data structure, a string has many features that numeric values lack. To understand these features, we begin by defining a string as a data structure, and from there discuss how the data in strings are structured. We then discuss the properties and methods associated with string values.

# Strings
Though data elements are often thought of as numbers or strings, the more accurate distinction is between numbers and characters. Strings, then, are not a data element but a data structure. Specifically, a string is a data structure that contains an ordered collection of characters and has properties and methods that are unique to the string object. In this chapter, we will look at the structure of a string, the content of a string, and the methods and properties that are accessible through the string object.

## A String is an Ordered Collection of Characters
It is best to think of a string as an *ordered collection of characters* rather than as a single, indivisible value. Thinking of strings this way helps us reason more clearly about how strings are structured and how to interact with them in our applications.

In [35]:
myString = "Collection of Characters"

In [3]:
len(myString)

24

When a string is created, we view it as a single entity, but Python treats it as an ordered collection of characters, each with a numbered position (its *index*) starting at 0. So, the above string assignment is treated in the following way:

|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|C|o|l|l|e|c|t|i|o|n| |o|f| |C|h|a|r|a|c|t|e|r|s|

Because strings are stored in this way, we can access any single character or any subset of the string through its indices. For example, the following statements retrieve a letter or a series of letters from our string. The first statement returns the character at index 4 (the 5th character in the sequence, because counting starts at 0).

In [4]:
myString[4]

'e'

The second statement returns a slice of the string: all characters from index 1 up to, but not including, index 4.

In [7]:
myString[1:4]

'oll'

The third statement omits the end index, which tells Python to continue to the end of the string. It is equivalent to `myString[4:len(myString)]`, where the len() function counts the number of items in a collection. (Note: because 0 is the first position and len() provides a count, `len(myString) - 1` is the position of the last character.)

In [11]:
myString[4:]

'ection of Characters'
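To make the link between len() and the last position concrete, here is a small sketch that reuses the same `myString` value; the name `lastIndex` is only an illustrative choice, not part of the original notebook.

```python
myString = "Collection of Characters"

# The last character sits at len(myString) - 1 because counting starts at 0
lastIndex = len(myString) - 1
print(lastIndex)              # 23
print(myString[lastIndex])    # s

# Omitting the end index is the same as slicing up to len(myString)
print(myString[4:] == myString[4:len(myString)])  # True
```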

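The next four statements split the string at a chosen position. `myString[:15]` extracts all characters before index 15, and `myString[15:]` extracts all characters from index 15 to the end of the string. Negative indices count backward from the end of the string, so `myString[:-15]` extracts everything except the last 15 characters, and `myString[-15:]` extracts the last 15 characters.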

In [12]:
myString[:15]

'Collection of C'

In [13]:
myString[15:]

'haracters'

In [16]:
myString[:-15]

'Collectio'

In [15]:
myString[-15:]

'n of Characters'
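One way to check how these slices fit together is to glue the two halves back together. The sketch below (reusing `myString`; `front`, `back`, and `n` are illustrative names) shows that `[:n]` and `[n:]` always rebuild the original string, and that a negative index `-k` names the same position as `len(myString) - k`.

```python
myString = "Collection of Characters"

# Splitting at position n and joining the halves returns the original string,
# whether n counts from the front (15) or from the back (-15)
for n in (15, -15):
    front = myString[:n]
    back = myString[n:]
    print(repr(front), repr(back), front + back == myString)

# A negative index -k names the same position as len(myString) - k
print(myString[-15:] == myString[len(myString) - 15:])   # True
```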

Another consequence of the *ordered collection of characters* paradigm is that each character is an element that can be traversed or searched. So, loops can be used to iterate through every element in the collection, and the `in` operator can be used to search for elements in it.

In [17]:
i = 0 
while i < len(myString): 
    letter = myString[i] 
    print("Letter " + str(i) + ": " + letter) 
    i = i + 1 

Letter 0: C
Letter 1: o
Letter 2: l
Letter 3: l
Letter 4: e
Letter 5: c
Letter 6: t
Letter 7: i
Letter 8: o
Letter 9: n
Letter 10:  
Letter 11: o
Letter 12: f
Letter 13:  
Letter 14: C
Letter 15: h
Letter 16: a
Letter 17: r
Letter 18: a
Letter 19: c
Letter 20: t
Letter 21: e
Letter 22: r
Letter 23: s
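The loop above covers traversal; searching with the `in` operator looks like the following minimal sketch, again using the same `myString` value. Note that `in` also matches runs of characters (substrings) and is case-sensitive.

```python
myString = "Collection of Characters"

# `in` returns True when the character appears anywhere in the string
print('C' in myString)        # True
print('x' in myString)        # False

# It also finds runs of characters (substrings), and the match is case-sensitive
print('Char' in myString)     # True
print('char' in myString)     # False

# `not in` is the negated form
print('x' not in myString)    # True
```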


Remember, for loops are the natural choice when you know how many times you want your code to repeat. Python knows how to iterate over strings, so a for loop can visit each item in the collection of characters in turn. **Note:** When a for loop iterates through a string in this way, the letter variable is updated on each pass to the *next* item (in this case, character) in the collection. This is why we can access each letter without knowing its position in the string. The following example tries to recover that position with the `index()` method.

In [18]:
for letter in myString: 
    print("Letter " + str(myString.index(letter)) + ": " + letter)

Letter 0: C
Letter 1: o
Letter 2: l
Letter 2: l
Letter 4: e
Letter 5: c
Letter 6: t
Letter 7: i
Letter 1: o
Letter 9: n
Letter 10:  
Letter 1: o
Letter 12: f
Letter 10:  
Letter 0: C
Letter 15: h
Letter 16: a
Letter 17: r
Letter 16: a
Letter 5: c
Letter 6: t
Letter 4: e
Letter 17: r
Letter 23: s


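Look closely at the output above: several positions are wrong. The second 'l' reports position 2, and the later 'o' characters report position 1. This happens because `index()` returns the position of the *first* occurrence of a character, not the position the loop is currently visiting.

Strings are our first introduction to 'collections.' In later modules we will look at lists and dictionaries. When iterating through collections, it is often helpful to know which element the loop is processing. To do this, we can use the `enumerate()` function in our for statement. The `enumerate()` function changes the structure of our for statement slightly because `enumerate()` returns both an index and a value for each element. We modify our for statement by naming two variables (e.g., index, letter) that will be updated for each element in the collection.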

In [19]:
for index, letter in enumerate(myString):
    print("Value of element " + str(index) + ": " + letter)

Value of element 0: C
Value of element 1: o
Value of element 2: l
Value of element 3: l
Value of element 4: e
Value of element 5: c
Value of element 6: t
Value of element 7: i
Value of element 8: o
Value of element 9: n
Value of element 10:  
Value of element 11: o
Value of element 12: f
Value of element 13:  
Value of element 14: C
Value of element 15: h
Value of element 16: a
Value of element 17: r
Value of element 18: a
Value of element 19: c
Value of element 20: t
Value of element 21: e
Value of element 22: r
Value of element 23: s
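To see exactly what `enumerate()` hands to the for statement, the sketch below converts its result to a list, then uses the optional `start` argument to number the letters from 1. The short word `"Cat"` and the names `word` and `position` are illustrative only.

```python
word = "Cat"

# enumerate() yields (index, value) pairs; list() shows them all at once
print(list(enumerate(word)))           # [(0, 'C'), (1, 'a'), (2, 't')]

# The optional start argument changes the first number reported
# (the string itself is still indexed from 0)
for position, letter in enumerate(word, start=1):
    print("Letter " + str(position) + ": " + letter)
```

This is handy for human-friendly output, but remember that `word[position]` would then be off by one.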


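A final consequence of strings being an ordered collection of characters is that strings are compared character by character, from left to right, and the first pair of characters that differ decides the result. This means that length does not determine the order, and that capitalization matters. It is the same rule we use to alphabetize words, which is why ‘zero’ comes after ‘one’ and ‘eighty’ comes before ‘seventy’ even though the numbers they name are in the opposite order, and why ‘six’ comes after ‘eleven’ even though it is shorter. (Length only breaks the tie when one string is a prefix of the other, as with ‘one’ and ‘ones’.) Capitalization matters because, from the computer’s perspective, an ‘a’ and an ‘A’ are different characters with different numeric codes.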

In [21]:
print('zero' < 'zone')

True


In [22]:
print('zero' == 'ZERO')

False


In [23]:
print('one' == '1')

False


In [27]:
print('one' == 'one')

True
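The comparison rules can be checked directly. The sketch below uses the built-in `ord()` function, which returns a character's numeric code, to show why uppercase ASCII letters sort before lowercase ones, and uses `lower()` on both sides as one simple way to compare without regard to case. The example words are illustrative.

```python
# Strings compare character by character; the first difference decides the result
print('zero' > 'one')          # True: 'z' comes after 'o'
print('eighty' < 'seventy')    # True: 'e' comes before 's'
print('six' > 'eleven')        # True: shorter, but 's' comes after 'e'
print('one' < 'ones')          # True: a prefix sorts before the longer string

# Each character has a numeric code, which is why case matters
print(ord('A'), ord('a'))      # 65 97
print('Zebra' < 'apple')       # True: uppercase ASCII letters sort before lowercase

# Lowercasing both sides gives a case-insensitive equality check
print('zero'.lower() == 'ZERO'.lower())  # True
```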


## A String is an Object
We aren’t covering object-oriented concepts in this class, but you do need to understand that objects are data structures that have properties and methods in addition to whatever values we assign them. So, our string variable above has the value ‘Collection of Characters’, but because it is a string object, it also has access to the properties and methods that are built into string objects. A property is a value stored with the object that describes it.

A method is a function built into the string object that performs some operation on the string value. For example, the upper() method returns an uppercase copy of the string. The following cells show a brief selection of the string methods available. Refer to page 72 of your textbook for a comprehensive list, and see also https://docs.python.org/3/library/stdtypes.html#string-methods

In [33]:
myString.upper()

'COLLECTION OF CHARACTERS'

In [34]:
myString.lower()

'collection of characters'

In [38]:
myString.capitalize()

'Collection of characters'

In [41]:
myString.strip('sC')

'ollection of Character'

In [45]:
myString.center(50, "-")

'-------------Collection of Characters-------------'

In [49]:
myString.lower().count('c')

4

In [52]:
myString.endswith('Characters')

True

In [56]:
myString.find('of')

11

In [60]:
myString.isupper()

False

In [61]:
myString.islower()

False

In [62]:
myString.split("of")

['Collection ', ' Characters']

In [64]:
stringParts = myString.split()

In [65]:
stringParts

['Collection', 'of', 'Characters']

In [66]:
len(stringParts)

3

In [67]:
for stringPart in stringParts:
    print(stringPart + " is " + str(len(stringPart)) + " characters long.")

Collection is 10 characters long.
of is 2 characters long.
Characters is 10 characters long.
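Two details of the methods above are easy to miss. First, the argument to `strip()` is treated as a *set* of characters to remove from both ends, not as a substring, so `'sC'` and `'Cs'` behave the same. Second, `find()` returns -1 when the text is absent, whereas the related `index()` method raises a `ValueError`. A short sketch, using `'xyz'` as an illustrative missing value:

```python
myString = "Collection of Characters"

# strip() removes any of the listed characters from both ends,
# so the order of the characters in the argument does not matter
print(myString.strip('sC') == myString.strip('Cs'))   # True

# find() returns -1 when the text is absent...
print(myString.find('xyz'))    # -1

# ...while index() raises ValueError instead
try:
    myString.index('xyz')
except ValueError as err:
    print("index() failed:", err)
```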


Python objects are either mutable (changeable) or immutable (unchangeable), and strings are immutable. This means that once a string object is created, it cannot be changed, and any change you wish to keep must be saved as a new string object. This concept is counter-intuitive, but it will make sense when you look at the ways strings are handled in Python. The first example does not work because you cannot change the value of any element in the ordered collection of characters.

In [70]:
i = 0 
while i < len(myString): 
    myString[i] = myString[i].upper() 
    print(myString)
    i = i + 1

TypeError: 'str' object does not support item assignment

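However, the second example does work because each pass through the loop builds a new string (the previous value plus one more uppercase character) and assigns that new string to myUpperString. No existing string is ever modified.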

In [85]:
i = 0 
myUpperString = ""
while i < len(myString): 
    myUpperString = myUpperString + myString[i].upper() 
    print(myUpperString)
    i = i + 1

C
CO
COL
COLL
COLLE
COLLEC
COLLECT
COLLECTI
COLLECTIO
COLLECTION
COLLECTION 
COLLECTION O
COLLECTION OF
COLLECTION OF 
COLLECTION OF C
COLLECTION OF CH
COLLECTION OF CHA
COLLECTION OF CHAR
COLLECTION OF CHARA
COLLECTION OF CHARAC
COLLECTION OF CHARACT
COLLECTION OF CHARACTE
COLLECTION OF CHARACTER
COLLECTION OF CHARACTERS
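Immutability also explains why string methods *return* results instead of changing the string in place. The sketch below shows that `upper()` leaves the original untouched, that "changing" one character really means building a new string from slices, and that collecting pieces in a list and calling `"".join()` once is a common alternative to repeated `+` in a loop. The names `shouted`, `fixed`, and `parts` are illustrative only.

```python
myString = "Collection of Characters"

# upper() builds and returns a new string; the original is untouched
shouted = myString.upper()
print(shouted)     # COLLECTION OF CHARACTERS
print(myString)    # Collection of Characters

# To "change" one character, build a new string from slices
fixed = myString[:14] + "c" + myString[15:]
print(fixed)       # Collection of characters

# Common idiom: collect the pieces in a list, then join them once at the end
parts = []
for letter in myString:
    parts.append(letter.upper())
print("".join(parts) == shouted)  # True
```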


## String Presentation
Formatting strings can be one of the most time-consuming aspects of programming. To create a dynamic and informative application interface, you will often find yourself needing to parse strings and to pipe variable values into an output string. Python offers many solutions to this problem (oftentimes referred to as *string interpolation*). The following lines of code illustrate these options.  

In [76]:
firstName = "Jake"
lastName = "London"
myAge = 46

In [78]:
print("My name is " + firstName + " " + lastName + ", and I am " + str(myAge) + " years old.") 

My name is Jake London, and I am 46 years old.


In [77]:
print("My name is", firstName, lastName, ", and I am", myAge, "years old.") 

My name is Jake London , and I am 46 years old.
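The stray space before the comma in the previous output comes from `print()` itself: when given several arguments, it places its `sep` value (a single space by default) between them. The sketch below shows two ways to avoid it, either attaching the comma to `lastName` before printing or setting `sep=""` and writing the spaces yourself.

```python
firstName = "Jake"
lastName = "London"
myAge = 46

# Attach the comma to lastName so print() does not put a space before it
print("My name is", firstName, lastName + ",", "and I am", myAge, "years old.")

# Or turn the separator off and write every space into the strings
print("My name is ", firstName, " ", lastName, ", and I am ", myAge, " years old.", sep="")
```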


In [79]:
print("My name is %s %s, and I am %d years old." % (firstName, lastName, myAge)) 

My name is Jake London, and I am 46 years old.


In [80]:
print("My name is {} {}, and I am {} years old.".format(firstName, lastName, myAge)) 

My name is Jake London, and I am 46 years old.


In [81]:
print("My name is {1} {0}, and I am {2} years old.".format(firstName, lastName, myAge)) 

My name is London Jake, and I am 46 years old.
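Besides empty and numbered placeholders, `format()` also accepts named placeholders filled by keyword arguments, so the order of the arguments no longer matters and one template can be reused. In this sketch, `template` and the second set of values (`"Sam"`, `"Lee"`, `30`) are made up for illustration.

```python
firstName = "Jake"
lastName = "London"
myAge = 46

# Named placeholders are matched to keyword arguments, in any order
template = "My name is {first} {last}, and I am {age} years old."
print(template.format(last=lastName, age=myAge, first=firstName))

# The same template can be reused with different (made-up) values
print(template.format(first="Sam", last="Lee", age=30))
```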


**Note:** The following examples use f-strings, which require Python 3.6 or later, so they will not work on Azure Notebooks, which does not support Python 3.6+. This is unfortunate, because f-strings are my favorite (because they are the most readable) method for presenting strings. They will work on your version of Python.

In [82]:
print(f"My name is {firstName} {lastName}, and I am {myAge} years old.") 

My name is Jake London, and I am 46 years old.


In [83]:
myMessage = f'''
My name is {firstName} {lastName}, 
and I am {myAge} years old.
'''

print(myMessage) 


My name is Jake London, 
and I am 46 years old.



In [84]:
myMessage

'\nMy name is Jake London, \nand I am 46 years old.\n'
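The last two cells differ because `print()` displays the rendered text, while echoing a variable at the end of a cell shows its representation, with newlines written as `\n`; `repr()` produces the same form on demand. The sketch below (Python 3.6+; `nextYear` and `twoLines` are illustrative names) also shows that the braces of an f-string may contain any expression, not just a variable name.

```python
firstName = "Jake"
myAge = 46

# Anything inside the braces of an f-string is evaluated as a Python expression
nextYear = f"Next year {firstName.upper()} will be {myAge + 1}."
print(nextYear)          # Next year JAKE will be 47.

# print() renders the newline; repr() shows it as an escape sequence
twoLines = f"{firstName}\n{myAge}"
print(twoLines)
print(repr(twoLines))    # 'Jake\n46'
```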

# Exercise
Write code to detect university email addresses. Prompt the user for their email address and tell them whether or not it is a university account (that is, whether it ends in .edu).

In [None]:
# Step 1...

# Step 2...