Geo Data Science with Python,
Prof. Susanna Werth, VT Geosciences

---
### Reading - Lecture 4
 
# Basic Object Types

This lesson discusses the basic Python object types **Numbers**, **Boolean** and **Strings** in more detail. It also shows how data can be stored in Python ***lists***, and then deepens some concepts of objects in programming. 


### Content

- <a href='#numbers'> ***Numbers*** </a>
- <a href='#boolean'> ***Boolean*** </a>
- <a href='#strings'> ***Strings*** </a>
- <a href='#lists'> ***Lists and Sequences*** </a>


### Sources
Some elements of Sections A-C of this notebook are drawn from Chapter 5 and Chapter 7 of Lutz (2013).
Part D is an adaptation of Lesson 2 from [Geo-Python](https://geo-python.github.io/site/2018/index.html), which is licensed under CC (Attribution-ShareAlike 4.0 International).



<div class="alert alert-info">

**Note**

For any cell in the notebook to run correctly, you have to run all previous Python code cells that define the respective variables. You just need to press **Shift**-**Enter** to run any cell, or click on the "Run" button in the tool bar at the top of the notebook. A cell has been executed if a running number appears in the square brackets at the beginning of the cell. If those brackets are empty, the cell has not been executed. To start over, you can restart the Kernel (Menu item *Kernel* > *Restart*).

</div>



---


<a id='numbers'></a>
# A. Numeric Object Types in Python

Effective data-driven science and computation requires understanding how data is stored and manipulated (VanderPlas, 2016).

Most of Python's number types are typical and will seem familiar if you have used other programming languages before. However, numbers are not really a single object type but rather a category of similar types. Python supports the usual numeric types (integers and floating-point numbers) as well as literals for creating numbers, expressions for processing them, and some built-in functions and modules. 
Python also allows you to write integers using hexadecimal, octal and binary literals, offers a complex number type, and allows integers to have unlimited precision - they can grow to have as many digits as your memory space allows. Lutz (2013) gives the following overview of numeric object types in Python:

Table 1: *Numeric literals and constructors (Lutz, 2013, Table 5.1).*

<img src="./Image_Table_NumericObjects.jpg" alt="Numeric literals and constructors." title="Lutz (2013), Figure 5-1" width="400" />


Built-in numbers are enough to represent most numeric quantities - from your age to your bank balance - but more types are available from external (third-party) Python packages.
Below we briefly introduce the most important ones for this course. These are integer and floating numbers as well as Boolean types. The latter allows for logic operations.

## Integers

Integers are written as strings of decimal digits. These numbers have no fractional component. The size of integer numbers is only limited by your computer's memory. 
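
The alternative integer literals and the unlimited size of integers can be checked directly. The following is a minimal sketch; the variable names are chosen here for illustration only.

```python
# Hexadecimal, octal and binary literals all create ordinary int objects
hex_value = 0xFF       # 255 in decimal
oct_value = 0o17       # 15 in decimal
bin_value = 0b101      # 5 in decimal
print(hex_value, oct_value, bin_value)

# Integers grow as needed: 2**200 does not overflow
big = 2 ** 200
print(type(big), len(str(big)))   # an int with 61 decimal digits
```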

Python's basic number types support the normal mathematical operations: addition and subtraction with the plus and minus signs ```+``` and ```-```, multiplication with the star sign ```*```, division with the slash ```/```, and exponentiation with two stars ```**```. Try to edit and execute the following example to perform subtractions, multiplications and divisions. What happens? Are the results of all of these operations also of type integer?

In [316]:
123 + 222

345

In [317]:
type(123 + 222)

int

Indeed, most mathematical operations involving two integer numbers will also return an integer number. However, divisions do not return an integer number. This holds even for divisions without remainder. Instead, in Python 3 the division operator ```/``` always performs a true division and returns a floating-point number:

In [318]:
type(4/2)

float

In Python 3 (which we are using here, as you can see from the Kernel type at the top right), if you want to specifically perform an integer division, you have to mark this by using a double division symbol: ```//```.

In [319]:
4//2         # integer division

2

In [320]:
type(4//2)

int

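Just as a side note: The integer division ```//``` in Python 3 is actually a floor division, i.e. the result is rounded down to the nearest integer. It is a built-in operator, and for numbers like the ones used here it gives the same result as applying the function ```floor()``` from the Python module ```math``` to the result of a normal division. We will discuss Python modules at a later point in the course.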

In [321]:
import math
math.floor(123/222)

0
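
Floor division and truncation only differ for negative results. As a small sketch (the variable names are illustrative), compare ```//```, ```math.floor()``` and ```int()``` for a negative quotient:

```python
import math

floor_div = -7 // 2             # floor division rounds down: -4
floored = math.floor(-7 / 2)    # same result via true division and floor(): -4
truncated = int(-7 / 2)         # int() truncates toward zero instead: -3
print(floor_div, floored, truncated)
```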

## Floating-point Numbers

Floating-point numbers have a fractional component. A decimal point and/or an optional signed exponent introduced by an ```e``` or ```E``` and followed by an optional sign are used to write floating-point numbers.


In [322]:
type(3.14)    # literal for a floating-point number

float

In [323]:
314e-2        # literal for a floating-point number in scientific notation

3.14

Floating-point numbers are implemented as C "doubles", therefore they get as much precision as the C compiler used to build the Python interpreter gives to doubles (usually about 15 significant decimal digits). For more precision, additional modules or external Python packages have to be used. 
In addition, Python 3 automatically handles a user-friendly output of floating-point numbers. For example, if we assign the mathematical constant π to a numeric object, the unformatted output on screen shows just as many digits as were entered.

In [324]:
pi_approximate = 3.14
pi_accurate = 3.141592653589793
print(pi_approximate)
print(pi_accurate)

3.14
3.141592653589793


However, when printing the variable to the screen, you can also change the precision of the output by using the string formatting operator ```%``` (the same symbol that serves as the modulus operator for numbers). If you want to print out 4 digits after the decimal point, indicate this with ```%.4f``` in the following way:

In [325]:
print('%.4f'%pi_accurate)   # formatted screen output using print() for floating-point numbers

3.1416


Alternatively, the output can be formatted in scientific notation or as an integer number, through the type codes ```e``` and ```i```:

In [326]:
print('%.4e'%pi_accurate)   # formatted screen output using print() for numbers in scientific notation

3.1416e+00


In [327]:
print('%i'%pi_accurate)     # formatted screen output using print() for integer numbers

3


We will discuss further details of formatted output using the print function, further below in the section about strings. 
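
The limited precision of floating-point numbers mentioned above shows up even in simple arithmetic. The sketch below assumes the usual C doubles; ```math.isclose()``` is one possible way to compare such values, and the variable names are illustrative.

```python
import math

total = 0.1 + 0.2
print(total)                              # not exactly 0.3 because of binary rounding
exactly_equal = (total == 0.3)            # False
close_enough = math.isclose(total, 0.3)   # True within a small relative tolerance
print(exactly_equal, close_enough)

formatted = '%.2f' % total                # formatted output hides the tiny error
print(formatted)                          # 0.30
```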

Furthermore, since objects are strongly typed in Python, the type of an existing object cannot be changed. You can, however, convert its value into a new object of another type and print that to the screen or assign it to a variable.
For example, the function ```int()``` truncates a floating-point number into an integer number:

In [328]:
int(3.141)

3

And the function ```float()``` does the opposite:

In [329]:
float(3)

3.0
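
Note that ```int()``` truncates rather than rounds. A short sketch contrasting it with the built-in function ```round()``` (variable names are illustrative):

```python
truncated_pos = int(3.99)     # 3: the fractional part is dropped
truncated_neg = int(-3.99)    # -3: truncation goes toward zero
rounded = round(3.99)         # 4: round() goes to the nearest integer
print(truncated_pos, truncated_neg, rounded)
```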

Take notice of what happens if an operation is performed that involves both number types, floating-point and integer. In that case, before the Python interpreter performs the operation, it converts the operands up to the most complicated type. Hence, the output object type of a mathematical operation that includes integer and floating-point numbers will be of floating-point type:

In [330]:
type(40 + 3.141)

float

## Built-in Numeric Tools

We have already mentioned some basic mathematic operations. Now let's discuss more expressions available for processing numeric object types and some built-in functions and modules. We will meet many of these as we go along.

### *Expression operators:*
```+```, ```-```, ```/```, ```*```, ```**```, ```%```, etc.

Expression operators are used for mathematical operations between two numbers. Listed above are the operators for addition, subtraction, division, multiplication, exponentiation, and modulus. Go to this website to find a comprehensive list of expression operators: https://www.tutorialspoint.com/python/python_basic_operators.htm

It is important to keep in mind that:
* Mixed operators follow operator precedence (similar to mathematical operations: multiplications precede additions, hence ```5+1*10``` gives ```15```. For a full list of precedence orders see section 6.17 in the Python documentation: https://docs.python.org/3/reference/expressions.html#operator-precedence)
* Parentheses group subexpressions (exactly like in mathematics: ```(5+1)*10``` gives ```60``` but ```5+(1*10)``` gives ```15```)
* Mixed types are converted up (as already discussed for the last example in the section about floating-point numbers)
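
The three rules can be verified with a few one-line expressions. This is only a sketch; the variable names are illustrative.

```python
no_parentheses = 5 + 1 * 10      # multiplication first: 5 + 10 = 15
grouped = (5 + 1) * 10           # parentheses first: 6 * 10 = 60
power_chain = 2 ** 3 ** 2        # exponentiation groups right to left: 2 ** 9 = 512
mixed = 40 + 3.5                 # int and float mix, the result is a float: 43.5
print(no_parentheses, grouped, power_chain, mixed)
```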

### *Built-in functions:*
Python has some built-in functions and some of them are useful for basic numeric processing. Examples are:
```pow()```, ```abs()```, ```round()```, ```int()```, ```hex()```, ```bin()```, etc. The documentation pages of the Python language provide a comprehensive list: https://docs.python.org/3/library/functions.html
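
As a quick sketch of what some of these built-in functions return (the argument values are arbitrary examples):

```python
power = pow(2, 10)             # 1024, same as 2 ** 10
magnitude = abs(-3.5)          # 3.5
rounded = round(3.14159, 2)    # 3.14, rounded to two decimal places
as_hex = hex(255)              # '0xff', returned as a string
as_bin = bin(5)                # '0b101', returned as a string
print(power, magnitude, rounded, as_hex, as_bin)
```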

### *Utility modules:*
The packages (modules) ```random``` and ```math``` provide further functions useful for mathematical operations. The documentation pages of the Python language provide a comprehensive overview of the functions that come with the math module: https://docs.python.org/3/library/math.html 

Such modules have to be imported first, and then functions in that module can be accessed by combining the module name and the function name with a literal ```.``` (similar to the example above using the ```math``` function ```floor()```):

In [331]:
import math
math.floor(3.14)

3

The ```math``` module contains more advanced numeric tools as functions. Conveniently, the math module comes also with some mathematical constants and trigonometric functions:

In [332]:
math.sqrt(99)

9.9498743710662

In [333]:
math.pi, math.e    # returns the mathematical constants pi and Euler's number e

(3.141592653589793, 2.718281828459045)

In [334]:
math.sin(math.pi/2)

1.0

After importing the ```random``` module, you can perform random-number generation ...

In [335]:
import random
random.random()

0.7900283273216812

... and random selections (here, from a Python *list* coded in square brackets - an object type to be introduced later in this course module):

In [336]:
random.choice([1,2,3,4]) # choice([L]) chooses a random element from L

1
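
Because the results are random, your output will differ from the values shown above. If reproducible numbers are needed, the generator can be given a fixed starting point with ```random.seed()```. A minimal sketch, with an arbitrarily chosen seed value:

```python
import random

random.seed(42)                    # fix the starting point of the generator
first_value = random.random()

random.seed(42)                    # same seed again ...
repeated_value = random.random()
print(first_value == repeated_value)   # ... gives the same number: True

die = random.randint(1, 6)         # random integer between 1 and 6 (both included)
print(die)
```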

Go ahead and use the following code cell to try some of the functions and modules in the examples and/or links above (but be aware that some of the functions listed in the links require more advanced object types that we haven't discussed yet).

In [337]:
math.ceil(3.14)      # ceil(x) returns the smallest integer >= x.

4

And of course, you can perform all of the numerical operations discussed and listed above with variables that have been assigned numerical values.

In [338]:
a = math.pi
b = math.sin(math.pi*5/4)
print(b)

-0.7071067811865475


When using variables of numeric object type in expressions, the following has to be kept in mind for Python:

* Variables are created when they are first assigned values.
* Variables are replaced with their values when used in expressions.
* Variables must be assigned before they can be used in expressions.
* Variables refer to objects and are never declared ahead of time.
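
These rules can be illustrated with a short sketch. The name ```undefined_name``` is hypothetical and deliberately never assigned; the ```try```/```except``` construct only catches the resulting error so that the cell keeps running.

```python
radius = 2.0                      # the variable is created by this assignment
area = 3.14159 * radius ** 2      # radius is replaced by its value 2.0
print(area)

try:
    print(undefined_name)         # hypothetical name that was never assigned
except NameError as error:
    message = str(error)
    print('NameError:', message)
```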

Now you have gained the most important knowledge to use and process variables of numeric object type in Python. For even more complex numerical operations, especially involving data tables, one has to refer to separate, external Python packages. We will discuss modules in general and external Python packages in particular during a later course module. 

## Python HELP ?!

If you ever wonder what a function's function is without starting any literature or internet search, you may always consult the very useful built-in function ```help()```, through which you can request the manual entry for any function:

In [339]:
help(abs)

Help on built-in function abs in module builtins:

abs(x, /)
    Return the absolute value of the argument.



The returned text delivers information about the syntax and semantics of the function. This also works for functions of imported modules:

In [340]:
help(math.ceil)

Help on built-in function ceil in module math:

ceil(x, /)
    Return the ceiling of x as an Integral.
    
    This is the smallest integer >= x.



---
<a id='boolean'></a>
# B. Boolean Types: Truth Values, Comparisons & Tests

Python's Boolean type and its operators are a bit different from their counterparts in languages like C. In Python, the Boolean type, ```bool```, is numeric in nature because its two values, ```True``` and ```False```, are basically custom versions of 1 and 0. The Boolean values ```True``` and ```False``` are also treated as numeric *constants* in Python (see Table 1), and their Boolean object type (```bool```) is actually a subtype (subclass) of integers (```int```). 

In [341]:
type(True)

bool

Let's look at some examples to understand how Boolean types and their operators function in Python.

### Boolean Truth Values
In Python all objects have an inherent *Boolean* true or false value. 
We can define:

* Any nonzero number or nonempty object is true.
* Zero numbers, empty objects, and a special object ```None``` are considered false.

The built-in function ```bool()```, which tests the Boolean value of an argument, is available to request this inherent value for any variable. For example:

In [342]:
a = 0
b = None
c = 10.0
bool(a), bool(b), bool(c)

(False, False, True)
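
The same rule applies to strings and lists. A brief sketch; note in particular that a nonempty string is true regardless of its content:

```python
print(bool(''), bool('text'))      # False True: empty vs nonempty string
print(bool([]), bool([0]))         # False True: empty vs nonempty list
print(bool(0.0), bool(-1))         # False True: zero vs nonzero number
print(bool('0'), bool('False'))    # True True: nonempty strings are always true
```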

Because of Python's customization of the Boolean type, the output of Boolean expressions typed at the interactive prompt prints as the words ```True``` and ```False``` instead of the older and less obvious ```1``` and ```0```. Most programmers had been assigning ```True``` and ```False``` to ```1``` and ```0``` anyway. The ```bool``` type simply makes this standard. Its implementation can lead to curious results, though. Because ```True``` is just the integer ```1``` with a custom display format, ```True + 4``` yields integer ```5``` in Python!

In [343]:
True + 4

5

By the way, very much like the Boolean values ```True``` and ```False```, the value ```None``` is also a built-in constant. However, the ```None``` value is special, as it basically sets a variable to an empty value (much like a ```NULL``` pointer in C) and it has its own separate and unique object type:

In [344]:
type(None), type(True)

(NoneType, bool)

See the top of this Python documentation page for explanations of the built-in constants: https://docs.python.org/3/library/constants.html

### Comparisons & Equality Tests
Comparisons and equality tests also return ```True``` or ```False```. Range comparisons can be performed using the expression operators ```<```, ```>```, ```>=```, ```<=```; and equality tests using the expression operators ```==```, ```!=```. For example:

In [345]:
a < c, a==c, b!=c

(True, False, True)

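Notice how mixed types are allowed in numeric comparisons. In the first two tests above, Python compares an integer with a floating-point number. The third test checks a number and the ```None``` object for inequality, which works for objects of any type; a range comparison such as ```b < c``` would instead raise a ```TypeError```, because ```None``` cannot be ordered relative to a number.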

### Boolean Tests
Boolean tests use the logical operators ```and``` and ```or```, which return a true or false operand object. Such Boolean operators combine the results of other tests in richer ways to produce new truth values. For that, also review the operator precedence ([section on operator precedence in the Python documentation](https://docs.python.org/3/reference/expressions.html)).
More formally, there are three Boolean expression operators in Python, which are typed out as words (in contrast to other languages):

* ```X and Y``` Is true if both ```X``` and ```Y``` are true
* ```X or Y``` Is true if either ```X``` or ```Y``` is true
* ```not X``` Is true if ```X``` is false (the expression returns ```True``` or ```False```)

Here, ```X``` and ```Y``` may be any truth value, or any expression that returns a truth value (e.g., an equality test, range comparison, and so on).

Keep in mind that the Boolean ```and``` and ```or``` operators return a true or false object, not the values ```True``` or ```False```. Let's look at a few examples to see how this works. Compare the following comparisons:

In [346]:
1 < 2, 3 < 1

(True, False)

... with the output of the following Boolean tests:

In [347]:
1 or 2, 3 or 1

(1, 3)

In [348]:
None or 3

3

In [349]:
0 and 3

0

You can see that the ```and``` and ```or``` operators always return an object: either the object on the *left* side of the operator or the object on the *right*. If we test their results using the built-in function ```bool()```, they will be as expected (remember, every object is inherently true or false), but we won't get back a simple ```True``` or ```False```.

Furthermore, Boolean ```or``` tests are performed as so-called *short-circuit evaluations*. This means the interpreter evaluates the operand objects from left to right. Once it finds the first true operand, it terminates (short-circuits) the evaluation of the rest of the expression and returns that operand. After the first true operand has been found, the values of further operands in the expression cannot change the outcome of an ```or``` test: true ```or``` anything is always true.

Similarly, the evaluation of Python ```and``` tests stops as soon as the result is known. In this case, Python evaluates the operands from left to right and stops at the first false operand, because it determines the result: false ```and``` anything is always false.

The concept of *short-circuit evaluation* has to be known in order to predict the exact output of a Boolean test. Below are some examples to study:

In [350]:
True or 20  # Evaluation stops after first True object: result is True

True

In [351]:
10 or 20    # Evaluation stops after first non-zero object: result is 10

10

In [352]:
False and 20 # Evaluation stops after first False: result is False

False

In [353]:
10 and False # Evaluation stops after first False: result is False

False

In [354]:
10 and 20   # Evaluation continues until last object: result is 20
            # (no zero or false object)

20

In [355]:
10 and 20 and 30  # Evaluation continues until last object: result is 30

30
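
That the remaining operands are really skipped can be made visible with a function that reports when it is called. The helper ```noisy()``` below is hypothetical and only serves this illustration (functions are introduced later in the course).

```python
calls = []

def noisy(value):
    """Hypothetical helper: record that it was called, then return value."""
    calls.append(value)
    return value

result_or = noisy(10) or noisy(20)     # stops after 10, noisy(20) never runs
result_and = noisy(0) and noisy(20)    # stops after 0, noisy(20) never runs
print(result_or, result_and, calls)    # 10 0 [10, 0]

# Common idiom: fall back to a default if the left operand is false (e.g. empty)
station = '' or 'unknown station'
print(station)
```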

 

### Chained Comparisons

In addition to that, Python allows us to chain multiple comparisons together. Chained comparisons are a sort of shorthand for larger Boolean expressions. This allows us to perform range tests. For instance, the expression ```(a < b < c)``` tests whether ```b``` is between ```a``` and ```c```; it is equivalent to the Boolean test ```(a < b and b < c)```. But the former is easier on the eyes (and the keyboard).
For example:

In [356]:
a = 20
b = 40
c = 60

Now compare:

In [357]:
a < b < c

True

with:

In [358]:
a < b and b < c

True

You can build even longer chains or add comparisons into the chained tests. 

In [359]:
1 < 2 < 3 < 4.0 < 5

True

But the resulting expressions can become nonintuitive, unless you evaluate them the way Python does. The following, for example, is false just because 1 is not equal to 2:

In [360]:
1 == 2 < 3    # Same as 1 == 2 and 2 < 3 (not same as False < 3)

False

In this example, Python does not compare the ```1 == 2``` expression's ```False``` result to 3. This would technically mean the same as ```0 < 3```, which would be ```True```.
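
The difference becomes visible when parentheses force the other reading. A small sketch with illustrative variable names:

```python
chained = 1 == 2 < 3             # evaluated as (1 == 2) and (2 < 3)
spelled_out = 1 == 2 and 2 < 3   # the same test written out
grouped = (1 == 2) < 3           # parentheses force False < 3, i.e. 0 < 3
print(chained, spelled_out, grouped)   # False False True
```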

### Identity Operators

Lastly, identity operators compare the memory locations of two objects. There are two identity operators: ```is``` and ```is not```.

* ```is``` evaluates to true if the variables on either side of the operator point to the same object and false otherwise.
* ```is not``` evaluates to false if the variables on either side of the operator point to the same object and true otherwise.

For example, remember from the last notebook what we have learned about how variable names reference objects in Python? From that, it becomes obvious that the following identity test has to be true:

In [361]:
a = 3
b = a
a is b

True

With identity tests, we can also show that the Boolean "number" ```True``` and the integer number ```1``` have the same value (both are basically an integer number ```1```), but are not the same object:

In [362]:
True == 1    # Same value

True

In [363]:
True is 1    # But a different object

False
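
The distinction between equal value and same object is easiest to see with lists (introduced in Part D). The following sketch uses illustrative names; it also shows ```is None```, a common use of the identity operator.

```python
first = [1, 2, 3]
alias = first            # second name for the same list object
twin = [1, 2, 3]         # a new list object with equal content

print(first == twin, first is twin)     # True False
print(first == alias, first is alias)   # True True

value = None
print(value is None)     # identity test against the None object: True
```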

### Boolean Types: Summary
So let's summarize briefly, what we have discussed about Boolean types and operators:

* Any nonzero number or nonempty object is true.
* Zero numbers, empty objects, and a special object ```None``` are considered false.
* Comparisons and equality tests are applied recursively to data structures.
* Comparisons, equality tests and identity operators return ```True``` or ```False``` (which are custom versions of 1 and 0)
* Boolean ```and``` and ```or``` operators return a true or false operand object.
* Boolean operators stop evaluating ("short circuit") as soon as a result is known.

Refer back to this website to find a comprehensive list of expression operators, including those for comparisons and equality test as well as logical operators and identity operators: https://www.tutorialspoint.com/python/python_basic_operators.htm

---
<a id='strings'></a>
# C. Strings in Python



Strings are used to record both textual information (your name, for instance) and arbitrary collections of bytes (such as an image file's contents). They are our first example of what we call a ***sequence*** in Python - **a positionally ordered collection of other objects**. Sequences maintain a **left-to-right order** among the items they contain: their items are stored and fetched by their relative positions. Strictly speaking, strings are sequences of one-character strings; other, more general sequence types include *lists* and *tuples*, covered later (Lutz, 2013). But let's first begin with the syntax for generating strings.

### String Literals

Python strings are easy to use and several syntax forms can be used to generate them. For example, we can assign a string such as "```knight's```" to a variable in different ways:

In [364]:
S1 = 'knight"s'       # single quotes
S2 = "knight's"       # double quotes
S3 = '''knights'''    # triple quotes
S4 = '\nknight\'s'    # escape sequence
print(S1 , S2 , S3 , S4, )

knight"s knight's knights 
knight's


Single- and double-quote characters are interchangeable: a string literal can be enclosed in either. You can also embed one in the other and vice versa, as seen in the examples above. Triple quotes are an alternative for coding entire *block strings*. That is a syntactic convenience for coding multiline text data.

Escape sequences allow embedding special characters in strings that cannot easily be typed on a keyboard. In the string literal, one backslash ```\``` precedes a character. The character pair is then replaced by a single character to be stored in the string:

* ```\n``` stores a newline
* ```\t``` stores a horizontal tab
* ```\v``` stores a vertical tab
* ```\\```, ```\'```, ```\"``` for special characters like backslash, single quotes or double quotes 

The ```\\``` stores one ```\``` in the string. The function ```print()``` shows the characters that the escape sequences stand for (see code cell above). However, the interactive echo of the interpreter keeps showing the special characters as escapes:

In [365]:
S4

"\nknight's"

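That an escape sequence is stored as a single character can be checked with ```len()```. The sketch below also uses a raw string (prefix ```r```), which is not covered in this notebook and is shown here only for contrast: it switches off escape processing.

```python
escaped = 'a\nb'       # backslash-n is stored as one newline character
raw = r'a\nb'          # raw string: backslash and n stay two characters
print(len(escaped), len(raw))   # 3 4
print(escaped)         # printed on two lines
print(raw)             # printed as a\nb

block = """line one
line two"""            # triple quotes store the line break as \n
print(block == 'line one\nline two')   # True
```
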
### String Properties
Because strings are sequences, they support operations that assume a positional ordering among their items. For example, one can request the length of a string with the built-in function ```len()```. One can also select and print out certain items of a string, or in other words, fetch its components with *indexing* expressions.

In [366]:
len(S1)   # len returns the length of a string sequence

8

In [367]:
S1[0]     # returns the first item from the left

'k'

In [368]:
S1[1]     # returns the second item from the left

'n'

In Python, indexing is coded as offsets from the front. The first item is at index 0, the second at index 1 and so on. In addition to that, strings allow the following typical sequence operations:

* slicing: general form of indexing - extract an entire section (slice) of a string in a single step
* concatenating: joining two strings into a new string
* repeating: making a new string by repeating another

Here some examples:

In [369]:
S1[1:4]    # slicing: items at offsets 1 up to (not including) 4

'nig'

In [370]:
S2 + S3    # concatenating two strings

"knight'sknights"

In [371]:
S3*3       # Repetition

'knightsknightsknights'

Index operations will be discussed in more detail in the upcoming reading material.
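
As a brief preview, and only as a sketch, here are a few more indexing and slicing forms applied to an example string (the variable ```word``` is introduced just for this illustration):

```python
word = 'knights'
print(word[-1])      # 's': negative offsets count from the right
print(word[:3])      # 'kni': from the start up to (not including) offset 3
print(word[3:])      # 'ghts': from offset 3 to the end
print(word[:3] + word[3:] == word)   # True: the two slices rebuild the string
```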

Another property of strings in Python is *immutability*. In the previous notebook you have learned about the concepts of mutability and immutability. Strings being immutable means they cannot be changed in place after they are created: operations performed on strings cannot overwrite the values of a string object. But you can always build a new one and assign it to the same name. 

To illustrate that, let's look at two examples. Immutability means that you cannot change a single item of a string like this:

In [372]:
# S1[1]='y'  #un-comment this line to get the error!

Instead, we get a ```TypeError```, stating that string objects do not support item assignment! But we can run expressions to make new objects and reference them to the same name:

In [373]:
S1 = 'y' + S1
print(S1)

yknight"s


In this case, the old object and its reference are then deleted. In fact, Python cleans up old objects as you go. You will learn more about that in the upcoming reading material.
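
Both cases can be combined in one cell without stopping the notebook, if the error is caught with ```try```/```except``` (a construct covered later in the course). This is a minimal sketch with illustrative variable names:

```python
text = 'knight'
try:
    text[0] = 'K'                 # item assignment is not supported for strings
except TypeError as error:
    print('TypeError:', error)

changed = 'K' + text[1:]          # build a new string instead
print(text, changed)              # knight Knight
```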

### Formatted output of strings using ```print()```

You have already used ```print()``` to quickly print variables to the screen. The function, however, can be fed with syntax that formats the output of strings and numbers. For that, two different flavors are possible. 

The original technique available since Python's beginning, which is based on the C language and is used widely:

* String formatting expressions: ```'...%s...' % (values)```

A newer technique added since Python 2.6:

* String formatting method calls: ```'...{}...'.format(values)```

The second method is syntactically a bit more complex, especially since it uses object-oriented syntax, which we will discuss at a later point in the course. However, it has a clear advantage, as type codes are optional and different object types are handled automatically. 

Both flavors can be used without (as interactive echo of the interpreter) and with the ```print()``` function. Below you can find a list of type codes useful for the first option (string formatting expressions). The list is not complete, but contains all codes relevant for this course.

Table 2: *Selected string Formatting Type Codes.*

| Code           | Meaning / Object Type      
| :-: | :- |
| ```%s```          | String   
| ```%c```          | Character (int or str)   
| ```%d```          | Decimal (base-10 integer)
| ```%i```          | Integer
| ```%e```          | Floating-point with exponent, lower case
| ```%E```          | Same as ```e``` but uses upper case ```E```
| ```%f```          | Floating-point decimal   
| ```%```           | Literal % (coded as %%) 

In the following examples, both formatting techniques are applied. Try to alter them and learn how they work:

In [374]:
print("The %s robe is green!" % S2)          # formatting expression
print('The {} robe is green!'.format(S2))    # formatting method calls

The knight's robe is green!
The knight's robe is green!


In [375]:
knifes = 2
print("The %s has %i knifes in his hand." % (S2,knifes))
print("The {} has {} knifes in his hand.".format(S2,knifes))

The knight's has 2 knifes in his hand.
The knight's has 2 knifes in his hand.


The precision of floating-point numbers can be controlled in the second formatting method by entering parameters into the curly braces, for example in the following way if you want to print three digits after the decimal point. The positions of the variable replacements can also be switched: 

In [376]:
money = 2.222222
print("The {1:.3f} cents in the {0} pockets were stolen.".format(S2,money))
print("The {0:.3} cents in the {1:0.3} pockets were stolen.".format(S2,money))

The 2.222 cents in the knight's pockets were stolen.
The kni cents in the 2.22 pockets were stolen.


If you like to get into the details of the very flexible string formatting using method calls, check the following pages:
* https://www.digitalocean.com/community/tutorials/how-to-use-string-formatters-in-python-3 
* https://pyformat.info/
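
Both formatting flavors also accept a field width, which is handy for aligning columns of numbers or names. The following sketch uses arbitrary example values; the trailing ```|``` only marks the end of each field.

```python
value = 3.14159
old_style = '%8.3f|' % value            # width 8, three digits after the point
new_style = '{:8.3f}|'.format(value)    # same specification in braces
print(old_style)
print(new_style)
print(old_style == new_style)           # True

left = '{:<10}|'.format('north')        # left-aligned in a field of width 10
right = '{:>10}|'.format('north')       # right-aligned in a field of width 10
print(left)
print(right)
```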

### Type Specific Operations and Methods

Lastly, I would like to provide an overview of type specific operations for strings in Python.

Table 3: *String Type Specific Operations (after Lutz, 2013, Table 7-1).*

| Operation                                     | Interpretation      
| :-----------                                   | :----------- |
| ```S1 + S2```                                 | Concatenate   
| ```S1 * 3```                                  | Repeat    
| ```S[i]```                                    | Indexing   
| ```S[i:j]```                                  | Slicing 
| ```len(S)```                                  | Length 
| ```"The sum of 1 + 2 is %i" % (1+2)```        | String formatting expression 
| ```"The sum of 1 + 2 is {0}".format(1+2)```   | String formatting method calls
| ```.find('pa')```                            | String methods: search 
| ```.strip()```                               | Remove all leading and trailing whitespace
| ```.rstrip()```                              | Remove trailing whitespace
| ```.replace('pa','xx')```                    | Replacement
| ```.split(',')```                            | Split on delimiter
| ```.splitlines()```                          | Split string at all ```\n``` and return a list of the lines
| ```.lower()```                               | Case conversion (to lower case)
| ```.upper()```                               | Case conversion (to upper case)
| ```.endswith('spam')```                      | End test

The first seven entries have been addressed in this notebook. All remaining entries are so-called methods. Methods are specific functions that are applied with the following syntax: ```stringname.methodname(arguments)```. The methods in the table are specifically designed to handle strings. These methods may appear to alter the content of strings. However, they do not actually change the original strings but create new strings as results - because strings are immutable.

Investigate and practice the functionality of these methods. You can use the examples below, the Python ```help()``` function or search them in the Python documentation: https://docs.python.org/3/library/stdtypes.html (scroll down to  the section "String Methods"). Alternatively, study the following external Jupyter Notebook, which discusses the most important string methods: https://www.digitalocean.com/community/tutorials/an-introduction-to-string-functions-in-python-3

In [377]:
S = 'Hello World ! '   # define a string S

In [378]:
S.find('World')    # find the substring 'World'

6

In [379]:
S.replace('World','Class')        # replace the substring 'World' with 'Class'

'Hello Class ! '

In [380]:
S.rstrip(), S.lower(), S.upper()  # check what happened to the spaces and the letters

('Hello World !', 'hello world ! ', 'HELLO WORLD ! ')

In [381]:
S.split(' ')       # splits the string at a given delimiter (here space)

['Hello', 'World', '!', '']

In [382]:
S                  # even after the performed operations, the immutable string S remains unchanged

'Hello World ! '
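
Since every method returns a new string, method calls can be chained from left to right. The sketch below applies some of the remaining table entries to a made-up line of text; the variable names and the text are illustrative.

```python
line = '  Helsinki,Kaivopuisto,Mareographs \n'
clean = line.strip()                 # leading and trailing whitespace removed
fields = clean.lower().split(',')    # methods can be chained left to right
print(fields)                        # ['helsinki', 'kaivopuisto', 'mareographs']

ends_ok = clean.endswith('graphs')   # True
lines = 'a\nb\nc'.splitlines()       # ['a', 'b', 'c']
print(ends_ok, lines)
print(line == '  Helsinki,Kaivopuisto,Mareographs \n')   # True: line is unchanged
```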

In [383]:
help(str.find)     # request help for a method

Help on method_descriptor:

find(...)
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.



Use the code cells below, to practice the examples.

In [384]:
# add your code here



In [385]:
# add your code here



In [386]:
# add your code here



In [387]:
# add your code here



---
<a id='lists'></a>
# D. Lists and Sequences


# D.1 Let's start with some data

We saw a bit about variables and their values in the last lesson, and we continue here with some variables related to [Observation stations from the Finnish Meteorological Institute (FMI)](http://en.ilmatieteenlaitos.fi/observation-stations). For each station, a number of pieces of information are given, including the name of the station, an FMI station ID number (FMISID), its latitude, its longitude, and the station type. We can store this information and some additional information for a given station in Python as follows:

In [388]:
stationName = 'Helsinki Kaivopuisto'

In [389]:
stationID = 132310

In [390]:
stationLat = 60.15

In [391]:
stationLong = 24.96

In [392]:
stationType = 'Mareographs'

Here we have 5 values assigned to variables related to a single observation station. Each variable has a unique name and they can store different types of data: numbers and strings.

### Reminder: Data types and their compatibility

We can explore the different types of data stored in variables using the `type()` function.

In [393]:
type(stationName)

str

In [394]:
type(stationID)

int

In [395]:
type(stationLat)

float

As expected, we see that the `stationName` is a character string, the `stationID` is an integer, and the `stationLat` is a floating point number.

<div class="alert alert-info">

**Note**

We haven't mentioned it explicitly yet, but the variable names in this lesson use another popular variable format called *camelCase*.
In camelCase the words in the variable name are not separated by underscores or any other character, but rather the first letter is capitalized for all words in the name other than the first one.

</div>

<div class="alert alert-info">

**Note**

Remember, the data types are important because some are not compatible with one another.

</div>

In [396]:
# stationName + stationID  # uncomment to get Error

Here we get a `TypeError` because Python does not know how to combine a string of characters (`stationName`) with an integer value (`stationID`).

### Converting data from one type to another

It is not the case that things like the `stationName` and `stationID` cannot be combined at all, but in order to combine a character string with a number we need to perform a data type conversion to make them compatible. For example, we could convert the `stationID` integer value into a character string using the `str()` function.

In [397]:
stationIDStr = str(stationID)

In [398]:
type(stationIDStr)

str

In [399]:
print(stationIDStr)

132310


In [400]:
stationIDlist = list(stationIDStr)
print(stationIDlist)

['1', '3', '2', '3', '1', '0']


As you can see, `str()` converts a numerical value into a character string with the same digits as before, and `list()` splits that string into a list of its individual characters.

<div class="alert alert-info">

**Note**

Similar to using `str()` to convert numbers to character strings, `int()` can be used to convert strings or floating point numbers to integers and `float()` can be used to convert strings or integers to floating point numbers.

</div>
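
A short sketch of these conversions is given below. The values and variable names (`lat_text`, `lat_whole`, and so on) are assumptions made for illustration and are not part of the FMI example.

```python
lat_text = '60.15'                 # hypothetical input, e.g. read from a text file

lat = float(lat_text)              # string -> float
lat_whole = int(lat)               # float -> int, the decimals are cut off (no rounding)
station_id = int('132310')         # string -> int
id_as_float = float(station_id)    # int -> float

print(lat, lat_whole, station_id, id_as_float)

# note: int('60.15') raises a ValueError, because the string does not look
# like an integer; convert to float first if decimals are possible
```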

### Combining text and numbers

Although most mathematical operations operate on numerical values, a common way to combine character strings is using the addition operator `+`.

In [401]:
stationNameAndID = stationName + ": " + str(stationID)

In [402]:
print(stationNameAndID)

Helsinki Kaivopuisto: 132310


Note that here we are converting `stationID` to a character string using the `str()` function within the assignment to the variable `stationNameAndID`. Alternatively, we could have simply added `stationName` and `stationIDStr`.
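
As an alternative that is not used in the original lesson, formatted string literals (f-strings, available since Python 3.6) insert values into text without an explicit call to `str()`. The sketch below repeats the station variables so that it runs on its own; `label` and `coords` are names chosen for illustration.

```python
stationName = 'Helsinki Kaivopuisto'
stationID = 132310
stationLat = 60.15

# expressions inside {} are converted to text automatically
label = f"{stationName}: {stationID}"
print(label)

# a format specification after the colon controls the number of decimals
coords = f"{stationName} is at latitude {stationLat:.2f}"
print(coords)
```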

# D.2 Lists and Indices

Above we have seen a bit of data related to one of several FMI observation stations in the Helsinki area. Rather than having individual variables for each of those stations, we can store many related values in a *collection*. The simplest type of collection in Python is a **list**. Similar to strings, lists are sequences. However, while strings are not mutable, lists are. Hence, the content of lists is accessible through indices and can also be altered and managed through them. Now, let's use lists for storing the FMI station data.

### Creating a list

Let’s first create a list of selected stationName values.

In [403]:
stationNames = ['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']

In [404]:
print(stationNames)

['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


In [405]:
type(stationNames)

list

Here we have a list of 4 `stationName` values in a list called `stationNames`. As you can see, the `type()` function recognizes this as a list. Lists can be created using the square brackets (`[` and `]`), with commas separating the values in the list.

<div class="alert alert-info">

**Note**

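Similar to using `str()`, `int()` and `float()`, the function `list()` can be used to convert other data types into a list. This only works for objects that contain a sequence of items (so-called iterables), such as strings. A single integer or floating point number cannot be converted with `list()`; it has to be placed in square brackets instead, for example `[132310]`.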

</div>
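
The following sketch illustrates what `list()` accepts. It works on objects that hold a sequence of items, while a single number raises a `TypeError`. The `try`/`except` statement is used here only to catch the error so that the cell keeps running; it has not been introduced in this lesson.

```python
letters = list('FMI')                 # string -> list of characters
from_range = list(range(3))           # range -> list of integers
from_tuple = list((60.15, 24.96))     # tuple -> list

print(letters, from_range, from_tuple)

# a single number is not a sequence of items, so list() refuses it
try:
    list(132310)
except TypeError as err:
    print('TypeError:', err)

single = [132310]                     # use square brackets for a one-item list
print(single)
```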

### Index values

To access an individual value in the list we need to use an **index value**. An index value is a number that refers to a given position in the list. Let’s check out the first value in our list as an example:

In [406]:
print(stationNames[1])

Helsinki Kaisaniemi


Wait, what? This is the second value in the list we’ve created, what is wrong? As it turns out, Python (like many other programming languages) numbers the values stored in collections starting with the index value 0. Thus, to get the value for the first item in the list, we must use index 0.

In [407]:
print(stationNames[0])

Helsinki Harmaja


OK, that makes sense, but it may take some getting used to...

### A useful analogy - Bill the vending machine

As it turns out, index values are extremely useful, very commonly used in many programming languages, yet often a point of confusion for new programmers. Thus, we need to have a trick for remembering what an index value is and how they are used. For this, we need to be introduced to Bill.

<img src="./Image_BillTheVendingMachine.png" alt="Illustrating indexing: Bill the vending machine." title="Bill the vending machine" width="600" />

Figure 1: *Bill, the vending machine.*


As you can see, Bill is a vending machine that contains 6 items. Like Python lists, the list of items available from Bill starts at 0 and increases in increments of 1.

The way Bill works is that you insert your money, then select the location of the item you wish to receive. In an analogy to Python, we could say Bill is simply a list of food items and the buttons you push to get them are the index values. For example, if you would like to buy a taco from Bill, you would push button `3`. An equivalent operation in Python could simply be

```python
print(Bill[3])
Taco
```

### Number of items in a list

We can find the length of a list using the `len()` function.

In [408]:
len(stationNames)

4

Just as expected, there are 4 values in our list and `len(stationNames)` returns a value of `4`.

### Index value tips

If we know the length of the list, we can now use it to find the value of the last item in the list, right?

In [409]:
# print(stationNames[4]) # uncomment the print statement to test this

What, an `IndexError`?!? That’s right, since our list starts with index 0 and has 4 values, the index of the last item in the list is `len(stationNames) - 1`. That isn’t ideal, but fortunately there’s a nice trick in Python to find the last item in a list.

In [410]:
print(stationNames)

['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


In [411]:
print(stationNames[-1])

Helsinki Kumpula


In [412]:
print(stationNames[-4])

Helsinki Harmaja


Yes, in Python you can go backwards through lists by using negative index values. Index `-1` gives the last value in the list and index `-len(stationNames)` gives the first. Of course, you still need to keep the index values within their ranges.

In [413]:
# print(stationNames[-5])  # uncomment the print statement to test this
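
The valid index range can be summarized in a short sketch. The list is repeated here so that the cell runs on its own, and `try`/`except` (not yet introduced in this lesson) is used only to catch the error instead of stopping the notebook.

```python
stationNames = ['Helsinki Harmaja', 'Helsinki Kaisaniemi',
                'Helsinki Kaivopuisto', 'Helsinki Kumpula']
n = len(stationNames)

# valid positive indices run from 0 to n - 1
last = stationNames[n - 1]
print(last == stationNames[-1])              # same item as index -1

# valid negative indices run from -1 to -n
print(stationNames[0] == stationNames[-n])   # same item as index 0

# anything outside these ranges raises an IndexError
try:
    stationNames[n]
except IndexError as err:
    message = str(err)
    print('IndexError:', message)
```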

### Modifying list values

Another nice feature of lists is that they are *mutable*, meaning that the values in a list that has been defined can be modified. Consider a list of the observation station types corresponding to the station names in the `stationNames` list.

In [414]:
stationTypes = ['Weather stations', 'Weather stations', 'Weather stations', 'Weather stations']
print(stationTypes)

['Weather stations', 'Weather stations', 'Weather stations', 'Weather stations']


Now as we saw before, the station type for Helsinki Kaivopuisto should be ‘Mareographs’, not ‘Weather stations’. Fortunately, this is an easy fix. We simply replace the value at the corresponding location in the list with the correct one.

In [415]:
stationTypes[2] = 'Mareographs'
print(stationTypes)

['Weather stations', 'Weather stations', 'Mareographs', 'Weather stations']


### Data types in lists

Lists can also store more than one type of data. Let’s consider that in addition to having a list of each station name, FMISID, latitude, etc. we would like to have a list of all of the values for station ‘Helsinki Kaivopuisto’.

In [416]:
stationHelKaivo = [stationName, stationID, stationLat, stationLong, stationType]
print(stationHelKaivo)

['Helsinki Kaivopuisto', 132310, 60.15, 24.96, 'Mareographs']


Here we have one list with 3 different types of data in it. We can confirm this using the `type()` function.

In [417]:
type(stationHelKaivo)

list

In [418]:
type(stationHelKaivo[0])    # The station name

str

In [419]:
type(stationHelKaivo[1])    # The FMISID

int

In [420]:
type(stationHelKaivo[2])    # The station latitude

float

### Adding and removing values from lists

Finally, we can add and remove values from lists to change their lengths. Let’s consider that we no longer want to include the first value in the `stationNames` list.

In [421]:
print(stationNames)

['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


In [422]:
del stationNames[0]

In [423]:
print(stationNames)

['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']


`del` allows values in lists to be removed. It can also be used to delete entire variables in Python. If we would instead like to add a few stations to the `stationNames` list, we can do so as follows.

In [424]:
stationNames.append('Helsinki Lighthouse')
stationNames.append('Helsinki Malmi airfield')

In [425]:
print(stationNames)

['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula', 'Helsinki Lighthouse', 'Helsinki Malmi airfield']


As you can see, we add values one at a time using `stationNames.append()`. `list.append()` is called a method in Python, which is a function that works for a given data type (a list in this case). We’ll see a bit more about these below.

### The concept of objects

Python is one of a number of computer programming languages that are called ‘object-oriented languages’, and we will focus on this topic later in the semester. It may take quite some time to understand what this means, but in simple words, we can consider variables as ‘objects’ that contain both:

- data known as **attributes**, and 
- a specific set of functions known as **methods**. 

The concept of ‘objects’ might be much easier to understand from the example, below.

### A (bad) example of methods

Let’s consider our list `stationNames`. As we know, we already have data in the list `stationNames`, and we can modify that data using built-in methods such as `stationNames.append()`. In this case, the method `append()` is something that exists for lists, but not for other data types. It is intuitive that you might like to add (or append) things to a list, but perhaps it does not make sense to append to other data types.

In [426]:
stationNameLength  = len(stationNames)

In [427]:
print(stationNameLength)

5


In [428]:
type(stationNameLength)

int

In [430]:
# stationNameLength.append(1)  # uncomment to test

Here we get an `AttributeError` because there is no method built in to the `int` data type to append to `int` data. While `append()` makes sense for `list` data, it is not sensible for `int` data, which is the reason no such method exists for `int` data.
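
Whether a data type offers a given method can also be checked without triggering an error. The sketch below uses the built-in functions `hasattr()` and `dir()`; it is an illustration added here, not part of the original example.

```python
# hasattr() reports whether an object or type has an attribute of that name
list_can_append = hasattr(list, 'append')
int_can_append = hasattr(int, 'append')
print(list_can_append, int_can_append)

# dir() returns the names of all attributes and methods of a type
print('append' in dir(list))
print('upper' in dir(str))
```

Calling `dir(list)` on its own shows the full list of names; those without leading underscores are the regular list methods.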

### Some other useful list methods

With lists we can do a number of useful things, such as count the number of times a value occurs in a list or where it occurs.

In [431]:
stationNames.count('Helsinki Kumpula')    
# The count method counts the number of occurrences of a value

1

In [432]:
stationNames.index('Helsinki Kumpula')
# The index method gives the index value of an item in a list

2

The good news here is that our selected station name is only in the list once. Should we need to modify it for some reason, we also now know where it is in the list (index `2`).

### Reversing a list

There are two other common methods for lists that we need to see. First, there is the `.reverse()` method, used to reverse the order of items in a list.

In [433]:
stationNames.reverse()

In [434]:
print(stationNames)

['Helsinki Malmi airfield', 'Helsinki Lighthouse', 'Helsinki Kumpula', 'Helsinki Kaivopuisto', 'Helsinki Kaisaniemi']


Yay, it works!

<div class="alert alert-warning">

**Caution**

A common mistake when reversing or sorting lists is to do something like `stationNames = stationNames.reverse()`. **Do not do this!** Lists are mutable and the method `.reverse()` mutates the list in place. In addition, `.reverse()` returns the value `None` (this is why there is no screen output when running `stationNames.reverse()`). If you then assign the output of `stationNames.reverse()` to `stationNames`, you will reverse the list, but then overwrite its contents with the returned value `None`. This means you’ve deleted the list contents (!).

</div>

In [None]:
print(stationNames.reverse())

In addition to the info in the warning box above, be aware that copying a list variable to a new variable name first will not save you from this mistake. Assigning a list to a new variable name does not copy the object, only the reference to the object. See the example below and think about the result:

In [None]:
print(stationNames)

In [None]:
stationNames_copy = stationNames
stationNames_copy.reverse()

print(stationNames)
print(stationNames_copy)

Reversing the copy `stationNames_copy` of the list `stationNames` reverses the object that both variable names refer to. Hence, both names return a reversed list. To make an actual duplicate of a variable, you have to generate a new object, not just a new variable name. This is especially important for lists, which have many methods that mutate the list object. To achieve that, you have to make a so-called explicit copy of the list, which works the following way:

`newlist = oldlist[:]`.

Now try this with the list `stationNames`, below.

In [None]:
stationNames_copy2 = stationNames[:]
stationNames_copy2.reverse()

print(stationNames)
print(stationNames_copy2)

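Now only the explicit copy `stationNames_copy2` has been reversed, while `stationNames` keeps its order. The two names refer to two different list objects, which are now in opposite order to each other.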
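
Whether two names refer to the same list object can be checked with the `is` operator, whereas `==` only compares the contents. The sketch below uses a shortened, hypothetical list `original` and also shows two other common ways to create an explicit copy, `list.copy()` and `list()`.

```python
original = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']

alias = original                 # new name, same object
copy_slice = original[:]         # new object (slice copy)
copy_method = original.copy()    # new object (list method)
copy_list = list(original)       # new object (list function)

print(alias is original)         # same object
print(copy_slice is original)    # different objects ...
print(copy_slice == original)    # ... with equal contents

copy_slice.reverse()             # only the copy changes
print(original[0], '|', copy_slice[0])
```

All three copies are shallow: for a list of lists, the inner lists are still shared between the original and the copy. The function `copy.deepcopy()` from the standard library module `copy` duplicates nested lists as well.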

### Sorting a list

The `.sort()` method works the same way.

In [None]:
stationNames.sort()   # Notice no output here...

In [None]:
stationNames

<div class="alert alert-info">

**Note**

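Alphabetical sorting in Python is case-sensitive: capital letters are placed before lowercase letters. In our list all station names are capitalized consistently, so `Helsinki Lighthouse` comes before `Helsinki Malmi airfield`. If the name had been written as `Helsinki lighthouse` (with a lowercase `l`), it would have been sorted after `Helsinki Malmi airfield`.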

</div>
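
The effect of upper and lower case on sorting can be tried with a small sketch. The list `names` is a hypothetical variant of the station list in which one name starts with a lowercase letter; the built-in function `sorted()` and the `key` and `reverse` arguments are not covered in the text above.

```python
names = ['Helsinki lighthouse', 'Helsinki Malmi airfield', 'Helsinki Kumpula']

# default sorting: capital letters come before lowercase letters
names.sort()
print(names)

# sorted() returns a new list and leaves the original untouched;
# key=str.lower compares the names as if they were all lowercase
ignore_case = sorted(names, key=str.lower)
print(ignore_case)

# reverse=True sorts in descending order
descending = sorted(names, reverse=True)
print(descending)
```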

### Summary of important list methods
Below is a list of important list methods. Take the time to try these methods and practice their functionality.

Table 1: *Important List Methods*

| Method          | Description |
|-----------------|-------------|
| `.append(x)`    | Add item x at the end of the list |
| `.remove(x)`    | Remove the first item that is equal to x from the list |
| `.count(x)`     | Return the number of items that are equal to x |
| `.index(x)`     | Return the index of the first item that is equal to x |
| `.reverse()`    | Reverse the order of items in a list |
| `.sort()`       | Sort items in a list in ascending order |
| `.pop([i])`     | Remove and return the item at position i (last item if i is not provided) |
| `.insert(i, x)` | Insert item x at position i |
| `zip(a, b)`     | Built-in function (not a list method): pairs up the items of lists a and b, useful for joining and separating lists of lists |

More examples and methods can be studied in this tutorial on list methods: https://www.digitalocean.com/community/tutorials/how-to-use-list-methods-in-python-3.
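
The entries of Table 1 that were not used above can be tried on a small list. The sketch below uses shortened, hypothetical station names; note that `zip()` is called as a built-in function, not as a method of a list.

```python
stations = ['Kaisaniemi', 'Kaivopuisto', 'Kumpula']

stations.insert(1, 'Harmaja')    # insert at index 1, later items move right
print(stations)

stations.remove('Kumpula')       # remove the first item equal to 'Kumpula'
print(stations)

last = stations.pop()            # remove and return the last item
first = stations.pop(0)          # remove and return the item at index 0
print(last, first, stations)

# zip() pairs up items of two lists; list() makes the result visible
names = ['Kaisaniemi', 'Kaivopuisto']
lats = [60.18, 60.15]
pairs = list(zip(names, lats))
print(pairs)
```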

In addition, any sequence operation that you have learned for strings can also be applied to lists. A great summary of available list (sequence) operations and methods can be found here:
https://www.tutorialspoint.com/python/python_lists.htm.

### Lists of lists
A list cannot only contain and mix numbers and strings, it can also contain other lists. For example, we could generate a list `databaseFIM`, which contains all the FMI information at once. Let's first add the latitudes, longitudes and station ID numbers for all stations in the list and then build the database:

In [None]:
stationLats  = [60.18, 60.15, 60.20, 60.25, 59.95]
stationLongs = [24.94, 24.96, 24.96, 25.05, 24.93]
stationIDs   = [100971, 132310, 101004, 101009, 101003]
databaseFIM  = [stationNames, stationIDs, stationTypes, stationLats, stationLongs]

However, simply printing such a nested list does not give a very clear overview, since the lists inside the list are just printed one after another:

In [None]:
print(databaseFIM)

For that, list comprehensions are a very useful coding strategy.

### List comprehensions

List comprehensions allow item-by-item operations on a sequence. A list comprehension builds a new list by running an expression on each item in a sequence, one at a time, from left to right.

<img src="./Image_ListComprehension.png" alt="Concept of List Comprehensions." title="List Comprehensions" width="400" />

Figure 2: *Concept of List Comprehensions.*

A list, or any other iterable object, provides an input sequence (in the example: `num`). Each item within the sequence is assigned, one after another, to a variable (in the example: `x`). For that, the keywords `for` and `in` are used to iterate over the sequence, just as in a for loop.

An additional, optional predicate (in the example: `x>0`) can be used to set a condition under which an item is passed on to the output expression (in the example: `x**2`).
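
Written as code, the concept from Figure 2 could look as follows. The values in `num` are an assumption made for illustration; the equivalent for loop is added only for comparison.

```python
num = [-2, -1, 0, 1, 2, 3]       # hypothetical input sequence

# output expression: x**2, variable: x, input sequence: num, predicate: x > 0
squares = [x**2 for x in num if x > 0]
print(squares)

# the same result written as an explicit for loop
squares_loop = []
for x in num:
    if x > 0:
        squares_loop.append(x**2)
print(squares_loop)
```

Both versions build the same list; the comprehension simply expresses it in a single line.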


Beginning with a simple version of this concept (without any condition statement), our FIM database can be displayed row by row:

In [None]:
[datarow for datarow in databaseFIM]

To access one element of the nested list, index references are chained one after another:

In [None]:
databaseFIM[1][1]

In [None]:
# note that we cannot get a column from the database through indexing & slicing
databaseFIM[:][1]
# this command gives you the second list in the "list of lists"
# (it does not work as we might expect from experience with Matlab matrices)

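At this point, it is important to understand that an expression like `databaseFIM[1][1]` does not directly access an item by row and column. It first selects the second list in the variable and then the second item in that list. In the same way, `databaseFIM[:][1]` first creates a copy of the whole outer list and then selects the second list in it.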

Knowing that, a column of the database (all entries of one selected station) is returned the following way:

In [None]:
[datarow[1] for datarow in databaseFIM]

Now, one might say, the database should have been structured the other way around (columns and rows inverted). And that might be a valid statement, depending on the further tasks to be solved. **The way lists are nested should be chosen wisely by the programmer.**
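
Switching between the two layouts is possible with the built-in function `zip()`. The sketch below is one possible approach, shown on a shortened database with the first three stations from the lists above; the names `by_field` and `by_station` are chosen here for illustration.

```python
names = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']
ids   = [100971, 132310, 101004]
lats  = [60.18, 60.15, 60.20]

by_field = [names, ids, lats]            # one inner list per kind of information

# the * hands each inner list to zip() as a separate argument;
# zip() then groups the first items, the second items, and so on
by_station = [list(record) for record in zip(*by_field)]

print(by_station[1])                     # all values of the second station
print(by_field[1][1] == by_station[1][1])
```

With `by_station`, the complete record of one station is available through a single index, while `by_field` is more convenient when all values of one kind (for example all latitudes) are needed.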


### Learn and practice: list comprehensions on number ranges

Browse the internet for the built-in function `range()`, study its functionality and learn how to use it to create a list of the numbers from 0 to 10. Put your code below.


In [437]:
# example for creating a list of numbers with the function range():


Now, write a list comprehension that returns only the even numbers from this list (containing numbers 0 to 10). 

In [436]:
# put your solution here

### Test your learning: Printing a nested list like a matrix
In the following example nested lists are used to build a 3x3 matrix. Then list comprehension is applied to print the matrix as well as selected items, rows and columns.

In [440]:
num = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
num

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

The variable `num` is now a list of lists and it could be treated like a matrix. The sublist `num[0]` is a list (and so are `num[1]` and `num[2]`). To view the entire matrix:

In [441]:
[print(x) for x in num ]

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]


[None, None, None]

Single matrix elements can also be accessed:

In [442]:
num[1][1]

5

To retrieve the second row, type:

In [443]:
[x for x in num[1] ]

[4, 5, 6]

How can you retrieve the second column? Try to code this in the following cell.

In [None]:
# please fill out according to example above for databaseFIM



If you would like to reflect further on list comprehensions, this page provides some useful examples to study them: https://www.digitalocean.com/community/tutorials/understanding-list-comprehensions-in-python-3