In [None]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:70% !important; }</style>"))

# Lecture 3 - Python's Data Structures, Iteration, Modules, Miscellanea *
---

### Content

1. Lists
2. Dictionaries
3. Tuples
4. Iteration (*for* loops)
5. Importing modules


\* Content in this notebook borrows material from the Python Tutorial resource found at http://www.python-course.eu/course.php.

### Learning Outcomes

At the end of this lecture, you should be able to:

* describe what data structures are  
* list three of Python's most important data structures 
* describe the differences between lists, dictionaries and tuples
* use lists, dictionaries and tuples at an introductory level
* explain the role of iterative constructs in programming
* use a *for* loop at an introductory level
* describe the purpose of modules
* import modules and use their functionality

## Python's Data Structures

Data structures are an important topic of study in computer science and an indispensable part of any program. 

A data structure is an approach to organizing data in a computer so that it can be used, stored and retrieved efficiently.

Up to this point, we have seen how ordinary variables can be used to store a single value. However, in any real-world program or even simple script, it is vital to have access to other variable types (data structures) that can hold more than one value. 

Three data structures will be considered here: lists, dictionaries and tuples. 

### Lists

Lists are sequences of values. Each element in a list is assigned a number - its **position** or **index**. The first index is zero, the second index is one, and so forth.

The list is one of the most versatile data types in Python. Items in a list need not all have the same type. Furthermore, lists can grow while a program runs.

We begin with an empty list.

In [None]:
x = []
print (x)

In [None]:
#an example of constructing a list
x = [1, 40, 33, 20]
print (x)

In [None]:
type(x)

We access elements within a list using the index or the position of a value in a list, beginning with 0, enclosed within square brackets. 

In [None]:
#access the second element in the list
print (x[1])

In [None]:
x[1]

Lists come with a host of built-in functions (also called methods) which we can apply to them. These can be accessed by using the '.' operator.

In [None]:
#type "x." and press Tab to browse the available functions; dir(x) lists them in code
dir(x)

One of these functions is *append*, which, as the name suggests, allows us to add new elements to the end of an existing list.

In [None]:
x.append(3)
print (x)

One of the extremely versatile features of Python's lists is that they allow you to store mixtures of any types of data within the same list.
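As a small illustration of this point, the sketch below builds a separate list (the name `mixed` is made up for this example) holding an integer, a float, a string and another list, and prints the type of each element.

```python
# 'mixed' is a hypothetical list used only for this illustration
mixed = [1, 2.5, 'three', [4, 5]]

print(type(mixed[0]))   # an integer
print(type(mixed[1]))   # a float
print(type(mixed[2]))   # a string
print(type(mixed[3]))   # a list stored inside the list
```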

**Exercise: Append the string 'flexible' to the list x and print its contents.**

We can insert elements into the list as follows:

In [None]:
x.insert(1, 3)
print (x)

We can change the value at any position.

In [None]:
x[2] = 'forty'
x

We have access to a built-in function called len(), which tells us the number of elements in a list. It also works on strings and on the other data structures covered below.

In [None]:
len(x)
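A short sketch of len() applied to a few different objects. The variable names are chosen only for this example.

```python
numbers = [10, 20, 30]
nested = [1, [2, 3, 4]]   # the inner list counts as a single element
word = 'python'

print(len(numbers))   # number of elements in the list
print(len(nested))    # two elements: an integer and a list
print(len(word))      # number of characters in the string
```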

**Exercise: Take a look at all the available functions that come with lists and use the appropriate one to remove the value 33 from the list x.**

**Exercise: Create a list containing 5 unordered integers (values of your choosing). Take a look at all the available functions that come with lists and use the appropriate one to sort all the elements in the list.**

**Exercise: Take a look at all the available functions that come with lists and use the appropriate one to output the number of times the value 3 appears in the list x.**

Lists can easily be joined together (concatenated) using the + operator.


In [None]:
y = [1, 2, 3]
z = [4, 5, 6]
yz = y + z
print (yz)

In [None]:
#we can apply multiplication to lists
[1, 2, 3] * 2

A useful operator for finding out whether a certain value is present within a list is *in*.

In [None]:
2 in y

**Exercise: Given the two lists y and z above, write code that appends the last 2 elements from z to y and removes them from z. Use the *in* operator to make sure that both 5 and 6 are inside the list y.**

### Slicing

Sequential data structures like lists can get very large and additional constructs are often required for accessing and manipulating portions of them. This is where 'slicing' comes in.

You can select sections of most sequence types (lists, tuples, strings, NumPy arrays) by using the slice notation, which in its basic form consists of *'start:stop'* passed to the indexing operator []:

In [None]:
x = [1, 3, 3, 20, 'flexible', 'forty']
print(x)

To access the two string elements in the list x, we can do the following:

In [None]:
x[4:6]

We also could have written:

In [None]:
x[4:]

Alternatively, we could also select all the integer elements in the list using this notation.

In [None]:
x[:4]

Notice that in all cases the *stop* index is not included.
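The exclusion of the stop index can be checked with a small sketch. The list `letters` is invented for this example; it is not the list x used above.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

middle = letters[1:4]   # indices 1, 2 and 3; index 4 is not included
print(middle)
print(len(middle))      # stop - start = 4 - 1 = 3 elements

# slicing returns a new list, so the original is unchanged
print(letters)

# splitting at the same index gives two parts that rejoin into the original
print(letters[:2] + letters[2:] == letters)
```

Because the stop index is excluded, the number of elements in a slice is simply stop minus start, and two slices that share a boundary never overlap.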

**Exercise: Given the list x above, write code that selects the [3, 20, 'flexible'] elements from it using the slicing notation.**

### Dictionary


Alongside lists, dict is one of the most important built-in Python data structures. A more common name for it in other languages is hash map or associative array. 

Before Python 3.7, lists were ordered collections of objects while dictionaries were not guaranteed to keep any order. In CPython 3.6 dictionaries began preserving insertion order as an implementation detail, and from Python 3.7 onwards this is guaranteed by the language. A key difference between lists and dicts is that items in dictionaries are accessed via **keys** and not via their **position**. Each key of the dictionary is associated with (or mapped to) a value. The values of a dictionary can be of any Python data type. Dictionaries are therefore collections of key-value pairs.

Once again we begin with an empty dictionary.

In [None]:
y = {}
print (y)

In [None]:
#map words to numbers
y = {'one' : 1,
     'two' : 2,
     'three' : 3}
y

Accessing values through keys:

In [None]:
y['two']

Adding additional key-value pairs:

In [None]:
y['six'] = 6
y
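Dictionaries keep their insertion order (guaranteed from Python 3.7), so a newly added key appears last. A small sketch with a made-up dictionary `capitals`, which also shows that the *in* operator tests the keys of a dictionary:

```python
capitals = {'France': 'Paris', 'Japan': 'Tokyo'}
capitals['Kenya'] = 'Nairobi'      # the new pair is placed last

print(list(capitals.keys()))       # keys in insertion order
print('Japan' in capitals)         # True: 'Japan' is a key
print('Tokyo' in capitals)         # False: 'Tokyo' is a value, not a key
```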

We can insert different data types.

In [None]:
y['first_five'] = [1,2,3,4,5]
y

**Exercise: Insert into the above dictionary the key-value pair 'ten' and 10.**

We can list all the keys and all the values in a dictionary

In [None]:
print (y.keys())
print (y.values())

**Exercise: Find out what data types are returned when we call keys() and values() on the above dictionary.**

If we convert these objects to lists, we can access their elements by index:

In [None]:
keys = list(y.keys())
print (keys)
print (keys[2])

**Exercise: Write code to test if 1 is a value contained in the above dictionary.**

Often, dictionaries need to be constructed from data that is contained in two lists. This can easily be accomplished using a function called zip().

In [None]:
zip?

In [None]:
list_1 = [ 'one', 'two', 'three']
list_2 = [1, 2, 3]
list_3 = list(zip(list_1, list_2))
print (list_3)

By passing the output of the zip function into a dictionary constructor, we can create a dictionary made up of matched keys and values.

In [None]:
d = dict(zip(list_1, list_2))
print (d)
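One detail worth knowing, shown here as a small sketch with made-up lists: zip() stops as soon as the shorter input runs out, so unmatched items are silently dropped.

```python
names = ['one', 'two', 'three', 'four']
numbers = [1, 2, 3]          # one item shorter than names

pairs = dict(zip(names, numbers))
print(pairs)                 # 'four' has no partner and is left out
```

If the two lists are expected to have the same length, comparing len(names) with len(numbers) before zipping is a simple safeguard.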

### Tuple

A tuple is a sequence of elements similar to a list. The main difference is that tuples are immutable, i.e. they cannot be changed once created. Tuples are also written with parentheses, while lists use square brackets.

It is useful to at least be familiar with the concept of tuples, as a number of functions return them.
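For instance, the built-in function divmod() returns its two results as a tuple. The sketch below, with variable names chosen for illustration, also shows that a tuple can be unpacked into separate variables.

```python
result = divmod(17, 5)      # quotient and remainder of 17 divided by 5
print(result)
print(type(result))

# unpack the two elements of the tuple into separate variables
quotient, remainder = result
print(quotient, remainder)
```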

Creating a tuple is as simple as writing comma-separated values. Enclosing these values in parentheses is optional. For example:

In [None]:
z = (2 , 4)
print (z)

In [None]:
#alternatively
z = 2, 4
z

The contents of tuples are accessed in the same way as that of lists, and they can contain elements of mixed data types.

In [None]:
z[1]

**Exercise: Write code to alter the first element of the above tuple from 2 to 0.**

## Iteration

Say we have a list:




In [None]:
my_list = [3, 6, 7, 77, 35, 234, 1, 978, 656, 80, 44, 4]
len(my_list)

The above list has 12 elements in it. What if we needed to perform a calculation on each of the elements, for example multiplying each number by 3? How would we do it? We could of course begin by manually accessing each element and multiplying it by 3 as follows:


In [None]:
print (my_list[0] * 3)
print (my_list[1] * 3)
print (my_list[2] * 3)
print (my_list[3] * 3)
print (my_list[4] * 3)
print (my_list[5] * 3)
print (my_list[6] * 3)
print (my_list[7] * 3)
print (my_list[8] * 3)
print (my_list[9] * 3)
print (my_list[10] * 3)
print (my_list[11] * 3)

The above is tedious but possible. But what if the list has millions of data items in it (which is not out of the ordinary)? Clearly the above approach becomes impractical and a more efficient construct becomes necessary.

Loops are an indispensable programming construct that enables efficient iteration. It is hard to imagine a useful program without some mechanism that implements iteration.

Python, like most languages, has a number of different looping constructs. We will look at the *for* loop; there are others, like *while*, that in essence accomplish the same functionality.

Let's look at the above problem and solve it efficiently using a *for* loop.

In [None]:
for x in my_list:    #x takes the value of each element of my_list in turn, one element per iteration
    print (x * 3 )     #the indented statements form the loop body (a code block introduced by the colon) and run once per iteration
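A for loop can also build up a result instead of printing it. The sketch below is one way to do that, using a shorter made-up list `values` together with the append function seen earlier.

```python
values = [3, 6, 7]
tripled = []                 # start with an empty list for the results

for v in values:
    tripled.append(v * 3)    # add one result per iteration

print(tripled)
print(values)                # the original list is left unchanged
```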

**Exercise: Write code that iterates through my_list and prints "Above 100" if the value is greater than 100, otherwise prints "Below 100".**

**Exercise: Write code that iterates through my_list from the 6th element to the end and prints each value. (Hint: use slicing on the list.)**

The built-in function range() is a helpful and commonly used function for iterating over a sequence of numbers. It generates arithmetic progressions; in Python 3 it returns a range object, which we convert to a list here in order to display its contents:

In [None]:
list(range(10))

range(n) generates the progression of integers starting with 0 and ending with (n - 1).

It can also be called with more arguments, such as:
range(begin, end) or
range(begin, end, step). In both forms the end value itself is not included.

In [None]:
list(range(5,10))

In [None]:
list(range(0,10, 2))
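Two further points, sketched below with illustrative variable names: the step may be negative, which counts downwards, and the end value is never included whichever direction is used.

```python
countdown = list(range(10, 0, -2))   # start at 10, stop before reaching 0
print(countdown)

empty = list(range(3, 3))            # begin equals end, so nothing is produced
print(empty)
```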

With the range() function, we can now control the iteration sequence in a more powerful way.

In [None]:
for i in range(1,10):
    print (i)

In [None]:
for i in range(1,10):
    print (my_list[i])

**Exercise: Write code that iterates through my_list and prints every element at an odd position.**

We can also iterate through dictionaries.

In [None]:
y = {'first_five': [1, 2, 3, 4, 5], 'one': 1, 'six': 6, 'three': 3, 'two': 2}
for key in y:
    print (key)

**Exercise: Write code that iterates through the above dictionary and prints every value associated with a key**

Dictionaries also provide the built-in function values(), which lets us iterate directly through the values:

In [None]:
y.values()

In [None]:
for val in y.values():
    print (val)
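When both the key and the value are needed, the dictionary function items() yields them as pairs. A minimal sketch with a made-up dictionary `prices`:

```python
prices = {'apple': 3, 'pear': 5}
lines = []

# each iteration unpacks one (key, value) pair into name and price
for name, price in prices.items():
    lines.append(name + ' costs ' + str(price))

print(lines)
```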

<h3>Loop Control Statements:</h3>

<p>Loop control statements change execution from its normal sequence.</p>
<p>Python supports the following control statements.</p>
<table class="src">
<tr><th style="width:30%">Control Statement</th><th>Description</th></tr>
<tr><td>break statement</td><td>Terminates the <b>loop</b> statement and transfers execution to the statement immediately following the loop.</td></tr>
<tr><td>continue statement</td><td>Causes the loop to skip the remainder of its body and move on to the next iteration (in a <i>while</i> loop, the condition is retested first).</td></tr>
<tr><td>pass statement</td><td>The pass statement in Python is used when a statement is required syntactically but you do not want any command or code to execute.</td></tr>
</table>




In [None]:
y.values()

In [None]:
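for val in y.values():
    if isinstance(val, list):
        continue    #skip list values and move on to the next value
    elif val == 2:
        break       #stop the loop entirely once the value 2 is reached
    else:
        pass        #do nothing; execution carries on to the print below
    print (val)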
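The same statements in a simpler setting, as a sketch with a made-up list `numbers`: continue skips a single value, while break ends the loop altogether.

```python
numbers = [4, 7, 0, 9, 12, 5]
seen = []

for n in numbers:
    if n == 0:
        continue        # skip zero but keep looping
    if n > 10:
        break           # stop at the first value above 10
    seen.append(n)

print(seen)             # 0 was skipped; 12 and everything after it was never reached
```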

## Modules

Developing programs which are readable, reliable and maintainable requires some kind of modular software design. This becomes more important as an application's source code increases in size.  

Modular programming is a software design technique that enables the division of the source code into separate parts (files). These parts are called modules. The goal is that each module handles some cohesive set of functionalities and that the dependency between all the modules is kept to a minimum. 

Once all the modules are completed, the executable application is created by putting them all together.

In order to be able to access the functionality in different modules, they have to be explicitly imported into an application's source code.

It is a convention to place all import statements at the top of an application's source code.

For example, Python possesses a module called *math*, which provides more mathematical functions than the operators we have already come across. One of these is *log()*.

In [None]:
log(10)

The above produces an error. However, if we import the module, we gain access to the function.

In [None]:
import math

math.log(10)

Notice that we had to prefix log with math in order to access this function. Often, module names are long and it becomes cumbersome to write the prefix each time we access functionality from them. This can be overcome by giving the module a shorter alias with the *as* keyword when importing it. This is a common convention for modules with long names.

In [None]:
import math as m

m.log(10)
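Another form of the import statement, not used elsewhere in this notebook, brings individual names from a module directly into the current namespace so that no prefix is needed. A brief sketch:

```python
from math import sqrt, pi

print(sqrt(16))   # square root, called without the math. prefix
print(pi)         # the constant pi from the math module
```

The trade-off is that it is less obvious where a name came from, which is one reason the prefixed forms above are often preferred.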

We can find out what is inside a module by running the following

In [None]:
dir(math)


## Miscellanea - Casting, Strings, IPython environment specifics

### Casting

Casting converts a value from one data type to another.



In [None]:
x = 3
print (type(x))
x = float(x)
print (type(x))
print (x)

In [None]:
print (x + " adding a float and a string should not work")

In [None]:
print (str(x) + " adding a string with another string should work")

In [None]:
y = 3.14
print (int(y))
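Note that int() discards the fractional part rather than rounding. A short sketch comparing it with the built-in round() function (the variable names are illustrative):

```python
a = 3.99
b = -3.99

print(int(a))     # the fractional part is dropped
print(int(b))     # truncation is towards zero, also for negative numbers
print(round(a))   # round() goes to the nearest integer instead
```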

In [None]:
z = "23.34534534534534"
print (float(z))

**Exercise: Write code that converts the string "23.34534534534534" into an int. (Hint: you will have to perform 2 conversions to eventually end at an int.)**

In [None]:
"23.34534534534534"

### More on Strings

String processing is an important task of data analysis and cleaning.

Python provides a rich set of functionalities to process strings.
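A few of the string functions that come up often in data cleaning are sketched below. The sample text `raw` is made up for illustration.

```python
raw = '  Alice, Bob ,  Carol  '

cleaned = raw.strip()            # remove whitespace at both ends
parts = cleaned.split(',')       # split into a list at each comma

names = []
for p in parts:
    names.append(p.strip().lower())   # tidy each piece and lowercase it

print(names)
print(cleaned.replace(',', ';'))      # replace returns a new string
```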

Individual letters within strings can be accessed using the [] notation, and strings can also be sliced like lists. However, strings are immutable and thus cannot be altered in place.

In [None]:
my_string = "Hello World!"
print (my_string)
print (my_string[0])

In [None]:
print (my_string[6:])

In [None]:
my_string[-6:-1]

**Exercise: Write code that changes the first letter of the string my_string into a letter 'Y'.**


We can convert a string into a list.

In [None]:
my_list = list(my_string)
print (my_list)

**Exercise: Write code that prints the length of the string below and says "I am 17 characters long.".**


In [None]:
my_string = "How long am I?..."

#your code here

Sometimes strings can be very long and contain mixtures of single and double quotation marks that cause problems. To get around this, we can define complex strings using triple double quotes:

In [None]:
x = """
Python was conceived in the late 1980s[25] and its implementation was started in December 1989[26] by Guido van Rossum at CWI in the Netherlands as a successor to the ABC language (itself inspired by SETL)[27] capable of exception handling and interfacing with the Amoeba operating system.[5] Van Rossum is Python's principal author, and his continuing central role in deciding the direction of Python is reflected in the title given to him by the Python community, benevolent dictator for life (BDFL).

About the origin of Python, Van Rossum wrote in 1996:[28]

    Over six years ago, in December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office ... would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus).

Python 2.0 was released on 16 October 2000, and included many major new features including a full garbage collector and support for Unicode. With this release the development process was changed and became more transparent and community-backed.[29]

Python 3.0 (also called Python 3000 or py3k), a major, backwards-incompatible release, was released on 3 December 2008[30] after a long period of testing. Many of its major features have been backported to the backwards-compatible Python 2.6 and 2.7.[31]
"""
print (x)

### IPython environment

The IPython Notebook comes with a range of helper commands, known as magic commands.

To see a quick reference for all of them, type the following and execute:

In [None]:
%quickref

The following command will list all the variables defined in the interactive namespace

In [None]:
%who

%history lists all the commands entered so far.


In [None]:
%history

The contents of a code block can be written directly to a file as follows: 

In [None]:
%%file test.py

x = """
Python was conceived in the late 1980s[25] and its implementation was started in December 1989[26] by Guido van Rossum at CWI in the Netherlands as a successor to the ABC language (itself inspired by SETL)[27] capable of exception handling and interfacing with the Amoeba operating system.[5] Van Rossum is Python's principal author, and his continuing central role in deciding the direction of Python is reflected in the title given to him by the Python community, benevolent dictator for life (BDFL).

About the origin of Python, Van Rossum wrote in 1996:[28]

    Over six years ago, in December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office ... would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus).

Python 2.0 was released on 16 October 2000, and included many major new features including a full garbage collector and support for Unicode. With this release the development process was changed and became more transparent and community-backed.[29]

Python 3.0 (also called Python 3000 or py3k), a major, backwards-incompatible release, was released on 3 December 2008[30] after a long period of testing. Many of its major features have been backported to the backwards-compatible Python 2.6 and 2.7.[31]
"""
print (x)

We can find out which directory the above file was written to (the current working directory):


In [None]:
import os

my_working_dir = os.getcwd()
my_working_dir

In [None]:
os.listdir(my_working_dir)

And we can run the file that we saved

In [None]:
%run test.py

In [None]:
%%javascript
require(['base/js/utils'],
function(utils) {
   utils.load_extensions('calico-spell-check', 'calico-document-tools', 'calico-cell-tools');
});