![Cloud-First](https://github.com/tulip-lab/sit742/blob/develop/Jupyter/image/CloudFirst.png?raw=1)



# SIT742: Modern Data Science
**(Module: Python Foundations for Big Data)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use, change and distribute this package.
- If you find any issue or bug in this document, please submit an issue at [tulip-lab/sit742](https://github.com/tulip-lab/sit742/issues)


Prepared by **SIT742 Teaching Team**

---


## Session 2G: Advanced Data Types

In this session, we will learn about advanced data types in addition to the strings and numbers covered earlier. We will examine lists, tuples and dictionaries, which are used to hold collections of related data.

## Table of Contents


1 [List](#cell_list)

2 [Tuple](#cell_tuple)

3 [Dictionary](#cell_dict)



<a id = "cell_list"></a>


## 1 List

**List is a sequence**

Like a string, a **list** is a sequence of values. The values in a list can be any type. We call the values **items** of the list. To create a list, enclose the items in square brackets.

For example,
               

In [1]:
shoplist = ['apple', 'mango', 'carrot', 'banana']
l = [10, 20, 30, 40]
empty = [ ]   # Initialize an empty list

In [2]:
shoplist

['apple', 'mango', 'carrot', 'banana']

In [3]:
l

[10, 20, 30, 40]

In [4]:
empty

[]

In [5]:
shoplist

['apple', 'mango', 'carrot', 'banana']

The elements of a list do not have to be the same type. Another list can also be nested inside a list.

To access an element of a list, use the bracket operator with the index of the value you want. Note that list indices start at $0$. You can also use a negative index to count from the right. For example, the index of the last item is $-1$. Try the following examples:

In [6]:
l = [10, 20, 30, 40]

In [7]:
l[2]

30

In [8]:
l[-1]

40

In [9]:
l[-3]

20

Here is an example of nested list:

In [10]:
l = ['apple', 2.0, 5, [10, 20]]

In [11]:
l[1]

2.0

In [12]:
l[3]

[10, 20]

In [13]:
l[3][1]


20

Unlike strings, lists are mutable, which means they can be altered. We use the bracket operator on the left side of an assignment to assign a value to a list item.

In [14]:
l = [10, 20, 30, 40]
l[1] = 200

In [15]:
l

[10, 200, 30, 40]
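
The contrast with strings can be sketched as follows. The names `word`, `letters` and `string_is_immutable` are illustrative and not part of the original examples. Item assignment works on the list but raises a `TypeError` on the string:

```python
word = 'mango'
letters = ['m', 'a', 'n', 'g', 'o']

letters[0] = 'M'            # Lists are mutable: the item is replaced in place
print(letters)

string_is_immutable = False
try:
    word[0] = 'M'           # Strings are immutable: this raises TypeError
except TypeError as err:
    string_is_immutable = True
    print('TypeError:', err)

print(word)                 # The string is unchanged
```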

**List operation**

The **in** operator performs a membership test.
The expression evaluates to **True** if the value exists in the list,
and to **False** otherwise.


In [16]:
shoplist = ['apple', 'mango', 'carrot', 'banana']

In [17]:
'apple' in shoplist

True

In [18]:
'rice' in shoplist

False

Similarly, the **in** operator also applies to the **string** type. Here are some examples:


In [19]:
'a' in 'banana'

True

In [20]:
'seed' in 'banana'    # Test if 'seed' is a substring of 'banana'

False

'**+**' is used for the concatenation operation.

In [21]:
[10, 20, 30, 40 ] + [50, 60]

[10, 20, 30, 40, 50, 60]

'**\***' is used to repeat the elements in a list a given number of times.

[50, 60] * 3    # Evaluates to [50, 60, 50, 60, 50, 60]

     

**List slices**

The slicing operation retrieves a slice of the list, i.e. a part of the sequence. It uses square brackets to enclose an optional pair of numbers separated by a colon. Again, when counting the position of items from the left (first item), start from $0$. When counting from the right (last item), start from $-1$.

In [22]:
l = [1, 2, 3, 4]

In [23]:
l[1:3]     # From position 1 to position 3 (excluded)

[2, 3]

In [24]:
l[:2]     # From the beginning to position 2 (excluded)

[1, 2]

In [25]:
l[-2:]    # From the second-to-last item to the end of the list

[3, 4]

If you omit both the first and the second indices, the slice is a copy of the whole list.

In [26]:
l[:]

[1, 2, 3, 4]

Since lists are mutable, the expression above is often useful for making a copy before modifying the original list.

In [27]:
l = [1, 2, 3, 4]
l_org = l[:]
l[0] = 8

In [28]:
l

[8, 2, 3, 4]

In [29]:
l_org  # the original list is unchanged


[1, 2, 3, 4]
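
To see why the slice copy matters, here is a minimal sketch comparing a plain assignment with a slice copy. The names `original`, `alias` and `snapshot` are illustrative. A plain assignment only creates a second name for the same list, so changes show through it, while the slice creates a separate list:

```python
original = [1, 2, 3, 4]

alias = original          # A second name for the same list object
snapshot = original[:]    # A new list holding the same items

original[0] = 8

print(alias)              # The alias sees the change
print(snapshot)           # The slice copy does not
print(alias is original, snapshot is original)
```

Note that the slice copies only the outer list. Items that are themselves lists are still shared between the copy and the original.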

**List methods**

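The methods and functions most often used with lists include:
- append()
- len()
- sort()
- split()
- join()

Note that **len()** is a built-in function rather than a list method, while **split()** and **join()** are string methods that work together with lists.

The **append()** method adds a new element to the end of a list.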

In [30]:
l= [1, 2, 3, 4]


In [31]:
l

[1, 2, 3, 4]

In [32]:
l.append(5)
l.append([6, 7])  #list [6, 7] is nested in list l

In [33]:
l

[1, 2, 3, 4, 5, [6, 7]]
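
If the intention is to add the items of another list one by one rather than as a single nested item, the standard list method `extend()` can be used. It is not covered in the text above, so the following is only a brief sketch, with illustrative variable names:

```python
nested = [1, 2, 3]
nested.append([4, 5])     # Adds the list [4, 5] as one nested item

flat = [1, 2, 3]
flat.extend([4, 5])       # Adds each item of [4, 5] separately

print(nested, len(nested))
print(flat, len(flat))
```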

The built-in function **len()** returns the number of items in a list.

In [34]:
l = [1, 2, 3, 4, 5]
len(l)

5

In [35]:
# A list nested in another list is counted as a single item
l = [1, 2, 3, 4, 5, [6, 7]]
len(l)

6

The **sort()** method arranges the elements of the list from low to high.

In [36]:
shoplist = ['apple', 'mango', 'carrot', 'banana']
shoplist.sort()

In [37]:
shoplist

['apple', 'banana', 'carrot', 'mango']


It is worth noting that the **sort()** method modifies the list in place and returns **None**. Please try the following:

In [38]:
shoplist = ['apple', 'mango', 'carrot', 'banana']
shoplist_sorted = shoplist.sort()

In [39]:
shoplist_sorted    # Nothing is displayed, because sort() returned None


There is an alternative way of sorting a list. The built-in function **sorted()** returns a new sorted list and leaves the original one unchanged.

In [40]:
shoplist = ['apple', 'mango', 'carrot', 'banana']
shoplist_sorted = sorted(shoplist)          # sorted() returns a new list

In [41]:
shoplist_sorted

['apple', 'banana', 'carrot', 'mango']

In [42]:
shoplist

['apple', 'mango', 'carrot', 'banana']
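
The difference between the two approaches can be checked directly. The sketch below also uses the optional `reverse` and `key` parameters of `sorted()`, which are standard but not discussed in the text above. The variable names are illustrative:

```python
shoplist = ['apple', 'mango', 'carrot', 'banana']

descending = sorted(shoplist, reverse=True)   # High to low
by_length = sorted(shoplist, key=len)         # Shortest word first
print(descending)
print(by_length)
print(shoplist)             # Still in the original order

result = shoplist.sort()    # Sorts in place
print(result)               # None
print(shoplist)             # Now sorted
```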

There are two frequently-used string methods that convert between lists and strings:

First, the **split()** method is used to break a string into words:

In [43]:
s = 'I love apples'
s.split(' ')

['I', 'love', 'apples']

In [44]:
s = 'spam-spam-spam'
# A delimiter '-' is specified here. It is used as word boundary
s.split('-')

['spam', 'spam', 'spam']

Second, **join()** is the inverse of **split()**. It takes a list of strings and concatenates the elements.

In [45]:
l = ['I', 'love', 'apples']
s = ' '.join(l)
s

'I love apples'

How it works:

Since **join** is a string method, you have to invoke it on the *delimiter*. In this case, the delimiter is a space character. So **' '.join()** puts a space between words. The list **l** is passed to **join()** as parameter.
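
As a small sketch of how the two methods combine, the example below splits a string on one delimiter and joins it with another. The date string and the variable names are made up for illustration. It also shows that calling `split()` with no argument splits on any run of whitespace, whereas `split(' ')` treats every single space as a boundary:

```python
date = '2024-01-15'                 # Hypothetical example value
parts = date.split('-')             # Break the string at each '-'
rejoined = '/'.join(parts)          # Put '/' between the pieces
print(parts)
print(rejoined)

spaced = 'I  love apples'           # Two spaces after 'I'
print(spaced.split())               # ['I', 'love', 'apples']
print(spaced.split(' '))            # ['I', '', 'love', 'apples']
```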

For more information on list methods, type "help(list)" in your notebook.

**Traverse a list**

The most common way to traverse the items of a list is with a **for** loop. Try the following code:

In [46]:
shoplist = ['apple', 'mango', 'carrot', 'banana']
for item in shoplist:
    print(item)

apple
mango
carrot
banana


This works well if you only need to read the items of the list. However, you will need to use indices if you want to update the elements. In this case, you need to combine the functions **range()** and **len()**.

In [47]:
l = [2, 3, 5, 7]

for i in range(len(l)):
    l[i] = l[i] * 2

print(l)

[4, 6, 10, 14]


How it works:

**len()** returns the number of items in the list, while **range(n)** produces the sequence of integers from 0 to n - 1. By combining the functions **len()** and **range()**, **i** takes the index of the next element on each pass through the loop. The assignment statement then uses **i** to update that element.
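
The indices produced by this combination can be inspected by converting the range to a list. The sketch below also shows the built-in `enumerate()` function, which is not used in the text above but is a common alternative that yields the index and the item together:

```python
l = [2, 3, 5, 7]

indices = list(range(len(l)))
print(indices)                    # [0, 1, 2, 3]

# enumerate() yields (index, item) pairs, so no separate lookup is needed
for i, value in enumerate(l):
    l[i] = value * 2

print(l)                          # [4, 6, 10, 14]
```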

<a id = "cell_tuple"></a>

## 2 Tuple

**Tuples are immutable**

A **tuple** is also a sequence of values, and the values can be any type. Tuples and lists are very similar. The important difference is that tuples are immutable, which means they cannot be changed.

Tuples are typically used to group and organize data into a single compound value. For example,

In [48]:
year_born =  ('David Hilton', 1995)
year_born

('David Hilton', 1995)

To define a tuple, we write a sequence of values separated by commas.
Although it is not necessary, it is common to enclose tuples in parentheses.

Most list operators also work on tuples.
The bracket operator indexes an item of a tuple, and the slice operator works in a similar way.

Here is how to define a tuple:

In [49]:
t  = ( ) # Empty tuple
t

()

In [50]:
t = (1)
type(t)  # Its type is int, since no comma is following

int

In [51]:
t = (1,)  # One item tuple; the item needs to be followed by a comma
type(t)

tuple

Here is how to access elements of a tuple:

In [52]:
t = ('a', 'b', 'c', 'd')

In [53]:
t[0]

'a'

In [54]:
t[1:3]

('b', 'c')

But if you try to modify the elements of the tuple, you get an error.

In [55]:
t = ('a', 'b', 'c', 'd')
t[1] = 'B'

TypeError: 'tuple' object does not support item assignment
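
Although the items of a tuple cannot be replaced, a new tuple can be built from pieces of an existing one using slicing and concatenation. The following is a minimal sketch, with `t_new` as an illustrative name:

```python
t = ('a', 'b', 'c', 'd')

# Combine a slice before index 1, a one-item tuple, and a slice after index 1
t_new = t[:1] + ('B',) + t[2:]

print(t_new)    # ('a', 'B', 'c', 'd')
print(t)        # The original tuple is unchanged
```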

### Tuple assignment

Tuple assignment allows a tuple of variables on the left of an assignment to be assigned values from a tuple on the right of the assignment.
(We already saw this type of statement in the previous prac.)

For example,


In [56]:
t = ('David', '0233', 78)
(name, student_id, score) = t    # student_id avoids shadowing the built-in id()

In [57]:
name

'David'

In [58]:
student_id

'0233'

In [59]:
score

78

Naturally, the number of variables on the left and the number of values on the right have to be the same. Otherwise, Python raises a **ValueError**.

In [60]:
(a, b, c, d) = (1, 2, 3)


ValueError: not enough values to unpack (expected 4, got 3)
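
Two common uses of tuple assignment are sketched below: swapping two variables without a temporary variable, and unpacking the result of `split()`. The email address and the variable names are made up for illustration:

```python
a, b = 1, 2
a, b = b, a                 # The right side is evaluated first, then unpacked
print(a, b)                 # 2 1

address = 'student@example.com'       # Hypothetical example value
user, domain = address.split('@')     # split() returns two items here
print(user)
print(domain)
```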

### Lists and tuples

It is common to have a list of tuples. A **for** loop can be used to traverse the data. For example,

In [62]:
t = [('David', 90), ('John', 88), ('James', 70)]

for (name, score) in t:
    print(name, score)

David 90
John 88
James 70
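
Building on the same data, the sketch below traverses the list of tuples to compute the average score and find the highest scorer. The names `records`, `total`, `best_name` and `best_score` are illustrative:

```python
records = [('David', 90), ('John', 88), ('James', 70)]

total = 0
best_name, best_score = records[0]     # Start with the first record

for name, score in records:
    total += score
    if score > best_score:
        best_name, best_score = name, score

average = total / len(records)
print(best_name, best_score)
print(round(average, 2))
```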


<a id = "cell_dict"></a>

## 3 Dictionary

A **dictionary** is like an address book where you can find the address or contact details of a person by knowing only their name. This is achieved by associating **keys** (names) with **values** (details). Note that each key in a dictionary must be unique. Otherwise, we would not be able to locate the correct information through the key.

Also worth noting is that only immutable objects (such as strings, numbers, or tuples of immutable items) can be used as keys, while the values of a dictionary can be either immutable or mutable objects. This means we can use a string, a tuple or a list as a dictionary value.

The following example defines a dictionary:

In [63]:
scores = {'David': 70, 'John': 60, 'Mike': 85}  # named scores to avoid shadowing the built-in dict
scores['David']
scores['Anne'] = 92  # add a new item to the dictionary
scores

{'David': 70, 'John': 60, 'Mike': 85, 'Anne': 92}
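
A few more dictionary operations are sketched below, including the restriction on keys described above. The names `scores`, `marks` and `list_key_rejected` are illustrative, and `get()` is a standard dictionary method that the text above does not cover:

```python
scores = {'David': 70, 'John': 60, 'Mike': 85}

print('David' in scores)        # True: membership tests look at the keys
print(scores.get('Anne'))       # None: get() avoids a KeyError for a missing key
print(scores.get('Anne', 0))    # 0: a default value can be supplied

scores['David'] = 75            # Assigning to an existing key replaces its value
print(len(scores))              # Still 3 items, because keys are unique

# A tuple of immutable items is a valid key
marks = {('David', 'Maths'): 70}
print(marks[('David', 'Maths')])

# A list is mutable, so using one as a key raises TypeError
list_key_rejected = False
try:
    bad = {['David', 'Maths']: 70}
except TypeError as err:
    list_key_rejected = True
    print('TypeError:', err)
```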

**Traverse a dictionary**

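Since Python 3.7, a dictionary preserves the order in which its keys were inserted. In earlier versions, the order of the key-value pairs was not guaranteed. The following example uses a **for** loop to traverse a dictionary. Notice that the keys appear in insertion order, not in alphabetical order.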

In [64]:
scores = {'David': 70, 'John': 60, 'Amy': 85}  # named scores to avoid shadowing the built-in dict

for key in scores:
    print(key, scores[key])

David 70
John 60
Amy 85


However, we can sort the keys of a dictionary before using it if necessary. The following example sorts the keys and stores the result in a list **sortedKeys**. The **for** loop then iterates through the list **sortedKeys**, so the items in the dictionary are accessed with the names in alphabetical order. Note that the dictionary's **keys()** method returns a view of all the keys in the dictionary, which **sorted()** turns into a sorted list.

In [65]:
scores = {'David': 70, 'John': 60, 'Amy': 85}  # named scores to avoid shadowing the built-in dict
sortedKeys = sorted(scores.keys())

for key in sortedKeys:
    print(key, scores[key])


Amy 85
David 70
John 60
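
Sorting can also be done on the values rather than the keys. The sketch below uses the standard `items()` method, which yields (key, value) tuples, together with the `key` and `reverse` parameters of `sorted()`. These are not covered in the text above, and the names `scores` and `by_score` are illustrative:

```python
scores = {'David': 70, 'John': 60, 'Amy': 85}

# Each item is a (name, score) tuple; sort on the score, highest first
by_score = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

for name, score in by_score:
    print(name, score)
```

This combines tuple assignment in the **for** loop with a list of tuples, as in the earlier examples.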
