# Collections and Tuples

As part of this module, we will get an overview of the collections and tuples that are built into Python.

* Overview of Collections and Tuples
* Tuples
* Collections - list
* Collections - set
* Collections - dict
* List of Tuples
* Using Data Structures

## Overview of Collections and Tuples
Let us understand the details of collections and tuples.
* A collection is typically used as a group of homogeneous elements, while a tuple is typically used as a group of heterogeneous elements. Python does not enforce this; it is a usage convention.
* A collection is like a spreadsheet or a table, while a tuple is like one row in it. We typically create a collection of objects or tuples.
* Python has 3 commonly used built-in collection types.
  * list
  * set
  * dict
* Each collection type has its own functions, depending on its characteristics. We will see those details later.
* Some functions are applicable to all of them.
  * Getting the number of elements in a collection or a tuple - len
  * Getting the sum of all elements in a collection or a tuple of numbers - sum
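
As a quick illustration of `len` and `sum`, here is a minimal sketch. The sample values and the names `numbers_list`, `numbers_tuple`, `numbers_set` and `prices` are made up for this example.

```python
# Hypothetical sample data, not taken from the module
numbers_list = [10, 20, 30]
numbers_tuple = (10, 20, 30)
numbers_set = {10, 20, 30}
prices = {'pen': 10, 'book': 20, 'bag': 30}

# len works on every built-in collection and on tuples
sizes = [len(numbers_list), len(numbers_tuple), len(numbers_set), len(prices)]
print(sizes)  # [3, 3, 3, 3]

# sum works when all the elements are numbers
totals = [sum(numbers_list), sum(numbers_tuple), sum(numbers_set)]
print(totals)  # [60, 60, 60]

# Iterating a dict yields its keys, so pass the values explicitly to sum
values_total = sum(prices.values())
print(values_total)  # 60
```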


## Tuples
Now let us understand the definition and characteristics of a tuple.
* A tuple is like an object with unnamed attributes.
* Values of the attributes can be accessed only by position.
* It represents an individual row in a table or spreadsheet with multiple attributes.
* We use () to represent tuples.
* Tuples are immutable.
* Very few methods are available - e.g.: count, index.
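
The characteristics above can be sketched as follows. The tuple mirrors the first order row used in this module; the variable names are chosen only for illustration.

```python
order = (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')

# Positional access: the 4th attribute is at index 3
order_status = order[3]

# Tuples are immutable, so assigning to a position raises TypeError
try:
    order[3] = 'COMPLETE'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# To "change" a value, build a new tuple from pieces of the old one
updated_order = order[:3] + ('COMPLETE',)

# Unpacking gives each unnamed attribute a name
order_id, order_date, order_customer_id, status = order

print(order_status, assignment_failed, updated_order[3], order_customer_id)
```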

### Tasks
Let us perform a few tasks related to tuples.

* Create 3 tuples with order_id, order_date, order_customer_id, order_status.

| order_id | order_date | order_customer_id | order_status |
| --- | --- | --- | --- |
| 1 | 2013-07-25 00:00:00.0 | 11599 | CLOSED |
| 2 | 2013-07-25 00:00:00.0 | 256 | PENDING_PAYMENT |
| 3 | 2013-07-25 00:00:00.0 | 12111 | COMPLETE |


In [1]:
order1 = (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')

In [2]:
order2 = (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT')

In [3]:
order3 = (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')

In [6]:
type(order1)

tuple

In [4]:
help(order1)

Help on tuple object:

class tuple(object)
 |  tuple(iterable=(), /)
 |  
 |  Built-in immutable sequence.
 |  
 |  If no argument is given, the constructor returns an empty tuple.
 |  If iterable is specified the tuple is initialized from iterable's items.
 |  
 |  If the argument is a tuple, the return value is the same object.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __getnewargs__(self, /)
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<

In [28]:
order3

(3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')

In [29]:
order3.index?

Signature: order3.index(value, start=0, stop=9223372036854775807, /)
Docstring:
Return first index of value.

Raises ValueError if the value is not present.
Type:      builtin_function_or_method


In [7]:
order3.index(3)

0

In [9]:
order1[1]

'2013-07-25 00:00:00.0'

## Collections - list
Let us understand **list** in detail.
* Group of elements with index and length
* Elements can be added/inserted at a particular position
* We can access elements in list by using index in []
* There can be duplicates in a list
* APIs are available to add elements to the list, delete elements from the list and sort the list

### Tasks
Let us perform a few tasks to understand more about list operations.

* Create a list of employees. Make sure each item in the list is a tuple.

In [11]:
employees = [
    (1, 'Scott', 'Tiger', 1000.0, 'United States'),
    (2, "Henry", "Ford", 1250.0, "India"),
    (3, "Nick", "Junior", 750.0, "united KINGDOM"),
    (4, "Bill", "Gomes", 1500.0, "AUSTRALIA")
]

In [12]:
type(employees)

list

In [None]:
help(employees)

* Adding elements into list (append, insert)

In [15]:
employees.append?

Signature: employees.append(object, /)
Docstring: Append object to the end of the list.
Type:      builtin_function_or_method


In [17]:
employees.append((5, 'Donald', 'Duck', 1800.0, 'USA'))

In [18]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'United States'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [16]:
employees.insert?

Signature: employees.insert(index, object, /)
Docstring: Insert object before index.
Type:      builtin_function_or_method


In [19]:
employees.insert(3, (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'))

In [21]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'United States'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [22]:
employees[3]

(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')

* Deleting elements from list (pop, clear)

In [23]:
employees.pop?

Signature: employees.pop(index=-1, /)
Docstring:
Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.
Type:      builtin_function_or_method


In [24]:
employees.pop()

(5, 'Donald', 'Duck', 1800.0, 'USA')

In [25]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'United States'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA')]

In [26]:
employees.pop(3)

(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')

In [27]:
employees.clear?

Signature: employees.clear()
Docstring: Remove all items from list.
Type:      builtin_function_or_method


* Checking how many times an element is repeated in list (count)

In [50]:
l1 = [1, 'Hello']

In [52]:
type(l1[1])

str

In [30]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [32]:
type(l[0])

int

In [33]:
l.count?

Signature: l.count(value, /)
Docstring: Return number of occurrences of value.
Type:      builtin_function_or_method


In [34]:
l.count(4)

3

In [35]:
s = '1244573421'

In [36]:
s.count('4')

3

* Get the position of element (index)

In [37]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'United States'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA')]

In [40]:
employees.index((2, 'Henry', 'Ford', 1250.0, 'India'))

1

In [41]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [42]:
l.index?

Signature: l.index(value, start=0, stop=9223372036854775807, /)
Docstring:
Return first index of value.

Raises ValueError if the value is not present.
Type:      builtin_function_or_method


In [43]:
l.index(4)

2

In [46]:
l.index(4, 3)

3

In [47]:
l.index(4, 6)

7

In [48]:
l.index(4, 5, 7)

ValueError: 4 is not in list
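
Since `index` raises `ValueError` when the value is not found in the given range, one way to handle it is to wrap the call. This is a minimal sketch; `find_index` is a hypothetical helper written for this example, not a built-in function.

```python
def find_index(values, target, start=0, stop=None):
    """Hypothetical helper: return the first index of target, or -1 if absent."""
    if stop is None:
        stop = len(values)
    try:
        return values.index(target, start, stop)
    except ValueError:
        # index() raises ValueError when target is not in values[start:stop]
        return -1


l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

found = find_index(l, 4, 6)       # search from position 6 onwards
missing = find_index(l, 4, 5, 7)  # positions 5 and 6 hold 7 and 3
print(found, missing)  # 7 -1
```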

In [49]:
help(l)

Help on list object:

class list(object)
 |  list(iterable=(), /)
 |  
 |  Built-in mutable sequence.
 |  
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate sign

* Accessing elements in a list using an index or a range of indexes (from the beginning). Because `str` is also a sequence (of characters), this is the same notation we used for strings earlier.

In [53]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [56]:
l[:3]

[1, 2, 4]

In [58]:
l[3:6]

[4, 5, 7]

In [59]:
l[:6]

[1, 2, 4, 4, 5, 7]

In [60]:
l[3:]

[4, 5, 7, 3, 4, 2, 1]

In [61]:
employees = [(1, "Scott", "Tiger", 1000.0, "united states"),
             (2, "Henry", "Ford", 1250.0, "India"),
             (3, "Nick", "Junior", 750.0, "united KINGDOM"),
             (4, "Bill", "Gomes", 1500.0, "AUSTRALIA")
            ]
employees.append((5, "Donald", "Duck", 1800.0, "USA"))
employees.insert(3, (6, "Mickey", "Mouse", 2000.0, "Disney Land"))

employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [62]:
employees[0]

(1, 'Scott', 'Tiger', 1000.0, 'united states')

In [63]:
employees[5]

(5, 'Donald', 'Duck', 1800.0, 'USA')

In [65]:
employees[1:2]

[(2, 'Henry', 'Ford', 1250.0, 'India')]

In [66]:
employees[:3]

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')]

In [67]:
employees[-3:]

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [68]:
employees[3:6]

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

* Accessing elements in a list using an index or a range of indexes (from the end).

In [69]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [70]:
len(l)

10

In [71]:
l[-3:]

[4, 2, 1]

In [73]:
l[-5:-2]

[7, 3, 4]
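
A negative index counts from the end, so `-k` refers to the same position as `len(l) - k`. The sketch below checks that equivalence for the slices shown above; the variable names are chosen only for illustration.

```python
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]
n = len(l)

last = l[-1]                  # same element as l[n - 1]
last_three = l[-3:]           # same as l[n - 3:]
middle = l[-5:-2]             # same as l[n - 5:n - 2], i.e. l[5:8]

print(last, last_three, middle)  # 1 [4, 2, 1] [7, 3, 4]
print(last_three == l[n - 3:], middle == l[n - 5:n - 2])  # True True
```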

* Sorting elements in the list (`sort` sorts the list in place; `sorted` returns a new sorted list and leaves the original unchanged)

In [86]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [87]:
l

[1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [88]:
l.sort?

Signature: l.sort(*, key=None, reverse=False)
Docstring: Stable sort *IN PLACE*.
Type:      builtin_function_or_method


In [89]:
l.sort(reverse=True)

In [90]:
l

[7, 5, 4, 4, 4, 3, 2, 2, 1, 1]

In [77]:
sorted?

Signature: sorted(iterable, /, *, key=None, reverse=False)
Docstring:
Return a new list containing all items from the iterable in ascending order.

A custom key function can be supplied to customize the sort order, and the
reverse flag can be set to request the result in descending order.
Type:      builtin_function_or_method


In [91]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [92]:
sorted(l, reverse=True)

[7, 5, 4, 4, 4, 3, 2, 2, 1, 1]

In [93]:
l

[1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [146]:
set(l)

{1, 2, 3, 4, 5, 7}

In [95]:
sorted(employees)

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')]

In [96]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [99]:
t = (1, 'Scott', 'Tiger', 1000.0, 'united states')

In [100]:
t[3]

1000.0

In [102]:
sorted(employees, key=lambda t: t[3], reverse=True)

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')]

In [103]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [105]:
employees.sort(key=lambda t: t[4])

In [107]:
employees

[(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states')]

In [111]:
employees[1][2]

'Mouse'
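
In the sort by country above, 'USA' comes before 'united KINGDOM' because string comparison is case sensitive and uppercase letters sort before lowercase ones. If a case-insensitive order is wanted, one option is to normalize the key with `str.lower`. This is a sketch of that idea, not something the module prescribes.

```python
employees = [
    (1, 'Scott', 'Tiger', 1000.0, 'united states'),
    (2, 'Henry', 'Ford', 1250.0, 'India'),
    (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
    (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
    (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
    (5, 'Donald', 'Duck', 1800.0, 'USA'),
]

# Case-sensitive: uppercase 'USA' sorts before lowercase 'united ...'
case_sensitive = [t[4] for t in sorted(employees, key=lambda t: t[4])]

# Case-insensitive: compare the lowercased country instead
case_insensitive = [t[4] for t in sorted(employees, key=lambda t: t[4].lower())]

print(case_sensitive)
print(case_insensitive)
```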

## Collections - set

Let us understand **set** in detail.
* Group of unique elements with no index and no guaranteed order
* Elements can be added, but not at a particular position
* We can check whether an element exists using the `in` operator
* There can be no duplicates in a set
* APIs are available to add elements to the set, delete elements from the set and perform set operations such as union, intersection etc
* To sort the data, we need to convert the set to a list or use the sorted function. There is no API available in set to sort it.

In [114]:
s = {1, 2, 2, 1, 2, 2}

In [116]:
s[0]

TypeError: 'set' object is not subscriptable
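
Although a set cannot be indexed, `len`, `in` and `sorted` all work on it. A minimal sketch using the same values as above:

```python
s = {1, 2, 2, 1, 2, 2}
unique_count = len(s)          # duplicates are dropped, so only 1 and 2 remain

l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]
unique_values = set(l)         # removes duplicates from the list
has_seven = 7 in unique_values
has_six = 6 in unique_values

# sorted accepts any iterable and always returns a list
unique_sorted = sorted(unique_values)

print(unique_count, has_seven, has_six, unique_sorted)
# 2 True False [1, 2, 3, 4, 5, 7]
```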

### Exercises

We will see some basic set operations by using simple examples
* Create a set of 3 employees with ids 1, 2 and 3 using elements from **employees** list.

In [137]:
employees_set = {(1, "Scott", "Tiger", 1000.0, "united states"),
                 (2, "Henry", "Ford", 1250.0, "India"),
                 (3, "Nick", "Junior", 750.0, "united KINGDOM")
                }

In [122]:
type(employees_set)

set

In [123]:
employees_set?

Type:        set
String form: {(3, 'Nick', 'Junior', 750.0, 'united KINGDOM'), (2, 'Henry', 'Ford', 1250.0, 'India'), (1, 'Scott', 'Tiger', 1000.0, 'united states')}
Length:      3
Docstring:
set() -> new empty set object
set(iterable) -> new set object

Build an unordered collection of unique elements.


* Adding elements into set (add) - Add employees with ids 4, 5.

In [124]:
employees_set.add?

Docstring:
Add an element to a set.

This has no effect if the element is already present.
Type:      builtin_function_or_method


In [125]:
employees_set.add((4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'))

In [126]:
employees_set.add((5, 'Donald', 'Duck', 1800.0, 'USA'))

In [127]:
employees_set

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')}

* Deleting elements from set (pop/remove, clear)

In [128]:
employees_set.pop?

Docstring:
Remove and return an arbitrary set element.
Raises KeyError if the set is empty.
Type:      builtin_function_or_method


In [129]:
employees_set.pop()

(1, 'Scott', 'Tiger', 1000.0, 'united states')

In [133]:
employees_set.remove?

Docstring:
Remove an element from a set; it must be a member.

If the element is not a member, raise a KeyError.
Type:      builtin_function_or_method


In [134]:
employees_set.remove((4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'))

In [136]:
employees_set.remove((5, 'Donald', 'Duck', 1800.0, 'USA'))
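
Because `remove` raises `KeyError` for a missing element, `discard` can be used when the element may or may not be present. The sketch below uses a small set of ids (made up for this example) to show the difference.

```python
ids = {1, 2, 3}

# discard is silent when the element is missing
ids.discard(7)

# remove raises KeyError when the element is missing
try:
    ids.remove(7)
    remove_failed = False
except KeyError:
    remove_failed = True

# Both delete the element when it is present
ids.remove(3)
ids.discard(2)

print(remove_failed, ids)  # True {1}
```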

* Checking whether an element is present in a set using the `in` operator (a set does not support `[]`) - check whether employees with ids 2 and 7 exist in the set.

In [130]:
employees_set[(1, 'Scott', 'Tiger', 1000.0, 'united states')]

TypeError: 'set' object is not subscriptable

In [132]:
(2, 'Henry', 'Ford', 1250.0, 'India') in employees_set

True
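
The `in` check above needs the complete tuple. To check by employee id alone, one possible approach is to compare only the first field of each tuple, as sketched here. `employee_ids` is a name introduced for this example.

```python
employees_set = {
    (1, 'Scott', 'Tiger', 1000.0, 'united states'),
    (2, 'Henry', 'Ford', 1250.0, 'India'),
    (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
}

# Option 1: scan the set and compare the id (first field) of each tuple
has_id_2 = any(emp[0] == 2 for emp in employees_set)
has_id_7 = any(emp[0] == 7 for emp in employees_set)

# Option 2: build a set of ids once, then use `in` for each lookup
employee_ids = {emp[0] for emp in employees_set}

print(has_id_2, has_id_7)                      # True False
print(2 in employee_ids, 7 in employee_ids)    # True False
```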

* Set operations (union, intersection, difference etc) - Create a new set with **employee ids** 4, 5 and 6, then perform all 3 set operations on the set created in first step and this step.

In [139]:
employees_set1 = {(1, "Scott", "Tiger", 1000.0, "united states"),
                  (2, "Henry", "Ford", 1250.0, "India"),
                  (3, "Nick", "Junior", 750.0, "united KINGDOM"),
                  (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
                  (5, 'Donald', 'Duck', 1800.0, 'USA')
                 }

In [140]:
employees_set2 = {(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
                  (5, 'Donald', 'Duck', 1800.0, 'USA'),
                  (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')
                 }

In [141]:
employees_set1.union(employees_set2)

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')}

In [143]:
employees_set1.intersection(employees_set2)

{(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')}

In [144]:
employees_set1.difference(employees_set2)

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')}

In [145]:
employees_set2.difference(employees_set1)

{(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')}
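
The same operations are also available as operators. A minimal sketch on sets of employee ids (the names `ids1` and `ids2` are chosen for this example):

```python
ids1 = {1, 2, 3, 4, 5}
ids2 = {4, 5, 6}

union = ids1 | ids2                  # same as ids1.union(ids2)
intersection = ids1 & ids2           # same as ids1.intersection(ids2)
difference = ids1 - ids2             # same as ids1.difference(ids2)
symmetric_difference = ids1 ^ ids2   # elements in exactly one of the two sets

print(sorted(union))                 # [1, 2, 3, 4, 5, 6]
print(sorted(intersection))          # [4, 5]
print(sorted(difference))            # [1, 2, 3]
print(sorted(symmetric_difference))  # [1, 2, 3, 6]
```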

## Collections - dict
Let us understand **dict** in detail.
* Group of key value pairs
* Keys are unique
* Values need not be unique
* We can access values using keys
* APIs are available to add new key value pairs to a dict, update values based on keys, get the keys and values of a dict (as view objects that can be converted to a set or a list), check whether a key exists in the dict etc

### Tasks
We will see some basic dict operations by using simple examples
* Creating a dict with key value pairs. When a key is repeated in the literal (`username` below), the last value wins.

In [150]:
db = {
    'host': 'dslab.itversity.com',
    'db_name': 'retail_db',
    'username': 'retail_fake',
    'username': 'retail_user',
    'password': 'itversity'
}

In [151]:
type(db)

dict

In [152]:
db

{'host': 'dslab.itversity.com',
 'db_name': 'retail_db',
 'username': 'retail_user',
 'password': 'itversity'}
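
New key value pairs can also be added after the dict is created. The sketch below shows assignment by key and `update`; the `port` and `schema` entries and the new `username` value are hypothetical and are not part of the original configuration.

```python
db = {
    'host': 'dslab.itversity.com',
    'db_name': 'retail_db',
    'username': 'retail_user',
    'password': 'itversity'
}

# Assigning to a new key adds a pair
db['port'] = 5432                 # hypothetical value

# Assigning to an existing key replaces its value
db['username'] = 'retail_admin'   # hypothetical value

# update adds or replaces several pairs at once
db.update({'port': 3306, 'schema': 'public'})  # hypothetical values

print(list(db.keys()))
# ['host', 'db_name', 'username', 'password', 'port', 'schema']
```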

* Get all keys (keys)

In [155]:
db.keys()

dict_keys(['host', 'db_name', 'username', 'password'])

* Get all key value pairs (items)

In [156]:
db.items()

dict_items([('host', 'dslab.itversity.com'), ('db_name', 'retail_db'), ('username', 'retail_user'), ('password', 'itversity')])

* Get only values (values)

In [160]:
db.values()

dict_values(['dslab.itversity.com', 'retail_db', 'retail_user', 'itversity'])

* Accessing values from dict

In [164]:
db.get('host')

'dslab.itversity.com'

In [165]:
db['port']

KeyError: 'port'

In [166]:
db.get('port')
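
The call `db.get('port')` displays nothing because `get` returns `None` for a missing key instead of raising `KeyError`. A second argument can supply a default, as sketched below; the default port value is an assumption made for this example.

```python
db = {'host': 'dslab.itversity.com', 'db_name': 'retail_db'}

port = db.get('port')                 # missing key, so None is returned
port_is_none = port is None

port_or_default = db.get('port', 5432)  # hypothetical default value

# get does not add the key to the dict
still_missing = 'port' not in db

print(port_is_none, port_or_default, still_missing)  # True 5432 True
```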

In [169]:
'host' in db

True

In [170]:
'port' in db

False

In [171]:
'itversity' in db.values()

True

* Removing elements from dict (clear, pop, popitem)

In [172]:
db.pop?

Docstring:
D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised
Type:      builtin_function_or_method


In [175]:
db.pop('password')

'itversity'

In [176]:
db

{'host': 'dslab.itversity.com',
 'db_name': 'retail_db',
 'username': 'retail_user'}

In [177]:
db.popitem?

Docstring:
D.popitem() -> (k, v), remove and return some (key, value) pair as a
2-tuple; but raise KeyError if D is empty.
Type:      builtin_function_or_method


In [178]:
db.popitem()

('username', 'retail_user')
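
A short sketch of the remaining removal options, assuming Python 3.7 or later, where `popitem` removes the most recently inserted pair:

```python
db = {
    'host': 'dslab.itversity.com',
    'db_name': 'retail_db',
    'username': 'retail_user'
}

# pop with a default does not raise KeyError for a missing key
port = db.pop('port', None)

# popitem removes and returns the last inserted pair (Python 3.7+)
last_item = db.popitem()

# clear removes everything that is left
db.clear()

print(port, last_item, db)  # None ('username', 'retail_user') {}
```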

## List of Tuples

We often create a collection (list) of tuples. Let us perform a few tasks related to a collection of tuples.
* Create 3 tuples with order_id, order_date, order_customer_id, order_status.

|order_id|order_date|order_customer_id|order_status|
|--------|----------|-----------------|------------|
|1|2013-07-25 00:00:00.0|11599|CLOSED|
|2|2013-07-25 00:00:00.0|256|PENDING_PAYMENT|
|3|2013-07-25 00:00:00.0|12111|COMPLETE|

* Create a list of the above 3 tuples named **orders**
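
One possible solution to the two tasks, followed by a few lookups on the resulting list (the `statuses` name is chosen for this example):

```python
order1 = (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')
order2 = (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT')
order3 = (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')

orders = [order1, order2, order3]

# The first index picks the tuple (row), the second picks the attribute
second_order_status = orders[1][3]

# Collect one attribute from every tuple
statuses = [order[3] for order in orders]

print(len(orders), second_order_status, statuses)
# 3 PENDING_PAYMENT ['CLOSED', 'PENDING_PAYMENT', 'COMPLETE']
```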

## Using Data Structures

Let us understand how to leverage the data structures for data processing.

* Read data from files using basic file I/O.

In [180]:
open?

Signature:
open(
    file,
    mode='r',
    buffering=-1,
    encoding=None,
    errors=None,
    newline=None,
    closefd=True,
    opener=None,
)
Docstring:
Open file and return a stream.  Raise OSError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file

In [181]:
orders_file = open('/Users/itversity/Research/data/retail_db/orders/part-00000.csv')

In [182]:
type(orders_file)

_io.TextIOWrapper

In [184]:
orders_file.read?

Signature: orders_file.read(size=-1, /)
Docstring:
Read at most n characters from stream.

Read from underlying buffer until we have n characters or we hit EOF.
If n is negative or omitted, read until EOF.
Type:      builtin_function_or_method


In [185]:
orders_raw = orders_file.read()

In [186]:
type(orders_raw)

str

In [None]:
orders_raw
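
The cells above leave the file open. A common alternative is a `with` block, which closes the file automatically. The sketch below writes two sample records to a temporary file so that it can run anywhere; the temporary file is only a stand-in for the orders file and is an assumption of this example.

```python
import os
import tempfile

sample = (
    '1,2013-07-25 00:00:00.0,11599,CLOSED\n'
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n'
)

# Stand-in for the orders file: write the sample records to a temporary file
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

# The file is closed automatically when the with block ends
with open(sample_path) as orders_file:
    orders_raw = orders_file.read()

file_closed = orders_file.closed

# Clean up the temporary file
os.remove(sample_path)

print(type(orders_raw).__name__, file_closed)  # str True
```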

* Get data into collections.

In [191]:
orders_raw.split('\n')[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [192]:
orders_raw.splitlines?

Signature: orders_raw.splitlines(keepends=False)
Docstring:
Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and
true.
Type:      builtin_function_or_method


In [193]:
orders_raw.splitlines()[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
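
The first 10 elements look the same with both approaches, but the two differ when the text ends with a newline: `split('\n')` produces a trailing empty string, while `splitlines()` does not. A small sketch with made-up sample text:

```python
raw = 'first\nsecond\n'

with_split = raw.split('\n')
with_splitlines = raw.splitlines()

print(with_split)       # ['first', 'second', '']
print(with_splitlines)  # ['first', 'second']
```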

* Convert each record into a tuple for better control.
* Process the data based on the problem statement, using the APIs that are available on top of collections.
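
As a preview, the two steps above could be sketched as follows. The 10 records are the ones shown in the output above, kept in a string so the example runs without the file. `parse_order` is a hypothetical helper, and counting orders by status is just one example of a problem statement.

```python
orders_raw = """1,2013-07-25 00:00:00.0,11599,CLOSED
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED
5,2013-07-25 00:00:00.0,11318,COMPLETE
6,2013-07-25 00:00:00.0,7130,COMPLETE
7,2013-07-25 00:00:00.0,4530,COMPLETE
8,2013-07-25 00:00:00.0,2911,PROCESSING
9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT"""


def parse_order(line):
    """Hypothetical helper: turn one CSV record into a typed tuple."""
    order_id, order_date, order_customer_id, order_status = line.split(',')
    return (int(order_id), order_date, int(order_customer_id), order_status)


# list of tuples: one tuple per record
orders = [parse_order(line) for line in orders_raw.splitlines()]

# dict: number of orders for each status
status_counts = {}
for order in orders:
    status = order[3]
    status_counts[status] = status_counts.get(status, 0) + 1

# set: distinct statuses
distinct_statuses = {order[3] for order in orders}

print(orders[0])
print(status_counts)
print(sorted(distinct_statuses))
```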

**We will understand these as part of subsequent modules.**