# 2.1. Data Structures

Module M-227-04: Programming for Data Analytics

Instructor: prof. Dmitry Pavlyuk

## Important Data Structures

* __Tuple__ is a fixed-length immutable sequence of Python objects
* __List__ is a variable-length ordered mutable sequence of Python objects
* __Set__ is a mutable unordered collection of unique Python objects
* __Dictionary__ is a mutable collection of key-value pairs, where keys are hashable Python objects and values are arbitrary Python objects.

In [1]:
tuple1 = (1, 2, 3)
print("Tuple ", tuple1)
list1 = [1, 2, 3]
print("List ", list1)
set1 = {1, 2, 3}
print("Set ", set1)
dict1 = {1:"One", 2:"Two", 3:"Three"}
print("Dictionary ", dict1)

Tuple  (1, 2, 3)
List  [1, 2, 3]
Set  {1, 2, 3}
Dictionary  {1: 'One', 2: 'Two', 3: 'Three'}


## Tuples vs. Lists

Both are ordered sequences that are created and indexed in a similar way:

In [2]:
tuple1 = (1, 2, 3)
print(tuple1)
print(type(tuple1))
print("Third element = ", tuple1[2])

(1, 2, 3)
<class 'tuple'>
Third element =  3


In [3]:
list1 = [1, 2, 3]
print(list1)
print(type(list1))
print("Third element = ", list1[2])

[1, 2, 3]
<class 'list'>
Third element =  3


## Key differences:

* Tuples use less memory than lists

In [4]:
import sys

print(sys.getsizeof(tuple1))
print(sys.getsizeof(list1))

64
120
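
The exact byte counts above depend on the Python version and platform. As a rough sketch (assuming CPython, where `sys.getsizeof` is supported), the comparison can be repeated for a few sizes; the variable names here are illustrative only:

```python
import sys

# Compare the container overhead for the same elements at several sizes.
# sys.getsizeof counts only the container itself, not the objects it refers to.
sizes = {}
for n in (0, 3, 1000):
    as_tuple = tuple(range(n))
    as_list = list(range(n))
    sizes[n] = (sys.getsizeof(as_tuple), sys.getsizeof(as_list))
    print(f"n={n}: tuple={sizes[n][0]} bytes, list={sizes[n][1]} bytes")
```

The absolute numbers are not important; the point is that the tuple is the smaller container at every size.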


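* `tuple()` applied to a tuple returns the same object (safe, because a tuple cannot change), while `list()` applied to a list always creates a new copy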

In [5]:
tuple2 = tuple(tuple1)
print("The same tuple? ", tuple2 is tuple1)
list2 = list(list1)
print("The same list? ", list2 is list1)

The same tuple?  True
The same list?  False
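
Why the copy matters for lists can be seen in a small sketch that contrasts a second name for the same list with a real copy. The names `original`, `alias`, and `copy` are chosen for illustration:

```python
original = [1, 2, 3]
alias = original          # a second name for the same list object
copy = list(original)     # a new list with the same elements (shallow copy)

alias.append(4)           # visible through both names
copy.append(99)           # affects only the copy

print(original)           # [1, 2, 3, 4]
print(copy)               # [1, 2, 3, 99]
print(alias is original, copy is original)  # True False
```

A tuple never needs such a copy, because no operation can change it after creation.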


* Tuple construction is faster than list construction

In [6]:
import timeit
print(timeit.timeit("(1,2,3,4,5)",number=1000))
print(timeit.timeit("[1,2,3,4,5]",number=1000))

7.1999999997629516e-06
4.619999999988522e-05
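
Part of the gap measured above comes from the fact that CPython can store a tuple literal made only of constants as a ready-made constant, so the timed statement does almost no work. A hedged variant of the measurement builds both containers from variables instead; the timings are machine-dependent and are not asserted here:

```python
import timeit

# Building from variables forces both containers to be constructed on every run,
# which gives a fairer comparison than literals made of constants.
setup = "a, b, c, d, e = 1, 2, 3, 4, 5"
tuple_time = timeit.timeit("(a, b, c, d, e)", setup=setup, number=100_000)
list_time = timeit.timeit("[a, b, c, d, e]", setup=setup, number=100_000)
print(f"tuple from variables: {tuple_time:.6f} s")
print(f"list from variables:  {list_time:.6f} s")
```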


* Tuples are hashable if all their elements are hashable (see the Hashability section below)
* Tuples are immutable: elements cannot be added, deleted, or replaced, and a tuple cannot be sorted in place

In [7]:
list1[0]='First'
print("Updated list = ", list1)

Updated list =  ['First', 2, 3]


In [8]:
tuple1[0]='First'  # raises TypeError: tuples do not support item assignment
print("Updated tuple = ", tuple1)

TypeError: 'tuple' object does not support item assignment
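
Immutability does not prevent deriving new tuples from an existing one. A minimal sketch, with `scores` as an illustrative name:

```python
scores = (3, 1, 2)

# sorted() accepts any iterable and returns a new list; the tuple is untouched.
ordered = tuple(sorted(scores))

# Concatenation creates a new tuple rather than changing the existing one.
extended = scores + (4,)

print(scores)    # (3, 1, 2)
print(ordered)   # (1, 2, 3)
print(extended)  # (3, 1, 2, 4)
```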

### Tuples: good examples

In [9]:
weekdays = ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday','Saturday', 'Sunday')
print(weekdays)
rigaCoords = (56.9677, 24.1056)
print(rigaCoords)
rgbColor = (255, 255, 255) #white
print(rgbColor)


('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')
(56.9677, 24.1056)
(255, 255, 255)


### Tuples: questionable usage

In [10]:
students = ('Alexey', 'Anar', 'Erika', 'Jevgenijs', 'Janis', 'Vjaceslav', 'Nikita')
print(students)


('Alexey', 'Anar', 'Erika', 'Jevgenijs', 'Janis', 'Vjaceslav', 'Nikita')
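
A group of students is questionable as a tuple because it tends to change over time. One possible alternative, sketched here with a shortened list of names, is a list that supports these changes directly:

```python
students = ['Alexey', 'Anar', 'Erika']
students.append('Nikita')   # a new student enrols
students.remove('Anar')     # a student leaves the course
students.sort()             # in-place sort, unavailable for tuples
print(students)             # ['Alexey', 'Erika', 'Nikita']
```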


## Lists vs. Sets

Similar structures, but a set keeps only one copy of each element:

In [11]:
list1 = [1, 2, 3, 3, 3, 3]
print(list1)
print(type(list1))

[1, 2, 3, 3, 3, 3]
<class 'list'>


In [12]:
set1 = {1, 2, 3, 3, 3, 3}
print(set1)
print(type(set1))

{1, 2, 3}
<class 'set'>


* Sets are faster than lists for checking whether an object is present

In [13]:
print(5 in [1,2,3,4,5])
print(5 in {1,2,3,4,5})
import timeit
print("Time for checking in list = ", timeit.timeit("5 in [1,2,3,4,5]",number=1000))
print("Time for checking in set = ",timeit.timeit("5 in {1,2,3,4,5}",number=1000))

True
True
Time for checking in list =  5.1300000000420454e-05
Time for checking in set =  2.2100000000246922e-05
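
With only five elements the difference is small. A sketch on a larger collection (the size of 100,000 elements is an arbitrary assumption) shows the effect more clearly: a list is scanned element by element, while a set uses hashing. Exact timings depend on the machine:

```python
import timeit

# Build the containers once in setup so that only the membership test is timed.
setup = "data_list = list(range(100_000)); data_set = set(data_list)"
list_time = timeit.timeit("99_999 in data_list", setup=setup, number=100)
set_time = timeit.timeit("99_999 in data_set", setup=setup, number=100)
print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```

The searched value is the last element of the list, which is the worst case for a linear scan.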


* Elements of a set cannot be accessed by index

In [14]:
print(set1[0])

TypeError: 'set' object is not subscriptable

* Set operations

In [15]:
set2 = {2, 3, 5}
set3 = {2, 4, 6}
print(f"Intersection of {set2} and {set3} is {set2.intersection(set3)}")
print(f"Union of {set2} and {set3} is {set2.union(set3)}")

Intersection of {2, 3, 5} and {2, 4, 6} is {2}
Union of {2, 3, 5} and {2, 4, 6} is {2, 3, 4, 5, 6}
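
The same operations are also available as operators, together with difference and symmetric difference. A short sketch reusing `set2` and `set3` (the order in which a set is printed may differ from the comments):

```python
set2 = {2, 3, 5}
set3 = {2, 4, 6}

print(set2 & set3)   # intersection: {2}
print(set2 | set3)   # union: {2, 3, 4, 5, 6}
print(set2 - set3)   # difference, elements only in set2: {3, 5}
print(set2 ^ set3)   # symmetric difference, elements in exactly one set: {3, 4, 5, 6}
```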


## Dictionaries

A dictionary stores a collection of key-value pairs, where keys are hashable Python objects and values are arbitrary Python objects. Each key is associated with a value, so a value can be conveniently retrieved, inserted, modified, or deleted given its key.

In [16]:
dict1 = {'Monday': 'Moon', 'Tuesday' : 'Mars', 'Wednesday' : 'Mercury', 'Thursday' : 'Jupiter',
         'Friday' : 'Venus','Saturday' : 'Saturn'}
print(dict1)
print(type(dict1))
print(dict1.keys())
print(dict1.values())

{'Monday': 'Moon', 'Tuesday': 'Mars', 'Wednesday': 'Mercury', 'Thursday': 'Jupiter', 'Friday': 'Venus', 'Saturday': 'Saturn'}
<class 'dict'>
dict_keys(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'])
dict_values(['Moon', 'Mars', 'Mercury', 'Jupiter', 'Venus', 'Saturn'])


In [17]:
dict1['Sunday'] = 'Sun'
print(dict1)

{'Monday': 'Moon', 'Tuesday': 'Mars', 'Wednesday': 'Mercury', 'Thursday': 'Jupiter', 'Friday': 'Venus', 'Saturday': 'Saturn', 'Sunday': 'Sun'}


In [18]:
dict1.pop('Monday')
print(dict1)

{'Tuesday': 'Mars', 'Wednesday': 'Mercury', 'Thursday': 'Jupiter', 'Friday': 'Venus', 'Saturday': 'Saturn', 'Sunday': 'Sun'}
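
Accessing or popping a key that is not present raises a `KeyError`. A minimal sketch of the usual ways to avoid this, using a shortened dictionary named `planets` for illustration:

```python
planets = {'Tuesday': 'Mars', 'Wednesday': 'Mercury'}

print('Monday' in planets)                 # False: membership tests look at keys
print(planets.get('Monday', 'unknown'))    # 'unknown' instead of a KeyError
removed = planets.pop('Monday', None)      # a default avoids KeyError on a missing key
print(removed)                             # None

for day, planet in planets.items():        # iterate over key-value pairs
    print(day, '->', planet)
```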


### Complex keys

In [19]:
coords = dict()
coords[(56.9677, 24.1056)] = 'Riga'
coords[(51.5072, 0.12)] = 'London'
print(coords)

{(56.9677, 24.1056): 'Riga', (51.5072, 0.12): 'London'}


### Hashability

In [20]:
coords[[50.8476, 4.3572]] = 'Brussel'
print(coords)

TypeError: unhashable type: 'list'

In [21]:
coords[{50.8476, 4.3572}] = 'Brussel'
print(coords)

TypeError: unhashable type: 'set'

In [22]:
hash((50.8476, 4.3572))

1658362060139021477

In [23]:
hash({50.8476, 4.3572})

TypeError: unhashable type: 'set'

In [24]:
hash([50.8476, 4.3572])


TypeError: unhashable type: 'list'

In [25]:
coords[(50.8476, 4.3572)] = 'Brussel'
print(coords)

{(56.9677, 24.1056): 'Riga', (51.5072, 0.12): 'London', (50.8476, 4.3572): 'Brussel'}
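
Two further points follow from the examples above and can be sketched briefly: a tuple is hashable only when everything inside it is hashable, and `frozenset` is an immutable, hashable counterpart of `set`. The `routes` dictionary and its value are made-up illustration data:

```python
# A tuple is hashable only if every element it contains is hashable.
try:
    hash((50.8476, [4.3572]))   # the inner list makes the whole tuple unhashable
    nested_ok = True
except TypeError:
    nested_ok = False
print(nested_ok)  # False

# frozenset is immutable and hashable, so it can serve as a dictionary key.
routes = {frozenset({'Riga', 'London'}): 'direct flight'}  # hypothetical data
print(routes[frozenset({'London', 'Riga'})])  # element order does not matter
```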


# Thank you