![alt text](python.png "Title")

# Data Structures 

Python has several built-in structures to hold data: lists, tuples, sets, strings and dictionaries. It's essential to know how each of them behaves.

https://docs.python.org/3/tutorial/datastructures.html

## Lists

In [1]:
# Lists are ordered, mutable collections of heterogeneous objects. They are very handy!
# List items are enclosed in square brackets and comma-delimited.

a = []     # Declares an empty list
a = list() # Same thing

# let's create a list of 3 objects
a = ['Hello', 1, True, ] # It's OK to leave a comma at the end, but not mandatory

print ( 'a: ', a )  # see how it's printed out
print ( len(a)   )  # number of items in the list
print ( type (a) )  # it's a list

# Lists can be nested 
b = [ [1, 2, 3], [4, 5, 6]]
print("b: ", b)

# Lists can be concatenated:
c = [1, 2, 3] + [4, 5, 6]
print("c: ", c)

a:  ['Hello', 1, True]
3
<class 'list'>
b:  [[1, 2, 3], [4, 5, 6]]
c:  [1, 2, 3, 4, 5, 6]


In [2]:
# Indexes and slices: access the content by position. In Python, indexing always starts at zero!
a = [ [1, 2, 3], [4, 5, 6]]

print ('a       =', a          )
print ('a[0]    =', a[0]       ) # First item in the list, which is a  list
print ('a[0][1] =', a[0][1]    ) # First item in the list & second item in the nested list, which is an integer
print ('a[-1]   =', a[-1], '\n') # Last item in the list, which is a list

# A slice can also be a range:
b = [ 1, 2, 3, 4, 5, 6]

print ('b        =',  b       )
print ('b[0:2]   =',  b[0:2]  ) # first and second item (the second range number is NOT inclusive )
print ('b[:2]    =',  b[:2]   ) # same 
print ('b[2:]    =',  b[2:]   ) # from third item till the last item
print ('b[2:-1]  =',  b[2:-1] ) # from third item till the second to last item
print ('b[-1:-3] =',  b[-1:-3]) # empty: with the default step of +1, nothing lies between the start and the stop
print ('b[::2]   =',  b[::2])   # every second item, from left to right
print ('b[::-2]   =',  b[::-2]) # every second item, from right to left

a       = [[1, 2, 3], [4, 5, 6]]
a[0]    = [1, 2, 3]
a[0][1] = 2
a[-1]   = [4, 5, 6] 

b        = [1, 2, 3, 4, 5, 6]
b[0:2]   = [1, 2]
b[:2]    = [1, 2]
b[2:]    = [3, 4, 5, 6]
b[2:-1]  = [3, 4, 5]
b[-1:-3] = []
b[::2]   = [1, 3, 5]
b[::-2]   = [6, 4, 2]
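The empty `b[-1:-3]` result comes from the default step of +1. Here is a minimal sketch of the general `start:stop:step` form, reusing the same list `b`; the variable names are only illustrative.

```python
b = [1, 2, 3, 4, 5, 6]

# With a negative step the slice walks from right to left,
# so a start of -1 and a stop of -3 now select something.
last_two_reversed = b[-1:-3:-1]
print(last_two_reversed)       # [6, 5]

# Slicing never raises an IndexError: out-of-range bounds are clipped.
clipped = b[4:100]
print(clipped)                 # [5, 6]

# A full slice builds a new list with the same items.
whole = b[:]
print(whole == b, whole is b)  # True False
```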


![alt text](slices.jpg "Title")

In [3]:
# Lists are iterable objects. We'll cover 'for' loops later; for now, note that you can iterate over the items in a list
for item in ["Hello", "world", '!']:
    print(item)

Hello
world
!


In [4]:
# Check for existence of a list item using 'in':
a = ['Hello', 'world']

print ('Hello' in a) # the result is a boolean
print ('hello' in a) # the comparison is case-sensitive

True
False


In [5]:
# More functions & methods:

# del: statement that removes an item by position
a = [1, 2, 3]
del a[1]

print('a: ', a)

# pop(): remove (and return) the last item
b = [1, 2, 3]
b.pop()
print('b: ', b)

# append(): add an item at the end of the list
c = [1, 2, 3]
c.append(4)
print('c: ', c)

# sorted(): function that returns a new, lexicographically sorted list
d = ['b1', 'a2', 'a1', 'b5', ]
d = sorted(d)
print ('d:', d )

# sort(): **in-place** method to lexicographically sort list items (it returns None)
e = ['b1', 'a2', 'a1', 'b5', ]
e.sort()
print ('e:', e )

# join() converts a list to a string, with a delimiter. This does the opposite of split()
joined = '@'.join(['Hello', 'world']) 
print(joined)
print(type(joined))

a:  [1, 3]
b:  [1, 2]
c:  [1, 2, 3, 4]
d: ['a1', 'a2', 'b1', 'b5']
e: ['a1', 'a2', 'b1', 'b5']
Hello@world
<class 'str'>
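A short sketch of a few details the cell above does not show: `pop()` returns the removed item and accepts an index, `sort()` returns `None`, and both `sort()` and `sorted()` accept `key` and `reverse` arguments. The variable names are only illustrative.

```python
b = [1, 2, 3]
last = b.pop()     # removes the last item and returns it
first = b.pop(0)   # an index can be given to remove another position
print(last, first, b)   # 3 1 [2]

words = ['b1', 'a2', 'a1', 'b5']
result = words.sort()   # in-place: the list is changed, nothing is returned
print(result)           # None
print(words)            # ['a1', 'a2', 'b1', 'b5']

# Sort on the second character, largest first
by_digit = sorted(words, key=lambda w: w[1], reverse=True)
print(by_digit)         # ['b5', 'a2', 'a1', 'b1']
```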


In [8]:
# List comprehensions are a compact & pythonic way to create lists. It's a very important feature that you should know

a = [1, 2, 3]
squared = [number**2 for number in a]
print(squared)

# The equivalent 'for' loop works too, but it is more verbose
squared = []
for number in a:
    squared.append(number**2)
    

[1, 4, 9]


In [6]:
# Here's one way to apply a function on every item in a list:
a = ['Hello', 'World']
a = [item.upper() for item in a]
print ('a:', a)

a: ['HELLO', 'WORLD']


In [9]:
# And that's how we could create a list of even numbers:
b = [number for number in [0,1,2,3,4,5,6,7,8,9,10] if number % 2 == 0]
print ('b:', b)

b: [0, 2, 4, 6, 8, 10]


In [11]:
# You can easily create a sequence of integers with the 'range' function
a = range(10) # it also accepts optional start and step arguments
print(a) # this is a lazy range object, which you can iterate over

# Turn the range object into an actual list
print(list(a))

range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
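As a quick illustration of the optional arguments, here is a small sketch of `range(start, stop, step)`. As with slices, the stop value is excluded. The variable names are only illustrative.

```python
from_two = list(range(2, 10))      # start at 2, stop before 10
evens = list(range(0, 11, 2))      # step of 2
countdown = list(range(5, 0, -1))  # negative step counts down

print(from_two)    # [2, 3, 4, 5, 6, 7, 8, 9]
print(evens)       # [0, 2, 4, 6, 8, 10]
print(countdown)   # [5, 4, 3, 2, 1]
```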


In [6]:
# all() and any() functions

# Are all the values equal to True?
print( all([True, True, False]))
print( all([1>0, True, True ]))

# Do we have at least one True?
print( any([True, True, False]))
print( any([False, False, False] ))

False
True
True
False
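`all()` and `any()` are often combined with a comprehension-like expression rather than a literal list of booleans. A minimal sketch, with an illustrative list of numbers:

```python
numbers = [2, 4, 6, 7]

# Is every number even? Is at least one number odd?
all_even = all(n % 2 == 0 for n in numbers)
any_odd = any(n % 2 == 1 for n in numbers)
print(all_even, any_odd)   # False True

# Edge case: on an empty list, all() is True and any() is False
print(all([]), any([]))    # True False
```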


## Tuples

Tuples are very similar to lists, with one key difference: lists are mutable and tuples are not. Because of that, tuples can be slightly more efficient to process. Use them if you know you won't need to change them. The rest is the same (iterable, accessible by index, etc.).

In [13]:
# Create a list and add an item. That will work.
myList = ['hello']
myList.append('world')

# Create a tuple (using parentheses, not square brackets) and add an item. That will fail.
# Note the trailing comma: without it, ('hello') is just a string in parentheses.
myTuple = ('hello',)
myTuple.append('world')

AttributeError: 'tuple' object has no attribute 'append'

In [14]:
# Tuple "comprehension":

# Note we need to call 'tuple', because parentheses alone would create a generator expression, not a tuple
t = tuple(letter for letter in 'hello') 
t

('h', 'e', 'l', 'l', 'o')
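Parentheses are a common source of confusion with tuples. The sketch below (illustrative names only) shows that it is the comma, not the parentheses, that makes a tuple, and what bare parentheses around a comprehension produce.

```python
gen = (letter for letter in 'hello')   # generator expression, not a tuple
not_a_tuple = ('hello')                # just a string in parentheses
one_item = ('hello',)                  # the trailing comma makes it a tuple
no_parens = 'hello', 'world'           # parentheses are optional

print(type(gen).__name__)          # generator
print(type(not_a_tuple).__name__)  # str
print(type(one_item).__name__)     # tuple
print(type(no_parens).__name__)    # tuple
```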

In [15]:
# Convert a tuple to a list and back to a tuple, in case you do need to change it :-)
a = ('hello', 'world')

a = list(a)
a[0] = 'Hello'
a = tuple(a)

print(a)

('Hello', 'world')


## Sets

Somewhat similar to lists, with some differences:
* sets are unordered
* sets do not allow duplicate values (they are hash-based, which also makes membership tests fast)
* set items must be hashable (in practice, immutable), but the set itself is mutable: you can add more items (using add() or update())
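A minimal sketch of the last point, using `add()` and `update()` and showing what happens with a mutable item. The values are only illustrative.

```python
s = {1, 2}
s.add(3)              # add a single item
s.update([3, 4, 5])   # add several items; the duplicate 3 is ignored
print(s == {1, 2, 3, 4, 5})   # True

s.add((6, 7))         # a tuple is hashable, so it can be a set item

# A list is mutable (not hashable), so it is rejected
try:
    s.add([8, 9])
    rejected = False
except TypeError as error:
    rejected = True
    print('TypeError:', error)
```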

In [18]:
# Converting a list to a set removes the duplicates:
a = set( [1, 2, 3, 3, 4] )
print(a)
print(type(a))

# You can convert back to a list (the resulting order is not guaranteed):
a = list(a)
print(a)

{1, 2, 3, 4}
<class 'set'>
[1, 2, 3, 4]


In [19]:
# Create set with curly brackets:
a = {1, 2, 3, 3, 4}
a

{1, 2, 3, 4}

In [20]:
# Sets are not ordered, so you can NOT access values using indexes
a = {1, 2, 3, 3, 4}
# a[1] # that would raise a TypeError ('set' object is not subscriptable)

# However, you can iterate on a set and retrieve the values:
for item in a: 
    print(item)
    
# Or, like in a list, check for existence of an item:
print( 2 in a)
print( 20 in a)

1
2
3
4
True
False


In [21]:
# Sets have useful methods for content comparisons:
setA= {'a', 'b', 'd'}
setB= {'a', 'c', 'e'}

# A few examples:
print('Union: ',        setA.union(setB))
print('Intersection: ', setA.intersection(setB))
print('Not in both: ',  setA.symmetric_difference(setB))

Union:  {'a', 'b', 'd', 'e', 'c'}
Intersection:  {'a'}
Not in both:  {'e', 'b', 'd', 'c'}
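The same comparisons can be written with operators. A short sketch using the two sets from the cell above; `sorted()` is used only to get a stable display order.

```python
setA = {'a', 'b', 'd'}
setB = {'a', 'c', 'e'}

union = setA | setB          # same as setA.union(setB)
intersection = setA & setB   # same as setA.intersection(setB)
not_in_both = setA ^ setB    # same as setA.symmetric_difference(setB)
only_in_a = setA - setB      # same as setA.difference(setB)

print(sorted(union))         # ['a', 'b', 'c', 'd', 'e']
print(sorted(intersection))  # ['a']
print(sorted(not_in_both))   # ['b', 'c', 'd', 'e']
print(sorted(only_in_a))     # ['b', 'd']
```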


In [12]:
# Set comprehension. Duplicates are dropped, and the displayed order does not follow the order of the letters in the string:
s = {letter for letter in 'hello world'}
s

{' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w'}

## Strings

In [22]:
# A string can be seen as a sequence of characters
a = 'Hello!'

# We can slice a string. As with lists, indexes start at 0 and the end point is excluded.
print('a[0]    :' , a[0])
print('a[-1]   :' , a[-1])
print('a[0:2]  :' , a[0:2])
print('a[-3:]  :' , a[-3:])
print('a[-3:-1]:' , a[-3:-1])

a[0]    : H
a[-1]   : !
a[0:2]  : He
a[-3:]  : lo!
a[-3:-1]: lo


In [16]:
# You can use single, double or triple quotes:

# a and b are the same
a = 'Hello'
b = "Hello"

# Pick the quote style depending on which quotes you need inside the string:
c = "I'm here."
d = 'I am "here".'

# Triple quotes let you use both single and double quotes inside:
e = ''' I'm "here" '''

# Use a backslash to continue a string on the next line. Useful when the line gets too long
a = "hello this is " \
    "a string written on 2 lines"
print(a)

hello this is a string written on 2 lines


In [23]:
# Concatenate strings with the + operator:
'Hello' + " world!"

'Hello world!'

In [24]:
# As with lists and other sequences, you can use len() to retrieve the number of elements
long_word = "Supercalifragilisticexpialidocious"
len(long_word)

34

In [25]:
# String multiplication
comments = "/*" + "*" * (len(long_word)+2) + "*/"

print (comments)
print ("/*", long_word, '*/')
print (comments )

/**************************************/
/* Supercalifragilisticexpialidocious */
/**************************************/


In [26]:
# You can easily create a list out of a string with split(). By default it splits on whitespace.
# This does the opposite of join()

Mylist = 'Hello, world !'.split()
print(Mylist)

# using a different delimiter
Mylist = 'Hello, world !'.split(', ')
print(Mylist)

# Chain splits:
comment = "patient (id=123) was discharged"
usubjid = int(comment.split('id=')[1].split(')')[0]) # int() converts the result to an integer, otherwise we'd get a string
print (usubjid)
print (type(usubjid))

# we'll do this in a more elegant way with Regex

['Hello,', 'world', '!']
['Hello', 'world !']
123
<class 'int'>
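For comparison, here is a rough sketch of the same extraction with the standard `re` module. The pattern `id=(\d+)` is an assumption about how the id appears in the comment, not a general-purpose parser.

```python
import re

comment = "patient (id=123) was discharged"

# Look for 'id=' followed by one or more digits, and capture the digits
match = re.search(r'id=(\d+)', comment)

# search() returns None when nothing matches, so check before converting
usubjid = int(match.group(1)) if match else None
print(usubjid)        # 123
print(type(usubjid))  # <class 'int'>
```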


In [16]:
# Replacing substrings

a = 'Hello, world'
a = a.replace('Hello', 'Hi') # replace() returns a new string (strings are immutable), so we reassign it
print(a)

Hi, world


In [18]:
# Escape sequences use backslashes, e.g. \n

# Python will raise a SyntaxError because \u starts a Unicode escape that must be followed by 4 hex digits
#folder = "c:\user\nicolas"  

# You can prefix the string with 'r' to avoid this:
folder = r"c:\user\nicolas"
print(folder)

# or alternatively protect every backslash with an additional backslash:
folder = 'c:\\user\\nicolas'

# Sometimes we do want to use escape sequences (\n means newline):
print( 'Hello \nworld')
print( r'Hello \nworld')

c:\user\nicolas
Hello 
world
Hello \nworld


In [19]:
# Formatting strings

usubjid = 123
date = "October 15th"

# let's put these 2 vars in a sentence

# Solution 1: concatenate. Not the easiest or most readable way...
comment = "Patient (id=" + str(usubjid) + ") was discharged on " + date

# Solution 2: use format() and curly brackets as placeholders. More options are available with this (e.g. number of digits)
comment = "Patient (id={}) was discharged on {}".format(usubjid, date)

# Solution 3 and my personal favorite: use an f-string (the 'f' prefix). You can use expressions, including function calls, inside the curly brackets.
comment = f"Patient (id={usubjid}) was discharged on {date}"
print (f"There are {len(date)} letters in '{date}'.")

There are 12 letters in 'October 15th'.
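The "number of digits" option mentioned for `format()` also works in f-strings: a format specification goes after a colon inside the curly brackets. A small sketch with illustrative values (`ratio` is not part of the original example):

```python
usubjid = 123
ratio = 0.123456   # illustrative value

padded = f"{usubjid:06d}"             # pad with zeros to 6 digits
rounded = f"{ratio:.2f}"              # 2 digits after the decimal point
percent = f"{ratio:.1%}"              # as a percentage with 1 decimal
aligned = "{:>8}|".format("right")    # right-align in 8 characters

print(padded)    # 000123
print(rounded)   # 0.12
print(percent)   # 12.3%
print(aligned)   #    right|
```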


In [20]:
# Reverse a string :-)
"Hello World"[::-1]

'dlroW olleH'

## Dictionaries

Python dicts are mutable collections of key/value pairs, indexed by key. Since Python 3.7 they keep their insertion order, but items cannot be accessed by position.

In [21]:
# Create dicts with curly brackets, a colon as delimiter between key and value, and a comma between pairs
contacts = {'Clark': '555-153-0486', 'Lois': '555-594-1647'}
print(contacts)

# Dicts are iterable:
print (f'We have {len(contacts)} contacts:')

for name in contacts:
    print('-', name)
    print(contacts[name])

{'Clark': '555-153-0486', 'Lois': '555-594-1647'}
We have 2 contacts:
- Clark
555-153-0486
- Lois
555-594-1647


In [7]:
# Dicts are indexed by key, not by position, so this raises a KeyError if you meant to access the first item. It only works if you have a key named 0
contacts[0]

KeyError: 0

In [20]:
# Get the key/value pairs
contacts.items() # this returns a view of (key, value) tuples, which can be converted to a list of tuples

dict_items([('Clark', '555-153-0486'), ('Lois', '555-594-1647')])
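A short sketch of the related methods `keys()` and `values()`, and of looping over `items()` with tuple unpacking. It reuses the two original contacts; the variable names are only illustrative.

```python
contacts = {'Clark': '555-153-0486', 'Lois': '555-594-1647'}

names = list(contacts.keys())       # only the keys
numbers = list(contacts.values())   # only the values
print(names)     # ['Clark', 'Lois']
print(numbers)   # ['555-153-0486', '555-594-1647']

# Each item is a (key, value) tuple, unpacked here into two names
for name, number in contacts.items():
    print(name, '->', number)
```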

In [22]:
# Accessing values using the key as index:
print ("Clark's phone number:", contacts['Clark'] )

Clark's phone number: 555-153-0486


In [22]:
# Dictionaries can contain all kinds of objects (dict, list etc)
a = {'Clark': {'phone':  '555-153-0486', 
               'email':  'ICanFly@gmail.com',
               'powers': ['flying', 'Invulnerable', 'X-rays']},
     'Lois' : None}

# Accessing values with chained indexes:
print ( a['Clark']['powers'][0] )

# Dicts are mutable:
a['Clark']['powers'].append('Freezing breath') # In fact, we are modifying a list, which is mutable
a['Clark']['powers'][0] = 'Flying' 
print (a['Clark']['powers'])

flying
['Flying', 'Invulnerable', 'X-rays', 'Freezing breath']


In [24]:
# add a new key
contacts['test'] = 'Test value'
contacts

{'Clark': '555-153-0486', 'Lois': '555-594-1647', 'test': 'Test value'}

In [26]:
del (contacts['Loiss'])

KeyError: 'Loiss'

In [None]:
# Remove a key by name:
del (a['Lois'])

# Python raises a KeyError if the key doesn't exist when deleting.
# You can test it beforehand using 'in' (which checks the keys of a dict):
if 'Lois' in a:
    del (a['Lois'])
else:
    print('Lois was already deleted from the dictionary.')

a

# We'll see the try/except block later, which is a better way
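Besides testing with `in`, two dict methods avoid the KeyError on a missing key: `pop()` with a default value and `get()`. A minimal sketch with an illustrative one-entry dict:

```python
a = {'Clark': '555-153-0486'}

# pop() removes the key and returns its value; with a default,
# a missing key returns the default instead of raising a KeyError
removed = a.pop('Lois', None)
print(removed)   # None

# get() reads a value without raising if the key is missing
phone = a.get('Lois', 'unknown')
print(phone)     # unknown

clark = a.pop('Clark')
print(clark, a)  # 555-153-0486 {}
```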

In [5]:
# Dict comprehension:
tuples = [(1, 'a'), (2, 'b')]
d = {key:value for key, value in tuples}
print(d)

# this works because Python unpacks the tuple values:
key, value = ('A','B')
print('key=', key, 'value=', value)

{1: 'a', 2: 'b'}
key= A value= B
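Two related ways to build a dict, sketched with illustrative data: `dict(zip(...))` pairs two lists, and a comprehension can compute the values on the fly.

```python
names = ['Clark', 'Lois']
phones = ['555-153-0486', '555-594-1647']

# zip() pairs the items of both lists; dict() turns the pairs into key/value entries
contacts = dict(zip(names, phones))
print(contacts)   # {'Clark': '555-153-0486', 'Lois': '555-594-1647'}

# The value can be any expression based on the loop variable
lengths = {name: len(name) for name in names}
print(lengths)    # {'Clark': 5, 'Lois': 4}
```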


## Copies




When you assign a mutable object to another name (e.g. a = b), you don't create an independent object but a second reference to the same object!

In [24]:
# Strings are safe: they are NOT mutable, so they cannot be changed through another name.
old = "Hello"
new = old

# If we rebind 'old' to another string, 'new' remains unchanged:
old = "Hi"
new

'Hello'

In [25]:
# A proof strings are not mutable
test = "Hello"
test[0] = 'h'

TypeError: 'str' object does not support item assignment

In [26]:
# BE CAREFUL with lists, as they are mutable. 'new = old' does not copy the list, it creates a second reference to it!!
old = ['Hello', 'world']
new = old

# If we change 'old', 'new' is changed too :-o
old[0] = 'Hi'
new

['Hi', 'world']

In [27]:
# and of course, that 'works' both ways :s
old = ['Hello', 'world']
new = old

new[0] = 'Hi'
old

['Hi', 'world']

In [28]:
# So, use the list's copy() method if you want an independent copy of that list
old = ['Hello', 'world']
new = old.copy()

old[0] = 'Hi'
new

['Hello', 'world']
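One caveat worth sketching: `copy()` makes a shallow copy, so nested mutable items are still shared. The standard `copy.deepcopy()` function copies the nested objects too. The nested list below is only illustrative.

```python
import copy

old = [['Hello'], ['world']]
shallow = old.copy()        # new outer list, same inner lists
deep = copy.deepcopy(old)   # new outer list and new inner lists

old[0][0] = 'Hi'            # modify a nested list

print(shallow)   # [['Hi'], ['world']]     -> the inner list is shared
print(deep)      # [['Hello'], ['world']]  -> fully independent
```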

In [None]:
# Dictionaries are mutable, so same story as lists
old = {'Clark': '555-153-0486', 'Lois': '555-594-1647'}

# take a real copy:
new = old.copy()

# or alternatively
new = dict(old)

In [27]:
# Sets are mutable too. You can use copy()
a = {1, 2, 3}
b = a.copy()
print(b)

{1, 2, 3}


In [14]:
# Tuples? Well, tuples are immutable, so they have no copy() method because you don't need one
a = (1, 2)
b = a.copy()

AttributeError: 'tuple' object has no attribute 'copy'

## Advanced: Memory address

id() returns an integer that identifies an object (in CPython, its memory address). Two names that refer to the same object share the same id! This is in fact a great feature, but dangerous if you don't know about it.

In [29]:
a = ['Hello']
b = a 

# b is a reference to a
print('Before:', id(a), id(b) )

# Let's modify a
a.append('world')

# These objects share the same id, even after we changed them.
print('After :', id(a), id(b) )

Before: 140184015429696 140184015429696
After : 140184015429696 140184015429696


In [30]:
# Using copy(), these are now different objects:
c = a.copy()
print('Before:', id(a), id(c) )

Before: 140184015429696 140184015469440
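Comparing ids by eye is tedious. The `is` operator tests whether two names refer to the same object, while `==` compares the contents. A minimal sketch with the same kind of list:

```python
a = ['Hello']
b = a          # second reference to the same list
c = a.copy()   # new list with the same content

print(a is b)  # True  -> same object
print(a is c)  # False -> different objects
print(a == c)  # True  -> equal content
```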


In [29]:
# Let's try the same with immutable objects (like strings or integers)
a = 'Hello'
b = a

# Interestingly, a and b share the same id: for now they point to the same object (that's efficient)
print('Before:', id(a), id(b) )

# And what if we "modify" a?
a = 'Hi'

# The id is now different. Not a surprise, since we in fact created a new object from scratch and bound a to it.
print('After :', id(a), id(b) ) 

Before: 140708102365632 140708102365632
After : 140708102348144 140708102365632


__________________________________________________
Nicolas Dupuis, Methodology and Innovation (IDAR C&SP), 2020+