# Data Structures

This notebook introduces different data structures (containers) in which we can store information in Python. Often the same information can be stored in more than one data structure. The right choice depends on the application and, to a great extent, on your experience and coding preferences.

In [3]:
# Lists (ordered, mutable sequences; the closest built-in equivalent to arrays in other languages)

mylist = [1,2,3]  # A row vector
print(mylist)
type(mylist)

# What happens if you replace one of the list elements above by another element?
# Can you mix words with numbers?
# Can you nest multiple lists?

[1, 2, 3]


list
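
The three questions in the cell above can be explored with a short sketch. The name `sample` is hypothetical and is used only for this illustration, so that `mylist` is left untouched.

```python
# Hypothetical list used only for illustration
sample = [1, 2, 3]

# Replace an element in place (lists are mutable)
sample[1] = 'two'
print(sample)

# Lists can mix types and can hold other lists
sample.append([4.0, 'five'])
print(sample)

# The nested list counts as a single element
print(len(sample))
```

Replacing an element changes the existing list rather than creating a new one, which is what "mutable" means in practice.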

In [24]:
# Lists are mutable, we can re-define them
mylist = [2000,2001,2002]

In [34]:
# Lists can be nested
mylist = [2000,2001,2002,2003,['apple','orange']]
print(mylist[0])
print(mylist[4])
print(mylist[4][0])

2000
['apple', 'orange']
apple


In [None]:
# Save part of the list in a different variable
# (this binds a second name to the same nested list; it does not make a copy)
fruits = mylist[4]
print(fruits[0])
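
Assigning a nested list to a new variable does not duplicate it. A minimal sketch of the difference between a second name and a real copy, using the hypothetical names `basket`, `alias`, and `independent`:

```python
# Hypothetical data mirroring the nested list above
basket = [2000, 2001, ['apple', 'orange']]

alias = basket[2]                # same list object, no copy is made
independent = basket[2].copy()   # shallow copy, a new list object

alias.append('pear')             # also changes basket[2]
independent.append('kiwi')       # leaves basket untouched

print(basket[2])
print(alias is basket[2])
print(independent)
```

Use `.copy()` (or `list(...)`) when later changes should not affect the original list.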

In [41]:
# List methods
print(mylist.count(2000)) # Count specific element. Takes only one argument
mylist.index(2001) 
mylist.append([2003,2004])
print(mylist)

1


[2000, 2001, 2002, 2003, ['apple', 'orange'], [2003, 2004]]
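
The output above shows that `append` adds its argument as one element, so the list `[2003, 2004]` ends up nested. If the goal is to add the items individually, `extend` is the usual alternative. A small sketch with a hypothetical list named `years`:

```python
# Hypothetical list used only for illustration
years = [2000, 2001]

years.append([2002, 2003])   # adds ONE element, which is itself a list
print(years)
print(len(years))

years = [2000, 2001]
years.extend([2002, 2003])   # adds each item individually
print(years)
print(years.index(2002))     # position of the first match
```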

In [45]:
# Will this work? count() only checks top-level elements, so this returns 0
mylist.count('apple')

# Solution: count inside the nested list
mylist[4].count('apple')

1

In [31]:
# Clear list and check that list was cleared (it should print empty brackets)
mylist.clear()
print(mylist)

[]


In [17]:
# Creating a matrix or 2D array
M = [[1, 4, 5],
    [-5, 8, 9]]
print(M)

[[1, 4, 5], [-5, 8, 9]]
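
A list of lists is indexed one level at a time: first the row, then the column. The sketch below reuses the same values under the hypothetical name `grid` and shows one way to pull out a column, which nested lists do not support directly.

```python
# Same values as the matrix above, under a hypothetical name
grid = [[1, 4, 5],
        [-5, 8, 9]]

print(grid[1])       # second row
print(grid[1][2])    # second row, third column

# A column has to be collected element by element
second_column = [row[1] for row in grid]
print(second_column)

# Number of rows and columns (assumes every row has the same length)
n_rows, n_cols = len(grid), len(grid[0])
print(n_rows, n_cols)
```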


In [23]:
# Slicing
# Source: https://stackoverflow.com/questions/509211/understanding-slice-notation

#                 +---+---+---+---+---+---+
#                 | P | y | t | h | o | n |
#                 +---+---+---+---+---+---+
# Slice position: 0   1   2   3   4   5   6
# Index position:   0   1   2   3   4   5

# sliceable[start:stop:step]

# Definitions
# start: the beginning index of the slice, it will include the element at this index unless 
# it is the same as stop, defaults to 0, i.e. the first index. If it's negative, it means to 
# start n items from the end.

# stop: the ending index of the slice, it does not include the element at this index, 
# defaults to length of the sequence being sliced, that is, up to and including the end.

# step: the amount by which the index increases, defaults to 1. 
# If it's negative, you're slicing over the iterable in reverse.

p = ['P','y','t','h','o','n']

# Why the two sets of numbers:
# indexing gives items, not lists
p[0]
p[5]
print(type(p[5]))

# Slicing gives lists
p[0:1]
p[0:2]
print(type(p[0:2]))

# Get last 3 letters
print(p[-3:]) # This means: "3rd from the end, to the end."

# Technically this is what is going on behind the Python interpreter
# sliceable[start:stop:step]
print(p[-3:len(p):1])

# The colon is what tells Python you're giving it a slice and not a regular index.

# A stop of -1 excludes the last element, so this returns only 'h' and 'o'.
# To get the last element on its own, use plain indexing: p[-1]
print(p[-3:-1])


<class 'str'>
<class 'list'>
['h', 'o', 'n']
['h', 'o', 'n']
['h', 'o']
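
The `step` part of `sliceable[start:stop:step]` is described above but not exercised. A brief sketch, using a hypothetical list named `letters` with the same content as `p`:

```python
# Hypothetical list with the same content as p
letters = ['P', 'y', 't', 'h', 'o', 'n']

every_other = letters[::2]    # start and stop default to the ends
backwards = letters[::-1]     # a negative step walks in reverse
middle = letters[1:-1]        # drop the first and last items

print(every_other)
print(backwards)
print(middle)
print(letters[-1])            # indexing (no colon) returns the item itself
```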

In [3]:
# Tuples (immutable)
mytuple = (1,2,3)
type(mytuple)

mytuple[0] = 11  # Raises a TypeError: tuples are immutable, so items cannot be reassigned.

TypeError: 'tuple' object does not support item assignment
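
Because a tuple cannot be modified, the usual options are to unpack its values or to build a new tuple. A minimal sketch with the hypothetical names `point` and `updated`:

```python
# Hypothetical tuple used only for illustration
point = (1, 2, 3)

# Item assignment fails; catching the error keeps the cell running
try:
    point[0] = 11
except TypeError as err:
    print('Caught:', err)

# Unpacking assigns each item to its own variable
x, y, z = point
print(x, y, z)

# "Changing" a tuple means building a new one
updated = (11,) + point[1:]
print(updated)
print(point)   # the original is unchanged
```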

In [6]:
# A dictionary (key:value pairs, similar to JSON)
myDict = {'country': ['Argentina','Brazil','Uruguay','USA'], 
          'capital': ['Buenos Aires','Brasilia','Montevideo','Washington D.C.'],
          'airTemp': [28, 12, 14, 35]}

print(myDict)
print(myDict['country'])
print(myDict['airTemp'][3])
type(myDict)
print("My country is", myDict['country'][0], "and the capital is", myDict['capital'][0])

{'country': ['Argentina', 'Brazil', 'Uruguay', 'USA'], 'capital': ['Buenos Aires', 'Brasilia', 'Montevideo', 'Washington D.C.'], 'airTemp': [28, 12, 14, 35]}
['Argentina', 'Brazil', 'Uruguay', 'USA']
35
My country is Argentina and the capital is Buenos Aires
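
When a dictionary stores parallel lists like this one, the lists can be walked together with `zip`. The sketch below uses a shortened, hypothetical dictionary named `weather`; the `visited` key and the `rainfall` lookup are made up for illustration.

```python
# Hypothetical dictionary with the same layout as myDict
weather = {'country': ['Argentina', 'Brazil'],
           'capital': ['Buenos Aires', 'Brasilia'],
           'airTemp': [28, 12]}

# Walk the parallel lists together, one record at a time
for country, capital, temp in zip(weather['country'],
                                  weather['capital'],
                                  weather['airTemp']):
    print(country, capital, temp)

# Add a new key (hypothetical values)
weather['visited'] = [True, False]
print(list(weather.keys()))

# get() returns None for a missing key instead of raising KeyError
print(weather.get('rainfall'))
```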


In [45]:
# Sets (set of unique values, test membership)

states = ['Kansas', 'Texas', 'California', 'Texas', 'Alaska', 'Kansas']
uniqueStates = set(states)  # Duplicates are dropped when converting to a set
print(uniqueStates)

print('Kansas' in uniqueStates) # Testing membership (True)
print('Iowa' in uniqueStates)   # Testing membership (False)

{'California', 'Kansas', 'Texas', 'Alaska'}
True
False
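
Sets are unordered, so the printed order of the states above can differ from one run to the next. A short sketch of adding elements and getting a stable order, using the hypothetical names `places` and `unique_places`:

```python
# Hypothetical list with the same values as states
places = ['Kansas', 'Texas', 'California', 'Texas', 'Alaska', 'Kansas']
unique_places = set(places)

unique_places.add('Iowa')    # new element
unique_places.add('Texas')   # already present, so nothing changes
print(len(unique_places))

# Sort when a predictable order is needed; the result is a list
print(sorted(unique_places))
```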


In [8]:
dna1 = set('ATTTGAATTA') # DNA sequence 1
dna2 = set('GGATTCGCGT') # DNA sequence 2

# Print unique bases in each DNA sequence
print(dna1)
print(dna2)

dna1 - dna2   # bases in dna1 BUT NOT in dna2
dna1 | dna2   # bases in dna1 OR in dna2
dna1 & dna2   # bases in dna1 AND in dna2
dna1 ^ dna2   # bases in dna1 OR dna2, BUT NOT BOTH

{'G', 'A', 'T'}
{'G', 'A', 'C', 'T'}


{'C'}
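
Only the last expression in a notebook cell is displayed, which is why the cell above shows a single result. To see all four operations, each one can be printed. This sketch repeats the same sequences under the hypothetical names `seq1` and `seq2`, and sorts each result so the order is predictable.

```python
# Same sequences as above, under hypothetical names
seq1 = set('ATTTGAATTA')
seq2 = set('GGATTCGCGT')

print(sorted(seq1 - seq2))   # difference: in seq1 but not in seq2
print(sorted(seq1 | seq2))   # union: in either set
print(sorted(seq1 & seq2))   # intersection: in both sets
print(sorted(seq1 ^ seq2))   # symmetric difference: in exactly one set
print(seq1 <= seq2)          # subset test: every base of seq1 is in seq2
```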

In [29]:
# String
mystr = 'this is a string'
print(mystr)
type(mystr)

this is a string


str
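
Strings behave like sequences of characters, so the indexing and slicing rules shown for lists apply to them as well. Unlike lists, strings are immutable. A minimal sketch with a hypothetical variable named `text`:

```python
# Hypothetical string with the same content as mystr
text = 'this is a string'

print(text[0])        # first character
print(text[-6:])      # last six characters
print(text.split())   # split on whitespace into a list of words
print(text.upper())   # returns a new string; text itself is unchanged

# Item assignment is not allowed on strings
try:
    text[0] = 'T'
except TypeError as err:
    print('Caught:', err)
```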

In [32]:
# An integer
mynum = 3
print(mynum)
type(mynum)

3


int

In [33]:
# A floating point number
myfloat = 4.0
print(myfloat)
type(myfloat)


4.0


float
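
Integers and floats can be mixed in arithmetic and converted into each other. The sketch below uses the hypothetical names `count` and `ratio` to show the most common cases.

```python
# Hypothetical values used only for illustration
count = 3      # int
ratio = 4.0    # float

total = count + ratio
print(total, type(total))   # mixing int and float gives a float

print(7 / 2)                # true division always returns a float
print(7 // 2)               # floor division of two ints returns an int

print(int(4.9))             # int() truncates toward zero, it does not round
print(float(count))
```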

In [12]:
# Boolean
myBool = [True,True,False,True]
print(type(myBool))
print(type(myBool[0]))

<class 'list'>
<class 'bool'>
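
In Python, `bool` is a subclass of `int`, so `True` counts as 1 and `False` as 0. That makes it easy to summarize a list of booleans. A short sketch with a hypothetical list named `flags`:

```python
# Hypothetical list with the same values as myBool
flags = [True, True, False, True]

print(sum(flags))              # number of True values
print(any(flags))              # True if at least one item is True
print(all(flags))              # True only if every item is True
print(isinstance(True, int))   # bool is a subclass of int
```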
