# Built-in data structures: containers and sequences

- One of the most popular features of Python is its set of built-in containers for sequences of objects.
  - These are similar to the STL containers discussed in C++.

- Since everything in Python is an object and all objects are referenced in the same way, a container can hold objects of different types.
  - This is unlike STL containers, which hold elements of a single type.
  
- These built-in types and the reference-driven flexibility of Python have made it very popular for data analysis.

- The basic built-in data structures in Python are
  - tuple
  - list
  - set
  - dictionary
  
- Today we only focus on these types
- We will introduce more advanced types when discussing [NumPy](https://www.numpy.org) and [pandas](http://pandas.pydata.org) packages, e.g.
  - ndarrays
  - series
  - time series
  - DataFrame

## Tuples

- sequence of python objects
  - fixed length
  - immutable

To create a tuple, simply separate its elements with a `,`

In [35]:
a = 'lec23', 'lec24', 'lec25'
print(a)


('lec23', 'lec24', 'lec25')


In [36]:
len(a)

3

A tuple can contain objects of different types

In [37]:
b = 'paul', 24, 1.75, 85.3
print(b)

('paul', 24, 1.75, 85.3)


In [38]:
print(a,b)

('lec23', 'lec24', 'lec25') ('paul', 24, 1.75, 85.3)


## access tuple elements
The i-th element of a tuple is accessed with the `[]` operator; indices start at 0

In [39]:
print(a[2])
print(b[3])

lec25
85.3


## empty or one-element tuple

In [40]:
c = ()
print(type(c),c)
d = ('one',)
print(type(d),d)

<class 'tuple'> ()
<class 'tuple'> ('one',)


Note that the `,` is critical to distinguish a one-element tuple from an ordinary value in parentheses.

In [None]:
e = ('valore')
print(type(e), e)
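
The cell above can be checked with a small sketch. The names `f` and `g` are illustrative and not part of the lecture; the point is only that the comma, not the parentheses, makes the tuple.

```python
# Parentheses alone only group an expression; the comma creates the tuple.
e = ('valore')    # a plain string written inside parentheses
f = ('valore',)   # a one-element tuple
g = 'valore',     # also a one-element tuple: the parentheses are optional

print(type(e).__name__, type(f).__name__, type(g).__name__)
```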

## conversion to tuple

In [None]:
tup = range(10)
print(len(tup))
print(tup)

Note how `tup` is not a tuple but a `range` object: it describes the sequence 0, 1, ..., 9 without storing every element.

If you want a tuple you have to explicitly convert the output of `range(10)` to a tuple

In [None]:
tup = tuple(range(10))
print(len(tup))
print(tup)
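
As a minimal sketch of the difference (the variable `r` is introduced here only for illustration), a `range` object already supports `len` and indexing, and the conversion with `tuple` is what actually builds all the elements:

```python
r = range(10)
print(type(r).__name__)   # the type is 'range', not 'tuple'
print(len(r), r[3])       # length and indexing work without a conversion

tup = tuple(r)            # explicit conversion stores all ten elements
print(type(tup).__name__, tup)
```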

Iterating over a tuple is easy

In [None]:
for i in tup:
    print(i)

## converting strings to tuples


In [None]:
tup = tuple("hello world")
print(tup)
for i in tup:
    print(i)

## Tuples can contain any object
even a function is a valid object

In [31]:
def myprod(a,b):
    return a*b

tup = (1, 'name', myprod)
print(tup)

(1, 'name', <function myprod at 0x116ae2ea0>)


In [None]:
tup[2](3,4)

a tuple can contain tuples as its elements

In [None]:
x = a,b,c, tup
print(x)

In [None]:
print(x[2])
print(x[0])

## Tuple is immutable
You can bind a variable to a new tuple but you cannot change an element of a tuple

In [None]:
print(b)

In [None]:
b[0] = 'one'

In [None]:
y = 'one', a, (2,3)
print(y)
print(b)

In [None]:
b = y
print(b)
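
The failing assignment can also be caught explicitly. The sketch below is illustrative: the tuple `t` is not part of the lecture, and it is added to show that immutability concerns the references stored in the tuple, so a mutable element such as a list can still be changed in place.

```python
b = ('paul', 24, 1.75, 85.3)
try:
    b[0] = 'one'                 # not allowed: tuples are immutable
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# The tuple always refers to the same list object,
# but the list itself is mutable.
t = ('scores', [30, 28])
t[1].append(25)
print(t)
```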

## Tuple methods
Because the content and size of a tuple are immutable, tuples have very few methods (check `dir(tuple)`).

A very useful one is `count()`, which returns how many times a value occurs

In [None]:
grades = (30, 22, 24, 23, 30, 18, 24, 27, 28, 28, 25, 24, 22, 30, 30, 18, 20)
grades.count(30)
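
The other public tuple method is `index()`. A short sketch with the same grades stored in a tuple (the names `n_top` and `first_18` are illustrative):

```python
grades = (30, 22, 24, 23, 30, 18, 24, 27, 28, 28, 25, 24, 22, 30, 30, 18, 20)

n_top = grades.count(30)      # number of occurrences of the value 30
first_18 = grades.index(18)   # position of the first occurrence of 18
print(n_top, first_18)
```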

## Lists
- Lists are also a collection of objects but unlike tuples they are mutable
  - variable length
  - each element can be modified
  

In [27]:
alist = [2,3,4]
print(alist)
print(alist[2])
alist[2] = -3
print(alist)

[2, 3, 4]
4
[2, 3, -3]


Lists (and tuples) are protected against out-of-range indices: an invalid index raises an `IndexError`

In [29]:
print(len(alist))
alist[3]

3


IndexError: list index out of range

A list can contain any type of data. In this example the list is made of a string, an int, a float, a function, a tuple, and a list

In [32]:
alist = ['one', 2, 3.24, myprod, (23,24), ['lec1', 'lec2']]
print(alist)

['one', 2, 3.24, <function myprod at 0x116ae2ea0>, (23, 24), ['lec1', 'lec2']]


## lists and tuples
- a list is created with square brackets `[]` or the explicit type `list`
- a tuple is created with commas, usually enclosed in `()`, or with the explicit type `tuple`
- Lists and tuples are semantically similar
  - many functions can take a tuple or a list
  
- Lists are used in data analysis to store data from iterators or generators

In [33]:
values  = range(-3,10, 2)
print(values)
list(values)

range(-3, 10, 2)


[-3, -1, 1, 3, 5, 7, 9]

Note that as with tuples, you have to convert the output of `range` to be a list.
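
The same conversion applies to generators. In this sketch the generator expression `squares` is an assumption chosen for illustration; it shows why the values are stored in a list when they are needed more than once.

```python
squares = (n * n for n in range(5))   # generator: values are produced on demand

stored = list(squares)    # consume the generator and keep the values
leftover = list(squares)  # the generator is now exhausted

print(stored)
print(leftover)
```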

## list from tuple
you can create a list from a tuple by explicit conversion 

In [51]:
print(a)
blist = list(a)
print(blist)
blist[2] = 'lec28'
blist

('lec23', 'lec24', 'lec25')
['lec23', 'lec24', 'lec25']


['lec23', 'lec24', 'lec28']

## Manipulating lists

### adding and removing elements
to add an element at the end of the list

In [42]:
clist = ['one', 2, 3.14, 4, 'five']
clist.append(6)
print(clist)

['one', 2, 3.14, 4, 'five', 6]


We can also insert a value at a specific location by providing the index

In [43]:
clist.insert(2, 'two')
clist

['one', 2, 'two', 3.14, 4, 'five', 6]

note how the new element is inserted __before__ the indicated index. 

You can also remove an element from the list at a specific location with `pop`

In [44]:
clist.pop(2)
clist

['one', 2, 3.14, 4, 'five', 6]

The `pop` method returns the element it removed, whereas `insert` returns `None`.

With `pop` it is therefore easy to see the value you have removed from the list

In [45]:
x = clist.insert(2, 'test')
print(x)
x = clist.pop(2)
print(x)
print(clist)

None
test
['one', 2, 3.14, 4, 'five', 6]


### removing by value
Although not very efficient, you can remove a given value from the list with `remove`. Only the first occurrence is removed: Python goes linearly through the elements until it finds it.

In [48]:
print(4 in clist)
print(clist)

True
['one', 2, 3.14, 4, 'five', 6]


In [49]:
if 4 in clist:
    clist.remove(4)
print(clist)


['one', 2, 3.14, 'five', 6]
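
The `in` test before `remove` matters because removing a missing value raises a `ValueError`. A minimal sketch with an illustrative list `values`:

```python
values = [4, 'x', 4, 'y']

values.remove(4)       # only the first 4 is removed
print(values)

try:
    values.remove('missing')   # the value is not in the list
except ValueError as err:
    print("ValueError:", err)
```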


### combining lists
You can use `+` to concatenate lists. The result is a new list and the operands are left unchanged

In [56]:
print(blist)
print(clist)
all = blist + ['id', 'name', 'major']
print(all)

['lec23', 'lec24', 'lec28']
['one', 2, 3.14, 'five', 6]
['lec23', 'lec24', 'lec28', 'id', 'name', 'major']


Note that this is very different from the following, where `blist` becomes a single element of the new list

In [60]:
all = [blist,'id', 'name', 'major']
print(all)

[['lec23', 'lec24', 'lec28'], 'id', 'name', 'major']


The most efficient way to extend an existing list is with `extend`. It takes an iterable (e.g. another list) and appends each of its elements in place

In [61]:
all.extend([2,3,4, 'test', 'python'])
print(all)

[['lec23', 'lec24', 'lec28'], 'id', 'name', 'major', 2, 3, 4, 'test', 'python']
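
The three ways of combining lists can be compared side by side. The names `base`, `extra`, `combined`, `nested` and `flat` are illustrative only:

```python
base = ['lec23', 'lec24']
extra = ['id', 'name']

combined = base + extra     # new list; base and extra are unchanged

nested = list(base)
nested.append(extra)        # extra is added as ONE element (a nested list)

flat = list(base)
flat.extend(extra)          # each element of extra is added in place

print(combined)
print(nested)
print(flat)
```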


### sorting a list
Lists whose elements can be compared to each other can be sorted. Sorting a list that mixes incomparable types, such as strings and lists, raises a `TypeError`

In [62]:
all.sort()


TypeError: '<' not supported between instances of 'str' and 'list'
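
A sketch of what "comparable" means in practice. The lists `mixed` and `numbers` are illustrative; sorting by `key=str` is only one possible workaround, and it orders the elements by their text representation rather than by value.

```python
mixed = ['id', 2, ['lec23']]
try:
    mixed.sort()                 # str, int and list cannot be compared
except TypeError as err:
    print("TypeError:", err)

numbers = [3, 1.5, 2]
numbers.sort()                   # int and float can be compared
print(numbers)

by_text = sorted(mixed, key=str) # compare the string form of each element
print(by_text)
```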

In [90]:
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
print(months)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']


In [67]:
months.sort()
print(months)

['april', 'august', 'december', 'february', 'january', 'july', 'june', 'march', 'may', 'november', 'october', 'september']


In [69]:
months.sort(key=len)
print(months)

['may', 'july', 'june', 'april', 'march', 'august', 'january', 'october', 'december', 'february', 'november', 'september']


In [70]:
help(list.sort)

Help on method_descriptor:

sort(self, /, *, key=None, reverse=False)
    Stable sort *IN PLACE*.



### sort vs sorted
In the example above `sort()` is applied to the list object and modifies it in place. Often we prefer to keep the data intact and obtain a new sorted copy, which is what the built-in function `sorted()` returns

In [134]:
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
print(months)
sorted_months = sorted(months)
print(sorted_months)
sorted_months = sorted(months, key=len)
print(sorted_months)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
['april', 'august', 'december', 'february', 'january', 'july', 'june', 'march', 'may', 'november', 'october', 'september']
['may', 'june', 'july', 'march', 'april', 'august', 'january', 'october', 'february', 'november', 'december', 'september']
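
One practical consequence, sketched here with a shortened list: `sort()` returns `None`, so its result should not be assigned, while `sorted()` accepts the same `key` and `reverse` arguments and returns a new list.

```python
months = ['january', 'february', 'march', 'april']

result = months.sort()       # sorts in place and returns None
print(result, months)

# sorted() leaves its argument untouched and returns a new list
by_len_desc = sorted(months, key=len, reverse=True)
print(by_len_desc)
```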


### lists and strings

In [135]:
chars = list("in a far away galaxy")
print(chars)
chars.count(' ')


['i', 'n', ' ', 'a', ' ', 'f', 'a', 'r', ' ', 'a', 'w', 'a', 'y', ' ', 'g', 'a', 'l', 'a', 'x', 'y']


4

### enumerate function
A useful Python function to keep track of the index while iterating over a collection, e.g. a list.

See how the `for` loop takes advantage of `enumerate`: each iteration yields an (index, element) pair

In [136]:
for i,m in enumerate(months):
    print("month %-2d: %s"%(i+1,m))

month 1 : january
month 2 : february
month 3 : march
month 4 : april
month 5 : may
month 6 : june
month 7 : july
month 8 : august
month 9 : september
month 10: october
month 11: november
month 12: december
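
`enumerate` also accepts a `start` argument, which avoids writing `i+1` by hand. A minimal sketch with a shortened list (the list `lines` is illustrative):

```python
months = ['january', 'february', 'march']

lines = []
for i, m in enumerate(months, start=1):   # counting starts at 1 instead of 0
    lines.append("month %-2d: %s" % (i, m))

print("\n".join(lines))
```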


### slicing
One of the most popular features in data analysis with Python is slicing: accessing a subset of a collection by specifying a range of indices `[start:stop]`. The element at `start` is included, the one at `stop` is not

In [137]:
print(months[:3])

['january', 'february', 'march']


In [138]:
print(months[4:6])

['may', 'june']


In [139]:
print(months[5:])
print(len(months[6:]))

['june', 'july', 'august', 'september', 'october', 'november', 'december']
6
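
Slices also accept negative indices and an optional step, and a slice of a list is a new list. A sketch under these assumptions, with illustrative variable names:

```python
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
          'august', 'september', 'october', 'november', 'december']

first_quarter = months[0:3]   # indices 0, 1, 2 (3 is excluded)
every_third = months[::3]     # step of 3 over the whole list
last_two = months[-2:]        # negative indices count from the end

first_quarter[0] = 'JANUARY'  # changes the slice only, not months
print(first_quarter, every_third, last_two)
print(months[0])
```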


### references and lists
In Python every variable is a reference to an object, so assigning a list to a new name does not copy it. This is shown explicitly in this example

In [140]:
newlist = months
print(newlist)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']


In [141]:
newlist.append('NewMonth')
print(months)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december', 'NewMonth']


so `newlist` __is not a new copy__. `newlist` and `months` are simply two references to the same list object!

To get a new copy you have to use the explicit conversion

In [142]:
newlist = list(months)
newlist.append('CrazyMonth')
print(months)
print(newlist)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december', 'NewMonth']
['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december', 'NewMonth', 'CrazyMonth']
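
Whether two names refer to the same object can be checked with the `is` operator. The sketch below uses illustrative names and shows three equivalent ways to make a (shallow) copy of a list:

```python
months = ['january', 'february']

alias = months            # second reference to the same list object
copy1 = list(months)      # explicit conversion: new list
copy2 = months[:]         # full slice: new list
copy3 = months.copy()     # list method: new list

alias.append('march')     # visible through both alias and months

print(alias is months, copy1 is months)
print(months, copy1)
```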


## Dictionaries
Dictionaries are very similar to the associative container `map<K,V>` discussed in C++: each value is stored under, and retrieved by, a unique key. They are also known as __hashes__ or __hash tables__ in other languages, e.g. `perl`. The `{}` operator is used to create a `dict` object

In [143]:
days = {}
for m in months:
    days[m] = int(input("# of days in {0}: ".format(m)))
print(days)

# of days in january: 


ValueError: invalid literal for int() with base 10: ''
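
The `ValueError` above appears because the cell was run without typing a number. As a non-interactive alternative, the dictionary can be built from two lists with `zip`. This is only a sketch; the list `n_days` (day counts for a non-leap year) is introduced here for illustration.

```python
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
          'august', 'september', 'october', 'november', 'december']
n_days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year

# zip pairs each month with its number of days; dict turns the pairs
# into key/value entries
days = dict(zip(months, n_days))
print(days)
```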

You can also create a dictionary by hand

In [144]:
dict1 = { 'a' : 1, 'b' : (1,2,3), 'c' : ['one','two'], 'd' : 'example', }
print(dict1)

{'a': 1, 'b': (1, 2, 3), 'c': ['one', 'two'], 'd': 'example'}


In [145]:
students = { 'rio': {'name':'john', 'age':23, 'id':123456}, 'nairobi':{'name':'susan', 'id':123123, 'age':21},  'tokyo':{'name':'maria', 'id':123651, 'age':24}, }
print(students)

{'rio': {'name': 'john', 'age': 23, 'id': 123456}, 'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}, 'tokyo': {'name': 'maria', 'id': 123651, 'age': 24}}


you can add a new value for a new key

In [146]:
students['oslo'] = {'name':'', 'age':30, 'id':111111} 
print(students)

{'rio': {'name': 'john', 'age': 23, 'id': 123456}, 'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}, 'tokyo': {'name': 'maria', 'id': 123651, 'age': 24}, 'oslo': {'name': '', 'age': 30, 'id': 111111}}


If the key already exists, its value is replaced. This is similar to modifying an element of a list. Note that the whole value is replaced: the new entry for `'oslo'` below no longer has an `'id'`

In [147]:
students['oslo'] = {'name':'sergey', 'age':22} 
print(students)

{'rio': {'name': 'john', 'age': 23, 'id': 123456}, 'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}, 'tokyo': {'name': 'maria', 'id': 123651, 'age': 24}, 'oslo': {'name': 'sergey', 'age': 22}}


You can check if the dictionary contains a key

In [148]:
while True:
    name = input("name (press return to end): ")  
    if(name==''): break
    if name not in students:
        print("{0} not in the list. sorry.".format(name))
    else: 
        print("name: {0}\t age: {1}\t id: {2}".format(students[name]['name'], students[name]['age'], students[name]['id']))
    

name (press return to end): 


This if-else structure is very common with dictionaries, so Python provides a dedicated method
```python
value = some_dict.get(key, default_value)
```

In [149]:
while True:
    name = input("name (press return to end): ")  
    if(name==''): break
    val = students.get(name, "not found")
    print(val)

name (press return to end): 
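
Since the cell above needs keyboard input, here is a non-interactive sketch of the same idea. The dictionary is a shortened copy of `students`, and the key `'berlin'` is a made-up example of a missing key.

```python
students = {'rio': {'name': 'john', 'age': 23, 'id': 123456},
            'oslo': {'name': 'sergey', 'age': 22}}

found = students.get('rio', "not found")       # key exists: its value
missing = students.get('berlin', "not found")  # key missing: the default

# get also works on the inner dictionaries, e.g. for the missing 'id'
oslo_id = students['oslo'].get('id', 'n/a')

print(found)
print(missing)
print(oslo_id)
```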


### Keys and values
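You can also obtain all keys and all values of a dictionary. The `keys()` and `values()` methods return view objects, which can be converted to lists with `list`

In [150]:
print(type(students.keys()))
print(students.keys())
print(list(students.keys()))

<class 'dict_keys'>
dict_keys(['rio', 'nairobi', 'tokyo', 'oslo'])
['rio', 'nairobi', 'tokyo', 'oslo']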


In [151]:
print( list( students.values()))

[{'name': 'john', 'age': 23, 'id': 123456}, {'name': 'susan', 'id': 123123, 'age': 21}, {'name': 'maria', 'id': 123651, 'age': 24}, {'name': 'sergey', 'age': 22}]
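
Keys and values can also be visited together with `items()`, which yields (key, value) pairs. A minimal sketch on a shortened copy of `students` (the list `names` is illustrative):

```python
students = {'rio': {'name': 'john', 'age': 23, 'id': 123456},
            'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}}

names = []
for key, record in students.items():   # one (key, value) pair per iteration
    print("{0}: {1}".format(key, record['name']))
    names.append(record['name'])
```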


# Using a list for plotting

## motion of a body under gravity
We want to simulate the motion of a body under gravity. This is one of the first exercises in __Laboratorio di Calcolo__. This time we also want to quickly plot the trajectory to check our equations.

In [152]:
%matplotlib notebook
import matplotlib.pyplot as plt
import math

# initial conditions
g = 9.8
h = 0.
theta = (45./180.)*math.pi
v0 = 10.
dt=0.1
        
#compute velocity components
v0x = v0*math.cos(theta)
v0y = v0*math.sin(theta)
print("v0_x: %.3f m/s \t v0_y: %.3f m/s"%(v0x,v0y))

t=0.
x=[]
y=[]
xi=0
yi=h

while yi>=0:
    x.append(xi)
    y.append(yi)
    t+=dt
    xi=v0x*t
    yi=h+v0y*t-0.5*g*t*t

#print(x,y)
plt.plot(x,y)

v0_x: 7.071 m/s 	 v0_y: 7.071 m/s



[<matplotlib.lines.Line2D at 0x11cde68d0>]
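
Besides plotting, the lists can be checked against the closed-form solution. The sketch below assumes h = 0 and the same equations used in the loop, x = v0x t and y = v0y t - g t²/2, which give a flight time of 2 v0y / g. The names `t_flight` and `x_range` are introduced here for illustration, and the plot is omitted to keep the example self-contained.

```python
import math

g = 9.8
v0 = 10.
theta = math.radians(45.)   # same angle as above, converted to radians
dt = 0.1

v0x = v0 * math.cos(theta)
v0y = v0 * math.sin(theta)

# closed-form results for h = 0
t_flight = 2 * v0y / g
x_range = v0x * t_flight

# same stepping loop as in the cell above
t = 0.
x = []
y = []
xi = 0.
yi = 0.
while yi >= 0:
    x.append(xi)
    y.append(yi)
    t += dt
    xi = v0x * t
    yi = v0y * t - 0.5 * g * t * t

# the last stored point must lie within one time step of the analytic range
print("analytic range: %.3f m, last simulated x: %.3f m" % (x_range, x[-1]))
```

A smaller `dt` brings the last simulated point closer to the analytic range, at the cost of longer lists.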

To make it more flexible we could ask the user to provide the initial conditions

In [153]:
%matplotlib notebook
import matplotlib.pyplot as plt
import math

# initial conditions
g = 9.8
h = 0.

v0 = 10.
dt=0.1

theta = 45.
while True:
    theta = float(input("angle theta in [0,90] degrees: "))
    if(theta>0 and theta<90): break
theta = (theta/180.)*math.pi

#compute velocity components
v0x = v0*math.cos(theta)
v0y = v0*math.sin(theta)
print("v0_x: %.3f m/s \t v0_y: %.3f m/s"%(v0x,v0y))

t=0.
x=[]
y=[]
xi=0
yi=h

while yi>=0:
    x.append(xi)
    y.append(yi)
    t+=dt
    xi=v0x*t
    yi=h+v0y*t-0.5*g*t*t

#print(x,y)
plt.plot(x,y)

angle theta in [0,90] degrees: 


ValueError: could not convert string to float: 

### To make it more user friendly we could provide a default value for the angle!
The prompt shows the default value, which is used when the user presses return without typing anything

In [154]:
%matplotlib notebook
import matplotlib.pyplot as plt
import math

# initial conditions
g = 9.8
h = 0.

v0 = 10.
dt=0.1

theta = 45.
while True:
    x = input("angle theta in [0,90] degrees (press return for {0} degree): ".format(theta))
    if x == "" : break
    theta  = float(x)
    if(theta>0 and theta<90): break
theta = (theta/90.)*math.pi/2.
 
#compute velocity components
v0x = v0*math.cos(theta)
v0y = v0*math.sin(theta)
print("v0_x: %.3f m/s \t v0_y: %.3f m/s"%(v0x,v0y))

t=0.
x=[]
y=[]
xi=0
yi=h

while yi>=0:
    x.append(xi)
    y.append(yi)
    t+=dt
    xi=v0x*t
    yi=h+v0y*t-0.5*g*t*t

#print(x,y)
plt.plot(x,y)

angle theta in [0,90] degrees (press return for 45.0 degree): 
v0_x: 7.071 m/s 	 v0_y: 7.071 m/s



[<matplotlib.lines.Line2D at 0x11d9d88d0>]
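
The loop above still stops with a `ValueError` if the user types something that is not a number. One possible way to handle this is to move the parsing into a small function that can be tested without keyboard input. The helper `parse_angle` is hypothetical and not part of the lecture code; returning `None` for invalid input is a design choice made for this sketch.

```python
def parse_angle(text, default=45.0):
    """Return the angle in degrees, the default for empty input,
    or None if the text is not a valid angle in (0, 90)."""
    text = text.strip()
    if text == "":
        return default           # user pressed return: use the default
    try:
        value = float(text)
    except ValueError:
        return None              # not a number
    if 0 < value < 90:
        return value
    return None                  # a number, but outside the allowed range


for answer in ["", "30", "abc", "120"]:
    print(repr(answer), "->", parse_angle(answer))
```

In the interactive version the `while True` loop would call `parse_angle(input(...))` and repeat the question while the result is `None`.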

We now change all the variables to be configurable by the user

In [123]:
%matplotlib notebook
import matplotlib.pyplot as plt
import math

g = 9.8

h = 0.
while True:
    x = input("initial height h in m: (press return for h = 0 m): ")
    if x == "":  break
    h = float(x)
    if(h>=0): break


theta = 20.
while True:
    x = input("angle theta in [0,90] degrees (press return for {0} degree): ".format(theta))
    if x == "" : break
    theta  = float(x)
    if(theta>0 and theta<90): break
theta = (theta/90.)*math.pi/2.
        

v0 = 10.
while True:
    x = input("insert v_0 > 0 in m/s (press return for {0} m/s): ".format(v0))
    if x == "":  break
    v0 = float(x)
    if(v0>0): break

dt=0.1
while True:
    x = input("insert dt > 0 in sec (press return for {0} sec): ".format(dt))
    if x == "": break
    dt = float(x)
    if(dt>0): break
      
        
v0x = v0*math.cos(theta)
v0y = v0*math.sin(theta)

print("v0_x: %.3f m/s \t v0_y: %.3f m/s"%(v0x,v0y))

t=0.
x=[]
y=[]
xi=0
yi=h

while yi>=0:
    x.append(xi)
    y.append(yi)
    t+=dt
    xi=v0x*t
    yi=h+v0y*t-0.5*g*t*t

#print(x,y)
plt.plot(x,y)

# we also make the plot nicer
plt.title('motion under gravity')
plt.xlabel("x [m]")
plt.ylabel("y [m]")
plt.grid(True)
plt.xlim(-0.1, max(x)*1.1)
plt.ylim(-0.1,max(y)*1.10)


initial height h in m: (press return for h = 0 m): 
angle theta in [0,90] degrees (press return for 20.0 degree): 
insert v_0 > 0 in m/s (press return for 10.0 m/s): 
insert dt > 0 in sec (press return for 0.1 sec): 
v0_x: 9.397 m/s 	 v0_y: 3.420 m/s



(-0.1, 0.6435664729747067)