## Data Structures
Last week we looked at storing single pieces of data, such as an age or length measurement.
This week we are going to look at the data structures Python uses to organise large amounts of data.

***
### Lists
A list is a collection of arbitrary objects, somewhat akin to an array in many other programming languages but more flexible. Lists are defined in Python by enclosing a comma-separated sequence of objects in square brackets ([]), as shown below:

In [48]:
myFruits = ['Fig', 'Banana', 'Pear', 'Apple']

You can think of a list as a collection of shoe boxes stacked on top of each other: you can store anything you want in each box.
Lists are ordered: the order you put items in is the order they are read out.
You don't need to store the same type of data in every position of a list.

In [49]:
myData = [21, "Ashley", 182.1, 'Blue']

We can address any item of the list by using its position (its index) in the list.
One thing to point out here is that Python starts counting at 0, so the first item is item 0.
Here is an example:

In [50]:
print('The first item is', myData[0])

The first item is 21


We can also select a range of items in the list, so to select the first 2 items we use

In [51]:
print('The first two items are', myData[0:2])

The first two items are [21, 'Ashley']


In the brackets we use the : to tell Python we want a range. The range runs from the number before the : up to, but not including, the number after it.

Negative indexing: we can also ask Python for the last item, or the last but one, using negative indexing. We just put a minus sign in front of the index number, so -1 is the last item and -2 is the last but one.

In [52]:
print("The last item is ", myData[-1])
print("The last two items are", myData[-2:])
print("The last two but one are", myData[-3:-1])

The last item is  Blue
The last two items are [182.1, 'Blue']
The last two but one are ['Ashley', 182.1]
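
As a small sketch of how the index rules above fit together, the cell below checks that a negative index is just a shorthand for counting back from the length of the list. The list `exampleData` is a hypothetical copy of the data used above, introduced only for this illustration.

```python
# Hypothetical example list, mirroring the myData list used above
exampleData = [21, "Ashley", 182.1, "Blue"]

n = len(exampleData)  # number of items in the list, here 4

# A negative index counts back from the end: index -k is the same as index n - k
lastItem = exampleData[-1]
sameItem = exampleData[n - 1]
print(lastItem, sameItem)

# Leaving out one side of the : means "from the start" or "to the end"
firstTwo = exampleData[:2]   # same as exampleData[0:2]
lastTwo = exampleData[-2:]   # same as exampleData[n - 2:n]
print(firstTwo, lastTwo)
```

Leaving one side of the : empty is usually easier to read than writing the 0 or the length explicitly.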


In [53]:
# We can alter the contents of a list using the following syntax
myData[0] = 42
print(myData)

[42, 'Ashley', 182.1, 'Blue']


Methods:
A list is an object of Python's built-in list class. A class is a structure that has a number of actions associated with it. These actions are called methods; below is a table of the methods available for lists. We will explore them over the next couple of cells. Again this reinforces our view of data being defined by the program rather than the hardware.

\begin{tabular}{ |l| |l| }

\textrm{append()} & \textrm{Adds an element at the end of the list}\\
\textrm{clear()} & \textrm{Removes all the elements from the list} \\
\textrm{copy()} & \textrm{Returns a copy of the list} \\
\textrm{count()} & \textrm{Returns the number of elements with the specified value} \\
\textrm{extend()} & \textrm{Add the elements of a list (or any iterable), to the end of the current list} \\
\textrm{index()} & \textrm{Returns the index of the first element with the specified value} \\
\textrm{insert()} & \textrm{Adds an element at the specified position} \\
\textrm{pop()} & \textrm{Removes the element at the specified position} \\
\textrm{remove()} & \textrm{Removes the first item with the specified value} \\
\textrm{reverse()} & \textrm{Reverses the order of the list} \\
\textrm{sort()} & \textrm{Sorts the list} \\
\end{tabular}


In [54]:
# Using a method
myData.append('Bunny Kitten') # as you can see, the method acts on the list directly; we don't need an assignment
print(myData)

[42, 'Ashley', 182.1, 'Blue', 'Bunny Kitten']


In [55]:
print('There are', myData.count("Ashley"),'occurrences of the name Ashley')

There are 1 occurrences of the name Ashley


In [56]:
print('The sorted list of fruits is')
myFruits.sort()
print(myFruits)

The sorted list of fruits is
['Apple', 'Banana', 'Fig', 'Pear']
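
The cells above try append(), count() and sort(). As a rough sketch, the cell below walks through some of the other methods from the table. The list `exampleFruits` and the extra fruit names are made up for this illustration.

```python
# Hypothetical list used only for this illustration
exampleFruits = ['Fig', 'Banana', 'Pear', 'Apple']

exampleFruits.insert(1, 'Cherry')        # put 'Cherry' at position 1, shifting the rest along
removed = exampleFruits.pop(0)           # remove the item at position 0 and hand it back
exampleFruits.remove('Pear')             # remove the first item equal to 'Pear'
position = exampleFruits.index('Apple')  # find the position of 'Apple'
exampleFruits.extend(['Kiwi', 'Lime'])   # add every item of another list to the end
exampleFruits.reverse()                  # reverse the order in place

print('Removed item:', removed)
print('Apple was at position', position)
print(exampleFruits)
```

Note that pop() gives back the item it removed, whereas most of the other methods change the list and return nothing.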


---
Write code to create a list with just one item in it, your first name.
Then append your last name to the list and print the list.

In [57]:
#Write your code here

---

***
### Tuples
Tuples are very similar to lists. They are created using () brackets rather than [] brackets, and they have very similar characteristics. The main difference is that once created, a tuple cannot be changed. There are only two methods associated with tuples: count() and index().


In [58]:
thistuple = ("apple", "banana", "cherry")
print(thistuple)

('apple', 'banana', 'cherry')


Tuples are used for definitions of data which should not be changed, such as the GPS coordinates of a specific place or physical constants. The inability to change the values makes them safer programmatically.
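
To see what "cannot be changed" means in practice, here is a minimal sketch. The tuple `location` and its coordinate values are hypothetical and only serve as an example.

```python
# Hypothetical coordinates (latitude, longitude), used only for illustration
location = (51.5, -0.12)

try:
    location[0] = 52.0   # tuples do not support item assignment
    changed = True
except TypeError as err:
    changed = False
    print('Could not change the tuple:', err)

# To "change" a tuple we have to build a new one from the old values
newLocation = (52.0, location[1])
print(location, newLocation)
```

The original tuple is untouched; any code that still holds `location` keeps seeing the original values.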

In [59]:
print(thistuple.count('cherry'))

1


In [60]:
print(thistuple.index('cherry'))

2


---
Build your own tuple

In [61]:
#Write your code here

---

***
### Dictionaries
Dictionaries are used to store data with associated keys; they are defined as shown below. They are very powerful for storing blocks of related data.

In [62]:
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}

Notice the structure: the dictionary is enclosed in {} brackets and each entry consists of a key, followed by a :, followed by the data to be stored under that key.
Here are the methods associated with dictionaries:
\begin{tabular}{ |l| |l| }

\textrm{clear()} & \textrm{Removes all the elements from the dictionary} \\
\textrm{copy()} & \textrm{Returns a copy of the dictionary} \\
\textrm{fromkeys()}	& \textrm{Returns a dictionary with the specified keys and value}\\
\textrm{get()} &	\textrm{Returns the value of the specified key}\\
\textrm{items()} & \textrm{Returns a view containing a tuple for each key-value pair}\\
\textrm{keys()} & \textrm{Returns a view of the dictionary's keys}\\
\textrm{pop()} &	\textrm{Removes the element with the specified key}\\
\textrm{popitem()} &	\textrm{Removes the last inserted key-value pair}\\
\textrm{setdefault()} &	\textrm{Returns the value of the specified key. If the key does not exist: insert the key, with the specified value}\\
\textrm{update()} &	\textrm{Updates the dictionary with the specified key-value pairs}\\
\textrm{values()} & \textrm{Returns a view of all the values in the dictionary}\\
\end{tabular}

In [63]:
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964}


In [64]:
# We can get the keys used to define the dictionary using the keys() method (they come back as a dict_keys object).
print(thisdict.keys())

dict_keys(['brand', 'model', 'year'])


So we can now define a standard dictionary for a car catalog

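We can look up a value in the dictionary using the get() method or by putting the key in square brackets, much like indexing a list. The two differ when the key is missing: get() returns None, whereas square brackets raise a KeyError, so get() is the safer choice when a key might not exist.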


In [67]:
print(thisdict.get("model")) # Here we use the get method
print(thisdict["model"]) # Here we put the key in square brackets.

Mustang
Mustang
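
The two ways of reading a value behave differently when the key does not exist. The sketch below shows this with a hypothetical dictionary `exampleCar` and a key, "colour", that was deliberately never stored.

```python
# Hypothetical dictionary used only for this illustration
exampleCar = {"brand": "Ford", "model": "Mustang", "year": 1964}

# get() quietly returns None when the key is missing
missing = exampleCar.get("colour")
print(missing)

# get() also accepts a fallback value to return instead of None
fallback = exampleCar.get("colour", "unknown")
print(fallback)

# Square brackets raise a KeyError for a missing key
try:
    exampleCar["colour"]
    raisedError = False
except KeyError:
    raisedError = True
    print('The key "colour" is not in the dictionary')
```

Which behaviour is preferable depends on the situation: a KeyError stops the program at the point of the mistake, while get() lets the code carry on with a default.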


If we wanted to access the first item in a dictionary by position, we could not do it the way we do with a list, so

print(thisdict[0])
would not work, because 0 is treated as a key and no such key exists. However, sometimes it is useful, and in that case we can use this little hack.

print(list(thisdict)[0])


In [70]:
print(list(thisdict)[0]) # this code converts the dictionary to a list of its keys (in the order they were added) and then indexes that list.  It might be useful

brand


If you want to add to a dictionary you can simply make a new key and assign it a value

In [73]:
thisdict['Max speed'] = 145
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964, 'Max speed': 145}
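
A few of the other methods from the table are sketched below: items() for looping over key-value pairs, update() for changing or adding several entries at once, and pop() for removing one. The dictionary `exampleCar` and the new values are hypothetical.

```python
# Hypothetical dictionary used only for this illustration
exampleCar = {"brand": "Ford", "model": "Mustang", "year": 1964}

# items() gives each key together with its value
for key, value in exampleCar.items():
    print(key, '->', value)

# update() overwrites existing keys and adds any new ones
exampleCar.update({"year": 1965, "colour": "red"})

# pop() removes a key and hands back the value that was stored under it
removedModel = exampleCar.pop("model")

print('Removed:', removedModel)
print(exampleCar)
```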


***
Build a dictionary of cities and their rivers (it does not have to be a big dictionary).


In [None]:
#Write your code here

***
### Numpy arrays
It is not possible to talk about data structures without talking about arrays.  An array is a grid of numbers; it can have any number of dimensions and any length, width or depth. For now, we will use at most 2D arrays to make visualizing them easier.
We will use a type of array called a numpy array; this is very common and widely used in scientific Python code.
To use numpy we have to import the numpy library into Python to give us access to the array commands. Next week we will explain libraries in more detail, but for now here is a very brief overview using numpy.


In [79]:
import numpy as np
# this line of code tells Python that we want to use the numpy library
# by adding the 'as np' we tell Python that to call a command in the numpy library we will prefix the command with np.
# let's try it.
x = 3.14159 / 2
y = np.sin(x)
print(y)
# just for fun let's use the numpy defined value of pi
x = np.pi/2
y = np.sin(x)
print(y)

0.9999999999991198
1.0


Ok, so the above may seem a little confusing. Let us break it down line by line:

**import numpy as np**,
this line tells Python we want to use the numpy library and that we will call its functions using the prefix np.

**x = 3.14159 / 2**,
this defines a variable that is approximately pi/2

**y = np.sin(x)**,
here we are using the sin function from numpy, called as np.sin(); the result is stored in the variable y.

We also use the constant **np.pi**; this is a constant stored in the numpy library with the value 3.141592653589793

Ok, now let's use numpy for storing data.  Numpy gives you access to arrays. Every element of an array has the same data type, and here we will only store numbers. Let's define one.

In [81]:
#define an array
myFirstArray = np.asarray([1,2,3,4,5,6,7,8,9]) # this builds a 1D array

In [82]:
print(myFirstArray)

[1 2 3 4 5 6 7 8 9]


In [84]:
mySecondArray = np.asarray([[1,2,3],[4,5,6],[7,8,9]]) # this builds a 2D array; notice the nested [] brackets, one inner list per row
print(mySecondArray)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
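
One practical difference between a list and an array is how they respond to arithmetic. The sketch below is only an illustration, with hypothetical names `plainList`, `numbers` and `mixed`; it also prints the data type (dtype) that numpy chose for each array.

```python
import numpy as np

# Hypothetical values used only for this illustration
plainList = [1, 2, 3]
numbers = np.asarray(plainList)

# Multiplying a list repeats it; multiplying an array scales every element
repeated = plainList * 2
doubled = numbers * 2
print(repeated)
print(doubled)

# Every element of an array shares one data type
print(numbers.dtype)   # an integer type (the exact name depends on the platform)

# Mixing integers and decimals makes numpy store everything as floats
mixed = np.asarray([1, 2.5, 3])
print(mixed.dtype)
```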


***
### Addressing and slicing arrays in 1 and 2 dimensions.
##### We are now going to learn how to access and cut up those arrays
---

In [87]:
# for 1D arrays
print('The first value in my array is', myFirstArray[0])
print('The last value in my array is', myFirstArray[-1])
print('The first 3 values in my array are', myFirstArray[0:3])
print('The last 3 values in my array are', myFirstArray[-3:]) # leave the end of the slice empty to run to the final item

# This should be obvious now, if not just ask a TA

The first value in my array is 1
The last value in my array is 9
The first 3 values in my array are [1 2 3]
The last 3 values in my array are [7 8 9]


In [92]:
# for 2D arrays
print('The value in the top left is', mySecondArray[0,0]) # The first number is the row (Y: 0 at the top, increasing downwards as in a table), the second number is the column (X)
print('The first row is', mySecondArray[0,:]) # The : prints the whole row
print('The middle column is', mySecondArray[:,1])

The value in the top left is 1
The first row is [1 2 3]
The middle column is [2 5 8]
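
Row and column ranges can be combined to cut a smaller block out of a 2D array. The cell below is a minimal sketch using a hypothetical array `grid` with the same values as the 2D array above.

```python
import numpy as np

# Hypothetical array with the same values as the 2D array above
grid = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Rows 0 and 1, columns 1 and 2: the top-right 2x2 block
topRight = grid[0:2, 1:3]
print(topRight)

# Negative indices work on each axis as they do for lists
lastRow = grid[-1, :]
bottomRight = grid[-1, -1]
print(lastRow)
print(bottomRight)
```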


***
### Now some maths.
##### Here we will use some functions provided with numpy to perform maths on the arrays
---

In [94]:
# Calculate the mean
print('The mean of the 1D array is', np.mean(myFirstArray))
print('The mean of the 2D array is', np.mean(mySecondArray))

The mean of the 1D array is 5.0
The mean of the 2D array is 5.0


In [96]:
# Cunningly we can also ask numpy to give us the mean of a specific axis
print('The mean of the 2D array in Y is', np.mean(mySecondArray, axis=0)) # Notice the axis keyword
print('The mean of the 2D array in X is', np.mean(mySecondArray, axis=1))
# numpy is giving us three answers because it is treating each column or row individually.

The mean of the 2D array in Y is [4. 5. 6.]
The mean of the 2D array in X is [2. 5. 8.]


In [97]:
# Summing
print('The sum of the 2D array is', np.sum(mySecondArray))
print('The sum of the 2D array along Y is', np.sum(mySecondArray, axis=0))
print('The sum of the 2D array along X is', np.sum(mySecondArray, axis=1))

The sum of the 2D array is 45
The sum of the 2D array along Y is [12 15 18]
The sum of the 2D array along X is [ 6 15 24]
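
With a square array it is easy to mix up the two axes, so here is a small sketch with a non-square array, where the length of the answer shows which axis was collapsed. The array `table` is hypothetical.

```python
import numpy as np

# Hypothetical array with 2 rows and 3 columns
table = np.asarray([[1, 2, 3], [4, 5, 6]])

# axis=0 works down the rows, leaving one answer per column
columnSums = np.sum(table, axis=0)
print(columnSums, np.shape(columnSums))

# axis=1 works across the columns, leaving one answer per row
rowSums = np.sum(table, axis=1)
print(rowSums, np.shape(rowSums))
```

The axis keyword names the axis that disappears from the result, which is why axis=0 gives three answers here and axis=1 gives two.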


***
### Now some maths using multiple arrays
##### Here we will operate on arrays with other arrays
---

In [98]:
arrayOne = np.asarray([[1,1,1],[1,0,1],[1,1,1]])
arrayTwo = np.asarray([[1,0,1],[1,0,1],[1,0,1]])

In [102]:
# Summation
print(arrayOne+arrayTwo)

[[2 1 2]
 [2 0 2]
 [2 1 2]]


In [103]:
# Element-wise multiplication (each value is multiplied by the value in the same position)
print(arrayOne*arrayTwo)

[[1 0 1]
 [1 0 1]
 [1 0 1]]


In [104]:
#Real matrix multiplication
print(np.matmul(arrayOne, arrayTwo))

[[3 0 3]
 [2 0 2]
 [3 0 3]]
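
The difference between the two kinds of multiplication is easier to see on a small example. The sketch below uses two hypothetical 2x2 arrays, `a` and `b`, and the @ operator, which performs the same matrix multiplication as np.matmul for 2D arrays.

```python
import numpy as np

# Hypothetical 2x2 arrays used only for this illustration
a = np.asarray([[1, 2], [3, 4]])
b = np.asarray([[0, 1], [1, 0]])

# * multiplies values in matching positions
elementwise = a * b
print(elementwise)

# @ is matrix multiplication (rows of a combined with columns of b)
matrixProduct = a @ b
print(matrixProduct)

# Matrix multiplication depends on the order of the two arrays
reversedProduct = b @ a
print(reversedProduct)
```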


***
### Reshaping the array
##### Here we will change the shape of an array
---

In [107]:
# First let's look at the size and shape of an array
print('The size of the array is', np.size(mySecondArray))
print('The shape of the array is', np.shape(mySecondArray))

The size of the array is 9
The shape of the array is (3, 3)


In [108]:
# What if we want to change the shape of an array?
# We can use the reshape command
print('Shape', np.shape(myFirstArray))
myNewArray = np.reshape(myFirstArray,[3,3]) # here I give the new shape as [3,3]
print('Shape', np.shape(myNewArray))

Shape (9,)
Shape (3, 3)
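
Reshaping only works when the new shape holds exactly the same number of values as the old one. The sketch below, using a hypothetical 12-value array `flat`, shows a valid reshape, the -1 shortcut that asks numpy to work out one dimension for us, and what happens when the sizes do not match.

```python
import numpy as np

# Hypothetical 1D array holding the 12 values 0 to 11
flat = np.arange(12)

# 3 rows of 4 values: 3 * 4 = 12, so this fits
grid = np.reshape(flat, [3, 4])
print(grid)

# -1 tells numpy to calculate that dimension from the total size
auto = np.reshape(flat, [2, -1])
print('Shape', np.shape(auto))

# 5 * 3 = 15 values are needed but only 12 exist, so numpy raises a ValueError
try:
    np.reshape(flat, [5, 3])
    failed = False
except ValueError as err:
    failed = True
    print('Reshape failed:', err)
```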


***
### Generating data
##### It can be a pain building large arrays, but there are plenty of tools to create arrays with prefilled values.
---

In [114]:
#Simple array of zeros
print(np.zeros([3,6]))
print('')
#Simple array of ones
print(np.ones([3,6]))

[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]

[[1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]]


In [116]:
#Often we want to generate a linear spaced set of data, for example if we were going to plot a function.
x = np.linspace(0, 9, 10)
print(x)
# linspace takes three inputs, the starting value, the final value and the total number of points required, it generates equally spaced numbers in between the start and final values.

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]


In [117]:
# The arange command is very similar but allows you to define the step size
x = np.arange(0, 10, .5 )
print(x)
# notice that the 10 is missing this is because the command stops before the final value.

[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5.  5.5 6.  6.5 7.  7.5 8.  8.5
 9.  9.5]
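
The two commands can be compared side by side. This is a minimal sketch with hypothetical variable names; it works out the step size that linspace uses and checks where each command stops.

```python
import numpy as np

# Hypothetical settings matching the linspace example above
start, stop, count = 0, 9, 10

# linspace includes both ends, so there are count - 1 gaps between the points
points = np.linspace(start, stop, count)
step = (stop - start) / (count - 1)
print('Step used by linspace:', step)
print('Last linspace value:', points[-1])

# arange takes the step directly and stops before the final value
stepped = np.arange(0, 10, 0.5)
print('Number of arange values:', len(stepped))
print('Last arange value:', stepped[-1])
```

As a rule of thumb, linspace is convenient when the number of points matters and arange when the spacing matters.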


array([1.        , 1.02591437, 1.05250029, 1.07977516, 1.10775685,
       1.13646367, 1.1659144 , 1.19612833, 1.22712524, 1.25892541])