#### B03: Basic Data Structures Part 1

We've already learned how to assign a single value to a variable. But what if we have multiple values that we'd like to assign, store and use? Storing these as separate variables would be time-consuming and inefficient:

In [6]:
a = 1
b = 2
c = 3
d = 4
e = 5

However, by using a **data structure** we can assign multiple values to a single variable, which is far easier. There are four basic types of data structure in Python:

* List
* Dictionary
* Tuple
* Set

Note that we'll be meeting more complex data structures further on in the course, but for now let's understand the basics, starting with **Lists**.

**Lists**

Lists are probably the data structure that you'll meet most often in Python, as they are simple and versatile. Lists are created using square brackets like so:

In [1]:
mylist1 = []     # Creates a blank list

You can call them the same way you would call any other variable:

In [2]:
mylist1

[]

However, our list is empty so we only get a set of square brackets back =(

Let's create a list with data in it like so:

In [7]:
mylist2 = [10,11,12,13,14,15]    # Creating a basic list

In [8]:
mylist2

[10, 11, 12, 13, 14, 15]

We can also create a list using variables:

In [9]:
mylist3 = [a,b,c,d,e]       # Creating a list using variables
mylist3

[1, 2, 3, 4, 5]

Most data structures in Python are versatile and can store a variety of data types, and lists are no exception:

In [10]:
mylist4 = [1,1.5,'text',True]     # Creating a list with a variety of data types

In [11]:
mylist4

[1, 1.5, 'text', True]

You can even use data structures to store other data structures:

In [12]:
mylist5 = [mylist2,mylist3]       # Creating a list to store other lists

In [13]:
mylist5

[[10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5]]

This may seem (and can get!) quite complicated, but as you'll see later on in the course, 'nesting' data structures in this manner is a powerful and versatile way to store and access data.

Lists can be used with functions in the exact same manner as other variables:

In [14]:
type(mylist4)

list

In [15]:
print(mylist3,mylist4)

[1, 2, 3, 4, 5] [1, 1.5, 'text', True]


In [16]:
len(mylist4)

4

Note that the **len** function returns the number of items in the list rather than the number of characters. However, what if we wanted to return the length of a specific item in a list? Before we can do this, we must understand indexing.


**Indexing**


In its simplest form, indexing refers to **the position at which something appears in a data structure**. A good real-world example of this is houses on a street. The houses can be thought of as the items in the data structure and the number of each house can be thought of as its index.

One very important thing to note is that **INDEXING STARTS FROM 0 IN PYTHON!!**

This means that the first data item will have an **index of 0**. Let's have a look at the syntax for indexing and see how this works in practice.

Note that for now we're using lists, but you'll be able to apply the principles you learn here to other ordered data structures in Python (such as tuples and strings), including the more advanced ones we'll meet as part of the analysis and visualisation modules.

In [17]:
mylist2[0]   # Accessing the first item (index 0) in a list

10

In [18]:
mylist2[1]   # Accessing the second item (index 1) in a list

11
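Indexing also answers the earlier question about measuring a single item rather than the whole list. The cell below is a minimal sketch: `words`, `item_length` and `inner_value` are illustrative names that don't appear elsewhere in this notebook, and the other lists are redefined so that the cell runs on its own.

```python
# Lists redefined here so the example is self-contained
mylist2 = [10, 11, 12, 13, 14, 15]
mylist3 = [1, 2, 3, 4, 5]
mylist5 = [mylist2, mylist3]

# 'words' is a hypothetical list used only for this illustration
words = ['data', 'structure']

print(len(words))              # Number of items in the list: 2
item_length = len(words[1])    # Length of the item at index 1
print(item_length)             # Number of characters in 'structure': 9

# With nested lists, the first index picks the inner list
# and the second index picks an item inside it
inner_value = mylist5[0][1]
print(inner_value)             # 11
```

Chaining two sets of square brackets is how we reach inside the 'nested' lists we created earlier.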

Python also supports negative indexing. This allows us to access items counting back from the end of the data structure like so:

In [19]:
mylist2[-1]   # Accessing the last item in a list

15

In [20]:
mylist2[-3]   # Accessing the 3rd to last item in a list

13
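One way to picture negative indexing is as a shortcut for counting back from the length of the list. The sketch below assumes the same `mylist2` as above; `last_position` is an illustrative name only.

```python
mylist2 = [10, 11, 12, 13, 14, 15]

# Because indexing starts at 0, the last valid index is len - 1
last_position = len(mylist2) - 1
print(last_position)             # 5

# Both expressions refer to the same item
print(mylist2[last_position])    # 15
print(mylist2[-1])               # 15

# The same relationship holds for any negative index
print(mylist2[len(mylist2) - 3] == mylist2[-3])   # True
```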

**Slicing**

Slicing is a way of extracting a range of items from a data structure. Again, the principles you learn here will apply to other ordered data structures in Python. Slicing is important when it comes to working with raw data, as it allows you to extract specific items of data and organise them into structured variables.

The syntax is very similar to indexing:

In [3]:
mylist2[:3]       # Returns items up to but not including index 3

[10, 11, 12]

In [22]:
mylist2[3:]       # Returns items from index 3 onwards

[13, 14, 15]

You can also specify start and end points for slicing:

In [23]:
mylist2[3:5]      # Returns items from index 3 up to but not including index 5

[13, 14]

Don't forget that you can create new variables from indexes and slices too:

In [24]:
newlist = mylist2[3:]

In [25]:
newlist

[13, 14, 15]
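Two properties of slices are worth checking for yourself, since they follow from the end point being excluded. The sketch below assumes the same `mylist2` as above; `piece` is an illustrative name only.

```python
mylist2 = [10, 11, 12, 13, 14, 15]

# A slice from index 3 to index 5 contains 5 - 3 = 2 items
piece = mylist2[3:5]
print(piece)                 # [13, 14]
print(len(piece) == 5 - 3)   # True

# Slicing at the same index from both sides splits the list
# with no item lost or repeated
print(mylist2[:3] + mylist2[3:] == mylist2)   # True
```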

We can also specify a step size when slicing:

In [26]:
mylist2[0:4:2]    # Returns every 2nd item from index 0 up to but not including index 4

[10, 12]

We can also slice backwards by using a negative step:

In [27]:
mylist2[-1::-2]    # Starts at the last item and works backwards in steps of 2

[15, 13, 11]
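The start and end points can be left out when using a step, in which case Python uses the whole list. The cell below is a short sketch of a few common patterns; the variable names are illustrative only.

```python
mylist2 = [10, 11, 12, 13, 14, 15]

every_other = mylist2[::2]      # Every 2nd item, starting from index 0
print(every_other)              # [10, 12, 14]

odd_positions = mylist2[1::2]   # Every 2nd item, starting from index 1
print(odd_positions)            # [11, 13, 15]

reversed_list = mylist2[::-1]   # A step of -1 returns the items in reverse order
print(reversed_list)            # [15, 14, 13, 12, 11, 10]
```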

Lastly, slicing and indexing aren't confined to data structures. We can use them on strings:

In [28]:
mystring = "Lorem ipsum dolor sit amet."

In [29]:
mystring[0:10:2] # Returns every 2nd character from index 0 up to but not including index 10

'Lrmis'

In [30]:
mystring[0:10:2][0]    # Using an index on top of a slice

'L'

But not on ints, floats or Boolean values:

In [1]:
myint = 123456789012345
myint[0]

TypeError: 'int' object is not subscriptable

In [2]:
myfloat = 3.14159
myfloat[0:1]

TypeError: 'float' object is not subscriptable

In [4]:
mybool = True
mybool[0:1]

TypeError: 'bool' object is not subscriptable
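If you do need a particular digit of a number, one possible workaround is to convert the number to a string first, since strings can be indexed and sliced. This is a sketch of one approach rather than the only way to do it; `digits` and `first_digit` are illustrative names.

```python
myint = 123456789012345

digits = str(myint)           # Convert the int to a string of digits
first_digit = digits[0]       # Strings support indexing
print(first_digit)            # '1' (a string, not an int)

print(digits[:3])             # Slicing works too: '123'
print(int(digits[-1]))        # Convert back to an int when needed: 5
```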

**Everything is an Object!**

Earlier in the course we talked about everything being an object in the sense that anything can be assigned to a variable or passed as an argument to a function. We're going to see what that actually means now! 

Up until now we've been typing literal numbers into our index and slice expressions. The issue with this is that these values are 'hard-coded', so the program can't respond dynamically to input data.

In the example below, we return the first 4 items of our list:

In [37]:
mylist6 = [0,0,0,1,2,3,3,3,3,3,3,4,5,6,7,8,9]
mylist6[0:4]     # Returning the first 4 items of a list.

[0, 0, 0, 1]

However, we can just as easily pass variable references to tell Python where to slice:

In [41]:
start = 0
end = len(mylist6) // 2     # Floor division gives a whole-number midpoint

mylist6[start:end]

[0, 0, 0, 1, 2, 3, 3, 3]

This gives us more power, since the values for the 'start' and 'end' variables can be generated dynamically from the input data.
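To illustrate, the slice points can be wrapped up in a small function so that the same code works for a list of any length. The function `split_in_half` is a hypothetical helper written for this example, not something built into Python.

```python
def split_in_half(items):
    """Hypothetical helper: return the first and second halves of a list."""
    middle = len(items) // 2           # Midpoint worked out from the data itself
    return items[:middle], items[middle:]

mylist6 = [0,0,0,1,2,3,3,3,3,3,3,4,5,6,7,8,9]
first_half, second_half = split_in_half(mylist6)

print(first_half)     # [0, 0, 0, 1, 2, 3, 3, 3]
print(second_half)    # [3, 3, 3, 4, 5, 6, 7, 8, 9]
```

Because nothing is hard-coded, passing in a shorter or longer list changes where the split happens without any change to the code. When the list has an odd number of items, the second half is one item longer.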