# Introduction to Data Analysis with Python
## Python Refresher: Data Containers
## Learning Objectives
At the end of this session, participants will be able to apply:
* Python lists, indexing and slicing
* Python dictionaries (dicts)
* Python list and dict comprehensions


In [12]:
nephews = ["ALi","Abed","Moe"]

Lists in Python are denoted by brackets, and their elements are separated by commas. Let's look at the list that we just created.

Individual list elements can be accessed by index, starting with zero for the first element. For instance, the first nephew is Ali. This convention of starting from zero comes from C, the language that inspired Python and that was used to write the standard Python interpreter, which is known as CPython for this reason.

In [3]:
nephews[0]

'ALi'

In [4]:
nephews[2]

'Moe'

In [5]:
nephews[3]

IndexError: list index out of range

So we can also look up the last nephew, and we can even ask for a nephew beyond the end of the list, which in this case yields an error: the list index is out of range. The length of a list is obtained with len.

In [6]:
len(nephews)

3
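
As a small sketch of the indexing rules above (the `cousins` list is a hypothetical stand-in, not part of the session data), the valid indices run from 0 to len minus one, and an index past the end can be caught as an IndexError:

```python
# Hypothetical example list, used only for illustration.
cousins = ["Ali", "Abed", "Moe"]

first = cousins[0]                  # index 0 is the first element
last = cousins[len(cousins) - 1]    # the last valid index is len - 1

# Indexing at len(cousins) is one past the end and raises IndexError.
try:
    cousins[len(cousins)]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, last, out_of_range)
```

Catching the exception is only one option; checking the index against len beforehand works as well.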

This bracket indexing notation can also be used to reassign elements. Let's do it for all of them with a simple loop.

In [13]:
for i in range(len(nephews)):
    nephews[i] = nephews[i] + ' natsheh'

In [14]:
print(nephews)

['ALi natsheh', 'Abed natsheh', 'Moe natsheh']


An important point is that lists do not need to have homogeneous content, such as all strings or all numbers. We could mix it up. In all these cases, the output representation of our Python object is very similar to our input. To add a single element to a list, we use the append method.

In [15]:
mix_it_up = [1,[2,3],'alpha']

In [16]:
mix_it_up

[1, [2, 3], 'alpha']

In [18]:
nephews.append('Masa Natsheh')
print(nephews)

['ALi natsheh', 'Abed natsheh', 'Moe natsheh', 'Masa Natsheh', 'Masa Natsheh']


In the same cell, we also write a second statement to print out our list. To concatenate two lists, we can use the extend method or just a plus sign. For instance, let's add the two missing nieces with extend, and then do it the other way, with a plus sign.

In [19]:
nephews.extend(['zenah Natsheh', "Maya Natsheh"])
print(nephews)

['ALi natsheh', 'Abed natsheh', 'Moe natsheh', 'Masa Natsheh', 'Masa Natsheh', 'zenah Natsheh', 'Maya Natsheh']


In [20]:
Natsheh = nephews + ['Nawal Natsheh', 'Eman Natsheh']
print(Natsheh)

['ALi natsheh', 'Abed natsheh', 'Moe natsheh', 'Masa Natsheh', 'Masa Natsheh', 'zenah Natsheh', 'Maya Natsheh', 'Nawal Natsheh', 'Eman Natsheh']
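
The difference between these methods can be sketched as follows, using hypothetical lists chosen only for illustration: extend changes the existing list in place, the plus sign builds a new list, and append adds its argument as a single element even when that argument is itself a list.

```python
# Hypothetical lists, used only for illustration.
family = ["Ali", "Abed"]
alias = family                   # a second name for the same list object

family.extend(["Moe"])           # extend modifies the existing list in place
combined = family + ["Masa"]     # + builds a new list and leaves family alone

nested = ["Ali"]
nested.append(["Moe"])           # append adds the whole list as ONE element

print(alias)      # alias sees the in-place change made through family
print(combined)
print(nested)
```

Because extend works in place, every name that refers to the same list sees the change, whereas the result of the plus sign has to be assigned to a name to be kept.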


We could also insert elements at any position in the list using the insert method.

In [21]:
Natsheh.insert(0, 'Noor Natsheh')
print(Natsheh)

['Noor Natsheh', 'ALi natsheh', 'Abed natsheh', 'Moe natsheh', 'Masa Natsheh', 'Masa Natsheh', 'zenah Natsheh', 'Maya Natsheh', 'Nawal Natsheh', 'Eman Natsheh']


We can delete elements either by their index or by their value.

In [24]:
del Natsheh[0]
print(Natsheh)
Natsheh.remove('Nawal Natsheh')
print(Natsheh)

['Moe natsheh', 'Masa Natsheh', 'Masa Natsheh', 'zenah Natsheh', 'Maya Natsheh', 'Nawal Natsheh']
['Moe natsheh', 'Masa Natsheh', 'Masa Natsheh', 'zenah Natsheh', 'Maya Natsheh']
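
Two details of deletion are worth a quick sketch (the `names` list here is hypothetical): remove deletes only the first matching value, and it raises a ValueError when the value is not present.

```python
# Hypothetical list with a duplicate entry, used only for illustration.
names = ["Masa", "Moe", "Masa", "Maya"]

del names[1]             # delete by index: removes "Moe"
names.remove("Masa")     # delete by value: removes only the first "Masa"

# remove raises ValueError when the value is not in the list.
try:
    names.remove("Noor")
    was_missing = False
except ValueError:
    was_missing = True

print(names, was_missing)
```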


Last, we can sort the elements. The default sorting order is what's known as lexicographic. For strings with standard characters, this essentially means alphabetic, although uppercase letters sort before lowercase ones. The sorting is in place, so it modifies the list that already exists. This should all be very basic to you if you've worked with Python in the past.

In [25]:
Natsheh.sort()
print(Natsheh)

['Masa Natsheh', 'Masa Natsheh', 'Maya Natsheh', 'Moe natsheh', 'zenah Natsheh']
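
As a brief sketch of these sorting rules (with a hypothetical `names` list), the built-in sorted function returns a new list instead of sorting in place, and a key function such as str.lower gives a case-insensitive order:

```python
# Hypothetical list mixing upper- and lowercase initials.
names = ["zenah", "Moe", "abed", "Masa"]

in_order = sorted(names)                       # new list; names is unchanged
ignoring_case = sorted(names, key=str.lower)   # compare lowercased copies

print(in_order)        # uppercase initials come first by default
print(ignoring_case)

result = names.sort()  # in place; the method itself returns None
print(names, result)
```

Using sorted is a reasonable choice when the original order still matters elsewhere; sort avoids creating a second list.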


Let's move a little further. Beyond working with individual elements, we can manipulate them collectively in contiguous groups.

In [26]:
squares = [0,1,4,9,16,25,36,49]

Such contiguous groups are known as slices. For instance, let's use numbers now, with a list of the first few squares: zero is the square of zero, one is the square of one, and so on. If we wish to have a sublist of the first two elements, we write it as a slice going from zero to two. If we want the sublist of the second and third elements, we write it as a slice from index one to index three.

In [27]:
squares[0:2]

[0, 1]

In [28]:
squares[1:3]

[1, 4]

This notation may look a little counterintuitive because the second index actually refers to an element that is not included in the sublist. However, this is again a C-inspired convention, and it has the advantage that the number of elements selected in a slice is given by the difference of the two indices. So with slice zero to two we are selecting two elements, and slice one to three also selects two elements. There are a few more tricks that we can use in slicing.

For instance, we can omit the starting index to start at the beginning of a list. We can omit the ending index to go until the end. We can omit both indices to get the entire list. We can also use negative indices to count from the end. So minus one will yield the last element in the list, 49. 

In [29]:
squares[:4]

[0, 1, 4, 9]

In [30]:
squares[1:]

[1, 4, 9, 16, 25, 36, 49]

In [31]:
squares[:]

[0, 1, 4, 9, 16, 25, 36, 49]

In [32]:
squares[-1]

49
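
These slicing tricks can be combined, as in the following sketch (the `numbers` list repeats the squares so that the block is self-contained). It also shows that a full slice produces a new list, so changing the copy leaves the original alone.

```python
numbers = [0, 1, 4, 9, 16, 25, 36, 49]

middle = numbers[2:5]        # indices 2, 3 and 4: three elements (5 - 2)
last_two = numbers[-2:]      # negative start: count from the end
all_but_last = numbers[:-1]  # negative stop: everything except the last

copy = numbers[:]            # a new list with the same elements
copy[0] = -1                 # modifying the copy does not touch numbers

print(middle, last_two, all_but_last)
print(numbers[0], copy[0])
```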

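Such slicing is not limited to accessing sublists, but can also be used to reassign them by providing a list on the right side of an assignment. The replacement usually has the same length as the slice, although for a simple slice like the ones used here Python also accepts a different length and resizes the list.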

For instance, we may reassign elements two and three, in this case to strings, which makes for a bit of an odd list. We can also delete elements using the slicing syntax; in this case we remove the last two.

In [33]:
squares[2:4] = ['4', '9']
print(squares)

[0, 1, '4', '9', 16, 25, 36, 49]


In [34]:
del squares[-2:]

In [35]:
print(squares)

[0, 1, '4', '9', 16, 25]
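
As a hedged aside on slice assignment, the sketch below (with a hypothetical `numbers` list) shows that a simple slice can be replaced by a list of a different length, and that assigning an empty list has the same effect as del on that slice:

```python
# Hypothetical list, used only for illustration.
numbers = [0, 1, 4, 9, 16]

numbers[1:3] = ["one", "four", "extra"]   # replace two elements with three
grown = numbers[:]                        # snapshot of the longer list

numbers[1:4] = []                         # same effect as: del numbers[1:4]

print(grown)
print(numbers)
```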


When we introduce NumPy arrays, which are a very useful extension to Python when we want to deal with bulk data, we will see that the basic slicing syntax carries over. Indeed the syntax is extended even further in NumPy.

However, even the standard Python lists can be very useful in many applications. One sees them very often in loops. In this case, a loop is just printing each of the elements. In Python a for-loop amounts to a very simple statement.

We're going to iterate over the values contained in this list and assign them in turn to the loop variable.


In [37]:
for value in squares:
    print("Element: ", value)

Element:  0
Element:  1
Element:  4
Element:  9
Element:  16
Element:  25


Given a list, enumerate lets you loop through the list indices and elements together.

The pattern is: for index, value in enumerate(my_list). We can then write a statement that uses both the index and the value.

In [38]:
for index, value in enumerate(squares):
    print("Element", index, "-> ", value)

Element 0 ->  0
Element 1 ->  1
Element 2 ->  4
Element 3 ->  9
Element 4 ->  16
Element 5 ->  25
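
As a small extension of the loop above, enumerate also accepts an optional start argument, which is convenient for labels that count from one. The `values` list and the `lines` variable in this sketch are hypothetical names chosen for illustration.

```python
# Hypothetical list, used only for illustration.
values = [0, 1, 4, 9]

# enumerate yields (index, element) pairs, starting at 0 by default.
pairs = list(enumerate(values))

# With start=1 the counter begins at one; the elements are unchanged.
lines = []
for position, value in enumerate(values, start=1):
    lines.append(f"Element {position} -> {value}")

print(pairs)
print("\n".join(lines))
```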


Python lists are very useful, and you'll find them everywhere in Python code. They're flexible and efficient, and they can contain any type of data, not necessarily homogeneous. You would use them where the order of your data items matters, so that you can refer to them with a numeric index.

Although, as we've seen, lists of mixed data types are possible, lists are most appropriate where you're collecting data items of the same type.
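
One practical reason for keeping list contents homogeneous can be sketched as follows: in Python 3, sorting a list that mixes numbers and strings fails, because the two types cannot be compared with each other. The lists below are hypothetical examples.

```python
# Hypothetical lists, used only for illustration.
mixed = [3, "alpha", 1]
same_type = [3, 1, 2]

# Comparing an int with a str raises TypeError in Python 3.
try:
    sorted(mixed)
    mixed_sortable = True
except TypeError:
    mixed_sortable = False

ordered = sorted(same_type)   # works: all elements are comparable

print(mixed_sortable, ordered)
```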