# Python Data Types and Methods: Tuples, Dictionaries and Arrays

Continuing from numeric types, strings and lists, we now cover three more useful data types in Python: `tuple`, `dict`, and the NumPy array (created with `numpy.array`).  We will cover how to create them, what they are used for, and how to use some of their methods.

## [Tuples](https://docs.python.org/3/library/stdtypes.html#tuple)

A `tuple` is like a `list`, but it is **immutable**: once created, its contents cannot be changed.  The syntax is similar, except that tuples use parentheses instead of square brackets.

In [1]:
d = ('a', 'b', 'c')
print(d)

('a', 'b', 'c')


In [2]:
d[2] = 'z'

TypeError: 'tuple' object does not support item assignment

See?  It really is immutable.  If you try to change it, you just get a traceback.  Use an immutable type when you want to guarantee that the data cannot be modified.

In [3]:
d.remove('c')

AttributeError: 'tuple' object has no attribute 'remove'

If you want to remove an element or update it, you can convert the tuple to a list first.

In [4]:
print(d)
e = list(d)
e.remove('c')
print(e)

('a', 'b', 'c')
['a', 'b']


But notice that `e` is a list, not a tuple.  If we want the result to be a tuple, we have to convert it back from a list.

In [8]:
f = tuple(e)
print(f)

('a', 'b')
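
The round trip through a list can also be written in one step. The following is a minimal sketch rather than the only way to do it; the helper name `without` is hypothetical and introduced here only for illustration. It builds a new tuple and leaves the original untouched, which is all an immutable type allows.

```python
def without(items, unwanted):
    """Return a new tuple containing every element of items except unwanted."""
    # A generator expression feeds tuple() directly, so no intermediate list is needed.
    return tuple(item for item in items if item != unwanted)

d = ('a', 'b', 'c')
f = without(d, 'c')
print(d)  # the original tuple is unchanged
print(f)
```

Note one difference from `list.remove`: this sketch drops every occurrence of the unwanted value, while `remove` drops only the first one.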


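The `zip` function takes two or more collections (like lists) and combines them element by element, creating tuples of the items that share the same index. If the collections differ in length, `zip` stops at the end of the shortest one. In Python 3, `zip` returns an iterator, so we wrap it in `list` to see the result. Here we create two lists of integers and `zip` them to create a list of tuples: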

In [9]:
x = [1, 2, 3]
y = [4, 5, 6]
zipped = zip(x, y)
print(list(zipped))

[(1, 4), (2, 5), (3, 6)]
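
As a small sketch of two further behaviors of `zip`: it also accepts inputs of different lengths, and the `*` operator can reverse the pairing. The lists below are made-up illustration data.

```python
x = [1, 2, 3]
y = [4, 5, 6, 7]          # one element longer than x

pairs = list(zip(x, y))   # zip stops at the end of the shortest input
print(pairs)              # [(1, 4), (2, 5), (3, 6)]

# The * operator unpacks the pairs again, which "unzips" them into two tuples.
xs, ys = zip(*pairs)
print(xs)                 # (1, 2, 3)
print(ys)                 # (4, 5, 6)
```

The extra `7` in `y` is silently dropped, so it is worth checking lengths first when every element matters.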


## [Dictionaries](https://docs.python.org/3/library/stdtypes.html#dict)

A `dict` is a way to store data just like a list, but instead of using only numbers to get the data, you can use almost anything. This lets you treat a dict like it's a database for storing and organizing data.

A Python dictionary is a collection of key, value pairs. The **key** is a way to name the data, and the **value** is the data itself.

Dictionaries are a very handy data type for managing data you need to look up by key.  Each entry is a key-value pair, with the key and value separated by a colon.  Dictionaries are much more general than the word : definition kind of pairing, since the value can be almost any kind of object.  A dictionary is written with curly braces containing a comma-separated list of key-value pairs.  Since Python 3.7, dictionaries preserve the order in which keys were inserted; earlier versions made no ordering guarantee, which is why some output shown here (apparently produced with an older version) lists entries in a different order than they were written.

### Creating Dictionaries


There are a few different ways to create dictionaries.  The first two create an empty dictionary.

In [10]:
newdict = {}

In [11]:
newdict = dict()

Another way to create a dictionary is to write a comma-separated list of key: value pairs inside curly braces:

In [12]:
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}
print(antonyms)

{'fast': 'slow', 'hot': 'cold', 'good': 'bad'}


We can then add items to a dictionary using `update`, or by assigning a value to a new key:

In [13]:
newdict.update({'new': 'item'})

In [14]:
newdict["next"] = "thing"

In [15]:
newdict

{'new': 'item', 'next': 'thing'}

Another way to create a dictionary is by converting lists.  This is convenient with real data that comes from files, as opposed to the simple data we are using here.  The `zip` function is a bit advanced, and we will come back to it later when we talk about loops and iterables.  For now, just understand that it creates an iterable (think list) of tuples containing the paired entries from the `Keys` and `Values` lists.

Passing the zipped key-value tuples to `dict` makes the dictionary:

In [16]:
Keys = ['hot', 'fast', 'good']
Values = ['cold', 'slow', 'bad']
antonyms2 = dict(zip(Keys,Values))
print(antonyms2)

{'fast': 'slow', 'hot': 'cold', 'good': 'bad'}


### Working with Dictionaries
As usual, find the methods available for this class by typing its name, a dot, and then pressing Tab:

In [18]:
#dict.

We can retrieve the value of any dictionary entry by its key:

In [19]:
antonyms['hot']

'cold'

We can get the length, keys, and values of a dictionary:

In [20]:
len(antonyms)

3

To see all the keys in a dictionary, use the `keys` method:

In [21]:
print(antonyms.keys())

dict_keys(['fast', 'hot', 'good'])


The same thing works to get the values:

In [22]:
print(antonyms.values())

dict_values(['slow', 'cold', 'bad'])


The `get` method also retrieves the value for a key:

In [23]:
antonyms.get('hot')

'cold'
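
The difference between `get` and square brackets shows up when a key is missing. This is a short sketch; the key `'tall'` and the fallback string `'unknown'` are arbitrary illustration values.

```python
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}

print(antonyms.get('hot'))              # key exists: same result as antonyms['hot']
print(antonyms.get('tall'))             # missing key: returns None instead of raising
print(antonyms.get('tall', 'unknown'))  # missing key with a fallback value

try:
    antonyms['tall']                    # square brackets raise KeyError for a missing key
except KeyError as err:
    print('KeyError:', err)
```

Use `get` when a missing key is an expected situation, and square brackets when a missing key should be treated as an error.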

### Dictionaries are mutable

We already saw that we can add elements to a dictionary. We can change the value associated with a particular key by just assigning a value:

In [24]:
antonyms['fast'] = 'gorge'
antonyms

{'fast': 'gorge', 'good': 'bad', 'hot': 'cold'}

As you can see, working with dictionaries is kind of like working with
lists and tuples, except that you can’t join dicts with the plus operator
(+). If you try to do that, you’ll get an error message:

In [25]:
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}

synonyms = {'hot': 'very warm', 'fast': 'quick', 'good': 'fine'}

antonyms+synonyms

TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

Here is one way to merge the dictionaries.  But notice the result has only three elements, not six. Why?

In [26]:
newdict = {}
newdict.update(antonyms)
newdict.update(synonyms)
newdict

{'fast': 'quick', 'good': 'fine', 'hot': 'very warm'}

Maybe the result is different if we ensure the keys are unique?

In [27]:
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}
antonyms2 = {'blue': 'cold', 'red': 'hot'}
newdict = {}
newdict.update(antonyms)
newdict.update(antonyms2)
newdict

{'blue': 'cold', 'fast': 'slow', 'good': 'bad', 'hot': 'cold', 'red': 'hot'}
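
The two results differ because `update` overwrites the value of any key that already exists, and keys are unique. As a sketch of the same behavior in a more compact form, the example below merges with dictionary unpacking (available since Python 3.5) and lists the keys the two dictionaries share.

```python
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}
synonyms = {'hot': 'very warm', 'fast': 'quick', 'good': 'fine'}

# Unpacking both dictionaries into a new literal merges them.
# For keys present in both, the value from the right-most dictionary wins.
merged = {**antonyms, **synonyms}
print(merged)

# Keys shared by both dictionaries explain why the merge has 3 entries, not 6.
shared = antonyms.keys() & synonyms.keys()
print(sorted(shared))
print(len(merged))
```

Unlike `update`, this form leaves both input dictionaries unchanged.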

If you want to delete a dictionary entry, use del:

In [28]:
del newdict['red']
newdict

{'blue': 'cold', 'fast': 'slow', 'good': 'bad', 'hot': 'cold'}

What happens if you try to rerun the cell above after you have already run it?
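
A second `del` on the same key fails because the key no longer exists. The sketch below reproduces that on a small stand-in dictionary and shows `pop` with a default as one way to delete without risking an error.

```python
newdict = {'blue': 'cold', 'red': 'hot'}

del newdict['red']              # works the first time
try:
    del newdict['red']          # the key is gone now, so this raises KeyError
except KeyError as err:
    print('KeyError:', err)

# pop() with a default removes the key if present and never raises.
removed = newdict.pop('red', None)
print(removed)
print(newdict)
```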

In [29]:
cityPlanners_dict = {"name": "Jane Jacobs",
                     "year of birth": 1916,
                     "year of death": 2006,
                     "place of birth": "Pennsylvania"}

- The keys have to be **unique** and **immutable** (more precisely, hashable). The usual suspects are strings and integers; tuples of immutable values work too, but lists do not.
- The values can be anything, including lists, and even other dictionaries:

In [30]:
cityPlanners_dict = {"name": "Jane Jacobs",
                     "year of birth": 1916,
                     "year of death": 2006,
                     "place of birth": "Pennsylvania",
                     "books": ["The Death and Life of Great American Cities",
                               "Cities and the Wealth of Nations ", "Dark Age Ahead ",
                               "Eyes on the Street: The Life of Jane Jacobs ",
                               "The Economy of Cities "]}


- Do not rely on the position of key/value pairs. Before Python 3.7, dictionaries were **unordered**; since Python 3.7 they preserve insertion order, but entries are still looked up by key, not by position.

In [31]:
print(cityPlanners_dict)

{'name': 'Jane Jacobs', 'place of birth': 'Pennsylvania', 'year of birth': 1916, 'year of death': 2006, 'books': ['The Death and Life of Great American Cities', 'Cities and the Wealth of Nations ', 'Dark Age Ahead ', 'Eyes on the Street: The Life of Jane Jacobs ', 'The Economy of Cities ']}


### Use dictionary keys to access the values

- Instead of using indices to extract items, dictionaries use keys to find and retrieve values.

In [32]:
print(cityPlanners_dict.keys(), '\n')
print(cityPlanners_dict.values())

dict_keys(['name', 'place of birth', 'year of birth', 'year of death', 'books']) 

dict_values(['Jane Jacobs', 'Pennsylvania', 1916, 2006, ['The Death and Life of Great American Cities', 'Cities and the Wealth of Nations ', 'Dark Age Ahead ', 'Eyes on the Street: The Life of Jane Jacobs ', 'The Economy of Cities ']])


- If you wanted the value of a particular key:

In [33]:
cityPlanners_dict["name"]

'Jane Jacobs'

- Or perhaps you wanted the last element of the `books` list:

In [34]:
cityPlanners_dict["books"][-1]

'The Economy of Cities '

### Dictionaries compared to lists

In general, if you need to access data by position, or you have simple data that you do not need to look up by name, use a list.

If the data is complex or hierarchical, the dictionary's `key` / `value` structure can be very helpful. If you are only concerned about membership in a collection, looking up a key in a dictionary is typically much faster than searching a large list: a dictionary finds keys by hashing, while a list has to be scanned element by element. And to make a hierarchical or nested data structure, you can put a list (or even another dictionary!) inside a dictionary as the `value`.
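
To get a feel for the membership comparison, here is a rough timing sketch using the standard `timeit` module. The collection size and the dummy values are arbitrary assumptions, and the exact timings will vary from machine to machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = {i: True for i in range(n)}   # hypothetical data: integer keys, dummy values

target = n - 1                          # worst case for the list: the last element

# Both tests give the same answer; they differ only in how they search.
print(target in as_list, target in as_dict)

# Time 100 repetitions of each membership test.
list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)
print(f"list: {list_time:.5f} s, dict: {dict_time:.5f} s")
```

For small collections the difference is negligible, so readability should usually decide which type to use.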

### Once a dictionary has been created, you can change the values of the data. 

This is because it's a *mutable* object.

In [35]:
cityPlanners_dict["place of birth"] = "San Francisco"
print(cityPlanners_dict)

{'name': 'Jane Jacobs', 'place of birth': 'San Francisco', 'year of birth': 1916, 'year of death': 2006, 'books': ['The Death and Life of Great American Cities', 'Cities and the Wealth of Nations ', 'Dark Age Ahead ', 'Eyes on the Street: The Life of Jane Jacobs ', 'The Economy of Cities ']}


Remember, this means that if you assign this dictionary to a new variable, a change to either variable will change the dictionary.
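
A short sketch of that behavior, using a cut-down version of the dictionary and the hypothetical variable names `alias` and `duplicate`:

```python
planner = {"name": "Jane Jacobs", "place of birth": "Pennsylvania"}

alias = planner             # a second name for the SAME dictionary
duplicate = planner.copy()  # a new dictionary with the same keys and values

alias["place of birth"] = "San Francisco"

print(planner["place of birth"])    # changed, because alias and planner are one object
print(duplicate["place of birth"])  # unchanged, because the copy is independent
print(alias is planner, duplicate is planner)
```

Note that `copy` makes a shallow copy: a nested list or dictionary stored as a value would still be shared between the original and the copy.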

### You can also add new keys to the dictionary.  

- Note that dictionaries are "indexed" with square brackets, just like lists. The syntax looks the same, even though the two types are very different.

In [36]:
cityPlanners_dict["gender"] = "Female"
print(cityPlanners_dict)

{'name': 'Jane Jacobs', 'books': ['The Death and Life of Great American Cities', 'Cities and the Wealth of Nations ', 'Dark Age Ahead ', 'Eyes on the Street: The Life of Jane Jacobs ', 'The Economy of Cities '], 'place of birth': 'San Francisco', 'gender': 'Female', 'year of birth': 1916, 'year of death': 2006}


### You can loop through dictionaries

- There are several ways to loop through dictionaries. Looping over `.keys()` using a 'for' loop is an easy method.
- Note the order is not sorted by key.

In [37]:
race = {'white': 0.643, 'african_american': 0.068, 'asian': 0.21, 'other': 0.079}

for key in race.keys():
    print(key, race[key])

other 0.079
white 0.643
african_american 0.068
asian 0.21


Using a for loop makes it really easy to change the value of items in the dictionary, like transforming fractions to percentages:

In [38]:
# translate fractions to percentages 
race = {'white': 0.643, 'african_american': 0.068, 'asian': 0.21, 'other': 0.079}
for key in race.keys():
    race[key] = round(100 * race[key], 2)

print(race)

{'other': 7.9, 'white': 64.3, 'african_american': 6.8, 'asian': 21.0}
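
Two common variations on this loop, shown as a sketch: `.items()` yields each key together with its value, and a dictionary comprehension builds the converted dictionary without modifying the original. The names `group`, `fraction`, and `percentages` are illustrative choices.

```python
race = {'white': 0.643, 'african_american': 0.068, 'asian': 0.21, 'other': 0.079}

# .items() yields (key, value) tuples, so no second lookup is needed.
for group, fraction in race.items():
    print(group, fraction)

# A dictionary comprehension builds a new dictionary and leaves race unchanged.
percentages = {group: round(100 * fraction, 2) for group, fraction in race.items()}
print(percentages)
```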


To see if something is in a collection like a list or a dictionary, use the `in` operator:

In [39]:
countries = ["Afghanistan", "Canada", "Denmark", "Japan"]
race = {'white': 0.643, 'african_american': 0.068, 'asian': 0.21, 'other': 0.079}

print('Japan' in countries)
print('Iran' in countries)
print('asian' in race)
print('asian' not in race)

True
False
True
False


Below is a **list** containing 5 **dictionaries** representing some American states. 

1. Loop through all the dictionaries in the list
2. Check to see if "state bird" is in the dictionary
3. If the key is NOT in the dictionary, add the key and [assign](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#assign) the value "unknown" to it

In [40]:
states = [{'state': 'Ohio', 'population': 11.6, 'year in union': 1803, 'state bird': 'Northern cardinal', 'capital': 'Columbus'},
          {'state': 'Michigan', 'population': 9.9, 'year in union': 1837, 'capital': 'Lansing'},
          {'state': 'California', 'population': 39.1, 'year in union': 1850, 'state bird': 'California quail', 'capital': 'Sacramento'},
          {'state': 'Florida', 'population': 20.2, 'year in union': 1845, 'capital': 'Tallahassee'},
          {'state': 'Alabama', 'population': 4.9, 'year in union': 1819, 'capital': 'Montgomery'}]

In [42]:
# Solution (this uses some syntax we will get to in the programming logic sessions coming soon)
for state in states:
    if 'state bird' not in state:
        state['state bird'] = "unknown"
print(states)

[{'state': 'Ohio', 'state bird': 'Northern cardinal', 'year in union': 1803, 'population': 11.6, 'capital': 'Columbus'}, {'state': 'Michigan', 'state bird': 'unknown', 'year in union': 1837, 'population': 9.9, 'capital': 'Lansing'}, {'state': 'California', 'state bird': 'California quail', 'year in union': 1850, 'population': 39.1, 'capital': 'Sacramento'}, {'state': 'Florida', 'state bird': 'unknown', 'year in union': 1845, 'population': 20.2, 'capital': 'Tallahassee'}, {'state': 'Alabama', 'state bird': 'unknown', 'year in union': 1819, 'population': 4.9, 'capital': 'Montgomery'}]
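
An alternative sketch of the same exercise uses the `setdefault` method, which combines the membership check and the assignment in one call. The list below is a shortened stand-in for the full `states` data.

```python
states = [
    {'state': 'Ohio', 'state bird': 'Northern cardinal'},
    {'state': 'Michigan'},
]

for state in states:
    # setdefault() adds the key only if it is missing; existing values are kept.
    state.setdefault('state bird', 'unknown')

print(states)
```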


*****

### Dictionary Summary

1. A Python dictionary is a collection of key, value pairs.
2. Use dictionary keys to access the values.
3. Once a dictionary has been created, you can change the values of the data and assign new keys.
4. You can loop through key/value pairs in a dictionary.

## Arrays

A data type that is incredibly valuable for numeric processing is the array.  It is provided by the NumPy library, so we have to import NumPy in order to use it and its many methods.  We will compare it to lists of numbers to get some insight into why it is useful.  In short, it provides a way to vectorize your calculations instead of iterating over a list and doing the computations element by element.  When datasets are large, vectorized calculations can be much faster than `for` loops.  In addition to speed, NumPy provides many numerical methods that make complex math, linear algebra, and other scientific computing used in data science much easier.

In [None]:
import numpy as np

Let's start by creating a list, and then creating an array from that list.  Then let's compare how the list of integers works compared to the array.

In [None]:
x = list(range(1,6))
y = np.array(x)

In [None]:
print(x)

In [None]:
print(y)

In [None]:
type(x)

In [None]:
type(y)

Let's see how we can do math operations on these two versions of our data.

In [None]:
sum(x)

In [None]:
sum(y)

In [None]:
min(x)

In [None]:
min(y)

So far so good: it is not easy to tell the difference between lists and arrays. But not every operation is this simple. For example, Python has no built-in `mean` function, so the next cell raises a `NameError`:

In [None]:
mean(x)

In some cases, we can apply a NumPy function directly to a list of numbers, as we do here:

In [None]:
np.mean(x)

In [None]:
np.mean(y)

In [None]:
np.median(x)

In [None]:
np.median(y)

In [None]:
np.size(x)

In [None]:
np.size(y)

In [None]:
x / 10

In [None]:
y / 10

Dividing the list by 10 raises a `TypeError`, while the array is divided element by element. Doing this operation on the list requires iterating over its values and doing the operation element by element:

In [None]:
xscaled = [z / 10 for z in x]
xscaled
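
As a side-by-side sketch, the list comprehension and the vectorized array division produce the same numbers; `np.allclose` checks that they agree.

```python
import numpy as np

x = list(range(1, 6))
y = np.array(x)

xscaled = [z / 10 for z in x]   # explicit loop over the list
yscaled = y / 10                # one vectorized operation on the array

print(xscaled)
print(yscaled)
# Both approaches produce the same numbers.
print(np.allclose(xscaled, yscaled))
```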

We can create arrays initialized with zeros (`np.zeros`) or ones (`np.ones`):

In [None]:
Z = np.zeros(10)
print(Z)

And we can set values in the arrays by index, which means they are mutable.

In [None]:
Z[4] = 1
print(Z)

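Arrays can have more than one dimension. Here `reshape` turns the nine values from `np.arange(9)` into a 3 x 3 array:

In [None]: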
Z = np.arange(9).reshape(3,3)
print(Z)

In [None]:
Z.shape

In [None]:
Z.size

In this example we create a 10 x 10 array of random numbers and find the min and max of the array:

In [None]:
Z = np.random.random((10,10))
Zmin, Zmax = Z.min(), Z.max()
print(Zmin, Zmax)

Below we normalize a 5 x 5 random matrix by rescaling its values so that the smallest becomes 0 and the largest becomes 1:

In [None]:
Z = np.random.random((5,5))
Zmax, Zmin = Z.max(), Z.min()
Z = (Z - Zmin)/(Zmax - Zmin)
print(Z)
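
Because the random matrix changes on every run, the result is hard to check by eye. The sketch below applies the same min-max formula to a small fixed array; the values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# A small fixed array (hypothetical values) so the result is easy to verify.
Z = np.array([[2.0, 4.0],
              [6.0, 10.0]])

Zmin, Zmax = Z.min(), Z.max()
Znorm = (Z - Zmin) / (Zmax - Zmin)   # smallest value maps to 0, largest to 1

print(Znorm)
print(Znorm.min(), Znorm.max())
```

Note that the formula divides by zero if every element of the array is the same, so constant arrays need separate handling.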

Remember this example?  It was using NumPy arrays and the Matplotlib library for plotting.

In [None]:
import matplotlib.pyplot as plt

In [None]:
x = np.arange(100)   # array of the integers 0..99
y = np.sin(x)
plt.plot(x * y)

Below are a few exercises and solutions from http://www.python-course.eu/numpy.php.  Review these and experiment below.

1. Create an arbitrary one dimensional array called "v".

2. Create a new array which consists of the odd indices of previously created array "v".

3. Create a new array in backwards ordering from v.

4. What will be the output of the following code:

       a = np.array([1, 2, 3, 4, 5])
       b = a[1:4]
       b[0] = 200
       print(a[1])
   
5. Create a two dimensional array called "m".

6. Create a new array from m, in which the elements of each row are in reverse order.

7. Another one, where the rows are in reverse order.

8. Create an array from m, where columns and rows are in reverse order.

9. Cut off the first and last row and the first and last column.

Here are the solutions:

In [13]:
# 1
import numpy as np

v = np.array([3, 8, 12, 18, 7, 11, 30])

In [14]:
# 2
odd_elements = v[1::2]
odd_elements

array([ 8, 18, 11])

In [15]:
# 3
reverse_order = v[::-1]
reverse_order

array([30, 11,  7, 18, 12,  8,  3])

In [16]:
# 4
# The output will be 200, because slices are views in NumPy, not copies: b shares memory with a.
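
The view behavior in exercise 4 can be checked directly. This sketch contrasts a plain slice with an explicit `copy()`, and uses `np.shares_memory` to confirm which one is tied to the original array; the names `view` and `independent` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

view = a[1:4]                 # a slice shares memory with a
view[0] = 200
print(a[1])                   # 200: the change is visible through a

independent = a[1:4].copy()   # copy() allocates separate memory
independent[0] = -1
print(a[1])                   # still 200: a is not affected by the copy

print(np.shares_memory(a, view), np.shares_memory(a, independent))
```

This differs from lists, where slicing always creates a new list.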

In [17]:
# 5
m = np.array([ [11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

In [18]:
# 6
m[:, ::-1]

array([[14, 13, 12, 11],
       [24, 23, 22, 21],
       [34, 33, 32, 31]])

In [19]:
# 7
m[::-1]

array([[31, 32, 33, 34],
       [21, 22, 23, 24],
       [11, 12, 13, 14]])

In [20]:
# 8
m[::-1, ::-1]

array([[34, 33, 32, 31],
       [24, 23, 22, 21],
       [14, 13, 12, 11]])

In [21]:
# 9
m[1:-1, 1:-1]

array([[22, 23]])