# Data Science I Topic 1 - List, Tuple, and Dictionary

<u>The lists used below are the first few entries of the [MovieLens](https://grouplens.org/datasets/movielens/) dataset, which we will use again later.</u>

## List

<u>A list can hold elements of different types (here, strings and floats).

Run the cells below.</u>

### Initializing

In [1]:
movie_ratings = ['Toy Story', 4.0, 
                 'Jumanji', 4.0,
                 'Grumpier Old Men', 4.0, 
                 'Waiting to Exhale', 5.0]
print(movie_ratings)

['Toy Story', 4.0, 'Jumanji', 4.0, 'Grumpier Old Men', 4.0, 'Waiting to Exhale', 5.0]


**Tips**

<u>You can make lists within a list. Complete the following so that it is a list of lists, each holding a movie title and its corresponding rating.</u>

In [2]:
# expand this to contain the 4 movie titles and ratings above
movie_rating_list = [['Toy Story', 4.0], 
                     ['Jumanji', 4.0],
                     ['Grumpier Old Men', 4.0],
                     ['Waiting to Exhale', 5.0]]
# display it
movie_rating_list

[['Toy Story', 4.0],
 ['Jumanji', 4.0],
 ['Grumpier Old Men', 4.0],
 ['Waiting to Exhale', 5.0]]

### Extending, appending

<u>`.extend()`: iterates over its argument and adds each element to the end of the list, so you can add several items at once by passing them **in the form of a list**.</u>

In [3]:
# extend the movie_ratings list to include the movie 'Father of the Bride Part II' with rating 5.0
# movie_ratings list shouldn't contain any list
movie_ratings.extend(['Father of the Bride Part II', 5.0])

# display it
print(movie_ratings)

['Toy Story', 4.0, 'Jumanji', 4.0, 'Grumpier Old Men', 4.0, 'Waiting to Exhale', 5.0, 'Father of the Bride Part II', 5.0]


<u>`.append()`: adds a single element, whatever its type, to the end of a list.</u>

In [4]:
# append 'Heat' and then the rating of 4.0 to the list movie_ratings
# movie_ratings list shouldn't contain any list
movie_ratings.append('Heat')
movie_ratings.append(4.0)

# display it
print(movie_ratings)

['Toy Story', 4.0, 'Jumanji', 4.0, 'Grumpier Old Men', 4.0, 'Waiting to Exhale', 5.0, 'Father of the Bride Part II', 5.0, 'Heat', 4.0]
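The difference between the two methods shows up when the argument is itself a list or a string. A minimal sketch on small throwaway lists (the names `appended`, `char_split` and `wrapped` are only for illustration):

```python
# append() adds its argument as ONE element, so a list becomes a nested list
appended = ['Toy Story', 4.0]
appended.append(['Heat', 4.0])
print(appended)    # ['Toy Story', 4.0, ['Heat', 4.0]]

# extend() iterates over its argument; a string is iterated character by character
char_split = ['Toy Story', 4.0]
char_split.extend('Heat')
print(char_split)  # ['Toy Story', 4.0, 'H', 'e', 'a', 't']

# to add a single string with extend(), wrap it in a list
wrapped = ['Toy Story', 4.0]
wrapped.extend(['Heat'])
print(wrapped)     # ['Toy Story', 4.0, 'Heat']
```

This is why the exercises above use `.extend()` with a list, or `.append()` one item at a time, to keep `movie_ratings` free of nested lists.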


### Subsetting

<u>Indexing in Python starts with 0. You can access the last element with index -1. Run the cells below.</u>

In [5]:
print(movie_ratings[0])
print(movie_ratings[-1])

Toy Story
4.0


In [6]:
# Getting indices
# Which index contains the movie title 'Jumanji'?
ind_jumanji = movie_ratings.index('Jumanji')

In [7]:
# Print the rating of Jumanji
movie_ratings[ind_jumanji+1]

4.0

**ADDITION**: use `map()` with `.__getitem__` to subset a list with multiple indices

In [8]:
# e.g.:
list(map(movie_ratings.__getitem__, [0,1,4,5]))

['Toy Story', 4.0, 'Grumpier Old Men', 4.0]
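The same selection can also be written as a list comprehension, or with `operator.itemgetter` from the standard library. A sketch that rebuilds the list as it stands at this point so the block runs on its own:

```python
from operator import itemgetter

# movie_ratings as it stands after the extend/append cells above
movie_ratings = ['Toy Story', 4.0, 'Jumanji', 4.0, 'Grumpier Old Men', 4.0,
                 'Waiting to Exhale', 5.0, 'Father of the Bride Part II', 5.0,
                 'Heat', 4.0]
indices = [0, 1, 4, 5]

# list comprehension: usually the most readable option
by_comprehension = [movie_ratings[i] for i in indices]

# itemgetter with several indices returns a tuple of the selected items
# (with a single index it would return just that item, not a tuple)
by_itemgetter = list(itemgetter(*indices)(movie_ratings))

print(by_comprehension)  # ['Toy Story', 4.0, 'Grumpier Old Men', 4.0]
print(by_itemgetter)     # ['Toy Story', 4.0, 'Grumpier Old Men', 4.0]
```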

<u>What happens if you use negative indices like -2, -3, and so on? Try it below.</u>

In [9]:
# Last entry: index -1, second to last: -2, and so on
print(movie_ratings[-1])
print(movie_ratings[-2])

4.0
Heat


### Subsetting list of lists

<u>To subset a list of lists, use two sets of square brackets: the first selects the inner list, the second selects an element within it. Try it below.</u>

In [10]:
print(movie_rating_list[0])
print(movie_rating_list[0][0])

['Toy Story', 4.0]
Toy Story


### Slicing

<u>Slicing means selecting a range of elements from your list with `[start:stop:step]`. Leaving the start index blank slices from the first element; leaving the stop index blank slices to the end. Run the cells below.</u>

In [11]:
print(movie_ratings[:2])

['Toy Story', 4.0]


In [12]:
print(movie_ratings[::2])

['Toy Story', 'Jumanji', 'Grumpier Old Men', 'Waiting to Exhale', 'Father of the Bride Part II', 'Heat']


<u>Notice that the start index is included and the end index isn't: `[:2]` returns only the elements at indices 0 and 1. Now try printing the last two elements.</u>

In [13]:
print(movie_ratings[-2:])

['Heat', 4.0]
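Because the flat list alternates titles and ratings, a step of 2 separates them: `[::2]` starts at index 0 and `[1::2]` starts at index 1. Pairing the two slices with `zip()` (covered in the Tuple section) rebuilds a list of lists like `movie_rating_list`. A sketch on a shortened copy of the list (`flat` is an illustrative name):

```python
# shortened copy of the flat list: title, rating, title, rating, ...
flat = ['Toy Story', 4.0, 'Jumanji', 4.0, 'Heat', 4.0]

titles_only = flat[::2]    # start at 0, take every 2nd element
ratings_only = flat[1::2]  # start at 1, take every 2nd element

# zip() pairs the i-th title with the i-th rating
pairs = [[t, r] for t, r in zip(titles_only, ratings_only)]
print(pairs)  # [['Toy Story', 4.0], ['Jumanji', 4.0], ['Heat', 4.0]]
```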


### Removing

<u>Recall `ind_jumanji` contains the index of the movie title 'Jumanji'.</u>

In [14]:
# remove Jumanji and its rating using .pop()
movie_ratings.pop(ind_jumanji+1) # pop the rating

# remove 'Jumanji'
movie_ratings.pop(ind_jumanji) # pop the title
print(movie_ratings)

['Toy Story', 4.0, 'Grumpier Old Men', 4.0, 'Waiting to Exhale', 5.0, 'Father of the Bride Part II', 5.0, 'Heat', 4.0]
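The order of the two `.pop()` calls matters, because removing an element shifts every later element one position to the left. Popping the rating at `ind_jumanji+1` first leaves the title at `ind_jumanji`; popping the title first would shift the rating down to `ind_jumanji`, so `.pop(ind_jumanji+1)` would remove the next movie's title instead. Deleting a slice removes both in one step. A sketch on short illustrative lists:

```python
demo = ['Toy Story', 4.0, 'Jumanji', 4.0, 'Heat', 4.0]
ind = demo.index('Jumanji')  # 2

# del with a slice removes the title and its rating together;
# the stop index (ind + 2) is excluded
del demo[ind:ind + 2]
print(demo)   # ['Toy Story', 4.0, 'Heat', 4.0]

# the wrong order: after popping the title, index ind + 1 holds 'Heat'
wrong = ['Toy Story', 4.0, 'Jumanji', 4.0, 'Heat', 4.0]
wrong.pop(ind)      # removes 'Jumanji'; its rating moves down to index ind
wrong.pop(ind + 1)  # removes 'Heat', not Jumanji's rating
print(wrong)  # ['Toy Story', 4.0, 4.0, 4.0]
```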


### List comprehension

<u>We've seen an example of list comprehension in the lecture. Try it yourself: compute the cube (power 3) of each **even** number `x` from 0 to 20.</u>

In [15]:
x = range(0,21,2)
# power 3 of x
[i**3 for i in x]

[0, 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832, 8000]
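A list comprehension can also filter with an `if` clause, so the even numbers can be selected by a condition instead of a `range()` step. An equivalent sketch:

```python
# keep only even i (i % 2 == 0), then cube it
cubes_of_evens = [i**3 for i in range(21) if i % 2 == 0]
print(cubes_of_evens)
# [0, 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832, 8000]
```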

## Tuple

<u>Tuples are immutable: once a tuple is created, its elements cannot be reassigned. Run the cells below and inspect the results.</u>

In [16]:
titles = ('Toy Story', 
          'Jumanji', 
          'Grumpier Old Men', 
          'Waiting to Exhale')
print(titles[0])
titles[0] = 'Toy Story 2'

Toy Story


TypeError: 'tuple' object does not support item assignment
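Since a tuple cannot be changed in place, "modifying" one means building a new tuple. A sketch of two common ways (`new_titles` and `as_list` are illustrative names):

```python
titles = ('Toy Story', 'Jumanji', 'Grumpier Old Men', 'Waiting to Exhale')

# concatenate a one-element tuple (note the trailing comma) with a slice
new_titles = ('Toy Story 2',) + titles[1:]
print(new_titles)
# ('Toy Story 2', 'Jumanji', 'Grumpier Old Men', 'Waiting to Exhale')

# or convert to a list, modify it, and convert back
as_list = list(titles)
as_list[0] = 'Toy Story 2'
print(tuple(as_list) == new_titles)  # True

# the original tuple is unchanged
print(titles[0])  # Toy Story
```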

<u>In the lecture, we saw the following example using the `zip()` function.</u>

In [None]:
titles = ['Toy Story', 
          'Jumanji', 
          'Grumpier Old Men', 
          'Waiting to Exhale']
ratings = [4.0, 4.0, 4.0, 5.0]

# zip() function returns an iterator of tuples
movie_data = zip(titles, ratings)

# iterate over movie_data
for idx, data in enumerate(movie_data, 1): # to start at 1 instead of 0
    mx, rx = (data)
    print('{}: {}, rating: {}'.format(idx, mx, rx))

<u>Given the following genres, print each movie title and its genre by using `titles`, `genres`, and the `zip()` function.</u>

In [17]:
genres = ['Animation', 'Adventure', 'Comedy', 'Comedy']

movie_genres = zip(titles, genres)
for idx, data in enumerate(movie_genres, 1): # to start at 1 instead of 0
    mx, gx = (data)
    print('{}: {}, genre: {}'.format(idx, mx, gx))
    
# Note: enumerate() supplies a running counter, so we don't need to declare and increment a separate variable

1: Toy Story, genre: Animation
2: Jumanji, genre: Adventure
3: Grumpier Old Men, genre: Comedy
4: Waiting to Exhale, genre: Comedy
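Two properties of `zip()` are worth knowing: it returns a one-pass iterator, so a second loop over the same `zip` object yields nothing, and it stops at the shortest input. A sketch:

```python
titles = ['Toy Story', 'Jumanji', 'Grumpier Old Men', 'Waiting to Exhale']
genres = ['Animation', 'Adventure', 'Comedy', 'Comedy']

movie_genres = zip(titles, genres)
first_pass = list(movie_genres)   # consumes the iterator
second_pass = list(movie_genres)  # nothing left
print(len(first_pass), len(second_pass))  # 4 0

# zip() stops at the shortest input: only 2 pairs here
short = list(zip(titles, ['Animation', 'Adventure']))
print(short)  # [('Toy Story', 'Animation'), ('Jumanji', 'Adventure')]
```

To loop over the pairs more than once, store them with `list(zip(...))` first.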


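**ADDITION: example - using `zip()` for 3 lists**

In [18]:
# ratings must be defined here: the In [None] cell above was never run
ratings = [4.0, 4.0, 4.0, 5.0]

movie_ratings_genres = zip(titles, ratings, genres)
for idx, data in enumerate(movie_ratings_genres, 1): # to start at 1 instead of 0
    mx, rx, gx = (data)
    print('{}: {}, rating: {}, genre: {}'.format(idx, mx, rx, gx))

1: Toy Story, rating: 4.0, genre: Animation
2: Jumanji, rating: 4.0, genre: Adventure
3: Grumpier Old Men, rating: 4.0, genre: Comedy
4: Waiting to Exhale, rating: 5.0, genre: Comedy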

## Dictionary

### Initializing

<u>Run the following cells and inspect the results.</u>

In [19]:
movie_ratings = {'Toy Story':4.0, 
                 'Jumanji':4.0,
                 'Grumpier Old Men':4.0, 
                 'Waiting to Exhale':5.0}

print(movie_ratings)

{'Toy Story': 4.0, 'Jumanji': 4.0, 'Grumpier Old Men': 4.0, 'Waiting to Exhale': 5.0}


### Keys, values

In [20]:
# print out all the keys using .keys() method
print(movie_ratings.keys())

dict_keys(['Toy Story', 'Jumanji', 'Grumpier Old Men', 'Waiting to Exhale'])


In [21]:
# use the .values() method to get the rating values
print(movie_ratings.values())

dict_values([4.0, 4.0, 4.0, 5.0])


In [22]:
# print out "Toy Story" rating by first checking if it's indeed in movie_ratings
if ('Toy Story' in movie_ratings.keys()):
    # print out the rating of Toy Story by using movie_ratings[key]
    print(movie_ratings['Toy Story'])

4.0


In [23]:
# check if the movie 'Ace Ventura' is in the dictionary we've just created
# if yes, print the rating, if not, write an appropriate message.
if ('Ace Ventura' in movie_ratings.keys()):
    print('Ace Ventura has a rating of {}'.format(movie_ratings['Ace Ventura']))
else:
    print('No rating yet for Ace Ventura.')

No rating yet for Ace Ventura.
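Two shortcuts for the same check: `in` on a dictionary already tests its keys, so `.keys()` is optional, and `.get()` returns a default instead of raising `KeyError` for a missing key. A sketch (`no_rating` is an illustrative name):

```python
movie_ratings = {'Toy Story': 4.0, 'Jumanji': 4.0,
                 'Grumpier Old Men': 4.0, 'Waiting to Exhale': 5.0}

# `in` checks the keys directly
print('Toy Story' in movie_ratings)  # True

# .get() returns None, or the given default, for a missing key
print(movie_ratings.get('Ace Ventura'))  # None
no_rating = movie_ratings.get('Ace Ventura', 'no rating yet')
print(no_rating)  # no rating yet

# square brackets raise KeyError for a missing key
try:
    movie_ratings['Ace Ventura']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'Ace Ventura'
```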


### Using `zip()` to create a dictionary from lists

In [24]:
titles = ['Toy Story', 
          'Jumanji', 
          'Grumpier Old Men', 
          'Waiting to Exhale']
ratings = [4.0, 4.0, 4.0, 5.0]

In [25]:
# Complete the following line
movie_ratings = {key:value for (key,value) in zip(titles, ratings)}
movie_ratings

{'Toy Story': 4.0,
 'Jumanji': 4.0,
 'Grumpier Old Men': 4.0,
 'Waiting to Exhale': 5.0}

**ADDITION**: passing the zipped pairs to the `dict()` constructor also works:

In [26]:
dict(zip(titles, ratings))

{'Toy Story': 4.0,
 'Jumanji': 4.0,
 'Grumpier Old Men': 4.0,
 'Waiting to Exhale': 5.0}

### Looping over dictionary using `.items()`

<u>Complete the following to print all the dictionary contents.</u>

In [27]:
for key, value in movie_ratings.items():
    print(key + ' has a rating of ' + str(value))

Toy Story has a rating of 4.0
Jumanji has a rating of 4.0
Grumpier Old Men has a rating of 4.0
Waiting to Exhale has a rating of 5.0


<u>`.items()` can also be used to filter a dictionary based on a condition.

Run the cell below.</u>

In [28]:
[title for title,rating in movie_ratings.items() if rating==5]

['Waiting to Exhale']
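The list comprehension above keeps only the titles. To keep the matching title-rating pairs as a dictionary, a dictionary comprehension with the same condition can be used. A sketch:

```python
movie_ratings = {'Toy Story': 4.0, 'Jumanji': 4.0,
                 'Grumpier Old Men': 4.0, 'Waiting to Exhale': 5.0}

# same filter, but build a dict of {title: rating} instead of a list of titles
top_rated = {title: rating for title, rating in movie_ratings.items() if rating == 5}
print(top_rated)  # {'Waiting to Exhale': 5.0}
```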

### Adding an item

<u>Using `movie_ratings[key] = value`, add the movie 'Heat' with a rating of 4.0.</u>

In [29]:
movie_ratings['Heat'] = 4.0

movie_ratings

{'Toy Story': 4.0,
 'Jumanji': 4.0,
 'Grumpier Old Men': 4.0,
 'Waiting to Exhale': 5.0,
 'Heat': 4.0}

### Removing an item

<u>Run the cell.</u>

In [30]:
del movie_ratings['Heat']
movie_ratings

{'Toy Story': 4.0,
 'Jumanji': 4.0,
 'Grumpier Old Men': 4.0,
 'Waiting to Exhale': 5.0}
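`del` raises `KeyError` if the key is missing. `.pop()` removes a key and returns its value, and with a second argument it returns that default instead of raising. A sketch (`heat_rating` and `missing` are illustrative names):

```python
movie_ratings = {'Toy Story': 4.0, 'Jumanji': 4.0,
                 'Grumpier Old Men': 4.0, 'Waiting to Exhale': 5.0,
                 'Heat': 4.0}

# .pop() removes the key and returns its value
heat_rating = movie_ratings.pop('Heat')
print(heat_rating)  # 4.0

# with a default, .pop() does not raise for a missing key
missing = movie_ratings.pop('Heat', None)
print(missing)  # None

# del on a missing key raises KeyError
try:
    del movie_ratings['Heat']
except KeyError:
    print('Heat is no longer in the dictionary')
```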

<u>Use `.items()` to remove movies with a rating less than 5.</u>

In [31]:
to_remove = [title for title,rating in movie_ratings.items() if rating<5]

for i in to_remove:
    del movie_ratings[i]
    
movie_ratings

{'Waiting to Exhale': 5.0}
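The cell above collects the keys in `to_remove` first because deleting keys while iterating over the dictionary itself raises a `RuntimeError`. A sketch comparing that failure with two safe alternatives (`fresh_ratings` is a hypothetical helper that just returns a new copy of the data):

```python
def fresh_ratings():
    """Return a new copy of the four-movie dictionary used in this notebook."""
    return {'Toy Story': 4.0, 'Jumanji': 4.0,
            'Grumpier Old Men': 4.0, 'Waiting to Exhale': 5.0}

# 1) deleting from a dict while iterating over it raises RuntimeError
ratings_dict = fresh_ratings()
raised = False
try:
    for title, rating in ratings_dict.items():
        if rating < 5:
            del ratings_dict[title]
except RuntimeError as err:
    raised = True
    print('RuntimeError:', err)

# 2) iterating over a list copy of the items makes in-place deletion safe
ratings_dict = fresh_ratings()
for title, rating in list(ratings_dict.items()):
    if rating < 5:
        del ratings_dict[title]
print(ratings_dict)  # {'Waiting to Exhale': 5.0}

# 3) a dict comprehension builds a filtered copy without mutating anything
kept = {t: r for t, r in fresh_ratings().items() if r >= 5}
print(kept)  # {'Waiting to Exhale': 5.0}
```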

### Dictionaries in a dictionary

In [32]:
titles = ['Toy Story', 
          'Jumanji', 
          'Grumpier Old Men', 
          'Waiting to Exhale']
ratings = [4.0, 4.0, 4.0, 5.0]
genres = ['Animation', 'Adventure', 'Comedy', 'Comedy']

<u>Expand the following to include all 4 titles using the information above.</u>

In [33]:
movie_ratings_extended = {'Toy Story':{'rating':4.0, 'genre':'Animation'},
                          'Jumanji': {'rating':4.0, 'genre':'Adventure'},
                          'Grumpier Old Men': {'rating':4.0, 'genre':'Comedy'},
                          'Waiting to Exhale': {'rating':5.0, 'genre':'Comedy'}
                         }

**ADDITION**: a dictionary comprehension over `zip()` also works:

In [34]:
{t:{'rating':r, 'genre':g} for t,r,g in zip(titles, ratings, genres)}

{'Toy Story': {'rating': 4.0, 'genre': 'Animation'},
 'Jumanji': {'rating': 4.0, 'genre': 'Adventure'},
 'Grumpier Old Men': {'rating': 4.0, 'genre': 'Comedy'},
 'Waiting to Exhale': {'rating': 5.0, 'genre': 'Comedy'}}

<u>As with a nested list, chain two sets of square brackets to get a value from the inner dictionary.</u>

In [35]:
# Print out the genre of Jumanji 
# use movie_ratings_extended[key_parent][key_sub]
movie_ratings_extended['Jumanji']['genre']

'Adventure'
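Looping with `.items()` over a nested dictionary yields each title together with its inner dictionary, which can then be indexed by key. A sketch that lists the comedies and averages the ratings (`comedies` and `mean_rating` are illustrative names):

```python
movie_ratings_extended = {
    'Toy Story': {'rating': 4.0, 'genre': 'Animation'},
    'Jumanji': {'rating': 4.0, 'genre': 'Adventure'},
    'Grumpier Old Men': {'rating': 4.0, 'genre': 'Comedy'},
    'Waiting to Exhale': {'rating': 5.0, 'genre': 'Comedy'},
}

# .items() yields (title, inner_dict) pairs; index the inner dict by key
comedies = [title for title, info in movie_ratings_extended.items()
            if info['genre'] == 'Comedy']
print(comedies)  # ['Grumpier Old Men', 'Waiting to Exhale']

# .values() yields only the inner dictionaries
mean_rating = (sum(info['rating'] for info in movie_ratings_extended.values())
               / len(movie_ratings_extended))
print(mean_rating)  # 4.25
```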