<h1 align=center><font size = 5>Sets and Dictionaries</font></h1>


## Table of Contents


<div class="alert alert-block alert-info" style="margin-top: 20px">
<li><a href="#ref0">About the Dataset</a></li>
<li><a href="#ref1">Sets</a></li>
<li><a href="#ref7">Dictionaries</a></li>
</div>

<hr>

<a id="ref0"></a>
<center><h2>About the Dataset</h2></center>

Imagine you got a number of album recommendations from your friends and compiled all of them in a table, with specific information about each album.

The table has one row for each album and several columns:

- **artist** - Name of the artist
- **album** - Name of the album
- **released_year** - Year the album was released
- **length_min_sec** - Length of the album (hours, minutes, seconds)
- **genre** - Genre of the album
- **music_recording_sales_millions** - Music recording sales (millions in USD)
- **claimed_sales_millions** - Album's claimed sales (millions in USD)
- **date_released** - Date on which the album was released
- **soundtrack** - Indicates if the album is the movie soundtrack (Y) or (N)
- **rating_of_friends** - Indicates the rating from your friends from 1 to 10
<br>
<br>

The same dataset was used in an earlier topic and is shown below:

In [1]:
import pandas as pd
bigmart = pd.read_csv("dataset/music_dataset.csv")
bigmart.head()

Unnamed: 0,artist,album,released_year,length_min_sec,genre,music_recording_sales_millions,claimed_sales_millions,date_released,soundtrack,rating_of_friends
0,Michael Jackson,Thriller,1982,42:19:00,"Pop, rock, R&B",46.0,65,30/11/82,N,10.0
1,AC/DC,Back in Black,1980,42:11:00,Hard rock,26.1,50,25/07/80,N,8.5
2,Pink Floyd,The Dark Side of the Moon,1973,42:49:00,Prigressive rock,24.2,45,01/03/73,N,9.5
3,Whtney Houston,The Bodyguard,1992,57:44:00,"R&B, soul, pop",27.4,44,17/11/92,Y,7.5
4,Meat Loaf,Bat Out of Hell,1977,46:33:00,"Hard rock, progressive rock",20.6,43,21/10/77,N,7.0


<hr>

<a id="ref1"></a>
<center><h2>Sets</h2></center>

Let's take a look at sets in Python. A set is an unordered collection of unique objects. You can denote a set by listing its elements inside curly brackets “{}”. Python will remove duplicate items:


In [1]:
{"pop", "rock", "soul", "hard rock", "rock", "R&B", "rock", "disco"}


{'R&B', 'disco', 'hard rock', 'pop', 'rock', 'soul'}

The process of mapping is illustrated in figure 1:


<a ><img src = "https://ibm.box.com/shared/static/qk4m2ebiv726wlgh47xh8c17b0qcyt6r.png" width = 1100, align = "center"></a>
  <h4 align=center> Figure 1: Mapping a Set so it does not have any duplicate items 


  </h4> 

You can also create a set from a list as follows:

In [2]:
album_list = ["Michael Jackson", "Thriller", 1982, "00:42:19", \
              "Pop, Rock, R&B", 46.0, 65, "30-Nov-82", None, 10.0]

album_set = set(album_list)             
album_set

{'00:42:19',
 10.0,
 1982,
 '30-Nov-82',
 46.0,
 65,
 'Michael Jackson',
 None,
 'Pop, Rock, R&B',
 'Thriller'}

Now let's create a set of the genres:

In [3]:
music_genres = set(["pop", "pop", "rock", "folk rock", "hard rock", "soul", \
                    "progressive rock", "soft rock", "R&B", "disco"])
music_genres

{'R&B',
 'disco',
 'folk rock',
 'hard rock',
 'pop',
 'progressive rock',
 'rock',
 'soft rock',
 'soul'}

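Notice that the duplicates are removed. The notebook displays the elements in sorted order, but a set itself is unordered, so you should not rely on the order of its elements.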
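
As a minimal sketch of the two points above (duplicates are dropped, and a set has no inherent order), the example below uses a small, made-up list called `genre_list`; it is an illustration only and not part of the dataset. When a fixed order is needed, `sorted()` returns a new sorted list.

```python
# Hypothetical list of genre recommendations that contains repeats
genre_list = ["pop", "rock", "pop", "soul", "rock"]

# Converting the list to a set keeps one copy of each genre
genre_set = set(genre_list)
print(len(genre_list), len(genre_set))  # 5 3

# Membership tests use the `in` operator
print("soul" in genre_set)   # True
print("disco" in genre_set)  # False

# A set has no positions, so build a sorted list when order matters
sorted_genres = sorted(genre_set)
print(sorted_genres)  # ['pop', 'rock', 'soul']
```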

### Working with sets

Remember that with sets you can check the difference between sets, as well as the symmetric difference, intersection and union:

In [4]:
album_set1 = set(["Thriller","Back in Black", "AC/DC"] )
album_set2 = set([ "AC/DC","Back in Black", "The Dark Side of the Moon"] )

 <a ><img src = "https://ibm.box.com/shared/static/bl6ijga6g8r7bdfkl17qw7zh62czte47.png" width = 850, align = "center"></a>
  <h4 align=center> Figure 2: Visualizing the sets as two circles 
 
  </h4> 

In [5]:
album_set1, album_set2

({'AC/DC', 'Back in Black', 'Thriller'},
 {'AC/DC', 'Back in Black', 'The Dark Side of the Moon'})

 As both sets contain 'AC/DC' and 'Back in Black' we represent these common elements with the intersection of two circles.    


 <a ><img src = "https://ibm.box.com/shared/static/7ttuf8otui4s6axm23csmb4s3pxz16y2.png" width = 650, align = "center"></a>
  <h4 align=center> Figure 3: Visualizing common elements with the intersection of two circles.
 
  </h4> 

We can find all the elements that are in “album_set1” but not in “album_set2” by applying the "difference" method as follows:

In [6]:
album_set1.difference(album_set2)  

{'Thriller'}

The result contains only the elements that are unique to “album_set1”. None of the elements of “album_set2” are included, not even those in the intersection.


 <a ><img src = "https://ibm.box.com/shared/static/osmxw1qnb5t9odon2cx94wxhfzlkn1n8.png" width = 650, align = "center"></a>
  <h4 align=center> Figure 4: The difference of “album_set1” and “album_set2”
 
  </h4> 

Similarly, we can find the elements that are in “album_set2” but not in “album_set1”:

In [7]:
album_set2.difference(album_set1)  

{'The Dark Side of the Moon'}

<a ><img src = "https://ibm.box.com/shared/static/klgc09bgpsjudr9v3wtl8yk9s2lya3hl.png" width = 650, align = "center"></a>
  <h4 align=center> Figure 5: The difference of “album_set2” and “album_set1”
 
  </h4> 

We can find the elements that are in both sets, i.e. in both “album_set1” and “album_set2”, using the "intersection" method:



In [8]:
album_set1.intersection(album_set2)   

{'AC/DC', 'Back in Black'}

This corresponds to the intersection of the two circles:

 <a ><img src = "https://ibm.box.com/shared/static/s2xfytq43twp6jsvbvr4o2fir7wdablo.png" width = 650, align = "center"></a>
  <h4 align=center> Figure 6: Intersection of the two sets
 
  </h4> 

We can combine the elements of both sets using the "union" method:

In [9]:
album_set1.union(album_set2)

{'AC/DC', 'Back in Black', 'The Dark Side of the Moon', 'Thriller'}

The union contains all the elements of both sets. This is represented by coloring both circles:


 <a ><img src = "https://ibm.box.com/shared/static/vkczce5jh50g0oh53xn0ilgriflcrog0.png" width = 650, align = "center"></a>
  <h4 align=center> Figure 7: Union of the two sets
 
  </h4> 
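
The symmetric difference mentioned at the start of this section is the set of elements that belong to exactly one of the two sets. The sketch below recreates the two album sets so that it runs on its own, and also shows the operator forms (`-`, `&`, `|`, `^`) that Python provides as shorthand for the set methods.

```python
# Recreate the two sets so this cell is self-contained
album_set1 = {"Thriller", "Back in Black", "AC/DC"}
album_set2 = {"AC/DC", "Back in Black", "The Dark Side of the Moon"}

# Elements that are in exactly one of the two sets
only_in_one = album_set1.symmetric_difference(album_set2)
print(sorted(only_in_one))  # ['The Dark Side of the Moon', 'Thriller']

# Each method has an equivalent operator
print(album_set1 - album_set2 == album_set1.difference(album_set2))      # True
print(album_set1 & album_set2 == album_set1.intersection(album_set2))    # True
print(album_set1 | album_set2 == album_set1.union(album_set2))           # True
print(album_set1 ^ album_set2 == only_in_one)                            # True
```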

You can also check whether a set is a superset or a subset of another set, respectively, like this:

In [10]:
album_set1.issuperset(album_set2)

False

In [11]:
album_set2.issubset(album_set1)

False
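
Both checks above return `False` because each set contains an album the other lacks. For contrast, here is a small sketch in which the checks succeed; `acdc_set` is a hypothetical set introduced only for this illustration.

```python
album_set1 = {"Thriller", "Back in Black", "AC/DC"}

# Hypothetical smaller set whose elements all appear in album_set1
acdc_set = {"AC/DC", "Back in Black"}

print(acdc_set.issubset(album_set1))    # True
print(album_set1.issuperset(acdc_set))  # True

# The relationship does not hold in the other direction
print(album_set1.issubset(acdc_set))    # False

# Every set is a subset (and superset) of itself
print(album_set1.issubset(album_set1))  # True
```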

<a id="ref7"></a>
<h2 align=center> Dictionaries  in Python  </h2>


 A dictionary consists of keys and values. It is helpful to compare a dictionary to a list. Instead of the numerical indexes of a list, dictionaries have keys. These keys are labels that are used to access values within a dictionary.

<a ><img src = "https://ibm.box.com/shared/static/6tyznuwydogmtuv73o8l5g7xsb8o92h2.png" width = 650, align = "center"></a>
  <h4 align=center> Figure 8: Comparing a dictionary to a list. Instead of the numerical indexes of a list, dictionaries have keys.
 
  </h4> 


 Each key is separated from its value by a colon (`:`). The items are separated by commas, and the whole thing is enclosed in curly braces. An empty dictionary without any items is written with just two curly braces, like this: {}.

In [12]:
release_year_dict = {"Thriller":"1982", "Back in Black":"1980", \
                    "The Dark Side of the Moon":"1973", "The Bodyguard":"1992", \
                    "Bat Out of Hell":"1977", "Their Greatest Hits (1971-1975)":"1976", \
                    "Saturday Night Fever":"1977", "Rumours":"1977"}
release_year_dict

{'Thriller': '1982',
 'Back in Black': '1980',
 'The Dark Side of the Moon': '1973',
 'The Bodyguard': '1992',
 'Bat Out of Hell': '1977',
 'Their Greatest Hits (1971-1975)': '1976',
 'Saturday Night Fever': '1977',
 'Rumours': '1977'}

What we just did was create a dictionary in Python. A dictionary holds a collection of elements, each consisting of a key and its corresponding value. Dictionaries are written with curly braces containing key-value pairs, where each key is separated from its value by a colon. Each key maps to exactly one value, although several keys can hold the same value. Keys must be hashable, which in practice means immutable types such as strings, numbers, or tuples of immutable items; values can be of any data type.
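
The rules about keys can be checked with a short experiment. This is a sketch using made-up dictionaries (`album_lengths` and `years` are names chosen for illustration): a tuple works as a key, a list does not, and assigning to an existing key replaces its value rather than adding a second entry.

```python
# A tuple of immutable values is a valid key
album_lengths = {("Michael Jackson", "Thriller"): "42:19:00"}
print(album_lengths[("Michael Jackson", "Thriller")])  # 42:19:00

# A list is mutable, so using it as a key raises a TypeError
try:
    bad_dict = {["Michael Jackson", "Thriller"]: "42:19:00"}
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Each key holds a single value: assigning again overwrites it
years = {"Rumours": "1976"}
years["Rumours"] = "1977"
print(years)  # {'Rumours': '1977'}
```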

 It is helpful to visualize the dictionary as a table, as in figure 9. The first column represents the keys, the second column represents the values.


 <a ><img src = "https://ibm.box.com/shared/static/i45fppou18c3t0fuf2ikks48tod7chbl.png" width = 650, align = "center"></a>
  <h4 align=center> Figure 9: Table representing a Dictionary  

  </h4> 


Now you can retrieve a value by its key:

In [13]:
release_year_dict['Thriller'] 

'1982'

This corresponds to the value stored under the key "Thriller", as shown in Figure 10:


 <a ><img src = "https://ibm.box.com/shared/static/glbwz23cgjjxqi7rjxn7me5i16gan7h7.png" width = 500, align = "center"></a>
  <h4 align=center> 
Figure 10: Table used to represent accessing the value for "Thriller"

  </h4> 


The same works for the key "The Bodyguard":


In [14]:
release_year_dict['The Bodyguard'] 

'1992'

 <a ><img src = "https://ibm.box.com/shared/static/6t7bu8jusckaskukwq1k0a3im5ltcpsn.png" width = 500, align = "center"></a>
  <h4 align=center> 
Figure 11: Accessing the value for "The Bodyguard"

  </h4> 
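
Square-bracket access only works for keys that exist. The sketch below uses a shortened copy of the dictionary and a made-up key, "Some Other Album", to show what happens when a key is missing and how the `in` operator and the `get()` method can avoid the error.

```python
# Shortened copy of the dictionary so this cell runs on its own
release_year_dict = {"Thriller": "1982", "The Bodyguard": "1992"}

# Check whether a key exists before using it
print("Thriller" in release_year_dict)          # True
print("Some Other Album" in release_year_dict)  # False

# get() returns None, or a default you supply, instead of raising an error
print(release_year_dict.get("Some Other Album"))             # None
print(release_year_dict.get("Some Other Album", "unknown"))  # unknown

# Square brackets raise a KeyError for a missing key
key_error_raised = False
try:
    release_year_dict["Some Other Album"]
except KeyError as err:
    key_error_raised = True
    print("Missing key:", err)
```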




Now let us retrieve the keys of the dictionary release_year_dict.

In [15]:
release_year_dict.keys() 

dict_keys(['Thriller', 'Back in Black', 'The Dark Side of the Moon', 'The Bodyguard', 'Bat Out of Hell', 'Their Greatest Hits (1971-1975)', 'Saturday Night Fever', 'Rumours'])

 And you can retrieve the values using **`values()`**:

In [16]:
release_year_dict.values() 

dict_values(['1982', '1980', '1973', '1992', '1977', '1976', '1977', '1977'])
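
The `dict_keys` and `dict_values` objects shown above are views of the dictionary rather than lists. As a minimal sketch on a shortened copy of the dictionary, the example below converts them to lists, turns the year strings into integers, and walks through keys and values together with `items()`.

```python
# Shortened copy of the dictionary so this cell runs on its own
release_year_dict = {"Thriller": "1982", "Back in Black": "1980", "Rumours": "1977"}

# items() yields (key, value) pairs, in insertion order
for album, year in release_year_dict.items():
    print(album, "was released in", year)

# Convert the views to ordinary lists when list behavior is needed
album_names = list(release_year_dict.keys())
print(album_names)  # ['Thriller', 'Back in Black', 'Rumours']

# The years are stored as strings, so convert them before doing arithmetic
release_years = [int(year) for year in release_year_dict.values()]
print(max(release_years) - min(release_years))  # 5
```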