# Data Types for Data Science in Python


## 1. Fundamental Sequence Data Types

### Manipulating lists for fun and profit
You may be familiar with adding individual data elements to a list by using the .append() method. However, if you want to combine a list with another iterable (such as a list, set, or tuple), you can use the .extend() method on the list.

You can also use the .index() method to find the position of the first occurrence of an item in a list. You can then use that position to remove the item with the .pop() method.

In [1]:
# Create a list containing the names: baby_names
baby_names = ['Ximena', 'Aliza', 'Ayden', 'Calvin']

# Extend baby_names with 'Rowen' and 'Sandeep'
baby_names.extend(['Rowen', 'Sandeep'])

# Find the position of 'Rowen': position
position = baby_names.index('Rowen')

# Remove 'Rowen' from baby_names
baby_names.pop(position)

# Print baby_names
print(baby_names)

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'Sandeep']
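As a small illustration of the two methods above, the sketch below extends a list with a tuple and guards the .index() lookup, which raises a ValueError when the item is absent. The names cookies and more_cookies are made up for this example and are not part of the baby names data.

```python
# Hypothetical data, not part of the baby names exercise
cookies = ['chocolate chip', 'oatmeal']
more_cookies = ('peanut butter', 'sugar')  # a tuple works as well as a list

# .extend() appends each element of the iterable to the end of the list
cookies.extend(more_cookies)

# .index() raises ValueError if the item is missing, so check membership first
removed = None
if 'oatmeal' in cookies:
    removed = cookies.pop(cookies.index('oatmeal'))

print(cookies)
print(removed)
```

Checking with in before calling .index() is one option; wrapping the call in try: except ValueError: would work as well.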


### Looping over lists
List comprehensions take the form of [action for item in list] and return a new list.

We can use the sorted() function to sort the data in a list from lowest to highest in the case of numbers and alphabetical order if the list contains strings. The sorted() function returns a new list and does not affect the list you passed into the function. You can learn more about sorted() in the Python documentation.

In [2]:
#create records variable containing list
records = [['2014', 'F', '20799', 'Emma'], 
           ['2014', 'F', '19674', 'Olivia'], 
           ['2014', 'F', '18490', 'Sophia'], 
           ['2014', 'F', '16950', 'Isabella'], 
           ['2014', 'F', '15586', 'Ava'], 
           ['2014', 'F', '13442', 'Mia'], 
           ['2014', 'F', '12562', 'Emily'], 
           ['2014', 'F', '11985', 'Abigail'], 
           ['2014', 'F', '10247', 'Madison'], 
           ['2014', 'F', '10048', 'Charlotte'], 
           ['2014', 'F', '9564', 'Harper'], 
           ['2014', 'F', '9542', 'Sofia'], 
           ['2014', 'F', '9517', 'Avery'], 
           ['2014', 'F', '9492', 'Elizabeth'], 
           ['2014', 'F', '8727', 'Amelia'], 
           ['2014', 'F', '8692', 'Evelyn'], 
           ['2014', 'F', '8489', 'Ella'], 
           ['2014', 'F', '8469', 'Chloe'], 
           ['2014', 'F', '7955', 'Victoria'], 
           ['2014', 'F', '7589', 'Aubrey'], 
           ['2014', 'F', '7554', 'Grace'], 
           ['2014', 'F', '7358', 'Zoey'], 
           ['2014', 'F', '7061', 'Natalie'], 
           ['2014', 'F', '6950', 'Addison'], 
           ['2014', 'F', '6869', 'Lillian'], 
           ['2014', 'F', '6767', 'Brooklyn']]

# Create the list comprehension: baby_names
baby_names = [name[3] for name in records]
    
# Print the sorted baby names in ascending alphabetical order
print(sorted(baby_names))

['Abigail', 'Addison', 'Amelia', 'Aubrey', 'Ava', 'Avery', 'Brooklyn', 'Charlotte', 'Chloe', 'Elizabeth', 'Ella', 'Emily', 'Emma', 'Evelyn', 'Grace', 'Harper', 'Isabella', 'Lillian', 'Madison', 'Mia', 'Natalie', 'Olivia', 'Sofia', 'Sophia', 'Victoria', 'Zoey']
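sorted() also accepts the key and reverse arguments. The sketch below, which uses a few rows in the same layout as records under the assumed name sample_records, sorts by the count column from highest to lowest. The counts are stored as strings, so they are converted to int for a numeric comparison.

```python
# A few rows in the same [year, sex, count, name] layout as records
sample_records = [['2014', 'F', '9564', 'Harper'],
                  ['2014', 'F', '20799', 'Emma'],
                  ['2014', 'F', '13442', 'Mia']]

# key picks the value to compare; reverse=True sorts from highest to lowest
by_count = sorted(sample_records, key=lambda row: int(row[2]), reverse=True)

# Pull out just the names, in the new order
names_by_count = [row[3] for row in by_count]
print(names_by_count)

# The original list is left untouched by sorted()
print(sample_records[0][3])
```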


### Data type usage
Which data type would you use if you wanted your data to be immutable and ordered?

A - Tuple, e.g. ((a, b), (c, d))

### Using and unpacking tuples
If you have a tuple like ('chocolate chip cookies', 15) and you want to access each part of the data, you can use an index just like a list. However, you can also "unpack" the tuple into multiple variables such as type, count = ('chocolate chip cookies', 15) that will set type to 'chocolate chip cookies' and count to 15.

Often you'll want to pair up multiple array data types. The zip() function does just that. In Python 3 it returns an iterator of tuples, each containing one element from each iterable passed into zip(); wrap the result in list() to get a list of tuples.

When looping over a list, you can also track your position in the list by using the enumerate() function. On each iteration it returns the index of the current list item and the list item itself.

In [3]:
#create variables of lists as boy and girl names
girl_names = ['Jada', 'Emily', 'Ava', 'Serenity', 'Claire', 'Sophia', 'Sarah', 'Ashley', 'Chaya', 'Abigail', 'Zoe', 'Leah', 'Hailey', 'Ava', 'Olivia', 'Emma', 'Chloe', 'Sophia', 'Aaliyah', 'Angela', 'Camila', 'Savannah', 'Serenity', 'Chloe', 'Fatoumata']
boy_names = ['Josiah', 'Ethan', 'David', 'Jayden', 'Mason', 'Ryan', 'Christian', 'Isaiah', 'Jayden', 'Michael', 'Noah', 'Samuel', 'Sebastian', 'Noah', 'Dylan', 'Lucas', 'Joshua', 'Angel', 'Jacob', 'Matthew', 'Josiah', 'Jacob', 'Muhammad', 'Alexander', 'Jason']

# Use the zip() function to pair up girl_names and boy_names into a variable called pairs (list of tuples).
pairs = list(zip(girl_names, boy_names))
print(pairs)

[('Jada', 'Josiah'), ('Emily', 'Ethan'), ('Ava', 'David'), ('Serenity', 'Jayden'), ('Claire', 'Mason'), ('Sophia', 'Ryan'), ('Sarah', 'Christian'), ('Ashley', 'Isaiah'), ('Chaya', 'Jayden'), ('Abigail', 'Michael'), ('Zoe', 'Noah'), ('Leah', 'Samuel'), ('Hailey', 'Sebastian'), ('Ava', 'Noah'), ('Olivia', 'Dylan'), ('Emma', 'Lucas'), ('Chloe', 'Joshua'), ('Sophia', 'Angel'), ('Aaliyah', 'Jacob'), ('Angela', 'Matthew'), ('Camila', 'Josiah'), ('Savannah', 'Jacob'), ('Serenity', 'Muhammad'), ('Chloe', 'Alexander'), ('Fatoumata', 'Jason')]


In [4]:
# Use a for loop to loop through pairs, using enumerate() to keep track of your position. 
# Unpack pairs into the variables rank and pair.
for rank, pair in enumerate(pairs):
    # Unpack pair: girl_name, boy_name
    girl_name, boy_name = pair
    # Print the rank and names associated with each rank
    print(f'Rank {rank+1}: {girl_name} and {boy_name}')

Rank 1: Jada and Josiah
Rank 2: Emily and Ethan
Rank 3: Ava and David
Rank 4: Serenity and Jayden
Rank 5: Claire and Mason
Rank 6: Sophia and Ryan
Rank 7: Sarah and Christian
Rank 8: Ashley and Isaiah
Rank 9: Chaya and Jayden
Rank 10: Abigail and Michael
Rank 11: Zoe and Noah
Rank 12: Leah and Samuel
Rank 13: Hailey and Sebastian
Rank 14: Ava and Noah
Rank 15: Olivia and Dylan
Rank 16: Emma and Lucas
Rank 17: Chloe and Joshua
Rank 18: Sophia and Angel
Rank 19: Aaliyah and Jacob
Rank 20: Angela and Matthew
Rank 21: Camila and Josiah
Rank 22: Savannah and Jacob
Rank 23: Serenity and Muhammad
Rank 24: Chloe and Alexander
Rank 25: Fatoumata and Jason
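Two details of zip() and enumerate() are easy to miss, and the short sketch below tries to show them: zip() stops at the shortest input, and enumerate() takes a start argument so you do not need rank+1. The lists girls and boys are shortened, hypothetical versions of the lists above.

```python
girls = ['Jada', 'Emily', 'Ava']
boys = ['Josiah', 'Ethan']      # deliberately one item shorter

pairs_iter = zip(girls, boys)
print(type(pairs_iter))         # a zip object, not a list

# zip() stops at the shortest input, so 'Ava' has no partner and is dropped
short_pairs = list(pairs_iter)

# start=1 makes enumerate() count from 1 instead of 0
ranked = [(rank, girl, boy)
          for rank, (girl, boy) in enumerate(short_pairs, start=1)]
print(ranked)
```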


### Making tuples by accident
Tuples are very powerful and useful, and it's super easy to make one by accident. All you have to do is create a variable and follow the assignment with a comma. This becomes an error when you try to use the variable later expecting it to be a string or a number.

You can verify the data type of a variable with the type() function. In this exercise, you'll see for yourself how easy it is to make a tuple by accident.

In [5]:
# Create a variable named normal and set it equal to 'simple'.
normal = 'simple'

# Create a variable named error and set it equal to 'trailing comma',
error = 'trailing comma',

# Print the types of the variables
print(type(normal))
print(type(error))

<class 'str'>
<class 'tuple'>


### Formatted String Literals ("f" strings)
We've been using plain strings with "" or '' in this class so far, but there are several types of strings and several ways to blend variables with them. A relatively recent addition to Python (version 3.6) is the "f-string", which is short for formatted string literal. F-strings make it easy to mix strings with variables and formatting to get exactly the output you want. You make one by prefacing the quotes with the letter f, like f"". To include a variable within an f-string, wrap the variable in {} and its value is inserted into the string. For example, if we had a variable count with the number 12 stored in it, we could make an f-string like f"{count} cookies", which would output the string "12 cookies" when printed.

The list top_ten_girl_names contains tuples that correspond to the top_ten_rank and name for each position.



In [6]:
#variable of interest
top_ten_girl_names = [(1, 'Jada'), (2, 'Emily'), (3, 'Ava'), (4, 'Serenity'), (5, 'Claire'), (6, 'Sophia'), (7, 'Sarah'), (8, 'Ashley'), (9, 'Chaya'), (10, 'Abigail')]

# Loop over the top_ten_girl_names list and use tuple unpacking to get the top_ten_rank and name.
for top_ten_rank, name in top_ten_girl_names:
    # Print out each rank and name like this: Rank #: 1 - Jada, where the number 1 is the rank and Jada is the name.
    print(f"Rank #: {top_ten_rank} - {name}")

Rank #: 1 - Jada
Rank #: 2 - Emily
Rank #: 3 - Ava
Rank #: 4 - Serenity
Rank #: 5 - Claire
Rank #: 6 - Sophia
Rank #: 7 - Sarah
Rank #: 8 - Ashley
Rank #: 9 - Chaya
Rank #: 10 - Abigail
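The "formatting" part of f-strings refers to format specifiers placed after a colon inside the braces. The sketch below is a minimal example; the variable price is an assumption made up for illustration.

```python
count = 12
price = 1.5   # hypothetical price per cookie

# :>3 right-aligns the value in a field of width 3
# :.2f formats a float with two decimal places
line = f"{count:>3} cookies cost {count * price:.2f}"
print(line)
```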


### Combining multiple strings
F-strings work great for a few variables, but what if you want to combine a whole list of variables into a string? You can use the "".join() method for just that. You put what you want to join the list items with inside the "" and then pass the list into the join() method. For example, if you want to join all the items in a list named cookies with a comma and space, it would look like ", ".join(cookies).

In [7]:
# Make a string that contains 'The top ten boy names are: ' and store it as preamble.
preamble = "The top ten boy names are: "

# Make a string that contains ', and' and store it as conjunction.
conjunction = ', and'

# Combine the first 9 names in boy_names with a comma and space and store the result as first_nine_names.
first_nine_names = ", ".join(boy_names[0:9])

# Make an f-string that contains preamble, first_nine_names, conjunction, the final item in boy_names and a period.
# Note: boy_names holds 25 names here, so boy_names[-1] is the last name in the list rather than the tenth.
print(f"{preamble}{first_nine_names}{conjunction} {boy_names[-1]}.")

The top ten boy names are: Josiah, Ethan, David, Jayden, Mason, Ryan, Christian, Isaiah, Jayden, and Jason.
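One thing to watch for: join() only accepts strings and raises a TypeError for anything else. A minimal sketch, assuming a hypothetical list of integer counts, converts each item with str() first.

```python
counts = [20799, 19674, 18490]   # hypothetical integer data

# join() raises TypeError on non-strings, so convert each item first
joined = ", ".join(str(count) for count in counts)
print(joined)
```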


### Finding strings in other strings
Many times when we are working with strings, we care about which characters are in the string. For example, we may want to know how many cookies in a list of cookies have the word Chocolate in them, or how many start with the letter C. We can perform these checks by using the in keyword and the .startswith() method on a string. We can also use conditionals on a list comprehension in the form of 

    [action for item in list if something is true]. 

Using our cookies examples, it would be something like 
    
    [cookie_name for cookie_name in cookies 
    if 'chocolate' in cookie_name.lower()]. 

Note that these checks are case sensitive, so we call the .lower() method on the string before comparing. We can also "chain" methods together by calling them one after the other.

In [8]:
# Store a list of girl_names that start with s: names_with_s
names_with_s = [name for name in girl_names if name.lower().startswith('s')]
print(names_with_s)

# Store a list of girl_names that contain angel: names_with_angel
names_with_angel = [name for name in girl_names if 'angel' in name.lower()]

print(names_with_angel)

['Serenity', 'Sophia', 'Sarah', 'Sophia', 'Savannah', 'Serenity']
['Angela']
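.startswith() also accepts a tuple of prefixes, which can replace several chained or conditions. The sketch below uses a made-up, mixed-case list called names to show the chained .lower().startswith() call with two prefixes.

```python
names = ['Sophia', 'angela', 'Chloe', 'serenity']   # hypothetical mixed-case data

# Chain .lower() and .startswith(); a tuple means "starts with any of these"
matches = [name for name in names if name.lower().startswith(('s', 'c'))]
print(matches)
```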


## 2. Dictionaries - The Root of Python

### Creating and looping through dictionaries
To build a dictionary from a data source, you start by creating an empty dictionary and then, for each record, assign part of the data as the key and the rest as the value.

Previously, you used sorted() to organize your data in a list. It also works on dictionaries: by default, sorted() on a dictionary returns a list of its keys in sorted order.

The goal of this exercise is to get familiar with building dictionaries via looping over some data source, and then looping over the dictionary to use that data.

In [9]:
# create squirrels list of tuples variable
squirrels = [('Marcus Garvey Park', ('Black', 'Cinnamon', 'Cleaning', None)), 
             ('Highbridge Park', ('Gray', 'Cinnamon', 'Running, Eating', 'Runs From, watches us in short tree')), 
             ('Madison Square Park', ('Gray', None, 'Foraging', 'Indifferent')), 
             ('City Hall Park', ('Gray', 'Cinnamon', 'Eating', 'Approaches')), 
             ('J. Hood Wright Park', ('Gray', 'White', 'Running', 'Indifferent')), 
             ('Seward Park', ('Gray', 'Cinnamon', 'Eating', 'Indifferent')), 
             ('Union Square Park', ('Gray', 'Black', 'Climbing', None)), 
             ('Tompkins Square Park', ('Gray', 'Gray', 'Lounging', 'Approaches'))]
print(squirrels)
print(type(squirrels))

[('Marcus Garvey Park', ('Black', 'Cinnamon', 'Cleaning', None)), ('Highbridge Park', ('Gray', 'Cinnamon', 'Running, Eating', 'Runs From, watches us in short tree')), ('Madison Square Park', ('Gray', None, 'Foraging', 'Indifferent')), ('City Hall Park', ('Gray', 'Cinnamon', 'Eating', 'Approaches')), ('J. Hood Wright Park', ('Gray', 'White', 'Running', 'Indifferent')), ('Seward Park', ('Gray', 'Cinnamon', 'Eating', 'Indifferent')), ('Union Square Park', ('Gray', 'Black', 'Climbing', None)), ('Tompkins Square Park', ('Gray', 'Gray', 'Lounging', 'Approaches'))]
<class 'list'>


In [10]:
# Create an empty dictionary: squirrels_by_park
squirrels_by_park = {}

# Loop over squirrels, unpacking each (park, details) tuple into the variables park and squirrel_details.
# Each park becomes a key and its details tuple becomes the value.
for park, squirrel_details in squirrels:
    squirrels_by_park[park] = squirrel_details

print(squirrels_by_park)

{'Marcus Garvey Park': ('Black', 'Cinnamon', 'Cleaning', None), 'Highbridge Park': ('Gray', 'Cinnamon', 'Running, Eating', 'Runs From, watches us in short tree'), 'Madison Square Park': ('Gray', None, 'Foraging', 'Indifferent'), 'City Hall Park': ('Gray', 'Cinnamon', 'Eating', 'Approaches'), 'J. Hood Wright Park': ('Gray', 'White', 'Running', 'Indifferent'), 'Seward Park': ('Gray', 'Cinnamon', 'Eating', 'Indifferent'), 'Union Square Park': ('Gray', 'Black', 'Climbing', None), 'Tompkins Square Park': ('Gray', 'Gray', 'Lounging', 'Approaches')}


In [11]:
print("Squirrels details tuple: {0}".format(squirrel_details)) # still holds the last loop value (Tompkins Square Park details)
print("Squirrels by park list: {0}".format(squirrels)) # list
print("Squirrels by park dictionary: {0}".format(squirrels_by_park)) # dict equivalent

Squirrels details tuple: ('Gray', 'Gray', 'Lounging', 'Approaches')
Squirrels by park list: [('Marcus Garvey Park', ('Black', 'Cinnamon', 'Cleaning', None)), ('Highbridge Park', ('Gray', 'Cinnamon', 'Running, Eating', 'Runs From, watches us in short tree')), ('Madison Square Park', ('Gray', None, 'Foraging', 'Indifferent')), ('City Hall Park', ('Gray', 'Cinnamon', 'Eating', 'Approaches')), ('J. Hood Wright Park', ('Gray', 'White', 'Running', 'Indifferent')), ('Seward Park', ('Gray', 'Cinnamon', 'Eating', 'Indifferent')), ('Union Square Park', ('Gray', 'Black', 'Climbing', None)), ('Tompkins Square Park', ('Gray', 'Gray', 'Lounging', 'Approaches'))]
Squirrels by park dictionary: {'Marcus Garvey Park': ('Black', 'Cinnamon', 'Cleaning', None), 'Highbridge Park': ('Gray', 'Cinnamon', 'Running, Eating', 'Runs From, watches us in short tree'), 'Madison Square Park': ('Gray', None, 'Foraging', 'Indifferent'), 'City Hall Park': ('Gray', 'Cinnamon', 'Eating', 'Approaches'), 'J. Hood Wright Park

In [12]:
# Loop over the keys of squirrels_by_park in alphabetical order
for park in sorted(squirrels_by_park):
    # Print each park and its value in squirrels_by_park
    print(f'{park}: {squirrels_by_park[park]}')

City Hall Park: ('Gray', 'Cinnamon', 'Eating', 'Approaches')
Highbridge Park: ('Gray', 'Cinnamon', 'Running, Eating', 'Runs From, watches us in short tree')
J. Hood Wright Park: ('Gray', 'White', 'Running', 'Indifferent')
Madison Square Park: ('Gray', None, 'Foraging', 'Indifferent')
Marcus Garvey Park: ('Black', 'Cinnamon', 'Cleaning', None)
Seward Park: ('Gray', 'Cinnamon', 'Eating', 'Indifferent')
Tompkins Square Park: ('Gray', 'Gray', 'Lounging', 'Approaches')
Union Square Park: ('Gray', 'Black', 'Climbing', None)
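To make the behaviour of sorted() on a dictionary concrete, here is a small sketch with a made-up dictionary named parks. It shows that the result is a plain list of keys, that reverse=True flips the order, and that the dictionary itself keeps its insertion order.

```python
# Hypothetical, shortened dictionary keyed by park name
parks = {'Seward Park': 'Eating',
         'City Hall Park': 'Eating',
         'Union Square Park': 'Climbing'}

ascending = sorted(parks)                   # list of keys, A to Z
descending = sorted(parks, reverse=True)    # list of keys, Z to A

print(ascending)
print(descending)
print(list(parks))                          # insertion order is unchanged
```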


### Safely finding by key
If you attempt to access a key that isn't present in a dictionary, you'll get a KeyError. One option to handle this type of error is to use a try: except: block.

Python provides a simpler, more versatile tool to help with this problem in the form of the .get() method. The .get() method allows you to supply the name of a key and, optionally, what you'd like to have returned if the key is not found. Without that second argument, .get() returns None for a missing key.

You'll be using the same squirrels_by_park dictionary, which is keyed by the park name with a tuple of the main color, highlights, action, and reaction to humans as the value, to gain practice using the .get() method.

In [13]:
# Safely print 'Union Square Park' values from the squirrels_by_park dictionary
print(squirrels_by_park.get('Union Square Park'))

# Safely print the type of 'Fort Tryon Park' from the squirrels_by_park dictionary
print(type(squirrels_by_park.get('Fort Tryon Park')))

# Safely print 'Central Park' from the squirrels_by_park dictionary or 'Not Found'
print(squirrels_by_park.get('Central Park', 'Not Found'))

('Gray', 'Black', 'Climbing', None)
<class 'NoneType'>
Not Found
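For comparison, the sketch below performs the same missing-key lookup both ways, once with try: except KeyError: and once with .get(). The one-entry parks dictionary is a shortened stand-in for squirrels_by_park.

```python
# Shortened stand-in for squirrels_by_park
parks = {'Union Square Park': ('Gray', 'Black', 'Climbing', None)}

# Option 1: catch the KeyError raised by square-bracket access
try:
    details = parks['Central Park']
except KeyError:
    details = 'Not Found'

# Option 2: let .get() return the default instead of raising
same_details = parks.get('Central Park', 'Not Found')

print(details == same_details)
```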


### Adding and extending dictionaries
If you have a dictionary and you want to add data to it, you can simply create a new key and assign the data you desire to it. It's important to remember that if it's a nested dictionary, then all the keys in the data path must exist, and each key in the path must be assigned individually.

You can also use the .update() method to update a dictionary with keys and values from another dictionary, from a list of (key, value) tuples, or from keyword arguments.

In the cells below, squirrels_by_park is redefined so that it is keyed by the park name and each value is a list of dictionaries, one per squirrel, holding the main color, highlights, action, and reaction to humans.

In [14]:
# Create squirrels_madison (a list of dictionaries) and squirrels_union (a tuple of the park name and a list of dictionaries)
squirrels_madison = [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Sitting', 'interactions_with_humans': 'Indifferent'}, 
                     {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, 
                     {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Climbing, Foraging', 'interactions_with_humans': 'Indifferent'}]
squirrels_union = ('Union Square Park', 
                   [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, 
                    {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Climbing, Eating', 'interactions_with_humans': None}, 
                    {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, 
                    {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Running, Digging', 'interactions_with_humans': 'Runs From'}, 
                    {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Digging', 'interactions_with_humans': 'Indifferent'}, 
                    {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Black', 'activities': 'Climbing', 'interactions_with_humans': None}, 
                    {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}])


In [15]:
print("Squirrels madison list: {0}".format(squirrels_madison))
print("Squirrels union tuple: {0}".format(squirrels_union))

Squirrels madison list: [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Sitting', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Climbing, Foraging', 'interactions_with_humans': 'Indifferent'}]
Squirrels union tuple: ('Union Square Park', [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Climbing, Eating', 'interactions_with_humans': None}, {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Running, Digging', 'interactio

In [16]:
# Redefine the squirrels_by_park dictionary
squirrels_by_park = {
    'Union Square Park': 
        [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Climbing, Eating', 'interactions_with_humans': None},
        {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Running, Digging', 'interactions_with_humans': 'Runs From'},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Digging', 'interactions_with_humans': 'Indifferent'},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Black', 'activities': 'Climbing', 'interactions_with_humans': None},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}],
    'Madison Square Park': 
        [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Sitting', 'interactions_with_humans': 'Indifferent'}, 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Climbing, Foraging', 'interactions_with_humans': 'Indifferent'}]
        }

In [17]:
# Assign squirrels_madison as the value to the 'Madison Square Park' key
squirrels_by_park['Madison Square Park'] = squirrels_madison

# Print squirrels_madison, which is now the value stored under the 'Madison Square Park' key
print("Squirrels maddison is key for values of Madison Square Park: {0}".format(squirrels_madison))

# Update the 'Union Square Park' key with the squirrels_union tuple
squirrels_by_park.update([squirrels_union])
print("--------------------")
print(squirrels_by_park.keys())
print("--------------------")

# Loop over the park_name in the squirrels_by_park dictionary 
for park_name in squirrels_by_park:
    # Safely print a list of all primary_fur_colors for squirrels in the park_name
    print(park_name, [squirrel.get('primary_fur_color', 'N/A') for squirrel in squirrels_by_park[park_name]])

Squirrels maddison is key for values of Madison Square Park: [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Sitting', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Climbing, Foraging', 'interactions_with_humans': 'Indifferent'}]
--------------------
dict_keys(['Union Square Park', 'Madison Square Park'])
--------------------
Union Square Park ['Gray', 'Gray', 'Cinnamon', 'Gray', 'Gray', 'Gray', 'Gray']
Madison Square Park ['Gray', 'Gray', 'Gray']
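The three input forms that .update() accepts can be sketched side by side. The inventory dictionary and its keys are invented for this example.

```python
inventory = {'chocolate chip': 10}   # hypothetical data

# 1. Another dictionary
inventory.update({'oatmeal': 4})

# 2. A list of (key, value) tuples
inventory.update([('sugar', 7)])

# 3. Keyword arguments (keys must be valid Python identifiers);
#    existing keys such as 'oatmeal' are overwritten
inventory.update(snickerdoodle=2, oatmeal=5)

print(inventory)
```

This is also why the cell above wraps squirrels_union in a list: .update() expects a sequence of pairs, and squirrels_union is a single (key, value) pair.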


### Popping and deleting from dictionaries
Often, you will want to remove keys and values from a dictionary. You can do so using the del statement. It's important to remember that del will raise a KeyError if the key you are trying to delete does not exist. You cannot combine it with the .get() method to safely delete items; however, it can be used inside a try: except: block.

If you want to save the deleted data into another variable for further processing, the .pop() dictionary method will do just that. You can supply a default value for .pop(), much like you did for .get(), to safely deal with missing keys; without a default, .pop() also raises a KeyError for a missing key. It's also typical to use .pop() with a default instead of del when the key may be absent.

In [18]:
# Remove "Madison Square Park" from squirrels_by_park and store it as squirrels_madison.
squirrels_madison = squirrels_by_park.pop('Madison Square Park')

# Safely remove "City Hall Park" from squirrels_by_park with an empty dictionary as the default and store it as squirrels_city_hall. To do this, pass in an empty dictionary {} as a second argument to .pop().
squirrels_city_hall = squirrels_by_park.pop("City Hall Park", {})

# Delete "Union Square Park" from squirrels_by_park
del squirrels_by_park['Union Square Park']

# Print squirrels_by_park
print(squirrels_by_park)

{}
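The difference between the safe and unsafe forms can be checked directly. In the sketch below, parks is a made-up one-entry dictionary; 'Central Park' is never a key, so only the .pop() call with a default succeeds quietly.

```python
parks = {'Union Square Park': 3}   # hypothetical data

# .pop() with a default never raises for a missing key
missing = parks.pop('Central Park', None)

# del raises KeyError for a missing key
try:
    del parks['Central Park']
    del_failed = False
except KeyError:
    del_failed = True

# .pop() without a default raises KeyError as well
try:
    parks.pop('Central Park')
    pop_failed = False
except KeyError:
    pop_failed = True

print(missing, del_failed, pop_failed)
```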


### Working with dictionaries more pythonically
So far, you've worked a lot with the keys of a dictionary to access data, but in Python, the preferred manner for iterating over items in a dictionary is with the .items() method.

This returns each key and value from the dictionary as a tuple, which you can unpack in a for loop. You'll now get practice doing this.

The squirrels_by_park dictionary defined below maps each park name to a list of dictionaries, one per squirrel.

In [19]:
# Redefine the squirrels_by_park dictionary
squirrels_by_park = {
    'Union Square Park': 
        [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Climbing, Eating', 'interactions_with_humans': None},
        {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Running, Digging', 'interactions_with_humans': 'Runs From'},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Digging', 'interactions_with_humans': 'Indifferent'},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Black', 'activities': 'Climbing', 'interactions_with_humans': None},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}],
    'Madison Square Park': 
        [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Sitting', 'interactions_with_humans': 'Indifferent'}, 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'},
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Climbing, Foraging', 'interactions_with_humans': 'Indifferent'}]
        }

In [20]:
# Iterate over the first squirrel entry in the Madison Square Park list
for field, value in squirrels_by_park["Madison Square Park"][0].items():
    # Print field and value
    print(field, value)
    
print('-' * 13)

# Iterate over the second squirrel entry in the Union Square Park list
for field, value in squirrels_by_park["Union Square Park"][1].items():
    # Print field and value
    print(field, value)

primary_fur_color Gray
highlights_in_fur_color None
activities Sitting
interactions_with_humans Indifferent
-------------
primary_fur_color Gray
highlights_in_fur_color Cinnamon
activities Climbing, Eating
interactions_with_humans None
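.items() works the same way on the outer dictionary. As a rough sketch, assuming a shortened dictionary named parks in the same park-to-list layout, the loop below unpacks each park and its list of squirrels and counts the entries.

```python
# Shortened stand-in for squirrels_by_park (park -> list of squirrel dicts)
parks = {
    'Union Square Park': [{'primary_fur_color': 'Gray'},
                          {'primary_fur_color': 'Cinnamon'}],
    'Madison Square Park': [{'primary_fur_color': 'Gray'}],
}

counts = {}
# Each item is a (key, value) tuple, unpacked here into two variables
for park, squirrel_list in parks.items():
    counts[park] = len(squirrel_list)

print(counts)
```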


### Checking dictionaries for data
You can check to see if a key exists in a dictionary by using the in expression.

For example, you can check to see if 'cookies' is a key in the recipes dictionary by using if 'cookies' in recipes:. This allows you to react safely to data being present in, or missing from, the dictionary.


In [21]:
# redefine squirrels_by_park
squirrels_by_park = {
    'Tompkins Square Park': 
    [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Foraging', 'interactions_with_humans': 'Approaches'}, 
     {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Climbing (down tree)', 'interactions_with_humans': 'Indifferent'}, 
     {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, 
     {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}], 
    'Union Square Park': 
    [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, 
    {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': None}, 
    {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, 
    {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Digging', 'interactions_with_humans': 'Indifferent'}]}

In [24]:
# Check to see if Tompkins Square Park is in squirrels_by_park
if "Tompkins Square Park" in squirrels_by_park:
    print('Found Tompkins Square Park')

# Check to see if Central Park is in squirrels_by_park
if "Central Park" in squirrels_by_park:
    print('Found Central Park')
else:
    print('Central Park missing')

Found Tompkins Square Park
Central Park missing
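Keep in mind that in tests a dictionary's keys, not its values. The sketch below uses a made-up recipes dictionary to show the difference, with .values() as one way to search the values instead.

```python
recipes = {'cookies': 'chocolate chip'}   # hypothetical data

key_found = 'cookies' in recipes                       # checks the keys
value_as_key = 'chocolate chip' in recipes             # a value is not a key
value_found = 'chocolate chip' in recipes.values()     # checks the values

print(key_found, value_as_key, value_found)
```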


### Dealing with nested dictionaries
A dictionary can contain another dictionary as the value of a key, and this is a very common way to deal with repeating data structures such as yearly, monthly or weekly data. All the same rules apply when creating or accessing the dictionary.

For example, suppose you had a dictionary that recorded my cookie consumption by year and type of cookie. It might look like

        cookies = {'2017': {'chocolate chip': 483, 'peanut butter': 115}, '2016': {'chocolate chip': 9513, 'peanut butter': 6792}} 

I could access how many chocolate chip cookies I ate in 2016 using 
        
        cookies['2016']['chocolate chip'].

When exploring a new dictionary, it can be helpful to use the .keys() method to get an idea of what data might be available within the dictionary. You can also iterate over a dictionary and it will return each key in the dictionary for you to use inside the loop.
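Using the cookies dictionary from the example above, the following sketch explores the outer keys, totals each year's inner dictionary, and chains two .get() calls for a lookup that is safe at both levels. The 'oatmeal' key is deliberately absent and is only there to show the default.

```python
cookies = {'2017': {'chocolate chip': 483, 'peanut butter': 115},
           '2016': {'chocolate chip': 9513, 'peanut butter': 6792}}

# Outer keys are the years
years = list(cookies.keys())

# Iterate over the outer dictionary and sum each inner dictionary's values
totals = {year: sum(by_type.values()) for year, by_type in cookies.items()}

# Chained .get() calls: the {} default keeps the second call from failing
oatmeal_2016 = cookies.get('2016', {}).get('oatmeal', 0)

print(years)
print(totals)
print(oatmeal_2016)
```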

We've loaded a squirrels_by_park dictionary with park names for the keys and a nested dictionary of one squirrel's data as each value.

In [31]:
# Redefine squirrels_by_park to make it a nested dictionary
# Nested because each park is a key, and each field (such as activities) is a key of the inner dictionary
squirrels_by_park = {
    'J. Hood Wright Park': 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Running', 'interactions_with_humans': 'Indifferent'}, 
    'Stuyvesant Square Park': 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}, 
    'Highbridge Park': 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'White', 'activities': 'Climbing', 'interactions_with_humans': None}, 
    'Tompkins Square Park': 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Foraging', 'interactions_with_humans': None}, 
    'Union Square Park': 
        {'primary_fur_color': 'Gray', 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, 
    'City Hall Park': 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'White', 'activities': 'Eating, Foraging', 'interactions_with_humans': 'Indifferent'}, 
    'Msgr. McGolrick Park': 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Cinnamon', 'activities': 'Running', 'interactions_with_humans': 'Indifferent'}, 
    'John V. Lindsay East River Park': 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Running, Chasing, Eating', 'interactions_with_humans': None}}

In [32]:
# Print a list of keys from the squirrels_by_park dictionary
print(squirrels_by_park.keys())

# Print the keys from the squirrels_by_park dictionary for 'Union Square Park'
print(squirrels_by_park['Union Square Park'].keys())

# Loop over the dictionary
for park_name in squirrels_by_park:
    # Safely print the park_name and the highlights_in_fur_color or 'N/A'
    print(park_name, squirrels_by_park[park_name].get('highlights_in_fur_color', 'N/A'))

dict_keys(['J. Hood Wright Park', 'Stuyvesant Square Park', 'Highbridge Park', 'Tompkins Square Park', 'Union Square Park', 'City Hall Park', 'Msgr. McGolrick Park', 'John V. Lindsay East River Park'])
dict_keys(['primary_fur_color', 'activities', 'interactions_with_humans'])
J. Hood Wright Park Cinnamon
Stuyvesant Square Park Cinnamon
Highbridge Park White
Tompkins Square Park Gray
Union Square Park N/A
City Hall Park White
Msgr. McGolrick Park Cinnamon
John V. Lindsay East River Park Gray
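
When both the key and the nested value are needed inside the loop, iterating over .items() avoids looking the park up a second time. The following is a minimal sketch on a trimmed, illustrative two-park dictionary; the names parks and summary are assumptions made for this example and are not part of the exercise.

```python
# Trimmed, illustrative copy of the nested structure used above
parks = {
    'Union Square Park': {'primary_fur_color': 'Gray', 'activities': 'Eating, Foraging'},
    'City Hall Park': {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'White'},
}

summary = []
# .items() yields (key, value) pairs, so no second lookup by park name is needed
for park_name, details in parks.items():
    # .get() returns the fallback instead of raising KeyError for a missing key
    summary.append((park_name, details.get('highlights_in_fur_color', 'N/A')))

print(summary)
```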


In [26]:
# Print a list of keys from the squirrels_by_park dictionary
print(squirrels_by_park.keys())

dict_keys(['Tompkins Square Park', 'Union Square Park'])


In [30]:
# Print the value stored under 'Union Square Park' (a list of squirrel dictionaries)
print(squirrels_by_park['Union Square Park'])

[{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': None}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Digging', 'interactions_with_humans': 'Indifferent'}]


### Dealing with nested mixed types
Previously, we used the in expression to see if data is in a dictionary, such as if 'cookies' in recipes_dict. However, what if we want to find data under a dictionary key whose value is a list of dictionaries? In that scenario, we can use a for loop to iterate over the items in the nested list and operate on them. We can also use list comprehensions to filter nested lists of dictionaries. For example, [cookie for cookie in recipes["cookies"] if "chocolate chip" in cookie["name"]] returns a list of the cookies in recipes whose name value contains chocolate chip.
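
To make the comprehension above concrete, here is a small sketch. The recipes dictionary and its contents are hypothetical, invented only to show the filtering pattern.

```python
# Hypothetical recipes dictionary: the 'cookies' key holds a list of dictionaries
recipes = {
    'cookies': [
        {'name': 'chocolate chip', 'quantity': 12},
        {'name': 'oatmeal raisin', 'quantity': 6},
        {'name': 'double chocolate chip', 'quantity': 8},
    ]
}

# Keep only the cookies whose name contains the substring 'chocolate chip'
matches = [cookie for cookie in recipes['cookies'] if 'chocolate chip' in cookie['name']]

# Collect just the names of the matching cookies
match_names = [cookie['name'] for cookie in matches]
print(match_names)
```

Note that in on a string performs a substring test, which is why 'double chocolate chip' also matches.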

We've loaded a squirrels_by_park dictionary with park names for the keys and a list of dictionaries of the squirrels.



In [46]:
# Nested dictionary in which each park maps to a list of squirrel dictionaries

squirrels_by_park = {
    'Tompkins Square Park': 
        [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Foraging', 'interactions_with_humans': 'Approaches'}, 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Climbing (down tree)', 'interactions_with_humans': 'Indifferent'}, 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'interactions_with_humans': 'Indifferent'}, 
        {'primary_fur_color': 'Gray', 'highlights_in_fur_color': 'Gray', 'activities': 'Foraging', 'interactions_with_humans': 'Indifferent'}], 
    'Union Square Park': 
        [{'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, 
         {'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': None}, 
         {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Eating, Foraging', 'interactions_with_humans': None}, 
         {'primary_fur_color': 'Gray', 'highlights_in_fur_color': None, 'activities': 'Digging', 'interactions_with_humans': 'Indifferent'}]}

In [48]:
# Use a for loop to iterate over the squirrels in Tompkins Square Park:
for squirrel in squirrels_by_park["Tompkins Square Park"]:
    # Safely print the activities of each squirrel or None
    print(squirrel.get("activities"))

Foraging
Climbing (down tree)
None
Foraging


In [50]:
# Print the list of 'Cinnamon' primary_fur_color squirrels in Union Square Park
print([squirrel for squirrel in squirrels_by_park["Union Square Park"] if "Cinnamon" in squirrel["primary_fur_color"]])

[{'primary_fur_color': 'Cinnamon', 'highlights_in_fur_color': None, 'activities': 'Foraging', 'interactions_with_humans': None}]


## 3. Numeric Data Types, Booleans and Sets

### Choosing when to use integers and floats
Floats and integers can be used interchangeably in many circumstances. Here are a few guidelines for choosing between them, and for when to reach for Decimal instead.

Integer:
 - Exceptionally large values
 - When the result will be an integer

Float:
 - When the expected result will be a float
 - Scientific notation
 - When an approximation is good enough (most programs)

Decimal:
 - Exacting precision required
 - Currency
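
The guidelines above can be illustrated with a short sketch. The variable names are chosen for this example only.

```python
from decimal import Decimal

# Integers have arbitrary precision, so very large values stay exact
big = 2 ** 100
exact_difference = (big + 1) - big
print(exact_difference)

# Floats are binary approximations: 0.1 + 0.2 is not exactly 0.3
float_sum = 0.1 + 0.2
print(float_sum == 0.3)

# Decimal values built from strings keep exact decimal digits, which suits currency
decimal_sum = Decimal('0.10') + Decimal('0.20')
print(decimal_sum == Decimal('0.30'))
```

Building a Decimal from a string rather than from a float matters here, because a float literal has already been rounded to binary before Decimal sees it.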

### Printing floats
Scientific notation is a powerful tool for representing numbers, but it can be confusing to handle when trying to print float values. However, we can use the f-strings we learned about previously, together with a format specifier, to make sure they are printed properly every time. For example, to format a variable in an f-string as a float, we can use the f format specifier: print(f"{some_variable:f}"). The f specifier also accepts an optional precision; for example, print(f"{some_variable:.4f}") prints four decimal places.

In [51]:
float1 = 0.0001
float2 = 1e-05
float3 = 1e-07

In [52]:
# Print floats 1, 2, and 3
print(float1)
print(float2)
print(float3)

# Print floats 1 and 2 using the f format specifier
print(f"{float1:f}")
print(f"{float2:f}")

# Print float 3 with seven decimal places of precision
print(f"{float3:.7f}")

0.0001
1e-05
1e-07
0.000100
0.000010
0.0000001
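
The reason float3 needs an explicit precision is that the f specifier defaults to six decimal places. The sketch below shows that default next to an explicit precision and the e specifier; the variable names are illustrative.

```python
tiny = 1e-07

# The default precision of the f specifier is six decimal places,
# so a value this small is displayed as zero
default_format = f"{tiny:f}"

# Asking for seven decimal places makes the digit visible
seven_places = f"{tiny:.7f}"

# The e specifier forces scientific notation with the requested precision
scientific = f"{1234.5:.2e}"

print(default_format, seven_places, scientific)
```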


### Division with integers and floats
Python supports two different division operators: / and //. In Python 3, / always returns a float result. // is floor division: it returns an int when both operands are integers, and a floored float if either operand is a float. Floor division gives the same result as math.floor(numerator/divisor), which is the largest integer less than or equal to the result of the division operation. You can learn more about math.floor in the Python Docs.

In [53]:
import math

In [54]:
# Print the result of 2/1 and 1/2
print(2/1)
print(1/2)

# Print the floored division result of 2//1 and 1//2
# Note: // already floors, so math.floor is redundant here; 2//1 on its own gives the same result
print(math.floor(2//1))
print(math.floor(1//2))

# Print the type of 2/1 and 2//1
print(type(2/1))
print(type(2//1))

2.0
0.5
2
0
<class 'float'>
<class 'int'>
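
Floor division is easiest to tell apart from truncation when the result is negative. The following sketch uses values picked for illustration.

```python
import math

# With a negative result, floor division rounds down (toward negative infinity)
floored = -7 // 2
same_as_math_floor = math.floor(-7 / 2)

# int() truncates toward zero instead, giving a different answer
truncated = int(-7 / 2)

# If either operand is a float, // still floors but returns a float
float_floor = 7.0 // 2

# divmod returns the floor quotient and the remainder together
quotient, remainder = divmod(-7, 2)

print(floored, same_as_math_floor, truncated, float_floor, quotient, remainder)
```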


### More than just true and false
Python has two boolean values available for you to use: True and False. Most commonly these boolean values are used to indicate that something is on or off, yes or no, or similar states. Additionally, values of many Python types are treated as "truthy" or "falsey" depending on their contents, which you can check with the bool() function.

In [56]:
# Create an empty list
my_list = []

# Check the truthiness of my_list
print(bool(my_list))

# Append the string 'cookies' to my_list
my_list.append('cookies')

# Check the truthiness of my_list
print(bool(my_list))
print(my_list)

False
True
['cookies']
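
As a quick reference, the sketch below runs bool() over some common values. The two lists are illustrative and not exhaustive.

```python
# Values that evaluate as falsey: zero numbers, empty containers, and None
falsey_values = [0, 0.0, '', [], {}, set(), (), None]

# Non-zero numbers and non-empty containers are truthy, even if their contents are falsey
truthy_values = [1, -0.5, '0', ' ', [0], {'a': None}, (None,)]

print([bool(value) for value in falsey_values])
print([bool(value) for value in truthy_values])
```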


### Comparisons
Booleans and their truthiness are most often used in comparisons, and we use comparisons without even thinking about their underlying data type. To perform a comparison, we can use a comparison operator. Python supports the following comparison operators:

    == equal to
    != not equal to
    > greater than
    < less than
    >= greater than or equal to
    <= less than or equal to 
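
Each of these operators evaluates to a boolean, and comparisons can be chained. A small sketch, using an illustrative flipper_length value:

```python
flipper_length = 193.0  # illustrative value

# Each comparison evaluates to a bool
is_long = flipper_length >= 200
print(is_long, type(is_long))

# Comparisons can be chained; this is equivalent to
# (190 <= flipper_length) and (flipper_length < 200)
in_range = 190 <= flipper_length < 200
print(in_range)
```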

For this exercise, we'll be using a subset of the Palmer Archipelago (Antarctica) penguin data set, named penguins, as a list of dictionaries with the keys of species, flipper_length, body_mass and sex.

In [57]:
# create penguins which is list of dictionaries
penguins = [{'species': 'Adlie', 'flipper_length': 190.0, 'body_mass': 3050.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 184.0, 'body_mass': 3325.0, 'sex': 'FEMALE'}, 
            {'species': 'Gentoo', 'flipper_length': 209.0, 'body_mass': 4800.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 193.0, 'body_mass': 4200.0, 'sex': 'MALE'}, 
            {'species': 'Gentoo', 'flipper_length': 210.0, 'body_mass': 4400.0, 'sex': 'FEMALE'}, 
            {'species': 'Gentoo', 'flipper_length': 213.0, 'body_mass': 4650.0, 'sex': 'FEMALE'}, 
            {'species': 'Chinstrap', 'flipper_length': 193.0, 'body_mass': 3600.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 193.0, 'body_mass': 3800.0, 'sex': 'MALE'}, 
            {'species': 'Chinstrap', 'flipper_length': 199.0, 'body_mass': 3900.0, 'sex': 'FEMALE'}, 
            {'species': 'Chinstrap', 'flipper_length': 195.0, 'body_mass': 3650.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 185.0, 'body_mass': 3700.0, 'sex': 'FEMALE'}, 
            {'species': 'Gentoo', 'flipper_length': 208.0, 'body_mass': 4575.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 196.0, 'body_mass': 4350.0, 'sex': 'MALE'}, 
            {'species': 'Adlie', 'flipper_length': 191.0, 'body_mass': 3700.0, 'sex': 'FEMALE'}, 
            {'species': 'Chinstrap', 'flipper_length': 195.0, 'body_mass': 3300.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 195.0, 'body_mass': 3450.0, 'sex': 'FEMALE'}, 
            {'species': 'Gentoo', 'flipper_length': 217.0, 'body_mass': 4875.0, 'sex': '.'}, 
            {'species': 'Gentoo', 'flipper_length': 212.0, 'body_mass': 4875.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 205.0, 'body_mass': 4300.0, 'sex': 'MALE'}, 
            {'species': 'Gentoo', 'flipper_length': 220.0, 'body_mass': 6000.0, 'sex': 'MALE'}]
print(type(penguins))

<class 'list'>


In [58]:
# Use a for loop to iterate over the penguins list
for penguin in penguins:
    # Check the penguin entry for a body mass of more than 3300 grams
    if penguin["body_mass"] > 3300:
        # Print the species and sex of the penguin if true
        print(f"{penguin['species']} - {penguin['sex']}")

Adlie - FEMALE
Gentoo - FEMALE
Adlie - MALE
Gentoo - FEMALE
Gentoo - FEMALE
Chinstrap - FEMALE
Adlie - MALE
Chinstrap - FEMALE
Chinstrap - FEMALE
Adlie - FEMALE
Gentoo - FEMALE
Adlie - MALE
Adlie - FEMALE
Adlie - FEMALE
Gentoo - .
Gentoo - FEMALE
Adlie - MALE
Gentoo - MALE


### Truthy, True, Falsey, and False
While comparisons check for truthiness, something being truthy is not the same as it being True. Likewise, a falsey value is not necessarily False. So we need to be vigilant when checking whether something is True or False versus truthy or falsey. In Python, the is operator checks whether two things are identical, meaning the very same object. This time we'll be using a penguin details dictionary that has the same keys as the prior exercise (species, flipper_length, body_mass, sex) plus a tracked key that holds a boolean value.

In [59]:
# create dictionary penguin_305_details
penguin_305_details = {'species': 'Adlie', 'flipper_length': 190.0, 'body_mass': 3050.0, 'tracked': True, 'sex': 'FEMALE'}

In [60]:
# Check the truthiness of penguin_305_details sex key
if penguin_305_details["sex"]:
    # If true, check if sex is True and store it as sex_is_true
    sex_is_true = penguin_305_details["sex"] is True
    # Print the sex key's value and sex_is_true
    print(f"{penguin_305_details['sex']}: {sex_is_true}")

# Check the truthiness of penguin_305_details tracked key
if penguin_305_details["tracked"]:
    # If true, check if tracked is True and store it as tracked_is_true
    tracked_is_true = penguin_305_details["tracked"] is True
    # Print the tracked key value and tracked_is_true
    print(f"{penguin_305_details['tracked']}: {tracked_is_true}")

FEMALE: False
True: True
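
The difference between equality, identity, and truthiness can be seen side by side in the sketch below. The variable names are illustrative.

```python
value = 1          # an int that happens to compare equal to True
label = 'FEMALE'   # a non-empty string, so it is truthy
flag = True        # the actual boolean

# == compares values; bool is a subclass of int, so 1 == True holds
equal_to_true = (value == True)

# is compares identity; the integer 1 is not the True object
identical_to_true = (value is True)

# A truthy string passes an if test but is still not True
label_is_truthy = bool(label)
label_is_true = (label is True)

flag_is_true = (flag is True)

print(equal_to_true, identical_to_true, label_is_truthy, label_is_true, flag_is_true)
```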


### Determining set differences
Another way of comparing sets is to use the difference() method. It returns all the items found in one set but not another. It's important to remember the set you call the method on will be the one from which the items are returned. Unlike tuples, you can add() items to a set. A set will only add items that do not exist in the set.

In [61]:
# create male_penguin_species set
male_penguin_species = {'Adlie', 'Gentoo'}
print(type(male_penguin_species))

# create penguins list
penguins = [{'species': 'Adlie', 'flipper_length': 190.0, 'body_mass': 3050.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 184.0, 'body_mass': 3325.0, 'sex': 'FEMALE'}, 
            {'species': 'Gentoo', 'flipper_length': 209.0, 'body_mass': 4800.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 193.0, 'body_mass': 4200.0, 'sex': 'MALE'}, 
            {'species': 'Gentoo', 'flipper_length': 210.0, 'body_mass': 4400.0, 'sex': 'FEMALE'}, 
            {'species': 'Gentoo', 'flipper_length': 213.0, 'body_mass': 4650.0, 'sex': 'FEMALE'}, 
            {'species': 'Chinstrap', 'flipper_length': 193.0, 'body_mass': 3600.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 193.0, 'body_mass': 3800.0, 'sex': 'MALE'}, 
            {'species': 'Chinstrap', 'flipper_length': 199.0, 'body_mass': 3900.0, 'sex': 'FEMALE'}, 
            {'species': 'Chinstrap', 'flipper_length': 195.0, 'body_mass': 3650.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 185.0, 'body_mass': 3700.0, 'sex': 'FEMALE'}, 
            {'species': 'Gentoo', 'flipper_length': 208.0, 'body_mass': 4575.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 196.0, 'body_mass': 4350.0, 'sex': 'MALE'}, 
            {'species': 'Adlie', 'flipper_length': 191.0, 'body_mass': 3700.0, 'sex': 'FEMALE'}, 
            {'species': 'Chinstrap', 'flipper_length': 195.0, 'body_mass': 3300.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 195.0, 'body_mass': 3450.0, 'sex': 'FEMALE'}, 
            {'species': 'Gentoo', 'flipper_length': 217.0, 'body_mass': 4875.0, 'sex': '.'}, 
            {'species': 'Gentoo', 'flipper_length': 212.0, 'body_mass': 4875.0, 'sex': 'FEMALE'}, 
            {'species': 'Adlie', 'flipper_length': 205.0, 'body_mass': 4300.0, 'sex': 'MALE'}, 
            {'species': 'Gentoo', 'flipper_length': 220.0, 'body_mass': 6000.0, 'sex': 'MALE'}]
print(type(penguins))

<class 'set'>
<class 'list'>


In [62]:
# Use a list comprehension to iterate over each penguin in penguins; save the result as female_species_list
# If the sex of the penguin is 'FEMALE', return the species value
female_species_list = [penguin["species"] for penguin in penguins 
                       if penguin["sex"] == 'FEMALE']

# Create a set using the female_species_list as female_penguin_species
female_penguin_species = set(female_species_list)

# Find the difference between female_penguin_species and male_penguin_species. Store the result as differences
differences = female_penguin_species.difference(male_penguin_species)

# Print the differences
print(differences)

{'Chinstrap'}
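
Because the result depends on which set the method is called on, it can help to see both directions. This sketch uses two small illustrative sets that are not taken from the data above.

```python
# Illustrative sets, not taken from the penguin data above
tagged = {'Gentoo', 'Chinstrap', 'King'}
weighed = {'Gentoo', 'Emperor'}

# difference() is not symmetric: the result comes from the set the method is called on
only_tagged = tagged.difference(weighed)
only_weighed = weighed.difference(tagged)

# The - operator is equivalent to difference() when both operands are sets
same_result = (tagged - weighed) == only_tagged

# add() leaves the set unchanged when the value is already present
size_before = len(weighed)
weighed.add('Gentoo')
size_after = len(weighed)

print(only_tagged, only_weighed, same_result, size_before, size_after)
```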


### Finding all the data and the overlapping data between sets
Sets have several methods to combine, compare, and study them all based on mathematical set theory. The .union() method returns a set of all the elements found in the set you used the method on plus any sets passed as arguments to the method. You can also look for overlapping data in sets by using the .intersection() method on a set and passing another set as an argument. It will return an empty set if nothing matches.

Your job in this exercise is to find the union and intersection in the species from male and female penguins. For this purpose, two sets have been pre-loaded into your workspace: female_penguin_species and male_penguin_species.

In [63]:
# Combine all the species in female_penguin_species and male_penguin_species by computing their union. Store the result as all_species.
all_species = female_penguin_species.union(male_penguin_species)

# Print the count of species in all_species
print(len(all_species))

# Find the intersection: overlapping_species
overlapping_species = female_penguin_species.intersection(male_penguin_species)

# Print the count of species in overlapping_species
print(len(overlapping_species))

3
2
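
Sets also support operator forms of these methods. The sketch below uses illustrative sets to show them alongside symmetric_difference(), which returns the elements found in exactly one of the two sets.

```python
# Illustrative sets, not taken from the penguin data above
tagged = {'Gentoo', 'Chinstrap', 'King'}
weighed = {'Gentoo', 'Emperor'}

# | is the operator form of union()
either = tagged | weighed

# & is the operator form of intersection()
both = tagged & weighed

# ^ is the operator form of symmetric_difference()
exactly_one = tagged ^ weighed

# intersection() returns an empty set when nothing matches
no_overlap = tagged.intersection({'Macaroni'})

# The method forms accept any iterable, while the operators require sets
with_list = tagged.union(['Rockhopper'])

print(either, both, exactly_one, no_overlap, with_list)
```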


## 4. Advanced Data Types

### Using Counter on lists
Counter, found in the collections module, is a powerful tool for counting, validating, and learning more about the elements within a dataset. You pass an iterable (list, set, tuple) or a dictionary to Counter. You can also use a Counter object much like a dictionary, with key/value assignment such as counter[key] = value.

A common usage for Counter is checking data for consistency prior to using it, so let's do just that.

In [64]:
# penguins list
penguins = [{'Species': 'Gentoo', 'Flipper Length (mm)': 230.0, 'Body Mass (g)': 5500.0, 'Sex': 'MALE'}, {'Species': 'Chinstrap', 'Flipper Length (mm)': 201.0, 'Body Mass (g)': 4300.0, 'Sex': 'MALE'}, {'Species': 'Adlie', 'Flipper Length (mm)': 180.0, 'Body Mass (g)': 3800.0, 'Sex': 'MALE'}, {'Species': 'Gentoo', 'Flipper Length (mm)': 229.0, 'Body Mass (g)': 5800.0, 'Sex': 'MALE'}, {'Species': 'Chinstrap', 'Flipper Length (mm)': 210.0, 'Body Mass (g)': 4100.0, 'Sex': 'MALE'}, {'Species': 'Adlie', 'Flipper Length (mm)': 200.0, 'Body Mass (g)': 3975.0, 'Sex': 'MALE'}, {'Species': 'Gentoo', 'Flipper Length (mm)': 225.0, 'Body Mass (g)': 5400.0, 'Sex': 'MALE'}, {'Species': 'Chinstrap', 'Flipper Length (mm)': 210.0, 'Body Mass (g)': 4800.0, 'Sex': 'MALE'}, {'Species': 'Chinstrap', 'Flipper Length (mm)': 193.0, 'Body Mass (g)': 3800.0, 'Sex': 'FEMALE'}, {'Species': 'Adlie', 'Flipper Length (mm)': 176.0, 'Body Mass (g)': 3450.0, 'Sex': 'FEMALE'}, {'Species': 'Chinstrap', 'Flipper Length (mm)': 210.0, 'Body Mass (g)': 3950.0, 'Sex': 'MALE'}, {'Species': 'Gentoo', 'Flipper Length (mm)': 219.0, 'Body Mass (g)': 5250.0, 'Sex': 'MALE'}, {'Species': 'Gentoo', 'Flipper Length (mm)': 210.0, 'Body Mass (g)': 4300.0, 'Sex': 'FEMALE'}, {'Species': 'Gentoo', 'Flipper Length (mm)': 216.0, 'Body Mass (g)': 4925.0, 'Sex': 'MALE'}, {'Species': 'Adlie', 'Flipper Length (mm)': 187.0, 'Body Mass (g)': 3550.0, 'Sex': 'FEMALE'}, {'Species': 'Adlie', 'Flipper Length (mm)': 192.0, 'Body Mass (g)': 3950.0, 'Sex': 'MALE'}, {'Species': 'Chinstrap', 'Flipper Length (mm)': 193.0, 'Body Mass (g)': 3800.0, 'Sex': 'MALE'}, {'Species': 'Chinstrap', 'Flipper Length (mm)': 201.0, 'Body Mass (g)': 4050.0, 'Sex': 'MALE'}, {'Species': 'Adlie', 'Flipper Length (mm)': 190.0, 'Body Mass (g)': 3650.0, 'Sex': 'MALE'}, {'Species': 'Adlie', 'Flipper Length (mm)': 181.0, 'Body Mass (g)': 3175.0, 'Sex': 'FEMALE'}]
print(type(penguins))

# Import the Counter object
from collections import Counter

# Create a Counter of the penguins' sex using a generator expression
penguins_sex_counts = Counter(penguin['Sex'] for penguin in penguins 
                              if penguin['Sex'] in ('FEMALE', 'MALE'))
# Print the penguins_sex_counts
print(penguins_sex_counts)

<class 'list'>
Counter({'MALE': 15, 'FEMALE': 5})
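
The dictionary-like behaviour of Counter has a few differences from a plain dict that are worth knowing. A minimal sketch on an illustrative list of values:

```python
from collections import Counter

# Illustrative values, including a '.' placeholder like the one seen in the raw data
sex_counts = Counter(['MALE', 'FEMALE', 'MALE', '.'])

# A missing key returns 0 instead of raising KeyError, and is not added to the Counter
unknown = sex_counts['UNKNOWN']
unknown_was_added = 'UNKNOWN' in sex_counts

# update() adds to the existing counts rather than replacing them
sex_counts.update(['FEMALE', 'FEMALE'])

# Key/value assignment works just as it does on a dictionary
sex_counts['.'] = 0

print(sex_counts, unknown, unknown_was_added)
```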


### Finding most common elements
Another powerful usage of Counter is finding the most common elements in a list. This can be done with the .most_common() method.

In [65]:
# Import the Counter object
from collections import Counter

# Create a Counter called penguins_species_counts; a generator expression passes the Species of each penguin to the Counter.
penguins_species_counts = Counter(penguin['Species'] for penguin in penguins)

# Find the 3 most common species counts
print(penguins_species_counts.most_common(3))

[('Chinstrap', 7), ('Adlie', 7), ('Gentoo', 6)]
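
The argument to .most_common() is optional. The sketch below, on an illustrative list, shows the method with and without it.

```python
from collections import Counter

# Illustrative sightings, not taken from the penguins list above
counts = Counter(['Gentoo', 'Gentoo', 'Gentoo', 'Chinstrap', 'Chinstrap', 'King'])

# With an argument, only the n most common (element, count) pairs are returned
top_two = counts.most_common(2)

# With no argument, every element is returned, ordered from most to least common
everything = counts.most_common()

# The last pair of the full ordering is therefore the least common element
least_common = everything[-1]

print(top_two, everything, least_common)
```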


### Creating dictionaries of an unknown structure
Occasionally, you'll need a structure to hold nested data, and you may not be certain that the keys will all actually exist. This can be an issue if you're trying to append items to a list for that key. You might remember the NYC data that we explored in the video. In order to solve the problem with a regular dictionary, you'll need to test that the key exists in the dictionary, and if not, add it with an empty list.

You'll be working with a list of entries that contains the species, sex, and body mass of the female penguins in our study. You're going to solve this same type of problem with a much easier solution in the next exercise.

In [66]:
# weight_log list
weight_log = [('Chinstrap', 'FEMALE', 3800.0), 
              ('Adlie', 'FEMALE', 3450.0), 
              ('Gentoo', 'FEMALE', 4300.0), 
              ('Adlie', 'FEMALE', 3550.0), 
              ('Adlie', 'FEMALE', 3175.0)]

In [67]:
# Create an empty dictionary: female_penguin_weights
female_penguin_weights = {}

# Iterate over the weight_log entries
for species, sex, body_mass in weight_log:
    # Check to see if species is already in the dictionary
    if species not in female_penguin_weights:
        # Create an empty list for any missing species
        female_penguin_weights[species] = []
    # Append the sex and body_mass as a list to the species key's list
    female_penguin_weights[species].append([sex, body_mass])
    
# Print the weights for 'Adlie'
print(female_penguin_weights['Adlie'])

[['FEMALE', 3450.0], ['FEMALE', 3550.0], ['FEMALE', 3175.0]]
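
The check-then-create pattern above can also be written with the built-in dict.setdefault() method, which returns the existing value for a key or stores and returns a default. This is a sketch of that alternative on an illustrative sample_log, not the approach the exercise itself uses.

```python
# Illustrative entries of (species, body_mass)
sample_log = [('Chinstrap', 3800.0), ('Gentoo', 4300.0), ('Chinstrap', 3650.0)]

weights_by_species = {}
for species, body_mass in sample_log:
    # setdefault() stores an empty list the first time a species is seen,
    # then returns the list stored for that species so we can append to it
    weights_by_species.setdefault(species, []).append(body_mass)

print(weights_by_species)
```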


### Safely appending to a key's value list
Often when working with dictionaries, you will need to initialize a data type before you can use it. A prime example of this is a list, which has to be initialized on each key before you can append to that list.

A defaultdict allows you to define what each uninitialized key will contain. When establishing a defaultdict, you pass it a callable that produces the default value, typically a type such as list, tuple, set, int, str, or dict, or any other callable that takes no arguments.

In [68]:
# Import defaultdict
from collections import defaultdict

# Create a defaultdict with a default type of list: male_penguin_weights
male_penguin_weights = defaultdict(list)

# Iterate over the weight_log entries
for species, sex, body_mass in weight_log:
    # Use the species as the key, and append the body_mass to it
    male_penguin_weights[species].append(body_mass)
    
print(male_penguin_weights)
print('--'*50)
# Print dictionary items
print(male_penguin_weights.items())
print('--'*50)
# Print the first 2 items of the male_penguin_weights dictionary
# Use the .items() method on male_penguin_weights to access its items, and then convert it into a list using list(). Be sure to use list slicing to select only the first 2 items.
print(list(male_penguin_weights.items())[:2])

defaultdict(<class 'list'>, {'Chinstrap': [3800.0], 'Adlie': [3450.0, 3550.0, 3175.0], 'Gentoo': [4300.0]})
----------------------------------------------------------------------------------------------------
dict_items([('Chinstrap', [3800.0]), ('Adlie', [3450.0, 3550.0, 3175.0]), ('Gentoo', [4300.0])])
----------------------------------------------------------------------------------------------------
[('Chinstrap', [3800.0]), ('Adlie', [3450.0, 3550.0, 3175.0])]
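
Two details of defaultdict are easy to miss: other default types such as int work the same way, and merely reading a missing key inserts it. A minimal sketch with illustrative data:

```python
from collections import defaultdict

# Illustrative sightings
sightings = ['Gentoo', 'Chinstrap', 'Gentoo']

# int() returns 0, so every new key starts counting from zero
species_counts = defaultdict(int)
for species in sightings:
    species_counts[species] += 1

# Reading a missing key with [] inserts it with the default value
size_before = len(species_counts)
king_count = species_counts['King']
size_after = len(species_counts)

# .get() does not trigger the default, so nothing is inserted
emperor_count = species_counts.get('Emperor')

print(dict(species_counts), size_before, size_after, emperor_count)
```

If silently created keys would be a problem, checking with in or using .get() avoids the insertion.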


### Creating namedtuples for storing data
Often when working with data, you will use a dictionary just so that key names make the code easier to read and the data easier to access. Python has another container called a namedtuple, which is a tuple with a name for each position. You create one by passing a name for the tuple type and a list of field names.

For example, Cookie = namedtuple("Cookie", ['name', 'quantity']) creates a new tuple type, and you can create instances of it using Cookie('chocolate chip', 1). You can then access the name using the .name attribute and the quantity using the .quantity attribute.

In [69]:
# Import namedtuple from collections
from collections import namedtuple

# create weight_log
weight_log = [('Gentoo', 'MALE', 5500.0), ('Chinstrap', 'MALE', 4300.0), ('Adlie', 'MALE', 3800.0), ('Gentoo', 'MALE', 5800.0), ('Chinstrap', 'MALE', 4100.0), ('Adlie', 'MALE', 3975.0), ('Gentoo', 'MALE', 5400.0), ('Chinstrap', 'MALE', 4800.0), ('Chinstrap', 'FEMALE', 3800.0), ('Adlie', 'FEMALE', 3450.0), ('Chinstrap', 'MALE', 3950.0), ('Gentoo', 'MALE', 5250.0), ('Gentoo', 'FEMALE', 4300.0), ('Gentoo', 'MALE', 4925.0), ('Adlie', 'FEMALE', 3550.0), ('Adlie', 'MALE', 3950.0), ('Chinstrap', 'MALE', 3800.0), ('Chinstrap', 'MALE', 4050.0), ('Adlie', 'MALE', 3650.0), ('Adlie', 'FEMALE', 3175.0)]

# Create a namedtuple called SpeciesDetails with a type name of SpeciesDetails and fields of 'species', 'sex', and 'body_mass'.
SpeciesDetails = namedtuple('SpeciesDetails', ['species', 'sex', 'body_mass'])

# Create the empty list: labeled_entries
labeled_entries = []

# Iterate over the weight_log list, unpacking it into species, sex, and body_mass, and create a new SpeciesDetails namedtuple instance for each entry and append it to labeled_entries.
for species, sex, body_mass in weight_log:
    # Append a new SpeciesDetails namedtuple instance for each entry to labeled_entries
    labeled_entries.append(SpeciesDetails(species, sex, body_mass))
    
print(labeled_entries[:5])

[SpeciesDetails(species='Gentoo', sex='MALE', body_mass=5500.0), SpeciesDetails(species='Chinstrap', sex='MALE', body_mass=4300.0), SpeciesDetails(species='Adlie', sex='MALE', body_mass=3800.0), SpeciesDetails(species='Gentoo', sex='MALE', body_mass=5800.0), SpeciesDetails(species='Chinstrap', sex='MALE', body_mass=4100.0)]
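
The Cookie example described earlier in this section can be run as a short sketch. It also shows that a namedtuple still behaves like an ordinary, immutable tuple.

```python
from collections import namedtuple

# Create the tuple type, then an instance of it
Cookie = namedtuple('Cookie', ['name', 'quantity'])
cookie = Cookie('chocolate chip', 1)

# Fields are available by attribute name and by position
by_name = cookie.name
by_index = cookie[0]

# A namedtuple unpacks like a regular tuple
name, quantity = cookie

# _asdict() converts the instance to a mapping of field names to values
as_dict = cookie._asdict()

# namedtuples are immutable; _replace() returns a modified copy instead
try:
    cookie.quantity = 2
    assignment_allowed = True
except AttributeError:
    assignment_allowed = False
more_cookies = cookie._replace(quantity=2)

print(by_name, by_index, quantity, as_dict, assignment_allowed, more_cookies)
```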


### Leveraging attributes on namedtuples
Once you have a namedtuple, you can write more expressive code that is easier to understand. Remember, you can access the elements in the tuple by their name as an attribute. For example, you can access the species of the namedtuples in the previous exercise using the .species attribute.

In [70]:
# Iterate over the first twenty entries in labeled_entries
for entry in labeled_entries[:20]:
    # If the entry's species is Chinstrap
    if entry.species == 'Chinstrap':
        # Print each entry's sex and body_mass separated by a colon
        print(f'{entry.sex}:{entry.body_mass}')

MALE:4300.0
MALE:4100.0
MALE:4800.0
FEMALE:3800.0
MALE:3950.0
MALE:3800.0
MALE:4050.0


### Creating a dataclass
Dataclasses can provide even richer ways of storing and working with data. Previously we used a namedtuple on weight log entries to make a nice, easy-to-use data structure. In this code, we're going to use a dataclass to do the same thing, but add a custom property that returns the ratio of body mass to flipper length. Dataclasses start with a collection of fields and their types. Then you define any properties, which are functions on the dataclass that operate on itself to return additional information about the data. For example, a person dataclass might have a property that calculates someone's current age based on their birthday and the current date.

In [71]:
# Import dataclass
from dataclasses import dataclass

@dataclass
class WeightEntry:
    # Add the species (string), flipper_length (int), body_mass (int), and sex (string) fields to the dataclass.
    species: str
    flipper_length: int
    body_mass: int
    sex: str
        
    # Define a property that returns the body_mass / flipper_length
    @property
    def mass_to_flipper_length_ratio(self):
        return self.body_mass / self.flipper_length
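
The person example mentioned in the description could look something like the sketch below. The Person class, its fields, and the age_on() helper are hypothetical names introduced here for illustration; the helper exists so the calculation can be checked against a fixed date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    # Fields and their types
    name: str
    birthday: date

    def age_on(self, today: date) -> int:
        # Subtract one year if this year's birthday has not happened yet
        had_birthday = (today.month, today.day) >= (self.birthday.month, self.birthday.day)
        return today.year - self.birthday.year - (0 if had_birthday else 1)

    @property
    def age(self) -> int:
        # Property that derives the current age from the birthday and today's date
        return self.age_on(date.today())


person = Person('Ada', date(1990, 6, 15))
age_day_before = person.age_on(date(2020, 6, 14))
age_on_birthday = person.age_on(date(2020, 6, 15))
print(age_day_before, age_on_birthday)
```

A property is read like an attribute (person.age, with no parentheses), which is why it suits values that are derived from the stored fields.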

### Using dataclasses
Let's put the WeightEntry dataclass we created in the prior exercise to use. We'll create an instance of WeightEntry for each entry in the weight_log and then use the mass_to_flipper_length_ratio property we added to perform the calculation. Here is a reminder of our WeightEntry dataclass.

In [72]:
@dataclass
class WeightEntry:
    # Define the fields on the class
    species: str
    flipper_length: int
    body_mass: int
    sex: str

    @property
    def mass_to_flipper_length_ratio(self):
        return self.body_mass / self.flipper_length

In [74]:
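# The weight_log defined earlier has only three fields per entry (species, sex, body_mass),
# so unpacking it into four names raises a ValueError. Rebuild it with four fields
# from the penguins list created in the Counter exercise.
weight_log = [(penguin['Species'], penguin['Flipper Length (mm)'], penguin['Body Mass (g)'], penguin['Sex'])
              for penguin in penguins]

# Create the empty list: labeled_entries
labeled_entries = []

# Iterate over the weight_log entries
for species, flipper_length, body_mass, sex in weight_log:
    # Append a new WeightEntry instance to labeled_entries
    labeled_entries.append(WeightEntry(species, flipper_length, body_mass, sex))
    
# Print a list of the first 5 mass_to_flipper_length_ratio values
print([entry.mass_to_flipper_length_ratio for entry in labeled_entries[:5]])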