## 8.5 Bags

A **bag** is an unordered collection of items, possibly with duplicates.
The notation is as for sets, e.g. {2, 3, 2, 5} is a bag of size 4
with two copies of 2.

<div class="alert alert-info">
<strong>Info:</strong> Bags are also called multisets.
</div>

### 8.5.1 The bag ADT

The bag ADT has the same operations as sets, but most work slightly differently:

- the add operation always adds one more copy, even if the item is already in the bag
- the remove operation removes a single copy of the item
- the intersection of two bags has the smaller number of copies of each item
- the union of two bags has the larger number of copies of each item
- the difference of two bags has the difference of the number of copies
  of each item, or zero copies if the difference is negative.

Bags have one extra operation over sets:
the **multiplicity** of an item is the number of times it occurs in a bag.
For example, in {'y', 'e', 's', 's', 's', '!'}, 'y' has multiplicity 1,
's' has multiplicity 3 and 'Y' has multiplicity 0.
An item is a member of a bag if and only if its multiplicity isn't zero.
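
As a quick illustration of multiplicity, here is a minimal sketch that assumes Python's standard `collections.Counter` class is an acceptable stand-in for a bag. The variable names are only for illustration.

```python
from collections import Counter

# The bag {'y', 'e', 's', 's', 's', '!'} from the example above.
bag = Counter(['y', 'e', 's', 's', 's', '!'])

print(bag['y'])  # multiplicity of 'y'
print(bag['s'])  # multiplicity of 's'
print(bag['Y'])  # an item that isn't in the bag has multiplicity 0

# An item is a member if and only if its multiplicity isn't zero.
is_member = bag['Y'] != 0
print(is_member)
```

Looking up an item that isn't in a `Counter` returns zero rather than raising an error, which matches the definition of multiplicity given above.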

In bag operations, each copy is matched separately. For example,
two bags are equal if each copy in one of them matches a copy in the other,
i.e. if each item occurs with the same multiplicity in both bags.
A bag _A_ is included in bag _B_, written _A_ $\subseteq$ _B_,
if each copy in _A_ matches a copy in _B_ but
not necessarily the other way around, i.e.
the multiplicity of each item is the same or higher in _B_.
As for sets, _A_ $\subset$ _B_ if _A_ $\subseteq$ _B_ and _A_ ≠ _B_.

<div class="alert alert-info">
<strong>Info:</strong> The terms 'subbag' and 'submultiset' are not often used,
and even less so with the qualifier 'proper'.
</div>

Consider the bags _A_ = {1, 2, 3, 3} and _B_ = {1, 2, 2, 2, 3}.
We have _A_ ≠ _B_, with neither _A_ $\subset$ _B_ nor _B_ $\subset$ _A_,
because _A_ has more 3s but _B_ has more 2s.

We have _A_ $\cap$ _B_ = {1, 2, 3}
because although 2 and 3 occur multiple times in one of the bags, only one copy
matches the other bag and is in the intersection.

We have _A_ $\cup$ _B_ = {1, 2, 2, 2, 3, 3} because copies that match
are added only once to the union. The union doesn't have four 2s, just three:
one copy that is in both bags and two additional copies from the second bag.

Likewise, {1, 2, 3, 3} − {1, 2, 2, 2, 3} = {3} because each copy in the first bag matches a copy in the second bag, except the second 3.
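
The three results above can be checked with a short sketch. It assumes Python's standard `collections.Counter` class as a ready-made bag: its `&`, `|` and `-` operators take the smaller number of copies, the larger number of copies, and the non-negative difference of copies, respectively.

```python
from collections import Counter

a = Counter([1, 2, 3, 3])      # the bag A
b = Counter([1, 2, 2, 2, 3])   # the bag B

intersection = a & b   # smaller multiplicity of each item
union = a | b          # larger multiplicity of each item
difference = a - b     # multiplicities subtracted; negative results are dropped

# elements() lists each item as many times as its multiplicity.
print(sorted(intersection.elements()))  # [1, 2, 3]
print(sorted(union.elements()))         # [1, 2, 2, 2, 3, 3]
print(sorted(difference.elements()))    # [3]
print(a == b)                           # False
```

The lists are sorted only to make the output easy to compare with the text: bags are unordered.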

#### Exercise 8.5.1

What is {1, 2, 2, 2, 3} − {1, 2, 3, 3}?

_Write your answer here._

[Answer](../32_Answers/Answers_08_5_01.ipynb)

#### Exercise 8.5.2

How would you explain what a set is to someone who only knows what a bag is?

_Write your answer here._

[Answer](../32_Answers/Answers_08_5_02.ipynb)

### 8.5.2 Implementing bags

The simplest – but not most efficient – way of implementing bags is to put all
members, i.e. including duplicates, in a sequence.
Adding an item takes constant time by appending it to the sequence, but
removing an item or computing its multiplicity
takes linear time in the size of the bag.
Computing the intersection, union or difference is even more complicated.
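
As a rough sketch of the sequence approach, assume the bag is stored in a plain Python list. The variable names are only illustrative.

```python
# The bag {2, 3, 2, 5} stored as a list, duplicates included.
bag = [2, 3, 2, 5]

# Add: append one more copy (amortised constant time).
bag.append(3)

# Multiplicity: count() scans the whole list (linear time).
twos = bag.count(2)
print(twos)  # 2

# Remove: remove() searches for the first copy and deletes it (linear time).
bag.remove(2)
print(bag.count(2))  # 1
print(len(bag))      # 4
```

Note that `remove` raises a `ValueError` if the item isn't in the list, so a bag built this way would have to decide how to handle the removal of a non-member.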

A better approach is to see a bag as a map of unique items (the map's keys)
to their multiplicities (the map's values),
so any map data type can form the basis for a bag data type.
For example, a bag where items are integers from a fixed range
can be implemented with a lookup table.
A more flexible bag can be implemented with a hash table.
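
To make the two representations concrete, here is a minimal sketch of the bag {2, 3, 2, 5}, written out by hand. It assumes the lookup table is a Python list indexed by the integers 0 to 5 and the hash table is a Python dictionary. The variable names are hypothetical.

```python
# Lookup table: index i holds the multiplicity of integer i (range 0 to 5).
lookup_table = [0, 0, 2, 1, 0, 1]

# Hash table: only items that occur in the bag are keys.
hash_table = {2: 2, 3: 1, 5: 1}

# Multiplicity is a single lookup in both representations.
print(lookup_table[2])       # 2
print(hash_table.get(2, 0))  # 2

# An absent item has multiplicity 0; get() supplies the default for the dictionary.
print(lookup_table[4])       # 0
print(hash_table.get(4, 0))  # 0
```

The lookup table stores a zero for every integer in the range that is absent, whereas the dictionary stores nothing for absent items.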

#### Exercise 8.5.3

The following exercises ask you to explain how to implement some bag operations
using maps. List at least five bags you would use to test bag operations.
Include edge and non-edge cases.

_Write your answer here._

[Hint](../31_Hints/Hints_08_5_03.ipynb)
[Answer](../32_Answers/Answers_08_5_03.ipynb)

#### Exercise 8.5.4

Describe an algorithm that, given a bag as
a map of items to their multiplicities, computes the size of the bag.

_Write your answer here._

[Answer](../32_Answers/Answers_08_5_04.ipynb)

#### Exercise 8.5.5

Describe an algorithm that determines if _left_ $\subseteq$ _right_, where
_left_ and _right_ are maps of items to multiplicities.

_Write your answer here._

[Hint](../31_Hints/Hints_08_5_05.ipynb)
[Answer](../32_Answers/Answers_08_5_05.ipynb)

#### Exercise 8.5.6

Describe an algorithm that computes _left_ $\cap$ _right_, where
_left_ and _right_ are maps of items to multiplicities.

_Write your answer here._

[Hint](../31_Hints/Hints_08_5_06.ipynb)
[Answer](../32_Answers/Answers_08_5_06.ipynb)

#### Exercise 8.5.7 (optional)

Write and test a class that implements the bag ADT using a Python dictionary.
Although this exercise is optional, it's good practice for you to write
classes from scratch. I recommend you start by implementing the
add, remove, size, inclusion and intersection operations.
Only implement further operations if you have the time.

If you have a study buddy, you may divide up the work:
you implement operation X and write the tests for operation Y,
while your buddy writes the tests for operation X and implements operation Y.
You may wish to use a platform like [repl.it](https://repl.it)
to work together on the same code file.
I encourage you to share your class and tests in the forum and
constructively critique your peers' solutions.

I suggest you write the `__init__` method as follows to ease the
creation of bags for tests.
```py
def __init__(self, items: object):
    """Create a new bag with the given items.

    Preconditions: items is an iterable collection of hashable objects
    """
    self.members = dict()
    for item in items:
        self.add(item)
```
You will have to implement the `add` method.
Assuming your class is called `Bag`, here are some ways of creating a bag:
`Bag([])` (empty bag), `Bag([1,2,3,1])` (bag of integers),
`Bag('1231')` (bag of characters), `Bag({1,2,3})` (bag that is a set).
