In [15]:
# dependencies
import numpy as np
import pandas as pd

## Features
When should we choose a `set` over a `list` or a `dict`? What functionality do sets give us that the others don't, and how is it useful?

In the simplest case, let's say we are iterating through some data and we want to capture all unique values we find that meet a particular condition.
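
As a minimal sketch of that scenario, assume a hypothetical list of `readings` and a made-up condition (values greater than 10). Neither comes from a real dataset; they only illustrate the two ways of collecting unique matches:

```python
# Hypothetical data and condition, for illustration only
readings = [4, 12, 15, 12, 7, 15, 30]

# With a list: collect every match, then deduplicate afterwards
matches_list = []
for value in readings:
    if value > 10:
        matches_list.append(value)
# dict.fromkeys drops repeats while keeping first-seen order
unique_from_list = list(dict.fromkeys(matches_list))

# With a set: duplicates are discarded as they are added
matches_set = set()
for value in readings:
    if value > 10:
        matches_set.add(value)

print(unique_from_list)
print(matches_set)
```

The list version needs an extra pass to remove repeats; the set version never stores them in the first place, at the cost of not preserving insertion order.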
  
### [time complexity](https://wiki.python.org/moin/TimeComplexity)
If we use a `list`, we get the lowest possible overhead for appending a new item, `O(1)` in both the average and the amortized worst case, but we have to deduplicate the list contents after building the collection.

If we use a `dict`, adding a new key is `O(1)` on average, and duplicate keys are handled for us because keys are unique. The worst case is `O(n)`, which occurs when many keys collide in the underlying hash table.

If we use a `set`, we get the same time complexity as a `dict` for adding a new item, but we save a little memory by storing only keys instead of key-value pairs. In addition, `set` objects come with set-theoretic operations that can be used to examine how two collections relate to each other:
- `l.isdisjoint(r)`
- `l.issubset(r)`
- `l.issuperset(r)`
- `l.union(r)`
- `l.intersection(r)`
- `l.difference(r)`
- `l.symmetric_difference(r)`
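
A short sketch of these methods on two small, made-up sets (the names `l` and `r` follow the list above; the contents are hypothetical):

```python
# Two hypothetical sets of field names
l = {'name', 'address', 'phone'}
r = {'phone', 'email'}

no_overlap = l.isdisjoint(r)               # False: both contain 'phone'
l_inside_r = l.issubset(r)                 # False: 'name' is not in r
l_covers_name = l.issuperset({'name'})     # True
everything = l.union(r)                    # items in either set
shared = l.intersection(r)                 # items in both sets
only_in_l = l.difference(r)                # items in l but not in r
in_exactly_one = l.symmetric_difference(r) # items in one set but not both

print(shared, only_in_l, in_exactly_one)
```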

In [28]:
time = {
    'list': ['O(1)', 'O(1)'],
    'dict': ['O(1)', 'O(n)'],
    'set': ['O(1)', 'O(n)']
}
time_df = pd.DataFrame.from_dict(time, orient='index', columns=['average case', 'worst case'])
time_df

| | average case | worst case |
|---|---|---|
| list | O(1) | O(1) |
| dict | O(1) | O(n) |
| set | O(1) | O(n) |


## Problem with ambiguous declaration
- Cause: Partial assignment at initialization

Because you don't have to name a type to create an object in Python, and sets and dicts both use curly brackets, `{}`, Python decides which type a literal is from what is written between the brackets.

- Empty initialization: `info = {}`
    - Always a `dict`; an empty set has to be written as `set()`

- Partial initialization: `info = {'name', 'address', 'phone'}`
    - A `set`, since it was given elements but no `key: value` pairs

- Complete initialization: `info = {'name': None, 'address': None, 'phone': np.nan}`
    - A `dict`, since every entry is a `key: value` pair
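
The three cases can be checked directly with `type`. This is a small sketch; the variable names are only for illustration:

```python
empty = {}                                   # empty braces
keys_only = {'name', 'address', 'phone'}     # elements only
pairs = {'name': None, 'address': None}      # key: value pairs
empty_set = set()                            # the way to write an empty set

for label, obj in [('empty', empty), ('keys_only', keys_only),
                   ('pairs', pairs), ('empty_set', empty_set)]:
    print(label, type(obj).__name__)
```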

Let's say we intend to collect some data that meets some criteria, and we already know which keys we want to save. So we write the keys inside curly brackets, expecting a dictionary, and then proceed to fill in the values.

In [4]:
info = {'name', 'address', 'phone'}
info['name'] = 'Kenny'

TypeError: 'set' object does not support item assignment

#### Solution
Okay, so we can't just throw keys into curly brackets and expect Python to know we want a dictionary. But we still don't have the values at that stage, so we pick an appropriate placeholder to denote a value that's missing _for now_.

In [12]:
info = {'name':None, 'address':None, 'phone':np.nan}
info['name'] = 'Kenny'
info

{'name': 'Kenny', 'address': None, 'phone': nan}
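
When the keys are already known, one alternative (not the approach used in the cell above) is the built-in `dict.fromkeys`, which maps every key to the same placeholder, `None` by default. A minimal sketch, with an illustrative `keys` list:

```python
# Keys we know up front (illustrative)
keys = ['name', 'address', 'phone']

# Every key starts out mapped to None
info = dict.fromkeys(keys)
info['name'] = 'Kenny'

print(info)
```

Note that `dict.fromkeys` assigns the same object to every key, so it is best kept to immutable placeholders such as `None`.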

**Comments about solution**
You might wonder if a single placeholder is enough to signal to Python that you want a dictionary, and the answer is yes and no.
- Yes: If you only insert one key, then you only need one placeholder.
- No: For every key you add, you need a placeholder value, or else you get a `SyntaxError`.

In [8]:
info = {'name':None, 'address', 'phone'}

SyntaxError: invalid syntax (<ipython-input-8-cf8bf1468a8c>, line 1)
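
Because this error is raised while the source is being parsed, before any statement runs, it can't be caught by a `try` block in the same cell. One way to reproduce it safely is to hand the source text to the built-in `compile`. This is a sketch; the names `mixed` and `outcome` are only for illustration:

```python
# Source text that mixes a key: value pair with bare keys
mixed = "info = {'name': None, 'address', 'phone'}"

try:
    compile(mixed, '<example>', 'exec')
    outcome = 'compiled'
except SyntaxError:
    outcome = 'SyntaxError'

print(outcome)
```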

**More comments**
If you're wondering why we used two different placeholder values, check out `language-tips/python/missingness.md` in the training-docs repo, or read the "Handling Missing Data" topic in _Python for Data Science Handbook_.