# Sets

Here's a quick comparison of Python's four built-in container data types:

| Feature          | List                                  | Dictionary                           | Set                                | Tuple                             |
|------------------|---------------------------------------|--------------------------------------|------------------------------------|-----------------------------------|
| Syntax           | `[item1, item2, ...]`                 | `{'key1': value1, 'key2': value2}`   | `{item1, item2, ...}`              | `(item1, item2, ...)` or `item,`  |
| Order            | Ordered                               | Insertion-ordered (Python 3.7+)      | Unordered                          | Ordered                           |
| Indexing         | Yes (by index)                        | Yes (by key)                         | No                                 | Yes (by index)                    |
| Duplicate Values | Allowed                               | Values can be duplicated, keys cannot| Not allowed                        | Allowed                           |
| Mutability       | Mutable                               | Mutable                              | Mutable                            | Immutable                         |
| Usage            | For a collection of ordered items     | For key-value pairs                  | For unique items                   | For fixed data                    |

## Notes

* Used to store multiple unique items in a single variable
* A set is:
  * Unordered
  * Unindexed
  * Mutable (you can add and remove items)
* Set *items* themselves are unchangeable (they must be hashable), but you can still remove items and add new items
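
The point about set items being unchangeable can be illustrated with a minimal sketch. The `skills` set and its values below are hypothetical and not part of the original notebook; the idea is only that immutable values such as strings and tuples are accepted, while a mutable list is rejected.

```python
# Hypothetical example values, not from the original notebook
skills = {'python', 'sql'}

# Immutable (hashable) values such as strings and tuples can be set items
skills.add(('power', 'bi'))

# A list is mutable and therefore unhashable, so it cannot be a set item
try:
    skills.add(['excel', 'word'])
    list_added = True
except TypeError:
    list_added = False

print(list_added)   # False
print(len(skills))  # 3
```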

## Importance

Sets are used to remove duplicates and to perform set operations (such as union, intersection, and difference), which helps with data cleaning and preparation.
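
As a rough sketch of the set operations mentioned above, the example below compares two skill sets. The names `analyst_skills` and `scientist_skills` and their contents are made up for illustration; `sorted()` is used only to make the printed output predictable, since sets have no order.

```python
# Hypothetical skill sets for two job postings (illustrative data)
analyst_skills = {'sql', 'excel', 'tableau', 'python'}
scientist_skills = {'python', 'sql', 'statistics'}

shared = analyst_skills & scientist_skills        # intersection: in both sets
combined = analyst_skills | scientist_skills      # union: in either set
analyst_only = analyst_skills - scientist_skills  # difference: only in the first set

print(sorted(shared))        # ['python', 'sql']
print(sorted(combined))      # ['excel', 'python', 'sql', 'statistics', 'tableau']
print(sorted(analyst_only))  # ['excel', 'tableau']
```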

## Examples

First, let's create a set of data science skills.

In [1]:
# Define a set of data science skills
job_skills = {'tableau', 'sql', 'python', 'statistics'}

job_skills

{'python', 'sql', 'statistics', 'tableau'}

### Unordered and No Index

Sets are unordered and have no index, so they don't behave exactly like lists.

As a result, we can't use an index to access items.

In [2]:
job_skills[1]

TypeError: 'set' object is not subscriptable
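
Since indexing is not available, a common alternative is to test for membership with `in`, or to convert the set to a sorted list when a position is really needed. The following is a small sketch of that idea; the variable names `has_sql`, `has_r`, and `first_alphabetically` are introduced here for illustration.

```python
job_skills = {'tableau', 'sql', 'python', 'statistics'}

# Membership tests work on sets without any index
has_sql = 'sql' in job_skills
has_r = 'r' in job_skills

# sorted() returns a list, which does support indexing
first_alphabetically = sorted(job_skills)[0]

print(has_sql)               # True
print(has_r)                 # False
print(first_alphabetically)  # python
```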

### Add()

Add an item using `add()`. Now we're going to add a skill, 'looker', to the set.

In [3]:
# Adding a skill
job_skills.add('looker')

job_skills

{'looker', 'python', 'sql', 'statistics', 'tableau'}

What if we add a skill that's already in the set, like `sql`? Sets don't store duplicates, so the skill won't be added again. The set is exactly the same as before.

In [4]:
# Adding a skill that already exists doesn't duplicate
job_skills.add('sql')

job_skills

{'looker', 'python', 'sql', 'statistics', 'tableau'}
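
One way to confirm that nothing changed is to compare the set's size before and after the call. The sketch below also uses `update()`, a built-in set method not covered in the original walkthrough, to add several items at once; the extra skills `'excel'` and `'r'` are illustrative only.

```python
job_skills = {'looker', 'python', 'sql', 'statistics', 'tableau'}

# Adding an existing item leaves the size unchanged
size_before = len(job_skills)
job_skills.add('sql')
size_after = len(job_skills)
print(size_before, size_after)  # 5 5

# update() adds every item from an iterable; duplicates are ignored
job_skills.update(['excel', 'sql', 'r'])
print(len(job_skills))  # 7
```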

### Remove()

Remove an item from the set using `remove()`. Let's remove the 'statistics' item from this set.

🪲 **Debugging**

**This is an intentional mistake**

This is used to demonstrate debugging.

Error: Tried to remove a skill that does not exist in this set.

```python
job_skills.remove('r')
```

Steps to Debug:

1. Look at the actual error. Can you tell what the problem is?
2. If not, then look it up:
   1. Use a chatbot like ChatGPT or Claude
   2. Look it up using Google

In [5]:
# Removing a skill
job_skills.remove('r')

job_skills

KeyError: 'r'

This is the correct code ✅.

In [6]:
# Removing a skill
job_skills.remove('statistics')

job_skills

{'looker', 'python', 'sql', 'tableau'}
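
If you are not sure whether an item exists, one option is `discard()`, a built-in set method that removes the item when present and does nothing otherwise. Another is to check membership before calling `remove()`. This is a minimal sketch of both approaches, not the only way to handle the `KeyError` shown earlier.

```python
job_skills = {'looker', 'python', 'sql', 'tableau'}

# discard() does not raise KeyError when the item is missing
job_skills.discard('r')
print(len(job_skills))  # 4

# Alternatively, check membership before calling remove()
if 'looker' in job_skills:
    job_skills.remove('looker')

print(sorted(job_skills))  # ['python', 'sql', 'tableau']
```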

### Set()

Sets are great for removing repeated values. Let's create a list of skills called `skill_list` that contains repeated skills. If we call `set()` on this new `skill_list`, it returns the list's items as a set, with the repeated values removed.

In [7]:
# Make a list of skills with some repeated values
skill_list = ['python', 'sql', 'statistics', 'tableau', 'python', 'sql', 'statistics', 'tableau']

set(skill_list)

{'python', 'sql', 'statistics', 'tableau'}

### List()

We can even convert this set back to a list. The duplicates stay removed, so the list contains only the unique skills. Note that the order of the resulting list is not guaranteed.

In [8]:
skill_list = list(set(skill_list))

skill_list

['python', 'tableau', 'sql', 'statistics']
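
Because a set has no order, `list(set(...))` may return the skills in a different order than the original list. If the original order matters, one common alternative (swapped in here, not part of the original notebook) is `dict.fromkeys()`, which relies on dictionaries keeping insertion order in Python 3.7+. Sorting the set is another option when alphabetical order is acceptable. The names `unique_in_order` and `alphabetical` are introduced for this sketch.

```python
skill_list = ['python', 'sql', 'statistics', 'tableau',
              'python', 'sql', 'statistics', 'tableau']

# dict keys are unique and keep first-seen order (Python 3.7+)
unique_in_order = list(dict.fromkeys(skill_list))
print(unique_in_order)  # ['python', 'sql', 'statistics', 'tableau']

# sorted() gives a predictable alphabetical order instead
alphabetical = sorted(set(skill_list))
print(alphabetical)  # ['python', 'sql', 'statistics', 'tableau']
```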