---

### 🎓 **Professor**: Apostolos Filippas

### 📘 **Class**: Web Analytics

### 📋 **Topic**: Sets, Assignment by value & reference (self-study)

🚫 **Note**: You are not allowed to share the contents of this notebook with anyone outside this class without written permission by the professor.

---

# ⚪⚫ 1. Sets


Another useful data type is the set.


Sets are an **unordered** collection of **unique** items. Let's see how they work through examples.

In [None]:
#first let's construct an empty set
my_set = set()
my_set
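Empty curly braces create an empty *dictionary*, not an empty set, so `set()` is the only way to write an empty set. Non-empty sets, however, can be written directly with curly braces (a set literal), which we use later in this notebook. A short check with illustrative variable names:

```python
# `{}` is an empty dictionary, not an empty set
empty_braces = {}
print(type(empty_braces))   # <class 'dict'>

# `set()` is the way to create an empty set
empty_set = set()
print(type(empty_set))      # <class 'set'>

# non-empty sets can be written as literals; duplicates are dropped
letters = {'a', 'b', 'a'}
print(len(letters))         # 2
```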

Our set is empty, so let's start adding items to it.

In [None]:
my_set.add(0)
my_set

In [None]:
my_set.add(1)
my_set.add(2)
my_set

In [None]:
# now let's add a duplicate
my_set.add(2)

# notice that the duplicate is not added - sets contain unique values
my_set

In [None]:
# `update` lets you add multiple elements at once
my_set.update([11,12,13])


In [None]:
# `remove` lets you remove an element
my_set.remove(11)

In [None]:
# but `remove` raises a KeyError if the element is not in the set
my_set.remove(11)

In [None]:
# `discard` also removes an element, but does nothing (no error) if the element is not in the set
my_set.discard(9)
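If you are not sure whether an element is present, you can either catch the `KeyError` that `remove` raises or simply use `discard`. A small sketch on a separate set (`numbers` is just a name chosen for this example):

```python
numbers = {1, 2, 3}

# `remove` raises a KeyError when the element is missing, so guard it if unsure
try:
    numbers.remove(9)
except KeyError:
    print("9 was not in the set")

# `discard` silently does nothing when the element is missing
numbers.discard(9)

# both remove the element when it is present
numbers.remove(1)
numbers.discard(2)
print(numbers)  # {3}
```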

### Casting

We can also convert lists to sets and back, a process called casting. (We saw casting before in the context of dictionary keys and values.) Let's see how this works.

In [None]:
my_list = [1,2,3,4,0,2,3,4,5,6,1,2,3]
my_list

In [None]:
# casting the list to a set eliminates repeated occurrences of any value
set(my_list)
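Casting also works in the other direction: `list()` turns a set back into a list. Because sets are unordered, the order of the resulting list is not something to rely on; wrapping the set in `sorted()` gives a predictable order. A sketch that deduplicates a list this way (the variable names are illustrative):

```python
my_list = [1, 2, 3, 4, 0, 2, 3, 4, 5, 6, 1, 2, 3]

# list -> set removes duplicates
unique_values = set(my_list)

# set -> list; the element order of a set is not guaranteed
back_to_list = list(unique_values)

# sorted() accepts any iterable and always returns a new, sorted list
sorted_unique = sorted(unique_values)
print(sorted_unique)  # [0, 1, 2, 3, 4, 5, 6]
```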

### Other set operations

Going back to our math education, we can define two common operations between two sets A and B:
- **union**: the union of A and B is the set of elements contained in A, in B, or in both
- **intersection**: the intersection of A and B is the set of elements contained in both A and B

Python sets have `union` and `intersection` methods. Let's see an example.

In [None]:
A = {0,1,3,5,9,10}
B = {0,2,4,6,8,10}

In [None]:
# `union` returns all elements in either set
A.union(B)

In [None]:
# therefore it does not matter which set we call the method on
B.union(A)

In [None]:
# `intersection` returns only elements in both sets
A.intersection(B)

In [None]:
# again, it does not matter which set we call the method on
B.intersection(A)

In [None]:
# `difference` returns a set containing all the elements of the first set that are not in the second set.
A.difference(B)

In [None]:
# here, the order of the sets matters
B.difference(A)
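Python also provides operators for the same operations: `|` for union, `&` for intersection, and `-` for difference. One difference worth knowing: the operators require both operands to be sets, whereas the methods accept any iterable, such as a list. Using the same `A` and `B`:

```python
A = {0, 1, 3, 5, 9, 10}
B = {0, 2, 4, 6, 8, 10}

# operator forms of the methods shown above
print(A | B)   # union, same as A.union(B)
print(A & B)   # intersection, same as A.intersection(B)
print(A - B)   # difference, same as A.difference(B)

# the methods also accept other iterables, such as a list
print(A.union([20, 30]))
```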

### Sets vs. Lists

While both sets and lists are used to store collections of items, there are key differences:

1. Uniqueness: As already discussed, sets can't have duplicate items, while lists can.
2. Order: Lists are ordered collections, meaning items appear in a specific order and can be accessed by position (e.g., `my_list[0]`). Sets are unordered, so they do not support indexing.
3. Mutability: Lists are mutable (i.e., you can change their contents after creation). Sets are mutable too, but the items they contain must be hashable, which in practice means immutable types such as strings, numbers, and tuples.
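To see the third point in action: a set can hold numbers, strings, and tuples, but trying to add a list (which is mutable, and therefore unhashable) raises a `TypeError`. A minimal check with an illustrative set called `items`:

```python
items = set()

# numbers, strings, and tuples are immutable, so they can go into a set
items.add(42)
items.add("hello")
items.add((1, 2))

# lists are mutable (unhashable), so adding one fails
try:
    items.add([1, 2])
except TypeError as err:
    print("Cannot add a list:", err)

print(len(items))  # 3
```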

### Example

Imagine you're a web analyst for an e-commerce company. You want to understand the behavior of users visiting your website. Here's how sets can be beneficial:

Scenario: You've just had a big sale on your website. You've collected data on visitors for two days: the day of the sale and the day after. You want to know:

1. How many unique visitors came to your website each day?
2. How many visitors came on both days?
3. How many visitors came only on the sale day and not the day after?

Assume you have a list of user IDs (or some other identifier) for each day's visitors:

In [None]:
# Day of the sale
sale_day_visitors = ["user1", "user2", "user3", "user4", "user2", "user5"]

# Day after the sale
post_sale_day_visitors = ["user4", "user6", "user7", "user8", "user1"]

# Converting lists to sets to get unique visitors
sale_day_set = set(sale_day_visitors)
post_sale_day_set = set(post_sale_day_visitors)


In [None]:
# 1. Number of unique visitors each day
print(f"Unique visitors on sale day: {len(sale_day_set)}")
print(f"Unique visitors on the day after the sale: {len(post_sale_day_set)}")


In [None]:
# 2. Visitors on Both Days (Intersection):
both_days = sale_day_set.intersection(post_sale_day_set)
print(f"Visitors on both days: {both_days}")


In [None]:
# 3. Visitors only on the sale day (Difference):
only_sale_day = sale_day_set.difference(post_sale_day_set)
print(f"Visitors only on the sale day: {only_sale_day}")
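Since questions 2 and 3 ask *how many* visitors, you can wrap each result in `len()`. The sketch below recreates the two visitor sets from above so it runs on its own, and uses the operators `&` and `-`, which are equivalent to the `intersection` and `difference` methods:

```python
sale_day_set = {"user1", "user2", "user3", "user4", "user5"}
post_sale_day_set = {"user4", "user6", "user7", "user8", "user1"}

both_days = sale_day_set & post_sale_day_set        # same as .intersection()
only_sale_day = sale_day_set - post_sale_day_set    # same as .difference()

print(f"Number of visitors on both days: {len(both_days)}")
print(f"Number of visitors only on the sale day: {len(only_sale_day)}")

# sets are unordered; sort them if you want a stable display order
print(sorted(only_sale_day))  # ['user2', 'user3', 'user5']
```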


--- 
# 🔗 ✂️ 2. Assignment by reference and value

In Python, understanding how variables refer to data is crucial for preventing unintended side effects. The behavior of variables, especially when they are assigned or copied, can be surprising if you don't understand how Python manages memory.

- When using the `=` operator, Python performs an **assignment by reference**:
  - Assignment by reference means that the contents to which a variable refers **are not copied**.
  - Instead, the variable only remembers the address of those contents in your computer's memory.
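Python's built-in `id()` function returns an integer that identifies the object a name refers to (in CPython, this is its memory address). Two names bound with `=` report the same `id`, which is a quick way to check the idea above. A sketch with illustrative names (`original`, `alias`):

```python
original = [1, 2, 3]
alias = original          # no copy is made; both names refer to one list

print(id(original) == id(alias))  # True
print(original is alias)          # True: `is` compares identity

# a change made through one name is visible through the other
alias.append(4)
print(original)  # [1, 2, 3, 4]
```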

Let's see an example

In [None]:
my_dict = {'cat':'die Katze', 'dog':'der Hund'}

# assign my_dict to the variable second_dict by reference
second_dict = my_dict

In [None]:
# both variables have equal contents
print( my_dict == second_dict )
print(my_dict)
print(second_dict)

The `is` operator checks if two variables refer to the exact same object in memory, not just if they are equal.

In [None]:
# and in fact, both variables refer to the same place in memory!
my_dict is second_dict
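For contrast, two dictionaries created separately with the same contents are *equal* (`==` is `True`) but are *not the same object* (`is` is `False`). A sketch with illustrative names:

```python
dict_a = {'cat': 'die Katze'}
dict_b = {'cat': 'die Katze'}   # a separate dictionary with the same contents

print(dict_a == dict_b)  # True: same contents
print(dict_a is dict_b)  # False: two different objects in memory
```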

In [None]:
# when you change the contents, the change is visible through every reference (variable) to this place in memory
my_dict['mice'] = 'die Mäuse'
print(my_dict)
print(second_dict)

In [None]:
del my_dict['mice']
print(my_dict)
print(second_dict)

The variables `my_dict` and `second_dict` refer to the same place in memory!
- When you change the dictionary through either of them, the change is reflected in both variables.
- This can be useful, but also inconvenient: you might want to store a copy of the dictionary in `second_dict` and make changes that are not reflected in the original.

If you don't want this to happen, you have to perform an **assignment by value**:
- that is, copy the original contents to a new place in memory and have the new variable refer to it
- for a dictionary, this can be done by explicitly creating a new one, e.g., with `dict(my_dict)`

In [None]:
second_dict = dict(my_dict)
print( my_dict == second_dict )
print( my_dict is second_dict )
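Besides `dict(my_dict)`, the dictionary method `.copy()` and the standard-library function `copy.copy()` also create a new dictionary with the same contents. All three make a *shallow* copy: the dictionary itself is new, but the values stored in it are not copied. A sketch comparing them:

```python
import copy

my_dict = {'cat': 'die Katze', 'dog': 'der Hund'}

copy_1 = dict(my_dict)
copy_2 = my_dict.copy()
copy_3 = copy.copy(my_dict)

for c in (copy_1, copy_2, copy_3):
    # each copy has equal contents but is a different object
    print(c == my_dict, c is my_dict)  # True False
```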

The same holds for lists, except that instead of `dict(my_dict)` you would call `list(my_list)`.
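For lists, `list(my_list)`, the method `my_list.copy()`, and the slice `my_list[:]` all create a new list. A short sketch:

```python
my_list = [1, 2, 3]

copied = list(my_list)     # my_list.copy() or my_list[:] work as well
copied.append(4)

print(my_list)  # [1, 2, 3], the original is unchanged
print(copied)   # [1, 2, 3, 4]
print(my_list[:] is my_list)  # False: slicing also creates a new list
```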

Now see what happens if you make changes:

In [None]:
my_dict['the butterfly'] = 'der Schmetterling'

# changes are now only reflected in one of the two variables
# since the two variables refer to **different** copies of the dictionary (different places in memory)
print(my_dict)
print(second_dict)

## Another example

Let's create a dictionary as follows

In [2]:
x = [5,6,7]

my_dict = {
    'a': 1,
    'b': 2,
    'c': x,
    'd': x
}

print(my_dict)

{'a': 1, 'b': 2, 'c': [5, 6, 7], 'd': [5, 6, 7]}


Now let's change the variable `x`

In [3]:
x.append(8)
print(my_dict)

{'a': 1, 'b': 2, 'c': [5, 6, 7, 8], 'd': [5, 6, 7, 8]}


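Note that you didn't just change the list that `x` refers to: the values associated with keys `c` and `d` changed too.
The reason is that Python does assignment by reference! `x`, `my_dict['c']`, and `my_dict['d']` all refer to the same list in memory, so a change made through any one of them is visible through all of them.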
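This example also shows the limit of a copy made with `dict(my_dict)`: it copies the dictionary, but not the objects stored inside it, so such a copy would still share the list `x` with the original. The standard library's `copy.deepcopy()` copies nested objects as well. A sketch contrasting the two (`shallow` and `deep` are illustrative names):

```python
import copy

x = [5, 6, 7]
my_dict = {'a': 1, 'b': 2, 'c': x, 'd': x}

shallow = dict(my_dict)         # new dict, but 'c' and 'd' still point to x
deep = copy.deepcopy(my_dict)   # new dict AND new copies of nested objects

x.append(8)

print(shallow['c'])  # [5, 6, 7, 8]: shares the list with the original
print(deep['c'])     # [5, 6, 7]: unaffected

# deepcopy preserves sharing inside the copy: 'c' and 'd' point to one new list
print(deep['c'] is deep['d'])  # True
```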

### Always be aware of whether you're working with a reference or a copy of your data. When in doubt, make an explicit copy.