# Lesson 2: Data Structures - Part 3

## Sets


A **set** is an unordered collection of unique elements. Sets are useful when you need to eliminate duplicate values or check for membership.

### Creating Sets
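Create a set by putting its elements inside curly braces, for example `{1, 2, 3}`, or by calling the `set()` function. Empty braces `{}` create an empty dictionary, so an empty set must be written as `set()`: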

```python
# Create an empty set ({} would create an empty dictionary)
my_set = set()

# Create a set with initial values
my_set = {1, 2, 3, 4, 5}

# Create a set with values of different data types
# Note: True == 1 in Python, so this set ends up with only three elements
my_set = {1, "Hello", 3.14, True}

# Create a set from another iterable (list, tuple, range) using set()
my_set = set([1, 2, 3])  # ...from a list
my_set = set((1, 2, 3))  # ...from a tuple
my_set = set(range(10))  # ...from a range of numbers
```
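
Two details of set creation are easy to miss, so here is a minimal sketch that checks them. The variable names are illustrative and not part of the lesson's own cells. It shows that empty braces give a dictionary rather than a set, and that duplicates collapse when a set is built from a list.

```python
# Illustrative names; not part of the lesson's own cells
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# Duplicates collapse when a set is built from a list
unique_values = set([1, 2, 2, 3, 3, 3])
print(len(unique_values))  # 3
```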

### Set Operations
- **Union**: Combines elements from both sets (`set1 | set2`).
- **Intersection**: Elements common to both sets (`set1 & set2`).
- **Difference**: Elements in `set1` but not in `set2` (`set1 - set2`).
- **Symmetric Difference**: Elements in either `set1` or `set2` but not both (`set1 ^ set2`).
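
The four operations listed above can be sketched as follows. The contents of `set1` and `set2` are made-up values chosen for illustration, and the results are sorted only to make the printed order predictable, since sets themselves are unordered.

```python
# Hypothetical example sets chosen for illustration
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

union = set1 | set2                 # same as set1.union(set2)
intersection = set1 & set2          # same as set1.intersection(set2)
difference = set1 - set2            # same as set1.difference(set2)
symmetric_difference = set1 ^ set2  # same as set1.symmetric_difference(set2)

print(sorted(union))                 # [1, 2, 3, 4, 5, 6]
print(sorted(intersection))          # [3, 4]
print(sorted(difference))            # [1, 2]
print(sorted(symmetric_difference))  # [1, 2, 5, 6]
```

Note that difference is not symmetric: `set2 - set1` would give `{5, 6}` instead.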

### Membership Testing
Use the `in` keyword to check if an item exists in a set:

```python
if 2 in my_set:
    print("2 is in the set")
```
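
Membership tests also work in the negative form, `not in`, which is handy for filtering. The sketch below assumes a hypothetical set `allowed_bases` and a made-up string `sequence`, and collects every character that is not in the set.

```python
# Hypothetical example data chosen for illustration
allowed_bases = {"A", "C", "G", "T"}
sequence = "ACGTN-A"

# Keep the characters that are NOT members of the set
unexpected = [ch for ch in sequence if ch not in allowed_bases]
print(unexpected)  # ['N', '-']
```

A set is a good fit here because a membership test on a set takes roughly constant time on average, whereas the same test on a list scans the elements one by one.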
            

In [1]:
# Creating and using sets
numbers = {1, 2, 3, 4, 5, 5, 6}  # Duplicates will be removed
print("Set of numbers:", numbers)

Set of numbers: {1, 2, 3, 4, 5, 6}


In [None]:
# Set operations
evens = {2, 4, 6, 8}
odds = {1, 3, 5, 7, 9}

# ...

In [None]:
# Membership testing
if 3 in numbers:
    print("3 is in the set")

## Exercises: Sets

**Write a program that removes duplicates from a list of integers using a set**

In [2]:
numbers = [1, 2, 2, 3, 4, 4, 5, 5, 6]
# ...


**Implement a simple word frequency counter using a set and a dictionary. The program should count the occurrences of each unique word in a given string**

In [4]:
sentence = "this is a test this is only a test"
words = sentence.split()
word_count = {}

for word in words:
    if word in word_count:
        # ...
    else:
        # ...
print("Word frequencies:", word_count)

Word frequencies: {'this': 2, 'is': 2, 'a': 2, 'test': 2, 'only': 1}


## Working with Mixed Data Structures


In real-world applications, you often need to combine different data structures to solve complex problems.

### Example: Combining Lists, Dictionaries, and Sets
- **Lists** can hold dictionaries, allowing you to create a list of records.
- **Dictionaries** can have lists or sets as values, enabling you to manage collections of data.
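
As a minimal sketch of both ideas together, the example below starts from a list of dictionaries and builds a dictionary of sets from it. The `inventory` records and the name `items_by_tag` are hypothetical and chosen only for illustration.

```python
# Hypothetical records: a list of dictionaries, each holding a set of tags
inventory = [
    {"item": "apple", "tags": {"fruit", "red"}},
    {"item": "banana", "tags": {"fruit", "yellow"}},
    {"item": "pepper", "tags": {"vegetable", "red"}},
]

# Build a dictionary of sets that maps each tag to the items carrying it
items_by_tag = {}
for record in inventory:
    for tag in record["tags"]:
        # setdefault creates an empty set the first time a tag is seen
        items_by_tag.setdefault(tag, set()).add(record["item"])

print(sorted(items_by_tag["red"]))  # ['apple', 'pepper']

# Set operations then answer combined questions, e.g. items that are red fruit
print(sorted(items_by_tag["fruit"] & items_by_tag["red"]))  # ['apple']
```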
            

In [6]:
# Example: List of dictionaries
students = [
    {"name": "Alice", "grades": [85, 90, 78]},
    {"name": "Bob", "grades": [92, 88, 84]},
    {"name": "Charlie", "grades": [70, 75, 80]}
]

# Calculate average grade for each student
for student in students:
    # ...
    

Alice's average grade: 84.33333333333333
Bob's average grade: 88.0
Charlie's average grade: 75.0


In [7]:
# Example: Dictionary of sets
course_enrollment = {
    "Math": {"Alice", "Bob"},
    "Science": {"Alice", "Charlie"},
    "Art": {"Bob", "Charlie"}
}

# Find students enrolled in both Math and Science
# ...


Students in both Math and Science: {'Alice'}


## Exercises: Mixed Data Structures

**Create a dictionary where keys are student names and values are sets of enrolled courses. Implement a function to add a course to a student's set of courses**

In [8]:
student_courses = {
    "Alice": {"Math", "Science"},
    "Bob": {"Art"},
    "Charlie": {"Math"}
}

def add_course(student, course):
    if student in student_courses:
        # ...

    else:
        # ...

# Add a new course
# ...

Updated courses for Alice: {'Art', 'Science', 'Math'}


**Write a program that takes a list of student dictionaries (each containing name and grades) and outputs the names of students who have an average grade above 80**

In [3]:
students = [
    {"name": "Alice", "grades": [85, 90, 78]},
    {"name": "Bob", "grades": [92, 88, 84]},
    {"name": "Charlie", "grades": [70, 75, 80]}
]

for student in students:
    # ...


**Calculate the frequency of occurrence of each base across a list of strings using dictionaries**

In [11]:
def my_count(dna_string, base):
    count_bases = 0  # counter
    for character in dna_string:  # iterate over every character of dna_string
        if character == base:
            count_bases += 1  # i.e., count_bases = count_bases + 1
    return count_bases

my_sequences = [
"CCTGGCGGCCATGGCGAACCGGAACCACCCGATCCCATCTCGAACTCGGAAGTGAAACGGTTCAGCGCCGATGA-TAGTGTG--GGGCCTCCCCATGTGAAAGTAGGTCACTGCCAGGC",
"-CAGGTGGTGATGGCGGAAAGGTCACACCCGAACACATCCCGAACTCGGAAGTTAAGCTTTCCAGCGCCGATG-GTAGTTGG--GGGTTTCCCCCTGCGAGAGTAGGACGTTGCCGGGC",
"AACGGCGGTCATAGCGGTGGGGAAACGCCCGGTCCCATCCCGAACCCGGAAGCTAAGCCCACCAGCGCCGATG-GTACTGCACTC-GTGAGGGTGTGGGAGAGTAGGACGCCGCCGGAC",
"GCTGGCGACCATAGCAAGAGTGAACCACCTGATCCCTTCCCGAACTCAGAAGTGAAACCTCTTCGCGCTGATG-GTAGTGNGG-GT--TA-CCCATGTGAGAGTAAGTCATCGCCAGCT",
"GTCGGTGGTCATTGCGGAGGGGGAACGCCCGGTCCCATCCCGAACCCGGAAGCTAAGCCCTCCAGCGCCGATG-GTACTGCACTC-GCCAGGGTGTGGGAGAGTAGGTCGCCGCCGACA"]

base_counts = {}  # an empty dictionary (named base_counts to avoid shadowing the built-in dict)
base_counts['A'] = 0
base_counts['C'] = 0
base_counts['G'] = 0
base_counts['T'] = 0

for line in my_sequences:
    # ...


print(base_counts)

{'A': 124, 'C': 170, 'G': 188, 'T': 96}
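
For comparison, the same kind of tally can be sketched with `collections.Counter` from the standard library. This is an alternative technique, not the approach the exercise asks for. The sequences below are short, made-up strings rather than the lesson's data, and the sketch assumes that gap (`-`) and ambiguous (`N`) characters should be ignored in the final result.

```python
from collections import Counter

# Short made-up sequences for illustration (not the lesson's data)
example_sequences = ["ACG-T", "AAGC", "T-TNA"]

base_totals = Counter()
for sequence in example_sequences:
    # update() counts every character, including '-' and 'N'
    base_totals.update(sequence)

# Keep only the four bases of interest
base_counts = {base: base_totals[base] for base in "ACGT"}
print(base_counts)  # {'A': 4, 'C': 2, 'G': 2, 'T': 3}
```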
