In [None]:
files_uploaded = ["data1.csv", "report.xlsx", "summary.docx",
                  "data1.csv", "report.xlsx", "new_file.txt"]

seen = set()
# seen.add() returns None (falsy), so a file is kept only when it was already in seen
dupes = {file for file in files_uploaded if file in seen or seen.add(file)}

print(dupes)


In [None]:
files_uploaded = ["data1.csv", "report.xlsx", "summary.docx",
                  "data1.csv", "report.xlsx", "new_file.txt"]

seen = set()
dupes = set()

for file in files_uploaded:
    if file in seen:
        dupes.add(file)  # If it's already in seen, it's a duplicate
    else:
        seen.add(file)  # Otherwise, add it to seen

print(dupes)


## 📝 Example 1: Checking if a User Exists in a System
Imagine you have a huge list of registered users, and you need to check if a new user ID is already in the system.

❌ Without Sets (Slow)

In [3]:
users_list = [101, 202, 303, 404, 505, 606]  # Imagine this list is huge
user_id = 303

if user_id in users_list:  # O(n) lookup time
    print("User exists")
else:
    print("User not found")


User exists


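**🔹 Problem: `in` on a list checks elements one by one, so the lookup gets slower as `users_list` grows. Converting the list to a set makes the lookup O(1) on average.**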

In [None]:
users_set = set(users_list)  # One-time O(n) conversion
if user_id in users_set:  # O(1) average lookup time
    print("user exists")
else:
    print("user not found")

user exists
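
To see the difference in practice, the sketch below times the same membership test against a list and a set using the standard `timeit` module. The collection size (100,000 IDs) and the repeat count are arbitrary choices for illustration, and the exact timings will vary by machine.

```python
import timeit

# Hypothetical data: 100,000 user IDs, looking up one near the end of the list
users_list = list(range(100_000))
users_set = set(users_list)
user_id = 99_999

# Time 100 membership checks against each container
list_time = timeit.timeit(lambda: user_id in users_list, number=100)
set_time = timeit.timeit(lambda: user_id in users_set, number=100)

print(f"list: {list_time:.6f}s")
print(f"set:  {set_time:.6f}s")
```

The set only pays off if you look things up more than once, because building it from the list is itself an O(n) step.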


## 1️⃣ Detecting Plagiarism (Finding Common Words in Documents)
You have two text documents, and you want to check how many common words appear in both.

In [7]:
doc1 = """Python is great for data science and machine learning"""
doc2 = """Machine learning and AI are transforming data science"""

words1 = set(doc1.lower().split())  
words2 = set(doc2.lower().split())

# Find words that appear in both documents
print(words1 & words2)

{'learning', 'data', 'machine', 'and', 'science'}
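
To turn "how many common words" into a number, you can take the size of the intersection. The sketch below also computes the Jaccard similarity (shared words divided by all distinct words), which is a measure added here for illustration and not part of the original example. It is a rough overlap score, not a real plagiarism detector.

```python
doc1 = "Python is great for data science and machine learning"
doc2 = "Machine learning and AI are transforming data science"

words1 = set(doc1.lower().split())
words2 = set(doc2.lower().split())

common = words1 & words2    # words that appear in both documents
combined = words1 | words2  # words that appear in either document

similarity = len(common) / len(combined)  # Jaccard similarity, between 0 and 1

print(len(common))           # 5
print(round(similarity, 2))  # 0.42
```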


## 2️⃣ Finding Employees Who Haven't Submitted Reports
You have two sets:

* One contains all employees.
* The other contains employees who submitted reports.

You need to find out who hasn't submitted their report.


In [8]:
all_employees = {"Alice", "Bob", "Charlie", "David", "Eve"}
submitted_reports = {"Alice", "David", "Eve"}

# Find employees who DID NOT submit reports
print(all_employees - submitted_reports)

{'Bob', 'Charlie'}
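
The `-` operator needs a set on both sides. If the submissions arrive as a plain list (assumed here, with a repeated name for illustration), the `.difference()` method accepts any iterable, so you can skip the conversion. A minimal sketch:

```python
all_employees = {"Alice", "Bob", "Charlie", "David", "Eve"}
submitted_list = ["Alice", "David", "Eve", "Alice"]  # hypothetical raw list with a repeat

# .difference() accepts any iterable, so no set() conversion is needed
missing = all_employees.difference(submitted_list)

print(sorted(missing))  # sorted() gives a stable display order: ['Bob', 'Charlie']
```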


## 3️⃣ Identifying Fraudulent Transactions
You have a list of transactions, but some accounts are blacklisted for fraud.
- Filter out any transactions from blacklisted accounts.

In [12]:
transactions = [
    ("user1", 250), ("user2", 400), ("user3", 500),
    ("user4", 100), ("user5", 750), ("user3", 600)
]
blacklisted_accounts = {"user3", "user5"}

# Create a new list without blacklisted transactions
filtered_transactions = []

for user, amount in transactions:
    if user not in blacklisted_accounts:
        filtered_transactions.append((user, amount))

print(filtered_transactions)


[('user1', 250), ('user2', 400), ('user4', 100)]
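
The same filter can be written as a list comprehension. The sketch below also collects the flagged transactions in a second list, which is an extra step not in the original example (`flagged_transactions` is a name chosen here for illustration).

```python
transactions = [
    ("user1", 250), ("user2", 400), ("user3", 500),
    ("user4", 100), ("user5", 750), ("user3", 600)
]
blacklisted_accounts = {"user3", "user5"}

# Keep transactions whose user is NOT blacklisted
filtered_transactions = [
    (user, amount)
    for user, amount in transactions
    if user not in blacklisted_accounts
]

# Collect the blacklisted ones separately, e.g. for review
flagged_transactions = [
    (user, amount)
    for user, amount in transactions
    if user in blacklisted_accounts
]

print(filtered_transactions)
print(flagged_transactions)
```

Because `blacklisted_accounts` is a set, each `in` check stays fast even when the blacklist is large.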


## Tuple Unpacking in a List
* What is Tuple Unpacking?
* When you have a list of tuples, each tuple contains multiple values.
* You can unpack those values directly in a for loop instead of accessing them by index.

✅ Basic Example: Unpacking Tuples in a List

In [18]:
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]

# Instead of doing this:
for pair in pairs:
    print(pair[0], pair[1])  # Accessing elements by index

# Do this:
for num, letter in pairs:
    print(num, letter, end=":")  # Unpacking the tuple directly


1 a
2 b
3 c
1 a:2 b:3 c:

## 🛑 Common Mistake: Forgetting to Unpack the Tuple
### ❌ Clunky (index access works, but hides what each value means):

In [31]:
transactions = [("user1", 250), ("user2", 400)]
for transaction in transactions:
    if transaction[0] == "user1":  # Accessing manually
        print("Found transaction!")


Found transaction!


### ✅ Right (Unpack It!)

In [21]:
transactions = [("user1", 250), ("user2", 400)]
for user, amount in transactions:  # UNPACK THE TUPLE
    if user == "user1":
        print("Found transaction!")


Found transaction!


**💡 Why? Python lets you unpack the tuple automatically, so you can use user and amount directly.**
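
Unpacking works for tuples of any fixed length, as long as the number of names matches the number of values (otherwise Python raises a `ValueError`). The sketch below uses hypothetical three-field records and shows the `_` convention for fields you don't need.

```python
# Hypothetical records with three fields: user, amount, currency
transactions = [("user1", 250, "USD"), ("user2", 400, "EUR")]

# Unpack all three fields by name
for user, amount, currency in transactions:
    print(f"{user}: {amount} {currency}")

# Use _ as a placeholder for fields you want to ignore
users = [user for user, _, _ in transactions]
print(users)  # ['user1', 'user2']
```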

## Using `enumerate()` Correctly
* What Does `enumerate()` Do?
    - `enumerate()` is used to get the index + value at the same time when **looping through a list**.

✅ Basic Example:

In [22]:
names = ["Alice", "Bob", "Charlie"]

for index, name in enumerate(names):
    print(f"Index {index} -> {name}")


Index 0 -> Alice
Index 1 -> Bob
Index 2 -> Charlie


**🔹 `enumerate()` gives you a *tuple* `(index, value)`, which you unpack in the loop.**

## 🛑 Common Mistake: Misusing enumerate() with Tuples
If you already have a list of tuples, `enumerate()` wraps each one as `(index, (user, amount))`. The loop then receives two values (the index and the whole tuple), not three.

### ❌ Wrong: Using enumerate() on a List of Tuples Incorrectly

In [1]:
transactions = [("user1", 250), ("user2", 400)]

for index, user, amount in enumerate(transactions):  # ❌ This won't work!
    print(index, user, amount)


ValueError: not enough values to unpack (expected 3, got 2)

### ✅ Right Way: Unpacking enumerate() Correctly

In [2]:
transactions = [("user1", 250), ("user2", 400)]

for index, (user, amount) in enumerate(transactions):  # ✅ Unpack the tuple properly
    print(f"{index}: {user} -> ${amount}")


0: user1 -> $250
1: user2 -> $400


### 💡 Rule of Thumb:

* If working with a regular list, use enumerate() normally.
* If working with a list of tuples, unpack the tuple inside the enumerate() loop.
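
Both rules combine with the optional `start` argument of `enumerate()`, which is handy when you want human-friendly numbering that begins at 1. A minimal sketch (the `numbered` list is only there to show the resulting values):

```python
transactions = [("user1", 250), ("user2", 400)]

# start=1 changes the first index; the tuple is still unpacked in parentheses
for position, (user, amount) in enumerate(transactions, start=1):
    print(f"{position}: {user} -> ${amount}")

numbered = [(position, user) for position, (user, _) in enumerate(transactions, start=1)]
print(numbered)  # [(1, 'user1'), (2, 'user2')]
```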

## Practice Problems to Drill This Concept
Try these to lock it in:

### 🧩 Problem 1: Unpack Tuples from a List


In [13]:
products = [("Laptop", 1000), ("Phone", 500), ("Tablet", 750)]
# Print each product name and its price
for item, price in products:
    print(f'{item} costs {price}')

Laptop costs 1000
Phone costs 500
Tablet costs 750


### 🧩 Problem 2: Use `enumerate()` Correctly

In [None]:
students = [("Alice", 90), ("Bob", 85), ("Charlie", 88)]
# Print each student's position (index) in the list and their grade
for index, (student, grade) in enumerate(students):
    print(f'Student # {index}: {student} scored {grade}')

### 🧩 Problem 3: Find the First Duplicate in a List

In [29]:
numbers = [4, 5, 6, 7, 5, 8, 9, 6]
# Use a set to find the first duplicate
seen = set()

for number in numbers:
    if number in seen:
        print(number)
        break
    else:
        seen.add(number)
        
        

5
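
The same idea can be wrapped in a reusable function that also handles the case where nothing repeats. This is a sketch; `first_duplicate` is a name chosen here for illustration.

```python
def first_duplicate(items):
    """Return the first item that repeats, or None if all items are unique."""
    seen = set()
    for item in items:
        if item in seen:
            return item  # first value we have already met
        seen.add(item)
    return None  # no duplicates found


print(first_duplicate([4, 5, 6, 7, 5, 8, 9, 6]))  # 5
print(first_duplicate([1, 2, 3]))                 # None
```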


### Finding Unused API Keys
You have a set of all API keys and another set of used API keys.

Find which API keys have **never been used**.

In [32]:
all_keys = {"key1", "key2", "key3", "key4", "key5"}
used_keys = {"key1", "key3", "key5"}

# Find API keys that were never used
print(all_keys - used_keys)

{'key4', 'key2'}


### 🔧 Problem 1: Detecting Missing Records in a Data Pipeline
Your system processes data every hour, but some expected records are missing from the output.

In [33]:
expected_records = {
    "2024-03-01T01:00:00", "2024-03-01T02:00:00", "2024-03-01T03:00:00",
    "2024-03-01T04:00:00", "2024-03-01T05:00:00"
}
actual_records = {
    "2024-03-01T01:00:00", "2024-03-01T03:00:00", "2024-03-01T05:00:00"
}

# Expected records that never showed up in the output (set difference)
print(sorted(expected_records - actual_records))

['2024-03-01T02:00:00', '2024-03-01T04:00:00']
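
Missing records are the ones that are expected but absent from the output, which is a set difference (`expected - actual`). In a real pipeline you would probably generate the expected timestamps instead of typing them out. The sketch below does that with the standard `datetime` module, assuming one record per hour from 01:00 to 05:00 as in the example.

```python
from datetime import datetime, timedelta

start = datetime(2024, 3, 1, 1, 0, 0)

# Build the five hourly timestamps 01:00 through 05:00 as ISO strings
expected_records = {(start + timedelta(hours=h)).isoformat() for h in range(5)}

actual_records = {
    "2024-03-01T01:00:00", "2024-03-01T03:00:00", "2024-03-01T05:00:00"
}

# Expected but not present in the actual output
missing = sorted(expected_records - actual_records)
print(missing)  # ['2024-03-01T02:00:00', '2024-03-01T04:00:00']
```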


### 🔧 Problem 2: Identifying Frequent API Callers
Your logs track API calls made by users. You want to find users who made more than 2 API calls.

✅ Task: Use a dictionary to count the number of API calls per user and find users with more than 2 calls.

In [None]:
api_logs = [
    ("user1", "/login"), ("user2", "/data"), ("user2", "/data"),
    ("user3", "/data"), ("user2", "/login"), ("user1", "/logout"),
    ("user3", "/logout"), ("user2", "/data"), ("user3", "/data")
]

# Step 1: Count API calls per user
api_counts = {}

# Make the frequency dictionary using `get()`
for user, _ in api_logs:
    api_counts[user] = api_counts.get(user, 0) + 1  # O(1) update

# Step 2: Find users with more than 2 API calls
frequent_users = {user for user, count in api_counts.items() if count > 2}

print(sorted(frequent_users))  # user1 made 2 calls, user2 made 4, user3 made 3

['user2', 'user3']
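
For comparison, the standard library's `collections.Counter` does the counting step in one line. This is an alternative sketch, not the approach the task asks for, and it uses the threshold from the task statement (more than 2 calls).

```python
from collections import Counter

api_logs = [
    ("user1", "/login"), ("user2", "/data"), ("user2", "/data"),
    ("user3", "/data"), ("user2", "/login"), ("user1", "/logout"),
    ("user3", "/logout"), ("user2", "/data"), ("user3", "/data")
]

# Count how many log entries each user has; the endpoint is ignored with _
api_counts = Counter(user for user, _ in api_logs)

frequent_users = {user for user, count in api_counts.items() if count > 2}

print(sorted(frequent_users))     # ['user2', 'user3']
print(api_counts.most_common(1))  # [('user2', 4)]
```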


## What Does `.get()` Do?
The `.get()` method is used to retrieve a value from a dictionary.

* If the key exists → Returns the value. ✅
* If the key doesn't exist → Returns the default value (instead of throwing an error). ✅

`dictionary.get(key, default_value)`

### Basic Example: Using `.get()` to Fetch Values

In [None]:
user_ages = {"Alice": 25, "Bob": 30, "Charlie": 28}

print(user_ages.get("Alice"))
print(user_ages.get("Eve"))    # ✅ None (Key doesn’t exist, default is None)
print(user_ages.get("Eve", "Unknown"))  # ✅ "Unknown" (Custom default value)


25
None
Unknown


#### Why is this Useful?
* Using `user_ages["Eve"]` would raise a `KeyError` if "Eve" isn’t in the dictionary.
* `.get()` avoids this error and lets you handle missing keys gracefully.
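
One caveat worth knowing: the default only applies when the key is missing. If the key exists but its value is `None`, `.get()` returns that `None`. The sketch below uses a hypothetical config dictionary to show the difference and one way to handle it.

```python
# Hypothetical config: "font_size" is present, but its value is None
config = {"theme": "dark", "font_size": None}

print(config.get("font_size", 12))    # None, because the key exists
print(config.get("line_height", 12))  # 12, because the key is missing

# One way to also replace a stored None with a fallback
font_size = config.get("font_size")
if font_size is None:
    font_size = 12

print(font_size)  # 12
```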

## Let's Look at the `.get()` Method
You're given a list of API calls made by users.
* Use `.get()` to count how many times each user made an API call.

### Using `.get()` for Counting (Why It’s Useful in the Problem Above)
When counting things (like API calls), `.get()` lets you treat a missing key as 0, so you don't have to initialize it first.

In [54]:
api_counts = {}

# Without .get() - More complicated
for user in ["Alice", "Bob", "Alice", "Charlie", "Alice"]:
    if user in api_counts:
        api_counts[user] += 1
    else:
        api_counts[user] = 1

print(api_counts)  # {'Alice': 3, 'Bob': 1, 'Charlie': 1}


{'Alice': 3, 'Bob': 1, 'Charlie': 1}


**✅ With .get() - Cleaner:**

In [None]:
api_counts = {}

for user in ["Alice", "Bob", "Alice", "Charlie", "Alice"]:
    api_counts[user] = api_counts.get(user, 0) + 1  # 🔥 No if-else needed!

print(api_counts) 


{'Alice': 3, 'Bob': 1, 'Charlie': 1}


**Why?**

* If user already exists, `.get(user, 0)` returns its current count, and the `+ 1` increments it.
* If user doesn’t exist, `.get(user, 0)` returns 0, so the count starts at 1.

### Avoiding KeyErrors with .get()
Imagine a dictionary storing configuration settings.

❌ Without .get(), you risk errors:


In [56]:
config = {"theme": "dark", "language": "English"}

print(config["theme"])  # ✅ "dark"
print(config["font_size"])  # ❌ KeyError because "font_size" isn’t in the dictionary


dark


KeyError: 'font_size'

### ✅ Using .get() Prevents Crashes:

In [57]:
print(config.get("font_size", 12))  # ✅ 12 (default value)


12


### Using .get() for Nested Dictionaries
Example: User Data in a Dictionary

In [59]:
user_profiles = {
    "Alice": {"age": 25, "location": "NY"},
    "Bob": {"age": 30}
}

# Without .get() (Risky)
print(user_profiles["Alice"]["location"])  # ✅ "NY"
# print(user_profiles["Bob"]["location"])  # ❌ KeyError

# With .get() (Safe)
print(user_profiles.get("Bob", {}).get("location", "Unknown"))  # ✅ "Unknown"


NY
Unknown


**Why?**

* `.get("Bob", {})` → Returns `{}` if "Bob" doesn’t exist, preventing an error.
* `.get("location", "Unknown")` → Returns "Unknown" if "location" is missing.
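
Chaining `.get()` gets harder to read as the nesting grows. One option is a small helper that walks the keys for you. This is a sketch; `get_nested` is a hypothetical name, not a built-in.

```python
def get_nested(data, *keys, default=None):
    """Walk nested dictionaries key by key; return default if any key is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


user_profiles = {
    "Alice": {"age": 25, "location": "NY"},
    "Bob": {"age": 30}
}

print(get_nested(user_profiles, "Alice", "location", default="Unknown"))  # NY
print(get_nested(user_profiles, "Bob", "location", default="Unknown"))    # Unknown
print(get_nested(user_profiles, "Eve", "location", default="Unknown"))    # Unknown
```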


### `.get()` vs. `defaultdict` (Alternative Approach)
* **If you're always using `.get()` to initialize missing keys, consider `collections.defaultdict` instead.**

In [60]:
from collections import defaultdict

api_counts = defaultdict(int)  # Default value for missing keys is 0

for user in ["Alice", "Bob", "Alice", "Charlie", "Alice"]:
    api_counts[user] += 1  # No need for .get()

print(api_counts)  # Prints the defaultdict wrapper around {'Alice': 3, 'Bob': 1, 'Charlie': 1}


defaultdict(<class 'int'>, {'Alice': 3, 'Bob': 1, 'Charlie': 1})


#### Difference?

* `dict.get()` → Manual way to handle missing keys.
* defaultdict → Auto-creates missing keys with a default value.
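
A practical consequence of that difference: merely reading a missing key from a `defaultdict` stores it, while `.get()` leaves the dictionary untouched. A minimal sketch with illustrative names (`plain`, `auto`):

```python
from collections import defaultdict

plain = {}
auto = defaultdict(int)

plain.get("Alice", 0)  # returns 0, dictionary stays empty
auto["Alice"]          # returns 0 AND stores "Alice": 0

print("Alice" in plain)       # False
print("Alice" in auto)        # True
print(len(plain), len(auto))  # 0 1
```

This matters if you later check membership or iterate over the keys, since a `defaultdict` may contain entries that were only ever looked up.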


In [None]:
api_calls = ["Alice", "Bob", "Alice", "Charlie", "Alice", "Bob", "Charlie", "Charlie"]

# Count occurrences using .get()
api_counts = {}

for user in api_calls:
    api_counts[user] = api_counts.get(user, 0) + 1  # Treat a first-time user as 0, then add 1

print(api_counts)
