## Python Tuples

In Python, a tuple is an immutable, ordered collection of elements.
It is similar to a list, but once created, its elements cannot be changed, added, or removed.

**Key Features of Tuples**

**Ordered** – Elements have a fixed order and can be accessed by index.

**Immutable** – You cannot modify a tuple after creation.

**Heterogeneous** – Can store elements of different data types.

**Allow duplicates** – Multiple identical values are allowed.

**Lighter than lists** – Because a tuple's size is fixed, it generally uses less memory than an equivalent list and can be slightly faster to create and iterate over.

In [None]:
my_tuple = (10, 30, 20, 10)
print(my_tuple)         # Ordered, duplicates allowed
print(my_tuple[2])      # Accessed by index

(10, 30, 20, 10)
20
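
To make the immutability and memory points above concrete, here is a minimal sketch. It assumes CPython, where `sys.getsizeof` reports the size of the container itself; exact byte counts vary by version and platform, so none are shown.

```python
import sys

my_tuple = (10, 30, 20, 10)

# Item assignment is rejected because tuples are immutable.
try:
    my_tuple[0] = 99
    assignment_failed = False
except TypeError:
    assignment_failed = True
print('Assignment rejected:', assignment_failed)

# "Changing" a tuple really means building a new one.
longer_tuple = my_tuple + (50,)
print(longer_tuple)

# On CPython a tuple is usually smaller than a list with the same items.
tuple_size = sys.getsizeof((10, 30, 20, 10))
list_size = sys.getsizeof([10, 30, 20, 10])
print('Tuple smaller than list:', tuple_size < list_size)
```

Note that the original `my_tuple` is left unchanged; concatenation produces a separate tuple.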


## Python Sets
A set in Python is a collection of unique, unordered elements. Sets are written with curly braces, such as `{1, 2, 3}`, or built with the set() function. Empty braces `{}` create a dict, so use set() for an empty set.

**Unordered:** The elements in a set do not have a specific order.

**Unique Elements:** Duplicate elements are not allowed.

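**Hashable Elements:** Elements must be hashable, which in practice means immutable types like strings, numbers, or tuples of hashable items. Mutable types such as lists and dicts cannot be stored in a set.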

CPython implements sets as hash tables with open addressing. Each entry stores a reference to the element in a slot chosen from its hash value, which makes membership tests fast on average at the cost of extra memory.
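
The requirement on element types can be checked directly. The sketch below is illustrative only; the variable names are made up for this example.

```python
# Hashable values (numbers, strings, tuples of hashables) can be set elements.
valid_set = {1, 'two', (3, 4)}
print((3, 4) in valid_set)      # membership test uses the element's hash

# A list is mutable and unhashable, so it cannot be stored in a set.
try:
    invalid_set = {1, [2, 3]}
    list_rejected = False
except TypeError as err:
    list_rejected = True
    print('TypeError:', err)

# Empty braces create a dict; use set() for an empty set.
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__, type(empty_set).__name__)
```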

In [None]:
my_set = {10, 30, 20, 10}
print(my_set)       #Unordered  #Unique
print(my_set[0])    # Not Indexed

{10, 20, 30}


TypeError: 'set' object is not subscriptable

In [None]:
my_set = {10, 30, 20, 10}
my_set.remove(20)           # Mutable/Changeable
print(my_set)

{10, 30}


**Set Methods**

**add():** inserts a single item into the set, but only if it is not already present

**update():** merges the values of another iterable into the set

**remove():** removes an item from the set, but raises a `KeyError` if the value is missing

**discard():** removes the item if it exists and does nothing if it doesn't

The operators `|`, `&`, `-`, and `^` are quick shortcuts for union, intersection, difference, and symmetric difference.

In [None]:
a = {10, 30, 20, 40}
a.add(50)           # adds a single value at a time
print(a)

{40, 10, 50, 20, 30}


In [None]:
a = {10, 30, 20, 10}
a.update({1, 2})        #adds several values at once
# OR
a |= {1, 2}             # | works like update, used as shortcut
print(a)

{1, 2, 20, 10, 30}


In [None]:
# Removing a value - remove(), discard()
a = {10, 30, 20, 10}
a.remove(10)
# OR
a.discard(2)        #does nothing if the value is missing
# a.pop() also works, but it removes an arbitrary value, so it is not recommended when you need a specific one
print(a)

{20, 30}
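
The difference between `remove()` and `discard()` only shows up when the value is missing. A minimal sketch of that case, using a value (99) chosen purely for illustration:

```python
a = {10, 30, 20}

# remove() raises KeyError when the value is not in the set.
try:
    a.remove(99)
    remove_raised = False
except KeyError:
    remove_raised = True
print('remove() raised KeyError:', remove_raised)

# discard() silently ignores a missing value.
a.discard(99)
print(a == {10, 20, 30})

# pop() removes and returns an arbitrary element.
removed = a.pop()
print(removed in {10, 20, 30}, len(a))
```

Prefer `discard()` when a missing value is normal, and `remove()` when a missing value signals a bug you want to hear about.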


**Math operators in Sets** - return a new set and leave the originals untouched

**union():** combine all unique items from both sets

**intersection():** returns only the shared/common items

**difference():** returns items present in one set but not in another set

**symmetric_difference():** returns only the non-shared items, meaning every item common to both sets is excluded

In [None]:
a = {10, 20, 30, 40}
b = {30, 40, 50, 60}

print('Union:', a.union(b))                     # OR    print(a | b)
print('Intersection:', a.intersection(b))       # OR    print(a & b)
print('Difference (in A but not in B):', a.difference(b))       # OR    print(a - b)
print('Difference (in B but not in A):', b.difference(a))       # OR    print(b - a)
print('Symmetric_Difference:', a.symmetric_difference(b))       # OR    print(a ^ b)


Union: {40, 10, 50, 20, 60, 30}
Intersection: {40, 30}
Difference (in A but not in B): {10, 20}
Difference (in B but not in A): {50, 60}
Symmetric_Difference: {10, 50, 20, 60}
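
Two details of these operations are easy to miss: the inputs stay unchanged, and the method forms accept any iterable while the operator forms require sets on both sides. A short sketch that checks both:

```python
a = {10, 20, 30, 40}
b = {30, 40, 50, 60}

# The ^ operator is the shortcut for symmetric_difference().
sym_diff = a ^ b
print(sym_diff == a.symmetric_difference(b))

# The operation returned a new set; a and b are unchanged.
print(a == {10, 20, 30, 40}, b == {30, 40, 50, 60})

# Methods accept any iterable, while operators need a set on both sides.
union_with_list = a.union([50, 60])
try:
    a | [50, 60]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(union_with_list == {10, 20, 30, 40, 50, 60}, operator_accepts_list)
```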


**Relationship methods in sets**

**issubset():** returns `True` if ALL items in the set exist in the other set

**issuperset():** returns `True` if the set includes ALL items of the other set

**isdisjoint():** returns `True` if the two sets share no items (no overlap)


**Use case of subset and superset:** Imagine you have a master table that stores all customer IDs. Every customer in the business must have an entry in this master table—no exceptions. At times, you may also maintain another table that contains customer information, but this secondary table is not the master. Instead, it represents a subset of customers, such as VIP users.

The rule here is that any customer appearing in the subset table must also exist in the master table. If a customer is found in the subset but not in the master, it indicates a flaw in the system. To ensure data integrity and perform quality checks, you can use `issubset()` or `issuperset()` to verify that the master table contains every customer in the subset table and that the data remains clean and consistent.
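
The integrity check described above could look roughly like this. The customer IDs and the names `master_ids` and `vip_ids` are hypothetical, invented for this sketch.

```python
# Hypothetical customer IDs, for illustration only.
master_ids = {101, 102, 103, 104, 105}
vip_ids = {102, 104, 999}       # 999 is deliberately missing from the master

# Integrity check: every VIP must also be in the master table.
is_consistent = vip_ids.issubset(master_ids)
print('VIP table consistent:', is_consistent)

# The set difference pinpoints the offending IDs.
orphan_ids = vip_ids - master_ids
print('Missing from master:', orphan_ids)
```

`issubset()` answers the yes/no question, while the difference tells you which records to investigate.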

In [20]:
a = {30, 40}
b = {30, 40, 50, 60}

print('Subset:', a.issubset(b))
print('Superset:', b.issuperset(a))
print('Disjoint:', a.isdisjoint(b))

Subset: True
Superset: True
Disjoint: False


## Dictionary in Python

A dictionary in Python is a collection of key-value pairs, where each key is unique and maps to a specific value. This data structure is highly efficient for data retrieval and manipulation, making it a fundamental tool in Python programming.

**Key Characteristics:** preserves insertion order (guaranteed since Python 3.7), keys must be unique, values can be duplicated, values are accessed by key rather than by positional index, and a dict is mutable

In [None]:
my_dict = {
    'a' : 10,
    'b' : 20,
    'c' : 20,
    'a' : 40            # duplicate key: the last value wins
}

print(my_dict)          #Ordered #Keys are unique   #Values can be duplicated
print(my_dict['b'])     #Not indexed(keyed)
my_dict['c'] = 80       #Mutable/Changeable
print(my_dict) 

{'a': 40, 'b': 20, 'c': 20}
20
{'a': 40, 'b': 20, 'c': 80}


**Methods of Dictionary**

**get():** returns the value for a key safely. If the key is missing, it returns `None` or a default value that you supply.

**in operator:** checks if the key is inside the dict.

**view objects:** keys(), values(), and items() return live views of the dict's keys, values, or key-value pairs, so later changes to the dict show up in the view.

**keys():** returns all the keys from the dictionary

**values():** returns all the values from the dictionary

**items():** returns all the (key, value) pairs of the dictionary. Perfect when we need key and value together for looping, transforming data, building new dicts, comparing, and more.

To add a key, assign with the `=` operator. To update, assign with `=` for a single value or call update() to modify several values at once.

**pop():** removes the given key and returns its value. The key argument is required; pass a second argument as a default to avoid a `KeyError` when the key is missing.

**popitem():** removes and returns the most recently inserted (key, value) pair from the dictionary

**dict.fromkeys():** builds a new dictionary where all keys get the same default value.


In [29]:
user = {'id': 1, 'age': 30, 'city': 'Amsterdam'}

#Access
#print(user['name'])  # would raise KeyError because 'name' is not in the dict
print('Name is -', user.get('name'))             # returns None for a missing key
print('Name is -', user.get('name', 'Unknown'))  # OR supply a default value

print('Age is -', user.get('age'))



Name is - None
Name is - Unknown
Age is - 30


In [30]:
user = {'id': 1, 'age': 30, 'city': 'Amsterdam'}

#Checks
print('age' in user)
print('name' in user)
print('name' not in user)

True
False
True


In [31]:
user = {'id': 1, 'age': 30, 'city': 'Amsterdam'}

#View objects
print(user.keys())
print(user.values())
print(user.items())

dict_keys(['id', 'age', 'city'])
dict_values([1, 30, 'Amsterdam'])
dict_items([('id', 1), ('age', 30), ('city', 'Amsterdam')])
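
What "live view" means is easiest to see by changing the dict after the view has been created. A minimal sketch, with variable names chosen for illustration:

```python
user = {'id': 1, 'age': 30}

keys_view = user.keys()         # live view, not a copy
keys_snapshot = list(user)      # independent copy of the current keys

user['city'] = 'Paris'          # change the dict after creating both

print(list(keys_view))          # reflects the new key
print(keys_snapshot)            # still the old keys
```

Wrap a view in `list()` whenever you need a snapshot that should not change with the dict.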


In [None]:
user = {'id': 1, 'age': 30, 'city': 'Amsterdam'}

#Looping
for u in user:
    print(u, user[u])       # works, but needs an extra lookup for every key

for key, value in user.items():     # recommended: more readable
    print(key, value)

id 1
age 30
city Amsterdam
id 1
age 30
city Amsterdam


In [40]:
user = {'id': 1, 'age': 30, 'city': 'Amsterdam'}

#Add, Remove, Update
user['name'] = 'John'   # add a new key
user['city'] = 'Paris'  # update a single value
print(user)

user.update({'age': 45, 'id': 2})   # update multiple values
print(user)

age = user.pop('age')       # removes a key from the dict and returns its value
print(user)
print('Removed Value:', age)

# If the key is not found, pop() raises KeyError. To avoid this, pass a default value for missing keys.
salary = user.pop('salary', 'Not found')
print('Removed Value:', salary)

user.popitem()              # removes the most recently inserted pair ('name')
print(user)


{'id': 1, 'age': 30, 'city': 'Paris', 'name': 'John'}
{'id': 2, 'age': 45, 'city': 'Paris', 'name': 'John'}
{'id': 2, 'city': 'Paris', 'name': 'John'}
Removed Value: 45
Removed Value: Not found
{'id': 2, 'city': 'Paris'}


In [41]:
#Creation
user = {'id': None, 'age': None, 'city': None}

#OR
user = dict.fromkeys(['id', 'age', 'city'], None)

print(user)


{'id': None, 'age': None, 'city': None}
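
One caveat with `dict.fromkeys()`: the default value is not copied, so a mutable default such as a list is shared by every key. The sketch below shows the effect and one common alternative, a dict comprehension.

```python
# With a mutable default, every key refers to the same list object.
shared = dict.fromkeys(['a', 'b'], [])
shared['a'].append(1)
print(shared)

# A dict comprehension creates a separate list per key.
separate = {key: [] for key in ['a', 'b']}
separate['a'].append(1)
print(separate)
```

With immutable defaults such as `None`, `0`, or a string, sharing is harmless and `fromkeys()` is fine.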


**Dict Challenges**

In [43]:
#Challenge: Keep only String values & Convert them to Uppercase
user = {'id': 1, 'name': 'John', 'age': 30, 'city': 'berlin'}

user_exp = {
    k: v.upper()           #data transformation
    for k, v in user.items()    #loop
    if isinstance(v, str)       #filter
}

print(user_exp)

{'name': 'JOHN', 'city': 'BERLIN'}


**Real-world applications**

In [None]:
#1 Use case: Database or API records
#Returned records are stored as dictionaries where column names are keys and the row values are the dict values

#Representing a Single row from a Database or API
row = {
    'id': 101, 
    'name': 'John', 
    'age': 29, 
    'country': 'DE',
    'status': 'active'
}

In [None]:
#2 Use case: Mapping to friendly values
#Great for converting technical codes into friendly labels

#Mapping translations to friendly values
status_map = {
    '01': 'Open',
    '02': 'In Progress',
    '03': 'Done'
}

In [None]:
#3 Use case: Mapping Abbreviations
#Turning short abbreviations into full readable names

country_map = {
    'DE': 'Germany',
    'FR': 'France',
    'BD': 'Bangladesh'
}
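
A sketch of how the two mapping dicts might be applied to raw records. The `records` list and its unmapped codes ('99', 'XX') are hypothetical; `get()` with a default keeps unknown codes from raising a `KeyError`.

```python
status_map = {'01': 'Open', '02': 'In Progress', '03': 'Done'}
country_map = {'DE': 'Germany', 'FR': 'France', 'BD': 'Bangladesh'}

# Hypothetical raw records; '99' and 'XX' have no entry in the maps.
records = [
    {'id': 1, 'status': '01', 'country': 'DE'},
    {'id': 2, 'status': '99', 'country': 'XX'},
]

readable = [
    {
        'id': rec['id'],
        'status': status_map.get(rec['status'], 'Unknown'),
        # Fall back to the raw code when no full name is known.
        'country': country_map.get(rec['country'], rec['country']),
    }
    for rec in records
]
print(readable)
```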

In [None]:
# 4 Use case: Config and environment data
#Store system settings like host, port, and usernames in one clean space

#Storing environment variables and configurations
system_conn = {
    'DB_HOST': 'prod-db.company.com',
    'DB_PORT': 5432,
    'DB_USER': 'admin_user',
    'DB_NAME': 'analytics_warehouse'
}

In [None]:
# 5 Use case: ETL and pipeline settings
#Great for storing run parameters and controlling how your ETL pipeline loads data

etl_config = {
    "DEBUG_MODE": False,              # Turn verbose logging on/off
    "BATCH_SIZE": 500,                # How many rows to process per batch
    "LOG_LEVEL": "INFO",              # Logging verbosity
    "SOURCE_PATH": "/data/bronze/",   # Where raw files live
    "TARGET_PATH": "/data/silver/",   # Where cleaned files go
    "RETRY_COUNT": 3,                 # How many times to retry a failed extract
    "FAIL_ON_ERROR": False,           # Keep going or stop the job
    "VALIDATE_SCHEMA": True,          # Enforce schema check
    "SUPPORTED_FORMATS": ["csv", "parquet"],  # Allowed file formats
    "RUN_ENV": "production"           # dev / test / prod
}

In [None]:
#6 Use case: Metadata: Data about the data

# Metadata
table_metadata = {
    "table_name": "customers",
    "columns": {
        "id": {"type": "integer", "nullable": False},
        "name": {"type": "string", "nullable": True},
        "age": {"type": "integer", "nullable": True},
        "country": {"type": "string", "nullable": True}
    },
    "row_count": 105320,
    "file_format": "parquet",
    "last_updated": "2024-10-01T12:45:00Z",
    "partition_by": ["country"],
    "tags": ["pii", "customer-data"]
}
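
Nested dictionaries like this are read by chaining keys. The sketch below uses a trimmed-down copy of the metadata; the `description` key is a hypothetical optional field that is intentionally absent.

```python
table_metadata = {
    'table_name': 'customers',
    'columns': {
        'id': {'type': 'integer', 'nullable': False},
        'name': {'type': 'string', 'nullable': True},
        'age': {'type': 'integer', 'nullable': True},
    },
    'partition_by': ['country'],
}

# Chain keys to reach a nested value.
id_type = table_metadata['columns']['id']['type']

# Collect the columns that must always have a value.
required_columns = [
    name
    for name, spec in table_metadata['columns'].items()
    if not spec['nullable']
]

# get() with a default handles optional keys that may be absent.
description = table_metadata.get('description', 'n/a')

print(id_type, required_columns, description)
```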

### **Choosing the Right Python Data Structure**

→ If no other reason applies → use a **LIST** → `[1, 2, 3]`

From LIST, choose based on need:

- **Need protection (no changes)?** → use a **TUPLE** → `(1, 2, 3)`
- **Need uniqueness or fast membership tests and comparisons?** → use a **SET** → `{1, 2, 3}`
- **Need to map keys to values (look up related info by name)?** → use a **DICT** → `{'a': 1, 'b': 2}`
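
As a rough illustration of these choices, the sketch below uses all four structures on one small, made-up data set (the visit log and coordinates are hypothetical).

```python
# Hypothetical visit log, for illustration only.
visits = ['DE', 'FR', 'DE', 'BD', 'FR', 'DE']     # list: ordered, keeps duplicates

point = (52.52, 13.40)                            # tuple: fixed record that should not change
unique_countries = set(visits)                    # set: duplicates removed, fast membership
visit_counts = {}                                 # dict: maps each country to its count
for code in visits:
    visit_counts[code] = visit_counts.get(code, 0) + 1

print(len(visits), len(unique_countries))
print('FR' in unique_countries)
print(point[0])
print(visit_counts)
```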