# Data Structures in Python for Data Science

Welcome to this interactive notebook on **data structures** in Python, designed for beginner and intermediate data scientists. Here, you will learn how to manipulate **lists**, **tuples**, **dictionaries**, and **sets**, which are fundamental for organizing, processing, and analyzing data in data science projects.

## Objectives
- Master the use of **lists** to store and manipulate sequential data.
- Understand **dictionaries** for mapping and organizing structured data.
- Explore **tuples** for immutable and fixed data.
- Use **sets** for operations with unique elements and duplicate removal.
- Apply visualizations with **matplotlib** to interpret data.
- Solve practical exercises to reinforce learning.

## Prerequisites
- Basic knowledge of Python (variables, loops, conditionals).
- Familiarity with Jupyter Notebooks.
- Required libraries: `matplotlib` (for visualizations).

We will work with a fictional dataset of ages and professions to contextualize the examples!

## 1. Lists
Lists are **ordered** and **mutable** collections, perfect for storing sequences of data such as ages, prices, or experiment results. They support operations like addition, removal, and sorting of elements.

### Characteristics
- Defined with square brackets `[]`.
- Can contain elements of different types (numbers, strings, etc.).
- Useful methods: `append()`, `remove()`, `sort()`, `pop()`.

Let's create a list of ages and perform common operations.

In [None]:
# Example 1: Manipulating a list of ages
ages = [22, 35, 28, 19, 22, 40]
print("List of ages:", ages)

# Adding a new age
ages.append(45)
print("After adding 45:", ages)

# Removing an element
ages.remove(22)  # Removes the first occurrence of 22
print("After removing one 22:", ages)

# Calculating the average
average_age = sum(ages) / len(ages)
print(f"Average age: {average_age:.2f}")

# Sorting the list
ages.sort()
print("Sorted ages:", ages)
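
The cell above leaves `pop()` from the methods list unused, and it sorts the list in place. As a minimal sketch, the block below shows `pop()`, the non-mutating `sorted()` built-in, and slicing. It uses a separate, hypothetical `sample_ages` list so the notebook's `ages` variable is not affected; the `statistics` module is an addition not used elsewhere in this notebook.

```python
import statistics

# Hypothetical sample data, kept separate from the notebook's `ages` list
sample_ages = [22, 35, 28, 19, 22, 40]

# pop() removes and returns the last element, or the element at a given index
last_age = sample_ages.pop()     # removes 40
first_age = sample_ages.pop(0)   # removes the 22 at index 0
print("Popped:", last_age, first_age)
print("Remaining:", sample_ages)

# sorted() returns a new list and leaves the original untouched,
# whereas list.sort() sorts in place and returns None
ordered = sorted(sample_ages)
print("Sorted copy:", ordered)
print("Original order kept:", sample_ages)

# Slicing also returns a new list
two_youngest = ordered[:2]
print("Two youngest:", two_youngest)

# The standard library offers ready-made summary statistics
median_age = statistics.median(sample_ages)
print("Median age:", median_age)
```

Preferring `sorted()` over `sort()` is usually the safer choice when the original order still matters later, for example when ages are aligned with another list of names.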

### Visualizing Lists
Let's use `matplotlib` to create a histogram of the ages, making it easier to visualize the distribution.

In [None]:
import matplotlib.pyplot as plt

# Histogram of ages
plt.figure(figsize=(8, 6))
plt.hist(ages, bins=5, edgecolor='black', color='skyblue')
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)
plt.show()  # Render the figure (also needed when running outside Jupyter)


## 2. Dictionaries
Dictionaries are **mutable** structures that map **keys** to **values**. They are ideal for organizing structured data such as counts, configurations, or records.

### Characteristics
- Defined with curly braces `{}`.
- Keys must be unique and hashable, which in practice means immutable types such as strings, numbers, or tuples of immutable values.
- Values can be of any type.
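
To make the key rules concrete, here is a small sketch. The `city_by_location` and `counts` dictionaries and their values are illustrative assumptions, not part of the notebook's dataset.

```python
# Tuples of immutable values are hashable, so they can serve as keys
# (the coordinates are illustrative values)
city_by_location = {
    (40.7128, -74.0060): "New York",
    (51.5074, -0.1278): "London",
}
print(city_by_location[(40.7128, -74.0060)])

# Lists are mutable and therefore unhashable: using one as a key fails
try:
    bad = {[40.7128, -74.0060]: "New York"}
except TypeError as e:
    key_error_message = str(e)
    print("Cannot use a list as a key:", e)

# Keys are unique: assigning to an existing key overwrites its value
counts = {"18-25": 1}
counts["18-25"] = 5
print(counts)
```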

Let's create a dictionary to count ages by age group and another to map professions.

In [None]:
# Example 1: Counting by age group
age_groups = {
    "18-25": 0,
    "26-35": 0,
    "36+": 0
}

for age in ages:
    if 18 <= age <= 25:
        age_groups["18-25"] += 1
    elif 26 <= age <= 35:
        age_groups["26-35"] += 1
    elif age > 35:  # Ages below 18 fall outside every group and are skipped
        age_groups["36+"] += 1

print("Count by age group:", age_groups)

# Example 2: Mapping professions
people = {
    "Alice": {"age": 22, "profession": "Engineer"},
    "Bob": {"age": 35, "profession": "Data Analyst"},
    "Clara": {"age": 28, "profession": "Scientist"}
}

print("Bob's profession:", people["Bob"]["profession"])

# Adding a new person
people["David"] = {"age": 40, "profession": "Manager"}
print("After adding David:", people)
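
Two patterns come up constantly with nested dictionaries like `people`: looking up keys that may be missing, and looping over all records. The sketch below illustrates both on a hypothetical `staff` dictionary that mirrors the structure of `people`.

```python
# Hypothetical records mirroring the structure of the `people` dictionary
staff = {
    "Alice": {"age": 22, "profession": "Engineer"},
    "Bob": {"age": 35, "profession": "Data Analyst"},
}

# Indexing a missing key raises KeyError; get() returns a default instead
missing = staff.get("Zoe")
print(missing)
print(staff.get("Zoe", {}).get("age"))  # Chained lookups without an exception

# Membership tests check the keys
print("Alice" in staff)

# items() yields (key, value) pairs
for name, info in staff.items():
    print(f"{name} ({info['age']}): {info['profession']}")

# A dictionary comprehension builds a new mapping from an existing one
age_by_name = {name: info["age"] for name, info in staff.items()}
print(age_by_name)
```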

### Visualizing Dictionaries
Let's create a bar chart to visualize the count by age group.

In [None]:
# Bar chart
plt.figure(figsize=(8, 6))
plt.bar(age_groups.keys(), age_groups.values(), color='lightgreen', edgecolor='black')
plt.title('Count by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Number of People')
plt.grid(True, alpha=0.3)
plt.show()  # Render the figure (also needed when running outside Jupyter)


## 3. Tuples
Tuples are **ordered** and **immutable** collections, useful for data that should not be changed, such as coordinates, categories, or fixed configurations.

### Characteristics
- Defined with parentheses `()`; a single-element tuple needs a trailing comma, e.g. `("Young",)`.
- Support indexing and slicing, like lists.
- Immutable: cannot be modified after creation.

Let's use tuples for age categories and coordinates.

In [None]:
# Example 1: Tuple of categories
categories = ("Young", "Adult", "Elderly")
print("Categories:", categories)
print("First category:", categories[0])

# Example 2: Tuple for coordinates
location = (40.7128, -74.0060)  # Latitude and longitude of New York
print("Coordinates of New York:", location)

# Attempting to modify (will raise an error)
try:
    categories[0] = "Teenager"
except TypeError as e:
    print("Error when trying to modify the tuple:", e)
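
Beyond indexing, tuples are often unpacked into variables or sliced. A minimal sketch follows, using hypothetical `point` and `labels` tuples so the notebook's own variables stay untouched.

```python
# Unpacking assigns each element to its own variable
point = (40.7128, -74.0060)
latitude, longitude = point
print("Latitude:", latitude, "Longitude:", longitude)

# Slicing returns a new tuple
labels = ("Young", "Adult", "Elderly")
first_two = labels[:2]
print(first_two)

# The trailing comma, not the parentheses, makes a one-element tuple
single = ("Young",)
not_a_tuple = ("Young")
print(type(single).__name__, type(not_a_tuple).__name__)

# To "change" a tuple, build a new one from pieces of the old one
updated = ("Teenager",) + labels[1:]
print(updated)
```

Because the original tuple is never modified, any other code holding a reference to `labels` still sees the same three values.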

## 4. Sets
Sets are **unordered** collections of **unique** elements, perfect for removing duplicates or performing set operations (union, intersection).

### Characteristics
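- Defined with curly braces around the elements, e.g. `{1, 2, 3}`, or with `set()`. Empty braces `{}` create a dictionary, so use `set()` for an empty set.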
- Do not allow duplicates.
- Support operations like `union()`, `intersection()`, `difference()`.
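
A quick sketch of two points from the list above: how an empty set differs from an empty dictionary, and how adding a duplicate has no effect. The variable names are illustrative.

```python
# Empty braces create a dict; set() creates an empty set
empty_dict = {}
empty_set = set()
print(type(empty_dict).__name__)
print(type(empty_set).__name__)

# Adding an element that is already present leaves the set unchanged
seen = set()
seen.add(22)
seen.add(22)
seen.add(35)
print(len(seen))

# Membership tests on sets are fast on average, regardless of size
print(22 in seen)
```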

Let's remove duplicates and perform set operations.

In [None]:
# Example 1: Removing duplicates
duplicate_ages = [22, 35, 22, 19, 35, 40]
unique_ages = set(duplicate_ages)
print("Unique ages:", unique_ages)

# Example 2: Set operations
group1 = {"Alice", "Bob", "Clara"}
group2 = {"Bob", "David", "Emma"}

print("Union (members of either group):", group1.union(group2))
print("Intersection (common members):", group1.intersection(group2))
print("Difference (members only in group1):", group1.difference(group2))
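
The same set operations are also available as operators, and converting a list to a set does not preserve the original order. The sketch below shows the operator forms and one common way to remove duplicates while keeping order, `dict.fromkeys()`. The names `team_a`, `team_b`, and `raw_ages` are hypothetical.

```python
team_a = {"Alice", "Bob", "Clara"}
team_b = {"Bob", "David", "Emma"}

either = team_a | team_b       # same as team_a.union(team_b)
both = team_a & team_b         # same as team_a.intersection(team_b)
only_a = team_a - team_b       # same as team_a.difference(team_b)
exactly_one = team_a ^ team_b  # symmetric difference: in one group but not both

# Sets have no defined order, so sort them for reproducible output
print(sorted(either))
print(sorted(exactly_one))

# Removing duplicates while keeping first-seen order:
# dict keys are unique and preserve insertion order (Python 3.7+)
raw_ages = [22, 35, 22, 19, 35, 40]
unique_in_order = list(dict.fromkeys(raw_ages))
print(unique_in_order)
```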

## 5. Practical Exercises
Below are some exercises for you to practice using data structures. Try to solve them before checking the solutions.

### Exercise 1: Age Filtering
Create a list with ages greater than 30 from the `ages` list.

### Exercise 2: Profession Count
Using the `people` dictionary, create a new dictionary that counts how many people have each profession.

### Exercise 3: Profession Sets
Create two sets with professions from different departments and find the professions that appear in both.

In [None]:
# Solution Exercise 1
ages_above_30 = [age for age in ages if age > 30]
print("Ages above 30:", ages_above_30)

# Solution Exercise 2
profession_count = {}
for person in people.values():
    profession = person["profession"]
    profession_count[profession] = profession_count.get(profession, 0) + 1
print("Profession count:", profession_count)

# Solution Exercise 3
dept1 = {"Engineer", "Data Analyst", "Designer"}
dept2 = {"Data Analyst", "Manager", "Scientist"}
common_professions = dept1.intersection(dept2)
print("Professions in both departments:", common_professions)
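
As an alternative to the manual loop in Exercise 2, the standard library's `collections.Counter` does the same counting in one step. This is a sketch on a hypothetical `staff_records` dictionary, chosen so that one profession appears twice; it is not the notebook's `people` data.

```python
from collections import Counter

# Hypothetical records with a repeated profession
staff_records = {
    "Alice": {"age": 22, "profession": "Engineer"},
    "Bob": {"age": 35, "profession": "Data Analyst"},
    "Clara": {"age": 28, "profession": "Engineer"},
}

# Counter tallies each value produced by the generator expression
counts_by_profession = Counter(
    info["profession"] for info in staff_records.values()
)
print(dict(counts_by_profession))

# most_common(n) returns the n most frequent items as (value, count) pairs
top_profession = counts_by_profession.most_common(1)
print(top_profession)
```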

## 6. Conclusion
In this notebook, you learned how to use **lists**, **dictionaries**, **tuples**, and **sets** in Python, applying them in practical data science examples. Additionally, we explored visualizations with `matplotlib` and solved exercises to reinforce learning.

### Next Steps
- Experiment with these structures on real datasets (e.g., CSV, JSON).
- Dive deeper into libraries like `pandas` for advanced data manipulation.
- Go further with the list comprehensions and nested dictionaries used in this notebook, for example by combining them to filter and reshape records.

Keep practicing and manipulating data to become a more confident data scientist!