
# Python Copying: `dict.copy()` vs `dict()` vs Shallow vs Deep Copy

This notebook helps you **avoid bugs caused by shared references** when working with JSON-like data (APIs, MongoDB, ML pipelines).

You will learn:
- What **shallow copy** means (and why it bites with nested dict/list)
- `d.copy()` vs `dict(d)` (both make shallow copies; when to use which)
- How to do **deep copy** with `copy.deepcopy`
- **Selective deep copy** patterns (faster than full deepcopy)
- Common anti-patterns and safe patterns


## 1) Setup

In [None]:

a = {
    "id": "abc123",
    "statistics": {"views": 1000, "likes": 120},
    "tags": ["music", "live"],
}

print("a:", a)
print("id(a):", id(a))



## 2) `copy()` vs `dict()` when copying

Both create a **new top-level dict**, but they are **shallow copies**.


In [None]:

b = a.copy()
c = dict(a)

print("id(a):", id(a))
print("id(b):", id(b))
print("id(c):", id(c))

print("a is b:", a is b)
print("a is c:", a is c)



## 3) Shallow copy means nested objects are SHARED

Even though `b` and `c` are different dict objects, they still point to the same nested dict/list.


In [None]:

print("id(a['statistics']):", id(a["statistics"]))
print("id(b['statistics']):", id(b["statistics"]))
print("id(c['statistics']):", id(c["statistics"]))

print("id(a['tags']):", id(a["tags"]))
print("id(b['tags']):", id(b["tags"]))
print("id(c['tags']):", id(c["tags"]))
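
Comparing `id()` values by eye is error-prone; the `is` operator makes the same check directly. A minimal sketch, rebuilding the Section 1 dict so the cell runs on its own (the `shared_stats` and `shared_tags` names are only illustrative):

```python
# Rebuild the example document from Section 1 so this cell is self-contained.
a = {
    "id": "abc123",
    "statistics": {"views": 1000, "likes": 120},
    "tags": ["music", "live"],
}
b = a.copy()
c = dict(a)

# The top-level dicts are distinct objects...
print("a is b:", a is b)  # False
print("a is c:", a is c)  # False

# ...but each nested value is the very same object in all three dicts.
shared_stats = a["statistics"] is b["statistics"] is c["statistics"]
shared_tags = a["tags"] is b["tags"] is c["tags"]
print("statistics shared:", shared_stats)  # True
print("tags shared:", shared_tags)         # True
```
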


### Modify nested data in the copy → original also changes (bug!)

In [None]:

b["statistics"]["views"] = 9999
b["tags"].append("2025")

print("a after modifying b:", a)
print("c after modifying b:", c)



## 4) Deep copy (`copy.deepcopy`): fully independent nested objects

Use this when you need a completely independent clone.


In [None]:

import copy

a = {
    "id": "abc123",
    "statistics": {"views": 1000, "likes": 120},
    "tags": ["music", "live"],
}

deep = copy.deepcopy(a)

deep["statistics"]["views"] = 7777
deep["tags"].append("DEEP")

print("original a:", a)
print("deep copy:", deep)

print("id(a):", id(a), "id(deep):", id(deep))
print("id(a['statistics']):", id(a["statistics"]), "id(deep['statistics']):", id(deep["statistics"]))
print("id(a['tags']):", id(a["tags"]), "id(deep['tags']):", id(deep["tags"]))



## 5) Selective deep copy (recommended for performance)

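Often you only need to isolate a few nested fields (e.g., `ml_flags`, `statistics`, `tracking`). Copy the top level, then copy just those fields. Each nested `.copy()` is itself shallow, so this isolates one level per field; anything nested deeper is still shared.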

### Pattern A: copy top-level + copy specific nested dicts/lists


In [None]:

a = {
    "id": "abc123",
    "statistics": {"views": 1000, "likes": 120},
    "tags": ["music", "live"],
    "ml_flags": {"low_quality_3h": False, "viral_6h": False}
}

clone = a.copy()
clone["statistics"] = a["statistics"].copy()
clone["tags"] = list(a["tags"])
clone["ml_flags"] = a["ml_flags"].copy()

clone["statistics"]["views"] = 5555
clone["tags"].append("SAFE")
clone["ml_flags"]["viral_6h"] = True

print("a:", a)
print("clone:", clone)
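
If the same fields are isolated in many places, Pattern A can be wrapped in a small function. This is a sketch only: `selective_copy` is a hypothetical helper, not part of the standard library, and it assumes the named fields are plain dicts or lists.

```python
def selective_copy(doc, fields):
    """Shallow-copy `doc`, then give each named field its own one-level copy.

    Hypothetical helper for illustration; not a library function.
    """
    clone = doc.copy()
    for name in fields:
        value = doc.get(name)
        if isinstance(value, dict):
            clone[name] = value.copy()
        elif isinstance(value, list):
            clone[name] = value[:]
        # Missing fields and other value types are left untouched.
    return clone


a = {
    "id": "abc123",
    "statistics": {"views": 1000, "likes": 120},
    "tags": ["music", "live"],
    "ml_flags": {"low_quality_3h": False, "viral_6h": False},
}

clone = selective_copy(a, ["statistics", "tags", "ml_flags"])
clone["statistics"]["views"] = 5555
clone["tags"].append("SAFE")
clone["ml_flags"]["viral_6h"] = True

print("a:", a)          # unchanged
print("clone:", clone)
```

Fields not listed in `fields` remain shared with the original, which is the point: you pay only for the copies you need.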



### Pattern B: dict unpacking + nested copy
Same idea in a single expression: keys listed after `**a` override the unpacked ones.


In [None]:

a = {
    "id": "abc123",
    "statistics": {"views": 1000, "likes": 120},
    "tags": ["music", "live"]
}

clone = {
    **a,
    "statistics": a["statistics"].copy(),
    "tags": a["tags"][:],  # copy list
}

clone["statistics"]["likes"] = 999
clone["tags"].append("UNPACK")

print("a:", a)
print("clone:", clone)
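
One caveat applies to both patterns: the nested `.copy()` goes only one level down. The sketch below adds a hypothetical `daily` dict inside `statistics` (not part of the earlier examples) to show a second level still being shared, and one way to isolate it.

```python
import copy

doc = {
    "id": "abc123",
    # "daily" is a hypothetical second level of nesting, added for illustration.
    "statistics": {"views": 1000, "daily": {"mon": 10}},
}

clone = {**doc, "statistics": doc["statistics"].copy()}
clone["statistics"]["views"] = 1          # isolated: this level was copied
clone["statistics"]["daily"]["mon"] = 99  # NOT isolated: "daily" is shared

print(doc["statistics"]["views"])         # 1000
print(doc["statistics"]["daily"]["mon"])  # 99

# Deep-copying just that one field isolates every level below it.
safe = {**doc, "statistics": copy.deepcopy(doc["statistics"])}
safe["statistics"]["daily"]["mon"] = 0
print(doc["statistics"]["daily"]["mon"])  # still 99
```
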



## 6) Common anti-patterns (and fixes)

### Anti-pattern: using shallow copy then mutating nested fields


In [None]:

a = {"stats": {"views": 100}, "tags": ["x"]}
b = a.copy()  # shallow
b["stats"]["views"] += 1
b["tags"].append("y")

print("a (unexpectedly changed):", a)
print("b:", b)


### Fix: deep copy or selective copy

In [None]:

import copy

a = {"stats": {"views": 100}, "tags": ["x"]}

# Option 1: full deep copy
b1 = copy.deepcopy(a)
b1["stats"]["views"] += 1
b1["tags"].append("y")

# Option 2: selective copy (faster)
b2 = a.copy()
b2["stats"] = a["stats"].copy()
b2["tags"] = a["tags"][:]  # list copy

b2["stats"]["views"] += 1
b2["tags"].append("y")

print("a:", a)
print("b1 (deep):", b1)
print("b2 (selective):", b2)
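
The "faster" claim for selective copying can be checked on your own data. A rough sketch using `timeit`; the function names `deep` and `selective` and the repetition count are illustrative assumptions, and the timings will vary by machine and Python version.

```python
import copy
import timeit

a = {"stats": {"views": 100}, "tags": ["x"]}


def deep(doc):
    # Full recursive copy of every nested object.
    return copy.deepcopy(doc)


def selective(doc):
    # Copy the top level, then only the two nested fields we plan to mutate.
    clone = doc.copy()
    clone["stats"] = doc["stats"].copy()
    clone["tags"] = doc["tags"][:]
    return clone


n = 20_000  # arbitrary repetition count
t_deep = timeit.timeit(lambda: deep(a), number=n)
t_sel = timeit.timeit(lambda: selective(a), number=n)

print(f"deepcopy:  {t_deep:.4f}s for {n} copies")
print(f"selective: {t_sel:.4f}s for {n} copies")
```

Measure with documents shaped like your real ones: the gap depends on how much of the document the selective copy skips.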



## 7) Quick decision rules 🧠

- If your dict contains only immutable values (int/str/bool/None): `copy()` is enough, because there is no nested object to mutate through the copy.
- If your dict contains nested dict/list AND you will mutate them:
  - Use `copy.deepcopy()` for safety, OR
  - Use **selective deep copy** for performance (recommended in pipelines).
- `dict(d)` is great for converting other mappings or key/value pairs into a plain dict, but when you are just copying, prefer `d.copy()` for readability.
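
The last rule shows up clearly with dict subclasses. A small sketch using `collections.defaultdict` (the `counts` and `pairs` data are made up for illustration): `.copy()` keeps the subclass, while `dict()` converts to a plain dict and also accepts key/value pairs.

```python
from collections import defaultdict

counts = defaultdict(int)
counts["views"] += 1

same_type = counts.copy()  # copy: still a defaultdict with the same factory
plain = dict(counts)       # conversion: a plain dict, no default factory

print(type(same_type).__name__)  # defaultdict
print(type(plain).__name__)      # dict

# dict() can also build a dict from an iterable of key/value pairs.
pairs = [("id", "abc123"), ("views", 1000)]
from_pairs = dict(pairs)
print(from_pairs)  # {'id': 'abc123', 'views': 1000}
```

Both results are still shallow copies, so the rules above about nested values apply to them unchanged.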

---
✅ You can now safely clone JSON-like documents without accidental shared-reference bugs.
