## 🧪 White-box Testing: Statement Coverage under Different Python Versions

### 1️⃣ Background and Objectives
> This notebook evaluates whether Python's `pickle` module produces deterministic serialized output when executing various **statements** under different Python versions (3.7.12 and 3.12.4) on the **same OS**.


### 2️⃣ Environment Information
> This code reports the operating system and Python version used in each test run.

In [2]:
import platform
import sys

def print_environment_info():
    print("📌 Current operating system:", platform.system(), platform.release())
    print("📌 Python version:", sys.version)

In [2]:
print_environment_info()

📌 Current operating system: Windows 10
📌 Python version: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 05:35:01) [MSC v.1916 64 bit (AMD64)]


In [3]:
print_environment_info()

📌 Current operating system: Windows 11
📌 Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:03:56) [MSC v.1929 64 bit (AMD64)]


### 3️⃣ Test Cases and Input Structures
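> Each test case exercises a distinct statement path in `pickle`: serializing built-in and custom objects, round-tripping data through a file and through bytes, and triggering unpickling errors.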

In [4]:
import pickle
import hashlib

def get_hash(obj):
    """Return the SHA256 hash value of the object after pickle serialization"""
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()
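
`get_hash` calls `pickle.dumps` without a `protocol` argument, so each interpreter uses its own default (protocol 3 in Python 3.7, protocol 4 from Python 3.8 onward). The following is a minimal sketch of a variant that pins the protocol, so that the protocol choice can be separated from other version effects. `get_hash_with_protocol` is a hypothetical helper and is not part of the original notebook.

```python
import hashlib
import pickle

def get_hash_with_protocol(obj, protocol):
    """Return the SHA256 hex digest of obj pickled with an explicit protocol.

    Hypothetical helper for illustration; the notebook's get_hash uses the
    interpreter's default protocol instead.
    """
    return hashlib.sha256(pickle.dumps(obj, protocol=protocol)).hexdigest()

# The same object hashes differently under protocols 3 and 4 within a single
# interpreter, because the stream begins with a protocol marker.
hash_p3 = get_hash_with_protocol(123, 3)
hash_p4 = get_hash_with_protocol(123, 4)
print("protocol 3:", hash_p3)
print("protocol 4:", hash_p4)
print("same hash:", hash_p3 == hash_p4)
```

Whether pinning the protocol yields identical hashes on both interpreters was not tested in this notebook.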

In [5]:
import os

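def save_result_by_python_version(case_name, value_hash, file_path="statement_python_hashes.txt"):
    """Record value_hash for case_name under the running Python's major.minor version."""
    # Reduce e.g. "3.12.4 | packaged by ..." to "3.12" (sys is imported in the environment cell).
    py_version = ".".join(sys.version.split(" ")[0].split(".")[:2])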
    block = f"{case_name}\n{py_version.ljust(10)}Result: {value_hash}\n"

    if os.path.exists(file_path):
        with open(file_path, "r", encoding="utf-8") as f:
            content = f.read()
    else:
        content = ""

    blocks = content.strip().split("\n\n") if content else []
    updated = False

    for i in range(len(blocks)):
        if blocks[i].startswith(case_name):
            lines = blocks[i].split("\n")
            # Match the padded version column so that e.g. "3.1" does not also match "3.12".
            lines = [line for line in lines if not line.startswith(py_version.ljust(10))]
            lines.append(f"{py_version.ljust(10)}Result: {value_hash}")
            blocks[i] = "\n".join(lines)
            updated = True
            break

    if not updated:
        blocks.append(block.strip())

    with open(file_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(blocks) + "\n")


In [6]:
import tempfile

class MyClass:
    def __init__(self):
        self.x = 42

def run_statement_tests_by_version():
    save_result_by_python_version("TC_SC_01 Integer object", get_hash(123))
    save_result_by_python_version("TC_SC_02 List object", get_hash([1, 2, 3]))
    save_result_by_python_version("TC_SC_03 Nested dictionary", get_hash({"a": {"b": {"c": 1}}}))
    save_result_by_python_version("TC_SC_04 Custom class object", get_hash(MyClass()))
    
    data1 = {"hello": [1, 2, 3]}
    with tempfile.NamedTemporaryFile(delete=False) as f:
        pickle.dump(data1, f)
        filepath = f.name
    with open(filepath, "rb") as f:
        loaded1 = pickle.load(f)
    save_result_by_python_version("TC_IO_01 File I/O consistency", get_hash(loaded1))
    os.remove(filepath)

    data2 = {"name": "Alice", "age": 30}
    b = pickle.dumps(data2)
    restored = pickle.loads(b)
    save_result_by_python_version("TC_IO_02 Bytes I/O consistency", get_hash(restored))

    try:
        pickle.loads(b"")
    except EOFError as e:
        save_result_by_python_version("TC_IO_03 Empty byte stream error", get_hash(str(e)))

    try:
        pickle.loads(b"not a pickle")
    except pickle.UnpicklingError as e:
        save_result_by_python_version("TC_IO_04 Invalid byte stream error", get_hash(str(e)))


In [7]:
run_statement_tests_by_version()

In [8]:
def print_version_results(file_path="statement_python_hashes.txt"):
    if not os.path.exists(file_path):
        print("❌ No result file found.")
        return
    with open(file_path, "r", encoding="utf-8") as f:
        print("✅ Recorded results:\n")
        print(f.read())

### 4️⃣ Version-specific Hash Results
> Hash outputs for each test case executed under Python 3.7 and Python 3.12.

In [9]:
print_version_results("statement_python_hashes.txt")

✅ Recorded results:

TC_SC_01 Integer object
3.12      Result: b78afd939a4aef912cfa7945f436bb5de305a4dc69cae7af84ddd948519f3a31
3.7       Result: ca9493975e3875030e2d5a5c2265f13827a049d4473f62a448d71c05cd0e41ce

TC_SC_02 List object
3.12      Result: f9343d7d7ec5c3d8bcced056c438fc9f1d3819e9ca3d42418a40857050e10e20
3.7       Result: 0b2e0ab2f9000007b958bb4453492309dae1c6e1ddd4428f53455b053d662e67

TC_SC_03 Nested dictionary
3.12      Result: 46216ca97eebc983a09cf7458c32f2729fdb45b99151fb47939d51addc34c162
3.7       Result: 47cabfa3ba087c959992b7202c61311211c457a480a578a6375a27fea5adbba0

TC_SC_04 Custom class object
3.12      Result: dcfdf5f39f6649b2a26f445ee7f6e9c4bc0f087899683146f6a0a2561c9d3ee7
3.7       Result: 597fdbdbed85070e944ef0258a8c8cc5681aedeb0c33a5b21b13aa57c331e7d3

TC_IO_01 File I/O consistency
3.12      Result: eebed109eb579b15e13ec8e5363fa6443fbc257d588f726f850cf3533fc38e38
3.7       Result: 463c652e7f39053c03c19f3e2971529b0394e31c5845a42c27ca198ab558d7ce

TC_IO_02 Byte

### 5️⃣ Consistency Analysis and Divergence Detection

All test cases were executed on the same operating system under two Python versions: **3.12.4** and **3.7.12**.  
The serialized outputs from `pickle.dumps()` were hashed using SHA256 and compared across versions.

**Result:** All test cases resulted in different hashes between Python 3.12 and 3.7, including both standard values and exception messages.
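
The comparison can be automated by parsing the results file format written by `save_result_by_python_version`. This is a minimal sketch: `parse_results` and `find_divergent_cases` are hypothetical helpers, and the sample text reuses two entries from the recorded output above.

```python
def parse_results(text):
    """Parse blocks of a case name followed by '<version>  Result: <hash>' lines."""
    results = {}
    for block in text.strip().split("\n\n"):
        lines = [line for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        case_name = lines[0].strip()
        hashes = {}
        for line in lines[1:]:
            if "Result:" not in line:
                continue
            version, _, value = line.partition("Result:")
            hashes[version.strip()] = value.strip()
        results[case_name] = hashes
    return results

def find_divergent_cases(results):
    """Return the case names whose recorded hashes are not all identical."""
    return [name for name, hashes in results.items()
            if len(set(hashes.values())) > 1]

sample = (
    "TC_SC_01 Integer object\n"
    "3.12      Result: b78afd939a4aef912cfa7945f436bb5de305a4dc69cae7af84ddd948519f3a31\n"
    "3.7       Result: ca9493975e3875030e2d5a5c2265f13827a049d4473f62a448d71c05cd0e41ce\n"
    "\n"
    "TC_SC_02 List object\n"
    "3.12      Result: f9343d7d7ec5c3d8bcced056c438fc9f1d3819e9ca3d42418a40857050e10e20\n"
    "3.7       Result: 0b2e0ab2f9000007b958bb4453492309dae1c6e1ddd4428f53455b053d662e67\n"
)

parsed = parse_results(sample)
divergent = find_divergent_cases(parsed)
for name in divergent:
    print("Divergent:", name)
```

To run it against the real data, the contents of `statement_python_hashes.txt` can be read and passed in place of `sample`.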

#### ⚠️ Test cases with inconsistent hashes across Python versions:

| Test Case ID     | Description                          |
|------------------|--------------------------------------|
| TC_SC_01         | Integer object                       |
| TC_SC_02         | List object                          |
| TC_SC_03         | Nested dictionary                    |
| TC_SC_04         | Custom class object                  |
| TC_IO_01         | File I/O consistency                 |
| TC_IO_02         | Bytes I/O consistency                |
| TC_IO_03         | Empty byte stream error              |
| TC_IO_04         | Invalid byte stream error            |

<br>

> Even the exception cases (`EOFError` and `UnpicklingError`) produced different hashes across versions. Because `get_hash` hashes the pickled form of `str(e)`, this shows that the serialized bytes are version-sensitive; it does not by itself show that the exception messages differ.
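
To check whether the exception messages themselves differ, the message text can be hashed directly instead of being passed through `pickle.dumps`. This is a minimal sketch: `message_hash` and `capture_error` are hypothetical helpers, and the code assumes CPython's default unpickler raises `EOFError` or `pickle.UnpicklingError` for these two inputs, as in the test cases above.

```python
import hashlib
import pickle

def message_hash(exc):
    """Hash the UTF-8 bytes of the exception message, bypassing pickle."""
    return hashlib.sha256(str(exc).encode("utf-8")).hexdigest()

def capture_error(data):
    """Return the exception raised by pickle.loads(data), or None if it succeeds."""
    try:
        pickle.loads(data)
    except (EOFError, pickle.UnpicklingError) as exc:
        return exc
    return None

for label, data in [("empty", b""), ("invalid", b"not a pickle")]:
    exc = capture_error(data)
    if exc is not None:
        # Print the message alongside its hash so both can be compared across versions.
        print(label, type(exc).__name__, repr(str(exc)), message_hash(exc))
```

If these hashes match on both interpreters while the `get_hash` values differ, the divergence comes from serialization and not from the error text.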


### 6️⃣ Conclusions and Findings

The `pickle` module, when tested under **Python 3.12.4** and **Python 3.7.12** on the **same operating system**, did **not** demonstrate deterministic behavior at the binary (hash) level in any test case, despite identical input logic.

#### 🔍 Key Findings:

- ❌ All tested statements produced different SHA256 hashes between Python 3.7 and 3.12;
- ❌ Even error-handling code (`pickle.loads(b"")`, invalid byte content) showed version-specific output hashes;
- ⚙️ The differences likely stem from structural or metadata changes in the pickle protocol across versions; in particular, the default protocol is 3 in Python 3.7 and 4 in Python 3.8 and later.
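
One way to look at structural differences between pickle streams is to list their opcodes with the standard `pickletools` module. This is a sketch that compares protocols 3 and 4 inside a single interpreter; `opcode_names` is a hypothetical helper, and the comparison illustrates protocol differences only, not a verified explanation of the hashes recorded above.

```python
import pickle
import pickletools

def opcode_names(obj, protocol):
    """Return the opcode names of obj's pickle stream for the given protocol."""
    data = pickle.dumps(obj, protocol=protocol)
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]

names_p3 = opcode_names([1, 2, 3], 3)
names_p4 = opcode_names([1, 2, 3], 4)
print("protocol 3:", names_p3)
print("protocol 4:", names_p4)
# Opcodes that appear in one stream but not in the other.
print("only in 4:", sorted(set(names_p4) - set(names_p3)))
print("only in 3:", sorted(set(names_p3) - set(names_p4)))
```

`pickletools.dis` prints a fuller disassembly, including arguments and byte offsets, when more detail is needed.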

#### ⚠️ Limitations:

- Only two Python versions (3.12.4, 3.7.12) were tested; intermediate versions were not included;
- The pickle protocol level was not explicitly varied (default used);
- Only one operating system and architecture were used (Windows on x86-64); results may differ on other architectures such as ARM or other implementations such as PyPy.

> For any use case requiring binary-level stability (e.g., cache validation, file integrity, reproducibility), `pickle` output produced with default settings should not be relied on across Python versions. Alternative formats such as JSON or Protocol Buffers are recommended.
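
As an illustration of the JSON alternative, the sketch below hashes a canonical JSON encoding with sorted keys and fixed separators. `canonical_json` and `get_json_hash` are hypothetical helpers; the approach only covers JSON-compatible values (not objects such as `MyClass`) and was not verified across Python versions in this notebook.

```python
import hashlib
import json

def canonical_json(obj):
    """Encode obj as JSON with sorted keys and no optional whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)

def get_json_hash(obj):
    """Return the SHA256 hex digest of the canonical JSON encoding of obj."""
    return hashlib.sha256(canonical_json(obj).encode("utf-8")).hexdigest()

data = {"name": "Alice", "age": 30}
print(canonical_json(data))
print(get_json_hash(data))
```

Sorting keys removes any dependence on dictionary insertion order, so two dictionaries with the same contents produce the same digest.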


### 📎 Appendix: Raw Data File

The complete per-version hash records can be found in the following file:

👉 [Download statement_python_hashes.txt](./statement_python_hashes.txt)