# Overview
This note explains a practical, secure pattern for storing and sharing healthcare records on IPFS using encryption, then gives a Python implementation of the decryption side that you can adapt. It covers the threat model, recommended cryptography (hybrid encryption: AES for the data, with RSA or ECC to encrypt the AES key), key management, privacy and regulatory points, and performance considerations. The design assumes you control an IPFS node or trusted gateway, store only encrypted patient data on IPFS, and keep access-control metadata (who can decrypt what) off-chain or on a permissioned ledger.

## Key idea (hybrid encryption pattern)
- Generate a random symmetric key (AES-256-GCM) per file for confidentiality.
- Encrypt the medical file with that symmetric key (authenticated encryption to protect integrity).
- Encrypt the symmetric key for each authorized recipient using that recipient’s public key (RSA-OAEP or ECIES) so only intended recipients can recover the file key.
- Upload the ciphertext to IPFS; IPFS returns a CID that identifies the encrypted blob.
- Store the CID plus encrypted symmetric key(s) and minimal access metadata in your access-control store (permissioned blockchain, database, or access service). Use the blockchain for provenance and integrity auditing of metadata only, not for the encrypted data itself.
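The five steps above can be sketched in Python with the `cryptography` package. This is a minimal sketch, not the author's encrypt_ipfs.py: the function name `encrypt_for_recipients` and the recipient-ID-to-public-key mapping are assumptions, and the upload step is left out so the code runs without an IPFS node. The returned metadata uses the same `cid` / `nonce` / `enc_keys` layout that the decryption module later in this note expects.

```python
import base64
import os
from typing import Dict, Tuple

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# RSA-OAEP with SHA-256 for both the hash and MGF1 (must match the decrypt side).
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def encrypt_for_recipients(
    plaintext: bytes, recipients: Dict[str, rsa.RSAPublicKey]
) -> Tuple[bytes, Dict[str, object]]:
    """Encrypt plaintext under a fresh AES-256-GCM key and wrap that key per recipient.

    Returns the ciphertext (to be added to IPFS) and a metadata dict
    (to be kept in the access-control store, never on IPFS).
    """
    file_key = AESGCM.generate_key(bit_length=256)  # unique key per file
    nonce = os.urandom(12)  # 96-bit nonce; never reuse a nonce with the same key
    ciphertext = AESGCM(file_key).encrypt(nonce, plaintext, None)

    # Wrap the same file key once for every authorized recipient.
    enc_keys = {
        rid: base64.b64encode(pub.encrypt(file_key, OAEP)).decode("ascii")
        for rid, pub in recipients.items()
    }
    metadata = {
        "cid": None,  # filled in after the ciphertext is uploaded to IPFS
        "nonce": base64.b64encode(nonce).decode("ascii"),
        "enc_keys": enc_keys,
    }
    return ciphertext, metadata
```

After encryption the ciphertext still has to be added to IPFS; with ipfshttpclient, `client.add_bytes(ciphertext)` returns the CID to store in `metadata["cid"]`. Only the metadata goes to the access-control store.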

> Security benefits: decentralised storage and content addressing (integrity via CID), combined with strong cryptography for confidentiality; small metadata footprint on-chain preserves privacy while allowing auditable access logs.

_____________________________________________________________________________________________________________________

## Threat model and security goals
- Adversaries: external attackers who can read IPFS content; malicious node operators; network eavesdroppers; compromised recipients.
- Goals: confidentiality of medical data, integrity and tamper-evidence, controlled sharing (only authorized recipients can decrypt), auditability of access, and per-file key isolation (unique per-file keys mean a leaked file key exposes only that file). Per-file keys do not by themselves provide forward secrecy: if a recipient's long-term private key is compromised, every file key wrapped for that recipient can be recovered.
- Non-goals: preventing nodes from refusing to serve data; protecting against metadata leakage from a misconfigured off-chain store.
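The goals of tamper-evidence and auditability of access can be illustrated with a hash-chained log, where each entry commits to the previous entry's hash, so editing any past entry breaks verification. This is a hypothetical in-memory sketch (`AuditLog` and its field names are illustrative); in the architecture described here, the latest hash would be anchored on the permissioned ledger so that rewriting the whole chain is also detectable. The log records only actors, actions, and CIDs, never PHI.

```python
import hashlib
import json
import time
from typing import Dict, List


class AuditLog:
    """Append-only log in which each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: List[Dict[str, object]] = []

    @staticmethod
    def _digest(body: Dict[str, object]) -> str:
        # Canonical JSON (sorted keys) so the same entry always hashes the same way.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, actor: str, action: str, cid: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"actor": actor, "action": action, "cid": cid, "ts": time.time(), "prev": prev}
        digest = self._digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute every hash and check each link back to its predecessor.
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```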

_____________________________________________________________________________________________________________________

## Encryption choices — quick comparison

| Option | Confidentiality | Integrity | Notes |
|---|---|---|---|
| **AES-256-GCM (symmetric)** | High | Built-in AEAD | Fast; use a random key per file; requires key distribution |
| **RSA-OAEP (asymmetric)** | High for short payloads (keys only) | Non-malleable ciphertext, but no sender authentication | Use to encrypt AES keys for recipients; payload size is capped by the modulus, so unsuitable for bulk data |
| **ECIES / X25519 + HKDF** | High | AEAD via derived keys | Smaller keys; efficient for many recipients |
| **Hybrid AES + RSA/ECIES** | High | AEAD + asymmetric key encapsulation | Best practical trade-off for large files and multiple recipients |

Sources: patterns and benefits described in the IPFS + blockchain healthcare literature.
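The RSA-OAEP row's size cap is a hard limit rather than just a speed issue: OAEP can encrypt a message of at most k - 2·hLen - 2 bytes, where k is the modulus length in bytes and hLen is the hash output length (RFC 8017, section 7.1.1). A small helper, assuming SHA-256 (hLen = 32) as in the decryption module, shows why RSA only ever wraps the 32-byte AES key:

```python
def oaep_max_plaintext(modulus_bits: int, hash_bytes: int = 32) -> int:
    """Largest message RSA-OAEP can encrypt: k - 2*hLen - 2 (RFC 8017, section 7.1.1)."""
    k = modulus_bits // 8  # modulus length in bytes
    return k - 2 * hash_bytes - 2


if __name__ == "__main__":
    for bits in (2048, 3072, 4096):
        print(bits, oaep_max_plaintext(bits))
    # 2048 190
    # 3072 318
    # 4096 446
```

Even RSA-4096 tops out at a few hundred bytes, which is plenty for a 32-byte AES-256 key and useless for a medical image.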

_______________________________________________________________________________________________________________



### Compliance and privacy notes
- Never store unencrypted PHI (protected health information) on IPFS. Maintain minimal off-chain metadata.  
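- Ensure access control, consent, and audit logging meet HIPAA/GDPR obligations in your jurisdictions. Maintain data retention and deletion policies: IPFS is content-addressed, and unpinning and garbage collection only remove copies from nodes you operate, so any other node that has fetched and pinned the ciphertext can keep it. Destroying every copy of the file key (crypto-shredding) is therefore the practical way to make a record unrecoverable.  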
- Consider organizational key management (HSM or cloud KMS) for private keys; never store private keys on the same host as encrypted data.

______________________________________________________________________________________________________________

### High-level architecture choices
- Option A (centralized key manager): a server holds recipient public keys, encrypts each file key for the authorized recipients, and stores the encrypted keys in the access DB. Access requests are checked against policy, and the service returns the encrypted key and CID.  
- Option B (user-managed keys): recipients manage their own private keys; the access DB stores only file keys encrypted to recipient public keys. This reduces the trusted-server surface.  
- For revocation: use time-limited keys, or re-encrypt the file under a new key, upload the new ciphertext, and update the mapping to the current CID. A revoked recipient may already hold the old key or plaintext, so revocation only protects later versions.
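Option A's request flow (check policy, then return the encrypted key and current CID) might look like the following toy in-memory service. `AccessService` and `RecordEntry` are hypothetical names; a real deployment would back this with a database or permissioned ledger, authenticate callers, and log every request. Because the service hands out only wrapped keys, it never needs the plaintext file key.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class RecordEntry:
    cid: str                  # current CID of the encrypted blob
    nonce_b64: str            # base64 AES-GCM nonce for that blob
    enc_keys: Dict[str, str]  # recipient_id -> base64 wrapped file key
    allowed: Set[str] = field(default_factory=set)  # recipients the policy permits


class AccessService:
    """Toy in-memory access-control store (Option A style)."""

    def __init__(self) -> None:
        self._records: Dict[str, RecordEntry] = {}

    def put(self, record_id: str, entry: RecordEntry) -> None:
        # A re-upload after revocation overwrites the entry, so the mapping
        # always points at the current CID.
        self._records[record_id] = entry

    def request_access(
        self, record_id: str, recipient_id: str
    ) -> Optional[Tuple[str, str, str]]:
        """Return (cid, nonce_b64, wrapped_key) if policy allows, else None."""
        entry = self._records.get(record_id)
        if entry is None or recipient_id not in entry.allowed:
            return None
        wrapped = entry.enc_keys.get(recipient_id)
        if wrapped is None:
            return None
        return entry.cid, entry.nonce_b64, wrapped
```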

______________________________________________________________________________________________________________

### Python implementation (practical, ready-to-run)
This example uses:
- py-ipfs-http-client to talk to an IPFS node/gateway
- cryptography (hazmat) for AES-GCM and RSA-OAEP key encryption

Install prerequisites:

```shell
pip install ipfshttpclient cryptography
```




Files in the example (only decrypt_ipfs.py is listed below):
- key_utils.py (RSA key generation, save/load)
- encrypt_ipfs.py (encrypt a file, upload it to IPFS, produce metadata)
- decrypt_ipfs.py (download from IPFS, recover the symmetric key, decrypt the file)
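key_utils.py is listed but not shown. A minimal sketch of what it might contain, using the `cryptography` serialization API; the function names are hypothetical, and the 3072-bit default is an assumption (NIST guidance treats 3072-bit RSA as roughly 128-bit security). Loading a password-protected PEM requires passing the same password to `serialization.load_pem_private_key`.

```python
from typing import Optional

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa


def generate_rsa_keypair(key_size: int = 3072) -> rsa.RSAPrivateKey:
    # 65537 is the conventional public exponent.
    return rsa.generate_private_key(public_exponent=65537, key_size=key_size)


def private_key_to_pem(key: rsa.RSAPrivateKey, password: Optional[bytes] = None) -> bytes:
    # Encrypt the PEM at rest when a password is given (PKCS#8 + best available cipher).
    enc = (
        serialization.BestAvailableEncryption(password)
        if password
        else serialization.NoEncryption()
    )
    return key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=enc,
    )


def public_key_to_pem(key: rsa.RSAPublicKey) -> bytes:
    # SubjectPublicKeyInfo is the standard PEM format for sharing public keys.
    return key.public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    )


def load_public_key(pem: bytes) -> rsa.RSAPublicKey:
    return serialization.load_pem_public_key(pem)
```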



```python
# I wrote this module to load a PEM private key, download an encrypted payload from IPFS,
# decrypt a symmetric key that was encrypted to the recipient, and then decrypt the payload
# using AES-GCM.

import base64
from typing import Dict, Optional

import ipfshttpclient
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding as asym_padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

DEFAULT_IPFS_ADDR = "/dns/localhost/tcp/5001/http"


# I provide a helper that loads a PEM-encoded private key into a cryptography object.
# I expect pem_data to be bytes containing a PEM private key (PKCS#1 or PKCS#8).
def load_private_key(pem_data: bytes, password: Optional[bytes] = None):
    # I pass the password through so encrypted PEM files can be loaded too;
    # None only works for unencrypted PEMs.
    return serialization.load_pem_private_key(pem_data, password=password)


# I download raw bytes from IPFS given a CID and the multiaddr of the IPFS HTTP API.
def download_from_ipfs(cid: str, ipfs_addr: str = DEFAULT_IPFS_ADDR) -> bytes:
    # I use the client as a context manager so its HTTP session is closed afterwards.
    with ipfshttpclient.connect(ipfs_addr) as client:
        # client.cat returns the raw bytes stored under the CID.
        return client.cat(cid)


# I decrypt a base64-encoded, RSA-OAEP-encrypted symmetric key using the recipient's private key.
def decrypt_sym_key(enc_key_b64: str, private_key) -> bytes:
    enc_bytes = base64.b64decode(enc_key_b64)
    # I use OAEP with MGF1(SHA-256) and SHA-256; these must match the parameters used to encrypt.
    return private_key.decrypt(
        enc_bytes,
        asym_padding.OAEP(
            mgf=asym_padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None,
        ),
    )


# I take a "package" dictionary that describes the encrypted file, the recipient's identifier
# and PEM private key bytes, and an output path for the decrypted plaintext.
# package is expected to look like:
# {
#   "cid": "<ipfs-cid>",
#   "nonce": "<base64-nonce>",
#   "enc_keys": { "<recipient_id>": "<base64-encrypted-sym-key>", ... }
# }
def decrypt_file_from_package(
    package: Dict,
    recipient_id: str,
    recipient_priv_pem: bytes,
    out_path: str,
    ipfs_addr: str = DEFAULT_IPFS_ADDR,
    password: Optional[bytes] = None,
) -> str:
    priv = load_private_key(recipient_priv_pem, password)
    cid = package["cid"]
    # I decode the nonce into the raw bytes AESGCM expects (12 bytes for the standard 96-bit nonce).
    nonce = base64.b64decode(package["nonce"])

    # I select the wrapped key addressed to this recipient; picking an arbitrary entry
    # would raise a decryption error whenever that entry belongs to someone else.
    enc_key_b64 = package["enc_keys"].get(recipient_id)
    if enc_key_b64 is None:
        raise KeyError(f"no encrypted key for recipient {recipient_id!r}")

    sym_key = decrypt_sym_key(enc_key_b64, priv)
    ciphertext = download_from_ipfs(cid, ipfs_addr)

    # AESGCM.decrypt verifies the authentication tag and raises InvalidTag if the
    # ciphertext, nonce, or key is wrong. I use no additional authenticated data (None).
    plaintext = AESGCM(sym_key).decrypt(nonce, ciphertext, None)

    # I write the recovered plaintext and return the path to indicate success.
    with open(out_path, "wb") as f:
        f.write(plaintext)
    return out_path


# Example usage notes (I keep these commented out so the module is import-safe):
# if __name__ == "__main__":
#     import json
#     with open("package.json") as f:
#         pkg = json.load(f)
#     with open("alice_priv.pem", "rb") as f:
#         priv_pem = f.read()
#     decrypt_file_from_package(pkg, "alice", priv_pem, "recovered_scan.pdf")
```

________________________________________________________________


### Notes:
- This example uses RSA-OAEP for simplicity. For scalable recipient sets, use an EC-based KEM (X25519) + HKDF to derive per-recipient keys (faster and smaller keys).  
- Store package metadata in a secure access control service, not in public IPFS data. Keep audit logs for access.  
- Use authenticated encryption (AES-GCM) to detect tampering.
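The last note can be checked directly: flipping a single bit of an AES-GCM ciphertext makes `decrypt` raise `cryptography.exceptions.InvalidTag` instead of returning altered plaintext. A minimal demonstration (`detects_tampering` is an illustrative name, not part of the example modules):

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def detects_tampering(plaintext: bytes) -> bool:
    """Encrypt, flip one ciphertext bit, and report whether decryption rejects it."""
    key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    aesgcm = AESGCM(key)
    ciphertext = bytearray(aesgcm.encrypt(nonce, plaintext, None))
    ciphertext[0] ^= 0x01  # simulate a corrupted or maliciously modified blob
    try:
        aesgcm.decrypt(nonce, bytes(ciphertext), None)
    except InvalidTag:
        return True  # tag check failed, so no plaintext was released
    return False
```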

_______________________________________________________________________________________________________________

### Key management and best practices
- Use HSMs or cloud KMS for production private keys. Do not store private keys on shared application servers.  
- Rotate keys periodically and implement re-encryption workflows for revocation. For efficient re-encryption, consider proxy re-encryption schemes (advanced, but promising for revocation without re-uploading whole files).  
- Protect metadata: even the CIDs of encrypted blobs can leak access patterns, because requests for and announcements of a CID are visible to peers on a public IPFS network. Consider mixing, access obfuscation, or a permissioned (private) IPFS network for sensitive deployments.
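A re-encryption workflow for revocation, sketched under the assumption that the data owner can still unwrap the current file key: decrypt with the old key, encrypt under a fresh key and nonce, and wrap the new key only for the remaining recipients. `rekey_for_revocation` is a hypothetical helper, not a library API.

```python
import base64
import os
from typing import Dict, Tuple

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def rekey_for_revocation(
    old_ciphertext: bytes,
    old_nonce: bytes,
    old_key: bytes,
    remaining: Dict[str, rsa.RSAPublicKey],
) -> Tuple[bytes, bytes, Dict[str, str]]:
    """Re-encrypt a file under a new key that only the remaining recipients can unwrap."""
    plaintext = AESGCM(old_key).decrypt(old_nonce, old_ciphertext, None)
    new_key = AESGCM.generate_key(bit_length=256)  # revoked users never see this key
    new_nonce = os.urandom(12)
    new_ciphertext = AESGCM(new_key).encrypt(new_nonce, plaintext, None)
    enc_keys = {
        rid: base64.b64encode(pub.encrypt(new_key, OAEP)).decode("ascii")
        for rid, pub in remaining.items()
    }
    return new_ciphertext, new_nonce, enc_keys
```

The new ciphertext gets a new CID, which replaces the old one in the access-control store. The old blob should be unpinned, but copies on other nodes and any key the revoked party already extracted remain outside your control.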

_______________________________________________________________________________________________________________

### Operational & performance considerations
- IPFS pinning: ensure nodes pin encrypted blobs to keep them available; otherwise garbage collection can remove them. Manage pinning policies and costs.  
- Chunking/large files: IPFS chunks files automatically, but encrypt the entire file as a single blob under its per-file key before adding it, so no chunk ever contains plaintext. For streaming, use chunk-level keys or envelope encryption consistent with your access policy; single-shot AES-GCM decryption, as in the example, needs the whole ciphertext in memory.  
- Bandwidth and latency: uploading and downloading large medical images over IPFS can be slower than dedicated cloud blob storage. Consider a hybrid architecture (on-prem node + IPFS for cross-organization sharing).
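For the streaming case, one option is to split the file into fixed-size chunks and encrypt each with AES-GCM under the per-file key, using nonces modeled on the STREAM construction: a random per-file prefix, a chunk counter, and a final-chunk flag. The counter and flag make reordered or truncated chunk lists fail authentication. This is a sketch under those assumptions (chunk size, prefix length, and function names are illustrative); how chunks map onto IPFS objects is left open.

```python
import os
import struct
from typing import List, Tuple

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PREFIX_LEN = 7  # 7-byte random prefix + 4-byte counter + 1-byte last flag = 12-byte nonce


def _nonce(prefix: bytes, index: int, last: bool) -> bytes:
    return prefix + struct.pack(">I", index) + (b"\x01" if last else b"\x00")


def encrypt_chunks(key: bytes, data: bytes, chunk_size: int = 1 << 20) -> Tuple[bytes, List[bytes]]:
    """Encrypt data as separately decryptable chunks with STREAM-style nonces."""
    prefix = os.urandom(PREFIX_LEN)
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    aead = AESGCM(key)
    last_index = len(pieces) - 1
    return prefix, [
        aead.encrypt(_nonce(prefix, i, i == last_index), piece, None)
        for i, piece in enumerate(pieces)
    ]


def decrypt_chunks(key: bytes, prefix: bytes, chunks: List[bytes]) -> bytes:
    """Decrypt in order; raises InvalidTag on reordering, truncation, or modification."""
    aead = AESGCM(key)
    last_index = len(chunks) - 1
    return b"".join(
        aead.decrypt(_nonce(prefix, i, i == last_index), c, None)
        for i, c in enumerate(chunks)
    )
```

Because every file already gets a fresh key, the random prefix only has to keep nonces distinct within one file, which the counter guarantees.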

_______________________________________________________________________________________________________________

### Advanced topics (paths to strengthen system)
- Use ECC (X25519) for recipient key encapsulation, with HKDF to derive AES keys; this is more efficient and modern than RSA for multiple recipients.  
- Consider attribute-based encryption (ABE) for policy-based sharing (complex and heavy).  
- Explore proxy re-encryption to support revocation without re-encrypting large files.  
- Use threshold encryption or multi-party HSM for shared custody of decryption rights.
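The X25519 + HKDF option can be sketched as an ECIES-style key wrap: an ephemeral X25519 key pair per wrap, ECDH with the recipient's public key, HKDF-SHA256 to derive a key-encryption key, and AES-256-GCM to wrap the 32-byte file key. The `INFO` label and helper names are assumptions, and this ad hoc encoding is not interoperable with any standard; HPKE (RFC 9180) standardizes the same idea for production use.

```python
import os
from typing import Tuple

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey, X25519PublicKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

INFO = b"ipfs-health-record file-key wrap v1"  # hypothetical context label


def _raw(pub: X25519PublicKey) -> bytes:
    return pub.public_bytes(serialization.Encoding.Raw, serialization.PublicFormat.Raw)


def _kek(shared: bytes, eph_pub: bytes, rec_pub: bytes) -> bytes:
    # Bind both public keys into the derivation so the KEK is specific to this exchange.
    return HKDF(
        algorithm=hashes.SHA256(), length=32, salt=None, info=INFO + eph_pub + rec_pub
    ).derive(shared)


def wrap_file_key(file_key: bytes, recipient: X25519PublicKey) -> Tuple[bytes, bytes, bytes]:
    """ECIES-style wrap: ephemeral X25519 ECDH -> HKDF-SHA256 -> AES-256-GCM."""
    eph = X25519PrivateKey.generate()
    eph_pub = _raw(eph.public_key())
    kek = _kek(eph.exchange(recipient), eph_pub, _raw(recipient))
    nonce = os.urandom(12)
    return eph_pub, nonce, AESGCM(kek).encrypt(nonce, file_key, None)


def unwrap_file_key(
    eph_pub: bytes, nonce: bytes, wrapped: bytes, recipient_priv: X25519PrivateKey
) -> bytes:
    shared = recipient_priv.exchange(X25519PublicKey.from_public_bytes(eph_pub))
    kek = _kek(shared, eph_pub, _raw(recipient_priv.public_key()))
    return AESGCM(kek).decrypt(nonce, wrapped, None)
```

Each recipient entry then stores `(eph_pub, nonce, wrapped)`, which is 32 + 12 + 48 = 92 bytes, versus 256 bytes for a single RSA-2048 OAEP ciphertext.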

_______________________________________________________________________________________________________________

### References and further reading
- Implementation examples and IPFS-healthcare project work in academic and community projects.

_______________________________________________________________________________________________________________

