# Hands-on C2PA: A Developer's Walkthrough

In today's digital landscape, AI-generated content is becoming increasingly sophisticated and widespread. While this brings incredible creative possibilities, it also raises concerns about authenticity and trust. How can we embrace AI's potential while maintaining transparency about content origins?

C2PA (Coalition for Content Provenance and Authenticity) offers a solution. It's an open metadata standard for recording where a piece of content came from and how it was edited. Companies such as Adobe and OpenAI use it to clearly identify AI-generated content.

Through practical examples in this guide, you'll learn how to:
1. Read C2PA metadata from images using Python
2. Detect image tampering with C2PA
3. Verify that content comes from trusted providers
4. Recognize current limitations and challenges

Overview of C2PA:

![C2PA Visual Glossary](blog_images/c2pa_visualglossary.png)
*Source: [C2PA Technical Specification](https://c2pa.org/specifications/specifications/1.0/specs/)*

## A C2PA example


Below is an image generated by DALL-E through ChatGPT, verified with https://contentcredentials.org/verify

![OpenAI C2PA Example](blog_images/openai_c2pa.png)
*Source: [OpenAI Help - C2PA in ChatGPT Images](https://help.openai.com/en/articles/8912793-c2pa-in-chatgpt-images)*

This image includes C2PA metadata that tells us:
- It was "Generated by OpenAI"
- Which model created it (DALL-E)
- A digital signature from OpenAI

## Reading C2PA Data with Python

This is a runnable notebook demonstrating C2PA with Python. We'll use fast-c2pa-python, our high-performance wrapper of c2pa-rs.

⚠️ While CAI (Content Authenticity Initiative) provides an official c2pa-python library, we developed fast-c2pa-python using PyO3 bindings, which run much faster when reading C2PA data. Unlike the official library, however, fast-c2pa-python currently supports only reading C2PA metadata, not signing content.

Below is an image generated by ChatGPT:

In [2]:
from IPython.display import HTML, display
display(HTML('<img src="blog_images/chatgpt_image.png" width="300"/>'))


In [31]:
from fast_c2pa_python import read_c2pa_from_file

# Read C2PA metadata from our example image
metadata = read_c2pa_from_file("blog_images/chatgpt_image.png")

# Get the active manifest which contains the main C2PA data
active_manifest_id = metadata["active_manifest"]
active_manifest = metadata["manifests"][active_manifest_id]

print("C2PA Metadata:")
print(f"- Validation State: {metadata['validation_state']}")

if "signature_info" in active_manifest:
    print(f"- Signed by: {active_manifest['signature_info'].get('issuer', 'Unknown')}")


C2PA Metadata:
- Validation State: Valid
- Signed by: OpenAI


This image is C2PA-valid and signed by OpenAI. The full metadata is shown below.

In [33]:
metadata

{'active_manifest': 'urn:c2pa:35adfcd7-6ebe-463f-9829-310afd9cefcb',
 'manifests': {'urn:c2pa:f24e32e2-fbe2-4bf0-b31f-7b69ae159067': {'claim_generator_info': [{'name': 'ChatGPT',
     'org.cai.c2pa_rs': '0.49.5'}],
   'title': 'image.png',
   'instance_id': 'xmp:iid:db163776-cbae-4c3d-89b3-59b3ca3c1fba',
   'ingredients': [],
   'assertions': [{'label': 'c2pa.actions.v2',
     'data': {'actions': [{'action': 'c2pa.created',
        'softwareAgent': {'name': 'GPT-4o'},
        'digitalSourceType': 'http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia'},
       {'action': 'c2pa.converted',
        'softwareAgent': {'name': 'OpenAI API'}}]}}],
   'signature_info': {'alg': 'Es256',
    'issuer': 'OpenAI',
    'cert_serial_number': '631872854730012650133502748526736898092667640635'},
   'label': 'urn:c2pa:f24e32e2-fbe2-4bf0-b31f-7b69ae159067'},
  'urn:c2pa:35adfcd7-6ebe-463f-9829-310afd9cefcb': {'claim_generator_info': [{'name': 'ChatGPT',
     'org.cai.c2pa_rs': '0.49.5'}

The JSON response shows:
1. Some content information (issuer: OpenAI, ...)
2. Security validations that prevent tampering:
   - claimSignature.insideValidity: signature is within valid timeframe
   - claimSignature.validated: signature is cryptographically valid
   - assertion.dataHash.match: content hasn't been modified
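The content information in a manifest can also be pulled out programmatically. The sketch below is one way to do it, assuming a manifest dict shaped like the output above. `summarize_manifest` is a hypothetical helper (not part of fast-c2pa-python), and `sample` is a hand-copied excerpt of that output.

```python
# Hypothetical helper: summarize one manifest dict shaped like the output above.
# IPTC digital source type used for media created by a trained model.
AI_SOURCE_SUFFIX = "/trainedAlgorithmicMedia"


def summarize_manifest(manifest: dict) -> dict:
    """Collect issuer, generator names, actions, and an AI-generated flag."""
    generators = [
        info.get("name", "Unknown")
        for info in manifest.get("claim_generator_info", [])
    ]
    actions = []
    ai_generated = False
    for assertion in manifest.get("assertions", []):
        # Only look at action assertions (e.g. "c2pa.actions.v2").
        if not assertion.get("label", "").startswith("c2pa.actions"):
            continue
        for action in assertion.get("data", {}).get("actions", []):
            agent = action.get("softwareAgent")
            # In the output above softwareAgent is a dict; tolerate a plain string too.
            agent_name = agent.get("name") if isinstance(agent, dict) else agent
            actions.append((action.get("action"), agent_name))
            if action.get("digitalSourceType", "").endswith(AI_SOURCE_SUFFIX):
                ai_generated = True
    return {
        "issuer": manifest.get("signature_info", {}).get("issuer", "Unknown"),
        "generators": generators,
        "actions": actions,
        "ai_generated": ai_generated,
    }


# Excerpt of the manifest printed above, copied by hand for illustration.
sample = {
    "claim_generator_info": [{"name": "ChatGPT", "org.cai.c2pa_rs": "0.49.5"}],
    "assertions": [{
        "label": "c2pa.actions.v2",
        "data": {"actions": [
            {"action": "c2pa.created",
             "softwareAgent": {"name": "GPT-4o"},
             "digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"},
            {"action": "c2pa.converted",
             "softwareAgent": {"name": "OpenAI API"}},
        ]},
    }],
    "signature_info": {"alg": "Es256", "issuer": "OpenAI"},
}

summary = summarize_manifest(sample)
print(summary)
```

With real data, the same helper could be called on `metadata["manifests"][metadata["active_manifest"]]`.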

### C2PA Data Structure

![C2PA Diagram](blog_images/c2pa_diagram.png)

*Source: [C2PA Technical Specification](https://c2pa.org/specifications/specifications/2.2/specs/C2PA_Specification.html)*

C2PA uses multiple layers of verification:

1. Content Integrity
   - Various aspects of the content are hashed
   - For example, the asset's bytes (excluding the embedded manifest itself) are hashed in c2pa.hash.data
   - The manifest contains multiple assertions and their hashes
   - Any modification to content or assertions breaks verification

2. Source Authentication
   - Claim Signature contains signer's certificate
   - Certificate is signed by trusted CAs (DigiCert, Truepic)
   - Allows verifying trusted sources (OpenAI, Adobe)
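The content integrity layer can be illustrated with a toy version of a hash that skips an excluded byte range. This is a simplified sketch, not the c2pa-rs implementation: we assume a single exclusion range and SHA-256, and the byte strings are made-up stand-ins for a real file.

```python
import hashlib


def hash_with_exclusion(data: bytes, start: int, length: int) -> str:
    """SHA-256 over every byte except data[start:start + length]."""
    digest = hashlib.sha256()
    digest.update(data[:start])
    digest.update(data[start + length:])
    return digest.hexdigest()


# Toy "file": header, embedded manifest, pixel data, and a metadata field.
HEADER = b"HEADER"
MANIFEST = b"<manifest-store>"
original = HEADER + MANIFEST + b"PIXELS" + b"CreateDate="

# The manifest's own bytes are excluded, so it can be embedded after hashing.
start, length = len(HEADER), len(MANIFEST)
stored_hash = hash_with_exclusion(original, start, length)

pixel_edit = HEADER + MANIFEST + b"pixels" + b"CreateDate="
metadata_edit = HEADER + MANIFEST + b"PIXELS" + b"CreateDate=2024:01:01"
manifest_edit = HEADER + b"<MANIFEST-STORE>" + b"PIXELS" + b"CreateDate="

for name, blob in [("pixel edit", pixel_edit),
                   ("metadata edit", metadata_edit),
                   ("manifest-region edit", manifest_edit)]:
    matches = hash_with_exclusion(blob, start, length) == stored_hash
    print(f"{name}: {'match' if matches else 'mismatch'}")
```

Only the edit inside the excluded range leaves the hash unchanged. In real C2PA data, that region is protected separately by the assertion hashes and the claim signature.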

Let's test C2PA's tampering detection by modifying an image.

⚠️ Most image libraries strip C2PA metadata when saving. To properly demonstrate tampering detection, fast-c2pa-python provides utilities (based on c2pa-rs) that preserve C2PA data while modifying the image.
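A quick way to check whether a save step stripped the manifest from a PNG is to list the file's chunks. The sketch below assumes the manifest store lives in a `caBX` chunk, which is how we read the C2PA specification's PNG embedding rules. The helper names are hypothetical and use only the standard library, and the demo bytes are a synthetic chunk sequence rather than a viewable image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
C2PA_CHUNK = "caBX"  # assumption: chunk type used for the C2PA manifest store


def list_png_chunks(data: bytes) -> list:
    """Return the chunk type names of a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    names = []
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        names.append(chunk_type.decode("ascii", errors="replace"))
        offset += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return names


def has_c2pa_chunk(data: bytes) -> bool:
    """True if the PNG byte string contains a caBX chunk."""
    return C2PA_CHUNK in list_png_chunks(data)


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk (used only for the synthetic demo below)."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


# Synthetic chunk sequences: structurally PNG-like, not decodable images.
ihdr = make_chunk(b"IHDR", b"\x00" * 13)
idat = make_chunk(b"IDAT", b"")
iend = make_chunk(b"IEND", b"")
with_manifest = PNG_SIGNATURE + ihdr + make_chunk(b"caBX", b"manifest") + idat + iend
stripped = PNG_SIGNATURE + ihdr + idat + iend

print(list_png_chunks(with_manifest), has_c2pa_chunk(with_manifest))
print(list_png_chunks(stripped), has_c2pa_chunk(stripped))
```

For a file on disk, the same check would be `has_c2pa_chunk(open(path, "rb").read())`.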

### Testing Pixel Modification Detection

Below are two versions of the same image:
- Left: Original image generated by ChatGPT
- Right: Same image converted to grayscale, with C2PA metadata preserved

We use fast-c2pa-python to convert the image while keeping its C2PA data intact. This allows us to demonstrate how C2PA detects even simple pixel modifications.

In [3]:
from fast_c2pa_python import convert_to_gray_keep_c2pa

# Convert image to grayscale while keeping C2PA data
input_image = "blog_images/chatgpt_image.png"
output_image = "blog_images/chatgpt_image_gray.png"
convert_to_gray_keep_c2pa(input_image, output_image, format="image/png")

# Display both images side by side
from IPython.display import HTML, display
display(HTML(f'''
<div style="display: flex; gap: 20px;">
    <div>
        <p>Original Image:</p>
        <img src="{input_image}" width="300"/>
    </div>
    <div>
        <p>Grayscale Image (with C2PA preserved):</p>
        <img src="{output_image}" width="300"/>
    </div>
</div>
'''))


In [35]:
# Verify C2PA data in the grayscale image
metadata = read_c2pa_from_file(output_image)

print("C2PA Validation State:", metadata["validation_state"])

C2PA Validation State: Invalid


In [36]:
metadata['validation_results']['activeManifest']['failure']

[{'code': 'assertion.dataHash.mismatch',
  'url': 'self#jumbf=/c2pa/urn:c2pa:35adfcd7-6ebe-463f-9829-310afd9cefcb/c2pa.assertions/c2pa.hash.data',
  'explanation': 'asset hash error, name: jumbf manifest, error: hash verification( Hashes do not match )'}]

As expected, C2PA validation fails because the image pixels were modified. The grayscale conversion changed the pixel values, so the recomputed data hash no longer matches the hash stored in the manifest.
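Each failure entry carries a `url` that points at the manifest and assertion involved. The sketch below splits that URL into its parts, assuming the `self#jumbf=/c2pa/<manifest>/c2pa.assertions/<assertion>` layout seen in the output above. `locate_failure` is a hypothetical helper, not part of fast-c2pa-python.

```python
JUMBF_PREFIX = "self#jumbf=/c2pa/"


def locate_failure(failure: dict) -> dict:
    """Extract the status code, manifest label, and assertion label from a failure entry."""
    url = failure.get("url", "")
    manifest_label = None
    assertion_label = None
    if url.startswith(JUMBF_PREFIX):
        parts = url[len(JUMBF_PREFIX):].split("/")
        manifest_label = parts[0] or None
        # Assertion URLs look like <manifest>/c2pa.assertions/<assertion label>.
        if len(parts) >= 3 and parts[1] == "c2pa.assertions":
            assertion_label = parts[2]
    return {
        "code": failure.get("code"),
        "manifest": manifest_label,
        "assertion": assertion_label,
    }


# Failure entry copied from the output above.
failure = {
    "code": "assertion.dataHash.mismatch",
    "url": "self#jumbf=/c2pa/urn:c2pa:35adfcd7-6ebe-463f-9829-310afd9cefcb/c2pa.assertions/c2pa.hash.data",
    "explanation": "asset hash error, name: jumbf manifest, error: hash verification( Hashes do not match )",
}

located = locate_failure(failure)
print(located)
```

Here the failing assertion is `c2pa.hash.data` in the active manifest, which is the hash over the asset's bytes.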

### Testing Metadata Tampering Detection

Now let's try a different type of tampering - modifying image metadata. We'll use exiftool to change the CreateDate field while keeping the image pixels intact. This tests if C2PA can detect metadata-only modifications.

In [37]:
# First create a copy of the image
!cp blog_images/chatgpt_image.png blog_images/chatgpt_image_createdate.png
# Then modify the CreateDate of the copy
!exiftool -CreateDate="2024:01:01 12:00:00" -overwrite_original blog_images/chatgpt_image_createdate.png


    1 image files updated


In [38]:
print("Original image CreateDate:")
!exiftool -CreateDate -s -s -s blog_images/chatgpt_image.png
print("\n")
print("Modified image CreateDate:")
!exiftool -CreateDate -s -s -s blog_images/chatgpt_image_createdate.png

Original image CreateDate:


Modified image CreateDate:
2024:01:01 12:00:00


The exiftool output shows we successfully changed the image's CreateDate from empty (the original has no CreateDate) to 2024:01:01 12:00:00. Let's see how C2PA validates this modified image.

In [39]:
metadata = read_c2pa_from_file("blog_images/chatgpt_image_createdate.png")


In [40]:
print("C2PA Validation State:", metadata["validation_state"])
print("C2PA Validation Failures:", metadata['validation_results']['activeManifest']['failure'])

C2PA Validation State: Invalid
C2PA Validation Failures: [{'code': 'assertion.dataHash.mismatch', 'url': 'self#jumbf=/c2pa/urn:c2pa:35adfcd7-6ebe-463f-9829-310afd9cefcb/c2pa.assertions/c2pa.hash.data', 'explanation': 'asset hash error, name: jumbf manifest, error: hash verification( Hashes do not match )'}]


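As expected, C2PA detected the metadata tampering and marked the image as Invalid. Even though we only modified the CreateDate field, the failure is the same assertion.dataHash.mismatch as before: the data hash covers the file's bytes rather than only its decoded pixels, so rewriting metadata elsewhere in the file changes the hash.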

## Verifying Content Sources

We've seen how C2PA detects tampering, but there's another crucial aspect: how do we verify that content comes from trusted providers like Adobe or OpenAI?

By default, C2PA SDKs only verify content integrity, not the trustworthiness of sources. To enable trust verification, we need to configure a list of trusted certificates.

⚠️ Currently, C2PA uses a temporary list of trusted certificates (source: [CAI documentation](https://opensource.contentauthenticity.org/docs/verify-known-cert-list/)). This will be replaced by an official public list when C2PA publishes it.

Let's first see how C2PA validates an image that has a valid signature but isn't from a trusted source, before enabling trust verification.

For this test, we'll use a sample image from the c2pa-rs repository that contains valid C2PA data but isn't signed by a known provider.

In [41]:
metadata = read_c2pa_from_file("blog_images/C.jpg")

In [42]:
print("C2PA Validation State:", metadata["validation_state"])
print("C2PA Validation Failures:", metadata['validation_results']['activeManifest']['failure'])

C2PA Validation State: Valid
C2PA Validation Failures: []


The image passes all integrity checks with a Valid status. Looking at the manifest, we can see it's signed by "C2PA Test Signing Cert" - indicating it's a test certificate, not a real provider.

In [54]:
metadata['manifests']['contentauth:urn:uuid:b2b1f7fa-b119-4de1-9c0d-c97fbea3f2c3']['signature_info']

{'alg': 'Ps256',
 'issuer': 'C2PA Test Signing Cert',
 'cert_serial_number': '720724073027128164015125666832722375746636448153',
 'time': '2024-08-06T21:53:37+00:00'}

### Enabling Trust Verification

CAI provides a list of trusted certificates that we can configure in our C2PA settings ([documentation](https://opensource.contentauthenticity.org/docs/verify-known-cert-list/)).

This list includes certificates from major providers such as:
- Adobe Root CA G2
- Truepic
- Samsung Corporation
- Canon C2PA Root CA

In [59]:
from fast_c2pa_python import read_c2pa_from_file, setup_trust_verification

# Setup trust verification
setup_trust_verification(
    "tests/tmp_cert/anchors.pem",  # Root certificates
    "tests/tmp_cert/allowed.pem",  # Allowed certificates
    "tests/tmp_cert/store.cfg"     # Trust configuration
)

# Read with trust list
metadata = read_c2pa_from_file("blog_images/C.jpg")

In [60]:
print("C2PA Validation State:", metadata["validation_state"])
print("C2PA Validation Failures:", metadata['validation_results']['activeManifest']['failure'])

C2PA Validation State: Invalid
C2PA Validation Failures: [{'code': 'signingCredential.untrusted', 'url': 'self#jumbf=/c2pa/contentauth:urn:uuid:b2b1f7fa-b119-4de1-9c0d-c97fbea3f2c3', 'explanation': 'signing certificate untrusted'}]


Now the same image fails validation with status Invalid. The error code signingCredential.untrusted indicates that while the signature is valid, the issuer is not in our list of trusted providers.
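Both tampered and untrusted images end up as Invalid, so an application will often want to tell the two apart. The sketch below is one possible triage based on the failure codes seen in this walkthrough. The category names and the `classify` helper are our own invention, and the sample dicts are trimmed copies of the outputs above.

```python
def failure_codes(metadata: dict) -> list:
    """Return the failure status codes recorded for the active manifest."""
    results = metadata.get("validation_results") or {}
    active = results.get("activeManifest") or {}
    return [item.get("code", "unknown") for item in active.get("failure", [])]


def classify(metadata: dict) -> str:
    """Map failure codes to a coarse, hypothetical category."""
    codes = failure_codes(metadata)
    if not codes:
        return "no_failures"
    # Any hash mismatch means content or assertions changed after signing.
    if any(code.endswith(".mismatch") for code in codes):
        return "integrity_failure"
    # Signature checks out, but the signer is not on the trust list.
    if all(code == "signingCredential.untrusted" for code in codes):
        return "signer_not_trusted"
    return "other_failure"


# Trimmed copies of the results shown in this walkthrough.
tampered = {"validation_state": "Invalid", "validation_results": {"activeManifest": {
    "failure": [{"code": "assertion.dataHash.mismatch"}]}}}
untrusted = {"validation_state": "Invalid", "validation_results": {"activeManifest": {
    "failure": [{"code": "signingCredential.untrusted"}]}}}
clean = {"validation_state": "Valid", "validation_results": {"activeManifest": {
    "failure": []}}}

for name, result in [("tampered", tampered), ("untrusted", untrusted), ("clean", clean)]:
    print(name, "->", classify(result))
```

Checking for hash mismatches first is a deliberate choice: if an image is both tampered with and untrusted, the integrity failure is the more serious finding.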

## Current Limitations

C2PA is a powerful tool for building trust in digital content, but its effectiveness faces two significant challenges. First is adoption - the system requires widespread implementation across digital services, content creators, and platforms to maintain the chain of trust.

The second challenge is metadata preservation. Currently, C2PA data is easily lost through common actions like taking screenshots or sharing on social media platforms. Most image processing operations strip this crucial metadata, breaking the verification chain.

These limitations highlight that while C2PA provides robust technical solutions for content authenticity, its success depends heavily on ecosystem-wide support and improved metadata resilience.