-
Apologies if this is the wrong place -- this might also just be a general question. I'm having trouble re-creating the record keys for each record after downloading the CAR file from the Bluesky PDS. This is largely because I am unfamiliar with the format, and am having some trouble tracking down the info I need in the docs. Records themselves do not contain record keys, i.e
contains only the core data content. If you look at records just before the one I showed, you get something that (I think) is part of the Merkle tree -- it includes a record key (
Presumably there is a way to take the CAR records with the
Any tips appreciated!
MWE
Download a repo
import os
from atproto import Client
from atproto_identity.did.resolver import DidResolver
# Handle to test
handle = "cameron.pfiffer.org"
# Resolve to DIDs
atproto_client = Client()
did = atproto_client.resolve_handle(handle).did
# Get the repo
resolver = DidResolver()
data = resolver.resolve_atproto_data(did)
print(data)
# Get the PDS URL
pds = data.pds
# Create a client using the PDS URL. This one is needed
# for stale PDS references.
# https://github.com/MarshalX/atproto/discussions/188
pds_client = Client(base_url=pds)
# Download the repo
repo = pds_client.com.atproto.sync.get_repo({'did': did})
# Save the CAR file
if not os.path.exists("cars"):
    os.makedirs("cars")
with open(f"cars/{did}.car", "wb") as f:
    f.write(repo)
Read the repo
import glob
import re
from rich import print
from atproto import CAR
for path in glob.glob("cars/*.car"):
    did = re.match(r'cars/(did:plc:.*)\.car', path).group(1)
    print(f"Processing {did}")
    # Load the CAR file
    with open(path, "rb") as f:
        car = CAR.from_bytes(f.read())
    for cid in car.blocks:
        record = car.blocks.get(cid)
        print(record)
-
Hello, yes, this is a general question about the AT Protocol. As for the Python SDK, I have not implemented an easy-to-use tree walker, so for now the only way is to implement it yourself. The proper documentation lives here: https://atproto.com/specs/repository. The official implementation is here: https://github.com/bluesky-social/atproto/blob/main/packages/repo/src/mst/walker.ts
Speaking of your example with the repost record, you should try to get the CID (from
-
In case anyone comes across this later, here's a relatively simple technique to match record keys to their contents.
from rich import print
from atproto import CAR
from atproto_core.cid import CID

def reconstruct_key(prev_key, entry):
    """Reconstruct the full key from the previous full key and the prefix length"""
    suffix = entry['k'].decode('utf-8')
    if entry['p'] == 0:
        return suffix
    if prev_key is None:
        raise ValueError("First entry cannot have non-zero prefix length")
    # 'p' counts what is shared with the previous *full* key in this node,
    # not with the previous entry's (already truncated) 'k' suffix
    return prev_key[:entry['p']] + suffix

# SET YOUR DID HERE
did = "did:plc:123456789"
print(f"Processing {did}")
# Load the CAR file
CAR_PATH = "blahblahblah.car"
with open(CAR_PATH, "rb") as f:
    car = CAR.from_bytes(f.read())
# Map to store record_key -> record content
records = {}
# First pass: process MST nodes to build key -> CID mapping
key_to_cid = {}
for cid in car.blocks:
    block = car.blocks.get(cid)
    # Check if this is an MST node (has 'e' entries)
    if isinstance(block, dict) and 'e' in block:
        prev_key = None  # full key of the previous entry in this node
        for entry in block['e']:
            try:
                # Reconstruct the full key
                full_key = reconstruct_key(prev_key, entry)
                prev_key = full_key
                # Decode the CID from the 'v' field
                value_cid = CID.decode(entry['v'])
                key_to_cid[full_key] = value_cid
            except Exception as e:
                print(f"Error processing entry: {e}")
                continue
# Second pass: match keys with their record content
for record_key, record_cid in key_to_cid.items():
    try:
        record_content = car.blocks.get(record_cid)
        if record_content:
            records[record_key] = record_content
    except Exception as e:
        print(f"Error fetching record content: {e}")
        continue
# Now process the records as needed
for record_key, record in records.items():
    record_type = record.get("$type")
    if record_type == "app.bsky.feed.repost":
        print(f"Found repost: {record_key}")
        print(record)
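For anyone who wants the tree walker mentioned in the first reply rather than a scan over every block, here is a minimal sketch of an in-order walk. It assumes the node layout from the repository spec: each MST node has an optional left link `l` and a list of entries `e`, and each entry has `p`, `k`, `v` and an optional right link `t`. The `walk_mst` function and the toy `blocks` dict (with plain strings standing in for CIDs) are hypothetical, written only for illustration.

```python
def walk_mst(blocks, node_id):
    """Yield (record_key, value_link) pairs in key order.

    `blocks` maps a link to a decoded MST node. Here the links are plain
    strings; this is an assumption made to keep the example self-contained.
    """
    node = blocks[node_id]
    # Everything in the left subtree sorts before this node's entries
    if node.get('l') is not None:
        yield from walk_mst(blocks, node['l'])
    prev_key = b''
    for entry in node['e']:
        # Restore the full key from the previous full key in this node
        key = prev_key[:entry['p']] + entry['k']
        prev_key = key
        yield key.decode('utf-8'), entry['v']
        # The subtree to the right of this entry sorts after it
        if entry.get('t') is not None:
            yield from walk_mst(blocks, entry['t'])


# Toy tree: three nodes, four records
blocks = {
    'root': {'l': 'left', 'e': [
        {'p': 0, 'k': b'app.bsky.feed.post/3kb', 'v': 'cid-post-b', 't': None},
        {'p': 9, 'k': b'graph.follow/3kc', 'v': 'cid-follow-c', 't': 'right'},
    ]},
    'left': {'l': None, 'e': [
        {'p': 0, 'k': b'app.bsky.feed.like/3ka', 'v': 'cid-like-a', 't': None},
    ]},
    'right': {'l': None, 'e': [
        {'p': 0, 'k': b'app.bsky.graph.follow/3kd', 'v': 'cid-follow-d', 't': None},
    ]},
}

pairs = list(walk_mst(blocks, 'root'))
for record_key, link in pairs:
    print(record_key, link)
```

With a real CAR file the starting point would presumably be the `data` field of the commit block at the CAR root, and the `l`, `t` and `v` links would need decoding with `CID.decode` as in the snippet above; I have not verified that part against the SDK, so treat it as an assumption.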
- `v` is the CID. You can decode it using `from atproto import CID; CID.decode(v)`
- `k` is the record key, which may be truncated and needs to be restored from the previous entries
- use `p` to understand how many chars you need to copy from the previous `k` to restore it

Speaking of your example with the repost record, you should try to…
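A small round trip may make the `p` / `k` relationship concrete. This is only a sketch: `compress_keys` and `restore_keys` are hypothetical helpers, not part of the SDK, and the sample keys are made up. It assumes `p` is measured against the previous full key in the same node, which is how the repository spec describes it.

```python
def compress_keys(keys):
    """Turn sorted full keys into entries with prefix length 'p' and suffix 'k'."""
    entries = []
    prev = b''
    for key in keys:
        raw = key.encode('utf-8')
        # Count how many leading bytes are shared with the previous full key
        p = 0
        while p < min(len(prev), len(raw)) and prev[p] == raw[p]:
            p += 1
        entries.append({'p': p, 'k': raw[p:]})
        prev = raw
    return entries


def restore_keys(entries):
    """Rebuild the full keys by copying 'p' bytes from the previous full key."""
    keys = []
    prev = b''
    for entry in entries:
        full = prev[:entry['p']] + entry['k']
        keys.append(full.decode('utf-8'))
        prev = full
    return keys


keys = [
    'app.bsky.feed.post/3kaaa',
    'app.bsky.feed.post/3kabb',
    'app.bsky.feed.repost/3kccc',
]
entries = compress_keys(keys)
restored = restore_keys(entries)
print(entries)
print(restored)
```

Note that the second entry's stored `k` is only `b'bb'`, so the third key cannot be rebuilt from the previous suffix alone; the previous key has to be fully restored first.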