
Added update_metadata() to write_ adapters #716

Merged


@jmaruland (Collaborator) commented Jul 8, 2022

This PR started as a way to add more information to the document schema, to help keep track of all updates made to the metadata and specs of the samples.
In addition, we implemented a revision system in the MongoDB database that keeps track of old versions of documents. Every time update_metadata() is run, the active document stored in the main collection is copied to the revisions collection, where each entry is identified by the document's key plus a revision number. Together, these two fields form a unique identifier for every entry.
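The revision flow described above can be sketched in memory. This is a minimal sketch under stated assumptions, not the PR's implementation: plain dicts stand in for the MongoDB collections, the class name `InMemoryStore` is hypothetical, and numbering revisions from 1 is an assumption.

```python
import copy
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


class InMemoryStore:
    """Hypothetical stand-in for the main and revisions collections."""

    def __init__(self) -> None:
        # Active documents, looked up by key.
        self.nodes: Dict[str, Dict[str, Any]] = {}
        # Archived copies; the (key, revision) pair acts as the unique identifier.
        self.revisions: Dict[Tuple[str, int], Dict[str, Any]] = {}

    def update_metadata(self, key: str, metadata: Dict[str, Any], specs: List[str]) -> None:
        active = self.nodes[key]
        # Assumption: revisions are numbered 1, 2, 3, ... per key.
        revision = 1 + sum(1 for (k, _) in self.revisions if k == key)
        # Archive a copy of the active document before modifying it.
        archived = copy.deepcopy(active)
        archived["revision"] = revision
        self.revisions[(key, revision)] = archived
        # Apply the update and stamp the modification time.
        active["metadata"] = metadata
        active["specs"] = specs
        active["updated_at"] = datetime.now(tz=timezone.utc)
```

In MongoDB, the `(key, revision)` uniqueness would be enforced by a compound unique index rather than by dict keys.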


class DocumentRevision(BaseDocument):
    revision: int

Member


It may be useful to add a classmethod constructor here to do what you were trying to do in __init__.

@classmethod
def from_document(cls, document):
    return cls(key=document.key, ...)
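A minimal sketch of such a constructor, assuming a simplified `BaseDocument` with hypothetical fields `key`, `metadata`, `specs`, and `updated_at`. The real schema in databroker/experimental/schemas.py may be a pydantic model with different fields; dataclasses keep this sketch self-contained, and passing `revision` explicitly is an assumption.

```python
import copy
from dataclasses import dataclass, field, fields
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class BaseDocument:
    # Hypothetical, simplified fields; the real schema may differ.
    key: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    specs: List[str] = field(default_factory=list)
    updated_at: datetime = field(default_factory=lambda: datetime.now(tz=timezone.utc))


@dataclass
class DocumentRevision(BaseDocument):
    revision: int = 0

    @classmethod
    def from_document(cls, document: BaseDocument, revision: int) -> "DocumentRevision":
        # Deep-copy every BaseDocument field so the revision does not share
        # mutable metadata/specs with the live document.
        values = {f.name: copy.deepcopy(getattr(document, f.name)) for f in fields(BaseDocument)}
        return cls(**values, revision=revision)
```

Putting the conversion in a classmethod keeps `__init__` a plain field-by-field constructor, so both paths stay simple.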

databroker/experimental/schemas.py
@@ -62,8 +63,9 @@ def inner(self, *args, **kwargs):
 class WritingArrayAdapter:
     structure_family = "array"
 
-    def __init__(self, collection, directory, doc):
+    def __init__(self, collection, revisions, directory, doc):
Member


I think it would make sense to pass in the database here rather than separately passing in each of its collections.
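A minimal sketch of that suggestion, in which the adapter receives the database and looks up its collections itself. The collection names `nodes` and `revisions` are assumptions based on this discussion. A pymongo `Database` supports `database["name"]` lookup, so a plain dict can stand in for it when testing.

```python
from typing import Any


class WritingArrayAdapter:
    structure_family = "array"

    def __init__(self, database: Any, directory: str, doc: Any):
        # One handle in, collections derived here, so callers cannot
        # accidentally pass collections from two different databases.
        self.database = database
        self.collection = database["nodes"]  # assumed name of the active-document collection
        self.revisions = database["revisions"]  # assumed name of the revisions collection
        self.directory = directory
        self.doc = doc
```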

Collaborator Author


Fixed in the latest commit

databroker/experimental/server_ext.py
updated_at = datetime.now(tz=timezone.utc)
self.doc.updated_at = updated_at

if len(metadata) > 0:
Member


If I want to update metadata to be empty {} or specs to be empty [], shouldn't that update be processed?
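One way to address this, sketched with a hypothetical helper `build_update`: use `None` to mean "not provided" instead of testing `len(...) > 0`, so an explicit `{}` or `[]` is still written. The helper name and its return shape are assumptions, not the PR's code.

```python
from typing import Any, Dict, List, Optional


def build_update(
    metadata: Optional[Dict[str, Any]] = None,
    specs: Optional[List[str]] = None,
) -> Dict[str, Any]:
    changes: Dict[str, Any] = {}
    # `is not None` (rather than a length check) lets {} and [] through,
    # so a caller can deliberately clear metadata or specs.
    if metadata is not None:
        changes["metadata"] = metadata
    if specs is not None:
        changes["specs"] = specs
    return changes
```

The returned dict could then be used as the body of a `$set` in an update, skipping the write entirely when it is empty.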

Collaborator Author


Fixed in the latest commit

)

if result.matched_count != result.modified_count:
    raise ValueError("Error while writing to database")
Member


I suggest classifying this as a RuntimeError.
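A sketch of the check with that change. pymongo's `UpdateResult` does expose `matched_count` and `modified_count`; the helper name `check_update_result` is hypothetical, and `SimpleNamespace` stands in for a real result in the example. The rationale for `RuntimeError`: the failure comes from the database at run time, not from an invalid argument, which is what `ValueError` usually signals.

```python
from types import SimpleNamespace


def check_update_result(result) -> None:
    # A matched document that was not modified means the write did not
    # take effect: a runtime failure rather than a bad argument.
    if result.matched_count != result.modified_count:
        raise RuntimeError("Error while writing to database")


# Stand-ins for pymongo.results.UpdateResult, for illustration only.
check_update_result(SimpleNamespace(matched_count=1, modified_count=1))  # passes silently
try:
    check_update_result(SimpleNamespace(matched_count=1, modified_count=0))
except RuntimeError as exc:
    print(f"caught: {exc}")
```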

@danielballan force-pushed the Add-timestamps-to-experimental-document branch from 5583ff5 to 0518b1d on August 4, 2022 16:20
@jmaruland changed the title from "Added created_at and updated_at to document schema" to "Added update_metadata() to write_ adapters" on Aug 4, 2022
@danielballan
Member

Now that we are adding indexes, I think we should also add an index on key to the nodes collection. This will make lookup by key faster, and we might as well make the index unique. Using UUID4 keys should already achieve uniqueness in practice, but it doesn't hurt to enforce it via an index as well.

Member

@danielballan left a comment


Seems on track. A couple of comments. More tests would be good, too.

def __len__(self):
    return self._collection.count_documents(
        {"key": self._key}
    )  # maybe wrong MongoDB usage here...
Member


Delete comment (assuming this usage is now correct).

        {"key": self._key}
    )  # maybe wrong MongoDB usage here...

def __getitem__(self, item_):
Member


This looks mixed up and likely needs testing.

The usage r[i:j] should lead to skip(i).limit(j - i). The usage r[i:] or r[i:None] (equivalent) should lead to skip(i) with no limit. PyMongo also accepts skip(i).limit(0), where 0 means "no limit", which is an option if you find it leads to cleaner code.
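A minimal sketch of the slice translation described above, as a hypothetical helper `slice_to_skip_limit` that `__getitem__` could call. It handles only non-negative bounds without a step, and it returns `None` for an empty slice, because passing limit(0) for r[i:i] would mean "no limit" and return everything.

```python
from typing import Optional, Tuple


def slice_to_skip_limit(item: slice) -> Optional[Tuple[int, int]]:
    # Only non-negative, step-less slices are handled in this sketch.
    if item.step not in (None, 1):
        raise ValueError("slices with a step are not supported")
    start = 0 if item.start is None else item.start
    stop = item.stop
    if start < 0 or (stop is not None and stop < 0):
        raise ValueError("negative indices are not supported")
    if stop is None:
        # r[i:] or r[i:None]: skip i documents; pymongo treats limit(0) as "no limit".
        return start, 0
    if stop <= start:
        # Empty slice: must not become limit(0), which would return everything.
        return None
    # r[i:j]: skip i documents and return at most j - i.
    return start, stop - start
```

Inside `__getitem__`, the pair could feed `cursor.skip(skip).limit(limit)` on a cursor from `find({"key": self._key})`, returning an empty list when the helper returns `None`; the cursor's sort order is left open here.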

if now > self.deadline:
    self._doc = Document(
        **self.collection.find_one({"key": self.key})
    )  # run query
Member


This comment seems superfluous. :-)

def create_indexes(self):
    self.revision_coll.create_index(
        [("key", pymongo.ASCENDING), ("revision", pymongo.DESCENDING)], unique=True
    )
Member


While we're creating indexes, we should also create an index on the nodes collection to ensure that key is unique.
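A minimal sketch of the extra index, written as a standalone function that takes the database. The function name and the `nodes` collection lookup are assumptions; passing a single field name plus `unique=True` to `create_index` is standard pymongo usage.

```python
def create_node_indexes(database) -> None:
    # Single-field index on "key": speeds up find_one({"key": ...}) lookups.
    # unique=True makes MongoDB reject a second node with an existing key.
    database["nodes"].create_index("key", unique=True)
```

With the index in place, inserting a node with a duplicate key raises `pymongo.errors.DuplicateKeyError`. Calling `create_index` again with the same specification does nothing, so running it at startup alongside the revisions index is safe.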

@danielballan merged commit bafe6c5 into bluesky:main on Sep 14, 2022