ConfigurationError: montydb has been config to use BSON and cannot be changed in current session. #79

Closed
rkingsbury opened this issue Jun 14, 2023 · 8 comments
Labels
resolved Issue has been resolved but remain open for reference

Comments

@rkingsbury

I am trying to use two instances of MontyClient simultaneously - one in memory and another on disk. Is this possible? Right now I am receiving the following error, which I am having trouble understanding.

ConfigurationError: montydb has been config to use BSON and cannot be changed in current session.

In the README I see

The configuration process only required on repository creation or modification. And, one repository (the parent level of databases) can only assign one storage engine.

By "repository creation" do you mean "on instantiation of MontyClient"? Is there any way to set the storage on a per-instance basis so that I can move data between different repositories, and memory?

Thanks for any assistance!

@davidlatwe
Owner

davidlatwe commented Jun 19, 2023

Hi @rkingsbury , sorry for my late reply 💦

I didn't do the test, but I believe it's possible to operate multiple instances of MontyClient under the same Python session (process).

But the ConfigurationError you get is another story. The message itself is not doing its best job, quite confusing indeed. 😅

The thing is, MongoDB uses BSON, so montydb uses BSON as well. And usually, one would get BSON from pymongo.

But to keep dependencies minimal, it doesn't make much sense to add pymongo as a dependency of montydb just for the bson module that comes with it.

So the choice I made at the time was to vendor a small part of bson into montydb. And since the bson within montydb is not the same as the bson that comes with pymongo (lots of fake types, e.g. SON, Code...), we have to pick which bson we are going to use for the current Python session, for performance reasons.

Now back to the ConfigurationError you get, my guess is this:

  1. a monty database was created with montydb's own bson module
  2. pymongo was installed afterwards
  3. pymongo's bson was picked by default simply because it now exists
  4. Conflict.

Can you confirm this?
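
A quick way to check which case applies is to see whether a top-level bson package is importable in the current environment. This is only a minimal sketch: the helper name bson_available is made up for illustration, it uses the standard library only, and it does not tell you whether the bson it finds comes from pymongo or from the standalone bson package on PyPI.

```python
import importlib.util


def bson_available() -> bool:
    """Return True if a top-level `bson` package can be found.

    Hypothetical helper, not part of montydb. It only reports that
    *some* `bson` package is importable in this environment.
    """
    return importlib.util.find_spec("bson") is not None


print("bson importable:", bson_available())
```

If this prints True for an environment where the database was originally created without pymongo, that would match the sequence guessed above.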

By "repository creation" do you mean "on instantiation of MontyClient"?

Specifically, the first time you initialize a MontyClient for a database that does not yet exist.

Is there any way to set the storage on a per-instance basis

I think, if both databases were created with the same BSON configuration (e.g. have pymongo installed, then create all the databases you need), we are safe.

Please let me know if anything is unclear or incorrect, this issue is on my radar now. 😊

@davidlatwe davidlatwe added the enhancement Things need to improve or adopt label Jun 20, 2023
@rkingsbury
Author

Thank you for clarifying @davidlatwe ! The error message makes a little more sense to me now. In my case, pymongo is definitely installed and I am passing use_bson=True to MontyClient(). But it seems the error is triggered if I try to create more than one MontyClient using the memory storage engine. As a minimal example:

>>> from montydb import MontyClient
>>> mc=MontyClient(":memory:", use_bson=True)
>>> mc2=MontyClient(":memory:", use_bson=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ryan/miniconda3/envs/md/lib/python3.9/site-packages/montydb/client.py", line 41, in __init__
    storage_cls = provide_storage(repository)
  File "/Users/ryan/miniconda3/envs/md/lib/python3.9/site-packages/montydb/configure.py", line 269, in provide_storage
    _bson_init(_session["use_bson"])
  File "/Users/ryan/miniconda3/envs/md/lib/python3.9/site-packages/montydb/configure.py", line 282, in _bson_init
    raise ConfigurationError(
montydb.errors.ConfigurationError: montydb has been config to use BSON and cannot be changed in current session.

But if I continue in the above session and create clients using flatfile storage, I don't get this error:

>>> mc3=MontyClient(use_bson=True)
>>> mc4=MontyClient(use_bson=True)

@rkingsbury
Author

Hi @davidlatwe , any thoughts about how to fix or work around this? For additional context, I'm trying to use MontyStore as a replacement for mongomock in a scientific data management package called maggma. But to do so, I really need to be able to instantiate multiple independent databases in memory.

@davidlatwe
Owner

Hey @rkingsbury , sorry again for another late reply, got flooded with work. 😅 And thanks for pinging me!

The use_bson flag shouldn't be set through MontyClient, but through the set_storage function. Here's an example:

from montydb import MontyClient, set_storage

set_storage(
    repository=":memory:",
    storage="memory",
    use_bson=True,
)

mc1 = MontyClient(":memory:")
mc2 = MontyClient(":memory:")

It's not obvious, but the README does mention this, just not with memory storage as the example. See the example code in the Storage section here 😊

Also note that with memory storage, all clients share the same storage instance.

Here's the full code that I just tested, also checking that montydb is using correct bson module:

from montydb import MontyClient, set_storage

use_bson = True

set_storage(
    repository=":memory:",
    storage="memory",
    use_bson=use_bson,
)
mc1 = MontyClient(":memory:")
mc2 = MontyClient(":memory:")


# Proving that the two clients are using the same memory storage
bar1 = mc1.get_database("foo").get_collection("bar")
bar2 = mc2.get_database("foo").get_collection("bar")
bar1.insert_one({"test": "doc"})
assert bar2.find_one({"test": "doc"}) == bar1.find_one({"test": "doc"})


# Check which bson module was used.
doc = bar1.find_one({"test": "doc"})
if use_bson:
    assert type(doc["_id"]).__module__ == "bson.objectid"
else:
    assert type(doc["_id"]).__module__ == "montydb.types.objectid"

Hope this helps!

@rkingsbury
Author

Thanks for clarifying @davidlatwe ! So basically there can only be one MontyClient using memory at a time. That's what I needed to know. Would it be difficult to change that, to make it possible to have multiple independent memory repos?

@davidlatwe
Owner

It shouldn't be difficult to change that.

Right now every in-memory MontyClient instance reads and writes data from/to this one OrderedDict at this line.

To change that, we will need to extend the in-memory URL from :memory: to something like :memory:any-name, and then maybe use that as a top-level key of that internal _repo object. The same goes for the _config object.

Also needs to be thread-safe.
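
Roughly, the idea could look like the sketch below. This is not the actual montydb code: the names _repo, _lock, MEMORY_PREFIX and get_memory_repo are only illustrative, and the real implementation may end up structured differently.

```python
import threading
from collections import OrderedDict

MEMORY_PREFIX = ":memory:"

# Hypothetical module-level registry: the top-level key is the full
# repository URL, the value is that repository's own storage dict.
_repo = OrderedDict()
_lock = threading.Lock()


def get_memory_repo(url: str) -> OrderedDict:
    """Return the storage dict for an in-memory URL, creating it on first use."""
    if not url.startswith(MEMORY_PREFIX):
        raise ValueError(f"not an in-memory repository URL: {url!r}")
    # The lock keeps two threads from creating the same repo twice.
    with _lock:
        return _repo.setdefault(url, OrderedDict())


# Two different names give two independent stores.
a = get_memory_repo(":memory:")
b = get_memory_repo(":memory:any-name")
a["foo"] = {"bar": [{"test": "doc"}]}
print("foo" in a, "foo" in b)
```

Asking for the same URL twice returns the same object, so clients that use the same name would still share data, while different names stay isolated.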

I'll see if I can implement that and make a new release by the end of this weekend.

@davidlatwe
Owner

Hey @rkingsbury

Got delayed, but the in-memory engine can now have independent repos. Please try it out. See 2.5.2 😃

@davidlatwe davidlatwe added resolved Issue has been resolved but remain open for reference and removed enhancement Things need to improve or adopt labels Jul 22, 2023
@rkingsbury
Author

Thanks so much for adding this @davidlatwe ! It works great. I used the following code to test.

from montydb import MontyClient, set_storage

use_bson = True

set_storage(
    repository=":memory:",
    storage="memory",
    use_bson=use_bson,
)
mc1 = MontyClient(":memory:")
mc2 = MontyClient(":memory:2")


# Proving that the two clients now use independent memory storage
bar1 = mc1.get_database("foo").get_collection("bar")
bar2 = mc2.get_database("foo").get_collection("bar")
bar1.insert_one({"test": "doc"})
assert bar2.count() == 0
assert bar1.count() == 1

If I understand correctly from reading your PR, I just need to make sure the repo name starts with :memory:, and then as long as the repo names passed to MontyClient are unique, they will point to different instances.
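
For anyone who needs many independent in-memory repos, one option is to generate the names instead of numbering them by hand. This is a minimal sketch that assumes the behavior described above (any name starting with :memory: maps to its own repo in 2.5.2); unique_memory_repo is a made-up helper, not a montydb function.

```python
import uuid


def unique_memory_repo() -> str:
    """Build a repository name that starts with ':memory:' and is unique.

    Hypothetical helper; the result is meant to be passed as the
    repository argument of MontyClient.
    """
    return f":memory:{uuid.uuid4().hex}"


name1 = unique_memory_repo()
name2 = unique_memory_repo()
print(name1 != name2)
```

Each generated name could then be passed as MontyClient(name1), MontyClient(name2), and so on.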

I'm still slightly confused about where to use set_storage vs. the repository kwarg in MontyClient. In the above example, it seems that it's ok to just specify :memory:2 as the repo for the 2nd MontyClient instance, even though I have never pointed set_storage to it.

Anyway, thanks again for addressing this! I'll go ahead and close.
