[INTEGRATION] Improve Endpoint configuration #1653

Closed
fnikolai opened this issue Jan 7, 2024 · 1 comment
fnikolai commented Jan 7, 2024

We need to improve the way we define endpoints as environment variables (and make an exhaustive list of the accepted variables).

Ideally, the testing workflow should be:

make testenv_image # Build the cdc, vector-search, and notebook images
make testenv_init # Launch the testing environment
source ./deploy/testenv/endpoints # Load the endpoints for mongo, cdc, vector-search, ...
pytest ./test # Run the tests

The ./deploy/testenv/endpoints should look like:

SUPERDUPER_DATA_BACKEND: 'mongodb://superduper:superduper@mongodb:27017/test_db'
SUPERDUPER_VECTOR_SEARCH: ...
SUPERDUPER_CDC: ...
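
Since the file is loaded with source, a minimal sourceable sketch could look like the following (the vector-search and CDC URIs are placeholders for illustration, not the real test values):

export SUPERDUPER_DATA_BACKEND='mongodb://superduper:superduper@mongodb:27017/test_db'
export SUPERDUPER_VECTOR_SEARCH='http://vector-search:8000'  # placeholder URI
export SUPERDUPER_CDC='http://cdc:8001'                      # placeholder URI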

The current workflow has everything hardcoded, which makes it impossible to run the tests in a Kubernetes environment. For instance, this fixture always connects to a hardcoded database:

import pytest
from typing import Iterator

from superduperdb.base.datalayer import Datalayer


@pytest.fixture
def test_db(monkeypatch, request) -> Iterator[Datalayer]:
    from superduperdb import CFG
    from superduperdb.base.build import build_datalayer

    # mongodb instead of localhost is required for CFG compatibility with docker-host
    db_name = "test_db"
    data_backend = f'mongodb://superduper:superduper@mongodb:27017/{db_name}'

    # The URI above is hardcoded, so it overrides whatever the environment provides.
    monkeypatch.setattr(CFG, 'data_backend', data_backend)

    db = build_datalayer(CFG)
    yield db
fnikolai changed the title from "Use Ray endpoint as env variable" to "Improve Endpoint configuration" on Jan 7, 2024
fnikolai commented Feb 15, 2024

here is the job: https://github.com/SuperDuperDB/superduperdb/actions/runs/7906999810/job/21583069571

superduperdb.base.exceptions.ServiceRequestException: Server error at cdc with 400 :: {'error': 'Config is not match'}

The problem is that conftest.py effectively ignores the env variables, resulting in different configurations across CDC, vector-search, ray workers, etc.
There are two reasons for that:

  1. It reads non-standard env variables like SUPERDUPER_MONGO_URI.
  2. It has hardcoded values.

Here are some examples from conftest.py

    data_backend = 'mongodb://superduper:superduper@mongodb:27017/test_db'
    data_backend = os.environ.get('SUPERDUPER_MONGO_URI', data_backend)
    db_name = data_backend.split('/')[-1]

    artifact_store = 'filesystem:///tmp/artifacts'
    monkeypatch.setattr(CFG, 'artifact_store', artifact_store)

Standard and Non-Standard Variables
The standard variables are only those defined in the Config (e.g. SUPERDUPER_DATA_BACKEND). The fact that a variable begins with SUPERDUPER does not make it standard (e.g. SUPERDUPER_MONGO_URI is not).
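
To make the distinction concrete, here is a minimal sketch (it assumes the Config loader maps SUPERDUPER_DATA_BACKEND onto CFG.data_backend, which is what "standard" means above; the exact loading internals are not shown):

import os

# Standard: SUPERDUPER_DATA_BACKEND corresponds to a Config field (CFG.data_backend),
# so every component that builds its settings from the environment sees the same value.
os.environ.setdefault('SUPERDUPER_DATA_BACKEND', 'mongodb://superduper:superduper@mongodb:27017/test_db')

# Non-standard: SUPERDUPER_MONGO_URI is only looked up inside conftest.py,
# so CDC, vector-search, and the ray workers never see it and keep their own defaults.
os.environ.setdefault('SUPERDUPER_MONGO_URI', 'mongodb://superduper:superduper@mongodb:27017/test_db')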

As Kartik said in a previous iteration, the reason for having non-standard variables is that we may want to test multiple databases (e.g. mongo, sql, etc.), so we cannot simply provide a single SUPERDUPER_DATA_BACKEND.

I get the problem, but we cannot simply hardcode paths because we lose portability.

Possible Solution

Use SUPERDUPERDB_TEST_CONFIGS, which takes as input a JSON list of the different Datalayer configurations we want to test.

For example:

export SUPERDUPERDB_TEST_CONFIGS='[
  {
    "DATA_BACKEND": "mongodb://...",
    "COMPUTE_URI": "ray://localhost...",
    "ARTIFACT_STORES": "/artifacts"
  },
  {
    "DATA_BACKEND": "sql://...",
    "COMPUTE_URI": "ray://localhost...",
    "ARTIFACT_STORES": "/whatever"
  }
]'
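
A rough sketch of how conftest.py could consume this and parametrize the db fixture (the helper name _test_configs is hypothetical, and the Config field names are assumed to be the lowercase forms of the JSON keys above):

import json
import os

import pytest


def _test_configs():
    # Parse SUPERDUPERDB_TEST_CONFIGS; fall back to a single mongodb config if unset.
    raw = os.environ.get('SUPERDUPERDB_TEST_CONFIGS')
    if not raw:
        return [{'DATA_BACKEND': 'mongodb://superduper:superduper@mongodb:27017/test_db'}]
    return json.loads(raw)


@pytest.fixture(params=_test_configs())
def test_db(monkeypatch, request):
    from superduperdb import CFG
    from superduperdb.base.build import build_datalayer

    # Apply every endpoint from the selected configuration instead of hardcoding it;
    # the tests then run once per entry in SUPERDUPERDB_TEST_CONFIGS.
    for key, value in request.param.items():
        monkeypatch.setattr(CFG, key.lower(), value)

    db = build_datalayer(CFG)
    yield db

This way a Kubernetes deployment only has to export a different SUPERDUPERDB_TEST_CONFIGS, and the hardcoded mongodb URI remains just the local default.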

guerra2fernando changed the title from "Improve Endpoint configuration" to "[INTEGRATION] Improve Endpoint configuration" on Feb 15, 2024
fnikolai self-assigned this on Apr 16, 2024