Removed extra conditions in _auto_identify_connection_string method #2038

Closed
24 changes: 7 additions & 17 deletions superduperdb/base/superduper.py
@@ -1,5 +1,5 @@
import re
import typing as t
from re import match as re_match

from superduperdb.base.configs import CFG

@@ -24,26 +24,16 @@ def superduper(item: t.Optional[t.Any] = None, **kwargs) -> t.Any:


def _auto_identify_connection_string(item: str, **kwargs) -> t.Any:
from superduperdb.base.build import build_datalayer

if item.startswith('mongomock://'):
kwargs['data_backend'] = item
if re_match(r'^[a-zA-Z0-9]+://', item) is None:
raise ValueError(f'{item} is not a valid connection string')
Collaborator

Please use re.match.

Contributor Author

@jieguangzhou It is the re.match method. I created the alias re_match and use it because we only need a single function from the re package.


elif item.startswith('mongodb://'):
kwargs['data_backend'] = item
if item.endswith('.csv') and CFG.cluster.cdc.uri is not None:
raise TypeError('Pandas is not supported in cluster mode!')

elif item.startswith('mongodb+srv://') and 'mongodb.net' in item:
kwargs['data_backend'] = item
Collaborator

Hey @ChandanChainani, we need to check this pattern: re_match(r'^[a-zA-Z0-9]+://', 'mongodb+srv://') is None evaluates to True, so a mongodb+srv:// connection string will raise an error.
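
The mismatch described above can be reproduced in isolation. The sketch below compares the pattern from the diff with a hypothetical alternative based on the RFC 3986 scheme grammar (a letter followed by letters, digits, "+", "-" or "."). The names STRICT, RFC3986_SCHEME and the sample URIs are illustrative assumptions, not part of the PR.

```python
import re

# Pattern used in the diff: only letters and digits may precede "://".
STRICT = re.compile(r'^[a-zA-Z0-9]+://')

# Hypothetical alternative (an assumption, not the PR's code): the RFC 3986
# scheme grammar also allows "+", "-" and "." after the first letter.
RFC3986_SCHEME = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*://')

# Illustrative inputs only.
uris = [
    'mongodb://localhost:27017/db',
    'mongomock://test',
    'mongodb+srv://user@cluster.mongodb.net/db',
    'data.csv',
]

# Map each input to whether the pattern accepts it.
strict_results = {uri: STRICT.match(uri) is not None for uri in uris}
relaxed_results = {uri: RFC3986_SCHEME.match(uri) is not None for uri in uris}

for uri in uris:
    print(f'{uri!r}: strict={strict_results[uri]}, relaxed={relaxed_results[uri]}')
```

Neither pattern accepts a bare file name such as data.csv, so a .csv input would still need to be handled before any scheme check.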

Contributor Author

@jieguangzhou For this type of condition, we can improve things by creating a single condition that validates all the supported URL types.

For example:

supported_url_types = ["mongodb+srv://", "mongodb://", "mongomock://"]
if not any(item.startswith(url_type) for url_type in supported_url_types):
    raise Exception("url format not supported")
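
A slightly more compact variant of the same idea, as a minimal sketch: str.startswith accepts a tuple of prefixes, so the any(...) loop is not required. The names SUPPORTED_URL_PREFIXES and is_supported_url are hypothetical, and the list of prefixes is only the one quoted in this comment; the real set of supported backends may be larger.

```python
# Hypothetical allow-list, taken from the prefixes quoted in the comment above.
SUPPORTED_URL_PREFIXES = ('mongodb+srv://', 'mongodb://', 'mongomock://')


def is_supported_url(item: str) -> bool:
    """Return True if item starts with one of the allow-listed prefixes."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    return item.startswith(SUPPORTED_URL_PREFIXES)


def validate_url(item: str) -> str:
    """Return item unchanged, or raise ValueError if it is not allow-listed."""
    if not is_supported_url(item):
        raise ValueError(f'{item} is not a supported connection string')
    return item
```

One design consequence worth weighing: the original else branch accepted any scheme that matched the regex, so a fixed allow-list like this one is stricter and would reject other URI schemes as well as .csv paths.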

Collaborator

Yes, could you update the PR with these changes?

kwargs['data_backend'] = item

Collaborator

Hi @ChandanChainani thanks for the PR. Are you sure that these can be dropped like this? Do you get passing tests?

Contributor Author

@ChandanChainani, May 7, 2024

Hi @blythed, yes. Except for the LLM and doc_string related tests, all the other tests were validated.

Command

pytest ./test/unittest/ -svv --ignore=test/unittest/ext/test_llama_cpp.py --ignore=test/unittest/ext/llm --ignore=test/unittest/ext/transformers/test_llm.py --ignore=test/unittest/ext/transformers/test_llm_training.py --ignore=test/unittest/ext/transformers/test_transformers.py

Output

=============================================== short test summary info ===============================================
FAILED test/unittest/cli/test_cli.py::test_cli_info - subprocess.CalledProcessError: Command '('python', '-m', 'superduperdb', 'info')' returned non-zero exit status 1.
ERROR test/unittest/backends/ibis/test_query.py::test_renamings - NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\...\\AppData\\Local\\Temp\\tmpi9bnoy_0\\test.ddb'
=========================== 1 failed, 194 passed, 1112 skipped, 1 error in 85.44s (0:01:25) ===========================

Contributor Author

@ChandanChainani, May 8, 2024

@blythed

I have updated the code. There is one additional condition that I think could be added: making sure the user only supplies a supported DB/file URI format (we could maintain constants for the supported DB/file formats). Please suggest.

elif item.endswith('.csv'):
if CFG.cluster.cdc.uri is not None:
raise TypeError('Pandas is not supported in cluster mode!')
kwargs['data_backend'] = item
from superduperdb.base.build import build_datalayer

else:
if re.match(r'^[a-zA-Z0-9]+://', item) is None:
raise ValueError(f'{item} is not a valid connection string')
kwargs['data_backend'] = item
return build_datalayer(CFG, **kwargs)
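
For discussion, here is a minimal standalone sketch of how the branches in this function could collapse into two checks while keeping the .csv guard ahead of the scheme check. It is not the merged code: identify_data_backend and its cdc_uri parameter are hypothetical stand-ins for the real function and for CFG.cluster.cdc.uri, the scheme pattern is an assumed RFC 3986 style pattern rather than the one in the diff, and the function returns the kwargs instead of calling build_datalayer so that it runs without superduperdb installed.

```python
import re
import typing as t

# Assumed scheme pattern: "+", "-" and "." are allowed so that
# "mongodb+srv://" passes (the pattern in the diff rejects it).
_SCHEME = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*://')


def identify_data_backend(
    item: str, cdc_uri: t.Optional[str] = None
) -> t.Dict[str, str]:
    """Validate item and return the kwargs that would go to build_datalayer.

    cdc_uri is a hypothetical stand-in for CFG.cluster.cdc.uri.
    """
    if item.endswith('.csv'):
        # Same guard as the original branch: no pandas backend in cluster mode.
        if cdc_uri is not None:
            raise TypeError('Pandas is not supported in cluster mode!')
    elif _SCHEME.match(item) is None:
        # Anything that is neither a .csv path nor scheme://... is rejected.
        raise ValueError(f'{item} is not a valid connection string')
    return {'data_backend': item}
```

Checking the .csv suffix first matters because a plain file path has no scheme and would otherwise fail the connection-string check.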

