-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Allow embeddings reads from csv file format #71
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
src/phoenix/validation/errors.py
Outdated
class MissingVectorColumn(ValidationError): | ||
def __init__(self, col: str) -> None: | ||
self.missing_col = col | ||
|
||
def error_message(self) -> str: | ||
return ( | ||
f"The embedding vector column {self.missing_col} is declared in the schema " | ||
"but is not found in the dataframe." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure sub-classing for every case makes sense here - this will resort in too many error types - can you just use the DatasetError below?
Also you are leaking internals by using dataframe
in the message - it's not a huge deal but the caller doesn't need to know that we are using dataframes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure sub-classing for every case makes sense here - this will resort in too many error types - can you just use the DatasetError below?
See if you like it better now. There are specific errors for each situation. What I like about this approach is that the message stays consistent, instead of a developer changing the message 2 months from now to say the same thing.
Also you are leaking internals by using
dataframe
in the message - it's not a huge deal but the caller doesn't need to know that we are using dataframes.
Would you like it better if it said not found in data
? Besides from this, most of our code has column_name(s)
at the end. It gives a pretty good indication we are using tables. Just raising in case you would like me to change that in a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Mikyo King <mikyo@arize.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Much cleaner.
|
||
@classmethod | ||
def from_dataframe(cls, dataframe: DataFrame, schema: Schema): | ||
return cls(dataframe, schema) | ||
|
||
@classmethod | ||
def from_csv(cls, filepath: str, schema: Schema): | ||
return cls(read_csv(filepath), schema) | ||
dataframe: DataFrame = read_csv(filepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind adding a small unit test here? - might be a good excuse to split out the parsing from the file system read.
from .types import Schema | ||
|
||
|
||
def validate_dataset_inputs(dataframe: DataFrame, schema: Schema) -> List[err.ValidationError]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good opportunity for unit tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up in #73
Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikyo@arize.com>
* main: feat: Allow embeddings reads from csv file format (#71)
Closes #22, #61