-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(pointcloud): Add HDBSCAN to point cloud #57
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
src/phoenix/pcloud/pcloud.py
Outdated
float(point.y), | ||
float(point.z), | ||
] | ||
"clusterId": point.cluster_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to propagate point metadata to the UI by populating a metaData
field that is a dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean each point will have a metadata field the same as position
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on this with #63
src/phoenix/pcloud/projectors.py
Outdated
|
||
@dataclass(frozen=True) | ||
class UMAPProjector: | ||
hyperparameters: Dict[str, Union[int, float, str]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make private - otherwise the programmer may think you can modify directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on providing a reasonable default for the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, if we want to pass defaults and/or run proper validation, we should create a separate class UMAPHyperparameter
, and put defaults and validations in it. It would be much more transparent and easier. The downside is then we would have to implement all the options of the UMAP class, which I think is not ideal.
The reason I went with this approach is that we don't need default values. When we call UMAP we use
UMAP(**hyperparameters)
. That command expands the dictionary setting the values specified and leaving the rest as default. So, our defaults are UMAP's defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About making it private -- It is already done, by using dataclasses
with frozen=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About making it private -- It is already done, by using dataclasses with frozen=true
Nice
As for the type - the problem from a DX perspective is that there now is no type hints for these values - making this projector
harder to use as I don't have documentation on the parameters. pyright
actually catches the boundary issues - not 100% sure how. Thoughts on just making a strict type for the parameters we think need tuning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy to create another class for the hyperparameters and have a private method that turns the class into a dictionary. To UMAP eyes it will be the same. To our users, it will be mostly the same and we can address new hyperparameters as they are requested. Or directly setting them all up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up in #64
src/phoenix/pcloud/pcloud.py
Outdated
self.prediction_label = prediction_label | ||
# self.prediction_score = prediction_score | ||
self.actual_label = actual_label | ||
# self.actual_score = actual_score | ||
self.raw_text_data = raw_text_data | ||
# self.link_to_data = link_to_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than having flat attributes, it might be good to structure these constructs a bit in a class or dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
particularly since feature data will need to be serialized - Unlike with the arize platform, all data needs to be serialized in bulk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I think it deserves a little conversation though. Don't know what direction to go particularly. I am tempted to call it metadata class since it will go in the metadata part of the json. But not too in love with that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inference or Event attributes seems appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up in #65
Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikyo@arize.com>
…eat/add-hdbscan * 'feat/add-hdbscan' of github.com:Arize-ai/phoenix: Update src/phoenix/pcloud/projectors.py Update src/phoenix/pcloud/pcloud.py
Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikyo@arize.com>
8cdb046
to
0970ab3
Compare
…eat/add-hdbscan * 'feat/add-hdbscan' of github.com:Arize-ai/phoenix: black more fixes fix types: Update src/phoenix/pcloud/projectors.py Update src/phoenix/pcloud/pcloud.py Delete _fit_transform method Fix json int bug Fix style Add HDBSCAN clustering Restore umap_demo deleted fix: python types (#55)
y: float | ||
|
||
def get_coordinates(self): | ||
return [float(self.x), float(self.y)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit confused on the casting to float here - is it for type safety?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is because the default coordinates spit from UMAP and HDBSCAN have been float32
and, if left like that, you get the error that is not JSON serializable. I am open to other solutions.
Co-authored-by: Mikyo King <mikyo@arize.com>
resolves #46