Hi,

I would like to begin by thanking you for your tremendous work on this library. I had a question regarding your dimensionality_reduction.py example. This is where you fit a PCA on top of the embeddings obtained by your model:
```python
# To determine the PCA matrix, we need some example sentence embeddings.
# Here, we compute the embeddings for 20k random sentences from the AllNLI dataset
pca_train_sentences = nli_sentences[0:20000]
train_embeddings = model.encode(pca_train_sentences, convert_to_numpy=True)

# Compute PCA on the train embeddings matrix
pca = PCA(n_components=new_dimension)
pca.fit(train_embeddings)
pca_comp = np.asarray(pca.components_)
```
I was wondering whether it would be better to first append a `LayerNorm` with `elementwise_affine=False` to the model, so that the PCA receives standardized inputs. I've extended sentence-transformers' `models.LayerNorm` so that it accepts additional args and kwargs for `self.norm`, and I ran this experiment on my own dataset (which, unfortunately, I'm not at liberty to share); it seems to perform better than plain PCA with no LayerNorm.
I was just wondering if somehow I got lucky with my particular data or if it's something to actually consider when performing dimensionality reduction.
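To make the idea concrete, here is a small numpy/sklearn sketch of what I mean (the data is synthetic, and `layer_norm` is just an illustrative stand-in for `LayerNorm` with `elementwise_affine=False`):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Synthetic stand-in for sentence embeddings, with very different feature scales
emb = rng.normal(loc=3.0, scale=[1.0, 5.0, 0.1, 2.0], size=(500, 4))

def layer_norm(x, eps=1e-5):
    """Per-row standardization, mimicking LayerNorm with elementwise_affine=False."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

pca_plain = PCA(n_components=2).fit(emb)            # PCA on raw embeddings
pca_ln = PCA(n_components=2).fit(layer_norm(emb))   # PCA on standardized embeddings

# Because PCA is scale-sensitive, the fitted components generally differ
assert not np.allclose(np.abs(pca_plain.components_),
                       np.abs(pca_ln.components_), atol=1e-3)
```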
Thanks in advance!
The idea is to avoid external preprocessing and have the model perform end-to-end forward passes natively. That is why we use a Dense layer filled with the PCA components here, instead of calling `pca.transform` on new inputs every time. It also means your first suggestion isn't desirable, since it would require sklearn's `normalize` at every inference step.
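To illustrate, here is a minimal numpy/sklearn sketch (with synthetic data standing in for the embeddings) of how a bias-free Dense layer whose weights are `pca.components_` relates to `pca.transform`:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32)).astype(np.float32)  # stand-in for sentence embeddings
pca = PCA(n_components=8).fit(X)
comp = np.asarray(pca.components_)  # shape (8, 32)

# A bias-free linear (Dense) layer with weight matrix `comp` computes X @ comp.T,
# whereas pca.transform first subtracts the training mean:
proj_dense = X @ comp.T
proj_pca = pca.transform(X)

# The two projections differ only by the constant offset pca.mean_ @ comp.T
assert np.allclose(proj_dense - proj_pca, pca.mean_ @ comp.T, atol=1e-4)
```

Baking `comp` into a Dense layer this way lets the model emit reduced-dimension embeddings in a single forward pass, with no sklearn dependency at inference time.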
Secondly, `LayerNorm` and `torch.nn.functional.normalize` (which is what gets applied when you set `normalize_embeddings=True`) do very different things. Since PCA is sensitive to data scale, it is good practice to z-score standardize your data before fitting a PCA on it, which is what `LayerNorm` with `elementwise_affine=False` does (or leaving `elementwise_affine=True` and never using it in training mode). Meanwhile, `torch.nn.functional.normalize` simply divides each tensor by its $L_p$ norm so that all tensors have unit length in $L_p$ space. I'm not sure whether the two are mathematically equivalent from PCA's point of view; I'm just pointing out the differences.
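To make the distinction concrete, here is a tiny numpy sketch, using numpy formulas as stand-ins for the two torch operations:

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 10.0]])

# LayerNorm with elementwise_affine=False: z-score each row across its features
ln = (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(x.var(axis=-1, keepdims=True) + 1e-5)

# torch.nn.functional.normalize (p=2): divide each row by its L2 norm
l2 = x / np.linalg.norm(x, axis=-1, keepdims=True)

assert abs(ln.mean()) < 1e-6                 # LayerNorm output: zero mean
assert abs(np.std(ln) - 1.0) < 1e-3          # ...and (near-)unit variance
assert abs(np.linalg.norm(l2) - 1.0) < 1e-6  # normalize output: unit L2 norm
assert abs(l2.mean()) > 0.1                  # ...but generally nonzero mean
```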