How to get stable results? #85

Closed
rsarai opened this issue Jan 6, 2021 · 4 comments

rsarai commented Jan 6, 2021

Hello Folks,

Thank you for all the work on this lib. I have a question about reproducibility: is there a way to set a random seed or random state and get stable results?

I'm trying to achieve this with:

import random
import numpy
random.seed(42)
numpy.random.seed(42)

I'm aware that these are not thread-safe, which may be why the results are not reproducible. Either way, is there a way to enforce this?

Collaborator

idroz commented Jan 6, 2021

Hi there -

There are several sources of stochastic behaviour in Ivis (see issue #31).

Here's an example script that should provide reproducible results between Ivis runs.

Note that I tested this in Jupyter Notebook. If you're running ivis from a shell, you need to set the PYTHONHASHSEED environment variable before running the script, for example: PYTHONHASHSEED=0 python3 run_ivis.py

import os
os.environ["PYTHONHASHSEED"] = "0"

import random

import numpy as np
import tensorflow as tf

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)

import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import NearestNeighbors

from ivis import Ivis

iris = load_iris()
data = iris.data
target = iris.target

X = MinMaxScaler().fit_transform(data)

# Here we're creating a fixed NN matrix. For large out-of-memory datasets, you can achieve the same
# with Ivis' Annoy functionality (https://bering-ivis.readthedocs.io/en/latest/api.html#neighbour-retrieval),
# i.e. build the index separately and then pass it into the Ivis constructor.
nbrs = NearestNeighbors(n_neighbors=5).fit(X)
distances, indices = nbrs.kneighbors(X)

model = Ivis(embedding_dims=2, k=5, batch_size=X.shape[0],
             neighbour_matrix=indices,
             n_epochs_without_progress=5, verbose=0)

model.fit(X)

embeddings = model.transform(X)

plt.scatter(embeddings[:, 0], embeddings[:, 1], c=target)

You should get this result:

image
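The note about PYTHONHASHSEED can be checked directly. CPython reads this variable once at interpreter startup, so it has to be in the environment before the process starts, which is why the shell form above is needed. The following is a minimal sketch, not part of ivis: the helper name `string_hash_in_child` is hypothetical. It launches fresh interpreters with the variable set and compares the hash they report for the same string.

```python
import os
import subprocess
import sys


def string_hash_in_child(hash_seed: str) -> str:
    """Start a fresh interpreter with PYTHONHASHSEED set and return hash('ivis') as text."""
    # Copy the current environment and override only PYTHONHASHSEED.
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('ivis'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Two separate interpreters, same hash seed: the printed values should match.
    print(string_hash_in_child("0"), string_hash_in_child("0"))
```

If the two values differ, the variable is not reaching the interpreter, and string hashing will vary from run to run.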

Author

rsarai commented Jan 7, 2021

Got it, thanks! From what I can see, it gives very similar representations but not exactly the same ones (and that's OK, just clarifying the behavior), like:
Figure_2
Figure_1

Collaborator

idroz commented Jan 7, 2021

Interesting - you should be getting identical results between each run, as long as all seeds are set before the Ivis module is imported.

Are you using Ivis' built-in nearest neighbour search, or are you pre-building the nearest neighbour matrix?

Other contributing factors may be how different versions of Python, TensorFlow, and NumPy handle RNG.
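One way to keep the "seed first, import Ivis afterwards" order hard to get wrong is to put the seeding in a single helper that runs at the top of the script. This is only a sketch: `set_global_seeds` is a hypothetical name, not an ivis function. It assumes TensorFlow 2.x (`tf.random.set_seed`) and uses one seed for all three generators, whereas the script above uses 123 and 1234. NumPy and TensorFlow are skipped if they are not installed.

```python
import random


def set_global_seeds(seed: int = 123) -> None:
    """Seed Python's random module, plus NumPy and TensorFlow when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not available; nothing to seed.
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TensorFlow 2.x API (assumption).
    except ImportError:
        pass  # TensorFlow not available; nothing to seed.


# Call this before importing ivis, as described above.
set_global_seeds(123)
# from ivis import Ivis
```

PYTHONHASHSEED is deliberately left out of the helper, since it needs to be set before the interpreter starts.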

Author

rsarai commented Jan 8, 2021

Are you using Ivis' built-in nearest neighbour search, or are you pre-building the nearest neighbour matrix?

  • I used the snippet you provided before.

Since you mentioned that the results should be identical, I checked some lib versions. I was using ivis 1.8.4; as soon as I updated to 2.0.1, the issue was gone. Thank you for your support.
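Since the fix here turned out to be a version upgrade, it can help to print the installed versions when comparing runs across machines. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_versions` and the default package list are assumptions chosen for illustration.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("ivis", "tensorflow", "numpy")):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, ver in installed_versions().items():
        print(name, ver or "not installed")
```

Including this output in a bug report makes it easier to tell whether two non-matching embeddings come from different library versions.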

rsarai closed this as completed Jan 8, 2021
Szubie mentioned this issue Jan 12, 2021