'psx' is not defined in get_noise_indices() - issue for WINDOWS python users #16

Closed
stonk97 opened this issue Nov 22, 2019 · 11 comments

Comments

@stonk97

stonk97 commented Nov 22, 2019

This is my code:

if __name__ == '__main__':
    .
    .
    .
    est_py, est_nm, est_inv, confident_joint, my_psx = estimate_py_noise_matrices_and_cv_pred_proba(
        X=X_train,
        s=train_labels_with_errors,
        clf=GaussianNB(),
    )
    label_errors = get_noise_indices(train_labels_with_errors, my_psx, verbose=1)

I'm still getting this error, even though psx is declared as global in pruning.py:

Traceback (most recent call last):
  File "C:\Users\Jacopo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\Jacopo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\Jacopo\AppData\Local\Programs\Python\Python37\lib\site-packages\cleanlab\pruning.py", line 109, in _prune_by_count
    noise_mask = np.zeros(len(psx), dtype=bool)
NameError: name 'psx' is not defined


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\Jacopo\.vscode\extensions\ms-python.python-2019.11.49689\pythonFiles\ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "c:\Users\Jacopo\.vscode\extensions\ms-python.python-2019.11.49689\pythonFiles\lib\python\old_ptvsd\ptvsd\__main__.py", line 432, in main
    run()
  File "c:\Users\Jacopo\.vscode\extensions\ms-python.python-2019.11.49689\pythonFiles\lib\python\old_ptvsd\ptvsd\__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "C:\Users\Jacopo\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\Jacopo\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\Jacopo\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\Jacopo\Google Drive\TesiLulli\MachineLearning_Python\3classi\3 classi no BNP\CleanLab\dirty.py", line 28, in <module>
    label_errors = get_noise_indices(train_labels_with_errors,my_psx,verbose=1)
  File "C:\Users\Jacopo\AppData\Local\Programs\Python\Python37\lib\site-packages\cleanlab\pruning.py", line 336, in get_noise_indices
    noise_masks_per_class = p.map(_prune_by_count, range(K))
  File "C:\Users\Jacopo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\Jacopo\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
NameError: name 'psx' is not defined
@awoloshuk

I also have the same issue

@UCASREN

UCASREN commented Nov 26, 2019

@cgnorthcutt I also have the same issue

@cgnorthcutt
Member

cgnorthcutt commented Nov 26, 2019

Hi folks @UCASREN @awoloshuk @stonk97, thanks for sharing. I am unable to reproduce your error. Here is a complete script, which works for me, intended to reproduce it:

import cleanlab
from cleanlab.latent_estimation import estimate_py_noise_matrices_and_cv_pred_proba
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_digits
from cleanlab.pruning import get_noise_indices

data = load_digits()

X_train = data['data']
y_train = data['target']

est_py, est_nm, est_inv, confident_joint, my_psx = estimate_py_noise_matrices_and_cv_pred_proba(
    X=X_train,
    s=y_train,
    clf = GaussianNB()
)

label_errors = get_noise_indices(
    y_train,
    my_psx,
    verbose=1,
)

However, this code runs without issues for me. Can each of you give me more information about the circumstances in which you get the error? Please also include your Python version, how you installed cleanlab, your OS version, etc., after running exactly the code above.
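
One way to gather those details in a single run is sketched below. It relies only on the standard library; `environment_report` is a hypothetical helper name, not part of cleanlab. The multiprocessing start method is included because it differs by platform (Windows uses spawn, while Linux has traditionally defaulted to fork).

```python
import multiprocessing
import platform


def environment_report():
    """Collect the interpreter and OS details that help triage a bug report."""
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        # How worker processes are started on this machine.
        "start_method": multiprocessing.get_start_method(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The installed cleanlab version can be read separately with `pip show cleanlab`.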

@UCASREN

UCASREN commented Nov 26, 2019

I installed cleanlab with 'pip install cleanlab'; my Python version is 3.7.4, on Windows 10. When I run cleanlab-master/examples/iris_simple_example.ipynb, I get the errors below:

WITHOUT confident learning, Iris dataset test accuracy: 0.6

Now we show the improvement using confident learning to characterize the noise
and learn on the data that is (with high confidence) labeled correctly.

WITH confident learning (noise matrix given),


RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "D:\ProgramData\Anaconda3\lib\site-packages\cleanlab\pruning.py", line 109, in _prune_by_count
    noise_mask = np.zeros(len(psx), dtype=bool)
NameError: name 'psx' is not defined
"""

The above exception was the direct cause of the following exception:

NameError Traceback (most recent call last)
in
9 print()
10 print('WITH confident learning (noise matrix given),', end=" ")
---> 11 _ = rp.fit(X_train, s, noise_matrix=noise_matrix)
12 pred = rp.predict(X_test)
13 print("Iris dataset test accuracy:", round(accuracy_score(pred, y_test),2))

D:\ProgramData\Anaconda3\lib\site-packages\cleanlab\classification.py in fit(self, X, s, psx, thresholds, noise_matrix, inverse_noise_matrix)
295 inverse_noise_matrix = self.inverse_noise_matrix,
296 confident_joint = self.confident_joint,
--> 297 prune_method = self.prune_method,
298 )
299

D:\ProgramData\Anaconda3\lib\site-packages\cleanlab\pruning.py in get_noise_indices(s, psx, inverse_noise_matrix, confident_joint, frac_noise, num_to_remove_per_class, prune_method, sorted_index_method, multi_label, n_jobs, verbose)
334 )
335 else:
--> 336 noise_masks_per_class = p.map(_prune_by_count, range(K))
337 else: # n_jobs = 1, so no parallelization
338 noise_masks_per_class = [_prune_by_count(k) for k in range(K)]

D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in map(self, func, iterable, chunksize)
266 in a list that is returned.
267 '''
--> 268 return self._map_async(func, iterable, mapstar, chunksize).get()
269
270 def starmap(self, func, iterable, chunksize=None):

D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):

NameError: name 'psx' is not defined

@cgnorthcutt , thank you!

@cgnorthcutt
Member

@UCASREN can you please run my example above?

@cgnorthcutt
Member

cgnorthcutt commented Nov 26, 2019

SOLUTION

Also, can everyone @stonk97 @awoloshuk @UCASREN please try changing the pruning line to:

label_errors = get_noise_indices(
    y_train,
    my_psx,
    n_jobs=1,  # BE SURE TO ADD THIS -- it turns off multiprocessing
)

Here I have turned off multiprocessing. My guess is that Python's multiprocessing is not working correctly on Windows. For now, this workaround (turning off multiprocessing) should solve your issue.
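
For background: on Windows, Python's multiprocessing starts workers with the spawn method, so each worker re-imports the module and does not see globals that the parent assigned at runtime, which matches the NameError in the tracebacks above. The sketch below illustrates the serial path and one portable alternative, a pool initializer that passes the data to each worker explicitly. All names here (`_shared_psx`, `_init_worker`, `_argmax_mask`, `masks_per_class`) are hypothetical and the per-class job is a toy; this is not cleanlab's actual code.

```python
import multiprocessing as mp

# Hypothetical module-level slot, standing in for the `psx` global in the traceback.
_shared_psx = None


def _init_worker(psx):
    """Runs once in each worker process and gives it its own copy of psx."""
    global _shared_psx
    _shared_psx = psx


def _argmax_mask(k):
    """Toy per-class job: mark the rows whose most likely class is k."""
    return [max(range(len(row)), key=row.__getitem__) == k for row in _shared_psx]


def masks_per_class(psx, n_jobs=1):
    """Compute one boolean mask per class, serially or with a process pool."""
    num_classes = len(psx[0])
    if n_jobs == 1:
        # Serial path: no child processes, so nothing has to be inherited.
        _init_worker(psx)
        return [_argmax_mask(k) for k in range(num_classes)]
    # Parallel path: the initializer hands psx to every worker explicitly,
    # so it does not depend on the fork start method.
    with mp.Pool(n_jobs, initializer=_init_worker, initargs=(psx,)) as pool:
        return pool.map(_argmax_mask, range(num_classes))


if __name__ == "__main__":
    toy_psx = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    print(masks_per_class(toy_psx, n_jobs=1))
```

The `if __name__ == "__main__":` guard matters on Windows whenever `n_jobs` is greater than 1, because spawned workers import the main module and would otherwise re-run the top-level code.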

@UCASREN

UCASREN commented Nov 26, 2019

@cgnorthcutt Yes, after changing the pruning line, it works properly. Thank you very much!

@stonk97
Author

stonk97 commented Nov 26, 2019

@cgnorthcutt The code you wrote runs perfectly! I think you spotted the problem: Windows.
Thank you very much!

@stonk97 stonk97 closed this as completed Nov 26, 2019
@cgnorthcutt cgnorthcutt changed the title from "'psx' is not defined in get_noise_indices()" to "'psx' is not defined in get_noise_indices() - issue for WINDOWS python users" Nov 26, 2019
@kagalkot

kagalkot commented Dec 21, 2019

@cgnorthcutt, I am implementing the Rank Pruning algorithm on the Iris dataset. After using the code below, "NameError: name 'psx' is not defined" is resolved, but now I get "NameError: name 'my_psx' is not defined":
label_errors = get_noise_indices(
    y_train,
    my_psx,
    n_jobs=1,  # BE SURE TO ADD THIS -- it turns off multiprocessing
)

@cgnorthcutt
Member

Yes, because my_psx does not exist. You should write psx=my_psx.

@cgnorthcutt
Member

cgnorthcutt commented Feb 17, 2020

@kagalkot @stonk97 @awoloshuk @UCASREN Issues using Windows should be fixed. The main commit with the fix is cgnorthcutt@cf93242.

cleanlab now supports Windows multiprocessing and Python 3.4, 3.5, 3.6, 3.7 natively.

Upgrade your cleanlab version to 0.1.1 for the fix. To update, run pip install --upgrade cleanlab in your terminal; it should install the latest version. (A plain pip install cleanlab does not upgrade a copy that is already installed.)
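
To confirm that the installed copy is at least 0.1.1, something like the sketch below may help. It assumes Python 3.8 or newer (for `importlib.metadata`) and purely illustrative helper names (`version_tuple`, `has_windows_fix`); only the leading numeric part of the version string is compared.

```python
import re
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def version_tuple(text):
    """Turn '0.1.1' into (0, 1, 1), ignoring any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_windows_fix(installed, minimum="0.1.1"):
    """True if `installed` is at least the release named in this thread."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = version("cleanlab")
    except PackageNotFoundError:
        print("cleanlab is not installed")
    else:
        status = "includes" if has_windows_fix(installed) else "predates"
        print(f"cleanlab {installed} {status} the 0.1.1 fix")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "0.10.0" as older than "0.9.0".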
