@bibikar bibikar commented Apr 26, 2019

@oleksandr-pavlyk

This PR contains both refactoring and new benchmarks added since the last PR.

  • Universally read from .npy files as described in numpy's NEP-0001. Note: the current reader used in native code, npyfile.h, does not interpret dtypes; it simply reads the entire array into memory and relies on the caller to interpret the data.
  • Factor out all dataset generation into make_datasets.py
  • Move sklearn benches to sklearn/ for clarity
  • Add daal4py benches for correlation/cosine distances, linear/ridge regression, kmeans, SVM
  • Add native/sklearn benchmarks for logistic regression with the L-BFGS solver and for RandomForest regression and classification
  • Move kmeans.predict benchmarks into kmeans bench files
  • Update thread setting to use daal4py API in sklearn/daal4py benches
  • Update license headers
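
To illustrate the .npy note in the first bullet, here is a minimal sketch of the difference between a header-aware read and a read that leaves dtype interpretation to the caller. The sample array and variable names are hypothetical and not taken from this PR, and the second read only approximates in Python what a dtype-unaware native reader would do.

```python
import io

import numpy as np

# Hypothetical sample data; the PR's real datasets come from make_datasets.py.
X = np.arange(12, dtype=np.float64).reshape(4, 3)

# Write the array in .npy format to an in-memory buffer.
buf = io.BytesIO()
np.save(buf, X)

# Header-aware read: np.load takes dtype and shape from the .npy header.
buf.seek(0)
X_loaded = np.load(buf)

# Header-unaware read: skip past the header, then treat the payload as raw
# bytes. The caller has to know (or assume) that the data is float64.
buf.seek(0)
version = np.lib.format.read_magic(buf)
if version == (1, 0):
    shape, fortran_order, header_dtype = np.lib.format.read_array_header_1_0(buf)
else:
    shape, fortran_order, header_dtype = np.lib.format.read_array_header_2_0(buf)
raw_bytes = buf.read()
X_raw = np.frombuffer(raw_bytes, dtype=np.float64).reshape(shape)

print(X_loaded.dtype, X_loaded.shape)
print(np.array_equal(X_raw, X_loaded))
```

If the caller assumed the wrong dtype in the second read (for example float32), the bytes would still load but the values would be meaningless, which is the risk the note describes.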

X_init = np.load(args.filei)
X_mult = np.vstack((X,) * args.data_multiplier)

kmeans = KMeans(n_clusters=10, n_jobs=int(args.num_threads), tol=1e-16,
Contributor

n_jobs does not control the number of threads used by DAAL when we run this in IDP. What does?

Contributor Author

On line 30, the following code

num_threads, daal_version = bench.prepare_benchmark(args)

calls daal4py.daalinit(nthreads=args.num_threads) and disables finiteness checking for sklearn.

Do we want to remove the n_jobs argument from KMeans?
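
For readers without the repository at hand, the two effects described above can be sketched roughly as follows. This is not the actual bench.prepare_benchmark implementation: the helper name set_benchmark_threads and its return value are hypothetical, and the imports are guarded so the sketch still runs where daal4py or scikit-learn is not installed.

```python
def set_benchmark_threads(num_threads):
    """Hypothetical helper: set the DAAL thread count and relax sklearn checks.

    Returns a dict recording which of the two steps were actually applied.
    """
    applied = {"daal4py_threads": False, "sklearn_assume_finite": False}

    try:
        import daal4py

        if num_threads > 0:
            # Controls the number of threads DAAL uses, independent of n_jobs.
            daal4py.daalinit(nthreads=num_threads)
            applied["daal4py_threads"] = True
    except ImportError:
        pass  # daal4py not installed; leave threading untouched

    try:
        import sklearn

        # Skip scikit-learn's NaN/inf validation of input arrays.
        sklearn.set_config(assume_finite=True)
        applied["sklearn_assume_finite"] = True
    except ImportError:
        pass  # scikit-learn not installed

    return applied


status = set_benchmark_threads(2)
print(status)
```

Under this arrangement, n_jobs on the estimator only affects scikit-learn's own parallelism, while the daal4py call governs the DAAL-backed code path.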

Contributor

We can keep it, to ensure that upstream KMeans is run in parallel as well.

@bibikar bibikar merged commit 2bb768e into IntelPython:master May 8, 2019
razdoburdin pushed a commit to razdoburdin/scikit-learn_bench that referenced this pull request Jun 13, 2023
Updating with macOS build details, and adjusted build flow.
* fixing the links in the sphinx docs