
Disable chunked_forward() on AVX512 CPUs #179

Merged
borzunov merged 7 commits into main from no-chunked-forward-on-avx512 on Jan 4, 2023

Conversation


@borzunov borzunov commented Jan 4, 2023

See benchmarks in #176.

if self.word_embeddings.weight.numel() * 4 < 0.9 * psutil.virtual_memory().total:
    logger.warning(
        "Running the client with dtype bfloat16 on CPU may be slow, since your CPU doesn't support AVX512. "
        "Consider loading the model with torch_dtype='float32'"

In the future, we could also suggest using fast_but_approximate=True here, which would run faiss under the hood for greedy and top_k generation.
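
For readers following the diff excerpt above, here is a minimal sketch of the decision it appears to make: suggest float32 only when the CPU lacks AVX512 and a float32 copy of the embeddings (4 bytes per parameter) would fit in about 90% of RAM. The helper names `float32_copy_fits`, `has_avx512f`, and `should_suggest_float32` are hypothetical and not part of this PR, and reading CPU flags from `/proc/cpuinfo`-style text is an assumption that only holds on Linux; the actual change may detect AVX512 differently.

```python
# Hypothetical helpers illustrating the check in the diff; not the PR's code.

def float32_copy_fits(num_params: int, total_ram_bytes: int, headroom: float = 0.9) -> bool:
    """True if num_params float32 values (4 bytes each) stay under headroom * RAM."""
    return num_params * 4 < headroom * total_ram_bytes


def has_avx512f(cpuinfo_text: str) -> bool:
    """Look for the 'avx512f' flag in text formatted like Linux /proc/cpuinfo (assumption)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx512f" in value.split():
            return True
    return False


def should_suggest_float32(cpuinfo_text: str, num_params: int, total_ram_bytes: int) -> bool:
    """Suggest torch_dtype='float32' only if AVX512 is missing and float32 weights fit in RAM."""
    return not has_avx512f(cpuinfo_text) and float32_copy_fits(num_params, total_ram_bytes)
```

In a real client, `num_params` would come from `self.word_embeddings.weight.numel()` and `total_ram_bytes` from `psutil.virtual_memory().total`, as in the excerpt.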

@borzunov borzunov force-pushed the no-chunked-forward-on-avx512 branch from 203e2ca to ce76beb Compare January 4, 2023 18:20
@borzunov borzunov force-pushed the no-chunked-forward-on-avx512 branch from ce76beb to 7880687 Compare January 4, 2023 18:25
@borzunov borzunov merged commit 5569838 into main Jan 4, 2023
@borzunov borzunov deleted the no-chunked-forward-on-avx512 branch January 4, 2023 19:28