Cargo fails to build on Apple M1 #10
Do you have CUDA installed? If yes, try setting
And no, you shouldn't need CUDA, but I may have messed up my options. It's going to be slow on M1, as there's no matmul acceleration on M1 for now; I focused on Intel and CUDA.
I have a working matmul acceleration using the Accelerate framework on M1; it's very fast. Here is how you do it:

```c
#include <Accelerate/Accelerate.h>

// C[m][n] = A[m][k] * B[k][n], all matrices column-major
void acc_sgemm(int m, int n, int k, float *A, float *B, float *C) {
    cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, m, B, k, 0.0, C, m);
}

// Same, but A is stored as A[k][m] and transposed on the fly
void acc_sgemm_t(int m, int n, int k, float *A, float *B, float *C) {
    cblas_sgemm(CblasColMajor, CblasTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, k, 0.0, C, m);
}
```

and compile with:
Let's get it working first without acceleration. I don't have CUDA, and I don't think it even works on M1, correct? Do you know what I need to do to compile your code and run it?
I fixed on
It works now:

```
$ cargo run --example run --release
Compiling serde v1.0.152
Compiling tokio-util v0.7.7
Compiling onig_sys v69.8.1
Compiling tower v0.4.13
Compiling tokio-native-tls v0.3.1
Compiling axum-core v0.3.2
Compiling derive_builder v0.12.0
Compiling unicode-normalization-alignments v0.1.12
Compiling esaxx-rs v0.1.8
Compiling rand v0.8.5
Compiling itertools v0.9.0
Compiling onig v6.4.0
Compiling thread-tree v0.3.3
Compiling tower-http v0.4.0
Compiling h2 v0.3.16
Compiling sharded-slab v0.1.4
Compiling tracing-log v0.1.3
Compiling futures-executor v0.3.26
Compiling thread_local v1.1.7
Compiling encoding_rs v0.8.32
Compiling sync_wrapper v0.1.2
Compiling ipnet v2.7.1
Compiling fast_gpt2 v0.1.0 (/Users/ondrej/repos/fast_gpt2)
Compiling matchit v0.7.0
Compiling unicode_categories v0.1.1
Compiling rawpointer v0.2.1
Compiling base64 v0.21.0
Compiling matrixmultiply v0.3.2
Compiling tracing-subscriber v0.3.16
Compiling futures v0.3.26
Compiling tower-http v0.3.5
Compiling memmap2 v0.5.10
Compiling serde_json v1.0.93
Compiling serde_urlencoded v0.7.1
Compiling serde_path_to_error v0.1.9
Compiling spm_precompiled v0.1.4
Compiling safetensors v0.2.9 (https://github.com/huggingface/safetensors#488d945c)
Compiling tokenizers v0.13.2 (https://github.com/huggingface/tokenizers?branch=main#ac552ff8)
Compiling hyper v0.14.24
Compiling hyper-tls v0.5.0
Compiling axum v0.6.9
Compiling reqwest v0.11.14
Finished release [optimized] target(s) in 36.00s
Running `target/release/examples/run`
Downloading "https://huggingface.co/gpt2/resolve/main/model.safetensors" into "model-gpt2.safetensors"
Safetensors 108.405665s
Downloading "https://huggingface.co/gpt2/resolve/main/tokenizer.json" into "tokenizer-gpt2.json"
Tokenizer 109.764477208s
Loaded & encoded 110.203344708s
Loop in 83.535791ms
Loop in 81.04725ms
Loop in 80.296291ms
Loop in 79.598666ms
Loop in 82.383833ms
Loop in 79.87375ms
Loop in 80.236916ms
Loop in 80.190708ms
Loop in 80.888ms
Loop in 79.666916ms
Result Ok("My name is John. I'm a man of God. I")
Total Inference 111.011249708s
```

Thanks for the fix!
Somehow
Here is what I got:
Do you require a GPU to run `fast_gpt2`?