Merged
115 changes: 114 additions & 1 deletion .wordlist.txt
Original file line number Diff line number Diff line change
@@ -4474,4 +4474,117 @@ AssetLib
PerformanceStudio
VkThread
precompiled
rollouts
rollouts
Bhusari
DLLAMA
FlameGraph
FlameGraphs
JSP
KBC
MMIO
Paravirtualized
PreserveFramePointer
Servlet
TDISP
VirtIO
WebSocket
agentpath
alarmtimer
aoss
apb
ata
bpf
brendangregg
chipidea
clk
cma
counterintuitive
cpuhp
cros
csd
devfreq
devlink
dma
dpaa
dwc
ecurity
edma
evice
filelock
filemap
flamegraphs
fsl
glink
gpu
hcd
hns
hw
hwmon
icmp
initcall
iomap
iommu
ipi
irq
jbd
jvmti
kmem
ksm
kvm
kyber
libata
libperf
lockd
mdio
memcg
mmc
mtu
musb
napi
ncryption
netfs
netlink
nfs
ntegrity
nterface
oom
optee
pagemap
paravirtualized
percpu
printk
pwm
qcom
qdisc
ras
rcu
regmap
rgerganov’s
rotocol
rpcgss
rpmh
rseq
rtc
sched
scmi
scsi
skb
smbus
smp
spi
spmi
sunrpc
swiotlb
tegra
thp
tlb
udp
ufs
untrusted
uring
virtio
vmalloc
vmscan
workqueue
xdp
xhci
24 changes: 15 additions & 9 deletions content/learning-paths/servers-and-cloud-computing/_index.md
@@ -8,8 +8,8 @@ key_ip:
maintopic: true
operatingsystems_filter:
- Android: 2
- - Linux: 154
- - macOS: 10
+ - Linux: 157
+ - macOS: 11
- Windows: 14
pinned_modules:
- module:
@@ -22,8 +22,8 @@ subjects_filter:
- Containers and Virtualization: 29
- Databases: 15
- Libraries: 9
- - ML: 28
- - Performance and Architecture: 60
+ - ML: 29
+ - Performance and Architecture: 62
- Storage: 1
- Web: 10
subtitle: Optimize cloud native apps on Arm for performance and cost
@@ -47,6 +47,8 @@ tools_software_languages_filter:
- ASP.NET Core: 2
- Assembly: 4
- assembly: 1
- Async-profiler: 1
- AWS: 1
- AWS CDK: 2
- AWS CodeBuild: 1
- AWS EC2: 2
@@ -65,7 +67,7 @@ tools_software_languages_filter:
- C++: 8
- C/C++: 2
- Capstone: 1
- - CCA: 6
+ - CCA: 7
- Clair: 1
- Clang: 10
- ClickBench: 1
@@ -77,18 +79,19 @@ tools_software_languages_filter:
- Daytona: 1
- Demo: 3
- Django: 1
- - Docker: 17
+ - Docker: 18
- Envoy: 2
- ExecuTorch: 1
- FAISS: 1
- FlameGraph: 1
- Flink: 1
- Fortran: 1
- FunASR: 1
- FVP: 4
- GCC: 22
- gdb: 1
- Geekbench: 1
- - GenAI: 11
+ - GenAI: 12
- GitHub: 6
- GitLab: 1
- Glibc: 1
@@ -114,7 +117,7 @@ tools_software_languages_filter:
- Linaro Forge: 1
- Litmus7: 1
- Llama.cpp: 1
- - LLM: 9
+ - LLM: 10
- llvm-mca: 1
- LSE: 1
- MariaDB: 1
@@ -132,6 +135,7 @@ tools_software_languages_filter:
- Ollama: 1
- ONNX Runtime: 1
- OpenBLAS: 1
- OpenJDK-21: 1
- OpenShift: 1
- OrchardCore: 1
- PAPI: 1
@@ -144,7 +148,7 @@ tools_software_languages_filter:
- RAG: 1
- Redis: 3
- Remote.It: 2
- - RME: 6
+ - RME: 7
- Runbook: 71
- Rust: 2
- snappy: 1
@@ -161,6 +165,7 @@ tools_software_languages_filter:
- TensorFlow: 2
- Terraform: 11
- ThirdAI: 1
- Tomcat: 1
- Trusted Firmware: 1
- TSan: 1
- TypeScript: 1
@@ -173,6 +178,7 @@ tools_software_languages_filter:
- Whisper: 1
- WindowsPerf: 1
- WordPress: 3
- wrk2: 1
- x265: 1
- zlib: 1
- Zookeeper: 1
@@ -46,7 +46,7 @@ If everything was built correctly, you should see a list of all the available fl

Communication between the master node and the worker nodes occurs through a socket created on each worker. This socket listens for incoming data from the master—such as model parameters, tokens, hidden states, and other inference-related information.
{{% notice Note %}}The RPC feature in llama.cpp is not secure by default, so you should never expose it to the open internet. To mitigate this risk, ensure that the security groups for all your EC2 instances are properly configured—restricting access to only trusted IPs or internal VPC traffic. This helps prevent unauthorized access to the RPC endpoints.{{% /notice %}}
- Use the following command to start the listeneing on the worker nodes:
+ Use the following command to start the listener on the worker nodes:
```bash
bin/rpc-server -p 50052 -H 0.0.0.0 -t 64
```
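
With a worker listening on each node, the master side needs those endpoints as a comma-separated list for the `--rpc "$worker_ips"` flag used later in this Learning Path. A small bash sketch for building that list; the IP addresses below are placeholders for your workers' private IPs:

```shell
# Build the comma-separated endpoint list expected by --rpc.
# The addresses are placeholders; substitute your workers' private IPs.
workers=("10.0.1.10" "10.0.1.11")
worker_ips=$(IFS=','; echo "${workers[*]/%/:50052}")
echo "$worker_ips"   # prints "10.0.1.10:50052,10.0.1.11:50052"
```

The port suffix matches the `-p 50052` value passed to `rpc-server` above; change both together if you pick a different port.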
@@ -190,7 +190,7 @@ llama_perf_context_print: eval time = 77429.95 ms / 127 runs ( 609
llama_perf_context_print: total time = 79394.06 ms / 132 tokens
llama_perf_context_print: graphs reused = 0
```
- That's it! You have sucessfully run the llama-3.1-8B model on CPUs with the power of llama.cpp RPC functionality. The following table provides brief description of the metrics from `llama_perf`: <br><br>
+ That's it! You have successfully run the llama-3.1-8B model on CPUs using the llama.cpp RPC functionality. The following table provides a brief description of the metrics from `llama_perf`: <br><br>

| Log Line | Description |
|-------------------|-----------------------------------------------------------------------------|
@@ -200,11 +200,11 @@ That's it! You have sucessfully run the llama-3.1-8B model on CPUs with the powe
| eval time | Time to generate output tokens by forward-passing through the model. |
| total time | Total time for both prompt processing and token generation (excludes model load). |
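
Plugging in the numbers from the log above, the eval line converts to decode throughput as follows (a quick bash/awk sketch):

```shell
# Decode throughput from the llama_perf eval line:
# 127 output tokens generated in 77429.95 ms.
eval_ms=77429.95
n_tokens=127
tps=$(awk -v ms="$eval_ms" -v n="$n_tokens" 'BEGIN { printf "%.2f", n / (ms / 1000) }')
echo "decode throughput: $tps tokens/sec"   # prints "decode throughput: 1.64 tokens/sec"
```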

- Lastly to set up OpenAI compatible API, you can use the `llama-server` functionality. The process of implementing this is described [here](/learning-paths/servers-and-cloud-computing/llama-cpu) under the "Access the chatbot using the OpenAI-compatible API" section. Here is a snippet, for how to set up llama-server for disributed inference:
+ Lastly, to set up an OpenAI-compatible API, you can use the `llama-server` functionality. The process is described [here](/learning-paths/servers-and-cloud-computing/llama-cpu) under the "Access the chatbot using the OpenAI-compatible API" section. Here is a snippet showing how to set up llama-server for distributed inference:
```bash
bin/llama-server -m /home/ubuntu/model.gguf --port 8080 --rpc "$worker_ips" -ngl 99
```
- At the very end of the output to the above command, you will see somethin like the following:
+ At the very end of the output from the above command, you will see something like the following:
```output
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv update_slots: all slots are idle
@@ -87,7 +87,7 @@ Move the executable to somewhere in your PATH:
```bash
sudo cp wrk /usr/local/bin
```

- 3. Finally, you can run the benchamrk of Tomcat through wrk2.
+ 3. Finally, you can run the Tomcat benchmark with wrk2.
```bash
wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
```
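
A note on the flags: `-R50000` sets the total target throughput, which wrk2 spreads across the `-c32` open connections, while `-t16` threads drive those connections for `-d60` seconds. The implied per-connection pacing works out as:

```shell
# Per-connection pacing implied by the wrk2 flags above.
rate=50000   # -R: total target requests/sec
conns=32     # -c: open connections
per_conn=$(awk -v r="$rate" -v c="$conns" 'BEGIN { printf "%.1f", r / c }')
echo "each connection paces ~$per_conn req/s"   # prints "each connection paces ~1562.5 req/s"
```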