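Ollama is an open-source project that simplifies running large language models (LLMs) on a local machine. It offers a simple command-line interface and an HTTP API, which make advanced AI models easy to obtain and customize.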

**Install Ollama**

```sh
curl -fsSL https://ollama.com/install.sh | sh
```

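If Ollama does not use your GPU by default, or if you run Ollama inside a container such as Docker, you can set the `OLLAMA_NUM_GPU` environment variable to specify the GPU allocation explicitly.

Steps:
- Set the environment variable: `export OLLAMA_NUM_GPU=-1`
- Start Ollama: `ollama serve`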
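When the server is started from Python instead of a shell, the same variable can be passed through the child process environment rather than exported globally. A minimal sketch, assuming the `OLLAMA_NUM_GPU` name from the steps above; `build_serve_env` and `start_ollama_serve` are hypothetical helper names, not part of Ollama:

```python
import os
import subprocess


def build_serve_env(extra: dict[str, str]) -> dict[str, str]:
    """Return a copy of the current environment with `extra` layered on top."""
    env = dict(os.environ)  # copy, so the notebook's own environment is untouched
    env.update(extra)
    return env


def start_ollama_serve(extra_env: dict[str, str]) -> subprocess.Popen:
    """Launch `ollama serve` in the background with the extra variables set."""
    # Popen returns immediately; the server keeps running in the child process.
    return subprocess.Popen(["ollama", "serve"], env=build_serve_env(extra_env))


# Assumption: OLLAMA_NUM_GPU is the variable named in the steps above.
gpu_env = build_serve_env({"OLLAMA_NUM_GPU": "-1"})
print(gpu_env["OLLAMA_NUM_GPU"])
# proc = start_ollama_serve({"OLLAMA_NUM_GPU": "-1"})  # uncomment to start the server
```

Scoping the variable to the child process keeps the setting visible only to the server it is meant for.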

> `ollama list` shows the models that have already been downloaded locally.
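The same list is available from a running server over HTTP at `GET /api/tags`, which is handy inside a notebook. A stdlib-only sketch, assuming the default host `http://localhost:11434` and that each entry in the response's `models` array carries a `name` field; `parse_model_names` and `list_local_models` are hypothetical helpers:

```python
import json
import urllib.request


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from a /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def list_local_models(host: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models are downloaded locally."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))


# Offline check of the parser with a hand-written payload (not real server output).
sample = {"models": [{"name": "qwen3:0.6b"}, {"name": "llama3.2:latest"}]}
print(parse_model_names(sample))  # ['qwen3:0.6b', 'llama3.2:latest']
```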

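Ollama has to run as a background service alongside the notebook's code. Because Jupyter Notebook executes cells one after another, it is hard to keep one cell running while another executes. As a workaround, we start the service with Python's `subprocess` module so that it does not block the execution of any cell.

The service is started with the command `ollama serve`.

`time.sleep(5)` adds a delay so the Ollama service has time to finish starting before the model is downloaded.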

In [17]:
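import subprocess
import time

# Popen returns immediately, so `ollama serve` keeps running in the background
# while later cells execute. Keep the handle to stop it with ollama_proc.terminate().
ollama_proc = subprocess.Popen(["ollama", "serve"])

# Give the server a few seconds to start before pulling a model.
time.sleep(5)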

time=2026-01-12T10:01:39.070Z level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/codespace/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAM
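A fixed `time.sleep(5)` can be too short on a slow machine and wastes time on a fast one. A minimal sketch that polls the server instead, assuming it answers `GET /` on port 11434 with HTTP 200 once it is ready (the `HEAD "/"` request answered with 200 in the pull log below is consistent with this); `wait_for_server` is a hypothetical helper:

```python
import time
import urllib.request


def wait_for_server(url: str = "http://127.0.0.1:11434",
                    timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused or timed out: server not ready yet
        time.sleep(interval)
    return False


# Usage in place of time.sleep(5):
# ollama_proc = subprocess.Popen(["ollama", "serve"])
# if not wait_for_server():
#     ollama_proc.terminate()
#     raise RuntimeError("ollama serve did not start in time")
```

Returning a boolean rather than raising leaves the caller free to decide whether a slow start is fatal.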

**Pull a model**

Download the LLM with `ollama pull qwen3:0.6b`.

For other models, see https://ollama.com/library

In [18]:
!ollama pull qwen3:0.6b

[GIN] 2026/01/12 - 10:01:52 | 200 |      50.745µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/01/12 - 10:01:53 | 200 |  962.297607ms |       127.0.0.1 | POST     "/api/pull"
pulling manifest
pulling 7f4030143c1c: 100% ▕██████████████████▏ 522 MB
pulling ae370d884f10: 100% ▕██████████████████▏ 1.7 KB
pulling d18a5cc71b84: 100% ▕██████████████████▏  11 KB
pulling cff3f395ef37: 100% ▕██████████████████▏  120
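The server log above shows that `ollama pull` is served by `POST /api/pull`, which streams one JSON status object per line. A stdlib-only sketch of doing the same from Python, assuming the request body `{"model": ...}` and the response fields `status`, `total`, and `completed` described in the Ollama REST API docs; `describe_progress` and `pull_model` are hypothetical helpers:

```python
import json
import urllib.request


def describe_progress(line: bytes) -> str:
    """Turn one streamed /api/pull JSON line into a short progress string."""
    event = json.loads(line)
    status = event.get("status", "")
    total, completed = event.get("total"), event.get("completed")
    if total:  # layer downloads report byte counts; other statuses do not
        return f"{status}: {100 * (completed or 0) / total:.0f}%"
    return status


def pull_model(name: str, host: str = "http://localhost:11434") -> None:
    """Stream a model download from a running Ollama server, printing progress."""
    req = urllib.request.Request(
        f"{host}/api/pull",
        data=json.dumps({"model": name}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line
            if line.strip():
                print(describe_progress(line))


print(describe_progress(b'{"status": "pulling 7f4030143c1c", "total": 200, "completed": 50}'))
# pull_model("qwen3:0.6b")  # requires `ollama serve` to be running
```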

In [10]:
# Global dependencies required by this notebook:
%pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.2.1-py3-none-any.whl.metadata (25 kB)
Downloading python_dotenv-1.2.1-py3-none-any.whl (21 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.2.1
Note: you may need to restart the kernel to use updated packages.


In [12]:
import os
from IPython.display import Markdown
from dotenv import load_dotenv

load_dotenv()

True

**A simple example of calling the model with the ollama SDK**

In [1]:
%pip install ollama

Collecting ollama
  Downloading ollama-0.6.1-py3-none-any.whl.metadata (4.3 kB)
Collecting pydantic>=2.9 (from ollama)
  Downloading pydantic-2.12.5-py3-none-any.whl.metadata (90 kB)
Collecting annotated-types>=0.6.0 (from pydantic>=2.9->ollama)
  Downloading annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.41.5 (from pydantic>=2.9->ollama)
  Downloading pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.3 kB)
Collecting typing-inspection>=0.4.2 (from pydantic>=2.9->ollama)
  Downloading typing_inspection-0.4.2-py3-none-any.whl.metadata (2.6 kB)
Downloading ollama-0.6.1-py3-none-any.whl (14 kB)
Downloading pydantic-2.12.5-py3-none-any.whl (463 kB)
Downloading pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 37.8 MB/s  0:00:00
Downloading annotated_types-0.7.0-py3-none-any.

In [19]:
from ollama import Client

client = Client(host="http://localhost:11434")

resp = client.chat(
    model="qwen3:0.6b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
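        # "Please explain the basic concepts of machine learning"; the /no_think suffix asks Qwen3 to skip its thinking step
        {"role": "user", "content": "请解释一下机器学习的基本概念/no_think"},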
    ],
)

display(Markdown(resp["message"]["content"]))

time=2026-01-12T10:02:06.421Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
time=2026-01-12T10:02:06.495Z level=INFO source=server.go:245 msg="enabling flash attention"
time=2026-01-12T10:02:06.495Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/codespace/.ollama/models/blobs/sha256-7f4030143c1c477224c5434f8272c662a8b042079a0a584f0a27a1684fe2e1fa --port 41917"
time=2026-01-12T10:02:06.496Z level=INFO source=sched.go:443 msg="system memory" total="15.6 GiB" free="11.3 GiB" free_swap="0 B"
time=2026-01-12T10:02:06.496Z level=INFO source=server.go:746 msg="loading model" "model layers"=29 requested=-1
time=2026-01-12T10:02:06.507Z level=INFO source=runner.go:1405 msg="starting ollama engine"
time=2026-01-12T10:02:06.508Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:41917"
time=2026-01-12T10:02:06.518Z leve

[GIN] 2026/01/12 - 10:02:16 | 200 | 10.066402388s |       127.0.0.1 | POST     "/api/chat"


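Machine learning is an artificial intelligence technique that lets computers learn rules and patterns from data in order to make predictions or decisions. Its core concepts include:

1. **Data**: the foundation for training a model, including raw datasets and annotations (such as class labels for images).
2. **Algorithms**: mathematical methods for processing data and producing predictions or decisions (such as linear regression and decision trees).
3. **Objective**: the ultimate goal of a model is to identify patterns, predict outcomes, or optimize decision-making.
4. **Applications**: widely used in image recognition, recommender systems, natural language processing, and other fields.

Put simply, machine learning is like "training an agent with data", letting it "learn" from experience how to solve problems.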
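`client.chat` is a thin wrapper over the server's `POST /api/chat` endpoint. A stdlib-only sketch of the same request, assuming the body fields `model`, `messages`, and `stream` and the reply field `message.content` from the Ollama REST API docs. Stripping a `<think>...</think>` block is an extra assumption for Qwen3 output that may still contain an empty one; `build_chat_request`, `strip_think`, and `chat` are hypothetical helpers:

```python
import json
import re
import urllib.request


def build_chat_request(model: str, user_text: str) -> dict:
    """Body for /api/chat; stream=False asks for a single JSON reply."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "stream": False,
    }


def strip_think(text: str) -> str:
    """Drop a <think>...</think> block if the model emitted one."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def chat(model: str, user_text: str, host: str = "http://localhost:11434") -> str:
    """Send one chat turn to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_request(model, user_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return strip_think(json.load(resp)["message"]["content"])


print(strip_think("<think>\n\n</think>\n\nMachine learning is ..."))
# print(chat("qwen3:0.6b", "Explain the basic concepts of machine learning /no_think"))
```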

**A simple example of calling the model with `langchain-openai`**

In [7]:
%pip install langchain[openai]

Collecting langchain[openai]
  Downloading langchain-1.2.3-py3-none-any.whl.metadata (4.9 kB)
Collecting langchain-core<2.0.0,>=1.2.1 (from langchain[openai])
  Downloading langchain_core-1.2.7-py3-none-any.whl.metadata (3.7 kB)
Collecting langgraph<1.1.0,>=1.0.2 (from langchain[openai])
  Downloading langgraph-1.0.5-py3-none-any.whl.metadata (7.4 kB)
Collecting langchain-openai (from langchain[openai])
  Downloading langchain_openai-1.1.7-py3-none-any.whl.metadata (2.6 kB)
Collecting jsonpatch<2.0.0,>=1.33.0 (from langchain-core<2.0.0,>=1.2.1->langchain[openai])
  Downloading jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting langsmith<1.0.0,>=0.3.45 (from langchain-core<2.0.0,>=1.2.1->langchain[openai])
  Downloading langsmith-0.6.2-py3-none-any.whl.metadata (15 kB)
Collecting tenacity!=8.4.0,<10.0.0,>=8.1.0 (from langchain-core<2.0.0,>=1.2.1->langchain[openai])
  Downloading tenacity-9.1.2-py3-none-any.whl.metadata (1.2 kB)
Collecting uuid-utils<1.0,>=0.12.0 (from lang

In [None]:
from langchain.chat_models import init_chat_model
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers.string import StrOutputParser

template = """Question: {question}
Answer: Let's think step by step."""

prompt = ChatPromptTemplate.from_template(template)

model = init_chat_model(
    model="qwen3:0.6b",
    model_provider="openai",
    base_url="http://localhost:11434/v1",
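    # Ollama ignores the key, but the OpenAI client refuses to start without a non-empty one.
    api_key=os.getenv("API_KEY", "ollama"),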
)

chain = prompt | model | StrOutputParser()

display(Markdown(chain.invoke({"question": "What's the length of hypotenuse in a right angled triangle"})))

The length of the hypotenuse in a right-angled triangle can be determined using **Pythagoras' Theorem**, which states that in a right-angled triangle, the square of the hypotenuse is equal to the sum of the squares of the other two sides:  

$$
c^2 = a^2 + b^2
$$  

Where:  
- $ c $ is the length of the hypotenuse,  
- $ a $ and $ b $ are the lengths of the other two sides.  

Since no specific values are provided, the length of the hypotenuse is expressed using this formula rather than numerical values.  

**Final Answer:**  
The length of the hypotenuse is given by the formula $ c^2 = a^2 + b^2 $.  

For example, if $ a = 3 $ and $ b = 4 $, the hypotenuse $ c $ would be:  
$$
c = \sqrt{3^2 + 4^2} = \sqrt{25} = 5.
$$  

This means the final answer is left in the format $ c^2 $ as requested.
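With `model_provider="openai"` and `base_url="http://localhost:11434/v1"`, LangChain talks to Ollama's OpenAI-compatible endpoint, `POST /v1/chat/completions`. A stdlib-only sketch of the request the chain above roughly corresponds to, assuming the standard Chat Completions shapes (`messages` in, `choices[0].message.content` out) and that the template is rendered into a single user message; `build_openai_request`, `extract_answer`, and `ask` are hypothetical helpers:

```python
import json
import os
import urllib.request

TEMPLATE = """Question: {question}
Answer: Let's think step by step."""


def build_openai_request(model: str, question: str) -> dict:
    """Chat Completions body: the rendered template becomes one user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": TEMPLATE.format(question=question)}],
    }


def extract_answer(response: dict) -> str:
    """Pull the assistant text out of a Chat Completions response."""
    return response["choices"][0]["message"]["content"]


def ask(question: str, base_url: str = "http://localhost:11434/v1") -> str:
    """Send the templated question to the OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_openai_request("qwen3:0.6b", question)).encode(),
        headers={
            "Content-Type": "application/json",
            # Ollama does not check the key; "ollama" is a placeholder fallback.
            "Authorization": f"Bearer {os.getenv('API_KEY', 'ollama')}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return extract_answer(json.load(resp))
```

Seeing the raw request makes it clear that switching between the native `/api/chat` and the `/v1` endpoint only changes the URL and the response shape, not the model.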