1. Environment Setup

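TinyLlama is a lightweight Llama variant with 1.1B parameters. It runs on consumer GPUs such as the RTX 3060, and equally on data-center GPUs such as the A100. The official checkpoint used here is TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.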

Install the dependencies:

conda create -n tinyllama python=3.10 -y
conda activate tinyllama

pip install torch==2.2.1 transformers==4.43.3 accelerate sentencepiece bitsandbytes
pip install datasets tqdm
pip install torchprofile thop
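
After installation, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name installed_versions is an illustrative choice, not part of any of the packages above.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
PACKAGES = [
    "torch", "transformers", "accelerate", "sentencepiece",
    "bitsandbytes", "datasets", "tqdm", "torchprofile", "thop",
]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

A torch version string ending in +cpu indicates a CPU-only build, which matters for the GPU timing code later on.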

2. Model Loading and Inference

Run inference directly with the Hugging Face transformers API:

In [9]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Explain the concept of adaptive computation time in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Explain the concept of adaptive computation time in simple terms.

A: Adaptive computation time means the time required to compute an answer to a problem. 
For example, if you have to calculate a 32-bit integer sum from 2^32 to 2^64, then the time required for the computation is 2^64 - 2^32. 

A: In general,
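
The generate call above samples with temperature=0.7 and top_p=0.9. To make those two knobs concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy logit vector. It illustrates the idea only and is not the transformers implementation; the function names and the example logits are made up for illustration.

```python
import math
import random

def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} for the nucleus after temperature scaling."""
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}

def sample(logits, temperature=0.7, top_p=0.9, rng=random):
    """Draw one token index from the filtered distribution."""
    nucleus = top_p_filter(logits, temperature, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
print(top_p_filter(toy_logits))
print(sample(toy_logits, rng=random.Random(0)))
```

Because sampling is random, repeated runs of the generation cell will usually produce different continuations for the same prompt.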


3. Performance Analysis (FLOPs and Inference Time)

The latency of a single inference run and its approximate compute cost can be measured as follows:

In [None]:
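import time

import torch
from thop import profile

prompt = "What is adaptive reasoning in neural networks?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Timing. torch.cuda.synchronize() waits for queued GPU kernels to finish,
# so it requires a CUDA-enabled PyTorch build.
start = time.perf_counter()
_ = model.generate(**inputs, max_new_tokens=64)
torch.cuda.synchronize()
print(f"Inference time: {time.perf_counter() - start:.3f} sec")

# Compute cost of one forward pass over the prompt.
# thop's README labels the first return value as MACs (multiply-accumulates);
# FLOPs are roughly 2x that number.
macs, params = profile(model, inputs=(inputs["input_ids"],))
print(f"MACs: {macs/1e9:.2f} G (~{2*macs/1e9:.2f} GFLOPs) | Params: {params/1e6:.2f} M")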

AssertionError: Torch not compiled with CUDA enabled
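
The AssertionError above is raised by torch.cuda.synchronize() when PyTorch was installed without CUDA support. One way to keep the timing code usable on both CPU and GPU is to make the synchronization step optional. The helper below is a sketch under that assumption; time_call and its parameters are hypothetical names, and it uses only the standard library so it runs without a GPU.

```python
import time

def time_call(fn, sync=None, warmup=1, repeats=3):
    """Return the best wall-clock time in seconds of fn() over `repeats` runs.

    `sync` is an optional zero-argument callable invoked after each run,
    e.g. a GPU synchronization function; pass None on CPU.
    """
    # Warm-up runs absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # make sure asynchronous work has finished before stopping the clock
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload so the sketch runs anywhere.
elapsed = time_call(lambda: sum(range(100_000)))
print(f"Best of 3: {elapsed:.6f} sec")
```

With the objects from the cells above, a call might look like time_call(lambda: model.generate(**inputs, max_new_tokens=64), sync=torch.cuda.synchronize if torch.cuda.is_available() else None).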

In [11]:
import torch
print(torch.cuda.is_available())  # True means a CUDA device is usable
print(torch.version.cuda)         # CUDA version of the PyTorch build (None for CPU-only builds)


False
None
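
Since the profiling cell stopped at the CUDA assertion before the thop call ran, a back-of-the-envelope estimate can stand in for the missing number. A common rule of thumb for dense transformers is about 2 FLOPs per parameter per token in the forward pass (one multiply and one add per weight), ignoring attention-score and embedding costs. The sketch below applies that rule; the prompt length of 10 tokens is a hypothetical placeholder, not a measured value.

```python
def approx_forward_flops(n_params, n_tokens):
    """Rule-of-thumb forward cost: ~2 FLOPs per parameter per processed token."""
    return 2.0 * n_params * n_tokens

N_PARAMS = 1.1e9      # parameter count stated for TinyLlama in this tutorial
PROMPT_TOKENS = 10    # hypothetical prompt length, for illustration only
NEW_TOKENS = 64       # matches max_new_tokens in the timing cell

per_token = approx_forward_flops(N_PARAMS, 1)
total = approx_forward_flops(N_PARAMS, PROMPT_TOKENS + NEW_TOKENS)

print(f"Per token: {per_token / 1e9:.1f} GFLOPs")   # 2.2 GFLOPs
print(f"Prompt + generation: {total / 1e9:.1f} GFLOPs")  # 162.8 GFLOPs
```

This is an order-of-magnitude figure only; a profiler run on a working CUDA setup would give the measured value for a specific input.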
