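## Single-Machine Multi-GPU Training
- Single-machine multi-GPU training can be started with the `paddle.distributed.spawn` function.
- The existing `paddle.distributed.launch` method is still supported.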

**Starting with launch**

In [None]:
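# High-level API scenario

# Single machine, single GPU: uses GPU 0 by default
$ python train.py

# Single machine, multiple GPUs: uses all currently visible GPUs by default
$ python -m paddle.distributed.launch train.py

# Single machine, multiple GPUs: select GPUs 0 and 1 with the --gpus option
$ python -m paddle.distributed.launch --gpus='0,1' train.py

# Single machine, multiple GPUs: select GPUs 0 and 1 with CUDA_VISIBLE_DEVICES
$ export CUDA_VISIBLE_DEVICES=0,1
$ python -m paddle.distributed.launch train.py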
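
As a rough illustration of the two selection styles above, the sketch below parses a comma-separated GPU list such as `0,1` from either a `--gpus`-style value or `CUDA_VISIBLE_DEVICES`. The helper name `selected_gpus` and the rule that the flag takes priority over the environment variable are assumptions made for this example, not Paddle's actual implementation.

```python
import os


def selected_gpus(gpus_flag=None, environ=None):
    """Return the GPU indices requested, or None when nothing is specified.

    Hypothetical helper for illustration only; not part of Paddle.
    Assumption: an explicit flag value wins over CUDA_VISIBLE_DEVICES.
    """
    environ = os.environ if environ is None else environ
    spec = gpus_flag if gpus_flag is not None else environ.get("CUDA_VISIBLE_DEVICES")
    if spec is None:
        # Nothing specified: the caller falls back to "all visible GPUs".
        return None
    # Split "0,1" into [0, 1], ignoring empty fragments.
    return [int(part) for part in spec.split(",") if part.strip()]


print(selected_gpus("0,1"))                                  # explicit flag value
print(selected_gpus(None, {"CUDA_VISIBLE_DEVICES": "0,1"}))  # environment variable
print(selected_gpus(None, {}))                               # nothing specified
```

With `launch`, one worker process is typically started per selected GPU, so a list of length 2 corresponds to two training processes.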

In [None]:
# Basic API scenario: three changes are needed to the single-GPU code
import paddle
import paddle.distributed as dist # 1. Import the distributed package

# Load the datasets
train_dataset = paddle.vision.datasets.MNIST(mode='train')
test_dataset = paddle.vision.datasets.MNIST(mode='test')

# 2. Initialize the parallel environment
dist.init_parallel_env()

# Define the network structure
mnist = paddle.nn.Sequential(
    paddle.nn.Flatten(1, -1),
    paddle.nn.Linear(784, 512),
    paddle.nn.ReLU(),
    paddle.nn.Dropout(0.2),
    paddle.nn.Linear(512, 10)
)

# Load the data with DataLoader
train_loader = paddle.io.DataLoader(train_dataset, batch_size=32, shuffle=True)

# 3. Wrap the model with paddle.DataParallel
mnist = paddle.DataParallel(mnist)
mnist.train()

# Set the number of epochs
epochs = 5

# Set up the optimizer
optim = paddle.optimizer.Adam(parameters=mnist.parameters())
for epoch in range(epochs):
    for batch_id, data in enumerate(train_loader()):
        x_data = data[0]            # training data
        y_data = data[1]            # training labels
        predicts = mnist(x_data)    # predictions
        # Compute the loss; equivalent to the loss setting in prepare
        loss = paddle.nn.functional.cross_entropy(predicts, y_data)
        # Compute the accuracy; equivalent to the metrics setting in prepare
        acc = paddle.metric.accuracy(predicts, y_data)
        # The backward pass, logging, parameter update, and gradient clearing below are all wrapped inside Model.fit()
        # Backward pass
        loss.backward()
        if (batch_id+1) % 1800 == 0:
            print("epoch: {}, batch_id: {}, loss is: {}, acc is: {}".format(epoch, batch_id, loss.numpy(), acc.numpy()))
        # Update the parameters
        optim.step()
        # Clear the gradients
        optim.clear_grad()

Exception in thread Thread-10:
Traceback (most recent call last):
  File "D:\anaconda\envs\lc-or\lib\threading.py", line 973, in _bootstrap_inner
    self.run()
  File "D:\anaconda\envs\lc-or\lib\threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "D:\anaconda\envs\lc-or\lib\site-packages\paddle\fluid\dataloader\dataloader_iter.py", line 212, in _thread_loop
    batch = self._dataset_fetcher.fetch(indices,
  File "D:\anaconda\envs\lc-or\lib\site-packages\paddle\fluid\dataloader\fetcher.py", line 134, in fetch
    data = self.collate_fn(data)
  File "D:\anaconda\envs\lc-or\lib\site-packages\paddle\fluid\dataloader\collate.py", line 77, in default_collate_fn
    return [default_collate_fn(fields) for fields in zip(*batch)]
  File "D:\anaconda\envs\lc-or\lib\site-packages\paddle\fluid\dataloader\collate.py", line 77, in <listcomp>
    return [default_collate_fn(fields) for fields in zip(*batch)]
  File "D:\anaconda\envs\lc-or\lib\site-packages\paddle\fluid
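
Conceptually, wrapping the network in `paddle.DataParallel` means that every process runs the same training loop on its own batches, and the gradients are combined across processes before the parameter update. The plain-Python sketch below simulates that idea with two ranks and a one-parameter model. The names `local_gradient`, `all_reduce_mean`, and `data_parallel_step` are hypothetical, and the averaging rule is a simplification for illustration rather than Paddle's internal code.

```python
# Conceptual simulation of synchronous data-parallel training (no Paddle needed).
# Model: y = w * x, loss = mean((w * x - y)^2) over the local batch.


def local_gradient(w, batch):
    """Gradient of the mean squared error with respect to w on one rank's batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values):
    """Stand-in for the collective that averages a value across all ranks."""
    return sum(values) / len(values)


def data_parallel_step(w, per_rank_batches, lr):
    """One step: each rank computes a local gradient, then all apply the averaged one."""
    grads = [local_gradient(w, batch) for batch in per_rank_batches]
    return w - lr * all_reduce_mean(grads)


# Two simulated ranks, each holding a different slice of data generated by y = 3x.
batches = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]

w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batches, lr=0.01)

print(w)  # approaches the true slope 3.0
```

Because both ranks hold batches of equal size here, the averaged gradient equals the gradient computed on the combined batch, which is why every replica can stay in sync while seeing only part of the data.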

In [None]:
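# After making these changes, save the file and launch it the same way as in the high-level API scenario.
# Single machine, multiple GPUs: uses all currently visible GPUs by default
$ python -m paddle.distributed.launch train.py

# Single machine, multiple GPUs: select GPUs 0 and 1 with the --gpus option
$ python -m paddle.distributed.launch --gpus '0,1' train.py

# Single machine, multiple GPUs: select GPUs 0 and 1 with CUDA_VISIBLE_DEVICES
$ export CUDA_VISIBLE_DEVICES=0,1
$ python -m paddle.distributed.launch train.py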

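**Starting with spawn**
- `launch` starts multiple processes at the file level and requires the user to invoke `paddle.distributed.launch` at startup, which places higher demands on process management.
- `spawn` gives finer control over the processes and is friendlier when printing logs and exiting processes.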

In [None]:
from __future__ import print_function

import paddle
import paddle.nn as nn
import paddle.optimizer as opt
import paddle.distributed as dist

