digoal
2023-03-17
PostgreSQL , PolarDB , nvidia , debian , aigc , cuda
一台家里的老笔记本, 配置还可以(i7 8550u, 16G内存, nvme SM961 256G SSD, Nvidia MX150 GPU), 拿出来当cuda服务器测试用.
用于AIGC, PG-hetro的测试
https://github.com/heterodb/pg-strom
《PostgreSQL GPU 加速(HeteroDB pg_strom) (GPU计算, GPU-DIO-Nvme SSD, 列存, GPU内存缓存)》
《[未完待续] PostgreSQL HeteroDB GPU 加速 - pl/cuda , pg-strom , heterodb》
《使用 PGStrom 2 (GPU JOIN, BulkScan, GpuPreAgg, ...)》
- 1 下载debian 11 gnome-non-free x86_64镜像, 制作debian USB安装介质;
- 2 安装debian, 配置无线网络, 优化无线网络, 设置默认启动非图形界面, 安装openssh-server, 配置sshd;
- 3 编写MacOS 自动ssh连接debian脚本;
1、下载debian, 选择就近、网络带宽较大的网站下载:
注意debian是自由软件, 商业软件的firmware包都没有打包到debian的原生镜像中, 导致使用原生debian iso可能安装过程还需要额外提供firmware, 非常麻烦, 幸好有non-free版本, 会带上这些firmware. 所以我们下载gnome + non-free iso来制作usb 安装盘.
debian-live-11.6.0-amd64-gnome+nonfree.iso
2、检查checksum, 对比debian给出的sum是否一致, 确保下载的iso文件正确.
shasum -a 256 ./debian-live-11.6.0-amd64-gnome+nonfree.iso
3、插入usb, 并 umount usb(注意不是推出.)
diskutil list
/dev/disk2 为u盘设备号
digoaldeAir:Downloads digoal$ diskutil umountDisk /dev/disk2
Unmount of all volumes on disk2 was successful
4、将镜像写入usb, 使用 /dev/rdisk
更快.
digoaldeAir:Downloads digoal$ sudo dd if=./debian-live-11.6.0-amd64-gnome+nonfree.iso of=/dev/rdisk2 bs=1m
Password:
3649+1 records in
3649+1 records out
3826831360 bytes transferred in 331.014593 secs (11560914 bytes/sec)
为什么呢? 在OS X中,每个磁盘在/dev
中可能有两个路径引用:
/dev/disk
#是缓冲设备,这意味着要发送的所有数据都经过了额外的处理。/dev/rdisk
#是原始路径,速度更快。
在Class 4 SD卡上,使用rdisk路径的差异大约快20倍。
5、dd完成后, 推出usb
1、配置bios, 使用uefi模式引导
- bios 打开 security boot (使用uefi, 不开启tpm, 使用insecure boot.)
后面安装时才能正常引导, 理论上使用bios传统模式, 配置mbr分区也能引导, 但是不知道撒情况反正我这没成功. (uefi需使用gpt分区表, 同时使用efi分区)
/dev/nvme0n1p1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro)
/dev/nvme0n1p1 511M 5.8M 506M 2% /boot/efi
2、使用USB设备启动, 安装debian 11, 安装时不要配置无线网络, 跳过即可. 如果你配置了网络, 在更新security 包时使用国外源会特别慢.
可以在安装好之后, 配置国内的源再更新.
3、安装完毕, 移除usb后, 重启, 完成安装.
4、配置国内的apt源.
vi /etc/apt/sources.list
deb https://mirrors.nju.edu.cn/debian/ bullseye main non-free contrib
deb-src https://mirrors.aliyun.com/debian/ bullseye main non-free contrib
deb https://mirrors.aliyun.com/debian-security/ bullseye-security main
deb-src https://mirrors.aliyun.com/debian-security/ bullseye-security main
deb https://mirrors.aliyun.com/debian/ bullseye-updates main non-free contrib
deb-src https://mirrors.aliyun.com/debian/ bullseye-updates main non-free contrib
deb https://mirrors.aliyun.com/debian/ bullseye-backports main non-free contrib
deb-src https://mirrors.aliyun.com/debian/ bullseye-backports main non-free contrib
apt update
apt-get reinstall apt-transport-https ca-certificates
sed -i "s@http://mirrors.nju.edu.cn@https://mirrors.nju.edu.cn@g" /etc/apt/sources.list
apt update
5、安装openssh-server 并配置 sshd 服务.
apt-get install openssh-server
配置sshd:
vi /etc/ssh/sshd_config
去除如下几行的注释:
Port 22
ListenAddress 0.0.0.0
PasswordAuthentication yes
PermitRootLogin yes
TCPKeepAlive yes
重启sshd服务:
systemctl restart ssh.service
可以在图形界面用NetManager(或者在shell中使用nmcli配置)配置无线网卡, 建议配置静态地址, 静态DNS.
这里开始就可以用第四步的shell远程登录了, 更加方便.
6、配置默认启动到命令行界面
https://blog.csdn.net/joker00007/article/details/120658259
sudo systemctl set-default multi-user.target
如果要改回来图形界面启动:
sudo systemctl set-default graphical.target
如果要在命令行界面启动图形界面:
startx
7、关闭无线网卡电源管理, 获得更稳定的无线网络性能
查询无线网卡设备名
iwconfig
wlp5s0 IEEE 802.11 ESSID:"Redmi_keting_5G"
Mode:Managed Frequency:5.805 GHz Access Point: 28:D1:27:5A:A3:A8
Bit Rate=433.3 Mb/s Tx-Power=22 dBm
Retry short limit:7 RTS thr:off Fragment thr:off
Power Management:off
Link Quality=55/70 Signal level=-55 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:43 Missed beacon:0
临时关闭无线网卡电源管理
sudo iwconfig wlp5s0 power off
永久关闭无线网卡电源管理
cd /etc/NetworkManager/conf.d/
sudo vi default-wifi-powersave-on.conf
[connection]
wifi.powersave = 2
8、其他配置, 笔记本当服务器用, 所以需要配置如下.
以下在图形界面即可操作:
-
1 locate: shanghai; collate: en_US; language: en_US;
-
2 关闭蓝牙
-
3 插线时不使用节能, 不待机等
-
4 禁止网卡节能, 防止网络卡顿
-
5 关闭自动安装更新 (仅提示安全类更新, 但是需要人工进行更新)
-
6 cpu主频设置: https://blog.csdn.net/xuershuai/article/details/122023817
root@localhost:~# cpufreq-info |grep governors
available cpufreq governors: performance, powersave
设置cpu为节能或性能模式
cpufreq-set -g powersave
OR
cpufreq-set -g performance
- 7 合盖和电源按钮配置, 防止盒盖后休眠:
sudo vi /etc/systemd/logind.conf
设置如下:
HandlePowerKey=ignore
HandleLidSwitch=ignore
重启systemd-logind服务
sudo systemctl restart systemd-logind
https://phoenixnap.com/kb/nvidia-drivers-debian
0、安装依赖
apt -y install linux-headers-$(uname -r) build-essential libglvnd-dev pkg-config
1、hardware info:
root@localhost:~# lspci |grep -i nvidia
01:00.0 3D controller: NVIDIA Corporation GP108M [GeForce MX150] (rev a1)
2、download driver:
https://www.nvidia.com/Download/index.aspx?lang=en
输入条件, search 结果例如:
https://us.download.nvidia.com/XFree86/Linux-x86_64/525.89.02/NVIDIA-Linux-x86_64-525.89.02.run
scp NVIDIA-Linux-x86_64-525.89.02.run root@192.168.28.155:/root/
3、read driver install readme:
在addion info里面有, 例如
http://us.download.nvidia.cn/XFree86/Linux-x86_64/525.89.02/README/installdriver.html
4、install driver:
先禁用nouveau
root@localhost:~# vi /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
root@localhost:~# update-initramfs -u
root@localhost:~# reboot
root@localhost:~# lsmod|grep no
root@localhost:~#
安装依赖:
apt install -y libglvnd-dev xserver-xorg-dev xutils-dev
安装
sh ./NVIDIA-Linux-x86_64-525.89.02.run
现在安装正常 (不使用签名安装. 不安装32位兼容. 注册到内核dkms. 不配置nvidia-xconfig).
万一配置了nvidia-xconfig, 导致startx无法进入图形界面, 可以删除/etc/X11/xorg.conf, 就可以进入了(也就是使用核心显卡进入.) 因为我只需要mx150用来做cuda的AIGC以及 PG-strom数据库加速, 而不需要它来做图形加速.
reboot
5、check driver:
nvidia-smi
root@localhost:~# nvidia-smi
Fri Mar 17 21:58:27 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| N/A 30C P0 N/A / N/A | 0MiB / 2048MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
以下cuda的安装可选, 因为nvida提供了nvidia-docker, 不需要在宿主机上安装cuda, 只需要安装nvidia-driver即可.
6、download cuda:
https://developer.nvidia.com/cuda-downloads
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
7、read cuda install readme:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
8、install cuda:
sh cuda_12.1.0_530.30.02_linux.run
root@localhost:~# sh ./cuda_12.1.0_530.30.02_linux.run
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-12.1/
Please make sure that
-- 按如下提示设置.bashrc和ld.so.conf一下
- PATH includes /usr/local/cuda-12.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.1/lib64, or, add /usr/local/cuda-12.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.1/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
9、AIGC:
《在debian中部署"人工智能生成内容"(Artificial Intelligence Generated Content,简称 AIGC)》
《Linux Mac ssh 客户端长连接防断连 - tcp心跳 TCPKeepAlive,ServerAliveInterval,ServerAliveCountMax》
《Linux/Mac ssh 自动输入密码 - expect使用》
vi haier.sh
#!/usr/bin/expect
set user "digoal"
set host "192.168.28.155"
set port "22"
set pwd "rootroot"
spawn ssh -o TCPKeepAlive=yes -o ServerAliveInterval=15 -o ServerAliveCountMax=3 $user@$host -p $port
expect {
"yes/no" { send "yes\r"; exp_continue }
"password:" { send "$pwd\r" }
}
interact
chmod 500 haier.sh
登陆debian:
./haier.sh
vi /etc/hostname
haier-5000a
安装一些常用的包:
apt install -y git libreadline-dev libedit-dev g++ make cmake man-db vim dnsutils clang libssl-dev bash-completion
《MacOS 制作ubuntu USB安装介质并安装和配置Ubuntu, openssh-server和expect ssh登陆脚本》
《在debian中部署"人工智能生成内容"(Artificial Intelligence Generated Content,简称 AIGC)》
https://www.cnblogs.com/xaoyxc/p/16007960.html