Skip to content

Latest commit

 

History

History
467 lines (299 loc) · 15.3 KB

20230317_02.md

File metadata and controls

467 lines (299 loc) · 15.3 KB

在macOS中制作debian USB安装镜像, 在带Nvidia显卡笔记本上的安装部署debian 11 - 测试 PostgreSQL, AIGC cuda 应用

作者

digoal

日期

2023-03-17

标签

PostgreSQL , PolarDB , nvidia , debian , aigc , cuda


背景

一台家里的老笔记本, 配置还可以(i7 8550u, 16G内存, nvme SM961 256G SSD, Nvidia MX150 GPU), 拿出来当cuda服务器测试用.

用于AIGC, PG-hetro的测试

https://github.com/heterodb/pg-strom

《PostgreSQL GPU 加速(HeteroDB pg_strom) (GPU计算, GPU-DIO-Nvme SSD, 列存, GPU内存缓存)》

《[未完待续] PostgreSQL HeteroDB GPU 加速 - pl/cuda , pg-strom , heterodb》

《使用 PGStrom 2 (GPU JOIN, BulkScan, GpuPreAgg, ...)》

《试用 PGStrom》

内容简介

  • 1 下载debian 11 gnome-non-free x86_64镜像, 制作debian USB安装介质;
  • 2 安装debian, 配置无线网络, 优化无线网络, 设置默认启动非图形界面, 安装openssh-server, 配置sshd;
  • 3 编写MacOS 自动ssh连接debian脚本;

一、制作debian usb安装介质

1、下载debian, 选择就近、网络带宽较大的网站下载:

注意debian是自由软件, 商业软件的firmware包都没有打包到debian的原生镜像中, 导致使用原生debian iso可能安装过程还需要额外提供firmware, 非常麻烦, 幸好有non-free版本, 会带上这些firmware. 所以我们下载gnome + non-free iso来制作usb 安装盘.

https://mirrors.nju.edu.cn/debian-nonfree/cd-including-firmware/11.6.0-live%2Bnonfree/amd64/iso-hybrid/

debian-live-11.6.0-amd64-gnome+nonfree.iso

2、检查checksum, 对比debian给出的sum是否一致, 确保下载的iso文件正确.

shasum -a 256 ./debian-live-11.6.0-amd64-gnome+nonfree.iso  

3、插入usb, 并 umount usb(注意不是推出.)

diskutil list    
    
/dev/disk2 为u盘设备号    
    
digoaldeAir:Downloads digoal$ diskutil umountDisk /dev/disk2    
Unmount of all volumes on disk2 was successful    

4、将镜像写入usb, 使用 /dev/rdisk 更快.

digoaldeAir:Downloads digoal$ sudo dd if=./debian-live-11.6.0-amd64-gnome+nonfree.iso of=/dev/rdisk2 bs=1m    
Password:    
3649+1 records in    
3649+1 records out    
3826831360 bytes transferred in 331.014593 secs (11560914 bytes/sec)    

为什么呢? 在OS X中,每个磁盘在/dev中可能有两个路径引用:

  • /dev/disk #是缓冲设备,这意味着要发送的所有数据都经过了额外的处理。
  • /dev/rdisk #是原始路径,速度更快。

在Class 4 SD卡上,使用rdisk路径的差异大约快20倍。

5、dd完成后, 推出usb

二、使用usb在haier 5000a笔记本上安装debian

1、配置bios, 使用uefi模式引导

  • bios 打开 security boot (使用uefi, 不开启tpm, 使用insecure boot.)

后面安装时才能正常引导, 理论上使用bios传统模式, 配置mbr分区也能引导, 但是不知道撒情况反正我这没成功. (uefi需使用gpt分区表, 同时使用efi分区)

/dev/nvme0n1p1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro)  
/dev/nvme0n1p1  511M  5.8M  506M   2% /boot/efi  

2、使用USB设备启动, 安装debian 11, 安装时不要配置无线网络, 跳过即可. 如果你配置了网络, 在更新security 包时使用国外源会特别慢.

可以在安装好之后, 配置国内的源再更新.

3、安装完毕, 移除usb后, 重启, 完成安装.

4、配置国内的apt源.

vi /etc/apt/sources.list  
  
deb https://mirrors.nju.edu.cn/debian/ bullseye main non-free contrib    
deb-src https://mirrors.aliyun.com/debian/ bullseye main non-free contrib    
deb https://mirrors.aliyun.com/debian-security/ bullseye-security main    
deb-src https://mirrors.aliyun.com/debian-security/ bullseye-security main    
deb https://mirrors.aliyun.com/debian/ bullseye-updates main non-free contrib    
deb-src https://mirrors.aliyun.com/debian/ bullseye-updates main non-free contrib    
deb https://mirrors.aliyun.com/debian/ bullseye-backports main non-free contrib    
deb-src https://mirrors.aliyun.com/debian/ bullseye-backports main non-free contrib   
  
apt update   
  
apt-get reinstall apt-transport-https ca-certificates     
    
    
sed -i "s@http://mirrors.nju.edu.cn@https://mirrors.nju.edu.cn@g" /etc/apt/sources.list    
    
apt update    

5、安装openssh-server 并配置 sshd 服务.

apt-get install openssh-server    

配置sshd:

vi /etc/ssh/sshd_config    
    
去除如下几行的注释:     
  
Port 22    
ListenAddress 0.0.0.0    
PasswordAuthentication yes    
PermitRootLogin yes  
TCPKeepAlive yes    

重启sshd服务:

systemctl restart ssh.service  

可以在图形界面用NetManager(或者在shell中使用nmcli配置)配置无线网卡, 建议配置静态地址, 静态DNS.

这里开始就可以用第四步的shell远程登录了, 更加方便.

6、配置默认启动到命令行界面

https://blog.csdn.net/joker00007/article/details/120658259

sudo systemctl set-default multi-user.target     

如果要改回来图形界面启动:

sudo systemctl set-default graphical.target    

如果要在命令行界面启动图形界面:

startx    

7、关闭无线网卡电源管理, 获得更稳定的无线网络性能

查询无线网卡设备名

iwconfig    
    
wlp5s0    IEEE 802.11  ESSID:"Redmi_keting_5G"      
          Mode:Managed  Frequency:5.805 GHz  Access Point: 28:D1:27:5A:A3:A8       
          Bit Rate=433.3 Mb/s   Tx-Power=22 dBm       
          Retry short limit:7   RTS thr:off   Fragment thr:off    
          Power Management:off    
          Link Quality=55/70  Signal level=-55 dBm      
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0    
          Tx excessive retries:0  Invalid misc:43   Missed beacon:0    

临时关闭无线网卡电源管理

sudo iwconfig wlp5s0 power off    

永久关闭无线网卡电源管理

cd /etc/NetworkManager/conf.d/    
sudo vi default-wifi-powersave-on.conf     
    
[connection]    
wifi.powersave = 2    

8、其他配置, 笔记本当服务器用, 所以需要配置如下.

以下在图形界面即可操作:

  • 1 locate: shanghai; collate: en_US; language: en_US;

  • 2 关闭蓝牙

  • 3 插线时不使用节能, 不待机等

  • 4 禁止网卡节能, 防止网络卡顿

  • 5 关闭自动安装更新 (仅提示安全类更新, 但是需要人工进行更新)

  • 6 cpu主频设置: https://blog.csdn.net/xuershuai/article/details/122023817

root@localhost:~# cpufreq-info |grep governors  
  available cpufreq governors: performance, powersave  
  
设置cpu为节能或性能模式  
  
cpufreq-set -g powersave  
OR  
cpufreq-set -g performance  
  • 7 合盖和电源按钮配置, 防止盒盖后休眠:
sudo vi /etc/systemd/logind.conf  
  
设置如下:  
HandlePowerKey=ignore  
HandleLidSwitch=ignore  
  
重启systemd-logind服务  
sudo systemctl restart systemd-logind  

三、配置 GeForce MX150 驱动

https://phoenixnap.com/kb/nvidia-drivers-debian

0、安装依赖

apt -y install linux-headers-$(uname -r) build-essential libglvnd-dev pkg-config

1、hardware info:

root@localhost:~# lspci |grep -i nvidia  
01:00.0 3D controller: NVIDIA Corporation GP108M [GeForce MX150] (rev a1)  

2、download driver:

https://www.nvidia.com/Download/index.aspx?lang=en

输入条件, search 结果例如:

https://www.nvidia.com/content/DriverDownloads/confirmation.php?url=/XFree86/Linux-x86_64/525.89.02/NVIDIA-Linux-x86_64-525.89.02.run&lang=us&type=TITAN

https://us.download.nvidia.com/XFree86/Linux-x86_64/525.89.02/NVIDIA-Linux-x86_64-525.89.02.run

scp NVIDIA-Linux-x86_64-525.89.02.run root@192.168.28.155:/root/  

3、read driver install readme:

在addion info里面有, 例如

http://us.download.nvidia.cn/XFree86/Linux-x86_64/525.89.02/README/installdriver.html

4、install driver:

先禁用nouveau

root@localhost:~# vi /etc/modprobe.d/nvidia-installer-disable-nouveau.conf  
  
# generated by nvidia-installer  
blacklist nouveau  
options nouveau modeset=0  
  
  
  
root@localhost:~# update-initramfs -u    
  
root@localhost:~# reboot  
  
root@localhost:~# lsmod|grep no  
root@localhost:~#   

安装依赖:

apt install -y libglvnd-dev xserver-xorg-dev xutils-dev  

安装

sh ./NVIDIA-Linux-x86_64-525.89.02.run  

现在安装正常 (不使用签名安装. 不安装32位兼容. 注册到内核dkms. 不配置nvidia-xconfig).

万一配置了nvidia-xconfig, 导致startx无法进入图形界面, 可以删除/etc/X11/xorg.conf, 就可以进入了(也就是使用核心显卡进入.) 因为我只需要mx150用来做cuda的AIGC以及 PG-strom数据库加速, 而不需要它来做图形加速.

reboot  

5、check driver:

nvidia-smi  
  
root@localhost:~# nvidia-smi   
Fri Mar 17 21:58:27 2023         
+-----------------------------------------------------------------------------+  
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |  
|-------------------------------+----------------------+----------------------+  
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |  
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |  
|                               |                      |               MIG M. |  
|===============================+======================+======================|  
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |  
| N/A   30C    P0    N/A /  N/A |      0MiB /  2048MiB |      0%      Default |  
|                               |                      |                  N/A |  
+-------------------------------+----------------------+----------------------+  
                                                                                 
+-----------------------------------------------------------------------------+  
| Processes:                                                                  |  
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |  
|        ID   ID                                                   Usage      |  
|=============================================================================|  
|  No running processes found                                                 |  
+-----------------------------------------------------------------------------+  

以下cuda的安装可选, 因为nvida提供了nvidia-docker, 不需要在宿主机上安装cuda, 只需要安装nvidia-driver即可.

6、download cuda:

https://developer.nvidia.com/cuda-downloads

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run  

7、read cuda install readme:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

8、install cuda:

sh cuda_12.1.0_530.30.02_linux.run  
  
  
root@localhost:~# sh ./cuda_12.1.0_530.30.02_linux.run   
===========  
= Summary =  
===========  
  
Driver:   Installed  
Toolkit:  Installed in /usr/local/cuda-12.1/  
  
Please make sure that    
-- 按如下提示设置.bashrc和ld.so.conf一下	  
 -   PATH includes /usr/local/cuda-12.1/bin  
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.1/lib64, or, add /usr/local/cuda-12.1/lib64 to /etc/ld.so.conf and run ldconfig as root  
  
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.1/bin  
To uninstall the NVIDIA Driver, run nvidia-uninstall  
Logfile is /var/log/cuda-installer.log  

9、AIGC:

《在debian中部署"人工智能生成内容"(Artificial Intelligence Generated Content,简称 AIGC)》

四、配置macos ssh登陆debian脚本.

《Linux Mac ssh 客户端长连接防断连 - tcp心跳 TCPKeepAlive,ServerAliveInterval,ServerAliveCountMax》

《Linux/Mac ssh 自动输入密码 - expect使用》

vi haier.sh    
    
    
#!/usr/bin/expect    
    
set user "digoal"    
set host "192.168.28.155"    
set port "22"    
set pwd "rootroot"    
    
spawn ssh -o TCPKeepAlive=yes -o ServerAliveInterval=15 -o ServerAliveCountMax=3 $user@$host -p $port  
expect {  
"yes/no" { send "yes\r"; exp_continue }  
"password:" { send "$pwd\r" }  
}  
interact  
chmod 500 haier.sh    

登陆debian:

./haier.sh    
  
vi /etc/hostname
haier-5000a
安装一些常用的包:
apt install -y git libreadline-dev libedit-dev g++ make cmake man-db vim dnsutils clang libssl-dev bash-completion 

参考

《MacOS 制作ubuntu USB安装介质并安装和配置Ubuntu, openssh-server和expect ssh登陆脚本》

《在debian中部署"人工智能生成内容"(Artificial Intelligence Generated Content,简称 AIGC)》

https://www.cnblogs.com/xaoyxc/p/16007960.html

https://blog.csdn.net/wang_yi_wen/article/details/78947860

digoal's wechat