# KNN-Modulation-Classification

## Objective

Use the kNN algorithm to identify the modulation scheme used by a wireless transmission system, to assist conventional AMC techniques

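## Experiment Content

Train a kNN classifier on the sample set, then vary the number of high-order cumulant features, the number of data points used to compute the high-order moments, and the kNN parameter $k$, and compare the recognition accuracy across these settings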

## Method and Algorithm Principles

### Modulation schemes

- `BPSK`: $u(i)=\frac{1}{\sqrt{2}}[(1-2b(i))+j(1-2b(i))]$
- `QPSK`: $u(i)=\frac{1}{\sqrt{2}}[(1-2b(2i))+j(1-2b(2i+1))]$
- `16QAM`: $u(i)=\frac{1}{\sqrt{10}}\{(1-2b(4i))[2-(1-2b(4i+2))]+j(1-2b(4i+1))[2-(1-2b(4i+3))]\}$
- `64QAM`: $u(i)=\frac{1}{\sqrt{42}}\{(1-2b(6i))[4-(1-2b(6i+2))[2-(1-2b(6i+4))]]+j(1-2b(6i+1))[4-(1-2b(6i+3))[2-(1-2b(6i+5))]]\}$
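
The bit-to-symbol mappings for BPSK, QPSK, and 16QAM can be sketched in plain Python as follows. This is only an illustration, not the repository's Matlab generator: the function names `map_bpsk`, `map_qpsk`, and `map_16qam` are hypothetical, and `bits` is assumed to be a list of 0/1 integers whose length is a multiple of the number of bits per symbol.

```python
import math


def map_bpsk(bits):
    """Map each bit b(i) to one BPSK symbol on the 45-degree diagonal."""
    scale = 1 / math.sqrt(2)
    return [scale * complex(1 - 2 * b, 1 - 2 * b) for b in bits]


def map_qpsk(bits):
    """Map each bit pair b(2i), b(2i+1) to one QPSK symbol."""
    scale = 1 / math.sqrt(2)
    return [
        scale * complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1])
        for i in range(0, len(bits), 2)
    ]


def map_16qam(bits):
    """Map each group of four bits b(4i)..b(4i+3) to one 16QAM symbol."""
    scale = 1 / math.sqrt(10)
    symbols = []
    for i in range(0, len(bits), 4):
        b0, b1, b2, b3 = bits[i : i + 4]
        real = (1 - 2 * b0) * (2 - (1 - 2 * b2))  # takes values in {-3, -1, 1, 3}
        imag = (1 - 2 * b1) * (2 - (1 - 2 * b3))
        symbols.append(scale * complex(real, imag))
    return symbols


# Two QPSK symbols from four bits
print(map_qpsk([0, 0, 1, 1]))
```

The scale factors $1/\sqrt{2}$ and $1/\sqrt{10}$ normalize each constellation to unit average power, so the schemes can be compared at the same SNR.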

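### AMC

AMC (adaptive modulation and coding) selects the modulation scheme adaptively according to the SNR. However, the transmitter and the receiver must signal the chosen modulation scheme to each other, which consumes part of the spectrum, time-slot, and power resources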

### High-order cumulant features

- High-order moment: $M_{pq}=E[X(t)^{p-q}X^*(t)^q]$
- High-order cumulant: $C_{pq}=\mathrm{cum}\{X(t),\cdots,X(t),X^*(t),\cdots,X^*(t)\}$
  - where $X(t)$ appears $p-q$ times and $X^*(t)$ appears $q$ times
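
As a rough illustration, $M_{pq}$ can be estimated by replacing the expectation with a sample average, and low-order cumulants then follow from the standard moment-to-cumulant relations for zero-mean signals. The helper names `moment` and `cumulants` are hypothetical, and the choice of $C_{20}$, $C_{21}$, $C_{40}$, $C_{42}$ is an assumption made for this sketch; the exact feature set stored in `sample.dat` is not specified here.

```python
def moment(x, p, q):
    """Sample estimate of M_pq = E[X^(p-q) * conj(X)^q]."""
    return sum(v ** (p - q) * v.conjugate() ** q for v in x) / len(x)


def cumulants(x):
    """Selected cumulants of a zero-mean complex sequence x."""
    m20, m21 = moment(x, 2, 0), moment(x, 2, 1)
    m40, m42 = moment(x, 4, 0), moment(x, 4, 2)
    return {
        "C20": m20,
        "C21": m21,
        "C40": m40 - 3 * m20 ** 2,
        "C42": m42 - abs(m20) ** 2 - 2 * m21 ** 2,
    }


# Noise-free, unit-power QPSK constellation (assumed test input)
qpsk = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]
print(cumulants(qpsk))
```

Different constellations yield different cumulant values, which is what makes them usable as classification features.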

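### kNN

> The kNN algorithm (k-nearest neighbors algorithm) is a statistical method from pattern recognition that can be used for both classification and regression. In classification, the output is the most common class among an object's $k$ nearest neighbors

1. Compute the distance from the point to be classified to every sample point
2. Sort the distances
3. Select the $k$ points with the smallest distances
4. Count the classes of these $k$ points
5. Return the most frequent class as the result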
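
The five steps can be sketched in a few lines of plain Python. This is a minimal illustration using Euclidean distance, not the `KNeighborsClassifier` implementation used later in the notebook; the function name `knn_classify` and the toy data are assumptions.

```python
import math
from collections import Counter


def knn_classify(query, samples, labels, k):
    """Classify `query` by majority vote among its k nearest samples."""
    # 1. Distance from the query point to every sample point
    distances = [math.dist(query, s) for s in samples]
    # 2. Sort the sample indices by distance
    order = sorted(range(len(samples)), key=lambda i: distances[i])
    # 3. Keep the k nearest samples
    nearest = order[:k]
    # 4. Count the class labels among them
    votes = Counter(labels[i] for i in nearest)
    # 5. Return the most frequent label
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters labelled 2 and 4
samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [2, 2, 2, 4, 4, 4]
print(knn_classify((0.5, 0.5), samples, labels, k=3))
```

When several labels tie, `Counter.most_common` returns the one encountered first, so in this sketch ties are broken in favor of the label of the nearer neighbor.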

## Platform

- `OS`: `Manjaro Linux x86_64`
- `Kernel`: `5.4.78-1-MANJARO`
- `Python`: `3.8.6`
- `Jupyter Notebook`: `6.1.4`
- `Scikit-learn`: `0.23.2`
- `Matplotlib`: `3.3.3`
- `Numpy`: `1.19.4`

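## Steps

1. Run the `Matlab` code in `data_generation` to generate the data, which is saved to `data`
   - Sample set: run `getSample.m` to generate the training data for the `kNN` algorithm
   - Test sets: run `getTest16QAM.m`, `getTestBPSK.m`, and `getTestQPSK.m`; the generated files are named `test[process]-[N]-[snr].dat`
     - `process`: modulation scheme, one of `BPSK`, `QPSK`, `16QAM`
     - `N`: number of data points per group used to compute the high-order moments, either `200` or `500`
     - `snr`: signal-to-noise ratio in dB
2. Run `main.ipynb` in `modulation_recognition`, then inspect and analyze the results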
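
For reference, the naming pattern `test[process]-[N]-[snr].dat` can be reproduced with a small helper. The function name `build_test_filename` is hypothetical; the point of the sketch is that a negative SNR produces two consecutive hyphens in the file name.

```python
def build_test_filename(process, n, snr):
    """Build a test-set file name of the form test[process]-[N]-[snr].dat."""
    return "test{}-{}-{}.dat".format(process, n, snr)


print(build_test_filename("BPSK", 200, -4))   # testBPSK-200--4.dat
print(build_test_filename("16QAM", 500, 10))  # test16QAM-500-10.dat
```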


### Imports and data loading

In [None]:
import collections
from collections import defaultdict
from os.path import join

import matplotlib.pyplot as plt
import numpy as np  # numerical computing, e.g. matrix operations
from sklearn.neighbors import KNeighborsClassifier  # ready-made kNN implementation


def file2matrix(filename, n_features):
    """Load a tab-separated data file into a feature matrix and a label list.

    Args:
        filename (str): name of the file inside ``../data``
        n_features (int): number of leading feature columns to keep

    Returns:
        Tuple[np.ndarray, List[float]]: feature matrix of shape
        (number of lines, n_features) and the class label of each line
    """
    filepath = join("..", "data", filename)
    # Read the file once; the context manager closes the handle afterwards
    with open(filepath) as fr:
        lines = fr.readlines()
    # One row per line, one column per selected feature
    return_mat = np.zeros((len(lines), n_features))
    class_label_vector = []
    for index, line in enumerate(lines):
        list_from_line = line.strip().split("\t")
        # Keep the first n_features columns as features
        return_mat[index, :] = list_from_line[0:n_features]
        # The class label is stored in the last column
        class_label_vector.append(float(list_from_line[-1]))
    return return_mat, class_label_vector


### Building the kNN classifiers

In [None]:
Ls = (5, 9)  # number of high-order cumulant features
Ks = (3, 10, 24)  # kNN parameter k (number of neighbors)
kNN_classifiers = ([], [])

for i, L in enumerate(Ls):

    for K in Ks:
        kNN_classifiers[i].append(KNeighborsClassifier(n_neighbors=K))

    data_x, data_y = file2matrix("sample.dat", L)

    # Fit every classifier on the sample set
    for kNN_classifier in kNN_classifiers[i]:
        kNN_classifier.fit(data_x, data_y)


### Defining the kNN test functions

In [None]:
def ModulationClassTest(n_features, SNR, method, n, labels, kNN_classifier):
    """Measure the accuracy for one modulation scheme on one group of test sets.

    Args:
        n_features (int): number of high-order cumulant features
        SNR (List[int]): signal-to-noise ratios in dB
        method (int): index into ``labels`` of the true modulation scheme
        n (int): number of data points used to compute the high-order moments
        labels (Tuple[str]): names of the modulation schemes
        kNN_classifier (KNeighborsClassifier): fitted kNN classifier

    Returns:
        OrderedDict[int, float]: accuracy per SNR, sorted by SNR
    """
    accuracy = {}
    # Class labels in the data files, in the order BPSK, QPSK, 16QAM, 64QAM
    class_values = (2, 4, 16, 64)
    for snr in SNR:
        filename = (
            "test" + labels[method] + "-" + str(n) + "-" + str(snr) + ".dat"
        )
        testDataMat, _ = file2matrix(filename, n_features)
        numTestVecs = testDataMat.shape[0]
        # Predict all test vectors at once, then count predictions per class
        y_predict = kNN_classifier.predict(testDataMat)
        numbers = [float(np.sum(y_predict == value)) for value in class_values]

        accuracy[snr] = numbers[method] / numTestVecs
        print("the total correct rate on %d dB SNR is:" % snr, (accuracy[snr]))
        for i in range(4):
            print(
                ("correctly " if i == method else "") + "classified as",
                labels[i] + ":",
                numbers[i],
            )
    # Sort once at the end so the plot receives the values in SNR order
    return collections.OrderedDict(sorted(accuracy.items()))


Ns = (200, 500)
labels = ("BPSK", "QPSK", "16QAM", "64QAM")


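def ModulationClassTests(SNR, method):
    """Measure the accuracy for one modulation scheme on all test sets and plot it.

    Args:
        SNR (List[int]): signal-to-noise ratios in dB
        method (int): index into ``labels`` of the modulation scheme
    """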
    fig, axs = plt.subplots(2, figsize=(20, 16))
    fig.suptitle("SNR vs Accuracy - " + labels[method], fontsize=32)
    x = SNR

    for i, L in enumerate(Ls):
        for N in Ns:
            for j, K in enumerate(Ks):
                label = "N = " + str(N) + ", K = " + str(K)
                print(label + ":")
                accuracy = ModulationClassTest(
                    L, SNR, method, N, labels, kNN_classifiers[i][j]
                )
                print(accuracy)

                axs[i].plot(
                    x,
                    list(accuracy.values()),
                    label=label,
                    marker="o",
                    linewidth=2.0,
                    linestyle="dashed",
                )

        axs[i].set_title("L = " + str(L), fontsize=24)
        axs[i].legend(loc="upper left", frameon=False, fontsize=14)
        axs[i].grid()

    for ax in axs.flat:
        ax.set(
            xticks=np.arange(min(x), max(x) + 1, 2.0),
            yticks=np.arange(0, 1.1, 0.10),  # include the tick at accuracy 1.0
            xlabel="SNR (dB)",
            ylabel="Test accuracy",
        )

    plt.show()


### BPSK ModulationClassTest

In [None]:
SNR = [2 * x for x in range(-2, 6)]
ModulationClassTests(SNR, 0)

### QPSK ModulationClassTest

In [None]:
SNR = [2 * x for x in range(-2, 6)]
ModulationClassTests(SNR, 1)

### 16QAM ModulationClassTest

In [None]:
SNR = [5 * x for x in range(0, 9)]
ModulationClassTests(SNR, 2)

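## Conclusions

1. The higher the SNR, the higher the recognition accuracy
2. The more data points used to compute the high-order moments, the higher the recognition accuracy
3. The higher the modulation order, the higher the SNR at which the recognition accuracy reaches 1
4. Increasing the number of high-order cumulant features significantly improves the recognition accuracy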
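
One plausible reason for the second observation is that moment estimates computed from more data points fluctuate less, so the resulting features are less noisy. The simulation below is a hedged sketch of that effect and is not part of the original experiment: the helper names, the 10 dB SNR, the number of trials, and the seed are all assumptions, and only $M_{42}=E[|X|^4]$ of noisy QPSK is examined.

```python
import random
import statistics


def noisy_qpsk(n, snr_db, rng):
    """Draw n unit-power QPSK symbols with additive complex Gaussian noise."""
    sigma = (10 ** (-snr_db / 10) / 2) ** 0.5  # noise std per real dimension
    scale = 2 ** -0.5
    return [
        complex(
            scale * rng.choice((-1, 1)) + rng.gauss(0, sigma),
            scale * rng.choice((-1, 1)) + rng.gauss(0, sigma),
        )
        for _ in range(n)
    ]


def m42(x):
    """Sample estimate of M_42 = E[|X|^4]."""
    return sum(abs(v) ** 4 for v in x) / len(x)


def estimate_spread(n, snr_db=10, trials=300, seed=0):
    """Standard deviation of the M_42 estimate over repeated trials."""
    rng = random.Random(seed)
    estimates = [m42(noisy_qpsk(n, snr_db, rng)) for _ in range(trials)]
    return statistics.stdev(estimates)


# Spread of the estimate for the two group sizes used in the experiment
print("N = 200:", estimate_spread(200))
print("N = 500:", estimate_spread(500))
```

The spread for `N = 500` should come out smaller than for `N = 200`, roughly by a factor of $\sqrt{500/200}$, as expected for a sample average.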
