# Car Evaluation
Dataset Characteristics:

`Multivariate`

Subject Area:

`Other`

Associated Tasks:

`Classification`

Feature Type:

`Categorical`

Number of Instances:

`1728`

Number of Features:

`6`

Has Missing Values?

`No`

## Variables Table

| Variable Name | Role    | Type        | Description                                                  | Units | Missing Values |
|---------------|---------|-------------|--------------------------------------------------------------|-------|----------------|
| buying        | Feature | Categorical | buying price                                                 |       | no             |
| maint         | Feature | Categorical | price of the maintenance                                     |       | no             |
| doors         | Feature | Categorical | number of doors                                              |       | no             |
| persons       | Feature | Categorical | capacity in terms of persons to carry                        |       | no             |
| lug_boot      | Feature | Categorical | the size of luggage boot                                     |       | no             |
| safety        | Feature | Categorical | estimated safety of the car                                  |       | no             |
| class         | Target  | Categorical | evaluation level (unacceptable, acceptable, good, very good) |       | no             |

### 1. Import the required libraries and load the data

In [1]:
import pandas as pd
import numpy as np
import torch
from torch import nn
import random

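In [2]:
# Note: car.data ships without a header line, so this call consumes the first
# record as column names and loads 1727 of the 1728 rows. Calling
# pd.read_csv('data/car.data', header=None, names=[...]) would keep every row.
X = pd.read_csv('data/car.data')
X.columns = ['buying','maint','doors','persons','lug_boot','safety','class']
# Split off the target column, then convert both parts to NumPy object arrays.
y = X['class']
X = X.drop(['class'],axis=1)
X = np.array(X)
y = np.array(y)
X,y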

(array([['vhigh', 'vhigh', '2', '2', 'small', 'med'],
        ['vhigh', 'vhigh', '2', '2', 'small', 'high'],
        ['vhigh', 'vhigh', '2', '2', 'med', 'low'],
        ...,
        ['low', 'low', '5more', 'more', 'big', 'low'],
        ['low', 'low', '5more', 'more', 'big', 'med'],
        ['low', 'low', '5more', 'more', 'big', 'high']], dtype=object),
 array(['unacc', 'unacc', 'unacc', ..., 'unacc', 'good', 'vgood'],
       dtype=object))
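
The output above begins with a `safety` value of `med`, which suggests the first record was absorbed as the header row. A minimal sketch of loading a header-less file follows; it uses three in-memory lines in the format of `car.data` as a stand-in for the real file, and `COLUMNS` and `sample` are illustrative names, not part of the original notebook.

```python
import io
import pandas as pd

# Column names taken from the Variables Table; the file itself carries none.
COLUMNS = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

# Stand-in for data/car.data: three records in the same comma-separated format.
sample = io.StringIO(
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "vhigh,vhigh,2,2,small,med,unacc\n"
    "vhigh,vhigh,2,2,small,high,unacc\n"
)

# header=None keeps the first record as data; names= supplies the labels.
df = pd.read_csv(sample, header=None, names=COLUMNS)
print(df.shape)  # (3, 7)
```

With the real file, the same two arguments should yield all 1728 instances listed in the dataset summary.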

### 2. Preprocess the data: convert categorical features to integer codes

In [3]:
# Encode the buying and maint columns: map low, med, high, vhigh to 0, 1, 2, 3
def preprocess_1(X,j):
    for i in range(X.shape[0]):
        if X[i][j] == 'low':
            X[i][j] = 0
        elif X[i][j] == 'med':
            X[i][j] = 1
        elif X[i][j] == 'high':
            X[i][j] = 2
        else:
            X[i][j] = 3
    return X
X = preprocess_1(X,0)
X = preprocess_1(X,1)
print(X)

[[3 3 '2' '2' 'small' 'med']
 [3 3 '2' '2' 'small' 'high']
 [3 3 '2' '2' 'med' 'low']
 ...
 [0 0 '5more' 'more' 'big' 'low']
 [0 0 '5more' 'more' 'big' 'med']
 [0 0 '5more' 'more' 'big' 'high']]


In [4]:
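# Encode doors and persons. '5more' doors becomes 5; 'more' persons becomes a
# randomly chosen 5 or 6, so that column is not reproducible without a fixed seed.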
for i in range(X.shape[0]):
    if X[i][2].endswith('more'):
        X[i][2] = 5
    else:
        X[i][2] = int(X[i][2])
print(X)
for i in range(X.shape[0]):
    if X[i][3].endswith('more'):
        X[i][3] = random.choice([5,6])
    else:
        X[i][3] = int(X[i][3])
print(X)

[[3 3 2 '2' 'small' 'med']
 [3 3 2 '2' 'small' 'high']
 [3 3 2 '2' 'med' 'low']
 ...
 [0 0 5 'more' 'big' 'low']
 [0 0 5 'more' 'big' 'med']
 [0 0 5 'more' 'big' 'high']]
[[3 3 2 2 'small' 'med']
 [3 3 2 2 'small' 'high']
 [3 3 2 2 'med' 'low']
 ...
 [0 0 5 5 'big' 'low']
 [0 0 5 5 'big' 'med']
 [0 0 5 6 'big' 'high']]


In [5]:
# Encode lug_boot: small, med, big -> 0, 1, 2
for i in range(X.shape[0]):
    if X[i][4] == 'small':
        X[i][4] = 0
    elif X[i][4] == 'med':
        X[i][4] = 1
    else:
        X[i][4] = 2
print(X)

[[3 3 2 2 0 'med']
 [3 3 2 2 0 'high']
 [3 3 2 2 1 'low']
 ...
 [0 0 5 5 2 'low']
 [0 0 5 5 2 'med']
 [0 0 5 6 2 'high']]


In [6]:
# Encode safety: low, med, high -> 0, 1, 2
for i in range(X.shape[0]):
    if X[i][5] == 'low':
        X[i][5] = 0
    elif X[i][5] == 'med':
        X[i][5] = 1
    else:
        X[i][5] = 2
print(X)

[[3 3 2 2 0 1]
 [3 3 2 2 0 2]
 [3 3 2 2 1 0]
 ...
 [0 0 5 5 2 0]
 [0 0 5 5 2 1]
 [0 0 5 6 2 2]]


In [7]:
# Encode class: unacc, acc, good, vgood -> 0, 1, 2, 3
for i in range(y.shape[0]):
    if y[i] == 'unacc':
        y[i] = 0
    elif y[i] == 'acc':
        y[i] = 1
    elif y[i] == 'good':
        y[i] = 2
    else:
        y[i] = 3
print(y)

[0 0 0 ... 0 2 3]
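
The per-column loops above can also be written as explicit lookup tables. The sketch below is one possible alternative, not the notebook's method: the dictionary names and the `encode` helper are hypothetical, and it assumes a fixed code of 5 for `more` in `persons` instead of the random choice used above.

```python
import pandas as pd

# Ordinal codes mirroring the loops above (all names here are illustrative).
PRICE = {'low': 0, 'med': 1, 'high': 2, 'vhigh': 3}
DOORS = {'2': 2, '3': 3, '4': 4, '5more': 5}
PERSONS = {'2': 2, '4': 4, 'more': 5}  # assumption: one fixed code for 'more'
SIZE = {'small': 0, 'med': 1, 'big': 2}
LEVEL = {'low': 0, 'med': 1, 'high': 2}
TARGET = {'unacc': 0, 'acc': 1, 'good': 2, 'vgood': 3}

MAPPINGS = {
    'buying': PRICE, 'maint': PRICE, 'doors': DOORS, 'persons': PERSONS,
    'lug_boot': SIZE, 'safety': LEVEL, 'class': TARGET,
}

def encode(df):
    """Return an integer-coded copy of df, failing on unknown categories."""
    encoded = pd.DataFrame(index=df.index)
    for column, mapping in MAPPINGS.items():
        # astype(str) guards against pandas having parsed '2' as the integer 2.
        encoded[column] = df[column].astype(str).map(mapping)
    if encoded.isna().any().any():
        raise ValueError('unexpected category value')
    return encoded.astype(int)

demo = pd.DataFrame({
    'buying': ['vhigh', 'low'], 'maint': ['vhigh', 'low'],
    'doors': ['2', '5more'], 'persons': ['2', 'more'],
    'lug_boot': ['small', 'big'], 'safety': ['med', 'high'],
    'class': ['unacc', 'vgood'],
})
print(encode(demo).values.tolist())
# [[3, 3, 2, 2, 0, 1, 0], [0, 0, 5, 5, 2, 2, 3]]
```

Unlike a trailing `else` branch, an explicit table raises an error on a misspelled or unexpected value instead of silently assigning it the last code.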


### 3. Define the neural network model

In [8]:
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.l1 = nn.Linear(6, 15)
        self.l2 = nn.Linear(15, 4)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.l1(x))
        x = self.l2(x)
        return x

net = Classifier()
net = net.to(device)
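
As a quick sanity check, the sketch below rebuilds the same two-layer layout on the CPU, passes a dummy batch through it, and counts the trainable parameters. The names `demo_net` and `dummy` are illustrative, and the batch size of 8 is arbitrary.

```python
import torch
from torch import nn

class Classifier(nn.Module):
    # Same layout as the cell above: 6 inputs -> 15 hidden units -> 4 logits.
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(6, 15)
        self.l2 = nn.Linear(15, 4)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.l2(self.relu(self.l1(x)))

demo_net = Classifier()
dummy = torch.zeros(8, 6)  # hypothetical batch of 8 encoded cars
logits = demo_net(dummy)

# (6*15 + 15) weights and biases in l1, plus (15*4 + 4) in l2.
n_params = sum(p.numel() for p in demo_net.parameters())
print(logits.shape, n_params)  # torch.Size([8, 4]) 169
```

The forward pass returns raw logits with no softmax, which is the input that `nn.CrossEntropyLoss` expects.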

### 4. Define the learning rate, optimizer, and loss function

In [9]:
loss_fn = nn.CrossEntropyLoss()
lr = 0.01
optimizer = torch.optim.Adam(params = net.parameters(), lr=lr)
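
For hard labels, `nn.CrossEntropyLoss` takes raw logits of shape `(N, C)` and integer class indices of shape `(N,)`. The small sketch below illustrates this with made-up values: when all four logits are equal, the loss should equal ln 4, the loss of a uniform guess over four classes. The `demo_` names are illustrative.

```python
import math
import torch
from torch import nn

demo_loss_fn = nn.CrossEntropyLoss()

demo_logits = torch.zeros(3, 4)          # 3 samples, 4 classes, equal scores
demo_targets = torch.tensor([0, 2, 3])   # class indices, dtype torch.int64

demo_loss = demo_loss_fn(demo_logits, demo_targets)
print(round(demo_loss.item(), 4), round(math.log(4), 4))  # 1.3863 1.3863
```

A loss near 1.386 at the start of training therefore indicates a model that has not yet learned anything about the four classes.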

### 5. Build the training loop

In [10]:
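num_epoch = 2000
# Features are floats for the linear layers; CrossEntropyLoss needs the class
# indices in y as integers (torch.long), not floats.
X = torch.tensor(X.astype(int),dtype=torch.float32,device=device)
y = torch.tensor(y.astype(int),dtype=torch.long,device=device)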


In [11]:
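from sklearn.metrics import accuracy_score
for epoch in range(num_epoch):
    # Full-batch training step: forward pass, loss, backward pass, update.
    net.train()
    y_pred_logit = net(X)
    loss = loss_fn(y_pred_logit,y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    net.eval()
    with torch.inference_mode():
        # Reuses the logits computed before this epoch's update. The score is
        # measured on the training data, since no held-out split was made.
        y_pred_prob = torch.softmax(y_pred_logit,dim=1)
        y_pred_label = torch.argmax(y_pred_prob,dim=1)
        val_accuracy = accuracy_score(y.cpu().numpy(), y_pred_label.cpu().numpy())
    # Report every 100 epochs.
    if epoch%100==0:
        print(f'Validation Accuracy: {val_accuracy:.4f}')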

Validation Accuracy: 0.2224
Validation Accuracy: 0.8193
Validation Accuracy: 0.8796
Validation Accuracy: 0.9137
Validation Accuracy: 0.9299
Validation Accuracy: 0.9404
Validation Accuracy: 0.9473
Validation Accuracy: 0.9543
Validation Accuracy: 0.9635
Validation Accuracy: 0.9647
Validation Accuracy: 0.9687
Validation Accuracy: 0.9734
Validation Accuracy: 0.9757
Validation Accuracy: 0.9751
Validation Accuracy: 0.9751
Validation Accuracy: 0.9745
Validation Accuracy: 0.9757
Validation Accuracy: 0.9774
Validation Accuracy: 0.9768
Validation Accuracy: 0.9780


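The final accuracy reaches about 97.8%, comfortably above 95%. Because the score is computed on the same data the model was trained on, it measures fit to the training set, not generalization to unseen cars.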
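
To report accuracy on unseen data, part of the dataset can be held out before training. The sketch below shows one way to do this with scikit-learn's `train_test_split`. It runs on random placeholder arrays shaped like the encoded data, not on the real `car.data`, and the 80/20 ratio and seed are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder arrays shaped like the encoded features and labels.
X_demo = rng.integers(0, 4, size=(200, 6))
y_demo = rng.integers(0, 4, size=200)

# stratify= keeps the class proportions similar in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)
print(X_train.shape, X_val.shape)  # (160, 6) (40, 6)
```

Training would then use only `X_train` and `y_train`, with `accuracy_score` computed on predictions for `X_val`.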