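# Logistic Regression: Cancer Classification Prediction Case Study
Learning objectives:
- Master how to use the logistic regression API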


--- 
# 1. Logistic Regression API
```python
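# solver options: {'liblinear', 'sag', 'saga', 'newton-cg', 'lbfgs'}
#   For small datasets 'liblinear' is a good choice, while 'sag' and 'saga' are faster on large datasets.
#   For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' can handle the multinomial loss.
#   (Since scikit-learn 0.22 the default solver is 'lbfgs'.)
# penalty: the type of regularization
# C: inverse of the regularization strength (smaller values mean stronger regularization)

sklearn.linear_model.LogisticRegression(solver='liblinear', penalty='l2', C=1.0)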
```
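To make the role of `C` concrete, here is a minimal sketch on synthetic data. The dataset from `make_classification`, the `lbfgs` solver, and the two `C` values are illustrative assumptions and are not part of the case study. It fits the same model with strong and weak regularization and compares the size of the learned coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# C is the inverse of the regularization strength: smaller C means a stronger penalty.
# The default penalty (L2) is used for both models.
strong = LogisticRegression(solver='lbfgs', C=0.01, max_iter=1000).fit(X, y)
weak = LogisticRegression(solver='lbfgs', C=100.0, max_iter=1000).fit(X, y)

strong_norm = np.linalg.norm(strong.coef_)
weak_norm = np.linalg.norm(weak.coef_)
print('coefficient norm with C=0.01:', strong_norm)
print('coefficient norm with C=100 :', weak_norm)
```

The model fitted with the smaller `C` should have noticeably smaller coefficients, because the L2 penalty pulls them toward zero.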

In [1]:
```python
# Import pandas
import pandas as pd
# Import numpy
import numpy as np
# Import the train/test split helper from sklearn
from sklearn.model_selection import train_test_split
# Import the standardization transformer
from sklearn.preprocessing import StandardScaler
# Import the logistic regression estimator
from sklearn.linear_model import LogisticRegression
```

In [4]:
```python
# 1. Load the data (the file has no header row, so column names are supplied explicitly)
names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
         'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size',
         'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']

data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names=names)
```
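The raw file marks missing entries with `?`, which is why the next cell re-reads it with `na_values='?'` before dropping incomplete rows. The sketch below shows the effect on a tiny in-memory CSV, so it runs without network access. The rows and the shortened column names are made up for illustration.

```python
import io
import pandas as pd

# Tiny made-up sample mimicking the file format; '?' marks a missing value
raw = io.StringIO("1001,5,1,2\n1002,3,?,2\n1003,8,10,4\n")
cols = ['id', 'thickness', 'bare_nuclei', 'Class']  # hypothetical short names

# Treat '?' as NaN while parsing
df = pd.read_csv(raw, names=cols, na_values='?')
print(df.isna().sum())   # number of missing values per column

# Drop every row that contains at least one missing value
clean = df.dropna()
print(clean.shape)       # (2, 4)
```

Without `na_values='?'`, the affected column would be parsed as strings rather than numbers, and `dropna()` would not remove the incomplete row.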

In [31]:
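```python
# 2. Basic data processing: handle missing values
# Re-read the file, treating '?' as missing (NaN), then drop the incomplete rows
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data", names=names, na_values='?')
data = data.dropna()

# 3. Select the features and the target
# Columns 1..9 are the features; column 0 is the sample ID and is skipped
x = data.iloc[:, 1:10]
# print('x:', x.head(1))
# The target is the Class column (2 = benign, 4 = malignant)
y = data["Class"]
# print('y:', y.head(1))

# 4. Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)

# 5. Feature engineering (standardization)
transfer = StandardScaler()
# Fit the scaler on the training data and transform it in one step
x_train = transfer.fit_transform(x_train)
# Only transform the test data, reusing the statistics fitted on the training set
x_test = transfer.transform(x_test)

# 6. Machine learning (logistic regression)
estimator = LogisticRegression()
estimator.fit(x_train, y_train)

# 7. Model evaluation
y_predict = estimator.predict(x_test)
print(y_predict)
# Accuracy on the test set
estimator.score(x_test, y_test)
```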

[2 4 4 2 2 2 2 2 2 2 2 2 2 4 2 2 4 4 4 2 4 2 4 4 4 2 4 2 2 2 2 2 4 2 2 2 4
 2 2 2 2 4 2 4 4 4 4 2 4 4 2 2 2 2 2 4 2 2 2 2 4 4 4 4 2 4 2 2 4 2 2 2 2 4
 2 2 2 2 2 2 4 4 4 2 4 4 4 4 2 2 2 4 2 4 2 2 2 2 2 2 4 2 2 4 2 2 4 2 4 4 2
 2 2 2 4 2 2 2 2 2 2 4 2 4 2 2 2 4 2 4 2 2 2 4 2 2 2 2 2 2 2 2 2 2 2 4 2 4
 2 2 4 4 4 2 2 4 4 2 4 4 2 2 2 2 2 4 4 2 2 2 4]


0.9766081871345029
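Step 5 above fits the scaler on the training split only and reuses those statistics for the test split. A minimal sketch with made-up numbers shows what that means in practice: the training column ends up with mean 0 and unit variance, while the test column is shifted and scaled by the training mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data for illustration
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[2.0], [4.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean_ and scale_ from train, then transforms
test_scaled = scaler.transform(test)        # reuses the training statistics, no refitting

print(scaler.mean_, scaler.scale_)  # statistics learned from the training data
print(train_scaled.ravel())
print(test_scaled.ravel())
```

Calling `fit_transform` on the test split instead would let test-set statistics leak into preprocessing and would put the two splits on different scales.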

# Summary
- The logistic regression API
```python
sklearn.linear_model.LogisticRegression
```
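- Missing values or special placeholder values in the data (such as `?`) must be handled before training
- Accuracy is not the only measure of how well a classifier performs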
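
To illustrate the last point, the sketch below computes a confusion matrix, precision, and recall alongside accuracy. The labels are made up for illustration and follow the dataset's encoding, in which 2 means benign and 4 means malignant. Treating 4 as the positive class is an assumption that suits a cancer-screening setting.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Made-up labels: 2 = benign, 4 = malignant
y_true = [2, 2, 2, 2, 2, 2, 2, 2, 4, 4]
y_pred = [2, 2, 2, 2, 2, 2, 2, 2, 2, 4]  # one malignant case is missed

acc = accuracy_score(y_true, y_pred)
# Rows are true labels, columns are predicted labels, in the order [2, 4]
cm = confusion_matrix(y_true, y_pred, labels=[2, 4])
# Precision and recall for the malignant class
prec = precision_score(y_true, y_pred, pos_label=4)
rec = recall_score(y_true, y_pred, pos_label=4)

print('accuracy :', acc)
print('confusion matrix:\n', cm)
print('precision (malignant):', prec)
print('recall (malignant)   :', rec)
```

Here accuracy is 0.9 even though half of the malignant cases are missed (recall 0.5), so recall on the malignant class deserves a closer look than overall accuracy in this kind of task.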