**Load the Breast Cancer Wisconsin (Diagnostic) dataset**

In [1]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
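`load_breast_cancer` returns a scikit-learn `Bunch`. This is a dictionary subclass whose fields (`data`, `target`, `target_names`, `feature_names`, `DESCR`) can also be read as attributes. If only the arrays are needed, `return_X_y=True` skips the container. Here is a minimal sketch. The variable names `X` and `y` are illustrative and not from the notebook:

```python
from sklearn.datasets import load_breast_cancer

# A Bunch is a dict subclass, so key access and attribute access return the same object
cancer = load_breast_cancer()
print(cancer["target_names"] is cancer.target_names)  # True

# Ask for just the feature matrix and the label vector
X, y = load_breast_cancer(return_X_y=True)
print(X.shape, y.shape)  # (569, 30) (569,)
```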

In [2]:
cancer.data.shape

(569, 30)

In [3]:
cancer.target.shape

(569,)
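The target vector holds one 0/1 label per sample. Before splitting the data, it is worth checking how balanced the two classes are. The sketch below counts each label with `np.bincount`:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Count how many samples carry each label (0 = malignant, 1 = benign)
counts = np.bincount(cancer.target)
for name, n in zip(cancer.target_names, counts):
    print(f"{name}: {n}")
# malignant: 212
# benign: 357
```

About 63% of the samples are benign (357/569). A model that always answered "benign" would therefore already reach roughly 0.63 accuracy, which is a useful baseline for the score computed later.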

In [30]:
print(cancer.DESCR)

Breast Cancer Wisconsin (Diagnostic) Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, field
        13 is Radius SE, field 23 is Worst Radius.
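In the array returned by scikit-learn, the 30 columns come in the same three blocks: the 10 means, then the 10 standard errors, then the 10 "worst" values. The field numbers quoted in the description (3, 13, 23) appear to count the ID and diagnosis columns of the original data file. If so, they correspond to 0-based column indices 0, 10 and 20 here. A short check, assuming the `feature_names` ordering of current scikit-learn releases:

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
names = list(cancer.feature_names)

# 10 base measurements, each summarized three ways: mean, standard error, worst
print(names[0], "|", names[10], "|", names[20])
# mean radius | radius error | worst radius

# Regroup the columns as (statistic, measurement) blocks: axis 1 = mean / error / worst
X = cancer.data.reshape(-1, 3, 10)
print(X.shape)  # (569, 3, 10)
```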

        

**Malignant = 0  
Benign = 1**

**Split the dataset into training and test sets**

In [4]:
from sklearn.model_selection import train_test_split

In [5]:
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.25, random_state=42)
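`test_size=0.25` sends a quarter of the 569 rows, rounded up, to the test set. `random_state=42` fixes the shuffle so the split is reproducible. The sketch below confirms the sizes. It also shows `stratify`, which this notebook does not use; the option additionally keeps the malignant/benign ratio the same in both parts:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()

# Same split as in the notebook: ceil(0.25 * 569) = 143 test rows
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=42)
print(len(X_train), len(X_test))  # 426 143

# Optional variant: stratify on the labels so both parts keep the class ratio
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=42,
    stratify=cancer.target)
print(ys_train.mean(), ys_test.mean())  # fraction of benign samples in each part
```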

**Import logistic regression and train the model**

In [6]:
from sklearn.linear_model import LogisticRegression

In [7]:
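# liblinear was the default solver in the scikit-learn version used here (see the output below);
# newer versions default to lbfgs, so the solver is set explicitly to match
log_reg = LogisticRegression(solver='liblinear')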

In [8]:
log_reg.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
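A fitted logistic regression stores one weight per feature in `coef_` and a bias in `intercept_`. For the positive class (here 1, benign), the predicted probability is the logistic sigmoid of the linear score, $p = 1 / (1 + e^{-(w \cdot x + b)})$. The sketch below recomputes this by hand and compares it with `predict_proba`. It passes `solver="liblinear"` on the assumption that this matches the defaults shown in the output above:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=42)

log_reg = LogisticRegression(solver="liblinear").fit(X_train, y_train)

# Linear score w·x + b for every test sample (what decision_function returns)
z = X_test @ log_reg.coef_.ravel() + log_reg.intercept_[0]

# The sigmoid turns the score into P(class 1 | x)
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = log_reg.predict_proba(X_test)[:, 1]
print(np.allclose(p_manual, p_sklearn))  # True
```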

**Make predictions**

In [9]:
pred = log_reg.predict(X_test)
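For a binary problem, `predict` returns class 1 wherever the linear score is positive. This is equivalent to a predicted probability of class 1 above 0.5. The sketch below checks the equivalence on the same split. As before, `solver="liblinear"` is an assumption made to match the recorded defaults:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=42)
log_reg = LogisticRegression(solver="liblinear").fit(X_train, y_train)

pred = log_reg.predict(X_test)

# Threshold the probability of class 1 (benign) at 0.5
proba_benign = log_reg.predict_proba(X_test)[:, 1]
pred_from_proba = (proba_benign > 0.5).astype(int)
print(np.array_equal(pred, pred_from_proba))  # True
```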

**Calculate accuracy:**

In [10]:
acc_score = log_reg.score(X_test, y_test)

In [11]:
acc_score

0.965034965034965
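On a classifier, `score` returns the mean accuracy: the fraction of test samples whose predicted label equals the true one. With 143 test samples, the 0.965 above corresponds to 138 correct predictions (138/143 ≈ 0.965). The sketch below computes the same quantity directly and with `sklearn.metrics.accuracy_score`, using the same assumed solver as before:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=42)
log_reg = LogisticRegression(solver="liblinear").fit(X_train, y_train)
pred = log_reg.predict(X_test)

# Accuracy = correct predictions / all predictions
n_correct = int(np.sum(pred == y_test))
manual = n_correct / len(y_test)

print(n_correct, "of", len(y_test), "correct")
print(manual, log_reg.score(X_test, y_test), accuracy_score(y_test, pred))
```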

**Create a table showing the predictions next to the real values**

In [13]:
list(cancer.target_names)

['malignant', 'benign']

**malignant = 0   
benign = 1**
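`target_names` is indexed by the label value, so NumPy fancy indexing turns an array of 0/1 labels into readable class names. The labels in this sketch are hand-written for illustration and are not model output:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Position in target_names == numeric label
label_to_name = {int(i): str(name) for i, name in enumerate(cancer.target_names)}
print(label_to_name)  # {0: 'malignant', 1: 'benign'}

# Fancy indexing maps a whole label array at once (example labels only)
labels = np.array([1, 0, 0, 1])
print(cancer.target_names[labels])  # ['benign' 'malignant' 'malignant' 'benign']
```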

In [15]:
import pandas as pd

In [16]:
d = {'predictions': pred, 'real values': y_test}

In [17]:
data = pd.DataFrame(data=d)

In [18]:
data

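|   | predictions | real values |
|---|-------------|-------------|
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 0 | 0 |
| 8 | 1 | 1 |
| 9 | 1 | 1 |
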

In [19]:
data.predictions == data['real values']

0       True
1       True
2       True
3       True
4       True
5       True
6       True
7       True
8       True
9       True
10      True
11      True
12      True
13      True
14      True
15      True
16      True
17      True
18      True
19      True
20     False
21      True
22      True
23      True
24      True
25      True
26      True
27      True
28      True
29      True
       ...  
113     True
114     True
115     True
116     True
117     True
118     True
119     True
120    False
121     True
122     True
123     True
124     True
125     True
126     True
127     True
128     True
129     True
130     True
131     True
132     True
133     True
134     True
135     True
136     True
137     True
138     True
139     True
140     True
141     True
142     True
Length: 143, dtype: bool
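Counting the `True`/`False` values gives the number of correct and wrong predictions. A confusion matrix additionally shows in which direction the model errs. The sketch below uses `sklearn.metrics.confusion_matrix`, where rows are true labels and columns are predicted labels, both ordered malignant = 0, benign = 1. The solver choice is again an assumption made to match the recorded defaults:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=42)
pred = LogisticRegression(solver="liblinear").fit(X_train, y_train).predict(X_test)
data = pd.DataFrame({"predictions": pred, "real values": y_test})

# Number of matching (True) and mismatching (False) rows
matches = data["predictions"] == data["real values"]
print(matches.value_counts())

# Rows: true class, columns: predicted class
cm = confusion_matrix(data["real values"], data["predictions"], labels=[0, 1])
print(pd.DataFrame(cm, index=cancer.target_names, columns=cancer.target_names))
```

The off-diagonal cells separate two kinds of error. The top-right cell counts malignant tumours predicted as benign; the bottom-left cell counts benign tumours predicted as malignant. In a diagnostic setting these two mistakes usually do not carry the same cost.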

**Show which patients in the test set were wrongly diagnosed by the model:**

In [46]:
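# "patient number" is the row position within the 143-sample test set
wrong_predictions = []
for i in range(len(data)):
    if data['predictions'][i] != data['real values'][i]:
        wrong_predictions.append(data['predictions'][i])  # store the (wrong) predicted label
        print("wrongly diagnosed patient number:", i, 'as', wrong_predictions[-1])
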
wrongly diagnosed patient number: 20 as 1
wrongly diagnosed patient number: 58 as 1
wrongly diagnosed patient number: 82 as 1
wrongly diagnosed patient number: 112 as 0
wrongly diagnosed patient number: 120 as 0
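The same rows can be found without an explicit loop by filtering the DataFrame with a boolean mask. Here the patient numbers are row positions in the shuffled test set, not indices in the original 569-row dataset. The sketch below also recovers the original indices by splitting an extra array of row ids alongside the data. The `row_ids` array and the `dataset row` column are hypothetical additions for illustration, and the solver is again assumed to be `liblinear`:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
row_ids = np.arange(len(cancer.target))  # hypothetical: original position of each sample

# Splitting row_ids together with X and y applies the same shuffle to all three arrays
X_train, X_test, y_train, y_test, ids_train, ids_test = train_test_split(
    cancer.data, cancer.target, row_ids, test_size=0.25, random_state=42)
pred = LogisticRegression(solver="liblinear").fit(X_train, y_train).predict(X_test)

data = pd.DataFrame({"predictions": pred, "real values": y_test, "dataset row": ids_test})

# Boolean mask keeps only the misclassified test rows (index = test-set position)
wrong = data[data["predictions"] != data["real values"]]
print(wrong)
```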
