<img src="logo.png" alt="IFNMG logo" width="200">
<h1 style="text-align:center;"> Topics in IC: Exploratory Data Analysis</h1>
<p>Team: David Jansen, Iarah Gonçalves de Almeida, Paulo Borges</p>


<h2> Introduction</h2>
<p> The analysis is carried out on the 
<a href="http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Prognostic%29">Breast Cancer Wisconsin (Prognostic) Data Set</a>. The data is manipulated in Python within the Jupyter environment.</p>

<h2> Attribute Information </h2>

<p> 1) ID number </p>

<p> 2) Outcome (R = recur, N = nonrecur) </p>

<p> 3) Time (recurrence time if field 2 = R, disease-free time if field 2 = N) </p>

<p> 4-33) Ten real-valued features are computed for each cell nucleus: </p>

<ol>
    <li> radius (mean of distances from center to points on the perimeter) </li>
    <li> texture (standard deviation of gray-scale values) </li>
    <li> perimeter </li>
    <li> area </li>
    <li> smoothness (local variation in radius lengths) </li>
    <li> compactness (perimeter^2 / area - 1.0) </li>
    <li> concavity (severity of concave portions of the contour) </li>
    <li> concave points (number of concave portions of the contour) </li>
    <li> symmetry </li>
    <li> fractal dimension ("coastline approximation" - 1) </li>
</ol>

<p> Several of the papers listed in the data set's UCI documentation contain detailed descriptions of how these features are computed. </p>

<p>The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.  For instance, field 4 is Mean Radius, field 14 is Radius SE, field 24 is Worst Radius. </p>
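
<p> The field layout described above can be sketched in a few lines of Python. This is only an illustration: it assumes the ten features appear in the order listed, grouped as all means, then all standard errors, then all "worst" values, and the <code>field_name</code> helper and the <code>_mean</code>/<code>_se</code>/<code>_worst</code> suffixes are naming choices made for the example. </p>

```python
# Ten base features, in the order given in the attribute list.
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Three statistics are computed for each base feature.
statistics = ["mean", "se", "worst"]

# Fields 4-33: all means first, then all standard errors, then all worst values.
feature_names = [
    f"{feature}_{stat}" for stat in statistics for feature in base_features
]


def field_name(field_number):
    """Return the feature name for a 1-based field number in 4..33 (hypothetical helper)."""
    if not 4 <= field_number <= 33:
        raise ValueError("feature fields are numbered 4 to 33")
    return feature_names[field_number - 4]


print(len(feature_names))
print(field_name(4), field_name(14), field_name(24))
```

<p> Under these assumptions, fields 4, 14 and 24 map to the mean, standard error and worst value of the radius, matching the example in the text. </p>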

<p> Values for features 4-33 are recoded with four significant digits. </p>

<p> 34) Tumor size - diameter of the excised tumor in centimeters </p>
<p> 35) Lymph node status - number of positive axillary lymph nodes observed at time of surgery. </p>
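
<p> Because the meaning of field 3 depends on field 2, it can help to split the time into two columns before analysis. The sketch below does this on a tiny hand-made table; the values are illustrative only, and the column names <code>recurred</code>, <code>recurrence_time</code> and <code>disease_free_time</code> are hypothetical. </p>

```python
import pandas as pd

# Three illustrative (outcome, time) pairs, not the full data set.
sample = pd.DataFrame({"outcome": ["N", "R", "N"], "time": [31, 27, 61]})

# True when the outcome is a recurrence (R), False for non-recurrence (N).
sample["recurred"] = sample["outcome"].eq("R")

# Series.where keeps the value where the condition holds and puts NaN elsewhere,
# so each row has a value in exactly one of the two new columns.
sample["recurrence_time"] = sample["time"].where(sample["recurred"])
sample["disease_free_time"] = sample["time"].where(~sample["recurred"])

print(sample)
```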

<h2> Preparing the Jupyter Environment</h2>

In [82]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from urllib.request import urlopen
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

<h2> Importing the Data from the Access URL</h2>

In [83]:
# URL where the data set is hosted.
UCI_data_URL = 'https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wpbc.data'

# Column names.
names = ['id_number', 'outcome', 'time', 'radius_mean', 
         'texture_mean', 'perimeter_mean', 'area_mean', 
         'smoothness_mean', 'compactness_mean', 'concavity_mean',
         'concave_points_mean', 'symmetry_mean', 
         'fractal_dimension_mean', 'radius_se', 'texture_se', 
         'perimeter_se', 'area_se', 'smoothness_se', 
         'compactness_se', 'concavity_se', 'concave_points_se', 
         'symmetry_se', 'fractal_dimension_se', 
         'radius_worst', 'texture_worst', 'perimeter_worst',
         'area_worst', 'smoothness_worst', 
         'compactness_worst', 'concavity_worst', 
         'concave_points_worst', 'symmetry_worst', 
         'fractal_dimension_worst', 'tumor_size', 'lymph_node_status']

# Read the CSV file into a DataFrame.
wpbc = pd.read_csv(urlopen(UCI_data_URL), names=names)

# Show the result as a table (first 10 rows).
wpbc.head(10)

,id_number,outcome,time,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,...,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave_points_worst,symmetry_worst,fractal_dimension_worst,tumor_size,lymph_node_status
0,119513,N,31,18.02,27.6,117.5,1013.0,0.09489,0.1036,0.1086,...,139.7,1436.0,0.1195,0.1926,0.314,0.117,0.2677,0.08113,5.0,5
1,8423,N,61,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,...,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,3.0,2
2,842517,N,116,21.37,17.44,137.5,1373.0,0.08836,0.1189,0.1255,...,159.1,1949.0,0.1188,0.3449,0.3414,0.2032,0.4334,0.09067,2.5,0
3,843483,N,123,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,...,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,2.0,0
4,843584,R,27,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,...,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,3.5,0
5,843786,R,77,12.75,15.29,84.6,502.7,0.1189,0.1569,0.1664,...,107.3,733.2,0.1706,0.4196,0.5999,0.1709,0.3485,0.1179,2.5,0
6,844359,N,60,18.98,19.61,124.4,1112.0,0.09087,0.1237,0.1213,...,152.6,1593.0,0.1144,0.3371,0.299,0.1922,0.2726,0.09581,1.5,?
7,844582,R,77,13.71,20.83,90.2,577.9,0.1189,0.1645,0.09366,...,110.6,897.0,0.1654,0.3682,0.2678,0.1556,0.3196,0.1151,4.0,10
8,844981,N,119,13.0,21.82,87.5,519.8,0.1273,0.1932,0.1859,...,106.2,739.3,0.1703,0.5401,0.539,0.206,0.4378,0.1072,2.0,1
9,845010,N,76,12.46,24.04,83.97,475.9,0.1186,0.2396,0.2273,...,97.65,711.4,0.1853,1.058,1.105,0.221,0.4366,0.2075,6.0,20
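
<p> Row 6 of the output above shows a <code>?</code> in <code>lymph_node_status</code>, so pandas will likely read that column as text rather than numbers. A minimal sketch of one way to handle this, assuming <code>?</code> marks a missing value, is shown on a small stand-in series rather than on the downloaded data. </p>

```python
import pandas as pd

# Small stand-in for the lymph_node_status column, including one "?" marker.
lymph = pd.Series(["5", "2", "0", "?", "10"], name="lymph_node_status")

# Count how many entries carry the "?" marker.
n_missing = int((lymph == "?").sum())

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
lymph_numeric = pd.to_numeric(lymph, errors="coerce")

print(n_missing)
print(lymph_numeric)
```

<p> An alternative under the same assumption is to pass <code>na_values='?'</code> to <code>pd.read_csv</code>, which converts the marker to NaN while the file is being read. </p>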
