# Dataset: Letter Recognition
__Goal__: identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters of the English alphabet. The character images were based on 20 different fonts, and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts), which were then scaled to fit into a range of integer values from 0 to 15.

| Data set characteristics | Attribute characteristics | Associated tasks | Number of instances | Number of attributes |
|:---:|:---:|:---:|:---:|:---:|
| Multivariate | Integer | Classification | 20000 | 16 |

## Attribute information

#### Target variable (classes)

1. **lettr**: capital letter (26 values, A to Z)

#### Independent variables (features)

2. **x-box**: horizontal position of box (integer)
3. **y-box**: vertical position of box (integer)
4. **width**: width of box (integer)
5. **high**: height of box (integer)
6. **onpix**: total # on pixels (integer)
7. **x-bar**: mean x of on pixels in box (integer)
8. **y-bar**: mean y of on pixels in box (integer)
9. **x2bar**: mean x variance (integer)
10. **y2bar**: mean y variance (integer)
11. **xybar**: mean x y correlation (integer)
12. **x2ybr**: mean of x * x * y (integer)
13. **xy2br**: mean of x * y * y (integer)
14. **x-ege**: mean edge count left to right (integer)
15. **xegvy**: correlation of x-ege with y (integer)
16. **y-ege**: mean edge count bottom to top (integer)
17. **yegvx**: correlation of y-ege with x (integer)
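Since the target is the first column and the 16 features follow it, separating them could look like the minimal sketch below. It assumes the column names listed above are used when loading the file. `COLUMNS`, `FEATURES` and `split_features_target` are illustrative names, not part of any library. The range check simply encodes the documented 0 to 15 scaling. The sample rows are the first three rows of the data set, as displayed further down in this notebook.

```python
import pandas as pd

# Column order as listed above: the target first, then the 16 features.
COLUMNS = ['lettr', 'x-box', 'y-box', 'width', 'high', 'onpix', 'x-bar',
           'y-bar', 'x2bar', 'y2bar', 'xybar', 'x2ybr', 'xy2br', 'x-ege',
           'xegvy', 'y-ege', 'yegvx']
FEATURES = COLUMNS[1:]


def split_features_target(df):
    """Return (X, y): the 16 integer features and the letter labels."""
    X = df[FEATURES]
    y = df['lettr']
    # Every feature is documented as an integer in the closed range 0..15.
    if not ((X >= 0) & (X <= 15)).all().all():
        raise ValueError('feature values outside the documented 0-15 range')
    return X, y


# Usage with the first three rows of the data set.
sample = pd.DataFrame(
    [['T', 2, 8, 3, 5, 1, 8, 13, 0, 6, 6, 10, 8, 0, 8, 0, 8],
     ['I', 5, 12, 3, 7, 2, 10, 5, 5, 4, 13, 3, 9, 2, 8, 4, 10],
     ['D', 4, 11, 6, 8, 6, 10, 6, 2, 6, 10, 3, 7, 3, 7, 3, 9]],
    columns=COLUMNS)
X, y = split_features_target(sample)
print(X.shape, list(y))  # (3, 16) ['T', 'I', 'D']
```

On the full data frame the same call would be `split_features_target(df)`.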

In [1]:
```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# The .data file has no header row, so the column names are passed explicitly.
data = '~/Letter-Recognition/data_set/letter-recognition.data'
columns = ['lettr','x-box','y-box','width','high','onpix','x-bar','y-bar','x2bar','y2bar','xybar','x2ybr','xy2br','x-ege','xegvy','y-ege','yegvx']
df = pd.read_csv(data, names=columns)
df
```

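| | lettr | x-box | y-box | width | high | onpix | x-bar | y-bar | x2bar | y2bar | xybar | x2ybr | xy2br | x-ege | xegvy | y-ege | yegvx |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0 | T | 2 | 8 | 3 | 5 | 1 | 8 | 13 | 0 | 6 | 6 | 10 | 8 | 0 | 8 | 0 | 8 |
| 1 | I | 5 | 12 | 3 | 7 | 2 | 10 | 5 | 5 | 4 | 13 | 3 | 9 | 2 | 8 | 4 | 10 |
| 2 | D | 4 | 11 | 6 | 8 | 6 | 10 | 6 | 2 | 6 | 10 | 3 | 7 | 3 | 7 | 3 | 9 |
| 3 | N | 7 | 11 | 6 | 6 | 3 | 5 | 9 | 4 | 6 | 4 | 4 | 10 | 6 | 10 | 2 | 8 |
| 4 | G | 2 | 1 | 3 | 1 | 1 | 8 | 6 | 6 | 6 | 6 | 5 | 9 | 1 | 7 | 5 | 10 |
| 5 | S | 4 | 11 | 5 | 8 | 3 | 8 | 8 | 6 | 9 | 5 | 6 | 6 | 0 | 8 | 9 | 7 |
| 6 | B | 4 | 2 | 5 | 4 | 4 | 8 | 7 | 6 | 6 | 7 | 6 | 6 | 2 | 8 | 7 | 10 |
| 7 | A | 1 | 1 | 3 | 2 | 1 | 8 | 2 | 2 | 2 | 8 | 2 | 8 | 1 | 6 | 2 | 7 |
| 8 | J | 2 | 2 | 4 | 4 | 2 | 10 | 6 | 2 | 6 | 12 | 4 | 8 | 1 | 6 | 1 | 7 |
| 9 | M | 11 | 15 | 13 | 9 | 7 | 13 | 2 | 6 | 2 | 12 | 1 | 9 | 8 | 1 | 1 | 8 |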
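The rows above show individual examples only. To see how the 20,000 instances are spread over the 26 letters, a sketch like the following counts labels per letter and keeps letters with zero samples visible. The helper name `class_distribution` is hypothetical. Here it is run on the ten letters displayed above rather than on the full `df`.

```python
import pandas as pd


def class_distribution(labels):
    """Count how many samples each letter has, in alphabetical order."""
    counts = labels.value_counts().sort_index()
    # Letters absent from `labels` get a count of 0 so all 26 classes appear.
    alphabet = [chr(c) for c in range(ord('A'), ord('Z') + 1)]
    return counts.reindex(alphabet, fill_value=0)


# Usage on the ten letters displayed above; on the full data frame,
# call class_distribution(df['lettr']) instead.
sample_labels = pd.Series(list('TIDNGSBAJM'), name='lettr')
dist = class_distribution(sample_labels)
print(len(dist), int(dist.sum()))  # 26 10
```

A strongly unbalanced result on the full data would suggest stratified splitting when creating training and test sets.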


In [2]:
```python
df.shape
```

```
(20000, 17)
```

In [3]:
```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 17 columns):
lettr    20000 non-null object
x-box    20000 non-null int64
y-box    20000 non-null int64
width    20000 non-null int64
high     20000 non-null int64
onpix    20000 non-null int64
x-bar    20000 non-null int64
y-bar    20000 non-null int64
x2bar    20000 non-null int64
y2bar    20000 non-null int64
xybar    20000 non-null int64
x2ybr    20000 non-null int64
xy2br    20000 non-null int64
x-ege    20000 non-null int64
xegvy    20000 non-null int64
y-ege    20000 non-null int64
yegvx    20000 non-null int64
dtypes: int64(16), object(1)
memory usage: 2.6+ MB
```
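The `info()` output reports 20000 non-null entries in every column, 16 `int64` columns and one `object` column (`lettr`). The same checks can be expressed in code, which may be handy if the file is reloaded or modified later. This is a minimal sketch: `check_schema` is a hypothetical helper, and the two small frames in the usage example are toy inputs for illustration only.

```python
import pandas as pd


def check_schema(df, target='lettr'):
    """Return a list of problems found; an empty list means the schema looks fine."""
    problems = []
    # Any NaN in any column (the info() output above reports none).
    missing = df.isna().sum()
    for col, n in missing[missing > 0].items():
        problems.append(f'{col}: {n} missing value(s)')
    # All columns other than the target should be integer-typed.
    for col in df.columns.drop(target):
        if not pd.api.types.is_integer_dtype(df[col]):
            problems.append(f'{col}: dtype {df[col].dtype} is not integer')
    return problems


# Toy inputs: a clean frame, and one with a missing value in 'y-box'.
sample = pd.DataFrame({'lettr': ['T', 'I'], 'x-box': [2, 5], 'y-box': [8, 12]})
print(check_schema(sample))  # []
broken = sample.assign(**{'y-box': [8.0, None]})
print(check_schema(broken))
```

With the data loaded as in `In [1]`, `check_schema(df)` should return an empty list, matching the `info()` output.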


In [4]:
```python
df.describe()
```

| | x-box | y-box | width | high | onpix | x-bar | y-bar | x2bar | y2bar | xybar | x2ybr | xy2br | x-ege | xegvy | y-ege | yegvx |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| count | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 |
| mean | 4.02355 | 7.0355 | 5.12185 | 5.37245 | 3.50585 | 6.8976 | 7.50045 | 4.6286 | 5.17865 | 8.28205 | 6.454 | 7.929 | 3.0461 | 8.33885 | 3.69175 | 7.8012 |
| std | 1.913212 | 3.304555 | 2.014573 | 2.26139 | 2.190458 | 2.026035 | 2.325354 | 2.699968 | 2.380823 | 2.488475 | 2.63107 | 2.080619 | 2.332541 | 1.546722 | 2.567073 | 1.61747 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 3.0 | 5.0 | 4.0 | 4.0 | 2.0 | 6.0 | 6.0 | 3.0 | 4.0 | 7.0 | 5.0 | 7.0 | 1.0 | 8.0 | 2.0 | 7.0 |
| 50% | 4.0 | 7.0 | 5.0 | 6.0 | 3.0 | 7.0 | 7.0 | 4.0 | 5.0 | 8.0 | 6.0 | 8.0 | 3.0 | 8.0 | 3.0 | 8.0 |
| 75% | 5.0 | 9.0 | 6.0 | 7.0 | 5.0 | 8.0 | 9.0 | 6.0 | 7.0 | 10.0 | 8.0 | 9.0 | 4.0 | 9.0 | 5.0 | 9.0 |
| max | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 |
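`describe()` confirms that every feature has a minimum of 0 and a maximum of 15, matching the documented scaling. Neural networks usually train better on small, comparable input ranges. One option, an assumption rather than something this notebook does, is to divide by 15. Another is to standardize using the mean and standard deviation of the training rows only, so that no test-set statistics leak into training. Here is a sketch of both, where `minmax_scale`, `standardize` and `FEATURE_MAX` are hypothetical names:

```python
import numpy as np

FEATURE_MAX = 15  # documented upper bound of every attribute


def minmax_scale(X):
    """Map the 0..15 integer features onto floats in [0, 1]."""
    return np.asarray(X, dtype=np.float64) / FEATURE_MAX


def standardize(X_train, X_test):
    """Z-score both sets using statistics computed on the training rows only."""
    X_train = np.asarray(X_train, dtype=np.float64)
    X_test = np.asarray(X_test, dtype=np.float64)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X_train - mean) / std, (X_test - mean) / std


# Usage with the first four features (x-box, y-box, width, high) of rows 0-2.
rows = [[2, 8, 3, 5], [5, 12, 3, 7], [4, 11, 6, 8]]
print(minmax_scale(rows)[0])
train_z, test_z = standardize(rows[:2], rows[2:])
```

Dividing by the known maximum needs no fitted statistics, which keeps training and test preprocessing identical by construction.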


## __Initial idea:__ work with a neural network
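One common design for a 26-class problem is a network with one output unit per letter. That design is assumed here and is not taken from the notebook. With it, the `lettr` labels must be turned into one-hot target vectors, and the network's scores must be turned back into letters. Here is a minimal NumPy sketch, where `one_hot_letters`, `decode` and `ALPHABET` are hypothetical names:

```python
import numpy as np

ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def one_hot_letters(letters):
    """Encode letters 'A'..'Z' as rows of an (n, 26) 0/1 matrix."""
    idx = np.array([ALPHABET.index(ch) for ch in letters])
    Y = np.zeros((len(idx), len(ALPHABET)), dtype=np.float32)
    Y[np.arange(len(idx)), idx] = 1.0  # one 1 per row, at the letter's position
    return Y


def decode(scores):
    """Turn per-letter scores (one row per sample) back into letters via argmax."""
    return [ALPHABET[i] for i in np.argmax(scores, axis=1)]


Y = one_hot_letters(['T', 'I', 'D'])
print(Y.shape, decode(Y))  # (3, 26) ['T', 'I', 'D']
```

With a softmax output layer, the argmax in `decode` picks the letter with the highest predicted probability.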