# Multinomial Regression with R

In this lesson we work with a data set taken from our textbook. The data come from a study that aims to identify the factors that influence the food choice of alligators.

To that end, 219 alligators were captured in four Florida lakes.

The response is nominal: the primary food type, by volume, found in the stomach of each observed alligator. It has five categories:
 - Fish;
 - Invertebrate;
 - Reptile;
 - Bird;
 - Other.

The study also classified the alligators by:
 - **Lake of capture:** Hancock, Oklawaha, Trafford, George;
 - **Gender:** male, female;
 - **Size:**
   - $\leq$ 2.3 meters in length;
   - $>$ 2.3 meters in length.



## Loading packages
Let's load a few packages to get started!

In [None]:
# Install the required packages (only needed once per R installation)
install.packages("foreign")
install.packages("nnet")
install.packages("ggplot2")
install.packages("reshape2")
install.packages("VGAM")
install.packages("knitr")
install.packages("Metrics")

In [None]:
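# Load the packages; library() stops with an error if a package is missing,
# whereas require() only returns FALSE with a warning
library(foreign)
library(nnet)
library(ggplot2)
library(reshape2)
library(VGAM)
library(knitr)
library(Metrics)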

## The data

Let's start by reading the file dados.csv into an R data frame.

In [3]:
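# Read the semicolon-separated file. stringsAsFactors = TRUE keeps the
# categorical columns as factors (the default became FALSE in R 4.0).
ml <- read.csv("dados.csv", sep = ";", stringsAsFactors = TRUE)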

In [4]:
head(ml)

id,lake,gender,size,food
45,Hancock,Female,> 2.3,Fish
80,Trafford,Male,<= 2.3,Fish
64,Oklawaha,Female,> 2.3,Reptile
19,Trafford,Female,> 2.3,Reptile
120,Trafford,Female,<= 2.3,Invertebrate
200,Trafford,Female,> 2.3,Fish
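
As a self-contained illustration of the import step, the sketch below parses three rows copied from the `head(ml)` output above. It assumes the file is semicolon-separated, as the `sep = ";"` argument implies. The inline text and the names `csv_text` and `demo` are stand-ins for illustration, not the real `dados.csv`.

```r
# Three rows copied from the head(ml) output, written with ";" as separator.
# The inline text is a stand-in for the real dados.csv file.
csv_text <- "id;lake;gender;size;food
45;Hancock;Female;> 2.3;Fish
80;Trafford;Male;<= 2.3;Fish
64;Oklawaha;Female;> 2.3;Reptile"

# stringsAsFactors = TRUE stores text columns as factors, which is the form
# summary() and relevel() work with for categorical variables.
demo <- read.csv(text = csv_text, sep = ";", stringsAsFactors = TRUE)
str(demo)
```

With factor columns, `summary()` reports counts per level, as in the summary table of this notebook, instead of only the length and class of a character column.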


In [5]:
# Apply summary() to the ml data frame.
s <- summary(ml)
s

       id              lake       gender        size               food   
 Min.   :  1.0   George  :63   Female:130   <= 2.3:124   Bird        :13  
 1st Qu.: 56.5   Hancock :55   Male  : 89   > 2.3 : 95   Fish        :94  
 Median :111.0   Oklawaha:48                             Invertebrate:61  
 Mean   :108.0   Trafford:53                             Other       :32  
 3rd Qu.:161.0                                           Reptile     :19  
 Max.   :219.0                                                            

In [6]:
# Render the summary (columns 2 to 5) as a Markdown table
kable(s[, 2:5], caption = "Table 1: Statistical summary of the data.")



|   |      lake  |   gender  |    size   |          food  |
|:--|:-----------|:----------|:----------|:---------------|
|   |George  :63 |Female:130 |<= 2.3:124 |Bird        :13 |
|   |Hancock :55 |Male  : 89 |> 2.3 : 95 |Fish        :94 |
|   |Oklawaha:48 |NA         |NA         |Invertebrate:61 |
|   |Trafford:53 |NA         |NA         |Other       :32 |
|   |NA          |NA         |NA         |Reptile     :19 |
|   |NA          |NA         |NA         |NA              |

In [7]:
with(ml, table(food, lake))

              lake
food           George Hancock Oklawaha Trafford
  Bird              3       5        1        4
  Fish             33      30       18       13
  Invertebrate     20       4       19       18
  Other             6      13        3       10
  Reptile           1       3        7        8

In [8]:
with(ml, table(food, gender))

              gender
food           Female Male
  Bird              7    6
  Fish             59   35
  Invertebrate     33   28
  Other            18   14
  Reptile          13    6

In [9]:
with(ml, table(food, size))

              size
food           <= 2.3 > 2.3
  Bird              5     8
  Fish             49    45
  Invertebrate     45    16
  Other            19    13
  Reptile           6    13
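
Raw counts are hard to compare because the two size classes have different totals. A minimal sketch of one way to read the table above is to convert it to column proportions with `prop.table()`. The counts are copied from the output; the names `food_size` and `props` are introduced here for illustration.

```r
# Counts copied from the food-by-size table above.
food_size <- matrix(
  c( 5,  8,
    49, 45,
    45, 16,
    19, 13,
     6, 13),
  ncol = 2, byrow = TRUE,
  dimnames = list(food = c("Bird", "Fish", "Invertebrate", "Other", "Reptile"),
                  size = c("<= 2.3", "> 2.3"))
)

# margin = 2 divides each column by its total, giving the share of each
# food type within a size class.
props <- prop.table(food_size, margin = 2)
round(props, 3)
```

Invertebrates account for 45 of the 124 smaller alligators (about 36%) but only 16 of the 95 larger ones (about 17%), which already hints at a size effect for that category.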

## Multinomial Regression Model

In [10]:
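# Use "Fish" as the reference (baseline) category of the response.
# factor() guards against food having been read as a character column.
ml$food2 <- relevel(factor(ml$food), ref = "Fish")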
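
To see what `relevel()` does, here is a small sketch on a toy factor (illustrative values only, not the study data). `multinom()` treats the first level of the response as the baseline category, so moving `Fish` to the front makes every fitted equation a comparison against fish.

```r
# Toy factor with the five food categories (illustrative values only).
food <- factor(c("Bird", "Fish", "Invertebrate", "Other", "Reptile", "Fish"))
levels(food)            # levels before releveling

# relevel() moves the chosen level to the front and keeps the others in order.
food2 <- relevel(food, ref = "Fish")
levels(food2)           # "Fish" is now the first level
```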

In [11]:
# Fit the multinomial logit model with lake, gender and size as predictors
ml.ajust <- multinom(food2 ~ lake + gender + size, data = ml)

# weights:  35 (24 variable)
initial  value 352.466903 
iter  10 value 270.650933
iter  20 value 268.935634
final  value 268.932740 
converged
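
For reference, the model fitted above can be written in the usual baseline-category logit form, with Fish as the baseline. The symbols $\pi_j$, $\alpha_j$, $\boldsymbol{\beta}_j$ and $\mathbf{x}$ are notation introduced here, not taken from the notebook:

$$
\log \frac{\pi_j(\mathbf{x})}{\pi_{\text{Fish}}(\mathbf{x})} = \alpha_j + \boldsymbol{\beta}_j^{\top} \mathbf{x},
\qquad j \in \{\text{Bird}, \text{Invertebrate}, \text{Other}, \text{Reptile}\}
$$

Each of the four non-baseline categories has one intercept and five slopes (three lake indicators, one gender indicator, one size indicator), which gives the $4 \times 6 = 24$ free parameters reported as `24 variable` in the output.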


In [12]:
ml.ajust

Call:
multinom(formula = food2 ~ lake + gender + size, data = ml)

Coefficients:
             (Intercept) lakeHancock lakeOklawaha lakeTrafford genderMale
Bird          -3.0386187   0.5753859  -0.55029893     1.237111  0.6064079
Invertebrate  -0.2939452  -1.7804428   0.91320520     1.155850  0.4629561
Other         -1.6833164   0.7665839   0.02605831     1.557776  0.2525889
Reptile       -4.0435542   1.1294399   2.53020293     3.061016  0.6275746
              size> 2.3
Bird          0.7302265
Invertebrate -1.3362685
Other        -0.2905753
Reptile       0.5570452

Residual Deviance: 537.8655 
AIC: 585.8655 
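
The figures in the fitting log and in the printed model are linked by simple arithmetic. The sketch below reproduces them from the numbers shown above; the variable names are introduced here for illustration.

```r
# Values copied from the multinom() output above.
final_value <- 268.932740   # negative log-likelihood at convergence
n_params    <- 24           # "24 variable" weights
n_obs       <- 219          # number of alligators
n_classes   <- 5            # food categories

# The starting point assigns probability 1/5 to every category, so the
# initial negative log-likelihood is n * log(5).
initial_check <- n_obs * log(n_classes)

# Residual deviance is twice the negative log-likelihood.
deviance_check <- 2 * final_value

# AIC adds a penalty of 2 per estimated parameter.
aic_check <- deviance_check + 2 * n_params

c(initial = initial_check, deviance = deviance_check, AIC = aic_check)
```

The three results agree with the `initial value`, `Residual Deviance` and `AIC` lines printed by the model.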

In [13]:
summary(ml.ajust)

Call:
multinom(formula = food2 ~ lake + gender + size, data = ml)

Coefficients:
             (Intercept) lakeHancock lakeOklawaha lakeTrafford genderMale
Bird          -3.0386187   0.5753859  -0.55029893     1.237111  0.6064079
Invertebrate  -0.2939452  -1.7804428   0.91320520     1.155850  0.4629561
Other         -1.6833164   0.7665839   0.02605831     1.557776  0.2525889
Reptile       -4.0435542   1.1294399   2.53020293     3.061016  0.6275746
              size> 2.3
Bird          0.7302265
Invertebrate -1.3362685
Other        -0.2905753
Reptile       0.5570452

Std. Errors:
             (Intercept) lakeHancock lakeOklawaha lakeTrafford genderMale
Bird           0.8319419   0.7952339    1.2098974    0.8661140  0.6888548
Invertebrate   0.3552706   0.6232018    0.4761174    0.4927870  0.3955221
Other          0.5209746   0.5685509    0.7777721    0.6256744  0.4663471
Reptile        1.1839215   1.1927674    1.1220866    1.1296991  0.6852699
             size> 2.3
Bird         0.6522862

In [14]:
# Wald z statistics: each coefficient divided by its standard error
fit.sum <- summary(ml.ajust)
z <- fit.sum$coefficients / fit.sum$standard.errors

In [15]:
# Two-sided p-values from the standard normal distribution
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

In [16]:
p

,(Intercept),lakeHancock,lakeOklawaha,lakeTrafford,genderMale,size> 2.3
Bird,0.0002597596,0.469346389,0.64923077,0.153192223,0.3786897,0.262932046
Invertebrate,0.4080194681,0.004277626,0.05510781,0.01899974,0.2418023,0.001155127
Other,0.0012331869,0.177558082,0.97327284,0.012783087,0.5880714,0.527525975
Reptile,0.0006369062,0.343686137,0.02413906,0.006736743,0.3597684,0.388965428
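
To make the calculation concrete, the sketch below repeats the Wald test for a single entry of the table: the `lakeHancock` coefficient in the Invertebrate equation. The estimate and standard error are copied from the `summary()` output above; the variable names are introduced here.

```r
# Estimate and standard error for lakeHancock in the Invertebrate equation,
# copied from the summary() output above.
estimate  <- -1.7804428
std_error <-  0.6232018

# Wald statistic and two-sided p-value under a standard normal reference.
z_value <- estimate / std_error
p_value <- 2 * pnorm(abs(z_value), lower.tail = FALSE)

c(z = z_value, p = p_value)
```

The p-value matches the Invertebrate / `lakeHancock` cell of the table above (about 0.0043), so at the 5% level this coefficient differs from zero.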


In [17]:
# Exponentiate the coefficients to obtain odds ratios relative to the Fish baseline
exp(coef(ml.ajust))

,(Intercept),lakeHancock,lakeOklawaha,lakeTrafford,genderMale,size> 2.3
Bird,0.04790101,1.7778164,0.5767774,3.445644,1.833832,2.0755507
Invertebrate,0.74531732,0.1685635,2.4922981,3.176721,1.588764,0.2628246
Other,0.18575691,2.1524009,1.0264008,4.748251,1.287354,0.7478332
Reptile,0.01753504,3.093923,12.5560539,21.349232,1.873062,1.7455072
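
An exponentiated coefficient is easier to interpret with an interval around it. The notebook does not compute one, so the following is only a sketch of a common choice: a 95% Wald interval built on the log scale and then exponentiated. It uses the `lakeHancock` coefficient of the Invertebrate equation, with values copied from the outputs above.

```r
# Estimate and standard error copied from the summary() output above
# (Invertebrate equation, lakeHancock).
estimate  <- -1.7804428
std_error <-  0.6232018

# Odds ratio: odds of Invertebrate rather than Fish in Lake Hancock relative
# to Lake George (the reference lake), with gender and size held fixed.
odds_ratio <- exp(estimate)

# 95% Wald interval, computed on the log-odds scale and then exponentiated.
ci <- exp(estimate + c(-1, 1) * qnorm(0.975) * std_error)

round(c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2]), 3)
```

The odds ratio of about 0.17 agrees with the table above, and the whole interval lies below 1, which is consistent with the small p-value for this coefficient.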


In [18]:
# Fitted probabilities for every observation; show the first 10 rows
pp <- fitted(ml.ajust)
head(pp, 10)

,Fish,Bird,Invertebrate,Other,Reptile
1,0.6236473,0.11023107,0.02059253,0.18647129,0.05905776
2,0.1449064,0.04385936,0.54508877,0.16453726,0.10160821
3,0.4825214,0.02766960,0.23557205,0.06879919,0.18543775
4,0.3050732,0.10450871,0.18984118,0.20122758,0.19934930
5,0.2088077,0.03446367,0.49438679,0.18417267,0.07816917
6,0.3050732,0.10450871,0.18984118,0.20122758,0.19934930
7,0.4825214,0.02766960,0.23557205,0.06879919,0.18543775
8,0.1449064,0.04385936,0.54508877,0.16453726,0.10160821
9,0.2088077,0.03446367,0.49438679,0.18417267,0.07816917
10,0.5070734,0.07918849,0.10121267,0.26099788,0.05152752
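
The fitted probabilities follow directly from the coefficients. As a check, the sketch below rebuilds row 1 by hand for the first alligator (Hancock, Female, > 2.3 m). The coefficients are copied from the model output above; the names `coefs`, `eta`, `expo` and `probs` are introduced here.

```r
# Coefficients copied from the model output above. Rows: non-baseline
# categories; columns: intercept, lakeHancock, size> 2.3.
coefs <- rbind(
  Bird         = c(-3.0386187,  0.5753859,  0.7302265),
  Invertebrate = c(-0.2939452, -1.7804428, -1.3362685),
  Other        = c(-1.6833164,  0.7665839, -0.2905753),
  Reptile      = c(-4.0435542,  1.1294399,  0.5570452)
)

# Profile of observation 1: Hancock, Female, > 2.3 m. All three listed terms
# are switched on; lakeOklawaha, lakeTrafford and genderMale are 0 and omitted.
eta <- rowSums(coefs)

# The baseline (Fish) has linear predictor 0, so it contributes exp(0) = 1.
expo  <- c(Fish = 1, exp(eta))
probs <- expo / sum(expo)
round(probs, 4)
```

The result reproduces the first row of the fitted matrix, for example about 0.6236 for Fish and 0.1102 for Bird.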


## Challenge

   - Create a new column and add it to the data set. The column should contain the name of the column header (food category) with the highest fitted probability.
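
One possible approach, sketched on the first three rows of the fitted probabilities shown above rather than on the full data set. The objects `pp_demo` and `demo` and the column name `food_pred` are assumptions made for this sketch.

```r
# First three rows of the fitted-probability matrix shown above.
pp_demo <- rbind(
  c(0.6236473, 0.11023107, 0.02059253, 0.18647129, 0.05905776),
  c(0.1449064, 0.04385936, 0.54508877, 0.16453726, 0.10160821),
  c(0.4825214, 0.02766960, 0.23557205, 0.06879919, 0.18543775)
)
colnames(pp_demo) <- c("Fish", "Bird", "Invertebrate", "Other", "Reptile")

# max.col() returns, for each row, the index of the largest entry;
# indexing colnames() with it turns the index into a category label.
predicted <- colnames(pp_demo)[max.col(pp_demo, ties.method = "first")]

# Hypothetical data frame standing in for ml (ids taken from head(ml)).
demo <- data.frame(id = c(45, 80, 64))
demo$food_pred <- predicted
demo
```

With the objects from this notebook, the same idea would read `ml$food_pred <- colnames(pp)[max.col(pp, ties.method = "first")]`.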