<font color = 'blue'>
Contents:

1. [Multinomial Logistic Regression](#1)
    * 1.1. [Data](#2)
    * 1.2. [Data Partition](#3)
    * 1.3. [Multinomial Logistic Regression](#4)
1. [Model and Interpretation](#5)
    * 2.1. [Two-Tailed Z-Test p-Value](#6)
    * 2.2. [Second Model](#7)
1. [Confusion Matrix & Misclassification Error](#8)
    * 3.1. [Training Data](#9)
    * 3.2. [Testing Data](#10)
1. [Prediction & Model Assessment](#11)
1. [References](#12)



<a id = "1"></a><br>
# 1. Multinomial Logistic Regression
<a id = "2"></a><br>
## 1.1. Data

In [None]:
# Data
library(readr)

urlfile <- "https://raw.githubusercontent.com/bkrai/R-files-from-YouTube/main/Cardiotocographic.csv"
mydata <- read_csv(url(urlfile))
head(mydata)

# NSP is the dependent variable; all other variables are independent variables
## NSP = 1 --> Normal patient (N)
## NSP = 2 --> Suspect patient (S)
## NSP = 3 --> Pathologic patient (P)

In [None]:
# NSP is read in as a number, so we first convert it to a factor
mydata$NSP <- as.factor(mydata$NSP)

<a id = "3"></a><br>
## 1.2. Data Partition
* CTG data
* Categorical response variables at three levels
* Data partition
* Multinomial Logistic Regression Model

In [None]:
# Data Partition
set.seed(222)
ind <- sample(2, nrow(mydata),
              replace = TRUE,      # sample with replacement
              prob = c(0.6, 0.4))  # each row gets label 1 with probability 0.6 and label 2 with probability 0.4

training <- mydata[ind == 1, ]  # rows labelled 1 (a random ~60% of the data) form the training set
testing <- mydata[ind == 2, ]   # rows labelled 2 (the remaining ~40%) form the testing set
# training data has 1277 observations
# testing data has 849 observations
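
The same labelling approach can be tried on a small synthetic data frame to see that the split is random and only approximately 60/40. This is a minimal sketch; `toy`, `toy_train` and `toy_test` are hypothetical names and the data are simulated, not taken from the CTG file.

```r
# Hypothetical toy data frame standing in for the CTG data
set.seed(222)
toy <- data.frame(x = rnorm(100),
                  NSP = factor(sample(1:3, 100, replace = TRUE)))

# Draw a label (1 or 2) for every row; each row independently gets 1 with probability 0.6
ind_toy <- sample(2, nrow(toy), replace = TRUE, prob = c(0.6, 0.4))

toy_train <- toy[ind_toy == 1, ]
toy_test  <- toy[ind_toy == 2, ]

# Every row lands in exactly one of the two sets
nrow(toy_train) + nrow(toy_test) == nrow(toy)

# The realised proportions are close to, but not exactly, 0.6 and 0.4
table(ind_toy) / length(ind_toy)
```

Because the labels are drawn independently per row, the class proportions of `NSP` are not guaranteed to match between the two sets.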

<a id = "4"></a><br>
## 1.3. Multinomial Logistic Regression

In [None]:
# Multinomial Logistic Regression
library(nnet)
training$NSP <- relevel(training$NSP, ref = "1")  # use 1 (normal patient) as the reference level
mymodel <- multinom(NSP ~ ., data = training)     # "." means all other variables; NSP is the dependent variable and the other 21 variables are predictors
# the model is fitted on the training data
summary(mymodel)

<a id = "5"></a><br>
# 2. Model and Interpretation
<a id = "6"></a><br>
## 2.1. Two-Tailed Z-Test p-Value

In [None]:
# Finalizing model
## Two-tailed z-test p-value
z <- summary(mymodel)$coefficients / summary(mymodel)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2  # multiply by 2 because the z-test is two-tailed
p
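
A two-tailed p-value is twice the upper-tail area of the standard normal distribution beyond |z|. The sketch below checks this on a few hypothetical z values (`z_toy` is illustrative and not taken from the fitted model).

```r
# Hypothetical z-statistics (not output from mymodel)
z_toy <- c(-2.5, -1.96, 0, 1.96, 3)

# Upper-tail area beyond |z|, doubled to cover both tails
p_toy <- (1 - pnorm(abs(z_toy))) * 2
round(p_toy, 4)

# Equivalent form using the lower.tail argument of pnorm()
p_alt <- 2 * pnorm(abs(z_toy), lower.tail = FALSE)
all.equal(p_toy, p_alt)
```

A z value of 0 gives a p-value of 1, and |z| = 1.96 gives a p-value of about 0.05, which is the usual cut-off for keeping a variable.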

<a id = "7"></a><br>
## 2.2. Second Model

In [None]:
# SECOND MODEL
# drop the variables whose p-values are above 0.05
# and refit the model without them
mymodel <- multinom(NSP ~ . - MLTV - Width - Min - Max - Nmax - Nzeros - Tendency,
                    data = training)

z <- summary(mymodel)$coefficients / summary(mymodel)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2  # multiply by 2 because the z-test is two-tailed
p

In [None]:
# Interpretation
# Equations
# NSP = 1: patient is normal (reference level); NSP = 2: patient is suspect
# the left-hand side is not y or NSP itself but the log-odds of a class relative to NSP = 1
# First equation
# ln[P(NSP=2) / P(NSP=1)] = -16.62047 + (-0.07164 * LB) + (-748.85498 * AC) + ... + (0.0464 * Variance)


# Second equation
# NSP = 3: patient is pathologic
# ln[P(NSP=3) / P(NSP=1)] = -18.55244 + (0.40854 * LB) + (-29.62735 * AC) + ... + (0.6643 * Variance)
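
With class 1 as the reference, the two log-odds determine all three class probabilities. The sketch below shows the conversion for one patient; the values of `eta2` and `eta3` are hypothetical and are not computed from the coefficients above.

```r
# Hypothetical log-odds for one patient (illustrative values, not model output)
eta2 <- -1.2  # ln[P(NSP=2) / P(NSP=1)]
eta3 <- -2.0  # ln[P(NSP=3) / P(NSP=1)]

# The reference class has log-odds 0 relative to itself
eta <- c("1" = 0, "2" = eta2, "3" = eta3)

# Exponentiate and normalise so that the three probabilities sum to 1
probs <- exp(eta) / sum(exp(eta))
probs
sum(probs)

# The predicted class is the one with the largest probability
names(which.max(probs))
```

For the fitted model, `predict(mymodel, training, type = "probs")` should return these probabilities directly for each row.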

<a id = "8"></a><br>
# 3. Confusion Matrix & Misclassification Error 
<a id = "9"></a><br>
## 3.1. Training Data

In [None]:
# Confusion Matrix & Misclassification Error - Training Data
p <- predict(mymodel, training)
head(p)

In [None]:
head(training$NSP)
# comparing the two outputs: of the six predictions, five are correct and the last one is wrong

In [None]:
# columns of the table are the actual classes; rows are the predicted classes
tab <- table(p, training$NSP)
tab

In [None]:
# accuracy
sum(diag(tab)) / sum(tab)

In [None]:
# misclassifications
1 - sum(diag(tab)) / sum(tab)
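
The accuracy and misclassification calculations can be checked by hand on a small example. This is a sketch with hypothetical labels; `actual`, `predicted` and `tab_toy` are illustrative and unrelated to the CTG data.

```r
# Hypothetical actual and predicted labels for ten patients
actual    <- factor(c(1, 1, 1, 2, 2, 3, 3, 1, 2, 3), levels = 1:3)
predicted <- factor(c(1, 1, 2, 2, 1, 3, 3, 1, 2, 1), levels = 1:3)

# Rows = predicted classes, columns = actual classes
tab_toy <- table(Predicted = predicted, Actual = actual)
tab_toy

# Correct predictions sit on the diagonal
accuracy <- sum(diag(tab_toy)) / sum(tab_toy)
error    <- 1 - accuracy
c(accuracy = accuracy, error = error)
```

Seven of the ten predictions match the actual labels, so the accuracy is 0.7 and the misclassification error is 0.3.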

<a id = "10"></a><br>
## 3.2. Testing Data

In [None]:
# Confusion Matrix & Misclassification Error - testing Data
p1 <- predict(mymodel, testing)
tab1 <- table(p1, testing$NSP)
tab1

In [None]:
# misclassification error
1 - sum(diag(tab1)) / sum(tab1)

<a id = "11"></a><br>
# 4. Prediction & Model Assessment

In [None]:
# Accuracy and Sensitivity - training data
n <- table(training$NSP)
# in the training data, 1004 patients are normal, 169 are suspect and 104 are pathologic
# let's see them as proportions
n / sum(n)

In [None]:
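prop.table(tab, 2)  # column-wise proportions; tab / colSums(tab) would divide each row, not each column, by the column sums
# the model classifies 96% of class 1, 59% of class 2 and 81% of class 3 correctly
# it does a good job on classes 1 and 3
# class 2 is not classified as well as the other two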

In [None]:
prop.table(tab1, 2)  # column-wise proportions for the testing data
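
Per-class sensitivity is the diagonal count divided by the total of the corresponding actual-class column. The sketch below uses a hypothetical confusion matrix (`tab_toy`, not model output) and also shows why dividing a matrix directly by `colSums()` only gets the diagonal right: R recycles the vector down the rows.

```r
# Hypothetical confusion matrix: rows = predicted, columns = actual
# matrix() fills column by column, so the first column is c(90, 8, 2)
tab_toy <- matrix(c(90,  8,  2,
                     6, 30,  4,
                     1,  3, 16),
                  nrow = 3,
                  dimnames = list(Predicted = 1:3, Actual = 1:3))

# Column-wise proportions: each column sums to 1
col_prop <- prop.table(tab_toy, margin = 2)
colSums(col_prop)

# Sensitivity per class is the diagonal of the column-wise proportions
sensitivity <- diag(col_prop)
sensitivity

# Dividing by colSums() directly recycles the sums down the rows:
# the diagonal agrees, but the columns no longer sum to 1
recycled <- tab_toy / colSums(tab_toy)
diag(recycled)
colSums(recycled)
```

In this toy table the sensitivities are 0.9, 0.75 and 0.8 for classes 1, 2 and 3.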

<a id = "12"></a><br>
# 5. References
* https://www.youtube.com/watch?v=S2rZp4L_nXo&list=PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG
* https://www.youtube.com/watch?v=oxRy2DMrOF4&list=PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG&index=2
* https://www.youtube.com/watch?v=11VY8CmNVDQ&list=PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG&index=3
* https://www.youtube.com/watch?v=POyTaeneHJY&list=PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG&index=4