https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/

https://www.r-bloggers.com/how-to-perform-a-logistic-regression-in-r/

In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 2.2.1     ✔ purrr   0.2.4
✔ tibble  1.4.2     ✔ dplyr   0.7.4
✔ tidyr   0.8.0     ✔ stringr 1.3.0
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


In [2]:
dat <- tribble(
    ~logdose, ~total, ~killed,
    1.691, 59, 6,
    1.724, 60, 13,
    1.755, 62, 18,
    1.784, 56, 28,
    1.811, 63, 52,
    1.837, 59, 53,
    1.861, 62, 61,
    1.884, 60, 60)

In [4]:
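# Proportion as the response, no weights: glm() is not told how many trials
# each proportion is based on, which triggers the warning below.
model <- glm(killed/total ~ logdose, family = binomial(link = "logit"), data = dat)
# Equivalent call, since binomial() defaults to the logit link:
#model <- glm(killed/total ~ logdose, family = binomial, data = dat)
summary(model)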

“non-integer #successes in a binomial glm!”


Call:
glm(formula = killed/total ~ logdose, family = binomial(link = "logit"), 
    data = dat)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.20847  -0.04842   0.11004   0.16134   0.20704  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -60.48      40.06  -1.510    0.131
logdose        34.14      22.52   1.516    0.130

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4.71186  on 7  degrees of freedom
Residual deviance: 0.18673  on 6  degrees of freedom
AIC: 8.0301

Number of Fisher Scoring iterations: 5


In [5]:
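# Same model with the proportion precomputed as a column; it is still
# unweighted, so the warning and the estimates are unchanged.
tmp <- dat %>% mutate(y = killed / total)
model <- glm(y ~ logdose, family = binomial(link = "logit"), data = tmp)
summary(model)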

“non-integer #successes in a binomial glm!”


Call:
glm(formula = y ~ logdose, family = binomial(link = "logit"), 
    data = tmp)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.20847  -0.04842   0.11004   0.16134   0.20704  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -60.48      40.06  -1.510    0.131
logdose        34.14      22.52   1.516    0.130

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4.71186  on 7  degrees of freedom
Residual deviance: 0.18673  on 6  degrees of freedom
AIC: 8.0301

Number of Fisher Scoring iterations: 5
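
A rough sketch of why the two fits above warn, assuming the warning is raised when the number of successes implied by `response * weight` is not a whole number. The data are copied from `dat`; `is_whole` is a hypothetical helper written for this illustration, not part of `glm`.

```r
# Counts copied from the notebook's `dat` so the block runs on its own
total  <- c(59, 60, 62, 56, 63, 59, 62, 60)
killed <- c(6, 13, 18, 28, 52, 53, 61, 60)
prop   <- killed / total

# Hypothetical helper: TRUE if every value is (numerically) a whole number
is_whole <- function(x) all(abs(x - round(x)) < 1e-8)

# Without weights every row has weight 1, so the implied number of
# successes is the proportion itself, which is fractional
implied_unweighted <- prop * 1

# With weights = total the implied successes are the original counts
implied_weighted <- prop * total

is_whole(implied_unweighted)
is_whole(implied_weighted)
```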


-----

**How to do logistic regression in R when outcome is fractional (a ratio of two counts)?**  
https://stats.stackexchange.com/questions/26762/how-to-do-logistic-regression-in-r-when-outcome-is-fractional-a-ratio-of-two-co?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

*The glm function in R allows 3 ways to specify the formula for a logistic regression model.*

*The most common is that each row of the data frame represents a single observation and the response variable is either 0 or 1 (or a factor with 2 levels, or other variable with only 2 unique values).*

*Another option is to use a 2 column matrix as the response variable with the first column being the counts of 'successes' and the second column being the counts of 'failures'.*

*You can also specify the response as a proportion between 0 and 1, then specify another column as the 'weight' that gives the total number that the proportion is from (so a response of 0.3 and a weight of 10 is the same as 3 'successes' and 7 'failures').*

*Either of the last 2 ways would fit what you are trying to do, the last seems the most direct for how you describe your data.*
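
A minimal sketch of the last two options from the quoted answer, applied to the dose data in this notebook (copied into a plain `data.frame` so the block runs without the tidyverse). The names `fit_counts` and `fit_prop` are made up for this illustration. The two specifications are expected to give the same coefficient estimates.

```r
# Data copied from the notebook's `dat`
dat <- data.frame(
  logdose = c(1.691, 1.724, 1.755, 1.784, 1.811, 1.837, 1.861, 1.884),
  total   = c(59, 60, 62, 56, 63, 59, 62, 60),
  killed  = c(6, 13, 18, 28, 52, 53, 61, 60)
)

# Two-column matrix response: (successes, failures)
fit_counts <- glm(cbind(killed, total - killed) ~ logdose,
                  family = binomial(link = "logit"), data = dat)

# Proportion response with the group size supplied as the weight
fit_prop <- glm(killed / total ~ logdose, weights = total,
                family = binomial(link = "logit"), data = dat)

# Compare the coefficient estimates of the two specifications
all.equal(coef(fit_counts), coef(fit_prop), tolerance = 1e-6)
```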

**Adding weights to logistic regression for imbalanced data**  
https://stats.stackexchange.com/questions/164693/adding-weights-to-logistic-regression-for-imbalanced-data?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

```
glm(y ~ x1 + x2, weights = wt, data = data, family = binomial("logit"))
```

In [8]:
# Passing the group sizes as weights tells glm() that each proportion is
# based on `total` trials, so the non-integer warning no longer appears.
model <- glm(
    killed/total ~ logdose,
    weights = total,
    family = binomial(link = "logit"),
    data = dat)

summary(model)


Call:
glm(formula = killed/total ~ logdose, family = binomial(link = "logit"), 
    data = dat, weights = total)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5878  -0.4085   0.8442   1.2455   1.5860  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -60.740      5.182  -11.72   <2e-16 ***
logdose       34.286      2.913   11.77   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 284.202  on 7  degrees of freedom
Residual deviance:  11.116  on 6  degrees of freedom
AIC: 41.314

Number of Fisher Scoring iterations: 4
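
As a quick check on the weighted fit, the sketch below puts the observed proportions next to the fitted probabilities from `predict(type = "response")`. It refits the same model on a copy of the data so it runs on its own; the column names `observed` and `fitted` and the example dose of 1.8 are chosen only for this illustration.

```r
# Data copied from the notebook's `dat`
dat <- data.frame(
  logdose = c(1.691, 1.724, 1.755, 1.784, 1.811, 1.837, 1.861, 1.884),
  total   = c(59, 60, 62, 56, 63, 59, 62, 60),
  killed  = c(6, 13, 18, 28, 52, 53, 61, 60)
)

# Same weighted model as in the cell above
model <- glm(killed / total ~ logdose, weights = total,
             family = binomial(link = "logit"), data = dat)

# Observed proportion killed next to the model's fitted probability
dat$observed <- dat$killed / dat$total
dat$fitted   <- predict(model, type = "response")
print(dat)

# Predicted kill probability at a dose that is not in the data
new_dose <- data.frame(logdose = 1.8)
predict(model, newdata = new_dose, type = "response")
```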
