## mclogit installation

We first install mclogit. It is not included in `env.yaml` because it can only be installed from within R. Run the cell below only once per environment to install it.

In [1]:
library("versions")  # use versions library to install mclogit version 0.9.4.2 for consistency with paper
install.versions("mclogit", "0.9.4.2")

also installing the dependencies ‘data.table’, ‘memisc’

Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
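
Because the install cell should only run once per environment, a guard can make re-running the notebook safer. The following is a minimal sketch: the helper `needs_install` is hypothetical and not part of the original notebook. It relies only on base R's `requireNamespace()` and `packageVersion()`, and the actual install call is left commented out.

```r
# Hypothetical helper (not in the original notebook): TRUE when the package
# is missing or installed at a different version than the requested one.
needs_install <- function(pkg, version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  packageVersion(pkg) != version
}

# Only install when needed; install.versions() comes from the 'versions' package.
if (needs_install("mclogit", "0.9.4.2")) {
  # versions::install.versions("mclogit", "0.9.4.2")  # uncomment to install
  message("mclogit 0.9.4.2 is not installed in this environment")
}
```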


Next we load the packages, analogous to importing modules in Python. This has to be done every time the notebook is started.

We also source some helper functions for the analysis of the statistical models.

In [1]:
library("mclogit")
source("mclogit-effects.r")

Loading required package: Matrix


## Importing the dataframe to R

In [2]:
# we can directly read the stored pandas dataframe to R
df_wt <- read.csv("../data/results/dataframe.csv")

In [3]:
# inspect the R dataframe
head(df_wt, n=5)

X,rlnCoordinateX,rlnCoordinateY,rlnCoordinateZ,rlnAngleRot,rlnAngleTilt,rlnAnglePsi,rlnImageName,rlnCtfImage,rlnRandomSubset,...,state,elongation,trapccdc,state_full,trailing_id,leading_id,tr_state_full,ld_state_full,tr_elongation,ld_elongation
0,872.7986,529.7619,286.2439,11.28056,82.88946,181.6519,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000082_6.90A.mrc,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000082_ctf_6.90A.mrc,1,...,NCLN,Dec,ap,NCLNCCDC47,-1,-1,non,non,non,non
1,401.9046,214.0999,138.9087,-61.55985,136.2531,-135.1444,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000074_6.90A.mrc,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000074_ctf_6.90A.mrc,1,...,OST,Dec,Unk,OST,43744,-1,OST,non,Rot1,non
2,824.8273,699.23,245.7589,-54.52035,85.32531,102.2301,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000066_6.90A.mrc,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000066_ctf_6.90A.mrc,1,...,OST,Dec,Unk,OST,22540,-1,TRAP,non,Pre+,non
3,854.6401,663.1552,180.3849,135.5634,108.8299,170.2104,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000131_6.90A.mrc,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000131_ctf_6.90A.mrc,2,...,OST,Dec,Unk,OST,-1,-1,non,non,non,non
4,365.4104,783.4767,171.0251,-170.5296,96.4524,78.46428,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000084_6.90A.mrc,/data2/mgemmer/WARPM/HEKFWT/000_Subtomograms/001_TM_bin4/tomo1.mrc/tomo1.mrc_Particles_TM_0000084_ctf_6.90A.mrc,1,...,TRAP,Dec,Unk,TRAP,93748,77732,Unk,Sol,Rot2,Rot2


In [4]:
# add some annotation => one for whether the ribosome has any polysome connections
# this way we can test the probability of classes being present in a polysome chain
df_wt$in_chain <- ((df_wt$trailing_id != -1) | (df_wt$leading_id != -1)) # has trailing or leading connection
df_wt$in_chain <- factor(df_wt$in_chain)
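
To make the annotation rule concrete, the sketch below applies the same expression to a small made-up data frame. The ids are invented for illustration; `-1` is taken to mean "no neighbour on this side", as the comment in the cell above suggests.

```r
# Toy data with invented ids; -1 means "no neighbour on this side"
toy <- data.frame(trailing_id = c(-1, 43744, -1, 93748),
                  leading_id  = c(-1, -1, 77732, 77732))

# A ribosome is in a chain if it has a trailing OR a leading neighbour
toy$in_chain <- factor((toy$trailing_id != -1) | (toy$leading_id != -1))

toy
table(toy$in_chain)
```

Only the first row, which has neither a trailing nor a leading neighbour, ends up with `in_chain == FALSE`.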

## Fitting a multinomial logit model with mixed effects

For the random part we first use `~1|date`, meaning a random intercept per date but fixed slopes.
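
What "random intercept, fixed slope" means can be sketched on the log-odds scale. In the snippet below the fixed-effect values are copied from the summary output of this model, while the three per-date offsets `u` are invented for illustration; the real offsets are estimated by `mblogit` and are not shown in the notebook.

```r
# Fixed effects copied from the summary of model 1a (log-odds scale)
beta0      <- -0.01374   # intercept: reference state 'Dec'
beta_rot1p <-  0.80285   # offset for state 'Rot1+'

# Hypothetical random intercepts for three dates (invented values)
u <- c(date_a = -0.2, date_b = 0, date_c = 0.2)

# Each date shifts the baseline probability of being in a chain ...
p_dec   <- plogis(beta0 + u)
p_rot1p <- plogis(beta0 + beta_rot1p + u)

# ... but the state effect on the log-odds scale is identical for every date
slope_per_date <- qlogis(p_rot1p) - qlogis(p_dec)
round(rbind(p_dec, p_rot1p, slope_per_date), 4)
```

The probabilities differ per date, while the difference between the two states on the log-odds scale stays at 0.80285 for all of them.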

In [8]:
# create temporary dataframe with only ribosomes assigned to an elongation cycle intermediate
temp <- df_wt[ df_wt$elongation %in% c('Dec', 'Post', 'Pre', 'Pre+', 'Rot1', 'Rot1+', 
         'Rot2', 'RotIdle', 'Translocation', 'UnRotIdle'), , drop=FALSE ]
temp$elongation <- factor(temp$elongation)

model_elongation_state_in_chain_1a <- mblogit(in_chain ~ elongation, data = temp, random = ~1|date,
                       control=mmclogit.control(epsilon = 1e-08, maxit = 100))
summary(model_elongation_state_in_chain_1a)


Iteration 1 - deviance = 172541.8 - criterion = 1.860693
Iteration 2 - deviance = 172297 - criterion = 0.02642995
Iteration 3 - deviance = 172292.8 - criterion = 0.001013018
Iteration 4 - deviance = 172292.8 - criterion = 3.175223e-06
Iteration 5 - deviance = 172292.8 - criterion = 3.386836e-11
converged



Call:
mblogit(formula = in_chain ~ elongation, data = temp, random = ~1 | 
    date, control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for TRUE vs FALSE:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)             -0.01374    0.08287  -0.166 0.868326    
elongationPost          -0.10491    0.02115  -4.960 7.04e-07 ***
elongationPre            0.18660    0.03472   5.375 7.66e-08 ***
elongationPre+           0.30819    0.01593  19.347  < 2e-16 ***
elongationRot1           0.38057    0.03083  12.344  < 2e-16 ***
elongationRot1+          0.80285    0.03614  22.214  < 2e-16 ***
elongationRot2           0.33715    0.01883  17.905  < 2e-16 ***
elongationRotIdle       -2.83751    0.06312 -44.956  < 2e-16 ***
elongationTranslocation -0.10007    0.02903  -3.447 0.000567 ***
elongationUnRotIdle     -1.46087    0.03109 -46.993  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Co-)Variances:
Grouping level: date 
      

We can make the random effects more complex by letting both the intercept and the slopes vary per date (`~elongation|date`). This produces a large variance-covariance matrix for the random effects, because every predictor level now gets its own slope per date, whereas before we only had a different intercept per date. The variances are not very high; the RotIdle state has the highest variance. This state also has a low abundance, so it is not surprising that it varies more between dates.
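
To get a feeling for how large that matrix becomes, one can count the columns of the design matrix implied by `~elongation`. This is a rough sketch in base R and assumes an unstructured covariance matrix for the random effects, which is an assumption made here and is not stated in the notebook.

```r
# The ten elongation states used to subset the data in this notebook
states <- c('Dec', 'Post', 'Pre', 'Pre+', 'Rot1', 'Rot1+',
            'Rot2', 'RotIdle', 'Translocation', 'UnRotIdle')
elongation <- factor(states)

# Design matrix of ~elongation: one intercept plus nine dummy columns
Z <- model.matrix(~ elongation)
n_terms <- ncol(Z)

# Unique entries of a symmetric n_terms x n_terms covariance matrix
n_cov_params <- n_terms * (n_terms + 1) / 2

c(random_terms_per_date = n_terms, covariance_parameters = n_cov_params)
```

With `~1|date` there is a single random term and a single variance; with `~elongation|date` there are ten random terms per date and, under the stated assumption, 55 variance and covariance parameters.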

In [26]:
# create temporary dataframe with only ribosomes assigned to an elongation cycle intermediate
temp <- df_wt[ df_wt$elongation %in% c('Dec', 'Post', 'Pre', 'Pre+', 'Rot1', 'Rot1+', 
         'Rot2', 'RotIdle', 'Translocation', 'UnRotIdle'), , drop=FALSE ]
temp$elongation <- factor(temp$elongation)

model_elongation_state_in_chain_1b <- mblogit(in_chain ~ elongation, data = temp, 
                                              random = list(~elongation|date),
                       control=mmclogit.control(epsilon = 1e-08, maxit = 100))
summary(model_elongation_state_in_chain_1b)


Iteration 1 - deviance = 172659 - criterion = 1.847799
Iteration 2 - deviance = 172401.8 - criterion = 0.02742811
Iteration 3 - deviance = 172396.3 - criterion = 0.001408096
Iteration 4 - deviance = 172395.6 - criterion = 1.595249e-05
Iteration 5 - deviance = 172394.4 - criterion = 9.18747e-05
Iteration 6 - deviance = 172394.4 - criterion = 1.708722e-06
Iteration 7 - deviance = 172394.5 - criterion = 1.283868e-07
Iteration 8 - deviance = 172394.4 - criterion = 3.04271e-08
Iteration 9 - deviance = 172394.4 - criterion = 6.539824e-09
converged



Call:
mblogit(formula = in_chain ~ elongation, data = temp, random = list(~elongation | 
    date), control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for TRUE vs FALSE:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)              0.01024    0.06700   0.153  0.87857    
elongationPost          -0.11225    0.04755  -2.360  0.01825 *  
elongationPre            0.17648    0.05732   3.079  0.00208 ** 
elongationPre+           0.26177    0.05937   4.409 1.04e-05 ***
elongationRot1           0.36381    0.05813   6.259 3.88e-10 ***
elongationRot1+          0.76768    0.06528  11.760  < 2e-16 ***
elongationRot2           0.29555    0.05553   5.323 1.02e-07 ***
elongationRotIdle       -2.60051    0.21057 -12.350  < 2e-16 ***
elongationTranslocation -0.12050    0.04474  -2.693  0.00707 ** 
elongationUnRotIdle     -1.34532    0.11064 -12.160  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Co-)Variances:
Grouping leve

By reordering the levels of the factor `elongation`, it is possible to query specific differences in the dataset. The multinomial logit model phrases its equations somewhat counterintuitively: each coefficient is a contrast against the first (reference) level of the predictor, so the null hypotheses being tested depend on the ordering of that predictor's levels. See the following reference, specifically the section about **Independence of irrelevant alternatives**: https://bookdown.org/sarahwerth2024/CategoricalBook/multinomial-logit-regression-r.html
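
The effect of level ordering can be seen on a toy factor, independent of any model fit. The vector `x` below is made up; the sketch only uses base R's `factor()`, `relevel()` and `model.matrix()`.

```r
# Made-up vector of elongation states
x <- c('Dec', 'RotIdle', 'UnRotIdle', 'Pre', 'Dec')

# Default: levels are sorted, so 'Dec' becomes the reference level
f_default <- factor(x)
levels(f_default)[1]

# Explicit ordering: 'UnRotIdle' becomes the reference level
f_reordered <- factor(x, levels = c('UnRotIdle', 'RotIdle', 'Dec', 'Pre'))
levels(f_reordered)[1]

# relevel() is a shortcut that only moves one level to the front
f_relevel <- relevel(f_default, ref = 'UnRotIdle')
levels(f_relevel)

# The reference level has no column of its own; it is absorbed by the intercept
colnames(model.matrix(~ f_reordered))
```

Every remaining dummy column is then a comparison against `UnRotIdle`, which is why the coefficient table changes when the levels are reordered.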

In [102]:
# we will reorder the levels of the elongation factor in the dataframe
# this way we can more easily read statistical significance between the idle states and the active states
temp <- df_wt[ df_wt$elongation %in% c('Dec', 'Post', 'Pre', 'Pre+', 'Rot1', 'Rot1+', 
         'Rot2', 'RotIdle', 'Translocation', 'UnRotIdle'), , drop=FALSE ]
temp$elongation <- factor(temp$elongation, 
                          levels = c('UnRotIdle', 'RotIdle', 'Dec', 'Post', 'Pre', 
                                     'Pre+', 'Rot1', 'Rot1+', 'Rot2', 'Translocation'))

model_elongation_state_in_chain_1b2 <- mblogit(in_chain ~ elongation, data = temp, 
                                              random = list(~elongation|date),
                       control=mmclogit.control(epsilon = 1e-08, maxit = 100))
summary(model_elongation_state_in_chain_1b2)


Iteration 1 - deviance = 172661 - criterion = 1.847819
Iteration 2 - deviance = 172403.9 - criterion = 0.02746066
Iteration 3 - deviance = 172397 - criterion = 0.001183328
Iteration 4 - deviance = 172396.4 - criterion = 6.433608e-06
Iteration 5 - deviance = 172396.4 - criterion = 1.434121e-09
converged



Call:
mblogit(formula = in_chain ~ elongation, data = temp, random = list(~elongation | 
    date), control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for TRUE vs FALSE:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -1.3331     0.0884 -15.080   <2e-16 ***
elongationRotIdle        -1.2664     0.1417  -8.939   <2e-16 ***
elongationDec             1.3366     0.1254  10.658   <2e-16 ***
elongationPost            1.2330     0.1147  10.750   <2e-16 ***
elongationPre             1.5272     0.1260  12.119   <2e-16 ***
elongationPre+            1.6044     0.1657   9.681   <2e-16 ***
elongationRot1            1.7052     0.1498  11.380   <2e-16 ***
elongationRot1+           2.1136     0.1549  13.646   <2e-16 ***
elongationRot2            1.6406     0.1553  10.561   <2e-16 ***
elongationTranslocation   1.2200     0.1364   8.944   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Co-)Variances:
Grouping leve

Alternatively, we can add a random intercept for both the date and the tomogram id, using either `list(~1|date, ~1|tomogram)` or `~1|date/tomogram`. The latter works because these random effects are nested, i.e. a tomogram was always collected during one specific collection session (the date represents the collection session). The cell below fits the simpler variant with a random intercept per tomogram only (`~1|tomogram`).
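
Whether the grouping factors are really nested can be checked before fitting. The sketch below does this on a made-up data frame; the column names `date` and `tomogram` match the notebook, but the values are invented.

```r
# Toy data: every tomogram belongs to exactly one collection date
toy <- data.frame(
  date     = c('d1', 'd1', 'd1', 'd2', 'd2'),
  tomogram = c('t1', 't1', 't2', 't3', 't3')
)

# Count the number of distinct dates seen for each tomogram
dates_per_tomogram <- tapply(toy$date, toy$tomogram,
                             function(d) length(unique(d)))

# Nested means: no tomogram appears under more than one date
is_nested <- all(dates_per_tomogram == 1)
is_nested
```

Running the same check on `df_wt` would confirm the nesting assumption for the real data.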

In [5]:
# create temporary dataframe with only ribosomes assigned to an elongation cycle intermediate
temp <- df_wt[ df_wt$elongation %in% c('Dec', 'Post', 'Pre', 'Pre+', 'Rot1', 'Rot1+', 
         'Rot2', 'RotIdle', 'Translocation', 'UnRotIdle'), , drop=FALSE ]
temp$elongation <- factor(temp$elongation)
temp$in_chain <- factor(temp$in_chain, levels=c('TRUE', 'FALSE'))

model_elongation_state_in_chain_1c <- mblogit(in_chain ~ elongation, data=temp, random=~1|tomogram,
                       control=mmclogit.control(epsilon=1e-08, maxit=100))
summary(model_elongation_state_in_chain_1c)


Iteration 1 - deviance = 171012.2 - criterion = 1.540481
Iteration 2 - deviance = 170988.7 - criterion = 0.02597973
Iteration 3 - deviance = 171039.1 - criterion = 0.001108574
Iteration 4 - deviance = 171045.7 - criterion = 5.874258e-06
Iteration 5 - deviance = 171047 - criterion = 6.026613e-08
Iteration 6 - deviance = 171047.3 - criterion = 2.319919e-09
converged



Call:
mblogit(formula = in_chain ~ elongation, data = temp, random = ~1 | 
    tomogram, control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for FALSE vs TRUE:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)             -0.05429    0.01929  -2.815  0.00488 ** 
elongationPost           0.09808    0.02149   4.565 5.00e-06 ***
elongationPre           -0.19046    0.03525  -5.403 6.57e-08 ***
elongationPre+          -0.31159    0.01617 -19.267  < 2e-16 ***
elongationRot1          -0.37459    0.03131 -11.964  < 2e-16 ***
elongationRot1+         -0.81143    0.03659 -22.174  < 2e-16 ***
elongationRot2          -0.34002    0.01915 -17.756  < 2e-16 ***
elongationRotIdle        2.83920    0.06358  44.654  < 2e-16 ***
elongationTranslocation  0.09140    0.02946   3.102  0.00192 ** 
elongationUnRotIdle      1.46071    0.03164  46.172  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Co-)Variances:
Grouping level: tomogram

In [15]:
p.adjust(summary(model_elongation_state_in_chain_1c)$coefficients[,4], method='fdr')
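
In `p.adjust`, `method='fdr'` is an alias for the Benjamini-Hochberg procedure (`'BH'`). The sketch below shows what the adjustment does on a short vector: the first four p-values are copied from the coefficient table of model 1c, and the last two are made up to include non-significant cases.

```r
# Four p-values from the model 1c table plus two invented ones (0.04, 0.20)
p_raw <- c(0.00488, 5.00e-06, 6.57e-08, 0.00192, 0.04, 0.20)

# Built-in adjustment; 'fdr' and 'BH' give the same result
p_fdr <- p.adjust(p_raw, method = 'fdr')

# Manual Benjamini-Hochberg: scale by n / rank, then enforce monotonicity
n <- length(p_raw)
o <- order(p_raw, decreasing = TRUE)
p_manual <- pmin(1, cummin(n / (n:1) * p_raw[o]))[order(o)]

data.frame(p_raw, p_fdr, p_manual)
```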

The variance-covariance matrix of the random effects (here just a single value) shows how much the outcome levels vary with the grouping variable. In this case there is one outcome contrast (TRUE vs FALSE, which condenses to a single value), and we see how much it varies with the random effect we included. When the result is a matrix, we do not want to see high values on the off-diagonals; higher values on the diagonal simply mean that the predictor varies with the date or tomogram number. That can also be a problem, but it depends on the variable and its abundance. Here we see more variation per tomogram than per date. This is unsurprising, as each date has many observations (approx. 20000 per date) while each tomogram has relatively few (approx. 100 per tomogram), which makes the per-tomogram estimates inherently noisier. It is a good confirmation that there is no strange variance in the data.
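
The group sizes quoted above can be checked with `table()`. The sketch below uses a made-up data frame with the same column names as the notebook; applying the last two lines to `temp` would give the real numbers.

```r
# Toy data: 2 dates with 6 observations each, spread over 4 tomograms
toy <- data.frame(
  date     = rep(c('d1', 'd2'), each = 6),
  tomogram = rep(c('t1', 't2', 't3', 't4'), times = c(4, 2, 3, 3))
)

# Average number of observations per grouping level
obs_per_date     <- mean(table(toy$date))
obs_per_tomogram <- mean(table(toy$tomogram))

c(per_date = obs_per_date, per_tomogram = obs_per_tomogram)
```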

In the most complex model we try to fit, we model a random effect for intercept and slope per tomogram.

In [63]:
# create temporary dataframe with only ribosomes assigned to an elongation cycle intermediate
temp <- df_wt[ df_wt$elongation %in% c('Dec', 'Post', 'Pre', 'Pre+', 'Rot1', 'Rot1+', 
         'Rot2', 'RotIdle', 'Translocation', 'UnRotIdle'), , drop=FALSE ]
temp$elongation <- factor(temp$elongation)

model_elongation_state_in_chain_1d <- mblogit(in_chain ~ elongation, data = temp, random = list(~elongation|tomogram),
                       control=mmclogit.control(epsilon = 1e-08, maxit = 100))
summary(model_elongation_state_in_chain_1d)


Iteration 1 - deviance = 178011 - criterion = 1.46815
Iteration 2 - deviance = 177487 - criterion = 0.02896193
Iteration 3 - deviance = 177201.3 - criterion = 0.002467847
Iteration 4 - deviance = 177099.8 - criterion = 0.0001228894
Iteration 5 - deviance = 177088.3 - criterion = 3.92408e-06
Iteration 6 - deviance = 177087.5 - criterion = 3.464916e-07
Iteration 7 - deviance = 177087.8 - criterion = 4.299052e-08
Iteration 8 - deviance = 177088 - criterion = 5.915139e-09
converged



Call:
mblogit(formula = in_chain ~ elongation, data = temp, random = list(~elongation | 
    tomogram), control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for TRUE vs FALSE:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)              0.05656    0.02136   2.649  0.00808 ** 
elongationPost          -0.10965    0.02748  -3.991 6.59e-05 ***
elongationPre            0.19286    0.03992   4.831 1.36e-06 ***
elongationPre+           0.30765    0.01997  15.409  < 2e-16 ***
elongationRot1           0.37886    0.03628  10.442  < 2e-16 ***
elongationRot1+          0.81095    0.03982  20.367  < 2e-16 ***
elongationRot2           0.33534    0.02408  13.926  < 2e-16 ***
elongationRotIdle       -2.91050    0.07513 -38.739  < 2e-16 ***
elongationTranslocation -0.10257    0.03600  -2.849  0.00439 ** 
elongationUnRotIdle     -1.44192    0.04294 -33.583  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Co-)Variances:
Grouping 

We run the custom-made effects implementation (see file `mclogit-effects.r`) to extract probabilities and the 95% CIs around them from the fitted model. We run it on model 1b with the random intercept + slope per date. This model showed us that the collection date added some variance to the observations, but no covariance could be observed.

In [None]:
fit.eff <- Effect.mblogit(model_elongation_state_in_chain_1c, 'elongation')
prediction <- data.frame(fit.eff$prob, fit.eff$lower.prob, fit.eff$upper.prob)
row.names(prediction) <- fit.eff$predictors  # these are the row names of this table
prediction
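
For intuition about how a probability and its interval relate to a coefficient table, the sketch below back-transforms a single estimate with a Wald-type interval on the log-odds scale. This is an illustration under simplifying assumptions (normal approximation, random effects set to zero) and is not necessarily how `Effect.mblogit` in `mclogit-effects.r` computes its intervals. The numbers are the intercept of model 1c, i.e. the log-odds of FALSE vs TRUE for the reference state `Dec`.

```r
# Intercept of model 1c: estimate and standard error on the log-odds scale
est <- -0.05429
se  <-  0.01929

# 95% Wald interval on the log-odds scale
z <- qnorm(0.975)
logit_ci <- est + c(-1, 1) * z * se

# Back-transform to probabilities of in_chain == FALSE
p_false    <- plogis(est)
p_false_ci <- plogis(logit_ci)

# The probability of being in a chain is the complement
p_true    <- 1 - p_false
p_true_ci <- rev(1 - p_false_ci)

round(c(p_true = p_true, lower = p_true_ci[1], upper = p_true_ci[2]), 4)
```

Building the interval on the log-odds scale and transforming afterwards keeps both bounds inside the range 0 to 1.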

In [61]:
# we can write the results to a csv file as such:
write.csv(prediction, "../data/results/mblogit_elongation-state-in-chain.csv", row.names=TRUE)

## Probability of neighbour states in polysomes

In [6]:
# we create new column with neighbour state full
temp1 <- df_wt[ df_wt$ld_state_full != 'non', , drop=FALSE ]
temp1$n_state_full <- factor(temp1$ld_state_full)
temp2 <- df_wt[ df_wt$tr_state_full != 'non', , drop=FALSE ]
temp2$n_state_full <- factor(temp2$tr_state_full)

# here we want the probability of neighbour classes in a polysome, without considering trailing/leading positions
temp <- rbind(temp1, temp2)  # R automatically makes the rownames unique in the new dataframe

# group all multipass translocons
temp$n_state_full <- factor(ifelse(temp$n_state_full %in% c('NCLN', 'NCLNCCDC47', 'NCLNTRAP', 'NCLNTRAPCCDC47'), 
                                   'NCLN', as.character(temp$n_state_full)))

temp$state <- factor(temp$state, levels = c('TRAP', 'OST', 'NCLN', 'Sol', 'Unk'))

# fit a model for this
model_polysome_neighbour_state <- mblogit(n_state_full ~ state, data = temp, random = list(~1|tomogram),
                       control=mmclogit.control(epsilon = 1e-08, maxit = 100))
summary(model_polysome_neighbour_state)

# write this temp dataframe as we need it for plotting in python later to calculate raw freqs per tomogram
write.csv(temp[, c('date', 'tomogram', 'state', 'n_state_full')], 
          "../data/results/mblogit_overall-polysome-neighbour-probability_raw-data.csv", row.names=TRUE)


Iteration 1 - deviance = 202273.9 - criterion = 0.8196797
Iteration 2 - deviance = 197118.9 - criterion = 0.05961137
Iteration 3 - deviance = 197002.1 - criterion = 0.0064342
Iteration 4 - deviance = 196933.8 - criterion = 0.0001385581
Iteration 5 - deviance = 196941 - criterion = 7.433591e-05
Iteration 6 - deviance = 196948.9 - criterion = 1.128022e-06
Iteration 7 - deviance = 196949.8 - criterion = 3.50183e-08
Iteration 8 - deviance = 196950 - criterion = 2.033859e-09
converged



Call:
mblogit(formula = n_state_full ~ state, data = temp, random = list(~1 | 
    tomogram), control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for OST vs NCLN:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.21885    0.04414  27.616  < 2e-16 ***
stateOST     1.57670    0.04776  33.011  < 2e-16 ***
stateNCLN   -2.47417    0.04915 -50.344  < 2e-16 ***
stateSol    -0.27291    0.08520  -3.203  0.00136 ** 
stateUnk    -0.51461    0.04857 -10.595  < 2e-16 ***

Equation for Sol vs NCLN:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.98442    0.07325  -13.44   <2e-16 ***
stateOST    -0.03422    0.08341   -0.41    0.682    
stateNCLN   -2.24562    0.09506  -23.62   <2e-16 ***
stateSol     3.87762    0.09522   40.72   <2e-16 ***
stateUnk     1.55785    0.07450   20.91   <2e-16 ***

Equation for TRAP vs NCLN:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.42289    0.05848  -7.231 4.80e-13 ***
stateOST     0.77363   

Good: there are no high covariances with the tomogram id.

In [7]:
# print the calculated probabilities from the model
fit.eff <- Effect.mblogit(model_polysome_neighbour_state, 'state')
prediction <- data.frame(fit.eff$prob, fit.eff$lower.prob, fit.eff$upper.prob)
row.names(prediction) <- fit.eff$predictors
prediction

# and write to csv file
write.csv(prediction, "../data/results/mblogit_overall-polysome-neighbour-probability.csv", row.names=TRUE)

Unnamed: 0,prob.NCLN,prob.OST,prob.Sol,prob.TRAP,prob.Unk,L.prob.NCLN,L.prob.OST,L.prob.Sol,L.prob.TRAP,L.prob.Unk,U.prob.NCLN,U.prob.OST,U.prob.Sol,U.prob.TRAP,U.prob.Unk
TRAP,0.14470466,0.4895808,0.05406981,0.09480365,0.2168411,0.13515203,0.47390729,0.04787922,0.08686356,0.2044275,0.15481162,0.50527477,0.06100954,0.10338735,0.2297908
OST,0.04602487,0.7535071,0.01661889,0.06536125,0.1184879,0.04368723,0.7458532,0.0151986,0.06170995,0.113381,0.04848125,0.76100427,0.01816945,0.06921266,0.1237928
NCLN,0.55366143,0.1577859,0.02190058,0.06541987,0.2012322,0.5431112,0.14999086,0.01921697,0.06043103,0.1924166,0.56416354,0.16590697,0.0249494,0.07078953,0.2103466
Sol,0.02583548,0.0665325,0.46635456,0.02880141,0.4124761,0.02279084,0.06128344,0.45067645,0.02542453,0.3977458,0.02927467,0.07219657,0.48209942,0.03261179,0.4273647
Unk,0.10881795,0.2200635,0.19308042,0.05374977,0.4242883,0.10417536,0.21206772,0.1842436,0.0500591,0.4133868,0.11364119,0.22827349,0.202236,0.05769602,0.4352641
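
The predicted probabilities can be related back to the coefficient table. In a baseline-category logit model, the log of the ratio between the probability of a class and that of the baseline class equals the linear predictor of that equation. The sketch below checks this for the `TRAP` row, which is the reference level of `state`, so the linear predictor reduces to the intercept; all numbers are copied from the outputs above.

```r
# Predicted neighbour probabilities for state == 'TRAP' (copied from the table)
p_trap <- c(NCLN = 0.14470466, OST = 0.4895808, Sol = 0.05406981,
            TRAP = 0.09480365, Unk = 0.2168411)

# The probabilities of one row should sum to (approximately) 1
sum(p_trap)

# Log-odds of each neighbour class against the baseline class NCLN
log_odds <- log(p_trap / p_trap[["NCLN"]])
round(log_odds, 5)

# Intercepts printed in the model summary for comparison
intercepts <- c(OST = 1.21885, Sol = -0.98442, TRAP = -0.42289)
round(log_odds[names(intercepts)] - intercepts, 4)
```

The differences are zero up to rounding, so, as far as the printed values show, the probabilities in this table correspond to the fixed-effect part of the model.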


## Fit models for the leading/trailing states in polysomes

In [24]:
temp <- data.frame(df_wt)
temp$ld_state_full <- factor(ifelse(temp$ld_state_full %in% c('NCLN', 'NCLNCCDC47', 'NCLNTRAP', 'NCLNTRAPCCDC47'), 
                                   'NCLN', as.character(temp$ld_state_full)))
temp <- temp[temp$ld_state_full %in% c('NCLN', 'OST', 'TRAP', 'Sol', 'Unk'), , drop=FALSE]
temp$ld_state_full <- factor(temp$ld_state_full)

model_lead_state <- mblogit(ld_state_full ~ state, data = temp, random = ~1|tomogram,
                       control=mmclogit.control(epsilon = 1e-08, maxit = 100))
summary(model_lead_state)


Iteration 1 - deviance = 101014.5 - criterion = 0.8094286
Iteration 2 - deviance = 97955.7 - criterion = 0.07111355
Iteration 3 - deviance = 97782.67 - criterion = 0.01488656
Iteration 4 - deviance = 97688.12 - criterion = 0.001490879
Iteration 5 - deviance = 97681.31 - criterion = 0.0001041553
Iteration 6 - deviance = 97682.16 - criterion = 1.769941e-06
Iteration 7 - deviance = 97680.22 - criterion = 1.669827e-07
Iteration 8 - deviance = 97679.23 - criterion = 2.539702e-08
Iteration 9 - deviance = 97678.79 - criterion = 4.345814e-09
converged



Call:
mblogit(formula = ld_state_full ~ state, data = temp, random = ~1 | 
    tomogram, control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for OST vs NCLN:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.44523    0.04436  -32.58   <2e-16 ***
stateOST     4.13370    0.05200   79.49   <2e-16 ***
stateSol     2.52288    0.10141   24.88   <2e-16 ***
stateTRAP    2.55291    0.06570   38.86   <2e-16 ***
stateUnk     2.15250    0.05349   40.24   <2e-16 ***

Equation for Sol vs NCLN:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.5819     0.1089  -32.90   <2e-16 ***
stateOST      1.8185     0.1323   13.74   <2e-16 ***
stateSol      6.2849     0.1337   46.99   <2e-16 ***
stateTRAP     1.8605     0.1540   12.08   <2e-16 ***
stateUnk      3.9532     0.1111   35.59   <2e-16 ***

Equation for TRAP vs NCLN:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.51697    0.06811  -36.95   <2e-16 ***
stateOST     2.55695    0.07

In [26]:
# print the calculated probabilities from the model
fit.eff <- Effect.mblogit(model_lead_state, 'state')
prediction <- data.frame(fit.eff$prob, fit.eff$lower.prob, fit.eff$upper.prob)
row.names(prediction) <- fit.eff$predictors
prediction

# and write to csv file
write.csv(prediction, "../data/results/mblogit_leading-polysome-translocon.csv", row.names=TRUE)

Unnamed: 0,prob.NCLN,prob.OST,prob.Sol,prob.TRAP,prob.Unk,L.prob.NCLN,L.prob.OST,L.prob.Sol,L.prob.TRAP,L.prob.Unk,U.prob.NCLN,U.prob.OST,U.prob.Sol,U.prob.TRAP,U.prob.Unk
NCLN,0.60364548,0.14227366,0.016794817,0.04871664,0.1885694,0.58936077,0.13219252,0.013626961,0.04295195,0.17691704,0.61775555,0.15298805,0.02068366,0.05521039,0.200802
OST,0.05285589,0.77745962,0.009062523,0.05501194,0.10561,0.04944917,0.76819619,0.007734302,0.05088481,0.09947285,0.05648337,0.78645576,0.0106164,0.05945285,0.1120787
Sol,0.02967336,0.08717274,0.442856719,0.03541244,0.4048847,0.02532005,0.07902589,0.422463328,0.03042917,0.3862443,0.03474846,0.09607185,0.46344459,0.04117713,0.4238036
TRAP,0.16833299,0.50959787,0.030100997,0.09043882,0.2015293,0.15489835,0.48912988,0.024409612,0.08025648,0.18592008,0.18268097,0.53003374,0.03706897,0.10177007,0.218098
Unk,0.11765085,0.23864656,0.170554319,0.04949913,0.4236491,0.11119289,0.22757017,0.15968496,0.04492033,0.40981249,0.12443137,0.25008752,0.18200329,0.05451802,0.4376066


In [27]:
temp <- data.frame(df_wt)
temp$tr_state_full <- factor(ifelse(temp$tr_state_full %in% c('NCLN', 'NCLNCCDC47', 'NCLNTRAP', 'NCLNTRAPCCDC47'), 
                                   'NCLN', as.character(temp$tr_state_full)))
temp <- temp[temp$tr_state_full %in% c('NCLN', 'OST', 'TRAP', 'Sol', 'Unk'), , drop=FALSE]
temp$tr_state_full <- factor(temp$tr_state_full)

model_trail_state <- mblogit(tr_state_full ~ state, data = temp, random = ~1|tomogram,
                       control=mmclogit.control(epsilon = 1e-08, maxit = 100))
summary(model_trail_state)


Iteration 1 - deviance = 104068.4 - criterion = 0.8267962
Iteration 2 - deviance = 101091.7 - criterion = 0.05891502
Iteration 3 - deviance = 101159.2 - criterion = 0.004288555
Iteration 4 - deviance = 101127.7 - criterion = 0.0002802274
Iteration 5 - deviance = 101136.3 - criterion = 2.461749e-05
Iteration 6 - deviance = 101143.5 - criterion = 3.171344e-06
Iteration 7 - deviance = 101147.2 - criterion = 5.005074e-07
Iteration 8 - deviance = 101148.8 - criterion = 8.263676e-08
Iteration 9 - deviance = 101149.4 - criterion = 1.364449e-08
Iteration 10 - deviance = 101149.7 - criterion = 2.237801e-09
converged



Call:
mblogit(formula = tr_state_full ~ state, data = temp, random = ~1 | 
    tomogram, control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for OST vs NCLN:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.17433    0.04091  -28.70   <2e-16 ***
stateOST     4.13667    0.05196   79.61   <2e-16 ***
stateSol     1.80665    0.13201   13.69   <2e-16 ***
stateTRAP    2.54730    0.07683   33.15   <2e-16 ***
stateUnk     1.86364    0.05324   35.01   <2e-16 ***

Equation for Sol vs NCLN:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.07491    0.08887  -34.60   <2e-16 ***
stateOST     2.54299    0.10173   25.00   <2e-16 ***
stateSol     6.30061    0.13375   47.11   <2e-16 ***
stateTRAP    2.70271    0.12589   21.47   <2e-16 ***
stateUnk     3.81397    0.09146   41.70   <2e-16 ***

Equation for TRAP vs NCLN:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.92679    0.05640  -34.16   <2e-16 ***
stateOST     2.56896    0.06

In [28]:
# print the calculated probabilities from the model
fit.eff <- Effect.mblogit(model_trail_state, 'state')
prediction <- data.frame(fit.eff$prob, fit.eff$lower.prob, fit.eff$upper.prob)
row.names(prediction) <- fit.eff$predictors
prediction

# and write to csv file
write.csv(prediction, "../data/results/mblogit_trailing-polysome-translocon.csv", row.names=TRUE)

Unnamed: 0,prob.NCLN,prob.OST,prob.Sol,prob.TRAP,prob.Unk,L.prob.NCLN,L.prob.OST,L.prob.Sol,L.prob.TRAP,L.prob.Unk,U.prob.NCLN,U.prob.OST,U.prob.Sol,U.prob.TRAP,U.prob.Unk
NCLN,0.53122601,0.16416186,0.02453927,0.07735463,0.2027182,0.51762564,0.15377831,0.02072909,0.06993321,0.1915625,0.54478016,0.17510145,0.02902903,0.08549123,0.2143513
OST,0.03826064,0.74007621,0.02247714,0.07271818,0.1264678,0.03549847,0.72992281,0.02008389,0.06738859,0.1199125,0.04122855,0.74997864,0.02514825,0.07843383,0.1333273
Sol,0.02053183,0.03863999,0.51681055,0.0192691,0.4047485,0.01676131,0.03319828,0.49448652,0.01557617,0.3842417,0.02512885,0.04493222,0.53906769,0.02381638,0.4255933
TRAP,0.11676147,0.46086267,0.08047359,0.11144288,0.2304594,0.10422871,0.43835999,0.06918132,0.09866013,0.2124971,0.13058153,0.48352601,0.09342407,0.12565092,0.2494591
Unk,0.09870988,0.19666353,0.20669558,0.05630684,0.4416242,0.09273392,0.18659439,0.19371423,0.0510712,0.4278283,0.10502637,0.20713765,0.22030916,0.06204414,0.4555107


## Leading/trailing state extended

In [30]:
temp <- data.frame(df_wt)
temp$state_full <- factor(ifelse(temp$state_full %in% c('NCLN', 'NCLNCCDC47'), 
                                   'NCLN', as.character(temp$state_full)))
temp$state_full <- factor(ifelse(temp$state_full %in% c('NCLNTRAP', 'NCLNTRAPCCDC47'), 
                                   'NCLNTRAP', as.character(temp$state_full)))
temp$ld_state_full <- factor(ifelse(temp$ld_state_full %in% c('NCLN', 'NCLNCCDC47'), 
                                   'NCLN', as.character(temp$ld_state_full)))
temp$ld_state_full <- factor(ifelse(temp$ld_state_full %in% c('NCLNTRAP', 'NCLNTRAPCCDC47'), 
                                   'NCLNTRAP', as.character(temp$ld_state_full)))

model_lead_state_extended <- mblogit(ld_state_full ~ state_full, data = temp, random = ~1|tomogram,
                       control=mmclogit.control(epsilon = 1e-08, maxit = 100))
summary(model_lead_state_extended)


Iteration 1 - deviance = 293406 - criterion = 0.7800511
Iteration 2 - deviance = 276018.8 - criterion = 0.1559336
Iteration 3 - deviance = 274296.1 - criterion = 0.01640992
Iteration 4 - deviance = 274177 - criterion = 0.002924627
Iteration 5 - deviance = 274414.1 - criterion = 0.0006417718
Iteration 6 - deviance = 274704.5 - criterion = 0.0001012667
Iteration 7 - deviance = 274917.9 - criterion = 2.713744e-05
Iteration 8 - deviance = 275050.2 - criterion = 1.43921e-05
Iteration 9 - deviance = 275129 - criterion = 9.646623e-06
Iteration 10 - deviance = 275176.1 - criterion = 4.144139e-06
Iteration 11 - deviance = 275204.2 - criterion = 1.578939e-06
Iteration 12 - deviance = 275220.9 - criterion = 5.729397e-07
Iteration 13 - deviance = 275230.8 - criterion = 2.035648e-07
Iteration 14 - deviance = 275236.6 - criterion = 7.163148e-08
Iteration 15 - deviance = 275240 - criterion = 2.509537e-08
Iteration 16 - deviance = 275242.1 - criterion = 8.774939e-09
converged



Call:
mblogit(formula = ld_state_full ~ state_full, data = temp, random = ~1 | 
    tomogram, control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for NCLNTRAP vs NCLN:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -1.07921    0.05930 -18.198  < 2e-16 ***
state_fullNCLNTRAP  0.12531    0.07971   1.572    0.116    
state_fullOST       1.08753    0.07946  13.686  < 2e-16 ***
state_fullSol       0.70394    0.16935   4.157 3.23e-05 ***
state_fullTRAP      0.64146    0.10251   6.257 3.92e-10 ***
state_fullUnk       0.30226    0.07717   3.917 8.98e-05 ***

Equation for non vs NCLN:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)         1.13281    0.02883   39.29   <2e-16 ***
state_fullNCLNTRAP  0.57630    0.04886   11.79   <2e-16 ***
state_fullOST       2.91564    0.05434   53.65   <2e-16 ***
state_fullSol       4.24245    0.11064   38.34   <2e-16 ***
state_fullTRAP      1.39484    0.06726   20.74   <2e-16 ***
state_fu

In [31]:
# print the calculated probabilities from the model
fit.eff <- Effect.mblogit(model_lead_state_extended, 'state_full')
prediction <- data.frame(fit.eff$prob, fit.eff$lower.prob, fit.eff$upper.prob)
row.names(prediction) <- fit.eff$predictors
prediction

# and write to csv file
write.csv(prediction, "../data/results/mblogit_leading-polysome-translocon_extended.csv", row.names=TRUE)

Unnamed: 0,prob.NCLN,prob.NCLNTRAP,prob.non,prob.OST,prob.Sol,prob.TRAP,prob.Unk,L.prob.NCLN,L.prob.NCLNTRAP,L.prob.non,...,L.prob.Sol,L.prob.TRAP,L.prob.Unk,U.prob.NCLN,U.prob.NCLNTRAP,U.prob.non,U.prob.OST,U.prob.Sol,U.prob.TRAP,U.prob.Unk
NCLN,0.193379214,0.065722898,0.6003216,0.04298666,0.005784345,0.018564485,0.07324082,0.184975774,0.059404469,0.5887923,...,0.004389541,0.015835213,0.06744157,0.202069771,0.072661458,0.6117407,0.04783042,0.007618964,0.021753763,0.07949625
NCLNTRAP,0.124684563,0.048033376,0.6887669,0.05759843,0.006239612,0.015624248,0.05905287,0.116589506,0.042520748,0.6762566,...,0.004584355,0.012806012,0.05319749,0.133256895,0.05422022,0.7010075,0.0641477,0.008487433,0.019050728,0.06550816
OST,0.010570555,0.010658896,0.605787,0.30476646,0.003679033,0.021912257,0.04262584,0.009657688,0.009563402,0.5949893,...,0.003142665,0.020228538,0.04033215,0.0115687,0.011878374,0.6164822,0.31578436,0.00430655,0.023732726,0.04504386
Sol,0.003660177,0.002514921,0.7905844,0.01817163,0.092105531,0.007471344,0.08549195,0.002970614,0.001960236,0.7827277,...,0.086058601,0.006414237,0.0810861,0.004509083,0.003226058,0.7982303,0.02010942,0.098531544,0.008701143,0.09011372
TRAP,0.04510559,0.02911523,0.5649061,0.21961047,0.013224787,0.039840945,0.08819688,0.04029026,0.025078898,0.550539,...,0.010716111,0.035226372,0.08113534,0.050466166,0.033778682,0.579165,0.23252546,0.01631107,0.045031802,0.09580892
Unk,0.020782139,0.009555784,0.7422129,0.06047281,0.0439977,0.012757047,0.11022162,0.019422799,0.008497857,0.7352384,...,0.040885496,0.011537366,0.1056367,0.022234458,0.010743989,0.7490663,0.06409285,0.047335113,0.014103827,0.11497996
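
One way to read this table against the coefficient summary is sketched below. It assumes that `Effect.mblogit` (defined in the sourced `mclogit-effects.r`, whose code is not shown here) reports fixed-effect probabilities, i.e. without a tomogram-specific random intercept. Under that assumption, the log-ratio of two probabilities in a row should reproduce the matching baseline-category logit. All numbers are copied from the outputs above; the variable names are illustrative only.

```r
# Coefficients copied from the summary of the leading-neighbour model
b0_nclntrap <- -1.07921   # (Intercept),   NCLNTRAP vs NCLN
b0_non      <-  1.13281   # (Intercept),   non vs NCLN
b_ost_non   <-  2.91564   # state_fullOST, non vs NCLN

# Predicted probabilities copied from the table, row state_full == NCLN
p_ncln     <- 0.193379214
p_nclntrap <- 0.065722898
p_non      <- 0.6003216

# Predicted probabilities copied from the table, row state_full == OST
p_ncln_ost <- 0.010570555
p_non_ost  <- 0.605787

# For the reference predictor level (NCLN) the logit is just the intercept
diff_nclntrap <- log(p_nclntrap / p_ncln) - b0_nclntrap
diff_non      <- log(p_non / p_ncln) - b0_non

# For another predictor level the logit is intercept + level coefficient
diff_non_ost  <- log(p_non_ost / p_ncln_ost) - (b0_non + b_ost_non)

# Differences should be close to zero, up to rounding of the printed values
print(c(diff_nclntrap, diff_non, diff_non_ost))
```

If the three differences are near zero, each row of the table can be treated as the softmax of the corresponding linear predictors, with NCLN as the baseline category.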


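The cell that follows merges the CCDC47-bound classes into their parent classes with repeated `ifelse` calls. A minimal sketch of the same recoding on a toy vector is given here; `collapse_ccdc47` is a hypothetical helper introduced only for illustration and is not part of the notebook or of mclogit.

```r
# Toy vector using the class names that appear in this notebook
x <- c("NCLN", "NCLNCCDC47", "NCLNTRAP", "NCLNTRAPCCDC47", "OST", "non")

# Hypothetical helper: map the CCDC47 variants onto their parent class
collapse_ccdc47 <- function(v) {
  v <- as.character(v)
  v[v %in% c("NCLN", "NCLNCCDC47")] <- "NCLN"
  v[v %in% c("NCLNTRAP", "NCLNTRAPCCDC47")] <- "NCLNTRAP"
  factor(v)  # rebuild the factor so unused levels disappear
}

collapsed <- collapse_ccdc47(x)
print(table(collapsed))
```

The first factor level matters for the model: the summaries report every equation as "... vs NCLN", so NCLN acts as the baseline category of the response.
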
In [32]:
# work on a copy so that df_wt itself stays unchanged
temp <- data.frame(df_wt)

# merge the CCDC47-bound variants into their parent classes (state of the particle itself)
temp$state_full <- factor(ifelse(temp$state_full %in% c('NCLN', 'NCLNCCDC47'), 
                                   'NCLN', as.character(temp$state_full)))
temp$state_full <- factor(ifelse(temp$state_full %in% c('NCLNTRAP', 'NCLNTRAPCCDC47'), 
                                   'NCLNTRAP', as.character(temp$state_full)))

# same merge for the state of the trailing neighbour
temp$tr_state_full <- factor(ifelse(temp$tr_state_full %in% c('NCLN', 'NCLNCCDC47'), 
                                   'NCLN', as.character(temp$tr_state_full)))
temp$tr_state_full <- factor(ifelse(temp$tr_state_full %in% c('NCLNTRAP', 'NCLNTRAPCCDC47'), 
                                   'NCLNTRAP', as.character(temp$tr_state_full)))

model_trail_state_extended <- mblogit(tr_state_full ~ state_full, data = temp, random = ~1|tomogram,
                       control=mmclogit.control(epsilon = 1e-08, maxit = 100))
summary(model_trail_state_extended)


Iteration 1 - deviance = 293800.8 - criterion = 0.7832841
Iteration 2 - deviance = 276338.5 - criterion = 0.165574
Iteration 3 - deviance = 274527 - criterion = 0.01570048
Iteration 4 - deviance = 274326.1 - criterion = 0.004602206
Iteration 5 - deviance = 274618 - criterion = 0.0007112536
Iteration 6 - deviance = 274941.5 - criterion = 0.0001902867
Iteration 7 - deviance = 275197.2 - criterion = 4.05049e-05
Iteration 8 - deviance = 275358.3 - criterion = 8.618248e-06
Iteration 9 - deviance = 275445.9 - criterion = 7.534617e-06
Iteration 10 - deviance = 275492 - criterion = 3.818506e-06
Iteration 11 - deviance = 275517.8 - criterion = 1.477356e-06
Iteration 12 - deviance = 275532.6 - criterion = 5.276391e-07
Iteration 13 - deviance = 275541 - criterion = 1.8304e-07
Iteration 14 - deviance = 275545.9 - criterion = 6.272607e-08
Iteration 15 - deviance = 275548.7 - criterion = 2.137976e-08
Iteration 16 - deviance = 275550.4 - criterion = 7.271116e-09
converged



Call:
mblogit(formula = tr_state_full ~ state_full, data = temp, random = ~1 | 
    tomogram, control = mmclogit.control(epsilon = 1e-08, maxit = 100))

Equation for NCLNTRAP vs NCLN:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -0.92253    0.05838 -15.803  < 2e-16 ***
state_fullNCLNTRAP  0.12703    0.07982   1.591 0.111523    
state_fullOST       0.75144    0.08550   8.789  < 2e-16 ***
state_fullSol       0.56179    0.21176   2.653 0.007978 ** 
state_fullTRAP      0.27529    0.13220   2.082 0.037303 *  
state_fullUnk       0.27533    0.07974   3.453 0.000555 ***

Equation for non vs NCLN:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)         1.01598    0.02926   34.72   <2e-16 ***
state_fullNCLNTRAP  0.80550    0.05162   15.60   <2e-16 ***
state_fullOST       3.18381    0.05847   54.45   <2e-16 ***
state_fullSol       4.90497    0.14060   34.89   <2e-16 ***
state_fullTRAP      2.19989    0.08284   26.55   <2e-16 ***
state_fu

In [33]:
# print the probabilities calculated from the model, with their lower and upper bounds
fit.eff <- Effect.mblogit(model_trail_state_extended, 'state_full')
prediction <- data.frame(fit.eff$prob, fit.eff$lower.prob, fit.eff$upper.prob)
row.names(prediction) <- fit.eff$predictors
prediction

# and write to csv file
write.csv(prediction, "../data/results/mblogit_trailing-polysome-translocon_extended.csv", row.names=TRUE)

Unnamed: 0,prob.NCLN,prob.NCLNTRAP,prob.non,prob.OST,prob.Sol,prob.TRAP,prob.Unk,L.prob.NCLN,L.prob.NCLNTRAP,L.prob.non,...,L.prob.Sol,L.prob.TRAP,L.prob.Unk,U.prob.NCLN,U.prob.NCLNTRAP,U.prob.non,U.prob.OST,U.prob.Sol,U.prob.TRAP,U.prob.Unk
NCLN,0.194658271,0.077379311,0.5376594,0.05270213,0.009910904,0.031716049,0.09597391,0.186159852,0.070161327,0.525666,...,0.007995849,0.0278971,0.08926147,0.203447668,0.085271761,0.5496094,0.058189631,0.01227895,0.036038408,0.10313396
NCLNTRAP,0.105982108,0.047835682,0.6550758,0.082454048,0.010347931,0.031952512,0.06635188,0.098444061,0.042266589,0.6419005,...,0.008105354,0.027619033,0.06012502,0.114024355,0.054097118,0.6680171,0.090486774,0.01320272,0.036940092,0.07317341
OST,0.008753796,0.007377272,0.5836359,0.305741461,0.009688665,0.031034541,0.05376836,0.007925342,0.006520413,0.5726081,...,0.008731224,0.028748328,0.05112272,0.009668008,0.008345786,0.5945807,0.31718051,0.01074996,0.033496294,0.05654277
Sol,0.002187795,0.001525252,0.8155368,0.006993035,0.095568922,0.003578897,0.07460933,0.001671682,0.001117421,0.8082923,...,0.089936413,0.002897925,0.07065114,0.002862796,0.002081622,0.8225675,0.008149008,0.10151483,0.004419179,0.07877049
TRAP,0.026538402,0.013892616,0.6614681,0.153023942,0.027496124,0.038589814,0.07899101,0.022898336,0.01133071,0.6479935,...,0.023685157,0.03404012,0.07240412,0.030738912,0.017023802,0.6746858,0.163904343,0.03190024,0.043720083,0.0861215
Unk,0.015983469,0.008367536,0.7523053,0.048054795,0.051401848,0.013988398,0.10989864,0.014796814,0.007392599,0.7454701,...,0.04820922,0.012665509,0.10543345,0.017263622,0.009469821,0.7590163,0.051187792,0.05479373,0.0154473,0.11452873
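
Each row of this table is a probability distribution over the trailing neighbour's state, conditional on the `state_full` class named in the row. A quick consistency check on one row is sketched below, with values copied from the NCLN row above. The variable names are illustrative, and the lower and upper bounds are taken as reported by `Effect.mblogit` without assuming a particular confidence level.

```r
# prob.* columns of the NCLN row, in the column order of the table
p <- c(NCLN = 0.194658271, NCLNTRAP = 0.077379311, non = 0.5376594,
       OST = 0.05270213, Sol = 0.009910904, TRAP = 0.031716049,
       Unk = 0.09597391)

# The seven outcome probabilities should sum to one (up to rounding)
row_total <- sum(p)
print(row_total)

# Lower and upper bounds for prob.NCLN in the same row
lower_ncln <- 0.186159852
upper_ncln <- 0.203447668

# The point estimate should lie between its bounds
inside <- lower_ncln < p[["NCLN"]] && p[["NCLN"]] < upper_ncln
print(inside)
```

The same check can be repeated for the other rows, or applied to the written CSV file after reading it back with `read.csv`.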
