This repository contains my undergraduate thesis, along with part of the code and the experimental results.
I found that when logistic regression (LR) is trained without a regularization penalty on linearly separable data, it never converges: the loss decreases continually but never reaches 0.
Also interesting: the L2 norm of the LR parameters keeps increasing throughout training.
Plotting the parameters shows that training still reduces the loss (the SSE keeps decreasing), but this is effectively wasted work: the separating boundary has already been found, and further updates only scale the weights.
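The LR behavior described above can be reproduced with a minimal sketch (this is an illustrative toy setup, not the thesis code): gradient descent on the logistic loss over a separable 1-D dataset drives the loss down without ever reaching 0, while the weight norm keeps growing.

```python
import numpy as np

# Toy linearly separable data: class +1 at x >= 1, class -1 at x <= -1.
X = np.array([[1.0], [2.0], [3.0], [-1.0], [-2.0], [-3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w = np.zeros(1)
lr = 0.5
losses, norms = [], []
for step in range(2000):
    margins = y * (X @ w)
    # Logistic loss: mean of log(1 + exp(-y * w.x))
    loss = np.mean(np.log1p(np.exp(-margins)))
    # Gradient of the mean logistic loss w.r.t. w
    grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)
    w -= lr * grad
    losses.append(loss)
    norms.append(np.linalg.norm(w))

# The loss keeps shrinking but stays strictly positive,
# while ||w|| keeps growing; the decision boundary (sign of w.x)
# is already correct long before training "finishes".
```

The data are already perfectly classified after the first few steps; all remaining updates merely inflate `||w||` to push the sigmoid outputs closer to 0/1, which is exactly the "useless work" observed in the plots.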
The SVM does not exhibit this behavior: its hinge loss can actually decrease to 0, so in the fully linearly separable case the SVM quickly finds the separating boundary and stops updating.
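For contrast, here is a minimal sketch (again a toy setup, not the thesis code) of subgradient descent on the unregularized hinge loss over the same kind of separable data: once every margin reaches 1, the loss is exactly 0 and the updates stop.

```python
import numpy as np

# Same style of toy linearly separable data as before.
X = np.array([[1.0], [2.0], [3.0], [-1.0], [-2.0], [-3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w = np.zeros(1)
lr = 0.1
loss = None
for step in range(200):
    margins = y * (X @ w)
    hinge = np.maximum(0.0, 1.0 - margins)  # per-sample hinge loss
    loss = hinge.mean()
    if loss == 0.0:
        # All margins >= 1: the hinge loss is exactly 0, training halts.
        break
    # Subgradient of the mean hinge loss: only margin-violating samples contribute.
    active = hinge > 0
    grad = -(X[active].T @ y[active]) / len(y)
    w -= lr * grad
```

Unlike the logistic loss, the hinge loss is exactly 0 on any point with margin at least 1, so the subgradient vanishes and the weights stop growing; this is the key difference behind the contrast observed in the experiments.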
I also provide paper.ipynb, which has already been run. (The picture's