# Sync your Jupyter notebook with your LaTeX project

Interactive notebooks such as Jupyter and Google Colab have become increasingly popular thanks to their simplicity, power and friendly user interface. As a result, they are used extensively in scientific writing. In this guide I present some easy steps that help automate the transfer of information between notebooks and LaTeX.

What you need:
* Jupyter
* GitHub account

Throughout this guide I use a regression example. Scientists often experiment with many regression specifications, which makes transferring the results by hand very time consuming and a potential source of miscommunication between teams that work on the same project. The final part outlines how to write regression equations automatically from the features used, in order to avoid mistakes and speed up the process.

The idea is to use functions that automatically print information in LaTeX format.
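To make the idea concrete, here is a minimal sketch of such a function that uses only the standard library. The names `escape_latex` and `rows_to_latex` are hypothetical helpers invented for this illustration, and the values in the example call are made up; the guide itself relies on pandas for this job.

```python
# Hypothetical helpers for illustration; they are not part of the original notebook.
LATEX_SPECIALS = {
    '\\': r'\textbackslash{}',
    '&': r'\&', '%': r'\%', '$': r'\$', '#': r'\#',
    '_': r'\_', '{': r'\{', '}': r'\}',
    '~': r'\textasciitilde{}', '^': r'\textasciicircum{}',
}

def escape_latex(text):
    """Escape characters that have a special meaning in LaTeX."""
    # Escaping character by character avoids escaping a replacement twice
    return ''.join(LATEX_SPECIALS.get(ch, ch) for ch in str(text))

def rows_to_latex(header, rows):
    """Return a booktabs-style tabular for a header and a list of rows."""
    col_spec = 'l' + 'r' * (len(header) - 1)  # first column left, others right
    lines = [r'\begin{tabular}{' + col_spec + '}', r'\toprule']
    lines.append(' & '.join(escape_latex(h) for h in header) + r' \\')
    lines.append(r'\midrule')
    for row in rows:
        lines.append(' & '.join(escape_latex(c) for c in row) + r' \\')
    lines.append(r'\bottomrule')
    lines.append(r'\end{tabular}')
    return '\n'.join(lines)

# Made-up values, only to show the shape of the output
print(rows_to_latex(['variable', 'coef'], [['word_count', 1.5]]))
```

Because the function prints plain LaTeX, whatever the notebook writes to standard output can later be collected into a .tex file without a separate conversion step.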

## Step 1
* Create a new GitHub project.
* Clone the GitHub directory locally.
* Copy or create a new LaTeX project in the cloned directory.

Sample .tex file:

## Step 2
* Create a Jupyter notebook in the cloned directory.
* Load the packages.
* Run the analysis and use the helper function `print_results` to print the results in LaTeX format.

In [3]:
import statsmodels.api as sm
import pdfkit
from IPython.display import Markdown as md
from bs4 import BeautifulSoup
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

In [15]:
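from io import StringIO

def print_results(model, task=''):
    """Print a fitted statsmodels model as a LaTeX subsection with two tables."""
    # Doubled braces are literal braces in str.format, so this prints \subsection{<task>}
    print(r'\subsection{{{}}}'.format(task))
    results_summary = model.summary()

    # Table 0: general model information (R-squared, number of observations, ...)
    results_as_html = results_summary.tables[0].as_html()
    # Wrap the HTML in StringIO: recent pandas versions deprecate passing a raw HTML string
    print(pd.read_html(StringIO(results_as_html), header=0, index_col=0)[0].to_latex())

    # Table 1: coefficient estimates
    results_as_html = results_summary.tables[1].as_html()
    res = pd.read_html(StringIO(results_as_html), header=0, index_col=0)[0]
    # The summary reports either z- or t-statistics; rename the p-value column in both cases
    if 'z' in res.columns:
        res = res[['coef', 'std err', 'z', 'P>|z|']]
        res.columns = ['coef', 'std err', 'z', 'P-value']
    else:
        res = res[['coef', 'std err', 't', 'P>|t|']]
        res.columns = ['coef', 'std err', 't', 'P-value']
    print(res.to_latex())

In [4]:
data = pd.read_csv('sample.csv')

In [9]:
cols = ['char_count', 'word_count', 'word_density', 'punctuation_count', 'title_word_count']
y = 'retweet_count'

In [11]:
model = sm.OLS.from_formula('{} ~ {}'.format(y, '+'.join(cols)), data=data[cols+[y]].dropna()).fit()

In [16]:
print_results(model, 'Regression 1')

\subsection{Regression 1}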
\begin{tabular}{lllr}
\toprule
{} &     retweet\_count &           R-squared: &        0.009 \\
Dep. Variable:    &                   &                      &              \\
\midrule
Model:            &               OLS &      Adj. R-squared: &        0.009 \\
Method:           &     Least Squares &         F-statistic: &      377.200 \\
Date:             &  Mon, 28 Sep 2020 &  Prob (F-statistic): &        0.000 \\
Time:             &          12:25:07 &      Log-Likelihood: & -2166100.000 \\
No. Observations: &            200000 &                 AIC: &  4332000.000 \\
Df Residuals:     &            199994 &                 BIC: &  4332000.000 \\
Df Model:         &                 5 &                  NaN &          NaN \\
Covariance Type:  &         nonrobust &                  NaN &          NaN \\
\bottomrule
\end{tabular}

\begin{tabular}{lrrrr}
\toprule
{} &        coef &  std err &       t &  P-value \\
\midrule
Intercept         &  12200.0000 &  338.8
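
The introduction mentions writing regression equations automatically from the features used. A minimal sketch of that idea follows. The function name `regression_equation` is hypothetical, the two-feature list is only an example, and the use of `\text{...}` assumes the LaTeX project loads the `amsmath` package.

```python
def regression_equation(y, features):
    """Build a LaTeX equation for a linear model from variable names."""
    def tex_name(name):
        # Underscores must be escaped inside \text{...}
        return r'\text{' + name.replace('_', r'\_') + '}'

    terms = [r'\beta_0']  # intercept
    for i, feature in enumerate(features, start=1):
        terms.append(r'\beta_{' + str(i) + '} ' + tex_name(feature))
    terms.append(r'\varepsilon')  # error term
    body = tex_name(y) + ' = ' + ' + '.join(terms)
    return '\\begin{equation}\n' + body + '\n\\end{equation}'

# Example with two of the features used in this guide
print(regression_equation('retweet_count', ['char_count', 'word_count']))
```

Since the equation is generated from the same list that feeds the model formula, changing the specification updates the table and the equation together. With many features the equation may become too wide for one line and would need manual line breaks.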

## Step 3
Transform the notebook output into a .tex file.
1. Use nbconvert to convert the notebook to HTML.
2. Filter out the input cells.
3. Use BeautifulSoup to find all the cells that print output.
4. Write the output to a .tex file. Since the printed output already follows the LaTeX format, there is no need for a converter.
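As a possible alternative to the HTML route described above, the printed output can also be read straight from the notebook file, because an .ipynb file is JSON. The sketch below is not the method used in this guide. It assumes the notebook has been saved in the nbformat 4 layout, and `collect_stdout` and `notebook_to_tex` are hypothetical names.

```python
import json

def collect_stdout(notebook):
    """Concatenate the stdout text of every code cell in a notebook dict."""
    chunks = []
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') != 'code':
            continue  # markdown and raw cells have no outputs
        for out in cell.get('outputs', []):
            if out.get('output_type') == 'stream' and out.get('name') == 'stdout':
                text = out.get('text', '')
                # The text is stored either as one string or as a list of lines
                chunks.append(text if isinstance(text, str) else ''.join(text))
    return ''.join(chunks)

def notebook_to_tex(ipynb_path, tex_path):
    """Write all printed output of a saved notebook to a .tex file."""
    with open(ipynb_path, encoding='utf-8') as f:
        notebook = json.load(f)
    with open(tex_path, 'w', encoding='utf-8') as f:
        f.write(collect_stdout(notebook))

# Example (requires a saved notebook on disk):
# notebook_to_tex('JupyterLatex.ipynb', 'output.tex')
```

Every cell that prints to standard output is collected, so cells that print something other than LaTeX would have to be filtered out or cleared first.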

In [17]:
!jupyter nbconvert JupyterLatex.ipynb --to html

[NbConvertApp] Converting notebook JupyterLatex.ipynb to html
[NbConvertApp] Writing 286042 bytes to JupyterLatex.html


In [21]:
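FILE = "JupyterLatex.html"
directory = ''  # output folder; an empty string means the current directory
output = directory + 'output.tex'

with open(FILE, 'r', encoding='utf-8') as html_file:
    content = html_file.read()

# Hide input cells and prompts by adding "display: none" to their CSS rules
content = content.replace("div.input_area {", "div.input_area {\n\tdisplay: none;")
content = content.replace(".prompt {", ".prompt {\n\tdisplay: none;")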

In [22]:
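# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(content, 'html.parser')

s = ''
# These class names match the HTML of nbconvert's classic template; other templates may use different ones
for i in soup.find_all('div', {'class': 'output_subarea output_stream output_stdout output_text'}):
    # Keep only the text that comes before any 'Warnings' section in the output
    s += i.text.split('Warnings')[0]

# The context manager closes the file even if writing fails
with open(output, 'w') as text_file:
    text_file.write(s)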

## Step 4
Tweak the main.tex file to include the results.
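
As a rough illustration, a main.tex along the following lines would pull in the generated file. The document class and the section title are assumptions; only the file name output.tex comes from Step 3. The `booktabs` package is needed because the tables printed by pandas use `\toprule`, `\midrule` and `\bottomrule`.

```latex
\documentclass{article}
\usepackage{booktabs} % provides \toprule, \midrule and \bottomrule

\begin{document}

\section{Results}
% output.tex is the file written by the notebook in Step 3
\input{output}

\end{document}
```

Each time the notebook regenerates output.tex, recompiling main.tex picks up the new tables without any copying by hand.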

## Step 5
Commit the changes to GitHub. To commit from within the notebook, the notebook must be in the same directory as the project.

In [24]:
!git pull
!git add .
!git commit -m changes
!git push

Already up to date.
[master e3f64b0] changes
 2 files changed, 200046 insertions(+), 200002 deletions(-)
 rewrite sample.csv (87%)
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 4 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 2.24 MiB | 711.00 KiB/s, done.
Total 4 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To https://github.com/aphoti01/JupyterLatex.git
   cd01193..e3f64b0  master -> master
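
The same pull, add, commit and push cycle can also be scripted with `subprocess`, which stops at the first command that fails. This is only a sketch: `git_commands` and `sync` are hypothetical names, and the commit message and file list in the example call are assumptions.

```python
import subprocess

def git_commands(message, paths=('.',)):
    """Return the git commands for a pull / add / commit / push cycle."""
    return [
        ['git', 'pull'],
        ['git', 'add', *paths],
        ['git', 'commit', '-m', message],
        ['git', 'push'],
    ]

def sync(message, paths=('.',), runner=subprocess.run):
    """Run the commands in order and stop at the first failure."""
    for cmd in git_commands(message, paths):
        # check=True raises CalledProcessError on a non-zero exit code
        runner(cmd, check=True)

# Example (run inside the cloned repository):
# sync('Update regression tables', paths=['output.tex', 'JupyterLatex.ipynb'])
```

Passing explicit paths instead of `.` stages only the files that are meant to change. Note that `git commit` exits with a non-zero code when there is nothing to commit, so `sync` raises in that case.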
