# Generating docx Tables for Word

## An Example Document
Here you see the PDF of the sample Word document generated by the code explained below.

## Building Tables for MS Word

First we load some data, generate variable labels, and prepare the data.

In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd
import pyfixest as pf
import statsmodels.formula.api as smf
import maketables as mt

# Load sample dataset
df = pd.read_csv("data/salaries.csv")

# Define variable labels
labels = {
    "logwage": "ln(Wage)",
    "wage": "Wage",
    "age": "Age",
    "female": "Female",
    "tenure": "Years of Tenure",
    "occupation": "Occupation",
    "worker_type": "Worker Type",
    "education": "Education Level",
    "promoted": "Promotion"
}

# Set default labels 
mt.MTable.DEFAULT_LABELS = labels

# Generate a categorical variable for gender from the dummy variable
df["gender"] = df["female"].map({0: "Male", 1: "Female"})
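
A small caveat on the `map` step: pandas returns `NaN` for any value that is missing from the mapping dictionary, so it can be worth checking the result before tabulating by `gender`. The sketch below uses a made-up toy series (not the salaries data) with one unexpected code, `2`, to show the behavior.

```python
import pandas as pd

# Toy dummy variable (hypothetical values); 2 is deliberately not in the mapping
female = pd.Series([0, 1, 1, 2])

# Same mapping as in the cell above
gender = female.map({0: "Male", 1: "Female"})

# Count values that could not be mapped and therefore became NaN
n_unmapped = int(gender.isna().sum())
print(gender.tolist()[:3], n_unmapped)
```

If `n_unmapped` is greater than zero, the dummy variable contains codes other than 0 and 1 (or missing values) that need attention first.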

We generate a table with descriptive statistics using `DTable`:

In [2]:
# Descriptive statistics
tab1 = mt.DTable(
    df,
    vars=["wage", "age", "tenure"],
    bycol=["worker_type"],
    byrow="gender",
    stats=["count", "mean", "std"],
    caption="Descriptive statistics by worker type and gender",
    format_spec={"mean": ",.2f", "std": ".2f"},
)
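
To get a feel for the numbers such a table reports, here is a rough pandas-only sketch that computes the same three statistics per gender and worker type. It is not how `DTable` is implemented. The toy data and the worker types `"A"` and `"B"` are made up for illustration. The last lines assume that the `format_spec` strings follow Python's format specification mini-language, which is what their form suggests.

```python
import pandas as pd

# Made-up toy data; the worker types "A" and "B" are placeholders
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male"] * 2,
    "worker_type": ["A"] * 4 + ["B"] * 4,
    "wage": [10.0, 12.0, 14.0, 16.0, 20.0, 24.0, 30.0, 34.0],
    "age": [30, 32, 40, 42, 35, 37, 45, 47],
    "tenure": [1, 3, 5, 7, 2, 4, 6, 8],
})

# Count, mean, and standard deviation of each variable within each group
summary = (
    toy.groupby(["gender", "worker_type"])[["wage", "age", "tenure"]]
    .agg(["count", "mean", "std"])
)

# Assumption: ",.2f" means thousands separator plus two decimals
formatted_mean = format(1234.5, ",.2f")

print(summary)
print(formatted_mean)
```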



## Using the `update_docx` method

You can save the table to a Word document with `tab1.save(type="docx", file_name="../docs/PaperTest1.docx")`, but the most convenient way to work with Word documents is `update_docx`, which works as follows:
- If no file with the given name exists, it creates a new Word document and adds the table.

- If the file exists, it updates the table at the position given by `tab_num`. For example, `tab_num=3` replaces the third table in the existing document with the new table. If the document does not yet contain a third table, the table is appended at the end of the document.

- Each time you run the code, the table is updated without changing the other content of the Word document. You can therefore keep writing your paper or thesis and rerun the code at any time: your text is unaffected and only the table is updated.

- Note: With `show=True` you can also display the table on screen while updating the document, for instance when you want to inspect it in a Jupyter notebook or a qmd file.
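
The replace-or-append rule described above can be sketched in plain Python. This is only an illustration of the positional logic, with a list of strings standing in for the tables of a document. The helper `place_table` is hypothetical and is not part of maketables.

```python
def place_table(tables, new_table, tab_num):
    """Return a copy of `tables` with `new_table` at 1-based position `tab_num`.

    If fewer than `tab_num` tables exist, the new table is appended instead.
    """
    if tab_num < 1:
        raise ValueError("tab_num is 1-based and must be at least 1")
    updated = list(tables)
    if tab_num <= len(updated):
        updated[tab_num - 1] = new_table  # replace the existing table
    else:
        updated.append(new_table)  # no table at that position yet
    return updated


# A stand-in "document" that already holds two tables
doc = ["descriptives (old)", "regressions (old)"]
doc = place_table(doc, "descriptives (new)", tab_num=1)  # replaces table 1
doc = place_table(doc, "promotions", tab_num=3)  # appended as table 3
print(doc)
```

Because the position is the only identifier, rerunning the same cell with the same `tab_num` overwrites the same slot rather than adding a duplicate.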

In [3]:
# Fill/update the first table in the document to display the descriptive statistics:
tab1.update_docx(file_name="docs/WordOutput.docx", tab_num=1, show=False)

Next, we add a regression table using PyFixest:

In [4]:
# Use (py)fixest's stepwise notation to estimate several regressions in one go
# and directly generate a regression table from the results
tab2 = mt.ETable(
    pf.feols("logwage + wage ~ age + female + sw0(age:female)", data=df),
    caption="Wage regressions",
)

# Fill/update the second table in the document
tab2.update_docx(file_name="docs/WordOutput.docx", tab_num=2, show=False)
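
To make the formula shorthand above more concrete, the sketch below spells out which individual models it stands for, using plain string handling rather than PyFixest itself. It assumes the usual reading of the notation: each dependent variable on the left-hand side gets its own set of regressions, and `sw0(...)` estimates the specification once without and once with the listed term. The order of the columns in the actual table may differ.

```python
from itertools import product

dep_vars = ["logwage", "wage"]  # left-hand side: logwage + wage
base_rhs = "age + female"  # regressors included in every model
stepwise_terms = [None, "age:female"]  # sw0: None stands for "no extra term"

# One formula per combination of dependent variable and stepwise term
formulas = [
    f"{y} ~ {base_rhs}" + (f" + {term}" if term else "")
    for y, term in product(dep_vars, stepwise_terms)
]

for formula in formulas:
    print(formula)
```

Under this reading, the single `feols` call yields four models, which is why the resulting table has four columns of estimates.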

Finally, we add a third table, this time comparing an OLS model with a probit estimated using Statsmodels.

In [5]:

# Fit the models
est1 = smf.ols("promoted ~ tenure + female + worker_type", data=df).fit()
est2 = smf.probit("promoted ~ tenure + female + worker_type", data=df).fit(disp=0)

# Make the table
tab3 = mt.ETable(
    [est1, est2],
    keep=["tenure", "female", "worker_type"],
    model_stats=["N", "r2", "pseudo_r2"],
    model_heads=["OLS", "Probit"],
    caption="Predicting Promotions",
)

# Fill/update the third table in the document
tab3.update_docx(file_name="docs/WordOutput.docx", tab_num=3, show=False)

Note that the code also automatically attaches Word labels to the tables, which enable standard Word [cross references](https://support.microsoft.com/en-us/office/create-a-cross-reference-300b208c-e45a-487a-880b-a02767d9774b). When you open the generated document in Word, select the whole text (Ctrl + A) and press F9: the table numbers are updated and you can add cross references.