# Fitting large maximum entropy models with simulation - Berger machine translation example

## A simulation example on a tiny problem

It demonstrates the idea behind fitting a model by simulation and the part of the `maxentropy` API that supports it.

As in `example_berger.py`, this is the English-to-French machine
translation example from the paper 'A maximum entropy approach to
natural language processing' by Berger et al., 1996.

Consider the translation of the English word 'in' into French. Suppose
a corpus of parallel texts shows the following facts:

    (1)    p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
    (2)    p(dans) + p(en) = 3/10
    (3)    p(dans) + p(à)  = 1/2

This code finds the probability distribution with maximal entropy
subject to these constraints **without enumerating the sample space**,
using importance sampling instead.

Simulation is overkill for this tiny problem (which can be solved analytically),
but the example shows how, in principle, to use it for larger problems
on a continuous or much larger discrete sample space.
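
For comparison, here is a minimal sketch of the exact solution obtained by enumerating the five outcomes. It does not use `maxentropy` at all: it assumes the standard exponential form p(x) ∝ exp(θ·f(x)) and runs plain gradient descent on the dual, log Z(θ) − θ·K. The helper names (`distribution`, `expectations`) and the step size are illustrative choices, not part of the library.

```python
import math

# Sample space and indicator features for constraints (2) and (3).
# Constraint (1) is normalization, which dividing by Z enforces automatically.
samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']
features = [
    lambda x: float(x in ('dans', 'en')),
    lambda x: float(x in ('dans', 'à')),
]
targets = [0.3, 0.5]

def distribution(theta):
    """Return p(x) proportional to exp(sum_i theta_i * f_i(x))."""
    scores = [sum(t * f(x) for t, f in zip(theta, features)) for x in samplespace]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return [math.exp(s - log_z) for s in scores]

def expectations(theta):
    """Exact feature expectations under the model, by enumeration."""
    p = distribution(theta)
    return [sum(pi * f(x) for pi, x in zip(p, samplespace)) for f in features]

# Gradient descent on the convex dual; its gradient is E_theta[f] - K.
theta = [0.0, 0.0]
for _ in range(2000):
    grad = [e - k for e, k in zip(expectations(theta), targets)]
    theta = [t - g for t, g in zip(theta, grad)]

p = dict(zip(samplespace, distribution(theta)))
for word, prob in p.items():
    print(f"{word:12s} {prob:.4f}")
```

At θ = 0 the model is uniform and the dual equals log 5 ≈ 1.6094, which is the value the simulation-based fit below reports at its first function evaluation.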

In [1]:
from __future__ import print_function

import sys

import maxentropy
from maxentropy.maxentutils import dictsampler

In [2]:
import numpy as np

In [3]:
samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']

In [4]:
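# Feature functions: indicators on the sample space, vectorized so that
# they can be applied to whole arrays of sampled words.

# f0 is true everywhere on the sample space; it corresponds to constraint (1)
@np.vectorize
def f0(x):
    return x in samplespace

# f1 corresponds to constraint (2): p(dans) + p(en)
@np.vectorize
def f1(x):
    return x == 'dans' or x == 'en'

# f2 corresponds to constraint (3): p(dans) + p(à)
@np.vectorize
def f2(x):
    return x == 'dans' or x == 'à'

f = [f0, f1, f2]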

In [5]:
f0('dans')

array(True)

In [6]:
# Define a uniform instrumental distribution for sampling
samplefreq = {e: 1 for e in samplespace}

In [7]:
auxiliary_sampler = dictsampler(samplefreq, size=10**5, return_probs='logprob')

In [8]:
next(auxiliary_sampler)

(array(['à', 'au cours de', 'pendant', ..., 'dans', 'à', 'pendant'],
       dtype=object),
 array([-1.60943791, -1.60943791, -1.60943791, ..., -1.60943791,
        -1.60943791, -1.60943791]))
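
Each log-probability above is log(1/5) ≈ -1.6094, the log-density of the uniform auxiliary distribution. The sketch below shows, under stated assumptions, how such (sample, log-probability) pairs can be turned into a self-normalized importance-sampling estimate of a feature expectation. It uses the standard library's `random.choices` as a stand-in for `dictsampler`, and the one-parameter model and the function names are hypothetical, chosen only for illustration.

```python
import math
import random

random.seed(0)

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']

# Draw from a uniform auxiliary distribution q and record log q(x) per draw.
n = 100_000
xs = random.choices(samplespace, k=n)
log_q = [-math.log(len(samplespace))] * n   # log(1/5) for every draw

def f1(x):
    return x in ('dans', 'en')

def log_p_unnormalized(x, theta1):
    """Unnormalized log-density of a toy model p(x) ∝ exp(theta1 * f1(x))."""
    return theta1 * f1(x)

def estimate_expectation(theta1):
    """Self-normalized importance-sampling estimate of E_p[f1]."""
    log_w = [log_p_unnormalized(x, theta1) - lq for x, lq in zip(xs, log_q)]
    m = max(log_w)                          # subtract the max for stability
    w = [math.exp(lw - m) for lw in log_w]
    return sum(wi * f1(x) for wi, x in zip(w, xs)) / sum(w)

# With theta1 = 0 the toy model is uniform, so the exact value is 2/5.
print(estimate_expectation(0.0))
```

Because the weights are normalized by their own sum, the normalization constant of the model never has to be computed, which is what makes the approach usable when the sample space cannot be enumerated.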

In [36]:
model = maxentropy.BigModel(f, auxiliary_sampler)

In [10]:
# Default: model.algorithm = 'CG'
# Can choose from ['CG', 'BFGS', 'LBFGSB', 'Powell', 'Nelder-Mead']

In [11]:
# Now set the desired feature expectations
K = [1.0, 0.3, 0.5]

In [37]:
model.verbose = True

# Fit the model
# model.avegtol = 1e-5
model.fit(K)

Grad eval #0
  norm of gradient = 0.14579623177572185
Function eval # 0
  dual is  1.6094379124340996
Function eval # 1
  dual is  1.5903213447586424
Grad eval #1
  norm of gradient = 0.11644689345025408
Function eval # 2
  dual is  1.5564066000530865
Grad eval #2
  norm of gradient = 0.00647132555535496
Iteration # 0
Function eval # 3
  dual is  1.5564066000530865
Function eval # 4
  dual is  1.556370150276239
Grad eval #3
  norm of gradient = 0.004797540291035025
Function eval # 5
  dual is  1.5563257182949286
Grad eval #4
  norm of gradient = 0.0007113168828275429
Iteration # 1
Function eval # 6
  dual is  1.5563257182949286
Function eval # 7
  dual is  1.556325262666814
Grad eval #5
  norm of gradient = 0.0005697037450739942
Function eval # 8
  dual is  1.5563244471244049
Grad eval #6
  norm of gradient = 3.8747927192612445e-06
Iteration # 2
Function eval # 9
  dual is  1.5563244471244049
Function eval # 10
  dual is  1.5563244471108282
Grad eval #7
  norm of gradient = 3.132493153

In [38]:
# Output the fitted parameters
print("Fitted model parameters are:")
model.params

Fitted model parameters are:


array([ 2.79112485e-12, -5.40133852e-01,  4.96790840e-01])

In [39]:
smallmodel = maxentropy.Model(f, samplespace)
smallmodel.setparams(model.params)

In [40]:
print("\nFitted distribution is:")
smallmodel.showdist()


Fitted distribution is:
	x = dans            	p(x) = 0.1847
	x = en              	p(x) = 0.1124
	x = à               	p(x) = 0.3170
	x = au cours de     	p(x) = 0.1929
	x = pendant         	p(x) = 0.1929


In [41]:
p = smallmodel.probdist()

In [42]:
# Now show how well the constraints are satisfied:
print()
print("Desired constraints:")
print("\tp['dans'] + p['en'] = 0.3")
print("\tp['dans'] + p['à']  = 0.5")
print()
print("Actual expectations under the fitted model:")
print("\tp['dans'] + p['en'] =", p[0] + p[1])
print("\tp['dans'] + p['à']  = " + str(p[0]+p[2]))

print("\nEstimated error in constraint satisfaction (should be close to 0):\n"
        + str(abs(model.expectations() - K)))
print("\nTrue error in constraint satisfaction (should be close to 0):\n" +
        str(abs(smallmodel.expectations() - K)))


Desired constraints:
	p['dans'] + p['en'] = 0.3
	p['dans'] + p['à']  = 0.5

Actual expectations under the fitted model:
	p['dans'] + p['en'] = 0.29713439115856966
	p['dans'] + p['à']  = 0.5017701049704802

Estimated error in constraint satisfaction (should be close to 0):
[1.31894495e-13 4.47697435e-13 2.07722728e-12]

True error in constraint satisfaction (should be close to 0):
[2.22044605e-16 2.86560884e-03 1.77010497e-03]
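
The gap between the two error vectors is what one would expect from sampling noise. The estimated error is measured against expectations computed from the simulated sample itself, so the optimizer can drive it to nearly zero. The true error is measured against exact expectations over the enumerated sample space. As a rough order-of-magnitude check (an assumption-laden sketch, not the library's own error estimate), the standard error of a plain sample frequency with 10**5 uniform draws is:

```python
import math

n = 10**5  # sample size passed to the auxiliary sampler above

# Under the uniform sampler, f1 and f2 each fire on 2 of the 5 outcomes.
p_fire = 2 / 5
std_err = math.sqrt(p_fire * (1 - p_fire) / n)
print(f"standard error of a sample frequency: {std_err:.2e}")
```

This is the same order of magnitude as the true errors reported above, so a larger sample should tighten them roughly in proportion to 1/sqrt(n).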
