# Berger machine translation example

A machine translation example (English to French) from the paper 'A
maximum entropy approach to natural language processing' by Berger et
al. (1996).

Consider the translation of the English word 'in' into French. Suppose
that, in a corpus of parallel texts, 'in' is always translated by one of
five French phrases, and that the observed frequencies satisfy:

    (1)    p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
    (2)    p(dans) + p(en) = 3/10
    (3)    p(dans) + p(à)  = 1/2

This code finds the probability distribution with maximal entropy
subject to these constraints.
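This small problem can also be solved by hand, which gives a reference to check the fitted model against. The derivation below assumes the standard result that the maximum-entropy distribution under linear expectation constraints has exponential form. The weight $\theta_0$ on the constant feature $f_0$ is absorbed into the normalizer $Z$, and $f_1$, $f_2$ are the indicator features defined in the code that follows.

```latex
\begin{aligned}
p(x) &= \frac{1}{Z}\exp\bigl(\theta_1 f_1(x) + \theta_2 f_2(x)\bigr),
  \qquad f_1 = \mathbf{1}[x \in \{\text{dans}, \text{en}\}],\;
         f_2 = \mathbf{1}[x \in \{\text{dans}, \text{à}\}] \\[4pt]
&\Rightarrow\; p(\text{au cours de}) = p(\text{pendant}) = c = \tfrac{1}{Z},
  \qquad \frac{p(\text{dans})}{p(\text{en})} = e^{\theta_2} = \frac{p(\text{à})}{c} \\[4pt]
&\text{Let } a = p(\text{dans}).\ \text{By (2), (3) and (1): }
  p(\text{en}) = 0.3 - a,\quad p(\text{à}) = 0.5 - a,\quad 2c = 0.2 + a \\[4pt]
&a \cdot \tfrac{0.2 + a}{2} = (0.3 - a)(0.5 - a)
  \;\Rightarrow\; a^2 - 1.8a + 0.3 = 0
  \;\Rightarrow\; a = \tfrac{1.8 - \sqrt{2.04}}{2} \approx 0.1859 \\[4pt]
&\Rightarrow\; p(\text{en}) \approx 0.1141,\quad p(\text{à}) \approx 0.3141,\quad c \approx 0.1929,\quad
  \theta_1 = \ln\tfrac{p(\text{en})}{c} \approx -0.5249,\quad
  \theta_2 = \ln\tfrac{p(\text{à})}{c} \approx 0.4875
\end{aligned}
```

The larger root of the quadratic (about 1.614) is rejected because $a \le 0.3$. These values agree with the fitted distribution and parameters printed further down, where $\theta_0 \approx 0$.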


In [1]:
import numpy as np
import maxentropy

In [2]:
def f0(x):
    return x in samplespace

def f1(x):
    return x == 'dans' or x == 'en'

def f2(x):
    return x == 'dans' or x == 'à'

In [3]:
features = [f0, f1, f2]

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']

# Now set the desired feature expectations
target_expectations = [1.0, 0.3, 0.5]

X = np.atleast_2d(target_expectations)
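Each target is an expectation $E_p[f_i] = \sum_x p(x) f_i(x)$; because $f_0$ equals 1 on every outcome, its target of 1.0 is just constraint (1). The plain-Python helper below is a hypothetical illustration (`feature_expectations` is not part of `maxentropy`) that computes these expectations for any candidate distribution. The uniform distribution, for example, gives 0.4 for both $f_1$ and $f_2$, so it violates constraints (2) and (3).

```python
# Hypothetical helper, not part of maxentropy: E_p[f_i] = sum_x p(x) * f_i(x)
samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']

features = [
    lambda x: x in samplespace,       # f0: 1 on every outcome in the sample space
    lambda x: x in ('dans', 'en'),    # f1
    lambda x: x in ('dans', 'à'),     # f2
]

def feature_expectations(p, features, samplespace):
    """Return [E_p[f_0], E_p[f_1], ...] for a dict p mapping outcome -> probability."""
    return [sum(p[x] * float(f(x)) for x in samplespace) for f in features]

uniform = {x: 1 / len(samplespace) for x in samplespace}
print(feature_expectations(uniform, features, samplespace))  # about [1.0, 0.4, 0.4]
```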

In [4]:
smallmodel = maxentropy.MinDivergenceModel(features, samplespace,
                                           vectorized=False,
                                           verbose=False,
                                           algorithm='BFGS')
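Conceptually, fitting a maximum-entropy model amounts to minimizing the convex dual $L(\theta) = \log Z(\theta) - \theta \cdot K$, where $K$ holds the target expectations; its gradient is $E_\theta[f] - K$, so at the minimum the constraints hold. The sketch below illustrates this general technique in plain Python. It is not `maxentropy`'s internal code: the names `model` and `fit`, the step size, and the use of plain gradient descent are all assumptions made for brevity.

```python
import math

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']
# (f1(x), f2(x)) for each outcome. f0 is constant on the sample space, so its
# parameter only rescales Z and is dropped from this sketch.
F = {x: (float(x in ('dans', 'en')), float(x in ('dans', 'à'))) for x in samplespace}
K = (0.3, 0.5)  # target expectations of f1 and f2

def model(theta):
    """Return p_theta(x) = exp(theta . f(x)) / Z(theta) as a dict."""
    w = {x: math.exp(theta[0] * F[x][0] + theta[1] * F[x][1]) for x in samplespace}
    Z = sum(w.values())
    return {x: wx / Z for x, wx in w.items()}

def fit(K, step=1.0, iters=500):
    """Gradient descent on the convex dual L(theta) = log Z(theta) - theta . K.

    Its gradient is E_theta[f] - K, so the minimizer satisfies the constraints."""
    theta = [0.0, 0.0]
    for _ in range(iters):
        p = model(theta)
        expect = [sum(p[x] * F[x][i] for x in samplespace) for i in range(2)]
        theta = [t - step * (e - k) for t, e, k in zip(theta, expect, K)]
    return theta

theta = fit(K)
p = model(theta)
print(theta)       # about [-0.525, 0.488]
print(p['dans'])   # about 0.1859
```

The notebook asks `maxentropy` for BFGS, a quasi-Newton method that typically needs far fewer iterations than the fixed-step gradient descent used here.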

In [5]:
# Fit the model
smallmodel.fit(X)

MinDivergenceModel(algorithm='BFGS',
                   features=[<function f0 at 0x10149789d8>,
                             <function f1 at 0x1014978950>,
                             <function f2 at 0x1a150b3bf8>],
                   matrix_format='csr_matrix', prior_log_pdf=None,
                   samplespace=['dans', 'en', 'à', 'au cours de', 'pendant'],
                   vectorized=False, verbose=False)

In [7]:
# Check that the fitted feature expectations match the targets
assert np.allclose(X[0, :], smallmodel.expectations())

In [8]:
# Manually test if the constraints are satisfied:
p = smallmodel.probdist()
assert np.isclose(p.sum(), target_expectations[0])
assert np.isclose(p[0] + p[1], target_expectations[1])
assert np.isclose(p[0] + p[2], target_expectations[2])

In [9]:
# Output the distribution
print("\nFitted model parameters are:\n" + str(smallmodel.params))
print("\nFitted distribution is:")
for j, x in enumerate(smallmodel.samplespace):
    print(f"\tx = {x:15s}: p(x) = {p[j]:.4f}")


Fitted model parameters are:
[-9.48248421e-16 -5.24869390e-01  4.87527727e-01]

Fitted distribution is:
	x = dans           : p(x) = 0.1859
	x = en             : p(x) = 0.1141
	x = à              : p(x) = 0.3141
	x = au cours de    : p(x) = 0.1929
	x = pendant        : p(x) = 0.1929
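Assuming the fitted model has the standard exponential-family form $p(x) = \exp(\sum_i \theta_i f_i(x)) / Z$, the distribution can be rebuilt from the printed parameters alone. The sketch below does this in plain Python (the function name `probdist_from_params` is hypothetical) and compares the result with a closed form. That form forces $p(\text{dans})\,p(\text{au cours de}) = p(\text{en})\,p(\text{à})$, which together with constraints (1) to (3) gives $a^2 - 1.8a + 0.3 = 0$ for $a = p(\text{dans})$.

```python
import math

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']
features = [
    lambda x: x in samplespace,       # f0
    lambda x: x in ('dans', 'en'),    # f1
    lambda x: x in ('dans', 'à'),     # f2
]
# Parameters copied from the output above
params = [-9.48248421e-16, -5.24869390e-01, 4.87527727e-01]

def probdist_from_params(params):
    """Assumed exponential-family form: p(x) = exp(sum_i params[i] * f_i(x)) / Z."""
    w = [math.exp(sum(t * f(x) for t, f in zip(params, features))) for x in samplespace]
    Z = sum(w)
    return [wi / Z for wi in w]

p = probdist_from_params(params)
for x, px in zip(samplespace, p):
    print(f"{x:15s} {px:.4f}")

# Closed form: smaller root of a**2 - 1.8*a + 0.3 = 0 (the larger root exceeds 0.3)
a = (1.8 - math.sqrt(2.04)) / 2
closed = [a, 0.3 - a, 0.5 - a, (0.2 + a) / 2, (0.2 + a) / 2]
diff = max(abs(u - v) for u, v in zip(p, closed))
print(diff)  # should be tiny if the fit converged
```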


In [11]:
# Now show how well the constraints are satisfied:
print()
print("Desired constraints:")
print("\tp['dans'] + p['en'] = 0.3")
print("\tp['dans'] + p['à']  = 0.5")
print()
print("Actual expectations under the fitted model:")
print("\tp['dans'] + p['en'] =", p[0] + p[1])
print("\tp['dans'] + p['à']  =", p[0] + p[2])


Desired constraints:
	p['dans'] + p['en'] = 0.3
	p['dans'] + p['à']  = 0.5

Actual expectations under the fitted model:
	p['dans'] + p['en'] = 0.29999999965207746
	p['dans'] + p['à']  = 0.4999999981007823


## Simulated version

The `maxentropy` package can also fit models by simulation. This is useful for continuous models, and for discrete models whose sample spaces are too large to enumerate (e.g. the space of all sentences in a natural language).

Here we repeat the above example using simulation. The problem is small enough that simulation is unnecessary, but repeating it shows the idea and highlights how the `maxentropy` API differs when fitting by simulation rather than by enumerating a small discrete sample space.

The following code finds the probability distribution with maximal entropy subject to the same constraints as above **without enumerating the sample space**, using importance sampling instead.
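The importance-sampling idea can be sketched in plain Python: draw from an instrumental density $q$, weight each draw by $\exp(\theta \cdot f(x)) / q(x)$, and normalize the weights, so that $Z$ is never computed. In this hypothetical sketch (the function name `is_expectations` is illustrative, not a `maxentropy` API), $q$ is uniform over the five translations, matching the uniform frequencies used below, and $\theta$ is taken from the exact fit above. For a genuinely large space, $q$ would be any distribution that is easy to sample from.

```python
import math
import random

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']
# (f0(x), f1(x), f2(x)) for each outcome
F = {x: (1.0, float(x in ('dans', 'en')), float(x in ('dans', 'à'))) for x in samplespace}

def is_expectations(theta, n=100_000, seed=0):
    """Self-normalized importance-sampling estimate of E_theta[f].

    Draws x_j from a uniform instrumental density q (log q = -log 5) and weights
    each draw by w_j proportional to exp(theta . f(x_j)) / q(x_j)."""
    rng = random.Random(seed)
    log_q = -math.log(len(samplespace))
    xs = [rng.choice(samplespace) for _ in range(n)]
    log_w = [sum(t * fi for t, fi in zip(theta, F[x])) - log_q for x in xs]
    m = max(log_w)                       # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [sum(wj * F[x][i] for wj, x in zip(w, xs)) / total for i in range(3)]

theta = [0.0, -0.52487, 0.48753]   # parameters from the exact fit above
est = is_expectations(theta)
print(est)                         # roughly [1.0, 0.3, 0.5], up to Monte Carlo error
```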

In [12]:
from maxentropy.utils import dictsampler

In [13]:
def f0(x):
    return x in samplespace

def f1(x):
    return x == 'dans' or x == 'en'

def f2(x):
    return x == 'dans' or x == 'à'

f = [f0, f1, f2]

In [14]:
# Define a uniform instrumental distribution for sampling
samplefreq = {e: 1 for e in samplespace}

In [15]:
auxiliary_sampler = dictsampler(samplefreq, size=10**5)

In [16]:
next(auxiliary_sampler)

(array(['à', 'pendant', 'pendant', ..., 'pendant', 'en', 'dans'],
       dtype=object),
 array([-1.60943791, -1.60943791, -1.60943791, ..., -1.60943791,
        -1.60943791, -1.60943791]))
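Judging from the output above, each call to `next` on the sampler returns a batch of samples together with their log-probabilities under the instrumental distribution (here $\log(1/5) \approx -1.6094$ for every draw). The generator below is a hypothetical pure-Python stand-in with that interface, written only to illustrate it: it is not how `dictsampler` is implemented, and it yields lists rather than NumPy arrays.

```python
import math
import random

def uniform_batch_sampler(outcomes, size, seed=None):
    """Hypothetical stand-in for an auxiliary sampler: an endless generator of
    (samples, log_probs) batches drawn uniformly from `outcomes`, where log_probs
    holds the log density of each draw under the sampler."""
    rng = random.Random(seed)
    log_p = -math.log(len(outcomes))   # uniform: every draw has probability 1/len
    while True:
        samples = [rng.choice(outcomes) for _ in range(size)]
        yield samples, [log_p] * size

outcomes = ['dans', 'en', 'à', 'au cours de', 'pendant']
sampler = uniform_batch_sampler(outcomes, size=5, seed=1)
samples, log_probs = next(sampler)
print(samples)
print(log_probs[0])   # about -1.6094, i.e. log(1/5)
```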

In [17]:
bigmodel = maxentropy.MCMinDivergenceModel(f, auxiliary_sampler,
                                           vectorized=False,
                                           verbose=False)

In [18]:
# Now set the desired feature expectations
target_expectations = [1.0, 0.3, 0.5]

X = np.atleast_2d(target_expectations)

In [19]:
bigmodel.fit(X)

MCMinDivergenceModel(algorithm='CG',
                     auxiliary_sampler=<method-wrapper '__next__' of generator object at 0x1a1548b408>,
                     feature_functions=None, matrix_format='csc_matrix',
                     prior_log_pdf=None, vectorized=None, verbose=False)
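One way a simulation-based fit can work is to draw a single fixed sample from the instrumental distribution and then minimize the dual with every expectation replaced by its self-normalized importance-sampling estimate on that sample (sometimes called sample-path optimization). The plain-Python sketch below illustrates this general technique; it is not a description of what `MCMinDivergenceModel` does internally, and the uniform instrumental distribution, the seed, the gradient-descent optimizer and all names are assumptions. Because the parameters are fitted to the sample, the constraints hold almost exactly on the sample but only approximately under the exact model, which mirrors the two kinds of error printed at the end of this notebook.

```python
import math
import random
from collections import Counter

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']
K = (0.3, 0.5)  # target expectations of f1 and f2

def f(x):
    """(f1(x), f2(x)); f0 is omitted because it is constant on the sample space."""
    return (float(x in ('dans', 'en')), float(x in ('dans', 'à')))

# One fixed sample of 10**5 draws from a uniform instrumental distribution.
rng = random.Random(0)
draws = [rng.choice(samplespace) for _ in range(10**5)]
# Grouping identical draws leaves every sum over the sample unchanged but makes
# each iteration cheap; only outcomes that were actually drawn appear here.
counts = Counter(draws)

def sample_expectations(theta):
    """Self-normalized IS estimate of E_theta[f]; the constant log q cancels."""
    w = {x: c * math.exp(theta[0] * f(x)[0] + theta[1] * f(x)[1]) for x, c in counts.items()}
    total = sum(w.values())
    return [sum(wx * f(x)[i] for x, wx in w.items()) / total for i in range(2)]

def exact_expectations(theta):
    """Exact E_theta[f] by enumeration, used only for evaluation (the space is tiny)."""
    w = {x: math.exp(theta[0] * f(x)[0] + theta[1] * f(x)[1]) for x in samplespace}
    total = sum(w.values())
    return [sum(wx * f(x)[i] for x, wx in w.items()) / total for i in range(2)]

# Gradient descent on the sample-based dual: gradient = estimated E_theta[f] - K.
theta = [0.0, 0.0]
for _ in range(500):
    est = sample_expectations(theta)
    theta = [t - (e - k) for t, e, k in zip(theta, est, K)]

est_err = [abs(e - k) for e, k in zip(sample_expectations(theta), K)]
true_err = [abs(e - k) for e, k in zip(exact_expectations(theta), K)]
print("estimated error:", est_err)   # essentially zero
print("true error:", true_err)       # small but typically nonzero (Monte Carlo error)
```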

In [20]:
# Output the fitted parameters
print("Fitted model parameters are:")
bigmodel.params

Fitted model parameters are:


array([ 8.28032816e-12, -5.14381304e-01,  4.78186108e-01])

We can also use the discrete model fitted above to evaluate these parameters exactly, since its sample space is small enough to enumerate:

In [22]:
smallmodel.setparams(bigmodel.params)
p = smallmodel.probdist()

In [23]:
print("\nFitted distribution is:")
for j, x in enumerate(samplespace):
    print(f"\tx = {x:15s}: p(x) = {p[j]:.4f}")


Fitted distribution is:
	x = dans           : p(x) = 0.1864
	x = en             : p(x) = 0.1155
	x = à              : p(x) = 0.3117
	x = au cours de    : p(x) = 0.1932
	x = pendant        : p(x) = 0.1932


In [24]:
# Now show how well the constraints are satisfied:
print()
print("Desired constraints:")
print("\tp['dans'] + p['en'] = 0.3")
print("\tp['dans'] + p['à']  = 0.5")
print()
print("Actual expectations under the fitted model:")
print(f"\tp['dans'] + p['en'] = {p[0] + p[1]:.4f}")
print(f"\tp['dans'] + p['à']  = {p[0] + p[2]:.4f}")


Desired constraints:
	p['dans'] + p['en'] = 0.3
	p['dans'] + p['à']  = 0.5

Actual expectations under the fitted model:
	p['dans'] + p['en'] = 0.3019
	p['dans'] + p['à']  = 0.4980


In [27]:
print("\nEstimated error in constraint satisfaction (should be close to 0):\n",
      abs(bigmodel.expectations() - X))
print("\nTrue error in constraint satisfaction (should be close to 0):\n",
      abs(smallmodel.expectations() - X))


Estimated error in constraint satisfaction (should be close to 0):
 [[8.10462808e-14 1.72560798e-08 1.16287774e-08]]

True error in constraint satisfaction (should be close to 0):
 [[0.         0.00187074 0.00195854]]
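The estimated error is essentially zero because the simulated fit reproduces the target expectations on its own sample; the true error of about 0.002 most likely reflects Monte Carlo noise in that sample. The sketch below gauges how large that noise should be, using the standard delta-method standard error of a self-normalized importance-sampling estimate. It assumes $10^5$ uniform draws, matching `size=10**5` above, and reuses the parameters for $f_1$ and $f_2$ printed by the simulated fit; the function name `snis_with_se` is hypothetical. By our calculation the standard errors come out at roughly 0.0015 each, so true errors near 0.002 are about what one should expect from a sample of this size.

```python
import math
import random

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']

def f(x):
    """(f1(x), f2(x)) indicator features."""
    return (float(x in ('dans', 'en')), float(x in ('dans', 'à')))

def snis_with_se(theta, n=10**5, seed=0):
    """Self-normalized IS estimates of E_theta[f_i] with delta-method standard errors.

    Draws come from a uniform instrumental distribution q; because q is constant,
    log q cancels in the normalized weights. For each feature,
    se_i = sqrt(sum_j w_j**2 * (f_i(x_j) - mu_i)**2) / sum_j w_j."""
    rng = random.Random(seed)
    fx = [f(rng.choice(samplespace)) for _ in range(n)]
    w = [math.exp(theta[0] * a + theta[1] * b) for a, b in fx]
    total = sum(w)
    results = []
    for i in range(2):
        mu = sum(wj * v[i] for wj, v in zip(w, fx)) / total
        var = sum((wj * (v[i] - mu)) ** 2 for wj, v in zip(w, fx))
        results.append((mu, math.sqrt(var) / total))
    return results

# Parameters for f1 and f2 copied from the simulated fit above.
res = snis_with_se([-0.514381304, 0.478186108])
for mu, se in res:
    print(f"estimate {mu:.4f}, standard error {se:.4f}")
```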
