# Fitting maximum entropy models - Berger machine translation example

## Example use of the maximum entropy module:

This machine translation example (English to French) is taken from the
paper 'A maximum entropy approach to natural language processing' by
Berger et al., 1996.

Consider the translation of the English word 'in' into French.  Suppose
we observe the following facts in a corpus of parallel texts:

    (1)    p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
    (2)    p(dans) + p(en) = 3/10
    (3)    p(dans) + p(à)  = 1/2

This code finds the probability distribution with maximal entropy
subject to these constraints.
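
As an independent cross-check that does not rely on the maxentropy package, the same problem can be solved with plain NumPy. This is a minimal sketch, assuming the standard result that the maximum entropy solution has the exponential form p(x) proportional to exp(theta . f(x)) and that theta minimizes the dual log Z(theta) - theta . K. The names `F`, `K`, `theta`, and `fitted_dist` are illustrative and not part of the package. Constraint (1) is enforced by the normalization, so only constraints (2) and (3) need features here.

```python
import numpy as np

# Indicator features for constraints (2) and (3), one column per outcome:
# dans, en, à, au cours de, pendant
F = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],   # 'dans' or 'en'
              [1.0, 0.0, 1.0, 0.0, 0.0]])  # 'dans' or 'à'
K = np.array([0.3, 0.5])                   # target expectations

def fitted_dist(theta):
    """Return p(x) proportional to exp(theta . f(x))."""
    logits = theta @ F
    w = np.exp(logits - logits.max())      # subtract max for numerical stability
    return w / w.sum()

theta = np.zeros(2)                        # theta = 0 gives the uniform distribution
for _ in range(50):
    p = fitted_dist(theta)
    mean = F @ p                           # feature expectations under p
    grad = mean - K                        # gradient of the dual
    if np.linalg.norm(grad) < 1e-12:
        break
    # The Hessian of the dual is the covariance matrix of the features under p
    hess = (F * p) @ F.T - np.outer(mean, mean)
    theta = theta - np.linalg.solve(hess, grad)   # Newton step

p = fitted_dist(theta)
print(np.round(p, 4))
```

Plain Newton steps are used here only because the problem is tiny and well conditioned; the package itself uses a different optimizer. The resulting probabilities should agree with the distribution fitted by the package further down.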


In [1]:
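import maxentropy.skmaxent

# U+00E0 is 'à'; the escape keeps the source ASCII-only
a_grave = '\u00e0'

# The five candidate French translations of 'in'
samplespace = ['dans', 'en', a_grave, 'au cours de', 'pendant']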



In [2]:
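# Each feature is an indicator function on the sample space.

def f0(x):
    # True for every candidate; its expectation encodes constraint (1)
    return x in samplespace

def f1(x):
    # Constraint (2): 'dans' or 'en'
    return x == 'dans' or x == 'en'

def f2(x):
    # Constraint (3): 'dans' or 'à'
    return x == 'dans' or x == a_grave

f = [f0, f1, f2]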

In [3]:
model = maxentropy.skmaxent.MinDivergenceModel(f, samplespace, vectorized=False, verbose=True)

Computing feature 0 of 3 ...
Computing feature 1 of 3 ...
Computing feature 2 of 3 ...


In [4]:
# Now set the desired feature expectations, in the same order as f:
# E[f0] = 1 (normalization), E[f1] = 3/10, E[f2] = 1/2
K = [1.0, 0.3, 0.5]

In [5]:
import numpy as np

In [6]:
np.array(K, ndmin=2)

array([[1. , 0.3, 0.5]])

In [7]:
# Fit the model
model.fit(np.array(K, ndmin=2))

Grad eval #0
  norm of gradient = 0.1414213562373095
Function eval # 0
  dual is  1.6094379124341003
Function eval # 1
  dual is  1.5914375789899213
Grad eval #1
  norm of gradient = 0.11314687047891211
Function eval # 2
  dual is  1.559227944266887
Grad eval #2
  norm of gradient = 0.006971629366968201
Iteration # 0
Function eval # 3
  dual is  1.559227944266887
Function eval # 4
  dual is  1.5591856277233611
Grad eval #3
  norm of gradient = 0.005172758674855188
Function eval # 5
  dual is  1.5591338601935893
Grad eval #4
  norm of gradient = 0.0008214255815072707
Iteration # 1
Function eval # 6
  dual is  1.5591338601935893
Function eval # 7
  dual is  1.5591332526793276
Grad eval #5
  norm of gradient = 0.0006576606967423743
Function eval # 8
  dual is  1.5591321673085197
Grad eval #6
  norm of gradient = 3.6663277937352617e-06
Iteration # 2
Function eval # 9
  dual is  1.5591321673085197
Function eval # 10
  dual is  1.5591321672964076
Grad eval #7
  norm of gradient = 2.942583870

MinDivergenceModel(algorithm='CG',
                   features=[<function f0 at 0x11c0231e0>,
                             <function f1 at 0x11c023268>,
                             <function f2 at 0x11c0232f0>],
                   matrix_format='csr_matrix', prior_log_probs=None,
                   samplespace=['dans', 'en', 'à', 'au cours de', 'pendant'],
                   vectorized=False, verbose=True)

In [14]:
model.F.todense()

matrix([[1., 1., 1., 1., 1.],
        [1., 1., 0., 0., 0.],
        [1., 0., 1., 0., 0.]])
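
Each row of this matrix is one feature function evaluated at every point of the sample space. A minimal sketch of how such a matrix could be built by hand with NumPy follows. It re-declares the sample space and features so that it runs on its own, and `feature_matrix` is an illustrative name, not the package's internal routine.

```python
import numpy as np

a_grave = '\u00e0'
samplespace = ['dans', 'en', a_grave, 'au cours de', 'pendant']

features = [
    lambda x: x in samplespace,          # f0: true for every candidate
    lambda x: x in ('dans', 'en'),       # f1
    lambda x: x in ('dans', a_grave),    # f2
]

# Entry (i, j) is feature i evaluated at sample point j
feature_matrix = np.array([[float(f(x)) for x in samplespace]
                           for f in features])
print(feature_matrix)

# Feature expectations under the uniform distribution over the five words
uniform_expectations = feature_matrix @ np.full(len(samplespace), 0.2)
print(uniform_expectations)
```

Under the uniform distribution the expectations are 1.0, 0.4 and 0.4, so the fit has to move probability mass to reach the targets 0.3 and 0.5.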

In [15]:
# Output the fitted parameters
print("\nFitted model parameters are:\n" + str(model.params))


Fitted model parameters are:
[ 6.88841437e-16 -5.24869398e-01  4.87527722e-01]
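
To see how these parameters determine the distribution, recall that a maximum entropy model has the form p(x) proportional to exp(theta . f(x)). The sketch below applies that formula to the parameter values printed above (copied in by hand) and to the feature matrix shown earlier. The names `theta`, `F`, and `p_manual` are illustrative, and the snippet assumes no prior distribution, matching `prior_log_probs=None` in the fitted model.

```python
import numpy as np

# Parameter values copied from the printed output above
theta = np.array([6.88841437e-16, -5.24869398e-01, 4.87527722e-01])

# Feature matrix: rows are f0, f1, f2; columns are the five candidate words
F = np.array([[1., 1., 1., 1., 1.],
              [1., 1., 0., 0., 0.],
              [1., 0., 1., 0., 0.]])

log_weights = theta @ F                # theta . f(x) for each x
weights = np.exp(log_weights)
p_manual = weights / weights.sum()     # normalize so the probabilities sum to 1
print(p_manual)
```

The first parameter is numerically zero because f0 is constant on the sample space: it scales every weight by the same factor, which cancels in the normalization.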


In [16]:
print("\nFitted distribution is:")
p = model.probdist()
for x, px in zip(model.samplespace, p):
    print("\tx = %-15s p(x) = %s" % (x + ":", px))


Fitted distribution is:
	x = dans:           p(x) = 0.1858571548328559
	x = en:             p(x) = 0.11414284287637219
	x = à:              p(x) = 0.314142841693927
	x = au cours de:    p(x) = 0.1929285802984224
	x = pendant:        p(x) = 0.1929285802984224
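
One way to quantify this result is to compute the Shannon entropy of the fitted distribution and compare it with that of the uniform distribution over the five words. The sketch below uses the probabilities printed above, copied in by hand; `entropy` is an illustrative helper, not a function of the package.

```python
import numpy as np

# Probabilities copied from the printed output above
p_fitted = np.array([0.1858571548328559, 0.11414284287637219,
                     0.314142841693927, 0.1929285802984224,
                     0.1929285802984224])

def entropy(p):
    """Shannon entropy in nats: -sum p log p (assumes all p > 0)."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p)))

h_fitted = entropy(p_fitted)
h_uniform = entropy(np.full(5, 0.2))   # log(5), the maximum without constraints (2), (3)
print(h_fitted, h_uniform)
```

The constraints can only lower the entropy below log 5 (about 1.6094). The fitted value should come out close to 1.5591, which is also where the dual values in the optimizer log settle.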


In [17]:
# Now show how well the constraints are satisfied:
print()
print("Desired constraints:")
print("\tp['dans'] + p['en'] = 0.3")
print("\tp['dans'] + p['à']  = 0.5")
print()
print("Actual expectations under the fitted model:")
print("\tp['dans'] + p['en'] =", p[0] + p[1])
print("\tp['dans'] + p['à']  =", p[0] + p[2])


Desired constraints:
	p['dans'] + p['en'] = 0.3
	p['dans'] + p['à']  = 0.5

Actual expectations under the fitted model:
	p['dans'] + p['en'] = 0.2999999977092281
	p['dans'] + p['à']  = 0.49999999652678284


In [18]:
import numpy as np
# Check that the fitted feature expectations match the targets K
np.allclose(model.expectations(), K)

True