# Fitting maximum entropy models - Berger machine translation example

Copyright (c) 2003-2017 Ed Schofield

## Example use of the maximum entropy module

This is the machine translation example (English to French) from the
paper 'A maximum entropy approach to natural language processing' by
Berger et al., 1996.

Consider the translation of the English word 'in' into French. Suppose
a corpus of parallel texts shows the following facts, where p(x) is the
probability that 'in' is translated as x:

    (1)    p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
    (2)    p(dans) + p(en) = 3/10
    (3)    p(dans) + p(à)  = 1/2

This code finds the probability distribution with maximal entropy
subject to these constraints.
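
As a cross-check that does not depend on the maxentropy package, the same problem can be sketched in plain Python. The sketch assumes the standard result that the maximum entropy solution has the exponential form p(x) ∝ exp(θ1·f1(x) + θ2·f2(x)), and it finds θ by plain gradient descent on the dual. The names `distribution`, `expectations`, the step size of 1.0 and the iteration count are illustrative choices, not part of the library.

```python
import math

samplespace = ['dans', 'en', '\u00e0', 'au cours de', 'pendant']

# Indicator features for constraints (2) and (3). Constraint (1) is
# enforced by normalizing, so it needs no parameter of its own here.
features = [
    lambda x: x in ('dans', 'en'),
    lambda x: x in ('dans', '\u00e0'),
]
targets = [0.3, 0.5]

def distribution(theta):
    """Return p(x) proportional to exp(sum_i theta_i * f_i(x))."""
    weights = [math.exp(sum(t * f(x) for t, f in zip(theta, features)))
               for x in samplespace]
    total = sum(weights)
    return [w / total for w in weights]

def expectations(p):
    """Return the expected value of each feature under p."""
    return [sum(px * f(x) for px, x in zip(p, samplespace))
            for f in features]

# Gradient of the dual is E_p[f] - target; step size 1.0 is an assumption
# that works here because the features are 0/1 indicators.
theta = [0.0, 0.0]
for _ in range(2000):
    grad = [e - k for e, k in zip(expectations(distribution(theta)), targets)]
    theta = [t - g for t, g in zip(theta, grad)]

p = distribution(theta)
for x, px in zip(samplespace, p):
    print("%-12s %.4f" % (x, px))
```

Because neither constraint distinguishes 'au cours de' from 'pendant', the two receive equal probability, which is what maximizing entropy would lead one to expect.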


In [22]:
import maxentropy.skmaxent

a_grave = u'\u00e0'

samplespace = ['dans', 'en', a_grave, 'au cours de', 'pendant']



In [23]:
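def f0(x):
    # Constraint (1): true for every word in the sample space
    return x in samplespace

def f1(x):
    # Constraint (2): 'dans' or 'en'
    return x=='dans' or x=='en'

def f2(x):
    # Constraint (3): 'dans' or 'à'
    return x=='dans' or x==a_grave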

f = [f0, f1, f2]

In [24]:
model = maxentropy.skmaxent.MinDivergenceModel(f, samplespace, vectorized=False, verbose=True)

Computing feature 0 of 3 ...
Computing feature 1 of 3 ...
Computing feature 2 of 3 ...


In [44]:
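# Now set the desired feature expectations: the right-hand sides of
# constraints (1), (2) and (3).
# Note: the recorded outputs below come from a run with 0.31111 as the
# second entry, so they show 0.31111 rather than 0.3.
K = [1.0, 0.3, 0.5]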

In [36]:
import numpy as np

In [37]:
np.array(K, ndmin=2)

array([[ 1.     ,  0.31111,  0.5    ]])

In [38]:
# Fit the model
model.fit(np.array(K, ndmin=2))

Grad eval #11
  norm of gradient = 0.011109957106591984
Function eval # 16
  dual is  1.56496346393
Function eval # 17
  dual is  1.56485301222
Grad eval #12
  norm of gradient = 0.008780776567546288
Function eval # 18
  dual is  1.56467162285
Grad eval #13
  norm of gradient = 0.0019147257193012923
Iteration # 5
Function eval # 19
  dual is  1.56467162285
Function eval # 20
  dual is  1.56466844082
Grad eval #14
  norm of gradient = 0.0014494133675170145
Function eval # 21
  dual is  1.56466424222
Grad eval #15
  norm of gradient = 9.538409218778289e-05
Iteration # 6
Function eval # 22
  dual is  1.56466424222
Function eval # 23
  dual is  1.56466423416
Grad eval #16
  norm of gradient = 7.377016865671103e-05
Function eval # 24
  dual is  1.56466422224
Grad eval #17
  norm of gradient = 1.6959382098134323e-05
Iteration # 7
Function eval # 25
  dual is  1.56466422224
Function eval # 26
  dual is  1.56466422199
Grad eval #18
  norm of gradient = 1.3067243115138425e-05
Function eval # 27

MinDivergenceModel(algorithm='CG',
          features=[<function f0 at 0x11719aea0>, <function f1 at 0x11719a158>, <function f2 at 0x11719a048>],
          matrix_format='csr_matrix', prior_log_probs=None,
          samplespace=['dans', 'en', 'à', 'au cours de', 'pendant'],
          vectorized=False, verbose=True)

In [39]:
model.F.todense()

matrix([[ 1.,  1.,  1.,  1.,  1.],
        [ 1.,  1.,  0.,  0.,  0.],
        [ 1.,  0.,  1.,  0.,  0.]])
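
Each row of this matrix is one feature function evaluated at every point of the sample space. A minimal sketch that rebuilds it by hand, without the package, and then uses it to compute feature expectations under a uniform distribution (the names `F`, `uniform` and `uniform_expectations` are chosen for illustration):

```python
samplespace = ['dans', 'en', '\u00e0', 'au cours de', 'pendant']

features = [
    lambda x: x in samplespace,          # f0
    lambda x: x in ('dans', 'en'),       # f1
    lambda x: x in ('dans', '\u00e0'),   # f2
]

# F[i][j] = f_i(x_j): one row per feature, one column per word.
F = [[float(f(x)) for x in samplespace] for f in features]
for row in F:
    print(row)

# The expectation of each feature under a distribution p is the product F p.
uniform = [1.0 / len(samplespace)] * len(samplespace)
uniform_expectations = [sum(fij * pj for fij, pj in zip(row, uniform))
                        for row in F]
print(uniform_expectations)
```

Under the uniform distribution the second and third expectations both come out near 0.4, so the uniform distribution does not satisfy the constraints, which is why the parameters have to be fitted.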

In [40]:
# Output the distribution
print("\nFitted model parameters are:\n" + str(model.params))


Fitted model parameters are:
[ -8.78685714e-16  -4.71177354e-01   4.79745017e-01]
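
These parameters determine the distribution. Assuming the model has the usual exponential form with a uniform prior (the model was created with `prior_log_probs=None`), p(x) is proportional to exp(θ·f(x)). The sketch below applies that formula to the parameter values printed above; the dictionary `feature_values` is written out by hand for illustration.

```python
import math

# Values copied from the printed output above.
params = [-8.78685714e-16, -4.71177354e-01, 4.79745017e-01]

# (f0(x), f1(x), f2(x)) for each word x in the sample space.
feature_values = {
    'dans':        (1, 1, 1),
    'en':          (1, 1, 0),
    '\u00e0':      (1, 0, 1),
    'au cours de': (1, 0, 0),
    'pendant':     (1, 0, 0),
}

# Unnormalized weights exp(theta . f(x)), then normalize to sum to 1.
weights = {x: math.exp(sum(t * v for t, v in zip(params, fx)))
           for x, fx in feature_values.items()}
total = sum(weights.values())
p = {x: w / total for x, w in weights.items()}

for x, px in p.items():
    print("%-12s %.6f" % (x, px))
```

The first parameter is numerically zero because f0 equals 1 everywhere, so it only rescales every weight by the same factor and cancels in the normalization.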


In [41]:
print("\nFitted distribution is:")
p = model.probdist()
for x, px in zip(model.samplespace, p):
    print("\tx = %-15s p(x) = %s" % (x + ":", px))


Fitted distribution is:
	x = dans:           p(x) = 0.192168808772
	x = en:             p(x) = 0.118941191424
	x = à:              p(x) = 0.307831191302
	x = au cours de:    p(x) = 0.190529404251
	x = pendant:        p(x) = 0.190529404251
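
To illustrate what "maximal entropy" means here, the sketch below computes the entropy (in nats) of the fitted distribution printed above and compares it with a second distribution that satisfies the same constraints. The second distribution is a hypothetical one made by moving 0.1 of probability from 'pendant' to 'au cours de', which leaves all three constraint sums unchanged.

```python
import math

# Probabilities copied from the printed output above, in sample space order:
# dans, en, à, au cours de, pendant.
fitted = [0.192168808772, 0.118941191424, 0.307831191302,
          0.190529404251, 0.190529404251]

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(px * math.log(px) for px in p if px > 0)

# Shift mass between the two words that no constraint tells apart.
shifted = fitted[:3] + [fitted[3] + 0.1, fitted[4] - 0.1]

print("entropy of fitted distribution: ", entropy(fitted))
print("entropy of shifted distribution:", entropy(shifted))
```

The fitted distribution has the higher entropy of the two. Its entropy is about 1.5647, which should agree with the dual values the optimizer converges to in the verbose log, since the dual and the entropy coincide at the optimum.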


In [42]:
# Now show how well the constraints are satisfied:
print()
print("Desired constraints:")
print("\tp['dans'] + p['en'] = 0.3")
print("\tp['dans'] + p['à']  = 0.5")
print()
print("Actual expectations under the fitted model:")
print("\tp['dans'] + p['en'] =", p[0] + p[1])
print("\tp['dans'] + p['à']  = " + str(p[0]+p[2]))
# (Or substitute "x.encode('latin-1')" if you have a primitive terminal.)


Desired constraints:
	p['dans'] + p['en'] = 0.3
	p['dans'] + p['à']  = 0.5

Actual expectations under the fitted model:
	p['dans'] + p['en'] = 0.311110000196
	p['dans'] + p['à']  = 0.500000000074


In [43]:
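import numpy as np
# Check that the model's feature expectations match the targets K
# to within numpy's default tolerances
np.allclose(model.expectations(), K)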

True