# Fitting maximum entropy models - Berger machine translation example

Copyright (c) 2003-2017 Ed Schofield

## Example use of the maximum entropy module

This machine translation example (English to French) is taken from the
paper 'A maximum entropy approach to natural language processing' by
Berger et al., 1996.

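Consider the translation of the English word 'in' into French.  Suppose
a corpus of parallel texts shows that the translator always chooses one
of five French phrases, chooses 'dans' or 'en' 30% of the time, and
chooses 'dans' or 'à' half of the time: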

    (1)    p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
    (2)    p(dans) + p(en) = 3/10
    (3)    p(dans) + p(à)  = 1/2

This code finds the probability distribution with maximal entropy
subject to these constraints.
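
As a sanity check on what the fit should return, this small problem can also be solved by hand. The following is a minimal sketch and not part of the maxentropy package. It assumes we parametrize by d = p(dans), so that constraints (2) and (3) give p(en) = 0.3 - d and p(à) = 0.5 - d, and that the remaining mass 0.2 + d is split equally between 'au cours de' and 'pendant', since no constraint distinguishes them. Setting the derivative of the entropy with respect to d to zero gives (0.3 - d)(0.5 - d) = d(0.2 + d)/2, which simplifies to d² - 1.8d + 0.3 = 0.

```python
import math

# Root of d**2 - 1.8*d + 0.3 = 0 that lies in the feasible range 0 < d < 0.3
d = (1.8 - math.sqrt(1.8**2 - 4 * 0.3)) / 2

# Hypothetical variable name; the order is dans, en, à, au cours de, pendant
p_closed_form = [d, 0.3 - d, 0.5 - d, (0.2 + d) / 2, (0.2 + d) / 2]

for prob in p_closed_form:
    print(round(prob, 6))
```

These values should agree, to within the optimizer's tolerance, with the fitted distribution reported further down in this notebook.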


In [12]:
import maxentropy

a_grave = u'\u00e0'

samplespace = ['dans', 'en', a_grave, 'au cours de', 'pendant']

model = maxentropy.Model(samplespace)
model.verbose = True

In [13]:
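def f0(x):
    # Constraint (1): true for every point of the sample space
    return x in samplespace

def f1(x):
    # Constraint (2): indicator of 'dans' or 'en'
    return x == 'dans' or x == 'en'

def f2(x):
    # Constraint (3): indicator of 'dans' or 'à'
    return x == 'dans' or x == a_grave

f = [f0, f1, f2]

# Now set the desired feature expectations E[f0], E[f1], E[f2]
K = [1.0, 0.3, 0.5]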

In [14]:
# Fit the model
model.fit(f, K)

Grad eval #0
  norm of gradient = 0.1414213562373095
Function eval # 0
  dual is  1.60943791243
Function eval # 1
  dual is  1.59143757899
Grad eval #1
  norm of gradient = 0.11314687047891211
Function eval # 2
  dual is  1.55922794427
Grad eval #2
  norm of gradient = 0.006971629366968201
Iteration # 0
Function eval # 3
  dual is  1.55922794427
Function eval # 4
  dual is  1.55918562772
Grad eval #3
  norm of gradient = 0.005172758674855188
Function eval # 5
  dual is  1.55913386019
Grad eval #4
  norm of gradient = 0.0008214255815072707
Iteration # 1
Function eval # 6
  dual is  1.55913386019
Function eval # 7
  dual is  1.55913325268
Grad eval #5
  norm of gradient = 0.0006576606967423743
Function eval # 8
  dual is  1.55913216731
Grad eval #6
  norm of gradient = 3.6663277937352617e-06
Iteration # 2
Function eval # 9
  dual is  1.55913216731
Function eval # 10
  dual is  1.5591321673
Grad eval #7
  norm of gradient = 2.942583870445164e-06
Function eval # 11
  dual is  1.55913216727
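
The log above traces the minimization of a dual objective that starts at log 5 ≈ 1.6094 (all parameters zero, so the model is uniform) with a gradient norm of about 0.1414, and settles near 1.5591. As a rough illustration of the quantity being minimized, here is a minimal pure-Python sketch that uses plain gradient descent. This is an assumption made for illustration only: the optimizer that maxentropy actually uses is not shown in this notebook, and the names `dual_and_gradient` and `step`, as well as the iteration count, are hypothetical.

```python
import math

samplespace = ['dans', 'en', '\u00e0', 'au cours de', 'pendant']
features = [
    lambda x: 1.0,                             # constraint (1)
    lambda x: float(x in ('dans', 'en')),      # constraint (2)
    lambda x: float(x in ('dans', '\u00e0')),  # constraint (3)
]
K = [1.0, 0.3, 0.5]

def dual_and_gradient(params):
    """Return (dual, gradient, p) for p(x) proportional to exp(params . f(x))."""
    scores = [sum(l * f(x) for l, f in zip(params, features)) for x in samplespace]
    Z = sum(math.exp(s) for s in scores)
    p = [math.exp(s) / Z for s in scores]
    expectations = [sum(px * f(x) for px, x in zip(p, samplespace)) for f in features]
    dual = math.log(Z) - sum(l * k for l, k in zip(params, K))
    grad = [e - k for e, k in zip(expectations, K)]  # E_p[f] - K
    return dual, grad, p

params = [0.0, 0.0, 0.0]
step = 1.0  # hypothetical fixed step size
for _ in range(500):
    dual, grad, p = dual_and_gradient(params)
    params = [l - step * g for l, g in zip(params, grad)]

dual, grad, p = dual_and_gradient(params)
print(dual)
print(params)
```

The gradient is the gap between the model's feature expectations and the targets in `K`, so driving it to zero is the same as satisfying the constraints.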

In [16]:
model.F.todense()

matrix([[ 1.,  1.,  1.,  1.,  1.],
        [ 1.,  1.,  0.,  0.,  0.],
        [ 1.,  0.,  1.,  0.,  0.]])
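
Each row of `model.F` holds one feature function evaluated at every point of the sample space, in the order f0, f1, f2. A minimal sketch of how such a matrix could be assembled by hand follows. It uses plain nested lists, whereas the `.todense()` call above suggests the library stores a sparse matrix; `F_by_hand` and `expectations` are hypothetical names.

```python
samplespace = ['dans', 'en', '\u00e0', 'au cours de', 'pendant']

def f0(x):
    return x in samplespace

def f1(x):
    return x == 'dans' or x == 'en'

def f2(x):
    return x == 'dans' or x == '\u00e0'

# Row i, column j is feature i evaluated at sample point j
F_by_hand = [[float(f(x)) for x in samplespace] for f in (f0, f1, f2)]
for row in F_by_hand:
    print(row)

# Feature expectations under a distribution p are the matrix-vector product F p;
# here p is the uniform distribution over the five phrases
uniform = [1 / len(samplespace)] * len(samplespace)
expectations = [sum(Fij * pj for Fij, pj in zip(row, uniform)) for row in F_by_hand]
print(expectations)
```

Under the uniform distribution the second and third expectations come out near 0.4 rather than the targets 0.3 and 0.5, which is why fitting is needed.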

In [17]:
# Output the distribution
print("\nFitted model parameters are:\n" + str(model.params))


Fitted model parameters are:
[  6.49522780e-16  -5.24869186e-01   4.87527740e-01]
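
The fitted parameters determine the distribution through the exponential form in which p(x) is proportional to exp(sum_i params[i] * f_i(x)), the standard form of a maximum entropy solution under expectation constraints. As a hedged illustration, the sketch below plugs in the parameter values printed above (copied by hand, so accurate only to the printed digits) and recovers the probabilities without calling the library. `params_printed` and `p_from_params` are hypothetical names.

```python
import math

# Feature matrix rows f0, f1, f2 over ['dans', 'en', 'à', 'au cours de', 'pendant']
F = [[1, 1, 1, 1, 1],
     [1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0]]

# Values copied from the printed output above
params_printed = [6.49522780e-16, -5.24869186e-01, 4.87527740e-01]

# Unnormalized log-probabilities: column j of F dotted with the parameters
scores = [sum(l * row[j] for l, row in zip(params_printed, F)) for j in range(5)]
Z = sum(math.exp(s) for s in scores)  # normalization constant
p_from_params = [math.exp(s) / Z for s in scores]

for prob in p_from_params:
    print(round(prob, 6))
```

The first parameter is numerically zero because normalization already enforces constraint (1).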


In [18]:
print("\nFitted distribution is:")
p = model.probdist()
for x, px in zip(model.samplespace, p):
    print("\tx = %-15s p(x) = %s" % (x + ":", px))


Fitted distribution is:
	x = dans:           p(x) = 0.185857184119
	x = en:             p(x) = 0.114142858774
	x = à:              p(x) = 0.314142824584
	x = au cours de:    p(x) = 0.192928566261
	x = pendant:        p(x) = 0.192928566261


In [10]:
# Now show how well the constraints are satisfied:
print()
print("Desired constraints:")
print("\tp['dans'] + p['en'] = 0.3")
print("\tp['dans'] + p['à']  = 0.5")
print()
print("Actual expectations under the fitted model:")
print("\tp['dans'] + p['en'] =", p[0] + p[1])
print("\tp['dans'] + p['à']  =", p[0] + p[2])
# (If your terminal cannot display non-ASCII characters such as 'à',
# print an encoded form of the strings instead, e.g. via .encode('latin-1').)


Desired constraints:
	p['dans'] + p['en'] = 0.3
	p['dans'] + p['à']  = 0.5

Actual expectations under the fitted model:
	p['dans'] + p['en'] = 0.300000042893
	p['dans'] + p['à']  = 0.500000008703
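
Satisfying the constraints is only half of the claim: the fitted distribution should also have higher entropy than any other distribution that satisfies them. A small check follows, using the fitted probabilities copied from the output above and one arbitrarily chosen alternative with p(dans) = 0.1, an assumption made purely for illustration.

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(px * math.log(px) for px in p if px > 0)

# Fitted probabilities copied from the printed output above
p_fitted = [0.185857184119, 0.114142858774, 0.314142824584,
            0.192928566261, 0.192928566261]

# Another distribution satisfying constraints (1)-(3), with p(dans) = 0.1
p_other = [0.1, 0.2, 0.4, 0.15, 0.15]

print(entropy(p_fitted))
print(entropy(p_other))
```

The first value should agree with the final dual value in the optimization log (about 1.5591), and the second should be smaller.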
