joint_sequential_SATs.py
'''
A joint, additive model for speech acts and topic codes
(and transitions). See the paper:

    "A Generative Joint, Additive, Sequential Model of Topics and
    Speech Acts in Patient-Doctor Communication".
    Byron C. Wallace, Thomas A. Trikalinos, M. Barton Laws,
    Ira B. Wilson and Eugene Charniak. EMNLP 2013.

for a discussion of the model. Parameter inference is done via Newton
optimization and is based largely on the method outlined by
Eisenstein et al. (ICML 2011); however, we ignore the variance
component (\tau) here.

Unfortunately, this implementation is rather tightly coupled to
transcripts.py, in the sense that it relies on it for various counts
over the data, but it should be possible to modify it
to handle other data sources. Moreover, we have not been able to
secure IRB approval to release the actual data :(.

Another note: this code (in addition to being rather coupled
to the task of patient-doctor communication) is extremely verbose;
I have avoided all attempts to be clever, favoring explicitness.
This means, however, that there is a lot of redundancy in the code.

Questions, etc. should be sent to byron_wallace@brown.edu.
'''
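For orientation, the additive emission model described above forms each word distribution by exponentiating and renormalizing the sum of a background log-frequency vector (m) and the relevant additive components (the \eta's). A minimal standalone sketch of that computation (illustrative names and toy numbers; not part of this module):

```python
import numpy as np

def additive_softmax(m, *etas):
    """Normalized distribution proportional to exp(m + sum of eta offsets)."""
    logits = m + sum(etas)
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

m = np.log(np.array([0.5, 0.3, 0.2]))   # background (log) word frequencies
eta_topic = np.array([0.0, 1.0, 0.0])   # per-topic offset (toy values)
eta_sa = np.array([0.0, 0.0, 0.5])      # per-speech-act offset (toy values)
beta = additive_softmax(m, eta_topic, eta_sa)
# beta sums to 1; the second word is boosted relative to its background rate
```

Interaction terms are handled the same way: they are simply one more additive offset in the sum before renormalization.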
import math
import copy
import pdb

import numpy
import scipy
import scipy.sparse

import process_results

''' a few globals. '''
PRETTY_STR = "\n" + "".join(["-"]*30) + "\n"
THRESHOLD = .0001 # arbitrary but seems reasonable
ACC_THRESHOLD = .0025 # ditto

### step size for descent steps -- just make sure it's small.
emission_gamma = transition_gamma = .025
class JointSequential:

    def __init__(self, tnb, joint=True, topics_only=False, transition_interactions=True):
        '''
        tnb -- this is a JointModel instance; its name ("tnb") is
        due to obscure historical reasons ;). See the
        transcripts.py module for its definition.
        '''
        self.tnb = tnb
        self.iter = 0

        # this indicates whether or not we are using the
        # joint -- topic and speech act -- model.
        # if not, then if topics_only is True, we assume that
        # we are interested in modeling the topics (and topic
        # transitions)
        self.joint = joint
        self.topics_only = topics_only # only matters if not joint

        # setup \eta vectors for the topic and speech act probabilities;
        # also setup the \sigma's for transitions
        self.topic_etas = {}
        self.topic_proportions = {}
        self.speech_act_etas = {}
        self.speech_act_proportions = {}
        # emission pairs taking into consideration
        # topic and speech act
        self.topic_sa_interaction_etas = {}
        self.dimensions = tnb.get_dimensions()
        self.converged_topics = []
        self.converged_sas = []
        for topic in self.tnb.topic_set:
            self.topic_etas[topic] = numpy.zeros(self.dimensions)
        for sa in self.tnb.speech_act_set:
            self.speech_act_etas[sa] = numpy.zeros(self.dimensions)

        # for deltas in the likelihoods and accuracies
        self.prev_ll = -float("inf")
        self.diff_ll = float("inf")
        self.prev_acc_y = -float("inf")
        self.prev_acc_s = -float("inf")
        self.acc_diff_y = float("inf")
        self.acc_diff_s = float("inf")
        self.diff_F = float("inf")
        self.prev_F = -float("inf")

        # the \beta's reflect the 'adjusted' distributions
        # for the corresponding category.
        self.beta_topics = {}
        self.beta_sa = {}

        # pairs of topics/speech acts to adjusted transition
        # probabilities
        self.lambda_joint_y = {}
        self.lambda_joint_s = {}
        # univariate transition distributions (used when not joint)
        self.lambda_y_y = {}
        self.lambda_s_s = {}

        # components for transitions; analogous to the \etas.
        # the sigmas are the (exp) additive terms that perturb
        # the lambdas (transition probabilities)
        self.sigma_y_y = {} # topics to topics
        self.sigma_s_y = {} # speech acts to topics
        self.sigma_s_s = {} # speech acts to speech acts
        self.sigma_y_s = {} # topics to speech acts
        # interaction terms for *transitions*
        # these will map pairs to components
        self.sigma_interactions_y = {}
        self.sigma_interactions_s = {}
        self.transition_interactions = transition_interactions

        ### initialize sigmas
        for sa in self.tnb.get_speech_act_set(include_special_states=True):
            self.sigma_s_s[sa] = numpy.zeros(self.tnb.num_speech_acts)
            self.sigma_s_y[sa] = numpy.zeros(self.tnb.num_topics)
        for y in self.tnb.get_topic_set(include_special_states=True):
            self.sigma_y_y[y] = numpy.zeros(self.tnb.num_topics)
            self.sigma_y_s[y] = numpy.zeros(self.tnb.num_speech_acts)

        # we will only update pairs that occur more than
        # pair_freq_min times together
        pair_freq_min = 10
        self.frequent_pairs = []
        for pair in self.tnb.topic_sa_pairs:
            self.topic_sa_interaction_etas[pair] = numpy.zeros(self.dimensions)
            self.sigma_interactions_y[pair] = numpy.zeros(self.tnb.num_topics)
            self.sigma_interactions_s[pair] = numpy.zeros(self.tnb.num_speech_acts)
            if self.tnb.pair_counts[pair] > pair_freq_min and \
                    "STOP" not in pair and "START" not in pair:
                self.frequent_pairs.append(pair)
        print "modeling interactions for {0} pairs".format(len(self.frequent_pairs))

        # probabilities of words given topics, speech acts
        self.beta_joint = {}
        self.calc_beta_joint()
        self.calc_lambdas_joint()
        self.calc_topic_proportions()
        self.calc_speech_act_proportions()
    def calc_F_on_hold_out(self):
        '''
        calculates F-score on the hold-out portion
        of the *training* dataset -- this is *not*
        looking at the test set. this is just an easy
        way to monitor performance / decide when to stop
        optimization.
        '''
        topic_preds, sa_preds = \
            self.predict_set_sequential_joint(self.tnb.held_out_cases_X)
        return process_results.calc_metrics(
            self.tnb.held_out_cases_Y, self.tnb.held_out_cases_S,
            preds_Y=topic_preds, preds_S=sa_preds)["avg_F"]

    def estimate_params(self, max_iters=100):
        print "calculating initial F ..."
        init_F = self.calc_F_on_hold_out()
        print "initial F -- {0}".format(init_F)
        self.prev_F = init_F
        # stop when the F-score on the hold-out set no longer improves
        # (or when we hit max_iters)
        while self.diff_F > 0 and self.iter < max_iters:
            self.step()
            cur_F = self.calc_F_on_hold_out()
            self.diff_F = cur_F - self.prev_F
            print "current F (on hold-out): {0}; previous F: {1}; diff: {2}".\
                format(cur_F, self.prev_F, self.diff_F)
            self.prev_F = cur_F
            self.iter += 1

    def step(self):
        ''' one optimization step. '''
        print "{0} on iteration {1}".format(PRETTY_STR, self.iter)
        # note that the betas are updated within this
        # method
        self.update_topic_etas(joint=True)
        print "ok. updating (speech act) etas..."
        self.update_speech_act_etas(joint=True) # again, betas are updated here
        print "now updating topic/sa **interaction** etas..."
        self.update_topic_sa_interaction_etas()
        ### assuming joint case here.
        print "updating (joint) transition lambdas..."
        self.update_transition_lambdas_joint()
        print "now updating topic/sa **interaction** sigmas"
        self.update_interaction_transition_lambdas_y()
        self.update_interaction_transition_lambdas_s()
        ll = self.joint_sequential_log_likelihood()
        print "log-likelihood after this step: {0}".format(ll)
    def calc_topic_proportions(self):
        # note that we calc *log* probs
        N = float(len(self.tnb.Y))
        for topic in self.tnb.topic_set:
            p_topic = float(self.tnb.Y.count(topic)) / N
            print "p of topic {0}:{1}".format(topic, p_topic)
            self.topic_proportions[topic] = numpy.log(p_topic)
        print "(log) topic proportions calculated: {0}".\
            format(self.topic_proportions)

    def calc_speech_act_proportions(self):
        # again, we work on the log-scale
        N = float(len(self.tnb.S))
        for sa in self.tnb.speech_act_set:
            p_sa = float(self.tnb.S.count(sa)) / N
            print "p of speech act {0}:{1}".format(sa, p_sa)
            self.speech_act_proportions[sa] = numpy.log(p_sa)
        print "(log) speech act proportions calculated: {0}".\
            format(self.speech_act_proportions)

    def calc_lambdas(self):
        '''
        this is for the transitions in the *univariate* (not joint)
        case.
        '''
        if self.topics_only:
            for y in self.tnb.get_topic_set(include_special_states=True):
                self.lambda_y_y[y] = self._calc_lambda(self.sigma_y_y[y], topic_trans=True)
        else:
            for s in self.tnb.get_speech_act_set(include_special_states=True):
                self.lambda_s_s[s] = self._calc_lambda(self.sigma_s_s[s], topic_trans=False)

    def _get_topic_trans_z(self, sigma_k):
        z = 0.0
        # j ranges over the number of topics.
        for j in xrange(self.tnb.num_topics):
            z += numpy.exp(sigma_k[j] + self.tnb.pi_topic[j])
        return z

    def _get_sa_trans_z(self, sigma_k):
        z = 0.0
        # and here j ranges over the number of speech acts.
        for j in xrange(self.tnb.num_speech_acts):
            z += numpy.exp(sigma_k[j] + self.tnb.pi_sa[j])
        return z
    def calc_lambdas_joint(self):
        # the lambda for a given (topic, speech act) pair depends only
        # on the pair itself, so a single pass over the pairs suffices.
        for pair in self.tnb.topic_sa_pairs:
            self.lambda_joint_y[pair] = self._calc_lambda_joint(pair, to_topic_trans=True)
            self.lambda_joint_s[pair] = self._calc_lambda_joint(pair, to_topic_trans=False)

    def _calc_lambda_joint(self, pair, to_topic_trans=True):
        topic, sa = pair
        # \pi is the (log) background frequency
        pi = self.tnb.pi_topic if to_topic_trans else self.tnb.pi_sa
        # both of these are vectors of length |topics| (or |speech acts|)
        sigma_y, sigma_s = None, None
        sigma_interaction = None
        if to_topic_trans:
            sigma_y = self.sigma_y_y[topic]
            sigma_s = self.sigma_s_y[sa]
            sigma_interaction = self.sigma_interactions_y[pair]
        else:
            sigma_y = self.sigma_y_s[topic]
            sigma_s = self.sigma_s_s[sa]
            sigma_interaction = self.sigma_interactions_s[pair]
        lambda_joint = numpy.exp(pi + sigma_y + sigma_s + sigma_interaction)
        z = sum(lambda_joint)
        return lambda_joint/z
    def _calc_lambda(self, sigma_k, topic_trans=True):
        '''
        \lambda_k <- exp(\sigma_k + \pi) / \sum_j(exp(\sigma_kj + \pi_j))
        note that \pi will be either \pi_topic or \pi_sa, depending on which
        we are updating a component of
        '''
        # \pi is the (log) background frequency
        pi = self.tnb.pi_topic if topic_trans else self.tnb.pi_sa
        lambda_k = numpy.exp(sigma_k + pi)
        # normalize
        z = self._get_topic_trans_z(sigma_k) if topic_trans else self._get_sa_trans_z(sigma_k)
        if z == 0:
            # something is wrong -- this should never happen
            pdb.set_trace()
        return lambda_k/z

    def calc_beta_joint(self):
        '''
        word distribution adjusted for both topics and
        speech acts
        '''
        for topic in self.tnb.topic_set:
            topic_eta = self.topic_etas[topic]
            for sa in self.tnb.speech_act_set:
                sa_eta = self.speech_act_etas[sa]
                ### add the interaction term
                interaction_eta = self.topic_sa_interaction_etas[(topic, sa)]
                beta_t_sa = self._calc_beta(topic_eta + sa_eta + interaction_eta)
                self.beta_joint[(topic, sa)] = beta_t_sa
    def y_transition_prob_joint(self, y, y_prev, s_prev, z=None):
        '''
        probability of transitioning to y given that the previous
        topic was y_prev and the previous speech act was s_prev
        '''
        if z is None:
            z = self.z_for_joint_to_y(y_prev, s_prev)
        y_index = self.tnb.topics_to_indices[y]
        pi_y = self.tnb.pi_topic[y_index]
        trans_prob = numpy.exp(\
            pi_y +\
            self.sigma_y_y[y_prev][y_index] +\
            self.sigma_s_y[s_prev][y_index] +\
            self.sigma_interactions_y[(y_prev, s_prev)][y_index])
        trans_prob = trans_prob/z
        return trans_prob

    def s_transition_prob_joint(self, s, y_prev, s_prev, z=None):
        if z is None:
            z = self.z_for_joint_to_s(y_prev, s_prev)
        s_index = self.tnb.speech_acts_to_indices[s]
        pi_s = self.tnb.pi_sa[s_index]
        trans_prob = numpy.exp(\
            pi_s +\
            self.sigma_s_s[s_prev][s_index] +\
            self.sigma_y_s[y_prev][s_index] +\
            self.sigma_interactions_s[(y_prev, s_prev)][s_index])
        trans_prob = trans_prob/z
        return trans_prob

    def _calc_beta(self, eta_k):
        '''
        \beta_k <- exp(\eta_k + m) / \sum_i(exp(\eta_ki + m_i))
        '''
        beta_k = numpy.exp(eta_k + self.tnb.m)
        # renormalize
        z = 0.0
        for i in xrange(self.dimensions):
            z += numpy.exp(eta_k[i] + self.tnb.m[i])
        # for now we will leave vectors as 1xw, but note that
        # later we will assume that beta's are wx1 vectors,
        # rather than 1xw; hence you will need to transpose these, e.g.,
        # > numpy.mat(beta_k).T
        return beta_k / z

    ''' univariate '''
    def calc_topic_betas(self):
        for topic in self.tnb.topic_set:
            self.beta_topics[topic] = self._calc_beta(self.topic_etas[topic])

    def calc_speech_act_betas(self):
        for speech_act in self.tnb.speech_act_set:
            self.beta_sa[speech_act] = self._calc_beta(self.speech_act_etas[speech_act])
    def update_topic_sa_interaction_etas(self):
        print "updating ({0}) pairs".format(len(self.frequent_pairs))
        prev_ll = self.joint_sequential_log_likelihood()
        for pair in self.frequent_pairs:
            delta = self.get_delta_interaction_eta(pair)
            delta = numpy.array(delta.T)[0] # this is (w,)
            prev_interaction_eta = copy.copy(self.topic_sa_interaction_etas[pair])
            self.topic_sa_interaction_etas[pair] = \
                self.topic_sa_interaction_etas[pair] - (emission_gamma*delta)
            print "\nupdating interaction pair: {0}".format(pair)
            print "pair {0} delta: {1}".format(pair, delta)
            print "eta_interaction: {0}\n".format(self.topic_sa_interaction_etas[pair])
            self.calc_beta_joint()
            ll = self.joint_sequential_log_likelihood()
            print "ll after updating interaction {0}:{1}".format(pair, ll)
            diff_ll_for_pair = ll - prev_ll
            print "diff ll {0}".format(diff_ll_for_pair)
            if diff_ll_for_pair < THRESHOLD or math.isnan(ll):
                # 'reject' the update
                print "update rejected!"
                self.topic_sa_interaction_etas[pair] = prev_interaction_eta
            else:
                prev_ll = ll
            self.calc_beta_joint()
    def update_speech_act_etas(self, joint=True):
        '''
        update speech act component \eta_k according to:
            \eta_k^t <- \eta_k^(t-1) - \delta \eta_k
        '''
        if joint:
            prev_ll = self.joint_sequential_log_likelihood()
        else:
            prev_ll = self.speech_act_log_likelihood()
        speech_acts = [sa for sa in self.tnb.speech_act_set if sa not in self.converged_sas]
        for sa in speech_acts:
            delta = None
            if joint:
                delta = self.get_delta_speech_act_joint(sa)
            else:
                delta = self.get_delta_speech_act(sa)
            delta = numpy.array(delta.T)[0] # this is (w,)
            prev_etas_for_sa = copy.copy(self.speech_act_etas[sa])
            # update
            self.speech_act_etas[sa] = self.speech_act_etas[sa] - (emission_gamma*delta)
            print "\n\n\nupdating speech act: {0}".format(sa)
            print "speech act {0} delta: {1}".format(sa, delta)
            print "eta_sa: {0}\n".format(self.speech_act_etas[sa])
            ll = None
            if joint:
                # note that joint_sequential_log_likelihood does not
                # use the betas directly, but rather uses the \etas.
                ll = self.joint_sequential_log_likelihood()
            else:
                self.calc_speech_act_betas()
                ll = self.speech_act_log_likelihood()
            print "ll after updating speech act {0}:{1}".format(sa, ll)
            diff_ll_for_sa = ll - prev_ll
            print "diff ll {0}".format(diff_ll_for_sa)
            ## if the likelihood decreases (or increases negligibly)
            if diff_ll_for_sa < THRESHOLD or math.isnan(ll):
                print "speech act {0} has converged!".format(sa)
                self.speech_act_etas[sa] = prev_etas_for_sa
                # note that we don't update the prev_ll here.
            else:
                prev_ll = ll
            # update the \betas to account for new
            # speech act \etas
            if joint:
                self.calc_beta_joint()
            else:
                self.calc_speech_act_betas()

    def _contains_any(self, l, x):
        '''
        helper method; returns True iff any element of x is in l
        '''
        for x_i in x:
            if x_i in l:
                return True
        return False

    def _pairs_without_special_states(self, include_start=False):
        to_remove = ["STOP"]
        if not include_start:
            to_remove.append("START")
        return [pair for pair in self.tnb.topic_sa_pairs \
                    if not self._contains_any(pair, to_remove)]
    def calc_joint_Zs(self):
        joint_Zs = {}
        for y,s in self._pairs_without_special_states():
            joint_Zs[(y,s)] = self.get_joint_z(y, s)
        return joint_Zs

    def get_joint_z(self, y, s):
        return sum(numpy.exp(self.tnb.m + \
                    self.topic_etas[y] + \
                    self.speech_act_etas[s] +\
                    self.topic_sa_interaction_etas[(y,s)]))

    def g_eta_topic_joint(self, topic):
        '''
        partial derivative for the component corresponding to the
        given topic, taking into consideration the joint \betas.
        '''
        observed_topic_counts = self.tnb.get_c_topic(topic)
        expected = numpy.zeros(self.tnb.get_dimensions())
        # iterate (/marginalize) over speech acts
        for sa in self.tnb.speech_act_set:
            # observed word counts for this topic and speech act pair
            c_topic_sa = self.tnb.get_c_joint(topic, sa)
            C_topic_sa = sum(c_topic_sa)
            # joint beta reflecting word distribution for this topic/speech
            # act pair
            beta_t_sa = self.beta_joint[(topic, sa)]
            # expected word counts
            expected += C_topic_sa * beta_t_sa
        partial_deriv = observed_topic_counts - expected
        partial_deriv = numpy.matrix(partial_deriv).T
        return partial_deriv

    def g_eta_speech_act_joint(self, speech_act):
        '''
        partial derivative for the component corresponding to the
        given speech act, taking into consideration the joint \betas
        (i.e., taking into account the topics).
        '''
        observed_speech_act_word_counts = self.tnb.get_c_speech_act(speech_act)
        expected = numpy.zeros(self.tnb.get_dimensions())
        # iterate over topics
        for topic in self.tnb.topic_set:
            # observed word counts for this topic and speech act pair
            c_topic_sa = self.tnb.get_c_joint(topic, speech_act)
            C_topic_sa = sum(c_topic_sa)
            # joint beta reflecting word distribution for this topic/speech
            # act pair
            beta_t_sa = self.beta_joint[(topic, speech_act)]
            expected += C_topic_sa * beta_t_sa
        partial_deriv = observed_speech_act_word_counts - expected
        partial_deriv = numpy.matrix(partial_deriv).T
        return partial_deriv

    def g_eta_interaction(self, pair):
        '''
        partial derivative for the component corresponding to the
        given *interaction pair* (topic, speech act).
        '''
        observed_pair_word_counts = self.tnb.get_c_pair(pair)
        # the expected count is just the number of times we observed this
        # (topic, speech act) pair times the current beta
        num_times_pair_observed = sum(observed_pair_word_counts)
        expected = num_times_pair_observed * self.beta_joint[pair]
        partial_deriv = observed_pair_word_counts - expected
        partial_deriv = numpy.matrix(partial_deriv).T
        return partial_deriv
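The g_eta_* methods above all take the observed-minus-expected-counts form, which is the standard gradient of a multinomial log-likelihood under a log-linear parameterization. A standalone sketch checking this form against finite differences (illustrative names and toy numbers; not this module's code):

```python
import numpy as np

def log_lik(eta, m, counts):
    """Multinomial log-likelihood of word counts under beta proportional to exp(eta + m)."""
    logits = eta + m
    log_beta = logits - np.log(np.exp(logits).sum())
    return float(counts.dot(log_beta))

def grad(eta, m, counts):
    """Analytic gradient: observed counts minus expected counts (C * beta)."""
    logits = eta + m
    beta = np.exp(logits) / np.exp(logits).sum()
    return counts - counts.sum() * beta

m = np.log(np.array([0.4, 0.4, 0.2]))   # background log frequencies (toy)
eta = np.array([0.1, -0.2, 0.3])        # additive component (toy)
counts = np.array([5.0, 2.0, 3.0])      # observed word counts (toy)
eps = 1e-6
for i in range(3):
    e = np.zeros(3); e[i] = eps
    # central finite difference of the log-likelihood in coordinate i
    fd = (log_lik(eta + e, m, counts) - log_lik(eta - e, m, counts)) / (2 * eps)
    assert abs(fd - grad(eta, m, counts)[i]) < 1e-4
```

Because the expected counts sum to the observed total, this gradient always sums to zero over the vocabulary.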
    def update_topic_etas(self, joint=False):
        '''
        update topic eta according to:
            \eta_k^t <- \eta_k^(t-1) - \delta \eta_k
        if joint is True, then we take the speech act
        etas into consideration when we update the topic_etas
        '''
        prev_ll = None
        if joint:
            prev_ll = self.joint_sequential_log_likelihood()
        else:
            prev_ll = self.topic_log_likelihood()
        topics = [t for t in self.tnb.topic_set if t not in self.converged_topics]
        for topic in topics:
            print "getting delta for topic {0}".format(topic)
            delta = None
            if joint:
                delta = self.get_delta_topic_joint(topic)
            else:
                delta = self.get_delta_topic(topic)
            # you have to transform the delta 'matrix' here
            # into an array, otherwise you get 'matrix too big'
            # exceptions.
            delta = numpy.array(delta.T)[0] # this is (w,)
            prev_etas_for_topic = copy.copy(self.topic_etas[topic])
            # update
            self.topic_etas[topic] = self.topic_etas[topic] - (emission_gamma*delta)
            print "\nupdating topic: {0}".format(topic)
            print "topic {0} delta: {1}".format(topic, delta)
            print "eta_topic: {0}\n".format(self.topic_etas[topic])
            ll = None
            if joint:
                ll = self.joint_sequential_log_likelihood()
            else:
                ll = self.topic_log_likelihood()
            print "ll after updating topic {0}:{1}".format(topic, ll)
            diff_ll_for_topic = ll - prev_ll
            print "diff in ll: {0}".format(diff_ll_for_topic)
            # @TODO raise an exception on isnan -- or at least a warning --
            # because this indicates badness (probably)
            if diff_ll_for_topic < THRESHOLD or math.isnan(ll):
                print "topic {0} has converged!".format(topic)
                # use the previous value
                self.topic_etas[topic] = prev_etas_for_topic
                self.converged_topics.append(topic)
            else:
                prev_ll = ll
            # update betas to reflect new topic \etas
            if joint:
                self.calc_beta_joint()
            else:
                self.calc_topic_betas()
    def update_transition_lambdas_joint(self):
        topics_to_update = self.tnb.get_topic_set(include_special_states=True, exclude_stop_state=True)
        speech_acts_to_update = self.tnb.get_speech_act_set(include_special_states=True, exclude_stop_state=True)
        # cache the previous ll
        prev_ll = self.joint_sequential_log_likelihood()
        before_lambda_updates_ll = prev_ll

        print "updating y->y transition sigmas"
        ''' first update the topic to topic transitions and the speech act
            to speech act transitions '''
        for topic in topics_to_update:
            # old value
            old_sigma = self.sigma_y_y[topic]
            # update topic-to-topic transition components
            delta_topic_y = self.get_delta_y_y_joint(topic)
            delta_topic_y = numpy.array(delta_topic_y.T)[0]
            self.sigma_y_y[topic] = self.sigma_y_y[topic] - (transition_gamma*delta_topic_y)
            new_ll = self.joint_sequential_log_likelihood()
            print "ll - prev_ll: {0}".format(new_ll-prev_ll)
            if (new_ll - prev_ll) < THRESHOLD:
                # then don't update this sigma
                print "not updating y->y for topic {0}".format(topic)
                self.sigma_y_y[topic] = old_sigma
            else:
                prev_ll = new_ll

        prev_ll = self.joint_sequential_log_likelihood()
        print "\nupdating s->s transition sigmas"
        for speech_act in speech_acts_to_update:
            old_sigma = self.sigma_s_s[speech_act]
            delta_sa_s = self.get_delta_s_s_joint(speech_act)
            delta_sa_s = numpy.array(delta_sa_s.T)[0]
            self.sigma_s_s[speech_act] = self.sigma_s_s[speech_act] - (transition_gamma*delta_sa_s)
            new_ll = self.joint_sequential_log_likelihood()
            print "ll - prev_ll: {0}".format(new_ll-prev_ll)
            if (new_ll - prev_ll) < THRESHOLD:
                # then don't update this sigma
                print "not updating s->s for speech act {0}".format(speech_act)
                self.sigma_s_s[speech_act] = old_sigma
            else:
                prev_ll = new_ll

        # recalculate lambdas
        self.calc_lambdas_joint()
        print "\n\nlikelihood after updating 'primary' transitions: {0} (difference of {1})".\
            format(new_ll, new_ll-before_lambda_updates_ll)
        ll_after_primary = new_ll

        print "updating y->s transition sigmas"
        ''' now update the 'secondary' transitions: topics to speech acts; speech
            acts to topics '''
        for topic in topics_to_update:
            old_sigma = self.sigma_y_s[topic]
            delta_topic_s = self.get_delta_y_s_joint(topic)
            # fixes the dimensions to be (|speech acts|,)
            delta_topic_s = numpy.array(delta_topic_s.T)[0]
            self.sigma_y_s[topic] = self.sigma_y_s[topic] - (transition_gamma*delta_topic_s)
            new_ll = self.joint_sequential_log_likelihood()
            print "ll - prev_ll: {0}".format(new_ll-prev_ll)
            if (new_ll - prev_ll) < THRESHOLD:
                # then don't update this sigma
                print "not updating y->s for topic {0}".format(topic)
                self.sigma_y_s[topic] = old_sigma
            else:
                prev_ll = new_ll

        self.calc_lambdas_joint()
        print "updating s->y transition sigmas"
        for speech_act in speech_acts_to_update:
            old_sigma = self.sigma_s_y[speech_act]
            delta_sa_y = self.get_delta_s_y_joint(speech_act)
            delta_sa_y = numpy.array(delta_sa_y.T)[0]
            self.sigma_s_y[speech_act] = self.sigma_s_y[speech_act] - (transition_gamma*delta_sa_y)
            new_ll = self.joint_sequential_log_likelihood()
            print "ll - prev_ll: {0}".format(new_ll-prev_ll)
            if (new_ll - prev_ll) < THRESHOLD:
                # then don't update this sigma
                print "not updating s->y for speech act {0}".format(speech_act)
                self.sigma_s_y[speech_act] = old_sigma
            else:
                prev_ll = new_ll

        print "likelihood after updating 'secondary' transitions: {0} (difference of {1})".\
            format(new_ll, new_ll-ll_after_primary)
        self.calc_lambdas_joint()
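All of the update methods in this class share the same accept/reject pattern: take a fixed-size step, recompute the log-likelihood, and revert the step if the objective failed to improve by at least THRESHOLD. A standalone sketch of that pattern on a toy concave objective (illustrative only; plain gradient ascent stands in for the Newton step):

```python
THRESHOLD = .0001
gamma = .025  # fixed step size, as with emission_gamma / transition_gamma

def log_lik(theta):
    # stand-in objective: concave quadratic with its maximum at theta = 3
    return -(theta - 3.0) ** 2

theta = 0.0
prev_ll = log_lik(theta)
for _ in range(1000):
    grad = -2.0 * (theta - 3.0)
    old_theta = theta
    theta = theta + gamma * grad       # take the step
    new_ll = log_lik(theta)
    if (new_ll - prev_ll) < THRESHOLD:
        theta = old_theta              # 'reject' -- revert the step
        break
    prev_ll = new_ll                   # 'accept' and continue
```

This guarantees the objective is non-decreasing across updates, at the cost of some wasted likelihood evaluations for rejected steps.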
    def update_interaction_transition_lambdas_y(self):
        prior_ll = prev_ll = self.joint_sequential_log_likelihood()
        print "ll prior to updating interaction transitions (y): {0}".format(prior_ll)
        for pair in self.frequent_pairs:
            delta_interaction_y = self.get_delta_interaction_y(pair)
            delta = numpy.array(delta_interaction_y.T)[0]
            prev_val = copy.deepcopy(self.sigma_interactions_y[pair])
            self.sigma_interactions_y[pair] = self.sigma_interactions_y[pair] - (transition_gamma*delta)
            new_ll = self.joint_sequential_log_likelihood()
            print "previous ll: {0}, new ll {1}, diff: {2}".format(prev_ll, new_ll, new_ll - prev_ll)
            if new_ll - prev_ll < THRESHOLD:
                print "rejecting interaction transition update!"
                self.sigma_interactions_y[pair] = prev_val
            else:
                print "new sigma interactions (y) for pair {0}: {1}".format(pair, self.sigma_interactions_y[pair])
                prev_ll = new_ll
        print "delta ll after updating interaction transition terms for topics: {0}".\
            format(prev_ll - prior_ll)
        self.calc_lambdas_joint()

    ### interactions
    def get_delta_interaction_y(self, pair):
        topic, sa = pair
        K_pair_y = self.calc_k_pair_y(pair) # (diagonal) |topics|x|topics|
        g_pair_y = self.g_sigma_pair_y(pair)
        K_gradient = K_pair_y * g_pair_y # |topics|x1
        T_pair_y = self.tnb.get_T_y_joint(topic, sa) # scalar
        lambda_pair_y = numpy.mat(self.lambda_joint_y[pair]).T # |topics|x1
        TKL = T_pair_y * K_pair_y * lambda_pair_y # |topics|x1
        z = 1 + T_pair_y * lambda_pair_y.T * K_pair_y * lambda_pair_y
        if z == 0:
            print "delta interaction y -- divisor is 0... setting to 1."
            z = 1.0
        delta_pair_y = K_gradient - TKL/z * (lambda_pair_y.T * K_gradient)
        return delta_pair_y
    def update_interaction_transition_lambdas_s(self):
        prior_ll = prev_ll = self.joint_sequential_log_likelihood()
        for pair in self.frequent_pairs:
            delta_interaction_s = self.get_delta_interaction_s(pair)
            delta = numpy.array(delta_interaction_s.T)[0]
            prev_val = copy.copy(self.sigma_interactions_s[pair])
            self.sigma_interactions_s[pair] = self.sigma_interactions_s[pair] - (transition_gamma*delta)
            new_ll = self.joint_sequential_log_likelihood()
            print "previous ll: {0}, new ll {1}, diff: {2}".format(prev_ll, new_ll, new_ll - prev_ll)
            if new_ll - prev_ll < THRESHOLD:
                print "rejecting interaction transition update!"
                self.sigma_interactions_s[pair] = prev_val
            else:
                print "new sigma interactions (s) for pair {0}: {1}".format(pair, self.sigma_interactions_s[pair])
                prev_ll = new_ll
        print "delta ll after updating interaction transition terms for speech acts: {0}".\
            format(prev_ll - prior_ll)
        self.calc_lambdas_joint()

    def get_delta_interaction_s(self, pair):
        topic, sa = pair
        K_pair_s = self.calc_k_pair_s(pair) # (diagonal) |speech acts|x|speech acts|
        g_pair_s = self.g_sigma_pair_s(pair)
        K_gradient = K_pair_s * g_pair_s # |speech acts|x1
        T_pair_s = self.tnb.get_T_s_joint(topic, sa) # scalar
        lambda_pair_s = numpy.mat(self.lambda_joint_s[pair]).T # |speech acts|x1
        TKL = T_pair_s * K_pair_s * lambda_pair_s # |speech acts|x1
        z = 1 + T_pair_s * lambda_pair_s.T * K_pair_s * lambda_pair_s
        if z == 0:
            print "delta interaction s -- divisor is 0... setting to 1."
            z = 1.0
        delta_pair_s = K_gradient - TKL/z * (lambda_pair_s.T * K_gradient)
        return delta_pair_s
    '''
    these deltas are for the emission probabilities
    '''
    def get_delta_topic_joint(self, topic):
        g_eta_topic = self.g_eta_topic_joint(topic) # wx1
        H_inv_topic_joint = self.get_H_inv_topic_joint(topic) # wxw
        return H_inv_topic_joint * g_eta_topic

    def get_delta_speech_act_joint(self, speech_act):
        g_eta_sa = self.g_eta_speech_act_joint(speech_act) # wx1
        H_inv_speech_act_joint = self.get_H_inv_speech_act_joint(speech_act) # wxw
        return H_inv_speech_act_joint * g_eta_sa

    def _invert_v(self, v):
        # note: v cannot contain any zeros!
        return [1.0/v_i for v_i in v]

    def get_A_topic_joint(self, topic):
        A_topic = numpy.zeros(self.tnb.get_dimensions())
        for sa in self.tnb.get_speech_act_set():
            C_topic_sa = self.tnb.get_C_joint(topic, sa)
            beta_topic_sa = self.beta_joint[(topic, sa)]
            A_topic += C_topic_sa * beta_topic_sa
        A_topic = self._invert_v(-1 * A_topic)
        m = n = self.tnb.get_dimensions()
        return scipy.sparse.spdiags(A_topic, 0, m, n)

    def get_A_speech_act_joint(self, sa):
        A_sa = numpy.zeros(self.tnb.get_dimensions())
        # take the expectation over topics
        for topic in self.tnb.get_topic_set():
            C_topic_sa = self.tnb.get_C_joint(topic, sa)
            beta_topic_sa = self.beta_joint[(topic, sa)]
            A_sa += C_topic_sa * beta_topic_sa
        A_sa = self._invert_v(-1 * A_sa)
        m = n = self.tnb.get_dimensions()
        return scipy.sparse.spdiags(A_sa, 0, m, n)
    def get_H_inv_topic_joint(self, topic):
        m = self.tnb.get_dimensions()
        H_inv = numpy.zeros((m,m))
        A = self.get_A_topic_joint(topic)
        for sa in self.tnb.get_speech_act_set(include_special_states=False):
            C_topic_sa = self.tnb.get_C_joint(topic, sa)
            beta_topic_sa = self.beta_joint[(topic, sa)]
            numerator = \
                A * C_topic_sa * beta_topic_sa * beta_topic_sa.T * A
            denom = \
                1 + C_topic_sa * beta_topic_sa.T * A * beta_topic_sa
            H_inv += numerator / denom
        H_inv = A - H_inv
        return H_inv

    def get_H_inv_speech_act_joint(self, speech_act):
        m = self.tnb.get_dimensions()
        H_inv = numpy.zeros((m,m))
        A = self.get_A_speech_act_joint(speech_act)
        for topic in self.tnb.get_topic_set(include_special_states=False):
            C_topic_sa = self.tnb.get_C_joint(topic, speech_act)
            beta_topic_sa = self.beta_joint[(topic, speech_act)]
            numerator = \
                A * C_topic_sa * beta_topic_sa * beta_topic_sa.T * A
            denom = \
                1 + C_topic_sa * beta_topic_sa.T * A * beta_topic_sa
            H_inv += numerator / denom
        H_inv = A - H_inv
        return H_inv
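The get_H_inv_* methods above avoid a full matrix inversion by starting from a diagonal matrix and folding in rank-one corrections, in the spirit of the Sherman-Morrison identity (following the approach of Eisenstein et al.). A standalone sketch of the identity itself (illustrative names and toy numbers; not this module's code):

```python
import numpy as np

def sherman_morrison_inv(A_inv, u, v):
    """Inverse of (A + u v^T), given A's inverse, via Sherman-Morrison."""
    Au = A_inv.dot(u)
    vA = v.dot(A_inv)
    return A_inv - np.outer(Au, vA) / (1.0 + v.dot(Au))

A = np.diag(np.array([2.0, 3.0, 4.0]))  # toy diagonal matrix
A_inv = np.diag(1.0 / np.diag(A))       # its inverse is trivial
b = np.array([1.0, 0.5, 0.25])
c = 2.0  # rank-one weight, e.g. a count scaling an outer product of betas
M = A + c * np.outer(b, b)              # diagonal plus rank-one update
M_inv = sherman_morrison_inv(A_inv, c * b, b)
assert np.allclose(M_inv, np.linalg.inv(M))
```

The identity is exact for a single rank-one update; applying it repeatedly across a sum of rank-one terms, as the Hessian code above does, gives an approximate inverse that is cheap to compute.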
    def get_A_interaction(self, pair):
        # total observed count for this (topic, speech act) pair
        C_pair = sum(self.tnb.get_c_pair(pair))
        beta_topic_sa = self.beta_joint[pair]
        A_interaction = -1 * (C_pair * beta_topic_sa)
        A_interaction = self._invert_v(A_interaction)
        m = n = self.tnb.get_dimensions()
        return scipy.sparse.spdiags(A_interaction, 0, m, n)
    def get_delta_interaction_eta(self, pair):
        A = self.get_A_interaction(pair) # w x w
        g_eta_interaction = self.g_eta_interaction(pair) # w x 1
        A_gradient = A * g_eta_interaction # w x 1
        # total observed count for this (topic, speech act) pair -- a scalar
        C_pair = sum(self.tnb.get_c_pair(pair))
        beta_pair = numpy.mat(self.beta_joint[pair]).T # w x 1
        CAB = C_pair * A * beta_pair # numerator -- w x 1
        z = 1 + C_pair * beta_pair.T * A * beta_pair # the divisor; a scalar (1x1)
        if z == 0:
            z = 1.0 # guard against division by zero
        # Sherman-Morrison step: H^-1 * g without forming H^-1 explicitly
        delta_pair = A_gradient - CAB/z * (beta_pair.T * A_gradient) # ultimately w x 1
        return delta_pair
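    # get_delta_interaction_eta applies the Sherman-Morrison identity directly
    # to the gradient: H^-1 * g = A*g - (C*A*b) * (b^T * A * g) / z, without
    # ever forming H^-1. A standalone check against an explicit solve, with
    # illustrative values only:

```python
import numpy

A = numpy.diag([0.5, 0.25, 0.2])         # stands in for get_A_interaction(pair)
b = numpy.array([[1.0], [2.0], [0.5]])   # stands in for beta_pair
c = 3.0                                  # stands in for C_pair
g = numpy.array([[0.3], [-0.1], [0.7]])  # stands in for the gradient g_eta
z = 1.0 + c * (b.T @ A @ b).item()
delta = A @ g - (c * (A @ b)) / z * (b.T @ A @ g).item()
H = numpy.linalg.inv(A) + c * (b @ b.T)  # the Hessian being inverted
print(numpy.allclose(delta, numpy.linalg.solve(H, g)))  # True
```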
    def calc_k_pair_y(self, pair): # from pairs to topics
        topic, sa = pair
        T_pair_y = self.tnb.get_T_y_joint(topic, sa)
        lambda_pair_y = self.lambda_joint_y[pair]
        # negated expected transition counts under the current model
        k_pair_y = -1 * (T_pair_y * lambda_pair_y)
        k_pair_y = self._invert_v(k_pair_y)
        m = n = self.tnb.num_topics
        return scipy.sparse.spdiags(k_pair_y, 0, m, n)
    def calc_k_pair_s(self, pair): # from pairs to speech acts
        topic, sa = pair
        T_pair_s = self.tnb.get_T_s_joint(topic, sa)
        lambda_pair_s = self.lambda_joint_s[pair]
        # negated expected transition counts under the current model
        k_pair_s = -1 * (T_pair_s * lambda_pair_s)
        k_pair_s = self._invert_v(k_pair_s)
        m = n = self.tnb.num_speech_acts
        return scipy.sparse.spdiags(k_pair_s, 0, m, n)
    '''
    The deltas below are Newton-style updates (H^-1 * g) for the
    transition-probability parameters.
    '''
def get_delta_y_y_joint(self, topic):
g_y_y_joint = self.g_sigma_y_y_joint(topic)
H_y_y_joint_inv = self.get_H_inv_transition_y_y_joint(topic)
return H_y_y_joint_inv * g_y_y_joint
def get_delta_y_s_joint(self, topic):
g_y_s_joint = self.g_sigma_y_s_joint(topic)
H_y_s_joint_inv = self.get_H_inv_transition_y_s_joint(topic)
return H_y_s_joint_inv * g_y_s_joint
def get_delta_s_s_joint(self, sa):
g_s_s_joint = self.g_sigma_s_s_joint(sa)
H_s_s_joint_inv = self.get_H_inv_transition_s_s_joint(sa)
return H_s_s_joint_inv * g_s_s_joint
def get_delta_s_y_joint(self, sa):
g_s_y_joint = self.g_sigma_s_y_joint(sa)
H_s_y_joint_inv = self.get_H_inv_transition_s_y_joint(sa)
return H_s_y_joint_inv * g_s_y_joint
    def g_sigma_y_y_joint(self, topic):
        # observed transition counts out of this topic into other
        # topics -- 1 x |topics|
        t_y_y = self.tnb.get_t_y_y(topic)
        # now calculate the expected counts, marginalizing over
        # the speech acts
        expected = 0.0
        for sa in self.tnb.get_speech_act_set(include_special_states=True):
            pair = (topic, sa)
            # expected transition probabilities, under the current model
            cur_lambda_joint = self.lambda_joint_y[pair]
            C_topic_sa = self.tnb.get_T_y_joint(topic, sa)
            expected += cur_lambda_joint * C_topic_sa
        # gradient: observed counts minus expected counts
        partial_deriv = t_y_y - expected
        partial_deriv = numpy.matrix(partial_deriv).T
        return partial_deriv
    def g_sigma_y_s_joint(self, topic):
        # observed transition counts out of this topic into speech
        # acts -- 1 x |speech acts|
        t_y_s = self.tnb.get_t_y_s(topic)
        # now calculate the expected counts, marginalizing over speech acts
        expected = 0.0
        for sa in self.tnb.get_speech_act_set(include_special_states=True):
            pair = (topic, sa)
            # expected transition probabilities, under the current model
            cur_lambda_joint = self.lambda_joint_s[pair]
            # total number of times we were in this topic *and*
            # this speech act
            C_topic_sa = self.tnb.get_T_s_joint(topic, sa)
            expected += cur_lambda_joint * C_topic_sa
        # gradient: observed counts minus expected counts
        partial_deriv = t_y_s - expected
        partial_deriv = numpy.matrix(partial_deriv).T
        return partial_deriv
    def g_sigma_s_y_joint(self, speech_act):
        # observed transition counts from this speech act into
        # topics -- 1 x |topics|
        t_s_y = self.tnb.get_t_s_y(speech_act)
        # now calculate the expected counts, marginalizing over topics
        expected = 0.0
        for topic in self.tnb.get_topic_set(include_special_states=True):
            pair = (topic, speech_act)
            # expected transition probabilities, under the current model
            cur_lambda_joint = self.lambda_joint_y[pair]
            # total number of times we were in this topic *and* speech act
            C_topic_sa = self.tnb.get_T_y_joint(topic, speech_act)
            expected += cur_lambda_joint * C_topic_sa
        # gradient: observed counts minus expected counts
        partial_deriv = t_s_y - expected
        partial_deriv = numpy.matrix(partial_deriv).T
        return partial_deriv
    def g_sigma_s_s_joint(self, speech_act):
        t_s_s = self.tnb.get_t_s_s(speech_act) # 1 x |speech acts|. observed transition counts from speech act to speech acts