# Next Sentence Prediction With BERT

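* The goal is to solve sentence-ordering questions with HuggingFace's BERT NSP model (`BertForNextSentencePrediction`).
* Start from the passage given in the box,
* pair it once with each of the passages (A), (B), and (C), and compute the next-sentence probability for each pair.
* Take the highest-probability candidate as the 2nd passage, then predict the 3rd the same way; the 4th is whatever remains.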
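
The greedy procedure in the bullets above can be sketched without BERT by plugging in any scoring function. This is a minimal sketch, not the notebook's implementation: `greedy_order` and `toy_score` are hypothetical names, and `toy_score` is a word-overlap stand-in for the NSP probability so the example runs without downloading a model.

```python
from typing import Callable, Dict, List


def greedy_order(first: str,
                 candidates: Dict[str, str],
                 score: Callable[[str, str], float]) -> List[str]:
    """Return candidate keys in the order a greedy search picks them.

    `score(context, candidate)` is assumed to be higher when `candidate`
    is a more plausible continuation of `context`.
    """
    remaining = dict(candidates)
    context = first
    order: List[str] = []
    while remaining:
        # pick the candidate that best continues everything chosen so far
        best = max(remaining, key=lambda k: score(context, remaining[k]))
        order.append(best)
        context = context + ' ' + remaining.pop(best)
    return order


def toy_score(context: str, candidate: str) -> float:
    # hypothetical stand-in for the NSP probability: number of words the
    # candidate shares with the last two words of the context
    tail = set(context.lower().split()[-2:])
    return float(len(tail & set(candidate.lower().split())))


toy_candidates = {'A': 'delta epsilon', 'B': 'beta gamma', 'C': 'gamma delta'}
toy_result = greedy_order('alpha beta', toy_candidates, toy_score)
print(toy_result)
```

Swapping `toy_score` for a function that returns the index-0 NSP probability would give the same search the notebook performs further down.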

In [1]:
from torch.nn.functional import softmax
from transformers import BertForNextSentencePrediction, BertTokenizer

In [None]:
# load pretrained model and a pretrained tokenizer
model = BertForNextSentencePrediction.from_pretrained('bert-base-cased')
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

In [169]:
seq_A = 'I like cookies !'
seq_B = 'Do you like them ?'
seq_C = 'I fell in love with that woman'

In [410]:
def nsp(criterion_sentence, next_sentence, verbose = 0):
    encoded = tokenizer.encode_plus(criterion_sentence, text_pair=next_sentence, return_tensors='pt')

    seq_relationship_logits = model(**encoded)[0]

    # we still need softmax to convert the logits into probabilities
    # index 0: sequence B is a continuation of sequence A
    # index 1: sequence B is a random sequence
    probs = softmax(seq_relationship_logits, dim=1)
    if verbose != 0:
        print(f'{criterion_sentence} \n >>>  {next_sentence} \n')
    return probs
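
To make the softmax step in `nsp` concrete, here is a small pure-Python sketch of what it computes for the two NSP logits. The function name `softmax2` and the logit values are illustrative assumptions, not model output.

```python
import math


def softmax2(logit_is_next: float, logit_not_next: float):
    """Softmax over two logits, returned as (p_is_next, p_not_next)."""
    # subtract the max before exponentiating for numerical stability
    m = max(logit_is_next, logit_not_next)
    e0 = math.exp(logit_is_next - m)
    e1 = math.exp(logit_not_next - m)
    total = e0 + e1
    return e0 / total, e1 / total


# illustrative logits only
p_next, p_random = softmax2(2.0, -1.0)
print(p_next, p_random)
```

With only two classes, the index-0 probability equals the sigmoid of the logit difference, so a large positive gap between the two logits pushes it very close to 1.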

In [407]:
# The continuation probability is low for seq_A -> seq_C but high for seq_A -> seq_B!

print(nsp(seq_A, seq_C, verbose=1))
print(nsp(seq_A, seq_B, verbose=1))

I like cookies ! 
 >>>  I fell in love with that woman 

tensor([[0.4180, 0.5820]], grad_fn=<SoftmaxBackward>)
I like cookies ! 
 >>>  Do you like them ? 

tensor([[9.9993e-01, 6.7607e-05]], grad_fn=<SoftmaxBackward>)
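
Choosing the next sentence from these results amounts to taking the candidate with the largest index-0 probability. A minimal sketch, with the probability pairs copied by hand from the output above and `best_continuation` as a hypothetical helper name:

```python
# (is_next, not_next) pairs copied from the output above
probs = {
    'seq_B': (9.9993e-01, 6.7607e-05),
    'seq_C': (0.4180, 0.5820),
}


def best_continuation(prob_pairs):
    """Return the key whose index-0 (is-next) probability is largest."""
    return max(prob_pairs, key=lambda k: prob_pairs[k][0])


best = best_continuation(probs)
print(best)
```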


# Applying It to the 2020/21 CSAT (Suneung) Sentence-Ordering Questions

In [413]:
# 2020 CSAT English (even-numbered form), Q37 - answer choice 2: B-A-C

first = 'Traditionally, Kuhn claims, the primary goal of historians of science was ‘to clarify and deepen an understanding of contemporary scientific methods or concepts by displaying their evolution’.'
sentences = {'A': 'Some discoveries seem to entail numerous phases and discoverers, none of which can be identified as definitive. Furthermore, the evaluation of past discoveries and discoverers according to present-day standards does not allow us to see how significant they may have been in their own day.',
             'B': 'This entailed relating the progressive accumulation of breakthroughs and discoveries. Only that which survived in some form in the present was considered relevant. In the mid-1950s, however, a number of faults in this view of history became apparent. Closer analysis of scientific discoveries, for instance, led historians to ask whether the dates of discoveries and their discoverers can be identified precisely.',
             'C': 'Nor does the traditional view recognise the role that non-intellectual factors, especially institutional and socio-economic ones, play in scientific developments. Most importantly, however, the traditional historian of science seems blind to the fact that the concepts, questions and standards that they use to frame the past are themselves subject to historical change.'}

def sentence_order(first, sentences):
    """Greedily order the candidate passages, starting from `first`."""
    remaining = sentences.copy()
    context = first
    answer_sent = [first]
    answer_index = []

    while len(remaining) > 1:
        # P(not next) for each remaining candidate, given the text so far
        not_next_probs = {}
        for key, text in remaining.items():
            not_next_probs[key] = nsp(context, text)[0][1].item()

        # the most plausible continuation has the lowest "not next" probability
        best_key = min(not_next_probs, key=not_next_probs.get)

        # extend the context with the chosen passage
        context = context + ' ' + remaining[best_key]

        answer_index.append(best_key)
        answer_sent.append(remaining.pop(best_key))

    # only one passage is left, so it goes last
    answer_sent.append(list(remaining.values())[0])
    answer_index.append(list(remaining.keys())[0])

    return answer_sent, answer_index


whole_sentence, answer = sentence_order(first, sentences)
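
`sentence_order` commits to one choice per step. With only three candidates there are just six orderings, so an exhaustive search is also feasible. The sketch below is an alternative technique, not the notebook's method: it scores adjacent pairs only rather than the accumulated context, `best_permutation` is a hypothetical name, and `toy_prob` is a lookup-table stand-in for the index-0 NSP probability.

```python
import math
from itertools import permutations


def best_permutation(first, candidates, prob_is_next):
    """Score every ordering by the summed log-probability of adjacent pairs."""
    best_order, best_score = None, -math.inf
    for order in permutations(candidates):
        texts = [first] + [candidates[k] for k in order]
        score = sum(math.log(prob_is_next(a, b))
                    for a, b in zip(texts, texts[1:]))
        if score > best_score:
            best_order, best_score = list(order), score
    return best_order, best_score


# hypothetical probabilities; any pair not listed gets 0.1
toy_table = {('s', 'b'): 0.9, ('b', 'a'): 0.8, ('a', 'c'): 0.7}


def toy_prob(a, b):
    return toy_table.get((a, b), 0.1)


order, score = best_permutation('s', {'A': 'a', 'B': 'b', 'C': 'c'}, toy_prob)
print(order, score)
```

The cost grows factorially with the number of candidates, so this only makes sense for very small sets like the three passages here.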

In [415]:
print(' '.join(whole_sentence))

Traditionally, Kuhn claims, the primary goal of historians of science was ‘to clarify and deepen an understanding of contemporary scientific methods or concepts by displaying their evolution’. This entailed relating the progressive accumulation of breakthroughs and discoveries. Only that which survived in some form in the present was considered relevant. In the mid-1950s, however, a number of faults in this view of history became apparent. Closer analysis of scientific discoveries, for instance, led historians to ask whether the dates of discoveries and their discoverers can be identified precisely. Some discoveries seem to entail numerous phases and discoverers, none of which can be identified as definitive. Furthermore, the evaluation of past discoveries and discoverers according to present-day standards does not allow us to see how significant they may have been in their own day. Nor does the traditional view recognise the role that non-intellectual factors, especially institutional

In [417]:
print(f"Answer is {'=>'.join(answer)}")
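
When formatting the predicted order, note that reusing the f-string's own quote character inside the braces is a syntax error before Python 3.12. The sketch below shows a portable form and compares the result with the answer key. The `answer` list is hard-coded here as an assumption, matching the order of the joined passage printed above, and `answer_key` comes from the cell comment.

```python
answer = ['B', 'A', 'C']   # assumed value, matching the joined passage printed above
answer_key = 'BAC'         # from the cell comment: answer choice 2, B-A-C

# double quotes outside, single quotes inside: valid on every Python 3 version
formatted = f"Answer is {'=>'.join(answer)}"
print(formatted)

# compare the predicted order with the answer key
is_correct = ''.join(answer) == answer_key
print('matches answer key:', is_correct)
```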

In [412]:
# Answer 2: B-A-C
first = "The objective of battle, to “throw” the enemy and to make him defenseless, may temporarily blind commanders and even strategists to the larger purpose of war. War is never an isolated act, nor is it ever only one decision."

sentences = {'A': "To be political, a political entity or a representative of a political entity, whatever its constitutional form, has to have an intention, a will. That intention has to be clearly expressed.",
            'B':"In the real world, war’s larger purpose is always a political purpose. It transcends the use of force. This insight was famously captured by Clausewitz’s most famous phrase, “War is a mere continuation of politics by other means.",
            'C':"And one side’s will has to be transmitted to the enemy at some point during the confrontation (it does not have to be publicly communicated). A violent act and its larger political intention must also be attributed to one side at some point during the confrontation. History does not know of acts of war without eventual attribution."}
A, B, C = sentences['A'], sentences['B'], sentences['C']

In [393]:
nsp(first, A, verbose=1)
nsp(first, B, verbose=1)
nsp(first, C, verbose=1)

The objective of battle, to “throw” the enemy and to make him defenseless, may temporarily blind commanders and even strategists to the larger purpose of war. War is never an isolated act, nor is it ever only one decision. 
 >>>  To be political, a political entity or a representative of a political entity, whatever its constitutional form, has to have an intention, a will. That intention has to be clearly expressed. 

The objective of battle, to “throw” the enemy and to make him defenseless, may temporarily blind commanders and even strategists to the larger purpose of war. War is never an isolated act, nor is it ever only one decision. 
 >>>  In the real world, war’s larger purpose is always a political purpose. It transcends the use of force. This insight was famously captured by Clausewitz’s most famous phrase, “War is a mere continuation of politics by other means. 

The objective of battle, to “throw” the enemy and to make him defenseless, may temporarily blind commanders and eve

tensor([[9.9998e-01, 1.9218e-05]], grad_fn=<SoftmaxBackward>)

In [394]:
print(nsp(B, A, verbose=1))
print(nsp(B, C, verbose=1))

In the real world, war’s larger purpose is always a political purpose. It transcends the use of force. This insight was famously captured by Clausewitz’s most famous phrase, “War is a mere continuation of politics by other means. 
 >>>  To be political, a political entity or a representative of a political entity, whatever its constitutional form, has to have an intention, a will. That intention has to be clearly expressed. 

tensor([[9.9984e-01, 1.6341e-04]], grad_fn=<SoftmaxBackward>)
In the real world, war’s larger purpose is always a political purpose. It transcends the use of force. This insight was famously captured by Clausewitz’s most famous phrase, “War is a mere continuation of politics by other means. 
 >>>  And one side’s will has to be transmitted to the enemy at some point during the confrontation (it does not have to be publicly communicated). A violent act and its larger political intention must also be attributed to one side at some point during the confrontation. Hist

In [379]:
# Answer 5: C-B-A
first = "Experts have identified a large number of measures that promote energy efficiency. Unfortunately many of them are not cost effective. This is a fundamental requirement for energy efficiency investment from an economic perspective."

sentences = {'A': "And this has direct repercussions at the individual level: households can reduce the cost of electricity and gas bills, and improve their health and comfort, while companies can increase their competitiveness and their productivity. Finally, the market for energy efficiency could contribute to the economy through job and firms creation.",
             'B': "There are significant externalities to take into account and there are also macroeconomic effects. For instance, at the aggregate level, improving the level of national energy efficiency has positive effects on macroeconomic issues such as energy dependence, climate change, health, national competitiveness and reducing fuel poverty.",
             'C': "However, the calculation of such cost effectiveness is not easy: it is not simply a case of looking at private costs and comparing them to the reductions achieved."}
A, B, C = sentences['A'], sentences['B'], sentences['C']

In [186]:
print(nsp(first, A, verbose=1))
print(nsp(first, B, verbose=1))
print(nsp(first, C, verbose=1))

Experts have identified a large number of measures that promote energy efficiency. Unfortunately many of them are not cost effective. This is a fundamental requirement for energy efficiency investment from an economic perspective. >>> And this has direct repercussions at the individual level: households can reduce the cost of electricity and gas bills, and improve their health and comfort, while companies can increase their competitiveness and their productivity. Finally, the market for energy efficiency could contribute to the economy through job and firms creation.
tensor([[9.9999e-01, 1.3497e-05]], grad_fn=<SoftmaxBackward>)
Experts have identified a large number of measures that promote energy efficiency. Unfortunately many of them are not cost effective. This is a fundamental requirement for energy efficiency investment from an economic perspective. >>> There are significant externalities to take into account and there are also macroeconomic effects. For instance, at the aggregate

In [187]:
print(nsp(A, B, verbose=1))
print(nsp(A, C, verbose=1))

And this has direct repercussions at the individual level: households can reduce the cost of electricity and gas bills, and improve their health and comfort, while companies can increase their competitiveness and their productivity. Finally, the market for energy efficiency could contribute to the economy through job and firms creation. >>> There are significant externalities to take into account and there are also macroeconomic effects. For instance, at the aggregate level, improving the level of national energy efficiency has positive effects on macroeconomic issues such as energy dependence, climate change, health, national competitiveness and reducing fuel poverty.
tensor([[9.9999e-01, 1.4316e-05]], grad_fn=<SoftmaxBackward>)
And this has direct repercussions at the individual level: households can reduce the cost of electricity and gas bills, and improve their health and comfort, while companies can increase their competitiveness and their productivity. Finally, the market for ene
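
Most of these pairs saturate near 1.0, so the differences only show up in the tiny index-1 values. One way to make them easier to compare is to look at log-odds instead of raw probabilities. This is a minimal sketch with `log_odds` as a hypothetical helper and the pairs copied by hand from the outputs above; because log-odds is a monotonic transform, it does not change which candidate ranks first.

```python
import math


def log_odds(p_is_next: float, p_not_next: float) -> float:
    """Log of the ratio P(is next) / P(not next)."""
    return math.log(p_is_next) - math.log(p_not_next)


# (is_next, not_next) pairs copied from the outputs above
first_to_A = (9.9999e-01, 1.3497e-05)
A_to_B = (9.9999e-01, 1.4316e-05)

print(log_odds(*first_to_A))
print(log_odds(*A_to_B))
```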

In [376]:
# 2020 CSAT English (even-numbered form), Q36 - answer choice 5: C-B-A

first = 'Movies may be said to support the dominant culture and to serve as a means for its reproduction over time.'
sentences = {'A' : 'The bad guys are usually punished; the romantic couple almost always find each other despite the obstacles and difficulties they encounter on the path to true love; and the way we wish the world to be is how, in the movies, it more often than not winds up being. No doubt it is this utopian aspect of movies that accounts for why we enjoy them so much.',
             'B' : 'The simple answer to this question is that movies do more than present two-hour civics lessons or editorials on responsible behavior. They also tell stories that, in the end, we find satisfying.',
             'C' : 'But one may ask why audiences would find such movies enjoyable if all they do is give cultural directives and prescriptions for proper living. Most of us would likely grow tired of such didactic movies and would probably come to see them as propaganda, similar to the cultural artwork that was common in the Soviet Union and other autocratic societies.'}
A, B, C = sentences['A'], sentences['B'], sentences['C']

In [205]:
print(nsp(first, A, verbose=1))
print(nsp(first, B, verbose=1))
print(nsp(first, C, verbose=1))

Movies may be said to support the dominant culture and to serve as a means for its reproduction over time. 
 >>>  The bad guys are usually punished; the romantic couple almost always find each other despite the obstacles and difficulties they encounter on the path to true love; and the way we wish the world to be is how, in the movies, it more often than not winds up being. No doubt it is this utopian aspect of movies that accounts for why we enjoy them so much.
tensor([[0.9978, 0.0022]], grad_fn=<SoftmaxBackward>)
Movies may be said to support the dominant culture and to serve as a means for its reproduction over time. 
 >>>  The simple answer to this question is that movies do more than present two-hour civics lessons or editorials on responsible behavior. They also tell stories that, in the end, we find satisfying.
tensor([[9.9998e-01, 2.4017e-05]], grad_fn=<SoftmaxBackward>)
Movies may be said to support the dominant culture and to serve as a means for its reproduction over time. 


In [206]:
print(nsp(C, A, verbose=1))
print(nsp(C, B, verbose=1))

But one may ask why audiences would find such movies enjoyable if all they do is give cultural directives and prescriptions for proper living. Most of us would likely grow tired of such didactic movies and would probably come to see them as propaganda, similar to the cultural artwork that was common in the Soviet Union and other autocratic societies. 
 >>>  The bad guys are usually punished; the romantic couple almost always find each other despite the obstacles and difficulties they encounter on the path to true love; and the way we wish the world to be is how, in the movies, it more often than not winds up being. No doubt it is this utopian aspect of movies that accounts for why we enjoy them so much.
tensor([[9.9983e-01, 1.7474e-04]], grad_fn=<SoftmaxBackward>)
But one may ask why audiences would find such movies enjoyable if all they do is give cultural directives and prescriptions for proper living. Most of us would likely grow tired of such didactic movies and would probably come

In [222]:
# 2020 CSAT English (even-numbered form), Q37 - answer choice 2: B-A-C

first = 'Traditionally, Kuhn claims, the primary goal of historians of science was ‘to clarify and deepen an understanding of contemporary scientific methods or concepts by displaying their evolution’.'
sentences = {'A': 'Some discoveries seem to entail numerous phases and discoverers, none of which can be identified as definitive. Furthermore, the evaluation of past discoveries and discoverers according to present-day standards does not allow us to see how significant they may have been in their own day.',
             'B': 'This entailed relating the progressive accumulation of breakthroughs and discoveries. Only that which survived in some form in the present was considered relevant. In the mid-1950s, however, a number of faults in this view of history became apparent. Closer analysis of scientific discoveries, for instance, led historians to ask whether the dates of discoveries and their discoverers can be identified precisely.',
             'C': 'Nor does the traditional view recognise the role that non-intellectual factors, especially institutional and socio-economic ones, play in scientific developments. Most importantly, however, the traditional historian of science seems blind to the fact that the concepts, questions and standards that they use to frame the past are themselves subject to historical change.'}

In [219]:
A, B, C = sentences['A'], sentences['B'], sentences['C']

In [209]:
print(nsp(first, A, verbose=1))
print(nsp(first, B, verbose=1))
print(nsp(first, C, verbose=1))

Traditionally, Kuhn claims, the primary goal of historians of science was ‘to clarify and deepen an understanding of contemporary scientific methods or concepts by displaying their evolution’. 
 >>>  Some discoveries seem to entail numerous phases and discoverers, none of which can be identified as definitive. Furthermore, the evaluation of past discoveries and discoverers according to present-day standards does not allow us to see how significant they may have been in their own day.
tensor([[9.9997e-01, 3.1038e-05]], grad_fn=<SoftmaxBackward>)
Traditionally, Kuhn claims, the primary goal of historians of science was ‘to clarify and deepen an understanding of contemporary scientific methods or concepts by displaying their evolution’. 
 >>>  This entailed relating the progressive accumulation of breakthroughs and discoveries. Only that which survived in some form in the present was considered relevant. In the mid-1950s, however, a number of faults in this view of history became apparent

In [212]:
print(nsp(first + ' ' + B, A, verbose=1))
print(nsp(first + ' ' + B, C, verbose=1))

Traditionally, Kuhn claims, the primary goal of historians of science was ‘to clarify and deepen an understanding of contemporary scientific methods or concepts by displaying their evolution’. This entailed relating the progressive accumulation of breakthroughs and discoveries. Only that which survived in some form in the present was considered relevant. In the mid-1950s, however, a number of faults in this view of history became apparent. Closer analysis of scientific discoveries, for instance, led historians to ask whether the dates of discoveries and their discoverers can be identified precisely. 
 >>>  Some discoveries seem to entail numerous phases and discoverers, none of which can be identified as definitive. Furthermore, the evaluation of past discoveries and discoverers according to present-day standards does not allow us to see how significant they may have been in their own day.
tensor([[9.9999e-01, 1.0861e-05]], grad_fn=<SoftmaxBackward>)
Traditionally, Kuhn claims, the pri
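
Concatenating the chosen passages, as in `first + ' ' + B`, makes the first segment longer at every step, and `bert-base-cased` accepts at most 512 tokens per input pair. One rough guard is to keep only the tail of the accumulated context. This is a sketch under stated assumptions: `trim_context` is a hypothetical helper, the default of 200 words is arbitrary, and whitespace word counts only approximate WordPiece token counts.

```python
def trim_context(text: str, max_words: int = 200) -> str:
    """Keep only the last `max_words` whitespace-separated words of `text`."""
    words = text.split()
    if len(words) <= max_words:
        return text
    # the most recent words are the ones closest to the candidate passage
    return ' '.join(words[-max_words:])


print(trim_context('a b c d e', 3))
```

The trimmed string could then be passed as the first argument of `nsp` in place of the full context.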
