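Based on the results of Stanford Core NLP's coreference resolution, replace each mention in the text with its representative mention. When replacing, keep the original mention recoverable by using a form such as "representative mention (mention)".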
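
As a minimal sketch of the requested output format, the snippet below wraps one mention in a hand-made token list. The function name `annotate` and the sample sentence are hypothetical and appear nowhere in the exercise data. The sketch assumes 1-based token positions with an exclusive end, which is the convention the notebook code below relies on when it shifts `end` by one.

```python
def annotate(tokens, start, end, representative):
    """Mark tokens start..end-1 (1-based, end exclusive) as
    '「representative」 (mention)' and return the joined sentence."""
    out = []
    for i, word in enumerate(tokens, start=1):
        if i == start:
            # Opening part goes in front of the first token of the mention
            out.append('「' + representative + '」 (' + word)
        else:
            out.append(word)
        if i == end - 1:
            # Closing parenthesis goes after the last token of the mention
            out[-1] += ')'
    return ' '.join(out)


# Hypothetical example sentence: 'He' (token 1) refers to 'Alan Turing'
tokens = ['He', 'proposed', 'the', 'test', '.']
print(annotate(tokens, 1, 2, 'Alan Turing'))
```

The original mention stays visible inside the parentheses, so a reader can still see what the text said before the substitution.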

In [1]:
import xml.etree.ElementTree as ET

In [2]:
# Parse the XML file containing the analysis results
root = ET.parse('./nlp.txt.xml')

In [3]:
# Enumerate the coreference chains and build a dictionary of the positions
# to be replaced with the representative mention.
#   Format: {(sentence id, start token id): (end token id, representative mention), ...}
replaces = {}
for coreference in root.iterfind('./document/coreference/coreference'):

    # Get the representative mention
    representative = coreference.findtext('./mention[@representative="true"]/text')

    # Enumerate the non-representative mentions and add them to the dictionary
    for mention in coreference.iterfind('./mention'):
        if mention.get('representative') is None:
            sentence_id = mention.findtext('sentence')
            start = mention.findtext('start')
            # 'end' in the XML is exclusive; shift it to the last token of the mention
            end = int(mention.findtext('end')) - 1

            # If the key already exists, simply overwrite it (last one wins)
            replaces[(sentence_id, start)] = (end, representative)
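
To make the shape of this dictionary concrete, here is a small sketch that runs the same logic on a hand-written XML fragment. The fragment, the helper name `build_replaces`, and the sample values are all hypothetical; the fragment only imitates the elements the code above reads (`mention`, `sentence`, `start`, `end`, `text`) and is not taken from `nlp.txt.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating the coreference part of the XML file
SAMPLE = """
<root><document><coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence><start>1</start><end>3</end>
      <text>Alan Turing</text>
    </mention>
    <mention>
      <sentence>2</sentence><start>1</start><end>2</end>
      <text>He</text>
    </mention>
  </coreference>
</coreference></document></root>
""".strip()


def build_replaces(root):
    """Map (sentence id, start token id) to (last token id, representative text)."""
    replaces = {}
    for coref in root.iterfind('./document/coreference/coreference'):
        rep = coref.findtext('./mention[@representative="true"]/text')
        for mention in coref.iterfind('./mention'):
            if mention.get('representative') is None:
                key = (mention.findtext('sentence'), mention.findtext('start'))
                # 'end' is exclusive, so the last token of the mention is end - 1
                replaces[key] = (int(mention.findtext('end')) - 1, rep)
    return replaces


sample_replaces = build_replaces(ET.fromstring(SAMPLE))
print(sample_replaces)
```

The keys stay as strings because they are later compared with the `id` attributes of `sentence` and `token` elements, which ElementTree also returns as strings; only the end position is converted to `int`.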

In [4]:
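# Print the text while applying the replacements stored in replaces
for sentence in root.iterfind('./document/sentences/sentence'):
    sentence_id = sentence.get('id')
    end = 0     # no replacement is open at the start of a sentence (token ids start at 1)

    for token in sentence.iterfind('./tokens/token'):
        token_id = token.get('id')

        # Start of a replacement
        if (sentence_id, token_id) in replaces:

            # Look up the end position and the representative mention
            (end, representative) = replaces[(sentence_id, token_id)]

            # Insert the representative mention and an opening parenthesis (end='' suppresses the newline)
            print('「' + representative + '」 (', end='')

        # Print the token (end='' suppresses the newline)
        print(token.findtext('word'), end='')

        # At the last token of the mention, insert the closing parenthesis
        if int(token_id) == end:
            print(')', end='')
            end = 0

        # A space is also emitted before sentence-final punctuation; this is left as-is
        print(' ', end='')

    print()     # newline after each sentence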

Natural language processing From Wikipedia , the free encyclopedia Natural language processing -LRB- NLP -RRB- is a field of computer science , artificial intelligence , and linguistics concerned with the interactions between computers and human -LRB- natural -RRB- languages . 
As such , 「NLP」 (NLP) is related to the area of human-computer interaction . 
Many challenges in 「NLP」 (NLP) involve natural language understanding , that is , enabling computers to derive meaning from human or natural language input , and others involve natural language generation . 
History The history of 「NLP」 (NLP) generally starts in the 1950s , although work can be found from earlier periods . 
In 1950 , Alan Turing published an article titled `` Computing Machinery and Intelligence '' which proposed what is now called the Turing test as a criterion of intelligence . 
The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English . 
The authors cl