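Treating whitespace as the word delimiter, take the output of problem 50 as input and print it one word per line. At the end of each sentence, print a blank line.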

In [1]:
import re

In [2]:
fname = './nlp.txt'

In [3]:
def nlp_lines():
    '''Generator that reads nlp.txt one sentence at a time.
    Reads nlp.txt line by line and yields each sentence in turn.

    Returns:
    A string containing one sentence
    '''
    with open(fname) as lines:

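        # Compile the sentence-splitting regular expression.
        # Inside [...] a '|' is a literal character, not alternation,
        # so the class lists only the terminators themselves.
        pattern = re.compile(r'''
            (
                ^                   # start of line
                .*?                 # any characters, non-greedy
                [.;:?!]             # . or ; or : or ? or !
            )
            \s                      # whitespace character
            (
                [A-Z].*             # uppercase letter onward (= the following sentences)
            )
        ''', re.MULTILINE | re.VERBOSE | re.DOTALL)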
        for line in lines:

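            line = line.strip()     # strip leading and trailing whitespace
            while len(line) > 0:

                # Take one sentence from the line
                match = pattern.match(line)
                if match:

                    # Yield the extracted sentence
                    yield match.group(1)        # first sentence
                    line = match.group(2)       # the remaining sentences

                else:

                    # No delimiter, so the rest is one sentence
                    # (e.g. a title line that does not end with a period)
                    yield line
                    line = ''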
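
The splitting rule above can be tried on an in-memory string without nlp.txt. The following is a minimal sketch, not the notebook's own code: `split_sentences` and `SENTENCE_END` are hypothetical names introduced for illustration, and the pattern is a compact form of the same rule (a terminator, one whitespace character, then an uppercase letter).

```python
import re

# Hypothetical helper (not part of the notebook): the same rule as
# nlp_lines(), applied to a string instead of the file nlp.txt.
SENTENCE_END = re.compile(r'(^.*?[.;:?!])\s([A-Z].*)', re.DOTALL)


def split_sentences(line):
    """Yield the sentences found in one line of text."""
    line = line.strip()
    while line:
        match = SENTENCE_END.match(line)
        if match:
            yield match.group(1)    # first sentence
            line = match.group(2)   # everything after it
        else:
            yield line              # no terminator followed by a capital
            line = ''


print(list(split_sentences('It works. Does it? Yes: Mostly, e.g. here')))
# → ['It works.', 'Does it?', 'Yes:', 'Mostly, e.g. here']
```

Because a split needs an uppercase letter after the whitespace, the periods in `e.g.` do not end a sentence here. The same condition means a sentence that starts with a digit or a quotation mark would not be split off.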

In [4]:
def nlp_words():
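    '''Generator that yields nlp.txt one word at a time.
    Yields an empty string at the end of each sentence.

    Returns:
    One word, or an empty string at the end of a sentence
    '''
    for line in nlp_lines():

        # Split into words and strip trailing punctuation
        # (changed from the original to handle consecutive spaces)
        # for word in line.split(' '):
        for word in re.split(" +", line):
            yield word.rstrip('.,;:?!')

        # An empty string marks the end of the sentence
        yield ''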
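
The word-level step can also be checked in isolation. This is a small sketch under the same assumptions as `nlp_words()`; `sentence_to_words` is a hypothetical name used only for this example.

```python
import re


def sentence_to_words(sentence):
    """Split on runs of spaces and strip trailing punctuation from each word."""
    return [word.rstrip('.,;:?!') for word in re.split(' +', sentence)]


print(sentence_to_words('As such,  NLP is related (closely) to HCI.'))
# → ['As', 'such', 'NLP', 'is', 'related', '(closely)', 'to', 'HCI']
```

Only the characters `.,;:?!` are removed, and only from the right-hand end of a word, so parentheses and quotation marks remain attached, as in `(NLP)` and `"Computing` in the output of this notebook.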

In [5]:
# Read the file and print one word per line
for word in nlp_words():
    print(word)

Natural
language
processing

From
Wikipedia
the
free
encyclopedia

Natural
language
processing
(NLP)
is
a
field
of
computer
science
artificial
intelligence
and
linguistics
concerned
with
the
interactions
between
computers
and
human
(natural)
languages

As
such
NLP
is
related
to
the
area
of
humani-computer
interaction

Many
challenges
in
NLP
involve
natural
language
understanding
that
is
enabling
computers
to
derive
meaning
from
human
or
natural
language
input
and
others
involve
natural
language
generation

History

The
history
of
NLP
generally
starts
in
the
1950s
although
work
can
be
found
from
earlier
periods

In
1950
Alan
Turing
published
an
article
titled
"Computing
Machinery
and
Intelligence"
which
proposed
what
is
now
called
the
Turing
test
as
a
criterion
of
intelligence

The
Georgetown
experiment
in
1954
involved
fully
automatic
translation
of
more
than
sixty
Russian
sentences
into
English

The
authors
claimed
that
within
three
or
five
years
machine
translation
would
be
a
solved
