# Генератор упражнений по английскому языку

## Задание

**Цель** - создать веб-приложение, которое автоматически преобразует предоставленный текст в упражнения по английскому языку, используя технологии обработки естественного языка (NLP).

**Задачи**

- Изучить имеющиеся примеры упражнений;
- Разработать методы преобразования текста в подобные задания;
- Разработать модуль (класс или набор функций), осуществляющий преобразование текста;
- Создать приложение для демонстрации заказчику, например, с помощью платформы Streamlit.

**Исходные данные**

Little Red Cap. Jacob and Wilhelm Grimm - Source: "Rothkäppchen," Kinder- und Hausmärchen, 1st ed. (Berlin: Realschulbuchhandlung, 1812), v. 1, no. 26, pp. 113-18. Translated by D. L. Ashliman.

## Парсинг данных

In [1]:
import re
import random

import pandas as pd
import gensim.downloader as api
import spacy
import en_core_web_sm
import contractions
import pyinflect

from sentence_splitter import SentenceSplitter
from langdetect import detect
from word_forms.word_forms import get_word_forms

In [2]:
def read_text(file_path):
    with open(file_path, encoding='utf-8') as file:
        text = file.read()
    return text

In [3]:
text = read_text('Little_Red_Cap_Jacob_and_Wilhelm_Grimm.txt')

Выполним сегментацию - разделим текст на предложения.

In [4]:
splitter = SentenceSplitter(language='en')
sentences = splitter.split(text=text)

* `sentence` - исходное предложение,
* `type` - тип упражнения,
* `description` - задание,
* `object` - объект задания,
* `response_options` - варианты ответа,
* `right_answer` - правильный ответ.

In [5]:
df = pd.DataFrame(columns=['sentence', 'type', 'description', 'object', 'response_options', 'right_answer'])

In [6]:
sentences = list(filter(lambda x: x, sentences))

In [7]:
df['sentence'] = sentences

In [8]:
df

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Little Red Cap,,,,,
1,Jacob and Wilhelm Grimm,,,,,
2,Once upon a time there was a sweet little girl.,,,,,
3,"Everyone who saw her liked her, but most of al...",,,,,
4,Once she gave her a little cap made of red vel...,,,,,
...,...,...,...,...,...,...
107,Marie Hassenpflug (1788-1856) provided them wi...,,,,,
108,The German title of this tale is Rotkäppchen (...,,,,,
109,Link to an English translation of the Grimms' ...,,,,,
110,Link to the German text of the Grimms' final v...,,,,,


## Разработка заданий

In [9]:
# загрузка предварительно обученной модели с помощью API Gensim Downloader
model = api.load('glove-wiki-gigaword-100')

In [10]:
# a general-purpose model with tagging, parsing, lemmatization and named entity recognition
# малая модель spacy
nlp = spacy.load('en_core_web_sm')

### Выбор пропущенного слова

#### Выбор правильной формы глагола

In [11]:
def select_verb_form(row):
    token_list = [token for token in nlp(row['sentence']) if token.pos_ == 'VERB']
    if len(token_list) == 0 or detect(row['sentence']) != 'en':
        return row
    
    try:
        word = random.choice(token_list).text
        response_options = list(get_word_forms(word.lower())['v'])
        if not response_options:
            return row
        response_options = [w for w in response_options if ' not' not in w]
        if word.istitle():
            response_options = [w.title() for w in response_options]
        random.shuffle(response_options)

        row['type'] = 'select_verb_form'
        row['description'] = 'Выберите глагол в правильной форме'
        row['object'] = re.sub(f'\\b{word}\\b', '_____', row['sentence'], count=1)
        row['response_options'] = response_options
        row['right_answer'] = word
    except:
        pass
    return row

In [12]:
df_1 = (df
        .copy()
        .apply(select_verb_form, axis=1)
        .dropna().reset_index(drop=True))

In [13]:
df_1.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Once upon a time there was a sweet little girl.,select_verb_form,Выберите глагол в правильной форме,Once upon a time there _____ a sweet little girl.,"[was, are, being, am, isn't, weren't, aren't, ...",was
1,"Everyone who saw her liked her, but most of al...",select_verb_form,Выберите глагол в правильной форме,"Everyone who saw her liked her, but most of al...","[knowing, knew, knows, known, know]",know
2,Once she gave her a little cap made of red vel...,select_verb_form,Выберите глагол в правильной форме,Once she gave her a little cap _____ of red ve...,"[makes, make, made, making]",made
3,"Because it suited her so well, and she wanted ...",select_verb_form,Выберите глагол в правильной форме,"Because it suited her so well, and she _____ t...","[wanted, want, wants, wanting]",wanted
4,"One day her mother said to her, ""Come Little R...",select_verb_form,Выберите глагол в правильной форме,"One day her mother _____ to her, ""Come Little ...","[said, saying, says, say]",said


In [14]:
df_1.shape

(93, 6)

#### Выбор вспомогательного глагола

In [15]:
def select_auxiliary_verb(row):
    token_list = [token for token in nlp(contractions.fix(row['sentence'])) if token.pos_ == 'AUX']
    if len(token_list) == 0 or detect(row['sentence']) != 'en':
        return row
    
    try:
        word = random.choice(token_list).text
        response_options = list(get_word_forms(word.lower())['v'])
        if not response_options:
            return row
        response_options = [w for w in response_options if not("n't" in w or ' not' in w)]
        
        if word.istitle():
            response_options = [w.title() for w in response_options]
        random.shuffle(response_options)

        row['type'] = 'select_auxiliary_verb'
        row['description'] = 'Выберите вспомогательный глагол'
        obj = re.sub(r'\b(can)(not)\b', r'\1 \2', contractions.fix(row['sentence']), flags=re.IGNORECASE)
        row['object'] = re.sub(f'\\b{word}\\b', '_____', obj, count=1)
        row['response_options'] = response_options
        row['right_answer'] = word
    except:
        pass
    return row

In [16]:
df_2 = (df
        .copy()
        .apply(select_auxiliary_verb, axis=1)
        .dropna().reset_index(drop=True))

In [17]:
df_2.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,"Everyone who saw her liked her, but most of al...",select_auxiliary_verb,Выберите вспомогательный глагол,"Everyone who saw her liked her, but most of al...","[do, did, does, done, doing]",did
1,"Because it suited her so well, and she wanted ...",select_auxiliary_verb,Выберите вспомогательный глагол,"Because it suited her so well, and she wanted ...","[being, be, am, is, been, are, were, was]",be
2,Here is a piece of cake and a bottle of wine.,select_auxiliary_verb,Выберите вспомогательный глагол,Here _____ a piece of cake and a bottle of wine.,"[am, be, were, is, being, been, was, are]",is
3,"She is sick and weak, and they will do her well.",select_auxiliary_verb,Выберите вспомогательный глагол,"She is sick and weak, and they _____ do her well.","[will, willing, willed]",will
4,"She did not know what a wicked animal he was, ...",select_auxiliary_verb,Выберите вспомогательный глагол,She did not know what a wicked animal he _____...,"[was, be, being, been, are, were, is, am]",was


In [18]:
df_2.shape

(49, 6)

#### Выбор подходящего по смыслу слова

In [19]:
def select_similar_word(row):
    
    token_list = [token for token in nlp(row['sentence']) if token.pos_ in ['NOUN', 'ADV', 'ADJ']]
    if len(token_list) == 0 or detect(row['sentence']) != 'en':
        return row
    
    try:
        word = random.choice(token_list).text
        response_options = [tup[0] for tup in model.similar_by_word(word.lower(), topn=3)] + [word]
        if word.istitle():
            response_options = [w.title() for w in response_options]
        random.shuffle(response_options)

        row['type'] = 'select_word_from_similar_words'
        row['description'] = 'Выберите подходящее по смыслу слово'
        row['object'] = re.sub(f'\\b{word}\\b', '_____', row['sentence'], count=1)
        row['response_options'] = response_options
        row['right_answer'] = word
    except:
        pass
    return row

In [20]:
df_3 = (df
        .copy()
        .apply(select_similar_word, axis=1)
        .dropna().reset_index(drop=True))

In [21]:
df_3.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Little Red Cap,select_word_from_similar_words,Выберите подходящее по смыслу слово,_____ Red Cap,"[Little, Too, Much, Bit]",Little
1,Once upon a time there was a sweet little girl.,select_word_from_similar_words,Выберите подходящее по смыслу слово,Once upon a _____ there was a sweet little girl.,"[before, this, time, when]",time
2,"Everyone who saw her liked her, but most of al...",select_word_from_similar_words,Выберите подходящее по смыслу слово,"Everyone who saw her liked her, but most of al...","[last, coming, next, week]",next
3,Once she gave her a little cap made of red vel...,select_word_from_similar_words,Выберите подходящее по смыслу слово,Once she gave her a little cap made of _____ v...,"[red, green, blue, yellow]",red
4,"Because it suited her so well, and she wanted ...",select_word_from_similar_words,Выберите подходящее по смыслу слово,"Because it suited her _____ well, and she want...","[too, but, even, so]",so


In [22]:
df_3.shape

(103, 6)

### Заполнение пропущенного слова

#### Заполнение пропущенного вспомогательного глагола

In [23]:
def fill_missing_aux(row):
    
    token_list = [token for token in nlp(contractions.fix(row['sentence'])) if token.pos_ == 'AUX']
    if len(token_list) == 0 or detect(row['sentence']) != 'en':
        return row
 
    word = random.choice(token_list).text
    
    row['type'] = 'fill_missing_aux'
    row['description'] = 'Впишите пропущенный вспомогательный глагол'
    obj = re.sub(r'\b(can)(not)\b', r'\1 \2', contractions.fix(row['sentence']), flags=re.IGNORECASE)
    row['object'] = re.sub(fr'\b{word}\b', '_____', obj, count=1)
    row['response_options'] = []
    row['right_answer'] = word

    return row

In [24]:
df_4 = (df
        .copy()
        .apply(fill_missing_aux, axis=1)
        .dropna().reset_index(drop=True))

In [25]:
df_4.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,"Everyone who saw her liked her, but most of al...",fill_missing_aux,Впишите пропущенный вспомогательный глагол,"Everyone who saw her liked her, but most of al...",[],did
1,"Because it suited her so well, and she wanted ...",fill_missing_aux,Впишите пропущенный вспомогательный глагол,"Because it suited her so well, and she wanted ...",[],be
2,Here is a piece of cake and a bottle of wine.,fill_missing_aux,Впишите пропущенный вспомогательный глагол,Here _____ a piece of cake and a bottle of wine.,[],is
3,"She is sick and weak, and they will do her well.",fill_missing_aux,Впишите пропущенный вспомогательный глагол,"She is sick and weak, and they _____ do her well.",[],will
4,"Behave yourself on the way, and do not leave t...",fill_missing_aux,Впишите пропущенный вспомогательный глагол,"Behave yourself on the way, and do not leave t...",[],be


In [26]:
df_4.shape

(55, 6)

In [27]:
df_4.right_answer.unique()

array(['did', 'be', 'is', 'will', 'was', 'are', 'should', 'does', 'must',
       'have', 'do', 'could', 'am', 'had', 'has', 'were', 'been', 'can'],
      dtype=object)

#### Заполнение пропущенного определителя

In [28]:
def fill_missing_det(row):
    
    token_list = [token for token in nlp(row['sentence']) if token.tag_ == 'DT']
    if len(token_list) == 0 or detect(row['sentence']) != 'en':
        return row
 
    word = random.choice(token_list).text
    
    row['type'] = 'fill_missing_det'
    row['description'] = 'Впишите пропущенный определитель'
    row['object'] = re.sub(fr'\b{word}\b', '_____', row['sentence'], count=1)
    row['response_options'] = []
    row['right_answer'] = word

    return row

In [29]:
df_5 = (df
        .copy()
        .apply(fill_missing_det, axis=1)
        .dropna().reset_index(drop=True))

In [30]:
df_5.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Once upon a time there was a sweet little girl.,fill_missing_det,Впишите пропущенный определитель,Once upon _____ time there was a sweet little ...,[],a
1,"Everyone who saw her liked her, but most of al...",fill_missing_det,Впишите пропущенный определитель,"Everyone who saw her liked her, but most of al...",[],the
2,Once she gave her a little cap made of red vel...,fill_missing_det,Впишите пропущенный определитель,Once she gave her _____ little cap made of red...,[],a
3,"Because it suited her so well, and she wanted ...",fill_missing_det,Впишите пропущенный определитель,"Because it suited her so well, and she wanted ...",[],the
4,Here is a piece of cake and a bottle of wine.,fill_missing_det,Впишите пропущенный определитель,Here is _____ piece of cake and a bottle of wine.,[],a


In [31]:
df_5.shape

(79, 6)

In [32]:
df_5.right_answer.unique()

array(['a', 'the', 'The', 'some', 'an', 'all', 'that', 'this', 'A'],
      dtype=object)

#### Заполнение пропущенного притяжательного местоимения

In [33]:
def fill_missing_prp(row):
    
    token_list = [token for token in nlp(row['sentence']) if token.tag_ == 'PRP$']
    if len(token_list) == 0 or detect(row['sentence']) != 'en':
        return row
 
    word = random.choice(token_list).text
    
    row['type'] = 'fill_missing_prp'
    row['description'] = 'Впишите пропущенное притяжательное местоимение'
    row['object'] = re.sub(fr'\b{word}\b', '_____', row['sentence'], count=1)
    row['response_options'] = []
    row['right_answer'] = word

    return row

In [34]:
df_6 = (df
        .copy()
        .apply(fill_missing_prp, axis=1)
        .dropna().reset_index(drop=True))

In [35]:
df_6.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,"Everyone who saw her liked her, but most of al...",fill_missing_prp,Впишите пропущенное притяжательное местоимение,"Everyone who saw _____ liked her, but most of ...",[],her
1,"One day her mother said to her, ""Come Little R...",fill_missing_prp,Впишите пропущенное притяжательное местоимение,"One day _____ mother said to her, ""Come Little...",[],her
2,Take them to your grandmother.,fill_missing_prp,Впишите пропущенное притяжательное местоимение,Take them to _____ grandmother.,[],your
3,Mind your manners and give her my greetings.,fill_missing_prp,Впишите пропущенное притяжательное местоимение,Mind _____ manners and give her my greetings.,[],your
4,"Behave yourself on the way, and do not leave t...",fill_missing_prp,Впишите пропущенное притяжательное местоимение,"Behave yourself on the way, and do not leave t...",[],your


In [36]:
df_6.shape

(22, 6)

In [37]:
df_6.right_answer.unique()

array(['her', 'your', 'Her', 'my', 'his'], dtype=object)

### Структура предложения

#### Группа слов существительного

In [38]:
def determine_type_of_noun_phrases(row):
    chunk_list = [ch for ch in nlp(row['sentence']).noun_chunks if len(ch) > 2]
    if len(chunk_list) < 2 or detect(row['sentence']) != 'en':
        return row
    noun_chunk = random.choice(chunk_list)
    response_options = list({spacy.explain(ch.root.dep_) for ch in nlp(row['sentence']).noun_chunks})
    if len(response_options) < 2:
        return row
    random.shuffle(response_options)
    
    row['type'] = 'base_noun_phrases'
    row['description'] = 'Чем является главное существительное в выделенной фразе'
    row['object'] = re.sub(fr'\b({noun_chunk.text})\b', r'**\1**', row['sentence'])
    row['response_options'] = response_options
    row['right_answer'] = spacy.explain(noun_chunk.root.dep_)
    return row

In [39]:
df_7 = (df
        .copy()
        .apply(determine_type_of_noun_phrases, axis=1)
        .dropna().reset_index(drop=True))

In [40]:
df_7.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,"""Her house is a good quarter hour from here in...",base_noun_phrases,Чем является главное существительное в выделен...,"""Her house is a good quarter hour from here in...","[attribute, object of preposition, nominal sub...",object of preposition
1,"""Oh, grandmother, what big ears you have!""",base_noun_phrases,Чем является главное существительное в выделен...,"""**Oh, grandmother**, what big ears you have!""","[direct object, nominal subject]",direct object
2,"""Oh, grandmother, what big eyes you have!""",base_noun_phrases,Чем является главное существительное в выделен...,"""Oh, grandmother, **what big eyes** you have!""","[direct object, nominal subject]",direct object
3,"""Oh, grandmother, what big hands you have!""",base_noun_phrases,Чем является главное существительное в выделен...,"""Oh, grandmother, **what big hands** you have!""","[nominal subject, direct object]",direct object
4,"""Oh, grandmother, what a horribly big mouth yo...",base_noun_phrases,Чем является главное существительное в выделен...,"""**Oh, grandmother**, what a horribly big mout...","[nominal subject, direct object]",direct object


In [41]:
df_7.shape

(14, 6)

In [42]:
df_7.right_answer.unique()

array(['object of preposition', 'direct object', 'nominal subject',
       'attribute', 'appositional modifier'], dtype=object)

#### Части речи

In [43]:
def restore_order_of_parts_of_speech(row):
    sentence = re.sub('"', '', contractions.fix(row['sentence']))
    token_list = [spacy.explain(token.pos_) for token in nlp(sentence)]
    if len(token_list) < 3 or len(token_list) > 10 or detect(row['sentence']) != 'en':
        return row

    response_options = token_list[:]
    random.shuffle(response_options)
    
    row['type'] = 'part_of_speech'
    row['description'] = 'Восстановите порядок следования частей речи в предложении'
    row['object'] = sentence
    row['response_options'] = response_options
    row['right_answer'] = token_list
    return row

In [44]:
df_8 = (df
        .copy()
        .apply(restore_order_of_parts_of_speech, axis=1)
        .dropna().reset_index(drop=True))

In [45]:
df_8.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Little Red Cap,part_of_speech,Восстановите порядок следования частей речи в ...,Little Red Cap,"[proper noun, proper noun, adjective]","[adjective, proper noun, proper noun]"
1,Jacob and Wilhelm Grimm,part_of_speech,Восстановите порядок следования частей речи в ...,Jacob and Wilhelm Grimm,"[coordinating conjunction, proper noun, proper...","[proper noun, coordinating conjunction, proper..."
2,Take them to your grandmother.,part_of_speech,Восстановите порядок следования частей речи в ...,Take them to your grandmother.,"[adposition, pronoun, noun, verb, punctuation,...","[verb, pronoun, adposition, pronoun, noun, pun..."
3,Mind your manners and give her my greetings.,part_of_speech,Восстановите порядок следования частей речи в ...,Mind your manners and give her my greetings.,"[pronoun, noun, pronoun, coordinating conjunct...","[verb, pronoun, noun, coordinating conjunction..."
4,Little Red Cap promised to obey her mother.,part_of_speech,Восстановите порядок следования частей речи в ...,Little Red Cap promised to obey her mother.,"[proper noun, verb, punctuation, proper noun, ...","[adjective, proper noun, proper noun, verb, pa..."


In [46]:
df_8.shape

(37, 6)

#### Расстановка слов в правильном порядке

In [47]:
def restore_word_order(row):
    sentence = re.sub('"', '', contractions.fix(row['sentence']))
    token_list = [token.text for token in nlp(sentence)]
    if len(token_list) < 3 or len(token_list) > 9 or detect(row['sentence']) != 'en':
        return row

    response_options = token_list[:]
    random.shuffle(response_options)
    
    row['type'] = 'word_order'
    row['description'] = 'Расставьте слова предложения в правильном порядке'
    row['object'] = ' '
    row['response_options'] = response_options
    row['right_answer'] = token_list
    return row

In [48]:
df_9 = (df
        .copy()
        .apply(restore_word_order, axis=1)
        .dropna().reset_index(drop=True))

In [49]:
df_9.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Little Red Cap,word_order,Расставьте слова предложения в правильном порядке,,"[Cap, Little, Red]","[Little, Red, Cap]"
1,Jacob and Wilhelm Grimm,word_order,Расставьте слова предложения в правильном порядке,,"[and, Grimm, Wilhelm, Jacob]","[Jacob, and, Wilhelm, Grimm]"
2,Take them to your grandmother.,word_order,Расставьте слова предложения в правильном порядке,,"[Take, grandmother, them, ., your, to]","[Take, them, to, your, grandmother, .]"
3,Mind your manners and give her my greetings.,word_order,Расставьте слова предложения в правильном порядке,,"[greetings, ., her, Mind, give, my, and, your,...","[Mind, your, manners, and, give, her, my, gree..."
4,Little Red Cap promised to obey her mother.,word_order,Расставьте слова предложения в правильном порядке,,"[to, mother, obey, Little, ., Cap, her, Red, p...","[Little, Red, Cap, promised, to, obey, her, mo..."


In [50]:
df_9.shape

(28, 6)

### Выбор верного предложения

In [51]:
def select_correct_sentence(row):
    sentence = re.sub('"', '', contractions.fix(row['sentence']))
    sentence = re.sub(r'\b(can)(not)\b', r'\1 \2', sentence, flags=re.IGNORECASE)
    sentence_1, sentence_2 = sentence, sentence

    token_list = [token.text for token in nlp(sentence) if token.pos_ in ['VERB', 'AUX']]
    if len(token_list) == 0 or detect(row['sentence']) != 'en':
        return row
    
    try:
        random.shuffle(token_list)
        word_count = 0
        while word_count < len(token_list) and word_count <= 1:
            words = list(get_word_forms(token_list[word_count].lower())['v'])
            if not words:
                return row
            words = [w for w in words if not("n't" in w or ' not' in w or w == token_list[word_count].lower())]
            if len(words) < 2:
                return row

            if token_list[word_count].istitle():
                words = [w.title() for w in words]
            random.shuffle(words)

            sentence_1 = re.sub(fr'\b{token_list[word_count]}\b', words[0], sentence_1, count=1)
            sentence_2 = re.sub(fr'\b{token_list[word_count]}\b', words[1], sentence_2, count=1)
            word_count += 1

        response_options = [sentence, sentence_1, sentence_2]
        random.shuffle(response_options)
        row['type'] = 'select_sentence'
        row['description'] = 'Выберите верное предложение'
        row['object'] = ' '
        row['response_options'] = response_options
        row['right_answer'] = sentence
    except:
        pass
    return row

In [52]:
df_10 = df.copy().apply(select_correct_sentence, axis=1)
df_10

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Little Red Cap,,,,,
1,Jacob and Wilhelm Grimm,,,,,
2,Once upon a time there was a sweet little girl.,select_sentence,Выберите верное предложение,,[Once upon a time there was a sweet little gir...,Once upon a time there was a sweet little girl.
3,"Everyone who saw her liked her, but most of al...",select_sentence,Выберите верное предложение,,"[Everyone who saw her likes her, but most of a...","Everyone who saw her liked her, but most of al..."
4,Once she gave her a little cap made of red vel...,select_sentence,Выберите верное предложение,,[Once she gave her a little cap made of red ve...,Once she gave her a little cap made of red vel...
...,...,...,...,...,...,...
107,Marie Hassenpflug (1788-1856) provided them wi...,select_sentence,Выберите верное предложение,,[Marie Hassenpflug (1788-1856) provided them w...,Marie Hassenpflug (1788-1856) provided them wi...
108,The German title of this tale is Rotkäppchen (...,select_sentence,Выберите верное предложение,,[The German title of this tale was Rotkäppchen...,The German title of this tale is Rotkäppchen (...
109,Link to an English translation of the Grimms' ...,select_sentence,Выберите верное предложение,,[Linking to an English translation of the Grim...,Link to an English translation of the Grimms' ...
110,Link to the German text of the Grimms' final v...,,,,,


In [53]:
df_10 = (df
        .copy()
        .apply(select_correct_sentence, axis=1)
        .dropna().reset_index(drop=True))

In [54]:
df_10.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Once upon a time there was a sweet little girl.,select_sentence,Выберите верное предложение,,[Once upon a time there am a sweet little girl...,Once upon a time there was a sweet little girl.
1,"Everyone who saw her liked her, but most of al...",select_sentence,Выберите верное предложение,,"[Everyone who saw her likes her, but most of a...","Everyone who saw her liked her, but most of al..."
2,Once she gave her a little cap made of red vel...,select_sentence,Выберите верное предложение,,[Once she given her a little cap makes of red ...,Once she gave her a little cap made of red vel...
3,"Because it suited her so well, and she wanted ...",select_sentence,Выберите верное предложение,,"[Because it suited her so well, and she wantin...","Because it suited her so well, and she wanted ..."
4,"One day her mother said to her, ""Come Little R...",select_sentence,Выберите верное предложение,,"[One day her mother said to her, Come Little R...","One day her mother said to her, Come Little Re..."


In [55]:
df_10.shape

(98, 6)

### Объединение данных в один датафрейм

In [56]:
data = pd.concat([df_1, df_2, df_3, df_4, df_5, df_6, df_7, df_8, df_9, df_10], ignore_index=True)
data.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Once upon a time there was a sweet little girl.,select_verb_form,Выберите глагол в правильной форме,Once upon a time there _____ a sweet little girl.,"[was, are, being, am, isn't, weren't, aren't, ...",was
1,"Everyone who saw her liked her, but most of al...",select_verb_form,Выберите глагол в правильной форме,"Everyone who saw her liked her, but most of al...","[knowing, knew, knows, known, know]",know
2,Once she gave her a little cap made of red vel...,select_verb_form,Выберите глагол в правильной форме,Once she gave her a little cap _____ of red ve...,"[makes, make, made, making]",made
3,"Because it suited her so well, and she wanted ...",select_verb_form,Выберите глагол в правильной форме,"Because it suited her so well, and she _____ t...","[wanted, want, wants, wanting]",wanted
4,"One day her mother said to her, ""Come Little R...",select_verb_form,Выберите глагол в правильной форме,"One day her mother _____ to her, ""Come Little ...","[said, saying, says, say]",said


In [57]:
data.shape

(578, 6)

In [58]:
# сохранение датафрейма с заданиями
# data.to_json('data_js.json')

Разработанный набор функций сохранен в файле `english_exercises.py`.

In [59]:
from english_exercises import (select_verb_form,
                               select_auxiliary_verb,
                               select_similar_word,
                               fill_missing_aux,
                               fill_missing_det,
                               fill_missing_prp,
                               determine_type_of_noun_phrases,
                               restore_order_of_parts_of_speech,
                               restore_word_order,
                               select_correct_sentence)

def create_df(text):
    splitter = SentenceSplitter(language='en')
    sentences = splitter.split(text=text)
    sentences = list(filter(lambda x: x, sentences))
    df = pd.DataFrame(columns=['sentence', 'type', 'description', 'object', 'response_options', 'right_answer'])
    df['sentence'] = sentences
    df_1 = df.copy().apply(select_verb_form, axis=1).dropna().reset_index(drop=True)
    df_2 = df.copy().apply(select_auxiliary_verb, axis=1).dropna().reset_index(drop=True)
    df_3 = df.copy().apply(select_similar_word, axis=1).dropna().reset_index(drop=True)
    df_4 = df.copy().apply(fill_missing_aux, axis=1).dropna().reset_index(drop=True)
    df_5 = df.copy().apply(fill_missing_det, axis=1).dropna().reset_index(drop=True)
    df_6 = df.copy().apply(fill_missing_prp, axis=1).dropna().reset_index(drop=True)
    df_7 = df.copy().apply(determine_type_of_noun_phrases, axis=1).dropna().reset_index(drop=True)
    df_8 = df.copy().apply(restore_order_of_parts_of_speech, axis=1).dropna().reset_index(drop=True)
    df_9 = df.copy().apply(restore_word_order, axis=1).dropna().reset_index(drop=True)
    df_10 = df.copy().apply(select_correct_sentence, axis=1).dropna().reset_index(drop=True)
    return pd.concat([df_1, df_2, df_3, df_4, df_5, df_6, df_7, df_8, df_9, df_10], ignore_index=True)

In [60]:
c_data = create_df(text)
c_data.head()

Unnamed: 0,sentence,type,description,object,response_options,right_answer
0,Once upon a time there was a sweet little girl.,select_verb_form,Выберите глагол в правильной форме,Once upon a time there _____ a sweet little girl.,"[isn't, am, aren't, be, being, been, are, were...",was
1,"Everyone who saw her liked her, but most of al...",select_verb_form,Выберите глагол в правильной форме,"Everyone who saw her liked her, but most of al...","[give, giving, given, gave, gives]",give
2,Once she gave her a little cap made of red vel...,select_verb_form,Выберите глагол в правильной форме,Once she _____ her a little cap made of red ve...,"[gave, gives, giving, give, given]",gave
3,"Because it suited her so well, and she wanted ...",select_verb_form,Выберите глагол в правильной форме,"Because it suited her so well, and she wanted ...","[known, knows, know, knowing, knew]",known
4,"One day her mother said to her, ""Come Little R...",select_verb_form,Выберите глагол в правильной форме,"One day her mother said to her, ""_____ Little ...","[Coming, Come, Came, Comes]",Come


In [61]:
c_data.shape

(584, 6)

In [62]:
c_data['type'].unique()

array(['select_verb_form', 'select_auxiliary_verb',
       'select_word_from_similar_words', 'fill_missing_aux',
       'fill_missing_det', 'fill_missing_prp', 'base_noun_phrases',
       'part_of_speech', 'word_order', 'select_sentence'], dtype=object)

## Вывод

Было создано веб-приложение с помощью библиотеки Streamlit, которое из англоязычного текста формирует упражнения. Это приложение дает возможность пользователям практиковать английский язык на основе их любимых произведений.

На данный момент реализовано генерирование нескольких упражнений, а именно:

- на выбор пропущенного слова в предложении:
    - правильной формы глагола,
    - вспомогательного глагола,
    - подходящего по смыслу слова,
- на заполнение пропусков в предложении:
    - вспомогательным глаголом,
    - определителем,
    - притяжательным местоимением,
- на структуру предложения:
    - тип главного существительного в именной группе (noun phrase),
    - части речи слов в предложении,
    - порядок слов в предложении.

На основе текста "Красная Шапочка" братьев Гримм был подготовлен датасет, включающий каждое предложение, тип упражнения, описание упражнения, преобразованное предложение (само задание), варианты ответов и правильный ответ. Для каждого предложения проверялась возможность генерирования всех разработанных упражнений. Если пользователь не введет свой текст, по умолчанию будет загружен датасет на основе текста "Красная Шапочка".

Набор функций, осуществляющий преобразования текста, находится в файле `english_exercises.py`. Само приложение - `english_exercises_app.py`. Посмотреть работу приложения можно через платформу Streamlit.