#Model selection

As already established, "understanding" is assessed by checking whether selected questions can be answered on the basis of the text under evaluation. This requires a "model" that receives a text and a set of questions and determines, for each question, whether the text is sufficient to answer it. Such a model is hard to build and has no single correct solution. Many different models can approximate a valid solution, so the "most suitable" one has to be chosen with evaluation metrics. In this case the relevant metrics are:

1. **True positives (TP)**: The cases in which the model correctly predicts a positive result. For example, in a test for a disease, TP counts how many times the model correctly predicts that a person has the disease.

2. **False positives (FP)**: The cases in which the model incorrectly predicts a positive result. In the same example, FP counts how many times the model predicts that someone has the disease when they actually do not.

3. **True negatives (TN)**: The cases in which the model correctly predicts a negative result. TN counts how many times the model correctly predicts that a person does NOT have the disease.

4. **False negatives (FN)**: The cases in which the model incorrectly predicts a negative result. FN counts how many times the model predicts that a person does not have the disease when they actually do.

5. **Accuracy**: The fraction of correctly predicted cases out of all examples. Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN).

6. **Positive Predictive Value (Precision)**: Of the cases predicted as positive, how many are truly positive. Formula: Precision = TP / (TP + FP).

7. **True Positive Rate (Recall, Sensitivity)**: Of all truly positive cases, how many were correctly predicted as positive. Formula: Recall = TP / (TP + FN).

8. **True Negative Rate (Specificity)**: Of all truly negative cases, how many were correctly predicted as negative. Formula: Specificity = TN / (TN + FP).

9. **Negative Predictive Value**: Of the cases predicted as negative, how many are truly negative. Formula: NPV = TN / (TN + FN).

10. **F1 score**: The harmonic mean of precision and recall. It balances the two and is especially useful when the classes are imbalanced. Formula: F1 = 2 * ((Precision * Recall) / (Precision + Recall)).

11. **Macro-average F1 Score**: The unweighted mean of the per-class F1 scores when there is more than one class. An F1 score is computed for each class independently and the scores are then averaged.
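
The formulas above can be sketched in a few lines of Python. This is only an illustration: the function names (`safe_div`, `binary_metrics`, `macro_f1`) and the example counts are hypothetical, and returning 0.0 for an empty denominator is an assumption rather than something the text prescribes.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (an assumption)."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute the metrics listed above from the four confusion counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def macro_f1(per_class_counts):
    """Average the F1 scores computed independently for each class."""
    scores = [binary_metrics(*counts)["f1"] for counts in per_class_counts]
    return safe_div(sum(scores), len(scores))


# Hypothetical counts: 8 TP, 2 FP, 5 TN, 1 FN.
metrics = binary_metrics(tp=8, fp=2, tn=5, fn=1)

# Hypothetical two-class case: one (tp, fp, tn, fn) tuple per class.
macro = macro_f1([(8, 2, 5, 1), (5, 1, 8, 2)])
```

Each tuple passed to `macro_f1` treats one class as "positive" and everything else as "negative", which is what computing F1 "for each class independently" amounts to.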

Computing the metrics above for each model requires a defined set of tests against which the model is evaluated. Each test contains the following:
1. **Text (an array of sentences)**
2. **Questions (an array of questions)**
3. **Answers (an array of the sentences from the text that are needed to answer the questions)**

(The concrete tests are in Appendix 1, ....)

(The computation of the metrics for the models on the tests is in Appendix 1, ....)

Once all metrics have been computed for every model on the tests, they show how the different implementations perform and which model implementation would be the best choice.

(The test results are in Appendix 1, ....)

The models are described below in descending order of F-score: the first model described has the highest F-score and the last has the lowest.

1. **Advanced BERT model**:
  This code implements a model that uses natural language processing (NLP) to determine whether a given question can be answered from a provided text.
  
  Theory:

  The model used here is `bert-large-uncased-whole-word-masking-finetuned-squad`. It belongs to the BERT family of models developed by Google. BERT (Bidirectional Encoder Representations from Transformers) was a turning point in natural language processing; it is pretrained by masking words in text and learning to predict them.

  - **bert-large-uncased**: The large variant of BERT, which does not distinguish upper-case from lower-case letters ("uncased").
    
  - **whole-word-masking**: During pretraining, all sub-word pieces of a word are masked together, instead of masking individual pieces of a word independently.

  - **finetuned-squad**: The model has been fine-tuned (further trained) on SQuAD (Stanford Question Answering Dataset). This means it is specifically trained to answer questions based on a provided passage of text.

  Explanation of the code:

  1. **Initialization**:
      - The model name is set.
      - The tokenizer (BertTokenizer) is initialized from the pretrained model. The tokenizer converts text strings into numeric token indices that the model can process.
      - The BERT model is likewise loaded from the pretrained checkpoint.

  2. **`check` method**:
      - The method takes a text and a list of questions.
      - For each question:
          a. The text and the question are tokenized and converted to a tensor with `max_length=512`.
          b. The tokenized input is passed to the BERT model.
          c. The model returns two sets of outputs, `start_logits` and `end_logits`, which score each position in the input as a possible start or end of the answer. These are logits, i.e. unnormalized scores that become probabilities only after a softmax.
          d. The indices of the positions with the highest start and end scores are found.
          e. If the scores at these positions exceed the given threshold (`confidence_threshold`), the question is considered "answerable" and the corresponding answer is extracted from the text. Otherwise it is assumed that the text contains no answer to the question.
          f. The results are appended to the list `results`.
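
Steps c to e can be sketched without loading the model, by working directly on two lists of logits. This is a minimal sketch under stated assumptions: the text does not say whether the threshold is applied to raw logits or to probabilities, so the softmax, the default threshold of 0.5, the check that the span does not end before it starts, and the function names are all illustrative choices, not the original implementation.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_answerable(start_logits, end_logits, confidence_threshold=0.5):
    """Return (answerable, start_index, end_index) for one question."""
    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)
    # Positions with the highest start and end probability.
    start = max(range(len(start_probs)), key=start_probs.__getitem__)
    end = max(range(len(end_probs)), key=end_probs.__getitem__)
    confident = (start_probs[start] > confidence_threshold
                 and end_probs[end] > confidence_threshold)
    # Assumption: a span that ends before it starts is not a valid answer.
    return confident and start <= end, start, end


# Hypothetical logits for a four-token input.
peaked = is_answerable([0.1, 5.0, 0.2, 0.1], [0.0, 0.3, 6.0, 0.1])
flat = is_answerable([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
```

With sharply peaked logits the span is accepted; with flat logits every position gets probability 0.25, which stays below the threshold, so the question is treated as unanswerable.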

2. **Synonym Word Counter model**:
  This code implements a model that uses natural language processing (NLP) to determine whether a given question can be answered from a provided text.

  Theory:

  * **Tokenization**: Splitting the text into individual words or punctuation marks.
  * **Stop words**: Common words (e.g. "and", "of", "is") that are usually ignored in text analysis because they occur very frequently and carry little information.
  * **Lemmatization**: Converting a word to its base form. For example, "running" becomes "run".
  * **Synonyms**: In this context, synonyms are used to determine whether a word from the question has a semantically close equivalent in the text.

  Explanation of the code:

  1. **__init__()**: Initializes the instance and downloads the required resources from `nltk`.
    
  2. **preprocess(text)**:
      - **Tokenization**: Splits the text into individual words.
      - **Filtering**: Drops stop words and tokens that are not alphanumeric.
      - **Lemmatization**: Reduces the words to their base form.

  3. **has_synonym_match(word1, word2)**: Checks whether the two words are synonyms. Returns `True` if they are, otherwise `False`.

  4. **count_matches(text_words, question)**:
      - **Exact matches**: Adds the value of `exact_match_weight` to the score for every question word that appears in the text.
      - **Synonym matches**: If a question word does not appear directly in the text, the program uses `wordnet.wup_similarity` to measure the similarity between the word and each of its candidate synonyms in the text. The similarity is multiplied by `synonym_match_weight`.
      - **Relevance score**: Finally, the overall relevance of the question to the text is returned.

  5. **check(text, questions)**:
      - Preprocesses the text and processes each question with the `count_matches` method.
      - Appends to the results whether the relevance of the question is greater than 0.5.
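
The scoring idea behind `count_matches` can be illustrated with a self-contained sketch. Several things here are assumptions made for the example: a tiny hand-written similarity table (`TOY_SIMILARITY`) stands in for WordNet's Wu-Palmer similarity, lemmatization is omitted, the stop-word list is a toy one, the default weights are invented, and dividing the score by the number of question words is one plausible way to make the 0.5 cut-off meaningful.

```python
STOP_WORDS = {"the", "a", "an", "is", "of", "what", "do", "does"}  # toy list

# Hypothetical similarity table standing in for WordNet's Wu-Palmer scores.
TOY_SIMILARITY = {
    frozenset({"eat", "consume"}): 0.9,
    frozenset({"big", "large"}): 0.8,
}


def preprocess(text):
    """Lower-case, split on whitespace, drop punctuation and stop words."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w.isalnum() and w not in STOP_WORDS]


def count_matches(text_words, question, exact_match_weight=1.0,
                  synonym_match_weight=0.5):
    """Score a question against the text, normalised by question length."""
    question_words = preprocess(question)
    if not question_words:
        return 0.0
    vocabulary = set(text_words)
    score = 0.0
    for word in question_words:
        if word in vocabulary:
            # Exact match: the question word occurs in the text.
            score += exact_match_weight
        else:
            # Otherwise take the best similarity to any text word.
            best = max((TOY_SIMILARITY.get(frozenset({word, t}), 0.0)
                        for t in vocabulary), default=0.0)
            score += synonym_match_weight * best
    return score / len(question_words)


text_words = preprocess("Elephants consume grasses.")
relevance = count_matches(text_words, "What do elephants eat?")
unrelated = count_matches(text_words, "What is the capital of France?")
```

Here "elephants" is an exact match and "eat" is matched to "consume" through the toy table, so the first question clears the 0.5 cut-off while the unrelated one scores zero.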

3. **Base BERT model (Bidirectional Encoder Representations from Transformers):**
  This code implements an "answerability" checker that uses a BERT question-answering model fine-tuned on the SQuAD 2.0 dataset.

  Theory:

  1. **BERT (Bidirectional Encoder Representations from Transformers)**: BERT is a Transformer-based model trained to represent text as embedding vectors that take into account the context on both sides of each word (bidirectionally). BERT has shown excellent results on a range of natural language tasks, including question answering.

  2. **SQuAD 2.0 (Stanford Question Answering Dataset)**: A question-answering dataset containing questions about paragraphs from Wikipedia. What distinguishes SQuAD 2.0 is that it includes questions that have no answer in the provided text. This makes it possible to train models that can determine whether or not a question has an answer in the text.

  Explanation of the code:

  1. **__init__()**: Loads the pretrained `tokenizer` and the BERT question-answering model. The version of the model fine-tuned on SQuAD 2.0 is used here.

  2. **check(text, questions)**:
      - For each question in the list:
          - Encodes the question and the text with the `tokenizer`, adds the special tokens, and converts them into tensors suitable for the model.
          - Uses the model to obtain the start and end scores of the answer within the text.
          - Finds the start and end indices of the most likely answer.
          - Extracts the answer as a sub-sequence of the input tokens.
          - Checks whether the extracted answer equals the special token `[CLS]`. If it does, the text contains no answer to the question. Otherwise an answer is assumed to exist.
          - Appends the result to the list `answerabilities`.
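
The `[CLS]` check can be sketched on plain Python lists, without the model. The token list and logits below are hypothetical, the function names are illustrative, and treating an empty span (end before start) as "no answer" is an added assumption that the description above does not mention.

```python
def extract_answer(tokens, start_logits, end_logits):
    """Return the highest-scoring token span as a space-joined string."""
    start = max(range(len(tokens)), key=start_logits.__getitem__)
    end = max(range(len(tokens)), key=end_logits.__getitem__)
    return " ".join(tokens[start:end + 1])


def has_answer(tokens, start_logits, end_logits):
    """A span that collapses onto [CLS] signals 'no answer'."""
    answer = extract_answer(tokens, start_logits, end_logits)
    # Assumption: an empty span is also treated as unanswerable.
    return answer not in ("", "[CLS]")


# Hypothetical tokenized input: question, separator, then the text.
tokens = ["[CLS]", "who", "?", "[SEP]", "james", "naismith", "[SEP]"]

found = has_answer(tokens,
                   [0.1, 0.0, 0.0, 0.0, 4.0, 1.0, 0.0],
                   [0.2, 0.0, 0.0, 0.0, 1.0, 5.0, 0.0])
missing = has_answer(tokens,
                     [9.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
                     [9.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0])
```

Models fine-tuned on SQuAD 2.0 are trained to point both the start and the end at position 0 when the passage holds no answer, which is why comparing the extracted span with `[CLS]` works as an answerability test.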

4. **FAST Text Similarity model:**
  This model uses word embeddings produced by `fastText` to determine whether a given question can be answered from a provided text. Like the Sentence Embedding Similarity model, it compares the vector representations with cosine similarity. In detail:

  Theory:

  1. **fastText Embeddings**: fastText is a model for learning vector representations of words. Its distinguishing feature is that it represents each word as a combination of character n-grams. This lets it produce vectors even for words that are not in its vocabulary (for example, rare or unseen words).

  2. **Cosine similarity**: Measures the similarity between two vectors as the cosine of the angle between them.

  Explanation of the code:

  1. **__init__()**: Loads the pretrained `fastText` model from the file 'cc.en.300.bin'.

  2. **get_embedding(text)**:
      - Splits the text into words.
      - For each word, calls the model's `get_word_vector` to obtain a vector representation.
      - Averages all word vectors to represent the whole text/sentence as a single vector.
      - If there are no vectors (for example, when the text is empty), returns a zero vector with the model's dimensionality.

  3. **compute_similarity(text1, text2)**:
      - Produces the vector representations of the two texts.
      - Uses cosine similarity to measure how similar the two vectors are.

  4. **can_answer(text, question, threshold=0.51)**:
      - Splits the provided text into individual sentences.
      - Compares the question with every sentence of the text using the similarity function.
      - Returns `True` if the maximum similarity between the question and any sentence in the text is greater than the given threshold.

  5. **check(text, questions)**: Evaluates, for each of the provided questions, whether it can be answered from the provided text.
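
The averaging step in `get_embedding` can be shown with a toy lookup table in place of the real fastText model. The three-dimensional vectors in `TOY_VECTORS` and the local `get_word_vector` helper are hypothetical stand-ins; in particular, the real model builds vectors for unknown words from character n-grams, whereas this sketch simply returns zeros for them.

```python
DIM = 3  # toy dimensionality; 'cc.en.300.bin' uses 300 dimensions

# Hypothetical word vectors standing in for the fastText model.
TOY_VECTORS = {
    "sun": [1.0, 0.0, 0.0],
    "star": [0.8, 0.2, 0.0],
    "tomato": [0.0, 0.0, 1.0],
}


def get_word_vector(word):
    """Toy lookup; unknown words map to a zero vector in this sketch."""
    return TOY_VECTORS.get(word, [0.0] * DIM)


def get_embedding(text):
    """Represent a text as the mean of its word vectors."""
    vectors = [get_word_vector(w) for w in text.lower().split()]
    if not vectors:
        # Empty text: fall back to a zero vector of the right size.
        return [0.0] * DIM
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


sentence_vector = get_embedding("Sun star")
empty_vector = get_embedding("")
```

Averaging discards word order, so this representation treats a sentence as a bag of words.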

5. **Sentence Embedding Similarity model:**
  This model uses sentence embeddings to determine whether a given question can be answered from a provided text. Specifically, it uses `SentenceTransformer` with the pretrained model 'paraphrase-distilroberta-base-v1' to produce the vector representations. In detail:

  Theory:

  1. **Sentence Embeddings**: Vector representations of sentences. Instead of representing individual words as vectors, a single vector represents the whole sentence. With `SentenceTransformer`, the vectors are trained so that semantically similar sentences lie close together in the vector space.

  2. **Cosine similarity**: A measure of similarity between two vectors. It takes values between -1 and 1, where 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions.

  Explanation of the code:

  1. **__init__()**: Initializes `SentenceTransformer` with the pretrained model 'paraphrase-distilroberta-base-v1'.

  2. **get_embedding(text)**: Produces and returns the vector representation (embedding) of the given sentence/text.

  3. **compute_similarity(text1, text2)**:
      - Computes the vector representations of the two texts.
      - Uses cosine similarity to determine how similar the two vectors are. The result lies between -1 and 1, with values close to 1 indicating very similar texts.

  4. **can_answer(text, question, threshold=0.3)**:
      - Splits the provided text into individual sentences.
      - Compares the question with every sentence of the text using the similarity function.
      - Returns `True` if the maximum similarity between the question and any sentence in the text is greater than the given threshold.

  5. **check(text, questions)**: Evaluates, for each of the provided questions, whether it can be answered from the provided text.
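
The `compute_similarity` and `can_answer` steps can be sketched independently of the encoder by passing the embedding function in as a parameter. Everything named here is illustrative: `toy_embed` is a hypothetical keyword counter used only so the example runs without downloading a model, the regex sentence splitter is a simplification, and returning 0.0 for a zero-length vector is an assumption.

```python
import math
import re


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def can_answer(text, question, embed, threshold=0.3):
    """True if some sentence is more similar to the question than threshold."""
    # Naive split after sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    q_vec = embed(question)
    best = max((cosine_similarity(embed(s), q_vec) for s in sentences),
               default=0.0)
    return best > threshold


def toy_embed(sentence):
    """Hypothetical stand-in for a sentence encoder: keyword counts."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [words.count("sun"), words.count("reef"), words.count("jazz")]


text = "The sun is a star. Jazz has roots in blues."
about_sun = can_answer(text, "What is the sun?", toy_embed)
about_reef = can_answer(text, "Where is the reef?", toy_embed)
```

Because only the best-matching sentence counts, a single relevant sentence is enough for a question to be judged answerable, however much unrelated text surrounds it.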


##Tests

In [None]:
test_case_1 = {
    'sentences': [
        "The sun is a fascinating celestial object that many poets have written about.",
        "It's the star at the center of the Solar System.",
        "On a more trivial note, many people enjoy sunbathing during the summer.",
        "The sun is a nearly perfect sphere of hot plasma.",
        "Random fact: tomatoes are one of the most popular fruits used in cuisines worldwide.",
        "This plasma is heated to incandescence by nuclear fusion reactions in its core.",
        "Have you ever thought about the beauty of seashells? They can be so intricate.",
        "The sun radiates energy mainly as light and ultraviolet radiation.",
        "Without the sun, life on Earth would be unimaginable.",
        "Cats are known to nap in sunny spots in households."
    ],
    'questions': [
        "What is at the center of the Solar System?",
        "What type of reactions occur in the sun's core?",
        "How does the sun radiate energy?"
    ],
    'answer': [
        "It's the star at the center of the Solar System.",
        "This plasma is heated to incandescence by nuclear fusion reactions in its core.",
        "The sun radiates energy mainly as light and ultraviolet radiation."
    ]
}

test_case_2 = {
    'sentences': [
        "Elephants are quite unique animals.",
        "They belong to the family Elephantidae and the order Proboscidea.",
        "I've always been a fan of detective novels, especially ones with unexpected plot twists.",
        "Traditionally, two species of elephants are recognized: the African elephant and the Asian elephant.",
        "Speaking of continents, did you know that Antarctica is the driest, windiest, and coldest continent?",
        "Elephants are primarily found in sub-Saharan Africa, South Asia, and Southeast Asia.",
        "It's amazing to think how ancient civilizations like the Egyptians built their empires.",
        "Being herbivores, elephants mainly consume grasses and leaves.",
        "The color blue is often associated with serenity and calmness."
    ],
    'questions': [
        "What do elephants primarily eat?",
        "Where are elephants traditionally found?",
        "To which family do elephants belong?"
    ],
    'answer': [
        "Being herbivores, elephants mainly consume grasses and leaves.",
        "Elephants are primarily found in sub-Saharan Africa, South Asia, and Southeast Asia.",
        "They belong to the family Elephantidae and the order Proboscidea."
    ]
}

test_case_3 = {
    'sentences': [
        "Mount Everest is a challenge many climbers dream of conquering.",
        "It stands as Earth's highest mountain above sea level.",
        "Interestingly, honeybees have five eyes.",
        "This giant is situated in the Mahalangur Himal sub-range of the Himalayas.",
        "There's something truly mesmerizing about the melodies of old jazz songs.",
        "The border between China and Nepal houses this majestic mountain.",
        "On a different note, computer programming requires a lot of patience and persistence.",
        "The elevation of Mount Everest rises to 8,848.86 m (29,031.7 ft).",
        "Do you think squirrels recognize each other when they meet?",
        "The history of exploration related to this mountain is filled with both triumphs and tragedies."
    ],
    'questions': [
        "Where is Mount Everest located?",
        "What is the elevation of Mount Everest?",
        "Is Mount Everest the tallest mountain below sea level?"
    ],
    'answer': [
        "The border between China and Nepal houses this majestic mountain.",
        "The elevation of Mount Everest rises to 8,848.86 m (29,031.7 ft).",
        "It stands as Earth's highest mountain above sea level."
    ]
}

test_case_4 = {
    'sentences': [
        "The world of technology is constantly evolving.",
        "Artificial intelligence (AI) is a testament to that.",
        "It's surprising to think about how much bread is consumed worldwide.",
        "AI is intelligence showcased by machines, contrasting with the natural intelligence humans and animals display.",
        "Next time you have a salad, consider adding some nuts; they're great for texture.",
        'The field of AI is defined as the study of "intelligent agents", devices perceiving their environment and acting to achieve goals.',
        "By the way, the deepest part of the ocean is called the Mariana Trench.",
        "Such agents aim to maximize their chances of success.",
        "On weekends, many people enjoy recreational activities like hiking or reading."
    ],
    'questions': [
        "How does artificial intelligence differ from natural intelligence?",
        'What is an "intelligent agent" in the context of AI?',
        "Are AI agents designed to minimize their chance of success?"
    ],
    'answer': [
        "AI is intelligence showcased by machines, contrasting with the natural intelligence humans and animals display.",
        'The field of AI is defined as the study of "intelligent agents", devices perceiving their environment and acting to achieve goals.',
        "Such agents aim to maximize their chances of success."
    ]
}

test_case_5 = {
    'sentences': [
        "Nature has many wonders, one of which is the Great Barrier Reef.",
        "This reef system is the world's largest, comprising over 2,900 individual reefs.",
        "Bicycles were introduced in the 19th century and are now number approximately one billion worldwide.",
        "Additionally, the reef spans 900 islands, stretching over 2,300 kilometers.",
        "In unrelated news, the Milky Way is a barred spiral galaxy.",
        "The Coral Sea, off the coast of Queensland, Australia, is where the reef is located.",
        "One of my favorite hobbies is stargazing on clear nights.",
        "Researchers and scientists continually study the reef's ecosystem and marine life.",
        "Did you know that snails can sleep for up to 3 years?"
    ],
    'questions': [
        "How many individual reefs make up the Great Barrier Reef?",
        "Where can the Great Barrier Reef be found?",
        "Is the Great Barrier Reef smaller than the Red Sea coral system?"
    ],
    'answer': [
        "This reef system is the world's largest, comprising over 2,900 individual reefs.",
        "The Coral Sea, off the coast of Queensland, Australia, is where the reef is located.",
        "This reef system is the world's largest, comprising over 2,900 individual reefs."
    ]
}

test_case_6 = {
    'sentences': [
        "Mars, often referred to as the Red Planet, has always fascinated astronomers.",
        "It is the fourth planet from the Sun in our solar system.",
        "Many cultures around the world have their own delicious variations of dumplings.",
        "Mars has surface features both of the Moon and Earth.",
        "Water ice has been detected on the planet's surface.",
        "The Martian atmosphere is about 95% carbon dioxide.",
        "In your next visit to a beach, try building a sandcastle! It's quite fun.",
        "Scientists are keen on finding signs of past life on Mars.",
        "Orbiters and rovers sent to Mars aim to study its geology and climate."
    ],
    'questions': [
        "What percentage of the Martian atmosphere is carbon dioxide?",
        "Where does Mars rank in order from the Sun in our solar system?",
        "Have scientists discovered water on Mars?"
    ],
    'answer': [
        "The Martian atmosphere is about 95% carbon dioxide.",
        "It is the fourth planet from the Sun in our solar system.",
        "Water ice has been detected on the planet's surface."
    ]
}

test_case_7 = {
    'sentences': [
        "Jazz is a music genre that originated in the African-American communities.",
        "It has roots in blues and ragtime.",
        "Cacti are interesting plants that store water in their stems.",
        "Jazz has seen various styles emerge, like bebop, funk, and swing.",
        "The improvisation aspect is one of the hallmarks of jazz music.",
        "Louis Armstrong is one of the most influential figures in jazz.",
        "On a different topic, honey has antibacterial properties."
    ],
    'questions': [
        "Who is a notable figure in jazz music?",
        "Where did jazz music originate?",
        "What's a distinct characteristic of jazz?"
    ],
    'answer': [
        "Louis Armstrong is one of the most influential figures in jazz.",
        "Jazz is a music genre that originated in the African-American communities.",
        "The improvisation aspect is one of the hallmarks of jazz music."
    ]
}

test_case_8 = {
    'sentences': [
        "The Nile is a major river in northeastern Africa.",
        "It is commonly regarded as the longest river in the world.",
        "Speaking of cooking, a pinch of salt can elevate most dishes.",
        "The Nile River has two main tributaries: the White Nile and the Blue Nile.",
        "It flows through several countries, including Egypt and Sudan.",
        "The river has played a vital role in the civilizations that have risen on its banks.",
        "By the way, flamingos are pink due to their diet of shrimp and algae."
    ],
    'questions': [
        "Why are flamingos pink?",
        "What are the two main tributaries of the Nile?",
        "Which countries does the Nile flow through?"
    ],
    'answer': [
        "By the way, flamingos are pink due to their diet of shrimp and algae.",
        "The Nile River has two main tributaries: the White Nile and the Blue Nile.",
        "It flows through several countries, including Egypt and Sudan."
    ]
}

test_case_9 = {
    'sentences': [
        "Basketball is a popular sport played worldwide.",
        "It was invented in 1891 by Dr. James Naismith.",
        "Interestingly, the initial games used a soccer ball and two peach baskets as goals.",
        "There's a saying that a book is a window to another world.",
        "In a standard game, each team has five players on the court.",
        "Scoring involves shooting the ball into the opposing team's basket.",
        "The International Basketball Federation (FIBA) governs the game globally."
    ],
    'questions': [
        "Who invented basketball?",
        "How many players from each team are on the court in a standard basketball game?",
        "What was used as goals in the initial games of basketball?"
    ],
    'answer': [
        "It was invented in 1891 by Dr. James Naismith.",
        "In a standard game, each team has five players on the court.",
        "Interestingly, the initial games used a soccer ball and two peach baskets as goals."
    ]
}

test_case_10 = {
    'sentences': [
        "Pandas are native to the mountainous regions of China.",
        "They primarily eat bamboo and are known for their distinct black and white coat.",
        "Next time you bake, try adding some chocolate chips to your cookies.",
        "Unfortunately, these creatures are classified as vulnerable due to habitat loss.",
        "Conservation efforts are ongoing to ensure their survival.",
        "On another note, the Pyramids of Giza are truly marvelous structures.",
        "Despite their size, pandas are quite agile and are good tree climbers."
    ],
    'questions': [
        "What do pandas primarily eat?",
        "Why are pandas considered vulnerable?",
        "Where can pandas be found in the wild?"
    ],
    'answer': [
        "They primarily eat bamboo and are known for their distinct black and white coat.",
        "Unfortunately, these creatures are classified as vulnerable due to habitat loss.",
        "Pandas are native to the mountainous regions of China."
    ]
}

test_case_11 = {
    'sentences': [
        "The Amazon rainforest is one of the world's greatest natural resources.",
        "It spans over nine countries in South America.",
        "Due to its immense size, it's often called the 'lungs of the Earth'.",
        "In the realm of fantasy, dragons are often depicted as powerful and majestic creatures.",
        "The rainforest houses an incredibly diverse range of wildlife and plants.",
        "Sadly, deforestation and logging activities threaten its very existence.",
        "Indigenous tribes have lived in harmony with the forest for thousands of years."
    ],
    'questions': [
        "Where is the Amazon rainforest located?",
        "Why is it referred to as the 'lungs of the Earth'?",
        "What are the threats to the Amazon rainforest?"
    ],
    'answer': [
        "The Amazon rainforest is one of the world's greatest natural resources.",
        "It spans over nine countries in South America.",
        "Due to its immense size, it's often called the 'lungs of the Earth'.",
        "Sadly, deforestation and logging activities threaten its very existence."
    ]
}

test_case_12 = {
    'sentences': [
        "Vincent van Gogh was a Dutch post-impressionist painter.",
        "One of his most famous works is 'Starry Night'.",
        "During his lifetime, he wasn't recognized for his talent and sold very few paintings.",
        "Contrary to popular belief, sharks don't typically target humans for prey.",
        "Van Gogh struggled with mental health issues throughout his life.",
        "He was known for his use of color and emotional intensity in his works.",
        "On a side note, coffee is the second most traded commodity after crude oil."
    ],
    'questions': [
        "Who painted 'Starry Night'?",
        "How was van Gogh's career during his lifetime?",
        "What was distinctive about van Gogh's art?"
    ],
    'answer': [
        "Vincent van Gogh was a Dutch post-impressionist painter.",
        "One of his most famous works is 'Starry Night'.",
        "During his lifetime, he wasn't recognized for his talent and sold very few paintings.",
        "He was known for his use of color and emotional intensity in his works."
    ]
}

test_case_13 = {
    'sentences': [
        "Tea is a beloved beverage consumed by millions worldwide.",
        "It's derived from the Camellia sinensis plant.",
        "Different processing methods lead to various types like green, black, and oolong tea.",
        "Speaking of beverages, fresh orange juice is rich in vitamin C.",
        "Tea culture is deeply rooted in countries like China, India, and England.",
        "The Boston Tea Party was a significant event in American history, unrelated to tea's consumption habits.",
        "Matcha is a type of powdered green tea popular in Japan."
    ],
    'questions': [
        "Where does tea come from?",
        "What are the different types of tea?",
        "Where is Matcha tea popular?"
    ],
    'answer': [
        "It's derived from the Camellia sinensis plant.",
        "Different processing methods lead to various types like green, black, and oolong tea.",
        "Matcha is a type of powdered green tea popular in Japan."
    ]
}

test_case_14 = {
    'sentences': [
        "Soccer, also known as football in many countries, is a popular sport.",
        "The game is played with two teams of eleven players each.",
        "On a culinary note, sushi is a delightful Japanese dish.",
        "The objective is to score by putting the ball into the opposing team's goal.",
        "The FIFA World Cup is the premier event in soccer, attracting teams from around the globe.",
        "Brazil has won the World Cup a record five times."
    ],
    'questions': [
        "How many players are on a soccer team?",
        "What is the objective of soccer?",
        "Which country has won the most World Cups?"
    ],
    'answer': [
        "The game is played with two teams of eleven players each.",
        "The objective is to score by putting the ball into the opposing team's goal.",
        "Brazil has won the World Cup a record five times."
    ]
}

test_case_15 = {
    'sentences': [
        "The moon orbits around the Earth roughly every 27.3 days.",
        "It's Earth's only natural satellite.",
        "Have you ever noticed how cats are curious by nature?",
        "Solar eclipses occur when the moon passes between the Earth and the sun.",
        "There are various phases of the moon, from new moon to full moon.",
        "Neil Armstrong and Buzz Aldrin were the first humans to walk on the moon in 1969.",
        "On another topic, chocolate was once used as currency in ancient civilizations."
    ],
    'questions': [
        "How often does the moon orbit the Earth?",
        "Who were the first humans on the moon?",
        "What happens during a solar eclipse?"
    ],
    'answer': [
        "The moon orbits around the Earth roughly every 27.3 days.",
        "Neil Armstrong and Buzz Aldrin were the first humans to walk on the moon in 1969.",
        "Solar eclipses occur when the moon passes between the Earth and the sun."
    ]
}

test_case_16 = {
    'sentences': [
        "Shakespeare is often hailed as one of the greatest playwrights in history.",
        "His plays like 'Romeo and Juliet' and 'Hamlet' are world-renowned.",
        "By the way, hummingbirds are the only birds that can fly backward.",
        "Shakespeare's works have been translated into every major language.",
        "He wrote a total of 39 plays, 154 sonnets, and two long narrative poems.",
        "Stratford-upon-Avon in England is where Shakespeare was born and raised.",
        "Did you know that chocolate was once used as currency?"
    ],
    'questions': [
        "Where was Shakespeare born?",
        "How many plays did Shakespeare write?",
        "Name one of Shakespeare's famous plays."
    ],
    'answer': [
        "Stratford-upon-Avon in England is where Shakespeare was born and raised.",
        "He wrote a total of 39 plays, 154 sonnets, and two long narrative poems.",
        "His plays like 'Romeo and Juliet' and 'Hamlet' are world-renowned."
    ]
}

test_case_17 = {
    'sentences': [
        "Coffee is a brewed drink prepared from roasted coffee beans.",
        "It originates from tropical areas of Africa.",
        "The majority of coffee species grow in the equatorial region.",
        "On another topic, the Eiffel Tower is a must-visit when in Paris.",
        "Caffeine, a central nervous system stimulant, is a major component of coffee.",
        "Different brewing methods include espresso, French press, and drip brewing.",
        "Did you hear about the whale that could sing?"
    ],
    'questions': [
        "Where does coffee originate from?",
        "What is a major component of coffee?",
        "Name one method of brewing coffee."
    ],
    'answer': [
        "It originates from tropical areas of Africa.",
        "Caffeine, a central nervous system stimulant, is a major component of coffee.",
        "Different brewing methods include espresso, French press, and drip brewing."
    ]
}

test_case_18 = {
    'sentences': [
        "The Amazon Rainforest is the world's largest tropical rainforest.",
        "It spans across nine countries in South America.",
        "By the way, polar bears are excellent swimmers.",
        "The Amazon River, the second longest river in the world, flows through this rainforest.",
        "The forest is home to an incredibly diverse range of wildlife and plants.",
        "Sadly, deforestation poses a major threat to this ecological treasure.",
        "Chocolate chip cookies are loved globally."
    ],
    'questions': [
        "How many countries does the Amazon Rainforest span across?",
        "What major river flows through the Amazon?",
        "What is a threat to the Amazon Rainforest?"
    ],
    'answer': [
        "It spans across nine countries in South America.",
        "The Amazon River, the second longest river in the world, flows through this rainforest.",
        "Sadly, deforestation poses a major threat to this ecological treasure."
    ]
}

test_case_19 = {
    'sentences': [
        "The Internet is a vast network that connects computers worldwide.",
        "It facilitates data exchange and communication across the globe.",
        "On a side note, ostriches have the largest eyes among land animals.",
        "Websites are accessed via browsers like Chrome, Firefox, or Safari.",
        "Email, social media, and online banking are some applications of the Internet.",
        "Tim Berners-Lee is credited with inventing the World Wide Web in 1989.",
        "Did you know that a group of kangaroos is called a troop?"
    ],
    'questions': [
        "Who is credited with inventing the World Wide Web?",
        "What does the Internet facilitate?",
        "How are websites accessed?"
    ],
    'answer': [
        "Tim Berners-Lee is credited with inventing the World Wide Web in 1989.",
        "It facilitates data exchange and communication across the globe.",
        "Websites are accessed via browsers like Chrome, Firefox, or Safari."
    ]
}

test_case_20 = {
    'sentences': [
        "Penguins are flightless birds commonly found in the Southern Hemisphere.",
        "They are excellent swimmers and can dive deep underwater for food.",
        "By the way, the Mona Lisa has no eyebrows in the famous painting.",
        "Emperor penguins are the tallest species, reaching heights of up to 1.2 meters.",
        "Their primary diet consists of fish, squid, and krill.",
        "Penguins have a thick layer of blubber to keep them warm in cold temperatures.",
        "Speaking of cold, ice cream is a favorite treat in summers."
    ],
    'questions': [
        "What is the primary diet of penguins?",
        "Why do penguins have a thick layer of blubber?",
        "Which species of penguin is the tallest?"
    ],
    'answer': [
        "Their primary diet consists of fish, squid, and krill.",
        "Penguins have a thick layer of blubber to keep them warm in cold temperatures.",
        "Emperor penguins are the tallest species, reaching heights of up to 1.2 meters."
    ]
}

test_case_21 = {
    'sentences': [
        "The human brain is a marvel of nature and incredibly complex.",
        "It's responsible for controlling all voluntary and involuntary actions.",
        "Random trivia: the honeybee is the only insect that produces food consumed by humans.",
        "With billions of neurons, the brain processes information at astonishing speeds.",
        "Activities like reading, dancing, and even dreaming are controlled by the brain.",
        "The hippocampus plays a crucial role in memory formation.",
        "On another note, the Grand Canyon is a magnificent natural wonder in the U.S."
    ],
    'questions': [
        "What is the function of the hippocampus?",
        "How many neurons does the human brain have?",
        "Which insect produces food for humans?"
    ],
    'answer': [
        "The hippocampus plays a crucial role in memory formation.",
        "With billions of neurons, the brain processes information at astonishing speeds.",
        "Random trivia: the honeybee is the only insect that produces food consumed by humans."
    ]
}

test_case_22 = {
    'sentences': [
        "The pyramids of Egypt are ancient marvels, built as tombs for pharaohs.",
        "The Great Pyramid of Giza is the largest among them.",
        "Did you know? The blue whale's tongue can weigh as much as an elephant.",
        "These structures have stood for thousands of years and remain largely intact.",
        "Some theories suggest that the pyramids were constructed using astronomical alignments.",
        "Besides tombs, pyramids also had ceremonial and religious significance.",
        "Speaking of ancient, did you know Rome wasn't built in a day?"
    ],
    'questions': [
        "What was the primary purpose of the pyramids?",
        "Which is the largest pyramid in Egypt?",
        "Were the pyramids built overnight?"
    ],
    'answer': [
        "The pyramids of Egypt are ancient marvels, built as tombs for pharaohs.",
        "The Great Pyramid of Giza is the largest among them.",
        "Speaking of ancient, did you know Rome wasn't built in a day?"
    ]
}

test_case_23 = {
    'sentences': [
        "The Mona Lisa is an iconic painting by Leonardo da Vinci.",
        "It's housed in the Louvre Museum in Paris, France.",
        "On a different note, kangaroos are indigenous to Australia.",
        "The painting's enigmatic smile has been a subject of debate for centuries.",
        "Despite its small size, the Mona Lisa attracts millions of visitors each year.",
        "Leonardo used the sfumato technique for the painting's soft transitions.",
        "In pop culture, Mona Lisa has been referenced in numerous songs and movies."
    ],
    'questions': [
        "Where can the Mona Lisa be found?",
        "Who painted the Mona Lisa?",
        "What technique did Leonardo use for the painting?"
    ],
    'answer': [
        "It's housed in the Louvre Museum in Paris, France.",
        "The Mona Lisa is an iconic painting by Leonardo da Vinci.",
        "Leonardo used the sfumato technique for the painting's soft transitions."
    ]
}

test_case_24 = {
    'sentences': [
        "Basketball is a popular sport played by two teams of five players each.",
        "The objective is to shoot the ball into the opposing team's basket.",
        "Changing gears, polar bears primarily feed on seals.",
        "Michael Jordan is often considered the greatest basketball player of all time.",
        "The game involves dribbling, shooting, and defending to gain points.",
        "The NBA (National Basketball Association) is the premier league for this sport in the U.S.",
        "Interestingly, a basketball court's dimensions can vary depending on the league."
    ],
    'questions': [
        "How many players are in a basketball team?",
        "Who is often considered the greatest basketball player?",
        "What's the main league for basketball in the U.S.?"
    ],
    'answer': [
        "Basketball is a popular sport played by two teams of five players each.",
        "Michael Jordan is often considered the greatest basketball player of all time.",
        "The NBA (National Basketball Association) is the premier league for this sport in the U.S."
    ]
}

test_case_25 = {
    'sentences': [
        "Mars, often termed the 'Red Planet', is the fourth planet from the Sun.",
        "It's named after the Roman god of war due to its reddish appearance.",
        "Switching subjects, the violin is a string instrument with a rich history.",
        "Robotic missions from Earth have explored Mars for signs of past life.",
        "Mars has two small moons, Phobos and Deimos.",
        "The planet's surface is replete with valleys, deserts, and polar ice caps.",
        "One day on Mars, known as a sol, is slightly longer than a day on Earth."
    ],
    'questions': [
        "Why is Mars called the 'Red Planet'?",
        "How many moons does Mars have?",
        "What is the duration of a day on Mars?"
    ],
    'answer': [
        "It's named after the Roman god of war due to its reddish appearance.",
        "Mars has two small moons, Phobos and Deimos.",
        "One day on Mars, known as a sol, is slightly longer than a day on Earth."
    ]
}

test_case_26 = {
    'sentences': [
        "The piano is a versatile musical instrument played using a keyboard.",
        "It was invented in Italy during the early 18th century by Bartolomeo Cristofori.",
        "In unrelated trivia, a cat's purring can help lower stress levels in humans.",
        "Classical, jazz, and pop are genres where the piano plays a central role.",
        "There are typically 88 keys on a standard piano, split between black and white keys.",
        "Beethoven and Chopin are among the renowned composers for the piano.",
        "Did you ever notice that the moon can be seen during daylight sometimes?"
    ],
    'questions': [
        "Who invented the piano?",
        "How many keys are on a standard piano?",
        "Name a genre where the piano is prominently used."
    ],
    'answer': [
        "It was invented in Italy during the early 18th century by Bartolomeo Cristofori.",
        "There are typically 88 keys on a standard piano, split between black and white keys.",
        "Classical, jazz, and pop are genres where the piano plays a central role."
    ]
}

test_case_27 = {
    'sentences': [
        "Roses are flowering plants with over three hundred species.",
        "They are primarily native to the Northern Hemisphere.",
        "On another note, did you know that honey never spoils?",
        "Roses have been symbols of love, beauty, and war over the centuries.",
        "The city of Portland, Oregon is known as the 'City of Roses'.",
        "Rose oil is valuable and used in the luxury perfume industry.",
        "It's amazing how many stars can be seen from a dark location away from city lights."
    ],
    'questions': [
        "How many species of roses are there?",
        "What is the city of Portland known as?",
        "For what is rose oil primarily used?"
    ],
    'answer': [
        "Roses are flowering plants with over three hundred species.",
        "The city of Portland, Oregon is known as the 'City of Roses'.",
        "Rose oil is valuable and used in the luxury perfume industry."
    ]
}

test_case_28 = {
    'sentences': [
        "Mars, often referred to as the Red Planet, is the fourth planet from the Sun.",
        "It is named after the Roman god of war due to its reddish appearance.",
        "Speaking of colors, blue whales are the largest animals on Earth.",
        "There is evidence of liquid water in the past on Mars, as seen from valleys and dried-up riverbeds.",
        "One of Mars' prominent features is Olympus Mons, the tallest volcano in the solar system.",
        "Mars has two small moons: Phobos and Deimos.",
        "Have you ever tried skydiving? It's exhilarating!"
    ],
    'questions': [
        "Why is Mars called the Red Planet?",
        "What is a notable feature on Mars related to a volcano?",
        "How many moons does Mars have?"
    ],
    'answer': [
        "Mars, often referred to as the Red Planet, is the fourth planet from the Sun.",
        "One of Mars' prominent features is Olympus Mons, the tallest volcano in the solar system.",
        "Mars has two small moons: Phobos and Deimos."
    ]
}

test_case_29 = {
    'sentences': [
        "Basketball is a popular sport played between two teams of five players each.",
        "The objective is to shoot a ball through the opponent's hoop to score points.",
        "On a different topic, zebras have a unique stripe pattern, much like human fingerprints.",
        "The NBA (National Basketball Association) is the premier professional basketball league in the United States.",
        "Basketball was invented in 1891 by Dr. James Naismith.",
        "Common defensive strategies include man-to-man and zone defense.",
        "Interestingly, owls can rotate their heads up to 270 degrees."
    ],
    'questions': [
        "Who invented basketball?",
        "What is the main objective of basketball?",
        "Which league is considered the premier basketball league in the United States?"
    ],
    'answer': [
        "Basketball was invented in 1891 by Dr. James Naismith.",
        "The objective is to shoot a ball through the opponent's hoop to score points.",
        "The NBA (National Basketball Association) is the premier professional basketball league in the United States."
    ]
}

test_case_30 = {
    'sentences': [
        "Tea is a beverage that's been consumed for thousands of years.",
        "It is made by steeping cured or dried tea leaves in hot water.",
        "Speaking of water, did you know that dolphins are known to help fishermen catch fish in some cultures?",
        "Green tea, black tea, and oolong are some of the primary types of tea.",
        "China is often considered the birthplace of tea.",
        "Tea has played a significant role in many historical events, such as the Boston Tea Party.",
        "On a side note, the pyramids of Egypt are among the Seven Wonders of the Ancient World."
    ],
    'questions': [
        "How is tea typically prepared?",
        "Where is the birthplace of tea?",
        "Name one historical event where tea played a significant role."
    ],
    'answer': [
        "It is made by steeping cured or dried tea leaves in hot water.",
        "China is often considered the birthplace of tea.",
        "Tea has played a significant role in many historical events, such as the Boston Tea Party."
    ]
}

test_case_31 = {
    'sentences': [
        "The Eiffel Tower is one of the most iconic landmarks in Paris, France.",
        "It was completed in 1889 and stands at 324 meters tall.",
        "By the way, butterflies taste with their feet - a fascinating tidbit about nature.",
        "The tower was initially criticized by some of France's leading artists and intellectuals.",
        "However, it is now one of the most-visited paid monuments in the world.",
        "It was named after Gustave Eiffel, whose company designed and built the tower.",
        "Random fact: pineapples are not a single fruit but a group of berries fused together."
    ],
    'questions': [
        "Who was the Eiffel Tower named after?",
        "How tall is the Eiffel Tower?",
        "Was the Eiffel Tower initially well-received by French artists?"
    ],
    'answer': [
        "It was named after Gustave Eiffel, whose company designed and built the tower.",
        "It was completed in 1889 and stands at 324 meters tall.",
        "The tower was initially criticized by some of France's leading artists and intellectuals."
    ]
}

test_case_32 = {
    'sentences': [
        "Chocolate is derived from the roasted seeds of Theobroma cacao.",
        "It's consumed in various forms, including bars, truffles, and beverages.",
        "Speaking of sweetness, hummingbirds are the only birds that can fly backward.",
        "The Mayans and Aztecs valued cacao beans as currency and used them in rituals.",
        "Dark, milk, and white are the primary types of chocolate.",
        "Belgium and Switzerland are particularly famous for their high-quality chocolates.",
        "On a side note, a group of flamingos is called a 'flamboyance'."
    ],
    'questions': [
        "What are the primary types of chocolate?",
        "Which ancient civilizations valued cacao beans as currency?",
        "For what is Belgium particularly famous?"
    ],
    'answer': [
        "Dark, milk, and white are the primary types of chocolate.",
        "The Mayans and Aztecs valued cacao beans as currency and used them in rituals.",
        "Belgium and Switzerland are particularly famous for their high-quality chocolates."
    ]
}

test_case_33 = {
    'sentences': [
        "The Amazon Rainforest is the world's largest tropical rainforest.",
        "It spans across nine countries in South America.",
        "By the way, an octopus has three hearts - two pump blood to the gills and one to the rest of the body.",
        "The Amazon River flows through this rainforest and is the second-longest river in the world.",
        "This ecosystem is home to an incredibly diverse range of wildlife and plant species.",
        "Unfortunately, deforestation threatens the existence of this crucial biome.",
        "Did you know that strawberries have their seeds on the outside?"
    ],
    'questions': [
        "What is the primary threat to the Amazon Rainforest?",
        "How many countries does the Amazon Rainforest span?",
        "Which river flows through this rainforest?"
    ],
    'answer': [
        "Unfortunately, deforestation threatens the existence of this crucial biome.",
        "It spans across nine countries in South America.",
        "The Amazon River flows through this rainforest and is the second-longest river in the world."
    ]
}

test_case_34 = {
    'sentences': [
        "Yoga is an ancient practice that originated in India around 5,000 years ago.",
        "It involves a combination of physical postures, breathing techniques, and meditation.",
        "On a different topic, snails can sleep for up to three years.",
        "There are many styles of yoga, including Hatha, Ashtanga, and Kundalini.",
        "Yoga is known to improve flexibility, strength, and mental well-being.",
        "It's commonly practiced worldwide for relaxation and stress reduction.",
        "Here's a fun fact: A group of frogs is called an 'army'."
    ],
    'questions': [
        "Where did yoga originate?",
        "What are the benefits of practicing yoga?",
        "Name one style of yoga."
    ],
    'answer': [
        "Yoga is an ancient practice that originated in India around 5,000 years ago.",
        "Yoga is known to improve flexibility, strength, and mental well-being.",
        "There are many styles of yoga, including Hatha, Ashtanga, and Kundalini."
    ]
}

test_case_35 = {
    'sentences': [
        "Vincent van Gogh was a Dutch post-impressionist painter.",
        "He is famous for works like 'Starry Night' and 'The Sunflowers'.",
        "On a side note, the average lifespan of a dragonfly is just 24 hours.",
        "Van Gogh's style is known for its bold colors and dramatic, impulsive brushwork.",
        "Though he struggled with mental health issues, his influence on 20th-century art is immense.",
        "Surprisingly, he only sold a few paintings during his lifetime.",
        "Did you know that camels have three sets of eyelids to protect their eyes from sand?"
    ],
    'questions': [
        "For which two artworks is van Gogh particularly known?",
        "How is van Gogh's style characterized?",
        "Did van Gogh achieve significant commercial success during his lifetime?"
    ],
    'answer': [
        "He is famous for works like 'Starry Night' and 'The Sunflowers'.",
        "Van Gogh's style is known for its bold colors and dramatic, impulsive brushwork.",
        "Surprisingly, he only sold a few paintings during his lifetime."
    ]
}

test_case_36 = {
    'sentences': [
        "Penguins are flightless birds known for their waddling gait.",
        "They are primarily found in the Southern Hemisphere, especially Antarctica.",
        "On a lighter note, did you know bananas are berries, but strawberries aren't?",
        "Penguins are excellent swimmers, using their flippers for propulsion.",
        "The Emperor Penguin is the tallest of all penguin species.",
        "They feed primarily on fish and small marine creatures.",
        "Random fact: honey never spoils. Ancient pots of honey found in Egyptian tombs are still safe to eat."
    ],
    'questions': [
        "Where are penguins primarily found?",
        "What is notable about the Emperor Penguin?",
        "Do penguins use their feet for swimming?"
    ],
    'answer': [
        "They are primarily found in the Southern Hemisphere, especially Antarctica.",
        "The Emperor Penguin is the tallest of all penguin species.",
        "Penguins are excellent swimmers, using their flippers for propulsion."
    ]
}

test_case_37 = {
    'sentences': [
        "The pyramids of Egypt are monumental structures built as tombs for pharaohs.",
        "The Pyramid of Giza is one of the Seven Wonders of the Ancient World.",
        "Interestingly, a cat's nose print is as unique as a human's fingerprint.",
        "These pyramids were built during the Third Millennium BC.",
        "Workers used massive limestone blocks to construct them.",
        "The Sphinx, with the body of a lion and a pharaoh's head, is another iconic monument near the pyramids.",
        "By the way, did you know that octopuses have blue blood?"
    ],
    'questions': [
        "Why were the pyramids of Egypt constructed?",
        "What is unique about the Pyramid of Giza?",
        "What is the Sphinx?"
    ],
    'answer': [
        "The pyramids of Egypt are monumental structures built as tombs for pharaohs.",
        "The Pyramid of Giza is one of the Seven Wonders of the Ancient World.",
        "The Sphinx, with the body of a lion and a pharaoh's head, is another iconic monument near the pyramids."
    ]
}

test_case_38 = {
    'sentences': [
        "Shakespeare was an English playwright, poet, and actor.",
        "He wrote classics like 'Romeo and Juliet', 'Macbeth', and 'Hamlet'.",
        "Random trivia: A shrimp's heart is located in its head.",
        "Shakespeare's works are known for their profound impact on English literature.",
        "His plays have been translated into every major living language.",
        "The Globe Theatre in London was associated with him.",
        "Did you know that almonds are a member of the peach family?"
    ],
    'questions': [
        "Which theater was associated with Shakespeare?",
        "Name one classic written by Shakespeare.",
        "What's notable about the translations of Shakespeare's works?"
    ],
    'answer': [
        "The Globe Theatre in London was associated with him.",
        "He wrote classics like 'Romeo and Juliet', 'Macbeth', and 'Hamlet'.",
        "His plays have been translated into every major living language."
    ]
}

test_case_39 = {
    'sentences': [
        "The Grand Canyon is a steep-sided canyon carved by the Colorado River.",
        "It's located in the state of Arizona in the USA.",
        "On a different note, kangaroos can't walk backward.",
        "It's among the major tourist destinations in the US, attracting millions annually.",
        "The Grand Canyon is around 277 miles long, up to 18 miles wide, and over a mile deep.",
        "It was designated a World Heritage Site by UNESCO in 1979.",
        "Fun fact: There are more possible iterations of a game of chess than there are atoms in the known universe."
    ],
    'questions': [
        "What river carved the Grand Canyon?",
        "How deep is the Grand Canyon?",
        "Was the Grand Canyon designated a World Heritage Site?"
    ],
    'answer': [
        "The Grand Canyon is a steep-sided canyon carved by the Colorado River.",
        "The Grand Canyon is around 277 miles long, up to 18 miles wide, and over a mile deep.",
        "It was designated a World Heritage Site by UNESCO in 1979."
    ]
}

test_case_40 = {
    'sentences': [
        "Mars is the fourth planet from the Sun in our solar system.",
        "It's often referred to as the 'Red Planet' because of its reddish appearance.",
        "Speaking of colors, goldfish can see both infrared and ultraviolet light.",
        "Mars has the largest volcano in the solar system, Olympus Mons.",
        "The planet has two moons, Phobos and Deimos.",
        "There's evidence suggesting there was once liquid water on Mars.",
        "Random tidbit: A jiffy is an actual unit of time – it's 1/100th of a second."
    ],
    'questions': [
        "Why is Mars called the 'Red Planet'?",
        "How many moons does Mars have?",
        "What's significant about Olympus Mons?"
    ],
    'answer': [
        "It's often referred to as the 'Red Planet' because of its reddish appearance.",
        "The planet has two moons, Phobos and Deimos.",
        "Mars has the largest volcano in the solar system, Olympus Mons."
    ]
}

test_case_41 = {
    'sentences': [
        "Chocolate is derived from cocoa beans, the dried and fermented seeds of the cacao tree.",
        "Many people enjoy chocolate in various forms, from dark to milk and even white chocolate.",
        "An unrelated tidbit: Polar bears have black skin under their thick white fur.",
        "Switzerland is renowned for its high-quality chocolate production.",
        "Cocoa trees are native to the deep tropical regions of the Americas.",
        "Historically, cocoa beans were used as a form of currency by ancient civilizations.",
        "Speaking of foods, did you know pineapples don't grow on trees? They grow in the center of a leafy plant."
    ],
    'questions': [
        "Where do cocoa beans come from?",
        "Which country is renowned for its chocolate?",
        "Were cocoa beans used in any unique way historically?"
    ],
    'answer': [
        "Chocolate is derived from cocoa beans, the dried and fermented seeds of the cacao tree.",
        "Switzerland is renowned for its high-quality chocolate production.",
        "Historically, cocoa beans were used as a form of currency by ancient civilizations."
    ]
}

test_case_42 = {
    'sentences': [
        "The Nile is a major river in northeastern Africa, often regarded as the longest river in the world.",
        "It flows north into the Mediterranean Sea.",
        "On a whimsical note: Cows have best friends and can become stressed if they're separated.",
        "Ancient Egypt was heavily dependent on the Nile for agriculture.",
        "The river's annual flooding brought nutrient-rich silt, ideal for crops.",
        "The Nile has two major tributaries, the White Nile and the Blue Nile.",
        "In other news, bamboo is the fastest-growing plant on Earth."
    ],
    'questions': [
        "What is the significance of the Nile to Ancient Egypt?",
        "Into which sea does the Nile flow?",
        "Name one tributary of the Nile."
    ],
    'answer': [
        "Ancient Egypt was heavily dependent on the Nile for agriculture.",
        "It flows north into the Mediterranean Sea.",
        "The Nile has two major tributaries, the White Nile and the Blue Nile."
    ]
}

test_case_43 = {
    'sentences': [
        "The Eiffel Tower is an iconic iron lattice tower located in Paris, France.",
        "It was initially criticized by some of France's leading artists and intellectuals, but now is one of the most recognized structures in the world.",
        "A quirky fact: Lobsters have blue blood due to the presence of hemocyanin.",
        "The tower was built as the entrance arch for the 1889 World's Fair.",
        "Standing at 324 meters, it was the tallest man-made structure until the completion of the Chrysler Building in New York.",
        "Visitors can climb or take an elevator to get breathtaking views of Paris from its platforms.",
        "Ever wondered about jellyfish? They're 95% water."
    ],
    'questions': [
        "Why was the Eiffel Tower built?",
        "How tall is the Eiffel Tower?",
        "What can visitors do at the Eiffel Tower?"
    ],
    'answer': [
        "The tower was built as the entrance arch for the 1889 World's Fair.",
        "Standing at 324 meters, it was the tallest man-made structure until the completion of the Chrysler Building in New York.",
        "Visitors can climb or take an elevator to get breathtaking views of Paris from its platforms."
    ]
}

test_case_44 = {
    'sentences': [
        "The Amazon Rainforest is the world's largest tropical rainforest, covering over 5.5 million square kilometers.",
        "It's home to a vast array of biodiversity, including millions of species of insects, plants, birds, and mammals.",
        "For a change of subject: Did you know that honeybees can recognize human faces?",
        "The Amazon River, the second-longest river in the world, runs through this rainforest.",
        "Unfortunately, deforestation poses a significant threat to the Amazon and its inhabitants.",
        "The rainforest spans across nine countries, including Brazil, Peru, and Colombia.",
        "On another topic, potatoes were the first food to be grown in space."
    ],
    'questions': [
        "What is the size of the Amazon Rainforest?",
        "Which river runs through the Amazon?",
        "How many countries does the rainforest span across?"
    ],
    'answer': [
        "The Amazon Rainforest is the world's largest tropical rainforest, covering over 5.5 million square kilometers.",
        "The Amazon River, the second-longest river in the world, runs through this rainforest.",
        "The rainforest spans across nine countries, including Brazil, Peru, and Colombia."
    ]
}

test_case_45 = {
    'sentences': [
        "Yoga is an ancient physical, mental, and spiritual practice that originated in India.",
        "The word 'yoga' derives from Sanskrit and means to join or unite, symbolizing the union of body and mind.",
        "Switching gears: Cats can make over 100 different sounds, while dogs can make around 10.",
        "There are many styles of yoga, including Hatha, Vinyasa, and Ashtanga.",
        "Practicing yoga can offer various benefits like increased flexibility, improved respiration, and stress reduction.",
        "The International Day of Yoga is observed on June 21st.",
        "A neat fact: Butterflies taste with their feet."
    ],
    'questions': [
        "What does the word 'yoga' mean?",
        "What are some benefits of practicing yoga?",
        "When is the International Day of Yoga observed?"
    ],
    'answer': [
        "The word 'yoga' derives from Sanskrit and means to join or unite, symbolizing the union of body and mind.",
        "Practicing yoga can offer various benefits like increased flexibility, improved respiration, and stress reduction.",
        "The International Day of Yoga is observed on June 21st."
    ]
}

test_case_46 = {
    'sentences': [
        "Penguins are flightless birds native to the Southern Hemisphere, especially Antarctica.",
        "Despite their inability to fly, they are excellent swimmers and divers.",
        "A surprising tidbit: Bananas are berries, but strawberries are not!",
        "Penguins primarily feed on fish and krill.",
        "There are various species of penguins, with the Emperor Penguin being the tallest and heaviest.",
        "Climate change and overfishing are threats to their natural habitats.",
        "In other matters, the Mona Lisa has no eyebrows!"
    ],
    'questions': [
        "What do penguins primarily eat?",
        "Which species of penguin is the tallest?",
        "Can penguins fly?"
    ],
    'answer': [
        "Penguins primarily feed on fish and krill.",
        "There are various species of penguins, with the Emperor Penguin being the tallest and heaviest.",
        "Penguins are flightless birds native to the Southern Hemisphere, especially Antarctica."
    ]
}

test_case_47 = {
    'sentences': [
        "The Sahara is the world's largest hot desert, covering much of North Africa.",
        "Despite its arid conditions, it's home to a variety of wildlife, including camels and gazelles.",
        "Switching topics: The hummingbird is the only bird that can fly backward.",
        "The Sahara experiences extremely high temperatures, often exceeding 40°C (104°F).",
        "Sand dunes in the Sahara can reach heights of up to 180 meters.",
        "Oases, spots with water in the desert, are critical for survival for many species.",
        "On a lighter note, a group of flamingos is called a 'flamboyance'."
    ],
    'questions': [
        "What is the primary wildlife found in the Sahara?",
        "How high can temperatures in the Sahara go?",
        "What is the importance of oases in the Sahara?"
    ],
    'answer': [
        "Despite its arid conditions, it's home to a variety of wildlife, including camels and gazelles.",
        "The Sahara experiences extremely high temperatures, often exceeding 40°C (104°F).",
        "Oases, spots with water in the desert, are critical for survival for many species."
    ]
}

test_case_48 = {
    'sentences': [
        "The Grand Canyon is a steep-sided canyon carved by the Colorado River in Arizona, USA.",
        "It's one of the most famous natural wonders and attracts millions of tourists annually.",
        "A quirky fact: The shortest war in history was between Britain and Zanzibar in 1896 and lasted 38 minutes.",
        "The canyon is over a mile deep and spans over 277 miles in length.",
        "The area offers breathtaking vistas, hiking trails, and river-rafting opportunities.",
        "Geological studies suggest the Grand Canyon to be over six million years old.",
        "Speaking of wonders, did you know the Great Wall of China is not visible from the Moon with the naked eye?"
    ],
    'questions': [
        "By which river was the Grand Canyon formed?",
        "How deep is the Grand Canyon?",
        "What are some activities available at the Grand Canyon?"
    ],
    'answer': [
        "The Grand Canyon is a steep-sided canyon carved by the Colorado River in Arizona, USA.",
        "The canyon is over a mile deep and spans over 277 miles in length.",
        "The area offers breathtaking vistas, hiking trails, and river-rafting opportunities."
    ]
}

test_case_49 = {
    'sentences': [
        "Kangaroos are marsupials endemic to Australia.",
        "They are best known for their strong hind legs, used primarily for jumping, and their large pouches for carrying joeys (baby kangaroos).",
        "Diverging a bit: An octopus has three hearts and blue blood.",
        "There are four main species of kangaroo: the Red, Eastern Grey, Western Grey, and Antilopine.",
        "Kangaroos are herbivores, primarily grazing on grass.",
        "With their strong tails for balance, they can reach speeds up to 65km/h (40 mph).",
        "Switching tracks, did you know that a crocodile cannot stick its tongue out?"
    ],
    'questions': [
        "What are kangaroos known for?",
        "How many main species of kangaroo are there?",
        "How fast can kangaroos go?"
    ],
    'answer': [
        "They are best known for their strong hind legs, used primarily for jumping, and their large pouches for carrying joeys (baby kangaroos).",
        "There are four main species of kangaroo: the Red, Eastern Grey, Western Grey, and Antilopine.",
        "With their strong tails for balance, they can reach speeds up to 65km/h (40 mph)."
    ]
}

test_case_50 = {
    'sentences': [
        "Mount Fuji, located on Honshu Island, is the highest mountain in Japan.",
        "It's an active stratovolcano that last erupted in 1707-08.",
        "Fun aside: A day on Venus is longer than a year on Venus due to its rotation!",
        "The mountain is a well-known symbol of Japan and is frequently depicted in art and photography.",
        "Climbing Mount Fuji is a popular activity, especially during the summer climbing season.",
        "Its beautifully symmetrical cone is a result of three volcanic cones: Komitake, Ko-Fuji, and the youngest, Shin-Fuji.",
        "On another topic, the unicorn is the national animal of Scotland."
    ],
    'questions': [
        "Where is Mount Fuji located?",
        "When was Mount Fuji's last eruption?",
        "Why is Mount Fuji's cone symmetrical?"
    ],
    'answer': [
        "Mount Fuji, located on Honshu Island, is the highest mountain in Japan.",
        "It's an active stratovolcano that last erupted in 1707-08.",
        "Its beautifully symmetrical cone is a result of three volcanic cones: Komitake, Ko-Fuji, and the youngest, Shin-Fuji."
    ]
}

test_case_51 = {
    'sentences': [
        "The Nile is the world's longest river, flowing through several African countries.",
        "Ancient Egyptian civilization flourished along its banks, using its waters for irrigation.",
        "Changing topics: The heart of a blue whale can weigh as much as a car.",
        "The Nile has two main tributaries: the White Nile and the Blue Nile.",
        "It empties into the Mediterranean Sea after its long journey.",
        "Interestingly, despite its importance, the Nile does not flow through Rwanda."
    ],
    'questions': [
        "Which civilizations prospered along the Nile?",
        "Into which sea does the Nile empty?",
        "Does the Nile pass through Rwanda?"
    ],
    'answer': [
        "Ancient Egyptian civilization flourished along its banks, using its waters for irrigation.",
        "It empties into the Mediterranean Sea after its long journey.",
        "Interestingly, despite its importance, the Nile does not flow through Rwanda."
    ]
}

test_case_52 = {
    'sentences': [
        "Coffee is a brewed drink prepared from roasted coffee beans.",
        "It's believed to have originated in Ethiopia and later spread to other parts of the world.",
        "Shifting gears: Honey never spoils and can remain consumable for thousands of years.",
        "Brazil is the largest producer of coffee globally.",
        "Coffee consumption has been linked to numerous health benefits but also has some side effects when consumed in excess."
    ],
    'questions': [
        "Where is coffee believed to have originated?",
        "Which country is the largest producer of coffee?",
        "Is honey's shelf life short?"
    ],
    'answer': [
        "It's believed to have originated in Ethiopia and later spread to other parts of the world.",
        "Brazil is the largest producer of coffee globally.",
        "Shifting gears: Honey never spoils and can remain consumable for thousands of years."
    ]
}

test_case_53 = {
    'sentences': [
        "The Amazon rainforest, also known as Amazonia, spans over nine countries in South America.",
        "It's the world's largest tropical rainforest, famed for its biodiversity.",
        "Random fact: The fingerprints of a koala are so similar to humans that they can taint crime scenes.",
        "The Amazon River flows through this rainforest and is also one of the longest rivers in the world.",
        "Many indigenous tribes live in the Amazon, relying on its resources."
    ],
    'questions': [
        "How many countries does the Amazon rainforest cover?",
        "Why is the Amazon rainforest famous?",
        "Do indigenous tribes inhabit the Amazon?"
    ],
    'answer': [
        "The Amazon rainforest, also known as Amazonia, spans over nine countries in South America.",
        "It's the world's largest tropical rainforest, famed for its biodiversity.",
        "Many indigenous tribes live in the Amazon, relying on its resources."
    ]
}

test_case_54 = {
    'sentences': [
        "Chocolate is derived from the seeds of the Theobroma cacao tree.",
        "The Mayans and Aztecs once valued cacao beans as currency.",
        "Speaking of treats: There are more possible iterations of a game of chess than there are atoms in the known universe.",
        "Dark chocolate, in particular, has been linked to various health benefits.",
        "Switzerland is one of the top consumers of chocolate per capita."
    ],
    'questions': [
        "From which tree's seeds is chocolate derived?",
        "Who used cacao beans as currency?",
        "Which country is a top consumer of chocolate?"
    ],
    'answer': [
        "Chocolate is derived from the seeds of the Theobroma cacao tree.",
        "The Mayans and Aztecs once valued cacao beans as currency.",
        "Switzerland is one of the top consumers of chocolate per capita."
    ]
}

test_case_55 = {
    'sentences': [
        "The Eiffel Tower, located in Paris, is one of the most iconic landmarks in the world.",
        "It was constructed as the entrance arch for the 1889 World's Fair.",
        "Side note: A group of frogs is called an 'army'.",
        "The tower stands at 324 meters tall and was once the tallest man-made structure in the world.",
        "Many tourists ascend it daily to view the Parisian skyline."
    ],
    'questions': [
        "Where is the Eiffel Tower located?",
        "Why was the Eiffel Tower constructed?",
        "How tall is the Eiffel Tower?"
    ],
    'answer': [
        "The Eiffel Tower, located in Paris, is one of the most iconic landmarks in the world.",
        "It was constructed as the entrance arch for the 1889 World's Fair.",
        "The tower stands at 324 meters tall and was once the tallest man-made structure in the world."
    ]
}

test_case_56 = {
    'sentences': [
        "Pandas are a bear species native to south-central China.",
        "They are known for their distinct black and white coat.",
        "Switching subjects: There are more fake flamingos in the world than real ones.",
        "Despite their carnivorous digestive system, pandas primarily eat bamboo.",
        "Conservation efforts have been put in place to protect the dwindling panda population."
    ],
    'questions': [
        "Where are pandas native to?",
        "What is the primary food source of pandas?",
        "Are there more real or fake flamingos in the world?"
    ],
    'answer': [
        "Pandas are a bear species native to south-central China.",
        "Despite their carnivorous digestive system, pandas primarily eat bamboo.",
        "Switching subjects: There are more fake flamingos in the world than real ones."
    ]
}

test_case_57 = {
    'sentences': [
        "Mars, often called the Red Planet, is the fourth planet from the Sun.",
        "It's named after the Roman god of war due to its reddish appearance.",
        "Fun tidbit: Bananas are berries, but strawberries are not.",
        "Mars has the tallest volcano in the solar system named Olympus Mons.",
        "Water ice has been found at the polar caps of Mars."
    ],
    'questions': [
        "Why is Mars called the Red Planet?",
        "What is the tallest volcano on Mars?",
        "Is there water on Mars?"
    ],
    'answer': [
        "It's named after the Roman god of war due to its reddish appearance.",
        "Mars has the tallest volcano in the solar system named Olympus Mons.",
        "Water ice has been found at the polar caps of Mars."
    ]
}

test_case_58 = {
    'sentences': [
        "Shakespeare, a renowned playwright, penned 39 plays and 154 sonnets.",
        "Some of his most famous works include 'Romeo and Juliet' and 'Hamlet'.",
        "Speaking of classics: The original game of Monopoly was circular.",
        "The Globe Theatre in London is associated with Shakespeare's plays.",
        "He lived during the Elizabethan era, influencing English literature profoundly."
    ],
    'questions': [
        "How many plays did Shakespeare write?",
        "Which theatre is associated with Shakespeare?",
        "During which era did Shakespeare live?"
    ],
    'answer': [
        "Shakespeare, a renowned playwright, penned 39 plays and 154 sonnets.",
        "The Globe Theatre in London is associated with Shakespeare's plays.",
        "He lived during the Elizabethan era, influencing English literature profoundly."
    ]
}

test_case_59 = {
    'sentences': [
        "The Great Wall of China stretches over 21,196 km (13,171 mi).",
        "It was built primarily to protect Chinese states and empires from raids and invasions.",
        "Random trivia: The shortest war in history was between Britain and Zanzibar in 1896. Zanzibar surrendered after 38 minutes.",
        "Different sections of the wall were built by various Chinese dynasties over centuries.",
        "Contrary to popular belief, the Great Wall is not visible from space with the naked eye."
    ],
    'questions': [
        "Why was the Great Wall of China built?",
        "Is the Great Wall of China visible from space?",
        "How long is the Great Wall of China?"
    ],
    'answer': [
        "It was built primarily to protect Chinese states and empires from raids and invasions.",
        "Contrary to popular belief, the Great Wall is not visible from space with the naked eye.",
        "The Great Wall of China stretches over 21,196 km (13,171 mi)."
    ]
}

test_case_60 = {
    'sentences': [
        "The Grand Canyon, located in Arizona, is a significant landmark carved by the Colorado River.",
        "It's over a mile deep and up to 18 miles wide in places.",
        "On another note: Kangaroos can't walk backward.",
        "Many layers of the canyon reveal nearly two billion years of Earth's geological history.",
        "Visitors are often left in awe by its immense size and its intricate and colorful landscape."
    ],
    'questions': [
        "Where is the Grand Canyon located?",
        "How deep is the Grand Canyon?"
    ],
    'answer': [
        "The Grand Canyon, located in Arizona, is a significant landmark carved by the Colorado River.",
        "It's over a mile deep and up to 18 miles wide in places."
    ]
}

test_case_61 = {
    'sentences': [
        "The Great Wall of China can be seen from space.",
        "It's a series of fortifications made of stone, brick, and other materials.",
        "On another topic: The honeybee is the only insect that produces food eaten by humans.",
        "The wall stretches from east to west along the northern borders of China."
    ],
    'questions': [
        "What is the Great Wall made of?",
        "Can the Great Wall be seen from space?",
        "Which insect produces food for humans?"
    ],
    'answer': [
        "It's a series of fortifications made of stone, brick, and other materials.",
        "The Great Wall of China can be seen from space.",
        "On another topic: The honeybee is the only insect that produces food eaten by humans."
    ]
}

test_case_62 = {
    'sentences': [
        "Dolphins are known for their high intelligence.",
        "They are marine mammals, not fish.",
        "Switching gears: Cucumbers are actually fruits, not vegetables.",
        "Dolphins communicate using a variety of clicks, whistle-like sounds, and other vocalizations."
    ],
    'questions': [
        "Are dolphins mammals or fish?",
        "How do dolphins communicate?",
        "Is a cucumber a fruit or vegetable?"
    ],
    'answer': [
        "They are marine mammals, not fish.",
        "Dolphins communicate using a variety of clicks, whistle-like sounds, and other vocalizations.",
        "Switching gears: Cucumbers are actually fruits, not vegetables."
    ]
}

test_case_63 = {
    'sentences': [
        "The Sahara is the largest hot desert in the world.",
        "It's located in North Africa and is roughly the size of the United States.",
        "Random tidbit: Bananas are berries, but strawberries are not.",
        "The Sahara has a wide variety of landscapes including sand dunes, rocky areas, and mountains."
    ],
    'questions': [
        "Where is the Sahara desert located?",
        "Is the Sahara the world's coldest desert?",
        "Are bananas considered berries?"
    ],
    'answer': [
        "It's located in North Africa and is roughly the size of the United States.",
        "The Sahara is the largest hot desert in the world.",
        "Random tidbit: Bananas are berries, but strawberries are not."
    ]
}

test_case_64 = {
    'sentences': [
        "The Eiffel Tower was originally intended to be a temporary structure.",
        "It was built for the 1889 Paris Exposition and was almost torn down afterwards.",
        "A quick fact: The world's smallest bone is found in the human ear.",
        "Today, the Eiffel Tower is one of the most recognized structures in the world."
    ],
    'questions': [
        "Why was the Eiffel Tower built?",
        "Where is the world's smallest bone located?"
    ],
    'answer': [
        "It was built for the 1889 Paris Exposition and was almost torn down afterwards.",
        "A quick fact: The world's smallest bone is found in the human ear."
    ]
}

test_case_65 = {
    'sentences': [
        "Chocolate is derived from the beans of the cacao tree.",
        "White chocolate contains no cocoa solids.",
        "By the way: The word 'alphabet' comes from the first two letters of the Greek alphabet: alpha and beta.",
        "Dark chocolate has antioxidants and is considered healthier than milk chocolate."
    ],
    'questions': [
        "What is chocolate derived from?",
        "Which type of chocolate contains no cocoa solids?",
        "Where does the word 'alphabet' come from?"
    ],
    'answer': [
        "Chocolate is derived from the beans of the cacao tree.",
        "White chocolate contains no cocoa solids.",
        "By the way: The word 'alphabet' comes from the first two letters of the Greek alphabet: alpha and beta."
    ]
}

test_case_66 = {
    'sentences': [
        "The Amazon Rainforest produces 20% of the world's oxygen.",
        "It's often referred to as the 'lungs of the Earth'.",
        "On a different note: A group of flamingos is called a 'flamboyance'.",
        "The Amazon is home to a vast array of biodiversity, including many species that are not found anywhere else."
    ],
    'questions': [
        "How much of the world's oxygen does the Amazon produce?",
        "What is a group of flamingos called?",
        "Why is the Amazon referred to as the 'lungs of the Earth'?"
    ],
    'answer': [
        "The Amazon Rainforest produces 20% of the world's oxygen.",
        "On a different note: A group of flamingos is called a 'flamboyance'.",
        "It's often referred to as the 'lungs of the Earth'."
    ]
}

test_case_67 = {
    'sentences': [
        "Mars is often referred to as the 'Red Planet' due to its reddish appearance.",
        "It's the fourth planet from the Sun in our solar system.",
        "Interestingly, the world record for the longest hiccuping spree is 68 years.",
        "Mars has a very thin atmosphere, mostly composed of carbon dioxide."
    ],
    'questions': [
        "Why is Mars called the 'Red Planet'?",
        "Which position does Mars hold from the Sun?",
        "How long is the world record for hiccuping?"
    ],
    'answer': [
        "Mars is often referred to as the 'Red Planet' due to its reddish appearance.",
        "It's the fourth planet from the Sun in our solar system.",
        "Interestingly, the world record for the longest hiccuping spree is 68 years."
    ]
}

test_case_68 = {
    'sentences': [
        "The Pacific Ocean is the largest and deepest of Earth's oceanic divisions.",
        "It covers more area than all of Earth's landmass combined.",
        "Switching subjects: The fingerprints of a koala are so similar to humans that they can be mistaken in a crime scene.",
        "The Pacific Ocean is bounded by the Americas to the east and Asia and Australia to the west."
    ],
    'questions': [
        "Which ocean is the largest?",
        "How similar are koala fingerprints to human ones?",
        "Which continents does the Pacific Ocean border?"
    ],
    'answer': [
        "The Pacific Ocean is the largest and deepest of Earth's oceanic divisions.",
        "Switching subjects: The fingerprints of a koala are so similar to humans that they can be mistaken in a crime scene.",
        "The Pacific Ocean is bounded by the Americas to the east and Asia and Australia to the west."
    ]
}

test_case_69 = {
    'sentences': [
        "The human body consists of about 60% water.",
        "Drinking enough water is essential for maintaining health.",
        "On a different note: The world's shortest war was between Britain and Zanzibar on August 27, 1896. Zanzibar surrendered after 38 minutes.",
        "Water helps regulate body temperature, lubricate joints, and eliminate wastes."
    ],
    'questions': [
        "What percentage of the human body is water?",
        "Why is water important for the body?",
        "How long was the world's shortest war?"
    ],
    'answer': [
        "The human body consists of about 60% water.",
        "Water helps regulate body temperature, lubricate joints, and eliminate wastes.",
        "On a different note: The world's shortest war was between Britain and Zanzibar on August 27, 1896. Zanzibar surrendered after 38 minutes."
    ]
}

test_case_70 = {
    'sentences': [
        "Mount Everest is the highest mountain above sea level.",
        "Its elevation is 29,032 feet (8,849 meters).",
        "On another subject: The unicorn is the national animal of Scotland.",
        "Mount Everest is part of the Himalaya range in Asia."
    ],
    'questions': [
        "How tall is Mount Everest?",
        "What is the national animal of Scotland?",
        "Where is Mount Everest located?"
    ],
    'answer': [
        "Its elevation is 29,032 feet (8,849 meters).",
        "On another subject: The unicorn is the national animal of Scotland.",
        "Mount Everest is part of the Himalaya range in Asia."
    ]
}

test_case_71 = {
    'sentences': [
        "The speed of light in a vacuum is approximately 299,792,458 meters per second.",
        "Nothing can travel faster than the speed of light in a vacuum.",
        "Interestingly, the original name for butterfly was flutterby.",
        "Light has properties of both waves and particles."
    ],
    'questions': [
        "What is the speed of light in a vacuum?",
        "Can anything travel faster than light?",
        "What was the original name for butterfly?"
    ],
    'answer': [
        "The speed of light in a vacuum is approximately 299,792,458 meters per second.",
        "Nothing can travel faster than the speed of light in a vacuum.",
        "Interestingly, the original name for butterfly was flutterby."
    ]
}

test_case_72 = {
    'sentences': [
        "The Nile is the longest river in the world.",
        "It flows through northeastern Africa for about 6,650 km (4,130 miles).",
        "A different topic: The dot over the letter 'i' is called a tittle.",
        "The Nile River is a crucial source of water for many countries in Africa."
    ],
    'questions': [
        "Where does the Nile River flow?",
        "How long is the Nile River?",
        "What is the dot over the letter 'i' called?"
    ],
    'answer': [
        "It flows through northeastern Africa for about 6,650 km (4,130 miles).",
        "The Nile is the longest river in the world.",
        "A different topic: The dot over the letter 'i' is called a tittle."
    ]
}

test_case_73 = {
    'sentences': [
        "Bananas are berries, but strawberries are not.",
        "Botanically speaking, a berry is a fleshy fruit without a stone produced from a single flower containing one ovary.",
        "By the way, did you know that the world's oldest known grapevine is over 400 years old?",
        "Bananas contain several essential nutrients and can aid digestion and heart health."
    ],
    'questions': [
        "What is the botanical definition of a berry?",
        "Are bananas considered berries?",
        "How old is the world's oldest known grapevine?"
    ],
    'answer': [
        "Botanically speaking, a berry is a fleshy fruit without a stone produced from a single flower containing one ovary.",
        "Bananas are berries, but strawberries are not.",
        "By the way, did you know that the world's oldest known grapevine is over 400 years old?"
    ]
}

test_case_74 = {
    'sentences': [
        "The Great Wall of China is over 13,000 miles long.",
        "It was built to protect the Chinese states and empires against raids and invasions.",
        "Interestingly, honey never spoils. Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old and still perfectly good to eat.",
        "The Great Wall has been rebuilt, maintained, and enhanced over various dynasties."
    ],
    'questions': [
        "Why was the Great Wall of China built?",
        "How long is the Great Wall of China?",
        "How long can honey last?"
    ],
    'answer': [
        "It was built to protect the Chinese states and empires against raids and invasions.",
        "The Great Wall of China is over 13,000 miles long.",
        "Interestingly, honey never spoils. Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old and still perfectly good to eat."
    ]
}

test_case_75 = {
    'sentences': [
        "The Pacific Ocean is the largest and deepest ocean basin on Earth.",
        "It covers more than 60 million square miles (165 million square kilometers).",
        "On a different note, penguins are native to the Southern Hemisphere, especially Antarctica.",
        "The Pacific Ocean is larger than all of the Earth's land area combined."
    ],
    'questions': [
        "How big is the Pacific Ocean?",
        "Are penguins native to the Northern Hemisphere?",
        "How does the Pacific Ocean's size compare to Earth's land area?"
    ],
    'answer': [
        "It covers more than 60 million square miles (165 million square kilometers).",
        "On a different note, penguins are native to the Southern Hemisphere, especially Antarctica.",
        "The Pacific Ocean is larger than all of the Earth's land area combined."
    ]
}

test_case_76 = {
    'sentences': [
        "Venus is the second planet from the sun in our solar system.",
        "It is sometimes referred to as the sister planet to Earth, due to their similar size and proximity.",
        "Speaking of space, a day on Mercury is longer than a year on Mercury.",
        "Venus has a thick atmosphere with clouds of sulfuric acid."
    ],
    'questions': [
        "Which planet is referred to as the sister planet to Earth?",
        "Why is Venus called Earth's sister planet?",
        "What peculiar fact is true about a day on Mercury?"
    ],
    'answer': [
        "Venus is the second planet from the sun in our solar system.",
        "It is sometimes referred to as the sister planet to Earth, due to their similar size and proximity.",
        "Speaking of space, a day on Mercury is longer than a year on Mercury."
    ]
}

test_case_77 = {
    'sentences': [
        "The Sahara Desert is the largest hot desert in the world.",
        "It stretches across several countries in North Africa.",
        "Speaking of deserts, cacti store water to survive in such arid conditions.",
        "The Sahara is approximately the size of the United States."
    ],
    'questions': [
        "Where is the Sahara Desert located?",
        "What is a notable survival feature of cacti?",
        "Which country is roughly the same size as the Sahara?"
    ],
    'answer': [
        "It stretches across several countries in North Africa.",
        "Speaking of deserts, cacti store water to survive in such arid conditions.",
        "The Sahara is approximately the size of the United States."
    ]
}

test_case_78 = {
    'sentences': [
        "The Amazon rainforest is the world's largest tropical rainforest.",
        "It is home to a myriad of plant and animal species.",
        "Chocolate, beloved by many, is made from cocoa beans, which come from the cacao tree.",
        "The Amazon spans over nine countries in South America."
    ],
    'questions': [
        "Where is the Amazon rainforest located?",
        "What is made from cocoa beans?",
        "How many countries does the Amazon span across?"
    ],
    'answer': [
        "The Amazon spans over nine countries in South America.",
        "Chocolate, beloved by many, is made from cocoa beans, which come from the cacao tree.",
        "The Amazon spans over nine countries in South America."
    ]
}

test_case_79 = {
    'sentences': [
        "Kangaroos are marsupials native to Australia.",
        "They are known for their strong tail and large powerful hind legs.",
        "Pizza is believed to have originated in Naples, Italy.",
        "Kangaroos use hopping as their primary method of locomotion."
    ],
    'questions': [
        "Where are kangaroos native to?",
        "What is a kangaroo's primary method of movement?",
        "Where is pizza believed to have originated?"
    ],
    'answer': [
        "Kangaroos are marsupials native to Australia.",
        "Kangaroos use hopping as their primary method of locomotion.",
        "Pizza is believed to have originated in Naples, Italy."
    ]
}

test_case_80 = {
    'sentences': [
        "Jupiter is the largest planet in our solar system.",
        "It is primarily composed of hydrogen and helium.",
        "Swans mate for life and are often seen in pairs.",
        "Jupiter has a strong magnetic field and dozens of moons."
    ],
    'questions': [
        "What is the largest planet in our solar system?",
        "What are the primary components of Jupiter?",
        "How do swans typically behave in terms of mating?"
    ],
    'answer': [
        "Jupiter is the largest planet in our solar system.",
        "It is primarily composed of hydrogen and helium.",
        "Swans mate for life and are often seen in pairs."
    ]
}

test_case_81 = {
    'sentences': [
        "Mount Kilimanjaro is the tallest mountain in Africa.",
        "It is a dormant volcano located in Tanzania.",
        "On another topic, pandas are primarily herbivores, but they belong to the order Carnivora.",
        "The summit of Kilimanjaro is called Uhuru Peak."
    ],
    'questions': [
        "Where is Mount Kilimanjaro located?",
        "What is unique about pandas' diet given their classification?",
        "What is the summit of Kilimanjaro named?"
    ],
    'answer': [
        "It is a dormant volcano located in Tanzania.",
        "On another topic, pandas are primarily herbivores, but they belong to the order Carnivora.",
        "The summit of Kilimanjaro is called Uhuru Peak."
    ]
}

test_case_82 = {
    'sentences': [
        "The Nile is often considered the longest river in the world.",
        "It flows through northeastern Africa.",
        "Did you know that the heart of a shrimp is located in its head?",
        "The Nile has two primary tributaries: the Blue Nile and the White Nile."
    ],
    'questions': [
        "Where does the Nile river flow?",
        "How many primary tributaries does the Nile have, and what are they?"
    ],
    'answer': [
        "It flows through northeastern Africa.",
        "The Nile has two primary tributaries: the Blue Nile and the White Nile."
    ]
}

test_case_83 = {
    'sentences': [
        "The human brain is an intricate organ responsible for thinking, emotion, and processing sensory information.",
        "It weighs approximately three pounds in adults.",
        "Interestingly, a bolt of lightning can measure up to three million volts.",
        "The brain is made up of billions of neurons that transmit signals."
    ],
    'questions': [
        "What does the human brain process?",
        "How many volts can a lightning bolt measure?",
        "What are the primary cells in the brain?"
    ],
    'answer': [
        "The human brain is an intricate organ responsible for thinking, emotion, and processing sensory information.",
        "Interestingly, a bolt of lightning can measure up to three million volts.",
        "The brain is made up of billions of neurons that transmit signals."
    ]
}

test_case_84 = {
    'sentences': [
        "The Eiffel Tower is one of the most iconic landmarks in Paris.",
        "It was constructed in 1889 as the entrance arch to the 1889 World's Fair.",
        "On a food note, sushi is a traditional Japanese dish of vinegared rice accompanied by various ingredients.",
        "The Eiffel Tower stands at a height of 324 meters."
    ],
    'questions': [
        "Why was the Eiffel Tower constructed?",
        "What is the height of the Eiffel Tower?",
        "What is sushi made of?"
    ],
    'answer': [
        "It was constructed in 1889 as the entrance arch to the 1889 World's Fair.",
        "The Eiffel Tower stands at a height of 324 meters.",
        "On a food note, sushi is a traditional Japanese dish of vinegared rice accompanied by various ingredients."
    ]
}

test_case_85 = {
    'sentences': [
        "The Grand Canyon is a steep-sided canyon carved by the Colorado River in Arizona.",
        "It is one of the most famous natural landmarks in the United States.",
        "Speaking of nature, chameleons have the ability to change color based on their mood or surroundings.",
        "The Grand Canyon is over a mile deep at its deepest point."
    ],
    'questions': [
        "Where is the Grand Canyon located?",
        "How deep is the Grand Canyon?",
        "Why do chameleons change color?"
    ],
    'answer': [
        "The Grand Canyon is a steep-sided canyon carved by the Colorado River in Arizona.",
        "The Grand Canyon is over a mile deep at its deepest point.",
        "Speaking of nature, chameleons have the ability to change color based on their mood or surroundings."
    ]
}

test_case_86 = {
    'sentences': [
        "Elephants are the largest living land animals.",
        "They are known for their long trunks and large ears.",
        "On a separate note, the game of chess originated in India.",
        "Elephants have a strong memory and form complex social structures."
    ],
    'questions': [
        "What is notable about the size of elephants?",
        "Where did chess originate?",
        "What is unique about the social structure and memory of elephants?"
    ],
    'answer': [
        "Elephants are the largest living land animals.",
        "On a separate note, the game of chess originated in India.",
        "Elephants have a strong memory and form complex social structures."
    ]
}

test_case_87 = {
    'sentences': [
        "The Great Wall of China stretches over 13,000 miles.",
        "It was built to protect Chinese states and empires from invasions.",
        "In other news, honey never spoils and can last for thousands of years.",
        "The wall's construction began in the 7th century BC."
    ],
    'questions': [
        "Why was the Great Wall of China built?",
        "How long can honey last without spoiling?",
        "When did the construction of the Great Wall start?"
    ],
    'answer': [
        "It was built to protect Chinese states and empires from invasions.",
        "In other news, honey never spoils and can last for thousands of years.",
        "The wall's construction began in the 7th century BC."
    ]
}

test_case_88 = {
    'sentences': [
        "Penguins are flightless birds primarily found in the Southern Hemisphere.",
        "Despite their inability to fly, they are excellent swimmers.",
        "Separately, the art of origami comes from Japan and involves folding paper into intricate designs.",
        "The Emperor Penguin is the tallest and heaviest of all penguin species."
    ],
    'questions': [
        "Where are penguins primarily found?",
        "Which penguin is the tallest and heaviest?",
        "What does origami involve?"
    ],
    'answer': [
        "Penguins are flightless birds primarily found in the Southern Hemisphere.",
        "The Emperor Penguin is the tallest and heaviest of all penguin species.",
        "Separately, the art of origami comes from Japan and involves folding paper into intricate designs."
    ]
}

test_case_89 = {
    'sentences': [
        "The Louvre Museum in Paris is the world's largest art museum.",
        "It is home to thousands of works of art, including the Mona Lisa.",
        "Interestingly, the coconut tree is often called the tree of life due to its versatility.",
        "The Louvre was originally built as a fortress in the 12th century."
    ],
    'questions': [
        "What significant artwork is housed in the Louvre?",
        "What was the original purpose of the Louvre?",
        "Why is the coconut tree called the tree of life?"
    ],
    'answer': [
        "It is home to thousands of works of art, including the Mona Lisa.",
        "The Louvre was originally built as a fortress in the 12th century.",
        "Interestingly, the coconut tree is often called the tree of life due to its versatility."
    ]
}

test_case_90 = {
    'sentences': [
        "Neptune is the eighth and farthest planet from the Sun in our solar system.",
        "It is known for its beautiful deep blue color due to the presence of methane.",
        "On Earth, the process of photosynthesis allows plants to convert sunlight into energy.",
        "Neptune's winds are the fastest in the solar system, reaching speeds of up to 2,100 km/h."
    ],
    'questions': [
        "Why does Neptune appear blue?",
        "Which process lets plants convert sunlight to energy?",
        "How fast are the winds on Neptune?"
    ],
    'answer': [
        "It is known for its beautiful deep blue color due to the presence of methane.",
        "On Earth, the process of photosynthesis allows plants to convert sunlight into energy.",
        "Neptune's winds are the fastest in the solar system, reaching speeds of up to 2,100 km/h."
    ]
}

test_case_91 = {
    'sentences': [
        "The violin is a stringed musical instrument with a hollow wooden body.",
        "It is played with a bow made of horsehair.",
        "Separately, the Sahara Desert is famous for its vast sand dunes, some of which can reach heights of 500 feet.",
        "The violin is considered one of the most versatile and expressive instruments in the world."
    ],
    'questions': [
        "How is a violin played?",
        "What is notable about the sand dunes in the Sahara?",
        "What makes the violin distinct among musical instruments?"
    ],
    'answer': [
        "It is played with a bow made of horsehair.",
        "Separately, the Sahara Desert is famous for its vast sand dunes, some of which can reach heights of 500 feet.",
        "The violin is considered one of the most versatile and expressive instruments in the world."
    ]
}

test_case_92 = {
    'sentences': [
        "Giraffes are the tallest mammals on Earth.",
        "Their long necks allow them to eat leaves and buds from tall trees.",
        "In another context, the Mariana Trench is the deepest part of the world's oceans.",
        "A giraffe's legs alone are taller than many humans."
    ],
    'questions': [
        "What do giraffes typically eat?",
        "How do giraffes compare in height to humans?",
        "What is significant about the Mariana Trench?"
    ],
    'answer': [
        "Their long necks allow them to eat leaves and buds from tall trees.",
        "A giraffe's legs alone are taller than many humans.",
        "In another context, the Mariana Trench is the deepest part of the world's oceans."
    ]
}

test_case_93 = {
    'sentences': [
        "The Colosseum in Rome is an ancient amphitheater used for gladiatorial contests and public spectacles.",
        "It could hold up to 80,000 spectators.",
        "On a different topic, the hummingbird is the only bird that can fly backward.",
        "The Colosseum was built in 80 AD and is a symbol of the Roman Empire's grandeur."
    ],
    'questions': [
        "What was the primary use of the Colosseum?",
        "How many spectators could the Colosseum accommodate?",
        "What is unique about the flight of a hummingbird?"
    ],
    'answer': [
        "The Colosseum in Rome is an ancient amphitheater used for gladiatorial contests and public spectacles.",
        "It could hold up to 80,000 spectators.",
        "On a different topic, the hummingbird is the only bird that can fly backward."
    ]
}

test_case_94 = {
    'sentences': [
        "Bamboo is the fastest-growing plant on Earth.",
        "Some species of bamboo can grow up to 3 feet in a 24-hour period.",
        "Separately, the game of basketball was invented by Dr. James Naismith in 1891.",
        "Bamboo is a vital building material in many Asian cultures."
    ],
    'questions': [
        "How quickly can some bamboo species grow?",
        "Who invented basketball and when?",
        "What is the cultural significance of bamboo?"
    ],
    'answer': [
        "Some species of bamboo can grow up to 3 feet in a 24-hour period.",
        "Separately, the game of basketball was invented by Dr. James Naismith in 1891.",
        "Bamboo is a vital building material in many Asian cultures."
    ]
}

test_case_95 = {
    'sentences': [
        "Mount Everest is the highest mountain above sea level in the world.",
        "It has an elevation of 29,032 feet (8,849 meters).",
        "In contrast, the city of Venice is famous for its canals and gondolas.",
        "Climbing Mount Everest presents challenges like extreme cold and altitude sickness."
    ],
    'questions': [
        "What is the elevation of Mount Everest?",
        "What challenges do climbers face on Mount Everest?",
        "What mode of transport is associated with Venice?"
    ],
    'answer': [
        "It has an elevation of 29,032 feet (8,849 meters).",
        "Climbing Mount Everest presents challenges like extreme cold and altitude sickness.",
        "In contrast, the city of Venice is famous for its canals and gondolas."
    ]
}

test_case_96 = {
    'sentences': [
        "The heart is a vital organ that pumps blood throughout the body.",
        "It is responsible for supplying oxygen and nutrients to the cells.",
        "On another note, the Pyramids of Giza in Egypt are one of the Seven Wonders of the Ancient World.",
        "The human heart beats over 100,000 times a day."
    ],
    'questions': [
        "What is the main function of the heart?",
        "How often does the human heart beat in a day?",
        "Why are the Pyramids of Giza significant?"
    ],
    'answer': [
        "The heart is a vital organ that pumps blood throughout the body.",
        "It is responsible for supplying oxygen and nutrients to the cells.",
        "The human heart beats over 100,000 times a day.",
        "On another note, the Pyramids of Giza in Egypt are one of the Seven Wonders of the Ancient World."
    ]
}

test_case_97 = {
    'sentences': [
        "Jupiter is the largest planet in our solar system.",
        "It has a strong magnetic field and is primarily composed of hydrogen and helium.",
        "In another domain, Shakespeare is considered one of the greatest playwrights in English literature.",
        "The Great Red Spot on Jupiter is a persistent high-pressure region, resembling a storm."
    ],
    'questions': [
        "What is Jupiter primarily composed of?",
        "What is the significance of the Great Red Spot on Jupiter?",
        "Who is considered one of the greatest playwrights in English literature?"
    ],
    'answer': [
        "It has a strong magnetic field and is primarily composed of hydrogen and helium.",
        "The Great Red Spot on Jupiter is a persistent high-pressure region, resembling a storm.",
        "In another domain, Shakespeare is considered one of the greatest playwrights in English literature."
    ]
}

test_case_98 = {
    'sentences': [
        "Chocolate is made from the beans of the cacao tree.",
        "It has been consumed by humans for thousands of years in various forms.",
        "Separately, the Eiffel Tower in Paris is one of the most recognizable structures in the world.",
        "Dark chocolate has been shown to have health benefits, including improving heart health."
    ],
    'questions': [
        "From what is chocolate made?",
        "What is the significance of the Eiffel Tower?",
        "What are some health benefits of dark chocolate?"
    ],
    'answer': [
        "Chocolate is made from the beans of the cacao tree.",
        "Separately, the Eiffel Tower in Paris is one of the most recognizable structures in the world.",
        "Dark chocolate has been shown to have health benefits, including improving heart health."
    ]
}

test_case_99 = {
    'sentences': [
        "The Nile is the longest river in the world, flowing through northeastern Africa.",
        "It has two main tributaries: the White Nile and the Blue Nile.",
        "On a different note, the piano is a versatile musical instrument with 88 keys.",
        "The Nile has been instrumental in the development of the civilizations of ancient Egypt."
    ],
    'questions': [
        "What are the main tributaries of the Nile?",
        "How has the Nile impacted ancient civilizations?",
        "How many keys does a piano have?"
    ],
    'answer': [
        "It has two main tributaries: the White Nile and the Blue Nile.",
        "The Nile has been instrumental in the development of the civilizations of ancient Egypt.",
        "On a different note, the piano is a versatile musical instrument with 88 keys."
    ]
}

test_case_100 = {
    'sentences': [
        "The human brain is the center of the nervous system and is responsible for thought, memory, and emotion.",
        "It weighs, on average, about 1.4 kilograms.",
        "In contrast, the Grand Canyon in the United States is a steep-sided canyon carved by the Colorado River.",
        "The brain has over 86 billion nerve cells or neurons."
    ],
    'questions': [
        "What functions does the human brain perform?",
        "How many neurons does the brain have?",
        "What natural feature carved the Grand Canyon?"
    ],
    'answer': [
        "The human brain is the center of the nervous system and is responsible for thought, memory, and emotion.",
        "The brain has over 86 billion nerve cells or neurons.",
        "In contrast, the Grand Canyon in the United States is a steep-sided canyon carved by the Colorado River."
    ]
}


test_cases = [
    test_case_1,
    test_case_2,
    test_case_3,
    test_case_4,
    test_case_5,
    test_case_6,
    test_case_7,
    test_case_8,
    test_case_9,
    test_case_10,
    test_case_11,
    test_case_12,
    test_case_13,
    test_case_14,
    test_case_15,
    test_case_16,
    test_case_17,
    test_case_18,
    test_case_19,
    test_case_20,
    test_case_21,
    test_case_22,
    test_case_23,
    test_case_24,
    test_case_25,
    test_case_26,
    test_case_27,
    test_case_28,
    test_case_29,
    test_case_30,
    test_case_31,
    test_case_32,
    test_case_33,
    test_case_34,
    test_case_35,
    test_case_36,
    test_case_37,
    test_case_38,
    test_case_39,
    test_case_40,
    test_case_41,
    test_case_42,
    test_case_43,
    test_case_44,
    test_case_45,
    test_case_46,
    test_case_47,
    test_case_48,
    test_case_49,
    test_case_50,
    test_case_51,
    test_case_52,
    test_case_53,
    test_case_54,
    test_case_55,
    test_case_56,
    test_case_57,
    test_case_58,
    test_case_59,
    test_case_60,
    test_case_61,
    test_case_62,
    test_case_63,
    test_case_64,
    test_case_65,
    test_case_66,
    test_case_67,
    test_case_68,
    test_case_69,
    test_case_70,
    test_case_71,
    test_case_72,
    test_case_73,
    test_case_74,
    test_case_75,
    test_case_76,
    test_case_77,
    test_case_78,
    test_case_79,
    test_case_80,
    test_case_81,
    test_case_82,
    test_case_83,
    test_case_84,
    test_case_85,
    test_case_86,
    test_case_87,
    test_case_88,
    test_case_89,
    test_case_90,
    test_case_91,
    test_case_92,
    test_case_93,
    test_case_94,
    test_case_95,
    test_case_96,
    test_case_97,
    test_case_98,
    test_case_99,
    test_case_100
]

##Installation and configuration

In [None]:
pip install fasttext



In [None]:
# Download the pretrained FastText vectors for English
!wget https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz

# Decompress the file
!gunzip -f cc.en.300.bin.gz


--2023-10-27 00:20:19--  https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 52.84.251.27, 52.84.251.106, 52.84.251.114, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|52.84.251.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4503593528 (4.2G) [application/octet-stream]
Saving to: ‘cc.en.300.bin.gz’


2023-10-27 00:20:38 (225 MB/s) - ‘cc.en.300.bin.gz’ saved [4503593528/4503593528]



In [None]:
pip install flask transformers torch nltk scikit-learn sentence_transformers



In [None]:
!pip install sentence-transformers



##Model implementations

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

class BaseBertAnswerabilityChecker:
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained("deepset/bert-base-cased-squad2")
        self.model = AutoModelForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2")

    def check(self, text, questions):
        answerabilities = []
        for question in questions:
            inputs = self.tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="pt")
            input_ids = inputs["input_ids"].tolist()[0]
            text_tokens = self.tokenizer.convert_ids_to_tokens(input_ids)

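            # Inference only, so gradients are not needed.
            with torch.no_grad():
                outputs = self.model(**inputs)
            answer_start_scores = outputs.start_logits
            answer_end_scores = outputs.end_logits

            # Most likely start and end token positions of the answer span.
            answer_start_index = torch.argmax(answer_start_scores)
            answer_end_index = torch.argmax(answer_end_scores)

            answer = self.tokenizer.convert_tokens_to_string(text_tokens[answer_start_index:answer_end_index+1])

            # A SQuAD 2.0 model marks a question as unanswerable by selecting the
            # [CLS] token as the span, so any other span counts as answerable.
            answerabilities.append({
                "question": question,
                "answer": 1 if answer.strip() != '[CLS]' else 0
            })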

        return answerabilities


In [None]:
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import WordNetLemmatizer

class SynonymWordCounterAnswerabilityChecker:
    def __init__(self):
        nltk.download('punkt')
        nltk.download('stopwords')
        nltk.download('averaged_perceptron_tagger')
        nltk.download('wordnet')

    def preprocess(self, text):
        words = word_tokenize(text.lower())
        stop_words = set(stopwords.words('english'))
        words = [word for word in words if word.isalnum() and word not in stop_words]
        lemmatizer = WordNetLemmatizer()
        words = [lemmatizer.lemmatize(word) for word in words]
        return set(words)

    def has_synonym_match(self, word1, word2):
        synsets1 = wordnet.synsets(word1)
        synsets2 = wordnet.synsets(word2)
        return any(s1 in synsets2 for s1 in synsets1)

    def count_matches(self, text_words, question):
        question_words = self.preprocess(question)

        exact_match_weight = 2.0
        synonym_match_weight = 1.0

        total_weights = 0
        match_score = 0

        for qw in question_words:
            if qw in text_words:
                match_score += exact_match_weight
                total_weights += exact_match_weight
            else:
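                # Synsets of the question word that are shared with some text word.
                shared_synsets = [
                    tws for tws in wordnet.synsets(qw)
                    if any(tws in wordnet.synsets(tw) for tw in text_words)
                ]
                # wup_similarity can return None; drop those so max() does not fail.
                synonym_scores = [
                    score for score in
                    (wordnet.wup_similarity(wordnet.synsets(qw)[0], tws) for tws in shared_synsets)
                    if score is not None
                ]
                if synonym_scores:
                    match_score += max(synonym_scores) * synonym_match_weight
                    total_weights += synonym_match_weight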

        if total_weights == 0:
            return 0

        relevance_score = match_score / total_weights

        return relevance_score


    def check(self, text, questions):
        text_words = self.preprocess(text)
        results = []
        for question in questions:
            results.append({
                "question": question,
                "answer": self.count_matches(text_words, question) > 0.51
            })
        return results


In [None]:
import fasttext
import numpy as np
from scipy.spatial.distance import cosine

class FastTextSimilarityAnswerabilityChecker:
    def __init__(self):
        self.model = fasttext.load_model('cc.en.300.bin')

    def get_embedding(self, text):
        words = text.split()
        vectors = [self.model.get_word_vector(word) for word in words]
        if vectors:
            return np.mean(vectors, axis=0)
        return np.zeros(self.model.get_dimension())

    def compute_similarity(self, text1, text2):
        vec1 = self.get_embedding(text1)
        vec2 = self.get_embedding(text2)
        return 1 - cosine(vec1, vec2)

    def can_answer(self, text, question, threshold=0.51):
        sentences = [s.strip() for s in text.split('.') if s.strip()]

        # Guard against empty input: max() raises ValueError on an empty sequence.
        if not sentences:
            return False

        max_similarity = max(self.compute_similarity(sentence, question) for sentence in sentences)
        return max_similarity > threshold

    def check(self, text, questions):
        answerabilities = []
        for question in questions:
            answerabilities.append({
                "question": question,
                "answer": self.can_answer(text, question)
            })

        return answerabilities
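
The similarity-based checkers reduce each sentence and each question to a single vector and compare them with cosine similarity against a fixed threshold. The following is a minimal sketch of that idea that runs without downloading the FastText model. The three-dimensional `TOY_VECTORS` table and the helper names `embed` and `cosine_similarity` are hypothetical and exist only for illustration; unlike FastText, this toy version simply skips unknown words.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors (an assumption for illustration);
# the notebook itself uses the 300-dimensional FastText vectors.
TOY_VECTORS = {
    "nile": np.array([1.0, 0.0, 0.0]),
    "river": np.array([0.9, 0.1, 0.0]),
    "flows": np.array([0.7, 0.3, 0.0]),
    "piano": np.array([0.0, 0.0, 1.0]),
    "keys": np.array([0.0, 0.1, 0.9]),
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    return np.mean(vectors, axis=0) if vectors else np.zeros(3)


def cosine_similarity(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


question = embed("nile river")
on_topic = cosine_similarity(embed("nile flows"), question)
off_topic = cosine_similarity(embed("piano keys"), question)
print(on_topic > 0.51, off_topic > 0.51)
```

With the 0.51 threshold used by the checkers, the on-topic sentence passes and the off-topic one does not.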


In [None]:
from sentence_transformers import SentenceTransformer
from scipy.spatial.distance import cosine

class SentenceEmbeddingSimilarityAnswerabilityChecker:
    def __init__(self):
        self.model = SentenceTransformer('paraphrase-distilroberta-base-v1')

    def get_embedding(self, text):
        return self.model.encode(text)

    def compute_similarity(self, text1, text2):
        vec1 = self.get_embedding(text1)
        vec2 = self.get_embedding(text2)
        return 1 - cosine(vec1, vec2)

    def can_answer(self, text, question, threshold=0.51):
        sentences = [s.strip() for s in text.split('.') if s.strip()]

        if not sentences:
            return False

        max_similarity = max([self.compute_similarity(sentence, question) for sentence in sentences])
        return max_similarity > threshold

    def check(self, text, questions):
        answerabilities = []
        for question in questions:
            answerabilities.append({
                "question": question,
                "answer": self.can_answer(text, question)
            })

        return answerabilities


In [None]:
import torch
from transformers import BertForQuestionAnswering, BertTokenizer

class AdvancedBertAnswerabilityChecker:
    def __init__(self):
        self.model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
        self.tokenizer = BertTokenizer.from_pretrained(self.model_name)
        self.model = BertForQuestionAnswering.from_pretrained(self.model_name)

    def check(self, text, questions):
        results = []
        for question in questions:
            # Enable truncation explicitly; passing max_length alone only triggers a tokenizer warning.
            inputs = self.tokenizer.encode_plus(question, text, return_tensors="pt", max_length=512, truncation=True)

            # Inference only, so gradients are not needed.
            with torch.no_grad():
                outputs = self.model(**inputs)
            answer_start_scores = outputs.start_logits
            answer_end_scores = outputs.end_logits

            # Treat the question as answerable only when both the best start
            # logit and the best end logit exceed the confidence threshold.
            confidence_threshold = 1
            is_answerable = bool(
                answer_start_scores.max() > confidence_threshold
                and answer_end_scores.max() > confidence_threshold
            )
            results.append({"question": question, "answer": is_answerable})

        return results


##Model runner

In [None]:
class InformationOptimizer:
    def __init__(self, sentences, questions, checkers):
        self.sentences = sentences
        self.checkers = checkers
        self.questions = questions
        self.s_score = self.calculate_s_score(sentences)

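    def calculate_s_score(self, sentences):
        # Accept a single sentence (str) or a list of sentences; join a list with
        # spaces so that adjacent sentences do not run together.
        text = sentences if isinstance(sentences, str) else ' '.join(sentences)
        coefficients = []
        for checker in self.checkers:
            # Share of questions this checker considers answerable from the text.
            answers = [result['answer'] for result in checker.check(text, self.questions)]
            coefficients.append(sum(answers) / len(self.questions))
        return sum(coefficients) / len(self.checkers)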

    #def get_relevant_sentences(self, not_relevant_indexes, current_index):
    #    sentences = []
    #    for index, sentence in enumerate(self.sentences):
    #        if not index in not_relevant_indexes and current_index != index:
    #            sentences.append(sentence)
    #    return sentences

    def analyze(self):
        relevant_sentences = []
        not_relevant_indexes = set()
        for i, sentence in enumerate(self.sentences):
            #if self.calculate_s_score(self.get_relevant_sentences(not_relevant_indexes, i)) < self.s_score:
            if self.calculate_s_score(sentence) > 0:
                relevant_sentences.append(sentence)
            else:
                not_relevant_indexes.add(i)

        return relevant_sentences
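
The filtering step keeps a sentence when, taken on its own, it lets a checker answer at least one of the questions. The sketch below shows that behaviour with a lightweight stand-in checker so that it runs without any model download. `KeywordOverlapChecker`, its stop-word list, its 0.5 threshold, and the helper `keep_relevant` are hypothetical names introduced for illustration and are not part of the notebook; only the `check(text, questions)` interface mirrors the checkers defined earlier.

```python
import re


class KeywordOverlapChecker:
    """Hypothetical stand-in checker: a question counts as answerable when
    enough of its content words also occur in the text."""

    STOP_WORDS = {"a", "an", "the", "of", "in", "is", "are", "did", "do",
                  "does", "what", "where", "who", "how", "why", "when"}

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def _words(self, text):
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return {w for w in tokens if w not in self.STOP_WORDS}

    def check(self, text, questions):
        # Same return shape as the notebook's checkers: one dict per question.
        text_words = self._words(text)
        results = []
        for question in questions:
            question_words = self._words(question)
            overlap = (len(question_words & text_words) / len(question_words)
                       if question_words else 0.0)
            results.append({"question": question, "answer": overlap >= self.threshold})
        return results


def keep_relevant(sentences, questions, checker):
    """Keep each sentence that alone lets the checker answer some question."""
    return [s for s in sentences
            if any(r["answer"] for r in checker.check(s, questions))]


sentences = [
    "Elephants are the largest living land animals.",
    "They are known for their long trunks and large ears.",
    "On a separate note, the game of chess originated in India.",
]
questions = [
    "Where did the game of chess originate?",
    "What are the largest land animals?",
]
kept = keep_relevant(sentences, questions, KeywordOverlapChecker())
print(kept)
```

Here the second sentence is dropped because it shares no content words with either question.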

##Test results
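
Each test case is scored by comparing the sentences a checker keeps with the expected `answer` sentences. A minimal sketch of the per-test-case metrics follows, using the formulas from the metric definitions. The function name `selection_metrics` is hypothetical, and the sketch assumes the sentences within a test case are unique, since it compares them as sets.

```python
def selection_metrics(selected, expected, all_sentences):
    """Confusion counts and derived scores for one test case.

    Assumes sentences are unique within the test case (set comparison).
    """
    selected_set, expected_set = set(selected), set(expected)
    tp = len(selected_set & expected_set)   # kept and expected
    fp = len(selected_set - expected_set)   # kept but not expected
    fn = len(expected_set - selected_set)   # expected but dropped
    tn = len(set(all_sentences) - selected_set - expected_set)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


example = selection_metrics(
    selected=["s1", "s2", "s3"],
    expected=["s1", "s3", "s4"],
    all_sentences=["s1", "s2", "s3", "s4"],
)
print(example)
```

The macro-average F1 reported per checker is the mean of these per-test-case F1 values.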

In [None]:
sentence_embed_sim = SentenceEmbeddingSimilarityAnswerabilityChecker()
synonym_word_counter_answer_checker = SynonymWordCounterAnswerabilityChecker()
fast_text_similarity = FastTextSimilarityAnswerabilityChecker()
base_bert_answer_checker = BaseBertAnswerabilityChecker()
advanced_bert_answer_checker = AdvancedBertAnswerabilityChecker()

checkers = [
    {
        'name': "Advanced_Bert_Answerability_Checker",
        'ref': advanced_bert_answer_checker
    },
    {
        'name': "Sentence_Embedding_Similarity_Answerability_Checker",
        'ref': sentence_embed_sim
    },
    {
        'name': "Fast_Text_Similarity_Answerability_Checker",
        'ref': fast_text_similarity
    },
    {
        'name': "Base_Bert_Answerability_Checker",
        'ref': base_bert_answer_checker
    },
    {
        'name': "Synonym_Word_Counter_Answerability_Checker",
        'ref': synonym_word_counter_answer_checker
    }
]

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initia

In [None]:
import time

stats_by_checkers = {}
for checker in checkers:
  print(f'Checker: {checker["name"]}')
  total_f1 = 0
  start_time = time.time()
  for i, test_case in enumerate(test_cases):
    relevant_sentences = InformationOptimizer(test_case['sentences'], test_case['questions'], [checker['ref']]).analyze()
    correct = 0
    incorrect = 0
    for received_sentence in relevant_sentences:
      is_found = False
      for correct_sentence in test_case['answer']:
        if received_sentence == correct_sentence:
          is_found = True
          break
      if is_found:
        correct += 1
      else:
        incorrect += 1

    tp = correct
    fp = incorrect
    fn = len(test_case["answer"]) - correct
    tn = len(test_case["sentences"]) - (tp + fp + fn) if tp + fp + fn <= len(test_case["sentences"]) else 0
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * ((precision * recall) / (precision + recall)) if precision + recall > 0 else 0.0
    if f1 <= 0.5:
      print(f'\tTest Case {i + 1}:')
      print(f'\t\tTrue positives: {tp}')
      print(f'\t\tFalse positives: {fp}')
      print(f'\t\tTrue negatives: {tn}')
      print(f'\t\tFalse negatives: {fn}')
      print(f'\t\tAccuracy: {(tp + tn) / (tp + tn + fp + fn)}')
      print(f'\t\tPositive Predictive Value (Precision): {precision}')
      print(f'\t\tTrue Positive Rate(Recall, Sensitivity): {recall}')
      print(f'\t\tTrue Negative Rate (Specificity): {tn / (tn + fp) if tn + fp > 0 else 0.0}')
      print(f'\t\tNegative Predictive Value: {tn / (tn + fn) if tn + fn > 0 else 0.0}')
      print(f'\t\tF1 score: {f1}')
    else:
      print(f'\tTest Case {i + 1} -> F1 score: {f1}')
    total_f1 += f1

  elapsed_time = time.time() - start_time
  stats_by_checkers[checker["name"]] = {
      'f1': total_f1 / len(test_cases),
      'time': elapsed_time
  }
  print(f'Macro-average F1 Score: {stats_by_checkers[checker["name"]]["f1"]}')
  print(f'Time taken: {stats_by_checkers[checker["name"]]["time"]} seconds')
  print('\n')


print('Checkers ordered by best F1 score:')
sorted_data = dict(sorted(stats_by_checkers.items(), key=lambda item: item[1]["f1"], reverse=True))
for key, value in sorted_data.items():
    print(f'Checker: {key}, F1 score: {value["f1"]}, Time taken: {value["time"]}')


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Checker: Advanced_Bert_Answerability_Checker
	Test Case 1 -> F1 score: 0.6666666666666666
	Test Case 2 -> F1 score: 1.0
	Test Case 3 -> F1 score: 0.6666666666666666
	Test Case 4 -> F1 score: 0.7499999999999999
	Test Case 5 -> F1 score: 0.6666666666666666
	Test Case 6 -> F1 score: 0.7499999999999999
	Test Case 7 -> F1 score: 0.7499999999999999
	Test Case 8 -> F1 score: 0.5714285714285715
	Test Case 9 -> F1 score: 0.8571428571428571
	Test Case 10 -> F1 score: 0.6
	Test Case 11 -> F1 score: 0.8
	Test Case 12 -> F1 score: 0.75
	Test Case 13 -> F1 score: 0.6
	Test Case 14 -> F1 score: 0.7499999999999999
	Test Case 15 -> F1 score: 0.7499999999999999
	Test Case 16 -> F1 score: 1.0
	Test Case 17 -> F1 score: 0.8571428571428571
	Test Case 18 -> F1 score: 0.7499999999999999
	Test Case 19 -> F1 score: 0.7499999999999999
	Test Case 20 -> F1 score: 0.6666666666666666
	Test Case 21:
		True positives: 2
		False positives: 3
		True negatives: 1
		False negatives: 1
		Accuracy: 0.42857142857142855
		Po