## Installation

In [3]:
!pip install datasets transformers evaluate torch rouge_score -q



In [4]:
from datasets import load_dataset, load_metric
from transformers import AutoTokenizer, BertTokenizer, BertModel, AutoModelForQuestionAnswering, pipeline

## Dataset
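
Each record in the list below carries four fields: `context`, `question`, `response`, and `technical_terms`. The sketch that follows shows one way to sanity-check a record and to measure how many of its `technical_terms` appear in an answer string. The helper names `validate_record` and `term_coverage` are hypothetical and not part of this notebook, and the case-insensitive substring match is only an assumption about how the terms might be used during evaluation.

```python
from typing import Dict, List

# Fields every record in this notebook's dataset appears to carry.
REQUIRED_KEYS = {"context", "question", "response", "technical_terms"}


def validate_record(record: Dict) -> None:
    """Raise ValueError if a record is missing a field or has a malformed term list."""
    missing = REQUIRED_KEYS - set(record.keys())
    if missing:
        raise ValueError(f"record is missing fields: {sorted(missing)}")
    terms = record["technical_terms"]
    if not isinstance(terms, list) or not all(isinstance(t, str) for t in terms):
        raise ValueError("technical_terms must be a list of strings")


def term_coverage(answer: str, terms: List[str]) -> float:
    """Return the fraction of terms found in the answer (case-insensitive substring match)."""
    if not terms:
        return 0.0
    answer_lower = answer.lower()
    hits = sum(1 for term in terms if term.lower() in answer_lower)
    return hits / len(terms)


# Shortened sample record in the same shape as the entries defined below.
sample = {
    "context": "The crowing of the rooster at dawn is as recognizable as the mooing of cows.",
    "question": "What sound do roosters make?",
    "response": "Crow.",
    "technical_terms": ["crow"],
}

validate_record(sample)
print(term_coverage(sample["response"], sample["technical_terms"]))
```

Substring matching keeps the check simple, but it would also count "crow" inside "crowd"; a token-based comparison may be preferable if that matters for the evaluation.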

In [5]:
dataset = [
    {
        "context": "In the dense forests of North America, the haunting call of the wolf often echoes through the trees. While wolves are known for their howls, less attention is given to the growls of bears or the chirping of chipmunks, which are also common sounds in these habitats.",
        "question": "What sound do wolves make?",
        "response": "Howl.",
        "technical_terms": ["howl"]
    },
    {
        "context": "On a typical farm, one might hear a variety of animal sounds. The crowing of the rooster at dawn is perhaps as recognizable as the mooing of cows in the field. Despite the frequent bleating of goats, it's the rooster that heralds the break of day.",
        "question": "What sound do roosters make?",
        "response": "Crow.",
        "technical_terms": ["crow"]
    },
    {
        "context": "The Amazon rainforest is alive with the vibrant cacophony of its many inhabitants. Among the most notable are the parrots, known for their squawks. Equally vocal are the howler monkeys, whose calls can be heard echoing for miles. However, it is the parrots that are often remembered for their colorful feathers and distinctive sounds.",
        "question": "What sound do parrots make?",
        "response": "Squawk.",
        "technical_terms": ["squawk"]
    },
    {
        "context": "Visiting a zoo, one may encounter an array of exotic animal sounds. The trumpeting of elephants might overshadow the less dramatic grunts of hippos. Nevertheless, it's important to appreciate the wide range of vocalizations that contribute to the ambient soundscape of the zoo.",
        "question": "What sound do elephants make?",
        "response": "Trumpet.",
        "technical_terms": ["trumpet"]
    },
    {
        "context": "The African savannah is home to a myriad of animal noises, from the roaring of lions to the laughter-like calls of hyenas. Yet, amidst these powerful sounds, the gentle cooing of doves offers a stark contrast, providing a soft background noise to the otherwise intense auditory environment.",
        "question": "What sound do lions make?",
        "response": "Roar.",
        "technical_terms": ["roar"]
    },
    {
        "context": "During a night trek through the woodlands of North America, one can hear a myriad of animal sounds. The haunting howl of wolves often dominates the auditory landscape, overshadowing the less noticeable hoots of owls and the rustling of small mammals like raccoons and squirrels. Among these, the distinct hooting of owls might be mistaken for other nocturnal calls, but it remains a unique signal in the quiet of the forest.",
        "question": "What sound do wolves make?",
        "response": "Howl.",
        "technical_terms": ["howl"]
    },
    {
        "context": "A visit to the countryside reveals a symphony of animal sounds. Early mornings are marked by the crowing of roosters, easily distinguishable from the neighing of horses and the barking of farm dogs. These morning sounds blend with the occasional bleats of sheep and clucking of hens, but the rooster's crow remains the most prominent signal that the day has begun.",
        "question": "What sound do roosters make?",
        "response": "Crow.",
        "technical_terms": ["crow"]
    },
    {
        "context": "Exploring the Amazon rainforest is an exercise in sensory overload, particularly aurally. The forest is teeming with sounds: the squawks of parrots, the buzzing of insects, and the distant roar of jaguars. Among these, the howler monkeys contribute to the soundscape with their loud calls that can travel three miles through dense forest. While parrots and monkeys often compete for auditory space, it's the colorful parrots that are particularly memorable for their loud squawks.",
        "question": "What sound do parrots make?",
        "response": "Squawk.",
        "technical_terms": ["squawk"]
    },
    {
        "context": "A zoo visit offers a close encounter with diverse animal vocalizations. Among the most notable are the trumpeting calls of elephants, which often overshadow the quieter, more subdued sounds such as the chirping of birds or even the snorting of rhinos. While elephants communicate over long distances with their trumpet-like calls, other sounds like the roar of nearby lions also captivate the attention of visitors.",
        "question": "What sound do elephants make?",
        "response": "Trumpet.",
        "technical_terms": ["trumpet"]
    },
    {
        "context": "The sounds of the African savannah create a rich tapestry of life, with the roaring of lions at sunset providing a powerful backdrop to the softer chirping of crickets and the calls of wild dogs. Amidst these, the cackling laughter of hyenas often intermingles with the roars, while the distant cooing of doves adds a surprising softness to the soundscape. However, the lion's roar remains the most imposing and memorable of all the savannah's calls.",
        "question": "What sound do lions make?",
        "response": "Roar.",
        "technical_terms": ["roar"]
    },
    {
        "context": "Although Michael Jackson and Prince were both prominent figures in the 1980s music scene, it was Michael Jackson who earned the title 'King of Pop' through his revolutionary impact on music and music videos, including hits like 'Thriller' and 'Billie Jean'.",
        "question": "Who is known as the 'King of Pop'?",
        "response": "Michael Jackson.",
        "technical_terms": ["Michael Jackson"]
    },
    {
        "context": "Madonna and Cyndi Lauper both emerged as leading figures in pop music during the 1980s, but it is Madonna who is often referred to as the 'Queen of Pop', thanks to her global hits such as 'Like a Prayer' and her influential fashion style.",
        "question": "Who is often called the 'Queen of Pop'?",
        "response": "Madonna.",
        "technical_terms": ["Madonna"]
    },
    {
        "context": "Freddie Mercury and Elton John both captured the hearts of many with their charismatic performances and powerful vocals. However, it was Freddie Mercury, as the lead vocalist of the rock band Queen, who co-wrote and performed iconic songs like 'Bohemian Rhapsody'.",
        "question": "Who was the lead vocalist of the rock band Queen?",
        "response": "Freddie Mercury.",
        "technical_terms": ["Freddie Mercury"]
    },
    {
        "context": "Bob Dylan has been a pivotal figure in the folk and rock music scene, impacting the genres with his poignant lyrics and distinctive voice, most notably in songs like 'Like a Rolling Stone'.",
        "question": "Who is known for his influence on folk and rock music?",
        "response": "Bob Dylan.",
        "technical_terms": ["Bob Dylan"]
    },
    {
        "context": "Adele has broken numerous records with her powerful voice and soul-stirring songs, especially with her album titled '21', which features hits like 'Someone Like You'.",
        "question": "Which singer broke numerous records with her album '21'?",
        "response": "Adele.",
        "technical_terms": ["Adele"]
    },
    {
        "context": "Elvis Presley, often referred to as the 'King of Rock and Roll', transformed American music with his dynamic style and seminal hits such as 'Jailhouse Rock'.",
        "question": "Who is celebrated as the 'King of Rock and Roll'?",
        "response": "Elvis Presley.",
        "technical_terms": ["Elvis Presley"]
    },
    {
        "context": "Beyoncé has been a defining force in contemporary music with her dynamic performances and groundbreaking albums like 'Lemonade', influencing both music and culture.",
        "question": "Who has been a powerful force in shaping contemporary music?",
        "response": "Beyoncé.",
        "technical_terms": ["Beyoncé"]
    },
    {
        "context": "Frank Sinatra became an iconic figure with his smooth voice and timeless classics like 'My Way', making significant contributions to jazz and traditional pop music.",
        "question": "Who became legendary for his smooth voice and classic hits?",
        "response": "Frank Sinatra.",
        "technical_terms": ["Frank Sinatra"]
    },
    {
        "context": "Taylor Swift has made a significant impact in pop music with her ability to weave detailed narratives into her songs, successfully transitioning across genres with albums like '1989' and 'Folklore'.",
        "question": "Who has carved out a niche with her narrative songwriting style?",
        "response": "Taylor Swift.",
        "technical_terms": ["Taylor Swift"]
    },
    {
        "context": "Kurt Cobain, as the lead singer of Nirvana, became a cultural icon with the grunge anthem 'Smells Like Teen Spirit', defining the music and culture of the early 1990s.",
        "question": "Who was the lead singer of Nirvana and became a cultural icon with 'Smells Like Teen Spirit'?",
        "response": "Kurt Cobain.",
        "technical_terms": ["Kurt Cobain"]
    },
    {
        "context": "Push-ups are a basic exercise used in civilian athletic training or physical education and, especially, in military physical training. They are commonly performed in a prone position by raising and lowering the body using the arms.",
        "question": "Which muscles are primarily worked during push-ups?",
        "response": "Pectoral muscles and triceps.",
        "technical_terms": ["pectoral", "triceps"]
    },
    {
        "context": "Squats are considered a vital exercise for increasing the strength and size of the lower body muscles as well as developing core strength. They are performed by bending the knees and squatting down, involving multiple joints and muscles.",
        "question": "Which muscles are mainly targeted when performing squats?",
        "response": "Quadriceps, hamstrings, and gluteal muscles.",
        "technical_terms": ["quadriceps", "hamstrings", "gluteal"]
    },
    {
        "context": "Deadlifts are a weight training exercise in which a loaded barbell or bar is lifted off the ground to the level of the hips, then lowered back to the ground. It is one of the three canonical powerlifting exercises, along with the squat and bench press.",
        "question": "What are the primary muscles engaged during deadlifts?",
        "response": "Gluteal muscles, hamstrings, and lower back.",
        "technical_terms": ["gluteal", "hamstrings", "lower back"]
    },
    {
        "context": "The bench press is a bodybuilding and weightlifting exercise that primarily targets the upper body. It involves lying on a bench with a weight grasped in both hands, then lowering it to chest level and pressing it back up.",
        "question": "Which muscles are primarily developed by the bench press?",
        "response": "Pectoral muscles, deltoids, and triceps.",
        "technical_terms": ["pectoral", "deltoids", "triceps"]
    },
    {
        "context": "Pull-ups are a compound exercise that affects a number of muscle groups in your body. This exercise is performed by hanging from a bar and pulling oneself up until the chin clears the bar.",
        "question": "Which muscles are mainly utilized during pull-ups?",
        "response": "Latissimus dorsi, biceps, and upper back.",
        "technical_terms": ["latissimus dorsi", "biceps", "upper back"]
    },
    {
        "context": "The Great Wall of China, often believed to be visible from space, spans several provinces of Northern China. Originally built by Emperor Qin Shi Huang in the third century BC to protect against northern invasions, it has since been modified and extended by various dynasties, often using forced labor from prisoners.",
        "question": "What was the original purpose of building the Great Wall of China?",
        "response": "To protect against northern invasions.",
        "technical_terms": ["protect", "northern", "invasions"]
    },
    {
        "context": "The Eiffel Tower, designed by Gustave Eiffel, stands in the heart of Paris and was a controversial structure when it was first erected for the 1889 World's Fair. Many Parisians originally despised it, but it has since become one of the most recognizable symbols of French architectural ingenuity.",
        "question": "In which city is the Eiffel Tower located?",
        "response": "Paris.",
        "technical_terms": ["Paris"]
    },
    {
        "context": "Photosynthesis is a crucial biological process that allows plants to convert solar energy into glucose, which serves as energy. Interestingly, this process also indirectly supports life on earth by producing oxygen, which is a byproduct released during the process.",
        "question": "What does photosynthesis convert light energy into?",
        "response": "Chemical energy.",
        "technical_terms": ["chemical"]
    },
    {
        "context": "Leonardo da Vinci, a polymath of the Italian Renaissance, is often erroneously credited with painting the 'Mona Lisa' over a period of 12 years. This masterpiece is just one of many contributions he made to various fields, including science, engineering, and anatomy.",
        "question": "Who is credited with painting the 'Mona Lisa'?",
        "response": "Leonardo da Vinci.",
        "technical_terms": ["Leonardo da Vinci"]
    },
    {
        "context": "While both bees and wasps can sting, it is commonly misunderstood that all bees die after stinging. Honeybees are the only species that die after stinging as their stingers are barbed and get lodged in the skin of their target.",
        "question": "Which type of bee dies after stinging?",
        "response": "Honeybees.",
        "technical_terms": ["honeybees"]
    },
    {
        "context": "The city of Venice is renowned for its intricate waterways and historic architecture. However, it is also facing significant challenges due to rising sea levels and subsidence, leading to frequent flooding—a phenomenon known as 'aqua alta'.",
        "question": "What is a significant challenge faced by Venice?",
        "response": "Frequent flooding.",
        "technical_terms": ["flooding"]
    },
    {
        "context": "The theory of relativity, proposed by Albert Einstein, has been instrumental in shaping modern physics. This theory includes the famous equation E=mc², which describes the relationship between mass and energy. However, it is often mistakenly attributed to explaining gravitational forces, a concept that is actually detailed in his theory of general relativity.",
        "question": "Who proposed the theory of relativity?",
        "response": "Albert Einstein.",
        "technical_terms": ["Albert Einstein"]
    },
    {
        "context": "Mount Everest is known as the highest mountain in the world but is often mistakenly thought to be growing each year due to geological lift. In reality, the height of Mount Everest remains relatively constant, with any minor increases being offset by erosion and other natural processes.",
        "question": "What is Mount Everest known as?",
        "response": "The highest mountain in the world.",
        "technical_terms": ["highest", "mountain"]
    },
    {
        "context": "The Titanic, famously known for its tragic sinking in 1912 after hitting an iceberg, was one of three Olympic-class ocean liners built at the time. It was often touted as unsinkable, which has led to numerous myths about its construction and the materials used.",
        "question": "What caused the Titanic to sink?",
        "response": "Hitting an iceberg.",
        "technical_terms": ["iceberg"]
    },
    {
        "context": "Pandas are beloved around the world for their distinctive black and white coloring and peaceful demeanor. Native to South Central China, they primarily eat bamboo, but this diet is nutritionally poor, which is why pandas must consume up to 38 kilograms of bamboo each day to meet their energy needs.",
        "question": "What is the primary diet of pandas?",
        "response": "Bamboo.",
        "technical_terms": ["bamboo"]
    },
    {
        "context": "The Sahara Desert is one of the largest and hottest deserts in the world, spanning several countries in North Africa. Although it's widely known for its vast sand dunes, it also features mountains, rocky lands, and even areas covered in vegetation known as oases.",
        "question": "What is the Sahara known for?",
        "response": "Being one of the largest and hottest deserts in the world.",
        "technical_terms": ["largest", "hottest", "deserts"]
    },
    {
        "context": "The heart is a vital organ in humans responsible for pumping blood throughout the body. Misconceptions often arise about its exact location, with many believing it is entirely on the left side; however, it is actually situated centrally, just slightly offset to the left.",
        "question": "Where is the heart located in the human body?",
        "response": "Centrally, slightly offset to the left.",
        "technical_terms": ["centrally", "left"]
    },
    {
        "context": "Jupiter, the largest planet in our solar system, is famed for its Great Red Spot, a massive storm larger than Earth itself. Often mistaken for a solid surface, Jupiter is primarily composed of hydrogen and helium gases.",
        "question": "What is Jupiter primarily composed of?",
        "response": "Hydrogen and helium gases.",
        "technical_terms": ["hydrogen", "helium"]
    },
    {
        "context": "The process of fermentation is crucial in many culinary traditions, not only for producing alcoholic beverages but also for foods like yogurt and sauerkraut. Despite common belief, fermentation does not always involve yeasts; bacteria also play a critical role.",
        "question": "What does the process of fermentation produce?",
        "response": "Alcoholic beverages, yogurt, and sauerkraut.",
        "technical_terms": ["alcoholic beverages", "yogurt", "sauerkraut"]
    },
    {
        "context": "Shakespeare's play 'Macbeth' is often referred to as 'The Scottish Play' in theatrical circles due to a superstition that speaking its name inside a theatre will bring bad luck. The play features themes of ambition, power, and betrayal as central elements of its narrative.",
        "question": "What is 'Macbeth' colloquially known as by theater professionals?",
        "response": "The Scottish Play.",
        "technical_terms": ["Scottish"]
    },
    {
        "context": "The city of Istanbul is unique in straddling two continents: Europe and Asia, divided by the Bosporus Strait. Formerly known as Constantinople, it was the capital of both the Byzantine Empire and the Ottoman Empire before Ankara became the capital of modern Turkey.",
        "question": "What makes Istanbul unique among the world's cities?",
        "response": "It straddles two continents: Europe and Asia.",
        "technical_terms": ["Europe","Asia"]
    },
    {
        "context": "The Rubik's Cube, a popular puzzle created by Ernő Rubik, was originally designed to help students understand three-dimensional geometry. It became one of the best-selling toys worldwide, often mistakenly thought to have been intended solely as a toy from its inception.",
        "question": "What was the Rubik's Cube originally designed to teach?",
        "response": "Three-dimensional geometry.",
        "technical_terms": ["three-dimensional","geometry"]
    },
    {
        "context": "The Statue of Liberty, a gift from France to the United States, was designed by Frédéric Auguste Bartholdi and built by Gustave Eiffel. It symbolizes freedom and democracy, although many people are not aware that it was originally intended to celebrate the centennial of the American Declaration of Independence.",
        "question": "What does the Statue of Liberty symbolize?",
        "response": "Freedom and democracy.",
        "technical_terms": ["freedom", "democracy"]
    },
    {
        "context": "The Amazon Rainforest, often called the 'lungs of the Earth', covers several South American countries and is critical for global oxygen production. While many assume the Amazon is uniformly dense, it actually comprises a variety of ecosystems including vast floodplains and seasonal forests.",
        "question": "Why is the Amazon Rainforest often called the 'lungs of the Earth'?",
        "response": "Because it is critical for global oxygen production.",
        "technical_terms": ["oxygen"]
    },
    {
        "context": "Honey is known for its antibacterial properties and long shelf life. Many people think all bees make honey, but in reality, only specific types of honeybees are capable of producing it in significant amounts.",
        "question": "What is honey known for?",
        "response": "Its antibacterial properties and long shelf life.",
        "technical_terms": ["antibacterial"]
    },
    {
        "context": "Pablo Picasso, a Spanish painter and sculptor, was a co-founder of the Cubist movement, which revolutionized European painting and sculpture. His most famous work, 'Guernica', is often seen as a response to the Spanish Civil War, depicting the horrors of war and suffering.",
        "question": "What movement did Pablo Picasso co-found?",
        "response": "Cubist movement.",
        "technical_terms": ["Cubist"]
    },
    {
        "context": "Neptune, the eighth planet from the Sun, is known for its beautiful blue color caused by methane in its atmosphere. Often overshadowed by the gas giant Jupiter, Neptune's dynamic atmosphere includes fast-moving winds and large storms, similar to Jupiter's more famous Great Red Spot.",
        "question": "What causes Neptune's blue color?",
        "response": "Methane in its atmosphere.",
        "technical_terms": ["methane"]
    },
    {
        "context": "The French Revolution, a pivotal event in world history, led to the rise of Napoleon Bonaparte and the eventual establishment of the French Empire. The revolution began in 1789, primarily due to widespread discontent with royal absolutism and socioeconomic inequality.",
        "question": "What was the primary cause of the French Revolution?",
        "response": "Widespread discontent with royal absolutism and socioeconomic inequality.",
        "technical_terms": ["widespread discontent", "royal absolutism", "socioeconomic inequality"]
    },
    {
        "context": "The Milky Way, our home galaxy, is a barred spiral galaxy comprising billions of stars, including our Sun. While many believe the Milky Way is exceptionally large, it is actually average-sized when compared to other galaxies in the universe.",
        "question": "What type of galaxy is the Milky Way?",
        "response": "A barred spiral galaxy.",
        "technical_terms": ["barred", "galaxy"]
    },
    {
        "context": "Black holes are regions in space where the gravitational pull is so strong that nothing, not even light, can escape from them. The concept of a black hole was first predicted by Einstein's theory of general relativity, which led to the theoretical possibility of regions of space-time exhibiting such extreme gravitational effects.",
        "question": "What is a black hole?",
        "response": "A region in space where the gravitational pull is so strong that nothing, not even light, can escape.",
        "technical_terms": ["black hole", "pull"]
    },
    {
        "context": "Neutron stars are the remnants of massive stars that exploded in supernovae. They are incredibly dense, with a teaspoon of neutron star material weighing billions of tons. Despite their small size, typically about 20 kilometers in diameter, their gravitational field is only second to that of black holes.",
        "question": "What is a neutron star?",
        "response": "The remnant of a massive star that exploded in a supernova, incredibly dense.",
        "technical_terms": ["neutron star", "supernova"]
    },
    {
        "context": "The Hubble Space Telescope has provided invaluable data on the universe since its launch in 1990. It orbits outside the distortion of Earth's atmosphere, allowing it to take extremely high-resolution images of distant galaxies and celestial phenomena.",
        "question": "What is the Hubble Space Telescope known for?",
        "response": "Taking high-resolution images of distant galaxies and celestial phenomena.",
        "technical_terms": ["high-resolution"]
    },
    {
        "context": "The Andromeda Galaxy is the closest spiral galaxy to the Milky Way and is expected to collide with our galaxy in about 4.5 billion years. This event is likely to create a new galaxy, sometimes referred to as Milkomeda or Milkdromeda, as the two galaxies merge.",
        "question": "Which galaxy is expected to collide with the Milky Way?",
        "response": "The Andromeda Galaxy.",
        "technical_terms": ["Andromeda"]
    },
    {
        "context": "Exoplanets, or extrasolar planets, are planets that orbit a star other than our Sun. The first confirmed detection of an exoplanet occurred in 1992. Since then, thousands of exoplanets have been discovered, varying widely in size, composition, and orbital properties.",
        "question": "What are exoplanets?",
        "response": "Planets that orbit a star other than our Sun.",
        "technical_terms": ["exoplanets"]
    },
    {
        "context": "Dark matter, which makes up about 27% of the universe, does not emit, absorb, or reflect light, making it invisible and detectable only through its gravitational effects. It is a major component in the current model of cosmology, helping to explain the structure and formation of galaxies.",
        "question": "What is dark matter?",
        "response": "A type of matter that does not emit, absorb, or reflect light, detectable only through its gravitational effects.",
        "technical_terms": ["dark matter", "gravitational"]
    },
    {
        "context": "The James Webb Space Telescope, launched in 2021, is often seen as the successor to the Hubble Space Telescope. Designed to observe the universe in infrared, it allows astronomers to look further back in time, examining the formation of the first galaxies and stars.",
        "question": "What is the James Webb Space Telescope designed to observe?",
        "response": "The universe in infrared.",
        "technical_terms": ["infrared"]
    },
    {
        "context": "Photosynthesis is the process by which green plants, algae, and some bacteria use sunlight to synthesize nutrients from carbon dioxide and water. This process not only fuels plant growth but also sustains life on Earth by releasing oxygen into the atmosphere.",
        "question": "What process do green plants use to convert sunlight into nutrients?",
        "response": "Photosynthesis.",
        "technical_terms": ["photosynthesis"]
    },
    {
        "context": "The Great Barrier Reef, located in the Coral Sea off the coast of Australia, is the world's largest coral reef system. It supports a wide range of biodiversity, serving as a habitat for thousands of marine species, and is a crucial indicator of marine health.",
        "question": "What is the Great Barrier Reef known for?",
        "response": "Being the world's largest coral reef system.",
        "technical_terms": ["largest"]
    },
    {
        "context": "Mycorrhizae refer to the symbiotic relationships between fungal networks and plant roots. These relationships enhance nutrient absorption for the plants, particularly phosphorus, and in return, the fungi receive carbohydrates produced by the plants through photosynthesis.",
        "question": "What do mycorrhizae primarily help plants absorb?",
        "response": "Nutrients, particularly phosphorus.",
        "technical_terms": ["nutrients", "phosphorus"]
    },
    {
        "context": "Bamboo is one of the fastest-growing plants on Earth, with some species capable of growing up to 35 inches in a single day. Bamboo is a critical element in the balance of oxygen and carbon dioxide in the atmosphere and is used extensively in building materials, clothing, and as a food source.",
        "question": "What is bamboo known for?",
        "response": "Being one of the fastest-growing plants on Earth.",
        "technical_terms": ["fastest", "growing"]
    },
    {
        "context": "The 'Fynbos' biome, located in the Western Cape of South Africa, is known for its extreme biodiversity, particularly among plant species. Many of the plants found in this region are adapted to fire and drought, showcasing a fascinating example of ecological adaptation.",
        "question": "What is the 'Fynbos' biome known for?",
        "response": "Its extreme biodiversity, particularly among plant species.",
        "technical_terms": ["plant", "biodiversity"]
    },
    {
        "context": "Allelopathy refers to a biological phenomenon where plants release biochemicals that influence the growth, survival, and reproduction of other plants. This can be beneficial or detrimental, depending on the chemicals involved and the species affected.",
        "question": "What does allelopathy involve?",
        "response": "Plants releasing biochemicals that affect other plants.",
        "technical_terms": ["plant", "biochemicals"]
    },
    {
        "context": "The Redwood forests in California are home to some of the oldest and tallest trees in the world. These trees can live for over 2000 years and reach heights of more than 300 feet. The coastal fog provides them with the necessary moisture to thrive.",
        "question": "What are Redwood trees known for?",
        "response": "Being some of the oldest and tallest trees in the world.",
        "technical_terms": ["oldest", "tallest"]
    },
    {
        "context": "Carnivorous plants, such as the Venus flytrap, have evolved to trap and digest insects and other small animals. This adaptation allows them to thrive in environments where the soil is poor in nutrients, particularly nitrogen.",
        "question": "What is the Venus flytrap known for?",
        "response": "Trapping and digesting insects.",
        "technical_terms": ["digesting", "insects"]
    },
    {
        "context": "Ginkgo Biloba, known for its unique fan-shaped leaves, is considered a living fossil because it has no close living relatives and its form has remained largely unchanged for millions of years. These trees can be extremely long-lived, with some specimens in China being over 2,500 years old.",
        "question": "What is Ginkgo Biloba known for?",
        "response": "Its unique fan-shaped leaves and being a living fossil.",
        "technical_terms": ["fossil"]
    },
    {
        "context": "Gold, known for its rarity and high economic value, has been used as a currency and in jewelry for thousands of years. It is distinguished by its bright yellow color and its resistance to tarnish, thanks to its inertness to most chemical reactions.",
        "question": "What is gold known for?",
        "response": "Rarity, bright yellow color, resistance to tarnish.",
        "technical_terms": ["rarity", "yellow", "resistance", "tarnish"]
    },
    {
        "context": "Silver is a precious metal with the highest electrical conductivity of any element and the highest thermal conductivity of any metal. Historically, it has been used for coinage and luxury artifacts but is now also critical in various industrial applications, including electronics and solar panels.",
        "question": "What is silver known for?",
        "response": "High electrical conductivity.",
        "technical_terms": ["electrical", "conductivity"]
    },
    {
        "context": "Helium, a noble gas, is not only vital for floating balloons but also plays a critical role in cryogenic applications due to its extremely low boiling point. It is used in the cooling of superconducting magnets, such as those in MRI machines.",
        "question": "What is helium used for in medical technology?",
        "response": "Cooling MRI machines.",
        "technical_terms": ["cooling"]
    },
    {
        "context": "Iron is the most commonly used metal in the world and forms the basis of steel when combined with carbon. It is essential not only in construction and manufacturing but also biologically, as it is a key component of hemoglobin in blood.",
        "question": "What is iron a key component of?",
        "response": "Hemoglobin.",
        "technical_terms": ["hemoglobin"]
    },
    {
        "context": "Copper is highly prized for its excellent conductivity of heat and electricity, which makes it indispensable in electrical wiring and plumbing systems. Besides its practical uses, copper also plays a significant role in the synthesis of ATP, an energy carrier in cells.",
        "question": "What is copper widely used for?",
        "response": "Electrical wiring.",
        "technical_terms": ["wiring"]
    },
    {
        "context": "Aluminum, known for its light weight and strength, is used extensively in the aerospace industry and for packaging materials such as cans and foils. It is the most abundant metal in the Earth's crust and is almost always found combined in over 270 different minerals.",
        "question": "What is aluminum known for?",
        "response": "Light weight, strength.",
        "technical_terms": ["weight", "strength"]
    },
    {
        "context": "Platinum is a rare metal with significant resistance to corrosion and high temperatures, making it valuable not only for jewelry but also in automotive catalytic converters and equipment used in high-temperature environments.",
        "question": "What is platinum used for in the automotive industry?",
        "response": "Catalytic converters.",
        "technical_terms": ["converters"]
    },
    {
        "context": "Argon is a noble gas that makes up a small percentage of the Earth's atmosphere. It is primarily used as an inert shielding gas in welding and other high-temperature industrial processes where ordinary air would be too reactive.",
        "question": "What is argon used for?",
        "response": "Shielding gas in welding.",
        "technical_terms": ["shielding", "welding"]
    },
    {
        "context": "Zinc is crucial for human health as it is necessary for the function of over 300 enzymes in the body. Beyond its biological importance, zinc is widely used to galvanize steel and iron to prevent rusting.",
        "question": "What is zinc used for industrially?",
        "response": "Galvanizing steel and iron.",
        "technical_terms": ["galvanizing"]
    },
    {
        "context": "Radon is a radioactive noble gas that is colorless, odorless, and tasteless. It poses significant health risks as it can accumulate in homes, particularly in basements, and is a leading cause of lung cancer among non-smokers.",
        "question": "What health risk is associated with radon?",
        "response": "Cause of lung cancer among non-smokers.",
        "technical_terms": ["cancer","lung"]
    }
]

In [6]:
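import torch

# Load the embedding tokenizer and model once. Dedicated names are used because
# later cells rebind the generic names `tokenizer` and `model`.
embedding_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
embedding_model = BertModel.from_pretrained('bert-base-uncased')

def get_embedding(sentence):
    # Tokenize the sentence and run BERT without tracking gradients
    inputs = embedding_tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = embedding_model(**inputs)
    # Use the final hidden state of the [CLS] token (position 0) as the sentence embedding
    embeddings = outputs.last_hidden_state[:, 0, :].squeeze()
    return embeddings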


# Similarity

In [7]:
import torch
from scipy.spatial.distance import cosine

def calculate_similarity(sentence1, sentence2):
    # Get the embedding of each sentence (get_embedding handles the tokenizer and model)
    embedding1 = get_embedding(sentence1)
    embedding2 = get_embedding(sentence2)

    # Make sure the embeddings are 1-D NumPy vectors
    embedding1 = embedding1.squeeze().numpy()
    embedding2 = embedding2.squeeze().numpy()

    # Cosine similarity = 1 - cosine distance
    similarity = 1 - cosine(embedding1, embedding2)
    return similarity

# Example usage
# Note: an empty string still produces a [CLS] embedding, so the similarity is not 0
sentence1 = "This is a sentence."
sentence2 = ""
similarity = calculate_similarity(sentence1, sentence2)
print(f"Similarity: {round(similarity, 2)}")

Similarity: 0.39
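
The value returned by `calculate_similarity` is the cosine of the angle between the two embedding vectors, since SciPy's `cosine` returns the cosine distance. As a minimal, dependency-free sketch of that quantity (the function name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the notebook):

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the Euclidean norms
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for sentence embeddings
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])                # 0
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])                 # -1

print(same_direction, orthogonal, opposite)
```

The score depends only on direction, not on vector length, which is why two unrelated sentences can still score well above 0 when their embeddings point in broadly similar directions.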


# Accuracy

In [8]:
!pip install transformers torch -q

In [9]:
import torch
from scipy.spatial.distance import cosine

def calculate_accuracy(sentence1, sentence2):
    # Get the embedding of each sentence as a 1-D NumPy vector
    embedding1 = get_embedding(sentence1).numpy()
    embedding2 = get_embedding(sentence2).numpy()

    # Cosine similarity (1 - cosine distance)
    accuracy = 1 - cosine(embedding1, embedding2)
    # int() truncates toward zero: the result is 1 only when the similarity
    # reaches 1.0 (essentially identical sentences) and 0 otherwise
    return int(accuracy)

# Example usage
sentence1 = "The weather is sunny."
sentence2 = ""
accuracy = calculate_accuracy(sentence1, sentence2)
print(f"Accuracy: {accuracy}")

Accuracy: 0
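
Because `calculate_accuracy` converts the similarity with `int()`, any value below 1.0 is truncated to 0. The sketch below illustrates this on a few made-up similarity values and contrasts it with a thresholded variant; `thresholded_accuracy` and its `threshold=0.9` default are hypothetical and are not used anywhere in this notebook.

```python
def truncated_accuracy(similarity):
    # Same conversion as calculate_accuracy: truncation toward zero
    return int(similarity)

def thresholded_accuracy(similarity, threshold=0.9):
    # Hypothetical alternative: count as correct above an arbitrary threshold
    return int(similarity >= threshold)

# Made-up similarity values for illustration
scores = [0.39, 0.95, 0.999999, 1.0]
truncated = [truncated_accuracy(s) for s in scores]
thresholded = [thresholded_accuracy(s) for s in scores]

print(truncated)
print(thresholded)
```

With truncation, the metric behaves like an exact-match indicator on the embeddings, which helps explain why the mean accuracy stays near 0 in the evaluations further down.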


# Exact word presence

In [10]:
def check_technical_words(sentence, technical_words):
    # Lowercase the sentence for a case-insensitive comparison
    sentence = sentence.lower()
    # Split on whitespace; punctuation stays attached to the words
    words_in_sentence = set(sentence.split())

    # Result: maps each term to its presence (1 if present, -1 if absent)
    results = {}

    # Check each technical term against the set of words.
    # A multi-word term can never match a single whitespace-delimited word.
    for word in technical_words:
        if word.lower() in words_in_sentence:
            results[word] = 1
        else:
            results[word] = -1

    return results

# Example usage
sentence = "La technologie blockchain et l'intelligence artificielle révolutionnent le monde."
technical_words = ["blockchain", "python", "intelligence artificielle", "réseau"]

presence = check_technical_words(sentence, technical_words)
print(presence)

{'blockchain': 1, 'python': -1, 'intelligence artificielle': -1, 'réseau': -1}
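
In the output above, "intelligence artificielle" is reported as absent even though it occurs in the sentence, because the sentence is split into single words. One possible way around this is a whole-word regular-expression search, sketched below. The function name `check_technical_terms_regex` is hypothetical, and this variant is not the one used in the evaluations that follow.

```python
import re

def check_technical_terms_regex(sentence, technical_terms):
    """Hypothetical variant: 1 if the term occurs as whole word(s), else -1."""
    lowered = sentence.lower()
    results = {}
    for term in technical_terms:
        # \b anchors the match on word boundaries; re.escape protects special characters
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        results[term] = 1 if re.search(pattern, lowered) else -1
    return results

sentence = "La technologie blockchain et l'intelligence artificielle révolutionnent le monde."
technical_words = ["blockchain", "python", "intelligence artificielle", "réseau"]

presence = check_technical_terms_regex(sentence, technical_words)
print(presence)
```

This also tolerates punctuation attached to a word, at the cost of a regular-expression search per term.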


# ROUGE and BLEU

## ROUGE

To see how ROUGE scores are computed for a pair of sentences, first look at what each sentence contains and how the two overlap.

**Reference text**: "Photosynthesis transforms carbon dioxide and water into glucose and oxygen."  
**Candidate text**: "glucose and oxygen."

### ROUGE-1
ROUGE-1 measures unigram (single-word) overlap.

**Reference unigrams**: "photosynthesis", "transforms", "carbon", "dioxide", "and", "water", "into", "glucose", "and", "oxygen" (10 tokens, since "and" occurs twice)  
**Candidate unigrams**: "glucose", "and", "oxygen"

The words "glucose", "and", and "oxygen" appear in both texts. The candidate contains "and" only once, so it is matched once. This gives 3 matching tokens, out of 10 tokens in the reference and 3 in the candidate.

- **Precision** (matching tokens divided by the number of candidate tokens): \( \frac{3}{3} = 1.0 \)
- **Recall** (matching tokens divided by the number of reference tokens): \( \frac{3}{10} = 0.3 \)

The F1 score, the harmonic mean of precision and recall, is computed as:
\[ F1 = 2 \times \left(\frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\right) = 2 \times \left(\frac{1.0 \times 0.3}{1.0 + 0.3}\right) \approx 0.46 \]
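
The unigram calculation can be reproduced with a few lines of plain Python. This is a simplified sketch, not the `rouge_score` implementation: it lowercases, drops periods, and splits on whitespace, and the function name `rouge_1` is an assumption made for illustration. It counts repeated words, so the reference contributes 10 tokens ("and" appears twice).

```python
from collections import Counter

def rouge_1(reference, candidate):
    # Simplified tokenization: lowercase, drop periods, split on whitespace
    ref_counts = Counter(reference.lower().replace(".", "").split())
    cand_counts = Counter(candidate.lower().replace(".", "").split())

    # Clipped overlap: each word counts at most as often as it appears in both texts
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

reference = "Photosynthesis transforms carbon dioxide and water into glucose and oxygen."
candidate = "glucose and oxygen."

precision, recall, f1 = rouge_1(reference, candidate)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```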

### ROUGE-2
ROUGE-2 looks at bigrams (pairs of consecutive words).

**Reference bigrams**: "photosynthesis transforms", "transforms carbon", "carbon dioxide", "dioxide and", "and water", "water into", "into glucose", "glucose and", "and oxygen"  
**Candidate bigrams**: "glucose and", "and oxygen"

The only bigrams in common are "glucose and" and "and oxygen", which gives 2 matches.

- **Precision**: \( \frac{2}{2} = 1.0 \)
- **Recall**: \( \frac{2}{9} = 0.222 \)

The ROUGE-2 F1 score follows the same formula as for ROUGE-1, with the lower recall: \( 2 \times \frac{1.0 \times 0.222}{1.0 + 0.222} \approx 0.36 \).
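
The bigram counts can be checked the same way. The sketch below builds bigrams by pairing each token with its successor; the helper name `bigrams` and the simplified tokenization (lowercasing, dropping periods, splitting on whitespace) are assumptions for illustration and do not mirror any particular library.

```python
from collections import Counter

def bigrams(text):
    # Simplified tokenization, then pair each token with the next one
    tokens = text.lower().replace(".", "").split()
    return Counter(zip(tokens, tokens[1:]))

reference = "Photosynthesis transforms carbon dioxide and water into glucose and oxygen."
candidate = "glucose and oxygen."

ref_bigrams = bigrams(reference)
cand_bigrams = bigrams(candidate)

# Clipped number of bigrams shared by both texts
matches = sum((ref_bigrams & cand_bigrams).values())
precision = matches / sum(cand_bigrams.values())
recall = matches / sum(ref_bigrams.values())
f1 = 2 * precision * recall / (precision + recall)

print(f"matches={matches} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```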

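### ROUGE-L
ROUGE-L is based on the longest common subsequence (LCS), which here is "glucose and oxygen" (length 3).

- **Precision**: \( \frac{3}{3} = 1.0 \)
- **Recall**: \( \frac{3}{10} = 0.3 \) (the reference has 10 tokens, since "and" occurs twice)

The ROUGE-L F1 score is therefore \( 2 \times \frac{1.0 \times 0.3}{1.0 + 0.3} \approx 0.46 \). Because the candidate is a contiguous span of the reference, the LCS length equals the number of matching unigrams in this example.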
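
The longest common subsequence can be computed with the classic dynamic-programming table. The following is a sketch under the same simplified tokenization as the hand calculation (lowercase, periods dropped, whitespace split); `lcs_length` and `tokenize` are illustrative names, not library functions.

```python
def tokenize(text):
    # Simplified tokenization: lowercase, drop periods, split on whitespace
    return text.lower().replace(".", "").split()

def lcs_length(a, b):
    # table[i][j] = LCS length of the first i tokens of a and the first j tokens of b
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

ref_tokens = tokenize("Photosynthesis transforms carbon dioxide and water into glucose and oxygen.")
cand_tokens = tokenize("glucose and oxygen.")

lcs = lcs_length(ref_tokens, cand_tokens)
precision = lcs / len(cand_tokens)
recall = lcs / len(ref_tokens)
f1 = 2 * precision * recall / (precision + recall)

print(f"lcs={lcs} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Unlike n-gram matching, the subsequence does not need to be contiguous, so ROUGE-L still rewards a candidate that keeps the reference's word order while skipping words.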

### Interpreting the reported scores
The reported scores differ from the values computed by hand here. This can come from details of the ROUGE implementation in use (tokenization, stemming, aggregation), or from a candidate text that is not exactly the one above: with a precision of 1.0, an F1 of 0.5714 corresponds to a recall of 0.4, and an F1 of 0.5 to a recall of about 0.333. The basic principles of the calculation remain the same:
- **rouge1**: 0.5714 (ROUGE-1 F1 score)
- **rouge2**: 0.5 (ROUGE-2 F1 score)
- **rougeL and rougeLsum**: 0.5714 (ROUGE-L F1 score)

Each metric gives a view of the similarity between the candidate text and the reference text, with a different emphasis on the type of match (single words, word pairs, or longer sequences).

## BLEU

To see how the BLEU score is computed for these two sentences, first look at what each sentence contains and how the two overlap.

**Reference text**: "Photosynthesis transforms carbon dioxide and water into glucose and oxygen."  
**Candidate text**: "glucose and oxygen."

BLEU evaluates the n-gram matches between the reference text and the candidate text, and also accounts for how short the candidate is relative to the reference. It combines the n-gram precisions of several orders (typically 1 to 4 words) with a penalty for candidates that are too short.

**Computing BLEU**:
- **N-grams**: For this simple text, we look at unigrams and bigrams.

**Reference unigrams**: "photosynthesis", "transforms", "carbon", "dioxide", "and", "water", "into", "glucose", "and", "oxygen"  
**Candidate unigrams**: "glucose", "and", "oxygen"

**Reference bigrams**: "photosynthesis transforms", "transforms carbon", "carbon dioxide", "dioxide and", "and water", "water into", "into glucose", "glucose and", "and oxygen"  
**Candidate bigrams**: "glucose and", "and oxygen"

- **Unigram precision**: \( \frac{3}{3} = 1.0 \) (every word of the candidate appears in the reference)
- **Bigram precision**: \( \frac{2}{2} = 1.0 \) (every bigram of the candidate appears in the reference)

- **Brevity penalty**: When the candidate (length \( c \)) is shorter than the reference (length \( r \)), a brevity penalty \( BP = e^{1 - r/c} \) is applied so that short answers, which naturally reach higher precisions, are not unduly favored. Here \( c = 3 \) and \( r = 10 \), so \( BP = e^{1 - 10/3} \approx 0.097 \).

The final BLEU score is the brevity penalty multiplied by the geometric mean of the n-gram precisions \( p_n \), computed as the exponential of the weighted sum of their logarithms:
\[ BLEU = BP \times \exp\left(\sum_{n} w_n \log p_n\right) \]

### Interpretation
BLEU gives high n-gram precisions for this candidate, but the brevity penalty greatly reduces the final score because the candidate is much shorter than the reference. The score is useful for assessing how faithful generated content is to a reference, but it is sensitive to text length and favors outputs that are literally close to the reference text.
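
To make the combination of n-gram precisions and brevity penalty concrete, here is a minimal single-reference sketch. It is not the implementation behind `evaluate.load('bleu')`: the function name `simple_bleu`, the uniform weights, the simplified tokenization, and the choice to return 0 as soon as one order has no match are all assumptions of this sketch (implementations differ on that last point, for example through smoothing).

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    # Count every run of n consecutive tokens
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(reference, candidate, max_order):
    ref_tokens = reference.lower().replace(".", "").split()
    cand_tokens = candidate.lower().replace(".", "").split()

    log_precision_sum = 0.0
    for n in range(1, max_order + 1):
        cand_counts = ngram_counts(cand_tokens, n)
        ref_counts = ngram_counts(ref_tokens, n)
        total = sum(cand_counts.values())
        matches = sum((cand_counts & ref_counts).values())  # clipped matches
        if total == 0 or matches == 0:
            return 0.0  # sketch convention: no smoothing
        log_precision_sum += math.log(matches / total) / max_order  # uniform weights

    c, r = len(cand_tokens), len(ref_tokens)
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)

reference = "Photosynthesis transforms carbon dioxide and water into glucose and oxygen."
candidate = "glucose and oxygen."

bleu_2 = simple_bleu(reference, candidate, max_order=2)
bleu_4 = simple_bleu(reference, candidate, max_order=4)
print(f"BLEU up to bigrams: {bleu_2:.3f}")
print(f"BLEU up to 4-grams: {bleu_4:.3f}")
```

With unigrams and bigrams only, both precisions are 1.0 and the score equals the brevity penalty. With orders up to 4, the three-word candidate contains no 4-gram at all, so this sketch returns 0, which is one reason very short answers tend to get BLEU scores at or near zero.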

In [11]:
import torch
from transformers import BertTokenizer, BertForSequenceClassification
import evaluate

# Load the tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# Function to generate predictions (assuming input text)
def predict(input_text):
    inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    outputs = model(**inputs)
    return torch.argmax(outputs.logits, dim=1)

# Function to evaluate answers
def evaluate_answers(predicted_answers, true_answers):

    # Assuming each true answer list contains only one reference for simplicity
    references = [[answer] for answer in true_answers]
    candidates = predicted_answers

    # Load BLEU and ROUGE metrics
    bleu = evaluate.load('bleu')
    rouge = evaluate.load('rouge')

    # Compute BLEU score
    bleu_score = bleu.compute(predictions=candidates, references=references)
    bleu_result = bleu_score['bleu']

    # Compute ROUGE scores
    rouge_score = rouge.compute(predictions=candidates, references=[' '.join(ref) for ref in references])
    rouge_result = rouge_score

    return bleu_result, rouge_result

# Example usage
predicted_answers = ["the cat is on the mat", "the cat on mat"]
true_answers = ["the cat is on the mat", "there is a cat on the mat"]

# Evaluating the answers
bleu_result, rouge_result = evaluate_answers(predicted_answers, true_answers)
print(f"BLEU score: {bleu_result}")
print(f"ROUGE scores: {rouge_result}")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



BLEU score: 0.5797215869131432
ROUGE scores: {'rouge1': 0.8636363636363636, 'rouge2': 0.6111111111111112, 'rougeL': 0.7727272727272727, 'rougeLsum': 0.7727272727272727}


# Model evaluation

In [25]:
import pandas as pd
from tqdm import tqdm  # progress bar
from transformers import pipeline  # simplifies running predictions
from concurrent.futures import ThreadPoolExecutor
import string

def evaluate_qa_model(model, tokenizer, dataset):
    # Build the question-answering pipeline
    qa_pipeline = pipeline('question-answering', model=model, tokenizer=tokenizer)

    # Process a single dataset element (called in parallel below)
    def process_element(element):
        question = element["question"]
        context = element["context"]
        termes_techniques = element["technical_terms"]

        # Use the pipeline to get the predicted answer
        output = qa_pipeline({
            'question': question,
            'context': context
        })

        # Lowercase both answers and strip ASCII punctuation before scoring
        remove_punct_table = str.maketrans('', '', string.punctuation)

        predicted_answer = output['answer'].lower().translate(remove_punct_table)
        true_answer = element["response"].lower().translate(remove_punct_table)

        bleu_result, rouge_result = evaluate_answers([predicted_answer], [true_answer])

        return {
            "Question": question,
            "True Answer": true_answer,
            "Predicted Answer": predicted_answer,
            "Similarity": calculate_similarity(true_answer, predicted_answer),
            "Accuracy": calculate_accuracy(true_answer, predicted_answer),
            "Précision": sum(check_technical_words(predicted_answer, termes_techniques).values()),  # sum of the per-term scores (1 present, -1 absent)
            "BLEU": bleu_result,
            "ROUGE": sum(rouge_result.values()) / len(rouge_result),  # mean of the ROUGE scores
        }

    # Use a ThreadPoolExecutor to process the elements in parallel
    # (at least one worker, even for datasets with fewer than 3 elements)
    with ThreadPoolExecutor(max_workers=max(1, len(dataset) // 3)) as executor:
        results = list(tqdm(executor.map(process_element, dataset), total=len(dataset), desc="Évaluation du modèle"))

    # Convert the list of results to a DataFrame
    results_df = pd.DataFrame(results)

    # Compute and display the mean of Similarity, Accuracy, Précision, BLEU, and ROUGE
    mean_df = pd.DataFrame({
        "Moyenne Similarity": [results_df["Similarity"].mean()],
        "Moyenne Accuracy": [results_df["Accuracy"].mean()],
        "Moyenne Précision": [results_df["Précision"].mean()],
        "Moyenne BLEU": [results_df["BLEU"].mean()],
        "Moyenne ROUGE": [results_df["ROUGE"].mean()]
    })
    print("\n")
    display(mean_df)
    print("\n")
    print("\n")

    # Return the per-question DataFrame
    return results_df

In [23]:
#=========================================================================
# VARIANT OF THE FUNCTION FOR BERT, WHICH RAISED A DIVISION-BY-ZERO ERROR
#=========================================================================
"""
def evaluate_qa_model(model, tokenizer, dataset):
    qa_pipeline = pipeline('question-answering', model=model, tokenizer=tokenizer)
    remove_punct_table = str.maketrans('', '', string.punctuation)

    def process_element(element):
        question = element["question"]
        context = element["context"]
        predicted_answer = qa_pipeline({'question': question, 'context': context})['answer']
        true_answer = element["response"]
        termes_techniques = element["technical_terms"]

        # Normalize and remove punctuation
        predicted_answer = predicted_answer.lower().translate(remove_punct_table).strip()
        true_answer = true_answer.lower().translate(remove_punct_table).strip()

        # Check for empty responses to avoid ZeroDivisionError in BLEU
        if not predicted_answer or not true_answer:
            bleu_result, rouge_result = 0, {"rouge-1": 0, "rouge-2": 0, "rouge-l": 0}
        else:
            bleu_result, rouge_result = evaluate_answers([predicted_answer], [true_answer])

        return {
            "Question": question,
            "True Answer": true_answer,
            "Predicted Answer": predicted_answer,
            "Similarity": calculate_similarity(true_answer, predicted_answer),
            "Accuracy": calculate_accuracy(true_answer, predicted_answer),
            "Précision": sum(check_technical_words(predicted_answer, termes_techniques).values()),  # sum of the per-term scores (1 present, -1 absent)
            "BLEU": bleu_result,
            "ROUGE": sum(rouge_result.values()) / len(rouge_result),  # mean of the ROUGE scores
        }

    # Use a ThreadPoolExecutor to process the elements in parallel
    with ThreadPoolExecutor(max_workers=int(len(dataset)//3)) as executor:
        results = list(tqdm(executor.map(process_element, dataset), total=len(dataset), desc="Évaluation du modèle"))

    # Convert the list of results to a DataFrame
    results_df = pd.DataFrame(results)

    # Compute and display the mean of Similarity, Accuracy, Précision, BLEU, and ROUGE
    mean_df = pd.DataFrame({
        "Moyenne Similarity": [results_df["Similarity"].mean()],
        "Moyenne Accuracy": [results_df["Accuracy"].mean()],
        "Moyenne Précision": [results_df["Précision"].mean()],
        "Moyenne BLEU": [results_df["BLEU"].mean()],
        "Moyenne ROUGE": [results_df["ROUGE"].mean()]
    })
    print("\n")
    display(mean_df)
    print("\n")
    print("\n")

    return pd.DataFrame(results)
"""

In [13]:
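# List of models to evaluate, with additional details.
# The active checkpoints are base models, so their QA head is newly initialized
# (see the warnings in the outputs below). The commented-out ALOQAS checkpoints
# are, judging by their names, variants fine-tuned on SQuAD v2.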
models_details = [
    {
        "name": "SqueezeBert",
        #"checkpoint": "ALOQAS/squeezebert-uncased-finetuned-squad-v2",
        "checkpoint": "squeezebert/squeezebert-uncased",
    },
    {
        "name": "BERT Large",
        #"checkpoint": "ALOQAS/bert-large-uncased-finetuned-squad-v2",
        "checkpoint": "google-bert/bert-large-uncased",
    },
    {
        "name": "DeBERTa Large",
        #"checkpoint": "ALOQAS/deberta-large-finetuned-squad-v2",
        "checkpoint": "microsoft/deberta-large",
    }
]

# Load a question-answering model and its tokenizer from a checkpoint name
def load_model_and_tokenizer(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)
    return model, tokenizer

## Evaluation SqueezeBert

In [14]:
model_to_use = models_details[0]

model_testing = model_to_use["checkpoint"]
tokenizer_testing = model_to_use["checkpoint"]

model_testing, tokenizer_testing = load_model_and_tokenizer(model_testing)

results_df = evaluate_qa_model(model_testing, tokenizer_testing, dataset)
display(results_df)


Some weights of SqueezeBertForQuestionAnswering were not initialized from the model checkpoint at squeezebert/squeezebert-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Évaluation du modèle: 100%|██████████| 75/75 [03:47<00:00,  3.03s/it]






| | Moyenne Similarity | Moyenne Accuracy | Moyenne Précision | Moyenne BLEU | Moyenne ROUGE |
|---|---|---|---|---|---|
| 0 | 0.78207 | 0.0 | -1.186667 | 0.015874 | 0.082172 |

| | Question | True Answer | Predicted Answer | Similarity | Accuracy | Précision | BLEU | ROUGE |
|---|---|---|---|---|---|---|---|---|
| 0 | What sound do wolves make? | howl | echoes through the | 0.954009 | 0 | -1 | 0.0 | 0.0 |
| 1 | What sound do roosters make? | crow | crowing of the rooster at dawn is perhaps as r... | 0.696342 | 0 | -1 | 0.0 | 0.0 |
| 2 | What sound do parrots make? | squawk | squawks equally vocal are | 0.820938 | 0 | -1 | 0.0 | 0.0 |
| 3 | What sound do elephants make? | trumpet | overshadow the | 0.519546 | 0 | -1 | 0.0 | 0.0 |
| 4 | What sound do lions make? | roar | of hyenas yet amidst these powerful sounds the... | 0.859524 | 0 | -1 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70 | What is aluminum known for? | light weight strength | always found | 0.839490 | 0 | -2 | 0.0 | 0.0 |
| 71 | What is platinum used for in the automotive in... | catalytic converters | for jewelry but also in | 0.730755 | 0 | -1 | 0.0 | 0.0 |
| 72 | What is argon used for? | shielding gas in welding | makes up | 0.626000 | 0 | -2 | 0.0 | 0.0 |
| 73 | What is zinc used for industrially? | galvanizing steel and iron | galvanize | 0.708533 | 0 | -1 | 0.0 | 0.0 |


## Evaluation BERT

In [24]:
model_to_use = models_details[1]

model_testing = model_to_use["checkpoint"]
tokenizer_testing = model_to_use["checkpoint"]

model_testing, tokenizer_testing = load_model_and_tokenizer(model_testing)

results_df = evaluate_qa_model(model_testing, tokenizer_testing, dataset)
display(results_df)

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at google-bert/bert-large-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Évaluation du modèle: 100%|██████████| 75/75 [04:11<00:00,  3.35s/it]






| | Moyenne Similarity | Moyenne Accuracy | Moyenne Précision | Moyenne BLEU | Moyenne ROUGE |
|---|---|---|---|---|---|
| 0 | 0.781787 | 0.013333 | -1.346667 | 0.017603 | 0.095329 |

| | Question | True Answer | Predicted Answer | Similarity | Accuracy | Précision | BLEU | ROUGE |
|---|---|---|---|---|---|---|---|---|
| 0 | What sound do wolves make? | howl | chirping of chipmunks which are also common so... | 0.747249 | 0 | -1 | 0.0 | 0.0 |
| 1 | What sound do roosters make? | crow | break | 0.979513 | 0 | -1 | 0.0 | 0.0 |
| 2 | What sound do parrots make? | squawk | most notable are the parrots known for their s... | 0.783784 | 0 | -1 | 0.0 | 0.0 |
| 3 | What sound do elephants make? | trumpet | to | 0.543963 | 0 | -1 | 0.0 | 0.0 |
| 4 | What sound do lions make? | roar | soft background | 0.862073 | 0 | -1 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70 | What is aluminum known for? | light weight strength | and strength is used extensively in the aerosp... | 0.821638 | 0 | 0 | 0.0 | 0.1 |
| 71 | What is platinum used for in the automotive in... | catalytic converters | with significant resistance to corrosion and h... | 0.758914 | 0 | -1 | 0.0 | 0.0 |
| 72 | What is argon used for? | shielding gas in welding | other hightemperature industrial processes whe... | 0.855593 | 0 | -2 | 0.0 | 0.0 |
| 73 | What is zinc used for industrially? | galvanizing steel and iron | rusting | 0.758593 | 0 | -1 | 0.0 | 0.0 |


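Known issue: computing BLEU can raise a division-by-zero error, for example when an answer is empty after normalization. See https://github.com/nltk/nltk/pull/2839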

## Evaluation DeBERTa

In [26]:
model_to_use = models_details[2]

model_testing = model_to_use["checkpoint"]
tokenizer_testing = model_to_use["checkpoint"]

model_testing, tokenizer_testing = load_model_and_tokenizer(model_testing)

results_df = evaluate_qa_model(model_testing, tokenizer_testing, dataset)
display(results_df)


Some weights of DebertaForQuestionAnswering were not initialized from the model checkpoint at microsoft/deberta-large and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Évaluation du modèle: 100%|██████████| 75/75 [04:07<00:00,  3.31s/it]






| | Moyenne Similarity | Moyenne Accuracy | Moyenne Précision | Moyenne BLEU | Moyenne ROUGE |
|---|---|---|---|---|---|
| 0 | 0.756112 | 0.0 | -1.32 | 0.010627 | 0.05259 |

| | Question | True Answer | Predicted Answer | Similarity | Accuracy | Précision | BLEU | ROUGE |
|---|---|---|---|---|---|---|---|---|
| 0 | What sound do wolves make? | howl | of chipmunks which | 0.830679 | 0 | -1 | 0.0 | 0.00 |
| 1 | What sound do roosters make? | crow | rooster that | 0.973281 | 0 | -1 | 0.0 | 0.00 |
| 2 | What sound do parrots make? | squawk | be heard echoing for | 0.813992 | 0 | -1 | 0.0 | 0.00 |
| 3 | What sound do elephants make? | trumpet | of the | 0.476920 | 0 | -1 | 0.0 | 0.00 |
| 4 | What sound do lions make? | roar | a stark contrast providing a soft background ... | 0.863493 | 0 | -1 | 0.0 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70 | What is aluminum known for? | light weight strength | aerospace industry and | 0.643675 | 0 | -2 | 0.0 | 0.00 |
| 71 | What is platinum used for in the automotive in... | catalytic converters | a rare metal with | 0.743159 | 0 | -1 | 0.0 | 0.00 |
| 72 | What is argon used for? | shielding gas in welding | temperature industrial processes | 0.886142 | 0 | -2 | 0.0 | 0.00 |
| 73 | What is zinc used for industrially? | galvanizing steel and iron | the body beyond its biological importance | 0.645258 | 0 | -1 | 0.0 | 0.00 |
