# Foundation Models

## What Is a Foundation Model

https://www.youtube.com/watch?v=u_CQggTmSO8

A foundation model is a large AI model that, after being trained on lots of diverse data, can be adapted to many different tasks. These models are highly versatile and provide a solid base for building various AI applications, much as a strong foundation can hold up many different kinds of buildings. Starting from a foundation model gives us a strong base for building specialized AI systems.

## Terms Explained:

**Foundation Model**: A large AI model trained on a wide variety of data, which can do many tasks without much extra training.

**Adapted**: Modified or adjusted to suit new conditions or a new purpose. In the context of foundation models, this means tailoring a general pre-trained model to a specific task, e.g. through fine-tuning.

**Generalize**: The ability of a model to apply what it has learned from its training data to new, unseen data.

## Foundation Models vs. Traditional Models

https://www.youtube.com/watch?v=VgmJs1yRqgg

Foundation models and traditional models are two distinct approaches in artificial intelligence, each with different strengths. Foundation models, which are trained on large, diverse datasets, can adapt to and perform well on many different tasks. Traditional models, in contrast, are trained on smaller, focused datasets for one specific task, which makes them simpler and more efficient for targeted applications.

![image.png](attachment:fcb3f39e-d420-4c3c-94a9-042f4b363ec6.png)

## Architecture and Scale

https://www.youtube.com/watch?v=laRUiGK0t34

The transformer architecture has revolutionized the way machines handle language by making it possible to train models on sequential data at scale: its self-attention mechanism processes all positions of a sequence in parallel rather than one step at a time. Thanks to this, today's AI models are massive, with some having billions of parameters (or more), allowing for incredible flexibility across many tasks. The technology is exciting and holds great promise for the future.

![image.png](attachment:6594d039-9430-47bd-a265-f7b097348d4a.png)
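To get a feel for where "billions of parameters" comes from, here is a rough back-of-the-envelope sketch. It assumes a standard transformer block with four `d_model × d_model` attention projections (queries, keys, values, output) and a feed-forward layer whose hidden width is `4 × d_model`, and it ignores biases and layer norms. The function name `approx_transformer_params` is hypothetical.

```python
def approx_transformer_params(n_layers: int, d_model: int, vocab_size: int = 0) -> int:
    """Rough parameter count for a stack of standard transformer blocks.

    Assumes 4 * d_model**2 weights for the attention projections (Q, K, V, output)
    and 8 * d_model**2 for a feed-forward layer with a 4x hidden width.
    Biases and layer-norm parameters are ignored.
    """
    attention = 4 * d_model * d_model           # Q, K, V and output projections
    feed_forward = 2 * d_model * (4 * d_model)  # up-projection and down-projection
    per_layer = attention + feed_forward        # = 12 * d_model**2
    embeddings = vocab_size * d_model           # token embedding table (optional)
    return n_layers * per_layer + embeddings


if __name__ == "__main__":
    # 96 layers with d_model = 12288 is the configuration reported for GPT-3 175B.
    total = approx_transformer_params(n_layers=96, d_model=12288)
    print(f"{total / 1e9:.1f} billion parameters")
```

This prints about 173.9 billion, close to the 175 billion reported for GPT-3; the published figure is rounded, and the sketch leaves out embeddings, biases, and layer norms. The takeaway is that parameter count grows with the square of the model width, which is why scaling up quickly reaches the billions.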

## Technical Terms:

**Sequential data**: Information that is arranged in a specific order, such as words in a sentence or events in time.

**Self-attention mechanism**: The self-attention mechanism in a transformer is a process where each element in a sequence computes its representation by attending to and weighing the importance of all elements in the sequence, allowing the model to capture complex relationships and dependencies.
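The self-attention computation described above can be sketched in plain Python as single-head scaled dot-product attention. This is a minimal illustration under stated assumptions, not a production implementation: the projection matrices `w_q`, `w_k`, `w_v` are passed in (in a real model they are learned), there is no masking or multi-head splitting, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over the rows of x.

    Returns (outputs, weights); weights[i][j] is how much position i attends to position j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    outputs, weights = [], []
    for q_i in q:
        # Compare this position's query with every position's key.
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        attn = softmax(scores)
        weights.append(attn)
        # New representation: attention-weighted mix of all value vectors.
        outputs.append([sum(attn[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d embeddings
    identity = [[1.0, 0.0], [0.0, 1.0]]            # stand-in for learned projections
    out, w = self_attention(tokens, identity, identity, identity)
    for row in w:
        print([round(p, 3) for p in row])
```

Each row of attention weights sums to 1, so every output vector is a weighted average of all value vectors. That is how each position's representation comes to depend on the whole sequence.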

## Why Benchmarks Matter

https://www.youtube.com/watch?v=pplqI2ATnbA

Benchmarks matter because they are the standards that help us measure and accelerate progress in AI. They offer a common ground for comparing different AI models and encouraging innovation, providing important stepping stones on the path to more advanced AI technologies.

## Technical Terms Explained:

**Robustness**: The ability of an AI model to maintain its performance despite challenges or changes in the data, such as noisy, unusual, or adversarial inputs.

**Open Access**: Making datasets freely available to the public so that anyone can use them to research and develop AI technologies.
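To make the idea of a benchmark as "common ground" concrete, here is a minimal sketch of a benchmark harness: every model is scored on the same fixed test set with the same metric, and a perturbed copy of the test set gives a crude robustness check. The toy test set, the two keyword "models", and the upper-casing perturbation are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]     # (input text, gold label)
Model = Callable[[str], str]  # maps input text to a predicted label


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples where the model's prediction matches the gold label."""
    correct = sum(1 for text, gold in examples if model(text) == gold)
    return correct / len(examples)


def run_benchmark(models: Dict[str, Model], test_set: List[Example],
                  perturb: Callable[[str], str]) -> Dict[str, Dict[str, float]]:
    """Score every model on the same clean and perturbed versions of the test set."""
    perturbed = [(perturb(text), gold) for text, gold in test_set]
    return {name: {"clean": accuracy(m, test_set), "perturbed": accuracy(m, perturbed)}
            for name, m in models.items()}


# Hypothetical toy sentiment data and keyword-based "models".
TEST_SET = [("a good film", "pos"), ("a dull film", "neg"),
            ("good acting", "pos"), ("bad plot", "neg")]
MODELS = {
    "lowercasing": lambda t: "pos" if "good" in t.lower() else "neg",
    "case_sensitive": lambda t: "pos" if "good" in t else "neg",
}

if __name__ == "__main__":
    for name, scores in run_benchmark(MODELS, TEST_SET, str.upper).items():
        print(name, scores)
```

Both toy models score 1.0 on the clean data; only the perturbed copy reveals that the case-sensitive one drops to 0.5 after a trivial change in casing. Fixing the data and the metric is what makes such comparisons fair.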

## The GLUE Benchmarks

https://www.youtube.com/watch?v=KhGvJAoCQNY

The GLUE (General Language Understanding Evaluation) benchmark is an essential tool for assessing an AI model's grasp of human language. It covers diverse tasks, from judging whether a sentence is grammatical to analyzing complex relationships between sentences. By putting AI models through these varied linguistic challenges, we can gauge their readiness for real-world tasks and uncover potential weaknesses.

## Technical Terms Explained:

**Semantic Equivalence**: When different phrases or sentences convey the same meaning or idea.

**Textual Entailment**: The relationship between two text fragments where the truth of one (the hypothesis) follows logically from the other (the premise).

## GLUE Tasks / Benchmarks:

| Short Name | Full Name | Description |
|---|---|---|
| CoLA | Corpus of Linguistic Acceptability | Measures the ability to determine whether an English sentence is grammatically acceptable. |
| SST-2 | Stanford Sentiment Treebank | Consists of sentences from movie reviews with human annotations of their sentiment; the task is to predict whether each sentence is positive or negative. |
| MRPC | Microsoft Research Paraphrase Corpus | Focuses on identifying whether two sentences are paraphrases of each other. |
| STS-B | Semantic Textual Similarity Benchmark | Involves scoring how similar two sentences are in meaning. |
| QQP | Quora Question Pairs | Aims to identify whether two questions asked on Quora are semantically equivalent. |
| MNLI | Multi-Genre Natural Language Inference | Consists of sentence pairs labeled for textual entailment (entailment, contradiction, or neutral) across multiple genres of text. |
| QNLI | Question Natural Language Inference | Involves determining whether a sentence from a paragraph contains the answer to a question. |
| RTE | Recognizing Textual Entailment | Requires deciding whether one sentence entails another. |
| WNLI | Winograd Natural Language Inference | Tests reading comprehension by having a system determine the correct referent of a pronoun in a sentence, where the answer depends on contextual clues provided by specific words or phrases. |
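GLUE summarizes a model's results on all of these tasks as a single number. Below is a minimal sketch, assuming the aggregation GLUE uses for its overall score: an unweighted average of task scores, where tasks reported with two metrics (for example F1 and accuracy for MRPC, or Pearson and Spearman correlation for STS-B) are first averaged into one task score. The function name and the example scores are hypothetical, not real leaderboard results.

```python
from statistics import mean
from typing import Dict, Sequence, Union

TaskScore = Union[float, Sequence[float]]


def glue_style_score(results: Dict[str, TaskScore]) -> float:
    """Unweighted average over tasks; tasks with several metrics are averaged first."""
    per_task = []
    for score in results.values():
        if isinstance(score, (int, float)):
            per_task.append(float(score))
        else:
            per_task.append(mean(score))  # e.g. (F1, accuracy) for MRPC
    return mean(per_task)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example = {"CoLA": 60.0, "MRPC": (88.0, 84.0), "STS-B": (90.0, 89.0)}
    print(round(glue_style_score(example), 2))
```

Because every task counts equally, a weak score on one small task (such as CoLA here) pulls the overall number down as much as a weak score on a large one.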


## The SuperGLUE Benchmarks

SuperGLUE is designed as a successor to the original GLUE benchmark and presents more challenging language understanding tasks for AI models. It was created to push the boundaries of what AI can understand and process in natural language after models began to match or exceed the human baseline on GLUE. It also features a public leaderboard, which enables direct comparison of models and tracking of progress over time.

## SuperGLUE Tasks / Benchmarks:

| Short Name | Full Name | Description |
|---|---|---|
| BoolQ | Boolean Questions | Involves answering a yes/no question based on a short passage. |
| CB | CommitmentBank | Tests understanding of entailment, contradiction, and neutrality as a three-class textual entailment task. |
| COPA | Choice of Plausible Alternatives | Measures causal reasoning: given a premise, choose which of two alternatives is the more plausible cause or effect. |
| MultiRC | Multi-Sentence Reading Comprehension | Involves answering questions about a paragraph where each question may have multiple correct answers. |
| ReCoRD | Reading Comprehension with Commonsense Reasoning Dataset | Requires selecting the correct named entity from a passage to fill in the blank of a question. |
| RTE | Recognizing Textual Entailment | Involves identifying whether one sentence entails another, framed as a two-class task (entailment vs. not entailment). |
| WiC | Words in Context | Tests word sense disambiguation: deciding whether a word is used with the same meaning in two different sentences. |
| WSC | Winograd Schema Challenge | Focuses on coreference resolution within a sentence, often requiring commonsense reasoning. |
| AX-b | Broad Coverage Diagnostic | A diagnostic set to evaluate model performance on a broad range of linguistic phenomena. |
| AX-g | Winogender Schema Diagnostics | Tests for the presence of gender bias in automated coreference resolution systems. |

## Technical Terms Explained:

**Coreference Resolution**: Figuring out when different words or phrases in a text, like the pronoun "she" and the phrase "the president", refer to the same person or thing.

## BoolQ Examples

Let's take a look at some examples from the BoolQ dataset. Here is a table from the paper "BoolQ: Exploring the surprising difficulty of natural yes/no questions." [1]

![image.png](attachment:006a1241-ba38-4b7f-beb1-9a2f7abb305b.png)

## Data Used for Training LLMs

https://www.youtube.com/watch?v=cmDL7IIzAPo

Generative AI models, specifically Large Language Models (LLMs), rely on a rich mosaic of data sources to learn language during pre-training. These sources include web content, academic writings, literary works, and multilingual texts, among others. By training on a variety of data types, such as scientific papers, social media posts, legal documents, and even conversational dialogues, LLMs become adept at comprehending and generating language across many contexts, enhancing their ability to provide relevant and accurate information.

## Explanation of Technical Terms:

**Preprocessing**: This is the process of preparing and cleaning data before it is used to train a machine learning model. It might involve removing errors, irrelevant information, or formatting the data in a way that the model can easily learn from it.

**Fine-tuning**: After a model has been pre-trained on a large dataset, fine-tuning is an additional training step where the model is further refined with specific data to improve its performance on a particular type of task.
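The preprocessing step defined above can be sketched as a small cleaning pipeline. This is a hypothetical, deliberately simple example: it strips HTML-like tags, normalizes whitespace, drops very short fragments, and removes exact duplicates by hashing. Real LLM data pipelines typically add further steps, such as language identification, quality filtering, and near-duplicate detection, which are not shown here.

```python
import hashlib
import re
from typing import Iterable, List


def preprocess(docs: Iterable[str], min_words: int = 5) -> List[str]:
    """Clean raw documents and drop short fragments and exact duplicates."""
    seen = set()
    cleaned = []
    for doc in docs:
        text = re.sub(r"<[^>]+>", " ", doc)       # strip HTML-like tags
        text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
        if len(text.split()) < min_words:
            continue  # too short to be useful training text
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate, ignoring case
        seen.add(digest)
        cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    raw = [
        "<p>The cat sat on the mat.</p>",
        "the cat   sat on the MAT.",
        "Click here",
        "A second, longer example document appears here.",
    ]
    for doc in preprocess(raw):
        print(doc)
```

Of the four raw inputs, only two survive: the second is a case-insensitive duplicate of the first, and "Click here" is dropped as too short.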

## Data Scale and Volume

https://www.youtube.com/watch?v=9ZRaCc3fu48

The scale of data used to train Large Language Models (LLMs) is vast, with datasets that could equate to millions of books. This sheer size is pivotal for the model's understanding and mastery of language, because it exposes the model to a wide diversity of words and structures.

## Explanation of Technical Terms:

**Gigabytes/Terabytes**: Units of digital information storage. One gigabyte (GB) is about 1 billion bytes, and one terabyte (TB) is about 1,000 gigabytes. In terms of text, a single gigabyte can hold roughly 1,000 books.

**Common Crawl**: An open repository of web crawl data. Essentially, it is a large collection of content from the internet that is gathered by automatically scraping the web.
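The gigabyte-to-books rule of thumb above turns into quick arithmetic. This sketch assumes about 1 MB of plain text per book (which is what "1 GB ≈ 1,000 books" implies) and about 4 bytes of English text per token; the token ratio is a common rough estimate that varies by tokenizer and language, and the function names are hypothetical.

```python
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12


def estimate_books(size_bytes: int, bytes_per_book: int = 10**6) -> int:
    """Approximate number of books' worth of plain text (assumes ~1 MB per book)."""
    return size_bytes // bytes_per_book


def estimate_tokens(size_bytes: int, bytes_per_token: int = 4) -> int:
    """Approximate token count (assumes ~4 bytes of English text per token)."""
    return size_bytes // bytes_per_token


if __name__ == "__main__":
    sizes = [("1 GB", BYTES_PER_GB), ("1 TB", BYTES_PER_TB), ("10 TB", 10 * BYTES_PER_TB)]
    for label, size in sizes:
        print(f"{label}: ~{estimate_books(size):,} books, ~{estimate_tokens(size):,} tokens")
```

Under these assumptions, a single terabyte of text already corresponds to about a million books, and a few terabytes reach the "millions of books" scale described above.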

## Biases in Training Data

Biases in training data deeply influence the outcomes of AI models, reflecting societal issues that require attention. Ways to approach this challenge include promoting diversity in development teams, seeking diverse data sources, and ensuring continued vigilance through bias detection and model monitoring.

## Technical Terms Explained:

**Selection Bias**: When the data used to train an AI model does not accurately represent the whole population or situation because of how the data was selected, e.g. those choosing the data tend to choose datasets they are aware of.

**Historical Bias**: Prejudices and societal inequalities of the past that are reflected in the data, influencing the AI in a way that perpetuates these outdated beliefs.

**Confirmation Bias**: The tendency to favor information that confirms pre-existing beliefs, which can affect what data is selected for AI training.

**Discriminatory Outcomes**: Unfair results produced by AI that disadvantage certain groups, often due to biases in the training data or malicious actors.

**Echo Chambers**: Situations where biased AI reinforces and amplifies existing biases, leading to a narrow and distorted sphere of information.

**Bias Detection and Correction**: Processes and algorithms designed to identify and remove biases from data before it's used to train AI models.

**Transparency and Accountability**: Openness about how AI models are trained and the nature of their data, ensuring that developers are answerable for their AI's performance and impact.
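The bias detection and correction idea can be illustrated with one of the simplest checks: compare how often each group appears in a training set with a reference share (for example, census figures), then reweight examples so under-represented groups count more during training. This is a hypothetical sketch of a single technique (reweighting). It assumes every example has a known group label and that a trusted reference distribution exists; it cannot help a group that is missing from the data entirely, and it does not address historical bias baked into the labels themselves.

```python
from collections import Counter
from typing import Dict, List


def representation_gap(groups: List[str], reference: Dict[str, float]) -> Dict[str, float]:
    """Observed share of each group minus its reference share (positive = over-represented)."""
    counts = Counter(groups)
    total = len(groups)
    return {g: counts.get(g, 0) / total - share for g, share in reference.items()}


def balancing_weights(groups: List[str], reference: Dict[str, float]) -> List[float]:
    """Per-example weights so each group's weighted share matches its reference share.

    Every group appearing in `groups` must have an entry in `reference`.
    """
    counts = Counter(groups)
    total = len(groups)
    return [reference[g] / (counts[g] / total) for g in groups]


if __name__ == "__main__":
    # Hypothetical dataset: group "a" makes up 80% of examples, but 50% of the population.
    data_groups = ["a"] * 8 + ["b"] * 2
    ref = {"a": 0.5, "b": 0.5}
    print(representation_gap(data_groups, ref))
    print(balancing_weights(data_groups, ref))
```

Here each "a" example gets weight 0.625 and each "b" example gets weight 2.5, so both groups contribute equally (a total weight of 5 each) during training.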

## Disinformation and Misinformation

https://www.youtube.com/watch?v=e-QzJnego04

In today's digital landscape, disinformation (false content spread deliberately to deceive) and misinformation (false content shared without intent to deceive) pose significant risks, because foundation models such as AI text generators can create and propagate false content at scale. It's crucial to educate people about AI's capabilities and limitations so they can critically assess AI-generated material, fostering a community that is well-informed and resilient against these risks.

## Technical Terms Explained:

**Synthetic Voices**: Computer-generated voices that can be very difficult to distinguish from real human voices. They are produced by AI models trained on samples of human speech.

**Content Provenance Tools**: Tools designed to track the origin and history of digital content. They help verify the authenticity of the content by providing information about its creation, modification, and distribution history.
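The core idea behind content provenance can be sketched with Python's standard library: record a cryptographic hash of the content together with who created it, sign that record, and later check both the signature and the hash. This is a hypothetical illustration only. It uses an HMAC with a shared secret key as a stand-in for a signature, whereas real provenance standards such as C2PA use certificate-based digital signatures, and the manifest fields here are made up.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; real provenance systems use public-key signatures instead.
SECRET_KEY = b"demo-signing-key"


def _sign(claim: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the claim."""
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_manifest(content: bytes, creator: str, tool: str) -> dict:
    """Record a hash of the content plus who/what created it, and sign the record."""
    claim = {"sha256": hashlib.sha256(content).hexdigest(),
             "creator": creator, "tool": tool}
    return {"claim": claim, "signature": _sign(claim)}


def verify(content: bytes, manifest: dict) -> bool:
    """True only if the manifest is untampered and the content still matches its hash."""
    if not hmac.compare_digest(_sign(manifest["claim"]), manifest["signature"]):
        return False  # the provenance record itself was altered
    return hashlib.sha256(content).hexdigest() == manifest["claim"]["sha256"]


if __name__ == "__main__":
    original = b"Photo caption: city council meeting, 3 May"
    manifest = make_manifest(original, creator="Example Newsroom", tool="camera-app")
    print(verify(original, manifest))                 # True
    print(verify(original + b" (edited)", manifest))  # False
```

Any change to the content breaks the hash check, and any change to the recorded creator or tool breaks the signature check, which is what lets a viewer trust the stated origin.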

## Environmental and Human Impacts

https://www.youtube.com/watch?v=-BIHaDn036g

Foundation models have both environmental and human impacts that are shaping our world. While the environmental footprint includes high energy use, resource depletion, and electronic waste, we're also facing human challenges in the realms of economic shifts, bias and fairness, privacy concerns, and security risks.
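The energy side of the environmental footprint is often estimated with simple arithmetic: power draw per device × number of devices × training time, scaled by the data center's power usage effectiveness (PUE), then multiplied by the carbon intensity of the electricity supply. The sketch below follows that approach; every number in it (GPU count, power draw, hours, PUE, grid intensity) is a hypothetical placeholder, not a measurement of any real model.

```python
from typing import Tuple


def training_footprint(num_gpus: int, gpu_power_kw: float, hours: float,
                       pue: float = 1.1, kg_co2_per_kwh: float = 0.4) -> Tuple[float, float]:
    """Return (energy in kWh, emissions in kg CO2-equivalent) for a training run.

    pue: data-center overhead factor (total facility energy / IT equipment energy).
    kg_co2_per_kwh: carbon intensity of the electricity supply.
    """
    it_energy_kwh = num_gpus * gpu_power_kw * hours  # energy drawn by the GPUs
    total_energy_kwh = it_energy_kwh * pue           # add cooling and other overhead
    emissions_kg = total_energy_kwh * kg_co2_per_kwh
    return total_energy_kwh, emissions_kg


if __name__ == "__main__":
    # Hypothetical run: 1,000 GPUs at 0.4 kW each for 30 days.
    energy, co2 = training_footprint(num_gpus=1000, gpu_power_kw=0.4, hours=30 * 24)
    print(f"~{energy:,.0f} kWh, ~{co2 / 1000:,.1f} tonnes CO2e")
```

With these placeholder inputs the estimate comes to roughly 317 MWh and about 127 tonnes of CO2e. The same run on a lower-carbon grid (a smaller `kg_co2_per_kwh`) would emit proportionally less, which is why where a model is trained matters.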