In [3]:
!pip install ipykernel



In [4]:
!pip install -U langchain langchain-openai langchain-core



In [5]:
!pip install python-dotenv



In [6]:
import sys
print(sys.executable)

/opt/homebrew/anaconda3/envs/vidhya-agents/bin/python


In [7]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from IPython.display import Markdown

In [9]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv('/Users/jasper/Downloads/My_first_agents-main/notebooks/.env')

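# Verify that the API key is loaded; fail early with a clear message if it is not
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    raise RuntimeError('OPENAI_API_KEY not found; check the path passed to load_dotenv')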

In [10]:
llm = ChatOpenAI(model='gpt-4o-mini', api_key=api_key)

prompt = ChatPromptTemplate.from_messages([
    ('system', 'You are a research assistant'),
    ('human', '{input}')
])

output_parser = StrOutputParser()

basic_chain = prompt | llm | output_parser

output = basic_chain.invoke({'input': 'Write a 3 bullet point summary about how transformers work. Simplify to non-technical people but keep the main bits of information.'})

Markdown(output)

- **Attention Mechanism**: Transformers use a method called "attention" to focus on different parts of the input data, allowing them to determine which words or elements are most important for understanding the context. This helps the model capture relationships between words in a sentence, regardless of their position.

- **Parallel Processing**: Unlike older models that process information in sequence, transformers can analyze all parts of the input at once. This parallel processing makes them faster and more efficient, especially when handling large amounts of data.

- **Layers and Encoding**: Transformers are built with multiple layers that transform the input data into a format the model can understand. Each layer refines the information, enabling the model to learn complex patterns and generate accurate outputs, like translations or text completions.
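The expression `prompt | llm | output_parser` above builds one chain out of three steps, each feeding its result to the next. The sketch below imitates that composition in plain Python so the data flow is easier to see. It is a simplified stand-in, not LangChain's implementation: the `Step` class and the `fake_*` objects are hypothetical names made up for this illustration, and no model is called.

```python
class Step:
    """Tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` returns a new step that runs a, then passes its result to b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for the three stages of basic_chain
fake_prompt = Step(
    lambda inputs: f"system: You are a research assistant\nhuman: {inputs['input']}"
)
# The "model" just echoes the last line of the prompt inside a message-like dict
fake_llm = Step(lambda text: {'content': 'REPLY TO: ' + text.splitlines()[-1]})
# The "parser" pulls the plain string out of the message
fake_parser = Step(lambda message: message['content'])

toy_chain = fake_prompt | fake_llm | fake_parser
result = toy_chain.invoke({'input': 'hello'})
print(result)  # REPLY TO: human: hello
```

The real chain works on richer objects (prompt values and chat messages rather than strings and dicts), but the left-to-right hand-off is the same idea.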

Let's write a draft of a research report using chains in LangChain.

In [11]:
WRITER_SYS_MSG = """
You are a research assistant and a scientific writer.
You take in requests about topics and write organized research reports on those topics.
"""

prompt = ChatPromptTemplate.from_messages([
    ('system', WRITER_SYS_MSG),
    ('human', 'Write an organized research report about this topic:\n\n{topic}')
])

llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)

output_parser = StrOutputParser()

writer_chain = prompt | llm | output_parser

In [12]:
output = writer_chain.invoke({'topic': 'How do transformers work for non AI researchers?'})

Markdown(output)

# Understanding Transformers: A Non-AI Researcher’s Guide

## Introduction
Transformers are a type of model architecture that has revolutionized the field of artificial intelligence (AI), particularly in natural language processing (NLP). Introduced in a 2017 paper titled "Attention is All You Need" by Vaswani et al., transformers have become the backbone of many state-of-the-art AI applications, including language translation, text generation, and more. This report aims to explain how transformers work in a straightforward manner, making it accessible for non-AI researchers.

## 1. The Basics of Transformers

### 1.1 What is a Transformer?
A transformer is a neural network architecture designed to process sequential data, such as text. Unlike previous models that processed data in order (like recurrent neural networks), transformers can analyze all parts of the input simultaneously, which allows for greater efficiency and effectiveness.

### 1.2 Key Components
Transformers consist of several key components:
- **Input Embeddings**: Words or tokens are converted into numerical vectors that represent their meanings.
- **Positional Encoding**: Since transformers do not process data sequentially, positional encoding is added to the input embeddings to give the model information about the order of the words.
- **Attention Mechanism**: This is the core innovation of transformers, allowing the model to focus on different parts of the input when making predictions.

## 2. The Attention Mechanism

### 2.1 What is Attention?
The attention mechanism allows the model to weigh the importance of different words in a sentence when making predictions. For example, in the sentence "The cat sat on the mat," the model can learn to focus more on "cat" and "mat" when predicting the next word.

### 2.2 How Attention Works
- **Query, Key, and Value**: Each word is transformed into three vectors: a query, a key, and a value. The query represents what the model is looking for, the key represents the information available, and the value is the actual information.
- **Calculating Attention Scores**: The model calculates a score for how much focus to place on each word by taking the dot product of the query and key vectors. This score is then normalized using a softmax function to create a probability distribution.
- **Weighted Sum**: Finally, the model computes a weighted sum of the value vectors based on the attention scores, allowing it to focus on the most relevant words.

## 3. The Transformer Architecture

### 3.1 Encoder and Decoder
Transformers are typically composed of two main parts: the encoder and the decoder.
- **Encoder**: The encoder processes the input data and generates a set of attention-based representations.
- **Decoder**: The decoder takes these representations and generates the output, such as a translated sentence.

### 3.2 Stacking Layers
Both the encoder and decoder consist of multiple layers (often 6 to 12). Each layer contains:
- **Multi-Head Attention**: This allows the model to focus on different parts of the input simultaneously.
- **Feed-Forward Neural Networks**: These networks process the output from the attention mechanism.
- **Residual Connections and Layer Normalization**: These techniques help stabilize training and improve performance.

## 4. Training Transformers

### 4.1 Data Preparation
Transformers require large datasets for training. Text data is typically preprocessed to remove noise and convert words into tokens.

### 4.2 Loss Function
During training, the model's predictions are compared to the actual outputs using a loss function, which quantifies the difference. The goal is to minimize this loss through optimization techniques like gradient descent.

### 4.3 Transfer Learning
Once trained, transformers can be fine-tuned on specific tasks with smaller datasets, making them versatile for various applications.

## 5. Applications of Transformers

Transformers have a wide range of applications, including:
- **Natural Language Processing**: Language translation, sentiment analysis, and text summarization.
- **Computer Vision**: Image classification and object detection.
- **Speech Recognition**: Converting spoken language into text.

## Conclusion

Transformers represent a significant advancement in AI, particularly in how machines understand and generate human language. By leveraging the attention mechanism and a unique architecture, transformers can process information more effectively than previous models. Understanding the basics of transformers can provide valuable insights into the capabilities and potential applications of AI technologies in various fields. As research continues to evolve, transformers are likely to play an even more prominent role in shaping the future of AI.
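Section 2.2 of the draft report describes attention in three steps: query/key dot products, a softmax over the scores, and a weighted sum of the values. A minimal pure-Python sketch of those steps for a single query follows. The function names and the toy vectors are made up for illustration. The division by the square root of the vector length is not mentioned in the report; it is the scaling used in "Attention is All You Need".

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # Step 1: score each key against the query (dot product, then scale)
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Step 2: normalize the scores into attention weights
    weights = softmax(scores)
    # Step 3: weighted sum of the value vectors, one dimension at a time
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy data: three "words", each with a 2-dimensional key and value
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]

weights, output = attend(query, keys, values)
print(weights)
print(output)
```

With this query, the first and third keys receive equal weight and the second key receives less, because its dot product with the query is zero.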

In [13]:
REVIEWER_SYS_MSG = """
You are a reviewer for research reports. You take in research reports and provide feedback on them.
"""

prompt_reviewer = ChatPromptTemplate.from_messages([
    ('system', REVIEWER_SYS_MSG),
    ('human', 'Provide feedback on this research report as 5 concise bullet points:\n\n{report}')
])

llm_reviewer = ChatOpenAI(model='gpt-4o-mini', temperature=0.2)

review_chain = prompt_reviewer | llm_reviewer | output_parser

feedback_output = review_chain.invoke({'report': output})

Markdown(feedback_output)

- **Clarity and Accessibility**: The report effectively breaks down complex concepts related to transformers into digestible sections, making it accessible for non-AI researchers. However, consider adding more analogies or real-world examples to further enhance understanding.

- **Depth of Explanation**: While the report covers the basics well, it could benefit from a deeper exploration of the implications of the attention mechanism and how it differs from traditional methods. This would provide a more comprehensive understanding of why transformers are revolutionary.

- **Visual Aids**: Incorporating diagrams or flowcharts to illustrate the transformer architecture and the attention mechanism would greatly enhance comprehension, especially for visual learners.

- **Applications Section**: The applications of transformers are briefly mentioned but could be expanded with specific examples or case studies to illustrate their impact in various fields, particularly in NLP and computer vision.

- **Future Directions**: The conclusion could be strengthened by discussing potential future developments in transformer research or applications, which would provide readers with insights into the evolving landscape of AI technologies.

In [14]:
FINAL_WRITER_SYS_MSG = """
You take in a research report and a set of bullet points with feedback to improve,
and you revise the research report based on the feedback and write a final version.
"""

prompt_final_writer = ChatPromptTemplate.from_messages(
    [
        ('system', FINAL_WRITER_SYS_MSG),
        ('human', 'Write a revised and improved version of this research report:\n\n{report}\n\nBase the revision on this feedback:\n\n{feedback}')
    ]
)
llm_final_writer = ChatOpenAI(model='gpt-4o-mini', temperature=0.2)
chain_final_writer = prompt_final_writer | llm_final_writer | output_parser

output_final_report = chain_final_writer.invoke({'report': output, 'feedback': feedback_output})

Markdown(output_final_report)

# Understanding Transformers: A Non-AI Researcher’s Guide

## Introduction
Transformers are a groundbreaking model architecture that has transformed the landscape of artificial intelligence (AI), particularly in natural language processing (NLP). Introduced in the seminal 2017 paper "Attention is All You Need" by Vaswani et al., transformers have become the foundation for many cutting-edge AI applications, including language translation, text generation, and more. This report aims to explain how transformers work in a straightforward manner, making it accessible for non-AI researchers.

## 1. The Basics of Transformers

### 1.1 What is a Transformer?
A transformer is a neural network architecture designed to process sequential data, such as text. Unlike previous models that processed data in a linear order (like recurrent neural networks), transformers can analyze all parts of the input simultaneously. This parallel processing capability allows for greater efficiency and effectiveness, akin to reading an entire paragraph at once rather than word by word.

### 1.2 Key Components
Transformers consist of several key components:
- **Input Embeddings**: Words or tokens are converted into numerical vectors that represent their meanings. Think of this as translating words into a language that the model can understand.
- **Positional Encoding**: Since transformers do not process data sequentially, positional encoding is added to the input embeddings to provide information about the order of the words. This is similar to adding timestamps to a series of events to understand their sequence.
- **Attention Mechanism**: This is the core innovation of transformers, allowing the model to focus on different parts of the input when making predictions. Imagine a teacher highlighting important sections of a textbook while preparing for a lecture.

## 2. The Attention Mechanism

### 2.1 What is Attention?
The attention mechanism enables the model to weigh the importance of different words in a sentence when making predictions. For instance, in the sentence "The cat sat on the mat," the model can learn to focus more on "cat" and "mat" when predicting the next word, much like a reader emphasizing key terms while summarizing a text.

### 2.2 How Attention Works
- **Query, Key, and Value**: Each word is transformed into three vectors: a query, a key, and a value. The query represents what the model is looking for, the key represents the information available, and the value is the actual information.
- **Calculating Attention Scores**: The model calculates a score for how much focus to place on each word by taking the dot product of the query and key vectors. This score is then normalized using a softmax function to create a probability distribution, akin to assigning weights to different pieces of information based on their relevance.
- **Weighted Sum**: Finally, the model computes a weighted sum of the value vectors based on the attention scores, allowing it to concentrate on the most relevant words, similar to how a researcher prioritizes sources when writing a paper.

## 3. The Transformer Architecture

### 3.1 Encoder and Decoder
Transformers are typically composed of two main parts: the encoder and the decoder.
- **Encoder**: The encoder processes the input data and generates a set of attention-based representations. It can be thought of as a translator that understands the source language.
- **Decoder**: The decoder takes these representations and generates the output, such as a translated sentence, functioning like a translator that produces the target language.

### 3.2 Stacking Layers
Both the encoder and decoder consist of multiple layers (often 6 to 12). Each layer contains:
- **Multi-Head Attention**: This allows the model to focus on different parts of the input simultaneously, similar to having multiple experts analyze various aspects of a problem.
- **Feed-Forward Neural Networks**: These networks process the output from the attention mechanism, akin to synthesizing insights from the analysis.
- **Residual Connections and Layer Normalization**: These techniques help stabilize training and improve performance, ensuring that the model learns effectively without losing important information.

## 4. Training Transformers

### 4.1 Data Preparation
Transformers require large datasets for training. Text data is typically preprocessed to remove noise and convert words into tokens, much like cleaning and organizing data before analysis.

### 4.2 Loss Function
During training, the model's predictions are compared to the actual outputs using a loss function, which quantifies the difference. The goal is to minimize this loss through optimization techniques like gradient descent, similar to refining a thesis based on feedback.

### 4.3 Transfer Learning
Once trained, transformers can be fine-tuned on specific tasks with smaller datasets, making them versatile for various applications. This is akin to a professional adapting their skills to a new job role.

## 5. Applications of Transformers

Transformers have a wide range of applications, including:
- **Natural Language Processing**: Language translation (e.g., Google Translate), sentiment analysis (e.g., determining the emotional tone of a review), and text summarization (e.g., generating concise summaries of articles).
- **Computer Vision**: Image classification (e.g., identifying objects in photos) and object detection (e.g., recognizing faces in images).
- **Speech Recognition**: Converting spoken language into text (e.g., virtual assistants like Siri and Alexa).

## Conclusion

Transformers represent a significant advancement in AI, particularly in how machines understand and generate human language. By leveraging the attention mechanism and a unique architecture, transformers can process information more effectively than previous models. Understanding the basics of transformers provides valuable insights into the capabilities and potential applications of AI technologies across various fields. As research continues to evolve, future developments may include more efficient training methods, enhanced interpretability of models, and broader applications in areas such as healthcare and education. The ongoing evolution of transformers is likely to shape the future of AI, making it an exciting area for further exploration.
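The three cells above run the write, review, and revise steps by hand, passing each output into the next call. One way to bundle them is a small helper that accepts any three objects with an `.invoke(dict)` method. This is only a sketch: `run_report_pipeline` and `StubChain` are hypothetical names introduced here, and the stub exists so the flow can be checked offline without calling a model.

```python
def run_report_pipeline(topic, writer, reviewer, final_writer):
    """Draft a report, collect feedback on it, then revise it using that feedback."""
    draft = writer.invoke({'topic': topic})
    feedback = reviewer.invoke({'report': draft})
    final_report = final_writer.invoke({'report': draft, 'feedback': feedback})
    # Keep the intermediate results so they can be inspected afterwards
    return {'draft': draft, 'feedback': feedback, 'final_report': final_report}


class StubChain:
    """Offline stand-in (hypothetical): reports which input keys it received."""

    def __init__(self, label):
        self.label = label

    def invoke(self, inputs):
        return f"{self.label}({', '.join(sorted(inputs))})"


demo = run_report_pipeline(
    'transformers',
    StubChain('draft'),
    StubChain('feedback'),
    StubChain('final'),
)
print(demo['final_report'])  # final(feedback, report)
```

In the notebook, the same helper could presumably be called as `run_report_pipeline(topic, writer_chain, review_chain, chain_final_writer)`, since each of those chains accepts the same input keys used here.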