### Summarization with LangChain and OpenAI

In [1]:
# !pip install -r ./requirements.txt -q
# !pip runs whichever pip is first on the shell PATH, which may not be the kernel's environment
# %pip installs into the environment of the running kernel

In [2]:
# !pip show langchain

In [100]:
# !pip install langchain --upgrade -q
# For working with unstructured pdf load, install the below dependencies
# !pip install unstructured
# !pip install pdf2image
# !pip install pdfminer
!pip install pdfminer.six 

Collecting pdfminer.six
  Downloading pdfminer.six-20221105-py3-none-any.whl (5.6 MB)
Installing collected packages: pdfminer.six
Successfully installed pdfminer.six-20221105


### Python-dotenv

In [18]:
import os
from dotenv import load_dotenv, find_dotenv

In [19]:
load_dotenv(find_dotenv(), override=True)

True
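
A `True` result from `load_dotenv` indicates that at least one variable was loaded, not that the specific key is present. A minimal check, assuming the key is stored under `OPENAI_API_KEY` (the helper name `has_api_key` is hypothetical), might look like this:

```python
import os

def has_api_key(name: str = "OPENAI_API_KEY") -> bool:
    """Return True if the variable is set to a non-empty value, without printing it."""
    return bool(os.environ.get(name, "").strip())

print("OpenAI key loaded:", has_api_key())
```

Checking for presence rather than printing the value keeps the secret out of the notebook output.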

### Basic Prompt

In [20]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [49]:
textMessage="""
Most LLM applications have a conversational interface. An essential component of a conversation is being able to \
refer to information introduced earlier in the conversation. At bare minimum, a conversational system should be able \
to access some window of past messages directly. A more complex system will need to have a world model that it is \
constantly updating, which allows it to do things like maintain information about entities and their relationships.
We call this ability to store information about past interactions "memory". LangChain provides a lot of utilities \
for adding memory to a system. These utilities can be used by themselves or incorporated seamlessly into a chain.
A memory system needs to support two basic actions: reading and writing. Recall that every chain defines some \
core execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of \
these inputs can come from memory. A chain will interact with its memory system twice in a given run.
AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its \
memory system and augment the user inputs.
AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs \
of the current run to memory, so that they can be referred to in future runs.
"""

messages = [
    SystemMessage(content='You are a highly skilled writer, and your expertise lies in the field of copywriting'),
    HumanMessage(content=f'Please provide a short and concise summary of the following text:\n TEXT:{textMessage}')
]

In [50]:
# Create the model first so that it exists before tokens are counted
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
llm.get_num_tokens(textMessage)

253

In [51]:
summary = llm(messages)
print(summary.content)

In the field of conversational interfaces, it is crucial for a system to be able to refer back to previous information in a conversation. This can be achieved through a memory system, which allows the system to store and access past interactions. LangChain offers utilities for adding memory to a system, which can be used independently or seamlessly integrated into a chain. The memory system supports two basic actions: reading and writing. Before executing the core logic, the system reads from memory to augment user inputs, and after executing the core logic, it writes the inputs and outputs to memory for future reference.


### Summarization with LangChain Using Prompt Template

In [42]:
from langchain import PromptTemplate
from langchain.chains import LLMChain

In [52]:
template="""
Compose a concise and a brief summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.
"""

prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)
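
With its default settings, `PromptTemplate.format` fills in the named placeholders in much the same way as Python's built-in `str.format`. The sketch below uses plain `str.format` (not LangChain itself) on a template with the same two placeholders to show the kind of text the model receives; the sample text and language are placeholders chosen for illustration.

```python
demo_template = """
Summarize the following text:
TEXT: `{text}`
Translate the summary to {language}.
"""

# Plain-Python stand-in for prompt.format(text=..., language=...)
rendered = demo_template.format(
    text="LangChain provides utilities for adding memory to a system.",
    language="German",
)
print(rendered)
```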

In [57]:
llm.get_num_tokens(prompt.format(text=textMessage, language='German'))

275
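
Counting tokens before calling the model helps confirm that the prompt plus the expected reply fits in the context window. A minimal sketch, assuming a 4,096-token window (the limit of the original `gpt-3.5-turbo`; check the limit of the model you use) and a hypothetical helper name `fits_context`:

```python
def fits_context(prompt_tokens: int, max_reply_tokens: int = 256, context_window: int = 4096) -> bool:
    """Return True if the prompt and the reserved reply budget fit in the window."""
    return prompt_tokens + max_reply_tokens <= context_window

print(fits_context(275))     # the formatted prompt counted above
print(fits_context(13_000))  # a much longer input would need to be split into chunks
```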

In [54]:
chain = LLMChain(llm=llm, prompt=prompt)

In [55]:
summary = chain.run({'text': textMessage, 'language': 'arabic'})

In [56]:
print(summary)

معظم تطبيقات LLM لديها واجهة محادثة. واحدة من المكونات الأساسية للمحادثة هي القدرة على الإشارة إلى المعلومات التي تم طرحها في وقت سابق في المحادثة. على الأقل، يجب أن يكون لدى نظام محادثة القدرة على الوصول إلى نافذة من الرسائل السابقة مباشرة. سيحتاج النظام المعقد أكثر إلى وجود نموذج عالم يتم تحديثه باستمرار، مما يتيح له القيام بأشياء مثل الحفاظ على معلومات حول الكيانات وعلاقاتها.
نسمي هذه القدرة على تخزين المعلومات حول التفاعلات السابقة "الذاكرة". يوفر LangChain العديد من الأدوات لإضافة الذاكرة إلى النظام. يمكن استخدام هذه الأدوات بمفردها أو دمجها بسلاسة في سلسلة.
يحتاج نظام الذاكرة إلى دعم إجراءين أساسيين: القراءة والكتابة. تذكر أن كل سلسلة تحدد بعض المنطق التنفيذي الأساسي الذي يتوقع بعض المدخلات المحددة. بعض هذه المدخلات تأتي مباشرة من المستخدم، ولكن بعض هذه المدخلات يمكن أن تأتي من الذاكرة. ستتفاعل السلسلة مع نظام الذاكرة مرتين في تشغيل معين.
بعد استلام المدخلات الأولية من المستخدم ولكن قبل تنفيذ المنطق الأساسي، ستقوم السلسلة بقراءة من نظام الذاكرة وتعزيز المدخلات من المستخدم.
بعد تن

### Summarization with LangChain using StuffDocumentsChain

In [58]:
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [64]:
with open('./CNNs.txt', encoding='utf-8') as f:
    text = f.read()
docs = [Document(page_content=text)]

In [65]:
template="""
Compose a concise and a brief summary of the following text:
TEXT: `{text}`
"""

prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [66]:
chain = load_summarize_chain(
    llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)
summary = chain.run(docs)

In [67]:
print(summary)

Convolutional Neural Networks (CNNs) are a type of neural network architecture commonly used for image recognition tasks. They are designed to encode image-specific features into the network, making them well-suited for image-focused tasks. CNNs operate by scanning images and categorizing them based on the features they detect. They are used in various applications such as self-driving cars, face recognition, and computer vision. CNNs address the challenge of working with large input images by implementing the convolution operation, which is a fundamental building block of CNNs. Unlike other neural networks, CNNs exploit knowledge about the specific type of input, allowing for a simpler network architecture.
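
Conceptually, the `stuff` chain type places every document into a single prompt and makes one model call, which is why it only works when the combined text fits in the context window. The sketch below illustrates that idea in plain Python; `fake_llm` is a hypothetical stand-in for a real model call, and this is not LangChain's implementation.

```python
from typing import Callable, List

def stuff_summarize(docs: List[str], llm: Callable[[str], str], template: str) -> str:
    """Join all documents and send them to the model in one prompt."""
    combined = "\n\n".join(docs)
    return llm(template.format(text=combined))

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in: reports the prompt size instead of calling an API.
    return f"(summary of a {len(prompt)}-character prompt)"

demo_template = "Write a concise summary of the following text:\nTEXT: `{text}`"
print(stuff_summarize(["First document.", "Second document."], fake_llm, demo_template))
```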


### Summarization with LangChain using Map_Reduce

In [69]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
with open('./CNNs.txt', encoding='utf-8') as f:
    text = f.read()
    
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [70]:
llm.get_num_tokens(text)

2022

In [81]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])

In [82]:
len(chunks)

3
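
Note that `chunk_size` and `chunk_overlap` are measured in characters by default, not tokens, which is consistent with a 2,022-token text yielding three chunks at `chunk_size=5000`. The sketch below is a simplified fixed-window splitter that only shows how size and overlap interact; the real `RecursiveCharacterTextSplitter` additionally tries to break on paragraph, line, and word boundaries, and `split_with_overlap` is a hypothetical name.

```python
from typing import List

def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for piece in split_with_overlap(sample, chunk_size=10, chunk_overlap=2):
    print(piece)
```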

In [83]:
chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)
summary = chain.run(chunks)

In [84]:
print(summary)

Convolutional Neural Networks (CNNs) are widely used in image recognition tasks due to their ability to encode image-specific features and simplify neural network architecture. They operate similarly to the brain, categorizing objects based on detected features. CNNs are crucial in computer vision applications like self-driving cars and face recognition as they accurately classify and detect objects in images. However, working with large images can lead to overfitting and require significant computational resources. CNNs handle image data using the convolution operation and can be used for tasks like image classification and emotion detection. They analyze images as arrays of pixel values and exploit knowledge about the input type to simplify the network architecture.


In [85]:
# Prompt used in the map step (applied to each chunk)
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [87]:
# Prompt used in the combine (reduce) step (applied to the partial summaries)
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'
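
The two templates above correspond to the two stages of the `map_reduce` chain type: each chunk is summarized on its own (map), and the partial summaries are then summarized together (reduce). A plain-Python sketch of that flow follows, with `fake_llm` as a hypothetical stand-in for the model. It omits the extra collapsing step LangChain may apply when the partial summaries are themselves too long.

```python
from typing import Callable, List

PROMPT = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

def map_reduce_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    # Map: one independent call per chunk (these could run in parallel).
    partial = [llm(PROMPT.format(text=chunk)) for chunk in chunks]
    # Reduce: one final call over the joined partial summaries.
    return llm(PROMPT.format(text="\n\n".join(partial)))

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in: returns the first five words of the quoted text.
    body = prompt.split('"', 1)[1].rsplit('"', 1)[0]
    return " ".join(body.split()[:5])

print(map_reduce_summarize(["one two three four five six", "seven eight nine"], fake_llm))
```

For `n` chunks this makes `n + 1` model calls, so the cost grows with the number of chunks even though each call stays small.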

### Summarization using the refine CombineDocumentChain

In [88]:
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredPDFLoader

In [89]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [101]:
loader = UnstructuredPDFLoader('./GNNs in Anomaly Detection.pdf')
data = loader.load()

[nltk_data] Downloading package punkt to /Users/ds/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/ds/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [102]:
print(data[0].page_content)

Chapter 26 Graph Neural Networks in Anomaly Detection

Shen Wang, Philip S. Yu

Abstract Anomaly detection is an important task, which tackles the problem of dis- covering “different from normal” signals or patterns by analyzing a massive amount of data, thereby identifying and preventing major faults. Anomaly detection is ap- plied to numerous high-impact applications in areas such as cyber-security, ﬁnance, e-commerce, social network, industrial monitoring, and many more mission-critical tasks. While multiple techniques have been developed in past decades in address- ing unstructured collections of multi-dimensional data, graph-structure-aware tech- niques have recently attracted considerable attention. A number of novel techniques have been developed for anomaly detection by leveraging the graph structure. Re- cently, graph neural networks (GNNs), as a powerful deep-learning-based graph rep- resentation technique, has demonstrated superiority in leveraging the graph structure and be

In [104]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

In [105]:
len(chunks)

6

In [106]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [107]:
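# Estimate the input-token cost of sending the chunks to the chat model
# (the refine chain also resends the running summary and generates output, so actual usage is higher)
import tiktoken

def print_token_cost(texts, usd_per_1k_tokens=0.002):
    # 0.002 USD per 1K tokens is the rate used in this notebook; check current pricing
    enc = tiktoken.encoding_for_model('gpt-3.5-turbo')
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f'Total Tokens: {total_tokens}')
    print(f'Estimated Cost in USD: {total_tokens / 1000 * usd_per_1k_tokens:.6f}')

print_token_cost(chunks)

Total Tokens: 13079
Estimated Cost in USD: 0.026158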


In [108]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    verbose=False
)

output_summary = chain.run(chunks)

In [109]:
print(output_summary)

This chapter provides an overview of the use of graph neural networks (GNNs) in anomaly detection, focusing on the unique challenges in this domain. It discusses existing works that apply GNNs in anomaly detection and highlights the benefits of using graph-based methods. The chapter also presents a summary of the general pipeline and taxonomies of GNN-based anomaly detection approaches. It addresses data-specific, task-specific, and model-specific issues that arise in anomaly detection. The standard pipeline of GNN-based anomaly detection is described, including graph construction and transformation, graph representation learning, and prediction. The chapter concludes with case studies of representative GNN-based anomaly detection approaches, including attentional heterogeneous graph neural networks for malicious program detection and graph matching framework to learn the program representation and similarity metric for unknown malicious program detection. Additionally, the chapter pro
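
The `refine` chain type processes chunks sequentially: it summarizes the first chunk, then repeatedly asks the model to update the running summary with each following chunk. Because each call depends on the previous one, the calls cannot run in parallel. A plain-Python sketch of the loop, where `fake_llm` and the prompt wording are hypothetical stand-ins rather than LangChain's own:

```python
from typing import Callable, List

INITIAL = "Write a concise summary of the following:\n{text}"
REFINE = (
    "Existing summary:\n{summary}\n\n"
    "Refine the summary using this additional context:\n{text}"
)

def refine_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    if not chunks:
        return ""
    # The first chunk produces the initial summary.
    summary = llm(INITIAL.format(text=chunks[0]))
    # Every later chunk is used to update the running summary.
    for chunk in chunks[1:]:
        summary = llm(REFINE.format(summary=summary, text=chunk))
    return summary

calls = []

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in: records the prompt and returns a versioned label.
    calls.append(prompt)
    return f"summary v{len(calls)}"

print(refine_summarize(["chunk one", "chunk two", "chunk three"], fake_llm))
print(len(calls), "sequential calls")
```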

### Summarizing with LangChain Agents

In [117]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper 

In [118]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [119]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
wikipedia = WikipediaAPIWrapper()

In [120]:
tools = [
    Tool(
        name='Wikipedia',
        func=wikipedia.run,
        description='Useful for looking up information about a topic on Wikipedia'
    )
]

In [121]:
agent_executor = initialize_agent(tools, llm, agent='zero-shot-react-description', verbose=True)

In [122]:
output = agent_executor.run('Please help me by providing a short summary of what is clean architecture in software design?')

> Entering new AgentExecutor chain...
I should use Wikipedia to find information about clean architecture in software design.
Action: Wikipedia
Action Input: Clean architecture
Observation: Page: Hexagonal architecture (software)
Summary: The hexagonal architecture, or ports and adapters architecture, is an architectural pattern used in software design. It aims at creating loosely coupled application components that can be easily connected to their software environment by means of ports and adapters. This makes components exchangeable at any level and facilitates test automation.

Page: Single-responsibility principle
Summary: The single-responsibility principle (SRP) is a computer programming principle that states that "A module should be responsible to one, and only one, actor." The term actor refers to a group (consisting of one or more stakeholders or users) that requires a change in the module.
Robert C. Martin, the originator of the term, e

In [123]:
print(output)

Clean architecture in software design is an architectural pattern that aims to create loosely coupled application components that can be easily connected to their software environment. It promotes the use of ports and adapters to make components exchangeable at any level and facilitate test automation. It is based on principles such as the single-responsibility principle, the open-closed principle, the Liskov substitution principle, the interface segregation principle, and the dependency inversion principle.
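
The trace above follows the ReAct pattern used by the `zero-shot-react-description` agent: the model emits a thought, an `Action` naming a tool, and an `Action Input`; the executor runs the tool and feeds the result back as an `Observation`. The sketch below shows one way such a step could be parsed and dispatched in plain Python. The regular expression and the `fake_wikipedia` tool are illustrative assumptions, not LangChain's actual parser.

```python
import re
from typing import Callable, Dict, Tuple

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)

def parse_action(llm_output: str) -> Tuple[str, str]:
    """Extract the tool name and its input from one agent step."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        raise ValueError("no Action / Action Input pair found")
    return match.group(1).strip(), match.group(2).strip()

def fake_wikipedia(query: str) -> str:
    # Hypothetical tool: a real one would query Wikipedia.
    return f"(search results for '{query}')"

tools: Dict[str, Callable[[str], str]] = {"Wikipedia": fake_wikipedia}

step = (
    "I should use Wikipedia to find information about clean architecture in software design.\n"
    "Action: Wikipedia\n"
    "Action Input: Clean architecture"
)
name, tool_input = parse_action(step)
print("Observation:", tools[name](tool_input))
```

Since the agent can only work with what the tool returns, it is worth reading the observations in the verbose trace: here the search returned pages on hexagonal architecture and the single-responsibility principle, and the final answer draws on those pages.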
