### Summarization with LangChain and OpenAI

In [6]:
# !pip install -r ./requirements.txt -q

In [7]:
# !pip show langchain

In [8]:
# !pip install langchain --upgrade -q
# To load unstructured PDFs, install the dependencies below
# !pip install unstructured
# !pip install pdf2image
# !pip install pdfminer
# !pip install pdfminer.six 

### Environment Variables

In [9]:
import os
from dotenv import load_dotenv, find_dotenv

In [10]:
load_dotenv(find_dotenv(), override=True)

True

### Summarization with LangChain & OpenAI using Map Reduce

In [11]:
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [12]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [13]:
with open('./CNNs.txt', encoding='utf-8') as f:
    text = f.read()

In [21]:
llm.get_num_tokens(text)

2022
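The token count is what decides whether splitting is needed at all. The sketch below is a hypothetical helper, not part of LangChain: it picks between the `stuff` and `map_reduce` chain types from a token count. The `context_window` value and the `reserved_for_output` margin are assumptions to replace with the limits of the model actually in use.

```python
def choose_chain_type(num_tokens: int, context_window: int,
                      reserved_for_output: int = 500) -> str:
    """Pick a summarization strategy from a token count (illustrative helper)."""
    # Leave room for the prompt wrapper and the generated summary.
    budget = context_window - reserved_for_output
    # 'stuff' sends the whole text in one prompt; 'map_reduce' splits it first.
    return "stuff" if num_tokens <= budget else "map_reduce"


# 2022 is the count reported above; 4096 is an assumed context window.
print(choose_chain_type(2022, context_window=4096))   # stuff
print(choose_chain_type(20000, context_window=4096))  # map_reduce
```

Under these assumptions the text would fit in a single prompt, so the map-reduce chain used in this notebook serves mainly as a demonstration of the technique.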

In [22]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])

In [23]:
len(chunks)

3
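Note that `chunk_size` and `chunk_overlap` are measured in characters by default, not tokens. To make the overlap idea concrete, here is a minimal sketch of a fixed-window splitter. It is a simplification: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this hypothetical `split_with_overlap` function does not do.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(pieces)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

Each chunk starts with the last two characters of the previous one, which is the role `chunk_overlap=50` plays above: it keeps a little shared context across chunk boundaries.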

In [24]:
chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)
summary = chain.run(chunks)

In [25]:
print(summary)

Convolutional Neural Networks (CNNs) are widely used in image recognition tasks due to their ability to encode image-specific features and simplify the architecture of artificial neural networks. They are important in computer vision applications such as self-driving cars and face recognition, as they accurately classify and detect objects in images. CNNs are designed to handle image data and use the convolution operation as a fundamental building block. They can detect patterns in images by searching for specific pixel values and exploit knowledge about the specific type of input to simplify the network architecture.


In [26]:
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [27]:
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'
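The two templates above correspond to the two stages of the chain: one prompt per chunk (map), then one prompt over the joined partial summaries (reduce). The following is a rough sketch of that flow in plain Python, with a stub standing in for the model. `map_reduce_summarize` and `fake_llm` are hypothetical names, and the real chain does more than this (for example, it may collapse intermediate summaries that are too long).

```python
from typing import Callable, List

# The default template printed above, reused for both stages.
DEFAULT_TEMPLATE = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'


def map_reduce_summarize(chunks: List[str], llm: Callable[[str], str],
                         map_template: str = DEFAULT_TEMPLATE,
                         combine_template: str = DEFAULT_TEMPLATE) -> str:
    # Map step: summarize every chunk independently.
    partial_summaries = [llm(map_template.format(text=chunk)) for chunk in chunks]
    # Reduce step: summarize the concatenated partial summaries.
    combined = "\n\n".join(partial_summaries)
    return llm(combine_template.format(text=combined))


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stub model: records the prompt and returns a numbered placeholder."""
    calls.append(prompt)
    return f"summary #{len(calls)}"


result = map_reduce_summarize(["chunk one", "chunk two", "chunk three"], fake_llm)
print(len(calls))  # 4 (three map calls plus one combine call)
print(result)      # summary #4
```

The combine step only ever sees the partial summaries, never the original text, which is worth keeping in mind when writing a custom combine prompt.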

### Summarization with LangChain & OpenAI using Map Reduce & Custom Prompt

In [15]:
map_custom_prompt = '''
Summarize the following text in a clear and concise way:
TEXT:`{text}`
Brief Summary:
'''

In [16]:
map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_custom_prompt
)

In [17]:
combine_custom_prompt='''
Generate a summary of the following text that includes the following elements:

* A title that accurately reflects the content of the text.
* An introduction paragraph that provides an overview of the topic.
* Bullet points that list the key points of the text.
* A conclusion paragraph that summarizes the main points of the text.

Text:`{text}`
'''

In [18]:
combine_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=combine_custom_prompt
)

In [20]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)

In [28]:
summary = summary_chain.run(chunks)

In [29]:
print(summary)

Title: Understanding Convolutional Neural Networks (CNNs) for Image Recognition

Introduction:
This text provides an overview of convolutional neural networks (CNNs) and their significance in computer vision applications. It highlights the challenges of working with large images and the need for CNNs to handle these images effectively.

Key Points:
- CNNs are commonly used for image recognition tasks in computer vision.
- They mimic the brain's recognition process by detecting and categorizing features.
- CNNs are crucial in applications like self-driving cars and face recognition.
- They can be used for image classification and object detection.
- CNNs are designed to handle large images and overcome the challenges associated with them.
- The structure of CNNs includes an input layer, the convolutional neural network itself, and an output layer.
- CNNs can be used for tasks beyond image classification, such as detecting human emotions in images.
- CNNs scan images by representing blac

### Another Use Case: Keyword Extraction with Map Reduce

In [30]:
map_custom_prompt = '''
Extract the important keywords from the following text:
TEXT:`{text}`
List of Keywords:
'''

map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_custom_prompt
)

In [31]:
combine_custom_prompt = '''
Count the number of times each important keyword appears in the text, and find the total number of keywords:

Text:`{text}`
'''

combine_prompt_template = PromptTemplate(
    template=combine_custom_prompt,
    input_variables=['text']
)

In [32]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)

In [33]:
summary = summary_chain.run(chunks)
print(summary)

The important keywords and their respective counts are as follows:

- Convolutional Neural Networks: 4
- images: 4
- neural network applications: 1
- pattern recognition tasks: 1
- image-specific features: 1
- brain: 1
- objects: 3
- people: 1
- recognition: 2
- artificial vision systems: 1
- optical illusions: 1
- features: 2
- categorization: 1
- angle: 1
- object detection: 2
- computer vision: 3
- deep learning: 1
- self-driving cars: 1
- pedestrians: 1
- face recognition: 1
- phone: 1
- apps: 1
- image classifications: 1
- image recognition: 1
- position: 1
- cars: 1
- boxes: 1
- multiple cars: 1
- distance: 1
- computer vision problems: 1
- inputs: 1
- 64 by 64 images: 1
- 64 by 64 by three: 1
- 12288: 1
- X: 1
- input features: 1
- 1000-pixel by 1000-pixel image: 1
- one megapixel: 1
- three RGB channels: 1
- three million: 1
- first hidden layer: 1
- 1000 hidden units: 1
- three billion parameters: 1
- overfitting: 1
- computational requirements: 1
- memory requirements: 1
- co
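
The counts above come from the model, and the combine prompt receives only the keyword lists produced by the map step, not the original document, so they are best treated as approximate. If exact frequencies matter, one option is to count the extracted keywords deterministically. Below is a minimal sketch using `collections.Counter`; the `count_keywords` helper and the sample text are illustrative and not part of the notebook.

```python
import re
from collections import Counter
from typing import Iterable


def count_keywords(text: str, keywords: Iterable[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each keyword in text."""
    lowered = text.lower()
    counts: Counter = Counter()
    for keyword in keywords:
        # \b anchors keep 'images' from matching inside longer words.
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        counts[keyword] = len(re.findall(pattern, lowered))
    return counts


sample_text = "Computer vision uses images. Object detection finds objects in images."
counts = count_keywords(
    sample_text,
    ["images", "object detection", "computer vision", "overfitting"],
)
print(dict(counts))
print("total:", sum(counts.values()))  # total: 4
```

In the notebook, `text` (the full document) and the keywords returned by the chain could be passed in place of the sample values.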