In [1]:
from langchain_community.document_loaders import TextLoader # For loading text documents
from langchain_openai import ChatOpenAI # For interacting with OpenAI chat models
from langchain_core.output_parsers import StrOutputParser # For parsing LLM output to a string
from langchain_core.prompts import PromptTemplate # For creating reusable prompt structures
from dotenv import load_dotenv # For loading environment variables

# Load environment variables from a .env file. 🌍
# This is crucial for securely loading your API keys (e.g., OPENAI_API_KEY)
# so they are not hardcoded directly in your script.
load_dotenv()

# Initialize the ChatOpenAI language model. 🤖
# With no arguments, ChatOpenAI() uses the library's default chat model;
# pass `model="..."` to choose a specific one.
# This model generates the summary of the loaded text.
model = ChatOpenAI()
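
`load_dotenv()` does not complain when the key is missing, so a typo in the `.env` file can surface later as a less obvious error from the client library. A minimal sketch of an early check, assuming the key is stored under `OPENAI_API_KEY`; `require_env` is a hypothetical helper, not part of LangChain or python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value


# Assumed usage: call this right after load_dotenv() and before creating the model.
# api_key = require_env("OPENAI_API_KEY")
```

Failing early keeps the error next to its cause instead of inside the first model call.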

In [2]:
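# Define the PromptTemplate for summarizing a text. 📝
# This template takes a single input variable, 'poem', and instructs the LLM
# to write a summary for the text passed in that variable.
# The variable is named 'poem', but any text works; this notebook passes
# a short prose passage about AI.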
prompt = PromptTemplate(
    template='Write a summary for the following poem - \n {poem}',
    input_variables=['poem']
)
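
To see what the model will actually receive, it can help to fill the template by hand. With the default f-string style, `PromptTemplate` substitutes variables much like Python's `str.format`, so the sketch below uses plain string formatting as a stand-in; `sample_text` is made-up illustration data, not the notebook's file:

```python
# Same template string as in the cell above.
template = 'Write a summary for the following poem - \n {poem}'

# Made-up sample input, used only for this illustration.
sample_text = "Roses are red,\nViolets are blue."

# Fill the single placeholder, as the prompt does when the chain is invoked.
formatted = template.format(poem=sample_text)
print(formatted)
```

In the notebook itself, `prompt.format(poem=sample_text)` should give the same string. Note that the template puts a space after the newline, so the inserted text starts with one leading space.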

In [3]:
# Initialize a StrOutputParser. 📄
# This parser takes the raw output from the LLM (for a chat model, a message
# object) and returns its text content as a plain string, ready to print
# or to pass to the next component.
parser = StrOutputParser()

In [4]:
# Initialize the TextLoader to load content from a file. 📂
# 'data\\45_ai.txt' is the path of the file to load (written with Windows-style separators).
# `encoding='utf-8'` sets the character encoding used to decode the file.
loader = TextLoader('data\\45_ai.txt', encoding='utf-8')
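
The hard-coded backslash path above only resolves on Windows. A minimal sketch of a portable alternative using the standard library's `pathlib`, assuming the same `data` folder and file name as the cell above:

```python
from pathlib import Path

# Build the path from its parts so the separator matches the current OS.
file_path = Path("data") / "45_ai.txt"
print(file_path)

# Assumed usage in the notebook:
# loader = TextLoader(str(file_path), encoding="utf-8")
```

With this change the `source` value in the document metadata would use the separator of whichever system runs the notebook.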

In [5]:
# Load the document(s) using the loader. 📚
# The `loader.load()` method reads the file and returns a list of `Document` objects.
# Each `Document` object contains the `page_content` (the text from the file)
# and `metadata` (information about the document, like its source).
docs = loader.load()
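
The shape of the result is easier to remember with a small model of it. The sketch below is a simplified stand-in, not LangChain's implementation: `SimpleDocument` and `load_text` are hypothetical names that mimic the `page_content` and `metadata` fields described above.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list:
    """Read one text file and wrap it in a one-element list of documents."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]


# Demo with a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("AI is a field of computer science.", encoding="utf-8")
    demo_docs = load_text(str(sample))

print(len(demo_docs), demo_docs[0].metadata["source"])
```

One file in, one document out: that is why the following cells index `docs[0]`.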

In [6]:
# Print the type of the 'docs' variable. 🧐
# It will confirm that `docs` is a list (specifically, a list of LangChain Document objects).
print(f"Type of 'docs': {type(docs)}")

Type of 'docs': <class 'list'>


In [7]:
# Print the number of documents loaded. 📏
# A single text file such as '45_ai.txt' yields one Document, so this prints 1.
print(f"Number of documents loaded: {len(docs)}")

Number of documents loaded: 1


In [8]:
# Print the actual content of the first document. 📜
# `docs[0]` accesses the first Document object in the list.
# `.page_content` retrieves the main text content from that document.
print("\nContent of the first document:")
print(docs[0].page_content)


Content of the first document:
AI, or **Artificial Intelligence**, is a field of computer science focused on creating machines that can perform tasks typically requiring human intelligence 🧠. This includes learning, problem-solving, decision-making, perception, and understanding language. From powering virtual assistants and recommending movies to driving autonomous vehicles and assisting in medical diagnostics, AI is rapidly transforming industries and daily life. It encompasses various techniques, including **machine learning** and **deep learning**, enabling systems to learn from data and improve performance over time. While offering immense potential, AI also raises important considerations regarding ethics, bias, and its societal impact.


In [9]:
# Print the metadata associated with the first document. 🏷️
# `.metadata` retrieves a dictionary containing information like the source file path.
print("\nMetadata of the first document:")
print(docs[0].metadata)


Metadata of the first document:
{'source': 'data\\45_ai.txt'}


In [10]:
# Create a LangChain Expression Language (LCEL) chain. 🔗
# The `|` operator pipes the output of one component as the input to the next.
# 1. `prompt`: Takes the `poem` content from the `invoke` call and formats the prompt.
# 2. `model`: Receives the formatted prompt and generates the summary.
# 3. `parser`: Takes the LLM's summary output and extracts it as a clean string.
chain = prompt | model | parser
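
The `|` syntax is ordinary Python operator overloading. The sketch below shows the idea with a hypothetical `Step` class and fake components; it is not how LangChain implements runnables, and no model is called:

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Fake components that mirror the three stages of the real chain.
fake_prompt = Step(
    lambda inputs: f"Write a summary for the following poem - \n {inputs['poem']}"
)
fake_model = Step(lambda text: {"content": text.upper()})  # pretend LLM reply
fake_parser = Step(lambda message: message["content"])     # extract the string

demo_chain = fake_prompt | fake_model | fake_parser
result = demo_chain.invoke({"poem": "short text"})
print(result)
```

Each stage only needs to accept what the previous stage returns, which is why the real chain can mix a prompt, a model, and a parser.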

In [11]:
# Invoke the chain with the text of the loaded document. 🚀
# We pass the `page_content` of the first document to the 'poem' input variable of our prompt.
# The chain then executes the entire pipeline: prompt formatting -> LLM call -> output parsing.
# Because the prompt calls the input a poem, the model's summary does too,
# even though the file contains prose about AI.
print("\nGenerated Summary:")
print(chain.invoke({'poem':docs[0].page_content}))


Generated Summary:
The poem introduces the concept of Artificial Intelligence (AI) as a field of computer science that aims to create machines capable of performing tasks that typically require human intelligence. It highlights the various capabilities of AI, such as learning, problem-solving, decision-making, perception, and language understanding, and how it is transforming industries and daily life by powering virtual assistants, recommending movies, driving autonomous vehicles, and assisting in medical diagnostics. The poem also mentions the techniques used in AI, such as machine learning and deep learning, which allow systems to learn from data and improve performance over time. However, it also touches on the ethical considerations, biases, and societal impact that come with the advancements in AI technology.
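
The call above summarizes only `docs[0]`, which is enough for a single file. If a loader returns several documents, one simple option is to join their text before invoking the chain. A minimal sketch with made-up stand-in documents (`fake_docs` is illustration data, not output from the loader):

```python
from types import SimpleNamespace

# Stand-ins that expose the same `page_content` attribute as real documents.
fake_docs = [
    SimpleNamespace(page_content="first"),
    SimpleNamespace(page_content="second"),
]

# Join all texts with a blank line between them.
combined = "\n\n".join(doc.page_content for doc in fake_docs)
print(combined)

# Assumed usage in the notebook:
# chain.invoke({'poem': combined})
```

This only suits inputs that fit in the model's context window; longer collections need a chunking or map-reduce approach instead.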
