In [5]:
import os
from dotenv import load_dotenv
load_dotenv()
from langchain_google_genai import ChatGoogleGenerativeAI

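# load_dotenv() has already exported GOOGLE_API_KEY from .env; fail early if it is missing
if not os.getenv("GOOGLE_API_KEY"):
    raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")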
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

In [6]:
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
with open("Data Transformation Simplified Expl.txt", "r") as f:
    text = f.read()

text

"Data Transformation: Simplified Explanation\n\nWhat is Data Transformation?\nData transformation is like changing the way we measure things to make them easier to understand and compare. It's like converting inches to centimeters or changing dollars to euros to help with calculations and comparisons.\n\nWhy and Where?\nData transformation is important because it helps us work with data more effectively. We transform data to make it more useful for analysis and building models.\n\nPractical Uses:\n- Scaling: Changing the range or size of numbers.\n- Normalization: Making data follow a certain pattern or shape.\n\nReal-Life Example:\nThink about temperatures. In some places, they use Celsius, and in others, Fahrenheit. If you want to compare temperatures accurately, you might convert everything to one scale, like Celsius.\n\nMin-Max Scaling: Simplified Explanation\n\nStandard Scaling is like making all your numbers play nicely with each other by giving them a common center (average) and

In [7]:
chat_messages = [
    SystemMessage(content="You are an expert in summarizing text content."),
    HumanMessage(content=f"Please provide a short and concise summary of the text below:\n\n{text}")
]

In [8]:
llm.get_num_tokens(text)

390

In [10]:
# Generate a summary of the text
llm.invoke(chat_messages).content

"Data transformation changes the way data is measured for easier analysis and comparison. It's like converting units to make calculations simpler. Methods like Min-Max Scaling and Standard Scaling help normalize data by adjusting its range and spread, ensuring fair comparisons, especially when data has different averages or variations. Choosing the right method depends on the specific goal and data characteristics. \n"

## Prompt Template Text Summarization

In [11]:
from langchain.chains.llm import LLMChain
from langchain_core.prompts import PromptTemplate

prompt_template = """
Write the summary of the following text:
Text: {text}

Translate the precise summary into {language}
"""

prompt = PromptTemplate(
    input_variables=["text", "language"],
    template=prompt_template
)
prompt

PromptTemplate(input_variables=['language', 'text'], template='\nWrite the summary of the following text:\nText: {text}\n\nTranslate the precise summary into {language}\n')

In [14]:
complete_prompt = prompt.format(text=text, language="Urdu")
complete_prompt

"\nWrite the summary of the following text:\nText: Data Transformation: Simplified Explanation\n\nWhat is Data Transformation?\nData transformation is like changing the way we measure things to make them easier to understand and compare. It's like converting inches to centimeters or changing dollars to euros to help with calculations and comparisons.\n\nWhy and Where?\nData transformation is important because it helps us work with data more effectively. We transform data to make it more useful for analysis and building models.\n\nPractical Uses:\n- Scaling: Changing the range or size of numbers.\n- Normalization: Making data follow a certain pattern or shape.\n\nReal-Life Example:\nThink about temperatures. In some places, they use Celsius, and in others, Fahrenheit. If you want to compare temperatures accurately, you might convert everything to one scale, like Celsius.\n\nMin-Max Scaling: Simplified Explanation\n\nStandard Scaling is like making all your numbers play nicely with each 

In [15]:
llm.get_num_tokens(complete_prompt)

410

In [20]:
llm_chain = LLMChain(llm=llm, prompt=prompt)

summary = llm_chain.run({"text": text, "language": "Urdu"})
summary

'## Summary:\n\nData transformation is like changing the way we measure things to make them easier to understand and compare. It helps us analyze data effectively and build better models. Examples include scaling, which changes the range of numbers, and normalization, which makes data follow a specific pattern.\n\nMin-Max Scaling is a specific type of data transformation where we adjust data to have the same average and spread. This allows for fairer comparisons, especially when dealing with data that has different averages and spreads, like stock prices or test scores.\n\n## Urdu Translation:\n\nڈیٹا ٹرانسفرمیشن چیزوں کو ناپنے کے طریقے کو تبدیل کرنے جیسا ہے تاکہ انہیں سمجھنا اور موازنہ کرنا آسان ہو جائے۔ یہ ہمیں ڈیٹا کا تجزیہ کرنے اور بہتر ماڈل بنانے میں مدد کرتا ہے۔ اس کے مثالیں ہیں سکیلنگ، جو نمبروں کی حد کو تبدیل کرتا ہے، اور نارملائزیشن، جو ڈیٹا کو ایک مخصوص پیٹرن پر چلاتا ہے۔\n\nمن-میکس سکیلنگ ڈیٹا ٹرانسفرمیشن کا ایک خاص طریقہ ہے جہاں ہم ڈیٹا کو ایڈجسٹ کرتے ہیں تاکہ اس کا اوسط او

In [21]:
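# Equivalent approach: compose the prompt and the model with the pipe operator.
# invoke() returns an AIMessage, so the text is read from .content.
llm_chain = prompt | llm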

summary = llm_chain.invoke({"text": text, "language": "Urdu"})
summary.content

"## Summary:\n\nData transformation is like changing the units of measurement to make data easier to understand and compare. It's crucial for effective analysis and building models. \n\nTwo common techniques are:\n\n* **Min-Max Scaling:**  Rescales data to a specific range, like 0 to 1. This helps compare data with different scales.\n* **Standard Scaling:**  Adjusts data by centering it around the average and standardizing its spread. This is useful for comparing data with different averages and variations.\n\nThink of it like converting temperatures from Fahrenheit to Celsius for accurate comparisons.\n\n## Urdu Translation:\n\nڈیٹا ٹرانسفرمیشن ڈیٹا کو سمجھنے اور موازنہ کرنے کے لیے اس کے پیمانے کو تبدیل کرنے جیسا ہے۔ یہ تجزیہ اور ماڈل سازی کے لیے بہت ضروری ہے۔\n\nدو عام تکنیکیں ہیں:\n\n* **من-میکس اسکیلنگ:**  ڈیٹا کو کسی مخصوص رینج، جیسے 0 سے 1 تک، میں تبدیل کرنا۔ یہ مختلف پیمانے پر ڈیٹا کے موازنہ میں مدد کرتا ہے۔\n* **سٹینڈرڈ اسکیلنگ:**  ڈیٹا کو اوسط کے ارد گرد مرتکز کر کے اور اس کی 

## 1. Stuff Document Chain Text Summarization
Use this method when the documents fit within the LLM's context window: all of them are inserted ("stuffed") into a single prompt and summarized in one call.
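The idea can be sketched without any LLM call: join every chunk into one prompt and compare its estimated size against a token budget. The helper names `stuff_prompt` and `fits_context`, the 4-characters-per-token estimate, and the `max_tokens` value are illustrative assumptions, not LangChain APIs; for real counts, `llm.get_num_tokens` as used earlier is the better guide.

```python
def stuff_prompt(chunks, template="Write a concise summary of the following text:\nText: {text}\n"):
    """Join all chunks and place them into one prompt (the 'stuff' idea)."""
    joined = "\n\n".join(chunks)
    return template.format(text=joined)


def fits_context(prompt, max_tokens, chars_per_token=4):
    """Rough size check. chars_per_token is an assumed average, not a real tokenizer."""
    estimated_tokens = len(prompt) // chars_per_token + 1
    return estimated_tokens <= max_tokens


# Hypothetical sample chunks standing in for the loaded PDF pages
chunks = ["First page of the document.", "Second page of the document."]
prompt = stuff_prompt(chunks)
print(fits_context(prompt, max_tokens=1000))
```

If the check fails, the map-reduce or refine approaches are the usual fallbacks.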

In [26]:
from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader("Panaversity Cloud Native Applied Generative AI Engineer (updated).pdf")
doc_pages = loader.load_and_split()
doc_pages

[Document(metadata={'source': 'Panaversity Cloud Native Applied Generative AI Engineer (updated).pdf', 'file_path': 'Panaversity Cloud Native Applied Generative AI Engineer (updated).pdf', 'page': 0, 'total_pages': 36, 'format': 'PDF 1.4', 'title': 'Panaversity Cloud Native Applied Generative AI Engineer', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'Skia/PDF m128 Google Docs Renderer', 'creationDate': '', 'modDate': '', 'trapped': ''}, page_content="Certified Cloud Native Applied\nGenerative AI Engineer\nMaster the Future\nBuild Custom GPTs, AI Agents, Humanoids, and Fine-Tune LLMs\nVersion: 12.5 (Implementation and adoption starting from August 1, 2024)\nToday's pivotal technological trends are Cloud Native (CN), Generative AI (GenAI),\nand Physical AI. Cloud Native technology offers a scalable and dependable platform\nfor application operation, while AI equips these applications with intelligent,\nhuman-like capabilities. Physical AI aims to bridge the ga

In [27]:
template = """
Write a concise and short summary of the following text capturing all important information:
Text: {text}
"""
prompt = PromptTemplate(input_variables=["text"], template=template)
prompt

PromptTemplate(input_variables=['text'], template='\nWrite a concise and short summary of the following text capturing all important information:\nText: {text}\n')

In [28]:
from langchain.chains.summarize import load_summarize_chain

chain = load_summarize_chain(llm=llm, chain_type="stuff", prompt=prompt, verbose=True)
summary = chain.run(doc_pages[:10])  # first 10 document chunks
summary



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise and short summary of the following text capturing all important information:
Text: Certified Cloud Native Applied
Generative AI Engineer
Master the Future
Build Custom GPTs, AI Agents, Humanoids, and Fine-Tune LLMs
Version: 12.5 (Implementation and adoption starting from August 1, 2024)
Today's pivotal technological trends are Cloud Native (CN), Generative AI (GenAI),
and Physical AI. Cloud Native technology offers a scalable and dependable platform
for application operation, while AI equips these applications with intelligent,
human-like capabilities. Physical AI aims to bridge the gap between digital
intelligence and physical capability, creating systems that can understand and
interact with the world in a human-like manner. Our aim is to train you to excel as a
Cloud Native Applied Generative and Physical AI developer globally.
The C

"## Certified Cloud Native Applied Generative AI Engineer: Master the Future\n\nThis 21-month program trains you to become a leading-edge Cloud Native Generative and Physical AI developer. It equips you with skills to thrive in the future of AI and cloud computing, covering topics like custom GPTs, AI agents, and humanoid robotics.\n\n**Key Features:**\n\n* **Cutting-edge skills:** Develop in-demand skills using GenAI and cloud-native technologies.\n* **Industry-ready:** Prepare for global certifications and freelance opportunities in just six months.\n* **Future-proof your career:** Stay ahead of the curve in the rapidly evolving tech landscape.\n\n**Program Structure:**\n\n* **Foundation Level (3 Quarters):** Covers fundamentals of prompt engineering, Docker, GitHub, modern Python programming, applied Generative AI, building custom GPTs and multi AI agent systems, and cloud-native AI-powered microservices design, development, and deployment.\n* **Professional Level (4 Quarters):** Fo

## 2. Map Reduce Text Summarization
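Map-reduce summarization summarizes each chunk independently (map) and then summarizes the combined partial summaries (reduce), which suits documents too large for a single prompt. A minimal sketch of that control flow follows; `map_reduce_summarize`, `first_sentence`, and `join_lines` are hypothetical stand-ins for the LLM calls, not LangChain functions.

```python
def map_reduce_summarize(chunks, map_fn, reduce_fn):
    """Summarize each chunk on its own (map), then summarize the joined results (reduce)."""
    partial_summaries = [map_fn(chunk) for chunk in chunks]
    combined = "\n".join(partial_summaries)
    return reduce_fn(combined)


def first_sentence(text):
    """Stand-in for the per-chunk LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def join_lines(text):
    """Stand-in for the final LLM call: merge the partial summaries into one line."""
    return " ".join(text.splitlines())


# Hypothetical sample chunks
chunks = [
    "Cats sleep a lot. They also purr.",
    "Dogs bark loudly. They fetch sticks.",
]
result = map_reduce_summarize(chunks, first_sentence, join_lines)
print(result)
```

Because the map step treats chunks independently, it can run in parallel, at the cost of losing context that spans chunk boundaries.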

In [31]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=300)
text_chunks = splitter.split_documents(doc_pages)
text_chunks

[Document(metadata={'source': 'Panaversity Cloud Native Applied Generative AI Engineer (updated).pdf', 'file_path': 'Panaversity Cloud Native Applied Generative AI Engineer (updated).pdf', 'page': 0, 'total_pages': 36, 'format': 'PDF 1.4', 'title': 'Panaversity Cloud Native Applied Generative AI Engineer', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'Skia/PDF m128 Google Docs Renderer', 'creationDate': '', 'modDate': '', 'trapped': ''}, page_content="Certified Cloud Native Applied\nGenerative AI Engineer\nMaster the Future\nBuild Custom GPTs, AI Agents, Humanoids, and Fine-Tune LLMs\nVersion: 12.5 (Implementation and adoption starting from August 1, 2024)\nToday's pivotal technological trends are Cloud Native (CN), Generative AI (GenAI),\nand Physical AI. Cloud Native technology offers a scalable and dependable platform\nfor application operation, while AI equips these applications with intelligent,\nhuman-like capabilities. Physical AI aims to bridge the ga

In [41]:
len(text_chunks)

58

In [42]:
chunks_prompt = """
Please summarize the following text:
Text: {text}
Summary:
"""
map_prompt_template = PromptTemplate(input_variables=["text"], template=chunks_prompt)
map_prompt_template

PromptTemplate(input_variables=['text'], template='\nPlease summarize the following text:\nText: {text}\nSummary:\n')

In [43]:
final_summary_prompt = """
Provide a final summary of the entire document based on the partial summaries below.
Add a suitable title, then write the summary in proper Markdown format:

Summary: {text}
"""
final_prompt_template = PromptTemplate(input_variables=["text"], template=final_summary_prompt)

In [49]:
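# Switch to a Groq-hosted model for the remaining chains (requires GROQ_API_KEY in the environment)
from langchain_groq import ChatGroq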
llm = ChatGroq(model="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY"))

In [50]:
summary_chain = load_summarize_chain(llm=llm,
                                     chain_type="map_reduce",
                                     map_prompt=map_prompt_template,
                                     combine_prompt=final_prompt_template,
                                     verbose=True)
output = summary_chain.run(text_chunks[:20])
output



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Please summarize the following text:
Text: Certified Cloud Native Applied
Generative AI Engineer
Master the Future
Build Custom GPTs, AI Agents, Humanoids, and Fine-Tune LLMs
Version: 12.5 (Implementation and adoption starting from August 1, 2024)
Today's pivotal technological trends are Cloud Native (CN), Generative AI (GenAI),
and Physical AI. Cloud Native technology offers a scalable and dependable platform
for application operation, while AI equips these applications with intelligent,
human-like capabilities. Physical AI aims to bridge the gap between digital
intelligence and physical capability, creating systems that can understand and
interact with the world in a human-like manner. Our aim is to train you to excel as a
Cloud Native Applied Generative and Physical AI developer globally.
The Cloud Native Applied Generative AI Certification prog



In [48]:
from rich import print
print(output)

## 3. Refine Chain Text Summarization
The chunks are processed in sequence: the summary produced so far is passed as context together with the next chunk, and the LLM refines it.
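The refine loop can be sketched as follows, assuming two stand-in functions in place of the LLM calls. The names `refine_summarize`, `first_sentence`, and `append_first_sentence` are hypothetical and only illustrate the control flow, not how LangChain implements it.

```python
def refine_summarize(chunks, initial_fn, refine_fn):
    """Summarize the first chunk, then update that summary once per remaining chunk."""
    if not chunks:
        return ""
    summary = initial_fn(chunks[0])
    for chunk in chunks[1:]:
        # The running summary is the context for the next refinement step
        summary = refine_fn(summary, chunk)
    return summary


def first_sentence(text):
    """Stand-in for the initial LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def append_first_sentence(summary, chunk):
    """Stand-in for the refine LLM call: extend the summary with the chunk's first sentence."""
    return summary + " " + first_sentence(chunk)


# Hypothetical sample chunks
chunks = [
    "Step one is loading. It reads the PDF.",
    "Step two is splitting. It makes chunks.",
    "Step three is summarizing. It calls the model.",
]
result = refine_summarize(chunks, first_sentence, append_first_sentence)
print(result)
```

Unlike map-reduce, each step depends on the previous one, so the calls run sequentially.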

In [53]:
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    verbose=True
)
output = chain.run(text_chunks[:20])



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Certified Cloud Native Applied
Generative AI Engineer
Master the Future
Build Custom GPTs, AI Agents, Humanoids, and Fine-Tune LLMs
Version: 12.5 (Implementation and adoption starting from August 1, 2024)
Today's pivotal technological trends are Cloud Native (CN), Generative AI (GenAI),
and Physical AI. Cloud Native technology offers a scalable and dependable platform
for application operation, while AI equips these applications with intelligent,
human-like capabilities. Physical AI aims to bridge the gap between digital
intelligence and physical capability, creating systems that can understand and
interact with the world in a human-like manner. Our aim is to train you to excel as a
Cloud Native Applied Generative and Physical AI developer globally.
The Cloud Native Applied Generative AI Certification progra

In [54]:
print(output)