## LangChain Document Loader

In [None]:
import os

from dotenv import load_dotenv

load_dotenv()

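GROQ_API_KEY = os.getenv("GROQ_API_KEY")

# Fail early with a clear message instead of a TypeError on the assignment below
if not GROQ_API_KEY:
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")

os.environ['GROQ_API_KEY'] = GROQ_API_KEY  # do not echo the key in notebook output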

In [3]:
from langchain_groq import ChatGroq
# ChatGroq reads GROQ_API_KEY from the environment
llm_groq = ChatGroq(
    model='llama-3.1-8b-instant',
    temperature=0.1,
)

In [4]:
import urllib.request
url = 'https://raw.githubusercontent.com/venkatareddykonasani/Datasets/master/Client_Mail/Mail.txt'
filename = "Mail.txt"
urllib.request.urlretrieve(url, filename)

('Mail.txt', <http.client.HTTPMessage at 0x24813db2b40>)

In [6]:
from langchain_community.document_loaders import TextLoader
loader = TextLoader(filename)
loaded_text = loader.load()
print(loaded_text[0].page_content)

Subject: Urgent Attention Required: Project Delay and Critical Issues
Ticket-id: UAT1966286RT5
Dear Alex,

I hope this message finds you well. I am writing to express my concerns regarding the recent delivery of our project, which, unfortunately, has not met our agreed-upon timelines and quality expectations. As you are aware, the project delivery was delayed by two weeks, which has significantly impacted our operational planning and has had a ripple effect on our commitments to our stakeholders.

Upon reviewing the delivered project, we have identified three major bugs that are critical to the functionality and overall performance of the software. These bugs are as follows:

1. **Data Syncing Issue:** There is a significant problem with the data syncing mechanism, leading to loss of data between our servers and the client application.
2. **User Authentication Flaw:** The user authentication process has a serious vulnerability that could potentially compromise user security and privacy

In [7]:
template = """
Read the mail below and extract the following details:
- Name of the person
- Main point of the mail
- Ticket ID

mail_context - {input_data}

"""

In [8]:
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate(
    input_variables=['input_data'],
    template=template
)

chain = prompt | llm_groq

# Pass the mail text itself rather than the list of Document objects
x = chain.invoke({'input_data': loaded_text[0].page_content})
print(x.content)

Here are the extracted details from the mail:

1. **Name of the person**: 
   - Alex (the recipient of the mail)
   - John Doe (the sender of the mail)

2. **Main point of the mail**: 
   - The sender, John Doe, is expressing concerns about the recent delivery of a project that has not met the agreed-upon timelines and quality expectations. He is requesting immediate attention to address three critical bugs that have been identified in the project.

3. **Ticket ID**: 
   - UAT1966286RT5
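
Fields with a fixed layout, such as the ticket ID and the salutation, can also be pulled out without the model, which gives a cheap cross-check on the LLM answer. The following is a minimal sketch using only the standard library; `extract_fields` and the shortened `mail_text` are illustrative names, and the patterns assume the `Ticket-id:` and `Dear <name>,` lines look exactly as they do in the mail above.

```python
import re

# Shortened copy of the mail header shown above; in the notebook,
# loaded_text[0].page_content would be used instead (assumption).
mail_text = """Subject: Urgent Attention Required: Project Delay and Critical Issues
Ticket-id: UAT1966286RT5
Dear Alex,
"""


def extract_fields(text):
    """Pull the ticket id and the salutation name out of a mail with plain regexes."""
    ticket = re.search(r"^Ticket-id:\s*(\S+)", text, flags=re.MULTILINE)
    recipient = re.search(r"^Dear\s+([^,\n]+),", text, flags=re.MULTILINE)
    return {
        "ticket_id": ticket.group(1) if ticket else None,
        "recipient": recipient.group(1).strip() if recipient else None,
    }


fields = extract_fields(mail_text)
print(fields)
```

If the regex result and the model's answer disagree, that is a signal to inspect the mail by hand; free-form fields such as the main point still need the LLM.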


## Wikipedia Loader

In [None]:
from langchain_community.document_loaders import WikipediaLoader

loader = WikipediaLoader(query='Call Of Duty',load_max_docs=4)

wiki_loaded_text = loader.load()

print(wiki_loaded_text[0].page_content)

Call of Duty is a first-person shooter military video game series and media franchise published by Activision, starting in 2003. The games were first developed by Infinity Ward, then by Treyarch and Sledgehammer Games. Several spin-off and handheld games were made by other developers. The most recent, Call of Duty: Black Ops 6, was released on October 25, 2024.
The series originally focused on a World War II setting, with Infinity Ward developing Call of Duty (2003) and Call of Duty 2 (2005) and Treyarch developing Call of Duty 3 (2006). Infinity Ward's Call of Duty 4: Modern Warfare (2007) introduced a modern setting and proved to be the breakthrough title for the series, creating the Modern Warfare sub-series; a Modern Warfare remastered version released in 2016. Two other entries, Modern Warfare 2 (2009) and Modern Warfare 3 (2011), were made. The sub-series received a reboot with Modern Warfare in 2019, Modern Warfare II in 2022, and Modern Warfare III in 2023. Infinity Ward has al

## YouTube Loader

In [18]:
from langchain_community.document_loaders import YoutubeLoader

url = 'https://www.youtube.com/watch?v=OfpXgjP4AOs'

loader = YoutubeLoader.from_youtube_url(url)

youtube_file_data = loader.load()

print(youtube_file_data[0].page_content)

[Music] Superman, [Music] the most powerful being on planet Earth. [Music] We finally meet now as planned. I'll destroy you. And of course, that reporter you always do interviews with who raised you as a child. I'll kill them, too. [Music] No matter what you do to me, loser, your plans will work. Wrap it up. Good luck with that. Make a move, big blue. [Music] They chose him. Let them die. [Music] Hey, quit messing around. I'm not messing around. I'm doing important stuff. [Music] Hey buddy, eyes up here. [Music]


In [19]:
template = """
Based on the YouTube video transcript below, extract the following details:
- Summary of the video
- Title of the video
- Description

video_context = {input_data}
"""

In [20]:
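from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    template=template,
    input_variables=['input_data']  # variable names only, not the data itself
)

chain = prompt | llm_groq

# Pass the transcript text rather than the list of Document objects
x = chain.invoke({'input_data': youtube_file_data[0].page_content})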

print(x.content)

Based on the provided video context, here are the extracted details:

- **Summary of the video**: The video appears to be a dramatic and intense scene from a superhero movie or TV show, where the villain (likely Lex Luthor) is threatening Superman and a reporter who raised him as a child. The tone is aggressive and menacing.

- **Title of the video**: Unfortunately, the title of the video is not provided in the given context. However, based on the content, a possible title could be "Lex Luthor Threatens Superman and Lois Lane".

- **Description**: Unfortunately, the description of the video is not provided in the given context. However, based on the content, a possible description could be "Lex Luthor vows to destroy Superman and Lois Lane in this intense confrontation".
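
The transcript above is interleaved with caption tags such as `[Music]`, which carry no content and use up prompt tokens. One option is to strip them before building the prompt. This is a small sketch with the standard library only; `clean_transcript` is a hypothetical helper, and it assumes every caption tag is wrapped in square brackets as in the transcript shown.

```python
import re


def clean_transcript(text):
    """Remove bracketed caption tags such as [Music] and collapse whitespace."""
    without_tags = re.sub(r"\[[^\]]*\]", " ", text)
    return re.sub(r"\s+", " ", without_tags).strip()


# Opening words of the transcript printed above
sample = "[Music] Superman, [Music] the most powerful being on planet Earth. [Music]"
print(clean_transcript(sample))
```

In the notebook, the cleaned string would be passed as `input_data` in place of the raw transcript. Note that this removes any bracketed text, so it is only appropriate when brackets are used for caption tags alone.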


## CSV Loader


In [21]:
from langchain_community.document_loaders import CSVLoader

import urllib.request
url = 'https://raw.githubusercontent.com/venkatareddykonasani/Datasets/master/Leads.csv'
filename = 'Leads.csv'
urllib.request.urlretrieve(url, filename)

('Leads.csv', <http.client.HTTPMessage at 0x24825671b80>)

In [22]:
loader = CSVLoader(filename)
loaded_csv = loader.load()
loaded_csv


[Document(metadata={'source': 'Leads.csv', 'row': 0}, page_content='Week_num: 1\nLeads: 756.48\nPromotion_Budget: 51735.6'),
 Document(metadata={'source': 'Leads.csv', 'row': 1}, page_content='Week_num: 2\nLeads: 878.72\nPromotion_Budget: 64608.6'),
 Document(metadata={'source': 'Leads.csv', 'row': 2}, page_content='Week_num: 3\nLeads: 857.92\nPromotion_Budget: 63833'),
 Document(metadata={'source': 'Leads.csv', 'row': 3}, page_content='Week_num: 4\nLeads: 715.84\nPromotion_Budget: 50649.2'),
 Document(metadata={'source': 'Leads.csv', 'row': 4}, page_content='Week_num: 5\nLeads: 772.48\nPromotion_Budget: 60965.8'),
 Document(metadata={'source': 'Leads.csv', 'row': 5}, page_content='Week_num: 6\nLeads: 714.88\nPromotion_Budget: 47608.4'),
 Document(metadata={'source': 'Leads.csv', 'row': 6}, page_content='Week_num: 7\nLeads: 815.04\nPromotion_Budget: 63597.8'),
 Document(metadata={'source': 'Leads.csv', 'row': 7}, page_content='Week_num: 8\nLeads: 691.84\nPromotion_Budget: 49515.2'),
 D

In [24]:
template = """ 
Read the following data and report the values below. Do not write any code; analyze the data directly:
- Maximum of Promotion_Budget
- Maximum of Leads
- Average of Promotion_Budget
- Average of Leads

Data - {input_data}

"""

prompt = PromptTemplate(
    input_variables=['input_data'],
    template=template
)

chain = prompt | llm_groq

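# Send only the row text, not the repr of the Document objects
x = chain.invoke({'input_data': "\n\n".join(doc.page_content for doc in loaded_csv)})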

print(x.content)

Based on the provided data, here are the extracted information:

- Maximum of the Promotion_Budget: 110825.4 (Week 77)
- Maximum of the Leads: 1624.56 (Week 77)
- Average of the Promotion_Budget: The average of the Promotion_Budget is calculated by summing up all the Promotion_Budget values and dividing by the total number of weeks. The sum of the Promotion_Budget values is 4,333,111.4 and the total number of weeks is 80. Therefore, the average of the Promotion_Budget is 54,164.64.
- Average of the Leads: The average of the Leads is calculated by summing up all the Leads values and dividing by the total number of weeks. The sum of the Leads values is 63,511.28 and the total number of weeks is 80. Therefore, the average of the Leads is 794.64.
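
The model's arithmetic should not be taken at face value: dividing the sums it quotes by 80 gives roughly 54,163.89 and 793.89, which do not match the averages it reports. Aggregates like these are better computed directly. Below is a minimal sketch with the standard library; `summarize` and `SAMPLE_CSV` are illustrative names, the sample holds only the first eight rows visible in the loader output above, and the header line is an assumption inferred from the field names in `page_content`.

```python
import csv
import io

# First eight rows of Leads.csv as shown in the loader output above.
# In the notebook, open("Leads.csv", newline="") would replace this string.
SAMPLE_CSV = """Week_num,Leads,Promotion_Budget
1,756.48,51735.6
2,878.72,64608.6
3,857.92,63833
4,715.84,50649.2
5,772.48,60965.8
6,714.88,47608.4
7,815.04,63597.8
8,691.84,49515.2
"""


def summarize(rows, columns=("Leads", "Promotion_Budget")):
    """Return the max and mean of each numeric column."""
    rows = list(rows)
    summary = {}
    for column in columns:
        values = [float(row[column]) for row in rows]
        summary[column] = {"max": max(values), "mean": sum(values) / len(values)}
    return summary


stats = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column, result in stats.items():
    print(f"{column}: max={result['max']:.2f}, mean={result['mean']:.2f}")
```

Running the same function over the full file gives reference values to compare against the LLM output; the model is then better used for interpreting the numbers than for producing them.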
