### ***Setup and the Wikipedia Tool***

In [22]:
from dotenv import load_dotenv
import os

# Load the .env file
load_dotenv()

True

In [None]:
api_key = os.getenv("GOOGLE_API_KEY")
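`os.getenv` returns `None` when the variable is missing, so a misconfigured `.env` file would only surface later, at the first API call. A minimal sketch of an earlier check follows; `require_env` is a hypothetical helper name, not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value
```

With this helper, the cell above could read `api_key = require_env("GOOGLE_API_KEY")`, which fails immediately if the key is absent.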

In [24]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_community.tools import WikipediaQueryRun

In [25]:
wikipedia_api = WikipediaAPIWrapper()

In [26]:
print(wikipedia_api.run("Javascript"))

Page: JavaScript
Summary: JavaScript ( ), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior.
Web browsers have a dedicated JavaScript engine that executes the client code. These engines are also utilized in some servers and a variety of apps. The most popular runtime system for non-browser usage is Node.js.
JavaScript is a high-level, often just-in-time–compiled language that conforms to the ECMAScript standard. It has dynamic typing, prototype-based object-orientation, and first-class functions. It is multi-paradigm, supporting event-driven, functional, and imperative programming styles. It has application programming interfaces (APIs) for working with text, dates, regular expressions, standard data structures, and the Document Object Model (DOM).
The ECMAScript standard does not include any input/output (I/O), such as networking, s

In [27]:
wikipedia_tool = WikipediaQueryRun(api_wrapper=wikipedia_api)

In [28]:
print(wikipedia_tool.name)
print(wikipedia_tool.description)
print(wikipedia_tool.args)

wikipedia
A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.
{'query': {'description': 'query to look up on wikipedia', 'title': 'Query', 'type': 'string'}}


In [29]:
res = wikipedia_tool.invoke({'query':'Python'})





In [30]:
print(res)

Page: Python (programming language)
Summary: Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically type-checked and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language, and he first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.
Python consistently ranks as one of the most popular programming languages, and it has gained widespread use in the machine learning community

### ***chain = prompt_template | chat | StrOutputParser() | wikipedia_tool***

In [31]:
TEMPLATE = '''
Turn the following user input into a Wikipedia search query. Don't answer the question:

{input}
'''

prompt_template = PromptTemplate.from_template(template=TEMPLATE)

In [32]:
chat = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
)

In [33]:
chain = prompt_template | chat | StrOutputParser() | wikipedia_tool

In [34]:
chain.invoke({'input': 'Who is the creator of the Python programming language?'})

'Page: Python (programming language)\nSummary: Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.\nPython is dynamically type-checked and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.\nGuido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language, and he first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.\nPython consistently ranks as one of the most popular programming languages, and it has gained widespread use in the machine learning comm
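The chain returns the raw Wikipedia summary rather than a direct answer because the tool is the last stage: each `|` hands one stage's output to the next. The plain-Python analog below illustrates that data flow only. `Step` and the four stand-in stages are hypothetical and call neither a model nor Wikipedia; this is not how LangChain implements its runnables.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a function and composes with `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # The composed step feeds this step's output into the next step.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Stand-ins for prompt template, chat model, output parser, and lookup tool.
fill_prompt = Step(lambda d: f"Turn into a search query: {d['input']}")
fake_model = Step(lambda prompt: {"content": "Python (programming language)"})
to_text = Step(lambda message: message["content"])
fake_lookup = Step(lambda query: f"Page: {query}")

pipeline = fill_prompt | fake_model | to_text | fake_lookup
result = pipeline.invoke({"input": "Who created Python?"})
print(result)
```

The final value is whatever the last stage produces, which is why the real chain ends with a page summary and not with the creator's name.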

### ***Creating a Retriever and Custom Tool***

In [35]:
from langchain_community.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

from langchain_core.tools import tool
from langchain_core.tools import create_retriever_tool

from platform import python_version

In [36]:
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
)

In [37]:
vectorStore = Chroma(persist_directory="./test",
                     embedding_function=embeddings)

retriever = vectorStore.as_retriever(search_type='mmr', search_kwargs={'k':3, 'lambda_mult': 0.7})

# Use an identifier-style name: tool-calling APIs typically reject names containing spaces.
retriever_tool = create_retriever_tool(retriever=retriever,
                                       name="introduction_to_data_and_data_science_course",
                                       description=("For any question regarding the Introduction to Data "
                                                    "and Data Science Course, you must use this tool."))

Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given
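The retriever above uses `search_type='mmr'` (maximal marginal relevance) with `k=3` and `lambda_mult=0.7`. The sketch below shows the general idea of MMR on toy vectors: each pick trades relevance to the query against similarity to results already chosen. `cosine` and `mmr_select` are illustrative names, and the code is a simplified assumption of the technique, not the vector store's actual implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 3,
    lambda_mult: float = 0.7,
) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_index, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


# Toy data: candidates 0 and 1 are near-duplicates, candidate 2 is different.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))
```

With `lambda_mult=1.0` the selection is purely by relevance, so the near-duplicate is picked second. Lowering it toward 0 penalizes redundancy more heavily and favors the dissimilar candidate.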


In [38]:
retriever_tool.args

{'query': {'description': 'query to look up in retriever',
  'title': 'Query',
  'type': 'string'}}

In [39]:
print(retriever_tool.invoke("Could you list the programming languages a data scientist should know?"))

rning a programming language. More importantly, it will be sufficient for your need to create quick and accurate analyses. However, if your theoretical preparation is strong enough, you will find yourself restricted by software. Knowing a programming language such as R and Python, gives you the freedom to create specific, ad-hoc tools for each project you are working on. Great!

When preparing your BI analysis, for instance, you will surely employ it. Okay. When it comes to data science, mentioning

nal data science. What about big data? Apart from R and Python, people working in this area are often proficient in other languages like Java or Scala. These two have not been developed specifically for doing statistical analyses, however they turn out to be very useful when combining data from multiple sources. All right! Let’s finish off with machine learning. When it comes to machine learning, we often deal with big data. Thus, we need a lot of computational power, and we can expect peop

### ***Custom Tool***

In [51]:
@tool
def get_python_version() -> str:
    '''Useful for questions regarding the version of Python currently used.'''
    return python_version()

In [52]:
get_python_version

StructuredTool(name='get_python_version', description='Useful for questions regarding the version of Python currently used.', args_schema=<class 'langchain_core.utils.pydantic.get_python_version'>, func=<function get_python_version at 0x000001E24F4D9C60>)

In [53]:
get_python_version.invoke({})

'3.13.5'
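The output above shows that the decorated function became a tool whose name is the function name and whose description is the docstring. The following plain-Python analog mimics just that mapping. `SimpleTool` and `simple_tool` are hypothetical names for illustration and leave out everything else the real decorator does, such as building an argument schema.

```python
from dataclasses import dataclass
from platform import python_version
from typing import Any, Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical, simplified stand-in for a tool object."""

    name: str
    description: str
    func: Callable[..., Any]

    def invoke(self, arguments: Dict[str, Any]) -> Any:
        # Arguments arrive as a dict and are passed on as keyword arguments.
        return self.func(**arguments)


def simple_tool(func: Callable[..., Any]) -> SimpleTool:
    """Build a SimpleTool from a function's name and docstring."""
    if not func.__doc__:
        raise ValueError("A docstring is required; it becomes the tool description.")
    return SimpleTool(name=func.__name__, description=func.__doc__.strip(), func=func)


@simple_tool
def get_python_version() -> str:
    """Useful for questions regarding the version of Python currently used."""
    return python_version()


print(get_python_version.name)
print(get_python_version.description)
```

Because the description is what a model reads when deciding whether to call a tool, the docstring is worth writing as an instruction about when the tool applies.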

### ***Adding a PDF to the Vector Store***

In [40]:
from langchain_community.document_loaders import PyPDFLoader
import copy

In [41]:
loader_pdf = PyPDFLoader("Introduction_to_Data_and_Data_Science.pdf")

In [42]:
data = loader_pdf.load()

In [43]:
pages_pdf_cut = copy.deepcopy(data)

In [44]:
# Collapse all runs of whitespace (including newlines) into single spaces.
for page in pages_pdf_cut:
    page.page_content = ' '.join(page.page_content.split())
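The `' '.join(text.split())` idiom used in the loop above works because `str.split()` with no arguments splits on any run of whitespace and discards leading and trailing whitespace. A small standalone illustration follows; the function name and sample string are made up for this example.

```python
def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    return " ".join(text.split())


sample = "Data   science\nis an\tinterdisciplinary  field.\n"
print(collapse_whitespace(sample))  # Data science is an interdisciplinary field.
```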

In [45]:
from langchain_text_splitters.character import CharacterTextSplitter

In [46]:
char_splitter = CharacterTextSplitter(separator="", chunk_size=500, chunk_overlap=50)

In [47]:
pages_char_split = char_splitter.split_documents(pages_pdf_cut)

In [48]:
len(pages_char_split)

21
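With an empty separator, the splitter effectively cuts each page into pieces of at most 500 characters, where each piece repeats up to 50 characters from the end of the previous one. The sketch below shows fixed-size chunking with overlap in plain Python. `chunk_text` is a hypothetical helper and a simplification of the idea, not the library's exact algorithm, which also handles separators and whitespace stripping.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters with a fixed overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= chunk_overlap < chunk_size.")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last chunk already reaches the end of the text.
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence cut at a chunk boundary still appears partly in both neighbors, at the cost of storing and embedding some text twice.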

In [49]:
vector = embeddings.embed_query(pages_char_split[3].page_content)
print(vector)

[0.02296156994998455, -0.013071710243821144, -0.04320036992430687, 0.03289970010519028, 0.06785649061203003, 0.005432088393718004, 8.605547191109508e-05, -0.014738018624484539, 0.00533744040876627, 0.05038222670555115, -0.0059264348819851875, 0.047278545796871185, -0.02962195686995983, 0.020221011713147163, -0.008495617657899857, -0.06722406297922134, 0.0013323277235031128, 0.027411961928009987, 0.020517367869615555, -0.04017411544919014, -0.001996435923501849, -0.010084262117743492, 0.00042604401824064553, -0.017614828422665596, 0.04454001039266586, -0.014855248853564262, 0.010033410973846912, -0.02636623941361904, 0.020376352593302727, 0.01428438350558281, -0.041970521211624146, 0.040634531527757645, -0.035447366535663605, 0.026638366281986237, 0.0031077375169843435, -0.02660328522324562, 0.0036227754317224026, 0.0403808057308197, -0.004819600842893124, 0.04152556136250496, 0.021272145211696625, -0.051691386848688126, -0.03196340799331665, 0.03862547129392624, -0.04072563350200653, -

In [None]:
# vectorstore = Chroma.from_documents(documents=pages_char_split,
#                                     embedding=embeddings,
#                                     persist_directory='./test')
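The cell above is commented out, presumably so that rerunning the notebook does not embed and store the same chunks a second time (an assumption; the notebook does not say). One alternative to commenting code in and out is to check whether the persist directory already has content. `index_exists` is a hypothetical helper using only the standard library.

```python
import os


def index_exists(persist_directory: str) -> bool:
    """Return True if the directory exists and contains at least one entry."""
    return os.path.isdir(persist_directory) and bool(os.listdir(persist_directory))
```

A cell could then call `Chroma.from_documents(...)` only when `index_exists('./test')` is false, and otherwise open the existing store as the retriever section does.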

