# This notebook serves as a testing site

In [4]:
# !pip install langchain --upgrade
# # Version: 0.0.164
#!pip install tiktoken
# !pip install pypdf
#!pip install python-dotenv

In [1]:
# PDF Loaders. If unstructured gives you a hard time, try PyPDFLoader
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader
import openai
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
from dotenv import load_dotenv

### Load your data

In [2]:
loader = PyPDFLoader("data/Introduction_to_algorithms-3rd Edition.pdf")

## Other options for loaders 
# loader = UnstructuredPDFLoader("../data/field-guide-to-data-science.pdf")
# loader = OnlinePDFLoader("https://wolfpaulus.com/wp-content/uploads/2017/05/field-guide-to-data-science.pdf")

In [3]:
data = loader.load()

In [4]:
# Note: If you're using PyPDFLoader then it will split by page for you already
print (f'You have {len(data)} document(s) in your data')
print (f'There are {len(data[6].page_content)} characters in your document')

You have 1313 document(s) in your data
There are 983 characters in your document


### Chunk your data up into smaller documents

In [5]:
# Note: If you're using PyPDFLoader then we'll be splitting for the 2nd time.
# This is optional, test out on your own data.

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
texts = text_splitter.split_documents(data)

In [6]:
print (f'Now you have {len(texts)} documents')

Now you have 1919 documents


### Create embeddings of your documents to get ready for semantic search

In [11]:
# !pip3 install pinecone-client

In [None]:
from langchain.vectorstores import Chroma, Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
import pinecone

In [39]:
load_dotenv()

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY')
PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')
PINECONE_API_ENV = os.environ.get('PINECONE_API_ENV') 
openai_api_key = os.getenv("OPENAI_API_KEY")
pinecone_api_key = os.getenv("PINECONE_API_KEY")
pinecone_index = os.getenv("PINECONE_INDEX_NAME")
pinecone_env = os.getenv("PINECONE_API_ENV")
pinecone_namespace = os.getenv("PINECONE_NAMESPACE")

In [9]:
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [26]:
# initialize pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)
index_name = "langchainalgoedu" # put in the name of your pinecone index here

In [38]:
pinecone.list_indexes()

['langchainalgoedu']

In [41]:
index = pinecone.Index(pinecone_index)


In [42]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.1,
 'namespaces': {'': {'vector_count': 2725}},
 'total_vector_count': 2725}

In [None]:
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)

In [44]:
docsearch = Pinecone.from_existing_index(index_name=index_name, embedding=embeddings)

In [46]:
query = "What is Kruskal algorithm?"
docs = docsearch.similarity_search(query)

In [20]:
# Here's an example of the first document that was returned
#print(docs[0].page_content[:450])

### Query those docs to get your answer back

In [30]:
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

In [47]:
llm = OpenAI(temperature = 0.2, max_tokens = 200, top_p = 0.2, frequency_penalty = 0.8, presence_penalty = 0.1, openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type="stuff")

In [48]:
topic = "Kruskal algorithm"

## 1. Ask random questions

In [49]:
query = f"Ask me 5 questions about {topic}"
docs = docsearch.similarity_search(query)

In [50]:
response_1 = chain.run(input_documents=docs, question=query)

In [51]:
print(response_1)


1. What is Kruskal's algorithm? 
2. How does Kruskal's algorithm work? 
3. What is the purpose of Kruskal's algorithm? 
4. What are the steps involved in implementing Kruskal's algorithm? 
5. How can Kruskal's algorithm be used to solve a graph problem?


## 2. Key Points for studying

In [19]:
# Create study notes Model
response_2 = openai.Completion.create(
  model="text-davinci-003",
  prompt=f"What are main key points I should know when studying {topic}?",
  temperature=0.2,
  max_tokens=150,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty=0.0
)

In [20]:
print(response_2.choices[0].text)



1. Kruskal's algorithm is a greedy algorithm used to find the minimum spanning tree of a graph.
2. It works by sorting the edges of the graph by weight, then selecting the edges with the lowest weight until a spanning tree is formed.
3. The algorithm is used to find the most efficient way to connect all the vertices of a graph with the minimum total weight of the edges.
4. Kruskal's algorithm is used in network design problems, such as finding the most efficient way to connect computers in a network.
5. The algorithm is also used in image segmentation, clustering, and other applications.
6. Kruskal's algorithm is a good choice when the graph is


## 3. Give Pseudocode -- not work yet

In [52]:
query_3 = f"Give me a pseudocode of how {topic} works"
response_3 = chain.run(input_documents=docs, question=query_3)
print(response_3)   

 The pseudocode for Kruskal's algorithm is as follows: 
1. Initialize the set A to the empty set and create jVj trees, one containing each vertex. 
2. Take edges in non-decreasing order by weight. 
3. For each edge (u, v): 
    a) If FIND-SET(u) ≠ FIND-SET(v): 
        i) Add edge (u, v) to A 
        ii) UNION(u, v). 
4. Return A


## 4. Interview questions about algorithm

In [22]:
topic = "Kruskal algorithm"

In [23]:
response_4 = openai.Completion.create(
  model="text-davinci-003",
  prompt="Create a list of 8 questions for my interview with a technical recruiter about {topic}:",
  temperature=0.5,
  max_tokens=150,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty=0.0
)

In [24]:
print(response_4.choices[0].text)



1. What experience do you have in recruiting technical talent?
2. What strategies do you use to source and attract technical talent?
3. How have you successfully matched technical talent with the right organizations?
4. What challenges have you encountered when recruiting technical talent?
5. What do you consider the most important qualities to look for in a technical candidate?
6. How do you ensure that technical candidates are well-suited to the organization's culture?
7. What methods do you use to evaluate technical candidates?
8. How do you keep up to date with the latest trends and developments in the technical recruiting field?
