In [179]:
import os
import re
import string
import nltk
import PyPDF2 as pdf2
import textract

from pathlib import Path

from txtai.embeddings import Embeddings
from txtai.app import Application

In [180]:
config = Application.read("./app.yml")
embeddings = Embeddings(config["embeddings"], content=True, autoid="uuid5")

# Can also use the config programmatically
#embeddings = Embeddings({
#    "path": "sentence-transformers/all-MiniLM-L6-v2",
#    "backend": "qdrant.Qdrant",
#})


In [181]:
from txtai.pipeline import LLM

# Create LLM
llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")

You have loaded an AWQ model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [182]:
def stream(path):
    doc_id = 0
    for name in sorted(os.listdir(path)):
        fpath = os.path.join(path, name)

        # Only accept PDF documents (this check must run for every file in the loop)
        if name.lower().endswith(".pdf"):
            print(f"Indexing {fpath}")
            with open(fpath, "rb") as handle:
                pdfreader = pdf2.PdfReader(handle)
                for index, page in enumerate(pdfreader.pages):
                    # Guard against pages that yield no extractable text
                    page_text = page.extract_text() or ""
                    for paragraph in page_text.split("\n\n"):
                        # Collapse runs of whitespace into a single space
                        cleaned = re.sub(r"\s+", " ", paragraph)
                        if len(cleaned) > 1:
                            txt = {"text": cleaned, "page_num": index, "book_name": Path(fpath).name}
                            yield (doc_id, txt)
                            doc_id = doc_id + 1
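
The paragraph splitting and whitespace cleanup inside `stream` can be tried on its own, without a PDF. The following is a minimal sketch under stated assumptions: `split_paragraphs` and `sample` are hypothetical names introduced here for illustration, and the extra `strip()` call is an addition that the cell above does not make.

```python
import re

def split_paragraphs(page_text):
    """Split extracted page text on blank lines and normalize whitespace."""
    paragraphs = []
    for paragraph in page_text.split("\n\n"):
        # Collapse newlines, tabs and repeated spaces into single spaces
        # (strip() is an assumption added for this sketch)
        cleaned = re.sub(r"\s+", " ", paragraph).strip()
        # Skip empty or single-character fragments
        if len(cleaned) > 1:
            paragraphs.append(cleaned)
    return paragraphs

# Hypothetical page text standing in for page.extract_text()
sample = "First line\nof a paragraph.\n\n\n\nSecond   paragraph.\n\n \n\nx"
print(split_paragraphs(sample))
```

Fragments that are empty or a single character after cleanup are dropped, so only the two real paragraphs in `sample` survive.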



In [183]:
embeddings.index(stream("docs/growth"))

Indexing docs/growth/The_Art_of_Work.pdf


In [184]:
# Prompt helpers: answer with and without retrieved context
def no_rag(question):
  prompt = f"""<|im_start|>system
You are a friendly assistant. You answer questions from users.<|im_end|>
<|im_start|>user
{question} <|im_end|>
<|im_start|>assistant
"""
  return llm(prompt, maxlength=4096, pad_token_id=32000)
    
def prompt_llm(question, text):
  # Prompt lines stay flush left so no stray indentation ends up inside the ChatML prompt
  prompt = f"""<|im_start|>system
You are a friendly assistant. You answer questions from users.<|im_end|>
<|im_start|>user
Answer the following question using only the context below. Only include information specifically discussed.

question: {question}
context: {text} <|im_end|>
<|im_start|>assistant
"""

  return llm(prompt, maxlength=4096, pad_token_id=32000)

def context(ann_embeddings):
  # Join the retrieved passages into a single newline-separated context string
  return "\n".join(x["text"] for x in ann_embeddings)

def rag(prompt):
  # Retrieve the nearest passages, then answer using only that context
  ann_embeddings = embeddings.search(prompt)
  return prompt_llm(prompt, context(ann_embeddings)), ann_embeddings
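
To see what the model actually receives in the RAG case, the context assembly and ChatML prompt can be built and printed without loading the LLM or the index. This is a rough sketch, not the notebook's own code: `build_context`, `build_prompt` and `fake_results` are hypothetical names, and the result dicts are assumed to carry `id`, `text` and `score` keys, as search results do when content storage is enabled.

```python
def build_context(results):
    """Join the text of each search result into one newline-separated string."""
    return "\n".join(r["text"] for r in results)

def build_prompt(question, context_text):
    """Wrap a question and its retrieved context in a ChatML-style prompt."""
    return (
        "<|im_start|>system\n"
        "You are a friendly assistant. You answer questions from users.<|im_end|>\n"
        "<|im_start|>user\n"
        "Answer the following question using only the context below. "
        "Only include information specifically discussed.\n\n"
        f"question: {question}\n"
        f"context: {context_text} <|im_end|>\n"
        "<|im_start|>assistant\n"
    )

# Hypothetical search results (assumed shape: id, text, score)
fake_results = [
    {"id": "0", "text": "Listen to your life.", "score": 0.71},
    {"id": "1", "text": "Find a mentor.", "score": 0.64},
]

prompt_text = build_prompt("How do I find my calling?", build_context(fake_results))
print(prompt_text)
```

Printing the prompt before sending it to the model makes it easy to check that the special tokens start at the beginning of a line and that the retrieved passages appear in the order returned by the search.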

In [185]:
no_rag_result = no_rag("How to discover what you were meant to do")
print(no_rag_result)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Discovering what you were meant to do can be a life-changing experience. Here are some steps to help you find your purpose:

1. Reflect on your interests and passions: Think about the activities, hobbies, and subjects that genuinely excite you. These interests can give you clues about your purpose.

2. Consider your strengths and talents: Assess your skills and abilities, and think about how you can use them to make a positive impact on the world.

3. Analyze your past experiences: Look back at your life and identify patterns or events that have shaped your values and interests. These experiences can provide insights into your purpose.

4. Seek inspiration: Read books, watch movies, or listen to podcasts about people who have found their purpose. This can help you gain perspective and inspiration.

5. Set goals and take action: Once you have a general idea of what you want to do, set specific, achievable goals and take steps towards achieving them.

6. Network and connect with others: 

In [186]:
(result, citations) = rag("How to discover what you were meant to do")
print(result)

# Look up the book name and page number for each retrieved passage
references = ""
for rid, c in enumerate(citations, start=1):
    reference = embeddings.search(f"""select text, page_num, book_name from txtai where id = {c["id"]}""")[0]
    references += f"""\nREF {rid}: book_name: {reference["book_name"]}, page_num: {reference["page_num"]}, text: "{reference["text"]}" \n"""
print(f"CITATIONS: {references}")

1. Awareness: To discover what you were meant to do, start by listening to your life and being aware of the signs it is giving you. Pay attention to your interests, passions, and experiences.

2. Apprenticeship: Surround yourself with a supportive community and seek out mentors or accidental apprenticeships. Learn from others and allow your life to prepare you for your calling.

3. Practice: Be open to learning new skills and embrace the challenges that come with real practice. Look for inspiration and guidance along the way.

4. Discovery: Understand that discovering your calling is a process and doesn't happen in a single moment. Be patient and take gradual steps towards uncovering your purpose.

5. Profession: Embrace failure as a learning opportunity and use it to grow and pivot around obstacles. Be open to change and adapt as you pursue your calling.

Remember, finding your calling takes work and requires a lifetime of self-discovery and growth. Be open to change and embrace the j