## Project goal

Extract valuable insights from a predefined text document. At a minimum, the extracted information should include the basic premise of the text, its outline, and the main topics it covers. Additional information may be extracted from the main content.


The text analysis will focus on the book: "[Man's Search for Meaning](https://www.amazon.com/Mans-Search-Meaning-Viktor-Frankl-ebook/dp/B009U9S6FI)" by [Viktor Frankl](https://en.wikipedia.org/wiki/Viktor_Frankl).

This project aims to provide both contextual and technical knowledge. The meaning of life is one of the fundamental questions of existence. Finding a solution to such a complex topic isn't trivial and goes beyond the scope of a single data project. Discovering meaningful insights, however, can be informative even without providing a fixed answer to a broad philosophical problem.

At its core, this project uses [LangChain](https://github.com/hwchase17/langchain) and OpenAI's GPT-3.5. The technical focus is on setting up LangChain model calls to obtain meaningful and reasonable answers to the questions and tasks provided by the user.

The project also uses [Pinecone](https://www.pinecone.io/), a vector database well suited to semantic text search.

### Libraries installation

The notebook uses LangChain, which requires some external libraries to be installed.

In [None]:
!pip install langchain
!pip install unstructured
!pip install "unstructured[local-inference]"
!apt-get install poppler-utils

!pip install openai
!pip install pinecone-client

In [None]:
!apt install tesseract-ocr
!apt install libtesseract-dev

!pip install chromadb

In [None]:
!pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"

### Libraries initialization

In [1]:
import os
import json

import pinecone

import langchain
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.vectorstores import Chroma, Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings

from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

from tqdm.autonotebook import tqdm


### Credentials set-up

In [2]:
def load_api_keys(credentials_file_name: str = 'credentials.json') -> tuple:
    '''Load API keys from file

    Arguments:
        credentials_file_name: name of file containing credentials

    Returns:
        A tuple containing OpenAI API key, Pinecone API key and Pinecone API
        environment name

    Raises:
        FileNotFoundError: if the credentials file does not exist

    '''

    if not os.path.exists(credentials_file_name):
        # raise instead of returning a message, so the tuple unpacking
        # at the call site never receives a plain string
        raise FileNotFoundError(f'No file {credentials_file_name}')

    # open credentials file
    with open(credentials_file_name) as f:
        content = json.load(f)

    # load api keys
    OPENAI_API_KEY = content['OPENAI_API_KEY']
    PINECONE_API_KEY = content['PINECONE_API_KEY']
    PINECONE_API_ENV = content['PINECONE_API_ENV']

    return OPENAI_API_KEY, PINECONE_API_KEY, PINECONE_API_ENV


In [3]:
OPENAI_API_KEY, PINECONE_API_KEY, PINECONE_API_ENV = load_api_keys('credentials.json')

In [4]:
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
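
The loader above expects a JSON file with three keys. As a minimal sketch of that file's shape, the snippet below writes one with placeholder values to a temporary location and reads it back. The key names come from `load_api_keys`; the placeholder values and the temporary path are assumptions made for illustration only.

```python
import json
import os
import tempfile

# Placeholder values only; real keys come from the OpenAI and Pinecone accounts.
example_credentials = {
    "OPENAI_API_KEY": "sk-placeholder",
    "PINECONE_API_KEY": "pc-placeholder",
    "PINECONE_API_ENV": "env-placeholder",
}

# Write to a temporary directory so no real credentials file is overwritten.
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, "credentials.json")
with open(example_path, "w") as f:
    json.dump(example_credentials, f, indent=2)

# Read the file back the same way load_api_keys does.
with open(example_path) as f:
    loaded = json.load(f)

print(sorted(loaded))
```

A file like this should stay out of version control, since it holds secrets.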

### Book load

The book itself is stored on a hard drive. Due to its size, it can be split into smaller chunks.

In [5]:
loader = UnstructuredPDFLoader('/content/input/book.pdf')

In [None]:
book = loader.load()

In [None]:
len(book)

1

In [None]:
book[0].page_content[:500]

"be Bee oer) il \n\nRevised and Updated  \n\nInternationally renowned psychiatrist.Viktor E. Frankl,endured years of unspeakablehorror in Nazi death camps. During,and partly because of his suffering, Dr. Frankldeveloped a revolutionary approach topsychotherapy known as logotherapy. At thecore of his theory is the belief thatman's primary motivational force is hissearch for meaning.MAN'S SEARCH FOR MEANING is morethan the story of Viktor E. Frankl's triumph:it is a remarkable blend of science andhuman"

### Book split

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 0)
texts = text_splitter.split_documents(book)

In [None]:
len(texts)

391

In [None]:
texts[12]

Document(page_content='("Logotherapy in a Nutshell") boils down, as it were, to the lesson one may distill from the first part, the autobiographical account ("Experiences in a Concen- tration Camp"), whereas Part One serves as the exis- tential validation of my theories. Thus, both parts mutually support their credibility. I had none of this in mind when I wrote the book in 1945. And I did so within nine successive days and with the firm determination that the book would be published anonymously. In fact, the first printing of the original German version does not show my name on the cover, though at the last moment, just before the book\'s initial publication, I did finally give in to my friends who had urged me to let it be published with my name at least on the title page. At first, however, it had been written with the absolute conviction that, as an anonymous opus, it could never earn its author literary fame. I had wanted simply to convey to the reader by way of a concrete example
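
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-window splitting. It is not the algorithm `RecursiveCharacterTextSplitter` uses, which prefers to break on separators such as paragraph breaks, newlines and spaces; the `split_fixed` helper is a hypothetical name introduced only for this illustration.

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "word " * 500  # 2500 characters of stand-in text
chunks = split_fixed(sample, chunk_size=1000, chunk_overlap=0)
print(len(chunks), [len(c) for c in chunks])
```

With no overlap, the chunks concatenate back to the original text. A non-zero overlap repeats the tail of each chunk at the head of the next, which can help when a sentence straddles a boundary.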

### Embeddings creation

OpenAI embeddings will be created for the book content and later upserted to the Pinecone database.

In [None]:
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

### Pinecone initialization and embeddings dump

In [None]:
pinecone.init(
    api_key = PINECONE_API_KEY,
    environment=PINECONE_API_ENV
)

index_name = 'langchain'

In [None]:
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name = index_name)
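
Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings lie closest to the query embedding. The sketch below illustrates that idea in memory with cosine similarity. The three-dimensional vectors and chunk names are invented for the example (real embeddings have far more dimensions), and a hosted index may use a different metric or an approximate search.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up vectors standing in for chunk embeddings.
toy_index = {
    "chunk-camp-life": [0.9, 0.1, 0.0],
    "chunk-logotherapy": [0.1, 0.9, 0.2],
    "chunk-preface": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # stand-in for an embedded question

results = top_k(query_vector, toy_index, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```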

### Chain building and ask function creation

In [None]:
llm = OpenAI(temperature = 0, openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type = 'stuff')
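
The `'stuff'` chain type places all retrieved documents into a single prompt together with the question. The sketch below shows that idea in plain Python. The `build_stuff_prompt` helper and the prompt wording are hypothetical and do not reproduce LangChain's actual template.

```python
from typing import List


def build_stuff_prompt(question: str, documents: List[str]) -> str:
    """Join every document into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Stand-in snippets; in the notebook these would be the retrieved book chunks.
retrieved = [
    "First retrieved chunk of the book.",
    "Second retrieved chunk of the book.",
]
prompt = build_stuff_prompt("What is logotherapy?", retrieved)
print(prompt)
```

Because everything goes into one prompt, the combined length of the retrieved chunks has to fit within the model's context window, which is one reason the chunk size chosen during splitting matters.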

In [None]:
def ask(query: str, chain = chain) -> str:
    # retrieve the chunks most similar to the query
    docs = docsearch.similarity_search(query, include_metadata = True)
    # answer the question using the retrieved chunks as context
    return chain.run(input_documents = docs, question = query)

### Asking questions

In [None]:
ask('What was the life of prisoners of concentration camps like?')

---

In [None]:
llm = OpenAI(temperature=0.9)

In [None]:
llm('What is the meaning of life according to Viktor Frankl?')

"\n\nViktor Frankl believed that the meaning of life is found in every individual's unique search for meaning, which is largely determined by the person's attitude and individual interpretations of a set of circumstances. He argued that humans have a natural instinct to search for meaning and purpose, and that this was the primary motivation in life. He believed that in order to find meaning, we must invest our time and energy in something greater than ourselves, such as a meaningful job, a relationship with God, or a meaningful cause. He also believed that meaning can be found even in difficult circumstances and that each person has the potential to use the worst circumstances to fashion the best possible outcome."