In [None]:
! git clone https://github.com/steelblu/DFOCUS_QA.git

Cloning into 'DFOCUS_QA'...
remote: Enumerating objects: 31, done.
remote: Counting objects: 100% (31/31), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 31 (delta 11), reused 13 (delta 3), pack-reused 0
Receiving objects: 100% (31/31), 650.25 KiB | 2.08 MiB/s, done.
Resolving deltas: 100% (11/11), done.


# Install the dependencies
Run the code below to install the dependencies our functions need.

In [None]:
!pip install llama-index
!pip install langchain
!pip install transformers

Collecting llama-index
  Downloading llama_index-0.10.38-py3-none-any.whl (6.8 kB)
Collecting llama-index-agent-openai<0.3.0,>=0.1.4 (from llama-index)
  Downloading llama_index_agent_openai-0.2.5-py3-none-any.whl (13 kB)
Collecting llama-index-cli<0.2.0,>=0.1.2 (from llama-index)
  Downloading llama_index_cli-0.1.12-py3-none-any.whl (26 kB)
Collecting llama-index-core<0.11.0,>=0.10.38 (from llama-index)
  Downloading llama_index_core-0.10.38.post1-py3-none-any.whl (15.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.4/15.4 MB 45.6 MB/s eta 0:00:00
Collecting llama-index-embeddings-openai<0.2.0,>=0.1.5 (from llama-index)
  Downloading llama_index_embeddings_openai-0.1.10-py3-none-any.whl (6.2 kB)
Collecting llama-index-indices-managed-llama-cloud<0.2.0,>=0.1.2 (from llama-index)
  Downloading llama_index_indices_managed_llama_cloud-0.1.6-py3-none-any.whl (6.7 kB)
Collecting llama-index-legacy<0.10.0,>=0.9.48 (from llama-index)
  Downl
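The pip output shows `llama-index` 0.10.38 being installed. Release 0.10 moved the core classes under `llama_index.core`, and classes used in older tutorials such as `GPTSimpleVectorIndex` are no longer available, which is exactly the kind of mismatch that produces an `ImportError`. A quick way to see which release you actually have is a version check like the sketch below. It uses only the standard library; the helper names `parse_version` and `installed_version` are hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.10.38' or '0.10.38.post1' into (0, 10, 38), keeping leading numeric parts only."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post1' or '0rc1'
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package: str) -> tuple[int, ...] | None:
    """Return the installed version of `package` as a tuple, or None if it is not installed."""
    try:
        return parse_version(version(package))
    except PackageNotFoundError:
        return None


llama_version = installed_version("llama-index")
if llama_version is None:
    print("llama-index is not installed")
elif llama_version >= (0, 10):
    print(f"llama-index {llama_version}: classes live under llama_index.core")
else:
    print(f"llama-index {llama_version}: pre-0.10 release with different import paths")
```

Run it in a new cell once the install has finished; the printed tuple tells you which import style the rest of the notebook needs.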

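# Define the functions
The following code defines the two functions we need: `construct_index` builds an index from a folder of documents and saves it to disk, and `ask_ai` loads that index and answers questions in a loop.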

In [None]:
# Written against llama-index 0.10.x, where the core classes live in `llama_index.core`
# (GPTSimpleVectorIndex, LLMPredictor and save_to_disk no longer exist there).
from IPython.display import Markdown, display
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.llms.openai import OpenAI

PERSIST_DIR = "storage"  # folder where the index is saved between sessions


def construct_index(directory_path):
    # define LLM; text-davinci-003 has been retired by OpenAI, so a chat model is used instead
    Settings.llm = OpenAI(temperature=0.7, model="gpt-3.5-turbo", max_tokens=300)

    # set maximum input size (the model's context window)
    Settings.context_window = 4096
    # set number of output tokens
    Settings.num_output = 300
    # set maximum chunk overlap
    Settings.chunk_overlap = 20
    # set chunk size limit
    Settings.chunk_size = 600

    # load every file in the folder, split it into chunks and embed them
    # (OpenAI embeddings are the default embedding model)
    documents = SimpleDirectoryReader(directory_path).load_data()
    index = VectorStoreIndex.from_documents(documents)

    # save the index so ask_ai() can reload it without re-embedding
    index.storage_context.persist(persist_dir=PERSIST_DIR)

    return index


def ask_ai():
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
    query_engine = index.as_query_engine(response_mode="compact")
    while True:
        query = input("What do you want to ask? ")
        if not query.strip():
            break  # an empty line ends the session
        response = query_engine.query(query)
        display(Markdown(f"Response: <b>{response.response}</b>"))
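The numbers in `construct_index` interact: every prompt sent to the model must fit in the 4096-token context window while leaving room for the 300-token answer, and what is left is filled with retrieved 600-token chunks. The sketch below does that arithmetic and then greedily packs chunks into the remaining budget, which is roughly the idea behind the "compact" response mode (combine as many chunks as fit into one call). It is a simplified illustration, not llama-index's implementation; the 200-token allowance for the fixed prompt text and both function names are assumptions.

```python
from __future__ import annotations


def prompt_budget(max_input_size: int, num_outputs: int, prompt_overhead: int) -> int:
    """Tokens left for retrieved context after reserving the answer and the fixed prompt text."""
    budget = max_input_size - num_outputs - prompt_overhead
    if budget <= 0:
        raise ValueError("no room left for context")
    return budget


def pack_chunks(chunk_token_counts: list[int], budget: int) -> list[list[int]]:
    """Greedily group chunk indices so each group fits in `budget` tokens.

    Each group stands for one LLM call; a chunk larger than the budget gets a group of its own.
    """
    groups: list[list[int]] = []
    current: list[int] = []
    used = 0
    for i, tokens in enumerate(chunk_token_counts):
        if current and used + tokens > budget:
            groups.append(current)
            current, used = [], 0
        current.append(i)
        used += tokens
    if current:
        groups.append(current)
    return groups


budget = prompt_budget(max_input_size=4096, num_outputs=300, prompt_overhead=200)
print(budget)  # 3596
# six retrieved chunks at the 600-token chunk size limit
print(pack_chunks([600] * 6, budget))  # [[0, 1, 2, 3, 4], [5]]
```

Five full chunks (3,000 tokens) fit, but a sixth would need 3,600 > 3,596, so under these assumptions it spills into a second call. Shrinking the chunk size or the answer length changes that trade-off.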

# Set OpenAI API Key
You need an OpenAI API key to run this code.

If you don't have one yet, [sign up](https://platform.openai.com/overview), click your account icon at the top right of the screen, select "View API Keys", and create a new key.

Then run the code below and paste your API key into the text input.

In [None]:
import os, getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("Paste your OpenAI key here and hit enter:")

Paste your OpenAI key here and hit enter:sk-********
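If you rerun the notebook often, you may prefer not to paste the key every time. The helper below is a small sketch (the name `ensure_openai_key` is hypothetical) that reuses `OPENAI_API_KEY` when it is already set, for example exported in your shell before launching Jupyter, and otherwise prompts with `getpass` so the key is not echoed into the notebook output.

```python
import os
from getpass import getpass


def ensure_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting (without echo) only if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = getpass("Paste your OpenAI key here and hit enter:").strip()
        os.environ[env_var] = key
    if not key:
        raise ValueError(f"{env_var} is empty")
    return key
```

Calling `ensure_openai_key()` before `construct_index` has the same effect as the input cell above, without leaving the key visible in the saved notebook.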


# Construct an index
Now we are ready to construct the index. This will take every file in the folder 'data', split it into chunks, and embed it with OpenAI's embeddings API.
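To make "split it into chunks" and "embed it" concrete, here is a self-contained sketch of the same pipeline in plain Python: it splits text into overlapping windows and ranks chunks by cosine similarity to a query. Two assumptions keep it runnable offline: chunks are counted in words rather than tokens, and a bag-of-words counter stands in for OpenAI's embeddings API (the real index uses dense embedding vectors). The function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 600, overlap: int = 20) -> list[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embeddings API: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


docs = "seedcloud manages hybrid cloud servers and backup policies while the dfocus website offers an inquiry page"
chunks = split_into_chunks(docs, chunk_size=8, overlap=2)
print(chunks)
print(top_k("hybrid cloud", chunks))  # ['seedcloud manages hybrid cloud servers and backup policies']
```

With an 8-word window and a 2-word overlap, "backup policies" ends the first chunk and starts the second; that shared margin is what the overlap setting buys, so a sentence cut at a boundary still appears whole in at least one chunk more often.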

**Notice:** running this code will cost you credits on your OpenAI account (at the time of writing, $0.02 per 1,000 tokens for `text-davinci-003`; check OpenAI's pricing page for current rates). If your account still has free trial credits, they should be more than enough for this experiment.
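Per-token pricing makes the bill a simple multiplication: tokens divided by 1,000, times the rate. The sketch below (the function name is hypothetical) uses the $0.02 per 1,000 tokens quoted in the notice as its default; substitute the current rate for whichever model you use, and count tokens with that model's tokenizer.

```python
def estimate_cost(num_tokens: int, usd_per_1k_tokens: float = 0.02) -> float:
    """Estimated charge in US dollars for `num_tokens` tokens at a flat per-1,000-token rate."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1000 * usd_per_1k_tokens


# a 50,000-token data folder at the rate quoted above
print(f"${estimate_cost(50_000):.2f}")  # $1.00
```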

In [None]:
construct_index("DFOCUS_QA/data")

<gpt_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x7fec1aef64f0>

# Ask questions
It's time to have fun and test our AI. Run the function that queries GPT and type your question into the input.

If you've used the provided example data for your custom knowledge base, here are a few questions that you can ask:
1. seedCloud는? (What is seedCloud?)
2. 디포커스는? (What is DFOCUS?)


In [None]:
ask_ai()

What do you want to ask? 시드클라우드 (SeedCloud)


Response: <b>what does it provide?

SeedCloud provides a cloud management portal service for hybrid cloud management and operation, integrated management of DR servers (Disaster Recovery Server), backup policy management, and management of virtual servers in multi-vCenter environments.</b>

What do you want to ask? 전화 문의 (phone inquiry)


Response: <b>
We accept phone inquiries. Our phone number is XXX-XXX-XXXX. If you have any questions, please feel free to contact us at any time.</b>

What do you want to ask? 전화 (phone)


Response: <b>
No, the context information does not provide an answer to the question.</b>

What do you want to ask? 디포커스 홈페이지 (DFOCUS homepage)


Response: <b>what is the inquiry link?

The link to the DFOCUS website inquiry is https://www.dfocus.net.</b>

What do you want to ask? 현재 년도 (current year)


Response: <b>
2020</b>

What do you want to ask? 한국 대통령 (president of Korea)


Response: <b>
This is not related to the president of Korea.</b>

What do you want to ask? 한국 대통령 이름 (name of the president of Korea)


Response: <b>
President Moon</b>

KeyboardInterrupt: ignored