### Installing the libraries

In [1]:
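!pip install openai
!pip install langchain
# langchain-openai provides the langchain_openai imports used in later cells
!pip install langchain-openai
# pinecone.init(), used in later cells, belongs to the 2.x client API
!pip install "pinecone-client<3"
!pip install python-dotenv
!pip install tiktoken
!pip install wikipedia -q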



### Loading environment variables

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

# Report only whether the key is present; never print the key itself.
print("API Key Loaded:", os.environ.get('OPENAI_API_KEY') is not None)

API Key Loaded: True
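
The later cells also read `PINECONE_API_KEY` and `PINECONE_ENV`, so it can help to fail early when any of them is missing. The helper below is a minimal sketch; `require_env` is a hypothetical name and is not part of python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env('PINECONE_API_KEY')` right after `load_dotenv(...)` would surface a missing entry in the `.env` file before any network call is made.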


### Loading data from Public or Private Services --- Wikipedia

In [15]:
def chat_with_wikipedia(query, lang='en', load_max_docs=2):
    """Load up to `load_max_docs` Wikipedia pages matching `query` as LangChain documents."""
    from langchain.document_loaders import WikipediaLoader
    loader = WikipediaLoader(query=query, lang=lang, load_max_docs=load_max_docs)
    data = loader.load()
    return data

### Chunking Strategies and Splitting the Documents

In [4]:
def split_text_into_chunks(data, chunk_size=256):
    """Split documents into chunks of at most `chunk_size` characters, with no overlap."""
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=0)
    chunks = text_splitter.split_documents(data)
    return chunks
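
To see what `chunk_size` and `chunk_overlap` control, here is a simplified sketch of fixed-window chunking in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs and spaces); `chunk_with_overlap` is a hypothetical helper for illustration only.

```python
def chunk_with_overlap(text: str, chunk_size: int = 256, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=0))
print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With an overlap of 0, as in the cell above, a sentence that straddles a boundary is cut in two; a small overlap repeats the boundary text in both neighbouring chunks at the cost of more tokens to embed.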

### Calculating the cost of Embeddings

In [5]:
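def calculate_and_display_embedding_cost(texts):
    """Print the token count and the estimated embedding cost for a list of documents."""
    import tiktoken
    enc = tiktoken.encoding_for_model('text-embedding-ada-002')
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f'Total Tokens: {total_tokens}')
    # The rate of 0.0004 USD per 1,000 tokens is hard-coded; check current pricing before relying on it.
    print(f'Embedding Cost in USD:{total_tokens / 1000 * 0.0004:.6f}')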

### Load or create embedding index function

In [6]:
def load_or_create_embeddings_index(index_name, chunks):
    """Return a Pinecone vector store, reusing the index if it already exists."""
    import pinecone
    from langchain.vectorstores import Pinecone
    from langchain_openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()

    pinecone.init(api_key=os.environ.get('PINECONE_API_KEY'), environment=os.environ.get('PINECONE_ENV'))

    if index_name in pinecone.list_indexes():
        # An existing index is loaded as-is; `chunks` is not used in this branch.
        print(f'Index {index_name} already exists. Loading embeddings...', end='')
        vector_store = Pinecone.from_existing_index(index_name, embeddings)
        print('Done')
    else:
        print(f'Creating index {index_name} and embeddings ...', end='')
        # 1536 is the vector size produced by text-embedding-ada-002.
        pinecone.create_index(index_name, dimension=1536, metric='cosine')
        vector_store = Pinecone.from_documents(chunks, embeddings, index_name=index_name)
        print('Done')

    return vector_store
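
The index is created with `metric='cosine'`, so stored chunks are ranked by the cosine similarity between their embedding and the query embedding. The following is a minimal in-memory sketch of that ranking, using tiny made-up vectors in place of real 1536-dimensional embeddings; `cosine_similarity` and `top_k` are hypothetical helpers, not Pinecone or LangChain functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, vectors, k=3):
    """Return the k (name, score) pairs with the highest cosine similarity to the query."""
    scored = [(name, cosine_similarity(query_vector, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 2-dimensional "embeddings" for illustration only.
toy_vectors = {"chunk_a": [1.0, 0.0], "chunk_b": [0.0, 1.0], "chunk_c": [1.0, 1.0]}
print(top_k([1.0, 0.0], toy_vectors, k=2))
```

A vector database performs the same kind of ranking with an approximate nearest-neighbour index rather than a full scan, which is what makes it practical for large collections.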

### Delete Pinecone Index Function

In [7]:
def drop_pinecone_index(index_name='all'):
    import pinecone
    
    pinecone.init(api_key=os.environ.get('PINECONE_API_KEY'), environment=os.environ.get('PINECONE_ENV'))

    if index_name == 'all':
        indexes = pinecone.list_indexes()
        print('Deleting all indexes ...')
        for index in indexes:
            pinecone.delete_index(index)
        print('Done')
    else:
        print(f'Deleting index {index_name} ...', end='')
        pinecone.delete_index(index_name)
        print('Done')

### Chat Function

In [8]:
def chat_app_with_wikipedia(vector_store, query, chat_history=None):
    from langchain.chains import ConversationalRetrievalChain
    from langchain_openai import ChatOpenAI

    # Use a fresh list per call; a mutable default ([]) would be shared across calls.
    if chat_history is None:
        chat_history = []

    llm = ChatOpenAI(temperature=1)
    retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k':3})

    crc = ConversationalRetrievalChain.from_llm(llm, retriever)
    result = crc.invoke({'question': query, 'chat_history': chat_history})
    chat_history.append((query, result['answer']))

    return result, chat_history
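
The function returns the updated history so that the caller can pass it back in on the next turn; follow-up questions only have context if the same list keeps growing. Here is a minimal sketch of that pattern with a stand-in for the chain. Both `answer_with_history` and `fake_answer` are hypothetical and make no API calls.

```python
def answer_with_history(query, chat_history, answer_fn):
    """Answer a query and return the answer plus a new history that includes this turn."""
    answer = answer_fn(query, chat_history)
    return answer, chat_history + [(query, answer)]


def fake_answer(query, chat_history):
    """Stand-in for the LLM chain: reports how many earlier turns it was given."""
    return f"answer to {query!r} given {len(chat_history)} earlier turn(s)"


history = []  # created once, outside the loop
for question in ["When was it released?", "Tell me more about it."]:
    answer, history = answer_with_history(question, history, fake_answer)
    print(answer)
```

If `history` were reset inside the loop, every call would see zero earlier turns and a follow-up such as "Tell me more about it." would have nothing to refer to.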

### Chatting with Wikipedia using context and memory

In [9]:
### Loading 'Samsung Galaxy S24' information from Wikipedia 
data = chat_with_wikipedia('Samsung Galaxy S24', lang='en')
print(data[0].page_content)

The Samsung Galaxy S24 is a series of high-end Android-based smartphones designed, developed, manufactured, and marketed by Samsung Electronics as part of its flagship Galaxy S series. They collectively serve as the successor to the Samsung Galaxy S23 series. The phones were announced on January 17, 2024, at the 2024 Galaxy Unpacked, alongside Galaxy AI, in San Jose, California. The Samsung Galaxy S24, 24+, and 24 Ultra were released on January 31, 2024. 


== Lineup ==
The Galaxy S24 series includes three devices, which share the same lineup and screen sizes as the previous Galaxy S23 series. The entry-level Galaxy S24 features a flat 6.2-inch (155 mm) display. The Galaxy S24+ features similar hardware in a larger 6.7-inch (168 mm) form factor. At the top of the lineup, the Galaxy S24 Ultra features a flat 6.8-inch (173 mm) display. S24 and S24+ are powered by Snapdragon 8 Gen 3 in the USA, Canada, China, Macau, Hong kong, Taiwan, and a Samsung Exynos 2400 in the rest of the world inc

In [10]:
my_chunks = split_text_into_chunks(data)
print(len(my_chunks))
print(my_chunks[0].page_content)

47
The Samsung Galaxy S24 is a series of high-end Android-based smartphones designed, developed, manufactured, and marketed by Samsung Electronics as part of its flagship Galaxy S series. They collectively serve as the successor to the Samsung Galaxy S23


In [11]:
calculate_and_display_embedding_cost(my_chunks)

Total Tokens: 1789
Embedding Cost in USD:0.000716
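
The printed cost follows directly from the token count and the per-1,000-token rate used in `calculate_and_display_embedding_cost`. A small worked check is sketched below; `embedding_cost_usd` is a hypothetical helper, and the rate is simply the value hard-coded in that function, which may not match current pricing.

```python
def embedding_cost_usd(total_tokens: int, usd_per_1k_tokens: float) -> float:
    """Estimated cost in USD: tokens are billed per block of 1,000."""
    return total_tokens / 1000 * usd_per_1k_tokens


# 1789 tokens at the notebook's hard-coded rate of 0.0004 USD per 1,000 tokens
cost = embedding_cost_usd(1789, 0.0004)
print(f"{cost:.6f}")  # prints 0.000716
```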


In [12]:
drop_pinecone_index()

Deleting all indexes ...
Done


In [13]:
index_name='wikipediadocument'
vector_store = load_or_create_embeddings_index(index_name, my_chunks)

Creating index wikipediadocument and embeddings ...Done


In [18]:
import time
i = 1
chat_history = []  # created once so earlier turns stay available to later queries
print("Please type Quit or Exit to quit.")
while True:
    query = input(f'Query #{i}:')
    i = i + 1
    if query.lower() in ['quit', 'exit']:
        print('Exiting the application.... see you!')
        time.sleep(2)
        break

    answer, chat_history = chat_app_with_wikipedia(vector_store, query, chat_history)
    answer_content = answer['answer']
    print(f'\nAnswer: {answer_content}')
    print(chat_history)
    print(f'\n {"-" * 100}\n')

Please type Quit or Exit to quit.
Query #1:When was Samsung Galaxy S24 released?

Answer: The Samsung Galaxy S24 series was released on January 31, 2024.
[('When was Samsung Galaxy S24 released?', 'The Samsung Galaxy S24 series was released on January 31, 2024.')]

 ----------------------------------------------------------------------------------------------------

Query #2:Can you give me more information about it?

Answer: Yes, I can provide more information about the Galaxy S24 Ultra. The device features a higher-resolution display and utilizes Gorilla Glass Armor for its display, which Corning claims is 75% less reflective than typical glass surfaces. This means that the screen will have reduced glare and better visibility, especially in bright environments.

The Galaxy S24 Ultra also comes with an integrated S Pen, similar to its predecessor, the Galaxy S22 Ultra. This allows for increased functionality and productivity, as the S Pen can be used for tasks such as taking notes, dr