<a href="https://colab.research.google.com/github/anshupandey/Generative-AI-for-Professionals/blob/main/RAG_Basic_Concepts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install chromadb transformers gensim --quiet

## Working with **Embeddings**

In [None]:
import gensim.downloader as api

# Load pretrained GloVe word vectors from the gensim-data repository.
# 'glove-wiki-gigaword-50' gives 50-dimensional vectors; other dimensions and models are also available.
glove_model = api.load('glove-wiki-gigaword-50')
word = 'python'
word_vector = glove_model[word]
print(word_vector.shape)

(50,)


In [None]:
# Look up vectors for two related words and one unrelated word
v1 = glove_model['king']
v2 = glove_model['ruler']
v3 = glove_model['table']

In [None]:
# Cosine similarity between the vectors for 'king' (v1) and 'ruler' (v2)

import numpy as np
similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(similarity)


0.74342537


In [None]:
# Cosine similarity between 'king' (v1) and 'table' (v3); normalize by the norms of v1 and v3
similarity = np.dot(v1, v3) / (np.linalg.norm(v1) * np.linalg.norm(v3))
print(similarity)

0.26058447
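
The two cells above compute cosine similarity inline. A minimal sketch of the same formula as a reusable helper follows; the function name `cosine_similarity` and the toy vectors are assumptions made for illustration and are not GloVe values.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


# Toy vectors (made up for illustration, not GloVe values)
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]    # same direction as u
w = [-3.0, 0.0, 1.0]   # orthogonal to u: 1*(-3) + 2*0 + 3*1 = 0

print(cosine_similarity(u, v))  # close to 1: same direction
print(cosine_similarity(u, w))  # close to 0: orthogonal
```

Using one helper for every pair keeps the numerator and both norms tied to the same two vectors, which is easy to get wrong when the expression is copied between cells.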


In [None]:
glove_model.most_similar("stocks")

[('stock', 0.8653818368911743),
 ('markets', 0.8522835969924927),
 ('prices', 0.8431004285812378),
 ('market', 0.8400351405143738),
 ('traders', 0.8257467150688171),
 ('trading', 0.8112872838973999),
 ('investors', 0.8083530068397522),
 ('indexes', 0.7902355194091797),
 ('dealers', 0.7884277701377869),
 ('shares', 0.7868536114692688)]

In [None]:
glove_model.most_similar("trading")

[('stock', 0.9012669920921326),
 ('exchange', 0.898104190826416),
 ('futures', 0.8487032651901245),
 ('trades', 0.8236047029495239),
 ('traded', 0.8166490793228149),
 ('stocks', 0.8112873435020447),
 ('market', 0.8051413893699646),
 ('prices', 0.7966799139976501),
 ('closing', 0.7950035929679871),
 ('closed', 0.7914804220199585)]

In [None]:
glove_model.most_similar("amazing")

[('incredible', 0.9189565181732178),
 ('fantastic', 0.8799790143966675),
 ('awesome', 0.8620665669441223),
 ('wonderful', 0.8537988662719727),
 ('terrific', 0.8482187390327454),
 ('marvelous', 0.8439217805862427),
 ('astonishing', 0.8103041052818298),
 ('remarkable', 0.8091045022010803),
 ('exciting', 0.79411780834198),
 ('unbelievable', 0.7916541695594788)]
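
The scores returned by `most_similar` are cosine similarities, which is why 'stocks' and 'trading' score about 0.8113 against each other in both lists above. The sketch below shows the idea as a brute-force ranking over a tiny vocabulary. The function `most_similar_sketch` and the 3-dimensional vectors are hypothetical, invented for illustration; this is not Gensim's implementation.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings, invented for illustration only
toy_vectors = {
    "stocks":  np.array([0.9, 0.1, 0.0]),
    "market":  np.array([0.8, 0.2, 0.1]),
    "trading": np.array([0.7, 0.3, 0.0]),
    "amazing": np.array([0.0, 0.1, 0.9]),
}


def most_similar_sketch(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word] / np.linalg.norm(vectors[word])
    scores = []
    for other, vec in vectors.items():
        if other == word:
            continue  # skip the query word itself
        scores.append((other, float(np.dot(query, vec / np.linalg.norm(vec)))))
    # Highest similarity first, keep the top `topn` entries
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


neighbors = most_similar_sketch("stocks", toy_vectors)
print(neighbors)
```

With these toy values, "market" ranks ahead of "trading", and "amazing" falls outside the top two because its vector points in a nearly orthogonal direction.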

## Vector Database: **ChromaDB**

In [None]:
import chromadb
chroma_client = chromadb.Client()

In [None]:
collection = chroma_client.create_collection(name="my_collection")

In [None]:
collection.add(
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["Python is a programming language", "LangChain is a framework"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)

In [None]:
results = collection.query(
    query_embeddings=[[7.1, 9.1, 6.1]],
    n_results=1
)
results

{'ids': [['id2']],
 'distances': [[10.580000877380371]],
 'metadatas': [[{'source': 'my_source'}]],
 'embeddings': None,
 'documents': [['LangChain is a framework']],
 'uris': None,
 'data': None}
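
The reported distance of about 10.58 is consistent with a squared Euclidean (L2) distance between the query and the embedding stored under `id2`. The sketch below recomputes it with NumPy using the vectors from the cells above. That the collection uses squared L2 is an assumption inferred from this number; check the distance setting for your Chroma version.

```python
import numpy as np

# Stored embeddings and the query vector, copied from the cells above
stored = {
    "id1": np.array([1.2, 2.3, 4.5]),
    "id2": np.array([6.7, 8.2, 9.2]),
}
query = np.array([7.1, 9.1, 6.1])

# Squared Euclidean distance from the query to each stored embedding
squared_l2 = {
    doc_id: float(np.sum((vec - query) ** 2)) for doc_id, vec in stored.items()
}
nearest = min(squared_l2, key=squared_l2.get)

print(squared_l2)
print(nearest)
```

The distance to `id1` works out to roughly 83.61 and the distance to `id2` to roughly 10.58, so `id2` is the single nearest result for `n_results=1`.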

## Context Based Generation

In [None]:
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf


In [None]:
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")


All PyTorch model weights were used when initializing TFBertForQuestionAnswering.

All the weights of TFBertForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForQuestionAnswering for predictions without further training.


In [None]:
text = r"""


Standard Chartered plc is a British multinational bank with operations in consumer, corporate and institutional banking, and treasury services. Despite being headquartered in the United Kingdom, it does not conduct retail banking in the UK, and around 90% of its profits come from Asia, Africa, and the Middle East.

Standard Chartered has a primary listing on the London Stock Exchange and is a constituent of the FTSE 100 Index. It has secondary listings on the Hong Kong Stock Exchange, the National Stock Exchange of India, and OTC Markets Group Pink. Its largest shareholder is the Government of Singapore-owned Temasek Holdings.[4][5][6] The Financial Stability Board considers it a systemically important bank.

José Viñals is the group chairman of Standard Chartered.[7] Bill Winters is the current group chief executive.[8]

The Standard Bank was a British bank founded in the Cape Province of South Africa in 1862 by Scot, John Paterson.[10] Having established a considerable number of branches Standard was prominent in financing the development of the diamond fields of Kimberley from 1867 and later extended its network further north to the new town of Johannesburg when gold was discovered there in 1885. Half the output of the second largest gold field in the world passed through the Standard Bank on its way to London. Standard expanded widely in Africa over the years, but from 1883 to 1962 was formally known as the Standard Bank of South Africa. In 1962 the bank changed its name to Standard Bank Limited, and the South African operations became a separate subsidiary that took the parent bank's previous name, Standard Bank of South Africa Ltd.[9]

"""


In [None]:
questions = [
    "WHere Standard Chartered Bank is listed? ",
    "Where standardard charteed bank is spread?",
    "WHo is the current chief of Standard Charted Bank?",
]

for question in questions:
    inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]

    outputs = model(inputs)
    answer_start_scores = outputs.start_logits
    answer_end_scores = outputs.end_logits

    # Get the most likely beginning of answer with the argmax of the score
    answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]
    # Get the most likely end of answer with the argmax of the score
    answer_end = tf.argmax(answer_end_scores, axis=1).numpy()[0] + 1

    answer = tokenizer.convert_tokens_to_string(
        tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
    )

    print(f"Question: {question}")
    print(f"Answer: {answer}")

Question: WHere Standard Chartered Bank is listed? 
Answer: london stock exchange
Question: Where standardard charteed bank is spread?
Answer: asia, africa, and the middle east
Question: WHo is the current chief of Standard Charted Bank?
Answer: bill winters
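
The loop above picks the answer span by taking the argmax of the start logits and the argmax of the end logits. The sketch below isolates that step on made-up data so it runs without downloading a model. The tokens, logits, and the helper `extract_span` are hypothetical, and the guard for an invalid span is an addition that the loop above does not include.

```python
import numpy as np

# Hypothetical tokens and logits, invented for illustration (not model output)
tokens = ["[CLS]", "who", "leads", "?", "[SEP]",
          "bill", "winters", "leads", "it", "[SEP]"]
start_logits = np.array([0.1, 0.0, 0.2, 0.0, 0.1, 4.2, 1.0, 0.3, 0.1, 0.0])
end_logits = np.array([0.0, 0.1, 0.0, 0.2, 0.1, 0.9, 3.8, 0.4, 0.2, 0.1])


def extract_span(tokens, start_logits, end_logits):
    """Pick the argmax start and end positions and return the tokens between them."""
    start = int(np.argmax(start_logits))
    end = int(np.argmax(end_logits)) + 1  # +1 because slices exclude the end index
    if end <= start:
        return []  # the two argmaxes do not form a valid span
    return tokens[start:end]


span = extract_span(tokens, start_logits, end_logits)
print(" ".join(span))
```

Taking the two argmaxes independently is simple but can yield an end position before the start position; returning an empty span in that case is one possible way to handle it.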
