
In [None]:
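# openpyxl lets pandas read .xlsx files; spaCy and its small English model back SpacyTextSplitter
!pip install pandas openpyxl
!pip install -U sentence-transformers
!pip install langchain-text-splitters spacy
!pip install faiss-cpu
!python -m spacy download en_core_web_sm

In [None]:
import pandas as pd
from sentence_transformers import SentenceTransformer
# SpacyTextSplitter lives in the langchain-text-splitters package in current LangChain releases
from langchain_text_splitters import SpacyTextSplitter
import faiss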

In [None]:
# sheet_name=0 reads the first sheet; pass another index or a sheet name if your data is elsewhere.
# The path below is on Google Drive, so in Colab mount it first:
# from google.colab import drive; drive.mount('/content/drive')
faq_df = pd.read_excel('/content/drive/MyDrive/bluechiptech.asia/LangChain-Chat-bot/docs/MachineLearning-Lecture01.xlsx', sheet_name=0)

# Extract and print the first five rows
first_rows = faq_df.head(5)
print("First five rows from the Excel sheet:")
print(first_rows)

First five rows from the Excel sheet:
                           MachineLearning-Lecture01
0  Instructor (Andrew Ng): Okay. Good morning. We...
1  By way of introduction, my name's Andrew Ng an...
2  I also want to introduce the TAs, who are all ...
3  So you'll get to know the TAs and me much bett...
4  So just in my own daily work, I actually frequ...
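`split_text` expects a string, so empty Excel cells (which pandas reads as `NaN`) would likely make the chunking step fail. A minimal cleanup sketch, assuming the transcript sits in a single text column; `clean_text_column` is a hypothetical helper, not part of the original notebook, and the toy frame only stands in for the sheet.

```python
import pandas as pd


def clean_text_column(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return the non-empty, whitespace-trimmed strings of one column.

    Hypothetical helper: drops NaN/None cells (empty Excel cells) and cells
    containing only whitespace, then resets the index to 0..n-1.
    """
    texts = frame[column].dropna().astype(str).str.strip()
    return texts[texts != ""].reset_index(drop=True)


# Toy data standing in for the Excel sheet
sheet = pd.DataFrame({"MachineLearning-Lecture01": [" Okay. Good morning. ", None, "   ", "Second cell."]})
cleaned = clean_text_column(sheet, "MachineLearning-Lecture01")
print(cleaned.tolist())
```

In the notebook this could be applied as `clean_text_column(faq_df, faq_df.columns[0])` before splitting.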


In [None]:
print(faq_df.columns)

Index(['MachineLearning-Lecture01'], dtype='object')


In [None]:
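# Split text into sentence-based chunks of up to about 500 characters.
# SpacyTextSplitter uses spaCy's en_core_web_sm model by default; a single sentence
# longer than chunk_size still becomes its own chunk, and consecutive chunks may
# overlap (chunk_overlap defaults to 200 characters).
text_splitter = SpacyTextSplitter(chunk_size=500)

# Split the text in each cell into a list of chunks (one list per Excel row)
faq_df['Question_Chunks'] = faq_df['MachineLearning-Lecture01'].apply(text_splitter.split_text)

# Flatten the per-row lists into one list; a chunk's position in this list
# becomes its id in the FAISS index
faq_text_chunks = [chunk for chunks in faq_df['Question_Chunks'] for chunk in chunks]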
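`SpacyTextSplitter` finds sentence boundaries with spaCy and then packs consecutive sentences into chunks no longer than `chunk_size`, joining them with `"\n\n"` (which is why `\n\n` shows up inside the chunks below). The sketch illustrates only the packing idea in plain Python: it swaps spaCy for a naive regex sentence splitter, produces no overlap, and is not LangChain's implementation. `pack_sentences` is a hypothetical name.

```python
import re


def pack_sentences(text: str, chunk_size: int = 500, separator: str = "\n\n") -> list[str]:
    """Greedily pack sentences into chunks of at most chunk_size characters.

    Sentences are found with a naive regex (split after '.', '!' or '?' followed
    by whitespace). A sentence longer than chunk_size becomes its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = separator.join(current + [sentence])
        if current and len(candidate) > chunk_size:
            # Adding this sentence would overflow: close the current chunk
            chunks.append(separator.join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(separator.join(current))
    return chunks


cells = ["Okay. Good morning. Welcome to the class.", "A single sentence that is longer than the limit."]
# Split every cell, then flatten the per-cell lists into one list of chunks
all_chunks = [chunk for cell in cells for chunk in pack_sentences(cell, chunk_size=25)]
print(all_chunks)
```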



In [None]:
# Encode every chunk with SentenceTransformer; the result is a float32 NumPy
# array of shape (number of chunks, 768) for this model
encoder = SentenceTransformer("paraphrase-mpnet-base-v2")
faq_vectors = encoder.encode(faq_text_chunks)

In [None]:
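# Build a FAISS index that ranks by inner product. L2-normalizing the stored
# vectors (search_faq normalizes the query the same way) makes the inner
# product equal to cosine similarity.
faiss.normalize_L2(faq_vectors)
faq_index = faiss.IndexFlatIP(faq_vectors.shape[1])
faq_index.add(faq_vectors)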
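`IndexFlatIP` ranks by raw inner product, so it behaves like cosine similarity only when both the stored vectors and the query have unit length. With a unit query, a score above 1 (such as the 1.65 and 2.14 printed further down) can only come from a stored vector whose norm exceeds 1. A NumPy sketch of the same exhaustive search on toy 2-D vectors shows how normalization changes the ranking; `l2_normalize` and `flat_ip_search` are hypothetical helpers, not FAISS functions.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (like faiss.normalize_L2, but returns a copy)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def flat_ip_search(corpus: np.ndarray, query: np.ndarray, k: int):
    """Exhaustive inner-product search: score every corpus row, keep the top k."""
    scores = query @ corpus.T                 # shape (n_queries, n_corpus)
    ids = np.argsort(-scores, axis=1)[:, :k]  # highest score first
    return np.take_along_axis(scores, ids, axis=1), ids


corpus = np.array([[10.0, 0.0], [0.6, 0.8]], dtype=np.float32)  # toy vectors
query = l2_normalize(np.array([[0.6, 0.8]], dtype=np.float32))

print(flat_ip_search(corpus, query, k=2))                # raw inner product: long vector 0 wins
print(flat_ip_search(l2_normalize(corpus), query, k=2))  # cosine: vector 1 (same direction) wins
```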

In [None]:
# Keep the chunks in a DataFrame: row i holds the text of FAISS vector i
df = pd.DataFrame({'facts': faq_text_chunks})

In [None]:
df.head()

                                               facts
0  Instructor (Andrew Ng): Okay.\n\nGood morning....
1  By way of introduction, my name's Andrew Ng\n\...
2  So I'm actually always excited about teaching ...
3  I also want to introduce the TAs, who are all ...
4  Tom Do is another PhD student, works in comput...
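FAISS assigns ids 0, 1, 2, … in the order vectors were added, so a hit's id is a row position in `df`, not in `faq_df` (one row per Excel cell, which does not line up with the chunks). A small sanity check can catch a mismatch early; `check_alignment` is a hypothetical helper, and in the notebook it would be called as `check_alignment(df, faq_index.ntotal)`.

```python
import pandas as pd


def check_alignment(chunks_df: pd.DataFrame, n_vectors: int) -> None:
    """Hypothetical sanity check: FAISS id i must map to chunks_df row i."""
    if len(chunks_df) != n_vectors:
        raise ValueError(f"{len(chunks_df)} chunks but {n_vectors} vectors in the index")
    if not chunks_df.index.equals(pd.RangeIndex(n_vectors)):
        raise ValueError("chunks_df must have a default 0..n-1 index")


chunks = pd.DataFrame({"facts": ["chunk a", "chunk b", "chunk c"]})
check_alignment(chunks, n_vectors=3)          # passes silently
print(chunks.loc[[2, 0], "facts"].tolist())   # texts for FAISS ids 2 and 0
```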


In [None]:
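# Define the search function
def search_faq(search_text, encoder, index, chunks_df, k=None):
    """Rank chunks by inner-product score against search_text.

    The score is cosine similarity when the indexed vectors are L2-normalized too.
    chunks_df must be the chunk table (row i = FAISS vector i), i.e. df.
    k defaults to index.ntotal, which ranks every chunk.
    """
    # Encode the search text (shape (1, dim), float32)
    search_vector = encoder.encode([search_text])

    # Normalize the search vector to unit length
    faiss.normalize_L2(search_vector)

    # Perform the similarity search; 'distances' holds scores (higher = more similar)
    if k is None:
        k = index.ntotal
    distances, ann = index.search(search_vector, k=k)
    results = pd.DataFrame({'distances': distances[0], 'ann': ann[0]})

    # Attach the text of each hit; FAISS ids are row positions in chunks_df
    merge = pd.merge(results, chunks_df, left_on='ann', right_index=True)

    return merge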
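Because FAISS ids are positions in the list of chunks, the table merged with the hits has to be the chunk table `df`; merging with `faq_df` would attach the text of Excel row *i* to chunk *i*'s score. The sketch isolates that step with toy arrays shaped like the `(scores, ids)` pair that `index.search` returns for one query. `hits_to_frame` is a hypothetical helper. When k exceeds the number of indexed vectors, FAISS fills the missing slots with id -1, which the inner merge drops.

```python
import numpy as np
import pandas as pd


def hits_to_frame(scores: np.ndarray, ids: np.ndarray, chunks_df: pd.DataFrame) -> pd.DataFrame:
    """Turn one query's search output (1 x k arrays) into a ranked table with chunk text."""
    hits = pd.DataFrame({"score": scores[0], "ann": ids[0]})
    # Inner merge keeps the left (ranked) order and drops ids not found in chunks_df, such as -1
    table = hits.merge(chunks_df, left_on="ann", right_index=True, how="inner")
    return table.reset_index(drop=True)


chunks = pd.DataFrame({"facts": ["chunk a", "chunk b", "chunk c"]})
scores = np.array([[0.9, 0.4, -1.0]], dtype=np.float32)  # toy scores
ids = np.array([[2, 0, -1]], dtype=np.int64)              # -1 = empty slot
print(hits_to_frame(scores, ids, chunks))
```

Ranking every chunk (k = `index.ntotal`) is fine for a single lecture; for larger corpora a small k such as 5 keeps the result table short.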

In [None]:
# Example search query
search_text = 'Why does Andrew Ng believe in machine learning'

# Search the chunk index; hits are looked up in the chunk table df
result = search_faq(search_text, encoder, faq_index, df)

In [None]:
print(result)

     distances  ann                          MachineLearning-Lecture01  \
0     1.652704  199  Okay. So that was most of what I wanted to say...   
1     1.597750  116  And this sort of learning problem of learning ...   
2     1.584338   27  Instructor (Andrew Ng) : Oh, I see, industry. ...   
3     1.529696  153                                      Microphone 1:   
4     1.492354  127  So let's see. So that was supervised learning....   
..         ...  ...                                                ...   
237   0.011255  187  So I got this from Samuel Wyse at Toronto, U o...   
238  -0.026369   76        Instructor (Andrew Ng): Oh, yes, thank you.   
239  -0.129231  201                                     [End of Audio]   
240  -0.129231  194  So it turns out reinforcement learning is appl...   
241  -0.129231  189  So, for example, this is something that my stu...   

                                       Question_Chunks  
0    [Okay.\n\nSo that was most of what I wanted to...

In [None]:
# Extract the nearest result
nearest_result = result.iloc[0]

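# Extract information from the nearest result. With IndexFlatIP the 'distances'
# column holds inner-product scores, so a higher value means a closer match.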
nearest_distance = nearest_result['distances']
nearest_ann = nearest_result['ann']
nearest_text = df.loc[nearest_ann, 'facts']

# Display the information
print(f"Nearest Distance: {nearest_distance}")
print(f"Nearest ANN: {nearest_ann}")
print(f"Nearest Text:\n{nearest_text}")


Nearest Distance: 1.6527042388916016
Nearest ANN: 199
Nearest Text:
Instructor (Andrew Ng) :

And there's the second algorithm:


In [None]:
# Assuming 'result' DataFrame has columns 'ann' and 'distances'
nearest_index = result['ann'].iloc[0]

# Retrieve the corresponding information from the original DataFrame
nearest_match = df.loc[nearest_index, 'facts']

# Display the result
print(f"Question: {search_text}\n")
print(f"Answer: {nearest_match}\n")
print(f"Distance: {result['distances'].iloc[0]}")


Question: Why does Andrew Ng believe in machine learning

Answer: Instructor (Andrew Ng) :

And there's the second algorithm:

Distance: 1.6527042388916016


In [None]:
# Interactive loop: type 'exit' (or 'x') to quit
question_counter = 0
while True:
    # Input question
    search_text = input("Type your question (type 'exit' to quit): ").strip()

    # Accept 'exit' (as the prompt says) and the short form 'x'
    if search_text.lower() in ('exit', 'x'):
        break

    # Ignore empty input
    if not search_text:
        continue

    # Increment question counter
    question_counter += 1

    # Perform the search over the chunk table df
    result = search_faq(search_text, encoder, faq_index, df)

    # Display the best-ranked chunk and its score (higher = more similar)
    if not result.empty:
        nearest_index = result['ann'].iloc[0]
        nearest_match = df.loc[nearest_index, 'facts']
        print(f"\n{question_counter} Question: {search_text}\n")
        print(f"Nearest Match: {nearest_match}\n")
        print(f"Distance: {result['distances'].iloc[0]}\n")
    else:
        print("No matching result found.\n")

Type your question (type 'exit' to quit): What are the three goals that are mentioned in machine learning?

1 Question: What are the three goals that are mentioned in machine learning?

Nearest Match: So in teaching this class, I sort of have three goals.

One of them is just to I hope convey some of my own excitement about machine learning to you.

Distance: 2.1442618370056152

Type your question (type 'exit' to quit): Machine learning based on the backgrounds of the TAs

2 Question: Machine learning based on the backgrounds of the TAs

Nearest Match: So you'll get to know the TAs and me much better throughout this quarter, but just from the sorts of things the TA's do, I hope you can already tell that machine learning is a highly interdisciplinary topic in which just the TAs find learning algorithms to problems in computer vision and biology and robots and language.

And machine learning is one of those things that has and is having a large impact on many applications.

Distance: 1.8
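The `else` branch of the loop rarely, if ever, runs: the search ranks every chunk, so even an unrelated question gets a "nearest match". One option is to compare the top score with a threshold. A minimal sketch that separates this decision from `input()` so it can be tested, assuming cosine-style scores; `answer`, `run_session`, the `(score, text)` search interface and the `min_score=0.5` default are all hypothetical, and the threshold would need tuning on real questions.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence, Tuple

# Hypothetical search interface: question -> [(score, text), ...], best first
SearchFn = Callable[[str], Sequence[Tuple[float, str]]]
EXIT_WORDS = {"exit", "x"}


def answer(question: str, search: SearchFn, min_score: float = 0.5) -> Optional[str]:
    """Return the top-ranked text, or None when its score is below min_score."""
    hits = search(question)
    if not hits:
        return None
    score, text = hits[0]
    return text if score >= min_score else None


def run_session(questions: Iterable[str], search: SearchFn,
                min_score: float = 0.5) -> Iterator[Tuple[int, str, Optional[str]]]:
    """Yield (number, question, answer) for each non-empty question until an exit word."""
    number = 0
    for raw in questions:
        question = raw.strip()
        if question.lower() in EXIT_WORDS:
            return
        if not question:
            continue  # ignore empty input
        number += 1
        yield number, question, answer(question, search, min_score)


def fake_search(question: str):
    # Stand-in for the FAISS search with made-up scores, for illustration only
    if "goal" in question.lower():
        return [(0.8, "chunk about goals")]
    return [(0.1, "unrelated chunk")]


log = list(run_session(["What are the goals?", "", "weather today?", "exit", "never read"], fake_search))
for number, question, reply in log:
    print(number, question, "->", reply or "No matching result found.")
```

In the notebook, `questions` could be `iter(lambda: input("Type your question (type 'exit' to quit): "), None)`, which keeps prompting until an exit word is typed, and `search` a small wrapper returning `list(zip(result['distances'], result['facts']))` when the hits are merged with the chunk table `df`.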