In [1]:
!pip install langchain



In [2]:
!pip install openai



In [3]:
!pip install chromadb



In [4]:
!pip install tiktoken



<h2>Question Answering over Docs</h2>

In [5]:
import pprint

In [6]:
# Concatenate file_0.txt ... file_5.txt into one file for the loader.
files_txt = [f"file_{i}.txt" for i in range(6)]
with open('files_contat.txt', 'w') as outfile:
    for file_txt in files_txt:
        with open(file_txt) as infile:
            outfile.write(infile.read())
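
The loop above writes the files back to back, so if a file does not end with a newline its last line fuses with the first line of the next file. A minimal sketch of the same step with an explicit separator follows; the helper name `concat_text_files` and its `separator` parameter are illustrative assumptions, not part of the original notebook.

```python
from pathlib import Path


def concat_text_files(paths, out_path, separator="\n"):
    """Join the contents of several text files into out_path.

    Hypothetical helper: `separator` is placed between consecutive files
    so that the end of one file never runs into the start of the next.
    """
    parts = [Path(p).read_text(encoding="utf-8") for p in paths]
    Path(out_path).write_text(separator.join(parts), encoding="utf-8")
    return out_path
```

It could be called as `concat_text_files([f"file_{i}.txt" for i in range(6)], "files_contat.txt")` to produce the same output file the loader expects.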

In [7]:
from langchain.document_loaders import TextLoader
loader = TextLoader('files_contat.txt')

In [8]:
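import os
import openai

# Placeholder: replace "#####" with a real OpenAI API key.
openai_key = "#####"
openai.api_key = openai_key
# LangChain's OpenAI wrappers read the key from this environment variable.
os.environ['OPENAI_API_KEY'] = openai_key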
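
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched here under the assumption that the key was exported in the environment before the notebook started, is to read it and fail early when it is missing. The helper `load_api_key` is hypothetical.

```python
import os


def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    by a plain dict, which keeps the function easy to exercise offline.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key
```

With this in place, the cell would assign `openai.api_key = load_api_key()` instead of a literal string.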

In [9]:
from langchain.indexes import VectorstoreIndexCreator
index = VectorstoreIndexCreator().from_loaders([loader])
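
The notebook does not show what happens inside the index. Conceptually, a vector store index splits the text into chunks, embeds each chunk, and at query time retrieves the chunks most similar to the question so they can be passed to the language model. The sketch below illustrates only that retrieval idea with a toy word-count embedding; it is not LangChain's implementation, and every function name in it is an assumption made for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200):
    """Split text into pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(chunks, query, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]
```

A real index replaces `embed` with a learned embedding model and stores the vectors in a database, but the ranking step follows the same pattern.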

<b>Guided queries - general</b>

In [10]:
query = """
The following is an ML engineer interview. Interviewee should ask 3 the most \
wide-spread technical questions about machine learning.

Interviewee: Hello, my name is Artem and I am a software engineer.
Interviewer:
"""
generated_text = index.query(query)
pprint.pprint(generated_text)

('\n'
 '\n'
 '1. What is the difference between supervised and unsupervised learning?\n'
 '2. What is the purpose of a neural network?\n'
 '3. What is the difference between deep learning and traditional machine '
 'learning?')
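
The generated text is a single string with leading newlines and a numbered list. If the individual questions are needed later, they can be pulled out with a small parser. This is a sketch that assumes the model keeps the `1.`, `2.`, `3.` numbering seen above; the helper `parse_numbered_list` is hypothetical.

```python
import re


def parse_numbered_list(text):
    """Extract the items of a '1. ...' style numbered list from text.

    Lines that do not start with a number followed by '.' or ')' are
    ignored, so introductory sentences and blank lines are skipped.
    """
    items = []
    for line in text.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            items.append(match.group(1))
    return items
```

Calling `parse_numbered_list(generated_text)` on the output above would give a list of three question strings.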


In [11]:
questions = generated_text
query = f"""
The following is an ML engineer interview. Interviewee asked 3 the most \
wide-spread technical questions about machine learning. Interviewer should \
answer all the questions - {questions} - 25 words maximum for each answer.

Interviewee: {questions}
Interviewer:
"""

generated_text = index.query(query)
pprint.pprint(generated_text)

('\n'
 '1. Supervised learning uses labeled data to train a model, while '
 'unsupervised learning uses unlabeled data to identify patterns and '
 'clusters.\n'
 '2. Neural networks are used to identify patterns and relationships in data, '
 'and can be used for classification and regression tasks.\n'
 '3. Deep learning is a subset of machine learning that uses multiple layers '
 'of neural networks to learn from data, while traditional machine learning '
 'uses simpler algorithms.')
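
The prompt asks for at most 25 words per answer, but a language model may not honor such a limit, so it can be worth checking. The sketch below counts whitespace-separated words; the helper `over_word_limit` is hypothetical and expects answers that have already been split into separate strings.

```python
def over_word_limit(answers, max_words=25):
    """Return (answer, word_count) pairs for answers longer than max_words.

    Words are counted by splitting on whitespace, which is a simple
    approximation and treats punctuation attached to a word as part of it.
    """
    counted = [(answer, len(answer.split())) for answer in answers]
    return [(answer, n) for answer, n in counted if n > max_words]
```

An empty result means every answer stayed within the limit.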


<b>Guided queries - with sources</b>

In [12]:
query = """
The following is an ML engineer interview. Interviewee should ask 3 \
questions about machine learning.

Interviewee: Hello, my name is Artem and I am a software engineer.
Interviewer:
"""
generated_text = index.query_with_sources(query)
pprint.pprint(generated_text)

{'answer': '\n'
           'Questions about machine learning could include:\n'
           '1. What are the different types of machine learning?\n'
           '2. What is bias and variance in machine learning?\n'
           '3. What is clustering in machine learning?\n',
 'question': '\n'
             'The following is an ML engineer interview. Interviewee should '
             'ask 3 questions about machine learning.\n'
             '\n'
             'Interviewee: Hello, my name is Artem and I am a software '
             'engineer.\n'
             'Interviewer:\n',
 'sources': 'files_contat.txt'}
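
Unlike `query`, `query_with_sources` returns a dictionary, as the output above shows, with the keys `answer`, `question` and `sources`. A small sketch for unpacking it follows. It assumes `sources` is a string that may list several files separated by commas, which the single-file output here does not confirm; the helper `summarize_result` is hypothetical.

```python
def summarize_result(result):
    """Return (answer, sources) from a query_with_sources style dict.

    Assumption: 'sources' is a comma-separated string of file names.
    Missing keys are treated as empty strings.
    """
    answer = result.get("answer", "").strip()
    raw_sources = result.get("sources", "")
    sources = [s.strip() for s in raw_sources.split(",") if s.strip()]
    return answer, sources
```

For the result above, this would return the stripped answer text together with a one-element list containing `files_contat.txt`.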


In [13]:
questions = generated_text['answer']
query = f"""
The following is an ML engineer interview. Interviewee asked 3 the most \
wide-spread technical questions about machine learning. Interviewer should \
answer all the questions - {questions} - 25 words maximum for each answer.

Interviewee: {questions}
Interviewer:
"""

generated_text = index.query_with_sources(query)
pprint.pprint(generated_text)

{'answer': '\n'
           '1. Supervised, Unsupervised, and Reinforcement Learning.\n'
           '2. Bias is the difference between the expected and actual values '
           "of the model. Variance is the measure of how the model's "
           'predictions vary for different data points.\n'
           '3. Clustering is an unsupervised learning technique that groups '
           'data points into clusters based on their similarity.\n',
 'question': '\n'
             'The following is an ML engineer interview. Interviewee asked 3 '
             'the most wide-spread technical questions about machine learning. '
             'Interviewer should answer all the questions - \n'
             'Questions about machine learning could include:\n'
             '1. What are the different types of machine learning?\n'
             '2. What is bias and variance in machine learning?\n'
             '3. What is clustering in machine learning?\n'
             ' - 25 words maximum for each answer.\n

END

References:<br>
https://pieriantraining.com/the-complete-guide-for-using-the-openai-python-api/<br>
https://python.langchain.com/en/latest/use_cases/question_answering.html<br>