## Description

This Python application implements a simple question-answering (QA) system built on the retrieval step of Retrieval-Augmented Generation (RAG). It reads unstructured text from a .txt file and, for each user question, returns the most relevant passage as the answer.

###### The system:

- Reads from a knowledge base in .txt format.
- Uses TF-IDF to vectorize both the content and the question.
- Computes cosine similarity to identify the most relevant chunk of the text.
- Returns the best-matching chunk as the answer.
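The steps listed here can be tried on a tiny example before looking at the full notebook. The sketch below is only an illustration: `toy_chunks` and `toy_question` are made-up strings, not content from the knowledge base file, and all `toy_*` names are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy knowledge base, not taken from the real .txt file.
toy_chunks = [
    "python is an interpreted language",
    "django is a web framework",
    "numpy handles numeric arrays",
]
toy_question = "which framework is used for web development"

# Fit TF-IDF on the chunks plus the question so they share one vocabulary.
toy_vectorizer = TfidfVectorizer()
toy_vectors = toy_vectorizer.fit_transform(toy_chunks + [toy_question])

# Compare the question (last row) with every chunk (all other rows).
toy_scores = cosine_similarity(toy_vectors[-1], toy_vectors[:-1])[0]
toy_best = int(toy_scores.argmax())

print(toy_scores.round(2))   # one similarity score per chunk
print(toy_chunks[toy_best])  # the chunk that shares the most weighted terms
```

The second chunk wins here because it shares "framework", "web" and "is" with the question, while the third chunk shares no term at all and scores 0.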



### Installing and Importing the necessary libraries

In [1]:
pip install nltk scikit-learn

Note: you may need to restart the kernel to use updated packages.


In [2]:
import nltk
import string
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Download NLTK tokenizer
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\musad\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [3]:
# load_text reads the file and splits its contents into sentence-level chunks.
def load_text(file_path):
    """Read a UTF-8 text file and return its sentences as a list of strings."""
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    # Split the raw text into sentences with the NLTK Punkt tokenizer.
    chunks = nltk.sent_tokenize(text)
    return chunks

# Load the knowledge base and print the resulting list of chunks.
chunks = load_text('python1.txt')
print(chunks)


['\n\nPython is a high-level, interpreted programming language known for its simplicity and readability.', 'It was created by Guido van Rossum and first released in 1991.', 'Key Features of Python: Easy to Read & Write Syntax is clean and similar to English, Interpreted Language Python code is executed line-by-line which makes debugging easier,\ndynamically Typed You don’t need to declare data types explicitly, Object-Oriented & Functional Supports multiple programming paradigms,\nLarge Standard Library Comes with many modules and packages (like math, datetime, os, etc.', '), Cross-Platform Works on Windows, Mac, and Linux.', 'Common Uses of Python: Web Development (with frameworks like Django, Flask), Data Science & Machine Learning (using NumPy, pandas, scikit-learn), Automation & Scripting, Game Development,\nAPIs and Backend Services, Artificial Intelligence & Deep Learning.']
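In the output above, the sentence tokenizer splits after "etc.", so the "Key Features" passage ends up in two chunks. If the knowledge base separates topics with blank lines (an assumption about the file layout, not something the notebook confirms), splitting on paragraphs is one possible alternative. `split_paragraphs` below is a hypothetical helper, shown as a minimal sketch:

```python
import re

def split_paragraphs(raw_text):
    """Split text on blank lines and drop empty pieces (hypothetical helper)."""
    pieces = re.split(r"\n\s*\n", raw_text)
    return [piece.strip() for piece in pieces if piece.strip()]

# Made-up sample text; an abbreviation such as "etc." no longer ends a chunk.
sample = "First paragraph, e.g. with etc. inside.\n\nSecond paragraph.\n"
print(split_paragraphs(sample))
# ['First paragraph, e.g. with etc. inside.', 'Second paragraph.']
```

Paragraph chunks are longer than sentences, so answers become less precise; which granularity works better depends on how the source file is written.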


In [4]:
# preprocess normalizes text so that chunks and questions are compared consistently.
# string.punctuation contains the ASCII punctuation characters:
# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
def preprocess(text):
    """Lowercase the text and remove ASCII punctuation.

    text.lower() makes matching case-insensitive, so "Python" and "python"
    are treated the same.

    str.maketrans('', '', string.punctuation) builds a translation table that
    deletes every punctuation character; translate() then applies that table.
    """
    return text.lower().translate(str.maketrans('', '', string.punctuation))
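A quick check of what `preprocess` does to a few inputs. The function is repeated so the snippet runs on its own; the sample strings are made up for illustration.

```python
import string

def preprocess(text):
    # Same body as the notebook's preprocess(): lowercase, then drop ASCII punctuation.
    return text.lower().translate(str.maketrans('', '', string.punctuation))

print(preprocess("Who created Python?"))  # who created python
print(preprocess("It's scikit-learn!"))   # its scikitlearn
print(preprocess("don’t"))                # don’t (the curly apostrophe is not ASCII)
```

Note that hyphenated words are fused ("scikitlearn") and that typographic quotes such as ’ are left in place, because `string.punctuation` only covers ASCII characters.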


In [5]:
# retrieve_answer takes the user's question (a string) and the list of text chunks
# (sentences from the .txt file) and returns the chunk most similar to the question.
def retrieve_answer(question, chunks):
    # Preprocess every chunk (lowercasing and punctuation removal).
    processed_chunks = [preprocess(chunk) for chunk in chunks]

    # Preprocess the question the same way.
    processed_question = preprocess(question)

    # TF-IDF (Term Frequency - Inverse Document Frequency) represents text as numeric vectors.
    vectorizer = TfidfVectorizer()

    # Vectorize the chunks and the question together; the last row is the question.
    vectors = vectorizer.fit_transform(processed_chunks + [processed_question])

    # Cosine similarity between the question vector (vectors[-1]) and every
    # chunk vector (vectors[:-1]); the result is a 1 x len(chunks) array of scores.
    similarity = cosine_similarity(vectors[-1], vectors[:-1])

    # argmax() returns the position of the highest score. If no chunk shares a
    # term with the question, all scores are 0 and index 0 is returned.
    best_match_index = similarity.argmax()

    # Return the original (non-preprocessed) chunk that best matches the question.
    return chunks[best_match_index]
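`retrieve_answer` refits the vectorizer on every call. For a larger knowledge base, one possible variation is to fit TF-IDF on the chunks once and only transform each question. The sketch below assumes that design; `build_index`, `retrieve_from_index`, `_normalize` and the `demo_*` names are hypothetical and not part of the notebook.

```python
import string
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def _normalize(text):
    # Same normalization as preprocess(): lowercase and strip ASCII punctuation.
    return text.lower().translate(str.maketrans('', '', string.punctuation))

def build_index(chunks):
    """Fit TF-IDF on the chunks once and return (vectorizer, chunk_vectors)."""
    vectorizer = TfidfVectorizer()
    chunk_vectors = vectorizer.fit_transform([_normalize(c) for c in chunks])
    return vectorizer, chunk_vectors

def retrieve_from_index(question, chunks, vectorizer, chunk_vectors):
    """Map the question into the fitted vocabulary and return the best chunk."""
    question_vector = vectorizer.transform([_normalize(question)])
    scores = cosine_similarity(question_vector, chunk_vectors)[0]
    return chunks[int(scores.argmax())]

# Made-up two-sentence knowledge base for the demonstration.
demo_chunks = [
    "Python is a high-level, interpreted programming language.",
    "It was created by Guido van Rossum and first released in 1991.",
]
demo_vectorizer, demo_vectors = build_index(demo_chunks)
demo_answer = retrieve_from_index(
    "When was it first released?", demo_chunks, demo_vectorizer, demo_vectors
)
print(demo_answer)
```

Scores can differ slightly from `retrieve_answer`, because the question no longer contributes to the IDF statistics and question words missing from the chunks are simply ignored.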

In [6]:
# main() starts the QA system and handles user interaction in a loop.
def main():
    # Load python1.txt and split it into sentence-level chunks.
    chunks = load_text("python1.txt")

    # Tell the user how to leave the session.
    print("Question and Answer System Initialized. Type 'exit' to quit.")

    # Keep prompting until the user types 'exit'.
    while True:
        question = input("\nAsk a question: ")
        if question.strip().lower() == 'exit':
            break

        # Retrieve the most relevant chunk and print it as the answer.
        answer = retrieve_answer(question, chunks)
        print(f"\nAnswer: {answer}")

# Run main() only when the script is executed directly, not when it is imported.
if __name__ == "__main__":
    main()

Question and Answer System Initialized. Type 'exit' to quit.



Ask a question:  what is python?



Answer: 

Python is a high-level, interpreted programming language known for its simplicity and readability.



Ask a question:  who created python?



Answer: It was created by Guido van Rossum and first released in 1991.



Ask a question:  what are the uses of python?



Answer: Common Uses of Python: Web Development (with frameworks like Django, Flask), Data Science & Machine Learning (using NumPy, pandas, scikit-learn), Automation & Scripting, Game Development,
APIs and Backend Services, Artificial Intelligence & Deep Learning.



Ask a question:  what are the advantages of python?



Answer: Key Features of Python: Easy to Read & Write Syntax is clean and similar to English, Interpreted Language Python code is executed line-by-line which makes debugging easier,
dynamically Typed You don’t need to declare data types explicitly, Object-Oriented & Functional Supports multiple programming paradigms,
Large Standard Library Comes with many modules and packages (like math, datetime, os, etc.



Ask a question:  exot



Answer: 

Python is a high-level, interpreted programming language known for its simplicity and readability.



Ask a question:  exit
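In the session above, the mistyped input "exot" still gets an answer: no chunk contains that word, so every similarity score is 0 and `argmax()` falls back to index 0, the first chunk. One way to handle this is to return nothing when the best score is too low. The sketch below is a possible variant, not the notebook's method; `retrieve_answer_or_none` and the `min_score` cutoff of 0.1 are hypothetical and untuned.

```python
import string
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_answer_or_none(question, chunks, min_score=0.1):
    """Return the best chunk, or None when no chunk reaches min_score.

    min_score is an illustrative cutoff, not a tuned value.
    """
    table = str.maketrans('', '', string.punctuation)
    cleaned = [c.lower().translate(table) for c in chunks]
    cleaned_question = question.lower().translate(table)

    # Same vectorization as retrieve_answer: chunks first, question last.
    vectors = TfidfVectorizer().fit_transform(cleaned + [cleaned_question])
    scores = cosine_similarity(vectors[-1], vectors[:-1])[0]

    best = int(scores.argmax())
    return chunks[best] if scores[best] >= min_score else None

# Made-up two-sentence knowledge base for the demonstration.
demo_chunks = [
    "Python is a high-level, interpreted programming language.",
    "It was created by Guido van Rossum and first released in 1991.",
]
print(retrieve_answer_or_none("exot", demo_chunks))  # None
print(retrieve_answer_or_none("who created python", demo_chunks))
```

The caller can then print a message such as "No relevant answer found" when the result is `None`, instead of showing an unrelated chunk.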


![Architecture Diagram](C:\Users\musad\Desktop\architecture.jpg)