
In [1]:
# Import necessary libraries
import re
from collections import defaultdict

In [2]:
# Step 1: Document Insertion
documents = {
    1: "Information retrieval is the process of obtaining information from large document collections.",
    2: "Boolean retrieval uses AND, OR, and NOT operators to find relevant documents.",
    3: "The inverted index is a key component of the information retrieval system.",
    4: "Indexing helps retrieve documents quickly by mapping terms to document IDs.",
    5: "Retrieval systems improve the efficiency of search engines."
}

In [3]:
# Step 2: Tokenization and Preprocessing
def preprocess(text):
    # Convert text to lowercase and remove punctuation
    text = text.lower()
    text = re.sub(r'\W+', ' ', text)
    return text.split()
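
A quick check of what `preprocess` produces can make the later index output easier to read. The sketch below repeats the function so it runs on its own; the `sample` string is a shortened, illustrative version of document 2. Note that the operator words AND, OR, and NOT come out as ordinary lowercase tokens, which is why they show up as terms in the index.

```python
import re

def preprocess(text):
    # Same steps as the cell above: lowercase, turn every run of
    # non-word characters into one space, then split on whitespace
    text = text.lower()
    text = re.sub(r'\W+', ' ', text)
    return text.split()

# Illustrative sample (a shortened form of document 2)
sample = "Boolean retrieval uses AND, OR, and NOT operators."
tokens = preprocess(sample)
print(tokens)
# ['boolean', 'retrieval', 'uses', 'and', 'or', 'and', 'not', 'operators']

# Hyphens are non-word characters, so hyphenated words are split apart
hyphenated = preprocess("state-of-the-art")
print(hyphenated)
# ['state', 'of', 'the', 'art']
```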

In [4]:
# Step 3: Building the Dictionary and Inverted Index
def build_inverted_index(documents):
    inverted_index = defaultdict(list)
    for doc_id, text in documents.items():
        tokens = preprocess(text)
        for token in tokens:
            if doc_id not in inverted_index[token]:
                inverted_index[token].append(doc_id)
    return inverted_index
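
The `doc_id not in inverted_index[token]` test scans the posting list for every token, which gets slower as lists grow. One possible alternative, sketched below, collects IDs in sets and converts them to sorted lists at the end. The name `build_inverted_index_sets` and the two-document `docs` sample are assumptions made for illustration, not part of the notebook. Returning a plain `dict` also means that looking up an unknown term does not silently create an empty entry, as it would with a `defaultdict`.

```python
import re
from collections import defaultdict

def preprocess(text):
    # Lowercase, replace non-word runs with a space, split into tokens
    return re.sub(r'\W+', ' ', text.lower()).split()

def build_inverted_index_sets(documents):
    # Hypothetical variant: sets remove duplicates without a linear scan
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for token in preprocess(text):
            postings[token].add(doc_id)
    # Convert to a plain dict of ascending posting lists
    return {term: sorted(ids) for term, ids in postings.items()}

# Illustrative two-document collection
docs = {
    1: "Information retrieval is the process of obtaining information.",
    2: "Boolean retrieval uses AND, OR, and NOT operators.",
}
index = build_inverted_index_sets(docs)
print(index["retrieval"])    # [1, 2]
print(index["information"])  # [1]
print("missing" in index)    # False
```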



In [9]:
# Step 4: Boolean Retrieval
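def boolean_query(inverted_index, query):
    # Accepts a single term or "term1 OP term2" (OP is AND, OR, or NOT);
    # "a NOT b" returns documents that contain a but not b
    query_terms = query.lower().split()
    if len(query_terms) == 1:
        # Single term: return its posting list as stored in the index
        return inverted_index.get(query_terms[0], [])
    if len(query_terms) != 3:
        raise ValueError("expected 'term' or 'term1 AND|OR|NOT term2'")
    # .get() avoids inserting empty entries into the defaultdict for unknown terms
    left = set(inverted_index.get(query_terms[0], []))
    right = set(inverted_index.get(query_terms[2], []))
    operator = query_terms[1]
    if operator == 'and':
        return left & right
    elif operator == 'or':
        return left | right
    elif operator == 'not':
        return left - right
    raise ValueError(f"unknown operator: {query_terms[1]}")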
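
`boolean_query` converts each posting list to a set before combining them. When posting lists are kept in ascending order, as they are here because documents are indexed in ID order, AND can also be computed with a two-pointer merge that walks both lists once. The helper name `intersect_postings` is hypothetical; the posting lists are taken from this notebook's index.

```python
def intersect_postings(p1, p2):
    """Intersect two ascending posting lists with a two-pointer merge.

    Assumes both lists are sorted in ascending order and free of duplicates.
    """
    i, j = 0, 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Document appears in both lists
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

# Posting lists for "retrieval", "system", "information", and "the"
retrieval = [1, 2, 3, 5]
system = [3]
information = [1, 3]
the = [1, 3, 5]

print(intersect_postings(retrieval, system))  # [3]
print(intersect_postings(information, the))   # [1, 3]
```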

In [10]:
# Step 5: Queries that chain several terms with one operator type
def parse_complex_query(inverted_index, query):
    query = query.lower()
    # Limitation: parentheses, precedence, and mixed operators are not handled.
    # The query is split on the first operator type found, so in
    # "a AND b OR c" the string "b or c" is looked up as one term (no match).
    def postings(term):
        # .get() avoids inserting empty entries into the defaultdict
        return set(inverted_index.get(term.strip(), []))
    if ' and ' in query:
        return set.intersection(*map(postings, query.split(' and ')))
    elif ' or ' in query:
        return set.union(*map(postings, query.split(' or ')))
    elif ' not ' in query:
        first, *rest = query.split(' not ')
        return postings(first).difference(*map(postings, rest))
    else:
        return postings(query)


In [20]:
# Step 6: Building Inverted Index and Dictionary
inverted_index = build_inverted_index(documents)

# Print the Inverted Index (term: [doc_ids]) line by line
print("Inverted Index:")
for term, doc_ids in inverted_index.items():
    print(f"{term}: {doc_ids}")

# Print the Dictionary (Unique Words) line by line
print("\nDictionary (Unique Words):")
for word in inverted_index:
    print(word)

Inverted Index:
information: [1, 3]
retrieval: [1, 2, 3, 5]
is: [1, 3]
the: [1, 3, 5]
process: [1]
of: [1, 3, 5]
obtaining: [1]
from: [1]
large: [1]
document: [1, 4]
collections: [1]
boolean: [2]
uses: [2]
and: [2]
or: [2]
not: [2]
operators: [2]
to: [2, 4]
find: [2]
relevant: [2]
documents: [2, 4]
inverted: [3]
index: [3]
a: [3]
key: [3]
component: [3]
system: [3]
indexing: [4]
helps: [4]
retrieve: [4]
quickly: [4]
by: [4]
mapping: [4]
terms: [4]
ids: [4]
systems: [5]
improve: [5]
efficiency: [5]
search: [5]
engines: [5]

Dictionary (Unique Words):
information
retrieval
is
the
process
of
obtaining
from
large
document
collections
boolean
uses
and
or
not
operators
to
find
relevant
documents
inverted
index
a
key
component
system
indexing
helps
retrieve
quickly
by
mapping
terms
ids
systems
improve
efficiency
search
engines


In [13]:
# Step 7: Testing Simple Boolean Queries
query_result = boolean_query(inverted_index, "retrieval AND system")
print("Query Result (retrieval AND system):", query_result)

Query Result (retrieval AND system): {3}


In [14]:
# Step 8: Testing Complex Boolean Queries
complex_query_result = parse_complex_query(inverted_index, "retrieval AND system OR document")
print("Complex Query Result (retrieval AND system OR document):", complex_query_result)

Complex Query Result (retrieval AND system OR document): set()
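
The empty set follows from how `parse_complex_query` splits the query: after splitting on " and ", the right-hand piece "system or document" is looked up as a single term, which is not in the index. A minimal sketch of a precedence-aware evaluator (NOT binds tighter than AND, which binds tighter than OR, with parentheses for grouping) is shown below. The names `tokenize` and `evaluate` are hypothetical, the small `index` copies four posting lists from the output above, and NOT is treated as a unary complement over `all_docs`, so "a NOT b" would be written "a AND NOT b". The words and, or, and not are reserved as operators in this sketch and cannot be searched as terms.

```python
import re

def tokenize(query):
    # Lowercase, then pull out parentheses and words
    return re.findall(r'\(|\)|\w+', query.lower())

def evaluate(query, index, all_docs):
    """Evaluate a Boolean query with a small recursive-descent parser."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # Lowest precedence: operands are AND-expressions
        result = parse_and()
        while peek() == 'or':
            advance()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_not()
        while peek() == 'and':
            advance()
            result = result & parse_not()
        return result

    def parse_not():
        # Unary NOT: complement relative to the whole collection
        if peek() == 'not':
            advance()
            return all_docs - parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None or tok in ('and', 'or', ')'):
            raise ValueError(f"unexpected token in query: {tok!r}")
        advance()
        if tok == '(':
            result = parse_or()
            if peek() != ')':
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        # Plain term: unknown terms match no documents
        return set(index.get(tok, []))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token in query: {tokens[pos]!r}")
    return result

# Posting lists copied from the inverted index printed earlier
index = {
    "retrieval": [1, 2, 3, 5],
    "system": [3],
    "document": [1, 4],
    "information": [1, 3],
}
all_docs = {1, 2, 3, 4, 5}

r1 = evaluate("retrieval AND system OR document", index, all_docs)
r2 = evaluate("retrieval AND (system OR document)", index, all_docs)
r3 = evaluate("retrieval AND NOT information", index, all_docs)
print(sorted(r1))  # [1, 3, 4]
print(sorted(r2))  # [1, 3]
print(sorted(r3))  # [2, 5]
```

Under this grouping the query from Step 8 is read as (retrieval AND system) OR document, which gives documents 1, 3, and 4 rather than an empty set.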


In [16]:
# Step 9: User Input for Queries (Optional for Interactive Use)
user_query = input("Enter a Boolean query (AND, OR, NOT): ")
user_query_result = boolean_query(inverted_index, user_query)
print(f"User Query Result ({user_query}):", user_query_result)

Enter a Boolean query (AND, OR, NOT): information
User Query Result (information): [1, 3]
