# Define the documents
document1 = "The quick brown fox jumped over the lazy dog ."
document2 = "The lazy dog slept in the sun ."

# Step 1: Tokenize the documents
# Convert each document to lowercase and split it into words
tokens1 = document1.lower().split()
tokens2 = document2.lower().split()

# Combine the tokens into a list of unique terms
terms = list(set(tokens1 + tokens2))

# Step 2: Build the inverted index
# Create an empty dictionary to store the inverted index
inverted_index = {}

# For each term, find the documents that contain it
for term in terms:
    documents = []
    if term in tokens1:
        documents.append("Document 1")
    if term in tokens2:
        documents.append("Document 2")
    inverted_index[term] = documents

# Step 3: Print the inverted index
print("Inverted Index:")
for term, documents in inverted_index.items():
    print(term, "->", ", ".join(documents))

# Step 4: Search Query
query = input("\nEnter your search query: ").lower()  # Get the search query from the user
query_terms = query.split()  # Split query into individual terms

# Find the documents for the query
result_docs = set()  # To store the matching documents

# Iterate over the query terms and retrieve documents
for term in query_terms:
    if term in inverted_index:
        result_docs.update(inverted_index[term])  # Add documents that contain the query term

# Step 5: Display the results
if result_docs:
    print("\nDocuments matching the query:")
    for doc in result_docs:
        print(doc)
else:
    print("\nNo documents found for the query.")


Inverted Index:
fox -> Document 1
slept -> Document 2
in -> Document 2
. -> Document 1, Document 2
quick -> Document 1
the -> Document 1, Document 2
lazy -> Document 1, Document 2
sun -> Document 2
jumped -> Document 1
over -> Document 1
dog -> Document 1, Document 2
brown -> Document 1

Enter your search query: dog

Documents matching the query:
Document 1
Document 2


# Explanation:
1. Inverted Index Construction: The inverted index maps each word to the documents in which it appears.
2. Search Input: We prompt the user to enter a search query.
   - The query is converted to lowercase and split into individual terms.
3. Search Query Processing: We look up each query term in the inverted index and collect the documents that contain it.
4. Result Display: If any matching documents are found, they are displayed. Otherwise, a message indicates that no matches were found.

# How it Works:
- Inverted Index: The program builds an index of words (terms) and the documents in which those words appear.
- Search: The user enters a search query, and the program collects every document that contains at least one of the query words (an OR query).
- Results: The documents containing any of the query terms are displayed.

### **Detailed Explanation of the Code**

The given program implements **document retrieval using inverted files**. Below is a step-by-step breakdown of the code, along with the underlying theory and a short explanation of each function used.

---

### **1. Problem Statement**
**Goal:** 
Retrieve documents that contain specific search terms by constructing an **inverted index**, which maps each word (term) to the documents in which it appears. The user can input a query, and the program will output the relevant documents.

---

### **2. Theory: Inverted Index**

#### What is an Inverted Index?
An **inverted index** is a data structure used in information retrieval systems (like search engines). It maps terms (words) to the locations (documents) where they appear. This allows for efficient retrieval of documents based on a query.

For example:
- Documents: 
  - Doc 1: "The quick brown fox jumped over the lazy dog."
  - Doc 2: "The lazy dog slept in the sun."
- Inverted Index:
  ```
  "the" -> Document 1, Document 2
  "lazy" -> Document 1, Document 2
  "dog" -> Document 1, Document 2
  "fox" -> Document 1
  "sun" -> Document 2
  ```
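The same mapping can be built for any number of documents. The following is a minimal sketch, assuming the documents are given as a dict from a document name to its text and are tokenized with the same `lower().split()` rule the program uses. The function name `build_inverted_index` is hypothetical:

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase, whitespace-separated token to the names of the documents containing it."""
    index = defaultdict(set)  # a missing term starts with an empty set of documents
    for name, text in docs.items():
        for token in text.lower().split():
            index[token].add(name)
    return dict(index)


docs = {
    "Document 1": "The quick brown fox jumped over the lazy dog .",
    "Document 2": "The lazy dog slept in the sun .",
}
index = build_inverted_index(docs)
for term in ["the", "lazy", "dog", "fox", "sun"]:
    print(term, "->", ", ".join(sorted(index[term])))
```

Storing each posting list as a set means a document is recorded only once per term, even if the word occurs several times in it (as "the" does).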

#### Applications:
- Search engines (Google, Bing)
- Document retrieval systems
- Natural Language Processing tasks

---

### **3. Step-by-Step Code Explanation**

#### Step 1: Define Documents
```python
document1 = "The quick brown fox jumped over the lazy dog ."
document2 = "The lazy dog slept in the sun ."
```
- **Purpose**: Define two sample documents (as strings) that will be processed to build the inverted index.
- **Details**:
  - `document1` and `document2` are plain strings representing the text in two documents.

#### Step 2: Tokenization
```python
tokens1 = document1.lower().split()
tokens2 = document2.lower().split()
```
- **Purpose**: Break each document into individual words (tokens) for analysis.
- **Details**:
  - `.lower()` converts all text to lowercase, ensuring that words like "Dog" and "dog" are treated as the same.
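  - `.split()` with no arguments splits the string on runs of whitespace. Punctuation is not removed, so the period, which is separated by a space in both sample strings, becomes its own token `"."`.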
- **Example Output**:
  - `tokens1 = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog', '.']`
  - `tokens2 = ['the', 'lazy', 'dog', 'slept', 'in', 'the', 'sun', '.']`
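Because `.split()` only breaks on whitespace, text written as `"dog."` would produce the token `"dog."`, which never matches a query for `dog`. One common alternative is a regular-expression tokenizer that keeps only letters, digits, and apostrophes. This is offered as an assumption and is not part of the original program. The helper name `tokenize` is hypothetical, and the pattern below handles only ASCII letters:

```python
import re

# Hypothetical tokenizer: keep runs of ASCII letters, digits, or apostrophes; drop everything else.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_PATTERN.findall(text.lower())


print("The lazy dog.".lower().split())  # prints ['the', 'lazy', 'dog.']
print(tokenize("The lazy dog."))         # prints ['the', 'lazy', 'dog']
print(tokenize("The lazy dog slept in the sun ."))
```

The trade-off is that punctuation tokens such as `"."` no longer appear in the index at all.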

#### Step 3: Combine Unique Terms
```python
terms = list(set(tokens1 + tokens2))
```
- **Purpose**: Create a list of unique terms across both documents.
- **Details**:
  - `tokens1 + tokens2` combines the tokens from both documents into a single list.
  - `set()` removes duplicate terms.
  - `list()` converts the set back into a list for further processing.
- **Example Output**:
  - `terms = ['quick', 'dog', 'slept', 'lazy', 'jumped', 'in', 'brown', 'fox', '.', 'over', 'the', 'sun']` (the order may differ from run to run, because sets are unordered)
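If a stable order is preferred, one option is to sort the unique terms. The expression `set(tokens1) | set(tokens2)` is an equivalent way to form the union of the two token lists without first concatenating them:

```python
tokens1 = "The quick brown fox jumped over the lazy dog .".lower().split()
tokens2 = "The lazy dog slept in the sun .".lower().split()

# Union of the two token sets, then sort for a deterministic, alphabetical order.
terms = sorted(set(tokens1) | set(tokens2))
print(terms)
```

`"."` sorts first because its character code is lower than any lowercase letter's.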

#### Step 4: Build the Inverted Index
```python
inverted_index = {}

for term in terms:
    documents = []
    if term in tokens1:
        documents.append("Document 1")
    if term in tokens2:
        documents.append("Document 2")
    inverted_index[term] = documents
```
- **Purpose**: Create a dictionary where each term maps to the documents that contain it.
- **Details**:
  - `inverted_index = {}` initializes an empty dictionary.
  - For each term:
    - Check if it appears in `tokens1` (Document 1) and/or `tokens2` (Document 2).
    - Add the document's name (`"Document 1"` or `"Document 2"`) to the `documents` list.
  - Store the list of documents for each term in the dictionary.
- **Example Output**:
  ```python
  inverted_index = {
      'quick': ['Document 1'],
      'dog': ['Document 1', 'Document 2'],
      'slept': ['Document 2'],
      'lazy': ['Document 1', 'Document 2'],
      ...
  }
  ```
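Each `term in tokens1` test scans the whole token list, so the loop's work grows with the number of terms times the document length. For larger inputs, one option is to convert each token list to a set first. This gives average constant-time membership checks and produces the same mapping. The following sketch shows that variant written as a dict comprehension:

```python
tokens1 = "The quick brown fox jumped over the lazy dog .".lower().split()
tokens2 = "The lazy dog slept in the sun .".lower().split()

# Sets give average O(1) membership tests instead of scanning a list.
token_sets = {"Document 1": set(tokens1), "Document 2": set(tokens2)}

# For every unique term, list the documents (in insertion order) whose token set contains it.
inverted_index = {
    term: [name for name, tokens in token_sets.items() if term in tokens]
    for term in set(tokens1) | set(tokens2)
}

print(inverted_index["dog"])    # prints ['Document 1', 'Document 2']
print(inverted_index["quick"])  # prints ['Document 1']
```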

#### Step 5: Print the Inverted Index
```python
for term, documents in inverted_index.items():
    print(term, "->", ", ".join(documents))
```
- **Purpose**: Display the inverted index to the user.
- **Details**:
  - `.items()` retrieves key-value pairs (term-document mappings) from the dictionary.
  - `", ".join(documents)` converts the list of documents into a comma-separated string for readability.
- **Example Output**:
  ```
  Inverted Index:
  quick -> Document 1
  dog -> Document 1, Document 2
  slept -> Document 2
  lazy -> Document 1, Document 2
  ```
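Because the dictionary was filled while iterating over a set, its lines print in an arbitrary order. To make the listing easier to scan and identical from run to run, the terms can be printed in sorted order. The small `inverted_index` literal below is a hand-written excerpt used only for illustration:

```python
# Hand-written excerpt of the index, for illustration only.
inverted_index = {
    "quick": ["Document 1"],
    "dog": ["Document 1", "Document 2"],
    "slept": ["Document 2"],
    "lazy": ["Document 1", "Document 2"],
}

# One "term -> documents" line per term, in alphabetical order of the term.
lines = [f"{term} -> {', '.join(inverted_index[term])}" for term in sorted(inverted_index)]

print("Inverted Index:")
print("\n".join(lines))
```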

#### Step 6: Search Query
```python
query = input("\nEnter your search query: ").lower()
query_terms = query.split()
```
- **Purpose**: Take a query from the user and tokenize it.
- **Details**:
  - `input()` captures the user's input.
  - `.lower()` ensures case-insensitive matching.
  - `.split()` splits the query into individual words.
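The query must be normalized exactly as the documents were, or lookups silently miss. For example, a query typed as `Dog.` becomes the single term `dog.`, which is not a key in the index. The following minimal sketch moves the query step into a function so it can be tested without `input()`. The name `parse_query` is hypothetical:

```python
def parse_query(raw: str) -> list[str]:
    """Lowercase the query and split it on whitespace, mirroring how the documents were tokenized."""
    return raw.lower().split()


print(parse_query("  Lazy   DOG "))  # prints ['lazy', 'dog']
print(parse_query("Dog."))           # prints ['dog.'], which will not match the index key 'dog'
```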

#### Step 7: Retrieve Matching Documents
```python
result_docs = set()

for term in query_terms:
    if term in inverted_index:
        result_docs.update(inverted_index[term])
```
- **Purpose**: Identify which documents match the search query. A document matches if it contains at least one of the query terms (an OR query).
- **Details**:
  - `result_docs = set()` initializes an empty set to store unique matching documents.
  - For each term in the query:
    - Check if it exists in the inverted index.
    - If yes, add the corresponding documents to `result_docs` using `.update()`.
- **Why a set?**:
  - A set automatically eliminates duplicate documents.
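Calling `update()` once per term computes the union of the posting lists, so a document matches when it contains any query term (OR semantics). If a document should match only when it contains every query term (AND semantics), the posting sets can be intersected instead. The following sketch assumes the term-to-list structure built in Step 4. The function names are hypothetical:

```python
def search_any(index: dict[str, list[str]], terms: list[str]) -> set[str]:
    """OR query: documents containing at least one term (what the program above does)."""
    result: set[str] = set()
    for term in terms:
        result.update(index.get(term, []))
    return result


def search_all(index: dict[str, list[str]], terms: list[str]) -> set[str]:
    """AND query: documents containing every term; empty if any term is missing from the index."""
    if not terms:
        return set()
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))  # keep only documents that also contain this term
    return result


index = {
    "fox": ["Document 1"],
    "sun": ["Document 2"],
    "dog": ["Document 1", "Document 2"],
}
print(sorted(search_any(index, ["fox", "sun"])))  # prints ['Document 1', 'Document 2']
print(sorted(search_all(index, ["fox", "sun"])))  # prints []
print(sorted(search_all(index, ["fox", "dog"])))  # prints ['Document 1']
```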

#### Step 8: Display Results
```python
if result_docs:
    print("\nDocuments matching the query:")
    for doc in result_docs:
        print(doc)
else:
    print("\nNo documents found for the query.")
```
- **Purpose**: Print the list of documents that match the query or indicate no matches.
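Iterating over a set prints the matches in an arbitrary order. A simple refinement is to order results by how many distinct query terms each document contains, breaking ties by name. This is an illustration and not part of the original program. It is a plain match count, not a weighted scheme such as TF-IDF, and the name `rank_results` is hypothetical:

```python
from collections import Counter


def rank_results(index: dict[str, list[str]], terms: list[str]) -> list[tuple[str, int]]:
    """Return (document, matched-term count) pairs, most matches first, then by name."""
    counts = Counter()
    for term in set(terms):  # count each distinct query term once
        counts.update(index.get(term, []))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


index = {
    "fox": ["Document 1"],
    "lazy": ["Document 1", "Document 2"],
    "sun": ["Document 2"],
}
ranked = rank_results(index, ["lazy", "fox"])
if ranked:
    print("\nDocuments matching the query:")
    for doc, hits in ranked:
        print(f"{doc} ({hits} matching term(s))")
else:
    print("\nNo documents found for the query.")
```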

---

### **4. Functions and Constructs Used**
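1. **`str.split()`**: Splits a string into a list of words on whitespace.
2. **`str.lower()`**: Converts a string to lowercase.
3. **`set()`**: Creates a collection of unique items, discarding duplicates.
4. **`dict.items()`**: Returns the key-value pairs of a dictionary.
5. **`set.update()`**: Adds every element of an iterable to a set, ignoring elements already present.
6. **`str.join()`**: Concatenates a list of strings with a separator (here `", "`).
7. **`input()`**: Reads a line of text typed by the user.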
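The following short demonstration runs each of these constructs on small, illustrative inputs (`input()` is left out because it waits for the keyboard):

```python
words = "The Lazy  Dog".split()                # runs of whitespace act as a single separator
print(words)                                   # prints ['The', 'Lazy', 'Dog']
print("The Lazy Dog".lower())                  # prints the lazy dog
print(sorted(set(["dog", "the", "dog"])))      # prints ['dog', 'the']

postings = {"dog": ["Document 1", "Document 2"]}
for term, docs in postings.items():            # iterate over key-value pairs
    print(term, "->", ", ".join(docs))         # prints dog -> Document 1, Document 2

seen = {"Document 1"}
seen.update(["Document 1", "Document 2"])      # the duplicate "Document 1" is ignored
print(sorted(seen))                            # prints ['Document 1', 'Document 2']
```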

---

### **5. Example Run**
#### Input Documents:
- Document 1: "The quick brown fox jumped over the lazy dog ."
- Document 2: "The lazy dog slept in the sun ."

#### Inverted Index:
```
quick -> Document 1
dog -> Document 1, Document 2
slept -> Document 2
lazy -> Document 1, Document 2
...
```

#### Query:
```
Enter your search query: dog lazy
```

#### Output:
```
Documents matching the query:
Document 1
Document 2
```
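To reproduce this run without typing at a prompt (for example, in an automated test), the query can be passed in as a string instead of being read with `input()`. The following self-contained sketch of the whole pipeline works under that assumption. The name `search` is hypothetical:

```python
documents = {
    "Document 1": "The quick brown fox jumped over the lazy dog .",
    "Document 2": "The lazy dog slept in the sun .",
}

# Build the inverted index: term -> list of document names, in document order, without repeats.
inverted_index: dict[str, list[str]] = {}
for name, text in documents.items():
    for token in text.lower().split():
        postings = inverted_index.setdefault(token, [])
        if name not in postings:
            postings.append(name)


def search(query: str) -> list[str]:
    """OR query over the index, returning matching document names in sorted order."""
    matches: set[str] = set()
    for term in query.lower().split():
        matches.update(inverted_index.get(term, []))
    return sorted(matches)


print("Documents matching the query:")
for doc in search("dog lazy"):
    print(doc)
```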

---
