# Document Retrieval System using Inverted Files

## Problem Statement
Implement a program for retrieval of documents using inverted files.

### Background

- **Document Data**: We have a collection of text documents, each represented as a string. In this specific problem, two sample documents are given:

  - Document 1: "The quick brown fox jumped over the lazy dog."
  - Document 2: "The lazy dog slept in the sun."

### Problem Description

The goal is to build a document retrieval system based on an inverted file (also called an inverted index). The work breaks down into the following steps:

1. **Tokenization**: Tokenize the text documents into individual terms (words) and convert them to lowercase. Create a list of unique terms by merging the terms from all documents.

2. **Inverted Index Construction**: Build an inverted index data structure that maps terms to the documents in which they appear. For each term, identify the documents containing that term.

3. **Print Inverted Index**: Display the inverted index, showing each term and the list of documents in which it appears.

### Input

The input consists of two text documents (document1 and document2), each represented as a string.

### Output

The program produces an inverted index, where each term is associated with the documents that contain it.

In [1]:
document1 = "The quick brown fox jumped over the lazy dog."
document2 = "The lazy dog slept in the sun."

### Tokenizing the documents

In [2]:
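tokens1 = document1.lower().split()  # lowercase, then split on whitespace (punctuation stays attached)
tokens2 = document2.lower().split()

terms = list(set(tokens1 + tokens2))  # merge both token lists and drop duplicates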
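Because `split()` separates only on whitespace, the trailing period in "dog." stays part of the token, so "dog." and "dog" count as different terms. The following is a minimal sketch of one possible alternative, assuming a term is any run of lowercase letters, digits, or apostrophes. The `tokenize` helper and the `clean_*` names are hypothetical and not part of the original program.

```python
import re

def tokenize(text):
    """Lowercase the text and return its word tokens, dropping punctuation.

    Assumption: a term is a run of letters, digits, or apostrophes.
    """
    return re.findall(r"[a-z0-9']+", text.lower())

document1 = "The quick brown fox jumped over the lazy dog."
document2 = "The lazy dog slept in the sun."

clean_tokens1 = tokenize(document1)
clean_tokens2 = tokenize(document2)

# Sorting makes the vocabulary order reproducible between runs.
clean_terms = sorted(set(clean_tokens1 + clean_tokens2))
print(clean_terms)
```

With this tokenizer both documents contribute the same term `dog`, and the vocabulary shrinks from 12 terms to 11.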

### Building the inverted index

In [3]:
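inverted_index = {}  # maps each term to the list of documents containing it

for term in terms:
    documents = []

    if term in tokens1:  # term occurs in Document 1
        documents.append("Document 1")
    if term in tokens2:  # term occurs in Document 2
        documents.append("Document 2")

    inverted_index[term] = documents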
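The loop above is tied to exactly two documents and scans a token list for every term. As a hedged sketch of how the same index could be built for any number of documents in a single pass, the example below uses `collections.defaultdict`. The `build_inverted_index` function and the `docs` dictionary are illustrative names, and the tokenization is kept identical to the original (`lower().split()`).

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        # Same tokenization as the original cells: lowercase + whitespace split.
        for term in text.lower().split():
            index[term].add(name)
    # Convert the sets to sorted lists so the result is stable and printable.
    return {term: sorted(names) for term, names in index.items()}

docs = {
    "Document 1": "The quick brown fox jumped over the lazy dog.",
    "Document 2": "The lazy dog slept in the sun.",
}

index = build_inverted_index(docs)
print(index["lazy"])  # ['Document 1', 'Document 2']
```

Each token is visited once, and adding a third document only requires another entry in `docs`.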

### Printing the inverted index

In [4]:
for term, documents in inverted_index.items():
    print(term, "->", ", ".join(documents))

lazy -> Document 1, Document 2
in -> Document 2
dog -> Document 2
quick -> Document 1
over -> Document 1
fox -> Document 1
slept -> Document 2
sun. -> Document 2
dog. -> Document 1
the -> Document 1, Document 2
jumped -> Document 1
brown -> Document 1
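Once the index exists, retrieval is a lookup rather than a scan of the documents. The sketch below shows one simple way to answer a multi-term query, assuming boolean AND semantics (a document matches only if it contains every query term). The `search` function is a hypothetical helper, and the dictionary is a hand-copied subset of the index printed above.

```python
def search(index, query):
    """Return the documents that contain every term of the query (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    # One posting set per query term; unknown terms yield an empty set.
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))

# Subset of the inverted index shown in the output above.
inverted_index = {
    "lazy": ["Document 1", "Document 2"],
    "the": ["Document 1", "Document 2"],
    "fox": ["Document 1"],
    "slept": ["Document 2"],
    "dog": ["Document 2"],
    "dog.": ["Document 1"],
}

print(search(inverted_index, "lazy fox"))  # ['Document 1']
print(search(inverted_index, "dog"))       # ['Document 2']
```

The query `dog` returns only Document 2 because Document 1 is indexed under `dog.` with its trailing period, which shows how tokenization choices directly affect retrieval results.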
