
This project creates an index like a book's index: a sorted list of words with the page numbers on which each word appears. AVL trees and tries are used as the data structures.


RoSh-23/Index-Creator


Index-Creator 📙

Note 🚩 This is a practice project made to showcase data structure skills. It has not gone through rigorous testing and should not be used in production.

  • This project creates an index similar to a textbook index 📑: it lists terms (words) together with the page numbers on which they appear. 📖
  • It takes a text and a list of terms to be indexed and produces an alphabetical index 🔠.
  • This project was inspired by problem 5.2.11 from the book 📘 "Data Structures Using C and C++", 2nd edition, by Langsam, Augenstein, and Tenenbaum.
  • I implemented and used 🌴 AVL trees and tries 🌳 as the data structures.
    Note 🚩 I have referred to the book "Data Structures using C & C++ , 2nd ed." to write code for AVL Trees and referred to internet resources to write code for Tries.
  • To create this project, I drew on concepts from both generic programming and object-oriented programming.
  • The process for creating the index is as follows ⤵️
    1. All special characters in the text, such as $, %, and !, are removed using simple tests on ASCII values.
    2. All stop words like and, or, not, why, if, … are eliminated. Each word in the output file of step 1 is looked up in a trie of stop words and discarded if it is found there.
    3. The output file of step 2 is processed next. Every word is scanned, and if it is found in a trie of terms, the word is added to one AVL tree and its page number is added to another. The AVL tree of terms and the AVL tree of page numbers are connected.
    4. The resultant AVL trees are traversed in-order to produce the index.
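Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the names `Trie`, `strip_special`, and `remove_stop_words` are hypothetical, and the sketch assumes that special characters are replaced by spaces and that the text is lowercased so that words match the lowercase trie.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical trie over the lowercase ASCII letters 'a'..'z'.
class Trie {
    struct Node {
        std::array<std::unique_ptr<Node>, 26> next{};
        bool is_word = false;
    };
    Node root_;

    static bool is_lower(char c) { return c >= 'a' && c <= 'z'; }

public:
    // Inserts a word; empty words and words with other characters are ignored.
    void insert(const std::string& word) {
        if (word.empty()) return;
        for (char c : word)
            if (!is_lower(c)) return;
        Node* cur = &root_;
        for (char c : word) {
            auto& slot = cur->next[c - 'a'];
            if (!slot) slot = std::make_unique<Node>();
            cur = slot.get();
        }
        cur->is_word = true;
    }

    // Returns true only if exactly this word was inserted earlier.
    bool contains(const std::string& word) const {
        const Node* cur = &root_;
        for (char c : word) {
            if (!is_lower(c)) return false;
            cur = cur->next[c - 'a'].get();
            if (!cur) return false;
        }
        return cur->is_word;
    }
};

// Step 1 (assumed behaviour): keep letters and digits, lowercased;
// turn every other character into a space.
std::string strip_special(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (unsigned char c : text)
        out += std::isalnum(c) ? static_cast<char>(std::tolower(c)) : ' ';
    return out;
}

// Step 2: drop every word that is found in the trie of stop words.
std::vector<std::string> remove_stop_words(const std::string& cleaned,
                                           const Trie& stop_words) {
    std::vector<std::string> kept;
    std::istringstream in(cleaned);
    std::string word;
    while (in >> word)
        if (!stop_words.contains(word)) kept.push_back(word);
    return kept;
}
```

With a stop-word trie holding "and" and "or", calling `remove_stop_words(strip_special("Trees, and Tries!"), stop)` would return the words "trees" and "tries". A trie lookup costs time proportional to the word length, regardless of how many stop words are stored.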

Usage 🛠️


Note 🚩 - The GCC C++ compiler (g++) must be installed on your machine.

Note 🚩 - Works correctly only if the words to be indexed consist of lowercase letters (no special symbols).

  • On Windows
    1. Save the source code for this project & compile it using g++.
    2. Open the command prompt and type
      name_of_executable_file inpt_file_1.txt inpt_file_2.txt output_file_1.txt
  • On Linux
    1. Save the source code for this project & compile it using g++.
    2. Open the Linux terminal and type
      ./program_name inpt_file_1.txt inpt_file_2.txt output_file_1.txt
  • Where
    1. inpt_file_1.txt: contains all the words to be indexed from the text, one term per line.
    2. inpt_file_2.txt: contains the text for which the index is created, with each new page delimited by 10 '@' symbols.
    3. output_file_1.txt: file where the resultant index will be saved.
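To make the input format concrete, here is a rough sketch of how a text with pages delimited by 10 '@' symbols could be split and indexed. It is not the project's implementation: `std::map` and `std::set` (ordered standard containers) stand in for the project's own AVL trees and term trie, the function names are hypothetical, and both the 1-based page numbering and the "term: pages" output layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Page delimiter described in the usage notes: 10 '@' symbols.
const std::string kPageDelimiter(10, '@');

// Splits the text into pages at every occurrence of the delimiter.
std::vector<std::string> split_pages(const std::string& text) {
    std::vector<std::string> pages;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = text.find(kPageDelimiter, start);
        if (pos == std::string::npos) {
            pages.push_back(text.substr(start));
            break;
        }
        pages.push_back(text.substr(start, pos - start));
        start = pos + kPageDelimiter.size();
    }
    return pages;
}

// Ordered stand-in for the linked AVL trees: term -> sorted page numbers.
using Index = std::map<std::string, std::set<int>>;

// Records, for each term, the pages (numbered from 1, an assumption) it occurs on.
Index build_index(const std::vector<std::string>& pages,
                  const std::set<std::string>& terms) {
    Index index;
    for (std::size_t i = 0; i < pages.size(); ++i) {
        std::istringstream in(pages[i]);
        std::string word;
        while (in >> word)
            if (terms.count(word)) index[word].insert(static_cast<int>(i) + 1);
    }
    return index;
}

// Iterating a std::map visits keys in sorted order, like an in-order traversal.
std::string format_index(const Index& index) {
    std::ostringstream out;
    for (const auto& entry : index) {
        out << entry.first << ": ";
        bool first = true;
        for (int page : entry.second) {
            if (!first) out << ", ";
            out << page;
            first = false;
        }
        out << '\n';
    }
    return out.str();
}
```

For a three-page text whose pages contain "tree trie", "graph tree", and "trie", with the terms "tree" and "trie", `format_index` would produce the lines `tree: 1, 2` and `trie: 1, 3`.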

Example

  • I have run this program on the book "Software Engineering, 10th edition" by Ian Sommerville.
  • The text.txt file in this repository contains the same book in text format.
  • The terms.txt file in this repository contains the list of words on which I have created the index.
  • The opt.txt file in this repository contains the resulting index, sorted alphabetically.
  • Some screenshots: ⤵️
    1. [Image: terms to be indexed]
    2. [Image: text on which the index is created]
    3. [Image: output index]
    4. [Image: book section for verification]
