Collaborators: Emilio Munoz, Tiffany Nguyen, Antonio Flores, Jocelin Macias, Pierre Martinez
The supersearch program is a custom search engine built to index and query a large collection of financial news articles. It uses an AVL tree-based inverted index to handle documents and metadata efficiently.
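To illustrate what an AVL tree-based inverted index can look like, here is a minimal sketch. It is not the project's actual implementation: the class name `AvlIndex`, its methods, and the choice of a per-term map from document ID to occurrence count are all assumptions made for this example.

```cpp
#include <algorithm>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch: an AVL tree keyed by term. Each node stores a
// posting map from document ID to the term's occurrence count.
class AvlIndex {
public:
    // Record one occurrence of `term` in document `docId`.
    void add(const std::string& term, const std::string& docId) {
        root_ = insert(std::move(root_), term, docId);
    }

    // Return the postings for `term`, or nullptr if the term is absent.
    const std::map<std::string, int>* find(const std::string& term) const {
        const Node* n = root_.get();
        while (n) {
            if (term < n->key) n = n->left.get();
            else if (n->key < term) n = n->right.get();
            else return &n->postings;
        }
        return nullptr;
    }

    // Height of the tree (0 when empty); useful for checking balance.
    int height() const { return h(root_); }

private:
    struct Node {
        std::string key;
        std::map<std::string, int> postings;
        int height = 1;
        std::unique_ptr<Node> left, right;
        explicit Node(std::string k) : key(std::move(k)) {}
    };
    using Ptr = std::unique_ptr<Node>;

    static int h(const Ptr& n) { return n ? n->height : 0; }
    static void update(Node& n) { n.height = 1 + std::max(h(n.left), h(n.right)); }
    static int balance(const Node& n) { return h(n.left) - h(n.right); }

    static Ptr rotateRight(Ptr y) {
        Ptr x = std::move(y->left);
        y->left = std::move(x->right);
        update(*y);
        x->right = std::move(y);
        update(*x);
        return x;
    }

    static Ptr rotateLeft(Ptr x) {
        Ptr y = std::move(x->right);
        x->right = std::move(y->left);
        update(*x);
        y->left = std::move(x);
        update(*y);
        return y;
    }

    static Ptr insert(Ptr n, const std::string& term, const std::string& docId) {
        if (!n) {
            n = std::make_unique<Node>(term);
            n->postings[docId] = 1;
            return n;
        }
        if (term < n->key) n->left = insert(std::move(n->left), term, docId);
        else if (n->key < term) n->right = insert(std::move(n->right), term, docId);
        else { ++n->postings[docId]; return n; }  // existing term: bump the count

        update(*n);
        int b = balance(*n);
        if (b > 1) {  // left-heavy
            if (balance(*n->left) < 0) n->left = rotateLeft(std::move(n->left));
            return rotateRight(std::move(n));
        }
        if (b < -1) {  // right-heavy
            if (balance(*n->right) > 0) n->right = rotateRight(std::move(n->right));
            return rotateLeft(std::move(n));
        }
        return n;
    }

    Ptr root_;
};
```

Because the tree rebalances on every insertion, a term lookup stays logarithmic in the number of distinct terms even when terms arrive in sorted order.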
- Interactive User Interface: Create indices, perform queries, and manage indices.
- Advanced Queries: Supports operators like `ORG:` and `PERSON:` for filtering results based on metadata.
- Relevance Ranking: Uses term frequency-inverse document frequency (tf-idf) and metadata for ranking results.
- Persistence: Saves indices to disk for reuse across sessions.
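As a rough illustration of tf-idf ranking, the sketch below uses the textbook formula (term frequency normalized by document length, multiplied by the log of total documents over documents containing the term). The exact weighting supersearch applies, and how it factors in metadata, is not specified here, so the function names `tfIdf` and `scorePostings` and their parameters are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: textbook tf-idf weight of a term in one document.
//   tf  = occurrences of the term in the document / document length
//   idf = log(total documents / documents containing the term)
double tfIdf(int termCount, int docLength,
             std::size_t totalDocs, std::size_t docsWithTerm) {
    if (termCount <= 0 || docLength <= 0 || docsWithTerm == 0) return 0.0;
    double tf = static_cast<double>(termCount) / docLength;
    double idf = std::log(static_cast<double>(totalDocs) / docsWithTerm);
    return tf * idf;
}

// Score every document in one term's posting list (docId -> term count).
// `docLengths` maps docId -> number of words in that document; a document
// with no recorded length scores 0.
std::map<std::string, double> scorePostings(
        const std::map<std::string, int>& postings,
        const std::map<std::string, int>& docLengths,
        std::size_t totalDocs) {
    std::map<std::string, double> scores;
    for (const auto& [docId, count] : postings) {
        auto it = docLengths.find(docId);
        int len = (it != docLengths.end()) ? it->second : 0;
        scores[docId] = tfIdf(count, len, totalDocs, postings.size());
    }
    return scores;
}
```

For a multi-term query, one simple option is to sum each document's per-term scores and sort the documents in descending order of the total.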
Ensure the following dependencies are installed:
- C++ Compiler: GCC or Clang (supporting C++17 or later).
- JSON Parser: The project uses RapidJSON, which is included in the template.
- Download the dataset: US Financial News Articles.
- The dataset contains over 300,000 articles in JSON format. Use the dataset for indexing and testing the program.
- Locate the Executable: The `supersearch` executable is precompiled and located in the `build` directory.
- Navigate to the Build Directory: Open a terminal and run: `cd /path/to/Project4wvdatastructures\ copy/build`
- Run the Program: Execute the supersearch program: `./supersearch`
- Index Creation: Create an index for a directory containing JSON files: `./supersearch index <directory_with_json_files>`
- Perform a Query: Search the indexed data using specific terms: `./supersearch query "search terms"`
- Launch the User Interface: Start an interactive menu for creating indices, running queries, and managing data: `./supersearch ui`
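A query string that mixes plain terms with the `ORG:` and `PERSON:` operators could be split as in the sketch below. The struct `ParsedQuery`, the function `parseQuery`, and the assumption that each operator is followed directly by a single whitespace-free value are illustrative choices, not the program's documented grammar.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: plain search terms plus metadata filters.
struct ParsedQuery {
    std::vector<std::string> terms;
    std::vector<std::string> orgs;
    std::vector<std::string> persons;
};

// Split a query on whitespace and route each token by its prefix.
// Assumes operator values contain no spaces (e.g. "ORG:reuters").
ParsedQuery parseQuery(const std::string& query) {
    const std::string orgPrefix = "ORG:";
    const std::string personPrefix = "PERSON:";

    ParsedQuery out;
    std::istringstream in(query);
    std::string token;
    while (in >> token) {
        if (token.compare(0, orgPrefix.size(), orgPrefix) == 0) {
            std::string value = token.substr(orgPrefix.size());
            if (!value.empty()) out.orgs.push_back(value);
        } else if (token.compare(0, personPrefix.size(), personPrefix) == 0) {
            std::string value = token.substr(personPrefix.size());
            if (!value.empty()) out.persons.push_back(value);
        } else {
            out.terms.push_back(token);
        }
    }
    return out;
}
```

Under these assumptions, the string `interest rates ORG:reuters PERSON:powell` yields two plain terms, one organization filter, and one person filter.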