Gitlio11/project4

Search Engine Project

Collaborators: Emilio Munoz, Tiffany Nguyen, Antonio Flores, Jocelin Macias, Pierre Martinez

Overview

The supersearch program is a custom search engine built to index and query a large collection of financial news articles. It uses an AVL tree-based inverted index to handle documents and metadata efficiently.
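
To make the data structure concrete, here is a minimal sketch of an AVL tree used as an inverted index: each node holds one term and a map from document ID to occurrence count. The names Node, addOccurrence, and find are hypothetical and are not taken from the project's source; the actual implementation may store different metadata per term.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical node: one indexed term plus its postings
// (document ID -> number of times the term occurs in that document).
struct Node {
    std::string term;
    std::unordered_map<std::string, int> postings;
    int height = 1;
    std::unique_ptr<Node> left, right;
    explicit Node(std::string t) : term(std::move(t)) {}
};

int heightOf(const std::unique_ptr<Node>& n) { return n ? n->height : 0; }

void updateHeight(Node& n) {
    n.height = 1 + std::max(heightOf(n.left), heightOf(n.right));
}

// Right rotation: the left child becomes the new subtree root.
std::unique_ptr<Node> rotateRight(std::unique_ptr<Node> y) {
    assert(y && y->left);
    std::unique_ptr<Node> x = std::move(y->left);
    y->left = std::move(x->right);
    updateHeight(*y);
    x->right = std::move(y);
    updateHeight(*x);
    return x;
}

// Left rotation: the right child becomes the new subtree root.
std::unique_ptr<Node> rotateLeft(std::unique_ptr<Node> x) {
    assert(x && x->right);
    std::unique_ptr<Node> y = std::move(x->right);
    x->right = std::move(y->left);
    updateHeight(*x);
    y->left = std::move(x);
    updateHeight(*y);
    return y;
}

// Restore the AVL invariant (child heights differ by at most 1).
std::unique_ptr<Node> rebalance(std::unique_ptr<Node> n) {
    updateHeight(*n);
    int balance = heightOf(n->left) - heightOf(n->right);
    if (balance > 1) {
        if (heightOf(n->left->left) < heightOf(n->left->right))
            n->left = rotateLeft(std::move(n->left));   // left-right case
        return rotateRight(std::move(n));
    }
    if (balance < -1) {
        if (heightOf(n->right->right) < heightOf(n->right->left))
            n->right = rotateRight(std::move(n->right)); // right-left case
        return rotateLeft(std::move(n));
    }
    return n;
}

// Record one occurrence of `term` in document `docId`; returns the new root.
std::unique_ptr<Node> addOccurrence(std::unique_ptr<Node> n,
                                    const std::string& term,
                                    const std::string& docId) {
    if (!n) {
        n = std::make_unique<Node>(term);
        n->postings[docId] = 1;
        return n;
    }
    if (term < n->term) {
        n->left = addOccurrence(std::move(n->left), term, docId);
    } else if (n->term < term) {
        n->right = addOccurrence(std::move(n->right), term, docId);
    } else {
        ++n->postings[docId];  // term already indexed: bump its count
        return n;
    }
    return rebalance(std::move(n));
}

// Look up a term; returns nullptr when the term is not indexed.
const Node* find(const Node* n, const std::string& term) {
    while (n && n->term != term)
        n = (term < n->term) ? n->left.get() : n->right.get();
    return n;
}
```

Because the tree rebalances on every insertion, lookups stay logarithmic in the number of distinct terms even when terms arrive in sorted order.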

Key Features

  • Interactive User Interface: Create indices, perform queries, and manage indices.
  • Advanced Queries: Supports operators like ORG: and PERSON: for filtering results based on metadata.
  • Relevance Ranking: Uses term frequency-inverse document frequency (tf-idf) and metadata for ranking results.
  • Persistence: Saves indices to disk for reuse across sessions.
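
The README does not give the exact ranking formula, so the following is only a sketch of plain tf-idf scoring under stated assumptions: the score of a document is the sum over query terms of (term count in the document) × ln(total documents / documents containing the term). The Postings alias and the rankByTfIdf function are hypothetical names, and the metadata component of the ranking is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical postings type: document ID -> occurrences of one term.
using Postings = std::unordered_map<std::string, int>;

// Score every document that contains at least one query term.
// `queryTermPostings` holds one postings map per query term.
std::vector<std::pair<std::string, double>> rankByTfIdf(
        const std::vector<Postings>& queryTermPostings,
        std::size_t totalDocs) {
    std::unordered_map<std::string, double> scores;
    for (const Postings& postings : queryTermPostings) {
        if (postings.empty() || totalDocs == 0) continue;
        // A term cannot appear in more documents than the corpus holds.
        assert(postings.size() <= totalDocs);
        // Assumed idf variant: natural log of N / df, with no smoothing.
        double idf = std::log(static_cast<double>(totalDocs) / postings.size());
        for (const auto& [docId, count] : postings)
            scores[docId] += count * idf;
    }
    std::vector<std::pair<std::string, double>> ranked(scores.begin(),
                                                       scores.end());
    // Highest score first; ties broken by document ID for stable output.
    std::sort(ranked.begin(), ranked.end(),
              [](const auto& a, const auto& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    return ranked;
}
```

With this variant, a term that appears in every document contributes nothing (ln 1 = 0), which is why rarer terms dominate the ordering.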

Installation and Setup

Prerequisites

Ensure the following dependencies are installed:

  • C++ Compiler: GCC or Clang (supporting C++17 or later).
  • JSON Parser: The project uses RapidJSON, which is included in the template.

Dataset

  • Download the dataset: US Financial News Articles.
  • The dataset contains over 300,000 articles in JSON format. Use the dataset for indexing and testing the program.

How to Run

  1. Locate the Executable
    The supersearch executable is precompiled and located in the build directory.

  2. Navigate to the Build Directory
    Open a terminal and run:

    cd /path/to/Project4wvdatastructures\ copy/build
    
  3. Run the Program
    Execute the supersearch program:

    ./supersearch


Commands and Arguments

Index Creation

Create an index for a directory containing JSON files:

    ./supersearch index <directory_with_json_files>

Perform a Query

Search the indexed data using specific terms:

    ./supersearch query "search terms"
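
The query string can mix plain terms with the ORG: and PERSON: operators listed under Key Features. The exact operator syntax is not documented here, so this sketch assumes whitespace-separated tokens in which a token prefixed with ORG: or PERSON: is a metadata filter and everything else is a search term. ParsedQuery and parseQuery are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: plain terms plus metadata filters.
struct ParsedQuery {
    std::vector<std::string> terms;
    std::vector<std::string> orgs;
    std::vector<std::string> persons;
};

bool startsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Split a query on whitespace and route each token by its prefix.
// Assumption: operators are upper-case and attached to their value,
// for example ORG:reuters. An operator with no value is ignored.
ParsedQuery parseQuery(const std::string& query) {
    const std::string orgTag = "ORG:";
    const std::string personTag = "PERSON:";
    ParsedQuery parsed;
    std::istringstream in(query);
    std::string token;
    while (in >> token) {
        if (startsWith(token, orgTag)) {
            if (token.size() > orgTag.size())
                parsed.orgs.push_back(token.substr(orgTag.size()));
        } else if (startsWith(token, personTag)) {
            if (token.size() > personTag.size())
                parsed.persons.push_back(token.substr(personTag.size()));
        } else {
            parsed.terms.push_back(token);
        }
    }
    return parsed;
}

// Usage example: two plain terms, one organization, one person.
void exampleParse() {
    ParsedQuery q = parseQuery("german bank ORG:reuters PERSON:powell");
    assert(q.terms.size() == 2);
    assert(q.orgs.size() == 1 && q.orgs[0] == "reuters");
    assert(q.persons.size() == 1 && q.persons[0] == "powell");
}
```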

Launch the User Interface

Start an interactive menu for creating indices, running queries, and managing data:

    ./supersearch ui
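
The three commands above can be dispatched from the argument list roughly as follows. This is a sketch, not the project's actual main function: Mode, Command, and parseCommand are hypothetical names, and since the README does not say how a missing or unrecognized command is handled, the sketch maps those cases to an assumed Usage mode.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical modes, one per documented command plus a fallback.
enum class Mode { Index, Query, Ui, Usage };

struct Command {
    Mode mode = Mode::Usage;
    std::string argument;  // directory for index, search terms for query
};

// `args` holds everything after the program name.
Command parseCommand(const std::vector<std::string>& args) {
    if (args.size() == 2 && args[0] == "index") return {Mode::Index, args[1]};
    if (args.size() == 2 && args[0] == "query") return {Mode::Query, args[1]};
    if (args.size() == 1 && args[0] == "ui") return {Mode::Ui, ""};
    return {};  // assumed fallback: print usage information
}

// Usage example: a query command carries its quoted search terms
// as a single argument, exactly as the shell passes them.
void exampleDispatch() {
    Command c = parseCommand({"query", "search terms"});
    assert(c.mode == Mode::Query);
    assert(c.argument == "search terms");
}
```

A real main would build the vector from argv + 1 up to argv + argc and switch on the returned mode.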

About

Project 4 for Data Structures (Search Engine)
