Information Retrieval System

Introduction

The Information Retrieval System retrieves relevant answers to user queries from a dataset of questions and answers. It preprocesses the text (cleaning, normalization, tokenization, stopword removal, stemming and lemmatization, and synonym expansion), indexes it with TF-IDF vectorization, and ranks candidate answers against each query by cosine similarity.

Features

Text cleaning and normalization
Tokenization and stopword removal
Stemming and lemmatization
Synonym expansion using WordNet
TF-IDF vectorization for indexing
Query processing and ranking using cosine similarity

Installation

Clone the repository: Download the project from GitHub.
Create and activate a virtual environment: Set up a virtual environment to manage dependencies.
Download NLTK data: Ensure that all required NLTK datasets are available for text processing.

Usage

Preprocess the data: Clean and normalize the question-and-answer dataset.
Index the preprocessed data: Build a TF-IDF matrix over the preprocessed text.
Query the system: Preprocess each user query, vectorize it against the TF-IDF index, and rank the answers by cosine similarity to the query.
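
The indexing and querying steps above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the function names (build_index, vectorize, cosine, search) and the smoothed IDF formula are assumptions made for the example, and a real run would more likely rely on a library vectorizer.

```python
import math
from collections import Counter


def vectorize(tokens, idf):
    """Turn a token list into a sparse TF-IDF vector (dict of term -> weight)."""
    tf = Counter(t for t in tokens if t in idf)  # ignore terms unseen at index time
    return {term: count * idf[term] for term, count in tf.items()}


def build_index(docs):
    """docs is a list of token lists. Returns (idf, list of document vectors)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    # Smoothed IDF (an assumption for this sketch) keeps every weight positive.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return idf, [vectorize(doc, idf) for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query_tokens, idf, vectors, top_k=3):
    """Return up to top_k (doc_index, score) pairs with a non-zero score, best first."""
    query_vec = vectorize(query_tokens, idf)
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k] if score > 0]


# Hypothetical, already-preprocessed answers.
docs = [
    ["capital", "france", "paris"],
    ["python", "list", "sort"],
    ["capital", "germany", "berlin"],
]
idf, vectors = build_index(docs)
results = search(["capital", "france"], idf, vectors)
```

Here the first document matches both query terms and ranks above the third, which matches only one; the second shares no terms with the query and is left out.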

Data Preprocessing

The preprocessing steps include:

Cleaning: Removing punctuation, HTML tags, and brackets to reduce noise.
Normalization: Converting text to lowercase, expanding abbreviations, and correcting spelling mistakes.
Tokenization: Splitting text into individual words or tokens.
Stopword Removal: Eliminating common words (such as "the" and "is") that carry little meaning for retrieval.
Stemming and Lemmatization: Reducing words to their root form (stemming) or dictionary base form (lemmatization).
Synonym Expansion: Adding WordNet synonyms of terms so that a query can match answers that use different wording.
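
The first few steps above (cleaning, lowercasing, tokenization, and stopword removal) can be sketched with the Python standard library alone. The stopword set here is a tiny hypothetical one chosen for the example, and the function names are illustrative. Abbreviation expansion, spelling correction, stemming, lemmatization, and WordNet synonym expansion are left out of this sketch because they depend on external resources such as NLTK.

```python
import html
import re

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "what"}

TAG_RE = re.compile(r"<[^>]+>")            # matches HTML tags such as <p> or </b>
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")   # matches punctuation and brackets


def clean(text):
    """Strip HTML tags, decode entities, lowercase, and drop punctuation."""
    text = html.unescape(TAG_RE.sub(" ", text))
    text = text.lower()
    return NON_WORD_RE.sub(" ", text)


def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def preprocess(text):
    """Run cleaning, tokenization, and stopword removal in order."""
    return remove_stopwords(tokenize(clean(text)))


tokens = preprocess("<p>What is the <b>capital</b> of France (Europe)?</p>")
```

For the sample sentence, the tags, brackets, and question mark are removed, and only the content words remain as lowercase tokens.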

Evaluation

The performance of the Information Retrieval System is evaluated using metrics such as:

Precision: The fraction of retrieved instances that are relevant.
Recall: The fraction of all relevant instances that are retrieved.
F1-Score: The harmonic mean of precision and recall.
Mean Average Precision (MAP): The mean, over all queries, of average precision, where average precision averages the precision measured at each rank at which a relevant instance is retrieved.
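
As a rough sketch of how these metrics can be computed for ranked results, the following uses only the standard definitions. The function names and the sample document IDs are made up for the example and are not taken from the project.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant_set:
            hits += 1
            total += hits / rank  # precision at this rank
    # Dividing by all relevant items penalizes relevant items never retrieved.
    return total / len(relevant_set)


def mean_average_precision(runs):
    """runs is a list of (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical results for two queries: ranked IDs and the truly relevant IDs.
runs = [([1, 2, 3, 4], [2, 4, 5]), ([7, 8], [7])]
p, r, f1 = precision_recall_f1(*runs[0])
map_score = mean_average_precision(runs)
```

For the first query, two of the four retrieved items are relevant out of three relevant items overall, so precision is 1/2, recall is 2/3, and F1 is 4/7. Its average precision is (1/2 + 2/4) / 3 = 1/3, and the second query scores 1, giving a MAP of 2/3.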

ahmadshahal/Information-Retrieval
