- Introduction
- Project Objectives
- Setup and Installation
- Running the Application
- Using the GUI
- Building and Running with Docker
- Project Setup Scripts
- License
- Contributors
- Acknowledgments
The WordSense Disambiguation (WSD) project is an advanced implementation aimed at determining the intended sense of words within a given context using Natural Language Processing (NLP) techniques. This project leverages Python's NLTK library, WordNet, and various supervised classification methods to accurately infer the meanings of words in sentences.
Utilizing the senseval-2 dataset, which is rich in annotations and tagged with parts of speech and specific word senses, this project endeavors to create a robust WSD system. The dataset includes a diverse selection of texts from sources like the British National Corpus and the Wall Street Journal, providing a challenging and varied test bed for WSD algorithms.
-
Understanding and Testing the Base Program: Comprehend and ensure the proper functioning of an existing WSD program that uses the Naive Bayes classifier for disambiguating word senses in the senseval-2 dataset.
-
Enhancing Performance Metrics: Extend the base program to include additional performance metrics such as accuracy, F1 score, precision, and recall.
-
Classifier Comparison: Implement and compare other classification algorithms like Random Forest, SVM, and Decision Trees, analyzing their performance in terms of F1 score, accuracy, precision, and recall.
-
Preprocessing Influence: Study the effect of different preprocessing steps on the performance metrics and present the findings.
-
Feature Influence: Test how different features (such as the number of features, tf-idf, and bi-grams) impact the performance of the classifiers.
-
Lesk Algorithm Implementation and Testing: Utilize NLTK's Lesk algorithm for WSD and compare its results against the Naive Bayes classifier on both training and testing datasets.
-
Expanded Lesk Algorithm: Design and implement an enhanced version of the Lesk algorithm that includes related terms extracted from WordNet, and compare its effectiveness with the standard Lesk algorithm.
-
Results Comparison and Analysis: Compare results on a sample of the Senseval2 dataset to draw conclusions on the effectiveness of the various methods employed.
-
GUI Development: Develop a simple graphical user interface (GUI) that enables users to input a target word and context for disambiguation, displaying results from both the simple and expanded Lesk algorithms.
Below is the outline of the project's directory and file structure:
wsd-senseval-lesk # Root folder of the project
│
├── app.py # The flask application entry point [task 8]
│
├── web # React web application (frontend) [task 8]
│
├── notebooks # Jupyter notebooks for all analysis and explorations
│ └── .ipynb files # Individual Jupyter notebooks for analysis [tasks 1-8]
│
├── scripts # Python scripts
│ └── .py files # Standalone Python scripts for various tasks (mostly various implementation of lesk) [task 6-7]
│
├── data # Data directory for CSV exports from our analysis and datasets
│ └── .csv files # CSV dataset files
│
├── *.sh # Customized shell scripts for easily setting up and running this project
│
├── Dockerfile # For running this project as docker container [under development]
│
└── screenshots # Screenshots of the app for documentation
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
To run this project, you'll need to install the following software:
-
Jupyter Notebook: For interactive computing and sharing of live code, equations, visualizations, and narrative text. Installation Guide
-
Node.js and npm: Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine, and npm is the package manager for JavaScript. Installation Guide for Node.js, npm is included with Node.js
-
Python 3: An interpreted, high-level, general-purpose programming language. Make sure you have Python 3.10 or later for this project. Installation Guide
-
Flask: A lightweight WSGI web application framework in Python. It is designed to make getting started quick and easy, with the ability to scale up to complex applications. Installation Guide
-
NLTK: A leading platform for building Python programs to work with human language data. Installation Guide
-
Docker: Docker is a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers. Installation Guide
Please follow the installation guides provided by the links to ensure that your environment is set up correctly. After installation, you can verify the installation of each by checking their versions using the respective command line tools.
For example, to check if you have the correct version of Python installed, run:
python3 --version
node --version
npm --version
Once all dependencies are installed, you can proceed with setting up the virtual environment and running the project as described below.
Follow these steps to get your development environment running:
- Clone the repository:
git clone https://github.com/brhn-me/wsd-senseval-lesk.git
cd wsd-senseval-lesk
- Python Virtual Environment Setup:
Ensure Python 3.10+ is installed on your system before setting up a virtual environment. Follow these simple steps to get started:
python3 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
# Deactivate the environment when done using:
# deactivate
- Install required packages::
pip install -r requirements.txt
- Downloading NLTK Datasets
After setting up your Python virtual environment, you'll need to download specific NLTK datasets
required for the application. To download the stopwords
, punkt
, and wordnet
datasets, you can use the following
commands within your activated virtual environment:
python -m nltk.downloader stopwords
python -m nltk.downloader punkt
python -m nltk.downloader wordnet
- Install web (react) project dependencies:
cd web
npm install
cd ..
You can also check the README
file in /web
folder for more details instructions on how to configure the React app
for development.
To configure the Flask backend API:
(Optional) If your React app makes requests to a Flask API, update the API base URL in your React app's .env file or wherever you store your configuration:
REACT_APP_API_BASE_URL=http://localhost:5000/api
Set environment variables needed for Flask.
To run the project locally:
- Start the React development server:
cd web
npm start
This command will start the React application and open it in your default web browser.
- To run the Flask server, navigate to the server directory and run:
flask run
# or
python3 app.py
Your Flask API should now be serving requests at http://localhost:5000/api
.
To create a production build of the React app, run:
cd web
npm run build
cd ..
This command will create a build directory with all the production files.
To serve the production build files with Flask, ensure the Flask server is configured to serve the static files from the build directory:
from flask import Flask, send_from_directory
app = Flask(__name__, static_folder='../deploy/web')
# ... existing Flask routes
if __name__ == '__main__':
app.run(use_reloader=True, port=5000, threaded=True)
After configuration, run your Flask app, and it will serve your React app at the root endpoint.
This section covers the setup and deployment of the Word Sense Disambiguation GUI using Docker.
- Docker installed on your machine. Installation guides for Docker can be found here.
To build the Docker image for the application, navigate to the root directory of the project where the Dockerfile
is
located and run the following command:
docker build -t wsd-app-image .
To streamline the setup and execution of the web application and its backend, we have included several shell scripts within the repository. These scripts simplify the processes of environment setup, dependency installation, and project building. Below is an overview of each script and instructions on how to use them:
This script is responsible for creating a Python virtual environment using Python 3.10, upgrading pip to the latest
version, and installing all the required dependencies listed in requirements.txt
. It also downloads necessary NLTK
data for the project.
To use the script, simply run:
./setup_venv.sh
Make sure to give executable permissions to the script before running it:
chmod +x setup_venv.sh
The setup_web.sh
script is for setting up the React application located in the /web
directory. It installs all
necessary npm dependencies.
Run the following command in the terminal:
./setup_web.sh
Ensure the script is executable:
chmod +x setup_web.sh
This script will navigate to the /web
directory, install npm dependencies if not already installed, and then build the
React application. After building, it will copy the build artifacts to a deploy/web
directory.
Execute the script with:
./build_web.sh
Again, don't forget to set the script as executable:
chmod +x build_web.sh
- Before running any of the scripts, ensure that you have Python 3.10 and npm installed on your machine.
- You may need to activate the virtual environment created by
setup_venv.sh
script manually usingsource venv/bin/activate
(for Unix-like OS) or.\venv\Scripts\activate
(for Windows) to run Python applications within the virtual environment. - The scripts assume the presence of
requirements.txt
andpackage.json
in their respective directories for dependency management. - Ensure you are in the root directory of the project when running these scripts.
If you encounter any permissions issues, you may need to run the scripts with sudo
on Unix-like systems or as an
administrator on Windows.