Uses weak supervision to perform Named Entity Recognition (NER) and Offensive Language Identification (OLI) on comments from r/Singapore, EDMW and OLID.
Quick start:
Run pip install -r requirements.txt in the NER or OLID folder.
NER
Workflow to obtain a fine-tuned NER model using weak supervision or gold labels:
To scrape comments for NER, create a config.py file in the NER folder with Reddit API info. (Refer to NER/example_config.py)
Adjust parameters in NER/NER_v7/main_config.py
Run NER/NER_v7/make_spacy_weak_supervision.py (pipeline that scrapes and preprocesses comments, applies the labelling functions and resolves their aggregated output, then serializes a .spacy binary file for fine-tuning)
Run NER/NER_cli_train.ipynb (Jupyter notebook that fine-tunes the model from the command line using the serialized file)
Run NER/NER_v7/evaluate.py (tests NER performance of the best model saved to disk after fine-tuning)
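To illustrate the "apply and resolve aggregated labelling functions" step, here is a minimal sketch of weak supervision for NER using only the Python standard library. It is not taken from this repository: the gazetteers, the labelling functions (lf_places, lf_orgs, lf_capitalised) and the aggregate helper are all hypothetical, and a simple per-token majority vote stands in for whatever aggregation method make_spacy_weak_supervision.py actually uses.

```python
from collections import Counter
from typing import Callable, List, Optional

Label = Optional[str]  # None means the labelling function abstains
LabellingFunction = Callable[[List[str]], List[Label]]

# Hypothetical gazetteers, for illustration only.
PLACES = {"Singapore", "Jurong", "Bedok"}
ORGS = {"HDB", "MOE", "SMRT"}


def lf_places(tokens: List[str]) -> List[Label]:
    """Label tokens found in the place gazetteer as LOC."""
    return ["LOC" if t in PLACES else None for t in tokens]


def lf_orgs(tokens: List[str]) -> List[Label]:
    """Label tokens found in the organisation gazetteer as ORG."""
    return ["ORG" if t in ORGS else None for t in tokens]


def lf_capitalised(tokens: List[str]) -> List[Label]:
    """Noisy heuristic: a capitalised, non-initial, non-acronym token is guessed to be LOC."""
    return [
        "LOC" if i > 0 and t[:1].isupper() and not t.isupper() else None
        for i, t in enumerate(tokens)
    ]


def aggregate(tokens: List[str], lfs: List[LabellingFunction]) -> List[str]:
    """Resolve the labelling functions' votes into one label per token by majority vote."""
    votes = [Counter() for _ in tokens]
    for lf in lfs:
        for counter, label in zip(votes, lf(tokens)):
            if label is not None:
                counter[label] += 1
    resolved = []
    for counter in votes:
        if not counter:
            resolved.append("O")  # every function abstained: outside any entity
        else:
            # Highest vote count wins; ties are broken alphabetically for determinism.
            resolved.append(min(counter.items(), key=lambda kv: (-kv[1], kv[0]))[0])
    return resolved


if __name__ == "__main__":
    comment = ["I", "took", "SMRT", "to", "Jurong"]
    print(aggregate(comment, [lf_places, lf_orgs, lf_capitalised]))
    # ['O', 'O', 'ORG', 'O', 'LOC']
```

Majority voting treats every labelling function as equally reliable. Probabilistic aggregation models instead estimate each function's accuracy from its agreements and conflicts with the others, which is usually the better choice when the functions differ widely in quality.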