Stars
Minimal Paraphrase Pairs are meaning-preserving paraphrases with a controlled and minimal change
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data
A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Review).
Easy to use and understand multiple-choice question generation algorithm using T5 Transformers.
[EACL'21] Non-Autoregressive Text Generation with Pre-trained Language Models
An open-source NLP research library, built on PyTorch.
Multimodal model for text and tabular data, with HuggingFace transformers as the building block for text data
PyTorch implementation of Highly Parallel Autoregressive Entity Linking with Discriminative Correction
Code for the ALiBi method for transformer language models (ICLR 2022)
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
A standard framework for building deep learning models for tabular data
Unofficial TensorFlow-Keras implementation of Fastformer, based on the paper [Fastformer: Additive Attention Can Be All You Need](https://arxiv.org/abs/2108.09084).
[NAACL 2021] QAGNN: Question Answering using Language Models and Knowledge Graphs 🤖
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
PyTorch Tutorial for Deep Learning Researchers
PyTorch implementation of U-Net for semantic segmentation of high-quality images
Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering
Pairwise model for the CommonLit competition
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Collection of papers and resources for data augmentation for NLP.
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.