List of projects related to Natural Language Processing (NLP) whose existence makes a geek smile
Latest commit 72e17ed May 7, 2018

Awesome NLP Projects


This is a curated list of projects directly connected to, or useful for, Natural Language Processing (NLP), whose existence makes a geek smile. Inspired by Joseph Misiti's GitHub project.

Related Lists:


Periodic tables

Cheat Sheets

Resources and Frameworks

  • TRIPS . Semantic Lexicon link . semantic parser link . link
  • C&C Boxer . semantic parser link
  • EPILOG . episodic logic framework link
  • KNEXT (the continuation of the Lore project) . knowledge extraction into episodic logic (similar to babelnet) link
  • FRED . semantic parser/knowledge extractor link . link2 . related tools link3
  • LEGALO is a novel Open Knowledge Extraction approach that performs unsupervised, open domain, and abstractive knowledge extraction from text for producing directly usable machine readable information. link
  • DELPH-IN . Broader project for NLP; grammar, parser, link
  • LKB . The LKB system is a grammar and lexicon development environment for use with unification-based linguistic formalisms. link
  • Malt parser . dependency syntax parser link
  • YAGO . knowledge base link
  • GATE . text engineering pipeline link
  • Enju . syntactic parser link
  • OpenNLP . NLP framework in Java link
  • CoreNLP . Stanford core NLP framework for parsing link
  • NLTK . awesome NLP framework in Python link
  • PyNLPL . Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. link
  • Valex . Categorization of English Verbs link
  • Unified Verb Index . VerbNet and FrameNet together link
  • scikit-learn . Machine learning in Python. Simple and efficient tools for data mining and data analysis link
  • Tuffy . Scalable Markov Logic inference engine link
  • Felix . the successor of Tuffy link
  • Alchemy . algorithms for statistical relational learning and probabilistic logic inference, based on the Markov logic representation link
  • pracmln . Markov Logic in Python; this project started as a fork of the ProbCog project. Find more link
  • ProbCog . ProbCog is a statistical relational learning and reasoning system that supports efficient learning and inference in relational domains link
  • KReator . KReator is an integrated development environment (IDE) for relational probabilistic knowledge representation languages. At the moment, KReator supports Bayesian Logic Programs (BLPs), Markov Logic Networks (MLNs), Relational Maximum Entropy (RME), Relational Bayesian Networks (RBN), and Probabilistic Prolog (ProbLog). link
  • pyHTM . Hierarchical Temporal Memory in Python. Our machine intelligence technology is called Hierarchical Temporal Memory (HTM), which is a detailed computational theory of the neocortex. At the core of HTM are time-based learning algorithms that store and recall spatial and temporal patterns. HTM is well suited to a wide variety of problems, particularly those with the following characteristics: streaming data rather than static databases; underlying patterns in the data that change over time; many individual data sources where hand crafting separate models is impractical; subtle patterns that can’t always be seen by humans; time-based patterns; simple techniques such as thresholds yielding substantial false positives and false negatives. link
  • KnowRob . KnowRob is a knowledge processing system that combines knowledge representation and reasoning methods with techniques for acquiring knowledge and for grounding the knowledge in a physical system and can serve as a common semantic framework for integrating information from different sources. KnowRob combines static encyclopedic knowledge, common-sense knowledge, task descriptions, environment models, object information and information about observed actions that has been acquired from various sources (manually axiomatized, derived from observations, or imported from the web). It supports different deterministic and probabilistic reasoning mechanisms, clustering, classification and segmentation methods, and includes query interfaces as well as visualization tools. link
  • GHMM . The General Hidden Markov Model library (GHMM) is a freely available C library implementing efficient data structures and algorithms for basic and extended HMMs with discrete and continuous emissions. It comes with Python wrappers which provide a much nicer interface and added functionality. link
  • pyHSMM . "This is a Python library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations." There are also some extensions: autoregressive models, switching linear dynamical systems, and factorial models. link
  • Prism . symbolic-statistical models; a model checker for temporal logic and quantitative extensions; verification for realtime systems; markov models etc. . link
  • UBY . A Large-Scale Unified Lexical-Semantic Resource link
  • Duckling . probabilistic CFG parser for dimensions (time, temperature, size etc) link
  • SLING - A natural language frame semantics parser . semantic parser implemented using deep Recurrent Neural Network link
  • Wit . intent parser link
  • Mycroft . a company making another intent parser, as well as Speech2Text and Text2Speech frameworks, in Python link
  • IEPY . IEPY is an open source tool for Information Extraction focused on Relation
  • MITIE . This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors. link
  • SyntaxNet . an open-source neural network framework for TensorFlow that provides a foundation for Natural Language Understanding (NLU) systems. Our release includes all the code needed to train new SyntaxNet models on your own data, as well as Parsey McParseface, an English parser that we have trained for you, and that you can use to analyze English text. link
  • OpenAI Gym . A toolkit for developing and comparing reinforcement learning algorithms. link
  • Spiff Workflow . Spiff Workflow is a library implementing a framework for workflows. It is based on the Workflow Patterns initiative and implemented in pure Python. link . Workflow patterns . The aim of this initiative is to provide a conceptual basis for process technology. In particular, the research provides a thorough examination of the various perspectives (control flow, data, resource, and exception handling) that need to be supported by a workflow language or a business process modelling language.
  • A news reader project . link
  • word sense disambiguation toolkit in Python using word2vec (contains datasets too) link
  • annotated document server for the FoLiA format link
  • toolkit useful for working with corpus annotations in FoLiA and other formats (compare to Dan's corpkit) link
  • vaderSentiment Sentiment analysis tool for Python link
  • Vowpal Wabbit - a reinforcement learning setup using a structured prediction technique link . Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. link
  • TiMBL - an open source software package implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suitable for symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. link link1 link3
  • PIKES - knowledge extraction suite link link
  • RDFPro - the Swiss knife for RDF manipulation, stream-based RDF processing link
  • spaCy - Industrial-strength Natural Language Processing (NLP) with Python and Cython link
  • textacy - Higher level NLP built on spaCy link
  • Ukb - graph-based WSD and similarity link
  • marseille - Mining Argument Structures with Expressive Inference (Linear and LSTM Engines) link
  • Fluid construction grammar - link
  • Python cognitive modelling suite - link
  • Rasa - Natural language understanding link
  • SenticNet - Talking about SenticNet is talking about concept-level sentiment analysis, that is, performing tasks such as polarity detection and emotion recognition by leveraging semantics and linguistics instead of solely relying on word co-occurrence frequencies. link link

Deep Learning goodies

  • Neural Storyteller code
  • Open-type entity recognition system code


  • BabelNet - Multilingual Encyclopedic Dictionary link
  • Nasari - semantic vector representation for BabelNet link

Language modelling

  • Adaptive Skip-gram implementation in Julia link
  • Skip Sentence encoder code, paper
  • Attentive reader code, paper
  • GenSim - topic modelling library for Python, also includes a word2vec implementation link
  • word2vec - Original C implementation and some precomputed resources link
  • FastText - Faster, better text classification, Library for fast text representation and classification. link
  • InferSent - Sentence embeddings (InferSent) and training code for NLI link

Other ML

  • deep learning platform MXNet + NumPy code

Other Cool stuff

  • Node Box . NodeBox makes it easy to do data visualisations, generative design and complex production challenges. link
  • Callimachus - Linked open data, RDF, web application, data visualization etc. link
  • Feature Forge . This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, etc.), and particularly helpful if you use scikit-learn (although this can work if you have a different algorithm). link
  • Story generator algorithms . link
  • OpenCog AI framework . OpenCog is an open-source software project aimed at directly confronting the Artificial General Intelligence (AGI) challenge, using mathematical and biological inspiration and professional software engineering techniques. link
  • FoLiA Linguistic Annotation Tool link
  • WebAnno - a linguistic annotation tool link
  • Visdom - A flexible tool for creating, organizing, and sharing visualizations of live, rich data. link

Dialogue frameworks

  • ChatScript - Natural Language tool/dialog manager - link1, link2
  • ChatterBot - ChatterBot is a Python library that makes it easy to generate automated responses to a user’s input. ChatterBot uses a selection of machine learning algorithms to produce different types of responses.
  • RiveScript - RiveScript is a simple scripting language for chatbots with a friendly, easy to learn syntax. Create your own chatbot in Go, Java, JavaScript, Perl or Python.
  • SuperScript - A dialog system and bot engine for conversational UIs.
  • BotKit - Botkit is designed to ease the process of designing and running useful, creative bots that live inside messaging platforms.

Similar lists

  • awesome nlp
  • awesome dl nlp
  • Rochester University project list . potentially useful links . link
  • Misiti's list . link
  • Description Logic reasoners . list of reasoners link
  • Illinois Projects List . of software from Illinois Cognitive Computation Group link


Contributions welcome! Read the contribution guidelines first.



To the extent possible under law, Eugeniu Costezki has waived all copyright and related or neighboring rights to this work.