# Grammar Refresher and Rule-Based Models for NLP

Date: 2023-03-06  
Author: Jason Beach  
Categories: Introduction_Tutorial, Data_Science  
Tags: nlp, grammar, rule-based_models, spacy

<!--eofm-->

Grammar and rule-based models are some of the most neglected areas of study for NLP practitioners.  Most programmers take the FANG approach to NLP: more data and bigger models.  That approach is prohibitively expensive on many projects, and it can be much less effective than a rule-based model when the classification problem is well understood and narrowly scoped.  In this post, we provide a grammar refresher and see how it corresponds to Python's spaCy library.

## Configure Environment

Let's get some tools and a sample of sentences that we can use later.

In [3]:
import spacy
nlp = spacy.load("en_core_web_sm")

In [5]:
import nltk

In [33]:
import re

In [40]:
# Uncomment on first run to download the NLTK data.
#nltk.download('popular')
nltk.corpus.gutenberg.fileids()[:3]

['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt']

In [34]:
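# Load the full text of Moby Dick as a single string.
raw = nltk.corpus.gutenberg.raw('melville-moby_dick.txt')
# Turn line breaks and carriage returns into spaces.
spaces = raw.replace('\n\n',' ').replace('\n',' ').replace('\r',' ')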

In [35]:
# Collapse any run of whitespace into one space and trim both ends.
_RE_COMBINE_WHITESPACE = re.compile(r"\s+")
corpus = _RE_COMBINE_WHITESPACE.sub(" ", spaces).strip()

In [42]:
lcorp = len(corpus)
print(f"corpus' characters: {lcorp:,d}")

corpus' characters: 1,211,073


## TODO

Examples are drawn from 'The Little Brown Handbook' and 'Linguistics Useful for NLP'.

Modules used:

* [spacy-pattern-builder]()
* [spacy role pattern]()
* [styled text]()

Blog references:

* [neural coreferences](https://explosion.ai/blog/coref)
* [applied nlp thinking](https://explosion.ai/blog/applied-nlp-thinking)
* [statistical nlp](https://explosion.ai/blog/eli5-computers-learn-reading)
* [basic pos tagger](https://explosion.ai/blog/part-of-speech-pos-tagger-in-python)

In [41]:
sents = ["Art can be controversial.",
         "It has caused disputes in Congress and in artists' studios.",
         "The earth trembled.",
         "The earthquake destroyed the city.",
         "The result was chaos.",
         "The government sent the city aid.",
         "The citizens considered the earthquake a disaster."
        ]

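To see how these sentences line up with spaCy's analysis, one option is to print each token's part of speech, dependency label, and syntactic head. The sketch below is an illustration rather than part of the original notebook: `token_rows` and `print_parse` are hypothetical helper names, and it assumes the `nlp` pipeline and the `sents` list defined above.

```python
from typing import Iterable, List, Tuple


def token_rows(doc: Iterable) -> List[Tuple[str, str, str, str]]:
    """Return (text, part of speech, dependency label, head text) per token.

    Works on a spaCy Doc, or on any iterable of objects that expose the
    same `text`, `pos_`, `dep_`, and `head` attributes.
    """
    return [(tok.text, tok.pos_, tok.dep_, tok.head.text) for tok in doc]


def print_parse(nlp, sentences: Iterable[str]) -> None:
    """Parse each sentence with a loaded spaCy pipeline and print its rows.

    `nlp` is assumed to be a pipeline such as spacy.load("en_core_web_sm").
    """
    for doc in nlp.pipe(sentences):
        print(doc.text)
        for text, pos, dep, head in token_rows(doc):
            # Left-align each column so the rows are easy to scan.
            print(f"  {text:<14}{pos:<7}{dep:<10}{head}")
```

Calling `print_parse(nlp, sents)` in the notebook prints one row per token. The exact tags depend on the model version, so no output is shown here.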