# Grammatical Chunk Matching with NLU
With the chunker, you can filter a dataset by Part of Speech (POS) tags using regex patterns.
 
For example, you can extract all nouns or adjectives in your dataset with the following parameterization:
```python
pipe['default_chunker'].setRegexParsers(['<NN>+', '<JJ>+'])
```

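See [this quick-start guide](https://www.rexegg.com/regex-quickstart.html) for a reference on regex operators. Each POS tag is written inside angle brackets (for example `<NN>`), and a quantifier such as `+` applies to the whole tag, so `<NN>+` matches one or more consecutive nouns (this is why "town hall" comes back as a single chunk in the example below).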
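
To make the pattern syntax concrete, here is a minimal pure-Python sketch of how a tag pattern such as `<NN>+` can be applied to a tagged sentence: each tag is encoded as `<TAG>`, and the pattern is run as an ordinary regex over the encoded string, similar in spirit to NLTK's `RegexpParser`. This is an illustration under that assumption, not Spark NLP's actual implementation. The helpers `tag_pattern_to_regex` and `match_chunks` are hypothetical, the tags in the example are hand-assigned, and tag names are matched literally, so wildcards such as `<NN.*>` are not supported here.

```python
import re
from typing import List, Tuple


def tag_pattern_to_regex(pattern: str) -> str:
    """Translate a chunk pattern like '<DT>?<JJ>*<NN>+' into a plain regex.

    Each <TAG> becomes a non-capturing group, so a following quantifier
    applies to the whole tag: '<NN>+' -> '(?:<NN>)+'. Tag names are
    escaped and matched literally (no wildcards in this sketch).
    """
    return re.sub(
        r"<([^<>]+)>",
        lambda m: "(?:<" + re.escape(m.group(1)) + ">)",
        pattern,
    )


def match_chunks(tagged: List[Tuple[str, str]], patterns: List[str]) -> List[str]:
    """Return the token spans whose tag sequence matches any of the patterns."""
    # Encode the sentence as '<DT><JJ><NN>...' and remember which token
    # starts and ends at each character offset of that string.
    encoded = ""
    start_to_token, end_to_token = {}, {}
    for i, (_, tag) in enumerate(tagged):
        start_to_token[len(encoded)] = i
        encoded += f"<{tag}>"
        end_to_token[len(encoded)] = i

    chunks = []
    for pattern in patterns:
        for m in re.finditer(tag_pattern_to_regex(pattern), encoded):
            if m.start() == m.end():  # skip empty matches, e.g. from '<NN>*'
                continue
            first, last = start_to_token[m.start()], end_to_token[m.end()]
            chunks.append(" ".join(tok for tok, _ in tagged[first:last + 1]))
    return chunks


# Hand-assigned tags, for illustration only (not real tagger output).
tagged = [("the", "DT"), ("big", "JJ"), ("blue", "JJ"), ("market", "NN"),
          ("and", "CC"), ("the", "DT"), ("town", "NN"), ("hall", "NN")]
print(match_chunks(tagged, ["<NN>+", "<JJ>+"]))
# ['market', 'town hall', 'big blue']
```

With the same tags, a longer pattern such as `'<DT>?<JJ>*<NN>+'` returns `['the big blue market', 'the town hall']`, which shows how quantifiers combine across several tags.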

## Overview of all Part of Speech Tags

|Tag |Description |
|------|------------|
|CC| 	Coordinating conjunction |
|CD| 	Cardinal number |
|DT| 	Determiner |
|EX| 	Existential there |
|FW| 	Foreign word |
|IN| 	Preposition or subordinating conjunction |
|JJ| 	Adjective |
|JJR| 	Adjective, comparative |
|JJS| 	Adjective, superlative |
|LS| 	List item marker |
|MD| 	Modal |
|NN| 	Noun, singular or mass |
|NNS| 	Noun, plural |
|NNP| 	Proper noun, singular |
|NNPS| 	Proper noun, plural |
|PDT| 	Predeterminer |
|POS| 	Possessive ending |
|PRP| 	Personal pronoun |
|PRP$| 	Possessive pronoun |
|RB| 	Adverb |
|RBR| 	Adverb, comparative |
|RBS| 	Adverb, superlative |
|RP| 	Particle |
|SYM| 	Symbol |
|TO| 	to |
|UH| 	Interjection |
|VB| 	Verb, base form |
|VBD| 	Verb, past tense |
|VBG| 	Verb, gerund or present participle |
|VBN| 	Verb, past participle |
|VBP| 	Verb, non-3rd person singular present |
|VBZ| 	Verb, 3rd person singular present |
|WDT| 	Wh-determiner |
|WP| 	Wh-pronoun |
|WP\$| 	Possessive wh-pronoun |
|WRB| 	Wh-adverb |
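
The table can also be used programmatically. The sketch below groups related tags from the table into families and builds alternation patterns such as `(<NN>|<NNS>|<NNP>|<NNPS>)+`, so that one parser covers singular, plural, and proper nouns. `TAG_FAMILIES` and `any_of` are hypothetical helpers, and whether the chunker accepts grouping and alternation inside its patterns is an assumption worth checking on a small sample before relying on it.

```python
from typing import Dict, List

# Hypothetical grouping of Penn Treebank tags from the table above.
TAG_FAMILIES: Dict[str, List[str]] = {
    "noun": ["NN", "NNS", "NNP", "NNPS"],
    "adjective": ["JJ", "JJR", "JJS"],
    "verb": ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"],
    "adverb": ["RB", "RBR", "RBS"],
}


def any_of(family: str) -> str:
    """Return a group that matches any single tag of the given family."""
    return "(" + "|".join(f"<{tag}>" for tag in TAG_FAMILIES[family]) + ")"


# One or more nouns of any kind, optionally preceded by adjectives of any kind.
noun_phrase = any_of("adjective") + "*" + any_of("noun") + "+"
print(noun_phrase)
# (<JJ>|<JJR>|<JJS>)*(<NN>|<NNS>|<NNP>|<NNPS>)+
```

If the chunker does accept this syntax, the string would be passed the same way as `'<NN>+'`, for example `pipe['default_chunker'].setRegexParsers([noun_phrase])`.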

Chunks are Named 


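# 1. Install Java and NLU

```python
import os
# Update the package lists quietly
! apt-get update -qq > /dev/null
# Install Java 8 (NLU runs on Spark NLP / Apache Spark, which needs a JVM)
! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
# Point JAVA_HOME at the new JDK and put its binaries on the PATH
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
# Install NLU
! pip install nlu > /dev/null
```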
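
Before loading NLU, it can help to confirm that the Java install is visible to the Python process. This is a small standard-library sketch; `check_java` is a hypothetical helper, and it only reports what it finds on `PATH` rather than validating the Java version NLU needs.

```python
import os
import shutil
import subprocess
from typing import Optional


def check_java() -> Optional[str]:
    """Return the path of the `java` binary on PATH, or None if it is missing."""
    print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
    java = shutil.which("java")
    if java is None:
        print("java was not found on PATH")
        return None
    # `java -version` writes its version banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    print(result.stderr.strip())
    return java


check_java()
```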


# 2. Load the Chunker and print parameters

```python
import nlu

pipe = nlu.load('match.chunks')
# Print the info to see at which index each component sits and which parameters we can configure on it
pipe.print_info()
```

```
match_chunks download started this may take some time.
Approx size to download 4.3 MB
[OK!]
The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :
>>> pipe['document_assembler'] has settable params:
pipe['document_assembler'].setCleanupMode('disabled')         | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : disabled
>>> pipe['sentence_detector'] has settable params:
pipe['sentence_detector'].setCustomBounds([])                 | Info: characters used to explicitly mark sentence bounds | Currently set to : []
pipe['sentence_detector'].setDetectLists(True)                | Info: whether detect lists during sentence detection | Currently set to : True
pipe['sentence_detector'].setExplodeSentences(False)          | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False
pipe['sentence_det
```

# 3. Configure the pipe to only match nouns and adjectives, then predict on data

```python
# Set the chunker to match runs of nouns (<NN>+) or runs of adjectives (<JJ>+)
pipe['default_chunker'].setRegexParsers(['<NN>+', '<JJ>+'])
# Now we can predict with the configured pipeline
pipe.predict("Jim and Joe went to the big blue market next to the town hall")
```

|origin_index |chunk |pos |
|------|------|------|
|0 |market |[NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO... |
|0 |town hall |[NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO... |
|0 |big blue |[NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO... |
|0 |next |[NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO... |
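
The prediction comes back as a pandas DataFrame with one row per matched chunk, indexed by `origin_index`, which appears to identify the input document. Note that `next` shows up among the chunks because the tagger labelled it `JJ`, so `<JJ>+` matched it on its own. The sketch below rebuilds a small stand-in DataFrame from the output shown above (only the `chunk` column, since the `pos` lists are truncated) and shows two common follow-ups: keeping multi-word chunks and collecting all chunks per input document. The variable names are illustrative; with the real pipeline you would apply the same operations to the result of `pipe.predict(...)`.

```python
import pandas as pd

# Stand-in for the DataFrame returned by pipe.predict(...) above, rebuilt from
# the printed output (the truncated `pos` column is left out).
df = pd.DataFrame(
    {"chunk": ["market", "town hall", "big blue", "next"]},
    index=pd.Index([0, 0, 0, 0], name="origin_index"),
)

# Keep only chunks that span more than one token.
multi_word = df[df["chunk"].str.contains(" ")]
print(multi_word["chunk"].tolist())  # ['town hall', 'big blue']

# Collect every chunk found in each input document into a list.
chunks_per_doc = df.groupby(level="origin_index")["chunk"].apply(list)
print(chunks_per_doc.loc[0])  # ['market', 'town hall', 'big blue', 'next']
```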
