In [2]:
The **NLTK (Natural Language Toolkit)** library provides a wide range of functions to perform different NLP tasks. Here’s a categorized list of some commonly used functions and modules in the NLTK library:

### 1. **Tokenization**:
   - **Word Tokenization**: Splits text into words.
     ```python
     nltk.word_tokenize(text)
     ```
   - **Sentence Tokenization**: Splits text into sentences.
     ```python
     nltk.sent_tokenize(text)
     ```

### 2. **Stemming**:
   - **PorterStemmer**: Strips suffixes to reduce words to a stem, which is not always a dictionary word.
     ```python
     from nltk.stem import PorterStemmer
     ps = PorterStemmer()
     ps.stem(word)
     ```
   - **LancasterStemmer**: A more aggressive stemmer.
     ```python
     from nltk.stem import LancasterStemmer
     ls = LancasterStemmer()
     ls.stem(word)
     ```
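
To see how the two stemmers differ in practice, the following sketch runs both on a small word list. The word list is illustrative and not part of the original material; neither stemmer needs any downloaded NLTK data.

```python
from nltk.stem import LancasterStemmer, PorterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

# Illustrative word list (an assumption, chosen only for demonstration)
words = ["running", "cats", "happily", "maximum"]

# Pair each word with both stems so the differences are easy to compare
comparison = [(w, porter.stem(w), lancaster.stem(w)) for w in words]
for word, porter_stem, lancaster_stem in comparison:
    print(f"{word:10} porter={porter_stem:10} lancaster={lancaster_stem}")
```

The Lancaster stemmer tends to produce shorter stems, which can merge unrelated words, so it is worth inspecting its output on your own vocabulary before choosing it.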

### 3. **Lemmatization**:
   - **WordNet Lemmatizer**: Uses WordNet to reduce words to their base form.
     ```python
     from nltk.stem import WordNetLemmatizer
     lemmatizer = WordNetLemmatizer()
     lemmatizer.lemmatize(word, pos='v')  # specify part of speech
     ```

### 4. **Part-of-Speech Tagging**:
   - **POS Tagger**: Tags words with their part of speech.
     ```python
     nltk.pos_tag(tokens)
     ```

### 5. **Named Entity Recognition (NER)**:
   - **Chunking Named Entities**: Identifies named entities in a sentence.
     ```python
     nltk.ne_chunk(nltk.pos_tag(tokens))
     ```

### 6. **Parsing & Tree Structures**:
   - **Recursive Descent Parser**: A parser for analyzing the syntactic structure.
     ```python
     nltk.RecursiveDescentParser(grammar)
     ```
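   - **Dependency Parsing**: Represents relationships between words. `DependencyGraph` stores an existing dependency parse (for example, one read from CoNLL-formatted text); it does not parse raw text itself.
     ```python
     nltk.parse.DependencyGraph(conll_string)  # conll_string: a parse in CoNLL format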
     ```

### 7. **Stopwords**:
   - **Stopwords**: Provides a list of common stopwords (words like “the,” “is,” etc.).
     ```python
     from nltk.corpus import stopwords
     stop_words = set(stopwords.words('english'))
     ```

### 8. **Frequency Distribution**:
   - **Counting Word Frequency**: Analyzes the frequency of tokens in text.
     ```python
     from nltk import FreqDist
     fdist = FreqDist(tokens)
     fdist.most_common(n)  # Top n most common words
     ```
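
As a small worked example of the frequency distribution above, the sketch below counts tokens in a made-up sentence. The sentence is an assumption used only for illustration, and it is split on whitespace so that no tokenizer data is required.

```python
from nltk import FreqDist

# Toy sentence (illustrative); split() avoids needing the Punkt tokenizer data
tokens = "the cat sat on the mat and the dog sat too".split()

fdist = FreqDist(tokens)

top_two = fdist.most_common(2)   # the two most frequent tokens with counts
total_tokens = fdist.N()         # total number of tokens counted
the_count = fdist["the"]         # count for a single token

print(top_two, total_tokens, the_count)
```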

### 9. **N-Grams**:
   - **Generating N-Grams**: Create n-grams (bigrams, trigrams, etc.) from text.
     ```python
     from nltk.util import ngrams
     list(ngrams(tokens, 2))  # Bigrams
     ```
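
The same `ngrams` helper covers trigrams and padded n-grams. The following is a minimal sketch on a hand-written token list; the tokens and the `<s>` / `</s>` padding symbols are illustrative choices, not something NLTK mandates.

```python
from nltk.util import ngrams

# Illustrative token list
tokens = ["natural", "language", "processing", "with", "nltk"]

bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))

# Padding adds boundary symbols so the first and last words also get context
padded_bigrams = list(
    ngrams(
        tokens,
        2,
        pad_left=True,
        pad_right=True,
        left_pad_symbol="<s>",
        right_pad_symbol="</s>",
    )
)

print(bigrams)
print(trigrams)
print(padded_bigrams)
```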

### 10. **Collocations**:
   - **Finding Collocations**: Discovers word pairs that co-occur more often than chance would predict.
     ```python
     from nltk.collocations import BigramCollocationFinder
     from nltk.metrics import BigramAssocMeasures
     finder = BigramCollocationFinder.from_words(tokens)
     finder.nbest(BigramAssocMeasures.pmi, 10)  # Top 10 bigrams ranked by PMI
     ```
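
PMI favors rare pairs, so it is common to drop low-frequency bigrams before ranking. The sketch below shows one way to do that on a toy token list; the text and the frequency threshold of 2 are assumptions made for illustration.

```python
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

# Toy corpus (illustrative), already split into tokens
tokens = (
    "new york is big . new york is busy . "
    "i like new york . the city is big"
).split()

finder = BigramCollocationFinder.from_words(tokens)

# Ignore bigrams seen fewer than 2 times; PMI is unreliable for rare pairs
finder.apply_freq_filter(2)

# Rank the remaining bigrams by pointwise mutual information
top_collocations = finder.nbest(BigramAssocMeasures.pmi, 3)
print(top_collocations)
```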

### 11. **WordNet (Lexical Database)**:
   - **Synsets**: Get the synonym sets of a word.
     ```python
     from nltk.corpus import wordnet
     synsets = wordnet.synsets('word')
     ```
   - **Definitions**: Get the definition of a word.
     ```python
     synsets[0].definition()
     ```
   - **Antonyms & Synonyms**: Antonyms are attached to lemmas rather than synsets; a synset's synonyms are its lemma names (`synset.lemma_names()`).
     ```python
     wordnet.synsets('good')[0].lemmas()[0].antonyms()
     ```

### 12. **Text Classification**:
   - **Naive Bayes Classifier**: Train a Naive Bayes model for classification tasks.
     ```python
     from nltk.classify import NaiveBayesClassifier
     classifier = NaiveBayesClassifier.train(training_data)
     classifier.classify(features)
     ```
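
`NaiveBayesClassifier.train` expects a list of `(feature_dict, label)` pairs. The following sketch builds such pairs from a few made-up sentences; the `word_features` helper, the sentences, and the labels are all hypothetical and exist only to show the expected data shape.

```python
from nltk.classify import NaiveBayesClassifier


def word_features(sentence):
    """Hypothetical feature extractor: one boolean feature per word."""
    return {word: True for word in sentence.lower().split()}


# Tiny hand-made training set (illustrative only)
training_data = [
    (word_features("great fun movie"), "pos"),
    (word_features("wonderful great acting"), "pos"),
    (word_features("boring awful movie"), "neg"),
    (word_features("awful terrible plot"), "neg"),
]

classifier = NaiveBayesClassifier.train(training_data)

# Classify an unseen sentence using the same feature extractor
prediction = classifier.classify(word_features("great acting"))
print(prediction)
```

The same feature extractor must be applied at training and prediction time, otherwise the classifier sees features it was never trained on.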

### 13. **Corpus Access**:
   - **Accessing Preloaded Corpora**: NLTK provides access to several corpora.
     ```python
     from nltk.corpus import gutenberg
     gutenberg.raw('austen-emma.txt')
     ```

### 14. **Language Modeling**:
   - **Conditional Frequency Distribution**: Useful for language modeling tasks.
     ```python
     from nltk.probability import ConditionalFreqDist
     cfd = ConditionalFreqDist(nltk.bigrams(tokens))  # condition = a word, samples = the words that follow it
     ```
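
A conditional frequency distribution over bigrams acts as a very simple next-word table. This sketch uses a made-up token sequence (an assumption for illustration) to look up which word most often follows "the".

```python
from nltk.probability import ConditionalFreqDist
from nltk.util import bigrams

# Illustrative token sequence
tokens = "the cat sat . the dog sat . the cat ran".split()

# Each bigram (w1, w2) is treated as (condition, sample)
cfd = ConditionalFreqDist(bigrams(tokens))

followers_of_the = cfd["the"]            # FreqDist of words seen after "the"
most_likely_next = followers_of_the.max()

print(followers_of_the.most_common(), most_likely_next)
```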

### 15. **Translation & Word Alignment**:
   - **IBM Model for Word Alignment**: Used for bilingual word alignments.
     ```python
     from nltk.translate import IBMModel1
     ```

### 16. **Evaluation & Metrics**:
   - **Accuracy**: Evaluate accuracy of a classifier.
     ```python
     nltk.classify.accuracy(classifier, test_set)
     ```

### 17. **Tokenizers & Text Processing**:
   - **Regex Tokenizer**: Tokenizes text using regular expressions.
     ```python
     from nltk.tokenize import regexp_tokenize
     regexp_tokenize(text, pattern)
     ```
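
The `pattern` argument is an ordinary regular expression whose matches become the tokens. The pattern below is one possible choice, written for this example rather than taken from NLTK: it keeps prices, decimals, and contractions together and emits punctuation as separate tokens.

```python
from nltk.tokenize import regexp_tokenize

# Illustrative input text
text = "NLTK 3.8 costs $0.00, isn't that great?"

# Alternatives are tried left to right:
#   1. optional "$" followed by a number with an optional decimal part
#   2. a word, optionally followed by an apostrophe suffix (e.g. "isn't")
#   3. any single character that is neither a word character nor whitespace
pattern = r"\$?\d+(?:\.\d+)?|\w+(?:'\w+)?|[^\w\s]"

tokens = regexp_tokenize(text, pattern)
print(tokens)
```

Non-capturing groups `(?:...)` are used so that each match is returned whole.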

### 18. **Sentiment Analysis**:
   - **VADER Sentiment Analyzer**: Perform simple sentiment analysis.
     ```python
     from nltk.sentiment import SentimentIntensityAnalyzer
     sia = SentimentIntensityAnalyzer()
     sia.polarity_scores("This is a good movie")
     ```

### 19. **Text Similarity**:
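   - **Jaccard Similarity**: Measures the overlap between two sets. NLTK returns the Jaccard *distance*; the similarity is `1 - distance`.
     ```python
     from nltk.metrics import jaccard_distance
     1 - jaccard_distance(set1, set2)  # Jaccard similarity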
     ```
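
As a worked example, the sketch below compares two short made-up sentences as sets of words. The sentences are assumptions; the two sets share 2 of 6 distinct words.

```python
from nltk.metrics import jaccard_distance

# Represent each (illustrative) sentence as a set of its words
set1 = set("the quick brown fox".split())
set2 = set("the lazy brown dog".split())

# Distance = (|union| - |intersection|) / |union|
distance = jaccard_distance(set1, set2)

# Similarity is the complement of the distance
similarity = 1 - distance
print(round(distance, 3), round(similarity, 3))
```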

### 20. **Translation & BLEU Score**:
   - **BLEU Score for Translation**: Computes a translation quality score for one tokenized sentence.
     ```python
     from nltk.translate.bleu_score import sentence_bleu
     sentence_bleu(references, hypothesis)  # references: list of token lists; hypothesis: one token list
     ```
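
The argument shapes for `sentence_bleu` are easy to get wrong: the first argument is a list of reference token lists, the second a single hypothesis token list. The sketch below uses made-up sentences and, as an assumption about what is appropriate here, applies `SmoothingFunction().method1` so that a missing 4-gram match does not force the score to zero.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# One or more reference translations, each a list of tokens (illustrative)
references = [["the", "cat", "is", "on", "the", "mat"]]

exact_hypothesis = ["the", "cat", "is", "on", "the", "mat"]
close_hypothesis = ["the", "cat", "sat", "on", "the", "mat"]

# An exact match with a reference scores 1.0
exact_score = sentence_bleu(references, exact_hypothesis)

# Smoothing avoids a zero score when some higher-order n-grams never match
smoothing = SmoothingFunction().method1
close_score = sentence_bleu(references, close_hypothesis, smoothing_function=smoothing)

print(exact_score, close_score)
```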

### 21. **Chunking (Shallow Parsing)**:
   - **Regexp Parser for Chunking**: Extract noun phrases or other chunks.
     ```python
     nltk.RegexpParser(grammar)
     ```
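
A chunk grammar is a set of tag patterns applied to POS-tagged tokens. The sketch below uses hand-tagged tokens (so no tagger model is needed) and a simple noun-phrase rule; both the sentence and the rule are illustrative.

```python
import nltk

# Hand-tagged tokens (illustrative); normally these come from nltk.pos_tag
tagged_tokens = [
    ("the", "DT"), ("little", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"),
    ("the", "DT"), ("cat", "NN"),
]

# NP = optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = nltk.RegexpParser(grammar)

tree = chunker.parse(tagged_tokens)

# Collect the words of every NP subtree
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NP"
]
print(noun_phrases)
```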

### How to Explore More Functions:

To explore more functions in **NLTK**, you can:
- Visit the **[official NLTK documentation](https://www.nltk.org/api/nltk.html)**.
- Open an interactive Python session and use `dir()` to list the names a module exposes:
  ```python
  import nltk
  dir(nltk)
  ```

### Conclusion:
NLTK offers a wide range of tools and functions for various NLP tasks, from basic preprocessing to advanced tasks like classification, parsing, and machine translation. Depending on the task, you can leverage different modules and functions to suit your needs.

In [None]:
There are many functions and modules in NLTK that go beyond the commonly used ones. Below are additional functions categorized by their specific purposes that I haven’t yet covered in the previous list:

### 1. **Advanced Corpus Handling**:
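   - **Concordance**: Shows each occurrence of a word with its surrounding context. Here `text` is an `nltk.Text` object, such as `nltk.Text(tokens)`.
     ```python
     text.concordance('word')
     ```
   - **Word Context Index**: Indexes the contexts (neighboring words) in which each word appears, which supports finding words used in similar contexts.
     ```python
     nltk.text.ContextIndex(tokens)
     ```
   - **Collocations from an `nltk.Text`**: Prints frequent collocations found in the text.
     ```python
     text.collocations()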
     ```

### 2. **Semantic Processing**:
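   - **Semantic Interpretation**: `nltk.sem` is a package for logic-based semantic representation, not a callable, and it does not provide a ready-made semantic role labeler.
     ```python
     from nltk.sem import Expression
     Expression.fromstring('walk(john)')  # parse a first-order logic formula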
     ```
   - **VerbNet**: Access to verb-specific information.
     ```python
     from nltk.corpus import verbnet
     verbnet.classids('run')
     ```

### 3. **Discourse Analysis**:
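   - **Discourse Representation**: Represents discourse semantics using Discourse Representation Theory (DRT), provided by `nltk.sem.drt`.
     ```python
     from nltk.sem.drt import DrtExpression
     DrtExpression.fromstring('([x], [dog(x), walk(x)])')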
     ```

### 4. **Classification & Machine Learning**:
   - **Decision Tree Classifier**: Used for text classification tasks.
     ```python
     from nltk.classify import DecisionTreeClassifier
     ```
   - **MaxEnt Classifier**: A maximum entropy model for classification.
     ```python
     from nltk.classify import MaxentClassifier
     ```

### 5. **Chunking**:
   - **Named Entity Chunker**: Applies NLTK's pretrained named entity chunker to POS-tagged tokens; `binary=True` marks entities as `NE` without assigning types.
     ```python
     nltk.chunk.ne_chunk(tagged_tokens, binary=True)
     ```

### 6. **Parsing & Syntax Tree**:
   - **Chart Parser**: A different method for parsing syntax trees.
     ```python
     nltk.ChartParser(grammar)
     ```
   - **Shift-Reduce Parser**: Another parsing method.
     ```python
     nltk.ShiftReduceParser(grammar)
     ```

### 7. **Probability Distributions**:
   - **Laplace ProbDist**: Used for smoothing in probabilistic models.
     ```python
     from nltk.probability import LaplaceProbDist
     ```
   - **MLEProbDist**: Maximum likelihood estimate of a distribution.
     ```python
     from nltk.probability import MLEProbDist
     ```

### 8. **Taggers**:
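   - **Unigram Tagger**: Tags each token with the tag it most often received in the training data.
     ```python
     nltk.UnigramTagger(train_sents)  # train_sents: list of [(word, tag), ...] sentences
     ```
   - **Bigram Tagger**: Tags each token based on the token itself and the tag of the preceding token.
     ```python
     nltk.BigramTagger(train_sents, backoff=unigram_tagger)
     ```
   - **Brill Tagger**: A transformation-based tagger that corrects an initial tagger's output with learned rules.
     ```python
     nltk.tag.BrillTagger(initial_tagger, rules)  # usually built with nltk.tag.brill_trainer.BrillTaggerTrainer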
     ```
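
The n-gram taggers must be given training data and are usually chained with backoff so that unseen contexts still receive a tag. The sketch below trains on three hand-tagged sentences; the data and the choice of `NN` as the default tag are assumptions for illustration.

```python
import nltk

# Tiny hand-tagged training set (illustrative)
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

# Backoff chain: bigram -> unigram -> default tag
default_tagger = nltk.DefaultTagger("NN")
unigram_tagger = nltk.UnigramTagger(train_sents, backoff=default_tagger)
bigram_tagger = nltk.BigramTagger(train_sents, backoff=unigram_tagger)

# "loudly" never appears in training, so it falls through to the default tag
tagged = bigram_tagger.tag(["the", "cat", "barks", "loudly"])
print(tagged)
```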

### 9. **Language Models**:
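   - **N-gram Models**: Estimate word probabilities from n-gram counts. The older `nltk.model.NgramModel` class is not available in NLTK 3; the `nltk.lm` package (NLTK 3.4 and later) provides n-gram models instead.
     ```python
     from nltk.lm import MLE
     lm = MLE(2)  # bigram model; train with lm.fit(train_ngrams, vocabulary)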
     ```
   - **PCFG Grammar**: Probabilistic context-free grammars.
     ```python
     nltk.PCFG.fromstring(grammar)
     ```
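
For n-gram language modeling, a minimal sketch using the `nltk.lm` package (available in NLTK 3.4 and later) looks like the following. The three-sentence corpus is made up, and an unsmoothed maximum likelihood model is used only because it makes the probabilities easy to check by hand.

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# Tiny tokenized corpus (illustrative)
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

order = 2  # bigram model

# Builds padded unigrams/bigrams per sentence plus a flat vocabulary stream
train_ngrams, vocab_words = padded_everygram_pipeline(order, corpus)

lm = MLE(order)
lm.fit(train_ngrams, vocab_words)

# P("cat" | "the"): "the" is followed by "cat" in 2 of its 3 occurrences
p_cat_given_the = lm.score("cat", ["the"])
print(p_cat_given_the)
```

An unsmoothed model assigns zero probability to unseen n-grams; `nltk.lm` also includes smoothed models such as `Laplace` for real use.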

### 10. **Parsing with CFG**:
   - **Context-Free Grammar (CFG)**: For defining grammars used in parsing.
     ```python
     nltk.CFG.fromstring(grammar)
     ```
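
To show how a grammar and a parser fit together, the sketch below defines a toy CFG and parses one sentence with `ChartParser`. The grammar and sentence are illustrative and cover only this example.

```python
import nltk

# Toy grammar (illustrative); terminals are quoted
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
tokens = "the dog chased the cat".split()

# parse() yields every tree the grammar licenses for the token sequence
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

Every token must be covered by the grammar; otherwise the parser raises a `ValueError`.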

### 11. **Morphology**:
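   - **Morphological Processing**: WordNet's `morphy` maps an inflected form to a base form found in WordNet. It is a method of the WordNet corpus reader, not a top-level `nltk` function.
     ```python
     from nltk.corpus import wordnet
     wordnet.morphy('churches')  # optionally pass a POS, e.g. wordnet.morphy('running', wordnet.VERB)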
     ```

### 12. **Metrics & Evaluation**:
   - **Precision, Recall, F-measure**: Used for evaluation of classifiers.
     ```python
     from nltk.metrics import precision, recall, f_measure
     precision(reference_set, test_set)
     recall(reference_set, test_set)
     f_measure(reference_set, test_set)
     ```
   - **Confusion Matrix**: Generates a confusion matrix for classification results.
     ```python
     from nltk.metrics import ConfusionMatrix
     cm = ConfusionMatrix(reference_list, test_list)
     ```
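
Note that `precision`, `recall`, and `f_measure` compare two *sets*, while `ConfusionMatrix` compares two aligned *lists* of labels. The sketch below uses made-up document IDs and labels to show both shapes.

```python
from nltk.metrics import ConfusionMatrix, f_measure, precision, recall

# Set-based metrics: which items are relevant vs. which were retrieved
reference_set = {"doc1", "doc2", "doc3", "doc4"}   # gold standard (illustrative)
test_set = {"doc2", "doc3", "doc5"}                # system output (illustrative)

p = precision(reference_set, test_set)   # correct retrieved / all retrieved
r = recall(reference_set, test_set)      # correct retrieved / all relevant
f = f_measure(reference_set, test_set)   # harmonic mean of p and r by default

# List-based confusion matrix: gold and predicted labels in the same order
reference_list = ["pos", "pos", "neg", "neg", "pos"]
test_list = ["pos", "neg", "neg", "pos", "pos"]
cm = ConfusionMatrix(reference_list, test_list)

print(p, r, f)
print(cm)
```

Cells are indexed as `cm[reference_label, predicted_label]`.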

### 13. **NLP Data Processing**:
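   - **Text Normalization**: NLTK has no dedicated `normalize` module; normalization is usually assembled from lowercasing, tokenization, stopword removal, and stemming or lemmatization.
     ```python
     [token.lower() for token in nltk.word_tokenize(text) if token.isalpha()]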
     ```
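   - **Punkt Sentence Tokenizer**: An unsupervised sentence tokenizer. Pass training text to learn abbreviations and sentence starters; `nltk.sent_tokenize` loads a pre-trained Punkt model instead.
     ```python
     nltk.tokenize.punkt.PunktSentenceTokenizer(train_text)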
     ```

### 14. **Language Translation**:
   - **Alignment Tools**: For bilingual word alignments.
     ```python
     from nltk.translate import Alignment
     ```
   - **Phrase-Based Machine Translation**: `PhraseTable` stores candidate translations of phrases together with their log probabilities.
     ```python
     nltk.translate.PhraseTable()
     ```

### 15. **Additional Tools**:
   - **Edit Distance**: Used for spell checking or calculating distance between strings.
     ```python
     nltk.edit_distance('kitten', 'sitting')
     ```
   - **Bigrams/Trigrams from Tokens**: Generates consecutive word pairs or triples.
     ```python
     list(nltk.bigrams(tokens))
     list(nltk.trigrams(tokens))
     ```
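
Edit distance lends itself to a simple spelling-suggestion routine. The sketch below is one possible approach, not a built-in NLTK feature: the `suggest` helper and the small vocabulary are hypothetical, and the helper just picks the vocabulary word with the smallest distance.

```python
from nltk.metrics import edit_distance

# Levenshtein distance: insertions, deletions, and substitutions each cost 1
kitten_sitting = edit_distance("kitten", "sitting")

# With transpositions=True, swapping two adjacent characters costs 1, not 2
swap_plain = edit_distance("ab", "ba")
swap_transposed = edit_distance("ab", "ba", transpositions=True)


def suggest(word, vocabulary):
    """Hypothetical helper: return the vocabulary entry closest to `word`."""
    return min(vocabulary, key=lambda candidate: edit_distance(word, candidate))


vocabulary = ["language", "toolkit", "natural"]   # illustrative word list
suggestion = suggest("langauge", vocabulary)

print(kitten_sitting, swap_plain, swap_transposed, suggestion)
```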

### 16. **Corpora Tools**:
   - **Swadesh Corpus**: Vocabulary lists for language comparison.
     ```python
     from nltk.corpus import swadesh
     swadesh.words('en')
     ```
   - **Indian Corpus**: Specific corpus for Indian languages.
     ```python
     from nltk.corpus import indian
     indian.raw('hindi.pos')
     ```

### 17. **Miscellaneous**:
   - **Conditional Probability Distribution**: For probabilistic models.
     ```python
     from nltk.probability import ConditionalProbDist
     ```
   - **Hidden Markov Model (HMM)**: Used for POS tagging and sequence labeling.
     ```python
     nltk.tag.hmm.HiddenMarkovModelTrainer()
     ```

---

To explore additional functions in more detail, I recommend checking the NLTK source code or using the following approach:
- **Use the `help()` Function**: You can find detailed information about any module or function in NLTK.
  ```python
  import nltk
  help(nltk)
  ```
- **Check Documentation**: Visit the official [NLTK Documentation](https://www.nltk.org/).

While this is a comprehensive list, NLTK is extensive, and some of its tools are not frequently used. Depending on your task, the functions you might need can vary significantly.