NLTK Corpora Analysis 📚

📋 Project Overview

Subtask 1: Install and Import NLTK

Install the NLTK library if you haven't already.
Import the necessary modules from NLTK.
Download the required NLTK data files to complete the next subtasks.
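
The setup for this subtask might look like the sketch below. The resource list is an assumption based on what the later subtasks use (corpus, tokenizer, tagger, WordNet); exact resource names vary between NLTK releases, so both the older and newer tokenizer and tagger names are listed. `REQUIRED_RESOURCES` and `download_resources` are hypothetical names chosen for illustration.

```python
# Install once from a shell:  pip install nltk
import nltk

# Assumed resource names; newer NLTK releases use "punkt_tab" and
# "averaged_perceptron_tagger_eng", older ones use the shorter names.
REQUIRED_RESOURCES = [
    "gutenberg",
    "punkt",
    "punkt_tab",
    "averaged_perceptron_tagger",
    "averaged_perceptron_tagger_eng",
    "wordnet",
    "omw-1.4",
]


def download_resources(resources=REQUIRED_RESOURCES):
    """Try to download each resource; return {name: True/False}."""
    # nltk.download returns True on success and False otherwise,
    # so unknown names for a given NLTK version are simply reported.
    return {name: nltk.download(name, quiet=True) for name in resources}
```

Calling `download_resources()` once before the other subtasks should be enough; a `False` entry usually means that name does not exist in the installed NLTK version or the network was unavailable.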

Subtask 2: Sentence and Word Tokenization

Open the Gutenberg corpus.
Choose a specific file (e.g., 'austen-emma.txt') and tokenize it into sentences.
Print the total number of sentences and the first sentence.
Tokenize the text into words and print the tokens.
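
A minimal sketch of this subtask, assuming the Gutenberg corpus and the Punkt tokenizer data are already downloaded. The helper names `load_emma`, `tokenize_text`, and `report` are hypothetical; only `gutenberg.raw`, `sent_tokenize`, and `word_tokenize` come from NLTK.

```python
from nltk.corpus import gutenberg
from nltk.tokenize import sent_tokenize, word_tokenize


def load_emma(fileid="austen-emma.txt"):
    """Return the raw text of one Gutenberg file as a single string."""
    return gutenberg.raw(fileid)


def tokenize_text(raw_text):
    """Split raw text into a list of sentences and a list of word tokens."""
    sentences = sent_tokenize(raw_text)
    words = word_tokenize(raw_text)
    return sentences, words


def report(sentences, words, preview=20):
    """Print the sentence count, the first sentence, and a token preview."""
    print("Number of sentences:", len(sentences))
    print("First sentence:", sentences[0] if sentences else "")
    print("First tokens:", words[:preview])
```

With the data in place, `report(*tokenize_text(load_emma()))` prints the summary. The preview is limited to 20 tokens here only to keep the output short; printing the full token list works the same way.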

Subtask 3: Bigrams, Trigrams, and POS Tagging

Generate bigrams and trigrams from the word tokens and print the first 10 of each.
Perform POS tagging on the word tokens and print the first 10 tokens with their POS tags.
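
One way to approach this subtask is sketched below, assuming `tokens` is the word-token list from Subtask 2 and the tagger data is downloaded. `first_ngrams` and `first_tags` are hypothetical helper names; `bigrams`, `trigrams`, and `pos_tag` are NLTK functions.

```python
from itertools import islice

from nltk import pos_tag
from nltk.util import bigrams, trigrams


def first_ngrams(tokens, limit=10):
    """Return the first `limit` bigrams and trigrams of a token list."""
    # bigrams/trigrams are generators, so islice avoids building them fully.
    first_bigrams = list(islice(bigrams(tokens), limit))
    first_trigrams = list(islice(trigrams(tokens), limit))
    return first_bigrams, first_trigrams


def first_tags(tokens, limit=10):
    """Tag the whole token list, then return the first `limit` (token, tag) pairs."""
    # Tagging everything before slicing keeps the tagger's context intact.
    return pos_tag(tokens)[:limit]
```

Tagging the full text of a novel can take a few seconds; tagging only `tokens[:limit]` is faster but may give slightly different tags near the cut, since the tagger looks at neighbouring words.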

Subtask 4: Stemming, Lemmatization, and Frequency Distribution

Stem each word token and print the original token, its POS tag, and its stem.
Lemmatize each word token and print the original token and its lemma.
Create a frequency distribution of the word tokens and plot the top 20 words.
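
The three steps above could be sketched as follows. The choice of `PorterStemmer` and `WordNetLemmatizer` is an assumption (the project does not name a specific stemmer or lemmatizer), and the helper names are hypothetical. Lemmatization needs the WordNet data; plotting needs matplotlib installed.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer, WordNetLemmatizer


def stem_tokens(tagged_tokens):
    """Return (token, POS tag, stem) triples from (token, tag) pairs."""
    stemmer = PorterStemmer()
    return [(tok, tag, stemmer.stem(tok)) for tok, tag in tagged_tokens]


def lemmatize_tokens(tokens):
    """Return (token, lemma) pairs; tokens are treated as nouns by default."""
    lemmatizer = WordNetLemmatizer()
    return [(tok, lemmatizer.lemmatize(tok)) for tok in tokens]


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return FreqDist(tokens)
```

Given the tagged tokens from Subtask 3, `stem_tokens(tagged)` and `lemmatize_tokens(tokens)` return lists that can be printed row by row, and `word_frequencies(tokens).plot(20)` draws the 20 most frequent tokens. Punctuation and capitalised variants are counted as-is here; filtering with `str.isalpha` or lowercasing first is an optional refinement.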

🔑 Key Skills

Python Programming
Data Analysis
Documentation

