Intelligent Word Frequency Counter

Development Architecture:

- Input Handling: The user enters a paragraph of text.  

- Processing Logic:

  - The program tokenizes the text into individual words.  

  - It counts the occurrences of each word using a plain dictionary or `collections.Counter`.  

  - The most frequently used words are identified.  

- Output: Displays the top 3 most common words.  



**🔹 Tips to Build:**  

✅ Use **Natural Language Processing (NLP)** libraries like **NLTK** for better text analysis.  

✅ Remove **stop words** (e.g., "the", "is", "and") so that filler words do not dominate the counts.  

✅ Convert the word count data into a **word cloud** for visual representation.  

✅ Implement **CSV file reading** to analyze text files automatically.  
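
The CSV tip above could be sketched roughly as follows, using only the standard library. The function name `count_words_in_csv`, the column name `"text"`, and the small `STOPWORDS` set are assumptions made for illustration; adjust them to match the actual file.

```python
import csv
import string
from collections import Counter

# Assumed for illustration: a small stopword set and a column named "text".
STOPWORDS = {"the", "is", "in", "and", "to", "was", "not", "by", "over", "a", "of"}

def count_words_in_csv(path, column="text", top_n=3):
    """Return the top_n most common non-stopword tokens in one CSV column."""
    counts = Counter()
    strip_punct = str.maketrans("", "", string.punctuation)
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing or empty cells are treated as empty strings.
            cell = (row.get(column) or "").lower().translate(strip_punct)
            counts.update(word for word in cell.split() if word not in STOPWORDS)
    return counts.most_common(top_n)
```

Calling `count_words_in_csv("reviews.csv")` (a hypothetical file name) returns a list of `(word, count)` pairs. Updating one `Counter` row by row avoids loading the whole file into a single string.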

In [1]:
!pip install nltk wordcloud



In [3]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt

In [4]:
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [13]:
from collections import Counter
import string

# Predefined stopword set (a small sample of common English stopwords)
stopwords_list = {
    "the", "is", "in", "and", "to", "was", "not", "by", "over", "a", "of", "on", "for", "with", "as", "at", "this", "that"
}

def clean_and_tokenize(text):
    text = text.lower()  # Convert to lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # Remove punctuation
    words = text.split()  # Tokenize text into words
    return [word for word in words if word not in stopwords_list]  # Remove stopwords

def get_most_common_words(text, top_n=3):
    words = clean_and_tokenize(text)
    word_counts = Counter(words)
    return word_counts.most_common(top_n)

if __name__ == "__main__":
    user_text = input("Enter a paragraph: ")
    common_words = get_most_common_words(user_text)

    # Report the number of words actually returned instead of a hard-coded 3
    print(f"\nTop {len(common_words)} most common words:")
    for word, count in common_words:
        print(f"{word}: {count}")


Enter a paragraph: The quick brown fox jumps over the lazy dog. The dog was not amused by the fox.

Top 3 most common words:
fox: 2
dog: 2
quick: 1
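
The counts above can be turned into a word cloud, as the tips suggest. The following is a minimal sketch, not a definitive implementation: the helper names `to_frequencies` and `render_word_cloud` and the output file name are hypothetical, and the `wordcloud` package is imported inside the function so the rest of the code still runs when it is not installed.

```python
def to_frequencies(common_words):
    """Turn [(word, count), ...] pairs into a {word: count} dict."""
    return dict(common_words)

def render_word_cloud(frequencies, out_path="wordcloud.png"):
    """Draw a word cloud from a {word: count} dict and save it as an image."""
    # Imported lazily: requires `pip install wordcloud`.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
    return out_path

# The (word, count) pairs printed for the sample paragraph above.
sample = [("fox", 2), ("dog", 2), ("quick", 1)]
frequencies = to_frequencies(sample)
```

Passing `frequencies` to `render_word_cloud` writes the image to disk. With only three words the picture is sparse, so in practice it is probably better to pass the full `Counter` rather than just the top three entries.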
