# ***N-Grams***

1. N-grams are contiguous sequences of N words (or tokens) in a text.
2. They help capture local context and word relationships.
3. Examples:

   * Unigram (1-word): "I", "love", "dogs"
   * Bigram (2-word): "I love", "love dogs"
   * Trigram (3-word): "I love dogs"
4. Higher N captures more context but increases complexity and sparsity.
5. Used in language modeling, text prediction, sentiment analysis, and speech processing.
6. They help machine learning models understand phrase patterns, not just individual words.
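The sparsity point in item 4 can be made concrete with a small sketch. It assumes plain whitespace tokenization and a made-up sentence, and `sliding_ngrams` is a hypothetical helper written for this illustration, not a library function. As N grows, fewer n-grams repeat, so each one is observed less often.

```python
from collections import Counter

def sliding_ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up example sentence, tokenized on whitespace only.
tokens = "the cat sat on the mat and the cat sat on the rug".split()

for n in (1, 2, 3, 4):
    counts = Counter(sliding_ngrams(tokens, n))
    # Count how many distinct n-grams occur more than once.
    repeated = sum(1 for c in counts.values() if c > 1)
    print(f"N={n}: {len(counts)} distinct n-grams, {repeated} seen more than once")
```

On a real corpus the effect is much stronger: most long n-grams appear only once, which is why higher-order models need far more data.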


There are two types of functions, and either can be used to generate n-grams.

⭐ 1. Pre-defined Functions

Functions that are already built into the programming language or libraries.

You can use them directly without writing their logic.

Examples:

Python: print(), len(), sum()

Math libraries: sqrt(), log()

They save time and reduce errors because they are tested and optimized.
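A minimal sketch of the pre-defined functions listed above, applied to the unigram example from earlier. The variable names are illustrative only.

```python
import math

# len() and sum() are Python built-ins; no import is needed.
word_lengths = [len(w) for w in ["I", "love", "dogs"]]
total = sum(word_lengths)
print("lengths:", word_lengths, "total:", total)

# sqrt() and log() come from the standard-library math module.
print("sqrt(16) =", math.sqrt(16))
print("log(1) =", math.log(1))
```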

In [8]:

!pip install nltk
import nltk
# Tokenizer data for NLTK's tokenizers; this cell splits on whitespace
# with str.split(), so the download is optional here.
nltk.download('punkt_tab')
from nltk.util import ngrams

text = "Artificial Intelligence is growing rapidly in the world."
words = text.split()  # whitespace tokenization; punctuation stays attached

# ngrams() returns a generator, so each result can be consumed only once
unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)
quadgrams = ngrams(words, 4)
pentagrams = ngrams(words, 5)

print("unigram words:", list(unigrams))
print("bigram words:", list(bigrams))
print("trigram words:", list(trigrams))
print("quadgram words:", list(quadgrams))
print("pentagram words:", list(pentagrams))

# Alternative: print one n-gram per line. The generators above are already
# exhausted by list(), so recreate them before using this loop.
# for result in [unigrams, bigrams, trigrams, quadgrams, pentagrams]:
#     for grams in result:
#         print(grams)

unigram words: [('Artificial',), ('Intelligence',), ('is',), ('growing',), ('rapidly',), ('in',), ('the',), ('world.',)]
bigram words: [('Artificial', 'Intelligence'), ('Intelligence', 'is'), ('is', 'growing'), ('growing', 'rapidly'), ('rapidly', 'in'), ('in', 'the'), ('the', 'world.')]
trigram words: [('Artificial', 'Intelligence', 'is'), ('Intelligence', 'is', 'growing'), ('is', 'growing', 'rapidly'), ('growing', 'rapidly', 'in'), ('rapidly', 'in', 'the'), ('in', 'the', 'world.')]
quadgram words: [('Artificial', 'Intelligence', 'is', 'growing'), ('Intelligence', 'is', 'growing', 'rapidly'), ('is', 'growing', 'rapidly', 'in'), ('growing', 'rapidly', 'in', 'the'), ('rapidly', 'in', 'the', 'world.')]
pentagram words: [('Artificial', 'Intelligence', 'is', 'growing', 'rapidly'), ('Intelligence', 'is', 'growing', 'rapidly', 'in'), ('is', 'growing', 'rapidly', 'in', 'the'), ('growing', 'rapidly', 'in', 'the', 'world.')]


[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
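In the output above, the last token is 'world.' with the period still attached, because `str.split()` only splits on whitespace. One way to avoid this is sketched below, assuming a simple regular expression is good enough for the text. The pattern and the name `simple_tokenize` are illustrative choices, not part of NLTK.

```python
import re

def simple_tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = simple_tokenize("Artificial Intelligence is growing rapidly in the world.")
bigrams = list(zip(tokens, tokens[1:]))  # pair each token with its successor
print(tokens)
print(bigrams)
```

Lowercasing also means that "Education" and "education" would count as the same token, which matters when n-gram frequencies are compared.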


⭐ 2. User-defined Functions

Functions created by the programmer to perform a specific task.

Useful when you need custom behavior that built-in functions don’t provide.

Defined using the def keyword in Python.

⭐ Difference (Short Summary)

Pre-defined: Ready-made functions provided by the language.

User-defined: Custom functions written by the user.

Pre-defined saves time; user-defined gives flexibility.
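As a small illustration of the flexibility point, the sketch below uses `def` to build a custom function on top of pre-defined pieces, doing something no single built-in provides: finding the most frequent n-gram. The name `most_common_ngram` and the sample sentence are made up for this example.

```python
from collections import Counter

def most_common_ngram(words, n):
    """Return (ngram, count) for the most frequent n-gram, or None if there are none."""
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return None  # fewer than n words: no n-gram can be formed
    return Counter(grams).most_common(1)[0]

words = "learn and grow and learn and teach".split()
print(most_common_ngram(words, 2))
```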

In [9]:
# User-defined n-gram functions
def get_unigrams(words):
    # Each word becomes a 1-tuple
    return [(words[i],) for i in range(len(words))]

def get_bigrams(words):
    # Pair each word with the word that follows it
    return [(words[i], words[i + 1]) for i in range(len(words) - 1)]

def get_trigrams(words):
    # Group each word with the two words that follow it
    return [(words[i], words[i + 1], words[i + 2]) for i in range(len(words) - 2)]

def get_ngrams(words, n):
    # General case: slide a window of size n over the word list
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Input
text = "Education empowers students to learn and education helps teachers to guide and education creates opportunities to learn and grow in education systems around the world."
# Split into words on whitespace
words = text.split()
# Generate n-grams using the user-defined functions
unigrams = get_unigrams(words)
bigrams = get_bigrams(words)
trigrams = get_trigrams(words)
quadgrams = get_ngrams(words, 4)
# Print the results
print("original text:", text)
print("----------------------------------------")
print("Unigrams:", unigrams)
print("----------------------------------------")
print("Bigrams:", bigrams)
print("----------------------------------------")
print("Trigrams:", trigrams)
print("----------------------------------------")
print("Quadragrams:", quadgrams)

original text: Education empowers students to learn and education helps teachers to guide and education creates opportunities to learn and grow in education systems around the world.
----------------------------------------
Unigrams: [('Education',), ('empowers',), ('students',), ('to',), ('learn',), ('and',), ('education',), ('helps',), ('teachers',), ('to',), ('guide',), ('and',), ('education',), ('creates',), ('opportunities',), ('to',), ('learn',), ('and',), ('grow',), ('in',), ('education',), ('systems',), ('around',), ('the',), ('world.',)]
----------------------------------------
Bigrams: [('Education', 'empowers'), ('empowers', 'students'), ('students', 'to'), ('to', 'learn'), ('learn', 'and'), ('and', 'education'), ('education', 'helps'), ('helps', 'teachers'), ('teachers', 'to'), ('to', 'guide'), ('guide', 'and'), ('and', 'education'), ('education', 'creates'), ('creates', 'opportunities'), ('opportunities', 'to'), ('to', 'learn'), ('learn', 'and'), ('and', 'grow'), ('grow', 