
# **Language Generation with NLTK**
NLTK (Natural Language Toolkit) is a popular Python library for natural language processing that can also be used for language generation tasks. Here are a few examples of generation-related tasks that can be approached with NLTK:

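**Text Summarization**: NLTK does not ship a dedicated summarization module, but its tokenizers, stop-word list, stemmer, and **FreqDist** class can be combined into a simple extractive summarizer, one that selects sentences from the original text rather than writing new ones. This can be useful for quickly grasping the main points of a long article or document. Here's an example of such a frequency-based summarizer: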

In this example, we first obtained a sample text from the Reuters corpus using the **reuters.raw()** method. We then tokenized the text into sentences with **sent_tokenize()** and into words with **word_tokenize()**. Next, we removed stop words using the **stopwords** corpus and stemmed the remaining words with NLTK's **PorterStemmer** class. We then computed the frequency distribution of the stems with **FreqDist** and extracted the 5 most common ones. Finally, we generated a summary by selecting every sentence whose text contains at least one of those stems.

import nltk
from nltk.corpus import reuters, stopwords
from nltk.probability import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

# Download the required data (newer NLTK releases load 'punkt_tab' for sentence splitting)
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')
nltk.download('reuters')

# Get sample text: one article from the Reuters corpus
text = reuters.raw('test/14826')

# Tokenize text into sentences
sentences = sent_tokenize(text)

# Tokenize the full text into words
words = word_tokenize(text)

# Remove stop words (case-insensitively); punctuation tokens are kept
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word.casefold() not in stop_words]

# Perform stemming
ps = PorterStemmer()
stemmed_words = [ps.stem(word) for word in filtered_words]

# Calculate the frequency of each stem
freq_dist = FreqDist(stemmed_words)

# Get the 5 most common stems
most_common_words = [word for word, count in freq_dist.most_common(5)]

# Generate summary: keep each sentence that contains one of the common stems
# (a plain substring test against the original, unstemmed sentence)
summary_sentences = []
for sentence in sentences:
    for word in most_common_words:
        if word in sentence:
            summary_sentences.append(sentence)
            break  # stop at the first match so each sentence is added once

summary = ' '.join(summary_sentences)
print(summary)


ASIAN EXPORTERS FEAR DAMAGE FROM U.S.-JAPAN RIFT
  Mounting trade friction between the
  U.S. And Japan has raised fears among many of Asia's exporting
  nations that the row could inflict far-reaching economic
  damage, businessmen and officials said. They told Reuter correspondents in Asian capitals a U.S.
  Move against Japan might boost protectionist sentiment in the
  U.S. And lead to curbs on American imports of their products. But some exporters said that while the conflict would hurt
  them in the long-run, in the short-term Tokyo's loss might be
  their gain. The U.S. Has said it will impose 300 mln dlrs of tariffs on
  imports of Japanese electronics goods on April 17, in
  retaliation for Japan's alleged failure to stick to a pact not
  to sell semiconductors on world markets at below cost. Unofficial Japanese estimates put the impact of the tariffs
  at 10 billion dlrs and spokesmen for major electronics firms
  said they would virtually halt exports of products hit by the


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
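Because the cell above keeps punctuation tokens in the frequency count and matches stems as raw substrings of each sentence, the selected sentences can end up covering most of the article. A slightly more careful sketch of the same frequency-based extractive idea keeps only alphabetic tokens, compares stems with stems, scores each sentence by the summed frequency of its content stems, and returns the top-scoring sentences in their original order. It assumes a small hypothetical hand-written stop-word list and a naive regex sentence splitter in place of NLTK's **stopwords** corpus and Punkt tokenizer, so it runs without any downloads; only NLTK's **PorterStemmer** is used. The sample text is made up for illustration.

```python
import re
from collections import Counter

from nltk.stem import PorterStemmer

# Hypothetical hand-written stop-word list (NLTK's stopwords corpus is much larger)
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at", "by",
    "for", "from", "with", "as", "is", "are", "was", "were", "be", "been",
    "it", "its", "that", "this", "has", "have", "had", "will", "would",
}

stemmer = PorterStemmer()


def content_stems(text):
    """Lowercase, keep alphabetic tokens only, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(token) for token in tokens if token not in STOP_WORDS]


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Document-wide frequency of each content stem
    freq = Counter(content_stems(text))
    # Score = summed document frequency of the sentence's content stems
    scores = {i: sum(freq[stem] for stem in content_stems(sentence))
              for i, sentence in enumerate(sentences)}
    top = sorted(scores, key=scores.get, reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))


# Illustrative input (made up for this example)
sample = ("Japan exports chips. The weather was mild. "
          "Chip exports from Japan fell as tariffs on chips rose.")
print(summarize(sample, n_sentences=2))
# prints: Japan exports chips. Chip exports from Japan fell as tariffs on chips rose.
```

Summing frequencies favors longer sentences; dividing each score by the number of content stems in the sentence is a common alternative. The same function could be applied to the Reuters article loaded above, for example `summarize(text, n_sentences=3)`.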


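**Machine Translation**: NLTK's **nltk.translate** package does not include a ready-to-use translator, but it does provide the IBM Models 1 to 5 (statistical translation models that learn word alignments) and evaluation metrics such as BLEU, while NLTK's **comtrans** corpus supplies sentence-aligned parallel text to train them on. Here's an example that loads part of that parallel corpus and counts its most frequent stemmed words: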

# In this example, we load the first 100 sentence pairs of the comtrans parallel corpus
# with comtrans.aligned_sents(), stem every word on the German side of each pair
# (sentence.words), and print the 10 most frequent stems.

# Import necessary libraries
import nltk
from nltk.corpus import comtrans
from nltk.probability import FreqDist
from nltk.stem import SnowballStemmer

nltk.download('comtrans', quiet=True)

# Load the first 100 aligned sentence pairs
bitext = comtrans.aligned_sents()[:100]

# Note: sentence.words holds German text here, so this English stemmer is a mismatch;
# SnowballStemmer('german') would suit these words better
stemmer = SnowballStemmer('english')

# Lowercase and stem every word on the German side
stemmed_words = []
for sentence in bitext:
    for word in sentence.words:
        stemmed_words.append(stemmer.stem(word.lower()))

# Generate the frequency distribution of stemmed words
freq_dist = FreqDist(stemmed_words)

# Print the 10 most common stemmed words
print(freq_dist.most_common(10))



[(',', 128), ('.', 87), ('der', 58), ('die', 53), ('ich', 28), ('das', 27), ('in', 26), ('zu', 25), ('ein', 25), ('wir', 24)]
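Since **sentence.words** holds German in this sample, NLTK's German Snowball stemmer is the natural match for it. A quick check that needs no corpus download compares the English and German Snowball stemmers on a few inflected forms of the German article *ein* (the word list is illustrative):

```python
from nltk.stem import SnowballStemmer

english = SnowballStemmer('english')
german = SnowballStemmer('german')

# Inflected forms of the German indefinite article (illustrative words)
forms = ['ein', 'eine', 'einen', 'einem']
for word in forms:
    print(f"{word:6} english={english.stem(word):6} german={german.stem(word)}")
```

The German stemmer reduces all four forms to **ein**, so swapping it into the cell above would merge such variants in the frequency counts.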

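Training an actual IBM Model 1 takes only a few lines. A minimal sketch, assuming a hand-made toy German-English bitext (the six pairs below are illustrative and mirror the example in NLTK's **IBMModel1** documentation), runs 5 expectation-maximization iterations, prints two entries of the learned translation table, and shows the alignment found for one pair. The **gloss()** helper is hypothetical, added only to illustrate how the table could drive a crude word-by-word lookup:

```python
from collections import defaultdict

from nltk.translate import AlignedSent, IBMModel1

# Toy German-English bitext (illustrative pairs mirroring NLTK's IBMModel1 docs).
# AlignedSent(words, mots): here `words` is German and `mots` is English.
bitext = [
    AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']),
    AlignedSent(['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']),
    AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']),
    AlignedSent(['das', 'haus'], ['the', 'house']),
    AlignedSent(['das', 'buch'], ['the', 'book']),
    AlignedSent(['ein', 'buch'], ['a', 'book']),
]

# Run 5 EM iterations; training also stores the best alignment on each AlignedSent
ibm1 = IBMModel1(bitext, 5)

# translation_table[german][english] holds the learned probability P(german | english)
print(ibm1.translation_table['buch']['book'])
print(ibm1.translation_table['das']['book'])

# Learned alignment for one pair, as (German index, English index) links
print(bitext[2].alignment)

# English words that appear alongside each German word in the bitext
cooccurring = defaultdict(set)
for pair in bitext:
    for german in pair.words:
        cooccurring[german].update(pair.mots)


def gloss(german_words):
    """Hypothetical crude gloss: for each German word, pick the co-occurring
    English word with the highest P(german | english). No language model or
    reordering is used, so this is far from real translation."""
    return [
        max(sorted(cooccurring[g]), key=lambda e: ibm1.translation_table[g][e], default=None)
        for g in german_words
    ]


print(gloss(['das', 'buch', 'ist', 'klein']))
```

Real statistical translation combines a table like this with a target-language model and a reordering model, and needs far more parallel data than this toy bitext; the comtrans pairs loaded above could be passed to **IBMModel1** in the same way.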