##### I assumed the program should do the following:
- Tokenize a text corpus.
- Generate n-gram models for n = 2 to 6.
- Produce random phrases.
- Find the 10 most frequent trigrams in the corpus and save them to a text file.
- Let the user enter the desired sentence length.
- Let the user enter the seed word.
- Print the generated sentence.

**1- Install NLTK, download the data required for tokenization, and import the modules used for text processing and generation.**

In [1]:
!pip install --user -U nltk
import nltk
nltk.download('punkt')
import random
from nltk.tokenize import word_tokenize
import collections


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


**2- Load the corpus from KSUCCA Corpus Files**

In [2]:
with open("aa1.txt", "r", encoding="utf-8") as f:
    corpus = f.read()

**3- Tokenize the corpus and generate N-gram models for n = 2 to 6**

In [3]:
tokens = word_tokenize(corpus)

ngram_models = {}
for n in range(2, 7):
    ngrams = zip(*[tokens[i:] for i in range(n)])
    ngram_models[n] = list(ngrams)
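
To make the `zip` idiom in the cell above easier to follow, here is a minimal sketch on a toy token list. The names `toy_tokens` and `make_ngrams` are hypothetical and are not part of the notebook; the logic is the same slicing-and-zipping used to build `ngram_models`.

```python
# Toy token list (hypothetical, not taken from the corpus).
toy_tokens = ["a", "b", "c", "d", "e"]

def make_ngrams(tokens, n):
    """Return all contiguous n-grams of tokens as a list of tuples."""
    # tokens[i:] is the list shifted left by i positions; zip stops at the
    # shortest shifted copy, so incomplete n-grams at the end are dropped.
    return list(zip(*[tokens[i:] for i in range(n)]))

toy_bigrams = make_ngrams(toy_tokens, 2)
toy_trigrams = make_ngrams(toy_tokens, 3)

print(toy_bigrams)   # [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]
print(toy_trigrams)  # [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]
```

A list of `k` tokens yields `k - n + 1` n-grams, so a list shorter than `n` yields none.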

**4- Get the 10 most frequent trigrams and save them to frequent.txt**

In [4]:
trigrams = ngram_models[3]
freq_dist = collections.Counter(trigrams)

common = freq_dist.most_common(10)

with open("frequent.txt", "w", encoding="utf-8") as f:
    for trigram, frequency in common:
        f.write(f"Trigram: {trigram}: Frequency: {frequency}\n")
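
As a small illustration of the counting step, the sketch below applies `collections.Counter` to a handful of made-up trigrams. The data and the names `toy_trigrams`, `toy_counts`, `top_two`, and `lines` are assumptions for illustration only; the output lines use the same format as the cell above.

```python
from collections import Counter

# Made-up trigrams (hypothetical data, not from the corpus).
toy_trigrams = [
    ("a", "b", "c"),
    ("b", "c", "a"),
    ("c", "a", "b"),
    ("a", "b", "c"),
    ("b", "c", "d"),
]

# Counter maps each distinct trigram to the number of times it occurs.
toy_counts = Counter(toy_trigrams)

# most_common(k) returns the k (item, count) pairs with the highest counts.
top_two = toy_counts.most_common(2)

# Format the result the same way as the lines written to the file.
lines = [f"Trigram: {trigram}: Frequency: {count}" for trigram, count in top_two]
for line in lines:
    print(line)
```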

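**5- Generate a sentence of up to num_words words starting with start_word. The second word is drawn from the bigram model and each later word from the trigram model; generation stops early if no n-gram continues the current context.**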

In [12]:
# Generate a sentence: the second word comes from the bigram model,
# every later word from the trigram model.

def generate_sentence(num_words, start_word):
    words = start_word.split()
    while len(words) < num_words:
        if len(words) < 2:
            # Only one word so far: collect every word that follows it in a bigram.
            possible_followers = [follower for (a, follower) in ngram_models[2] if a == words[-1]]
        else:
            # Collect every word that follows the last two words in a trigram.
            possible_followers = [follower for (a, b, follower) in trigrams if a == words[-2] and b == words[-1]]

        if not possible_followers:
            # No n-gram continues the current context, so stop early.
            break
        # Duplicates are kept in the list, so frequent followers are more likely.
        words.append(random.choice(possible_followers))

    return " ".join(words)
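
The function above rescans the full bigram or trigram list for every generated word. One possible alternative, shown here only as a sketch and not as the notebook's method, is to build a dictionary from each context to its followers once and then look contexts up directly. The names `build_follower_index` and `generate_sentence_indexed` and the toy English tokens are hypothetical.

```python
import random
from collections import defaultdict

def build_follower_index(tokens, context_size):
    """Map each context (a tuple of context_size words) to the words that follow it."""
    index = defaultdict(list)
    for i in range(len(tokens) - context_size):
        context = tuple(tokens[i:i + context_size])
        # Duplicates are kept so that random.choice stays frequency-weighted.
        index[context].append(tokens[i + context_size])
    return index

def generate_sentence_indexed(num_words, start_word, bigram_index, trigram_index):
    """Same sampling rule as generate_sentence, using dictionary lookups."""
    words = start_word.split()
    while len(words) < num_words:
        if len(words) < 2:
            followers = bigram_index.get(tuple(words[-1:]), [])
        else:
            followers = trigram_index.get(tuple(words[-2:]), [])
        if not followers:
            break  # no known continuation for this context
        words.append(random.choice(followers))
    return " ".join(words)

# Toy tokens (hypothetical); in the notebook these would be the corpus tokens.
toy_tokens = "the cat sat on the mat and the cat ran".split()
bigram_index = build_follower_index(toy_tokens, 1)
trigram_index = build_follower_index(toy_tokens, 2)

sentence = generate_sentence_indexed(5, "the", bigram_index, trigram_index)
print(sentence)
```

The trade-off is extra memory for the two indexes in exchange for a constant-time lookup per generated word.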



**6- Tests and results**

In [15]:
for i in range(10):
    num_words = int(input("Enter the number of words in the desired sentence: "))
    start_word = input("Enter a start word: ")
    print(f"Test {i+1}\n Number of words: {num_words} \n Start word: {start_word} \n Sentence generated: {generate_sentence(num_words, start_word)}")    

Test 1
 Number of words: 10 
 Start word: الله 
 Sentence generated: الله على رسوله منهم فما أوجفتم عليه من أجر وما
Test 2
 Number of words: 9 
 Start word: العزيز 
 Sentence generated: العزيز الحميد الذي له ما في الأرض وجعلنا لكم
Test 3
 Number of words: 8 
 Start word: الرحمن 
 Sentence generated: الرحمن ولدا سبحانه بل عباد مكرمون لا يسبقونه
Test 4
 Number of words: 7 
 Start word: أحد 
 Sentence generated: أحد وامضوا حيث تؤمرون وقضينا إليه ذلك
Test 5
 Number of words: 6 
 Start word: على 
 Sentence generated: على غضب وللكافرين عذاب أليم ولله
Test 6
 Number of words: 5 
 Start word: إن 
 Sentence generated: إن الله غفور رحيم يا
Test 7
 Number of words: 4 
 Start word: غفور 
 Sentence generated: غفور رحيم فإذا قضيتم
Test 8
 Number of words: 3 
 Start word: هو 
 Sentence generated: هو خصيم مبين
Test 9
 Number of words: 2 
 Start word: قريش 
 Sentence generated: قريش إيلافهم
Test 10
 Number of words: 1 
 Start word: الله 
 Sentence generated: الله



**7- Print the 10 most frequent trigrams saved in frequent.txt**

In [17]:
with open('frequent.txt', 'r' , encoding = 'utf8') as file:
    for line in file:
        print(line.strip())

Trigram: ('الله', 'الرحمن', 'الرحيم'): Frequency: 114
Trigram: ('بسم', 'الله', 'الرحمن'): Frequency: 113
Trigram: ('يا', 'أيها', 'الذين'): Frequency: 92
Trigram: ('أيها', 'الذين', 'آمنوا'): Frequency: 89
Trigram: ('من', 'دون', 'الله'): Frequency: 71
Trigram: ('على', 'كل', 'شيء'): Frequency: 52
Trigram: ('آمنوا', 'وعملوا', 'الصالحات'): Frequency: 50
Trigram: ('إن', 'في', 'ذلك'): Frequency: 50
Trigram: ('في', 'سبيل', 'الله'): Frequency: 44
Trigram: ('ما', 'في', 'السماوات'): Frequency: 39


**8- Resources**

- https://sourceforge.net/projects/ksucca-corpus/files/KSUCCA%20Files/

- https://docs.python.org/3/library/random.html

- https://www.digitalocean.com/community/tutorials/python-counter-python-collections-counter

- https://github.com/shayan09/Text-Generation-using-NGRAM-models
