<a href="https://colab.research.google.com/github/dileep66yadav/codemaster/blob/main/CodeMaster_Gen_AI_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [38]:
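import numpy as np
import tensorflow as tf
# Import everything from the Keras bundled with TensorFlow so the two never get mixed.
# Tokenizer and the Embedding `input_length` argument are Keras 2 APIs, so this
# notebook assumes TensorFlow 2.15 or earlier.
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences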

In [14]:
# Sample text data
text = """
For someone with 9 years of experience preparing for coding interviews, it's crucial to have a strong grasp of both fundamental and advanced data structures and algorithms. Here are some of the most important and common ones you should be familiar with:
Data Structures

    Arrays and Strings
        Operations: Insertion, Deletion, Traversal, Searching
        Common problems: Reverse array/string, palindrome checking, anagram checking

    Linked Lists
        Types: Singly, Doubly, Circular
        Operations: Insertion, Deletion, Reversal, Detecting cycles

    Stacks and Queues
        Implementations: Array-based, Linked List-based
        Common problems: Balancing parentheses, evaluating postfix expressions, implementing a queue using two stacks

    Trees
        Types: Binary Trees, Binary Search Trees (BST), AVL Trees, Red-Black Trees, Heaps (Min-Heap, Max-Heap)
        Operations: Traversals (In-order, Pre-order, Post-order), Insertion, Deletion, Searching
        Common problems: Finding height, checking balanced trees, lowest common ancestor, serialization/deserialization

    Graphs
        Representations: Adjacency Matrix, Adjacency List
        Traversal Algorithms: Depth-First Search (DFS), Breadth-First Search (BFS)
        Common problems: Shortest path (Dijkstra's, Bellman-Ford), cycle detection, connected components, topological sort

    Hash Tables
        Implementations: Open Addressing, Chaining
        Operations: Insertion, Deletion, Searching
        Common problems: Two-sum, substring search, anagram grouping

    Tries
        Usage: Autocomplete, Spell checker
        Operations: Insertion, Searching, Deletion

    Advanced Data Structures
        Segment Trees, Fenwick Trees (Binary Indexed Trees)
        Disjoint Set Union (Union-Find)
        Suffix Arrays and Suffix Trees

Algorithms

    Sorting Algorithms
        Elementary: Bubble Sort, Insertion Sort, Selection Sort
        Advanced: Merge Sort, Quick Sort, Heap Sort, Counting Sort, Radix Sort

    Searching Algorithms
        Linear Search
        Binary Search (both iterative and recursive)

    Dynamic Programming
        Common problems: Fibonacci sequence, Knapsack problem, Longest Common Subsequence (LCS), Longest Increasing Subsequence (LIS), Edit Distance

    Greedy Algorithms
        Common problems: Activity selection, Fractional Knapsack, Huffman Coding

    Backtracking
        Common problems: N-Queens, Sudoku solver, Subset sum, Permutations and combinations

    Divide and Conquer
        Common problems: Merge Sort, Quick Sort, Binary Search, Closest pair of points

Practice Tips

    Understand the Problem Statement: Clarify any doubts about the problem and define the input/output clearly.

    Write Pseudocode: Before diving into coding, write a clear pseudocode to outline your approach.

    Optimize: After writing the initial solution, think about ways to optimize it.

    Edge Cases: Consider edge cases and test your solution against them.

    Complexity Analysis: Analyze the time and space complexity of your solution.

    Mock Interviews: Practice with mock interviews or coding platforms like LeetCode, HackerRank, and CodeSignal.

    Review and Revise: Regularly review and revise the concepts and problems you have solved.

Focusing on these data structures and algorithms will prepare you well for most coding interview scenarios, especially for someone with significant experience.
"""

# Tokenize the text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1

# Create sequences of words
input_sequences = []
for line in text.split('\n'):
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

# Pad sequences
max_sequence_len = max([len(seq) for seq in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

# Create predictors and label
X, y = input_sequences[:,:-1], input_sequences[:,-1]
y = tf.keras.utils.to_categorical(y, num_classes=total_words)
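The preprocessing cell above can be traced by hand on a toy corpus. The sketch below re-implements in plain Python and NumPy what we assume `Tokenizer.fit_on_texts` does (default filters, lower-casing, frequency-ranked indices starting at 1), followed by the n-gram prefix loop, `pad_sequences(..., padding='pre')`, the `X`/`y` split and `to_categorical`. The helpers `to_words`, `build_word_index`, `make_prefixes` and `pad_pre` are hypothetical illustrations, not the Keras implementation, and the `toy_` names avoid overwriting the notebook's variables.

```python
import numpy as np

# Assumed to match the default `filters` argument of Keras' Tokenizer
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def to_words(line):
    """Lower-case, turn filter characters into spaces, split on whitespace."""
    table = str.maketrans(FILTERS, " " * len(FILTERS))
    return line.lower().translate(table).split()


def build_word_index(text):
    """Hypothetical helper: most frequent word gets index 1; index 0 stays free for padding."""
    counts = {}
    for word in to_words(text):
        counts[word] = counts.get(word, 0) + 1
    # sorted() is stable, so words with equal counts keep first-seen order
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def make_prefixes(text, word_index):
    """Every prefix of length >= 2 of each line becomes one training sequence."""
    sequences = []
    for line in text.split("\n"):
        tokens = [word_index[w] for w in to_words(line) if w in word_index]
        for i in range(1, len(tokens)):
            sequences.append(tokens[: i + 1])
    return sequences


def pad_pre(sequences, maxlen):
    """Left-pad with zeros and keep only the last maxlen tokens, like padding='pre'."""
    out = np.zeros((len(sequences), maxlen), dtype=int)
    for row, seq in enumerate(sequences):
        seq = seq[-maxlen:]
        out[row, maxlen - len(seq):] = seq
    return out


toy = "binary search tree\nbinary search"
toy_word_index = build_word_index(toy)          # {'binary': 1, 'search': 2, 'tree': 3}
toy_total_words = len(toy_word_index) + 1       # +1 because index 0 is reserved for padding
prefixes = make_prefixes(toy, toy_word_index)
padded = pad_pre(prefixes, maxlen=max(len(s) for s in prefixes))
toy_X, toy_y = padded[:, :-1], padded[:, -1]    # inputs: all but the last column; label: last column
toy_y_onehot = np.eye(toy_total_words)[toy_y]   # one row per label, like to_categorical
print(padded.tolist())                          # [[0, 1, 2], [1, 2, 3], [0, 1, 2]]
```

The same bookkeeping explains why the notebook sets `total_words` to `len(word_index) + 1` and why `X` has `max_sequence_len - 1` columns.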


In [36]:
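model = Sequential()
# Map each word index to a 10-dimensional vector; inputs are max_sequence_len-1 tokens long
model.add(Embedding(total_words, 10, input_length=max_sequence_len-1))
# Summarise the input sequence into a 100-unit hidden state
model.add(LSTM(100))
# Probability distribution over the whole vocabulary for the next word
model.add(Dense(total_words, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.01), metrics=['accuracy'])
model.summary()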

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, 41, 10)            2410      
                                                                 
 lstm_3 (LSTM)               (None, 100)               44400     
                                                                 
 dense_3 (Dense)             (None, 241)               24341     
                                                                 
Total params: 71151 (277.93 KB)
Trainable params: 71151 (277.93 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
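The parameter counts in the summary follow directly from the layer sizes. Assuming the standard LSTM formulation (four gates, each with input weights, recurrent weights and a bias) and 4-byte float32 weights, the small helpers below reproduce the numbers for a vocabulary of 241 words, 10-dimensional embeddings and 100 LSTM units. The function names are illustrative only.

```python
def embedding_params(vocab_size, embed_dim):
    # One trainable vector per vocabulary entry
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units


vocab, embed_dim, units = 241, 10, 100
counts = {
    "embedding": embedding_params(vocab, embed_dim),  # 2410
    "lstm": lstm_params(embed_dim, units),             # 44400
    "dense": dense_params(units, vocab),               # 24341
}
total = sum(counts.values())                           # 71151
size_kb = total * 4 / 1024                             # float32 weights, about 277.93 KB
print(counts, total, round(size_kb, 2))
```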


In [37]:
history = model.fit(X, y, epochs=100, verbose=1)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78
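The training cell minimises categorical cross-entropy against the one-hot `y`. For a one-hot target, that loss reduces to the negative log of the probability the model assigns to the correct word, which is the quantity Keras' `sparse_categorical_crossentropy` computes from integer labels. A NumPy sketch of that equivalence follows; the probability values are made up for illustration and the two function names are hypothetical.

```python
import numpy as np


def categorical_ce(probs, onehot):
    # Mean over samples of -sum_k y_k * log(p_k)
    return float(np.mean(-np.sum(onehot * np.log(probs), axis=1)))


def sparse_categorical_ce(probs, labels):
    # Same loss, indexing the true class directly instead of using a one-hot mask
    return float(np.mean(-np.log(probs[np.arange(len(labels)), labels])))


probs = np.array([[0.1, 0.7, 0.2],
                  [0.3, 0.3, 0.4]])   # softmax rows, illustrative values
labels = np.array([1, 2])             # integer targets
onehot = np.eye(3)[labels]            # what to_categorical would produce

print(categorical_ce(probs, onehot), sparse_categorical_ce(probs, labels))
```

In this notebook, that would mean skipping the `to_categorical` call and compiling with `loss='sparse_categorical_crossentropy'`. With only 241 classes, the memory saved is modest, so this is a design option rather than a fix.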

In [18]:
def generate_text(seed_text, next_words, max_sequence_len):
    """Greedily append next_words predicted words to seed_text."""
    for _ in range(next_words):
        # Encode the text so far and left-pad it to the model's input length
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        # Index of the most probable next word
        predicted = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
        # Map the index back to its word; index 0 is padding and has no word
        output_word = tokenizer.index_word.get(predicted, "")
        if not output_word:
            break
        seed_text += " " + output_word
    return seed_text.strip()
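`generate_text` is greedy autoregressive decoding: encode the text so far, keep the last `max_sequence_len - 1` tokens (left-padded), take the arg-max word, append it, and repeat. The sketch below shows the same loop without TensorFlow. `predict_next` is a hypothetical stand-in for `model.predict` that scores the next word from bigram counts, so only the control flow matches the notebook, not the model.

```python
import numpy as np


def bigram_scorer(corpus_tokens, vocab_size):
    """Hypothetical stand-in for model.predict: next-word counts given the last token."""
    counts = np.zeros((vocab_size, vocab_size))
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        counts[prev, nxt] += 1

    def predict_next(window):
        # window is a left-padded list of token ids; this scorer only looks at the last id
        return counts[window[-1]]

    return predict_next


def greedy_generate(seed_ids, steps, window_len, predict_next):
    ids = list(seed_ids)
    for _ in range(steps):
        # Left-pad and truncate to the fixed input length, like pad_sequences(padding='pre')
        window = ([0] * window_len + ids)[-window_len:]
        next_id = int(np.argmax(predict_next(window)))  # greedy: highest score wins
        if next_id == 0:  # 0 is padding, never a real word
            break
        ids.append(next_id)
    return ids


index_word = {1: "binary", 2: "search", 3: "tree"}
corpus = [1, 2, 3, 1, 2]  # "binary search tree binary search"
predict_next = bigram_scorer(corpus, vocab_size=4)
out = greedy_generate([1], steps=4, window_len=3, predict_next=predict_next)
print(" ".join(index_word[i] for i in out))  # binary search tree binary search
```

Because the arg-max always picks the single most likely continuation, a model trained on a corpus this small will likely replay memorised word sequences from its training text.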



In [35]:
seed_text = "Generate code for addition two number"
next_words = 25
print(generate_text(seed_text, next_words, max_sequence_len))

Generate code for addition two number with 9 years of experience preparing for coding interviews it's crucial to have a strong grasp of both fundamental and advanced data structures and algorithms
