<a href="https://colab.research.google.com/github/catafest/colab_google/blob/master/catafest_045.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## <font color='blue'>textgenrnn</font>

See the [textgenrnn page on PyPI](https://pypi.org/project/textgenrnn/) for details. Its description lists these features:

*Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code, or quickly train on a text using a pretrained model.*

*A modern neural network architecture which utilizes new techniques such as attention-weighting and skip-embedding to accelerate training and improve model quality.*

*Able to train on and generate text at either the character-level or word-level.*

*Able to configure RNN size, the number of RNN layers, and whether to use bidirectional RNNs.*

*Able to train on any generic input text file, including large files.*

*Able to train models on a GPU and then use them with a CPU.*

*Able to utilize a powerful CuDNN implementation of RNNs when trained on the GPU, which massively speeds up training time as opposed to normal LSTM implementations.*

*Able to train the model using contextual labels, allowing it to learn faster and produce better results in some cases.*

*Able to generate text interactively for customized stories.*

Clean up data files from previous runs.

In [1]:
!rm * -rf
!ls

In [2]:
import os

<font color='red'>NOTE</font>: Install **textgenrnn** only if it is not already installed.

In [None]:
import importlib

package_name_exist = "textgenrnn"

try:
    importlib.import_module(package_name_exist)
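    print(f"The package {package_name_exist} is already installed.")
except ImportError:
    print(f"The package {package_name_exist} is not installed.")
    # pip reads a tag from '@<tag>' placed before any '#' fragment,
    # so the '#v1.5.0' suffix here most likely does not pin a version.
    !pip3 install git+https://github.com/minimaxir/textgenrnn.git#v1.5.0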
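
The same check can be done without importing the package at all. This is a minimal sketch that uses only the standard library; the helper name `is_installed` is made up for this example and is not part of textgenrnn.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None


# "json" ships with Python; "textgenrnn" depends on your environment.
print("json:", is_installed("json"))
print("textgenrnn:", is_installed("textgenrnn"))
```

Looking up the spec avoids running the package's import-time code, which for textgenrnn would load TensorFlow just to answer a yes/no question.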

Get the input text file used to train the RNN.

I used the text of [Îndreptări](https://ro.wikisource.org/wiki/%C3%8Endrept%C4%83ri), exported from Wikisource on 12 August 2023.

In [4]:
from google.colab import files
input_txt_file = files.upload()
#input_txt_file

Saving Îndreptări.txt to Îndreptări.txt
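
In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its content as bytes. The sketch below shows one way to inspect such a dictionary. The helper `describe_uploads`, the stand-in dictionary, and the choice of UTF-8 decoding are assumptions for illustration.

```python
def describe_uploads(uploads):
    """Summarize a {filename: bytes} mapping as (name, byte count, line count)."""
    summary = []
    for name, data in uploads.items():
        text = data.decode("utf-8")  # assumes the file is UTF-8 encoded
        summary.append((name, len(data), len(text.splitlines())))
    return summary


# Stand-in for the dict returned by files.upload(); the content is made up.
fake_upload = {"sample.txt": "prima linie\na doua linie\n".encode("utf-8")}
for name, n_bytes, n_lines in describe_uploads(fake_upload):
    print(name, n_bytes, "bytes,", n_lines, "lines")
```

In the notebook you would pass `input_txt_file` instead of `fake_upload`.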


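The import below is wrapped in `warnings.catch_warnings()` to suppress warnings. Note that messages sent through `absl` logging, like the one below, are not controlled by the `warnings` module and may still be printed: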
```
WARNING:absl:`lr` is deprecated in Keras optimizer, please use `learning_rate`
 or use the legacy optimizer, e.g.,tf.keras.optimizers.legacy.Adam.
```



In [5]:
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    from textgenrnn import textgenrnn
    textgen = textgenrnn()
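
To see what `warnings.catch_warnings()` with `simplefilter("ignore")` actually does, here is a small standalone sketch. The functions `noisy` and `count_warnings` are hypothetical stand-ins and do not come from textgenrnn or Keras.

```python
import warnings


def noisy():
    """Stand-in for library code that emits a deprecation warning."""
    warnings.warn("`lr` is deprecated, use `learning_rate`", DeprecationWarning)
    return "done"


def count_warnings(suppress):
    """Call noisy() and return how many warnings reached the recorder."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning in this scope
        if suppress:
            # Same pattern as the notebook cell: ignore warnings inside the block.
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                noisy()
        else:
            noisy()
    return len(caught)


print("without suppression:", count_warnings(suppress=False))
print("with suppression:", count_warnings(suppress=True))
```

The filter only applies inside the `with` block, and the previous filters are restored when the block exits.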






In [6]:

model_cfg = {
    'word_level': True,   # set to True if want to train a word-level model (requires more data and smaller max_length)
    'rnn_size': 128,   # number of LSTM cells of each layer (128/256 recommended)
    'rnn_layers': 4,   # number of LSTM layers (>=2 recommended)
    'rnn_bidirectional': False,   # consider text both forwards and backward, can give a training boost
    'max_length': 3,   # number of tokens to consider before predicting the next (20-40 for characters, 5-10 for words recommended)
    'max_words': 10000,   # maximum number of words to model; the rest will be ignored (word-level model only)
}

train_cfg = {
    'line_delimited': True,   # set to True if each text has its own line in the source file
    'num_epochs': 60,   # set higher to train the model for longer
    'gen_epochs': 5,   # generates sample text from model after given number of epochs
    'train_size': 0.8,   # proportion of input data to train on: setting < 1.0 limits model from learning perfectly
    'dropout': 0.0,   # ignore a random proportion of source tokens each epoch, allowing model to generalize better
    'validation': False,   # if train_size < 1.0, test on the holdout dataset; makes overall training slower
    'is_csv': False   # set to True if file is a CSV exported from Excel/BigQuery/pandas
}
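
The `max_length` setting is easier to picture with an example. The sketch below builds (context, next token) pairs from a short word list, with each context limited to the last `max_length` tokens. It is a simplified illustration of the idea, not textgenrnn's internal code, and the function name `context_windows` is made up.

```python
def context_windows(tokens, max_length):
    """Pair each token with up to max_length tokens that precede it."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - max_length):i]
        pairs.append((context, tokens[i]))
    return pairs


words = "a b c d e".split()
for context, target in context_windows(words, max_length=3):
    print(context, "->", target)
```

With `max_length=3` at word level, as in `model_cfg`, the model never sees more than three preceding words when it predicts the next one.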

With `model_cfg` and `train_cfg` you can call the training function like this, which should give **better responses**. The snippet assumes `model_name` and `file_name` are already defined:
```
textgen = textgenrnn(name=model_name)

train_function = textgen.train_from_file if train_cfg['line_delimited'] else textgen.train_from_largetext_file

train_function(
    file_path=file_name,
    new_model=True,
    num_epochs=train_cfg['num_epochs'],
    gen_epochs=train_cfg['gen_epochs'],
    batch_size=1024,
    train_size=train_cfg['train_size'],
    dropout=train_cfg['dropout'],
    validation=train_cfg['validation'],
    is_csv=train_cfg['is_csv'],
    rnn_layers=model_cfg['rnn_layers'],
    rnn_size=model_cfg['rnn_size'],
    rnn_bidirectional=model_cfg['rnn_bidirectional'],
    max_length=model_cfg['max_length'],
    dim_embeddings=100,
    word_level=model_cfg['word_level'])
```
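
The snippet above uses `model_name` and `file_name` without defining them. One way to derive both from an upload dictionary is sketched here. The helper `names_from_upload` is hypothetical, and the dictionary is a stand-in for what `files.upload()` returns.

```python
import os


def names_from_upload(uploads):
    """Return (file_name, model_name) for the first uploaded file."""
    file_name = next(iter(uploads))
    # Drop the extension so the model is named after the text file.
    model_name, _extension = os.path.splitext(file_name)
    return file_name, model_name


file_name, model_name = names_from_upload({"Îndreptări.txt": b""})
print(file_name, model_name)
```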
For this demo, though, I used a simpler call:


In [None]:

filename = next(iter(input_txt_file))
print(filename)
modelname, file_extension = os.path.splitext(filename)
# train_from_file takes the path and reads the file itself, so no open() is needed
textgen.train_from_file(filename,
                        new_model=True,
                        rnn_bidirectional=True,
                        rnn_size=64,
                        dim_embeddings=300,
                        num_epochs=1)
print("Model summary:")
# summary() prints the table itself and returns None
textgen.model.summary()


Let's list the generated files and check where they are stored.

In [8]:
!ls

Îndreptări.txt		textgenrnn_vocab.json
textgenrnn_config.json	textgenrnn_weights.hdf5


In [9]:
!pwd

/content
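
The config and vocabulary are plain JSON files, so they can be inspected with the standard library. This is a minimal sketch that assumes the file names from the listing above; the helper `load_json_if_present` is made up, and no particular keys are assumed.

```python
import json
import os


def load_json_if_present(path):
    """Return the parsed JSON file, or None when the file does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


for name in ("textgenrnn_config.json", "textgenrnn_vocab.json"):
    data = load_json_if_present(name)
    if data is None:
        print(name, "not found")
    else:
        print(name, type(data).__name__, "with", len(data), "entries")
```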


The code below shows how to load the saved weights, vocabulary, and config into a new model:

In [None]:
textgen = textgenrnn(weights_path='textgenrnn_weights.hdf5',
                       vocab_path='textgenrnn_vocab.json',
                       config_path='textgenrnn_config.json')

textgen.generate_samples(max_gen_length=150)
textgen.generate_to_file('textgenrnn_texts.txt', max_gen_length=150)

Let's generate a few responses:

In [None]:
responses = textgen.generate(6, return_as_list=True, temperature=0.9)
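
The `temperature` argument controls how adventurous the sampling is. The sketch below shows the general technique of rescaling a probability distribution by a temperature. It illustrates the concept only and is not textgenrnn's own sampling code; the function name and the example probabilities are made up.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale log-probabilities by 1/temperature, then renormalize."""
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


next_token_probs = [0.7, 0.2, 0.1]  # hypothetical model output
for t in (0.2, 0.9, 2.0):
    scaled = apply_temperature(next_token_probs, t)
    print(t, [round(p, 3) for p in scaled])
```

Low temperatures concentrate probability on the most likely token, while high temperatures flatten the distribution and make unusual tokens more likely.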

If you want better responses, you need to improve the input data and adjust the training settings in *model_cfg* and *train_cfg*.

In [12]:
print(" ===== the new alien language!")
for response in responses:
    print("Chatbot:", response)

 ===== the new alien language!
Chatbot: — ’alel aduli.
Chatbot: 
Chatbot: — In mii. Coante. Une orid întator du-, îngând părânde.
Chatbot: 
Chatbot: 
Chatbot: 
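
Several of the generated responses above are empty. A small filter can drop them before printing. This is a sketch with a made-up sample list; in the notebook you would pass `responses`.

```python
def non_empty(responses):
    """Keep only responses that contain something other than whitespace."""
    return [text for text in responses if text.strip()]


sample = ["alel aduli.", "", "In mii. Coante.", "   ", ""]  # made-up sample
for response in non_empty(sample):
    print("Chatbot:", response)
```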


If the notebook has errors (e.g. GPU Sync Fail), force-kill the Colaboratory virtual machine and restart it with the command below:



```
!kill -9 -1
```

