# Monero Research Lab text generator
Isthmus / Mithchell

Modification of text generator code from Pranjal Srivastava, see https://www.analyticsvidhya.com/blog/2018/03/text-generation-using-python-nlp/

## Importing Dependencies

In [4]:
# Learning and processing
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import RNN
from keras.utils import np_utils

# IRC Log processing
import re 

## Settings

In [5]:
log_file_path = "mrl_logs_raw.txt"
most_recent_N_characters = 10**5
savelogs = False

## Load data

In [6]:
manip_text_raw = (open(log_file_path).read())

## Process IRC logs

Clean up the logs

In [11]:
# Drop case
manip_text_raw = manip_text_raw.lower() # drop case

# REmove channel notifications
words_to_remove = ('mode','timestamp','joined','left','quit','seconds','channel', '#monero-research-lab', '→')
manip_text = re.sub("[\(\[].*?[\)\]]", "", manip_text_raw) # remove timestamps
for w in range(len(words_to_remove)):
    this_word = words_to_remove[w]
    print(this_word)
    manip_text = re.sub(".*"+this_word+".*", "", manip_text)

mode
timestamp
joined
left
quit
seconds
channel
#monero-research-lab
→


Remove empty rows

In [12]:
max_rows_blank = 200
for i in range(max_rows_blank):
    search_str = "\n"*(max_rows_blank-i)
    manip_text = re.sub(search_str,'\n',manip_text)
    
# Peep the results
print(manip_text[0:1000])


 <ukoehb> is transaction fee 8 bytes?
 <moneromooo> it is a 64 bit value. it is typically encoded as a varint, if that's what you're asking.
 <ukoehb> just looking at storage required
 <ukoehb> varint = variable length integer, so is storage not constant?
 <moneromooo> yes.
 <ukoehb> thanks :)
 <serhack> morning :)
 <suraenoether> monero coffee chat yall~
 <sarang> how did the coffee chat go?
 <sarang> i had a volunteer commitment during that time
 <sarang> we repair bikes and donate them to veterans and kids who need them
 <sneurlax1> good with bikes, eh?
 <sarang> i worked part-time as a mechanic for a few years
 <sneurlax1> i missed the meeting so have no useful comment there sorry.
 <sarang> fixing bikes is a ton of fun
 <sneurlax1> i skipped straight to motorcycles and need to get handy with it quickly
 — sarang is moving bike convo to #monero-research-lounge 
 <needmoney90> my call with bisq is wednesday, would anyone be available to chat about the technical details of how multi

In [14]:
if 1==1:
    text_file = open("mrl_logs_processed.txt", "w")
    text_file.write(manip_text)
    text_file.close()

Only keep the most recent logs

In [9]:
mlen = len(manip_text)
print(mlen)
text = manip_text[(mlen-most_recent_N_characters):(most_recent_N_characters)]

7001579


## Side quest: handles

In [10]:
all_carrots = re.findall("\<.*\>", manip_text_raw)
handles_weighted = [x for x in all_carrots if (" " in x) == False]
unique_handles = list(set(handles_weighted))

# Peep the results
unique_handles

# Save if desired
if savelogs:
    text_file = open("mrl_logs_handles.txt", "w")
    text_file.write(str(unique_handles[:]))
    text_file.close()

## Create character/word mappings

In [7]:
characters = sorted(list(set(text)))

n_to_char = {n:char for n, char in enumerate(characters)}
char_to_n = {char:n for n, char in enumerate(characters)}

## Data pre-processing

In [8]:
X = []
Y = []
length = len(text)
seq_length = 100

for i in range(0, length-seq_length, 1):
    sequence = text[i:i + seq_length]
    label =text[i + seq_length]
    X.append([char_to_n[char] for char in sequence])
    Y.append(char_to_n[label])

In [9]:
X_modified = np.reshape(X, (len(X), seq_length, 1))
X_modified = X_modified / float(len(characters))
Y_modified = np_utils.to_categorical(Y)

## LSTM model

In [10]:
num_wide = 300 # 400 default, 700 wider

model = Sequential()
model.add(LSTM(num_wide, input_shape=(X_modified.shape[1], X_modified.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(num_wide, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(num_wide))
model.add(Dropout(0.2))
model.add(Dense(Y_modified.shape[1], activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

In [11]:
num_epochs = 25# epochs 100 default
batch_sizes = 50 # batch size 50 default

model.fit(X_modified, Y_modified, epochs=num_epochs, batch_size=batch_sizes)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<tensorflow.python.keras.callbacks.History at 0x7f369058ccd0>

In [12]:
# model.load_weights('weights.h5')

## Generating Text

One example

In [38]:
string_mapped = X[99] # 99
full_string = [n_to_char[value] for value in string_mapped]
# generating characters
for i in range(400):
    x = np.reshape(string_mapped,(1,len(string_mapped), 1))
    x = x / float(len(characters))

    pred_index = np.argmax(model.predict(x, verbose=0))
    seq = [n_to_char[value] for value in string_mapped]
    full_string.append(n_to_char[pred_index])

    string_mapped.append(pred_index)
    string_mapped = string_mapped[1:len(string_mapped)]


In [39]:
#combining text
txt=""
for char in full_string:
    txt = txt+char
txt

'ractical details by ecc researchers\n <sarang> inge-: yeah, nothing has changed\n <inge-> and halo 2 in the seal output is also a blockchain is a blockchain is a blockchain in the seal output is also a blockchain is a blockchain is a blockchain in the seal output is also a blockchain is a blockchain is a blockchain in the seal output is also a blockchain is a blockchain is a blockchain in the seal output is also a blockchain is a blockchain is a blockchain in the seal output is also a blockchain is a '

# 

## Loop to extract more

In [48]:
length_var = 100
xarray = [0,99,100,200,500,1000,2000,5000,10000,20000,50000]

In [49]:
for x in xarray: # range(5):
    print(x)
    
    string_mapped = X[x] # 99
    full_string = [n_to_char[value] for value in string_mapped]
    
    # generating characters
    for i in range(length_var):
        x = np.reshape(string_mapped,(1,len(string_mapped), 1))
        x = x / float(len(characters))

        pred_index = np.argmax(model.predict(x, verbose=0))
        seq = [n_to_char[value] for value in string_mapped]
        full_string.append(n_to_char[pred_index])

        string_mapped.append(pred_index)
        string_mapped = string_mapped[1:len(string_mapped)]
        
    #combining text
    txt=""
    for char in full_string:
        txt = txt+char
        
    print(txt)
        

0
point.
 <sarang> some community experts asked for details on their forums, but were not given any proposal donstolled be a blockchain is a blockchain is a blockchain is a blockchain is a blockchain is a bl
99
ractical details by ecc researchers
 <sarang> inge-: yeah, nothing has changed
 <inge-> and halo 2 in the seal output is a blockchain is a blockchain is a blockchain in the seal output is also a blockchain 
100
actical details by ecc researchers
 <sarang> inge-: yeah, nothing has changed
 <inge-> and halo 2 is the seal output is also a blockchain is a blockchain is a blockchain in the seal output is also a block
200
 the only horse in the race currently?
 <sarang> you can see their repo, but all the commits and prsput is also alockchain is a blockchain is a blockchain  <sarang> i don't see the seal output is a bloc
500
ct
 <inge-> no i mean in general, things that could be a future candidate for monero
 <sarang> oh oke of the seal output is a blockchain is a blockchain in the se