In Linux, the environment variable `LD_LIBRARY_PATH` is a colon-separated set of directories where libraries should be searched for first, before the standard set of directories; this is useful when debugging a new library or using a nonstandard library for special purposes. - [source](https://askubuntu.com/questions/844578/what-does-manpath-ld-library-path-mean-in-linux)

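__NOTE__: Set this variable in the shell before launching `jupyter`. A `!export` inside a notebook cell runs in a temporary subshell and does not change the environment of the running kernel.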

In [None]:
!export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64
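A minimal sketch of why the export has to happen outside the notebook, assuming a POSIX shell is available. `DEMO_LIB_PATH` and `/tmp/demo` are hypothetical names used only for this demonstration; the same behaviour applies to `LD_LIBRARY_PATH`.

```python
import os
import subprocess

# Hypothetical variable name, used only for this demonstration.
os.environ.pop("DEMO_LIB_PATH", None)

# Like `!export ...`, this runs in a child shell that exits immediately,
# so the variable never reaches the current (kernel) process.
subprocess.run("export DEMO_LIB_PATH=/tmp/demo", shell=True, check=True)
visible_after_export = "DEMO_LIB_PATH" in os.environ
print(visible_after_export)  # False

# Writing to os.environ changes this process and is inherited by its children.
os.environ["DEMO_LIB_PATH"] = "/tmp/demo"
child_value = subprocess.run(
    "echo $DEMO_LIB_PATH", shell=True, capture_output=True, text=True
).stdout.strip()
print(child_value)  # /tmp/demo
```

Even `os.environ` is usually too late for `LD_LIBRARY_PATH` itself, because the dynamic loader reads the variable when the process starts. Exporting it in the shell that launches `jupyter` avoids the problem.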

In [None]:
import tensorflow as tf

# Only the last expression in a cell is displayed, so print both checks explicitly
print(tf.test.is_built_with_cuda())
print(tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None))

Check GPU usage and, if needed, reset the device to free its memory

In [None]:
!nvidia-smi

In [15]:
from numba import cuda 

device = cuda.get_current_device()
device.reset()

In [3]:
!nvidia-smi

Wed Feb 17 05:57:35 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P0    26W /  70W |    111MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0  

In [2]:
from deeppavlov.dataset_readers.basic_classification_reader import BasicClassificationDatasetReader
from deeppavlov.dataset_iterators.basic_classification_iterator import BasicClassificationDatasetIterator

from deeppavlov.core.data.simple_vocab import SimpleVocabulary

from deeppavlov.models.preprocessors.bert_preprocessor import BertPreprocessor
from deeppavlov.models.preprocessors.one_hotter import OneHotter
from deeppavlov.models.bert.bert_classifier import BertClassifierModel

from sklearn.preprocessing import MultiLabelBinarizer

import pandas as pd
import numpy as np

import datetime
import time
import re
import ast

from tqdm import tqdm_notebook

import warnings
warnings.filterwarnings('ignore')

[nltk_data] Downloading package punkt to
[nltk_data]     /home/a-dbaiturs@PMICLOUD.GLOBAL/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/a-dbaiturs@PMICLOUD.GLOBAL/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package perluniprops to
[nltk_data]     /home/a-dbaiturs@PMICLOUD.GLOBAL/nltk_data...
[nltk_data]   Package perluniprops is already up-to-date!
[nltk_data] Downloading package nonbreaking_prefixes to
[nltk_data]     /home/a-dbaiturs@PMICLOUD.GLOBAL/nltk_data...
[nltk_data]   Package nonbreaking_prefixes is already up-to-date!







In [4]:
!jupyter labextension install @jupyter-widgets/jupyterlab-manager

An error occured.
ValueError: "@jupyter-widgets/jupyterlab-manager" is not a valid extension:
schemaDir is empty: "./schema"
See the log file for details:  /tmp/jupyterlab-debug-p7de606v.log


In [3]:
example_iter = [1,2,3,4,5]
for rec in tqdm_notebook(example_iter):
    time.sleep(.1)

HBox(children=(IntProgress(value=0, max=5), HTML(value='')))




In [5]:
PATH = '.'
PATH_RUBERT = PATH + r'/ru_bert_tf'
PATH_MODEL = PATH + r'/modelResults'
PATH_SOURCE = PATH + r'/prodData/source'
PATH_RESULT = PATH + r'/prodData/results'
PATH_TOPICS = PATH + r'/topics/TopicsMap.xlsx'

# Load model

In [6]:
!ls $PATH_MODEL

checkpoint			  modelNew_v2.data-00000-of-00001  vocab_140221
model_140221.data-00000-of-00001  modelNew_v2.index		   vocab_v2
model_140221.index		  modelNew_v2.meta
model_140221.meta		  ts.npy


In [7]:
modelName, vocabName = (r'/model_140221', r'/vocab_140221')

vocab = SimpleVocabulary(save_path = PATH_MODEL + vocabName)
vocab.load()

one_hotter = OneHotter(
    depth=vocab.len, 
    single_vector=True)

bert_preprocessor = BertPreprocessor(
    vocab_file=PATH_RUBERT + r'/vocab.txt',
    do_lower_case=False,
    max_seq_length=64
)

bert_classifier = BertClassifierModel(
    n_classes = vocab.len,
    return_probas = True,
    one_hot_labels = True,
    bert_config_file = PATH_RUBERT + r'/bert_config.json',
    pretrained_bert = r'bert_model.ckpt',
    save_path = PATH_MODEL + modelName,
    load_path = PATH_MODEL + modelName,
    keep_prob = 0.5,
    learning_rate = 1e-05,
    min_learning_rate = 1e-07,
    learning_rate_drop_patience = 5,
    learning_rate_drop_div = 2.0,
    multilabel = True
)

bert_classifier.load()

2021-02-17 05:57:43.684 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /home/a-dbaiturs@PMICLOUD.GLOBAL/Projects/Project1/bertNewModel/modelResults/vocab_140221]
2021-02-17 05:57:43.686 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /home/a-dbaiturs@PMICLOUD.GLOBAL/Projects/Project1/bertNewModel/modelResults/vocab_140221]









Using TensorFlow backend.


The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
Instructions for updating:
Use keras.layers.dense instead.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

Instructions for updating:
Use standard file APIs to check for files with this prefix.


2021-02-17 05:58:04.921 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /home/a-dbaiturs@PMICLOUD.GLOBAL/Projects/Project1/bertNewModel/modelResults/model_140221]



INFO:tensorflow:Restoring parameters from /home/a-dbaiturs@PMICLOUD.GLOBAL/Projects/Project1/bertNewModel/modelResults/model_140221


2021-02-17 05:58:06.224 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /home/a-dbaiturs@PMICLOUD.GLOBAL/Projects/Project1/bertNewModel/modelResults/model_140221]


INFO:tensorflow:Restoring parameters from /home/a-dbaiturs@PMICLOUD.GLOBAL/Projects/Project1/bertNewModel/modelResults/model_140221


Load thresholds

In [48]:
ts = np.load(PATH_MODEL + r'/ts.npy')

Load data

In [49]:
!ls prodData/source

exportNew.csv


In [50]:
df = pd.read_csv(PATH_SOURCE + r'/exportNew.csv', encoding='windows-1251')
print(df.shape)
df.sample(5)

(15969, 3)


Unnamed: 0,ID,Comment,Comment_EN
0,R_01fCDPVcIbqFZex,Заказ доставили в день заказа. Отправили через...,The order was delivered on the day of the orde...
1,R_BDpWlbMsBRtMZiN,Ожидала более подробной информации,Expected more details
2,R_01avpoevZfsRilr,Словно куришь пластмассы,It's like smoking plastics
3,R_02Q7EBh4KUUttbX,?,??
4,R_03s0dZNNfwZfMJP,Все хорошо,It's all good.
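The prediction step that follows walks the comments in slices of 256 rows. As a rough sketch of that slicing on its own, without the model, the helper below (`batch_bounds` is a hypothetical name, not part of the notebook) lists the index ranges for the 15969 rows reported above.

```python
def batch_bounds(n_rows, batch_size=256):
    """Yield (start, stop) index pairs that cover n_rows in order."""
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)

bounds = list(batch_bounds(15969))
print(len(bounds))   # 63
print(bounds[-1])    # (15872, 15969)
```

The last slice is shorter than the rest; `iloc` slicing handles that without any special casing.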


In [103]:
def get_preds(comments, verbose = False, batch_size = 256):
    # Iterate over the argument itself rather than the global df
    starts = range(0, len(comments), batch_size)
    if verbose:
        print('Predicting comments')
        starts = tqdm_notebook(starts)
    
    y_preds = []
    for i in starts:
        x = comments.iloc[i:i + batch_size]
        
        y_pred = bert_classifier(bert_preprocessor(x))
        y_preds.append(y_pred)
    
    # Stack the per-batch probabilities into one (n_comments, n_classes) array
    return np.concatenate(y_preds, axis=0)

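def clean_preds(preds, ts):
    ind_unc = list(vocab.keys()).index('Uncategorized')
    
    # Apply the per-class thresholds
    preds_bin = (preds > ts).astype(int)
    # Drop 'Uncategorized' when another category was also assigned
    preds_bin[(preds_bin[:, ind_unc] == 1) & 
                (preds_bin.sum(axis=1) > 1), ind_unc] = 0
    # Rows with no label fall back to 'Uncategorized' (index 0 in this vocabulary)
    preds_bin[preds_bin.sum(axis=1) == 0, ind_unc] = 1
    
    return preds_bin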

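def merge_preds(df, preds):
    # One indicator column per category, in vocabulary order
    categories = list(vocab.keys())
    res = pd.DataFrame(data = preds, columns = categories)
    
    # Column-wise concat aligns on the index, so df must keep its default 0..n-1 index
    res = pd.concat([df, res], axis = 1, ignore_index = False)
    # Reshape to long format: one row per (comment, category) pair
    res = res.melt(id_vars = ['ID', 'Comment', 'Comment_EN'], 
                   var_name = 'l3', value_name = 'Value')
    # Keep only the categories that were actually assigned
    res = res[res['Value'] == 1][['ID', 'Comment', 'Comment_EN', 'l3']]
    res.reset_index(drop = True, inplace = True)
    return res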

def print_samples(df, sampleSize = 5):
    for c in vocab.keys():
        print(c)
        comments = df.loc[df['l3'] == c, 'Comment']
        # Categories with sampleSize or fewer comments print only their name
        if len(comments) > sampleSize:
            for i, comment in enumerate(comments.sample(sampleSize).values):
                print(f'{i+1}) {comment}')
        print('\n\n')

preds = get_preds(df.Comment, verbose = True)
preds = clean_preds(preds, ts)
res = merge_preds(df, preds)
print_samples(res)

Predicting comments


HBox(children=(FloatProgress(value=0.0, max=63.0), HTML(value='')))


Uncategorized
1) Нормально 
2) Курить бросил 
3) Отличный 
4) Все понравилось!
5) Потому что iqos 3 duo классный



Staff reliability
1) Оперативность и вежливость
2) Как всегда сервис выше всяких похвал!
3) Спасибо большое за доставку!!! Оценка моя высокая потому что в связи с короновирусом доставку сделали быстро!!! Ещё раз спасибо!!!
4) Я в восторге! :),Качественный сервис, доброжелательный персонал
5) Долго отвечают служба потдержки



Alternative to cigarette
1) Нет запаха сигарет, менее вреден для здоровья, приятно курить. 
2) Отличная альтернатива сигаретам 
3) Нравится то, что они более легкие по сравнению с оьчными сигаретами. Запах сигарет менее выражен. 
,Представители службы iqos всегда информированы, вежливы
4) После первого дня курения заболело горло , покраснело , высыпали покраснения на лице 
5) Кіріспе вредно для здоровья



Smoke & smell
1) Нет резкого табачного запаха, удобен в использовании
2) В любом случае курить вредно,поэтому эл.сигареты также вредны,не могу по

In [104]:
res.ID.nunique()

15969
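Every ID survives because `clean_preds` guarantees at least one label per comment. The rule can be restated for a single row in plain Python, as a sketch only: `clean_row` is a hypothetical helper, the threshold values are made up, and 'Uncategorized' is assumed to sit at index 0.

```python
def clean_row(probs, thresholds, unc_index=0):
    """Binarize one row of class probabilities with per-class thresholds."""
    labels = [int(p > t) for p, t in zip(probs, thresholds)]
    # Drop 'Uncategorized' when at least one real category also fired.
    if labels[unc_index] == 1 and sum(labels) > 1:
        labels[unc_index] = 0
    # Fall back to 'Uncategorized' when nothing cleared its threshold.
    if sum(labels) == 0:
        labels[unc_index] = 1
    return labels

thresholds = [0.5, 0.4, 0.6]  # made-up values, one per class

print(clean_row([0.9, 0.7, 0.1], thresholds))  # [0, 1, 0]
print(clean_row([0.2, 0.1, 0.3], thresholds))  # [1, 0, 0]
print(clean_row([0.8, 0.1, 0.2], thresholds))  # [1, 0, 0]
```

The vectorized version in the notebook applies the same two rules to all rows at once with boolean masks.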

In [106]:
topics = pd.read_excel(PATH_TOPICS, engine='openpyxl')
topics.sample(5)

Unnamed: 0,l1,l2,l3
6,L1_1: Brand & Marketing,L2_1: Channel specifics,L3_1: Store cleanliness/ tidiness
12,L1_1: Brand & Marketing,L2_1: Channel specifics,L3_1: Website navigation
44,L1_4: Sales & Customer Service,L2_4: Service quality,L3_4: Staff reliability
4,L1_1: Brand & Marketing,L2_1: Brand positioning,L3_1: Social inclusion
20,L1_2: Operations & Processes,"L2_2: Order, delivery & returns",L3_2: Delivery speed


In [107]:
def merge_topics(res, topics):
    # Drop the 6-character level prefix (e.g. 'L3_4: ') so the labels match the vocabulary
    topics = topics.applymap(lambda x: x[6:])
    res = pd.merge(res, topics, how='left', on='l3')
    return res[['ID', 'Comment', 'Comment_EN', 'l1', 'l2', 'l3']]

res = merge_topics(res, topics)
res.sample(10)

Unnamed: 0,ID,Comment,Comment_EN,l1,l2,l3
17399,R_tVAvxyFpNHwB0Kl,Привезли данное устройство за пол дня после оф...,This device was brought half a day after the o...,Operations & Processes,"Order, delivery & returns",Order & delivery details accuracy
3762,R_1N9smwJNVHHiOcV,Мне он нравиться и меня в нем все устраивает,I like him and I'm happy with him.,Uncategorized,Uncategorized,Uncategorized
8698,R_A4n5ZUfqLJFeAQ9,"все отлично, понравилось,оперативность","Everything is fine, liked, speed",Sales & Customer Service,Service quality,Staff reliability
3947,R_1rHSQWkaysvVaym,Очень нравится,I like it very much,Uncategorized,Uncategorized,Uncategorized
13847,R_3DokBLfN5b2ipKy,Нет запаха как от простой сигареты,No smell like from a simple cigarette,Product,Sticks,Smoke & smell
5734,R_SPNgeQlfVDx696p,"Быстро, доступно","Fast, available",Uncategorized,Uncategorized,Uncategorized
11819,R_3Pw2qwMDsi8JzSh,Потому что это лучше чем курить сигареты,Because it's better than smoking cigarettes,Brand & Marketing,Brand positioning,Alternative to cigarette
17473,R_30dw3eA0sCg40Cp,"Заказ был оформлен, обработан и отправлен быст...","The order was issued, processed and sent quick...",Operations & Processes,"Order, delivery & returns",Order & delivery details accuracy
18185,R_2OUaEhZAuDximDM,"На сегодняшний день, я очень довольна продукто...","To date, I am very happy with the product and ...",Brand & Marketing,Brand positioning,Social inclusion
19563,R_32YiyzQCAKgBxYf,"Честно, всё очень Сложно. С пятого раза заказа...","Honestly, it's very difficult. Since the fifth...",Operations & Processes,"Order, delivery & returns",Checkout and payment process
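The fixed `x[6:]` slice in `merge_topics` works while every label starts with a six-character tag such as `L3_4: `. A possible alternative, sketched here with a regular expression, does not depend on the tag length. `strip_level_prefix` is a hypothetical helper, and the two-digit label below is an invented example, not taken from the topics file.

```python
import re

# Matches a leading level tag such as "L3_4: " or "L1_12: ".
LEVEL_PREFIX = re.compile(r"^L\d+_\d+:\s*")

def strip_level_prefix(label):
    """Remove the leading level tag; labels without a tag are returned unchanged."""
    return LEVEL_PREFIX.sub("", label)

print(strip_level_prefix("L3_4: Staff reliability"))  # Staff reliability

# Invented two-digit tag: the fixed slice leaves a stray leading space.
print(repr("L1_12: Example"[6:]))                # ' Example'
print(repr(strip_level_prefix("L1_12: Example")))  # 'Example'
```

It could be applied the same way as the lambda, for example `topics.applymap(strip_level_prefix)`.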


In [108]:
res.ID.nunique()

15969

In [115]:
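def save_results(res):
    res = res[['ID', 'l1', 'l2', 'l3']]
    # .xlsx stores text as Unicode, so no encoding argument is needed
    # (pandas deprecated `encoding` for to_excel in 1.5 and removed it in 2.0)
    res.to_excel(PATH_RESULT + r'/results.xlsx', index=False)

save_results(res)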