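>+  19-5-8: Switched to an LSTM network instead of using only a feed-forward one, and made it bidirectional, because Keras provides a very convenient `Bidirectional` wrapper that makes this quick to code. The earlier data-processing and dataset-construction parts are exactly the same as before, so I copied them over directly.
+ 19-5-10: While working on the attention mechanism, I finally noticed that most people use Theano as the backend. Comparing np.dot, tf.dot and theano.dot, TensorFlow's dot operation behaves differently from NumPy's, which apparently breaks broadcasting or something similar.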

Reference blog:

https://richliao.github.io/supervised/classification/2016/12/26/textclassifier-RNN/

# Data preparation

In [1]:
import pandas as pd
import numpy as np

data_train=pd.read_csv('.../labeledTrainData.tsv',sep='\t')

import re
def clean_str(string):
    string=re.sub(r"\\","",string)
    string=re.sub(r"\'","",string)
    string=re.sub(r'\"','',string)
    return string.strip().lower()

from bs4 import BeautifulSoup
texts=[]
for  i in range(data_train.review.shape[0]):
    text=BeautifulSoup(data_train.review[i],'lxml')
    texts.append(clean_str(text.get_text()))

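MAX_SENTENCE_LENGTH=1000  # maximum length of each review, in tokens
MAX_NB_WORDS=20000 # keep only the 20,000 most frequent words in the corpus
EMBEDDING_DIM=300  # the original blog used 100-d GloVe vectors; I reuse the 300-d vectors I downloaded earlier
VALIDATION_SPLIT=0.2 # fraction of the data held out for validation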

import keras
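tokenizer=keras.preprocessing.text.Tokenizer(num_words=MAX_NB_WORDS) # num_words keeps only the most frequent words; differs from the original blog because of the Keras version
tokenizer.fit_on_texts(texts)  # fit the tokenizer on texts (a flat list of strings; the nested reviews list cannot be used)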
sequences=tokenizer.texts_to_sequences(texts) 

data=keras.preprocessing.sequence.pad_sequences(sequences,maxlen=MAX_SENTENCE_LENGTH)
labels=data_train.sentiment.values

indices=np.arange(data.shape[0])
np.random.shuffle(indices)
data=data[indices]
labels=labels[indices]
nb_validation_samples=int(VALIDATION_SPLIT*data.shape[0])

x_train=data[:-nb_validation_samples]
y_train=labels[:-nb_validation_samples]
x_val=data[-nb_validation_samples:]
y_val=labels[-nb_validation_samples:]

GLOVE_DIR='../../TheSecond-Paper/word_embedding/en_model.txt'
embedding_index={}
f=open(GLOVE_DIR)
for  line in f:
    values=line.split()
    word=values[0]
    coefs=np.asarray(values[1:],dtype='float64')
    embedding_index[word]=coefs
f.close()

embedding_matrix=np.random.random((len(tokenizer.word_index)+1,EMBEDDING_DIM)) 
for word,i in tokenizer.word_index.items():   # word, word id
    embedding_vector=embedding_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i]=embedding_vector  # word ids start at 1 (row 0 is left for padding); store the word vector

Using TensorFlow backend.
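
To make the preprocessing above concrete, here is a minimal pure-Python sketch of roughly what `Tokenizer(num_words=...)` followed by `pad_sequences(maxlen=...)` does with default settings. The helper names `build_word_index`, `to_sequence` and `pad_pre` are hypothetical, and the real Keras tokenizer also strips punctuation, which this sketch skips.

```python
from collections import Counter

def build_word_index(texts):
    """Assign ids by descending frequency, starting at 1 (0 is kept for padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

def to_sequence(text, word_index, num_words):
    """Keep only words whose id is below num_words, mirroring the num_words cutoff."""
    ids = (word_index.get(w) for w in text.lower().split())
    return [i for i in ids if i is not None and i < num_words]

def pad_pre(seq, maxlen):
    """Left-pad with zeros and keep the last maxlen ids ('pre' padding and truncating)."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

texts = ["a good movie", "a bad bad movie"]
word_index = build_word_index(texts)
sequences = [to_sequence(t, word_index, num_words=4) for t in texts]
padded = [pad_pre(s, maxlen=3) for s in sequences]
print(word_index)
print(padded)
```

Because id 0 never refers to a real word, the embedding matrix needs `len(word_index) + 1` rows, which is why the notebook allocates it that way.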


# Baseline model

In [2]:
sequence_input=keras.layers.Input(shape=(MAX_SENTENCE_LENGTH,),dtype='int32')

embedded_sequences=keras.layers.Embedding(len(tokenizer.word_index)+1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SENTENCE_LENGTH,
                            trainable=False)(sequence_input)
l_lstm=keras.layers.Bidirectional(keras.layers.LSTM(100))(embedded_sequences)
preds=keras.layers.Dense(1,activation='sigmoid')(l_lstm)
model=keras.models.Model(sequence_input,preds)

In [3]:
model.compile(loss='binary_crossentropy',
             optimizer='adam',
             metrics=['acc'])

In [4]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 1000)              0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 1000, 300)         24450600  
_________________________________________________________________
bidirectional_1 (Bidirection (None, 200)               320800    
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 201       
Total params: 24,771,601
Trainable params: 321,001
Non-trainable params: 24,450,600
_________________________________________________________________
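
The parameter counts in the summary can be checked by hand. This is a small arithmetic sketch; the vocabulary row count of 81,502 is not printed anywhere in the notebook and is inferred here from 24,450,600 / 300.

```python
vocab_rows = 81502      # len(tokenizer.word_index) + 1, inferred from the summary
embedding_dim = 300
lstm_units = 100

embedding_params = vocab_rows * embedding_dim
# Each LSTM direction has 4 gates, each with input weights, recurrent weights and a bias.
lstm_one_direction = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)
bilstm_params = 2 * lstm_one_direction
# Dense(1) sees the concatenated forward and backward states (200 values) plus one bias.
dense_params = 2 * lstm_units * 1 + 1

print(embedding_params, bilstm_params, dense_params)
print("total:", embedding_params + bilstm_params + dense_params)
print("trainable:", bilstm_params + dense_params)
```

The frozen embedding accounts for almost all parameters, so only about 321k weights are actually trained.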


model.fit(x_train,y_train,validation_data=(x_val,y_val),epochs=10,batch_size=128)

With the LSTM, training is extremely slow... so slow that I gave up, haha.

# Attention mechanism

## Which backend to use

In [5]:
import keras
print(keras.__version__)
import tensorflow as tf 
print(tf.__version__)
import theano
print(theano.__version__)

2.2.4
1.12.0




1.0.4


In [6]:
import numpy as np
print(np.__version__)

1.16.2


Theano 1.0.3 is incompatible with NumPy 1.16.0. Unless you upgrade to Theano 1.0.4, it fails with the following error:<br>
```python
'The following error happened while compiling the node',"module 'numpy.core.multiarray' has no attribute '_get_ndarray_c_version'"
```
Note: Theano 1.0.4 supports NumPy 1.16.0

## Switching the backend dynamically

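+ Installing Theano: http://deeplearning.net/software/theano/install_windows.html
    + Only conda installation is supported; requirements: Python == 2.7* or ( >= 3.4 and < 3.6 ); NumPy >= 1.9.1 <= 1.12; SciPy >= 0.14 < 0.17.1;    
    ``` conda install numpy scipy mkl-service libpython <m2w64-toolchain> <nose> <sphinx> <pydot-ng> <git> ```<br>
    Packages written in <> are optional, so in practice you only need to install the ones before them<br>
    Installing theano and libgpuarray separately is complicated (they need to be compiled, even on Windows, which is a hassle), so I decided to run the command below and let it pull in libgpuarray as a dependency<br>
    ```conda install theano pygpu```  
    This installed version 1.0.3, which is incompatible with NumPy 1.16, so I upgraded to 1.0.4   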
    ```pip install --upgrade theano```
    
https://stackoverflow.com/questions/42177658/how-to-switch-backend-with-keras-from-tensorflow-to-theano

In [7]:
from keras import backend as K
import os
import importlib  # in Python 3, reload is no longer a builtin and must be called explicitly via importlib

def set_keras_backend(backend):
    if K.backend() != backend:
        os.environ['KERAS_BACKEND'] = backend
        importlib.reload(K)
        assert K.backend() == backend

set_keras_backend("theano")  # switch to theano if the current backend is something else
import theano
# theano.config.floatX= 'float32'

Using Theano backend.
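
The `importlib.reload` call matters because the backend name is read from the `KERAS_BACKEND` environment variable when the backend module is executed, so changing the variable afterwards has no effect by itself. The sketch below illustrates that behaviour with a throwaway module; `fake_backend` and `FAKE_BACKEND` are hypothetical stand-ins, not part of Keras.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for a library that picks its backend once, at import time.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "fake_backend.py"), "w") as f:
    f.write("import os\nNAME = os.environ.get('FAKE_BACKEND', 'tensorflow')\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

os.environ["FAKE_BACKEND"] = "tensorflow"
import fake_backend
before = fake_backend.NAME

os.environ["FAKE_BACKEND"] = "theano"
stale = fake_backend.NAME            # still the old value: the module was not re-executed
importlib.reload(fake_backend)
after = fake_backend.NAME            # re-executing the module picks up the new value

print(before, stale, after)
```

Setting the environment variable before the first `import keras` avoids the reload altogether, which is usually the less fragile option.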


In [8]:
print(theano.config.floatX)  # clearly, this is already 32-bit

float32


In [9]:
os.path.expanduser('~/.theanorc.txt') # the funny part is that this file cannot be found...

'C:\\Users\\shanshan/.theanorc.txt'

## Attention code

        The following code can only strictly run on Theano backend 
        since tensorflow matrix dot product doesn’t behave the same as np.dot.
        I don’t know how to get a 2D tensor 
        by dot product of 3D tensor of recurrent layer output and 1D tensor of weight.

In [10]:
from keras import initializers
from keras.layers import Layer

In [11]:
class AttLayer(Layer):
    def __init__(self,**kwargs):
        '''
        From the source, https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/python/keras/initializers.py
        around line 125, names such as normal are just compatibility aliases (yet another version issue):
        normal = random_normal = RandomNormal
        The original blog uses initializations.get('normal');
        in my current version it should be initializers.get('RandomNormal')
        '''
        self.init=initializers.get('RandomNormal') 
        super(AttLayer,self).__init__(**kwargs)
        
    def build(self,input_shape):
        assert len(input_shape)==3
        self.W=self.init((input_shape[-1],))  # compared with the attention implementation I saw earlier, this is not written in a very standard way
        self.trainable_weights=[self.W]
        super(AttLayer,self).build(input_shape)
        
    def call(self,x,mask=None):
        '''
        https://blog.csdn.net/niuwei22007/article/details/48949869
        An explanation of Theano's dimshuffle() function
        '''
        eij=K.tanh(K.dot(x,self.W))
        ai=K.exp(eij)
        weights=ai/K.sum(ai,axis=1).dimshuffle(0,'x')
        
        weighted_input=x*weights.dimshuffle(0,1,'x')
        return weighted_input.sum(axis=1)
    def compute_output_shape(self, input_shape):
        return (input_shape[0],input_shape[-1])
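
For reference, the arithmetic in `call` can be written out for a single sample without any backend. This is a minimal pure-Python sketch, assuming one sample of shape (timesteps, features) and a weight vector of shape (features,); the function name `attention_pool` is hypothetical. For a whole batch, the same dot product maps a (batch, timesteps, features) tensor and a (features,) vector to a (batch, timesteps) score matrix.

```python
import math

def attention_pool(x, w):
    """x: list of timesteps, each a list of features; w: one weight per feature."""
    # Score each timestep: tanh of the dot product between its features and w.
    scores = [math.tanh(sum(xi * wi for xi, wi in zip(step, w))) for step in x]
    # Normalise the exponentiated scores over the time axis (a softmax).
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum of the timesteps gives one feature vector for the sample.
    n_features = len(x[0])
    pooled = [sum(a * step[j] for a, step in zip(weights, x)) for j in range(n_features)]
    return weights, pooled

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 timesteps, 2 features
w = [0.5, -0.5]
weights, pooled = attention_pool(x, w)
print(weights)
print(pooled)
```

The output has one value per feature, which matches `compute_output_shape` returning `(input_shape[0], input_shape[-1])`.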
        

## Bi_GRU with Attention

In [12]:
sequence_input=keras.layers.Input(shape=(MAX_SENTENCE_LENGTH,),dtype='int32')

embedded_sequences=keras.layers.Embedding(len(tokenizer.word_index)+1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SENTENCE_LENGTH,
                            trainable=False)(sequence_input)
l_gru=keras.layers.Bidirectional(keras.layers.GRU(100,return_sequences=True))(embedded_sequences)
l_att=AttLayer()(l_gru)
preds=keras.layers.Dense(1,activation='sigmoid')(l_att)
model=keras.models.Model(sequence_input,preds)

In [13]:
model.compile(loss='binary_crossentropy',
             optimizer='adam',
             metrics=['acc'])

In [14]:
model.fit(x_train[:30],y_train[:30],validation_data=(x_val,y_val),epochs=10,batch_size=128)

TypeError: ('An update must have the same type as the original shared variable (shared_var=training/Adam/variable, shared_var.type=TensorType(float32, scalar), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float32, vector)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')

## Attention implementations that use the TensorFlow backend

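>+ https://github.com/thushv89/attention_keras  this one is written for Keras with the TensorFlow backend
+ Once I finally noticed the backend issue, I searched for the keywords `keras attention tensorflow backend` and found another attention implementation: https://gist.github.com/cbaziotis/6428df359af27d58078ca5ed9792bd6d <br>
One of the replies under it says: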
This works for me on TF 1.0.1 and Keras 2.0.6, thank you. Did someone test this and tried using/dropping the bias? For me, it doesn't change the results at all. If I initialize the bias with e.g. glorot uniform, the result changes. It seems the bias is not trained and stays all 0's. Any ideas why this might be happening?
+ https://github.com/datalogue/keras-attention
+ https://gist.github.com/wassname/5292f95000e409e239b9dc973295327a
+ https://www.kaggle.com/sermakarevich/hierarchical-attention-network/notebook

## Comparing np.dot, theano.dot, tf.matmul(), tf.multiply() and np.multiply()

References:<br> 
+ NumPy vs. TensorFlow comparison: https://www.jianshu.com/p/2a83eac1e35e
+ Theano: https://blog.csdn.net/guotong1988/article/details/76919838

In [None]:
import tensorflow as tf
import numpy as np
x=tf.constant([[1,2,3],[1,2,3],[1,2,3]])  
y=tf.constant([[2,1,1],[2,1,1],[2,1,1]])
x1=([[1,2,3],[1,2,3],[1,2,3]])
y1=([[2,1,1],[2,1,1],[2,1,1]])
z=tf.multiply(x,y)
z1=tf.matmul(x,y)
z2 = np.dot(x1,y1)
print('dot\n',z2)
z3 = np.multiply(x1,y1)
print('np.multiply\n',z3)
with tf.Session() as sess:
    print('tf.multiply\n',sess.run(z))
    print('tf.matmul\n',sess.run(z1))
    print(sess.run(np.dot(x,y)))    
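
Since the cell above was not executed, here is a backend-free sketch of the two operations it compares, using the same matrices. It is plain Python rather than NumPy or TensorFlow, so it only illustrates the 2-D case, where the element-wise and matrix products are the two distinct results to expect.

```python
x = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
y = [[2, 1, 1], [2, 1, 1], [2, 1, 1]]

# Element-wise product: what np.multiply and tf.multiply compute.
elementwise = [[a * b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]

# Matrix product: what np.dot and tf.matmul compute for 2-D inputs.
columns_y = list(zip(*y))
matrix_product = [[sum(a * b for a, b in zip(row, col)) for col in columns_y] for row in x]

print(elementwise)      # [[2, 2, 3], [2, 2, 3], [2, 2, 3]]
print(matrix_product)   # [[12, 6, 6], [12, 6, 6], [12, 6, 6]]
```

The backends start to differ for higher-rank inputs: `np.dot` of a 3-D array with a 1-D vector contracts the last axis, whereas, as far as I know, `tf.matmul` expects both arguments to have rank 2 or more, which is the mismatch the attention layer runs into.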

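## Python exception handling

Python provides two very important features for handling exceptions and errors that occur while a program is running. You can also use them to debug Python programs.

Exception handling and assertions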

https://www.runoob.com/python/python-exceptions.html

### Assertions

https://blog.csdn.net/shijichao2/article/details/61421735

https://blog.csdn.net/qq_24753293/article/details/78066426

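Assertions are generally used to check that arguments are valid:<br>
Arguments are usually passed correctly, and writing large numbers of explicit argument checks slows development down, so an assertion is a lightweight way to state the assumption where full validation is not warranted.

    Syntax: assert expression [, message]
    As a special statement, it checks that an expression holds and can be read as "this must be true here". If the expression does not hold (False), an AssertionError is raised, and you can supply the error message yourself.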
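
A short illustration of the statement described above. The function `mean` is a hypothetical example, not something from the notebook.

```python
def mean(values):
    # The assert documents an assumption about the argument and fails loudly if it is violated.
    assert len(values) > 0, "values must not be empty"
    return sum(values) / len(values)

print(mean([1, 2, 3]))

try:
    mean([])
except AssertionError as err:
    message = str(err)
    print("AssertionError:", message)
```

Note that Python skips `assert` statements when run with the `-O` flag, so they should not be the only guard against bad input from outside the program.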

# Process log

Even after configuring Theano, the same error is still raised:

'An update must have the same type as the original shared variable (shared_var=training/Adam/variable, shared_var.type=TensorType(float32, scalar), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float32, vector)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.'

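So I decided to install a py27 environment on the desktop, matching the environment this program was written for, and run it there.

I finally remembered that the default Python environment on this machine is py3.6, not py3.5, while http://deeplearning.net/software/theano/install_windows.html explicitly requires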

Python == 2.7* or ( >= 3.4 and < 3.6 )<br>
The conda distribution is highly recommended. Python 2.4 was supported up to and including the release 0.6. Python 2.6 was supported up to and including the release 0.8.2. Python 3.3 was supported up to and including release 0.9.

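But py27 is no longer supported, so I'll use py35 on the laptop and try py27 on the desktop. Remember to add a Jupyter kernel for each environment.

I saw https://stackoverflow.com/questions/47822119/how-can-i-use-blas-functionality-with-pythons-theano-library <br>
where someone answered that although 3.6 is not officially supported, it works without major problems in practice.

After running import theano under py2.7 on the desktop:<br>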
**WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.**

Reference:  https://blog.csdn.net/m0_38058163/article/details/80657447

Run conda install mkl-service, then edit the Theano config file and set ldflags=-lmkl_rt; after that the warning no longer appears.
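
As a sketch of what the edited config file might contain, the snippet below builds the text with the standard-library `configparser` and prints it instead of writing to disk. The `[blas]` and `[global]` section names follow Theano's `blas.ldflags` and `floatX` flags; whether a `floatX` entry is needed at all is an assumption here, since the notebook already reports float32.

```python
import configparser
import io

config = configparser.ConfigParser()
config.optionxform = str                     # keep option names case-sensitive (floatX, not floatx)
config["global"] = {"floatX": "float32"}     # assumed entry; matches the value printed earlier
config["blas"] = {"ldflags": "-lmkl_rt"}     # the setting that silences the BLAS warning

buffer = io.StringIO()
config.write(buffer)
theanorc_text = buffer.getvalue()
print(theanorc_text)
```

The printed text would go into the file whose path `os.path.expanduser('~/.theanorc.txt')` returns on Windows.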