- python 2.7
- tensorflow 1.8.0
- Data Preprocessing --------- Data.py Data_Visualization.ipynb
- TFIDF model ---------------------- TFIDF.py
- LM model ---------------------- LM.py
- CNN model ---------------------- CNN_model.py CNN_train.py
- Inference ------------------ Inference.py
- Process the original data (remove Chinese stop words; QA-pair --> pred_QA-pair.csv)
- Generate the CNN data (pad each sentence to 32 words and apply word embedding)
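The two preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code in Data.py: the names `remove_stop_words`, `pad_sentence`, and the `<PAD>` token are hypothetical, and truncating sentences longer than 32 words is an assumption, since the source only mentions padding.

```python
PAD_TOKEN = "<PAD>"  # hypothetical padding symbol, not taken from Data.py
MAX_LEN = 32         # sentence length stated in the list above


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def pad_sentence(tokens, max_len=MAX_LEN, pad_token=PAD_TOKEN):
    """Return exactly max_len tokens: clip long inputs, right-pad short ones.

    Clipping is an assumption; the README only says sentences are padded.
    """
    clipped = list(tokens[:max_len])
    return clipped + [pad_token] * (max_len - len(clipped))
```

A call such as `pad_sentence(remove_stop_words(tokens, stop_words))` would then give a fixed-length sequence that can be mapped to embedding indices.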
- Preprocess the data and read the pred data
- Generate a TF-IDF representation for each sentence
- Calculate the cosine similarity for each question-question pair
- Return the answer of the most similar question
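As a rough illustration of the TF-IDF steps above, the sketch below builds sparse TF-IDF vectors in pure Python and returns the answer paired with the closest stored question. It is not the code in TFIDF.py: the function names are hypothetical, the inputs are assumed to be already tokenized, and the smoothed IDF formula is one common choice rather than the one the repository necessarily uses.

```python
import math
from collections import Counter


def build_idf(tokenized_docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))
    return {t: math.log((1.0 + n_docs) / (1.0 + c)) + 1.0
            for t, c in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (dict); terms unknown to the corpus are dropped."""
    term_freq = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in term_freq.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_answer_tfidf(query_tokens, questions, answers):
    """Return (answer, score) for the stored question closest to the query."""
    idf = build_idf(questions)
    q_vec = tfidf_vector(query_tokens, idf)
    scores = [cosine(q_vec, tfidf_vector(q, idf)) for q in questions]
    best = max(range(len(questions)), key=lambda i: scores[i])
    return answers[best], scores[best]
```

Here `questions` is a list of token lists and `answers` is the parallel list of responses, so index `i` in one corresponds to index `i` in the other.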
- Preprocess the data and read the pred data
- Calculate the LM similarity for each question-question pair
- Return the answer of the most similar question
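The README does not say which language-model similarity LM.py uses. One plausible reading is a query-likelihood score: treat each stored question as a small unigram language model and ask how likely it is to generate the incoming question. The sketch below assumes that reading, with Jelinek-Mercer smoothing; the function names and the `lam` weight are hypothetical.

```python
import math
from collections import Counter


def lm_score(query_tokens, doc_tokens, coll_counts, coll_len, lam=0.5):
    """Log P(query | doc) under a unigram model with Jelinek-Mercer smoothing.

    Assumption: this is one common LM similarity, not necessarily the one
    implemented in LM.py.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query_tokens:
        p_doc = doc_counts[term] / float(doc_len) if doc_len else 0.0
        p_coll = coll_counts[term] / float(coll_len) if coll_len else 0.0
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p > 0.0:  # terms never seen in the collection are skipped
            score += math.log(p)
    return score


def best_answer_lm(query_tokens, questions, answers, lam=0.5):
    """Return (answer, score) for the stored question with the highest score."""
    coll_counts = Counter(t for q in questions for t in q)
    coll_len = sum(coll_counts.values())
    scores = [lm_score(query_tokens, q, coll_counts, coll_len, lam)
              for q in questions]
    best = max(range(len(questions)), key=lambda i: scores[i])
    return answers[best], scores[best]
```

Scores are log-probabilities, so they are negative and only meaningful relative to each other for the same query.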
- Preprocess the data and read the pred data
- Generate word-embedding representations (initialized with Baidu Baike vectors)
- Train the CNN model on question-answer pairs
- Feed in question-answer pairs and return the most similar answer
- Input: question, top_k
- Output: the top-k responses from TFIDF, LM, and CNN
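The input and output described above could be wired together roughly as in the sketch below. It is only an illustration of the interface, not the code in Inference.py: `top_k_responses` and the `scorers` mapping are hypothetical, and each scorer is assumed to be a function that takes the tokenized question and one `(candidate_question, candidate_answer)` pair and returns a number where higher means more relevant.

```python
import heapq


def top_k_responses(question_tokens, candidates, scorers, top_k=3):
    """Return {model_name: [answer, ...]} with the top_k answers per model.

    candidates: list of (question_tokens, answer) pairs.
    scorers:    dict mapping a model name to score_fn(query, candidate).
    """
    results = {}
    for name, score_fn in scorers.items():
        scored = [(score_fn(question_tokens, cand), cand[1])
                  for cand in candidates]
        best = heapq.nlargest(top_k, scored, key=lambda pair: pair[0])
        results[name] = [answer for _, answer in best]
    return results
```

With this shape, the TFIDF and LM scorers would compare the query against `candidate[0]`, while a CNN scorer trained on question-answer pairs could score the query against `candidate[1]` directly.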
Note: the data is not uploaded because of privacy concerns.