arleigh418/Base-On-K_means-and-Doc2Vec-Pixnet-Article-Classification

Kmeans-And-Doc2Vec-Based-On-Pixnet-Article-Classification

1. First, train a Doc2vec model on the Pixnet open data: https://github.com/pixnet/2017-pixnet-hackathon-TaskOrientedBot/blob/master/opendata.md

2. The Doc2vec model used in this project is too large to upload, so you need to train it yourself. For reference, see https://github.com/arleigh418/Word-Embedding-With-Gensim/blob/master/doc2vec.py

3. stop.txt is used to remove unimportant Chinese words, such as '什麼' ("what") or '於是' ("so").

4. I trained K-means on a small part of the Pixnet data as a test, and trained Doc2vec on the whole Pixnet open data set.

5. For unsupervised learning, I think the result is not bad: you can see some differences between the categories (there are 8 categories).

6. Try it! It's a lot of fun! If you find any problems, please feel free to contact me. Thanks!
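The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository. It assumes the articles have already been segmented into word lists and that stop.txt holds one word per line. The function names (`load_stop_words`, `remove_stop_words`, `cluster_articles`) and the Doc2vec hyperparameters are hypothetical. It also trains Doc2vec and K-means on the same articles for brevity, whereas this project trains Doc2vec on the whole data set and K-means on a small part. It relies on gensim 4.x and scikit-learn being installed.

```python
from pathlib import Path


def load_stop_words(path):
    """Read stop words from a UTF-8 file (assumed: one word per line)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop-word set, keeping the order."""
    return [t for t in tokens if t not in stop_words]


def cluster_articles(tokenized_articles, stop_words, n_clusters=8):
    """Train Doc2vec on the articles, then group the vectors with K-means."""
    # Imported here so the helpers above work without gensim or scikit-learn.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.cluster import KMeans

    cleaned = [remove_stop_words(tokens, stop_words)
               for tokens in tokenized_articles]
    corpus = [TaggedDocument(words=tokens, tags=[i])
              for i, tokens in enumerate(cleaned)]

    # Hyperparameters are illustrative guesses, not this project's values.
    model = Doc2Vec(vector_size=100, min_count=1, epochs=20)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count,
                epochs=model.epochs)

    # One learned vector per article, looked up by its integer tag.
    vectors = [model.dv[i] for i in range(len(corpus))]

    # 8 clusters matches the number of categories mentioned above.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(vectors)
```

Calling `cluster_articles(articles, load_stop_words("stop.txt"))` on a list of tokenized articles (at least as many as there are clusters) would return one cluster label per article.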

About

K-means + Doc2vec for classifying articles from the Pixnet open data
