Kmeans-And-Doc2Vec-Based-On-Pixnet-Article-Classification

2. The Doc2Vec model used in this project is too large to upload, so you need to train it yourself (for reference, see https://github.com/arleigh418/Word-Embedding-With-Gensim/blob/master/doc2vec.py).

3. stop.txt is a stop-word list used to remove unimportant Chinese words (such as '什麼' ("what") or '於是' ("so")).

4. I train K-means on a small subset of the Pixnet data for testing, and train Doc2Vec on the whole Pixnet open dataset.

5. For unsupervised learning, I think the result is not bad: you can see some differences between the categories (there are 8 categories).

6. Try it! It's a lot of fun! If you find any problems, please feel free to contact me. Thanks!
