Douban movie crawler: a crawler that collects movie details and short comments, saves them to a MySQL database, and includes sentiment analysis based on the comments.

DMinerJackie/JewelCrawler


JewelCrawler

A crawler that collects Douban movie details and short comments, saves them to a MySQL database, and includes sentiment analysis based on the comments.

Modules

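  • com.ansj.vec: a Java implementation of the Word2Vec algorithm
  • com.jackie.crawler.doubanmovie: the crawler implementation module
  • constants package: holds the constant classes
  • crawl package: holds the crawler entry-point program
  • entity package: entity classes mapped to the database tables
  • test package: holds the test classes
  • utils package: holds the utility classes
  • resource module: holds configuration files and resource files
  • beans.xml: the Spring context configuration file
  • seed.properties: the seed file
  • stopwords.dic: the stop-word dictionary
  • comment12031715.txt: the crawled short-comment data
  • tokenizerResult.txt: the result file produced by word segmentation with IKAnalyzer
  • vector.mod: model data trained with the Word2Vec algorithm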
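
The stop-word dictionary and the tokenized result file listed above suggest a filtering step between word segmentation and Word2Vec training. The following is a minimal sketch of such a step, not the repository's actual code: the class `StopwordFilter`, its two methods, the one-word-per-line dictionary layout, and the whitespace-separated token format are all assumptions made for illustration. The sample tokens are English placeholders; the real data would be Chinese tokens produced by IKAnalyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the name and methods are not taken from the repository.
class StopwordFilter {

    // Builds a stop-word set from dictionary lines.
    // Assumption: the dictionary holds one word per line; blank lines are skipped.
    public static Set<String> loadStopwords(List<String> lines) {
        Set<String> stopwords = new HashSet<>();
        for (String line : lines) {
            String word = line.trim();
            if (!word.isEmpty()) {
                stopwords.add(word);
            }
        }
        return stopwords;
    }

    // Removes stop words from one tokenized line.
    // Assumption: tokens are separated by whitespace.
    public static List<String> filter(String tokenizedLine, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokenizedLine.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopwords = loadStopwords(Arrays.asList("the", "is", "a", ""));
        // Prints the tokens that survive filtering.
        System.out.println(filter("the movie is a good one", stopwords));
    }
}
```

In a real run, the dictionary lines would be read from stopwords.dic and the tokenized lines from tokenizerResult.txt, with the file encoding chosen to match how those files were written.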

For more details, please refer to
