Information_retrieva_Projectl-

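News retrieval: crawl 3-4 target websites and implement extraction, indexing, and retrieval of the page content. At least 10 web pages must be collected. Results can be sorted by attributes such as time, relevance, and popularity, and pages on similar topics are clustered automatically. Required features: related-search recommendations, snippet generation, and result preview (hovering the mouse over a result shows a preview).

# Dependencies

- scrapy. Install: `pip install Scrapy`
- webpy. Install: `sudo easy_install web.py`. Website: http://webpy.org/
- jieba. Install: `pip install jieba`. Website: https://pypi.python.org/pypi/jieba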

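Data: 100,000 NetEase news pages, the inverted index, and related data are available on Baidu Netdisk: http://pan.baidu.com/s/1gfkDb4B

After downloading, place the `data` folder in the `Information_retrieva_Projectl-` directory.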
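For readers unfamiliar with the inverted index included in the data download, the following is a minimal sketch of the idea, not the project's actual implementation. The function names (`tokenize`, `build_inverted_index`, `search`), the sample documents, and the TF-IDF scoring are all assumptions made for illustration. The whitespace tokenizer is a stand-in; for Chinese news text a segmenter such as `jieba.lcut` would take its place.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Placeholder tokenizer (assumption): lowercase and split on whitespace.
    # Chinese text would need a word segmenter such as jieba.lcut(text).
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to a dict of {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, num_docs, query):
    """Return doc ids ranked by a simple TF-IDF score, best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Rarer terms get a higher weight.
        idf = math.log(num_docs / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Ties are broken by doc id so the ordering is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents, for illustration only.
docs = {
    1: "stock market falls as oil prices rise",
    2: "oil prices rise again",
    3: "local team wins football match",
}
index = build_inverted_index(docs)
results = search(index, len(docs), "oil prices")
print(results)  # [1, 2]
```

Sorting by time or popularity instead of relevance would only change the sort key in `search`, assuming each document carries a timestamp or a view count.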

# Usage

Interactive query: on Linux, `cd` into the `web/` folder, run `python main.py` in a terminal, then open http://0.0.0.0:8080/ in a browser.

# References

1. scrapy manual: http://scrapy-chs.readthedocs.org/zh_CN/1.0/intro/tutorial.html
2. webpy manual: http://webpy.org/

# Results

For more technical details and learning materials, see the report file.

