# UCR 2016 Fall CS172 Information Retrieval Project: Tweet Retrieval and Location
Licensing Information: MIT
This project refines and extends https://github.com/khuan013/CS172-Crawler.git
- Yadan (Ada) Luo
- Zhiba Su
- Kenneth Huang
- Tien Tran
## Overview
### Part 1: Twitter Crawler (Python)
This application uses the Twitter Streaming API to collect geolocated tweets and stores them in text files of 10 MB each.
#### Instructions on how to deploy the system
To run the program, you must have:
- Python 2.7
- the Tweepy Twitter API library (`sudo pip install tweepy`)
- lxml
Download the repository from https://github.com/khuan013/CS172-TwitterSearch.git
On Unix/Linux, run the crawler.sh shell script, which executes the Python program. Pass the number of tweets to collect (0 means the crawler runs until it has gathered 5 GB of data) and the output directory name.
By default, files are placed in /data and the number of tweets is unlimited.
Examples:
- ./crawler.sh [num-tweets] [output-dir]
- ./crawler.sh [num-tweets]
- ./crawler.sh
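The defaults and the 5 GB cap described above might be handled like this inside the Python program. This is a hypothetical sketch (the function names are not from the project's twitterGeo.py), written in Python 3:

```python
import sys

FIVE_GB = 5 * 1024 ** 3


def parse_args(argv):
    """Mirror crawler.sh's defaults: a tweet count of 0 means
    'unlimited', and output defaults to the data directory."""
    num_tweets = int(argv[1]) if len(argv) > 1 else 0
    out_dir = argv[2] if len(argv) > 2 else "data"
    return num_tweets, out_dir


def should_stop(num_tweets, tweets_seen, bytes_written):
    """Stop at the requested tweet count, or at 5 GB of collected
    data when the count is 0 (unlimited)."""
    if num_tweets > 0:
        return tweets_seen >= num_tweets
    return bytes_written >= FIVE_GB
```

The crawler's main loop would call `should_stop` after each tweet is written and disconnect the stream once it returns true.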
### Part 2: Search Engine (Java, Lucene)
To run the search engine, you must have the following installed:
- Eclipse for Java EE
- Apache Tomcat version 7.0
- Lucene version 3.7.2
- Download the repository from https://github.com/khuan013/CS172-TwitterSearch.git
- Put MyLucene.java and MySearch.jsp into your Eclipse project directory.
- If you already have Twitter data, run MyLucene.java to create an index. Otherwise, first run the Python program twitterGeo.py; see the Part 1 documentation for usage.
- Once MyLucene.java finishes, it creates a folder called testIndex. Move this folder to your Desktop directory.
- Run MySearch.jsp on the Tomcat server from Eclipse. This brings up a webpage with a search bar.