# UCR 2016 Fall CS172 Information Retrieval Project: Tweet Retrieval and Location
Licensing Information: MIT
This project refines and extends https://github.com/khuan013/CS172-Crawler.git
- Yadan (Ada) Luo
- Zhiba Su
- Kenneth Huang
- Tien Tran
## Overview
### Part 1: Twitter Crawler (Python)
This application uses the Twitter Streaming API to collect geolocated tweets and stores them in text files of 10 MB each.
#### Instructions on how to deploy the system
To run the program, you must have:
- Python 2.7
- the Tweepy Twitter API library (`sudo pip install tweepy`)
- lxml
Download the repository from https://github.com/khuan013/CS172-TwitterSearch.git
On Unix/Linux, run the crawler.sh shell script, which executes the Python program. Pass the number of tweets to collect (0 means the crawler runs until it has gathered 5 GB of data) and the output directory name.
By default, files are placed in /data and the number of tweets is unlimited.
Examples:
- ./crawler.sh [num-tweets] [output-dir]
- ./crawler.sh [num-tweets]
- ./crawler.sh
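The defaults and the 5 GB cap described above might be handled like this inside the Python program. This is a hypothetical sketch (the function names are not from the project's twitterGeo.py), written in Python 3:

```python
import sys

FIVE_GB = 5 * 1024 ** 3


def parse_args(argv):
    """Mirror crawler.sh's defaults: a tweet count of 0 means
    'unlimited', and output defaults to the data directory."""
    num_tweets = int(argv[1]) if len(argv) > 1 else 0
    out_dir = argv[2] if len(argv) > 2 else "data"
    return num_tweets, out_dir


def should_stop(num_tweets, tweets_seen, bytes_written):
    """Stop at the requested tweet count, or at 5 GB of collected
    data when the count is 0 (unlimited)."""
    if num_tweets > 0:
        return tweets_seen >= num_tweets
    return bytes_written >= FIVE_GB
```

The crawler's main loop would call `should_stop` after each tweet is written and disconnect the stream once it returns true.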
### Part 2: Search Engine (Java, Lucene)
To run the search engine, you must have the following installed:
- Eclipse for Java EE
- Apache Tomcat version 7.0
- Lucene version 3.7.2
- Download the repository from https://github.com/khuan013/CS172-TwitterSearch.git
- Put MyLucene.java and MySearch.jsp into your Eclipse project directory.
- If you already have Twitter data, run MyLucene.java to create an index. Otherwise, first run the Python program twitterGeo.py; see the Part 1 documentation for usage.
- Once MyLucene.java finishes, it creates a folder called testIndex. Move this folder to your Desktop directory.
- Run MySearch.jsp on the Tomcat server from Eclipse. This brings up a webpage with a search bar.