BigDataDemos

Demo programs for Hadoop etc.

Dave Jaffe
davejaffe7@gmail.com
@davejaffe7

GeoWeb Apache Log Generator and Analysis Tools

Demo programs to generate Apache web logs and to analyze them with MapReduce, Hive, Pig and Spark.

See the whitepaper "Three Approaches to Data Analysis with Hadoop" at http://en.community.dell.com/techcenter/extras/m/white_papers/20437941/download.aspx

GeoWeb Apache Log Generator

  • MapReduce program to generate large volumes of realistic Apache web logs
  • Files: CreateWeblogs.java, CreateWeblogsMapper.java, all_classbs.txt, referrers.txt, requests.txt, user_agents.txt
  • See GeoWebApacheLogGenerator_readme.md
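A minimal sketch of what one generated line could look like, assuming the generator writes the standard Apache combined log format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`). The function names, the fixed status code and the random byte counts are hypothetical. The lists passed in stand in for the contents of requests.txt, referrers.txt, user_agents.txt and the class B prefixes in all_classbs.txt, whose exact file formats are not described here.

```python
import random
from datetime import datetime, timedelta, timezone


def random_ip(class_b_prefix, rng):
    """Random host address inside a class B network such as '130.85'."""
    return f"{class_b_prefix}.{rng.randint(0, 255)}.{rng.randint(1, 254)}"


def format_log_line(ip, when, request, status, size, referrer, user_agent):
    """Render one line in Apache combined log format (ident and user left as '-')."""
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")  # e.g. 09/Mar/2014:13:55:36 +0000
    return (f'{ip} - - [{stamp}] "{request}" {status} {size} '
            f'"{referrer}" "{user_agent}"')


def generate_lines(n, prefixes, requests, referrers, user_agents, day_start, seed=0):
    """Yield n synthetic lines built from random picks, timed within one day."""
    rng = random.Random(seed)
    for _ in range(n):
        when = day_start + timedelta(seconds=rng.randint(0, 86399))
        yield format_log_line(
            random_ip(rng.choice(prefixes), rng),
            when,
            rng.choice(requests),
            200,                      # hypothetical: a real generator would vary status codes
            rng.randint(200, 50000),  # hypothetical response size in bytes
            rng.choice(referrers),
            rng.choice(user_agents),
        )


if __name__ == "__main__":
    start = datetime(2014, 3, 9, tzinfo=timezone.utc)
    for line in generate_lines(3, ["130.85", "81.2"], ["GET /index.html HTTP/1.1"],
                               ["-"], ["Mozilla/5.0"], start):
        print(line)
```

The repo's generator is a MapReduce job, so presumably each map task produces its own share of the lines in parallel; this sketch simply runs serially on one machine.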

GeoWeb MapReduce Program

  • MapReduce program to analyze Apache web logs, producing counts per country per hour
  • Files: GeoWeb.java, GeoWebMapper.java, SumReducer.java, all_classbs.txt
  • See GeoWeb_readme.md
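The map, shuffle and reduce steps can be sketched in plain Python. This assumes, hypothetically, that all_classbs.txt holds tab-separated lines of a class B prefix (the first two octets, e.g. `130.85`) and a country, and that "per hour" means hour of day; neither detail is confirmed here. `map_line` plays a role similar to GeoWebMapper, `sum_reduce` one similar to SumReducer, and `run_job` imitates Hadoop's shuffle by grouping values by key.

```python
from collections import defaultdict


def load_class_b_table(lines):
    """Build {'a.b': country} from 'a.b<TAB>country' lines (assumed file format)."""
    table = {}
    for line in lines:
        prefix, country = line.rstrip("\n").split("\t", 1)
        table[prefix] = country
    return table


def map_line(line, class_b_to_country):
    """Mapper: yield ((country, hour), 1) for one log line; skip unparsable lines."""
    try:
        ip = line.split(" ", 1)[0]
        timestamp = line.split("[", 1)[1].split("]", 1)[0]  # 09/Mar/2014:13:55:36 +0000
        hour = timestamp.split(":")[1]                       # "13"
    except IndexError:
        return
    class_b = ".".join(ip.split(".")[:2])
    country = class_b_to_country.get(class_b)
    if country is not None:
        yield (country, hour), 1


def sum_reduce(key, values):
    """Reducer: add up the 1s emitted for one (country, hour) key."""
    return key, sum(values)


def run_job(lines, class_b_to_country):
    """Simulate map -> shuffle (group by key) -> reduce on a single machine."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_line(line, class_b_to_country):
            groups[key].append(value)
    return dict(sum_reduce(k, v) for k, v in sorted(groups.items()))
```

Because the class B table is small, the sketch loads it into memory and does a dictionary lookup per line inside the mapper, so no separate join step is needed; this is a design choice of the sketch, not necessarily of GeoWeb.java.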

GeoWeb Hive Script

  • Hive script to analyze Apache web logs, producing counts per country per hour
  • Files: geoweb.q, all_classbs.txt
  • See GeoWeb_readme.md
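Hive setups for raw Apache logs often declare the table with `org.apache.hadoop.hive.serde2.RegexSerDe`, which splits each line into columns using one regex capture group per column; whether geoweb.q takes this approach is not stated here. The Python sketch below illustrates the idea with a hypothetical regex for the combined log format, then derives the hour from the timestamp so rows could be grouped by country and hour.

```python
import re
from datetime import datetime

# Hypothetical pattern: one capture group per field of the Apache combined log format.
COMBINED = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$'
)
FIELDS = ("host", "ident", "user", "time", "request",
          "status", "size", "referrer", "agent")


def parse_line(line):
    """Return a dict of string fields plus an integer 'hour', or None if no match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    row = dict(zip(FIELDS, m.groups()))
    when = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    row["hour"] = when.hour  # hour in the log's own UTC offset
    return row
```

As with RegexSerDe, every captured column comes back as a string; only the derived hour is converted here.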

GeoWeb Pig Script

  • Pig script to analyze Apache web logs, producing counts per country per hour
  • Files: geoweb.pig, all_classbs.txt
  • See GeoWeb_readme.md

GeoWeb Spark Program

  • Spark program to analyze Apache web logs, producing counts per country per hour
  • Files: geoweb.py, all_classbs.txt
  • See GeoWeb_readme.md