Commit

Updates to .pig file and readme's
mwinkle committed Jun 21, 2012
1 parent 7c40e77 commit 166a482
Showing 3 changed files with 40 additions and 6 deletions.
11 changes: 6 additions & 5 deletions Pig/computeAirportDelays.pig
@@ -1,7 +1,7 @@
-- here is a basic pig script to read the data
-- out of the airport file and compute average delay and then order them

flights = LOAD 'fixed_flights' USING PigStorage(',') AS (arrDelayMinutes, carrier, dayOfMonth, depDelayMinutes, dest, flightDate, month, origin, rowId, year);
flights = LOAD 'fixed_flights' USING PigStorage(',') AS (arrDelayMinutes:int, carrier, dayOfMonth, depDelayMinutes, dest, flightDate, month, origin, rowId, year);

interestingData = FOREACH flights GENERATE dest, arrDelayMinutes;

@@ -11,12 +11,13 @@ destinationGroup = GROUP longDelays BY (dest);

averages = FOREACH destinationGroup GENERATE group, COUNT(longDelays) as numberOfFlights, AVG(longDelays.arrDelayMinutes) as delay;

busyAirports = FILTER averages BY numberOfFlights > 1000;
busyAirports = FILTER averages BY numberOfFlights > 5000;

orderedDelays = ORDER busyAirports BY delay DESC;
--orderedDelays = ORDER busyAirports BY delay DESC;

--top10 = LIMIT orderedDelays 10;
--STORE orderedDelays INTO 'pigAverageDelays' USING PigStorage();

STORE orderedDelays INTO 'top10PigLongDelays' USING PigStorage();

STORE busyAirports INTO 'pigBusyLongDelays' USING PigStorage();
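
For readers less familiar with Pig, the grouping, averaging, and filtering steps of this script can be approximated in plain Python. This is only a sketch under stated assumptions: the function name `busy_airport_averages` is hypothetical, the input is assumed to be already restricted to long delays (the lines that define `longDelays` are not shown in this diff), and every delay value is assumed to be present and numeric.

```python
from collections import defaultdict


def busy_airport_averages(long_delays, min_flights=5000):
    """Approximate the GROUP / COUNT / AVG / FILTER steps of the Pig script.

    long_delays: iterable of (dest, arr_delay_minutes) pairs that have
    already passed the long-delay filter (not visible in the diff).
    Returns {dest: (number_of_flights, average_delay)} for destinations
    with more than min_flights flights.
    """
    totals = defaultdict(lambda: [0, 0])  # dest -> [flight count, delay sum]
    for dest, delay in long_delays:
        entry = totals[dest]
        entry[0] += 1
        entry[1] += delay
    return {
        dest: (count, total / count)
        for dest, (count, total) in totals.items()
        if count > min_flights  # mirrors: FILTER averages BY numberOfFlights > 5000
    }
```

The default of 5000 matches the updated `busyAirports` filter; passing a smaller `min_flights` makes the sketch usable on a small sample.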


12 changes: 11 additions & 1 deletion README.md
@@ -1,4 +1,14 @@
TEE2012_HadoopDemos
===================

TEE2012_HadoopDemos
This is a set of demos used at TechEd Europe 2012

Data Set assumptions
===================

The bulk of these demos operate on a set of flight delay information, originally obtained from the Azure DataMarket (available here: https://datamarket.azure.com/dataset/e29b7fb9-3d2e-4f35-8088-c97dbd75cd1f).

We expect the following comma-separated schema for these demo jobs:

ArrDelayMinutes Carrier DayofMonth DepDelayMinutes Dest FlightDate Month Origin RowId Year
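
As a rough illustration of this schema, the following Python sketch splits one comma-separated record into named fields. Only the column order comes from the list above; the snake_case field names, the `parse_flight` helper, and the sample record used with it are assumptions made for this example.

```python
import csv

# Column order from the schema above; the field names are this sketch's own.
FIELDS = [
    "arr_delay_minutes", "carrier", "day_of_month", "dep_delay_minutes",
    "dest", "flight_date", "month", "origin", "row_id", "year",
]


def parse_flight(line):
    """Split one comma-separated record into a dict keyed by FIELDS.

    Values are kept as strings; callers convert what they need.
    """
    values = next(csv.reader([line]))
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))
```

For example, `parse_flight("15,AA,21,10,SEA,2012-06-21,6,JFK,1,2012")["dest"]` returns `"SEA"`. The record itself is made up, and the actual date format in the data set may differ.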

23 changes: 23 additions & 0 deletions basicStreaming/readme.md
@@ -0,0 +1,23 @@
Basic Streaming
======================

This is a sample of basic MapReduce streaming using the Hadoop streaming jar.

Streaming i

Build Instructions
======================
Build the projects in Visual Studio

or

msbuild basicNetStreaming.sln

should do the trick

Execution Instructions
======================
From a Hadoop command prompt (note: this assumes that %HADOOP_HOME% is defined, that the streaming jar has been built, and that full paths to the executables are given):

hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -mapper f:\dev\src\csharp\hadoopdebugger\GenerateAirlineKeyMapper\bin\Debug\GenerateAirlineKeyMapper.exe -reducer f:\dev\src\csharp\hadoopdebugger\AirlineFlightCountReducer\bin\Debug\AirlineFlightCountReducer.exe -input fixed_flights -output test_streaming
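
Hadoop streaming passes input lines to the mapper on stdin, expects tab-separated key/value lines on stdout, and feeds the reducer those lines sorted by key. The source of the two C# executables is not part of this commit, so the following Python sketch is only a guess at their behavior based on their names: the mapper emits the carrier column as the key, and the reducer counts records per key. The function names `map_lines` and `reduce_lines` are hypothetical.

```python
from itertools import groupby


def map_lines(lines):
    """Emit 'carrier<TAB>1' for each comma-separated flight record."""
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) > 1:  # carrier is the second column in the schema
            yield f"{fields[1]}\t1"


def reduce_lines(lines):
    """Sum the counts per key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"
```

To use these with the streaming jar, each function would need a small wrapper script that iterates over `sys.stdin` and prints every yielded line.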
