Commit

finish section on Pig
jonniesweb committed Dec 14, 2016
1 parent a11c138 commit 9cb346a
Showing 2 changed files with 40 additions and 2 deletions.
6 changes: 4 additions & 2 deletions honours-project.tex
@@ -677,8 +677,9 @@ \subsubsection{Hive}

\subsubsection{Pig}

Pig \cite{pig} is a platform for data analysis on top of Hadoop MapReduce, built around a dynamically typed, high-level language. Programs are written in the Pig Latin \cite{piglatin} language, and Pig manages their compilation into MapReduce jobs and their execution. The names Pig and Pig Latin are often used interchangeably. Pig Latin combines ideas from SQL's declarative statements and from functional programming. As described in \cite{sakr2013hadoop}, writing a Pig program is similar to specifying a SQL query execution plan by hand. Experienced programmers are drawn to Pig because they can state the sequence of operations directly, instead of writing SQL and coaxing the optimizer into choosing an efficient query plan.
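
To illustrate the comparison with a query execution plan, the following Python sketch (not Pig Latin itself) writes a small analysis as an explicit sequence of steps, each producing a named intermediate result. The records and all names, such as \texttt{visits} and \texttt{long\_visits}, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical input records: (user, url, seconds spent on the page).
visits = [
    ("alice", "/home", 12),
    ("bob", "/home", 3),
    ("alice", "/docs", 40),
    ("carol", "/docs", 25),
    ("bob", "/docs", 1),
]

# Step 1: filter, keeping only visits longer than 5 seconds.
long_visits = [v for v in visits if v[2] > 5]

# Step 2: group the remaining visits by URL.
by_url = defaultdict(list)
for user, url, seconds in long_visits:
    by_url[url].append(seconds)

# Step 3: aggregate each group into a total.
total_seconds = {url: sum(times) for url, times in by_url.items()}

print(total_seconds)
```

Each step consumes the result of the previous one, so the order of operations is fixed by the programmer rather than chosen by a query optimizer.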


Pig has primitives for filtering, grouping, and aggregation. It also allows programmers to define their own functions, which can be included in the control flow. Pig programs are parsed by an interpreter, checked for valid syntax, and transformed into a logical plan of dependencies, which is then optimized. The resulting Directed Acyclic Graph (DAG) is converted into a sequence of MapReduce jobs, a process very similar to the one used by Hive. After a second round of optimization, the resulting DAG of MapReduce jobs is run in topological order on Hadoop MapReduce \cite{sakr2013hadoop,polato2014hadoop}.
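
Running a DAG of jobs in topological order can be sketched in Python as follows. This is a minimal illustration and not Pig's actual scheduler: the job names and the \texttt{run\_in\_topological\_order} helper are hypothetical, and the sketch assumes Python 3.9 or later for the standard \texttt{graphlib} module.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each job maps to the set of jobs it depends on.
job_dependencies = {
    "load_and_filter": set(),
    "group_by_key": {"load_and_filter"},
    "join_with_lookup": {"load_and_filter"},
    "aggregate": {"group_by_key", "join_with_lookup"},
}

def run_in_topological_order(dependencies):
    """Return the jobs in an order where every job follows its dependencies."""
    executed = []
    for job in TopologicalSorter(dependencies).static_order():
        # A real system would submit the job to the cluster here.
        executed.append(job)
    return executed

order = run_in_topological_order(job_dependencies)
print(order)
```

Any order produced this way guarantees that a job starts only after the jobs producing its inputs have finished. The relative order of independent jobs, such as the two middle jobs here, is not fixed.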


\subsubsection{HBase}
@@ -687,7 +688,8 @@ \subsubsection{HBase}



\cite{zhang2016survey,polato2014hadoop,sakr2013hadoop}

% \cite{zhang2016survey,polato2014hadoop,sakr2013hadoop}
% zhang2016 - 55,57,72,73

\section{Security} \label{sec:security}
36 changes: 36 additions & 0 deletions research.bib
@@ -917,6 +917,42 @@ @online{yarnarchitecture
url={http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html}
}

@article{pig,
author = {Gates, Alan F. and Natkovich, Olga and Chopra, Shubham and Kamath, Pradeep and Narayanamurthy, Shravan M. and Olston, Christopher and Reed, Benjamin and Srinivasan, Santhosh and Srivastava, Utkarsh},
title = {Building a High-level Dataflow System on Top of Map-Reduce: The Pig Experience},
journal = {Proc. VLDB Endow.},
issue_date = {August 2009},
volume = {2},
number = {2},
month = aug,
year = {2009},
issn = {2150-8097},
pages = {1414--1425},
numpages = {12},
url = {http://dx.doi.org/10.14778/1687553.1687568},
doi = {10.14778/1687553.1687568},
acmid = {1687568},
publisher = {VLDB Endowment},
}

@inproceedings{piglatin,
author = {Olston, Christopher and Reed, Benjamin and Srivastava, Utkarsh and Kumar, Ravi and Tomkins, Andrew},
title = {Pig Latin: A Not-so-foreign Language for Data Processing},
booktitle = {Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data},
series = {SIGMOD '08},
year = {2008},
isbn = {978-1-60558-102-6},
location = {Vancouver, Canada},
pages = {1099--1110},
numpages = {12},
url = {http://doi.acm.org/10.1145/1376616.1376726},
doi = {10.1145/1376616.1376726},
acmid = {1376726},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {dataflow language, pig latin},
}


% Security
