**Remember not to add or remove cells in this notebook, or change their type. If you do, the system will automatically grade it zero point zero (0.0).**

Sort the file by letter and then by value (3rd column).
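
As a reference for the ordering the job should produce, the following is a minimal plain-Python sketch, not part of the graded solution. It assumes each record is whitespace-separated into letter, date and value, and that the value must be compared as a number. The name `sort_records` is hypothetical and introduced only for illustration.

```python
def sort_records(lines):
    """Sort 'letter date value' records by letter, then by numeric value."""
    def sort_key(line):
        letter, _date, value = line.split()
        # Compare the value as an integer so that 2 < 14 < 121.
        return (letter, int(value))
    return sorted(lines, key=sort_key)


sample = [
    "B   1999-08-28   14",
    "E   1995-04-25   2",
    "A   1988-04-27   121",
    "A   1990-07-22   0",
    "B   1995-08-23   1",
]

for record in sort_records(sample):
    print(record)
```

Hadoop Streaming compares keys as text by default, so the MapReduce version cannot rely on `int()` inside the sort and has to encode the numeric order into the key itself.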

In [1]:
%%writefile input.txt
B   1999-08-28   14
E   1999-12-06   12
E   1993-07-21   9
C   1991-02-12   13
E   1995-04-25   2
A   1992-08-22   14
B   1999-06-11   121
E   1993-01-27   9
E   1999-09-10   11
E   1990-05-03   16
E   1994-02-14   10
A   1988-04-27   121
A   1990-10-06   10
E   1985-02-12   16
E   1998-09-14   2
B   1994-08-30   17
A   1997-12-15   13
B   1995-08-23   1
B   1998-11-22   131
B   1997-04-09   14
E   1993-12-27   18
E   1999-01-14   15
A   1992-09-19   8
B   1993-03-02   14
B   1999-10-21   13
A   1990-08-31   12
C   1994-01-25   10
E   1990-02-09   18
A   1990-09-26   8
A   1993-05-08   16
B   1995-09-06   14
E   1991-02-18   141
A   1993-01-11   14
A   1990-07-22   0
C   1994-09-09   15
C   1994-07-27   104
D   1990-10-10   15
A   1990-09-05   11
B   1991-10-01   9
A   1994-10-25   13

Writing input.txt


## Mapper

In [2]:
%%writefile mapper.py
#! /usr/bin/env python

import sys

class Mapper:
    
    NOTHING      = 0
    JUST_COUNTER = 1
    JUST_STATUS  = 2
    ALL          = 3
    
    def __init__(s, str_in, str_out, str_err, log_level=NOTHING):
        s.str_in  = str_in
        s.str_out = str_out
        s.str_err = str_err
        s.log_level = log_level
    
    def emit(s, key, value=1):
        s.str_out.write('{}\t{}\n'.format(key, value))

    def status(s, message):
        s.str_err.write('reporter:status: {}\n'.format(message))
    
    def counter(s, counter, amount=1, group='ApplicationCounter'):
        s.str_err.write('reporter:counter: {},{},{}\n'.format(group, counter, amount))
    
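    def map(s):
        
        for line in s:
            
            if s.log_level in (s.JUST_STATUS, s.ALL): s.status('Processing line {}'.format(line))
            # Increment by one per record; Hadoop accumulates the total itself.
            if s.log_level in (s.JUST_COUNTER, s.ALL): s.counter('NumberLines')
            
            # Key = letter + value zero-padded to 4 digits, so that the
            # text-based shuffle sort orders the values numerically.
            s.emit('{},{:0>4}'.format(line[0],line[17:]),line)

    def __iter__(s):
        for line in s.str_in:
            # Strip only the newline, so a final line without one stays intact.
            yield line.rstrip('\n')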
            

if __name__ == "__main__": 
    
    mapper = Mapper(sys.stdin, sys.stdout, sys.stderr)
    
    mapper.map()

Writing mapper.py
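
The zero-padding in the mapper key is what makes a text sort behave like a numeric one. The sketch below illustrates this on three records taken from `input.txt`. `make_key` is a hypothetical helper that repeats the mapper's format expression; the unpadded variant is shown only for contrast. A width of 4 assumes that no value has more than four digits.

```python
def make_key(line):
    # Same expression as in mapper.py: letter, comma, value padded with zeros to width 4.
    return '{},{:0>4}'.format(line[0], line[17:])


records = [
    "A   1988-04-27   121",
    "A   1990-10-06   10",
    "A   1992-09-19   8",
]

# Without padding, keys compare character by character: "A,10" < "A,121" < "A,8".
unpadded = sorted(records, key=lambda r: '{},{}'.format(r[0], r[17:]))

# With padding, "A,0008" < "A,0010" < "A,0121", which matches the numeric order.
padded = sorted(records, key=make_key)

print([r[17:] for r in unpadded])  # ['10', '121', '8']
print([r[17:] for r in padded])    # ['8', '10', '121']
```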


## Reducer

In [3]:
%%writefile reducer.py
#!/usr/bin/env python

import sys

class Reducer:
    
    def __init__(s, str_in, str_out, str_err):
        s.str_in  = str_in
        s.str_out = str_out
        s.str_err = str_err
    
    def emit(s, value):
        s.str_out.write('{}\n'.format(value))
    
    def reduce(s):
        for key, val in s:
            s.emit(val)
    
    def __iter__(s):
        for line in s.str_in:
            # Split on the first tab only: everything after it is the original record.
            key, val = line.rstrip('\n').split('\t', 1)
            yield (key, val)

if __name__ == '__main__': 
    reducer = Reducer(sys.stdin,sys.stdout,sys.stderr)
    reducer.reduce()

Writing reducer.py


## Execution

In [4]:
%%bash
rm -rf output
STREAM=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar
chmod +x mapper.py
chmod +x reducer.py
hadoop jar $STREAM -input input.txt -output output  -mapper mapper.py -reducer reducer.py
cat output/part-00000

A   1990-07-22   0	
A   1990-09-26   8	
A   1992-09-19   8	
A   1990-10-06   10	
A   1990-09-05   11	
A   1990-08-31   12	
A   1997-12-15   13	
A   1994-10-25   13	
A   1993-01-11   14	
A   1992-08-22   14	
A   1993-05-08   16	
A   1988-04-27   121	
B   1995-08-23   1	
B   1991-10-01   9	
B   1999-10-21   13	
B   1997-04-09   14	
B   1995-09-06   14	
B   1993-03-02   14	
B   1999-08-28   14	
B   1994-08-30   17	
B   1999-06-11   121	
B   1998-11-22   131	
C   1994-01-25   10	
C   1991-02-12   13	
C   1994-09-09   15	
C   1994-07-27   104	
D   1990-10-10   15	
E   1998-09-14   2	
E   1995-04-25   2	
E   1993-07-21   9	
E   1993-01-27   9	
E   1994-02-14   10	
E   1999-09-10   11	
E   1999-12-06   12	
E   1999-01-14   15	
E   1990-05-03   16	
E   1985-02-12   16	
E   1990-02-09   18	
E   1993-12-27   18	
E   1991-02-18   141	
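
The map, sort and reduce flow of the job above can be approximated without Hadoop, which helps when debugging. This is a rough sketch under stated assumptions: `map_line`, `reduce_line` and `simulate_streaming` are hypothetical names, and Python's `sorted` stands in for the shuffle phase. Hadoop orders records by key only, so records that share a key (for example, several `B` records with value 14) may come out in a different relative order than in this simulation.

```python
def map_line(line):
    # Mirrors mapper.py: emit "<letter>,<zero-padded value>\t<original record>".
    key = '{},{:0>4}'.format(line[0], line[17:])
    return '{}\t{}'.format(key, line)


def reduce_line(pair):
    # Mirrors reducer.py: drop the key and keep the original record.
    _key, value = pair.split('\t', 1)
    return value


def simulate_streaming(lines):
    mapped = [map_line(line) for line in lines]
    shuffled = sorted(mapped)  # stand-in for Hadoop's shuffle and sort
    return [reduce_line(pair) for pair in shuffled]


sample = [
    "E   1999-12-06   12",
    "B   1999-06-11   121",
    "A   1992-09-19   8",
    "B   1995-08-23   1",
    "A   1990-10-06   10",
]

for record in simulate_streaming(sample):
    print(record)
```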


In [5]:
!rm -rf mapper.py reducer.py output input.txt

---

To run the automatic grading of this notebook:

* Open a terminal.
* Make sure you are in the same folder that contains this notebook.
* Save the notebook.
* Run the following command:

      ./gradetool 08-Taller.ipynb

---