<b>This notebook is just a tutorial to help you get familiar with skip-grams and MapReduce.  
<font color="red">You don't need to hand in this notebook</font>, so feel free to jump to the [Requirement section](#Assignment-Requirement) and work directly on your `mapper.py` and `reducer.py` if you already know how to do so.</b>  

# Week 05: Skip-gram and MapReduce

In previous assignments, you learned the concept of n-grams and how to generate them.  
This week, we are introducing another type of gram, called *skip-gram*.  
We are also going to compute skip-grams on a large dataset, so you'll have to process it with the MapReduce technique.  

So, first things first: what is a skip-gram?  

## Skip-gram

<i>\[S\]kip-grams are a generalization of n-grams in which the components (typically words) need not be consecutive in the text under consideration, but may leave gaps that are skipped over.  - from [Wikipedia](https://en.wikipedia.org/wiki/N-gram#Skip-gram)</i>  

That is, a skip-gram is essentially an n-gram that is allowed to skip some words in between.  
In the sentence <i>"Strong winds blew roofs away"</i>, two of its bigrams are <i>"winds blew"</i> and <i>"blew roofs"</i>, while <i>"blew away"</i> is a skip-gram with distance 2, since it skips over one word, <i>"roofs"</i>.  
As you can see, skip-grams can capture phrases whose words are separated by other words.  
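As a quick illustration (a sketch; the helper name `skipgrams_at` is our own and not part of the assignment), the word pairs that are exactly `distance` positions apart can be listed in Python. With distance 1 this reproduces the bigrams, and with distance 2 it yields *"blew away"*:

```python
def skipgrams_at(tokens, distance):
    """Return all (left, right) word pairs that are exactly `distance` positions apart."""
    # distance 1 gives ordinary bigrams; distance 2 skips one word in between, etc.
    return [(tokens[i], tokens[i + distance]) for i in range(len(tokens) - distance)]


tokens = "Strong winds blew roofs away".split()
print(skipgrams_at(tokens, 1))
# [('Strong', 'winds'), ('winds', 'blew'), ('blew', 'roofs'), ('roofs', 'away')]
print(skipgrams_at(tokens, 2))
# [('Strong', 'blew'), ('winds', 'roofs'), ('blew', 'away')]
```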

Now consider another sentence:

> "Skip-gram is used to predict the context word for a given target word".

With *predict* as the pivot word, all of its skip-grams within distance 5 are listed below:
```
.------------------------------------------------------------------------------.
| distance || -5 |     -4    | -3 |  -2  | -1 |  1  |    2    |  3   |  4  | 5 |
|----------||----|-----------|----|------|----|-----|---------|------|-----|---|
| predict  || -  | Skip-gram | is | used | to | the | context | word | for | a |
'------------------------------------------------------------------------------'
```
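The table above can be reproduced with a short Python sketch. The helper `pivot_skipgrams` is hypothetical (the assignment does not provide it); it returns `(pivot, word, distance)` tuples and simply skips positions that fall outside the sentence, such as distance -5 here:

```python
def pivot_skipgrams(tokens, idx, max_distance):
    """List (pivot, word, distance) for every word within `max_distance` of tokens[idx]."""
    result = []
    for distance in range(-max_distance, max_distance + 1):
        pos = idx + distance
        # Skip the pivot itself (distance 0) and positions outside the sentence
        if distance == 0 or pos < 0 or pos >= len(tokens):
            continue
        result.append((tokens[idx], tokens[pos], distance))
    return result


tokens = "Skip-gram is used to predict the context word for a given target word".split()
for gram in pivot_skipgrams(tokens, tokens.index("predict"), 5):
    print(gram)
```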

<a name="Practice"></a>
### Practice: Distance table of skip-gram

Now, let's practice!  
Given the sentence <i>"Skip-gram is used to predict the context word for a given target word"</i>, **<u>output all of its skip-grams with distances from -3 to 3</u> and show the result in a table**.  

**Example**
```
distance      -3            -2            -1            1             2             3             
--------------------------------------------------------------------------------------------
Skip-gram     -             -             -             is            used          to            
is            -             -             Skip-gram     used          to            predict       
used          -             Skip-gram     is            to            predict       the           
to            Skip-gram     is            used          predict       the           context       
predict       is            used          to            the           context       word          
the           used          to            predict       context       word          for           
context       to            predict       the           word          for           a             
word          predict       the           context       for           a             given         
for           the           context       word          a             given         target        
a             context       word          for           given         target        word          
given         word          for           a             target        word          -             
target        for           a             given         word          -             -             
word          a             given         target        -             -             -             
```

\*Hint: Try to get the skip-grams for a single word first if you have trouble generating them all at once. 
```
(predict, is, -3)
(predict, used, -2)
(predict, to, -1)
(predict, the, 1)
(predict, context, 2)
(predict, word, 3)
```

```python
tokens = "Skip-gram is used to predict the context word for a given target word".split()
token_length = len(tokens)

for idx in range(token_length):
    ...
```

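One possible way to fill in the exercise cell is sketched below. It is not the official solution: `skipgram_table` is a hypothetical helper, and the column width of 14 characters is simply chosen to match the example table above.

```python
def skipgram_table(tokens, max_distance):
    """Return the distances and one row per token: the pivot followed by its
    neighbours at distances -max_distance..-1 and 1..max_distance ("-" if out of range)."""
    distances = [d for d in range(-max_distance, max_distance + 1) if d != 0]
    rows = []
    for idx, pivot in enumerate(tokens):
        neighbours = []
        for d in distances:
            pos = idx + d
            # Check bounds explicitly: a negative index would silently wrap around in Python
            neighbours.append(tokens[pos] if 0 <= pos < len(tokens) else "-")
        rows.append([pivot] + neighbours)
    return distances, rows


tokens = "Skip-gram is used to predict the context word for a given target word".split()
distances, rows = skipgram_table(tokens, 3)

width = 14  # column width used in the example table
print("".join(f"{cell:<{width}}" for cell in ["distance"] + [str(d) for d in distances]))
print("-" * width * (len(distances) + 1))
for row in rows:
    print("".join(f"{cell:<{width}}" for cell in row))
```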

## MapReduce

<i>MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. - from [Wikipedia](https://en.wikipedia.org/wiki/MapReduce)</i> 

### Why MapReduce?

Imagine that you are working on a pretty large dataset, say all the pages on Wikipedia (which had already reached 94GB in 2013).  
Most likely, you won't be able to hold the whole corpus in memory or process it on a single computer. Even a simple frequency counter would be challenging at such a data size.  
To deal with this, Google proposed a big-data processing model called MapReduce, which has since been implemented and supported by many distributed computing systems, such as Apache Hadoop.  
The core concept of MapReduce is to **split, apply and then combine**, so that each data segment can be handled separately.  

### Mapper-Shuffler-Reducer

![](https://www.todaysoftmag.com/images/articles/tsm33/large/a11.png)
<small><i> - image source: [Today Software Magazine](https://www.todaysoftmag.com/article/1358/hadoop-mapreduce-deep-diving-and-tuning)</i></small> 

As you can see in the picture:  
First, the whole dataset is split into smaller partitions, each of which can be processed by an independent machine.  
In this step, **mappers** generate one or more key-value pairs that can easily be grouped.  
 - example: in a word counter, a mapper would emit each word together with its count (1 for each occurrence).  

Then, we **shuffle** and group all outputs from the mappers.  
 - example: sort the output from the mappers so that identical keys end up next to each other.  

Finally, we combine the grouped values and **reduce** them into final results.  
 - example: sum up the counts in each group to get the total frequency of each word.  
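The three stages can be simulated in a single Python process for the word-counter example. This is only a sketch of the data flow: the function names are illustrative, and a real framework such as Hadoop would run the mappers and reducers on different machines and perform the shuffle itself.

```python
from itertools import groupby
from operator import itemgetter


def mapper(document):
    # Map: emit a (word, 1) pair for every word occurrence
    for word in document.split():
        yield (word, 1)


def shuffle(pairs):
    # Shuffle: sorting brings identical keys next to each other
    return sorted(pairs, key=itemgetter(0))


def reducer(sorted_pairs):
    # Reduce: sum the counts within each group of identical keys
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


# Each string stands in for one partition that would live on a separate machine
partitions = ["the cat sat", "the dog sat down"]
mapped = [pair for part in partitions for pair in mapper(part)]
print(list(reducer(shuffle(mapped))))
# [('cat', 1), ('dog', 1), ('down', 1), ('sat', 2), ('the', 2)]
```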

## MapReduce for skip-gram

Now, with the concepts of skip-gram and MapReduce in mind, it's time to put them together: let's generate a skip-gram table with the MapReduce technique!  

It may sound scary to some of you, so let's break it down first.  
There are 3 steps, each described below:
1. **Mapper**: Print every skip-gram with its distance information and its count (always 1 at this stage).  
   ```
   a b -3 1
   a c 3  1
   c e -2 1
   a c 1  1
   a b -3 1
   b d 2  1
   ```
2. **Shuffler**: Group all skip-grams by their text. This can easily be achieved by sorting.  
   ```
   a b -3 1
   a b -3 1
   a c 1  1
   a c 3  1
   b d 2  1
   c e -2 1
   ```
3. **Reducer**:  
   Since the results have been sorted in the previous step, we can easily calculate the frequency of each skip-gram at each distance.  
   For example, the frequency of skip-gram `a b` with distance $-3$ is $1+1=2$, while all the other skip-grams have a frequency of $1$.
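To double-check the arithmetic, here is an in-memory Python sketch that runs the toy records above through the three steps. Note that it groups by `(pivot, word, distance)`, so each output line is the frequency of one skip-gram at one distance; the variable names are our own.

```python
from itertools import groupby

# Mapper output from the example above: (pivot, word, distance, count)
mapped = [
    ("a", "b", -3, 1),
    ("a", "c", 3, 1),
    ("c", "e", -2, 1),
    ("a", "c", 1, 1),
    ("a", "b", -3, 1),
    ("b", "d", 2, 1),
]

# Shuffler: sorting puts identical (pivot, word, distance) records next to each other
shuffled = sorted(mapped)

# Reducer: sum the counts of each run of identical records
reduced = [
    (key, sum(rec[3] for rec in group))
    for key, group in groupby(shuffled, key=lambda rec: rec[:3])
]
for (pivot, word, distance), freq in reduced:
    print(pivot, word, distance, freq)
```

It prints `a b -3 2` first, followed by the other four records, each with a frequency of 1.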

### Step 1: Mapper

First, in the mapper we want to generate all skip-grams with distances from $-5$ to $5$.  
You've already done something similar in the [previous Practice](#Practice), so just modify it to output the MapReduce format!  

Output: 
 - `"{pivot}\t{word}\t{distance}\t{count}"`  

Example: 
```
...
predict	is	-3	1
predict	used	-2	1
predict	to	-1	1
predict	the	1	1
...
```
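A minimal mapper sketch follows. It assumes that each input line holds one page whose words are already separated by whitespace, and the names `emit_skipgrams` and `run_mapper` are hypothetical, not part of the assignment:

```python
import io


def emit_skipgrams(tokens, max_distance=5):
    """Yield one tab-separated "pivot, word, distance, count" line per skip-gram."""
    for idx, pivot in enumerate(tokens):
        for distance in range(-max_distance, max_distance + 1):
            pos = idx + distance
            # Skip the pivot itself (distance 0) and positions outside the page
            if distance == 0 or not 0 <= pos < len(tokens):
                continue
            yield f"{pivot}\t{tokens[pos]}\t{distance}\t1"


def run_mapper(in_stream, out_stream, max_distance=5):
    # Assumption: one page per input line, words separated by whitespace
    for line in in_stream:
        for record in emit_skipgrams(line.split(), max_distance):
            out_stream.write(record + "\n")


# Quick check on a tiny in-memory "page"
demo_out = io.StringIO()
run_mapper(io.StringIO("a b c\n"), demo_out)
print(demo_out.getvalue(), end="")
```

In an actual `mapper.py`, you would end the script with `run_mapper(sys.stdin, sys.stdout)` (after `import sys`) so that it reads from standard input and writes to standard output.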



```python
import os
```

```python
with open(os.path.join('data', 'wiki1G.txt')) as f:
    for line in f:
        ...
        break  # for the sake of this practice, just test the first page now
```

```
anarchism	anarchism	1	1
anarchism	is	2	1
anarchism	a	3	1
anarchism	political	4	1
anarchism	philosophy	5	1
anarchism	anarchism	-1	1
anarchism	is	1	1
anarchism	a	2	1
anarchism	political	3	1
anarchism	philosophy	4	1
anarchism	and	5	1
is	anarchism	-2	1
is	anarchism	-1	1
is	a	1	1
is	political	2	1
is	philosophy	3	1
is	and	4	1
is	movement	5	1
a	anarchism	-3	1
a	anarchism	-2	1
a	is	-1	1
a	political	1	1
a	philosophy	2	1
a	and	3	1
a	movement	4	1
a	that	5	1
political	anarchism	-4	1
political	anarchism	-3	1
political	is	-2	1
political	a	-1	1
political	philosophy	1	1
political	and	2	1
political	movement	3	1
political	that	4	1
political	is	5	1
philosophy	anarchism	-5	1
philosophy	anarchism	-4	1
philosophy	is	-3	1
philosophy	a	-2	1
philosophy	political	-1	1
philosophy	and	1	1
philosophy	movement	2	1
philosophy	that	3	1
philosophy	is	4	1
philosophy	sceptical	5	1
and	anarchism	-5	1
and	is	-4	1
and	a	-3	1
and	political	-2	1
and	philosophy	-1	1
and	movement	1	1
and	that	2	1
and	is	3	1
and	sceptical	4	1
...
```

### Step 2: Shuffler

All the shuffler needs to do is sort, so let's use the built-in `sort` command to do it for us!  

Try this in your terminal/command prompt ;)  
(You can get the sample input from [here](https://drive.google.com/drive/folders/1vKxr--sLd2J4kdsXUzJDBZdG3AmV4NGl?usp=sharing))

**Unix**  
```bash
sort -k1,3 < mapper.sample.tsv
```
**Windows**
```powershell
type mapper.sample.tsv | sort
```
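For small inputs, you can mimic what the shuffler does in plain Python. This is only an illustration, and `shuffle` is a hypothetical helper name. It loads every line into memory, unlike `sort`, which can spill to temporary files on large inputs. It orders lines by pivot and word as strings and by distance as an integer, roughly like `sort -k1,2 -k3n`. The only property the reducer relies on is that identical `(pivot, word)` pairs end up next to each other.

```python
def shuffle(lines):
    """Sort mapper output so identical (pivot, word) pairs become adjacent.

    Pivot and word are compared as strings and distance as an integer,
    roughly like `sort -k1,2 -k3n`. Everything is held in memory, so this
    is only meant for small samples such as mapper.sample.tsv.
    """
    def key(line):
        pivot, word, distance, _count = line.rstrip("\n").split("\t")
        return (pivot, word, int(distance))

    return sorted(lines, key=key)


if __name__ == "__main__":
    sample = [
        "predict\tthe\t1\t1\n",
        "predict\tis\t-3\t1\n",
        "predict\tto\t-1\t1\n",
        "predict\tis\t2\t1\n",
    ]
    for line in shuffle(sample):
        print(line, end="")
```

Both `predict	is` lines come out next to each other, ordered by distance (-3 before 2).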

### Step 3: Reducer

Since the shuffler has already sorted the input, the reducer's task is pretty simple: count how many times each skip-gram appears at each distance, then print the counts out!

Input: 
 - `"{pivot}\t{word}\t{distance}\t{count}"`
 - You can get a sample input file `shuffler.sample.tsv` from [here](https://drive.google.com/drive/folders/1vKxr--sLd2J4kdsXUzJDBZdG3AmV4NGl?usp=sharing)

Output: 
 - `"{pivot}\t{word}\t{total_freq}\t{-5}\t{-4}\t{-3}\t{-2}\t{-1}\t{1}\t{2}\t{3}\t{4}\t{5}"`
 - The first two columns are the skip-gram; the third column is the total frequency; columns 4\~13 are the frequencies at distances -5\~5, excluding 0.

Example:
 - `arouse  open    4       0       0       3       0       0       0       0       0       0       1`

Hints: 
1. Parse the input from shuffler
2. Check if this is the same skipgram as the previous one
3. If so, add the frequency according to its distance
4. If not, output the previous skipgram data
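One possible way to turn these four hints into code is the sketch below. It is not the reference solution, and the names `reduce_lines`, `format_row` and `DISTANCES` are hypothetical. It assumes tab-separated, already-sorted input. The only state kept between lines is the key of the current skip-gram and its counts (at most 10 entries), so memory use stays constant no matter how large the input is.

```python
import sys

# Output column order after the total: distances -5..-1 and 1..5 (no 0)
DISTANCES = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]


def format_row(pivot, word, counts):
    """Build one output line: pivot, word, total, then one count per distance."""
    freqs = [counts.get(d, 0) for d in DISTANCES]
    fields = [pivot, word, str(sum(freqs))] + [str(f) for f in freqs]
    return "\t".join(fields)


def reduce_lines(lines):
    """Yield one table row per (pivot, word), assuming `lines` are sorted."""
    prev_key = None
    counts = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # 1) Parse the input from the shuffler
        pivot, word, distance, count = line.rstrip("\n").split("\t")
        key = (pivot, word)
        # 2) + 4) A different skip-gram: output the previous one and reset
        if key != prev_key:
            if prev_key is not None:
                yield format_row(*prev_key, counts)
            prev_key, counts = key, {}
        # 3) Add the frequency according to its distance
        d = int(distance)
        counts[d] = counts.get(d, 0) + int(count)
    # The last skip-gram has no successor to trigger its output
    if prev_key is not None:
        yield format_row(*prev_key, counts)


def main():
    for row in reduce_lines(sys.stdin):
        print(row)


if __name__ == "__main__":
    main()
```

Because `reduce_lines` is a generator, each row is printed as soon as its skip-gram is complete, instead of being collected first. For example, three `arouse	open	-3	1` lines followed by one `arouse	open	5	1` line produce the `arouse  open  4 ...` row shown in the example above.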

Note that you should NOT store all your counting results in a dict or any other data structure.  
Recall that one purpose of MapReduce is to avoid running out of memory; it loses its value if you end up storing everything in memory again.  
Instead, <u>print each result out directly or write it to a file</u>.  
(Don't get me wrong: of course you can store some temporary data, but don't store the whole result and then print it all out at once, okay?)


```python
import os

with open(os.path.join('data', 'shuffler.sample.tsv')) as f:
    for line in f:
        # 1) Parse the input from shuffler
        # 2) Check if this is the same skipgram
        # 3) If so, add the frequency according to its distance
        # 4) If not, output the previous skipgram data
        ...
```

1539	a	1	0	0	0	0	0	0	0	0	0	1
1539	anarchisme	1	0	1	0	0	0	0	0	0	0	0
1539	anarchy	1	0	0	0	1	0	0	0	0	0	0
1539	and	1	0	0	1	0	0	0	0	0	0	0
1539	as	1	1	0	0	0	0	0	0	0	0	0
1539	early	1	0	0	0	0	0	1	0	0	0	0
1539	empahised	1	0	0	0	0	0	0	0	0	1	0
1539	english	1	0	0	0	0	0	0	1	0	0	0
1539	from	1	0	0	0	0	1	0	0	0	0	0
1539	usages	1	0	0	0	0	0	0	0	1	0	0
1642	anarchism	1	1	0	0	0	0	0	0	0	0	0
1642	anarchisme	1	0	0	0	0	0	0	1	0	0	0
1642	anarchy	1	0	0	0	0	0	0	0	0	1	0
1642	and	1	0	0	0	0	0	0	0	1	0	0
1642	appears	1	0	1	0	0	0	0	0	0	0	0
1642	as	1	0	0	0	0	0	1	0	0	0	0
1642	english	1	0	0	0	1	0	0	0	0	0	0
1642	from	2	0	0	0	0	1	0	0	0	0	1
1642	in	1	0	0	1	0	0	0	0	0	0	0
1756	1808	1	0	0	0	0	0	0	0	0	0	1
1756	1836	1	0	0	0	0	0	1	0	0	0	0
1756	and	1	0	0	0	0	0	0	1	0	0	0
1756	as	1	0	0	1	0	0	0	0	0	0	0
1756	century	1	1	0	0	0	0	0	0	0	0	0
1756	godwin	1	0	0	0	0	1	0	0	0	0	0
1756	such	1	0	1	0	0	0	0	0	0	0	0
1756	weitling	1	0	0	0	0	0	0	0	0	1	0
1756	wilhelm	1	0	0	0	0	0	0	0	1	0	0
1756	william	1	0	0	0	1	0	0	0	0	0	0
1808	1756	1	1	0	0	0	0	0	0	0	0	0

although	common	1	0	0	0	1	0	0	0	0	0	0
although	consensus	1	0	0	0	0	1	0	0	0	0	0
although	contemporary	1	0	0	0	0	0	1	0	0	0	0
although	engage	1	1	0	0	0	0	0	0	0	0	0
although	explicitly	1	0	0	0	0	0	0	1	0	0	0
although	favours	1	0	0	0	0	0	0	0	1	0	0
although	few	1	0	0	0	0	0	1	0	0	0	0
although	governed	1	0	0	1	0	0	0	0	0	0	0
although	gradual	1	1	0	0	0	0	0	0	0	0	0
although	higher	1	0	0	0	0	0	0	0	0	0	1
although	highly	1	0	0	0	0	0	0	0	0	0	1
although	horizontalism	1	0	0	0	0	1	0	0	0	0	0
although	in	1	0	0	1	0	0	0	0	0	0	0
although	is	2	0	1	0	0	0	0	1	0	0	0
although	it	2	1	0	0	0	0	1	0	0	0	0
although	labelled	1	1	0	0	0	0	0	0	0	0	0
although	less	1	0	1	0	0	0	0	0	0	0	0
although	many	1	0	0	0	0	0	0	0	0	0	1
although	movement	1	0	0	0	0	1	0	0	0	0	0
although	not	1	0	0	0	0	0	1	0	0	0	0
although	on	1	0	0	0	1	0	0	0	0	0	0
although	opponents	1	0	0	1	0	0	0	0	0	0	0
although	or	1	1	0	0	0	0	0	0	0	0	0
although	over	1	0	0	0	0	0	0	0	0	0	1
although	overlap	1	0	0	0	0	0	0	0	0	1	0
although	personal	1	0	0	0	0	0	0	0	0	0	1
although	p

and	humanism	1	0	0	0	1	0	0	0	0	0	0
and	humans	3	2	0	0	0	0	0	0	1	0	0
and	i	2	0	0	1	0	0	0	0	0	1	0
and	ideal	3	0	0	0	1	1	0	0	0	0	1
and	idealistic	1	0	0	0	0	0	0	1	0	0	0
and	ideals	1	0	0	0	0	0	0	0	0	0	1
and	ideas	3	0	0	0	0	1	0	1	0	1	0
and	identity	1	0	0	0	0	1	0	0	0	0	0
and	ideological	1	0	0	0	0	0	0	0	0	1	0
and	ideologically	3	0	1	0	0	0	1	0	0	0	1
and	ideology	1	0	0	0	0	1	0	0	0	0	0
and	if	2	0	1	0	0	0	0	1	0	0	0
and	illegalism	1	0	0	0	0	1	0	0	0	0	0
and	illegitimate	1	0	0	0	0	0	0	0	0	0	1
and	illustrating	1	1	0	0	0	0	0	0	0	0	0
and	imperialism	1	0	0	0	0	0	1	0	0	0	0
and	implemented	1	0	0	0	0	0	0	0	0	1	0
and	important	2	1	0	0	0	0	0	0	0	0	1
and	in	47	3	7	3	6	0	1	8	7	6	6
and	inceptive	1	0	0	0	0	0	0	1	0	0	0
and	include	2	2	0	0	0	0	0	0	0	0	0
and	included	2	0	0	0	0	0	2	0	0	0	0
and	including	3	2	0	0	0	0	0	0	1	0	0
and	inconsistent	1	0	0	0	1	0	0	0	0	0	0
and	individual	5	0	0	0	1	3	1	0	0	0	0
and	individualism	3	0	0	0	1	1	1	0	0	0	0
and	individualist	5	0	0	1	0	1	1	0	1	1	0
and	individuality	2	0	0	0	0	1	1	0	0	0	0

authoritiy	inceptive	1	0	0	0	0	0	0	0	1	0	0
authoritiy	justified	1	0	0	0	0	1	0	0	0	0	0
authoritiy	network	1	1	0	0	0	0	0	0	0	0	0
authoritiy	organisation	1	0	1	0	0	0	0	0	0	0	0
authority	a	4	1	0	1	0	0	0	1	0	0	1
authority	acceptance	1	0	0	0	1	0	0	0	0	0	0
authority	acknowledging	1	0	0	0	0	1	0	0	0	0	0
authority	advantages	1	0	0	1	0	0	0	0	0	0	0
authority	against	1	0	0	0	1	0	0	0	0	0	0
authority	aim	1	0	1	0	0	0	0	0	0	0	0
authority	all	3	0	0	0	0	0	0	1	1	0	1
authority	also	1	0	0	0	0	0	1	0	0	0	0
authority	an	2	0	0	0	1	0	0	0	0	0	1
authority	anarchism	2	0	1	0	0	0	0	0	0	0	1
authority	anarchistic	1	0	0	0	0	0	0	0	0	1	0
authority	and	6	1	0	0	0	0	4	0	1	0	0
authority	another	1	0	0	0	0	0	0	0	0	0	1
authority	are	1	1	0	0	0	0	0	0	0	0	0
authority	argues	1	1	0	0	0	0	0	0	0	0	0
authority	articulated	1	0	0	0	0	0	0	0	1	0	0
authority	as	2	0	0	0	0	0	1	1	0	0	0
authority	autonomy	1	0	0	0	0	0	0	1	0	0	0
authority	be	1	1	0	0	0	0	0	0	0	0	0
authority	because	2	0	1	0	0	0	0	0	0	1	0
authority	belief	1	0	0	0	0	0	0	0	1	0	0
author

character	a	1	0	0	0	1	0	0	0	0	0	0
character	all	1	0	0	0	0	0	0	1	0	0	0
character	because	1	0	0	0	0	0	1	0	0	0	0
character	has	1	0	0	1	0	0	0	0	0	0	0
character	have	1	0	0	0	0	0	0	0	0	0	1
character	individuals	1	0	0	0	0	0	0	0	1	0	0
character	it	1	0	1	0	0	0	0	0	0	0	0
character	should	1	0	0	0	0	0	0	0	0	1	0
character	that	1	1	0	0	0	0	0	0	0	0	0
character	utopian	1	0	0	0	0	1	0	0	0	0	0
characterised	1840	1	0	0	1	0	0	0	0	0	0	0
characterised	a	1	0	0	0	0	0	0	0	0	1	0
characterised	as	2	0	0	0	0	0	1	0	1	0	0
characterised	been	1	0	0	0	1	0	0	0	0	0	0
characterised	between	1	0	0	0	0	0	0	0	0	1	0
characterised	first	1	0	0	0	0	1	0	0	0	0	0
characterised	goal	1	0	0	0	0	0	0	1	0	0	0
characterised	has	1	0	0	1	0	0	0	0	0	0	0
characterised	his	1	0	0	0	0	0	1	0	0	0	0
characterised	ideologically	1	0	0	0	0	0	0	1	0	0	0
characterised	individualist	1	0	0	0	0	0	0	0	0	0	1
characterised	is	1	1	0	0	0	0	0	0	0	0	0
characterised	mutualism	1	0	1	0	0	0	0	0	0	0	0
characterised	people	1	1	0	0	0	0	0	0	0	0	0
characterised	property	1	0	1

considered	golden	1	0	0	0	0	0	0	1	0	0	0
considered	is	1	0	0	0	0	1	0	0	0	0	0
considered	of	1	0	0	0	0	0	0	0	0	1	0
considered	spanish	1	1	0	0	0	0	0	0	0	0	0
considered	the	1	0	0	0	0	0	1	0	0	0	0
considered	war	1	0	0	1	0	0	0	0	0	0	0
constant	after	1	0	0	0	0	1	0	0	0	0	0
constant	and	1	0	0	0	0	0	0	0	0	0	1
constant	by	1	0	0	0	0	0	0	1	0	0	0
constant	closed	1	0	0	0	1	0	0	0	0	0	0
constant	harassment	1	0	0	0	0	0	1	0	0	0	0
constant	school	1	0	0	1	0	0	0	0	0	0	0
constant	state	1	0	0	0	0	0	0	0	0	1	0
constant	students	1	1	0	0	0	0	0	0	0	0	0
constant	the	2	0	1	0	0	0	0	0	1	0	0
constantly	and	1	0	0	0	0	0	1	0	0	0	0
constantly	athenian	1	0	0	0	1	0	0	0	0	0	0
constantly	authorities	1	0	0	0	0	1	0	0	0	0	0
constantly	autonomy	1	1	0	0	0	0	0	0	0	0	0
constantly	insisted	1	0	0	0	0	0	0	1	0	0	0
constantly	on	1	0	0	0	0	0	0	0	1	0	0
constantly	questioned	1	0	0	1	0	0	0	0	0	0	0
constantly	right	1	0	0	0	0	0	0	0	0	0	1
constantly	socrates	1	0	1	0	0	0	0	0	0	0	0
constantly	the	1	0	0	0	0	0	0	0	0	1	0
constellations	activists	1	1	0	

dynamics	anarchism	1	0	0	0	0	0	0	0	1	0	0
dynamics	and	1	1	0	0	0	0	0	0	0	0	0
dynamics	autonomy	1	0	1	0	0	0	0	0	0	0	0
dynamics	because	1	0	0	1	0	0	0	0	0	0	0
dynamics	carry	1	0	0	1	0	0	0	0	0	0	0
dynamics	gender	1	0	0	0	0	0	0	1	0	0	0
dynamics	hierarchy	1	0	0	0	0	0	0	1	0	0	0
dynamics	impose	1	0	0	0	0	0	0	0	0	0	1
dynamics	is	1	0	0	0	0	0	0	0	0	1	0
dynamics	obliged	1	0	0	0	0	0	0	0	0	0	1
dynamics	of	2	0	0	0	1	0	1	0	0	0	0
dynamics	roles	1	0	0	0	0	0	0	0	1	0	0
dynamics	s	1	1	0	0	0	0	0	0	0	0	0
dynamics	sexuality	1	0	1	0	0	0	0	0	0	0	0
dynamics	that	1	0	0	0	0	0	1	0	0	0	0
dynamics	the	1	0	0	0	0	1	0	0	0	0	0
dynamics	them	1	0	0	0	0	1	0	0	0	0	0
dynamics	traditionally	1	0	0	0	0	0	0	0	0	1	0
e	an	1	0	0	0	1	0	0	0	0	0	0
e	anarchism	1	0	0	0	1	0	0	0	0	0	0
e	anarcho	1	0	0	0	0	0	0	0	1	0	0
e	and	4	2	0	0	0	0	0	1	0	0	1
e	arkhos	2	0	0	0	1	0	0	0	0	0	1
e	as	1	0	0	1	0	0	0	0	0	0	0
e	at	1	0	0	0	0	0	1	0	0	0	0
e	autocratic	1	0	0	0	1	0	0	0	0	0	0
e	bolshevism	1	0	0	0	0	0	0	0	1	0	0
e	but	1	0	0	0	0	0	0	0	1	0	0
e	capitalism	1	0	

finds	than	1	0	0	1	0	0	0	0	0	0	0
finds	that	1	0	0	0	0	0	0	0	0	1	0
finds	the	1	0	0	0	0	0	0	0	0	1	0
finds	to	1	1	0	0	0	0	0	0	0	0	0
finds	which	1	0	0	0	1	0	0	0	0	0	0
finds	while	1	0	0	0	0	0	0	0	0	0	1
first	1840	1	0	0	0	1	0	0	0	0	0	0
first	1864	1	0	0	0	0	0	0	0	0	0	1
first	1886	1	0	0	0	0	0	0	0	0	1	0
first	19th	1	0	0	1	0	0	0	0	0	0	0
first	20th	1	0	0	0	0	0	0	0	0	1	0
first	a	1	0	0	0	0	0	0	0	0	0	1
first	adopted	1	0	0	0	0	0	1	0	0	0	0
first	among	1	0	0	0	1	0	0	0	0	0	0
first	an	1	0	0	0	0	0	0	0	1	0	0
first	anarchism	1	0	0	0	0	0	0	0	0	1	0
first	anarchist	1	0	0	0	0	0	1	0	0	0	0
first	anarchists	3	0	0	0	0	1	0	1	0	0	1
first	and	4	0	2	0	1	0	0	0	0	1	0
first	article	1	0	0	0	0	0	0	0	0	0	1
first	articulated	1	0	0	0	0	0	1	0	0	0	0
first	as	4	0	0	0	1	1	0	1	0	1	0
first	authority	1	0	0	0	1	0	0	0	0	0	0
first	back	1	0	0	1	0	0	0	0	0	0	0
first	beliefs	1	0	0	0	1	0	0	0	0	0	0
first	biological	1	1	0	0	0	0	0	0	0	0	0
first	but	1	0	0	1	0	0	0	0	0	0	0
first	by	1	0	0	0	0	0	0	1	0	0	0
first	call	1	0	0	0	0	0	0	0	0	1	0
first	centu

higher	priority	1	0	0	0	0	0	1	0	0	0	0
higher	ranks	1	0	0	1	0	0	0	0	0	0	0
higher	than	1	0	0	0	0	0	0	0	0	0	1
higher	to	1	0	0	0	0	0	0	1	0	0	0
highly	a	1	0	0	0	0	0	0	0	0	0	1
highly	although	1	1	0	0	0	0	0	0	0	0	0
highly	as	1	0	0	0	0	1	0	0	0	0	0
highly	at	1	0	0	0	0	1	0	0	0	0	0
highly	beliefs	1	0	0	0	0	0	0	1	0	0	0
highly	events	1	0	0	0	0	0	0	1	0	0	0
highly	forms	1	0	1	0	0	0	0	0	0	0	0
highly	how	1	0	0	0	0	0	0	0	0	1	0
highly	make	1	0	0	0	0	0	0	0	1	0	0
highly	of	1	0	0	1	0	0	0	0	0	0	0
highly	on	1	0	0	0	0	0	0	0	1	0	0
highly	protesting	1	0	0	0	1	0	0	0	0	0	0
highly	saw	1	0	0	1	0	0	0	0	0	0	0
highly	symbolic	1	0	0	0	0	0	1	0	0	0	0
highly	they	1	0	1	0	0	0	0	0	0	0	0
highly	this	1	0	0	0	1	0	0	0	0	0	0
highly	to	1	0	0	0	0	0	0	0	0	0	1
highly	unlikely	1	0	0	0	0	0	1	0	0	0	0
highly	up	1	0	0	0	0	0	0	0	0	1	0
highly	various	1	1	0	0	0	0	0	0	0	0	0
himself	an	1	0	0	0	0	0	1	0	0	0	0
himself	anarchist	1	0	0	0	0	0	0	1	0	0	0
himself	call	1	0	0	0	0	1	0	0	0	0	0
himself	first	1	1	0	0	0	0	0	0	0	0	0
himself	joseph	1	0	0	0	0	0	

isolation	the	1	0	0	0	0	0	0	0	0	1	0
isolation	which	1	0	0	0	0	0	0	1	0	0	0
isolation	world	1	0	0	0	1	0	0	0	0	0	0
it	1960s	1	1	0	0	0	0	0	0	0	0	0
it	19th	1	0	0	0	0	0	0	0	0	0	1
it	a	12	0	2	0	0	0	1	5	3	0	1
it	ability	2	0	0	0	0	0	0	0	1	1	0
it	abolition	1	1	0	0	0	0	0	0	0	0	0
it	advocates	1	0	0	0	0	0	1	0	0	0	0
it	after	2	1	0	0	0	0	0	1	0	0	0
it	alienation	1	1	0	0	0	0	0	0	0	0	0
it	all	1	0	1	0	0	0	0	0	0	0	0
it	alongside	2	0	0	1	0	0	0	0	0	1	0
it	also	4	1	0	1	0	0	2	0	0	0	0
it	although	2	0	0	0	0	1	0	0	0	0	1
it	an	1	0	0	0	0	0	0	0	1	0	0
it	anarchism	7	3	1	2	0	0	0	0	0	0	1
it	anarchist	2	1	1	0	0	0	0	0	0	0	0
it	anarchists	2	0	0	1	0	1	0	0	0	0	0
it	anarcho	2	0	0	1	0	0	0	0	1	0	0
it	and	11	0	1	1	3	1	0	0	1	0	4
it	antipathetic	1	1	0	0	0	0	0	0	0	0	0
it	apart	1	0	0	0	0	0	1	0	0	0	0
it	appeals	1	0	0	0	0	0	1	0	0	0	0
it	argues	2	0	0	0	1	0	1	0	0	0	0
it	argument	1	0	0	0	0	0	0	0	1	0	0
it	arguments	1	0	0	0	0	0	0	0	0	0	1
it	arose	1	0	1	0	0	0	0	0	0	0	0
it	art	1	0	1	0	0	0	0	0	0	0	0
it	as	10	0	3	0	0	2	3	1	0	1	0
it	asininity

method	indoctrination	1	0	1	0	0	0	0	0	0	0	0
method	mainstream	1	0	0	0	0	0	0	0	0	1	0
method	spread	1	0	0	0	0	0	0	1	0	0	0
method	teaching	1	0	0	0	0	1	0	0	0	0	0
method	than	1	1	0	0	0	0	0	0	0	0	0
methods	a	1	0	0	0	0	0	0	0	0	0	1
methods	anarchist	1	1	0	0	0	0	0	0	0	0	0
methods	anarchists	1	0	1	0	0	0	0	0	0	0	0
methods	build	1	0	0	0	0	0	0	0	0	1	0
methods	by	1	0	0	0	0	0	1	0	0	0	0
methods	disagree	1	0	0	1	0	0	0	0	0	0	0
methods	employed	1	0	0	0	1	0	0	0	0	0	0
methods	forms	1	0	0	0	0	0	0	0	0	1	0
methods	have	1	0	0	1	0	0	0	0	0	0	0
methods	in	1	0	0	0	0	0	1	0	0	0	0
methods	on	1	0	0	0	1	0	0	0	0	0	0
methods	order	1	0	0	0	0	0	0	1	0	0	0
methods	role	1	1	0	0	0	0	0	0	0	0	0
methods	schools	1	0	1	0	0	0	0	0	0	0	0
methods	should	1	0	0	0	0	0	0	0	0	0	1
methods	the	1	0	0	0	0	1	0	0	0	0	0
methods	these	1	0	0	0	0	0	0	0	1	0	0
methods	to	1	0	0	0	0	0	0	0	1	0	0
methods	various	1	0	0	0	0	1	0	0	0	0	0
methods	which	1	0	0	0	0	0	0	1	0	0	0
mexico	and	2	0	0	0	0	1	1	0	0	0	0
mexico	as	1	0	1	0	0	0	0	0	0	0	0
mexico	black	1	0	0	0	0	

of	wave	4	0	0	0	0	4	0	0	0	0	0
of	way	1	0	0	0	0	1	0	0	0	0	0
of	websites	1	0	0	0	1	0	0	0	0	0	0
of	well	4	2	0	0	0	0	0	0	1	1	0
of	were	13	2	0	1	3	0	0	2	1	2	2
of	what	3	0	0	0	0	0	1	0	1	1	0
of	where	3	0	0	1	0	0	0	0	2	0	0
of	whether	1	0	0	0	0	0	1	0	0	0	0
of	which	14	3	2	1	0	0	1	1	4	1	1
of	while	1	0	0	0	0	0	0	1	0	0	0
of	whilst	2	1	1	0	0	0	0	0	0	0	0
of	who	2	0	0	2	0	0	0	0	0	0	0
of	whom	1	0	0	0	0	0	1	0	0	0	0
of	whose	3	1	0	0	0	0	0	0	1	0	1
of	widely	2	0	0	1	0	0	0	0	1	0	0
of	wider	3	0	1	0	1	0	0	1	0	0	0
of	will	6	0	1	0	0	2	0	0	2	1	0
of	william	1	0	0	0	0	0	0	0	1	0	0
of	wing	4	0	0	1	0	2	0	1	0	0	0
of	with	14	2	2	6	0	0	0	0	0	3	1
of	within	4	0	1	0	0	0	0	1	0	1	1
of	without	5	1	1	2	0	0	0	0	1	0	0
of	witnessed	1	0	0	1	0	0	0	0	0	0	0
of	women	2	0	0	0	0	0	1	0	0	0	1
of	work	2	0	0	0	0	1	0	0	0	0	1
of	workers	6	0	1	0	0	1	1	0	2	0	1
of	world	7	0	0	0	0	2	1	2	0	1	1
of	would	1	0	0	0	0	0	0	0	0	0	1
of	writers	1	0	0	0	0	0	0	0	0	1	0
of	wrote	1	0	0	0	0	0	0	0	0	1	0
of	wto	1	0	0	1	0	0	0	0	0	0	0
of	yet	1	0	0	1	0	0	0	0	0	0	0
of

possible	along	1	0	0	0	0	0	0	0	1	0	0
possible	and	3	1	0	0	0	0	0	0	2	0	0
possible	any	1	0	0	0	1	0	0	0	0	0	0
possible	but	1	0	0	0	0	0	1	0	0	0	0
possible	by	1	0	0	1	0	0	0	0	0	0	0
possible	camouflaged	1	0	0	0	1	0	0	0	0	0	0
possible	capitalism	1	0	0	0	0	0	0	1	0	0	0
possible	coercion	1	0	0	1	0	0	0	0	0	0	0
possible	emma	1	0	0	0	0	0	1	0	0	0	0
possible	errico	1	0	0	0	0	0	0	0	0	1	0
possible	goldman	1	0	0	0	0	0	0	1	0	0	0
possible	if	2	0	0	0	0	1	0	0	0	0	1
possible	its	1	1	0	0	0	0	0	0	0	0	0
possible	justice	1	0	0	1	0	0	0	0	0	0	0
possible	malatesta	1	0	0	0	0	0	0	0	0	0	1
possible	means	2	1	0	0	0	1	0	0	0	0	0
possible	not	1	0	0	0	0	1	0	0	0	0	0
possible	of	1	0	1	0	0	0	0	0	0	0	0
possible	oppression	1	0	1	0	0	0	0	0	0	0	0
possible	other	1	0	0	0	0	0	0	0	0	0	1
possible	overt	1	0	0	0	0	0	0	0	0	1	0
possible	social	1	0	1	0	0	0	0	0	0	0	0
possible	unceremonious	1	0	0	0	0	0	0	1	0	0	0
possible	under	1	0	0	0	0	0	1	0	0	0	0
possible	were	1	0	0	0	1	0	0	0	0	0	0
possible	with	1	0	0	0	0	0	0	0	0	1	0
post	19th	1	0	1	0	0	0	0

say	an	1	0	0	0	0	0	0	0	0	1	0
say	anarchism	1	0	0	0	0	0	0	1	0	0	0
say	approach	1	0	0	0	0	0	0	0	0	0	1
say	be	1	0	0	1	0	0	0	0	0	0	0
say	cluster	1	0	0	0	0	0	0	0	0	0	1
say	decision	1	0	0	0	0	0	0	0	1	0	0
say	each	1	0	0	0	0	0	0	1	0	0	0
say	equal	1	0	0	0	0	1	0	0	0	0	0
say	everyone	1	0	0	1	0	0	0	0	0	0	0
say	having	1	0	0	0	1	0	0	0	0	0	0
say	in	1	0	0	0	0	0	1	0	0	0	0
say	is	1	0	0	0	0	0	0	0	1	0	0
say	it	1	1	0	0	0	0	0	0	0	0	0
say	might	1	0	1	0	0	0	0	0	0	0	0
say	that	1	0	0	0	0	0	1	0	0	0	0
say	to	1	0	0	0	0	1	0	0	0	0	0
say	true	1	0	0	0	1	0	0	0	0	0	0
say	way	1	1	0	0	0	0	0	0	0	0	0
say	with	1	0	1	0	0	0	0	0	0	0	0
scenes	anarchists	1	0	0	0	0	0	0	0	0	1	0
scenes	as	1	0	0	0	0	0	0	1	0	0	0
scenes	associated	1	0	0	1	0	0	0	0	0	0	0
scenes	been	1	0	1	0	0	0	0	0	0	0	0
scenes	has	1	1	0	0	0	0	0	0	0	0	0
scenes	music	1	0	0	0	0	1	0	0	0	0	0
scenes	punk	1	0	0	0	0	0	0	0	1	0	0
scenes	such	2	0	0	0	0	0	1	0	0	0	1
scenes	with	1	0	0	0	1	0	0	0	0	0	0
sceptical	all	1	0	0	0	0	0	0	0	0	0	1
sceptical	and	2	0	1	0	0	0	0	0	1	0	0
sceptical	au

such	william	2	0	0	0	0	0	0	2	0	0	0
such	wing	1	0	1	0	0	0	0	0	0	0	0
such	with	3	0	0	1	1	0	0	0	0	0	1
such	writers	1	0	0	0	0	1	0	0	0	0	0
such	wrote	1	0	0	0	0	0	0	0	1	0	0
suffix	current	1	0	0	0	0	0	0	0	0	0	1
suffix	denotes	1	0	0	0	0	0	0	1	0	0	0
suffix	e	1	1	0	0	0	0	0	0	0	0	0
suffix	ideological	1	0	0	0	0	0	0	0	0	1	0
suffix	ism	1	0	0	0	0	0	1	0	0	0	0
suffix	leader	1	0	1	0	0	0	0	0	0	0	0
suffix	or	1	0	0	1	0	0	0	0	0	0	0
suffix	ruler	1	0	0	0	1	0	0	0	0	0	0
suffix	the	2	0	0	0	0	1	0	0	1	0	0
suffrage	anarchist	1	0	1	0	0	0	0	0	0	0	0
suffrage	but	1	0	0	0	0	0	1	0	0	0	0
suffrage	differed	1	0	0	0	1	0	0	0	0	0	0
suffrage	feminists	1	0	0	1	0	0	0	0	0	0	0
suffrage	non	1	1	0	0	0	0	0	0	0	0	0
suffrage	nonetheless	1	0	0	0	0	0	0	0	0	1	0
suffrage	on	1	0	0	0	0	1	0	0	0	0	0
suffrage	supportive	1	0	0	0	0	0	0	0	0	0	1
suffrage	they	1	0	0	0	0	0	0	1	0	0	0
suffrage	were	1	0	0	0	0	0	0	0	1	0	0
suggesting	a	1	0	0	0	0	0	1	0	0	0	0
suggesting	an	1	0	0	0	0	0	0	0	0	0	1
suggesting	feminist	1	0	0	0	1	0	0	0	0	0	0
suggesting	from	1	1	0	

theory	anarcho	1	0	1	0	0	0	0	0	0	0	0
theory	and	2	1	0	0	0	0	0	0	0	0	1
theory	at	1	0	0	0	0	0	1	0	0	0	0
theory	belief	1	0	0	0	0	0	0	0	0	1	0
theory	bitter	1	0	1	0	0	0	0	0	0	0	0
theory	by	1	0	0	0	0	0	1	0	0	0	0
theory	capitalism	1	0	0	0	0	0	0	0	0	1	0
theory	century	1	0	0	0	1	0	0	0	0	0	0
theory	communism	1	0	0	1	0	0	0	0	0	0	0
theory	compatibility	1	0	0	0	0	0	0	1	0	0	0
theory	concerns	1	0	0	1	0	0	0	0	0	0	0
theory	criticism	1	0	0	0	0	0	0	0	0	0	1
theory	debates	1	0	0	1	0	0	0	0	0	0	0
theory	definitional	1	0	1	0	0	0	0	0	0	0	0
theory	developed	2	0	0	1	0	0	0	0	1	0	0
theory	economic	1	0	0	0	0	1	0	0	0	0	0
theory	favours	1	0	1	0	0	0	0	0	0	0	0
theory	fertile	1	0	0	0	0	0	0	0	0	1	0
theory	found	1	0	0	0	0	0	0	0	1	0	0
theory	groups	1	0	0	0	0	0	0	0	1	0	0
theory	in	3	0	0	0	1	0	1	0	0	0	1
theory	into	3	0	1	0	1	0	0	0	0	1	0
theory	is	2	1	0	0	1	0	0	0	0	0	0
theory	its	1	0	0	0	0	0	0	0	0	0	1
theory	joseph	2	0	0	1	0	0	0	0	1	0	0
theory	line	1	0	0	0	0	0	0	0	1	0	0
theory	mutualism	1	0	0	0	0	0	0	1	0	0	0
theory	nationalis

view	a	1	0	0	0	0	0	0	1	0	0	0
view	according	1	0	1	0	0	0	0	0	0	0	0
view	anarchists	1	0	0	0	0	0	0	1	0	0	0
view	become	1	0	0	1	0	0	0	0	0	0	0
view	by	1	0	0	0	0	0	0	0	1	0	0
view	dominating	1	0	0	0	0	1	0	0	0	0	0
view	go	1	1	0	0	0	0	0	0	0	0	0
view	idea	1	0	0	0	0	0	0	0	0	1	0
view	late	1	0	0	0	0	0	0	0	0	0	1
view	marxist	1	0	0	0	0	1	0	0	0	0	0
view	of	1	0	0	0	0	0	1	0	0	0	0
view	onto	1	0	1	0	0	0	0	0	0	0	0
view	social	1	0	0	0	0	0	0	0	1	0	0
view	that	1	0	0	0	0	0	1	0	0	0	0
view	the	3	0	0	0	2	0	0	0	0	1	0
view	to	1	0	0	1	0	0	0	0	0	0	0
view	values	1	1	0	0	0	0	0	0	0	0	0
view	would	1	0	0	0	0	0	0	0	0	0	1
views	a	2	1	0	0	0	0	0	0	0	1	0
views	according	1	0	0	0	0	0	0	0	1	0	0
views	accused	1	0	0	1	0	0	0	0	0	0	0
views	all	1	1	0	0	0	0	0	0	0	0	0
views	anarchism	1	0	0	0	1	0	0	0	0	0	0
views	anarchist	1	0	0	0	0	1	0	0	0	0	0
views	anarchists	1	0	0	0	0	0	0	0	1	0	0
views	and	1	0	0	0	0	0	1	0	0	0	0
views	as	1	0	0	0	0	0	0	0	1	0	0
views	branch	1	0	1	0	0	0	0	0	0	0	0
views	few	1	1	0	0	0	0	0	0	0	0	0
views	have	1	0	0	0	1	0	0	0	0

### Step 4: Combine them together!  

Now move your code above into `mapper.py` and `reducer.py` (with a few small modifications, of course); this is your assignment this week!  
See below for detailed requirement description.  

**Hints: What should I modify in my mapper and reducer?**  

1. Read from standard input and write to standard output, rather than a file (we've already done this for you)
2. Process the whole dataset, rather than only the first line

That's it!  
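For reference, a stdin/stdout mapper could look roughly like the sketch below. It is not the TA's implementation, and `tokenize`, `skipgrams` and `WINDOW` are hypothetical names. It assumes a simple tokenizer that lowercases and splits on whitespace. The provided `mapper.sample.tsv` may tokenize differently, for example by stripping punctuation. For every pair of tokens within 5 positions of each other, it emits one `"{pivot}\t{word}\t{distance}\t1"` line.

```python
import sys

WINDOW = 5  # distances -5..-1 and 1..5


def tokenize(line):
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    # The sample output may use a different rule (e.g. stripping punctuation).
    return line.lower().split()


def skipgrams(tokens, window=WINDOW):
    """Yield (pivot, word, distance) for every pair within `window` positions."""
    for i, pivot in enumerate(tokens):
        for d in range(-window, window + 1):
            j = i + d
            if d != 0 and 0 <= j < len(tokens):
                yield pivot, tokens[j], d


def main():
    # Each input line is one wiki page; each skip-gram is emitted with count 1
    for line in sys.stdin:
        for pivot, word, d in skipgrams(tokenize(line)):
            sys.stdout.write(f"{pivot}\t{word}\t{d}\t1\n")


if __name__ == "__main__":
    main()
```

On the example sentence from the skip-gram section, the pairs with pivot `predict` come out with the same distances as the table there, from `skip-gram` at -4 to `a` at 5. Nothing is emitted at -5 because no word exists that far to the left.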

The processing takes some time (~1 hr without parallel computing), so go enjoy some coffee or a movie (or sleep) while you wait ;)

<a name="Assignment-Requirement"></a>
## Assignment Requirement 

1. You need to implement the `mapper.py` and `reducer.py` to calculate the skip-gram table.

2. In `mapper.py`, you need to generate skip-grams with distances from -5 to 5 (inclusive, excluding 0).  
   - Input: Plain text file (`wiki1G.txt`), one wiki page per line.
   - Output: `"{pivot}\t{word}\t{distance}\t{count}"`
   - Example: 
     ```
     predict is  -3  1
     predict used    -2  1
     predict to  -1  1
     predict the 1   1
     ...
     ```
   - Sample output: `mapper.sample.tsv` (find it [here](https://drive.google.com/drive/folders/1vKxr--sLd2J4kdsXUzJDBZdG3AmV4NGl?usp=sharing); your output doesn't need to match it exactly)

3. In `reducer.py`, you have to collect the output from the shuffler (`sort`) and generate the skip-gram table.
   - Input: `"{pivot}\t{word}\t{distance}\t{count}"`
   - Output: 
     - `"{pivot}\t{word}\t{total}\t{-5}\t{-4}\t{-3}\t{-2}\t{-1}\t{1}\t{2}\t{3}\t{4}\t{5}"`
     - The first two columns are the skip-gram; the third column is the total frequency; columns 4\~13 are the frequencies at distances -5\~5, excluding 0.
   - Example:
     ```
     arouse  of      1       0       0       0       0       0       0       0       1       0       0
     arouse  open    4       0       0       3       0       0       0       0       0       0       1
     arouse  so      2       0       1       0       0       0       0       0       0       1       0
     arouse  sufficiently    1       0       1       0       0       0       0       0       0       0       0
     ...
     ```
   - Sample output: `reducer.sample.tsv` (find it [here](https://drive.google.com/drive/folders/1vKxr--sLd2J4kdsXUzJDBZdG3AmV4NGl?usp=sharing); your output doesn't need to match it exactly)

4. Chain your MapReduce steps together and generate the skip-gram table for the wiki1G dataset
   - Unix: 
     - Use the [local map-reduce tool](https://github.com/dspp779/local-mapreduce) (faster),
     - or run it directly: `python mapper.py < wiki1G.txt | sort -k1,2 -k3n | python reducer.py > skipgram.tsv` (slower)
   - Windows: 
     - CMD: `python mapper.py < wiki1G.txt | sort | python reducer.py > skipgram.tsv`
     - PS: `type wiki1G.txt | python mapper.py | sort | python reducer.py > skipgram.tsv`
     - or the bash environment you installed last week.  
   - See [Appendix](#built-in-command) if you want to know what these commands mean
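Before the demo, you might want to sanity-check `skipgram.tsv`. The sketch below is a hypothetical helper, not part of the assignment. Its file name (say, `check_output.py`) and `check_row` are assumptions. It streams the file line by line, so the ~6 GB output never has to fit in memory. For each row, it checks three things: the row has 13 tab-separated columns, every count is a non-negative integer, and the total in column 3 equals the sum of columns 4 to 13.

```python
import sys


def check_row(line):
    """Return True if one reducer output line is well-formed.

    Checks: 13 tab-separated columns, non-negative integer counts, and
    total (column 3) == sum of the per-distance counts (columns 4-13).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 13:
        return False
    try:
        numbers = [int(x) for x in fields[2:]]
    except ValueError:
        return False
    total, per_distance = numbers[0], numbers[1:]
    return all(n >= 0 for n in numbers) and total == sum(per_distance)


def main():
    bad = 0
    for lineno, line in enumerate(sys.stdin, start=1):
        if not check_row(line):
            bad += 1
            print(f"line {lineno}: malformed row: {line!r}")
    print(f"{bad} malformed row(s) found")


if __name__ == "__main__":
    main()
```

Usage would be something like `python check_output.py < skipgram.tsv`.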

During the demo, you need to 

1. show us your skip-gram result on the given dataset, and
2. explain your implementation in `mapper.py` and `reducer.py`.  

Note that the final result will be a large file (~6 GB), so **you may want to view it with the `more` or `less` command**.  

## TA's note

Congratulations! You've learned how to calculate skip-gram frequencies and how to process a huge dataset with the MapReduce technique.  

Remember to <b><a href="https://docs.google.com/spreadsheets/d/1QGeYl5dsD9sFO9SYg4DIKk-xr-yGjRDOOLKZqCLDv2E/edit?usp=sharing">make an appointment with a TA</a> to demo/explain your implementation <u>before <font color="red">10/21 15:30</font></u></b>.  
You should also submit your `mapper.py` and `reducer.py` to <a href="https://eeclass.nthu.edu.tw/course/homework/3285">eeclass</a>.

<a name="built-in-command"></a>
## Appendix: useful built-in commands

Several built-in commands are very useful in the MapReduce procedure.  
Here we introduce `cat` and `type`, `<` and `>`, `sort`, and pipe `|`.  

### cat (on Unix)
The `cat` command, which definitely does not refer to some cute creature (*meow~*), is short for `concatenate`. ([doc](https://man7.org/linux/man-pages/man1/cat.1.html))   

When you `cat` a file, you print its content (or the content of several files) to standard output.  
Now open your bash and test the command below!  
```bash
cat file.txt
```

You should see something like this: 

![picture](https://i.imgur.com/Z9shOYQ.png)

### type (on Windows)
The `type` command works much like `cat` on Unix, but without its cute nickname (Shame on you, Windows). ([doc](https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/type))  

Similarly, `type` prints the content of a file (or several files) to standard output.  

```powershell
type file.txt
```

You should see something like this:  
![](https://i.imgur.com/5WFhxkq.png)

### `>`? `<`? `>///<`? 

`<` and `>` are the I/O redirections.  
`program < filename` means that you want to redirect the input from a file to a program, while `program > filename` means that you want to redirect the output of a program to that file.  

For example,
```bash
echo "hello world" > greet.txt
```
writes the string "hello world" into a file `greet.txt`.  

On the other hand, 
```bash
head < greet.txt
```
makes `head` read its input from `greet.txt`; since `head` prints the first 10 lines of its input, it prints out the string stored in `greet.txt`.  
![](https://i.imgur.com/swxv8LG.png)
<small>p.s. `>///<` is just a joke. Don't take it seriously.</small>

### sort

As its name suggests, `sort` sorts the data that it receives. (doc on [Linux](https://man7.org/linux/man-pages/man1/sort.1.html) and on [Windows](https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/sort))  
Try this:
```
sort sample.txt
```
You can see that the content has been sorted before being printed to your screen.  
![](https://i.imgur.com/QFEq3Tc.png)

### Pipe `|`

A pipe passes the output of the previous program to the input of the next program.  
For example, 
```bash
python program.py | sort
```
will pass the output of `program.py` to the `sort` command.  