## Libraries

In [2]:
!pip install -q mrjob


## Task Solution

In [130]:
ls ../data

[0m[01;32mPrinciplesOfLazersZvelto.pdf[0m*  SW_EpisodeV.txt   rubrics.json
SW_EpisodeIV.txt               SW_EpisodeVI.txt


In [4]:
path_to_txt = "../data/SW_EpisodeIV.txt"

In [5]:
with open(path_to_txt, "r") as fo:
    exp = fo.readlines()

In [60]:
line = exp[13]

In [65]:
_, chr_name, words = line.split('" "')

In [66]:
print(words)

We intercepted no transmissions. Aaah...  This is a consular ship. Were on a diplomatic mission."
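
The printed `words` still ends with the closing quote of the record, because `split('" "')` only removes the separators between fields. A minimal sketch of a parser that also drops that trailing quote and tolerates malformed lines is shown below. The function name `parse_dialogue_line`, the sample line, and the speaker name `SPEAKER` are hypothetical and only illustrate the three-field layout assumed by the split above.

```python
def parse_dialogue_line(line):
    """Split one script line into (character, phrase), or return None.

    Assumes three double-quoted fields separated by single spaces,
    which is the layout implied by split('" "') in the notebook.
    """
    parts = line.rstrip("\n").split('" "')
    if len(parts) != 3:
        return None  # line does not have exactly three fields
    _, chr_name, words = parts
    # The last field still carries the closing quote of the record.
    if words.endswith('"'):
        words = words[:-1]
    return chr_name, words


# Made-up sample line; the real speaker name is not shown in the notebook.
sample = '"13" "SPEAKER" "We intercepted no transmissions."\n'
print(parse_dialogue_line(sample))
# A two-field line cannot be unpacked into three parts, so it is skipped.
print(parse_dialogue_line('"a" "b"\n'))
```

The job below keeps the trailing quote, which is why its output ends in `\""`; stripping it as in this sketch would be an optional cleanup.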



### MapReduce Job

In [72]:
%%file long_talking.py

from mrjob.job import MRJob
from mrjob.step import MRStep
from collections import defaultdict

class MRTheLongestTalking(MRJob):

    def init_get_phrases(self):
        # Longest phrase seen so far by this mapper, keyed by character name
        self.phrases = defaultdict(str)

    def get_phrase(self, _, line):
        try:
            _, chr_name, words = line.split('" "')
            # Keep only the longest phrase this mapper has seen so far
            if len(words) > len(self.phrases[chr_name]):
                self.phrases[chr_name] = words
        except ValueError:
            # Skip lines that do not split into exactly three fields,
            # such as the first line of the file
            pass
        
    def final_get_phrase(self):
        for chr_name, phrase in self.phrases.items():
            yield chr_name, phrase

    def leave_longest_phrase(self, chr_name, phrases):
        """Keep only the longest phrase per character."""
        # The shared None key sends every pair to one reducer in the next step
        yield None, (chr_name, max(phrases, key=len, default=""))
    
    def sort_phrases(self, _, pairs):
        sorted_pairs = sorted(pairs, reverse=True, key=lambda x: len(x[1]))
        for pair in sorted_pairs:
            yield pair

    def steps(self):
        return [
            MRStep(mapper_init=self.init_get_phrases,
                   mapper=self.get_phrase,
                   mapper_final=self.final_get_phrase,
                   reducer=self.leave_longest_phrase),
            MRStep(reducer=self.sort_phrases),
        ]

if __name__ == '__main__':
    MRTheLongestTalking.run()

Overwriting long_talking.py
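
To sanity-check the logic without mrjob or Hadoop, the two steps can be mimicked in plain Python. This is only a sketch under the same three-field assumption as the job: `longest_phrases` and `toy_lines` are hypothetical names, and the single dictionary stands in for both the mapper-side aggregation and the first reducer.

```python
from collections import defaultdict


def longest_phrases(lines):
    """Plain-Python stand-in for the two MRStep stages (no mrjob needed)."""
    # Step 1: keep the longest phrase per character.
    longest = defaultdict(str)
    for line in lines:
        parts = line.rstrip("\n").split('" "')
        if len(parts) != 3:
            continue  # skip lines that do not have three fields
        _, chr_name, words = parts
        if len(words) > len(longest[chr_name]):
            longest[chr_name] = words
    # Step 2: order (character, phrase) pairs by phrase length, longest first.
    return sorted(longest.items(), key=lambda pair: len(pair[1]), reverse=True)


# Made-up input lines, not taken from the scripts.
toy_lines = [
    '"name" "text"\n',
    '"1" "A" "Hi."\n',
    '"2" "B" "A much longer phrase."\n',
    '"3" "A" "Hello there."\n',
]
for name, phrase in longest_phrases(toy_lines):
    print(name, phrase)
```

As in the job itself, each phrase keeps its trailing quote, so lengths here match what the MapReduce version compares.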


## Test locally

In [73]:
!python3 long_talking.py "../data/SW_EpisodeIV.txt"

No configs found; falling back on auto-configuration
No configs specified for inline runner
Creating temp directory /tmp/long_talking.root.20231129.202208.505182
Running step 1 of 2...
Running step 2 of 2...
job output is in /tmp/long_talking.root.20231129.202208.505182/output
Streaming final output from /tmp/long_talking.root.20231129.202208.505182/output...
"LEIA"	"General Kenobi, years ago you served my father in the Clone Wars.  Now he begs you to help him in his struggle against the Empire.  I regret that I am unable to present my father's request to you in person, but my ship has fallen under attack and I'm afraid my mission to bring you to Alderaan has failed.  I have placed information vital to the survival of the Rebellion into the memory systems of this R2 unit.  My father will know how to retrieve it.  You must see this droid safely delivered to him on Alderaan.  This is our most desperate hour.  Help me, Obi-Wan Kenobi, you're my only hope.\""
"BIGGS"	"I feel for you, Luke,

## Running on the cluster

### Put the data on the cluster

In [74]:
!hadoop fs -ls /

Found 6 items
drwxr-xr-x   - root supergroup          0 2023-11-29 22:52 /MR_data
drwxr-xr-x   - root supergroup          0 2023-11-29 22:58 /MR_data1
drwxr-xr-x   - root supergroup          0 2023-11-29 23:09 /MR_data2
drwxr-xr-x   - root supergroup          0 2023-11-28 21:33 /book
drwx-wx-wx   - root supergroup          0 2023-11-28 21:21 /tmp
drwxr-xr-x   - root supergroup          0 2023-11-28 21:21 /user


In [75]:
!hadoop fs -mkdir /MR_data2

mkdir: `/MR_data2': File exists


In [76]:
!hadoop fs -put -f ../data/SW_EpisodeIV.txt /MR_data && \
 hadoop fs -put -f ../data/SW_EpisodeV.txt /MR_data && \
 hadoop fs -put -f ../data/SW_EpisodeVI.txt /MR_data

### Let's go

In [77]:
!python3 long_talking.py -r hadoop hdfs:///MR_data/SW_EpisodeIV.txt --output /MR_data2/outputIV && \
 python3 long_talking.py -r hadoop hdfs:///MR_data/SW_EpisodeV.txt --output /MR_data2/outputV && \
 python3 long_talking.py -r hadoop hdfs:///MR_data/SW_EpisodeVI.txt --output /MR_data2/outputVI && \
 python3 long_talking.py -r hadoop hdfs:///MR_data/SW_EpisodeIV.txt \
                                   hdfs:///MR_data/SW_EpisodeV.txt \
                                   hdfs:///MR_data/SW_EpisodeVI.txt  --output /MR_data2/outputALL

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /opt/hadoop/bin...
Found hadoop binary: /opt/hadoop/bin/hadoop
Using Hadoop version 3.3.6
Looking for Hadoop streaming jar in /opt/hadoop...
Found Hadoop streaming jar: /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar
Creating temp directory /tmp/long_talking.root.20231129.202241.298940
uploading working dir files to hdfs:///user/root/tmp/mrjob/long_talking.root.20231129.202241.298940/files/wd...
Copying other local files to hdfs:///user/root/tmp/mrjob/long_talking.root.20231129.202241.298940/files/
Running step 1 of 2...
  packageJobJar: [/tmp/hadoop-unjar2823933443929228326/] [] /tmp/streamjob7737101144379604595.jar tmpDir=null
  Connecting to ResourceManager at resourcemanager/172.21.0.3:8032
  Connecting to ResourceManager at resourcemanager/172.21.0.3:8032
  Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1701267673176_

## Collect the Results

In [78]:
!hadoop fs -cat /MR_data2/outputIV/part-00000

"LEIA"	"General Kenobi, years ago you served my father in the Clone Wars.  Now he begs you to help him in his struggle against the Empire.  I regret that I am unable to present my father's request to you in person, but my ship has fallen under attack and I'm afraid my mission to bring you to Alderaan has failed.  I have placed information vital to the survival of the Rebellion into the memory systems of this R2 unit.  My father will know how to retrieve it.  You must see this droid safely delivered to him on Alderaan.  This is our most desperate hour.  Help me, Obi-Wan Kenobi, you're my only hope.\""
"BIGGS"	"I feel for you, Luke, you're going to have to learn what seems to be important or what really is important.  What good is all your uncle's work if it's taken over by the Empire?...  You know they're starting to nationalize commerce in the central systems...it won't be long before your uncle is merely a tenant, slaving for the greater glory of the Empire.\""
"DODONNA"	"The approach

In [79]:
!hadoop fs -cat /MR_data2/outputV/part-00000

"YODA"	"Ready, are you? What know you of ready? For eight hundred years  have I trained Jedi. My own counsel will I keep on who is to be trained! A Jedi must have the deepest commitment, the most serious mind.  This one a long time have I watched. Never his mind on where he was. Hmm? What he was doing. Hmph. Adventure. Heh! Excitement. Heh! A Jedi craves not these things.  You are reckless!\""
"VADER"	"There is no escape. Don't make me destroy you. You do not yet  realize your importance. You have only begun to discover you power. Join me and I will complete your training. With our combined strength, we can end this destructive conflict and bring order to the galaxy.\""
"LEIA"	"All troop carriers will assemble at the north entrance. The  heavy transport ships will leave as soon as they're loaded. Only two fighter escorts per ship. The energy shield can only be opened for a short time, so you'll have to stay very close to your transports.\""
"THREEPIO"	"Don't try to blame me. I didn't a

In [80]:
!hadoop fs -cat /MR_data2/outputVI/part-00000

"BEN"	"The Organa household was high-born and politically quite powerful in that system. Leia became a princess by virtue of lineage... no one knew she'd been adopted, of course. But it was a title without real power, since Alderaan had long been a democracy.  Even so, the family continued to be politically powerful, and Leia, following in her foster father's path, became a senator as well.  That's not all she became, of course... she became the leader of her cell in the Alliance against the corrupt Empire. And because she had diplomatic immunity, she was a vital link for getting information to the Rebel cause.  That's what she was doing when her path crossed yours... for her foster parents had always told her to contact me on Tatooine, if her troubles became desperate.\""
"ACKBAR"	"You can see here the Death Star orbiting the forest Moon of Endor. Although the weapon systems on this Death Star are not yet operational, the Death Star does have a strong defense mechanism. It is protecte

In [81]:
!hadoop fs -cat /MR_data2/outputALL/part-00000

"BEN"	"The Organa household was high-born and politically quite powerful in that system. Leia became a princess by virtue of lineage... no one knew she'd been adopted, of course. But it was a title without real power, since Alderaan had long been a democracy.  Even so, the family continued to be politically powerful, and Leia, following in her foster father's path, became a senator as well.  That's not all she became, of course... she became the leader of her cell in the Alliance against the corrupt Empire. And because she had diplomatic immunity, she was a vital link for getting information to the Rebel cause.  That's what she was doing when her path crossed yours... for her foster parents had always told her to contact me on Tatooine, if her troubles became desperate.\""
"LEIA"	"General Kenobi, years ago you served my father in the Clone Wars.  Now he begs you to help him in his struggle against the Empire.  I regret that I am unable to present my father's request to you in person, b

In [82]:
!hadoop fs -get -f /MR_data2/outputIV/part-00000  ../data/MR2/outputIV.txt && \
 hadoop fs -get -f /MR_data2/outputV/part-00000  ../data/MR2/outputV.txt && \
 hadoop fs -get -f /MR_data2/outputVI/part-00000  ../data/MR2/outputVI.txt && \
 hadoop fs -get -f /MR_data2/outputALL/part-00000  ../data/MR2/outputALL.txt
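
The downloaded files hold one tab-separated pair per line, with key and value each JSON-encoded; this matches mrjob's default output protocol, which the job above does not override. A small sketch for loading such lines back into Python follows. The helper name `read_mrjob_output` and the sample lines are hypothetical; in practice the lines would come from a file such as `../data/MR2/outputALL.txt`.

```python
import json


def read_mrjob_output(lines):
    """Decode tab-separated JSON key/value lines into (key, value) tuples."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        key, value = line.split("\t", 1)
        pairs.append((json.loads(key), json.loads(value)))
    return pairs


# Made-up lines in the same shape as the job output (note the escaped quote).
sample_output = ['"A"\t"Hello there.\\""\n', '"B"\t"Hi.\\""\n']
for name, phrase in read_mrjob_output(sample_output):
    # Drop the trailing quote that the job leaves at the end of each phrase.
    print(name, phrase.rstrip('"'))
```

Decoding with `json.loads` rather than stripping quotes by hand keeps escaped characters inside the phrases intact.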