## Task 2: An Inspiring Speech

In [None]:
!pip install mrjob

In [3]:
%%file task_2_script.py

import csv

from mrjob.job import MRJob
from mrjob.step import MRStep


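class EpisodeLongestPhrases(MRJob):
    """Find each character's longest phrase, sorted by phrase length."""

    def mapper(self, _, line):
        # Each input line is a space-delimited row of quoted fields.
        reader = csv.reader([line], delimiter=' ')
        elements = next(reader)
        # Keep only rows with exactly three fields; this skips the header row.
        if len(elements) == 3:
            character = elements[1]
            phrase = elements[2]

            yield character, phrase

    def reducer(self, character, phrases):
        # Keep the longest phrase spoken by this character.
        longest_phrase = max(phrases, key=len)

        # Emit under a single key so that step 2 sees all pairs together.
        yield None, (character, longest_phrase)

    def reducer_sort(self, _, longest_phrases):
        # Sort the (character, phrase) pairs by phrase length, longest first.
        sorted_phrases = sorted(longest_phrases, key=lambda x: len(x[1]), reverse=True)
        for character, phrase in sorted_phrases:
            yield character, phrase

    def steps(self):
        # Step 1: longest phrase per character. Step 2: global sort by length.
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer),
            MRStep(reducer=self.reducer_sort)
        ]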

if __name__ == '__main__':
    EpisodeLongestPhrases.run()


Overwriting task_2_script.py
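
The two steps of the job can be traced without mrjob or Hadoop. The sketch below is a plain-Python emulation, not the job itself: `map_line` and `longest_phrases` are hypothetical helper names, and `SAMPLE_LINES` is made-up data that assumes the input files hold rows of the form `"<line number>" "<CHARACTER>" "<dialogue>"` preceded by a two-field header row.

```python
import csv
from collections import defaultdict

# Hypothetical sample lines (an assumption about the input format):
# "<line number>" "<CHARACTER>" "<dialogue>", preceded by a two-field header.
SAMPLE_LINES = [
    '"character" "dialogue"',
    '"1" "LUKE" "Hello there."',
    '"2" "LEIA" "Help me, Obi-Wan Kenobi, you\'re my only hope."',
    '"3" "LUKE" "I want to learn the ways of the Force."',
]


def map_line(line):
    """Yield (character, phrase) for rows with exactly three fields."""
    fields = next(csv.reader([line], delimiter=' '))
    if len(fields) == 3:
        yield fields[1], fields[2]


def longest_phrases(lines):
    """Emulate both steps: group by character, keep the longest phrase, sort."""
    grouped = defaultdict(list)
    for line in lines:
        for character, phrase in map_line(line):
            grouped[character].append(phrase)
    # Step 1: one (character, longest phrase) pair per character.
    longest = [(c, max(p, key=len)) for c, p in grouped.items()]
    # Step 2: order the pairs by phrase length, longest first.
    return sorted(longest, key=lambda pair: len(pair[1]), reverse=True)


result = longest_phrases(SAMPLE_LINES)
for character, phrase in result:
    print(character, phrase, sep='\t')
```

The header row parses into two fields, so the `len(fields) == 3` check drops it, which is the same filter the mapper applies.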


### Local run

In [4]:
!python3 task_2_script.py StarWars_data/SW_EpisodeIV.txt

No configs found; falling back on auto-configuration
No configs specified for inline runner
Creating temp directory /tmp/task_2_script.root.20231126.110309.286223
Running step 1 of 2...
Running step 2 of 2...
job output is in /tmp/task_2_script.root.20231126.110309.286223/output
Streaming final output from /tmp/task_2_script.root.20231126.110309.286223/output...
"LEIA"	"General Kenobi, years ago you served my father in the Clone Wars.  Now he begs you to help him in his struggle against the Empire.  I regret that I am unable to present my father's request to you in person, but my ship has fallen under attack and I'm afraid my mission to bring you to Alderaan has failed.  I have placed information vital to the survival of the Rebellion into the memory systems of this R2 unit.  My father will know how to retrieve it.  You must see this droid safely delivered to him on Alderaan.  This is our most desperate hour.  Help me, Obi-Wan Kenobi, you're my only hope."
"BIGGS"	"I feel for you, Luke

In [5]:
!python3 task_2_script.py StarWars_data/SW_EpisodeV.txt

No configs found; falling back on auto-configuration
No configs specified for inline runner
Creating temp directory /tmp/task_2_script.root.20231126.110443.523270
Running step 1 of 2...
Running step 2 of 2...
job output is in /tmp/task_2_script.root.20231126.110443.523270/output
Streaming final output from /tmp/task_2_script.root.20231126.110443.523270/output...
"YODA"	"Ready, are you? What know you of ready? For eight hundred years  have I trained Jedi. My own counsel will I keep on who is to be trained! A Jedi must have the deepest commitment, the most serious mind.  This one a long time have I watched. Never his mind on where he was. Hmm? What he was doing. Hmph. Adventure. Heh! Excitement. Heh! A Jedi craves not these things.  You are reckless!"
"VADER"	"There is no escape. Don't make me destroy you. You do not yet  realize your importance. You have only begun to discover you power. Join me and I will complete your training. With our combined strength, we can end this destructive c

In [7]:
!python3 task_2_script.py StarWars_data/SW_EpisodeVI.txt

No configs found; falling back on auto-configuration
No configs specified for inline runner
Creating temp directory /tmp/task_2_script.root.20231126.110516.583702
Running step 1 of 2...
Running step 2 of 2...
job output is in /tmp/task_2_script.root.20231126.110516.583702/output
Streaming final output from /tmp/task_2_script.root.20231126.110516.583702/output...
"BEN"	"The Organa household was high-born and politically quite powerful in that system. Leia became a princess by virtue of lineage... no one knew she'd been adopted, of course. But it was a title without real power, since Alderaan had long been a democracy.  Even so, the family continued to be politically powerful, and Leia, following in her foster father's path, became a senator as well.  That's not all she became, of course... she became the leader of her cell in the Alliance against the corrupt Empire. And because she had diplomatic immunity, she was a vital link for getting information to the Rebel cause.  That's what she

In [9]:
!python3 task_2_script.py StarWars_data/SW_EpisodeIV.txt StarWars_data/SW_EpisodeV.txt StarWars_data/SW_EpisodeVI.txt

No configs found; falling back on auto-configuration
No configs specified for inline runner
Creating temp directory /tmp/task_2_script.root.20231126.110613.575560
Running step 1 of 2...
Running step 2 of 2...
job output is in /tmp/task_2_script.root.20231126.110613.575560/output
Streaming final output from /tmp/task_2_script.root.20231126.110613.575560/output...
"BEN"	"The Organa household was high-born and politically quite powerful in that system. Leia became a princess by virtue of lineage... no one knew she'd been adopted, of course. But it was a title without real power, since Alderaan had long been a democracy.  Even so, the family continued to be politically powerful, and Leia, following in her foster father's path, became a senator as well.  That's not all she became, of course... she became the leader of her cell in the Alliance against the corrupt Empire. And because she had diplomatic immunity, she was a vital link for getting information to the Rebel cause.  That's what she

### On the cluster

In [10]:
!python3 task_2_script.py -r hadoop hdfs://namenode:8020/block_2/data/SW_EpisodeIV.txt --output-dir=hdfs://namenode:8020/block_2/task_2/SW_EpisodeIV

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /opt/hadoop/bin...
Found hadoop binary: /opt/hadoop/bin/hadoop
Using Hadoop version 3.3.6
Looking for Hadoop streaming jar in /opt/hadoop...
Found Hadoop streaming jar: /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar
Creating temp directory /tmp/task_2_script.root.20231126.110727.481837
uploading working dir files to hdfs:///user/root/tmp/mrjob/task_2_script.root.20231126.110727.481837/files/wd...
Copying other local files to hdfs:///user/root/tmp/mrjob/task_2_script.root.20231126.110727.481837/files/
Running step 1 of 2...
  packageJobJar: [/tmp/hadoop-unjar641075808230926842/] [] /tmp/streamjob8727777784015293225.jar tmpDir=null
  Connecting to ResourceManager at resourcemanager/172.21.0.8:8032
  Connecting to ResourceManager at resourcemanager/172.21.0.8:8032
  Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_170099512349

In [12]:
!hadoop fs -cat /block_2/task_2/SW_EpisodeIV/part-00000

"LEIA"	"General Kenobi, years ago you served my father in the Clone Wars.  Now he begs you to help him in his struggle against the Empire.  I regret that I am unable to present my father's request to you in person, but my ship has fallen under attack and I'm afraid my mission to bring you to Alderaan has failed.  I have placed information vital to the survival of the Rebellion into the memory systems of this R2 unit.  My father will know how to retrieve it.  You must see this droid safely delivered to him on Alderaan.  This is our most desperate hour.  Help me, Obi-Wan Kenobi, you're my only hope."
"BIGGS"	"I feel for you, Luke, you're going to have to learn what seems to be important or what really is important.  What good is all your uncle's work if it's taken over by the Empire?...  You know they're starting to nationalize commerce in the central systems...it won't be long before your uncle is merely a tenant, slaving for the greater glory of the Empire."
"DODONNA"	"The approach wil

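Each row in the output above is a key and a value separated by a tab, both JSON-encoded, which is mrjob's default output protocol. A minimal sketch of decoding such a row for further processing follows; `parse_output_line` is a hypothetical helper and `sample` is a made-up row in the same shape, not actual job output.

```python
import json


def parse_output_line(line):
    """Split one output row into (key, value), each decoded from JSON."""
    raw_key, raw_value = line.rstrip('\n').split('\t', 1)
    return json.loads(raw_key), json.loads(raw_value)


# Hypothetical row in the same shape as the job output.
sample = '"LUKE"\t"I want to learn the ways of the Force."\n'
character, phrase = parse_output_line(sample)
print(character, len(phrase))
```

The same function could be applied line by line to a `part-00000` file after copying it out of HDFS.
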
In [13]:
!python3 task_2_script.py -r hadoop hdfs://namenode:8020/block_2/data/SW_EpisodeV.txt --output-dir=hdfs://namenode:8020/block_2/task_2/SW_EpisodeV

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /opt/hadoop/bin...
Found hadoop binary: /opt/hadoop/bin/hadoop
Using Hadoop version 3.3.6
Looking for Hadoop streaming jar in /opt/hadoop...
Found Hadoop streaming jar: /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar
Creating temp directory /tmp/task_2_script.root.20231126.110938.753735
uploading working dir files to hdfs:///user/root/tmp/mrjob/task_2_script.root.20231126.110938.753735/files/wd...
Copying other local files to hdfs:///user/root/tmp/mrjob/task_2_script.root.20231126.110938.753735/files/
Running step 1 of 2...
  packageJobJar: [/tmp/hadoop-unjar4034544277075201627/] [] /tmp/streamjob3787230915441824165.jar tmpDir=null
  Connecting to ResourceManager at resourcemanager/172.21.0.8:8032
  Connecting to ResourceManager at resourcemanager/172.21.0.8:8032
  Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_17009951234

In [14]:
!hadoop fs -cat /block_2/task_2/SW_EpisodeV/part-00000

"YODA"	"Ready, are you? What know you of ready? For eight hundred years  have I trained Jedi. My own counsel will I keep on who is to be trained! A Jedi must have the deepest commitment, the most serious mind.  This one a long time have I watched. Never his mind on where he was. Hmm? What he was doing. Hmph. Adventure. Heh! Excitement. Heh! A Jedi craves not these things.  You are reckless!"
"VADER"	"There is no escape. Don't make me destroy you. You do not yet  realize your importance. You have only begun to discover you power. Join me and I will complete your training. With our combined strength, we can end this destructive conflict and bring order to the galaxy."
"LEIA"	"All troop carriers will assemble at the north entrance. The  heavy transport ships will leave as soon as they're loaded. Only two fighter escorts per ship. The energy shield can only be opened for a short time, so you'll have to stay very close to your transports."
"THREEPIO"	"Don't try to blame me. I didn't ask you

In [15]:
!python3 task_2_script.py -r hadoop hdfs://namenode:8020/block_2/data/SW_EpisodeVI.txt --output-dir=hdfs://namenode:8020/block_2/task_2/SW_EpisodeVI

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /opt/hadoop/bin...
Found hadoop binary: /opt/hadoop/bin/hadoop
Using Hadoop version 3.3.6
Looking for Hadoop streaming jar in /opt/hadoop...
Found Hadoop streaming jar: /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar
Creating temp directory /tmp/task_2_script.root.20231126.111108.852654
uploading working dir files to hdfs:///user/root/tmp/mrjob/task_2_script.root.20231126.111108.852654/files/wd...
Copying other local files to hdfs:///user/root/tmp/mrjob/task_2_script.root.20231126.111108.852654/files/
Running step 1 of 2...
  packageJobJar: [/tmp/hadoop-unjar7837555150586241474/] [] /tmp/streamjob9007567859013873450.jar tmpDir=null
  Connecting to ResourceManager at resourcemanager/172.21.0.8:8032
  Connecting to ResourceManager at resourcemanager/172.21.0.8:8032
  Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_17009951234

In [16]:
!hadoop fs -cat /block_2/task_2/SW_EpisodeVI/part-00000

"BEN"	"The Organa household was high-born and politically quite powerful in that system. Leia became a princess by virtue of lineage... no one knew she'd been adopted, of course. But it was a title without real power, since Alderaan had long been a democracy.  Even so, the family continued to be politically powerful, and Leia, following in her foster father's path, became a senator as well.  That's not all she became, of course... she became the leader of her cell in the Alliance against the corrupt Empire. And because she had diplomatic immunity, she was a vital link for getting information to the Rebel cause.  That's what she was doing when her path crossed yours... for her foster parents had always told her to contact me on Tatooine, if her troubles became desperate."
"ACKBAR"	"You can see here the Death Star orbiting the forest Moon of Endor. Although the weapon systems on this Death Star are not yet operational, the Death Star does have a strong defense mechanism. It is protected 

In [17]:
!python3 task_2_script.py -r hadoop --output-dir=hdfs://namenode:8020/block_2/task_2/SW_Episodes_All \
    hdfs://namenode:8020/block_2/data/SW_EpisodeIV.txt \
    hdfs://namenode:8020/block_2/data/SW_EpisodeV.txt \
    hdfs://namenode:8020/block_2/data/SW_EpisodeVI.txt

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /opt/hadoop/bin...
Found hadoop binary: /opt/hadoop/bin/hadoop
Using Hadoop version 3.3.6
Looking for Hadoop streaming jar in /opt/hadoop...
Found Hadoop streaming jar: /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar
Creating temp directory /tmp/task_2_script.root.20231126.111243.485096
uploading working dir files to hdfs:///user/root/tmp/mrjob/task_2_script.root.20231126.111243.485096/files/wd...
Copying other local files to hdfs:///user/root/tmp/mrjob/task_2_script.root.20231126.111243.485096/files/
Running step 1 of 2...
  packageJobJar: [/tmp/hadoop-unjar2600843384701371047/] [] /tmp/streamjob4086937073291636753.jar tmpDir=null
  Connecting to ResourceManager at resourcemanager/172.21.0.8:8032
  Connecting to ResourceManager at resourcemanager/172.21.0.8:8032
  Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_17009951234

In [18]:
!hadoop fs -cat /block_2/task_2/SW_Episodes_All/part-00000

"BEN"	"The Organa household was high-born and politically quite powerful in that system. Leia became a princess by virtue of lineage... no one knew she'd been adopted, of course. But it was a title without real power, since Alderaan had long been a democracy.  Even so, the family continued to be politically powerful, and Leia, following in her foster father's path, became a senator as well.  That's not all she became, of course... she became the leader of her cell in the Alliance against the corrupt Empire. And because she had diplomatic immunity, she was a vital link for getting information to the Rebel cause.  That's what she was doing when her path crossed yours... for her foster parents had always told her to contact me on Tatooine, if her troubles became desperate."
"LEIA"	"General Kenobi, years ago you served my father in the Clone Wars.  Now he begs you to help him in his struggle against the Empire.  I regret that I am unable to present my father's request to you in person, but