# Big Data Platform
## Assignment 3: Serverless

**By:**  

Oren Ben-Eliyahu 204079453 
<br>Yuval Barkan, 205714447

<br><br>

**The goal of this assignment is to:**
- Understand and practice the details of Serverless

**Instructions:**
- Students will form teams of two people each, and submit a single homework for each team.
- The same score for the homework will be given to each member of your team.
- Your solution is in the form of a Jupyter notebook file (with extension ipynb).
- Images/Graphs/Tables should be submitted inside the notebook.
- The notebook should be runnable and properly documented. 
- Please answer all the questions and include all your code.
- You are expected to submit clear and Pythonic code.
- You can change function signatures/definitions.

**Submission:**
- Submission of the homework will be done via Moodle by uploading (not Zip):
    - Jupyter Notebook
    - 2 Log files
    - Additional local scripts
- The homework needs to be entirely in English.
- The deadline for submission is on Moodle.
- Late submission won't be allowed.

  
- In case of identical code submissions - both groups will get a Zero. 
- Some groups might be selected randomly to present their code.

**Requirements:**  
- Python 3.6 should be used.  
- You should implement the algorithms by yourself using only basic Python libraries (such as numpy,pandas,etc.)

<br><br><br><br>

**Grading:**
- Q0 - 10 points - Setup
- Q1 - 40 points - Serverless MapReduceEngine
- Q2 - 20 points - MapReduce job to calculate inverted index
- Q3 - 30 points - Shuffle

`Total: 100`

<br><br>

# Question 0
## Setup

1. Navigate to IBM Cloud and open a trial account. No need to provide a credit card
2. Choose IBM Cloud Object Storage service from the catalog
3. Create a new bucket in IBM Cloud Object Storage
4. Create credentials for the bucket with HMAC (access key and secret key)
5. Choose IBM Cloud Functions service from the catalog and create a service


#### Lithops setup
1. Using `git`, install the master branch of the Lithops project from
https://github.com/lithops-cloud/lithops
2. Follow Lithops documentation and configure Lithops against IBM Cloud Functions and IBM Cloud Object Storage
3. Configure Lithops log level to be in DEBUG mode
4. Run Hello World example by using Futures API and verify all is working properly.


#### IBM Cloud Object Storage setup
1. Upload all the input CSV files that you used in homework 2 into the bucket you created in IBM Cloud Object Storage


<br><br><br>

In [9]:
#!pip install git+https://github.com/lithops-cloud/lithops.git

In [3]:
!lithops test

2022-01-10 20:40:19,344 [INFO] lithops.config -- Lithops v2.5.9.dev0
2022-01-10 20:40:19,348 [INFO] lithops.storage.backends.localhost.localhost -- Localhost storage client created
2022-01-10 20:40:19,348 [INFO] lithops.localhost.localhost -- Localhost compute client created
2022-01-10 20:40:19,348 [INFO] lithops.invokers -- ExecutorID 7075f1-0 | JobID A000 - Selected Runtime: python 
2022-01-10 20:40:19,351 [INFO] lithops.invokers -- ExecutorID 7075f1-0 | JobID A000 - Starting function invocation: hello() - Total: 1 activations
2022-01-10 20:40:19,646 [INFO] lithops.invokers -- ExecutorID 7075f1-0 | JobID A000 - View execution logs at /private/var/folders/lg/pbb4ypp52kx1shc2q76nnnlc0000gn/T/lithops/logs/7075f1-0-A000.log
2022-01-10 20:40:19,647 [INFO] lithops.wait -- ExecutorID 7075f1-0 - Getting results from functions

  100%|██████████████████████████████████████████████████████████████████| 1/1  

2022-01-10 20:40:21,694 [INFO] lithops.executors -- ExecutorID 7075f1-0 - Cleaning te

In [10]:
import lithops
import os
from io import StringIO
import pandas as pd

In [13]:
def hello(name):
    return 'Hello {}!'.format(name)

with lithops.FunctionExecutor() as fexec:
    fut = fexec.call_async(hello, 'World')
    print(fut.result())

2022-01-10 20:41:41,534 [INFO] lithops.config -- Lithops v2.5.9.dev0
2022-01-10 20:41:41,543 [INFO] lithops.storage.backends.localhost.localhost -- Localhost storage client created
2022-01-10 20:41:41,543 [INFO] lithops.localhost.localhost -- Localhost compute client created
2022-01-10 20:41:41,544 [INFO] lithops.invokers -- ExecutorID ffb656-0 | JobID A000 - Selected Runtime: python 
2022-01-10 20:41:41,547 [INFO] lithops.invokers -- ExecutorID ffb656-0 | JobID A000 - Starting function invocation: hello() - Total: 1 activations
2022-01-10 20:41:42,062 [INFO] lithops.invokers -- ExecutorID ffb656-0 | JobID A000 - View execution logs at /private/var/folders/lg/pbb4ypp52kx1shc2q76nnnlc0000gn/T/lithops/logs/ffb656-0-A000.log
2022-01-10 20:41:42,070 [INFO] lithops.storage.backends.localhost.localhost -- Localhost storage client created


Hello World!


# Question 1
## Serverless MapReduceEngine

Modify the MapReduceEngine from homework 2 into the MapReduceServerlessEngine, where map and reduce tasks are executed as serverless actions instead of local threads. In particular:
1. Deploy all map tasks as serverless actions by using Lithops against IBM Cloud Functions.
2. Collect results from all map tasks and store them in the same SQLite as you used in MapReduceEngine and use the same code for the sort and shuffle phase.
3. Deploy reduce tasks by using Lithops against IBM Cloud Functions. Instead of persisting results from reduce tasks, return results back to the MapReduceServerlessEngine and proceed with the same workflow as in MapReduceEngine
4. Return results of reduce tasks to the user

**Please attach:**  
Text file with all log messages Lithops printed to console during the execution. Make
sure log level is set to DEBUG mode.

#### Code:

In [14]:
import sqlite3
from sqlite3 import Error

In [15]:
def create_connection(db_data):
    """Open a SQLite connection; return None if the connection fails."""
    conn = None
    try:
        conn = sqlite3.connect(db_data)
        print("Establish connection")
    except Error as e:
        print(e)
    return conn


def create_table(conn, create_table_query):
    try:
        c = conn.cursor()
        c.execute(create_table_query)
    except Error as e:
        print(e)
        
def query(conn, select_query):
    try:
        c = conn.cursor()
        # Return all matching rows to the caller
        return c.execute(select_query).fetchall()
    except Error as e:
        print(e)

In [16]:
MYDATA_DB = 'temp_results.db'

create_table_temp_results = '''CREATE TABLE IF NOT EXISTS temp_results(
                                key TEXT,
                                value TEXT)
                            '''
conn = create_connection(MYDATA_DB)
create_table(conn,create_table_temp_results)

Establish connection
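
The `temp_results` table holds the intermediate `(key, value)` pairs, and the shuffle later groups them by key with `GROUP_CONCAT`. The snippet below is a minimal sketch of that grouping on its own. It assumes an in-memory database and a few made-up pairs (`conn_demo`, `pairs` and `grouped` are illustrative names, not part of the notebook), so it can be run without touching `temp_results.db`.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind (assumption:
# the notebook itself uses the on-disk file 'temp_results.db').
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE temp_results (key TEXT, value TEXT)")

# Hypothetical map output: (city, document) pairs.
pairs = [
    ("Haifa", "a.csv"),
    ("Kiel", "a.csv"),
    ("Haifa", "b.csv"),
]
conn_demo.executemany(
    "INSERT INTO temp_results (key, value) VALUES (?, ?)", pairs
)
conn_demo.commit()

# One row per key, with all of its values joined by commas.
rows = conn_demo.execute(
    "SELECT key, GROUP_CONCAT(value, ',') FROM temp_results "
    "GROUP BY key ORDER BY key"
).fetchall()

# Split the concatenated string back into a sorted list per key.
grouped = {key: sorted(values.split(",")) for key, values in rows}
print(grouped)
conn_demo.close()
```

Sorting the split values makes the result deterministic, since SQLite does not guarantee the order in which `GROUP_CONCAT` joins them.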


In [17]:
import concurrent.futures as cf

class MapReduceServerlessEngine():
    
    def execute(self, input_data, map_function, reduce_function, params):
        
        fexec = lithops.FunctionExecutor(log_level='DEBUG')
                
        # Map phase: invoke map_function once per input file
        map_collector = []
        for csv_file in input_data:
            fexec.call_async(map_function, [csv_file, params])
            map_collector.append(fexec.get_result())

        # Insert data into temp_result db:
        insert_query = '''INSERT INTO temp_results (key,value) VALUES (?,?)'''
        cur = conn.cursor()
        
        for file in map_collector:
            cur.executemany(insert_query,file)
            
        cur.close()
        
        # SQL statements:
        grouping_query = "SELECT key, GROUP_CONCAT(value,',') FROM temp_results GROUP BY key"
        reduce_input_values = query(conn,grouping_query)
        
        # Reduce phase: invoke reduce_function once per grouped key
        reduce_collector = []
        for key, concatenated in reduce_input_values:
            values = concatenated.split(',')
            fexec.call_async(reduce_function, [key, values])
            reduce_collector.append(fexec.get_result())
            
        return reduce_collector
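
The engine above asks for the result right after each `call_async`, so invocations run one at a time. The following is a minimal sketch of the alternative "submit everything, then gather" pattern. It uses the standard-library `ThreadPoolExecutor` purely as a stand-in for the Lithops executor, and `square` and `run_all` are hypothetical names. With Lithops, the analogous change would presumably be to keep the futures returned by `call_async` and call `result()` on them afterwards, as the Hello World cell does for a single future. That variant is not tested here.

```python
import concurrent.futures as cf


def square(x):
    # Stand-in task; in the engine this would be the map function.
    return x * x


def run_all(func, inputs, max_workers=4):
    """Submit every task first, then gather results in input order."""
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # All tasks are queued before any result is requested,
        # so they can overlap instead of running back to back.
        futures = [pool.submit(func, item) for item in inputs]
        return [future.result() for future in futures]


fan_out_results = run_all(square, [1, 2, 3])
print(fan_out_results)
```

Gathering from the list of futures in submission order keeps the results aligned with the inputs even when tasks finish out of order.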

In [18]:
from os import listdir

def find_csv_filenames(path_to_dir, suffix=".csv" ):
    filenames = listdir(path_to_dir)
    return [ filename for filename in filenames if filename.endswith( suffix ) ]

In [19]:
input_data = find_csv_filenames('csv_files/')

In [22]:
def inverted_map(document_name_index):
    csv_dir = f'csv_files/{document_name_index[0]}'
    file = pd.read_csv(csv_dir)
    list_mycsv=[]
    for value in file.iloc[:,document_name_index[1]['column']].values:
        list_mycsv.append((value,document_name_index[0]))
    return list_mycsv

In [26]:
def inverted_reduce(key_documents):
    reduced_list = []
    reduced_loc = list(set(key_documents[1]))
    reduced_list.append((key_documents[0],reduced_loc))
    return reduced_list
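
To check the map/reduce contract without any CSV files on disk, here is a small self-contained sketch. It is not the notebook's code: `demo_map` and `demo_reduce` are hypothetical stand-ins that use the standard `csv` module instead of pandas and read from in-memory strings. The shape of the data is the same, though: map emits `(value, document)` pairs for one column, and reduce removes duplicate documents per key.

```python
import csv
from collections import defaultdict
from io import StringIO


def demo_map(doc_name, csv_text, column):
    """Emit one (value, doc_name) pair per data row of the chosen column."""
    reader = csv.reader(StringIO(csv_text))
    next(reader)  # skip the header row
    return [(row[column], doc_name) for row in reader]


def demo_reduce(key, documents):
    """Collapse duplicate document names for one key."""
    return key, sorted(set(documents))


# Made-up documents; column 2 is the city.
docs = {
    "a.csv": "first,last,city\nAda,Lovelace,London\nAlan,Turing,London\n",
    "b.csv": "first,last,city\nEmmy,Noether,Kiel\nGrace,Hopper,London\n",
}

# Local stand-in for the shuffle: group documents by emitted value.
grouped_docs = defaultdict(list)
for name, text in docs.items():
    for value, doc in demo_map(name, text, column=2):
        grouped_docs[value].append(doc)

index = dict(demo_reduce(key, values) for key, values in grouped_docs.items())
print(index)
```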

# Question 2
## Submit MapReduce job to calculate inverted index
1. Use input_data: `cos://bucket/<path to CSV data>`
2. Submit MapReduce job with reduce and map functions as you used in homework 2, as follows

    `mapreduce = MapReduceServerlessEngine()`  
    `results = mapreduce.execute(input_data, inverted_map, inverted_index)`   
    `print(results)`

**Please attach:**  
Text file with all log messages Lithops printed to console during the execution. Make
sure log level is set to DEBUG mode.

#### Code:

In [28]:
mapreduce = MapReduceServerlessEngine()
params = {'column':2}
results = mapreduce.execute(input_data, inverted_map, inverted_reduce, params)
print(results)

2022-01-10 20:43:22,687 [INFO] lithops.config -- Lithops v2.5.9.dev0
2022-01-10 20:43:22,687 [DEBUG] lithops.config -- Config file not found
2022-01-10 20:43:22,688 [DEBUG] lithops.config -- Loading compute backend module: localhost
2022-01-10 20:43:22,689 [DEBUG] lithops.config -- Loading Storage backend module: localhost
2022-01-10 20:43:22,689 [DEBUG] lithops.storage.backends.localhost.localhost -- Creating Localhost storage client
2022-01-10 20:43:22,690 [INFO] lithops.storage.backends.localhost.localhost -- Localhost storage client created
2022-01-10 20:43:22,690 [DEBUG] lithops.localhost.localhost -- Creating Localhost compute client
2022-01-10 20:43:22,691 [INFO] lithops.localhost.localhost -- Localhost compute client created
2022-01-10 20:43:22,692 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 - Invoker initialized. Max workers: 1
2022-01-10 20:43:22,692 [DEBUG] lithops.executors -- Function executor for localhost created with ID: ffb656-1
2022-01-10 20:43:22,693 [INFO] litho

2022-01-10 20:43:28,068 [DEBUG] lithops.localhost.localhost -- Staring localhost job manager
2022-01-10 20:43:28,070 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobID A002 - Job invoked (0.002s) - Activation ID: ffb656-1-A002
2022-01-10 20:43:28,070 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A002 - Running 1 activations in the localhost worker
2022-01-10 20:43:28,070 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A002 - View execution logs at /private/var/folders/lg/pbb4ypp52kx1shc2q76nnnlc0000gn/T/lithops/logs/ffb656-1-A002.log
2022-01-10 20:43:28,072 [DEBUG] lithops.monitor -- ExecutorID ffb656-1 - Starting Storage job monitor
2022-01-10 20:43:28,075 [INFO] lithops.wait -- ExecutorID ffb656-1 - Getting results from functions
2022-01-10 20:43:29,228 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A002 - Execution finished
2022-01-10 20:43:29,229 [DEBUG] lithops.localhost.localhost -- Localhost job manager stopped
2022-01-10 20:4

2022-01-10 20:43:37,174 [DEBUG] lithops.job.job -- ExecutorID ffb656-1 | JobID A005 - Uploading data to the storage backend
2022-01-10 20:43:37,176 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A005 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 20:43:37,176 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobID A005 - Worker processes: 16 - Chunksize: 16
2022-01-10 20:43:37,177 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A005 - Putting job into localhost queue
2022-01-10 20:43:37,178 [DEBUG] lithops.localhost.localhost -- Staring localhost job manager
2022-01-10 20:43:37,179 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobID A005 - Job invoked (0.002s) - Activation ID: ffb656-1-A005
2022-01-10 20:43:37,179 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A005 - Running 1 activations in the localhost worker
2022-01-10 20:43:37,179 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A005 - View 

2022-01-10 20:43:46,277 [DEBUG] lithops.storage.storage -- Runtime metadata found in local memory cache
2022-01-10 20:43:46,278 [DEBUG] lithops.job.job -- ExecutorID ffb656-1 | JobID A008 - Serializing function and data
2022-01-10 20:43:46,281 [DEBUG] lithops.job.serialize -- Referenced modules: /Users/ah11500/opt/anaconda3/envs/yb1/lib/python3.8/site-packages/pandas/__init__.py
2022-01-10 20:43:46,281 [DEBUG] lithops.job.serialize -- Modules to transmit: None
2022-01-10 20:43:46,282 [DEBUG] lithops.job.job -- ExecutorID ffb656-1 | JobID A008 - Function and modules found in local cache
2022-01-10 20:43:46,283 [DEBUG] lithops.job.job -- ExecutorID ffb656-1 | JobID A008 - Uploading data to the storage backend
2022-01-10 20:43:46,284 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A008 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 20:43:46,284 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobID A008 - Worker processes: 16 - Chunksize: 16
2022-

2022-01-10 20:43:55,381 [DEBUG] lithops.future -- ExecutorID ffb656-1 | JobID A010 - Got output from call 00000 - Activation ID: 16eb15436eae
2022-01-10 20:43:55,382 [INFO] lithops.executors -- ExecutorID ffb656-1 - Cleaning temporary data
2022-01-10 20:43:55,384 [DEBUG] lithops.executors -- ExecutorID ffb656-1 - Finished getting results
2022-01-10 20:43:55,384 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A011 - Selected Runtime: python 
2022-01-10 20:43:55,385 [DEBUG] lithops.storage.storage -- Runtime metadata found in local memory cache
2022-01-10 20:43:55,385 [DEBUG] lithops.job.job -- ExecutorID ffb656-1 | JobID A011 - Serializing function and data
2022-01-10 20:43:55,388 [DEBUG] lithops.job.serialize -- Referenced modules: /Users/ah11500/opt/anaconda3/envs/yb1/lib/python3.8/site-packages/pandas/__init__.py
2022-01-10 20:43:55,388 [DEBUG] lithops.job.serialize -- Modules to transmit: None
2022-01-10 20:43:55,389 [DEBUG] lithops.job.job -- ExecutorID ffb656-1 | JobID A011

2022-01-10 20:44:02,631 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A013 - Execution finished
2022-01-10 20:44:02,632 [DEBUG] lithops.localhost.localhost -- Localhost job manager stopped
2022-01-10 20:44:03,478 [DEBUG] lithops.monitor -- ExecutorID ffb656-1 - Pending: 0 - Running: 0 - Done: 1
2022-01-10 20:44:03,479 [DEBUG] lithops.monitor -- ExecutorID ffb656-1 - Storage job monitor finished
2022-01-10 20:44:03,480 [DEBUG] lithops.future -- ExecutorID ffb656-1 | JobID A013 - Got status from call 00000 - Activation ID: e7f2e882a4b9 - Time: 0.24 seconds
2022-01-10 20:44:03,482 [DEBUG] lithops.future -- ExecutorID ffb656-1 | JobID A013 - Got output from call 00000 - Activation ID: e7f2e882a4b9
2022-01-10 20:44:03,482 [INFO] lithops.executors -- ExecutorID ffb656-1 - Cleaning temporary data
2022-01-10 20:44:03,484 [DEBUG] lithops.executors -- ExecutorID ffb656-1 - Finished getting results
2022-01-10 20:44:03,485 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID

2022-01-10 20:44:09,556 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A016 - Running 1 activations in the localhost worker
2022-01-10 20:44:09,556 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A016 - View execution logs at /private/var/folders/lg/pbb4ypp52kx1shc2q76nnnlc0000gn/T/lithops/logs/ffb656-1-A016.log
2022-01-10 20:44:09,558 [DEBUG] lithops.monitor -- ExecutorID ffb656-1 - Starting Storage job monitor
2022-01-10 20:44:09,558 [INFO] lithops.wait -- ExecutorID ffb656-1 - Getting results from functions
2022-01-10 20:44:10,705 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A016 - Execution finished
2022-01-10 20:44:10,705 [DEBUG] lithops.localhost.localhost -- Localhost job manager stopped
2022-01-10 20:44:11,573 [DEBUG] lithops.monitor -- ExecutorID ffb656-1 - Pending: 0 - Running: 0 - Done: 1
2022-01-10 20:44:11,574 [DEBUG] lithops.monitor -- ExecutorID ffb656-1 - Storage job monitor finished
2022-01-10 20:44:12,575 [DEBUG] lithop

2022-01-10 20:44:18,651 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobID A019 - Worker processes: 16 - Chunksize: 16
2022-01-10 20:44:18,652 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A019 - Putting job into localhost queue
2022-01-10 20:44:18,653 [DEBUG] lithops.localhost.localhost -- Staring localhost job manager
2022-01-10 20:44:18,654 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobID A019 - Job invoked (0.003s) - Activation ID: ffb656-1-A019
2022-01-10 20:44:18,655 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A019 - Running 1 activations in the localhost worker
2022-01-10 20:44:18,655 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A019 - View execution logs at /private/var/folders/lg/pbb4ypp52kx1shc2q76nnnlc0000gn/T/lithops/logs/ffb656-1-A019.log
2022-01-10 20:44:18,656 [DEBUG] lithops.monitor -- ExecutorID ffb656-1 - Starting Storage job monitor
2022-01-10 20:44:18,656 [INFO] lithops.wait -- ExecutorID ffb656-1 - G

2022-01-10 20:44:27,749 [DEBUG] lithops.job.job -- ExecutorID ffb656-1 | JobID A022 - Uploading data to the storage backend
2022-01-10 20:44:27,750 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A022 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 20:44:27,751 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobID A022 - Worker processes: 16 - Chunksize: 16
2022-01-10 20:44:27,752 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A022 - Putting job into localhost queue
2022-01-10 20:44:27,752 [DEBUG] lithops.localhost.localhost -- Staring localhost job manager
2022-01-10 20:44:27,755 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobID A022 - Job invoked (0.003s) - Activation ID: ffb656-1-A022
2022-01-10 20:44:27,755 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A022 - Running 1 activations in the localhost worker
2022-01-10 20:44:27,756 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A022 - Vi

2022-01-10 20:44:36,853 [DEBUG] lithops.job.serialize -- Referenced modules: None
2022-01-10 20:44:36,853 [DEBUG] lithops.job.serialize -- Modules to transmit: None
2022-01-10 20:44:36,854 [DEBUG] lithops.job.job -- ExecutorID ffb656-1 | JobID A025 - Function and modules found in local cache
2022-01-10 20:44:36,854 [DEBUG] lithops.job.job -- ExecutorID ffb656-1 | JobID A025 - Uploading data to the storage backend
2022-01-10 20:44:36,856 [INFO] lithops.invokers -- ExecutorID ffb656-1 | JobID A025 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 20:44:36,857 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobID A025 - Worker processes: 16 - Chunksize: 16
2022-01-10 20:44:36,858 [DEBUG] lithops.localhost.localhost -- ExecutorID ffb656-1 | JobID A025 - Putting job into localhost queue
2022-01-10 20:44:36,859 [DEBUG] lithops.localhost.localhost -- Staring localhost job manager
2022-01-10 20:44:36,860 [DEBUG] lithops.invokers -- ExecutorID ffb656-1 | JobI

[[('Haifa', ['myCSV19.csv', 'myCSV15.csv', 'myCSV13.csv', 'myCSV4.csv', 'myCSV8.csv', 'myCSV6.csv', 'myCSV1.csv', 'myCSV11.csv', 'myCSV10.csv', 'myCSV3.csv', 'myCSV17.csv', 'myCSV5.csv', 'myCSV16.csv', 'myCSV7.csv', 'myCSV12.csv', 'myCSV20.csv', 'myCSV14.csv', 'myCSV2.csv', 'myCSV18.csv'])], [('Hamburg', ['myCSV9.csv', 'myCSV6.csv', 'myCSV19.csv', 'myCSV7.csv', 'myCSV18.csv', 'myCSV20.csv', 'myCSV10.csv', 'myCSV11.csv', 'myCSV13.csv', 'myCSV12.csv', 'myCSV8.csv', 'myCSV14.csv', 'myCSV17.csv', 'myCSV5.csv', 'myCSV16.csv'])], [('Kiel', ['myCSV1.csv', 'myCSV4.csv', 'myCSV5.csv', 'myCSV14.csv', 'myCSV15.csv', 'myCSV11.csv', 'myCSV16.csv', 'myCSV2.csv', 'myCSV20.csv', 'myCSV9.csv', 'myCSV3.csv', 'myCSV12.csv', 'myCSV10.csv', 'myCSV7.csv', 'myCSV19.csv'])], [('London', ['myCSV1.csv', 'myCSV10.csv', 'myCSV17.csv', 'myCSV6.csv', 'myCSV20.csv', 'myCSV11.csv', 'myCSV3.csv', 'myCSV15.csv', 'myCSV14.csv', 'myCSV9.csv', 'myCSV19.csv', 'myCSV12.csv', 'myCSV2.csv'])], [('München', ['myCSV15.csv', 'my

# Question 3
## Shuffle

MapReduceServerlessEngine deploys both map and reduce tasks as serverless invocations.   
However, once the map stage completes, the results are transferred from the map tasks to the SQLite database located on the client machine (a laptop in your case). The shuffle is then performed locally, and the reduce tasks are invoked with the relevant parameters.

(To support your answers, feel free to use examples, Images, etc.)
<br><br>

**1. Explain why this approach is not efficient and what are cons and pros of such architecture in general. In broader scope you may assume that MapReduceServerlessEngine executed in some powerful machine and not just laptop.**

<u>Answer:</u>
<br>In our answer we will address the pros and cons of this approach.
<ul>
    <li><u>Pros-</u>
        <ul>
            <li>Low storage overhead: the intermediate results are kept in SQLite, which is an embedded, file-based database and therefore needs less computing power and memory than a database server.
                </li>
            <li>A cloud-side sort and shuffle requires a lot of communication between the serverless functions and the storage system, which reduces the system's throughput. In the approach described in the question, the sort and shuffle run locally, so the map and reduce functions are not slowed down by this traffic.
                </li>
            </ul>
        </li>
    <li><u>Cons-</u>
        <ul>
            <li>The map phase runs in the cloud. When it ends, the results are sent to the client, the sort and shuffle are performed there, and the shuffled data is sent back to the cloud for the reduce phase. The process therefore depends on the client machine, which usually has no backup, so it is not highly available.</li>
            <li>After the map phase, a large amount of intermediate data must travel over the network to the client and back before the reduce phase can start. If the process were fully serverless, far less data would reach the client, since it would receive only the aggregated results.
            </li>
          </ul>
        </li>  
</ul>
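
The data-transfer point can be made concrete with a little arithmetic. The sketch below is an approximation under stated assumptions: the function name `client_traffic` and the sizes are made up, and it assumes the grouped reduce input is about as large as the map output.

```python
def client_traffic(map_output_mb, final_result_mb, shuffle_in_cloud):
    """Approximate megabytes crossing the client's network link for one job."""
    if shuffle_in_cloud:
        # Only the final reduce results come back to the client.
        return final_result_mb
    # Map output is downloaded, sent back up as reduce input,
    # and the reduce results are downloaded afterwards.
    return 2 * map_output_mb + final_result_mb


# Hypothetical sizes, chosen only to make the arithmetic visible.
local_shuffle_mb = client_traffic(2000, 50, shuffle_in_cloud=False)
cloud_shuffle_mb = client_traffic(2000, 50, shuffle_in_cloud=True)
print(local_shuffle_mb, cloud_shuffle_mb)
```

With these made-up sizes, the client-side shuffle moves 4050 MB through the client's link, compared with 50 MB when the shuffle stays in the cloud.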

<br><br>
**2. Suggest how you can improve the shuffle so that intermediate data is not downloaded to the client at all and the shuffle is performed in the cloud as well. Explain the pros and cons of the approaches you suggest.**


We can make the MapReduceServerlessEngine fully serverless by using managed cloud services such as:
<br>S3 for storage and ElastiCache as an in-memory cache for the intermediate data, or another managed database that supports both reads and writes.
<br>In the original setup it was not practical to perform the sort and shuffle in the cloud with object storage alone, because objects in S3 cannot be updated in place (they can only be overwritten as a whole), and such updates are needed in the sort and shuffle phase.
<ul>
    <li><u>Pros-</u>
        <ul>
            <li>Scalable - In the proposed solution, cloud computing provides elastic computing power, which makes it easy to scale. With a local machine, we would have to replace the computer in order to scale.
                </li>
            <li>Pay-per-use - In the serverless solution you pay only when you query the data, while in the local solution you would have to buy a computer to run these operations.
                </li>
            <li>Allows access from anywhere.
                </li>
            </ul>
        </li>
    <li><u>Cons-</u>
        <ul>
            <li>Since the data is not stored locally, running the same query multiple times requires communicating with the server every time. This is time-consuming, increases costs, and loads the server. If the data were stored locally, none of this would happen.
                </li>
            </ul>
        </li>
    
</ul>
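
One possible design for a cloud-side shuffle is sketched below, under clearly stated assumptions. Each map task partitions its own output by a hash of the key and writes one object per reducer partition, and each reduce task reads only the objects of its partition. A plain dictionary (`object_store`) stands in for a bucket in an object storage service, and all names (`partition_of`, `map_task`, `reduce_task`, the `shuffle/part-…` key layout) are hypothetical. In a real deployment the map and reduce tasks would run as serverless functions and use the storage service's client instead.

```python
import zlib
from collections import defaultdict

object_store = {}  # stand-in for a bucket: object key -> list of pairs


def partition_of(key, num_reducers):
    # crc32 is stable across processes, unlike the built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(map_id, pairs, num_reducers):
    """Write this mapper's pairs into one object per reducer partition."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition_of(key, num_reducers)].append((key, value))
    for part, items in buckets.items():
        object_store[f"shuffle/part-{part}/map-{map_id}"] = items


def reduce_task(part):
    """Read every object of one partition and build its slice of the index."""
    grouped = defaultdict(set)
    for name, items in object_store.items():
        if name.startswith(f"shuffle/part-{part}/"):
            for key, value in items:
                grouped[key].add(value)
    return {key: sorted(values) for key, values in grouped.items()}


NUM_REDUCERS = 2
map_task(0, [("Haifa", "a.csv"), ("Kiel", "a.csv")], NUM_REDUCERS)
map_task(1, [("Haifa", "b.csv"), ("London", "b.csv")], NUM_REDUCERS)

# The client only merges the final, already reduced slices.
inverted_index = {}
for part in range(NUM_REDUCERS):
    inverted_index.update(reduce_task(part))
print(inverted_index)
```

Because every occurrence of a key hashes to the same partition, each reducer sees all values for its keys, and no intermediate pair ever passes through the client.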


<br><br>
**3. Can you make serverless shuffle?**


Yes, as explained in the answer to part 2 above.

<br><br><br><br>
Good Luck :) 