## Basic Map Reduce Operations

* We can create a range of values using the `to` operator, which includes both endpoints.

In [1]:
val range = (1 to 100)

Intitializing Scala interpreter ...

Spark Web UI available at http://192.168.1.138:4043
SparkContext available as 'sc' (version = 3.3.0, master = local[*], app id = local-1670361198781)
SparkSession available as 'spark'


range: scala.collection.immutable.Range.Inclusive = Range 1 to 100
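As a small sketch of how ranges behave, the snippet below contrasts `to` with `until` (which excludes the upper bound) and `by` (which sets the step). The variable names are illustrative and not part of the original notebook.

```scala
// `to` includes the upper bound
val inclusive = 1 to 5

// `until` excludes the upper bound
val exclusive = 1 until 5

// `by` sets the step between consecutive values
val stepped = 1 to 10 by 3
```

Calling `.toList` on each range makes the difference visible: `inclusive` ends at 5, `exclusive` ends at 4, and `stepped` advances in steps of 3.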


* We can convert a range to other collections such as `Array`, `List`, or `Set`.

In [2]:
val range_to_list = (1 to 100).toList

range_to_list: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)


In [3]:
val range_to_array = (1 to 100).toArray

range_to_array: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)


In [4]:
val range_to_set = (1 to 100).toSet

range_to_set: scala.collection.immutable.Set[Int] = Set(69, 88, 5, 10, 56, 42, 24, 37, 25, 52, 14, 20, 46, 93, 57, 78, 29, 84, 61, 89, 1, 74, 6, 60, 85, 28, 38, 70, 21, 33, 92, 65, 97, 9, 53, 77, 96, 13, 41, 73, 2, 32, 34, 45, 64, 17, 22, 44, 59, 27, 71, 12, 54, 49, 86, 81, 76, 7, 39, 98, 91, 66, 3, 80, 35, 48, 63, 18, 95, 50, 67, 16, 31, 11, 72, 43, 99, 87, 40, 26, 55, 23, 8, 75, 58, 82, 36, 30, 51, 19, 4, 79, 94, 47, 15, 68, 62, 90, 83, 100)


* Row-level transformations, such as filtering or converting each element, fall under the `Map` operation.

* Transformations that combine records, such as joins and aggregations, fall under the `Reduce` operation.

In [5]:
// Sum of the squares of all even numbers from 1 to 100

def main(args: Array[String]): Int = {
    val l = (1 to 100).toList

    val f = l.filter(ele => ele % 2 == 0)    // filter takes a one-argument predicate and keeps the matching elements

    val m = f.map(rec => rec * rec)    // map takes a one-argument function and applies it to every element

    // Equivalent shortcut: m.sum
    val r = m.reduce((total, ele) => total + ele)    // reduce takes a two-argument function and combines the elements

    r    // the last expression is the result, so no `return` keyword is needed
}

main: (args: Array[String])Int


In [6]:
main(Array(""))

res0: Int = 171700
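The same computation can be sketched as one chained expression, without the intermediate values. The snippet also shows `foldLeft`, which takes an explicit start value and therefore works on an empty collection, whereas `reduce` throws an exception when there is nothing to combine. The names `sumOfEvenSquares` and `emptyTotal` are illustrative.

```scala
// Chain filter, map and reduce in a single expression
val sumOfEvenSquares = (1 to 100).filter(n => n % 2 == 0).map(n => n * n).reduce((total, n) => total + n)

// foldLeft starts from 0, so an empty list simply yields 0
val emptyTotal = List.empty[Int].foldLeft(0)((total, n) => total + n)
```

`sumOfEvenSquares` evaluates to 171700, matching the result of `main` above.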


### Sorting Sequence

* We can use the `sorted` method to sort sequences.

In [7]:
val l = Set(100, 50, 0, -3, 8 , 11)

l: scala.collection.immutable.Set[Int] = Set(0, -3, 50, 11, 8, 100)


In [8]:
// A plain Set is unordered and has no `sorted` method, so this fails to compile

l.sorted

<console>: 28: error: value sorted is not a member of scala.collection.immutable.Set[Int]

In [9]:
// Convert the Set to an Array, then sort it

l.toArray.sorted

res2: Array[Int] = Array(-3, 0, 8, 11, 50, 100)


In [10]:
// Convert the Set to a List, then sort it

l.toList.sorted

res3: List[Int] = List(-3, 0, 8, 11, 50, 100)
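Beyond the default ascending order, a few related options are sketched below: a reversed `Ordering` for descending order, `sortBy` for sorting on a derived key, and `SortedSet` for a set that keeps its elements ordered. The variable names are illustrative.

```scala
import scala.collection.immutable.SortedSet

val values = Set(100, 50, 0, -3, 8, 11)

// Descending order: pass a reversed Ordering to `sorted`
val descending = values.toList.sorted(Ordering[Int].reverse)

// Sort by a derived key, here the absolute value of each element
val byAbsoluteValue = values.toList.sortBy(n => math.abs(n))

// A SortedSet keeps its elements in order, so no separate sort step is needed
val alwaysSorted = SortedSet.empty[Int] ++ values
```

Here `descending` starts at 100 and ends at -3, while `byAbsoluteValue` places 0 first and -3 second.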


### Problem: Find the total revenue for the order with order id 2

In [11]:
import scala.io.Source

// Read file and convert it into list
val orderItems = Source.fromFile("data/retail_db/order_items/part-00000").getLines.toList

import scala.io.Source
orderItems: List[String] = List(1,1,957,1,299.98,299.98, 2,2,1073,1,199.99,199.99, 3,2,502,5,250.0,50.0, 4,2,403,1,129.99,129.99, 5,4,897,2,49.98,24.99, 6,4,365,5,299.95,59.99, 7,4,502,3,150.0,50.0, 8,4,1014,4,199.92,49.98, 9,5,957,1,299.98,299.98, 10,5,365,5,299.95,59.99, 11,5,1014,2,99.96,49.98, 12,5,957,1,299.98,299.98, 13,5,403,1,129.99,129.99, 14,7,1073,1,199.99,199.99, 15,7,957,1,299.98,299.98, 16,7,926,5,79.95,15.99, 17,8,365,3,179.97,59.99, 18,8,365,5,299.95,59.99, 19,8,1014,4,199.92,49.98, 20,8,502,1,50.0,50.0, 21,9,191,2,199.98,99.99, 22,9,1073,1,199.99,199.99, 23,9,1073,1,199.99,199.99, 24,10,1073,1,199.99,199.99, 25,10,1014,2,99.96,49.98, 26,10,403,1,129.99,129.99, 27,10,917,1,21.99,21.99, 28,10,1073,1,199.99,199.99, 29,11,365,1,59.99,59.99, 30,11,627,...
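`Source.fromFile` opens a file handle that the cell above never closes. A minimal sketch of a helper that closes it is shown here; `readLines` is a hypothetical name, not part of the notebook or of the Scala standard library.

```scala
import scala.io.Source

// Hypothetical helper: read every line of a text file, then close the handle
def readLines(path: String): List[String] = {
  val source = Source.fromFile(path)
  try source.getLines().toList   // materialize the lines before the source is closed
  finally source.close()
}
```

With this helper, the read step could be written as `readLines("data/retail_db/order_items/part-00000")`.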


In [12]:
// Keep only the items whose order id (column index 1) is 2
val orderItemsFilter = orderItems.filter(orderItem => orderItem.split(",")(1).toInt == 2)

orderItemsFilter: List[String] = List(2,2,1073,1,199.99,199.99, 3,2,502,5,250.0,50.0, 4,2,403,1,129.99,129.99)


In [13]:
// Extract the subtotal (column index 4) of each item in order id 2
val orderItemsMap = orderItemsFilter.map(orderItem => orderItem.split(",")(4).toFloat)

orderItemsMap: List[Float] = List(199.99, 250.0, 129.99)


In [14]:
// Total Revenue using sum method
orderItemsMap.sum

res4: Float = 579.98


In [15]:
// Total Revenue using Reduce Operation
orderItemsMap.reduce((total, orderItemSubtotal) => total + orderItemSubtotal)

res5: Float = 579.98
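Positional column indexes are easy to mix up, so one option is to parse each line into a small case class first. The sketch below assumes the six columns are item id, order id, product id, quantity, subtotal, and product price, which is inferred from the sample rows (for example, quantity 5 at 50.0 gives a subtotal of 250.0). `OrderItem`, `parse`, and the inline `sample` rows are illustrative, and `BigDecimal` replaces `Float` to avoid rounding in money arithmetic.

```scala
// Hypothetical record type; the field names are assumptions based on the sample rows
case class OrderItem(itemId: Int, orderId: Int, productId: Int,
                     quantity: Int, subtotal: BigDecimal, productPrice: BigDecimal)

// Parse one comma-separated line into an OrderItem
def parse(line: String): OrderItem = {
  val cols = line.split(",")
  OrderItem(cols(0).toInt, cols(1).toInt, cols(2).toInt,
            cols(3).toInt, BigDecimal(cols(4)), BigDecimal(cols(5)))
}

// A few rows copied from the notebook output above
val sample = List(
  "1,1,957,1,299.98,299.98",
  "2,2,1073,1,199.99,199.99",
  "3,2,502,5,250.0,50.0",
  "4,2,403,1,129.99,129.99"
)

val items = sample.map(parse)

// Map phase: keep the items of order id 2 and pick out their subtotals
val subtotalsForOrder2 = items.filter(item => item.orderId == 2).map(item => item.subtotal)

// Reduce phase: add the subtotals together
val revenueForOrder2 = subtotalsForOrder2.reduce((total, subtotal) => total + subtotal)

// Revenue for every order at once: group by order id, then sum each group
val revenueByOrder = items.groupBy(item => item.orderId).map {
  case (orderId, group) => orderId -> group.map(item => item.subtotal).sum
}
```

`revenueForOrder2` comes to 579.98, matching the `Float` result above, and `revenueByOrder` extends the same idea to all orders in the sample.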


#### Solution using the `_` placeholder syntax

In [16]:
import scala.io.Source

// Read file and convert it into list
val orderItems = Source.fromFile("data/retail_db/order_items/part-00000").getLines.toList

import scala.io.Source
orderItems: List[String] = List(1,1,957,1,299.98,299.98, 2,2,1073,1,199.99,199.99, 3,2,502,5,250.0,50.0, 4,2,403,1,129.99,129.99, 5,4,897,2,49.98,24.99, 6,4,365,5,299.95,59.99, 7,4,502,3,150.0,50.0, 8,4,1014,4,199.92,49.98, 9,5,957,1,299.98,299.98, 10,5,365,5,299.95,59.99, 11,5,1014,2,99.96,49.98, 12,5,957,1,299.98,299.98, 13,5,403,1,129.99,129.99, 14,7,1073,1,199.99,199.99, 15,7,957,1,299.98,299.98, 16,7,926,5,79.95,15.99, 17,8,365,3,179.97,59.99, 18,8,365,5,299.95,59.99, 19,8,1014,4,199.92,49.98, 20,8,502,1,50.0,50.0, 21,9,191,2,199.98,99.99, 22,9,1073,1,199.99,199.99, 23,9,1073,1,199.99,199.99, 24,10,1073,1,199.99,199.99, 25,10,1014,2,99.96,49.98, 26,10,403,1,129.99,129.99, 27,10,917,1,21.99,21.99, 28,10,1073,1,199.99,199.99, 29,11,365,1,59.99,59.99, 30,11,627,...


In [17]:
// Keep only the items whose order id (column index 1) is 2
val orderItemsFilter = orderItems.filter(_.split(",")(1).toInt == 2)

orderItemsFilter: List[String] = List(2,2,1073,1,199.99,199.99, 3,2,502,5,250.0,50.0, 4,2,403,1,129.99,129.99)


In [18]:
// Extract the subtotal (column index 4) of each item in order id 2
val orderItemsMap = orderItemsFilter.map(_.split(",")(4).toFloat)

orderItemsMap: List[Float] = List(199.99, 250.0, 129.99)


In [19]:
// Total Revenue using Reduce Operation
orderItemsMap.reduce(_ + _)

res6: Float = 579.98
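As a rough guide to the placeholder syntax: each `_` stands for the next parameter of the anonymous function, in order, so it applies only when every parameter appears exactly once. The names in this sketch are illustrative.

```scala
val nums = List(1, 2, 3, 4)

// One parameter, used once: `_ * 2` is short for `n => n * 2`
val doubled = nums.map(_ * 2)

// Two parameters, each used once: `_ + _` is short for `(a, b) => a + b`
val total = nums.reduce(_ + _)

// A parameter used twice needs a name; `_ * _` would expect two parameters
val squares = nums.map(n => n * n)
```

This is why `reduce(_ + _)` works for the revenue total, while squaring an element still needs the explicit `n => n * n` form.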
