## Scala Record IO
In [image_io](https://github.com/dmlc/mxnet-notebooks/blob/master/scala/basic/image_io_scala.ipynb) we learned how to pack images into the standard RecordIO format and load them with `ImageRecordIter`. This tutorial walks through the Scala interface for reading and writing RecordIO files. It is useful when you need more control over the details of the data pipeline: for example, when you need to augment images and labels together for detection and segmentation, or when you need a custom data iterator for triplet sampling and negative sampling.

You can find the relevant code [here](https://github.com/dmlc/mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/RecordIO.scala). There are two classes: [MXRecordIO](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.MXRecordIO), which supports sequential read and write, and [MXIndexedRecordIO](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.MXIndexedRecordIO), which supports random read and sequential write.

## Jupyter Scala kernel
Add the MXNet Scala jar, which is built as part of the MXNet Scala package installation, to the classpath as follows:

**Note**: The process for adding this jar to the classpath can differ depending on the Scala kernel you are using.

We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook.

```scala
// classpath.addPath(<path_to_jar>), for example:
classpath.addPath("mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar")
```

## MXRecordIO
First, let's take a look at `MXRecordIO`. It takes the path to a RecordIO file and an `MXRecordIO.IOFlag` as input: `MXRecordIO.IORead` for reading or `MXRecordIO.IOWrite` for writing.

We create a temporary file and write 5 strings to it with the `MXRecordIO.IOWrite` flag:

```scala
import ml.dmlc.mxnet._
import java.io._

val fRec = File.createTempFile("tmpFile", ".tmp")
val N = 5

val writer = new MXRecordIO(fRec.getAbsolutePath, MXRecordIO.IOWrite)
for (i <- 0 until N) {
  writer.write("record_" + i)
}
writer.close()
```



Then we can read the records back by opening the same file with the `MXRecordIO.IORead` flag:

```scala
val reader = new MXRecordIO(fRec.getAbsolutePath, MXRecordIO.IORead)
for (i <- 0 until N) {
  val res = reader.read()
  println(res)
}
// Release the underlying file handle when done
reader.close()
```

```
record_0
record_1
record_2
record_3
record_4
```

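Because `MXRecordIO` only supports sequential access, records must be read one after another. The plain-Scala sketch below (no MXNet classes) illustrates why. It assumes the dmlc RecordIO framing, where each record is a 4-byte magic number (`0xced7230a`), a 4-byte length word whose upper 3 bits are a continuation flag, and the payload padded to a 4-byte boundary. `RecordFramingSketch` is a hypothetical name; the layout is simplified (multi-part records are ignored, little-endian byte order is assumed), and this is not MXNet's implementation.

```scala
import java.io.ByteArrayOutputStream
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical, simplified sketch of RecordIO-style framing (not MXNet's code).
// Assumed layout per record, little-endian:
//   [magic: 4 bytes][length word: 4 bytes][payload][zero padding to a 4-byte boundary]
object RecordFramingSketch {
  val Magic: Int = 0xced7230aL.toInt     // assumed dmlc RecordIO magic number
  private val LengthMask = (1 << 29) - 1 // lower 29 bits hold the payload length

  // Frame one payload; the 3 continuation-flag bits stay 0 (single-part record).
  def frame(payload: Array[Byte]): Array[Byte] = {
    val paddedLen = (payload.length + 3) / 4 * 4
    val buf = ByteBuffer.allocate(8 + paddedLen).order(ByteOrder.LITTLE_ENDIAN)
    buf.putInt(Magic)
    buf.putInt(payload.length)
    buf.put(payload) // remaining bytes are already zero and act as padding
    buf.array()
  }

  // Sequential read: the start of record k+1 is known only after decoding record k.
  def readAll(bytes: Array[Byte]): List[Array[Byte]] = {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val out = List.newBuilder[Array[Byte]]
    while (buf.remaining() >= 8) {
      require(buf.getInt() == Magic, "bad magic number")
      val len = buf.getInt() & LengthMask
      val payload = new Array[Byte](len)
      buf.get(payload)
      buf.position(buf.position() + (4 - len % 4) % 4) // skip padding
      out += payload
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val bos = new ByteArrayOutputStream()
    for (i <- 0 until 5) bos.write(frame(s"record_$i".getBytes(UTF_8)))
    readAll(bos.toByteArray).foreach(p => println(new String(p, UTF_8)))
  }
}
```

Since a reader only learns where the next record begins after decoding the current one, reaching record `k` means scanning every record before it. That is the limitation an index file removes.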
## MXIndexedRecordIO
Sometimes you need random access for more complex tasks, which is what `MXIndexedRecordIO` is designed for. Here we create an indexed record file and a corresponding index file (both as temporary files):

```scala
val fIdxRec = File.createTempFile("tmpIdxFile", ".tmp")
val fIdx = File.createTempFile("tmpIdx", ".tmp")
val N = 5

// Arguments: index file path, record file path, IO flag
val writer = new MXIndexedRecordIO(fIdx.getAbsolutePath, fIdxRec.getAbsolutePath, MXRecordIO.IOWrite)
for (i <- 0 until N) {
  writer.writeIdx(i, "record_" + i)
}
writer.close()
```

We can then access records by key, in any order:

```scala
val reader = new MXIndexedRecordIO(fIdx.getAbsolutePath, fIdxRec.getAbsolutePath, MXRecordIO.IORead)
// keys() returns Iterable[Any]; the keys written above are Ints
val keys = reader.keys().map(_.asInstanceOf[Int]).toList.sorted
assert(keys == (0 until N).toList)
// Read the records in a random order to demonstrate random access
for (k <- scala.util.Random.shuffle(keys)) {
  val res = reader.readIdx(k)
  println(res)
}
```

Output from one run (the order is random):

```
record_1
record_4
record_3
record_0
record_2
```

You can list all keys with:

```scala
reader.keys
```

```
res5: Iterable[Any] = Set(0, 1, 2, 3, 4)
```

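The index file is what makes random access possible: it records, for each key, the byte offset where that record starts. The plain-Scala sketch below shows the idea under stated assumptions. The index is written as one `key<TAB>offset` line per record (this mirrors the Python `MXIndexedRecordIO` and is assumed, not confirmed, for the Scala package), and records are simplified to a 4-byte length followed by the payload instead of the real RecordIO framing. `IndexedRecordSketch` and its methods are hypothetical.

```scala
import java.io.{File, PrintWriter, RandomAccessFile}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

// Hypothetical sketch of an indexed record file (not MXNet's implementation).
// Data file: records simplified to [length: 4 bytes][UTF-8 payload].
// Index file: one "key<TAB>byteOffset" line per record (assumed format).
object IndexedRecordSketch {
  def write(data: File, idx: File, records: Seq[(Int, String)]): Unit = {
    val raf = new RandomAccessFile(data, "rw")
    val index = new PrintWriter(idx, "UTF-8")
    try {
      raf.setLength(0) // start from an empty data file
      for ((key, value) <- records) {
        index.println(s"$key\t${raf.getFilePointer}") // offset where this record starts
        val bytes = value.getBytes(UTF_8)
        raf.writeInt(bytes.length)
        raf.write(bytes)
      }
    } finally {
      index.close()
      raf.close()
    }
  }

  // Load the whole index into memory: key -> byte offset.
  def loadIndex(idx: File): Map[Int, Long] = {
    val src = Source.fromFile(idx, "UTF-8")
    try {
      src.getLines().map { line =>
        val parts = line.split('\t')
        parts(0).toInt -> parts(1).toLong
      }.toMap
    } finally src.close()
  }

  // Random access: seek straight to the record instead of scanning from the start.
  def readAt(data: File, offset: Long): String = {
    val raf = new RandomAccessFile(data, "r")
    try {
      raf.seek(offset)
      val bytes = new Array[Byte](raf.readInt())
      raf.readFully(bytes)
      new String(bytes, UTF_8)
    } finally raf.close()
  }

  def main(args: Array[String]): Unit = {
    val data = File.createTempFile("sketch", ".rec")
    val idx = File.createTempFile("sketch", ".idx")
    write(data, idx, (0 until 5).map(i => i -> s"record_$i"))
    val index = loadIndex(idx)
    for (k <- scala.util.Random.shuffle(index.keys.toList)) println(readAt(data, index(k)))
  }
}
```

Keeping the index in memory trades a small amount of RAM for constant-time lookups, which is what makes shuffled or sampled reads (for example, triplet sampling) cheap.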
## Packing and Unpacking Data
Each record in a .rec file can contain arbitrary binary data, but machine learning data typically has a label/data structure. `MXRecordIO` also provides two utility functions for packing such data: `pack` and `unpack`.

### Binary Data
`pack` and `unpack` store a 1-D array of float labels together with binary data, as shown in the following example.

The `IRHeader` class takes `flag`, `label`, `id` and `id2` as parameters.

`pack` takes a header of type `IRHeader` (the header of the image record) and the string to pack, and returns the resulting packed string.

`unpack` takes a string buffer returned by `MXRecordIO.read` and returns the header of type `IRHeader` together with the unpacked string.

```scala
val data = "data"

// Header with a single float label
val label1 = Array(1f)
val header1 = MXRecordIO.IRHeader(0, label1, 1, 0)

// Header with a 3-element float label array
val label2 = Array(1f, 2f, 3f)
val header2 = MXRecordIO.IRHeader(0, label2, 2, 0)

// pack combines the header, labels and data into one packed string
val s1 = MXRecordIO.pack(header1, data)
val s2 = MXRecordIO.pack(header2, data)
```

```scala
// unpack recovers the header and the original data
val (rHeader1, rContent1) = MXRecordIO.unpack(s1)
val (rHeader2, rContent2) = MXRecordIO.unpack(s2)
```

```
rHeader1: MXRecordIO.IRHeader = IRHeader(1, Array(1.0F), 1, 0)
rContent1: String = "data"
rHeader2: MXRecordIO.IRHeader = IRHeader(3, Array(1.0F, 2.0F, 3.0F), 2, 0)
rContent2: String = "data"
```

Note that although both headers were created with `flag = 0`, the unpacked headers have `flag` set to the number of label values (1 and 3).

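To make the packed layout concrete, here is a plain-Scala sketch of a pack/unpack pair. It assumes the layout used by MXNet's Python `recordio.pack`: a 24-byte header (`flag` as a 4-byte unsigned int, a scalar `label` float, `id` and `id2` as 8-byte ints), followed by `flag` label floats when `flag > 0`, then the payload. This agrees with the output above, where `flag` equals the label count after unpacking, but the Scala package's exact byte layout is not confirmed here. `Header` and `PackSketch` are hypothetical names, and the sketch always stores labels as an array.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for MXRecordIO.IRHeader (id fields assumed 64-bit).
final case class Header(flag: Int, label: Array[Float], id: Long, id2: Long)

// Sketch of the assumed pack layout (little-endian), modeled on MXNet's Python recordio:
//   flag (4) | scalar label slot (4) | id (8) | id2 (8) | label floats (4 * flag) | payload
object PackSketch {
  val HeaderSize: Int = 4 + 4 + 8 + 8

  def pack(h: Header, payload: Array[Byte]): Array[Byte] = {
    val n = h.label.length
    val buf = ByteBuffer.allocate(HeaderSize + 4 * n + payload.length).order(ByteOrder.LITTLE_ENDIAN)
    buf.putInt(n)    // flag := number of label floats stored after the header
    buf.putFloat(0f) // scalar slot unused because labels are stored as an array
    buf.putLong(h.id)
    buf.putLong(h.id2)
    h.label.foreach(v => buf.putFloat(v))
    buf.put(payload)
    buf.array()
  }

  def unpack(bytes: Array[Byte]): (Header, Array[Byte]) = {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val flag = buf.getInt()
    val scalar = buf.getFloat()
    val id = buf.getLong()
    val id2 = buf.getLong()
    // flag > 0: read that many floats; otherwise fall back to the scalar slot
    val label = if (flag > 0) Array.fill(flag)(buf.getFloat()) else Array(scalar)
    val payload = new Array[Byte](buf.remaining())
    buf.get(payload)
    (Header(flag, label, id, id2), payload)
  }

  def main(args: Array[String]): Unit = {
    val (h, p) = unpack(pack(Header(0, Array(1f, 2f, 3f), 2L, 0L), "data".getBytes(UTF_8)))
    println(s"flag=${h.flag} label=${h.label.mkString(",")} id=${h.id} data=${new String(p, UTF_8)}")
  }
}
```

Storing the label count in `flag` lets `unpack` know how many floats to read before the payload starts, so variable-length labels (such as bounding boxes for detection) need no separate length field.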
## Next Step
- [Advanced Image IO](https://github.com/dmlc/mxnet-notebooks/blob/master/scala/basic/advanced_img_io.ipynb): advanced image IO for detection, segmentation, etc.