# Streaming

Rationalizing the flow of data between different sources.

Sources of streamed data include:

- strings
- string collections
- files
- other streams (network, memory, blobs)

There are also wrappers which may contain, package or deliver these, e.g.:

- folders
- repositories
- archives
- crypto packages

When we work with these, we might need to distinguish different content types:

- binary
- text
- zipped
- encrypted
- JSON
- XML
- media

There are thousands of registered web content types (MIME types), and we clearly don't need to complicate things by handling them all.
<pre>
Let's regard these as the use cases we need to accommodate, and see whether they fit together:

 File system --> Folder      --> File    \                       /  File     --> Folder      --> File system       
Blob Storage --> Folder      --> Blob     \                     /   Blob     --> Folder      --> Blob Storage       
     Network --> Resource    --> Stream    \                   /    Stream   --> Resource    --> Network     
    Internet --> url         --> Response   == Source / Sink ==     Response --> url         --> Internet      
    Database --> Query       --> Response  /                   \    Response --> Query       --> Database       
                 Environment --> String   /                     \   String   --> Environment      
                 Memory      --> Object  /                       \  Object   --> Memory                 
                 Console     --> String /                         \ String   --> Console     

The potential inputs mirror the outputs, so it might be more meaningful to layer this:

     Folder     Folder                          
      File       Blob      Network    Internet    Console   Database  Memory   Environment  
        |          |          |           |          |         |        |          |   
      ------------------------ stream -----------------------------    object-- string -----
                                  |                                             |
                                 ------------------------------------------------- 
                                                  Source / Target
                                 -------------------------------------------------
                                   |       |      |      |        |          |
                                Binary   Text   Json    Xml    Zipped    Encrypted      
                                                         unzip <--|          |
                                                           zip -->|          |
                                                                  decrypt <--| 
                                                                  encrypt -->|           |
</pre>

## 🤔 Thoughts

There would be great advantages to using a consistent interface to move streaming data between different locations with a minimum of handling. Clearly there are several commonalities among the elements above, including:

- As sources, may be single or collections
- Behave either as a stream or as a string
- Are good async candidates (except memory-resident strings)
- Can be consumed as targets and delivered as sources without regard for the actual format
- Can be converted between formats by operations that typically behave as streams
- May be chunked arbitrarily in numerous ways, e.g. line breaks, packet sizes, buffer sizes, bytes 
- All support basic stream ops: seek & read (if can be read), write & clear (if can be written)
- As targets, may be single or collections
- In common use, may be passed through from stage to stage in asynchronous or synchronous streaming
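
Because the content is the same however it is chunked, each stage can pick whatever chunking suits it. A minimal Python sketch, assuming a binary stream; the helper names `fixed_chunks` and `line_chunks` are hypothetical:

```python
import io
from typing import BinaryIO, Iterator


def fixed_chunks(stream: BinaryIO, size: int) -> Iterator[bytes]:
    """Yield blocks of at most `size` bytes until the stream is exhausted."""
    while True:
        block = stream.read(size)
        if not block:
            break
        yield block


def line_chunks(stream: BinaryIO) -> Iterator[bytes]:
    """Yield one line at a time; each line keeps its trailing newline."""
    for line in stream:
        yield line


data = b"alpha\nbeta\ngamma\n"
by_size = list(fixed_chunks(io.BytesIO(data), 8))
by_line = list(line_chunks(io.BytesIO(data)))
print(by_size)  # [b'alpha\nbe', b'ta\ngamma', b'\n']
print(by_line)  # [b'alpha\n', b'beta\n', b'gamma\n']
```

Joining either list gives back the original bytes, which is what lets chunk boundaries stay an internal detail of each stage.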

The class used as a source would need to be initialized in different ways depending on the source type, but could then present a uniform interface for read operations.
The class being used as a target would likewise be initialized ahead of the data flow, and would present a consistent interface for receiving the data.

There is a definite use case for an instance being targeted by one step of a process, then being used by another step as a source. In the context of async streaming, this would point towards a single object being both a source and a target, BUT this violates the exclusive lock required by most persistence systems, so if this is ever needed it should be achieved using asynchronous concurrent queues or a similar technology.
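
A minimal sketch of the queue-based alternative, using Python's `asyncio.Queue` as the concurrent queue (an assumption; any async queue would do). The stage names and the `None` end-of-data sentinel are illustrative conventions, not part of the design:

```python
import asyncio
from typing import List, Optional


async def writer_stage(queue: "asyncio.Queue[Optional[str]]", items: List[str]) -> None:
    """Upstream step: uses the queue as its target."""
    for item in items:
        await queue.put(item)
    await queue.put(None)  # sentinel: nothing more will be written


async def reader_stage(queue: "asyncio.Queue[Optional[str]]", out: List[str]) -> None:
    """Downstream step: uses the same queue as its source."""
    while (item := await queue.get()) is not None:
        out.append(item.upper())


async def pipeline(items: List[str]) -> List[str]:
    # A bounded queue makes the writer wait whenever the reader falls behind
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue(maxsize=2)
    out: List[str] = []
    await asyncio.gather(writer_stage(queue, items), reader_stage(queue, out))
    return out


result = asyncio.run(pipeline(["a", "b", "c"]))
print(result)  # ['A', 'B', 'C']
```

The queue is written by one step and read by the next without any persisted resource being locked for both roles at once.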

For processing between stages, we would need a receiver and a transmitter, and an operation to perform in transit. The interfaces for the first two are the same as those needed for the sources and targets described above; the operation could be completely generic, supplied as a lambda.
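
The receiver / operation / transmitter split can be sketched in a few lines of Python. This is an illustration under assumptions: the receiver is any iterable of chunks, the transmitter is any callable accepting a chunk, and the name `process` is hypothetical:

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def process(receive: Iterable[T], operation: Callable[[T], U],
            transmit: Callable[[U], None]) -> int:
    """Pull each chunk from the receiver, apply the operation in transit,
    and hand the result to the transmitter. Returns the chunk count."""
    count = 0
    for chunk in receive:
        transmit(operation(chunk))
        count += 1
    return count


sent: list = []
n = process(["hello\n", "world\n"], lambda s: s.upper(), sent.append)
print(n, sent)  # 2 ['HELLO\n', 'WORLD\n']
```

Nothing in `process` knows about files, blobs or networks; that knowledge stays in whatever supplies `receive` and `transmit`.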



This looks like it could be consistent between the types we want to handle. The separation of the Source and Target has a problem, though. Typically we would be using some objects as interim storage between stages in a process, and would want to read the data from something we have written to without incurring the overhead of rebuilding the object's parameters. Perhaps we can simply change its mode. This would be feasible as long as the act of changing over releases the underlying stream resources.

The implicit rule would be that the object is connectable only as a source or as a target at any one time, never both. This implies a third state: disconnected.
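
A minimal Python sketch of this rule, assuming an in-memory medium. `MemoryBin`, `connect` and `disconnect` are hypothetical names; the point shown is that every mode change first releases the stream and keeps what was written:

```python
import io
from enum import Enum
from typing import Optional


class Mode(Enum):
    DISCONNECTED = 0
    READABLE = 1
    WRITABLE = 2


class MemoryBin:
    """Hypothetical in-memory bin: connectable as source OR target, never both."""

    def __init__(self) -> None:
        self._data = b""
        self._stream: Optional[io.BytesIO] = None
        self.mode = Mode.DISCONNECTED

    def connect(self, mode: Mode) -> None:
        self.disconnect()  # release whatever was open before changing mode
        if mode is Mode.READABLE:
            self._stream = io.BytesIO(self._data)
        elif mode is Mode.WRITABLE:
            self._stream = io.BytesIO()
        self.mode = mode

    def disconnect(self) -> None:
        if self._stream is not None:
            if self.mode is Mode.WRITABLE:
                self._data = self._stream.getvalue()  # keep what was written
            self._stream.close()
            self._stream = None
        self.mode = Mode.DISCONNECTED

    def write(self, data: bytes) -> None:
        if self.mode is not Mode.WRITABLE:
            raise RuntimeError("bin is not connected for writing")
        self._stream.write(data)

    def read(self, size: int = -1) -> bytes:
        if self.mode is not Mode.READABLE:
            raise RuntimeError("bin is not connected for reading")
        return self._stream.read(size)


interim = MemoryBin()
interim.connect(Mode.WRITABLE)
interim.write(b"interim result")
interim.connect(Mode.READABLE)  # switching mode releases the writer first
print(interim.read())           # b'interim result'
interim.disconnect()
print(interim.mode)             # Mode.DISCONNECTED
```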

This might be simpler:

```mermaid
classDiagram

direction BT

class IBin{
  <<interface>>
  Read()
  Write()
  Seek()
  +Mode
}

IBin <|.. Bin
IBin --> Mode

IDisposable <|.. Bin

class Bin{
  <<abstract>>
  Dispose()
}
class Mode {
<<Enumeration>>
    Disconnected
    Readable
    Writable
}

BlobBin --|> Bin
DataBin --|> Bin
FileBin --|> Bin
StringBin --|> Bin
EnvironmentBin --|> Bin
NetworkBin --|> Bin
InternetBin --|> Bin
```

## 💡 Possibilities

So what we need is an underlying disposable abstract class with concrete versions for each of our desired source types, all of which implement the simple IBin interface encapsulating Read/Write/Seek operations. Each can be in one of three states: connected for reading, connected for writing, or disconnected.
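
A Python sketch of the abstract-base idea, assuming Python's `with` protocol as the stand-in for IDisposable. `Bin`, `StringBin` and `FileBin` echo the diagram's names, but the `_open` hook is a hypothetical way of letting each concrete bin initialize its own medium while sharing read/write/seek:

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod
from typing import TextIO


class Bin(ABC):
    """Abstract, disposable bin; the `with` protocol stands in for IDisposable."""

    def __init__(self) -> None:
        self._stream = self._open()  # each concrete bin opens its own medium

    @abstractmethod
    def _open(self) -> TextIO:
        ...

    def read(self, size: int = -1) -> str:
        return self._stream.read(size)

    def write(self, text: str) -> int:
        return self._stream.write(text)

    def seek(self, offset: int) -> int:
        return self._stream.seek(offset)

    def dispose(self) -> None:
        self._stream.close()

    def __enter__(self) -> "Bin":
        return self

    def __exit__(self, *exc) -> None:
        self.dispose()


class StringBin(Bin):
    def __init__(self, text: str = "") -> None:
        self._text = text
        super().__init__()

    def _open(self) -> TextIO:
        return io.StringIO(self._text)


class FileBin(Bin):
    def __init__(self, path: str, mode: str = "r") -> None:
        self._path, self._mode = path, mode
        super().__init__()

    def _open(self) -> TextIO:
        return open(self._path, self._mode, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with FileBin(path, "w") as target:
        target.write("same interface")
    with FileBin(path) as source, StringBin() as copy:
        copy.write(source.read())
        copy.seek(0)
        text = copy.read()
print(text)  # same interface
```

Only construction differs between the two concrete bins; once built, the copying code cannot tell a file from a string.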

It then becomes feasible to use a generic processor to operate between a reader and a writer, with a consistent interface:

```mermaid
classDiagram

class IBinProcessor{
    <<interface>>
    Execute(source, target) BinReport
    Cancel()
    GetProgress() BinReport
    -IBin source
    -IBin target
    -BinMonitor[] monitors
}
class BinProcessor{
    <<virtual>>

}

BinProcessor ..|> IBinProcessor
BinProcessor o-- BinReport

class BinReport{
    +Status
    +Info
}

class IBinMonitor{
    <<interface>>
    ReportStatus() BinReport
}

class BinMonitor{
    -BinProcessor
    -Digest
}

BinProcessor "1" o-- "0..*" BinMonitor
BinMonitor ..|> IBinMonitor
```

We would typically initialize an instance of the default BinProcessor, but to accommodate more complex scenarios this class is virtual, allowing specialized processors to be derived as needed for different situations.
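
A minimal Python sketch of a default processor with an overridable hook, under these assumptions: source and target are plain binary streams, `BinReport.info` carries a byte count, the status strings are illustrative, and `transform` is a hypothetical extension point standing in for "virtual":

```python
import io
import threading
from dataclasses import dataclass


@dataclass
class BinReport:
    status: str  # illustrative values: "idle", "running", "completed", "cancelled"
    info: int    # here: bytes transferred so far


class BinProcessor:
    """Default processor: copies source to target chunk by chunk."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self._cancel = threading.Event()  # safe to set from another thread
        self._report = BinReport("idle", 0)

    def transform(self, chunk: bytes) -> bytes:
        return chunk  # default: pass through unchanged; subclasses override

    def execute(self, source, target) -> BinReport:
        self._report = BinReport("running", 0)
        while not self._cancel.is_set():
            chunk = source.read(self.chunk_size)
            if not chunk:
                self._report.status = "completed"
                return self._report
            target.write(self.transform(chunk))
            self._report.info += len(chunk)
        self._report.status = "cancelled"
        return self._report

    def cancel(self) -> None:
        self._cancel.set()

    def get_progress(self) -> BinReport:
        return self._report


class UpperCaseProcessor(BinProcessor):
    """A specialized processor that only overrides the in-transit operation."""

    def transform(self, chunk: bytes) -> bytes:
        return chunk.upper()


src, dst = io.BytesIO(b"hello world"), io.BytesIO()
report = UpperCaseProcessor(chunk_size=4).execute(src, dst)
print(report)          # BinReport(status='completed', info=11)
print(dst.getvalue())  # b'HELLO WORLD'
```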

## 🚧 Constructive Implications

As of now, it looks like the following is emerging as a solution:

A **Bin** is a piece of state that can be pulled from or pushed to its persistent medium: blob, file, string, web, whatever.

To do anything to a Bin we can use a **BinProcessor**, which can read from one _Bin_ while writing to another, wrapping asynchronous and streaming operations internally. The input and output can also be multiple bins, to allow fan-in and fan-out.
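
Fan-in and fan-out can be sketched over plain binary streams. A minimal Python illustration, assuming fan-in means draining sources one after another into a single target; the function names are hypothetical:

```python
import io
from typing import BinaryIO, Sequence


def fan_in(sources: Sequence[BinaryIO], target: BinaryIO, chunk_size: int = 4096) -> None:
    """Drain each source in order into a single target."""
    for source in sources:
        while chunk := source.read(chunk_size):
            target.write(chunk)


def fan_out(source: BinaryIO, targets: Sequence[BinaryIO], chunk_size: int = 4096) -> None:
    """Copy every chunk of one source to every target."""
    while chunk := source.read(chunk_size):
        for target in targets:
            target.write(chunk)


merged = io.BytesIO()
fan_in([io.BytesIO(b"part1,"), io.BytesIO(b"part2")], merged)
print(merged.getvalue())  # b'part1,part2'

copies = [io.BytesIO(), io.BytesIO()]
fan_out(io.BytesIO(b"broadcast"), copies)
print([c.getvalue() for c in copies])  # [b'broadcast', b'broadcast']
```

Interleaved fan-in (for example, merging by arrival time) would need an async variant; sequential draining is simply the easiest case to show.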

All _Bins_ and _BinProcessors_ communicate by interfaces so multiple variants can be defined with appropriate separation of their internals.

The _BinProcessor_ returns a **BinReport** on completion, but this can also be read during processing using **GetProgress()**, and processing can be cancelled, delayed or restarted using **SetStatus()**. _(Implementations may not do anything meaningful with cancel or progress calls: a light beam of information sent to Mars is neither cancellable in flight nor assessable in its progress. But where the medium allows it, we can send these messages.)_

As an alternative to polling progress, the processor can also be given a **BinMonitor** to push notifications to.
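
A minimal Python sketch of push notification. The method name `notify` is hypothetical; `report_status` mirrors the diagram's ReportStatus() by returning the latest pushed report, and the monitor's `digest` is simply the list of reports received so far:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class BinReport:
    status: str
    info: int  # here: bytes processed so far


@dataclass
class BinMonitor:
    """Receives pushed reports and keeps them as its digest."""
    digest: List[BinReport] = field(default_factory=list)

    def notify(self, report: BinReport) -> None:
        self.digest.append(report)

    def report_status(self) -> Optional[BinReport]:
        # Latest known status, or None if nothing has been pushed yet
        return self.digest[-1] if self.digest else None


def copy_with_monitors(chunks: Iterable[bytes], target: List[bytes],
                       monitors: List[BinMonitor]) -> BinReport:
    done = 0
    for chunk in chunks:
        target.append(chunk)
        done += len(chunk)
        for monitor in monitors:  # push after every chunk, no polling needed
            monitor.notify(BinReport("running", done))
    final = BinReport("completed", done)
    for monitor in monitors:
        monitor.notify(final)
    return final


out: List[bytes] = []
monitor = BinMonitor()
copy_with_monitors([b"ab", b"cde"], out, [monitor])
print(monitor.report_status())  # BinReport(status='completed', info=5)
print(len(monitor.digest))      # 3
```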

Objects in memory that require serialization or deserialization might need additional information, for example type information, and some operations (for example decryption or zipping) might require information beyond these interfaces. This is no problem, as the class structure allows for dedicated, data-rich BinProcessors and Bins. The pattern is one of using minimal interfaces for the interconnections and rich specialized objects as needed. The BinMonitor also has access to the digest, which holds information from prior reports.
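
As an example of a data-rich specialization, here is a Python sketch of a compressing processor built on the standard `gzip` module. `GzipProcessor` is a hypothetical name; it carries its own extra setting (`compresslevel`) yet keeps the same `execute(source, target)` shape as a default processor:

```python
import gzip
import io
from typing import BinaryIO


class GzipProcessor:
    """Specialized processor with extra settings beyond the minimal interface."""

    def __init__(self, compresslevel: int = 9, chunk_size: int = 4096) -> None:
        self.compresslevel = compresslevel
        self.chunk_size = chunk_size

    def execute(self, source: BinaryIO, target: BinaryIO) -> int:
        """Compress source into target; returns uncompressed bytes consumed."""
        consumed = 0
        # fileobj= wraps the target; closing the GzipFile does not close the target
        with gzip.GzipFile(fileobj=target, mode="wb",
                           compresslevel=self.compresslevel) as zipped:
            while chunk := source.read(self.chunk_size):
                zipped.write(chunk)
                consumed += len(chunk)
        return consumed


payload = b"streaming " * 100
packed = io.BytesIO()
n = GzipProcessor(compresslevel=6).execute(io.BytesIO(payload), packed)
print(n)                                           # 1000
print(gzip.decompress(packed.getvalue()) == payload)  # True
```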
