# UnpackAI Library Development Plan
> What library should we be?
This is mostly a branding proposal, aimed at people from various backgrounds who are going to play with AI.
* That means we're going to talk about how people view this library and what comes to mind when they think of ```pip install -Uqq unpackai```, the same way my mind jumps right to Head & Shoulders if I have had dandruff recently.
* For ML, the current **jump** looks roughly like the following. This is not thorough market research, just quick examples from a deep learning practitioner:
    * Try free structure quickly, do experiments: PyTorch
    * Go to production, run models on edge devices: TensorFlow
    * Play with GPU-accelerated tensor calculation: JAX
    * Play with TF, but in a simpler layer sense: Keras
    * Transformers in clean code: Hugging Face
    * Visualize things with interactive features: Plotly
    * Deploy a model prototype: Streamlit
* Surely you think I failed to mention ```fastai```. This is where the **branding goes wrong**: the fastai library is bound tightly to the education. It's considered a good creation alongside its famous course, but beyond the course its product features have many limitations: the docs are too brief, multi-device training is not supported, and very few callbacks go beyond Jeremy H's own teaching.
* Most important of all, ```fastai``` isn't enjoyable to use; **it just packs together many things mentioned in the course**.

## What we shouldn't be
I know the course was life-changing for me, and I feel very grateful. But let's not be their library.

### The pipeline wrapping plan
It all started from a notebook, much like the template notebooks we have for the course: a notebook that handles the data processing, model building, and interpretation for a specific DL task.

Then came the packaging part: we wrap **dozens of lines of code**, which scare our kind students, into simple functions or classes.

The wrapped functions are simple to use and to look at, and mostly execute in 1 line. So friendly to our innocent students.

This is what a Python library is about, right? Wrap things into functions, which can be further wrapped into even fewer lines.

There is nothing wrong with this approach at first. Some DL tasks, if need be, can be shrunk into **fewer than 10 lines of code.**
* The 1st line loads the data,
* the 2nd line sets how to transform the data,
* the 3rd line builds/loads the model,
* the 4th line trains the model,
* the 5th line interprets the model in various ways.

Well, the above does look like a decent **structure** to start with. Then we lay out the tasks, different contributors take different tasks that can be developed in parallel, and we can have the agile/scrum/kanban fun to track our progress!

Even if we do this, we could build a useful product, no less.

#### The bad side of the pipeline wrapping plan
So, so many libraries are doing the same, even ones from awesome people. They usually end up as one of the following:
* A mess of functions. Many of them are good functions, but it is still a mess, and it ends up a branding disaster. (**There is no way to answer "what can your library do?" in a slogan**)
* A model zoo for a specific domain.
* Wrapping things up means less and less involvement from the user. The user will spend very little time playing with the functions, and each function usually achieves a very specific task. Actually, I do believe there is an equilibrium like:
$\large{\text{UserPlayHours} = a \times \text{TaskTransferability}}$

## Alternative approach

The salvation plan is somewhat simpler in how we perceive the library:
* A library that allows you to experiment with AI/DL for various tasks

**BUT!!!**
* Many modules within the pipeline should be **choosable** from a dropdown list or checkbox.
* The **level of detail** we let users play with and choose is the **level of difficulty** we want them to enjoy.

### What is level of detail?
Level of detail is the level of fuss we want the user to focus on. This is the exact part the fastai library got **WRONG**, which explains most of our struggle so far:
* It offers smooth, easy pipelines, even for newbies and business people.
* Any amount of reconfiguration is usually way too complicated for such an audience.
* There is a **GAP** between the 2 points above, hence no room for playing.

#### Keras Example
I started my AI journey with Keras, and I loved Keras at that time, because:
* Keras plays with **layers** (e.g. Linear, Convolution). Its greatest strength is abstracting away the details beneath this level and letting users play with layers.
* I spent lots of time having fun playing with layers.
* Aside from the cases where I have to redesign a layer, I can deploy almost all kinds of models mentioned in any DL paper ($\text{UserPlayHours} = a \times \text{TaskTransferability}$).

#### PyTorch Lightning example
Well, then I moved on to the career team. I have to deal with the layer level, and I have to deal with different data/forward pipelines. PL is a good library because:
* It allows me to play with the things I mentioned, but saves my energy on things like looping, logging, multi-device training details, etc.
* If you look at a training notebook built with PL, you'll see very few lines of training template code.
* You'll find a lot of lines on the specifications you intend to be different.

> The branding images of the examples are simple:
* Keras: play with TensorFlow in terms of layers
* PyTorch Lightning: write less template code

#### Unpackai Example
For our lib, I intend for users to focus on exactly the same range of things we want people to learn:
* choose the columns they intend to use, and in what way
* choose the data transformations
* choose the loss and the model structure to use (not keras.layer, not nn.module)
* hit run
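
One way to picture this level of detail is a small declarative configuration that only contains the learner's choices. The sketch below is an assumption for illustration: `PipelineConfig`, `run`, and the option names in the three menus are hypothetical and not part of any existing unpackai API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical option menus: the "level of detail" exposed to the learner.
TRANSFORMS = ["resize_224", "random_flip", "normalize"]
LOSSES = ["cross_entropy", "mse"]
MODELS = ["resnet18", "resnet50"]


@dataclass
class PipelineConfig:
    """Everything the learner chooses; nothing about loops or devices."""
    columns: Dict[str, str]  # column name -> role ("x" or "y")
    transforms: List[str] = field(default_factory=list)
    loss: str = "cross_entropy"
    model: str = "resnet18"

    def validate(self) -> None:
        unknown = [t for t in self.transforms if t not in TRANSFORMS]
        if unknown:
            raise ValueError(f"unknown transforms: {unknown}")
        if self.loss not in LOSSES:
            raise ValueError(f"unknown loss: {self.loss}")
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model}")
        if "y" not in self.columns.values():
            raise ValueError("one column must be marked as 'y'")


def run(config: PipelineConfig) -> str:
    """Stand-in for 'hit run': validates the choices and reports the plan."""
    config.validate()
    x_cols = [c for c, role in config.columns.items() if role == "x"]
    y_cols = [c for c, role in config.columns.items() if role == "y"]
    return (f"train {config.model} with {config.loss}: "
            f"{x_cols} -> {y_cols} via {config.transforms}")


config = PipelineConfig(
    columns={"img": "x", "parent": "y"},
    transforms=["resize_224", "normalize"],
)
summary = run(config)
print(summary)
```

Each field maps naturally onto a dropdown or checkbox, while the training loop, logging, and device handling stay out of the learner's sight.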

## Demo of such example

In [1]:
from ipywidgets import interact, interact_manual
from forgebox.imports import *
from forgebox.category import Category

In [2]:
HOME = Path(os.environ['HOME'])

Let's skip the data download here. It's just a download, and we're not going to reinvent the brilliant stuff that already exists around downloading.

In [3]:
BEAR_DATASET = HOME/"Downloads"/"bear_dataset"

### Step 1 Everything starts with dataframe

For fastai, everything starts from a list, an ItemList to be specific. **ImageList** and **TextList** are each an [**ItemList**](https://fastai1.fast.ai/tutorial.itemlist.html) with some slightly enhanced features: ```[🧂, 🏓, 🍷, 🐻]```

For the clarity of education, or for simplicity as the ultimate form of beauty, we use a [**DataFrame**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) as the starting point: an ItemList in table format. In this way, every dataset has the same starting point, even tabular data.

In [4]:
def df_creator_image_folder(path: Path):
    path = Path(path)
    files = list(path.rglob("*.jpg"))
    files.extend(path.rglob("*.JPG"))
    files.extend(path.rglob("*.jpeg"))
    files.extend(path.rglob("*.JPEG"))
    files.extend(path.rglob("*.png"))
    files.extend(path.rglob("*.PNG"))
    return pd.DataFrame({"path":files}).sample(frac=1.).reset_index(drop=True)
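
Listing each upper- and lower-case pattern separately works, but on platforms where `pathlib` matches patterns case-insensitively (Windows) the same file may be listed twice. A possible alternative, sketched here with only the standard library, walks the tree once and compares lower-cased suffixes. `list_image_files` and `IMAGE_SUFFIXES` are hypothetical names, and the result would still need to be wrapped in `pd.DataFrame({"path": ...})` as above.

```python
import tempfile
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set, mirroring the patterns above


def list_image_files(path: Path) -> List[Path]:
    """Recursively collect image files, matching suffixes case-insensitively."""
    return sorted(
        p for p in Path(path).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Tiny throwaway folder tree to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["black/a.jpg", "grizzly/b.PNG", "teddys/c.JPEG", "notes.txt"]:
        target = root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    found = [p.relative_to(root).as_posix() for p in list_image_files(root)]

print(found)
```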

### Enrich columns (feature transformation, label extraction)
After this step, there will only be **MORE** columns ➕
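
As a minimal illustration of what "only more columns" means, the sketch below derives a label column from a path column with plain pandas and leaves the source column untouched. The three paths are made-up examples, and the column name `parent` is only an assumption borrowed from the output shown later.

```python
from pathlib import Path

import pandas as pd

# Made-up example paths; only the folder layout matters here.
df = pd.DataFrame({"path": [
    "bear_dataset/black/001.jpg",
    "bear_dataset/grizzly/002.jpg",
    "bear_dataset/teddys/003.jpg",
]})

columns_before = list(df.columns)

# Enrichment: derive a new column from an existing one, keep the source intact.
df["parent"] = df["path"].map(lambda p: Path(p).parent.name)

print(df)
```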

In [5]:
from typing import List, Dict, Callable, Any, Tuple
from torchvision import transforms as tfm
from PIL import Image
from forgebox.html import DOM

In [6]:
class Phase:
    """
    A configuration management mechanism
    """
    is_phase = True
    def __init__(self, **kwargs):
        self.config = dict()
        self.config.update(kwargs)
        
    def __setitem__(self, k, v):
        self.config[k] = v
    
    def __getitem__(self, k):
        return self.config[k]
    
    def __call__(self):
        return self.get_data(self.config)
    
    def get_data(self, raw):
        """
        Reconstruct back to dict or list or value format
        """
        if hasattr(raw,"is_phase"):
            return raw.get_data(raw.config)
        if type(raw) == list:
            raw = list(self.get_data(i) for i in raw)
            return raw
        if type(raw) == dict:
            for k, v in raw.items():
                raw[k] = self.get_data(v)
            return raw
        return raw

    def __str__(self):
        return json.dumps(self(), indent=2)
    
    def __repr__(self,):
        return f"Phase:{self}"
    

class EnrichPhase(Phase):
    def __init__(self, *steps):
        super().__init__()
        self.config['steps'] = []
        for step in steps:
            checked = self.check_step(step)
            if checked:
                self.config['steps'].append(checked)
                
    def new_step(self, process, dst:str, src: str=None):
        self.config['steps'].append({
            "process":process,
            "src":src,
            "dst":dst
        })
    
    def check_step(self,step):
        return step
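
To show how nested phases could flatten into plain, JSON-friendly data, here is a condensed stand-in for the idea. `MiniPhase` is a hypothetical name rather than the class above, the step and pipeline names are illustrative, and unlike `Phase.get_data` it builds new containers instead of writing back into the stored config.

```python
import json


class MiniPhase:
    """Condensed stand-in for Phase: holds a config and resolves nested phases."""
    is_phase = True

    def __init__(self, **kwargs):
        self.config = dict(kwargs)

    def __call__(self):
        return self.resolve(self.config)

    def resolve(self, raw):
        # Recursively replace nested MiniPhase objects with their plain config.
        if getattr(raw, "is_phase", False):
            return raw.resolve(raw.config)
        if isinstance(raw, list):
            return [self.resolve(item) for item in raw]
        if isinstance(raw, dict):
            return {key: self.resolve(value) for key, value in raw.items()}
        return raw


enrich = MiniPhase(steps=[
    {"process": "EnrichImage", "src": "path", "dst": "img"},
    {"process": "ParentAsLabel", "src": "path", "dst": "parent"},
])
pipeline = MiniPhase(name="bear_classifier", enrich=enrich)

resolved = pipeline()
print(json.dumps(resolved, indent=2))
```

Because the resolved structure contains only dicts, lists, and strings, it can be saved next to a trained model and replayed later.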

### typings for interactives

In [7]:
from ipywidgets import (
    Text, Textarea, IntSlider, SelectMultiple, Dropdown,
    Layout, Button
)
from typing import List, Dict, Any

In [8]:
class InteractiveTyping:
    """
    Typing for interactive details
    self.__call__() will create widgets directly
    """
    name = "anything"
    is_typing = True

    def solid(self, default) -> None:
        """
        Reset default value
        """
        if default is not None:
            self.default = default


class INT(InteractiveTyping):
    def __init__(self, min_: int = 0, max_: int = 10, step: int = 1, default: int = None):
        self.max_ = max_
        self.min_ = min_
        self.step = step
        self.default = default if default is not None else 1

    def __repr__(self):
        return f"int[{self.min_}-{self.max_}, :{self.step}]={self.default}"

    def __call__(self, default: int = None):
        self.solid(default)
        return IntSlider(
            value=self.default,
            min=self.min_,
            max=self.max_,
            step=self.step,
        )


class STR(InteractiveTyping):
    """
    String object
    will create text or textarea
    """

    def __init__(self, default: str = None, use_area: bool = False):
        """
        use_area: do we use Textarea, if False,we use Text
        """
        self.default = "" if default is None else default
        self.use_area = use_area

    def __repr__(self):
        return f"str='{self.default}'"

    def __call__(self, default: str = None):
        self.solid(default)
        if self.use_area:
            return Textarea(value=self.default, layout=Layout(width="80%"))
        return Text(value=self.default)


class LIST(InteractiveTyping):
    """
    dropdown list type or multiselection type
    """

    def __init__(self, options: List[Any] = None, default: Any = None, multi: bool = False):
        """
        if multi: default should be iterable
        else: default should be one of the option
        """
        # avoid sharing one mutable default list between instances
        self.options = list(options) if options is not None else []
        self.default = default
        self.multi = multi

    def __repr__(self):
        # report how many options can be picked out of the available ones
        if self.multi:
            size = f"[0-{len(self.options)}]/{len(self.options)}"
        else:
            size = f"1/{len(self.options)}"
        return f"list,{size}"

    def __call__(self, default: Any = None):
        self.solid(default)
        if self.multi:
            inter = SelectMultiple(options=self.options)
        else:
            inter = Dropdown(options=self.options)

        if self.default is not None:
            # if multi: default should be iterable
            # else: default should be one of the option
            inter.value = self.default
        return inter

In [9]:
STR('RGB')()

Text(value='RGB')

In [14]:
class Enrich:
    is_enrich = True
    prefer = None
    lazy = False  # shall we execute enrichment only through the iteration
    def __init__(self): pass

    def __call__(self, row):
        return row


class EnrichImage(Enrich):
    """
    Create Image column from image path column
    """
    prefer = "QuantifyImage"
    typing = Image
    lazy = True

    def __init__(
        self, convert: STR("RGB") = "RGB",
        size: LIST(options=[28,128,224, 256, 512], default=224) = 224,
    ):
        self.convert = convert
        self.size = size

    def __repr__(self):
        return f"[Image:{self.size}]"

    def __call__(self, x):
        img = Image.open(x).convert(self.convert)
        img = img.resize((self.size, self.size))
        return img


class ParentAsLabel(Enrich):
    def __call__(
        self,
        path: Path,
    ) -> str:
        """
        Use parent folder name as label
        """
        return Path(path).parent.name

### Initialize => interact 

In [15]:
def print_kwargs(kwargs):
    print(kwargs)
    return kwargs

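def reconfig_manual_interact(widget, description:str="Create"):
    # find the button that interact_manual adds, then relabel it
    btn=None
    for w in widget.children:
        if isinstance(w, Button):
            btn = w
            break
    # guard against widgets without a button instead of raising AttributeError
    if btn is not None:
        btn.description=description
    return btn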

def init_interact(cls, result_cb:Callable=print_kwargs):
    """
    Initialize a class with interactive features
    """
    annotations = cls.__init__.__annotations__
    defaults = cls.__init__.__defaults__
    kwargs = dict()
    if defaults is not None:
        for (k,typing), default in zip(annotations.items(), defaults):
            kwargs.update({k:typing(default)})
    obj = dict()
    def fillin_init(**kwargs):
        obj.update({
            "kwargs":kwargs,
        })
    f= interact_manual(fillin_init, **kwargs)

    

    btn = reconfig_manual_interact(f.widget)
    if btn is not None:
        original = btn.click
        
        def new_click_event():
            original()
            return result_cb(obj['kwargs'])
        btn.click = new_click_event
    
    return obj, f
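
The introspection step is the core of `init_interact`, so it may help to see it in isolation. Zipping `__annotations__` with `__defaults__` can misalign once a parameter has no default or the function carries a return annotation; `inspect.signature` pairs each parameter with its own annotation and default. The sketch below uses no widgets, and `Choice`, `Resize`, and `describe_init` are hypothetical names for illustration.

```python
import inspect
from typing import Any, Dict


class Choice:
    """Hypothetical stand-in for an interactive typing such as STR or LIST."""

    def __init__(self, *options: Any):
        self.options = options

    def __repr__(self) -> str:
        return f"Choice{self.options}"


class Resize:
    # Annotations carry the allowed options; defaults carry the starting value.
    def __init__(self, size: Choice(128, 224, 512) = 224, mode: Choice("RGB", "L") = "RGB"):
        self.size = size
        self.mode = mode


def describe_init(cls) -> Dict[str, Dict[str, Any]]:
    """Map each annotated __init__ parameter to its annotation object and default."""
    described = {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self" or param.annotation is inspect.Parameter.empty:
            continue
        default = None if param.default is inspect.Parameter.empty else param.default
        described[name] = {"typing": param.annotation, "default": default}
    return described


spec = describe_init(Resize)
for name, entry in spec.items():
    print(name, entry["typing"], "default:", entry["default"])
```

Each entry could then be turned into a widget by calling the annotation object with its default, as `init_interact` does.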

In [16]:
obj,f = init_interact(EnrichImage)

interactive(children=(Text(value='RGB', description='convert'), Dropdown(description='size', index=2, options=…

In [17]:
ENRICHMENTS = dict(
    EnrichImage=EnrichImage,
    ParentAsLabel=ParentAsLabel,
)

In [18]:
phase = Phase()

In [177]:
bear_df = df_creator_image_folder(BEAR_DATASET)

bear_df

| | path |
|---|---|
| 0 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... |
| 1 | /Users/xiaochen.zhang/Downloads/bear_dataset/t... |
| 2 | /Users/xiaochen.zhang/Downloads/bear_dataset/g... |
| 3 | /Users/xiaochen.zhang/Downloads/bear_dataset/t... |
| 4 | /Users/xiaochen.zhang/Downloads/bear_dataset/t... |
| ... | ... |
| 517 | /Users/xiaochen.zhang/Downloads/bear_dataset/t... |
| 518 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... |
| 519 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... |
| 520 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... |


In [51]:
from ipywidgets import VBox, HBox, HTML

In [52]:
vbx = VBox(list([HTML("23")]))
vbx

VBox(children=(HTML(value='23'),))

In [54]:
vbx.children = list(vbx.children)

In [58]:
children = list(vbx.children)
children.append(HTML('42'))

In [60]:
vbx.children = children

In [70]:
id(vbx)

5267650000

In [98]:
class MoreOrLess(VBox):
    """
    Interactive list
    """
    def __init__(self, ):
        super().__init__([])
        self.list_ = list()
        
    def create_line(self, data):
        children = list(self.children)
        children.append(self.new_line(data))
        self.children = children
        
    def new_line(self, data) -> HBox:
        del_btn = Button(description="🗑Remove")
        hbox = HBox([del_btn, HTML(json.dumps(data))])
        hbox.data = data
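        def remove_hbox():
            # rebuild the children without this row (no removal while iterating)
            self.children = [c for c in self.children if c is not hbox]
            # keep the underlying data list in sync with what is displayed
            if data in self.list_:
                self.list_.remove(data)
        del_btn.click = remove_hbox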
        return hbox
        
    def __add__(self, data):
        self.list_.append(data)
        self.create_line(data)
        return self

In [99]:
mol = MoreOrLess()

In [None]:
mol+{}

In [100]:
def set_enrich(df):
    DOM(f"{len(df)} rows of data, example table", "h3")()
    display(df.sample(5))
    def setting_col():
        @interact_manual
        def set_enrich_(src = ["[all_columns]",]+list(df.columns)):
            DOM(f"Setting up: {src}", "h4")()
            if src=="[all_columns]":
                display(df.head())
            else:
                display(df[[src,]].head())
            @interact_manual
            def choose_enrich(dst="", enrich = ENRICHMENTS):
                # enrich is the selected class itself, so use its own __name__
                DOM(f"Source: {src}, Destination: {dst}, for {enrich.__name__}", "h4")()
                DOM(f"{enrich.__doc__}", "quote")()
                obj = init_interact(enrich)
    setting_col()

In [195]:
set_enrich(bear_df)

| | path |
|---|---|
| 254 | /Users/xiaochen.zhang/Downloads/bear_dataset/g... |
| 301 | /Users/xiaochen.zhang/Downloads/bear_dataset/g... |
| 271 | /Users/xiaochen.zhang/Downloads/bear_dataset/g... |
| 351 | /Users/xiaochen.zhang/Downloads/bear_dataset/t... |
| 387 | /Users/xiaochen.zhang/Downloads/bear_dataset/g... |


interactive(children=(Dropdown(description='src', options=('[all_columns]', 'path'), value='[all_columns]'), B…

In [14]:
bear_df = enrich_parent_as_label(bear_df,)
bear_df

| | path | img | parent |
|---|---|---|---|
| 0 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... | [Image] | black |
| 1 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... | [Image] | black |
| 2 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... | [Image] | black |
| 3 | /Users/xiaochen.zhang/Downloads/bear_dataset/t... | [Image] | teddys |
| 4 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... | [Image] | black |
| ... | ... | ... | ... |
| 517 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... | [Image] | black |
| 518 | /Users/xiaochen.zhang/Downloads/bear_dataset/g... | [Image] | grizzly |
| 519 | /Users/xiaochen.zhang/Downloads/bear_dataset/g... | [Image] | grizzly |
| 520 | /Users/xiaochen.zhang/Downloads/bear_dataset/b... | [Image] | black |



### Quantify: Choose columns as X and Y, put them into number

In [15]:
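import torch  # imported explicitly in case forgebox.imports does not provide it


class Quantify:
    """
    Transform list of things to torch tensor
    """
    # the docstring must come first in the class body to be picked up as __doc__
    is_quantify = True

    def __call__(self, list_of_items):
        return torch.Tensor(list_of_items)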


class QuantifyImage(Quantify):
    """
    Transform PIL.Image to tensor
    """

    def __init__(
        self,
        image_size: Tuple[int, int] = (224, 224),
        mean_=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ):
        self.transform = tfm.Compose([
            tfm.Resize(image_size),
            tfm.ToTensor(),
            tfm.Normalize(mean=mean_, std=std),
        ])

    def __call__(self, list_of_image):
        return self.transform(list_of_image)


class QuantifyCategory(Quantify):
    """
    Transform single categorical data to index numbers in pytorch tensors
    """

    def __init__(self, col_name: str, min_frequency: int = 5):
        self.col_name = col_name
        self.min_frequency = min_frequency

    def summarize_category(self, df):
        df = pd.DataFrame(df)
        value_counts = df.vc(self.col_name)
        categories = np.array(
            list(value_counts.index[value_counts.values > self.min_frequency]))
        self.category = Category(arr=categories, pad_mst=True)
        self.num_category = len(self.category)

    def __call__(self, list_of_strings):
        return torch.LongTensor(self.category.c2i[np.array(list_of_strings)])


class QuantifyMultiCategory(Quantify):
    """
    Turn Multi-categorical data to n_hot encoding numbers in pytorch tensors
    """
    def __init__(self, col_name: str):
        self.col_name = col_name
        
QUANTIFY = dict(
    Quantify=Quantify,
    QuantifyImage=QuantifyImage,
    QuantifyCategory=QuantifyCategory,
    QuantifyMultiCategory=QuantifyMultiCategory
)
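
The category and multi-category quantifiers are easiest to picture on toy data. The sketch below uses only the standard library and plain lists in place of `torch` tensors and `forgebox`'s `Category`. The helper names, the `[MST]` token, and the choice to reserve index 0 for rare or unseen values are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence

MISSING = "[MST]"  # hypothetical placeholder token for rare or unseen values


def build_vocab(values: Sequence[str], min_frequency: int = 1) -> Dict[str, int]:
    """Index every value seen more than `min_frequency` times; 0 is MISSING."""
    counts = Counter(values)
    kept = sorted(v for v, n in counts.items() if n > min_frequency)
    return {token: idx for idx, token in enumerate([MISSING] + kept)}


def encode_category(values: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Single label per row -> one integer index per row."""
    return [vocab.get(v, vocab[MISSING]) for v in values]


def encode_multi_category(rows: Sequence[Sequence[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Several labels per row -> one n-hot vector per row."""
    encoded = []
    for labels in rows:
        vector = [0] * len(vocab)
        for label in labels:
            vector[vocab.get(label, vocab[MISSING])] = 1
        encoded.append(vector)
    return encoded


labels = ["black", "black", "grizzly", "grizzly", "teddys", "panda"]
vocab = build_vocab(labels, min_frequency=1)
indices = encode_category(labels, vocab)
n_hot = encode_multi_category([["black", "grizzly"], ["panda"]], vocab)
print(vocab, indices, n_hot)
```

Wrapping `indices` in `torch.LongTensor` and `n_hot` in `torch.Tensor` would give the tensor forms that the classes above aim for.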

In [None]:
def set_column(df):
    _, sample_row = next(df.iterrows())
    @interact_manual
    def 

###  Choose your model, loss

### Training