# MLLib Development Notes
April 2022

Reverse time order - Newest on top

8 April 2022
- Training LIT
- Increasing the learning rate to 2e-4 appears to have pushed the model to converge on finding nothing: it always predicts background
- There is a heavy class imbalance
- Adding class weighting to full training
- Next work on switching datasets with parameter
- Add cityscapes dataset
- Add denoise dataset
- Pytorch [Dataset and Dataloaders](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html)
- Uses: load, split, train, test, view
- [datasets/cocostore.py](../datasets/cocostore.py) - loads coco into torch dataset from S3
- [datasets/imstore.py](../datasets/imstore.py) - loads directories of images into torch dataset
- [datasets/citytorch.py](../datasets/citytorch.py) - loads cityscapes to a torch dataset
- [OpenImages dataset](https://storage.googleapis.com/openimages/web/factsfigures.html)
- Disable augmentation on validation images
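The class weighting mentioned above could be derived from per-class pixel counts, for example with inverse-frequency weights. This is a minimal sketch and not necessarily the weighting used in this repository: the helper name `inverse_frequency_weights` and the toy mask are assumptions made for illustration.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Return one weight per class, larger for rarer classes.

    Hypothetical helper: weight_c = N / (num_classes * count_c), so a
    perfectly balanced dataset gives every class a weight of 1.0.
    """
    counts = np.bincount(np.asarray(labels).ravel(), minlength=num_classes)
    # Avoid division by zero for classes that never appear in the sample.
    counts = np.maximum(counts, 1)
    return counts.sum() / (num_classes * counts.astype(np.float64))

# Toy segmentation mask: 0 = background, 1 = foreground (heavily imbalanced).
mask = np.zeros((10, 10), dtype=np.int64)
mask[0, :5] = 1
weights = inverse_frequency_weights(mask, num_classes=2)
print(weights)
```

The resulting array can be converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.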

4 April 2022
- Scale annealing and prune basis based on start & end points.  
- Switch to exponential function where start and end point are specified
- Initially grow based on epochs
- Experiment with growing based on step to make training more independent of dataset size
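One possible reading of a schedule with specified start and end points is a geometric interpolation driven by training progress. The sketch below is an assumption, not the code used in these experiments: `progress_from_step`, `exponential_schedule`, and `total_steps` are hypothetical names. Driving the schedule by step instead of epoch makes the curve depend only on the fraction of training completed.

```python
def progress_from_step(step, total_steps):
    """Fraction of training completed, clamped to [0, 1]."""
    return min(max(step / float(total_steps), 0.0), 1.0)

def exponential_schedule(progress, start, end):
    """Geometric interpolation: start at progress=0, end at progress=1.

    Assumes start and end are both positive.
    """
    if start <= 0 or end <= 0:
        raise ValueError('start and end must be > 0')
    return start * (end / start) ** progress

total_steps = 1000
for step in (0, 500, 1000):
    p = progress_from_step(step, total_steps)
    print(step, exponential_schedule(p, start=1.0, end=100.0))
```

With this form the value is multiplied by the same factor over every equal slice of training, so most of the growth lands late in the run.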

2 April 2022
- CRISP training with pruning basis function crisplit_20220401i_pb00_
- Pruned 94% of network while increasing segmentation similarity
- May achieve ~25X speedup
- Pruning search weights are not strictly 1 and 0.
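To put a number on how far the pruning search is from a strict 0/1 decision, one option is to count gates that sit near the middle. This is a rough sketch under stated assumptions: it models each gate as `sigmoid(k * w)`, and the helper `gate_statistics`, its `margin` parameter, and the sample weights are all hypothetical.

```python
import numpy as np

def gate_statistics(prune_weights, k=1.0, margin=0.1):
    """Summarize how close soft pruning gates are to a hard 0/1 decision.

    Hypothetical helper: gates are modeled as sigmoid(k * w); a gate counts
    as undecided when it lies strictly between margin and 1 - margin.
    """
    w = np.asarray(prune_weights, dtype=np.float64)
    gates = 1.0 / (1.0 + np.exp(-k * w))
    # Fraction of gates that would be kept with a 0.5 threshold.
    remaining_ratio = float(np.mean(gates > 0.5))
    # Fraction of gates that are neither clearly on nor clearly off.
    undecided = float(np.mean((gates > margin) & (gates < 1.0 - margin)))
    return remaining_ratio, undecided

w = np.array([-6.0, -4.0, -0.2, 0.1, 5.0])
print(gate_statistics(w))
```

A falling undecided fraction over training would indicate that the search is separating towards 0 and 1.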


[T00cw]: ../img/crisplit_20220401i_pb00_00_cw.png
[T01cw]: ../img/crisplit_20220401i_pb00_01_cw.png
[T02cw]: ../img/crisplit_20220401i_pb00_02_cw.png
[T03cw]: ../img/crisplit_20220401i_pb00_03_cw.png
[T00gn]: ../img/crisplit_20220401i_pb00_00_gn.png
[T01gn]: ../img/crisplit_20220401i_pb00_01_gn.png
[T02gn]: ../img/crisplit_20220401i_pb00_02_gn.png
[T03gn]: ../img/crisplit_20220401i_pb00_03_gn.png

|  | normalized-train | train | train_fine | prune |
|:--:|:--:|:--:|:--:|:--:|
|Cross Entropy Loss|0.163|0.0068|0.0112|0.005|
|Remaining Ratio   |1.0  |0.132 |0.131 |0.057|
|Test similarity   |     |      |0.113 |0.817|
| Prune Weights |![][T00cw]|![][T01cw] |![][T02cw] |![][T03cw]|
| Gradient Norm |![][T00gn]|![][T01gn] |![][T02gn] |![][T03gn]|

- Final losses before pruning:

| Loss               | Value    |
|:------------------:|:--------:|
| cross_entropy_loss | 0.011264 |
| architecture_loss  | 0.001306 |
| prune_loss         | 0.000196 |

- Increasing prune loss should increase this separation.


```python
import numpy as np
import matplotlib.pyplot as plt

class Exponential():
    # Power curve y = a*(x-vx)^power + vy with vertex (vx, vy) passing through (px, py)
    def __init__(self, vx=0.0, vy=0.0, px=1.0, py=1.0, power=2.0):
        self.vx = vx
        self.vy = vy
        self.px = px
        self.py = py
        if power < 0:
            raise ValueError('Exponential error power {} must be >= 0'.format(power))
        self.power = power
        if px <= vx:
            raise ValueError('Exponential error px={} must be > vx={}'.format(px, vx))
        else:
            # Scale factor chosen so that f(px) == py
            self.a = (py-vy)/np.power(px-vx, power)
    def f(self, x):
        dx = x-self.vx
        y = self.a*np.power(dx, self.power) + self.vy
        return y

vx = -1
px = 1
expf = Exponential(vx=vx, vy=0.0, px=px, py=1.0, power=2.0)

# Plot the curve from the vertex to the end point
x = np.arange(vx, px, 0.01)
plt.plot(x, expf.f(x))
plt.show()
```

1 April 2022
- Prune loss basis function training to 0 size
- This resulted in much more aggressive pruning
- Slightly worse cross entropy loss for train and train_fine
- Much worse cross entropy loss for pruned network
- The tensorboard plots show a similar timing of the architecture reduction curves but with prune_loss, the architecture settles at a lower level.
- Architecture reduction occurs early and converges rapidly compared to cross entropy loss.  
- Reduce k_structure from 0.03 to 0.1 to slow architecture search
- Increase k_prune_basis from 0.01 to 0.3 to see if this results in eliminating fence sitters more effectively
- Increase k_prune_exp from 5.0 to 50.0 to apply the prune basis later in training

[T00cw_0331h]: ../img/crisplit_20220331h_pb0_00_cw.png
[T01cw_0331h]: ../img/crisplit_20220331h_pb0_01_cw.png
[T02cw_0331h]: ../img/crisplit_20220331h_pb0_02_cw.png
[T03cw_0331h]: ../img/crisplit_20220331h_pb0_03_cw.png
[T00gn_0331h]: ../img/crisplit_20220331h_pb0_00_gn.png
[T01gn_0331h]: ../img/crisplit_20220331h_pb0_01_gn.png
[T02gn_0331h]: ../img/crisplit_20220331h_pb0_02_gn.png
[T03gn_0331h]: ../img/crisplit_20220331h_pb0_03_gn.png

|  | normalized-train | train | train_fine | prune |
|:--:|:--:|:--:|:--:|:--:|
|Cross Entropy Loss|0.164|0.032|0.020|0.040|
|Remaining Ratio|1.0|0.050|0.041|0.00024|
|Test similarity|  |  |0.106|0.028|
| Prune Weights |![][T00cw_0331h]|![][T01cw_0331h] |![][T02cw_0331h] |![][T03cw_0331h]|
| Gradient Norm |![][T00gn_0331h]|![][T01gn_0331h] |![][T02gn_0331h] |![][T03gn_0331h]|

- Sigmoid scale as a linear function of step stiffens very quickly. Want it to be gradual at the beginning and stiffen towards the end of training


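```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x, k=1.0):
    # Assumed definition (not shown in these notes): logistic function with steepness k
    return 1.0/(1.0 + np.exp(-k*x))

def SigmoidScale(step, start_x = 0, end_x = 25, start_y = 0, end_y=100):
    # Linear ramp of the sigmoid steepness from start_y at start_x to end_y at end_x
    kSigmoid = start_y + (end_y-start_y)*(step-start_x)/(end_x-start_x)
    return kSigmoid

# Steepness as a function of training step
x = np.arange(0.0, 30.0, 0.01)
k_prune_exp = 1           # not used below
sigmoid_scale = 5         # not used below
sigmoid_scale_exp = 0.25  # not used below
plt.plot(x, SigmoidScale(x, end_x=25, end_y=150))
plt.show()

# Shape of the sigmoid at a single step
x = np.arange(-1.0, 1.0, 0.01)
step = 3
kSigmoid = SigmoidScale(step, end_x=25, end_y=150)
print('kSigmoid({})={}'.format(step, kSigmoid))
plt.plot(x, sigmoid(x, k=kSigmoid))
plt.show()
```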
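A ramp that stays gradual early and stiffens towards the end could replace the linear `SigmoidScale` by raising the normalized step to a power greater than 1. This is a sketch of that idea only: the function name `sigmoid_scale_power` and the `power` parameter are assumptions, and unlike the linear version it clamps the step to the `[start_x, end_x]` range.

```python
import numpy as np

def sigmoid_scale_power(step, start_x=0, end_x=25, start_y=0, end_y=100, power=3.0):
    """Ramp from start_y to end_y that is flat early and steep late (power > 1)."""
    # Normalized progress in [0, 1]; steps outside the range are clamped.
    t = (np.asarray(step, dtype=np.float64) - start_x) / (end_x - start_x)
    t = np.clip(t, 0.0, 1.0)
    return start_y + (end_y - start_y) * np.power(t, power)

# Compare a few steps against the linear ramp start_y + (end_y-start_y)*t.
for step in (3, 12.5, 25):
    k = float(sigmoid_scale_power(step, end_x=25, end_y=150))
    print('kSigmoid({})={}'.format(step, k))
```

At step 3 the linear ramp with `end_y=150` already gives a steepness of 18, while the cubic ramp is still below 1, and both reach 150 at `end_x`.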

12 April 2022
- Added imstore.py CreateDataLoaders to load unique datasets for any number of sets with unique parameters 
- Moving imstore common processing to ImUtil
- Move into a library for dataset loading

13 April 2022
- Updated [pymlutil](https://github.com/bhlarson/pymlutil) to deliver a new tag and PyPI version when executing ./setup
- Added pattern for documentation

2 May 2022
- The past couple of weeks, I have been splitting code off into separate libraries
- The python library [pymlutil](https://pypi.org/project/pymlutil/) contains general utility functions
- The python library [torchdatasetutil](https://pypi.org/project/torchdatasetutil/) contains functions for creating and using datasets
- I will be removing the dataset code from pymlutil soon.
- One goal for this is to enable the dataset to be selected as a parameter
- To do this, the input channels and format must be specified in the network creation.
- I am adding class_dictionary['input_channels'] and class_dictionary['input_type'] to the class_dictionary.  This way, selecting the class dictionary defines the input type
- If the input dataset can be selected by parameter rather than requiring code changes, I would like to enable transfer learning across training sets.
- I would like to pick up the Torch pretrained [UNET](https://pytorch.org/hub/mateuszbuda_brain-segmentation-pytorch_unet/) and [Deeplab V3](https://pytorch.org/hub/pytorch_vision_deeplabv3_resnet101/)
- To enable transfer learning, I need to change from torch.load to model.load_state_dict: https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html
- torch.load loads the pickled python object, which I needed previously because pruned state is stored in class variables, e.g. the convolution size after pruning is complete
- Transfer learning may only work for unpruned models
- After a model is pruned, how would transfer learning from an unpruned model work?  
    - require model pruning weights
    - copy parameters based on pruning weights
    - easier and same result if you initialize the model just before pruning,
    - initialize parameters from transfer learning
    - prune model with weights from transfer learning

    "input_channels":3,
    "input_type": "float32",
- For transfer learning, need to handle changes in input channel depth from 1 to 3 and vice versa - for datasets with grayscale and RGB inputs
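One common way to handle the change in input depth is to adapt only the first convolution's weights and copy every other layer unchanged. The sketch below uses NumPy arrays in the `(out_channels, in_channels, kH, kW)` layout; the helper `adapt_input_channels` is hypothetical and covers only the 3 to 1 and 1 to 3 cases.

```python
import numpy as np

def adapt_input_channels(weight, in_channels):
    """Adapt a first-layer conv weight (out, in, kH, kW) to a new input depth.

    Hypothetical helper covering only the 3 -> 1 and 1 -> 3 cases.
    """
    old_in = weight.shape[1]
    if old_in == in_channels:
        return weight.copy()
    if old_in == 3 and in_channels == 1:
        # A gray image behaves like an RGB image with identical channels, so
        # summing the kernels over the channel axis preserves the response.
        return weight.sum(axis=1, keepdims=True)
    if old_in == 1 and in_channels == 3:
        # Spread the single kernel evenly so an RGB image with three
        # identical channels gives the same response as the gray image.
        return np.repeat(weight, 3, axis=1) / 3.0
    raise ValueError('unsupported channel change {} -> {}'.format(old_in, in_channels))

rng = np.random.default_rng(0)
w_rgb = rng.normal(size=(8, 3, 3, 3))
w_gray = adapt_input_channels(w_rgb, 1)
print(w_rgb.shape, '->', w_gray.shape)
```

With PyTorch, the adapted array could be written into the first convolution's entry of the state dictionary before calling `model.load_state_dict`.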

10 May 2022
- CRISP training of UNET on the LIT dataset takes a few hours
- The same parameters on COCO take a few days
- The COCO dataset is much more diverse than the LIT dataset
- The COCO dataset is 100x larger than the LIT dataset
- A6000 GPU efficiency is 70% on COCO
- I am only using half the available A6000 memory.  Increasing this should reduce the training time
- Look at Deeplab V3 with CRISP