<span style="font-family:PT Sans Narrow;"> 

# <span style="color:orange; font-size:1.31em"> Main Idea: </span>

Train on 20% of the chest X-ray data with a standard loss function (e.g., cross entropy). Then inspect the loss and predicted probability values for the parent and child classes, and check whether a function can be fitted to them or whether each of the candidate functions improves the accuracy. 
To support the claim that the approach works for chest X-rays in general, the same experiment can be repeated on other datasets such as Stanford. 

- Fit for this: $p_{i} = g(q_{i}, q_{ancestor(i,j)})$

- Then use this: $\hat{q}_{i} = g(q_{i}, q_{ancestor(i,j)})$

- Loss would be: $f(x,y) = -p_{i}(x) \log{\hat{q}_{i}(x)} - \left(1 - p_{i}(x)\right) \log{\left(1 - \hat{q}_{i}(x)\right)}$
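
As a minimal sketch of the loss above, assuming NumPy and a purely hypothetical candidate for `g` (the product of the child and parent probabilities). The names `product_rule` and `hierarchical_bce` are illustrative and do not come from the project code:

```python
import numpy as np

def product_rule(q_child, q_parent):
    """Hypothetical candidate g: a child is never more probable than its parent."""
    return q_child * q_parent

def hierarchical_bce(p, q_child, q_parent, g=product_rule, eps=1e-7):
    """Binary cross entropy evaluated on q_hat = g(q_child, q_parent)."""
    # Clip to avoid log(0) when q_hat reaches exactly 0 or 1.
    q_hat = np.clip(g(q_child, q_parent), eps, 1.0 - eps)
    return -(p * np.log(q_hat) + (1.0 - p) * np.log(1.0 - q_hat))

p = np.array([1.0, 0.0])         # ground-truth labels p_i
q_child = np.array([0.6, 0.2])   # predicted child probabilities q_i
q_parent = np.array([0.9, 0.1])  # predicted parent probabilities
loss = hierarchical_bce(p, q_child, q_parent)
```

Any other candidate can be swapped in through the `g` argument, as long as it maps the two probabilities to a value in [0, 1].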

### Procedure:

1. Take 30% of the dataset for training.
2. Train a network on all 30% (make sure it converges).
3. Take the outputs (predicted probabilities) for 20% of the instances to fit a function.
4. Repeat step 3 for each of the aforementioned candidates.
5. Apply the fitted function to the remaining 10% to check that it is not overfitted.
6. Select the final candidate for the fitting function.
7. Apply the final fitting function to the whole training dataset during a full training run.

 </span>
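
The fit and hold-out steps of the procedure could be sketched as follows. This is only an illustration under stated assumptions: the probabilities and labels are synthetic stand-ins for the network outputs, the candidate `g = (a * q_parent + b) * q_child` is hypothetical, and the fit uses ordinary least squares rather than whatever fitting method the project ends up using.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the network outputs on the 30% subset (not real data).
n = 3000
q_parent = rng.uniform(0.0, 1.0, n)
q_child = rng.uniform(0.0, 1.0, n) * q_parent
p_child = (rng.uniform(0.0, 1.0, n) < q_child).astype(float)  # synthetic labels

# Two thirds of the subset (20% of the data) fit the function;
# the remaining third (10% of the data) is held out.
idx = rng.permutation(n)
fit_idx, holdout_idx = idx[: 2 * n // 3], idx[2 * n // 3:]

def design(qc, qp):
    """Feature matrix of the hypothetical candidate g = (a * q_parent + b) * q_child."""
    return np.column_stack([qc * qp, qc])

# Least-squares estimate of (a, b) on the fitting split only.
coef, *_ = np.linalg.lstsq(
    design(q_child[fit_idx], q_parent[fit_idx]), p_child[fit_idx], rcond=None
)

def mse(ix):
    """Mean squared error of the fitted candidate on the given indices."""
    pred = design(q_child[ix], q_parent[ix]) @ coef
    return float(np.mean((pred - p_child[ix]) ** 2))

fit_error, holdout_error = mse(fit_idx), mse(holdout_idx)
```

A hold-out error far above the fitting error would suggest that the candidate is overfitted; repeating the fit for each candidate and comparing the hold-out errors is one way to select the final function.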


In [1]:
!python --version

Python 3.8.13


In [1]:
%reload_ext autoreload
%autoreload 2

import tensorflow as tf
from main.utils import funcs
import pandas as pd
import numpy as np
import warnings


# setup tensorflow to only use GPU memory as needed
# physical_devices = tf.config.experimental.list_physical_devices('GPU')
# for physical_device in physical_devices:
#     tf.config.experimental.set_memory_growth(physical_device, True)

warnings.filterwarnings('ignore')

%reload_ext main.utils.funcs



Metal device set to: Apple M1

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB



2022-11-03 18:26:48.140351: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-11-03 18:26:48.140596: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


In [3]:
dataset = 'chexpert'  # options: 'cifar100', 'chexpert'
mode = 'train'

### <span style="font-family:PT Sans Narrow; font-size:1em"> Setting up the MLflow config  </span>

In [12]:
aim1_1 = funcs.AIM1_1(run_mlflow=True, experiment_name='label_inter_dependence', run_name=dataset, new_run=True, dataset=dataset)

Connecting to the server...






************************************************
Access is restricted to AUTHORIZED USERS only! If
you are not authorized LEAVE NOW!
************************************************



bind [127.0.0.1]:5000: Address already in use
channel_setup_fwd_listener_tcpip: cannot listen to port: 5000
Could not request local forwarding.


Killing all active runs...
Setting up the experiment...
setting the tracking URI
setting/creating the experiment
Running the session...


client_loop: send disconnect: Broken pipe
client_loop: send disconnect: Broken pipe
client_loop: send disconnect: Broken pipe


In [9]:
p = aim1_1.mlflow_ui(VIEW_PORT=6789)

MLFlow UI is already running on localhost:6789


Connection to localhost port 6789 [tcp/smc-https] succeeded!


### <span style="font-family:PT Sans Narrow; font-size:1em"> Getting the parent info  </span>

In [5]:
print(aim1_1.max_sample)

1000


### <span style="font-family:PT Sans Narrow; font-size:1em"> Loading the data  </span>

In [6]:
# use a name that does not shadow the built-in dir()
data_dir = '/Users/personal-macbook/Documents/PhD/dataset/chexpert/'
data_loader = aim1_1.get_data(dataset=dataset, data_mode=mode, max_sample=100000000, site='local', dir=data_dir)

before sample-pruning
train: (223414, 19)
test: (234, 19)

after sample-pruning
train (certain): (86920, 20)
train (uncertain): (52940, 20)
valid: (21730, 20)
test: (169, 20) 

Found 86920 validated image filenames.
Found 21730 validated image filenames.


In [7]:
# data_loader.coarse_fine_label_map_df['ids']
next(data_loader.generators['valid'])[0].shape

(128, 224, 224, 3)

### <span style="font-family:PT Sans Narrow; font-size:1em"> Getting the optimized model from server  </span>

In [8]:
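# NOTE: a .pth extension usually indicates a PyTorch checkpoint; the load below fails
# (see the output), which suggests the file is not in a format the TensorFlow loader can read.
weights = '/Users/personal-macbook/Documents/PhD/code/my_main_code/main/pre_trained_models/efficient_b0_model_freeze_best.pth'
model = aim1_1.get_model(data_loader=data_loader, optimize_model=True, epochs=30, architecture_name='EfficientNetB0', weights=weights, num_classes=14)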

2022-10-30 20:45:13.776718: W tensorflow/core/util/tensor_slice_reader.cc:96] Could not open /Users/personal-macbook/Documents/PhD/code/my_main_code/my_main_code/pre_trained_models/efficient_b0_model_freeze_best.pth: DATA_LOSS: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?


OSError: Unable to open file (file signature not found)
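
The two messages above (`not an sstable` and `file signature not found`) suggest that the file was tried as a TensorFlow checkpoint and as HDF5 and matched neither. One way to check what a weights file actually contains is to look at its first bytes. The sketch below uses only the standard library; `sniff_checkpoint_format` is a hypothetical helper, and the mapping from "zip" to PyTorch is an assumption based on recent `torch.save` versions writing zip archives by default.

```python
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # signature normally found at the start of an HDF5 file
ZIP_MAGIC = b"PK\x03\x04"          # local file header of a zip archive

def sniff_checkpoint_format(path):
    """Guess a checkpoint's container format from its leading bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(HDF5_MAGIC):
        return "hdf5"      # e.g. Keras .h5 weights
    if head.startswith(ZIP_MAGIC):
        return "zip"       # possibly a PyTorch checkpoint (assumption)
    return "unknown"
```

Calling `sniff_checkpoint_format(weights)` before loading would show whether the file needs to be converted rather than loaded directly.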

In [20]:
next( data_loader.generators['train_with_augments']  )[1]

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

### <span style="font-family:PT Sans Narrow; font-size:1em"> Getting the optimization values </span>

In [21]:
if dataset == 'chexpert':
    x,y = next(data_loader.generators['test'])
    
elif dataset =='cifar100':
    x,y = np.array( data_loader.dataframes['test'].data.to_list())  ,  np.array( data_loader.dataframes['test'].labels.to_list()) 


df_score = aim1_1.get_optimization_values(x=x ,y=y , labels=data_loader.label_names )

df_score.head()

Unnamed: 0,loss_avg,acc,pred,pred_binary,truth,loss
0,0.101001,"[0.0, 0.0]","[0.042432427, 0.0053589987, 0.00010218861, 0.0...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[0.043358862, 0.005373285, 0.00010204836, 0.04..."
1,0.030085,"[1.0, 1.0]","[0.0014294072, 0.016350685, 0.082793, 0.000565...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[0.0014302821, 0.016485713, 0.08642196, 0.0005..."
2,0.071144,"[0.0, 0.0]","[0.1702439, 0.018769003, 0.0022778984, 0.00211...","[False, False, False, False, False, False, Fal...","[True, False, False, False, False, False, Fals...","[1.7705226, 0.018947277, 0.0022803897, 0.00211..."
3,0.070066,"[0.0, 0.0]","[0.53047, 0.026757104, 0.0022328799, 0.0453493...","[True, False, False, False, False, False, Fals...","[False, False, False, False, True, False, Fals...","[0.75602293, 0.027121486, 0.0022352864, 0.0464..."
4,0.008137,"[1.0, 1.0]","[0.0017710548, 0.020554377, 0.00029985755, 0.0...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[0.0017724836, 0.020768423, 0.0002997967, 0.03..."
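
The `loss` column appears to hold the per-label binary cross entropy of `pred` against `truth`. The sketch below recomputes it for the first label of rows 0 and 2 of the table; `per_label_bce` is an illustrative helper, not the function used inside `get_optimization_values`.

```python
import numpy as np

def per_label_bce(pred, truth, eps=1e-7):
    """Element-wise binary cross entropy between probabilities and boolean labels."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    truth = np.asarray(truth, dtype=float)
    return -(truth * np.log(pred) + (1.0 - truth) * np.log(1.0 - pred))

# First label of rows 0 and 2 in the table above.
pred = [0.042432427, 0.1702439]
truth = [False, True]
loss = per_label_bce(pred, truth)
```

The recomputed values agree with the tabulated `0.043358862` and `1.7705226` to roughly four decimal places, which is consistent with a float32 computation of the same formula.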


### <span style="font-family:PT Sans Narrow; font-size:1em">  Measuring updated results for all proposed techniques </span>

In [22]:
# this call defines `updated_values` and `accuracies`, which the cells below rely on
updated_values, accuracies = aim1_1.measure_new_predicted_probabilities(
    hierarchy=data_loader.hierarchy, weight=1, bias=0.5,
    METRIC_USED_TO_MEASURE_COEFFICIENT='pred', MODE_UPDATE_LOSS='pred', THRESHOLD_UPDATE_LOSS=0.5)

print('coarse accuracy')
print( accuracies.coarse.mean() )

print('\nfine accuracy')
print( accuracies.fine.mean() )

coarse accuracy
original    0.4743
new         0.4743
dtype: float32

fine accuracy
original    0.3550
new         0.3567
dtype: float32


### <span style="font-family:PT Sans Narrow; font-size:1em">  Checking the results per subject </span>

In [13]:
subject_index = 3

# collect every per-class quantity for the selected subject
data_subject = {}
for name in ['coefficient' ,'loss' , 'truth' , 'pred' , 'pred_new']:
    data_subject[name] = updated_values[name].iloc[subject_index]

pd.DataFrame(data_subject).round(decimals=4)

Unnamed: 0,coefficient,loss,truth,pred,pred_new
aquatic_mammals,1.0000,0.3887,0,0.3221,0.3221
fish,1.0000,0.0323,0,0.0318,0.0318
flowers,1.0000,0.0057,0,0.0057,0.0057
food_containers,1.0000,0.0250,0,0.0246,0.0246
fruit_and_vegetables,1.0000,3.1148,1,0.0444,0.0444
...,...,...,...,...,...
whale,0.8221,0.0088,0,0.0088,0.0072
willow_tree,0.5455,0.0015,0,0.0015,0.0008
wolf,0.6442,0.0524,0,0.0511,0.0329
woman,0.5027,0.0008,0,0.0008,0.0004
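
The numbers in this table are consistent with `coefficient = weight * pred_parent + bias` and `pred_new = coefficient * pred`, using the `weight=1` and `bias=0.5` from the call to `measure_new_predicted_probabilities`. This relation is inferred from the `whale` row (whose CIFAR-100 superclass is `aquatic_mammals`), not taken from the project code, so treat it as an assumption. A minimal sketch:

```python
def updated_prediction(pred_child, pred_parent, weight=1.0, bias=0.5):
    """Assumed update rule: scale the child prediction by an affine function of the parent's."""
    coefficient = weight * pred_parent + bias
    return coefficient, coefficient * pred_child

# whale (child) and aquatic_mammals (parent) from the table above
coefficient, pred_new = updated_prediction(pred_child=0.0088, pred_parent=0.3221)
```

With these inputs the coefficient comes out as 0.8221 and the updated prediction rounds to 0.0072, matching the `whale` row. Parent classes show a coefficient of 1.0 and an unchanged prediction, which fits the same reading.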


# <span style="color:orange; font-family:PT Sans Narrow; font-size:1em"> Killing the MLflow & SSH sessions </span>

In [11]:
# closing the child mlflow session
aim1_1.cleanup_mlflow_after_runs()

# closing the ssh session
aim1_1.ssh_session.kill()

# <span style="color:orange; font-family:PT Sans Narrow; font-size:1em"> TensorFlow Hub Model </span>

In [None]:
import matplotlib.pyplot as plt
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/rishit-dagli/swin-transformer/1")

subject_index = 59

im = data_loader.dataframes['train'].loc[subject_index,'data']

im = tf.image.convert_image_dtype(im[np.newaxis , ...], tf.float32)

# pred = model(im)

# pred_label = np.argmax(pred)

# true_label = np.where( data_loader.dataframes['train'].loc[subject_index,'labels'])[0]

# print('true_label', (true_label[0] , true_label[1]-20) , 'pred_label', pred_label)

# pred = pred.numpy().reshape(-1)
# plt.plot( pred )

In [None]:
# import torch, torchvision
# # from torchvision.models import resnet50, ResNet50_Weights

In [None]:
# torchvision.models.Inception3()

In [None]:
# torch.hub.list('pytorch/vision')