# Fetch experiments data from Neptune using [Query API](https://docs.neptune.ai/python-api/query-api.html)

This notebook shows example usage of the Query API, a set of Python methods that let you fetch experiment data from Neptune. It also presents some use cases for analyzing the downloaded data.

## Methods
This notebook covers the most common methods:

1. [get_experiments()](https://docs.neptune.ai/neptune-client/docs/project.html#neptune.projects.Project.get_experiments) - get a list of the [Experiment objects](https://docs.neptune.ai/neptune-client/docs/experiment.html). We will need them to fetch data from selected experiments.
1. [get_leaderboard()](https://docs.neptune.ai/neptune-client/docs/project.html#neptune.projects.Project.get_leaderboard) - get the experiments table as a pandas DataFrame. An example experiments table is [here](https://ui.neptune.ai/o/USERNAME/org/example-project/experiments?viewId=6013ecbc-416d-4e5c-973e-871e5e9010e9).
1. [get_hardware_utilization()](https://docs.neptune.ai/neptune-client/docs/experiment.html#neptune.experiments.Experiment.get_hardware_utilization) - for the Experiment in question, get hardware utilization metrics as a pandas DataFrame ([example metrics](https://ui.neptune.ai/o/USERNAME/org/example-project/e/HELLO-177/monitoring)).
1. [get_logs()](https://docs.neptune.ai/neptune-client/docs/experiment.html#neptune.experiments.Experiment.get_logs) - get a dict where keys are log names and values are Channel objects.
1. [get_numeric_channels_values()](https://docs.neptune.ai/neptune-client/docs/experiment.html#neptune.experiments.Experiment.get_numeric_channels_values) - get the values of numeric logs as a pandas DataFrame ([example logs](https://ui.neptune.ai/o/USERNAME/org/example-project/e/HELLO-177/charts)).

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [2]:
from utils.tokens import NEPTUNE_API_TOKEN
import neptune
from scipy.stats import hmean
import pandas as pd
from training_utils import problem_kind
import plotly.express as px

# Set project to work with (as usual)

In [3]:
project = neptune.init('createrandom/mus-RQ1',
                       api_token=NEPTUNE_API_TOKEN)



# Visualize metrics

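The cells below use `get_experiments()` to select experiments by tag and `get_numeric_channels_values()` to fetch their validation metrics.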

In [41]:
attribute = 'Sex'
problem_type = problem_kind[attribute]
print(problem_type)
# Get the Experiment objects tagged with the chosen attribute.
experiments = project.get_experiments(tag=attribute)

binary


In [42]:
len(experiments)

27

In [43]:
machines = ['ESAOTE_6100/val/', 'Philips_iU22/val/']
metric_mapping = {'regression': ['mae'],
                  'binary': ['accuracy', 'p', 'r']}

metric_list = metric_mapping[problem_type]

logs_names = []
for machine in machines:
    for metric in metric_list:
        final_metric = machine + metric
        logs_names.append(final_metric)
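
The nested loop above builds one log name per machine and metric pair. As a sketch, the same list can be produced with `itertools.product` from the standard library; the values below simply repeat the ones this notebook uses for the binary case.

```python
from itertools import product

machines = ['ESAOTE_6100/val/', 'Philips_iU22/val/']
metric_list = ['accuracy', 'p', 'r']  # metrics for the 'binary' problem type

# product() varies the machine in the outer position and the metric in the
# inner position, so the order matches the nested for-loops
logs_names = [machine + metric for machine, metric in product(machines, metric_list)]
print(logs_names)
```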


In [44]:
frames = []
for experiment in experiments:
    # Fetch the per-epoch values of the selected numeric logs
    df = experiment.get_numeric_channels_values(*logs_names)
    # df['tags'] = experiment.get_tags()
    # params = experiment.get_parameters()
    df.insert(loc=0, column='id', value=experiment.id)
    frames.append(df)

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
metrics_df = (pd.concat(frames, sort=True) if frames
              else pd.DataFrame(columns=['id', *logs_names]))


In [51]:
def compute_f1_esaote(entry):
    return hmean([entry['ESAOTE_6100/val/p'], entry['ESAOTE_6100/val/r']])

def compute_f1_philips(entry):
    return hmean([entry['Philips_iU22/val/p'], entry['Philips_iU22/val/r']])

if problem_type == 'binary':
    metrics_df['ESAOTE_6100/val/f1'] = metrics_df.apply(compute_f1_esaote, axis=1)
    metrics_df['Philips_iU22/val/f1'] = metrics_df.apply(compute_f1_philips, axis=1)
    # metrics_df['val_f1_gap'] = metrics_df['ESAOTE_6100/val/f1'] - metrics_df['Philips_iU22/val/f1']
else:
    metrics_df['val_mae_gap'] = metrics_df['ESAOTE_6100/val/mae'] - metrics_df['Philips_iU22/val/mae']

# The step column comes back as 'x'; rename it to 'epoch'
metrics_df.rename(columns={'x': 'epoch'}, inplace=True)
metrics_df.head(n=5)

,ESAOTE_6100/val/accuracy,ESAOTE_6100/val/p,ESAOTE_6100/val/r,Philips_iU22/val/accuracy,Philips_iU22/val/p,Philips_iU22/val/r,id,epoch,ESAOTE_6100/val/f1,Philips_iU22/val/f1
0,0.77451,0.785047,0.785047,0.50463,0.528205,0.872881,MUS1-464,1.0,0.785047,0.658147
1,0.769608,0.8,0.747664,0.638889,0.80303,0.449153,MUS1-464,2.0,0.772947,0.576087
2,0.764706,0.790476,0.761468,0.611111,0.646552,0.635593,MUS1-464,3.0,0.775701,0.641026
3,0.794118,0.761905,0.888889,0.560185,0.567251,0.822034,MUS1-464,4.0,0.820513,0.67128
4,0.789216,0.762295,0.869159,0.583333,0.584337,0.822034,MUS1-464,5.0,0.812227,0.683099
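
The F1 columns above are the harmonic mean of precision (`p`) and recall (`r`). The sketch below reproduces one value from the table using `statistics.harmonic_mean` from the standard library instead of `scipy.stats.hmean` (a substitution made here only to keep the example dependency-free) and checks it against the closed form 2pr / (p + r). The helper name `f1_score` is introduced for illustration.

```python
from statistics import harmonic_mean

def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall: 2pr / (p + r)."""
    return harmonic_mean([precision, recall])

# Precision and recall of MUS1-464 at epoch 2 on ESAOTE_6100 (row 1 of the table above)
p, r = 0.8, 0.747664
f1 = f1_score(p, r)
closed_form = 2 * p * r / (p + r)
print(round(f1, 6), round(closed_form, 6))
```

Both values should agree with the `ESAOTE_6100/val/f1` entry of that row, 0.772947.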


In [52]:
# grab the best scoring epoch for each experiment
if problem_type == 'binary':
    best_scores = metrics_df.sort_values(['ESAOTE_6100/val/f1'], ascending=[False]).groupby('id').first()
else:
    best_scores = metrics_df.sort_values(['ESAOTE_6100/val/mae'], ascending=[True]).groupby('id').first()
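
The cell above relies on a sort-then-group pattern: after sorting by the score, the first row of each `id` group is that experiment's best epoch. A minimal sketch on hypothetical data (the ids and scores are made up for illustration) shows the pattern next to an `idxmax`-based alternative.

```python
import pandas as pd

# Hypothetical per-epoch scores for two experiments (illustrative values only)
metrics = pd.DataFrame({
    'id':    ['EXP-1', 'EXP-1', 'EXP-1', 'EXP-2', 'EXP-2'],
    'epoch': [1, 2, 3, 1, 2],
    'f1':    [0.70, 0.82, 0.78, 0.65, 0.60],
})

# Sort so the best row of each experiment comes first, then keep that first row
best = metrics.sort_values('f1', ascending=False).groupby('id').first()

# Alternative: look up the row label of the maximum score in each group
best_alt = metrics.loc[metrics.groupby('id')['f1'].idxmax()].set_index('id')

print(best)
print(best_alt)
```

Note that `GroupBy.first()` takes the first non-null value of each column separately, so if the logs can contain missing values the `idxmax` variant may be the safer choice.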

In [53]:
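# Fetch the experiments table and index it by experiment id so it can be joined
all_data = project.get_leaderboard(tag=attribute).set_index('id').convert_dtypes()
metrics_df['id'] = metrics_df['id'].astype(str)
# Attach the leaderboard columns (parameters, tags, and so on) to each experiment's best epoch
plot_frame = best_scores.join(all_data)
plot_frame.head(n=10)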

id,ESAOTE_6100/val/accuracy,ESAOTE_6100/val/p,ESAOTE_6100/val/r,Philips_iU22/val/accuracy,Philips_iU22/val/p,Philips_iU22/val/r,epoch,ESAOTE_6100/val/f1,Philips_iU22/val/f1,name,...,parameter_mil_pooling,parameter_n_epochs,parameter_n_params_backend,parameter_n_params_classifier,parameter_n_params_pooling,parameter_prediction_target,parameter_problem_type,parameter_source_train,parameter_use_pseudopatients,parameter_val
MUS1-464,0.848039,0.848214,0.87156,0.569444,0.567568,0.889831,9.0,0.859729,0.693069,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
MUS1-465,0.897059,0.88,0.907216,0.606481,0.569892,0.540816,5.0,0.893401,0.554974,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
MUS1-466,0.877451,0.846154,0.907216,0.564815,0.7,0.071429,5.0,0.875622,0.12963,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
MUS1-467,0.867647,0.927083,0.816514,0.62037,0.669811,0.601695,10.0,0.868293,0.633929,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
MUS1-468,0.862745,0.831683,0.884211,0.587963,0.666667,0.183673,9.0,0.857143,0.288,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
MUS1-469,0.872549,0.90099,0.850467,0.648148,0.659091,0.737288,5.0,0.875,0.696,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
MUS1-470,0.852941,0.94382,0.770642,0.615741,0.619048,0.771186,10.0,0.848485,0.686792,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
MUS1-471,0.857843,0.97619,0.752294,0.587963,0.808511,0.322034,10.0,0.849741,0.460606,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
MUS1-472,0.857843,0.884615,0.844037,0.606481,0.620438,0.720339,9.0,0.86385,0.666667,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
MUS1-473,0.862745,0.893204,0.844037,0.601852,0.591954,0.872881,9.0,0.867925,0.705479,Untitled,...,,10.0,11177025.0,,,Sex,image,ESAOTE_6100_train,,"['ESAOTE_6100_val', 'Philips_iU22_val']"
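
The join above aligns two frames on their `id` index. The sketch below illustrates that behavior on hypothetical data: `DataFrame.join` defaults to a left join, so only experiments present in the left frame are kept, and a parameter column stored as text can be cast to float afterwards. All ids and values are invented for the example.

```python
import pandas as pd

# Hypothetical best scores, indexed by experiment id
best_scores = pd.DataFrame(
    {'f1': [0.82, 0.65]},
    index=pd.Index(['EXP-1', 'EXP-2'], name='id'),
)

# Hypothetical leaderboard with an 'id' column and one parameter stored as text
leaderboard = pd.DataFrame({
    'id': ['EXP-2', 'EXP-1', 'EXP-3'],
    'parameter_lr': ['0.01', '0.05', '0.1'],
})

# join() aligns on the index and performs a left join by default,
# so EXP-3 (absent from best_scores) is dropped
joined = best_scores.join(leaderboard.set_index('id'))
joined['parameter_lr'] = joined['parameter_lr'].astype(float)
print(joined)
```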


In [54]:
params_of_interest = ['epoch', 'parameter_problem_type', 'parameter_mil_pooling', 'parameter_lr']
to_include = params_of_interest + logs_names
# TODO add the gaps back in
if problem_type == 'binary':
    to_include.append('ESAOTE_6100/val/f1')
    to_include.append('Philips_iU22/val/f1')
    comp_frame = plot_frame.sort_values('ESAOTE_6100/val/f1', ascending=False)[to_include]
else:
    comp_frame = plot_frame.sort_values('ESAOTE_6100/val/mae')[to_include]

# Replace missing pooling values with 'NA' so those rows are not dropped by a later groupby
comp_frame['parameter_mil_pooling'] = comp_frame['parameter_mil_pooling'].fillna('NA')
comp_frame

id,epoch,parameter_problem_type,parameter_mil_pooling,parameter_lr,ESAOTE_6100/val/accuracy,ESAOTE_6100/val/p,ESAOTE_6100/val/r,Philips_iU22/val/accuracy,Philips_iU22/val/p,Philips_iU22/val/r,ESAOTE_6100/val/f1,Philips_iU22/val/f1
MUS1-465,5.0,image,,0.0788480377114249,0.897059,0.88,0.907216,0.606481,0.569892,0.540816,0.893401,0.554974
MUS1-475,10.0,image,,0.0347279111820452,0.897059,0.87,0.915789,0.606481,0.651163,0.285714,0.892308,0.397163
MUS1-466,5.0,image,,0.0929042484447034,0.877451,0.846154,0.907216,0.564815,0.7,0.071429,0.875622,0.12963
MUS1-469,5.0,image,,0.0351261200968086,0.872549,0.90099,0.850467,0.648148,0.659091,0.737288,0.875,0.696
MUS1-478,8.0,bag,mean,0.013850773307231,0.86,0.861111,0.877358,0.569444,0.569832,0.864407,0.869159,0.686869
MUS1-467,10.0,image,,0.054686828018487,0.867647,0.927083,0.816514,0.62037,0.669811,0.601695,0.868293,0.633929
MUS1-473,9.0,image,,0.0877621876261126,0.862745,0.893204,0.844037,0.601852,0.591954,0.872881,0.867925,0.705479
MUS1-485,10.0,bag,mean,0.0183940305879178,0.84,0.781955,0.971963,0.652778,0.686957,0.669492,0.866667,0.678112
MUS1-472,9.0,image,,0.0794512734435762,0.857843,0.884615,0.844037,0.606481,0.620438,0.720339,0.86385,0.666667
MUS1-464,9.0,image,,0.096753966942173,0.848039,0.848214,0.87156,0.569444,0.567568,0.889831,0.859729,0.693069


In [40]:
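# Note: comp_frame does not currently contain the gap columns (see the TODO above),
# so this cell raises a KeyError until they are added back to to_include.
if problem_type == 'binary':
    print(comp_frame.groupby(['parameter_problem_type', 'parameter_mil_pooling']).min()['val_f1_gap'])
else:
    print(comp_frame.groupby(['parameter_problem_type', 'parameter_mil_pooling']).max()['val_mae_gap'])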

KeyError: 'val_mae_gap'
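
The `KeyError` above appears because the gap column is not present in the frame being grouped. As a sketch of what the cell is aiming for, the example below computes an F1 gap between the two machines on hypothetical data and takes the smallest gap per group. The values are invented, and selecting the column before aggregating is a choice made here, not necessarily the notebook's.

```python
import pandas as pd

# Hypothetical comparison frame (illustrative values only)
comp = pd.DataFrame({
    'parameter_problem_type': ['image', 'image', 'bag', 'bag'],
    'parameter_mil_pooling': ['NA', 'NA', 'mean', 'mean'],
    'ESAOTE_6100/val/f1': [0.89, 0.87, 0.86, 0.85],
    'Philips_iU22/val/f1': [0.55, 0.70, 0.66, 0.70],
})

# Gap between the F1 on the two validation sets
comp['val_f1_gap'] = comp['ESAOTE_6100/val/f1'] - comp['Philips_iU22/val/f1']

# Smallest gap per (problem type, pooling) group; selecting the column first
# avoids aggregating every other column as well
group_keys = ['parameter_problem_type', 'parameter_mil_pooling']
gap_by_group = comp.groupby(group_keys)['val_f1_gap'].min()
print(gap_by_group)
```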

In [78]:
# Make sure the learning rate is numeric before plotting
plot_frame['parameter_lr'] = plot_frame['parameter_lr'].astype(float)
# plot_frame['parameter_backend_lr'] = plot_frame['parameter_backend_lr'].astype(float)

# plot_frame.drop(columns=['tags'], inplace=True)
# 'ESAOTE_6100/val/f1' exists only when problem_type == 'binary'
fig = px.parallel_coordinates(plot_frame, dimensions=['parameter_lr', 'ESAOTE_6100/val/f1'])
fig.show()



ValueError: Value of 'dimensions_1' is not the name of a column in 'data_frame'. Expected one of ['ESAOTE_6100/val/mae', 'Philips_iU22/val/mae', 'epoch', 'val_mae_gap', 'name', 'created', 'finished', 'owner', 'notes', 'running_time', 'size', 'tags', 'channel_ESAOTE_6100/val/loss', 'channel_ESAOTE_6100/val/mae', 'channel_ESAOTE_6100/val/max_att', 'channel_ESAOTE_6100/val/mean', 'channel_ESAOTE_6100/val/mean_att', 'channel_ESAOTE_6100/val/min_att', 'channel_ESAOTE_6100/val/var', 'channel_ESAOTE_6100/val/var_att', 'channel_ESAOTE_6100/val_image/loss', 'channel_ESAOTE_6100/val_image/mae', 'channel_ESAOTE_6100/val_image/mean', 'channel_ESAOTE_6100/val_image/var', 'channel_Philips_iU22/val/loss', 'channel_Philips_iU22/val/mae', 'channel_Philips_iU22/val/max_att', 'channel_Philips_iU22/val/mean', 'channel_Philips_iU22/val/mean_att', 'channel_Philips_iU22/val/min_att', 'channel_Philips_iU22/val/var', 'channel_Philips_iU22/val/var_att', 'channel_Philips_iU22/val_image/loss', 'channel_Philips_iU22/val_image/mae', 'channel_Philips_iU22/val_image/mean', 'channel_Philips_iU22/val_image/var', 'channel_stderr', 'channel_stdout', 'channel_training/loss', 'channel_training/mae', 'channel_training/max_att', 'channel_training/mean', 'channel_training/mean_att', 'channel_training/min_att', 'channel_training/var', 'channel_training/var_att', 'parameter_attention_mode', 'parameter_backend', 'parameter_backend_cutoff', 'parameter_backend_lr', 'parameter_backend_mode', 'parameter_batch_size', 'parameter_fc_hidden_layers', 'parameter_fc_use_bn', 'parameter_lr', 'parameter_mil_mode', 'parameter_mil_pooling', 'parameter_n_epochs', 'parameter_n_params_backend', 'parameter_n_params_classifier', 'parameter_n_params_pooling', 'parameter_prediction_target', 'parameter_problem_type', 'parameter_source_train', 'parameter_use_pseudopatients', 'parameter_val'] but received: ESAOTE_6100/val/f1
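
The `ValueError` above is raised because one requested dimension is not a column of the frame (the F1 column is only computed for binary problems). One way to guard against this, sketched below with a hypothetical helper `available_dimensions` and invented data, is to check the requested names against the frame's columns before plotting.

```python
import pandas as pd

def available_dimensions(frame, wanted):
    """Split the requested column names into those present in frame and those missing."""
    present = [name for name in wanted if name in frame.columns]
    missing = [name for name in wanted if name not in frame.columns]
    return present, missing

# Hypothetical frame from a regression run: it has an MAE column but no F1 column
plot_frame = pd.DataFrame({
    'parameter_lr': [0.01, 0.05],
    'ESAOTE_6100/val/mae': [3.2, 2.9],
})

present, missing = available_dimensions(
    plot_frame, ['parameter_lr', 'ESAOTE_6100/val/f1'])
print('plot with:', present)
print('skipped:', missing)
```

The `present` list can then be passed as `dimensions` to `px.parallel_coordinates`, with `missing` reported to the user instead of failing.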