# GQA balancing

Intuitively, the downsampling parameters should be estimable and the balancing should be reverse-engineerable:

```
for each global group: downsample
for each local group: downsample
```

Downsampling of the answers $a_i$ of a group, sorted by decreasing normalized count $c_i$:
1. for sure, $f_{min} \le \frac{c_{i+1}}{c_i} \le f_{max} \quad \forall a_i, a_{i+1}$
2. $\frac{\sum_{j \le i} c_j}{1 - \sum_{j \le i} c_j} \le b$ for the 'head' of the distribution up to $i$, which is iteratively increased
    * we probably cannot guess $i$ correctly, so there is little chance of recovering $b$ exactly
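A minimal sketch of how the two constraints could be checked for a single group, assuming the normalized counts $c_i$ are already sorted in decreasing order. The function name `check_constraints` and the values of `f_min`, `f_max` and `b` are hypothetical, not taken from GQA; checking constraint 2 for every head size $i$ at once is also an assumption of this sketch, since the described procedure increases $i$ iteratively.

```python
import numpy as np

def check_constraints(counts, f_min, f_max, b):
    """Hypothetical check of both downsampling constraints for one group.

    counts: normalized counts c_i, sorted in decreasing order and summing to 1.
    Returns (ratio_ok, head_ok) as boolean arrays.
    """
    c = np.asarray(counts, dtype=float)
    # constraint 1: f_min <= c_{i+1} / c_i <= f_max for every consecutive pair
    ratios = c[1:] / c[:-1]
    ratio_ok = (ratios >= f_min) & (ratios <= f_max)
    # constraint 2: head / (1 - head) <= b, for every head that leaves a non-empty tail
    head = np.cumsum(c)[:-1]
    head_ok = head / (1.0 - head) <= b
    return ratio_ok, head_ok

ratio_ok, head_ok = check_constraints([0.4, 0.3, 0.2, 0.1], f_min=0.5, f_max=1.0, b=3.0)
print(ratio_ok.tolist())  # [True, True, True]
print(head_ok.tolist())   # [True, True, False]
```

The last head (0.9 of the mass against a tail of 0.1) gives a head-to-tail weight of about 9, which is why it violates the bound $b = 3$ in this toy example.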

Problematic steps afterwards:
* there is further downsampling, so the parameters cannot be determined with certainty.
* We split the dataset into 70% train, 10% validation, 10% test and 10% challenge, making sure that all the questions about a given image appear in the same split.

Purported and inferred postconditions:
* this ensures that the relative frequency-based answer ranking stays the same (p. 6)
* we iterate over the answers of that group in decreasing frequency order and reweight P's head up to the current iteration to make it more comparable to the tail size
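The first postcondition can be tested directly by comparing frequency-based ranks before and after balancing. A minimal sketch on made-up answer lists (not GQA data); `rank_changes` is a hypothetical helper, and ties are broken in first-seen order, as `Counter.most_common` does.

```python
from collections import Counter

def rank_changes(answers_before, answers_after):
    """Return {answer: (rank_before, rank_after)} for answers whose frequency rank changed.

    Ranks are 0-based positions in decreasing-frequency order (ties keep first-seen
    order, as in Counter.most_common); answers missing from either list are ignored.
    """
    rank_before = {a: r for r, (a, _) in enumerate(Counter(answers_before).most_common())}
    rank_after = {a: r for r, (a, _) in enumerate(Counter(answers_after).most_common())}
    return {a: (rank_before[a], r) for a, r in rank_after.items()
            if a in rank_before and rank_before[a] != r}

# made-up answer lists, not GQA data
all_answers = ['road'] * 6 + ['street'] * 5 + ['field'] * 4
balanced_answers = ['road'] * 2 + ['street'] * 3 + ['field'] * 1
print(rank_changes(all_answers, balanced_answers))  # {'street': (1, 0), 'road': (0, 1)}
```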

In [1]:
from pathlib import Path
import json
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

In [2]:
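show_big_figures = True
# the figures below are written to results/, so make sure the directory exists
Path('results').mkdir(exist_ok=True)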

## Loading GQA

The next cell takes quite some time to run (about 17 min on my machine):

In [3]:
gqa_data_path = Path('ver1.2/')

def load_json(path):
    # use a context manager so the (large) files are closed after reading
    with open(path) as f:
        return json.load(f)

def load_questions(path):
    # the JSON maps question id -> question dict; transpose so that each row is one question indexed by its qid
    return pd.DataFrame(load_json(path)).T

# balanced questions
gqa_train_balanced_questions = load_questions(gqa_data_path / 'train_balanced_questions.json')
gqa_val_balanced_questions = load_questions(gqa_data_path / 'val_balanced_questions.json')
gqa_testdev_balanced_questions = load_questions(gqa_data_path / 'testdev_balanced_questions.json')

gqa_testdev_balanced_questions_test = load_json(gqa_data_path / 'testdev_balanced_questions.json')

# all questions (the train split is sharded over several JSON files); concatenate once instead of inside a loop
gqa_train_all_questions_file_paths = list((gqa_data_path / 'train_all_questions').glob('*.json'))
gqa_train_all_questions = pd.concat([load_questions(file) for file in gqa_train_all_questions_file_paths])
gqa_val_all_questions = load_questions(gqa_data_path / 'val_all_questions.json')
gqa_testdev_all_questions = load_questions(gqa_data_path / 'testdev_all_questions.json')

gqa_val_all_questions_test = load_json(gqa_data_path / 'val_all_questions.json')

In [4]:
gqa_train_balanced_questions['datasplit'] = 'train'
gqa_val_balanced_questions['datasplit'] = 'val'
gqa_testdev_balanced_questions['datasplit'] = 'testdev'

gqa_train_all_questions['datasplit'] = 'train'
gqa_val_all_questions['datasplit'] = 'val'
gqa_testdev_all_questions['datasplit'] = 'testdev'

## Filling the DataFrames

### Concatenating and Calculating

In [5]:
gqa_question_stats_balanced = pd.concat([gqa_train_balanced_questions[['answer', 'groups', 'imageId', 'question', 'entailed', 'equivalent']],
                                         gqa_val_balanced_questions[['answer', 'groups', 'imageId', 'question', 'entailed', 'equivalent']],
                                         gqa_testdev_balanced_questions[['answer', 'groups', 'imageId', 'question', 'entailed', 'equivalent']],])
gqa_question_stats_balanced['local_question_group'] = gqa_question_stats_balanced['groups'].apply(lambda x: x['local'])
# cleaned local group: drop the type prefix up to the last '-' and any ',suffix', e.g. '14-wall_on,s' -> 'wall_on'
gqa_question_stats_balanced['local_question_group_clean'] = gqa_question_stats_balanced['groups'].apply(lambda x: x['local'].split('-')[-1].split(',')[0] if isinstance(x['local'], str) else None)
gqa_question_stats_balanced['global_question_group'] = gqa_question_stats_balanced['groups'].apply(lambda x: x['global'])
gqa_question_stats_balanced.head(5)

qid,answer,groups,imageId,question,entailed,equivalent,local_question_group,local_question_group_clean,global_question_group
2930152,yes,"{'global': None, 'local': '06-sky_dark'}",2354786,Is the sky dark?,"[02930160, 02930158, 02930159, 02930154, 02930...",[02930152],06-sky_dark,sky_dark,
7333408,pipe,"{'global': '', 'local': '14-wall_on,s'}",2375429,What is on the white wall?,[],[07333408],"14-wall_on,s",wall_on,
7333405,no,"{'global': None, 'local': '06-pipe_red'}",2375429,Is that pipe red?,[07333406],[07333405],06-pipe_red,pipe_red,
15736264,large,"{'global': 'size', 'local': '10c-clock_size'}",2368326,Is the tall clock small or large?,"[15736259, 15736258, 15736267, 15736253, 15736...",[15736264],10c-clock_size,clock_size,size
111007521,girl,"{'global': 'person', 'local': '14-shirt_wearin...",2331819,Who is wearing a shirt?,[],[111007521],"14-shirt_wearing,s",shirt_wearing,person


In [6]:
gqa_question_stats_all = pd.concat([gqa_train_all_questions[['answer', 'groups', 'imageId', 'question', 'entailed', 'equivalent']],
                                    gqa_val_all_questions[['answer', 'groups', 'imageId', 'question', 'entailed', 'equivalent']],
                                    gqa_testdev_all_questions[['answer', 'groups', 'imageId', 'question', 'entailed', 'equivalent']],])
gqa_question_stats_all['local_question_group'] = gqa_question_stats_all['groups'].apply(lambda x: x['local'])
# cleaned local group: drop the type prefix up to the last '-' and any ',suffix', e.g. '14-wall_on,s' -> 'wall_on'
gqa_question_stats_all['local_question_group_clean'] = gqa_question_stats_all['groups'].apply(lambda x: x['local'].split('-')[-1].split(',')[0] if isinstance(x['local'], str) else None)
gqa_question_stats_all['global_question_group'] = gqa_question_stats_all['groups'].apply(lambda x: x['global'])
gqa_question_stats_all.head(5)

qid,answer,groups,imageId,question,entailed,equivalent,local_question_group,local_question_group_clean,global_question_group
8519876,no,"{'global': None, 'local': '04-chair_tan'}",2317174,Is there a chair that is tan in this scene?,[08519877],[08519876],04-chair_tan,chair_tan,
8519877,no,"{'global': None, 'local': '04-chair_tan'}",2317174,Are there chairs in the photograph that are tan?,[08519876],[08519877],04-chair_tan,chair_tan,
8519870,yes,"{'global': None, 'local': '06-chair_red'}",2317174,Does the chair to the left of the person look ...,"[08519869, 08519868, 08519880, 08519872, 08519...",[08519870],06-chair_red,chair_red,
8519871,red,"{'global': 'color', 'local': '10q-chair_color'}",2317174,Which color is the chair?,"[08519869, 08519868, 08519880, 08519872, 08519...",[08519871],10q-chair_color,chair_color,color
8519872,no,"{'global': None, 'local': '06-chair_beige'}",2317174,Is the chair beige?,"[08519873, 08519874]","[08519872, 08519873]",06-chair_beige,chair_beige,
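
To make the cleaning step concrete, here is the same string logic applied to group strings that appear in the outputs and lists of this notebook. `clean_local_group` is a hypothetical name used only for illustration; it mirrors the lambda above. Note that distinct raw groups can collapse into one cleaned group (`10q-sky_color` and `10c-sky_color` both become `sky_color`), and an object name that itself contained a `-` would also be cut at that hyphen.

```python
def clean_local_group(local):
    """Hypothetical helper mirroring the lambda used for local_question_group_clean."""
    if not isinstance(local, str):
        return None  # same guard as in the original lambda
    # drop the type prefix up to the last '-' and anything from the first ','
    return local.split('-')[-1].split(',')[0]

for group in ['06-sky_dark', '14-wall_on,s', '09existOr-car_fence', '10q-sky_color', '10c-sky_color', None]:
    print(repr(group), '->', repr(clean_local_group(group)))
```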


### helper functions

In [7]:
# calculate the ratio between consecutive answers
def calculate_consec_ratio(group):
    group['consec_ratio'] = group['normalized_count'] / group['normalized_count'].shift(1)
    return group

# add a column for the answer, with the value indicating the position in the group
def calculate_position(group):
    group['answer_position'] = range(1, len(group) + 1)
    return group

# make a new column of the head probability weight, which takes the cumsum of the normalized_count so far for its group
def calculate_cumsum(group):
    group['head_prob_weight'] = group['normalized_count'].cumsum()
    return group
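A small worked example on one made-up group with normalized counts 0.5, 0.3 and 0.2, applying the same operations as the three helpers plus the head-to-tail weight computed in `estimate_parameter`: the consecutive ratios are 0.6 and 2/3, the head weights 0.5, 0.8 and 1.0, and the head-to-tail weights 1 and 4 (undefined for the last answer).

```python
import numpy as np
import pandas as pd

# one made-up group, answers already sorted by decreasing normalized count
toy = pd.DataFrame({'normalized_count': [0.5, 0.3, 0.2]}, index=['a', 'b', 'c'])

# as in calculate_consec_ratio: c_{i+1} / c_i, NaN for the most frequent answer
toy['consec_ratio'] = toy['normalized_count'] / toy['normalized_count'].shift(1)
# as in calculate_position: 1-based position within the group
toy['answer_position'] = range(1, len(toy) + 1)
# as in calculate_cumsum: head probability weight up to and including this answer
toy['head_prob_weight'] = toy['normalized_count'].cumsum()
# head-to-tail weight, NaN once the head covers the whole group
hp = toy['head_prob_weight']
toy['head2tail'] = (hp / (1 - hp)).where(~np.isclose(hp, 1.0))
print(toy.round(3))
```

Note that `calculate_position` counts from 1, whereas `estimate_parameter` numbers answers from 0 via `cumcount()`, which is the convention used in all tables below.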

In [8]:
def estimate_parameter(df, level, filter_threshold=1000, conditionals=None, plot=True, answers_filter_threshold=15, dataset_type='balanced'):
    """
    Try to estimate and plot the parameters of the balancing.
    Only use groups with many answers.

    level: local (answer type) or global (question group)
    filter_threshold: a group is included only if it has more than this many answers, OR
    conditionals: a list of local question groups to filter the df by if the level is local; overrides filter_threshold
    plot: whether to plot the results
    answers_filter_threshold: only answers up to this position within their group are plotted
    dataset_type: label ('balanced' or 'all') used in plot titles and file names
    """
    if 'local' in level:
        # the conditionals are raw local groups (e.g. '02q-place'), so use the raw column for them
        column = 'local_question_group_clean' if conditionals is None else 'local_question_group'
    elif 'global' in level:
        column = 'global_question_group'
    else:
        raise ValueError('level must be local or global')

    suffix = ""
    if conditionals is not None and 'local' in level:
        # filter the df to only include the conditionals
        group_counts = df[df['local_question_group'].isin(conditionals)][column].value_counts()
        df_filtered = df[df[column].isin(group_counts.index)]
        suffix = "conditionals"
    else:
        # filter the df to only include groups with many answers
        group_counts = df[column].value_counts()
        group_counts = group_counts[group_counts > filter_threshold]
        df_filtered = df[df[column].isin(group_counts.index)]
        suffix = "filterthreshold"

    # calculate the normalized count of each type
    group_counts = df_filtered.groupby(by=column)['answer'].value_counts(normalize=True)
    group_counts = pd.DataFrame(group_counts)
    group_counts = group_counts.rename(columns={group_counts.columns[0]: 'normalized_count'})

    # calculate the ratio between consecutive answers within each group (NaN for the first answer)
    group_counts_ratio = group_counts.copy()
    group_counts_ratio['consec_ratio'] = group_counts['normalized_count'] / group_counts.groupby(level=0)['normalized_count'].shift(1)

    # reconstruct the column
    group_counts_ratio[column] = df_filtered.groupby(by=column)['answer'].value_counts(normalize=True).index.get_level_values(0)

    # add a column indicating the position of the group in the df
    group_counts_ratio['group_position'] = group_counts_ratio.index
    group_counts_ratio['group_position'] = group_counts_ratio['group_position'].apply(lambda x: x[0])
    group_counts_ratio['group_position'] = group_counts_ratio['group_position'].astype('category')
    group_counts_ratio['group_position'] = group_counts_ratio['group_position'].cat.codes

    # add a column with a positional index within the group
    group_counts_ratio['answer_position'] = group_counts_ratio.groupby(level=0).cumcount()

    if plot:
        # filter the first x answers
        group_counts_ratio_filtered = group_counts_ratio[group_counts_ratio['answer_position'] <= answers_filter_threshold]

        # scatter plot of the consecutive answer ratio, with the group as the x-axis and the answer position as the color
        fig = px.scatter(group_counts_ratio_filtered, x=column, y='consec_ratio', color='answer_position', title=f'Consecutive answer ratio for {dataset_type} questions with {suffix}')
        fig.write_image(f'results/BAL_{dataset_type}_{level}_{suffix}_consecutive_ratio.png', width=1000, height=800)
        if show_big_figures:
            fig.show()

    # add a column of the cumsum of the probabilities so far for each group
    group_counts_ratio['head_prob_weight'] = group_counts_ratio.groupby(level=0)['normalized_count'].cumsum()

    # add a column of the head2tail weight: head probability divided by the remaining tail probability
    # (NaN once the head covers the whole group; isclose also catches a floating-point cumsum that ends just below 1)
    head = group_counts_ratio['head_prob_weight']
    group_counts_ratio['head2tail'] = (head / (1 - head)).where(~np.isclose(head, 1.))

    if plot:
        # filter the first x answers
        group_counts_ratio_filtered = group_counts_ratio[group_counts_ratio['answer_position'] <= answers_filter_threshold]

        # scatter plot of the head2tail weight, with the group as the x-axis and the position in the group as the color
        fig = px.scatter(group_counts_ratio_filtered, x=column, y='head2tail', color='answer_position', title=f'Head count weight to tail for {dataset_type} questions with {suffix}')

        # the value easily explodes, so the y-axis is truncated to [0, 10]
        fig.update_layout(yaxis_range=[0, 10])
        fig.write_image(f'results/BAL_{dataset_type}_{level}_{suffix}_head2tail.png', width=1000, height=800)
        if show_big_figures:
            fig.show()
    
    return group_counts_ratio

## Estimating the Parameters (ReadMe 2.)

Taking only groups with more than 1000 answers, the parameters of the downsampling should, by construction of the algorithm, be directly spottable, especially for the `local question group` (`answer type`).

In [48]:
frequent_conditionals = gqa_question_stats_all['local_question_group'].value_counts().head(50)
frequent_conditionals = list(frequent_conditionals.index)
frequent_conditionals

['03-fence',
 '03-car',
 '09existOr-car_fence',
 '03-glasses',
 '03-helmet',
 '03-chair',
 '02q-place',
 '09existOr-fence_helmet',
 '03-bag',
 '03-window',
 '02c-location',
 '03-bus',
 '01-location_indoors',
 '01-location_outdoors',
 '09existOr-fence_glasses',
 '03-clock',
 '13-man_woman',
 '13-woman_man',
 '03-mirror',
 '11q-furniture',
 '09existOr-door_window',
 '06-sky_blue',
 '03-boy',
 '09existAnd-door_window',
 '04-grass_brown',
 '11q-animal',
 '03-door',
 '03-woman',
 '09existOr-bus_car',
 '03-grass',
 '09existOr-bus_fence',
 '13-man_bag',
 '03-lamp',
 '03-picture',
 '13-man_helmet',
 '04-grass_green',
 '03-plate',
 '10q-shirt_color',
 '13-man_chair',
 '03-table',
 '09existOr-boy_fence',
 '03-umbrella',
 '04-table_wood',
 '02c-place',
 '06-sky_white',
 '10q-sky_color',
 '03-train',
 '02q-weather',
 '10c-sky_color',
 '13-man_fence']

In [31]:
gqa_question_stats_balanced[gqa_question_stats_balanced['local_question_group'] == '11q-furniture']['answer'].value_counts()


answer
table                   863
bed                     677
desk                    468
chair                   463
couch                   328
shelf                   305
cabinet                 223
sofa                    128
cabinets                112
cupboard                 42
shelves                  40
drawer                   34
bookshelf                32
chairs                   30
computer desk            26
closet                   22
dresser                  18
bookcase                 17
tv stand                 16
coffee table             15
dining table             15
entertainment center     14
drawers                  10
tables                    8
medicine cabinet          6
armchair                  6
nightstand                5
beds                      4
office chair              4
side table                2
cupboards                 2
bar stools                2
wardrobe                  2
bookshelves               2
Name: count, dtype: int64

In [49]:
# get the balancing based on the conditionals
local_group_counts_ratio_balanced_conditionals = estimate_parameter(gqa_question_stats_balanced, 'local', conditionals=frequent_conditionals, dataset_type='balanced')

In [50]:
local_group_counts_ratio_all_conditionals = estimate_parameter(gqa_question_stats_all, 'local', conditionals=frequent_conditionals, dataset_type='all')

In [23]:
local_group_counts_ratio_balanced = estimate_parameter(gqa_question_stats_balanced, 'local', answers_filter_threshold=20, dataset_type='balanced')

There might be something like a bound at $0.5$ for the consecutive ratio, but many groups do not respect it. For head2tail, I cannot see anything like a bound.
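One way to make the bound observation quantitative is to take, per group, the smallest consecutive ratio among its first answers and count how many groups stay at or above a candidate bound. A sketch on a made-up frame with the column names that `estimate_parameter` returns (`consec_ratio`, `answer_position`, group as index level 0); `share_respecting_bound` is a hypothetical helper, and `max_position` plays the role of `answers_filter_threshold`.

```python
import pandas as pd

def share_respecting_bound(ratios, bound=0.5, max_position=15):
    """Fraction of groups whose consecutive ratios among the first answers are all >= bound.

    ratios: frame indexed by (group, answer) with 'consec_ratio' and 'answer_position' columns.
    """
    head = ratios[ratios['answer_position'] <= max_position]
    # groupby-min skips the NaN ratio of each group's first answer;
    # groups with a single answer have no ratio at all and are dropped
    min_ratio = head.groupby(level=0)['consec_ratio'].min().dropna()
    return float((min_ratio >= bound).mean()), min_ratio

# made-up example frame, not GQA data
toy = pd.DataFrame(
    {'consec_ratio': [None, 0.8, 0.6, None, 0.9, 0.3],
     'answer_position': [0, 1, 2, 0, 1, 2]},
    index=pd.MultiIndex.from_tuples(
        [('g1', 'a'), ('g1', 'b'), ('g1', 'c'), ('g2', 'x'), ('g2', 'y'), ('g2', 'z')],
        names=['group', 'answer']))
share, min_ratio = share_respecting_bound(toy, bound=0.5)
print(share, min_ratio.to_dict())  # 0.5 {'g1': 0.6, 'g2': 0.3}
```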

In [24]:
local_group_counts_ratio_all = estimate_parameter(gqa_question_stats_all, 'local', dataset_type='all')

## Relative Frequency Based Ranking (ReadMe 3.) - conditionals

As the downsampling seems to operate on the conditionals (the raw local question groups such as `02q-place`), compute the statistics per conditional:

In [25]:
local_group_counts_ratio_balanced_conditionals = estimate_parameter(gqa_question_stats_balanced, 'local', conditionals=frequent_conditionals,
                                                                    dataset_type='balanced', plot=False)
local_group_counts_ratio_all_conditionals = estimate_parameter(gqa_question_stats_all, 'local', conditionals=frequent_conditionals,
                                                               dataset_type='all', plot=False)

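# left join on (local_question_group, answer): answers dropped entirely from the balanced set do not appear
local_group_counts_ratio_everything_conditionals = local_group_counts_ratio_balanced_conditionals.join(local_group_counts_ratio_all_conditionals,
                                                                                                       rsuffix='_all', lsuffix='_balanced')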

In [26]:
local_group_counts_ratio_everything_conditionals[['normalized_count_balanced', 'answer_position_balanced',
                                                              'normalized_count_all', 'answer_position_all']].sort_values(by='answer_position_all', ascending=True)

local_question_group,answer,normalized_count_balanced,answer_position_balanced,normalized_count_all,answer_position_all
01-location_indoors,no,0.504689,0,0.808118,0
03-bus,no,0.550633,0,0.862546,0
03-chair,no,0.508557,0,0.786122,0
03-clock,no,0.502825,0,0.851224,0
03-door,yes,0.478903,1,0.563351,0
...,...,...,...,...,...
02q-place,coffee shop,0.000448,83,0.000184,85
02q-place,cemetery,0.000224,87,0.000123,86
02q-place,pub,0.000336,85,0.000123,87
02q-place,lounge,0.000224,88,0.000082,88


In [34]:
local_group_counts_ratio_everything_conditionals[(local_group_counts_ratio_everything_conditionals['answer_position_all'] <20) & 
                                                 (local_group_counts_ratio_everything_conditionals['answer_position_balanced'] != local_group_counts_ratio_everything_conditionals['answer_position_all'])][['normalized_count_balanced', 'answer_position_balanced', 'normalized_count_all', 'answer_position_all']]

local_question_group,answer,normalized_count_balanced,answer_position_balanced,normalized_count_all,answer_position_all
02q-place,street,0.090471,1,0.105155,2
02q-place,ocean,0.066984,2,0.054750,5
02q-place,road,0.051289,3,0.122292,1
02q-place,park,0.034978,5,0.040996,6
02q-place,pavement,0.032511,6,0.024946,7
...,...,...,...,...,...
13-woman_man,right,0.261523,1,0.072132,2
13-woman_man,no,0.258517,2,0.364671,1
13-woman_man,yes,0.190381,3,0.489279,0
13-woman_man,behind,0.008016,4,0.001896,5


In [40]:
# which is the worst group? weight the rank difference by 3 if answer_position_all is below 10, by 2 if below 30, and by 1 otherwise
local_group_counts_ratio_everything_conditionals['weight'] = local_group_counts_ratio_everything_conditionals['answer_position_all'].apply(lambda x: 3 if x < 10 else (2 if x < 30 else 1))
local_group_counts_ratio_everything_conditionals['difference_weighted'] = local_group_counts_ratio_everything_conditionals['weight'] * abs(local_group_counts_ratio_everything_conditionals['answer_position_balanced'] - local_group_counts_ratio_everything_conditionals['answer_position_all'])

local_group_counts_ratio_everything_conditionals

local_question_group,answer,normalized_count_balanced,consec_ratio_balanced,local_question_group_balanced,group_position_balanced,answer_position_balanced,head_prob_weight_balanced,head2tail_balanced,normalized_count_all,consec_ratio_all,local_question_group_all,group_position_all,answer_position_all,head_prob_weight_all,head2tail_all,weight,difference_weighted
01-location_indoors,no,0.504689,,01-location_indoors,0,0,0.504689,1.018933e+00,0.808118,,01-location_indoors,0,0,0.808118,4.211541,3,0
01-location_indoors,yes,0.495311,0.981419,01-location_indoors,0,1,1.000000,,0.191882,0.237443,01-location_indoors,0,1,1.000000,,3,0
01-location_outdoors,yes,0.510836,,01-location_outdoors,1,0,0.510836,1.044304e+00,0.806546,,01-location_outdoors,1,0,0.806546,4.169193,3,0
01-location_outdoors,no,0.489164,0.957576,01-location_outdoors,1,1,1.000000,,0.193454,0.239855,01-location_outdoors,1,1,1.000000,,3,0
02c-location,outdoors,0.554244,,02c-location,2,0,0.554244,1.243379e+00,0.806086,,02c-location,2,0,0.806086,4.156926,3,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13-woman_man,right,0.261523,0.952555,13-woman_man,27,1,0.536072,1.155508e+00,0.072132,0.197800,13-woman_man,29,2,0.926081,12.528367,3,3
13-woman_man,no,0.258517,0.988506,13-woman_man,27,2,0.794589,3.868293e+00,0.364671,0.745323,13-woman_man,29,1,0.853949,5.846941,3,3
13-woman_man,yes,0.190381,0.736434,13-woman_man,27,3,0.984970,6.553333e+01,0.489279,,13-woman_man,29,0,0.489279,0.958015,3,9
13-woman_man,behind,0.008016,0.042105,13-woman_man,27,4,0.992986,1.415714e+02,0.001896,0.912281,13-woman_man,29,5,1.000000,,3,3
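
The weighting rule in the cell above can also be written without a row-wise `apply`. A minimal sketch with `np.select` on made-up positions (not GQA data), reproducing weight 3 for `answer_position_all` below 10, 2 below 30 and 1 otherwise, multiplied by the absolute rank difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'answer_position_all': [0, 12, 40],
                   'answer_position_balanced': [3, 12, 35]})
pos_all = df['answer_position_all']
# np.select picks the first matching condition, so "< 10" must come before "< 30"
df['weight'] = np.select([pos_all < 10, pos_all < 30], [3, 2], default=1)
df['difference_weighted'] = df['weight'] * (df['answer_position_balanced'] - pos_all).abs()
print(df['difference_weighted'].tolist())  # [9, 0, 5]
```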


In [42]:
# sum of difference_weighted per local question group, highest first
local_group_counts_ratio_everything_conditionals.groupby('local_question_group')['difference_weighted'].sum().sort_values(ascending=False)

local_question_group
02q-place                  551
11q-animal                 179
11q-furniture               50
13-woman_man                30
13-man_woman                30
03-bag                       6
09existOr-fence_helmet       6
09existOr-fence_glasses      6
03-door                      6
03-grass                     6
09existOr-car_fence          6
09existOr-bus_car            6
09existAnd-door_window       6
06-sky_blue                  6
04-grass_brown               0
09existOr-door_window        0
01-location_indoors          0
03-woman                     0
01-location_outdoors         0
03-mirror                    0
03-helmet                    0
03-fence                     0
03-clock                     0
03-chair                     0
03-bus                       0
03-boy                       0
02c-location                 0
03-window                    0
Name: difference_weighted, dtype: int64

### How it was actually sampled

In [43]:
column = '02q-place'

In [44]:
# compare the questions of this local question group in the balanced and in the full set
balanced_filtered_by_column = gqa_question_stats_balanced[gqa_question_stats_balanced['local_question_group'] == column]
balanced_qids_filtered_by_column = set(balanced_filtered_by_column.index)
print(f"The number of balanced questions in group {column} is {len(balanced_qids_filtered_by_column)}")

all_filtered_by_column = gqa_question_stats_all[gqa_question_stats_all['local_question_group'] == column]
all_qids_filtered_by_column = set(all_filtered_by_column.index)
print(f"The number of all questions in group {column} is {len(all_qids_filtered_by_column)}")

# get the qids that were excluded from the balanced set
excluded_qids = all_qids_filtered_by_column.difference(balanced_qids_filtered_by_column)

# show the questions of the full set that were dropped during balancing
all_filtered_by_column.loc[list(excluded_qids)]

The number of balanced questions in group 02q-place is 17840
The number of all questions in group 02q-place is 48785


qid,answer,groups,imageId,question,entailed,equivalent,local_question_group,local_question_group_clean,global_question_group
04594688,road,"{'global': 'road', 'local': '02q-place'}",2344715,Which place is it?,"[04594686, 04594687]",[04594688],02q-place,place,road
16448988,field,"{'global': 'place', 'local': '02q-place'}",2345015,Which place is it?,"[16448990, 16448991, 16448992, 16448989, 16448...",[16448988],02q-place,place,place
17573667,road,"{'global': 'road', 'local': '02q-place'}",2392625,What kind of place is it?,"[17573666, 17573668]","[17573667, 17573666]",02q-place,place,road
14936132,field,"{'global': 'place', 'local': '02q-place'}",2344376,What place is this?,"[14936133, 14936134, 14936135, 14936136, 14936...",[14936132],02q-place,place,place
01848024,field,"{'global': 'place', 'local': '02q-place'}",2398383,What place is pictured?,"[01848025, 01848026, 01848027, 01848028]",[01848024],02q-place,place,place
...,...,...,...,...,...,...,...,...,...
0777212,road,"{'global': 'road', 'local': '02q-place'}",2395208,What place is it?,"[0777213, 0777211]","[0777212, 0777211]",02q-place,place,road
0067022,road,"{'global': 'road', 'local': '02q-place'}",2369970,What is the photo showing?,[0067023],[0067022],02q-place,place,road
091017017,store,"{'global': 'place', 'local': '02q-place'}",2347055,Which place is it?,"[091017020, 091017015, 091017016, 091017019, 0...",[091017017],02q-place,place,place
18899534,street,"{'global': 'place', 'local': '02q-place'}",2367938,What place is pictured?,"[18899533, 18899535]","[18899533, 18899534]",02q-place,place,place


In [45]:
# plot the change of the answer distribution as overlaid plotly histograms of the answer counts in all_filtered_by_column and balanced_filtered_by_column
fig = go.Figure()
fig.add_trace(go.Histogram(x=all_filtered_by_column['answer'], name='all'))
fig.add_trace(go.Histogram(x=balanced_filtered_by_column['answer'], name='balanced'))

order = list(all_filtered_by_column['answer'].value_counts().index)
fig.update_xaxes(categoryorder='array', categoryarray=order)
fig.update_layout(barmode='overlay', title=f'Answer count distribution for {column} questions')

fig.write_image(f'results/BAL_{column}_answer_distribution.png', width=1500, height=600)
if show_big_figures:
    fig.show()

In [46]:
balanced_filtered_by_column_vc = balanced_filtered_by_column['answer'].value_counts()
all_filtered_by_column_vc = all_filtered_by_column['answer'].value_counts()

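# fraction of each answer's questions kept in the balanced set (NaN for answers that were dropped entirely)
balanced_filtered_by_column_vc = balanced_filtered_by_column_vc / all_filtered_by_column_vc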

# set all values of all_filtered_by_column_vc to 1
all_filtered_by_column_vc = all_filtered_by_column_vc / all_filtered_by_column_vc
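
The division in the cell above aligns the two `value_counts` on their answer labels, so an answer that occurs in the full set but never in the balanced set ends up as `NaN` rather than 0 (and is therefore not drawn). A small sketch with made-up counts; the `fillna(0)` step is an optional addition, not part of the original cell.

```python
import pandas as pd

# made-up answer counts, not GQA data
all_counts = pd.Series({'road': 6, 'street': 5, 'cemetery': 2})
balanced_counts = pd.Series({'street': 3, 'road': 2})

# index alignment: 'cemetery' is missing on the left, so its ratio is NaN
kept_fraction = balanced_counts / all_counts
print(kept_fraction.to_dict())
# treat answers removed entirely by the balancing as "0 % kept"
kept_fraction = kept_fraction.fillna(0)
```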

In [47]:
# plot an overlaid plotly bar chart: the fraction kept in the balanced set (balanced_filtered_by_column_vc) against the full set normalized to 1 (all_filtered_by_column_vc)
fig = go.Figure()
fig.add_trace(go.Bar(x=all_filtered_by_column_vc.index, y=all_filtered_by_column_vc.values, name='all'))
fig.add_trace(go.Bar(x=balanced_filtered_by_column_vc.index, y=balanced_filtered_by_column_vc.values, name='balanced'))

fig.update_xaxes(categoryorder='array', categoryarray=order)
fig.update_layout(barmode='overlay', title=f'Answer ratio distribution for {column} questions')

fig.write_image(f'results/BAL_{column}_answer_distribution_ratio.png', width=2000, height=600)
if show_big_figures:
    fig.show()

## Relative Frequency Based Ranking (ReadMe 3.) - OLD VERSION

This version of the code is based on the assumption that the downsampling was performed on a cleaned version of the local groups, but that seems to be wrong.

In [11]:
local_group_counts_ratio_balanced = estimate_parameter(gqa_question_stats_balanced, 'local', plot=False,  answers_filter_threshold=np.inf)
local_group_counts_ratio_all = estimate_parameter(gqa_question_stats_all, 'local', plot=False,  answers_filter_threshold=np.inf)

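# left join on (local_question_group_clean, answer): answers dropped entirely from the balanced set do not appear
local_group_counts_ratio_everything = local_group_counts_ratio_balanced.join(local_group_counts_ratio_all, rsuffix='_all', lsuffix='_balanced')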

In [12]:
local_group_counts_ratio_everything.loc['woman_wearing'].head(30)[['normalized_count_balanced', 'answer_position_balanced',
                                                              'normalized_count_all', 'answer_position_all']].sort_values(by='answer_position_all', ascending=True)

answer,normalized_count_balanced,answer_position_balanced,normalized_count_all,answer_position_all
shirt,0.089286,1,0.226518,0
dress,0.101562,0,0.081848,1
jacket,0.077009,2,0.065992,2
glasses,0.058036,3,0.054213,3
pants,0.053571,4,0.045757,4
hat,0.053292,5,0.040018,5
jeans,0.049665,7,0.039112,6
coat,0.050223,6,0.038055,7
skirt,0.041016,8,0.03141,8
sweater,0.034319,9,0.028239,9


This shows quite a change in the relative frequency-based ranking of the answers, at least for this `local question group`.

In [13]:
# get some rows in which answer_position != answer_position_all and where the original answer position is not too high
local_group_counts_ratio_everything[(local_group_counts_ratio_everything['answer_position_all'] <20) & (local_group_counts_ratio_everything['answer_position_balanced'] != local_group_counts_ratio_everything['answer_position_all'])][['normalized_count_balanced', 'answer_position_balanced', 'normalized_count_all', 'answer_position_all']].sample(20)

local_question_group_clean,answer,normalized_count_balanced,answer_position_balanced,normalized_count_all,answer_position_all
shelf_on,container,0.006415,34,0.013679,10
street_on,train,0.005333,22,0.003619,18
cap_wearing,dog,0.00605,17,0.005577,12
vehicle,wagon,0.018491,11,0.004189,13
shirt_wearing,surfer,0.005503,13,0.002058,17
pants_color,light brown,0.001513,18,0.001325,17
man_to the right of,boat,0.001742,85,0.013615,14
sign_on,letter,0.053488,2,0.044074,4
hat_color,maroon,0.0039,16,0.002709,17
animal,kitten,0.01024,12,0.004125,11


There are nearly 800 such rows; the table only shows a random sample.
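To get the exact number of such rows and a reproducible sample, the mask from the cell above can be counted and `sample` given a fixed `random_state` (the seed 0 is arbitrary). Capping `n` avoids the `ValueError` that `sample(20)` raises when fewer than 20 rows match. Sketch on a made-up frame with the two position columns:

```python
import pandas as pd

# made-up positions, not GQA data
df = pd.DataFrame({'answer_position_balanced': [0, 2, 1, 5, 30],
                   'answer_position_all':      [0, 1, 2, 5, 12]})
# rank changed, restricted to answers that are among the top 20 in the full set
mask = (df['answer_position_all'] < 20) & (df['answer_position_balanced'] != df['answer_position_all'])
n_changed = int(mask.sum())
print(n_changed)  # 3
sample = df[mask].sample(n=min(20, n_changed), random_state=0)
```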

In [14]:
# which is the worst group? weight the rank difference by 3 if answer_position_all is below 10, by 2 if below 30, and by 1 otherwise
local_group_counts_ratio_everything['weight'] = local_group_counts_ratio_everything['answer_position_all'].apply(lambda x: 3 if x < 10 else (2 if x < 30 else 1))
local_group_counts_ratio_everything['difference_weighted'] = local_group_counts_ratio_everything['weight'] * abs(local_group_counts_ratio_everything['answer_position_balanced'] - local_group_counts_ratio_everything['answer_position_all'])

local_group_counts_ratio_everything

local_question_group_clean,answer,normalized_count_balanced,consec_ratio_balanced,local_question_group_clean_balanced,group_position_balanced,answer_position_balanced,head_prob_weight_balanced,head2tail_balanced,normalized_count_all,consec_ratio_all,local_question_group_clean_all,group_position_all,answer_position_all,head_prob_weight_all,head2tail_all,weight,difference_weighted
allanimals,yes,0.501440,,allanimals,0,0,0.501440,1.005776e+00,0.501905,,allanimals,16,0,0.501905,1.007649,3,0
allanimals,no,0.498560,0.994257,allanimals,0,1,1.000000,,0.498095,0.992409,allanimals,16,1,1.000000,,3,0
animal,cat,0.243498,,animal,1,0,0.243498,3.218733e-01,0.197914,,animal,19,0,0.197914,0.246749,3,0
animal,dog,0.179603,0.737595,animal,1,1,0.423101,7.334043e-01,0.161665,0.816842,animal,19,1,0.359579,0.561472,3,0
animal,horse,0.123080,0.685291,animal,1,2,0.546181,1.203520e+00,0.117568,0.727235,animal,19,2,0.477147,0.912583,3,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
woman_wearing,shirts,0.000279,1.000000,woman_wearing,76,96,0.998884,8.950000e+02,0.000151,1.000000,woman_wearing,2728,100,0.999698,3310.000000,1,4
woman_wearing,snowsuit,0.000279,1.000000,woman_wearing,76,97,0.999163,1.193667e+03,0.000151,1.000000,woman_wearing,2728,102,1.000000,,1,5
woman_wearing,suitcase,0.000279,1.000000,woman_wearing,76,98,0.999442,1.791000e+03,0.000453,1.000000,woman_wearing,2728,87,0.996678,300.000000,1,11
woman_wearing,table,0.000279,1.000000,woman_wearing,76,99,0.999721,3.583000e+03,0.000604,1.000000,woman_wearing,2728,77,0.991996,123.943396,1,22


In [15]:
# sum of difference_weighted per cleaned local question group, highest first
local_group_counts_ratio_everything.groupby('local_question_group_clean')['difference_weighted'].sum().sort_values(ascending=False)

local_question_group_clean
table_on               17571
man_to the right of     9278
man_in front of         8649
man_to the left of      7854
man_holding             6841
                       ...  
door_window                0
chair_hposition            0
car_hposition              0
bag_hposition              0
location_outdoors          0
Name: difference_weighted, Length: 77, dtype: int64
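In every row printed earlier, `difference_weighted` equals `weight` × |`answer_position_balanced` − `answer_position_all`|. For example, `table` gives 1 × |99 − 77| = 22 and `suitcase` gives 1 × |98 − 87| = 11. The per-group sum above would therefore be a weighted total rank displacement. How `weight` is assigned (3 for the head answers and 1 for the tail in the rows shown) is not visible here, so the sketch below takes it as an input. This is a sketch under that assumption. The function names are hypothetical, the default weight of 1 is an assumption, and it also assumes that answers missing from either set are skipped:

```python
def answer_positions(counts):
    """Map answer -> 0-based rank by decreasing count (ties keep input order)."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {answer: pos for pos, answer in enumerate(ordered)}


def weighted_rank_difference(balanced_counts, all_counts, weights):
    """Sum of weight * |rank_balanced - rank_all| over answers present in both.

    weights: dict answer -> weight; answers without an entry get weight 1
    (an assumption, not taken from the notebook).
    """
    pos_bal = answer_positions(balanced_counts)
    pos_all = answer_positions(all_counts)
    total = 0
    # only answers ranked in both sets contribute (assumed)
    for answer in pos_bal.keys() & pos_all.keys():
        total += weights.get(answer, 1) * abs(pos_bal[answer] - pos_all[answer])
    return total
```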

In [16]:
# histogram of difference_weighted over all (group, answer) rows, not over the per-group sums above
fig = px.histogram(local_group_counts_ratio_everything, x='difference_weighted', title='Ranking difference weighted by group')
fig.write_image('results/BAL_local_ranking_difference_weighted_by_group.png', width=1000, height=600)
if show_big_figures:
    fig.show()

For most answers the weighted ranking difference is small, but for quite a few it is large.

### Worst Cases

In [17]:
local_group_counts_ratio_everything.loc['table_on'][['answer_position_balanced', 'answer_position_all', 'normalized_count_balanced', 'normalized_count_all']]

answer,answer_position_balanced,answer_position_all,normalized_count_balanced,normalized_count_all
laptop,0,0,0.036761,0.063468
glass,1,4,0.035271,0.027569
bowl,2,8,0.025087,0.018184
lamp,3,9,0.024839,0.017832
cell phone,4,5,0.023845,0.025575
...,...,...,...,...
vitamins,443,454,0.000248,0.000117
walkway,444,455,0.000248,0.000117
wall,445,257,0.000248,0.000469
water glass,446,456,0.000248,0.000117


In [18]:
local_group_counts_ratio_everything.loc['man_to the right of'][['answer_position_balanced', 'answer_position_all',
                                                                'normalized_count_balanced', 'normalized_count_all']]

answer,answer_position_balanced,answer_position_all,normalized_count_balanced,normalized_count_all
chair,0,1,0.099303,0.075494
car,1,0,0.069686,0.131654
horse,2,2,0.055168,0.043159
dog,3,5,0.044135,0.034173
bus,4,3,0.032520,0.039415
...,...,...,...,...
trash bag,226,149,0.000581,0.000613
trucks,227,135,0.000581,0.000749
wardrobe,228,255,0.000581,0.000204
wii,229,89,0.000581,0.001430
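The paper's claim that the relative frequency-based answer ranking stays the same (p. 6) can also be summarised per group as a single number. Spearman's rank correlation between the two orderings is one option: 1.0 means an identical ranking. This measure is our choice and is not used elsewhere in the notebook. The sketch below re-ranks the answers shared by both sets and applies the no-ties formula $\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)}$. The function name `spearman_rho` is hypothetical:

```python
def spearman_rho(pos_a, pos_b):
    """Spearman's rank correlation between two answer -> position dicts.

    Only answers present in both dicts are used; positions are assumed to be
    distinct within each dict (as the tie-broken answer positions above are).
    """
    shared = sorted(pos_a.keys() & pos_b.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared answers")
    # re-rank within the shared subset so both rankings are 0..n-1
    rank_a = {ans: r for r, ans in enumerate(sorted(shared, key=pos_a.get))}
    rank_b = {ans: r for r, ans in enumerate(sorted(shared, key=pos_b.get))}
    d2 = sum((rank_a[ans] - rank_b[ans]) ** 2 for ans in shared)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

For the five top rows of `man_to the right of` shown above (`chair`, `car`, `horse`, `dog`, `bus`), the balanced positions 0 to 4 versus the `all` positions 1, 0, 2, 5, 3 give $\rho = 0.8$.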


#### How were the answers actually sampled?

Since we have both the `all` and the `balanced` questions, we should be able to see what the balancing did (modulo the dataset split, which hopefully affects all answers proportionally).

In [19]:
column = 'table_on'

In [20]:
# questions of the chosen local group in the balanced set and in the full (all) set
balanced_filtered_by_column = gqa_question_stats_balanced[gqa_question_stats_balanced['local_question_group_clean'] == column]
balanced_qids_filtered_by_column = set(balanced_filtered_by_column.index)
print(f"The number of balanced questions of column {column} is {len(balanced_qids_filtered_by_column)}")

all_filtered_by_column = gqa_question_stats_all[gqa_question_stats_all['local_question_group_clean'] == column]
all_qids_filtered_by_column = set(all_filtered_by_column.index)
print(f"The number of all questions of column {column} is {len(all_qids_filtered_by_column)}")

# qids that are in all but were excluded from the balanced set
excluded_qids = all_qids_filtered_by_column.difference(balanced_qids_filtered_by_column)

# show the questions of all that did not make it into balanced
all_filtered_by_column.loc[list(excluded_qids)]

The number of balanced questions of column table_on is 4026
The number of all questions of column table_on is 8524


,answer,groups,imageId,question,entailed,equivalent,local_question_group,local_question_group_clean,global_question_group
13910835,pizza,"{'global': 'fast food', 'local': '15-table_on,s'}",2403540,What is the food on the table called?,"[13910702, 13910842, 13910834, 13910836]",[13910835],"15-table_on,s",table_on,fast food
03495543,phone,"{'global': 'device', 'local': '15-table_on,s'}",2410346,What is the device that is on the black table?,"[03495549, 03495531, 03495542]",[03495543],"15-table_on,s",table_on,device
05101484,speaker,"{'global': 'device', 'local': '15-table_on,s'}",2412614,How is the device on the table called?,"[05101496, 05101495, 05101448, 05101485]","[05101484, 05101485]","15-table_on,s",table_on,device
13925177,napkin,"{'global': 'thing', 'local': '14-table_on,s'}",2378920,What is on the table that looks black and brown?,"[13925185, 13925184, 13925178]",[13925177],"14-table_on,s",table_on,thing
15350766,orange,"{'global': 'color', 'local': '15-table_on,s'}",2362214,What is the fruit that is on the table made of...,"[15350731, 15350775, 15350767, 15350765, 15350...","[15350767, 15350766]","15-table_on,s",table_on,color
...,...,...,...,...,...,...,...,...,...
18946913,laptop,"{'global': 'device', 'local': '15-table_on,s'}",2355792,What device is on the table?,"[18946914, 18946911, 18946888]",[18946913],"15-table_on,s",table_on,device
02585774,cake,"{'global': 'dessert', 'local': '15-table_on,s'}",2353242,What dessert is on the table?,"[02585781, 02585773, 02585677]",[02585774],"15-table_on,s",table_on,dessert
18284033,kettle,"{'global': 'cooking utensil', 'local': '14-tab...",2387027,What is on the table made of wood?,"[18284042, 18284035, 18284034, 18284036, 18284...",[18284033],"14-table_on,s",table_on,cooking utensil
10608776,cell phone,"{'global': 'device', 'local': '15-table_on,s'}",2404269,What is the device on the table?,"[10608765, 10608777, 10608767, 10608766, 10608...",[10608776],"15-table_on,s",table_on,device


In [21]:
# overlay histograms of the answer counts in all_filtered_by_column and balanced_filtered_by_column
fig = go.Figure()
fig.add_trace(go.Histogram(x=all_filtered_by_column['answer'], name='all'))
fig.add_trace(go.Histogram(x=balanced_filtered_by_column['answer'], name='balanced'))

# order the x-axis by answer frequency in the full (all) set
order = list(all_filtered_by_column['answer'].value_counts().index)
fig.update_xaxes(categoryorder='array', categoryarray=order)
fig.update_layout(barmode='overlay', title=f'Answer count distribution for {column} questions')

fig.write_image('results/BAL_table_on_answer_distribution.png', width=1500, height=600)
if show_big_figures:
    fig.show()

In [22]:
balanced_filtered_by_column_vc = balanced_filtered_by_column['answer'].value_counts()
all_filtered_by_column_vc = all_filtered_by_column['answer'].value_counts()

# per-answer fraction of questions kept by balancing; answers absent from balanced
# align to NaN in the division, so fill them with 0 (removed entirely)
balanced_filtered_by_column_vc = (balanced_filtered_by_column_vc / all_filtered_by_column_vc).fillna(0)

# set all values of all_filtered_by_column_vc to 1 (the reference level)
all_filtered_by_column_vc = all_filtered_by_column_vc / all_filtered_by_column_vc

In [23]:
# overlay bar chart: every answer at 1 for all vs. the fraction kept in balanced
fig = go.Figure()
fig.add_trace(go.Bar(x=all_filtered_by_column_vc.index, y=all_filtered_by_column_vc.values, name='all'))
fig.add_trace(go.Bar(x=balanced_filtered_by_column_vc.index, y=balanced_filtered_by_column_vc.values, name='balanced'))

fig.update_xaxes(categoryorder='array', categoryarray=order)
fig.update_layout(barmode='overlay', title=f'Answer ratio distribution for {column} questions')

fig.write_image('results/BAL_table_on_answer_distribution_ratio.png', width=2000, height=600)
if show_big_figures:
    fig.show()

* `cake` is strongly downsampled relative to `glass`
* some answers seem to be downsampled to $0$, i.e. removed entirely: `boat`, `bird`?
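An answer with a keep ratio of $0$ is one that occurs in `all` but not in `balanced`. The stdlib sketch below computes per-answer keep ratios with `collections.Counter`, which the notebook already imports, and lists the answers that were removed entirely. It assumes the balanced questions are a subset of the `all` questions. The function names and the toy answer lists in the example are made up for illustration:

```python
from collections import Counter


def keep_ratios(all_answers, balanced_answers):
    """Fraction of each answer's `all` questions that survive into `balanced`.

    Assumes the balanced questions are a subset of the `all` questions.
    """
    all_counts = Counter(all_answers)
    bal_counts = Counter(balanced_answers)
    # a Counter returns 0 for missing keys, so dropped answers get ratio 0.0
    return {ans: bal_counts[ans] / n for ans, n in all_counts.items()}


def dropped_answers(ratios):
    """Answers whose questions were all removed by balancing, sorted by name."""
    return sorted(ans for ans, r in ratios.items() if r == 0)
```

For example, with four `cake`, two `glass` and one `boat` question in `all`, of which one `cake` and both `glass` questions survive, the keep ratios are 0.25, 1.0 and 0.0, and only `boat` is reported as dropped.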