Evaluation metrics per class #42

Open
mariiak2021 opened this issue Apr 13, 2023 · 4 comments
@mariiak2021

Hi @Lucaweihs, @mattdeitke!

Can you please tell me whether there are any existing evaluation metrics per class or per type of object (pickable/openable, ...)?

Best regards,
Mariia

@Lucaweihs
Contributor

Hi @mariiak2021,

Unfortunately there are no per-class evaluation metrics currently defined, but if you're interested, they would be very easy to add. You'd want to update the metrics function to return such values. E.g., in pseudocode you could do something like:

for object_type in object_types_whose_position_was_different_at_the_start_of_the_unshuffle_phase:
    metrics[f"{object_type}_fixed"] = end_energy_of_object == 0.0

Let me know if you need any help getting this working.

@mariiak2021
Author

mariiak2021 commented Apr 20, 2023

Hi @Lucaweihs, thanks a lot for your reply!

I might need your help, if possible. :) Would it look like this? To compute the end energy of each object that was misplaced:

    def metrics(self) -> Dict[str, Any]:
        if not self.is_done():
            return {}
        env = self.unshuffle_env
        ips, gps, cps = env.poses
        for gp, ip, cp in zip(gps, ips, cps):
            gp_vs_ip = env.compare_poses(gp, ip)
            if (gp_vs_ip["iou"] is not None and gp_vs_ip["iou"] != 1.0) or (
                gp_vs_ip["openness_diff"] is not None
                and gp_vs_ip["openness_diff"] != 0.0
            ):
                object_class = gp["name"].split("_")[0]
                end_energy_of_object = env.pose_difference_energy(gp, ip)
                metrics = {f"{object_class}/end_energy": end_energy_of_object == 0.0}
        return metrics

Please correct me if this is the wrong way. Also, which metrics can I reuse per class besides end energy?

best,
Mariia

@mariiak2021
Author

Hi @Lucaweihs, sorry to disturb you, but did you have time to look into my question? :) Thank you!

Lucaweihs self-assigned this May 8, 2023
@Lucaweihs
Contributor

Hi @mariiak2021,

Just to double check: the number that you want reported per class is a = the average energy of an object of this class, conditional on it ending misplaced at the end of an episode, right? Note that this means you'll never be recording cases where the energy is 0, so if you're trying to measure something like b = the average energy of an object at the end of the episode given that it _started_ misplaced, then you'd need to compute things a bit differently. If you want to compute a, then this is how I'd do it:

    # Requires `from collections import defaultdict` at the top of the file.
    def metrics(self) -> Dict[str, Any]:
        metrics = ...  # Old UnshuffleTask metrics code

        # New per-class metrics code: a running average of end energy per
        # object type, over objects that ended the episode misplaced.
        key_to_count = defaultdict(lambda: 0)
        for object_type, end_energy in zip([gp["type"] for gp in gps], end_energies):
            if end_energy > 0.0:
                key = f"end_energy__{object_type}"

                # Undo the running average for this object type and recompute
                # it with the new value/count.
                metrics[key] = metrics.get(key, 0.0) * key_to_count[key] + end_energy
                key_to_count[key] += 1
                metrics[key] /= key_to_count[key]

        return metrics
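
If you instead wanted b from above, here's a minimal sketch under the same assumptions; it reuses env, ips/gps/cps, and the compare_poses/pose_difference_energy calls from your snippet, and the start_misplaced_end_energy__ key name is just a hypothetical suggestion:

    # Hypothetical sketch for b: average end-of-episode energy per object
    # type, over objects that *started* misplaced (zero energies included).
    key_to_count_b = defaultdict(lambda: 0)
    for ip, gp, cp in zip(ips, gps, cps):
        gp_vs_ip = env.compare_poses(gp, ip)
        started_misplaced = (
            gp_vs_ip["iou"] is not None and gp_vs_ip["iou"] != 1.0
        ) or (
            gp_vs_ip["openness_diff"] is not None
            and gp_vs_ip["openness_diff"] != 0.0
        )
        if started_misplaced:
            key = f"start_misplaced_end_energy__{gp['type']}"
            # Compare the goal pose to the *current* pose at episode end.
            end_energy = env.pose_difference_energy(gp, cp)
            # Same running-average trick as in the snippet above.
            metrics[key] = metrics.get(key, 0.0) * key_to_count_b[key] + end_energy
            key_to_count_b[key] += 1
            metrics[key] /= key_to_count_b[key]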

Note that these snippets reuse the ips/gps/cps and end_energies variables defined in the original UnshuffleTask.metrics code. A few ways this differs from the code you wrote:

  1. If you want to compare the status of objects at the end of the episode against the goal positions, then you'll need to use cps/gps and not ips/gps (ips = initial poses, gps = goal poses, cps = current poses). Note that end_energies equals the pairwise comparisons between cps and gps.
  2. Multiple objects of the same category might be misplaced in the same scene; I've added some code to average in this case.
  3. I saw you were saving end_energy_of_object == 0.0 and not just end_energy_of_object. The boolean end_energy_of_object == 0.0 is True if the object poses are equal and False otherwise, so this is the same as recording the per-object-class fixed rate. That seems like something interesting to measure but it is not the same as the energy. If you want to measure the per-object-class fixed rate, then I would recommend something like the following (see the sketch after this list):
  • First compare gps and ips to figure out which objects had different initial/goal states (i.e. the objects that should be rearranged).
  • Next grab the energies corresponding to these "should be rearranged" objects at the end of the episode (i.e. comparing cps to gps at this stage).
  • For each such object, record a metric like f"fixed__{object_type}": energy == 0.0.
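
Concretely, here's a minimal sketch of that fixed-rate computation (same assumptions as the sketches above; the fixed__ key is just a suggested name):

    # Hypothetical sketch of the per-object-class fixed rate described above,
    # reusing the same started-misplaced test as in the b sketch.
    for ip, gp, cp in zip(ips, gps, cps):
        gp_vs_ip = env.compare_poses(gp, ip)
        if (gp_vs_ip["iou"] is not None and gp_vs_ip["iou"] != 1.0) or (
            gp_vs_ip["openness_diff"] is not None
            and gp_vs_ip["openness_diff"] != 0.0
        ):
            # Comparing cps to gps at the end of the episode (point 1 above).
            end_energy = env.pose_difference_energy(gp, cp)
            metrics[f"fixed__{gp['type']}"] = end_energy == 0.0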

Let me know if that helps!
