Machine perceptual quality evaluation

* Images
  * Dataset: [ImageNet-1k](https://huggingface.co/datasets/imagenet-1k)
  * Model: [Distilled data-efficient Image Transformer (DeiT)](https://huggingface.co/facebook/deit-small-distilled-patch16-224)
  * Metric: Image classification accuracy
  * Compression:
    * JPEG Q=5/100
    * HiFiC
    * TFCI
* Audio
  * Dataset: [Common Voice Corpus 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0)
  * Model: [Whisper](https://huggingface.co/openai/whisper-small)
  * Metric: Speech recognition word error rate
  * Compression:
    * MP3 kbps
    * Descript
    * EnCodec
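
The cells below only measure the uncompressed baselines. As a minimal sketch of how the "JPEG Q=5/100" condition could be applied before evaluation, the following round-trips an image through an in-memory JPEG encoder using Pillow. The helper name `jpeg_round_trip` and the synthetic test image are assumptions made for illustration; they do not come from the original notebook.

```python
import io

from PIL import Image


def jpeg_round_trip(image, quality=5):
    """Encode a PIL image as JPEG at the given quality, then decode it again.

    Hypothetical helper: returns the degraded image and the compressed size in bytes.
    """
    buffer = io.BytesIO()
    # JPEG has no alpha channel, so convert to RGB before encoding.
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    compressed_bytes = buffer.tell()
    buffer.seek(0)
    decoded = Image.open(buffer)
    decoded.load()  # force decoding while the buffer is still in scope
    return decoded, compressed_bytes


# Synthetic gradient image standing in for an ImageNet sample (assumption).
original = Image.new("RGB", (224, 224))
original.putdata([(x, y, (x + y) % 256) for y in range(224) for x in range(224)])

degraded, n_bytes = jpeg_round_trip(original, quality=5)
print(degraded.size, n_bytes)
```

One plausible way to feed the degraded images to the evaluator would be to replace the `image` column of the dataset with the round-tripped version before calling `compute`; that wiring is not shown in the original and is left as an assumption here.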

In [1]:
from datasets import load_dataset
from evaluate import evaluator
from transformers import pipeline

In [2]:
data = load_dataset("imagenet-1k", split="validation[:100]")

pipe = pipeline(
    task="image-classification",
    model="facebook/deit-small-distilled-patch16-224"
)

task_evaluator = evaluator("image-classification")
eval_results = task_evaluator.compute(
    model_or_pipeline=pipe,
    data=data,
    metric="accuracy",
    # Map the pipeline's string class names to the integer ids used by the dataset labels.
    label_mapping=pipe.model.config.label2id
)
eval_results

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.


{'accuracy': 0.81,
 'total_time_in_seconds': 1.8894966077059507,
 'samples_per_second': 52.924148999352056,
 'latency_in_seconds': 0.018894966077059507}
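
The three timing fields in this output are not independent. Assuming the evaluator divides the total wall-clock time by the number of samples (here 100, from `split="validation[:100]"`), the reported latency and throughput can be reproduced from `total_time_in_seconds` alone. The function name `timing_fields` is hypothetical and used only for this check.

```python
import math


def timing_fields(total_time_in_seconds, n_samples):
    """Derive throughput and per-sample latency from total time (illustrative)."""
    return {
        "samples_per_second": n_samples / total_time_in_seconds,
        "latency_in_seconds": total_time_in_seconds / n_samples,
    }


# Values copied from the image classification output above.
reported = {
    "total_time_in_seconds": 1.8894966077059507,
    "samples_per_second": 52.924148999352056,
    "latency_in_seconds": 0.018894966077059507,
}

derived = timing_fields(reported["total_time_in_seconds"], n_samples=100)
for key, value in derived.items():
    print(key, math.isclose(value, reported[key], rel_tol=1e-9))
```

Note that the latency is an average over the whole run, so it includes any per-batch and pipeline overhead rather than isolating the model's forward pass.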

In [3]:
data = load_dataset("mozilla-foundation/common_voice_11_0", "en", split="validation[:40]")

pipe = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-small",
)

task_evaluator = evaluator("automatic-speech-recognition")
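# Workaround: drop the evaluator's default 'truncation' pipeline kwarg,
# presumably because the speech recognition pipeline does not accept it.
task_evaluator.PIPELINE_KWARGS.pop('truncation', None)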

eval_results = task_evaluator.compute(
    model_or_pipeline=pipe,
    data=data,
    input_column="audio",
    label_column="sentence",
    metric="wer",
)
eval_results

{'wer': 0.24324324324324326,
 'total_time_in_seconds': 36.000162658281624,
 'samples_per_second': 1.11110609081646,
 'latency_in_seconds': 0.9000040664570407}
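
For reference, word error rate is the word-level edit distance (substitutions, deletions, insertions) between prediction and reference, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the implementation behind `metric="wer"`; it does no text normalization, and the function name and example sentences are made up for illustration.

```python
def word_error_rate(references, predictions):
    """Corpus-level WER: total word edit distance / total reference words."""
    total_errors = 0
    total_words = 0
    for reference, prediction in zip(references, predictions):
        ref_words = reference.split()
        hyp_words = prediction.split()
        # Levenshtein distance over words, keeping only the previous DP row.
        prev = list(range(len(hyp_words) + 1))
        for i, ref_word in enumerate(ref_words, start=1):
            curr = [i]
            for j, hyp_word in enumerate(hyp_words, start=1):
                substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
                deletion = prev[j] + 1
                insertion = curr[j - 1] + 1
                curr.append(min(substitution, deletion, insertion))
            prev = curr
        total_errors += prev[-1]
        total_words += len(ref_words)
    return total_errors / total_words


# Illustrative sentences (not taken from Common Voice).
references = ["the cat sat on the mat", "hello world"]
predictions = ["the cat sat on mat", "hello there world"]

# One deletion plus one insertion over 8 reference words.
print(word_error_rate(references, predictions))  # 0.25
```

Because errors are summed over the corpus before dividing, long utterances weigh more than short ones, and insertions can push the value above 1.0.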