Sentiment Analysis baseline #9

Closed
pushkalkatara opened this issue Feb 13, 2022 · 7 comments

@pushkalkatara
Contributor

Hi,

I wanted to reproduce the sentiment analysis baseline by running:

bash baselines/sentiment/e2e_scripts/ft-w2v2-base-senti.sh manifest/slue-voxceleb save/sentiment/w2v2-base

Fairseq Config log:

[2022-02-14 01:39:15,687][fairseq_cli.train][INFO] - {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': 'json', 'log_file': None, 'tensorboard_logdir': 'save/sentiment/w2v2-base/tb_logs', 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': True, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': '/root/pushkal/slue-toolkit/slue_toolkit/fairseq_addon', 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_num_procs': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'c10d', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': True, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_base_algorithm': 'localsgd', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': True, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False, 'not_fsdp_flatten_parameters': False}, 'dataset': {'_name': None, 'num_workers': 0, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': 1400000, 'batch_size': None, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'fine-tune', 'valid_subset': 'dev', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1000000, 'validate_interval_updates': 0, 'validate_after_updates': 2000, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': 1400000, 'batch_size_valid': None, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'grouped_shuffling': False, 'update_epoch_batch_itr': False, 'update_ordered_indices_seed': False}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 50000, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': True, 'update_freq': [1], 'lr': [2e-05], 'stop_min_lr': -1.0, 'use_bmuf': False, 'skip_remainder_batch': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'continue_once': None, 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 50, 'save_interval_updates': 1000, 'keep_interval_updates': 1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': True, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'macro_f1', 'maximize_best_checkpoint_metric': True, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'wav2vec2_seq_cls', 'w2v_path': '/root/pushkal/slue-toolkit/save/pretrained/wav2vec_small.pt', 'no_pretrained_weights': False, 'dropout_input': 0.0, 'final_dropout': 0.0, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.1, 'conv_feature_layers': '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]', 'encoder_embed_dim': 768, 'apply_mask': True, 'mask_length': 10, 'mask_prob': 0.65, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 64, 'mask_channel_prob': 0.5, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'freeze_finetune_updates': 2000, 'feature_grad_mult': 0.0, 'layerdrop': 0.1, 'mask_channel_min_space': 1, 'mask_channel_before': False, 'normalize': '${task.normalize}', 'data': '${task.data}', 'w2v_args': None, 'pool_method': 'self_attn', 'classifier_dropout': 0.2}, 'task': {'_name': 'slue_audio_classification', 'data': '/root/pushkal/slue-toolkit/manifest/slue-voxceleb', 'labels': 'sent', 'binarized_dataset': False, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_sample_size': None, 'min_sample_size': None, 'num_batch_buckets': 0, 'precompute_mask_indices': False, 'inferred_w2v_config': None, 'tpu': '${common.tpu}', 'text_compression_level': none, 'label_dir': '???'}, 'criterion': {'_name': 'slue_sequence_classification'}, 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9,0.98)', 'adam_eps': 1e-08, 'weight_decay': 0.0, 'use_old_adam': False, 'fp16_adam_stats': False, 'tpu': False, 'lr': [2e-05]}, 'lr_scheduler': {'_name': 'tri_stage', 'warmup_steps': 0, 'hold_steps': 0, 'decay_steps': 0, 'phase_ratio': [0.1, 0.0, 0.9], 'init_lr_scale': 0.01, 'final_lr_scale': 0.05, 'max_update': 50000.0, 'lr': [2e-05]}, 'scoring': None, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}, 'job_logging_cfg': {'version': 1, 'formatters': {'simple': {'format': '[%(asctime)s][%(name)s][%(levelname)s] - %(message)s'}}, 'handlers': {'console': {'class': 'logging.StreamHandler', 'formatter': 'simple', 'stream': 'ext://sys.stdout'}, 'file': {'class': 'logging.FileHandler', 'formatter': 'simple', 'filename': 'hydra_train.log'}}, 'root': {'level': 'INFO', 'handlers': ['console', 'file']}, 'disable_existing_loggers': False}}

But I am getting this error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/slue/bin/fairseq-hydra-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-hydra-train')())
  File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq_cli/hydra_train.py", line 87, in cli_main
    hydra_main()
  File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/core/utils.py", line 129, in run_job
    ret.return_value = task_function(task_cfg)
  File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq_cli/hydra_train.py", line 27, in hydra_main
    _hydra_main(cfg)
  File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq_cli/hydra_train.py", line 56, in _hydra_main
    distributed_utils.call_main(cfg, pre_main, **kwargs)
  File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq_cli/train.py", line 97, in main
    criterion = task.build_criterion(cfg.criterion)
  File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq/tasks/fairseq_task.py", line 352, in build_criterion
    return criterions.build_criterion(cfg, self)
  File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq/criterions/__init__.py", line 29, in build_criterion
    return build_criterion_(cfg, task)
  File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq/registry.py", line 55, in build_x
    cls = REGISTRY[choice]
KeyError: 'slue_sequence_classification'

@sshon-asapp
Collaborator

Hi.

Did you follow the installation instructions here?
https://github.com/asappresearch/slue-toolkit#installation

@pushkalkatara
Contributor Author

Hi @sshon-asapp, yes, I followed the installation instructions. I think fairseq is missing as a dependency, so I installed fairseq from the main branch and then ran into this issue.
Do I need to use a specific fairseq commit rather than the master branch?

@sshon-asapp
Collaborator

fairseq is a dependency of slue-toolkit, and you will need to install the slue-toolkit library as well.

Fairseq doesn't have this sequence-level criterion, so we defined it ourselves:

@register_criterion("slue_sequence_classification", dataclass=SequenceClassificationCriterionConfig)

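For readers hitting the same `KeyError`: fairseq resolves `criterion._name` by looking the name up in a registry that a decorator fills in when the module defining the class is imported. The sketch below is a simplified, hypothetical stand-in for that mechanism, not fairseq's actual code; `CRITERION_REGISTRY`, `register_criterion`, and `build_criterion` here only mirror the shape of fairseq's helpers, and the empty config dataclass is illustrative. It shows why the lookup fails until the registering module has run.

```python
import dataclasses
from typing import Callable, Dict, Optional

# Hypothetical, simplified registry (not fairseq's code): name -> criterion class.
CRITERION_REGISTRY: Dict[str, type] = {}


def register_criterion(name: str, dataclass: Optional[type] = None) -> Callable[[type], type]:
    """Decorator that records `cls` under `name` at the moment its module is executed."""
    def wrapper(cls: type) -> type:
        if name in CRITERION_REGISTRY:
            raise ValueError(f"Cannot register duplicate criterion ({name})")
        cls.config_class = dataclass
        CRITERION_REGISTRY[name] = cls
        return cls
    return wrapper


def build_criterion(name: str):
    # A plain dict lookup, like `cls = REGISTRY[choice]` in the traceback.
    cls = CRITERION_REGISTRY[name]  # KeyError if the defining module never ran
    cfg = cls.config_class() if cls.config_class is not None else None
    return cls(cfg)


# Before the addon code runs, the lookup fails just like in the report.
try:
    build_criterion("slue_sequence_classification")
except KeyError as err:
    print("KeyError:", err)


@dataclasses.dataclass
class SequenceClassificationCriterionConfig:
    """Illustrative empty config; the real one's fields are not shown in the thread."""


@register_criterion("slue_sequence_classification", dataclass=SequenceClassificationCriterionConfig)
class SequenceClassificationCriterion:
    def __init__(self, cfg):
        self.cfg = cfg


criterion = build_criterion("slue_sequence_classification")
print(type(criterion).__name__)
```

The design point is that registration is a side effect of import: nothing in the training config can make the name appear unless something imports the module containing the decorator.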
@pushkalkatara
Contributor Author

Yes, but I think fairseq is missing as a dependency in the slue-toolkit installation.

When I follow the installation instructions, I get this error while running the baseline:

Traceback (most recent call last):
  File "/root/miniconda3/envs/slue/bin/fairseq-hydra-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-hydra-train')())
  File "/root/miniconda3/envs/slue/bin/fairseq-hydra-train", line 22, in importlib_load_entry_point
    for entry_point in distribution(dist_name).entry_points
  File "/root/miniconda3/envs/slue/lib/python3.8/importlib/metadata.py", line 503, in distribution
    return Distribution.from_name(distribution_name)
  File "/root/miniconda3/envs/slue/lib/python3.8/importlib/metadata.py", line 177, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: fairseq

So I installed fairseq, but then ran into the issue above.

@sshon-asapp
Collaborator

It seems fairseq is not properly installed. Can you try removing fairseq and reinstalling it?
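The `PackageNotFoundError` traceback above shows the `fairseq-hydra-train` console-script wrapper looking up the `fairseq` distribution through `importlib.metadata`, which is a separate check from whether `import fairseq` works. A minimal standard-library sketch of a diagnostic for both (the function name `check_fairseq_install` is hypothetical) might look like this:

```python
import importlib.metadata
import importlib.util
from typing import Optional, Tuple


def check_fairseq_install() -> Tuple[Optional[str], Optional[str]]:
    """Return (installed distribution version, import location) for fairseq; None where missing."""
    # Console scripts resolve their entry point via distribution metadata;
    # a PackageNotFoundError here corresponds to the traceback above.
    try:
        version: Optional[str] = importlib.metadata.version("fairseq")
    except importlib.metadata.PackageNotFoundError:
        version = None

    # Separately, check whether `import fairseq` would find a module, and where.
    spec = importlib.util.find_spec("fairseq")
    origin = spec.origin if spec is not None else None
    return version, origin


if __name__ == "__main__":
    version, origin = check_fairseq_install()
    print("fairseq distribution metadata:", version or "not found")
    print("fairseq import location:", origin or "not importable")
```

If the import location points into a source checkout but no distribution metadata is found, the code is importable without being installed as a package, which is consistent with the error above; an editable install (`pip install -e`) from that checkout would typically register the metadata.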

@pushkalkatara
Contributor Author

@sshon-asapp, it was related to how the fairseq addon was structured. I made a PR to fix it, and now it works for me.
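The thread does not show what #13 changed. One common way a `user_dir` addon produces exactly this `KeyError` is a package layout in which the module containing the `@register_criterion` decorator is never imported: fairseq imports the directory given in `common.user_dir` (here `.../slue_toolkit/fairseq_addon`) as a package via `fairseq.utils.import_user_module`, and a class is registered only if its module actually executes. The self-contained sketch below reproduces that failure mode with a hypothetical toy registry (`toy_registry`) and two hypothetical packages (`addon_unwired`, `addon_wired`); it illustrates the assumption, not the actual fix.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple

# Hypothetical toy registry module, written to disk so the addon packages can import it.
REGISTRY_SRC = '''
REGISTRY = {}


def register_criterion(name):
    def wrapper(cls):
        REGISTRY[name] = cls
        return cls
    return wrapper
'''

# The criterion module registers itself only when it is actually imported.
CRITERION_SRC = '''
from toy_registry import register_criterion


@register_criterion("slue_sequence_classification")
class SequenceClassificationCriterion:
    pass
'''

PACKAGES = ("toy_registry", "addon_unwired", "addon_wired")


def write_package(root: Path, name: str, init_src: str) -> None:
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_src)
    (pkg / "sequence_classification.py").write_text(CRITERION_SRC)


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "toy_registry.py").write_text(REGISTRY_SRC)
        # Layout 1: __init__.py never imports the criterion submodule.
        write_package(root, "addon_unwired", "")
        # Layout 2: __init__.py imports the submodule, so its decorator runs.
        write_package(root, "addon_wired", "from . import sequence_classification\n")

        # Drop cached copies so repeated calls start from a clean state.
        for mod_name in list(sys.modules):
            if mod_name.split(".")[0] in PACKAGES:
                del sys.modules[mod_name]
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            registry = importlib.import_module("toy_registry").REGISTRY
            importlib.import_module("addon_unwired")
            unwired = sorted(registry)
            importlib.import_module("addon_wired")
            wired = sorted(registry)
        finally:
            sys.path.remove(tmp)
    return unwired, wired


if __name__ == "__main__":
    unwired, wired = demo()
    print("after importing addon_unwired:", unwired)  # []
    print("after importing addon_wired:", wired)  # ['slue_sequence_classification']
```

Both packages contain identical criterion code; only the `__init__.py` differs, which is why this kind of bug survives a clean reinstall.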

@sshon-asapp
Collaborator

Resolved by #13
