This is a study project with the following main goals:
- Practice in a data science competition. Contest site: https://boosters.pro/championship/data_fusion/overview
- Work with real data
- Find gaps in knowledge and fill them with practical experiments and the necessary theory
- Learn how the CatBoost model works with text, using this repo as a reference: https://github.com/exotol/data_fusion_vtb/
- Find out how unlabeled data can be used in a multiclass text classification task with DistilBERT + Agglomerative Clustering. A really interesting idea from this repo: https://github.com/v-pozdnyakov/data-fusion-contest
- See how an ensemble is used in this solution: https://github.com/antklen/data_fusion_solution
- Pick up useful hints on using the BERT architecture in a classification task: https://github.com/pskliff/vtb-data-fusion
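
To make the unlabeled-data goal above more concrete, here is a minimal sketch of one possible reading of the idea: cluster labeled and unlabeled text embeddings together, then give each unlabeled item the majority label of its cluster. This is an assumption about the general technique, not necessarily what the linked repo does. The function `pseudo_label`, the toy 2-D vectors (standing in for DistilBERT embeddings), and the label names are all hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import AgglomerativeClustering


def pseudo_label(embeddings, labels, n_clusters):
    """Give each unlabeled row (label None) the majority label of its cluster.

    Hypothetical helper for illustration only.
    """
    cluster_ids = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(embeddings)
    result = list(labels)
    for cluster in set(cluster_ids):
        members = [i for i, c in enumerate(cluster_ids) if c == cluster]
        known = [labels[i] for i in members if labels[i] is not None]
        if not known:
            continue  # no labeled example in this cluster: leave rows unlabeled
        majority = Counter(known).most_common(1)[0][0]
        for i in members:
            if result[i] is None:
                result[i] = majority
    return result


# Toy 2-D vectors standing in for real sentence embeddings (assumption).
embeddings = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
)
labels = ["food", None, None, "transport", None, None]
pseudo = pseudo_label(embeddings, labels, n_clusters=2)
print(pseudo)
```

In a real experiment the pseudo-labeled rows would be added to the training set, and clusters with mixed or missing labels would likely need a confidence threshold before being trusted.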