bakarov/bakarov.github.io

Education

  • PhD Student, Computational Linguistics, GPA 10/10, The National Research University Higher School of Economics, Moscow, Russia.
  • Master, Computational Linguistics, GPA 9/10 (2017 - 2019), The National Research University Higher School of Economics, Moscow, Russia.
  • Bachelor, Computer Science, GPA 4.3/5.0 (2013 - 2017), Novosibirsk State University, Novosibirsk, Russia.

Research Experience

Senior Research Engineer (August 2024 - present), Anecdote AI, Montréal, Canada.

  • created LLM pipelines and LLMOps infrastructure for automatic CX & Product insights extraction;
  • fine-tuned LLMs; worked with agentic workflows.

Senior Research Engineer (August 2020 - February 2023), Logiciel Behavox Inc, Montréal, Canada.

  • created a core text classification engine for "Behavox Quantum", a market-leading compliance solution;
  • designed and implemented the experiment tracking and model deployment workflows (ETL, CI/CD/CT, QAA).

Senior Research Engineer (September 2018 - July 2020), Huawei Technologies, St. Petersburg, Russia.

  • created a Natural Language Understanding component for Huawei voice assistant "Celia";
  • conducted R&D on Zero-Shot Learning and Data Augmentation; presented the results at industry conferences.

Research Engineer (October 2017 - August 2018), Federal Research Center "Computer Science and Control" of Russian Academy of Sciences, Moscow, Russia.

  • created a multilingual Plagiarism Detection system for Russia's largest academic electronic library "RuCont";
  • conducted R&D on Multilingual NLP; published the results in journals indexed in Springer, WoS, and ACL Anthology.

Research Engineer (February 2017 - August 2017), Expasoft Ltd., Novosibirsk, Russia.

  • created Text Classification and Named Entity Recognition modules for enterprise dialogue assistant "chatme.ai";
  • conducted R&D on the Paraphrase Detection task; published the results in Springer-indexed journals.

Publications

  1. Bakarov, A. (2022). Distributional Word Vectors as Semantic Maps Framework. Computación y Sistemas, 26(3), 1343-1364.
  2. Bakarov, A. (2021). Did You Just Assume My Vector? Detecting Gender Stereotypes in Word Embeddings. Recent Trends in Analysis of Images, Social Networks and Texts, 1357, 3.
  3. Parinov, S., Bakarov, A., Vodolazcky, D. (2020). Layout logical labelling and finding the semantic relationships between citing and cited paper content. International Journal of Metadata, Semantics and Ontologies, 14(1), 54-62.
  4. Artemova, E., Bakarov, A., Artemov, A., Burnaev, E., Sharaev, M. (2020). Data-driven models and computational tools for neurolinguistics: a language technology perspective. Journal of Cognitive Science, 21(1), 15-52.
  5. Bakarov, A. (2018, December). Vector Space Models for Automatic Misogyny Identification. EVALITA Evaluation of NLP and Speech Tools for Italian 12 (2018): 211.
  6. Yadrintsev, V., Bakarov, A., Suvorov, R., Sochenkov, I. (2018, September). Fast and Accurate Patent Classification in Search Engines. In Big Data Conference (Vol. 1117, No. 1, p. 012004). IOP Publishing.
  7. Nikishina, I., Bakarov, A., Kutuzov, A. (2018, July). RusNLP: Semantic search engine for Russian NLP conference papers. In International Conference on Analysis of Images, Social Networks and Texts (pp. 111-120). Springer, Cham.
  8. Bakarov, A., Suvorov, R., Sochenkov, I. (2018, June). The Limitations of Cross-language Word Embeddings Evaluation. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics (pp. 94-100).
  9. Bakarov, A., Yadrintsev, V., Sochenkov, I. (2018, June). Anomaly Detection for Short Texts: Identifying Whether Your Chatbot Should Switch From Goal-Oriented Conversation to Chit-Chatting. In Digital Transformations & Global Society (pp. 289-298). Springer, Cham.
  10. Bakarov, A., Kutuzov, A., Nikishina, I. (2018, May). Russian Computational Linguistics: Topical Structure in 2007-2017 Conference Papers. Computational Linguistics and Intellectual Technologies (Dialogue 2018).
  11. Bakarov, A. (2018, May). The Effect of Unobserved Word-Context Co-occurrences on a Vector-Mixture Approach for Compositional Distributional Semantics. Computational Linguistics in Bulgaria 2018.
  12. Bakarov, A. (2018, May). Can Eye Movement Data Be Used As Ground Truth For Word Embeddings Evaluation? In Linguistic and Neuro-Cognitive Resources (LiNCR), LREC 2018 Workshop (pp. 27-32).
  13. Bakarov, A. (2018, January). A Survey of Word Embeddings Evaluation Methods. arXiv preprint arXiv:1801.09536.
  14. Bakarov, A., Gureenkova, O. (2017, July). Automated Detection of Non-Relevant Posts on the Russian Imageboard 2ch: Importance of the Choice of Word Representations. In International Conference on Analysis of Images, Social Networks and Texts (pp. 16-21). Springer, Cham.

Open-source contributions

  • RusNLP, a semantic search engine for Russian NLP conference papers.
  • Vecto, a Python framework for working with real-valued linguistic representations.
  • CIRTEC, a project dedicated to scientometric analysis of the role of citations in academic publications.

Relevant skills

  • Computer Science: My education and work experience as a software engineer gave me skills in algorithms, object-oriented design, design patterns, architecture design and product understanding.
  • Linguistics: My second degree in linguistics gave me a deep understanding of the mechanisms of human language, e.g. generative grammar, formal semantics and linguistic typology.
  • Stack: I use C/C++ for performance-critical code, and Python (along with PyTorch and NumPy) for high-level scripting, quick prototyping, experiments and data analysis.
  • Other: I have a decent understanding of UNIX system administration and am familiar with version control systems (git, svn) and databases (SQL). I have an interest in cognitive sciences and know the key works, methods and recent advances in the fields of decision making and language processing.
  • Natural Languages: Russian (native), English (fluent).

Teaching

  1. Teaching Assistant (January 2019 - July 2019), Machine Learning, The National Research University Higher School of Economics.
  2. Teaching Assistant (January 2018 - July 2018), Data Science, The National Research University Higher School of Economics.
  3. Lecturer (January 2019 - June 2019), Distributional Semantics, Novosibirsk State University, Novosibirsk, Russia.

Public Speaking

  1. Linguistic Representativeness of Word Embeddings, 5th Kolmogorov Seminar on Computational Linguistics, April 2021.
  2. Did You Just Assume My Vector? Detecting Gender Stereotypes in Word Embeddings, AIST Conference, June 2020.
  3. Introduction to Natural Language Processing, Geek Picnic, June 2019.
  4. Data Augmentation & Few-Shot Learning, Open Data Science Conference, May 2019.
  5. Linguistic Representativeness of Distributional Semantic Models on the Lexical Level, University of Oslo Seminar in Language Technology, April 2019.
  6. Seq2Seq Models & Chatbots, St. Petersburg Developers Community, December 2018.
  7. Anomaly Detection for Short Texts: Identifying Whether Your Chatbot Should Switch From Goal-Oriented Conversation to Chit-Chatting, DTGS, June 2018.
  8. Meaning Representations for Conversational Agents, ICT Algorithm Design (ICTAD), November 2018.
  9. Methods of Evaluation of Word Embeddings, AINL, September 2018.
  10. The Limitations of Cross-language Word Embeddings Evaluation, ESSLLI Student Session, August 2018.
  11. The Effect of Unobserved Word-Context Co-occurrences on a Vector-Mixture Approach for Compositional Distributional Semantics, CLIB, May 2018.
  12. Can Eye Movement Data Be Used As Ground Truth For Word Embeddings Evaluation?, LiNCR, May 2018.
  13. Importance of the Choice of Word Representations, AIST Conference, July 2017.
  14. Ontology Learning with LDA2VEC, MNSK Conference, April 2017.

Posters

  1. Don't Count, Look! Finding Correlation Between Distributional Word Embeddings and Eye-Tracking Gaze Vectors, AINL, September 2017.
  2. Did You Just Assume My Vector? Detecting Gender Stereotypes in Word Embeddings, HSE ML Workshop, April 2018.
  3. The Limitations of Cross-language Word Embeddings, *SEM, June 2018.
  4. Russian Computational Linguistics: Topical Structure in 2007-2017 Conference Papers, Dialogue, June 2018.
  5. Representativeness of Cross-language Word Embeddings on a Lexical Level: Could it be General, or is it Always Task-specific? FoTran Workshop, September 2018.
  6. Vecto: A framework for word, character, sentence embeddings and more! AINL, October 2018.

Activities

  • Programme Committee: AACL, ACL, EACL, EMNLP, LREC, NAACL; AIST (Analysis of Images, Social Networks and Texts Conference), AINL (Artificial Intelligence and Natural Language Conference).
  • Reviewing: IEEE Access (Q1), JASIST (Q1), Language Resources and Evaluation Journal (Q3), Journal of Intelligent Systems (Q4), Journal of Metadata, Semantics and Ontologies (Q4), etc.
  • Membership: Association for Computational Linguistics and its Special Interest Group on Slavic Natural Language Processing (SIGSLAV).
  • Mentoring: Mentor for the Apertium Project at Google Code-in 2018.
  • Organizing: Organizer of an open NLP Seminar in Moscow and St. Petersburg; participated in the organisation of Sberbank Data Science Journey 2018.
  • Competitions: 1st place at the EVALITA-2018 Shared Task on Automated Misogyny Detection among 16 teams.
  • Summer Schools: ESSLLI 2018 (Sofia, Bulgaria) and RUSSIR 2017/2018.
