This repository contains code, datasets, and links related to entity/knowledge papers from the VERT (Versatile Entity Recognition & Disambiguation Toolkit) project, by the Knowledge Computing (KC) group at Microsoft Research Asia (MSRA).
Our group is hiring both research interns and full-time employees! If you are interested, please take a look at:
- Internship opportunities in KC (PDF);
- Researcher or RSDE positions (select "China" in the left-side "Country/Region" menu).
- 2021-Jul: The Recognizers-Text project reached over 3 million package downloads (across NuGet/npm/PyPI)!
- 2021-May: ReTraCk reached #1 on the Generalizable Question Answering (GrailQA) leaderboard for knowledge base QA (KBQA).
- 2020-Dec: The Recognizers-Text project reached over 2 million package downloads (across NuGet/npm/PyPI)!
- 2020-Nov: The LinkingPark system, developed by the Knowledge Computing group at MSRA in partnership with our collaborators at MSR Cambridge, took 2nd place in the SemTab 2020 challenge (Semantic Web Challenge on Tabular Data to Knowledge Graph Matching)!
- AdvPicker: Effectively Leveraging Unlabeled Data via Adversarial Discriminator for Cross-Lingual NER, Weile Chen, Huiqiang Jiang, Qianhui Wu, Börje F. Karlsson, Yi Guan, ACL 2021.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/AdvPicker
- ReTraCk: A Flexible and Efficient Framework for Knowledge Base Question Answering, Shuang Chen, Qian Liu, Zhiwei Yu, Chin-Yew Lin, Jian-Guang Lou, Feng Jiang, ACL 2021. (demo paper)
Repository: https://github.com/microsoft/KC/tree/master/papers/ReTraCk
- BoningKnife: Joint Entity Mention Detection and Typing for Nested NER via prior Boundary Knowledge, Huiqiang Jiang, Guoxin Wang, Weile Chen, Chengxi Zhang, Börje F. Karlsson, arXiv:2107.09429 - 2020/2021.
- LinkingPark: An integrated approach for Semantic Table Interpretation, Shuang Chen, Alperen Karaoglu, Carina Negreanu, Tingting Ma, Jin-Ge Yao, Jack Williams, Andy Gordon, Chin-Yew Lin, Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2020) at ISWC 2020.
- UniTrans: Unifying Model Transfer and Data Transfer for Cross-Lingual Named Entity Recognition with Unlabeled Data, Qianhui Wu, Zijia Lin, Börje F. Karlsson, Biqing Huang, Jian-Guang Lou, IJCAI 2020.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/UniTrans
- Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language, Qianhui Wu, Zijia Lin, Börje F. Karlsson, Jian-Guang Lou, Biqing Huang, ACL 2020.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/SingleMulti-TS
- Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources, Qianhui Wu, Zijia Lin, Guoxin Wang, Hui Chen, Börje F. Karlsson, Biqing Huang, Chin-Yew Lin, AAAI 2020.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/Meta-Cross
- Improving Entity Linking by Modeling Latent Entity Type Information, Shuang Chen, Jinpeng Wang, Feng Jiang, Chin-Yew Lin, AAAI 2020.
- Exploring Word Representations on Time Expression Recognition, Sanxing Chen, Guoxin Wang, Börje Karlsson, Technical Report - Microsoft Research Asia, 2019.
- Towards Improving Neural Named Entity Recognition with Gazetteers, Tianyu Liu, Jin-Ge Yao, Chin-Yew Lin, ACL 2019.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/SubTagger
- CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition, Yuying Zhu, Guoxin Wang, Börje F. Karlsson, NAACL-HLT 2019.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NER
- GRN: Gated Relation Network to Enhance Convolutional Neural Network for Named Entity Recognition, Hui Chen, Zijia Lin, Guiguang Ding, Jian-Guang Lou, Yusen Zhang, Börje F. Karlsson, AAAI 2019.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/GRN-NER
- microsoft/Recognizers-Text - Open-source library that provides recognition and normalization/resolution of numbers, units, date/time, and sequences (e.g., phone numbers, URLs) expressed in multiple languages;
- Knowledge Computing (KC) on GitHub - Open-source repository including code and datasets for other projects by the Knowledge Computing group at MSRA.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.