VLDB-2019/8-TextCube: Automated Construction and Multidimensional Exploration #285
BrambleXu opened this issue Nov 8, 2019
Labels
NER(T) Named Entity Recognition Task


Summary:

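Proposes TextCube, a data-structure framework for organizing text, and describes the techniques used to construct it automatically. By the same team as #275.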

Resource:

  • pdf
  • [code](
  • [paper-with-code](

Paper information:

  • Author:
  • Dataset:
  • keywords:

Notes:

TextCube provides a critical structure for organizing information, enhancing text exploration and analysis across various applications.

We focus on new TextCube construction methods that are scalable, weakly-supervised, domain-independent, language-agnostic, and effective (i.e., generating quality TextCubes from large corpora of various domains).

Module I. Mining Structural Primitives from Text: Phrases, Entities and Relations

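  • AutoPhrase, which mines quality phrases from massive text corpora with minimal human effort.
  • AutoNER, which trains a named entity tagger using only a domain-specific dictionary (distant supervision).
  • ReMine [58], which extracts high-confidence relational phrases from domain-specific texts in an end-to-end manner.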
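Phrase miners such as AutoPhrase start from frequent n-grams as phrase candidates and then score their quality. The sketch below covers only that first candidate-generation step, under simplifying assumptions (lowercased whitespace tokenization, a hypothetical `min_count` threshold); it does not implement AutoPhrase's quality estimator or phrasal segmentation.

```python
from collections import Counter


def frequent_ngrams(docs, max_n=3, min_count=2):
    """Count every n-gram (1 <= n <= max_n) and keep those seen at least min_count times."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # assumption: naive whitespace tokenization
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


docs = [
    "data cube supports multidimensional analysis",
    "a text cube extends the data cube to text",
    "multidimensional analysis of text cube data",
]
candidates = frequent_ngrams(docs, max_n=2, min_count=2)
multiword = sorted(" ".join(g) for g in candidates if len(g) > 1)
print(multiword)  # ['data cube', 'multidimensional analysis', 'text cube']
```

Frequency alone also keeps unigrams like "cube"; deciding which candidates are genuine phrases is the part a real miner learns.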

Module II. Automated Construction of TextCubes

  1. Taxonomy construction: clusters similar concepts and generates a hierarchy of “concept clusters” from a massive corpus. Model: TaxoGen [53], a recursive framework that leverages word distributional representations and constructs a cluster-based taxonomy using adaptive spherical clustering and local embedding.

  2. Embedding learning: serves as a preliminary step for document classification and TextCube construction. Models: JoSE, an unsupervised text embedding framework that jointly learns word and paragraph embeddings by incorporating both local and global contexts to capture more complete text semantics; and TopicMine [24], a category-name-guided word embedding framework that endows word embeddings with discriminative power over a specific set of categories.

  3. Supervised methods: how to adapt supervised methods for TextCube construction, along with their strengths and drawbacks.

  4. Weakly-supervised methods: WeSTClass [25] and WeSHClass [26], which generate pseudo training data to pre-train a neural classifier and then bootstrap the classifier by self-training on unlabeled documents.
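TaxoGen's clustering step groups term embeddings by direction rather than magnitude. As a rough illustration, here is plain spherical k-means (cosine-similarity assignment, centroids renormalized onto the unit sphere) in pure Python. It deliberately omits TaxoGen's adaptive part (pushing general terms up a level) and its local embedding; the 2-D "term embeddings" are made-up toy values.

```python
import math
import random


def normalize(v):
    """Project a vector onto the unit sphere (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iters=20, seed=0):
    """Cluster vectors by cosine similarity, keeping centroids on the unit sphere."""
    points = [normalize(v) for v in vectors]
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the centroid with the highest cosine similarity.
        labels = [max(range(k), key=lambda j: dot(p, centroids[j])) for p in points]
        # Recompute each centroid as the normalized mean direction of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Hypothetical toy embeddings: two "machine learning" terms, two "database" terms.
terms = ["cnn", "rnn", "sql", "index"]
vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.95]]
labels, _ = spherical_kmeans(vectors, k=2)
print(dict(zip(terms, labels)))
```

Applying the same routine recursively inside each cluster would give a crude cluster-based hierarchy, which is the overall shape of the recursive framework described above.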
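The self-training step in item 4 can be made concrete with a target distribution that sharpens the classifier's current soft predictions. This sketch assumes the DEC-style form, $p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}$ with $f_j = \sum_i q_{ij}$, which to my understanding is what WeSTClass uses; the classifier and its retraining (minimizing KL divergence from P to Q, repeated until predictions stabilize) are not shown, and `q` is toy data.

```python
def target_distribution(q):
    """Sharpen soft predictions q (rows = unlabeled documents, columns = classes)."""
    num_classes = len(q[0])
    # f_j: soft frequency of class j over all documents; dividing by it
    # keeps large classes from dominating the targets.
    f = [sum(row[j] for row in q) for j in range(num_classes)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / f[j] for j in range(num_classes)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


# Hypothetical predictions from a classifier pre-trained on pseudo documents.
q = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
p = target_distribution(q)
for row in p:
    print([round(x, 3) for x in row])
```

Squaring the probabilities pushes confident predictions further toward their argmax, so retraining on `p` lets high-confidence documents reinforce the classifier without any new labels.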

Module III. Multi-Dimensional Exploration of TextCubes

TextCube facilitates multidimensional text analysis:

  1. Cube-based multidimensional analysis:
  2. Text summarization:
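To make "cube-based multidimensional analysis" concrete, the sketch below assumes documents already carry one label per dimension (presumably produced by the classification methods of Module II) and stores the cube as a dict from dimension-value tuples to document IDs. It shows two standard cube operations, slice and roll-up. The dimension names, documents, and helper functions are hypothetical illustrations, not the paper's system.

```python
from collections import defaultdict


def build_cube(docs, dims):
    """Map each combination of dimension values to the IDs of the documents in that cell."""
    cube = defaultdict(list)
    for doc in docs:
        cube[tuple(doc[d] for d in dims)].append(doc["id"])
    return dict(cube)


def slice_cube(cube, dims, dim, value):
    """Keep only the cells whose value on `dim` equals `value`."""
    i = dims.index(dim)
    return {cell: ids for cell, ids in cube.items() if cell[i] == value}


def roll_up(cube, dims, keep):
    """Aggregate cells onto the dimensions in `keep`, counting documents per cell."""
    idx = [dims.index(d) for d in keep]
    counts = defaultdict(int)
    for cell, ids in cube.items():
        counts[tuple(cell[i] for i in idx)] += len(ids)
    return dict(counts)


dims = ["topic", "location"]
docs = [
    {"id": 1, "topic": "sports", "location": "USA"},
    {"id": 2, "topic": "sports", "location": "China"},
    {"id": 3, "topic": "economy", "location": "USA"},
    {"id": 4, "topic": "sports", "location": "USA"},
]
cube = build_cube(docs, dims)
usa = slice_cube(cube, dims, "location", "USA")
by_topic = roll_up(cube, dims, ["topic"])
print(usa)       # {('sports', 'USA'): [1, 4], ('economy', 'USA'): [3]}
print(by_topic)  # {('sports',): 3, ('economy',): 1}
```

Keeping document IDs in each cell, rather than only counts, is what would let a downstream step such as summarization operate on the documents of any selected cell.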

Model Graph:

Result:

Thoughts:

Next Reading:
