Skip to content

EasyCam/MTMinePy

Repository files navigation

MTMinePy - Multilingual Text Miner with Python

PyPI version License: GPL v3

MTMinePy is a Python-based academic text mining platform inspired by MTMineR. It is a comprehensive Flask web application designed for powerful text mining and analysis, supporting interactive visualization and advanced modeling.

Key Features

  • Advanced NLP: Integrated with jieba, HanLP, LTP, Spacy, and NLTK.
  • Multilingual: Native support for 10+ languages including Chinese, English, Japanese, and more.
  • Interactive Visualization: Powered by ECharts, supporting responsive force-directed networks, dynamic word clouds, and interactive scatter plots.
  • Academic Metric Analysis: Supports advanced distance functions (Hsim, Close, Esim) and high-end visualization.
  • Advanced Modeling: Comprehensive suite of Unsupervised (Clustering, Topic Modeling) and Supervised learning algorithms.

Screenshots

Chinese Analysis

Co-occurrence Network Word Cloud Clustering
Chinese Network Chinese WordCloud Chinese Clustering

English Analysis

Co-occurrence Network Word Cloud Clustering
English Network English WordCloud English Clustering

Installation

Install from PyPI (Recommended)

pip install mtminepy

To install with all optional NLP backends (Janome, spaCy, HanLP, LTP, UMAP, Boruta, etc.):

pip install mtminepy[full]

Install from source

git clone https://github.com/EasyCam/MTMinePy.git
cd MTMinePy
pip install -e .

Usage

Run from command line

After installation, run directly:

mtminepy

Access the dashboard at http://localhost:5000.

Command-line options

mtminepy --help
mtminepy --port 8080          # Custom port
mtminepy --host 127.0.0.1    # Bind to localhost only
mtminepy --debug              # Flask debug mode
mtminepy --version            # Show version

Run from Python

from mtminepy.app import create_app

app = create_app()
app.run(host='0.0.0.0', port=5000)

Advanced Capabilities

Modeling Algorithms

MTMinePy supports a wide range of standard machine learning algorithms for text analysis:

  • Feature Engineering: TF-IDF, Bag of Words (CountVectorizer), N-gram support.
  • Unsupervised Learning:
    • Topic Modeling: Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), STM (Structural Topic Model).
    • Clustering: K-Means, Agglomerative (Hierarchical), DBSCAN, Spectral Clustering.
    • Dimensionality Reduction: PCA, t-SNE, UMAP, Factor Analysis.
  • Supervised Learning (Classification):
    • Support Vector Machines (SVM)
    • Random Forest
    • Linear Discriminant Analysis (LDA)
    • Quadratic Discriminant Analysis (QDA)
    • Logistic Regression (Elastic Net)

Mathematical Models (Distance & Similarity)

MTMinePy supports advanced metrics for academic research:

Advanced Custom Similarity Measures

  1. Hsim (Yang Fengzhao, 2007) $$ Hsim(x_i, x_j) = \frac{1}{n} \sum_{k=1}^n \frac{1}{1+|x_{ik}-x_{jk}|} $$

  2. Close (Shao Changsheng, et al., 2011) $$ Close(x_i, x_j) = \frac{1}{n} \sum_{k=1}^n e^{-|x_{ik}-x_{jk}|} $$

  3. Esim (Wang Xiaoyang, et al., 2013) $$ Esim(x_{ik}, x_{jk}) = \frac{1}{n} \sum_{k=1}^d \omega_k e^{-\frac{|x_{ik}-x_{jk}|}{|x_{ik}-x_{jk}|+|x_{ik}+x_{jk}|/2}} $$

About

A Multilingual Text Miner with Python

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors