ML_for_learner

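This project aims to implement a scikit-learn-like mini machine learning library using numpy. Each topic is accompanied by a blog post explaining its theory, and some features also come with a notebook that analyzes the implementation details. The project is meant to give algorithm learners a one-stop resource, from theory to implementation.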

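Because my knowledge is limited and I have no Python development experience, the library is currently a loosely organized collection of code. If you find any mistake or bug in the blog posts, notebooks, or code, or even a passage that reads awkwardly, feel free to contact me or open an issue on the project page. Thanks.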

QQ: 435248055 | WeChat: QQ435248055 | Blog

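Click an algorithm's name to open the corresponding blog post on its theory. The notebook shows how to implement the algorithm step by step, and code is the modularized source file.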

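Note: unless stated otherwise, every model accepts data as numpy.ndarray; some also accept a List or nested List. Other data formats are not guaranteed to work for now. Because Python type hints did not support numpy at the time of writing, input types are not annotated in the code (thanks to WeChat user @Stream for the reminder).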
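As an illustration of the input convention in the note above, here is a minimal sketch of how ndarray, List, and nested List inputs can all be normalized to a 2-D numpy.ndarray. The helper name `as_2d_array` is hypothetical and is not part of this library; it only shows one possible way to handle the accepted formats.

```python
import numpy as np


def as_2d_array(X):
    """Hypothetical helper (not from this library): coerce input to a 2-D float ndarray."""
    # np.asarray leaves an existing ndarray uncopied when possible and
    # converts a list or nested list into a new array.
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # Assumption: a flat list is a single feature column.
        X = X.reshape(-1, 1)
    if X.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D input, got {X.ndim}-D")
    return X


nested = as_2d_array([[1, 2], [3, 4]])  # nested List
flat = as_2d_array([1, 2, 3])           # flat List
```

Treating a flat list as one column is a design choice made for this sketch; an individual model may interpret 1-D input differently.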

Supervised learning

| Class | Algorithm | Implementation | Code |
| --- | --- | --- | --- |
| Generalized Linear Models | Linear Regression | notebook | code |
| | Logistic regression | notebook | code |
| Nearest Neighbors | Nearest Neighbors Classification | notebook | code |
| Naive Bayes | Gaussian Naive Bayes | notebook | code |
| Support Vector Machine | SVC | notebook | code |
| Decision Trees | ID3 Classification | notebook | code |
| | ID3 Regression | notebook | code |
| | CART Classification | notebook | code |
| | CART Regression | notebook | code |
| Ensemble methods | Random Forests Classification | notebook | code |
| | Random Forests Regression | notebook | code |
| | AdaBoosting Classification | notebook | code |

Unsupervised learning

| Class | Algorithm | Implementation | Code |
| --- | --- | --- | --- |
| Gaussian mixture models | Gaussian Mixture | notebook | code |
| Clustering | K-means | notebook | code |
| | DBSCAN | notebook | code |
| Association Rules | Apriori | notebook | |
| Collaborative Filtering | User-based | notebook | |
| | Item-based | notebook | |
| | LFM | notebook | |

Model selection and evaluation

| Class | Approach | Code |
| --- | --- | --- |
| Model Selection | Dataset Split | code |
| | K-Fold | code |
| | Stratified K-Fold | code |
| Metrics | Accuracy | code |
| | Log loss | code |
| | F1-score | code |
| | AUC | code |
| | Explained Variance | code |
| | Mean Absolute Error | code |
| | Mean Squared Error | code |
| | R Square | code |
| | Euclidean Distances | code |

Preprocessing data

| Class | Algorithm | Implementation | Code |
| --- | --- | --- | --- |
| Feature Scaling | StandardScaler | | code |
| | MinMaxScaler | | code |
| Unsupervised dimensionality reduction | PCA | notebook | code |
| | SVD | notebook | code |
| Supervised dimensionality reduction | Linear Discriminant Analysis | notebook | code |
| Text Feature | Count Feature | | code |
| | TF-IDF | | code |

Known Issues

Code reuse across the library is low.

random forest is not parallelized.

The LDA code is missing some functionality.

The K-Fold code uses np.append(), which is inefficient.
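On the K-Fold issue: np.append() copies the whole array on every call, so growing an index array inside a loop costs quadratic time overall. The sketch below shows one possible alternative that splits the indices once with np.array_split and joins the training folds with a single np.concatenate per split. The function name `k_fold_indices` and its parameters are assumptions made for illustration; this is not the library's current K-Fold code.

```python
import numpy as np


def k_fold_indices(n_samples, n_splits=5, shuffle=False, seed=None):
    """Hypothetical sketch: yield (train_idx, test_idx) pairs without np.append."""
    if n_splits < 2 or n_splits > n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = np.arange(n_samples)
    if shuffle:
        # Shuffle in place with a seeded generator for reproducibility.
        np.random.default_rng(seed).shuffle(indices)
    # Split once into n_splits nearly equal folds (a list of arrays).
    folds = np.array_split(indices, n_splits)
    for i in range(n_splits):
        test_idx = folds[i]
        # One concatenate per split instead of repeated np.append calls.
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx


splits = list(k_fold_indices(10, n_splits=3))
```

Each yielded pair can be used to index the data directly, for example `X[train_idx]` and `X[test_idx]` for an ndarray `X`.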

