Strata DSMF

Distributed Stochastic Matrix Factorization using strata optimization

Parallelized matrix factorization on PySpark.
Trained on the MovieLens dataset.

Motivation:

Matrix factorization doesn't shard well: we can't just train MF on separate shards and average the resulting [w][h], because the loss function isn't convex.
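A tiny illustration of why averaging fails, as a sketch rather than anything taken from this repo: for a rank-1 matrix, (w, h) and (-w, -h) are both exact factorizations, but their average is all zeros. The matrix `V` and the two "shard" solutions below are made-up values chosen for the example.

```python
import numpy as np

# Made-up rank-1 matrix for illustration: V = outer(w, h).
V = np.outer([1.0, 2.0], [3.0, 4.0])

def loss(w, h):
    """Squared reconstruction error of the rank-1 factorization w h^T."""
    return float(np.sum((V - np.outer(w, h)) ** 2))

# Two equally good solutions, as two independently trained shards might find.
w_a, h_a = np.array([1.0, 2.0]), np.array([3.0, 4.0])
w_b, h_b = -w_a, -h_a

# Averaging the parameters lands on the zero vector, which reconstructs nothing.
w_avg, h_avg = (w_a + w_b) / 2, (h_a + h_b) / 2

print(loss(w_a, h_a), loss(w_b, h_b), loss(w_avg, h_avg))
```

Both shard solutions have zero loss, while the averaged parameters have a loss equal to the squared norm of `V`.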

Iterative parameter mixing, which runs MF stochastically and updates the shared [w][h] after each step, is slow.

But some pieces of the matrix can be trained totally independently. We call these pieces strata; they can be trained without exchanging parameters, which can significantly lower network wait time.
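The idea can be sketched on a single machine with NumPy. This is a minimal sketch under stated assumptions, not the code in `strata_dsmf.py`: the function names (`strata`, `sgd_block`, `train`), the modulo-based block assignment, and the hyperparameters are all hypothetical. The matrix is cut into `num_blocks` x `num_blocks` blocks, and each stratum picks one block per row band and column band, so no two blocks in a stratum touch the same rows of `w` or `h`.

```python
import numpy as np

def strata(num_blocks):
    """Yield strata: lists of (row_block, col_block) pairs sharing no row or column block."""
    for shift in range(num_blocks):
        yield [(b, (b + shift) % num_blocks) for b in range(num_blocks)]

def sgd_block(ratings, w, h, lr, reg):
    """Plain SGD over (user, item, rating) triples; updates w and h in place."""
    for u, i, r in ratings:
        err = r - w[u] @ h[i]
        w_u = w[u].copy()  # keep the old user vector for the item update
        w[u] += lr * (err * h[i] - reg * w[u])
        h[i] += lr * (err * w_u - reg * h[i])

def train(ratings, n_users, n_items, rank=4, num_blocks=2,
          epochs=50, lr=0.05, reg=0.02, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=(n_users, rank))
    h = rng.normal(scale=0.1, size=(n_items, rank))
    # Assumption: assign each rating to a block by user id and item id modulo num_blocks.
    blocks = {}
    for u, i, r in ratings:
        blocks.setdefault((u % num_blocks, i % num_blocks), []).append((u, i, r))
    for _ in range(epochs):
        for stratum in strata(num_blocks):
            # Blocks in one stratum touch disjoint rows of w and h, so on a cluster
            # each could run on its own worker; here they simply run one after another.
            for key in stratum:
                sgd_block(blocks.get(key, []), w, h, lr, reg)
    return w, h

# Toy ratings (user, item, rating), made up for the example.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (3, 3, 5.0)]
w, h = train(toy, n_users=4, n_items=4)
```

Parameters only need to be exchanged between strata, not between the blocks inside one stratum, which is where the reduction in network wait time would come from.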

Based on the IBM paper: https://dl.acm.org/citation.cfm?id=2020426
