Skip to content

Latest commit

 

History

History
238 lines (190 loc) · 18.1 KB

20221202_03.md

File metadata and controls

238 lines (190 loc) · 18.1 KB

配置 madlib for PolarDB 实现数据库机器学习功能

作者

digoal

日期

2022-12-02

标签

PostgreSQL , PolarDB , madlib , 机器学习


背景

PolarDB 的云原生存算分离架构, 具备低廉的数据存储、高效扩展弹性、高速多机并行计算能力、高速数据搜索和处理; PolarDB与计算算法结合, 将实现双剑合璧, 推动业务数据的价值产出, 将数据变成生产力.

本文将介绍PolarDB结合madlib, 让PolarDB具备机器学习功能.

madlib库无疑是大而全的数据库机器学习库,

  • Deep Learning
  • Graph
  • Model Selection
  • Sampling
  • Statistics
  • Supervised Learning
  • Time Series Analysis
  • Unsupervised Learning

将madlib安装到PolarDB, 让PolarDB具备机器学习功能

这个例子直接在PolarDB容器中部署pgcat.

PolarDB部署请参考:

进入PolarDB环境

docker exec -it 67e1eed1b4b6 bash    

下载madlib rpm

https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide

wget https://dist.apache.org/repos/dist/release/madlib/1.20.0/apache-madlib-1.20.0-CentOS7.rpm  

安装madlib

sudo rpm -ivh apache-madlib-1.20.0-CentOS7.rpm  

加载madlib到PolarDB数据库对应DB中(非extension管理)

/usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install  
  
or  
  
/usr/local/madlib/bin/madpack -s madlib -p postgres install  

测试madlib安装正确性

/usr/local/madlib/bin/madpack -s madlib -p postgres install-check  

使用madlib

[postgres@67e1eed1b4b6 ~]$ psql -h 127.0.0.1  
psql (11.9)  
Type "help" for help.  
  
postgres=# set search_path =madlib, "$user", public;  
SET  
  
  
postgres=# \dT+  
                                                               List of data types  
 Schema |               Name                |           Internal name           | Size  | Elements |  Owner   | Access privileges | Description   
--------+-----------------------------------+-----------------------------------+-------+----------+----------+-------------------+-------------  
 madlib | args_and_value_double             | args_and_value_double             | tuple |          | postgres |                   |   
 madlib | __arima_lm_result                 | __arima_lm_result                 | tuple |          | postgres |                   |   
 madlib | __arima_lm_stat_result            | __arima_lm_stat_result            | tuple |          | postgres |                   |   
 madlib | __arima_lm_sum_result             | __arima_lm_sum_result             | tuple |          | postgres |                   |   
 madlib | assoc_rules_results               | assoc_rules_results               | tuple |          | postgres |                   |   
 madlib | bytea8                            | bytea8                            | var   |          | postgres |                   |   
 madlib | _cat_levels_type                  | _cat_levels_type                  | tuple |          | postgres |                   |   
 madlib | chi2_test_result                  | chi2_test_result                  | tuple |          | postgres |                   |   
 madlib | closest_column_result             | closest_column_result             | tuple |          | postgres |                   |   
 madlib | closest_columns_result            | closest_columns_result            | tuple |          | postgres |                   |   
 madlib | __clustered_agg_result            | __clustered_agg_result            | tuple |          | postgres |                   |   
 madlib | __clustered_lin_result            | __clustered_lin_result            | tuple |          | postgres |                   |   
 madlib | __clustered_log_result            | __clustered_log_result            | tuple |          | postgres |                   |   
 madlib | __clustered_mlog_result           | __clustered_mlog_result           | tuple |          | postgres |                   |   
 madlib | complex                           | complex                           | tuple |          | postgres |                   |   
 madlib | __coxph_a_b_result                | __coxph_a_b_result                | tuple |          | postgres |                   |   
 madlib | __coxph_cl_var_result             | __coxph_cl_var_result             | tuple |          | postgres |                   |   
 madlib | coxph_result                      | coxph_result                      | tuple |          | postgres |                   |   
 madlib | coxph_step_result                 | coxph_step_result                 | tuple |          | postgres |                   |   
 madlib | cox_prop_hazards_result           | cox_prop_hazards_result           | tuple |          | postgres |                   |   
 madlib | __cox_resid_stat_result           | __cox_resid_stat_result           | tuple |          | postgres |                   |   
 madlib | __dbscan_edge                     | __dbscan_edge                     | tuple |          | postgres |                   |   
 madlib | __dbscan_losses                   | __dbscan_losses                   | tuple |          | postgres |                   |   
 madlib | __dbscan_record                   | __dbscan_record                   | tuple |          | postgres |                   |   
 madlib | dense_linear_solver_result        | dense_linear_solver_result        | tuple |          | postgres |                   |   
 madlib | __elastic_net_result              | __elastic_net_result              | tuple |          | postgres |                   |   
 madlib | _flattened_tree                   | _flattened_tree                   | tuple |          | postgres |                   |   
 madlib | f_test_result                     | f_test_result                     | tuple |          | postgres |                   |   
 madlib | __glm_result_type                 | __glm_result_type                 | tuple |          | postgres |                   |   
 madlib | _grp_state_type                   | _grp_state_type                   | tuple |          | postgres |                   |   
 madlib | heteroskedasticity_test_result    | heteroskedasticity_test_result    | tuple |          | postgres |                   |   
 madlib | kmeans_result                     | kmeans_result                     | tuple |          | postgres |                   |   
 madlib | kmeans_state                      | kmeans_state                      | tuple |          | postgres |                   |   
 madlib | ks_test_result                    | ks_test_result                    | tuple |          | postgres |                   |   
 madlib | lda_result                        | lda_result                        | tuple |          | postgres |                   |   
 madlib | lincrf_result                     | lincrf_result                     | tuple |          | postgres |                   |   
 madlib | linear_svm_result                 | linear_svm_result                 | tuple |          | postgres |                   |   
 madlib | linregr_result                    | linregr_result                    | tuple |          | postgres |                   |   
 madlib | lmf_result                        | lmf_result                        | tuple |          | postgres |                   |   
 madlib | __logregr_result                  | __logregr_result                  | tuple |          | postgres |                   |   
 madlib | marginal_logregr_result           | marginal_logregr_result           | tuple |          | postgres |                   |   
 madlib | marginal_mlogregr_result          | marginal_mlogregr_result          | tuple |          | postgres |                   |   
 madlib | margins_result                    | margins_result                    | tuple |          | postgres |                   |   
 madlib | matrix_result                     | matrix_result                     | tuple |          | postgres |                   |   
 madlib | __mlogregr_cat_coef               | __mlogregr_cat_coef               | tuple |          | postgres |                   |   
 madlib | mlogregr_result                   | mlogregr_result                   | tuple |          | postgres |                   |   
 madlib | mlogregr_summary_result           | mlogregr_summary_result           | tuple |          | postgres |                   |   
 madlib | mlp_result                        | mlp_result                        | tuple |          | postgres |                   |   
 madlib | __multinom_result_type            | __multinom_result_type            | tuple |          | postgres |                   |   
 madlib | mw_test_result                    | mw_test_result                    | tuple |          | postgres |                   |   
 madlib | one_way_anova_result              | one_way_anova_result              | tuple |          | postgres |                   |   
 madlib | __ordinal_result_type             | __ordinal_result_type             | tuple |          | postgres |                   |   
 madlib | path_match_result                 | path_match_result                 | tuple |          | postgres |                   |   
 madlib | _pivotalr_lda_model               | _pivotalr_lda_model               | tuple |          | postgres |                   |   
 madlib | _prune_result_type                | _prune_result_type                | tuple |          | postgres |                   |   
 madlib | __rb_coxph_hs_result              | __rb_coxph_hs_result              | tuple |          | postgres |                   |   
 madlib | __rb_coxph_result                 | __rb_coxph_result                 | tuple |          | postgres |                   |   
 madlib | residual_norm_result              | residual_norm_result              | tuple |          | postgres |                   |   
 madlib | robust_linregr_result             | robust_linregr_result             | tuple |          | postgres |                   |   
 madlib | robust_logregr_result             | robust_logregr_result             | tuple |          | postgres |                   |   
 madlib | robust_mlogregr_result            | robust_mlogregr_result            | tuple |          | postgres |                   |   
 madlib | sparse_linear_solver_result       | sparse_linear_solver_result       | tuple |          | postgres |                   |   
 madlib | summary_result                    | summary_result                    | tuple |          | postgres |                   |   
 madlib | __svd_bidiagonal_matrix_result    | __svd_bidiagonal_matrix_result    | tuple |          | postgres |                   |   
 madlib | __svd_lanczos_result              | __svd_lanczos_result              | tuple |          | postgres |                   |   
 madlib | __svd_vec_mat_mult_result         | __svd_vec_mat_mult_result         | tuple |          | postgres |                   |   
 madlib | svec                              | svec                              | var   |          | postgres |                   |   
 madlib | _tree_result_type                 | _tree_result_type                 | tuple |          | postgres |                   |   
 madlib | t_test_result                     | t_test_result                     | tuple |          | postgres |                   |   
 madlib | __utils_scales                    | __utils_scales                    | tuple |          | postgres |                   |   
 madlib | wsr_test_result                   | wsr_test_result                   | tuple |          | postgres |                   |   
 madlib | xgb_gridsearch_train_results_type | xgb_gridsearch_train_results_type | tuple |          | postgres |                   |   
 public | vector                            | vector                            | var   |          | postgres |                   |   
(73 rows)  
  
postgres=#   \do+  
                                                    List of operators  
 Schema | Name |   Left arg type    |   Right arg type   |   Result type    |           Function            | Description   
--------+------+--------------------+--------------------+------------------+-------------------------------+-------------  
 madlib | %*%  | double precision[] | double precision[] | double precision | madlib.svec_dot               |   
 madlib | %*%  | double precision[] | svec               | double precision | madlib.svec_dot               |   
 madlib | %*%  | svec               | double precision[] | double precision | madlib.svec_dot               |   
 madlib | %*%  | svec               | svec               | double precision | madlib.svec_dot               |   
 madlib | *    | double precision[] | double precision[] | svec             | float8arr_mult_float8arr      |   
 madlib | *    | double precision[] | svec               | svec             | float8arr_mult_svec           |   
 madlib | *    | svec               | double precision[] | svec             | svec_mult_float8arr           |   
 madlib | *    | svec               | svec               | svec             | svec_mult                     |   
 madlib | *||  | integer            | svec               | svec             | svec_concat_replicate         |   
 madlib | +    | double precision[] | double precision[] | svec             | float8arr_plus_float8arr      |   
 madlib | +    | double precision[] | svec               | svec             | float8arr_plus_svec           |   
 madlib | +    | svec               | double precision[] | svec             | svec_plus_float8arr           |   
 madlib | +    | svec               | svec               | svec             | svec_plus                     |   
 madlib | -    | double precision[] | double precision[] | svec             | float8arr_minus_float8arr     |   
 madlib | -    | double precision[] | svec               | svec             | float8arr_minus_svec          |   
 madlib | -    | svec               | double precision[] | svec             | svec_minus_float8arr          |   
 madlib | -    | svec               | svec               | svec             | svec_minus                    |   
 madlib | /    | double precision[] | double precision[] | svec             | float8arr_div_float8arr       |   
 madlib | /    | double precision[] | svec               | svec             | float8arr_div_svec            |   
 madlib | /    | svec               | double precision[] | svec             | svec_div_float8arr            |   
 madlib | /    | svec               | svec               | svec             | svec_div                      |   
 madlib | <    | svec               | svec               | boolean          | svec_lt                       |   
 madlib | <=   | svec               | svec               | boolean          | svec_le                       |   
 madlib | <>   | svec               | svec               | boolean          | svec_ne                       |   
 madlib | =    | svec               | svec               | boolean          | svec_eq                       |   
 madlib | ==   | svec               | svec               | boolean          | svec_eq                       |   
 madlib | >    | svec               | svec               | boolean          | svec_gt                       |   
 madlib | >=   | svec               | svec               | boolean          | svec_ge                       |   
 madlib | ^    | svec               | svec               | svec             | svec_pow                      |   
 madlib | ||   | svec               | svec               | svec             | svec_concat                   |   
 public | +    | vector             | vector             | vector           | vector_add                    |   
 public | -    | vector             | vector             | vector           | vector_sub                    |   
 public | <    | vector             | vector             | boolean          | vector_lt                     |   
 public | <#>  | vector             | vector             | double precision | vector_negative_inner_product |   
 public | <->  | vector             | vector             | double precision | l2_distance                   |   
 public | <=   | vector             | vector             | boolean          | vector_le                     |   
 public | <=>  | vector             | vector             | double precision | cosine_distance               |   
 public | <>   | vector             | vector             | boolean          | vector_ne                     |   
 public | =    | vector             | vector             | boolean          | vector_eq                     |   
 public | >    | vector             | vector             | boolean          | vector_gt                     |   
 public | >=   | vector             | vector             | boolean          | vector_ge                     |   
(41 rows)  

参考

https://madlib.apache.org/docs/latest/index.html

https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide

digoal's wechat