You may find this series of notebooks at https://github.com/databricks-industry-solutions/als-recommender. For more information about this solution accelerator, visit https://www.databricks.com/solutions/accelerators/recommendation-engines

The purpose of this notebook is to introduce the ALS recommender solution accelerator and to provide access to configuration information for the notebooks supporting it.

## Introduction

Recommender systems are becoming increasingly important as companies seek better ways to select products to present to end users. In this solution accelerator, we will explore a form of collaborative filter referred to as matrix factorization.

Matrix factorization works by assembling a set of ratings for various products made by a set of users.  The large user x products matrix is decomposed into smaller user and product submatrices associated with some developer-specified number of latent factors.  In many ways, a matrix factorization is a dimension reduction technique but one where missing values in the original matrix are allowed.

When examining ratings for a large number of user and product combinations, most users will engage with a very small percentage of products.  This causes us to have a user x products matrix that is highly sparse. When we decompose this matrix into the submatrices, the two can be combined to *recreate* the original matrix in a manner that provides ratings estimates for all products, including those a user has not yet engaged with.  This ability to fill in the missing ratings forms the basis for recommending new products to a user.
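To make the reconstruction step concrete, the following minimal sketch estimates a rating as the dot product of a user's and a product's latent-factor vectors, then ranks the products a user has not yet rated. The factor values, user names, product names, and the helper functions `predict` and `recommend` are all hypothetical and chosen only for illustration; in practice the factors would be learned from the observed ratings.

```python
# Hypothetical latent factors (two per user and per product); values are made up.
user_factors = {
    "alice": [1.0, 0.5],
    "bob": [0.2, 1.5],
}
product_factors = {
    "milk": [2.0, 1.0],
    "bread": [1.0, 2.0],
    "eggs": [0.5, 0.5],
}

# Ratings actually observed; every other (user, product) pair is missing.
observed = {("alice", "milk"): 2.5, ("bob", "bread"): 3.2}


def predict(user, product):
    """Estimate a rating as the dot product of the two factor vectors."""
    return sum(u * p for u, p in zip(user_factors[user], product_factors[product]))


def recommend(user, n=2):
    """Rank the products the user has not rated by their estimated rating."""
    unseen = [p for p in product_factors if (user, p) not in observed]
    return sorted(unseen, key=lambda p: predict(user, p), reverse=True)[:n]


for name in user_factors:
    print(name, recommend(name))
```

Because every user and product has a factor vector, an estimate exists for every cell of the matrix, including the cells that were empty in the original ratings data.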

Matrix factorization recommenders are frequently used in scenarios where we wish to suggest new and repeat purchase items to a user.  Recommendations in the style of *People like you also bought ...*, *Products we think you'll like ...*, and *Based on your purchase history ...* are frequently delivered through this type of recommender.

The challenge in developing a matrix factorization recommender is the large amount of computational horsepower required to calculate the submatrices.  Alternating Least Squares (ALS) is one approach that decomposes the process into a series of incremental steps that can be implemented in a distributed manner. In this solution accelerator, we will train and deploy an ALS-based matrix factorization recommender using the ALS capabilities in Apache Spark to demonstrate how this is done.
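The alternating idea can be sketched in plain Python. This is a simplified illustration, not Spark's implementation: it assumes a single latent factor per user and product, so each least-squares solve reduces to a scalar division, and it uses a tiny made-up ratings dictionary with an illustrative regularization value `reg`. All names here (`ratings`, `solve_users`, `solve_products`, `als`) are hypothetical.

```python
# Observed ratings only, keyed by (user index, product index).
# Missing combinations are simply absent. All values are made up.
ratings = {
    (0, 0): 5.0, (0, 1): 3.0,
    (1, 0): 4.0, (1, 2): 1.0,
    (2, 1): 1.5, (2, 2): 0.5,
}
n_users, n_products = 3, 3
reg = 0.1  # L2 regularization strength (illustrative value)


def loss(u, p):
    """Regularized squared error over the observed ratings."""
    err = sum((r - u[i] * p[j]) ** 2 for (i, j), r in ratings.items())
    return err + reg * (sum(x * x for x in u) + sum(x * x for x in p))


def solve_users(p):
    """Hold product factors fixed and solve each user's least-squares problem."""
    u = []
    for i in range(n_users):
        num = sum(r * p[j] for (ui, j), r in ratings.items() if ui == i)
        den = reg + sum(p[j] ** 2 for (ui, j) in ratings if ui == i)
        u.append(num / den)
    return u


def solve_products(u):
    """Hold user factors fixed and solve each product's least-squares problem."""
    p = []
    for j in range(n_products):
        num = sum(r * u[i] for (i, pj), r in ratings.items() if pj == j)
        den = reg + sum(u[i] ** 2 for (i, pj) in ratings if pj == j)
        p.append(num / den)
    return p


def als(n_iter=10):
    """Alternate between the two solves, tracking the loss after each round."""
    u = [1.0] * n_users
    p = [1.0] * n_products
    history = [loss(u, p)]
    for _ in range(n_iter):
        u = solve_users(p)
        p = solve_products(u)
        history.append(loss(u, p))
    return u, p, history


user_f, product_f, history = als()
print("loss before:", history[0], "loss after:", history[-1])
```

With one side held fixed, each user's (or product's) solve depends only on that user's (or product's) own ratings, so the solves within a step are independent of one another. That independence is what allows the work to be spread across a cluster.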

## Configuration Settings

In [0]:
# initialize the shared config dictionary only if it is not already defined
if 'config' not in locals():
  config = {}

In [0]:
config['database'] = 'adv_analytics_poc'

In [0]:
# create database if not exists
# _ = spark.sql('create database if not exists {0}'.format(config['database']))

# set current database context
# _ = spark.catalog.setCurrentDatabase(config['database'])

Here we use a temporary path in DBFS for illustration purposes to reduce external dependencies. We recommend that you use a cloud storage path or [mount point](https://docs.databricks.com/dbfs/mounts.html) to save data for production workloads. 

In [0]:
config['mount_point'] = '/tmp/instacart_als'

In [0]:
config['products_path'] = config['mount_point'] + '/bronze/products'
config['orders_path'] = config['mount_point'] + '/bronze/orders'
config['order_products_path'] = config['mount_point'] + '/bronze/order_products'
config['aisles_path'] = config['mount_point'] + '/bronze/aisles'
config['departments_path'] = config['mount_point'] + '/bronze/departments'

In [0]:
config['model name'] = 'als'

In [0]:
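import mlflow
# look up the current user's name from the Databricks notebook context
username = dbutils.notebook.entry_point.getDbutils().notebook().getContext().userName().get()
# log runs to an experiment under the user's workspace folder
mlflow.set_experiment('/Users/{}/als-recommender'.format(username))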

2024/04/01 09:50:18 INFO mlflow.tracking.fluent: Experiment with name '/Users/mahendra.v@sapiens.com/als-recommender' does not exist. Creating a new experiment.


<Experiment: artifact_location='dbfs:/databricks/mlflow-tracking/4413013109107832', creation_time=1711965018511, experiment_id='4413013109107832', last_update_time=1711965018511, lifecycle_stage='active', name='/Users/mahendra.v@sapiens.com/als-recommender', tags={'mlflow.experiment.sourceName': '/Users/mahendra.v@sapiens.com/als-recommender',
 'mlflow.experimentType': 'MLFLOW_EXPERIMENT',
 'mlflow.ownerEmail': 'mahendra.v@sapiens.com',
 'mlflow.ownerId': '6670788333455762'}>

© 2022 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License.