# Model Training

This notebook trains the following model:

1) A model to recommend product categories based on customer clusters' history

## Imports

```python
import os

# ETL and data manipulation
from pyspark.sql import SparkSession

# Matrix factorization
from pyspark.ml.recommendation import ALS
```

```python
# Create a local Spark session that uses all available cores
spark = SparkSession.builder \
    .appName("LocalSparkForTesting") \
    .master("local[*]") \
    .getOrCreate()
```

## Load data

```python
DATA_PATH = os.path.join('/sparkdata/wholesale-recommender', 'processed')

# Customers with features
customers = spark.read.parquet(os.path.join(DATA_PATH, "customers_features"))

# Interactions
interactions = spark.read.parquet(os.path.join(DATA_PATH, "interactions"))
```

## Prepare data

This model backs a "Based on your history, you might like..." recommendation. It is trained only on customers with more than 10 orders.

### Get customers with >10 orders

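```python
# Keep customers with more than 10 distinct orders
active_customers = (
    customers
    .filter("order_count > 10")
    .withColumnRenamed("Customer ID", "customer_id")
)

# Keep only the interactions of active customers. A left-semi join filters
# rows without copying the customer feature columns into the result.
interactions_active = interactions.join(active_customers, on="customer_id", how="left_semi")
```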
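The filtering step can be sketched in plain Python to make its semantics explicit. This is a minimal illustration on made-up rows: the toy data and the `filter_active` helper are hypothetical, and `order_count` is assumed to hold each customer's number of distinct orders, as the cell's comment states. Note that interactions from customers missing from the features table are dropped too.

```python
# Hypothetical toy data, not taken from the real dataset.
customers = [
    {"customer_id": 1, "order_count": 12},
    {"customer_id": 2, "order_count": 3},
    {"customer_id": 3, "order_count": 11},
]
interactions = [
    {"customer_id": 1, "product_index": 0, "rating": 5.0},
    {"customer_id": 2, "product_index": 1, "rating": 2.0},
    {"customer_id": 3, "product_index": 1, "rating": 1.0},
    {"customer_id": 4, "product_index": 2, "rating": 7.0},  # not in the features table
]


def filter_active(customers, interactions, min_orders=10):
    """Keep interactions whose customer has strictly more than `min_orders` orders."""
    active_ids = {c["customer_id"] for c in customers if c["order_count"] > min_orders}
    # Join semantics: an interaction survives only if its customer is active,
    # so customers absent from the features table are dropped as well.
    return [row for row in interactions if row["customer_id"] in active_ids]


active = filter_active(customers, interactions)
print([row["customer_id"] for row in active])
```

Customers 2 (too few orders) and 4 (no feature row) are removed, leaving the interactions of customers 1 and 3.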

## Train ALS Model

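```python
als = ALS(
    userCol="customer_index",
    itemCol="product_index",
    ratingCol="rating",
    implicitPrefs=True,          # Treat `rating` as implicit-feedback strength, not an explicit score
    coldStartStrategy="drop",    # Drop predictions for unseen customers/products instead of returning NaN
    rank=10,                     # Number of latent factors
    maxIter=10,                  # Number of ALS iterations
    regParam=0.1                 # Regularization strength
)

als_model = als.fit(interactions_active)
```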
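With `implicitPrefs=True`, Spark's ALS does not try to reproduce `rating` directly. Following the implicit-feedback formulation of Hu, Koren and Volinsky, it turns each rating into a binary preference (1 if the rating is positive, else 0) and a confidence weight `1 + alpha * |rating|`. The notebook does not set `alpha`, so Spark's default of 1.0 applies. The sketch below shows only that transformation, not Spark's solver; the `implicit_weights` helper is hypothetical.

```python
def implicit_weights(rating, alpha=1.0):
    """Return (preference, confidence) for one implicit-feedback observation.

    Any positive rating counts as "preferred" (p = 1); a larger |rating|
    means the factorization trusts that observation more.
    """
    preference = 1.0 if rating > 0 else 0.0
    confidence = 1.0 + alpha * abs(rating)
    return preference, confidence


for r in [0.0, 1.0, 5.0, 20.0]:
    print(r, implicit_weights(r))
```

The factorization then fits customer and product vectors whose dot products approximate these preferences, weighting each squared error by its confidence, so heavily purchased products pull the factors harder than one-off purchases.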

## Save

```python
MODEL_PATH = os.path.join('/sparkdata/wholesale-recommender', 'models', 'individual_cust_rec')

# Save, overwriting any previous model so the notebook can be re-run
als_model.write().overwrite().save(MODEL_PATH)
```
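Once saved, the model can be reloaded with `ALSModel.load(MODEL_PATH)` (from `pyspark.ml.recommendation`), and `recommendForAllUsers(numItems)` returns each customer's top products. A prediction is the dot product of a customer's and a product's learned factor vectors, and `coldStartStrategy="drop"` removes rows whose customer or product was not seen during training. The pure-Python sketch below mimics that scoring with hypothetical rank-2 factors (the notebook uses `rank=10`); `predict` and `recommend` are illustrative helpers, not Spark APIs.

```python
# Hypothetical learned factors: customer_index -> vector, product_index -> vector.
user_factors = {1: [1.0, 0.5], 3: [0.2, 1.0]}
item_factors = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}


def predict(pairs, user_factors, item_factors):
    """Score (customer_index, product_index) pairs with a dot product.

    Pairs with an id unseen in training are dropped, like coldStartStrategy="drop";
    with "nan" they would be kept with a NaN score instead.
    """
    scored = []
    for u, i in pairs:
        if u not in user_factors or i not in item_factors:
            continue
        score = sum(a * b for a, b in zip(user_factors[u], item_factors[i]))
        scored.append((u, i, score))
    return scored


def recommend(user, user_factors, item_factors, k=2):
    """Return the top-k (product_index, score) pairs for one known customer."""
    scored = predict([(user, i) for i in item_factors], user_factors, item_factors)
    scored.sort(key=lambda row: row[2], reverse=True)
    return [(i, s) for _, i, s in scored[:k]]


print(recommend(1, user_factors, item_factors))
print(predict([(1, 0), (2, 0), (1, 5)], user_factors, item_factors))
```

Customer 1's strongest latent factor aligns with product 0, so it ranks first; the pairs involving the unseen customer 2 and unseen product 5 are dropped.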