In [None]:
%load_ext lab_black
# nb_black if running in jupyter

In [None]:
# hide
# from your_lib.core import *
# from ml-project-template.data import *

# Simulating Customer Journey Prediction in a Federated Learning Setup



## About

### What
In this study I show that customer paths can be predicted in a federated setup. 
The purpose of the work is to show that this is possible using real customer journey data.
The purpose is not yet to provide optimal solutions.
The goal of the work was to observe machine learning in the given setup.

### Terminology
Federated learning [[1](#mcmahan2016communication)] is a term used for distributed computation setups, where 

Introduce tensorflow federated

### Why this is important

A customer journey describes the service touchpoints and transitions between them as a customer goes through a service process.
We can model these as mathematical states, and use machine learning to predict the next state based on the previous one, and other information about the customer.
Knowing the most likely next states allows us to personalize the service for a desired outcome. 
Customer journey data is quite similar to any time series data, but it typically contains very few data points per customer.
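To make the idea of next state prediction concrete, here is a minimal sketch of a first-order transition model that predicts the next state as the most frequent successor of the previous one. This is not the model used in the study (which uses a neural network and customer background features); the state names and journeys below are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_transition_counts(journeys):
    """Count how often each state is directly followed by each other state."""
    counts = defaultdict(Counter)
    for journey in journeys:
        # Pair every state with its immediate successor in the same journey.
        for prev_state, next_state in zip(journey, journey[1:]):
            counts[prev_state][next_state] += 1
    return counts


def predict_next_state(counts, prev_state):
    """Return the most frequent successor of prev_state, or None if unseen."""
    if prev_state not in counts:
        return None
    return counts[prev_state].most_common(1)[0][0]


# Hypothetical toy journeys; real journeys come from the dataset.
journeys = [
    ["landing", "search", "booking"],
    ["landing", "search", "support"],
    ["landing", "search", "booking", "payment"],
]
counts = fit_transition_counts(journeys)
print(predict_next_state(counts, "search"))
```

With only a handful of events per customer, such counts are only meaningful when pooled over many customers, which is the motivation for the federated setup discussed next.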

Considering customer journey prediction, each customer has relatively little of their own data in use.
In the data I used, most customers had 3-5 events recorded and the max observed was 10.
This means that a localized machine learning model using the data of a single customer is not an option.
However, the customers might be unwilling to share their data for centralized processing.
With federated learning, the customers can benefit from each other without directly sharing their data.
This makes customer journey prediction an interesting field of application for federated learning,
especially in the public sector and government context.

Federated learning allows both a high level of data protection and personalization, because the distributed models are fitted only on the data of a single customer.
For the public sector, this opens up a way to provide novel digital services.
For example, cities or governments can create personal AI assistants that give citizens personalized recommendations on a variety of topics,
including preventative healthcare and career advice.
These kinds of services require very sensitive data that the citizens may be unwilling to trust the government with.
Federated learning can help provide the same services without the need to share data.
In addition, in contrast to centralized modelling, the models are fitted on each customer's own data.
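As a rough illustration of how customers can benefit from each other without sharing raw data, the sketch below shows the server-side aggregation step of federated averaging as described by McMahan et al. [1]: each client sends only its locally fitted model weights, and the server combines them weighted by the number of local examples. Treating weights as flat lists and the example numbers are assumptions for illustration; whether the notebooks use exactly this aggregation is not stated here.

```python
def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighted by local example counts."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Clients with more local examples contribute proportionally more.
            averaged[i] += w * size / total
    return averaged


# Hypothetical weights from two clients after local training.
client_weights = [[1.0, 0.0], [3.0, 2.0]]
client_sizes = [3, 1]  # number of local training examples per client
global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```

Only the weight vectors and example counts leave the client in this scheme; the events themselves stay local.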

The results may also be applied to other next state prediction problems not involving human customers,
but artificial clients instead. One such example could be predicting faults on IoT devices.

### How it was done

In this work I show how custom data can be used with TensorFlow Federated (TFF). I use the customer journey dataset by Bernard & Andritsos [[2](#bernard2019contextual)].
I create a simple federated learning setup for next state prediction on a small subset of the dataset (21k data points, 3k customers, 16 state labels and 3 customer background features),
simulate the federated learning with TFF, and compare the results against a centralized computation baseline using an identical artificial neural network (ANN) model.
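The following sketch illustrates one way the per-customer training examples for such a setup could be organized: each example pairs the one-hot encoded previous state and the customer's background features with the next state as the label, and examples are grouped by customer so that each group can act as one federated client. It deliberately avoids any TFF calls. The function names, the input layout and the toy values are assumptions for illustration and may differ from the actual preprocessing in the notebooks.

```python
from collections import defaultdict

NUM_STATES = 16  # number of state labels reported for the data subset


def one_hot(index, size):
    """Encode a state index as a one-hot list of floats."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_client_examples(events, background):
    """Group (features, label) pairs by customer.

    events: iterable of (customer_id, state_index), chronological per customer.
    background: dict mapping customer_id to a list of numeric features.
    """
    journeys = defaultdict(list)
    for customer_id, state in events:
        journeys[customer_id].append(state)

    clients = {}
    for customer_id, states in journeys.items():
        examples = []
        for prev_state, next_state in zip(states, states[1:]):
            features = one_hot(prev_state, NUM_STATES) + list(background[customer_id])
            examples.append((features, next_state))
        if examples:  # customers with a single event yield no transitions
            clients[customer_id] = examples
    return clients


# Hypothetical toy data: three customers with three background features each.
events = [("a", 0), ("a", 3), ("a", 5), ("b", 2), ("b", 2), ("c", 7)]
background = {"a": [0.1, 1.0, 0.0], "b": [0.5, 0.0, 1.0], "c": [0.9, 0.0, 0.0]}
clients = build_client_examples(events, background)
print({cid: len(examples) for cid, examples in clients.items()})
```

Each value in `clients` could then be converted into a per-client dataset for the federated simulation, while concatenating all values gives the training set for the centralized baseline.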

The code is presented in the wiki tabs CustomClientData and FederatedCustomerJourney and in the notebooks 00_data.ipynb and 01_model.ipynb, which can be found in the root of the repository.

### Results

![accuracy plot](results/accuracy.png "Figure 1")
Train and validation accuracy in comparison to baselines. The x-axis shows the federated update iterations and the y-axis shows the accuracy. The federated updates are not directly comparable to the progression of the centralized computation, for which we only plot the result after 20 epochs.

We observe that federated learning achieves clearly better accuracy than random guessing, but does not perform as well as the centralized baseline.
Complete reproducibility proved difficult to achieve using TFF, so the results may vary depending on the run.

View the tabs / notebooks for further info.

### Critical thoughts and known issues

- Limited data
- Limited pool of customers (same customers on each federated update iteration)
- Update issues, user activity and update scheduling issues not considered
- Malicious intents of users, service provider has no access to user data
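Regarding the limited pool of customers: a common alternative in federated simulations is to draw a different random subset of clients for each update round. The sketch below shows this sampling step only; it is not what was done in this study, and the client identifiers, round count and subset size are hypothetical.

```python
import random


def sample_clients(client_ids, clients_per_round, rng):
    """Draw a random subset of clients (without replacement) for one round."""
    k = min(clients_per_round, len(client_ids))
    return rng.sample(client_ids, k)


rng = random.Random(0)  # fixed seed so the simulated schedule is repeatable
all_clients = [f"customer_{i}" for i in range(10)]  # hypothetical client ids

for round_num in range(3):
    participants = sample_clients(all_clients, 4, rng)
    print(round_num, participants)
```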

### Conclusions




## Contents

Briefly describe the contents of your repository

## How to Install and Run

Describe how to install your code. Be very thorough and include every single step of the process.

## Contributing

> NOTE: Edit the hyperlink below to point to the CONTRIBUTING.md file of your repository

See [here](https://github.com/City-of-Helsinki/ml_project_template/blob/master/CONTRIBUTING.md) on how to contribute to this project.


## References

<a id='mcmahan2016communication'></a> McMahan et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. 2016. Google Inc. https://arxiv.org/pdf/1602.05629.pdf

<a id='bernard2019contextual'></a> Bernard, G., & Andritsos, P. (2019). Contextual and behavioral customer journey discovery using a genetic approach. In 23rd European Conference on Advances in Databases and Information Systems (ADBIS), pages 251–266, Cham. Springer.
Dataset available at: https://customer-journey.me/datasets/

This project was built using [nbdev](https://nbdev.fast.ai/) on top of the city of Helsinki [ml_project_template](https://github.com/City-of-Helsinki/ml_project_template).

## How to Cite

To cite this work, use:

In [None]:
%%script False

@misc{
    sten2021simulating,
    title = {Simulating Customer Journey Prediction in a Federated Learning Setup},
    author = {Nuutti Akilles Sten},
    month = {12},
    year = {2021},
    howpublished = {City of Helsinki},
    doi = {ADD DOI HERE},
}



## Copyright

> NOTE: Edit the year and author below according to your project!

Copyright 2021 City-of-Helsinki. Licensed under the Apache License, Version 2.0 (the "License");
you may not use this project's files except in compliance with the License.
A copy of the License is provided in the LICENSE file in this repository.

The Helsinki logo is a registered trademark owned by the city of Helsinki.