# On-device recommendations with Firebase ML and TensorFlow Lite

## Overview

To complete this notebook, you need to:
* Connect Firebase Analytics
* Create and populate a table in BigQuery

## Prerequisites

We start with a simple k-nearest neighbors (kNN) model.

## Set up authentication

In this notebook, we use analytics data from BigQuery to generate training data for our recommendations model. To access BigQuery data from the Colab notebook, you need to upload the service account file that you downloaded in step 10 of the codelab.

Note: If this step throws an error, you can either:
1. Manually upload the JSON file to the /content folder using the Folder icon in the left menu, then set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the file path. For example, if the file was uploaded to /content, run:
`os.environ["GOOGLE_APPLICATION_CREDENTIALS"]='/content/<your_service_acct_file_name>'`
OR,
2. Try disabling third-party cookies in your browser, as [suggested here](https://stackoverflow.com/a/61494336).
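
The manual fallback in option 1 can be sketched as follows. This is a minimal sketch, assuming the key file has already been uploaded; `set_credentials` is a hypothetical helper written for illustration, not part of any Google library.

```python
import os


def set_credentials(key_path: str) -> str:
    """Point GOOGLE_APPLICATION_CREDENTIALS at an uploaded service account key.

    Hypothetical helper: it only checks that the file exists and sets the
    environment variable that the Google client libraries read.
    """
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"Service account file not found: {key_path}")
    absolute_path = os.path.abspath(key_path)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = absolute_path
    return absolute_path
```

Calling it with the path of the uploaded file fails early with a clear message if the upload did not succeed, instead of failing later inside the BigQuery query.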

In [None]:
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"]='donapp-d2378-firebase-adminsdk-zxd1d-2147e3a97f.json'

# Import app analytics data from BigQuery

In this step, we load the analytics data that we collected in the app with Firebase Analytics and sent to BigQuery. We load the data into a pandas dataframe and then preprocess it into the format that the model training step expects.

## Enable BigQuery IPython magics

The BigQuery client library provides several convenience IPython magics that we will use to fetch data. We load them with the %reload_ext magic below.

In [None]:
%reload_ext google.cloud.bigquery

## Import data

We use the following SQL statement to get items from the table we created in BigQuery. Firebase Analytics exports a lot of additional information, such as device type and platform version, that we don't need for training this model. Initially, we fetch only a limited number of rows to briefly explore the shape of this data and select which fields are important.

Notice that a row in the dataframe is created for each analytics event logged in the app. This row has many properties, but the ones that are of importance for this notebook are the fields:
* event_name
* event_timestamp
* items
* user_pseudo_id

Notice that some fields, such as the **items** field, are actually objects. We will extract the subfield of interest below.

Now we run the following command to import the whole dataset into a variable. Note how we only import the fields which we are interested in for training purposes.

In [None]:
%%bigquery data_real
SELECT
    charityID,userID,timestamp
FROM `firebase_recommendations_dataset.donations_table`


In [None]:
data_real.head()

Unnamed: 0,charityID,userID,timestamp
0,3MWg9xpBnDeB1GPauOf57hl90SIy,qk0Q5ZmS3au5RkcPuyotTjtg3G0b,2023-03-08 15:13:33+00:00
1,JwY0MIYrcifUDJj35tO5JKedB8Nt,qk0Q5ZmS3au5RkcPuyotTjtg3G0b,2023-03-08 15:13:34+00:00
2,tjKvwntCUTmcxOsE3hAYsOc4pxMk,qk0Q5ZmS3au5RkcPuyotTjtg3G0b,2023-03-08 15:13:36+00:00
3,3MWg9xpBnDeB1GPauOf57hl90SIy,qk0Q5ZmS3au5RkcPuyotTjtg3G0b,2023-03-08 15:13:37+00:00
4,JwY0MIYrcifUDJj35tO5JKedB8Nt,qk0Q5ZmS3au5RkcPuyotTjtg3G0b,2023-03-08 15:13:37+00:00


# Preprocess the dataset

In this step, we query the `select_item` analytics events and use a lambda function to extract the 'string_value' subfield from each 'item_id' event parameter. This value represents the charity ID, so we also rename the columns to match the donations table.

In [None]:
%%bigquery analytics_data
SELECT
    value,user_id,event_timestamp
FROM `analytics_225904054.events_intraday_20230324` t CROSS JOIN
     UNNEST(t.event_params) ep
WHERE event_name='select_item' AND ep.key = 'item_id' 


In [None]:
analytics_data.head()

Unnamed: 0,value,user_id,event_timestamp
0,{'string_value': '3MWg9xpBnDeB1GPauOf57hl90SIy...,EUNOaNRQfyYlAummUev37EKg2qH3,1679672512628161
1,{'string_value': '3MWg9xpBnDeB1GPauOf57hl90SIy...,EUNOaNRQfyYlAummUev37EKg2qH3,1679672146642161
2,{'string_value': 'JwY0MIYrcifUDJj35tO5JKedB8Nt...,EUNOaNRQfyYlAummUev37EKg2qH3,1679672530866161
3,{'string_value': '3MWg9xpBnDeB1GPauOf57hl90SIy...,EUNOaNRQfyYlAummUev37EKg2qH3,1679650957899099
4,"{'string_value': '12367', 'int_value': None, '...",EUNOaNRQfyYlAummUev37EKg2qH3,1679659722144168


In [None]:
analytics_data['value'] = analytics_data['value'].map(lambda entry: entry['string_value'])
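
The extraction above assumes that every event parameter stores its value in 'string_value'. As a hedged sketch, a slightly more defensive variant could fall back to the other typed subfields. Only 'string_value' and 'int_value' are visible in the output above; 'float_value' and 'double_value' are assumptions about the export schema, and `extract_param_value` is a hypothetical helper.

```python
from typing import Any, Mapping, Optional

# Typed subfields of an event parameter value, in the order we try them.
# The last two names are assumptions; only the first two appear in the output above.
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def extract_param_value(entry: Mapping[str, Any]) -> Optional[str]:
    """Return the first populated typed subfield as a string, or None."""
    for field in VALUE_FIELDS:
        value = entry.get(field)
        if value is not None:
            return str(value)
    return None


sample = {"string_value": "12367", "int_value": None}
print(extract_param_value(sample))  # prints 12367
```

It could be passed to `map` in place of the lambda, for example `analytics_data['value'].map(extract_param_value)`.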

In [None]:
import pandas as pd

In [None]:
# Align the analytics events with the donations table schema.
analytics_events = analytics_data.drop(columns=['event_timestamp']).rename(
    columns={'value': 'charityID', 'user_id': 'userID'})
# Stack both sources into a single (charityID, userID) dataframe.
analytics = pd.concat([data_real.drop(columns=['timestamp']), analytics_events])
analytics

Unnamed: 0,charityID,userID
0,3MWg9xpBnDeB1GPauOf57hl90SIy,qk0Q5ZmS3au5RkcPuyotTjtg3G0b
1,JwY0MIYrcifUDJj35tO5JKedB8Nt,qk0Q5ZmS3au5RkcPuyotTjtg3G0b
2,tjKvwntCUTmcxOsE3hAYsOc4pxMk,qk0Q5ZmS3au5RkcPuyotTjtg3G0b
3,3MWg9xpBnDeB1GPauOf57hl90SIy,qk0Q5ZmS3au5RkcPuyotTjtg3G0b
4,JwY0MIYrcifUDJj35tO5JKedB8Nt,qk0Q5ZmS3au5RkcPuyotTjtg3G0b
...,...,...
31,SrYqL0G0KEht16Q6iUqicpy2oLWI,EUNOaNRQfyYlAummUev37EKg2qH3
32,9TPmjJvlASIKCgZ9NHBZP1jZEP3S,EUNOaNRQfyYlAummUev37EKg2qH3
33,12367,EUNOaNRQfyYlAummUev37EKg2qH3
34,5gNFY4JG86pVLGu6X0vAQJUuJYHc,EUNOaNRQfyYlAummUev37EKg2qH3


Here is our processed dataframe containing only the data we want to use in training.

The data has the following properties:
*   charityID: string
*   userID: string

In [None]:
analytics.values

array([['3MWg9xpBnDeB1GPauOf57hl90SIy', 'qk0Q5ZmS3au5RkcPuyotTjtg3G0b'],
       ['JwY0MIYrcifUDJj35tO5JKedB8Nt', 'qk0Q5ZmS3au5RkcPuyotTjtg3G0b'],
       ['tjKvwntCUTmcxOsE3hAYsOc4pxMk', 'qk0Q5ZmS3au5RkcPuyotTjtg3G0b'],
       ['3MWg9xpBnDeB1GPauOf57hl90SIy', 'qk0Q5ZmS3au5RkcPuyotTjtg3G0b'],
       ['JwY0MIYrcifUDJj35tO5JKedB8Nt', 'qk0Q5ZmS3au5RkcPuyotTjtg3G0b'],
       ['tjKvwntCUTmcxOsE3hAYsOc4pxMk', 'qk0Q5ZmS3au5RkcPuyotTjtg3G0b'],
       ['tjKvwntCUTmcxOsE3hAYsOc4pxMk', 'a1aX9SLLbe3Fksvqtv7YhRoqtaw9'],
       ['9TPmjJvlASIKCgZ9NHBZP1jZEP3S', 'unFb4dqPjHNnUoGJhC9u7gBdb9YW'],
       ['9TPmjJvlASIKCgZ9NHBZP1jZEP3S', 'unFb4dqPjHNnUoGJhC9u7gBdb9YW'],
       ['b79lWABizCzlu2gUYvVsUCSApCD4', '7dlnsoBWpyftpEJ6gQBkm86oxmI7'],
       ['b79lWABizCzlu2gUYvVsUCSApCD4', '7dlnsoBWpyftpEJ6gQBkm86oxmI7'],
       ['b79lWABizCzlu2gUYvVsUCSApCD4', '7dlnsoBWpyftpEJ6gQBkm86oxmI7'],
       ['9TPmjJvlASIKCgZ9NHBZP1jZEP3S', 'pL9XlHZqQpBNF1BrLfT7SAfBchQm'],
       ['bWmuiL1Np1nat3k3BRFsrcRfBxIx', 'pL9XlHZqQp

## Encode user, charity IDs

Rows (users) and columns (charities) are ordered by timestamp.

In [None]:
from sklearn.preprocessing import LabelEncoder
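
As a rough illustration of what encoding the IDs means, the sketch below maps each distinct ID to an integer in sorted order, which mirrors the mapping a label encoder produces. It is a dependency-free sketch rather than the scikit-learn implementation, and `encode_ids` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple


def encode_ids(ids: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct ID to an integer assigned in sorted order."""
    mapping = {value: index for index, value in enumerate(sorted(set(ids)))}
    return [mapping[value] for value in ids], mapping


# Charity IDs taken from the dataframe output earlier in this notebook.
charity_ids = [
    "3MWg9xpBnDeB1GPauOf57hl90SIy",
    "JwY0MIYrcifUDJj35tO5JKedB8Nt",
    "tjKvwntCUTmcxOsE3hAYsOc4pxMk",
    "3MWg9xpBnDeB1GPauOf57hl90SIy",
]
codes, mapping = encode_ids(charity_ids)
print(codes)  # prints [0, 1, 2, 0]
```

The returned mapping can be inverted to translate model outputs back into the original ID strings.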

## Train a model

In [None]:
from sklearn.neighbors import NearestNeighbors
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
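
The classifier above is created but not fitted in this notebook. To show the nearest-neighbor idea on (charityID, userID) pairs, here is a minimal, dependency-free sketch of a user-based kNN recommender with cosine similarity over binary interactions. It is not the scikit-learn estimator; the function names and the toy IDs (`c1`, `u1`, ...) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_user_items(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (charity_id, user_id) pairs into a set of charities per user."""
    user_items: Dict[str, Set[str]] = defaultdict(set)
    for charity_id, user_id in pairs:
        user_items[user_id].add(charity_id)
    return dict(user_items)


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two binary interaction vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user: str, user_items: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Recommend unseen charities, weighted by the k most similar users."""
    seen = user_items[user]
    neighbours = sorted(
        ((cosine(seen, items), other)
         for other, items in user_items.items() if other != user),
        reverse=True,
    )[:k]
    scores: Counter = Counter()
    for similarity, other in neighbours:
        for charity in user_items[other] - seen:
            scores[charity] += similarity
    # Drop charities that only came from neighbours with zero similarity.
    return [charity for charity, score in scores.most_common() if score > 0]


pairs = [("c1", "u1"), ("c2", "u1"), ("c1", "u2"), ("c3", "u2"), ("c4", "u3")]
user_items = build_user_items(pairs)
print(recommend("u1", user_items))  # prints ['c3']
```

Here `u2` shares `c1` with `u1`, so the charity `u2` also selected (`c3`) is recommended, while `u3` has nothing in common and contributes nothing.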

In [None]:
!pip install --upgrade tensorflow==2.9.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow==2.9.0
  Downloading tensorflow-2.9.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.7 MB)
Collecting flatbuffers<2,>=1.12
  Downloading flatbuffers-1.12-py2.py3-none-any.whl (15 kB)
Collecting tensorboard<2.10,>=2.9
  Downloading tensorboard-2.9.1-py3-none-any.whl (5.8 MB)
Collecting keras<2.10.0,>=2.9.0rc0
  Downloading keras-2.9.0-py2.py3-none-any.whl (1.6 MB)
Collecting tensorflow-estimator<2.10.0,>=2.9.0rc0
  Downloading tensorflow_estimator-2.9.0-py2.py3-none-any.whl (438 kB)

In [None]:
!pip install tensorflow_recommenders

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow_recommenders
  Downloading tensorflow_recommenders-0.7.3-py3-none-any.whl (96 kB)
Installing collected packages: tensorflow_recommenders
Successfully installed tensorflow_recommenders-0.7.3


In [None]:
# Only the libraries used in the cells below are imported.
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_recommenders as tfrs
from typing import Dict, Text

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import LabelEncoder

In [None]:
data = analytics.drop_duplicates()
data

Unnamed: 0,charityID,userID
0,3MWg9xpBnDeB1GPauOf57hl90SIy,qk0Q5ZmS3au5RkcPuyotTjtg3G0b
1,JwY0MIYrcifUDJj35tO5JKedB8Nt,qk0Q5ZmS3au5RkcPuyotTjtg3G0b
2,tjKvwntCUTmcxOsE3hAYsOc4pxMk,qk0Q5ZmS3au5RkcPuyotTjtg3G0b
6,tjKvwntCUTmcxOsE3hAYsOc4pxMk,a1aX9SLLbe3Fksvqtv7YhRoqtaw9
7,9TPmjJvlASIKCgZ9NHBZP1jZEP3S,unFb4dqPjHNnUoGJhC9u7gBdb9YW
9,b79lWABizCzlu2gUYvVsUCSApCD4,7dlnsoBWpyftpEJ6gQBkm86oxmI7
12,9TPmjJvlASIKCgZ9NHBZP1jZEP3S,pL9XlHZqQpBNF1BrLfT7SAfBchQm
13,bWmuiL1Np1nat3k3BRFsrcRfBxIx,pL9XlHZqQpBNF1BrLfT7SAfBchQm
14,5gNFY4JG86pVLGu6X0vAQJUuJYHc,pL9XlHZqQpBNF1BrLfT7SAfBchQm
18,P0UKy85iinfvfJZJPMB4R5G024ID,hAdXSNr0fVNOavrzh2SKW7geqxGT


In [None]:
uniq = data.charityID.unique()
uniq = pd.DataFrame(uniq)

uniq.columns = ['charityID']
uniq

rat = data[['userID', 'charityID']]

# Build tf.data datasets directly from the dataframes.
ratings = tf.data.Dataset.from_tensor_slices(dict(rat))

Charity = tf.data.Dataset.from_tensor_slices(dict(uniq))

ratings = ratings.map(lambda x: {
"userID": x["userID"],
"charityID": x["charityID"]
})

Charity = Charity.map(lambda x: x["charityID"])
ratings.take(1)

UserID_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
UserID_vocabulary.adapt(ratings.map(lambda x: x["userID"]))

Charity_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
Charity_vocabulary.adapt(Charity)

# Define a model.
# A TFRS model is defined by inheriting from tfrs.Model and implementing the compute_loss method.
class CharityRecModel(tfrs.Model):
    def __init__(self, UserModel: tf.keras.Model, CharityModel: tf.keras.Model, task: tfrs.tasks.Retrieval):
        super().__init__()

        # Set up user and charity representations.
        self.UserModel = UserModel
        self.CharityModel = CharityModel

        # Set up a retrieval task.
        self.task = task
    
    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # Define how the loss is computed.
        UserEmbeddings = self.UserModel(features["userID"])
        CharityEmbeddings = self.CharityModel(features["charityID"])
        
        return self.task(UserEmbeddings, CharityEmbeddings)
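
To make the loss returned by `compute_loss` more concrete: a retrieval task of this kind typically scores every user in a batch against every charity in the same batch and treats the matching pair as the correct class. The dependency-free sketch below illustrates that in-batch softmax idea. It is an assumption about the task's default behavior, not a copy of the TFRS implementation, and `in_batch_softmax_loss` and the toy embeddings are hypothetical.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(
    user_embeddings: Sequence[Sequence[float]],
    charity_embeddings: Sequence[Sequence[float]],
) -> float:
    """Sum of softmax cross-entropy losses, where row i's positive is charity i."""
    total = 0.0
    for i, user in enumerate(user_embeddings):
        logits = [dot(user, charity) for charity in charity_embeddings]
        # Numerically stable log-sum-exp over all charities in the batch.
        highest = max(logits)
        log_sum_exp = highest + math.log(sum(math.exp(l - highest) for l in logits))
        total += log_sum_exp - logits[i]
    return total


users = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]     # each user matches its own charity
misaligned = [[0.0, 1.0], [1.0, 0.0]]  # positives point the wrong way
print(in_batch_softmax_loss(users, aligned) < in_batch_softmax_loss(users, misaligned))  # prints True
```

Training pushes each user embedding toward the embedding of the charity that user selected and away from the other charities in the batch, which lowers this loss.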

In [None]:
UserModel = tf.keras.Sequential([
    UserID_vocabulary,
    tf.keras.layers.Embedding(UserID_vocabulary.vocabulary_size(), 64)
])

CharityModel = tf.keras.Sequential([
    Charity_vocabulary,
    tf.keras.layers.Embedding(Charity_vocabulary.vocabulary_size(), 64)
])

task = tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(
    Charity.batch(4).map(CharityModel))
)

In [None]:
model = CharityRecModel(UserModel, CharityModel, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))

# Train for 3 epochs.
model.fit(ratings.batch(4), epochs=3)
# Use brute-force search to set up retrieval using the trained representations.
index = tfrs.layers.factorized_top_k.BruteForce(model.UserModel)

index.index_from_dataset(Charity.batch(4).map(lambda charity: (charity, model.CharityModel(charity))))
users = data.userID.unique().tolist()

fcst = pd.DataFrame()

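for x in users:
    # Query the index for this user's top-scoring charities.
    # A separate name avoids shadowing the `Charity` dataset defined earlier.
    _, top_charities = index(np.array([x]))
    fcst = pd.concat((fcst, pd.DataFrame(top_charities[0, :10].numpy()).transpose()))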
    
fcst['User'] = users

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [None]:
fcst

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,User
0,b'b79lWABizCzlu2gUYvVsUCSApCD4',b'JwY0MIYrcifUDJj35tO5JKedB8Nt',b'3MWg9xpBnDeB1GPauOf57hl90SIy',b'P0UKy85iinfvfJZJPMB4R5G024ID',b'dnQuAUt0lpGpxZayXlwxj2Vw0lF2',b'i9F9dIW8V0suu1JN9w2xVOat04yn',b'EBMREkEGMiovkibTjYMHAjO2Mcny',b'nyhK1EZt98jO1Adr7Pb7ptEWrBRK',b'cluster1',b'12367',qk0Q5ZmS3au5RkcPuyotTjtg3G0b
0,b'tjKvwntCUTmcxOsE3hAYsOc4pxMk',b'9TPmjJvlASIKCgZ9NHBZP1jZEP3S',b'5gNFY4JG86pVLGu6X0vAQJUuJYHc',b'bWmuiL1Np1nat3k3BRFsrcRfBxIx',b'S1sRftyyHivjg41mrPrEAQVutge9',b'D3TqTvewWR6iStUaM8G7AuyGxvIi',b'123',b'SrYqL0G0KEht16Q6iUqicpy2oLWI',b'Charity',b'Ti1euxITu7QVVpBdiUElA3d5eqoz',a1aX9SLLbe3Fksvqtv7YhRoqtaw9
0,b'9TPmjJvlASIKCgZ9NHBZP1jZEP3S',b'5gNFY4JG86pVLGu6X0vAQJUuJYHc',b'tjKvwntCUTmcxOsE3hAYsOc4pxMk',b'i9F9dIW8V0suu1JN9w2xVOat04yn',b'D3TqTvewWR6iStUaM8G7AuyGxvIi',b'Ti1euxITu7QVVpBdiUElA3d5eqoz',b'P0UKy85iinfvfJZJPMB4R5G024ID',b'123',b'Charity',b'12367',unFb4dqPjHNnUoGJhC9u7gBdb9YW
0,b'b79lWABizCzlu2gUYvVsUCSApCD4',b'P0UKy85iinfvfJZJPMB4R5G024ID',b'dnQuAUt0lpGpxZayXlwxj2Vw0lF2',b'nyhK1EZt98jO1Adr7Pb7ptEWrBRK',b'EBMREkEGMiovkibTjYMHAjO2Mcny',b'JwY0MIYrcifUDJj35tO5JKedB8Nt',b'3MWg9xpBnDeB1GPauOf57hl90SIy',b'Ti1euxITu7QVVpBdiUElA3d5eqoz',b'S1sRftyyHivjg41mrPrEAQVutge9',b'Four',7dlnsoBWpyftpEJ6gQBkm86oxmI7
0,b'5gNFY4JG86pVLGu6X0vAQJUuJYHc',b'bWmuiL1Np1nat3k3BRFsrcRfBxIx',b'D3TqTvewWR6iStUaM8G7AuyGxvIi',b'SrYqL0G0KEht16Q6iUqicpy2oLWI',b'9TPmjJvlASIKCgZ9NHBZP1jZEP3S',b'tjKvwntCUTmcxOsE3hAYsOc4pxMk',b'cluster1',b'12367',b'123',b'i9F9dIW8V0suu1JN9w2xVOat04yn',pL9XlHZqQpBNF1BrLfT7SAfBchQm
0,b'P0UKy85iinfvfJZJPMB4R5G024ID',b'D3TqTvewWR6iStUaM8G7AuyGxvIi',b'b79lWABizCzlu2gUYvVsUCSApCD4',b'Ti1euxITu7QVVpBdiUElA3d5eqoz',b'JwY0MIYrcifUDJj35tO5JKedB8Nt',b'3MWg9xpBnDeB1GPauOf57hl90SIy',b'Charity',b'123',b'Four',b'cluster1',hAdXSNr0fVNOavrzh2SKW7geqxGT
0,b'dnQuAUt0lpGpxZayXlwxj2Vw0lF2',b'EBMREkEGMiovkibTjYMHAjO2Mcny',b'nyhK1EZt98jO1Adr7Pb7ptEWrBRK',b'i9F9dIW8V0suu1JN9w2xVOat04yn',b'b79lWABizCzlu2gUYvVsUCSApCD4',b'S1sRftyyHivjg41mrPrEAQVutge9',b'3MWg9xpBnDeB1GPauOf57hl90SIy',b'Four',b'JwY0MIYrcifUDJj35tO5JKedB8Nt',b'123',2jdmhYp8SmSIOpAhniXIzuYQ6uwZ
0,b'D3TqTvewWR6iStUaM8G7AuyGxvIi',b'P0UKy85iinfvfJZJPMB4R5G024ID',b'5gNFY4JG86pVLGu6X0vAQJUuJYHc',b'9TPmjJvlASIKCgZ9NHBZP1jZEP3S',b'Ti1euxITu7QVVpBdiUElA3d5eqoz',b'bWmuiL1Np1nat3k3BRFsrcRfBxIx',b'SrYqL0G0KEht16Q6iUqicpy2oLWI',b'tjKvwntCUTmcxOsE3hAYsOc4pxMk',b'3MWg9xpBnDeB1GPauOf57hl90SIy',b'Charity',rDlVTz5Ec0tXI4WlNVz6Nh94jmvh
0,b'b79lWABizCzlu2gUYvVsUCSApCD4',b'EBMREkEGMiovkibTjYMHAjO2Mcny',b'dnQuAUt0lpGpxZayXlwxj2Vw0lF2',b'Ti1euxITu7QVVpBdiUElA3d5eqoz',b'nyhK1EZt98jO1Adr7Pb7ptEWrBRK',b'i9F9dIW8V0suu1JN9w2xVOat04yn',b'12367',b'Four',b'JwY0MIYrcifUDJj35tO5JKedB8Nt',b'123',0PyuWHnQPqkbVPi5oRUnuRQQ8MXY
0,b'SrYqL0G0KEht16Q6iUqicpy2oLWI',b'5gNFY4JG86pVLGu6X0vAQJUuJYHc',b'bWmuiL1Np1nat3k3BRFsrcRfBxIx',b'EBMREkEGMiovkibTjYMHAjO2Mcny',b'tjKvwntCUTmcxOsE3hAYsOc4pxMk',b'dnQuAUt0lpGpxZayXlwxj2Vw0lF2',b'Four',b'12367',b'3MWg9xpBnDeB1GPauOf57hl90SIy',b'D3TqTvewWR6iStUaM8G7AuyGxvIi',GoOo3nSmuBiT3hiPoBANg569IVTe
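
The brute-force retrieval used to build the table above scores every candidate charity against the user's embedding and keeps the highest-scoring ones. A minimal sketch of that idea in plain Python follows; the two-dimensional embeddings and the names `brute_force_top_k`, `charity_a`, and so on are hypothetical and only illustrate the ranking step.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_top_k(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[float, str]]:
    """Score every candidate against the query and return the k best (score, id) pairs."""
    scored = ((dot(query, embedding), candidate_id)
              for candidate_id, embedding in candidates.items())
    return heapq.nlargest(k, scored)


# Toy embeddings standing in for the trained user and charity representations.
candidates = {
    "charity_a": [1.0, 0.0],
    "charity_b": [0.0, 1.0],
    "charity_c": [0.5, 0.5],
}
query = [1.0, 0.2]
print([cid for _, cid in brute_force_top_k(query, candidates, k=2)])  # prints ['charity_a', 'charity_c']
```

Because every candidate is scored, the cost grows linearly with the number of charities, which is acceptable for a small catalog like the one in this notebook.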


In [None]:
inputs = ['a1aX9SLLbe3Fksvqtv7YhRoqtaw9']

In [None]:
import tempfile

In [None]:
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model")
    tf.saved_model.save(index, path)
    loaded = tf.saved_model.load(path)
    scores, titles = loaded(inputs)
    print(f"Recommendations: {titles[0][:3]}")
    print(loaded)



Recommendations: [b'tjKvwntCUTmcxOsE3hAYsOc4pxMk' b'9TPmjJvlASIKCgZ9NHBZP1jZEP3S'
 b'5gNFY4JG86pVLGu6X0vAQJUuJYHc']
<tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject object at 0x7ff5f6251c10>


In [None]:
tf.saved_model.save(index, "model")



In [None]:
converter = tf.lite.TFLiteConverter.from_saved_model("model") 
tflite_model = converter.convert()

In [None]:
with open('model.tflite', 'wb') as f:
  f.write(tflite_model)

In [None]:
import firebase_admin
from firebase_admin import ml
from firebase_admin import credentials

projectID = 'donapp-d2378'

firebase_admin.initialize_app(options={'projectId': projectID, 
             'storageBucket': projectID + '.appspot.com' })

<firebase_admin.App at 0x7ff5f0088430>

In [None]:
source = ml.TFLiteGCSModelSource.from_tflite_model_file('model.tflite')

# Create the model object
tflite_format = ml.TFLiteFormat(model_source=source)
model = ml.Model(
    display_name="recommender_model",  # This is the name you use from your app to load the model.
    model_format=tflite_format)

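# Add the model to your Firebase project and publish it.
# create_model registers a new model; update_model expects a model that already exists in the project.
new_model = ml.create_model(model)
ml.publish_model(new_model.model_id)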

InvalidArgumentError: ignored