In this repository, we built a word-card recommendation system using the text-bison (PaLM 2) model and embeddings from SBERT, a pre-trained sentence-embedding model.
The goal is to make it easier for educators to find words that fit a given situation by recommending word cards that match it.
```
📦Say_Better_ML
 ┣ 📂image
 ┃ ┣ 📜Say-Better_logo1.png
 ┃ ┣ 📜Say_Better-System-Architecture.drawio.png
 ┃ ┣ 📜say_better_embedding_graph2d.png
 ┃ ┗ 📜say_better_embedding_graph3d.png
 ┣ 📂preprocessing
 ┃ ┗ 📜sentence_transformer_word_embedding.ipynb
 ┣ 📂recommender
 ┃ ┣ 📜create_relate_word.py
 ┃ ┗ 📜recommend_word_card.py
 ┣ 📜KAAC_basic.csv
 ┣ 📜main.py
 ┣ 📜README.md
 ┣ 📜requirements.txt
 ┣ 📜word_card_embed.npy
 ┗ 📜your_key.json
```
Preprocessing
- The word cards contained many stopwords and no spaces, so in preprocessing we removed the stopwords and inserted spaces using the text-bison (PaLM 2) model.
- We then vectorized the cleaned word cards using SBERT.
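The resulting vectors are stored in `word_card_embed.npy` so the service can load them without re-running the model. A minimal sketch of the normalization-and-save step (the helper name is ours; in the real pipeline the input vectors come from SBERT):

```python
import numpy as np

def normalize_embeddings(embeddings):
    """L2-normalize word-card vectors so cosine similarity reduces to a dot product."""
    vecs = np.asarray(embeddings, dtype=np.float32)
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.clip(norms, 1e-12, None)  # guard against zero-norm rows

# Persist for serving, e.g.:
# np.save("word_card_embed.npy", normalize_embeddings(card_vectors))
```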
Operation
- The user enters a situation.
- The Cloud Function receives the input and passes the situation to Vertex AI.
- Vertex AI returns 10 keywords to the Cloud Function.
- The Cloud Function vectorizes the keywords and computes their cosine similarity against the 543 word-card vectors.
- For each keyword, the three word cards with the highest cosine similarity are added to the list.
- The Cloud Function returns a total of 30 word cards.
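The similarity-and-selection steps above can be sketched with plain NumPy (the function name and the assumption of pre-computed vectors are ours; the real logic runs inside the Cloud Function):

```python
import numpy as np

def recommend_cards(keyword_vecs, card_vecs, top_k=3):
    """Return, per keyword, the indices of the top_k most cosine-similar word cards."""
    kw = keyword_vecs / np.linalg.norm(keyword_vecs, axis=1, keepdims=True)
    cards = card_vecs / np.linalg.norm(card_vecs, axis=1, keepdims=True)
    sims = kw @ cards.T                         # (num_keywords, num_cards) cosine matrix
    top = np.argsort(-sims, axis=1)[:, :top_k]  # top_k card indices per keyword row
    return top.ravel().tolist()
```

With 10 keyword vectors and `top_k=3`, this yields the 30 recommended cards described above.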
The codebase was developed with Python 3.10.12. After creating an environment, install the requirements as follows:

```
pip install -r requirements.txt
```
We used the Hugging Face pre-trained model kykim/bert-kor-base. Loading the pre-trained model is as easy as running the following code:
```python
from transformers import BertTokenizer, TFBertModel

model_id = 'kykim/bert-kor-base'
tokenizer = BertTokenizer.from_pretrained(model_id)
model = TFBertModel.from_pretrained(model_id)
```
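`TFBertModel` returns token-level hidden states, so an SBERT-style sentence vector is typically obtained by mean-pooling over the non-padding tokens. A minimal NumPy sketch of that pooling step (the function is illustrative, not taken from the repo):

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings while ignoring padding positions."""
    mask = attention_mask[..., None].astype(np.float32)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(axis=1)      # sum of real-token vectors
    counts = np.clip(mask.sum(axis=1), 1e-9, None)       # number of real tokens
    return summed / counts
```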
This graph shows the embedding vectors of the word cards.
Click on the image to zoom in.
Word cards with similar meanings cluster together.
The text-bison (PaLM 2) model was used for keyword extraction.
The kykim/bert-kor-base model was used for sentence-similarity analysis.
The word cards were provided by KAAC with copyright permission; 2,000 word cards were used.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Google Cloud Function Overview
SayBetter-TeamDoc:
(1) https://github.com/Say-Better/Team-Docs
SayBetter-Server:
(1) https://github.com/Say-Better/Service-Server
SayBetter-Front: