This repository contains a simple Streamlit application for embedding and analyzing user thoughts using PostgreSQL and Principal Component Analysis (PCA).
- Create a Google Cloud Account
- Create an AlloyDB Cluster
- Create an AlloyDB instance
Enable the following APIs:
- AlloyDB API
- Compute Engine API
- Cloud Resource Manager API
- Service Networking API
CREATE TABLE "public".thought_embedding ( thought TEXT, thought_id SERIAL PRIMARY KEY, entry_date TIMESTAMP );
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS google_ml_integration;
ALTER TABLE thought_embedding ADD COLUMN embedding vector GENERATED ALWAYS AS (embedding('textembedding-gecko@001',thought)) STORED;
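The statements above depend on their order: the `vector` type and the `embedding()` function must exist before the generated `embedding` column is added. A minimal sketch of running them in sequence from Python follows. It assumes a hypothetical `execute` callable that runs a single SQL statement (for example, a thin wrapper around a SQLAlchemy connection); the helper name `run_setup` is illustrative and not part of this repository.

```python
from typing import Callable, List

# SQL taken from the setup steps above, in the order it should run.
SETUP_STATEMENTS: List[str] = [
    'CREATE TABLE "public".thought_embedding ('
    " thought TEXT, thought_id SERIAL PRIMARY KEY, entry_date TIMESTAMP );",
    "CREATE EXTENSION IF NOT EXISTS vector;",
    "CREATE EXTENSION IF NOT EXISTS google_ml_integration;",
    "ALTER TABLE thought_embedding ADD COLUMN embedding vector"
    " GENERATED ALWAYS AS (embedding('textembedding-gecko@001',thought)) STORED;",
]


def run_setup(execute: Callable[[str], None]) -> int:
    """Run each setup statement once, in order, and return how many were run.

    `execute` is a hypothetical callable supplied by the caller.
    """
    for statement in SETUP_STATEMENTS:
        execute(statement)
    return len(SETUP_STATEMENTS)
```

Calling `run_setup(print)` gives a dry run that only prints the statements, which can be handy for checking them before touching the database.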
- getconn(): Establishes a connection to PostgreSQL through Google Cloud AlloyDB.
- inserttodb(thought: str): Inserts a user thought into the PostgreSQL database.
- getembedding(): Retrieves all stored thought embeddings from the database.
- similar_thoughts(thought): Finds the top 3 similar thoughts based on vector similarity.
- User Interface: Collects and records thoughts, then displays similar entries and a PCA plot.
- compute_pca(): Performs PCA on the embeddings, returning principal components for visualization.
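
To illustrate the ranking that `similar_thoughts(thought)` describes, here is a minimal pure-Python sketch that picks the three stored vectors closest to a query by cosine similarity. It is not this repository's implementation, which may well run the comparison inside the database. The names `cosine_similarity` and `top_similar` and the toy 3-dimensional vectors are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_similar(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[str]:
    """Return the k thoughts whose vectors are most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda thought: cosine_similarity(query, stored[thought]),
        reverse=True,
    )
    return ranked[:k]


# Toy stand-ins for stored embeddings (real embeddings have far more dimensions).
stored = {
    "walk in the park": [0.9, 0.1, 0.0],
    "morning jog": [0.8, 0.2, 0.1],
    "tax deadline": [0.0, 0.1, 0.9],
    "hiking trip": [0.7, 0.3, 0.0],
}
closest = top_similar([1.0, 0.0, 0.0], stored)
```

Here `closest` holds the three outdoor-activity thoughts, and "tax deadline" is left out because its vector points in a different direction from the query.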
- Python 3.8+
- PostgreSQL
- Google Cloud AlloyDB
- Required Python packages:
pip install sqlalchemy streamlit pandas matplotlib scikit-learn pg8000
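
With these packages installed, the PCA step described for `compute_pca()` can be sketched with scikit-learn. This is an assumption about how such a function might look, not the repository's code: the signature, the choice of two components, and the random stand-in data are illustrative. NumPy is used directly here and is installed as a dependency of scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA


def compute_pca(embeddings, n_components: int = 2) -> np.ndarray:
    """Project embeddings of shape (n_samples, n_features) onto principal components."""
    matrix = np.asarray(embeddings, dtype=float)
    pca = PCA(n_components=n_components)
    # fit_transform centers the data and returns the projected coordinates.
    return pca.fit_transform(matrix)


# Stand-in data: 10 fake "embeddings" of dimension 8 (real ones are much larger).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(10, 8))
components = compute_pca(fake_embeddings)
```

The two columns of `components` can then be drawn as a scatter plot, for example with matplotlib's `scatter(components[:, 0], components[:, 1])`.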