<a href="https://colab.research.google.com/github/Ajeeetsingh/financial-recommendation-system-/blob/main/recommendation_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vector DB Recommendation System
## Use Case: Finance-Focused Investment Recommendations
- **Objective**: Recommend financial products (e.g., mutual funds, stocks, bonds) based on user preferences (e.g., "low-risk investments").
- **Input**: User text query (e.g., "I want low-risk mutual funds") and optional filters (e.g., risk level, return range).
- **Output**: Top-5 financial products ranked by semantic similarity to the query.
- **Dataset**: Synthetic dataset of 500 financial products (ID, Name, Description, Risk, Return, Expense Ratio).
- **Vector DB**: Chroma (local, open-source).
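
Ranking by semantic similarity comes down to comparing the query's embedding vector with each product's embedding. As a rough illustration only, the sketch below ranks three made-up 3-dimensional vectors by cosine similarity. The vectors, the product names, and the `top_k` helper are hypothetical; the actual pipeline delegates this work to sentence-transformers and Chroma.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=5):
    """Return the k (name, score) pairs most similar to query_vec."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings (hypothetical values, not produced by any model)
toy_products = {
    "Low-risk bond fund": [0.9, 0.1, 0.0],
    "High-risk tech stock": [0.0, 0.2, 0.9],
    "Balanced ETF": [0.5, 0.5, 0.5],
}
query_vector = [1.0, 0.0, 0.0]  # stands in for an encoded "low-risk" query

ranking = top_k(query_vector, toy_products, k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Real embeddings from `all-MiniLM-L6-v2` have 384 dimensions, but the ranking step works the same way.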

## Vector DB Choice
- **Selected**: Chroma (local, free, open-source).
- **Rationale**: Easy to set up, runs locally in Colab, and needs no API key. Pinecone (cloud) was considered but set aside to keep a one-week project simple.

## Create a Synthetic Dataset

In [1]:
import pandas as pd
import random

# Define possible values
product_types = ["Mutual Fund", "Stock", "Bond", "ETF"]
risk_levels = ["Low", "Medium", "High"]
sectors = ["Technology", "Healthcare", "Government", "Energy", "Consumer Goods", "Sustainable", "Finance", "Real Estate"]
returns = [round(random.uniform(2.0, 15.0), 1) for _ in range(500)]  # Random returns 2–15%
expense_ratios = [round(random.uniform(0.1, 2.0), 2) for _ in range(500)]  # Random expense ratios 0.1–2%

# Generate synthetic data
data = []
for i in range(500):
    product_type = random.choice(product_types)
    sector = random.choice(sectors)
    risk = random.choice(risk_levels)
    ret = returns[i]
    exp_ratio = expense_ratios[i]
    name = f"{sector} {product_type} {i+1}"
    description = f"A {risk.lower()}-risk {product_type.lower()} focusing on {sector.lower()} with expected returns of {ret}% and expense ratio of {exp_ratio}%."
    data.append([i+1, name, description, risk, ret, exp_ratio])

# Create DataFrame
df = pd.DataFrame(data, columns=["ID", "Name", "Description", "Risk", "Return", "Expense Ratio"])

# Save to CSV
df.to_csv("financial_products.csv", index=False)
print("Dataset created: financial_products.csv")
print(df.head())

Dataset created: financial_products.csv
   ID                       Name  \
0   1          Healthcare Bond 1   
1   2          Technology Bond 2   
2   3   Technology Mutual Fund 3   
3   4  Real Estate Mutual Fund 4   
4   5      Finance Mutual Fund 5   

                                         Description    Risk  Return  \
0  A medium-risk bond focusing on healthcare with...  Medium    12.8   
1  A medium-risk bond focusing on technology with...  Medium     5.6   
2  A high-risk mutual fund focusing on technology...    High    12.6   
3  A low-risk mutual fund focusing on real estate...     Low     3.1   
4  A medium-risk mutual fund focusing on finance ...  Medium    13.8   

   Expense Ratio  
0           1.02  
1           1.70  
2           0.75  
3           0.10  
4           0.17  


Set Up Colab Environment

In [2]:
!pip install sentence-transformers chromadb streamlit pyngrok pandas

Collecting chromadb
  Downloading chromadb-1.0.10-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.9 kB)
Collecting streamlit
  Downloading streamlit-1.45.1-py3-none-any.whl.metadata (8.9 kB)
Collecting pyngrok
  Downloading pyngrok-7.2.8-py3-none-any.whl.metadata (10 kB)
Collecting fastapi==0.115.9 (from chromadb)
  Downloading fastapi-0.115.9-py3-none-any.whl.metadata (27 kB)
Collecting uvicorn>=0.18.3 (from uvicorn[standard]>=0.18.3->chromadb)
  Downloading uvicorn-0.34.2-py3-none-any.whl.metadata (6.5 kB)
Collecting posthog>=2.4.0 (from chromadb)
  Downloading posthog-4.2.0-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting onnxruntime>=1.14.1 (from chromadb)
  Downloading onnxruntime-1.22.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Collecting opentelemetry-api>=1.2.0 (from chromadb)
  Downloading opentelemetry_api-1.33.1-py3-none-any.whl.metadata (1.6 kB)
Collecting opentelemetry-exporter-otlp-proto-grpc>=1.2.0 (from chromadb)

Test Imports

In [3]:
import sentence_transformers
import chromadb
import streamlit
import pyngrok
import pandas as pd

print("All libraries imported successfully!")

All libraries imported successfully!


Load the Dataset

In [4]:
import pandas as pd

# Load dataset
df = pd.read_csv("financial_products.csv")

# Verify data
print("Dataset Info:")
print(df.info())
print("\nFirst 5 rows:")
print(df.head())
print("\nMissing values:")
print(df.isnull().sum())

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ID             500 non-null    int64  
 1   Name           500 non-null    object 
 2   Description    500 non-null    object 
 3   Risk           500 non-null    object 
 4   Return         500 non-null    float64
 5   Expense Ratio  500 non-null    float64
dtypes: float64(2), int64(1), object(3)
memory usage: 23.6+ KB
None

First 5 rows:
   ID                       Name  \
0   1          Healthcare Bond 1   
1   2          Technology Bond 2   
2   3   Technology Mutual Fund 3   
3   4  Real Estate Mutual Fund 4   
4   5      Finance Mutual Fund 5   

                                         Description    Risk  Return  \
0  A medium-risk bond focusing on healthcare with...  Medium    12.8   
1  A medium-risk bond focusing on technology with...  Medium     5.6   
2  A high-risk

Generate Embeddings

In [5]:
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings for descriptions
descriptions = df['Description'].tolist()
embeddings = model.encode(descriptions, show_progress_bar=True)

# Verify embeddings
print(f"Generated {len(embeddings)} embeddings with shape: {embeddings.shape}")

Generated 500 embeddings with shape: (500, 384)


Store Embeddings in Chroma

In [6]:
import chromadb

# Initialize Chroma client with persistence
client = chromadb.PersistentClient(path="./chroma_db")

# Create or get collection
collection_name = "financial_products"
try:
    collection = client.get_collection(collection_name)
    print(f"Using existing collection: {collection_name}")
except Exception:  # get_collection raises when the collection does not exist yet
    collection = client.create_collection(collection_name, metadata={"hnsw:space": "cosine"})
    print(f"Created new collection: {collection_name}")

# Prepare data for Chroma
documents = df['Description'].tolist()
ids = df['ID'].astype(str).tolist()  # Chroma requires string IDs
metadatas = df[['Name', 'Risk', 'Return', 'Expense Ratio']].to_dict(orient='records')

# Upsert embeddings so that re-running this cell does not collide with existing IDs
collection.upsert(
    documents=documents,
    embeddings=embeddings.tolist(),  # Convert numpy array to list
    metadatas=metadatas,
    ids=ids
)

# Verify collection
print(f"Stored {collection.count()} items in Chroma collection")

Created new collection: financial_products
Stored 500 items in Chroma collection


Test Basic Queries

In [7]:
# Sample query
query = "low-risk mutual fund"
query_embedding = model.encode([query])[0]  # Generate embedding for query

# Query Chroma
results = collection.query(
    query_embeddings=[query_embedding.tolist()],
    n_results=5
)

# Display results
print(f"\nTop 5 recommendations for query: '{query}'")
for i, (doc, metadata, distance) in enumerate(zip(results['documents'][0], results['metadatas'][0], results['distances'][0])):
    print(f"{i+1}. {metadata['Name']} (Risk: {metadata['Risk']}, Return: {metadata['Return']}%, Expense Ratio: {metadata['Expense Ratio']}%)")
    print(f"   Description: {doc}")
    print(f"   Similarity Score: {1 - distance:.4f}\n")


Top 5 recommendations for query: 'low-risk mutual fund'
1. Finance Mutual Fund 67 (Risk: Low, Return: 3.0%, Expense Ratio: 0.78%)
   Description: A low-risk mutual fund focusing on finance with expected returns of 3.0% and expense ratio of 0.78%.
   Similarity Score: 0.8363

2. Finance Mutual Fund 394 (Risk: Low, Return: 8.5%, Expense Ratio: 1.3%)
   Description: A low-risk mutual fund focusing on finance with expected returns of 8.5% and expense ratio of 1.3%.
   Similarity Score: 0.8351

3. Finance Mutual Fund 482 (Risk: Low, Return: 5.3%, Expense Ratio: 1.36%)
   Description: A low-risk mutual fund focusing on finance with expected returns of 5.3% and expense ratio of 1.36%.
   Similarity Score: 0.8313

4. Finance Mutual Fund 428 (Risk: Low, Return: 6.2%, Expense Ratio: 1.19%)
   Description: A low-risk mutual fund focusing on finance with expected returns of 6.2% and expense ratio of 1.19%.
   Similarity Score: 0.8283

5. Sustainable Mutual Fund 82 (Risk: Low, Return: 3.8%, Expens

Prerequisites

In [8]:
import pandas as pd
from sentence_transformers import SentenceTransformer
import chromadb

# Load dataset
df = pd.read_csv("financial_products.csv")

# Verify dataset
print("Dataset Info:")
print(df.info())
print("\nFirst 5 rows:")
print(df.head())

# Generate embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(df['Description'].tolist(), show_progress_bar=True)
print(f"Generated {len(embeddings)} embeddings with shape: {embeddings.shape}")

# Initialize Chroma client
client = chromadb.PersistentClient(path="./chroma_db")

# Check for existing collection or create new
collection_name = "financial_products"
try:
    collection = client.get_collection(collection_name)
    print(f"Using existing collection: {collection_name}")
except Exception:  # get_collection raises when the collection does not exist yet
    print(f"Creating new collection: {collection_name}")
    collection = client.create_collection(collection_name, metadata={"hnsw:space": "cosine"})
    # Store embeddings in Chroma
    collection.add(
        documents=df['Description'].tolist(),
        embeddings=embeddings.tolist(),
        metadatas=df[['Name', 'Risk', 'Return', 'Expense Ratio']].to_dict(orient='records'),
        ids=df['ID'].astype(str).tolist()
    )
    print(f"Stored {collection.count()} items in Chroma collection")
else:
    # Verify collection has data
    if collection.count() == 0:
        print(f"Collection is empty, adding embeddings...")
        collection.add(
            documents=df['Description'].tolist(),
            embeddings=embeddings.tolist(),
            metadatas=df[['Name', 'Risk', 'Return', 'Expense Ratio']].to_dict(orient='records'),
            ids=df['ID'].astype(str).tolist()
        )
        print(f"Stored {collection.count()} items in Chroma collection")
    else:
        print(f"Collection already has {collection.count()} items")

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ID             500 non-null    int64  
 1   Name           500 non-null    object 
 2   Description    500 non-null    object 
 3   Risk           500 non-null    object 
 4   Return         500 non-null    float64
 5   Expense Ratio  500 non-null    float64
dtypes: float64(2), int64(1), object(3)
memory usage: 23.6+ KB
None

First 5 rows:
   ID                       Name  \
0   1          Healthcare Bond 1   
1   2          Technology Bond 2   
2   3   Technology Mutual Fund 3   
3   4  Real Estate Mutual Fund 4   
4   5      Finance Mutual Fund 5   

                                         Description    Risk  Return  \
0  A medium-risk bond focusing on healthcare with...  Medium    12.8   
1  A medium-risk bond focusing on technology with...  Medium     5.6   
2  A high-risk

Generated 500 embeddings with shape: (500, 384)
Using existing collection: financial_products
Collection already has 500 items


Verify libraries are installed

In [9]:
!pip install sentence-transformers chromadb pandas



Ensure Prerequisites Are Ready

In [10]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [14]:
!ls /content/

chroma_db  drive  financial_products.csv  sample_data


In [15]:
!mkdir -p /content/drive/MyDrive/recommendation_system
!cp /content/financial_products.csv /content/drive/MyDrive/recommendation_system/financial_products.csv
!ls /content/drive/MyDrive/recommendation_system

financial_products.csv


In [11]:
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/recommendation_system/financial_products.csv')
print("Dataset Info:")
print(df.info())
print("\nFirst 5 rows:")
print(df.head())

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ID             500 non-null    int64  
 1   Name           500 non-null    object 
 2   Description    500 non-null    object 
 3   Risk           500 non-null    object 
 4   Return         500 non-null    float64
 5   Expense Ratio  500 non-null    float64
dtypes: float64(2), int64(1), object(3)
memory usage: 23.6+ KB
None

First 5 rows:
   ID                      Name  \
0   1            Finance Bond 1   
1   2  Government Mutual Fund 2   
2   3      Energy Mutual Fund 3   
3   4  Technology Mutual Fund 4   
4   5         Government Bond 5   

                                         Description    Risk  Return  \
0  A medium-risk bond focusing on finance with ex...  Medium     4.3   
1  A medium-risk mutual fund focusing on governme...  Medium     3.9   
2  A high-risk mutua

In [17]:
!cp -r /content/chroma_db /content/drive/MyDrive/recommendation_system/chroma_db
!ls /content/drive/MyDrive/recommendation_system

chroma_db  financial_products.csv


# Day 3: Core Recommendation Logic
## Task 1: Query Processing
Convert user queries to embeddings using sentence-transformers.

## Task 2: Query Vector DB
Retrieve top-10 similar products from Chroma.

## Task 3: Recommendation Logic
Filter the retrieved products by risk, minimum return, and maximum expense ratio, then rank the remainder by similarity score.

## Task 4: Test Queries
Validate with sample queries and filters.
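
Filtering only after retrieval means strict filters can leave fewer than five results, because only the top-10 neighbours are ever examined. One possible alternative, sketched here under assumptions, is to translate the filters into a Chroma `where` clause so the database applies them during the search. The helper name `build_where` is hypothetical; the metadata keys (`Risk`, `Return`, `Expense Ratio`) are the ones stored with each product in this notebook, and `$eq`, `$gte`, `$lte`, and `$and` are Chroma's metadata filter operators.

```python
def build_where(filters):
    """Translate the notebook's filter dict into a Chroma `where` clause.

    Returns None when no filter is set, so the caller can skip filtering.
    """
    conditions = []
    if filters.get("risk"):
        conditions.append({"Risk": {"$eq": filters["risk"]}})
    if filters.get("min_return") is not None:
        conditions.append({"Return": {"$gte": filters["min_return"]}})
    if filters.get("max_expense_ratio") is not None:
        conditions.append({"Expense Ratio": {"$lte": filters["max_expense_ratio"]}})

    if not conditions:
        return None
    if len(conditions) == 1:
        return conditions[0]       # a single condition needs no $and wrapper
    return {"$and": conditions}    # $and combines two or more conditions


where = build_where({"risk": "Low", "min_return": 5.0})
print(where)

# Assumed usage with the notebook's collection (not executed here):
# results = collection.query(
#     query_embeddings=[query_embedding.tolist()],
#     n_results=5,
#     where=where,
# )
```

With the filter applied inside the query, `n_results` counts only matching products, so the post-filter loop would no longer shrink the result list.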

In [12]:
from sentence_transformers import SentenceTransformer
import chromadb

# Initialize model and Chroma client
model = SentenceTransformer('all-MiniLM-L6-v2')
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("financial_products")
print(f"Collection has {collection.count()} items")

# Query processing
def process_query(query):
    if not query or not isinstance(query, str):
        raise ValueError("Query must be a non-empty string")
    return model.encode([query])[0]

# Query Vector DB
def query_vector_db(query_embedding, k=10):
    results = collection.query(
        query_embeddings=[query_embedding.tolist()],
        n_results=k
    )
    return results['documents'][0], results['metadatas'][0], results['distances'][0]

# Recommendation logic: post-filter the retrieved candidates, then rank by similarity
def recommend_products(query, k=5, filters=None):
    if filters is None:
        filters = {}
    try:
        query_embedding = process_query(query)
    except ValueError as e:
        return f"Error: {str(e)}"
    # Fetch at least 10 candidates so the post-filters below have enough to choose from
    docs, metadatas, distances = query_vector_db(query_embedding, k=max(10, k))
    recommendations = []
    for doc, meta, distance in zip(docs, metadatas, distances):
        # Safely check filters using .get()
        if filters.get('risk') and meta['Risk'] != filters['risk']:
            continue
        if filters.get('min_return') and float(meta['Return']) < filters['min_return']:
            continue
        if filters.get('max_expense_ratio') and float(meta['Expense Ratio']) > filters['max_expense_ratio']:
            continue
        recommendations.append({
            'Name': meta['Name'],
            'Description': doc,
            'Risk': meta['Risk'],
            'Return': float(meta['Return']),
            'Expense Ratio': float(meta['Expense Ratio']),
            'Similarity Score': 1 - distance
        })
    return sorted(recommendations, key=lambda x: x['Similarity Score'], reverse=True)[:k]

# Test queries
test_queries = [
    {"query": "low-risk mutual fund", "filters": {"risk": "Low"}},
    {"query": "high-return stocks", "filters": {"min_return": 10.0}},
    {"query": "sustainable ETF", "filters": {"max_expense_ratio": 1.0}}
]
for test in test_queries:
    print(f"\nRecommendations for query: '{test['query']}' with filters: {test['filters']}")
    recommendations = recommend_products(test['query'], k=5, filters=test['filters'])
    if isinstance(recommendations, str):
        print(recommendations)
        continue
    for i, rec in enumerate(recommendations):
        print(f"{i+1}. {rec['Name']} (Risk: {rec['Risk']}, Return: {rec['Return']}%, Expense Ratio: {rec['Expense Ratio']}%)")
        print(f"   Description: {rec['Description']}")
        print(f"   Similarity Score: {rec['Similarity Score']:.4f}\n")

Collection has 500 items

Recommendations for query: 'low-risk mutual fund' with filters: {'risk': 'Low'}
1. Finance Mutual Fund 67 (Risk: Low, Return: 3.0%, Expense Ratio: 0.78%)
   Description: A low-risk mutual fund focusing on finance with expected returns of 3.0% and expense ratio of 0.78%.
   Similarity Score: 0.8363

2. Finance Mutual Fund 394 (Risk: Low, Return: 8.5%, Expense Ratio: 1.3%)
   Description: A low-risk mutual fund focusing on finance with expected returns of 8.5% and expense ratio of 1.3%.
   Similarity Score: 0.8351

3. Finance Mutual Fund 482 (Risk: Low, Return: 5.3%, Expense Ratio: 1.36%)
   Description: A low-risk mutual fund focusing on finance with expected returns of 5.3% and expense ratio of 1.36%.
   Similarity Score: 0.8313

4. Finance Mutual Fund 428 (Risk: Low, Return: 6.2%, Expense Ratio: 1.19%)
   Description: A low-risk mutual fund focusing on finance with expected returns of 6.2% and expense ratio of 1.19%.
   Similarity Score: 0.8283

5. Sustainabl

In [13]:
!ls /content/drive/MyDrive/recommendation_system

app.py	chroma_db  financial_products.csv


In [14]:
recommendations = recommend_products("low-risk sustainable funds", k=5, filters={"risk": "Low"})
for i, rec in enumerate(recommendations):
    print(f"{i+1}. {rec['Name']} (Risk: {rec['Risk']}, Return: {rec['Return']}%, Expense Ratio: {rec['Expense Ratio']}%)")
    print(f"   Description: {rec['Description']}")
    print(f"   Similarity Score: {rec['Similarity Score']:.4f}\n")

1. Sustainable Mutual Fund 82 (Risk: Low, Return: 3.8%, Expense Ratio: 1.03%)
   Description: A low-risk mutual fund focusing on sustainable with expected returns of 3.8% and expense ratio of 1.03%.
   Similarity Score: 0.7716

2. Sustainable Mutual Fund 193 (Risk: Low, Return: 3.3%, Expense Ratio: 1.22%)
   Description: A low-risk mutual fund focusing on sustainable with expected returns of 3.3% and expense ratio of 1.22%.
   Similarity Score: 0.7669

3. Sustainable Mutual Fund 485 (Risk: Low, Return: 12.9%, Expense Ratio: 1.75%)
   Description: A low-risk mutual fund focusing on sustainable with expected returns of 12.9% and expense ratio of 1.75%.
   Similarity Score: 0.7576

4. Sustainable Mutual Fund 7 (Risk: Low, Return: 7.2%, Expense Ratio: 0.91%)
   Description: A low-risk mutual fund focusing on sustainable with expected returns of 7.2% and expense ratio of 0.91%.
   Similarity Score: 0.7551

5. Sustainable Mutual Fund 487 (Risk: Low, Return: 13.0%, Expense Ratio: 0.84%)
   De

In [15]:
!pip install streamlit pyngrok



In [16]:
# Install required libraries
!pip install streamlit pyngrok sentence-transformers chromadb pandas



Streamlit app

# Day 4: Streamlit UI
## Task 1: Build Streamlit App
Created `app.py` with query input, risk dropdown, and sliders for min return/max expense ratio.

## Task 2: Integrate Recommendation Logic
Integrated `recommend_products` to process queries and filters.

## Task 3: Run with ngrok
Launched the app through an `ngrok` tunnel; the public URL is printed when the launch cell runs.

## Task 4: Test UI
- Tested queries: "low-risk mutual fund", "high-return stocks", "sustainable ETF", "low-risk sustainable funds".
- Verified that the results match the Day 3 outputs.
- Tested edge cases: empty query, strict filters.
- Screenshots saved in `/content/drive/MyDrive/recommendation_system/screenshots`.
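
The edge cases listed under Task 4 depend on how widget values turn into the `filters` dictionary. The sketch below isolates that step in a hypothetical helper, `build_filters`, so it can be checked without launching Streamlit. The sentinel values ("Any", 0.0, and 2.0) are assumed to mirror the ones `app.py` treats as "no filter".

```python
def build_filters(risk="Any", min_return=0.0, max_expense_ratio=2.0):
    """Map raw widget values to the filters dict; sentinel values mean 'no filter'."""
    filters = {}
    if risk != "Any":
        filters["risk"] = risk
    if min_return > 0.0:
        filters["min_return"] = min_return
    if max_expense_ratio < 2.0:
        filters["max_expense_ratio"] = max_expense_ratio
    return filters


# Edge cases: nothing selected, and every filter set to a strict value
no_filters = build_filters()
strict = build_filters(risk="Low", min_return=14.5, max_expense_ratio=0.15)
print(no_filters)
print(strict)
```

A strict combination like the second one may match few or no products among the retrieved candidates, which is the situation the UI should handle gracefully.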

In [None]:
# Save Streamlit app to a file
with open("app.py", "w") as f:
    f.write('''
import streamlit as st
from sentence_transformers import SentenceTransformer
import chromadb

# Initialize model and Chroma client
@st.cache_resource
def initialize_model_and_db():
    model = SentenceTransformer('all-MiniLM-L6-v2')
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_collection("financial_products")
    return model, collection

model, collection = initialize_model_and_db()

# Recommendation functions
def process_query(query):
    if not query or not isinstance(query, str):
        return None, "Error: Query must be a non-empty string."
    try:
        query_embedding = model.encode([query])[0]
        return query_embedding, None
    except Exception as e:
        return None, f"Error encoding query: {str(e)}"

def query_vector_db(query_embedding, k=10):
    try:
        results = collection.query(
            query_embeddings=[query_embedding.tolist()],
            n_results=k
        )
        return results['documents'][0], results['metadatas'][0], results['distances'][0], None
    except Exception as e:
        return None, None, None, f"Error querying database: {str(e)}"

def recommend_products(query, k=5, filters=None):
    if filters is None:
        filters = {}
    query_embedding, error = process_query(query)
    if error:
        return error
    docs, metadatas, distances, error = query_vector_db(query_embedding, k=max(10, k))
    if error:
        return error
    recommendations = []
    for doc, meta, distance in zip(docs, metadatas, distances):
        if filters.get('risk') and meta['Risk'] != filters['risk']:
            continue
        if filters.get('min_return') and float(meta['Return']) < filters['min_return']:
            continue
        if filters.get('max_expense_ratio') and float(meta['Expense Ratio']) > filters['max_expense_ratio']:
            continue
        recommendations.append({
            'Name': meta['Name'],
            'Description': doc,
            'Risk': meta['Risk'],
            'Return': float(meta['Return']),
            'Expense Ratio': float(meta['Expense Ratio']),
            'Similarity Score': 1 - distance
        })
    return sorted(recommendations, key=lambda x: x['Similarity Score'], reverse=True)[:k]

# Streamlit UI
def main():
    st.title("Financial Product Recommender")
    st.write("Enter your investment preferences and customize filters to get personalized financial product recommendations.")

    # Input fields
    prompt = st.text_input("Investment Preference (e.g., 'low-risk mutual fund')", value="low-risk mutual fund")
    risk = st.selectbox("Risk Level", ["Any", "Low", "Medium", "High"], index=1)
    min_return = st.slider("Minimum Return (%)", min_value=0.0, max_value=15.0, value=0.0, step=0.1)
    max_expense_ratio = st.slider("Maximum Expense Ratio (%)", min_value=0.0, max_value=2.0, value=2.0, step=0.01)

    # Process filters
    filters = {}
    if risk != "Any":
        filters['risk'] = risk
    if min_return > 0.0:
        filters['min_return'] = min_return
    if max_expense_ratio < 2.0:
        filters['max_expense_ratio'] = max_expense_ratio

    # Get recommendations
    if st.button("Get Recommendations"):
        if not prompt:
            st.error("Please enter an investment preference.")
            return
        with st.spinner("Generating recommendations..."):
            recommendations = recommend_products(prompt, k=5, filters=filters)
            if isinstance(recommendations, str):
                st.error(recommendations)
            else:
                st.subheader("Recommendations")
                for i, rec in enumerate(recommendations):
                    st.write(f"**{i+1}. {rec['Name']}**")
                    st.write(f"- Risk: {rec['Risk']}")
                    st.write(f"- Return: {rec['Return']}%")
                    st.write(f"- Expense Ratio: {rec['Expense Ratio']}%")
                    st.write(f"- Description: {rec['Description']}")
                    st.write(f"- Similarity Score: {rec['Similarity Score']:.4f}")
                    st.write("---")

if __name__ == "__main__":
    main()
''')

# Run Streamlit and ngrok with clean shutdown
import subprocess
import signal
import os
from pyngrok import ngrok

# Set the ngrok authtoken. Never commit a real token; substitute your own here.
!ngrok authtoken YOUR_NGROK_AUTHTOKEN

# Ensure files are in place
!cp /content/drive/MyDrive/recommendation_system/financial_products.csv /content/
!cp -r /content/drive/MyDrive/recommendation_system/chroma_db /content/chroma_db

# Start ngrok tunnel
public_url = ngrok.connect(8501)
print(f"Streamlit app running at: {public_url}")

# Start Streamlit server
streamlit_cmd = ["streamlit", "run", "app.py", "--server.port", "8501", "--server.fileWatcherType", "none"]
streamlit_proc = subprocess.Popen(streamlit_cmd)

# Handle shutdown
def signal_handler(sig, frame):
    print("Shutting down Streamlit and ngrok...")
    streamlit_proc.terminate()
    ngrok.kill()
    print("Shutdown complete.")
    os._exit(0)

signal.signal(signal.SIGINT, signal_handler)

# Keep the cell running
try:
    streamlit_proc.wait()
except KeyboardInterrupt:
    signal_handler(None, None)

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
Streamlit app running at: NgrokTunnel: "https://d8a5-34-16-240-71.ngrok-free.app" -> "http://localhost:8501"


In [1]:
!cp /content/app.py /content/drive/MyDrive/recommendation_system/app.py