<a href="https://colab.research.google.com/github/Simeon-Dhinakaran/GenAI/blob/main/vector-databases/recommendation_of_articles_using_chromadb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install chromadb sentence-transformers



In [None]:
import chromadb
from sentence_transformers import SentenceTransformer


In [None]:
# Step 1: Initialize an in-memory ChromaDB client (data is lost when the session ends)
client = chromadb.Client()

# Step 2: Get the collection, creating it on first use.
# Cosine distance makes `1 - distance` a cosine similarity in [-1, 1].
collection_name = "news_articles"
collection = client.get_or_create_collection(
    name=collection_name,
    metadata={"hnsw:space": "cosine"},
)

# Step 3: Load the pre-trained embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
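
The collection set up above stores each document next to its embedding vector and answers nearest-neighbor queries. As a rough mental model only, here is a toy in-memory stand-in written in plain Python. `ToyCollection` and `cosine_distance` are hypothetical names invented for this sketch; it does not reflect how ChromaDB is implemented, and the two-dimensional vectors are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class ToyCollection:
    """Hypothetical in-memory stand-in for a vector collection (illustration only)."""

    def __init__(self):
        self._items = {}  # id -> (document, embedding)

    def add(self, documents, embeddings, ids):
        for doc, emb, item_id in zip(documents, embeddings, ids):
            # Toy design choice: refuse to overwrite an existing id.
            if item_id in self._items:
                raise ValueError(f"duplicate id: {item_id}")
            self._items[item_id] = (doc, emb)

    def query(self, query_embedding, n_results):
        # Brute-force scan: score every stored vector, then keep the closest ones.
        scored = [
            (cosine_distance(query_embedding, emb), doc)
            for doc, emb in self._items.values()
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:n_results]


toy = ToyCollection()
toy.add(
    documents=["rocket launch", "stock rally"],
    embeddings=[[1.0, 0.0], [0.0, 1.0]],  # made-up 2-D "embeddings"
    ids=["article_0", "article_1"],
)
nearest = toy.query([0.9, 0.1], n_results=1)
print(nearest[0][1])
```

A real vector database replaces the brute-force scan with an approximate index, but the add-then-query shape is the same one the notebook relies on.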

In [None]:
def add_articles():
    """Prompt for news articles and add them to the collection."""
    print("Enter news articles (type 'done' to finish):")
    articles = []
    while True:
        article = input("> ").strip()
        if article.lower() == "done":
            break
        if article:  # skip blank lines
            articles.append(article)

    if articles:
        embeddings = model.encode(articles).tolist()
        # Offset IDs by the current collection size so a second call
        # does not reuse article_0, article_1, ...
        start = collection.count()
        ids = [f"article_{start + i}" for i in range(len(articles))]
        collection.add(documents=articles, embeddings=embeddings, ids=ids)
        print(f"{len(articles)} articles added to the collection.")
    else:
        print("No articles were added.")

def recommend_articles():
    """Recommend stored articles that best match a described interest."""
    if collection.count() == 0:
        print("No articles in the collection yet. Add some first.")
        return

    preference = input("Describe the type of news you're interested in: ")
    preference_embedding = model.encode([preference]).tolist()[0]

    # Retrieve up to 5 of the closest articles (fewer if the collection is smaller)
    results = collection.query(
        query_embeddings=[preference_embedding],
        n_results=min(5, collection.count()),
    )

    # Note: 1 - distance is a cosine similarity only if the collection uses cosine distance
    print("\nRecommended Articles:")
    for i, (doc, score) in enumerate(zip(results["documents"][0], results["distances"][0]), start=1):
        print(f"{i}. {doc} | Relevance Score: {1 - score:.4f}")
    print()
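
The score printed by `recommend_articles` is `1 - distance`, so its range depends on which distance the collection reports. The sketch below compares two common choices on toy vectors. It is plain Python with hypothetical helper names (`cosine_similarity`, `squared_l2`), and it only illustrates the arithmetic. Whether the negative scores in the recorded run come from a squared Euclidean distance is an assumption, not something the notebook states.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two unit-length toy vectors at a right angle (unrelated "topics")
u = [1.0, 0.0]
v = [0.0, 1.0]

cos_sim = cosine_similarity(u, v)
cos_dist = 1.0 - cos_sim
sq_l2 = squared_l2(u, v)

relevance_from_cosine = 1.0 - cos_dist  # equals cos_sim, stays within [-1, 1]
relevance_from_l2 = 1.0 - sq_l2         # drops below zero once sq_l2 exceeds 1

print(relevance_from_cosine, relevance_from_l2)
```

For unit-length vectors the two quantities are linked by `squared_l2 = 2 - 2 * cosine_similarity`, so a squared Euclidean distance can always be converted back to a similarity with `1 - distance / 2`.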


In [None]:

def main():
    """Main function for the news recommendation system."""
    print("Welcome to the Personalized News Recommendation System!")
    while True:
        print("\nOptions:")
        print("1. Add news articles")
        print("2. Get news recommendations")
        print("3. Exit")
        choice = input("Choose an option: ")

        if choice == "1":
            add_articles()
        elif choice == "2":
            recommend_articles()
        elif choice == "3":
            print("Goodbye!")
            break
        else:
            print("Invalid option. Please try again.")

# Run the application
if __name__ == "__main__":
    main()
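
Because `main` reads every choice through `input()`, it can only be exercised by hand. One possible variation, sketched here under the assumption that the menu keeps the same three options, passes the read and write functions in as parameters so a scripted session can drive the loop. `run_menu`, `actions`, `read`, and `write` are hypothetical names introduced for this example.

```python
def run_menu(actions, read=input, write=print):
    """Dispatch menu choices until the exit option ("3") is chosen.

    `actions` maps a choice string to a zero-argument callable.
    """
    while True:
        choice = read("Choose an option: ").strip()
        if choice == "3":
            write("Goodbye!")
            return
        action = actions.get(choice)
        if action is None:
            write("Invalid option. Please try again.")
        else:
            action()


# Scripted session: a valid choice, an invalid one, then exit
scripted = iter(["1", "x", "3"])
log = []
run_menu(
    actions={
        "1": lambda: log.append("add"),
        "2": lambda: log.append("recommend"),
    },
    read=lambda prompt: next(scripted),
    write=log.append,
)
print(log)
```

In the notebook itself, `actions` would map `"1"` to `add_articles` and `"2"` to `recommend_articles`, with the default `input` and `print` left in place.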


Welcome to the Personalized News Recommendation System!

Options:
1. Add news articles
2. Get news recommendations
3. Exit
Choose an option: 1
Enter news articles (type 'done' to finish):
> SpaceX launches a new batch of Starlink satellites.
> 1
> The stock market hits a record high after tech earnings.
> 1
> Scientists discover a new exoplanet in the habitable zone.
> 1
> Climate change report highlights the urgent need for action.
> 1
> The stock market hits a record high after tech earnings.
> done
9 articles added to the collection.

Options:
1. Add news articles
2. Get news recommendations
3. Exit
Choose an option: 2
Describe the type of news you're interested in: weather

Recommended Articles:
1. 1 | Relevance Score: -0.5884
2. 1 | Relevance Score: -0.5884
3. 1 | Relevance Score: -0.5884
4. 1 | Relevance Score: -0.5884
5. Climate change report highlights the urgent need for action. | Relevance Score: -0.6178


Options:
1. Add news articles
2. Get news recommendations
3. Exit
Choo

Welcome to the Personalized News Recommendation System!

Options:
1. Add news articles
2. Get news recommendations
3. Exit
Choose an option: 1
Enter news articles (type 'done' to finish):
> SpaceX launches a new batch of Starlink satellites.
> The stock market hits a record high after tech earnings.
> Scientists discover a new exoplanet in the habitable zone.
> Climate change report highlights the urgent need for action.
> done
4 articles added to the collection.

Options:
1. Add news articles
2. Get news recommendations
3. Exit
Choose an option: 2
Describe the type of news you're interested in: space exploration

Recommended Articles:
1. SpaceX launches a new batch of Starlink satellites. | Relevance Score: 0.9123
2. Scientists discover a new exoplanet in the habitable zone. | Relevance Score: 0.8567
3. Climate change report highlights the urgent need for action. | Relevance Score: 0.7034
4. The stock market hits a record high after tech earnings. | Relevance Score: 0.6542