Stay informed effortlessly with this intelligent news platform that transforms how you consume Australian news. By automatically collecting, categorizing, and prioritizing articles from multiple outlets, it cuts through the noise to deliver what matters. Ask natural questions about current events and get accurate, contextual responses powered by advanced RAG technology. No more endless scrolling or information overload - just clean, categorized news and a smart assistant to help you make sense of it all.
- Automated News Collection: Uses make.com to automatically gather and preprocess articles daily from multiple Australian news outlets
- Automated News Classification: Categorizes news articles into topics (sports, finance, politics, lifestyle, music)
- Content Clustering: Detects and groups similar news stories to reduce redundancy
- Highlights Extraction: Identifies the most important news stories based on priority scoring
- RAG-Powered Chatbot: Ask questions about current news using Retrieval-Augmented Generation
- Modern React Frontend: Clean, responsive UI built with React, TypeScript, and Tailwind CSS
- Flask Backend API: Robust Python backend for all AI processing and data management
- Docker Support: Easy deployment with Docker Compose
- Frontend: React, TypeScript, Tailwind CSS
- Backend: Flask, Python
- AI/ML: OpenAI API, LangChain, ChromaDB
- Automation: make.com workflows
- Deployment: Docker, Docker Compose
The platform processes news data through several stages:
- Data Collection: Automated gathering of Australian news articles daily using make.com, which:
  - Collects articles from major Australian news outlets
  - Performs initial preprocessing and data cleaning
  - Exports to Google Sheets, which is then converted to CSV format
- Classification: Articles are categorized into predefined topics
- Clustering: Similar content is grouped to reduce redundancy
- Highlight Extraction: Important stories are identified based on relevance and priority
- Indexing: Content is vectorized and stored for retrieval by the chatbot (see the sketch after this list)
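As an illustration of the indexing step, here is a minimal sketch of vectorizing articles into ChromaDB. It is not the project's actual `rag/vector_store.py`; it relies on ChromaDB's built-in default embeddings rather than the OpenAI/LangChain setup from the tech stack, and it assumes the CSV has `title` and `content` columns.

```python
# Minimal indexing sketch (not the actual rag/vector_store.py): store article
# text in a local ChromaDB collection so the chatbot can retrieve it later.
# Assumes "title" and "content" columns; uses ChromaDB's default embeddings.
import chromadb
import pandas as pd

def build_index(csv_path: str, persist_dir: str = "chroma_db"):
    articles = pd.read_csv(csv_path)
    client = chromadb.PersistentClient(path=persist_dir)
    collection = client.get_or_create_collection("news")
    collection.add(
        ids=[str(i) for i in articles.index],
        documents=(articles["title"].astype(str) + "\n" + articles["content"].astype(str)).tolist(),
        metadatas=[{"title": str(t)} for t in articles["title"]],
    )
    return collection

if __name__ == "__main__":
    news = build_index("datasets/Aggregated News Dataset - Sheet1.csv")
    # Retrieval half of RAG: fetch the most relevant articles for a question
    print(news.query(query_texts=["What's happening in politics today?"], n_results=3))
```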
The news aggregation workflow is automated using make.com (formerly Integromat), which:
- Schedules daily news collection from multiple Australian news sources
- Extracts article titles, content, publication dates, and source information
- Implements custom filters to remove duplicate articles and irrelevant content
- Normalizes data formats (dates, text encoding, etc.)
- Pushes clean data to Google Sheets automatically (a sketch for pulling the sheet down as CSV follows below)
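The Sheets-to-CSV step can also be reproduced manually. Below is a minimal sketch using Google Sheets' standard CSV export URL; the sheet ID is a placeholder, and the spreadsheet must be shared via link for the request to succeed.

```python
# Download the make.com-populated Google Sheet as a CSV into datasets/.
# SHEET_ID is a placeholder; the spreadsheet must be readable via link.
import requests

SHEET_ID = "your_google_sheet_id_here"
URL = f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/export?format=csv"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
with open("datasets/Aggregated News Dataset - Sheet1.csv", "wb") as f:
    f.write(resp.content)
```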
The application performs several preprocessing steps on the collected news data (sketched in code after the list):
- Text Cleaning: Removes HTML tags, special characters, and formatting issues
- Date Standardization: Converts various date formats to a consistent ISO format
- Content Deduplication: Identifies and removes duplicate articles using similarity metrics
- Missing Data Handling: Implements strategies for handling missing fields in articles
- Content Truncation: Ensures article content is within appropriate length limits for processing
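A minimal pandas sketch of these steps, assuming `title`, `content`, and `published` columns and an arbitrary length limit (the app's real schema and thresholds may differ):

```python
# Illustrative preprocessing pass over the aggregated CSV; the column names
# ("title", "content", "published") and MAX_CHARS are assumptions.
import pandas as pd

MAX_CHARS = 4000  # example length limit for downstream processing

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Missing data handling: drop rows without content, default missing titles
    df = df.dropna(subset=["content"]).fillna({"title": "(untitled)"})

    # Text cleaning: strip HTML tags and collapse whitespace
    df["content"] = (df["content"].astype(str)
                       .str.replace(r"<[^>]+>", " ", regex=True)
                       .str.replace(r"\s+", " ", regex=True)
                       .str.strip())

    # Date standardization: coerce mixed formats to ISO dates
    df["published"] = pd.to_datetime(df["published"], errors="coerce").dt.strftime("%Y-%m-%d")

    # Deduplication: drop exact title repeats (the app also uses similarity metrics)
    df = df.drop_duplicates(subset=["title"])

    # Content truncation: keep articles within processing limits
    df["content"] = df["content"].str.slice(0, MAX_CHARS)
    return df

if __name__ == "__main__":
    cleaned = preprocess(pd.read_csv("datasets/Aggregated News Dataset - Sheet1.csv"))
    print(cleaned.head())
```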
After preprocessing, the application applies several data analysis techniques (an illustrative scoring example follows the list):
- NLP-based Classification: Uses natural language processing to categorize articles by topic
- Semantic Clustering: Groups similar articles based on content similarity
- Importance Scoring: Calculates priority scores based on recency, source credibility, and content relevance
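As a toy illustration of the scoring idea, the function below combines the three signals with made-up weights and example source credibility values; the app's actual inputs and weights will differ.

```python
# Toy priority score combining recency, source credibility, and relevance.
# The weights and the source list are illustrative, not the app's values.
from datetime import datetime, timedelta, timezone

SOURCE_WEIGHTS = {"abc.net.au": 1.0, "smh.com.au": 0.9}  # example credibility weights

def priority_score(published_iso: str, source: str, relevance: float) -> float:
    """relevance is assumed to be a 0-1 score from the classifier/clusterer."""
    published = datetime.fromisoformat(published_iso).replace(tzinfo=timezone.utc)
    age_days = (datetime.now(timezone.utc) - published).days
    recency = max(0.0, 1.0 - age_days / 7)            # linear decay over a week
    credibility = SOURCE_WEIGHTS.get(source, 0.5)     # unknown outlets get a neutral weight
    return 0.5 * recency + 0.2 * credibility + 0.3 * relevance

if __name__ == "__main__":
    yesterday = (datetime.now(timezone.utc) - timedelta(days=1)).date().isoformat()
    print(priority_score(yesterday, "abc.net.au", relevance=0.8))
```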
```
├── app.py                # Main Flask application
├── config.py             # Configuration settings
├── datasets/             # News datasets (CSV files)
├── rag/                  # RAG implementation modules
│   ├── categorizer.py    # News classification
│   ├── clustering.py     # Content clustering
│   ├── highlights.py     # Highlights extraction
│   ├── utils.py          # Utility functions
│   └── vector_store.py   # Vector database management
├── src/                  # React frontend
├── docker-compose.yml    # Docker Compose configuration
├── Dockerfile.backend    # Backend Dockerfile
├── Dockerfile.frontend   # Frontend Dockerfile
└── nginx.conf            # Nginx configuration for the frontend
```
- Python 3.10+ (for local development)
- Node.js 18+ (for local frontend development)
- Docker and Docker Compose (for containerized deployment)
- OpenAI API Key
- Clone the repository
```bash
git clone https://github.com/mr-jestin-roy/News-Aggregator-with-Chatbot.git
cd News-Aggregator-with-Chatbot
```
- Create a .env file with your OpenAI API key
echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
echo "PORT=8000" >> .env
echo "FLASK_DEBUG=False" >> .env
- Build and start the containers
```bash
docker compose up -d
```
- Access the application
  - Frontend: http://localhost
  - Backend API: http://localhost:8000/api
- Clone the repository
```bash
git clone https://github.com/mr-jestin-roy/News-Aggregator-with-Chatbot.git
cd News-Aggregator-with-Chatbot
```
- Set up the backend
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create a .env file with your OpenAI API key
echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
```
- Run the backend
```bash
python app.py
```
- Set up the frontend
```bash
cd src
npm install
```
- Run the frontend
```bash
npm run dev
```
- Access the application
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000/api
Navigate to different categories using the navigation bar:
- Sports: Latest sports news and updates
- Finance: Business and economic news
- Politics: Political developments
- Lifestyle: Health, travel, and culture
- Music: Music industry news and events
- Click on the chat icon in the bottom right corner
- Ask questions about current news, such as the examples below (a scripted API request is sketched after this list):
  - "What's happening in politics today?"
  - "Tell me about recent sports events"
  - "What are the latest financial developments?"
  - "Are there any major music events happening?"
- News readers seeking a categorized, streamlined news experience
- Researchers wanting quick summaries of news trends
- Users looking to ask natural language questions about current events
- Media analysts tracking coverage across different topics
The application uses CSV files stored in the `datasets/` directory (a quick loading example follows the list):
- `Aggregated News Dataset - Sheet1.csv`: Original news data
- `classified_articles.csv`: Processed news with classifications (generated automatically)
- `daily_highlights.csv`: Extracted highlights for the chatbot (generated automatically)
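A quick way to inspect these files with pandas (the two generated files exist only after the backend has processed the data):

```python
# Load the dataset files listed above for a quick inspection.
import pandas as pd

raw = pd.read_csv("datasets/Aggregated News Dataset - Sheet1.csv")
classified = pd.read_csv("datasets/classified_articles.csv")   # generated automatically
highlights = pd.read_csv("datasets/daily_highlights.csv")      # generated automatically

print(raw.shape, classified.shape, highlights.shape)
print(classified.head())
```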
```bash
# Build the containers
docker compose build

# Start the containers
docker compose up -d

# View logs
docker compose logs -f

# Stop the containers
docker compose down

# Remove containers and volumes
docker compose down -v
```
This application requires an OpenAI API key to power the RAG chatbot functionality. You need to:
- Obtain an API key from OpenAI's platform
- Add it to your `.env` file as `OPENAI_API_KEY=your_key_here` (a minimal loading sketch follows this list)
- Never commit your API key to version control!
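A minimal sketch of reading the key from `.env`, assuming python-dotenv (commonly used with Flask); the backend's actual loading code may differ.

```python
# Minimal example of reading the key from .env with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
```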
- Backend connection issues: Verify that both containers are running with `docker compose ps`
- Missing data: Ensure your dataset files exist in the `datasets/` directory
- API key errors: Check that your OpenAI API key is correctly set in the `.env` file
- Container restarts: View logs with `docker compose logs backend` to identify issues
- This project was built using OpenAI's APIs
- Special thanks to the open-source libraries that made this possible