Trendwork is a data-driven platform designed to aggregate, process, and analyze job listings from multiple job boards. It provides a centralized view of market trends, skill demands, and salary insights by leveraging modern data engineering practices and Artificial Intelligence.
The primary goal of Trendwork is to automate the extraction of job market data and transform it into actionable intelligence. By using advanced NLP and geocoding, it identifies emerging skills and geographic hotspots for various roles.
The platform follows a Medallion Architecture (Bronze, Silver, Gold) implemented on Google Cloud Platform to ensure data quality and lineage.
- Bronze (raw ingestion):
  - Scrapers running on Cloud Run extract raw JSON data from job boards.
  - Raw data is stored as immutable objects in Google Cloud Storage.
  - Cloud Scheduler triggers automated scraping cycles.
- Silver (cleaned and structured):
  - A Cloud Function (Processor) is triggered by new files arriving in Cloud Storage.
  - It performs data cleaning, normalization, and deduplication.
  - Structured data is loaded into BigQuery silver tables.
  - Fail-safe mechanisms recover missing fields (such as keywords) from file metadata.
- Gold (enriched):
  - An enrichment service utilizes Vertex AI (Gemini 2.0 Flash) to extract:
    - Skill requirements.
    - Salary ranges.
    - Role summaries.
  - Geocoding services convert location strings into latitude and longitude.
  - Final enriched data is stored in BigQuery gold tables for high-performance querying.
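The Silver-layer steps above (clean, normalize, deduplicate, with a metadata fail-safe for missing keywords) can be sketched in plain Python. This is an illustrative sketch, not the actual Trendwork processor: the field names and the `_file_metadata` fallback key are assumptions.

```python
# Hypothetical sketch of the Silver-layer processing step: clean, normalize,
# and deduplicate raw job records before loading them into BigQuery.
import hashlib


def normalize_record(raw: dict) -> dict:
    """Trim whitespace on the fields we keep; recover keywords from metadata."""
    return {
        "title": raw.get("title", "").strip(),
        "company": raw.get("company", "").strip(),
        "location": raw.get("location", "").strip(),
        "description": raw.get("description", "").strip(),
        # Fail-safe: fall back to file metadata when the keyword field is missing.
        "keyword": raw.get("keyword") or raw.get("_file_metadata", {}).get("keyword", ""),
    }


def dedup_key(record: dict) -> str:
    """Stable hash over the identifying fields of a posting."""
    basis = "|".join(record[k].lower() for k in ("title", "company", "location"))
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()


def process_batch(raw_records: list[dict]) -> list[dict]:
    """Clean each record and keep only the first occurrence of each posting."""
    seen: set[str] = set()
    out: list[dict] = []
    for raw in raw_records:
        record = normalize_record(raw)
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            out.append(record)
    return out
```

Hashing on title, company, and location means reposts of the same listing collapse to a single silver row while distinct postings at the same company survive.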
- Infrastructure: Terraform for reproducible infrastructure as code.
- Compute: Cloud Run and Cloud Run Functions (Python 3.11).
- Storage: Google Cloud Storage and BigQuery.
- AI/ML: Vertex AI (Gemini) for text extraction and analysis.
- Visualization: Streamlit for interactive dashboards.
- Observability: OpenTelemetry and Google Cloud Trace for distributed tracing.
- Automated Scraping: Bypasses modern bot detection mechanisms.
- AI-Powered Extraction: Transforms unstructured job descriptions into structured skill lists and summaries.
- Geographic Mapping: Visualizes job density across regions using interactive 3D maps.
- Real-time Analysis: Dynamic filtering by job keywords and automatic metric calculations.
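The AI-powered extraction feature turns a free-text model reply into structured fields. A minimal sketch of that parsing step follows, assuming the enrichment service prompts Gemini 2.0 Flash to answer in JSON with `skills`, `salary_min`, `salary_max`, and `summary` keys; the actual prompt and schema may differ.

```python
# Illustrative parsing of a model response into structured enrichment fields.
# The JSON schema here is an assumption, not the documented Trendwork contract.
import json
import re


def parse_enrichment(response_text: str) -> dict:
    """Extract skills, salary range, and summary from a (possibly fenced) JSON reply."""
    # Models often wrap JSON in markdown fences; pull out the JSON object defensively.
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if not match:
        return {"skills": [], "salary_min": None, "salary_max": None, "summary": ""}
    data = json.loads(match.group(0))
    return {
        "skills": [s.strip() for s in data.get("skills", [])],
        "salary_min": data.get("salary_min"),
        "salary_max": data.get("salary_max"),
        "summary": data.get("summary", "").strip(),
    }
```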
- Which job titles are experiencing the highest growth in demand?
- What are the top technical skills required for specific roles in the current market?
- How do salary ranges vary across different geographic locations?
- Are there emerging roles or skills appearing suddenly in recent postings?
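As one example of how the gold tables answer these questions, the "top skills for a role" query can be sketched as a frequency count over enriched rows. In production this would be a BigQuery query; the function and field names below are illustrative.

```python
# Hypothetical analysis over gold-table rows: most frequent skills for a role.
from collections import Counter


def top_skills(rows: list[dict], keyword: str, n: int = 5) -> list[tuple[str, int]]:
    """Count skill occurrences across postings whose title matches the keyword."""
    counts: Counter = Counter()
    for row in rows:
        if keyword.lower() in row.get("title", "").lower():
            counts.update(row.get("skills", []))
    return counts.most_common(n)
```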
To successfully deploy the platform using Terraform, the following order must be observed:
- Initial Infrastructure: Configure `terraform.tfvars` with your `project_id` and `bucket_name`.
- Container Image: The dashboard service depends on a pre-existing container image. Before applying the dashboard infrastructure, build and push the image to Google Container Registry:

```shell
cd dashboard
docker build --platform linux/amd64 -t gcr.io/[PROJECT_ID]/jobstreet-dashboard:latest .
docker push gcr.io/[PROJECT_ID]/jobstreet-dashboard:latest
```

- Terraform Apply: Run `terraform apply` to provision the Cloud Run services, BigQuery tables, and necessary IAM permissions.
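The variables in the first step might look like the following; the values are placeholders only, not real project settings.

```hcl
# terraform.tfvars — example values; substitute your own project settings
project_id  = "my-gcp-project"
bucket_name = "trendwork-raw-jobs"
```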
- The dashboard can be run locally using `streamlit run dashboard/app.py`.
- Ensure Google Cloud credentials are configured via `gcloud auth login`.
- Secrets are managed through Streamlit secrets or environment variables.