An AI-powered full-stack web application for analysing the Finnish national vehicle registry (Traficom open data), with interactive charts, natural-language chat, and Gemini-driven market insights.
- Project Overview
- Business Task
- Data Source
- Data Processing & Cleaning
- Analysis & Visualisations
- AI-Powered Insights
- Key Findings
- Getting Started
- Project Structure
- Tech Stack
- Limitations & Future Work
- Acknowledgements
- License
The Finnish Vehicle Market Analyser is a full-stack Next.js application that lets analysts, researchers, and automotive professionals explore over 5.1 million registered vehicles from the Finnish national vehicle registry maintained by Traficom (Finnish Transport and Communications Agency).
- Parses the full 930 MB Traficom CSV dataset directly in the browser using chunked streaming, with no server upload required
- Aggregates data into interactive dashboards with 7 Recharts visualisations
- Filters records by registration date range (month + year granularity), fuel type, vehicle class, make, and municipality
- Provides a Google Gemini 2.5 Flash powered AI chat for natural-language market analysis
- Generates full market analysis reports and exports them as Markdown, CSV, or JSON
- Runs entirely locally: raw vehicle rows never leave your machine, and only the aggregated summary is sent to Gemini for the AI features
| Audience | Use Case |
|---|---|
| Automotive analysts | Track brand market share, EV adoption, and fleet age trends |
| Car dealers & importers | Identify growth segments and competitive positioning |
| Policy researchers | Assess EV transition progress against EU Green Deal targets |
| Data science students | A real-world capstone project on a 5M-row public dataset |
Traficom Open Vehicle Data: avoindata.fi
Finland, like other EU member states, is navigating a major transition in its vehicle fleet, from combustion to electric propulsion, driven by EU emissions targets and domestic policy incentives. However, there is no publicly available interactive tool that lets non-technical stakeholders explore the full 5-million-vehicle Traficom registry without writing code.
This project builds that tool: a browser-based analytics platform that turns raw government CSV data into actionable market intelligence.
The application is designed to answer the following analytical questions:
- What are the most popular vehicle brands in Finland and how has market share shifted?
- How is EV adoption trending year by year, and is Finland on track for its electrification goals?
- Which Finnish regions (municipalities) have the oldest vehicle fleets, indicating lower turnover?
- What is the fuel type distribution across the national fleet: petrol, diesel, electric, hybrid, gas?
- Which models have experienced the strongest growth in new registrations over the last 5 years?
- What is the average age of the Finnish vehicle fleet and what does that imply for emissions?
- How does CO₂ emission data (WLTP) compare across fuel types and vehicle classes?
| Attribute | Detail |
|---|---|
| Provider | Finnish Transport and Communications Agency (Traficom) |
| Dataset name | Ajoneuvojen avoin data (Open vehicle data) |
| Portal | avoindata.fi |
| Rows | ~5,122,260 (as of December 2025) |
| Uncompressed size | ~930 MB |
| Format | ZIP-packed CSV |
| Delimiter | Semicolon (;) |
| Encoding | ISO-8859-1 (Latin-1) |
| License | Creative Commons Attribution 4.0 International |
| Update frequency | Quarterly |
| Finnish Column | English Meaning | Type |
|---|---|---|
| `merkkiSelvakielinen` | Vehicle make / brand (human-readable) | String |
| `mallimerkinta` | Model name | String |
| `kayttoonottopvm` | Date of first use (YYYYMMDD) | Integer |
| `ensirekisterointipvm` | First registration date (DD.MM.YYYY) | Date |
| `kayttovoima` | Fuel type code | String |
| `kunta` | Municipality of registration | String |
| `ajoneuvoluokka` | Vehicle class (M1, N1, L3e, MUU...) | String |
| `ajoneuvoryhma` | Vehicle group / sub-class | String |
| `sahkohybridi` | Hybrid flag ("true" / "false") | Boolean string |
| `WLTP_Co2` | WLTP CO₂ emissions (g/km) | Float |
| Code | Meaning |
|---|---|
| `01` | Petrol |
| `02` | Diesel |
| `03` | Electric (legacy code) |
| `04` | Electric |
| `05` | Ethanol (E85) |
| `06` | Natural Gas |
| `13` | Petrol/CNG |
The raw CSV is about 5.1 million rows and ~930 MB uncompressed. Parsing it entirely into memory would crash a browser tab. The application uses PapaParse with chunked streaming:
```typescript
import Papa from 'papaparse';

Papa.parse<Record<string, string>>(file, {
  header: true,
  delimiter: ';',
  skipEmptyLines: true,
  chunkSize: 1024 * 512, // 512 KB chunks
  chunk(results) {
    // Aggregate each chunk; never accumulate raw rows
  },
});
```

Chunks are processed one at a time and immediately discarded after aggregation. Only up to 50,000 representative rows are kept in memory for the Data Explorer table.
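As a rough illustration of the aggregate-then-discard idea, the sketch below folds one chunk of parsed rows into running counters and keeps a bounded sample. The names `ChunkStats`, `createStats`, `aggregateChunk`, and `MAX_SAMPLE_ROWS` are hypothetical and not taken from the project's `csvParser.ts`; only the column name `merkkiSelvakielinen` and the 50,000-row cap come from this README. The sample here simply keeps the first rows seen, which is an assumption about what "representative" means.

```typescript
type Row = Record<string, string>;

interface ChunkStats {
  totalVehicles: number;
  makeCount: Record<string, number>;
  sampleRows: Row[]; // bounded sample for a table view
}

const MAX_SAMPLE_ROWS = 50_000; // cap quoted in the README text

function createStats(): ChunkStats {
  return { totalVehicles: 0, makeCount: {}, sampleRows: [] };
}

// Fold one chunk into the running totals; the caller can drop `rows` afterwards.
function aggregateChunk(stats: ChunkStats, rows: Row[]): void {
  for (const row of rows) {
    stats.totalVehicles += 1;
    const make = (row['merkkiSelvakielinen'] ?? '').trim();
    if (make !== '') {
      // Rows with an empty make are still counted globally but not per make
      stats.makeCount[make] = (stats.makeCount[make] ?? 0) + 1;
    }
    if (stats.sampleRows.length < MAX_SAMPLE_ROWS) {
      stats.sampleRows.push(row);
    }
  }
}
```

Inside a PapaParse `chunk` callback, something like `aggregateChunk(stats, results.data)` would be the natural call site, assuming `results.data` holds the parsed rows for that chunk.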
The Traficom CSV uses ISO-8859-1 encoding (Finnish special characters: ä, ö, å). The browser's FileReader and PapaParse handle this transparently when the file is loaded from disk via <input type="file">.
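For readers who want to check the encoding assumption outside the app, here is a minimal sketch using the standard `TextDecoder` API. `decodeLatin1` is a hypothetical helper and not part of the project. PapaParse also documents an `encoding` config option for local files, which may be worth setting explicitly if Finnish characters come out garbled.

```typescript
// Hypothetical helper: decode ISO-8859-1 bytes into a JavaScript string.
// The WHATWG Encoding Standard maps the 'iso-8859-1' label to windows-1252,
// which agrees with Latin-1 for the Finnish letters ä, ö, and å.
function decodeLatin1(bytes: Uint8Array): string {
  return new TextDecoder('iso-8859-1').decode(bytes);
}

// "Jyväskylä" encoded as ISO-8859-1 (ä is the single byte 0xE4).
const sample = new Uint8Array([0x4a, 0x79, 0x76, 0xe4, 0x73, 0x6b, 0x79, 0x6c, 0xe4]);
console.log(decodeLatin1(sample));
```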
Users can filter the dataset at parse time by a month+year range (e.g. January 2024 to April 2026). Rows outside the range are skipped before any aggregation, which reduces the memory footprint when only recent data is needed:
```typescript
// Encode year and month as one YYYYMM integer; an unknown month (0) defaults to June
const checkYM = checkYear * 100 + (regMonth || 6);
const filterFrom = yearFilter.from * 100 + yearFilter.fromMonth;
const filterTo = yearFilter.to * 100 + yearFilter.toMonth;
// Skip rows outside the selected range before any aggregation
if (checkYM < filterFrom || checkYM > filterTo) continue;
```

| Scenario | Handling |
|---|---|
| Empty `merkkiSelvakielinen` | Excluded from make/model counts |
| Missing `ensirekisterointipvm` | Falls back to `kayttoonottopvm` for year extraction |
| `WLTP_Co2` missing or `""` | Stored as `null`; excluded from CO₂ averages |
| Unknown municipality | Excluded from regional chart |
| Unparseable date strings | `extractYear()` / `extractMonth()` return 0; row still counted globally |
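To make the date handling concrete, the sketch below parses the two date layouts listed in the column table and applies the same YYYYMM comparison as the filter snippet above. `parseYearMonth` and `isInRange` are hypothetical names; the project's own `extractYear()` and `extractMonth()` are not shown in this README and may behave differently. The `YearFilter` field names are taken from the snippet.

```typescript
interface YearFilter {
  from: number;
  fromMonth: number;
  to: number;
  toMonth: number;
}

// Hypothetical parser for DD.MM.YYYY and YYYYMMDD strings.
// Returns { year: 0, month: 0 } when neither layout matches.
function parseYearMonth(raw: string): { year: number; month: number } {
  const text = raw.trim();
  const dotted = /^(\d{2})\.(\d{2})\.(\d{4})$/.exec(text);
  if (dotted) return { year: Number(dotted[3]), month: Number(dotted[2]) };
  const compact = /^(\d{4})(\d{2})(\d{2})$/.exec(text);
  if (compact) return { year: Number(compact[1]), month: Number(compact[2]) };
  return { year: 0, month: 0 };
}

// Encode year and month as YYYYMM, treating an unknown month (0) as June.
function isInRange(year: number, month: number, filter: YearFilter): boolean {
  const checkYM = year * 100 + (month || 6);
  const filterFrom = filter.from * 100 + filter.fromMonth;
  const filterTo = filter.to * 100 + filter.toMonth;
  return checkYM >= filterFrom && checkYM <= filterTo;
}
```

One consequence of the June default is that a row with a known year but unknown month is dropped by a filter that starts in July of that year; whether that is the intended behaviour is a design choice for the project.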
Raw vehicle rows are never sent to the AI. Instead, a structured JSON summary is built:
```typescript
{
  totalVehicles: 5122260,
  makeCount: { "Toyota": 186432, "Volkswagen, VW": 154821, ... },
  fuelTypeCount: { "01": 2340000, "02": 1800000, "04": 177816, ... },
  evByYear: { 2020: 8421, 2021: 19843, 2022: 38251, ... },
  yearRange: { min: 1958, max: 2025 },
  evCount: 177816,
  hybridCount: 388863,
  ...
}
```

This summary is typically < 20 KB, which is safe to embed in a Gemini prompt.
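A quick way to verify the size claim for a given summary is to measure its serialised UTF-8 length. The helpers below are a hypothetical sketch (`summarySizeBytes`, `fitsPromptBudget`, and `PROMPT_BUDGET_BYTES` are not project names); the 20 KB figure is the one quoted above.

```typescript
// Hypothetical helper: UTF-8 size of a summary object once serialised to JSON.
function summarySizeBytes(summary: unknown): number {
  return new TextEncoder().encode(JSON.stringify(summary)).length;
}

const PROMPT_BUDGET_BYTES = 20 * 1024; // the "< 20 KB" figure from the text

// True when the serialised summary stays under the budget.
function fitsPromptBudget(summary: unknown): boolean {
  return summarySizeBytes(summary) < PROMPT_BUDGET_BYTES;
}
```

Measuring bytes rather than string length matters here because make and municipality names can contain ä, ö, and å, which take two bytes each in UTF-8.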
The dashboard presents 7 interactive Recharts visualisations, all rendered client-side from the aggregated statistics object.
Displays the ten most registered vehicle makes by total count. Allows immediate identification of market leaders (Toyota, Volkswagen, Volvo dominate the 2025 new-registration data).
Chart visible in the dashboard screenshot above (top-left panel).
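The top-ten ranking described above amounts to sorting a count map and taking the first entries. A minimal sketch, assuming the counts live in a plain object like the `makeCount` field of the summary (`topN` is a hypothetical helper name):

```typescript
// Hypothetical helper: the N largest entries of a count map, biggest first.
function topN(counts: Record<string, number>, n: number): Array<[string, number]> {
  return Object.entries(counts)
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}

// Example call with the two makes shown in the summary excerpt.
const leaders = topN({ Toyota: 186432, 'Volkswagen, VW': 154821 }, 10);
console.log(leaders);
```

The resulting array of `[label, count]` pairs maps directly onto the row objects a bar chart typically expects.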
Shows the percentage split across all fuel types: Petrol, Diesel, Electric, Hybrid, Ethanol, Natural Gas, and others. Provides an instant read on how far the fleet has electrified.
Chart visible in the dashboard screenshot above (top-right panel). In the filtered 2025 view: Petrol 45.9%, Electric 33.2%, Diesel 20.0%.
Plots total new vehicle registrations per calendar year across the selected date range. Highlights growth, recession-era dips, and post-COVID recovery patterns.
Chart visible in the bottom-left of the dashboard screenshot.
Buckets vehicles by their first-use year into a histogram. Identifies the vintage of the national fleet and shows whether the fleet is renewing quickly or ageing.
Chart visible in the bottom-right of the dashboard screenshot.
Ranks the most registered individual models (e.g. Toyota Yaris, VW Golf). Complements the makes chart with model-level granularity.
Year-by-year growth of electric vehicle registrations. Critical for assessing Finland's electrification trajectory against EU 2035 zero-emission targets.
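Year-over-year growth can be derived from a per-year count map such as the `evByYear` field of the summary. The function below is an illustrative sketch, not the project's implementation; `yearOverYearGrowth` is a hypothetical name.

```typescript
// Hypothetical helper: year-over-year growth as a fraction (0.5 means +50%).
// Years without a preceding year in the map, or with a zero base, are skipped.
function yearOverYearGrowth(byYear: Record<number, number>): Record<number, number> {
  const growth: Record<number, number> = {};
  const years = Object.keys(byYear).map(Number).sort((a, b) => a - b);
  for (const year of years) {
    const previous = byYear[year - 1];
    const current = byYear[year];
    if (previous === undefined || current === undefined || previous === 0) continue;
    growth[year] = (current - previous) / previous;
  }
  return growth;
}
```

Skipping zero-base years avoids reporting infinite growth for the first year in which any EV registrations appear.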
Top municipalities by vehicle registrations. Reveals urban concentration (Helsinki, Espoo, Tampere) vs. rural vehicle density patterns.
The app integrates Google Gemini 2.5 Flash for two AI features:
| Feature | Endpoint | Description |
|---|---|---|
| Auto-Analyse Market | `POST /api/analyze` | Generates a full structured market report (8 sections) from the aggregated stats JSON. Non-streaming. |
| AI Chat | `POST /api/chat` | Real-time streaming conversational chat. Maintains message history per session. |
Both routes embed the dataset summary JSON directly in the system instruction:
```text
You are an expert Finnish vehicle market analyst with deep knowledge of the
Traficom vehicle registry, Finnish transport policy, and European automotive
trends. Use Finnish place names and brands naturally. Keep responses
structured and professional.

## Current Dataset Summary
{ "totalVehicles": 5122260, "makeCount": { ... }, ... }
```
By injecting aggregated statistics (not raw rows) into the prompt, the model has full dataset context in under 20 KB, well within Gemini's context window.
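Assembling such a system instruction is plain string concatenation. The sketch below shows one way to do it; `buildSystemInstruction` and `ANALYST_PREAMBLE` are hypothetical names, and the preamble is shortened from the prompt quoted above.

```typescript
// Shortened version of the analyst persona quoted in the README (assumption:
// the real routes use the full text).
const ANALYST_PREAMBLE =
  'You are an expert Finnish vehicle market analyst. Keep responses structured and professional.';

// Hypothetical helper: persona text followed by the serialised dataset summary.
function buildSystemInstruction(summary: object): string {
  return [
    ANALYST_PREAMBLE,
    '',
    '## Current Dataset Summary',
    JSON.stringify(summary),
  ].join('\n');
}
```

Because the summary is serialised on every request, both routes always see the statistics for the currently loaded date range rather than a stale snapshot.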
The Gemini free tier allows 25 requests per day on gemini-2.5-flash. When a 429 Too Many Requests error is detected, both API routes automatically retry with gemini-2.5-flash-lite:
```typescript
try {
  await callGemini('gemini-2.5-flash', prompt);
} catch (err) {
  if (isRateLimit(err)) {
    await callGemini('gemini-2.5-flash-lite', prompt); // fallback
  } else {
    throw err; // errors other than rate limits are not retried
  }
}
```

When Flash-Lite is active, an amber badge (⚡ Flash-Lite (fallback)) is shown in the AI chat panel header.
Analysis results are cached in localStorage keyed by a hash of the current stats object:
```typescript
// Cache key: gemini_cache_v1_analyze_<statsHash>
// TTL: 1 hour
// On cache hit: response shown instantly, no API call made
```

Clicking "Auto-Analyse Market" on the same dataset within an hour returns the cached result immediately, preserving daily request quota.
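The caching scheme can be sketched as follows. This is an illustration under stated assumptions, not the code in `AIChat.tsx`: the hash function is a simple djb2 variant chosen for brevity (the project's actual hash is not shown), and `KeyValueStore`, `hashStats`, `writeCache`, and `readCache` are hypothetical names. Only the key prefix and the one-hour TTL come from the text above.

```typescript
// Minimal storage shape so the sketch runs outside a browser;
// window.localStorage satisfies this interface.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CACHE_PREFIX = 'gemini_cache_v1_analyze_';
const TTL_MS = 60 * 60 * 1000; // 1 hour

// Simple non-cryptographic string hash (djb2 variant), hex-encoded.
function hashStats(stats: unknown): string {
  const text = JSON.stringify(stats);
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

// Store the response together with the time it was saved.
function writeCache(store: KeyValueStore, stats: unknown, response: string, now: number): void {
  store.setItem(CACHE_PREFIX + hashStats(stats), JSON.stringify({ savedAt: now, response }));
}

// Return the cached response, or null when missing or older than the TTL.
function readCache(store: KeyValueStore, stats: unknown, now: number): string | null {
  const raw = store.getItem(CACHE_PREFIX + hashStats(stats));
  if (raw === null) return null;
  const entry = JSON.parse(raw) as { savedAt: number; response: string };
  return now - entry.savedAt < TTL_MS ? entry.response : null;
}
```

In the browser, `readCache(localStorage, stats, Date.now())` would be checked before calling the analysis endpoint. Passing the clock in as a parameter keeps the expiry logic easy to test.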
A live counter in the AI Insights sidebar shows requests used today vs. the 25 RPD limit. The counter turns amber at 80% and red when the limit is reached.
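The colour thresholds for the counter reduce to a small pure function. A minimal sketch, with `quotaLevel` and `QuotaLevel` as hypothetical names:

```typescript
type QuotaLevel = 'normal' | 'amber' | 'red';

// Red once the limit is reached, amber from 80% of the limit, normal otherwise.
// The 80% check uses integer arithmetic to avoid floating-point rounding.
function quotaLevel(used: number, limit: number): QuotaLevel {
  if (used >= limit) return 'red';
  if (used * 10 >= limit * 8) return 'amber';
  return 'normal';
}
```

With the 25 requests-per-day limit quoted above, the counter would stay normal up to 19 requests, turn amber from 20, and turn red at 25.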
"Which brand has grown the most in the last 5 years?"
"What is the EV market share trend in Finland?"
"Which regions have the oldest vehicle fleets?"
"What are the COโ emission trends across fuel types?"
"How does Finland's EV adoption compare to EU averages?"
"Which vehicle class (M1, N1, L3e) is growing fastest?"
"What are the policy implications of the current diesel share?"
Note: The findings below are placeholders to be filled after running a full-dataset analysis. Upload the Traficom CSV and click Auto-Analyse Market to generate AI-powered insights.
[TO BE ADDED AFTER ANALYSIS]
Key questions to answer:
- Which brand holds the largest share of new 2025 registrations?
- Has Toyota maintained first place against growing Korean competition?
- What share does Tesla hold in the EV-only segment?
[TO BE ADDED AFTER ANALYSIS]
Key questions to answer:
- What was the year-over-year EV growth rate from 2020 to 2025?
- Is the EV adoption curve accelerating or plateauing?
- Which municipality has the highest EV concentration?
[TO BE ADDED AFTER ANALYSIS]
Key questions to answer:
- Which 5 municipalities account for the majority of new registrations?
- Do rural municipalities show significantly older average fleet age?
- Is there a regional divide in EV vs. diesel adoption?
[TO BE ADDED AFTER ANALYSIS]
Key questions to answer:
- What is the national average vehicle age in 2025?
- How has average fleet age changed over the last decade?
- Which vehicle class (passenger car, van, motorcycle) is oldest on average?
- Node.js 18+ or Bun 1.0+
- A Google Gemini API key, obtainable free at Google AI Studio
- The Traficom CSV file, downloaded from avoindata.fi
1. Clone the repository

```bash
git clone https://github.com/your-username/vehicle-stats.git
cd vehicle-stats
```

2. Install dependencies

```bash
# With Bun (recommended, faster installs)
bun install
# Or with npm
npm install
```

3. Configure environment variables

Create a `.env.local` file in the project root:

```bash
GEMINI_API_KEY=your_gemini_api_key_here
```

4. Start the development server

```bash
# With Bun
bun dev
# Or with npm
npm run dev
```

Open http://localhost:3000 in your browser.

5. Load the Traficom dataset

- Download the ZIP from avoindata.fi
- Extract the CSV (e.g. `TieliikenneAvoinData_31_12_2025.csv`)
- In the app's upload screen, choose a date range (e.g. last 2 years for ~100k rows, the fastest load)
- Drop or click to upload the CSV; parsing progress is shown in real time
| Variable | Required | Description |
|---|---|---|
| `GEMINI_API_KEY` | Yes | Google Gemini API key from AI Studio |
```text
vehicle-stats/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── chat/
│   │   │   │   └── route.ts      # Streaming Gemini chat endpoint
│   │   │   └── analyze/
│   │   │       └── route.ts      # Full market analysis endpoint
│   │   ├── globals.css           # Tailwind v4 + dark mode styles
│   │   ├── layout.tsx            # Root layout (ThemeProvider + Contexts)
│   │   └── page.tsx              # Entry point → <AppShell />
│   │
│   ├── components/
│   │   ├── ui/                   # Radix UI primitives (button, card, badge…)
│   │   ├── AIChat.tsx            # Chat UI + auto-analyse panel + caching
│   │   ├── AppShell.tsx          # Section routing (dashboard/chat/explorer/export)
│   │   ├── Charts.tsx            # All 7 Recharts visualisations
│   │   ├── DataExplorer.tsx      # Filterable, searchable, paginated data table
│   │   ├── ExportPanel.tsx       # CSV / JSON stats / Markdown AI report export
│   │   ├── FileUpload.tsx        # Month+year date-range selector + CSV drop zone
│   │   ├── Sidebar.tsx           # Navigation + dark mode + EN/FI language toggle
│   │   └── SummaryCards.tsx      # 6 KPI stat cards (total, EV, hybrid, age…)
│   │
│   ├── context/
│   │   ├── DataContext.tsx       # Global parsed stats, records, active filters
│   │   └── LanguageContext.tsx   # EN/FI language switching via useLanguage()
│   │
│   └── lib/
│       ├── csvParser.ts          # PapaParse chunked streaming + YearFilter logic
│       ├── dataProcessor.ts      # Aggregation helpers + Gemini summary builder
│       ├── fuelTypes.ts          # Fuel code → label map + EV/hybrid detection sets
│       ├── i18n.ts               # EN and FI translation strings (~60 keys)
│       ├── types.ts              # Shared TypeScript interfaces
│       └── utils.ts              # cn(), formatNumber(), downloadFile()
│
├── screenshots/
│   ├── dashboard.jpeg
│   ├── aichatand suggestions.jpeg
│   ├── dataexplorer.jpeg
│   └── reports.jpeg
│
├── public/                       # Static assets (SVGs)
├── .env.local                    # Local environment variables (git-ignored)
├── package.json
├── tsconfig.json
└── README.md
```
| Technology | Purpose | Version |
|---|---|---|
| Next.js | Full-stack React framework (App Router) | 16.2 |
| TypeScript | Static type safety | 5.x |
| Tailwind CSS | Utility-first styling with dark mode | v4 |
| Recharts | Composable SVG chart library | 3.8 |
| @google/generative-ai | Gemini 2.5 Flash SDK | 0.24 |
| PapaParse | Browser-side CSV streaming parser | 5.5 |
| Radix UI | Accessible headless UI primitives | latest |
| Lucide React | Icon library | latest |
| react-markdown | Renders Gemini Markdown responses in chat | 10.x |
| next-themes | Dark/light mode without flash | 0.4 |
| Bun | Fast JS runtime & package manager | 1.x |
| Limitation | Detail |
|---|---|
| Gemini free tier | 25 requests per day on the Flash model. LocalStorage caching mitigates this for repeated analyses on the same dataset. |
| No real-time prices | The Traficom dataset contains registration data only: no sale price, mileage, or condition data. |
| CSV upload only | The dataset must be manually downloaded from avoindata.fi and uploaded locally; there is no auto-fetch. |
| Browser memory | Very old machines may struggle with the 930 MB uncompressed CSV. Using a 2-year date filter (< 200k rows) is recommended for slower devices. |
| ISO-8859-1 encoding | The parser is tuned for the Traficom CSV encoding; other CSV formats may need adjustment. |
- Municipality map visualisation: choropleth map of Finland showing EV density or fleet age per region
- Price prediction ML model: integrate used car price API (e.g. Nettiauto) for market value estimation
- Full Finnish/English toggle: complete i18n of all chart labels and AI prompts
- Time-series comparison: load two date ranges side-by-side for year-over-year comparison
- PDF export: export the full dashboard (charts + AI analysis) as a formatted PDF
- Server-side parsing: offload CSV processing to a background worker for very low-memory devices
- Auto-refresh: detect when a new Traficom quarterly release is available and notify the user
- Traficom (Finnish Transport and Communications Agency): for publishing and maintaining the open vehicle registry dataset under a CC BY 4.0 licence.
- Google Data Analytics Professional Certificate: this project was built as a capstone demonstrating the full data analytics cycle (ask, prepare, process, analyse, share, and act).
- Google AI Studio: for providing free access to the Gemini 2.5 Flash API used to power the AI insights features.
- avoindata.fi: Finland's national open data portal, for making government datasets accessible to researchers and developers.
| Screen | Preview |
|---|---|
| Dashboard | ![]() |
| AI Chat & Market Insights | ![]() |
| Data Explorer | ![]() |
| Export Panel | ![]() |
MIT License
Copyright (c) 2026
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
Built as part of the Google Data Analytics Professional Certificate
🇫🇮 Data: Traficom Open Data · CC BY 4.0



