Is116/vehicle-stats


🚗 Finnish Vehicle Market Analyser


An AI-powered full-stack web application for analysing the Finnish national vehicle registry (Traficom open data), with interactive charts, natural-language chat, and Gemini-driven market insights.



📋 Table of Contents

  1. Project Overview
  2. Business Task
  3. Data Source
  4. Data Processing & Cleaning
  5. Analysis & Visualisations
  6. AI-Powered Insights
  7. Key Findings
  8. Getting Started
  9. Project Structure
  10. Tech Stack
  11. Limitations & Future Work
  12. Acknowledgements
  13. Screenshots
  14. License

📊 Project Overview

The Finnish Vehicle Market Analyser is a full-stack Next.js application that lets analysts, researchers, and automotive professionals explore more than 5.1 million registered vehicles from the Finnish national vehicle registry maintained by Traficom (Finnish Transport and Communications Agency).

What it does

  • Parses the full 930 MB Traficom CSV directly in the browser using chunked streaming; no server upload is required
  • Aggregates data into interactive dashboards with 7 Recharts visualisations
  • Filters records by registration date range (month + year granularity), fuel type, vehicle class, make, and municipality
  • Provides an AI chat powered by Google Gemini 2.5 Flash for natural-language market analysis
  • Generates full market analysis reports and exports them as Markdown, CSV, or JSON
  • Runs locally: the CSV is parsed in your browser and raw rows never leave your machine (only an aggregated summary is sent to Gemini when you use the AI features)

Who it is for

Audience | Use Case
Automotive analysts | Track brand market share, EV adoption, and fleet age trends
Car dealers & importers | Identify growth segments and competitive positioning
Policy researchers | Assess EV transition progress against EU Green Deal targets
Data science students | A real-world capstone project on a 5M-row public dataset

Dataset

🔗 Traficom Open Vehicle Data (avoindata.fi)


🎯 Business Task

Problem Statement

Finland, like other EU member states, is navigating a major transition in its vehicle fleet, from combustion engines to electric propulsion, driven by EU emissions targets and domestic policy incentives. However, there is no publicly available interactive tool that lets non-technical stakeholders explore the full 5-million-vehicle Traficom registry without writing code.

This project builds that tool: a browser-based analytics platform that turns raw government CSV data into actionable market intelligence.

Key Business Questions

The application is designed to answer the following analytical questions:

  1. What are the most popular vehicle brands in Finland and how has market share shifted?
  2. How is EV adoption trending year by year, and is Finland on track for its electrification goals?
  3. Which Finnish regions (municipalities) have the oldest vehicle fleets, indicating lower turnover?
  4. What is the fuel type distribution across the national fleet โ€” petrol, diesel, electric, hybrid, gas?
  5. Which models have experienced the strongest growth in new registrations over the last 5 years?
  6. What is the average age of the Finnish vehicle fleet and what does that imply for emissions?
  7. How do WLTP CO₂ emissions compare across fuel types and vehicle classes?

🗄️ Data Source

Attribute | Detail
Provider | Finnish Transport and Communications Agency (Traficom)
Dataset name | Ajoneuvojen avoin data (Open vehicle data)
Portal | avoindata.fi
Rows | ~5,122,260 (as of December 2025)
Uncompressed size | ~930 MB
Format | ZIP-packed CSV
Delimiter | Semicolon (;)
Encoding | ISO-8859-1 (Latin-1)
License | Creative Commons Attribution 4.0 International
Update frequency | Quarterly

Key Columns

Finnish Column | English Meaning | Type
merkkiSelvakielinen | Vehicle make / brand (human-readable) | String
mallimerkinta | Model name | String
kayttoonottopvm | Date of first use (YYYYMMDD) | Integer
ensirekisterointipvm | First registration date (DD.MM.YYYY) | Date
kayttovoima | Fuel type code | String
kunta | Municipality of registration | String
ajoneuvoluokka | Vehicle class (M1, N1, L3e, MUU...) | String
ajoneuvoryhma | Vehicle group / sub-class | String
sahkohybridi | Hybrid flag ("true" / "false") | Boolean string
WLTP_Co2 | WLTP CO₂ emissions (g/km) | Float

Fuel Type Codes (Selected)

Code | Meaning
01 | Petrol
02 | Diesel
03 | Electric (legacy code)
04 | Electric
05 | Ethanol (E85)
06 | Natural Gas
13 | Petrol/CNG

🔧 Data Processing & Cleaning

Parsing Strategy

The raw CSV has about 5.1 million rows and is ~930 MB uncompressed. Loading it into memory in one piece can exhaust a browser tab's memory, so the application uses PapaParse with chunked streaming:

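import Papa from 'papaparse';

function parseTraficomCsv(file: File) {
  Papa.parse<Record<string, string>>(file, {
    header: true,
    delimiter: ';',
    encoding: 'ISO-8859-1', // FileReader decodes as UTF-8 unless told otherwise
    skipEmptyLines: true,
    chunkSize: 1024 * 512, // 512 KB chunks
    chunk(results) {
      // results.data holds only this chunk's rows: aggregate them, then let
      // them be garbage-collected. Never accumulate raw rows.
    },
  });
}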

Chunks are processed one at a time and immediately discarded after aggregation. Only up to 50,000 representative rows are kept in memory for the Data Explorer table.
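The README does not say how the 50,000 representative rows are chosen. One standard way to keep a uniform random sample from a stream whose length is not known in advance is reservoir sampling (Algorithm R). The sketch below shows that technique as an assumption, not as the project's implementation; `ReservoirSampler` is a hypothetical name.

```typescript
// Hypothetical sketch: keep a uniform random sample of at most `capacity`
// rows from a stream (reservoir sampling, Algorithm R). Not the project's code.
class ReservoirSampler<T> {
  private readonly items: T[] = [];
  private seen = 0;
  private readonly capacity: number;
  private readonly random: () => number;

  // `random` is injectable so tests can make the sampling deterministic
  constructor(capacity: number, random: () => number = Math.random) {
    this.capacity = capacity;
    this.random = random;
  }

  // Offer one row. After n offers, each row has probability capacity / n of being kept.
  offer(item: T): void {
    this.seen++;
    if (this.items.length < this.capacity) {
      this.items.push(item); // reservoir not full yet: always keep
      return;
    }
    // Pick a slot in [0, seen); replace only if it falls inside the reservoir
    const j = Math.floor(this.random() * this.seen);
    if (j < this.capacity) this.items[j] = item;
  }

  get sample(): readonly T[] {
    return this.items;
  }

  get count(): number {
    return this.seen;
  }
}
```

Inside a chunk callback, each row would be passed to `offer()`, so memory stays bounded by the capacity (e.g. 50,000) regardless of file size.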

Encoding

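The Traficom CSV uses ISO-8859-1 encoding (Finnish special characters: ä, ö, å). When the file is loaded from disk via <input type="file">, PapaParse reads it through the browser's FileReader, which decodes text as UTF-8 unless another encoding is given. PapaParse's encoding option therefore needs to be set to 'ISO-8859-1' for these characters to come through intact.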
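To see why the encoding matters: in Latin-1, ä, ö and å are the single bytes 0xE4, 0xF6 and 0xE5, which are not valid UTF-8 on their own. The sketch below decodes the same bytes both ways with the standard `TextDecoder` (available in browsers and Node.js); `decodeLatin1` is a hypothetical helper name.

```typescript
// Latin-1 bytes for "äöå" as they would appear in the Traficom CSV
const bytes = new Uint8Array([0xe4, 0xf6, 0xe5]);

// Hypothetical helper: decode a Latin-1 byte buffer to a JS string.
// Per the WHATWG Encoding Standard, the "iso-8859-1" label maps to
// windows-1252, which matches Latin-1 for these letters.
function decodeLatin1(buf: Uint8Array): string {
  return new TextDecoder("iso-8859-1").decode(buf);
}

const asLatin1 = decodeLatin1(bytes); // "äöå"
// Decoding the same bytes as UTF-8 yields U+FFFD replacement characters
const asUtf8 = new TextDecoder("utf-8").decode(bytes);
```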

Registration Date Filter

Users can filter the dataset at parse time by a month+year range (e.g. January 2024 – April 2026). Rows outside the range are skipped before any aggregation, which cuts parsing work and memory use when only recent data is needed:

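// Inside the per-row loop: compare year+month as sortable YYYYMM integers
const checkYM = checkYear * 100 + (regMonth || 6); // unknown month (0) → mid-year
const filterFrom = yearFilter.from * 100 + yearFilter.fromMonth;
const filterTo   = yearFilter.to   * 100 + yearFilter.toMonth;
if (checkYM < filterFrom || checkYM > filterTo) continue; // outside range: skip row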
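Wrapped as a standalone function, the same YYYYMM comparison can be tested in isolation. The field names below are inferred from the snippet above; the actual YearFilter type in csvParser.ts may differ, and `toYearMonth` / `isInRange` are hypothetical names.

```typescript
// Assumed shape of the date-range filter (fields inferred from the snippet above)
interface YearFilter {
  from: number;      // start year, e.g. 2024
  fromMonth: number; // start month, 1-12
  to: number;        // end year
  toMonth: number;   // end month, 1-12
}

// Encode (year, month) as a sortable integer YYYYMM,
// e.g. January 2024 → 202401, April 2026 → 202604
function toYearMonth(year: number, month: number): number {
  return year * 100 + month;
}

// True if the row's registration year/month lies inside the filter (inclusive).
// A missing month (0) is treated as mid-year (6), mirroring `regMonth || 6`.
function isInRange(year: number, month: number, f: YearFilter): boolean {
  const ym = toYearMonth(year, month || 6);
  return ym >= toYearMonth(f.from, f.fromMonth) && ym <= toYearMonth(f.to, f.toMonth);
}
```

Because rows with an unparseable year (0) encode to a tiny number, they fall outside any realistic range and are skipped while a filter is active.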

Missing Value Handling

Scenario | Handling
Empty merkkiSelvakielinen | Excluded from make/model counts
Missing ensirekisterointipvm | Falls back to kayttoonottopvm for year extraction
WLTP_Co2 missing or "" | Stored as null; excluded from CO₂ averages
Unknown municipality | Excluded from regional chart
Unparseable date strings | extractYear() / extractMonth() return 0; row still counted globally
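The date fallback rules in the table can be sketched as follows. This assumes the two formats listed under Key Columns (DD.MM.YYYY for ensirekisterointipvm, YYYYMMDD for kayttoonottopvm). `yearMonthOf` and its helpers are hypothetical and are not the project's extractYear() / extractMonth().

```typescript
interface YearMonth {
  year: number;  // 0 when unparseable
  month: number; // 0 when unparseable
}

const UNKNOWN: YearMonth = { year: 0, month: 0 };

// Accept only calendar months 1-12; anything else is treated as unparseable
function make(year: number, month: number): YearMonth {
  return month >= 1 && month <= 12 ? { year, month } : UNKNOWN;
}

// Parse "DD.MM.YYYY" (first registration date)
function parseDotted(s: string): YearMonth {
  const m = /^(\d{1,2})\.(\d{1,2})\.(\d{4})$/.exec(s.trim());
  return m ? make(Number(m[3]), Number(m[2])) : UNKNOWN;
}

// Parse "YYYYMMDD" (date of first use)
function parseCompact(s: string): YearMonth {
  const m = /^(\d{4})(\d{2})\d{2}$/.exec(s.trim());
  return m ? make(Number(m[1]), Number(m[2])) : UNKNOWN;
}

// Prefer ensirekisterointipvm, fall back to kayttoonottopvm, else { 0, 0 }
function yearMonthOf(row: Record<string, string | undefined>): YearMonth {
  const first = parseDotted(row["ensirekisterointipvm"] ?? "");
  if (first.year !== 0) return first;
  return parseCompact(row["kayttoonottopvm"] ?? "");
}
```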

Aggregation Strategy

Raw vehicle rows are never sent to the AI. Instead, a structured JSON summary is built:

{
  totalVehicles: 5122260,
  makeCount: { "Toyota": 186432, "Volkswagen, VW": 154821, ... },
  fuelTypeCount: { "01": 2340000, "02": 1800000, "04": 177816, ... },
  evByYear: { 2020: 8421, 2021: 19843, 2022: 38251, ... },
  yearRange: { min: 1958, max: 2025 },
  evCount: 177816,
  hybridCount: 388863,
  ...
}

This summary is typically under 20 KB, small enough to embed directly in a Gemini prompt.
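A minimal sketch of the per-row aggregation, modelling only a subset of the summary fields shown above. `VehicleSummary`, `emptySummary`, `addRow` and `summaryBytes` are hypothetical names. Counting both fuel codes 03 and 04 as EVs and reading hybrids from `sahkohybridi === "true"` are assumptions based on the column and fuel-code tables, not confirmed project logic.

```typescript
// Hypothetical subset of the summary object sent to Gemini
interface VehicleSummary {
  totalVehicles: number;
  makeCount: Record<string, number>;
  fuelTypeCount: Record<string, number>;
  evByYear: Record<number, number>;
  evCount: number;
  hybridCount: number;
}

// Assumption: both codes the fuel table lists as electric count as EV
const EV_FUEL_CODES = new Set(["03", "04"]);

function emptySummary(): VehicleSummary {
  return { totalVehicles: 0, makeCount: {}, fuelTypeCount: {}, evByYear: {}, evCount: 0, hybridCount: 0 };
}

// Increment a counter in a plain-object map
function bump<K extends string | number>(map: Record<K, number>, key: K): void {
  map[key] = (map[key] ?? 0) + 1;
}

// Fold one CSV row into the running summary; the row itself is not retained.
function addRow(s: VehicleSummary, row: Record<string, string>, year: number): void {
  s.totalVehicles++;
  const make = row["merkkiSelvakielinen"]?.trim();
  if (make) bump(s.makeCount, make); // empty make: excluded from make counts
  const fuel = row["kayttovoima"]?.trim();
  if (fuel) bump(s.fuelTypeCount, fuel);
  if (fuel && EV_FUEL_CODES.has(fuel)) {
    s.evCount++;
    if (year > 0) bump(s.evByYear, year); // skip unparseable years
  }
  if (row["sahkohybridi"] === "true") s.hybridCount++;
}

// Size of the JSON text in UTF-8 bytes, to check it fits the prompt budget
function summaryBytes(s: VehicleSummary): number {
  return new TextEncoder().encode(JSON.stringify(s)).length;
}
```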


📈 Analysis & Visualisations

The dashboard presents 7 interactive Recharts visualisations, all rendered client-side from the aggregated statistics object.

Dashboard Overview



1. 📊 Top 10 Vehicle Makes (Bar Chart)

Displays the ten most registered vehicle makes by total count. Allows immediate identification of market leaders (Toyota, Volkswagen, Volvo dominate the 2025 new-registration data).

Chart visible in the dashboard screenshot above (top-left panel).
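Bar charts like this one take an array of `{ name, count }` objects, while the summary stores counts as a `{ make: count }` map. Here is a minimal conversion sketch. `topN` is a hypothetical helper, and the project's own code in dataProcessor.ts may differ.

```typescript
interface NameCount {
  name: string;
  count: number;
}

// Hypothetical helper: turn a { name: count } map into the N largest entries,
// sorted by count (descending) with ties broken alphabetically.
function topN(counts: Record<string, number>, n = 10): NameCount[] {
  return Object.entries(counts)
    .map(([name, count]) => ({ name, count }))
    .sort((a, b) => b.count - a.count || a.name.localeCompare(b.name))
    .slice(0, n);
}
```

The same helper could feed the Top 10 Models and Regional Distribution charts, which have the same shape.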


2. 🥧 Fuel Type Distribution (Donut Chart)

Shows the percentage split across all fuel types: Petrol, Diesel, Electric, Hybrid, Ethanol, Natural Gas, and others. Provides an instant read on how far the fleet has electrified.

Chart visible in the dashboard screenshot above (top-right panel). In the filtered 2025 view: Petrol 45.9%, Electric 33.2%, Diesel 20.0%.
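The percentage split can be derived from the `fuelTypeCount` map by merging codes that share a label and dividing by the total. The sketch below uses only the codes from the table above and groups every other code as "Other". That grouping, the helper `fuelShares`, and the constant `FUEL_LABELS` are assumptions. Hybrids are flagged separately via `sahkohybridi`, so how the app folds them into the donut is not shown here.

```typescript
// Labels for the subset of Traficom fuel codes listed in the table above
const FUEL_LABELS: Record<string, string> = {
  "01": "Petrol",
  "02": "Diesel",
  "03": "Electric",
  "04": "Electric",
  "05": "Ethanol (E85)",
  "06": "Natural Gas",
  "13": "Petrol/CNG",
};

interface Slice {
  name: string;
  value: number;   // vehicle count
  percent: number; // share of total, rounded to one decimal place
}

// Merge codes into labelled slices and compute each slice's share of the total
function fuelShares(fuelTypeCount: Record<string, number>): Slice[] {
  const merged = new Map<string, number>();
  let total = 0;
  for (const [code, n] of Object.entries(fuelTypeCount)) {
    const label = FUEL_LABELS[code] ?? "Other"; // unknown codes grouped together
    merged.set(label, (merged.get(label) ?? 0) + n);
    total += n;
  }
  return Array.from(merged.entries())
    .map(([name, value]) => ({
      name,
      value,
      percent: total === 0 ? 0 : Math.round((value / total) * 1000) / 10,
    }))
    .sort((a, b) => b.value - a.value);
}
```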


3. 📉 New Registrations Per Year (Line Chart)

Plots total new vehicle registrations per calendar year across the selected date range. Highlights growth, recession-era dips, and post-COVID recovery patterns.

Chart visible in the bottom-left of the dashboard screenshot.


4. 📊 Vehicle Age Distribution (Histogram)

Buckets vehicles by their first-use year into a histogram. Identifies the vintage of the national fleet and shows whether the fleet is renewing quickly or ageing.

Chart visible in the bottom-right of the dashboard screenshot.
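Bucketing by first-use year amounts to flooring each year to the start of a fixed-width bin. This sketch assumes a hypothetical `{ year: count }` map of first-use years and uses 5-year bins. The README does not state the bin width, and `ageHistogram` is a hypothetical name.

```typescript
interface Bucket {
  label: string; // e.g. "2010-2014"
  count: number;
}

// Group first-use years into fixed-width bins (5 years by default).
// Year 0 (unparseable date) is skipped.
function ageHistogram(yearCount: Record<number, number>, width = 5): Bucket[] {
  const bins = new Map<number, number>();
  for (const [y, n] of Object.entries(yearCount)) {
    const year = Number(y);
    if (year <= 0) continue;
    const start = Math.floor(year / width) * width; // e.g. 2013 → 2010
    bins.set(start, (bins.get(start) ?? 0) + n);
  }
  return Array.from(bins.entries())
    .sort(([a], [b]) => a - b) // oldest bin first
    .map(([start, count]) => ({ label: `${start}-${start + width - 1}`, count }));
}
```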


5. 📊 Top 10 Models (Horizontal Bar Chart)

Ranks the most registered individual models (e.g. Toyota Yaris, VW Golf). Complements the makes chart with model-level granularity.


6. 📈 EV Adoption Trend (Line Chart)

Year-by-year growth of electric vehicle registrations. Critical for assessing Finland's electrification trajectory against EU 2035 zero-emission targets.
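Year-over-year growth can be computed directly from the `evByYear` map in the summary. A minimal sketch follows. `evGrowth` is a hypothetical helper, and each year is compared with the previous year present in the map.

```typescript
interface Growth {
  year: number;
  count: number;
  yoy: number | null; // fractional change vs. previous year; null for the first year or a zero base
}

// Year-over-year change of EV registrations, oldest year first
function evGrowth(evByYear: Record<number, number>): Growth[] {
  const years = Object.keys(evByYear).map(Number).sort((a, b) => a - b);
  return years.map((year, i) => {
    const count = evByYear[year];
    const prev = i > 0 ? evByYear[years[i - 1]] : 0;
    return { year, count, yoy: prev > 0 ? (count - prev) / prev : null };
  });
}
```

With the illustrative `evByYear` values from the summary example (2020: 8421, 2021: 19843, 2022: 38251), 2021 comes out at roughly +136% and 2022 at roughly +93%.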


7. 🌍 Regional Distribution (Bar Chart)

Top municipalities by vehicle registrations. Reveals urban concentration (Helsinki, Espoo, Tampere) vs. rural vehicle density patterns.


🤖 AI-Powered Insights (Gemini 2.5 Flash)


How Gemini is Used

The app integrates Google Gemini 2.5 Flash for two AI features:

Feature | Endpoint | Description
Auto-Analyse Market | POST /api/analyze | Generates a full structured market report (8 sections) from the aggregated stats JSON. Non-streaming.
AI Chat | POST /api/chat | Real-time streaming conversational chat. Maintains message history per session.

System Prompt Strategy

Both routes embed the dataset summary JSON directly in the system instruction:

You are an expert Finnish vehicle market analyst with deep knowledge of the
Traficom vehicle registry, Finnish transport policy, and European automotive
trends. Use Finnish place names and brands naturally. Keep responses
structured and professional.

## Current Dataset Summary
{ "totalVehicles": 5122260, "makeCount": { ... }, ... }

By injecting aggregated statistics (not raw rows) into the prompt, the model receives a summary of the entire dataset in under 20 KB, a small fraction of Gemini's context window.
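One way to assemble that system instruction is sketched below. `buildSystemInstruction` is a hypothetical helper. The resulting string could be passed as the `systemInstruction` model parameter of @google/generative-ai, but the project's routes may build and pass it differently.

```typescript
// Persona text copied from the system prompt above
const PERSONA = [
  "You are an expert Finnish vehicle market analyst with deep knowledge of the",
  "Traficom vehicle registry, Finnish transport policy, and European automotive",
  "trends. Use Finnish place names and brands naturally. Keep responses",
  "structured and professional.",
].join("\n");

// Hypothetical helper: append the aggregated summary as compact JSON
function buildSystemInstruction(summary: unknown): string {
  return `${PERSONA}\n\n## Current Dataset Summary\n${JSON.stringify(summary)}`;
}
```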

Rate Limit Handling (Flash → Flash-Lite Fallback)

The Gemini free tier allows 25 requests per day on gemini-2.5-flash. When a 429 Too Many Requests error is detected, both API routes automatically retry with gemini-2.5-flash-lite:

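// callGemini() and isRateLimit() are helpers defined in the API routes (not shown)
async function generateWithFallback(prompt: string) {
  try {
    return await callGemini('gemini-2.5-flash', prompt);
  } catch (err) {
    if (isRateLimit(err)) {
      return await callGemini('gemini-2.5-flash-lite', prompt); // fallback on 429
    }
    throw err; // don't swallow non-rate-limit errors
  }
}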
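The same pattern generalises to an ordered list of models. Returning the model that answered also lets the UI decide whether to show the fallback badge. In the sketch below, `isRateLimit` and `withModelFallback` are hypothetical. Detecting a 429 via a `status` field or the error message is an assumption that should be checked against the SDK version in use.

```typescript
// Hypothetical rate-limit check. Assumption: the error exposes an HTTP `status`
// field or mentions 429 / "Too Many Requests" in its message.
function isRateLimit(err: unknown): boolean {
  if (typeof err === "object" && err !== null && "status" in err) {
    if ((err as { status?: unknown }).status === 429) return true;
  }
  return err instanceof Error && /\b429\b|Too Many Requests/i.test(err.message);
}

// Try each model in order, moving on only when the previous one was rate-limited
async function withModelFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
): Promise<{ model: string; result: T }> {
  if (models.length === 0) throw new Error("no models given");
  let lastErr: unknown;
  for (const model of models) {
    try {
      return { model, result: await call(model) };
    } catch (err) {
      if (!isRateLimit(err)) throw err; // other errors surface immediately
      lastErr = err;
    }
  }
  throw lastErr; // every model was rate-limited
}
```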

When Flash-Lite is active, an amber badge (⚡ Flash-Lite (fallback)) is shown in the AI chat panel header.

LocalStorage Caching (1-Hour TTL)

Analysis results are cached in localStorage keyed by a hash of the current stats object:

// Cache key: gemini_cache_v1_analyze_<statsHash>
// TTL: 1 hour
// On cache hit: response shown instantly, no API call made

Clicking "Auto-Analyse Market" on the same dataset within an hour returns the cached result immediately, preserving daily request quota.
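Below is a minimal sketch of a TTL cache in that style. It uses a storage interface that `window.localStorage` satisfies, so an in-memory stand-in can be used in tests. The cache-key prefix comes from the comment above. The hash (32-bit FNV-1a over UTF-16 code units of the stats JSON) is an assumption, since the README does not name one, and all function names are hypothetical.

```typescript
// Subset of the Web Storage API that localStorage implements
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const TTL_MS = 60 * 60 * 1000; // 1 hour

// Assumption: 32-bit FNV-1a over the string's UTF-16 code units
function hashString(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16);
}

function cacheKey(stats: unknown): string {
  return `gemini_cache_v1_analyze_${hashString(JSON.stringify(stats))}`;
}

// Return the cached value if younger than the TTL; otherwise drop the entry
function readCache(store: KeyValueStore, key: string, now = Date.now()): string | null {
  const raw = store.getItem(key);
  if (raw === null) return null;
  try {
    const { savedAt, value } = JSON.parse(raw) as { savedAt: number; value: string };
    if (now - savedAt < TTL_MS) return value; // fresh: no API call needed
  } catch {
    // corrupted entry: fall through and remove it
  }
  store.removeItem(key); // expired or unreadable
  return null;
}

function writeCache(store: KeyValueStore, key: string, value: string, now = Date.now()): void {
  store.setItem(key, JSON.stringify({ savedAt: now, value }));
}
```

Since the key is derived from `JSON.stringify`, two stats objects with the same content but different key order would hash differently. Building the stats object the same way every time avoids this.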

Daily Request Tracker

A live counter in the AI Insights sidebar shows how many requests have been used today against the 25 requests-per-day (RPD) limit. The counter turns amber at 80% of the limit and red when the limit is reached.
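The colour thresholds and a per-day counter key can be sketched as follows. `usageLevel` and `dailyKey` are hypothetical names. Keying by the user's local date is an assumption, and Google's actual quota reset time may not line up with local midnight.

```typescript
type UsageLevel = "ok" | "amber" | "red";

// Amber from 80% of the daily limit, red once the limit is reached
function usageLevel(used: number, limit = 25): UsageLevel {
  if (used >= limit) return "red";
  if (used >= 0.8 * limit) return "amber";
  return "ok";
}

// Hypothetical storage key per local calendar day, so the count resets daily
function dailyKey(date = new Date()): string {
  const y = date.getFullYear();
  const m = String(date.getMonth() + 1).padStart(2, "0");
  const d = String(date.getDate()).padStart(2, "0");
  return `gemini_requests_${y}-${m}-${d}`;
}
```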

Example Questions for the AI

"Which brand has grown the most in the last 5 years?"
"What is the EV market share trend in Finland?"
"Which regions have the oldest vehicle fleets?"
"What are the COโ‚‚ emission trends across fuel types?"
"How does Finland's EV adoption compare to EU averages?"
"Which vehicle class (M1, N1, L3e) is growing fastest?"
"What are the policy implications of the current diesel share?"

🔍 Key Findings

Note: The findings below are placeholders to be filled after running a full-dataset analysis. Upload the Traficom CSV and click Auto-Analyse Market to generate AI-powered insights.

Brand Market Share Findings

[TO BE ADDED AFTER ANALYSIS]

Key questions to answer:

  • Which brand holds the largest share of new 2025 registrations?
  • Has Toyota maintained first place against growing Korean competition?
  • What share does Tesla hold in the EV-only segment?

EV Adoption Findings

[TO BE ADDED AFTER ANALYSIS]

Key questions to answer:

  • What was the year-over-year EV growth rate from 2020 to 2025?
  • Is the EV adoption curve accelerating or plateauing?
  • Which municipality has the highest EV concentration?

Regional Findings

[TO BE ADDED AFTER ANALYSIS]

Key questions to answer:

  • Which 5 municipalities account for the majority of new registrations?
  • Do rural municipalities show significantly older average fleet age?
  • Is there a regional divide in EV vs. diesel adoption?

Fleet Age Findings

[TO BE ADDED AFTER ANALYSIS]

Key questions to answer:

  • What is the national average vehicle age in 2025?
  • How has average fleet age changed over the last decade?
  • Which vehicle class (passenger car, van, motorcycle) is oldest on average?

🚀 Getting Started

Prerequisites

  • Node.js 20.9+ (the minimum for Next.js 16) or Bun 1.0+
  • A Google Gemini API key, available free from Google AI Studio
  • The Traficom CSV file, downloaded from avoindata.fi

Installation

1. Clone the repository

git clone https://github.com/your-username/vehicle-stats.git
cd vehicle-stats

2. Install dependencies

# With Bun (recommended: faster installs)
bun install

# Or with npm
npm install

3. Configure environment variables

Create a .env.local file in the project root:

GEMINI_API_KEY=your_gemini_api_key_here

4. Start the development server

# With Bun
bun dev

# Or with npm
npm run dev

Open http://localhost:3000 in your browser.

5. Load the Traficom dataset

  1. Download the ZIP from avoindata.fi
  2. Extract the CSV (e.g. TieliikenneAvoinData_31_12_2025.csv)
  3. In the app's upload screen, choose a date range (e.g. the last 2 years for ~100k rows and the fastest load)
  4. Drop or click to upload the CSV; parsing progress is shown in real time

Environment Variables

Variable | Required | Description
GEMINI_API_KEY | ✅ Yes | Google Gemini API key from AI Studio

📁 Project Structure

vehicle-stats/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── chat/
│   │   │   │   └── route.ts          # Streaming Gemini chat endpoint
│   │   │   └── analyze/
│   │   │       └── route.ts          # Full market analysis endpoint
│   │   ├── globals.css               # Tailwind v4 + dark mode styles
│   │   ├── layout.tsx                # Root layout (ThemeProvider + Contexts)
│   │   └── page.tsx                  # Entry point → <AppShell />
│   │
│   ├── components/
│   │   ├── ui/                       # Radix UI primitives (button, card, badge…)
│   │   ├── AIChat.tsx                # Chat UI + auto-analyse panel + caching
│   │   ├── AppShell.tsx              # Section routing (dashboard/chat/explorer/export)
│   │   ├── Charts.tsx                # All 7 Recharts visualisations
│   │   ├── DataExplorer.tsx          # Filterable, searchable, paginated data table
│   │   ├── ExportPanel.tsx           # CSV / JSON stats / Markdown AI report export
│   │   ├── FileUpload.tsx            # Month+year date-range selector + CSV drop zone
│   │   ├── Sidebar.tsx               # Navigation + dark mode + EN/FI language toggle
│   │   └── SummaryCards.tsx          # 6 KPI stat cards (total, EV, hybrid, age…)
│   │
│   ├── context/
│   │   ├── DataContext.tsx           # Global parsed stats, records, active filters
│   │   └── LanguageContext.tsx       # EN/FI language switching via useLanguage()
│   │
│   └── lib/
│       ├── csvParser.ts              # PapaParse chunked streaming + YearFilter logic
│       ├── dataProcessor.ts          # Aggregation helpers + Gemini summary builder
│       ├── fuelTypes.ts              # Fuel code → label map + EV/hybrid detection sets
│       ├── i18n.ts                   # EN and FI translation strings (~60 keys)
│       ├── types.ts                  # Shared TypeScript interfaces
│       └── utils.ts                  # cn(), formatNumber(), downloadFile()
│
├── screenshots/
│   ├── dashboard.jpeg
│   ├── aichatand suggestions.jpeg
│   ├── dataexplorer.jpeg
│   └── reports.jpeg
│
├── public/                           # Static assets (SVGs)
├── .env.local                        # Local environment variables (git-ignored)
├── package.json
├── tsconfig.json
└── README.md

🛠️ Tech Stack

Technology | Purpose | Version
Next.js | Full-stack React framework (App Router) | 16.2
TypeScript | Static type safety | 5.x
Tailwind CSS | Utility-first styling with dark mode | v4
Recharts | Composable SVG chart library | 3.8
@google/generative-ai | Gemini 2.5 Flash SDK | 0.24
PapaParse | Browser-side CSV streaming parser | 5.5
Radix UI | Accessible headless UI primitives | latest
Lucide React | Icon library | latest
react-markdown | Renders Gemini Markdown responses in chat | 10.x
next-themes | Dark/light mode without a flash of the wrong theme | 0.4
Bun | Fast JS runtime & package manager | 1.x

⚠️ Limitations & Future Work

Current Limitations

Limitation | Detail
Gemini free tier | 25 requests per day on the Flash model. LocalStorage caching mitigates this for repeated analyses on the same dataset.
No real-time prices | The Traficom dataset contains registration data only: no sale price, mileage, or condition data.
CSV upload only | The dataset must be manually downloaded from avoindata.fi and uploaded locally; there is no auto-fetch.
Browser memory | Very old machines may struggle with the 930 MB uncompressed CSV. Using a 2-year date filter (< 200k rows) is recommended for slower devices.
ISO-8859-1 encoding | The parser is tuned for the Traficom CSV encoding; other CSV formats may need adjustment.

Planned Improvements

  • 🗺️ Municipality map visualisation: a choropleth map of Finland showing EV density or fleet age per region
  • 🤖 Price prediction ML model: integrate a used-car price API (e.g. Nettiauto) for market value estimation
  • 🌐 Full Finnish/English toggle: complete i18n of all chart labels and AI prompts
  • 📅 Time-series comparison: load two date ranges side by side for year-over-year comparison
  • 📤 PDF export: export the full dashboard (charts + AI analysis) as a formatted PDF
  • ☁️ Server-side parsing: offload CSV processing to a background worker for very low-memory devices
  • 🔄 Auto-refresh: detect when a new Traficom quarterly release is available and notify the user

🙏 Acknowledgements

  • Traficom (Finnish Transport and Communications Agency), for publishing and maintaining the open vehicle registry dataset under a CC BY 4.0 licence.

  • Google Data Analytics Professional Certificate: this project was built as a capstone demonstrating the full data analytics cycle (ask, prepare, process, analyse, share, and act).

  • Google AI Studio, for providing free access to the Gemini 2.5 Flash API used to power the AI insights features.

  • avoindata.fi, Finland's national open data portal, for making government datasets accessible to researchers and developers.


🖼️ Screenshots

  • Dashboard
  • AI Chat & Market Insights
  • Data Explorer
  • Export Panel

📄 License

MIT License

Copyright (c) 2026

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

Built as part of the Google Data Analytics Professional Certificate

🇫🇮 Data: Traficom Open Data · CC BY 4.0
