This repository contains the exercises and projects completed for the Data-Intensive Programming course at Tampere University.
- Exercises/ – Contains course assignments focused on foundational concepts in data-intensive programming. These may include algorithmic tasks, data processing exercises, or hands-on coding assignments.
- Projects/ – Contains larger-scale, self-contained projects applying data-intensive programming techniques. These may involve real-world data sets, performance optimizations, or interactive applications.
- Scala – Appears to be the primary language (≈50% of the codebase)
- Python – Also heavily used (≈47% of the code)
- Jupyter Notebooks – Likely used for exploratory data analysis or interactive demonstrations (≈3%)
- Exercise 1 – Data Processing Basics: Learn to manipulate collections, perform transformations, and implement simple algorithms in Scala/Python (illustrative sketches of each exercise topic follow this list).
- Exercise 2 – Parallelism: Use parallel collections and concurrency constructs to speed up computation on large datasets (sketch below).
- Exercise 3 – File & Data Formats: Parse structured data (CSV, JSON) and implement simple aggregations (sketch below).
- Exercise 4 – Distributed Data Processing: Introduction to MapReduce and Spark, implementing word count and other classic tasks (sketch below).
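For Exercise 1, a minimal Scala sketch of the kind of collection work involved; the data and computations here are illustrative assumptions, not taken from the actual assignment:

```scala
object CollectionBasics extends App {
  // Illustrative input data (an assumption, not from the course material)
  val temperatures = List(21.5, 23.0, 19.8, 25.1, 22.4)

  // Transformation: Celsius to Fahrenheit
  val fahrenheit = temperatures.map(c => c * 9 / 5 + 32)

  // Filtering + aggregation: mean of the "warm" readings only
  val warm = temperatures.filter(_ >= 22.0)
  val warmMean = if (warm.nonEmpty) warm.sum / warm.size else 0.0

  // A simple fold: running maximum
  val maxTemp = temperatures.foldLeft(Double.MinValue)(math.max)

  println(s"Fahrenheit: $fahrenheit")
  println(f"Mean of warm readings: $warmMean%.2f")
  println(s"Max temperature: $maxTemp")
}
```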
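For Exercise 2, a sketch using Scala parallel collections to parallelize a simple aggregation; on Scala 2.13+ these require the separate scala-parallel-collections module, and the workload and sizes below are arbitrary:

```scala
// On Scala 2.13+, parallel collections live in a separate module:
// "org.scala-lang.modules" %% "scala-parallel-collections"
import scala.collection.parallel.CollectionConverters._

object ParallelSum extends App {
  val numbers = (1 to 5000000).toVector

  // Helper: run a block and report its wall-clock time in milliseconds
  def time[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1e6)
  }

  // Sequential baseline
  val (seqSum, seqMs) = time(numbers.map(n => n.toLong * n).sum)

  // Parallel version: .par splits the work across available cores
  val (parSum, parMs) = time(numbers.par.map(n => n.toLong * n).sum)

  assert(seqSum == parSum)
  println(f"sequential: $seqMs%.1f ms, parallel: $parMs%.1f ms, sum = $parSum")
}
```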
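For Exercise 3, a small CSV-parsing and aggregation sketch in plain Scala; the embedded data and column layout are assumptions, and a real exercise would read from a file (e.g. with scala.io.Source):

```scala
object CsvAggregation extends App {
  // Inline CSV for illustration; the columns and values are made up
  val csv =
    """city,temperature
      |Tampere,21.5
      |Helsinki,23.0
      |Tampere,19.8
      |Helsinki,25.1""".stripMargin

  // Skip the header, split each row into (city, temperature)
  val rows = csv.linesIterator.drop(1).map { line =>
    val cols = line.split(",")
    (cols(0).trim, cols(1).trim.toDouble)
  }.toList

  // Simple aggregation: mean temperature per city
  val meanPerCity = rows.groupBy(_._1).map { case (city, readings) =>
    city -> readings.map(_._2).sum / readings.size
  }

  meanPerCity.foreach { case (city, mean) => println(f"$city: $mean%.2f") }
}
```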
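For Exercise 4, the classic Spark word count in Scala, run in local mode for illustration; the input path is a placeholder, and this is a sketch rather than the course's reference solution:

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Local mode for illustration; a cluster run would configure the master differently
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // "input.txt" is a placeholder path
    val counts = sc.textFile("input.txt")
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(20).foreach { case (word, n) => println(s"$word\t$n") }
    spark.stop()
  }
}
```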
The final project applies the course concepts to a larger data-intensive problem, such as:
- Analyzing a real-world dataset (e.g., logs, social media, sensor data).
- Implementing efficient data pipelines for ETL (Extract–Transform–Load).
- Using Spark/parallel algorithms for large-scale analysis (see the ETL sketch after this list).
- Presenting findings in a short report or notebook.
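As a rough illustration of what such a pipeline can look like, here is a hedged Spark ETL sketch in Scala; the input path, the column names (timestamp, value), and the output location are all assumptions:

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object EtlPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EtlPipeline")
      .master("local[*]")
      .getOrCreate()

    // Extract: read raw CSV (path and options are placeholders)
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/events.csv")

    // Transform: drop incomplete rows and aggregate per day
    // (the "timestamp" and "value" column names are assumptions)
    val daily = raw
      .na.drop(Seq("timestamp", "value"))
      .withColumn("day", F.to_date(F.col("timestamp")))
      .groupBy("day")
      .agg(F.count(F.lit(1)).as("events"), F.avg("value").as("avg_value"))

    // Load: persist the summary as Parquet
    daily.write.mode("overwrite").parquet("output/daily_summary")

    spark.stop()
  }
}
```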
- Scala 2.13+ (or as specified in repo)
- Python 3.8+
- Jupyter Notebook (if using .ipynb files)
- Optional: Apache Spark (if the project/exercises use it)
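A minimal build.sbt sketch that would satisfy these requirements for the Scala parts; the exact versions are assumptions and should match whatever the repository actually pins:

```scala
// build.sbt – hypothetical minimal build; versions are assumptions
ThisBuild / scalaVersion := "2.13.12"

libraryDependencies ++= Seq(
  // Parallel collections moved to a separate module in Scala 2.13
  "org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.4",
  // Only needed for the Spark-based exercises/projects
  "org.apache.spark" %% "spark-sql" % "3.5.0"
)
```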