Data-Intensive Programming Exercises & Projects

This repository contains the exercises and projects completed for the Data-Intensive Programming course at Tampere University.


Repository Structure

  • Exercises/
    Contains course assignments focused on foundational concepts in data-intensive programming. These may include algorithmic tasks, data processing exercises, or hands-on coding assignments.

  • Projects/
    Contains larger-scale, self-contained projects applying data-intensive programming techniques. These may involve real-world data sets, performance optimizations, or interactive applications.


Technologies Used

  • Scala – The primary language (≈50% of the codebase)
  • Python – Also used heavily (≈47% of the code)
  • Jupyter Notebooks – Used for exploratory data analysis and interactive demonstrations (≈3%)

📘 Project Descriptions

Exercises

  • Exercise 1 – Data Processing Basics
    Learn to manipulate collections, perform transformations, and implement simple algorithms in Scala/Python.

  • Exercise 2 – Parallelism
    Use parallel collections and concurrency constructs to speed up computation on large datasets (a minimal parallel-collections sketch appears after this list).

  • Exercise 3 – File & Data Formats
    Parse structured data (CSV, JSON) and implement simple aggregations.

  • Exercise 4 – Distributed Data Processing
    Introduction to MapReduce and Spark, implementing word count and other classic tasks (a Spark word-count sketch appears after this list).

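The two sketches below illustrate techniques from the exercises above; they are minimal examples written for this README, not code taken from the repository. The first shows Scala's parallel collections (Exercise 2); note that on Scala 2.13+ they live in the separate scala-parallel-collections module.

```scala
import scala.collection.parallel.CollectionConverters._

object ParallelSum {
  def main(args: Array[String]): Unit = {
    val numbers = (1L to 1000000L).toVector
    // .par spreads the map and the reduction over a thread pool
    val sumOfSquares = numbers.par.map(n => n * n).sum
    println(s"Sum of squares: $sumOfSquares")
  }
}
```

The second is the classic Spark word count mentioned in Exercise 4, run in local mode; the input path is a placeholder.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]")            // run locally on all available cores
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("data/input.txt")    // placeholder input path
      .flatMap(_.split("\\s+"))      // split each line into words
      .filter(_.nonEmpty)
      .map(word => (word.toLowerCase, 1))
      .reduceByKey(_ + _)            // sum the counts per word

    counts.take(20).foreach { case (word, n) => println(s"$word: $n") }
    spark.stop()
  }
}
```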

Final Project

The final project applies the course concepts to a larger data-intensive problem, such as:

  • Analyzing a real-world dataset (e.g., logs, social media, sensor data).
  • Implementing efficient data pipelines for ETL (Extract–Transform–Load); a minimal Spark ETL sketch follows this list.
  • Using Spark/parallel algorithms for large-scale analysis.
  • Presenting findings in a short report or notebook.
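
As a concrete illustration of the ETL-style pipeline mentioned above, here is a minimal Spark DataFrame sketch. The column names (sensor_id, timestamp, value) and the file paths are illustrative assumptions, not part of this repository.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SensorEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SensorEtl")
      .master("local[*]")
      .getOrCreate()

    // Extract: load a raw CSV with a header row and inferred column types
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/sensors.csv")        // placeholder input path

    // Transform: drop malformed rows, then compute per-sensor daily averages
    val daily = raw
      .na.drop(Seq("sensor_id", "value"))
      .withColumn("day", to_date(col("timestamp")))
      .groupBy(col("sensor_id"), col("day"))
      .agg(avg(col("value")).as("avg_value"))

    // Load: write the result as Parquet, partitioned by day
    daily.write.mode("overwrite").partitionBy("day").parquet("output/daily_avg")

    spark.stop()
  }
}
```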

🚀 How to Run

Requirements

  • Scala 2.13+ (or as specified in repo)
  • Python 3.8+
  • Jupyter Notebook (if using .ipynb files)
  • Optional: Apache Spark (if project/exercises use it)
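
A possible build.sbt matching these requirements is sketched below; the exact versions and module layout are assumptions, and the repository may organize its build differently.

```scala
// Hypothetical build.sbt sketch for the Scala exercises
ThisBuild / scalaVersion := "2.13.12"

lazy val root = (project in file("."))
  .settings(
    name := "data-intensive-programming",
    libraryDependencies ++= Seq(
      // needed for .par on Scala 2.13+
      "org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.4",
      // only needed if the exercises/project use Spark
      "org.apache.spark" %% "spark-sql" % "3.5.1"
    )
  )
```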
