Data Science Project - for 'Advanced Topics in Database Systems' M.Sc. Course ECE @ntua

VikentiosVitalis/advanced_topics_in_database_systems

Data Science Project

This project focuses on data analysis using Apache Hadoop and Apache Spark.

The goal is to gain familiarity with distributed systems and modern data science techniques.

The project uses large datasets of crime data from Los Angeles.

Tools Used

  • Apache Hadoop 3.3.6: The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. It enables distributed processing of large datasets across clusters of computers using simple programming models.

  • Apache Spark: A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It covers batch/streaming data, SQL analytics, data science at scale, and machine learning.
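The "simple programming models" mentioned for Hadoop refer to the MapReduce style of computation: map each record to key/value pairs, group the pairs by key, then reduce each group. The following is a minimal, single-process sketch of that idea in plain Python, not the project's actual code. The sample records and the `area` and `crime` field names are hypothetical; the real dataset's schema may differ.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample records for illustration only;
# they are not taken from the project's dataset.
records = [
    {"area": "Central", "crime": "BURGLARY"},
    {"area": "Hollywood", "crime": "THEFT"},
    {"area": "Central", "crime": "THEFT"},
]


def map_phase(record):
    """Emit one (area, 1) pair per record, as a mapper would."""
    yield record["area"], 1


def shuffle(pairs):
    """Group values by key, mimicking the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one key, as a reducer would."""
    return key, sum(values)


mapped = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'Central': 2, 'Hollywood': 1}
```

On a real cluster the framework runs the map and reduce steps in parallel across machines and handles the grouping itself; the sketch only shows the shape of the computation.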

The project runs on virtual machines provided by the ~Okeanos-knossos public cloud.

A detailed setup guide for the installation of the tools used is available in the files/documents folder.

Results

A detailed report covering the execution and interpretation of the queries is also available in the files/documents folder.

Contributors
