Final project of the Big Data module of the Americanas Polo Tech data science track.
The project was carried out as a group using the Databricks platform, PySpark, and Python.
Dataset available at: https://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+temperature+modulation
Group members:
- Alessa Santos
- Beatriz Guisso
- Guilherme Tonini
- João Luiz de Castro
- Thais Carvalho
- Thiago Lopes
The objective of this project was to compare the execution time of data-processing code and of a simple machine learning model on Big Data versus small datasets, and to measure how much faster Spark is than the pandas library.
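
A timing comparison of this kind can be sketched as follows. This is a minimal illustration, not the code used in the project: the helper names `time_call` and `speedup` and the stand-in workload `sum_of_squares` are hypothetical, and only the Python standard library is used so the snippet runs anywhere.

```python
import time
from statistics import median


def time_call(fn, repeats=3):
    """Call fn() `repeats` times; return (median wall-clock seconds, last result)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        durations.append(time.perf_counter() - start)
    return median(durations), result


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Hypothetical stand-in workload; replace it with the pandas or Spark
# operation being measured.
def sum_of_squares(n=100_000):
    return sum(i * i for i in range(n))


elapsed, value = time_call(sum_of_squares)
print(f"median time: {elapsed:.4f} s")
```

When timing Spark, the callable passed to `time_call` should end in an action such as `count()` or `collect()`, because Spark transformations are lazy and do no work until an action runs. The two medians (pandas as baseline, Spark as candidate) can then be passed to `speedup`.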