
A project challenge that is part of the Data Engineering Festival at HSBC.


bbuluttekin/Data-Engineering-Festival


Data Engineering Festival

This project is part of the Data Engineering Festival at HSBC.

In this project, I will be using the Spark RDD API for data processing.
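As a quick orientation to the RDD style of processing, here is a minimal sketch (not taken from this repository) of the classic word-count pattern applied to post titles. Transformations such as flatMap and reduceByKey are lazy, and only an action such as takeOrdered triggers the computation. The helper names (word_pairs, top_title_words, demo) are hypothetical, and the functions assume the caller passes in an existing PySpark SparkContext or RDD, so defining them does not require Spark to be installed.

```python
import re
from operator import add


def word_pairs(title):
    """Split a title into lowercase (word, 1) pairs, ignoring punctuation."""
    words = re.findall(r"[a-z0-9']+", (title or "").lower())
    return [(word, 1) for word in words]


def top_title_words(titles, n=10):
    """Return the n most frequent words in an RDD of title strings.

    flatMap and reduceByKey are lazy transformations; takeOrdered is the
    action that actually runs the job.
    """
    return (titles
            .flatMap(word_pairs)          # one (word, 1) pair per word
            .reduceByKey(add)             # sum the counts for each word
            .takeOrdered(n, key=lambda kv: -kv[1]))  # highest counts first


def demo(sc):
    """Run the sketch on a tiny in-memory RDD; sc is an existing SparkContext."""
    titles = sc.parallelize(["Show HN: my project", "Ask HN: my question"])
    return top_title_words(titles, 3)
```

With PySpark installed, calling demo(SparkContext("local[*]", "hn-sketch")) returns the three most frequent words; the order among words with equal counts is not guaranteed.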

Spark RDD: HackerNews data analysis challenge

You will analyse a dataset of (almost) all submitted HackerNews posts.
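A minimal sketch of loading the posts into an RDD, assuming (not confirmed here) that the unzipped data is a file named HNStories.json with one JSON object per line. The function names and the default path are hypothetical; adjust them to whatever the download actually produces.

```python
import json


def parse_post(line):
    """Parse one line of the (assumed) JSON-lines file into a Python object.

    Returns None for lines that are not valid JSON so they can be filtered out.
    """
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


def load_posts(sc, path="HNStories.json"):
    """Build an RDD of parsed posts; sc is an existing SparkContext.

    The path is an assumption. cache() keeps the parsed records in memory,
    since the same RDD is likely to be reused for several questions.
    """
    return (sc.textFile(path)            # one string per line of the file
              .map(parse_post)           # string -> parsed post (or None)
              .filter(lambda post: post is not None)
              .cache())


def count_posts(posts):
    """Action: run the pipeline and return the number of parsed posts."""
    return posts.count()
```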

  1. First, run python get_data.py to download the data. Alternatively, download it from https://s3-eu-west-1.amazonaws.com/kate-datasets/hackernews/HNStories.zip and unzip it.

  2. Then use the notebook spark-rdd-homework.ipynb to prototype your functions.

  3. Finally, write your functions in spark_rdd.py and submit them on K.A.T.E.
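For the manual alternative in step 1, a minimal standard-library sketch of downloading and unzipping the archive could look like this. It is not the contents of get_data.py; the function names are hypothetical, and the only detail taken from this README is the URL.

```python
import os
import urllib.request
import zipfile

DATA_URL = "https://s3-eu-west-1.amazonaws.com/kate-datasets/hackernews/HNStories.zip"


def download(url, dest_path):
    """Download url to dest_path, skipping the download if the file already exists."""
    if not os.path.exists(dest_path):
        urllib.request.urlretrieve(url, dest_path)
    return dest_path


def unzip(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def fetch_hn_data(dest_dir="."):
    """Download the HackerNews archive into dest_dir and unzip it there."""
    zip_path = os.path.join(dest_dir, "HNStories.zip")
    return unzip(download(DATA_URL, zip_path), dest_dir)
```

Calling fetch_hn_data() from the project directory fetches the archive once and then lists the extracted files; rerunning it reuses the existing zip instead of downloading it again.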
