https://bigquery.cloud.google.com/dataset/bigquery-public-data:stackoverflow
- Use the Google BigQuery API to query the Stack Overflow data from the BigQuery public dataset (see the BigQuery sketch below).
- Upload the exported files to an Amazon S3 bucket and set up an Amazon EMR cluster (see the S3/EMR sketch below).
- Run startup.sh to set up the environment via the AWS CLI.
- Run table_name_producer.py and table_name_stream.py to start the Kafka producer and consumer processes (see the Kafka sketch below).
- Verify that Parquet files are created on the cluster at the specified output location (see the verification sketch below).
- Use the Tableau link above to visualize the analysis of the data.
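
The exact query used is project-specific; as a minimal sketch, assuming the `google-cloud-bigquery` client library, Application Default Credentials, and illustrative columns, querying the public Stack Overflow dataset could look like:

```python
from google.cloud import bigquery

# Assumes Application Default Credentials are configured
# (e.g. `gcloud auth application-default login`).
client = bigquery.Client()

# Hypothetical query against the public Stack Overflow dataset;
# the table and columns here are illustrative, not the project's actual query.
sql = """
    SELECT id, title, creation_date, score
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE creation_date >= '2020-01-01'
    LIMIT 1000
"""

rows = client.query(sql).result()    # run the query and wait for completion
df = rows.to_dataframe()             # convert the result to a pandas DataFrame
df.to_csv("posts_questions_sample.csv", index=False)
```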
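How the files are uploaded and how the cluster is provisioned is not specified in the repo; one possible sketch using `boto3`, where the bucket name, object keys, region, and cluster settings are all assumptions:

```python
import boto3

# Hypothetical bucket and key names; substitute your own.
s3 = boto3.client("s3")
s3.upload_file("posts_questions_sample.csv",
               "my-stackoverflow-bucket",
               "input/posts_questions_sample.csv")

# Minimal EMR cluster with Spark; instance types, release label,
# and IAM roles are assumptions.
emr = boto3.client("emr", region_name="us-east-1")
response = emr.run_job_flow(
    Name="stackoverflow-streaming",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://my-stackoverflow-bucket/logs/",
)
print("Cluster ID:", response["JobFlowId"])
```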
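table_name_producer.py and table_name_stream.py are not reproduced here; as a rough sketch of the producer/consumer pattern they follow, assuming the `kafka-python` package, a broker on localhost:9092, and a hypothetical topic name (the real consumer may instead be a Spark streaming job):

```python
import json
from kafka import KafkaProducer, KafkaConsumer

TOPIC = "stackoverflow_posts"  # hypothetical topic name

# Producer side: publish records (e.g. rows exported from BigQuery) as JSON.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"id": 1, "title": "example question", "score": 10})
producer.flush()

# Consumer side: read messages and hand them to whatever writes the Parquet output.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)
```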
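One way to check that Parquet output is being written, assuming a Spark session on the cluster and a hypothetical HDFS output path (use the path configured in the streaming job):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-check").getOrCreate()

# Hypothetical output location; replace with the job's configured path.
df = spark.read.parquet("hdfs:///user/hadoop/output/stackoverflow/")
df.printSchema()
print("Row count so far:", df.count())
```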