
Udacity-SF-Crime-Statistics

Answers to Project Questions

  1. How did changing values on the SparkSession property parameters affect the throughput and latency of the data?

    Answer: Tuning the SparkSession property parameters changes how many rows the stream can process per second, which shows up in the processedRowsPerSecond metric reported for each micro-batch. Higher values of this metric indicate higher throughput and, for a fixed input rate, lower processing latency.
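    For reference, a minimal sketch (assuming a running Structured Streaming query object named query, returned by writeStream.start()) of how this metric can be read from the query's progress report:

      # Hypothetical sketch: `query` is assumed to be a PySpark StreamingQuery
      progress = query.lastProgress                    # most recent progress report (dict), or None
      if progress is not None:
          print(progress["processedRowsPerSecond"])    # rows processed per second
          print(progress["inputRowsPerSecond"])        # rows arriving per second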

  2. What were the 2-3 most efficient SparkSession property key/value pairs? Through testing multiple variations on values, how can you tell these were the most optimal?

    Answer: Using the processedRowsPerSecond metric to measure how efficiently the stream was running, I tuned the following three config parameters:

     i. spark.default.parallelism
     ii. spark.streaming.kafka.maxRatePerPartition
     iii. spark.sql.shuffle.partitions
    

    With spark.default.parallelism = 11000, spark.streaming.kafka.maxRatePerPartition = 15, and spark.sql.shuffle.partitions = 15, the stream processed up to 13.51 rows per second. Raising these to 15000, 20, and 20 respectively increased this to roughly 145.79 rows per second, so these three parameters appeared to have the largest effect on the stream's efficiency.
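    As an illustration only, here is a minimal sketch of how these three properties can be set when building the SparkSession and reading from Kafka; the bootstrap server address and topic name are placeholder assumptions, not values taken from this repository:

      from pyspark.sql import SparkSession

      # Hypothetical sketch using the higher-throughput values reported above.
      spark = (
          SparkSession.builder
          .master("local[*]")
          .appName("SF-Crime-Statistics")
          .config("spark.default.parallelism", 15000)
          .config("spark.streaming.kafka.maxRatePerPartition", 20)
          .config("spark.sql.shuffle.partitions", 20)
          .getOrCreate()
      )

      # Kafka source; the server address and topic name below are assumptions.
      df = (
          spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "sf.crime.calls")
          .load()
      )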

About

A Udacity project that integrates Kafka and Spark (in Python) in order to stream San Francisco crime statistics
