Python implementation of a Hadoop MapReduce job that analyzes the strike rate of cricket batsmen and computes the final strike rate of each batsman.
Strike rate is the average number of runs a batsman scores per 100 balls. Given the input, find the final strike rate of each batsman.
Local_strike_rate refers to the strike rate of a single match.
Strike rate formula = (runs / balls) * 100 [rounded to 3 decimal places].
Average strike rate = sum of all local strike rates / total matches [rounded to 3 decimal places].
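As a quick sanity check of the two formulas, here is a small worked example in Python (the runs and balls figures are hypothetical, not taken from the sample data):

```python
# Hypothetical per-match figures, purely for illustration.
match_1 = round(45 / 62 * 100, 3)  # local strike rate for match 1
match_2 = round(30 / 28 * 100, 3)  # local strike rate for match 2

# Average strike rate = sum of local strike rates / number of matches.
average = round((match_1 + match_2) / 2, 3)
```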
-
Mapper
Input : Array of JSON objects
Output : name,local_strike_rate
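A minimal sketch of what mapper.py might look like. It assumes each input line is a single JSON object with "name", "runs", and "balls" fields; those field names are an assumption about the sample data's schema, not taken from the repository:

```python
#!/usr/bin/env python3
# Hypothetical sketch of mapper.py. Assumes one JSON object per input
# line with "name", "runs", and "balls" fields (assumed schema).
import json
import sys

def map_record(line):
    """Emit 'name,local_strike_rate' for one match record."""
    record = json.loads(line)
    # Local strike rate = (runs / balls) * 100, rounded to 3 places.
    local_sr = round(record["runs"] / record["balls"] * 100, 3)
    return f'{record["name"]},{local_sr}'

if __name__ == "__main__":
    for line in sys.stdin:
        line = line.strip()
        if line:
            print(map_record(line))
```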
-
Reducer
Input : Same format as the mapper output
Output : Independent JSON objects (refer to expexted_output_sample_data.json)
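A minimal sketch of what reducer.py might look like, assuming the streaming shuffle delivers the mapper's "name,local_strike_rate" lines sorted by name (the function and variable names below are illustrative, not the repository's actual code):

```python
#!/usr/bin/env python3
# Hypothetical sketch of reducer.py. Assumes input lines are
# "name,local_strike_rate", grouped/sorted by name.
import json
import sys

def reduce_lines(lines):
    """Average per-match strike rates into one record per batsman."""
    results = []
    current, total, matches = None, 0.0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, sr = line.rsplit(",", 1)
        if name != current:
            if current is not None:
                results.append({"name": current,
                                "strike_rate": round(total / matches, 3)})
            current, total, matches = name, 0.0, 0
        total += float(sr)
        matches += 1
    if current is not None:
        results.append({"name": current,
                        "strike_rate": round(total / matches, 3)})
    return results

if __name__ == "__main__":
    for obj in reduce_lines(sys.stdin):
        print(json.dumps(obj))
```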
- Mapper code (mapper.py): processes input data and emits intermediate key-value pairs.
- Reducer code (reducer.py): aggregates intermediate results and calculates the final strike rate for each batsman.
- Sample input (sample_data.json): contains cricket statistics.
- Sample output (expexted_output_sample_data.json): contains the expected output.
-
Test the pipeline locally:
cat sample_data.json | ./mapper.py | sort -k 1,1 | ./reducer.py
-
Create an HDFS directory for input:
hdfs dfs -mkdir /sr
-
Upload sample data to HDFS:
hdfs dfs -put sample_data.json /sr
-
Run the Hadoop MapReduce job:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.3.jar -mapper "$PWD/mapper.py" -reducer "$PWD/reducer.py" -input /sr/sample_data.json -output /sr/output_sr
-
View the output:
hdfs dfs -ls /sr/output_sr
hdfs dfs -cat /sr/output_sr/part-00000
{"name": "Deepti", "strike_rate": 61.551}
{"name": "Harmanpreet", "strike_rate": 87.124}
{"name": "Ishan", "strike_rate": 85.077}
{"name": "Jemimah", "strike_rate": 77.407}
{"name": "Renuka", "strike_rate": 74.35}
{"name": "Rohit", "strike_rate": 71.464}
{"name": "Shubman", "strike_rate": 66.041}
{"name": "Smriti", "strike_rate": 57.807}
{"name": "VVS Laxman", "strike_rate": 64.078}
{"name": "Virat", "strike_rate": 89.928}