The overall goal of the assignment is to:
- implement aggregation framework queries using the MongoDB Ruby Driver
The functional goal of the assignment is to:
- implement various document access methods for Race Results
Note that this assignment was written so that you can implement it in parts after each lecture. If you are performing the assignment in between lectures, stop at the next lecture boundary in the technical requirements section and resume once you have completed the lecture. You are free to experiment with other forms of the queries presented, but the grading will only be targeted at the specific requirements listed.
-
Start your MongoDB server
-
Download and extract the starter set of files. The root directory of this starter set will be referred to as the root directory of your solution.
student-start
|-- assignment.rb
|-- race_results.json
|-- .rspec (important hidden file)
`-- spec
    |-- assignment_spec.rb
    `-- spec_helper.rb
- assignment.rb - your solution must be placed within this file
- spec - this directory contains tests to verify your solution. You should not modify anything in this directory
-
Install the following gems. You may already have them installed.
$ gem install rspec
$ gem install rspec-its
$ gem install mongo -v 2.1.2
-
Run the rspec command from the project root directory (i.e., the student-start directory) to execute the unit tests within the spec directory. This should result in several failures until you complete your solution in assignment.rb.
$ rspec
(N) examples, (N) failures
...
-
Implement the Ruby technical requirements in assignment.rb within the provided class Solution. Helper methods have been provided to get a connection to MongoDB and to set the database and collection names. If you are not using the defaults below, you can override these values using environment variables.
- MONGO_URL='mongodb://localhost:27017'
- MONGO_DATABASE='test'
- RACE_COLLECTION='race1'
Implement all methods relative to the @coll instance variable, which is set up to reference the collection.
In this section we will use a few pre-canned queries to warm up on the syntax of the aggregation query. Follow-on sections will require more work on your part.
-
Load the assignment.rb script into the irb shell, verify you can get access to the collection, and assign that collection to the variable called racers (used in follow-on steps).
$ irb
> require './assignment.rb'
> racers=Solution.collection
=> #<Mongo::Collection:0x22344800 namespace=test.race1>
-
Issue a simple aggregation query that counts the number of race results in the collection.
> racers.find.aggregate([ {:$group=>{:_id=>0, :count=>{:$sum=>1}}}]).first => {"_id"=>0, "count"=>1000}
To break this query down:
> racers.find.aggregate([
    {:$group=>{
       :_id=>0,
       :count=>{:$sum=>1}
    }}
  ]).first
=> {"_id"=>0, "count"=>1000}
- aggregate() takes an array of commands (pipeline stages), and $group is one of them. The array can hold many aggregate commands, and $group can occur multiple times. The order within the array matters because each command processes the output of the one before it.
- $group has two primary arguments: the group-by key and the group functions. The function results are associated with each resulting key. In this example, the key is a single constant value, so every function is applied across all documents and there is a single group.
- _id is assigned our group key. Here it is a fixed 0 value, which makes every document processed a member of the same group.
- count is a property name we want in the results. We set it to the result of counting 1 for each document.
- $sum is an aggregate function that adds a number (1 in this case) for each document it processes.
- first() returns only the first document from the result of aggregate(). Since we know there will be only a single result document holding the count, we can simply return the first (and only) result.
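You can see the $group/$sum mechanics without a running server. The sketch below emulates that one stage in plain Ruby. It is illustrative only: emulate_constant_count is a hypothetical helper, and the three sample documents are made up rather than taken from race_results.json.

```ruby
# The pipeline from the text, written as plain Ruby data.
COUNT_PIPELINE = [{ :$group => { :_id => 0, :count => { :$sum => 1 } } }]

# Hypothetical helper: emulates a $group stage whose _id is a constant
# and whose only accumulator is {:$sum=>1}.
def emulate_constant_count(docs, stage)
  key = stage[:$group][:_id]
  # A constant key puts every document into the same group...
  groups = docs.group_by { |_doc| key }
  # ...and adding 1 per document yields the number of documents in that group.
  groups.map { |k, members| { "_id" => k, "count" => members.sum { 1 } } }
end

# Made-up sample documents (not from race_results.json).
count_sample = [{ "number" => 0 }, { "number" => 1 }, { "number" => 2 }]
count_result = emulate_constant_count(count_sample, COUNT_PIPELINE.first)
puts count_result.first.inspect
```

In irb, the real query passes COUNT_PIPELINE straight to racers.find.aggregate.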
-
Issue a slightly more complex aggregation query that counts the number of race results by (age-)group. Please pardon the overload of the age-group field '$group' (e.g., "14 and under") with the MongoDB aggregation function $group.
> racers.find.aggregate([ {:$group=>{:_id=>"$group", :count=>{:$sum=>1}}}]).each {|r| pp r}
{"_id"=>"14 and under", "count"=>111}
{"_id"=>"40 to 49", "count"=>141}
{"_id"=>"20 to 20", "count"=>123}
{"_id"=>"60 to 69", "count"=>121}
{"_id"=>"50 to 59", "count"=>129}
{"_id"=>"masters", "count"=>117}
{"_id"=>"30 to 39", "count"=>127}
{"_id"=>"15 to 19", "count"=>131}
To again break the query down:
- aggregate and the aggregation function $group are used exactly as before, except that the $group specification now uses the name of the (age-)group as the key. As you saw above, this produces several documents, one per distinct (age-)group name, with that name assigned to the _id of the document.
- count and $sum work the same as before, except they now have more (aggregate-function) groups to work with.
- each iterates through each document in the result. Since we may have multiple documents, first is of little value to us except to grab a sample.
- {|r| pp r} is a single-line block. Each document of the result is passed in as r, and pretty print (pp) prints the document hashes in a more human-readable form.
> racers.find.aggregate([ { :$group=>{ :_id=>"$group", :count=>{:$sum=>1}} } ]).each {|r| pp r}
If you are familiar with SQL, the above query would be similar to the following. Again, pardon our use of the term (age-)group versus a SQL group by.
select "group", count(*) as count from racers group by "group"
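The same idea extends to a field-reference key. The sketch below is a hypothetical pure-Ruby emulation over made-up documents; it does not show the driver's behavior. It groups on the value of the group field and counts each bucket, which is what "$group" as the _id does on the server.

```ruby
# The age-group pipeline from the text, written as plain Ruby data.
GROUP_COUNT_PIPELINE = [{ :$group => { :_id => "$group", :count => { :$sum => 1 } } }]

# Hypothetical helper: emulates a $group stage keyed on a "$field" reference
# with a {:$sum=>1} accumulator (only that shape is handled).
def emulate_group_count(docs, stage)
  field = stage[:$group][:_id].delete_prefix("$") # "$group" -> "group"
  docs.group_by { |doc| doc[field] }
      .map { |key, members| { "_id" => key, "count" => members.size } }
end

# Made-up sample documents for illustration only.
group_sample = [
  { "group" => "masters", "gender" => "F" },
  { "group" => "15 to 19", "gender" => "M" },
  { "group" => "masters", "gender" => "M" }
]
group_counts = emulate_group_count(group_sample, GROUP_COUNT_PIPELINE.first)
```

Ruby's group_by keeps first-seen order. MongoDB makes no such promise for $group output, which is why the irb listing above is not sorted.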
In this section you will be asked to reshape a document by promoting and building new properties
required by downstream aggregation functions and in the final result. You will place all solutions within
assignment.rb. Refer back to the lectures for the details of each query.
-
Implement an instance method called racer_names that:
- accepts no inputs
- finds all racers
- reduces the result to contain only first_name and last_name (Hint: $project)
- returns the Mongo result object for the aggregation command
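One way to structure this is to build the pipeline in a small helper and have racer_names hand it to @coll.find.aggregate. The helper names racer_names_pipeline and emulate_inclusion_project are hypothetical. The emulation is a simplified stand-in for $project, so the example runs without MongoDB.

```ruby
# Hypothetical helper: the pipeline racer_names could pass to @coll.find.aggregate.
def racer_names_pipeline
  [
    # $project keeps only the listed fields. MongoDB includes _id by default,
    # so it is explicitly excluded to match the expected output.
    { :$project => { :_id => 0, :first_name => 1, :last_name => 1 } }
  ]
end

# Simplified emulation of an inclusion-style $project: it keeps only fields
# whose value is 1 (unlike MongoDB, it never keeps _id implicitly).
def emulate_inclusion_project(doc, spec)
  kept = spec.select { |_field, flag| flag == 1 }.keys.map(&:to_s)
  doc.select { |field, _value| kept.include?(field) }
end

# Inside Solution, the method might then read:
#   def racer_names
#     @coll.find.aggregate(racer_names_pipeline)
#   end

# Sample document; the _id value is made up.
racer = { "_id" => 1, "number" => 0, "first_name" => "SHAUN", "last_name" => "JOHNSON" }
projected = emulate_inclusion_project(racer, racer_names_pipeline.first[:$project])
```

Keeping the pipeline in its own method makes it easy to inspect in irb before sending it to the server.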
You can try out your new method using the irb shell.
> s=Solution.new > r=s.racer_names.to_a.slice(0,2) => [{"first_name"=>"SHAUN", "last_name"=>"JOHNSON"}, {"first_name"=>"TUAN", "last_name"=>"JOHNSON"}]
$ rspec spec/lecture2_spec.rb -e rq01
-
Implement an instance method called id_number_map that:
- accepts no inputs
- finds all racers
- reduces the result to contain only _id and number
- returns the Mongo result object for the aggregation command
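Here is a minimal sketch. It assumes a hypothetical id_number_map_pipeline helper, and the real method would pass the helper's result to @coll.find.aggregate.

```ruby
# Hypothetical helper: the pipeline id_number_map could pass to @coll.find.aggregate.
def id_number_map_pipeline
  # _id is included by default; listing it explicitly just documents the intent.
  [{ :$project => { :_id => 1, :number => 1 } }]
end

# Inside Solution, the method might then read:
#   def id_number_map
#     @coll.find.aggregate(id_number_map_pipeline)
#   end

id_stage = id_number_map_pipeline.first[:$project]
```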
You can try out your new method using the irb shell.
> r=s.id_number_map.to_a.slice(0,2)
=> [{"_id"=>BSON::ObjectId('563e7555e301d0b356000000'), "number"=>0}, {"_id"=>BSON::ObjectId('563e7555e301d0b356000001'), "number"=>1}]
$ rspec spec/lecture2_spec.rb -e rq02
-
Implement an instance method called concat_names that:
- accepts no inputs
- finds all racers
- reduces the result to contain only the number and name fields, where name is the result of concatenating last_name, ", ", and first_name (Hint: $concat)
- returns the Mongo result object for the aggregation command
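A possible pipeline for concat_names is sketched below. concat_names_pipeline and emulate_concat are hypothetical helpers, and the emulation only models string arguments. It lets you check the output format ("JOHNSON, SHAUN") without a server.

```ruby
# Hypothetical helper: the pipeline concat_names could pass to @coll.find.aggregate.
def concat_names_pipeline
  [
    { :$project => {
      :_id => 0,
      :number => 1,
      # $concat joins its arguments; "$last_name" and "$first_name" are
      # field references and ", " is a literal string.
      :name => { :$concat => ["$last_name", ", ", "$first_name"] }
    } }
  ]
end

# Emulation of $concat for one document. Strings only: MongoDB returns null
# when an argument is null or missing, which this sketch does not model.
def emulate_concat(doc, parts)
  parts.map { |part| part.start_with?("$") ? doc[part.delete_prefix("$")] : part }.join
end

racer = { "number" => 0, "first_name" => "SHAUN", "last_name" => "JOHNSON" }
full_name = emulate_concat(racer, concat_names_pipeline.first[:$project][:name][:$concat])
```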
You can try out your new method using the irb shell.
> r=s.concat_names.to_a.slice(0,2) => [{"number"=>0, "name"=>"JOHNSON, SHAUN"}, {"number"=>1, "name"=>"JOHNSON, TUAN"}]
$ rspec spec/lecture2_spec.rb -e rq03
In this section we will get some practice applying group functions around a selected set of sub-results.
-
Implement an instance method called group_times that:
- accepts no inputs
- finds all racers
- groups the racers into gender and age group (Hint: $group)
- counts the number of racers in the group and assigns this to runners
- calculates the fastest time for each group and assigns this value to fastest_time (Hint: $min)
- returns the Mongo result object for the aggregation command
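The sketch below shows one possible group_times pipeline with a compound _id key, plus a pure-Ruby emulation over made-up documents. It assumes the race time is stored in seconds in a field named secs. That field name does not appear in this text, so check race_results.json before relying on it.

```ruby
# Hypothetical helper: the pipeline group_times could pass to @coll.find.aggregate.
# ASSUMPTION: the race time is stored in a field named "secs".
def group_times_pipeline
  [
    { :$group => {
      # Compound key: one group per (age group, gender) pair.
      :_id => { :age => "$group", :gender => "$gender" },
      :runners => { :$sum => 1 },
      :fastest_time => { :$min => "$secs" }
    } }
  ]
end

# Pure-Ruby emulation of the same grouping.
def emulate_group_times(docs)
  groups = docs.group_by { |d| { "age" => d["group"], "gender" => d["gender"] } }
  groups.map do |key, members|
    {
      "_id" => key,
      "runners" => members.size,                          # {:$sum=>1}
      "fastest_time" => members.map { |d| d["secs"] }.min # {:$min=>"$secs"}
    }
  end
end

# Made-up documents for illustration only.
times_sample = [
  { "group" => "masters", "gender" => "F", "secs" => 1300 },
  { "group" => "masters", "gender" => "F", "secs" => 1264 },
  { "group" => "masters", "gender" => "M", "secs" => 1400 }
]
times = emulate_group_times(times_sample)
```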
You can try out your new method using the irb shell.
> r=s.group_times.to_a.slice(0,3) => [{"_id"=>{"age"=>"50 to 59", "gender"=>"F"}, "runners"=>65, "fastest_time"=>1269}, {"_id"=>{"age"=>"30 to 39", "gender"=>"M"}, "runners"=>68, "fastest_time"=>1262}, {"_id"=>{"age"=>"14 and under", "gender"=>"M"}, "runners"=>66, "fastest_time"=>1363}]
$ rspec spec/lecture3_spec.rb -e rq01
-
Implement an instance method called group_last_names that:
- accepts no inputs
- finds all racers
- groups the racers into gender and (age-)group as above
- creates an array [] of (non-distinct) last names called last_names (Hint: $push)
- returns the Mongo result object for the aggregation command
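The grouping key stays the same as the (age-)group and gender key described above, and only the accumulator changes. A hypothetical pipeline helper might look like this:

```ruby
# Hypothetical helper: the pipeline group_last_names could pass to @coll.find.aggregate.
def group_last_names_pipeline
  [
    { :$group => {
      :_id => { :age => "$group", :gender => "$gender" },
      # $push appends every value, duplicates included, to an array per group.
      :last_names => { :$push => "$last_name" }
    } }
  ]
end

push_stage = group_last_names_pipeline.first[:$group]
```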
You can try out your new method using the irb shell. Note the first group and the names within the group may not be in the same order as the example shows.
> r=s.group_last_names.first
=> {"_id"=>{"age"=>"50 to 59", "gender"=>"F"}, "last_names"=>["GARNER", "SINGH", ...]}
> r=s.group_last_names.first[:last_names].count
=> 65
$ rspec spec/lecture3_spec.rb -e rq02
-
Implement an instance method called group_last_names_set that repeats the previous query except that it:
- creates an array [] of (distinct) last names called last_names (Hint: $addToSet)
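Here is a sketch of the $addToSet variant, using a hypothetical helper name. It is followed by a plain-Ruby contrast of the two accumulators on one made-up group of names.

```ruby
# Hypothetical helper: the pipeline group_last_names_set could pass to @coll.find.aggregate.
def group_last_names_set_pipeline
  [
    { :$group => {
      :_id => { :age => "$group", :gender => "$gender" },
      # $addToSet only adds a value if it is not already in the array.
      :last_names => { :$addToSet => "$last_name" }
    } }
  ]
end

# Plain-Ruby contrast of the two accumulators on one made-up group.
# MongoDB does not guarantee element order in an $addToSet array;
# Array#uniq here happens to keep first-seen order.
names_in_group = ["GARNER", "SINGH", "GARNER", "JONES"]
pushed = names_in_group.dup  # like {:$push=>"$last_name"}
set    = names_in_group.uniq # like {:$addToSet=>"$last_name"}
```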
Note that because of the size of the array and the fact that its contents are unsorted, it is hard to spot the duplicates visually. However, $addToSet will de-duplicate the last names within each group, while $push collects every one of them.
You can try out your new method using the irb shell.
> r=s.group_last_names_set.first
=> {"_id"=>{"age"=>"50 to 59", "gender"=>"F"}, "last_names"=>["GARNER", "SINGH", ...]}
> r=s.group_last_names_set.first[:last_names].count
=> 61
$ rspec spec/lecture3_spec.rb -e rq03
In this section we will limit documents in the query pipeline to those that match certain criteria.
-
Reimplement your solution to group_times in a new instance method called groups_faster_than such that it:
- accepts a time input (the criteria time)
- finds all racers
- groups the racers into gender and age group
- counts the number of racers in the group and assigns this to runners
- calculates the fastest time for each group and assigns this value to fastest_time
- reduces the list of results to only those that have a fastest time less than or equal to the time provided. This is the only difference from before. (Hint: $match)
- returns the Mongo result object for the aggregation command
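The sketch below uses a hypothetical helper and assumes race times live in a field named secs. It appends a $match stage after $group. The plain-Ruby filter then applies the same condition to the three group_times results shown earlier.

```ruby
# Hypothetical helper: the pipeline groups_faster_than could pass to @coll.find.aggregate.
# ASSUMPTION: the race time is stored in a field named "secs".
def groups_faster_than_pipeline(criteria_time)
  [
    { :$group => {
      :_id => { :age => "$group", :gender => "$gender" },
      :runners => { :$sum => 1 },
      :fastest_time => { :$min => "$secs" }
    } },
    # Placed after $group, this $match filters on the computed fastest_time field.
    { :$match => { :fastest_time => { :$lte => criteria_time } } }
  ]
end

pipeline = groups_faster_than_pipeline(1280)

# Apply the same condition in Ruby to the sample group_times output.
grouped = [
  { "_id" => { "age" => "50 to 59", "gender" => "F" }, "runners" => 65, "fastest_time" => 1269 },
  { "_id" => { "age" => "30 to 39", "gender" => "M" }, "runners" => 68, "fastest_time" => 1262 },
  { "_id" => { "age" => "14 and under", "gender" => "M" }, "runners" => 66, "fastest_time" => 1363 }
]
limit = pipeline.last[:$match][:fastest_time][:$lte]
faster = grouped.select { |row| row["fastest_time"] <= limit }
```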
Note that this requires a $match on :fastest_time. That property is not in the original documents from the database; the $group stage computes it.
You can try out your new method using the irb shell.
> r=s.groups_faster_than(1280).to_a => [{"_id"=>{"age"=>"50 to 59", "gender"=>"F"}, "runners"=>65, "fastest_time"=>1269}, {"_id"=>{"age"=>"30 to 39", "gender"=>"M"}, "runners"=>68, "fastest_time"=>1262}, {"_id"=>{"age"=>"30 to 39", "gender"=>"F"}, "runners"=>59, "fastest_time"=>1270}, {"_id"=>{"age"=>"masters", "gender"=>"F"}, "runners"=>58, "fastest_time"=>1264}]
$ rspec spec/lecture4_spec.rb -e rq01
-
Reimplement the previous solution to groups_faster_than in a new instance method called age_groups_faster_than such that it:
- accepts age_group and criteria_time (in that order, as in the irb example below)
- finds all racers in the specified age group, of both genders ("M" and "F"). This part is different.
- groups the racers into gender and age group
- counts the number of racers in the group and assigns this to runners
- calculates the fastest time for each group and assigns this value to fastest_time
- reduces the list of results to only those that have a fastest time less than or equal to the time provided
- returns the Mongo result object for the aggregation command
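Here is a sketch of the two-$match shape. The helper name is hypothetical, and the secs field name for the race time is an assumption. Filtering on age group before $group also means fewer documents flow into the grouping stage.

```ruby
# Hypothetical helper: the pipeline age_groups_faster_than could pass to @coll.find.aggregate.
# ASSUMPTION: the race time is stored in a field named "secs".
def age_groups_faster_than_pipeline(age_group, criteria_time)
  [
    # First $match: runs on the original documents, so it filters on the stored group field.
    { :$match => { :group => age_group } },
    { :$group => {
      :_id => { :age => "$group", :gender => "$gender" },
      :runners => { :$sum => 1 },
      :fastest_time => { :$min => "$secs" }
    } },
    # Second $match: runs on the grouped results, so it filters on fastest_time.
    { :$match => { :fastest_time => { :$lte => criteria_time } } }
  ]
end

age_pipeline = age_groups_faster_than_pipeline("masters", 1280)
```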
Note that this can be implemented with two (2) $match functions: one before the $group function and one after it.
You can try out your new method using the irb shell. The gender "M" group for this age group did not satisfy the specified criteria_time.
> r=s.age_groups_faster_than("masters",1280).first
=> {"_id"=>{"age"=>"masters", "gender"=>"F"}, "runners"=>58, "fastest_time"=>1264}
$ rspec spec/lecture4_spec.rb -e rq02
In this section we will build arrays within the pipeline and then break them back apart into individual results.
-
Implement an instance method called avg_family_time that:
- accepts a last_name
- finds the racers having that same last name (Hint: $match)
- determines the average of all their race times (Hint: $group and $avg)
- forms an array containing the race number of each member of the group (Hint: $group and $push)
- returns the Mongo result object for the command
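Here is a possible avg_family_time pipeline, sketched with a hypothetical helper and an assumed secs field for the race time. It is followed by a plain-Ruby check of what $avg and $push produce for a made-up two-member family.

```ruby
# Hypothetical helper: the pipeline avg_family_time could pass to @coll.find.aggregate.
# ASSUMPTION: the race time is stored in a field named "secs".
def avg_family_time_pipeline(last_name)
  [
    { :$match => { :last_name => last_name } },
    { :$group => {
      :_id => "$last_name",
      :avg_time => { :$avg => "$secs" },
      :numbers => { :$push => "$number" }
    } }
  ]
end

family_pipeline = avg_family_time_pipeline("SMITH")

# Made-up family for illustration only.
family = [
  { "last_name" => "SMITH", "number" => 10, "secs" => 2000 },
  { "last_name" => "SMITH", "number" => 11, "secs" => 2010 }
]
avg_time = family.sum { |d| d["secs"] }.fdiv(family.size) # like {:$avg=>"$secs"}
numbers  = family.map { |d| d["number"] }                  # like {:$push=>"$number"}
```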
You can try out your new method using the irb shell.
> s.avg_family_time("JONES").first => {"_id"=>"JONES", "avg_time"=>2006.6, "numbers"=>[3, 4, 5, 6, 7]}
$ rspec spec/lecture5_spec.rb -e rq01
-
Extend the implementation of avg_family_time in an instance method called number_goal that:
- accepts a last_name
- finds the racers having that same last name (Hint: $match)
- determines the average of all their race times (Hint: $group and $avg)
- forms an array containing the race number of each member of the group (Hint: $group and $push). The steps after this one are new.
- forms a separate result carrying avg_time for each number in that array (Hint: $unwind)
- forms a result with last_name, number, and avg_time for each number in the family, with no _id property (Hint: $project)
- returns the Mongo result object for the command
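The sketch below extends that pipeline with $unwind and a renaming $project. The helper is hypothetical, and secs is an assumed field name. The plain-Ruby emulation of the last two stages starts from the JONES result in the avg_family_time example and yields one document per number.

```ruby
# Hypothetical helper: the pipeline number_goal could pass to @coll.find.aggregate.
# ASSUMPTION: the race time is stored in a field named "secs".
def number_goal_pipeline(last_name)
  [
    { :$match => { :last_name => last_name } },
    { :$group => {
      :_id => "$last_name",
      :avg_time => { :$avg => "$secs" },
      :numbers => { :$push => "$number" }
    } },
    # $unwind emits one copy of the group document per element of numbers.
    { :$unwind => "$numbers" },
    # Rename _id to last_name and numbers to number, keep avg_time, drop _id.
    { :$project => { :_id => 0, :last_name => "$_id", :number => "$numbers", :avg_time => 1 } }
  ]
end

goal_pipeline = number_goal_pipeline("JONES")

# Emulate $unwind and $project on the avg_family_time("JONES") result.
group_doc = { "_id" => "JONES", "avg_time" => 2006.6, "numbers" => [3, 4, 5, 6, 7] }
unwound = group_doc["numbers"].map { |n| group_doc.merge("numbers" => n) }
goals = unwound.map do |d|
  { "avg_time" => d["avg_time"], "number" => d["numbers"], "last_name" => d["_id"] }
end
```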
You can try out your new method using the irb shell.
> s.number_goal("JONES").each {|r| pp r}
{"avg_time"=>2006.6, "number"=>3, "last_name"=>"JONES"}
{"avg_time"=>2006.6, "number"=>4, "last_name"=>"JONES"}
{"avg_time"=>2006.6, "number"=>5, "last_name"=>"JONES"}
{"avg_time"=>2006.6, "number"=>6, "last_name"=>"JONES"}
{"avg_time"=>2006.6, "number"=>7, "last_name"=>"JONES"}
$ rspec spec/lecture5_spec.rb -e rq02
Unit tests have been provided in the bootstrap files that can be used to evaluate your solution. They must be run from the same directory as your solution.
$ rspec
........
(N) examples, 0 failures
There is no submission required for this assignment, but the skills learned will be part of a follow-on assignment, so please complete it to the requirements of the unit tests.