TOTAL Aggregation broken? #1
The odd thing is that total budget capping is working, and the crosstalk API can fetch the total spend just fine: {"error":false,"timestamp":1545500761961,"type":"GetSpendRate#","campaign":"2","totalSpend":3.786,"dailySpend":0.0,"hourlySpend":0.0,"minuteSpendAverage":0.3437,"std":0.6179715922992649} I will have to dig through the code to see why that is happening.
Hi Mark, I believe I know what is causing your issue. When we run rtb4free in production, we keep a separate Elasticsearch index called bidagg-* that aggregates the total summaries of costs so we can compute whether a campaign's total budget is exceeded. This index is separate from the bid, win, etc. Elasticsearch indexes, since those grow rather large after a short time. Because the bidagg index is a summary, it is much smaller and can store the total history even if the campaign is long running. It appears we left out the scripts that generate the bidagg index. I'll find those scripts and post them to the repo as soon as I can.
That would be great. I had been wondering what the right way would be to archive the Elasticsearch data without messing up the budget. How is it aggregating the total spend without bidagg? Hourly and daily have their own code paths, and I couldn't work out where the total comes from without the total.json query on bidagg.
I need to confirm this with Ben, but I believe that crosstalk gets the long-term budget history from the bidagg index (i.e., the time before the bidder started, since that is not persistent in the bidder, or anything older than 1 hr), then adds its in-memory summaries to this to get the current history sum. That's why you were seeing hourly numbers in the summary. But if you restarted the bidder, these would reset, so you wouldn't be getting the pre-start history from ES.

I've updated the source and the docker images to include the aggregation scripts. The scripts only operate on Elasticsearch. They read the ES wins, bids, etc. and summarize them into a single bidagg record for every 5 minute interval. If you are running docker, just update the ploh/rtbadmin_open image (the swarm service "web"). If you downloaded this source, the files that you need to update from https://github.com/RTB4FREE/campaignmanager are
To run the script, execute the command in the web container: This will generate the Elasticsearch records in index bidagg. To see these in Kibana, you will need to go to the "Management" menu, then follow the directions for "Create Index Pattern" for bidagg-*. The records should look like this. In production, we run this script every 5 minutes using cron, though you can run it whenever you like and it should populate records since the last run.

Peter
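To make the description above concrete, here is a minimal sketch of the idea: raw win records are collapsed into one summary per campaign per 5 minute bucket, and a campaign's total is the persisted summaries plus whatever the bidder currently holds in memory. This is not the campaignmanager script or the crosstalk code. The record keys (`campaign`, `timestamp`, `cost`) and the function names are assumptions made for illustration, and `cost` is assumed to already be in dollars.

```python
from collections import defaultdict

BUCKET_MS = 5 * 60 * 1000  # five-minute interval, as described in the thread


def bucket_start(timestamp_ms):
    """Round an epoch-millisecond timestamp down to the start of its bucket."""
    return timestamp_ms - (timestamp_ms % BUCKET_MS)


def summarize(wins):
    """Collapse raw win records into one summary per (campaign, bucket).

    Each win is a dict with the hypothetical keys campaign, timestamp, cost.
    """
    summaries = defaultdict(lambda: {"wins": 0, "cost": 0.0})
    for win in wins:
        key = (win["campaign"], bucket_start(win["timestamp"]))
        summaries[key]["wins"] += 1
        summaries[key]["cost"] += win["cost"]
    return dict(summaries)


def total_spend(summaries, campaign, in_memory_spend=0.0):
    """Persisted history for one campaign plus the bidder's in-memory spend."""
    history = sum(s["cost"] for (c, _), s in summaries.items() if c == campaign)
    return history + in_memory_spend


wins = [
    {"campaign": "2", "timestamp": 1545500761961, "cost": 1.0},
    {"campaign": "2", "timestamp": 1545500762000, "cost": 2.0},
    {"campaign": "2", "timestamp": 1545501100000, "cost": 0.5},
]
summaries = summarize(wins)
print(len(summaries))                      # number of five-minute buckets
print(total_spend(summaries, "2", 0.25))   # history plus in-memory spend
```

If the persisted summaries are missing, as in this issue, the history term is zero and the total falls back to whatever the bidder has accumulated since it started.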
@ploh814 Thanks for your help! But I think there is something missing here. This rake task only runs the campaign aggregation from app/models/report_aggregation.rb, which searches bidagg-* for results: https://github.com/RTB4FREE/campaignmanager/blob/master/app/models/report_aggregation.rb#L472 So I am assuming you actually need to run campaignPerformance3 first? (Also, I had to upgrade the elasticsearch gem to 6+ to get around this: elastic/elasticsearch-rails#756)
Another issue: in that script the current cost gets inserted into the DB in CPM form (not divided by 1000). But in the campaign manager I get the following error: The budget is $1.00 and there were 5 impressions at a $1 CPM. So either the script should be inserting cost / 1000, or the campaign manager needs to interpret cost as a CPM.
You are correct.
I've corrected this in the code and updated our demo Docker container image. Regarding the CPM calculations, the cost data stored in Elasticsearch is not altered from the data received from the exchanges. The cost modification is done when the report is generated here: If you are pulling the cost from Elasticsearch, then you do need to divide by 1000. However, we have seen some instances where exchanges change the units for their cost field, so please check.
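As a worked example of the divide-by-1000 point, the sketch below uses the numbers from this thread: 5 impressions at a $1 CPM against a $1.00 budget. The function names are hypothetical and this is not the campaign manager's code. It also assumes every cost value is a CPM in dollars, which, as noted above, should be checked per exchange.

```python
def spend_from_cpm_costs(cpm_costs):
    """Convert per-impression costs expressed as CPM into dollars spent.

    A CPM is the price per 1000 impressions, so one impression costs CPM / 1000.
    """
    return sum(cpm_costs) / 1000.0


def budget_exceeded(cpm_costs, total_budget):
    """True once the converted spend reaches the campaign's total budget."""
    return spend_from_cpm_costs(cpm_costs) >= total_budget


costs = [1.0] * 5  # five impressions, each recorded at a $1 CPM
print(spend_from_cpm_costs(costs))   # dollars actually spent
print(budget_exceeded(costs, 1.00))  # compared against the $1.00 budget
print(sum(costs) >= 1.00)            # the same check without the conversion
```

With the conversion the spend is half a cent, well under budget. Without it, the raw sum of 5.0 is compared directly against the 1.00 budget and the campaign looks overspent.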
Forgot to mention, the update also includes the Bidswitch special attributes to support the bidder update.
@ploh814 I am getting the same error, so I followed the steps above, but I did not see any bidagg index generated in ES. I also get the warning "DID NOT GET ANY BUCKETS ON HOURLY/DAILY". What needs to be done?
I am getting an error:
*** ELK TEST: Total aggregations are not avialble!
I don't see anything in my elasticsearch cluster for bidagg-*
This is where that query is:
https://github.com/RTB4FREE/crosstalk/blob/master/src/com/jacamars/dsp/crosstalk/budget/Aggregator.java#L215
Should this be wins-* as well?