TOTAL Aggregation broken? #1

Open
mcorner opened this issue Dec 22, 2018 · 9 comments

@mcorner (Contributor) commented Dec 22, 2018

I am getting an error:
*** ELK TEST: Total aggregations are not avialble!

I don't see any bidagg-* index in my Elasticsearch cluster.

This is where that query is:
https://github.com/RTB4FREE/crosstalk/blob/master/src/com/jacamars/dsp/crosstalk/budget/Aggregator.java#L215

Should this be wins-* as well?

@mcorner (Contributor, Author) commented Dec 22, 2018

The odd thing is that total budget capping is working, and the crosstalk API can fetch the total spend just fine:

{"error":false,"timestamp":1545500761961,"type":"GetSpendRate#","campaign":"2","totalSpend":3.786,"dailySpend":0.0,"hourlySpend":0.0,"minuteSpendAverage":0.3437,"std":0.6179715922992649}

I will have to dig through the code to see why that is happening.

@ploh814 commented Dec 22, 2018

Hi Mark, I believe I know what is causing your issue. When we run rtb4free in production, we keep a separate Elasticsearch index called bidagg-* that aggregates cost summaries so we can compute whether a campaign's total budget has been exceeded. This index is separate from the bid, win, etc. Elasticsearch indexes, because those grow rather large after a short time. Since the bidagg index is a summary, it is much smaller and can store the full history even for a long-running campaign. It appears we left out the scripts that generate the bidagg index. I'll find those scripts and post them to the repo as soon as I can.
Peter

@mcorner (Contributor, Author) commented Dec 23, 2018

That would be great. I had been wondering what the right way would be to archive the Elasticsearch data without messing up the budget.

How is it aggregating the total spend without bidagg? Hourly and daily have their own code paths, and from reading the code I couldn't work out where the total comes from other than the total.json query on bidagg.

@ploh814 commented Dec 23, 2018

I need to confirm this with Ben, but I believe crosstalk gets the long-term budget history from the bidagg index (i.e., the time before the bidder started, since that is not persisted in the bidder, or anything older than 1 hour), then adds its in-memory summaries to that to get the current total. That's why you were seeing hourly numbers in the summary. But if you restarted the bidder, these would reset, so you wouldn't be getting the pre-start history from ES.
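To make the described split concrete, here is a minimal Ruby sketch. It is not crosstalk's actual code, which is Java. It assumes bidagg records are hashes with hypothetical :campaign, :interval_end and :cost fields, and that the bidder keeps a per-campaign in-memory spend counter. It only shows why the total collapses to the in-memory figure when no bidagg records exist.

```ruby
# Hypothetical sketch: total spend = persisted history (bidagg) + in-memory spend.
# Field and method names are illustrative assumptions, not crosstalk's API.

# Sum the cost of bidagg summary records for a campaign that end at or before
# the cutoff (e.g. the time the bidder started).
def historical_spend(bidagg_records, campaign_id, cutoff)
  bidagg_records
    .select { |r| r[:campaign] == campaign_id && r[:interval_end] <= cutoff }
    .sum { |r| r[:cost] }
end

# Add whatever the bidder has accumulated in memory since the cutoff.
def total_spend(bidagg_records, in_memory_spend, campaign_id, cutoff)
  historical_spend(bidagg_records, campaign_id, cutoff) +
    in_memory_spend.fetch(campaign_id, 0.0)
end

records = [
  { campaign: "2", interval_end: 600, cost: 3.0 },
  { campaign: "2", interval_end: 900, cost: 4.0 },
  { campaign: "3", interval_end: 600, cost: 9.0 }
]
in_memory = { "2" => 0.5 }

puts total_spend(records, in_memory, "2", 900) # history 7.0 + in-memory 0.5
puts total_spend([], in_memory, "2", 900)      # no bidagg records: in-memory only
```

With an empty bidagg index, the total is just the in-memory amount, which would be consistent with totals resetting after a bidder restart.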

I've updated the source and the Docker images to include the aggregation scripts. The scripts only operate on Elasticsearch: they read the ES wins, bids, etc. and summarize each 5-minute interval into a single bidagg record.
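A rough Ruby sketch of that 5-minute summarization follows. It assumes each raw win is a hash with hypothetical :campaign, :timestamp (epoch seconds) and :cost fields. The real bidagg.rake / report_aggregation.rb code queries Elasticsearch and may use different fields and a different output shape. This only illustrates the bucketing idea.

```ruby
# Hypothetical sketch of 5-minute bidagg-style summarization.
# Input field names (:campaign, :timestamp, :cost) are assumptions.
INTERVAL = 300 # seconds in a 5-minute bucket

# Floor an epoch-seconds timestamp to the start of its 5-minute interval.
def bucket_start(ts)
  ts - (ts % INTERVAL)
end

# Collapse raw win records into one summary record per campaign per interval.
def aggregate_wins(wins)
  groups = wins.group_by { |w| [w[:campaign], bucket_start(w[:timestamp])] }
  records = groups.map do |(campaign, start), rows|
    {
      campaign: campaign,
      interval_start: start,
      interval_end: start + INTERVAL,
      wins: rows.size,
      cost: rows.sum { |r| r[:cost] }
    }
  end
  records.sort_by { |r| [r[:campaign], r[:interval_start]] }
end

wins = [
  { campaign: "2", timestamp: 600, cost: 1.0 },
  { campaign: "2", timestamp: 720, cost: 2.0 },
  { campaign: "2", timestamp: 900, cost: 4.0 }
]
aggregate_wins(wins).each { |rec| p rec }
```

Flooring every timestamp to a multiple of 300 seconds keeps bucket boundaries the same from run to run. As a result, repeated runs produce one record per campaign per interval.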

If you are running Docker, just update the ploh/rtbadmin_open image (the swarm service "web").

If you downloaded the source, the files you need to update from https://github.com/RTB4FREE/campaignmanager are:

  • app/models/elastic_report.rb
  • app/models/report_aggregation.rb
  • lib/tasks/bidagg.rake

To run the script, execute the command in the web container:
docker ps   (get the container ID of the service "web")
docker exec -it <container_id> /rtb4free_admin/bin/rake bidagg:campaigns

This will generate the Elasticsearch records in the bidagg index. To see them in Kibana, go to the "Management" menu, then follow the directions for "Create Index Pattern" for bidagg-*. The records should look like this:

[Screenshot, 2018-12-23: example bidagg records in Kibana]

In production, we run this script every 5 minutes using cron, though you can run it whenever and it should populate records since the last run.

Peter

@mcorner (Contributor, Author) commented Feb 1, 2019

@ploh814 Thanks for your help! But I think there is something missing here.

This rake task only runs the campaign aggregation from app/models/report_aggregation.rb, which searches bidagg-* for results: https://github.com/RTB4FREE/campaignmanager/blob/master/app/models/report_aggregation.rb#L472

So I am assuming you actually need to run campaignPerformance3 first?

(Also, I had to upgrade the elasticsearch gem to 6+ to get around this: elastic/elasticsearch-rails#756)

@mcorner (Contributor, Author) commented Feb 1, 2019

Another issue:

In that script, the current cost gets inserted into the DB in CPM form (not divided by 1000):
https://github.com/RTB4FREE/campaignmanager/blob/master/app/models/report_aggregation.rb#L494

But in the campaign manager I get the following error:
Campaign not loaded - total cost 5.00 greater than budget 1.00

The budget is $1.00 and there were 5 impressions at a $1 CPM. So either the script should insert cost / 1000, or the campaign manager needs to interpret the cost as a CPM.
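The arithmetic behind that error can be shown as a small Ruby sketch. It assumes, as described above, that each win's cost is stored as the exchange's CPM price. The helper names are hypothetical. Dividing the stored sum by 1000 turns 5 impressions at a $1 CPM into $0.005 of spend, well under the $1.00 budget. Comparing the raw sum of 5.0 against the budget reproduces the "total cost 5.00 greater than budget 1.00" message.

```ruby
# Hypothetical helpers: convert a summed CPM-denominated cost into dollars
# before comparing it with a campaign budget.
def cpm_to_spend(cpm_cost_sum)
  cpm_cost_sum / 1000.0
end

def over_budget?(cpm_cost_sum, budget)
  cpm_to_spend(cpm_cost_sum) > budget
end

# 5 impressions won at a $1.00 CPM: each win stores 1.0, so the stored sum is 5.0.
stored_sum = Array.new(5, 1.0).sum
budget = 1.00

puts stored_sum > budget              # raw comparison: wrongly reports over budget
puts cpm_to_spend(stored_sum)         # actual spend in dollars
puts over_budget?(stored_sum, budget) # comparison after dividing by 1000
```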

@ploh814 commented Feb 16, 2019

You are correct.

  1. I did miss calling campaignPerformance3 in bidagg.rake.
  2. If you are running Elasticsearch 6, the Ruby gem version does need to match the ES version.

I've corrected this in the code, and updated our demo Docker container image.

Regarding the CPM calculations, the cost data stored in Elasticsearch is not altered from the data sent by the exchanges. The cost conversion is done when the report is generated here:

https://github.com/RTB4FREE/campaignmanager/blob/47911d41593cfe2f559d8420f9f55541e2e70306/app/views/dashboards/_campaignalltable_es.html.erb#L117

If you are pulling the cost from Elasticsearch, then you do need to divide by 1000. However, we have seen some instances where an exchange changed the units of its cost field, so please check.

@ploh814 commented Feb 16, 2019

Forgot to mention, the update also includes the Bidswitch special attributes to support the bidder update.

@sandiemann commented Mar 26, 2019

@ploh814 I am getting the same error. I followed the above steps, but no bidagg index was generated in ES.

I also get a warning "DID NOT GET ANY BUCKETS ON HOURLY/DAILY"

What needs to be done?
