About AdBlockClient::parse() function execution time #158

Closed
ymh8416 opened this Issue Jan 3, 2019 · 5 comments

ymh8416 commented Jan 3, 2019

Edit (bbondy): See below. parse is not meant to be used by clients, only by server CI: #158 (comment)


I used your ad-block project in my Android project. I found that using AdBlockClient::parse() to parse a 3M ABP rule file takes 80 seconds. Can this be optimized?
Thanks very much.
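
To put a number on a report like this, the slow call can be wrapped in a small timer. The following is a minimal sketch using only the standard library; `measureMs` is a hypothetical helper name and is not part of the ad-block project.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// `measureMs` is a hypothetical helper, not part of the ad-block library.
template <typename Work>
long long measureMs(Work &&work) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Work>(work)();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
      .count();
}

// Usage sketch, assuming an AdBlockClient `client` and a std::string `rules`
// that holds the filter list text:
//   long long ms = measureMs([&] { client.parse(rules.c_str()); });
```

Timing parse and deserialize separately with the same helper would show how much of the startup delay comes from each step.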


daoan1412 commented Jan 4, 2019

I think you can create a background service to parse the data.
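
In plain C++, one way to approximate this idea is to start the slow step with `std::async` and collect the result later. This is only a sketch under stated assumptions: `loadRules` is a hypothetical stand-in for the slow call (for example AdBlockClient::parse), and an Android app would more likely use the platform's own background facilities.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-in for the slow step (for example, a call to
// AdBlockClient::parse). Here it only returns the length of the input.
std::size_t loadRules(const std::string &rulesText) {
  return rulesText.size();
}

// Starts loadRules on a separate thread and returns immediately.
// The caller keeps the future and calls get() when the result is needed.
std::future<std::size_t> loadRulesInBackground(std::string rulesText) {
  return std::async(std::launch::async, loadRules, std::move(rulesText));
}
```

Running the work in the background keeps the caller responsive, but it does not shorten the work itself: nothing can be matched until the future is ready.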


daoan1412 commented Jan 5, 2019

Instead of creating a background service, you can parse the data once, serialize it to a file, and then deserialize it so that matching can start faster.

#include "ad_block_client.h"
#include <algorithm>
#include <cerrno>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

string getFileContents(const char *filename)
{
  ifstream in(filename, ios::in);
  if (in) {
    ostringstream contents;
    contents << in.rdbuf();
    in.close();
    return(contents.str());
  }
  throw(errno);
}

void writeFile(const char *filename, const char *buffer, int length)
{
  ofstream outFile(filename, ios::out | ios::binary);
  if (outFile) {
    outFile.write(buffer, length);
    outFile.close();
    return;
  }
  throw(errno);
}


int main(int argc, char**argv) {
  std::string easyListTxt = getFileContents("./test/data/easylist.txt");
  const char *urlsToCheck[] = {
    // ||pagead2.googlesyndication.com^$~object-subrequest
    "http://pagead2.googlesyndication.com/pagead/show_ads.js",
    // Should be blocked by: ||googlesyndication.com/safeframe/$third-party
    "http://tpc.googlesyndication.com/safeframe/1-0-2/html/container.html",
    // Should be blocked by: ||googletagservices.com/tag/js/gpt_$third-party
    "http://www.googletagservices.com/tag/js/gpt_mobile.js",
    // Shouldn't be blocked
    "http://www.brianbondy.com"
  };

  // This is the site whose URLs are being checked, not the domain of the URL being checked.
  const char *currentPageDomain = "slashdot.org";

  // Parse easylist
  AdBlockClient client;
  client.parse(easyListTxt.c_str());

  // Do the checks
  std::for_each(urlsToCheck, urlsToCheck + sizeof(urlsToCheck) / sizeof(urlsToCheck[0]), [&client, currentPageDomain](std::string const &urlToCheck) {
    if (client.matches(urlToCheck.c_str(), FONoFilterOption, currentPageDomain)) {
      cout << urlToCheck << ": You should block this URL!" << endl;
    } else {
      cout << urlToCheck << ": You should NOT block this URL!" << endl;
    }
  });

  int size;
  // This buffer is allocated on the heap; you must call delete[] when you're done using it.
  char *buffer = client.serialize(size);
  writeFile("./ABPFilterParserData.dat", buffer, size);

  AdBlockClient client2;
  // Deserialize uses the buffer directly for subsequent matches, do not free until all matches are done.
  client2.deserialize(buffer);
  // Prints the same as client.matches would
  std::for_each(urlsToCheck, urlsToCheck + sizeof(urlsToCheck) / sizeof(urlsToCheck[0]), [&client2, currentPageDomain](std::string const &urlToCheck) {
    if (client2.matches(urlToCheck.c_str(), FONoFilterOption, currentPageDomain)) {
      cout << urlToCheck << ": You should block this URL!" << endl;
    } else {
      cout << urlToCheck << ": You should NOT block this URL!" << endl;
    }
  });
  delete[] buffer;
  return 0;
}
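
The example above writes the serialized data to ./ABPFilterParserData.dat but then deserializes from the buffer that is still in memory. On a later run the file has to be read back first. A minimal sketch of that step follows; `readBinaryFile` is a hypothetical helper, not part of the ad-block library, and the deserialize call in the trailing comment assumes the same `deserialize(char *)` usage shown above.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a whole file into memory as raw bytes.
// `readBinaryFile` is a hypothetical helper, not part of the ad-block library.
std::vector<char> readBinaryFile(const std::string &path) {
  std::ifstream in(path, std::ios::in | std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open " + path);
  }
  return std::vector<char>(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
}

// Usage sketch, assuming the file was produced by writeFile() above:
//   std::vector<char> data = readBinaryFile("./ABPFilterParserData.dat");
//   AdBlockClient client;
//   client.deserialize(data.data());
//   // `data` must stay alive while `client` is used for matching,
//   // because deserialize uses the buffer directly.
```

Holding the bytes in a std::vector ties the buffer's lifetime to a normal C++ object, so no manual delete[] is needed for data loaded this way.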
Author

ymh8416 commented Jan 9, 2019

I am very sorry, I understand what you mean now. I am now running AdBlockClient::parse() in the background, but ad blocking only starts working about 80 seconds after the program starts. That is a bit long.

Member

bbondy commented Feb 16, 2019

Note that parse intentionally does as much work as possible. Clients are not meant to parse lists directly. They should instead serialize an already parsed list and then the clients should use the deserialized list.

Author

ymh8416 commented Feb 28, 2019

> Note that parse intentionally does as much work as possible. Clients are not meant to parse lists directly. They should instead serialize an already parsed list and then the clients should use the deserialized list.

Thank you very much, I understand.

@snyderp snyderp closed this Feb 28, 2019
