create 100 million object cost lot of time #663

Open
moderafasas opened this issue Oct 8, 2021 · 4 comments

Comments

@moderafasas

moderafasas commented Oct 8, 2021

Describe the bug
Creating 100 million objects takes about 10 hours.

To Reproduce
    for (int i = 0; i < 100000000; i++) {
        People person = new People();
        person.setLevel(j); // j is not declared in the snippet as posted
        person.setName(faker.name().fullName());
        person.setCompany(faker.company().industry() + faker.company().buzzword());
        person.setNation(faker.nation().nationality());
        person.setPlace(faker.address().fullAddress());
        person.setUniversity(faker.university().name());
        person.setBlood(faker.name().bloodGroup());
        person.setJob(faker.job().title());
        person.setPhoneNum(faker.phoneNumber().cellPhone());
        person.setBirthDay(DateUtils.get8DateString(faker.date().birthday().getTime()));
    }

Expected behavior
Generating a large data set should be faster than it is now.

Versions:

  • OS: Linux 64GB 1T
  • JDK 1.8
  • Faker Version 1.0.2
@wcarmon

wcarmon commented Oct 8, 2021

Option-1

You might want to run a profiler (e.g. VisualVM, YourKit, JProfiler) to see where the bottleneck is.
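If attaching a profiler is not convenient, a rough per-call timing can already hint at which generator dominates. The following is only a minimal sketch: `StageTimer` and the stand-in suppliers are hypothetical and not part of java-faker. In the real loop each stage would wrap one call from the issue, for example `() -> faker.name().fullName()`. The numbers are coarse (no JIT warm-up handling), so treat them as a first indication rather than a benchmark.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not part of java-faker): times each generator call separately. */
public class StageTimer {

    /** Runs every supplier `iterations` times and returns the total elapsed nanoseconds per stage. */
    public static Map<String, Long> time(Map<String, Supplier<?>> stages, int iterations) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<?>> stage : stages.entrySet()) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                stage.getValue().get();
            }
            totals.put(stage.getKey(), System.nanoTime() - start);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Supplier<?>> stages = new LinkedHashMap<>();
        // Stand-ins only; replace with the real calls, e.g. () -> faker.name().fullName()
        stages.put("cheap", () -> "constant");
        stages.put("format", () -> String.format("%08d", 12345));

        time(stages, 100_000).forEach((name, nanos) ->
                System.out.printf("%-8s %d ms%n", name, nanos / 1_000_000));
    }
}
```

A stage whose total is far larger than the others is the first place to look with a real profiler.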

The problem could be DateUtils.get8DateString. If so, this ticket is in the wrong project. Date formatting is not always fast.
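The source of DateUtils.get8DateString is not shown in this thread, so the following is only a guess at what it does: it assumes the method turns epoch milliseconds into an 8-digit `yyyyMMdd` string. Under that assumption, a sketch that reuses one immutable `DateTimeFormatter` instead of creating a formatter per call could look like this. The class name `EightDigitDate` and the choice of UTC are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical stand-in for DateUtils.get8DateString; the real method is not shown in the issue. */
public class EightDigitDate {

    // DateTimeFormatter is immutable and thread-safe, so a single shared instance is enough.
    // The UTC zone is an assumption; use the zone the data actually needs.
    private static final DateTimeFormatter YYYYMMDD =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    /** Formats epoch milliseconds (e.g. from Date.getTime()) as an 8-digit date string. */
    public static String format(long epochMillis) {
        return YYYYMMDD.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(System.currentTimeMillis()));
    }
}
```

With such a helper the call in the loop would become `EightDigitDate.format(faker.date().birthday().getTime())`.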

Option-2

You can generate in parallel with something like this:

    ExecutorService executor = Executors
        .newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());

    // must be safe to call from several threads at once
    java.util.function.Consumer<Person> personSink = ...;

    for (int i = 0; i < 100_000_000; i++) {
      executor.submit(() -> personSink.accept(generateOnePerson()));
    }

    // signal we're done submitting jobs
    executor.shutdown();

    // bounded waiting for all jobs; returns false if the timeout elapses first,
    // so choose a timeout that matches the expected run time
    executor.awaitTermination(30, TimeUnit.SECONDS);
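One possible variation on the snippet above is to submit batches instead of 100 million single-object tasks, which keeps the executor queue small, and to wait on the returned futures rather than on a fixed timeout. This is a sketch under assumptions, not a drop-in replacement: `BatchedGenerator` and its parameters are hypothetical, and the `generateOne` callback stands in for `personSink.accept(generateOnePerson())`, so it has to be thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: runs `generateOne` a given number of times, split into batches. */
public class BatchedGenerator {

    /** Returns the number of calls made once every batch has finished. */
    public static long run(long total, int batchSize, int threads, Runnable generateOne) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> batches = new ArrayList<>();
            for (long start = 0; start < total; start += batchSize) {
                long count = Math.min(batchSize, total - start);
                Callable<Long> batch = () -> {
                    for (long i = 0; i < count; i++) {
                        generateOne.run();
                    }
                    return count;
                };
                batches.add(executor.submit(batch));
            }
            long done = 0;
            for (Future<Long> f : batches) {
                done += f.get(); // blocks until this batch has finished
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicLong generated = new AtomicLong();
        // Stand-in for generating one person and handing it to the sink
        long done = run(1_000_000, 10_000,
                Runtime.getRuntime().availableProcessors(), generated::incrementAndGet);
        System.out.println(done + " objects generated");
    }
}
```

The batch size is a tuning knob: larger batches mean fewer queued tasks, smaller batches spread the work more evenly across threads.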

@mssoni2

mssoni2 commented Oct 24, 2021

I would like to contribute to this, for academic purposes.

@snuyanzin
Contributor

snuyanzin commented Apr 22, 2022

Hi @icytek,
this may be a bit late, but there is a port of java-faker to JDK 8 with many improvements, including performance: https://github.com/datafaker-net/datafaker

I've just timed your code with it, leaving out person.setBirthDay(DateUtils.get8DateString(faker.date().birthday().getTime())); because I do not have DateUtils#get8DateString.
No code changes are required apart from the imports.

For me it took about 40 minutes to generate the 100 million objects (Linux 64-bit, JDK 1.8).

It could also be parallelized as proposed by @wcarmon.

@snuyanzin
Contributor

snuyanzin commented Nov 28, 2022

Just for fun, I added this as a benchmark to https://github.com/datafaker-net/datafaker

After a number of optimizations in datafaker (versions 1.2.0 to 1.7.0), it takes less than 10 minutes to generate these 100 million objects in one thread (JDK 17.0.5).
As before, this leaves out person.setBirthDay(DateUtils.get8DateString(faker.date().birthday().getTime())); since I do not have DateUtils#get8DateString.
However, datafaker has its own setBirthday, which is included in the benchmark.
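To put the three timings quoted in this thread side by side, they can be converted to objects per second. The small sketch below does only that arithmetic; `Throughput` is a hypothetical helper name. The runs were made on different setups (different JDKs, and the original report includes the date formatting call), so the figures are a rough comparison only.

```java
/** Hypothetical helper: converts "N objects in M minutes" into objects per second. */
public class Throughput {

    public static double objectsPerSecond(long objects, double minutes) {
        return objects / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long objects = 100_000_000L;
        // Durations as reported in the thread: about 10 hours, about 40 minutes, under 10 minutes
        System.out.printf("java-faker 1.0.2, ~10 h:    %.0f objects/s%n", objectsPerSecond(objects, 600));
        System.out.printf("datafaker, ~40 min:         %.0f objects/s%n", objectsPerSecond(objects, 40));
        System.out.printf("datafaker, 10 min (bound):  %.0f objects/s%n", objectsPerSecond(objects, 10));
    }
}
```

Because the last run took less than 10 minutes, its figure is a lower bound on the actual throughput.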
