Enhance NL local company name and suffix #641

Merged

robosoul
Contributor

Expanding the suffix collection and adjusting the name patterns: we shouldn't use the default (en) patterns, since the English "and" is "en" in Dutch. So I added:

        name:
          - "#{Name.last_name} #{suffix}"
          - "#{Name.last_name}-#{Name.last_name}"
          - "#{Name.last_name}, #{Name.last_name} en #{Name.last_name}"
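
For readers unfamiliar with these templates, here is a minimal sketch of how `#{...}` placeholders could be expanded. `TemplateExpander` is a hypothetical stand-in written for illustration, not Datafaker's actual resolver, and the value lists passed to it are whatever the caller supplies.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; Datafaker resolves these templates internally.
class TemplateExpander {
  // Matches "#{key}" and captures the key between the braces.
  private static final Pattern PLACEHOLDER = Pattern.compile("#\\{([^}]+)\\}");

  private final Map<String, List<String>> values;
  private final Random random;

  TemplateExpander(Map<String, List<String>> values, Random random) {
    this.values = values;
    this.random = random;
  }

  // Replaces every placeholder with a random entry from the list registered under its key.
  String expand(String template) {
    Matcher matcher = PLACEHOLDER.matcher(template);
    StringBuilder out = new StringBuilder();
    while (matcher.find()) {
      List<String> options = values.get(matcher.group(1));
      if (options == null || options.isEmpty()) {
        throw new IllegalArgumentException("No values for placeholder: " + matcher.group(1));
      }
      String pick = options.get(random.nextInt(options.size()));
      // quoteReplacement keeps '$' and '\' in the picked value literal.
      matcher.appendReplacement(out, Matcher.quoteReplacement(pick));
    }
    matcher.appendTail(out);
    return out.toString();
  }
}
```

The literal " en " in the third pattern passes through untouched, which is why the Dutch file needs its own pattern instead of inheriting the English one.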

@what-the-diff

what-the-diff bot commented Jan 19, 2023

  • Added a new key called company to the nl.yml file
  • The suffixes are now in an array under the company key instead of being directly under the root level
  • A name field was added with 3 different options for generating names, which can be used by other keys (like email) that need a full name as input

name:
- "#{Name.last_name} #{suffix}"
- "#{Name.last_name}-#{Name.last_name}"
- "#{Name.last_name}, #{Name.last_name} en #{Name.last_name}"
Contributor Author

Overriding the default (en) patterns, since "and" in English is "en" in Dutch.

@codecov-commenter

codecov-commenter commented Jan 19, 2023

Codecov Report

Merging #641 (0e8eedd) into main (53e14a4) will increase coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##               main     #641      +/-   ##
============================================
+ Coverage     92.69%   92.71%   +0.01%     
- Complexity     2620     2621       +1     
============================================
  Files           281      281              
  Lines          5396     5396              
  Branches        589      589              
============================================
+ Hits           5002     5003       +1     
  Misses          241      241              
+ Partials        153      152       -1     
Impacted Files Coverage Δ
...ain/java/net/datafaker/idnumbers/KoKrIdNumber.java 81.25% <0.00%> (-6.25%) ⬇️
.../java/net/datafaker/service/FakeValuesService.java 84.95% <0.00%> (+0.19%) ⬆️
...rc/main/java/net/datafaker/service/FakeValues.java 84.16% <0.00%> (+0.83%) ⬆️

@robosoul
Contributor Author

@bodiam - can we merge this one? Thanks.

@bodiam bodiam merged commit 5b06228 into datafaker-net:main Jan 19, 2023
@bodiam
Contributor

bodiam commented Jan 19, 2023

Merged!

@bodiam
Contributor

bodiam commented Jan 19, 2023

Can I ask what you're working on, especially the Dutch part? I'm Dutch myself, hence my question :)

@robosoul robosoul deleted the feature/enhance-nl-locale-company-name branch January 20, 2023 06:21
@robosoul
Contributor Author

@bodiam - it's part OCD, part project I'm working on. I'm using GPT3 to generate some website content and I'm doing it for a wide range of European countries. I want my prompt to be as realistic and versatile as possible. Cheers!

@bodiam
Contributor

bodiam commented Jan 20, 2023

Nice! Have you seen the Datafaker GPT module? It's extremely basic, slow, beta, etc, but it kind of works:

https://github.com/datafaker-net/datafaker-experimental

@robosoul
Contributor Author

Hey, this is really nice. I'll check it out. Reach out any time if you need help or think I could contribute. Cheers, Luka

@robosoul
Contributor Author

@bodiam - just remembered: I was pushing davinci-003 to its limits :D and figured out I could get a JSON response as a result, e.g. with the prompt: List 10 Dutch full names (male and female mixed). Response should be json object, with array field named "names". And you get something like:

{
  "names": [
    "Jan Willem van der Meer",
    "Johanna Maria van der Linden",
    "Klaas Jan Koopman",
    "Anneliese van der Veen",
    "Pieter Jan de Boer",
    "Marijke van der Heijden",
    "Hendrik Jan Smit",
    "Liesbeth van der Ploeg",
    "Gerard Jan Koster",
    "Petra van der Zee"
  ]
}

That makes the response easier to parse later. Could be useful ;)
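
As a rough sketch of that parsing step, the snippet below pulls the strings out of the `"names"` array using only the JDK. `NamesResponseParser` is a hypothetical helper, and the regex approach assumes the names contain no escaped quotes or `]` characters; a real JSON library would be the safer choice for anything beyond a quick experiment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes a flat {"names": ["...", "..."]} response shape.
class NamesResponseParser {
  // Captures everything between the brackets of the "names" array.
  private static final Pattern NAMES_ARRAY =
      Pattern.compile("\"names\"\\s*:\\s*\\[(.*?)\\]", Pattern.DOTALL);
  // Captures each double-quoted string (no support for escaped quotes).
  private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

  // Returns the names in order, or an empty list when no "names" array is found.
  static List<String> parse(String json) {
    List<String> names = new ArrayList<>();
    Matcher array = NAMES_ARRAY.matcher(json);
    if (!array.find()) {
      return names;
    }
    Matcher item = QUOTED.matcher(array.group(1));
    while (item.find()) {
      names.add(item.group(1));
    }
    return names;
  }
}
```

Returning an empty list for a malformed response lets the caller treat it like any other empty completion.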

@bodiam
Contributor

bodiam commented Jan 20, 2023

Yes, that is useful! What I actually wanted to do was generate a set of data like you did, cache the results, and refresh the cache once all the items had been used. That would at least speed up the process a bit. I'll have a look at parsing the results, thanks for the idea!
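
A minimal sketch of that cache-and-refresh idea, assuming a loader that returns one batch per call (for example, one GPT request). `RefillingCache` and its `loader` parameter are hypothetical names, not code from the Datafaker experimental module.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: hands out cached items one by one and reloads when empty.
class RefillingCache<T> {
  private final Supplier<List<T>> loader;
  private final Deque<T> items = new ArrayDeque<>();

  RefillingCache(Supplier<List<T>> loader) {
    this.loader = loader;
  }

  // Returns the next cached item, calling the loader only when the cache is used up.
  synchronized T next() {
    if (items.isEmpty()) {
      List<T> batch = loader.get();
      if (batch == null || batch.isEmpty()) {
        throw new IllegalStateException("Loader returned no items");
      }
      // ArrayDeque rejects null elements, so a batch containing null fails here.
      items.addAll(batch);
    }
    return items.poll();
  }
}
```

With a batch size of N, the slow call happens once per N values instead of once per value.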

@robosoul
Contributor Author

I did that for my custom steampunk and cyberpunk classes. I used GPT3 to generate sets of SP and CP professions/industries and names. Some examples:

    cyberpunk:
      hacker:
        name: [ "Neuromark", "Cyberblade", "Decryptor", "Nucloid", "Scriptrix", "Farality", "Hardwired", "Servo-Tron", "Opticore", "Machina", "Broodlord", "Cryptster"]
      company:
        suffix: [ Corporation, Firm, Group, Solutions, Tech, Systems, Company, Group ]
        industry: [ "Biometric authentication and access control", "Cybernetic Prosthetics Manufacturing", "Cybernetic Weapons", "Cybernetics And Biohacking", "Nanotech Manufacturing", "Nanotechnology Solutions","Synthetic Organ Manufacturing", "Teleportation ", "Underground Combat Arenas", "Wearable Augmentation", "Wearable Computing" ]
    steampunk:
      company:
        suffix: [ "Inc", "and Sons", "LLC", "Group", "Ltd", "& Co"]
        industry: [ "Aetheric Airship Construction",  "Aetheric Telegraphy Signal Maintenance Services", "Airship Commuter Services", "Airship Maintenance Technicians", "Airship Manufacturing", "Airship Navigation ", "Animatronic Entertainment ", "Animatronic Maintenance & Repair",  "Copper Wire Supplier", "Zeppelin Parts ", "Zeppelin Refurbishment" ]

I think GPT3 is a very useful and inexpensive tool for getting wide variation on a specific topic. Reach out if you need any kind of help ;)

@robosoul
Contributor Author

@bodiam - also, I noticed that retries are needed with the OpenAI API. I did something like this using resilience4j:

    final RetryConfig config =
        RetryConfig.<Response<CompletionResponse>>custom()
            .maxAttempts(5)
            .intervalFunction(IntervalFunction.ofExponentialBackoff(Duration.ofSeconds(5L), 2D))
            .retryOnResult(new RetryResponsePredicate())
            .retryExceptions(
                HttpException.class, TimeoutException.class, SocketTimeoutException.class)
            .failAfterMaxAttempts(false)
            .build();

Where our code retries on the following:

  private static final class RetryResponsePredicate
      implements Predicate<Response<CompletionResponse>> {

    @Override
    public boolean test(Response<CompletionResponse> response) {
      if (response.code() == 429) {
        log.info("Retrying request due to HTTP:429 code...");
        return true;
      }

      // Any other client or server error is not retried (429 is handled above).
      if (response.code() >= 400) {
        log.info("NOT retrying request due to HTTP status {}!", response.code());
        return false;
      }

      if (response.body() == null) {
        log.info("Retrying request due to empty body...");
        return true;
      }

      if (response.body().getChoices().isEmpty()) {
        log.info("Retrying request due to empty choices list...");
        return true;
      }

      if (response.body().getChoices().get(0).getText().isBlank()) {
        log.info("Retrying request due to empty text...");
        return true;
      }

      return false;
    }
  }

Might be useful for your use case as well.
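
To make the backoff settings above concrete, here is a small sketch that computes the waits such a configuration would produce. It assumes the wait after the n-th failed attempt is the initial interval times the multiplier to the power n-1, and that the attempt limit includes the first call. `BackoffSchedule` is a hypothetical helper written in plain Java, not part of resilience4j.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that only computes delays; it performs no retries itself.
class BackoffSchedule {
  // Returns the waits between attempts: maxAttempts - 1 entries, each multiplier times the previous.
  static List<Duration> delays(Duration initial, double multiplier, int maxAttempts) {
    List<Duration> result = new ArrayList<>();
    double millis = initial.toMillis();
    for (int attempt = 1; attempt < maxAttempts; attempt++) {
      result.add(Duration.ofMillis(Math.round(millis)));
      millis *= multiplier;
    }
    return result;
  }
}
```

Under those assumptions, `delays(Duration.ofSeconds(5), 2.0, 5)` yields waits of 5, 10, 20 and 40 seconds, so a request that keeps failing spends about 75 seconds waiting before the last attempt.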

@bodiam
Contributor

bodiam commented Jan 22, 2023

Small update: I just added the better model to the experimental project, and I implemented caching. I didn't implement the retry mechanism (I didn't run into connectivity issues yet), but PRs welcome of course :)
