Enhance NL local company name and suffix #641
src/main/resources/nl.yml
```yaml
name:
  - "#{Name.last_name} #{suffix}"
  - "#{Name.last_name}-#{Name.last_name}"
  - "#{Name.last_name}, #{Name.last_name} en #{Name.last_name}"
```
Overriding the default (en) patterns, since "and" in English is "en" in Dutch.
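To make the effect of these patterns concrete, here is a minimal, hypothetical resolver for `#{...}` placeholders. It is not Datafaker's actual implementation: the `PatternSketch` class and its in-memory dictionary are assumptions for illustration only. Each placeholder is replaced independently by a random entry for its key, so the three surnames in the "en" pattern can differ from each other.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for how a "#{...}" pattern from nl.yml gets expanded.
// This is NOT Datafaker's resolver; keys and values are supplied by the caller.
final class PatternSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("#\\{([^}]+)\\}");

    private final Map<String, List<String>> dictionary;
    private final Random random;

    PatternSketch(Map<String, List<String>> dictionary, Random random) {
        this.dictionary = dictionary;
        this.random = random;
    }

    /** Replaces every #{key} with a randomly chosen value registered for that key. */
    String expand(String template) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            List<String> values = dictionary.get(m.group(1));
            if (values == null || values.isEmpty()) {
                throw new IllegalArgumentException("Unknown key: " + m.group(1));
            }
            String pick = values.get(random.nextInt(values.size()));
            // quoteReplacement keeps '$' or '\' inside values from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(pick));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With a dictionary such as `{"Name.last_name": [...], "suffix": [...]}`, calling `expand("#{Name.last_name}, #{Name.last_name} en #{Name.last_name}")` yields something like "Jansen, de Vries en Bakker" (sample surnames are illustrative).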
…d duplicated company node
Codecov Report
```diff
@@            Coverage Diff             @@
##               main     #641    +/-   ##
============================================
+ Coverage     92.69%   92.71%   +0.01%
- Complexity     2620     2621       +1
============================================
  Files           281      281
  Lines          5396     5396
  Branches        589      589
============================================
+ Hits           5002     5003       +1
  Misses          241      241
+ Partials        153      152       -1
```
@bodiam - can we merge this one? Thanks
Merged!
Can I ask what you're working on, especially the Dutch part? I'm Dutch myself, hence my question :)
@bodiam - it's part OCD, part a project I'm working on. I'm using GPT-3 to generate some website content, and I'm doing it for a wide range of European countries. I want my prompts to be as realistic and versatile as possible. Cheers!
Nice! Have you seen the Datafaker GPT module? It's extremely basic, slow, beta, etc., but it kind of works:
Hey, this is really nice. I'll check it out; feel free to reach out any time if you need any help or think I could contribute. Cheers, Luka
@bodiam - just remembered: I was pushing davinci-003 to its limits :D but figured out I could get a JSON response back. E.g. with the prompt `List 10 Dutch full names (male and female mixed). Response should be json object, with array field named "names".` you get something like:

```json
{
  "names": [
    "Jan Willem van der Meer",
    "Johanna Maria van der Linden",
    "Klaas Jan Koopman",
    "Anneliese van der Veen",
    "Pieter Jan de Boer",
    "Marijke van der Heijden",
    "Hendrik Jan Smit",
    "Liesbeth van der Ploeg",
    "Gerard Jan Koster",
    "Petra van der Zee"
  ]
}
```

That makes the response easier to parse later. Could be useful ;)
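Since the point of asking for JSON is easy parsing, here is a dependency-free sketch that pulls the `names` array out of a response shaped like the one above. The `NamesExtractor` class is hypothetical, and it only handles this flat shape: it does not unescape sequences such as `\"` and would break on names containing `]`. A real JSON library (for example Jackson) is the safer choice in practice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extracts the string array stored under "names" in a
// response shaped like {"names": ["...", "..."]}. Flat arrays of plain strings only.
final class NamesExtractor {
    // DOTALL so the array may span several lines; find() tolerates text before the JSON
    private static final Pattern NAMES_ARRAY =
            Pattern.compile("\"names\"\\s*:\\s*\\[(.*?)\\]", Pattern.DOTALL);
    // A JSON string literal: quote, then non-quote/non-backslash chars or escape pairs, then quote
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    static List<String> extract(String json) {
        Matcher array = NAMES_ARRAY.matcher(json);
        if (!array.find()) {
            throw new IllegalArgumentException("No \"names\" array found");
        }
        List<String> names = new ArrayList<>();
        Matcher literal = STRING_LITERAL.matcher(array.group(1));
        while (literal.find()) {
            names.add(literal.group(1)); // escape sequences are left as-is
        }
        return names;
    }
}
```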
Yes, that is useful! What I actually wanted to do was to generate a set of data like you did, and then cache the results, and refresh the cache when all the items would be used. That would at least speed up the process a bit. I'll have a look at parsing the results, thanks for the idea!
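The caching idea (generate a batch, hand items out one by one, regenerate only once the batch is used up) can be sketched as below. This is an illustrative sketch, not the code in the experimental module: `RefillingCache` is a hypothetical name, and the `Supplier` is assumed to wrap whatever slow GPT call produces a batch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: serve generated items one at a time and only call the
// (slow, paid) generator again once every cached item has been handed out.
final class RefillingCache<T> {
    private final Supplier<List<T>> generator; // assumed to wrap e.g. a GPT request
    private final Deque<T> items = new ArrayDeque<>();

    RefillingCache(Supplier<List<T>> generator) {
        this.generator = generator;
    }

    /** Returns the next cached item, refilling from the generator when the cache is empty. */
    synchronized T next() {
        if (items.isEmpty()) {
            List<T> batch = generator.get();
            if (batch == null || batch.isEmpty()) {
                throw new IllegalStateException("Generator returned no items");
            }
            items.addAll(batch); // ArrayDeque rejects null elements
        }
        return items.pollFirst();
    }
}
```

The generator is called once per batch rather than once per value, which is where the speed-up comes from; `synchronized` keeps concurrent callers from triggering two refills at once.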
I did that for my custom steampunk and cyberpunk classes. I used GPT-3 to generate sets of steampunk (SP) and cyberpunk (CP) professions/industries and names. Some examples:

```yaml
cyberpunk:
  hacker:
    name: [ "Neuromark", "Cyberblade", "Decryptor", "Nucloid", "Scriptrix", "Farality", "Hardwired", "Servo-Tron", "Opticore", "Machina", "Broodlord", "Cryptster" ]
  company:
    suffix: [ Corporation, Firm, Group, Solutions, Tech, Systems, Company, Group ]
    industry: [ "Biometric authentication and access control", "Cybernetic Prosthetics Manufacturing", "Cybernetic Weapons", "Cybernetics And Biohacking", "Nanotech Manufacturing", "Nanotechnology Solutions", "Synthetic Organ Manufacturing", "Teleportation ", "Underground Combat Arenas", "Wearable Augmentation", "Wearable Computing" ]
steampunk:
  company:
    suffix: [ "Inc", "and Sons", "LLC", "Group", "Ltd", "& Co" ]
    industry: [ "Aetheric Airship Construction", "Aetheric Telegraphy Signal Maintenance Services", "Airship Commuter Services", "Airship Maintenance Technicians", "Airship Manufacturing", "Airship Navigation ", "Animatronic Entertainment ", "Animatronic Maintenance & Repair", "Copper Wire Supplier", "Zeppelin Parts ", "Zeppelin Refurbishment" ]
```

I think GPT-3 is a very useful and inexpensive tool for getting a wide variety of values on a specific topic. Reach out if you need any kind of help ;)
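@bodiam - also, I noticed that retries are needed when calling the OpenAI API. I did something like this, using Resilience4j:

```java
import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.RetryConfig;
import java.net.SocketTimeoutException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;
import retrofit2.HttpException; // assumed: a Retrofit-based OpenAI client
import retrofit2.Response;
// CompletionResponse is the client's completion model class (its import depends on the library)

// Up to 5 attempts, waiting 5 s before the first retry and doubling the wait each time
final RetryConfig config =
    RetryConfig.<Response<CompletionResponse>>custom()
        .maxAttempts(5)
        .intervalFunction(IntervalFunction.ofExponentialBackoff(Duration.ofSeconds(5L), 2D))
        .retryOnResult(new RetryResponsePredicate())
        .retryExceptions(
            HttpException.class, TimeoutException.class, SocketTimeoutException.class)
        // return the last response instead of throwing once all attempts are used up
        .failAfterMaxAttempts(false)
        .build();
```

where `RetryResponsePredicate` decides which responses get retried:

```java
// 'log' is assumed to be an SLF4J logger (e.g. from Lombok's @Slf4j)
private static final class RetryResponsePredicate
    implements Predicate<Response<CompletionResponse>> {
  @Override
  public boolean test(Response<CompletionResponse> response) {
    if (response.code() == 429) { // rate limited: worth retrying
      log.info("Retrying request due to HTTP:429 code...");
      return true;
    }
    if (response.code() >= 400) { // any other error status (including 400): give up
      log.info("NOT retrying request due to HTTP status {}!", response.code());
      return false;
    }
    if (response.body() == null) {
      log.info("Retrying request due to empty body...");
      return true;
    }
    if (response.body().getChoices().isEmpty()) {
      log.info("Retrying request due to empty choices list...");
      return true;
    }
    if (response.body().getChoices().get(0).getText().isBlank()) {
      log.info("Retrying request due to empty text...");
      return true;
    }
    return false; // a usable completion: accept it
  }
}
```

Might be useful for your use case as well.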
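For readers not using Resilience4j, the same idea (retry on a "bad" result or a transient exception, with exponential backoff) can be sketched in plain Java. This is an illustrative stand-in, not Resilience4j's behavior in every detail: `BackoffRetry`, its parameters and the injectable sleeper are hypothetical, the sleeper existing only so tests don't actually wait. With 5 attempts, a 5 s initial wait and a multiplier of 2, the waits between attempts come out as 5, 10, 20 and 40 seconds, which should match what `IntervalFunction.ofExponentialBackoff(Duration.ofSeconds(5L), 2D)` produces.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical plain-Java sketch of "retry on result or exception, with exponential backoff".
final class BackoffRetry<T> {
    private final int maxAttempts;
    private final Duration initialWait;
    private final double multiplier;
    private final Predicate<T> retryOnResult;            // true = result is "bad", try again
    private final Predicate<Exception> retryOnException; // true = exception is transient
    private final Consumer<Duration> sleeper;            // injectable so tests need not wait

    BackoffRetry(int maxAttempts, Duration initialWait, double multiplier,
                 Predicate<T> retryOnResult, Predicate<Exception> retryOnException,
                 Consumer<Duration> sleeper) {
        this.maxAttempts = maxAttempts;
        this.initialWait = initialWait;
        this.multiplier = multiplier;
        this.retryOnResult = retryOnResult;
        this.retryOnException = retryOnException;
        this.sleeper = sleeper;
    }

    /** Real sleeper for production use; restores the interrupt flag if interrupted. */
    static void sleep(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", ie);
        }
    }

    /** Returns the first acceptable result, or the last result once attempts run out. */
    T call(Callable<T> action) throws Exception {
        Duration wait = initialWait;
        for (int attempt = 1; ; attempt++) {
            try {
                T result = action.call();
                if (!retryOnResult.test(result) || attempt >= maxAttempts) {
                    return result; // like failAfterMaxAttempts(false): hand back the last result
                }
            } catch (Exception e) {
                if (!retryOnException.test(e) || attempt >= maxAttempts) {
                    throw e; // non-transient, or out of attempts
                }
            }
            sleeper.accept(wait);
            wait = Duration.ofMillis((long) (wait.toMillis() * multiplier));
        }
    }
}
```

In production you would pass `BackoffRetry::sleep` as the sleeper and move the 429/empty-body/empty-text checks from `RetryResponsePredicate` into `retryOnResult`.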
Small update: I just added the better model to the experimental project, and I implemented caching. I didn't implement the retry mechanism (I haven't run into connectivity issues yet), but PRs are welcome of course :)
Expanding the `suffix` collection and adjusting the `name` patterns: we shouldn't use the default (en) patterns, since "and" is "en" in Dutch, thus I added