You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Symbroson — 28/08/2023 18:18
think we can use clyde and some semi automated scripting to read strings from a text file that will be thrown at clyde to translate it into a certain language and write the results back to a new text file?
If its not possible as a client we can make use of a discord bot
My first duty will be to create a mechanism that produces those translation text files out of the docs sources. I imagine it to be similar to the search index, but we can remove duplicated sentences from the translations. This might require some tweaks to the generator here and there as well, as type desciptions are hardcoded into conf.json for instance
CaptainStarbuck — 28/08/2023 19:26
I'm guessing Clyde is based on GPT v3.5, the free/public LLM. GPT v4 is vastly superior. I've been thinking along the same lines, not using Clyde via Discord, but just coding to the GPT API.
I created a prototype in ChatGPT. What do you think? ChatGPT
Symbroson — 28/08/2023 19:33
I thought about just writing a plain txt file with each sentence that appears in the docs on a new line so that it can translated efficiently while avoiding as many duplicates as possible
json has alot of control characters in it that we dont need. So I write a tool that extracts sentences from the docs, this will get translated by chatgpt and my tool feeds this back into the docs via search replace
we will have to do the translatin in chunks as chatgpt cant translate a whole megabyte of text at once. So we need a tool that feeds chatgpt with chunks of text and writes them back to a new text file
I can probably make my tool sort the sentence into several text files in a way that often reused sentences are in a single files, and sentences with only one appearance in separate file(s)
Symbroson — 28/08/2023 19:41
But I dont know how much of a difference duplicate sentences will have. Its possible that its irrelevant after all as each doc file is very individual (except of the type descriptions and inherited methods)
only collecting ideas so far - I will probably not start implementing this before friday as I have an essay to complete for my university and I already procrastinated way too much distracting myself with the docs 😂
Symbroson — 28/08/2023 19:53
Alternatively we can also just feed in the raw markup files. I had to hit "Continue Generating" one time for this whole layout page. https://chat.openai.com/share/4865d7da-c88b-4be1-a1d6-08b143dd0015
it has a few flaws (ie it shouldnt translate the // ------ sections, but overall this might just work as well
only takes a good while to translate the whole docs this way ^^ (still way faster as if a person would do it manually)
also, do you think we should ask it to translate variable names? not too sure about that, especially if it throws unicode in. I believe at least DS supports it, but I might have to adjust many of my generator regexes
You have a good idea how something like this could be implemened? maybe with a free chatgpt api? or maybe with a browser scripting tool like greasemonkey?
CaptainStarbuck — 29/08/2023 18:40
All of this can be done with the GPT API v4.
The quality of v4 is vastly superior to the free v3.5.
Symbroson — 29/08/2023 18:41
afaik the developer api is based on a paid plan as well, using rates for the amount of tokens sent and received
CaptainStarbuck — 29/08/2023 18:44
To use GPT like this, we need to provide context in a prompt. I'll make this up....
I'm going to provide you with raw data for new DroidScript documentation that needs to be translated. Don't convert ...this... When you see ...this... convert it to ...this... If context is missing for a specific phrase, consider the larger context. If there is still confusion, batch the text at the end of your translation so that we can figure it out.
A complete prompt like this will guide it very nicely.
Yes, the v4 API is for-fee but the cost is trivial. If we track changes then only the first batch will be large and after that one-off changes will be pennies or less.
With this technology we can also ask for suggestions for better text. It's really REALLY good with this stuff.
As to variables, I do not think they should be changed for the reasons you've mentioned - and this is common convention.
Symbroson — 29/08/2023 18:50
could you develop such a tool that takes the markup files from the docs as input and calls the GPT API to translate it and write it back? It can be part of the repository too so you can work directly in xour Docs fork
Its perfectly fine if you have other responsibilities atm - its not high priority
CaptainStarbuck — 29/08/2023 18:56
I can't do it immediately for two reasons. First, normal business commitments. Second, I haven't setup a separate billing account for API v4. We don't get v4 API with my ChatGPT-Plus plan. However, as time permits I will address that, and until then I can write this up with v3.5 as a POC and we can see the difference when switching to v4 for production.
Symbroson — 29/08/2023 19:22
sounds like a plan 👍
thanks alot.
The text was updated successfully, but these errors were encountered:
Symbroson — 28/08/2023 18:18
think we can use clyde and some semi automated scripting to read strings from a text file that will be thrown at clyde to translate it into a certain language and write the results back to a new text file?
If its not possible as a client we can make use of a discord bot
My first duty will be to create a mechanism that produces those translation text files out of the docs sources. I imagine it to be similar to the search index, but we can remove duplicated sentences from the translations. This might require some tweaks to the generator here and there as well, as type desciptions are hardcoded into conf.json for instance
CaptainStarbuck — 28/08/2023 19:26
I'm guessing Clyde is based on GPT v3.5, the free/public LLM. GPT v4 is vastly superior. I've been thinking along the same lines, not using Clyde via Discord, but just coding to the GPT API.
I created a prototype in ChatGPT. What do you think? ChatGPT
Symbroson — 28/08/2023 19:33
I thought about just writing a plain txt file with each sentence that appears in the docs on a new line so that it can translated efficiently while avoiding as many duplicates as possible
json has alot of control characters in it that we dont need. So I write a tool that extracts sentences from the docs, this will get translated by chatgpt and my tool feeds this back into the docs via search replace
we will have to do the translatin in chunks as chatgpt cant translate a whole megabyte of text at once. So we need a tool that feeds chatgpt with chunks of text and writes them back to a new text file
I can probably make my tool sort the sentence into several text files in a way that often reused sentences are in a single files, and sentences with only one appearance in separate file(s)
Symbroson — 28/08/2023 19:41
But I dont know how much of a difference duplicate sentences will have. Its possible that its irrelevant after all as each doc file is very individual (except of the type descriptions and inherited methods)
only collecting ideas so far - I will probably not start implementing this before friday as I have an essay to complete for my university and I already procrastinated way too much distracting myself with the docs 😂
Symbroson — 28/08/2023 19:53
Alternatively we can also just feed in the raw markup files. I had to hit "Continue Generating" one time for this whole layout page.
https://chat.openai.com/share/4865d7da-c88b-4be1-a1d6-08b143dd0015
it has a few flaws (ie it shouldnt translate the // ------ sections, but overall this might just work as well
only takes a good while to translate the whole docs this way ^^ (still way faster as if a person would do it manually)
also, do you think we should ask it to translate variable names? not too sure about that, especially if it throws unicode in. I believe at least DS supports it, but I might have to adjust many of my generator regexes
You have a good idea how something like this could be implemened? maybe with a free chatgpt api? or maybe with a browser scripting tool like greasemonkey?
CaptainStarbuck — 29/08/2023 18:40
All of this can be done with the GPT API v4.
The quality of v4 is vastly superior to the free v3.5.
Symbroson — 29/08/2023 18:41
afaik the developer api is based on a paid plan as well, using rates for the amount of tokens sent and received
CaptainStarbuck — 29/08/2023 18:44
To use GPT like this, we need to provide context in a prompt. I'll make this up....
I'm going to provide you with raw data for new DroidScript documentation that needs to be translated. Don't convert ...this... When you see ...this... convert it to ...this... If context is missing for a specific phrase, consider the larger context. If there is still confusion, batch the text at the end of your translation so that we can figure it out.
A complete prompt like this will guide it very nicely.
Yes, the v4 API is for-fee but the cost is trivial. If we track changes then only the first batch will be large and after that one-off changes will be pennies or less.
With this technology we can also ask for suggestions for better text. It's really REALLY good with this stuff.
As to variables, I do not think they should be changed for the reasons you've mentioned - and this is common convention.
Symbroson — 29/08/2023 18:50
could you develop such a tool that takes the markup files from the docs as input and calls the GPT API to translate it and write it back? It can be part of the repository too so you can work directly in xour Docs fork
Its perfectly fine if you have other responsibilities atm - its not high priority
CaptainStarbuck — 29/08/2023 18:56
I can't do it immediately for two reasons. First, normal business commitments. Second, I haven't setup a separate billing account for API v4. We don't get v4 API with my ChatGPT-Plus plan. However, as time permits I will address that, and until then I can write this up with v3.5 as a POC and we can see the difference when switching to v4 for production.
Symbroson — 29/08/2023 19:22
sounds like a plan 👍
thanks alot.
The text was updated successfully, but these errors were encountered: