Doc inspection tool 001 #87

Closed
wants to merge 9 commits into from

Conversation

kamir
Contributor

@kamir kamir commented Jul 5, 2023

No description provided.

@justinmclean
Member

justinmclean commented Jul 5, 2023 via email

@kamir
Contributor Author

kamir commented Aug 1, 2023

Hi Justin,
currently the goal is to initiate the project with a focus on "semi-automation" of documentation tasks.
Most importantly, the prompts, the prompt templates, and the indexing logic will be developed and applied to small data sets, such as the documentation repos of several ASF projects.

Now we have two options:
(A) Creation of content, such as READMEs, quickstart tutorials, or executive summaries.
(B) Automatic inspection of the content, using the "reference questions" as defined in the tool. Based on the result, a human editor can update the original documents so that the questions can be answered.

In case (B), no generated content will be published in a software repository. It is just temporary information that allows quality assessments and improvements in a standardized, semi-automatic way.

I understand that following the path towards (A) is still a risk. Hence, the goal is the creation of a tool which can be configured to use multiple LLMs, ranging from public offerings to personal deployments (such as an OpenAI model in Azure, self-hosted by a user, vs. the fully managed OpenAI model which is used right now).

@kamir
Contributor Author

kamir commented Aug 1, 2023

Is it possible to continue with this kind of activity, as long as we do not use the generated content as "Apache Training content", but rather work on an assistance system which supports content editors (especially for software documentation and training material) in releasing better docs faster?

@justinmclean
Member

@kamir
Contributor Author

kamir commented Sep 5, 2023

OK, thanks Justin.
I will "process" the docs and come up with a conclusion so that we know what to do with the pull request and the idea of using LLMs in general.

@chrisdutz
Contributor

Also for a PR like this ... a bit more description would be nice ... I think from having a look at the payload of the PR, I can get a rough idea of what it's about ... but I did notice quite a few binary files being checked in, and that's one thing we generally don't like to see in Apache repos.

@kamir
Contributor Author

kamir commented Sep 5, 2023

Ok, thanks for the feedback. I will provide a description and clean up the PR, especially the binaries.

@kamir
Contributor Author

kamir commented Feb 6, 2024

Hey,

after being hit by the first wave of the LLM hype, I thought it would be a nice contribution to the Apache Training project if we used this new opportunity. My first experiments with langchain made me confident that this is the next great thing.

There have been several obstacles, e.g., the question: To what extent are we allowed to use LLMs in the ASF context?

I have realised that going forward on the tool level would probably not improve the situation.
The tool I proposed would not generate content, but rather would help developers and trainers understand the code base and the available docs faster. But since other tools with the same focus have been built in the meantime, there is no need to bring such a tool into this project.

We can use tools like https://github.com/peterw/Chat-with-Github-Repo for this purpose.
I suggest not focusing on building a new tool when others have one ready to use.

But when I started working on this topic, I did not see this coming.

With this in mind, we can close the PR.

@kamir kamir closed this Feb 6, 2024
3 participants