Doc inspection tool 001 #87

kamir · 2023-07-05T15:17:28Z

No description provided.


justinmclean · 2023-07-05T23:10:07Z

Hi Mirko, I appreciate what you are trying to do here, but you may need to take some care with using the output of LLMs due to possible issues around licensing and copyright. There was a recent discussion on the legal-discuss list about that. Kind Regards, Justin


kamir · 2023-08-01T11:27:30Z

Hi Justin,
Currently, the goal is to start the project with a focus on "semi-automation" of documentation tasks.
Most importantly, the prompts, the prompt templates, and the indexing logic will be developed and applied to small data sets, such as the documentation repos of several ASF projects.

Now we have two options:
(A) Creation of content, such as READMEs, quickstart tutorials, or executive summaries.
(B) Automatic inspection of the content, using the "reference questions" as defined in the tool. Based on the result, a human editor can update the original documents so that the questions can be answered.

In case (B), no generated content will be published in a software repository; it is only temporary information that allows
quality assessments and improvements in a standardized, semi-automatic way.
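
To make option (B) concrete, here is a minimal sketch of the inspection loop. It is an illustration under stated assumptions, not the code from this PR: the reference questions, the function names, and the `keyword_answerer` stand-in (a plain keyword match used in place of an LLM call) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical reference questions; the tool's real list is not shown in this thread.
REFERENCE_QUESTIONS = [
    "How do I install the project?",
    "How do I run the tests?",
]

# An answerer takes (question, document text) and returns an answer, or None.
Answerer = Callable[[str, str], Optional[str]]


def keyword_answerer(question: str, text: str) -> Optional[str]:
    """Stand-in for an LLM call: return the first line that shares a long word with the question."""
    keywords = {w.strip("?.,").lower() for w in question.split() if len(w) > 4}
    for line in text.splitlines():
        words = {w.strip("?.,:").lower() for w in line.split()}
        if keywords & words:
            return line.strip()
    return None


def inspect_docs(docs: Dict[str, str], questions: List[str], answer: Answerer) -> Dict[str, List[str]]:
    """Map each reference question to the documents that can answer it."""
    return {
        q: [name for name, text in docs.items() if answer(q, text) is not None]
        for q in questions
    }


def unanswered(report: Dict[str, List[str]]) -> List[str]:
    """Questions no document could answer; these are the gaps for a human editor."""
    return [q for q, names in report.items() if not names]
```

The report is only temporary output: a human editor reads the list from `unanswered(...)` and updates the original documents, so nothing generated ends up in the repository.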

I understand that following the path towards (A) is still a risk. Hence, the goal is the creation of a tool which can be configured to use multiple LLMs, ranging from public offerings to personal deployments (such as an OpenAI model in Azure, self-hosted by a user, versus the fully managed OpenAI model which is used right now).
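
One way such a configurable setup could look is a small registry of backends selected by name. This is a sketch only: the `LLMBackend` fields, the registry keys, and the placeholder endpoints and model names are assumptions for illustration and do not describe the PR's actual configuration or any real deployment.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LLMBackend:
    """Connection settings for one model deployment (all field names are hypothetical)."""
    name: str
    endpoint: str
    model: str
    self_hosted: bool


# Placeholder entries; endpoints and model names are illustrative, not real deployments.
BACKENDS: Dict[str, LLMBackend] = {
    "managed": LLMBackend(
        "managed", "https://llm.example.invalid/v1", "example-model", False
    ),
    "azure-self-hosted": LLMBackend(
        "azure-self-hosted", "https://my-deployment.example.invalid", "example-model", True
    ),
}


def select_backend(name: str, registry: Dict[str, LLMBackend] = BACKENDS) -> LLMBackend:
    """Look up a backend by name and fail with a readable message if it is unknown."""
    try:
        return registry[name]
    except KeyError:
        raise ValueError(
            f"Unknown backend {name!r}; choose one of {sorted(registry)}"
        ) from None
```

Keeping the backend choice in configuration would let the same prompts and indexing logic run against a personal deployment or a managed offering without code changes.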

kamir · 2023-08-01T11:30:14Z

Is it possible to continue with this kind of activity, as long as we do not use the generated content as "Apache Training content", but rather work on an assistance system which supports content editors (especially for software documentation and for training material) in releasing better docs faster?

justinmclean · 2023-08-01T13:19:39Z

Please see https://www.apache.org/legal/generative-tooling.html

kamir · 2023-09-05T07:00:06Z

OK, thanks Justin.
I will "process" the docs and come up with a conclusion so that we know what to do with the pull request and the idea of using LLMs in general.

chrisdutz · 2023-09-05T07:28:40Z

Also for a PR like this ... a bit more description would be nice ... I think from having a look at the payload of the PR, I can get a rough idea of what it's about ... but I did notice quite some binary files being checked in and that's one thing we generally don't like to see in apache repos.

kamir · 2023-09-05T20:32:13Z

OK, thanks for the feedback. I will provide a description and clean up the PR, especially the binaries.

kamir · 2024-02-06T21:29:27Z

Hey,

after being hit by the first wave of the LLM hype, I thought it would be a nice contribution to the Apache Training project if we used this new opportunity. My first experiments with langchain made me confident that this is the next great thing.

There have been several obstacles, e.g., the question: to what extent are we allowed to use LLMs in the ASF context?

I have realised that going forward on the tool level would probably not improve the situation.
The tool I proposed would not generate content but rather would help developers and trainers to understand the code base
and the available docs faster. But since other tools with the same focus have been built in the meantime, there is no need to bring such a tool into this project.

We can use tools like https://github.com/peterw/Chat-with-Github-Repo for this purpose.
I suggest not focusing on building a new tool when others have one ready to use.

But when I started working on this topic, I did not see this coming.

With this in mind, we can close the PR.

kamir added 7 commits May 29, 2023 17:08

Added module for Apache Wayang.

6781e8a

Initial steps for some Apache Wayang related content.

Clean project to prep PR.

b5029e8

Add scripts

49703a4

Updated Gitignore.

cb66ac8

LIBS folder should not be added to the repo.

Created initial version of DOCU inspector.

83d9f23

Created a simple repo-inspection tool using ChatGPT (Azure OpenAI model).

e9c9c52

Fixed format issues.

2d3d9ba

prepared a working document analysis script to be published

2368c3d

clean up 4 release of the repo inspection tool

1932dad

kamir closed this Feb 6, 2024
