step by step instructions on how to build this AI from source code #212

Open
adrelanos opened this issue Oct 26, 2023 · 0 comments

For a project to be considered Truly [1] Open Source, instructions on how to build it from scratch, from source code, are required.

What I mean by that is instructions in the following style:

  1. download the training data

download link

  2. install the build dependencies

A
B
C

  3. get the source code

  4. run this build script

  5. done

The LLM (large language model) file has been created.

  6. Start using the LLM.

Do x, y, z to get a prompt, or follow the instructions here on how to interface with the LLM file.
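
As a rough illustration of the style described above, the steps could be captured in one small driver script. This is only a sketch: every step name and command below is a hypothetical placeholder (nothing here is taken from this repository), and by default it only prints what it would run.

```python
import subprocess
import time

# Hypothetical placeholders: replace each command with the project's real one.
BUILD_STEPS = [
    ("download the training data", ["echo", "fetch <download link>"]),
    ("install the build dependencies", ["echo", "install A B C"]),
    ("get the source code", ["echo", "clone <source repository>"]),
    ("run the build script", ["echo", "run <build script>"]),
]


def run_steps(steps, dry_run=True):
    """Run each step in order and return (name, seconds) pairs.

    With dry_run=True nothing is executed; commands are only printed.
    """
    timings = []
    for number, (name, command) in enumerate(steps, start=1):
        print(f"{number}. {name}: {' '.join(command)}")
        started = time.monotonic()
        if not dry_run:
            # check=True stops the build at the first failing step.
            subprocess.run(command, check=True)
        timings.append((name, time.monotonic() - started))
    return timings


if __name__ == "__main__":
    run_steps(BUILD_STEPS)
```

Keeping the steps in a single ordered list means the documentation and the script cannot drift apart, and the recorded timings give a starting point for reporting how long each stage took.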

  • For the build documentation, please rely on the cloud as little as possible. The essential information is how to do it without a cloud.
  • Also, basic commentary on approximately how long the build process took for you, what hardware you used, and how much it cost you would be good.
  • Before anyone says "you cannot build this from source code because you don't have the infrastructure anyhow": that might be correct, but even if I don't have it, somebody who meets the requirements still needs the instructions on how to do it locally.
  • It's arguable how simple the instructions have to be. I am a fairly technical person, a Linux distribution maintainer, and I must say I am lost at hello.
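
The commentary requested above (duration, hardware, cost) could be published as a small machine-readable record next to the build documentation. A minimal sketch follows; the class and field names are illustrative assumptions, not an existing format, and the values would be filled in by whoever actually ran the build.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class BuildReport:
    """Hypothetical record of one from-source build; field names are illustrative."""
    hardware: str          # free-text description of the machine(s) used
    duration_hours: float  # approximate wall-clock time of the whole build
    cost: str              # approximate cost, including the currency
    used_cloud: bool       # whether any cloud service was required


def to_json(report: BuildReport) -> str:
    """Serialize the report so it can be published with the build docs."""
    return json.dumps(asdict(report), indent=2, sort_keys=True)
```

A plain JSON file like this is easy for packagers to read and compare across independent rebuilds.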

Some Debian Linux developers have indicated interest in packaging LLMs (ref, many more refs on request). But of course, Debian developers would need to be able to independently reproduce the LLM from source code, from the "smallest reasonable building blocks" (my words), i.e. build documentation + data + AI source code + build scripts.


[1] It is sad that the word "Truly" has to be prepended, because third parties have used the term Open Source when all they provided was a huge binary blob (the LLM file) without build documentation, training data, or AI source code.

The OSI (Open Source Initiative) also shares this concern; see What does it mean for an AI system to be Open Source?.

If Dolly is Truly Open Source, I applaud your efforts.

Disclaimer:
I am not a spokesperson for any of the mentioned projects.
