
Moved excess information and documentation to external dev doc #55826

Merged
merged 6 commits into from
Jan 26, 2024

Conversation

tshaffercodeorg
Contributor

This is a follow-up fix for the conversation in PR #55633.

Executive Summary

  • Migrated background information and model evaluation scripts/logs to an external Google Drive folder
  • Streamlined readme.md and inline documentation

Contributor

@thomasoniii left a comment


lgtm 🚀

Contributor

@fisher-alice left a comment


Thanks for iterating on this! Just left a handful of minor comments/suggestions.

# Math explanation: Cosine distance outputs a value between 0 -> 1 where smaller values = greater similarity
# We can redefine this into cosine similarity with a simple (x-1)*-1 due to their mathematical relationship
# Since we take the SUM(MAX(similarity)) value when determining which options to present the user, cosine similarity is preferable
# Conversion from cosine distance to cosine similarity for easier readability in frontend computations.
Contributor

Nit: 'front-end', since it's used as an adjective. (Sorry I didn't include this in my prior comment.)

# Conversion from cosine distance to cosine similarity for easier readability in frontend computations.
# Math explanation: Cosine distance outputs a value between 0 -> 1 where smaller values = greater similarity.
# Cosine similarity redefines this relationship so that instead larger values = greater ssimilarity.
# Since we expose some of these values to students in the frontend, we felt that similarity values growing larger would be easier to understand.
Contributor

Nit suggestion: 'Since we expose some of these values to users in the frontend, we felt that greater values = greater similarity would be easier to understand.'

# Since we take the SUM(MAX(similarity)) value when determining which options to present the user, cosine similarity is preferable
# Conversion from cosine distance to cosine similarity for easier readability in frontend computations.
# Math explanation: Cosine distance outputs a value between 0 -> 1 where smaller values = greater similarity.
# Cosine similarity redefines this relationship so that instead larger values = greater ssimilarity.
Contributor

Nit: typo, 'ssimilarity' should be 'similarity'.
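For reference, a minimal Python sketch of the distance-to-similarity conversion discussed in this thread. It assumes embeddings are plain lists of floats; `cosine_distance` and `distance_to_similarity` are hypothetical helper names, not functions taken from `HoC2023AiGenerateWeights.py`.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance = 1 - (a . b) / (|a| * |b|).

    In general this lies in [0, 2]; it stays within [0, 1] when the two
    vectors' cosine similarity is non-negative.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distance_to_similarity(distance: float) -> float:
    """The (x-1)*-1 conversion from the comment, i.e. 1 - x.

    Smaller distance -> larger similarity, so greater values = greater similarity.
    """
    return (distance - 1) * -1


if __name__ == "__main__":
    d = cosine_distance([1.0, 0.0], [1.0, 1.0])
    print(round(d, 4), round(distance_to_similarity(d), 4))  # prints: 0.2929 0.7071
```

Because the conversion is just `1 - x`, it preserves ordering in reverse: ranking options by highest similarity gives the same order as ranking by lowest distance.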

@@ -7,7 +7,7 @@ Run the `HoC2023AiGenerateWeights.py` to generate the associated output weights

Before running the script, make sure to adjust your local parameters based on the current model being used. Previous iterations have leveraged spaCy and OpenAI's Ada models, and it is not unreasonable to anticipate that the model "vendor" may change again in the future.

As of 01/08/2024, this script uses AWS's Titan v1 LLM through their Bedrock API.
As of 01/08/2024, this script uses AWS's Titan v1 LLM through their Bedrock API. For additional background context and testing resources, check the Google Drive here: https://docs.google.com/document/d/1beDoalfB1Y7BybN82YGhuzTNos_TE5l0dX4XdKPzNdw/edit?usp=sharing
Contributor

You could link the text to the URL, as done in the javabuilder README:
'For additional background context and testing resources, see the Dance AI Design Dev Doc.'

Contributor

Also - could you define 'LLM' the first time it's introduced in the doc?

measure the relatedness of text strings. The distance between two vectors measures
their relatedness. https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture
These embeddings are stored in cache files as pickle files, Python's native way to serialize data.
The embeddings used to generate these maps are stored in cached pickle files such as foreground_embeddings.pkl to prevent duplicate LLM API calls.
Contributor

Nit: use backticks for file name:
'The embeddings used to generate these maps are stored in cached pickle files such as `foreground_embeddings.pkl` to prevent duplicate LLM API calls.'
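A minimal sketch of the caching pattern this README change describes, assuming the cache file holds a pickled `dict` that maps each input string to its embedding. `load_or_build_embeddings` and the `embed` callable are hypothetical; in the real script `embed` would presumably wrap the Titan call through Bedrock, which is not shown here.

```python
import os
import pickle
from typing import Callable, Dict, List

Embedding = List[float]


def load_or_build_embeddings(
    cache_path: str,
    texts: List[str],
    embed: Callable[[str], Embedding],
) -> Dict[str, Embedding]:
    """Return {text: embedding}, calling `embed` only for texts not already cached.

    `embed` stands in for the (paid) LLM embedding call; the cache file is a
    pickled dict. Only unpickle files you trust: pickle can run code on load.
    """
    cache: Dict[str, Embedding] = {}
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)

    # dict.fromkeys de-duplicates while keeping order, so each text is embedded once.
    missing = [text for text in dict.fromkeys(texts) if text not in cache]
    for text in missing:
        cache[text] = embed(text)  # the only place an API call would happen

    # Rewrite the cache file only when something new was fetched.
    if missing:
        with open(cache_path, "wb") as f:
            pickle.dump(cache, f)
    return cache
```

A call such as `load_or_build_embeddings("foreground_embeddings.pkl", foreground_names, embed)` (with `foreground_names` and `embed` again hypothetical) would hit the API only for names that are not yet in the file; rerunning it with the same inputs makes no API calls at all.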


At runtime, DanceAI will use the three maps to look up the scores for each output type and take the top three indexes of (MAX(SUM(Input1Scores, Input2Scores, Input3Scores))) to select a final palette/foreground/background to display to the user. These maps are stored as a local cache rather than generated at runtime to remove the costs associated with querying an LLM and improve runtime performance.
At runtime, DanceAI will use the three maps to look up the scores for each output type and randomly select one of the top 3 results of MAX(SUM(Input1Scores, Input2Scores, Input3Scores)) as the final palette/foreground/background to display to the user. These maps are stored as a local cache rather than generated at runtime to remove the costs associated with querying an LLM and improve runtime performance.
Contributor

Since on the front end we present both a randomly selected top version (from the top 3) and randomly selected bottom versions (4 from the bottom 20), maybe we can just state that the three maps are used to look up scores for each output type. We could also refer readers to the `calculateOutputSummedWeights.ts` file.
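The runtime selection discussed here can be sketched in Python roughly as follows. This is an illustration under assumptions, not the TypeScript in `calculateOutputSummedWeights.ts`: the `ScoreMap` shape and the function names are hypothetical, and the defaults (one pick from the top 3, four picks from the bottom 20) are taken from this comment.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: for one output type (e.g. backgrounds), each input the
# student picked maps to a list of similarity scores, one per output option.
ScoreMap = Dict[str, List[float]]


def summed_scores(score_map: ScoreMap, inputs: List[str]) -> List[float]:
    """Element-wise SUM(Input1Scores, Input2Scores, Input3Scores) across the chosen inputs."""
    rows = [score_map[name] for name in inputs]
    return [sum(column) for column in zip(*rows)]


def choose_top_and_bottom(
    scores: List[float],
    top_k: int = 3,
    bottom_k: int = 20,
    num_bottom: int = 4,
    rng: Optional[random.Random] = None,
) -> Tuple[int, List[int]]:
    """Return one random index from the top_k options and num_bottom distinct
    random indexes from the bottom_k options, ranked by summed score."""
    if len(scores) < top_k + bottom_k:
        raise ValueError("need at least top_k + bottom_k options so the pools don't overlap")
    rng = rng or random.Random()
    # Option indexes sorted from highest to lowest summed score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_choice = rng.choice(ranked[:top_k])
    bottom_choices = rng.sample(ranked[-bottom_k:], num_bottom)
    return top_choice, bottom_choices
```

Passing a seeded `random.Random` makes the picks reproducible in tests; because the scores come from the cached maps, none of this requires an LLM call at runtime.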

@tshaffercodeorg merged commit f57ff0e into staging Jan 26, 2024
2 checks passed
@tshaffercodeorg deleted the tyrone/hoc2023-ai-documentation branch January 26, 2024 18:37
mikeharv pushed a commit that referenced this pull request Feb 5, 2024
…tation

Moved excess information and documentation to external dev doc

3 participants