What exactly is clip skip? #5674

The CLIP model (the text embedding model used by 1.x models) has a structure that is composed of layers. Each layer is more specific than the last. For example, if layer 1 is "Person", then layer 2 could be "male" and "female"; if you then go down the "male" path, layer 3 could be "man", "boy", "lad", "father", "grandpa", and so on. Note that this is not exactly how the CLIP model is structured; it is a simplification for the sake of example.

The 1.5 model, for example, is 12 layers deep, where the 12th layer is the last layer of the text embedding. Each layer is a matrix of some size, and each layer has additional matrices under it. So a 4x4 first layer has four 4x4 matrices under it, and so on and so forth. So the text space is dimensionally fucking huge.
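
To make the "12 layers deep" picture concrete, here is a minimal toy sketch of what stopping early in a layer stack looks like. It is not the real CLIP code: `make_toy_layers` and `encode` are hypothetical names, the "layers" are trivial stand-in functions rather than transformer blocks, and the numbering (clip skip 1 = use the last layer, clip skip 2 = stop one layer early) is assumed to follow the convention used by the web UI setting. Real implementations may also apply a final normalization step to the skipped output, which is left out here.

```python
from typing import Callable, List, Tuple

NUM_LAYERS = 12  # depth of the text encoder for 1.x models, as described above

Layer = Callable[[List[float]], List[float]]


def make_toy_layers(n: int) -> List[Layer]:
    # Stand-ins for real layers: toy layer i simply adds i to every value,
    # so the output makes it obvious which layers actually ran.
    return [lambda x, i=i: [v + i for v in x] for i in range(1, n + 1)]


def encode(x: List[float], layers: List[Layer], clip_skip: int = 1) -> Tuple[List[float], int]:
    """Run x through the stack, stopping (clip_skip - 1) layers before the end.

    Returns the output and the index of the last layer that was used.
    """
    if not 1 <= clip_skip <= len(layers):
        raise ValueError("clip_skip must be between 1 and the number of layers")
    last_used = len(layers) - (clip_skip - 1)
    for layer in layers[:last_used]:
        x = layer(x)
    return x, last_used


layers = make_toy_layers(NUM_LAYERS)
print(encode([0.0], layers, clip_skip=1))  # all 12 toy layers run
print(encode([0.0], layers, clip_skip=2))  # stops after toy layer 11
```

With clip skip 2, the output handed to the rest of the pipeline comes from layer 11 instead of layer 12, so it is one step less "specific" in the sense of the example above.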

Now why would …

Answer selected by mezotaken