## Prompting 
Source: https://docs.gpt4all.io/index.html

In [16]:
#! pip install gpt4all
#! pip install nomic

In [38]:
from gpt4all import GPT4All
model_meta_llama = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf") # downloads / loads a 4.66GB LLM

In [50]:
# Excerpt from "The Tinderbox" by Hans Christian Andersen (HCA), in Danish
import time
# Start the timer
start_time = time.time()

text = '  Hvad vil du nu med det fyrtøj, spurgte soldaten. \
             Det kommer ikke dig ved! sagde heksen, nu har du jo fået penge! \
            Giv mig bare fyrtøjet! Snik snak! sagde soldaten, \
            \
             vil du straks sige mig, hvad du vil med det, eller jeg trækker min sabel \
            ud og hugger dit hoved af! Nej, sagde heksen. \
            \
            Så huggede soldaten hovedet af hende. Der lå hun! Men han bandt alle sine \
            penge ind i hendes forklæde, tog det som en bylt på ryggen, puttede fyrtøjet \
            i lommen og gik lige til byen.'

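# Prompt (Danish): "You will be presented with a text, and your task is to assess
# whether the text contains a description of violence: ..."
with model_meta_llama.chat_session():
    print(model_meta_llama.generate(f"Du bliver præsenteret for en tekst og din opgave er at vurdere om teksten \
                            indeholder en beskrivelse af vold: {text}", max_tokens=100))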

# End the timer
end_time = time.time()

# Calculate the elapsed time
print('\n')
elapsed_time = end_time - start_time
print(f"Time taken to run the code: {elapsed_time:.2f} seconds")

Ja, jeg vil nu vurdere om teksten indeholder en beskrivelse af vold.

Jeg kan konstatere at der er flere eksempler på voldelig adfærd i denne tekst:

* Heksen siger "Så huggede soldaten hovedet af hende." Det er et tydeligt eksempel på fysisk vold.
* Soldaten truer med at hugge heksens hoved af,
Time taken to run the code: 73.59 seconds
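
The backslash line continuations inside the quoted string keep the leading indentation of every continued line, so the prompt sent to the model contains long runs of spaces. Below is a minimal sketch of one way to collapse them before prompting. `build_prompt` is a hypothetical helper, not part of the gpt4all API, and the English instruction is only a placeholder:

```python
def build_prompt(instruction: str, text: str) -> str:
    """Join an instruction and a text, collapsing all runs of whitespace."""
    # str.split() with no argument splits on any run of whitespace and drops
    # empty pieces, so re-joining with single spaces normalizes the spacing.
    clean_instruction = " ".join(instruction.split())
    clean_text = " ".join(text.split())
    return f"{clean_instruction} {clean_text}"


# First two sentences of the excerpt, written with the same continuation style.
raw_text = '  Hvad vil du nu med det fyrtøj, spurgte soldaten. \
             Det kommer ikke dig ved! sagde heksen, nu har du jo fået penge!'

prompt = build_prompt("Assess whether the text describes violence:", raw_text)
print(prompt)
```

The resulting string can be passed to `generate` in place of the f-string; whether the extra whitespace changes the model's answer has not been tested here.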


In [51]:
from gpt4all import GPT4All
gpt4all_model = GPT4All("gpt4all-13b-snoozy-q4_0.gguf") # downloads on first use, then loads the 13B Snoozy model

In [52]:
import time
# Start the timer
start_time = time.time()

text = '  Hvad vil du nu med det fyrtøj, spurgte soldaten. \
             Det kommer ikke dig ved! sagde heksen, nu har du jo fået penge! \
            Giv mig bare fyrtøjet! Snik snak! sagde soldaten, \
            \
             vil du straks sige mig, hvad du vil med det, eller jeg trækker min sabel \
            ud og hugger dit hoved af! Nej, sagde heksen. \
            \
            Så huggede soldaten hovedet af hende. Der lå hun! Men han bandt alle sine \
            penge ind i hendes forklæde, tog det som en bylt på ryggen, puttede fyrtøjet \
            i lommen og gik lige til byen.'

with gpt4all_model.chat_session():
    print(gpt4all_model.generate(f"You will be presented with a text and your task is to evaluate whether the text \
                            contains a description of violence: {text}", max_tokens=100))


# End the timer
end_time = time.time()

# Calculate the elapsed time
print('\n')
elapsed_time = end_time - start_time
print(f"Time taken to run the code: {elapsed_time:.2f} seconds")

The text describes a violent encounter between two characters - a soldier and a woman who is referred to as "heksen" (the witch). The conversation begins with the soldier asking for something, but hexes refuses to tell him what it is. She then asks for his money in exchange for an object that she refers to as fyrtøj (which translates to "fire equipment"). When the soldier refuses her request and insults her, hexen becomes angry and
Time taken to run the code: 126.13 seconds
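
Both cells repeat the same start/stop timer code. The sketch below shows a reusable timer that makes such comparisons less repetitive. `timed` and `fake_generate` are hypothetical names; `fake_generate` is a stand-in so the block runs without downloading a model:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Store the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_generate(prompt: str) -> str:
    # Stand-in for model.generate(prompt, max_tokens=100); swap in a real call.
    time.sleep(0.01)
    return f"(response to {len(prompt)} characters)"


timings = {}
with timed("stand-in model", timings):
    answer = fake_generate("Does this text describe violence?")

for label, seconds in timings.items():
    print(f"{label}: {seconds:.2f} seconds")
```

With real models, each `chat_session()` block would go inside its own `timed(...)` block. Single runs such as the 73.59 and 126.13 seconds recorded above give only a rough comparison, especially since the two prompts also differ in language.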


## Embeddings

Source: https://docs.gpt4all.io/gpt4all_python/home.html#installation

In [53]:
#! pip install gpt4all
#! pip install nomic

In [54]:
from nomic import embed
embeddings = embed.text(["String 1", "String 2"], inference_mode="local")['embeddings']
print("Number of embeddings created:", len(embeddings))
print("Number of dimensions per embedding:", len(embeddings[0]))

Number of embeddings created: 2
Number of dimensions per embedding: 768


## Embeddings and Similarity search

Embeddings map texts or images to numeric vectors so that items with similar meaning end up close together, which makes them useful for information retrieval.

The following example uses text embeddings.

### Explanation:
The code below finds the text most similar to a given query from a list of texts. It does this by:
1. Generating embeddings for the query and the texts.
2. Reshaping the embeddings to 2D arrays.
3. Computing the cosine similarity between the query embedding and each text embedding.
4. Identifying the text with the highest similarity score.

The output is the zero-based index of the text most similar to the query, so `0` refers to the first text in the list.
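
The four steps above can be sketched without a model by using small made-up vectors in place of real embeddings. The numbers are illustrative assumptions, not output from `nomic`. Cosine similarity is the dot product of two vectors divided by the product of their norms:

```python
import numpy as np

# Toy 3-dimensional "embeddings" (hypothetical values; real ones here have 768 dimensions).
query_vec = np.array([[1.0, 0.0, 1.0]])    # shape (1, 3)
text_vecs = np.array([[2.0, 0.0, 2.0],     # same direction as the query
                      [0.0, 3.0, 0.0]])    # orthogonal to the query


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of a and each row of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


similarities = cosine_sim(query_vec, text_vecs)   # shape (1, 2)
best_index = int(np.argmax(similarities))
print(similarities, best_index)
```

Because only the direction of each vector matters, the first text vector scores as a perfect match even though it is twice as long as the query vector.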

In [56]:
from nomic import embed
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

embed_text_one = "Along the Swedish coasts you can find rose plants and raspberry plants"
embed_text_two = "The nuclear power plant and the oil drilling plants are located among the fruit plants on the Swedish coast"

query = 'A gardener is looking for plants and fruits'

# Get the embeddings
embed_query = embed.text([query], inference_mode="local")['embeddings']

embed_text = embed.text([embed_text_one,
                         embed_text_two], inference_mode="local")['embeddings']

# Convert to NumPy arrays and make sure each is 2D, as cosine_similarity expects
embed_query = np.array(embed_query).reshape(1, -1)  # Reshape to (1, embedding_size)
embed_text = np.array(embed_text).reshape(len(embed_text), -1)  # Reshape to (num_texts, embedding_size)

# Compute cosine similarity
similarities = cosine_similarity(embed_query, embed_text)

# Find the index of the nearest neighbor
nearest_neighbor_index = np.argmax(similarities)

print(f"The nearest neighbor index is: {nearest_neighbor_index}")


The nearest neighbor index is: 0
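
An index alone hides how close the match was. The sketch below returns every text with its score, sorted from most to least similar. `rank_by_similarity` is a hypothetical helper, and the scores in the usage example are made up rather than computed from real embeddings:

```python
import numpy as np


def rank_by_similarity(similarities, texts):
    """Pair each text with its similarity score, highest score first."""
    scores = np.asarray(similarities, dtype=float).ravel()  # accept shape (1, n) or (n,)
    if scores.size != len(texts):
        raise ValueError("need exactly one score per text")
    order = np.argsort(scores)[::-1]                        # indices from best to worst
    return [(texts[i], float(scores[i])) for i in order]


# Hypothetical scores in the (1, num_texts) shape that cosine_similarity returns.
example_scores = np.array([[0.62, 0.55]])
ranked = rank_by_similarity(example_scores, ["text one", "text two"])
for text, score in ranked:
    print(f"{score:.2f}  {text}")
```

In the notebook cell above, the same call would take `similarities` and `[embed_text_one, embed_text_two]` as arguments.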
