In [4]:
from llama_index.llms.ollama import Ollama

llm = Ollama(model = "llama3")
response = llm.complete("Tell me a story in 20 words!")
print(response)

As the clock struck midnight, a lone rabbit named Rosie hopped into a magical garden, discovering hidden wonders.


In [10]:
#!pip install llama-index qdrant_client torch transformers 
#!pip install llama-index-llms-ollama
#!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-groq

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting llama-index-llms-groq
  Downloading llama_index_llms_groq-0.1.4-py3-none-any.whl.metadata (2.2 kB)
Collecting llama-index-llms-openai-like<0.2.0,>=0.1.3 (from llama-index-llms-groq)
  Downloading llama_index_llms_openai_like-0.1.3-py3-none-any.whl.metadata (753 bytes)
Downloading llama_index_llms_groq-0.1.4-py3-none-any.whl (2.9 kB)
Downloading llama_index_llms_openai_like-0.1.3-py3-none-any.whl (3.0 kB)
Installing collected packages: llama-index-llms-openai-like, llama-index-llms-groq
Successfully installed llama-index-llms-groq-0.1.4 llama-index-llms-openai-like-0.1.3


In [7]:
from llama_index.llms.ollama import Ollama
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core import PromptTemplate, Settings
from llama_index.core.embeddings import resolve_embed_model

def ingest_load(query):
    # only load PDF files
    required_exts = [".pdf"]

    # load documents from the local "data" directory
    loader = SimpleDirectoryReader("data", required_exts=required_exts)

    documents = loader.load_data()

    # create embeddings using HuggingFace model
    embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")

    # prompt template
    template =  (
        "We have provided context information below. \n"
        "---------------------\n"
        "{context_str}"
        "\n---------------------\n"
        "Given this information, please answer the question: {query_str}\n"
        "If you don't know the answer, please do mention : I don't know !"
    )

    prompt = PromptTemplate(template = template)

    # define llms
    llm = Ollama(model="llama3", request_timeout= 3000)

    # setting up llm and output tokens
    Settings.llm = llm
    Settings.num_output = 250
    Settings.embed_model = embed_model

    # define index
    index = VectorStoreIndex.from_documents(documents)

    # define query engine 
    query_engine = index.as_query_engine()

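    # replace the default text QA prompt with our custom prompt
    # (update_prompts expects a dict keyed by prompt name)
    query_engine.update_prompts(
        {"response_synthesizer:text_qa_template": prompt}
    )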

    # Ask query and get response
    response = query_engine.query(query)

    print(response)

ingest_load("How did shopify scale their database processing?")

Shopify scaled their MySQL database by employing federation as their initial strategy. They broke up the primary database into smaller MySQL databases, identifying groups of large tables that could exist separately without requiring many joins between them. However, they eventually ran into issues with this approach, including unable to further split the primary database, long schema migrations, and interrupted background jobs. Instead, they pivoted to Vitess as their new scaling strategy.
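The prompt template in the cell above has two placeholders: `{context_str}` for the retrieved chunks and `{query_str}` for the question. The sketch below shows roughly what the LLM receives once they are filled in. It uses plain `str.format` rather than llama_index. The `render_prompt` helper and the way chunks are joined are assumptions for illustration, not the library's internals.

```python
# Plain-Python stand-in for the prompt template used above; no llama_index needed.
TEMPLATE = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
    "If you don't know the answer, please do mention : I don't know !"
)


def render_prompt(context_chunks, query):
    """Join retrieved chunks and substitute them into the template (hypothetical helper)."""
    context_str = "\n\n".join(context_chunks)
    return TEMPLATE.format(context_str=context_str, query_str=query)


# Placeholder chunk text, for illustration only.
example = render_prompt(
    ["(retrieved chunk 1)", "(retrieved chunk 2)"],
    "How did shopify scale their database processing?",
)
print(example)
```

Printing the rendered prompt this way is a quick check that the fallback instruction ("I don't know") actually reaches the model.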


In [8]:
ingest_load("How much efficiency and performance improvement did they achieve after adopting this federation strategy?")

The training process described in the context involves a pretraining stage followed by Supervised Fine Tuning (SFT) and then Reward Modeling and Reinforcement Learning. After these stages, the resulting model is capable of performing well in tasks such as sentiment classification, question answering, chat assistant, etc. The SFT Model generates responses that are scored by human contractors using a reward model, which predicts how well the generated response answers the prompt. This process allows for training a Reinforcement Learning (RLHF) model that can score responses and fine-tune the SFT model to achieve better performance.

As a result of this process, the RLHF model is able to achieve high efficiency and performance in tasks such as chat assistance, question answering, and sentiment classification.
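The answer above describes ChatGPT training rather than Shopify. That suggests the retrieved chunks came from a different PDF in `data`, probably because the question says "they" instead of naming Shopify. One way to check is to list which files the retrieved chunks came from. The sketch below assumes the response object exposes `source_nodes`, each with a `score` and a `node.metadata` dict containing `file_name`, which is how llama_index responses are usually shaped. `summarize_sources` is a hypothetical helper, and the stand-in data is made up so the block runs without a model.

```python
from types import SimpleNamespace


def summarize_sources(response):
    """Return (file_name, score) pairs for the chunks behind a response."""
    rows = []
    for item in getattr(response, "source_nodes", []):
        metadata = getattr(item.node, "metadata", {}) or {}
        rows.append((metadata.get("file_name", "<unknown>"), item.score))
    return rows


# Stand-in object shaped like a query response; file names and scores are invented.
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(node=SimpleNamespace(metadata={"file_name": "a.pdf"}), score=0.71),
        SimpleNamespace(node=SimpleNamespace(metadata={}), score=0.64),
    ]
)

for file_name, score in summarize_sources(fake_response):
    print(f"{file_name}: {score:.2f}")
```

To use this with the notebook's code, `ingest_load` would need to return `response` instead of only printing it.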


In [11]:
#pip install llama-index-llms-groq
from llama_index.llms.groq import Groq
#pip install python-dotenv
from dotenv import load_dotenv
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core import PromptTemplate, Settings
from llama_index.core.embeddings import resolve_embed_model
import os

load_dotenv()

api_key = os.getenv('GROQ_API_KEY')

def groq_ingest_load(query):

    # only load PDF files
    required_exts = [".pdf"]

    # load documents from the local "data" directory
    loader = SimpleDirectoryReader("data", required_exts=required_exts)

    documents = loader.load_data()

    # create embeddings using HuggingFace model
    embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")

    # prompt template
    template =  (
        "We have provided context information below. \n"
        "---------------------\n"
        "{context_str}"
        "\n---------------------\n"
        "Given this information, please answer the question: {query_str}\n"
        "If you don't know the answer, please do mention : I don't know !"
    )

    prompt = PromptTemplate(template = template)

    # define llms
    llm = Groq(model="llama3-70b-8192", api_key= api_key)

    # setting up llm and output tokens
    Settings.llm = llm
    Settings.num_output = 250
    Settings.embed_model = embed_model

    # define index
    index = VectorStoreIndex.from_documents(documents)

    # define query engine 
    query_engine = index.as_query_engine()

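    # replace the default text QA prompt with our custom prompt
    # (update_prompts expects a dict keyed by prompt name)
    query_engine.update_prompts(
        {"response_synthesizer:text_qa_template": prompt}
    )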

    # Ask query and get response
    response = query_engine.query(query)

    print(response)


groq_ingest_load("How did shopify scale their database processing?")


Shopify initially scaled their database processing using federation, where they broke up their primary database into smaller MySQL databases by identifying independent table groups. However, they eventually faced issues with this approach, including being unable to further split the primary database, long schema migrations, and interrupted background jobs. They then pivoted to using Vitess, an open-source sharding solution for MySQL, to overcome these scaling challenges.
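The Groq cell reads `GROQ_API_KEY` with `os.getenv`, which returns `None` without complaint when the variable is missing, so the failure only shows up later, at request time. A small guard can make that failure explicit. This is a minimal sketch; `require_env` is a hypothetical helper, not part of llama_index or python-dotenv.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise a clear error if unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value
```

In the notebook this could replace the direct lookup, for example `api_key = require_env("GROQ_API_KEY")` right after `load_dotenv()`.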


In [12]:
groq_ingest_load("How much efficiency and performance improvement did they achieve after adopting this federation strategy?")



The context does not provide a direct answer to the query about the efficiency and performance improvement achieved after a "federation strategy." However, it discusses the training process of ChatGPT, including pretraining, supervised fine-tuning, reward modeling, and reinforcement learning. 

It mentions that the pretraining stage accounts for 99% of the total compute time needed to train ChatGPT, and it took 21 days of training with 2048 A100 GPUs, costing $5 million USD. The fine-tuning stages, including supervised fine-tuning, reward modeling, and reinforcement learning, are described, but no specific efficiency or performance improvement metrics are provided.


In [14]:
import time
from ollama_llama3 import ingest_load
from llama3_groq import groq_ingest_load

def get_execution_time():
    # get the start time 
    start_time = time.time()
    ingest_load("How did shopify scale their database processing?")
    end_time = time.time()

    # calculate execution time
    execution_time = end_time - start_time
    print(f"The function llama3 took {execution_time} seconds to execute.")

    # get the start time 
    start_time1 = time.time()
    groq_ingest_load("How did shopify scale their database processing?")
    end_time1 = time.time()

    # calculate execution time
    execution_time1 = end_time1 - start_time1
    print(f"The function groq llama3 took {execution_time1} seconds to execute.")

get_execution_time()



Shopify's first strategy to scale MySQL was federation. This involved taking the primary database and breaking it up into smaller MySQL databases. They identified groups of large tables in the primary database that could exist separately - these table groups were independent from each other and didn’t have many queries that required joins between them.
The function llama3 took 15.256630182266235 seconds to execute.
Shopify initially employed federation as their scaling strategy, breaking up their primary database into smaller MySQL databases by identifying independent table groups. However, they eventually faced issues with this approach, including being unable to further split the primary database, long schema migrations, and interrupted background jobs. They then pivoted to Vitess, an open-source sharding solution for MySQL, to overcome these scaling pains.
The function groq llama3 took 5.633556127548218 seconds to execute.
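Both `ingest_load` and `groq_ingest_load` reload the PDFs and rebuild the index on every call, so the timings above include document loading and embedding, not only LLM latency. The timing logic is also duplicated for each backend. The sketch below pulls it into one reusable helper and uses `time.perf_counter`, which is intended for measuring elapsed intervals. `time_call` is a hypothetical name, and the workload is a stand-in so the block runs without Ollama or Groq.

```python
import time


def time_call(label, func, *args, **kwargs):
    """Run func once, print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"The function {label} took {elapsed:.2f} seconds to execute.")
    return result, elapsed


# Stand-in workload; in the notebook this would be ingest_load or groq_ingest_load.
result, seconds = time_call("demo", sum, range(1_000_000))
```

With the notebook's functions, the calls would look like `time_call("llama3", ingest_load, question)` and `time_call("groq llama3", groq_ingest_load, question)`. A single run per backend is a rough indication only; repeating each call a few times would give a steadier comparison.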
