In [2]:
import dspy
from dspy.experimental import Synthesizer
from dotenv import load_dotenv
import os

load_dotenv(".env")

GEN_LM = dspy.OpenAI(model="gpt-4", api_key=os.environ["OPENAI_API_KEY"])
dspy.settings.configure(lm=GEN_LM)

In [5]:
# Build the examples
examples = [
dspy.Example(content = "CPU task scheduling is a crucial aspect of operating system functionality, responsible for allocating CPU resources efficiently among competing processes. Task scheduling aims to optimize system performance, ensuring fairness, responsiveness, and maximizing throughput. Various algorithms, such as First Come First Serve (FCFS), Round Robin, Shortest Job Next (SJN), and Priority Scheduling, are employed to achieve these objectives. Each algorithm employs distinct criteria for task prioritization and execution, catering to diverse system requirements. Effective CPU task scheduling enhances system responsiveness, resource utilization, and overall performance, playing a pivotal role in modern computing environments").with_inputs("content"),
dspy.Example(content = "Multiprocessing involves the simultaneous execution of multiple processes or tasks on a computer system with multiple CPUs or CPU cores. By leveraging parallelism, multiprocessing enhances system performance, allowing tasks to be executed concurrently, thus reducing overall execution time. It enables efficient utilization of available resources, distributing computational loads across multiple processors. Multiprocessing can be implemented at various levels, ranging from shared-memory multiprocessing (SMP) systems with multiple CPU cores on a single chip to distributed multiprocessing across multiple computers connected via a network. This approach facilitates scalability and improves system throughput, making it well-suited for demanding computational tasks such as scientific simulations, data processing, and multimedia rendering. Overall, multiprocessing is a fundamental paradigm in modern computing that contributes to enhanced efficiency and performance").with_inputs("content")]

In [6]:
synthesizer = Synthesizer()

data = synthesizer.generate(
    examples=examples,
    task_description="Generate short paragraphs on various fields and components of computer science like computer architecture, database, systems, cores, programming etc.",
    num_data=100,
)

Preparing Input Fields: 100%|██████████| 1/1 [00:01<00:00,  1.62s/it]
Preparing Output Fields: 0it [00:00, ?it/s]
Generating Synthetic Data: 100%|██████████| 100/100 [33:02<00:00, 19.82s/it]


In [7]:
print(data)


[Example({'content': "managing and organizing vast amounts of data efficiently. Cores, in the context of computer science, refer to the individual processing units within a computer's central processing unit (CPU). Each core can execute instructions independently of the others, allowing for more efficient processing and multitasking. Programming, on the other hand, is the process of creating a set of instructions that tell a computer"}) (input_keys={'content'}), Example({'content': '1. Computer Architecture: Computer architecture refers to the design and organization of computer systems. It involves understanding the functionalities, system design, and operational structure of computers. The primary goal of computer architecture is to optimize performance, efficiency, and cost.\n\n2. Database Systems: Database systems are computer programs used to manage and manipulate structured data. They provide an interface for users to interact with data, facilitate data processing, and ensure dat
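
The printed sample suggests that some generated paragraphs carry list prefixes such as `2. Database Systems:` and that near-identical items may occur. Below is a minimal sketch of a cleanup pass that could run before indexing, assuming each item exposes a `content` attribute as in the output above. `clean_paragraphs`, `LIST_PREFIX`, and the `min_chars` threshold are hypothetical names chosen for illustration and are not part of DSPy.

```python
import re
from types import SimpleNamespace

# Hypothetical helper, not part of DSPy: strips a leading "N. " list marker.
LIST_PREFIX = re.compile(r"^\s*\d+\.\s+")

def clean_paragraphs(examples, min_chars=40):
    """Return de-duplicated paragraph strings, dropping very short ones."""
    seen, cleaned = set(), []
    for ex in examples:
        text = LIST_PREFIX.sub("", ex.content.strip())
        if len(text) < min_chars or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned

# Stand-in objects with a `content` attribute, mimicking the shape of `data`.
sample = [
    SimpleNamespace(content="2. Database Systems: Database systems manage, store, and retrieve information."),
    SimpleNamespace(content="2. Database Systems: Database systems manage, store, and retrieve information."),
    SimpleNamespace(content="too short"),
]
print(clean_paragraphs(sample))
```

On the real run this would be called as `clean_paragraphs(data)`. The sketch only strips a marker at the very start of an item; it does not split items that contain several numbered paragraphs.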

In [26]:
import weaviate
import weaviate.classes.config as wc

In [27]:
# Forward the OpenAI key so Weaviate's OpenAI vectorizer and generative modules can use it
headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")}
client = weaviate.connect_to_local(headers=headers)


In [11]:
client.collections.create(
    name="Paragraphs",
    properties=[
        wc.Property(name="content", data_type=wc.DataType.TEXT),
    ],

    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),

    generative_config=wc.Configure.Generative.openai()
)

<weaviate.collections.collection.Collection at 0x7f56fa05b640>
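
Re-running the creation cell against an instance that already holds a `Paragraphs` collection is likely to fail. A minimal sketch of an idempotent variant follows, assuming the v4 client's `collections.exists`, `collections.get`, and `collections.create` methods; `get_or_create_collection` is a hypothetical helper name.

```python
def get_or_create_collection(client, name, **create_kwargs):
    """Return the named collection, creating it only if it does not exist yet."""
    # Assumption: `client` is a v4 Weaviate client exposing `client.collections`.
    if client.collections.exists(name):
        return client.collections.get(name)
    return client.collections.create(name=name, **create_kwargs)
```

With the client above, the same `properties`, `vectorizer_config`, and `generative_config` arguments would be passed through as keyword arguments, for example `get_or_create_collection(client, "Paragraphs", properties=[...])`.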

In [29]:
print(data[72].answer)

There are many general Python packages that users can utilize when working. Some popular ones include NumPy for scientific computing and data analysis, Pandas for data manipulation and analysis, and Matplotlib for data visualization.


In [28]:
overview_collection = client.collections.get("Paragraphs")
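# Throttle the import to 2400 requests per minute
with overview_collection.batch.rate_limit(2400) as batch:
    for example in data:
        # Each generated example becomes one object with a single "content" property
        batch.add_object(properties={"content": example.content})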
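
Batch imports can drop individual objects without raising, so it may be worth checking for failures after the loop. The sketch below wraps the same insert in a function and returns whatever the client recorded as failed. It assumes the v4 client exposes `batch.failed_objects` on the collection after the batch context closes; `insert_paragraphs` is a hypothetical helper name.

```python
def insert_paragraphs(collection, examples, requests_per_minute=2400):
    """Batch-insert each example's `content` and return the failed objects, if any."""
    with collection.batch.rate_limit(requests_per_minute) as batch:
        for example in examples:
            batch.add_object(properties={"content": example.content})
    # Assumption: the v4 client collects per-object errors here once the batch closes.
    return list(collection.batch.failed_objects)
```

Called as `failed = insert_paragraphs(overview_collection, data)`, an empty list would indicate that every paragraph was accepted.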

In [29]:
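from dspy.retrieve.weaviate_rm import WeaviateRM

# WeaviateRM is given a v3-style client here, which rebinds `client`
# and triggers the upgrade notice printed below.
client = weaviate.Client("http://localhost:8080", additional_headers=headers)
RETRIVER_MODEL = WeaviateRM("Paragraphs", weaviate_client=client)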


Consider upgrading to the new and improved v4 client instead!
See here for usage: https://weaviate.io/developers/weaviate/client-libraries/python


In [30]:
dspy.settings.configure(lm=GEN_LM, rm=RETRIVER_MODEL)

In [32]:
dspy.Retrieve(k=3)("what is a database").passages

['Database systems are the backbone of any information system. They provide a structured way to store, retrieve, and manage data. A database system is a software application that interacts with the user, other applications, and the database itself to capture and analyze data. The core components of a database system include the database management system (DBMS), the data, and the database schema. The',
 '2. Database Systems: Database systems are a fundamental component of computer science, designed to manage, store, and retrieve information. They are structured to provide a convenient and efficient way to store large amounts of data, and they can be managed using database management systems (DBMS). DBMSs provide an interface for interacting with the database, ensuring data consistency, security, and integrity',
 '(Database Systems) Database systems are a structured set of data. So, they handle the storage, retrieval, and updating of data in a computer system. A well-known example is th