# Basics

In [37]:
import os

import pinecone

from playground_secret_key import SECRET_KEY
from langchain.schema import (SystemMessage, HumanMessage, AIMessage)
from langchain.chat_models import ChatOpenAI

os.environ['OPENAI_API_KEY'] = SECRET_KEY
chat = ChatOpenAI(
    openai_api_key = os.environ['OPENAI_API_KEY'],
    model = 'gpt-3.5-turbo'
)

messages = [
    SystemMessage(content='You are a tutor that helps highschool students.'),
    HumanMessage(content='Hi tutor, how are you today?'),
    AIMessage(content='I am great, thank you, how can I help you today?.'),
    HumanMessage(content='I would like you to explain to me second order derivatives')
]

# TODO : to have chat history you append both the AI response and the new prompt to the messages list

res = chat.invoke(messages)




# Chat history

In [38]:
messages.append(res)
prompt = HumanMessage(content='How does is this used in finding maxima and minima of a function')
messages.append(prompt)
res = chat.invoke(messages)
print(res.content)


Second derivatives are crucial in determining whether a critical point is a maximum, minimum, or neither. Here's how it works:

1. Find the critical points of the function by setting the first derivative equal to zero and solving for x.

2. Once you have the critical points, you can use the second derivative test to determine whether each critical point is a maximum, minimum, or neither.

- If the second derivative at the critical point is positive, then the point is a local minimum.
- If the second derivative at the critical point is negative, then the point is a local maximum.
- If the second derivative is zero or does not exist, the test is inconclusive.

By using the second derivative test, you can determine whether a critical point is a maximum, minimum, or neither without having to graph the function. This method is especially useful when dealing with functions that are more complex and difficult to graph.

If you have a specific function you'd like to work through or any other q

# Loading data

In [39]:
import glob
from langchain_community.document_loaders import DirectoryLoader
from pathlib import Path
from pathlib import Path
import glob

target_dir = '/Users/lorenzodeappolonia/Desktop/supervised_learning/to_do'
documents = DirectoryLoader(path=target_dir, glob='01_*.pdf', recursive=True, show_progress=True).load_and_split()


100%|██████████| 1/1 [00:00<00:00,  1.01it/s]


In [40]:
from pinecone import Pinecone 
from playground_secret_key import PINECONE_KEY

os.environ['PINECONE_API_KEY'] = PINECONE_KEY
environment = os.environ.get('PINECONE_ENVIRONMENT')

pc = Pinecone()



In [41]:
print(pc.list_indexes())

{'indexes': [{'dimension': 1536,
              'host': 'rag-xezwua8.svc.gcp-starter.pinecone.io',
              'metric': 'cosine',
              'name': 'rag',
              'spec': {'pod': {'environment': 'gcp-starter',
                               'pod_type': 'starter',
                               'pods': 1,
                               'replicas': 1,
                               'shards': 1}},
              'status': {'ready': True, 'state': 'Ready'}}]}


In [42]:
# from pinecone import ServerlessSpec, PodSpec 
# import time
# index_name = "llama-2-rag"
# 
# if index_name not in pinecone.list_indexes().names():
#     pinecone.create_index(
#         index_name,
#         dimension=1536,
#         metric='cosine',
#         spec=PodSpec(environment="us-west1-gcp", pod_type="p1.x1")
# 
#     )
# 
#     while not pinecone.describe_index(index_name).status['ready']:
#         time.sleep(1)
#         
# index = pinecone.index(index_name)

In [43]:
index = pc.Index(name='rag')
print(index.describe_index_stats())

{'dimension': 1536,
 'index_fullness': 0.00027,
 'namespaces': {'': {'vector_count': 27}},
 'total_vector_count': 27}


In [44]:
from langchain_openai.embeddings import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model='text-embedding-ada-002')

In [45]:
texts = ['this is the first chunk of text',
         'then here is another chunk of text']

res = embed_model.embed_documents(texts)
print(len(res), len(res[0]))
print(documents[0].page_content)

2 1536
Introduction to Supervised Learning

Michela Papandrea michela.papandrea@supsi.ch

Supervised Learning Bachelor of Data Science and Artiﬁcial Intelligence University of Applied Sciences and Arts of Southern Switzerland

Michela Papandrea (SUPSI)

Introduction to Supervised Learning

1 / 25

Overview

1

Introduction

2 Data Representation

3 Classiﬁcation vs Regression

Deﬁnition of Classiﬁcation Deﬁnition of Regression

4 Generalization, Overﬁtting and Underﬁtting

Michela Papandrea (SUPSI)

Introduction to Supervised Learning

2 / 25

Machine Learning

extracting knowledge from data.

intersection of statistics, artiﬁcial intelligence, and computer science (aka predictive analytics or statistical learning )

ML applications is ubiquitous many modern websites and devices have machine learning algorithms at their core

Example

automatic recommendations of which movies to watch, what food to order or which products to buy,

personalized online music streaming

recognizing friend

In [46]:
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=500,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)




In [47]:
batch_size = 500
i=0
vectors = []
j=0

for document in documents:
    i=0
    while i <= len(document.page_content):
        print(i)
        batch = document.page_content[i:batch_size]
        i += batch_size
        embeds = embed_model.embed_query(batch)
        vectors.append({'id' : f'{j}_{i}', 'values' : embeds})
    j+=1
        
print(vectors)    
index.upsert(vectors = vectors)

    



0
500
1000
1500
2000
2500
3000
3500
0
500
1000
1500
2000
2500
3000
3500
0
500
1000
1500
2000
2500
3000
3500
0
500
1000
[{'id': '0_500', 'values': [0.01558062600115537, 0.0076565736756312435, 0.026841474088281095, -0.041378932144134815, -0.026680986712080842, 0.018990977623136354, -0.00363436497962474, 0.024300427379960018, -0.012170274379174386, -0.04132543697294981, 0.020408614560700894, 0.02419343517494477, -0.009676036876962807, -0.007081494997456724, 0.023671851667955258, -0.0014836701559113594, 0.014417094852010079, 0.012083343794676134, 0.009040775199882537, -0.011989726313779757, -0.03292660941522305, 0.021237797164649425, -0.00338861897585642, -0.01802805522858007, -0.018335656188184325, -0.009228010161675293, 0.015754488101474493, -0.0349059531008111, -0.0015421810814715956, -0.01590159982223326, -0.00773681736373137, -0.01491192984208447, -0.01828215915435408, -0.01372165017602406, -0.010806133544117469, -0.04151267379738781, 0.015941723063267247, -0.015045668701369605, 0.015

{'upserted_count': 27}

In [48]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.00027,
 'namespaces': {'': {'vector_count': 27}},
 'total_vector_count': 27}

In [49]:
from langchain_pinecone import Pinecone


vectorstore = Pinecone(index, embed_model)

query = 'give me an example of a Machine Learning application'

res = vectorstore.similarity_search(query=query, k=2)
for el in res:
    print(el.page_content)

PineconeApiAttributeError: ScoredVector has no attribute 'metadata' at ['['received_data', 'matches', 0]']['metadata']

In [50]:
def augmented_prompt(query: str):
    results = vectorstore.similarity_search(query, k=1)
    source_knowledge = '\n'.join([x.page_content for x in results])
    augmented_prompt = f"""Using the context below, answer the query. 
    
    Contexts: 
    {source_knowledge} 
    
    Query: 
    {query}"""
    return augmented_prompt

In [51]:
print(augmented_prompt(query))

PineconeApiAttributeError: ScoredVector has no attribute 'metadata' at ['['received_data', 'matches', 0]']['metadata']

In [52]:
prompt = HumanMessage(
    content=augmented_prompt('According to Michela Papandrea, what are the main steps of M.L. analysis')
)

messages.append(prompt)
res = chat(messages)

print(res.content)

PineconeApiAttributeError: ScoredVector has no attribute 'metadata' at ['['received_data', 'matches', 0]']['metadata']

In [61]:
content = "Given the following chunk of text: Introduction to Supervised Learning Michela Papandrea michela.papandrea@supsi.ch Supervised Learning Bachelor of Data Science and Artificial Intelligence University of Applied Sciences and Arts of Southern Switzerland Michela Papandrea (SUPSI) Introduction to Supervised Learning 1 / 25 Overview 1 Introduction 2 Data Representation 3 Classification vs Regression Definition of Classification Definition of Regression 4 Generalization, Overfitting and Underfitting Michela Papandrea (SUPSI) Introduction to Supervised Learning 2 / 25 Machine Learning extracting knowledge from data. intersection of statistics, artificial intelligence, and computer science (aka predictive analytics or statistical learning) ML applications is ubiquitous many modern websites and devices have machine learning algorithms at their core Example automatic recommendations of which movies to watch, what food to order or which products to buy, personalized online music streaming recognizing friends faces on your photos, inferring age and gender Michela Papandrea (SUPSI) Introduction to Supervised Learning 3 / 25 Why Machine Learning? Past intelligent applications involved handcoded rules: if-then-else decisions to process data Example: spam filter with blacklist of words Present major disadvantages of manually crafted decision rules 1 logic required to make a decision is specific to a single domain and task. = might require a rewrite of the whole ) every slight change in the task system 2 human expert deep understanding of how a decision should be made is necessary Example: faces detection in images Michela Papandrea (SUPSI) Introduction to Supervised Learning 4 / 25 Supervised Learning FILI automate decision-making processes by generalizing from known examples (dataset) data is provided as pairs of inputs and desired outputs (model) the algorithm finds a way to produce the desired output given an input (generalization) the algorithm generate autonomously an output for an input it has never seen before Michela Papandrea (SUPSI) Introduction to Supervised Learning 5 / 25 Supervised Learning: the meaning Supervised Learning Supervised Learning is a subbranch of Machine Learning algorithms that learn from examples (<input, desired output> pairs) ”Supervised” refers to: having a teacher which supervise the whole process supervision is provided in the form of desired outputs for each training example require a laborious manual process of inputs and outputs dataset creation prediction performances are quantitatively measurable training: the algorithm will search for patterns in the data that correlate the input with the desired outputs prediction: the algorithm takes new unseen inputs and determine the output (<new input, predicted output>) based on prior training data objective of a SL model: predict the correct label for newly presented input data Michela Papandrea (SUPSI) Introduction to Supervised Learning 6 / 25 Examples of supervised machine learning tasks Example Identifying the zip code from handwritten digits on an envelope input: a scan of the handwriting, output: actual digits in the zip code. To create a dataset for building a machine learning model, you need to collect many envelopes. Then you can read the zip codes yourself and store the digits as your desired outcomes. Example Determining whether a tumor is benign based on a medical image input: the image output: whether the tumor is benign (Y/N) To create a dataset for building a model, you need a database of medical images. You also need an expert opinion, so a doctor needs to look at all of the images and decide which tumors are benign and which are not. It might even be necessary to do additional diagnosis beyond the content of the image to determine whether the tumor in the image is cancerous or not. Michela Papandrea (SUPSI) Introduction to Supervised Learning 7 / 25 Give me the metadata for this chunk of text in the following format: format:{start_page: , finish_page: , title: }"
prompt = HumanMessage(content=content)
mess = [prompt]
print(chat(mess))

content='format:{start_page: 1, finish_page: 7, title: Introduction to Supervised Learning}'
