# LangChain

Learn the main components through small projects.

### Model Components

In [None]:
# Models: closed source / open source
# topics:
# - Text generation
# - Temperature, top_p, top_k
# - invoke and stream

# Embedding models: create embeddings from text
# tasks:
# - Create embeddings from text documents
# - Store embeddings in vector databases
# - Similarity search and calculate cosine similarity
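
The temperature, top_p, and top_k settings listed under text generation all reshape the probability distribution a model samples its next token from. The following is a minimal NumPy sketch of that idea, not how any particular provider implements it; `sample_next_token` and the example logits are hypothetical and are not part of LangChain.

```python
import numpy as np


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample one token index from raw logits (illustrative helper, not a LangChain API)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)

    # Temperature rescales the logits: below 1 sharpens the distribution, above 1 flattens it.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Rank tokens from most to least likely; `keep` is indexed by rank.
    order = np.argsort(probs)[::-1]
    keep = np.ones(len(probs), dtype=bool)

    # top_k keeps only the k most likely tokens.
    if top_k is not None:
        keep[top_k:] = False

    # top_p keeps the smallest set of top tokens whose cumulative probability reaches p.
    if top_p is not None:
        cumulative = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cumulative, top_p)) + 1
        keep[cutoff:] = False

    # Renormalize over the surviving tokens and draw one of them.
    allowed = order[keep]
    filtered = probs[allowed] / probs[allowed].sum()
    return int(rng.choice(allowed, p=filtered))


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
rng = np.random.default_rng(0)
print(sample_next_token(logits, temperature=0.7, top_k=2, rng=rng))
```

With `top_k=1` or a temperature close to zero the sketch always returns the most likely token, which is why low values give more deterministic output.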

In [None]:
# Embedding task

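# pull the embedding model first: ollama pull embeddinggemma
# start the server with "ollama serve" and confirm the model is listed with "ollama list"
# note: recent LangChain releases provide this class as langchain_ollama.OllamaEmbeddings
from langchain.embeddings.ollama import OllamaEmbeddings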

data = """ 
What is Machine Learning?
Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns within datasets. 
It allows them to predict new, similar data without explicit programming for each task. 
Machine learning finds applications in diverse fields such as image and speech recognition, 
natural language processing, recommendation systems, fraud detection, portfolio optimization, and automating tasks.

Types of Machine Learning
Machine learning algorithms can be broadly categorized into three main types based on their learning approach and the nature of the data they work with.

Supervised Learning
Involves training models using labeled datasets. Both input and output variables are provided during training.
The aim is to establish a mapping function that predicts outcomes for new, unseen data.
Common applications include classification, regression, and forecasting.

Unsupervised Learning
Works with unlabeled data where outputs are not known in advance.
The model identifies hidden structures, relationships, or groupings in the data.
Useful for clustering, dimensionality reduction, and anomaly detection.
Focuses on discovering inherent patterns within datasets.

Reinforcement Learning
Based on decision-making through interaction with an environment.
An agent performs actions and receives rewards or penalties as feedback.
The goal is to learn an optimal strategy that maximizes long-term rewards.
Widely applied in robotics, autonomous systems, and strategic game playing.
"""

embeddings = OllamaEmbeddings(model="embeddinggemma")


# create chunked documents
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.create_documents([data])

print(f"Created {len(docs)} documents")
print(f"first document: {docs[0]}")

# create embeddings
doc_embeddings = embeddings.embed_documents([doc.page_content for doc in docs])
print(f"\nCreated {len(doc_embeddings)} embeddings")

# calculate cosine similarity between embeddings (numpy)
import numpy as np


def cosine_similarity(vec1, vec2):
    # dot product of the vectors divided by the product of their norms
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

similarity = cosine_similarity(doc_embeddings[0], doc_embeddings[1])
print(f"\nCosine similarity between first two embeddings: {similarity}")

Created 4 documents
first document: page_content='What is Machine Learning?
Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns within datasets. 
It allows them to predict new, similar data without explicit programming for each task. 
Machine learning finds applications in diverse fields such as image and speech recognition, 
natural language processing, recommendation systems, fraud detection, portfolio optimization, and automating tasks.'

Created 4 embeddings

Cosine similarity between first two embeddings: 0.679702416172827
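
The task list also mentions storing embeddings and running a similarity search, which the cell above does not cover. A rough sketch of both steps follows, using a tiny in-memory store in place of a real vector database. `TinyVectorStore` is a hypothetical helper, and the three-dimensional vectors are made up so the example runs without Ollama.

```python
import numpy as np


class TinyVectorStore:
    """Toy in-memory vector store (hypothetical helper, not a LangChain class)."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text, vector):
        # Normalize once at insert time so a dot product equals cosine similarity.
        vec = np.asarray(vector, dtype=float)
        self.texts.append(text)
        self.vectors.append(vec / np.linalg.norm(vec))

    def search(self, query_vector, k=2):
        # Score every stored vector against the query and return the k best matches.
        query = np.asarray(query_vector, dtype=float)
        query = query / np.linalg.norm(query)
        scores = np.stack(self.vectors) @ query
        top = np.argsort(scores)[::-1][:k]
        return [(self.texts[i], float(scores[i])) for i in top]


store = TinyVectorStore()
store.add("supervised learning", [0.9, 0.1, 0.0])      # made-up vectors
store.add("unsupervised learning", [0.1, 0.9, 0.0])
store.add("reinforcement learning", [0.0, 0.1, 0.9])

results = store.search([1.0, 0.2, 0.0], k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

In the notebook itself, the stored vectors would be `doc_embeddings` paired with each `doc.page_content`, and the query vector would come from `embeddings.embed_query(...)`.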
