In [1]:
%pip install transformers torch einops scikit-learn

Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel



In [3]:
docs = ["""
The History and Impact of Artificial Neural Networks

Artificial Neural Networks (ANNs) represent a fundamental shift in how we approach computation and artificial intelligence. Inspired by biological neural networks, these systems have evolved from simple perceptrons in the 1950s to today's sophisticated deep learning architectures.

Early Development (1940s-1950s):
The first artificial neuron was proposed by Warren McCulloch and Walter Pitts in 1943. Their mathematical model showed how neurons might work, demonstrating that simple neural networks could compute basic logical functions. In 1957, Frank Rosenblatt developed the perceptron, the first algorithm that could learn specific patterns through iterative training.

The AI Winter (1970s):
Despite early promise, neural network research faced significant setbacks in the 1970s. Marvin Minsky and Seymour Papert's 1969 book "Perceptrons" highlighted fundamental limitations of single-layer networks, particularly their inability to solve the XOR problem. This led to reduced funding and interest in neural network research, a period known as the "AI Winter."

Renaissance (1980s-1990s):
The field experienced a revival with several breakthrough developments:
1. The backpropagation algorithm became widely recognized as a solution for training multi-layer networks
2. Improvements in computer processing power made larger networks feasible
3. New architectures like Convolutional Neural Networks (CNNs) emerged
4. Successful applications in pattern recognition and speech processing demonstrated practical value

Modern Era (2000s-Present):
The explosion of big data and computational power has led to remarkable achievements:
- Deep learning models have surpassed human performance in various tasks
- Applications range from computer vision to natural language processing
- Transfer learning has enabled more efficient model training
- Architectures like transformers have revolutionized language models

Technical Foundations:

Neural networks consist of interconnected layers of nodes, each performing weighted calculations:
1. Input Layer: Receives raw data
2. Hidden Layers: Process information through weighted connections
3. Output Layer: Produces final results

Key concepts include:
- Activation functions (ReLU, sigmoid, tanh)
- Weight initialization and adjustment
- Loss functions and optimization algorithms
- Regularization techniques

Practical Applications:

Modern neural networks have found applications across numerous fields:
* Healthcare: Disease diagnosis, drug discovery, medical image analysis
* Finance: Risk assessment, fraud detection, algorithmic trading
* Transportation: Autonomous vehicles, traffic prediction, route optimization
* Entertainment: Content recommendations, game AI, art generation

Challenges and Future Directions:

Despite their success, neural networks face several ongoing challenges:
1. Interpretability and explainability of decisions
2. Energy consumption and computational requirements
3. Data privacy and ethical considerations
4. Robustness against adversarial attacks

Research continues in areas such as:
- More efficient architectures
- Unsupervised learning approaches
- Neuromorphic computing
- Integration with symbolic AI systems

The field of neural networks continues to evolve rapidly, with new architectures and applications emerging regularly. As our understanding of both biological and artificial neural networks deepens, we can expect further innovations in this transformative technology.  
    """]

In [4]:
device = "cuda" if torch.cuda.is_available() else "cpu"


def get_embedding(docs, task="retrieval.passage", device=device):
    """Return token embeddings, character offsets, and attention mask for docs[0]."""
    # The tokenizer and model are reloaded on every call; cache them if calling repeatedly.
    tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3", use_fast=True)

    tokens = tokenizer(
        docs,
        return_offsets_mapping=True,
        return_attention_mask=True,
        add_special_tokens=False,
        padding=True,
        truncation=True,
        return_tensors="pt",
    ).to(device)

    model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True).to(device)
    model.eval()

    # Select the task-specific adapter, one id per input document.
    # The mask must live on the same device as the model inputs.
    task_id = model._adaptation_map[task]
    adapter_mask = torch.full((len(docs),), task_id, dtype=torch.int32, device=device)

    with torch.no_grad():
        outputs = model(
            tokens.input_ids,
            attention_mask=tokens.attention_mask,
            adapter_mask=adapter_mask,
            return_dict=True,
        )

    # Only the first document in the batch is returned.
    token_embeddings = outputs.last_hidden_state[0]
    offsets_mapping = tokens.offset_mapping[0]
    attention_mask = tokens.attention_mask[0]

    return token_embeddings, offsets_mapping, attention_mask

token_embeddings, offsets_mapping, attention_mask = get_embedding(docs)
token_embeddings.shape, offsets_mapping.shape, attention_mask.shape

(torch.Size([740, 1024]), torch.Size([740, 2]), torch.Size([740]))

In [5]:
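def optimal_segmentation(values, min_chunk_size, max_chunk_size):
    """Split the rows of `values` into contiguous chunks via dynamic programming.

    A chunk's reward is the sum of mean-centered pairwise dot products inside it.
    Returns a list of (start, end) row indices, both inclusive.
    """
    n = len(values)
    similarity_matrix = np.dot(values, values.T)
    # Center on the mean off-diagonal similarity so below-average pairs
    # contribute negative reward.
    mean_similarity = np.mean(similarity_matrix[np.triu_indices(n, k=1)])
    similarity_matrix = similarity_matrix - mean_similarity
    np.fill_diagonal(similarity_matrix, 0)

    dp = np.zeros(n)  # dp[i]: best total reward for segmenting rows 0..i
    segmentation = np.zeros(n, dtype=int)  # start of the best chunk ending at row i

    for i in range(n):
        max_reward = float('-inf')
        best_start = i

        # Try every allowed size for the chunk that ends at row i.
        for size in range(min_chunk_size, min(max_chunk_size + 1, i + 2)):
            start = i - size + 1
            reward = np.sum(similarity_matrix[start:i + 1, start:i + 1])
            if start > 0:
                reward += dp[start - 1]
            if reward > max_reward:
                max_reward = reward
                best_start = start

        dp[i] = max_reward
        segmentation[i] = best_start

    # Walk back from the last row to recover the chunk boundaries.
    boundaries = []
    i = n - 1
    while i >= 0:
        boundaries.append((segmentation[i], i))
        i = segmentation[i] - 1

    boundaries.reverse()
    return boundaries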


min_chunk_size = 100
max_chunk_size = 200

embeddings = token_embeddings.cpu().numpy()  # .cpu() is required when the model ran on CUDA

boundaries = optimal_segmentation(embeddings, min_chunk_size, max_chunk_size)
boundaries

[(np.int64(0), np.int64(184)),
 (np.int64(185), np.int64(284)),
 (np.int64(285), np.int64(430)),
 (np.int64(431), np.int64(541)),
 (np.int64(542), 739)]
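
The reward in `optimal_segmentation` sums a square block of the centered similarity matrix for every candidate chunk, which costs time proportional to the chunk size squared. One possible speed-up, sketched below, is a 2D prefix-sum table that answers each block sum in constant time. The helper names `build_block_sum_table` and `block_sum` and the toy data are illustrative assumptions, not part of the notebook above.

```python
import numpy as np


def build_block_sum_table(matrix):
    """Return a table where table[i, j] == matrix[:i, :j].sum()."""
    rows, cols = matrix.shape
    table = np.zeros((rows + 1, cols + 1), dtype=np.float64)
    table[1:, 1:] = matrix.cumsum(axis=0).cumsum(axis=1)
    return table


def block_sum(table, start, end):
    """Sum of matrix[start:end + 1, start:end + 1], with `end` inclusive."""
    e = end + 1
    return table[e, e] - table[start, e] - table[e, start] + table[start, start]


# Toy check on a small symmetric matrix (random data, for illustration only).
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 4))
toy_similarity = toy @ toy.T

table = build_block_sum_table(toy_similarity)
direct = toy_similarity[2:5, 2:5].sum()  # what np.sum computes per candidate chunk
fast = block_sum(table, 2, 4)            # same block via four table lookups
print(np.isclose(direct, fast))
```

With such a table, the inner loop could call `block_sum(table, start, i)` in place of `np.sum(...)` without changing the boundaries it selects, up to floating-point rounding.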

In [6]:
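def mean_pooling(model_output, attention_mask):
    # model_output: (batch, tokens, dim) token embeddings
    # attention_mask: (batch, tokens), 1 for real tokens and 0 for padding
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(model_output.size()).float()
    )
    # Average only over unmasked tokens; the clamp avoids division by zero.
    return torch.sum(model_output * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )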
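
To see what the masked mean does, here is a minimal NumPy sketch of the same idea on a toy batch. The function name `masked_mean` and the toy values are assumptions made for illustration; the notebook itself uses the PyTorch `mean_pooling` above.

```python
import numpy as np


def masked_mean(token_embeddings, attention_mask):
    """Average token vectors over positions where attention_mask == 1."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, tokens, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts


# One sequence of three tokens; the third token is padding and must be ignored.
toy_embeddings = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
toy_mask = np.array([[1, 1, 0]])

pooled = masked_mean(toy_embeddings, toy_mask)
print(pooled)  # [[2. 3.]]
```

The padded vector `[100, 100]` has no effect on the result, which is the point of multiplying by the mask before summing.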

In [7]:
chunks = []
for start_idx, end_idx in boundaries:
    text_start = offsets_mapping[start_idx][0].item()
    text_end = offsets_mapping[end_idx][1].item()
    chunk_text = docs[0][text_start:text_end]
    
    model_output = token_embeddings[start_idx:end_idx + 1].unsqueeze(0)
    chunk_attention_mask = attention_mask[start_idx:end_idx + 1].unsqueeze(0)

    chunk_embedding = mean_pooling(model_output, chunk_attention_mask)
    chunk_embedding = F.normalize(chunk_embedding, p=2, dim=1)[0]
    
    chunk = {
        "content": chunk_text,
        "embedding": chunk_embedding,
        "text_start": text_start, 
        "text_end": text_end
    }
    chunks.append(chunk)

chunks

[{'content': "The History and Impact of Artificial Neural Networks\n\nArtificial Neural Networks (ANNs) represent a fundamental shift in how we approach computation and artificial intelligence. Inspired by biological neural networks, these systems have evolved from simple perceptrons in the 1950s to today's sophisticated deep learning architectures.\n\nEarly Development (1940s-1950s):\nThe first artificial neuron was proposed by Warren McCulloch and Walter Pitts in 1943. Their mathematical model showed how neurons might work, demonstrating that simple neural networks could compute basic logical functions. In 1957, Frank Rosenblatt developed the perceptron, the first algorithm that could learn specific patterns through iterative training.\n\nThe AI Winter (1970s):\nDespite early promise, neural network research faced significant setbacks in the 1970s.",
  'embedding': tensor([ 0.1460, -0.1540, -0.0707,  ..., -0.0021,  0.0234, -0.0228]),
  'text_start': 1,
  'text_end': 841},
 {'content'

In [8]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=64, random_state=411)
kmeans.fit(token_embeddings.cpu().numpy())

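# Normalize the token embeddings and cluster centers.
# The centers are moved to the same device and dtype as the token embeddings
# so that torch.mm also works when the model ran on CUDA.
normalized_token_embeddings = F.normalize(token_embeddings, p=2, dim=1)
centers = torch.tensor(
    kmeans.cluster_centers_, dtype=token_embeddings.dtype, device=token_embeddings.device
)
concepts = F.normalize(centers, p=2, dim=1)

# Cosine similarity between every token and every concept: (tokens, concepts)
concepts_target = torch.mm(normalized_token_embeddings, concepts.T)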

concepts.shape, concepts_target.shape

(torch.Size([64, 1024]), torch.Size([740, 64]))

In [9]:
queries  = [
    "What were the major developments in neural networks during the 1980s and 1990s?",
    "Explain the basic components of a neural network's architecture.",
    "What are the current applications of neural networks in healthcare?",
    "What caused the AI Winter in the 1970s?",
    "What are the main challenges facing neural networks today?",
]

In [15]:
query_token_embeddings, _, query_attention_mask = get_embedding(queries[1:2])
query_embedding = mean_pooling(query_token_embeddings.unsqueeze(0), query_attention_mask.unsqueeze(0))
query_embedding.shape

torch.Size([1, 1024])

In [16]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute cosine similarity between the query embedding and each chunk embedding
chunk_matrix = np.stack([chunk['embedding'].cpu().numpy() for chunk in chunks])
similarities = cosine_similarity(query_embedding.cpu().numpy(), chunk_matrix)

for i, similarity in enumerate(similarities[0]):
    print(f"chunk: {i}")
    print(f"Similarity: {similarity}")
    print()

# Get the most similar chunk
print("most similar chunk:")
most_similar_idx = int(np.argmax(similarities[0]))
most_similar_chunk = chunks[most_similar_idx]

most_similar_chunk

chunk: 0
Similarity: 0.27758681774139404

chunk: 1
Similarity: 0.27326375246047974

chunk: 2
Similarity: 0.2968369722366333

chunk: 3
Similarity: 0.2944921851158142

chunk: 4
Similarity: 0.2731066942214966

most similar chunk:


{'content': 'a solution for training multi-layer networks\n2. Improvements in computer processing power made larger networks feasible\n3. New architectures like Convolutional Neural Networks (CNNs) emerged\n4. Successful applications in pattern recognition and speech processing demonstrated practical value\n\nModern Era (2000s-Present):\nThe explosion of big data and computational power has led to remarkable achievements:\n- Deep learning models have surpassed human performance in various tasks\n- Applications range from computer vision to natural language processing\n- Transfer learning has enabled more efficient model training\n- Architectures like transformers have revolutionized language models\n\nTechnical Foundations:\n\nNeural networks consist of',
 'embedding': tensor([ 0.1434, -0.1384, -0.0763,  ...,  0.0017,  0.0181, -0.0184]),
 'text_start': 1283,
 'text_end': 2019}

In [17]:
# Similarity between the query and each concept (cluster center): (1, concepts)
concept_similarities = cosine_similarity(query_embedding.cpu().numpy(), concepts.cpu().numpy())

# Weight each token's concept similarities by the query's concept similarities.
concepts_target_np = concepts_target.cpu().numpy()
concept_similarities_np = concept_similarities.T
token_importances = np.dot(concepts_target_np, concept_similarities_np)[:, 0]
# Min-max scale the scores to [0, 1].
token_importances = (token_importances - np.min(token_importances)) / (np.max(token_importances) - np.min(token_importances))
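
The importance score is a matrix-vector product followed by min-max scaling. A small worked example, with made-up similarity values and hypothetical variable names, may make the arithmetic easier to follow. The zero-range guard is an addition for the sketch and is not in the cell above.

```python
import numpy as np

# token_to_concept[t, c]: cosine similarity between token t and concept c (toy values)
token_to_concept = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
])
# query_to_concept[c]: cosine similarity between the query and concept c (toy values)
query_to_concept = np.array([1.0, 0.0])

# A token scores highly when it is close to the concepts the query is close to.
raw = token_to_concept @ query_to_concept  # one raw score per token

# Min-max scaling to [0, 1]; fall back to zeros if all scores are equal.
span = raw.max() - raw.min()
importances = (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)
print(importances)
```

Here the query matches only the first concept, so the first token (most similar to that concept) is scaled to 1 and the second token to 0.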



In [18]:
from typing import Optional


def find_dense_subsequences(
    values: np.ndarray,
    min_size: int = 1,
    max_size: Optional[int] = None,
    num_sequences: int = 3,
    min_density: Optional[float] = None,
    min_gap: int = 0
) -> list[tuple[int, int, float]]:
    """Greedily pick up to `num_sequences` non-overlapping windows with the highest mean.

    Returns (start, end, density) tuples with `end` exclusive, sorted by density.
    Windows are kept at least `min_gap` positions apart.
    """
    n = len(values)
    if max_size is None:
        max_size = n

    results = []
    used_positions = np.zeros(n, dtype=bool)

    def is_valid_region(start: int, end: int) -> bool:
        # Reject a window if it, widened by min_gap on both sides, touches a used position.
        return not used_positions[max(0, start - min_gap):min(n, end + min_gap)].any()

    # Prefix sums: the sum of values[start:end] is cumsum[end] - cumsum[start].
    cumsum = np.concatenate(([0], np.cumsum(values)))

    while len(results) < num_sequences:
        max_density = float('-inf')
        best_start = None
        best_end = None

        for length in range(min_size, min(n + 1, max_size + 1)):
            for start in range(n - length + 1):
                end = start + length

                if not is_valid_region(start, end):
                    continue

                curr_sum = cumsum[end] - cumsum[start]
                density = curr_sum / length

                if density > max_density:
                    max_density = density
                    best_start = start
                    best_end = end

        # Stop when no window is left or the best one falls below the threshold.
        if best_start is None or (min_density is not None and max_density < min_density):
            break

        used_positions[best_start:best_end] = True
        results.append((best_start, best_end, max_density))

    results.sort(key=lambda x: x[2], reverse=True)
    return results
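
The prefix-sum step is what keeps each density lookup cheap. As a rough illustration, the sketch below finds the single densest window of one fixed length using the same `cumsum` trick. `densest_window` and the toy scores are hypothetical and cover only one window length, unlike the full function above.

```python
import numpy as np


def densest_window(values, length):
    """Return (start, end, mean) of the highest-mean window; `end` is exclusive."""
    cumsum = np.concatenate(([0.0], np.cumsum(values)))
    # window_sums[s] == values[s:s + length].sum() for every valid start s
    window_sums = cumsum[length:] - cumsum[:-length]
    start = int(np.argmax(window_sums))
    return start, start + length, float(window_sums[start] / length)


toy_scores = np.array([0.1, 0.2, 0.9, 0.8, 0.7, 0.1])
start, end, mean = densest_window(toy_scores, 3)
print(start, end, mean)
```

For these scores the best window of length 3 covers indices 2 to 4, that is `toy_scores[2:5]`, with a mean of about 0.8.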

In [19]:
concepts_results = find_dense_subsequences(token_importances, min_size=20, max_size=100, num_sequences=3, min_density=0.2, min_gap=10)
print(f"Found {len(concepts_results)} dense passages:")
for start, end, density in concepts_results:
    text_start = offsets_mapping[start][0].item()
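    # `end` is exclusive and can equal the token count, so clamp the index.
    text_end = offsets_mapping[min(end, len(offsets_mapping) - 1)][1].item()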
    dense_passage = docs[0][text_start:text_end]
    print(f"Dense Passage: {dense_passage}")
    print(f"Density: {density}")
    print()

Found 3 dense passages:
Dense Passage: er networks
2. Improvements in computer processing power made larger networks feasible
3. New a
Density: 0.8834032331194196

Dense Passage: performance in various tasks
- Applications range from computer vision to natural language processing
- Transfer learning has enabled more efficient model
Density: 0.8782475789388021

Dense Passage: consumption and computational requirements
3. Data privacy and ethical considerations
4. Robustness against adversarial attack
Density: 0.8738695227581522

