### Step 1: Indexing
- Apply a sliding-window approach to capture local context

In [9]:
import textract
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import numpy as np

vector_database_path = './data/vector_database.npy'

# Load and process the PDF document
def load_document(file_path):
    text = textract.process(file_path, method='pdfminer').decode('utf-8')
    return text

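# Segment text into overlapping windows of window_size characters, advancing by step.
# A trailing partial window (text after the last full window) is not emitted.
def segment_text(text, window_size=500, step=250):
    # max(..., 0) still yields one segment when the text is shorter than a window
    last_start = max(len(text) - window_size, 0)
    return [text[i:i+window_size] for i in range(0, last_start + 1, step)]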

# Create TF-IDF based index
def create_and_save_index(segments, filename=vector_database_path):
    vectorizer = TfidfVectorizer(stop_words='english')
    tfidf_matrix = vectorizer.fit_transform(segments)

    # Save the dense matrix; the fitted vectorizer is returned but not persisted
    np.save(filename, tfidf_matrix.toarray())
    return vectorizer, tfidf_matrix

# Load index from file
def load_index(filename=vector_database_path):
    matrix = np.load(filename)
    return matrix
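
When the text is longer than one window, `segment_text` stops at the last full window, so a few trailing characters can be left out of the index. As a minimal sketch, a variant that also keeps the tail could look like the following. `sliding_windows` is a hypothetical name, not part of the notebook, and it is run on a short string so the overlap is easy to see.

```python
def sliding_windows(text, window_size=500, step=250):
    """Return overlapping character windows that together cover all of text."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + window_size])
        # Stop once a window reaches the end of the text
        if start + window_size >= len(text):
            break
    return windows


sample = "abcdefghijk"  # 11 characters
windows = sliding_windows(sample, window_size=4, step=2)
print(windows)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ijk']
```

The last window is shorter than `window_size`, which is the price of covering the final characters.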

### Step 2: Query enhancement
- Enhance the query by rewriting or expanding it

In [10]:
# Enhance the query by expanding or rewriting it
def enhance_query(query):
    # Placeholder: returns the query unchanged.
    # A real implementation might add keywords, synonyms, or domain-specific terms.
    return query
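
One way to fill in `enhance_query` is simple synonym expansion. The sketch below assumes a small hand-written synonym map. Both `SYNONYMS` and `expand_query` are hypothetical names for illustration, and a real system might use a thesaurus, embeddings, or an LLM rewrite instead.

```python
import re

# Hypothetical, hand-written synonym map; a real one would be domain-specific
SYNONYMS = {
    "back propagation": ["backpropagation", "backprop"],
    "neural networks": ["neural nets", "deep learning"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Append synonyms for every known phrase found in the query."""
    lowered = query.lower()
    extra_terms = []
    for phrase, alternatives in synonyms.items():
        # Match whole phrases only, using word boundaries on both sides
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            extra_terms.extend(alternatives)
    if not extra_terms:
        return query
    return query + " " + " ".join(extra_terms)


expanded = expand_query("Explain the concept of back propagation in neural networks")
print(expanded)
```

Because a TF-IDF index matches on exact tokens, adding the one-word spelling lets the query reach segments that write "backpropagation" without a space.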


### Step 3: Retrieval
- Using the enhanced query, retrieve relevant document segments

In [11]:
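def retrieve_documents(query, vectorizer, tfidf_matrix, top_k=5):
    # Project the query into the same TF-IDF space as the segments
    query_vec = vectorizer.transform([query])
    # TF-IDF rows are L2-normalized by default, so the dot product equals cosine similarity
    cosine_similarities = linear_kernel(query_vec, tfidf_matrix).flatten()
    # Indices of the top_k highest scores, best match first
    top_indices = cosine_similarities.argsort()[-top_k:][::-1]
    return top_indices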
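
`TfidfVectorizer` L2-normalizes each row by default, which is why the plain dot product from `linear_kernel` can be read as cosine similarity. The standard-library sketch below spells out the same ranking on tiny hand-made vectors. `cosine` and `top_k` are hypothetical helper names, and the vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return indices of the k most similar documents, best match first."""
    scores = [cosine(query_vec, doc) for doc in doc_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Made-up term-weight vectors over a three-word vocabulary
docs = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.2, 0.0]
best = top_k(query_vec, docs, k=2)
print(best)  # [0, 1]
```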

### Step 4: Post retrieval
- Process the retrieved content to highlight the most relevant information
- Prepare the content for generation

In [12]:
def process_retrieved_data(indices, segments):
    # This could involve summarizing or selecting key segments.
    # For simplicity, concatenate the top segments in ranked order.
    return " ".join(segments[i] for i in indices)
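
Because the windows from Step 1 overlap, joining neighbouring segments repeats the shared text in the prompt. A minimal sketch of one alternative follows: it stitches the selected segments back together in document order and skips characters that were already emitted. It assumes segment `i` starts at character `i * step`, which holds for `segment_text`. `merge_segments` is a hypothetical name, and the approach trades relevance order for reading order.

```python
def merge_segments(indices, segments, step=250):
    """Join selected segments in document order without repeating overlapping text."""
    parts = []
    covered_end = 0  # absolute position just past the last character already emitted
    for i in sorted(set(int(i) for i in indices)):
        start = i * step
        end = start + len(segments[i])
        if parts and start <= covered_end:
            # Touches or overlaps the previous segment: emit only the unseen suffix
            parts[-1] += segments[i][covered_end - start:]
        else:
            parts.append(segments[i])
        covered_end = max(covered_end, end)
    return " ".join(parts)


text = "abcdefghijkl"
demo_segments = [text[i:i + 4] for i in range(0, len(text) - 4 + 1, 2)]
# demo_segments == ['abcd', 'cdef', 'efgh', 'ghij', 'ijkl']
merged = merge_segments([1, 0, 4], demo_segments, step=2)
print(merged)  # abcdef ijkl
```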

### Step 5: Generation
- Put the prompt together and send it to the model


In [13]:
from openai import OpenAI
import os

def generate_response(prompt, api_key):
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )

    generated_text = response.choices[0].message.content

    # Saving the response
    file_path = './data/advanced-rag-chat-gpt-response.txt'
    
    os.makedirs(os.path.dirname(file_path), exist_ok=True)

    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(generated_text)
    
    print('Response saved in: ')
    print(file_path)
    return generated_text
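
The prompt itself is assembled in the run cell as a question followed by the retrieved context. As a sketch, that step could be wrapped in a helper that also caps the context length so the request stays within the model's input limit. `build_prompt` and the `max_context_chars` default are assumptions for illustration, not values from the notebook or the OpenAI API.

```python
def build_prompt(query, context, max_context_chars=6000):
    """Format the question and retrieved context, truncating overly long context."""
    context = context.strip()
    if len(context) > max_context_chars:
        # Cut at the last space before the limit to avoid splitting a word
        cut = context.rfind(" ", 0, max_context_chars)
        context = context[:cut if cut > 0 else max_context_chars]
    return f"Question: {query}\nContext: {context}"


demo_prompt = build_prompt("What is 2 + 2?", "alpha beta gamma delta", max_context_chars=12)
print(demo_prompt)
```

A character budget is only a rough proxy for tokens; a tokenizer-based limit would be more precise.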


### Running the pipeline
- Load the OpenAI API key
- Load and segment the document
- Build the index, retrieve the top segments, and generate the response

In [14]:
from dotenv import load_dotenv

load_dotenv()

file_path = './data/deep-learning.pdf'
api_key = os.getenv('OPEN_AI_KEY')
query = "Explain the concept of back propagation in neural networks"

text = load_document(file_path=file_path)
segments = segment_text(text=text)
vectorizer, tfidf_matrix = create_and_save_index(segments=segments)
tfidf = load_index()  # Stand-in for a vector database lookup; the in-memory tfidf_matrix is what is used below


enhanced_query = enhance_query(query=query)
indices = retrieve_documents(enhanced_query, vectorizer=vectorizer, tfidf_matrix=tfidf_matrix)
processed_text = process_retrieved_data(indices=indices, segments=segments)
prompt = f"Question: {query}\nContext: {processed_text}"
response = generate_response(prompt=prompt, api_key=api_key)
print(response)

Response saved in: 
./data/advanced-rag-chat-gpt-response.txt
Back propagation in neural networks refers to the method used to compute the gradient of the loss function with respect to the weights of the network. It is a key component of training a neural network through techniques like stochastic gradient descent. 

The process involves passing the input data forward through the network to make a prediction, calculating the error between the predicted output and the true output (loss function), and then propagating this error backward through the network to update the weights in order to minimize the error.

During back propagation, the gradient of the loss function is computed with respect to each weight in the network using the chain rule of calculus. This gradient is then used to update the weights in the direction that minimizes the error, effectively optimizing the network for better performance.

Overall, back propagation is a fundamental concept in training neural networks and 