# Data processing
Prepare and pre-process the documents that serve as the knowledge base for retrieval.

#### Convert the Word file to a text file and save it **(run only once)**

In [1]:
from docx import Document

# Function to extract text from a word file
def extract_text_from_docs(doc_path):
    doc = Document(doc_path)
    return '\n'.join([para.text for para in doc.paragraphs if para.text.strip()])

# Function to save it to a .txt file
def save_text_to_file(text, output_path):
    with open(output_path, 'w', encoding='utf-8') as file:
        file.write(text)

text = extract_text_from_docs(r'C:\Users\arez3\Desktop\Etudes\Limitless Learning\Gen AI\Mini projet\Books\word\Diagnosisofbusiness.docx')
save_text_to_file(text, r'C:\Users\arez3\Desktop\Etudes\Limitless Learning\Gen AI\Mini projet\Books\text\Diagnosisofbusiness.txt')


#### Chunk the text for embedding

Split the document into smaller, overlapping chunks so that each piece can be embedded and retrieved on its own.

In [2]:
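# TODO: Decide whether chunks should follow thematic boundaries or can be cut at fixed offsets.

def chunk_text(text, chunk_size=1000, overlap=200):
    # The window must advance on every iteration, otherwise the loop never ends
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    chunks = []
    start = 0
    step = chunk_size - overlap

    while start < len(text):
        # Each chunk shares its first `overlap` characters with the previous one
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks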

chunks = chunk_text(text)
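
The TODO above asks whether chunks should respect thematic boundaries. One possible middle ground, sketched here as an assumption rather than as the notebook's method, is to group whole paragraphs until a character budget is reached. The helper name `chunk_by_paragraphs` and its `max_chars` parameter are hypothetical. It relies on the fact that `extract_text_from_docs` joins paragraphs with `'\n'`.

```python
def chunk_by_paragraphs(text, max_chars=1000):
    """Group whole paragraphs into chunks of at most max_chars characters.

    Hypothetical helper, not part of the original notebook. A single paragraph
    longer than max_chars is kept whole rather than split.
    """
    chunks = []
    current = []
    current_len = 0

    for para in text.split('\n'):
        para = para.strip()
        if not para:
            continue
        # +1 accounts for the newline that joins paragraphs inside a chunk
        added_len = len(para) + (1 if current else 0)
        if current and current_len + added_len > max_chars:
            # Close the current chunk and start a new one with this paragraph
            chunks.append('\n'.join(current))
            current, current_len = [], 0
            added_len = len(para)
        current.append(para)
        current_len += added_len

    if current:
        chunks.append('\n'.join(current))
    return chunks


sample = "Intro paragraph.\nSecond paragraph, a bit longer.\nThird."
for piece in chunk_by_paragraphs(sample, max_chars=40):
    print(repr(piece))
```

Unlike the fixed-size splitter, this variant never cuts a paragraph in half, at the cost of uneven chunk lengths and no overlap between neighbours.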


In [3]:
import os
os.environ["TRANSFORMERS_NO_TF"] = "1"  # disable tensorflow fallback

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

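def search_similar_chunks(query, model, index, chunks, top_k=3):
    # FAISS expects a 2-D float32 array of query vectors
    query_embedding = np.asarray(model.encode([query]), dtype='float32')
    # D holds the distances, I holds the positions of the nearest chunks
    D, I = index.search(query_embedding, top_k)
    # FAISS pads the result with -1 when fewer than top_k vectors are indexed
    return [chunks[i] for i in I[0] if i != -1]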


model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(chunks)

dimension = len(embeddings[0])
index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings))
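
`faiss.IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances. The dependency-free sketch below illustrates that computation on toy vectors. The function name `l2_search` and the toy data are made up for illustration, and the code is not a replacement for the FAISS index.

```python
def l2_search(query, vectors, top_k=3):
    """Return (squared_distance, position) pairs for the top_k nearest vectors."""
    scored = []
    for position, vector in enumerate(vectors):
        # Squared Euclidean distance between the query and this stored vector
        squared_distance = sum((q - v) ** 2 for q, v in zip(query, vector))
        scored.append((squared_distance, position))
    # Smallest distance first, as in an L2 index
    scored.sort()
    return scored[:top_k]


toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
print(l2_search([1.0, 0.5], toy_vectors, top_k=2))
```

The positions returned here play the same role as the `I` array from `index.search`: they are row numbers that map back into the `chunks` list.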

In [4]:
query = "Give me a diagnosis of my business problems and possible solutions"
relevant_chunks = search_similar_chunks(query, model, index, chunks, top_k=1)

print(relevant_chunks)

[' modern approaches to diagnose a business performance of a company which activates in a modern sustainable development economy. Also it proposes general models for diagnosing a business. This work is necessary both for the academic environment and the business environment, providing the guarantee of acquiring rich and up-to-date skills in the the financial analysis area, in order to ensure the professionalisation of all economic specialists at a high level.\nCHAPTER 1. FUNDAMENTAL CONCEPTS OF BUSINESS DIAGNOSIS\nConceptual approaches\nThe diagnosis process, in the general sense, represent „a broad investigation of the main aspects of the organization activity, of economic, technical, sociological, legal and managerial nature, in order to identify strengths and disruption, causes that generated them, and to design some recommendations for improvement and development ” (Miles, 2000, p.86).\nIn the view Bătrâncea et al. (2008) analysis and diagnosis of a business involves the decomposit

In [5]:
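from transformers import AutoTokenizer

# BERT's WordPiece vocabulary differs from the one used by Gemini/Gemma models,
# so the token counts below are only a rough approximation.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

def trim_text_by_tokens(text, max_tokens):
    if max_tokens <= 0:
        return ''

    # Tokenize without [CLS]/[SEP] and record each token's character span
    encoding = tokenizer(text, add_special_tokens=False, return_offsets_mapping=True)
    offsets = encoding['offset_mapping']

    # Return the text unchanged when it already fits
    if len(offsets) <= max_tokens:
        return text

    # Cut the original string at the end of the last kept token,
    # which preserves the original casing and spacing
    last_char = offsets[max_tokens - 1][1]
    return text[:last_char]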
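
`trim_text_by_tokens` is defined here but not called in the later cells, where `ask_gemini` truncates the context by characters instead. One way a token budget could be applied to retrieved chunks is sketched below. The helper `pack_chunks` and its `count_tokens` parameter are hypothetical, and the default whitespace counter is only a stand-in so that the example runs without downloading a tokenizer.

```python
def pack_chunks(chunks, max_tokens, count_tokens=lambda s: len(s.split())):
    """Keep leading chunks while their combined token count fits max_tokens.

    Hypothetical helper. count_tokens defaults to a whitespace word count,
    which is an assumption made to keep the sketch self-contained.
    """
    selected = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        # Stop at the first chunk that would exceed the budget
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected


print(pack_chunks(["a b c", "d e", "f g h i"], max_tokens=5))
```

With the tokenizer loaded above, passing `count_tokens=lambda s: len(tokenizer.tokenize(s))` would count WordPiece tokens instead of words.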

In [6]:
import os
import time
import google.generativeai as genai
from google.api_core.exceptions import ResourceExhausted

# Read the API key from the environment instead of hardcoding it in the notebook
# (the variable name GOOGLE_API_KEY is a choice made here)
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
# Use a separate name so the SentenceTransformer `model` from In [3] is not overwritten
llm = genai.GenerativeModel("gemma-3-12b-it")

def ask_gemini(question, context_chunks, max_context_chars=10000, max_retries=5, initial_wait_time=5):
    context = "\n\n".join(context_chunks)
    context = context[:max_context_chars]  # cap the prompt size by characters
    prompt = f"""Based on the following business knowledge, answer the question:\n\nContext:\n{context}\n\nQuestion: {question}"""

    retries = 0
    wait_time = initial_wait_time
    while retries < max_retries:
        try:
            response = llm.generate_content(prompt)
            return response.text
        except ResourceExhausted as e:
            print(f"Rate limit exceeded. Retrying in {wait_time} seconds... (Attempt {retries + 1}/{max_retries})")
            print(f"Error details: {e}")
            time.sleep(wait_time)
            retries += 1
            wait_time *= 2  # Exponential backoff: 5, 10, 20, ... seconds
        except Exception as e:  # Report any other error, then re-raise it
            print(f"An unexpected error occurred: {e}")
            raise

    raise ResourceExhausted(f"Failed to get response after {max_retries} retries due to rate limiting.")

# Example
answer = ask_gemini("Sales at my retail business have been declining for the past 9 months. I’m seeing fewer customers, my inventory isn't moving fast, and my online ads aren't working well. Can you help me diagnose what might be wrong and suggest where to focus", relevant_chunks)
print(answer)

Okay, based on the provided context, here's a breakdown of how to approach diagnosing your retail business's declining performance, framed within the principles of business diagnosis.

**1. Understanding the Diagnostic Approach (Based on the Context)**

The context emphasizes a *broad investigation* of the business, looking beyond just financial numbers. It highlights examining economic, technical, sociological, legal, and managerial aspects.  Essentially, it's about understanding the *root causes* of the problem, not just the symptoms.  Bătrâncea et al. (2008) suggest "decomposition" – breaking down the business into its components to analyze them individually.

**2. Applying the Diagnostic Framework to Your Situation**

Let's break down your situation and apply this framework. Your symptoms are:

*   **Declining Sales:** The core problem.
*   **Fewer Customers:** A key indicator of a deeper issue.
*   **Slow-Moving Inventory:** Suggests problems with product selection, pricing, or de

In [7]:
answer = ask_gemini("What is H infinity Controller ?", relevant_chunks)
print(answer)

This text **does not mention H infinity Controller**. It focuses on the conceptual approaches to business diagnosis and financial analysis within a sustainable development economy. The provided context discusses the definition of business diagnosis and its components, but it doesn't delve into control systems or engineering topics like H infinity controllers.



Therefore, based solely on the provided text, there is no information about what an H infinity Controller is.


In [8]:
answer = ask_gemini("Hello", relevant_chunks)
print(answer)

Hello! It seems like you're setting up a scenario for me to answer a question based on the provided text. I'm ready when you are! Please ask your question.
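
The cells above pass the same `relevant_chunks`, retrieved for the query in `In [4]`, to every later question. A minimal sketch of retrieving fresh context per question follows. The wrapper `answer_with_retrieval` and the stub functions are hypothetical and stand in for the notebook's retrieval and generation steps so that the example runs on its own.

```python
def answer_with_retrieval(question, retrieve, generate, top_k=3):
    """Retrieve context for this specific question, then generate an answer.

    Hypothetical wrapper: `retrieve(question, top_k)` returns a list of text
    chunks and `generate(question, chunks)` returns the answer string.
    """
    context_chunks = retrieve(question, top_k)
    if not context_chunks:
        # Skip the model call when nothing was retrieved
        return "No relevant context found."
    return generate(question, context_chunks)


# Stubs used only to make the sketch runnable without FAISS or an API key
def fake_retrieve(question, top_k):
    corpus = ["Business diagnosis investigates strengths and disruptions."]
    return corpus[:top_k] if "business" in question.lower() else []


def fake_generate(question, context_chunks):
    return f"Answer based on {len(context_chunks)} chunk(s)."


print(answer_with_retrieval("How do I diagnose my business?", fake_retrieve, fake_generate))
print(answer_with_retrieval("Hello", fake_retrieve, fake_generate))
```

In the notebook, `retrieve` could wrap `search_similar_chunks` with the SentenceTransformer instance and the FAISS index, and `generate` could be `ask_gemini`.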
