In [2]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import PromptTemplate
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from dotenv import load_dotenv
import os

load_dotenv()



True

In [3]:
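# Load the API key from the .env file
gemini_api_key = os.getenv("GEMINI_API_KEY")
if not gemini_api_key:
    # os.environ only accepts strings, so fail early with a clear message
    raise ValueError("GEMINI_API_KEY is not set; add it to your .env file")
os.environ["GEMINI_API_KEY"] = gemini_api_key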


In [4]:
# Initialize the LLM
llm = ChatGoogleGenerativeAI( 
    model="gemini-1.5-flash",
    api_key=gemini_api_key,
    temperature=0.2,
    verbose=True,
)

#### 1. `file_path = "../example.txt"`
   - **Explanation**: This line defines the path to a file.
   - The `file_path` variable holds a relative path that locates the `example.txt` file.
   - `../` means "go up one directory from the current directory", so `example.txt` is looked up in the parent folder.
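
To make the `../` rule concrete, here is a minimal sketch of how a relative path resolves against a working directory. The directory `/home/user/project/notebooks` is a hypothetical example, not a path from this notebook; `posixpath` is used so the result is the same on every operating system.

```python
import posixpath

# Hypothetical working directory (the folder the notebook runs in)
working_dir = "/home/user/project/notebooks"
file_path = "../example.txt"

# Join the two paths, then collapse the ".." segment
resolved = posixpath.normpath(posixpath.join(working_dir, file_path))
print(resolved)  # /home/user/project/example.txt
```

Each additional `../` climbs one more level, which is why the PDF path used in the next cell starts with `../../../../`.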


In [5]:
## PDF File Path
file_path = "../../../../Prompt Engineering for Generative AI Future-Proof Inputs for Reliable Al Outputs (James Phoenix, Mike Taylor).pdf"
print(file_path)

../../../../Prompt Engineering for Generative AI Future-Proof Inputs for Reliable Al Outputs (James Phoenix, Mike Taylor).pdf


#### 2. `loader = TextLoader(file_path)`
   - **Explanation**: Here you create an instance of the `TextLoader` class, which will load the text file given in `file_path`.
   - `TextLoader` is a LangChain document loader for plain text files. It reads the file and converts its content into `Document` objects.


In [6]:
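# NOTE: file_path above points to a PDF, but TextLoader only reads plain text.
# For a PDF, use PyPDFLoader(file_path) instead (already imported above).
loader = TextLoader(file_path)
print(loader)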

<langchain_community.document_loaders.text.TextLoader object at 0x000001D1FFBF6F60>


Let's walk through this in simpler terms.

### 3. `mydata = loader.load()`

#### What is happening?
When you call the `load()` method, it opens the `example.txt` file, reads the text inside it, and stores that text in a special format called a `Document` object.

#### What is a `Document` object?
A `Document` object is a container that holds two main things:

1. **`page_content`**:
   - This is the **main text** written inside the file. For example, if your file contains:
     ```
     Hello, this is an example file.
     ```
     then `page_content` stores that entire text.

2. **`metadata`**:
   - This holds **extra information** about the file, such as:
     - The file's name (for example, `example.txt`).
     - The file's path (for example, `../example.txt`), which `TextLoader` records under the `source` key.
   - Metadata lets you track things beyond the content itself, such as which page the data came from (for a PDF) or where the file is located.

#### When you call `mydata = loader.load()`:
- The file is loaded, and the text inside it is stored in `page_content`.
- Alongside it, the file's path is stored in `metadata`.
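
The shape of the result can be sketched without LangChain. The `SimpleDocument` class and `load_text` function below are hypothetical stand-ins written for illustration; they are not LangChain's implementation, only an approximation of what a text loader returns: a list with one document whose metadata records the source path.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str) -> list[SimpleDocument]:
    """Read a whole text file into one document and record its path as the source."""
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]


# Write a throwaway example file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.txt"
    sample.write_text("Hello, this is an example file.", encoding="utf-8")
    docs = load_text(str(sample))

print(docs[0].page_content)    # the file's text
print(list(docs[0].metadata))  # the metadata keys
```

Because the result is a list, the text of the first document is reached with `docs[0].page_content`, just as the cells below do with `mydata[0].page_content`.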



In [None]:
mydata = loader.load()
print(mydata)

In [None]:
example = mydata[0].page_content
print(example)

### PDFs w/ tables and Multi-Modal (text + images)

This section describes a structured way to **analyze PDFs**, using different approaches to handle tables and images. It explains how the **Unstructured** library can extract tables reliably from PDFs, and how multi-modal techniques (text + images) can make PDF content more useful.

### Key Concepts:

1. **PDF Tables Extraction**: 
   - Extracting tables reliably is hard with traditional methods such as character-based separators (e.g., comma or tab). That is why more advanced tools such as `Unstructured` are used, which can recognize and extract tables accurately.
   - With the **Unstructured** library, tables are extracted in HTML format, which is easier for LLMs (Large Language Models) to parse.

2. **Unstructured Library**:
   - The **Unstructured** library is a good tool for preparing data for LLMs. It helps break the content of your PDFs, especially **tables**, into structured pieces.
   - Its **high-resolution (`hi_res`)** strategy is used to detect tables, and it uses the **YOLOX model** to find the bounding boxes of tables and embedded images.

3. **Partitioning a PDF**:
   - The `partition_pdf` function analyzes the PDF, recognizing tables, text, and images and extracting them separately.
   - You can enable **table structure inference**, which helps identify tables and recover their structure.

4. **Example Code**:
   ```python
   from unstructured.partition.pdf import partition_pdf

   # File path to your PDF
   filename = "static/SalesforceFinancial.pdf"

   # Partitioning the PDF
   elements = partition_pdf(
       filename=filename,
       strategy="hi_res",  # Using high-resolution strategy for better table extraction
       infer_table_structure=True,  # Enable table structure recognition
       model_name="yolox"  # Using YOLOX model to detect table bounding boxes
   )
   ```

   - **Elements**: These are the separate parts your PDF is split into, such as narrative text, tables, headings, etc.
   - Tables can also be extracted in HTML format, which makes them easier for an LLM to process. The index `-4` used here is specific to this PDF, where the fourth element from the end happens to be a table.

   ```python
   table_html = elements[-4].metadata.text_as_html
   print(table_html)
   ```

5. **Tables and Semantic Search**:
   - Once tables are extracted, **semantic search** can perform poorly if you try to match embeddings against the raw tables. A common practice is therefore to generate a **summary** of each table and create the embedding from that summary.

6. **Multi-Modal (Text + Images)**:
   - **Unstructured** also gives you the option to extract images embedded in the PDF.
   - With multi-modal techniques, for example **GPT-4V**, you can generate a summary of each image. This turns images into text that gives your LLM additional useful context.

   Example code for converting an extracted image to Base64:
   ```python
   from PIL import Image
   import base64
   import io

   # Function to convert image to base64 format
   def image_to_base64(image_path):
       with Image.open(image_path) as image:
           buffered = io.BytesIO()
           image.save(buffered, format=image.format)
           img_str = base64.b64encode(buffered.getvalue())
           return img_str.decode('utf-8')

   image_str = image_to_base64("static/pdfImages/figure-15-6.jpg")
   ```

   You can then use GPT-4V to generate a summary of the image:
   ```python
   # Current import paths; ChatOpenAI requires the langchain-openai package
   from langchain_openai import ChatOpenAI
   from langchain_core.messages import HumanMessage

   # Initializing the GPT-4V model
   chat = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=1024)

   # Passing image to LLM for summary
   msg = chat.invoke(
       [
           HumanMessage(
               content=[
                   {"type": "text", "text": "Please give a summary of the image provided."},
                   {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_str}"}},
               ]
           )
       ]
   )

   print(msg.content)  # Summary of the image
   ```
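
Returning to the table steps in items 4 and 5: the fixed index `elements[-4]` only works for one particular PDF. The sketch below is one possible alternative, assuming each element exposes a `category` string and `metadata.text_as_html`, as Unstructured elements do. The helper names `table_html_snippets` and `build_index_records`, the stand-in elements, and the placeholder summarizer are all hypothetical; in practice the elements would come from `partition_pdf` and the summary from an LLM call.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def table_html_snippets(elements: Iterable) -> list[str]:
    """Return the HTML of every element whose category is "Table"."""
    snippets = []
    for el in elements:
        html = getattr(el.metadata, "text_as_html", None)
        if getattr(el, "category", None) == "Table" and html:
            snippets.append(html)
    return snippets


def build_index_records(tables: list[str], summarize: Callable[[str], str]) -> list[dict]:
    """Pair each raw table with a summary; the summary is the text that gets embedded."""
    return [{"summary": summarize(html), "raw_table": html} for html in tables]


# Stand-in elements so the sketch runs without a PDF
demo_elements = [
    SimpleNamespace(category="NarrativeText", metadata=SimpleNamespace(text_as_html=None)),
    SimpleNamespace(
        category="Table",
        metadata=SimpleNamespace(text_as_html="<table><tr><td>Revenue</td><td>100</td></tr></table>"),
    ),
]

tables = table_html_snippets(demo_elements)

# Placeholder summarizer that just counts rows; a real one would call an LLM
records = build_index_records(tables, summarize=lambda html: f"Table with {html.count('<tr>')} row(s)")
print(records[0]["summary"])  # Table with 1 row(s)
```

Keeping `raw_table` next to `summary` means a search can match on the summary's embedding and still hand the full table to the LLM afterwards.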

### Conclusion:
This process gives you a structured, reliable way to **analyze PDFs**: you extract tables, images, and text and prepare the data for LLMs. Tables are detected with a layout model such as **YOLOX**, and a multi-modal model such as **GPT-4V** is then used to summarize images and tables.

### Multi-Modal (text + images)

This section explores the concept of **multi-modal text splitting**, which handles images alongside text. In other words, text and images are split and used together, which is still an evolving field. The technique was popularized by Lance Martin of LangChain, and we look at one way of doing it.

### Processing a PDF and Handling Images

Here we look at how to separate the text and images inside PDF files.

1. **Library Installation**: First, install the `unstructured` library, which handles various document formats such as PDF.
   
   ```python
   #!pip3 install "unstructured[all-docs]"
   ```

2. **PDF Partitioning**: We use the `partition_pdf` function, which pulls out the individual elements of the PDF, including both images and text.

   ```python
   from unstructured.partition.pdf import partition_pdf

   filepath = "static/VisualInstruction.pdf"
   raw_pdf_elements = partition_pdf(
       filename=filepath,
       extract_images_in_pdf=True,  # extract embedded images
       infer_table_structure=True,  # recover the layout of tables
       chunking_strategy="by_title",  # start chunks at section titles
       max_characters=4000,  # hard limit: at most 4000 characters per chunk
       new_after_n_chars=3800,  # soft limit: start a new chunk after 3800 characters
       combine_text_under_n_chars=2000,  # merge small sections under 2000 characters
       image_output_dir_path="static/pdfImages/"  # save extracted images to this folder
   )
   ```

3. **Extracting Images**: After extraction, the images from the PDF are saved in the `static/pdfImages/` folder. Splitting the images themselves is out of scope for this example, but we use them in the next step.
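
Before processing the images, it helps to see what was actually written to the output folder. This is a minimal sketch using only the standard library; the helper name `list_extracted_images`, the set of suffixes, and the demo file names are assumptions for illustration. The demo runs against a temporary folder, whereas in this notebook the folder would be `static/pdfImages/`.

```python
from pathlib import Path
import tempfile

# Assumed set of image extensions to look for
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_extracted_images(image_dir: str) -> list[Path]:
    """Return the image files in image_dir, sorted by name (empty if the folder is missing)."""
    folder = Path(image_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


# Demo with placeholder files; real files would be written by partition_pdf
with tempfile.TemporaryDirectory() as tmp:
    for name in ("figure-1-2.jpg", "figure-1-1.jpg", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")
    image_names = [p.name for p in list_extracted_images(tmp)]

print(image_names)  # ['figure-1-1.jpg', 'figure-1-2.jpg']
```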

### Working with Images

Merely keeping the images in a folder is of little use; we want to process them and extract meaningful information. Here we use GPT-4V (GPT-4 Vision) to generate summaries of the images, which can then be embedded.

1. **Image to Base64 Conversion**: First, convert the image to Base64 format so that it can be passed to the model.

   ```python
   from PIL import Image
   import base64
   import io

   def image_to_base64(image_path):
       with Image.open(image_path) as image:
           buffered = io.BytesIO()
           image.save(buffered, format=image.format)
           img_str = base64.b64encode(buffered.getvalue())
           return img_str.decode('utf-8')

   image_str = image_to_base64("static/pdfImages/figure-15-6.jpg")
   ```

2. **Passing the Image to GPT-4 Vision**: Now we give the image to the GPT-4 Vision model and ask it to produce a descriptive summary.

   ```python
   # Current import paths; ChatOpenAI requires the langchain-openai package
   from langchain_openai import ChatOpenAI
   from langchain_core.messages import HumanMessage

   chat = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=1024)

   msg = chat.invoke(
       [
           HumanMessage(
               content=[
                   {"type": "text", "text": "Please give a summary of the image provided. Be descriptive"},
                   {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_str}"}}
               ]
           )
       ]
   )
   ```

3. **Resulting Summary**: The summary returned by GPT-4V looks something like this:

   ```
   'The image shows a baking tray with pieces of fried chicken arranged to roughly mimic the continents on Earth as seen from space...'
   ```

   In this way, processing the image yields a meaningful summary that you can use later, for example in **semantic search**.
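
The snippets above hardcode `data:image/jpeg` in the URL, which is only correct for JPEG files. A minimal sketch of one way to handle other formats, assuming the file extension reflects the real format: guess the MIME type with the standard library and encode the file's bytes directly. The helper name `image_to_data_url` and the demo file are hypothetical.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL, guessing the MIME type from its extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Demo with placeholder bytes; a real call would point at an extracted image
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "figure.png"
    demo.write_bytes(b"not-a-real-image")
    data_url = image_to_data_url(str(demo))

print(data_url.split(",")[0])  # data:image/png;base64
```

The returned string can be used as the `url` value of the `image_url` entry in the message content.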

### Conclusion

In this process we:
- Separated the text and images in the PDF.
- Converted the images to Base64 format.
- Used the GPT-4 Vision model to generate descriptive summaries of the images.

If you want to understand how multi-modal text splitting works, this technique is useful whenever you need to process text and images together.