# PostParse Testing Notebook

This notebook demonstrates how to use the PostParse package to extract and store social media content.

## Setup

First, let's install the package in development mode:

In [1]:
import os
from typing import Optional

def set_project_root_dir(project_root_name: str, cwd: Optional[str] = None):
    """Set the working directory to the project root directory, based on the name of the project root directory.

    Args:
        project_root_name (str): The name of the project root directory.
        cwd (str, optional): The current working directory. Defaults to None.

    Raises:
        ValueError: If the project root directory is not found in the directory hierarchy.

    Returns:
        None
    """
    # If no current working directory is provided, use the current working directory
    if cwd is None:
        cwd = os.getcwd()

    # Split the current working directory into its components
    cwd_components = cwd.split(os.sep)

    # Find the index of the first occurrence of the project root directory in the list of components
    try:
        root_index = cwd_components.index(project_root_name)
    except ValueError:
        raise ValueError(f"Project root directory '{project_root_name}' not found in directory hierarchy.")

    # Use the root index to get the path of the project root directory
    root_dir = os.sep.join(cwd_components[:root_index+1])

    # Change the working directory to the project root directory
    os.chdir(root_dir)

    # Print new CWD
    print('New CWD is: ' + os.getcwd())

set_project_root_dir('postparse')

New CWD is: i:\Coding\00_Projects\00_packages\postparse


In [2]:
!pip install -e .

Obtaining file:///I:/Coding/00_Projects/00_packages/postparse
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: postparse
  Building editable for postparse (pyproject.toml): started
  Building editable for postparse (pyproject.toml): finished with status 'done'
  Created wheel for postparse: filename=postparse-0.1.0-0.editable-py3-none-any.whl size=3905 sha256=e2a1a867c0338a98093172db0e8dceaef96bcd1c10c54dabe7674b219abdb679
  Stored in directory: C:\Users\pachl\AppData\Local\Temp\pip-ephem-wheel-cache-lb4rp7w


[notice] A new release of pip is available: 24.2 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


## Import Required Modules

In [3]:
import os
from pathlib import Path
from dotenv import load_dotenv

# Import our package
from postparse.data.database import SocialMediaDatabase
from postparse.instagram.instagram_parser import InstaloaderParser
from postparse.telegram.telegram_parser import save_telegram_messages

%load_ext autoreload
%autoreload 2


# Load environment variables from .env file
load_dotenv(dotenv_path='config/.env')

INFO:telethon.crypto.libssl:Failed to load SSL library: <class 'OSError'> (no library called "ssl" found)
INFO:telethon.crypto.aes:cryptg module not installed and libssl not found, falling back to (slower) Python encryption


True















































## Setup Database

Create a database instance in the playground directory:

In [4]:
# Create database in playground directory
db_path = Path("data/social_media.db")
db = SocialMediaDatabase(db_path)

print(f"Database created at: {db_path.absolute()}")

Database created at: i:\Coding\00_Projects\00_packages\postparse\data\social_media.db


## Instagram Parser Test

Test the Instagram parser with your credentials. Make sure to set these in your .env file:
```
INSTAGRAM_USERNAME=your_username
INSTAGRAM_PASSWORD=your_password
```

In [5]:
# Get Instagram credentials from environment
instagram_username = os.getenv("INSTAGRAM_USERNAME")
instagram_password = os.getenv("INSTAGRAM_PASSWORD")

if not instagram_username or not instagram_password:
    print("Please set INSTAGRAM_USERNAME and INSTAGRAM_PASSWORD in .env file")
else:
    # Initialize Instagram parser
    parser = InstaloaderParser(
        username=instagram_username,
        password=instagram_password,
        session_file="instagram_session"  # Cache session for future use
    )
    
    # Save posts with conservative limits
    saved_count = parser.save_posts_to_db(
        db=db,
        limit=None,  # Start with just 10 posts
        force_update=False  # Skip existing posts by default
    )
    
    print(f"Saved {saved_count} Instagram posts")

INFO:postparse.instagram.instagram_parser:Successfully loaded Instagram session from cache
INFO:postparse.instagram.instagram_parser:Found 920 saved posts
Fetching posts (delay: 6.1s): 100%|██████████| 920/920 [21:28<00:00,  1.40s/post, processed=142, skipped=778, mode=normal] 
INFO:postparse.instagram.instagram_parser:Normal fetch completed. Processed: 142, Skipped: 778, Total: 920
INFO:postparse.instagram.instagram_parser:Found 142 posts to save
Saving to database: 100%|██████████| 142/142 [00:02<00:00, 58.54post/s, new=142, updated=0, total=142]
INFO:postparse.instagram.instagram_parser:Process completed. Saved: 142, Total new posts: 142


Saved 142 Instagram posts


## Telegram Parser Test

Test the Telegram parser with your API credentials. Make sure to set these in your .env file:
```
TELEGRAM_API_ID=your_api_id
TELEGRAM_API_HASH=your_api_hash
```

In [6]:
# Get Telegram credentials from environment
telegram_api_id = os.getenv("TELEGRAM_API_ID")
telegram_api_hash = os.getenv("TELEGRAM_API_HASH")
telegram_phone = os.getenv("TELEGRAM_PHONE")

if not telegram_api_id or not telegram_api_hash:
    print("Please set TELEGRAM_API_ID and TELEGRAM_API_HASH in .env file")
else:
    # Save messages with conservative limits
    saved_count = save_telegram_messages(
        api_id=telegram_api_id,
        api_hash=telegram_api_hash,
        phone=telegram_phone,
        db_path=str(db_path),
        cache_dir="data/cache",
        downloads_dir="data/downloads/telegram",
        session_file="telegram_session",  # Cache session for future use
        limit=None,  # Start with just 10 messages
        max_requests_per_session=None,  # Conservative request limit
        force_update=False # kip existing messages by default, if true Override existing messages
    )
    
    print(f"Saved {saved_count} Telegram messages")

INFO:telethon.network.mtprotosender:Connecting to 149.154.167.51:443/TcpFull...
INFO:telethon.network.mtprotosender:Connection to 149.154.167.51:443/TcpFull complete!


Found 5313 saved messages


Fetching messages (delay: 2.0s):   0%|          | 7/5313 [00:15<3:06:12,  2.11s/msg, processed=7, skipped=0, mode=normal]INFO:telethon.client.downloads:Starting direct file download in chunks of 131072 at 0, stride 131072
Fetching messages (delay: 2.5s):   0%|          | 16/5313 [00:42<3:24:59,  2.32s/msg, processed=16, skipped=0, mode=normal]INFO:telethon.client.downloads:Starting direct file download in chunks of 131072 at 0, stride 131072
Fetching messages (delay: 3.0s):   0%|          | 20/5313 [01:05<8:12:22,  5.58s/msg, processed=20, skipped=0, mode=normal]INFO:telethon.client.downloads:Starting direct file download in chunks of 131072 at 0, stride 131072
Fetching messages (delay: 9.0s):   3%|▎         | 139/5313 [08:23<3:33:29,  2.48s/msg, processed=139, skipped=0, mode=normal] INFO:telethon.client.downloads:Starting direct file download in chunks of 131072 at 0, stride 131072
Fetching messages (delay: 9.0s):   3%|▎         | 144/5313 [08:31<4:03:54,  2.83s/msg, processed=144, s

Normal fetch completed. Processed: 410, Skipped: 4903, Total: 5313
Found 410 messages to save


Saving to database: 100%|██████████| 410/410 [01:33<00:00,  4.38msg/s, new=410, updated=0, total=410]
INFO:telethon.network.mtprotosender:Disconnecting from 149.154.167.51:443/TcpFull...
INFO:telethon.network.mtprotosender:Disconnection from 149.154.167.51:443/TcpFull complete!


Process completed. Saved: 410, Total new messages: 410
Saved 410 Telegram messages































## LLM Zeroshot Classifier

In [14]:
from postparse.analysis.classifiers.recipe_classifier import RecipeClassifier
from postparse.data.database import SocialMediaDatabase

# Initialize the recipe classifier
classifier = RecipeClassifier()

# Example recipe text for classification
recipe_text = """Here's my favorite pasta recipe! 
Ingredients:
- 500g pasta
- 2 cloves garlic
- Olive oil
Instructions:
1. Boil pasta
2. Sauté garlic
3. Mix and enjoy!"""

# Classify the recipe text
result = classifier.predict(recipe_text)
print(f"Classification: {result}")

# Example non-recipe text for classification
non_recipe = "Beautiful sunset at the beach today! The waves were amazing."

# Classify the non-recipe text
result = classifier.predict(non_recipe)
print(f"Classification: {result}")

# Fetch Instagram posts for classification
posts = db.get_instagram_posts(limit=5)
for post in posts:
    caption = post['caption']
    if caption:  # Only process if caption exists
        # Classify the caption
        result = classifier.predict(caption)
        print(f"\nCaption: {caption[:200]}...")
        print(f"Classification: {result['label']} (confidence: {result['confidence']:.2f})")

  0%|          | 0/1 [00:00<?, ?it/s]INFO:httpx:HTTP Request: POST http://192.168.188.92:11434/api/chat "HTTP/1.1 200 OK"
100%|██████████| 1/1 [00:02<00:00,  2.05s/it]


Classification: recipe


  0%|          | 0/1 [00:00<?, ?it/s]INFO:httpx:HTTP Request: POST http://192.168.188.92:11434/api/chat "HTTP/1.1 200 OK"
100%|██████████| 1/1 [00:01<00:00,  1.99s/it]

Classification: not recipe



