This script creates a graphical user interface (GUI) application using Python's `Tkinter` library. The application loads Instagram post descriptions from a CSV file, displays a random selection of posts, and identifies locations or organizations mentioned in the posts using a pre-trained Named Entity Recognition (NER) model. Here’s a detailed breakdown of what the code does:

### Key Components:

1. **Model and Tokenizer Loading**:
   - The script loads a pre-trained NER model (`sigaldanilov/nerbart_model`) and a tokenizer from Hugging Face.
   - These are used to process Instagram post descriptions and predict named entities (locations, persons, organizations, and miscellaneous entities). Only locations and organizations are shown to the user.

2. **Loading Posts from CSV**:
   - The `load_posts_from_csv()` function loads a CSV file containing Instagram post descriptions. The CSV file is expected to have a column called `'translated'` (which likely contains translated post texts).
   - The function skips problematic rows using `on_bad_lines='skip'` to ensure robustness and returns a list of valid post descriptions.

3. **Fetching Random Posts**:
   - The `fetch_random_posts()` function randomly selects 6 Instagram posts from the loaded CSV data to display in the GUI.

4. **NER Prediction**:
   - The `predict_ner()` function takes a text input (an Instagram post), tokenizes it, and uses the NER model to predict the named entities (locations, organizations, etc.) within the text.
   - The function processes tokens and their labels (e.g., "B-LOC" for a beginning of a location) to extract and return a list of identified locations and organizations.
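   - The predictions are post-processed to handle subword tokens. The code assumes WordPiece-style pieces, where a continuation piece carries a `##` prefix, and glues each such piece onto the token before it (for example, a word split into "jer" and "##usalem" is rejoined as "jerusalem"). Separate words of the same entity are joined with a space (for example, "tel" and "aviv" become "tel aviv").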

5. **Displaying Posts and Details**:
   - The `show_locations()` function loads posts from the CSV, clears any previous content in the display frame, and shows a random selection of posts in a two-column layout.
   - Each post appears as a small bordered card (a `Frame`) showing a preview of the first 50 characters. Clicking the card opens the full details.

6. **Detailed View of Posts**:
   - When a post is clicked, the `show_post_details()` function opens a new window that shows the full post text along with the locations and organizations identified by the NER model.
   - If no locations are found in the post, the window displays "No locations identified."
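
The text shown in the detail window, including the fallback message, can be sketched without any GUI code. The helper name `locations_text` is hypothetical and is not part of the script; it only mirrors the two label strings the script uses.

```python
def locations_text(locations):
    """Return the label text for the detail window.

    `locations` is the list of entity strings produced by the NER step.
    """
    if locations:
        return f"Identified Locations: {', '.join(locations)}"
    # Same fallback message the detail window shows for an empty result
    return "No locations identified."


with_hits = locations_text(["tel aviv", "jerusalem"])
without_hits = locations_text([])
```

Keeping the string logic separate from the widgets would make it possible to check the wording without opening a window.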

### GUI Structure:

- **Main Window**: 
   - The main window displays a title ("Randomly selected Instagram Posts") and a button ("Fetch and Show Locations").
   - The button triggers the fetching and displaying of random Instagram posts from the CSV file.

- **Scrollable Frame for Posts**:
   - The posts are displayed in a scrollable frame, allowing users to scroll through multiple posts.

- **Clickable Post Cards**:
   - Each post is displayed as a clickable card. The card shows an image placeholder (gray square) and a short text preview.
   - Clicking the card opens a detailed view with the full post and location information.
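
The card preview and the two-column placement come down to two small calculations. The sketch below isolates them; the helper names `preview` and `grid_position` are hypothetical and chosen only for illustration.

```python
def preview(text, limit=50):
    """Return the first `limit` characters, with an ellipsis if text was cut."""
    return text[:limit] + "..." if len(text) > limit else text


def grid_position(index, columns=2):
    """Map a post's index to a (row, column) cell, filling rows left to right."""
    return divmod(index, columns)


long_post = "x" * 60
short_post = "Sunset over the harbour"
cells = [grid_position(i) for i in range(6)]
```

With six posts and two columns, `cells` covers three rows, which matches the `row=i//2, column=i%2` arguments passed to `grid()` in the script.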

### How the Application Works:
1. The user launches the application, and the main window appears.
2. Clicking the "Fetch and Show Locations" button triggers the loading and random selection of Instagram posts from the CSV file.
3. The posts are displayed in a scrollable, two-column layout.
4. Clicking on a post opens a new window showing the full post and any locations identified by the NER model.
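
The random selection in step 2 can be sketched with the standard library alone. This is a variant under stated assumptions, not the script's exact function: it caps the sample at the number of available posts, and the `seed` parameter is a hypothetical addition that makes a run reproducible.

```python
import random


def fetch_random_posts(posts, limit=6, seed=None):
    """Return up to `limit` distinct posts chosen at random.

    `seed` is a hypothetical extra parameter used only to make the
    selection repeatable, for example in tests.
    """
    rng = random.Random(seed)
    # random.sample raises ValueError when asked for more items than exist,
    # so cap the sample size at the number of available posts.
    return rng.sample(posts, min(limit, len(posts)))


posts = [f"post {n}" for n in range(20)]
selection = fetch_random_posts(posts, seed=0)
few = fetch_random_posts(posts[:3])
```

Sampling without replacement guarantees that the same post never appears twice in one batch of cards.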

### Key Features:
- **Random Sampling**: Shows a random selection of 6 posts each time the user clicks the button.
- **NER Location Extraction**: The application identifies locations and organizations in the post using a pre-trained NER model.
- **Interactive GUI**: Users can click on posts to view detailed information, making the interface user-friendly.
- **Error Handling**: Malformed CSV rows are skipped while loading, and any other loading error is caught and reported, leaving an empty list of posts.
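
A minimal sketch of the CSV error handling, assuming pandas 1.3 or later (the version that introduced `on_bad_lines`). The sample data and the helper name `load_posts` are made up for illustration; an in-memory buffer stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample data: the third line has too many fields, and the
# fourth has an empty 'translated' value.
csv_text = (
    "id,translated\n"
    "1,Sunset in Haifa\n"
    "2,too,many,fields\n"
    "3,\n"
    "4,Coffee in Tel Aviv\n"
)


def load_posts(source):
    """Return the non-empty 'translated' values, or [] if loading fails."""
    try:
        # Rows with more fields than the header are dropped instead of
        # raising a parser error.
        df = pd.read_csv(source, on_bad_lines="skip")
        return df["translated"].dropna().tolist()
    except (OSError, KeyError, pd.errors.ParserError, pd.errors.EmptyDataError) as exc:
        print(f"Error loading CSV: {exc}")
        return []


posts = load_posts(io.StringIO(csv_text))
missing = load_posts("no_such_file_for_this_sketch.csv")
```

Here the malformed row is skipped by the parser, the row with no translation is removed by `dropna()`, and a missing file yields an empty list rather than a crash.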

### Example Flow:
1. **User Clicks "Fetch and Show Locations"**: A random sample of posts is displayed.
2. **User Clicks on a Post**: A new window opens with the full text and the locations identified by the NER model.
3. **Location Prediction**: The NER model processes the text and predicts named entities related to locations or organizations.
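
The grouping of tokens into location and organization names can be illustrated without loading the model. In the sketch below, hand-written tokens and labels stand in for real model output, the function name `merge_entities` is hypothetical, and the lowercase WordPiece-style tokens are an assumption about the tokenizer. It treats a `B-` tag as the start of a new entity.

```python
ENTITY_TYPES = {"LOC", "ORG"}


def merge_entities(tokens, labels, special_tokens=("[CLS]", "[SEP]", "[PAD]")):
    """Group tokens tagged B-/I-LOC or B-/I-ORG into entity strings."""
    entities = []
    current = []             # words of the entity being built
    word_is_entity = False   # was the latest whole word part of an entity?

    for token, label in zip(tokens, labels):
        if token in special_tokens:
            continue
        if token.startswith("##"):
            # Continuation piece: glue it onto the previous word, but only
            # when that word belongs to an entity.
            if word_is_entity and current:
                current[-1] += token[2:]
            continue
        prefix, _, entity_type = label.partition("-")
        if entity_type in ENTITY_TYPES:
            if prefix == "B" and current:
                # A B- tag closes the entity in progress and opens a new one
                entities.append(" ".join(current))
                current = []
            current.append(token)
            word_is_entity = True
        else:
            if current:
                entities.append(" ".join(current))
                current = []
            word_is_entity = False

    if current:
        # Flush an entity that runs to the end of the text
        entities.append(" ".join(current))
    return entities


tokens = ["[CLS]", "we", "flew", "from", "tel", "aviv", "to",
          "jer", "##usa", "##lem", "with", "el", "al", "[SEP]"]
labels = ["O", "O", "O", "O", "B-LOC", "I-LOC", "O",
          "B-LOC", "I-LOC", "I-LOC", "O", "B-ORG", "I-ORG", "O"]
found = merge_entities(tokens, labels)
```

For this hand-made input, `found` holds three entities: two locations and one organization. Real model predictions are noisier, so the output of the actual application will depend on the model's accuracy.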

This application provides an interactive and visual way to display NER results on social media posts, focusing on locations and organizations mentioned in the text.

In [1]:
import tkinter as tk
from tkinter import messagebox
import pandas as pd
import random
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

# Load the model and tokenizer
model_name = "sigaldanilov/nerbart_model"  # Your actual model name
model = AutoModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the label names based on your model's 9 labels
label_names = ['O', 'B-LOC', 'I-LOC', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-MISC', 'I-MISC']

# Specify the path to your CSV file (globally accessible)
csv_file_path = "rowsfromsql.csv"

# Load the CSV file containing Instagram post descriptions
def load_posts_from_csv(file_path):
    try:
        # Use 'on_bad_lines' to skip problematic rows
        df = pd.read_csv(file_path, on_bad_lines='skip')  # Skip rows that cause tokenizing errors
        print("CSV loaded successfully.")
        return df['translated'].dropna().tolist()  # Ensure no missing translations
    except Exception as e:
        print(f"Error loading CSV: {e}")
        return []

# Fetch random posts from the CSV data
def fetch_random_posts(posts, limit=6):
    if len(posts) < limit:
        print("Not enough posts to fetch.")
        return []
    return random.sample(posts, limit)  # Get a random sample of 'limit' posts

# Function to predict locations from the text using the model
def predict_ner(text, model, tokenizer, label_names):
    tokenized_input = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    model.eval()
    with torch.no_grad():
        outputs = model(**tokenized_input)

    logits = outputs.logits
    predictions = torch.argmax(logits, dim=2)

    predicted_labels = [
        label_names[idx] if idx < len(label_names) else 'O' for idx in predictions[0].numpy()
    ]

    tokens = tokenizer.convert_ids_to_tokens(tokenized_input['input_ids'][0])

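    locations = []
    current_location = ""
    inside_location = False
    entity_labels = ["B-LOC", "I-LOC", "B-ORG", "I-ORG"]
    for token, label in zip(tokens, predicted_labels):
        # Skip special tokens added by the tokenizer (e.g. [CLS], [SEP])
        if token in tokenizer.all_special_tokens:
            continue

        if token.startswith("##"):
            # Subword continuation: attach it to the entity being built;
            # pieces of non-entity words are ignored
            if inside_location:
                current_location += token[2:]
            continue

        # A new word closes the current entity if it is not an entity word
        # itself or if it begins a new entity (B- tag)
        if inside_location and (label not in entity_labels or label.startswith("B-")):
            locations.append(current_location)
            current_location = ""
            inside_location = False

        if label in entity_labels:
            token = token.lstrip("#")  # Drop a leading '#' (e.g. from hashtags)
            if not inside_location:
                current_location = token
                inside_location = True
            else:
                current_location += " " + token

    # Flush an entity that runs to the end of the text
    if inside_location:
        locations.append(current_location)

    return locations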

# Function to display a new window with full post details and location information
def show_post_details(post_text):
    # Create a new window for the detailed post information
    detail_window = tk.Toplevel(window)
    detail_window.title("Post Details")
    detail_window.geometry('600x400')

    # Predict locations in the post text
    locations = predict_ner(post_text, model, tokenizer, label_names)

    # Add post text in the new window
    post_label = tk.Label(detail_window, text=f"Post: {post_text}", wraplength=500, font=("Arial", 12))
    post_label.pack(pady=10)

    # Show location information
    if locations:
        locations_label = tk.Label(detail_window, text=f"Identified Locations: {', '.join(locations)}", font=("Arial", 10, "italic"))
        locations_label.pack(pady=10)
    else:
        locations_label = tk.Label(detail_window, text="No locations identified.", font=("Arial", 10, "italic"))
        locations_label.pack(pady=10)

# Function to display the posts in a two-column layout
def show_locations():
    print("Fetching posts...")  # Debugging message
    posts = load_posts_from_csv(csv_file_path)
    print(f"Number of posts loaded: {len(posts)}")  # Debugging message

    # Clear previous posts
    for widget in post_frame.winfo_children():
        widget.destroy()

    # Fetch random posts and display them
    random_posts = fetch_random_posts(posts)
    if not random_posts:
        print("No posts to display.")  # Debugging message
        return

    for i, post_text in enumerate(random_posts):
        # Create a new frame for each post (like a card in Instagram)
        post_card = tk.Frame(post_frame, bd=1, relief="solid", padx=10, pady=10)
        post_card.grid(row=i//2, column=i%2, padx=10, pady=10, sticky="n")  # Two-column layout

        # Add an image placeholder (a square above each post)
        img_placeholder = tk.Frame(post_card, width=100, height=100, bg="gray")  # Placeholder square
        img_placeholder.pack(pady=5)

        # Show a short version of the post (first 50 characters)
        short_post_text = post_text[:50] + "..." if len(post_text) > 50 else post_text
        post_label = tk.Label(post_card, text=f"Post: {short_post_text}", anchor="w", justify="left", wraplength=250, font=("Arial", 12))
        post_label.pack(anchor="w")

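        # Add click event to show full post details in a new window.
        # Tkinter does not forward clicks on child widgets to the parent frame,
        # so bind the card and every widget inside it.
        for widget in (post_card, img_placeholder, post_label):
            widget.bind("<Button-1>", lambda e, text=post_text: show_post_details(text))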

# Create the main GUI window
window = tk.Tk()
window.title("NER Location Finder")
window.geometry('800x600')

# Create a frame to hold the title and center it
title_frame = tk.Frame(window)
title_frame.pack(pady=10)

# Add a title for the application and center it
title_label = tk.Label(title_frame, text="Randomly selected Instagram Posts", font=("Arial", 18, "bold"))
title_label.pack()

# Create a scrollable frame to hold all posts
canvas = tk.Canvas(window)
scrollbar = tk.Scrollbar(window, orient="vertical", command=canvas.yview)
post_frame = tk.Frame(canvas)

# Configure the scrollbar
post_frame.bind(
    "<Configure>",
    lambda e: canvas.configure(scrollregion=canvas.bbox("all"))
)
canvas.create_window((0, 0), window=post_frame, anchor="nw")
canvas.configure(yscrollcommand=scrollbar.set)

canvas.pack(side="left", fill="both", expand=True)
scrollbar.pack(side="right", fill="y")

# Add a button to fetch posts and display identified locations
fetch_button = tk.Button(window, text="Fetch and Show Locations", command=show_locations, font=("Arial", 14))
fetch_button.pack(pady=10)

# Start the GUI event loop
window.mainloop()


Fetching posts...
CSV loaded successfully.
Number of posts loaded: 1957
