# Centralized Workflows Dashboard

This notebook is the central entry point for the project's data scraping and processing workflows. Run the cells in each section to trigger the corresponding pipeline.

## Environment Setup
Run this cell to set up the environment. It adds the current working directory to `sys.path` so that local imports resolve.

In [None]:
import os
import sys
from pathlib import Path

# Ensure local imports work
current_dir = os.getcwd()
if current_dir not in sys.path:
    sys.path.append(current_dir)

# Check paths
print(f"Current Working Directory: {current_dir}")

---

## 1. List to Scrape to Model to DB

**Description**: Reads `companies_list.csv`, scrapes the websites, builds the data model, and saves to MongoDB.

**Input**: `src/workflows/companies_list.csv`  
**Output**: `src/workflows/companies_list_out.csv` and MongoDB records.

In [None]:
# Execute Workflow 1
%run list_to_scrap_to_model_to_DB.py

---

## 2. Process TheCrowdSpace Companies

**Description**: Reads `companies_list.csv`, uses the 'Link_Thecrowdspace' column to scrape profile data, and updates the `theCrowdSpace` field in MongoDB.

**Input**: `src/workflows/companies_list.csv`

In [None]:
# Execute Workflow 2
%run process_thecrowdspace_companies.py
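
Because this workflow depends on a specific CSV column, it can help to check the header before running it. The following is a minimal sketch, not part of the project: `missing_columns` is a hypothetical helper, and it assumes the CSV has a header row and is UTF-8 encoded (with or without a BOM).

```python
import csv
from pathlib import Path


def missing_columns(csv_path, required):
    """Return the required column names that are absent from the CSV header."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Input CSV not found: {path}")
    # utf-8-sig transparently drops a BOM if a spreadsheet tool added one.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

For example, `missing_columns("src/workflows/companies_list.csv", ["Link_Thecrowdspace"])` returns an empty list when the column is present.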

---

## 3. Update Company Operational Status

**Description**: Updates `operational.status` based on TheCrowdSpace data and the manual 'Active' column in `companies_list.csv`.

In [None]:
# Execute Workflow 3
%run update_company_operational_status.py

---

## 4. Classify Datasource URL Roles for Non-Inactive Platforms

**Description**: Retrieves all platforms where `operational.status` is NOT "inactive" and classifies their datasources using keyword analysis.
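
The "not inactive" selection can be expressed as a MongoDB query document. The sketch below is illustrative only: `non_inactive_filter` and `is_non_inactive` are hypothetical helpers, and the actual query used by `classify_active_companies_datasources` is not shown in this notebook. Note that `$ne` also matches documents in which the field is missing entirely.

```python
def non_inactive_filter(status_field="operational.status"):
    """Build a MongoDB filter matching documents whose status is not 'inactive'."""
    return {status_field: {"$ne": "inactive"}}


def is_non_inactive(doc):
    """Pure-Python mirror of the filter, handy for checking documents already in memory."""
    operational = doc.get("operational") or {}
    return operational.get("status") != "inactive"
```

With a pymongo collection, the filter would be passed as `collection.find(non_inactive_filter())`.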

**Configuration**: Set `TARGET_ROLES` to a list of roles to classify (e.g., `['press_release', 'blog']`), or to `None` to classify all available roles.

In [None]:
# --- PARAMETERS ---
TARGET_ROLES = ["official_site", "store_listing"]  # Other roles include "official_social_profile"; set to None to classify all roles
# ------------------

from classify_active_companies_datasources import process_active_companies

print(f"Executing Classification with Target Roles: {TARGET_ROLES}")
process_active_companies(target_roles=TARGET_ROLES)
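
To illustrate what keyword-based classification with a role filter might look like, here is a minimal sketch. It is not the implementation behind `process_active_companies`: the `ROLE_KEYWORDS` table and the `classify_url` function are hypothetical, and the keywords are placeholders chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical keyword table; the real rules live in the project module.
ROLE_KEYWORDS = {
    "store_listing": ("play.google.com", "apps.apple.com"),
    "official_social_profile": ("linkedin.com", "twitter.com", "facebook.com"),
    "blog": ("/blog",),
    "press_release": ("/press", "/news"),
}


def classify_url(url, target_roles=None):
    """Return the first role whose keywords appear in the URL, or None."""
    parsed = urlparse(url.lower())
    haystack = parsed.netloc + parsed.path
    for role, keywords in ROLE_KEYWORDS.items():
        # Skip roles the caller did not ask for; None means "all roles".
        if target_roles is not None and role not in target_roles:
            continue
        if any(keyword in haystack for keyword in keywords):
            return role
    return None
```

Passing `target_roles=None` checks every role in the table, mirroring the "`None` for all" convention of `TARGET_ROLES`.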

---

## 5. Manual Platform Operational Status Update

Use this section to manually update the operational status of a platform when automatic workflows require intervention.

In [None]:
from src.DB.mongo import get_db
from datetime import datetime

def get_platforms_collection():
    db = get_db()
    return db["platforms"]

### Input Manual Data

In [None]:
# --- CONFIGURATION --- #
TARGET_SLUG = "rondainvest"         # Replace with the actual slug
NEW_STATUS = "active"               # Options: "active", "inactive", "uncertain"
UPDATE_NOTES = "Manually updated."
# --------------------- #
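
Since `NEW_STATUS` is typed by hand, a small guard can catch typos before anything is written to the database. This is a sketch under stated assumptions: `ALLOWED_STATUSES` is taken from the options listed in the configuration comment, and `validate_status` is a hypothetical helper that does not exist in the project.

```python
ALLOWED_STATUSES = ("active", "inactive", "uncertain")


def validate_status(status):
    """Return the normalized status, or raise ValueError if it is not allowed."""
    normalized = str(status).strip().lower()
    if normalized not in ALLOWED_STATUSES:
        raise ValueError(
            f"Invalid status {status!r}; expected one of {ALLOWED_STATUSES}"
        )
    return normalized
```

Calling `NEW_STATUS = validate_status(NEW_STATUS)` before the update cell would stop the run early on an unexpected value.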

In [None]:
# EXECUTE UPDATE
collection = get_platforms_collection()

if not TARGET_SLUG:
    print("HINT: Please set a valid TARGET_SLUG in the cell above.")
else:
    platform_doc = collection.find_one({"slug": TARGET_SLUG})

    if not platform_doc:
        print(f"❌ Platform with slug '{TARGET_SLUG}' NOT FOUND in database.")
    else:
        print(f"✅ Found platform: {platform_doc.get('name', 'Unknown Name')} (ID: {platform_doc.get('_id')})")
        
        current_op = platform_doc.get("operational", {})
        print(f"   Current Status: {current_op.get('status', 'Not Set')}")
        print(f"   Current Notes:  {current_op.get('notes', 'None')}")

        # Check if update is actually needed
        if current_op.get("status") == NEW_STATUS and current_op.get("notes") == UPDATE_NOTES:
            print(f"\nℹ️  No change needed. Status is already '{NEW_STATUS}' with the same note.")
        else:
            updated_at = datetime.utcnow().isoformat() + "Z"
            update_result = collection.update_one(
                {"_id": platform_doc["_id"]},
                {
                    "$set": {
                        "operational.status": NEW_STATUS,
                        "operational.notes": UPDATE_NOTES,
                        "operational.updatedAt": updated_at
                    }
                }
            )
            if update_result.modified_count > 0:
                print(f"\n🚀 SUCCESS: Updated status to '{NEW_STATUS}'.")
                print(f"   Note: {UPDATE_NOTES}")
            else:
                print("\n⚠️  Update executed but no document modified (unexpected).")

---

## 6. Extract Mobile Apps from Store Links in dataSources

**Description**: Extracts `store_listing` URLs from platforms that are not inactive, processes them to find valid mobile app store links (Apple/Google), and updates the `mobileApps` field in the database.

**Input**: Database platforms (`operational.status` != "inactive")  
**Output**: Updates `mobileApps` field in MongoDB platforms collection.

In [None]:
# Execute Workflow 6
%run store_links_from_datasource_to_db.py
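
As a rough illustration of recognizing Apple and Google store links, the sketch below parses a URL and pulls out the app identifier. It is not the logic in `store_links_from_datasource_to_db.py`: `parse_store_link` is a hypothetical helper, the accepted hostnames are an assumption, and the shape of the `mobileApps` field is not specified in this notebook.

```python
import re
from urllib.parse import parse_qs, urlparse


def parse_store_link(url):
    """Return (store, app_id) for an Apple or Google store URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "play.google.com":
        # Google Play carries the package name in the "id" query parameter.
        app_id = parse_qs(parsed.query).get("id", [None])[0]
        return ("google", app_id) if app_id else None
    if host in ("apps.apple.com", "itunes.apple.com"):
        # App Store paths end in a segment such as ".../id123456789".
        match = re.search(r"/id(\d+)", parsed.path)
        return ("apple", match.group(1)) if match else None
    return None
```

URLs that match neither store, or that lack an identifier, yield `None` so the caller can skip them.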