# 🏛️ Italian Parliament Speech Analyzer

This notebook runs the complete analysis pipeline for Italian Parliament speeches.

## What this notebook does:
1. **Scrapes** speeches from senato.it and camera.it
2. **Generates embeddings** using Sentence Transformers
3. **Computes analytics** (identity, relations, temporal, sentiment)
4. **Exports JSON** files for the frontend visualization

---

⚠️ **GPU Recommended**: Enable a GPU runtime for faster embedding generation.

`Runtime → Change runtime type → T4 GPU`

## 1. Setup Environment

In [2]:
# Clone the repository
!git clone https://github.com/WeridFire/Parliament-Speech-Analyzer.git

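Re-running the clone cell fails with `fatal: destination path ... already exists` once the repository is present. Below is a minimal sketch of an idempotent alternative, assuming `git` is on the PATH (as it is on Colab); `clone_if_missing` is a hypothetical helper, not part of the repository.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/WeridFire/Parliament-Speech-Analyzer.git"

def clone_if_missing(url: str, dest: str) -> bool:
    """Clone `url` into `dest` unless `dest` already exists.

    Returns True if a clone was performed, False if it was skipped.
    """
    if Path(dest).exists():
        # Directory already present: skip instead of letting git fail.
        return False
    # check=True raises CalledProcessError if the clone itself fails.
    subprocess.run(["git", "clone", url, dest], check=True)
    return True
```

Usage on Colab: `clone_if_missing(REPO_URL, "/content/Parliament-Speech-Analyzer")`.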


In [6]:
# Install Python dependencies
%cd /content/Parliament-Speech-Analyzer
!pip install -r requirements.txt
# Install spaCy for advanced keyword extraction
!pip install -q spacy
!python -m spacy download it_core_news_sm -q
# spaCy may ask you to restart the runtime so the new model is picked up

print("✅ Core dependencies installed!")


In [1]:
# Optional: Install transformer sentiment (more accurate, takes longer)
# Comment out the line below if you don't want to use --transformer-sentiment

!pip install -q transformers torch

USE_TRANSFORMER_SENTIMENT = True  # Set to False if you skipped the install above

## 2. Configuration

Choose your data source and analysis settings.

In [None]:
# Configuration
DATA_SOURCE = "both"  # Options: "senate", "camera", "both"
FORCE_REFETCH = False  # Set to True to re-scrape from parliament websites
FORCE_REEMBED = False  # Set to True to regenerate embeddings

print(f"üìã Configuration:")
print(f"   Data source: {DATA_SOURCE}")
print(f"   Force refetch: {FORCE_REFETCH}")
print(f"   Force reembed: {FORCE_REEMBED}")
print(f"   Transformer sentiment: {USE_TRANSFORMER_SENTIMENT}")
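`DATA_SOURCE` accepts only the three options listed in the comment above. A small guard, sketched here and not part of the repository, can catch a typo before the long-running pipeline starts; `validate_source` is a hypothetical name.

```python
VALID_SOURCES = {"senate", "camera", "both"}

def validate_source(source: str) -> str:
    """Return `source` unchanged if it is a supported option, else raise ValueError."""
    if source not in VALID_SOURCES:
        raise ValueError(
            f"DATA_SOURCE must be one of {sorted(VALID_SOURCES)}, got {source!r}"
        )
    return source
```

Calling `validate_source(DATA_SOURCE)` at the end of the configuration cell stops the notebook early on values such as `"senato"`.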

## 3. Run the Analysis Pipeline

This step:
- Scrapes speeches (the first run takes ~10-15 minutes)
- Generates embeddings (~5 min with a GPU, ~15 min on CPU)
- Computes all analytics
- Exports JSON files

In [None]:
# Build the command
cmd = f"python backend/export_data.py --source {DATA_SOURCE}"

if FORCE_REFETCH:
    cmd += " --refetch"
if FORCE_REEMBED:
    cmd += " --reembed"
if USE_TRANSFORMER_SENTIMENT:
    cmd += " --transformer-sentiment"

print(f"üöÄ Running: {cmd}\n")
print("="*60)

# Run the pipeline
!{cmd}
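The same flag assembly can be written as an argument list for `subprocess.run`, which avoids shell quoting and makes a failed run raise an error instead of silently continuing to the next cell. This is a sketch: the flags are the ones used in the cell above, `build_pipeline_args` and `run_pipeline` are hypothetical names, and `sys.executable` stands in for `python` so the notebook's own interpreter is used.

```python
import subprocess
import sys

def build_pipeline_args(source, refetch=False, reembed=False, transformer_sentiment=False):
    """Build the argv list for backend/export_data.py from the notebook settings."""
    args = [sys.executable, "backend/export_data.py", "--source", source]
    if refetch:
        args.append("--refetch")
    if reembed:
        args.append("--reembed")
    if transformer_sentiment:
        args.append("--transformer-sentiment")
    return args

def run_pipeline(args):
    """Run the pipeline, raising CalledProcessError on a non-zero exit code."""
    subprocess.run(args, check=True)
```

Usage, from the repository root set by `%cd` in Step 1: `run_pipeline(build_pipeline_args(DATA_SOURCE, FORCE_REFETCH, FORCE_REEMBED, USE_TRANSFORMER_SENTIMENT))`.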

## 4. Verify Output

Check that the JSON files were generated successfully.

In [None]:
import os
import json

output_dir = "frontend/public"
# Named json_files to avoid shadowing google.colab.files, imported in Step 5
json_files = sorted(f for f in os.listdir(output_dir) if f.endswith('.json'))

print("📁 Generated files:")
for name in json_files:
    path = os.path.join(output_dir, name)
    size_mb = os.path.getsize(path) / (1024 * 1024)

    # Load and show stats
    with open(path, 'r', encoding='utf-8') as fh:
        data = json.load(fh)

    n_speeches = len(data.get('speeches', []))
    n_deputies = len(data.get('deputies', []))

    print(f"   ✅ {name}: {size_mb:.2f} MB | {n_speeches} speeches | {n_deputies} deputies")
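Printed sizes are easy to skim past. A stricter check is sketched below, assuming (as the cell above does) that each export is a JSON object with top-level `speeches` and `deputies` lists; `find_empty_exports` is a hypothetical helper that returns the names of files where either list is empty or missing.

```python
import json
import os

def find_empty_exports(output_dir):
    """Return sorted names of .json files in output_dir with no speeches or no deputies."""
    problems = []
    for name in sorted(os.listdir(output_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(output_dir, name), encoding="utf-8") as fh:
            data = json.load(fh)
        # A missing key is treated the same as an empty list.
        if not data.get("speeches") or not data.get("deputies"):
            problems.append(name)
    return problems
```

Usage: `assert not find_empty_exports("frontend/public"), "Some exports are empty"`.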

## 5. Download Results

Download the generated JSON files to use with the frontend.

In [None]:
from google.colab import files

# Create a zip of all output files
!cd frontend/public && zip -r ../../parliament_data.zip *.json

# Download the zip
files.download('parliament_data.zip')

print("\nüì• Download started! Extract the zip and place files in frontend/public/")

## 6. Quick Data Exploration

Preview some analytics results.

In [None]:
import json

# Load one of the output files
with open('frontend/public/camera.json', 'r', encoding='utf-8') as f:
    camera_data = json.load(f)

# Show available analytics
analytics = camera_data.get('analytics', {}).get('global', {})
print("üìä Available Analytics:")
for key in analytics.keys():
    print(f"   - {key}")
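The cell above prints only the top-level keys. To see the nested layout (for example `identity.distinctive_keywords`, which the next cell reads), a small recursive walker can list every dictionary path up to a chosen depth. This is a standard-library sketch; `key_paths` is a hypothetical helper.

```python
def key_paths(obj, max_depth=2, prefix=""):
    """Return dotted paths of nested dict keys, descending at most max_depth levels."""
    paths = []
    if not isinstance(obj, dict) or max_depth == 0:
        return paths
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        paths.append(path)
        # Recurse one level deeper; non-dict values end the branch.
        paths.extend(key_paths(value, max_depth - 1, path))
    return paths
```

Usage: `for p in key_paths(analytics, max_depth=2): print("   -", p)`. A depth of 3 would also list party names under keyword maps.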

In [None]:
# Example: Show distinctive keywords for a party
identity = analytics.get('identity', {})
keywords = identity.get('distinctive_keywords', {})

print("\nüè∑Ô∏è Distinctive Keywords by Party:")
for party, words in list(keywords.items())[:5]:  # First 5 parties
    print(f"\n   {party}:")
    print(f"   {', '.join(words[:10])}")
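This notebook does not show how `distinctive_keywords` is scored. As an illustration only, one common approach ranks each party's words by how much more frequent they are in that party's speeches than in all other parties' speeches, using add-one smoothing. This is not necessarily what `backend/export_data.py` does; `distinctive_words` and the toy token lists are hypothetical.

```python
from collections import Counter

def distinctive_words(party_tokens, top_n=10):
    """Rank each party's words by smoothed relative frequency vs. all other parties.

    party_tokens maps party name -> list of already tokenized words.
    """
    counts = {party: Counter(tokens) for party, tokens in party_tokens.items()}
    total = Counter()
    for c in counts.values():
        total.update(c)
    vocab_size = len(total)
    grand_n = sum(total.values())

    result = {}
    for party, c in counts.items():
        party_n = sum(c.values())
        rest_n = grand_n - party_n
        # Add-one smoothing keeps the ratio finite for words other parties never use.
        scores = {
            w: ((c[w] + 1) / (party_n + vocab_size))
            / ((total[w] - c[w] + 1) / (rest_n + vocab_size))
            for w in c
        }
        # Highest ratio first; ties broken alphabetically for stable output.
        result[party] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return result
```

With `{"A": ["tasse", "tasse", "lavoro", "scuola"], "B": ["ambiente", "ambiente", "lavoro", "scuola"]}`, shared words score 1.0 and each party's exclusive word ranks first.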

In [None]:
# Example: Show party affinity matrix
relations = analytics.get('relations', {})
affinity = relations.get('affinity_matrix', {})

if affinity:
    print("\nü§ù Top Party Pairs by Semantic Similarity:")
    pairs = affinity.get('pairs', [])[:10]
    for p in pairs:
        print(f"   {p['party1']} ↔ {p['party2']}: {p['similarity']:.3f}")
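How the backend computes `similarity` is not documented in this notebook. A common choice, assumed here for illustration only, is the cosine similarity between each party's mean speech embedding. The pure-Python sketch below uses toy 2-dimensional vectors (real Sentence Transformers embeddings have hundreds of dimensions); `party_pair_similarities` is hypothetical, and its output tuples mirror the `party1`/`party2`/`similarity` fields read above.

```python
import math
from itertools import combinations

def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|). Assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def party_pair_similarities(embeddings_by_party):
    """Return (party1, party2, similarity) tuples sorted from most to least similar."""
    centroids = {p: mean_vector(vs) for p, vs in embeddings_by_party.items()}
    pairs = [
        (p1, p2, cosine(centroids[p1], centroids[p2]))
        for p1, p2 in combinations(sorted(centroids), 2)
    ]
    return sorted(pairs, key=lambda t: t[2], reverse=True)
```

Averaging first means each party contributes a single vector regardless of how many speeches it has; comparing individual speeches pairwise would instead weight prolific parties more heavily.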

---

## Next Steps

1. **Download** `parliament_data.zip` from Step 5
2. **Extract** the JSON files to your local `frontend/public/` folder
3. **Run the frontend** locally:
   ```bash
   cd frontend
   npm install
   npm run dev
   ```
4. **Open** `http://localhost:5173` in your browser

---

Made with ❤️ for political science research