# ⚽ Tactical Analysis Bot

## 📋 What This Bot Does
This bot answers questions about football tactical analysis using **your match data**.
It returns the **closest matching fact** from your JSON file word for word. The embedding model is only used to find that fact; no text is generated.

## 🚀 How to Use
1. **Run Cell 1** - Installs required packages (takes 1-2 minutes)
2. **Run Cell 2** - Sets up the environment
3. **Run Cell 3** - Upload your `team_shape_summary.json` file
4. **Run Cells 4-5** - Processes your data (creates searchable chunks, then embeds and stores them in ChromaDB)
5. **Run Cell 6** - Creates the query engine
6. **Run Cell 7+8** - Installs Gradio and launches the chat interface

## 💡 After Running
- Click the Gradio link that appears
- Ask questions like:
  - "What formation did team B use in final attack?"
  - "How wide was team A's defense?"

## ⏱️ Expected Time
- Total runtime: 3-5 minutes
- Public share link expires after 1 week (per the Gradio launch message) and only works while the notebook is running

## 🎯 Features
- ✅ Exact answers only - no hallucination
- ✅ Confidence scores
- ✅ Source attribution
- ✅ Beautiful chat interface
- ✅ Free and runs in Colab

## Extra Notes
- When the output of Cell 7+8 shows a line starting with `* Running on public URL`, open that link to use the full bot, including the conversation history, for a better experience.

In [27]:
# ========================================
# CELL 1: Install Dependencies
# ========================================
!pip install fastapi uvicorn pyngrok streamlit langchain langchain-community chromadb sentence-transformers pandas nest-asyncio
!pip install transformers torch



In [20]:
# ========================================
# CELL 2: Setup and Imports
# ========================================
import json
import numpy as np
from pydantic import BaseModel
from typing import List, Dict, Any, Optional
import nest_asyncio
import threading
import time
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.config import Settings
import hashlib
import os

# Apply nest_asyncio to allow running in Colab
nest_asyncio.apply()

print("✅ Setup complete")

✅ Setup complete


In [3]:
# ========================================
# CELL 3: Upload Your JSON File
# ========================================
from google.colab import files
print("📂 Please upload your team_shape_summary.json file:")
uploaded = files.upload()

# Load and verify
filename = list(uploaded.keys())[0]
with open(filename, 'r') as f:
    data = json.load(f)

print("✅ JSON loaded successfully!")
print(f"Teams found: {list(data['summary'].keys())}")

📂 Please upload your team_shape_summary.json file:


Saving team_shape_summary.json to team_shape_summary.json
✅ JSON loaded successfully!
Teams found: ['team_B', 'team_A']
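
Cell 4 below reads the file as `data['summary'][team][phase][stage]`. A minimal sketch of one entry in that layout, inferred from the keys Cell 4 accesses, may help when preparing your own file. The formation and the first four shape values are taken from the sample output later in this notebook. The stretch index, line values, gap names, and frame count are hypothetical placeholders, and `missing_shape_keys` is an illustrative helper that is not part of the notebook.

```python
# Sketch of the layout Cell 4 expects; values marked "hypothetical" are placeholders.
example = {
    "summary": {
        "team_B": {
            "attack": {
                "progression": {
                    "formation": "2-4-4",
                    "frames_count": 120,  # hypothetical
                    "shape": {
                        "width_avg": 34.79,
                        "depth_avg": 51.38,
                        "h_spread_avg": 11.37,
                        "v_spread_avg": 15.60,
                        "stretch_index_avg": 0.5,  # hypothetical
                    },
                    "lines": {
                        # all line values below are hypothetical
                        "defensive": {"line_x_position_avg": 30.0, "width_avg": 20.0},
                        "midfield": {"line_x_position_avg": 45.0, "width_avg": 30.0},
                        "attacking": {"line_x_position_avg": 60.0, "width_avg": 25.0},
                        # gap names are hypothetical; Cell 4 uses whatever keys it finds
                        "line_gaps_x": {"def_mid": 15.0, "mid_att": 15.0},
                    },
                }
            }
        }
    }
}

# Cell 4 indexes these keys directly, so a missing one raises KeyError there.
REQUIRED_SHAPE_KEYS = (
    "width_avg", "depth_avg", "h_spread_avg", "v_spread_avg", "stretch_index_avg",
)

def missing_shape_keys(data):
    """Return (team, phase, stage, key) tuples for absent shape metrics."""
    problems = []
    for team, team_data in data["summary"].items():
        for phase, phase_data in team_data.items():
            for stage, info in phase_data.items():
                shape = info.get("shape")
                if shape is None:
                    continue  # Cell 4 skips stages that have no shape block
                for key in REQUIRED_SHAPE_KEYS:
                    if key not in shape:
                        problems.append((team, phase, stage, key))
    return problems

print(missing_shape_keys(example))  # an empty list means every shape block is complete
```

Running the same check on the uploaded `data` before Cell 4 would point at the exact team, phase, and stage that needs fixing.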


In [21]:
# ========================================
# CELL 4: Create Searchable Chunks (No LLM)
# ========================================
class JSONChunker:
    """Convert JSON to searchable chunks with metadata"""

    def __init__(self, json_data):
        self.data = json_data
        self.chunks = []
        self.metadata = []

    def create_chunks(self):
        """Extract all facts as chunks"""
        for team, team_data in self.data['summary'].items():
            for phase, phase_data in team_data.items():
                for stage, stage_info in phase_data.items():
                    # Create base metadata
                    base_meta = {
                        'team': team,
                        'phase': phase,
                        'stage': stage,
                        'formation': stage_info.get('formation', 'N/A')
                    }

                    # 1. Formation chunk
                    self.chunks.append(f"Team {team} during {phase} {stage} played a {stage_info.get('formation', 'unknown')} formation")
                    self.metadata.append({**base_meta, 'type': 'formation'})

                    # 2. Shape metrics
                    if 'shape' in stage_info:
                        shape = stage_info['shape']
                        metrics = [
                            f"width average {shape['width_avg']:.2f}",
                            f"depth average {shape['depth_avg']:.2f}",
                            f"horizontal spread {shape['h_spread_avg']:.2f}",
                            f"vertical spread {shape['v_spread_avg']:.2f}",
                            f"stretch index {shape['stretch_index_avg']:.3f}"
                        ]
                        for metric in metrics:
                            self.chunks.append(f"Team {team} {phase} {stage} had {metric}")
                            self.metadata.append({**base_meta, 'type': 'shape', 'metric': metric})

                    # 3. Line positions
                    if 'lines' in stage_info:
                        lines = stage_info['lines']
                        line_types = ['defensive', 'midfield', 'attacking']
                        for line in line_types:
                            if line in lines:
                                pos = lines[line].get('line_x_position_avg')
                                width = lines[line].get('width_avg')
                                # Compare against None so a legitimate value of 0.0 is not skipped
                                if pos is not None:
                                    self.chunks.append(f"Team {team} {phase} {stage} {line} line at x={pos:.2f}")
                                    self.metadata.append({**base_meta, 'type': 'line', 'line': line, 'value': pos})
                                if width is not None:
                                    self.chunks.append(f"Team {team} {phase} {stage} {line} width {width:.2f}")
                                    self.metadata.append({**base_meta, 'type': 'line_width', 'line': line, 'value': width})

                        # 4. Gaps
                        if 'line_gaps_x' in lines:
                            gaps = lines['line_gaps_x']
                            for gap_name, gap_value in gaps.items():
                                self.chunks.append(f"Team {team} {phase} {stage} {gap_name} = {gap_value:.2f}")
                                self.metadata.append({**base_meta, 'type': 'gap', 'gap': gap_name, 'value': gap_value})

                    # 5. Frame count
                    if 'frames_count' in stage_info:
                        self.chunks.append(f"Team {team} {phase} {stage} analyzed over {stage_info['frames_count']} frames")
                        self.metadata.append({**base_meta, 'type': 'frames', 'value': stage_info['frames_count']})

        print(f"✅ Created {len(self.chunks)} searchable chunks")
        return self.chunks, self.metadata

# Create chunks
chunker = JSONChunker(data)
chunks, metadata = chunker.create_chunks()

# Show sample
print("\n📝 Sample chunks:")
for i in range(min(5, len(chunks))):
    print(f"  {i+1}. {chunks[i]}")

✅ Created 180 searchable chunks

📝 Sample chunks:
  1. Team team_B during attack progression played a 2-4-4 formation
  2. Team team_B attack progression had width average 34.79
  3. Team team_B attack progression had depth average 51.38
  4. Team team_B attack progression had horizontal spread 11.37
  5. Team team_B attack progression had vertical spread 15.60
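
The total of 180 chunks is consistent with the per-stage breakdown in `create_chunks`. The sketch below recomputes it, assuming 2 teams, 2 phases, and 3 stages (as listed in the Statistics tab of the interface), that every stage has all three lines with both a position and a width, and that `line_gaps_x` holds 2 gaps per stage. The gap count is an assumption chosen because it matches the total; the actual file may differ.

```python
# Chunks emitted per (team, phase, stage) by create_chunks, under the stated assumptions.
per_stage = {
    "formation": 1,   # one formation sentence
    "shape": 5,       # width, depth, horizontal spread, vertical spread, stretch index
    "line": 3,        # x position of the defensive, midfield, and attacking lines
    "line_width": 3,  # width of the same three lines
    "gap": 2,         # assumed number of entries in line_gaps_x
    "frames": 1,      # frame count sentence
}

teams, phases, stages = 2, 2, 3
chunks_per_stage = sum(per_stage.values())
total = teams * phases * stages * chunks_per_stage

print(chunks_per_stage, total)  # 15 180
```

If your own file yields a different total, comparing it with this breakdown is a quick way to spot stages that lack a `shape` or `lines` block.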


In [22]:
# ========================================
# CELL 5: Create Vector Database (Embeddings Only)
# ========================================
# Load embedding model (lightweight)
print("\n🔤 Loading embedding model...")
model = SentenceTransformer('all-MiniLM-L6-v2')  # 80MB only
print("✅ Model loaded")

# Create embeddings
print("📊 Creating embeddings...")
embeddings = model.encode(chunks, show_progress_bar=True)
print(f"✅ Created {len(embeddings)} embeddings of size {embeddings.shape[1]}")

# Setup ChromaDB
import chromadb
from chromadb.utils import embedding_functions

# Create persistent client
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Recreate the collection from scratch so re-running this cell starts clean
collection_name = "tactical_analysis"
try:
    chroma_client.delete_collection(collection_name)
except Exception:
    # Nothing to delete on the first run
    pass

collection = chroma_client.create_collection(
    name=collection_name,
    metadata={"hnsw:space": "cosine"}
)

# Add documents to Chroma
print("💾 Storing in vector database...")
ids = [hashlib.md5(chunk.encode()).hexdigest()[:16] for chunk in chunks]

# Convert metadata to strings for Chroma
chroma_metadatas = []
for m in metadata:
    meta_dict = {}
    for k, v in m.items():
        if v is not None:
            meta_dict[k] = str(v)
    chroma_metadatas.append(meta_dict)

# Add in batches
batch_size = 100
for i in range(0, len(chunks), batch_size):
    end_idx = min(i + batch_size, len(chunks))
    collection.add(
        embeddings=embeddings[i:end_idx].tolist(),
        documents=chunks[i:end_idx],
        metadatas=chroma_metadatas[i:end_idx],
        ids=ids[i:end_idx]
    )
    print(f"  Added batch {i//batch_size + 1}/{(len(chunks)-1)//batch_size + 1}")

print(f"✅ Vector database ready with {collection.count()} entries")


🔤 Loading embedding model...



BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


✅ Model loaded
📊 Creating embeddings...



✅ Created 180 embeddings of size 384
💾 Storing in vector database...
  Added batch 1/2
  Added batch 2/2
✅ Vector database ready with 180 entries
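
Cell 5 derives each ID from the first 16 hex characters of the MD5 hash of the chunk text, so two identical chunk strings would share an ID. A small pre-check, sketched here with a hypothetical helper `find_duplicate_ids`, can surface that before calling `collection.add`. How a vector store reacts to duplicate IDs varies, so treat this as a precaution rather than a required step.

```python
import hashlib
from collections import defaultdict

def chunk_id(text):
    """Same ID scheme as Cell 5: first 16 hex characters of the MD5 digest."""
    return hashlib.md5(text.encode()).hexdigest()[:16]

def find_duplicate_ids(chunks):
    """Map each ID that occurs more than once to the chunk indices sharing it."""
    seen = defaultdict(list)
    for index, text in enumerate(chunks):
        seen[chunk_id(text)].append(index)
    return {cid: idxs for cid, idxs in seen.items() if len(idxs) > 1}

# Two sample sentences from the Cell 4 output, with the first repeated on purpose.
sample = [
    "Team team_B attack progression had width average 34.79",
    "Team team_B attack progression had depth average 51.38",
    "Team team_B attack progression had width average 34.79",
]

duplicates = find_duplicate_ids(sample)
print(duplicates)  # one ID mapped to indices [0, 2]
```

With the notebook's data, `find_duplicate_ids(chunks)` is expected to return an empty dict because every chunk names a distinct team, phase, stage, and metric.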


In [23]:
# ========================================
# CELL 6: Create Direct Query Function
# ========================================
class TacticalQueryBot:
    """Direct query bot - no API needed"""

    def __init__(self, model, collection, chunks, metadata):
        self.model = model
        self.collection = collection
        self.chunks = chunks
        self.metadata = metadata

    def ask(self, question, top_k=5):
        """Answer question directly"""
        # Encode the question
        question_embedding = self.model.encode([question])[0]

        # Search in Chroma
        results = self.collection.query(
            query_embeddings=[question_embedding.tolist()],
            n_results=top_k
        )

        if not results['documents'][0]:
            return {
                'answer': "No relevant information found.",
                'sources': [],
                'confidence': 0.0
            }

        # Get best match
        best_doc = results['documents'][0][0]
        best_distance = results['distances'][0][0]
        # Cosine distance ranges from 0 to 2, so this score can drop below 0 for unrelated text
        confidence = 1.0 - best_distance

        # Format sources
        sources = []
        for i, (doc, meta, dist) in enumerate(zip(
            results['documents'][0],
            results['metadatas'][0],
            results['distances'][0]
        )):
            sources.append({
                'rank': i + 1,
                'text': doc,
                'metadata': meta,
                'similarity': 1.0 - dist
            })

        return {
            'answer': best_doc,
            'sources': sources,
            'confidence': confidence
        }

# Create the bot instance
bot = TacticalQueryBot(model, collection, chunks, metadata)

# Test it
test_result = bot.ask("What formation did team B use in final attack?")
print(f"\n✅ Test query result: {test_result['answer']}")
print("✅ Direct query bot ready!")


✅ Test query result: Team team_B during attack final_attack played a 2-3-5 formation
✅ Direct query bot ready!
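
The `confidence` value is `1.0 - distance`, and the collection was created with `hnsw:space` set to `cosine`. Assuming the reported distance is the usual cosine distance (1 minus cosine similarity), the sketch below shows with toy 2-D vectors how the score behaves. The vectors and helper names are illustrative only; real embeddings from `all-MiniLM-L6-v2` have 384 dimensions.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def confidence(distance):
    """Same conversion as TacticalQueryBot.ask."""
    return 1.0 - distance

same = confidence(cosine_distance([1.0, 0.0], [2.0, 0.0]))        # same direction
orthogonal = confidence(cosine_distance([1.0, 0.0], [0.0, 3.0]))  # unrelated direction
opposite = confidence(cosine_distance([1.0, 0.0], [-1.0, 0.0]))   # opposite direction

print(same, orthogonal, opposite)  # 1.0 0.0 -1.0
```

Because the bot always returns the nearest chunk, a low or negative score suggests the top hit may not actually answer the question, so applying a minimum threshold before showing an answer could be worth considering.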


In [26]:
# ========================================
# CELL 7+8 COMBINED: Run Gradio Directly
# ========================================
!pip install gradio -q

import gradio as gr

# Create the query function
def query_bot(question, history=None):
    # history is accepted but not used; None avoids a shared mutable default
    if not question:
        return "Please ask a question."

    # Get answer from bot
    result = bot.ask(question)

    # Format nice response with markdown
    answer_text = f"### 🤖 Answer\n{result['answer']}\n\n"
    answer_text += f"**Confidence:** {result['confidence']:.1%}\n\n"

    if result['sources']:
        answer_text += "### 📚 Sources\n"
        for i, source in enumerate(result['sources'][:3], 1):
            answer_text += f"**{i}. {source['text']}**\n"
            answer_text += f"   *Match: {source['similarity']:.1%}*\n\n"

    return answer_text

# Create beautiful interface
with gr.Blocks(title="⚽ Tactical Analysis Bot", theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
    # ⚽ Tactical Analysis Assistant

    Ask anything about the match data and get **exact answers** - no generation, just facts!
    """)

    with gr.Tab("💬 Chat"):
        chatbot = gr.Chatbot(label="Conversation", height=500)
        msg = gr.Textbox(
            label="Your Question",
            placeholder="e.g., What formation did team B use in final attack?",
            lines=2
        )

        with gr.Row():
            clear = gr.Button("🗑️ Clear Chat")
            submit = gr.Button("🚀 Ask", variant="primary")

    with gr.Tab("📊 Statistics"):
        gr.Markdown(f"### Database Statistics")
        gr.Markdown(f"- **Total facts:** {len(chunks)}")
        gr.Markdown(f"- **Teams:** team_A, team_B")
        gr.Markdown(f"- **Phases:** attack, defense")
        gr.Markdown(f"- **Stages:** build_up, progression, final_attack")

        # Show sample data
        if len(chunks) > 0:
            gr.Markdown("### Sample Facts")
            for i in range(min(5, len(chunks))):
                gr.Markdown(f"- {chunks[i]}")

    with gr.Tab("❓ Examples"):
        examples = [
            "What formation did team B use in final attack?",
            "How wide was team A's defense during progression?",
            "Compare gaps between lines for both teams",
            "What was team B's shape in build-up?",
            "How many frames for team A's defensive progression?",
            "What's the stretch index for team B in attack?",
            "Where was team A's defensive line in build-up?"
        ]

        for ex in examples:
            gr.Button(ex, size="sm").click(
                lambda x=ex: x,
                outputs=msg
            )

    # Handle chat
    def respond(message, chat_history):
        if not message:
            return "", chat_history
        response = query_bot(message)
        chat_history.append((message, response))
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])
    submit.click(respond, [msg, chatbot], [msg, chatbot])
    clear.click(lambda: None, None, chatbot, queue=False)

# Launch with public link
print("\n" + "="*70)
print("🚀 Launching your Tactical Analysis Bot...")
print("="*70)

# Launch with share=True for public URL
demo.launch(share=True, debug=False, server_name="0.0.0.0")

# This prints a public URL such as https://xxxxx.gradio.live
# (per the launch message below, the share link expires in 1 week)




🚀 Launching your Tactical Analysis Bot...
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://1038f6e38b746669d8.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
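
The chat handler can also be exercised without launching Gradio, which is handy for quick checks. The sketch below re-implements the `respond` logic against a hypothetical `StubBot` whose canned answer is taken from the Cell 6 test output. The confidence value is made up for illustration, and `format_answer` is a simplified stand-in for `query_bot`, not the notebook's exact formatting.

```python
class StubBot:
    """Hypothetical stand-in for TacticalQueryBot that returns one canned result."""

    def ask(self, question, top_k=5):
        text = "Team team_B during attack final_attack played a 2-3-5 formation"
        return {
            "answer": text,
            "confidence": 0.75,  # made-up value for illustration
            "sources": [{"rank": 1, "text": text, "metadata": {}, "similarity": 0.75}],
        }

def format_answer(result):
    """Simplified version of query_bot's markdown formatting."""
    parts = [
        f"### Answer\n{result['answer']}\n",
        f"**Confidence:** {result['confidence']:.1%}\n",
    ]
    for i, source in enumerate(result["sources"][:3], 1):
        parts.append(f"**{i}. {source['text']}** (match: {source['similarity']:.1%})")
    return "\n".join(parts)

def respond(message, chat_history, bot):
    """Append a (question, answer) pair to the history, ignoring empty input."""
    if not message:
        return "", chat_history
    chat_history.append((message, format_answer(bot.ask(message))))
    return "", chat_history

history = []
_, history = respond("", history, StubBot())  # empty input leaves the history unchanged
_, history = respond("What formation did team B use in final attack?", history, StubBot())

print(len(history))   # 1
print(history[0][1])
```

Swapping `StubBot()` for the real `bot` from Cell 6 runs the same path against the actual vector database.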


