# DATA DOWNLOAD AND EXPLORATION

## 1. Project Context
This project aims to recognize fouls in soccer matches from multi-view videos. We use the SoccerNet-v2 dataset, which provides videos and detailed annotations for training and validation.

This first notebook covers the **download** of the dataset. The dataset is provided by _SoccerNet_ and contains over 3,000 multi-view actions for training, validation and testing, plus 273 actions for the challenge.

We start by installing the __SoccerNet__ Python package, which provides the tools needed to download the dataset.


## 2. Setup Instructions

We use a virtual environment to manage the project's dependencies. This prevents conflicts between the Python configurations of different contributors.

Before running this notebook, please ensure the following steps have been completed:

1. **Activate the Virtual Environment**:
   Make sure you have activated the virtual environment where all the dependencies are installed. If you haven't created a virtual environment yet, you can do so with the following commands:

   ```sh
   python -m venv venv
   venv\Scripts\activate        # On Windows
   source venv/bin/activate     # On macOS/Linux
   ```

2. **Make sure you have all the dependencies installed:**
   You can do so by running the following command:

   ```sh
   pip install -r requirements.txt
   ```

3. **Run Jupyter Lab once the virtual environment is active**:
   Run the following command:

   ```sh
   jupyter lab
   ```


In [None]:
!pip install SoccerNet --upgrade

## 3. Environment Variables

For this download we need to acquire a **license of use** (and the associated password) through this [link](https://docs.google.com/forms/d/e/1FAIpQLSfYFqjZNm4IgwGnyJXDPk2Ko_lZcbVtYX73w5lf6din5nxfmA/viewform). To avoid hardcoding sensitive information such as this password in the code, we store it in an environment variable. Make sure to follow the next steps carefully.

#### Steps to Set Up

1. Copy the `env_example` file in the root directory of this project and rename the copy to `.env`.
2. In `.env`, replace `your_password_here` with your actual SoccerNet password; the notebook reads it through the `SOCCERNET_PASSWORD` variable.
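A frequent mistake is to leave the placeholder in place. The sketch below is an illustrative check, not part of the project: it reads `.env` with plain Python and reports whether `SOCCERNET_PASSWORD` looks set. It assumes a simple `KEY=VALUE` format with `#` comments, and that the notebook runs one folder below the project root (as the `../data` path suggests); `parse_env_text` and `password_problem` are hypothetical helpers, and `python-dotenv` itself accepts a richer syntax that this sketch ignores.

```python
import os

# Assumption: the notebook runs one folder below the project root (as "../data" suggests)
ENV_PATH = os.path.join("..", ".env")
PLACEHOLDER = "your_password_here"


def parse_env_text(text):
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are skipped; surrounding
    whitespace and one pair of matching quotes around the value are stripped.
    """
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def password_problem(env):
    """Return a description of what is wrong, or None if the password looks set."""
    value = env.get("SOCCERNET_PASSWORD")
    if not value:
        return "SOCCERNET_PASSWORD is missing or empty"
    if value == PLACEHOLDER:
        return "SOCCERNET_PASSWORD still contains the placeholder"
    return None


if os.path.isfile(ENV_PATH):
    with open(ENV_PATH, "r", encoding="utf-8") as f:
        problem = password_problem(parse_env_text(f.read()))
    print(problem or f"{ENV_PATH} looks correctly configured")
else:
    print(f"{ENV_PATH} not found")
```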

In [3]:
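import os

from dotenv import load_dotenv
from SoccerNet.Downloader import SoccerNetDownloader

# Load the variables defined in .env into the process environment
load_dotenv()

# The downloader stores every file under ../data
mySoccerNetDownloader = SoccerNetDownloader(LocalDirectory="../data")

password = os.getenv("SOCCERNET_PASSWORD")
if password is None:
    raise RuntimeError("SOCCERNET_PASSWORD is not set; check your .env file.")
mySoccerNetDownloader.password = password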

## 4. Dataset description and download

The data is organized into four splits:

- train: Standard training data for models, with complete annotations.
- valid: Validation data to evaluate performance during development.
- test: Test data used for the final evaluation of the models.
- challenge: Data reserved for the official competition, here the Multi-View Foul Recognition challenge; as noted below, its annotations are not released.

We need the dataset related to the *Multi-View Foul Recognition* challenge, known as __mvfouls__. We will download only the 720p version of the videos.

**IMPORTANT: executing the following cell starts the download, which can take a long time (in our run, about 18 GiB in roughly 20 minutes), so make sure you really want to run it.**

In [2]:
# Download the four mvfouls splits in 720p, using the password loaded from .env above
mySoccerNetDownloader.downloadDataTask(task="mvfouls", split=["train", "valid", "test", "challenge"], password=password, version="720p")

Downloading ../data\mvfouls\train_720p.zip...: : 13.3GiB [14:30, 15.2MiB/s]                                            
Downloading ../data\mvfouls\valid_720p.zip...: : 1.85GiB [02:05, 14.7MiB/s]                                            
Downloading ../data\mvfouls\test_720p.zip...: : 1.45GiB [01:42, 14.1MiB/s]                                             
Downloading ../data\mvfouls\challenge_720p.zip...: : 1.37GiB [01:31, 14.9MiB/s]                                        
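Before unzipping, it can help to confirm that one archive per split actually landed on disk. The sketch below assumes the `{split}_{version}.zip` naming visible in the download log above (for example `train_720p.zip` under `../data/mvfouls`); `expected_zip_paths` and `missing_files` are hypothetical helpers.

```python
import os

DATA_DIR = os.path.join("..", "data", "mvfouls")
SPLITS = ["train", "valid", "test", "challenge"]


def expected_zip_paths(root, splits, version="720p"):
    """Build the archive paths implied by the '{split}_{version}.zip' naming."""
    return [os.path.join(root, f"{split}_{version}.zip") for split in splits]


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


zip_paths = expected_zip_paths(DATA_DIR, SPLITS)
for path in zip_paths:
    if os.path.isfile(path):
        print(f"{path}: {os.path.getsize(path) / 1024**3:.2f} GiB")
missing = missing_files(zip_paths)
print("Missing archives:", missing if missing else "none")
```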


The dataset is downloaded as _.zip_ files, which must be **unzipped** before use.
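One way to unzip the archives is Python's standard `zipfile` module, sketched below. It assumes that each `{split}_720p.zip` should be extracted into a folder with the same name (for example `../data/mvfouls/train_720p`), which is the layout the verification code in the next section expects; if the archives already contain a top-level folder, adjust `dest_dir` accordingly. `extract_split` is a hypothetical helper.

```python
import os
import zipfile

DATA_DIR = os.path.join("..", "data", "mvfouls")
SPLITS = ["train", "valid", "test", "challenge"]


def extract_split(zip_path, dest_dir):
    """Extract one archive into dest_dir and return the number of entries.

    Skips the extraction (and returns 0) if dest_dir already exists and is
    non-empty, so the cell can be re-run without unpacking several GiB again.
    """
    if os.path.isdir(dest_dir) and os.listdir(dest_dir):
        print(f"Skipping {zip_path}: {dest_dir} is not empty")
        return 0
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.namelist()
        archive.extractall(dest_dir)
    return len(members)


for split in SPLITS:
    zip_path = os.path.join(DATA_DIR, f"{split}_720p.zip")
    dest_dir = os.path.join(DATA_DIR, f"{split}_720p")
    if os.path.isfile(zip_path):
        count = extract_split(zip_path, dest_dir)
        print(f"{zip_path} -> {dest_dir} ({count} entries extracted)")
```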

The dataset consists of **3,901 actions**. Each action is composed of at least two videos: one showing the live action and at least one replay.

The dataset is divided into:
- Training set (2916 actions).
- Validation set (411 actions).
- Test set (301 actions).
- Challenge set (273 actions without the annotations).
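These counts can be cross-checked against the extracted folders. The sketch below assumes that each split was extracted into `../data/mvfouls/{split}_720p` and that every action is stored in its own sub-folder named `action_<index>`, as the paths used in this notebook (`action_1/clip_0.mp4`) and the `Url` fields in the annotations (`Dataset/Train/action_0/clip_0`) suggest. Counting folders instead of reading `annotations.json` also works for the challenge split, whose labels are not provided. `count_action_folders` is a hypothetical helper.

```python
import os

DATA_DIR = os.path.join("..", "data", "mvfouls")
# Expected number of actions per split, as stated in the dataset description
EXPECTED = {"train": 2916, "valid": 411, "test": 301, "challenge": 273}


def count_action_folders(split_dir):
    """Count the sub-folders of split_dir whose names start with 'action_'."""
    with os.scandir(split_dir) as entries:
        return sum(1 for entry in entries if entry.is_dir() and entry.name.startswith("action_"))


total = 0
for split, expected in EXPECTED.items():
    split_dir = os.path.join(DATA_DIR, f"{split}_720p")
    if not os.path.isdir(split_dir):
        print(f"{split}: folder not found ({split_dir})")
        continue
    found = count_action_folders(split_dir)
    total += found
    status = "OK" if found == expected else f"expected {expected}"
    print(f"{split}: {found} actions ({status})")
print(f"Total found: {total} (the dataset description states {sum(EXPECTED.values())})")
```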

## 5. Download Verification

It is important to verify that the videos and labels have been downloaded correctly.

We first use _OpenCV_ to check that a video can be opened and played back.

In [2]:
import cv2
import os

train_dataset_path = "../data/mvfouls/train_720p"

# Example clip: first camera view (clip_0) of action 1 in the training split
example_video = os.path.join(train_dataset_path, "action_1", "clip_0.mp4")

def visualize_video(video_path, num_frames=100):
    cap = cv2.VideoCapture(video_path)
    frame_count = 0
    
    if not cap.isOpened():
        print(f"Video couldn't be opened: {video_path}")
        return

    while cap.isOpened() and frame_count < num_frames:
        ret, frame = cap.read()
        if not ret:
            print("End of the video or error while reading the frame.")
            break
        
        cv2.imshow('Video Frame', frame)
        if cv2.waitKey(25) & 0xFF == ord('q'):  # Press 'q' to exit
            break
        frame_count += 1

    cap.release()
    cv2.destroyAllWindows()

visualize_video(example_video)

We also verify that the labels in `annotations.json` are in the expected format.

In [3]:
import json

labels_path = os.path.join(train_dataset_path, "annotations.json")

with open(labels_path, "r") as f:
    data = json.load(f)

print(f"Total labelled actions: {data['Number of actions']}")
# Print only the first action as an example of the annotation structure
for action_index, action_data in data['Actions'].items():
    print(f"Action index: {action_index}")
    print(f"Action details: {action_data}")
    break


Total labelled actions: 2916
Action index: 0
Action details: {'UrlLocal': 'england_epl\\2014-2015\\2015-02-21 - 18-00 Chelsea 1 - 1 Burnley', 'Offence': 'Offence', 'Contact': 'With contact', 'Bodypart': 'Upper body', 'Upper body part': 'Use of shoulder', 'Action class': 'Challenge', 'Severity': '1.0', 'Multiple fouls': '', 'Try to play': '', 'Touch ball': '', 'Handball': 'No handball', 'Handball offence': '', 'Clips': [{'Url': 'Dataset/Train/action_0/clip_0', 'Camera type': 'Main camera center', 'Timestamp': 1730826, 'Replay speed': 1.0}, {'Url': 'Dataset/Train/action_0/clip_1', 'Camera type': 'Close-up player or field referee', 'Timestamp': 1744173, 'Replay speed': 1.8}]}
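Printing one action is a quick visual check; a stronger one is to test every action against the structure seen above. The sketch below checks that `Number of actions` matches the number of entries in `Actions`, that each action has at least two clips, and that each action and clip carries the keys visible in the printed example. The `REQUIRED_*` key sets are an assumption taken from that single example, not from an official schema, and `validate_annotations` is a hypothetical helper.

```python
# Assumption: key sets copied from the single action printed above
REQUIRED_ACTION_KEYS = {"Offence", "Contact", "Bodypart", "Action class", "Severity", "Clips"}
REQUIRED_CLIP_KEYS = {"Url", "Camera type", "Timestamp", "Replay speed"}


def validate_annotations(data):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    actions = data.get("Actions", {})
    declared = data.get("Number of actions")
    if declared is not None and int(declared) != len(actions):
        problems.append(f"'Number of actions' is {declared} but {len(actions)} actions are listed")
    for action_id, action in actions.items():
        missing = REQUIRED_ACTION_KEYS - set(action)
        if missing:
            problems.append(f"action {action_id}: missing {sorted(missing)}")
        clips = action.get("Clips", [])
        if len(clips) < 2:
            problems.append(f"action {action_id}: only {len(clips)} clip(s)")
        for i, clip in enumerate(clips):
            missing_clip = REQUIRED_CLIP_KEYS - set(clip)
            if missing_clip:
                problems.append(f"action {action_id}, clip {i}: missing {sorted(missing_clip)}")
    return problems


# Usage with the `data` dictionary loaded in the previous cell, if it exists
if "data" in globals():
    issues = validate_annotations(data)
    print(f"{len(issues)} problem(s) found")
    for issue in issues[:10]:
        print(" -", issue)
```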


## 6. Preliminary Data Analysis

In this brief section we perform a preliminary analysis of the downloaded data by inspecting the labels of a few actions.
We print the information of three actions and then display an example of two different clips of the same action.

In [5]:
actions = data['Actions']

count = 3

for action_id, action in actions.items():
    print(f"Action number {action_id}:")
    print(f"  > Offence: {action['Offence']}")
    print(f"  > Bodypart: {action['Bodypart']}")
    print(f"  > Action type: {action['Action class']}")
    print(f"  > Severity: {action['Severity']}")
    
    for clip in action['Clips']:
        clip_url = clip['Url']
        camera_type = clip['Camera type']
        timestamp = clip['Timestamp']
        print(f"    - Clip - {camera_type}: {clip_url} @ {timestamp}ms")
    count -= 1
    if count == 0:
        break
    print(f"------------------------------------------------------")


Action number 0:
  > Offence: Offence
  > Bodypart: Upper body
  > Action type: Challenge
  > Severity: 1.0
    - Clip - Main camera center: Dataset/Train/action_0/clip_0 @ 1730826ms
    - Clip - Close-up player or field referee: Dataset/Train/action_0/clip_1 @ 1744173ms
------------------------------------------------------
Action number 1:
  > Offence: Offence
  > Bodypart: Under body
  > Action type: Tackling
  > Severity: 3.0
    - Clip - Main camera center: Dataset/Train/action_1/clip_0 @ 2400217ms
    - Clip - Close-up player or field referee: Dataset/Train/action_1/clip_1 @ 2415695ms
------------------------------------------------------
Action number 2:
  > Offence: Offence
  > Bodypart: Under body
  > Action type: Standing tackling
  > Severity: 3.0
    - Clip - Main camera center: Dataset/Train/action_2/clip_0 @ 206821ms
    - Clip - Close-up player or field referee: Dataset/Train/action_2/clip_1 @ 230429ms
    - Clip - Close-up player or field referee: Dataset/Train/action_2
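Looking at three actions only shows individual labels. A sketch that summarizes one label over all actions uses `collections.Counter`; it also counts clips per action, which should be at least two according to the dataset description. `label_counts` and `clips_per_action` are hypothetical helpers, and the field names come from the annotations printed above.

```python
from collections import Counter


def label_counts(actions, key):
    """Count the values of one annotation field across all actions."""
    return Counter(action.get(key, "<missing>") for action in actions.values())


def clips_per_action(actions):
    """Count how many actions have 2, 3, ... clips."""
    return Counter(len(action.get("Clips", [])) for action in actions.values())


# Usage with the `actions` dictionary defined in the previous cell, if it exists
if "actions" in globals():
    for field in ["Offence", "Action class", "Severity"]:
        print(field, label_counts(actions, field).most_common())
    print("Clips per action:", sorted(clips_per_action(actions).items()))
```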

In this cell we play two clips of the same action side by side, at the same time.

In [7]:
import cv2
import os

train_dataset_path = "../data/mvfouls/train_720p"

# Two camera views of the same action (action 1 of the training split)
clip_0_path = os.path.join(train_dataset_path, "action_1", "clip_0.mp4")
clip_1_path = os.path.join(train_dataset_path, "action_1", "clip_1.mp4")


def visualize_clips_synchronized(path_a, path_b, timestamp_a=0, timestamp_b=0, num_frames=100):
    """Show two clips side by side, each starting at its own offset in milliseconds."""
    cap_a = cv2.VideoCapture(path_a)
    cap_b = cv2.VideoCapture(path_b)

    if not cap_a.isOpened() or not cap_b.isOpened():
        print(f"Error opening videos: {path_a}, {path_b}")
        cap_a.release()
        cap_b.release()
        return

    # Fall back to 25 fps if a container does not report its frame rate (avoids division by zero)
    fps_a = cap_a.get(cv2.CAP_PROP_FPS) or 25.0
    fps_b = cap_b.get(cv2.CAP_PROP_FPS) or 25.0

    # Convert the millisecond offsets into frame indices and seek to them
    cap_a.set(cv2.CAP_PROP_POS_FRAMES, int(timestamp_a / 1000 * fps_a))
    cap_b.set(cv2.CAP_PROP_POS_FRAMES, int(timestamp_b / 1000 * fps_b))

    delay_ms = max(1, int(1000 / max(fps_a, fps_b)))
    frame_count = 0
    while frame_count < num_frames:
        ret_a, frame_a = cap_a.read()
        ret_b, frame_b = cap_b.read()
        if not ret_a or not ret_b:
            print("End of one of the videos or error when reading the frame.")
            break

        # hconcat requires equal heights, so resize the second frame if needed
        if frame_b.shape[0] != frame_a.shape[0]:
            scale = frame_a.shape[0] / frame_b.shape[0]
            frame_b = cv2.resize(frame_b, (int(frame_b.shape[1] * scale), frame_a.shape[0]))

        cv2.imshow("Synchronized Clips", cv2.hconcat([frame_a, frame_b]))
        if cv2.waitKey(delay_ms) & 0xFF == ord("q"):  # Press 'q' to exit
            break
        frame_count += 1

    cap_a.release()
    cap_b.release()
    cv2.destroyAllWindows()


# Start clip_0 at its beginning and skip the first second (1000 ms) of clip_1
visualize_clips_synchronized(clip_0_path, clip_1_path, timestamp_a=0, timestamp_b=1000)

## 7. Next Steps
In the next notebook, we will perform a more comprehensive exploration to understand the distribution and characteristics of the dataset and its labels.