## Number Detector

### Problem Statement

Given a set of images of football players taken from various angles, identify each player's jersey number. The goal is to develop a high-accuracy model that can identify a jersey number from an image taken from any angle.

### Dataset
The main dataset I used is from here: https://www.kaggle.com/datasets/frlemarchand/nfl-player-numbers/data

The full dataset contains 43,540 64x64 images of NFL players with jersey numbers. In some images the number is difficult to identify, even for a human. The images appear to come from All-22 film, with both sideline and endzone views.

I use a randomly selected subset of 1,000 images, because uploading all of the images to GitHub is not feasible and training a model on the full set would be computationally expensive.

### To Do List
- Reduce size of dataset
- Read csv file
- Obtain labels from csv file, based on the images being used
- Preprocess images
- Create a model
- Train the model
- Test the model
- Evaluate the model

In [None]:
# ALL IMPORTS
import os
import random
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

### Reducing Full Dataset

The following code blocks reduce the dataset from 43,540 images to 1,000 images by keeping a random sample and deleting the rest in place.

In [4]:
import os
import random

In [13]:
data_dir = os.path.join('archive', 'train_player_numbers')
image_count = 43540

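def select_images(data_dir, sample_size=1000, extensions=('.jpg', '.jpeg', '.png')):
    """Keep a random sample of images in data_dir and permanently delete the rest."""
    image_files = [file for file in os.listdir(data_dir) if file.lower().endswith(extensions)]
    # random.sample raises ValueError if asked for more items than exist
    sample_size = min(sample_size, len(image_files))
    # a set gives constant-time membership checks in the deletion loop
    img_sample = set(random.sample(image_files, sample_size))

    for image in image_files:
        if image not in img_sample:
            os.remove(os.path.join(data_dir, image))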

In [None]:
# Already run once to shrink the dataset; left commented out because it deletes files.
# select_images(data_dir)

In [14]:
# count the image files that remain after sampling
image_files = os.listdir(data_dir)
len(image_files)

1000
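
Because the sampling step deletes files in place, the same subset cannot be regenerated later without downloading the dataset again. A minimal sketch of a non-destructive alternative is shown below: it copies a seeded random sample into a separate directory and leaves the source untouched. The helper name `copy_sample`, the destination directory, and the seed value are illustrative assumptions, not part of the original notebook.

```python
import os
import random
import shutil


def copy_sample(src_dir, dst_dir, sample_size=1000, seed=42,
                extensions=('.jpg', '.jpeg', '.png')):
    """Copy a reproducible random sample of images from src_dir into dst_dir.

    Hypothetical helper: the name, the seed, and dst_dir are illustrative
    assumptions, not part of the original notebook.
    """
    # Sort first so the sample depends only on the seed, not on listdir order.
    image_files = sorted(
        f for f in os.listdir(src_dir) if f.lower().endswith(extensions)
    )
    rng = random.Random(seed)  # local generator; the global RNG is untouched
    sample = rng.sample(image_files, min(sample_size, len(image_files)))

    os.makedirs(dst_dir, exist_ok=True)
    for name in sample:
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return sample
```

Calling it twice with the same seed on the same source directory selects the same files, which makes the subset easy to recreate on another machine.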

### Removing files which can't be accessed

The following code blocks remove any files that OpenCV cannot read as images, to prevent errors later on.

In [None]:
import cv2

In [7]:
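for image in os.listdir(data_dir):
    image_path = os.path.join(data_dir, image)
    # cv2.imread does not raise on an unreadable file; it returns None instead
    if os.path.isfile(image_path) and cv2.imread(image_path) is None:
        os.remove(image_path)

# refresh the file list so later cells only see files that still exist
image_files = os.listdir(data_dir)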
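
As a cheap pre-check before decoding, a file's leading bytes can be compared against the standard PNG and JPEG signatures. This is roughly what the standard-library `imghdr` module offered before it was removed in Python 3.13. The sketch below uses only the standard library; the names `SIGNATURES`, `sniff_image_type`, and `find_suspect_files` are hypothetical. It only reports suspect files and deletes nothing. A signature check inspects the header alone, so it catches empty or mislabelled files but not a truncated image body, and it complements decoding rather than replacing it.

```python
import os

# Leading "magic" bytes for the two formats the notebook filters on.
SIGNATURES = {
    'png': b'\x89PNG\r\n\x1a\n',
    'jpeg': b'\xff\xd8\xff',
}


def sniff_image_type(path):
    """Return 'png' or 'jpeg' if the file starts with that signature, else None."""
    with open(path, 'rb') as fh:
        header = fh.read(8)  # the PNG signature is the longest, at 8 bytes
    for kind, magic in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None


def find_suspect_files(data_dir):
    """List files in data_dir whose leading bytes match no known signature."""
    suspects = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if os.path.isfile(path) and sniff_image_type(path) is None:
            suspects.append(name)
    return suspects
```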

### Reading CSV file

The CSV file provided with the dataset contains the labels for the images. The following code blocks read the CSV file and keep only the labels for the images being used.

In [9]:
import pandas as pd
import numpy as np

In [19]:
csv_df = pd.read_csv('archive/train_player_numbers.csv')
csv_df.head()

| Unnamed: 0 | filename | video_frame | player | label | left | top | right | bottom | filepath |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 58000_001306_Sideline_240_V84.png | 58000_001306_Sideline_240 | V84 | 84 | 826 | 137 | 890 | 201 | train_player_numbers/58000_001306_Sideline_240... |
| 1 | 58095_004022_Endzone_140_H24.png | 58095_004022_Endzone_140 | H24 | 24 | 592 | 323 | 656 | 387 | train_player_numbers/58095_004022_Endzone_140_... |
| 2 | 58094_002819_Sideline_200_V83.png | 58094_002819_Sideline_200 | V83 | 83 | 749 | 309 | 813 | 373 | train_player_numbers/58094_002819_Sideline_200... |
| 3 | 57594_000923_Sideline_240_V23.png | 57594_000923_Sideline_240 | V23 | 23 | 585 | 76 | 649 | 140 | train_player_numbers/57594_000923_Sideline_240... |
| 4 | 57680_003470_Endzone_260_V72.png | 57680_003470_Endzone_260 | V72 | 72 | 530 | 189 | 594 | 253 | train_player_numbers/57680_003470_Endzone_260_... |


In [20]:
# makes sure that labels are only for the images that we have
csv_df = csv_df[csv_df['filename'].isin(image_files)]
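
Filtering with `isin` keeps the right rows, but the filtered frame stays in CSV order, which generally differs from the order in which the image files are listed. The sketch below shows one way to line labels up with files on a small made-up frame. The toy rows, the file names, and the variable names are illustrative assumptions, and the approach assumes each filename appears at most once in the CSV.

```python
import pandas as pd

# Toy stand-in for the labels CSV; these rows are made up for illustration.
toy_df = pd.DataFrame({
    'filename': ['a_V84.png', 'b_H24.png', 'c_V83.png', 'd_V23.png'],
    'label': [84, 24, 83, 23],
})
# Files assumed to be on disk after sampling, in directory-listing order.
toy_files = ['c_V83.png', 'a_V84.png', 'missing.png']

# Keep only the rows whose image is on disk (the same isin filter as above).
kept = toy_df[toy_df['filename'].isin(toy_files)]

# Map filename -> label, then look labels up in file order so that
# labels[i] belongs to the image at files_with_labels[i].
label_by_file = kept.set_index('filename')['label'].to_dict()
files_with_labels = [f for f in toy_files if f in label_by_file]
labels = [label_by_file[f] for f in files_with_labels]

# Files on disk that have no row in the CSV and so cannot be used for training.
unlabelled = [f for f in toy_files if f not in label_by_file]
```

Keeping the file list and the label list index-aligned avoids silently pairing an image with another image's number when the arrays are built later.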