### This notebook provides the code for downloading and testing NFL team logo images from the Bing image search engine. The images will be used for image classification.

In [1]:
import time
import sys
import os
from PIL import Image

base_dir  = r'C:\Users\dlbry\Downloads\DSC 680\Project 2'

train_dir = os.path.join(base_dir, 'train')

# Create the training folder (and any missing parent folders) if needed
os.makedirs(train_dir, exist_ok=True)

Next, we will define the 31 NFL team names that we want to search for. 

In [2]:
# Define class labels
nfl_team_names = [
'Arizona Cardinals',
'Atlanta Falcons',
'Baltimore Ravens',
'Buffalo Bills',
'Carolina Panthers',
'Chicago Bears',
'Cincinnati Bengals',
'Cleveland Browns',
'Dallas Cowboys',
'Denver Broncos',
'Detroit Lions',
'Green Bay Packers',
'Houston Texans',
'Indianapolis Colts',
'Jacksonville Jaguars',
'Kansas City Chiefs',
'Las Vegas Raiders',
'Chargers',
'Rams',
'Miami Dolphins',
'Minnesota Vikings',
'New England Patriots',
'New Orleans Saints',
'Giants',
'Jets',
'Philadelphia Eagles',
'Pittsburgh Steelers',
'San Francisco 49ers',
'Seattle Seahawks',
'Tampa Bay Buccaneers',
'Tennessee Titans']
len(nfl_team_names)

31

Next, we will use the bing_image_downloader library to download up to 300 images for each NFL team logo search query (the limit argument is an upper bound, so a query may return fewer). The downloader automatically creates a folder named after each query inside the output directory and saves the search engine image results there.

In [None]:
from bing_image_downloader import downloader

for team_logo in nfl_team_names:

    # The query string also becomes the name of the output folder
    query = team_logo + ' NFL logo images'

    downloader.download(
        query,
        limit=300,
        output_dir=train_dir,
        adult_filter_off=False,
        force_replace=False,
        timeout=60,
        verbose=True,
    )
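
After the download finishes, it can be useful to confirm that each query folder exists and to see how many files it received. The following is a minimal sketch, not part of the original notebook; the helper name `count_files_per_folder` is hypothetical, and it only counts files without checking that they are valid images.

```python
import os

def count_files_per_folder(root_dir):
    """Return {subfolder name: number of files} for each subfolder of root_dir."""
    counts = {}
    for name in sorted(os.listdir(root_dir)):
        folder = os.path.join(root_dir, name)
        if os.path.isdir(folder):
            # Count regular files only; nested folders are ignored
            counts[name] = sum(
                os.path.isfile(os.path.join(folder, f))
                for f in os.listdir(folder)
            )
    return counts
```

Calling `count_files_per_folder(train_dir)` should return one entry per search query, which makes it easy to spot a team whose download came up short.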

Since not all of the downloaded images will be error-free, we need to test them and delete any image that cannot be opened. We will iterate over all of the image folders and attempt to open each image. If an image does not open, it will be deleted. If we do not delete these unreadable images, they could cause errors during model building.

In [None]:
image_count = 0  # number of unreadable images deleted
for team_logo in nfl_team_names:

    # The downloader names each folder after its search query
    query = team_logo + ' NFL logo images'
    team_dir = os.path.join(train_dir, query)

    for filename in os.listdir(team_dir):
        # os.path.join builds the path portably instead of hard-coding "\\"
        image_path = os.path.join(team_dir, filename)
        try:
            # Image.open raises if Pillow cannot identify the file as an image
            with Image.open(image_path) as im:
                print('ok')
        except Exception:
            print(image_path, 'Is not ok')
            image_count += 1
            os.remove(image_path)
print(str(image_count) + ' images were deleted.')
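
Opening a file with Pillow mainly reads the header, so a truncated or partly corrupted file can still open without error. A stricter check, shown here as a sketch rather than the method used in this notebook, is to also call Pillow's `Image.verify()`. The helper name `is_readable_image` is hypothetical.

```python
from PIL import Image

def is_readable_image(path):
    """Return True if Pillow can open the file and verify() raises no error."""
    try:
        with Image.open(path) as im:
            # verify() checks file integrity without decoding the pixel data
            im.verify()
        return True
    except Exception:
        return False
```

This helper only reports a result and deletes nothing, so it can be run over a folder first to see which files would be flagged.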

### An important note on image curation.

The final stage of the image collection process is to manually review the images in each folder. We need to make sure that there are no mislabeled images and no duplicate images of the same size. I found many same-size duplicates, as well as images that were misclassified by the Bing image search engine. I used domain knowledge of the team logos to ensure that all images were correctly labeled prior to model training. Curating the images was by far the most time-consuming step of the project, but it was also the most critical. Failing to remove incorrectly labeled images would cause the model to propagate the misclassification and would limit the accuracy of any model built using the data.
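
The review described above was done by hand. As a possible aid, the sketch below flags byte-identical files in one folder by hashing their contents. This is an assumption about how part of the duplicate check could be automated, not the procedure used in this project, and the helper name `find_exact_duplicates` is hypothetical. It cannot detect resized or re-encoded copies, and it says nothing about mislabeled images, so manual review is still required.

```python
import hashlib
import os
from collections import defaultdict

def find_exact_duplicates(folder):
    """Group files by the SHA-256 of their bytes; return groups with >1 file."""
    by_hash = defaultdict(list)
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            with open(path, 'rb') as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_hash[digest].append(name)
    # Keep only hashes shared by two or more files
    return [names for names in by_hash.values() if len(names) > 1]
```

Each returned group lists file names with identical contents, so all but one file in each group is a candidate for removal.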

After carefully curating the images, the final training set contained approximately 215 images for each NFL team logo. The images used in the project can be downloaded from [Google Drive](https://drive.google.com/drive/folders/1aM-0xHmFzcPjx1pa0hImghVUhezYLkfa).