## **Team members**

|Team member|GitHub Username|
|--|--|
|Nima Ghafar|NimaGhafar|
|Busse Heemskerk|BJHeemskerk|
|Henry Lau||
|Jesse van Leeuwen|22096337|

# *Photo Recognition App*

In this notebook, we set up the pipeline that loads the data and trains, deploys, and retrains the model.

-further assignment description-

-table of contents-

## Loading the libraries and the data

In [1]:
import pandas as pd
import numpy as np
import os
import cv2

## Start of the data-ingestion pipeline

To load the data, several functions place the correct image paths into the correct DataFrames. The training, test, and validation images are split across three separate DataFrames.
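To make the expected input concrete, here is a minimal sketch of how a single line of the caption file can be split into an image name, a caption index, and the caption text. It assumes each line has the form `<image>.jpg#<index>` followed by whitespace and the caption, which is what the `#` split in `read_tokens` below relies on. The sample line and the helper name `parse_caption_line` are made up for illustration.

```python
def parse_caption_line(line):
    """Split one caption line into (image name, caption index, caption)."""
    # Hypothetical helper; assumes '<image>#<index><whitespace><caption>'.
    parts = line.split()
    image_name, index = parts[0].split('#')
    caption = ' '.join(parts[1:])
    return image_name, int(index), caption


# Made-up sample line in the assumed format (tab between key and caption)
sample = "example_image.jpg#2\tA dog runs across the grass ."
print(parse_caption_line(sample))
```

Joining the filename and the index into one key, as `read_tokens` does, lets each of the five captions per image be looked up directly when the label columns are filled.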

In [14]:
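# Function to read an image
def read_image(filename):
    # Read the image with OpenCV (imread returns None if the file cannot be read)
    image = cv2.imread(filename)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {filename}")
    # Convert from OpenCV's BGR channel order to RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return image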

# Function to read image filenames from a txt file
def read_image_filenames(file_path):
    with open(file_path, 'r') as file:
        lines = file.read().splitlines()
    return lines

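# Function to read the tokens (captions)
def read_tokens(file_path):
    with open(file_path, 'r') as file:
        lines = file.read().splitlines()
    tokens_dict = {}
    for line in lines:
        parts = line.split()
        # Skip blank lines, which would otherwise raise an IndexError
        if not parts:
            continue
        # The first field has the form '<image filename>#<caption index>'
        image_token = parts[0].split('#')[0]
        image_index = int(parts[0].split('#')[1])
        tokens = ' '.join(parts[1:])
        # Key is filename + index, matching the lookup in load_images_into_dataframes
        tokens_dict[f'{image_token}{image_index}'] = (image_index, tokens)
    return tokens_dict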

# Function to load the images into DataFrames
def load_images_into_dataframes(
        data_dir, label_dir, labels_file, train_file, test_file, val_file
        ):
    # Read the filenames from the txt files
    train_filenames = read_image_filenames(os.path.join(label_dir, train_file))
    test_filenames = read_image_filenames(os.path.join(label_dir, test_file))
    val_filenames = read_image_filenames(os.path.join(label_dir, val_file))

    # Read all tokens
    tokens_dict = read_tokens(os.path.join(label_dir, labels_file))

    # Create the DataFrames
    train_df = pd.DataFrame({'filename': train_filenames})
    test_df = pd.DataFrame({'filename': test_filenames})
    val_df = pd.DataFrame({'filename': val_filenames})

    # Add the full file paths
    train_df['filepath'] = train_df['filename'].apply(lambda x: os.path.join(data_dir, x))
    test_df['filepath'] = test_df['filename'].apply(lambda x: os.path.join(data_dir, x))
    val_df['filepath'] = val_df['filename'].apply(lambda x: os.path.join(data_dir, x))

    # Add the labels (five captions per image) as columns
    for i in range(5):
        label_col_name = f"label #{i}"
        train_df[label_col_name] = train_df['filename'].apply(
            lambda x: tokens_dict.get((str(x) + str(i)), (None, ""))[1]
            )
        test_df[label_col_name] = test_df['filename'].apply(
            lambda x: tokens_dict.get((str(x) + str(i)), (None, ""))[1]
            )
        val_df[label_col_name] = val_df['filename'].apply(
            lambda x: tokens_dict.get((str(x) + str(i)), (None, ""))[1]
            )

    # Add the images as numpy arrays
    train_df['image_array'] = train_df['filepath'].apply(lambda x: read_image(x))
    test_df['image_array'] = test_df['filepath'].apply(lambda x: read_image(x))
    val_df['image_array'] = val_df['filepath'].apply(lambda x: read_image(x))
    print("The DataFrames are ready")

    return train_df, val_df, test_df


# Assign all paths
data_directory = 'Images'
label_directory = 'Label_files'
labels_file = 'Flickr8k.token.txt'
train_file_path = 'Flickr_8k.trainImages.txt'
test_file_path = 'Flickr_8k.testImages.txt'
val_file_path = 'Flickr_8k.devImages.txt'

# Load the images into the datasets
train_df, val_df, test_df = load_images_into_dataframes(
    data_directory, label_directory, labels_file, train_file_path, test_file_path, val_file_path
    )

# Show the datasets
print("Train DataFrame:")
display(train_df.head())

print("\nValidation DataFrame:")
display(val_df.head())

print("\nTest DataFrame:")
display(test_df.head())

The DataFrames are ready
Train DataFrame:


| |filename|filepath|label #0|label #1|label #2|label #3|label #4|image_array|
|--|--|--|--|--|--|--|--|--|
|0|2513260012_03d33305cf.jpg|Images\2513260012_03d33305cf.jpg|A black dog is running after a white dog in th...|Black dog chasing brown dog through snow|Two dogs chase each other across the snowy gro...|Two dogs play together in the snow .|Two dogs running through a low lying body of w...|[[[38, 31, 25], [64, 50, 49], [78, 73, 67], [2...|
|1|2903617548_d3e38d7f88.jpg|Images\2903617548_d3e38d7f88.jpg|A little baby plays croquet .|A little girl plays croquet next to a truck .|The child is playing croquette by the truck .|The kid is in front of a car with a put and a ...|The little boy is playing with a croquet hamme...|[[[254, 254, 254], [254, 254, 254], [254, 254,...|
|2|3338291921_fe7ae0c8f8.jpg|Images\3338291921_fe7ae0c8f8.jpg|A brown dog in the snow has something hot pink...|A brown dog in the snow holding a pink hat .|A brown dog is holding a pink shirt in the snow .|A dog is carrying something pink in its mouth ...|A dog with something pink in its mouth is look...|[[[146, 143, 154], [149, 143, 155], [153, 142,...|
|3|488416045_1c6d903fe0.jpg|Images\488416045_1c6d903fe0.jpg|A brown dog is running along a beach .|A brown dog wearing a black collar running acr...|A dog walks on the sand near the water .|Brown dog running on the beach .|The large brown dog is running on the beach by...|[[[121, 158, 202], [117, 154, 198], [118, 155,...|
|4|2644326817_8f45080b87.jpg|Images\2644326817_8f45080b87.jpg|A black and white dog with a red Frisbee stand...|A dog drops a red disc on a beach .|A dog with a red Frisbee flying in the air .|Dog catching a red Frisbee .|The black dog is dropping a red disc on a beach .|[[[168, 182, 209], [169, 183, 210], [168, 184,...|



Validation DataFrame:


| |filename|filepath|label #0|label #1|label #2|label #3|label #4|image_array|
|--|--|--|--|--|--|--|--|--|
|0|2090545563_a4e66ec76b.jpg|Images\2090545563_a4e66ec76b.jpg|the boy laying face down on a skateboard is be...|Two girls play on a skateboard in a courtyard .|Two people play on a long skateboard .|Two small children in red shirts playing on a ...|two young children on a skateboard going acros...|[[[174, 182, 185], [169, 177, 180], [164, 172,...|
|1|3393035454_2d2370ffd4.jpg|Images\3393035454_2d2370ffd4.jpg|a boy in a blue top is jumping off some rocks ...|A boy jumps off a tan rock .|A boy jumps up in a field in the woods .|A young boy jumps off a rock in the forest|Child in blue and grey shirt jumping off hill ...|[[[153, 159, 159], [168, 173, 166], [125, 126,...|
|2|3695064885_a6922f06b2.jpg|Images\3695064885_a6922f06b2.jpg|A lady walking her dog through an obstacle cou...|A small tan and white dog and trainer running ...|A woman is guiding a brown dog around an obsta...|A woman with a hat is leading a small dog thro...|The woman is leading a dog through an obstacle...|[[[43, 52, 49], [36, 45, 40], [39, 48, 43], [3...|
|3|1679557684_50a206e4a9.jpg|Images\1679557684_50a206e4a9.jpg|a big black dog jumps in the air to catch the ...|A dog looks at another dog catching a ball in ...|A white dog is watching a black dog jump on a ...|A white dog watching a black dog in the air .|Two dogs playing with a tennis ball in the yard .|[[[70, 91, 74], [79, 92, 82], [78, 94, 81], [6...|
|4|3582685410_05315a15b8.jpg|Images\3582685410_05315a15b8.jpg|two woman climbing rocks around the ocean|Two women are climbing over rocks near to the ...|Two women climb on top of rocks in front of th...|Two women in bathing suit on large rocks at th...|Two women in bathing suits climb rock piles by...|[[[119, 154, 212], [120, 155, 211], [122, 154,...|



Test DataFrame:


| |filename|filepath|label #0|label #1|label #2|label #3|label #4|image_array|
|--|--|--|--|--|--|--|--|--|
|0|3385593926_d3e9c21170.jpg|Images\3385593926_d3e9c21170.jpg|The dogs are in the snow in front of a fence .|The dogs play on the snow .|Two brown dogs playfully fight in the snow .|Two brown dogs wrestle in the snow .|Two dogs playing in the snow .|[[[29, 33, 44], [38, 42, 53], [34, 38, 49], [2...|
|1|2677656448_6b7e7702af.jpg|Images\2677656448_6b7e7702af.jpg|a brown and white dog swimming towards some in...|A dog in a swimming pool swims toward sombody ...|A dog swims in a pool near a person .|Small dog is paddling through the water in a p...|The small brown and white dog is in the pool .|[[[0, 61, 174], [0, 60, 169], [0, 60, 167], [1...|
|2|311146855_0b65fdb169.jpg|Images\311146855_0b65fdb169.jpg|A man and a woman in festive costumes dancing .|A man and a woman with feathers on her head da...|A man and a woman wearing decorative costumes ...|one performer wearing a feathered headdress da...|Two people are dancing with drums on the right...|[[[253, 253, 253], [254, 254, 254], [255, 255,...|
|3|1258913059_07c613f7ff.jpg|Images\1258913059_07c613f7ff.jpg|A couple of people sit outdoors at a table wit...|Three people are sitting at an outside picnic ...|Three people sit at an outdoor cafe .|Three people sit at an outdoor table in front ...|Three people sit at a picnic table outside of ...|[[[12, 24, 46], [18, 22, 51], [16, 20, 49], [1...|
|4|241347760_d44c8d3a01.jpg|Images\241347760_d44c8d3a01.jpg|A man is wearing a Sooners red football shirt ...|A Oklahoma Sooners football player wearing his...|A Sooners football player weas the number 28 a...|Guy in red and white football uniform|The American footballer is wearing a red and w...|[[[89, 90, 84], [90, 91, 86], [89, 90, 85], [8...|


Now that the datasets are loaded, we can look at different forms of feature engineering. Because the tokens (captions) are the easiest to examine, we start with those.

In [15]:
# Show the five captions (tokens) of the first training image
print(train_df['label #0'][0])
print(train_df['label #1'][0])
print(train_df['label #2'][0])
print(train_df['label #3'][0])
print(train_df['label #4'][0])

A black dog is running after a white dog in the snow .
Black dog chasing brown dog through snow
Two dogs chase each other across the snowy ground .
Two dogs play together in the snow .
Two dogs running through a low lying body of water .


As the output shows, the five captions describe the same image in different ways. Most of them describe two dogs running through snow, while the last one describes a low-lying body of water. Most captions also end with a period as a separate token. That period is redundant and passes unnecessary information to the model.
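One way to drop that trailing period is sketched below. This is only an illustration, not necessarily the preprocessing the notebook ends up using: the helper name `strip_trailing_period` is an assumption, and the two sample captions are copied from the output above.

```python
def strip_trailing_period(caption):
    """Remove a final '.' and any surrounding whitespace from a caption."""
    caption = caption.strip()
    if caption.endswith('.'):
        # Drop the period and the space that precedes it
        caption = caption[:-1].rstrip()
    return caption


# Two captions from the first training image, with and without a period
captions = [
    'A black dog is running after a white dog in the snow .',
    'Black dog chasing brown dog through snow',
]
cleaned = [strip_trailing_period(c) for c in captions]
print(cleaned)
```

On the DataFrames above, the same helper could be applied to each `label #i` column with `train_df[col].apply(strip_trailing_period)`. Empty captions are returned unchanged, so images with a missing label are not affected.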

## Start of the ML pipeline