In [None]:
%matplotlib inline

### Reading and Interpreting Metadata

Each image carries a lot of embedded data, stored in three types of metadata:
1. XMP
2. Exif
3. IPTC

Because we mostly care about the color classes and whether images are "tagged" or not, we need the data that was edited in after capture. This data is added in Photo Mechanic or Photoshop and is separate from the image data itself. Exif holds the data the camera recorded when the picture was taken, so the edited-in data lives in the XMP and IPTC metadata.

IPTC (International Press Telecommunications Council) metadata is the older standard, while XMP is newer and can be stored either in a sidecar file or in the image itself. This may make a difference as we work backward through earlier years of images, but starting in 2017 XMP is easier to work with, so we use it as the baseline. An XMP metadata block looks like the following:

```
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.1.2">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
        xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/"
        xmlns:photomechanic="http://ns.camerabits.com/photomechanic/1.0/"
        xmlns:aux="http://ns.adobe.com/exif/1.0/aux/"
        xmp:CreatorTool="NIKON D750 Ver.1.10     "
        xmp:CreateDate="2017-09-21T07:34:40.73"
        xmp:Rating="0"
        photoshop:AuthorsPosition="Photographer"
        photoshop:Credit="REDACTED FOR PII"
        photoshop:Source="Columbia Missourian"
        photoshop:CaptionWriter="NC"
        photoshop:DateCreated="2017-09-21T07:34:40-05:00"
        xmpRights:Marked="True"
        xmpRights:WebStatement="www.ColumbiaMissourian.com"
        photomechanic:ColorClass="0"
        photomechanic:Tagged="False"
        photomechanic:Prefs="0:0:0:008714"
        photomechanic:PMVersion="PM5"
        aux:ImageNumber="8714">
            <xmpRights:UsageTerms>
                <rdf:Alt>
                    <rdf:li xml:lang="x-default">MAGS OUT, NO SALES</rdf:li>
                </rdf:Alt>
            </xmpRights:UsageTerms>
            <dc:subject>
                <rdf:Bag>
                    <rdf:li>#09212017hickmanvolleyball</rdf:li>
                    <rdf:li>Varsity</rdf:li>
                    <rdf:li>JV</rdf:li>
                    <rdf:li>California</rdf:li>
                    <rdf:li>Kewpies</rdf:li>
                    <rdf:li>Hickman</rdf:li>
                    <rdf:li>sports</rdf:li>
                    <rdf:li>women</rdf:li>
                </rdf:Bag>
            </dc:subject>
            <dc:description>
                <rdf:Alt>
                    <rdf:li xml:lang="x-default">Hickman Plays California in a volleyball game at Hickman High School on Thursday, Sept. 21, 2017.&#xA;REDACTED FILE PATH</rdf:li>
                </rdf:Alt>
            </dc:description>
            <dc:creator>
                <rdf:Seq>
                    <rdf:li>REDACTED FOR PII</rdf:li>
                </rdf:Seq>
            </dc:creator>
            <dc:rights>
                <rdf:Alt>
                    <rdf:li xml:lang="x-default">REDACTED FILE PATH</rdf:li>
                </rdf:Alt>
            </dc:rights>
            <Iptc4xmpCore:CreatorContactInfo
            Iptc4xmpCore:CiTelWork="REDACTED FOR PII"
            Iptc4xmpCore:CiEmailWork="REDACTED FOR PII"/>
        </rdf:Description>
    </rdf:RDF>
</x:xmpmeta>

```
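Because the block above is ordinary RDF/XML, it can also be read with a namespace-aware XML parser instead of string searches. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the sample; `parse_xmp` and `NS` are hypothetical names, and it assumes a single `rdf:Description` element, as in the sample (`find` returns only the first one).

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample XMP packet above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "photomechanic": "http://ns.camerabits.com/photomechanic/1.0/",
}

def parse_xmp(xmp_text):
    """Return (photomechanic attributes, dc:subject keywords) from an XMP packet."""
    root = ET.fromstring(xmp_text)
    desc = root.find("rdf:RDF/rdf:Description", NS)
    if desc is None:
        return {}, []
    # ElementTree stores namespaced attributes as "{uri}localname".
    pm_prefix = "{" + NS["photomechanic"] + "}"
    pm_attrs = {
        key[len(pm_prefix):]: value
        for key, value in desc.attrib.items()
        if key.startswith(pm_prefix)
    }
    keywords = [li.text for li in desc.findall("dc:subject/rdf:Bag/rdf:li", NS)]
    return pm_attrs, keywords

# Trimmed version of the sample packet.
sample = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:photomechanic="http://ns.camerabits.com/photomechanic/1.0/"
   photomechanic:ColorClass="0"
   photomechanic:Tagged="False">
   <dc:subject><rdf:Bag><rdf:li>Varsity</rdf:li><rdf:li>sports</rdf:li></rdf:Bag></dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

attrs, keywords = parse_xmp(sample)
print(attrs)     # {'ColorClass': '0', 'Tagged': 'False'}
print(keywords)  # ['Varsity', 'sports']
```

A parser is sturdier than fixed string offsets when attribute order or spacing changes, at the cost of first having to cut the packet out of the image bytes.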

##### Steps to find XMP metadata in image:
1. Get the image's file path
2. Open the image in read-binary (`"rb"`) mode and read its bytes
3. Search the bytes for the opening `<x:xmpmeta` tag and read up to the closing `</x:xmpmeta>` tag
4. Add padding (the length of the closing tag) to the end index so the last closing tag is included
5. Print out the XMP data
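The steps above can be sketched as follows. `extract_xmp` and `read_xmp` are hypothetical helper names, and decoding as UTF-8 is an assumption (it is the usual encoding for XMP embedded in JPEG files). This only finds XMP embedded in the image, not XMP kept in a sidecar file.

```python
XMP_OPEN = b"<x:xmpmeta"
XMP_CLOSE = b"</x:xmpmeta>"

def extract_xmp(data):
    """Return the embedded XMP packet in `data` (bytes) as text, or None."""
    start = data.find(XMP_OPEN)            # step 3: opening tag
    if start == -1:
        return None
    end = data.find(XMP_CLOSE, start)      # step 3: closing tag after it
    if end == -1:
        return None
    end += len(XMP_CLOSE)                  # step 4: "padding" to keep the closing tag
    return data[start:end].decode("utf-8", errors="replace")

def read_xmp(path):
    """Steps 1-2: open the image in read-binary mode, then extract its XMP."""
    with open(path, "rb") as f:
        return extract_xmp(f.read())

# Step 5 on a fake byte string standing in for an image file.
fake_jpeg = b'\xff\xd8 <x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta> \xff\xd9'
print(extract_xmp(fake_jpeg))  # <x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>
```

Searching the raw bytes avoids converting the whole file to a string first.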

From there, the three most useful tags are located near the top of the block:
```
photomechanic:ColorClass="0"
photomechanic:Tagged="False"
photomechanic:Prefs="0:0:0:008714"
```
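Here `ColorClass` is the color code assigned to the image, `Tagged` records whether the image is tagged at all, and `Prefs` is a colon-separated string that holds specially designated preferences (in this example its last field, `008714`, matches `aux:ImageNumber`). A color class is set on roughly 20% of all images in the dataset and ranges from 1 to 8; a value of 0 means no color class was assigned.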
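A minimal sketch for pulling just these three tags out of XMP text and converting them to Python types. The function name is hypothetical, and the regular expression assumes the attribute form shown above (`photomechanic:Name="value"`); an XMP file that stores these as child elements would not match. `Prefs` is only split on colons, without assuming what each field means.

```python
import re

# Matches photomechanic:Name="value" attribute pairs for the three tags of interest.
PM_ATTR = re.compile(r'photomechanic:(ColorClass|Tagged|Prefs)="([^"]*)"')

def photomechanic_fields(xmp_text):
    """Return ColorClass (int), Tagged (bool) and Prefs (list of str) when present."""
    raw = dict(PM_ATTR.findall(xmp_text))
    fields = {}
    if "ColorClass" in raw:
        fields["ColorClass"] = int(raw["ColorClass"])  # 0 means no color class
    if "Tagged" in raw:
        fields["Tagged"] = raw["Tagged"] == "True"
    if "Prefs" in raw:
        fields["Prefs"] = raw["Prefs"].split(":")
    return fields

snippet = '''photomechanic:ColorClass="0"
photomechanic:Tagged="False"
photomechanic:Prefs="0:0:0:008714"'''
print(photomechanic_fields(snippet))
# {'ColorClass': 0, 'Tagged': False, 'Prefs': ['0', '0', '0', '008714']}
```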

In [None]:
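def find_color_code(data_loader):
    """Print the Photo Mechanic color class and path of every color-classed image.

    Each item yielded by data_loader is unpacked as a 3-tuple whose last
    element is the image's file path.
    """
    counter = 0  # images whose ColorClass is "0" (no color class)
    i = 0        # total images visited
    marker = b'photomechanic:ColorClass="'
    for _, _, path in data_loader:
        i += 1
        path = path.rstrip()
        try:
            with open(path, "rb") as f:
                img = f.read()
        except OSError as e:
            print(f"Could not read {path}: {e}")
            continue
        # Search the raw bytes directly rather than str(img), which would
        # produce the escaped repr of the bytes object.
        start = img.find(marker)
        if start == -1:
            continue  # no Photo Mechanic ColorClass in this file
        value_start = start + len(marker)
        # The color class is a single digit (0-8) right after the opening quote.
        color_class = img[value_start:value_start + 1].decode("ascii", errors="replace")
        if color_class != "0":
            print(color_class + " " + str(path) + "\n\n")
        else:
            counter += 1
    print("Images with ColorClass 0: " + str(counter))
    print("Total Images: " + str(i))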

Before we can find the color codes, we need a set of images. We get one by passing a dataloader to `find_color_code`; these dataloaders are generated in `load_split_train_test`.
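`find_color_code` only unpacks items as `(_, _, path)`, so for a quick check outside the training pipeline any iterable of 3-tuples ending in a path should work. The stand-in below is hypothetical (it is not one of the dataloaders from `load_split_train_test`) and the accepted suffixes are an assumption.

```python
from pathlib import Path

def path_loader(paths, suffixes=(".jpg", ".jpeg")):
    """Hypothetical dataloader stand-in: yield (None, None, path) for image files."""
    for p in paths:
        p = Path(p)
        if p.suffix.lower() in suffixes:
            yield None, None, str(p)

print(list(path_loader(["a.JPG", "notes.txt", "b.jpeg"])))
# [(None, None, 'a.JPG'), (None, None, 'b.jpeg')]
```

For example, `find_color_code(path_loader(Path("/path/to/images").rglob("*")))` would scan a whole folder tree; the directory is a placeholder.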

#### Next Step Notes:
Look into subclassing the dataloaders: we might be able to run this step there and save the color class directly as a label to speed things up. Also try running this across all images in the path to see how many files have a color class.
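For the second note, a sketch that tallies color classes over every file instead of printing them one at a time. The helper names are hypothetical, and it relies on the same attribute-string assumption as `find_color_code`; files without a `ColorClass` attribute are counted under the key `None`.

```python
import re
from collections import Counter
from pathlib import Path

COLOR_CLASS = re.compile(rb'photomechanic:ColorClass="(\d+)"')

def color_class_of(data):
    """Return the ColorClass found in raw image bytes as an int, or None if absent."""
    match = COLOR_CLASS.search(data)
    return int(match.group(1)) if match else None

def color_class_counts(paths):
    """Count ColorClass values (None = attribute missing) over the given files."""
    counts = Counter()
    for p in paths:
        counts[color_class_of(Path(p).read_bytes())] += 1
    return counts
```

For example, `color_class_counts(Path("/path/to/images").rglob("*.jpg"))` (placeholder directory) returns a `Counter` keyed by color class; the share of color-classed files is the total for keys other than `None` and `0` divided by `sum(counts.values())`.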