<img src='https://www.cs.usask.ca/images/news/2020/wheatdetection.png' width="768">
<h1><center>Global Wheat Detection EDA for BBox</center></h1>

## <a name="Wheat Detection">About this EDA 📗</a>

The training data for this competition contains some wrong bounding boxes (bboxes) and some unlabeled wheat heads. Examples of wrong bboxes:
- Huge bboxes
- Bboxes that are too small
- Wrong targets (e.g. a ladybug)
- etc.
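
The huge and too-small cases in the list above can be screened by box area before any manual review. The following is a minimal sketch, not the procedure used to build the clean dataset: it assumes `bbox` is stored as a string of the form `[xmin, ymin, width, height]`, and the helper names, the toy rows, and both area thresholds are hypothetical values chosen only for illustration.

```python
import pandas as pd

# Hypothetical thresholds in square pixels; tune them by looking at real images.
MIN_AREA = 100.0
MAX_AREA = 200_000.0


def parse_bbox(text):
    """Turn '[xmin, ymin, width, height]' into a tuple of four floats."""
    xmin, ymin, width, height = (float(v) for v in text.strip("[]").split(","))
    return xmin, ymin, width, height


def flag_suspicious(df, min_area=MIN_AREA, max_area=MAX_AREA):
    """Return a copy of df with numeric bbox columns and two boolean flags."""
    parts = pd.DataFrame(
        [parse_bbox(s) for s in df["bbox"]],
        columns=["xmin", "ymin", "width", "height"],
        index=df.index,
    )
    out = df.join(parts)
    out["area"] = out["width"] * out["height"]
    out["too_small"] = out["area"] < min_area
    out["too_large"] = out["area"] > max_area
    return out


# Toy rows (not taken from the real train.csv) to show the behaviour.
demo = pd.DataFrame({
    "image_id": ["img_a", "img_a", "img_b"],
    "bbox": [
        "[10.0, 20.0, 60.0, 80.0]",
        "[0.0, 0.0, 900.0, 700.0]",
        "[5.0, 5.0, 3.0, 4.0]",
    ],
})
flagged = flag_suspicious(demo)
print(flagged[["image_id", "area", "too_small", "too_large"]])
```

Area alone cannot catch wrong targets such as the ladybug, so flagged rows are only candidates for a visual check.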

I cleaned the data in my own way.

Although this competition is over, I think it may still help someone, so I decided to share the results.
You can download **clean_df.csv** from my public dataset.

**[Update]** The Clean BBox Dataset for Wheat Detection is available [here](http://www.kaggle.com/piantic/gwd-clean-train), and I have added it to this notebook.

If you found this new and useful, please consider <font color='orange'>upvoting</font>; it motivates me to keep writing good kernels. 😄

# <a id='importing'>Importing the necessary libraries </a> 

In [None]:
import cv2
import math
import numpy as np
import scipy as sp
import pandas as pd

import glob 
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

import plotly.graph_objects as go
import plotly.express as px
import plotly.figure_factory as ff

# <a id='reading'>Reading the train.csv </a>
- train_df : the original `train.csv` (loaded from `original_train.csv`)
- clean_df : the cleaned `train.csv` (loaded from `new_train_0805.csv`)

In [None]:
train_df = pd.read_csv('../input/gwd-clean-train/original_train.csv')
clean_df = pd.read_csv('../input/gwd-clean-train/new_train_0805.csv')

In [None]:
train_image_path = "../input/global-wheat-detection/train/"
test_image_path = "../input/global-wheat-detection/test/"

In [None]:
train_df.head()

In [None]:
clean_df.head()

## Unique image_ids

In [None]:
train_df['image_id'].nunique()

In [None]:
clean_df['image_id'].nunique()

# Visualizing images

In [None]:
def show_box(df, image_id, color='red'):
    # Collect the bbox strings ("[xmin, ymin, width, height]") for this image
    boxes = df.loc[df['image_id'] == image_id, 'bbox'].dropna().to_numpy()

    # train_image_path already ends with '/', so no extra separator is needed
    image = cv2.imread(f'{train_image_path}{image_id}.jpg')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # The image is RGB here, so (255, 0, 0) is red and (0, 0, 255) is blue
    color_tuple = (0, 0, 255) if color == 'blue' else (255, 0, 0)

    for box in boxes:
        # Strip the surrounding brackets and split into the four values
        xmin, ymin, width, height = (int(float(v)) for v in box[1:-1].split(','))
        xmax = xmin + width
        ymax = ymin + height
        image = cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color_tuple, 3)

    # Convert once after the loop so an image without boxes is still returned
    return Image.fromarray(image)

Below, I show images with their bboxes drawn.

`red`: train.csv
<br>

`blue`: clean train.csv

You can spot the wrong bboxes. - Only visible to good people :)
- I recommend viewing the images in a larger view.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(16, 8))
    
ax.imshow(show_box(train_df, '41c0123cc'))

In this image, there is a huge bbox drawn in `red`.

I removed this bbox, as shown below (`blue`).

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(16, 8))
    
ax.imshow(show_box(clean_df, '41c0123cc', 'blue'))

This looks pretty good.

But one wheat head is now `not labeled`: the huge box that I removed was the label for it.

# <a id='reading'>Cleaning the train.csv </a>
There are several ways to clean the data. Generally:
1. Remove every image that has something wrong in the train data: `simple`, `but inefficient`

2. Remove only the wrong bboxes from the images: `it seems good`, but `unmarked wheat heads appear.`

3. Remove the wrong bboxes and add new bboxes to the images: `best, I think`, `but it needs domain knowledge & a lot of resources.`

4. etc.

I mainly chose the 2nd way.


I also removed an image entirely when too many unmarked wheat heads appeared in it.
- This is very subjective, so it may differ from your opinion.
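
To see how the two choices above (dropping single boxes versus dropping whole images) show up in the data, the original and cleaned annotation tables can be compared per image. This is a minimal sketch under the assumption that both tables have one row per box and an `image_id` column, as `train_df` and `clean_df` do; the function name `summarize_cleaning` and the toy rows are hypothetical.

```python
import pandas as pd


def summarize_cleaning(original, cleaned):
    """Count boxes per image before and after cleaning."""
    orig_counts = original.groupby("image_id").size().rename("n_original")
    clean_counts = cleaned.groupby("image_id").size().rename("n_clean")
    # Images missing from one table get a count of 0 after the outer alignment
    summary = pd.concat([orig_counts, clean_counts], axis=1).fillna(0).astype(int)
    summary["n_removed"] = summary["n_original"] - summary["n_clean"]
    # An image with no boxes left in the cleaned table was dropped entirely
    summary["image_dropped"] = summary["n_clean"] == 0
    return summary


# Toy annotation tables (not the real data).
original_demo = pd.DataFrame({"image_id": ["a", "a", "a", "b", "b", "c"]})
cleaned_demo = pd.DataFrame({"image_id": ["a", "a", "b", "b"]})

summary = summarize_cleaning(original_demo, cleaned_demo)
dropped_images = summary.index[summary["image_dropped"]].tolist()
print(summary)
print("dropped images:", dropped_images)
```

With the real tables, the call would be `summarize_cleaning(train_df, clean_df)`; sorting the result by `n_removed` gives a quick list of the images that changed most.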

That is all for the explanation.

Let's start visualizing the cleaned bboxes.
- left side : `red` - train.csv
- right side : `blue` - clean train.csv

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '41c0123cc'))
ax[1].imshow(show_box(clean_df, '41c0123cc', 'blue'))

> A wrong bbox on the left side, at about `(322.0, 626.0)`.
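
The coordinates quoted under each comparison are presumably the `(xmin, ymin)` of the removed box. A list of removed boxes for one image can be produced by comparing the two annotation tables. The sketch below assumes unchanged boxes carry the same four values in both tables and that `bbox` is a `[xmin, ymin, width, height]` string; `removed_boxes` and the toy rows are hypothetical.

```python
import pandas as pd


def parse_bbox(text):
    """Turn '[xmin, ymin, width, height]' into a tuple of four floats."""
    return tuple(float(v) for v in text.strip("[]").split(","))


def removed_boxes(original, cleaned, image_id):
    """Return the boxes of image_id that exist in original but not in cleaned."""
    orig = [parse_bbox(b) for b in original.loc[original["image_id"] == image_id, "bbox"]]
    kept = {parse_bbox(b) for b in cleaned.loc[cleaned["image_id"] == image_id, "bbox"]}
    return [box for box in orig if box not in kept]


# Toy annotation tables (not the real data).
original_demo = pd.DataFrame({
    "image_id": ["img_a", "img_a"],
    "bbox": ["[10.0, 20.0, 30.0, 40.0]", "[100.0, 200.0, 500.0, 400.0]"],
})
cleaned_demo = pd.DataFrame({
    "image_id": ["img_a"],
    "bbox": ["[10.0, 20.0, 30.0, 40.0]"],
})

removed = removed_boxes(original_demo, cleaned_demo, "img_a")
for xmin, ymin, width, height in removed:
    print(f"removed box at about ({xmin}, {ymin}), size {width} x {height}")
```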

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '2cc75e9f5'))
ax[1].imshow(show_box(clean_df, '2cc75e9f5', 'blue'))

> A wrong bbox on the left side, at about `(37.0, 84.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'c01a58fdb'))
ax[1].imshow(show_box(clean_df, 'c01a58fdb', 'blue'))

> You can see `a ladybug` on the left side, at about `(713.0, 634.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'c631c7fdb'))
ax[1].imshow(show_box(clean_df, 'c631c7fdb', 'blue'))

> A wrong bbox on the left side, at about `(545.0, 798.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '61ff5cdc2'))
ax[1].imshow(show_box(clean_df, '61ff5cdc2', 'blue'))

> A wrong bbox on the left side, at about `(526.0, 531.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'c6d94be4c'))
ax[1].imshow(show_box(clean_df, 'c6d94be4c', 'blue'))

> A bbox on a wrong target on the left side, at about `(633.0, 421.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '8bad19780'))
ax[1].imshow(show_box(clean_df, '8bad19780', 'blue'))

> A wrong bbox on the left side, at about `(149.0, 1.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '4cbb2b7bd'))
ax[1].imshow(show_box(clean_df, '4cbb2b7bd', 'blue'))

> A wrong bbox on the left side, at about `(322.0, 626.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'fade0e053'))
ax[1].imshow(show_box(clean_df, 'fade0e053', 'blue'))

> A wrong bbox on the left side, at about `(753.0, 768.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '9a30dd802'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'b53afdf5c'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'd96205316'))
ax[1].imshow(show_box(clean_df, 'd96205316', 'blue'))

> A wrong bbox on the left side, at about `(479.0, 813.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'dc7c60052'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '6106eefbc'))

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '83842ec14'))
ax[1].imshow(show_box(clean_df, '83842ec14', 'blue'))

> A wrong bbox on the left side, at about `(493.0, 533.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '42e6efaaa'))
ax[1].imshow(show_box(clean_df, '42e6efaaa', 'blue'))

> A wrong bbox on the left side, at about `(272.0, 0.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'fc6860020'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '9780d64f5'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '49c3e4f6e'))
ax[1].imshow(show_box(clean_df, '49c3e4f6e', 'blue'))

> A wrong bbox on the left side, at about `(253.0, 684.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '409a8490c'))
ax[1].imshow(show_box(clean_df, '409a8490c', 'blue'))

> A wrong bbox on the left side, at about `(102.0, 268.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'd7a02151d'))
ax[1].imshow(show_box(clean_df, 'd7a02151d', 'blue'))

> A wrong bbox on the left side, at about `(712.0, 217.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'a1321ca95'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'a36608629'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '5ec31deb1'))
ax[1].imshow(show_box(clean_df, '5ec31deb1', 'blue'))

> A wrong bbox on the left side, at about `(602.0, 131.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '834690a35'))
ax[1].imshow(show_box(clean_df, '834690a35', 'blue'))

> Two wrong bboxes on the left side, at about `(31.0, 393.0)` and `(952.0, 729.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'f5f5a9d30'))
ax[1].imshow(show_box(clean_df, 'f5f5a9d30', 'blue'))

> A wrong bbox on the left side, at about `(499.0, 677.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'd8cae4d1b'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'd067ac2b1'))
ax[1].imshow(show_box(clean_df, 'd067ac2b1', 'blue'))

> A wrong bbox on the left side, at about `(0.0, 153.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '69e509038'))
ax[1].imshow(show_box(clean_df, '69e509038', 'blue'))

> A wrong bbox on the left side, at about `(590.0, 419.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'f24698e88'))
ax[1].imshow(show_box(clean_df, 'f24698e88', 'blue'))

> A wrong bbox on the left side, at about `(721.0, 768.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '4b4f6de9b'))
ax[1].imshow(show_box(clean_df, '4b4f6de9b', 'blue'))

> A wrong bbox on the left side, at about `(13.0, 431.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '91c7fb84e'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '7d5af5b74'))

> This image was removed from the clean data.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '7c659d49a'))
ax[1].imshow(show_box(clean_df, '7c659d49a', 'blue'))

> A wrong bbox on the left side, at about `(243.0, 633.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, 'd60e832a5'))
ax[1].imshow(show_box(clean_df, 'd60e832a5', 'blue'))

> A wrong bbox on the left side, at about `(325.0, 62.0)`.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '595709e55'))
ax[1].imshow(show_box(clean_df, '595709e55', 'blue'))

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
    
ax[0].imshow(show_box(train_df, '893938464'))
ax[1].imshow(show_box(clean_df, '893938464', 'blue'))

> A wrong bbox on the left side, at about `(116.0, 177.0)`.

## If this kernel is useful, <font color='orange'>please upvote</font>!
- I hope everyone gets something good out of this competition!