# 🌍🛰️ Planet: Understanding the Amazon from Space
## Use satellite data to track the human footprint in the Amazon rainforest

### Description
🌍🛰️ Planet Aerial Imagery

Every passing minute, an expanse of forest equivalent to 48 football fields disappears from our planet. The Amazon Basin takes the lead in this alarming deforestation trend, contributing to biodiversity loss, habitat destruction, climate change, and other catastrophic effects. Precise data on deforestation and human activities in forests is crucial for swift and effective responses from governments and local stakeholders.

🛰️ Planet, the innovator behind the world’s largest fleet of Earth-imaging satellites, is set to capture daily imagery of the entire Earth's land surface at an impressive 3-5 meter resolution. While existing research focuses on monitoring forest changes, it often relies on coarse-resolution imagery from sources like Landsat (30-meter pixels) or MODIS (250-meter pixels), limiting its effectiveness in areas dominated by small-scale deforestation or forest degradation.

Moreover, current methods struggle to distinguish between human-induced and natural forest loss. Higher resolution imagery, such as that from Planet, has demonstrated exceptional capability in this regard, but robust algorithms are yet to be developed.

🌈 In this competition, Planet and its Brazilian partner SCCON invite Kagglers to label satellite image chips with atmospheric conditions and various classes of land cover/land use. The resulting algorithms will help the global community understand when, where, and why deforestation occurs worldwide and, most importantly, how to respond effectively. 🚀✨

In [1]:
import os
path = "../input/planets-dataset/"
os.listdir(path)

['planet', 'test-jpg-additional']

In [2]:
import pandas as pd
train_label = pd.read_csv("/kaggle/input/planets-dataset/planet/planet/train_classes.csv")
train_label

Unnamed: 0,image_name,tags
0,train_0,haze primary
1,train_1,agriculture clear primary water
2,train_2,clear primary
3,train_3,clear primary
4,train_4,agriculture clear habitation primary road
...,...,...
40474,train_40474,clear primary
40475,train_40475,cloudy
40476,train_40476,agriculture clear primary
40477,train_40477,agriculture clear primary road


In [3]:
unique_labels = set()

def extract_unique_labels(tag_string):
    '''
    Takes in a string of tags, splits the tags, and adds them to the unique_labels set
    '''
    unique_labels.update(tag_string.split())

# Create a copy of the train_label DataFrame
train_data = train_label.copy()
# Apply the function to extract unique labels from the 'tags' column
train_data['tags'].apply(extract_unique_labels)
# Convert the set of unique labels to a list
unique_labels_list = list(unique_labels)
# Display the list of unique labels
print(unique_labels_list)

['haze', 'agriculture', 'slash_burn', 'cloudy', 'conventional_mine', 'artisinal_mine', 'selective_logging', 'water', 'primary', 'clear', 'blow_down', 'road', 'bare_ground', 'habitation', 'blooming', 'partly_cloudy', 'cultivation']


In [4]:
# One hot encoding for the labels in train classes
for label in unique_labels_list:
    train_data[label] = train_data['tags'].apply(lambda x: 1 if label in x.split() else 0)

# Adding '.jpg' extension to the 'image_name' column for consistency with image file names
train_data['image_name'] = train_data['image_name'].apply(lambda x: f'{x}.jpg')
train_data

Unnamed: 0,image_name,tags,haze,agriculture,slash_burn,cloudy,conventional_mine,artisinal_mine,selective_logging,water,primary,clear,blow_down,road,bare_ground,habitation,blooming,partly_cloudy,cultivation
0,train_0.jpg,haze primary,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1,train_1.jpg,agriculture clear primary water,0,1,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0
2,train_2.jpg,clear primary,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0
3,train_3.jpg,clear primary,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0
4,train_4.jpg,agriculture clear habitation primary road,0,1,0,0,0,0,0,0,1,1,0,1,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40474,train_40474.jpg,clear primary,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0
40475,train_40475.jpg,cloudy,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
40476,train_40476.jpg,agriculture clear primary,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0
40477,train_40477.jpg,agriculture clear primary road,0,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0
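
The one-hot step above can also be written with pandas' built-in `Series.str.get_dummies`, which splits each string on a separator and returns one 0/1 column per distinct tag. The sketch below runs on a toy frame whose three rows are copied from the table above; `toy`, `one_hot`, `encoded`, and `label_columns` are illustrative names, not part of the notebook. One possible advantage is that the resulting columns come back sorted, so the label order does not depend on set iteration order.

```python
import pandas as pd

# Toy frame with the same layout as train_classes.csv (rows taken from the output above).
toy = pd.DataFrame({
    "image_name": ["train_0", "train_1", "train_2"],
    "tags": ["haze primary", "agriculture clear primary water", "clear primary"],
})

# Split each tag string on spaces and build one 0/1 column per distinct tag.
one_hot = toy["tags"].str.get_dummies(sep=" ")

# Attach the indicator columns to the original frame.
encoded = pd.concat([toy, one_hot], axis=1)

# The label order used later for training and for decoding predictions.
label_columns = list(one_hot.columns)
print(label_columns)
print(encoded)
```

If this variant were used in the notebook, `label_columns` would take the place of `columns` everywhere the label order matters.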


In [5]:
# Collect the label columns that were added to train_data by one-hot encoding (everything after 'image_name' and 'tags').
columns = list(train_data.columns[2:])
columns

['haze',
 'agriculture',
 'slash_burn',
 'cloudy',
 'conventional_mine',
 'artisinal_mine',
 'selective_logging',
 'water',
 'primary',
 'clear',
 'blow_down',
 'road',
 'bare_ground',
 'habitation',
 'blooming',
 'partly_cloudy',
 'cultivation']

In [6]:
import tensorflow as tf

def fbeta_score(y_true, y_pred, beta=2, epsilon=1e-4):
    """
    Compute the F-beta score for multi-label classification.

    Args:
        y_true: Correct target values.
        y_pred: Predicted values returned by the classifier.
        beta: Beta value for weighting precision and recall.
        epsilon: Small constant to avoid division by zero.

    Returns:
        F-beta score.
    """
    beta_squared = beta**2

    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(tf.greater(tf.cast(y_pred, tf.float32), 0.5), tf.float32)

    true_positive = tf.reduce_sum(y_true * y_pred, axis=1)
    false_positive = tf.reduce_sum(y_pred, axis=1) - true_positive
    false_negative = tf.reduce_sum(y_true, axis=1) - true_positive

    precision = true_positive / (true_positive + false_positive + epsilon)
    recall = true_positive / (true_positive + false_negative + epsilon)

    fbeta = (1 + beta_squared) * precision * recall / (beta_squared * precision + recall + epsilon)
    return fbeta


def multi_label_accuracy(y_true, y_pred, epsilon=1e-4):
    """
    Compute accuracy for multi-label classification.

    Args:
        y_true: Correct target values.
        y_pred: Predicted values returned by the classifier.
        epsilon: Small constant to avoid division by zero.

    Returns:
        Accuracy score.
    """
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(tf.greater(tf.cast(y_pred, tf.float32), 0.5), tf.float32)

    true_positive = tf.reduce_sum(y_true * y_pred, axis=1)
    false_positive = tf.reduce_sum(y_pred, axis=1) - true_positive
    false_negative = tf.reduce_sum(y_true, axis=1) - true_positive

    y_true_bool = tf.cast(y_true, tf.bool)
    y_pred_bool = tf.cast(y_pred, tf.bool)

    true_negative = tf.reduce_sum(tf.cast(~y_true_bool, tf.float32) * tf.cast(~y_pred_bool, tf.float32), axis=1)

    accuracy = (true_positive + true_negative) / (true_positive + true_negative + false_positive + false_negative + epsilon)
    return accuracy


import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def build_model():
    # Use a pre-trained VGG16 model
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(128, 128, 3))
    
    # Freeze the convolutional layers
    for layer in base_model.layers:
        layer.trainable = False
    
    x = base_model.output
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.5)(x)
    
    # Output Layer
    predictions = Dense(17, activation='sigmoid')(x)
    
    # Create the final model
    model = Model(inputs=base_model.input, outputs=predictions)
    
    # Compile the model
    optimizer = Adam(learning_rate=1e-4)  # 'lr' is a deprecated alias for 'learning_rate'
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[multi_label_accuracy, fbeta_score])

    return model


# Model Checkpoint
save_best_check_point = ModelCheckpoint(filepath='best_model.hdf5', 
                                        monitor='val_fbeta_score',
                                        mode='max',
                                        save_best_only=True,
                                        save_weights_only=True)

# Image Data Generator
train_datagen = ImageDataGenerator(rescale=1/255, validation_split=0.2)

# Train Data Generator
train_generator = train_datagen.flow_from_dataframe(dataframe=train_data,
                                                    directory="/kaggle/input/planets-dataset/planet/planet/train-jpg",
                                                    x_col="image_name", y_col=columns, subset="training",
                                                    batch_size=16, seed=42, shuffle=True,
                                                    class_mode="raw", target_size=(128, 128))

# Validation Data Generator
val_generator = train_datagen.flow_from_dataframe(dataframe=train_data,
                                                  directory="/kaggle/input/planets-dataset/planet/planet/train-jpg",
                                                  x_col="image_name", y_col=columns, subset="validation",
                                                  batch_size=16, seed=42, shuffle=True,
                                                  class_mode="raw", target_size=(128, 128))

# Step Sizes
step_train_size = int(np.ceil(train_generator.samples / train_generator.batch_size))
step_val_size = int(np.ceil(val_generator.samples / val_generator.batch_size))

print('Build model')
model_1 = build_model()


print('Preview the model architecture')
model_1.summary()  # summary() prints the table itself and returns None
print()

print('fitting our model using the parameters already defined')
history_1 = model_1.fit(
    x = train_generator, 
    steps_per_epoch = step_train_size, 
    validation_data = val_generator, 
    validation_steps = step_val_size,
    epochs = 1, 
    callbacks=[save_best_check_point]
)

Found 32384 validated image filenames.
Found 8095 validated image filenames.
Build model
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Preview the model architecture
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 128, 128, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 128, 128, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 128, 128, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 64, 64, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 64, 64, 128)       73856     
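
As a sanity check on `fbeta_score`, the same arithmetic can be reproduced in NumPy on a couple of hand-made rows. This is a sketch that mirrors the TensorFlow code above (0.5 threshold, one score per sample, same `epsilon`); `fbeta_numpy` and the example arrays are illustrative and not part of the notebook.

```python
import numpy as np

def fbeta_numpy(y_true, y_prob, beta=2, epsilon=1e-4):
    """Per-sample F-beta, mirroring the TensorFlow metric defined above."""
    y_true = np.asarray(y_true, dtype=np.float32)
    # Binarize the predicted probabilities at 0.5, as the metric does.
    y_pred = (np.asarray(y_prob, dtype=np.float32) > 0.5).astype(np.float32)

    true_positive = (y_true * y_pred).sum(axis=1)
    false_positive = y_pred.sum(axis=1) - true_positive
    false_negative = y_true.sum(axis=1) - true_positive

    precision = true_positive / (true_positive + false_positive + epsilon)
    recall = true_positive / (true_positive + false_negative + epsilon)

    beta_squared = beta ** 2
    return (1 + beta_squared) * precision * recall / (
        beta_squared * precision + recall + epsilon
    )

# Row 0 misses one true label (1 TP, 0 FP, 1 FN).
# Row 1 finds both true labels but adds one wrong label (2 TP, 1 FP, 0 FN).
y_true = [[1, 1, 0, 0], [1, 0, 1, 0]]
y_prob = [[0.9, 0.2, 0.1, 0.3], [0.8, 0.7, 0.6, 0.1]]

scores = fbeta_numpy(y_true, y_prob)
print(scores)  # roughly 0.556 for row 0 and 0.909 for row 1
```

With `beta=2`, recall counts more than precision, which is why the row with a missed label scores lower than the row with an extra label. Keras averages the per-sample values over each batch when the function is passed in `metrics`.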


In [7]:
!ls

__notebook__.ipynb  best_model.hdf5


In [8]:
print('Initializing a second model for predictions')
# Initializing a second model for predictions
model_for_predictions = build_model()
model_for_predictions.load_weights('best_model.hdf5')

print('Loading the sample submission file')
# Loading the sample submission file
sample_submission = pd.read_csv('/kaggle/input/planets-dataset/planet/planet/sample_submission.csv')
# Adding '.jpg' extension to image names in the sample submission file
sample_submission['image_name'] = sample_submission['image_name'].apply(lambda x: '{}.jpg'.format(x))
# Print column names in the sample_submission DataFrame
print(sample_submission.columns)
print()

print('Splitting the sample submission file into two parts')
# Splitting the sample submission file into two parts
test_df_part1 = sample_submission.iloc[:40669][['image_name']].reset_index(drop=True)
test_df_part2 = sample_submission.iloc[40669:][['image_name']].reset_index(drop=True)

print('Creating generator for test_df_part1')
# Initialize ImageDataGenerator for the test images and perform rescaling
test_datagen = ImageDataGenerator(rescale=1/255)
# Create a generator for the images in the first part of the test dataset
test_generator_part1 = test_datagen.flow_from_dataframe(dataframe=test_df_part1,
                                                          directory="/kaggle/input/planets-dataset/planet/planet/test-jpg",
                                                          x_col="image_name",
                                                          y_col=None,
                                                          batch_size=16,
                                                          shuffle=False,
                                                          class_mode=None,
                                                          target_size=(128, 128))

# Get the number of steps for the test generator
step_test_size_part1 = int(np.ceil(test_generator_part1.samples / test_generator_part1.batch_size))
# Reset the generator so prediction starts from the first batch and the row order matches .filenames
test_generator_part1.reset()
# Make predictions on the first part of the test dataset
print('Making predictions on test_df_part1')
pred_part1 = model_for_predictions.predict(test_generator_part1, steps=step_test_size_part1, verbose=1)
# Get the filenames in the generator using the attribute .filenames
file_names_part1 = test_generator_part1.filenames
# Convert predicted values to a DataFrame and join the labels whose probability is greater than 0.5.
# `columns` is the label order passed as y_col during training, so it lines up with the model outputs.
pred_tags_part1 = pd.DataFrame(pred_part1)
pred_tags_part1 = pred_tags_part1.apply(lambda x: ' '.join(np.array(columns)[x > 0.5]), axis=1)
# Create a DataFrame for the first set of predictions
result_part1 = pd.DataFrame({'image_name': file_names_part1, 'tags': pred_tags_part1})
print(result_part1)
print()

                                           
print('Additional test dataset')
# test_df_part2 (rows 40669 onward of the sample submission) was already created above
# Create a generator for the additional test image files
print('Creating generator for test_df_part2')
test_generator_part2 = test_datagen.flow_from_dataframe(dataframe=test_df_part2,
                                                          directory="/kaggle/input/planets-dataset/test-jpg-additional/test-jpg-additional",
                                                          x_col="image_name",
                                                          y_col=None,
                                                          batch_size=16,
                                                          shuffle=False,
                                                          class_mode=None,
                                                          target_size=(128, 128))

# Get the number of steps for the additional test generator
step_test_size_part2 = int(np.ceil(test_generator_part2.samples / test_generator_part2.batch_size))
# Reset the additional test generator to avoid shuffling of indices
test_generator_part2.reset()
# Make predictions on the additional test dataset
print('Making predictions on test_df_part2')
add_pred_part2 = model_for_predictions.predict(test_generator_part2, steps=step_test_size_part2, verbose=1)
# Get the filenames in the generator using the attribute .filenames
file_names_part2 = test_generator_part2.filenames
# Convert predicted values to a DataFrame and join the labels whose probability is greater than 0.5.
# `columns` is the label order passed as y_col during training, so it lines up with the model outputs.
add_pred_tags_part2 = pd.DataFrame(add_pred_part2)
add_pred_tags_part2 = add_pred_tags_part2.apply(lambda x: ' '.join(np.array(columns)[x > 0.5]), axis=1)
# Create a DataFrame for the second set of predictions
result_part2 = pd.DataFrame({'image_name': file_names_part2, 'tags': add_pred_tags_part2})

# Concatenate the results in order to avoid shuffling the index
final_result = pd.concat([result_part1, result_part2]).reset_index(drop=True)
print(final_result)
print()
                                           
print('Save the final result to a CSV file')
# Save the final result to a CSV file
final_result.to_csv('submission_1.csv', index=False)

Initializing a second model for predictions
Loading the sample submission file
Index(['image_name', 'tags'], dtype='object')

Splitting the sample submission file into two parts
Creating generator for test_df_part1
Found 40669 validated image filenames.
Making predictions on test_df_part1
           image_name                   tags
0          test_0.jpg          primary clear
1          test_1.jpg          primary clear
2          test_2.jpg  primary partly_cloudy
3          test_3.jpg          primary clear
4          test_4.jpg  primary partly_cloudy
...               ...                    ...
40664  test_40664.jpg          primary clear
40665  test_40665.jpg          primary clear
40666  test_40666.jpg          primary clear
40667  test_40667.jpg  primary partly_cloudy
40668  test_40668.jpg          primary clear

[40669 rows x 2 columns]

Additional test dataset
Creating generator for test_df_part2
Found 20522 validated image filenames.
Making predictions on test_df_part2
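
The two prediction blocks in this cell repeat the same thresholding step, so it could be factored into one helper. The sketch below is one way to do that in plain Python; `probs_to_tags`, the four-label subset, and the probabilities are made up for illustration. The important assumption is that `labels` is passed in the same order as the label columns used for training (the `columns` list in this notebook), because the model's outputs follow that order.

```python
def probs_to_tags(probs, labels, threshold=0.5):
    """Turn rows of per-label probabilities into space-separated tag strings."""
    tag_strings = []
    for row in probs:
        if len(row) != len(labels):
            raise ValueError("each row needs exactly one probability per label")
        # Keep every label whose probability is strictly above the threshold.
        kept = [label for label, p in zip(labels, row) if p > threshold]
        tag_strings.append(" ".join(kept))
    return tag_strings

# Illustrative subset of labels in a fixed order (hypothetical values, not model output).
labels = ["haze", "primary", "clear", "partly_cloudy"]
probs = [
    [0.10, 0.95, 0.80, 0.05],
    [0.20, 0.90, 0.30, 0.70],
    [0.10, 0.20, 0.30, 0.40],
]

tags = probs_to_tags(probs, labels)
print(tags)  # ['primary clear', 'primary partly_cloudy', '']
```

The helper accepts nested lists or a 2-D NumPy array such as the output of `predict`. Note that a row with no probability above the threshold yields an empty tag string, as in the third example row.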
       

In [9]:
!ls

__notebook__.ipynb  best_model.hdf5  submission_1.csv


In [10]:
!mkdir /kaggle/working/output
!mv submission_1.csv /kaggle/working/output/
!ls /kaggle/working/output/

submission_1.csv


In [11]:
# !mkdir /kaggle/working/output

In [12]:
!kaggle kernels output

Traceback (most recent call last):
  File "/opt/conda/bin/kaggle", line 5, in <module>
    from kaggle.cli import main
  File "/opt/conda/lib/python3.7/site-packages/kaggle/__init__.py", line 23, in <module>
    api.authenticate()
  File "/opt/conda/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 166, in authenticate
    self.config_file, self.config_dir))
OSError: Could not find kaggle.json. Make sure it's located in /root/.kaggle. Or use the environment method.
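
The traceback shows that the `kaggle` package calls `api.authenticate()` at import time and fails when it finds neither a `kaggle.json` file nor the environment variables. A small pre-flight check along the following lines can make that failure easier to diagnose. This is a sketch under assumptions: `kaggle_credentials_available` is a hypothetical helper, the variable names `KAGGLE_USERNAME` and `KAGGLE_KEY` are the ones used in the cells below, and the default directory `~/.kaggle` matches the `/root/.kaggle` path from the error message.

```python
import os
from pathlib import Path

def kaggle_credentials_available(config_dir=None, environ=None):
    """Return True if either credential source named in the error message is present."""
    environ = os.environ if environ is None else environ
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"

    # "Environment method": both variables must be set to non-empty values.
    has_env = bool(environ.get("KAGGLE_USERNAME")) and bool(environ.get("KAGGLE_KEY"))
    # File method: a kaggle.json file inside the config directory.
    has_file = (config_dir / "kaggle.json").is_file()
    return has_env or has_file

print(kaggle_credentials_available())
```

Variables assigned through `os.environ` in a Python cell are inherited by later `!kaggle ...` commands, whereas a `!set` or `!export` line only affects its own short-lived shell. Real keys are better read from a secret store than typed into a notebook cell.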


In [13]:
# !set KAGGLE_USERNAME=isaacndirangumuturi

In [14]:
# !set KAGGLE_KEY=<your-kaggle-api-key>


In [15]:
# !kaggle datasets list
# !kaggle competitions list


In [16]:
# os.environ['KAGGLE_USERNAME'] = 'isaacndirangumuturi'
# os.environ['KAGGLE_KEY'] = '<your-kaggle-api-key>'

In [17]:
# !ls /root/.kaggle
