# Areal Project

<div>
<img src="logo.jpg" width="150" align="left" border="20">

ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". THE CDS, CHALEARN, AND/OR OTHER ORGANIZERS OR CODE AUTHORS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT SHALL AUTHORS AND ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE.
</div>

<div>
    <h2>Introduction </h2>
    <p>
     <br>
Aerial imagery has long been a primary source of geographic data. As technology has progressed, it has become a practical tool for remote sensing: the science of obtaining information about an object, area, or phenomenon from a distance.
New challenges in remote sensing call for pixel classification methods that, once trained on one dataset, generalize to other areas of the Earth.
In this challenge, we therefore design pixel classification methods on areas. The goal is to find urban areas in the Areal dataset. The Areal dataset is a small dataset created from the <a href="https://project.inria.fr/aerialimagelabeling/">Inria Aerial Image Labeling Dataset</a>. It covers a wide range of urban settlement appearances from 5 different cities in different geographic locations. The dataset is divided into 3 parts: a training set, a validation set, and a test set.
    </p>
    <p>
References and credits:
Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, Pierre Alliez.
    </p>
</div>

In [1]:
import numpy as np
import random
from sample_code_submission.model import model
import re

In [2]:
model_dir = "sample_code_submission"
result_dir = 'sample_result_submission_pp/' 
problem_dir = 'ingestion_program/'  
score_dir = 'scoring_program/'

In [3]:
import sys

# Make the model, ingestion, and scoring modules importable.
sys.path.extend([model_dir, problem_dir, score_dir])

<div>
    <h1> Step 1: Exploratory data analysis </h1>
<p>
The starting kit ships with sample_data. To prepare your submission, you must download public_data from the challenge website and point data_dir to it.
</p>
</div>
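As a minimal sketch of how the notebook could switch between the two data sources, the snippet below picks the first directory that exists on disk. The helper `pick_data_dir` is hypothetical and not part of the starting kit, and the folder name `public_data` is taken from the text above; the actual folder name after download may differ.

```python
import os

def pick_data_dir(candidates=("public_data", "sample_preprocessed_data")):
    """Return the first candidate directory that exists, or None if none do."""
    for candidate in candidates:
        if os.path.isdir(candidate):
            return candidate
    return None

# Prefer the downloaded public data, fall back to the bundled sample.
data_dir = pick_data_dir()
print("Using data from:", data_dir)
```

Listing the public data first means the sample is only used when nothing has been downloaded yet.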

In [1]:
data_dir = 'sample_preprocessed_data'
data_name = 'Areal'

In [54]:
from ingestion_program.data_io import read_as_df

# Load the dataset as a DataFrame (features plus a target column).
data = read_as_df(data_dir + '/' + data_name)

Reading sample_data/Areal_train from AutoML format
Number of examples = 65
Number of features = 196608
        Class
0      forest
1       river
2    mountain
3      meadow
4     wetland
5   chaparral
6        lake
7    snowberg
8       beach
9         sea
10      cloud
11     island
12     desert
Number of classes = 13
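
Once the class names are known, it is worth checking how many examples each class has. The sketch below does this with the standard library on a toy label list; the labels are made up for illustration and are not the real dataset contents.

```python
from collections import Counter

def class_counts(labels):
    """Count the examples per class label, most frequent first."""
    return Counter(labels).most_common()

# Toy labels for illustration only; the real ones come from the target column.
toy_labels = ["forest", "river", "forest", "lake", "river", "forest"]
for name, count in class_counts(toy_labels):
    print(f"{name:10s} {count}")
```

With the DataFrame loaded above, `data['target'].value_counts()` should give the same information, assuming the label column is named `target` as in the outputs shown in this notebook.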


In [11]:
data.head()

|  | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | feature_10 | ... | feature_4088 | feature_4089 | feature_4090 | feature_4091 | feature_4092 | feature_4093 | feature_4094 | feature_4095 | feature_4096 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.086625 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 2.014147 | 0.0 | 0.95539 | 0.0 | 0.0 | 0.0 | forest |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.312153 | 0.0 | 0.0 | 0.0 | 2.953564 | 0.0 | 0.0 | 1.072329 | chaparral |
| 2 | 0.0 | 0.0 | 0.0 | 2.515129 | 0.095439 | 0.0 | 0.045738 | 0.148969 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | chaparral |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 1.188573 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 2.228957 | 1.464646 | 0.0 | 0.0 | 0.0 | 1.62498 | 0.0 | 0.0 | 0.0 | beach |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 2.063259 | 0.0 | 0.0 | 0.0 | 1.344442 | 0.0 | 0.0 | 0.552447 | river |


In [12]:
data.describe()

|  | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | feature_10 | ... | feature_4087 | feature_4088 | feature_4089 | feature_4090 | feature_4091 | feature_4092 | feature_4093 | feature_4094 | feature_4095 | feature_4096 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | ... | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 | 300.0 |
| mean | 0.006075 | 0.000996 | 0.035626 | 0.198109 | 0.306302 | 0.145392 | 0.029904 | 0.015574 | 0.017993 | 0.288413 | ... | 0.004769 | 0.60952 | 0.59716 | 0.32359 | 0.020528 | 0.031902 | 1.481293 | 0.0 | 0.004534 | 1.239524 |
| std | 0.105224 | 0.014681 | 0.23613 | 0.668374 | 0.436969 | 0.522102 | 0.198194 | 0.179693 | 0.155665 | 0.591321 | ... | 0.082608 | 1.323339 | 0.917379 | 1.01901 | 0.174347 | 0.250427 | 1.776533 | 0.0 | 0.078524 | 1.317253 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.063618 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.872153 | 0.0 | 0.0 | 0.898127 |
| 75% | 0.0 | 0.0 | 0.0 | 0.0 | 0.523254 | 0.0 | 0.0 | 0.0 | 0.0 | 0.366293 | ... | 0.0 | 0.559731 | 0.934315 | 0.0 | 0.0 | 0.0 | 2.437103 | 0.0 | 0.0 | 2.187367 |
| max | 1.822537 | 0.249662 | 2.510623 | 4.527941 | 2.389151 | 3.870526 | 1.803066 | 2.875917 | 2.048211 | 4.265208 | ... | 1.430818 | 9.752917 | 4.911635 | 7.399162 | 2.014147 | 3.326735 | 9.211884 | 0.0 | 1.360076 | 5.854851 |
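
In the summary above, feature_4094 has a minimum and a maximum of 0.0, so it carries no information for a classifier. The sketch below shows one possible way to detect and drop such constant columns with NumPy. It runs on a toy array, and whether dropping these features helps on the real data is untested here.

```python
import numpy as np

def constant_columns(X):
    """Return the indices of columns whose values never vary."""
    X = np.asarray(X, dtype=float)
    return np.flatnonzero(X.max(axis=0) == X.min(axis=0))

# Toy feature matrix: the first column is constant.
toy = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.2, 0.0],
                [0.0, 0.9, 3.1]])
dead = constant_columns(toy)
reduced = np.delete(toy, dead, axis=1)
print("constant columns:", dead, "new shape:", reduced.shape)
```

The constant columns should be identified on the training set only and the same indices removed from the validation and test sets, so that all splits keep matching feature layouts.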


In [68]:
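# Show the target column, then split the frame into features (X) and labels (y).
print(data.iloc[:, -1:])
X = data.iloc[:, :-1]  # every column except the last
y = data.iloc[:, -1:]  # last column only (the class label)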

       target
0        lake
1        lake
2     wetland
3   chaparral
4      forest
5       beach
6   chaparral
7      desert
8      island
9         sea
10     forest
11    wetland
12     meadow
13   mountain
14   mountain
15     desert
16     desert
17      river
18      cloud
19     desert
20     island
21       lake
22      beach
23  chaparral
24     island
25      river
26        sea
27      cloud
28        sea
29    wetland
..        ...
35     forest
36     meadow
37     forest
38     forest
39      river
40      beach
41       lake
42      river
43       lake
44        sea
45   snowberg
46    wetland
47        sea
48   mountain
49      cloud
50  chaparral
51      beach
52     island
53      cloud
54   mountain
55     meadow
56     meadow
57  chaparral
58     island
59      beach
60   mountain
61     meadow
62   snowberg
63      river
64     desert

[65 rows x 1 columns]
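
The targets are strings, and many numerical models expect integer class codes. A minimal sketch of one way to encode and decode them with NumPy follows. The helper `encode_labels` is hypothetical, and the toy targets reuse the first five labels printed above.

```python
import numpy as np

def encode_labels(labels):
    """Map string labels to integer codes; return (codes, class_names)."""
    class_names, codes = np.unique(np.asarray(labels), return_inverse=True)
    return codes, class_names

toy_targets = ["lake", "lake", "wetland", "chaparral", "forest"]
codes, class_names = encode_labels(toy_targets)
print(codes)               # integer code per example (classes sorted alphabetically)
print(class_names[codes])  # indexing with the codes recovers the original strings
```

Keeping `class_names` around makes it possible to turn a model's integer predictions back into readable class names.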


# Step 2: Building a predictive model

In [13]:
from data_manager import DataManager
D = DataManager(data_name, data_dir)
print(D)

Info file found : /home/biard/Documents/université/m2/s1/Projet/Remote-Sensing-Image/starting_kit/sample_preprocessed_data/Areal_public.info
DataManager : Areal
info:
	usage = Sample dataset Areal preprocessed data
	name = areal
	task = multiclass.classification
	target_type = 
	feat_type = Numerical
	metric = accuracy
	time_budget = 12000
	feat_num = 4096
	target_num = 13
	label_num = 13
	train_num = 300
	valid_num = 100
	test_num = 0
	has_categorical = 0
	has_missing = 0
	is_sparse = 0
	format = dense
data:
	X_train = array(300, 4096)
	Y_train = array(300, 1)
	X_valid = array(100, 4096)
	Y_valid = array(100, 1)
	X_test = array(0,)
	Y_test = array(0,)
feat_type:	array(4096,)
feat_idx:	array(0,)
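
Since the info block lists accuracy as the metric, a simple baseline helps put later scores in context. The sketch below is a toy nearest-centroid classifier in NumPy, evaluated on synthetic data. It is not the `model` class from `sample_code_submission`, whose interface is not shown here. The `fit`/`predict` method names and the `(n, 1)` label shape are assumptions modelled on the array shapes printed above.

```python
import numpy as np

class NearestCentroid:
    """Toy baseline: assign each sample to the class with the closest mean vector."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y).ravel()
        self.classes_ = np.unique(y)
        # One mean feature vector per class.
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true).ravel() == np.asarray(y_pred).ravel()))

# Synthetic stand-in for D.data['X_train'] etc.: two well-separated clusters.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 0.1, (20, 8)), rng.normal(5, 0.1, (20, 8))])
y_train = np.array([0] * 20 + [1] * 20).reshape(-1, 1)
X_valid = np.vstack([rng.normal(0, 0.1, (5, 8)), rng.normal(5, 0.1, (5, 8))])
y_valid = np.array([0] * 5 + [1] * 5).reshape(-1, 1)

clf = NearestCentroid().fit(X_train, y_train)
score = accuracy(y_valid, clf.predict(X_valid))
print("validation accuracy:", score)
```

On the real 13-class data, a useful model should at least beat both this kind of baseline and the roughly 1/13 accuracy expected from uniform random guessing.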

