# ___

# [ Machine Learning in Geosciences ]

**Department of Applied Geoinformatics and Cartography, Charles University** 

*Lukas Brodsky lukas.brodsky@natur.cuni.cz*

    
___


# Machine Learning Project!

Step 3

Goal: This notebook demonstrates the **data discovery and visualization** step.  


Content:  **Data discovery and visualization**

    3.1/ Visualizing geographical and non-geographical data
    3.2/ Looking for correlations
    3.3/ Experimenting with attribute combinations (optional) 
___    

## Setup environment

In [None]:
# Common imports
import numpy as np
import os

# add more based on the topic of the lab

# to make this notebook's output stable across runs
np.random.seed(42)

# plotting 
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# path to the current lab directory: set this individually!
# TODO: adjust PROJECT_DIR to your setup
PROJECT_DIR = "./"
if os.path.isdir(PROJECT_DIR):
    print('OK, continue.')
else:
    print('Not OK: set the correct path to your project directory!')


In [None]:
import pandas as pd

# check the data set dir 
forest_path = os.path.join(PROJECT_DIR, "forest_fires")
print(os.listdir(forest_path))

# function to read the csv file 
def load_local_data(data_path, csv_file):
    csv_path = os.path.join(data_path, csv_file)
    return pd.read_csv(csv_path)

# load data 
fires = load_local_data(forest_path, "forestfires.csv")

# check header and some values 
fires.head()

### Plotting

In [None]:
plt.rcParams["figure.figsize"] = (10, 8)
# marker colour and marker size both encode the 'area' attribute
plt.scatter(fires['X'], fires['Y'],
            c=fires['area'], s=fires['area'],
            cmap="jet", alpha=0.5)
plt.xlabel("X")
plt.ylabel("Y")
plt.colorbar(label="area")
plt.show()
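
The plot above passes `area` directly as both colour and marker size. If the distribution of `area` is strongly skewed (an assumption about `forestfires.csv`, not something shown in this notebook), a few large values dominate the figure while most points almost vanish. A minimal sketch of one common remedy, a `log1p` transform with rescaled marker sizes, is given below. The helper name `scale_area_for_plot` and its parameters `min_size` and `max_size` are hypothetical.

```python
import numpy as np

def scale_area_for_plot(area, min_size=5.0, max_size=300.0):
    """Return (colour_values, marker_sizes) derived from log1p(area)."""
    area = np.asarray(area, dtype=float)
    log_area = np.log1p(area)  # log(1 + area), defined for area == 0
    span = log_area.max() - log_area.min()
    if span == 0:
        # all values identical: use the smallest marker everywhere
        unit = np.zeros_like(log_area)
    else:
        # rescale the log values to the interval [0, 1]
        unit = (log_area - log_area.min()) / span
    sizes = min_size + unit * (max_size - min_size)
    return log_area, sizes

# illustrative values only, not taken from the data set
colours, sizes = scale_area_for_plot([0.0, 0.5, 10.0, 1000.0])
```

With the notebook's data this could be used as `colours, sizes = scale_area_for_plot(fires['area'])` followed by `plt.scatter(fires['X'], fires['Y'], c=colours, s=sizes, cmap="jet", alpha=0.5)`. The colour bar then shows log(1 + area) rather than area, so its label should say so.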

### Correlations

In [None]:
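# numeric columns only: DataFrame.corr() raises on text columns in pandas >= 2.0
corr_matrix = fires.select_dtypes(include="number").corr()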

In [None]:
corr_matrix["area"].sort_values(ascending=False) 
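
`DataFrame.corr()` computes Pearson's coefficient by default, which only measures linear association. As a hedged sketch, the rank-based Spearman coefficient (`method="spearman"`) can be placed next to it to spot relationships that are monotonic but not linear. The helper `compare_correlations` and the toy frame are hypothetical, and using Spearman here is a suggestion rather than part of the original notebook.

```python
import numpy as np
import pandas as pd

def compare_correlations(df, target):
    """Pearson and Spearman correlation of every numeric column with `target`."""
    numeric = df.select_dtypes(include="number")  # ignore text columns
    return pd.DataFrame({
        "pearson": numeric.corr(method="pearson")[target],
        "spearman": numeric.corr(method="spearman")[target],
    }).drop(index=target)

# toy frame (made-up values): y grows monotonically but non-linearly with x
x = np.arange(1, 11, dtype=float)
toy = pd.DataFrame({"x": x, "y": np.exp(x), "label": list("abcdefghij")})
result = compare_correlations(toy, "y")
```

On the notebook's data the equivalent call would be `compare_correlations(fires, "area").sort_values("spearman", ascending=False)`. A large gap between the two columns hints at a non-linear relationship that is worth plotting.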

In [None]:
# old: from pandas.tools.plotting import scatter_matrix
from pandas.plotting import scatter_matrix

attributes = ["temp", "DMC", "DC",
              "area"]
scatter_matrix(fires[attributes], figsize=(12, 8), alpha=0.5)

In [None]:
fires.plot(kind="scatter", x="temp", y="DMC", alpha=0.5)

In [None]:
# 3.3/ Experimenting with attribute combinations (optional)

# The fire weather indices are already compound indicators:
# Fine Fuel Moisture Code FFMC(rain, humidity, temperature, wind),
# Duff Moisture Code DMC(rain, humidity, temperature),
# Drought Code DC(rain, temperature),
# Initial Spread Index ISI(FFMC, wind).
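
Because these indices already combine the weather variables, new combinations may add little, but checking is cheap. The sketch below derives two candidate attributes and ranks all numeric columns by absolute correlation with the target. The column names `RH` and `wind` and the combinations themselves (`temp_per_RH`, `temp_x_wind`) are assumptions made for illustration and are not taken from the notebook; the toy values are made up.

```python
import pandas as pd

def add_candidate_attributes(df):
    """Return a copy of df with two hypothetical combined attributes."""
    out = df.copy()
    # warm and dry conditions: temperature relative to humidity (assumes RH > 0)
    out["temp_per_RH"] = out["temp"] / out["RH"]
    # warm and windy conditions: simple interaction term
    out["temp_x_wind"] = out["temp"] * out["wind"]
    return out

def rank_by_correlation(df, target):
    """Absolute Pearson correlation of numeric columns with target, descending."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)

# toy frame with made-up values, only to show the mechanics
toy = pd.DataFrame({
    "temp": [10.0, 20.0, 30.0, 25.0],
    "RH":   [80.0, 40.0, 20.0, 50.0],
    "wind": [2.0, 4.0, 6.0, 1.0],
    "area": [0.0, 1.0, 9.0, 0.5],
})
ranking = rank_by_correlation(add_candidate_attributes(toy), "area")
```

Applied to the notebook's data, `rank_by_correlation(add_candidate_attributes(fires), "area")` would show whether a combined attribute ranks above its raw ingredients. If it does not, the existing indices probably already carry that information.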
