# Example usage

Here, we demonstrate how to use the modules in `pywildfire` to access data and prepare it for building a predictive model.

In [None]:
import pywildfire

print(pywildfire.__version__)

## Imports

In [None]:
from pywildfire.pyprep import *
from pywildfire.pyfeats import relevant_features
from pywildfire.pywildfire import calculate_rmse
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
import requests
import zipfile
import os

## Access a zipped data source through its URL

We first specify the location where we would like to store the file(s) locally, and then pass both the URL and the output path to `download_extract_data`.

The following example zip file has been fabricated in order to illustrate the utility of the functions in this package. It contains three files: 01_example_data.csv, 02_example_data.txt, and 03_example_data.csv.

In [None]:
url = "https://example.com/data.zip"
output_path = './data'
download_extract_data(url, output_path)
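
The internals of `download_extract_data` are not shown here. As a rough illustration of the download-then-extract pattern it names, the sketch below uses only the standard library. `extract_zip_bytes` and `fetch_and_extract` are hypothetical helpers written for this example and are not part of `pywildfire`.

```python
import io
import os
import urllib.request
import zipfile


def extract_zip_bytes(payload, output_path):
    """Extract an in-memory zip archive into output_path; return the member names."""
    # Create the target directory if it does not exist yet.
    os.makedirs(output_path, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(output_path)
        return archive.namelist()


def fetch_and_extract(url, output_path):
    """Download a zip file from url, then extract it (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    return extract_zip_bytes(payload, output_path)
```

Splitting the download from the extraction keeps the extraction step easy to check offline with an archive built in memory.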

After calling `download_extract_data`, the extracted files should appear under `./data` in your working directory.

## Load data into your environment
Next, we can load the data into a pandas DataFrame using `get_csv`:

In [None]:
csv_file = '01_example_data.csv'
data = get_csv(output_path, csv_file)
print(data)
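
For readers who want to see the idea without the package, a call like the one above can be approximated with plain pandas. This is only a sketch under the assumption that `get_csv` joins the directory and file name and parses the result as a CSV; `read_csv_from_dir` is a hypothetical name.

```python
import os

import pandas as pd


def read_csv_from_dir(directory, file_name):
    """Join directory and file name, then parse the CSV into a DataFrame."""
    return pd.read_csv(os.path.join(directory, file_name))
```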

## Scale numeric variables

Now we scale the numeric columns of our DataFrame using `scale_numeric_df`.

In [None]:
scaled_data = scale_numeric_df(data)
scaled_data
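
The exact scaling applied by `scale_numeric_df` is not documented in this example. One common choice is standardization (zero mean, unit variance), which is what scikit-learn's `StandardScaler` does. The sketch below shows that technique in plain pandas on a made-up DataFrame; `standardize_numeric` and the `toy` data are illustrative assumptions.

```python
import pandas as pd


def standardize_numeric(df):
    """Return a copy of df with numeric columns rescaled to mean 0 and unit variance."""
    scaled = df.copy()
    numeric_cols = scaled.select_dtypes(include="number").columns
    # ddof=0 is the population standard deviation, matching StandardScaler.
    # Note: a constant column has std 0 and would become NaN here.
    scaled[numeric_cols] = (
        scaled[numeric_cols] - scaled[numeric_cols].mean()
    ) / scaled[numeric_cols].std(ddof=0)
    return scaled


# Made-up data: two numeric columns and one text column that is left untouched.
toy = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10, 20, 30], "label": ["x", "y", "z"]})
print(standardize_numeric(toy))
```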

## Identify relevant features

You can create a correlation matrix using the pandas `corr()` method and then pass it, together with the name of your target variable, to the `relevant_features` function.

In [None]:
corr_matrix = scaled_data.corr()
corr_matrix

In [None]:
target_variable = 'A'

feats = relevant_features(corr_matrix, target_variable)
feats
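
The selection rule used by `relevant_features` is not spelled out here. A common way to pick features from a correlation matrix is to keep those whose absolute correlation with the target exceeds a cutoff. The sketch below illustrates that approach only; the function name, the `threshold` of 0.5, and the `toy` data are all assumptions made for this example.

```python
import pandas as pd


def features_above_threshold(corr_matrix, target, threshold=0.5):
    """List features whose absolute correlation with target exceeds threshold."""
    # Drop the target's correlation with itself, which is always 1.
    target_corr = corr_matrix[target].drop(labels=target)
    strong = target_corr[target_corr.abs() > threshold]
    # Sort by strength so the most strongly correlated features come first.
    return strong.abs().sort_values(ascending=False).index.tolist()


# Made-up data for illustration.
toy = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],   # perfectly correlated with A
    "C": [5, 3, 4, 1, 2],    # correlation of -0.8 with A
    "D": [1, -1, 0, -1, 1],  # uncorrelated with A
})
print(features_above_threshold(toy.corr(), "A"))  # ['B', 'C']
```

Using the absolute value keeps strongly negative correlations such as `C`, which are as informative for a linear model as positive ones.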

## Model evaluation

Once you have separated your target and feature variables, split your data into train and test sets, fitted a model appropriate for your data, and used it to predict your target variable, you can use `calculate_rmse` to compute the root mean squared error (RMSE) between the observed and predicted values:

In [None]:
observed = pd.Series([4, 8, 5, 3, 7])
predicted = pd.Series([16, 4, 7, 9, 3])

result = calculate_rmse(observed, predicted)
result
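
As a cross-check on the definition, RMSE is the square root of the mean of the squared differences between observed and predicted values. The sketch below computes it with the standard library for the same numbers; `rmse` is a stand-in written for this example, not the `pywildfire` implementation.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Same values as above: the squared errors are 144, 16, 4, 36, 16 (mean 43.2).
print(round(rmse([4, 8, 5, 3, 7], [16, 4, 7, 9, 3]), 4))  # 6.5727
```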