# Example usage

Here we demonstrate how to use `delay_finder` in a project.

First, we'll import all of the functions of our package.

In [2]:
from delay_finder.data_split import data_split
from delay_finder.filter_columns import filter_columns
from delay_finder.load_and_save import load_data, save_model
from delay_finder.make_histogram import make_histogram
from delay_finder.read import read
from delay_finder.replace_value import replace_value

Note: All of our functions require importing the pandas library.

In [9]:
import pandas as pd

## Reading in Data
Our functions `load_data()` and `read()` can be used interchangeably: each takes the relative path of a CSV file and returns its contents as a pandas DataFrame.

We'll load in some sample data using these functions.

In [12]:
df1 = load_data('candy_example_data.csv')
df1

Unnamed: 0,candy,amount,wrapper_colour
0,kitkat,4,red
1,mars,7,black
2,snickers,5,brown
3,skittles,3,red
4,smarties,9,blue
5,aero,2,white
6,twix,6,black


In [13]:
df2 = read('candy_example_data.csv')  # replace the filename with the relative path to your own CSV file
df2

Unnamed: 0,candy,amount,wrapper_colour
0,kitkat,4,red
1,mars,7,black
2,snickers,5,brown
3,skittles,3,red
4,smarties,9,blue
5,aero,2,white
6,twix,6,black
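
For readers who want to see roughly what such a reader does, here is a minimal sketch. It assumes the function is a thin wrapper around `pandas.read_csv`; the name `read_csv_sketch` is hypothetical and this is not the package's actual source. The sample data is recreated in a temporary file so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_sketch(path):
    """Hypothetical stand-in for load_data()/read(): CSV path in, DataFrame out."""
    return pd.read_csv(path)


# Recreate the candy sample data so the example is self-contained.
csv_text = (
    "candy,amount,wrapper_colour\n"
    "kitkat,4,red\n"
    "mars,7,black\n"
    "snickers,5,brown\n"
    "skittles,3,red\n"
    "smarties,9,blue\n"
    "aero,2,white\n"
    "twix,6,black\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "candy_example_data.csv"
    csv_path.write_text(csv_text)
    candy_df = read_csv_sketch(csv_path)

print(candy_df.head())
```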


## Filtering Data

Our function `filter_columns()` keeps the specified columns of a pandas DataFrame and drops the rest.

In [14]:
filtered_df = filter_columns(df1, ['candy', 'amount'])
filtered_df

Unnamed: 0,candy,amount
0,kitkat,4
1,mars,7
2,snickers,5
3,skittles,3
4,smarties,9
5,aero,2
6,twix,6
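
The same column selection can be sketched in plain pandas. The helper name `keep_columns_sketch` and its error handling are assumptions made for illustration; the real `filter_columns()` may behave differently, for example when a requested column is missing.

```python
import pandas as pd


def keep_columns_sketch(df, columns):
    """Hypothetical stand-in for filter_columns(): keep only the listed columns."""
    missing = [col for col in columns if col not in df.columns]
    if missing:
        # Assumption: fail loudly rather than silently ignoring unknown names.
        raise KeyError(f"Columns not found: {missing}")
    # Return a copy so later edits do not touch the original DataFrame.
    return df[columns].copy()


candy_df = pd.DataFrame(
    {
        "candy": ["kitkat", "mars", "snickers"],
        "amount": [4, 7, 5],
        "wrapper_colour": ["red", "black", "brown"],
    }
)

kept = keep_columns_sketch(candy_df, ["candy", "amount"])
print(kept)
```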


## Replacing a Value in a DataFrame

Our function `replace_value()` can be used to replace a specified value in a pandas DataFrame with a new value.

In [11]:
# Replaces a value in a column with a new value in a pandas DataFrame. Returns a pandas DataFrame.

df1_replace_kitkat_amount = replace_value(df1, 'amount', 4, 11)
df1_replace_kitkat_amount

Unnamed: 0,candy,amount,wrapper_colour
0,kitkat,11,red
1,mars,7,black
2,snickers,5,brown
3,skittles,3,red
4,smarties,9,blue
5,aero,2,white
6,twix,6,black
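
As a rough illustration of the idea, the replacement can be written with `Series.replace` on a copy of the DataFrame. The helper name `replace_in_column_sketch` is hypothetical, and returning a copy instead of modifying the input is an assumption about how `replace_value()` behaves.

```python
import pandas as pd


def replace_in_column_sketch(df, column, old_value, new_value):
    """Hypothetical stand-in for replace_value(): swap old_value for new_value in one column."""
    result = df.copy()  # assumption: leave the caller's DataFrame untouched
    result[column] = result[column].replace(old_value, new_value)
    return result


candy_df = pd.DataFrame(
    {
        "candy": ["kitkat", "mars", "snickers"],
        "amount": [4, 7, 5],
        "wrapper_colour": ["red", "black", "brown"],
    }
)

replaced = replace_in_column_sketch(candy_df, "amount", 4, 11)
print(replaced)
```

Note that every matching value in the column is replaced, not just the first one, so the old value should be chosen with care.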


## Splitting Data

Our function `data_split()` splits the input pandas DataFrame into training and testing sets with an 80/20 split. It also saves those sets to CSV files with the desired filenames.  
This function requires importing scikit-learn's `train_test_split()` function.

In [16]:
from sklearn.model_selection import train_test_split

data_split(df1, 'train_data.csv', 'test_data.csv')

Let's view the resulting training and testing sets.

In [18]:
training_data = load_data('train_data.csv')
training_data

Unnamed: 0,candy,amount,wrapper_colour
0,kitkat,4,red
1,snickers,5,brown
2,mars,7,black
3,twix,6,black
4,skittles,3,red


In [19]:
testing_data = load_data('test_data.csv')
testing_data

Unnamed: 0,candy,amount,wrapper_colour
0,smarties,9,blue
1,aero,2,white
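
The split-and-save step might look something like the sketch below, which calls scikit-learn's `train_test_split` with `test_size=0.2` and writes each part with `DataFrame.to_csv`. The helper name `split_and_save_sketch`, the `random_state` value, and the choice of `index=False` are all assumptions, not the package's actual implementation. With 7 rows, scikit-learn rounds the test set up to 2 rows and leaves 5 for training, which matches the row counts shown above.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def split_and_save_sketch(df, train_path, test_path, random_state=123):
    """Hypothetical stand-in for data_split(): 80/20 split, each part saved to CSV."""
    train_df, test_df = train_test_split(
        df, test_size=0.2, random_state=random_state
    )
    # Assumption: the row index is not written to the files.
    train_df.to_csv(train_path, index=False)
    test_df.to_csv(test_path, index=False)
    return train_df, test_df


candy_df = pd.DataFrame(
    {
        "candy": ["kitkat", "mars", "snickers", "skittles", "smarties", "aero", "twix"],
        "amount": [4, 7, 5, 3, 9, 2, 6],
        "wrapper_colour": ["red", "black", "brown", "red", "blue", "white", "black"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    train_path = Path(tmp_dir) / "train_data.csv"
    test_path = Path(tmp_dir) / "test_data.csv"
    split_and_save_sketch(candy_df, train_path, test_path)
    # Read the files back to confirm what was written.
    train_back = pd.read_csv(train_path)
    test_back = pd.read_csv(test_path)

print(len(train_back), len(test_back))
```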


## Saving a Trained Model as a Pickle Object and File

Our function `save_model()` will save a model as a pickle object and file. This function requires importing Python's pickle library.

As an example, we will first make a dummy classifier. Then, we'll save that model to a pickle object using `save_model()`.

In [24]:
import pickle
from sklearn.dummy import DummyClassifier

dummy_classifier = DummyClassifier(strategy="stratified", random_state=12)
model_pickle = save_model(dummy_classifier, 'example_pickle_model.pickle')
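
To confirm that a model survives the round trip, a pickle file can be written and read back with the standard library alone. The sketch below does not call `save_model()`; it uses `pickle.dump` and `pickle.load` directly, with a temporary directory standing in for the project folder.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.dummy import DummyClassifier

model = DummyClassifier(strategy="stratified", random_state=12)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "example_pickle_model.pickle"

    # Write the model to disk in binary mode.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Read it back into a new object.
    with open(model_path, "rb") as f:
        restored_model = pickle.load(f)

print(restored_model.get_params())
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.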

## Histogram

Our function `make_histogram()` makes a histogram from a pandas DataFrame. It takes the DataFrame, the name of the column to plot on the x-axis, the x-axis title, and optionally the width and height of the histogram. This function requires importing the altair library and assumes that the y-axis denotes a number of flights.

In [25]:
import altair as alt

amount_histogram = make_histogram(df1, 'amount', "Amount", w=200, h=200)
amount_histogram