# Lineapy Functions

In this notebook, we'll cover the basic functions supported by the lineapy library. We'll use a dataset that contains a list of real estate properties and their respective features to show simple examples for each function.

In [1]:
# Let's start by importing the dataset:
import lineapy
import pandas as pd

assets = pd.read_csv("../tests/ames_train_cleaned.csv")
assets.head()

Unnamed: 0,Order,PID,MS_SubClass,MS_Zoning,Lot_Frontage,Lot_Area,Street,Alley,Lot_Shape,Land_Contour,...,Pool_QC,Fence,Misc_Feature,Misc_Val,Mo_Sold,Yr_Sold,Sale_Type,Sale_Condition,SalePrice,TotalBathrooms
0,1,526301100,20,RL,141.0,31770,Pave,,IR1,Lvl,...,,,,0,5,2010,WD,Normal,204900,2.0
1,2,526350040,20,RH,80.0,11622,Pave,,Reg,Lvl,...,,MnPrv,,0,6,2010,WD,Normal,95300,1.0
2,3,526351010,20,RL,81.0,14267,Pave,,IR1,Lvl,...,,,Gar2,12500,6,2010,WD,Normal,181900,1.5
3,4,526353030,20,RL,93.0,11160,Pave,,Reg,Lvl,...,,,,0,4,2010,WD,Normal,254100,3.5
4,5,527105010,60,RL,74.0,13830,Pave,,IR1,Lvl,...,,MnPrv,,0,3,2010,WD,Normal,199700,2.5


In [2]:
# Now, let's generate a few variables that we'll be using throughout this tutorial

# Average lot area by neighborhood
lot_area_by_neighborhood = assets[["Neighborhood", "Lot_Area"]].groupby("Neighborhood").mean()
lot_area_by_neighborhood.head()

Neighborhood,Lot_Area
Blmngtn,3411.913043
Blueste,2264.0
BrDale,1846.1
BrkSide,6942.391892
ClearCr,23686.424242


In [3]:
# Average sale price by neighborhood
sale_price_by_neighborhood = assets[["Neighborhood", "SalePrice"]].groupby("Neighborhood").mean()
sale_price_by_neighborhood.head()

Neighborhood,SalePrice
Blmngtn,195889.0
Blueste,126600.0
BrDale,104350.0
BrkSide,124309.797297
ClearCr,198676.727273


In [4]:
# NBVAL_IGNORE_OUTPUT

# Average sale price by neighborhood and year sold
sale_price_by_year_neighborhood = assets[["Neighborhood", "Yr_Sold", "SalePrice"]].groupby(["Neighborhood", "Yr_Sold"]).mean()
sale_price_by_year_neighborhood.head()

Neighborhood,Yr_Sold,SalePrice
Blmngtn,2006,212909.888889
Blmngtn,2007,191246.5
Blmngtn,2008,191890.666667
Blmngtn,2009,179360.0
Blmngtn,2010,175900.0


## Save()

The `save()` function saves a variable's value and its history (the code that produced it) as a `LineaArtifact`. It takes two arguments: the variable to save and the string name to save it under. It returns the saved artifact.

In [5]:
# Let's save the variables we created above
lineapy.save(lot_area_by_neighborhood, "lot_area_by_neighborhood")
lineapy.save(sale_price_by_neighborhood, "sale_price_by_neighborhood")

# We can also use the output from the function to access the artifact and view the saved variable's history
sale_price_data = lineapy.save(sale_price_by_year_neighborhood, "sale_price_by_year_neighborhood")
print(sale_price_data.code)

import pandas as pd
assets = pd.read_csv("../tests/ames_train_cleaned.csv")
sale_price_by_year_neighborhood = assets[["Neighborhood", "Yr_Sold", "SalePrice"]].groupby(["Neighborhood", "Yr_Sold"]).mean()
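
The printed history contains only the statements needed to rebuild `sale_price_by_year_neighborhood`; the `assets.head()` call and the other group-by cells are left out. One way to picture how such a history could be derived is a backward slice over the code. The sketch below is a toy illustration using Python's `ast` module. The helper `backward_slice` and the sample script are hypothetical, and this is not a description of how lineapy works internally.

```python
import ast


def backward_slice(source: str, target: str) -> str:
    """Return the top-level statements of `source` that `target` depends on.

    Hypothetical helper for illustration only: it tracks plain variable
    names and imports, and ignores in-place mutation, attributes, and
    control flow.
    """
    needed = {target}  # names we still have to find a definition for
    kept = []          # statements that belong to the slice, newest first

    for stmt in reversed(ast.parse(source).body):
        defined, used = set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    defined.add(node.id)
                else:
                    used.add(node.id)
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    defined.add(alias.asname or alias.name.split(".")[0])

        # Keep the statement only if it defines a name we still need,
        # then start looking for the names it reads.
        if defined & needed:
            kept.append(ast.get_source_segment(source, stmt))
            needed = (needed - defined) | used

    return "\n".join(reversed(kept))


# Made-up script in the spirit of the cells above; it is parsed, never run.
script = '''
import pandas as pd
assets = pd.read_csv("housing.csv")
assets.head()
lot_area = assets[["Neighborhood", "Lot_Area"]].groupby("Neighborhood").mean()
sale_price = assets[["Neighborhood", "SalePrice"]].groupby("Neighborhood").mean()
'''

print(backward_slice(script, "sale_price"))
```

Asking for `sale_price` keeps the import, the `read_csv` line, and the final group-by, and drops the two statements that do not feed into it. A real tool has to handle much more than this sketch does, such as mutation of existing objects.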



## Get()

The `get()` function allows you to access a saved artifact. This function takes the string name of the artifact as its only argument and returns the saved artifact.

In [6]:
# NBVAL_IGNORE_OUTPUT

# Let's retrieve the artifacts we saved above
lot_area_by_neighborhood = lineapy.get("lot_area_by_neighborhood")
sale_price_by_neighborhood = lineapy.get("sale_price_by_neighborhood")
sale_price_by_year_neighborhood = lineapy.get("sale_price_by_year_neighborhood")

# Again, we can use the output to view the saved variable's history
print(sale_price_by_year_neighborhood.code)

import pandas as pd
assets = pd.read_csv("../tests/ames_train_cleaned.csv")
sale_price_by_year_neighborhood = assets[["Neighborhood", "Yr_Sold", "SalePrice"]].groupby(["Neighborhood", "Yr_Sold"]).mean()



## To_airflow()

The `to_airflow()` function adds a saved artifact to an Airflow DAG so that it can be run from the Airflow UI or CLI. It also produces a Python module containing the steps to generate the variable, a Dockerfile, and a text file listing the dependencies. The function takes a list of artifact names and the string name of the DAG as its arguments. An artifact can also call `to_airflow()` on itself with no arguments, in which case the DAG is named after the artifact.

In [7]:
# NBVAL_IGNORE_OUTPUT
lineapy.to_airflow([lot_area_by_neighborhood.name], "lot_area_by_neighborhood")

Added Airflow DAG named 'lot_area_by_neighborhood'. Start a run from the Airflow UI or CLI.


PosixPath('/usr/src/airflow_home/dags/lot_area_by_neighborhood_dag.py')

In [8]:
# NBVAL_IGNORE_OUTPUT
sale_price_by_neighborhood.to_airflow()

Added Airflow DAG named 'sale_price_by_neighborhood'. Start a run from the Airflow UI or CLI.


PosixPath('/usr/src/airflow_home/dags/sale_price_by_neighborhood.py')

In [9]:
# NBVAL_IGNORE_OUTPUT
sale_price_by_year_neighborhood.to_airflow()

Added Airflow DAG named 'sale_price_by_year_neighborhood'. Start a run from the Airflow UI or CLI.


PosixPath('/usr/src/airflow_home/dags/sale_price_by_year_neighborhood.py')

## Catalog()

The `catalog()` function allows you to see a list of all previously saved artifacts, including when they were created.

In [10]:
# NBVAL_IGNORE_OUTPUT

# Here, we'll list the artifacts we created above, along with any saved in earlier sessions
lineapy.catalog()

neighbothood_area_mean:2022-04-01T22:27:42 created on 2022-04-01 22:27:42.330268
neighbothood_area_mean:2022-04-01T22:28:05 created on 2022-04-01 22:28:05.541011
text:2022-04-01T22:34:18 created on 2022-04-01 22:34:18.501498
text:2022-04-01T22:34:47 created on 2022-04-01 22:34:47.074491
text:2022-04-01T22:34:51 created on 2022-04-01 22:34:51.949989
text:2022-04-01T22:35:01 created on 2022-04-01 22:35:01.516559
text:2022-04-01T22:35:07 created on 2022-04-01 22:35:07.608790
text:2022-04-01T22:37:47 created on 2022-04-01 22:37:47.315503
text:2022-04-01T22:37:52 created on 2022-04-01 22:37:52.330215
text:2022-04-01T22:38:16 created on 2022-04-01 22:38:16.494312
text:2022-04-01T22:39:02 created on 2022-04-01 22:39:02.597812
lot_area_by_neighborhood:2022-04-02T00:19:21 created on 2022-04-02 00:19:21.940169
lot_area_by_neighborhood:2022-04-02T00:37:27 created on 2022-04-02 00:37:27.325436
lot_area_by_neighborhood:2022-04-02T00:53:05 created on 2022-04-02 00:53:05.388167
sale_price_by_year:202
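
The listing above shows the same artifact name several times with different timestamps, which suggests that each `save()` call stores a new version rather than overwriting the old one. The sketch below models that behavior with a small in-memory store. `ToyStore` and `ToyArtifact` are hypothetical names for illustration, not part of lineapy, and having `get()` return the most recent version is an assumption that this notebook does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class ToyArtifact:
    """Hypothetical stand-in for a saved artifact: a name, a value, a timestamp."""
    name: str
    value: Any
    created: datetime


class ToyStore:
    """Hypothetical in-memory store that keeps every saved version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ToyArtifact]] = {}

    def save(self, value: Any, name: str) -> ToyArtifact:
        # Append a new version instead of overwriting the previous one.
        artifact = ToyArtifact(name, value, datetime.now())
        self._versions.setdefault(name, []).append(artifact)
        return artifact

    def get(self, name: str) -> ToyArtifact:
        # Assumption: the most recently saved version is the one returned.
        return self._versions[name][-1]

    def catalog(self) -> List[str]:
        # One line per saved version, formatted like the listing above.
        return [
            f"{a.name}:{a.created.isoformat(timespec='seconds')} created on {a.created}"
            for versions in self._versions.values()
            for a in versions
        ]


store = ToyStore()
store.save({"BrDale": 1846.1}, "lot_area_by_neighborhood")
store.save({"BrDale": 1846.1, "Blueste": 2264.0}, "lot_area_by_neighborhood")

for line in store.catalog():
    print(line)  # two entries with the same name and different timestamps
print(store.get("lot_area_by_neighborhood").value)
```

Rerunning a notebook that calls `save()` therefore grows the catalog, which is consistent with the repeated names in the output above.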

With these functions, you're ready to get started with the lineapy library!