# API Basics

In this tutorial, we cover the basic functionality of `lineapy` using simple examples.

In [1]:
import lineapy
import pandas as pd

First, let’s create toy data to use.

In [2]:
# Create toy data to use
df = pd.DataFrame({
    "name": ["John", "Mary", "Nick", "Stacy", "Tom", "Ava"],
    "gender": ["M", "F", "M", "F", "M", "F"],
    "height": [183, 175, 170, 162, 168, 185],
    "weight": [85, 70, 63, 50, 75, 72],
})

In [3]:
# View data
df

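|   | name  | gender | height | weight |
|---|-------|--------|--------|--------|
| 0 | John  | M      | 183    | 85     |
| 1 | Mary  | F      | 175    | 70     |
| 2 | Nick  | M      | 170    | 63     |
| 3 | Stacy | F      | 162    | 50     |
| 4 | Tom   | M      | 168    | 75     |
| 5 | Ava   | F      | 185    | 72     |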


Now, suppose we want to see whether the data shows any gender differences in height and weight.

In [4]:
# Calculate male averages
avg_male_height = df.query("gender == 'M'")["height"].mean()
avg_male_weight = df.query("gender == 'M'")["weight"].mean()

In [5]:
# Calculate female averages
avg_female_height = df.query("gender == 'F'")["height"].mean()
avg_female_weight = df.query("gender == 'F'")["weight"].mean()

In [6]:
# Calculate gender differences
diff_avg_height = avg_male_height - avg_female_height
diff_avg_weight = avg_male_weight - avg_female_weight

In [7]:
# View result
print("Difference in average height:", diff_avg_height)
print("Difference in average weight:", diff_avg_weight)

Difference in average height: -0.3333333333333428
Difference in average weight: 10.333333333333329


In this small toy data set, the average heights of males and females differ by only about 0.33, so there is essentially no gender difference in height. On the other hand, males are heavier than females by about 10.33 on average.
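As a cross-check of these numbers that does not depend on pandas, the sketch below recomputes the per-gender means with Python's standard library. The helper `group_means` is a hypothetical name introduced here; it groups rows by a key and averages one column, which is what each pair of `query(...)["..."].mean()` calls above does for a single gender.

```python
from collections import defaultdict
from statistics import mean

# The same toy data as the DataFrame above, as a plain list of rows
rows = [
    {"name": "John", "gender": "M", "height": 183, "weight": 85},
    {"name": "Mary", "gender": "F", "height": 175, "weight": 70},
    {"name": "Nick", "gender": "M", "height": 170, "weight": 63},
    {"name": "Stacy", "gender": "F", "height": 162, "weight": 50},
    {"name": "Tom", "gender": "M", "height": 168, "weight": 75},
    {"name": "Ava", "gender": "F", "height": 185, "weight": 72},
]


def group_means(rows, key, column):
    """Return {group: mean of `column`} for rows grouped by row[key] (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {group: mean(values) for group, values in groups.items()}


heights = group_means(rows, "gender", "height")
weights = group_means(rows, "gender", "weight")
print("Difference in average height:", heights["M"] - heights["F"])
print("Difference in average weight:", weights["M"] - weights["F"])
```

In pandas itself, the more idiomatic way to get all four averages in one step is `df.groupby("gender")[["height", "weight"]].mean()`, which avoids filtering the DataFrame once per column.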

## Storing an Artifact with `save()`

Let’s say we are particularly interested in tracking the average height difference. You can use `lineapy`’s `save()` method for this.

The `save()` method saves a variable's value and history as an object of type `LineaArtifact`. It takes two arguments, the variable to save and the name (a string) to save it under, and returns the saved artifact.

In [8]:
# Store a variable as an artifact
artifact = lineapy.save(diff_avg_height, "gender_diff_avg_height")

In [9]:
# Check object type
print(type(artifact))

<class 'lineapy.graph_reader.apis.LineaArtifact'>


A `LineaArtifact` object has two major attributes:

- `code`: Minimal essential code to get to the final state of the artifact.
- `value`: Final state of the artifact.

For the artifact we just saved, these attributes look as follows:

In [10]:
# Check minimal essential code to get to the final state of the artifact
print(artifact.code)

import pandas as pd
df = pd.DataFrame({
    "name": ["John", "Mary", "Nick", "Stacy", "Tom", "Ava"],
    "gender": ["M", "F", "M", "F", "M", "F"],
    "height": [183, 175, 170, 162, 168, 185],
    "weight": [85, 70, 63, 50, 75, 72],
})
avg_male_height = df.query("gender == 'M'")["height"].mean()
avg_female_height = df.query("gender == 'F'")["height"].mean()
diff_avg_height = avg_male_height - avg_female_height
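Notice what is missing from this code: the weight calculations, `import lineapy`, the cell that only displays `df`, and the `print` calls. Only the statements that `diff_avg_height` depends on were kept. The general idea is known as backward (program) slicing. The toy sketch below illustrates it on a hand-written list of the tutorial's statements, each annotated with the names it defines and uses. It is not lineapy's actual implementation; `Statement`, `stmt`, and `backward_slice` are hypothetical names, and the DataFrame literal is abbreviated.

```python
from typing import FrozenSet, List, NamedTuple


class Statement(NamedTuple):
    code: str                # source text of the statement
    defines: FrozenSet[str]  # names the statement assigns
    uses: FrozenSet[str]     # names the statement reads


def stmt(code, defines=(), uses=()):
    """Convenience constructor for a hand-annotated statement."""
    return Statement(code, frozenset(defines), frozenset(uses))


def backward_slice(statements: List[Statement], target: str) -> List[str]:
    """Keep only the statements that `target` transitively depends on."""
    needed = {target}
    kept = []
    # Walk from the last statement to the first, tracking names still needed
    for s in reversed(statements):
        if s.defines & needed:
            kept.append(s.code)
            needed -= s.defines  # this statement supplies these names...
            needed |= s.uses     # ...but needs its own inputs in turn
    kept.reverse()  # restore original program order
    return kept


# The tutorial's cells, hand-annotated (the DataFrame literal is abbreviated)
program = [
    stmt("import lineapy", defines={"lineapy"}),
    stmt("import pandas as pd", defines={"pd"}),
    stmt("df = pd.DataFrame({...})", defines={"df"}, uses={"pd"}),
    stmt("df", uses={"df"}),  # display-only cell
    stmt('avg_male_height = df.query("gender == \'M\'")["height"].mean()',
         defines={"avg_male_height"}, uses={"df"}),
    stmt('avg_male_weight = df.query("gender == \'M\'")["weight"].mean()',
         defines={"avg_male_weight"}, uses={"df"}),
    stmt('avg_female_height = df.query("gender == \'F\'")["height"].mean()',
         defines={"avg_female_height"}, uses={"df"}),
    stmt('avg_female_weight = df.query("gender == \'F\'")["weight"].mean()',
         defines={"avg_female_weight"}, uses={"df"}),
    stmt("diff_avg_height = avg_male_height - avg_female_height",
         defines={"diff_avg_height"}, uses={"avg_male_height", "avg_female_height"}),
    stmt("diff_avg_weight = avg_male_weight - avg_female_weight",
         defines={"diff_avg_weight"}, uses={"avg_male_weight", "avg_female_weight"}),
    stmt('print("Difference in average height:", diff_avg_height)', uses={"diff_avg_height"}),
    stmt('print("Difference in average weight:", diff_avg_weight)', uses={"diff_avg_weight"}),
]

print("\n".join(backward_slice(program, "diff_avg_height")))
```

A real dependency analysis also has to handle things this sketch ignores, such as in-place mutation (for example, `df.dropna(inplace=True)` changes `df` without assigning to it) and control flow.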



In [11]:
# Check the final state of the artifact
print(artifact.value)

-0.3333333333333428
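Because `code` is meant to get to the artifact's final state, one way to sanity-check an artifact is to run its code in a fresh namespace and compare the resulting variable with the stored value. The helper `reproduces` below is a hypothetical sketch built only on Python's built-in `exec`; it is not part of lineapy. To keep it runnable without pandas, the demonstration uses a stand-in code string for the same height calculation rather than a real artifact.

```python
import math


def reproduces(code: str, var_name: str, expected: float) -> bool:
    """Run `code` in a fresh namespace and check that `var_name` is close to `expected`.

    Hypothetical helper. exec runs the code as-is, so only use it on code you trust.
    """
    namespace = {}
    exec(code, namespace)
    # Compare floats with a tolerance rather than exact equality
    return math.isclose(namespace[var_name], expected)


# Stand-in for artifact.code: the same height calculation in plain Python
toy_code = """
male_heights = [183, 170, 168]
female_heights = [175, 162, 185]
diff_avg_height = sum(male_heights) / 3 - sum(female_heights) / 3
"""

print(reproduces(toy_code, "diff_avg_height", -0.3333333333333428))
```

With lineapy and pandas installed, the analogous check on this tutorial's artifact would be `reproduces(artifact.code, "diff_avg_height", artifact.value)`. The tolerance-based comparison suits this numeric value; other value types, such as DataFrames, would need their own equality check.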


## Retrieving an Artifact with `get()`

You can also retrieve any stored artifact using the `get()` method. The method takes the string name of the artifact as its argument and returns the corresponding artifact.

In [12]:
# Retrieve a saved artifact
artifact2 = lineapy.get("gender_diff_avg_height")

In [13]:
# Check minimal essential code to get to the final state of the artifact
print(artifact2.code)

import pandas as pd
df = pd.DataFrame({
    "name": ["John", "Mary", "Nick", "Stacy", "Tom", "Ava"],
    "gender": ["M", "F", "M", "F", "M", "F"],
    "height": [183, 175, 170, 162, 168, 185],
    "weight": [85, 70, 63, 50, 75, 72],
})
avg_male_height = df.query("gender == 'M'")["height"].mean()
avg_female_height = df.query("gender == 'F'")["height"].mean()
diff_avg_height = avg_male_height - avg_female_height



In [14]:
# Check the final state of the artifact
print(artifact2.value)

-0.3333333333333428


## Listing Artifacts with `catalog()`

The `catalog()` method allows you to see the list of all previously saved artifacts, including when they were created.

In [15]:
# List all saved artifacts
lineapy.catalog()

cleaned_data_housing:2022-04-05T17:57:54 created on 2022-04-05 17:57:54.041082
gender_diff_avg_height:2022-04-08T11:25:09 created on 2022-04-08 11:25:09.429225
gender_diff_avg_height:2022-04-08T11:27:37 created on 2022-04-08 11:27:37.457609
gender_diff_avg_height:2022-04-08T11:29:58 created on 2022-04-08 11:29:58.876472
gender_diff_avg_weight:2022-04-08T13:33:55 created on 2022-04-08 13:33:55.392070
gender_diff_avg_height:2022-04-08T16:19:19 created on 2022-04-08 16:19:19.426913
gender_diff_avg_height:2022-04-08T16:41:51 created on 2022-04-08 16:41:51.537567
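This listing includes artifacts from earlier sessions and several entries for `gender_diff_avg_height`. Each entry is written as `name:timestamp`, so saving under an existing name appears to add a new version rather than overwrite the old one. If you have the printed listing as text (this sketch does not assume anything about what `catalog()` returns), a small parser can summarize it, for example to count the versions of each artifact and find the latest one. `latest_versions` is a hypothetical helper, and the text below is the listing shown above, pasted verbatim.

```python
import re
from collections import defaultdict
from datetime import datetime

# The catalog() output shown above, pasted as text
CATALOG_TEXT = """\
cleaned_data_housing:2022-04-05T17:57:54 created on 2022-04-05 17:57:54.041082
gender_diff_avg_height:2022-04-08T11:25:09 created on 2022-04-08 11:25:09.429225
gender_diff_avg_height:2022-04-08T11:27:37 created on 2022-04-08 11:27:37.457609
gender_diff_avg_height:2022-04-08T11:29:58 created on 2022-04-08 11:29:58.876472
gender_diff_avg_weight:2022-04-08T13:33:55 created on 2022-04-08 13:33:55.392070
gender_diff_avg_height:2022-04-08T16:19:19 created on 2022-04-08 16:19:19.426913
gender_diff_avg_height:2022-04-08T16:41:51 created on 2022-04-08 16:41:51.537567
"""

# One entry per line: "<name>:<version> created on <timestamp>"
ENTRY = re.compile(r"^(?P<name>[^:]+):(?P<version>\S+) created on (?P<created>.+)$")


def latest_versions(text):
    """Map each artifact name to (number of versions, latest creation time)."""
    versions = defaultdict(list)
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            versions[match["name"]].append(datetime.fromisoformat(match["created"]))
    return {name: (len(times), max(times)) for name, times in versions.items()}


for name, (count, latest) in sorted(latest_versions(CATALOG_TEXT).items()):
    print(f"{name}: {count} version(s), latest {latest}")
```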