<a href="https://colab.research.google.com/github/airbytehq/quickstarts/blob/aj%2Fairbyte-lib-quickstart/airbyte_lib_notebooks/AirbyteLib_Basic_Features_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# AirbyteLib Demo

Below is a pre-release demo of AirbyteLib.


## Install AirbyteLib


In [None]:
# Add virtual environment support for running in Google Colab
!apt-get install -qq python3.10-venv

# Install airbyte-lib
%pip install --quiet 'git+https://github.com/airbytehq/airbyte.git@master#egg=airbyte-lib&subdirectory=airbyte-lib'

In [None]:
# Test that the install was successful
import airbyte_lib as ab

## Load the Source Data using AirbyteLib


Create and install a source connector:


In [None]:
import airbyte_lib as ab

# Create and configure the source:
source: ab.Source = ab.get_connector("source-faker")

In [None]:
source.set_config(
    config={
        "count": 50_000, # Adjust this to get a larger or smaller dataset
        "seed": 123,
    },
)
# Verify the config and creds by running `check`:
source.check()

## Read Data from the AirbyteLib Cache

Once the data is read, we can work with the resulting streams however we like. For example, `to_pandas()` returns a Pandas DataFrame, and `to_sql_table()` returns a SQLAlchemy `Table` object, which we can use to run SQL queries.


In [None]:
# Read data from the source into the internal cache:
read_result: ab.ReadResult = source.read()

In [None]:
# Display or transform the loaded data
products_df = read_result["products"].to_pandas()
display(products_df)

## Creating Graphs

AirbyteLib integrates with Pandas, which in turn works with `matplotlib` and many other popular libraries. This makes it quick to create graphs from the loaded data.


In [None]:
%pip install matplotlib

import matplotlib.pyplot as plt

users_df = read_result["users"].to_pandas()

plt.hist(users_df['age'], bins=10, edgecolor='black')
plt.title('Histogram of Ages')
plt.xlabel('Ages')
plt.ylabel('Frequency')
plt.show()
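
To make the `bins=10` argument above more concrete, here is a minimal pure-Python sketch of equal-width binning, which is roughly what a histogram computes before drawing bars. The helper name `equal_width_histogram` and the sample ages are hypothetical and only for illustration; they are not part of AirbyteLib or `matplotlib`, and the ages are not output from `source-faker`.

```python
def equal_width_histogram(values, bins=10):
    """Count values into `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum value sits on the last edge, so clamp it into the final bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Made-up ages for illustration only (assumption, not real connector output).
sample_ages = [18, 22, 25, 31, 38, 40, 47, 52, 60, 68]
counts, edges = equal_width_histogram(sample_ages, bins=5)
print(counts)
print(edges)
```

Each bin is closed on the left and open on the right, except the last one, which also includes the maximum value.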

## Working in SQL

Since data is cached in a local DuckDB database, we can query the data with SQL.

There are several ways to do this. One is the [JupySQL Extension](https://jupysql.ploomber.io/en/latest/user-guide/template.html), which we'll use below.


In [None]:
# Install JupySQL to enable SQL cell magics
%pip install --quiet jupysql
# Load JupySQL extension
%load_ext sql
# Configure max row limit (optional)
%config SqlMagic.displaylimit = 200

In [None]:
# Get the SQLAlchemy 'engine' object for the cache
engine = read_result.cache.get_sql_engine()
# Pass the engine to JupySQL
%sql engine

In [None]:
# Get the table for the 'users' stream
users_table = read_result.cache.get_sql_table("users")
display(users_table.fullname)

In [None]:
%%sql
-- We can now dynamically pass the table reference into a SQL query

SELECT name, occupation, age, nationality
FROM {{ users_table.fullname }}
LIMIT 20

In [None]:
# Show tables for the other streams
%sqlcmd tables
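
The templated query above substitutes the table name into the SQL text before it runs. The sketch below shows the same query shape outside of cell magics. It uses Python's built-in `sqlite3` as a stand-in for the DuckDB cache so that it runs without AirbyteLib installed; the table contents are made-up rows, and `table_name` is a hypothetical variable playing the role of `users_table.fullname`.

```python
import sqlite3

# Stand-in for the cache: an in-memory SQLite database (the real cache is DuckDB).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (name TEXT, occupation TEXT, age INTEGER, nationality TEXT)"
)

# Made-up rows for illustration only (assumption, not real connector output).
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("Ada", "Engineer", 36, "British"),
        ("Linus", "Developer", 54, "Finnish"),
        ("Grace", "Admiral", 85, "American"),
    ],
)

table_name = "users"  # hypothetical stand-in for users_table.fullname

# Table names cannot be bound as query parameters, so the name is formatted
# into the SQL text. Only do this with trusted, program-generated names.
query = f"SELECT name, occupation, age, nationality FROM {table_name} LIMIT 2"
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)

conn.close()
```

Values such as filter conditions should still be passed as bound parameters rather than formatted into the string; only identifiers like the table name need templating.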