# Kangas DataGrid - Integrations

Before using Kangas DataGrid, you'll need to install it. We can do that in a notebook
with the `%pip install kangas` command:

In [1]:
%pip install kangas --quiet

Once installed, we can import it. We'll just import the top-level `kangas` module as `kg` and use that throughout this demo.

In [2]:
import kangas as kg
kg.__version__

'1.1.31'

# Pandas DataFrame

This section shows how to convert a pandas DataFrame into a Kangas DataGrid and then add a column of images to it.

This is based on: https://towardsdatascience.com/rendering-images-inside-a-pandas-dataframe-3631a4883f60

First, we build a pandas DataFrame:

In [3]:
import pandas as pd

In [4]:
df = pd.DataFrame([
    [2768571, 130655, 1155027, 34713051, 331002277],        
    [1448753, 60632, 790040, 3070447, 212558178],
    [654405, 9536, 422931, 19852167, 145934619],
    [605216, 17848, 359891, 8826585, 1379974505],
    [288477, 9860, 178245, 1699369, 32969875]],
    columns=['Total Cases', 'Total Deaths', 'Total Recovered', 'Total Tests', 'Population'])

In [5]:
dg = kg.read_dataframe(df)

Reading DataFrame...


5it [00:00, 3719.67it/s]
100%|██████████| 5/5 [00:00<00:00, 7077.80it/s]


In [6]:
flags = [
  'https://www.countries-ofthe-world.com/flags-normal/flag-of-United-States-of-America.png',
  'https://www.countries-ofthe-world.com/flags-normal/flag-of-Brazil.png',
  'https://www.countries-ofthe-world.com/flags-normal/flag-of-Russia.png',
  'https://www.countries-ofthe-world.com/flags-normal/flag-of-India.png',
  'https://www.countries-ofthe-world.com/flags-normal/flag-of-Peru.png'
]

In [7]:
from kangas.datatypes.utils import download_filename

In [8]:
dg.append_column("Flag", [kg.Image(download_filename(url)) for url in flags])
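The grid now has a flag image per row but no column naming the country. As a minimal sketch, the country names can be recovered from the flag URLs using only the standard library. The helper name `country_from_flag_url` and the `sample_flags` list are illustrative assumptions, not part of Kangas, and the approach assumes every file name follows the `flag-of-<Country>.png` pattern seen above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def country_from_flag_url(url, prefix="flag-of-"):
    """Hypothetical helper: derive a country name from a flag image URL."""
    # Take the last path component without its extension,
    # e.g. "flag-of-United-States-of-America".
    stem = PurePosixPath(urlparse(url).path).stem
    # Drop the assumed "flag-of-" prefix if it is present.
    if stem.startswith(prefix):
        stem = stem[len(prefix):]
    # The remaining hyphens separate the words of the country name.
    return stem.replace("-", " ")


# Two of the URLs from the cell above, repeated so this block runs alone.
sample_flags = [
    "https://www.countries-ofthe-world.com/flags-normal/flag-of-United-States-of-America.png",
    "https://www.countries-ofthe-world.com/flags-normal/flag-of-Brazil.png",
]
countries = [country_from_flag_url(url) for url in sample_flags]
print(countries)  # ['United States of America', 'Brazil']
```

Computed over the full `flags` list, the result has one entry per row, so it could be added the same way as the `Flag` column, for example with `dg.append_column("Country", countries)`.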

In [9]:
dg

|   | Total Cases | Total Deaths | Total Recovered | Total Tests | Population | Flag |
|---|---|---|---|---|---|---|
| 1 | 2768571 | 130655 | 1155027 | 34713051 | 331002277 | `<Image, asse` |
| 2 | 1448753 | 60632 | 790040 | 3070447 | 212558178 | `<Image, asse` |
| 3 | 654405 | 9536 | 422931 | 19852167 | 145934619 | `<Image, asse` |
| 4 | 605216 | 17848 | 359891 | 8826585 | 1379974505 | `<Image, asse` |
| 5 | 288477 | 9860 | 178245 | 1699369 | 32969875 | `<Image, asse` |

[5 rows x 6 columns]

* Use DataGrid.save() to save to disk


In [10]:
dg.show()

Saving data...


100%|██████████| 5/5 [00:00<00:00, 13374.69it/s]


Saving datagrid to '/tmp/tmpokt9wedz/untitled.datagrid'...
Extending data...


100%|██████████| 5/5 [00:00<00:00, 2139.30it/s]


Computing statistics...


100%|██████████| 8/8 [00:00<00:00, 3939.24it/s]



# CSV Files

Download the following files from https://www.kaggle.com/code/stassl/displaying-inline-images-in-pandas-dataframe/data:

* `labels.csv`
* `train.zip`

Save them in this folder (or upload them if you are on Google Colab).

In [25]:
! mkdir -p train
! unzip -o -q train.zip -d train
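Before building a grid from these files, it can be worth confirming that every id listed in `labels.csv` has a matching image in `train/`. The following is a minimal standard-library sketch of that check. The function name `find_missing_images` is an illustrative assumption. The `id` column and the `.jpg` suffix match how the files are used later in this section. The demo runs on a small throwaway dataset so it does not depend on the Kaggle download.

```python
import csv
import tempfile
from pathlib import Path


def find_missing_images(labels_csv, image_dir, id_column="id", suffix=".jpg"):
    """Hypothetical helper: return ids in the CSV that have no image file."""
    image_dir = Path(image_dir)
    missing = []
    with open(labels_csv, newline="") as f:
        for row in csv.DictReader(f):
            # Each id is expected to map to "<image_dir>/<id>.jpg".
            if not (image_dir / (row[id_column] + suffix)).is_file():
                missing.append(row[id_column])
    return missing


# Demo on a tiny temporary dataset: two labels, but only one image on disk.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "train").mkdir()
    (tmp / "train" / "a1.jpg").write_bytes(b"")
    (tmp / "labels.csv").write_text("id,breed\na1,dingo\nb2,pekinese\n")
    demo_missing = find_missing_images(tmp / "labels.csv", tmp / "train")

print(demo_missing)  # ['b2']
```

On the real data, `find_missing_images("labels.csv", "train")` should return an empty list if the download and unzip steps completed as expected.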

In [26]:
dg = kg.read_csv("labels.csv")

Loading CSV file 'labels.csv'...


10223it [00:00, 65439.28it/s]
100%|██████████| 10222/10222 [00:00<00:00, 18376.24it/s]


In [27]:
dg

|   | id | breed |
|---|---|---|
| 1 | 000bec180eb18c7 | boston_bull |
| 2 | 001513dfcb2ffaf | dingo |
| 3 | 001cdf01b096e06 | pekinese |
| 4 | 00214f311d5d224 | bluetick |
| 5 | 0021f9ceb3235ef | golden_retrieve |
| ... | ... | ... |
| 10219 | ffd3f636f7f379c | dandie_dinmont |
| 10220 | ffe2ca6c940cddf | airedale |
| 10221 | ffe5f6d8e2bff35 | miniature_pinsc |


In [28]:
dogs = kg.DataGrid(
    name="Dog Breeds",
    columns=["Breed", "Image"],
)

In [29]:
for row in dg.to_dicts():
    dogs.append([row["breed"], kg.Image("train/" + row["id"] + ".jpg")])

In [30]:
dogs.show()

Saving data...


100%|██████████| 10222/10222 [00:00<00:00, 101740.54it/s]

Saving datagrid to 'dog-breeds.datagrid'...





Extending data...


100%|██████████| 10222/10222 [00:02<00:00, 3462.81it/s]


Computing statistics...


100%|██████████| 4/4 [00:30<00:00,  7.61s/it]



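# HuggingFace

Kangas can also build a DataGrid directly from a Hugging Face dataset. This requires the `datasets` library:
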
In [12]:
%pip install datasets --quiet

In [13]:
from datasets import load_dataset

In [31]:
dataset = load_dataset("beans", split="train")



In [32]:
dg = kg.DataGrid(dataset, name="beans")

100%|██████████| 1034/1034 [01:44<00:00,  9.90it/s]


In [None]:
dg.show()

# URLs and Archived Files

Kangas can read a DataGrid from a URL and from archived files (including the "zip" and "tgz" formats).

In [33]:
dg = kg.read_datagrid("https://github.com/dsblank/examples/raw/main/mnist-60000-after-5-epochs.datagrid.zip")
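For comparison, here is a rough sketch of unpacking such an archive by hand. This is not how Kangas does it internally; it only illustrates the manual step that `kg.read_datagrid` saves you. The helper name `unpack_datagrid_archive` is an illustrative assumption, and the demo builds a small local zip file rather than downloading anything.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack_datagrid_archive(archive_path, dest_dir):
    """Hypothetical helper: unpack an archive and list the .datagrid files in it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil infers the format (.zip, .tgz, .tar.gz, ...) from the file extension.
    shutil.unpack_archive(str(archive_path), str(dest))
    return sorted(dest.rglob("*.datagrid"))


# Demo: create a throwaway zip holding a placeholder file, then unpack it.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    archive = tmp / "example.datagrid.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("example.datagrid", b"placeholder")
    found = unpack_datagrid_archive(archive, tmp / "unpacked")
    found_names = [p.name for p in found]

print(found_names)  # ['example.datagrid']
```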

In [34]:
dg.show()
