<img src="https://relevance.ai/wp-content/uploads/2021/11/logo.79f303e-1.svg" width="150" alt="Relevance AI" />
<h5> Developer-first vector platform for ML teams </h5>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RelevanceAI/workflows/blob/main/workflows/dummy-datasets/Dummy_Datasets_Workflow.ipynb)

# Insert a dummy dataset

Pick from our range of dummy datasets to experiment with before bringing in your own data!

# Step 0: Install

In [None]:
%%capture
!pip install -q -U RelevanceAI==2.1.8

# Step 1: Start up client

In [None]:
from relevanceai import Client
client = Client()

# Step 2: Choose a dataset and run the code block!

## Quick and easy fake dataset

A quick and dirty dataset for rapid experimentation.

It contains all of the basic dataset features and has the following schema:

```
{'_chunk_': 'chunks',
'_chunk_.label': 'text',
'_chunk_.label_chunkvector_': {'chunkvector': 5},
'insert_date_': 'date',
'sample_1_description': 'text',
'sample_1_label': 'text',
'sample_1_value': 'numeric',
'sample_1_vector_': {'vector': 5},
'sample_2_description': 'text',
'sample_2_label': 'text',
'sample_2_value': 'numeric',
'sample_2_vector_': {'vector': 5},
'sample_3_description': 'text',
'sample_3_label': 'text',
'sample_3_value': 'numeric',
'sample_3_vector_': {'vector': 5}}
```
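To get a feel for what a single document matching this schema might look like, here is a minimal sketch in plain Python. `make_mock_document` is a hypothetical helper written for illustration only; it is not how `mock_documents` is implemented, and the nested layout of `_chunk_` (a list of dictionaries) is an assumption inferred from the dotted field names in the schema.

```python
import random
from datetime import datetime, timezone

VECTOR_LENGTH = 5  # matches {'vector': 5} and {'chunkvector': 5} in the schema


def make_mock_document(seed: int) -> dict:
    """Hypothetical helper: build one document shaped like the schema above."""
    rng = random.Random(seed)  # seeded so the random fields are reproducible
    doc = {
        "insert_date_": datetime.now(timezone.utc).isoformat(),
        # Assumption: '_chunk_' holds a list of dicts, each with a label and a chunk vector
        "_chunk_": [
            {
                "label": f"label_{rng.randint(0, 5)}",
                "label_chunkvector_": [rng.random() for _ in range(VECTOR_LENGTH)],
            }
        ],
    }
    for i in (1, 2, 3):
        doc[f"sample_{i}_description"] = f"description {rng.randint(0, 100)}"
        doc[f"sample_{i}_label"] = f"label_{rng.randint(0, 5)}"
        doc[f"sample_{i}_value"] = rng.randint(0, 100)
        doc[f"sample_{i}_vector_"] = [rng.random() for _ in range(VECTOR_LENGTH)]
    return doc


doc = make_mock_document(0)
print(sorted(doc))
```

Note the naming convention visible in the schema: fields ending in `_vector_` hold vectors, and fields ending in `_chunkvector_` hold vectors nested inside `_chunk_`.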

In [None]:
from relevanceai.utils.datasets import mock_documents
ds = client.Dataset('dummy-mock_dataset')
ds.upsert_documents(mock_documents(100))

## Dummy Datasets with Vectors

The following dummy datasets come with pre-encoded vectors, ready for downstream tasks.

## E-commerce Dummy Dataset 

An e-commerce dataset containing encoded products, uncleaned product prices and their sources.

```
{
    '_id': 'b7fc9acbc9ddd18855f96863d37a4fe9',
    'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg',
    'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f',
    'product_price': '$7.99 to $12.99',
    'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"',
    'product_title_clip_vector_': [...],
    'product_image_clip_vector_': [...],
    'query': 'steel necklace',
    'source': 'eBay'
}
```
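Because `product_price` is uncleaned text such as `'$7.99 to $12.99'`, you may want numeric values before filtering or sorting. The sketch below shows one possible way to extract a price range with a regular expression. `parse_price_range` is a hypothetical helper, and it assumes prices are written with a leading dollar sign as in the sample document; other formats in the dataset may need extra handling.

```python
import re
from typing import Optional, Tuple

# Matches dollar amounts such as $7.99 or $1,299.00 (assumed format)
PRICE_PATTERN = re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d+)?)")


def parse_price_range(raw: str) -> Optional[Tuple[float, float]]:
    """Return (lowest, highest) price found in the text, or None if there is none."""
    amounts = [float(match.replace(",", "")) for match in PRICE_PATTERN.findall(raw)]
    if not amounts:
        return None
    return min(amounts), max(amounts)


print(parse_price_range("$7.99 to $12.99"))  # (7.99, 12.99)
```

A single price yields the same value for both ends of the range, and text without a recognisable amount yields `None`.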

In [None]:
from relevanceai.utils.datasets import get_ecommerce_dataset_encoded
ds = client.Dataset('dummy-ecommerce')
docs = get_ecommerce_dataset_encoded()
docs[0]  # preview the first document

In [None]:
# Upsert the documents loaded above into the 'dummy-ecommerce' dataset
ds.upsert_documents(docs)


## Titanic Dataset

The famous dataset about Titanic survivors, cleaned and useful for tabular data examples.

```
{   'Unnamed: 0': 0,
    'PassengerId': 892,
    'Survived': 0,
    'Pclass': 3,
    'Age': 34.5,
    'SibSp': 0,
    'Parch': 0,
    'Fare': 7.8292,
    'male': 1,
    'Q': 1,
    'S': 0,
    'value_vector_': '[3.0, 34.5, 0.0, 0.0, 7.8292, 1.0, 1.0, 0.0]'
}
```
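In the sample document above, `value_vector_` appears as a string rather than a list, and its values line up with the numeric columns of the same row. The sketch below parses the string and rebuilds the vector from the columns to show the correspondence. The field order in `FEATURE_FIELDS` is inferred from this one sample and is an assumption, as is the hypothetical helper `parse_vector`.

```python
import json

# Values copied from the sample document above
sample = {
    "Pclass": 3, "Age": 34.5, "SibSp": 0, "Parch": 0,
    "Fare": 7.8292, "male": 1, "Q": 1, "S": 0,
    "value_vector_": "[3.0, 34.5, 0.0, 0.0, 7.8292, 1.0, 1.0, 0.0]",
}

# Assumed column order, inferred by matching values in the sample
FEATURE_FIELDS = ["Pclass", "Age", "SibSp", "Parch", "Fare", "male", "Q", "S"]


def parse_vector(value):
    """Accept either a JSON-style string or an existing list and return floats."""
    if isinstance(value, str):
        value = json.loads(value)
    return [float(x) for x in value]


vector = parse_vector(sample["value_vector_"])
rebuilt = [float(sample[field]) for field in FEATURE_FIELDS]
print(vector == rebuilt)  # True
```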

In [None]:
from relevanceai.utils.datasets import get_titanic_dataset
ds = client.Dataset('dummy-titanic')
ds.upsert_documents(get_titanic_dataset())

## COCO Dataset

The famous MS COCO dataset (https://cocodataset.org/), encoded for easy use.

```
{
     '_id': '0abc',
     'annotations': {'caption': 'A table is adorned with wooden chairs with blue '
                                'accents.',
                     'id': 794853,
                     'image_id': 57870},
     'coco_url': 'http://images.cocodataset.org/train2014/COCO_train2014_000000057870.jpg',
     'coco_url_bit_medium_model_vector_': [0.039498601108789444, ... ],
     'coco_url_clip_vector_': [0.01488494873046875, ...],
     'coco_url_mobile_v2_model_vector_': [0.48851051926612854, ...],
     'date_captured': '2013-11-14 16:28:13',
     'file_name': 'COCO_train2014_000000057870.jpg',
     'flickr_url': 'http://farm4.staticflickr.com/3153/2970773875_164f0c0b83_z.jpg',
     'height': 480,
     'id': 57870,
     'insert_date_': '2022-03-27T03:48:26.119Z',
     'license': 5,
     'license_info': {'id': 5,
                      'name': 'Attribution-ShareAlike License',
                      'url': 'http://creativecommons.org/licenses/by-sa/2.0/'},
     'width': 640
 }
```
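Vector fields such as `coco_url_clip_vector_` are typically compared with a similarity measure. As a local illustration only, the sketch below ranks a few documents by cosine similarity in plain Python. The three-dimensional vectors and document ids are made up for readability (real encodings are much longer), and this is not how the platform performs vector search.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


# Toy documents with made-up 3-dimensional vectors
toy_docs = [
    {"_id": "a", "coco_url_clip_vector_": [0.9, 0.1, 0.0]},
    {"_id": "b", "coco_url_clip_vector_": [0.0, 1.0, 0.0]},
    {"_id": "c", "coco_url_clip_vector_": [0.5, 0.5, 0.0]},
]
query = [1.0, 0.0, 0.0]

# Sort from most to least similar to the query vector
ranked = sorted(
    toy_docs,
    key=lambda d: cosine_similarity(query, d["coco_url_clip_vector_"]),
    reverse=True,
)
ranked_ids = [d["_id"] for d in ranked]
print(ranked_ids)  # ['a', 'c', 'b']
```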

In [None]:

from relevanceai.utils.datasets import get_coco_dataset
ds = client.Dataset('dummy-coco')

ds.upsert_documents(get_coco_dataset())


## Dummy Datasets without Vectors


## E-commerce Dataset (Clean)

An e-commerce dataset with cleaned product prices and their sources.

```
{
    '_id': 'b7fc9acbc9ddd18855f96863d37a4fe9',
    'product_image': 'https://thumbs4.ebaystatic.com/d/l225/pict/321567405391_1.jpg',
    'product_link': 'https://www.ebay.com/itm/20-36-Mens-Silver-Stainless-Steel-Braided-Wheat-Chain-Necklace-Jewelry-3-4-5-6MM-/321567405391?pt=LH_DefaultDomain_0&var=&hash=item4adee9354f',
    'product_price': '$7.99 to $12.99',
    'product_title': '20-36Mens Silver Stainless Steel Braided Wheat Chain Necklace Jewelry 3/4/5/6MM"',
    'query': 'steel necklace',
    'source': 'eBay'
}
```

In [None]:

from relevanceai.utils.datasets import get_ecommerce_dataset_clean
ds = client.Dataset('dummy-ecommerce')
ds.upsert_documents(get_ecommerce_dataset_clean())


## Flipkart Dataset (E-commerce)

A Flipkart e-commerce dataset.

Sample document:

```
{
    '_id': 0,
    'product_name': "Alisha Solid Women's Cycling Shorts",
    'description': "Key Features of Alisha Solid Women's Cycling Shorts Cotton Lycra Navy, Red, Navy,Specifications of Alisha Solid Women's Cycling Shorts Shorts Details Number of Contents in Sales Package Pack of 3 Fabric Cotton Lycra Type Cycling Shorts General Details Pattern Solid Ideal For Women's Fabric Care Gentle Machine Wash in Lukewarm Water, Do Not Bleach Additional Details Style Code ALTHT_3P_21 In the Box 3 shorts",
    'retail_price': 999.0
}
```


In [None]:
from relevanceai.utils.datasets import get_flipkart_dataset
ds = client.Dataset('dummy-flipkart')
ds.upsert_documents(get_flipkart_dataset())

## Real Estate Dataset 

A sample real estate dataset of Australian property listings, containing images of houses, pricing, bathroom counts and other details.

```
{
    'propertyDetails': {
        'area': 'North Shore - Lower',
        'carspaces': 1,
        'streetNumber': '28',
        'latitude': -33.8115768,
        'allPropertyTypes': ['ApartmentUnitFlat'],
        'postcode': '2066',
        'unitNumber': '6',
        'bathrooms': 1.0,
        'bedrooms': 1.0,
        'features': ['BuiltInWardrobes', 'InternalLaundry','Intercom', 'Dishwasher'],
        'street': 'Epping Road',
        'propertyType': 'ApartmentUnitFlat',
        'suburb': 'LANE COVE',
        'state': 'NSW',
        'region': 'Sydney Region',
        'displayableAddress': '6/28 Epping Road, Lane Cove',
        'longitude': 151.166611
    },
    'listingSlug': '6-28-epping-road-lane-cove-nsw-2066-14688794',
    'id': 14688794,
    'headline': 'Extra large one bedroom unit',
    'summaryDescription': '<b></b><br />This modern and spacious one-bedroom apartment situated on the top floor, the quiet rear side of a small 2 story boutique block, enjoys a wonderfully private, leafy, and greenly outlook from 2 sides and balcony. A short stroll to city buse...',
    'advertiser': 'Ray White Lane Cove',
    'image_url': 'https://bucket-api.domain.com.au/v1/bucket/image/14688794_1_1_201203_101135-w1600-h1065',
    'insert_date_': '2021-03-01T14:19:22.805086',
    'labels': [],
    'image_url_5': 'https://bucket-api.domain.com.au/v1/bucket/image/14688794_5_1_201203_101135-w1600-h1067',
    'image_url_4': 'https://bucket-api.domain.com.au/v1/bucket/image/14688794_4_1_201203_101135-w1600-h1067',
    'priceDetails': {'displayPrice': 'Deposit Taken ! Inspection Cancelled thank you !!!'}
    ...
}
```
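Documents like this one nest fields inside dictionaries such as `propertyDetails`, while the mock dataset schema earlier in this notebook names nested fields with dots (for example `_chunk_.label`). If you want to inspect a listing locally in the same dotted style, a small recursive helper is one option. `flatten` below is a hypothetical utility written for this example, not part of the SDK.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Turn nested dictionaries into a single level with dot-separated keys."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))  # recurse into nested dicts
        else:
            flat[full_key] = value  # lists and scalars are kept as they are
    return flat


# A trimmed-down version of the sample listing above
listing = {
    "id": 14688794,
    "propertyDetails": {"suburb": "LANE COVE", "bedrooms": 1.0, "bathrooms": 1.0},
    "priceDetails": {"displayPrice": "Deposit Taken ! Inspection Cancelled thank you !!!"},
}

flat = flatten(listing)
print(sorted(flat))
```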

In [None]:
from relevanceai.utils.datasets import get_realestate_dataset
ds = client.Dataset('dummy-realestate')
ds.upsert_documents(get_realestate_dataset())

## Games Dataset 

Our famous games dataset, an example dataset from FreeToGame (https://www.freetogame.com/).

Example:
```
{
    'id': 1,
    'title': 'Dauntless',
    'thumbnail': 'https://www.freetogame.com/g/1/thumbnail.jpg',
    'short_description': 'A free-to-play, co-op action RPG with gameplay similar to Monster Hunter.',
    'game_url': 'https://www.freetogame.com/open/dauntless',
    'genre': 'MMORPG',
    'platform': 'PC (Windows)',
    'publisher': 'Phoenix Labs',
    'developer': 'Phoenix Labs, Iron Galaxy',
    'release_date': '2019-05-21',
    'freetogame_profile_url': 'https://www.freetogame.com/dauntless'
}
```

In [None]:
from relevanceai.utils.datasets import get_games_dataset
ds = client.Dataset('dummy-games-dataset')
ds.upsert_documents(get_games_dataset())

## News Dataset

An example news dataset. Note that this dataset is not encoded.

```
{
    'authors': 'Ruth Harris',
    'content': 'Sometimes the power of Christmas will make you do wild and wonderful things. You do not need to believe in the Holy Trinity to believe in the positive power of doing good for others.',
    'domain': 'awm.com',
    'id': 141,
    'inserted_at': '2018-02-02 01:19:41.756632',
    'keywords': nan,
    'meta_description': nan,
    'meta_keywords': "['']",
    'scraped_at': '2018-01-25 16:17:44.789555',
    'summary': nan,
    'tags': nan,
    'title': 'Church Congregation Brings Gift to Waitresses Working on Christmas Eve, Has Them Crying (video)',
    'type': 'unreliable',
    'updated_at': '2018-02-02 01:19:41.756664',
    'url': 'http://awm.com/church-congregation-brings-gift-to-waitresses-working-on-christmas-eve-has-them-crying-video/'
}
```
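Several fields in the sample above are `nan`, which strict JSON does not allow. If you need to serialise these documents yourself, one option is to replace NaN values with `None` first. The sketch below uses a hypothetical helper, `replace_nan`; whether `upsert_documents` already handles NaN for you is not stated here, so treat this as an optional local clean-up step.

```python
import json
import math


def replace_nan(value, replacement=None):
    """Recursively replace float NaN values inside dicts and lists."""
    if isinstance(value, float) and math.isnan(value):
        return replacement
    if isinstance(value, dict):
        return {key: replace_nan(item, replacement) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_nan(item, replacement) for item in value]
    return value


# A trimmed-down version of the sample article above
article = {
    "id": 141,
    "domain": "awm.com",
    "keywords": float("nan"),
    "summary": float("nan"),
    "type": "unreliable",
}

clean = replace_nan(article)
# allow_nan=False makes json.dumps raise if any NaN slipped through
print(json.dumps(clean, allow_nan=False))
```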

In [None]:
from relevanceai.utils.datasets import get_news_dataset
ds = client.Dataset('dummy-news')
ds.upsert_documents(get_news_dataset())

## Iris Dataset

The famous Iris dataset. It has the following schema:

```
{
    "PetalLengthCm": "numeric",
    "PetalWidthCm": "numeric",
    "SepalLengthCm": "numeric",
    "SepalWidthCm": "numeric",
    "Species": "text",
    "insert_date_": "date"
}
```
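A schema like this can double as a quick sanity check on documents before you insert them. The sketch below is a rough local validator. It assumes that `numeric` means an int or float, `text` means a string, and `date` means an ISO 8601 string; those interpretations, the helper names, and the `insert_date_` value in the example document are all assumptions made for illustration.

```python
from datetime import datetime
from numbers import Number

IRIS_SCHEMA = {
    "PetalLengthCm": "numeric",
    "PetalWidthCm": "numeric",
    "SepalLengthCm": "numeric",
    "SepalWidthCm": "numeric",
    "Species": "text",
    "insert_date_": "date",
}


def matches_type(value, field_type: str) -> bool:
    """Check one value against an assumed meaning of the schema type name."""
    if field_type == "numeric":
        return isinstance(value, Number) and not isinstance(value, bool)
    if field_type == "text":
        return isinstance(value, str)
    if field_type == "date":
        if not isinstance(value, str):
            return False
        try:
            datetime.fromisoformat(value)
        except ValueError:
            return False
        return True
    return False  # unknown type names never match


def find_schema_errors(doc: dict, schema: dict) -> list:
    """Return a list of human-readable problems; empty means the doc matches."""
    errors = []
    for field, field_type in schema.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not matches_type(doc[field], field_type):
            errors.append(f"{field} is not {field_type}")
    return errors


# Measurements of the first row of the classic Iris data; the date is made up
example = {
    "SepalLengthCm": 5.1, "SepalWidthCm": 3.5,
    "PetalLengthCm": 1.4, "PetalWidthCm": 0.2,
    "Species": "Iris-setosa", "insert_date_": "2022-03-27T03:48:26",
}
print(find_schema_errors(example, IRIS_SCHEMA))  # []
```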

In [None]:
from relevanceai.utils.datasets import get_iris_dataset
ds = client.Dataset('dummy-iris')
ds.upsert_documents(get_iris_dataset())

## Palmer Penguins Dataset

The popular [Palmer Penguins](https://allisonhorst.github.io/palmerpenguins/articles/intro.html) dataset!

```
{
     'Body Mass (g)': 4350.0,
     'Clutch Completion': 'Yes',
     'Comments': nan,
     'Culmen Depth (mm)': 18.5,
     'Culmen Length (mm)': 40.3,
     'Date Egg': '2008-11-08',
     'Delta 13 C (o/oo)': -26.01152,
     'Delta 15 N (o/oo)': 8.39459,
     'Flipper Length (mm)': 196.0,
     'Individual ID': 'N49A2',
     'Island': 'Dream',
     'Region': 'Anvers',
     'Sample Number': 98,
     'Sex': 'MALE',
     'Species': 'Adelie Penguin (Pygoscelis adeliae)',
     'Stage': 'Adult, 1 Egg Stage',
     '_id': 'PAL0809'
}
```

In [None]:
from relevanceai.utils.datasets import get_palmer_penguins_dataset
ds = client.Dataset('dummy-palmer-penguins')
ds.upsert_documents(get_palmer_penguins_dataset())