# Welcome to Auctus Search! 🎉

If you’re new to dataset searching or just want to get started with the basics, you’re in the right place. In this notebook, we’ll walk through the fundamental steps of:

- Performing a **basic search** to find datasets.
- **Selecting a dataset** of interest from the search results.
- **Loading the dataset** for further analysis.

Let’s dive in and discover how to find and work with datasets using Auctus Search! 🚕

---

## 🎯 **Goal**

In this notebook, you’ll learn:
- How to perform a basic search using the `search_datasets()` method.
- How to select a dataset from the search results for further exploration.
- How to load the selected dataset into a usable format (e.g., a pandas DataFrame) for analysis.

---

**Ready to get started?** Let’s begin!

## Step 1: Import the Library

We start by importing the `AuctusSearch` class, which provides the functionality to interact with the Auctus API. We’ll also import `pandas` for loading and analysing datasets.

In [1]:
from auctus_search import AuctusSearch
import pandas as pd

## Step 2: Initialise AuctusSearch

Next, we create an instance of `AuctusSearch` to perform searches and manage datasets.

In [2]:
search = AuctusSearch()

## Step 3: Perform a Basic Search

Let’s perform a basic search using the `search_datasets()` method. By default, this method retrieves a small number of datasets (e.g., 10) from the first page of results. You can adjust the `size` and `page` parameters later if needed, but for now, we’ll keep it simple. Other examples cover advanced search techniques such as pagination.

**Example**: Search for datasets related to "taxis".

In [3]:
collection = search.search_datasets(search_query="taxis", size=10, page=1)
collection.display()


When you run the above cell, you’ll see a grid of dataset cards, each representing a dataset related to "taxis". Each card includes:

- **Name**: The name of the dataset.
- **Source**: A link to the dataset's source.
- **Upload Date**: The date when the dataset was uploaded.
- **Description**: A brief overview of the dataset.
- **Type**: The primary type (e.g., Spatial, Tabular) and additional types.
- **Size**: The number of rows and columns in the dataset.
- **Relevancy**: A gauge showing how relevant the dataset is to your search query.

Take a moment to browse the results. You can scroll through the cards to find a dataset that interests you.
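To make the `size` and `page` arguments from the search call more concrete, here is a minimal sketch of how page-based slicing usually works. It does not call Auctus at all: `paginate` and the `results` list are hypothetical stand-ins, and the 1-indexed page numbering is an assumption based on `page=1` being described as the first page.

```python
def paginate(items, size=10, page=1):
    """Return the items that a 1-indexed `page` of length `size` would hold."""
    if size < 1 or page < 1:
        raise ValueError("size and page must both be >= 1")
    start = (page - 1) * size
    return items[start:start + size]


# 25 hypothetical result identifiers standing in for search hits
results = [f"dataset-{i}" for i in range(1, 26)]

first_page = paginate(results, size=10, page=1)
third_page = paginate(results, size=10, page=3)

print(len(first_page), first_page[0])
print(len(third_page), third_page[0])
```

Under these assumptions the last page is simply shorter than `size`, and a page past the end comes back empty.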

## Step 4: Load the Dataset for Analysis

Select a dataset by clicking **Select this dataset** on its card; the name of the selected dataset should then appear at the top of the view. Once you’ve made a selection, let’s load it into a pandas DataFrame for further analysis. The `AuctusSearch` class provides a method to download and load the dataset.

**Example**: Load the selected dataset into a pandas DataFrame.

In [4]:
# Load the selected dataset into a pandas DataFrame
df = search.load_selected_dataset(display_table=False)

# Display the first few rows of the dataset
df.head()

| | Closing Date | Public Vehicle Number | Sale Price | Seller’s Company Name | Buyer's Company Name |
|---|---|---|---|---|---|
| 0 | 05/31/2012 | 3751 | 365000.0 | SILDA RIVERA | MAMOON JFK, INC. |
| 1 | 01/20/2010 | 1712 | 102000.0 | TASNEEM CAB CO. | CHICAGO'S FINEST CAB CORP. |
| 2 | 04/05/2012 | 3779 | 275000.0 | CLMH 1 LLC | LULU FOUR INC |
| 3 | 10/30/2012 | 4483 | 345000.0 | YC58 LLC | BLUE MOON TAXI LLC |
| 4 | 12/20/2013 | 4948 | 325000.0 | NEW CHICAGO INVESTMENTS INC. | L AND M PRESTIGE TAXI CORP |


The `load_selected_dataset()` method downloads the dataset and loads it into a pandas DataFrame. You can now use pandas to explore and analyse the data. For example:

- Check the shape of the dataset: `df.shape`
- View column names: `df.columns`
- Summarise the data: `df.describe()`
- Visualise the data using libraries like matplotlib or seaborn.
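The first three checks in the list above can be sketched as follows. So that the snippet runs without downloading anything, it rebuilds three columns of the five rows shown by `df.head()` as a stand-in called `sample`; that stand-in is an assumption for illustration, and with the real data you would call the same attributes on `df`.

```python
import pandas as pd

# Stand-in for `df`: three columns of the five rows displayed above
sample = pd.DataFrame(
    {
        "Closing Date": ["05/31/2012", "01/20/2010", "04/05/2012", "10/30/2012", "12/20/2013"],
        "Public Vehicle Number": [3751, 1712, 3779, 4483, 4948],
        "Sale Price": [365000.0, 102000.0, 275000.0, 345000.0, 325000.0],
    }
)

print(sample.shape)                     # (number of rows, number of columns)
print(list(sample.columns))             # column names
print(sample["Sale Price"].describe())  # count, mean, std, min, quartiles, max
```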

But that's not all! You can also view the dataset with Skrub for interactive exploration. Let's do that next.

In [5]:
# Load the selected dataset into a pandas DataFrame and display it with Skrub for interactive exploration
df = search.load_selected_dataset(display_table=True)



| | Closing Date | Public Vehicle Number | Sale Price | Seller’s Company Name | Buyer's Company Name |
|---|---|---|---|---|---|
| 0 | 05/31/2012 | 3751 | 365000.0 | SILDA RIVERA | MAMOON JFK, INC. |
| 1 | 01/20/2010 | 1712 | 102000.0 | TASNEEM CAB CO. | CHICAGO'S FINEST CAB CORP. |
| 2 | 04/05/2012 | 3779 | 275000.0 | CLMH 1 LLC | LULU FOUR INC |
| 3 | 10/30/2012 | 4483 | 345000.0 | YC58 LLC | BLUE MOON TAXI LLC |
| 4 | 12/20/2013 | 4948 | 325000.0 | NEW CHICAGO INVESTMENTS INC. | L AND M PRESTIGE TAXI CORP |
| ... | ... | ... | ... | ... | ... |
| 5465 | 01/31/2025 | 5179 | 10000.0 | AZZI EXPRESS INC. | KING ONE EXPRESS INC. |
| 5466 | 01/31/2025 | 3754 | 10000.0 | YAHOWA CAB CO | N S CHICAGO, INC. |
| 5467 | 01/31/2025 | 2685 | 8000.0 | Z & M RIDE INC | AJ TRANSPORTATION SERVICES, INC. |
| 5468 | 02/07/2025 | 5579 | 8000.0 | SKYWAY TRANSIT LTD | ABANABA 1 TRANS INC. |

| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Closing Date | ObjectDType | 0 (0.0%) | 1392 (25.4%) | | | | | |
| 1 | Public Vehicle Number | Int64DType | 0 (0.0%) | 4094 (74.8%) | 3470.0 | 2020.0 | 4.0 | 3474.0 | 6998.0 |
| 2 | Sale Price | Float64DType | 0 (0.0%) | 373 (6.8%) | 128000.0 | 122000.0 | 0.0 | 100000.0 | 390000.0 |
| 3 | Seller’s Company Name | ObjectDType | 0 (0.0%) | 3036 (55.5%) | | | | | |
| 4 | Buyer's Company Name | ObjectDType | 0 (0.0%) | 3375 (61.7%) | | | | | |

| Column 1 | Column 2 | Cramér's V |
|---|---|---|
| Seller’s Company Name | Buyer's Company Name | 0.41 |
| Closing Date | Sale Price | 0.382 |
| Closing Date | Seller’s Company Name | 0.307 |
| Closing Date | Buyer's Company Name | 0.219 |
| Sale Price | Buyer's Company Name | 0.168 |
| Sale Price | Seller’s Company Name | 0.165 |
| Public Vehicle Number | Seller’s Company Name | 0.159 |
| Closing Date | Public Vehicle Number | 0.14 |
| Public Vehicle Number | Sale Price | 0.102 |
| Public Vehicle Number | Buyer's Company Name | 0.0866 |
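Cramér's V, reported in the last table, measures association between two categorical variables on a scale from 0 (no association) to 1 (perfect association). As a rough illustration, the sketch below implements the textbook formula, V = sqrt(chi² / (n · (min(rows, cols) − 1))), on a small contingency table of counts. The function name `cramers_v` and the example tables are hypothetical, and how the report bins or otherwise prepares columns before computing the statistic is not covered here, so do not expect this to reproduce the values above.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    if n == 0:
        return 0.0
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


perfect = cramers_v([[10, 0], [0, 10]])    # each row category maps to one column category
independent = cramers_v([[5, 5], [5, 5]])  # counts spread evenly across cells
print(perfect, independent)
```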


The cell above displays the dataset in an interactive table, allowing you to explore the data further. You can sort columns, filter rows, and perform other operations directly within the table. See the API documentation for more details on the interactive table display. Let's Skrub!


## Step 5: Analyse the Dataset

Now that the dataset is loaded, you can perform any analysis of interest. Here’s a simple example to get you started:

**Example**: Calculate the average value of a numeric column (e.g., trip distance) in the dataset.

In [None]:
# Example: Calculate the average trip distance (replace 'trip_distance' with an actual column name)
if 'trip_distance' in df.columns:
    avg_trip_distance = df['trip_distance'].mean()
    print(f"Average trip distance: {avg_trip_distance:.2f} miles")
else:
    print("Column 'trip_distance' not found. Please check the column names using df.columns.")

Feel free to adapt this example to your specific dataset and analysis needs. You can also visualise the data using libraries like matplotlib or seaborn for more insights.
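The `trip_distance` column used above may not exist in the dataset you selected. As a sketch for the table shown in Step 4, the snippet below parses `Closing Date` (stored as text, according to the column summary) and averages `Sale Price` per year. It rebuilds five of the displayed rows as a stand-in called `sample`, an assumption made so the snippet runs without a download; with the real data you would apply the same calls to `df`.

```python
import pandas as pd

# Stand-in for `df`: two columns of the five rows shown in Step 4
sample = pd.DataFrame(
    {
        "Closing Date": ["05/31/2012", "01/20/2010", "04/05/2012", "10/30/2012", "12/20/2013"],
        "Sale Price": [365000.0, 102000.0, 275000.0, 345000.0, 325000.0],
    }
)

# Convert the month/day/year strings into real datetimes
sample["Closing Date"] = pd.to_datetime(sample["Closing Date"], format="%m/%d/%Y")

# Average sale price for each closing year
by_year = sample.groupby(sample["Closing Date"].dt.year)["Sale Price"].mean()
print(by_year)
```

Passing an explicit `format` avoids any ambiguity between day-first and month-first dates.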