# Geospatial Vector Search with Qdrant
## Using Qdrant to query social media with vector search and geospatial filters, without a GPU (CPU only)

Notebook for the blog post: https://geo.rocks/post/geospatial-vector-search-qdrant/

In [1]:
import pandas as pd 

# read directly from the GitHub repo (or download the file first and read it locally)
df = pd.read_parquet("https://github.com/do-me/qdrant-tutorial/blob/main/CORDIS_55k_projects.parquet?raw=true") # ~13 MB
df



| Unnamed: 0 | Collection | Record Number | Project acronym | Title | ID | Teaser |
|---|---|---|---|---|---|---|
| 60024 | project | 191330 | Hairy Cell Leukemia | Genetics-driven targeted therapy of Hairy Cell... | 617471 | Hairy Cell Leukemia (HCL), a chronic B-cell ne... |
| 60025 | project | 73654 | MODNET | Model theory and applications | 512234 | This proposal is designed to promote multi-dis... |
| 60026 | project | 83907 | DYNQUANTGR | Dynamical quantum groups, deformation quantiza... | 42212 | The main goal of this proposal is two-fold: St... |
| 60027 | project | 95038 | ND-ETCRYPTOUC | New Directions in Efficient and Tamper-Resilie... | 256544 | Emerging ubiquitous devices such as WSN nodes ... |
| 60028 | project | 86440 | TAMBO | Societies of South Peru in the Context of Clim... | 209938 | The project, Societies of South Peru in the Co... |
| ... | ... | ... | ... | ... | ... | ... |
| 987809 | project | 225185 | CustomerServiceAI | CustomerServiceAI: Fully language-independent ... | 880954 | Customer service is a huge industry: €720BN in... |
| 987810 | project | 233472 | eMOTIONAL Cities | eMOTIONAL Cities - Mapping the cities through ... | 945307 | As the world is becoming more urbanized and ci... |
| 987811 | project | 227864 | PRECISMEDLYM | Aggressive T cell Lymphomas, integrated clinic... | 882597 | Lymphoid leukemias and lymphomas represent fre... |
| 987812 | project | 226324 | C-stemGMP | c-GMP compliance of C-stem, an IPSc based cell... | 881113 | Scaling-up cell therapy manufacturing provides... |


In [2]:
import numpy as np
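# add random placeholder coordinates (lat 30 to 80, lon 10 to 30); this box covers central/eastern Europe and
# Scandinavia, but also parts of the Mediterranean, North Africa and the Arctic, not just the European mainland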
df["lat"] = np.random.uniform(30,80, len(df))
df["lon"] = np.random.uniform(10,30, len(df))

In [4]:
from sentence_transformers import SentenceTransformer
from tqdm import tqdm
tqdm.pandas()

# load the model
model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')

# encode all teasers with the model
df["vector"] = df["Teaser"].progress_apply(lambda x: model.encode(x.lower()))

100%|██████████| 55418/55418 [27:45<00:00, 33.27it/s]
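Calling `model.encode` once per row via `progress_apply` took almost 28 minutes on CPU here. `SentenceTransformer.encode` also accepts a list of strings and batches it internally, which is usually considerably faster. A sketch, assuming the `model` loaded above; the helper name `encode_teasers` and `batch_size=64` are illustrative choices, and no speed-up was measured for this notebook.

```python
import numpy as np


def encode_teasers(model, teasers, batch_size=64):
    """Hypothetical helper: encode all teasers in one batched call.

    `model` is expected to behave like sentence_transformers.SentenceTransformer,
    whose encode() accepts a list of strings plus batch_size and show_progress_bar.
    """
    texts = [t.lower() for t in teasers]  # same lowercasing as the per-row version
    embeddings = model.encode(texts, batch_size=batch_size, show_progress_bar=True)
    # one 1-D vector per input text, so the result can be stored as a DataFrame column
    return list(np.asarray(embeddings))


# usage with the notebook's objects:
# df["vector"] = encode_teasers(model, df["Teaser"].tolist())
```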


In [19]:
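from qdrant_client import QdrantClient
from qdrant_client.http import models
from qdrant_client.http.models import (
    Distance, FieldCondition, Filter, MatchText, PointStruct, VectorParams,
)

# connect to a Qdrant server running locally on the default port
client = QdrantClient(host="localhost", port=6333)

# drop and re-create the collection; the model outputs 384-dim unit-length vectors,
# so dot product is equivalent to cosine similarity here
client.recreate_collection(
    collection_name="test_collection",
    vectors_config=VectorParams(size=384, distance=Distance.DOT)
)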
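`Distance.DOT` works here because multi-qa-MiniLM-L6-cos-v1 produces unit-length embeddings, and for unit vectors the dot product equals cosine similarity. A quick NumPy check with random stand-in vectors (not real embeddings) illustrates the identity; with a model that does not normalize its output, `Distance.COSINE` would be the safer setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# two random 384-dim stand-ins, normalized to unit length like the model's output
a = rng.normal(size=384)
b = rng.normal(size=384)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

dot = float(a @ b)
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(dot, cosine)  # identical up to floating-point rounding
```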

In [20]:
client.create_payload_index(
    collection_name="test_collection",
    field_name="Teaser",
    field_schema=models.TextIndexParams(
        type="text",
        tokenizer=models.TokenizerType.WORD,
        min_token_len=2,
        max_token_len=30,
        lowercase=True,
    )
)

InlineResponse2006(time=0.23038, status='ok', result=UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>))

In [21]:
def post_qdrant(row):
    """Insert each row separately for simplicity. This can be sped up by upserting many rows per request."""

    # payload: every column except the last three (lat, lon, vector)
    row_payload = row.iloc[:-3].to_dict()

    # add lat/lon as a geo payload field: {"lat": ..., "lon": ...}
    row_payload["location"] = row[["lat", "lon"]].to_dict()

    # embedding as a plain Python list
    row_vector = row["vector"].tolist()

    # use the record number as the unique point ID in Qdrant
    row_id = int(row["Record Number"])

    # upsert (insert or overwrite) the point via the Qdrant API
    client.upsert(
        collection_name="test_collection",
        wait=True,
        points=[
            PointStruct(id=row_id, vector=row_vector, payload=row_payload),
        ],
    )

In [22]:
df.progress_apply(post_qdrant, axis=1)

100%|██████████| 55418/55418 [17:34<00:00, 52.56it/s]


60024     None
60025     None
60026     None
60027     None
60028     None
          ... 
987809    None
987810    None
987811    None
987812    None
987813    None
Length: 55418, dtype: object
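Inserting one point per request took about 17.5 minutes here. A minimal sketch of a batched alternative, assuming the same column layout (`lat`, `lon`, `vector` as the last three columns): the hypothetical helper `iter_point_batches` groups rows into lists of plain dicts, which can then be wrapped in `PointStruct` objects and sent with one `client.upsert` call per batch. The batch size of 256 is an arbitrary illustrative value.

```python
import numpy as np
import pandas as pd


def iter_point_batches(df, batch_size=256):
    """Hypothetical helper: yield lists of {"id", "vector", "payload"} dicts."""
    payload_cols = list(df.columns[:-3])  # everything except lat, lon, vector
    for start in range(0, len(df), batch_size):
        chunk = df.iloc[start:start + batch_size]
        batch = []
        for rec in chunk.to_dict("records"):
            batch.append({
                "id": int(rec["Record Number"]),
                "vector": np.asarray(rec["vector"]).tolist(),
                "payload": {
                    **{col: rec[col] for col in payload_cols},
                    "location": {"lat": float(rec["lat"]), "lon": float(rec["lon"])},
                },
            })
        yield batch


# usage with the notebook's client (requires a running Qdrant server):
# for batch in iter_point_batches(df):
#     client.upsert(
#         collection_name="test_collection",
#         points=[PointStruct(**p) for p in batch],
#     )
```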

In [23]:
search_term = "earth observation"
search_result = client.search(
    collection_name="test_collection",
    query_vector=model.encode(search_term), 
    limit=3
)
search_result

[ScoredPoint(id=85228, version=4014, score=0.6618099, payload={'Collection': 'project', 'ID': '30849', 'Project acronym': 'SEOS', 'Record Number': 85228, 'Teaser': 'Earth observation from space is relevant in science education already in high schools since it sharpens the sensibility to the natural environment and thus stimulates the willingness to learn of its relevance to everyday life conditions. This covers a broad field of experience...', 'Title': 'Science education through earth observation for high schools', 'location': {'lat': 74.07343093948293, 'lon': 24.20374801673081}}, vector=None),
 ScoredPoint(id=222830, version=53120, score=0.60703325, payload={'Collection': 'project', 'ID': '842560', 'Project acronym': 'CALCHAS', 'Record Number': 222830, 'Teaser': 'Earth Observation (EO) is undergoing a radical transformation due to the massive volume of observations acquired by remote sensing and in-situ sensor networks. While satellites provide coarse-resolution, yet global-scale moni
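Conceptually, `client.search` ranks stored vectors by their score against the query vector and returns the top `limit` hits. The sketch below is an exact brute-force NumPy reference on random stand-in vectors; it is not how Qdrant works internally (Qdrant uses an approximate HNSW index), and `brute_force_search` is a hypothetical name.

```python
import numpy as np


def brute_force_search(query, vectors, limit=3):
    """Exact top-`limit` search by dot product; returns (row index, score) pairs."""
    scores = vectors @ query              # one score per stored vector
    top = np.argsort(-scores)[:limit]     # highest scores first
    return [(int(i), float(scores[i])) for i in top]


rng = np.random.default_rng(1)
stored = rng.normal(size=(1000, 384))
stored /= np.linalg.norm(stored, axis=1, keepdims=True)  # unit length, like the embeddings
query = stored[42] + 0.01 * rng.normal(size=384)          # a slightly noisy copy of row 42
query /= np.linalg.norm(query)
print(brute_force_search(query, stored))  # row 42 ranks first
```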

In [25]:
search_term = "earth observation"

search_result = client.scroll(
    collection_name="test_collection",
    scroll_filter=Filter(
        must=[
            FieldCondition(
                key="Teaser",
                match=MatchText(text=search_term)
            )
        ]
    ),
    limit=3,
    with_payload=True,
)

search_result

([Record(id=82157, payload={'Collection': 'project', 'ID': '7512', 'Project acronym': 'SPARTAN', 'Record Number': 82157, 'Teaser': 'We propose to create a Centre of Excellence in the training of early stage researchers in the space, planetary (including Earth Observation) and astrophysical sciences in the Department of Physics and Astronomy at the University of Leicester.\r\n\r\nThe principal aims of this cent...', 'Title': 'Centre of excellence for space, planetary and astrophysics research Training and Networking', 'location': {'lat': 72.52626887469162, 'lon': 28.64183473357381}}, vector=None),
  Record(id=85228, payload={'Collection': 'project', 'ID': '30849', 'Project acronym': 'SEOS', 'Record Number': 85228, 'Teaser': 'Earth observation from space is relevant in science education already in high schools since it sharpens the sensibility to the natural environment and thus stimulates the willingness to learn of its relevance to everyday life conditions. This covers a broad field of

In [26]:
search_term = "earth observation as a method for fighting climate change in urban areas"

search_result = client.scroll(
    collection_name="test_collection",
    scroll_filter=Filter(
        must=[
            FieldCondition(
                key="Teaser",
                match=MatchText(text=search_term)
            )
        ]
    ),
    limit=3,
    with_payload=True,
)

search_result

([], None)
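The long query returns nothing because, with a full-text index, Qdrant's `MatchText` condition only matches points whose text contains all of the query's tokens, and no teaser contains every word of that sentence. The snippet below is a simplified pure-Python emulation of that AND behaviour (lowercased word tokens, ignoring tokens shorter than 2 characters as a stand-in for `min_token_len=2`); it is not Qdrant's actual tokenizer, and the helper names are hypothetical.

```python
import re


def tokenize(text, min_token_len=2):
    """Rough stand-in for a lowercase word tokenizer with a minimum token length."""
    return {t for t in re.findall(r"\w+", text.lower()) if len(t) >= min_token_len}


def match_text(document, query):
    """True if every query token also occurs in the document (AND semantics)."""
    return tokenize(query) <= tokenize(document)


teaser = ("Earth observation from space is relevant in science education "
          "already in high schools")
print(match_text(teaser, "earth observation"))  # True
print(match_text(teaser, "earth observation as a method for fighting "
                         "climate change in urban areas"))  # False
```

Vector search has no such all-words requirement, which is why the same long query still returns relevant projects in the next cell.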

In [27]:
search_term = "earth observation as a method for fighting climate change in urban areas"
search_result = client.search(
    collection_name="test_collection",
    query_vector=model.encode(search_term), 
    limit=3
)
search_result

[ScoredPoint(id=231962, version=32212, score=0.6569561, payload={'Collection': 'project', 'ID': '101004211', 'Project acronym': 'ECFAS', 'Record Number': 231962, 'Teaser': 'The increasing number of tools and algorithms able to process and extract qualitative and quantitative information from Earth Observation products has an enormous potential to support the evaluation of weather-induced climate risks. The ECFAS project will contribute to the...', 'Title': 'A PROOF-OF-CONCEPT FOR THE IMPLEMENTATION OF A EUROPEAN COPERNICUS COASTAL FLOOD AWARENESS SYSTEM', 'location': {'lat': 31.21824594487511, 'lon': 29.857792095998956}}, vector=None),
 ScoredPoint(id=216035, version=46207, score=0.63480234, payload={'Collection': 'project', 'ID': '771056', 'Project acronym': 'LICCI', 'Record Number': 216035, 'Teaser': 'In the quest to better understand local climate change impacts on physical, biological, and socioeconomic systems and how such impacts are locally perceived, scientists are challenged b

In [32]:
search_term = "earth observation"

search_result = client.search(
    collection_name="test_collection",
    query_vector=model.encode(search_term), 
    query_filter=Filter(
        must=[
            FieldCondition(
                key="location",
                geo_bounding_box=models.GeoBoundingBox(
                    bottom_right=models.GeoPoint(
                        lat=48.495862,
                        lon=13.455868,
                    ),
                    top_left=models.GeoPoint(
                        lat=52.520711,
                        lon=5.403683,
                    ),
                ),
            )
        ]
    ),
    limit=3
)

search_result

[ScoredPoint(id=92067, version=44762, score=0.3700924, payload={'Collection': 'project', 'ID': '226701', 'Project acronym': 'CARBO-EXTREME', 'Record Number': 92067, 'Teaser': 'The aim of this project is to achieve an improved knowledge of the terrestrial carbon cycle in response to climate variability and extremes, to represent and apply this knowledge over Europe with predictive terrestrial carbon cycle modelling, to interpret the model predictions...', 'Title': 'The terrestrial Carbon cycle under Climate Variability and Extremes – a Pan-European synthesis', 'location': {'lat': 48.98955310027602, 'lon': 10.164019268032614}}, vector=None),
 ScoredPoint(id=81072, version=4970, score=0.35429376, payload={'Collection': 'project', 'ID': '517912', 'Project acronym': 'MSEPOA', 'Record Number': 81072, 'Teaser': 'We propose to use X-ray data from two pioneering in their kind and complementary multi-wavelength surveys, in order to perform a systematic study of obscured AGN. These leverage subst
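`GeoBoundingBox` is defined by its `top_left` and `bottom_right` corners. A minimal pure-Python sketch of the containment test (assuming a box that does not cross the 180° meridian; the function name is hypothetical) confirms that the first hit above lies inside the box:

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """Check whether (lat, lon) lies inside a box given as (lat, lon) corner tuples.

    Assumes the box does not cross the 180° meridian.
    """
    top_lat, left_lon = top_left
    bottom_lat, right_lon = bottom_right
    return bottom_lat <= lat <= top_lat and left_lon <= lon <= right_lon


# corners from the query above and the location of the CARBO-EXTREME hit
print(in_bounding_box(48.98955310027602, 10.164019268032614,
                      top_left=(52.520711, 5.403683),
                      bottom_right=(48.495862, 13.455868)))  # True
```

Since the placeholder longitudes were drawn from 10 to 30, only the 10 to 13.46 slice of this box can contain any points, which also explains the low similarity scores of the hits: the filter leaves few candidates.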

In [36]:
search_term = "earth observation"# as a method for fighting climate change in urban areas"

search_result = client.search(
    collection_name="test_collection",
    query_vector=model.encode(search_term), 
    query_filter=Filter(
        must=[
            FieldCondition(
                key="location",
                geo_radius=models.GeoRadius(
                    center=models.GeoPoint(
                        lat=52.520711,
                        lon=13.403683,
                    ),
                    radius=10_000,  # in meters, i.e. 10 km around central Berlin
                ),
            )
        ]
    ),
    limit=3
)

search_result

[ScoredPoint(id=103492, version=8624, score=0.23580064, payload={'Collection': 'project', 'ID': '301230', 'Project acronym': 'NeBRiC', 'Record Number': 103492, 'Teaser': 'Cosmology has gone through an amazing revolution during the last decade owing to the large amount of new precise observational data. These data strongly indicate the existence of two periods of accelerated expansion in the history of the Universe. One in the primordial univer...', 'Title': 'Non-linear effects and backreaction in classical and quantum cosmology', 'location': {'lat': 52.565792386550584, 'lon': 13.485898163771736}}, vector=None),
 ScoredPoint(id=80900, version=33189, score=0.13301674, payload={'Collection': 'project', 'ID': '514222', 'Project acronym': 'CB_DIDACTIQUE', 'Record Number': 80900, 'Teaser': 'It is well known that Education is one of the key elements of every society. A worldwide scale look at of the economic, social and financial health of the different countries clearly shows an intimate lin
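`GeoRadius` takes its radius in meters. A haversine sketch (spherical Earth with a mean radius of 6,371 km, an approximation; Qdrant's own distance computation may differ slightly) shows that the top hit lies about 7.5 km from the query center, inside the 10 km radius:

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6_371_000):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# query center (Berlin) and the location of the NeBRiC hit above
d = haversine_m(52.520711, 13.403683, 52.565792386550584, 13.485898163771736)
print(round(d))  # roughly 7.5 km, well inside the 10 km radius
```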