# CrateDB Full-Text and Vector Search Workshop

CrateDB supports both full-text and vector similarity searches, and has a native datatype for storing vectors.  In this workshop, you'll explore these concepts using community area data from the City of Chicago.

## Install Dependencies

First, install the required dependencies by executing the `pip install` command below.

In [None]:
! pip install -U sqlalchemy-cratedb pandas

## Connect to CrateDB

Before going any further, you'll need to update the code below to include a connection string for your CrateDB cluster.  If you prefer, you can set the environment variable `CRATEDB_CONNECTION_STRING` instead.

The code below assumes that you're using a managed [CrateDB Cloud](https://console.cratedb.cloud/) cluster.  If you're running CrateDB locally (for example with [Docker](https://hub.docker.com/_/crate)), use the "localhost" code block instead.

In [29]:
import os
import sqlalchemy as sa

# Define database address when using CrateDB Cloud.
# You can find these settings on your cluster overview page.
CONNECTION_STRING = os.environ.get(
    "CRATEDB_CONNECTION_STRING",
    "crate://<USERNAME>:<PASSWORD>@<HOST>/?ssl=true",
)

# # Define database address when using CrateDB on localhost.
# CONNECTION_STRING = os.environ.get(
#    "CRATEDB_CONNECTION_STRING",
#    "crate://crate@localhost/",
# )

# Connect to CrateDB using SQLAlchemy.
# Set the DEBUG environment variable to any non-empty value to log SQL statements.
engine = sa.create_engine(CONNECTION_STRING, echo=bool(os.environ.get("DEBUG")))
connection = engine.connect()
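
If your password contains characters such as `@` or `/`, pasting it directly into the connection string can make the URL ambiguous. The sketch below assembles the string from its parts and percent-encodes the credentials, as SQLAlchemy expects for database URLs in general. The helper name `build_connection_string`, the example credentials, and the host are hypothetical, and we assume the `crate://` dialect follows the usual SQLAlchemy URL conventions.

```python
from urllib.parse import quote


def build_connection_string(username, password, host, ssl=True):
    """Assemble a crate:// URL, percent-encoding the credentials."""
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    user = quote(username, safe="")
    secret = quote(password, safe="")
    suffix = "/?ssl=true" if ssl else "/"
    return f"crate://{user}:{secret}@{host}{suffix}"


# Hypothetical values, for illustration only.
print(build_connection_string("admin", "p@ss/word", "example.invalid:4200"))
```

The result can be assigned to `CONNECTION_STRING` or exported as the `CRATEDB_CONNECTION_STRING` environment variable.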

## Create a Community Areas Table

First, you'll need to create a table to store the community areas data.  You may already have a `community_areas` table that was created by following other CrateDB workshops.  The code below drops any such table and replaces it with a new version.  This new version has an additional column, `description_vec`, in the `details` object.  You'll learn what this is for later in this workshop!

In [78]:
_ = connection.execute(sa.text(
"""
DROP TABLE IF EXISTS community_areas
"""
))

_ = connection.execute(sa.text(
"""
CREATE TABLE IF NOT EXISTS community_areas (
   areanumber INTEGER PRIMARY KEY,
   name TEXT,
   details OBJECT(DYNAMIC) AS (
       description TEXT INDEX USING fulltext WITH (analyzer='english'),
       description_vec FLOAT_VECTOR(2048),
       population BIGINT
   ),
   boundaries GEO_SHAPE INDEX USING geohash WITH (PRECISION='1m', DISTANCE_ERROR_PCT=0.025)
);
"""))

## Load the Data

Next, load the community areas data, which is stored as a JSON file on GitHub, using a `COPY FROM` statement.

In [None]:
def display_results(table_name, info):
    print(f"{table_name}: loaded {info['success_count']}, errors: {info['error_count']}")

    if info["error_count"] > 0:
        print(f"Errors: {info['errors']}")

# Load the community areas data file.
result = connection.execute(sa.text("""
    COPY community_areas 
    FROM 'https://github.com/crate/cratedb-datasets/raw/main/academy/chicago-data/chicago_community_areas_with_vectors.json' 
    RETURN SUMMARY;                                  
    """))

display_results("community_areas", result.mappings().first())

Once the data's loaded, verify that the output shows 0 errors.  Next, we'll run a `REFRESH` command to make sure that the data's up to date before querying it.  We'll also run `ANALYZE`, which collects statistics used by the query optimizer.

In [80]:
_ = connection.execute(sa.text("REFRESH TABLE community_areas"))
_ = connection.execute(sa.text("ANALYZE"))

## Familiarization with the Data

Before we try out some different ways to search the textual data in the `community_areas` table, let's first run a simple `SELECT` query to take a look at some of it.

In [81]:
import pandas as pd

pd.set_option("display.max_colwidth", None)

query = """
SELECT 
    name, details['description'] as desc_text, details['description_vec'] as desc_vec 
FROM community_areas WHERE areanumber = 51
"""
df = pd.read_sql(query, CONNECTION_STRING)
vals = df.to_dict(orient="records")

display(df)

Unnamed: 0,name,desc_text,desc_vec
0,SOUTH DEERING,"South Deering, located on Chicago's far South Side, is the largest of the 77 official community areas of that city. Primarily an industrial area, a small residential neighborhood exists in the northeast corner and Lake Calumet takes up a large portion of the area. 80% of the community area is zoned as industrial, natural wetlands, or parks. The remaining 20% is zoned for residential and small-scale commercial uses. It is part of the 10th Ward, once under the control of former Richard J. Daley ally Alderman Edward Vrdolyak. The neighborhood is named for Charles Deering, an executive in the Deering Harvester Company that would later form a major part of International Harvester. International Harvester owned Wisconsin Steel, which was originally established in 1875 and was located along Torrence Avenue south of 106th Street to 109th Street. It is the location of Calumet Fisheries, a historic seafood restaurant that opened in 1928 and has been featured on Anthony Bourdain: No Reservations. The original Calumet Bakery store, a South Side favorite since 1935, is located at 2510 E 106th St, Chicago, IL 60617. It was also the location of the Wisconsin Steel Works, originally the Joseph H. Brown Iron and Steel Company, which opened in 1875 and closed in 1980. 
Since the closing of the steel mill, the neighborhood has remained economically depressed.","[0.03875456, -0.008511306, -0.017262578, -0.03257543, -0.02393664, -0.0598416, 0.0027521136, -0.033445306, -0.009021234, 0.013333129, -0.026981214, 0.02165696, -0.0078289015, -0.0024690286, 0.024236599, -0.02699621, -0.030265752, 0.014502965, 0.008308833, -0.028675975, 0.016572675, -0.016977618, -0.025946358, 0.0040006884, 0.038934536, 0.026831234, -0.0020340895, -0.029455867, -0.04355389, -0.029440869, -0.00797888, 0.01945227, 0.038604584, -0.019857213, 0.009253701, 0.0011913953, -0.033205338, 0.019077323, 0.015732791, 0.026201323, 0.02080208, 0.04586356, -0.0030070778, 0.014870413, 0.0013901173, 0.010633508, -0.018942341, 0.010363545, 0.004874316, 0.0038469601, -0.0039856904, 0.011345908, -0.0065803262, 0.0021390747, 0.0126507245, -0.024431571, 0.052192673, 0.11812342, -0.05600214, 0.026561271, 0.0003538566, -0.0377647, -0.040944252, -0.0062578716, 0.052672606, 0.041874122, 0.0064378465, 0.05066289, 0.015987756, 0.00797888, -0.010940964, 0.009058729, -0.07240984, 0.03578498, 0.027011208, -0.026681256, -0.011533381, -0.004094425, -0.004746834, -0.023456708, 0.010813482, -0.030895663, -0.010611011, -0.009433676, 0.033205338, 0.03170555, -0.03233546, -0.05222267, -0.009096223, -0.00235092, 0.015522822, -0.041574165, 0.014113019, -0.03854459, 0.025046485, -0.02906592, -0.028106056, 0.0019150437, -0.027821096, 0.04211409, ...]"


Take a look at the values for `desc_text` and `desc_vec`.

* `desc_text` is a free-text description of the characteristics of the community area, sourced from Wikipedia.  We'll use this to explore CrateDB's full-text search capabilities.

* `desc_vec` is a `FLOAT_VECTOR` value containing the vector embedding created from the text in `desc_text` by passing it through OpenAI's `text-embedding-3-large` model.  These embeddings have been created for you, so you don't need to use the OpenAI API to work with the data in this notebook.  Each embedding has 2048 dimensions, matching the `FLOAT_VECTOR(2048)` declaration in the table schema.
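
To build some intuition for what it means to compare embeddings, here is a toy sketch that uses three-dimensional vectors in place of the real 2048-dimensional ones. The vectors and their labels are made up for illustration. Euclidean distance is used as one common way to measure closeness; this section does not state which measure CrateDB itself applies.

```python
import math

# Made-up 3-dimensional "embeddings"; real ones in this table have 2048 dimensions.
toy_embeddings = {
    "railway": [0.9, 0.1, 0.0],
    "railroad": [0.8, 0.2, 0.1],
    "beach": [0.0, 0.2, 0.9],
}

query_vec = toy_embeddings["railway"]

# Rank every other entry by its Euclidean distance to the query vector.
distances = {
    label: math.dist(query_vec, vec)
    for label, vec in toy_embeddings.items()
    if label != "railway"
}

for label, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{label}: {distance:.3f}")
```

The smaller the distance, the closer the two vectors are, so "railroad" ranks ahead of "beach" for this query.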

## Full-text Search

The first type of search we'll look at is full-text search.  Use it when you want to find documents that contain particular words or phrases.  Unlike an exact string comparison, a full-text search can account for typos or synonyms in the query, match on prefixes, and perform fuzzy matching.

CrateDB uses Apache Lucene for full-text search.  Search indexes can be built over any number of `TEXT` columns in a table, including those deeply nested inside `OBJECT` columns.  Composite indexes containing data from more than one `TEXT` column can also be created.

Consider our `community_areas` table schema:

```sql
CREATE TABLE IF NOT EXISTS community_areas (
   ...
   details OBJECT(DYNAMIC) AS (
       description TEXT INDEX USING fulltext WITH (analyzer='english'),
       ...
```

Here, `description` is declared as `TEXT` with the additional `INDEX USING fulltext` clause.  This tells CrateDB to create a full-text index for this field, and the `analyzer='english'` option tells it to expect the content to be in English.
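
The composite indexes mentioned above are declared as a named index over several columns. The sketch below shows what such a definition and a matching query could look like. The table `community_notes`, its columns, and the index name `title_body_ft` are hypothetical and are not part of the workshop dataset.

```python
# Hypothetical table, used only to illustrate a composite full-text index.
composite_ddl = """
CREATE TABLE IF NOT EXISTS community_notes (
   id INTEGER PRIMARY KEY,
   title TEXT,
   body TEXT,
   INDEX title_body_ft USING fulltext (title, body) WITH (analyzer = 'english')
)
"""

# A composite index is searched by its index name rather than a column name.
composite_query = """
SELECT id, _score
FROM community_notes
WHERE MATCH(title_body_ft, 'railroad')
ORDER BY _score DESC
"""

print(composite_ddl)
print(composite_query)
```

To try it out, pass each string to `connection.execute(sa.text(...))` in the same way as the table creation cell earlier in this workshop.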

### Introducing `MATCH`

The `MATCH` predicate is used to perform full-text searches.  Let's search for the term "railway" in our community area data:

In [82]:
query = """
SELECT name, _score, details['description'] as description
FROM community_areas 
WHERE match(details['description'], 'railway')
ORDER BY _score DESC;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,name,_score,description
0,AUSTIN,1.749486,"Austin is one of 77 community areas in Chicago. Located on the city's West Side, it is the third largest community area by population (behind the Near North Side and Lake View) and the second-largest geographically (behind South Deering). Austin's eastern boundary is the Belt Railway located just east of Cicero Avenue. Its northernmost border is the Milwaukee District / West Line. Its southernmost border is at Roosevelt Road from the Belt Railway west to Austin Boulevard. The northernmost portion, north of North Avenue, extends west to Harlem Avenue, abutting Elmwood Park. In addition to Elmwood Park, Austin also borders the suburbs of Cicero and Oak Park"
1,BURNSIDE,1.150856,"Burnside is one of the 77 community areas in Chicago. The 47th numbered area, it is located on the city's far south side. This area is also called 'The Triangle' by locals, as it is bordered by railroad tracks on every side; the Canadian National Railway on the west, the Union Pacific Railroad on the south and the Norfolk Southern Railway on the east. With a population of 2,254 in 2016, it is the least populous of the community areas, as well as the second smallest by area after Oakland."
2,ASHBURN,1.140486,"Ashburn, one of Chicago's 77 community areas, is located on the south side of the city. Greater Ashburn covers nearly five square miles. The approximate boundaries of Ashburn are 72nd Street (north), Western Avenue (east), 87th Street (south) and Cicero Avenue (west). Ashburn, which got its name as the dumping site for the city's ashes, was slow to experience growth at the beginning of the 20th century. In 1893, the 'Clarkdale' subdivision was planned near 83rd and Central Park Avenue along the new Chicago and Grand Trunk Railway, with only 19 homes built in the first 50 years. The early residents were Dutch, Swedish and Irish. Ashburn opened Ashburn Flying Field, the first airfield in Chicago, in 1916"
3,WEST ENGLEWOOD,0.899924,"West Englewood, one of the 77 community areas, is on the southwest side of Chicago, Illinois. At one time it was known as South Lynne. The boundaries of West Englewood are Garfield Blvd to the north, Racine Ave to the east, the CSX and Norfolk Southern RR tracks to the west, and the Belt Railway of Chicago to the south. Though it is a separate community area, much of the history and culture of the neighborhood is linked directly to the Englewood neighborhood."
4,GREATER GRAND CROSSING,0.759754,"Greater Grand Crossing is one of the 77 community areas of Chicago, Illinois. It is located on the city's South Side. The name 'Grand Crossing' comes from an 1853 right-of-way feud between the Lake Shore and Michigan Southern Railway and the Illinois Central Railroad that led to a frog war and a crash that killed 18 people. The crash was the result of Roswell B. Mason (later to serve as mayor of Chicago) illegally constructing railroad tracks, on behalf of the Illinois Central, across another railroad company's tracks. Due to the lack of safety at the crossing, trains made complete stops here and therefore industry developed around the area to cater to the railroad workers."
5,LOOP,0.439959,"The Loop, one of Chicago's 77 designated community areas, is the central business district of the city and is the main section of Downtown Chicago. Home to Chicago's commercial core, it is the second largest commercial business district in North America after Midtown Manhattan in New York City, and contains the headquarters and regional offices of several global and national businesses, retail establishments, restaurants, hotels, and theaters, as well as many of Chicago's most famous attractions. It is home to Chicago's City Hall, the seat of Cook County, and numerous offices of other levels of government and consulates of foreign nations. The intersection of State Street and Madison Street is the origin point for the address system on Chicago's street grid. Most of Grant Park's 319 acres (129 hectares) are in the eastern section of the community area. The Loop community area is bounded on the north and west by the Chicago River, on the east by Lake Michigan, and on the south by Roosevelt Road. In 1803, the United States Army built Fort Dearborn in what is now the Loop, the first settlement in the area sponsored by the United States' federal government. When Chicago and Cook County were incorporated in the 1830s the area was selected as the site of their respective seats. Originally mixed use, the character of the area became commercial starting in the 1870s, especially after it was mostly destroyed in the Great Chicago Fire of 1871. At that time some of the world's earliest skyscrapers were constructed in the area, starting a legacy of architecture that continues to this day. In the late 19th century, cable car turnarounds and a prominent elevated railway loop encircled the area, giving the Loop its name. Starting in the 1920s many highways were constructed in the Loop, most prominently U.S. Route 66, which opened in 1926 with its eastern terminus in the area. 
While dominated by offices and public buildings, its residential population boomed during the latter 20th century and first decades of the 21st; its population has increased the most of Chicago's community areas since 1950."


Queries that use `MATCH` can select a special column, `_score`.  It indicates the relative quality of each match: within a single query's results, a higher score means a better match.

### Experimenting with Full-text Search

The following query searches for the terms "railroad" OR "tracks":

In [83]:
query = """
SELECT name, _score, details['description'] AS description 
FROM community_areas 
WHERE MATCH(details['description'], 'railroad tracks') 
ORDER BY _score DESC
LIMIT 5;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,name,_score,description
0,GREATER GRAND CROSSING,2.148552,"Greater Grand Crossing is one of the 77 community areas of Chicago, Illinois. It is located on the city's South Side. The name 'Grand Crossing' comes from an 1853 right-of-way feud between the Lake Shore and Michigan Southern Railway and the Illinois Central Railroad that led to a frog war and a crash that killed 18 people. The crash was the result of Roswell B. Mason (later to serve as mayor of Chicago) illegally constructing railroad tracks, on behalf of the Illinois Central, across another railroad company's tracks. Due to the lack of safety at the crossing, trains made complete stops here and therefore industry developed around the area to cater to the railroad workers."
1,BURNSIDE,1.898286,"Burnside is one of the 77 community areas in Chicago. The 47th numbered area, it is located on the city's far south side. This area is also called 'The Triangle' by locals, as it is bordered by railroad tracks on every side; the Canadian National Railway on the west, the Union Pacific Railroad on the south and the Norfolk Southern Railway on the east. With a population of 2,254 in 2016, it is the least populous of the community areas, as well as the second smallest by area after Oakland."
2,GRAND BOULEVARD,1.835274,"Grand Boulevard on the South Side of Chicago, Illinois, is one of the city's Community Areas. The boulevard from which it takes its name is now Martin Luther King Jr. Drive. The area is bounded by 39th to the north, 51st Street to the south, Cottage Grove Avenue to the east, and the Chicago, Rock Island & Pacific Railroad tracks to the west."
3,WEST TOWN,1.770313,"West Town, northwest of the Loop on Chicago's West Side, is one of the city's officially designated community areas. Much of this area was historically part of Polish Downtown, along Western Avenue, which was then the city's western boundary. West Town was a collection of several distinct neighborhoods and the most populous community area until it was surpassed by Near West Side in the 1960s. The boundaries of the community area are the Chicago River to the east, the Union Pacific railroad tracks to the south, the former railroad tracks on Bloomingdale Avenue to the North, and an irregular western border to the west that includes the city park called Humboldt Park. Humboldt Park is also the name of the community area to West Town's west, Logan Square is to the north, Near North Side to the east, and Near West Side to the south. The collection of neighborhoods in West Town along with the neighborhoods of Bucktown and the eastern portion of Logan Square have been referred to by some media as the 'Near Northwest Side'."
4,IRVING PARK,1.453797,"Irving Park is one of 77 officially designated Chicago community areas, and is located on the Northwest Side. It is bounded by the Chicago River on the east, the Milwaukee Road railroad tracks on the west, Addison Street on the south and Montrose Avenue on the north, west of Pulaski Road stretching to encompass the region between Belmont Avenue on the south and, roughly, Leland Avenue on the north. It is named after the American author Washington Irving. Old Irving Park, bounded by Montrose Avenue, Pulaski Road, Addison Street, and Cicero Avenue, has a variety of housing stock with Queen Anne, Victorian, and Italianate homes, a few farmhouses, and numerous bungalows. The CTA Blue Line runs through this neighborhood with stops at Addison, Irving Park, and Montrose."


Take a moment to study where the terms "railroad" and "tracks" appear in the above matches.  What if we wanted to search for the specific phrase "railroad tracks"?  For that, we add `USING phrase`:

In [84]:
query = """
SELECT name, _score, details['description'] AS description 
FROM community_areas 
WHERE MATCH(details['description'], 'railroad tracks') USING phrase
ORDER BY _score DESC
LIMIT 5;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,name,_score,description
0,GRAND BOULEVARD,1.835274,"Grand Boulevard on the South Side of Chicago, Illinois, is one of the city's Community Areas. The boulevard from which it takes its name is now Martin Luther King Jr. Drive. The area is bounded by 39th to the north, 51st Street to the south, Cottage Grove Avenue to the east, and the Chicago, Rock Island & Pacific Railroad tracks to the west."
1,WEST TOWN,1.770313,"West Town, northwest of the Loop on Chicago's West Side, is one of the city's officially designated community areas. Much of this area was historically part of Polish Downtown, along Western Avenue, which was then the city's western boundary. West Town was a collection of several distinct neighborhoods and the most populous community area until it was surpassed by Near West Side in the 1960s. The boundaries of the community area are the Chicago River to the east, the Union Pacific railroad tracks to the south, the former railroad tracks on Bloomingdale Avenue to the North, and an irregular western border to the west that includes the city park called Humboldt Park. Humboldt Park is also the name of the community area to West Town's west, Logan Square is to the north, Near North Side to the east, and Near West Side to the south. The collection of neighborhoods in West Town along with the neighborhoods of Bucktown and the eastern portion of Logan Square have been referred to by some media as the 'Near Northwest Side'."
2,BURNSIDE,1.668631,"Burnside is one of the 77 community areas in Chicago. The 47th numbered area, it is located on the city's far south side. This area is also called 'The Triangle' by locals, as it is bordered by railroad tracks on every side; the Canadian National Railway on the west, the Union Pacific Railroad on the south and the Norfolk Southern Railway on the east. With a population of 2,254 in 2016, it is the least populous of the community areas, as well as the second smallest by area after Oakland."
3,IRVING PARK,1.453797,"Irving Park is one of 77 officially designated Chicago community areas, and is located on the Northwest Side. It is bounded by the Chicago River on the east, the Milwaukee Road railroad tracks on the west, Addison Street on the south and Montrose Avenue on the north, west of Pulaski Road stretching to encompass the region between Belmont Avenue on the south and, roughly, Leland Avenue on the north. It is named after the American author Washington Irving. Old Irving Park, bounded by Montrose Avenue, Pulaski Road, Addison Street, and Cicero Avenue, has a variety of housing stock with Queen Anne, Victorian, and Italianate homes, a few farmhouses, and numerous bungalows. The CTA Blue Line runs through this neighborhood with stops at Addison, Irving Park, and Montrose."
4,GREATER GRAND CROSSING,1.426055,"Greater Grand Crossing is one of the 77 community areas of Chicago, Illinois. It is located on the city's South Side. The name 'Grand Crossing' comes from an 1853 right-of-way feud between the Lake Shore and Michigan Southern Railway and the Illinois Central Railroad that led to a frog war and a crash that killed 18 people. The crash was the result of Roswell B. Mason (later to serve as mayor of Chicago) illegally constructing railroad tracks, on behalf of the Illinois Central, across another railroad company's tracks. Due to the lack of safety at the crossing, trains made complete stops here and therefore industry developed around the area to cater to the railroad workers."


Take a moment to look at the results here and see how they differ from those of the previous query, which searched for "railroad" or "tracks".

Let's search for communities whose description matches both "railroad" and "historic":

In [86]:
query = """
SELECT name, _score, details['description'] AS description 
FROM community_areas 
WHERE MATCH(details['description'], 'railroad historic') USING best_fields WITH (operator='and')
ORDER BY _score DESC
LIMIT 5;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,name,_score,description
0,PULLMAN,1.834025,"Pullman, one of Chicago's 77 defined community areas, is a neighborhood located on the city's South Side. Twelve miles from the Chicago Loop, Pullman is situated adjacent to Lake Calumet. The area known as Pullman encompasses a much wider area than its two historic areas (the older historic area is often referred to as 'Pullman' and is a Chicago Landmark district and a national historical park. The northern annex historic area is usually referred to as 'North Pullman'). The development built by the Pullman Company is bounded by 103rd Street on the North, 115th Street on the South, the railroad tracks on the East and Cottage Grove on the West. Since the late 20th century, the Pullman neighborhood has been gentrifying. Many residents are involved in the restoration of their own homes, and projects throughout the district as a whole. Walking tours of Pullman are available/ Pullman has many historic and architecturally significant buildings; among these are the Hotel Florence; the Arcade Building, which was destroyed in the 1920s; the Clock Tower and Factory, the complex surrounding Market Square, and Greenstone Church. In the adjacent Kensington neighborhood of the nearby Roseland district is the home of one of the many beautiful churches in Chicago built in Polish Cathedral style, the former church of St. Salomea. It is now used by Salem Baptist Church of Chicago. In a contest sponsored by the Illinois Department of Commerce and Economic Opportunity, Pullman was one of seven sites nominated for the Illinois Seven Wonders."
1,LOGAN SQUARE,1.563071,"Logan Square is an official community area, historical neighborhood, and public square on the northwest side of the City of Chicago. The Logan Square community area is one of the 77 city-designated community areas established for planning purposes. The Logan Square neighborhood, located within the Logan Square community area, is centered on the public square that serves as its namesake, located at the three-way intersection of Milwaukee Avenue, Logan Boulevard and Kedzie Boulevard. The community area of Logan Square is, in general, bounded by the Metra/Milwaukee District North Line railroad on the west, the North Branch of the Chicago River on the east, Diversey Parkway on the north, and the 606 (also known as the Bloomingdale Trail) on the south. The area is characterized by the prominent historical boulevards, stately greystones and large bungalow-style homes."
2,WEST TOWN,1.408882,"West Town, northwest of the Loop on Chicago's West Side, is one of the city's officially designated community areas. Much of this area was historically part of Polish Downtown, along Western Avenue, which was then the city's western boundary. West Town was a collection of several distinct neighborhoods and the most populous community area until it was surpassed by Near West Side in the 1960s. The boundaries of the community area are the Chicago River to the east, the Union Pacific railroad tracks to the south, the former railroad tracks on Bloomingdale Avenue to the North, and an irregular western border to the west that includes the city park called Humboldt Park. Humboldt Park is also the name of the community area to West Town's west, Logan Square is to the north, Near North Side to the east, and Near West Side to the south. The collection of neighborhoods in West Town along with the neighborhoods of Bucktown and the eastern portion of Logan Square have been referred to by some media as the 'Near Northwest Side'."
3,HYDE PARK,1.372627,"Hyde Park is a neighborhood on the South Side of Chicago, Illinois, located on and near the shore of Lake Michigan 7 miles (11 km) south of the Loop. It is one of the city’s 77 municipally recognized community areas. Hyde Park’s boundaries and subdivisions have several local definitions. The community area’s formal boundaries are 51st Street (signed locally as Hyde Park Boulevard) on the north, Midway Plaisance on the south, Washington Park on the west, and Lake Michigan on the east. Another local definition considers a section to the north between 47th Street[3] and Hyde Park Boulevard to be in Hyde Park, although this area is, according to municipal boundaries, the southern half of the Kenwood community area. As such, it is often called “South Kenwood.” Hyde Park and South Kenwood are also sometimes collectively termed “Hyde Park-Kenwood” (as in the name of the epoynmous Historic District, for example). Meanwhile, the portion of Hyde Park that lies between the Illinois Central Railroad tracks and the lake is usually referred to as “East Hyde Park” and is usually also taken to include “Indian Village,” the small southeastern corner of Kenwood. Hyde Park is home to a number of institutions of higher education: the University of Chicago, Catholic Theological Union, Lutheran School of Theology at Chicago, McCormick Theological Seminary, and Chicago Theological Seminary. The community area is also home to the Museum of Science and Industry, and two of Chicago's four historic sites listed in the original 1966 National Register of Historic Places (Chicago Pile-1, the world's first artificial nuclear reactor, and Robie House). In the early 21st century, Hyde Park received national attention for its association with U.S. President Barack Obama, who, before running for president, was a Senior Lecturer for twelve years at the University of Chicago Law School, an Illinois state senator representing the area, and U.S senator from Illinois. 
The Barack Obama Presidential Center which is currently under construction in Jackson Park is located nearby."
4,WASHINGTON HEIGHTS,1.225143,"Washington Heights is the 73rd of Chicago's 77 community areas. Located 12 miles (19 km) from the Loop, it is on the city's far south side. Washington Heights is considered part of the Blue Island Ridge, along with the nearby community areas of Beverly, Morgan Park and Mount Greenwood, and the village of Blue Island. It contains a neighborhood also known as Washington Heights, as well as the neighborhoods of Brainerd and Fernwood. As of 2017, Washington Heights had 27,453 inhabitants. Named for the heights which are now part of the adjacent Beverly, the area was settled in the late 19th century at the intersection of two railroad lines. It was incorporated as a village in 1874, and was annexed by Chicago in 1890. During most of the 20th century, Washington Heights was primarily inhabited by Irish, Germans and Swedes; after late-20th-century white flight, it has been mainly inhabited by African-Americans. The area largely retained its middle-class character during its racial transition, declining somewhat in recent years. Historically influenced by transit, Washington Heights includes the original site of the former Chicago Bridge & Iron Company. The Brainerd Bungalow Historic District and the Carter G. Woodson Regional Library, home of the largest collection of African-American history in the midwestern United States, are in the area."


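Again, take a moment to study the text in each matching result.  Notice that Logan Square matches on "historical" and West Town on "historically" rather than on the exact word "historic": the `english` analyzer reduces words to their stems, so these variants are all treated as the same term.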
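
The queries in this section differ only in their `USING` and `WITH` clauses. A small helper, sketched here under the assumption that search terms are trusted notebook input, makes that pattern explicit. The function name `build_match_query` and its parameters are hypothetical; they are not part of CrateDB or SQLAlchemy.

```python
def build_match_query(term, match_type=None, options=None, limit=5):
    """Build a MATCH query against details['description'] (illustrative helper)."""
    # Double any single quotes so the term stays inside its SQL string literal.
    escaped = term.replace("'", "''")
    predicate = f"MATCH(details['description'], '{escaped}')"

    # In the queries above, WITH only appears after a USING clause, so fall
    # back to best_fields (as those queries do) when only options are given.
    if options and not match_type:
        match_type = "best_fields"
    if match_type:
        predicate += f" USING {match_type}"
    if options:
        rendered = ", ".join(
            f"{key} = '{value}'" if isinstance(value, str) else f"{key} = {value}"
            for key, value in options.items()
        )
        predicate += f" WITH ({rendered})"

    return (
        "SELECT name, _score, details['description'] AS description\n"
        "FROM community_areas\n"
        f"WHERE {predicate}\n"
        "ORDER BY _score DESC\n"
        f"LIMIT {limit}"
    )


print(build_match_query("railroad tracks"))
print(build_match_query("railroad tracks", match_type="phrase"))
print(build_match_query("railroad historic", options={"operator": "and"}))
```

Each returned string can be passed to `pd.read_sql(query, CONNECTION_STRING)` in the same way as the cells above.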

### Combining Full-text Search with Other Criteria

Because full-text search in CrateDB is expressed in SQL, you can combine it with other criteria.  For example, let's search for community areas whose description matches the term "Univresity".

In [87]:
query = """
SELECT name, _score, details['population'] AS population, details['description'] AS description 
FROM community_areas 
WHERE MATCH(details['description'], 'Univresity')
ORDER BY _score DESC
LIMIT 5;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,name,_score,population,description


How many results do we get?  None... because there's a small typo in the search term.  Specifying a `fuzziness` factor helps compensate for this sort of error in user input.  Let's try again:

In [88]:
query = """
SELECT name, _score, details['population'] AS population, details['description'] AS description 
FROM community_areas 
WHERE MATCH(details['description'], 'Univresity') USING best_fields WITH (fuzziness = 2)
ORDER BY _score DESC
LIMIT 5;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,name,_score,population,description
0,ROGERS PARK,0.8227,55628,"Rogers Park is the first of Chicago's 77 community areas. Located 9 miles (14 km) from the Loop, it is on the city's far north side on the shore of Lake Michigan. The neighborhood is culturally diverse and features green spaces, early 20th century architecture, live theater, bars, restaurants, and beaches. It is bounded by the city of Evanston along Juneway Terrace and Howard Street to the north, Ridge Boulevard to the west, Devon Avenue and the Edgewater neighborhood to the south, and Lake Michigan to the east. The neighborhood just to the west, West Ridge, was part of Rogers Park until the 1890s and is still sometimes referred to as West Rogers Park. In the early 1900s, what is now Loyola University Chicago became established at the south eastern end of the community area along the lake. In 2022, Rogers Park was ranked as a top 5 neighborhood to live in the United States."
1,RIVERDALE,0.748409,7262,"Riverdale is one of the 77 official community areas of Chicago, Illinois and is located on the city's far south side. As originally designated by the Social Science Research Committee at the University of Chicago and officially adopted by the City of Chicago, the Riverdale community area extends from 115th Street south to the city boundary at 138th Street and from the Illinois Central Railroad tracks east to the Bishop Ford Freeway."
2,WOODLAWN,0.610295,24425,"Woodlawn, on the South Side of Chicago, Illinois, is one of Chicago's 77 community areas. It is bounded by Lake Michigan to the east, 60th Street to the north, Martin Luther King Drive to the west, and 67th Street to the south. Both Hyde Park Career Academy and the all-boys Catholic Mount Carmel High School are in this neighborhood; much of its eastern portion is occupied by Jackson Park. The Woodlawn section of the park includes the site of the planned Obama Presidential Center, an estimated $500 million investment. The northern edge of Woodlawn contains a portion of the campus of the University of Chicago."
3,NEAR WEST SIDE,0.609237,67881,"The Near West Side, one of the 77 community areas of Chicago, is on the West Side, west of the Chicago River and adjacent to the Loop. The Great Chicago Fire of 1871 started on the Near West Side. Waves of immigration shaped the history of the Near West Side of Chicago, including the founding of Hull House, a prominent settlement house. In the 19th century railroads became prominent features. In the mid-20th century, the area saw the development of freeways centered in the Jane Byrne Interchange. The area is home to the University of Illinois at Chicago (UIC), Chicago-Kent College of Law, and City Colleges' Malcolm X College. The United Center arena, the Illinois Medical District, Union Station, Ogilvie Station, and the Jane Byrne Interchange are also located in the community area."
4,SOUTH SHORE,0.596766,53971,"South Shore is one of 77 defined community areas of Chicago, Illinois, United States. Located on the city's South Side, the area is named for its location along the city's southern lakefront. Although South Shore has seen a greater than 40% decrease in residents since Chicago's population peaked in the 1950s, the area remains one of the most densely populated neighborhoods on the South Side. The community benefits from its location along the waterfront, its accessibility to Lake Shore Drive, and its proximity to major institutions and attractions such as the University of Chicago, the Museum of Science and Industry, and Jackson Park."
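
To see why a `fuzziness` of 2 is enough to recover from this typo, it can help to measure how far apart the two spellings are.  The sketch below is a plain-Python illustration, not CrateDB's implementation: it assumes that `fuzziness` acts as a maximum edit distance, and `levenshtein` is a helper name invented for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the plain Levenshtein edit distance between two strings."""
    # previous[j] is the distance between the prefix of `a` processed so far
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


# The misspelled search term versus the intended word.
distance = levenshtein("Univresity", "University")
print(distance)
```

The swapped "r" and "e" count as two single-character substitutions here, so the distance is 2.  Some search engines treat an adjacent swap as a single edit, but either way the typo falls within a fuzziness of 2.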


By adding a second clause, we can limit the results to those areas with a population of at least 30,000 people:

In [89]:
query = """
SELECT name, _score, details['population'] AS population, details['description'] AS description 
FROM community_areas 
WHERE MATCH(details['description'], 'Univresity') USING best_fields WITH (fuzziness = 2)
AND details['population'] >= 30000
ORDER BY _score DESC;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,name,_score,population,description
0,ROGERS PARK,1.8227,55628,"Rogers Park is the first of Chicago's 77 community areas. Located 9 miles (14 km) from the Loop, it is on the city's far north side on the shore of Lake Michigan. The neighborhood is culturally diverse and features green spaces, early 20th century architecture, live theater, bars, restaurants, and beaches. It is bounded by the city of Evanston along Juneway Terrace and Howard Street to the north, Ridge Boulevard to the west, Devon Avenue and the Edgewater neighborhood to the south, and Lake Michigan to the east. The neighborhood just to the west, West Ridge, was part of Rogers Park until the 1890s and is still sometimes referred to as West Rogers Park. In the early 1900s, what is now Loyola University Chicago became established at the south eastern end of the community area along the lake. In 2022, Rogers Park was ranked as a top 5 neighborhood to live in the United States."
1,NEAR WEST SIDE,1.609237,67881,"The Near West Side, one of the 77 community areas of Chicago, is on the West Side, west of the Chicago River and adjacent to the Loop. The Great Chicago Fire of 1871 started on the Near West Side. Waves of immigration shaped the history of the Near West Side of Chicago, including the founding of Hull House, a prominent settlement house. In the 19th century railroads became prominent features. In the mid-20th century, the area saw the development of freeways centered in the Jane Byrne Interchange. The area is home to the University of Illinois at Chicago (UIC), Chicago-Kent College of Law, and City Colleges' Malcolm X College. The United Center arena, the Illinois Medical District, Union Station, Ogilvie Station, and the Jane Byrne Interchange are also located in the community area."
2,SOUTH SHORE,1.596766,53971,"South Shore is one of 77 defined community areas of Chicago, Illinois, United States. Located on the city's South Side, the area is named for its location along the city's southern lakefront. Although South Shore has seen a greater than 40% decrease in residents since Chicago's population peaked in the 1950s, the area remains one of the most densely populated neighborhoods on the South Side. The community benefits from its location along the waterfront, its accessibility to Lake Shore Drive, and its proximity to major institutions and attractions such as the University of Chicago, the Museum of Science and Industry, and Jackson Park."


Here's an example of a negative search... we'll look for smaller communities with a population of 10,000 or fewer and whose descriptions don't mention railroads:

In [128]:
query = """
SELECT name, _score, details['population'] AS population, details['description'] AS description 
FROM community_areas 
WHERE NOT MATCH(details['description'], 'railroad')
AND details['population'] <= 10000
ORDER BY _score DESC;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,name,_score,population,description
0,OAKLAND,2.0,6799,"Oakland, located on the South Side of Chicago, Illinois, USA, is one of 77 officially designated Chicago community areas. Bordered by 35th and 43rd Streets, Cottage Grove Avenue and Lake Shore Drive, The Oakland area was constructed between 1872 and 1905. Some of Chicago's great old homes may be seen on Drexel Boulevard. The late 19th-century Monument Baptist Church on Oakwood Blvd. is modeled after Boston's Trinity Church. Oakwood/41st Street Beach in Burnham Park is at 4100 S. Lake Shore Drive. With an area of only 0.6 sq mi Oakland is the smallest community area by area in Chicago."
1,FULLER PARK,2.0,2567,"Fuller Park is the 37th of Chicago's 77 community areas. Located on the city's South Side, it is 5 miles (8.0 km) from the Loop. It is named for a small park also known as Fuller Park within the neighborhood, which is in turn named for Melville Weston Fuller, a Chicago attorney who was the Chief Justice of the United States between 1888 and 1910."


## Vector Similarity Search

CrateDB is also a vector database.  You can store, retrieve and search data represented as vector embeddings.  These embeddings capture the semantic meaning of the original data.  As CrateDB is a multi-model database, these embeddings can be seamlessly integrated into your existing datasets, making CrateDB a powerful foundation for AI and machine learning systems.

Vectors are numerical representations of data, used to quantify their features and characteristics.  Complex data such as text, images and audio are transformed into a format that allows them to be compared with each other mathematically. 
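
As a rough illustration of what "compared mathematically" can mean, the sketch below computes two common measures, Euclidean distance and cosine similarity, for a few toy vectors.  The three-dimensional vectors and their names are invented for this example; real embeddings have hundreds or thousands of dimensions, and which measure a database uses internally is not something this sketch tries to show.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1 means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings", made up for illustration only.
park = [0.9, 0.1, 0.0]
garden = [0.8, 0.2, 0.1]
airport = [0.0, 0.2, 0.9]

print("park vs garden: ", euclidean_distance(park, garden), cosine_similarity(park, garden))
print("park vs airport:", euclidean_distance(park, airport), cosine_similarity(park, airport))
```

With these toy values, `park` and `garden` are closer together (smaller distance, higher cosine similarity) than `park` and `airport`, which is the kind of relationship a good embedding model aims to produce for related text.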

In CrateDB, vectors are natively supported with the `FLOAT VECTOR` data type, a one-dimensional array-like structure.  For our Chicago community areas data, we've already created vector representations of each area's description text for you, using OpenAI's `text-embedding-3-large` model.  For more information about embeddings with this model, check out the [OpenAI documentation](https://platform.openai.com/docs/guides/embeddings).

Let's begin by running a simple `SELECT` query that returns the vector data for a couple of community areas:

In [94]:
query = """
SELECT name, details['description_vec'] AS description_vec
FROM community_areas WHERE areanumber IN (75, 76)
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,name,description_vec
0,MORGAN PARK,"[0.01755214, -0.0051074913, -0.011452965, 0.0012298813, -0.013429849, 0.03489687, -0.010889067, -0.023865206, -0.019652173, 0.037178386, -0.001395972, 0.011096478, -0.026017092, 0.046356313, 0.01947069, -0.033730183, -0.04163772, 0.03979695, 0.019781806, -0.017487323, 0.03728209, -0.0369969, -0.0652307, 0.035622805, -0.0013821985, 0.026535619, -0.0021486464, -0.023528162, -0.013533555, -0.025485603, 0.002033598, -0.009203854, 0.061912127, 0.024668923, -0.002649349, -0.032822758, -0.033082023, -0.017616956, -0.022892967, 0.02763749, 0.053408284, 0.028518986, -0.0010629804, -0.0020854508, 0.023178156, -0.007434381, -0.016566938, -0.021829987, -0.00049422105, -0.0060019502, 0.011174257, 0.015296547, 0.022789262, -0.034482047, -0.0036070035, 0.008114948, 0.01836882, 0.09597935, -0.029711599, 0.0052922163, 0.00889922, -0.020287368, -0.05193048, -0.01665768, 0.009339968, 0.0575565, 0.017837329, 0.02323001, -0.011290926, -0.0049616555, 0.01106407, 0.014868762, -0.027404152, -0.0056713894, 0.036530226, -0.020118847, -0.013935413, 0.03360055, 0.0039570094, -0.047160033, 0.022633703, -0.0672011, -0.051048983, 0.024772627, 0.060771365, 0.015361363, 0.029867155, -0.06429735, -0.013313181, 0.015011357, 0.010325169, -0.040445108, -0.036556154, 0.0064232536, 0.013261328, -0.059267636, 0.02408558, 0.008834404, 0.0077973497, 0.025291154, ...]"
1,OHARE,"[-0.011676581, -0.00701739, -0.014594137, -0.011009166, -0.038417667, -0.017416349, -0.0024837365, -0.02482783, -0.046680897, -0.0021468508, -0.06569904, -0.013755101, -0.0004584504, -0.0031622748, 0.014161906, -0.05174054, -0.055274658, -0.018496925, -0.00076752703, -0.0124901915, 0.020899618, 0.0110663725, -0.029544229, 0.016246783, -0.008186955, 0.04019744, -0.016106945, 0.004547955, -0.030688368, -0.04998619, 0.009693406, 0.011988041, 0.03300207, -0.022564976, 0.02697627, -0.019755477, -0.03676502, -0.010602361, -0.0061433944, 0.017924855, 0.0175689, 0.03145113, -0.0077610807, 0.00407123, 0.03785831, 0.012966916, 0.045765586, -0.0033974592, -0.035392053, 0.015344184, -0.016310345, -0.022170885, -0.005777905, -0.020454675, 0.019132558, 0.04884205, 0.06529224, 0.05118118, -0.041316155, -0.019615639, 0.02806956, -0.0037756609, -0.0067249984, -0.0022199487, 0.027433926, 0.042002637, 0.0010082731, 0.010951959, -0.0023311845, 0.020213135, -0.023734542, 0.0088035185, -0.03773118, 0.03262069, 0.0122804325, -0.015166206, 0.011040947, 0.044468895, -0.0032925797, -0.048282694, 0.03686672, -0.018446073, -0.01604338, 0.011155361, 0.012763513, 0.021141158, -0.0071254475, -0.05118118, 0.006152929, -0.007907276, -0.029544229, -0.014911953, -0.02773903, 0.021192009, -0.025043946, -0.051867664, 0.004223782, 0.011479534, 0.014594137, 0.06107163, ...]"


Now we can explore vector similarity using a k-nearest neighbor (kNN) search.  CrateDB enables us to do this in SQL, using the `knn_match` function.  This finds the k data points that are most similar to a given query data point.

`knn_match` takes 3 parameters:

* **A search vector**: the name of a column in the database containing vectors to search.

* **A query vector**: the vector whose nearest neighbors we want to find in the search vector column.  This will be our query text, in vector form.

* **k**: the number of most similar vectors to find for the given query vector.
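
To make the three parameters concrete, here is a minimal brute-force sketch of what a kNN search computes.  It is an illustration under stated assumptions rather than a description of how `knn_match` works internally: it scans every row and ranks by Euclidean distance, and the `knn` helper, the area names and the toy vectors are all hypothetical.

```python
import heapq
import math


def knn(rows, query_vector, k):
    """Return the k (name, distance) pairs whose vectors are closest to query_vector."""
    # Score every stored vector against the query vector...
    scored = ((name, math.dist(vector, query_vector)) for name, vector in rows)
    # ...then keep only the k smallest distances, nearest first.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Stand-in for the "search vector" column: one toy vector per row.
rows = [
    ("AREA A", [0.9, 0.1, 0.0]),
    ("AREA B", [0.1, 0.8, 0.1]),
    ("AREA C", [0.6, 0.3, 0.1]),
    ("AREA D", [0.0, 0.1, 0.9]),
]

# Stand-in for the "query vector": the query text in vector form.
query_vector = [0.8, 0.2, 0.0]

for name, distance in knn(rows, query_vector, k=2):
    print(name, round(distance, 3))
```

A full scan like this becomes slow as the table grows, which is why vector databases typically rely on an index to find the nearest neighbors instead.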

Below, we've created a vector representation of the text "Which neighborhoods are associated with European immigrants?", again using the OpenAI `text-embedding-3-large` model.  Using this as the query vector, we can find out which community area descriptions contain information about this topic.

In [118]:
immigrants_query = "[0.029187094420194626, 0.0059433975256979465, -0.0307051669806242, -0.0024525464978069067, -0.04534167796373367, -0.05244510993361473, -0.015452832914888859, -0.056541044265031815, 0.01020402554422617, 0.003014663001522422, 0.0018224031664431095, -0.013977725058794022, -0.02854262851178646, -0.020035693421959877, 0.027282342314720154, -0.03150716796517372, -0.029530808329582214, -0.021066837012767792, 0.00476545887067914, -0.008012845180928707, -0.012760402634739876, 0.011800866574048996, -0.04989589378237724, -0.026179591193795204, 0.04978132247924805, 0.007783702574670315, -0.00924448948353529, 0.0061045135371387005, -0.04287838935852051, 0.02184019610285759, 0.011679134331643581, 0.005105593241751194, 0.03147852420806885, -0.043136175721883774, -0.028943629935383797, -0.02682405710220337, 0.02131030149757862, 0.035803597420454025, 0.010075132362544537, 0.004901512525975704, -0.02006433717906475, 0.0023952608462423086, 0.033598098903894424, 0.0062047638930380344, -0.004393101669847965, -0.008148899301886559, -0.037579458206892014, 0.01200136635452509, 0.002278899075463414, -0.003866788698360324, 0.009452150203287601, -0.003224114188924432, -0.03806638717651367, -0.03823824226856232, 0.0004569434386212379, 0.018975907936692238, -0.004829905461519957, 0.05588225647807121, 0.02729666419327259, 0.06049375981092453, 0.0027139128651469946, -0.0017078316304832697, -0.03299659490585327, -0.03560309857130051, 0.012989545240998268, 0.041073888540267944, 0.0023755687288939953, 0.02016458660364151, -0.036662884056568146, -0.03755081444978714, -0.004854967817664146, 0.006269210018217564, 0.012180384248495102, 0.016269154846668243, 0.01465799380093813, -0.013318939134478569, 0.0015368694439530373, -0.03376995399594307, -0.008148899301886559, 0.017644014209508896, -0.0021517963614314795, -0.006902934052050114, -0.03162173926830292, 0.05023961141705513, -0.018317122012376785, 0.026695163920521736, -0.01848897896707058, -0.04703160747885704, 
0.02921573631465435, 0.011571723036468029, 0.03692067041993141, -0.01063366886228323, -0.045427605509757996, 0.0035946813877671957, 0.009108435362577438, -0.022541945800185204, -0.020508302375674248, -0.02035076543688774, -0.0004392653936520219, 0.03299659490585327, 0.01489429734647274, 0.007941238582134247, 0.009172881953418255, -0.034657884389162064, 0.017128441482782364, 0.003576779505237937, 0.008642989210784435, 0.03356945514678955, -0.03749352693557739, 0.045427605509757996, 0.06330076605081558, 0.012889295816421509, 0.015438511967658997, -0.006426746025681496, -0.0057070935145020485, -0.015553083270788193, 0.006226245779544115, -0.010934418998658657, 0.028943629935383797, 0.01336190290749073, 0.018646514043211937, -0.05485111474990845, 0.04680246487259865, -0.025076840072870255, -0.0050697894766926765, -0.006838487461209297, -0.03293931111693382, 0.020608551800251007, 0.0066093443892896175, 0.04697431996464729, 0.019434193149209023, -0.029129808768630028, -0.02658059261739254, 0.0011457151267677546, 0.03084838017821312, 0.030160952359437943, 0.03955581784248352, 0.03299659490585327, -0.000906726170796901, 0.008392363786697388, -0.0288863442838192, -0.03924074396491051, -0.018904300406575203, -0.030676523223519325, -0.0053096734918653965, -0.06490476429462433, 0.0018617871683090925, 0.028528308495879173, 0.0008324336959049106, -0.040214601904153824, -0.01387031376361847, 0.0016398048028349876, 0.06043647602200508, -0.004908673465251923, 0.026709483936429024, -0.0016809789231047034, -0.004124574363231659, 0.0012781884288415313, -0.012015688233077526, -0.01475108228623867, -0.011865312233567238, -0.007511595264077187, 0.0040064225904643536, 0.02231280319392681, 0.036290526390075684, 0.030676523223519325, -0.02108115889132023, 0.010970222763717175, 0.0016138472128659487, 0.035030242055654526, -0.04342260584235191, -0.009831667877733707, -0.013419188559055328, 0.018847014755010605, -0.006845647934824228, -0.00812025647610426, -0.041360318660736084, 
-0.0039276545867323875, 0.008900774642825127, -0.028327807784080505, -0.01589679718017578, -0.007998524233698845, 0.014192546717822552, 0.018460335209965706, -0.008020006120204926, -0.012273473665118217, 0.026551948860287666, -0.013490796089172363, 0.05677018687129021, -0.06519119441509247, 0.025749947875738144, 0.047776322811841965, -0.01926233619451523, -0.02322937548160553, -0.016369406133890152, -0.010067972354590893, 0.053132541477680206, 0.009137078188359737, 0.003788020694628358, 0.03989953175187111, -0.002103461418300867, -0.016455333679914474, -0.03775131329894066, -0.0459718219935894, 0.016211869195103645, 0.009323257021605968, -0.004149637185037136, 0.0642746239900589, -0.019419871270656586, -0.026466019451618195, 0.043794963508844376, -0.04330803453922272, 0.03425688296556473, 0.021396230906248093, -0.0510416105389595, 0.020608551800251007, 0.00135606131516397, 0.031736310571432114, 0.03113481029868126, 0.057400330901145935, 0.05614004284143448, 0.03995681554079056, -0.0011734629515558481, -0.013225849717855453, -0.01844601333141327, 0.03820960223674774, -0.02690998464822769, 0.04551353678107262, -0.008907935582101345, -0.058173686265945435, 0.012058652006089687, -0.006147477775812149, -0.037951815873384476, 0.030533310025930405, 0.062097761780023575, 0.003338685492053628, -0.016842013224959373, -0.05734304338693619, -0.011199365369975567, 0.02970266528427601, -0.010791204869747162, 0.015596047975122929, -0.03425688296556473, -0.03250966966152191, 0.020909301936626434, -0.024733126163482666, -0.016498297452926636, 0.027568770572543144, -0.004067288711667061, 0.02247033827006817, 0.047776322811841965, -0.03162173926830292, 0.05573904141783714, -0.02543487586081028, 0.008098773658275604, 0.06536304950714111, -0.10563493520021439, -0.009860310703516006, 0.06352990865707397, -0.01830280013382435, -0.015366904437541962, -0.02461855486035347, 0.013068313710391521, -0.0037414759863168, -0.003462207969278097, 0.015309618785977364, -0.0518149696290493, 
0.02970266528427601, -0.02859991416335106, -0.018016371876001358, -0.02198340930044651, -0.01800204999744892, -0.032824739813804626, 0.029230058193206787, 0.002006791764870286, 0.020809052512049675, 0.005273870192468166, -0.015481475740671158, -0.04027188941836357, -0.03170766681432724, 0.025162769481539726, -0.01853194274008274, 0.006569960620254278, 0.01422118954360485, 0.020866338163614273, 0.015280975960195065, -0.03354081138968468, -0.0011117017129436135, 0.07281019538640976, -0.0474039651453495, -0.0064804512076079845, -0.016455333679914474, 0.011170722544193268, 0.035230740904808044, -0.04098796099424362, 0.005402762908488512, 0.05101296678185463, -0.015724940225481987, -0.06335804611444473, -0.009015345945954323, -0.001956666586920619, -0.014607868157327175, 0.00271212263032794, 0.024346446618437767, -0.008313595317304134, -0.015295297838747501, -0.03480109944939613, 0.006988862529397011, -0.03368402644991875, -0.030304165557026863, -0.024847697466611862, 0.004289271309971809, -0.004654468037188053, 0.03861059993505478, 0.0518149696290493, 0.012531260028481483, 0.01824551448225975, -0.018417371436953545, 0.037808600813150406, -0.03568902611732483, 0.008020006120204926, -0.004353717435151339, -0.031822238117456436, 0.00972425751388073, 0.01695658452808857, -0.03021823801100254, 0.003054047003388405, 0.006455388851463795, -0.007776541635394096, -0.019649015739560127, 0.006462549790740013, -0.012817688286304474, 0.03777995705604553, 0.018045013770461082, 0.10271336138248444, 0.009008185938000679, 0.05436418578028679, 0.03640509769320488, -0.024174589663743973, -0.025277340784668922, 0.02749716304242611, -0.03620459884405136, -0.03531666845083237, -0.0415608175098896, -0.011664812453091145, 0.024647196754813194, 0.0015458203852176666, 0.005263129249215126, 0.006544897798448801, -0.017658334225416183, 0.0007474001031368971, 0.008986703120172024, -0.026179591193795204, 0.04774767905473709, 0.02672380581498146, -0.010211186483502388, -0.067253477871418, 
-0.008979542180895805, 0.07974177598953247, 0.0013668023748323321, -0.03978496044874191, 0.028270522132515907, -0.03382724151015282, 0.022498982027173042, -0.01748647727072239, 0.010146739892661572, -0.03646238520741463, -0.002121363300830126, -0.019505800679326057, -0.013855992816388607, -0.015495797619223595, -0.015438511967658997, -0.01646965555846691, 0.004876450169831514, -0.016641512513160706, 0.005248807370662689, 0.008743238635361195, -0.0140564925968647, 0.053676754236221313, -0.03560309857130051, -0.03480109944939613, 0.025735625997185707, -0.041217103600502014, -0.01556740514934063, -0.014307118020951748, -0.005520915146917105, -0.005929076112806797, -0.0004097274213563651, -0.0010517307091504335, -0.010540579445660114, -0.005463629029691219, 0.016068655997514725, -0.00658786203712225, -0.00769777363166213, -0.01283201016485691, 0.007991363294422626, -0.0244753398001194, 0.022885659709572792, -0.010089454241096973, 0.03199409693479538, -0.03818095847964287, -0.035373955965042114, 0.013591046445071697, 0.04697431996464729, 0.008900774642825127, -0.05250239744782448, 0.04287838935852051, 0.000841384578961879, -0.01539554726332426, -0.016870655119419098, -0.04809139296412468, -0.012015688233077526, 0.012395205907523632, -0.03162173926830292, 0.02801273576915264, 0.0248620193451643, -0.016641512513160706, 0.04539896175265312, 0.029187094420194626, -0.000517362030223012, 0.021109802648425102, -0.020751764997839928, -0.013318939134478569, -0.01987815834581852, -0.010741079226136208, 0.025835877284407616, 0.0730966255068779, -0.01844601333141327, -0.007647648919373751, -0.008156060241162777, 0.0024024215526878834, -0.03480109944939613, -0.00304688629694283, 0.020393729209899902, -0.03313980996608734, -0.01518072560429573, -0.008234827779233456, 0.040071386843919754, -0.009301775135099888, -0.022183910012245178, -0.008241988718509674, 0.009817346930503845, 0.044138677418231964, -0.008822007104754448, -0.017801549285650253, 0.01662719063460827, 
-0.007604684215039015, 0.012803367339074612, -0.01266731321811676, -0.03150716796517372, 0.0014410949079319835, -0.01891862228512764, -0.04611503705382347, -0.04153217375278473, 0.028485342860221863, -0.05542397126555443, 0.004389521200209856, 0.004131735302507877, 0.05576768517494202, -0.0015860993880778551, -0.03875381499528885, -0.014536261558532715, 0.0364910289645195, -0.007790863048285246, -0.04113117605447769, -0.02365901879966259, -0.037522170692682266, -0.008234827779233456, -0.007934077642858028, -0.041016604751348495, 0.009165721014142036, 0.014128100126981735, -0.027182092890143394, -0.025792913511395454, -0.03841010108590126, -0.025348948314785957, -5.474034696817398e-05, -0.02294294536113739, 0.017457835376262665, 0.022140946239233017, -0.03823824226856232, 0.0149659039452672, 0.004493351560086012, -0.0212243739515543, 0.027840878814458847, 0.03614731505513191, 0.015051833353936672, 0.002139264950528741, -0.023716304451227188, 0.0686856210231781, 0.013297456316649914, 0.022255517542362213, 0.006845647934824228, -0.02553512714803219, -0.03408502787351608, 0.0012996706645935774, -0.008370880968868732, -0.02690998464822769, -0.010196864604949951, 0.028943629935383797, 0.016784727573394775, -0.0074328272603452206, 0.007619005627930164, -0.026838377118110657, 0.04746124893426895, 0.004686690866947174, 0.02002137340605259, 0.031249381601810455, 0.010268472135066986, -0.005037566181272268, 0.037235744297504425, -0.03233781084418297, 0.04104524478316307, -0.0012056862469762564, -0.006365879904478788, 0.004489771090447903, 0.006748978514224291, 0.0314212366938591, -0.0014876394998282194, -0.014837011694908142, 0.007654809392988682, -0.018317122012376785, -0.015223690308630466, -0.012251991778612137, 0.027382591739296913, 0.03156445175409317, 0.0029018817003816366, 0.03233781084418297, 0.04726075008511543, -0.017744263634085655, 0.003383440198376775, -0.01815958507359028, 0.01111343689262867, -0.00833507813513279, 0.017085477709770203, 0.02188315987586975, 
0.013118438422679901, 0.013462153263390064, -0.02950216457247734, 0.0015941552119329572, -0.030676523223519325, -0.04227688908576965, -0.0189615860581398, 0.03319709748029709, -0.006143897771835327, 0.006931576877832413, 0.008428167551755905, -0.024847697466611862, 0.020092979073524475, -0.022584909573197365, -0.0007653019274584949, -0.005621165037155151, -0.04462560638785362, -0.014178224839270115, 0.012674474157392979, -0.01581086963415146, 0.004987441468983889, 0.00931609608232975, 0.014593547210097313, -0.005843147169798613, 0.0379231721162796, -0.0021052516531199217, -0.010819847695529461, -0.03525938466191292, -0.012094455771148205, -0.015624690800905228, -0.022527623921632767, 0.00016525598766747862, 0.0031077524181455374, -0.005474370438605547, 0.036032743752002716, -0.023501481860876083, -0.02327233925461769, -0.02562105469405651, 0.028370771557092667, 0.021396230906248093, -0.01125665195286274, -0.054478757083415985, -5.51878911210224e-05, 4.660062040784396e-05, 0.032738812267780304, 0.048692893236875534, -0.01248113438487053, 0.05487975478172302, -0.004457548260688782, 0.00564980786293745, -0.004332235548645258, 0.0036090028006583452, -0.003535605501383543, 0.0009470051736570895, 0.026709483936429024, 0.0366056002676487, 0.009251650422811508, -0.011886795051395893, 0.024632876738905907, 0.03861059993505478, 0.001097380300052464, 0.043508533388376236, -0.028327807784080505, 0.010963061824440956, 0.011893955990672112, 0.012173223309218884, -0.0039025922305881977, 0.01973494328558445, 0.0288290586322546, -0.012223348952829838, 0.0048513878136873245, -0.014328599907457829, -0.014350082725286484, -0.024160267785191536, 0.015681976452469826, 0.008728917688131332, 0.006129576358944178, 0.01304683182388544, 0.014736761339008808, 0.03147852420806885, 0.0025742787402123213, -0.001742740161716938, -0.000804685871116817, 0.013454992324113846, 0.010375882498919964, -0.012495456263422966, 0.04181860387325287, 0.034027740359306335, 0.017615370452404022, 
0.018975907936692238, 0.0284996647387743, 0.0453130342066288, -0.01671312004327774, -0.022398730739951134, -0.009366221725940704, -0.002150006126612425, -0.00744714867323637, -0.02020755037665367, -0.03525938466191292, 0.04027188941836357, -0.007611845154315233, 0.022355766966938972, -0.025277340784668922, -0.02576426975429058, -0.026308484375476837, 0.01838872767984867, -0.025177091360092163, -0.05485111474990845, 0.006444647908210754, -0.0011251281248405576, -0.061524905264377594, -0.005395602434873581, 0.01451477874070406, -0.012767563574016094, -0.030103666707873344, 0.011020347476005554, -0.01510911900550127, -0.01379154622554779, -0.023630375042557716, -0.0035033822059631348, -0.007540238089859486, -0.06599318981170654, 0.014342921786010265, -0.020422372967004776, -0.013999206945300102, 0.036376457661390305, 0.028098665177822113, -0.030304165557026863, -0.01858922839164734, 0.008070130832493305, 0.010590704157948494, -0.020536944270133972, -0.01424983236938715, 0.0306478813290596, -9.644594683777541e-05, -0.024790411815047264, -0.015639012679457664, -0.039154816418886185, 0.04038646072149277, 0.008879292756319046, -0.013347581960260868, -0.014980225823819637, 0.043279390782117844, -0.03313980996608734, 0.0018456755205988884, 0.015295297838747501, 0.00837804190814495, 0.011091955006122589, 0.00993907917290926, 0.0033977616112679243, -0.0026297743897885084, 0.0012683424865826964, 0.026551948860287666, 0.008829167112708092, -0.018460335209965706, -0.004439646378159523, 0.012688795104622841, 0.015452832914888859, -0.018804050981998444, -0.03548852726817131, 0.023630375042557716, -0.005932656116783619, -0.02131030149757862, -0.003899011993780732, -0.0071177552454173565, 0.03253830969333649, 0.025406233966350555, 0.031306665390729904, -0.0010982754174619913, 0.01691362075507641, 0.010075132362544537, -0.011908276937901974, -0.03614731505513191, 0.014385886490345001, 0.006186862010508776, 0.0489506796002388, 0.016441011801362038, 0.005162878893315792, 
-0.007389863021671772, -0.017300298437476158, -0.010526258498430252, 0.008292113430798054, 7.899168849689886e-05, 0.01513776183128357, 0.013168564066290855, 0.0541350431740284, -0.027797915041446686, -0.013863153755664825, 0.0017436352791264653, -0.02509116195142269, -0.008005685172975063, -0.030877023935317993, -0.001954876584932208, 0.009717096574604511, -0.027984092012047768, -0.03760810196399689, 0.026036377996206284, 0.005746477749198675, -0.0014124519657343626, -0.024031376466155052, -0.023601733148097992, -0.007525916676968336, 0.02711048536002636, 0.002524153795093298, 0.030017737299203873, -0.015653332695364952, 0.04230553284287453, -0.015223690308630466, -0.006512674503028393, 0.025778591632843018, -0.03654831275343895, 0.027167771011590958, -0.020852016285061836, -0.009810185991227627, -0.005026825238019228, 0.037092529237270355, -0.007647648919373751, -0.050268251448869705, 0.010934418998658657, 0.014908618293702602, 0.03339759632945061, -0.00945931114256382, -0.002448966260999441, -0.044997964054346085, 0.031392596662044525, 0.0007232326897792518, 0.0001092009770218283, 0.018646514043211937, 0.001387389493174851, -0.00977438222616911, 0.028098665177822113, -0.010010686703026295, -0.024017054587602615, -0.007049728650599718, -0.02165401726961136, -0.03924074396491051, 0.02159673161804676, -0.02428916096687317, 0.018360085785388947, -0.025134125724434853, 0.015238011255860329, -0.026981592178344727, 0.01177938375622034, -0.0009125442011281848, 0.00019210868049412966, -0.0012629718985408545, 0.012660152278840542, -0.03079109452664852, -0.03491567075252533, -0.018274156376719475, 0.007962720468640327, -0.007239487487822771, -0.031163452193140984, -0.02394544705748558, -0.025348948314785957, -0.0046723694540560246, -0.026508985087275505, -0.00886497087776661, -0.0207804087549448, -0.022785410284996033, 0.035517171025276184, 0.0033637480810284615, -0.008514096029102802, -0.002221613423898816, 0.0481773242354393, -0.013490796089172363, -0.007099853828549385, 
-0.03614731505513191, 0.00633365660905838, -0.015982726588845253, -0.00991759728640318, 0.03394181281328201, 0.020522622391581535, -0.014450332149863243, -0.006100933067500591, -0.001264762133359909, 0.043078891932964325, 0.0015288136200979352, -0.027038877829909325, -0.020379409193992615, -0.008220505900681019, 0.018087977543473244, 0.02950216457247734, -0.006394522730261087, -0.0358322411775589, 0.001455416320823133, -0.015982726588845253, 0.011220848187804222, -0.005599682684987783, 0.029903165996074677, -0.0055961026810109615, -0.003462207969278097, -0.004493351560086012, -0.024790411815047264, 0.014013528823852539, 0.0018036062829196453, -0.015266655012965202, 0.034371454268693924, 0.007597523741424084, 0.01451477874070406, -0.0007693298393860459, 0.011313937604427338, 0.0002855336933862418, 0.010275633074343204, 0.034027740359306335, 0.007783702574670315, 0.01734326407313347, -0.02030780166387558, 0.007375541143119335, 0.00453989626839757, -0.0270675215870142, -0.007031826768070459, -0.0054886918514966965, -0.0357176698744297, 0.02721073478460312, -0.027382591739296913, -0.01264583133161068, -0.004210503306239843, 0.037092529237270355, 0.006827746517956257, 0.03692067041993141, -0.009631168097257614, -0.026895662769675255, 0.010311436839401722, 0.03328302502632141, 0.010927258059382439, 0.0357176698744297, 0.0009416346438229084, -0.013061152771115303, 0.006852808874100447, 0.030332809314131737, 0.03697795793414116, 0.012423848733305931, -0.0153525834903121, 0.021582409739494324, -0.0068993535824120045, 0.00746147008612752, 0.018789729103446007, -0.02523437701165676, 0.021238693967461586, -0.023000231012701988, -0.04032917320728302, 0.007311095017939806, -0.003809502813965082, 0.017830193042755127, -0.008736077696084976, -0.023315303027629852, -0.0022144524846225977, 0.0003298406663816422, -0.01398488599807024, 0.025019554421305656, 0.040501032024621964, 0.027239378541707993, -0.037264384329319, 0.01513776183128357, -0.008514096029102802, 0.010325757786631584, 
0.0366056002676487, -0.039928171783685684, 0.014385886490345001, -0.01329029630869627, 0.01691362075507641, 0.0035911009181290865, -0.03007502295076847, -0.008614345453679562, 0.014837011694908142, -0.013834510929882526, 0.031679023057222366, 0.015782225877046585, 0.03732167184352875, 0.01149295549839735, 0.014908618293702602, 0.0012316438369452953, 0.03614731505513191, -0.01462935097515583, 0.0010624717688187957, -0.004178280010819435, -0.0415608175098896, 0.03147852420806885, 0.0008512305794283748, -0.005216584540903568, -0.008020006120204926, 0.011091955006122589, -0.034743811935186386, -0.004457548260688782, 0.0307051669806242, -0.0031399757135659456, 0.03122073784470558, -0.007203684188425541, -0.012459652498364449, 0.014765404164791107, 0.003673449158668518, 0.05364811420440674, 0.014908618293702602, -0.0024668679106980562, 0.006258469074964523, -0.02643737755715847, -0.03365538269281387, -0.007081951946020126, -0.002200131071731448, 0.04270653426647186, -0.016412369906902313, -0.01848897896707058, 0.0050626290030777454, -0.024418054148554802, 0.02231280319392681, -0.024632876738905907, 0.016842013224959373, -0.005166459362953901, 0.024890661239624023, -0.010540579445660114, -0.011915437877178192, 0.043365318328142166, 0.015252333134412766, -0.03614731505513191, -0.014600707218050957, -0.046200964599847794, 0.03027552366256714, 0.006287111900746822, 0.034027740359306335, -0.050411466509103775, 0.003913333173841238, -0.0029985513538122177, -0.0069566392339766026, 0.06192590296268463, -0.011399866081774235, -0.011722098104655743, 0.009065471589565277, 0.0031184933613985777, -0.0027336047496646643, -0.014865654520690441, -0.008843488991260529, 0.016412369906902313, 0.0036465965677052736, -0.031879525631666183, 0.03654831275343895, 0.01339054573327303, 0.03368402644991875, -0.013705617748200893, 0.0004392653936520219, 0.017157085239887238, -0.0033476364333182573, 0.016168905422091484, 0.02202637493610382, -0.03165038302540779, -0.006419585086405277, 
0.04823460802435875, 0.059749044477939606, 0.0393553152680397, 0.02609366364777088, 0.029043879359960556, -0.023544447496533394, -0.027898164466023445, 0.012438170611858368, 0.0027425556909292936, -0.00881484616547823, -0.03233781084418297, 0.020078659057617188, 0.0540204718708992, 0.006491192616522312, -0.03643374145030975, 0.01997840777039528, -0.031679023057222366, 0.006892192643135786, -2.2615007765125483e-05, 0.0386965312063694, -0.006663049571216106, 0.010855651460587978, -0.02342987433075905, 0.027711985632777214, -0.0018850595224648714, -0.0034085025545209646, -0.06530576199293137, -0.0386965312063694, -0.003224114188924432, -0.026308484375476837, 0.006208343897014856, -0.028041379526257515, -0.018761085346341133, 0.00455063721165061, 0.0032080025412142277, 0.013877474702894688, 0.008213345892727375, -0.00340134184807539, 0.023931125178933144, 0.014185385778546333, -0.01585383340716362, 0.011514437384903431, 0.011134919710457325, 0.019362585619091988, -0.0153525834903121, 0.001956666586920619, -0.008227666839957237, -0.006534156855195761, 0.029874522238969803, 0.02318640984594822, -0.00969561468809843, 0.0001409766700817272, 0.010776882991194725, 0.02648034133017063, -0.02921573631465435, 0.009115596301853657, -0.03932667151093483, 0.008299274370074272, -0.0061832815408706665, -0.0033906009048223495, -0.002529524266719818, -0.027797915041446686, 0.0021517963614314795, -0.01661287061870098, 0.007554559502750635, 0.01742919161915779, -0.010970222763717175, -0.011292454786598682, 0.05923347547650337, -0.0012710277223959565, 0.014665153808891773, 0.005503013264387846, 0.00817038118839264, 0.023601733148097992, 0.03308252617716789, -0.0045363157987594604, 0.0018492558738216758, 0.00931609608232975, 0.015295297838747501, -0.024203233420848846, 0.03545988351106644, 0.004479030147194862, 0.01795908436179161, 0.0005213898839429021, 0.007511595264077187, -0.010146739892661572, 0.018073657527565956, 0.0004213635984342545, 0.01781587116420269, 0.012946581467986107, 
-0.010282794013619423, 0.028428057208657265, -0.0007362114847637713, -0.014342921786010265, 0.010755401104688644, 0.009043988771736622, -0.026609234511852264, -0.009781543165445328, 0.018546264618635178, 0.016584226861596107, 0.017887478694319725, 0.05307525396347046, 0.011514437384903431, -0.01261718850582838, 0.009566721506416798, -0.016970906406641006, 0.004350137431174517, 0.005406343378126621, -0.016698798164725304, 0.038581959903240204, 0.004697432275861502, -0.027955450117588043, 0.005023244768381119, 0.015925440937280655, -0.017987728118896484, 0.08547034859657288, 0.02600773423910141, 0.02643737755715847, 0.01424983236938715, -0.007762220222502947, 0.0004448597028385848, -0.02839941531419754, 0.009180042892694473, -0.005173619836568832, -0.012094455771148205, -0.019992729648947716, -0.023744946345686913, -0.02543487586081028, -0.012574223801493645, -0.017787227407097816, 0.007669130805879831, 0.008327917195856571, -0.004607923328876495, 0.036233242601156235, 0.0011358691845089197, -0.01175790186971426, 0.00921584665775299, 0.018574906513094902, 0.0033512169029563665, 0.0034998017363250256, 0.0004211398190818727, 0.003020033473148942, -0.00471533415839076, -0.019362585619091988, 0.012201866135001183, 0.003673449158668518, -0.006723915692418814, 0.04273517429828644, -0.01704251393675804, 0.022456016391515732, -0.0008324336959049106, 0.0266235563904047, -0.03236645460128784, -0.0020676578860729933, 0.011442829854786396, -0.02572130598127842, -0.0028517567552626133, -0.014722439460456371, -0.010791204869747162, -0.025993412360548973, 0.0037271545734256506, -0.02735394984483719, 0.04359446093440056, -0.009652649983763695, -0.02237008884549141, 0.03354081138968468, -0.01180802658200264, -0.04362310469150543, 0.0008771881693974137, -0.004905092995613813, 0.015280975960195065, -0.004092351533472538, 0.004987441468983889, 0.0070067644119262695, 0.04038646072149277, -0.01492294017225504, -0.007898273877799511, -0.019806550815701485, -0.007171460893005133, 
-0.023386910557746887, -0.03646238520741463, -0.008206184953451157, -0.02188315987586975, -0.0062334067188203335, 0.00474039651453495, 0.019276658073067665, 0.003780859988182783, 0.027568770572543144, -0.00523448595777154, -0.0020945104770362377, -0.015724940225481987, -0.0027103323955088854, 0.003820243990048766, -0.013440671376883984, 0.011836669407784939, 0.015624690800905228, 0.0025205733254551888, 0.04196181893348694, -0.02088065817952156, 0.0008686848450452089, -0.008478292264044285, 0.015596047975122929, 0.047346677631139755, -0.025377590209245682, -0.034887026995420456, 0.010017846710979939, 0.007490112911909819, -0.006011424120515585, -0.0149659039452672, -0.007454309146851301, 0.004905092995613813, -0.024031376466155052, -0.006942317821085453, -0.011929758824408054, 0.009581043384969234, -0.012968063354492188, 0.0007223376305773854, 0.0284423790872097, -0.00953091774135828, -0.004156797658652067, -0.009394864551723003, -0.00771209504455328, 0.007840988226234913, 0.022112302482128143, -0.012731759808957577, 0.029874522238969803, -0.03597545623779297, 0.0033995516132563353, 0.02676677145063877, 0.0009120967006310821, 0.020952265709638596, 0.020508302375674248, -0.03749352693557739, 0.03769402951002121, -0.01111343689262867, -0.007719255983829498, 0.03932667151093483, -0.0021446356549859047, 0.014042171649634838, -0.016226191073656082, -0.022012053057551384, 0.010705276392400265, -0.004353717435151339, -0.004572119563817978, -0.04210503399372101, 0.02633712813258171, -0.013855992816388607, -0.004010003060102463, -0.017686977982521057, -0.00029157556127756834, 0.004776200279593468, 0.002035434590652585, -0.016111619770526886, 0.03176495432853699, -0.013204366900026798, 0.05356218293309212, 0.022398730739951134, 0.021496480330824852, 0.028385093435645103, 0.011793705634772778, 0.0024937207344919443, 0.009187203831970692, 0.002796261105686426, 0.028084343299269676, -0.012065812945365906, 0.006677370984107256, -0.03147852420806885, 0.021238693967461586, 
-0.00015473867824766785, -0.019906800240278244, 0.02964537963271141, -0.04548489302396774, 0.0474039651453495, 0.031593095511198044, 0.01624051295220852, 0.0008825586992315948, 0.0005665919743478298, 0.0015037511475384235, 0.04688839241862297, -0.01877540722489357, 0.02322937548160553, -0.0007438197499141097, 0.09440693259239197, 0.013268813490867615, 0.007862470112740993, -0.007289612665772438, -0.048836108297109604, -0.02845670096576214, 0.023157767951488495, 0.03892567381262779, -0.006484031677246094, -0.01656990498304367, 0.025205733254551888, -0.01061218697577715, -0.0031077524181455374, 0.020007051527500153, 0.012652992270886898, -0.021854516118764877, 0.02347283996641636, -0.010526258498430252, 0.037808600813150406, -0.02279973216354847, -0.02188315987586975, 0.020393729209899902, -0.03236645460128784, 0.02194044552743435, 0.00746147008612752, -0.0055674598552286625, 0.028872022405266762, 0.00953091774135828, -0.024489661678671837, -0.003848886815831065, 0.0035427662078291178, 0.02418891154229641, 0.001933394349180162, -0.018904300406575203, -0.013999206945300102, 0.029444878920912743, 0.008528416976332664, -0.008893613703548908, -0.023773590102791786, -0.004342976491898298, 0.023544447496533394, 0.03093430958688259, 0.006627246271818876, 0.02375926822423935, 0.003114913124591112, 0.009373381733894348, 0.03291066735982895, 0.026322806254029274, -0.033311668783426285, -0.02255626767873764, 0.03199409693479538, 0.0085713816806674, 0.01983519457280636, 0.03036145120859146, 0.025305984541773796, -0.0005352637963369489, 0.03586088493466377, -0.013168564066290855, -0.007919755764305592, -0.009287453256547451, 0.03772267326712608, -0.013304617255926132, 0.006852808874100447, 0.015624690800905228, -0.008449649438261986, 0.0054886918514966965, 0.018517620861530304, -0.005692772101610899, -0.0030057120602577925, -0.04325074702501297, -0.042649246752262115, -0.002269948134198785, 0.011621847748756409, -0.022012053057551384, -0.0072466484270989895, 
-0.0024077920243144035, 0.011979884468019009, 0.0055674598552286625, -2.998551462951582e-05, 0.013762903399765491, -0.014264154247939587, -0.01101318746805191, 0.011464312672615051, 0.029330307617783546, -0.008449649438261986, 0.025706984102725983, -0.02572130598127842, 0.0036197437439113855, -0.027769271284341812, 0.03789452835917473, -0.02533462643623352, -0.00993907917290926, -0.013569563627243042, 0.009588203392922878, -0.00885780993849039, -0.0016165324486792088, 0.011120597831904888, 0.003349426668137312, 0.027797915041446686, -0.0022520464845001698, -0.012960902415215969, 0.007611845154315233, 0.008342238143086433, 0.005008923355489969, 0.008349399082362652, -0.037665385752916336, -0.011027508415281773, 0.0029484264086931944, -0.019505800679326057, 0.003762958338484168, -0.008227666839957237, 0.005871789995580912, 0.00014310251572169363, 9.829206828726456e-05, -0.043995462357997894, -0.029903165996074677, -0.001034724060446024, 0.03520209714770317, 0.0016478606266900897, 0.003213373012840748, -0.0061546387150883675, -0.007293193135410547, 0.00587895093485713, -0.003809502813965082, 0.02312912419438362, 0.03835281357169151, 0.033025238662958145, 0.013104117475450039, -0.024217553436756134, -0.0031471364200115204, -0.019964085891842842, -0.011571723036468029, 0.01518072560429573, 0.023114804178476334, 0.013583885505795479, 0.004421744495630264, -0.020279157906770706, -0.016054334118962288, -0.011149240657687187, -0.006125995889306068, 0.006276370957493782, 0.004421744495630264, -0.017371905967593193, -0.0015028560301288962, 0.007468630559742451, 0.009652649983763695, 0.008370880968868732, -0.025993412360548973, -0.01556740514934063, -0.008915096521377563, 0.007719255983829498, 0.006942317821085453, 0.00797704141587019, -0.032309167087078094, 0.0022592071909457445, -0.018617872148752213, 0.013877474702894688, -0.008098773658275604, -0.0011296035954728723, 0.010103775188326836, -0.0034264044370502234, 0.01762969233095646, 0.023000231012701988, 
0.014185385778546333, -0.00385604752227664, -0.0003383440198376775, -0.009967721998691559, 0.03185088187456131, -0.0038059225771576166, -0.030046381056308746, 0.028986593708395958, -0.012789045460522175, -0.0167560838162899, 0.015939762815833092, 0.002273528603836894, -0.023386910557746887, -0.017443513497710228, 0.00039339205250144005, 0.02173994481563568, -0.01982087269425392, 0.00787679199129343, 0.0014956953236833215, 0.03717845678329468, 0.02380223199725151, -0.004922994878143072, -0.008478292264044285, -0.02112412266433239, -0.0032563372515141964, 0.0015306038549169898, 0.02801273576915264, 0.005180780775845051, 0.0035570876207202673, 0.012817688286304474, 0.003293931018561125, 0.00814173836261034, -0.0193482656031847, 0.01266731321811676, -0.009416346438229084, 0.01154308021068573, 0.01195124164223671, 0.0008055809885263443, 0.008628667332231998, 0.0016290637431666255, -0.0037844404578208923, 0.024117304012179375, -0.010647990740835667, -0.007633327506482601, 0.013920439407229424, 0.018145263195037842, 0.0013372644316405058, -0.00794839859008789, 0.018173906952142715, -0.007200103718787432, 0.019434193149209023, -0.027282342314720154, 0.008220505900681019, -0.015008868649601936, -0.013827349990606308, -0.002991390647366643, 0.02059422992169857, 0.03013230860233307, 0.008850649930536747, 0.014128100126981735, 0.021682659164071083, -0.017601048573851585, -0.005800182931125164, -0.004618664272129536, -0.001057996298186481, 0.019434193149209023, -0.008800524286925793, 0.011442829854786396, -0.035746313631534576, -0.0019602470565587282, -0.022570589557290077, 0.0011224427726119757, 0.017400549724698067, 0.006555638741701841, -0.034657884389162064, 0.0010472552385181189, -0.010103775188326836, 0.004564959090203047, -0.01309695653617382, 0.038152314722537994, 0.009538078680634499, -0.03391316905617714, 0.01195124164223671, 0.013634010218083858, 0.0005759904161095619, 0.035517171025276184, -0.020279157906770706, 0.023501481860876083, -0.002962747821584344, 
-0.028428057208657265, 0.026752449572086334, -0.00087539799278602, -0.008077291771769524, -0.028943629935383797, 0.01795908436179161, -0.015553083270788193, -0.007397023495286703, 0.004149637185037136, -0.016942262649536133, 0.009466471150517464, -0.03262424096465111, 0.0073254164308309555, -0.010282794013619423, -0.033454883843660355, 0.017758585512638092, -0.0015458203852176666, -0.009538078680634499, 0.001262076897546649, 0.04551353678107262, 0.03643374145030975, -0.029903165996074677, -0.006365879904478788, 0.01599704846739769, 0.011091955006122589, -0.0040279049426317215, 0.019992729648947716, 0.01815958507359028, 0.024117304012179375, 0.015366904437541962, -0.029144128784537315, 0.011507276445627213, 0.00767629174515605, -0.002275318605825305, -0.01767265610396862, -0.010433169081807137, -0.00010590033343760297, -0.008048648945987225, 0.004156797658652067, 0.006416005082428455, 0.02179723046720028, 0.02126733772456646, -0.007286032196134329, -0.016870655119419098, 0.010690954513847828, -0.005635486450046301, 0.02950216457247734, 0.004289271309971809, 0.007726416457444429, -0.0226565171033144, 0.004829905461519957, 0.010039329528808594, 0.009187203831970692, -0.035173457115888596, 0.00046857958659529686, 0.02327233925461769, -0.004099512007087469, 0.011399866081774235, 0.016555584967136383, 0.021969087421894073, -0.00905114971101284, -0.009709935635328293, 0.006917255464941263, -0.005316834431141615, 0.003582149976864457, 0.006398103199899197, 0.0011322888312861323, -0.006412424612790346, 0.009151400066912174, 0.006663049571216106, -0.031392596662044525, -0.009717096574604511, -0.013412028551101685, 0.00945931114256382, -0.0040064225904643536, -0.0076619703322649, -0.0063408175483345985, 0.0007563510444015265, 0.004063708707690239, 0.0010356190614402294, -1.7719983588904142e-05, -0.002586809918284416, -0.0023218633141368628, 0.007919755764305592, 0.024360768496990204, -0.0026745288632810116, 0.0024256939068436623, 0.02414594776928425, 0.017873156815767288, 
-0.011936919763684273, -0.0033852302003651857, -0.025506483390927315, -0.004393101669847965, 0.007099853828549385, -0.011442829854786396, -0.0008615240803919733, -0.007475791499018669, -0.004163958597928286, -0.020365087315440178, -0.011686294339597225, 0.008399524725973606, 0.001149295479990542, 0.024074340239167213, 0.004063708707690239, -0.015023190528154373, -0.005066209472715855, 0.0018438852857798338, 0.00023182830773293972, -0.0007666445453651249, -0.002468658145517111, 0.012516938149929047, 0.017271656543016434, 0.009795865043997765, 0.013777225278317928, -0.0037378957495093346, 0.011156401596963406, 0.02902955748140812, -0.008564220741391182, 0.0037414759863168, -0.010920098051428795, 0.0048800306394696236, -0.0026870600413531065, -0.00017185727483592927, -0.01656990498304367, -0.00238272943533957, 0.007840988226234913, -0.01800204999744892, 0.011979884468019009, 0.0005357113550417125, 0.02198340930044651, -0.005646227393299341, -0.01248113438487053, 0.006197602953761816, -0.002508042147383094, 0.009831667877733707, -0.0003052256943192333, -0.012968063354492188, 0.0013104117242619395, -0.008227666839957237, -0.00996056105941534, 0.015051833353936672, -0.03291066735982895, 0.007540238089859486, -0.02892930805683136, -0.018646514043211937, 0.0013202576665207744, 0.013540920801460743, 0.0008767406106926501, 0.013855992816388607, 0.0030092925298959017, -0.010519097559154034, 0.02347283996641636, 0.0029824397061020136, 0.01566765457391739, 0.020150264725089073, 0.02917277254164219, -0.0003074634005315602, 0.02126733772456646, 0.018403049558401108, 0.019376907497644424, -0.010741079226136208, -0.024847697466611862, -0.00042785299592651427, 0.0015610369155183434, 0.00261724297888577, -0.006774040870368481, 0.034657884389162064, -0.0018143473425880075, 0.006337237078696489, -0.008270631544291973, -0.0009487953502684832, -0.016398048028349876, 0.00431075319647789, -0.00020956294611096382, -0.02562105469405651, 0.003336895490065217, -0.0004967749118804932, 
-0.008628667332231998, -9.795640653464943e-05, -0.008886453695595264, 0.04181860387325287, -0.015939762815833092, 0.0004918519407510757, -0.008478292264044285, 0.008678792044520378, 0.008743238635361195, 0.023300983011722565, -0.015724940225481987, 0.0030486765317618847, -0.02566402032971382, -0.022584909573197365, 0.019434193149209023, 0.024217553436756134, 0.009122757241129875, -7.742528396192938e-05, -0.002269948134198785, 0.004371619317680597, 0.006194022484123707, -0.016498297452926636, 0.00034953263821080327, -0.013082634657621384, 0.0005880740936845541, 0.0013050411362200975, 0.0036340653896331787, 0.005130655597895384, 0.008227666839957237, -0.010755401104688644, -0.03918346017599106, 0.025950448587536812, 0.010597865097224712, 0.024303482845425606, 0.0018778988160192966, 0.015080476179718971, 0.013161403127014637, 0.007762220222502947, -0.009946240112185478, 0.0036644984502345324, 0.0011134919477626681, -0.010146739892661572, 0.008700274862349033, 0.008055809885263443, 0.00885780993849039, -0.014722439460456371, -0.021281659603118896, -0.010261311195790768, -0.01748647727072239, 0.025706984102725983, -0.012273473665118217, -0.022828374058008194, 0.02327233925461769, -0.008449649438261986, -0.0043250746093690395, -0.01834576390683651, -0.010010686703026295, -0.0211670882999897, -0.010289954021573067, 0.0270675215870142, 0.017013870179653168, -0.008972382172942162, -0.023601733148097992, -0.010726758278906345, -0.02216958813369274, -0.0028392253443598747, 0.02202637493610382, 0.01781587116420269, -0.016025690361857414, 0.008249148726463318, 0.013719938695430756, -0.00830643530935049, 0.011206526309251785, 0.024203233420848846, 0.03763674199581146, -0.004679530393332243, -0.005313253961503506, -0.0018564165802672505, -0.010397365316748619, -0.00408161012455821, 0.011528759263455868, 0.03339759632945061, -0.005821665283292532, 0.007250228896737099, -0.016441011801362038, -0.014436011202633381, -0.017071155831217766, -0.010834168642759323, -0.01997840777039528, 
-0.012438170611858368, -0.023601733148097992, 0.010712436400353909, 0.008299274370074272, -0.010289954021573067, -0.0021070418879389763, -0.00523448595777154, 0.01695658452808857, 0.003508752677589655, -0.0023505063727498055, -0.02648034133017063, -0.0031184933613985777, -0.007346898317337036, 0.03746488690376282, 0.009094114415347576, 0.003530234796926379, -0.008478292264044285, 0.02194044552743435, -0.013383385725319386, 0.001800025929696858, -0.011879634112119675, -0.008950899355113506, -0.02255626767873764, 0.013154242187738419, -4.564958726405166e-05, -0.013211527839303017, -0.007561719976365566, -0.008020006120204926, 0.026738127693533897, 0.012309277430176735, -0.020107300952076912, -0.006945898290723562, 0.0035463464446365833, 0.009487953968346119, 0.0016523360973224044, 0.013583885505795479, -0.0040565477684140205, 0.00988895446062088, 0.003838145872578025, 0.013877474702894688, -0.009015345945954323, 0.005685611627995968, -0.004751137457787991, 0.0122376699000597, -0.010762562043964863, -0.000837356667034328, -0.00746147008612752, 0.015066154301166534, -0.011979884468019009, -3.255889896536246e-05, -0.018574906513094902, 0.030017737299203873, 0.030819738283753395, -0.011056151241064072, -0.002150006126612425, 0.019090479239821434, -0.006176120601594448, -0.03826688602566719, 0.02748284302651882, 0.01873244345188141, 0.017586728557944298, -0.009423507377505302, -0.01013241894543171, 0.025835877284407616, 0.03554581478238106, -0.010189704596996307, -0.007339737843722105, -0.0037378957495093346, -0.017157085239887238, -0.0015261283842846751, 0.008005685172975063, -0.004382360726594925, 0.0017311039846390486, -0.00019848620286211371, 0.0015001707943156362, 0.011321097612380981, 0.0023236535489559174, 0.005875370465219021, -0.04124574735760689, -0.00588969187811017, 0.004897932521998882, -2.0559098629746586e-05, -0.019290979951620102, 0.00019479395996313542, 0.004747556988149881, 0.010010686703026295, 0.026709483936429024, -0.029731309041380882, 
0.010411686263978481, 0.01791612058877945, 0.0011179674183949828, -0.03013230860233307, -0.0011296035954728723, 0.010125258006155491, -0.0009434248204343021, 0.002644095802679658, 0.0025384752079844475, -0.01983519457280636, -0.014328599907457829, -0.0016550213331356645, -0.006544897798448801, -0.010254151187837124, -0.006831326521933079, -0.011407027021050453, -0.006637987215071917, -0.003899011993780732, 0.024990912526845932, 0.005503013264387846, -0.012595705687999725, -0.019290979951620102, 0.01022550743073225, 0.0026816895697265863, -0.005592522211372852, -0.010590704157948494, -0.00630859425291419, -0.002576068975031376, 0.01815958507359028, 0.012359402142465115, 0.008499774150550365, 0.0051700398325920105, 0.00744714867323637, 0.01180802658200264, 0.01781587116420269, -0.008105934597551823, 0.015982726588845253, -0.0009568511741235852, 0.02059422992169857, -0.0070855324156582355, 0.014980225823819637, -0.03967038914561272, 0.005520915146917105, 0.02318640984594822, -0.0032993017230182886, -0.0031238640658557415, 0.007146398536860943, -0.01585383340716362, -0.0057070935145020485, -0.028671521693468094, -0.0014330390840768814, -0.03634781390428543, -0.0048513878136873245, -0.031822238117456436, 0.0026655779220163822, -0.004010003060102463, -0.007064050063490868, -0.009373381733894348, 0.0031811497174203396, 0.014235510490834713, -0.01382018905133009, -0.004199762362986803, 0.011449990794062614, -0.017930442467331886, 0.01704251393675804, -0.007128496654331684, -0.00886497087776661, 0.004697432275861502, -0.02715344913303852, 0.01677040569484234, 0.007182201836258173, 0.010261311195790768, -0.009946240112185478, -0.0033404757268726826, 0.005796602461487055, -0.013834510929882526, -0.01336190290749073, 0.009946240112185478, -0.001996050588786602, -0.0029573773499578238, -0.03697795793414116, -0.029444878920912743, -0.02308616042137146, -0.013884635642170906, 0.014837011694908142, 0.004826324991881847, -0.0005303408252075315, 0.009466471150517464, 
-0.04617232084274292, -0.013684135861694813, 0.004622244741767645, 0.03414231166243553, -0.019362585619091988, 0.023000231012701988, 0.030246879905462265, 0.03190816566348076, 0.022814054042100906, -0.0016684477450326085, 0.011056151241064072, 0.01429995708167553, 0.023730624467134476, -0.012960902415215969, 0.002450756262987852, 0.006383781787008047, -0.029187094420194626, -0.03176495432853699, -0.007454309146851301, -0.0008507830207236111, -0.004153217654675245, -0.008506935089826584, 0.001492114970460534, -0.009502274915575981, -0.0009908645879477262, -0.025105483829975128, 0.0029090424068272114, -0.009043988771736622, -0.024790411815047264, -0.020565588027238846, 0.023258017376065254, -0.02529166266322136, 0.05387725681066513, -0.0067991032265126705, 0.008363720960915089, -0.019749265164136887, -0.01125665195286274, -0.0033995516132563353, 0.0051485574804246426, -0.005080530885607004, 0.0025384752079844475, 0.01891862228512764, -0.008184703066945076, -0.005789441987872124, 0.019462836906313896, -0.026036377996206284, -0.009824507869780064, -0.020250516012310982, 0.013354741968214512, -0.03394181281328201, 0.006036486942321062, 0.008471131324768066, -0.005327575374394655, -0.005016084294766188, -0.006000683177262545, 0.00680268369615078, 0.025549449026584625, -0.027797915041446686, -0.0030719488859176636, -0.01571061834692955, 0.03099159523844719, 0.00838520284742117, -0.009831667877733707, -0.0015977355651557446, 0.011528759263455868, -0.002046175766736269, 0.010196864604949951, 0.01156456209719181, -0.01173641998320818, -0.006351558491587639, -0.0015699878567829728, 0.0026423055678606033, -0.0012235880130901933, 0.017615370452404022, -0.02145351655781269, 0.0074328272603452206, 0.04345124587416649, 0.007260969839990139, 0.01646965555846691, 0.000619402271695435, 0.03156445175409317, -0.013562403619289398, 0.03391316905617714, -0.02020755037665367, -0.010461811907589436, 0.011979884468019009, 0.0010391994146630168, -0.020852016285061836, 0.0054886918514966965, 
0.046200964599847794]"

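# knn_match(column, query_vector, k): k (here 2) is the number of nearest
# neighbours searched for on each shard, so the query can match more than
# k rows overall.  The LIMIT clause caps the total number of rows returned.
query = f"""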
SELECT _score, name, details['description'] as description FROM community_areas 
WHERE knn_match(details['description_vec'], {immigrants_query}, 2) 
ORDER BY _score DESC limit 5;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,_score,name,description
0,0.460555,LOWER WEST SIDE,"Lower West Side is a community area on the West Side of Chicago, Illinois, United States. It is three miles southwest of the Chicago Loop and its main neighborhood is Pilsen. The Heart of Chicago is a neighborhood in the southwest corner of the Lower West Side. In the late 19th century, Pilsen was inhabited by German, Polish, Italian, and Czech immigrants. Czech immigrants were the most prominent and named the district after Pilsen, the fourth largest city of the Czech Republic."
1,0.459431,ALBANY PARK,"Albany Park is one of 77 well-defined community areas of Chicago. Located on the Northwest Side of the City of Chicago with the North Branch of the Chicago River forming its east and north boundaries, it includes the ethnically diverse Albany Park neighborhood, with one of the highest percentages of foreign-born residents of any Chicago neighborhood. Although the majority of those foreign-born residents are from Latin America, mostly from Mexico (especially from the state of Michoacán), Guatemala, and Ecuador, substantial numbers are from the Philippines, India, Korea, Cambodia, Somalia, Serbia, Croatia, Bosnia, Romania, Pakistan and the Middle East (especially Iraq, Iran, and Lebanon). Over 40 different languages are spoken in its public schools. Due to the diverse population and immigrant population attraction, the population of the neighborhood increased by 16.5% during the 1990s."
2,0.44854,WEST LAWN,"West Lawn, one of Chicago's 77 official community areas, is located on the southwest side of the city. It is considered to be a 'melting pot' of sorts, due to its constant change of races moving in and out of the area, as well as the diversity that exists there. It has a small town atmosphere in the big city. West Lawn is home to many Polish-Americans, Irish-Americans, Mexican-Americans, and other people of Latin American and Eastern European origin."
3,0.447165,BRIDGEPORT,"Bridgeport is one of the 77 community areas in Chicago, on the city's South Side, bounded on the north by the South Branch of the Chicago River, on the west by Bubbly Creek, on the south by Pershing Road, and on the east by the Union Pacific railroad tracks. Neighboring communities are Pilsen across the river to the north, McKinley Park to the west, Canaryville to the south, and Armour Square to the east. Bridgeport has been the home of five Chicago mayors. Once known for its racial intolerance, Bridgeport today ranks as one of the city's most diverse neighborhoods"
4,0.446597,PORTAGE PARK,"Portage Park is located on the northwest side of the City of Chicago, Illinois and is one of 77 officially designated Chicago community areas. Portage Park is bordered by the community areas of Jefferson Park and Forest Glen to the north, Dunning and the suburb of Harwood Heights to the west, Irving Park to the east and Belmont-Cragin to the south. The area is notable for its Six Corners outdoor shopping district, centered at the intersection of Irving Park Road, Cicero Avenue and the diagonal Milwaukee Avenue, the Portage Theater and for its namesake - Portage Park. The name of the park was taken from the major portage linking the Des Plaines and Chicago rivers along what is today Irving Park Road. The area was so swampy that in wet weather, Native Americans and trappers were easily able to paddle through the area in either direction without leaving their canoes. In those days, the Des Plaines was perhaps the most significant way to the Illinois, and then on to the Mississippi (and to return). Portage Park has the largest Polish community in the Chicago Metropolitan Area according to the 2000 census. Portage Park is home to the Polish American Association, the Polish Jesuit Millennium Center, the Polish Army Veterans Association in the beautiful building of the former Irving State bank, in addition to the multitude of Polish shops and businesses throughout the district. One of the area's parks is named Chopin Park after Frédéric Chopin, Poland's most famous pianist and composer."
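
The `_score` values above reflect how close each stored description vector is to the query vector.  CrateDB's documentation describes its vector similarity as based on Euclidean distance, with values in the range (0, 1].  The sketch below assumes the score is `1 / (1 + d²)`, where `d` is the Euclidean distance between the two vectors.  The function name `euclidean_similarity` is ours for illustration and is not a CrateDB API.

```python
def euclidean_similarity(a, b):
    """Map the Euclidean distance between two vectors to a score in (0, 1].

    Assumption: score = 1 / (1 + d^2), where d is the Euclidean distance.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared_distance)


# Identical vectors are at distance 0, which gives the maximum score.
print(euclidean_similarity([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # 1.0

# Vectors that are further apart score closer to 0.
print(euclidean_similarity([0.0, 0.0], [1.0, 0.0]))  # 0.5
```

Under this assumption, a score is only ever 1.0 when the two vectors are identical, and scores within one result set can be compared directly: a higher score means a smaller distance.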


Because vector similarity search is expressed in SQL, we can combine it with other SQL features and other types of search.  Imagine you're moving to Chicago and are trying to choose which area to live in.  You really like the Hyde Park area and wonder which other areas might be like it.

Let's run a `knn_match` query, using the vector representation of Hyde Park's description as the query vector:

In [101]:
query = """
SELECT _score, name FROM community_areas 
WHERE knn_match(details['description_vec'], (SELECT details['description_vec'] FROM community_areas WHERE name='HYDE PARK'), 2) 
ORDER BY _score DESC limit 5;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,_score,name
0,1.0,HYDE PARK
1,0.622107,KENWOOD
2,0.614028,WOODLAWN
3,0.592906,WASHINGTON PARK
4,0.54471,ROGERS PARK


Here, Hyde Park itself comes back at the top with a score of 1.0, as its description vector is an exact match for itself :)  Let's run a more specific query that does a couple of additional things:

1. Excludes Hyde Park from the results.

1. Requires community areas to be at least partially inside a geo polygon describing an area of the city that we want to be in, for proximity to work.

In [129]:
query = """
SELECT _score, name FROM community_areas 
WHERE knn_match(details['description_vec'], (SELECT details['description_vec'] FROM community_areas WHERE name='HYDE PARK'), 2) 
AND name != 'HYDE PARK' 
AND INTERSECTS(boundaries, 'POLYGON ((-87.6260901267975 41.83558299924806, -87.62229326051806 41.723377049484725, -87.61754602677676 41.71558271165077, -87.59761054877902 41.71451963553886, -87.59049048769104 41.70566112778525, -87.5235581113496 41.722313603775035, -87.60900410022978 41.83983231675009, -87.6260901267975 41.83558299924806))')
ORDER BY _score DESC limit 5;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,_score,name
0,2.622107,KENWOOD
1,2.614028,WOODLAWN
2,2.592906,WASHINGTON PARK
3,2.534467,OAKLAND


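Our result set no longer contains Hyde Park, which we excluded because we already know we like that area.  Rogers Park has also dropped out: it's indicated by the blue marker on the map below, entirely outside our desired area, which is represented by the grey polygon.  Only four rows match all of the conditions, so we get four results despite the `limit 5`.  The `_score` values have also shifted (Kenwood now scores 2.622107 rather than 0.622107), so treat `_score` as a ranking within a single query rather than comparing it across queries.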

![Rogers Park and our desirable areas polygon](desirable_community_areas.png)
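
The polygon passed to `INTERSECTS` in the query above is written in WKT (well-known text): each point is given as longitude then latitude, and the first point is repeated at the end to close the ring.  If your area of interest starts out as a list of coordinates, a small helper can produce that string.  This is a minimal sketch; `to_wkt_polygon` and the sample triangle are hypothetical and not part of the workshop data.

```python
def to_wkt_polygon(points):
    """Build a WKT POLYGON string from (longitude, latitude) pairs."""
    ring = list(points)
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three points")
    # WKT requires a closed ring: the last point must repeat the first.
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coordinates = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON (({coordinates}))"


# A hypothetical triangle; the closing point is added automatically.
print(to_wkt_polygon([(-87.63, 41.84), (-87.62, 41.72), (-87.52, 41.72)]))
```

The resulting string can be placed in the query where the literal `'POLYGON ((...))'` appears.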

## Towards Hybrid Search

Hybrid search ([read our blog post](https://cratedb.com/blog/hybrid-search-explained)) is a technique that combines the results of two or more search algorithms to achieve better accuracy and relevancy than any one of them would individually.

A common scenario is to combine semantic search (vector search) with lexical search (keyword search). Semantic search excels at understanding the context of a phrase. Lexical search excels at exact keyword matching: it scores a document by how often a keyword or phrase appears in it, taking into account the document's length relative to the average length of your documents.
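
The lexical scoring described above (term frequency adjusted for document length relative to the average) matches the BM25 family of ranking functions.  The sketch below uses the textbook BM25 formula with commonly used defaults for `k1` and `b`.  It illustrates the idea only and should not be read as CrateDB's exact implementation; the function name and the sample numbers are assumptions.

```python
import math


def bm25_term_score(term_freq, doc_len, avg_doc_len, n_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Score one term for one document with the textbook BM25 formula."""
    # Rare terms (present in few documents) get a higher weight.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Documents longer than average are penalised, shorter ones are boosted.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (term_freq * (k1 + 1)) / (term_freq + k1 * length_norm)


# Same term frequency, but the shorter document scores higher.
short_doc = bm25_term_score(term_freq=2, doc_len=50, avg_doc_len=100,
                            n_docs=77, docs_with_term=5)
long_doc = bm25_term_score(term_freq=2, doc_len=200, avg_doc_len=100,
                           n_docs=77, docs_with_term=5)
print(short_doc > long_doc)  # True
```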

As we've seen, CrateDB supports both full-text and vector similarity searches.  We can perform a hybrid search by combining the results of each of these types of search, in a process known as re-ranking.  We won't cover that here, but let's see how simple it is to combine these two types of search in a single query.

In [127]:
query = f"""
WITH full_text AS (
  SELECT
    _score AS full_text_score, name
  FROM
    community_areas
  WHERE
    MATCH(details['description'], 'european immigrants')
  ORDER BY
    _score DESC
),
vector AS (
  SELECT 
    _score AS vector_score, name
  FROM 
    community_areas
  WHERE
    knn_match(details['description_vec'], {immigrants_query}, 2)
  ORDER BY _score DESC
)
SELECT full_text_score, vector_score, full_text.name AS full_text_name, vector.name AS vector_name 
FROM full_text FULL JOIN vector ON vector.name = full_text.name;
"""

df = pd.read_sql(query, CONNECTION_STRING)
df

Unnamed: 0,full_text_score,vector_score,full_text_name,vector_name
0,1.19746,0.460555,LOWER WEST SIDE,LOWER WEST SIDE
1,1.117726,0.459431,ALBANY PARK,ALBANY PARK
2,1.483632,0.44854,WEST LAWN,WEST LAWN
3,,0.447165,,BRIDGEPORT
4,,0.446597,,PORTAGE PARK
5,,0.445427,,WEST TOWN
6,,0.443825,,HUMBOLDT PARK
7,,0.443574,,EDGEWATER
8,1.236782,,NEAR WEST SIDE,
9,1.196814,,GARFIELD RIDGE,


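Here we can see the scores for both types of search side by side.  Because of the `FULL JOIN`, an area matched by only one of the searches has nulls (shown as empty values above) in the other search's columns.  The scores can then be normalized and re-ranked.  For more details on ways to achieve this in SQL, see our [blog post](https://cratedb.com/blog/hybrid-search-explained).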
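
As an illustration of the normalization and re-ranking step, here is a minimal sketch in plain Python.  It assumes min-max normalization and an equal-weight sum, which is only one of several possible approaches (reciprocal rank fusion is another).  The helper names and the choice to score a missing match as 0 are ours; the scores are copied from the output above.

```python
def min_max_normalize(scores):
    """Rescale a {name: score} mapping to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {name: 1.0 for name in scores}
    return {name: (s - low) / (high - low) for name, s in scores.items()}


def hybrid_rank(full_text, vector, full_text_weight=0.5):
    """Combine two score mappings into one ranking, highest score first."""
    ft_norm = min_max_normalize(full_text)
    vec_norm = min_max_normalize(vector)
    combined = {}
    for name in set(ft_norm) | set(vec_norm):
        # An area missing from one search contributes 0 for that search.
        combined[name] = (full_text_weight * ft_norm.get(name, 0.0)
                          + (1 - full_text_weight) * vec_norm.get(name, 0.0))
    # Sort by descending score, then by name to keep the order deterministic.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Scores copied from the query output above.
full_text_scores = {
    "LOWER WEST SIDE": 1.19746, "ALBANY PARK": 1.117726, "WEST LAWN": 1.483632,
    "NEAR WEST SIDE": 1.236782, "GARFIELD RIDGE": 1.196814,
}
vector_scores = {
    "LOWER WEST SIDE": 0.460555, "ALBANY PARK": 0.459431, "WEST LAWN": 0.44854,
    "BRIDGEPORT": 0.447165, "PORTAGE PARK": 0.446597, "WEST TOWN": 0.445427,
    "HUMBOLDT PARK": 0.443825, "EDGEWATER": 0.443574,
}

for name, score in hybrid_rank(full_text_scores, vector_scores):
    print(f"{score:.3f}  {name}")
```

Adjusting `full_text_weight` shifts the balance between the two searches: values closer to 1 favour keyword matches, values closer to 0 favour semantic matches.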

## Additional Resources

The following are additional resources and workbooks that expand on the topics covered here:

* [Blog: Hybrid Search in CrateDB](https://cratedb.com/blog/hybrid-search-explained)
* [Blog: Dissecting a Hybrid Search Query in SQL](https://cratedb.com/blog/dissecting-a-hybrid-search-query-in-sql)
* [CrateDB documentation: Full-text Search](https://cratedb.com/docs/guide/feature/search/fts/index.html)
* [CrateDB documentation: Hybrid Search](https://cratedb.com/docs/guide/feature/search/hybrid/index.html)
* [Jupyter notebook: Applying RAG using CrateDB and LangChain](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb_rag_customer_support_langchain.ipynb)


## Continue your Learning Journey

To learn more about CrateDB, sign up for our courses at the CrateDB Academy.  We recommend the [CrateDB Fundamentals](https://learn.cratedb.com/cratedb-fundamentals) course for a comprehensive overview, and our [Advanced Time Series](https://learn.cratedb.com/time-series) course for a deep dive into time series data concepts.