Updated notebooks and utils.py to use the latest stable Azure Search API Version, needed for future Index projection and automatic embedding #43
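
For readers adapting their own index definitions, the schema change made in these notebooks can be sketched as a small pure-Python transformation. This is a minimal sketch, not code from the PR: the function name `migrate_vector_schema` is hypothetical, the key names and default values are copied from the diff, and it assumes the single-algorithm layout the notebooks use.

```python
import copy


def migrate_vector_schema(index_payload,
                          algorithm_name="vector_algorithm_hnsw",
                          profile_name="my_vectorsearch_profile"):
    """Return a copy of an index payload rewritten to the algorithms/profiles layout.

    Hypothetical helper: it only handles a payload with exactly one entry
    under vectorSearch.algorithmConfigurations, as in the notebooks.
    """
    payload = copy.deepcopy(index_payload)  # leave the caller's dict untouched
    vector_search = payload.get("vectorSearch", {})
    old_configs = vector_search.pop("algorithmConfigurations", None)
    if old_configs is None:
        return payload  # nothing to migrate
    if len(old_configs) != 1:
        raise ValueError("this sketch only handles a single algorithm configuration")

    old = old_configs[0]
    # The algorithm keeps its settings (e.g. kind) but gets the new name.
    vector_search["algorithms"] = [dict(old, name=algorithm_name)]
    # A profile now sits between the field and the algorithm.
    vector_search["profiles"] = [{"name": profile_name, "algorithm": algorithm_name}]

    # Vector fields reference the profile instead of the old configuration.
    for field in payload.get("fields", []):
        if field.get("vectorSearchConfiguration") == old["name"]:
            del field["vectorSearchConfiguration"]
            field["vectorSearchProfile"] = profile_name
    return payload


old_payload = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": "true"},
        {"name": "chunkVector", "type": "Collection(Edm.Single)",
         "dimensions": 1536, "vectorSearchConfiguration": "vectorConfig"},
    ],
    "vectorSearch": {
        "algorithmConfigurations": [{"name": "vectorConfig", "kind": "hnsw"}]
    },
}
new_payload = migrate_vector_schema(old_payload)
```

The extra level of indirection means a field names a profile and the profile names an algorithm, so several fields can share one profile and the algorithm behind it can be swapped in one place.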

Open
wants to merge 7 commits into base: main
22 changes: 14 additions & 8 deletions 01-Load-Data-ACogSearch.ipynb
@@ -548,7 +548,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 11,
"metadata": {
"tags": []
},
@@ -559,7 +559,7 @@
"text": [
"200\n",
"Status: inProgress\n",
"Items Processed: 400\n",
"Items Processed: 0\n",
"True\n"
]
}
@@ -613,7 +613,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 12,
"metadata": {},
"outputs": [
{
@@ -632,18 +632,24 @@
" {\"name\": \"id\", \"type\": \"Edm.String\", \"key\": \"true\", \"filterable\": \"true\" },\n",
" {\"name\": \"title\",\"type\": \"Edm.String\",\"searchable\": \"true\",\"retrievable\": \"true\"},\n",
" {\"name\": \"chunk\",\"type\": \"Edm.String\",\"searchable\": \"true\",\"retrievable\": \"true\"},\n",
" {\"name\": \"chunkVector\",\"type\": \"Collection(Edm.Single)\",\"searchable\": \"true\",\"retrievable\": \"true\",\"dimensions\": 1536,\"vectorSearchConfiguration\": \"vectorConfig\"},\n",
" {\"name\": \"chunkVector\",\"type\": \"Collection(Edm.Single)\",\"searchable\": \"true\",\"retrievable\": \"true\",\"dimensions\": 1536,\"vectorSearchProfile\": \"my_vectorsearch_profile\"},\n",
" {\"name\": \"name\", \"type\": \"Edm.String\", \"searchable\": \"true\", \"retrievable\": \"true\", \"sortable\": \"false\", \"filterable\": \"false\", \"facetable\": \"false\"},\n",
" {\"name\": \"location\", \"type\": \"Edm.String\", \"searchable\": \"false\", \"retrievable\": \"true\", \"sortable\": \"false\", \"filterable\": \"false\", \"facetable\": \"false\"},\n",
"\n",
" ],\n",
" \"vectorSearch\": {\n",
" \"algorithmConfigurations\": [\n",
" \"algorithms\": [\n",
" {\n",
" \"name\": \"vectorConfig\",\n",
" \"name\": \"vector_algorithm_hnsw\",\n",
" \"kind\": \"hnsw\"\n",
" }\n",
" ]\n",
" ],\n",
" \"profiles\": [\n",
" {\n",
" \"name\": \"my_vectorsearch_profile\",\n",
" \"algorithm\": \"vector_algorithm_hnsw\"\n",
" }\n",
" ] \n",
" },\n",
" \"semantic\": {\n",
" \"configurations\": [\n",
@@ -709,7 +715,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.9"
},
"vscode": {
"interpreter": {
116 changes: 57 additions & 59 deletions 02-LoadCSVOneToMany-ACogSearch.ipynb
@@ -140,77 +140,77 @@
"name": "stdout",
"output_type": "stream",
"text": [
"No. of lines: 90000\n"
"No. of lines: 103\n"
]
},
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"</style>\n",
"<table id=\"T_87464\">\n",
"<table id=\"T_09152\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_87464_level0_col0\" class=\"col_heading level0 col0\" >cord_uid</th>\n",
" <th id=\"T_87464_level0_col1\" class=\"col_heading level0 col1\" >source_x</th>\n",
" <th id=\"T_87464_level0_col2\" class=\"col_heading level0 col2\" >title</th>\n",
" <th id=\"T_87464_level0_col3\" class=\"col_heading level0 col3\" >abstract</th>\n",
" <th id=\"T_87464_level0_col4\" class=\"col_heading level0 col4\" >authors</th>\n",
" <th id=\"T_87464_level0_col5\" class=\"col_heading level0 col5\" >url</th>\n",
" <th id=\"T_09152_level0_col0\" class=\"col_heading level0 col0\" >cord_uid</th>\n",
" <th id=\"T_09152_level0_col1\" class=\"col_heading level0 col1\" >source_x</th>\n",
" <th id=\"T_09152_level0_col2\" class=\"col_heading level0 col2\" >title</th>\n",
" <th id=\"T_09152_level0_col3\" class=\"col_heading level0 col3\" >abstract</th>\n",
" <th id=\"T_09152_level0_col4\" class=\"col_heading level0 col4\" >authors</th>\n",
" <th id=\"T_09152_level0_col5\" class=\"col_heading level0 col5\" >url</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_87464_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_87464_row0_col0\" class=\"data row0 col0\" >ug7v899j</td>\n",
" <td id=\"T_87464_row0_col1\" class=\"data row0 col1\" >PMC</td>\n",
" <td id=\"T_87464_row0_col2\" class=\"data row0 col2\" >Clinical features of culture-p...</td>\n",
" <td id=\"T_87464_row0_col3\" class=\"data row0 col3\" >OBJECTIVE: This retrospective ...</td>\n",
" <td id=\"T_87464_row0_col4\" class=\"data row0 col4\" >Madani, Tariq A; Al-Ghamdi, Ai...</td>\n",
" <td id=\"T_87464_row0_col5\" class=\"data row0 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC35282/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC35282/</a></td>\n",
" <th id=\"T_09152_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_09152_row0_col0\" class=\"data row0 col0\" >ug7v899j</td>\n",
" <td id=\"T_09152_row0_col1\" class=\"data row0 col1\" >PMC</td>\n",
" <td id=\"T_09152_row0_col2\" class=\"data row0 col2\" >Clinical features of culture-p...</td>\n",
" <td id=\"T_09152_row0_col3\" class=\"data row0 col3\" >OBJECTIVE: This retrospective ...</td>\n",
" <td id=\"T_09152_row0_col4\" class=\"data row0 col4\" >Madani, Tariq A; Al-Ghamdi, Ai...</td>\n",
" <td id=\"T_09152_row0_col5\" class=\"data row0 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC35282/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC35282/</a></td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_87464_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_87464_row1_col0\" class=\"data row1 col0\" >02tnwd4m</td>\n",
" <td id=\"T_87464_row1_col1\" class=\"data row1 col1\" >PMC</td>\n",
" <td id=\"T_87464_row1_col2\" class=\"data row1 col2\" >Nitric oxide: a pro-inflammato...</td>\n",
" <td id=\"T_87464_row1_col3\" class=\"data row1 col3\" >Inflammatory diseases of the r...</td>\n",
" <td id=\"T_87464_row1_col4\" class=\"data row1 col4\" >Vliet, Albert van der; Eiseric...</td>\n",
" <td id=\"T_87464_row1_col5\" class=\"data row1 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59543/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59543/</a></td>\n",
" <th id=\"T_09152_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_09152_row1_col0\" class=\"data row1 col0\" >02tnwd4m</td>\n",
" <td id=\"T_09152_row1_col1\" class=\"data row1 col1\" >PMC</td>\n",
" <td id=\"T_09152_row1_col2\" class=\"data row1 col2\" >Nitric oxide: a pro-inflammato...</td>\n",
" <td id=\"T_09152_row1_col3\" class=\"data row1 col3\" >Inflammatory diseases of the r...</td>\n",
" <td id=\"T_09152_row1_col4\" class=\"data row1 col4\" >Vliet, Albert van der; Eiseric...</td>\n",
" <td id=\"T_09152_row1_col5\" class=\"data row1 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59543/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59543/</a></td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_87464_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_87464_row2_col0\" class=\"data row2 col0\" >ejv2xln0</td>\n",
" <td id=\"T_87464_row2_col1\" class=\"data row2 col1\" >PMC</td>\n",
" <td id=\"T_87464_row2_col2\" class=\"data row2 col2\" >Surfactant protein-D and pulmo...</td>\n",
" <td id=\"T_87464_row2_col3\" class=\"data row2 col3\" >Surfactant protein-D (SP-D) pa...</td>\n",
" <td id=\"T_87464_row2_col4\" class=\"data row2 col4\" >Crouch, Erika C...</td>\n",
" <td id=\"T_87464_row2_col5\" class=\"data row2 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59549/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59549/</a></td>\n",
" <th id=\"T_09152_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_09152_row2_col0\" class=\"data row2 col0\" >ejv2xln0</td>\n",
" <td id=\"T_09152_row2_col1\" class=\"data row2 col1\" >PMC</td>\n",
" <td id=\"T_09152_row2_col2\" class=\"data row2 col2\" >Surfactant protein-D and pulmo...</td>\n",
" <td id=\"T_09152_row2_col3\" class=\"data row2 col3\" >Surfactant protein-D (SP-D) pa...</td>\n",
" <td id=\"T_09152_row2_col4\" class=\"data row2 col4\" >Crouch, Erika C...</td>\n",
" <td id=\"T_09152_row2_col5\" class=\"data row2 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59549/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59549/</a></td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_87464_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_87464_row3_col0\" class=\"data row3 col0\" >2b73a28n</td>\n",
" <td id=\"T_87464_row3_col1\" class=\"data row3 col1\" >PMC</td>\n",
" <td id=\"T_87464_row3_col2\" class=\"data row3 col2\" >Role of endothelin-1 in lung d...</td>\n",
" <td id=\"T_87464_row3_col3\" class=\"data row3 col3\" >Endothelin-1 (ET-1) is a 21 am...</td>\n",
" <td id=\"T_87464_row3_col4\" class=\"data row3 col4\" >Fagan, Karen A; McMurtry, Ivan...</td>\n",
" <td id=\"T_87464_row3_col5\" class=\"data row3 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59574/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59574/</a></td>\n",
" <th id=\"T_09152_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_09152_row3_col0\" class=\"data row3 col0\" >2b73a28n</td>\n",
" <td id=\"T_09152_row3_col1\" class=\"data row3 col1\" >PMC</td>\n",
" <td id=\"T_09152_row3_col2\" class=\"data row3 col2\" >Role of endothelin-1 in lung d...</td>\n",
" <td id=\"T_09152_row3_col3\" class=\"data row3 col3\" >Endothelin-1 (ET-1) is a 21 am...</td>\n",
" <td id=\"T_09152_row3_col4\" class=\"data row3 col4\" >Fagan, Karen A; McMurtry, Ivan...</td>\n",
" <td id=\"T_09152_row3_col5\" class=\"data row3 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59574/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59574/</a></td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_87464_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_87464_row4_col0\" class=\"data row4 col0\" >9785vg6d</td>\n",
" <td id=\"T_87464_row4_col1\" class=\"data row4 col1\" >PMC</td>\n",
" <td id=\"T_87464_row4_col2\" class=\"data row4 col2\" >Gene expression in epithelial ...</td>\n",
" <td id=\"T_87464_row4_col3\" class=\"data row4 col3\" >Respiratory syncytial virus (R...</td>\n",
" <td id=\"T_87464_row4_col4\" class=\"data row4 col4\" >Domachowske, Joseph B; Bonvill...</td>\n",
" <td id=\"T_87464_row4_col5\" class=\"data row4 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59580/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59580/</a></td>\n",
" <th id=\"T_09152_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_09152_row4_col0\" class=\"data row4 col0\" >9785vg6d</td>\n",
" <td id=\"T_09152_row4_col1\" class=\"data row4 col1\" >PMC</td>\n",
" <td id=\"T_09152_row4_col2\" class=\"data row4 col2\" >Gene expression in epithelial ...</td>\n",
" <td id=\"T_09152_row4_col3\" class=\"data row4 col3\" >Respiratory syncytial virus (R...</td>\n",
" <td id=\"T_09152_row4_col4\" class=\"data row4 col4\" >Domachowske, Joseph B; Bonvill...</td>\n",
" <td id=\"T_09152_row4_col5\" class=\"data row4 col5\" ><a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59580/\">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59580/</a></td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x7f36d4016ad0>"
"<pandas.io.formats.style.Styler at 0x7fe597885a50>"
]
},
"execution_count": 6,
@@ -452,8 +452,8 @@
"output_type": "stream",
"text": [
"200\n",
"Status: inProgress\n",
"Items Processed: 15000\n",
"Status: success\n",
"Items Processed: 103\n",
"True\n"
]
}
@@ -511,18 +511,24 @@
" {\"name\": \"id\", \"type\": \"Edm.String\", \"key\": \"true\", \"filterable\": \"true\" },\n",
" {\"name\": \"title\",\"type\": \"Edm.String\",\"searchable\": \"true\",\"retrievable\": \"true\"},\n",
" {\"name\": \"chunk\",\"type\": \"Edm.String\",\"searchable\": \"true\",\"retrievable\": \"true\"},\n",
" {\"name\": \"chunkVector\",\"type\": \"Collection(Edm.Single)\",\"searchable\": \"true\",\"retrievable\": \"true\",\"dimensions\": 1536,\"vectorSearchConfiguration\": \"vectorConfig\"},\n",
" {\"name\": \"chunkVector\",\"type\": \"Collection(Edm.Single)\",\"searchable\": \"true\",\"retrievable\": \"true\",\"dimensions\": 1536,\"vectorSearchProfile\": \"my_vectorsearch_profile\"},\n",
" {\"name\": \"name\", \"type\": \"Edm.String\", \"searchable\": \"true\", \"retrievable\": \"true\", \"sortable\": \"false\", \"filterable\": \"false\", \"facetable\": \"false\"},\n",
" {\"name\": \"location\", \"type\": \"Edm.String\", \"searchable\": \"false\", \"retrievable\": \"true\", \"sortable\": \"false\", \"filterable\": \"false\", \"facetable\": \"false\"},\n",
"\n",
" ],\n",
" \"vectorSearch\": {\n",
" \"algorithmConfigurations\": [\n",
" \"algorithms\": [\n",
" {\n",
" \"name\": \"vectorConfig\",\n",
" \"name\": \"vector_algorithm_hnsw\",\n",
" \"kind\": \"hnsw\"\n",
" }\n",
" ]\n",
" ],\n",
" \"profiles\": [\n",
" {\n",
" \"name\": \"my_vectorsearch_profile\",\n",
" \"algorithm\": \"vector_algorithm_hnsw\"\n",
" }\n",
" ] \n",
" },\n",
" \"semantic\": {\n",
" \"configurations\": [\n",
@@ -570,14 +576,6 @@
"# NEXT\n",
"Now that we have two separate text-based indexes loaded with two different types of information and its correspongind vector-based indexes, In the next notebook 3, we will do a Multi-Index query, sort the results based on the reranker semantic score of Azure Search, and then use OpenAI to understand this results and give the best answer possible"
]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "7505d8f9-39c7-4b87-a85f-283b6fea3de0",
- "metadata": {},
- "outputs": [],
- "source": []
}
],
"metadata": {
@@ -596,7 +594,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.9"
}
},
"nbformat": 4,