Wikipedia Lookup
===

Quick experiments with Wikipedia lookup.

The API approach is copied from https://github.com/minimaxir/simpleaichat

In [4]:
from typing import Union

import httpx

In [6]:
httpx.__version__

'0.24.1'

In [12]:
# code adapted from https://github.com/minimaxir/simpleaichat/tree/main (MIT-licensed)
WIKIPEDIA_API_URL = "https://en.wikipedia.org/w/api.php"
# set the user agent according to https://www.mediawiki.org/wiki/API:Etiquette#The_User-Agent_header
WIKIPEDIA_HEADERS = {
    "user-agent": f"WikipediaLookup/0.1 (https://github.com/levon003/llm-math-education; levon003@umn.org) httpx/{httpx.__version__}"
}


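def wikipedia_search(query: str, n: int = 1) -> Union[str, list[str]]:
    """Return the top matching title (n == 1) or a list of up to n titles."""
    SEARCH_PARAMS = {
        "action": "query",
        "list": "search",
        "format": "json",
        "srlimit": n,
        "srsearch": query,
        "srwhat": "text",  # search page text
        "srprop": "",  # request no extra properties such as snippets
    }

    with httpx.Client(headers=WIKIPEDIA_HEADERS) as client:
        r_search = client.get(WIKIPEDIA_API_URL, params=SEARCH_PARAMS)
        r_search.raise_for_status()  # fail loudly on HTTP errors
        results = [x["title"] for x in r_search.json()["query"]["search"]]

    if n == 1:
        # an empty result list would otherwise raise a bare IndexError
        if not results:
            raise ValueError(f"No Wikipedia results for query: {query!r}")
        return results[0]
    return results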


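def wikipedia_lookup(query: str, sentences: int = 1) -> str:
    """Return the first `sentences` sentences of the page titled `query` as plain text."""
    LOOKUP_PARAMS = {
        "action": "query",
        "prop": "extracts",
        "exsentences": sentences,
        "exlimit": "1",
        "explaintext": "1",
        "formatversion": "2",
        "format": "json",
        "titles": query,
    }
    with httpx.Client(headers=WIKIPEDIA_HEADERS) as client:
        r_lookup = client.get(WIKIPEDIA_API_URL, params=LOOKUP_PARAMS)
        r_lookup.raise_for_status()  # fail loudly on HTTP errors
    # with formatversion=2, "pages" is a list; a missing title has no "extract" key
    return r_lookup.json()["query"]["pages"][0]["extract"]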


def wikipedia_search_lookup(query: str, sentences: int = 1) -> str:
    return wikipedia_lookup(wikipedia_search(query, 1), sentences)
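
The response parsing in the cell above can also be factored into small pure functions that work on already-decoded JSON, which makes the no-result and missing-page cases easy to try without a network call. This is a minimal sketch, assuming the response shapes used above (`query.search[].title` for search, and `query.pages` as a list under `formatversion=2`). The names `parse_search_titles` and `parse_first_extract` are hypothetical helpers, not part of simpleaichat or httpx, and the sample payloads are hand-written rather than real API output.

```python
from typing import Optional


def parse_search_titles(payload: dict) -> list[str]:
    """Return the titles in a decoded list=search response, or [] if there are none."""
    hits = payload.get("query", {}).get("search", [])
    return [hit["title"] for hit in hits]


def parse_first_extract(payload: dict) -> Optional[str]:
    """Return the first page's extract from a decoded prop=extracts response.

    Assumes formatversion=2, where "pages" is a list. Returns None when
    there is no page, the page is flagged missing, or it has no extract.
    """
    pages = payload.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    return pages[0].get("extract")


# Hand-written sample payloads (assumptions, not captured API responses).
empty_search = {"query": {"search": []}}
one_hit = {"query": {"search": [{"title": "Mathematics"}]}}
missing_page = {"query": {"pages": [{"title": "Nonexistent", "missing": True}]}}
found_page = {"query": {"pages": [{"title": "Mathematics", "extract": "Mathematics is ..."}]}}

print(parse_search_titles(empty_search))
print(parse_search_titles(one_hit))
print(parse_first_extract(missing_page))
print(parse_first_extract(found_page))
```

Returning `[]` or `None` leaves the decision about what to do with an empty result to the caller, which may or may not suit a given experiment.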

In [15]:
wikipedia_search("Maths")

'Mathematics'

In [16]:
wikipedia_lookup("Mathematics", sentences=5)

'Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics with the major subdisciplines of number theory, algebra, geometry, and analysis, respectively. There is no general consensus among mathematicians about a common definition for their academic discipline.\nMost mathematical activity involves the discovery of properties of abstract objects and the use of pure reason to prove them. These objects consist of either abstractions from nature or—in modern mathematics—entities that are stipulated to have certain properties, called axioms.'

In [17]:
wikipedia_search_lookup("Cartesian grid", sentences=3)

'A regular grid is a tessellation of n-dimensional Euclidean space by congruent parallelotopes (e.g. bricks). \nIts opposite is irregular grid.'
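
The extract above describes a regular grid rather than a Cartesian grid, which suggests the top search hit was an article with a different title from the query. A minimal sketch of a variant that also returns the resolved title, so the mismatch is visible to the caller. The name `search_lookup_with_title` is hypothetical, and `fake_search` and `fake_lookup` are stand-ins so the block runs offline.

```python
from typing import Callable


def search_lookup_with_title(
    query: str,
    search: Callable[[str], str],
    lookup: Callable[[str, int], str],
    sentences: int = 1,
) -> tuple[str, str]:
    """Return (resolved_title, extract) so the caller can see which article matched."""
    title = search(query)
    return title, lookup(title, sentences)


# Stand-in functions (assumptions for illustration; they make no API calls).
def fake_search(query: str) -> str:
    return "Some resolved title"


def fake_lookup(title: str, sentences: int) -> str:
    return f"Extract of {title!r} ({sentences} sentences)"


title, extract = search_lookup_with_title("some query", fake_search, fake_lookup, sentences=3)
print(title)
print(extract)
```

In this notebook, `wikipedia_search` and `wikipedia_lookup` could be passed as the `search` and `lookup` arguments.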

In [19]:
wikipedia_search_lookup("Probability", sentences=3)

'In science, the probability of an event is a number that indicates how likely the event is to occur. It is expressed as a number in the range from 0 and 1, or, using percentage notation, in the range from 0% to 100%.  The more likely it is that the event will occur, the higher its probability.'
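
The extracts above contain some stray whitespace: a space followed by a newline in the "Cartesian grid" result and a double space in the "Probability" result. A minimal sketch of a cleanup step, assuming single-line text is what is wanted downstream. The name `normalize_extract` is a hypothetical helper, and the sample string is an abbreviated stand-in rather than real API output.

```python
import re


def normalize_extract(text: str) -> str:
    """Collapse every run of whitespace (including newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()


# Abbreviated stand-in for an extract with a trailing space and newline.
sample = "A regular grid is a tessellation. \nIts opposite is irregular grid."
print(normalize_extract(sample))
```

This also discards paragraph breaks, so it is only appropriate when the paragraph structure of the extract does not matter.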