<a href="https://colab.research.google.com/github/HeatherDriver/MathGraph/blob/main/01_Wikidata_Wolfram_queries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from google.colab import drive, userdata
import requests
import pickle
import pprint
import random
import re

In [None]:
drive.mount("/content/drive")
%cd '/content/drive/MyDrive/Colab Notebooks/Math_Graph/pickle_files'

# Get API key
wolfram = userdata.get('Wolfram_alpha')

Mounted at /content/drive
/content/drive/MyDrive/Colab Notebooks/Math_Graph/pickle_files


In [None]:
def read_pickle(file_name):
  with open(file_name, 'rb') as file:
    data = pickle.load(file)
  return data

In [None]:
# Load mizar_subjects
dict_file = "mizar_subjects.pkl"
mizar_subj_list = read_pickle(dict_file)

In [None]:
def get_wikidata_subject_info(entity_name, subject_dict):
  """Searches Wikidata for entity_name and stores any hits in subject_dict."""
  # Pass the query as params so requests URL-encodes it
  # (entity names may contain apostrophes, accents or '&')
  url = "https://www.wikidata.org/w/api.php"
  params = {
      "action": "wbsearchentities",
      "search": entity_name,
      "language": "en",
      "format": "json",
  }

  # Send a GET request to the Wikidata API
  response = requests.get(url, params=params, timeout=30)

  # If the search was successful, extract the results
  if response.status_code == 200:
    search_results = response.json().get("search")
    if search_results:
      subject_dict[entity_name] = search_results
    else:
      print(f"No results found for {entity_name}")
  return subject_dict

In [None]:
def load_or_create_dict(subject_dict_name, function_to_run, subj_list):
  """Loads the dict from its pickle file if it exists; otherwise builds it by
  calling function_to_run for every entry in subj_list, then saves it."""

  try:
    subject_dict = read_pickle(subject_dict_name)

  except FileNotFoundError:
    print(f"{subject_dict_name} not found, creating {subject_dict_name}...")
    subject_dict = dict()
    for entity_name in subj_list:
      subject_dict = function_to_run(entity_name, subject_dict)

    with open(subject_dict_name, 'wb') as file:
      pickle.dump(subject_dict, file)

  return subject_dict
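
The load-or-create pattern above can be checked without calling a real API. The following is a minimal sketch, assuming a hypothetical stand-in fetcher `fake_fetch`, a hypothetical file name `demo_cache.pkl` and a temporary directory; none of these are part of the notebook. `read_pickle` and `load_or_create_dict` are restated so the sketch runs on its own.

```python
import os
import pickle
import tempfile

def read_pickle(file_name):
  with open(file_name, 'rb') as file:
    return pickle.load(file)

def load_or_create_dict(subject_dict_name, function_to_run, subj_list):
  try:
    subject_dict = read_pickle(subject_dict_name)
  except FileNotFoundError:
    subject_dict = dict()
    for entity_name in subj_list:
      subject_dict = function_to_run(entity_name, subject_dict)
    with open(subject_dict_name, 'wb') as file:
      pickle.dump(subject_dict, file)
  return subject_dict

# Hypothetical fetcher: records each call instead of contacting an API
calls = []

def fake_fetch(entity_name, subject_dict):
  calls.append(entity_name)
  subject_dict[entity_name] = f"stub answer for {entity_name}"
  return subject_dict

with tempfile.TemporaryDirectory() as tmp_dir:
  cache_path = os.path.join(tmp_dir, "demo_cache.pkl")  # hypothetical file name

  # First call: the file is missing, so every subject is fetched and the dict is pickled
  first = load_or_create_dict(cache_path, fake_fetch, ["Subset", "Binomial_theorem"])
  calls_after_first = len(calls)

  # Second call: the pickle exists, so the fetcher is not called again
  second = load_or_create_dict(cache_path, fake_fetch, ["Subset", "Binomial_theorem"])
  calls_after_second = len(calls)

print(calls_after_first, calls_after_second, first == second)
```

One consequence of this design: once the pickle file exists, changes to `subj_list` are ignored until the file is deleted.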

In [None]:
def split_camel_case(text):
  """Inserts a space before each uppercase letter that follows a lowercase letter."""
  return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', text)
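
A short sketch of what the regex does and does not split. The example strings are illustrative and are not taken from the scraped data. The function is restated so the block runs on its own.

```python
import re

def split_camel_case(text):
  return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', text)

# Illustrative inputs (hypothetical, not from the scraped topics)
examples = ["NumberTheory", "LinearAlgebra", "Algebra", "PDESolver"]
results = {name: split_camel_case(name) for name in examples}

for name, split_name in results.items():
  print(f"{name} -> {split_name}")
```

A space is inserted only at a lowercase-to-uppercase boundary, so a run of capitals followed by a word (as in the last example) is left unchanged.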

In [None]:
mizar_wikidata_dict = load_or_create_dict("mizar_wikidata_api.pkl", get_wikidata_subject_info, mizar_subj_list)

print(f"dictionary length: {len(mizar_wikidata_dict)}")
random_pairs = random.sample(list(mizar_wikidata_dict.items()), 5)

for key, value in random_pairs:
  print(f"{key}:\n{value}")

dictionary length: 96
Pell's_equation:
[{'id': 'Q853067', 'title': 'Q853067', 'pageid': 804878, 'concepturi': 'http://www.wikidata.org/entity/Q853067', 'repository': 'wikidata', 'url': '//www.wikidata.org/wiki/Q853067', 'display': {'label': {'value': "Pell's equation", 'language': 'en'}, 'description': {'value': 'mathematical equation, specifically a kind of Diophantine equation', 'language': 'en'}}, 'label': "Pell's equation", 'description': 'mathematical equation, specifically a kind of Diophantine equation', 'match': {'type': 'label', 'language': 'en', 'text': "Pell's equation"}}, {'id': 'Q128057376', 'title': 'Q128057376', 'pageid': 122040493, 'concepturi': 'http://www.wikidata.org/entity/Q128057376', 'repository': 'wikidata', 'url': '//www.wikidata.org/wiki/Q128057376', 'display': {'label': {'value': 'PELL’S EQUATIONS IN GAUSSIAN INTEGERS', 'language': 'en'}, 'description': {'value': 'scholarly article', 'language': 'en'}}, 'label': 'PELL’S EQUATIONS IN GAUSSIAN INTEGERS', 'descri
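
Each value in the dictionary is a list of search hits, which is verbose to read in full. The sketch below reduces each hit to its `id`, `label` and `description`. The helper name `summarize_hits` is hypothetical, and the sample data is a trimmed copy of two hits printed in this notebook's outputs. Some hits carry no `description` key, so the helper uses `.get`.

```python
def summarize_hits(search_results):
  """Returns (id, label, description) tuples; description is None when absent."""
  return [
      (hit["id"], hit.get("label"), hit.get("description"))
      for hit in search_results
  ]

# Trimmed copies of hits shown in the printed outputs (most keys omitted)
sample_dict = {
    "Pell's_equation": [
        {"id": "Q853067",
         "label": "Pell's equation",
         "description": "mathematical equation, specifically a kind of Diophantine equation"},
    ],
    "Cesàro_Mean": [
        {"id": "Q2045894", "label": "Cesàro mean"},  # this hit has no description
    ],
}

summary = {name: summarize_hits(hits) for name, hits in sample_dict.items()}
for name, rows in summary.items():
  print(name, rows)
```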

In [None]:
def get_wolfram_short_answer_query_name(entity_name):
  """Prepares the entity name for a Wolfram API query, phrased as 'What is ...?'"""
  # Underscores separate the words of the entity name; the query joins them with '+'
  concatenated = entity_name.replace("_", "+")

  # %3f is the URL-encoded question mark
  query_name = f"What+is+{concatenated}%3f"
  return query_name

def get_wolfram_alpha_subject_info(entity_name, subject_dict):
  """Retrieves data from the Wolfram|Alpha Short Answers API."""
  query_name = get_wolfram_short_answer_query_name(entity_name)

  # Construct the Wolfram|Alpha API call (HTTPS avoids sending the app ID in plain text)
  url = f"https://api.wolframalpha.com/v1/result?appid={wolfram}&i={query_name}"

  # Send a GET request to the Wolfram|Alpha API
  response = requests.get(url, timeout=30)

  # If the search was successful, store the plain-text answer
  if response.status_code == 200:
    search_results = response.text
    if search_results:
      subject_dict[entity_name] = search_results
    else:
      print(f"No results found for {entity_name}")
  return subject_dict
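
The query string above is assembled by hand, so characters such as apostrophes or accented letters are passed through as they are. As an alternative sketch (not the method used in this notebook), the standard library's `urllib.parse.quote_plus` can encode the whole question. The function name `build_short_answer_query` is hypothetical.

```python
from urllib.parse import quote_plus

def build_short_answer_query(entity_name):
  """Builds a URL-encoded 'What is ...?' question from an underscore-separated name."""
  question = f"What is {entity_name.replace('_', ' ')}?"
  # quote_plus turns spaces into '+' and percent-encodes everything else that is unsafe
  return quote_plus(question)

query_apostrophe = build_short_answer_query("Bertrand's_postulate")
query_accent = build_short_answer_query("Cesàro_Mean")

print(query_apostrophe)  # What+is+Bertrand%27s+postulate%3F
print(query_accent)      # What+is+Ces%C3%A0ro+Mean%3F
```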

In [None]:
mizar_wolfram_alpha_dict = load_or_create_dict("mizar_short_ans_api.pkl", get_wolfram_alpha_subject_info, mizar_subj_list)

print(f"dictionary length: {len(mizar_wolfram_alpha_dict)}")
random_pairs = random.sample(list(mizar_wolfram_alpha_dict.items()), 5)

for key, value in random_pairs:
  print(f"{key}:\n{value}")

dictionary length: 65
Bertrand's_postulate:
Bertrand's postulate posits that if n is greater than 3, then there is always at least one prime between n and two times n minus two
Binomial_theorem:
a theorem giving the expansion of a binomial raised to a given power
Jordan_curve_theorem:
If J is a simple closed curve in R^2, then the Jordan curve theorem, also called the Jordan‐Brouwer theorem states that R^2-J has two components (an inside and outside), with J the boundary of each.  The Jordan curve theorem is a standard result in algebraic topology with a rich history.  A complete proof can be found in Hatcher, or in classic texts such as Spanier.  Recently, a proof checker was used by a Japanese‐Polish team to create a computer‐checked proof of the theorem
Subset:
a set whose members are members of another set; a set contained within another set
Fundamental_theorem_of_algebra:
The fundamental theorem of algebra states that every polynomial equation having complex coefficients and degre

In [None]:
# Repeat for the additional topics scraped from Wolfram MathWorld

In [None]:
# Load topics
dict_file = "topics.pkl"
topics = read_pickle(dict_file)

# Load sub_topics
dict_file = "sub_topics.pkl"
sub_topics = read_pickle(dict_file)

# Load alg_2
dict_file = "alg_2.pkl"
alg_2 = read_pickle(dict_file)

# Load alg_3
dict_file = "alg_3.pkl"
alg_3 = read_pickle(dict_file)

In [None]:
scraped_subj_list = []
for key, value in topics.items():
  scraped_subj_list.append(key)
  scraped_subj_list.extend(value)

for key, value in sub_topics.items():
  key = split_camel_case(key)
  scraped_subj_list.append(key)
  scraped_subj_list.extend(value)

for key, value in alg_2.items():
  scraped_subj_list.append(key)
  scraped_subj_list.extend(value)

for key, value in alg_3.items():
  scraped_subj_list.append(key)
  scraped_subj_list.extend(value)

scraped_subj_list = list(set(scraped_subj_list))
scraped_subj_list = [x.replace(" ", "_") for x in scraped_subj_list]
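
The four loops above share one pattern: append each key, then extend with its values. A minimal sketch of that pattern as a helper follows, assuming each dictionary maps a topic name to a list of strings. The helper name `flatten_topics` and the demo dictionaries are hypothetical. Sorting the de-duplicated names is an optional addition that makes the list order reproducible between runs.

```python
import re

def split_camel_case(text):
  return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', text)

def flatten_topics(topic_dict, key_transform=None):
  """Returns every key (optionally transformed) followed by its values, in one flat list."""
  names = []
  for key, values in topic_dict.items():
    names.append(key_transform(key) if key_transform else key)
    names.extend(values)
  return names

# Hypothetical stand-ins for the pickled dictionaries
demo_topics = {"Algebra": ["Linear Algebra", "Group Theory"]}
demo_sub_topics = {"NumberTheory": ["Prime Numbers", "Group Theory"]}

combined = flatten_topics(demo_topics)
combined += flatten_topics(demo_sub_topics, key_transform=split_camel_case)

# De-duplicate, replace spaces with underscores, and sort for a stable order
demo_subj_list = sorted({name.replace(" ", "_") for name in combined})
print(demo_subj_list)
```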

In [None]:
scraped_wikidata_dict = load_or_create_dict("wolfram_wikidata_api.pkl", get_wikidata_subject_info, scraped_subj_list)

print(f"dictionary length: {len(scraped_wikidata_dict)}")
random_pairs = random.sample(list(scraped_wikidata_dict.items()), 5)

for key, value in random_pairs:
  print(f"{key}:\n{value}")

dictionary length: 3227
Row_Vector:
[{'id': 'Q2916003', 'title': 'Q2916003', 'pageid': 2790383, 'concepturi': 'http://www.wikidata.org/entity/Q2916003', 'repository': 'wikidata', 'url': '//www.wikidata.org/wiki/Q2916003', 'display': {'label': {'value': 'row vector', 'language': 'en'}, 'description': {'value': '1 × n matrix in linear algebra', 'language': 'en'}}, 'label': 'row vector', 'description': '1 × n matrix in linear algebra', 'match': {'type': 'label', 'language': 'en', 'text': 'row vector'}}]
Cesàro_Mean:
[{'id': 'Q2045894', 'title': 'Q2045894', 'pageid': 1973519, 'concepturi': 'http://www.wikidata.org/entity/Q2045894', 'repository': 'wikidata', 'url': '//www.wikidata.org/wiki/Q2045894', 'display': {'label': {'value': 'Cesàro mean', 'language': 'en'}}, 'label': 'Cesàro mean', 'match': {'type': 'label', 'language': 'en', 'text': 'Cesàro mean'}}, {'id': 'Q98382962', 'title': 'Q98382962', 'pageid': 96598998, 'concepturi': 'http://www.wikidata.org/entity/Q98382962', 'repository': '

In [None]:
scraped_wolfram_alpha_dict = load_or_create_dict("wolfram_short_ans_api.pkl", get_wolfram_alpha_subject_info, scraped_subj_list)

print(f"dictionary length: {len(scraped_wolfram_alpha_dict)}")
random_pairs = random.sample(list(scraped_wolfram_alpha_dict.items()), 5)

for key, value in random_pairs:
  print(f"{key}:\n{value}")

wolfram_short_ans_api.pkl not found, creating wolfram_short_ans_api.pkl...
dictionary length: 5026
Productive_Property:
A property that is always fulfilled by the product of topological spaces, if it is fulfilled by each single factor.  Examples of productive properties are connectedness, and path‐connectedness, axioms T_0, T_1, T_2 and T_3, regularity and complete regularity, the property of being a Tychonoff space, but not axiom T_4 and normality, which does not even pass, in general, from a space X to X×X.  Metrizability is not productive, but is preserved by products of at most ℵ_0 spaces
Green_Space:
A G‐space provides local notions of harmonic, hyperharmonic, and superharmonic functions.  When there exists a nonconstant superharmonic function greater than 0, it is a called a Green space.  Examples are R^n (for n≥3) and any bounded domain of R^n
Gelfand_Mazur_Theorem:
If A is a unital Banach algebra where every nonzero element is invertible, then A is the algebra of complex number