# Training a GPT-3 model for nl2sparql
OpenAI only supports fine-tuning on its GPT-3 base models: ada, babbage, curie, and davinci. It does not yet support fine-tuning
on GPT-3.5 (aka ChatGPT, aka "gpt-3.5-turbo"). This notebook examines the models for which fine-tuning is available.

In [51]:
# pip install these as needed
import openai
import pandas as pd

In [66]:
# these imports should not require installation
import json
import string
import random
import subprocess


We will need the LC-QuAD 2.0 training data as a DataFrame. You may need to change this file path.

In [38]:
lcquad_filename = '../../lcquad2.0.train.json'
lcquad_df = pd.read_json(lcquad_filename)

### Set up the openai api key
The API key is a secret, so it should not be checked into GitHub. The `secrets.ini` file should look like this:
```
[OPENAI]
OPENAI_API_KEY=<openai key here>
[WANDB]
WANDB_API_KEY=<wandb key here>
```
Add your own api key there, or ask Max for his.
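The `config['OPENAI']['OPENAI_API_KEY']` lookup below raises a bare `KeyError` when the file is missing, because `ConfigParser.read` silently skips files it cannot open. A minimal sketch that fails with a clearer message; `load_secret` is a hypothetical helper, not something this notebook defines:

```python
import configparser

def load_secret(path, section, key):
    """Return one value from an ini file, with a clear error if it is absent."""
    config = configparser.ConfigParser()
    # read() returns the list of files it parsed and skips missing ones silently
    if not config.read(path):
        raise FileNotFoundError(f"could not read {path}")
    # has_option() is False both for a missing section and for a missing key
    if not config.has_option(section, key):
        raise KeyError(f"{key} not found in [{section}] of {path}")
    return config[section][key]
```

For example, `load_secret('secrets.ini', 'OPENAI', 'OPENAI_API_KEY')`.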

In [102]:
import configparser
config = configparser.ConfigParser()
config.read('secrets.ini')

['secrets.ini']

In [104]:
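import os
# `!` shell commands (the openai CLI below) inherit the kernel's environment
os.environ.update({'OPENAI_API_KEY': config['OPENAI']['OPENAI_API_KEY']})
# The openai library reads OPENAI_API_KEY only when it is imported, so also set it directly
openai.api_key = config['OPENAI']['OPENAI_API_KEY']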

### Sanity check: openai's tutorial example
This is here just to validate that the api setup is working

In [16]:
def generate_prompt(animal):
    return """Suggest three names for an animal that is a superhero.

Animal: Cat
Names: Captain Sharpclaw, Agent Fluffball, The Incredible Feline
Animal: Dog
Names: Ruff the Protector, Wonder Canine, Sir Barks-a-Lot
Animal: {}
Names:""".format(
        animal.capitalize()
    )


In [130]:
def run_prompt(prompt="", model="text-davinci-003", temperature=0.6, stop=None):
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        temperature=temperature,
        max_tokens=100,
        stop=stop
    )
    return response

In [18]:
response = run_prompt(generate_prompt('cow'))
response['choices'][0]['text']

' Super Moo-er, The Amazing Bovine, Mighty Hoofy'

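### Sanity check: the openai CLI should be working as well
The CLI reads `OPENAI_API_KEY` from its environment. Commands run with `!` inherit the kernel's `os.environ`, which we updated above, so the CLI should already see the key. Each `!` command runs in its own subshell, so the `export` below only lasts for that one command; it is harmless, but it does not carry over to later cells.

In [106]:
# Export the key in a subshell (this does not persist to later `!` commands)
!eval `cat secrets.ini | grep OPENAI_API_KEY | sed 's/^/export /'`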

In [40]:
!which openai

/opt/homebrew/anaconda3/envs/conda210/bin/openai


In [41]:
# this is a quick way to validate that the CLI is working
!openai api fine_tunes.list

{
  "data": [
    {
      "created_at": 1677959387,
      "fine_tuned_model": "ada:ft-personal-2023-03-04-20-06-30",
      "hyperparams": {
        "batch_size": 1,
        "learning_rate_multiplier": 0.1,
        "n_epochs": 4,
        "prompt_loss_weight": 0.01
      },
      "id": "ft-90O7QVVHQ86vwYsLsRe3lFbZ",
      "model": "ada",
      "object": "fine-tune",
      "organization_id": "org-6Tm9wvTU2DAyCUVamArcvPxV",
      "result_files": [
        {
          "bytes": 43024,
          "created_at": 1677960391,
          "filename": "compiled_results.csv",
          "id": "file-xZCujiVRhZnleRx5KowmvYt7",
          "object": "file",
          "purpose": "fine-tune-results",
          "status": "processed",
          "status_details": null
        }
      ],
      "status": "succeeded",
      "training_files": [
        {
          "bytes": 37497,
          "created_at": 1677959386,
          "filename": "openai_train.jsonl",
          "id": "file-PtdM8l5yCZybCnRknRIFCIiv",
          

## Fine-tuning
Choose a base model

In [26]:
base_model = 'ada'

Does the model know anything about SPARQL already?

In [22]:
response = run_prompt("Please show me a sample sparql query", model=base_model)
response['choices'][0]['text']

" that works in the sample query above. Please provide a sample query that works in the sample query above.\n\nPlease provide sample query that doesn't work in the sample query above.\n\nPlease provide sample query that doesn't work in the sample"

In [27]:
response = run_prompt("What does a sparql query do?", model=base_model)
response['choices'][0]['text']

"\n\nQueries are basically a way to extract information from a database in a way that I can't do in the database.\n\nFor example, I have a database of records that contain a list of people in a particular city. I want"

In [28]:
response = run_prompt("Please translate this question to sparql: 'What is Delta Air Line's periodical literature mouthpiece'", model=base_model)
response['choices'][0]['text']

"\n\n\n\n(4/4)\n\nsparql: 'What is Delta Air Line's periodical literature mouthpiece?'\n\n\n\n(4/4)\n\nsparql: 'What is Delta Air Line's periodical"

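OpenAI expects fine-tuning data as JSONL prompt/completion pairs. See https://platform.openai.com/docs/guides/fine-tuning/prepare-training-data.
The function below prepares the LC-QuAD data. Note that it prefers the paraphrased question, falling back to the original question when the paraphrase is empty or longer than 2048 characters. This choice is something we could experiment with.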
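Each training example is one JSON object per line with a `prompt` and a `completion`. A minimal sketch for a single row, using the same ` ->` separator and ` \n` terminator as `make_finetune_data`; `to_finetune_record` is a hypothetical name:

```python
import json

def to_finetune_record(question, sparql):
    """Serialize one question/SPARQL pair as a JSONL line."""
    record = {
        "prompt": f"{question} ->",     # separator marks where the prompt ends
        "completion": f" {sparql} \n",  # leading space; trailing " \n" acts as the stop sequence
    }
    return json.dumps(record)

line = to_finetune_record("Which is the operating income for Qantas?",
                          "select distinct ?answer where { wd:Q32491 wdt:P3362 ?answer}")
print(line)
```

The printed line has the same shape as the Qantas entry in `training_json[0:5]`.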

In [42]:
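def make_finetune_data(df, filename=None):
    training_json = []
    for index, row in df.iterrows():
        d = {}
        # Prefer the paraphrase; fall back to the original question if it is missing, empty, or too long
        question = row['paraphrased_question']
        if not isinstance(question, str) or len(question) == 0 or len(question) > 2048:
            question = row['question']
        # " ->" marks the end of the prompt; " \n" ends the completion and is used as the stop sequence
        d['prompt'] = f"{question} ->"
        d['completion'] = f" {row['sparql_wikidata']} \n"
        training_json.append(json.dumps(d))
    if filename is None:
        return training_json
    with open(filename, 'w') as f:
        for line in training_json:
            f.write(line + '\n')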

Play with it a bit to see that it's working

In [45]:
training_json = make_finetune_data(lcquad_df[0:9])

In [46]:
training_json[0:5]

['{"prompt": "What is Delta Air Line\'s periodical literature mouthpiece? ->", "completion": "  select distinct ?obj where { wd:Q188920 wdt:P2813 ?obj . ?obj wdt:P31 wd:Q1002697 }  \\n"}',
 '{"prompt": "What is the name of Ranavalona I\'s husband\'s child? ->", "completion": " SELECT ?answer WHERE { wd:Q169794 wdt:P26 ?X . ?X wdt:P22 ?answer} \\n"}',
 '{"prompt": "Are Jeff Bridges and Lane Chandler both photographers? ->", "completion": " ASK WHERE { wd:Q174843 wdt:P106 wd:Q1804811 . wd:Q174843 wdt:P106 wd:Q33231 } \\n"}',
 '{"prompt": "What range are the papers at the Monique Genonceaux about? ->", "completion": " SELECT ?answer WHERE { wd:Q675176 wdt:P515 ?X . ?X wdt:P156 ?answer} \\n"}',
 '{"prompt": "Which is the operating income for Qantas? ->", "completion": " select distinct ?answer where { wd:Q32491 wdt:P3362 ?answer} \\n"}']

Now create a training file. Choose how many examples you want to start with.

In [47]:
sample_size = 200

In [55]:
def random_train_file_name(N=5):
    random_s = ''.join(random.choices(string.ascii_uppercase + string.digits, k=N))    
    return f'openai_train_{random_s}.jsonl'

In [56]:
train_file_name = random_train_file_name()

In [59]:
make_finetune_data(lcquad_df.iloc[0:sample_size], filename=train_file_name)

We expect this call simply to confirm that everything looks good; it should not prompt. If it does prompt, it will crash, because
we're redirecting input from /dev/null. If you see a crash, run the command in a shell without the `< /dev/null` redirect.
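For a quick offline check, a rough approximation of the suffix checks that `prepare_data` reports in its output (` ->` on every prompt, ` \n` on every completion) can be sketched as follows. This covers only an assumed subset of what the tool checks and is not a reimplementation of it; `check_finetune_file` is hypothetical:

```python
import json

def check_finetune_file(path, prompt_suffix=" ->", completion_suffix=" \n"):
    """Count prompt/completion pairs and list lines that break the formatting conventions."""
    count, problems = 0, []
    with open(path) as f:
        for lineno, raw in enumerate(f, start=1):
            if not raw.strip():
                continue  # skip blank lines
            record = json.loads(raw)
            count += 1
            prompt = record.get("prompt", "")
            completion = record.get("completion", "")
            if not prompt.endswith(prompt_suffix):
                problems.append((lineno, "prompt missing separator"))
            if not completion.startswith(" "):
                problems.append((lineno, "completion should start with a space"))
            if not completion.endswith(completion_suffix):
                problems.append((lineno, "completion missing stop suffix"))
    return count, problems
```

Running `check_finetune_file(train_file_name)` on the file written above should report no problems.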

In [61]:
!openai tools fine_tunes.prepare_data -f {train_file_name} < /dev/null

Analyzing...

- Your file contains 200 prompt-completion pairs
- All prompts end with suffix ` ->`
- All completions end with suffix ` \n`

No remediations found.

You can use your file for fine-tuning:
> openai api fine_tunes.create -t "openai_train_SZ3K1.jsonl"

After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string ` ->` for the model to start generating completions, rather than continuing with the prompt. Make sure to include `stop=[" \n"]` so that the generated texts ends at the expected place.
Once your model starts training, it'll approximately take 5.19 minutes to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you.


Now we create the training run. Note that many more hyperparameters can be specified. See https://platform.openai.com/docs/guides/fine-tuning/hyperparameters

### Create the fine-tuning run
In principle, this command creates the job and streams back the messages. In practice, I always see `Stream interrupted (client disconnected)` when I run it in a Jupyter notebook.
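Since the stream keeps disconnecting, one option is to ask the CLI not to stream at all, and to pass hyperparameters explicitly. A sketch that builds the argument list; the flag names (`--n_epochs`, `--learning_rate_multiplier`, `--suffix`, `--no_follow`) are our understanding of the legacy `openai` CLI and should be checked against `openai api fine_tunes.create -h` before use. `fine_tune_create_args` is hypothetical:

```python
import shlex

def fine_tune_create_args(train_file, base_model, n_epochs=None,
                          learning_rate_multiplier=None, suffix=None, no_follow=True):
    """Build argv for `openai api fine_tunes.create` (legacy CLI; flag names assumed)."""
    args = ["openai", "api", "fine_tunes.create", "-t", train_file, "-m", base_model]
    if n_epochs is not None:
        args += ["--n_epochs", str(n_epochs)]
    if learning_rate_multiplier is not None:
        args += ["--learning_rate_multiplier", str(learning_rate_multiplier)]
    if suffix is not None:
        args += ["--suffix", suffix]
    if no_follow:
        args.append("--no_follow")  # return right after creating the job instead of streaming
    return args

print(shlex.join(fine_tune_create_args("openai_train_SZ3K1.jsonl", "ada", n_epochs=4)))
```

`subprocess.run(fine_tune_create_args(train_file_name, base_model), stdin=subprocess.DEVNULL, check=True)` would then create the job, and progress can be checked with `fine_tunes.follow`.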

In [63]:
!openai api fine_tunes.create -t {train_file_name} -m {base_model} < /dev/null

Upload progress: 100%|████████████████████| 38.8k/38.8k [00:00<00:00, 24.2Mit/s]
Uploaded file from openai_train_SZ3K1.jsonl: file-wLPJKzEDfq1u4DupT7D0PDxH
Created fine-tune: ft-iCN2sLZ7X207KdKTZGhXwGNo
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-03-05 11:33:42] Created fine-tune: ft-iCN2sLZ7X207KdKTZGhXwGNo

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-iCN2sLZ7X207KdKTZGhXwGNo



In [75]:
def get_last_run_id():
    # List all fine-tune jobs; check=True surfaces CLI errors instead of a confusing JSON decode error
    result = subprocess.run(['openai', 'api', 'fine_tunes.list'], stdout=subprocess.PIPE, check=True)
    runs = json.loads(result.stdout)
    # Assumes the list is ordered oldest-first, so the last entry is the newest job
    last_run = runs['data'][-1]
    # fine_tuned_model is null (None) until the job has succeeded
    return last_run.get('id'), last_run.get('fine_tuned_model')
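`get_last_run_id` takes the last entry of `data`, which assumes the CLI lists jobs oldest-first. A sketch that picks the newest job by its `created_at` timestamp instead, given the JSON text the CLI prints; `latest_run` is a hypothetical helper:

```python
import json

def latest_run(fine_tunes_json):
    """Return (id, fine_tuned_model) of the most recently created fine-tune job."""
    runs = json.loads(fine_tunes_json).get("data", [])
    if not runs:
        return None, None
    newest = max(runs, key=lambda run: run["created_at"])
    # fine_tuned_model stays null (None) until the job succeeds
    return newest.get("id"), newest.get("fine_tuned_model")
```

For example, `latest_run(subprocess.run(['openai', 'api', 'fine_tunes.list'], stdout=subprocess.PIPE, check=True).stdout)`; `json.loads` accepts the raw bytes.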

In [78]:
last_run, fine_tuned_model = get_last_run_id()
last_run, fine_tuned_model

('ft-iCN2sLZ7X207KdKTZGhXwGNo', 'ada:ft-personal-2023-03-05-19-39-25')

Again, in principle this command streams all messages until completion, but in practice it also times out.

So rerun this cell until you see "Status: succeeded 🎉".
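Instead of rerunning the cell by hand, you could poll the job's status. A sketch, assuming the legacy CLI's `fine_tunes.get -i <id>` prints the job as JSON with a `status` field (values such as `pending`, `running`, `succeeded`, `failed`, `cancelled`); `fetch_status` and `wait_for_fine_tune` are hypothetical helpers, and `fetch` is injectable so the loop can be tried without the API:

```python
import json
import subprocess
import time

def fetch_status(run_id):
    """Ask the legacy CLI for one fine-tune's current status (command assumed)."""
    result = subprocess.run(["openai", "api", "fine_tunes.get", "-i", run_id],
                            stdout=subprocess.PIPE, check=True)
    return json.loads(result.stdout)["status"]

def wait_for_fine_tune(run_id, fetch=fetch_status, poll_seconds=60,
                       timeout_seconds=3600, sleep=time.sleep):
    """Poll until the job reaches a terminal status, or raise after timeout_seconds."""
    waited = 0
    while True:
        status = fetch(run_id)
        if status in ("succeeded", "failed", "cancelled"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{run_id} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

For example, `wait_for_fine_tune(last_run)` blocks the notebook until the job finishes, checking once a minute.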

In [74]:
!openai api fine_tunes.follow -i {last_run} 

[2023-03-05 11:33:42] Created fine-tune: ft-iCN2sLZ7X207KdKTZGhXwGNo
[2023-03-05 11:36:53] Fine-tune costs $0.02
[2023-03-05 11:36:53] Fine-tune enqueued. Queue number: 0
[2023-03-05 11:36:54] Fine-tune started
[2023-03-05 11:37:36] Completed epoch 1/4
[2023-03-05 11:38:05] Completed epoch 2/4
[2023-03-05 11:38:34] Completed epoch 3/4
[2023-03-05 11:39:03] Completed epoch 4/4
[2023-03-05 11:39:26] Uploaded model: ada:ft-personal-2023-03-05-19-39-25
[2023-03-05 11:39:26] Uploaded result file: file-7UXK6O1KZZzvkzsMEqzAAU8Q
[2023-03-05 11:39:26] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m ada:ft-personal-2023-03-05-19-39-25 -p <YOUR_PROMPT>


Get the fine-tuned model name. Make sure it's not None.

In [79]:
last_run, fine_tuned_model = get_last_run_id()
last_run, fine_tuned_model

('ft-iCN2sLZ7X207KdKTZGhXwGNo', 'ada:ft-personal-2023-03-05-19-39-25')

Take a look at a few questions

In [124]:
response = run_prompt("What is Delta Air Line's periodical literature mouthpiece? ->", 
                      model=fine_tuned_model, stop=[" \n"])
response['choices'][0]['text']

'  select distinct ?obj where { wd:Q206897 wdt:P108 ?obj . ?obj wdt:P31 wd:Q284047 } '
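The output has the right shape but not the right IDs: the training record for this question (the first entry of `training_json` above) uses `wd:Q188920 wdt:P2813`. A crude way to score this is exact match after normalizing whitespace and case. A minimal sketch; `exact_match` is hypothetical, and lowercasing would also conflate case inside string literals:

```python
import re

def normalize_sparql(query):
    """Collapse runs of whitespace and lowercase, so formatting differences don't count."""
    return re.sub(r"\s+", " ", query).strip().lower()

def exact_match(generated, gold):
    return normalize_sparql(generated) == normalize_sparql(gold)

gold = "select distinct ?obj where { wd:Q188920 wdt:P2813 ?obj . ?obj wdt:P31 wd:Q1002697 }"
generated = "  select distinct ?obj where { wd:Q206897 wdt:P108 ?obj . ?obj wdt:P31 wd:Q284047 } "
print(exact_match(generated, gold))  # False: same template, different IDs
```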

In [83]:
lcquad_df.iloc[sample_size+5]['paraphrased_question']

'What grant was gotten Mary Tyler Moore ?'

In [131]:
response = run_prompt(f"{lcquad_df.iloc[sample_size+5]['paraphrased_question']} ->", 
                      model=fine_tuned_model, stop=[" \n"])
response['choices'][0]['text']
response

<OpenAIObject text_completion id=cmpl-6qqmO7fJ8aXZHBi1R8B3kBnLQFOcO at 0x13459d4f0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": " SELECT ?obj WHERE { wd:Q429161 p:P166 ?s . ?s ps:P166 ?obj . ?s pq:P585 ?x filter(contains(YEAR(?x),'1962')) }"
    }
  ],
  "created": 1678053960,
  "id": "cmpl-6qqmO7fJ8aXZHBi1R8B3kBnLQFOcO",
  "model": "ada:ft-personal-2023-03-05-21-03-15",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 53,
    "prompt_tokens": 9,
    "total_tokens": 62
  }
}

### Fine-tune further if you want
Repeat the steps above, but pass the fine-tuned model name as the base model.
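One thing to watch: the row at `sample_size+5`, used as an unseen example above, falls inside the next training slice `lcquad_df.iloc[sample_size:sample_size+next_sample_size]`. A minimal sketch for carving consecutive, non-overlapping ranges; `disjoint_slices` is hypothetical:

```python
def disjoint_slices(n_rows, sizes):
    """Split range(n_rows) into consecutive, non-overlapping (start, stop) pairs."""
    ranges, start = [], 0
    for size in sizes:
        stop = min(start + size, n_rows)  # never run past the end of the data
        ranges.append((start, stop))
        start = stop
    return ranges
```

For example, `first, second, heldout = disjoint_slices(len(lcquad_df), [sample_size, next_sample_size, 100])` (the held-out size of 100 is arbitrary); train on `lcquad_df.iloc[first[0]:first[1]]`, then on `second`, and evaluate only on `heldout`.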

In [88]:
next_sample_size = 1000

In [89]:
next_train_file_name = random_train_file_name()

In [91]:
make_finetune_data(lcquad_df.iloc[sample_size:sample_size+next_sample_size], filename=next_train_file_name)

In [93]:
!openai tools fine_tunes.prepare_data -f {next_train_file_name} < /dev/null

Analyzing...

- Your file contains 1000 prompt-completion pairs
- All prompts end with suffix ` ->`
- All completions end with suffix ` \n`

No remediations found.

You can use your file for fine-tuning:
> openai api fine_tunes.create -t "openai_train_QDVLV.jsonl"

After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string ` ->` for the model to start generating completions, rather than continuing with the prompt. Make sure to include `stop=[" \n"]` so that the generated texts ends at the expected place.
Once your model starts training, it'll approximately take 16.18 minutes to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you.


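#### Note: we're using the fine_tuned_model here
Pass the new training file, `next_train_file_name`. The recorded run below accidentally passed `train_file_name`, which is why the CLI warns about a duplicated file and the job retrained on the same 200 examples.

In [95]:
!openai api fine_tunes.create -t {next_train_file_name} -m {fine_tuned_model} < /dev/null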

Found potentially duplicated files with name 'openai_train_SZ3K1.jsonl', purpose 'fine-tune' and size 38794 bytes
file-wLPJKzEDfq1u4DupT7D0PDxH
Upload progress: 100%|████████████████████| 38.8k/38.8k [00:00<00:00, 19.3Mit/s]is file anyway: 
Uploaded file from openai_train_SZ3K1.jsonl: file-YTD6TqmbQ3ZBH1B5IZSRdJqo
Created fine-tune: ft-x7hEyQMoafisWYGmadSSPLWA
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-03-05 12:53:14] Created fine-tune: ft-x7hEyQMoafisWYGmadSSPLWA

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-x7hEyQMoafisWYGmadSSPLWA



In [96]:
last_run, fine_tuned_model = get_last_run_id()
last_run, fine_tuned_model

('ft-x7hEyQMoafisWYGmadSSPLWA', None)

Again, run this cell until you see the success message

In [99]:
!openai api fine_tunes.follow -i {last_run} < /dev/null

[2023-03-05 12:53:14] Created fine-tune: ft-x7hEyQMoafisWYGmadSSPLWA
[2023-03-05 12:58:23] Fine-tune costs $0.02
[2023-03-05 12:58:23] Fine-tune enqueued. Queue number: 2
[2023-03-05 12:58:27] Fine-tune is in the queue. Queue number: 1
[2023-03-05 12:59:00] Fine-tune is in the queue. Queue number: 0
[2023-03-05 13:00:23] Fine-tune started
[2023-03-05 13:01:07] Completed epoch 1/4
[2023-03-05 13:01:44] Completed epoch 2/4
[2023-03-05 13:02:22] Completed epoch 3/4
[2023-03-05 13:02:54] Completed epoch 4/4
[2023-03-05 13:03:15] Uploaded model: ada:ft-personal-2023-03-05-21-03-15
[2023-03-05 13:03:16] Uploaded result file: file-owhfEUuqduH3kiZ6G0ub7dBW
[2023-03-05 13:03:16] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m ada:ft-personal-2023-03-05-21-03-15 -p <YOUR_PROMPT>


Pick up the fine-tuned model name. Note: this model was trained starting from the previous fine-tuned model, but it gets a new name of its own.
Presumably, the previous fine-tuned model is still around. OpenAI's legacy fine-tuning guide describes deleting a fine-tuned model with `openai api models.delete -i <FINE_TUNED_MODEL>` (untested here).

In [100]:
last_run, fine_tuned_model = get_last_run_id()
last_run, fine_tuned_model

('ft-x7hEyQMoafisWYGmadSSPLWA', 'ada:ft-personal-2023-03-05-21-03-15')

### Sync with wandb

You may need a paid openai account for this to work.

In [122]:
project_name = 'nl2sparql'

In [123]:
!WANDB_API_KEY=`cat secrets.ini | grep WANDB_API_KEY | sed 's/^WANDB_API_KEY=//'` openai wandb sync --project {project_name} < /dev/null

[34m[1mwandb[0m: Currently logged in as: [33mdaziff-berkeley[0m ([33maskwiki[0m). Use [1m`wandb login --relogin`[0m to force relogin
No new successful fine-tunes were found
🎉 wandb sync completed successfully


## Check syntactic correctness

In [132]:
def generate_sparql(question, model='ada', stop=None):
    response = run_prompt(f"{question} ->", model=model, stop=stop)
    # print(response)
    translation = response['choices'][0]['text']
    # print(translation)
    if translation is None or len(translation) == 0:
        return None
    # logging.info(f'sparql {translation}')
    return translation


In [137]:
generate_sparql("What is the name of Bill Gate's mother?", model=fine_tuned_model, stop=[" \n"])

' SELECT ?answer WHERE { wd:Q510543 wdt:P1081 ?X . ?X wdt:P1042 ?answer}'

In [156]:
def generate_lots_of_sparql(l, generator=None):
    result = []
    count = 1
    if generator is None:
        generator = lambda s: generate_sparql(s)
    for s in l:
        if count % 10 == 0:
            print(count)
        result.append(generator(s))
        count += 1
    print(count-1)
    return result


In [143]:
generator = lambda s: generate_sparql(s, model=fine_tuned_model, stop=[" \n"])

In [144]:
generator("What is the name of Bill Gate's mother?")

' SELECT ?answer WHERE { wd:Q42949 wdt:P166 ?X . ?X wdt:P1546 ?answer}'

In [146]:
# note: [-10:-1] selects nine rows; it stops before the last row
sparqls = generate_lots_of_sparql(lcquad_df[-10: -1]['question'], generator=generator)

10


In [147]:
sparqls

[' SELECT ?value WHERE { wd:Q42159 p:P166 ?s . ?s ps:P166 wd:Q254075 . ?s pq:P585 ?value}',
 ' ASK WHERE { wd:Q20277 wdt:P2960 ?obj filter(?obj = 45.6) } ',
 '  select distinct ?obj where { wd:Q202785 wdt:P157 ?obj . ?obj wdt:P31 wd:Q1002657 } ',
 ' select ?ent where { ?ent wdt:P31 wd:Q2424 . ?ent wdt:P1079 ?obj . ?ent wdt:P25 wd:Q168070 } ORDER BY DESC(?obj)LIMIT 5 ',
 " SELECT DISTINCT ?sbj ?sbj_label WHERE { ?sbj wdt:P31 wd:Q878593 . ?sbj rdfs:label ?sbj_label . FILTER(CONTAINS(lcase(?sbj_label), 's')) . FILTER (lang(?sbj_label) = 'en') } LIMIT 25 ",
 " SELECT DISTINCT ?sbj ?sbj_label WHERE { ?sbj wdt:P31 wd:Q1439618 . ?sbj rdfs:label ?sbj_label . FILTER(STRSTARTS(lcase(?sbj_label), 'h')) . FILTER (lang(?sbj_label) = 'en') } LIMIT 25 ",
 " SELECT ?value WHERE { wd:Q174600 p:P26 ?s . ?s ps:P26 ?x filter(contains(?x,'117.6')) . ?s pq:P26 ?value}",
 ' select distinct ?answer where { wd:Q91291 wdt:P2819 ?answer}',
 " SELECT ?answer WHERE { wd:Q2210156 wdt:P2250 ?answer . ?answer wdt:P25

In [148]:
from wikibaseintegrator import wbi_helpers
from wikibaseintegrator.wbi_config import config as wbi_config
import logging

In [149]:
wbi_config['USER_AGENT'] = 'AskwikiBot/1.0 (https://www.wikidata.org/wiki/User:What_Tottles_Meant)'
wbi_config['BACKOFF_MAX_TRIES'] = 1


In [150]:
from requests.exceptions import HTTPError
def run_sparql(query):
    try:
        results = wbi_helpers.execute_sparql_query(query)
    except HTTPError as he:
        logging.error(f'HTTPError {he}')
        print(f"failed query {query}")
        return None
    # print(results)
    if 'boolean' in results:
        return pd.DataFrame([{'Boolean': results['boolean'] }])
    jsonResult = [dict([(k, b[k]['value']) for k in b]) for b in results['results']['bindings']]
    df = pd.DataFrame.from_dict(jsonResult)
    return df
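`run_sparql` builds its DataFrame from the standard SPARQL 1.1 JSON results layout (`head.vars` and `results.bindings`, or `boolean` for ASK). One wrinkle: when a SELECT returns no bindings, the DataFrame has no columns at all. A sketch of the conversion that keeps the variable names as columns; `sparql_json_to_rows` is hypothetical, and it returns plain lists so it can be tested without a network call:

```python
def sparql_json_to_rows(results):
    """Turn a SPARQL 1.1 JSON result dict into (columns, rows)."""
    if "boolean" in results:  # ASK query
        return ["Boolean"], [{"Boolean": results["boolean"]}]
    columns = results["head"]["vars"]
    # Unbound variables are simply absent from a binding; pandas fills them with NaN later
    rows = [{var: binding[var]["value"] for var in binding}
            for binding in results["results"]["bindings"]]
    return columns, rows
```

Then `columns, rows = sparql_json_to_rows(results)` followed by `pd.DataFrame(rows, columns=columns)` keeps the columns even when there are no rows.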


In [151]:
def validate_queries(qs):
    validation_results = []
    count = 0
    for q in qs:
        # print(q)
        df = run_sparql(q)
        result_count = 0
        if df is None:
            run_result = 'Fail'
            print(f"Failed query number {count}")
        else:
            run_result = 'Pass'
            result_count = len(df)
        validation_results.append((run_result, result_count))
        count += 1
    return validation_results 

In [152]:
validate_queries(sparqls)

[('Pass', 0),
 ('Pass', 1),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 0)]

In [157]:
sparqls = generate_lots_of_sparql(lcquad_df[0: 10]['question'], generator=generator)

10
10


In [158]:
validate_queries(sparqls)

[('Pass', 0),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 1),
 ('Pass', 0),
 ('Pass', 0),
 ('Pass', 0)]
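Every generated query passes, which only shows that it parses and runs against the endpoint; in both batches only one query returns any rows. A small sketch for tallying the `(status, row_count)` pairs that `validate_queries` returns (`summarize_validation` is hypothetical):

```python
from collections import Counter

def summarize_validation(results):
    """Tally (status, row_count) pairs into pass/fail/non-empty counts."""
    statuses = Counter(status for status, _ in results)
    non_empty = sum(1 for status, n in results if status == "Pass" and n > 0)
    return {"total": len(results), "passed": statuses["Pass"],
            "failed": statuses["Fail"], "non_empty": non_empty}

# The ten (status, row_count) pairs reported for lcquad_df[0:10] above
batch = [('Pass', 0)] * 6 + [('Pass', 1)] + [('Pass', 0)] * 3
print(summarize_validation(batch))
```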

In [159]:
sparqls

[' SELECT ?answer WHERE { wd:Q868 wdt:P47 ?answer . ?answer wdt:P29 wd:Q107752}',
 ' SELECT ?answer WHERE { wd:Q93472 wdt:P166 ?X . ?X wdt:P1446 ?answer}',
 ' SELECT ?answer WHERE { wd:Q228344 wdt:P100 ?X . ?X wdt:P2294 ?answer}',
 ' select distinct ?answer where { wd:Q979 wdt:P4719 ?answer}',
 ' select distinct ?answer where { wd:Q2165 wdt:P2260 ?answer}',
 " SELECT DISTINCT ?sbj ?sbj_label WHERE { ?sbj wdt:P31 wd:Q186979 . ?sbj rdfs:label ?sbj_label . FILTER(STRSTARTS(lcase(?sbj_label), 'p')) . FILTER (lang(?sbj_label) = 'en') } LIMIT 25 ",
 ' ASK WHERE { wd:Q139813 wdt:P2149 ?obj filter(?obj < 15.139813) } ',
 '  select distinct ?obj where { wd:Q737 wdt:P47 ?obj . ?obj wdt:P31 wd:Q367912 } ',
 ' select distinct ?answer where { wd:Q4235 wdt:P3691 ?answer}',
 ' SELECT ?answer WHERE { wd:Q242456 wdt:P119 ?X . ?X wdt:P802 ?answer}']

In [161]:
lcquad_df[0:10]['sparql_wikidata']

0     select distinct ?obj where { wd:Q188920 wdt:P...
1    SELECT ?answer WHERE { wd:Q169794 wdt:P26 ?X ....
2    ASK WHERE { wd:Q174843 wdt:P106 wd:Q1804811 . ...
3    SELECT ?answer WHERE { wd:Q675176 wdt:P515 ?X ...
4    select distinct ?answer where { wd:Q32491 wdt:...
5    SELECT DISTINCT ?sbj ?sbj_label WHERE { ?sbj w...
6    ASK WHERE { wd:Q4180017 wdt:P6257 ?obj filter(...
7     select distinct ?obj where { wd:Q202729 wdt:P...
8    select distinct ?answer where { wd:Q235975 wdt...
9    SELECT ?answer WHERE { wd:Q1356316 wdt:P156 ?X...
Name: sparql_wikidata, dtype: object

In [162]:
validate_queries(lcquad_df[0:10]['sparql_wikidata'])

[('Pass', 1),
 ('Pass', 1),
 ('Pass', 1),
 ('Pass', 0),
 ('Pass', 1),
 ('Pass', 0),
 ('Pass', 1),
 ('Pass', 0),
 ('Pass', 1),
 ('Pass', 1)]

In [169]:
lcquad_df.iloc[6]['paraphrased_question']

'Does malin 1 have a right ascension lower than 15.1398?'

In [170]:
lcquad_df.iloc[6]['sparql_wikidata']

'ASK WHERE { wd:Q4180017 wdt:P6257 ?obj filter(?obj < 15.1398) } '

In [168]:
sparqls[6]

' ASK WHERE { wd:Q139813 wdt:P2149 ?obj filter(?obj < 15.139813) } '

In [165]:
lcquad_df.iloc[6]

NNQT_question           Does the {right ascension} of the {Malin 1} {l...
uid                                                                 18423
subgraph                                              boolean with filter
template_index                                                        441
question                Is the right ascension of malin 1 less than 15...
sparql_wikidata         ASK WHERE { wd:Q4180017 wdt:P6257 ?obj filter(...
sparql_dbpedia18        ASK { ?statement1 <http://www.w3.org/1999/02/2...
template                            ASK ?sbj ?pred ?obj filter ?obj = num
answer                                                                 []
template_id                                                             3
paraphrased_question    Does malin 1 have a right ascension lower than...
Name: 6, dtype: object
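Row 6 shows the pattern: the generated query has the same ASK-with-filter template as the gold query but refers to different things (`wd:Q139813 wdt:P2149` instead of `wd:Q4180017 wdt:P6257`). A rough diagnostic is to compare the Wikidata IDs each query mentions. A minimal sketch; the prefix list covers only the prefixes seen in this notebook, and `id_overlap` is hypothetical:

```python
import re

# Q/P identifiers after the Wikidata prefixes used in these queries
ID_PATTERN = re.compile(r"\b(?:wd|wdt|p|ps|pq):([PQ]\d+)\b")

def wikidata_ids(query):
    """Set of Q/P identifiers a query refers to."""
    return set(ID_PATTERN.findall(query))

def id_overlap(generated, gold):
    gen, ref = wikidata_ids(generated), wikidata_ids(gold)
    return {"shared": gen & ref, "missing": ref - gen, "extra": gen - ref}

gold = 'ASK WHERE { wd:Q4180017 wdt:P6257 ?obj filter(?obj < 15.1398) } '
generated = ' ASK WHERE { wd:Q139813 wdt:P2149 ?obj filter(?obj < 15.139813) } '
print(id_overlap(generated, gold))
```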
