- **Step 1: Connect to BigQuery (Google Cloud Platform)** In this step, we imported our dataset into BigQuery and generated the keys needed to access the database from our notebook. **NOTE:** BigQuery is only free up to a monthly usage limit (on-demand queries are billed by the amount of data they process), so please be careful not to send too many queries at a time.
- **Step 2: Test the database with simple queries** In this step, we wrote simple SQL queries to test the connection.
- **Step 3: Sample queries from ChatGPT and fact-checking** In this step, we created sample queries and checked them against the results returned by our database; more details in the notebook.
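Part of the cost warning in Step 1 can be handled on the client side. If the same fact is checked twice, there is no need to send the same query again. Below is a minimal sketch of an in-memory cache keyed on the SQL text and its parameters. The class name `QueryCache` and the `run_query` callable are hypothetical and are not part of the BigQuery client library. In the notebook, `run_query` could be a small wrapper around `client.query(...)` (an assumption, not shown).

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional


class QueryCache:
    """Send each distinct (SQL, parameters) pair to the backend only once.

    `run_query` is any callable taking (sql, params) and returning a result;
    in this notebook it could wrap `client.query(...).to_dataframe()` (assumption).
    """

    def __init__(self, run_query: Callable[[str, Dict[str, Any]], Any]) -> None:
        self._run_query = run_query
        self._results: Dict[str, Any] = {}
        self.calls = 0  # how many queries were actually sent to the backend

    @staticmethod
    def _key(sql: str, params: Dict[str, Any]) -> str:
        # Hash the stripped SQL text together with the parameters (sorted by name)
        payload = json.dumps({"sql": sql.strip(), "params": params}, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def query(self, sql: str, params: Optional[Dict[str, Any]] = None) -> Any:
        params = params or {}
        key = self._key(sql, params)
        if key not in self._results:
            # Cache miss: this is the only path that reaches the backend
            self.calls += 1
            self._results[key] = self._run_query(sql, params)
        return self._results[key]


# Usage with a stand-in backend (no BigQuery calls are made here)
cache = QueryCache(lambda sql, params: f"rows for {params['president_name']}")
sql = "SELECT * FROM presidents WHERE president_name = @president_name"
cache.query(sql, {"president_name": "Barack Obama"})
cache.query(sql, {"president_name": "Barack Obama"})  # served from the cache
print(cache.calls)  # 1
```

The cache lives only for the notebook session. Results are kept in memory, so this suits small lookups like the ones below rather than large result sets.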

In [36]:
from google.colab import auth
auth.authenticate_user()
print("Authenticated")

Authenticated


In [53]:
import pandas as pd
from google.cloud import bigquery

# Create a "Client" object
PROJECT_NAME = "fact-ai-checker" # REPLACE WITH YOUR PROJECT NAME

client = bigquery.Client(project=PROJECT_NAME)

In [54]:
# Create a reference to the fact_ai_database dataset
dataset_ref = bigquery.DatasetReference(PROJECT_NAME, "fact_ai_database")  # REPLACE WITH YOUR OWN DATASET (NOT TABLE) NAME

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

In [55]:
# List the tables in the dataset
tables = list(client.list_tables(dataset))
for table in tables:
    print(table.table_id)

presidents


In [56]:
# Create a reference to the "presidents" table and fetch its metadata
table_ref = dataset_ref.table("presidents") # Make sure it matches the table name printed above (case sensitive)
presidents = client.get_table(table_ref)

In [57]:
# Display the schema of the "presidents" table
presidents.schema

[SchemaField('president_name', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('pres_num', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('associated_party', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('begin_office', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('end_office', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('url_source', 'STRING', 'NULLABLE', None, None, (), None)]
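Every column in this schema is a STRING, including `pres_num`, `begin_office` and `end_office`. That is fine for the exact-match lookups used later in the notebook. However, ordering and range comparisons on these columns are lexicographic rather than numeric. Here is a small illustration in plain Python, using `pres_num` values from the sample rows in this notebook. The SQL equivalent would be to `CAST` the column to `INT64` before sorting or comparing.

```python
# pres_num values as returned by the STRING column (from the sample rows in this notebook)
pres_nums = ["10", "12", "13", "7", "8"]

# String order compares character by character, so "10" comes before "7"
lexicographic = sorted(pres_nums)
print(lexicographic)  # ['10', '12', '13', '7', '8']

# Casting to int first gives numeric order (in SQL: ORDER BY CAST(pres_num AS INT64))
numeric = sorted(pres_nums, key=int)
print(numeric)  # ['7', '8', '10', '12', '13']

# Exact-match lookups are string comparisons too: "044" does not equal "44"
print("044" == "44")  # False
```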

In [58]:
client.list_rows(presidents, max_results=5).to_dataframe()

| | president_name | pres_num | associated_party | begin_office | end_office | url_source |
|---|---|---|---|---|---|---|
| 0 | John Tyler | 10 | Whig | 1841 | 1845 | |
| 1 | Zachary Taylor | 12 | Whig | 1849 | 1850 | |
| 2 | Millard Fillmore | 13 | Whig | 1850 | 1853 | |
| 3 | Andrew Jackson | 7 | Democratic | 1829 | 1837 | |
| 4 | Martin Van Buren | 8 | Democratic | 1837 | 1841 | |


In [85]:
q1 = f"""
SELECT *
FROM `{PROJECT_NAME}.fact_ai_database.presidents`
"""

query_job = client.query(q1)
query_job.to_dataframe()

| | president_name | pres_num | associated_party | begin_office | end_office | url_source |
|---|---|---|---|---|---|---|
| 0 | John Tyler | 10 | Whig | 1841 | 1845 | |
| 1 | Zachary Taylor | 12 | Whig | 1849 | 1850 | |
| 2 | Millard Fillmore | 13 | Whig | 1850 | 1853 | |
| 3 | Andrew Jackson | 7 | Democratic | 1829 | 1837 | |
| 4 | Martin Van Buren | 8 | Democratic | 1837 | 1841 | |
| 5 | James K. Polk | 11 | Democratic | 1845 | 1849 | |
| 6 | Franklin Pierce | 14 | Democratic | 1853 | 1857 | |
| 7 | James Buchanan | 15 | Democratic | 1857 | 1861 | |
| 8 | Grover Cleveland | 22 | Democratic | 1885 | 1889 | |
| 9 | Grover Cleveland | 24 | Democratic | 1893 | 1897 | |



In [67]:
def evaluate_query(president_name, pres_num, begin_office, end_office):
    """Check a claim against the presidents table.

    Returns the matching row as a DataFrame if exactly one row matches all four
    values, otherwise the string "This is incorrect!".
    """
    # Parameterized query: values are bound through @name placeholders instead of
    # being pasted into the SQL text, which avoids quoting bugs and SQL injection
    query = f"""
    SELECT *
    FROM `{PROJECT_NAME}.fact_ai_database.presidents`
    WHERE president_name = @president_name
      AND pres_num = @pres_num
      AND begin_office = @begin_office
      AND end_office = @end_office
    """
    # Every column in the table schema is STRING, so every parameter is passed as STRING
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("president_name", "STRING", president_name),
            bigquery.ScalarQueryParameter("pres_num", "STRING", pres_num),
            bigquery.ScalarQueryParameter("begin_office", "STRING", begin_office),
            bigquery.ScalarQueryParameter("end_office", "STRING", end_office),
        ]
    )
    result_df = client.query(query, job_config=job_config).to_dataframe()

    # A correct claim should match exactly one row
    if len(result_df) == 1:
        return result_df
    return "This is incorrect!"
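You can test the matching logic of `evaluate_query` without spending BigQuery quota by mirroring the same parameterized lookup against a local SQLite table. This is a swapped-in technique, not part of the notebook. It uses Python's built-in `sqlite3` module, the hypothetical helpers `build_local_db` and `evaluate_claim_local`, and a handful of rows copied from the outputs shown in this notebook (`url_source` is omitted). SQLite's `:name` placeholders play the role of BigQuery's `@name` parameters.

```python
import sqlite3

# A few rows copied from the query outputs in this notebook (url_source omitted)
ROWS = [
    ("Barack Obama", "44", "Democratic", "2009", "2017"),
    ("George W. Bush", "43", "Republican", "2001", "2009"),
    ("William (Bill) Clinton", "42", "Democratic", "1993", "2001"),
    ("Grover Cleveland", "22", "Democratic", "1885", "1889"),
    ("Grover Cleveland", "24", "Democratic", "1893", "1897"),
]


def build_local_db():
    """Create an in-memory table with the same column names (all TEXT, like the STRING schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE presidents (president_name TEXT, pres_num TEXT, "
        "associated_party TEXT, begin_office TEXT, end_office TEXT)"
    )
    conn.executemany("INSERT INTO presidents VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def evaluate_claim_local(conn, president_name, pres_num, begin_office, end_office):
    """Return the single matching row as a dict, or None unless exactly one row matches."""
    # Named placeholders: the values are bound by sqlite3, never spliced into the SQL text
    cur = conn.execute(
        "SELECT * FROM presidents WHERE president_name = :president_name "
        "AND pres_num = :pres_num AND begin_office = :begin_office "
        "AND end_office = :end_office",
        {
            "president_name": president_name,
            "pres_num": pres_num,
            "begin_office": begin_office,
            "end_office": end_office,
        },
    )
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    return dict(zip(columns, rows[0])) if len(rows) == 1 else None


conn = build_local_db()
print(evaluate_claim_local(conn, "William (Bill) Clinton", "42", "1993", "2001"))
print(evaluate_claim_local(conn, "William (Bill) Clinton", "41", "1993", "2001"))  # None
```

Returning `None` instead of a message string keeps the return type predictable. Callers can test `if result is None` instead of comparing against the text "This is incorrect!".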


#### Example 1:
- **Question 1:** "From when to when was Barack Obama the president of the United States?"
- **Answer 1:** "**Barack Obama** served as the **44th** President of the United States. His presidency began on January 20, **2009**, and he served two terms, concluding on January 20, **2017**. Therefore, Barack Obama was the President of the United States from January 20, 2009, to January 20, 2017."
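In the examples that follow, the claimed values (president number, first year, last year) are copied from each answer by hand. Below is a rough sketch of automating that step with regular expressions. The function `extract_claim` is hypothetical, and it relies on the answers being phrased like the ones shown here. It assumes the first ordinal, such as "44th", is the president number, and that the first and last four-digit years bound the term. The president's name still has to be supplied separately.

```python
import re


def extract_claim(answer):
    """Return (pres_num, begin_office, end_office) as strings, or None if not found.

    Assumes the first ordinal ("44th", "42nd", ...) is the president number and the
    first and last four-digit years (1700-2099) are the start and end of the term.
    """
    ordinal = re.search(r"\b(\d{1,2})(?:st|nd|rd|th)\b", answer)
    years = re.findall(r"\b(1[789]\d{2}|20\d{2})\b", answer)
    if ordinal is None or len(years) < 2:
        return None
    return ordinal.group(1), years[0], years[-1]


answer_1 = (
    "Barack Obama served as the 44th President of the United States. His presidency "
    "began on January 20, 2009, and he served two terms, concluding on January 20, 2017. "
    "Therefore, Barack Obama was the President of the United States from January 20, 2009, "
    "to January 20, 2017."
)
print(extract_claim(answer_1))  # ('44', '2009', '2017')
```

The output strings can be passed straight to `evaluate_query`, since every column it compares against is a STRING.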


In [74]:
president_name = "Barack Obama"
pres_num = "44"
begin_office = "2009"
end_office = "2017"
fact_checking = evaluate_query(president_name,pres_num,begin_office,end_office)

In [75]:
fact_checking

| | president_name | pres_num | associated_party | begin_office | end_office | url_source |
|---|---|---|---|---|---|---|
| 0 | Barack Obama | 44 | Democratic | 2009 | 2017 | ['https://fr.wikipedia.org/wiki/Barack_Obama',... |


In [76]:
for index, row in fact_checking.iterrows():
    president_name = row['president_name']
    url_sources = row['url_source']
    print(f"President: {president_name}")
    print(f"Source of info: {url_sources}")

President: Barack Obama
Source of info: ['https://fr.wikipedia.org/wiki/Barack_Obama', 'https://www.whitehouse.gov/about-the-white-house/presidents/barack-obama/']
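The schema above shows that `url_source` is a STRING, and its contents look like a Python list literal. Printing it therefore shows the raw text. To work with the individual URLs, the string can be parsed safely with `ast.literal_eval`, which only accepts literals and never executes code. This is a sketch that assumes every non-empty value follows that list format; `parse_sources` is a hypothetical helper.

```python
import ast


def parse_sources(url_source):
    """Turn a url_source string such as "['https://a', 'https://b']" into a list of URLs.

    Non-strings (e.g. None for NULL), empty strings, and text that does not parse
    as a list or tuple all yield an empty list.
    """
    if not isinstance(url_source, str) or not url_source.strip():
        return []
    try:
        value = ast.literal_eval(url_source)
    except (ValueError, SyntaxError):
        return []
    return [str(u) for u in value] if isinstance(value, (list, tuple)) else []


# Value copied from the output above
raw = ("['https://fr.wikipedia.org/wiki/Barack_Obama', "
       "'https://www.whitehouse.gov/about-the-white-house/presidents/barack-obama/']")
for url in parse_sources(raw):
    print(f"- {url}")
```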


#### Example 2:
- **Question 2:** "From when to when was George W. Bush the president of the United States?"
- **Answer 2:** "**George W. Bush** served as the **43rd** President of the United States. His presidency began on January 20, **2001**, and he served two terms, concluding on January 20, **2009**. Therefore, George W. Bush was the President of the United States from January 20, 2001, to January 20, 2009."


In [77]:
president_name = "George W. Bush"
pres_num = "43"
begin_office = "2001"
end_office = "2009"
fact_checking = evaluate_query(president_name,pres_num,begin_office,end_office)

In [78]:
fact_checking

| | president_name | pres_num | associated_party | begin_office | end_office | url_source |
|---|---|---|---|---|---|---|
| 0 | George W. Bush | 43 | Republican | 2001 | 2009 | ['https://fr.wikipedia.org/wiki/George_W._Bush... |


In [79]:
for index, row in fact_checking.iterrows():
    president_name = row['president_name']
    url_sources = row['url_source']
    print(f"President: {president_name}")
    print(f"Source of info: {url_sources}")

President: George W. Bush
Source of info: ['https://fr.wikipedia.org/wiki/George_W._Bush', 'https://www.whitehouse.gov/about-the-white-house/presidents/george-w-bush/']


#### Example 3:
- **Question 3:** "From when to when was William (Bill) Clinton the president of the United States?"
- **Answer 3:** "**William (Bill) Clinton** served as the **42nd** President of the United States. His presidency began on January 20, **1993**, and he served two terms, concluding on January 20, **2001**. Therefore, Bill Clinton was the President of the United States from January 20, 1993, to January 20, 2001."


In [80]:
president_name = "William (Bill) Clinton"
pres_num = "42"
begin_office = "1993"
end_office = "2001"
fact_checking = evaluate_query(president_name,pres_num,begin_office,end_office)

In [81]:
fact_checking

| | president_name | pres_num | associated_party | begin_office | end_office | url_source |
|---|---|---|---|---|---|---|
| 0 | William (Bill) Clinton | 42 | Democratic | 1993 | 2001 | ['https://fr.wikipedia.org/wiki/Bill_Clinton',... |


In [82]:
for index, row in fact_checking.iterrows():
    president_name = row['president_name']
    url_sources = row['url_source']
    print(f"President: {president_name}")
    print(f"Source of info: {url_sources}")

President: William (Bill) Clinton
Source of info: ['https://fr.wikipedia.org/wiki/Bill_Clinton', 'https://www.whitehouse.gov/about-the-white-house/presidents/william-j-clinton/']


#### Example 4: Same example as before, but with an error! Let's claim Clinton was the 41st president instead of the 42nd

In [83]:
president_name = "William (Bill) Clinton"
pres_num = "41"
begin_office = "1993"
end_office = "2001"
fact_checking = evaluate_query(president_name,pres_num,begin_office,end_office)

In [84]:
fact_checking

'This is incorrect!'
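`evaluate_query` only reports "This is incorrect!" and does not say which part of the claim is wrong. A more informative message could come from looking the president up by name alone and then comparing the claim field by field. That approach is a sketch, not what the notebook does. Note that a name-only lookup can return several rows (Grover Cleveland appears twice), so the caller would need to pick one. The comparison step alone, in plain Python, is shown below with the hypothetical helper `explain_mismatch` and values taken from Example 4.

```python
def explain_mismatch(claim, record):
    """Return {field: {"claimed": ..., "expected": ...}} for every field where claim and record differ.

    Both arguments are plain dicts keyed by the presidents column names; fields present
    only in `record` (e.g. associated_party) are ignored.
    """
    return {
        field: {"claimed": value, "expected": record.get(field)}
        for field, value in claim.items()
        if record.get(field) != value
    }


claim = {"president_name": "William (Bill) Clinton", "pres_num": "41",
         "begin_office": "1993", "end_office": "2001"}
record = {"president_name": "William (Bill) Clinton", "pres_num": "42",
          "associated_party": "Democratic", "begin_office": "1993", "end_office": "2001"}
print(explain_mismatch(claim, record))  # {'pres_num': {'claimed': '41', 'expected': '42'}}
```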