In [18]:
import logging, sys
# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# Temporarily disable all logging; comment out the next line to re-enable it
logging.disable(sys.maxsize)
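The call above silences logging for the whole session. If the goal is only to mute a few noisy cells, a scoped variant may be handier. The sketch below is one way to do that; `logging_silenced` is a hypothetical helper name, and it assumes that restoring to `logging.NOTSET` afterwards (that is, no global override) is acceptable.

```python
import logging
from contextlib import contextmanager


@contextmanager
def logging_silenced(level=logging.CRITICAL):
    """Suppress log records at `level` and below inside the with-block."""
    logging.disable(level)
    try:
        yield
    finally:
        # Assumption: no global override was active before, so reset to NOTSET
        logging.disable(logging.NOTSET)


log = logging.getLogger("demo")
with logging_silenced():
    silenced = log.isEnabledFor(logging.ERROR)   # checked while muted
restored = log.isEnabledFor(logging.ERROR)       # checked after the block
```

Wrapping a single index build or query in `with logging_silenced():` keeps the rest of the notebook's logging untouched.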

In [19]:
# fetch "New York City" page from Wikipedia
from pathlib import Path

import requests
response = requests.get(
    'https://en.wikipedia.org/w/api.php',
    params={
        'action': 'query',
        'format': 'json',
        'titles': 'New York City',
        'prop': 'extracts',
        # 'exintro': True,
        'explaintext': True,
    }
).json()
# Results are keyed by page ID; only one title was requested, so take the first (only) page
page = next(iter(response['query']['pages'].values()))
nyc_text = page['extract']

data_path = Path('data')
data_path.mkdir(exist_ok=True)

# Write as UTF-8 explicitly; the article text contains non-ASCII characters
with open(data_path / 'nyc_text.txt', 'w', encoding='utf-8') as fp:
    fp.write(nyc_text)
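The fetch cell assumes the API returns exactly one page that carries an `extract` field. A minimal sketch of a more defensive extraction step follows. `extract_page_text` is a hypothetical helper name (it is not part of requests or the MediaWiki API), and it works on the already-decoded JSON payload so it can be tried without network access. It assumes the MediaWiki convention of marking a nonexistent title with a `missing` key; the sample payload and its page ID are made up for illustration.

```python
def extract_page_text(payload):
    """Return the plain-text extract of the first page in a MediaWiki query payload."""
    pages = payload.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError("response contains no pages")
    page = next(iter(pages.values()))
    if "missing" in page:
        raise ValueError(f"page not found: {page.get('title', '<unknown>')}")
    text = page.get("extract")
    if not text:
        raise ValueError("page has no 'extract' field; was prop=extracts requested?")
    return text


# Hand-written payload shaped like the API response (illustrative values only)
sample = {
    "query": {
        "pages": {
            "123": {"title": "New York City", "extract": "Example extract text."}
        }
    }
}
sample_text = extract_page_text(sample)
```

In the cell above, this would replace the two lines that pick the page and read `page['extract']`, so that a typo in the title fails loudly instead of raising a `KeyError`.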

In [3]:
# Set your OpenAI API key here (left blank in this notebook)
import os
os.environ['OPENAI_API_KEY'] = ""

In [2]:
from llama_index import GPTTreeIndex, SimpleDirectoryReader, LLMPredictor, ServiceContext
from llama_index.logger import LlamaLogger
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI

In [3]:
# gpt-3 (davinci)
llm_predictor_gpt3 = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))
service_context_gpt3 = ServiceContext.from_defaults(llm_predictor=llm_predictor_gpt3)

# gpt-4
llm_predictor_gpt4 = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-4"))
service_context_gpt4 = ServiceContext.from_defaults(llm_predictor=llm_predictor_gpt4)

In [5]:
documents = SimpleDirectoryReader('data').load_data()

In [None]:
index = GPTTreeIndex.from_documents(documents, service_context=service_context_gpt4)

In [5]:
query_engine = index.as_query_engine(
    service_context=service_context_gpt4,
    verbose=True
)
response_gpt4 = query_engine.query(
    "What battles took place in New York City in the American Revolution?",
)

> Starting query: What battles took place in New York City in the American Revolution?
>[Level 0] Current response: ANSWER: 2

This summary was selected because it discusses the history of New York City, including significant events during the American Revolution, such as the Battle of Long Island. The other summaries do not mention battles or the American Revolution.
>[Level 0] Selected node: [2]/[2]
>[Level 0] Node [2] Summary text: New York City has a rich history, from its beginnings as a Dutch trading post to its growth into a major global city. Slavery played a significant role in the city's early economy, and the African Burying Ground was discovered during construction in the 1990s. The American Revolution saw significant events in New York, including the Battle of Long Island and the establishment of the city as the national capital. Throughout the 19th century, the city's population grew rapidly, and it became a center for industry, commerce, and immigration. The 20th century

In [10]:
str(response_gpt4)

'The Battle of Long Island took place in New York City in the American Revolution.'

In [8]:
response_gpt4.source_nodes[0]

SourceNode(source_text="South Carolina. Most cases were that of domestic slavery, as a New York household then commonly enslaved few or several people. Others were hired out to work at labor. Slavery became integrally tied to New York's economy through the labor of slaves throughout the port, and the banking and shipping industries trading with the American South. During construction in Foley Square in the 1990s, the African Burying Ground was discovered; the cemetery included 10,000 to 20,000 of graves of colonial-era Africans, some enslaved and some free.The 1735 trial and acquittal in Manhattan of John Peter Zenger, who had been accused of seditious libel after criticizing colonial governor William Cosby, helped to establish the freedom of the press in North America. In 1754, Columbia University was founded under charter by King George II as King's College in Lower Manhattan.\n\n\n=== American Revolution ===\n\nThe Stamp Act Congress met in New York in October 1765, as the Sons of L

In [30]:
query_engine = index.as_query_engine(
    service_context=service_context_gpt3,
    verbose=True
)
response_gpt3 = query_engine.query(
    "What battles took place in New York City in the American Revolution?",
)

> Starting query: What battles took place in New York City in the American Revolution?
>[Level 0] Current response: 
ANSWER: 2. This summary was selected because it mentions the American Revolution and the significant events that took place in New York City during this time, such as the Battle of Long Island and the establishment of the city as the national capital.
>[Level 0] Selected node: [2]/[2]
>[Level 0] Node [2] Summary text: New York City has a rich history, from its beginnings as a Dutch trading post to its growth into a major global city. Slavery played a significant role in the city's early economy, and the African Burying Ground was discovered during construction in the 1990s. The American Revolution saw significant events in New York, including the Battle of Long Island and the establishment of the city as the national capital. Throughout the 19th century, the city's population grew rapidly, and it became a center for industry, commerce, and immigration. The 20th century s

In [12]:
str(response_gpt3)

'No battles took place in New York City during the American Revolution. The city was occupied by British forces from 1776 to 1783. The Battle of Long Island (1776), the Battle of White Plains (1776), and the Battle of Fort Washington (1776) all took place in the vicinity of New York City, but not within the city itself.'

In [13]:
response_gpt3.source_nodes[0]

SourceNode(source_text="but his proposal was not acted on. Anger at new military conscription laws during the American Civil War (1861–1865), which spared wealthier men who could afford to pay a $300 (equivalent to $6,602 in 2021) commutation fee to hire a substitute, led to the Draft Riots of 1863, whose most visible participants were ethnic Irish working class.The draft riots deteriorated into attacks on New York's elite, followed by attacks on Black New Yorkers and their property after fierce competition for a decade between Irish immigrants and Black people for work. Rioters burned the Colored Orphan Asylum to the ground, with more than 200 children escaping harm due to efforts of the New York Police Department, which was mainly made up of Irish immigrants. At least 120 people were killed. Eleven Black men were lynched over five days, and the riots forced hundreds of Blacks to flee the city for Williamsburg, Brooklyn, and New Jersey. The Black population in Manhattan fell below 10,

In [31]:
query_engine = index.as_query_engine(
    service_context=service_context_gpt4,
    verbose=True
)

response_gpt4 = query_engine.query(
    "What are the airports in New York City?",
)

> Starting query: What are the airports in New York City?
>[Level 0] Current response: ANSWER: 9

This summary was selected because it mentions that New York City is served by three major airports and discusses plans to expand and improve facilities at LaGuardia Airport. None of the other summaries specifically mention airports in the city.
>[Level 0] Selected node: [9]/[9]
>[Level 0] Node [9] Summary text: New York City is known for its efforts in environmental sustainability, with initiatives such as the Citi Bike program, green office buildings, and a commitment to reducing greenhouse gas emissions. The city's water supply comes from the protected Catskill Mountains watershed, providing pure drinking water without the need for purification. New York City's air quality is within the recommended limits of the World Health Organization, and efforts are being made to revitalize polluted areas like Newtown Creek. The city's government operates under a strong mayor-council system, with th

In [15]:
str(response_gpt4)

'The context information does not provide a complete list of airports in New York City. However, it mentions John F. Kennedy International Airport (JFK), LaGuardia Airport, and Stewart International Airport near Newburgh, New York. Other airports mentioned in the context serving the New York metropolitan area include Long Island MacArthur Airport, Trenton–Mercer Airport, Westchester County Airport, and Teterboro Airport (a general aviation airport).'

In [17]:
response_gpt4.source_nodes[0].source_text

"were the busiest and fourth busiest U.S. gateways for international air passengers, respectively, in 2012; as of 2011, JFK was the busiest airport for international passengers in North America.Plans have advanced to expand passenger volume at a fourth airport, Stewart International Airport near Newburgh, New York, by the Port Authority of New York and New Jersey. Plans were announced in July 2015 to entirely rebuild LaGuardia Airport in a multibillion-dollar project to replace its aging facilities. Other commercial airports in or serving the New York metropolitan area include Long Island MacArthur Airport, Trenton–Mercer Airport and Westchester County Airport. The primary general aviation airport serving the area is Teterboro Airport.\n\n\n=== Ferries ===\n\nThe Staten Island Ferry is the world's busiest ferry route, carrying more than 23 million passengers from July 2015 through June 2016 on the 5.2-mile (8.4 km) route between Staten Island and Lower Manhattan and running 24 hours a 

In [32]:
query_engine = index.as_query_engine(
    service_context=service_context_gpt3,
    verbose=True
)

response_gpt3 = query_engine.query(
    "What are the airports in New York City?",
)

> Starting query: What are the airports in New York City?
>[Level 0] Current response: 
ANSWER: 8. This summary provides information about the city's transportation system, which includes three major airports. It also mentions plans to expand and improve facilities at LaGuardia Airport, which indicates that the airports in New York City are a current topic of discussion.
>[Level 0] Selected node: [8]/[8]
>[Level 0] Node [8] Summary text: New York City is home to numerous cultural institutions, historic sites, and diverse cuisine influenced by its immigrant history. The city is known for its street parades, distinctive regional accent, and sports teams across various leagues. Environmental issues are a concern due to the city's size and density, but efforts have been made to reduce its environmental impact and carbon footprint. New York City has a high rate of public transit use, a growing number of cyclists, and is considered the most walkable large city in the United States.
>[Level 1

In [33]:
print(str(response_gpt3))

The airports in New York City are John F. Kennedy International Airport (JFK), LaGuardia Airport (LGA), Newark Liberty International Airport (EWR), and Stewart International Airport (SWF).


In [21]:
response_gpt3.source_nodes[0].source_text

'it is one of the country\'s biggest sources of pollution and has the lowest per-capita greenhouse gas emissions rate and electricity usage. Governors Island is planned to host a US$1 billion research and education center to make New York City the global leader in addressing the climate crisis.\n\n\n=== Environmental impact reduction ===\nNew York City has focused on reducing its environmental impact and carbon footprint. Mass transit use in New York City is the highest in the United States. Also, by 2010, the city had 3,715 hybrid taxis and other clean diesel vehicles, representing around 28% of New York\'s taxi fleet in service, the most of any city in North America. New York City is the host of Climate Week NYC, the largest Climate Week to take place globally and regarded as major annual climate summit.\nNew York\'s high rate of public transit use, more than 200,000 daily cyclists as of 2014, and many pedestrian commuters make it the most energy-efficient major city in the United St

In [None]:
query_engine = index.as_query_engine(
    service_context=service_context_gpt4,
    verbose=True
)

response_gpt4 = query_engine.query(
    "What battles took place in New York City in the American Revolution?",
)

In [None]:
print(str(response_gpt4))

In [None]:
response_gpt4.source_nodes[0].source_text

In [28]:
response_gpt4 = query_engine.query(
    "Who is the current mayor of New York City?",
)

> Starting query: Who is the current mayor of New York City?
>[Level 0] Current response: ANSWER: 9

This summary was selected because it specifically mentions the current mayor of New York City, Eric Adams.
>[Level 0] Selected node: [9]/[9]
>[Level 0] Node [9] Summary text: New York City is known for its efforts in environmental sustainability, with initiatives such as the Citi Bike program, green office buildings, and a commitment to reducing greenhouse gas emissions. The city's water supply comes from the protected Catskill Mountains watershed, providing pure drinking water without the need for purification. New York City's air quality is within the recommended limits of the World Health Organization, and efforts are being made to revitalize polluted areas like Newtown Creek. The city's government operates under a strong mayor-council system, with the current mayor being Eric Adams. The Democratic Party holds the majority of public offices in the city.

New York City's extensive tra

In [29]:
print(str(response_gpt4))

The context information does not provide the name of the current mayor of New York City.


In [24]:
response_gpt4.source_nodes[0].source_text

"carried by a Republican  presidential election since President Calvin Coolidge won the five boroughs in 1924. A Republican candidate for statewide office has not won all five boroughs of the city since it was incorporated in 1898. In 2012, Democrat Barack Obama became the first presidential candidate of any party to receive more than 80% of the overall vote in New York City, sweeping all five boroughs. Party platforms center on affordable housing, education, and economic development, and labor politics are of importance in the city. Thirteen out of 27 U.S. congressional districts in the state of New York include portions of New York City.New York is one of the most important sources of political fundraising in the United States. At least four of the top five ZIP Codes in the nation for political contributions were in Manhattan for the 2004, 2006, and 2008 elections. The top ZIP Code, 10021 on the Upper East Side, generated the most money for the 2004 presidential campaigns of George W

In [8]:
query_engine = index.as_query_engine(
    service_context=service_context_gpt3,
    verbose=True
)

response_gpt3 = query_engine.query(
    "Who is the current mayor of New York City?",
)

> Starting query: Who is the current mayor of New York City?
>[Level 0] Current response: 
ANSWER: 8. This summary provides information about the city's government, which includes the current mayor, Eric Adams.
>[Level 0] Selected node: [8]/[8]
>[Level 0] Node [8] Summary text: New York City is home to numerous cultural institutions, historic sites, and diverse cuisine influenced by its immigrant history. The city is known for its street parades, distinctive regional accent, and sports teams across various leagues. Environmental issues are a concern due to the city's size and density, but efforts have been made to reduce its environmental impact and carbon footprint. New York City has a high rate of public transit use, a growing number of cyclists, and is considered the most walkable large city in the United States.
>[Level 1] Current response: 
ANSWER: 10. This summary does not provide any information about the current mayor of New York City, so it is not relevant to the question.
>[L

In [9]:
print(str(response_gpt3))

The current mayor of New York City is Bill de Blasio.


In [10]:
response_gpt3.source_nodes[0].source_text

'it is one of the country\'s biggest sources of pollution and has the lowest per-capita greenhouse gas emissions rate and electricity usage. Governors Island is planned to host a US$1 billion research and education center to make New York City the global leader in addressing the climate crisis.\n\n\n=== Environmental impact reduction ===\nNew York City has focused on reducing its environmental impact and carbon footprint. Mass transit use in New York City is the highest in the United States. Also, by 2010, the city had 3,715 hybrid taxis and other clean diesel vehicles, representing around 28% of New York\'s taxi fleet in service, the most of any city in North America. New York City is the host of Climate Week NYC, the largest Climate Week to take place globally and regarded as major annual climate summit.\nNew York\'s high rate of public transit use, more than 200,000 daily cyclists as of 2014, and many pedestrian commuters make it the most energy-efficient major city in the United St

In [12]:
logger = LlamaLogger()
query_engine = index.as_query_engine(
    service_context=service_context_gpt4,
    llama_logger=logger,
    verbose=True,
)
response_gpt4 = query_engine.query(
    "What is the demographic breakdown of NYC by ethnicity?",
)

> Starting query: What is the demographic breakdown of NYC by ethnicity?
>[Level 0] Current response: ANSWER: 4

This summary was selected because it provides the demographic breakdown of New York City by ethnicity, including percentages for White (non-Hispanic), Hispanic or Latino, Black or African American (non-Hispanic), Asian, and Native American (non-Hispanic) populations.
>[Level 0] Selected node: [4]/[4]
>[Level 0] Node [4] Summary text: New York City experiences a humid subtropical climate with cold winters and hot summers. The city receives 49.5 inches of precipitation annually, and average winter snowfall is 29.8 inches. Hurricanes and tropical storms are rare, but Hurricane Sandy in 2012 prompted discussions of constructing seawalls and coastal barriers. The city has a complex park system, including national, state, and city parks. The largest municipal park is Pelham Bay Park in the Bronx. New York City is the most populous city in the United States, with 8,804,190 resident

In [14]:
str(response_gpt4)

"The context information does not provide a complete demographic breakdown of NYC by ethnicity. However, it mentions that in 1940, Whites represented 92% of the city's population, and approximately 37% of the city's population is foreign-born as of 2013. The ten largest sources of foreign-born individuals in the city as of 2011 were the Dominican Republic, China, Mexico, Guyana, Jamaica, Ecuador, Haiti, India, Russia, and Trinidad and Tobago. Additionally, the Asian American population in New York City numbers more than one million according to the 2010 census."

In [16]:
query_engine = index.as_query_engine(
    service_context=service_context_gpt3,
    llama_logger=logger,
    verbose=True,
)

response_gpt3 = query_engine.query(
    "What is the demographic breakdown of NYC by ethnicity?",
)

> Starting query: What is the demographic breakdown of NYC by ethnicity?
>[Level 0] Current response: 
ANSWER: 5. This summary provides information on the population of New York City, including the percentage of White (non-Hispanic), Hispanic or Latino, Black or African American (non-Hispanic), Asian, and Native American (non-Hispanic) residents. It also mentions the city's large foreign-born population and the fastest-growing nationality in the state. This information is directly relevant to the question of the demographic breakdown of NYC by ethnicity.
>[Level 0] Selected node: [5]/[5]
>[Level 0] Node [5] Summary text: New York City is a diverse and multicultural metropolis, with a population of over 8.4 million people. The city has the largest European and non-Hispanic white population of any American city, as well as the largest Jewish population of any city in the world. The city is also home to a significant number of Asian, African, and Latin American immigrants, with various et

In [18]:
str(response_gpt3)

'The demographic breakdown of NYC by ethnicity is as follows: Ukrainian Americans (55,000), Scottish Americans (35,000), Spanish Americans (30,838), Norwegian Americans (20,000), Swedish Americans (20,000), Czech Americans (12,000-14,000), Lithuanian Americans (12,000-14,000), Portuguese Americans (12,000-14,000), Scotch-Irish Americans (12,000-14,000), Welsh Americans (12,000-14,000), Arab Americans (over 160,000), Central Asian Americans (over 30,000), Albanian Americans (concentrated in the Bronx), Greek Americans (concentrated in Astoria, Queens), Cypriot Americans (concentrated in Astoria, Queens), Jewish Americans (1.6 million), Indian Americans (20%), Korean Americans (15%), and other ethnicities.'

In [19]:
logger = LlamaLogger()
query_engine = index.as_query_engine(
    service_context=service_context_gpt4,
    llama_logger=logger,
    verbose=True,
)

response_gpt4 = query_engine.query(
    "Why is congestion pricing in NYC being introduced?",
)

> Starting query: Why is congestion pricing in NYC being introduced?
>[Level 0] Current response: ANSWER: (9) This summary was selected because it mentions congestion pricing in New York City, stating that it is set to go into effect in 2022. The other summaries do not mention congestion pricing or its purpose.
>[Level 0] Selected node: [9]/[9]
>[Level 0] Node [9] Summary text: New York City is known for its efforts in environmental sustainability, with initiatives such as the Citi Bike program, green office buildings, and a commitment to reducing greenhouse gas emissions. The city's water supply comes from the protected Catskill Mountains watershed, providing pure drinking water without the need for purification. New York City's air quality is within the recommended limits of the World Health Organization, and efforts are being made to revitalize polluted areas like Newtown Creek. The city's government operates under a strong mayor-council system, with the current mayor being Eric Ada

In [23]:
str(response_gpt4)

'The context information does not explicitly state why congestion pricing is being introduced in NYC.'
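The verbose traces in this notebook report the chosen node in slightly different shapes: `ANSWER: 2`, `ANSWER: (9)` and `ANSWER: 8.` all appear. For anyone who wants to tabulate which node each model picked, a small parser is sketched below. `selected_node` is a hypothetical helper written for this notebook's trace text; it makes no claim about how the library itself parses the answer.

```python
import re

# Matches "ANSWER: 2", "ANSWER: (9)" and "ANSWER: 8." as seen in the traces
_ANSWER_RE = re.compile(r"ANSWER:\s*\(?(\d+)\)?")


def selected_node(trace_line):
    """Return the node number named in a trace line, or None if there is none."""
    match = _ANSWER_RE.search(trace_line)
    return int(match.group(1)) if match else None


examples = [
    ">[Level 0] Current response: ANSWER: 2",
    ">[Level 0] Current response: ANSWER: (9) This summary was selected because ...",
    "ANSWER: 8. This summary provides information about ...",
    "> Starting query: What are the airports in New York City?",
]
choices = [selected_node(line) for line in examples]
```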

In [27]:
str(response_gpt4.source_nodes[0].source_text)

"in September 2016, to shuttle riders between the Jersey Shore and Manhattan, anticipated to start service in 2017; this would be the largest vessel in its class.\n\n\n=== Taxis, vehicles for hire, and trams ===\n\nOther features of the city's transportation infrastructure encompass 13,587 yellow taxicabs; other vehicle for hire companies; and the Roosevelt Island Tramway, an aerial tramway that transports commuters between Roosevelt Island and Manhattan Island.\n\n\n=== Streets and highways ===\n\nDespite New York's heavy reliance on its vast public transit system, streets are a defining feature of the city. The Commissioners' Plan of 1811 greatly influenced the city's physical development. Several of the city's streets and avenues, including Broadway, Wall Street, Madison Avenue, and Seventh Avenue are also used as metonyms for national industries there: the theater, finance, advertising, and fashion organizations, respectively.\nNew York City also has an extensive web of freeways an

In [21]:
query_engine = index.as_query_engine(
    service_context=service_context_gpt3,
    llama_logger=logger,
    verbose=True,
)

response_gpt3 = query_engine.query(
    "Why is congestion pricing in NYC being introduced?",
)

> Starting query: Why is congestion pricing in NYC being introduced?
>[Level 0] Current response: 
ANSWER: 8. This summary provides information about the city's transportation infrastructure, including the introduction of congestion pricing in 2022. It also mentions the city's extensive public transit system, which is heavily relied upon, and the plans to expand and improve facilities at LaGuardia Airport. This information is relevant to the question of why congestion pricing is being introduced in NYC.
>[Level 0] Selected node: [8]/[8]
>[Level 0] Node [8] Summary text: New York City is home to numerous cultural institutions, historic sites, and diverse cuisine influenced by its immigrant history. The city is known for its street parades, distinctive regional accent, and sports teams across various leagues. Environmental issues are a concern due to the city's size and density, but efforts have been made to reduce its environmental impact and carbon footprint. New York City has a high r

In [22]:
str(response_gpt3)

'Congestion pricing in New York City is being introduced in order to reduce traffic congestion and raise revenue for public transportation improvements. The plan is to charge drivers a fee for entering certain parts of the city during peak hours, with the goal of reducing traffic and encouraging people to use public transportation instead.'
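Every comparison in this notebook repeats the same pattern: build a query engine per model, ask the same question, then print each answer. A minimal sketch of a helper that factors this out is shown here. `compare_answers` and `EchoEngine` are hypothetical names introduced for illustration. The helper only assumes that each engine exposes a `query(question)` method whose result converts to a string with `str()`, as the query engines used above do. The stub engine stands in for a real one so the sketch runs without an API key.

```python
def compare_answers(question, engines):
    """Run one question through several engines and collect the answers as strings.

    `engines` maps a label (for example "gpt-4") to any object with a `query(str)` method.
    """
    answers = {}
    for label, engine in engines.items():
        answers[label] = str(engine.query(question))
    return answers


class EchoEngine:
    """Stand-in for a real query engine (hypothetical, for illustration only)."""

    def __init__(self, prefix):
        self.prefix = prefix

    def query(self, question):
        return f"{self.prefix}: {question}"


results = compare_answers(
    "What are the airports in New York City?",
    {"stub-a": EchoEngine("A"), "stub-b": EchoEngine("B")},
)
for label, answer in results.items():
    print(f"[{label}] {answer}")
```

With the real objects from this notebook, the dictionary would instead map labels to `index.as_query_engine(service_context=service_context_gpt4)` and `index.as_query_engine(service_context=service_context_gpt3)`.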