Program written in Java to interact with the CIA World Factbook to answer trivia-ish questions


This program connects to the CIA World Factbook to answer a series of questions posed by my COMS 1007 instructor.

The questions are:

1. List countries in South America that are prone to earthquakes.
2. Find the country with the lowest elevation point in Europe.
3. List all countries in the southeastern hemisphere.
4. List countries in Asia with more than 10 political parties.
5. Find all countries that have the color blue in their flag.
6. Find the top 5 countries with the highest electricity consumption per capita. (Electricity
consumption / population)
7. A landlocked country is one that is entirely enclosed by land. For example, Austria is landlocked
and shares its borders with Germany, Czech Republic, Hungary, etc. There are certain countries
that are entirely landlocked by a single country. Find these countries.
8. I want to go on a vacation with a friend. Our goal is to visit as many capital cities as we can in as
short a geographical distance as possible. To make things easier (and not worry about spherical geometry), we are fine with travelling to capitals that are within 10 degrees of latitude and longitude of each other. Find the lat/long coordinates and the list of countries/capitals so that the number of capitals is maximized.
9. Wild card – List countries who grant their citizens universal suffrage at age 20.
10. Wild card – List countries with unemployment rates below 5%. 


I first collected the URL and country name for every entry in the
World Factbook, then removed territories and dependencies from that list
using a blacklist. Once I had an ArrayList of only legitimate countries,
I opened an HTTP request for each one, read its entire HTML page into a
String, and stored that String in a new Text object. The rest of the
program reads the HTML from the Text objects instead of opening HTTP
requests again, which makes the program faster.
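
The fetch-once idea above can be sketched as a small cache. This is only an illustration, not the program's actual Text class: the PageCache name and the fetcher parameter (any function that turns a URL into HTML, such as an HTTP GET) are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: each page is downloaded once and then reused.
class PageCache {
    private final Map<String, String> pages = new HashMap<>();
    // Supplied by the caller, e.g. a function that performs an HTTP GET.
    private final Function<String, String> fetcher;

    PageCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Returns the stored HTML for the URL, fetching only on the first request.
    String get(String url) {
        return pages.computeIfAbsent(url, fetcher);
    }
}
```

Every question can then call get() for a country's URL without triggering another request.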

Question 1: I used regexes to find the continent name and to parse out the
paragraph that describes the natural hazards. If the continent name equals
the query continent name, the program searches the natural hazards
description for the name of the natural hazard (e.g.

Question 2: I used a regex to find the countries whose continent matches
the query continent. For each one, I parsed the lowest elevation into a
String and converted it to a Double. I kept track of the current
lowest elevation, replacing it whenever a lower one appeared. The final
lowest elevation is then printed. 
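
A minimal sketch of the running-minimum step, assuming the elevation field has already been extracted as text of the form "lowest point: <place> <signed number> m". That format, the LowestElevation class, and its method names are assumptions for illustration, not the original code.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LowestElevation {
    // Assumed field format: "lowest point: <place name> <signed number> m".
    private static final Pattern METERS =
            Pattern.compile("(-?\\d[\\d,]*(?:\\.\\d+)?)\\s*m\\b");

    // Extracts the elevation in meters, or NaN when no number is present.
    static double parseMeters(String field) {
        Matcher m = METERS.matcher(field);
        if (!m.find()) {
            return Double.NaN;
        }
        return Double.parseDouble(m.group(1).replace(",", ""));
    }

    // Returns the country with the lowest parsed elevation, or null if none parse.
    static String lowest(Map<String, String> fieldsByCountry) {
        String best = null;
        double bestMeters = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, String> e : fieldsByCountry.entrySet()) {
            double meters = parseMeters(e.getValue());
            if (!Double.isNaN(meters) && meters < bestMeters) {
                bestMeters = meters;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

The map passed to lowest() is expected to contain only countries that already matched the query continent.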

Question 3: I used a regex that finds the geographic coordinates line that
lists the latitude and longitude. Then I check if the latitude is North or
South depending on the query, and then the longitude for East or West
depending on the query. Then the countries that satisfy the query are

Question 4: I first used a regex to check whether the current country is
on the query continent. If so, I went on to parse the
paragraph that lists the political parties. Because the parties are
separated by semicolons, I count the number of semicolons in that
paragraph, and if it exceeds the number of political parties given by
the user, the country is printed.
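
As a hedged sketch of the counting step: the example below counts non-empty entries in the semicolon-separated text rather than the semicolons themselves (a list of N parties contains N - 1 separators), so it is a variation on the approach described above. PartyCounter and its methods are hypothetical names.

```java
class PartyCounter {
    // Counts non-empty entries in a semicolon-separated field.
    static int countParties(String partiesField) {
        int count = 0;
        for (String entry : partiesField.split(";")) {
            if (!entry.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // True when the field lists more parties than the user's threshold.
    static boolean hasMoreThan(String partiesField, int threshold) {
        return countParties(partiesField) > threshold;
    }
}
```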

Question 5: I used a regex to parse out the description of the current
country's flag. From there, I know that for each color, there is a
description about what it stands for, so there are always spaces around
the color. I look for the color that the user is querying with spaces
around it, and return all the countries that contain " blue ", for

Question 6: For this question, I removed all unnecessary spaces from the
full text of the HTML to make parsing with regexes easier. Because there
are several electricity categories (consumption, production, etc.), it
was a bit difficult to find a regex that matched only consumption. I then
parsed out the electricity value and its magnitude word (trillion,
billion, million) so that the calculations use the correct scale.
Parsing out population was less of an ordeal: I used another regex, saved
the match to a String, and converted it to a Double. I scaled everything
down by 1000 during the calculations because a trillion does not fit in a
Java int (a long or a double could hold it, but scaling was an easy way
around the problem). After all of that parsing and converting,
electricity consumption is divided by population. One ArrayList holds the
country names and a second holds the per capita values in the same order,
both kept for reference, while a separate ArrayList of the per capita
values is sorted. The top 5 countries, or however many the user asks for,
are then found by looking up the highest sorted values in the reference
ArrayLists.
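
The ranking step might look like the sketch below. It differs from the description above in two ways that should be read as assumptions: values are kept in a double without scaling (a double holds 1e12 comfortably), and one list of small objects is sorted instead of keeping parallel reference lists. ElectricityRanking, Country, toKwh, and topPerCapita are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class ElectricityRanking {
    static final class Country {
        final String name;
        final double kwh;
        final double population;

        Country(String name, double kwh, double population) {
            this.name = name;
            this.kwh = kwh;
            this.population = population;
        }

        double perCapita() {
            return kwh / population;
        }
    }

    // Applies the magnitude word parsed from the page to the numeric value.
    static double toKwh(double value, String magnitude) {
        switch (magnitude.toLowerCase(Locale.ROOT)) {
            case "trillion": return value * 1e12;
            case "billion":  return value * 1e9;
            case "million":  return value * 1e6;
            default:         return value;
        }
    }

    // Names of the n countries with the highest consumption per capita.
    static List<String> topPerCapita(List<Country> countries, int n) {
        List<Country> sorted = new ArrayList<>(countries);
        sorted.sort(Comparator.comparingDouble(Country::perCapita).reversed());
        List<String> names = new ArrayList<>();
        for (Country c : sorted.subList(0, Math.min(n, sorted.size()))) {
            names.add(c.name);
        }
        return names;
    }
}
```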

Question 7: For this one, I looked for the word "landlocked" first. Then
if that was true, I looked at the border countries. The countries are
separated by commas, so I counted the commas in the parsed section of
HTML which I again got using a regex. If the number of commas equaled 1,
then that meant that the country was landlocked within another country.
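
A rough sketch of this check, under the assumption that the two pieces of text (the note that may contain "landlocked" and the comma-separated border list) have already been extracted. Because the exact span captured by the original regex is not shown, the sketch counts list entries instead of raw commas. Enclaves and its methods are hypothetical names.

```java
import java.util.Locale;

class Enclaves {
    // Counts non-empty entries in a comma-separated list.
    static int countEntries(String commaSeparated) {
        int count = 0;
        for (String entry : commaSeparated.split(",")) {
            if (!entry.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // True when the country is described as landlocked and borders one country.
    static boolean isEnclosedBySingleCountry(String geographyNote, String borderCountries) {
        boolean landlocked = geographyNote.toLowerCase(Locale.ROOT).contains("landlocked");
        return landlocked && countEntries(borderCountries) == 1;
    }
}
```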

Question 8: For this question, I parsed out the name of the capital and
its coordinates for each country. The latitude and longitude were
converted to the ranges 0 - 180 and 0 - 360 respectively. Then they were
parsed into Doubles and passed as parameters to a Capital object, which
was added to an ArrayList of Capitals. Then I created a 2D array that
is one larger in each dimension than the size of the capitals ArrayList
(I also removed one country, Nauru, because it does not have an official
capital). The array is .size() + 1 because the first row and column hold
the information for each capital: the capitals are added to the
array's top row and leftmost column. After that, I implemented an
algorithm that calculates the difference in latitude and in longitude
between the two capitals of each cell of the "grid". If both differences
come out less than 5 (because the search uses a capital as the center point),
the cell of the grid is marked as "Yes". Otherwise it is marked
as "No". After this, the number of Yeses per
row is counted and stored in an ArrayList. Then I extract the highest
number of Yeses from the ArrayList, find its index, and add 1, since the
2D array had to account for capital names and
was thus one index larger than the number of capitals.
With the row index of the largest number of Yeses figured out, I
implemented a for loop to return the capital information for every cell
in that row marked "Yes".

Note: This algorithm is not guaranteed to find the best answer, since it only calculates
the maximum number of capitals using one capital as the center point.
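
The capital-centered search can be sketched without the Yes/No grid by counting neighbors directly for each candidate center. This is an illustrative simplification rather than the original implementation: the Capital fields, CapitalCluster, and the half parameter (5 in the description above) are assumptions, and, like a plain difference of coordinates, it does not handle wrap-around at the 0/360 longitude boundary.

```java
import java.util.ArrayList;
import java.util.List;

class CapitalCluster {
    static final class Capital {
        final String name;
        final double lat; // assumed already shifted into 0 - 180
        final double lon; // assumed already shifted into 0 - 360

        Capital(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Capitals whose latitude and longitude both differ from the center by less than half.
    static List<Capital> near(Capital center, List<Capital> all, double half) {
        List<Capital> group = new ArrayList<>();
        for (Capital c : all) {
            if (Math.abs(c.lat - center.lat) < half && Math.abs(c.lon - center.lon) < half) {
                group.add(c);
            }
        }
        return group;
    }

    // Tries every capital as the center and keeps the largest group found.
    static List<Capital> bestCluster(List<Capital> all, double half) {
        List<Capital> best = new ArrayList<>();
        for (Capital center : all) {
            List<Capital> group = near(center, all, half);
            if (group.size() > best.size()) {
                best = group;
            }
        }
        return best;
    }
}
```

Each call to near() corresponds to one row of the grid, and the size of the returned list corresponds to that row's count of Yeses.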

Question 9: This was one of my wildcard questions: find the
countries that grant universal suffrage at an age that the user
inputs. I used a regex to parse out the suffrage information, and
checked if it contained the query age. 
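
One way to make the age check stricter than a plain substring search (so that a query of 20 does not match a larger number containing those digits) is sketched below. The assumed field format "<age> years of age; universal" and the Suffrage class are illustrative guesses, not taken from the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Suffrage {
    // Assumed field format: "<age> years of age; universal".
    private static final Pattern AGE = Pattern.compile("\\b(\\d+)\\s+years of age");

    // True when the field states the query age and mentions universal suffrage.
    static boolean isUniversalAt(String suffrageField, int queryAge) {
        Matcher m = AGE.matcher(suffrageField);
        return m.find()
                && Integer.parseInt(m.group(1)) == queryAge
                && suffrageField.contains("universal");
    }
}
```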

Question 10: Another wildcard; find the countries that have an
unemployment rate below a certain percentage. I used a regex to parse out 
the unemployment information, parsed the number into a double, and then
checked if it was less than the query percentage.
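
A small sketch of the parse-and-compare step, assuming the unemployment field has been extracted as text that begins with a percentage such as "4.8% (2012 est.)". That sample format, the Unemployment class, and its method names are assumptions for illustration.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Unemployment {
    // Matches the first number followed by a percent sign.
    private static final Pattern PERCENT = Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*%");

    // Returns the first percentage in the field, or empty when none is found.
    static OptionalDouble parseRate(String field) {
        Matcher m = PERCENT.matcher(field);
        if (!m.find()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(m.group(1)));
    }

    // True only when a rate was found and it is below the threshold.
    static boolean isBelow(String field, double threshold) {
        OptionalDouble rate = parseRate(field);
        return rate.isPresent() && rate.getAsDouble() < threshold;
    }
}
```

Returning an empty OptionalDouble keeps countries with no reported rate out of the results instead of treating them as zero.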