# Data Exploration

### Description of variables in the dataset

SOURCE_SUBREDDIT: the subreddit where the link originates

TARGET_SUBREDDIT: the subreddit where the link ends

POST_ID: the post in the source subreddit that starts the link

TIMESTAMP: time of the post

POST_LABEL: label indicating whether the source post is explicitly negative towards the target post. This column is named LINK_SENTIMENT in the data files used below.

The value is -1 if the source is negative towards the target, and 1 if it is neutral or positive.

The label was created using crowd-sourcing and a text-based classifier, and is better than simple sentiment analysis of the posts. Please see the reference paper for details.

POST_PROPERTIES: a vector representing the text properties of the source post, stored as a comma-separated list of numbers (this column is named PROPERTIES in the data files used below). The vector elements are the following:

1. Number of characters
2. Number of characters without counting white space
3. Fraction of alphabetical characters
4. Fraction of digits
5. Fraction of uppercase characters
6. Fraction of white spaces
7. Fraction of special characters, such as comma, exclamation mark, etc.
8. Number of words
9. Number of unique words
10. Number of long words (at least 6 characters)
11. Average word length
12. Number of unique stopwords
13. Fraction of stopwords
14. Number of sentences
15. Number of long sentences (at least 10 words)
16. Average number of characters per sentence
17. Average number of words per sentence
18. Automated readability index
19. Positive sentiment calculated by VADER
20. Negative sentiment calculated by VADER
21. Compound sentiment calculated by VADER
22. LIWC_Funct
23. LIWC_Pronoun
24. LIWC_Ppron
25. LIWC_I
26. LIWC_We
27. LIWC_You
28. LIWC_SheHe
29. LIWC_They
30. LIWC_Ipron
31. LIWC_Article
32. LIWC_Verbs
33. LIWC_AuxVb
34. LIWC_Past
35. LIWC_Present
36. LIWC_Future
37. LIWC_Adverbs
38. LIWC_Prep
39. LIWC_Conj
40. LIWC_Negate
41. LIWC_Quant
42. LIWC_Numbers
43. LIWC_Swear
44. LIWC_Social
45. LIWC_Family
46. LIWC_Friends
47. LIWC_Humans
48. LIWC_Affect
49. LIWC_Posemo
50. LIWC_Negemo
51. LIWC_Anx
52. LIWC_Anger
53. LIWC_Sad
54. LIWC_CogMech
55. LIWC_Insight
56. LIWC_Cause
57. LIWC_Discrep
58. LIWC_Tentat
59. LIWC_Certain
60. LIWC_Inhib
61. LIWC_Incl
62. LIWC_Excl
63. LIWC_Percept
64. LIWC_See
65. LIWC_Hear
66. LIWC_Feel
67. LIWC_Bio
68. LIWC_Body
69. LIWC_Health
70. LIWC_Sexual
71. LIWC_Ingest
72. LIWC_Relativ
73. LIWC_Motion
74. LIWC_Space
75. LIWC_Time
76. LIWC_Work
77. LIWC_Achiev
78. LIWC_Leisure
79. LIWC_Home
80. LIWC_Money
81. LIWC_Relig
82. LIWC_Death
83. LIWC_Assent
84. LIWC_Dissent
85. LIWC_Nonflu
86. LIWC_Filler

LIWC: Linguistic Inquiry and Word Count (codebook: https://www.liwc.app/static/documents/LIWC-22%20Manual%20-%20Development%20and%20Psychometrics.pdf)
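As a quick sanity check on the layout above, the following sketch parses one properties string into floats and confirms that it has 86 entries. The sample string is synthetic (the rows shown in this notebook are truncated), and `parse_properties` is a hypothetical helper name.

```python
N_PROPERTIES = 86  # length of the vector described in the list above

def parse_properties(raw: str) -> list[float]:
    """Split a comma-separated properties string into a list of floats."""
    values = [float(x) for x in raw.split(",")]
    if len(values) != N_PROPERTIES:
        raise ValueError(f"expected {N_PROPERTIES} values, got {len(values)}")
    return values

# Synthetic example: two plausible leading values followed by zeros
sample = ",".join(["345.0", "298.0"] + ["0.0"] * (N_PROPERTIES - 2))
vector = parse_properties(sample)
print(len(vector), vector[:2])
```

Raising on an unexpected length makes a malformed row visible immediately instead of shifting every later property by one position.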

In [22]:
import pandas as pd
import numpy as np

Here, the two datasets (links found in post titles and links found in post bodies) are combined and we take a look at the resulting dataframe. We then identify all unique subreddits and write them to a separate file.

In [2]:
df_title = pd.read_csv("data/soc-redditHyperlinks-title.tsv", sep="\t")
df_body = pd.read_csv("data/soc-redditHyperlinks-body.tsv", sep="\t")
df_combined = pd.concat([df_body, df_title], ignore_index=True)
display(df_combined.head())

Unnamed: 0,SOURCE_SUBREDDIT,TARGET_SUBREDDIT,POST_ID,TIMESTAMP,LINK_SENTIMENT,PROPERTIES
0,leagueoflegends,teamredditteams,1u4nrps,2013-12-31 16:39:58,1,"345.0,298.0,0.75652173913,0.0173913043478,0.08..."
1,theredlion,soccer,1u4qkd,2013-12-31 18:18:37,-1,"101.0,98.0,0.742574257426,0.019801980198,0.049..."
2,inlandempire,bikela,1u4qlzs,2014-01-01 14:54:35,1,"85.0,85.0,0.752941176471,0.0235294117647,0.082..."
3,nfl,cfb,1u4sjvs,2013-12-31 17:37:55,1,"1124.0,949.0,0.772241992883,0.0017793594306,0...."
4,playmygame,gamedev,1u4w5ss,2014-01-01 02:51:13,1,"715.0,622.0,0.777622377622,0.00699300699301,0...."
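Once the two files are concatenated, a row no longer records whether its link came from a post title or a post body. If that distinction matters later, one option is to tag each frame before combining. The sketch below uses tiny made-up frames, and the `LINK_LOCATION` column name is an assumption, not part of the dataset.

```python
import pandas as pd

# Toy stand-ins for df_title and df_body (values are made up)
toy_title = pd.DataFrame({"SOURCE_SUBREDDIT": ["a", "b"], "TARGET_SUBREDDIT": ["c", "d"]})
toy_body = pd.DataFrame({"SOURCE_SUBREDDIT": ["e"], "TARGET_SUBREDDIT": ["f"]})

# Tag each frame with its origin, then stack them in the same order as the notebook
toy_combined = pd.concat(
    [toy_body.assign(LINK_LOCATION="body"), toy_title.assign(LINK_LOCATION="title")],
    ignore_index=True,
)
print(toy_combined["LINK_LOCATION"].value_counts().to_dict())
```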


## Look at the total number of unique subreddits

In [11]:
# Combine both source and target columns into a single Pandas Series
all_subreddits_series = pd.concat([df_combined['SOURCE_SUBREDDIT'], df_combined['TARGET_SUBREDDIT']])

# Get the unique values from this combined series and convert to a list
unique_subreddit_list = all_subreddits_series.unique().tolist()

print(f"Found {len(unique_subreddit_list)} unique subreddits.")

Found 67180 unique subreddits.
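The same count can be broken down by role. This sketch, on made-up subreddit names, uses Python sets to separate subreddits that only ever link out from those that are only ever linked to.

```python
import pandas as pd

toy = pd.DataFrame({
    "SOURCE_SUBREDDIT": ["a", "a", "b"],
    "TARGET_SUBREDDIT": ["b", "c", "c"],
})

sources = set(toy["SOURCE_SUBREDDIT"])
targets = set(toy["TARGET_SUBREDDIT"])

all_subs = sources | targets       # union: equivalent to the unique count above
only_source = sources - targets    # subreddits that are never linked to
only_target = targets - sources    # subreddits that never link out
print(len(all_subs), sorted(only_source), sorted(only_target))
```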


In [4]:
# Write the unique subreddits to a CSV file
unique_subreddit_df = pd.DataFrame(unique_subreddit_list, columns=['subreddit'])
unique_subreddit_df.to_csv("data/unique_subreddits.csv", index=False)
print("Unique subreddits saved to 'data/unique_subreddits.csv'.")

Unique subreddits saved to 'data/unique_subreddits.csv'.


## Look at the total number of posts

In [10]:
# Display the total number of posts and number in the title and body datasets
total_posts = len(df_combined)
total_title_posts = len(df_title)
total_body_posts = len(df_body)

print(f"Total posts: {total_posts}, Title posts: {total_title_posts}, Body posts: {total_body_posts}")

Total posts: 858488, Title posts: 571927, Body posts: 286561
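The printed counts allow a simple consistency check: the combined frame should have exactly as many rows as the two inputs together. The sketch hard-codes the numbers from the output above rather than reading the files.

```python
# Counts printed by the cell above
total_posts = 858488
total_title_posts = 571927
total_body_posts = 286561

# Concatenation should neither drop nor duplicate rows
assert total_title_posts + total_body_posts == total_posts

title_share = total_title_posts / total_posts
print(f"Share of links found in titles: {title_share:.1%}")
```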


## Expand the properties column
Here we create a separate column for each value in the PROPERTIES column.

In [5]:

# The links are already loaded in df_combined (see the cells above)

# Step 1. Create the list of column names for the PROPERTIES vector
post_props_cols = [
    "num_chars", "num_chars_no_space", "frac_alpha", "frac_digits",
    "frac_upper", "frac_spaces", "frac_special", "num_words",
    "num_unique_words", "num_long_words", "avg_word_length",
    "num_unique_stopwords", "frac_stopwords", "num_sentences",
    "num_long_sentences", "avg_chars_per_sentence", "avg_words_per_sentence",
    "readability_index", "sent_pos", "sent_neg", "sent_compound",
    "LIWC_Funct", "LIWC_Pronoun", "LIWC_Ppron", "LIWC_I", "LIWC_We",
    "LIWC_You", "LIWC_SheHe", "LIWC_They", "LIWC_Ipron", "LIWC_Article",
    "LIWC_Verbs", "LIWC_AuxVb", "LIWC_Past", "LIWC_Present", "LIWC_Future",
    "LIWC_Adverbs", "LIWC_Prep", "LIWC_Conj", "LIWC_Negate", "LIWC_Quant",
    "LIWC_Numbers", "LIWC_Swear", "LIWC_Social", "LIWC_Family",
    "LIWC_Friends", "LIWC_Humans", "LIWC_Affect", "LIWC_Posemo",
    "LIWC_Negemo", "LIWC_Anx", "LIWC_Anger", "LIWC_Sad", "LIWC_CogMech",
    "LIWC_Insight", "LIWC_Cause", "LIWC_Discrep", "LIWC_Tentat",
    "LIWC_Certain", "LIWC_Inhib", "LIWC_Incl", "LIWC_Excl", "LIWC_Percept",
    "LIWC_See", "LIWC_Hear", "LIWC_Feel", "LIWC_Bio", "LIWC_Body",
    "LIWC_Health", "LIWC_Sexual", "LIWC_Ingest", "LIWC_Relativ",
    "LIWC_Motion", "LIWC_Space", "LIWC_Time", "LIWC_Work", "LIWC_Achiev",
    "LIWC_Leisure", "LIWC_Home", "LIWC_Money", "LIWC_Relig", "LIWC_Death",
    "LIWC_Assent", "LIWC_Dissent", "LIWC_Nonflu", "LIWC_Filler"
]

# Step 2. Split PROPERTIES into one float column per property
df_combined[post_props_cols] = df_combined["PROPERTIES"].str.split(",", expand=True).astype(float)
df_combined = df_combined.drop(columns=["PROPERTIES"])
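The assignment above only succeeds if every row splits into exactly as many values as there are column names. A defensive variant checks that first. The sketch below uses a three-column toy frame; `toy_cols` and the sample values are made up.

```python
import pandas as pd

toy_cols = ["num_chars", "num_chars_no_space", "frac_alpha"]
toy = pd.DataFrame({"PROPERTIES": ["345.0,298.0,0.75", "101.0,98.0,0.74"]})

# A comma-separated string holds (number of commas + 1) values
n_values = toy["PROPERTIES"].str.count(",") + 1
assert (n_values == len(toy_cols)).all(), "unexpected number of properties"

# Split into separate float columns and drop the raw string column
expanded = toy["PROPERTIES"].str.split(",", expand=True).astype(float)
expanded.columns = toy_cols
toy = pd.concat([toy.drop(columns=["PROPERTIES"]), expanded], axis=1)
print(toy.dtypes.to_dict())
```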

In [6]:
# First get subreddit counts
subreddit_counts = df_combined["SOURCE_SUBREDDIT"].value_counts()

# Keep only subreddits with at least 20 posts
valid_subreddits = subreddit_counts[subreddit_counts >= 20].index

# Now compute averages only for those
avg_props_by_subreddit = (
    df_combined[df_combined["SOURCE_SUBREDDIT"].isin(valid_subreddits)]
    .groupby("SOURCE_SUBREDDIT")[post_props_cols]
    .mean()
)

# Preview
display(avg_props_by_subreddit.head())


SOURCE_SUBREDDIT,num_chars,num_chars_no_space,frac_alpha,frac_digits,frac_upper,frac_spaces,frac_special,num_words,num_unique_words,num_long_words,...,LIWC_Achiev,LIWC_Leisure,LIWC_Home,LIWC_Money,LIWC_Relig,LIWC_Death,LIWC_Assent,LIWC_Dissent,LIWC_Nonflu,LIWC_Filler
100daysofrejection,38.066667,31.6,0.621454,0.129779,0.132037,0.195114,0.053653,7.466667,7.422222,0.866667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100movies365days,1877.361111,1673.722222,0.702896,0.047292,0.022451,0.114269,0.135543,271.777778,148.888889,49.222222,...,0.002299,0.052544,0.000252,0.000546,0.000576,0.001099,0.000537,0.000245,0.000775,0.000757
13point1,84.125,71.375,0.774844,0.012727,0.124862,0.156525,0.055904,14.0,13.25,3.541667,...,0.030635,0.173018,0.0,0.0,0.0,0.0,0.0,0.0,0.002451,0.0
195,148.0,129.32,0.781407,0.008233,0.18142,0.147186,0.063175,25.48,16.92,3.84,...,0.002806,0.00908,0.000187,0.00542,4e-05,0.000833,0.009586,0.001737,0.005429,0.000607
1984isreality,211.6875,179.90625,0.802653,0.006236,0.113757,0.150318,0.040793,35.1875,27.5,8.65625,...,0.01116,0.004992,0.008231,0.013284,0.002564,0.004811,0.000762,0.001362,0.0,0.00012
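An equivalent way to apply the minimum-post filter is to attach each row's group size with `transform("size")` and filter on it before averaging. The toy data and the threshold of 2 (instead of 20) are chosen only to keep the example small.

```python
import pandas as pd

toy = pd.DataFrame({
    "SOURCE_SUBREDDIT": ["a", "a", "b", "c", "c", "c"],
    "num_words": [10.0, 20.0, 99.0, 1.0, 2.0, 3.0],
})
MIN_POSTS = 2  # the notebook uses 20; lowered here for the toy data

# Number of posts in each row's subreddit, aligned with the original rows
group_size = toy.groupby("SOURCE_SUBREDDIT")["num_words"].transform("size")

toy_avg = (
    toy[group_size >= MIN_POSTS]
    .groupby("SOURCE_SUBREDDIT")[["num_words"]]
    .mean()
)
print(toy_avg)
```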


In [8]:
top5_subreddits = {}

for col in post_props_cols:
    # For each metric, take the top 5 subreddits
    top5_subreddits[col] = (
        avg_props_by_subreddit[[col]]
        .sort_values(by=col, ascending=False)
        .head(5)
    )

# Display the top 5 subreddits by average negative sentiment as an example
print("Top 5 subreddits by avg negative sentiment (VADER):\n")
display(top5_subreddits["sent_neg"])


Top 5 subreddits by avg negative sentiment (VADER):



SOURCE_SUBREDDIT,sent_neg
sjwnews,0.362789
100daysofrejection,0.357067
prostatecancer,0.318643
topcuntsofreddit,0.29152
thatkisscamgif,0.28175
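For a single metric, `nlargest` gives the same ranking without building the full dictionary. The values below are copied from the output above; only three of the five rows are used.

```python
import pandas as pd

# Average negative VADER sentiment, copied from the table above
toy_avg = pd.DataFrame(
    {"sent_neg": [0.318643, 0.362789, 0.357067]},
    index=pd.Index(
        ["prostatecancer", "sjwnews", "100daysofrejection"],
        name="SOURCE_SUBREDDIT",
    ),
)

# Two highest averages, already sorted in descending order
top2 = toy_avg.nlargest(2, "sent_neg")
print(top2)
```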


In [16]:
# Load approved subreddit–country matches
approved = pd.read_csv("data/subreddit_matches_approved.csv")

# --- Ensure consistent column names ---
# Expect columns like 'subreddit' and 'predicted_country' or 'country' in the approved CSV
approved.columns = [c.strip().lower() for c in approved.columns]
if 'country' not in approved.columns and 'predicted_country' in approved.columns:
    approved.rename(columns={'predicted_country': 'country'}, inplace=True)

# --- Merge country info onto source and target subreddits ---
merged = (
    df_combined
    .merge(approved[['subreddit', 'country']], how='left',
           left_on='SOURCE_SUBREDDIT', right_on='subreddit')
    .rename(columns={'country': 'source_country'})
    .drop(columns=['subreddit'])
    .merge(approved[['subreddit', 'country']], how='left',
           left_on='TARGET_SUBREDDIT', right_on='subreddit')
    .rename(columns={'country': 'target_country'})
    .drop(columns=['subreddit'])
)

# --- Keep only interactions where both countries are known ---
merged_valid = merged.dropna(subset=['source_country', 'target_country'])

# --- Aggregate counts of inter-country interactions ---
country_interactions = (
    merged_valid.groupby(['source_country', 'target_country'])
    .size()
    .reset_index(name='n_interactions')
)

# Optional: remove self-interactions
country_interactions = country_interactions.query("source_country != target_country")

print(country_interactions.head())



  source_country target_country  n_interactions
0    Afghanistan          China               1
1    Afghanistan           Iran               2
2    Afghanistan         Mexico               1
3    Afghanistan       Pakistan               1
4        Albania        Armenia               1
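A left merge silently duplicates link rows if the mapping file lists a subreddit more than once. Passing `validate="many_to_one"` to `merge` makes pandas raise instead. The mapping below is made up for illustration.

```python
import pandas as pd

links = pd.DataFrame({"SOURCE_SUBREDDIT": ["india", "canada", "india"]})
mapping = pd.DataFrame({
    "subreddit": ["india", "canada"],
    "country": ["India", "Canada"],
})

# many_to_one: many links per subreddit, at most one mapping row per subreddit
tagged = links.merge(
    mapping, how="left",
    left_on="SOURCE_SUBREDDIT", right_on="subreddit",
    validate="many_to_one",
)

# A duplicated key in the mapping is rejected with MergeError
bad_mapping = pd.concat([mapping, mapping.iloc[[0]]], ignore_index=True)
try:
    links.merge(bad_mapping, how="left", left_on="SOURCE_SUBREDDIT",
                right_on="subreddit", validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(len(tagged), duplicate_detected)
```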


In [17]:
country_interactions.sort_values(by='n_interactions', ascending=False).head(20)

Unnamed: 0,source_country,target_country,n_interactions
920,United Kingdom,Ireland,62
449,Iran,United States,47
372,India,Pakistan,39
662,Pakistan,India,35
491,Israel,"Palestine, State of",33
267,France,Canada,32
472,Ireland,United Kingdom,30
680,"Palestine, State of",Israel,28
126,Brazil,Portugal,26
132,Canada,Australia,25


Among the country pairs with the most interactions, most are either close together geographically or share a common history, for example United Kingdom and Ireland, India and Pakistan, or Brazil and Portugal.
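Several pairs appear in both directions (for example United Kingdom to Ireland and Ireland to United Kingdom). To rank country pairs regardless of direction, the two counts can be summed. A minimal sketch, using four rows copied from the table above:

```python
import pandas as pd

# Directed counts copied from the table above
directed = pd.DataFrame({
    "source_country": ["United Kingdom", "Ireland", "India", "Pakistan"],
    "target_country": ["Ireland", "United Kingdom", "Pakistan", "India"],
    "n_interactions": [62, 30, 39, 35],
})

# Order each pair alphabetically so that A->B and B->A share one key
pair = directed[["source_country", "target_country"]].apply(
    lambda row: " / ".join(sorted(row)), axis=1
)
undirected = (
    directed.assign(pair=pair)
    .groupby("pair")["n_interactions"]
    .sum()
    .sort_values(ascending=False)
)
print(undirected)
```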

In [None]:

# Count how many times each country appeared as source or target
source_counts = merged['source_country'].value_counts().rename_axis('country').reset_index(name='as_source')
target_counts = merged['target_country'].value_counts().rename_axis('country').reset_index(name='as_target')

# Merge both and compute totals
country_activity = pd.merge(source_counts, target_counts, on='country', how='outer').fillna(0)
country_activity['total_posts'] = country_activity['as_source'] + country_activity['as_target']

# Sort descending by total posts
country_activity_sorted = country_activity.sort_values(by='total_posts', ascending=False).reset_index(drop=True)

# Show results
display(country_activity_sorted.head(15))


Unnamed: 0,country,as_source,as_target,total_posts
0,Canada,3461.0,3665.0,7126.0
1,United States,2851.0,2906.0,5757.0
2,India,2171.0,2615.0,4786.0
3,Japan,2538.0,1611.0,4149.0
4,United Kingdom,1387.0,2467.0,3854.0
5,Australia,1114.0,1539.0,2653.0
6,France,1252.0,984.0,2236.0
7,Brazil,1068.0,524.0,1592.0
8,Ireland,501.0,997.0,1498.0
9,Sweden,622.0,843.0,1465.0


In [19]:
science_subreddits = [
    'science', 'askscience', 'space', 'physics', 'biology', 'chemistry',
    'psychology', 'geology', 'astronomy', 'neuroscience', 'environment',
    'ecology', 'climate', 'datascience', 'machinelearning', 'math', 'engineering'
]

science_interactions = merged[
    merged['SOURCE_SUBREDDIT'].isin(science_subreddits)
    | merged['TARGET_SUBREDDIT'].isin(science_subreddits)
].copy()

# Count when the source is from a country interacting with any science subreddit
science_outgoing = (
    science_interactions.groupby('source_country')
    .size()
    .reset_index(name='science_outgoing')
)

# Count when the target is a science subreddit from a country
science_incoming = (
    science_interactions.groupby('target_country')
    .size()
    .reset_index(name='science_incoming')
)

# Merge and compute totals
science_country_stats = pd.merge(science_outgoing, science_incoming,
                                 left_on='source_country', right_on='target_country',
                                 how='outer', suffixes=('_source', '_target'))

# Clean up
science_country_stats['country'] = science_country_stats['source_country'].combine_first(science_country_stats['target_country'])
science_country_stats = science_country_stats[['country', 'science_outgoing', 'science_incoming']].fillna(0)
science_country_stats['total_science_interactions'] = science_country_stats['science_outgoing'] + science_country_stats['science_incoming']

# Sort by total interactions
science_country_stats = science_country_stats.sort_values(by='total_science_interactions', ascending=False)
display(science_country_stats)


Unnamed: 0,country,science_outgoing,science_incoming,total_science_interactions
7,Canada,37.0,4.0,41.0
19,India,13.0,13.0,26.0
48,United States,13.0,3.0,16.0
47,United Kingdom,10.0,3.0,13.0
2,Australia,11.0,1.0,12.0
1,Argentina,8.0,0.0,8.0
29,Netherlands,7.0,0.0,7.0
9,China,6.0,1.0,7.0
30,New Zealand,4.0,3.0,7.0
36,Portugal,6.0,0.0,6.0
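The outer merge followed by `combine_first` can also be expressed with two count Series and `add(fill_value=0)`. This is a sketch on made-up rows, not a replacement tested against the full data.

```python
import pandas as pd

toy = pd.DataFrame({
    "source_country": ["Canada", "Canada", None, "India"],
    "target_country": [None, "India", "Canada", None],
})

# value_counts ignores missing countries by default, like groupby does
outgoing = toy["source_country"].value_counts()
incoming = toy["target_country"].value_counts()

# Align on country; a country missing on one side counts as 0 there
total = outgoing.add(incoming, fill_value=0).sort_values(ascending=False)
print(total)
```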


In [24]:
import pandas as pd
from rapidfuzz import fuzz

# Base list of science-related keywords
science_base_list = [
    'science', 'askscience', 'space', 'physics', 'biology', 'chemistry',
    'psychology', 'geology', 'astronomy', 'neuroscience', 'environment',
    'ecology', 'climate', 'datascience', 'machinelearning', 'math', 'engineering'
]

# Set a threshold for matching
# 90 is a good starting point, finding 'physics' in 'askphysics' (100)
# but avoiding 'art' in 'math' (low score).
THRESHOLD = 90

def is_science_sub(subreddit_name, science_list, threshold):
    """
    Checks if a subreddit name contains a fuzzy match to any science keyword
    using a partial ratio.
    """
    sub_lower = str(subreddit_name).lower()
    
    # Check for direct match first (fast)
    if sub_lower in science_list:
        return True
        
    # Check for partial fuzzy match
    for base_word in science_list:
        # fuzz.partial_ratio finds the best matching substring.
        # e.g., partial_ratio('physics', 'askphysics') is 100
        score = fuzz.partial_ratio(base_word, sub_lower)
        if score > threshold:
            return True
    return False

# --- 1. Get ALL unique subreddits from your data ---
# This ensures we only do the expensive matching once
print("Finding all unique subreddits...")
all_subreddits = pd.Series(pd.concat([
    merged['SOURCE_SUBREDDIT'],
    merged['TARGET_SUBREDDIT']
]).unique())
print(f"Found {len(all_subreddits)} unique subreddits.")

# --- 2. Apply the fuzzy filter ---
print("Applying fuzzy match to find science subreddits...")
# This applies our function to every unique subreddit
mask = all_subreddits.apply(is_science_sub, args=(science_base_list, THRESHOLD))

# This is your new, much larger list of science subreddits
expanded_science_subreddits = all_subreddits[mask].tolist()

print(f"\n--- Found {len(expanded_science_subreddits)} science-related subreddits ---")
# Display a sample of what was found
print(pd.Series(expanded_science_subreddits).sample(min(20, len(expanded_science_subreddits))))

Finding all unique subreddits...
Found 67180 unique subreddits.
Applying fuzzy match to find science subreddits...

--- Found 684 science-related subreddits ---
228               madscience
351               spaceblogs
474            artpsychology
437      theydidthepenismath
662           earthfromspace
91         socialengineering
541              steinermath
197           colamakerspace
675       chicagoenvironment
213        shittyspacexideas
605           sammyspacetime
468        whydidhedothemath
82                  maschine
429            biochemistry2
517      theysortadidthemath
181    mechanicalengineering
419        fabricofspacetime
102      musicandmathematics
415         syntheticbiology
406           climate_change
dtype: object
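The sample shows that partial fuzzy matching also lets through names such as `maschine`, which does not contain any of the keywords. A stricter alternative is plain substring containment, sketched below with names taken from the sample above. It is still keyword-based, so joke subreddits like `theydidthepenismath` continue to match; `contains_keyword` is a hypothetical helper.

```python
# Keywords from the base list; 'askscience', 'datascience' and 'neuroscience'
# are omitted because the substring 'science' already covers them
science_keywords = [
    "science", "space", "physics", "biology", "chemistry", "psychology",
    "geology", "astronomy", "environment", "ecology", "climate", "math",
    "engineering", "machinelearning",
]

def contains_keyword(name: str, keywords: list[str]) -> bool:
    """True if any keyword occurs verbatim inside the subreddit name."""
    name = str(name).lower()
    return any(keyword in name for keyword in keywords)

# Names taken from the sampled output above
sample = ["maschine", "spaceblogs", "biochemistry2", "theydidthepenismath"]
kept = [name for name in sample if contains_keyword(name, science_keywords)]
print(kept)
```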


In [27]:
# --- 1. Report the sizes of the inputs as a sanity check ---
print(f"Original list length: {len(science_base_list)}")
print(f"Expanded list length: {len(expanded_science_subreddits)}")
print(f"Total science_interactions rows: {science_interactions.shape[0]}")

# --- 2. Create the 'science_topic' column using the EXPANDED list ---

with_topic_df = science_interactions.assign(
    science_topic = science_interactions['SOURCE_SUBREDDIT'].where(
        # Keep the source name if it is in the expanded list, otherwise use the target
        science_interactions['SOURCE_SUBREDDIT'].isin(expanded_science_subreddits),
        science_interactions['TARGET_SUBREDDIT']
    )
)

# --- 3. Group by country and the identified topic ---

topic_counts_all = (
    with_topic_df
    .groupby(['source_country', 'science_topic'])
    .size()
    .reset_index(name='n_interactions')
)

# --- 4. Filter and Sort ---

# Filter out rows where the source is not a country (i.e., source_country is NaN)
topic_counts = (
    topic_counts_all[topic_counts_all['source_country'].notna()]
    .sort_values(by='n_interactions', ascending=False)
)

print("\n--- Top 20 Country-to-Science-Topic Interactions (Corrected) ---")
display(topic_counts.head(20))

Original list length: 17
Expanded list length: 684
Total science_interactions rows: 11736

--- Top 20 Country-to-Science-Topic Interactions (Corrected) ---


Unnamed: 0,source_country,science_topic,n_interactions
18,Canada,science,16
19,Canada,spacecanada,14
41,India,science,7
93,United States,science,6
9,Australia,science,5
13,Brazil,science,4
70,Portugal,science,4
49,Japan,science,4
87,United Kingdom,science,4
20,Chile,science,3


In [28]:
topic_counts = (
    science_interactions
    .assign(science_topic = science_interactions['SOURCE_SUBREDDIT'].where(
        science_interactions['SOURCE_SUBREDDIT'].isin(science_subreddits),
        science_interactions['TARGET_SUBREDDIT']
    ))
    .groupby(['source_country', 'science_topic'])
    .size()
    .reset_index(name='n_interactions')
    .sort_values(by='n_interactions', ascending=False)
)

display(topic_counts.head(20))


Unnamed: 0,source_country,science_topic,n_interactions
18,Canada,science,16
19,Canada,space,14
41,India,science,7
93,United States,science,6
9,Australia,science,5
13,Brazil,science,4
70,Portugal,science,4
49,Japan,science,4
87,United Kingdom,science,4
20,Chile,science,3
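The two tables differ in one row (`spacecanada` versus `space`), most likely because `Series.where` keeps the source name whenever the source is in the list passed to `isin`, and falls back to the target otherwise. A toy example with one such link; the row itself and the shortened lists are hypothetical:

```python
import pandas as pd

toy = pd.DataFrame({
    "SOURCE_SUBREDDIT": ["spacecanada"],
    "TARGET_SUBREDDIT": ["space"],
})
original_list = ["science", "space"]
expanded_list = ["science", "space", "spacecanada"]

def science_topic(df, science_list):
    # Keep the source where it is a listed science subreddit, else use the target
    return df["SOURCE_SUBREDDIT"].where(
        df["SOURCE_SUBREDDIT"].isin(science_list), df["TARGET_SUBREDDIT"]
    )

topic_original = science_topic(toy, original_list).iloc[0]
topic_expanded = science_topic(toy, expanded_list).iloc[0]
print(topic_original, topic_expanded)
```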


In [29]:

# Count how many times each country appeared as source or target
source_counts = merged['source_country'].value_counts().rename_axis('country').reset_index(name='as_source')
target_counts = merged['target_country'].value_counts().rename_axis('country').reset_index(name='as_target')

# Merge both and compute totals
country_activity = pd.merge(source_counts, target_counts, on='country', how='outer').fillna(0)
country_activity['total_posts'] = country_activity['as_source'] + country_activity['as_target']

# We only need the country and total_posts columns for this analysis
country_totals = country_activity[['country', 'total_posts']]

# Merge the science interactions with the total posts for each country
combined_stats = pd.merge(
    country_totals,
    science_country_stats[['country', 'total_science_interactions']],
    on='country',
    how='left'
)

# If a country had total posts but 0 science posts, fillna(0)
combined_stats['total_science_interactions'] = combined_stats['total_science_interactions'].fillna(0)

# Calculate the Science-to-Total Ratio
combined_stats['science_ratio'] = (
    combined_stats['total_science_interactions'] / combined_stats['total_posts']
)

# Handle any potential inf/-inf values if total_posts was 0
combined_stats.replace([np.inf, -np.inf], 0, inplace=True)

# Sort by the new ratio
combined_stats_sorted = combined_stats.sort_values(by='science_ratio', ascending=False)


print("--- Countries by Science Interaction Ratio (Science Posts / Total Posts) ---")
# Filter to countries with a minimum number of total posts for meaningful ratios
min_posts = 100
display(combined_stats_sorted[combined_stats_sorted['total_posts'] >= min_posts].head(20))

# Display the raw, unfiltered list
display(combined_stats_sorted.head(20))

--- Countries by Science Interaction Ratio (Science Posts / Total Posts) ---


Unnamed: 0,country,total_posts,total_science_interactions,science_ratio
32,Chile,153.0,3.0,0.019608
87,Latvia,109.0,1.0,0.009174
116,New Zealand,789.0,7.0,0.008872
59,Greece,227.0,2.0,0.008811
40,Denmark,498.0,4.0,0.008032
7,Argentina,1044.0,8.0,0.007663
115,Netherlands,972.0,7.0,0.007202
149,South Africa,283.0,2.0,0.007067
133,Portugal,910.0,6.0,0.006593
122,Norway,918.0,6.0,0.006536


Unnamed: 0,country,total_posts,total_science_interactions,science_ratio
6,Antarctica,15.0,1.0,0.066667
50,Fiji,16.0,1.0,0.0625
25,Bulgaria,34.0,1.0,0.029412
79,Jamaica,44.0,1.0,0.022727
171,Uruguay,50.0,1.0,0.02
32,Chile,153.0,3.0,0.019608
47,Estonia,66.0,1.0,0.015152
134,Puerto Rico,67.0,1.0,0.014925
87,Latvia,109.0,1.0,0.009174
116,New Zealand,789.0,7.0,0.008872
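The unfiltered ranking is dominated by countries with very few links: a single science interaction out of 15 posts already yields a ratio of about 0.067. The sketch below recomputes the ratio for three rows copied from the tables above and applies the same minimum-post filter.

```python
import pandas as pd

# Rows copied from the two tables above
stats = pd.DataFrame({
    "country": ["Antarctica", "Chile", "New Zealand"],
    "total_posts": [15.0, 153.0, 789.0],
    "total_science_interactions": [1.0, 3.0, 7.0],
})
stats["science_ratio"] = stats["total_science_interactions"] / stats["total_posts"]

MIN_POSTS = 100  # same threshold as in the cell above
reliable = stats[stats["total_posts"] >= MIN_POSTS]
print(reliable[["country", "science_ratio"]].round(6))
```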
