# Code Challenge: CIA World Factbook Data Analysis

In this task, you'll be using two CSVs from the CIA's [World Factbook](https://www.cia.gov/the-world-factbook/) dataset.  

- The file **`data/c2119.csv`** maps country names to population.  
- The file **`data/c2228.csv`** maps country names to adult obesity rates.  
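
The solution further down renames `Name` and `Value` and drops `Pos`, which suggests each file has the columns `Pos`, `Name`, `Value`. The sketch below reads that layout from an in-memory string. It assumes that header, and it assumes large numbers might be written with thousands separators. The two rows are made-up placeholders, not Factbook data:

```python
import io

import pandas as pd

# Hypothetical sample in the assumed Pos,Name,Value layout (placeholder values only).
SAMPLE_POP = """Pos,Name,Value
1,Country A,"1,234,567"
2,Country B,890
"""

# thousands="," makes read_csv parse "1,234,567" as an integer rather than a string.
df_pop = pd.read_csv(io.StringIO(SAMPLE_POP), thousands=",")

# Same renaming and column selection as the solution below.
df_pop = df_pop.rename(columns={"Name": "Country", "Value": "Population"})
df_pop = df_pop[["Country", "Population"]]
print(df_pop)
```

With `thousands=","` the numeric conversion happens while parsing. The solution below instead cleans the strings after the merge with `str.replace` and `pd.to_numeric`, and both approaches end with a numeric `Population` column.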

Your task is to implement the function:

```python
def analyze(factbook_pop: str, factbook_obesity: str) -> pd.DataFrame:
    ...
```

## Requirements

The function should:

1. Read the two CSV files (whose paths are given by the two string parameters) and combine them into a single DataFrame with the columns:

   - `Country`
   - `Population`
   - `Obesity Rate`

2. Filter the data:

   - Keep only countries with an obesity rate higher than 20%.
   - Keep only countries with a population larger than 10,000,000 (10⁷).

3. Sort the resulting DataFrame by `Obesity Rate` in descending order.

4. Select the top 10 countries.

5. Reindex the result so the index runs from 1 to 10.

6. Return the DataFrame.
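
Step 1 amounts to a join on `Country`. An inner join silently drops any country that appears in only one file, for example when the two files spell a name differently. One way to see which countries would be dropped is an outer merge with `indicator=True`. The sketch below shows this on hypothetical toy frames:

```python
import pandas as pd

# Hypothetical toy inputs: "Country B" is only in pop, "Country C" only in obesity.
pop = pd.DataFrame({"Country": ["Country A", "Country B"], "Population": [20_000_000, 5_000_000]})
obesity = pd.DataFrame({"Country": ["Country A", "Country C"], "Obesity Rate": [25.0, 30.0]})

# indicator=True adds a "_merge" column whose values are "both", "left_only" or "right_only".
check = pd.merge(pop, obesity, on="Country", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Country", "_merge"]]
print(unmatched)

# An inner join keeps only the countries present in both inputs.
merged = pd.merge(pop, obesity, on="Country", how="inner")
print(merged)
```

The final answer still uses `how="inner"`, since a country needs both a population and an obesity rate to pass the filters. The outer merge is only a diagnostic.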

```python
import pandas as pd

def analyze(factbook_pop: str, factbook_obesity: str) -> pd.DataFrame:
    # Step 1: Read the population CSV (columns Pos, Name, Value)
    df_pop = pd.read_csv(factbook_pop)
    df_pop = df_pop.rename(columns={"Name": "Country", "Value": "Population"})
    df_pop = df_pop[["Country", "Population"]]  # drop Pos

    # Step 2: Read the obesity CSV (same layout)
    df_obesity = pd.read_csv(factbook_obesity)
    df_obesity = df_obesity.rename(columns={"Name": "Country", "Value": "Obesity Rate"})
    df_obesity = df_obesity[["Country", "Obesity Rate"]]  # drop Pos

    # Step 3: Merge on Country; the inner join keeps only countries present in both files
    df = pd.merge(df_pop, df_obesity, on="Country", how="inner")

    # Step 4: Convert to numeric, stripping thousands separators in case values were read as strings
    df["Population"] = pd.to_numeric(df["Population"].astype(str).str.replace(",", ""), errors="coerce")
    df["Obesity Rate"] = pd.to_numeric(df["Obesity Rate"].astype(str).str.replace(",", ""), errors="coerce")

    # Step 5: Keep obesity rate > 20 (%) and population > 10^7
    # (NaN from unparseable values fails both comparisons, so those rows are dropped)
    df = df[(df["Obesity Rate"] > 20) & (df["Population"] > 10_000_000)]

    # Step 6: Sort descending by Obesity Rate
    df = df.sort_values(by="Obesity Rate", ascending=False)

    # Step 7: Select the top 10
    df = df.head(10)

    # Step 8: Renumber the index 1..len(df) (1-10 when at least 10 rows remain)
    df.index = range(1, len(df) + 1)

    return df
```
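
In the result table, United States and Saudi Arabia tie at 33.0. `sort_values` uses quicksort by default, which is not a stable sort, so the relative order of tied rows is not guaranteed. The challenge does not specify a tie-break. If a reproducible order matters, one option (an assumption, not part of the requirements) is to add a secondary sort key. The sketch below breaks ties by larger population first, on hypothetical rows. It also builds the 1-based index with `reset_index` instead of `range()`:

```python
import pandas as pd

# Hypothetical rows; "Country A" and "Country B" tie on Obesity Rate.
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Population": [30_000_000, 300_000_000, 50_000_000],
    "Obesity Rate": [33.0, 33.0, 35.0],
})

# Primary key: Obesity Rate (descending); tie-breaker: Population (descending).
ranked = df.sort_values(by=["Obesity Rate", "Population"], ascending=[False, False]).head(10)

# Equivalent to df.index = range(1, len(df) + 1): drop the old labels, then shift to start at 1.
ranked = ranked.reset_index(drop=True)
ranked.index = ranked.index + 1
print(ranked)
```

Passing `kind="stable"` to a single-column sort would instead keep tied rows in their merged order, which still depends on how the input files are ordered.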


```python
PATH_POPULATION = '/workspaces/ds-ml-preparation/dataset/factbook/c2119.csv'
PATH_OBESITY = '/workspaces/ds-ml-preparation/dataset/factbook/c2228.csv'

df_result = analyze(PATH_POPULATION, PATH_OBESITY)
df_result
```

|    | Country        | Population | Obesity Rate |
|---:|----------------|-----------:|-------------:|
| 1  | Egypt          | 88487396   | 33.1         |
| 2  | United States  | 321368864  | 33.0         |
| 3  | Saudi Arabia   | 27752316   | 33.0         |
| 4  | Czech Republic | 10644842   | 32.7         |
| 5  | Mexico         | 121736809  | 32.1         |
| 6  | South Africa   | 53675563   | 31.3         |
| 7  | Venezuela      | 29275460   | 30.3         |
| 8  | Argentina      | 43431886   | 29.7         |
| 9  | Chile          | 17508260   | 29.4         |
| 10 | Turkey         | 79414269   | 27.8         |
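
One quick way to confirm that a result meets the requirements is to assert each one in turn. The sketch below does this with a helper whose name, `check_result`, is hypothetical. The demo frame copies the first three rows of the result table:

```python
import pandas as pd


def check_result(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise AssertionError if df breaks a requirement."""
    assert list(df.columns) == ["Country", "Population", "Obesity Rate"]
    assert (df["Obesity Rate"] > 20).all()              # obesity rate above 20%
    assert (df["Population"] > 10_000_000).all()        # population above 10^7
    assert df["Obesity Rate"].is_monotonic_decreasing   # descending order (ties allowed)
    assert len(df) <= 10                                # at most the top 10
    assert df.index.tolist() == list(range(1, len(df) + 1))  # index runs 1..N


# First three rows of the result table.
sample = pd.DataFrame(
    {
        "Country": ["Egypt", "United States", "Saudi Arabia"],
        "Population": [88487396, 321368864, 27752316],
        "Obesity Rate": [33.1, 33.0, 33.0],
    },
    index=[1, 2, 3],
)
check_result(sample)
print("all requirements hold")
```

In the notebook, the same check would be `check_result(df_result)`.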


## Reference
- [Factbook Dataset](https://github.com/thewiremonkey/factbook.csv/tree/master)