# 595. Big Countries

**Difficulty:** Easy  
**Challenge:** 30 Days of Pandas

## Problem Description

Table: `World`

```
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| name        | varchar |
| continent   | varchar |
| area        | int     |
| population  | int     |
| gdp         | bigint  |
+-------------+---------+
```

`name` is the primary key (column with unique values) for this table.  
Each row of this table gives information about the name of a country, the continent to which it belongs, its area, the population, and its GDP value.

A country is **big** if:
- it has an area of at least three million (i.e., `3,000,000 km²`), or
- it has a population of at least twenty-five million (i.e., `25,000,000`).

Write a solution to find the `name`, `population`, and `area` of the **big countries**.

Return the result table in **any order**.

## Examples

### Example 1:

**Input:**
```
World table:
+-------------+-----------+---------+------------+--------------+
| name        | continent | area    | population | gdp          |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |
+-------------+-----------+---------+------------+--------------+
```

**Output:**
```
+-------------+------------+---------+
| name        | population | area    |
+-------------+------------+---------+
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |
+-------------+------------+---------+
```

## Constraints

- The result format is shown in the example above.



```python
import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    """
    Find big countries based on area or population criteria.

    Args:
        world: DataFrame with columns ['name', 'continent', 'area', 'population', 'gdp']

    Returns:
        DataFrame with columns ['name', 'population', 'area'] for big countries
    """
    return world[
        (world['area'] >= 3_000_000) |
        (world['population'] >= 25_000_000)
    ][['name', 'population', 'area']]
```


```python
data = {
    'name': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
    'area': [652230, 28748, 2381741, 468, 1246700],
    'population': [25500100, 2831741, 37100000, 78115, 20609294],
    'gdp': [20343000000, 12960000000, 188681000000, 3712000000, 100990000000]
}
world = pd.DataFrame(data)
result = big_countries(world)
print(result)
```

**Output:**
```
          name  population     area
0  Afghanistan    25500100   652230
2      Algeria    37100000  2381741
```


## Solution Approach

I solved this using Pandas boolean indexing, with the element-wise OR operator (`|`) keeping the countries that meet either criterion.

### Key Insight

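The problem requires filtering a DataFrame based on two conditions combined with OR logic. In Pandas, we use the `|` operator (not Python's `or`) to combine boolean conditions: `or` expects a single truth value from each operand, while `|` works row by row. Each comparison must also be wrapped in parentheses so that it is evaluated before the `|`.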
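To make the difference between `or` and `|` concrete, here is a minimal sketch on a two-row frame built from the example values. The variable names (`area_mask`, `population_mask`, `or_failed`, `combined`) are illustrative only and not part of the solution.

```python
import pandas as pd

# Two rows taken from the example table, just enough to compare the operators.
world = pd.DataFrame({
    'name': ['Afghanistan', 'Albania'],
    'area': [652230, 28748],
    'population': [25500100, 2831741],
})

area_mask = world['area'] >= 3_000_000
population_mask = world['population'] >= 25_000_000

# Python's `or` asks its left operand for a single True/False value.
# A Series has no single truth value, so pandas raises ValueError.
try:
    area_mask or population_mask
    or_failed = False
except ValueError:
    or_failed = True

# `|` combines the two masks row by row instead.
combined = area_mask | population_mask

print(or_failed)          # True
print(combined.tolist())  # [True, False]
```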

### How It Works

1. Create a boolean mask for countries with area >= 3,000,000
2. Create a boolean mask for countries with population >= 25,000,000
3. Combine both masks using the `|` (OR) operator with parentheses
4. Filter the DataFrame using the combined mask
5. Select only the required columns: `['name', 'population', 'area']`
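The steps above can be sketched one at a time with named intermediate masks. This is an expanded form of the same filter for illustration; the names `area_mask`, `population_mask`, and `is_big` are assumptions and do not appear in the solution itself.

```python
import pandas as pd

world = pd.DataFrame({
    'name': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
    'area': [652230, 28748, 2381741, 468, 1246700],
    'population': [25500100, 2831741, 37100000, 78115, 20609294],
    'gdp': [20343000000, 12960000000, 188681000000, 3712000000, 100990000000],
})

# Step 1: one boolean per row, True where the area threshold is met.
area_mask = world['area'] >= 3_000_000

# Step 2: one boolean per row, True where the population threshold is met.
population_mask = world['population'] >= 25_000_000

# Step 3: element-wise OR of the two masks.
is_big = area_mask | population_mask

# Steps 4 and 5: keep the matching rows and only the required columns.
result = world.loc[is_big, ['name', 'population', 'area']]
print(result)
```

Using `.loc` with a row mask and a column list does the row filter and the column selection in one indexing call; it returns the same rows and columns as the chained form used in the solution.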

### Why This Works

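Pandas boolean indexing is vectorized and efficient. The `|` operator performs an element-wise OR on boolean Series, which is exactly what we need. The parentheses are crucial because `|` binds more tightly than comparison operators such as `>=`: without them, Python would evaluate `3_000_000 | world['population']` first and then treat the rest as a chained comparison.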
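A quick way to see why the parentheses matter is to drop them. The sketch below first shows how Python groups `|` and `>=` with plain integers, then tries the unparenthesized filter on a two-row frame. The flag name `unparenthesized_ok` is illustrative only.

```python
import pandas as pd

# With plain integers the grouping is easy to see:
# `1 | 2 >= 2` is read as `(1 | 2) >= 2`, i.e. `3 >= 2`.
print(1 | 2 >= 2)    # True
print(1 | (2 >= 2))  # 1

world = pd.DataFrame({
    'area': [652230, 28748],
    'population': [25500100, 2831741],
})

# Without parentheses, Python reads the filter as the chained comparison
#   world['area'] >= (3_000_000 | world['population']) >= 25_000_000
# A chained comparison uses `and` internally, which needs a single truth
# value from a Series, so pandas raises ValueError.
try:
    world['area'] >= 3_000_000 | world['population'] >= 25_000_000
    unparenthesized_ok = True
except ValueError:
    unparenthesized_ok = False

# With parentheses, each comparison is evaluated first, then OR-ed row by row.
parenthesized = (world['area'] >= 3_000_000) | (world['population'] >= 25_000_000)

print(unparenthesized_ok)      # False
print(parenthesized.tolist())  # [True, False]
```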

### Example Walkthrough

For the example data:
- **Afghanistan**: area 652,230 < 3M, but population 25,500,100 >= 25M → BIG ✓
- **Algeria**: area 2,381,741 < 3M, but population 37,100,000 >= 25M → BIG ✓
- **Others**: Don't meet either condition

### Complexity Analysis

- **Time Complexity:** O(n) - each comparison and the OR make one vectorized pass over the n rows
- **Space Complexity:** O(n) - the boolean masks and the filtered result each take space proportional to the number of rows

### Edge Cases Handled

- **Empty DataFrame:** Returns empty DataFrame with correct columns
- **No big countries:** Returns empty DataFrame
- **All countries are big:** Returns all countries with selected columns
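These cases can be checked with a few small inputs. The sketch below repeats the solution function so that it runs on its own; the subsets chosen for each case (`empty`, `none_big`, `all_big`) are illustrative assumptions built from the example data.

```python
import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    return world[
        (world['area'] >= 3_000_000) |
        (world['population'] >= 25_000_000)
    ][['name', 'population', 'area']]

world = pd.DataFrame({
    'name': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
    'area': [652230, 28748, 2381741, 468, 1246700],
    'population': [25500100, 2831741, 37100000, 78115, 20609294],
    'gdp': [20343000000, 12960000000, 188681000000, 3712000000, 100990000000],
})

# Empty input: slicing zero rows keeps the columns and their dtypes.
empty = big_countries(world.iloc[0:0])

# No big countries: Albania and Andorra miss both thresholds.
none_big = big_countries(world[world['name'].isin(['Albania', 'Andorra'])])

# All countries big: Afghanistan and Algeria both pass on population.
all_big = big_countries(world[world['name'].isin(['Afghanistan', 'Algeria'])])

print(list(empty.columns), len(empty))  # ['name', 'population', 'area'] 0
print(len(none_big))                    # 0
print(all_big['name'].tolist())         # ['Afghanistan', 'Algeria']
```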

### Takeaways

Pandas boolean indexing is a powerful and idiomatic way to filter DataFrames. The key is understanding operator precedence and using `|` for OR operations instead of Python's `or` keyword. This approach is both readable and performant for DataFrame filtering operations.
