
### 📊 **What this dataset really contains**

* **Rows:** 74 buildings
* **Columns (only 4):**

  1. `Building Name`
  2. `City`
  3. `Country`
  4. `Continent`
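A quick schema check can confirm the description above before any analysis. The sketch below is illustrative only: it rebuilds the five rows shown in the `df.head()` output of this notebook instead of reading the real 74-row CSV, and the names `sample` and `expected_cols` are assumptions made for the example.

```python
import pandas as pd

# Five rows copied from the df.head() output in this notebook;
# the real file is described as having 74 rows.
sample = pd.DataFrame(
    {
        "Building Name": [
            "Burj Khalifa",
            "Merdeka 118",
            "Shanghai Tower",
            "Abraj Al-Bait Clock Tower",
            "Ping An Finance Centre",
        ],
        "City": ["Dubai", "Kuala Lumpur", "Shanghai", "Mecca", "Shenzhen"],
        "Country": [
            "United Arab Emirates",
            "Malaysia",
            "China",
            "Saudi Arabia",
            "China",
        ],
        "Continent": ["Asia"] * 5,
    }
)

# Hypothetical name: the four columns listed in the dataset description.
expected_cols = ["Building Name", "City", "Country", "Continent"]

# Schema check: column names and order match the description.
schema_ok = list(sample.columns) == expected_cols

# Cardinality summary: number of distinct values in each column.
n_unique = sample.nunique()

print(sample.shape, schema_ok)
print(n_unique)
```

On the real data the same two checks (`df.shape` and `df.nunique()`) give a fast overview of how many distinct cities, countries, and continents the questions below will group over.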


## 🔥 30 HARD Pandas Questions (Based on THIS Dataset)

### **A. Data Integrity & Validation**

1. Check whether any **building name appears in more than one country**.
2. Identify cities that belong to **more than one continent** (data consistency check).
3. Find countries whose buildings are listed under **multiple continents**.
4. Detect duplicated rows using **all columns**, and remove them safely.
5. Verify whether every country maps to **exactly one continent**.

---

### **B. Advanced GroupBy & Aggregation**

6. Count buildings per country and compute each country's **percentage contribution**.
7. Compute the **city-to-country ratio** (unique cities / total buildings) per country.
8. Find the **top 5 countries** with the highest number of distinct cities.
9. Rank continents by **average number of buildings per country**.
10. Identify countries whose building count is **above their continent's average**.

---

### **C. Ranking & Window Operations**

11. Rank countries **within each continent** by number of buildings.
12. Compute cumulative building counts **continent-wise**, sorted by country.
13. For each continent, find the **country at the 75th percentile** of building count.
14. Identify countries that are **local maxima** within continent rankings.
15. Calculate rolling averages of country building counts (window = 3).

---

### **D. Complex Filtering & Logic**

16. Find cities that contain **all buildings of a given country** (if any).
17. Identify countries where **every city has only one building**.
18. Find continents where **one country dominates (>40%)** the building count.
19. Detect countries whose buildings are spread across the **maximum number of cities**.
20. Identify cities shared by **multiple countries** (if data errors exist).

---

### **E. Set Operations & Comparisons**

21. Compare Asia vs North America: find **countries unique to each**.
22. Find continents that have **no country with more than 5 buildings**.
23. Identify countries whose cities are **exclusive (no overlap with others)**.
24. Compute **Jaccard similarity** between continents based on country sets.
25. Find countries that appear in the **top 3 ranks of multiple continents** (logical test).

---

### **F. Reshaping & MultiIndex**

26. Create a **MultiIndex (Continent → Country → City)** and query it.
27. Pivot the data to show **countries vs continents** with building counts.
28. Build a table showing **top city per country** (by building frequency).
29. Normalize building counts **within each continent** using `groupby().apply()`.
30. Produce a **continent-level entropy score** based on country distribution.

---

## 🎯 Difficulty Justification

These are **hard** because they require:

* Careful **data validation**
* **Hierarchical groupings**
* Ranking + window functions
* Set theory & statistical reasoning
* `groupby.apply`, `transform`, MultiIndex logic
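The distinction between reducing and broadcasting group operations underlies many of the questions above. A minimal sketch, assuming a small made-up table of per-country counts (the numbers are hypothetical and not taken from the dataset):

```python
import pandas as pd

# Hypothetical per-country building counts (not the real dataset).
counts = pd.DataFrame(
    {
        "Continent": ["Asia", "Asia", "Asia", "Europe", "Europe"],
        "Country": ["China", "Malaysia", "Vietnam", "Russia", "Poland"],
        "n": [6, 3, 1, 2, 2],
    }
)

# Aggregation reduces: the result has one entry per continent.
per_continent = counts.groupby("Continent")["n"].sum()

# transform broadcasts: the result keeps the input's length and index,
# so it can be combined row by row with the original column.
continent_total = counts.groupby("Continent")["n"].transform("sum")
counts["share"] = counts["n"] / continent_total

print(per_continent)
print(counts)
```

Because `transform` returns a result aligned with the original rows, comparisons such as "count above the continent average" can be written without any manual index mapping.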

---



In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv("top_100_tallest_buildings_world.csv")
df.head()

Unnamed: 0,Building Name,City,Country,Continent
0,Burj Khalifa,Dubai,United Arab Emirates,Asia
1,Merdeka 118,Kuala Lumpur,Malaysia,Asia
2,Shanghai Tower,Shanghai,China,Asia
3,Abraj Al-Bait Clock Tower,Mecca,Saudi Arabia,Asia
4,Ping An Finance Centre,Shenzhen,China,Asia


In [4]:
#1
group_building = df.groupby("Building Name")["Country"].nunique()
group_building
# print(group_building[group_building > 1])

#2

group_city = df.groupby("City")["Continent"].nunique()
print(group_city[group_city > 1])

Series([], Name: Continent, dtype: int64)


In [None]:
#4
df_clean=df.drop_duplicates()
df_clean

#5
(df.groupby("Country")["Continent"].nunique() == 1).all()


#6
counts = df["Country"].value_counts()
percent = counts / counts.sum() * 100
pd.DataFrame({"count": counts, "percent": percent})


Country
China                   29
United States           14
United Arab Emirates     8
Malaysia                 4
Saudi Arabia             3
South Korea              3
Taiwan                   2
Russia                   2
Vietnam                  2
Kuwait                   1
United Kingdom           1
Australia                1
Mexico                   1
Chile                    1
Poland                   1
Sweden                   1
Name: count, dtype: int64


Country,count,percent
China,29,39.189189
United States,14,18.918919
United Arab Emirates,8,10.810811
Malaysia,4,5.405405
Saudi Arabia,3,4.054054
South Korea,3,4.054054
Taiwan,2,2.702703
Russia,2,2.702703
Vietnam,2,2.702703
Kuwait,1,1.351351


In [3]:
df.groupby("Country")["City"].nunique() / df.groupby("Country").size()


Country
Australia               1.000000
Chile                   1.000000
China                   0.482759
Kuwait                  1.000000
Malaysia                0.250000
Mexico                  1.000000
Poland                  1.000000
Russia                  1.000000
Saudi Arabia            0.666667
South Korea             1.000000
Sweden                  1.000000
Taiwan                  1.000000
United Arab Emirates    0.250000
United Kingdom          1.000000
United States           0.428571
Vietnam                 1.000000
dtype: float64



# ✅ SOLUTIONS TO ALL 30 QUESTIONS (WITH CODE)

---

## **A. Data Integrity & Validation**

### **1. Building names appearing in more than one country**

```python
df.groupby("Building Name")["Country"].nunique().loc[lambda x: x > 1]
```

---

### **2. Cities belonging to more than one continent**

```python
df.groupby("City")["Continent"].nunique().loc[lambda x: x > 1]
```

---

### **3. Countries listed under multiple continents**

```python
df.groupby("Country")["Continent"].nunique().loc[lambda x: x > 1]
```

---

### **4. Detect and remove duplicate rows**

```python
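dupes = df[df.duplicated(keep=False)]  # detect: every row that has an exact duplicate
df_clean = df.drop_duplicates()        # remove: keep the first occurrence of each row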
```

---

### **5. Verify each country maps to exactly one continent**

```python
(df.groupby("Country")["Continent"].nunique() == 1).all()
```

---

## **B. Advanced GroupBy & Aggregation**

### **6. Country-wise building count with percentage**

```python
counts = df["Country"].value_counts()
percent = counts / counts.sum() * 100
pd.DataFrame({"count": counts, "percent": percent})
```

---

### **7. City-to-country ratio**

```python
(df.groupby("Country")["City"].nunique() /
 df.groupby("Country").size())
```

---

### **8. Top 5 countries by number of cities**

```python
df.groupby("Country")["City"].nunique().sort_values(ascending=False).head(5)
```

---

### **9. Rank continents by avg buildings per country**

```python
df.groupby("Continent").apply(
    lambda x: x.shape[0] / x["Country"].nunique()
).sort_values(ascending=False)
```

---

### **10. Countries above continent average**

```python
country_counts = df.groupby(["Continent", "Country"]).size()
continent_avg = country_counts.groupby(level=0).mean()

country_counts[country_counts >
country_counts.index.get_level_values(0).map(continent_avg)]
```

---

## **C. Ranking & Window Operations**

### **11. Rank countries within continent**

```python
df.groupby(["Continent", "Country"]).size() \
  .groupby(level=0, group_keys=False) \
  .rank(ascending=False, method="dense")
```

---

### **12. Cumulative building count per continent**

```python
df.groupby(["Continent", "Country"]).size() \
  .sort_index() \
  .groupby(level=0).cumsum()
```

---

### **13. 75th percentile country per continent**

```python
# Returns the 75th-percentile building count per continent (a number, not a country name).
df.groupby(["Continent", "Country"]).size() \
  .groupby(level=0).quantile(0.75)
```
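The quantile call for question 13 yields a count, whereas the question asks for a country. One possible way to get a name is sketched below. It assumes `interpolation="higher"` is an acceptable reading of "at the 75th percentile", so that the threshold is always an observed count, and it uses hypothetical numbers in place of the real data.

```python
import pandas as pd

# Hypothetical counts indexed by (Continent, Country); not the real dataset.
s = pd.Series(
    [6, 3, 1, 1, 2, 1],
    index=pd.MultiIndex.from_tuples(
        [
            ("Asia", "China"),
            ("Asia", "Malaysia"),
            ("Asia", "Vietnam"),
            ("Asia", "Kuwait"),
            ("Europe", "Russia"),
            ("Europe", "Poland"),
        ],
        names=["Continent", "Country"],
    ),
)

# Per-continent 75th percentile, broadcast back to every row.
# interpolation="higher" (an assumption) snaps the quantile to an
# actual count instead of interpolating between two counts.
threshold = s.groupby(level="Continent").transform(
    lambda g: g.quantile(0.75, interpolation="higher")
)

# Countries whose count equals their continent's threshold.
at_q75 = s[s == threshold]
print(at_q75)
```

Ties are kept, so a continent can return more than one country when several share the threshold count.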

---

### **14. Local maxima countries**

```python
# Reads "local maxima" as the countries tied for the highest count in each continent.
s = df.groupby(["Continent", "Country"]).size()
s[s == s.groupby(level=0).transform("max")]
```

---

### **15. Rolling average (window=3)**

```python
df.groupby("Country").size().sort_values().rolling(3).mean()
```

---

## **D. Complex Filtering & Logic**

### **16. Cities containing all buildings of a country**

```python
df.groupby("Country").filter(
    lambda x: x["City"].nunique() == 1
)[["Country", "City"]].drop_duplicates()
```

---

### **17. Countries where every city has one building**

```python
df.groupby("Country").filter(
    lambda x: x["City"].value_counts().max() == 1
)["Country"].unique()
```

---

### **18. Continents dominated by one country (>40%)**

```python
df.groupby(["Continent", "Country"]).size() \
  .groupby(level=0) \
  .apply(lambda x: x.max() / x.sum() > 0.4)
```
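The expression for question 18 returns one boolean per continent. If the name of the dominating country is also wanted, a per-country share can be filtered directly. This is a sketch on hypothetical counts, not the dataset's actual numbers:

```python
import pandas as pd

# Hypothetical counts indexed by (Continent, Country); not the real dataset.
s = pd.Series(
    [6, 3, 1, 2, 2, 1],
    index=pd.MultiIndex.from_tuples(
        [
            ("Asia", "China"),
            ("Asia", "Malaysia"),
            ("Asia", "Vietnam"),
            ("Europe", "Russia"),
            ("Europe", "Poland"),
            ("Europe", "Sweden"),
        ],
        names=["Continent", "Country"],
    ),
)

# Share of each country within its own continent.
share = s / s.groupby(level="Continent").transform("sum")

# Keep only countries strictly above the 40% threshold.
dominant = share[share > 0.4]
print(dominant)
```

In this made-up example Europe's largest shares are exactly 40%, so the strict `>` comparison excludes them and only the Asian entry remains.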

---

### **19. Countries with max city spread**

```python
cities = df.groupby("Country")["City"].nunique()
cities[cities == cities.max()]  # keeps every country tied for the widest city spread
```

---

### **20. Cities shared by multiple countries**

```python
df.groupby("City")["Country"].nunique().loc[lambda x: x > 1]
```

---

## **E. Set Operations & Comparisons**

### **21. Asia vs North America unique countries**

```python
asia = set(df[df.Continent=="Asia"].Country)
na = set(df[df.Continent=="North America"].Country)

asia - na, na - asia
```

---

### **22. Continents with no country >5 buildings**

```python
df.groupby(["Continent", "Country"]).size() \
  .groupby(level=0).max().loc[lambda x: x <= 5]
```

---

### **23. Countries with exclusive cities**

```python
city_country_count = df.groupby("City")["Country"].nunique()
exclusive_cities = city_country_count[city_country_count == 1].index

df[df.City.isin(exclusive_cities)]["Country"].unique()
```

---

### **24. Jaccard similarity between continents**

```python
from itertools import combinations

continent_sets = df.groupby("Continent")["Country"].apply(set)

{(a,b): len(continent_sets[a]&continent_sets[b]) /
         len(continent_sets[a]|continent_sets[b])
 for a,b in combinations(continent_sets.index, 2)}
```
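Jaccard similarity is the size of the intersection divided by the size of the union. If every country maps to exactly one continent (the check in question 5), the country sets are disjoint and every pairwise score is 0. The small sketch below illustrates both cases with hypothetical sets. The helper name `jaccard` and the deliberate placement of "Russia" in two sets are assumptions made for the example.

```python
def jaccard(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical country sets. "Russia" is deliberately placed in two
# continents to produce a non-zero score; this is not a claim about the data.
asia = {"China", "Malaysia", "Russia"}
europe = {"Russia", "Poland", "Sweden"}
oceania = {"Australia"}

overlap_score = jaccard(asia, europe)     # 1 shared country out of 5 distinct
disjoint_score = jaccard(asia, oceania)   # no shared country

print(overlap_score, disjoint_score)
```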

---

### **25. Countries in top 3 of multiple continents**

```python
ranked = df.groupby(["Continent", "Country"]).size() \
           .groupby(level=0, group_keys=False) \
           .rank(ascending=False)

ranked[ranked <= 3].reset_index()["Country"].value_counts().loc[lambda x: x > 1]
```

---

## **F. Reshaping & MultiIndex**

### **26. MultiIndex creation**

```python
df.set_index(["Continent", "Country", "City"]).sort_index()
```
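Question 26 also asks for the index to be queried. A minimal sketch of two common lookups, using the five rows from the `df.head()` output in this notebook in place of the full file (the variable names are illustrative):

```python
import pandas as pd

# Five rows copied from the df.head() output; the real file has more.
sample = pd.DataFrame(
    {
        "Building Name": [
            "Burj Khalifa",
            "Merdeka 118",
            "Shanghai Tower",
            "Abraj Al-Bait Clock Tower",
            "Ping An Finance Centre",
        ],
        "City": ["Dubai", "Kuala Lumpur", "Shanghai", "Mecca", "Shenzhen"],
        "Country": [
            "United Arab Emirates",
            "Malaysia",
            "China",
            "Saudi Arabia",
            "China",
        ],
        "Continent": ["Asia"] * 5,
    }
)

# Sorting the index keeps partial-key lookups efficient and warning-free.
indexed = sample.set_index(["Continent", "Country", "City"]).sort_index()

# Partial key on the leading levels: all buildings in Asia / China.
china = indexed.loc[("Asia", "China"), :]

# Cross-section on an inner level: every row whose City is Shenzhen.
shenzhen = indexed.xs("Shenzhen", level="City")

print(china)
print(shenzhen)
```

`.loc` with a tuple works from the outermost level inward, while `.xs` can target any level by name, which is useful when the continent and country are not known in advance.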

---

### **27. Pivot table (Country vs Continent)**

```python
pd.pivot_table(df, index="Country", columns="Continent",
               values="Building Name", aggfunc="count", fill_value=0)
```

---

### **28. Top city per country**

```python
df.groupby(["Country", "City"]).size() \
  .groupby(level=0).idxmax().apply(lambda x: x[1])
```

---

### **29. Normalize building count within continent**

```python
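# group_keys=False stops recent pandas versions from prepending a duplicate Continent level.
df.groupby(["Continent", "Country"]).size() \
  .groupby(level=0, group_keys=False).apply(lambda x: x / x.sum())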
```

---

### **30. Continent entropy score**

```python
import numpy as np

def entropy(x):
    p = x / x.sum()
    return -(p * np.log2(p)).sum()

df.groupby("Continent")["Country"].value_counts() \
  .groupby(level=0).apply(entropy)
```
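A few hand-checkable inputs help confirm that the entropy function behaves as expected: a continent with a single country should score 0 bits, and one with four equally represented countries should score 2 bits. The series below are hypothetical count vectors, not values from the dataset.

```python
import numpy as np
import pandas as pd


def entropy(x):
    # Shannon entropy in bits of the distribution given by counts x.
    p = x / x.sum()
    return -(p * np.log2(p)).sum()


# Hypothetical per-country building counts for three imaginary continents.
single = pd.Series([5])            # one country holds every building
uniform = pd.Series([1, 1, 1, 1])  # four countries, one building each
skewed = pd.Series([2, 1, 1])      # one country holds half the buildings

h_single = entropy(single)
h_uniform = entropy(uniform)
h_skewed = entropy(skewed)

print(h_single, h_uniform, h_skewed)
```

Higher scores mean buildings are spread more evenly across countries. Since `value_counts()` on a plain string column reports only countries that occur at least once, `log2(0)` does not arise in the solution above.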

---

