# Designing Features for Social Data Research

## Topic: Exploring the relationship between a country’s education expenditure and the number of refugees it hosts

## 🎯 Research Objective

To explore whether there is a relationship between a country’s education expenditure (% of GDP) and the number of refugees it hosts over time.
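A natural first exploratory step is to summarize the association with a Pearson correlation coefficient over country-year observations. The sketch below is a minimal, dependency-free version in Python; the function name `pearson_r` and the sample values are hypothetical, not real data. Pooling all country-years mixes differences between countries with changes within a country over time, so the coefficient is a starting point rather than a finding.

```python
from __future__ import annotations

import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical country-year values for illustration only (not real data).
edu_exp_gdp = [3.1, 4.5, 5.2, 6.0]
refugee_count = [12_000, 30_000, 41_000, 55_000]
print(round(pearson_r(edu_exp_gdp, refugee_count), 3))
```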


## 🧱 1. Core Variables

| Feature Name   | Description                              | Type         |
|----------------|------------------------------------------|--------------|
| `country`      | Name of the country                      | Categorical  |
| `year`         | Observation year                         | Numerical    |
| `edu_exp_gdp`  | Education expenditure as % of GDP        | Numerical    |
| `refugee_count`| Number of refugees hosted                | Numerical    |
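Since every country is observed in several years, these core variables form a country-year panel. A minimal sketch of one record and of grouping records into per-country time series, assuming Python's standard `dataclasses` module; `Observation` and `to_panel` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One country-year row holding the core variables."""

    country: str         # categorical
    year: int            # observation year
    edu_exp_gdp: float   # education expenditure, % of GDP
    refugee_count: int   # refugees hosted in that year


def to_panel(rows: list[Observation]) -> dict[str, list[Observation]]:
    """Group observations by country, each series sorted by year."""
    panel: dict[str, list[Observation]] = {}
    for row in rows:
        panel.setdefault(row.country, []).append(row)
    for series in panel.values():
        series.sort(key=lambda obs: obs.year)
    return panel
```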



## 📊 2. Control Variables (to reduce confounding bias)

## 🔹 Economic Indicators

| Feature Name        | Description                             |
|---------------------|-----------------------------------------|
| `gdp_per_capita`    | GDP per capita (USD)                    |
| `gov_exp_gdp`       | Total government spending (% of GDP)    |
| `unemployment_rate` | Unemployment rate (%)                   |
| `inflation_rate`    | Annual inflation (%)                    |

## 🔹 Social Indicators

| Feature Name        | Description                          |
|---------------------|--------------------------------------|
| `population`        | Total population                     |
| `urban_pop_percent` | Urban population (% of total)        |
| `literacy_rate`     | Adult literacy rate (%)              |
| `education_index`   | UNDP’s Education Index               |
| `hdi`               | Human Development Index              |

## 🔹 Refugee-Specific Indicators

| Feature Name           | Description                                                 |
|------------------------|-------------------------------------------------------------|
| `refugees_per_capita`  | Refugees hosted per 1,000 inhabitants                       |
| `asylum_apps`          | Number of asylum applications received                      |
| `refugee_policy_score` | Score indicating how open or restrictive refugee policy is  |
| `refugee_growth_rate`  | Year-over-year % change in `refugee_count`                  |
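The per-capita and growth features are simple transformations of `refugee_count`. A sketch, assuming `population` is the total population from the social indicators and interpreting `refugee_growth_rate` as a percentage change from the previous year (an assumption; an absolute difference would also fit a looser reading). The function names are hypothetical.

```python
from __future__ import annotations


def refugees_per_1000(refugee_count: float, population: float) -> float:
    """Refugees hosted per 1,000 inhabitants (the `refugees_per_capita` feature)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return refugee_count / population * 1_000


def growth_rates(counts_by_year: dict[int, float]) -> dict[int, float | None]:
    """Year-over-year % change in refugee count, keyed by the later year.

    Returns None when the previous year is missing or had zero refugees,
    because the percentage change is undefined in those cases.
    """
    rates: dict[int, float | None] = {}
    for year in sorted(counts_by_year):
        prev = counts_by_year.get(year - 1)
        if prev is None or prev == 0:
            rates[year] = None
        else:
            rates[year] = (counts_by_year[year] - prev) / prev * 100
    return rates
```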

## 🔹 Political/Geopolitical Indicators

| Feature Name          | Description                                                                         |
|-----------------------|-------------------------------------------------------------------------------------|
| `political_stability` | World Bank WGI Political Stability and Absence of Violence/Terrorism estimate       |
| `democracy_index`     | EIU Democracy Index or Freedom House score                                          |
| `conflict_nearby`     | Binary: 1 if at least one neighboring country is in armed conflict that year, else 0 |
| `oda_received`        | Net Official Development Assistance received (USD)                                  |
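`conflict_nearby` has to be computed per year from two inputs: a mapping from each country to its neighbors, and the set of countries coded as being in armed conflict that year. Both inputs and the function name are hypothetical placeholders in this sketch; in practice they would come from a border dataset and a conflict dataset.

```python
from __future__ import annotations


def conflict_nearby(
    country: str,
    neighbors: dict[str, set[str]],
    in_conflict: set[str],
) -> int:
    """Return 1 if any neighbor of `country` is in conflict this year, else 0."""
    # Countries absent from the mapping are treated as having no neighbors.
    return int(bool(neighbors.get(country, set()) & in_conflict))
```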




## 🛠️ 3. Optional Derived Features

| Feature Name             | Description                                                                      |
|--------------------------|----------------------------------------------------------------------------------|
| `edu_exp_per_capita`     | Estimated education spending per person (USD): `edu_exp_gdp` / 100 × `gdp_per_capita` |
| `refugees_per_gdp`       | Refugees per million USD of GDP                                                  |
| `refugees_to_edu_ratio`  | `refugee_count` / `edu_exp_gdp`                                                  |
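The derived features can be computed directly from the core and control variables. This sketch assumes total GDP is approximated as `gdp_per_capita` times `population` (a total-GDP series could be used instead) and that `edu_exp_gdp` is expressed in percent, so 5.0 means 5% of GDP. The function names mirror the feature names but are otherwise hypothetical. Because `refugees_to_edu_ratio` divides a head count by a percentage, it has no natural unit and is best compared across observations rather than read on its own.

```python
from __future__ import annotations


def edu_exp_per_capita(edu_exp_gdp: float, gdp_per_capita: float) -> float:
    """Education spend per person (USD): share of GDP times GDP per person."""
    return edu_exp_gdp / 100 * gdp_per_capita


def refugees_per_gdp(
    refugee_count: float, gdp_per_capita: float, population: float
) -> float:
    """Refugees per million USD of total GDP (GDP = per-capita GDP x population)."""
    gdp_millions = gdp_per_capita * population / 1_000_000
    return refugee_count / gdp_millions


def refugees_to_edu_ratio(refugee_count: float, edu_exp_gdp: float) -> float:
    """Refugee count divided by education expenditure (% of GDP)."""
    return refugee_count / edu_exp_gdp
```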


## 📚 4. Data Sources (Suggested)

| Source                                      | Data                                                          |
|---------------------------------------------|---------------------------------------------------------------|
| World Bank (World Development Indicators)   | Education, GDP, population, government spending               |
| UNHCR                                       | Refugee and asylum data                                       |
| UNDP Human Development Report (HDR)         | HDI, Education Index                                          |
| Worldwide Governance Indicators (World Bank)| Political Stability and Absence of Violence/Terrorism         |
| EIU / MIPEX / OECD                          | Democracy Index (EIU); refugee and migration policy (MIPEX, OECD) |
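Because each source covers different countries and years, the features are typically joined on a (country, year) key. A minimal inner-join sketch in plain Python follows; `merge_sources` is a hypothetical helper. Country names often differ between sources, so matching on a standard code such as ISO 3166-1 alpha-3 is usually safer than matching on names.

```python
from __future__ import annotations


def merge_sources(
    *sources: dict[tuple[str, int], dict[str, float]],
) -> dict[tuple[str, int], dict[str, float]]:
    """Inner-join feature dicts keyed by (country_code, year).

    Only country-years present in every source are kept. If two sources
    use the same feature name, the later source's value wins.
    """
    if not sources:
        return {}
    common = set(sources[0]).intersection(*sources[1:])
    merged: dict[tuple[str, int], dict[str, float]] = {}
    for key in sorted(common):
        row: dict[str, float] = {}
        for source in sources:
            row.update(source[key])
        merged[key] = row
    return merged
```

An inner join drops any country-year missing from one source; keeping all rows of the refugee data instead (a left join) would retain those observations, with the gaps handled separately.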

