In [31]:
import pandas as pd
import matplotlib as plt
import numpy as np
import requests
from bs4 import BeautifulSoup

---

## 🚀 Project Goals

The primary goals of this project were to:

* **Preprocess** meat production data from [Our World in Data](https://ourworldindata.org/grapher/animals-slaughtered-for-meat?tab=table&time=1962..2023)
* **Classify entities** as countries or regions for accurate comparisons
* **Scrape country flags**, and attach them to corresponding entities for Power BI visualization
* **Create clean, structured data** compatible with slicers, filters, and visuals in Power BI dashboards

---

## 🧱 Step-by-Step Data Engineering Process

### 1. 🏷️ Renaming Column Headers

The raw dataset had long, clunky column names that included measurement units and codes. To simplify the dataframe for analysis and melting:

```python
df.columns = df.columns[:3].tolist() + [
    'Cattle', 'Goat', 'Chicken', 'Turkey', 'Pig', 'Lamb/Mutton', 'Duck'
]
```

> ✅ This made the dataset more readable, accessible, and manageable for filtering and reshaping.

---

### 2. 🗺️ Assigning Region vs. Country Labels

Initial classification logic used a string pattern:

> If an entity contained `"("` and wasn’t in a known exception list, it was assumed to be a region.

Through inspection and validation, I updated the logic to handle edge cases like `"Macao"` and `"Polynesia"` that lacked `"("` but are actually regions.

```python
conditions = [
    df['Entity'].isin(exception_countries),  # E.g. 'China (FAO)', 'Sudan (former)'
    df['Entity'].isin(manual_regions),       # E.g. 'World', 'Yugoslavia', 'Oceania'
    df['Entity'].str.contains(r'\('),        # Most other regions
]
values = [0, 1, 1]

df['Region'] = np.select(conditions, values, default=0)
```

> ⚠️ This logic ensures that Power BI doesn't compare regional aggregates to individual countries in metrics or rankings.

---

### 3. 🔄 Wide → Long Data Transformation

To enable filtering by meat type in Power BI, the dataset was reshaped from **wide to long** format using `pandas.melt()`. This ensures each row represents one year, one entity, and one meat category.

```python
df_long = df.melt(
    id_vars=['Entity', 'Code', 'Year', 'Region'],
    value_vars=['Cattle', 'Goat', 'Chicken', 'Turkey', 'Pig', 'Lamb/Mutton', 'Duck'],
    var_name='Meat_Type',
    value_name='Slaughter_Count'
)
```

> ✅ This format is optimal for Power BI dashboards and time-series charts.

---

### 4. 🏳️ Flag Scraping: Matching Country Names to Flag Images

#### 🛑 Attempt 1: REST API

* Used `https://restcountries.com/v3.1/all`
* Returned flags in `.svg` format

> ❌ **Power BI does not render SVG images reliably**, so this was not suitable.

#### ✅ Attempt 2: HTML Scraping from [Flagpedia](https://flagpedia.net/index)

* Used `BeautifulSoup` to extract `<img>` tags and `<span>` names
* Constructed `.png` image URLs from the flag filenames
* Cleaned all URLs to remove query strings (e.g., `?v=un`)
* Mapped country names to Entity names, manually fixing mismatches for:

  * `"Cote d'Ivoire"` → `"Côte d'Ivoire"`
  * `"Sao Tome and Principe"` → `"São Tomé and Príncipe"`
  * `"East Timor"` → `"Timor-Leste"`
  * `"Democratic Republic of Congo"` → `"DR Congo"`

---

### 5. 🗂️ Splitting Data for Dashboard Use

To avoid mismatches and performance issues in Power BI:

* The **preprocessed slaughter data** (with meat types and region flag) was saved as:

```python
df_long.to_csv('../data/preprocessed_data.csv', index=False)
```

* The **flags dataframe** (only countries + flag URLs) was saved separately as:

```python
flags_df.to_csv('../data/flags.csv', index=False)
```

> ✅ This separation keeps Power BI relationships clean and avoids applying flags to regions or historic entities without proper images.

---

## 📌 Why This Matters

These transformations:

* ✅ Enable dynamic filtering by country, meat type, and time
* ✅ Preserve historical entities (e.g., `'Sudan (former)'`) without confusion
* ✅ Enhance visual clarity and engagement through national flags
* ✅ Maintain clean semantic separation between **countries vs. regions**

The resulting Power BI dashboard supports:

* Time-series trends
* Rankings (absolute, relative change)
* Flag icons
* Region-specific comparisons

---


In [32]:
url = "../data/raw_data.csv"

In [33]:
df = pd.read_csv(url)
df

Unnamed: 0,Entity,Code,Year,"Meat of cattle with the bone, fresh or chilled | 00000867 || Producing or slaughtered animals | 005320 || animals","Meat, goat | 00001017 || Producing or slaughtered animals | 005320 || animals","Meat, chicken | 00001058 || Producing or slaughtered animals | 005321 || animals","Meat, turkey | 00001080 || Producing or slaughtered animals | 005321 || animals","Meat, pig | 00001035 || Producing or slaughtered animals | 005320 || animals","Meat, lamb and mutton | 00000977 || Producing or slaughtered animals | 005320 || animals","Meat, duck | 00001069 || Producing or slaughtered animals | 005321 || animals"
0,Afghanistan,AFG,1961,360000.0,940000.0,7000000.0,,,4336000.0,
1,Afghanistan,AFG,1962,384000.0,875000.0,7500000.0,,,4355000.0,
2,Afghanistan,AFG,1963,396000.0,810000.0,7700000.0,,,4673000.0,
3,Afghanistan,AFG,1964,402000.0,750000.0,8000000.0,,,5010000.0,
4,Afghanistan,AFG,1965,408000.0,875000.0,8500000.0,,,5179000.0,
...,...,...,...,...,...,...,...,...,...,...
14575,Zimbabwe,ZWE,2019,2350000.0,2461470.0,71479000.0,29000.0,193820.0,51842.0,28000.0
14576,Zimbabwe,ZWE,2020,2300000.0,2501662.0,71367000.0,30000.0,183923.0,52614.0,28000.0
14577,Zimbabwe,ZWE,2021,2576748.0,2587830.0,82885000.0,30000.0,196173.0,47741.0,26000.0
14578,Zimbabwe,ZWE,2022,2675482.0,2713257.0,69773000.0,31000.0,283905.0,48159.0,25000.0


In [34]:
df.columns

Index(['Entity', 'Code', 'Year',
       'Meat of cattle with the bone, fresh or chilled | 00000867 || Producing or slaughtered animals | 005320 || animals',
       'Meat, goat | 00001017 || Producing or slaughtered animals | 005320 || animals',
       'Meat, chicken | 00001058 || Producing or slaughtered animals | 005321 || animals',
       'Meat, turkey | 00001080 || Producing or slaughtered animals | 005321 || animals',
       'Meat, pig | 00001035 || Producing or slaughtered animals | 005320 || animals',
       'Meat, lamb and mutton | 00000977 || Producing or slaughtered animals | 005320 || animals',
       'Meat, duck | 00001069 || Producing or slaughtered animals | 005321 || animals'],
      dtype='object')

In [35]:
df.columns = df.columns[:3].tolist() + [
    'Cattle', 'Goat', 'Chicken', 'Turkey', 'Pig', 'Lamb/Mutton', 'Duck'
]
df.columns

Index(['Entity', 'Code', 'Year', 'Cattle', 'Goat', 'Chicken', 'Turkey', 'Pig',
       'Lamb/Mutton', 'Duck'],
      dtype='object')

In [36]:
unique_entities = df['Entity'].unique()
unique_entities

array(['Afghanistan', 'Africa', 'Africa (FAO)', 'Albania', 'Algeria',
       'Americas (FAO)', 'Angola', 'Antigua and Barbuda', 'Argentina',
       'Armenia', 'Asia', 'Asia (FAO)', 'Australia', 'Austria',
       'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados',
       'Belarus', 'Belgium', 'Belgium-Luxembourg (FAO)', 'Belize',
       'Benin', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana',
       'Brazil', 'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi',
       'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Caribbean (FAO)',
       'Central African Republic', 'Central America (FAO)',
       'Central Asia (FAO)', 'Chad', 'Chile', 'China', 'China (FAO)',
       'Colombia', 'Comoros', 'Congo', 'Cook Islands', 'Costa Rica',
       "Cote d'Ivoire", 'Croatia', 'Cuba', 'Cyprus', 'Czechia',
       'Czechoslovakia', 'Democratic Republic of Congo', 'Denmark',
       'Djibouti', 'Dominica', 'Dominican Republic', 'East Timor',
       'Eastern Africa (FAO)', 'Eastern Asia (

In [37]:
country_entities = [i for i in unique_entities if "(" not in i]
country_entities

['Afghanistan',
 'Africa',
 'Albania',
 'Algeria',
 'Angola',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Asia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'Brunei',
 'Bulgaria',
 'Burkina Faso',
 'Burundi',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Cape Verde',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Congo',
 'Cook Islands',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Cuba',
 'Cyprus',
 'Czechia',
 'Czechoslovakia',
 'Democratic Republic of Congo',
 'Denmark',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'East Timor',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eritrea',
 'Estonia',
 'Eswatini',
 'Ethiopia',
 'Europe',
 'Faroe Islands',
 'Fiji',
 'Finland',
 'France',
 'French Guiana',
 'French Polynesia',
 'Gabon',
 'Gambia',
 'Georgia',
 'Ge

In [38]:
paran_entities = [i for i in unique_entities if "(" in i]
print(paran_entities)
print(len(paran_entities))

['Africa (FAO)', 'Americas (FAO)', 'Asia (FAO)', 'Belgium-Luxembourg (FAO)', 'Caribbean (FAO)', 'Central America (FAO)', 'Central Asia (FAO)', 'China (FAO)', 'Eastern Africa (FAO)', 'Eastern Asia (FAO)', 'Eastern Europe (FAO)', 'Ethiopia (former)', 'Europe (FAO)', 'European Union (27)', 'European Union (27) (FAO)', 'Land Locked Developing Countries (FAO)', 'Least Developed Countries (FAO)', 'Low Income Food Deficit Countries (FAO)', 'Micronesia (FAO)', 'Micronesia (country)', 'Middle Africa (FAO)', 'Net Food Importing Developing Countries (FAO)', 'Northern Africa (FAO)', 'Northern America (FAO)', 'Northern Europe (FAO)', 'Oceania (FAO)', 'Small Island Developing States (FAO)', 'South America (FAO)', 'South-eastern Asia (FAO)', 'Southern Africa (FAO)', 'Southern Asia (FAO)', 'Southern Europe (FAO)', 'Sudan (former)', 'Western Africa (FAO)', 'Western Asia (FAO)', 'Western Europe (FAO)']
36


**'China (FAO)'**,   **'Micronesia (FAO)'**,   **'Sudan (former)'**,  **'Micronesia (country)'**,  **'Sudan (former)'**,  and  **'Belgium-Luxembourg (FAO)'**   are  all  countries  and  not  regions

In [39]:
exception_countries = ['China (FAO)','Micronesia (FAO)', 'Micronesia (country)', 'Sudan (former)']
manual_regions = [
    'Africa', 'Asia', 'Europe', 'North America', 'South America', 'Oceania',
    'Polynesia', 'Melanesia',
    'High-income countries', 'Low-income countries',
    'Lower-middle-income countries', 'Upper-middle-income countries',
    'World', 'Serbia and Montenegro', 'USSR', 'Yugoslavia', 'Reunion'
]

In [40]:
conditions = [
    df['Entity'].isin(exception_countries),  # countries with "(" or odd names
    df['Entity'].isin(manual_regions),       # known regions without parentheses
    df['Entity'].str.contains(r'\('),        # anything else with parentheses
]
values = [0, 1, 1]

In [41]:
df['Region'] = np.select(conditions, values, default=0)
df[['Entity', 'Region']]

Unnamed: 0,Entity,Region
0,Afghanistan,0
1,Afghanistan,0
2,Afghanistan,0
3,Afghanistan,0
4,Afghanistan,0
...,...,...
14575,Zimbabwe,0
14576,Zimbabwe,0
14577,Zimbabwe,0
14578,Zimbabwe,0


In [42]:
df[conditions[1]]

Unnamed: 0,Entity,Code,Year,Cattle,Goat,Chicken,Turkey,Pig,Lamb/Mutton,Duck,Region
63,Africa,,1961,12614469.0,26487736.0,350404000.0,1425000.0,3835586.0,34614513.0,8221000.0,1
64,Africa,,1962,12712172.0,27080963.0,364998000.0,1510000.0,3987514.0,35021812.0,8411000.0,1
65,Africa,,1963,12786035.0,26721023.0,376697000.0,1585000.0,4070344.0,35340310.0,8587000.0,1
66,Africa,,1964,13196142.0,27231173.0,397380000.0,1670000.0,4313906.0,36815350.0,8783000.0,1
67,Africa,,1965,13432494.0,28267679.0,415427000.0,1756000.0,4536320.0,38013473.0,8860000.0,1
...,...,...,...,...,...,...,...,...,...,...,...
14449,Yugoslavia,OWID_YGS,1987,2185000.0,,403669000.0,2920000.0,14128000.0,5238000.0,200000.0,1
14450,Yugoslavia,OWID_YGS,1988,2207000.0,,391776000.0,2870000.0,14156000.0,5476000.0,250000.0,1
14451,Yugoslavia,OWID_YGS,1989,2167500.0,,324900000.0,2825000.0,13758000.0,5285000.0,240000.0,1
14452,Yugoslavia,OWID_YGS,1990,2461000.0,,320000000.0,3050000.0,13325513.0,5279000.0,270000.0,1


In [43]:
meat_columns = df.columns[4:]  # All meat types

df_long = df.melt(
    id_vars=['Entity', 'Code', 'Year', 'Region'],
    value_vars=meat_columns,
    var_name='Meat_Type',
    value_name='Slaughter_Count'
)
df_long

Unnamed: 0,Entity,Code,Year,Region,Meat_Type,Slaughter_Count
0,Afghanistan,AFG,1961,0,Goat,940000.0
1,Afghanistan,AFG,1962,0,Goat,875000.0
2,Afghanistan,AFG,1963,0,Goat,810000.0
3,Afghanistan,AFG,1964,0,Goat,750000.0
4,Afghanistan,AFG,1965,0,Goat,875000.0
...,...,...,...,...,...,...
87475,Zimbabwe,ZWE,2019,0,Duck,28000.0
87476,Zimbabwe,ZWE,2020,0,Duck,28000.0
87477,Zimbabwe,ZWE,2021,0,Duck,26000.0
87478,Zimbabwe,ZWE,2022,0,Duck,25000.0


In [44]:
resp = requests.get("https://flagpedia.net/index")

# print(resp.status_code)
# print(resp.headers['Content-Type'])
# print(resp.text[:500])  # Preview first 500 chars
# countries = resp.json()
# countries

# Realized the request is getting html and not json data so I need to use beautiful soup to parse it

soup = BeautifulSoup(resp.content, "html.parser")
soup

<!DOCTYPE html>

<html dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<title>Country flags of the world (list of all 254) | Flagpedia.net</title>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="all, follow" name="robots"/>
<meta content="Up-to-date list of all 254 country flags of the world with images, names and main information about countries." name="description"/>
<meta content="Country flags of the world (list of all 254)" property="og:title"/>
<meta content="Up-to-date list of all 254 country flags of the world with images, names and main information about countries." property="og:description"/>
<meta content="https://flagpedia.net/assets/og-en.png" property="og:image"/>
<meta content="https://flagpedia.net/index" property="og:url"/>
<link href="https://flagpedia.net/index" rel="canonical"/>
<link href="/apple-touch-icon.png" rel="apple-touch-icon" sizes="180x180"/>
<link href="/favic

In [45]:
# flags_list = [
#     {
#       "Entity": c.get("name", {}).get("common"),
#       "Flag_URL": c.get("flags", {}).get("png") or c.get("flags", {}).get("jpg")
#     }
#     for c in countries
# ]
# flags_df = pd.DataFrame(flags_list)
# flags_df

entities = []

for item in soup.select("ul.flag-grid li a"):
    # Get country/entity name from <span>
    name_tag = item.find("span")
    entity = name_tag.text.strip() if name_tag else None

    # Get country code from img src
    img_tag = item.find("img")
    if img_tag:
        src = img_tag.get("src", "")
        filename = src.split("/")[-1].split("?")[0]  # e.g., 'ax.png'
        # Create official .png URL (16x12, 32x24, or w320 version)
        flag_url = f"https://flagcdn.com/w320/{filename}"

        entities.append({
            "Entity": entity,
            "Flag_URL": flag_url
        })

entities[:5]

[{'Entity': 'Afghanistan', 'Flag_URL': 'https://flagcdn.com/w320/af.png'},
 {'Entity': 'Åland Islands', 'Flag_URL': 'https://flagcdn.com/w320/ax.png'},
 {'Entity': 'Albania', 'Flag_URL': 'https://flagcdn.com/w320/al.png'},
 {'Entity': 'Algeria', 'Flag_URL': 'https://flagcdn.com/w320/dz.png'},
 {'Entity': 'American Samoa', 'Flag_URL': 'https://flagcdn.com/w320/as.png'}]

In [46]:
flags_df = pd.DataFrame(entities)
flags_df

Unnamed: 0,Entity,Flag_URL
0,Afghanistan,https://flagcdn.com/w320/af.png
1,Åland Islands,https://flagcdn.com/w320/ax.png
2,Albania,https://flagcdn.com/w320/al.png
3,Algeria,https://flagcdn.com/w320/dz.png
4,American Samoa,https://flagcdn.com/w320/as.png
...,...,...
249,Wallis and Futuna,https://flagcdn.com/w320/wf.png
250,Western Sahara,https://flagcdn.com/w320/eh.png
251,Yemen,https://flagcdn.com/w320/ye.png
252,Zambia,https://flagcdn.com/w320/zm.png


In [47]:
flags_df['Flag_URL'].dtype

dtype('O')

In [48]:
db_flag_names = flags_df['Entity']
db_flag_names

0            Afghanistan
1          Åland Islands
2                Albania
3                Algeria
4         American Samoa
             ...        
249    Wallis and Futuna
250       Western Sahara
251                Yemen
252               Zambia
253             Zimbabwe
Name: Entity, Length: 254, dtype: object

In [49]:
all_countries_of_slaught = country_entities + exception_countries
all_countries_of_slaught

['Afghanistan',
 'Africa',
 'Albania',
 'Algeria',
 'Angola',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Asia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'Brunei',
 'Bulgaria',
 'Burkina Faso',
 'Burundi',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Cape Verde',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Congo',
 'Cook Islands',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Cuba',
 'Cyprus',
 'Czechia',
 'Czechoslovakia',
 'Democratic Republic of Congo',
 'Denmark',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'East Timor',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eritrea',
 'Estonia',
 'Eswatini',
 'Ethiopia',
 'Europe',
 'Faroe Islands',
 'Fiji',
 'Finland',
 'France',
 'French Guiana',
 'French Polynesia',
 'Gabon',
 'Gambia',
 'Georgia',
 'Ge

In [50]:
left_out_flags = []

# Normalize database names: lowercase, stripped
db_flag_names_normalized = set(name.lower().strip() for name in db_flag_names)

for i in all_countries_of_slaught:
    if '(' in i:
        cleaned = i.split('(')[0].strip().lower()
    else:
        cleaned = i.strip().lower()

    if cleaned not in db_flag_names_normalized:
        left_out_flags.append(i)

print(f"{len(left_out_flags)} countries without matching flag")
print(left_out_flags)  # preview some of them


24 countries without matching flag
['Africa', 'Asia', 'Congo', "Cote d'Ivoire", 'Czechoslovakia', 'Democratic Republic of Congo', 'East Timor', 'Europe', 'High-income countries', 'Low-income countries', 'Lower-middle-income countries', 'Macao', 'Melanesia', 'North America', 'Oceania', 'Polynesia', 'Reunion', 'Sao Tome and Principe', 'Serbia and Montenegro', 'South America', 'USSR', 'Upper-middle-income countries', 'World', 'Yugoslavia']


In [51]:
problem_countries = [
    'Congo', "Cote d'Ivoire", 'Czechoslovakia', 'Democratic Republic of Congo',
    'East Timor', 'Sao Tome and Principe', 'Yugoslavia'
]

In [52]:
from difflib import get_close_matches

for country in problem_countries:
    close = get_close_matches(country, flags_df['Entity'].tolist(), n=3, cutoff=0.5)
    print(f"{country} → {close}")

Congo → ['DR Congo', 'Togo', 'Mongolia']
Cote d'Ivoire → ["Côte d'Ivoire", 'South Korea', 'North Korea']
Czechoslovakia → ['Czechia', 'Slovakia', 'Colombia']
Democratic Republic of Congo → ['Republic of the Congo', 'Dominican Republic', 'Central African Republic']
East Timor → []
Sao Tome and Principe → ['São Tomé and Príncipe']
Yugoslavia → ['Bolivia', 'Mongolia', 'Bulgaria']


In [53]:
df[df['Entity'].str.lower().str.contains("cz")]['Entity'].unique()

array(['Czechia', 'Czechoslovakia'], dtype=object)

In [54]:
problem_map = {
    'Congo': 'Republic of the Congo',
    'Democratic Republic of Congo': 'DR Congo',
    "Cote d'Ivoire": "Côte d'Ivoire",
    'East Timor': 'Timor-Leste',
    'Sao Tome and Principe': 'São Tomé and Príncipe',
    'Czechoslovakia': 'Czechia'
}

In [55]:
# Use the original scraped 'entities' list
corrected_rows = []
for missing_entity, replacement_entity in problem_map.items():
    match = next((item for item in entities if item['Entity'] == replacement_entity), None)
    if match:
        corrected_rows.append({
            'Entity': missing_entity,
            'Flag_URL': match['Flag_URL']
        })
    else:
        print(f"❌ Could not find flag for: {replacement_entity}")

# Convert to DataFrame
manual_flags_df = pd.DataFrame(corrected_rows)

# Append to your main flag DataFrame
flags_df = pd.concat([flags_df, manual_flags_df], ignore_index=True)

In [56]:
flags_df.tail(7)

Unnamed: 0,Entity,Flag_URL
253,Zimbabwe,https://flagcdn.com/w320/zw.png
254,Congo,https://flagcdn.com/w320/cg.png
255,Democratic Republic of Congo,https://flagcdn.com/w320/cd.png
256,Cote d'Ivoire,https://flagcdn.com/w320/ci.png
257,East Timor,https://flagcdn.com/w320/tl.png
258,Sao Tome and Principe,https://flagcdn.com/w320/st.png
259,Czechoslovakia,https://flagcdn.com/w320/cz.png


In [57]:
df[df['Entity'].str.lower().str.contains("sudan")]['Entity'].unique()

array(['South Sudan', 'Sudan', 'Sudan (former)'], dtype=object)

In [61]:
df_long

Unnamed: 0,Entity,Code,Year,Region,Meat_Type,Slaughter_Count
0,Afghanistan,AFG,1961,0,Goat,940000.0
1,Afghanistan,AFG,1962,0,Goat,875000.0
2,Afghanistan,AFG,1963,0,Goat,810000.0
3,Afghanistan,AFG,1964,0,Goat,750000.0
4,Afghanistan,AFG,1965,0,Goat,875000.0
...,...,...,...,...,...,...
87475,Zimbabwe,ZWE,2019,0,Duck,28000.0
87476,Zimbabwe,ZWE,2020,0,Duck,28000.0
87477,Zimbabwe,ZWE,2021,0,Duck,26000.0
87478,Zimbabwe,ZWE,2022,0,Duck,25000.0


In [59]:
df_long.to_csv('../data/preprocessed_data.csv', index=False)

In [60]:
flags_df.to_csv('../data/flags.csv', index=False)