# List of Countries by GDP Sector Composition
The dataset is available on [Kaggle](https://www.kaggle.com/datasets/rajkumarpandey02/list-of-countries-by-gdp-sector-composition).
The goal of this notebook is to visualise the GDP sector composition of countries, that is, the share of agriculture, industry, and services in each country's GDP.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from os import stat
from IPython.display import display
%matplotlib inline

## Importing the dataset
First, we load the dataset with the `pandas` library, rename the unnamed columns, convert the values to numbers, and drop the rows with missing values.

In [None]:
# Size of the file

filename = 'data/gdp-sector-composition.csv'

file = stat(filename)
print(f'File size: {file.st_size / 1024:.1f} KiB')

# Read the CSV file

df = pd.read_csv(filename, low_memory=False)
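# Give the unnamed columns readable names.
# Note: 'Industry % of GPD' (sic) is the spelling the plotting cells below rely on.
df.rename(columns={
    'Unnamed: 5': 'Rank Agriculture',
    'Unnamed: 6': 'Agriculture % of GDP',
    'Unnamed: 8': 'Rank Industry',
    'Unnamed: 9': 'Industry % of GPD',
    'Unnamed: 11': 'Rank Services',
    'Unnamed: 12': 'Services % of GDP',
}, inplace=True)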

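# Convert every text column except the country name to numbers;
# values that cannot be parsed become NaN.
for col in df.columns:
    if col.startswith('Country'):
        continue
    if df[col].dtype == 'object':
        df[col] = pd.to_numeric(df[col], errors='coerce')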

if df.isna().sum().sum() > 0:
    print('There are missing values in the dataset.')
    print('Dropping rows with missing values')
    print(f'Number of rows before dropping: {df.shape[0]}')
    df.dropna(axis=0, inplace=True)
    print(f'Number of rows after dropping: {df.shape[0]}')
else:
    print(f'Data from the CSV file: {df.shape}')
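
To make the cleaning step above more concrete, here is a minimal sketch on a small made-up frame. The country names and values are purely illustrative and are not taken from the dataset. `pd.to_numeric(..., errors='coerce')` turns any cell that cannot be parsed as a number into `NaN`, and `dropna` then removes every row that contains one.

```python
import pandas as pd

# Hypothetical sample mimicking the structure of the real file.
sample = pd.DataFrame({
    'Country/Economy': ['Aland', 'Borduria', 'Syldavia'],
    'Agriculture % of GDP': ['1.5', '-', '12.0'],   # '-' stands for a missing value
    'Services % of GDP': ['80.0', '55.5', 'n/a'],   # so does 'n/a'
})

for col in sample.columns:
    if col.startswith('Country'):
        continue  # keep the country names as text
    sample[col] = pd.to_numeric(sample[col], errors='coerce')

# Count the missing values per column before dropping anything.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Keep only the rows where every value is present.
cleaned = sample.dropna(axis=0)
print(cleaned)
```

Printing the per-column count before dropping shows which columns are responsible for the lost rows, which a single total does not reveal.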

## Data visualization
We first inspect the cleaned data, then visualize it using the `matplotlib` library.

In [None]:
df.info()  # info() prints its summary itself and returns None, so display() is not needed
display(df.head(10))
display(df.describe())

Now that our data is ready, we can plot the distribution of each sector's share of GDP across countries.

In [None]:
plt.figure(figsize=(15, 9))
plt.hist(df['Agriculture % of GDP'], bins=20, color = 'blue', alpha = 0.5, histtype='stepfilled', label='Agriculture', edgecolor='black')
plt.hist(df['Industry % of GPD'], bins=20, color = 'red', alpha = 0.5, histtype='stepfilled', label='Industry', edgecolor='black')
plt.hist(df['Services % of GDP'], bins=20, color = 'green', alpha = 0.5, histtype='stepfilled', label='Services', edgecolor='black')
plt.legend()
plt.title('GDP Sector Composition')
plt.show()

Next, we plot the weight of each sector per country.

In [None]:
plt.figure(figsize=(15, 9))
plt.title('Weight of agriculture in GDP')
plt.barh(df['Country/Economy'], df['Agriculture % of GDP'], color = 'blue')
plt.show()

In [None]:
plt.figure(figsize=(15, 9))
plt.title('Weight of industry in GDP')
# The column was renamed to 'Industry % of GPD' (sic) when the dataset was loaded
plt.barh(df['Country/Economy'], df['Industry % of GPD'], color = 'blue')
plt.show()

In [None]:
plt.figure(figsize=(15, 9))
plt.title('Weight of services in GDP')
plt.barh(df['Country/Economy'], df['Services % of GDP'], color = 'blue')
plt.show()
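
With one bar per country, the charts above can become hard to read. One possible refinement, sketched here under assumptions, is to plot only the countries with the largest share of a sector. The helper name `top_n_by_sector`, the default `n`, and the demo values are hypothetical and not part of the original notebook.

```python
import pandas as pd

def top_n_by_sector(data: pd.DataFrame, column: str, n: int = 20) -> pd.DataFrame:
    """Return the n rows with the largest values in `column`, smallest first.

    Ascending order suits plt.barh, which draws the first row at the bottom,
    so the largest value ends up at the top of the chart.
    """
    return data.nlargest(n, column).sort_values(column)

# Made-up values for illustration only.
demo = pd.DataFrame({
    'Country/Economy': ['A', 'B', 'C', 'D'],
    'Agriculture % of GDP': [3.0, 25.0, 10.0, 1.0],
})

top = top_n_by_sector(demo, 'Agriculture % of GDP', n=2)
print(top)
```

The result could then be passed to the same call as before, for example `plt.barh(top['Country/Economy'], top['Agriculture % of GDP'])`.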

## Sector breakdown per country

In [None]:
sector_columns = ['Agriculture % of GDP', 'Industry % of GPD', 'Services % of GDP']
for country in df['Country/Economy']:
    # Sector shares of the current country, in the same order as the labels
    shares = df.loc[df['Country/Economy'] == country, sector_columns].values[0]
    plt.figure(figsize=(10, 7))
    plt.title(f'Weight of sectors in GDP of {country}')
    plt.pie(shares, labels=['Agriculture', 'Industry', 'Services'], autopct='%1.1f%%', shadow=True, startangle=90)
    plt.show()