# Countries and Industries as Nodes

This notebook builds and visualises networks in which countries and industries are the nodes.

### Table of contents:

* [1. Build networks with countries as nodes](#chapter1)
    * [1.1 Calculate the common attributes and build a network df](#section_1_1)
        * [All industries](#sub_section_1_1_1)
        * [Diversified Banks](#sub_section_1_1_2)
        * [Industrial Machinery and Supplies and Components](#sub_section_1_1_3)
        * [Pharmaceuticals](#sub_section_1_1_4)
    * [1.2 Create network graphs and visualize them on a Europe map](#section_1_2)
        * [All industries](#sub_section_1_2_1)
        * [Diversified Banks](#sub_section_1_2_2)
        * [Industrial Machinery and Supplies and Components](#sub_section_1_2_3)
        * [Pharmaceuticals](#sub_section_1_2_4)
    * [1.3 Create clusters by applying the Louvain method (all industries)](#section_1_3)
        * [2011](#sub_section_1_3_1)
        * [2013](#sub_section_1_3_2)
        * [2016](#sub_section_1_3_3)
        * [2022](#sub_section_1_3_4)
    * [1.4 Financial data](#section_1_4)
* [2. Build networks with industries as nodes](#chapter2)
    * [2.1 Calculate the common attributes and build a network df](#section_2_1)
        * [Filter the dataframes by Weight (>700)](#sub_section_2_1_1)
    * [2.2 Create network graphs and visualize them](#section_2_2)
    * [2.3 Create clusters by applying the Louvain method](#section_2_3)
        * [2011](#sub_section_2_3_1)
        * [2013](#sub_section_2_3_2)
        * [2016](#sub_section_2_3_3)
        * [2022](#sub_section_2_3_4)


### Import packages and files

In [227]:
# Import packages
import pandas as pd
import networkx as nx
from collections import defaultdict
import matplotlib.pyplot as plt
import numpy as np
import os
from sklearn.preprocessing import MinMaxScaler
import community as community_louvain
import random
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from itertools import combinations
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import mpl_interactions.ipyplot as iplt
import ipywidgets as widgets
from IPython.display import display, clear_output


In [228]:
# Import csv files
# CIQ Data extract

CIQ_df = pd.read_csv('../Files/CIQ_data_DS_LAB_Preprocessed.csv')

# Network data
df_2011 = pd.read_csv('../Files/labeled_network_2011_dach_20240114.csv')
df_2013 = pd.read_csv('../Files/labeled_network_2013_dach_20240114.csv')
df_2016 = pd.read_csv('../Files/labeled_network_2016_dach_20240114.csv')
df_2022 = pd.read_csv('../Files/labeled_network_2022_dach_20240114.csv')


In [229]:
# 2022: Update the 'label' column where it contains 'bm', 'supplier', 'customer', 'ceo' or 'cfo'
df_2022.loc[df_2022['label'].str.contains('bm'), 'label'] = 'person'
df_2022.loc[df_2022['label'].str.contains('supplier'), 'label'] = 'org'
df_2022.loc[df_2022['label'].str.contains('customer'), 'label'] = 'org'
df_2022.loc[df_2022['label'].str.contains('ceo'), 'label'] = 'person'
df_2022.loc[df_2022['label'].str.contains('cfo'), 'label'] = 'person'
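
The five assignments above follow one pattern, so they could also be written as a loop over ordered (substring, new label) pairs. The sketch below runs on a small invented dataframe: `toy` and label values such as `bm_2022` are made up for illustration and are not taken from the network files. The `regex=False` and `na=False` arguments are optional safeguards that the cell above does not use.

```python
import pandas as pd

# Toy stand-in for df_2022; all values are invented for illustration.
toy = pd.DataFrame({
    'company': ['a', 'a', 'b', 'b', 'c'],
    'label': ['bm_2022', 'supplier_1', 'ceo', 'city', 'customer_x'],
    'attribute': ['jane doe', 'acme', 'john roe', 'paris', 'globex'],
})

# Ordered (substring, new label) pairs, in the same order as the cell above.
rules = [
    ('bm', 'person'),
    ('supplier', 'org'),
    ('customer', 'org'),
    ('ceo', 'person'),
    ('cfo', 'person'),
]

for substring, new_label in rules:
    # regex=False matches the substring literally; na=False treats missing labels as no match.
    mask = toy['label'].str.contains(substring, regex=False, na=False)
    toy.loc[mask, 'label'] = new_label

print(toy['label'].tolist())  # ['person', 'org', 'person', 'city', 'org']
```

Keeping the rules in a list preserves the order in which they are applied, which matters if a label ever contains more than one of the substrings.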

# 1. Build networks with countries as nodes <a class='anchor' id='chapter1'></a>

In [230]:
# Function to exchange each company with its HOME_COUNTRY
def replace_company_with_country(df, industry=None):

    # Filter out rows that don't belong to the specified industry, if an industry is specified
    if industry is not None:
        industry_companies = df[(df['label'] == 'industry') & (df['attribute'] == industry)]['company']
        df_filtered = df[df['company'].isin(industry_companies)].copy()
    else:
        df_filtered = df.copy()

    # Create a dictionary to map companies to their home countries
    country_map = df_filtered[df_filtered['label'] == 'home_country'].set_index('company')['attribute'].to_dict()

    # Replace company names with their corresponding home country using .loc for proper assignment
    df_filtered.loc[:, 'company'] = df_filtered['company'].map(country_map)

    # Remove rows where company is NaN, indicating no home_country was found
    df_filtered.dropna(subset=['company'], inplace=True)

    # Remove rows for companies whose home country is on the exclusion list
    excluded_countries = ['united states', 'bermuda', 'australia', 'jersey']
    df_filtered = df_filtered[~df_filtered['company'].isin(excluded_countries)]

    # Remove rows where label is 'home_country' as these are no longer needed
    df_filtered = df_filtered[df_filtered['label'] != 'home_country']

    # The 'company' column now holds country names. It keeps its original name
    # because calculate_common_attributes looks up df['company'].

    return df_filtered

We apply this function to the full dataset (all industries) and to the top 3 industries across all years: Diversified Banks, Industrial Machinery and Supplies and Components, and Pharmaceuticals.
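
To make the mapping step concrete, here is a minimal sketch of what `replace_company_with_country` does to a long-format table. The company names and rows are invented for illustration; only the column layout (`company`, `label`, `attribute`) follows the network files, and the industry filter and country exclusion list are left out.

```python
import pandas as pd

# Invented rows in the same company/label/attribute layout as the network files.
toy = pd.DataFrame({
    'company':   ['alpha ag', 'alpha ag', 'beta plc', 'beta plc', 'gamma inc'],
    'label':     ['home_country', 'city', 'home_country', 'person', 'city'],
    'attribute': ['germany', 'paris', 'united kingdom', 'jane doe', 'boston'],
})

# Company -> home country lookup, built from the 'home_country' rows.
country_map = (
    toy[toy['label'] == 'home_country']
    .set_index('company')['attribute']
    .to_dict()
)

mapped = toy.copy()
# Companies without a 'home_country' row (here 'gamma inc') map to NaN.
mapped['company'] = mapped['company'].map(country_map)
mapped = mapped.dropna(subset=['company'])
# The lookup rows themselves are no longer needed.
mapped = mapped[mapped['label'] != 'home_country']

print(mapped)
```

Two rows remain: `paris` is now attached to `germany` and `jane doe` to `united kingdom`, while the `gamma inc` row is dropped because it has no home country.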

#### All industries <a id="sub_section_1_0_1"></a>

In [231]:
# Apply the function for 2011, 2013, 2016 and 2022 to replace the company with its home_country
countries_2011 = replace_company_with_country(df_2011)
countries_2013 = replace_company_with_country(df_2013)
countries_2016 = replace_company_with_country(df_2016)
countries_2022 = replace_company_with_country(df_2022)

In [232]:
# Have a look at the result
countries_2011

Unnamed: 0,company,label,attribute
0,united kingdom,city,paris
1,united kingdom,city,wales
2,united kingdom,city,hyderabad
3,united kingdom,country,china
4,united kingdom,country,japan
...,...,...,...
188607,switzerland,person,deutscher herold
188608,switzerland,person,dieter wemmer
188609,switzerland,person,vernon louis sankey
188610,switzerland,person,dublin orange stone


#### Diversified Banks <a id="sub_section_1_0_2"></a>

In [233]:
# Apply the function for 2011, 2013, 2016 and 2022 to replace the company with its home_country
countries_2011_divbanks = replace_company_with_country(df_2011, industry='diversified banks')
countries_2013_divbanks = replace_company_with_country(df_2013, industry='diversified banks')
countries_2016_divbanks = replace_company_with_country(df_2016, industry='diversified banks')
countries_2022_divbanks = replace_company_with_country(df_2022, industry='diversified banks')

#### Industrial Machinery and Supplies and Components <a id="sub_section_1_0_3"></a>

In [234]:
# Apply the function for 2011, 2013, 2016 and 2022 to replace the company with its home_country
countries_2011_indmachinery = replace_company_with_country(df_2011, industry='industrial machinery and supplies and components')
countries_2013_indmachinery = replace_company_with_country(df_2013, industry='industrial machinery and supplies and components')
countries_2016_indmachinery = replace_company_with_country(df_2016, industry='industrial machinery and supplies and components')
countries_2022_indmachinery = replace_company_with_country(df_2022, industry='industrial machinery and supplies and components')

#### Pharmaceuticals <a id="sub_section_1_0_4"></a>

In [235]:
# Apply the function for 2011, 2013, 2016 and 2022 to replace the company with its home_country
countries_2011_pharma = replace_company_with_country(df_2011, industry='pharmaceuticals')
countries_2013_pharma = replace_company_with_country(df_2013, industry='pharmaceuticals')
countries_2016_pharma = replace_company_with_country(df_2016, industry='pharmaceuticals')
countries_2022_pharma = replace_company_with_country(df_2022, industry='pharmaceuticals')

### 1.1 Calculate the common attributes and build a network df <a id="section_1_1"></a>

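For every pair of countries, we collect the attributes they share and build a network dataframe with the columns 'Source', 'Target', 'Common Attributes' and 'Weight'. Each shared attribute contributes the weight of its label (person 10, org 7, country 4, city 4, any other label 1); if the two countries label the attribute differently, the smaller of the two weights is used. Again, we apply the function to the full dataset and to each of the top 3 industries separately.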

In [236]:
def calculate_common_attributes(df):

    if 'company' not in df.columns or 'attribute' not in df.columns or 'label' not in df.columns:
        raise ValueError("Dataframe must contain 'company', 'attribute', and 'label' columns")

    # Define weights for different labels
    label_weights = {'person': 10, 'org': 7, 'country': 4, 'city': 4, 'other': 1}

    # Preprocess the dataframe for efficient lookup
    attribute_data = {}
    for company in df['company'].unique():
        attributes = df[(df['company'] == company) & df['attribute'].notna()][['attribute', 'label']]
        attribute_data[company] = attributes.set_index('attribute')['label'].to_dict()

    # Initialize a list to store the results
    comparison_data = []

    # Generate all combinations of company pairs and calculate common attributes
    for company1, company2 in combinations(attribute_data.keys(), 2):
        attributes1 = attribute_data[company1]
        attributes2 = attribute_data[company2]

        # Find common attributes and sum their label weights (the lower weight wins if the labels differ)
        common_attributes = set(attributes1).intersection(set(attributes2))
        weight = sum(min(label_weights.get(attributes1[attr], 1), label_weights.get(attributes2[attr], 1)) for attr in common_attributes)

        # Append the result
        comparison_data.append({
            'Source': company1,
            'Target': company2,
            'Common Attributes': ', '.join(common_attributes),
            'Weight': weight
        })

    # Convert the result into a DataFrame
    comparison_df = pd.DataFrame(comparison_data)
    return comparison_df
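
As a worked example of the weighting in `calculate_common_attributes`, the sketch below applies the same rule to three invented attribute dictionaries. The attribute names and the label `misc` are hypothetical; `misc` is only there to show the fallback weight of 1 for labels that are not in `label_weights`.

```python
from itertools import combinations

# Same label weights as in calculate_common_attributes.
label_weights = {'person': 10, 'org': 7, 'country': 4, 'city': 4, 'other': 1}

# Invented attribute -> label dictionaries for three countries.
attribute_data = {
    'germany': {'jane doe': 'person', 'paris': 'city', 'acme': 'org'},
    'france': {'jane doe': 'person', 'paris': 'city', 'share capital': 'misc'},
    'spain': {'acme': 'org', 'share capital': 'misc'},
}

weights = {}
for a, b in combinations(attribute_data, 2):
    # Attributes that both countries have.
    common = set(attribute_data[a]) & set(attribute_data[b])
    # Each shared attribute adds the smaller of the two label weights (default 1).
    weights[(a, b)] = sum(
        min(label_weights.get(attribute_data[a][attr], 1),
            label_weights.get(attribute_data[b][attr], 1))
        for attr in common
    )

print(weights)
# {('germany', 'france'): 14, ('germany', 'spain'): 7, ('france', 'spain'): 1}
```

Germany and France share a person (10) and a city (4), giving 14. Germany and Spain share one organisation (7). France and Spain share only an attribute with an unlisted label, which falls back to 1.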


#### All industries <a id="sub_section_1_1_1"></a>

In [237]:
# Apply the function to create network dataframes for 2011, 2013, 2016 and 2022
countries_2011_network_df = calculate_common_attributes(countries_2011)
countries_2013_network_df = calculate_common_attributes(countries_2013)
countries_2016_network_df = calculate_common_attributes(countries_2016)
countries_2022_network_df = calculate_common_attributes(countries_2022)

In [238]:
# Result for 2011
countries_2011_network_df.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
5,united kingdom,france,"timor-leste, environment health and safety, tr...",5130
3,united kingdom,switzerland,"dundee, share capital, hsbc group, são paulo, ...",4260
7,united kingdom,germany,"dundee, timor-leste, share capital, são paulo,...",4079
69,switzerland,germany,"group policy, dundee, group condensed, share c...",4064
94,france,germany,"smart grid, it services, timor-leste, agf grou...",4002
...,...,...,...,...
167,finland,cyprus,"financial instruments, italy, yale university,...",85
155,luxembourg,cyprus,"financial instruments, health and safety, ital...",83
168,portugal,malta,"spain, switzerland, algeria, germany, united k...",61
169,portugal,cyprus,"switzerland, bnp paribas, united states, europ...",35


In [239]:
# Result for 2013
countries_2013_network_df.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
4,united kingdom,france,"456.4, environment health and safety, shanghai...",4987
3,united kingdom,switzerland,"rhodes, alice wallenberg foundation, integrate...",4625
6,united kingdom,germany,"share capital, são paulo, pfizer inc, budapest...",4203
64,switzerland,germany,"share capital, são paulo, pfizer inc, budapest...",4090
76,france,germany,"it services, timor-leste, agf group, share cap...",3843
...,...,...,...,...
146,poland,cyprus,"italy, recognition and measurement, lithuania,...",96
151,luxembourg,cyprus,"johannesburg, italy, kazakhstan, united kingdo...",81
145,poland,malta,"italy, european economic area, supreme court, ...",78
150,luxembourg,malta,"italy, european economic area, european centra...",77


In [240]:
# Result for 2016
countries_2016_network_df.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
4,united kingdom,france,"financial data, global ethics and compliance, ...",6109
3,united kingdom,switzerland,"unrealised losses, environment health and safe...",5822
5,united kingdom,germany,"financial data, environment health and safety,...",4934
66,switzerland,france,"smart grid, share capital, gen energy, environ...",4037
80,france,germany,"financial data, joint ventures associates, sha...",3866
...,...,...,...,...
169,portugal,cyprus,"joint arrangements, switzerland, financial ins...",54
160,poland,cyprus,"finland, switzerland, canada, united states, e...",50
170,malta,cyprus,"london stock exchange, united states, brussels...",42
159,poland,malta,"germany, united states, european union, europe...",36


In [241]:
# Result for 2022
countries_2022_network_df.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
8,united kingdom,germany,"financial data, competition appeal tribunal, i...",8459
5,united kingdom,france,"dundee, timor-leste, environment health and sa...",8390
3,united kingdom,switzerland,"global ethics and compliance, auxly cannabis g...",8375
43,sweden,germany,"financial data, biodiversity expert group, ass...",7798
60,netherlands,luxembourg,sas fr eurofins environment testing lux holdin...,7716
...,...,...,...,...
63,netherlands,czechia,"prague, research and development, united kingd...",32
170,czechia,portugal,"research and development, united states, europ...",31
174,czechia,cyprus,"united states, 0, board member, competition an...",26
173,czechia,poland,"prague, united states, european commission, cz...",20


#### Diversified Banks <a id="sub_section_1_1_2"></a>

In [242]:
# Apply the function to create network dataframes for 2011, 2013, 2016 and 2022
countries_2011_network_df_divbanks = calculate_common_attributes(countries_2011_divbanks)
countries_2013_network_df_divbanks = calculate_common_attributes(countries_2013_divbanks)
countries_2016_network_df_divbanks = calculate_common_attributes(countries_2016_divbanks)
countries_2022_network_df_divbanks = calculate_common_attributes(countries_2022_divbanks)

In [243]:
countries_2022_network_df_divbanks.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
13,spain,united kingdom,"share capital, hsbc group, são paulo, global s...",1030
15,spain,france,"bnp paribas securities services sca, madrid, u...",838
39,united kingdom,france,"group risk, global markets and securities serv...",814
24,italy,france,"top employer, group financial risk, group risk...",778
11,spain,italy,"stakeholder capitalism metrics, share capital,...",759
...,...,...,...,...
56,germany,denmark,"financial disclosures, ireland, european commi...",111
34,ireland,denmark,"capital markets, strategy transformation, irel...",107
35,ireland,norway,"capital markets, capital group, group strategy...",99
61,denmark,belgium,"financial disclosures, ireland, capital requir...",99


#### Industrial Machinery and Supplies and Components <a id="sub_section_1_1_3"></a>

In [244]:
# Apply the function to create network dataframes for 2011, 2013, 2016 and 2022
countries_2011_network_df_indmachinery = calculate_common_attributes(countries_2011_indmachinery)
countries_2013_network_df_indmachinery = calculate_common_attributes(countries_2013_indmachinery)
countries_2016_network_df_indmachinery = calculate_common_attributes(countries_2016_indmachinery)
countries_2022_network_df_indmachinery = calculate_common_attributes(countries_2022_indmachinery)

In [245]:
countries_2022_network_df_indmachinery.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
13,sweden,united kingdom,"health and safety, são paulo, budapest, italy,...",560
12,sweden,switzerland,"são paulo, budapest, italy, united nations glo...",494
15,sweden,finland,"italy, united nations global compact, slovenia...",453
35,switzerland,united kingdom,"johannesburg, share capital, são paulo, budape...",401
11,sweden,germany,"italy, united nations global compact, slovenia...",387
40,united kingdom,finland,"share capital, 4, italy, slovenia, lithuania, ...",342
10,sweden,spain,"noneligible activities, health and safety, sta...",325
30,germany,switzerland,"italy, european economic area, united nations ...",324
37,switzerland,finland,"share capital, italy, united nations global co...",315
26,spain,united kingdom,"johannesburg, health and safety, madrid, italy...",308


#### Pharmaceuticals <a id="sub_section_1_1_4"></a>

In [246]:
# Apply the function to create network dataframes for 2011, 2013, 2016 and 2022
countries_2011_network_df_pharma = calculate_common_attributes(countries_2011_pharma)
countries_2013_network_df_pharma = calculate_common_attributes(countries_2013_pharma)
countries_2016_network_df_pharma = calculate_common_attributes(countries_2016_pharma)
countries_2022_network_df_pharma = calculate_common_attributes(countries_2022_pharma)

In [247]:
countries_2022_network_df_pharma.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
2,united kingdom,switzerland,"tunisia, global ethics and compliance, são pau...",963
0,united kingdom,germany,"são paulo, pfizer inc, budapest, italy, verona...",653
8,germany,switzerland,"são paulo, budapest, italy, slovenia, supreme ...",531
6,united kingdom,belgium,"madrid, rwanda, budapest, antibribery and anti...",374
3,united kingdom,denmark,"tunisia, health and safety, slovenia, belgrade...",368
18,switzerland,denmark,"tunisia, us supreme court, slovenia, lithuania...",352
21,switzerland,belgium,"madrid, rwanda, budapest, united nations, espo...",336
12,germany,belgium,"budapest, united nations, espoo, upstream tran...",293
9,germany,denmark,"financial disclosures, corporate governance re...",248
1,united kingdom,france,"tunisia, consumer healthcare, ireland, biomedi...",246


### 1.2 Create network graphs and visualize them on a Europe map <a id="section_1_2"></a>

In this section, we use the coordinates of European capital cities to plot the network graphs on a map of Europe. Node colors come from the 'coolwarm' colormap and depend on a combined score made up of the node's degree (10%) and the sum of the weights of its edges (90%). Edge colors come from the 'Reds' colormap: the higher an edge's weight, the darker its red.
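
The node color score can be illustrated on a small graph. The sketch below is a minimal example under assumptions: the four nodes and their edge weights are invented, and `strength` is a name introduced here for the sum of a node's edge weights.

```python
import networkx as nx

# Invented toy graph; node names and weights are for illustration only.
G = nx.Graph()
G.add_weighted_edges_from([
    ('a', 'b', 100),
    ('a', 'c', 50),
    ('b', 'c', 10),
    ('c', 'd', 40),
])

degree = dict(G.degree())                     # number of neighbours per node
strength = dict(G.degree(weight='weight'))    # sum of edge weights per node
max_degree = max(degree.values())
max_strength = max(strength.values())

# 10% normalised degree + 90% normalised sum of edge weights.
combined = {
    n: 0.1 * degree[n] / max_degree + 0.9 * strength[n] / max_strength
    for n in G
}

# Rescale so the highest score is exactly 1 before passing it to a colormap.
top = max(combined.values())
normalized = {n: score / top for n, score in combined.items()}

for n in G:
    print(n, degree[n], strength[n], round(normalized[n], 3))
```

Node `c` has the most neighbours, but `a` gets the highest score because the weighted part dominates. Note also that `calculate_common_attributes` emits a row for every pair of countries, so the unfiltered country graphs are complete: every node has the same degree, and differences in node color come entirely from the weighted part.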

In [248]:
# Function to create a graph from a dataframe
def create_graph_from_df(df, source_col='Source', target_col='Target', weight_col='Weight'):
    G = nx.Graph()
    for _, row in df.iterrows():
        # Fall back to a weight of 0 if the weight column is missing from the row
        weight = row.get(weight_col, 0)
        G.add_edge(row[source_col], row[target_col], weight=weight)
    return G
    

# Normalizing function for longitude and latitude
def normalize_coordinates(coord, min_coord, max_coord, axis_size):
    # Scale the coordinate from [min_coord, max_coord] to [0, axis_size]
    return axis_size * (coord - min_coord) / (max_coord - min_coord)
    

# Coordinates for the capital cities or central points of countries
country_coordinates = {
    'italy': (12.4964, 41.9028),  # Rome, Italy
    'france': (2.3522, 48.8566),  # Paris, France
    'germany': (13.4050, 52.5200),  # Berlin, Germany
    'spain': (-3.7038, 40.4168),  # Madrid, Spain
    'united kingdom': (-0.1276, 51.5074),  # London, UK
    'portugal': (-9.1393, 38.7223),  # Lisbon, Portugal
    'belgium': (4.3517, 50.8503),  # Brussels, Belgium
    'netherlands': (4.8952, 52.3702),  # Amsterdam, Netherlands
    'switzerland': (7.4474, 46.9480),  # Bern, Switzerland
    'austria': (16.3738, 48.2082),  # Vienna, Austria
    'sweden': (18.0686, 59.3293),  # Stockholm, Sweden
    'norway': (10.7522, 59.9139),  # Oslo, Norway
    'denmark': (12.5683, 55.6761),  # Copenhagen, Denmark
    'finland': (24.9384, 60.1699),  # Helsinki, Finland
    'poland': (21.0122, 52.2297),  # Warsaw, Poland
    'czech republic': (14.4378, 50.0755),  # Prague, Czech Republic
    'czechia': (14.4378, 50.0755),
    'greece': (23.7275, 37.9838),  # Athens, Greece
    'hungary': (19.0402, 47.4979),  # Budapest, Hungary
    'ireland': (-6.2603, 53.3498),  # Dublin, Ireland
    'luxembourg': (6.1303, 49.6116),  # Luxembourg City, Luxembourg
    'slovakia': (17.1077, 48.1486),  # Bratislava, Slovakia
    'slovenia': (14.5058, 46.0569),  # Ljubljana, Slovenia
    'croatia': (15.9819, 45.8150),  # Zagreb, Croatia
    'estonia': (24.7536, 59.4370),  # Tallinn, Estonia
    'latvia': (24.1052, 56.9496),  # Riga, Latvia
    'lithuania': (25.2799, 54.6872),  # Vilnius, Lithuania
    'bulgaria': (23.3219, 42.6977),  # Sofia, Bulgaria
    'romania': (26.1025, 44.4268),  # Bucharest, Romania
    'serbia': (20.4489, 44.7866),  # Belgrade, Serbia
    'bosnia and herzegovina': (18.4131, 43.8563),  # Sarajevo, Bosnia and Herzegovina
    'albania': (19.8187, 41.3275),  # Tirana, Albania
    'north macedonia': (21.4254, 42.0038),  # Skopje, North Macedonia
    'montenegro': (19.2629, 42.4304),  # Podgorica, Montenegro
    'kosovo': (21.1669, 42.6675),  # Pristina, Kosovo
    'malta': (14.5146, 35.8989),  # Valletta, Malta
    'cyprus': (33.3823, 35.1856),  # Nicosia, Cyprus
    'iceland': (-21.8277, 64.1265),  # Reykjavik, Iceland
    'andorra': (1.5218, 42.5063),  # Andorra la Vella, Andorra
    'liechtenstein': (9.5209, 47.1410),  # Vaduz, Liechtenstein
    'monaco': (7.4246, 43.7384),  # Monaco
    'san marino': (12.4578, 43.9424),  # San Marino
    'vatican city': (12.4534, 41.9029)  # Vatican City
}


def visualize_graph_on_map(df, country_coordinates=country_coordinates, title=None, axis_size=(12, 10), show=True):
    plt.ioff()
    # Create a graph
    G = create_graph_from_df(df)

    # Create a plot with the map projection
    fig, ax = plt.subplots(figsize=axis_size, subplot_kw={'projection': ccrs.PlateCarree()})

    # Add features to the map: coastlines, borders, and land
    ax.add_feature(cfeature.COASTLINE)
    ax.add_feature(cfeature.BORDERS, linestyle=':')
    ax.add_feature(cfeature.LAND, facecolor='lightgray')

    # Set the extent for the map to focus on a specific region (e.g., Europe)
    ax.set_extent([-25, 40, 34, 65], crs=ccrs.PlateCarree())

    # Normalizing node sizes (do this only once)
    max_degree = max(dict(G.degree()).values())
    node_sizes = [(G.degree(n) / max_degree * 100) + 10 for n in G.nodes()]  # Node size in range (10, 110]

    # Calculate combined metric for node colors: degree + sum of edge weights
    node_weights = {node: sum([G[node][neighbor]['weight'] for neighbor in G[node]]) for node in G.nodes()}
    max_weight = max(node_weights.values())
    combined_metric = {node: (G.degree(node) / max_degree) * 0.1 + (node_weights[node] / max_weight) * 0.9 for node in G.nodes()}
    
    # Normalize the combined metric and create color mapping
    max_combined_metric = max(combined_metric.values())
    normalized_metric = {node: metric / max_combined_metric for node, metric in combined_metric.items()}
    node_colors = [plt.cm.coolwarm(normalized_metric[node]) for node in G.nodes()]

    # Using a colormap to map weights to colors for edges
    edge_weights = np.array([G[u][v]['weight'] for u, v in G.edges()])
    edge_color_map = plt.colormaps['Reds']
    edge_colors = edge_color_map(edge_weights / edge_weights.max())

    # Visualize the graph
    nx.draw(G, pos={country: country_coordinates[country] for country in G.nodes() if country in country_coordinates}, 
            ax=ax, with_labels=True, node_color=node_colors, node_size=node_sizes, edge_color=edge_colors, edge_cmap=edge_color_map)
    plt.title(title)

    if not show:
        return fig

    plt.show()
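
As an aside on `create_graph_from_df`: networkx also offers `nx.from_pandas_edgelist`, which builds the same kind of weighted graph without an explicit row loop. The sketch below uses an invented three-row edge list; renaming the `Weight` column to `weight` is an assumption made here so that the edge attribute gets the lowercase name the plotting code reads.

```python
import networkx as nx
import pandas as pd

# Invented edge list in the Source/Target/Weight layout of the network dataframes.
edges = pd.DataFrame({
    'Source': ['germany', 'germany', 'france'],
    'Target': ['france', 'spain', 'spain'],
    'Weight': [14, 7, 1],
})

# The edge attribute takes the column's name, so rename 'Weight' to 'weight' first.
G = nx.from_pandas_edgelist(
    edges.rename(columns={'Weight': 'weight'}),
    source='Source',
    target='Target',
    edge_attr='weight',
)

print(G.number_of_nodes(), G.number_of_edges())
print(G['germany']['france']['weight'])
```

Unlike the loop version, this variant has no fallback weight: it expects the weight column to be present in the dataframe.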

#### All industries <a id="sub_section_1_2_1"></a>

In [249]:
%matplotlib ipympl

In [250]:
def visualize_graph_on_map_interactive(dataframes, title='Countries as Nodes'):
    # Precompute figures
    precomputed_figures = {}
    for year, df in dataframes.items():
        precomputed_figures[year] = visualize_graph_on_map(df, axis_size=(15, 12), show = False, title=title)

    def update_plot_precomputed(year):
        
        # Clear the output of the widget to remove the old plot
        clear_output(wait=True)

        # Display the precomputed figure corresponding to the selected year
        fig = precomputed_figures[year]
        display(fig)

    # Create a dropdown to select the dataframe
    dropdown = widgets.Dropdown(options=list(dataframes.keys()))

    # Create an interactive output for your plot
    output = widgets.interactive_output(update_plot_precomputed, {'year': dropdown})

    # Display the dropdown and output
    display(dropdown, output)


In [251]:
# All industries
all_industries = {'2011': countries_2011_network_df, '2013': countries_2013_network_df, '2016': countries_2016_network_df, '2022': countries_2022_network_df}
visualize_graph_on_map_interactive(all_industries, title = 'Countries as Nodes - All Industries')

*[Interactive figure: "Countries as Nodes - All Industries", with a dropdown to select the year (2011, 2013, 2016 or 2022)]*

This result can be interpreted through various economic, geopolitical and historical lenses:

1. **United Kingdom:**
The UK's leading position might be due to its significant financial sector, especially London's role as a global financial hub. The UK has a diverse economy with strong sectors in finance, pharmaceuticals and energy, which aligns with the key industries identified in our first network. The historical legacy of the British Empire and the global reach of the English language might also contribute to its centrality.

2. **Germany:**
As Europe's largest economy, Germany's prominence is likely due to its strong industrial base, particularly in automotive, machinery and chemical sectors. Germany is known for its engineering expertise, robust manufacturing sector and its role as a leader in the European Union (EU).

3. **France:**
France's significance can be attributed to its diversified economy, with strengths in sectors like aerospace, defense, luxury goods and pharmaceuticals. France is also a key player in the EU and has a global cultural and economic influence.

4. **Netherlands:**
The Netherlands is a major logistical and trade hub, thanks to its strategic location and the Port of Rotterdam, one of the world's largest ports. The country also has a strong presence in sectors like oil and gas, chemicals and financial services.

5. **Belgium and Luxembourg:**
These countries, though smaller in size, are important due to their strategic positions in Europe and their roles in EU and international politics. Brussels hosts several key EU institutions, and Luxembourg is a notable financial center, especially for private banking and fund management.

6. **Spain:**
Spain's importance could stem from its diverse economy, with strengths in telecommunications, banking and renewable energy. Spain also benefits from its cultural and linguistic ties to Latin America.

7. **Sweden:**
Known for its high level of innovation and strong welfare state, Sweden has a robust economy with leading companies in technology, automotive and pharmaceutical sectors.

8. **Italy:**
Italy's significance is likely due to its fashion and luxury goods sector, as well as its automotive, machinery and food industries. Italy also has a rich cultural heritage and is a key EU economy.

The prominence of these countries in our network reflects a combination of their economic size, diversity and integration into global markets. Their positions are likely influenced by factors such as:
- Economic policies and stability.
- Infrastructure, particularly in transportation and logistics.
- Innovation capacity and technological advancements.
- Historical and cultural ties that influence trade and economic relationships.
- Political and economic integration within the EU and other international bodies.

In summary, the network's structure likely mirrors the complex interplay of historical, economic and geopolitical factors that shape the global business landscape.


#### Diversified Banks <a id="sub_section_1_2_2"></a>

In [252]:
# Diversified Banks
div_banks = {'2011': countries_2011_network_df_divbanks, '2013': countries_2013_network_df_divbanks, '2016': countries_2016_network_df_divbanks, '2022': countries_2022_network_df_divbanks}
visualize_graph_on_map_interactive(div_banks, title = 'Countries as Nodes - Diversified Banks')

*[Interactive figure: "Countries as Nodes - Diversified Banks", with a dropdown to select the year (2011, 2013, 2016 or 2022)]*

1. **United Kingdom:** The UK, particularly London, is a global financial center. Its leading position in our network is likely due to the size and scope of its banking sector, the presence of numerous multinational banks, and London's role in global finance (e.g., foreign exchange market, international banking). The regulatory environment, historical significance, and the English language's status as the lingua franca of business also contribute to its centrality.

2. **Spain:** The prominence of Spain, particularly in diversified banks, might be surprising but can be attributed to the global reach and influence of its largest banks. Spanish banks have significant operations not just in Europe but also in Latin America, which might boost their prominence in our network. The Spanish banking sector, known for its robustness and expansion, has weathered various economic challenges, potentially leading to a more interconnected and influential role in the global banking network.

3. **France** and **Italy:** Both countries have large economies with significant banking sectors. French and Italian banks are among the largest in Europe, with extensive domestic and international operations. The size of their economies and the banks' roles in financing industrial and consumer activities contribute to their prominence.

4. The **Netherlands**, **Austria**, and **Germany**:
- Netherlands: Known for its stable and innovative financial sector, the Netherlands hosts several large, internationally active banks. Its strategic location and role as a gateway to Europe also enhance its significance in the banking sector.
- Austria: While smaller compared to others, Austria's banking sector is known for its role in Central and Eastern Europe, acting as a bridge between Western and Eastern European markets.
- Germany: Despite being Europe's largest economy, Germany's position in the middle range might be due to the unique structure of its banking system, which is regionally focused and includes a large number of smaller, local banks (Sparkassen and Volksbanken). However, its few large banks do have a significant international presence.

The structure of this network within the 'diversified banks' industry could be due to factors such as:
* Global Expansion: Banks from these countries may have a significant presence in international markets, influencing their centrality in the network.
* Historical and Linguistic Ties: These can shape international banking relationships and the flow of capital.
* Regulatory Environment: Countries with stable, transparent, and favorable regulatory environments for banking can attract more international business.
* Economic Policies: Policies that promote financial services, fintech, and international banking can enhance the role of a country's banking sector.
* Crisis Management and Resilience: Banks that have successfully navigated financial crises may emerge stronger and more interconnected within the global banking network.

Understanding the dynamics of the 'diversified banks' industry requires considering not only the size of the banks but also their strategies, the regulatory environment, historical ties, and how these banks interact within the global financial system.

#### Industrial Machinery and Supplies and Components <a id="sub_section_1_2_3"></a>

In [253]:
# Industrial Machinery and Supplies and Components
ind_mach = {'2011': countries_2011_network_df_indmachinery, '2013': countries_2013_network_df_indmachinery, '2016': countries_2016_network_df_indmachinery, '2022': countries_2022_network_df_indmachinery}
visualize_graph_on_map_interactive(ind_mach, title = 'Countries as Nodes - Industrial Machinery and Supplies and Components')

*[Interactive figure: "Countries as Nodes - Industrial Machinery and Supplies and Components", with a dropdown to select the year (2011, 2013, 2016 or 2022)]*

1. **Sweden:** Sweden's leading position might be surprising, but it can be attributed to its strong tradition in high-quality manufacturing, innovation, and engineering. Sweden is home to some world-leading companies in areas such as automotive, precision engineering, and telecommunications. Companies like SKF and Sandvik are renowned for their industrial machinery and advanced manufacturing technologies. Sweden's focus on sustainability, innovation, and a highly skilled workforce contribute to its prominence in this sector.
2. **United Kingdom:** The UK's industrial sector has a strong tradition, particularly in areas like aerospace, automotive, and industrial equipment. Despite deindustrialization trends in recent decades, the UK still hosts significant industrial machinery and component manufacturers, often focused on high-value, precision-engineered products.
3. **Switzerland:** Known for its precision engineering and high-quality manufacturing, Switzerland hosts numerous companies specializing in industrial machinery and components, particularly for the pharmaceutical, biotech, and food industries. Swiss companies are known for their reliability, quality, and focus on high-technology sectors, which makes them key players in the global market for industrial machinery.
4. **Germany:** Germany's position is consistent with its reputation as an industrial powerhouse. The country is renowned for its engineering expertise, innovation, and the quality of its industrial products. The German Mittelstand (medium-sized companies) forms the backbone of this sector, with numerous companies that are global leaders in their niche areas. Companies like Siemens, Bosch, and Thyssenkrupp have substantial international influence in industrial machinery and components.
5. **Austria:** Austria might not be the largest economy, but it has a strong industrial base. Austrian companies, often part of larger German or European conglomerates, are known for their expertise in specialized industrial machinery and components, with a focus on quality and innovation. However, according to the plots, Austria lost its relevance in this sector over the years.

The structure of this network within the 'industrial machinery, supplies, and components' industry could be due to factors such as:
- Historical Industrial Base: Countries with a long history of industrialization tend to have established companies with deep expertise and innovation in industrial machinery.
- High-Quality Manufacturing: A focus on quality, precision, and reliability makes products from these countries preferred in the global market.
- Innovation and R&D: Countries that invest heavily in research and development tend to lead in creating advanced industrial machinery and components.
- Skilled Workforce: A highly skilled workforce is critical for sophisticated manufacturing and engineering.
- Export Orientation: Countries that have a strong export orientation and favorable trade relationships can effectively distribute their industrial machinery and components globally.
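
The ranking above is read off the map. A minimal sketch of how it could be checked numerically is to sum, for every country, the weights of all edges it takes part in. The sketch assumes the network dataframes can be reduced to `(Source, Target, Weight)` triples; the function name `strength_ranking` and all numbers are invented for illustration and are not taken from the data.

```python
from collections import defaultdict

def strength_ranking(edges):
    """Rank nodes by the sum of the weights of their incident edges."""
    strength = defaultdict(float)
    for source, target, weight in edges:
        strength[source] += weight
        strength[target] += weight
    return sorted(strength.items(), key=lambda item: item[1], reverse=True)

# Toy edge lists (invented numbers, for illustration only)
edges_by_year = {
    "2011": [("sweden", "germany", 40), ("sweden", "austria", 35), ("germany", "austria", 30)],
    "2022": [("sweden", "germany", 50), ("sweden", "austria", 5), ("germany", "austria", 4)],
}

rank_by_year = {}
for year, edges in edges_by_year.items():
    ranking = strength_ranking(edges)
    # Map each node to its 1-based position in the ranking
    rank_by_year[year] = {node: pos for pos, (node, _) in enumerate(ranking, start=1)}
    print(year, ranking)
```

Comparing the summed weights per year (rather than only the rank) makes a statement such as "lost its relevance over the years" easier to verify than a visual comparison of the plots.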

#### Pharmaceuticals <a id="sub_section_1_2_4"></a>

In [254]:
# Pharmaceuticals
pharma = {'2011': countries_2011_network_df_pharma, '2013': countries_2013_network_df_pharma, '2016': countries_2016_network_df_pharma, '2022': countries_2022_network_df_pharma}
visualize_graph_on_map_interactive(pharma, title = 'Countries as Nodes - Pharmaceuticals')

Dropdown(options=('2011', '2013', '2016', '2022'), value='2011')

Output()

1. **United Kingdom:** The UK's leading position in pharmaceuticals is well-documented. The country has a strong tradition of scientific research and innovation, supported by world-class universities and research institutions. The UK is home to some of the world's largest pharmaceutical companies, such as GlaxoSmithKline and AstraZeneca. The country's regulatory environment, represented by the Medicines and Healthcare products Regulatory Agency (MHRA), is also considered one of the most efficient and rigorous, making it an attractive place for clinical trials and drug approvals. The UK government’s investment in life sciences and the presence of a significant biotech startup ecosystem contribute to its leading role in this network.
2. **Switzerland:** Switzerland's prominence in the pharmaceutical industry is well established. It is home to giants like Novartis and Roche, which are among the world's largest pharmaceutical companies in terms of revenue and R&D investment. Switzerland's strength in this sector is also supported by its strong intellectual property laws, favorable tax environment, highly skilled workforce, and a tradition of precision and quality in manufacturing. The Swiss pharmaceutical sector benefits from substantial investment in R&D and a significant presence in global markets.
3. **Germany:** Germany's position as a key player in the pharmaceutical sector is underpinned by its sizeable market, strong industrial base, and extensive R&D activities. Companies like Bayer and Boehringer Ingelheim are known for their global presence and significant contributions to pharmaceutical innovation. The country's strong engineering and chemical sectors complement its pharmaceutical industry, providing expertise and infrastructure for drug development and production. Additionally, Germany's robust and universal healthcare system creates a favorable domestic market for pharmaceutical products.

The structure of this network within the 'pharmaceuticals' industry can be attributed to factors such as:
- Strong R&D Infrastructure: Countries that invest heavily in research and development, and host world-class universities and research institutions, are typically at the forefront of pharmaceutical innovation.
- Presence of Major Pharmaceutical Companies: Countries that are home to large, multinational pharmaceutical companies tend to have a more significant influence in the industry due to the global reach and impact of these companies.
- Regulatory Environment: An efficient and rigorous regulatory environment is crucial for the development and approval of pharmaceutical products.
- Historical Legacy and Specialization: Countries with a long history and specialization in the pharmaceutical sector often develop ecosystems that support the growth and dominance of this industry.

### 1.3 Create clusters by applying the louvain method (all industries) <a id="section_1_3"></a>

In the following section, we will apply the Louvain method to identify clusters within the network. Here, we only use the network with all industries.
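
The Louvain method is a greedy heuristic that searches for a partition with high modularity, i.e. a partition in which more edge weight falls inside communities than the node strengths alone would suggest. To make that objective concrete, the following dependency-free sketch computes the weighted modularity of a given partition. The helper name `modularity` and the toy graph are illustrative assumptions; in the notebook itself the partition is produced by `community_louvain.best_partition`.

```python
from collections import defaultdict

def modularity(edges, partition):
    """Weighted modularity of a partition of an undirected graph.

    edges: iterable of (u, v, weight) without self-loops
    partition: dict mapping node -> community id
    """
    total_weight = sum(w for _, _, w in edges)
    internal = defaultdict(float)   # edge weight inside each community
    strength = defaultdict(float)   # summed node strengths per community
    for u, v, w in edges:
        strength[partition[u]] += w
        strength[partition[v]] += w
        if partition[u] == partition[v]:
            internal[partition[u]] += w
    return sum(
        internal[c] / total_weight - (strength[c] / (2 * total_weight)) ** 2
        for c in strength
    )

# Toy graph: two triangles joined by a single bridge edge
toy_edges = [
    ("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
    ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
    ("c", "d", 1),
]
two_triangles = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_block = {node: 0 for node in two_triangles}

q_split = modularity(toy_edges, two_triangles)
q_single = modularity(toy_edges, one_block)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one community, which is the kind of improvement Louvain looks for.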

In [255]:
import matplotlib.colors as mcolors

def apply_louvain_interactive(df, seed=1):
    # Create a graph
    G = create_graph_from_df(df)

    # Set the random seed for reproducibility
    random.seed(seed)

    # Compute the best partition
    partition = community_louvain.best_partition(G, random_state=seed)

    # Drawing parameters
    pos = nx.spring_layout(G, seed=seed)  # Using the same seed for layout
    original_cmap = plt.colormaps['viridis']
    cmap = mcolors.ListedColormap(original_cmap(np.linspace(0, 1, max(partition.values()) + 1)))


    # Precompute community plots
    precomputed_figures = {}
    figure_size = (10, 10)  # Set a consistent figure size for all plots
    for i, com in enumerate(set(partition.values())):
        list_nodes = [nodes for nodes in partition.keys() if partition[nodes] == com]
        subgraph = G.subgraph(list_nodes)

        fig, ax = plt.subplots(figsize=figure_size)
        nx.draw(subgraph, pos={node: pos[node] for node in subgraph}, ax=ax, with_labels=True, node_color=[cmap(i) for n in subgraph])
        ax.set_title(f"Community {i}")
        
        plt.tight_layout()  # Adjust layout to fit in the figure area
        precomputed_figures[i] = fig

    def update_plot_community(community):
        clear_output(wait=True)
        display(precomputed_figures[community])

    # Create a dropdown to select the community
    dropdown = widgets.Dropdown(options=list(precomputed_figures.keys()), description="Community:")

    # Create an interactive output for your plot
    output = widgets.interactive_output(update_plot_community, {'community': dropdown})

    # Display the dropdown and output
    display(dropdown, output)
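
Because the dropdown only shows one community plot at a time, it can help to list the members of every community as plain text as well. The sketch below inverts a `{node: community_id}` mapping, which is the format returned by `community_louvain.best_partition`. The function name `communities_as_lists` and the example partition are hypothetical; the community ids in particular are made up.

```python
from collections import defaultdict

def communities_as_lists(partition):
    """Invert {node: community_id} into {community_id: sorted list of members}."""
    members = defaultdict(list)
    for node, community_id in partition.items():
        members[community_id].append(node)
    return {cid: sorted(nodes) for cid, nodes in sorted(members.items())}

# Hypothetical partition in the {node: community_id} format
example_partition = {
    "spain": 1, "portugal": 1,
    "germany": 0, "france": 0,
    "luxembourg": 2, "netherlands": 2,
}

for cid, nodes in communities_as_lists(example_partition).items():
    print(cid, nodes)
```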

#### 2011 <a id="sub_section_1_3_1"></a>

In [256]:
# Apply the louvain method for 2011
apply_louvain_interactive(countries_2011_network_df)

Dropdown(description='Community:', options=(0, 1), value=0)

Output()

#### 2013 <a id="sub_section_1_3_2"></a>

In [257]:
# Apply the louvain method for 2013
apply_louvain_interactive(countries_2013_network_df)

Dropdown(description='Community:', options=(0, 1), value=0)

Output()

#### 2016 <a id="sub_section_1_3_3"></a>

In [258]:
# Apply the louvain method for 2016
apply_louvain_interactive(countries_2016_network_df)

Dropdown(description='Community:', options=(0, 1), value=0)

Output()

#### 2022 <a id="sub_section_1_3_4"></a>

In [259]:
# Apply the louvain method for 2022
apply_louvain_interactive(countries_2022_network_df)

Dropdown(description='Community:', options=(0, 1, 2, 3), value=0)

Output()

1. **Luxembourg and Netherlands Cluster:** Luxembourg and the Netherlands share several similarities that could explain their clustering:
- Economic Policies and Business Environment: Both countries are known for their favorable corporate tax policies and business-friendly environments, which attract multinational corporations and large financial institutions.
- Financial Services Sector: Luxembourg is a global hub for private banking and investment funds, while the Netherlands is known for its robust financial sector.
- EU Integration and Logistics: As founding members of the EU, both countries are deeply integrated into European economic structures. The Netherlands, with the Port of Rotterdam, is a crucial logistics and transportation hub, and Luxembourg is strategically positioned within Europe, enhancing its role in logistics and distribution.
- Size and Economic Structure: Their relatively small size and open economies make them more similar in economic structure compared to larger European countries.

2. **Spain and Portugal Cluster:** The clustering of Spain and Portugal can be attributed to:
- Geographical Proximity and Historical Ties: Sharing the Iberian Peninsula and a long history, these countries have significant cultural and linguistic ties.
- Economic Similarities: Both economies are somewhat diversified but have a strong emphasis on sectors like tourism, agriculture, and renewable energy.
- EU and Economic Integration: Spain and Portugal joined the European Communities (the predecessor of today's EU) together in 1986, which has led to parallel economic development and integration paths within the broader European context.

3. **Rest of the European Countries Cluster:** The clustering of the remaining European countries likely reflects:
- Diverse but Interconnected Economies: These countries have diverse economies but are interconnected through various sectors like manufacturing, technology, and services. This diversity within interconnectivity can lead to a more cohesive clustering.
- EU Integration: Being part of the EU (except for non-EU members in our analysis) creates common economic and regulatory frameworks, fostering closer economic ties.
- Historical and Cultural Links: Shared historical events, cultural ties and geographical proximity (especially for continental European countries) contribute to stronger economic relationships.

### 1.4 Financial data <a id="section_1_4"></a>

In this section, we extract the financial data to search for differences between clusters.

In [260]:
# Define a function to calculate the financial data per country (mean of the companies in this country)
def group_and_calculate_mean(df):

    financial_labels = [
        "iq_capex", "iq_cash_oper", "iq_cogs", "iq_cost_rev", "iq_curr_taxes",
        "iq_ebitda", "iq_ebitda_margin", "iq_gp", "iq_gross_margin", "iq_int_exp_ltd",
        "iq_marketcap", "iq_net_debt", "iq_ni", "iq_ni_margin", "iq_oper_inc",
        "iq_return_assets", "iq_return_equity", "iq_total_assets", "iq_total_debt_capital",
        "iq_total_debt_equity", "iq_total_equity", "iq_total_liab", "iq_total_rev"
    ]

    # Filter the DataFrame for specific labels
    filtered_df = df[df['label'].isin(financial_labels)].copy()

    # Ensure 'attribute' column is numeric and handle missing values
    filtered_df['attribute'] = pd.to_numeric(filtered_df['attribute'], errors='coerce').fillna(0)

    # Group by specified columns and calculate the mean for the attribute
    grouped_df = filtered_df.groupby(['company', 'label'])['attribute'].mean().reset_index()

    # Pivot the DataFrame
    wide_df = grouped_df.pivot(index='company', columns='label', values='attribute')

    # Reset the index to make 'company' a column again
    wide_df.reset_index(inplace=True)

    # Optionally, reorder the columns if needed
    column_order = ['company', 'iq_total_rev', 'iq_capex', 'iq_cash_oper', 'iq_cogs', 'iq_cost_rev', 'iq_curr_taxes', 'iq_gp', 'iq_int_exp_ltd', 
                    'iq_marketcap', 'iq_net_debt', 'iq_ni', 'iq_total_debt_capital', 'iq_total_equity', 
                    'iq_total_liab', 'iq_ni_margin', 'iq_return_assets', 'iq_return_equity', 
                    'iq_total_debt_equity', 'iq_gross_margin']

    wide_df = wide_df[column_order]

    return wide_df

In [261]:
# Apply the function for 2011, 2013, 2016 and 2022 to get the financial data per country for each year
countries_2011_fin = group_and_calculate_mean(countries_2011)
countries_2013_fin = group_and_calculate_mean(countries_2013)
countries_2016_fin = group_and_calculate_mean(countries_2016)
countries_2022_fin = group_and_calculate_mean(countries_2022)

In [262]:
countries_2022_fin

label,company,iq_total_rev,iq_capex,iq_cash_oper,iq_cogs,iq_cost_rev,iq_curr_taxes,iq_gp,iq_int_exp_ltd,iq_marketcap,iq_net_debt,iq_ni,iq_total_debt_capital,iq_total_equity,iq_total_liab,iq_ni_margin,iq_return_assets,iq_return_equity,iq_total_debt_equity,iq_gross_margin
0,austria,13457.602222,-750.266667,1775.440333,8131.247333,8131.247333,932.338,5326.354889,14.415,8658.647421,-1419.995778,1485.013667,25.485833,10799.682333,67264.584444,15.774644,5.074556,15.148911,42.940322,21.665211
1,belgium,9600.439562,-740.32,2123.395937,5539.126063,5539.126063,297.083438,4061.3135,37.398194,14244.347216,3656.743875,807.190187,32.102938,12156.515812,40236.69225,25.098219,1.6454,7.900256,56.865419,40.7259
2,cyprus,2801.0,-794.0,206.0,1490.0,1490.0,164.0,1311.0,0.0,0.0,2523.0,-288.0,58.4737,2242.0,3838.0,-10.282,-0.4451,-12.9583,140.8117,46.8047
3,czechia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,denmark,48425.027667,-3087.019762,5378.533381,23387.688333,23387.688333,1944.217857,25037.339333,870.973429,200071.139277,44350.373524,6640.511095,35.409352,31714.154333,210935.122381,12.067424,8.762805,23.411571,75.205771,51.330514
5,finland,8451.413625,-397.680625,-65.806,5775.683187,5775.683187,168.125937,2675.730437,13.825,12007.999515,1000.15275,646.325062,31.718375,5757.151687,7768.048437,1.451037,6.712838,17.282287,52.16145,37.08225
6,france,26593.846941,-1601.043779,2672.967265,16241.619544,16242.075426,1010.28075,10351.771515,79.786853,32163.715029,-1268.951574,1564.676103,42.0971,20285.692926,129834.596794,8.293521,4.587851,-4.1384,79.630375,39.866793
7,germany,33596.07494,-1396.395955,3068.953881,24439.055179,25894.592493,659.624373,7701.482448,122.608776,21785.998357,8925.329149,1474.768612,38.09427,19555.393119,80423.472985,4.868399,3.476463,2.133213,88.882215,33.352325
8,ireland,10013.982,-418.697364,1504.646364,6760.617636,6760.617636,193.863182,3253.364364,56.344727,12722.755114,-430.158455,753.199182,35.945073,6854.531091,19681.006091,7.943436,4.266091,11.729082,61.353164,28.831727
9,italy,19873.07696,-1232.199386,5850.156412,13339.021198,14052.914055,680.944607,5820.162906,187.825179,13905.060324,12170.025284,1543.589768,43.272771,13818.356693,116502.833114,16.696811,3.926375,13.517261,135.93665,36.238664


The financial data does not appear to be correct. We therefore did not continue the analysis here; for further analysis, see the notebooks for the D-A-CH region.
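
The notebook does not diagnose why the figures look wrong. One candidate worth checking (an assumption, not a confirmed cause) is the `.fillna(0)` step in `group_and_calculate_mean`: values that are missing or cannot be parsed as numbers are turned into zeros before the mean is taken, which pulls the averages down. The all-zero `czechia` row would be consistent with a group in which every value was missing or non-numeric. The toy data below is invented to show the effect.

```python
import pandas as pd

# Toy long-format data (invented values): company B has no usable revenue figure
toy = pd.DataFrame({
    "company": ["A", "B", "C"],
    "label": ["iq_total_rev"] * 3,
    "attribute": ["100", "n/a", "200"],
})

# Non-numeric entries become NaN
values = pd.to_numeric(toy["attribute"], errors="coerce")

mean_zero_filled = values.fillna(0).mean()  # the missing value counts as 0
mean_skip_missing = values.mean()           # pandas skips NaN by default

print(mean_zero_filled, mean_skip_missing)
```

If this turns out to be relevant, dropping the `.fillna(0)` call (so that missing values are skipped) would be one option to test; whether that is appropriate depends on how the missing values arose.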

# 2. Build networks with industries as nodes <a class='anchor' id='chapter2'></a>

We build networks with industries as nodes. The overall structure here is quite similar to chapter 1.

In [263]:
# Function to replace each company with its industry
def replace_company_with_industry(df):

    # Work on a copy so that the input dataframe is not modified in place
    df = df.copy()

    # Create a dictionary to map companies to their industries
    industry_map = df[df['label'] == 'industry'].set_index('company')['attribute'].to_dict()

    # Replace company names with their corresponding industry
    df['company'] = df['company'].map(industry_map)

    # Remove rows where company is NaN, indicating no industry was found
    df = df.dropna(subset=['company'])

    # Remove rows where label is 'industry'
    df = df[df['label'] != 'industry']

    # The column keeps the name 'company' (the rename below is left disabled)
    #df = df.rename(columns={'company': 'industry'})

    return df
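
A toy run makes the mapping step easier to follow. The sketch below applies the same idea to a tiny long-format frame with the columns `company`, `label` and `attribute`; the function name `to_industry_rows` and the company names are hypothetical. It works on a copy, so the frame passed in is left unchanged.

```python
import pandas as pd

def to_industry_rows(df):
    """Return a new frame whose 'company' column holds each company's industry."""
    industry_map = (
        df.loc[df["label"] == "industry"]
        .set_index("company")["attribute"]
        .to_dict()
    )
    out = df.copy()                       # leave the caller's frame untouched
    out["company"] = out["company"].map(industry_map)
    out = out.dropna(subset=["company"])  # companies without an industry row
    return out[out["label"] != "industry"].reset_index(drop=True)

# Invented example rows; 'orphan ltd' has no industry row and is dropped
toy = pd.DataFrame({
    "company": ["acme bank", "acme bank", "pillco", "pillco", "orphan ltd"],
    "label": ["industry", "city", "industry", "country", "city"],
    "attribute": ["diversified banks", "paris", "pharmaceuticals", "japan", "wales"],
})

result = to_industry_rows(toy)
print(result)
```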

In [264]:
# Apply the function for 2011, 2013, 2016 and 2022 to replace the company with its industry
industries_2011 = replace_company_with_industry(df_2011)
industries_2013 = replace_company_with_industry(df_2013)
industries_2016 = replace_company_with_industry(df_2016)
industries_2022 = replace_company_with_industry(df_2022)

In [265]:
# Have a look into the result
industries_2011

Unnamed: 0,company,label,attribute
0,asset management and custody banks,city,paris
1,asset management and custody banks,city,wales
2,asset management and custody banks,city,hyderabad
3,asset management and custody banks,country,china
4,asset management and custody banks,country,japan
...,...,...,...
188607,multi_line insurance,person,deutscher herold
188608,multi_line insurance,person,dieter wemmer
188609,multi_line insurance,person,vernon louis sankey
188610,multi_line insurance,person,dublin orange stone


### 2.1 Calculate the common attributes and build a network df <a id="section_2_1"></a>

In [266]:
# Apply the calculate_common_attributes function to create network dataframes for 2011, 2013, 2016 and 2022
industries_2011_network_df = calculate_common_attributes(industries_2011)
industries_2013_network_df = calculate_common_attributes(industries_2013)
industries_2016_network_df = calculate_common_attributes(industries_2016)
industries_2022_network_df = calculate_common_attributes(industries_2022)

In [267]:
industries_2011_network_df.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
556,diversified banks,multi_line insurance,"pierre coll, share capital, association frança...",1885
573,diversified banks,pharmaceuticals,"planet finance, são paulo, alice wallenberg fo...",1703
331,industrial machinery and supplies and components,diversified banks,"share capital, são paulo, alice wallenberg fou...",1602
551,diversified banks,life and health insurance,"joint ventures associates, budapest, united na...",1520
549,diversified banks,apparel accessories and luxury goods,"são paulo, budapest, left bank, united nations...",1496
...,...,...,...,...
5995,industrial gases,interactive media and services,"european union, 0, united kingdom",3
6028,broadline retail,health care facilities,"european union, 0",2
1740,oil and gas exploration and production,interactive media and services,"0, united kingdom",2
1748,oil and gas exploration and production,multi-family residential reits,"0, united kingdom",2


In [268]:
industries_2013_network_df.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
569,diversified banks,multi_line insurance,"group policy, share capital, são paulo, rcs me...",1888
584,diversified banks,pharmaceuticals,"food and rural affairs, share capital, são pau...",1751
340,industrial machinery and supplies and components,diversified banks,"group policy, share capital, são paulo, budape...",1696
365,industrial machinery and supplies and components,pharmaceuticals,"share capital, environment health and safety, ...",1580
570,diversified banks,aerospace and defense,"share capital, são paulo, united nations, grou...",1487
...,...,...,...,...
4344,consumer staples merchandise retail,interactive home entertainment,poland,1
2023,diversified real estate activities,consumer staples merchandise retail,,0
4390,consumer staples merchandise retail,broadline retail,,0
4404,consumer staples merchandise retail,forest products,,0


In [269]:
industries_2016_network_df.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
529,diversified banks,pharmaceuticals,"group policy, unrealised losses, hsbc private ...",2028
1194,specialty chemicals,pharmaceuticals,"aspire target opportunity, vivawest wohnen gmb...",1745
310,industrial machinery and supplies and components,diversified banks,"unrealised losses, share capital, gen energy, ...",1725
544,diversified banks,integrated oil and gas,"são paulo, budapest, united nations, tortola, ...",1678
514,diversified banks,life and health insurance,"unrealised losses, budapest, united nations, e...",1645
...,...,...,...,...
1374,diversified real estate activities,health care reits,"switzerland, financial instruments, project de...",5
1406,diversified real estate activities,financial exchanges and data,"financial instruments, 0, 175, united kingdom",4
1441,diversified real estate activities,tires and rubber,sao tome and principe,4
1386,diversified real estate activities,self-storage reits,"financial instruments, 0, united kingdom",3


In [270]:
industries_2022_network_df.sort_values(by='Weight', ascending=False)

Unnamed: 0,Source,Target,Common Attributes,Weight
1103,life sciences tools and services,financial exchanges and data,"fopse sarl fr eurofins agrosciences services, ...",6746
5796,construction materials,apparel retail,"biodiversity expert group, associates western,...",4720
772,electric utilities,renewable electricity,"autonomous communities, iam caecius, wood mack...",3489
642,diversified banks,integrated telecommunication services,"group policy, joint ventures associates, europ...",2931
604,diversified banks,multi_line insurance,"group policy, hsbc private bank, share capital...",2743
...,...,...,...,...
6199,renewable electricity,home improvement retail,"australia, nonexecutive directors, united kingdom",6
3229,health care distributors,home improvement retail,"po box, australia, united kingdom",6
6953,home improvement retail,forest products,"60.0, united kingdom",2
6604,commodity chemicals,home improvement retail,united kingdom,1
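
The tables above list, for each pair of industries, the attributes they share and their count as `Weight`. `calculate_common_attributes` is defined earlier in the notebook and is not reproduced here; the sketch below is only a guess at the underlying idea that is consistent with the output columns. The function name `common_attribute_edges` and the toy attribute sets are assumptions.

```python
from itertools import combinations

def common_attribute_edges(attributes_by_node):
    """Build one edge per node pair, weighted by the number of shared attributes."""
    edges = []
    for source, target in combinations(sorted(attributes_by_node), 2):
        shared = attributes_by_node[source] & attributes_by_node[target]
        edges.append({
            "Source": source,
            "Target": target,
            "Common Attributes": ", ".join(sorted(shared)),
            "Weight": len(shared),
        })
    return edges

# Invented attribute sets, loosely modelled on values seen in the tables
toy = {
    "diversified banks": {"share capital", "budapest", "united kingdom", "0"},
    "pharmaceuticals": {"share capital", "budapest", "0"},
    "forest products": {"united kingdom"},
}

edges = common_attribute_edges(toy)
for edge in edges:
    print(edge)
```

Under this reading, entries such as `0`, `60.0` or `175` in the `Common Attributes` column count towards the weight just like meaningful attributes do, so filtering such values out before building the network may be worth considering.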


#### Filter the dataframes by Weight (>700) <a id="sub_section_2_1_1"></a>

The full data is hard to visualize because it contains too many industries. Therefore, we keep only the edges with a weight above 700.

In [271]:
# Filter the dataframes by weight
filtered_industries_2011_network_df = industries_2011_network_df[industries_2011_network_df['Weight'] > 700]
filtered_industries_2013_network_df = industries_2013_network_df[industries_2013_network_df['Weight'] > 700]
filtered_industries_2016_network_df = industries_2016_network_df[industries_2016_network_df['Weight'] > 700]
filtered_industries_2022_network_df = industries_2022_network_df[industries_2022_network_df['Weight'] > 700]
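
To see how much of the network a given cut-off keeps, the number of surviving edges and nodes can be tabulated for a few candidate thresholds. The sketch below uses four rows copied from the 2011 output as toy data; the helper name `summarize_threshold` is hypothetical, and the thresholds other than 700 are arbitrary examples.

```python
import pandas as pd

# Four rows taken from the 2011 network output, used as toy data
toy_network_df = pd.DataFrame({
    "Source": ["diversified banks", "diversified banks", "industrial gases", "broadline retail"],
    "Target": ["multi_line insurance", "pharmaceuticals",
               "interactive media and services", "health care facilities"],
    "Weight": [1885, 1703, 3, 2],
})

def summarize_threshold(df, threshold):
    """Count the edges and nodes that remain when keeping Weight > threshold."""
    kept = df[df["Weight"] > threshold]
    nodes = set(kept["Source"]) | set(kept["Target"])
    return {"threshold": threshold, "edges": len(kept), "nodes": len(nodes)}

for threshold in (0, 700, 1800):
    print(summarize_threshold(toy_network_df, threshold))
```

Running the same loop on the full yearly dataframes would show how sensitive the visualized network is to the choice of 700.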

### 2.2 Create network graphs and visualize them <a id="section_2_2"></a>

In [272]:
def visualize_graph(df, title='Graph Visualization', k=20, axis_size=(10, 8), edge_threshold=500, label_font_size=8):
    # Create a graph
    G = create_graph_from_df(df)

    # Create a plot without map projection
    fig, ax = plt.subplots(figsize=axis_size)

    # Normalizing node sizes (do this only once)
    max_degree = max(dict(G.degree()).values())
    node_sizes = [(G.degree(n) / max_degree * 100) + 10 for n in G.nodes()]  # Node size in range (10, 110]

    # Calculate combined metric for node colors: degree + sum of edge weights
    node_weights = {node: sum([G[node][neighbor]['weight'] for neighbor in G[node]]) for node in G.nodes()}
    max_weight = max(node_weights.values())
    combined_metric = {node: (G.degree(node) / max_degree) + (node_weights[node] / max_weight) for node in G.nodes()}
    
    # Normalize the combined metric and create color mapping
    max_combined_metric = max(combined_metric.values())
    normalized_metric = {node: metric / max_combined_metric for node, metric in combined_metric.items()}
    node_colors = [plt.cm.coolwarm(normalized_metric[node]) for node in G.nodes()]

    # Using a colormap to map weights to colors for edges
    edge_weights = np.array([G[u][v]['weight'] for u, v in G.edges()])
    edge_color_map = plt.colormaps['Reds']
    edge_colors = edge_color_map(edge_weights / edge_weights.max())

    # Position nodes using NetworkX's spring layout
    pos = nx.spring_layout(G, seed=20, k=k, iterations=50)

    # Visualize the graph with smaller labels
    nx.draw_networkx_nodes(G, pos, node_color=node_colors, node_size=node_sizes, ax=ax)
    nx.draw_networkx_edges(G, pos, edge_color=edge_colors, ax=ax, edge_cmap=edge_color_map)
    nx.draw_networkx_labels(G, pos, font_size=label_font_size, ax=ax)

    plt.title(title)
    return fig
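
The node colour in `visualize_graph` is driven by a combined metric: the node's degree relative to the largest degree, plus the node's summed edge weights relative to the largest sum, rescaled so that the top node gets 1. The dependency-free sketch below reproduces that calculation on a toy edge list so the numbers can be inspected directly. The function name `combined_node_metric` and the toy weights are assumptions for illustration.

```python
from collections import defaultdict

def combined_node_metric(edges):
    """Normalized (degree share + weight share) per node, as used for node colours."""
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, w in edges:
        for node in (u, v):
            degree[node] += 1
            strength[node] += w
    max_degree = max(degree.values())
    max_strength = max(strength.values())
    combined = {
        n: degree[n] / max_degree + strength[n] / max_strength for n in degree
    }
    top = max(combined.values())
    return {n: value / top for n, value in combined.items()}

# Invented toy edges (node names shortened for readability)
toy_edges = [
    ("banks", "insurance", 1800),
    ("banks", "pharma", 1700),
    ("pharma", "chemicals", 900),
]

scores = combined_node_metric(toy_edges)
for node, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(node, round(score, 3))
```

Because degree and summed weight are both scaled to their own maximum before being added, a node with few but very heavy edges and a node with many light edges can end up with similar colours.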


In [273]:
def visualize_graph_industries_interactive(dataframes, title='Industries as Nodes'):
    # Precompute figures
    precomputed_figures = {}
    for year, df in dataframes.items():
        precomputed_figures[year] = visualize_graph(df, title=title)

    def update_plot_precomputed(year):
        
        # Clear the output of the widget to remove the old plot
        clear_output(wait=True)

        # Display the precomputed figure corresponding to the selected year
        fig = precomputed_figures[year]
        display(fig)

    # Create a dropdown to select the dataframe
    dropdown = widgets.Dropdown(options=list(dataframes.keys()))

    # Create an interactive output for your plot
    output = widgets.interactive_output(update_plot_precomputed, {'year': dropdown})

    # Display the dropdown and output
    display(dropdown, output)

In [274]:
industries = {'2011': filtered_industries_2011_network_df, '2013': filtered_industries_2013_network_df, '2016': filtered_industries_2016_network_df, '2022': filtered_industries_2022_network_df}
visualize_graph_industries_interactive(industries, title = 'Industries as Nodes')

Dropdown(options=('2011', '2013', '2016', '2022'), value='2011')

Output()

#### Interpretation of the results (top 10 industries):

1. **Diversified Banks:** As a pivotal industry, diversified banks are central due to their role in financing other industries. Banks provide the necessary capital for investments, expansions and operations of companies across various sectors. Their financial services are integral to the economy, affecting market liquidity and credit availability.

2. **Industrial Machinery, Supplies, and Components:** This industry is crucial as it supplies essential tools and components to various other industries. It's a backbone for manufacturing sectors, contributing to the production capabilities and efficiencies of numerous industries.

3. **Pharmaceuticals:** Given the constant need for medical innovation and healthcare services, pharmaceuticals are always critical. This industry's significance could be amplified by ongoing health challenges, the aging population, and the need for new medications and treatments.

4. **Aerospace and Defense:** The strategic importance of aerospace and defense is evident due to its role in national security and its technological advancements. It's also a high-capital industry with extensive supply chains, impacting many other sectors.

5. **Integrated Telecommunication Services:** Telecommunications are essential for modern business operations, facilitating communication and data transfer. This industry's infrastructure serves as the backbone of digitalization and is crucial for the functioning of the global economy.

6. **Specialty Chemicals:** These chemicals are vital for various applications across industries, from agriculture to manufacturing and consumer goods. Their specialized nature makes them indispensable for specific processes and products.

7. **Integrated Oil and Gas:** Despite the shift towards renewable energy, oil and gas remain key to the global energy supply. This industry's influence comes from its role in powering industries, transportation, and households.

8. **Apparel, Accessories and Luxury Goods:** This sector's importance might be attributed to consumer spending patterns and the global nature of the fashion industry, which has extensive supply chains and market reach.

9. **Trading Companies and Distributors:** These entities facilitate the movement of goods across the globe, essential for global trade. They are crucial in supply chain management and in connecting producers with markets.

10. **Packaged Food and Meats:** This industry is vital due to the fundamental need for food. Its position reflects the importance of food security, supply chain efficiency, and the growing demand for processed and packaged foods.

Overall, the prominence of these industries in this network could be due to their economic size, global interconnectedness, strategic importance or essential nature in fulfilling basic needs and technological advancements. The results might also be influenced by current global trends, such as digital transformation, sustainability issues, healthcare advancements and changing consumer behaviors.

### 2.3 Create clusters by applying the louvain method <a id="section_2_3"></a>

#### 2011 <a id="sub_section_2_3_1"></a>

In [275]:
# Apply the louvain method for 2011
apply_louvain_interactive(filtered_industries_2011_network_df)

Dropdown(description='Community:', options=(0, 1, 2, 3, 4), value=0)

Output()

#### 2013 <a id="sub_section_2_3_2"></a>

In [276]:
# Apply the louvain method for 2013
apply_louvain_interactive(filtered_industries_2013_network_df)

Dropdown(description='Community:', options=(0, 1, 2, 3, 4), value=0)

Output()

#### 2016 <a id="sub_section_2_3_3"></a>

In [277]:
# Apply the louvain method for 2016
apply_louvain_interactive(filtered_industries_2016_network_df)

Dropdown(description='Community:', options=(0, 1, 2, 3), value=0)

Output()

#### 2022 <a id="sub_section_2_3_4"></a>

In [278]:
# Apply the louvain method for 2022
apply_louvain_interactive(filtered_industries_2022_network_df)

Dropdown(description='Community:', options=(0, 1, 2, 3, 4), value=0)

Output()