<a href="https://colab.research.google.com/github/Kunstenpunt/perform_europe/blob/main/data_analysis_tom.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Perform Europe Facebook Event Data Analysis
In this analysis, we start from the step 2 survey, which was completed by more than 2000 respondents. In the survey, respondents indicated their online presence. From this we derived whether they are also present on social media, in this case Facebook, and whether they use the Facebook Events module to promote their work.

Using the Facebook Events module is a straightforward way of creating structured, linked data in an international standard (schema.org). As a result, the Event is widely shareable inside Facebook through the platform's social functions. In addition, because schema.org is a de facto standard, the structured metadata of such events is also picked up by more general knowledge graphs, such as Google's. Consequently, the Event also appears in the "events nearby" or "events coming up" modules that Google offers when searching for cities, venues or performing arts organisations.

The implicit nudge of Facebook to generate structured metadata thus has a multiplication effect that increases the visibility of performing arts significantly.
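
To make the idea of structured event metadata concrete, here is a minimal sketch of what a schema.org `Event` can look like and how it could be flattened into a record. The JSON-LD snippet and its values are purely illustrative, and the mapping onto column names such as `location_address_country` is an assumption, not the scraper that produced the data used here.

```python
import json

# Hypothetical schema.org Event markup; the values are illustrative only.
EVENT_JSONLD = """
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Example performance",
  "startDate": "2018-02-16",
  "endDate": "2018-02-16",
  "location": {
    "@type": "Place",
    "name": "Example venue",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Nanterre",
      "addressCountry": "FR"
    }
  }
}
"""


def flatten_event(raw):
    """Flatten a schema.org Event into a flat record (assumed column names)."""
    event = json.loads(raw)
    location = event.get("location") or {}
    address = location.get("address") or {}
    return {
        "name": event.get("name"),
        "date": event.get("startDate"),
        "until_date": event.get("endDate"),
        "location_name": location.get("name"),
        "location_address_city": address.get("addressLocality"),
        "location_address_country": address.get("addressCountry"),
    }


record = flatten_event(EVENT_JSONLD)
print(record)
```

Because every field has a fixed name and meaning, any consumer of the markup can read the venue and country without parsing free text.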

The following analysis first checks which of the survey respondents make use of the digital advantages of this approach, which gives insight into the adoption of digital tools in the performing arts sector. In addition, we use the available structured metadata to investigate the core question of this survey: what are the characteristics of cross-border mobility in the performing arts?

For this analysis, we use the data science tool Colaboratory, Google's implementation of Python notebooks, which are widely used in data science.

## Preparing the data
First some technical steps to get the data, starting with loading libraries and mounting the Google Drive.

In [None]:
from google.colab import drive
from pandas import read_excel, isna, DataFrame, merge, offsets, concat, isnull
from re import sub
drive.mount("/content/gdrive")

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


Now we read in the data, both from the survey and from the structured metadata scraped from Facebook, which is available thanks to Facebook's adherence to the schema.org standard.

In [None]:
facebook_data_file = "/content/gdrive/Shareddrives/Perform Europe/Task 1. Mapping/STEP 2 (Feb-May2021)/FACEBOOK EVENTS ANALYSIS/data.xlsx"
pe_survey_data_file = "/content/gdrive/Shareddrives/Perform Europe/Task 1. Mapping/STEP 2 (Feb-May2021)/FACEBOOK EVENTS ANALYSIS/performeurope.xlsx"


In [None]:
pe_survey_data = read_excel(pe_survey_data_file)
facebook_data = read_excel(facebook_data_file)

Let's prepare the data. From a first exercise, we distinguished the following data points that we need to collect from the available data:

- Perform Europe Survey
 - respondent_id (CHECK)
 - in which country is the respondent based
   - available values in Q2 (CHECK)
   - clustered into macroregions (CHECK)
 - primary core activity
   - production (artistic creation, production, artist management), presentation (question 4) (CHECK)
 - how long has the respondent been active in the sector
   - less than 5 years, between 5 - 10 years , more than 10 years (question 8) (CHECK)
 - what is the main discipline of the respondent
   - theatre or music theatre, dance, circus and/or street arts, performance, multidisciplinary (question 3) (CHECK)
 - is the respondent a freelancer or organisation
   - freelancer, organisation (question 6) (CHECK)
 - urban/rural focus of the performing arts organisation
   - urban, rural (question 7) (CHECK)
 - number of employees
   - less than 5, between 5 - 15, between 15 - 30, more than 30 (question 10) (CHECK)
 - legal status
   - for-profit, non-profit, public institution, social enterprise (question 11) (CHECK)
 - how is the respondent funded 
   - public, private, mixed, self, crowd (question 12) (CHECK)
 - where is the work of the respondent already shown
   - available values in Q 16 (CHECK)
   - clustered by macroregions (CHECK)
- Facebook
 - date (CHECK)
 - respondent_id (CHECK) 
 - event_id (CHECK)
 - location name (CHECK)
 - location city 
 - location country (CHECK)
 - location macroregion (CHECK)
- External data
 - number of inhabitants per country
 - number of inhabitants per macroregion

### Rural/urban
Is the working focus of the respondent urban or rural?

In [None]:
areas = [column for column in pe_survey_data.columns if column.startswith("7.")]
print(areas)

["7.1. {{tooltip 'An urban area' 'Urban areas cover cities, towns and suburbs'}}", "7.2. {{tooltip 'A rural area' 'Rural areas are all areas outside the urban cluster'}}"]


In [None]:
area_records = []
for row in pe_survey_data.iterrows():
  respondent_id = row[1]["Respondent"]
  values = []
  for area in areas:
    if not isna(row[1][area]):
      area_clean = "urban" if area.startswith("7.1") else "rural"
      values.append(area_clean)
  record = {
      'respondent_id': respondent_id,
      'area': sorted(values),
      'area_str': ", ".join(sorted(values))
  }
  area_records.append(record)
area_df = DataFrame.from_records(area_records)
area_df.set_index("respondent_id", inplace=True)
print(area_df.head())

                  area area_str
respondent_id                  
57             [urban]    urban
58             [urban]    urban
59                  []         
60                  []         
61             [urban]    urban


### Core activity (production/presentation)
What is the core activity of the respondent?

In [None]:
core_activities = pe_survey_data['4. Which is your primary core activity within the performing arts field?'].unique()
print(core_activities)

['Artist management, promotion and representation' 'Artistic creation' nan
 'Production' 'Presentation and programming (might include co-production)']


Let's map this:

In [None]:
activity_mapping = {
    'Artist management, promotion and representation': 'production',
    'Artistic creation': 'production',
    'Production': 'production',
    'Presentation and programming (might include co-production)': 'presentation'
 }

In [None]:
prod_pres_records = []
for row in pe_survey_data.iterrows():
  respondent_id = row[1]["Respondent"]
  core_activity = row[1]['4. Which is your primary core activity within the performing arts field?']
  value = activity_mapping[core_activity] if not isna(core_activity) else None
  record = {
      'respondent_id': respondent_id,
      'core_activity': value
  }
  prod_pres_records.append(record)
core_activity_df = DataFrame.from_records(prod_pres_records)
core_activity_df.set_index("respondent_id", inplace=True)
print(core_activity_df.head())

              core_activity
respondent_id              
57               production
58               production
59                     None
60                     None
61               production
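
The loop above indexes `activity_mapping` directly, so an answer text that is not one of the four keys would raise a `KeyError`. A minimal sketch of a more tolerant lookup, using the same mapping, could look like this (the helper name `map_core_activity` is ours, not part of the notebook):

```python
activity_mapping = {
    "Artist management, promotion and representation": "production",
    "Artistic creation": "production",
    "Production": "production",
    "Presentation and programming (might include co-production)": "presentation",
}


def map_core_activity(answer):
    """Return 'production' or 'presentation'; None for a missing or unknown answer."""
    # dict.get returns None instead of raising when the key is absent,
    # which also covers the NaN that pandas uses for empty cells.
    return activity_mapping.get(answer)


print(map_core_activity("Artistic creation"))
print(map_core_activity(float("nan")))
```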


### Countries
Let's get the country information and the information about the macroregions.

In [None]:
based_in_countries = [column for column in pe_survey_data.columns if column.startswith("2.")]
print(based_in_countries)

['2.1. Albania', '2.2. Armenia', '2.3. Austria', '2.4. Belgium', '2.5. Bosnia and Herzegovina', '2.6. Bulgaria', '2.7. Croatia', '2.8. Republic of Cyprus', '2.9. Czech Republic', '2.10. Denmark', '2.11. Estonia', '2.12. Finland', '2.13. France', '2.14. Georgia', '2.15. Germany', '2.16. Greece', '2.17. Hungary', '2.18. Iceland', '2.19. Ireland', '2.20. Italy', '2.21. Kosovo', '2.22. Latvia', '2.23. Lithuania', '2.24. Luxembourg', '2.25. Malta', '2.26. Moldova', '2.27. Montenegro', '2.28. Netherlands', '2.29. North Macedonia', '2.30. Norway', '2.31. Poland', '2.32. Portugal', '2.33. Republic of Serbia', '2.34. Romania', '2.35. Slovakia', '2.36. Slovenia', '2.37. Spain', '2.38. Sweden', '2.39. Tunisia', '2.40. Ukraine', '2.41. United Kingdom']


For each country, we need a macroregion.

In [None]:
macroregions_mapping = {
     'Albania': 'Balkans', 
     'Armenia': 'Eastern Partnership + Tunisia', 
     'Austria': 'Western Europe', 
     'Belgium': 'Western Europe', 
     'Bosnia and Herzegovina': 'Balkans', 
     'Bulgaria': 'Eastern Europe', 
     'Croatia': 'Eastern Europe', 
     'Republic of Cyprus': 'Southern Europe', 
     'Czech Republic': 'Eastern Europe', 
     'Denmark': 'Northern Europe', 
     'Estonia': 'Northern Europe', 
     'Finland': 'Northern Europe', 
     'France': 'Western Europe', 
     'Georgia': 'Eastern Partnership + Tunisia', 
     'Germany': 'Western Europe', 
     'Greece': 'Southern Europe', 
     'Hungary': 'Eastern Europe', 
     'Iceland': 'Northern Europe', 
     'Ireland': 'Western Europe', 
     'Italy': 'Southern Europe', 
     'Kosovo': 'Balkans',
     'Latvia': 'Northern Europe', 
     'Lithuania': 'Northern Europe', 
     'Luxembourg': 'Western Europe', 
     'Malta': 'Southern Europe', 
     'Moldova': 'Eastern Partnership + Tunisia', 
     'Montenegro': 'Balkans', 
     'Netherlands': 'Western Europe', 
     'North Macedonia': 'Balkans', 
     'Norway': 'Northern Europe', 
     'Poland': 'Eastern Europe', 
     'Portugal': 'Southern Europe', 
     'Republic of Serbia': 'Balkans', 
     'Romania': 'Eastern Europe', 
     'Slovakia': 'Eastern Europe', 
     'Slovenia': 'Eastern Europe', 
     'Spain': 'Southern Europe', 
     'Sweden': 'Northern Europe', 
     'Tunisia': 'Eastern Partnership + Tunisia', 
     'Ukraine': 'Eastern Partnership + Tunisia', 
     'United Kingdom': 'Western Europe',
     'Countries outside Creative Europe': 'Countries outside Creative Europe'
     }

We also derive the market size from the population (in millions of inhabitants).

In [None]:
market_size_mapping = {
     'Albania': 2.8, 
     'Armenia': 2.9, 
     'Austria': 8.9, 
     'Belgium': 11, 
     'Bosnia and Herzegovina': 3, 
     'Bulgaria': 7, 
     'Croatia': 4, 
     'Republic of Cyprus': 1, 
     'Czech Republic': 11, 
     'Denmark': 6, 
     'Estonia': 1.3, 
     'Finland': 5.5, 
     'France': 67, 
     'Georgia': 3.7, 
     'Germany': 83, 
     'Greece': 10.7, 
     'Hungary': 9.7, 
     'Iceland': 1, 
     'Ireland': 4.9, 
     'Italy': 60, 
     'Kosovo': 1.9,
     'Latvia': 1.9, 
     'Lithuania': 2.8, 
     'Luxembourg': 1, 
     'Malta': 1, 
     'Moldova': 2.6, 
     'Montenegro': 1, 
     'Netherlands': 17.2, 
     'North Macedonia': 2, 
     'Norway': 5.3, 
     'Poland': 38, 
     'Portugal': 10, 
     'Republic of Serbia': 6.9, 
     'Romania': 19.4, 
     'Slovakia': 5.4, 
     'Slovenia': 2, 
     'Spain': 47, 
     'Sweden': 10.2, 
     'Tunisia': 11.7, 
     'Ukraine': 44.4, 
     'United Kingdom': 66.6
     }

In [None]:
based_in_records = []
for row in pe_survey_data.iterrows():
  respondent_id = row[1]["Respondent"]
  based_in = []
  based_in_macroregion = set()
  market_size = set()
  for country in based_in_countries:
    if not isna(row[1][country]):
      country_clean = sub(r"\d\.\d+?\.\s", "", country)
      based_in.append(country_clean)
      based_in_macroregion.add(macroregions_mapping[country_clean])
      market_size.add("large internal market" if market_size_mapping[country_clean] > 30 else "small internal market")
  record = {
      'respondent_id': respondent_id,
      'based_in_country': sorted(based_in),
      'based_in_country_str': ", ".join(sorted(based_in)),
      'based_in_macroregion': sorted(based_in_macroregion),
      'based_in_macroregion_str': ", ".join(sorted(based_in_macroregion)),
      'market_size': sorted(market_size),
      'market_size_str': ", ".join(sorted(market_size))
  }
  based_in_records.append(record)
based_in_df = DataFrame.from_records(based_in_records)
based_in_df.set_index("respondent_id", inplace=True)
print(based_in_df.head())

              based_in_country  ...        market_size_str
respondent_id                   ...                       
57                   [Finland]  ...  small internal market
58                  [Slovenia]  ...  small internal market
59                 [Lithuania]  ...  small internal market
60                    [Greece]  ...  small internal market
61                    [France]  ...  large internal market

[5 rows x 6 columns]
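
The market size label comes down to a single threshold: more than 30 million inhabitants counts as a large internal market. The sketch below isolates that rule. The constant and function names are ours, and the sample populations are copied from `market_size_mapping` above.

```python
# Cut-off in millions of inhabitants, as used in the loop above.
LARGE_MARKET_THRESHOLD = 30


def classify_market(population_millions):
    """Label a country as a large or small internal market based on its population."""
    if population_millions > LARGE_MARKET_THRESHOLD:
        return "large internal market"
    return "small internal market"


# A few values taken from market_size_mapping.
sample = {"France": 67, "Poland": 38, "Romania": 19.4, "Finland": 5.5}
labels = {country: classify_market(pop) for country, pop in sample.items()}
print(labels)
```

With the populations listed above, only France, Germany, Italy, Poland, Spain, Ukraine and the United Kingdom exceed the threshold.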


In [None]:
shown_in_countries = [column for column in pe_survey_data.columns if column.startswith("16.")]
shown_in_records = []
for row in pe_survey_data.iterrows():
  respondent_id = row[1]["Respondent"]
  shown_in = []
  shown_in_macroregion = set()
  for country in shown_in_countries:
    if not isna(row[1][country]):
      country_clean = sub(r"\d\d\.\d+?\.\s", "", country)
      shown_in.append(country_clean)
      if not country_clean == "Countries outside Creative Europe":
        shown_in_macroregion.add(macroregions_mapping[country_clean])
      else:
        shown_in_macroregion.add("Countries outside Creative Europe")
  record = {
      'respondent_id': respondent_id,
      'shown_in_country': sorted(shown_in),
      'shown_in_country_str': ", ".join(sorted(shown_in)),
      'shown_in_macroregion': sorted(shown_in_macroregion),
      'shown_in_macroregion_str': ", ".join(sorted(shown_in_macroregion))
  }
  shown_in_records.append(record)
shown_in_df = DataFrame.from_records(shown_in_records)
shown_in_df.set_index("respondent_id", inplace=True)
print(shown_in_df.head())

                                                shown_in_country  ...                           shown_in_macroregion_str
respondent_id                                                     ...                                                   
57             [Austria, Countries outside Creative Europe, F...  ...  Countries outside Creative Europe, Eastern Eur...
58             [Austria, Bosnia and Herzegovina, Croatia, Hun...  ...  Balkans, Eastern Europe, Southern Europe, West...
59                                                            []  ...                                                   
60                                                            []  ...                                                   
61                                                            []  ...                                                   

[5 rows x 4 columns]
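
The two loops use slightly different regular expressions because the question number has one digit for Q2 and two digits for Q16. As a sketch, a single anchored pattern could handle both. The function name is ours, and the Q16 column name below is a hypothetical example, since the actual Q16 column names are not shown here.

```python
import re

# One or more digits, a dot, one or more digits, a dot and a whitespace,
# anchored at the start of the column name.
PREFIX = re.compile(r"^\d+\.\d+\.\s")


def strip_question_prefix(column):
    """Remove a leading 'Q.N. ' question prefix from a survey column name."""
    return PREFIX.sub("", column)


print(strip_question_prefix("2.10. Denmark"))
# Hypothetical Q16 column name, for illustration only.
print(strip_question_prefix("16.3. Austria"))
```

Anchoring the pattern with `^` also prevents it from matching digits further along in a column name.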


### Discipline
Question 3 tells us which disciplines the respondent works in.

In [None]:
disciplines = [column for column in pe_survey_data.columns if column.startswith("3.")]
print(disciplines)

['3.1. Theatre or music theatre', '3.2. Dance', '3.3. Circus and/or street arts', "3.4. {{tooltip 'Performance' 'Performance must be only considered as a subcategory within the performing arts field (not visual arts or literature)'}}", '3.5. Multidisciplinary within the performing arts (music excluded)']


Let's bring this together in a variable.

In [None]:
discipline_records = []
for row in pe_survey_data.iterrows():
  respondent_id = row[1]["Respondent"]
  values = []
  for discipline in disciplines:
    if not isna(row[1][discipline]):
      discipline_clean = sub(r"\d\.\d+?\.\s", "", discipline)
      if discipline_clean == "{{tooltip 'Performance' 'Performance must be only considered as a subcategory within the performing arts field (not visual arts or literature)'}}":
        discipline_clean = "Performance"
      values.append(discipline_clean)
  record = {
      'respondent_id': respondent_id,
      'disciplines': sorted(values),
      'disciplines_str': ", ".join(sorted(values))
  }
  discipline_records.append(record)
discipline_df = DataFrame.from_records(discipline_records)
discipline_df.set_index("respondent_id", inplace=True)
print(discipline_df.head())

                                                     disciplines                                    disciplines_str
respondent_id                                                                                                      
57                                                       [Dance]                                              Dance
58                                                       [Dance]                                              Dance
59             [Multidisciplinary within the performing arts ...  Multidisciplinary within the performing arts (...
60             [Multidisciplinary within the performing arts ...  Multidisciplinary within the performing arts (...
61             [Multidisciplinary within the performing arts ...  Multidisciplinary within the performing arts (...


### Facebook macroregion

In [None]:
!pip install pycountry
import pycountry
fb_countries = facebook_data['location_address_country'].unique()
print(fb_countries)

['DK' nan 'IE' 'SE' 'FR' 'NL' 'IT' 'GR' 'CY' 'FI' 'AT' 'LV' 'DE' 'BE' 'US'
 'ES' 'HR' 'CH' 'RS' 'NO' 'PT' 'HU' 'PL' 'CA' 'GB' 'AR' 'MX' 'IL' 'TN'
 'CZ' 'SK' 'SI' 'LU' 'UA' 'RO' 'MU' 'AU' 'GE' 'IS' 'LT' 'EG' 'MK' 'BG'
 'FO' 'MT' 'LB' 'EE' 'XK' 'TW' 'BO' 'TR' 'CL' 'RU' 'JP' 'SG' 'TH' 'BR'
 'ID']


In [None]:
fb_macroregions_mapping = {
     'Albania': 'Balkans', 
     'Armenia': 'Eastern Partnership + Tunisia', 
     'Austria': 'Western Europe', 
     'Belgium': 'Western Europe', 
     'Bosnia and Herzegovina': 'Balkans', 
     'Bulgaria': 'Eastern Europe', 
     'Croatia': 'Eastern Europe', 
     'Cyprus': 'Southern Europe', 
     'Czech Republic': 'Eastern Europe', 
     'Czechia': 'Eastern Europe', 
     'Denmark': 'Northern Europe', 
     'Estonia': 'Northern Europe', 
     'Finland': 'Northern Europe', 
     'France': 'Western Europe', 
     'Georgia': 'Eastern Partnership + Tunisia', 
     'Germany': 'Western Europe', 
     'Greece': 'Southern Europe', 
     'Hungary': 'Eastern Europe', 
     'Iceland': 'Northern Europe', 
     'Ireland': 'Western Europe', 
     'Italy': 'Southern Europe', 
     'Kosovo': 'Balkans',
     'Latvia': 'Northern Europe', 
     'Lithuania': 'Northern Europe', 
     'Luxembourg': 'Western Europe', 
     'Malta': 'Southern Europe', 
     'Moldova': 'Eastern Partnership + Tunisia', 
     'Montenegro': 'Balkans', 
     'Netherlands': 'Western Europe', 
     'North Macedonia': 'Balkans', 
     'Norway': 'Northern Europe', 
     'Poland': 'Eastern Europe', 
     'Portugal': 'Southern Europe', 
     'Serbia': 'Balkans', 
     'Romania': 'Eastern Europe', 
     'Slovakia': 'Eastern Europe', 
     'Slovenia': 'Eastern Europe', 
     'Spain': 'Southern Europe', 
     'Sweden': 'Northern Europe', 
     'Tunisia': 'Eastern Partnership + Tunisia', 
     'Ukraine': 'Eastern Partnership + Tunisia', 
     'United Kingdom': 'Western Europe',
     'United States': 'Countries outside Creative Europe',
     'Switzerland': 'Countries outside Creative Europe',
     'Canada': 'Countries outside Creative Europe',
     'Argentina': 'Countries outside Creative Europe',
     'Mexico': 'Countries outside Creative Europe',
     'Israel': 'Countries outside Creative Europe',
     'Lebanon': 'Countries outside Creative Europe',
     'Taiwan, Province of China': 'Countries outside Creative Europe',
     'Bolivia, Plurinational State of': 'Countries outside Creative Europe',
     'Turkey': 'Countries outside Creative Europe',
     'Chile': 'Countries outside Creative Europe',
     'Australia': 'Countries outside Creative Europe',
     'Russian Federation': 'Countries outside Creative Europe',
     'Japan': 'Countries outside Creative Europe',
     'Singapore': 'Countries outside Creative Europe',
     'Thailand': 'Countries outside Creative Europe',
     'Brazil': 'Countries outside Creative Europe',
     'Indonesia': 'Countries outside Creative Europe',
     'Mauritius': 'Countries outside Creative Europe',
     'Egypt': 'Countries outside Creative Europe',
     'Faroe Islands': 'Countries outside Creative Europe',
     }

In [None]:
shown_in_records_fb = []
for row in facebook_data.iterrows():
  platform_id = row[1]["platform_id"]
  country = row[1]['location_address_country']
  # Only build a record when the event has a country code
  if not isna(country):
    if country == "XK":
      # XK (Kosovo) is handled by hand rather than through pycountry
      country_clean = "Kosovo"
    else:
      country_clean = pycountry.countries.get(alpha_2=country)
      country_clean = country_clean.name
    record = {
        'platform_id': platform_id,
        'fb_shown_in_country': country_clean,
        'fb_shown_in_macroregion': fb_macroregions_mapping[country_clean]
    }
    # Append inside the if-block, so an event without a country does not re-append the previous record
    shown_in_records_fb.append(record)
shown_in_fb_df = DataFrame.from_records(shown_in_records_fb)
shown_in_fb_df.set_index("platform_id", inplace=True)
print(shown_in_fb_df.head())

                 fb_shown_in_country fb_shown_in_macroregion
platform_id                                                 
312118346518788              Denmark         Northern Europe
1602125709937098             Denmark         Northern Europe
650277199170858              Denmark         Northern Europe
650277199170858              Denmark         Northern Europe
350994285633781              Denmark         Northern Europe
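
The conversion from a two-letter code to a country name has two special cases: missing values and the code `XK`, which is a user-assigned code for Kosovo rather than an official ISO 3166-1 entry. The sketch below shows the lookup logic on its own. The small `ISO_NAMES` dictionary is a hypothetical stand-in for a full lookup such as pycountry, and the function name is ours.

```python
# Hypothetical stand-in for a complete ISO 3166-1 lookup such as pycountry.
ISO_NAMES = {"DK": "Denmark", "FR": "France", "IT": "Italy"}
# Codes that are in practical use but are not official ISO 3166-1 entries.
EXTRA_NAMES = {"XK": "Kosovo"}


def country_name(code):
    """Return the country name for a two-letter code, or None if missing or unknown."""
    # NaN is the only value that is not equal to itself.
    if code is None or code != code:
        return None
    return EXTRA_NAMES.get(code) or ISO_NAMES.get(code)


print(country_name("XK"))
print(country_name("DK"))
print(country_name(float("nan")))
```

Returning `None` for an unknown code lets the caller decide whether to skip the event or flag it for review.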


### Combining the data
Now that we have all variables prepared, we can make an integrated dataset. Let's first add the macroregions to the Facebook data.

In [None]:
facebook_data.set_index("platform_id", inplace=True)
facebook_data_with_macroregions = facebook_data.join(shown_in_fb_df).reset_index()
facebook_data_with_macroregions.head()

Unnamed: 0.1,platform_id,Unnamed: 0,platform,name,date,until_date,location_name,location_address_street,location_address_zip,location_address_city,location_address_country,description,respondent_id,date_parsed,fb_shown_in_country,fb_shown_in_macroregion
0,100148020514218,18926,facebook,Autoctonos / Présentation publique Ayelen Parolin,2017-03-09,2017-03-09,,,,,,"Nous sommes tous des autochtones, et tous des ...",916,2017-03-09,,
1,100186340734716,19473,facebook,Alleen - Sara De Roo & Fikry El Azzouzi / tg STAN,2017-10-17,2017-10-18,,,,,,Selectie @[396205347229383:274:Het TheaterFest...,948,2017-10-17,,
2,100280486986476,5664,facebook,Mapped Productions - NOVA INSULA,2015-07-01,2015-07-04,"Mole Vanvitelliana, Ancona","Banchina Giovanni da Chio, 60121 Ancona",,"Ancona, Marche, Italië",IT,"Oggi alle ore 19, si inaugura Nova Insula. Si ...",1659,2015-07-01,Italy,Southern Europe
3,100548043729175,13043,facebook,Remek hang a futkosásban az Átrium Film-Színhá...,2016-09-07,2016-09-07,Átrium,Margit körút 55.,H-1024,"Boedapest, Hongarije",HU,Szeptember 7-én - szerdán - 19.00-kor az idei ...,555,2016-09-07,Hungary,Eastern Europe
4,100590720764575,17151,facebook,Scena madre* - Maison de la musique de Nanterre,2018-02-16,2018-02-16,Maison de la musique de Nanterre,8 rue des Anciennes-Mairies,92000,"Nanterre, Frankrijk",FR,"SCENA MADRE*\n——————\n\nIntrigue, suspense et ...",850,2018-02-16,France,Western Europe


And then we add the variables to the survey data.

In [None]:
full_data = area_df.join([core_activity_df, based_in_df, shown_in_df, discipline_df])
full_data.head()

Unnamed: 0_level_0,area,area_str,core_activity,based_in_country,based_in_country_str,based_in_macroregion,based_in_macroregion_str,market_size,market_size_str,shown_in_country,shown_in_country_str,shown_in_macroregion,shown_in_macroregion_str,disciplines,disciplines_str
respondent_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
57,[urban],urban,production,[Finland],Finland,[Northern Europe],Northern Europe,[small internal market],small internal market,"[Austria, Countries outside Creative Europe, F...","Austria, Countries outside Creative Europe, Fi...","[Countries outside Creative Europe, Eastern Eu...","Countries outside Creative Europe, Eastern Eur...",[Dance],Dance
58,[urban],urban,production,[Slovenia],Slovenia,[Eastern Europe],Eastern Europe,[small internal market],small internal market,"[Austria, Bosnia and Herzegovina, Croatia, Hun...","Austria, Bosnia and Herzegovina, Croatia, Hung...","[Balkans, Eastern Europe, Southern Europe, Wes...","Balkans, Eastern Europe, Southern Europe, West...",[Dance],Dance
59,[],,,[Lithuania],Lithuania,[Northern Europe],Northern Europe,[small internal market],small internal market,[],,[],,[Multidisciplinary within the performing arts ...,Multidisciplinary within the performing arts (...
60,[],,,[Greece],Greece,[Southern Europe],Southern Europe,[small internal market],small internal market,[],,[],,[Multidisciplinary within the performing arts ...,Multidisciplinary within the performing arts (...
61,[urban],urban,production,[France],France,[Western Europe],Western Europe,[large internal market],large internal market,[],,[],,[Multidisciplinary within the performing arts ...,Multidisciplinary within the performing arts (...


## Analysis questions
We have a number of questions for this data.


### Country distribution

In [None]:
full_data.explode("based_in_country").reset_index().groupby("based_in_country").agg({"respondent_id": "nunique"})["respondent_id"].sort_values(ascending=False)

based_in_country
Spain                     190
Germany                   189
France                    175
Italy                     167
Portugal                  137
United Kingdom            133
Greece                    102
Belgium                    93
Sweden                     76
Iceland                    70
Netherlands                66
Ireland                    62
Denmark                    59
Norway                     54
Poland                     49
Finland                    47
Bulgaria                   37
Czech Republic             33
Lithuania                  31
Slovenia                   31
Republic of Cyprus         30
Slovakia                   30
Luxembourg                 28
Croatia                    27
Republic of Serbia         26
Romania                    24
Hungary                    24
Georgia                    23
Latvia                     23
Austria                    19
Tunisia                    13
Kosovo                     11
Ukraine                

### How many of the respondents use Facebook Events to announce activities?

In [None]:
fb_event_users = len(facebook_data["respondent_id"].value_counts())
fb_event_users

270

In [None]:
fb_event_users / len(full_data.index)

0.13392857142857142

In [None]:
len(facebook_data_with_macroregions[~isna(facebook_data_with_macroregions["fb_shown_in_country"])]["respondent_id"].value_counts())

253

In [None]:
len(facebook_data_with_macroregions[~isna(facebook_data_with_macroregions["fb_shown_in_country"])].value_counts())

13056

### How many respondents use Facebook Events per discipline?

And how many respondents were there per discipline?

In [None]:
full_data.explode("disciplines").reset_index().groupby("disciplines").agg({"respondent_id": "nunique"})

Unnamed: 0_level_0,respondent_id
disciplines,Unnamed: 1_level_1
Circus and/or street arts,394
Dance,765
Multidisciplinary within the performing arts (music excluded),693
Performance,756
Theatre or music theatre,843


And in the Facebook data?

In [None]:
fb_and_survey = full_data.reset_index().merge(facebook_data_with_macroregions, how="right", left_on="respondent_id", right_on="respondent_id")
fb_and_survey[~isna(fb_and_survey["platform_id"])].explode("disciplines").groupby("disciplines").agg({"respondent_id": "nunique"})

  


Unnamed: 0_level_0,respondent_id
disciplines,Unnamed: 1_level_1
Circus and/or street arts,85
Dance,127
Multidisciplinary within the performing arts (music excluded),122
Performance,120
Theatre or music theatre,131
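
Putting the two tables side by side gives the share of respondents per discipline that use Facebook Events. The sketch below simply divides the counts shown above. The variable names are ours, and since respondents can select several disciplines, the categories overlap.

```python
# Respondents per discipline in the survey (first table above).
survey_counts = {
    "Circus and/or street arts": 394,
    "Dance": 765,
    "Multidisciplinary within the performing arts (music excluded)": 693,
    "Performance": 756,
    "Theatre or music theatre": 843,
}
# Respondents per discipline with Facebook events (second table above).
facebook_counts = {
    "Circus and/or street arts": 85,
    "Dance": 127,
    "Multidisciplinary within the performing arts (music excluded)": 122,
    "Performance": 120,
    "Theatre or music theatre": 131,
}

shares = {d: facebook_counts[d] / survey_counts[d] for d in survey_counts}
for discipline, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{discipline}: {share:.1%}")
```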


### How many respondents use Facebook Events per producer/presenter type?

In [None]:
full_data.reset_index().groupby("core_activity").agg({"respondent_id": "nunique"})

Unnamed: 0_level_0,respondent_id
core_activity,Unnamed: 1_level_1
presentation,287
production,1447


In [None]:
fb_and_survey.groupby("core_activity").agg({"respondent_id": "nunique"})

Unnamed: 0_level_0,respondent_id
core_activity,Unnamed: 1_level_1
presentation,83
production,187


### How many respondents use Facebook Events per country and per macroregion?

In [None]:
full_data.explode("based_in_country").reset_index().groupby("based_in_country").agg({"respondent_id": "nunique"}).sort_values(by="respondent_id", ascending=False)

Unnamed: 0_level_0,respondent_id
based_in_country,Unnamed: 1_level_1
Spain,190
Germany,189
France,175
Italy,167
Portugal,137
United Kingdom,133
Greece,102
Belgium,93
Sweden,76
Iceland,70


In [None]:
fb_and_survey.explode("based_in_country").groupby("based_in_country").agg({"respondent_id": "nunique"}).sort_values(by="respondent_id", ascending=False)

Unnamed: 0_level_0,respondent_id
based_in_country,Unnamed: 1_level_1
Spain,33
Italy,31
France,27
United Kingdom,22
Portugal,20
Germany,19
Denmark,14
Belgium,14
Greece,14
Netherlands,12


In [None]:
fb_and_survey.explode("based_in_macroregion").groupby("based_in_macroregion").agg({"respondent_id": "nunique"}).sort_values(by="respondent_id", ascending=False)

Unnamed: 0_level_0,respondent_id
based_in_macroregion,Unnamed: 1_level_1
Western Europe,102
Southern Europe,100
Northern Europe,46
Eastern Europe,28
Balkans,7
Eastern Partnership + Tunisia,7


## Q16 data
Based on the answers to Q16 in the survey, we get an idea of how many respondents are present in which countries and macroregions. For this, we focus on respondents whose core activity is "production".

In [None]:
# One row per (respondent, based-in country, shown-in country) combination
fd = full_data.reset_index().explode("based_in_country")
fd["based_in_macroregion"] = fd["based_in_country"].map(macroregions_mapping)
fd = fd.explode("shown_in_country")
fd = fd[~isnull(fd["shown_in_country"])]
fd["shown_in_macroregion"] = fd["shown_in_country"].map(macroregions_mapping)
# Note: the flags compare against the joined *_str columns, so a respondent based in several countries is never flagged as within_border
fd["cross_border_other_macroregion"] = (fd["based_in_country_str"] != fd["shown_in_country"]) & (fd["based_in_macroregion_str"] != fd["shown_in_macroregion"])
fd["cross_border_same_macroregion"] = (fd["based_in_country_str"] != fd["shown_in_country"]) & (fd["based_in_macroregion_str"] == fd["shown_in_macroregion"])
fd["within_border"] = (fd["based_in_country_str"] == fd["shown_in_country"])
# Keep producers only
fd = fd[fd["core_activity"] == "production"]
q16 = fd[["respondent_id", "based_in_country", "based_in_macroregion", "shown_in_country", "shown_in_macroregion", "within_border", "cross_border_same_macroregion", "cross_border_other_macroregion", "core_activity"]].drop_duplicates()
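
The three flags are mutually exclusive: every pair of a based-in country and a shown-in country falls into exactly one category. As a sketch, the rule for a single pair of countries can be written as a small function. The function name is ours, and the mapping below is a three-country excerpt of `macroregions_mapping`.

```python
def classify_presence(based_country, shown_country, macroregion_of):
    """Classify where work is shown relative to where the respondent is based."""
    if based_country == shown_country:
        return "within_border"
    if macroregion_of[based_country] == macroregion_of[shown_country]:
        return "cross_border_same_macroregion"
    return "cross_border_other_macroregion"


# Excerpt of macroregions_mapping, for illustration.
macroregion_of = {
    "Belgium": "Western Europe",
    "France": "Western Europe",
    "Finland": "Northern Europe",
}

print(classify_presence("Belgium", "Belgium", macroregion_of))
print(classify_presence("Belgium", "France", macroregion_of))
print(classify_presence("Belgium", "Finland", macroregion_of))
```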

### Sankey data

A first analysis is simply to chart the "flow" of producers that are based in a certain macroregion and indicate that they are also present in another macroregion.

In [None]:
q16.groupby(["based_in_macroregion", "shown_in_macroregion"]).agg({"respondent_id": "nunique"}).to_csv()

'based_in_macroregion,shown_in_macroregion,respondent_id\nBalkans,Balkans,37\nBalkans,Countries outside Creative Europe,14\nBalkans,Eastern Europe,25\nBalkans,Eastern Partnership + Tunisia,3\nBalkans,Northern Europe,8\nBalkans,Southern Europe,24\nBalkans,Western Europe,26\nEastern Europe,Balkans,51\nEastern Europe,Countries outside Creative Europe,64\nEastern Europe,Eastern Europe,133\nEastern Europe,Eastern Partnership + Tunisia,25\nEastern Europe,Northern Europe,59\nEastern Europe,Southern Europe,79\nEastern Europe,Western Europe,122\nEastern Partnership + Tunisia,Balkans,1\nEastern Partnership + Tunisia,Countries outside Creative Europe,11\nEastern Partnership + Tunisia,Eastern Europe,14\nEastern Partnership + Tunisia,Eastern Partnership + Tunisia,23\nEastern Partnership + Tunisia,Northern Europe,8\nEastern Partnership + Tunisia,Southern Europe,7\nEastern Partnership + Tunisia,Western Europe,19\nNorthern Europe,Balkans,22\nNorthern Europe,Countries outside Creative Europe,125\nNorth
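
A CSV like this can be turned into the source/target/value layout that Sankey plotting tools (for example Plotly's Sankey trace) typically expect. The sketch below does this with the standard library only, using the first three rows of the output above. The function name and the "(based)" and "(shown)" node suffixes are our own choices, made so that a flow such as Balkans to Balkans does not become a self-loop.

```python
import csv
import io

# First rows of the CSV output above.
csv_text = (
    "based_in_macroregion,shown_in_macroregion,respondent_id\n"
    "Balkans,Balkans,37\n"
    "Balkans,Countries outside Creative Europe,14\n"
    "Balkans,Eastern Europe,25\n"
)


def to_sankey(text):
    """Convert the flow CSV into node labels and source/target/value link lists."""
    rows = list(csv.DictReader(io.StringIO(text)))
    based = sorted({r["based_in_macroregion"] for r in rows})
    shown = sorted({r["shown_in_macroregion"] for r in rows})
    # Left-hand nodes come first, right-hand nodes follow.
    nodes = [f"{name} (based)" for name in based] + [f"{name} (shown)" for name in shown]
    links = {
        "source": [based.index(r["based_in_macroregion"]) for r in rows],
        "target": [len(based) + shown.index(r["shown_in_macroregion"]) for r in rows],
        "value": [int(r["respondent_id"]) for r in rows],
    }
    return nodes, links


nodes, links = to_sankey(csv_text)
print(nodes)
print(links)
```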

### Perspective Audience

From the perspective of an audience, we can ask how many of the producing respondents this audience can be exposed to in its own macroregion.

In [None]:
perspective_audience_cbsm = q16.groupby(["shown_in_macroregion", "cross_border_same_macroregion"]).respondent_id.nunique()
perspective_audience_cbom = q16.groupby(["shown_in_macroregion", "cross_border_other_macroregion"]).agg({"respondent_id": "nunique"})
perspective_audience_wb = q16.groupby(["shown_in_macroregion", "within_border"]).agg({"respondent_id": "nunique"})
amount_of_producers_shown = q16.groupby(["shown_in_macroregion"]).agg({"respondent_id": "nunique"})
amount_of_producers_based = q16.groupby(["based_in_macroregion"]).agg({"respondent_id": "nunique"})
# remove the unnecessary False data
perspective_audience_cbsm = perspective_audience_cbsm[perspective_audience_cbsm.index.get_level_values("cross_border_same_macroregion") == True].droplevel(1)
perspective_audience_cbom = perspective_audience_cbom[perspective_audience_cbom.index.get_level_values("cross_border_other_macroregion") == True].droplevel(1)
perspective_audience_wb = perspective_audience_wb[perspective_audience_wb.index.get_level_values("within_border") == True].droplevel(1)
# join
perspective_audience = concat([perspective_audience_wb, perspective_audience_cbsm, perspective_audience_cbom, amount_of_producers_shown, amount_of_producers_based], axis=1, join="inner")
perspective_audience.columns = ["within_border", "cross_border_same_macroregion", "cross_border_other_macroregion", "amount of producers shown in this region", "amount of producers based in this region"]
perspective_audience

macroregion,within_border,cross_border_same_macroregion,cross_border_other_macroregion,amount of producers shown in this region,amount of producers based in this region
Balkans,25,25,148,177,38
Eastern Europe,80,104,485,601,155
Eastern Partnership + Tunisia,15,15,110,130,31
Northern Europe,123,156,393,567,224
Southern Europe,173,196,460,698,335
Western Europe,227,303,594,920,430
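The cell above repeats the same pattern three times: group by region and flag, count unique respondents, then keep only the `True` rows. As a sketch of an equivalent formulation, the hypothetical helper `unique_ids_where` below filters on the flag first and groups afterwards. It assumes the flag column has boolean dtype, and the toy data is invented for illustration.

```python
import pandas as pd


def unique_ids_where(df, group_col, flag_col, id_col="respondent_id"):
    """Count unique ids per group, using only rows where the boolean flag is True."""
    flagged = df.loc[df[flag_col]]
    return flagged.groupby(group_col)[id_col].nunique().rename(flag_col)


# Invented toy data in the same shape as q16.
toy = pd.DataFrame({
    "respondent_id": [1, 1, 2, 3, 3],
    "shown_in_macroregion": ["Balkans", "Western Europe", "Balkans", "Balkans", "Balkans"],
    "within_border": [True, False, False, True, True],
})

within = unique_ids_where(toy, "shown_in_macroregion", "within_border")
```

As in the original cell, a region without any `True` row for a flag does not appear in the result at all.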


### Perspective producer

In [None]:
perspective_respondent_cbsm = q16.groupby(["based_in_macroregion", "cross_border_same_macroregion"]).agg({"respondent_id": "nunique"})
perspective_respondent_cbom = q16.groupby(["based_in_macroregion", "cross_border_other_macroregion"]).agg({"respondent_id": "nunique"})
perspective_respondent_wb = q16.groupby(["based_in_macroregion", "within_border"]).agg({"respondent_id": "nunique"})
amount_of_producers = q16.groupby(["based_in_macroregion"]).agg({"respondent_id": "nunique"})
# remove the unnecessary False data
perspective_respondent_cbsm = perspective_respondent_cbsm[perspective_respondent_cbsm.index.get_level_values("cross_border_same_macroregion") == True].droplevel(1)
perspective_respondent_cbom = perspective_respondent_cbom[perspective_respondent_cbom.index.get_level_values("cross_border_other_macroregion") == True].droplevel(1)
perspective_respondent_wb = perspective_respondent_wb[perspective_respondent_wb.index.get_level_values("within_border") == True].droplevel(1)
# join
perspective_respondent = concat([perspective_respondent_wb, perspective_respondent_cbsm, perspective_respondent_cbom, amount_of_producers], axis=1, join="inner")
perspective_respondent.columns = ["within_border", "cross_border_same_macroregion", "cross_border_other_macroregion", "amount of producers based in macroregion"]
perspective_respondent

based_in_macroregion,within_border,cross_border_same_macroregion,cross_border_other_macroregion,amount of producers based in macroregion
Balkans,25,25,34,38
Eastern Europe,80,104,143,155
Eastern Partnership + Tunisia,15,15,26,31
Northern Europe,123,156,201,224
Southern Europe,173,196,316,335
Western Europe,227,303,395,430
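Absolute counts are hard to compare between macroregions of different size. A minimal sketch of one way to normalise them is to divide each column by the number of producers based in the macroregion. The values are copied from the table above; the short column name `producers_based` is our own.

```python
import pandas as pd

# Values copied from the producer-perspective table above.
perspective = pd.DataFrame(
    {
        "within_border": [25, 80, 15, 123, 173, 227],
        "cross_border_same_macroregion": [25, 104, 15, 156, 196, 303],
        "cross_border_other_macroregion": [34, 143, 26, 201, 316, 395],
        "producers_based": [38, 155, 31, 224, 335, 430],
    },
    index=[
        "Balkans",
        "Eastern Europe",
        "Eastern Partnership + Tunisia",
        "Northern Europe",
        "Southern Europe",
        "Western Europe",
    ],
)

# Share of producers in each category, per macroregion where they are based.
shares = (
    perspective.drop(columns="producers_based")
    .div(perspective["producers_based"], axis=0)
    .round(2)
)
```

The shares in a row do not sum to 1, because a single producer can be counted in several categories at once.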


## Facebook data

We can reproduce the analyses above using the Facebook event data, counting unique events rather than unique respondents.

In [None]:
fd = fb_and_survey.reset_index().explode("based_in_country")
fd["based_in_macroregion"] = fd["based_in_country"].map(macroregions_mapping)
fd = fd[fd["core_activity"] == "production"] # only take producers
fd = fd[~isna(fd["fb_shown_in_country"])] # ignore event data without country information (~ negates a boolean mask; unary - is not supported on booleans in recent pandas)
fd["cross_border_other_macroregion"] = (fd["based_in_country"] != fd["fb_shown_in_country"]) & (fd["based_in_macroregion"] != fd["fb_shown_in_macroregion"])
fd["cross_border_same_macroregion"] = (fd["based_in_country"] != fd["fb_shown_in_country"]) & (fd["based_in_macroregion"] == fd["fb_shown_in_macroregion"])
fd["within_border"] = (fd["based_in_country"] == fd["fb_shown_in_country"])
fb = fd[["respondent_id", "platform_id", "based_in_country", "based_in_macroregion", "fb_shown_in_country", "fb_shown_in_macroregion", "within_border", "cross_border_same_macroregion", "cross_border_other_macroregion", "core_activity"]].drop_duplicates()
fb

Unnamed: 0,respondent_id,platform_id,based_in_country,based_in_macroregion,fb_shown_in_country,fb_shown_in_macroregion,within_border,cross_border_same_macroregion,cross_border_other_macroregion,core_activity
294,948,121767468503339,Belgium,Western Europe,United Kingdom,Western Europe,False,True,False,production
295,948,126190487930717,Belgium,Western Europe,Belgium,Western Europe,True,False,False,production
301,948,134699050804032,Belgium,Western Europe,Belgium,Western Europe,True,False,False,production
303,948,137664490252614,Belgium,Western Europe,Belgium,Western Europe,True,False,False,production
305,948,140128050050025,Belgium,Western Europe,Belgium,Western Europe,True,False,False,production
...,...,...,...,...,...,...,...,...,...,...
28745,2055,2250871828257839,Czech Republic,Eastern Europe,Germany,Western Europe,False,False,True,production
28750,1620,2308805572761077,Croatia,Eastern Europe,Croatia,Eastern Europe,True,False,False,production
28757,1620,2322746724696012,Croatia,Eastern Europe,Croatia,Eastern Europe,True,False,False,production
28758,1111,2834129026616155,Spain,Southern Europe,Spain,Southern Europe,True,False,False,production
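The preparation cell relies on three pandas steps: `explode` to give each home country its own row, `map` to look up the macroregion, and a null filter to drop events without a country. The sketch below shows those steps on invented data. `toy_events` and `toy_mapping` are hypothetical stand-ins for `fb_and_survey` and `macroregions_mapping`, and we assume that `based_in_country` holds lists.

```python
import pandas as pd

# Invented stand-ins for fb_and_survey and macroregions_mapping.
toy_events = pd.DataFrame({
    "platform_id": [101, 102],
    "based_in_country": [["Belgium", "Spain"], ["Croatia"]],
    "fb_shown_in_country": ["Belgium", None],
})
toy_mapping = {
    "Belgium": "Western Europe",
    "Spain": "Southern Europe",
    "Croatia": "Eastern Europe",
}

# One row per (event, home country) pair.
exploded = toy_events.explode("based_in_country")
exploded["based_in_macroregion"] = exploded["based_in_country"].map(toy_mapping)

# Drop events that carry no country information.
with_country = exploded[exploded["fb_shown_in_country"].notna()]
```

Event 101 now appears twice, once per home country. This is why the same `platform_id` can later be counted as both within-border and cross-border.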


### Sankey data
First, with all data:

In [None]:
fb.groupby(["based_in_macroregion", "fb_shown_in_macroregion"]).agg({"platform_id": "nunique"}).to_csv()

'based_in_macroregion,fb_shown_in_macroregion,platform_id\nBalkans,Balkans,166\nBalkans,Countries outside Creative Europe,3\nBalkans,Eastern Europe,4\nEastern Europe,Balkans,1\nEastern Europe,Eastern Europe,1230\nEastern Europe,Eastern Partnership + Tunisia,1\nEastern Europe,Northern Europe,2\nEastern Europe,Southern Europe,1\nEastern Europe,Western Europe,19\nEastern Partnership + Tunisia,Eastern Europe,1\nEastern Partnership + Tunisia,Eastern Partnership + Tunisia,372\nNorthern Europe,Balkans,1\nNorthern Europe,Countries outside Creative Europe,3\nNorthern Europe,Eastern Europe,4\nNorthern Europe,Northern Europe,912\nNorthern Europe,Southern Europe,3\nNorthern Europe,Western Europe,21\nSouthern Europe,Balkans,1\nSouthern Europe,Countries outside Creative Europe,28\nSouthern Europe,Eastern Europe,7\nSouthern Europe,Northern Europe,15\nSouthern Europe,Southern Europe,1215\nSouthern Europe,Western Europe,93\nWestern Europe,Balkans,1\nWestern Europe,Countries outside Creative Europe,68\n

What if we only take cross-border data?

In [None]:
fb[~fb["within_border"]].groupby(["based_in_macroregion", "fb_shown_in_macroregion"]).agg({"platform_id": "nunique"}).to_csv()

'based_in_macroregion,fb_shown_in_macroregion,platform_id\nBalkans,Balkans,60\nBalkans,Countries outside Creative Europe,3\nBalkans,Eastern Europe,4\nEastern Europe,Balkans,1\nEastern Europe,Eastern Europe,158\nEastern Europe,Eastern Partnership + Tunisia,1\nEastern Europe,Northern Europe,2\nEastern Europe,Southern Europe,1\nEastern Europe,Western Europe,19\nEastern Partnership + Tunisia,Eastern Europe,1\nNorthern Europe,Balkans,1\nNorthern Europe,Countries outside Creative Europe,3\nNorthern Europe,Eastern Europe,4\nNorthern Europe,Northern Europe,58\nNorthern Europe,Southern Europe,3\nNorthern Europe,Western Europe,21\nSouthern Europe,Balkans,1\nSouthern Europe,Countries outside Creative Europe,28\nSouthern Europe,Eastern Europe,7\nSouthern Europe,Northern Europe,15\nSouthern Europe,Southern Europe,80\nSouthern Europe,Western Europe,93\nWestern Europe,Balkans,1\nWestern Europe,Countries outside Creative Europe,68\nWestern Europe,Eastern Europe,12\nWestern Europe,Eastern Partnership +

### Perspective audience

In [None]:
perspective_audience_cbsm = fb.groupby(["fb_shown_in_macroregion", "cross_border_same_macroregion"]).agg({"platform_id": "nunique"})
perspective_audience_cbom = fb.groupby(["fb_shown_in_macroregion", "cross_border_other_macroregion"]).agg({"platform_id": "nunique"})
perspective_audience_wb = fb.groupby(["fb_shown_in_macroregion", "within_border"]).agg({"platform_id": "nunique"})
amount_of_events_shown = fb.groupby(["fb_shown_in_macroregion"]).agg({"platform_id": "nunique"})
# remove the unnecessary False data
perspective_audience_cbsm = perspective_audience_cbsm[perspective_audience_cbsm.index.get_level_values("cross_border_same_macroregion") == True].droplevel(1)
perspective_audience_cbom = perspective_audience_cbom[perspective_audience_cbom.index.get_level_values("cross_border_other_macroregion") == True].droplevel(1)
perspective_audience_wb = perspective_audience_wb[perspective_audience_wb.index.get_level_values("within_border") == True].droplevel(1)
# join
perspective_audience = concat([perspective_audience_wb, perspective_audience_cbsm, perspective_audience_cbom, amount_of_events_shown], axis=1, join="inner")
perspective_audience.columns = ["within_border", "cross_border_same_macroregion", "cross_border_other_macroregion", "amount of events shown in this region"]
perspective_audience

fb_shown_in_macroregion,within_border,cross_border_same_macroregion,cross_border_other_macroregion,amount of events shown in this region
Balkans,106,60,2,168
Eastern Europe,1073,158,26,1250
Northern Europe,896,58,46,933
Southern Europe,1135,80,28,1231
Western Europe,1789,88,132,1941
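"Eastern Partnership + Tunisia" appears in the Sankey data above but not in this table. A likely explanation is the `join="inner"` argument of `concat`, which keeps only index labels present in every input. The sketch below reproduces the effect with a few numbers taken from the outputs above and shows `join="outer"` with zero-filling as one possible alternative. This is an illustration, not a change to the original analysis.

```python
import pandas as pd

# Numbers taken from the outputs above; the region without cross-border
# events inside its own macroregion is missing from the second series.
within = pd.Series(
    {"Balkans": 106, "Eastern Partnership + Tunisia": 372},
    name="within_border",
)
same_region = pd.Series({"Balkans": 60}, name="cross_border_same_macroregion")

# Inner join: only regions present in both series survive.
inner = pd.concat([within, same_region], axis=1, join="inner")

# Outer join: all regions are kept, missing counts become 0.
outer = pd.concat([within, same_region], axis=1, join="outer").fillna(0).astype(int)
```

With the outer join, a region that has no events in one category stays visible with a count of 0 instead of disappearing from the table.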


### Perspective producer

In [None]:
perspective_respondent_cbsm = fb.groupby(["based_in_macroregion", "cross_border_same_macroregion"]).agg({"platform_id": "nunique"})
perspective_respondent_cbom = fb.groupby(["based_in_macroregion", "cross_border_other_macroregion"]).agg({"platform_id": "nunique"})
perspective_respondent_wb = fb.groupby(["based_in_macroregion", "within_border"]).agg({"platform_id": "nunique"})
amount_of_events = fb.groupby(["based_in_macroregion"]).agg({"platform_id": "nunique"})
# remove the unnecessary False data
perspective_respondent_cbsm = perspective_respondent_cbsm[perspective_respondent_cbsm.index.get_level_values("cross_border_same_macroregion") == True].droplevel(1)
perspective_respondent_cbom = perspective_respondent_cbom[perspective_respondent_cbom.index.get_level_values("cross_border_other_macroregion") == True].droplevel(1)
perspective_respondent_wb = perspective_respondent_wb[perspective_respondent_wb.index.get_level_values("within_border") == True].droplevel(1)
# join
perspective_respondent = concat([perspective_respondent_wb, perspective_respondent_cbsm, perspective_respondent_cbom, amount_of_events], axis=1, join="inner")
perspective_respondent.columns = ["within_border", "cross_border_same_macroregion", "cross_border_other_macroregion", "events_by_producers"]
perspective_respondent

based_in_macroregion,within_border,cross_border_same_macroregion,cross_border_other_macroregion,events_by_producers
Balkans,106,60,7,173
Eastern Europe,1073,158,24,1254
Northern Europe,896,58,32,944
Southern Europe,1135,80,144,1359
Western Europe,1789,88,150,2015
