## Welcome to this project, which demonstrates how data visualization can be used for exploratory analysis.
In this project, I use a custom-built dataset created by combining open data, web scraping, and APIs.
The aim is to explore the collections of contemporary art museums to identify patterns and insights.
I analyze the collection data from 11 museums to examine what this data can reveal about art institutions, without focusing specifically on the artworks themselves.

# 0. Load data, libraries and custom functions

In [1]:
import pandas as pd
from scripts.load_data import load_data
from scripts.plot_medium_over_years import *
from scripts.plot_collection_size import plot_collection_size
from scripts.plot_medium_pie import plot_medium_pie
from scripts.plot_acquisition_method import plot_acquisition_method
from scripts.plot_medium_bar import plot_medium_bar
from scripts.plot_acquisition_over_time import plot_acquisition_over_time
from scripts.plot_origin_countries import plot_origin_countries
from scripts.plot_creation_distribution import plot_creation_distribution
#folder_path = '/Users/antoninalightfoot/Documents/GitHub/data_analysis_portfolio/data' 
folder_path = '/Users/CUDAN/Downloads/data_analysis_portfolio/data' #uni macbook
museums_data, museum_names = load_data(folder_path, filter_data=False)

# 1. How many artworks does each museum hold in its modern and contemporary art collection (from 1860)?

In [2]:
plot_collection_size(museums_data, museum_names, log_scale=True)

The graph shows that holding a large contemporary art collection does not make a museum "contemporary", and that this distinction does not correlate with the size of the institution.
At one end of the scale are MET and National Gallery (US), whose contemporary collections are almost as large as their classic collections.
At the other end are Kiasma and Whitney, whose collections are entirely modern and contemporary. Reina Sofia and Pompidou sit close to this end. MOMA, despite its large collection, holds only around 1 percent classic art relative to modern/contemporary art.
To focus on modern/contemporary art in the collections, the graphs that follow use the filter.
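The comparison above can be reproduced numerically. The following is a minimal sketch, not the code behind `plot_collection_size`: it assumes that "modern and contemporary" means a `Date_creation_year` of 1860 or later, and it runs on toy data rather than the real collections. The function name `modern_share` is hypothetical.

```python
import pandas as pd

CUTOFF_YEAR = 1860  # assumption: works created from 1860 onward count as modern/contemporary


def modern_share(museums_data, museum_names, cutoff=CUTOFF_YEAR):
    """Return total size, modern/contemporary count, and share for each museum."""
    rows = []
    for name, df in zip(museum_names, museums_data):
        # Non-numeric or missing years become NaN and are not counted as modern
        years = pd.to_numeric(df["Date_creation_year"], errors="coerce")
        total = len(df)
        modern = int((years >= cutoff).sum())
        share = modern / total if total else float("nan")
        rows.append({"museum": name, "total": total, "modern": modern, "share": share})
    return pd.DataFrame(rows).set_index("museum")


# Toy data for illustration only, not the real collections
toy_data = [
    pd.DataFrame({"Date_creation_year": [1650, 1790, 1905, 1988]}),
    pd.DataFrame({"Date_creation_year": [1962, 1975, 2004, None]}),
]
toy_names = ["Museum A", "Museum B"]

shares = modern_share(toy_data, toy_names)
print(shares)
```

With the real data, the call would be `modern_share(museums_data, museum_names)`. Note that artworks with a missing creation year lower the share in this sketch, because they stay in the total but are never counted as modern.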

# 2. What is the dominant artwork medium in every museum?

In [20]:
plot_medium_pie(museums_data, museum_names)

The museums differ in the media they collect. Graphics dominates in many of them (National Gallery, Queensland, Ateneum, MOMA, MET, Tate, SMK, Whitney).
Reina Sofia stands out for having the most balanced collection.
Kiasma and Pompidou stand out for the dominance of painting and photography, respectively. Most striking is that the percentage of new media art, installations, and video art (all on the rise after the 1960s) is not significant: the collections consist mostly of traditional art media. Did the museums' acquisition policies change over the years? Was there an era of video art for museums?
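The pie charts can be backed by a small table of the dominant medium per museum. This is a sketch under assumptions: it uses the `Medium_classified` column, the helper names `medium_shares` and `dominant_medium` are hypothetical, and the data is a toy example rather than the real collections.

```python
import pandas as pd


def medium_shares(df, column="Medium_classified"):
    """Return the percentage of artworks per medium, largest first (NaN ignored)."""
    return (df[column].value_counts(normalize=True) * 100).round(1)


def dominant_medium(museums_data, museum_names):
    """Return the most common medium and its percentage share for each museum."""
    rows = {}
    for name, df in zip(museum_names, museums_data):
        shares = medium_shares(df)
        rows[name] = {"dominant_medium": shares.index[0], "share_pct": shares.iloc[0]}
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data for illustration only, not the real collections
toy_data = [
    pd.DataFrame({"Medium_classified": ["Graphics", "Graphics", "Graphics", "Painting"]}),
    pd.DataFrame({"Medium_classified": ["Photography", "Photography", "Video",
                                        "Painting", "Photography"]}),
]
toy_names = ["Museum A", "Museum B"]

summary = dominant_medium(toy_data, toy_names)
print(summary)
```

A table like this makes it easier to check a claim such as "graphics is dominant" against exact percentages instead of reading them off the pie slices.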

# 3. How did the acquisition of artwork media change over time?

In [2]:
from scripts.plot_medium_over_years2 import plot_medium_over_years2, PlotConfig  
plot_medium_over_years2(museums_data, museum_names)
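The internals of `plot_medium_over_years2` are not shown here, so the following is only a sketch of the kind of aggregation such a plot could rest on: counting acquisitions per decade and medium. It assumes the `Year_acquisition` and `Medium_classified` columns; the function name `medium_by_decade` and the toy data are hypothetical.

```python
import pandas as pd


def medium_by_decade(df, year_col="Year_acquisition", medium_col="Medium_classified"):
    """Count acquired artworks per decade (rows) and medium (columns)."""
    data = df[[year_col, medium_col]].copy()
    data[year_col] = pd.to_numeric(data[year_col], errors="coerce")
    # Rows without a usable year or medium cannot be placed in the table
    data = data.dropna(subset=[year_col, medium_col])
    data["decade"] = (data[year_col] // 10 * 10).astype(int)
    return pd.crosstab(data["decade"], data[medium_col])


# Toy data for illustration only, not a real collection
toy = pd.DataFrame({
    "Year_acquisition": [1955, 1958, 1972, 1979, 1981, None],
    "Medium_classified": ["Painting", "Graphics", "Video", "Painting", "Video", "Video"],
})

table = medium_by_decade(toy)
print(table)
```

A rising count in the video column across decades would be one way to look for the "era of video art" asked about in the previous section.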





# 4. How does each museum acquire artworks?

In [6]:
plot_acquisition_method(museums_data, museum_names)

# 5. How did museums acquire artworks over time?

In [7]:
museums_data[0].columns

Index(['Artist', 'Title', 'Medium', 'Medium_classified',
       'Acquistion_classified', 'Year_acquisition', 'Gender_classified',
       'Artist_birth_year', 'Artist_death_year', 'Country_calculated',
       'Date_creation_year'],
      dtype='object')

In [8]:
plot_acquisition_over_time(museums_data, museum_names)

# 6. Which countries do the artists in the collections come from? Which countries do museums collect?

In [9]:
fig, world_merged = plot_origin_countries(museums_data, museum_names)

Countries with no mentions in any dataset:
{'Vanuatu', 'Russia', 'Niger', 'Oman', 'Malawi', 'Eritrea', 'Ukraine', 'Somalia', 'Czechia', 'N. Cyprus', 'Cyprus', 'Botswana', 'Belarus', 'Turkmenistan', 'Eswatini', 'S. Sudan', 'Bahamas', 'W. Sahara', 'Bosnia and Herz.', 'Solomon Is.', 'Central African Rep.', 'Tajikistan', 'Djibouti', 'Qatar', 'Moldova', 'Jamaica', 'Antarctica', 'New Caledonia', 'Eq. Guinea', 'North Korea', 'Somaliland', 'South Korea', 'Brunei', 'Libya', 'Congo', 'United States of America', 'Burundi', 'Togo', 'Timor-Leste', 'Guinea-Bissau', 'Dominican Rep.', 'Honduras', 'Fr. S. Antarctic Lands', "Côte d'Ivoire", 'Gambia', 'Dem. Rep. Congo', 'Falkland Is.'}
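The list above contains 'United States of America', 'Russia', and 'South Korea', which may indicate a spelling mismatch between `Country_calculated` and the map's country names rather than a true absence of artists. The sketch below shows one way to check this by normalizing names before counting. The alias table is hypothetical: the actual spellings in the datasets are not shown in this notebook, and `count_mentions` and `unmatched_names` are illustrative helper names.

```python
import pandas as pd

# Hypothetical aliases (dataset spelling -> map spelling); the real spellings
# in Country_calculated are not shown in this notebook, so these are assumptions.
COUNTRY_ALIASES = {
    "United States": "United States of America",
    "USA": "United States of America",
    "Russian Federation": "Russia",
    "Czech Republic": "Czechia",
    "Korea, Republic of": "South Korea",
}


def count_mentions(museums_data, column="Country_calculated", aliases=COUNTRY_ALIASES):
    """Count artworks per country across all museums after normalizing names."""
    countries = pd.concat([df[column] for df in museums_data], ignore_index=True)
    countries = countries.dropna().str.strip()
    countries = countries.map(lambda name: aliases.get(name, name))
    return countries.value_counts()


def unmatched_names(mentions, map_names):
    """Return dataset country names that have no counterpart on the map."""
    return sorted(set(mentions.index) - set(map_names))


# Toy data for illustration only, not the real collections
toy_data = [
    pd.DataFrame({"Country_calculated": ["United States", "France", "USA", "Ivory Coast"]}),
    pd.DataFrame({"Country_calculated": ["Russian Federation", "France ", None]}),
]
toy_map_names = ["United States of America", "France", "Russia", "Czechia", "Côte d'Ivoire"]

mentions = count_mentions(toy_data)
leftover = unmatched_names(mentions, toy_map_names)
print(mentions)
print(leftover)
```

Any name returned by `unmatched_names` is a candidate for the alias table; in the toy data, 'Ivory Coast' is left over because it has no alias for the map's "Côte d'Ivoire".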


In [10]:
test = world_merged[['Country', 'Mentions']]

In [11]:
fig

# 7. How are artwork creation years distributed in each collection? Which time periods do the museums prioritize?

In [12]:
plot_creation_distribution(museums_data, museum_names)

# 8. Gender