# FOCUSED Project: OSPO adoption
As part of the [FOCUSED Collaboration project](https://github.com/JumpsuitWizard/FOCUSED-Collaboration), this notebook examines the adoption of Open Source Program Offices (OSPOs) among the companies in the [Standard and Poor's 500 index](https://en.wikipedia.org/wiki/S%26P_500).

## Authors

- **PI**: Duane O'Brien
- **Researcher**: julia ferraioli
- **Analyst**: Reshama Shaikh

## Research question

## Methodology

## Data sources

The following data sources are used in the analysis:

- [S&P 500](https://github.com/datasets/s-and-p-500-companies/blob/master/data/constituents.csv) retrieved on 2021-10-05
- [OSCI Index](https://opensourceindex.io/) retrieved on 2022-02-28
- [OSPO Landscape](https://landscape.todogroup.org/) retrieved on 2022-05-09

## Visualization setup

_Make sure you have run through the [Ingestion notebook](Ingestion.ipynb) first!_

In [35]:
import pandas as pd
import plotly.express as px
import numpy as np

# Load the merged data produced by the Ingestion notebook
data = pd.read_csv('data_derived/merged_data.csv')

# Create tables for country and sector counts
country_counts = data.groupby(by=['country']).country.agg('count').to_frame('total').reset_index()
sector_counts = data.groupby(by=['sector']).sector.agg('count').to_frame('total').reset_index()

# Create an aggregate table
aggregates = pd.DataFrame({'categories': 
                           [
                               'in S&P 500', 'in OSPO landscape', 'in OSCI', 'in S&P and OSPO landscape',
                               'in S&P and OSCI', 'in OSPO landscape and OSCI', 'in all three'
                           ],
                          'count':
                           [
                               len(data[data['in S&P 500']]),
                               len(data[data['in OSPO landscape']]),
                               len(data[data['in OSCI']]),
                               len(data.query('`in S&P 500` & `in OSPO landscape`')),
                               len(data.query('`in S&P 500` & `in OSCI`')),
                               len(data.query('`in OSPO landscape` & `in OSCI`')),
                               len(data.query('`in S&P 500` & `in OSPO landscape` & `in OSCI`'))
                           ]
                          })

| | categories | count |
|---|---|---|
| 0 | in S&P 500 | 504 |
| 1 | in OSPO landscape | 102 |
| 2 | in OSCI | 299 |
| 3 | in S&P and OSPO landscape | 23 |
| 4 | in S&P and OSCI | 23 |
| 5 | in OSPO landscape and OSCI | 36 |
| 6 | in all three | 12 |
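
The overlap counts in the aggregate table come from combining boolean membership columns. As a minimal sketch of that idea, the example below counts rows where several flags are all `True`. The toy data and the `overlap_count` helper are hypothetical and are not part of the project's code; only the three column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical toy data; company names and flags are invented for illustration.
toy = pd.DataFrame({
    'company': ['A', 'B', 'C', 'D'],
    'in S&P 500': [True, True, False, True],
    'in OSPO landscape': [True, False, True, True],
    'in OSCI': [False, True, True, True],
})

def overlap_count(frame, *flags):
    """Count rows where every named boolean column is True."""
    return int(frame[list(flags)].all(axis=1).sum())

sp_and_ospo = overlap_count(toy, 'in S&P 500', 'in OSPO landscape')
all_three = overlap_count(toy, 'in S&P 500', 'in OSPO landscape', 'in OSCI')
print(sp_and_ospo, all_three)
```

A helper like this avoids the backtick quoting that `DataFrame.query` needs for column names containing spaces or `&`, at the cost of one extra function.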


## Look at the crossover between S&P, OSCI, and OSPO landscape

In [36]:
# Chart number of companies in each category and combination of categories
px.bar(aggregates, x = 'categories', y = 'count',
       title = "Number of companies in each category",
       labels = {'categories': 'dataset presence', 'count': '# of companies'}
      ).show()

# Chart number of companies by country
px.bar(country_counts, x = 'country', y = 'total',
      title = "Breakdown of companies by country",
      labels = {'total': '# of companies'}
      ).show()

# Chart number of companies by sector
px.bar(sector_counts, x = 'sector', y = 'total',
      title = "Breakdown of companies by sector",
      labels = {'total': '# of companies'}
      ).show()

## Examine data across various vectors

In [33]:
# Country x OSPO landscape
country_x_ospo = (data.groupby(by = ['country'], as_index = False)
 .agg({'in OSPO landscape': 'sum'}))
# Look up each country's total by name instead of relying on matching row order
country_totals = country_counts.set_index('country')['total']
country_x_ospo['not in OSPO landscape'] = (country_x_ospo['country'].map(country_totals)
                                           - country_x_ospo['in OSPO landscape'])

px.bar(country_x_ospo, x = 'country',
       y = ['in OSPO landscape',
            'not in OSPO landscape'
           ],
       title = "Country broken down by presence in OSPO landscape"
      ).show()

# Sector x OSPO landscape
sector_x_ospo = (data.groupby(by = ['sector'], as_index = False)
 .agg({'in OSPO landscape': 'sum'}))
# Look up each sector's total by name instead of relying on matching row order
sector_totals = sector_counts.set_index('sector')['total']
sector_x_ospo['not in OSPO landscape'] = (sector_x_ospo['sector'].map(sector_totals)
                                          - sector_x_ospo['in OSPO landscape'])

px.bar(sector_x_ospo, x = 'sector',
       y = ['in OSPO landscape',
            'not in OSPO landscape'
           ],
       title = "Sector broken down by presence in OSPO landscape"
      ).show()

# OSCI x OSPO landscape
tf_matrix = data.groupby(['in OSCI', 'in OSPO landscape']).size().unstack(fill_value = 0)

px.imshow(np.flip(tf_matrix.to_numpy(), 0), 
          labels = dict(x = "in OSPO landscape", y = "in OSCI", color = "# of companies"),
          x = ['False', 'True'],
          y = ['True', 'False'],
          title = "OSCI cross-referenced with OSPO landscape"
         ).show()
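
The heatmap above assumes the grouped matrix is always 2x2 and flips it so that `True` comes first on the y axis. As a sketch of an alternative, `pd.crosstab` followed by `reindex` can fix both the shape and the row order explicitly, even if one combination never occurs. The toy data below is invented for illustration, and this is not the notebook's own method.

```python
import pandas as pd

# Hypothetical toy data in which no company is missing from OSCI,
# so a plain groupby/unstack would produce only one row.
toy = pd.DataFrame({
    'in OSCI': [True, True, True],
    'in OSPO landscape': [True, False, True],
})

# Cross-tabulate, then force a 2x2 layout with rows ordered True, False
# and columns ordered False, True; absent combinations are filled with 0.
matrix = (pd.crosstab(toy['in OSCI'], toy['in OSPO landscape'])
          .reindex(index=[True, False], columns=[False, True], fill_value=0))
print(matrix)
```

Because the rows are already ordered `True`, `False`, `matrix.to_numpy()` could be passed to `px.imshow` with the same `x` and `y` labels as above, without the `np.flip` step.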