# Licence and access rights notebook

## Imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# %load imports.py
# Basic imports
import os
import sys

import matplotlib

sys.path.append('./')

# SQL database
import pymysql

# Classical external libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline

sns.set(style="darkgrid")

# Import py files for generic functions
from sql import *
from helpers import *


## Connection to DB

In [3]:
# Connect to the database
engine = db_engine()

## Load Useful Data

In [4]:
newspapers_df = read_table('impresso.newspapers', engine)
issues_df = read_table('impresso.issues', engine)
# Create a new decade column (e.g. 1927 -> 1920)
issues_df['decade'] = issues_df['year'] - issues_df['year'] % 10
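The `read_table` helper comes from the project's local `sql`/`helpers` modules, whose source is not shown in this notebook. As a rough illustration only, here is a minimal sketch that assumes the helper simply runs `SELECT *` on the given table through `pandas.read_sql`. The function body is hypothetical, and an in-memory SQLite database stands in for the impresso MySQL server.

```python
import sqlite3

import pandas as pd


def read_table(table_name, engine):
    """Hypothetical sketch of the project's `read_table` helper.

    Assumes the helper loads a whole table into a DataFrame; the real
    implementation may select columns or set dtypes differently.
    """
    # The table name is interpolated directly into the SQL string, so it must
    # come from trusted code, never from user input.
    return pd.read_sql(f"SELECT * FROM {table_name}", engine)


# Throwaway in-memory SQLite database instead of the impresso MySQL db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO issues VALUES (?, ?)",
    [("actionfem-1927-10-15-a", 1927), ("actionfem-1931-01-15-a", 1931)],
)

issues = read_table("issues", conn)
# Decade column computed with vectorised arithmetic on the year column.
issues["decade"] = issues["year"] - issues["year"] % 10
print(issues)
```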

In [27]:
newspapers_metadata_df = read_table('newspapers_metadata', engine)
meta_properties_df = read_table('meta_properties', engine)
PROPERTIES = meta_properties_df.name.unique()

## Number of issues by access right policy

In [18]:
issues_df.access_rights.unique()

array(['Closed', 'OpenPublic', 'OpenPrivate'], dtype=object)

In [12]:
issues_df.head(3)

| | id | year | month | day | edition | access_rights | created | last_modified | is_damaged | s3_version | newspaper_id | decade |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | actionfem-1927-10-15-a | 1927 | 10 | 15 | a | Closed | 2019-06-15 12:22:38 | NaT | 0 | | actionfem | 1920 |
| 1 | actionfem-1927-11-15-a | 1927 | 11 | 15 | a | Closed | 2019-06-15 12:22:38 | NaT | 0 | | actionfem | 1920 |
| 2 | actionfem-1927-12-15-a | 1927 | 12 | 15 | a | Closed | 2019-06-15 12:22:38 | NaT | 0 | | actionfem | 1920 |


In [23]:
count_issue_ar_df, _, _ = group_and_count(issues_df, ['access_rights'], 'id', False)

In [32]:
total_nb_issues = count_issue_ar_df['count'].sum()
count_issue_ar_df['rate'] = count_issue_ar_df['count']/total_nb_issues
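For readers without the `helpers` module, the counting and rate steps can be approximated with plain pandas. This is a sketch on a toy frame: it only assumes that `group_and_count` returns one row per `access_rights` value with a `count` column, and it does not try to reproduce the two other values the helper returns.

```python
import pandas as pd

# Toy stand-in for issues_df with only the two columns the count needs.
toy_issues = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "access_rights": ["Closed", "OpenPrivate", "OpenPrivate", "OpenPublic"],
})

# Number of issue ids per access-rights value, one row per value.
count_df = (
    toy_issues.groupby("access_rights")["id"]
    .count()
    .reset_index(name="count")
)
# Share of each policy in the total number of issues.
count_df["rate"] = count_df["count"] / count_df["count"].sum()
print(count_df)

# The same shares in a single call, as a Series indexed by access_rights.
shares = toy_issues["access_rights"].value_counts(normalize=True)
print(shares)
```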

In [33]:
count_issue_ar_df

| | access_rights | count | rate |
|---|---|---|---|
| 0 | Closed | 168685 | 0.38176 |
| 1 | OpenPrivate | 225703 | 0.510801 |
| 2 | OpenPublic | 47473 | 0.107439 |


On Oct. 15th, 2019, the impresso database contained 225'703 open-private issues, 168'685 closed issues and 47'473 open-public issues. In other words, about half of the issues (~51%) are open-private, around 38% are closed, and only about 11% are open-public.
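The percentages follow directly from the counts: each policy's count is divided by the total of 441'861 issues. The arithmetic, using the counts from the output table above:

```python
# Issue counts per access-rights policy, copied from the output table above.
counts = {"Closed": 168685, "OpenPrivate": 225703, "OpenPublic": 47473}

total = sum(counts.values())  # 441861 issues in total
rates = {policy: n / total for policy, n in counts.items()}

for policy, rate in rates.items():
    print(f"{policy}: {rate:.1%}")
```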

### Filter on property

#### By country

In [34]:
newspapers_select_ch = np_by_property(newspapers_metadata_df, meta_properties_df, 'countryCode', 'CH')
issues_df_select_ch = filter_df_by_np_id(issues_df, newspapers_select_ch)

newspapers_select_lux = np_by_property(newspapers_metadata_df, meta_properties_df, 'countryCode', 'LU')
issues_df_select_lux = filter_df_by_np_id(issues_df, newspapers_select_lux)
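`np_by_property` and `filter_df_by_np_id` are also project helpers whose code is not shown. The sketch below is one plausible implementation, assuming (hypothetically) that `meta_properties` has `id` and `name` columns and that `newspapers_metadata` stores one `(newspaper_id, property_id, value)` row per property. The real impresso schema may differ, and the newspaper ids are made up.

```python
import pandas as pd


def np_by_property(np_metadata_df, properties_df, property_name, value):
    """Hypothetical sketch: ids of newspapers whose `property_name` equals `value`.

    Assumes properties_df has columns (id, name) and np_metadata_df has
    columns (newspaper_id, property_id, value).
    """
    prop_ids = properties_df.loc[properties_df["name"] == property_name, "id"]
    mask = np_metadata_df["property_id"].isin(prop_ids) & (np_metadata_df["value"] == value)
    return np_metadata_df.loc[mask, "newspaper_id"].unique()


def filter_df_by_np_id(df, newspaper_ids):
    """Hypothetical sketch: keep only rows whose newspaper_id is in newspaper_ids."""
    return df[df["newspaper_id"].isin(newspaper_ids)]


# Toy data following the assumed schema.
props = pd.DataFrame({"id": [1, 2], "name": ["countryCode", "provinceCode"]})
meta = pd.DataFrame({
    "newspaper_id": ["np_ch", "np_lu", "np_ch"],
    "property_id": [1, 1, 2],
    "value": ["CH", "LU", "VD"],
})
issues = pd.DataFrame({
    "id": ["np_ch-1900-01-01-a", "np_lu-1900-01-01-a"],
    "newspaper_id": ["np_ch", "np_lu"],
})

ch_ids = np_by_property(meta, props, "countryCode", "CH")
print(filter_df_by_np_id(issues, ch_ids))
```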

In [50]:
count_issue_ar_ch_df, _, _ = group_and_count(issues_df_select_ch, ['access_rights'], 'id', False)
count_issue_ar_lux_df, _, _ = group_and_count(issues_df_select_lux, ['access_rights'], 'id', False)

In [52]:
total_nb_issues_ch = count_issue_ar_ch_df['count'].sum()
total_nb_issues_lux = count_issue_ar_lux_df['count'].sum()

In [53]:
total_nb_issues_ch_lux = total_nb_issues_ch+total_nb_issues_lux

In [54]:
count_issue_ar_ch_df['ch_rate'] = count_issue_ar_ch_df['count']/total_nb_issues_ch

In [55]:
count_issue_ar_lux_df['lux_rate'] = count_issue_ar_lux_df['count']/total_nb_issues_lux
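Computing each country separately produces two tables whose categories do not line up, since Luxembourg has no `OpenPrivate` row. An alternative, sketched here on toy data, is a single `pd.crosstab` with `normalize="index"`. It assumes a `country` column has first been attached to each issue (for instance by merging on `newspaper_id`); that column is not present in `issues_df` as loaded above.

```python
import pandas as pd

# Toy issues with an assumed country column attached.
toy = pd.DataFrame({
    "country": ["CH", "CH", "CH", "CH", "LU", "LU"],
    "access_rights": ["Closed", "OpenPrivate", "OpenPrivate", "OpenPublic",
                      "Closed", "OpenPublic"],
})

# Rows are countries, columns are access rights; each row sums to 1, and
# combinations that never occur (LU / OpenPrivate) appear as 0.
rates = pd.crosstab(toy["country"], toy["access_rights"], normalize="index")
print(rates)
```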

In [56]:
count_issue_ar_ch_df

| | access_rights | count | ch_rate |
|---|---|---|---|
| 0 | Closed | 112676 | 0.316981 |
| 1 | OpenPrivate | 225611 | 0.634691 |
| 2 | OpenPublic | 17179 | 0.048328 |


In [57]:
count_issue_ar_lux_df

| | access_rights | count | lux_rate |
|---|---|---|---|
| 0 | Closed | 56009 | 0.653814 |
| 1 | OpenPublic | 29656 | 0.346186 |


Looking at the country level is revealing: the distribution of access rights differs sharply between the two countries.
- Switzerland: only about 5% of issues are open-public. Most issues (~63%) are open-private, and nearly all of the rest (~32%) are closed.
- Luxembourg: by contrast, no issues are open-private. About two thirds (~65%) of the issues are closed, and the remaining third (~35%) are open-public.
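To put the two countries side by side, the per-country tables can be merged on `access_rights`. The sketch below re-enters the counts from the two country outputs above; an outer merge keeps the `OpenPrivate` row, and `fillna(0)` makes Luxembourg's missing value explicit.

```python
import pandas as pd

# Per-country counts copied from the two output tables above.
ch = pd.DataFrame({
    "access_rights": ["Closed", "OpenPrivate", "OpenPublic"],
    "count": [112676, 225611, 17179],
})
lu = pd.DataFrame({
    "access_rights": ["Closed", "OpenPublic"],
    "count": [56009, 29656],
})

# Outer merge keeps OpenPrivate even though Luxembourg has no such issues.
both = ch.merge(lu, on="access_rights", how="outer", suffixes=("_ch", "_lu")).fillna(0)
for col in ("ch", "lu"):
    both[f"rate_{col}"] = both[f"count_{col}"] / both[f"count_{col}"].sum()
print(both.round(3))
```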

#### By newspaper

## Number of content items by access right policy, filtered by property