# The Meta Executor

This Jupyter notebook runs other notebooks.
When it runs them, it does NOT overwrite the contents of the notebook, only its outputs.
Each of these notebooks saves databases, tables, or figures.
So, hypothetically, you could run all analyses without seeing the notebooks that ran them, and peruse the results in the `all_figures.ipynb` or `all_tables.ipynb` notebooks.
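
The run-in-place pattern used throughout this notebook can be sketched as a small helper. This is only an illustration under stated assumptions: `run_in_place` and its `execute` argument are hypothetical names introduced here, and the notebook path in the dry run is made up. By default the helper calls `papermill.execute_notebook` with the same input and output path, as the cells below do.

```python
from pathlib import Path


def run_in_place(notebook, settings, execute=None):
    """Run `notebook` once per parameter dict, writing outputs back to the same file.

    `execute` is a hypothetical hook (not part of papermill) that lets the
    helper be dry-run without executing anything.
    """
    if execute is None:
        # Imported lazily so the dry run below works without papermill installed.
        import papermill as pm
        execute = pm.execute_notebook
    path = str(Path(notebook))
    for params in settings:
        # Same path for input and output: cell sources stay as they are,
        # only the stored outputs are refreshed.
        execute(path, path, parameters=params, nest_asyncio=True)
    return len(settings)


# Dry run with a recording stand-in instead of papermill (made-up path).
calls = []
run_in_place(
    "analyses/example.ipynb",
    [{"dtype": "c"}, {"dtype": "t"}],
    execute=lambda inp, out, **kw: calls.append((inp, out, kw["parameters"])),
)
print(calls)
```

Because every run writes to the same file, only the outputs of the last setting remain visible in the executed notebook; the databases, tables, and figures saved along the way are what persist across settings.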

This notebook is also where I have centralized high-level documentation.
To edit documentation, see the `documentation.yaml` file.

# todos

+ fix the term consolidation. I want the top 30000 "terms" -- cooccurrences between words 
    + in terms of frequency
    + in terms of chi2 non-independence with central citations
    + in terms of chi2 with each other (use cortext)
+ export demographic tables
+ journal trends figure
+ export figure with different kinds of deaths, consolidate multiple figures into "bigdeaths"

In [1]:
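# Make the directory containing the `knowknow` package importable.
# `_dh[0]` is the directory in which this IPython session started.
import sys; sys.path.append(_dh[0].split("knowknow")[0])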
from knowknow import *

In [2]:
import papermill as pm

In [3]:
if True:
    import logging
    logging.basicConfig()
    logging.getLogger().setLevel(logging.INFO)

In [4]:
showdocs("counter")

# Counting cooccurrences

Cultural phenomena are rich in meaning and context. Moreover, the meaning and context are what we care about, so stripping them away would be a disservice. Consider Geertz:
> Not only is the semantic structure of the figure a good deal more complex than it appears on the surface, but an analysis of that structure forces one into tracing a multiplicity of referential connections between it and social reality, so that the final picture is one of a configuration of dissimilar meanings out of whose interworking both the expressive power and the rhetorical force of the final symbol derive. (Geertz [1955] 1973, Chapter 8 Ideology as a Cultural System, p. 213)

The ways people understand their world shape their actions, and understandings are heterogeneous in any community, woven into a complex web of interacting pieces and parts. Understandings are constantly evolving, shifting with every conversation or Breaking News. Any quantitative technique for studying meaning must be able to capture the relational structure of cultural objects and their temporal dynamics, or it cannot capture meaning.

These considerations motivate how I have designed the data structure and code for this project. My attention to "cooccurrences" in what follows is an application of Levi Martin and Lee's (2018) formal approach to meaning. They develop the symbolic formalism I use below, and show several general analytic strategies for inductive, ground-up meaning-making from count data. This approach is quite general and useful for many applications.

The process is rather simple: I count cooccurrences between various attributes. For each document, for each citation in that document, I increment a dozen counters, depending on attributes of the citation, paper, journal, or author. This counting is done once, and the resulting counts serve as a compressed form of the dataset for all further analyses. In the terminology of Levi Martin and Lee, I am constructing "hypergraphs", and I will use their notation in what follows. For example, $[c*fy]$ indicates the dataset which maps $(c, fy) \to count$.
$c$ is the name of the cited work. $fy$ is the publication year of the article which made the citation. $count$ is the number of citations which are at the intersection of these properties.

+ $[c]$ the number of citations each document receives
+ $[c*fj]$ the number of citations each document receives from each journal's articles
+ $[c*fy]$ the number of citations each document receives from each year's articles
+ $[fj]$ the number of citations from each journal
+ $[fj*fy]$ the number of citations in each journal in each year
+ $[t]$ cited term total counts
+ $[fy*t]$ cited term time series
+ term cooccurrence with citation and journal ($[c*t]$ and $[fj*t]$)
+ "author" counts, the number of citations by each author ($[a]$, $[a*c]$, $[a*j*y]$)
+ $[c*c]$, the cooccurrence network between citations
+ the death of citations can be studied using the $[c*fy]$ hypergraph
+ $[c*fj*t]$ could be used for analyzing differential associations of $c$ to $t$ across publication venues
+ $[ta*ta]$, $[fa*fa]$, $[t*t]$ and $[c*c]$ open the door to network-scientific methods
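
The counting step described above can be sketched in a few lines. This is a minimal illustration, not the project's counter: the record layout (`fy`, `fj`, `citations`), the counter names, and the toy documents are all assumptions made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_cooccurrences(documents):
    """Build a handful of hypergraph counters from document records.

    Each record is assumed to be a dict with a publication year `fy`,
    a journal `fj`, and a list of cited works `citations`.
    """
    counts = defaultdict(Counter)
    for doc in documents:
        cited = doc["citations"]
        for c in cited:
            counts["c"][(c,)] += 1                        # [c]
            counts["c.fy"][(c, doc["fy"])] += 1           # [c*fy]
            counts["c.fj"][(c, doc["fj"])] += 1           # [c*fj]
            counts["fj"][(doc["fj"],)] += 1               # [fj]
            counts["fj.fy"][(doc["fj"], doc["fy"])] += 1  # [fj*fy]
        # [c*c]: unordered pairs of distinct works cited by the same document
        for c1, c2 in combinations(sorted(set(cited)), 2):
            counts["c.c"][(c1, c2)] += 1
    return counts


# Made-up toy records, for illustration only.
docs = [
    {"fy": 1975, "fj": "journal X", "citations": ["work A", "work B"]},
    {"fy": 1976, "fj": "journal Y", "citations": ["work A"]},
]
counts = count_cooccurrences(docs)
print(counts["c"][("work A",)], counts["c.fy"][("work A", 1975)])
```

Keying every counter by a tuple keeps lookups uniform across hypergraphs of different arity, so $[c]$ and $[c*fy]$ are read the same way.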



# References



+ Martin, John Levi, and Monica Lee. 2018. “A Formal Approach to Meaning.” Poetics 68(February):10–17.
+ Geertz, Clifford. 1973. The Interpretation of Cultures. New York: Basic Books, Inc.

In [None]:
# JSTOR counter

if False:
    jstor = Path(_dh[0]).joinpath('creating variables','jstor counter (cnt).ipynb')
    pm.execute_notebook(
        str(jstor),
        str(jstor),
        parameters = {},
        nest_asyncio=True
    )

In [None]:
# WOS counter

if False:
    wos = Path(_dh[0]).joinpath('creating variables','web of science counter (cnt).ipynb')
    pm.execute_notebook(
        str(wos),
        str(wos),
        parameters = {},
        nest_asyncio=True
    )

In [8]:
showdocs("top1")

# Zooming in on the top 1%

I would like to look at the most successful cited authors, cited works, and cited terms. Unfortunately, this isn't so simple. There has been a dramatic increase in the supply of citations over the last 100 years, so the group with the most total citations would be skewed towards the citation preferences of recent papers. In order to account for this bias, I choose among items cited by articles published in each decade: 1940-1950, 1941-1951, 1942-1952, all the way to 1980-1990. In each of these decades I determine which items were in the top-cited 1%. The union of these top 1%s, across all decade spans, comprises the 1% I will study in this paper.
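
The selection just described can be sketched as follows. This is a hedged illustration rather than the contents of the `top1` notebook: the function name, its parameters, the inclusive treatment of both window endpoints, and the toy counts are assumptions. The quantile and threshold lines mirror the ones visible in the traceback below.

```python
from collections import Counter

import numpy as np


def top_percent_by_decade(c_fy, first_start=1940, last_start=1980,
                          span=10, top_percentile=0.01):
    """Union of the top-cited items over sliding decade windows.

    `c_fy` maps (item, citing_year) -> count, i.e. a [c*fy] counter.
    """
    all_tops = set()
    for start in range(first_start, last_start + 1):
        end = start + span  # assumption: windows are inclusive, e.g. 1940-1950
        count_in_range = Counter()
        for (item, fy), count in c_fy.items():
            if start <= fy <= end:
                count_in_range[item] += count
        if not count_in_range:
            continue  # nothing was cited in this window
        cutoff = np.quantile(np.array(list(count_in_range.values())),
                             1 - top_percentile)
        all_tops.update(k for k, v in count_in_range.items() if v >= cutoff)
    return all_tops


# Made-up toy data: 200 items cited in 1945 and 100 items cited in 1985,
# where item i receives i citations.
c_fy = Counter()
for i in range(1, 201):
    c_fy[(f"early-{i}", 1945)] = i
for i in range(1, 101):
    c_fy[(f"late-{i}", 1985)] = i

print(sorted(top_percent_by_decade(c_fy)))
```

Taking the threshold per window means an item only competes with others cited in the same period, so the later, citation-rich decades do not crowd out the earlier ones.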

In [6]:
# top1
ys = Path(_dh[0]).joinpath('creating variables','top percent cited in decade (top1).ipynb')

settings = [
    #{"database_name":"sociology-wos","ctype":'c'},
    #{"database_name":"sociology-wos","ctype":'ta'},
    {"database_name":"sociology-jstor-basicall","ctype":'t', }
]

for sett in settings:
    pm.execute_notebook(
        str(ys),
        str(ys),
        parameters = sett,
        nest_asyncio=True
    )

INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\creating variables\top percent cited in decade (top1).ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\creating variables\top percent cited in decade (top1).ipynb


INFO:papermill:Executing notebook with kernel: python3





PapermillExecutionError: 
---------------------------------------------------------------------------
Exception encountered at "In [6]":
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-6-b0ab305641fa> in <module>
     18             count_in_range[ getattr(cross, ctype) ] += count
     19 
---> 20     q99 = np.quantile(np.array( list(count_in_range.values()) ), 1-top_percentile)
     21     top1 = {k for k in count_in_range if count_in_range[k]>=q99}
     22     all_tops.update(top1)

<__array_function__ internals> in quantile(*args, **kwargs)

c:\users\amcga\envs\citation-deaths\lib\site-packages\numpy\lib\function_base.py in quantile(a, q, axis, out, overwrite_input, interpolation, keepdims)
   3816         raise ValueError("Quantiles must be in the range [0, 1]")
   3817     return _quantile_unchecked(
-> 3818         a, q, axis, out, overwrite_input, interpolation, keepdims)
   3819 
   3820 

c:\users\amcga\envs\citation-deaths\lib\site-packages\numpy\lib\function_base.py in _quantile_unchecked(a, q, axis, out, overwrite_input, interpolation, keepdims)
   3824     r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
   3825                     overwrite_input=overwrite_input,
-> 3826                     interpolation=interpolation)
   3827     if keepdims:
   3828         return r.reshape(q.shape + k)

c:\users\amcga\envs\citation-deaths\lib\site-packages\numpy\lib\function_base.py in _ureduce(a, func, **kwargs)
   3401         keepdim = (1,) * a.ndim
   3402 
-> 3403     r = func(a, **kwargs)
   3404     return r, keepdim
   3405 

c:\users\amcga\envs\citation-deaths\lib\site-packages\numpy\lib\function_base.py in _quantile_ureduce_func(a, q, axis, out, overwrite_input, interpolation, keepdims)
   3939             n = np.isnan(ap[-1:, ...])
   3940 
-> 3941         x1 = take(ap, indices_below, axis=axis) * weights_below
   3942         x2 = take(ap, indices_above, axis=axis) * weights_above
   3943 

<__array_function__ internals> in take(*args, **kwargs)

c:\users\amcga\envs\citation-deaths\lib\site-packages\numpy\core\fromnumeric.py in take(a, indices, axis, out, mode)
    192            [5, 7]])
    193     """
--> 194     return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
    195 
    196 

c:\users\amcga\envs\citation-deaths\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     59 
     60     try:
---> 61         return bound(*args, **kwds)
     62     except TypeError:
     63         # A TypeError occurs if the object does have such a method in its

IndexError: cannot do a non-empty take from an empty axes.
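
The final frame suggests that `np.quantile` was handed an empty array, which would happen if `count_in_range` had no entries for some decade window. This is an inference from the traceback, not something the log confirms. A minimal guard, assuming a hypothetical helper named `top_items`, might look like this:

```python
import numpy as np


def top_items(count_in_range, top_percentile=0.01):
    """Keys at or above the (1 - top_percentile) quantile.

    Returns an empty set when nothing was counted, instead of calling
    np.quantile on an empty array.
    """
    values = np.array(list(count_in_range.values()))
    if values.size == 0:
        return set()
    cutoff = np.quantile(values, 1 - top_percentile)
    return {k for k, v in count_in_range.items() if v >= cutoff}


print(top_items({}))  # an empty window yields an empty set rather than an error
```

If the guard triggers for every window, the more likely problem is upstream, for example a database that holds no counts for the requested `ctype`.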


In [5]:
# ysum

ys = Path(_dh[0]).joinpath('creating variables','ysum.ipynb')

settings = [
    #{"database_name":"sociology-wos","dtype":'c'},
    #{"database_name":"sociology-wos","dtype":'ta'},
    {"database_name":"sociology-jstor-basicall","dtype":'t'}
]

for sett in settings:
    pm.execute_notebook(
        str(ys),
        str(ys),
        parameters = sett,
        log_output=True,
        progress_bar=False,
        nest_asyncio=True
    )

INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\creating variables\ysum.ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\creating variables\ysum.ipynb
INFO:blib2to3.pgen2.driver:Generating grammar tables from c:\users\amcga\envs\citation-deaths\lib\site-packages\blib2to3\Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to C:\Users\amcga\AppData\Local\black\black\Cache\19.10b0\Grammar3.7.5.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: 'C:\\Users\\amcga\\AppData\\Local\\black\\black\\Cache\\19.10b0\\tmpz631lxk8'
INFO:blib2to3.pgen2.driver:Generating grammar tables from c:\users\amcga\envs\citation-deaths\lib\site-packages\blib2to3\PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to C:\Users\amcga\AppData\Local\b

# small analyses

In [5]:
# journal summaries
jsum = Path(_dh[0]).joinpath('analyses','summary of journals.ipynb')

pm.execute_notebook(
    str(jsum),
    str(jsum),
    parameters = {},
    nest_asyncio=True
);

INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\summary of journals.ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\summary of journals.ipynb


INFO:papermill:Executing notebook with kernel: python3





In [None]:
# momentary success
moment = Path(_dh[0]).joinpath('analyses','momentary success makes death less likely.ipynb')

settings = [
    #{"database_name":"sociology-wos","dtype":'c'},
    #{"database_name":"sociology-wos","dtype":'ta'},
    {"database_name":"sociology-jstor","dtype":'t'}
]

for sett in settings:
    pm.execute_notebook(
        str(moment),
        str(moment),
        parameters = sett,
        nest_asyncio=True
    )

In [7]:
# ubiquitous power-law behavior
ys = Path(_dh[0]).joinpath('analyses','ubiquitous power-law behavior.ipynb')

settings = [
    {"database_name":"sociology-wos","dtype":'c'},
    {"database_name":"sociology-wos","dtype":'ta'},
    {"database_name":"sociology-jstor","dtype":'t'}
]

for sett in settings:
    pm.execute_notebook(
        str(ys),
        str(ys),
        parameters = sett,
        nest_asyncio=True
    )

INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\ubiquitous power-law behavior.ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\ubiquitous power-law behavior.ipynb


INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\ubiquitous power-law behavior.ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\ubiquitous power-law behavior.ipynb





INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\ubiquitous power-law behavior.ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\ubiquitous power-law behavior.ipynb





INFO:papermill:Executing notebook with kernel: python3





In [11]:
# demographics of authors, context, etc

demo = Path(_dh[0]).joinpath('analyses','demographics.ipynb')

for i in [2]:#range(3):
    pm.execute_notebook(
        str(demo),
        str(demo),
        parameters = dict(setting_no=i),
        nest_asyncio=True
    )

INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\demographics.ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\demographics.ipynb


INFO:papermill:Executing notebook with kernel: python3





In [9]:
showdocs("100bigc")

# Citations which were in the 1%, but have died

This figure shows a random 100 of these cited works, each of which has died in the sense of death2, death3, or death5. These deaths are labeled for reference.

In [7]:
# visualizations of remarkable lives and deaths

viz = Path(_dh[0]).joinpath('analyses','remarkable lives and deaths.ipynb')

settings = [
    {"database_name":"sociology-wos","dtype":'c',"birth_key":'pub',"definitions_of_death":['death2','death3','death5']},
    {"database_name":"sociology-wos","dtype":'ta',"birth_key":'first',"definitions_of_death":['death2','death3','death5']},
    {"database_name":"sociology-jstor","dtype":'t',"birth_key":'first',"definitions_of_death":['death2','death3','death5']}
]

for sett in settings:
    pm.execute_notebook(
        str(viz),
        str(viz),
        parameters = sett,
        nest_asyncio=True
    )

INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\remarkable lives and deaths.ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\remarkable lives and deaths.ipynb
INFO:blib2to3.pgen2.driver:Generating grammar tables from c:\users\amcga\envs\citation-deaths\lib\site-packages\blib2to3\Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to C:\Users\amcga\AppData\Local\black\black\Cache\19.10b0\Grammar3.7.5.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: 'C:\\Users\\amcga\\AppData\\Local\\black\\black\\Cache\\19.10b0\\tmpfhgsmgew'
INFO:blib2to3.pgen2.driver:Generating grammar tables from c:\users\amcga\envs\citation-deaths\lib\site-packages\blib2to3\PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to C:\U

INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\remarkable lives and deaths.ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\remarkable lives and deaths.ipynb





INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Input Notebook:  G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\remarkable lives and deaths.ipynb
INFO:papermill:Output Notebook: G:\My Drive\projects\qualitative analysis of literature\post 5-12-2020\git repository _ citation-deaths\knowknow\analyses\remarkable lives and deaths.ipynb





INFO:papermill:Executing notebook with kernel: python3



