# Publication queries 

This notebook contains a collection of common publication queries for [Dimensions on Google BigQuery](https://docs.dimensions.ai/bigquery/).

For more background, see also the [publications data model](https://docs.dimensions.ai/bigquery/datasource-publications.html). 

## Prerequisites

This tutorial assumed that you have a Dimensions on Google BigQuery account and have completed the [Verifying your connection](https://digital-science.github.io/dimensions-gbq-lab/cookbooks/1-Verifying-your-connection.html) notebook. 

In [7]:
!pip install google-cloud-bigquery -U --quiet
%load_ext google.cloud.bigquery

import sys
print("==\nAuthenticating..")
if 'google.colab' in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
    print('..done (method: Colab)')
else:
    from google.cloud import bigquery
    print('..done (method: local credentials)')

The google.cloud.bigquery extension is already loaded. To reload it, use:
  %reload_ext google.cloud.bigquery
==
Authenticating..
..done (method: local credentials)


Update the value of `MY_PROJECT_ID` as needed and run the cell below to get started. 

In [8]:
MY_PROJECT_ID = "ds-data-solutions-gbq"

print("==\nTesting connection..")
client = bigquery.Client(project=MY_PROJECT_ID)
client.query("""
    SELECT COUNT(*) as Total_Publications 
    from `dimensions-ai.data_analytics.publications`
    """).to_dataframe()

==
Testing connection..


Unnamed: 0,Total_Publications
0,115963650


## 1. Top publications by Altmetric score and research organization 



In [9]:
%%bigquery --project $MY_PROJECT_ID

-- Top 5 pubs by Altmetric Score for GRID ID grid.4991.5 in the year 2020

SELECT
  id,
  title.preferred as title,
  ARRAY_LENGTH(authors) as authors,
  altmetrics.score as altmetrics_score
FROM
  `dimensions-ai.data_analytics.publications`
WHERE
  year = 2020 AND 'grid.4991.5' in UNNEST(research_orgs)
ORDER BY
  altmetrics.score desc
LIMIT 5

Query complete after 0.07s: 100%|██████████| 2/2 [00:00<00:00, 908.64query/s]                         
Downloading: 100%|██████████| 5/5 [00:02<00:00,  1.99rows/s]


Unnamed: 0,id,title,authors,altmetrics_score
0,pub.1130340155,Two metres or one: what is the evidence for ph...,6,15626
1,pub.1129493369,Safety and immunogenicity of the ChAdOx1 nCoV-...,366,15382
2,pub.1127239818,Remdesivir in adults with severe COVID-19: a r...,46,12139
3,pub.1133359801,Safety and efficacy of the ChAdOx1 nCoV-19 vac...,766,11111
4,pub.1131721397,Scientific consensus on the COVID-19 pandemic:...,31,10429


## 2. Working with Publications dates 

Each publication has various dates available. 

* date, year, date_normal, date_online, date_print refer to the publication object. See the [documentation](https://docs.dimensions.ai/bigquery/datasource-publications.html) to find out more about their meaning. 
* date_imported_gbq refers to when this record was last added to GBQ - this date can be handy if you want to synchronize an external data source to GBQ. 
* date_inserted: this refers to when this records was originally added to Dimensions (if the records gets adjusted later, it doesn't change). 

In [10]:
%%bigquery --project $MY_PROJECT_ID

select doi, date, date_normal, year, date_online, date_print, date_imported_gbq, date_inserted 
from `dimensions-ai.data_analytics.publications`
where year = 2010 and journal.id = "jour.1115214"
order by citations_count desc 
limit 10

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 905.80query/s]                         
Downloading: 100%|██████████| 10/10 [00:02<00:00,  4.10rows/s]


Unnamed: 0,doi,date,date_normal,year,date_online,date_print,date_imported_gbq,date_inserted
0,10.1038/nbt.1621,2010-05-02,2010-05-02,2010,2010-05-02,2010-05,2021-02-10 01:09:29+00:00,2017-08-31 12:50:56+00:00
1,10.1038/nbt.1630,2010-05-02,2010-05-02,2010,2010-05-02,2010-05,2021-02-10 01:09:29+00:00,2017-08-31 12:50:56+00:00
2,10.1038/nbt.1614,2010-03,2010-03-01,2010,,2010-03,2021-02-10 01:09:29+00:00,2017-08-31 12:50:56+00:00
3,10.1038/nbt.1685,2010-10-13,2010-10-13,2010,2010-10-13,2010-10,2021-02-10 00:53:56+00:00,2017-08-31 12:50:56+00:00
4,10.1038/nbt1210-1248,2010-12-07,2010-12-07,2010,2010-12-07,2010-12,2021-02-10 00:53:56+00:00,2017-08-31 12:50:56+00:00
5,10.1038/nbt.1755,2010-12-22,2010-12-22,2010,2010-12-22,2011-02,2021-02-10 01:09:29+00:00,2017-08-31 12:50:56+00:00
6,10.1038/nbt1010-1045,2010-10-13,2010-10-13,2010,2010-10-13,2010-10,2021-02-10 00:53:56+00:00,2017-08-31 12:50:56+00:00
7,10.1038/nbt.1633,2010-05-02,2010-05-02,2010,2010-05-02,2010-05,2021-02-10 00:53:56+00:00,2017-08-31 12:50:56+00:00
8,10.1038/nbt.1667,2010-07-19,2010-07-19,2010,2010-07-19,2010-08,2021-02-10 01:09:29+00:00,2017-08-31 12:50:56+00:00
9,10.1038/nbt.1641,2010-05-23,2010-05-23,2010,2010-05-23,2010-06,2021-02-10 00:53:56+00:00,2017-08-31 12:50:56+00:00


### Number of publications added to Dimensions by month

In [13]:
%%bigquery --project $MY_PROJECT_ID

SELECT 
  DATETIME_TRUNC(DATETIME(date_inserted), MONTH) as date,
  COUNT(id) as countDim
FROM
  `dimensions-ai.data_analytics.publications`
GROUP BY date  
ORDER BY date DESC
LIMIT 5




Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 2546.63query/s]                        
Downloading: 100%|██████████| 5/5 [00:02<00:00,  1.99rows/s]


Unnamed: 0,date,countDim
0,2021-02-01,174570
1,2021-01-01,685667
2,2020-12-01,820007
3,2020-11-01,573519
4,2020-10-01,718132


## Top N publications by citations percentile

In [14]:
%%bigquery --project $MY_PROJECT_ID

WITH pubs AS (
  SELECT
    p.id as id, 
    p.title.preferred as title,
    p.citations_count as citations,
  FROM
    `dimensions-ai.data_analytics.publications` p
  WHERE year = 2020 AND "09" IN UNNEST(category_for.first_level.codes)
),
ranked_pubs AS (
  SELECT
    p.*,
    PERCENT_RANK() OVER (ORDER BY p.citations DESC) citation_percentile
  FROM
    pubs p
)
SELECT * FROM ranked_pubs
WHERE citation_percentile <= 0.01
ORDER BY citation_percentile asc

Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 2074.33query/s]                        
Downloading: 100%|██████████| 7034/7034 [00:02<00:00, 2580.02rows/s]


Unnamed: 0,id,title,citations,citation_percentile
0,pub.1129408972,Estimation of total flavonoid content in propo...,881,0.000000
1,pub.1122861707,"Mercury 4.0: from visualization to analysis, d...",393,0.000001
2,pub.1125814051,Analysis and forecast of COVID-19 spreading in...,286,0.000003
3,pub.1126110231,Covid-19: automatic detection from X-ray image...,255,0.000004
4,pub.1125821215,The Role of Telehealth in Reducing the Mental ...,234,0.000006
...,...,...,...,...
7029,pub.1126789618,First Successful Treatment of Coronavirus Dise...,14,0.008954
7030,pub.1126819649,Advanced Matrixes for Binder‐Free Nanostructur...,14,0.008954
7031,pub.1128192899,A hybrid multi-scale model of COVID-19 transmi...,14,0.008954
7032,pub.1128190805,Z-scheme In2O3/WO3 heterogeneous photocatalyst...,14,0.008954


## Citations by journal, for a specific publisher 

In [20]:
%%bigquery --project $MY_PROJECT_ID

WITH publisher_pubs AS (
  SELECT id FROM `dimensions-ai.data_analytics.publications`
  WHERE publisher.id = "pblshr.1000340" AND type = "article"
)

SELECT 
  COUNT(p.id) as tot,
  p.journal.title as journal
FROM `dimensions-ai.data_analytics.publications` p, UNNEST(p.reference_ids) r
WHERE 
  p.year = 2020 AND p.type = "article"      -- restrict to articles with a published year of 2020
  AND p.publisher.id <> "pblshr.1000340"    -- where the publisher is not the same as the pusblisher above
  AND r IN (SELECT * FROM publisher_pubs)   -- the publication must reference a publishers publication
GROUP BY journal
ORDER BY tot DESC
LIMIT 10

Query complete after 0.00s: 100%|██████████| 5/5 [00:00<00:00, 3541.29query/s]                        
Downloading: 100%|██████████| 10/10 [00:02<00:00,  4.11rows/s]


Unnamed: 0,tot,journal
0,26147,Scientific Reports
1,18794,International Journal of Molecular Sciences
2,8647,Frontiers in Microbiology
3,8620,
4,7695,Frontiers in Immunology
5,6960,International Journal of Environmental Researc...
6,6421,Nature Communications
7,6145,Cells
8,5687,Cancers
9,5006,Microorganisms


## Researchers related to a selected GRID and category [REVIEW]

* Using publications to identify researchers of interest
* Using researchers db to get more infos about them 

In [None]:
%%bigquery --project $MY_PROJECT_ID

WITH researchers AS 
(
  SELECT DISTINCT res_id
  from `dimensions-ai.data_analytics.publications`, UNNEST(researcher_ids) res_id 
  where "grid.4991.5" in UNNEST(research_orgs)
  AND "2204" in UNNEST(category_for.second_level.codes)
  AND year = 2020
)


SELECT id, current_research_org, last_publication_year, total_grants
from `dimensions-ai.data_analytics.researchers`r1
join researchers r2 
on r1.id = r2.res_id
order by total_publications desc
limit 10

## Generate list of authors for a publication by flattening/concatenating

IE Flattening an array of objects into a string

In [30]:
%%bigquery --project $MY_PROJECT_ID

select
  p.id,
  ARRAY_TO_STRING((
    SELECT ARRAY(SELECT CONCAT(first_name, " ", last_name) FROM UNNEST(p.authors))
   ), '; ') as authors_list
from `dimensions-ai.data_analytics.publications` p
where p.id = 'pub.1132070778'

Query complete after 0.01s: 100%|██████████| 1/1 [00:00<00:00, 86.44query/s]                           
Downloading: 100%|██████████| 1/1 [00:02<00:00,  2.49s/rows]


Unnamed: 0,id,authors_list
0,pub.1132070778,O Grånäs; A Mocellin; E S Cardoso; F Burmeiste...


## Generate list of categories for a publication by flattening/concatenating


In [33]:
%%bigquery --project $MY_PROJECT_ID

select
  p.id, 
  ARRAY_TO_STRING((
    SELECT ARRAY(SELECT name FROM UNNEST(p.category_for.first_level.full))
   ), '; ') as categories_list
from `dimensions-ai.data_analytics.publications` p
where p.id = 'pub.1132070778'

Query complete after 0.00s: 100%|██████████| 1/1 [00:00<00:00, 721.54query/s]                          
Downloading: 100%|██████████| 1/1 [00:04<00:00,  4.17s/rows]


Unnamed: 0,id,categories_list
0,pub.1132070778,Physical Sciences; Chemical Sciences


## One-degree citation network for a single publication

In [35]:
%%bigquery --project $MY_PROJECT_ID

WITH level1 AS (
  select "pub.1099396382" as citation_from, citations.id as citation_to, 1 as level, citations.year as citation_year
  from `dimensions-ai.data_analytics.publications` p, unnest(citations) as citations
  where p.id="pub.1099396382"
),

level2 AS (
  select l.citation_to as citation_from, citations.id as citation_to, 2 as level, citations.year as citation_year
  from `dimensions-ai.data_analytics.publications` p, unnest(citations) as citations, level1 l
  where p.id = l.citation_to
)

SELECT * from level1 
UNION ALL
SELECT * from level2 

Query complete after 0.00s: 100%|██████████| 4/4 [00:00<00:00, 1477.65query/s]                        
Downloading: 100%|██████████| 187/187 [00:02<00:00, 80.92rows/s]


Unnamed: 0,citation_from,citation_to,level,citation_year
0,pub.1114028205,pub.1131160226,2,2020
1,pub.1106819031,pub.1116024231,2,2019
2,pub.1106819031,pub.1110011840,2,2018
3,pub.1106819031,pub.1106383928,2,2018
4,pub.1106819031,pub.1127419935,2,2020
...,...,...,...,...
182,pub.1043374025,pub.1028868656,2,2006
183,pub.1043374025,pub.1084164363,2,2017
184,pub.1053387944,pub.1104020017,2,2018
185,pub.1053387944,pub.1013860913,2,2007


## Publications per category, total and percentage against total

In [36]:
%%bigquery --project $MY_PROJECT_ID

SELECT
  cat.name,
  COUNT(DISTINCT p.id) AS pubs_global,
  ROUND ((COUNT(DISTINCT p.id) * 100 /(
      SELECT
        COUNT(*)
      FROM
        `dimensions-ai.data_analytics.publications`)), 2 ) AS pubs_global_pc
FROM
  `dimensions-ai.data_analytics.publications` p,
  UNNEST(category_for.first_level.full) cat
GROUP BY
  cat.name

Query complete after 0.00s: 100%|██████████| 5/5 [00:00<00:00, 2172.99query/s]                        
Downloading: 100%|██████████| 22/22 [00:02<00:00,  8.67rows/s]


Unnamed: 0,name,pubs_global,pubs_global_pc
0,Biological Sciences,8922205,7.69
1,Medical and Health Sciences,29853801,25.74
2,Economics,1722795,1.49
3,Engineering,12168683,10.49
4,Education,1804004,1.56
5,Information and Computing Sciences,5118832,4.41
6,History and Archaeology,2333998,2.01
7,Technology,1932511,1.67
8,"Commerce, Management, Tourism and Services",1792537,1.55
9,Studies in Creative Arts and Writing,639952,0.55


## Heads up - about UNNEST 

UNNEST are implicit 'cross-join' queries, hence only records that have some value in the nested column are represented

For example, the query below return less publications that then ones available, because only the ones with `research_org_country_names` are included (= cross join)

In [45]:
%%bigquery --project $MY_PROJECT_ID

SELECT count(distinct p.id) as tot_articles
FROM `dimensions-ai.data_analytics.publications` p 
    , UNNEST(research_org_country_names) as research_org_country_names
WHERE year = 2000

Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 2579.52query/s]                        
Downloading: 100%|██████████| 1/1 [00:04<00:00,  4.60s/rows]


Unnamed: 0,tot_articles
0,1060342


As a test, we can run the query without the UNNEST clause

In [44]:
%%bigquery --project $MY_PROJECT_ID

SELECT count(distinct p.id) as tot_articles
FROM `dimensions-ai.data_analytics.publications` p 
WHERE year = 2000


Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 2101.00query/s]                        
Downloading: 100%|██████████| 1/1 [00:02<00:00,  2.97s/rows]


Unnamed: 0,tot_articles
0,1759389


If you want to get all records, then LEFT JOIN is the way to go in this case

In [47]:
%%bigquery --project $MY_PROJECT_ID


SELECT count(distinct p.id) as tot_articles
FROM `dimensions-ai.data_analytics.publications` p 
LEFT JOIN UNNEST(research_org_country_names) as research_org_country_names
WHERE year = 2000

Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 1973.48query/s]                        
Downloading: 100%|██████████| 1/1 [00:03<00:00,  3.90s/rows]


Unnamed: 0,tot_articles
0,1759389


## Find articles mathcing a specific affiliation string

In [48]:
%%bigquery --project $MY_PROJECT_ID


SELECT
  id, aff.grid_id, aff.raw_affiliation
FROM
  `dimensions-ai.data_analytics.publications`,
  UNNEST(authors) auth,
  UNNEST(auth.affiliations_address) AS aff
WHERE
  year = 2020
  AND aff.grid_id = "grid.69566.3a"
  AND LOWER(aff.raw_affiliation) LIKE "%school of medicine%"

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 1575.33query/s]                        
Downloading: 100%|██████████| 5920/5920 [00:02<00:00, 2237.13rows/s]


Unnamed: 0,id,grid_id,raw_affiliation
0,pub.1121839087,grid.69566.3a,"Evidence-based Cardiovascular Medicine, Tohoku..."
1,pub.1121839087,grid.69566.3a,"Evidence-based Cardiovascular Medicine, Tohoku..."
2,pub.1121837499,grid.69566.3a,"Department of Molecular Endocrinology, Tohoku ..."
3,pub.1121837499,grid.69566.3a,"Department of Molecular Endocrinology, Tohoku ..."
4,pub.1121837499,grid.69566.3a,"Department of Molecular Endocrinology, Tohoku ..."
...,...,...,...
5915,pub.1131102254,grid.69566.3a,"Department of Neurosurgery, Tohoku University ..."
5916,pub.1131102254,grid.69566.3a,"Department of Neurosurgery, Tohoku University ..."
5917,pub.1131102254,grid.69566.3a,Department of Neurosurgical Engineering and Tr...
5918,pub.1131102254,grid.69566.3a,"Department of Neurosurgery, Tohoku University ..."


variant, to get unique publication records with affiliation count 

In [49]:
%%bigquery --project $MY_PROJECT_ID



SELECT
  COUNT(aff) AS matching_affiliations,
  id,
  title.preferred AS title
FROM
  `dimensions-ai.data_analytics.publications`,
  UNNEST(authors) auth,
  UNNEST(auth.affiliations_address) AS aff
WHERE
  year = 2020
  AND aff.grid_id = "grid.69566.3a"
  AND LOWER(aff.raw_affiliation) LIKE "%school of medicine%"
GROUP BY
  id,
  title

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 1722.51query/s]                        
Downloading: 100%|██████████| 1492/1492 [00:02<00:00, 543.31rows/s]


Unnamed: 0,matching_affiliations,id,title
0,1,pub.1120113207,Renal damage in primary aldosteronism: a syste...
1,47,pub.1124019204,Study profile of The Tohoku Medical Megabank C...
2,1,pub.1123669309,CD4+ T Cells as Key Players in the Immunopatho...
3,5,pub.1123675210,Effects of Long-Term Exercise on Liver Cyst in...
4,1,pub.1123793958,"Degenerative rotator cuff tear, repair or not ..."
...,...,...,...
1487,1,pub.1132236309,Protracted rosiglitazone treatment exacerbates...
1488,3,pub.1132927290,Benefits of eculizumab in AQP4+ neuromyelitis ...
1489,2,pub.1133258481,RARE-26. RETROSPECTIVE ANALYSIS OF PEDIATRIC C...
1490,12,pub.1133397887,Echolalia in patients with primary progressive...


## Publications with corresponding authors by publisher 

In [51]:
%%bigquery --project $MY_PROJECT_ID

select count(distinct id) as tot , publisher.name
from `dimensions-ai.data_analytics.publications`, unnest(authors) aff
where aff.corresponding is true and publisher.name is not null
group by publisher.name
order by tot desc

Query complete after 0.00s: 100%|██████████| 4/4 [00:00<00:00, 1715.81query/s]                        
Downloading: 100%|██████████| 421/421 [00:03<00:00, 134.06rows/s]


Unnamed: 0,tot,name
0,8733776,Elsevier
1,5885408,Springer Nature
2,813007,Institute of Electrical and Electronics Engine...
3,683093,SAGE Publications
4,380636,MDPI
...,...,...
416,1,Society for Sedimentary Geology
417,1,Scientific Archives LLC
418,1,"Museum of Comparative Zoology, Harvard University"
419,1,Institute of Lifestyle Medicine


## Funding by journal

In [56]:
%%bigquery --project $MY_PROJECT_ID

with funding as (
SELECT funding.grid_id as funders, count(id) as pubs,  count(funding.grant_id) as grants
FROM `dimensions-ai.data_analytics.publications`, unnest(funding_details) as funding
where journal.id = "jour.1113716" -- nature medicine 
GROUP BY funders)
select funding.*, grid.name 
from funding 
join `dimensions-ai.data_analytics.grid` grid on funding.funders = grid.id  
ORDER BY pubs DESC, grants DESC


Query complete after 0.00s: 100%|██████████| 5/5 [00:00<00:00, 2062.70query/s]                        
Downloading: 100%|██████████| 831/831 [00:03<00:00, 262.38rows/s]


Unnamed: 0,funders,pubs,grants,name
0,grid.48336.3a,2699,2484,National Cancer Institute
1,grid.419681.3,2008,1878,National Institute of Allergy and Infectious D...
2,grid.419635.c,1620,1564,National Institute of Diabetes and Digestive a...
3,grid.279885.9,1612,1525,National Heart Lung and Blood Institute
4,grid.416870.c,712,668,National Institute of Neurological Disorders a...
...,...,...,...,...
826,grid.453131.1,1,0,Yorkshire Cancer Research
827,grid.467619.b,1,0,W.W Grainger (United States)
828,grid.467239.d,1,0,Zimmer Biomet (United States)
829,grid.280878.d,1,0,New York State Office of Mental Health


## Articles with SDGs

In [64]:
%%bigquery --project $MY_PROJECT_ID

select p.id, p.doi, p.date_inserted, sdg.name 
from `dimensions-ai.data_analytics.publications` p, unnest(category_sdg.full) sdg
where sdg is not null 
limit 5

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 795.51query/s]                         
Downloading: 100%|██████████| 5/5 [00:02<00:00,  1.96rows/s]


Unnamed: 0,id,doi,date_inserted,name
0,pub.1013334466,10.1017/s0261340900021950,2017-08-31 12:50:56+00:00,"Peace, Justice and Strong Institutions"
1,pub.1025550519,10.1017/s0261340900021780,2017-08-31 12:50:56+00:00,"Peace, Justice and Strong Institutions"
2,pub.1106406142,,2018-09-01 10:48:57+00:00,Good Health and Well Being
3,pub.1104914054,,2018-06-20 15:35:59+00:00,Good Health and Well Being
4,pub.1038799356,10.1056/nejm184601140332401,2017-08-31 12:50:56+00:00,Good Health and Well Being


Count how many pubs per SDG

In [65]:
%%bigquery --project $MY_PROJECT_ID

select COUNT(DISTINCT p.id) as tot, sdg.name 
from `dimensions-ai.data_analytics.publications` p, unnest(category_sdg.full) sdg
GROUP BY sdg.name
limit 5

Query complete after 0.00s: 100%|██████████| 4/4 [00:00<00:00, 1851.18query/s]                        
Downloading: 100%|██████████| 5/5 [00:02<00:00,  1.98rows/s]


Unnamed: 0,tot,name
0,748342,"Peace, Justice and Strong Institutions"
1,49065,No Poverty
2,184789,Reduced Inequalities
3,260191,Decent Work and Economic Growth
4,230736,Sustainable Cities and Communities


## Journals with LIKE string matching

In [71]:
%%bigquery --project $MY_PROJECT_ID

select COUNT(*) as pubs, journal.id, journal.title, journal.issn, journal.eissn, publisher.name
from `dimensions-ai.data_analytics.publications` 
where  LOWER( journal.title ) LIKE  CONCAT('%medicine%')
group by 2, 3, 4, 5, 6
order by pubs desc
limit 20

Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 1209.20query/s]                        
Downloading: 100%|██████████| 20/20 [00:02<00:00,  7.83rows/s]


Unnamed: 0,pubs,id,title,issn,eissn,name
0,168620,jour.1014075,New England Journal of Medicine,0028-4793,1533-4406,Massachusetts Medical Society
1,83860,jour.1011551,Medicine & Science in Sports & Exercise,0195-9131,1530-0315,Wolters Kluwer
2,58617,jour.1017222,Annals of Internal Medicine,0003-4819,1539-3704,American College of Physicians
3,52792,jour.1312267,Journal of the Royal Society of Medicine,0141-0768,1758-1095,SAGE Publications
4,52248,jour.1017256,JAMA Internal Medicine,2168-6106,2168-6114,American Medical Association (AMA)
5,47104,jour.1027092,Experimental Biology and Medicine,1535-3702,1535-3699,SAGE Publications
6,46274,jour.1016342,Critical Care Medicine,0090-3493,1530-0293,Wolters Kluwer
7,37632,jour.1057918,Journal of Molecular Medicine,0946-2716,1432-1440,Springer Nature
8,34891,jour.1017275,Arizona Medicine,0093-0415,1476-2978,
9,31068,jour.1014535,The American Journal of Medicine,0002-9343,1555-7162,Elsevier


## New vs recurring authors, for a specific journal

In [73]:
%%bigquery --project $MY_PROJECT_ID


WITH authoryear AS (
  select pubs.year, author.researcher_id, COUNT(pubs.id) AS numpubs
  FROM `dimensions-ai.data_analytics.publications` as pubs
  CROSS JOIN UNNEST(pubs.authors) AS author
  WHERE author.researcher_id IS NOT NULL AND journal.id= "jour.1115214"
  GROUP BY author.researcher_id, pubs.year
), authorfirst AS (
  select researcher_id, MIN(year) AS minyear
  FROM authoryear
  GROUP BY researcher_id
), authorsummary AS (
  SELECT ay.*, IF(ay.year=af.minyear, TRUE, FALSE) AS firstyear
  FROM authoryear ay
  JOIN authorfirst af ON af.researcher_id=ay.researcher_id
  ORDER BY ay.researcher_id, year
), numauthors AS (
  SELECT year, firstyear, COUNT(DISTINCT researcher_id) AS numresearchers
  FROM authorsummary
  WHERE year>2010
  GROUP BY year, firstyear
)
SELECT year, SUM(CASE when firstyear then numresearchers else 0 end) as num_first,
             SUM(CASE when NOT firstyear then numresearchers else 0 end) as num_recurring
from numauthors
group by year
order by year;


Query complete after 0.00s: 100%|██████████| 10/10 [00:00<00:00, 5324.75query/s]                       
Downloading: 100%|██████████| 10/10 [00:02<00:00,  4.04rows/s]


Unnamed: 0,year,num_first,num_recurring
0,2011,1040,352
1,2012,858,373
2,2013,926,345
3,2014,1088,338
4,2015,1044,392
5,2016,1313,350
6,2017,1072,404
7,2018,1104,419
8,2019,1184,442
9,2020,1579,568


## incoming citations for a journal

In [77]:
%%bigquery --project $MY_PROJECT_ID

select COUNT(distinct id) as totcount, year, type
from `dimensions-ai.data_analytics.publications` 
where id in 
  ( select citing_pubs.id
    from `dimensions-ai.data_analytics.publications`, UNNEST(citations) as citing_pubs
    where  journal.id = "jour.1115214" 
  )
group by year, type
order by year, type  

Query complete after 0.00s: 100%|██████████| 7/7 [00:00<00:00, 3743.00query/s]                        
Downloading: 100%|██████████| 201/201 [00:02<00:00, 79.73rows/s]


Unnamed: 0,totcount,year,type
0,9,,article
1,1,1924.0,article
2,1,1942.0,article
3,1,1963.0,article
4,1,1964.0,article
...,...,...,...
196,1,2021.0,book
197,612,2021.0,chapter
198,26,2021.0,monograph
199,894,2021.0,preprint


## Outgoing citations to a journal

In [80]:
%%bigquery --project $MY_PROJECT_ID

select COUNT(distinct id) as totcount, year, type
from `dimensions-ai.data_analytics.publications` 
where id in 
  ( select distinct reference_pubs
    from `dimensions-ai.data_analytics.publications`, UNNEST(reference_ids) as reference_pubs
    where  journal.id = "jour.1115214" 
  )
group by year, type
order by year, type  

Query complete after 0.00s: 100%|██████████| 8/8 [00:00<00:00, 2721.81query/s]                        
Downloading: 100%|██████████| 356/356 [00:02<00:00, 139.64rows/s]


Unnamed: 0,totcount,year,type
0,1,,article
1,1,1825.0,article
2,1,1828.0,article
3,1,1853.0,article
4,1,1855.0,monograph
...,...,...,...
351,3,2019.0,proceeding
352,409,2020.0,article
353,5,2020.0,chapter
354,34,2020.0,preprint


## Concepts: select publications matching selected concepts

In [86]:
%%bigquery --project $MY_PROJECT_ID

WITH tropical_diseases AS (
    select * from `dimensions-ai.data_analytics.publications`
)
SELECT publisher.name as publisher, year, count(*) as num_pub
FROM tropical_diseases, UNNEST(tropical_diseases.concepts) c
WHERE 
  (lower(c.concept) in UNNEST(["buruli ulcer", "mycobacterium", "mycolactone", "bairnsdale ulcer"]) 
  OR REGEXP_CONTAINS(title.preferred, r"(?i)/buruli ulcer|mycobacterium|mycolactone|bairnsdale ulcer/"))
  AND year >= 2010
  AND publisher IS NOT NULL
GROUP BY publisher,year
ORDER BY num_pub DESC, year, publisher
limit 10

Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 2063.11query/s]                        
Downloading: 100%|██████████| 10/10 [00:02<00:00,  4.21rows/s]


Unnamed: 0,publisher,year,num_pub
0,Elsevier,2020,31812
1,Elsevier,2018,29580
2,Elsevier,2019,28941
3,Elsevier,2017,28415
4,Elsevier,2015,27301
5,Elsevier,2011,25758
6,Elsevier,2016,25149
7,Elsevier,2013,23209
8,Elsevier,2014,23100
9,Springer Nature,2019,22072
