### What to do
1. Get authors: by organization ID (and the rest?)
2. For each author, get their publications
3. Collect the abstracts of the top N publications (ranked by popularity)
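
Step 3 could be sketched roughly as follows. This assumes "popularity" means the `num_citations` field that appears on each publication entry in the outputs further down; `top_n_by_citations` and the sample data are hypothetical, for illustration only.

```python
def top_n_by_citations(publications, n):
    """Return the n most-cited publication dicts, most cited first.

    Entries without a 'num_citations' key are treated as having 0 citations.
    """
    return sorted(
        publications,
        key=lambda pub: pub.get("num_citations", 0),
        reverse=True,
    )[:n]


# Made-up entries shaped like the publication dicts shown in this notebook
sample = [
    {"bib": {"title": "A"}, "num_citations": 3},
    {"bib": {"title": "B"}, "num_citations": 428},
    {"bib": {"title": "C"}},
]
top_two = top_n_by_citations(sample, 2)
print([pub["bib"]["title"] for pub in top_two])
```

Selecting the top N before calling `fill` on each publication would also keep the number of requests small, since the citation count is already present on the unfilled entries.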

In [1]:
from scholarly import scholarly
from scholarly import ProxyGenerator

In [5]:
author_name = "Maciej Piasecki"
search_query = scholarly.search_author(author_name)
first_author_result = next(search_query)
first_author_result

{'container_type': 'Author',
 'filled': [],
 'source': <AuthorSource.SEARCH_AUTHOR_SNIPPETS: 'SEARCH_AUTHOR_SNIPPETS'>,
 'scholar_id': 'nU_W9XwAAAAJ',
 'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=nU_W9XwAAAAJ',
 'name': 'Maciej Piasecki',
 'affiliation': 'Wroclaw University of Science and Technology',
 'email_domain': '@pwr.wroc.pl',
 'interests': ['Computational Linguistics',
  'Natural Language Processing',
  'Human-Computer Interaction',
  'Artificial Intelligence',
  'Language Technology'],
 'citedby': 2952}

The `fill` method retrieves more information about the author, including citation indices, coauthors and the list of publications.

In [6]:
author = scholarly.fill(first_author_result)
author

{'container_type': 'Author',
 'filled': ['basics',
  'indices',
  'counts',
  'coauthors',
  'publications',
  'public_access'],
 'source': <AuthorSource.SEARCH_AUTHOR_SNIPPETS: 'SEARCH_AUTHOR_SNIPPETS'>,
 'scholar_id': 'nU_W9XwAAAAJ',
 'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=nU_W9XwAAAAJ',
 'name': 'Maciej Piasecki',
 'affiliation': 'Wroclaw University of Science and Technology',
 'email_domain': '@pwr.wroc.pl',
 'interests': ['Computational Linguistics',
  'Natural Language Processing',
  'Human-Computer Interaction',
  'Artificial Intelligence',
  'Language Technology'],
 'citedby': 2952,
 'citedby5y': 1329,
 'hindex': 24,
 'hindex5y': 14,
 'i10index': 81,
 'i10index5y': 28,
 'cites_per_year': {2006: 18,
  2007: 50,
  2008: 89,
  2009: 95,
  2010: 112,
  2011: 118,
  2012: 131,
  2013: 186,
  2014: 154,
  2015: 123,
  2016: 153,
  2017: 154,
  2018: 172,
  2019: 195,
  2020: 130,
  2021: 200,
  2022: 115,
  2023: 343,
  2024: 338},
 'coauthors'

Number of the author's publications:

In [7]:
len(author['publications'])

293

At this point each publication carries only brief information (title, year, citation string and citation count); note that `filled` is `False`.

In [8]:
first_publication = author['publications'][0]
first_publication

{'container_type': 'Publication',
 'source': <PublicationSource.AUTHOR_PUBLICATION_ENTRY: 'AUTHOR_PUBLICATION_ENTRY'>,
 'bib': {'title': 'ChatGPT: Jack of all trades, master of none',
  'pub_year': '2023',
  'citation': 'Information Fusion 99, 101861, 2023'},
 'filled': False,
 'author_pub_id': 'nU_W9XwAAAAJ:Aul-kAQHnToC',
 'num_citations': 428,
 'citedby_url': 'https://scholar.google.com/scholar?oi=bibs&hl=en&cites=2600515932282922845',
 'cites_id': ['2600515932282922845']}

In [9]:
first_publication_filled = scholarly.fill(first_publication)
first_publication_filled

{'container_type': 'Publication',
 'source': <PublicationSource.AUTHOR_PUBLICATION_ENTRY: 'AUTHOR_PUBLICATION_ENTRY'>,
 'bib': {'title': 'ChatGPT: Jack of all trades, master of none',
  'pub_year': 2023,
  'citation': 'Information Fusion 99, 101861, 2023',
  'author': 'Jan Kocoń and Igor Cichecki and Oliwier Kaszyca and Mateusz Kochanek and Dominika Szydło and Joanna Baran and Julita Bielaniewicz and Marcin Gruza and Arkadiusz Janz and Kamil Kanclerz and Anna Kocoń and Bartłomiej Koptyra and Wiktoria Mieleszczenko-Kowszewicz and Piotr Miłkowski and Marcin Oleksy and Maciej Piasecki and Łukasz Radliński and Konrad Wojtasik and Stanisław Woźniak and Przemysław Kazienko',
  'journal': 'Information Fusion',
  'volume': '99',
  'pages': '101861',
  'publisher': 'Elsevier',
  'abstract': 'OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals it

After filling, the `author` field holds multiple authors in a single string, with the names joined by `and`.

In [10]:
first_publication_filled['bib']['author']

'Jan Kocoń and Igor Cichecki and Oliwier Kaszyca and Mateusz Kochanek and Dominika Szydło and Joanna Baran and Julita Bielaniewicz and Marcin Gruza and Arkadiusz Janz and Kamil Kanclerz and Anna Kocoń and Bartłomiej Koptyra and Wiktoria Mieleszczenko-Kowszewicz and Piotr Miłkowski and Marcin Oleksy and Maciej Piasecki and Łukasz Radliński and Konrad Wojtasik and Stanisław Woźniak and Przemysław Kazienko'
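
If a list of individual names is needed, the string can be split on the separator. A minimal sketch, assuming names are always joined by `" and "` as in the output above; `split_authors` is a hypothetical helper, and a name that itself contains " and " would be split incorrectly.

```python
def split_authors(author_field):
    """Split an 'A and B and C' author string into a list of names."""
    return [name.strip() for name in author_field.split(" and ") if name.strip()]


names = split_authors("Jan Kocoń and Igor Cichecki and Maciej Piasecki")
print(names)
```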

The abstract is not complete: it is cut off and ends with an ellipsis.

In [11]:
first_publication_filled['bib']['abstract']

'OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT’s capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection. In contrast, the other tasks require more objective reasoning like word sense disambiguation, linguistic acceptability, and question answering. We also evaluated GPT-4 model on five selected subsets of NLP tasks. We …'
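
Truncated abstracts can be flagged so they are handled separately later. A small sketch, assuming truncation is always marked by a trailing ellipsis (either the single character `…`, as in the output above, or three dots); `is_truncated` is a hypothetical helper.

```python
def is_truncated(abstract):
    """Return True if the abstract ends with an ellipsis marker."""
    return abstract.rstrip().endswith(("…", "..."))


print(is_truncated("We also evaluated GPT-4 model on five selected subsets of NLP tasks. We …"))
print(is_truncated("A complete abstract."))
```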

In [12]:
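def get_author_publications(author_name):
    # Note: this takes the first search hit without checking it. Its 'affiliation'
    # should be 'Wroclaw University of Science and Technology' or similar.
    search_query = scholarly.search_author(author_name)
    # search_query is an iterator and is always truthy, so detect an empty
    # result by asking next() for a default value instead.
    first_author_result = next(search_query, None)
    if first_author_result is None:
        return None
    author = scholarly.fill(first_author_result)
    return author['publications']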
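
The comment in `get_author_publications` says the affiliation should match, but the function does not check it. One possible way to do so is sketched below, assuming each search result is a dict with an optional `affiliation` key as in the outputs above; `first_with_affiliation` and the sample results are hypothetical.

```python
def first_with_affiliation(author_results, affiliation_substring):
    """Return the first result whose affiliation contains the substring.

    The comparison is case-insensitive. Returns None if nothing matches.
    """
    wanted = affiliation_substring.casefold()
    for result in author_results:
        if wanted in result.get("affiliation", "").casefold():
            return result
    return None


# Made-up search results shaped like the 'Author' dicts in this notebook
results = iter([
    {"name": "Maciej Piasecki", "affiliation": "Some Other University"},
    {"name": "Maciej Piasecki",
     "affiliation": "Wroclaw University of Science and Technology"},
])
match = first_with_affiliation(results, "wroclaw university")
print(match["affiliation"])
```

Because the helper accepts any iterable, the iterator returned by `scholarly.search_author(author_name)` could be passed in directly in place of `results`.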

In [13]:
def structure_publications(publications):
    structured_publications = []
    for publication in publications:
        # fill() is called once per publication, so long lists are slow
        filled_pub = scholarly.fill(publication)
        bib = filled_pub['bib']
        # Keep only publications that have an abstract
        if 'abstract' in bib:
            structured_publications.append({
                'abstract': bib['abstract'],
                'title': bib['title'],
                'author': bib['author'],
            })
    return structured_publications

In [14]:
mp_pubs = get_author_publications('Maciej Piasecki')
mp_structured_publications = structure_publications(mp_pubs)

In [15]:
mp_structured_publications[0]

{'abstract': 'OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT’s capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection. In contrast, the other tasks require more objective reasoning like word sense disambiguation, linguistic acceptability, and question answering. We also evaluated GPT-4 model on five selected subsets of NLP tasks. We …',
 'title': 'ChatGPT: Jack of all trades, master of none',
 'author': '

### Without setting a proxy

In [1]:
from scholarly import scholarly

organization_id = 6200813508511872715
author_names = list(scholarly.search_author_by_organization(organization_id))

In [2]:
len(author_names)

509

### With a proxy set
Try running only this cell.

In [1]:
from scholarly import scholarly
from scholarly import ProxyGenerator

# Set up a ProxyGenerator object to use free proxies
# This needs to be done only once per session
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)

organization_id = 6200813508511872715
author_names = list(scholarly.search_author_by_organization(organization_id))

MaxTriesExceededException: Cannot Fetch from Google Scholar.
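
Since the fetch can fail like this, one option is to wrap it in a small retry loop. The sketch below is generic and does not use any scholarly API: `fetch_with_retries` is a hypothetical helper that takes any zero-argument callable, and catching every `Exception` is a simplification for illustration. The flaky function only simulates failures.

```python
import time


def fetch_with_retries(fetch, attempts=3, delay_seconds=0.0):
    """Call fetch() up to `attempts` times and return its first successful result.

    Waits delay_seconds * attempt_number between tries and re-raises the
    last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_seconds * (attempt + 1))
    raise last_error


# Simulated fetch that fails twice before succeeding
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated fetch failure")
    return ["author-1", "author-2"]


print(fetch_with_retries(flaky_fetch, attempts=3, delay_seconds=0))
```

In this notebook the callable could be `lambda: list(scholarly.search_author_by_organization(organization_id))`. Whether retrying helps depends on why the request was blocked; with unreliable free proxies it may keep failing.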

In [None]:
len(author_names)

We get a list of dictionaries whose `container_type` is 'Author' (the other container type is 'Publication').

In [None]:
author_names[0]

In [None]:
author_filled = scholarly.fill(author_names[0])

Publications appear to be sorted by number of citations, most cited first (in the earlier example the first entry has 428 citations).

In [None]:
author_filled

In [None]:
scholarly.fill(author_filled['publications'][0])