<a href="https://colab.research.google.com/github/BecomeAllan/S2Search/blob/main/SemanticScholarSearch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Consuming the SemanticScholar API

Below is a class called `Search()`. Once it is instantiated and assigned to a variable, it can search for papers through the SemanticScholar API. It takes the following parameters:

- Search (`search`, labeled `Buscar` in the form): the topics to search for. Use + (plus) to add a topic and - (minus) to exclude one.

- Fields (`fields`): which data is returned for each paper. Choose from the options below, separated by commas and without spaces:
  - externalIds
  - url
  - title
  - abstract
  - venue 
  - year 
  - referenceCount
  - citationCount
  - influentialCitationCount
  - isOpenAccess
  - fieldsOfStudy
  - authors 

- Offset (`offset`): position in the result list at which to start fetching papers (0 is the first paper).

- Limit (`limit`, labeled `Limite` in the form): number of papers to return (max. 10,000).
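
As a rough illustration of how these four parameters map onto a request, the sketch below builds the search URL in Python. The endpoint is the one the class calls from JavaScript; `build_search_url` is a hypothetical helper, and the encoding choice (spaces percent-encoded, `+` and `-` left literal) is an assumption about how the query should reach the API.

```python
from urllib.parse import quote

# Endpoint used by the Search() class.
BASE_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

def build_search_url(search, fields, limit=10, offset=0):
    """Hypothetical helper: assemble the search URL from the four parameters."""
    # Assumption: "+" and "-" stay literal so they keep their add/exclude meaning.
    query = quote(search, safe="+-")
    # Commas separate field names and are left as-is.
    field_list = quote(fields, safe=",")
    return (
        f"{BASE_URL}?query={query}"
        f"&offset={int(offset)}&limit={int(limit)}&fields={field_list}"
    )

url = build_search_url("Machine Learning+Deep Learning", "title,abstract")
print(url)
```

The `int()` conversions mirror the `parseInt` calls in the class, so string values such as `"250"` are accepted as well.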

**Note:** The SemanticScholar API allows 100 queries every 5 minutes, and each query returns at most 100 results (the limit). So up to 10,000 papers can be fetched every 5 minutes.
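
The arithmetic behind that note can be sketched as follows: a request for more than 100 papers is split into pages of at most 100, each with its own offset. `plan_requests` is a hypothetical name used only for this illustration, and the sketch assumes contiguous, zero-based offsets.

```python
def plan_requests(limit, offset=0, page_size=100):
    """Split a request for `limit` papers into (offset, limit) pairs.

    Each pair describes one API query of at most `page_size` results.
    """
    pages = []
    remaining = limit
    current = offset
    while remaining > 0:
        size = min(page_size, remaining)
        pages.append((current, size))
        current += size
        remaining -= size
    return pages

# 250 papers need three queries: two full pages and one of 50.
print(plan_requests(250))
# 10,000 papers use the whole budget of 100 queries.
print(len(plan_requests(10000)))
```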

In [112]:
#@title Class for searching SemanticScholar
import sys
import IPython
from google.colab import output
import pandas as pd

class Search():
  def __init__(self, **kwargs):
    self.data = ""
    self.data_0 = ""

    self.search = kwargs.get('search', None)
    self.fields = kwargs.get('fields', None)
    self.limit = kwargs.get('limit', None)
    self.offset = kwargs.get('offset', None)

    if self.search is None and self.fields is None and self.limit is None and self.offset is None:
      self._start(False)
    else:
      self._start(True)
  
  def _start(self, *args):

    output.register_callback('notebook.searching', self._searching)
    output.register_callback('notebook.AddListItem', self._add_list_item)
    output.register_callback('notebook.mergeData', self._merge_data)
    output.register_callback('notebook.error', self._error)


    boxs = ''' 
        <label for="query">Buscar: </label>
        <input type="text" id="query" value="Machine Learning+Deep Learning" style="width: 400px;"/>
        <br/>
        <br/>
        
        <label for="fields">Fields: </label>
        <input type="text" id="fields" value="title,abstract,isOpenAccess,fieldsOfStudy" style="width: 400px;"/>
        <br/>
        <br/>
 
        <label for="limit">Limite: </label>
        <input type="text" id="limit" value="10" style="width: 50px;"/><br/>
        <br/>

        <label for="offset">Offset: </label>
        <input type="text" id="offset" value="0" style="width: 50px;"/><br/>
        <br/>

        <button id='button'>Pesquisar</button>
        <br/>
        <br/>
           '''

    button = ''' document.querySelector('#button').onclick = async () => ''' # {}

    search_query = '''
            var search = document.getElementById("query").value
            var fields = document.getElementById("fields").value
            var limit = parseInt(document.getElementById("limit").value)
            var offset = parseInt(document.getElementById("offset").value)
                  '''
    search_params = '''
            var search = "{search}"
            var fields = "{fields}"
            var limit = parseInt({limit})
            var offset = parseInt({offset})
                  '''
    engine = '''
            google.colab.kernel.invokeFunction('notebook.searching', [], {});

            if (limit >100) {
              var number = limit
              var data = ""
              var promises = []
              var offsetSearch = 0
              var rest = 0

              for (let index = 0; index < Math.floor(limit/100); index++) {
                // Offsets are zero-based, so pages must be contiguous: 0-99, 100-199, ...
                offsetSearch = 100*(index) + offset


                promises.push(
                  fetch(`https://api.semanticscholar.org/graph/v1/paper/search?query=${search}&offset=${offsetSearch}&limit=100&fields=${fields}`)
    .then(res=> {return(res.json())})
    .then(res=> {return(res)})
                )
              }
              
              if (limit%100 !== 0) { 
                rest= limit%100
                offsetSearch = offsetSearch+100
                
                console.log(rest)
                console.log(offsetSearch)

                promises.push(
                fetch(`https://api.semanticscholar.org/graph/v1/paper/search?query=${search}&offset=${offsetSearch}&limit=${rest}&fields=${fields}`)
    .then(res=> {return(res.json())})
    .then(res=> {return(res)})
                )}

              await Promise.all(promises).then(data=>{
                google.colab.kernel.invokeFunction('notebook.mergeData', [data], {})
              })
              .catch(err=> { return (google.colab.kernel.invokeFunction('notebook.error', [err], {})) })

            } else {

            await fetch(`https://api.semanticscholar.org/graph/v1/paper/search?query=${search}&offset=${offset}&limit=${limit}&fields=${fields}`)
    .then(res=> {return(res.json())})
    .then(res=> {
      return(google.colab.kernel.invokeFunction('notebook.AddListItem', [res], {}))})
    .catch(err=> { return (
      google.colab.kernel.invokeFunction('notebook.error', [err], {})) })
            }
                  '''

    asyncfun = "async function asyncfun()"

    if args[0]:

      main_app =  "<script>" + search_params.format(search=self.search, fields=self.fields, limit=self.limit, offset=self.offset) + asyncfun + "{" + engine + "}" + "asyncfun()" + "</script>"

      display(IPython.display.HTML(main_app))
      
    else:
      main_app = boxs + "<script>" + button + "{" + search_query + engine + "}" + "</script>"
      
      display(IPython.display.HTML(main_app))

    

  def _error(self,value):
    try:
      print("ERRO na API SemanticScholar:\n")
      print(value)
    except:
      pass 

  def _searching(self):
    with output.use_tags('some_outputs'):
      print("\n\nPesquisando...")
      sys.stdout.flush()

  def _merge_data(self, data):
    output.clear(output_tags='some_outputs')
    print(f"Achou {data[0]['total']} papers.\n")
    self.data_0 = data

    self.data = pd.DataFrame(data[0]['data'])

    # Append the remaining pages; report any page that lacks a 'data' list.
    for x in data[1:]:
      try:
        self.merge(pd.DataFrame(x['data']))
      except Exception:
        self._error(x)

    print(f"\nApi devolveu >> {self.data.shape[0]} papers\n" )
    print(self.data.head())


  def merge(self, data):
    self.data = pd.concat([self.data, data], ignore_index=True ) 

  def _add_list_item(self, value):
    output.clear(output_tags='some_outputs')

    print(f"Achou {value['total']} papers.\n")

    self.data = pd.DataFrame(value['data'])

    print(f"Api devolveu >> {self.data.shape[0]} papers\n" )
    
    print(self.data.head())



# Using the `Search()` class

There are two ways to search with `Search()`:

1. The first is to pass the parameters directly to the class:

In [113]:
Resultados = Search(search = "Machine Learning+Deep Learning" , fields = "title,abstract,isOpenAccess,fieldsOfStudy", limit = "250", offset = "0")

Achou 651023 papers.


Api devolveu >> 250 papers

                                    paperId  ...       fieldsOfStudy
0  846ff7afb7670d62f88b4a8cc99d306ffb81b075  ...          [Medicine]
1  5dc53e50148b01fe8b9536eb79fa6b1dce924174  ...          [Medicine]
2  7cc2e148d27a7508dd23c4e35eb63cc9b3e6a58f  ...  [Computer Science]
3  59444b096f7c8a561d540102e8b5bfb189edabc6  ...                None
4  eee313380ccb45807ea0afa3c1df86f6b48b8867  ...  [Computer Science]

[5 rows x 5 columns]


In [114]:
# The results are stored in the data attribute, which is a pandas DataFrame
print(Resultados.data)

                                      paperId  ...                    fieldsOfStudy
0    846ff7afb7670d62f88b4a8cc99d306ffb81b075  ...                       [Medicine]
1    5dc53e50148b01fe8b9536eb79fa6b1dce924174  ...                       [Medicine]
2    7cc2e148d27a7508dd23c4e35eb63cc9b3e6a58f  ...               [Computer Science]
3    59444b096f7c8a561d540102e8b5bfb189edabc6  ...                             None
4    eee313380ccb45807ea0afa3c1df86f6b48b8867  ...               [Computer Science]
..                                        ...  ...                              ...
245  6718ed5f9267960034155faa709e24988eb89fcc  ...               [Computer Science]
246  16c0ef924da1f6b510c9c783ac764156f5a3d631  ...               [Computer Science]
247  067aa61f39d2489cb1efb29877144bd2e2a4b540  ...              [Biology, Medicine]
248  5817758f0c0fe5242239fa8fe32aba713c893e11  ...     [Computer Science, Medicine]
249  2581b3e44592b3b3741474c8d6f483a90c29f139  ...  [Computer Science, Mathe

2. The second is through the search form (search box), where the fields can be filled in:

In [115]:
Resultados_2 = Search()

Achou 651023 papers.

Api devolveu >> 10 papers

                                    paperId  ...       fieldsOfStudy
0  846ff7afb7670d62f88b4a8cc99d306ffb81b075  ...          [Medicine]
1  5dc53e50148b01fe8b9536eb79fa6b1dce924174  ...          [Medicine]
2  7cc2e148d27a7508dd23c4e35eb63cc9b3e6a58f  ...  [Computer Science]
3  59444b096f7c8a561d540102e8b5bfb189edabc6  ...                None
4  eee313380ccb45807ea0afa3c1df86f6b48b8867  ...  [Computer Science]

[5 rows x 5 columns]


In [118]:
print(Resultados_2.data)

                                    paperId  ...       fieldsOfStudy
0  846ff7afb7670d62f88b4a8cc99d306ffb81b075  ...          [Medicine]
1  5dc53e50148b01fe8b9536eb79fa6b1dce924174  ...          [Medicine]
2  7cc2e148d27a7508dd23c4e35eb63cc9b3e6a58f  ...  [Computer Science]
3  59444b096f7c8a561d540102e8b5bfb189edabc6  ...                None
4  eee313380ccb45807ea0afa3c1df86f6b48b8867  ...  [Computer Science]
5  46479bbea7749cb2db35b139206039531327053c  ...  [Computer Science]
6  b69fe5a837277ddbea5215d6bacd3a902e9d11ce  ...          [Medicine]
7  b0bf64ccbd651e8c7bc141d8aabaecff562e93a1  ...  [Computer Science]
8  042ab08ec6782cf217f13175162bfd48f7350114  ...  [Computer Science]
9  03e7832982986159400a8eeab148487ffcfabe56  ...  [Computer Science]

[10 rows x 5 columns]
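
Because the `data` attribute is a regular pandas DataFrame, the results can be filtered like any other table. The sketch below uses a small stand-in DataFrame with placeholder values (not real API output) and assumes the `isOpenAccess` and `fieldsOfStudy` fields were requested; `has_field` is a hypothetical helper.

```python
import pandas as pd

# Illustrative stand-in for Resultados.data; placeholder values only.
data = pd.DataFrame({
    "paperId": ["id-0", "id-1", "id-2"],
    "title": ["Paper A", "Paper B", "Paper C"],
    "isOpenAccess": [True, False, True],
    "fieldsOfStudy": [["Medicine"], None, ["Computer Science"]],
})

def has_field(fields, name):
    """Return True if `name` is listed; fieldsOfStudy may be None."""
    return isinstance(fields, list) and name in fields

# Keep only open-access papers.
open_access = data[data["isOpenAccess"]]

# Keep only papers tagged with a given field of study.
is_cs = data["fieldsOfStudy"].apply(lambda f: has_field(f, "Computer Science"))
cs_papers = data[is_cs]

print(open_access[["paperId", "title"]])
print(cs_papers[["paperId", "title"]])
```

The `None` check matters because some papers come back without any field of study, as the outputs above show.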
