# Basic querying of articles

1. Query articles
2. Store them in a pandas DataFrame. Some data processing is done at this step:
   1. Convert *authors*, *titles*, *abstract*, *journal* and *publisher* strings to lowercase.
   2. Convert *publication_date* to a proper datetime format.
   3. Convert columns that are actually strings to the pandas string dtype.
   4. Set all other columns to the correct Python data types.
3. Save to a *.csv* file.
4. Optionally save to *.xlsx* for manual marking.
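
The processing in step 2 can be sketched in plain pandas. This is only an illustration on made-up records, not artfinder's actual implementation: the `preprocess` helper, the sample values, and the choice of `int64` for the citation count are all assumptions.

```python
import pandas as pd

# Toy records standing in for query results (values are made up).
raw = pd.DataFrame(
    {
        "title": ["Laser Ablation In Liquids", "Nanoparticle Synthesis"],
        "journal": ["Applied Surface Science", "Langmuir"],
        "publication_date": ["2015-03-01", "2019-11-20"],
        "is_referenced_by_count": ["12", "7"],
    }
)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the processing steps listed above."""
    out = df.copy()
    # Cast text columns to the pandas string dtype, then lowercase them.
    for col in ("title", "journal"):
        out[col] = out[col].astype("string").str.lower()
    # Parse date strings into datetime64 values.
    out["publication_date"] = pd.to_datetime(out["publication_date"])
    # Give the remaining column a numeric dtype (assumed to be an integer count).
    out["is_referenced_by_count"] = out["is_referenced_by_count"].astype("int64")
    return out


clean = preprocess(raw)
print(clean.dtypes)
```

Casting to the string dtype before lowercasing keeps missing values as `<NA>` instead of turning them into the literal text `nan`.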

In [3]:
%load_ext autoreload
%autoreload 2

from artfinder import Crossref
import logging
import os

logging.basicConfig(level=logging.INFO)

crosref = Crossref(app='artfinder', email='aapopov1@mephi.ru')



## Make a search query and construct a pandas DataFrame with the results
* Valid fields for query are in Crossref.FIELDS_QUERY
* Valid fields for filter are in Crossref.FILTER_VALIDATOR
* Specifying the same filter several times results in OR semantics, while specifying different filters results in AND semantics
* [REST API documentation](https://github.com/CrossRef/rest-api-doc?tab=readme-ov-file#queries)
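
To make the OR/AND rule concrete, here is a rough sketch of how filter arguments could be flattened into the comma-separated `name:value` string that the Crossref REST API takes in its `filter` parameter. `build_filter` is a hypothetical helper written for this illustration. It is not part of artfinder, whose internals may differ.

```python
def build_filter(**filters) -> str:
    """Hypothetical helper: flatten keyword filters into a Crossref filter string."""
    parts = []
    for name, value in filters.items():
        # The REST API uses hyphenated filter names, e.g. from-pub-date.
        api_name = name.replace("_", "-")
        # A list of values repeats the same filter name (OR semantics).
        values = value if isinstance(value, (list, tuple)) else [value]
        parts.extend(f"{api_name}:{v}" for v in values)
    # Different filter names in the same string are combined with AND.
    return ",".join(parts)


filter_param = build_filter(
    from_pub_date="1993",
    type=["proceedings-article", "journal-article"],
)
print(filter_param)
# from-pub-date:1993,type:proceedings-article,type:journal-article
```

Read as a condition, this string means: published from 1993 onwards AND (type is proceedings-article OR type is journal-article).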



In [4]:
author_name = 'Barcikowski'
df = crosref.query(author=author_name).filter(from_pub_date='1993', type=['proceedings-article', 'journal-article']).get_df()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 548 entries, 0 to 547
Data columns (total 17 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   publisher               548 non-null    string        
 1   license                 348 non-null    object        
 2   is_referenced_by_count  548 non-null    object        
 3   link                    486 non-null    object        
 4   authors                 548 non-null    object        
 5   abstract                184 non-null    string        
 6   title                   548 non-null    string        
 7   doi                     548 non-null    string        
 8   type                    548 non-null    string        
 9   journal                 548 non-null    string        
 10  issn                    491 non-null    string        
 11  volume                  480 non-null    string        
 12  issue                   369 non-null    string    

## Save data

In [5]:
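path_to_save = os.path.join('database', author_name + '.csv')
# Create the target directory if it does not exist yet
os.makedirs(os.path.dirname(path_to_save), exist_ok=True)
df.to_csv(path_to_save, index=False)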
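
A CSV file does not store column dtypes, so the conversions done at query time have to be repeated when the file is read back. The following is a minimal sketch of such a round trip. It uses an in-memory buffer and a made-up one-row frame, and it assumes the *title* and *publication_date* column names.

```python
import io

import pandas as pd

# Made-up frame with the dtypes produced by the processing step.
df_demo = pd.DataFrame(
    {
        "title": pd.array(["laser ablation in liquids"], dtype="string"),
        "publication_date": pd.to_datetime(["2015-03-01"]),
    }
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)
buffer.seek(0)

# Restore the dtypes explicitly while reading.
restored = pd.read_csv(
    buffer,
    dtype={"title": "string"},
    parse_dates=["publication_date"],
)
print(restored.dtypes)
```

Without `dtype` and `parse_dates`, `read_csv` would infer the types from the text and the date column would come back as plain strings.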

### Save data for manual marking

In [None]:
path_for_processing = ''  # set this to the target .xlsx path before running
df[['title', 'abstract', 'doi']].to_excel(path_for_processing, index=False)