pandas-marc is a lightweight Python library for working with MARC 21 bibliographic metadata using pandas dataframes. It uses pymarc for serializing to and deserializing from MARC.
Let's say we have a CSV file with some tabular bibliographic metadata:
MARC 007 | title |
---|---|
ta | Woman in the nineteenth century |
ta | The fire next time |
Using pandas, we can load in the CSV and prepare it for processing into MARC:
import pandas as pd
dataframe = pd.read_csv('marc_data.csv')
dataframe.columns = ['m007', 'm245']
dataframe['m245'] = dataframe['m245'].map(lambda title: f'$a{title}.')
dataframe['m245_indicators'] = '10'
Our dataframe now looks like this:
m007 | m245 | m245_indicators |
---|---|---|
ta | $aWoman in the nineteenth century. | 10 |
ta | $aThe fire next time. | 10 |
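
When a field needs more than one subfield, writing the string by hand in a lambda gets error-prone. The following is a minimal sketch of a helper that builds the delimited string from (code, value) pairs. `format_subfields` is a hypothetical name and is not part of pandas-marc; separating subfields with a single space is an assumption that mirrors the multi-subfield example in this README.

```python
def format_subfields(pairs, delimiter="$"):
    """Join (code, value) pairs into one delimited field string.

    Hypothetical helper for illustration only; not part of pandas-marc.
    """
    # Each subfield becomes delimiter + one-character code + value.
    return " ".join(f"{delimiter}{code}{value}" for code, value in pairs)


print(format_subfields([("a", "Woman in the nineteenth century.")]))
# $aWoman in the nineteenth century.

print(format_subfields([("a", "The fire next time /"), ("c", "James Baldwin.")]))
# $aThe fire next time / $cJames Baldwin.
```

A helper like this could be passed to `Series.map` in place of the lambda above.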
To convert the dataframe into a series of MARC records, we load it into an instance of pandas-marc's `MARCDataFrame` class. This allows us to generate pymarc `Record` objects from the dataframe rows:
from pandas_marc import MARCDataFrame
mdf = MARCDataFrame(dataframe)
for record in mdf.records:
    print(record.title())
Output:
Woman in the nineteenth century.
The fire next time.
`Record` objects may be readily serialized using pymarc's `MARCWriter` class:
from pymarc import MARCWriter
with open('marc_data.mrc', 'wb') as marc_file:
    writer = MARCWriter(marc_file)
    for record in mdf.records:
        writer.write(record)
pandas-marc uses the following delimiters by default:
- Subfields are delimited with a dollar sign, `$`. For example, here is a MARC 100 personal name field comprising an author name and relator term under subfields `a` and `e`: `$aZitkála-Šá, $eauthor.`
- Multiple field occurrences may be delimited with a backslash, `\`. For example, here is a series of delimited MARC 650 subject fields: `$aRhetoric.\$aSpeeches.`
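
To make the two delimiters concrete, here is a rough sketch of how a delimited cell could be broken into field occurrences and subfields. It illustrates the convention only and is not pandas-marc's actual parsing code; `split_field` is a hypothetical name, and stripping whitespace around subfield values is an assumption.

```python
def split_field(value, subfield_delimiter="$", occurrence_delimiter="\\"):
    """Split one delimited cell into occurrences of (code, value) pairs.

    Illustrative only; not how pandas-marc is necessarily implemented.
    """
    occurrences = []
    for occurrence in value.split(occurrence_delimiter):
        subfields = []
        # Text before the first delimiter has no subfield code, so skip it.
        for chunk in occurrence.split(subfield_delimiter)[1:]:
            if chunk:
                # The first character is the subfield code, the rest its value.
                subfields.append((chunk[0], chunk[1:].strip()))
        occurrences.append(subfields)
    return occurrences


# One occurrence with two subfields:
name = split_field("$aZitkála-Šá, $eauthor.")
# name == [[('a', 'Zitkála-Šá,'), ('e', 'author.')]]

# Two occurrences with one subfield each:
print(split_field("$aRhetoric.\\$aSpeeches."))
# [[('a', 'Rhetoric.')], [('a', 'Speeches.')]]
```

Note that the backslash has to be escaped (`"\\"`) in an ordinary Python string literal.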
Alternate delimiters can be specified using arguments to `MARCDataFrame`:
mdf = MARCDataFrame(
    dataframe=dataframe,
    occurrence_delimiter='|',
    subfield_delimiter='‡'
)
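
One reason to switch delimiters is that the source data may already contain a default delimiter character, for instance a dollar sign in a title. The sketch below checks raw values for such conflicts before any subfield codes are added. `find_delimiter_conflicts` is a hypothetical helper, not part of pandas-marc, and the second title is made-up sample data; how pandas-marc itself treats such values is not covered here.

```python
def find_delimiter_conflicts(values, delimiters=("$", "\\")):
    """Map each delimiter to the raw values that already contain it.

    Hypothetical helper for illustration only; not part of pandas-marc.
    """
    conflicts = {}
    for delimiter in delimiters:
        hits = [value for value in values if delimiter in value]
        if hits:
            conflicts[delimiter] = hits
    return conflicts


# The second title is invented sample data containing a dollar sign.
titles = ["Woman in the nineteenth century", "A $5 example title"]
print(find_delimiter_conflicts(titles))
# {'$': ['A $5 example title']}
```

An empty result suggests the default delimiters are safe for that column; otherwise, alternates such as `‡` and `|` avoid the clash.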
Download the latest stable release from the master branch and install with pip:
pip install pandas-marc.tar.gz
Run pandas-marc's test suite using pytest:
pytest tests/test_pandas_marc.py
# Or just...
pytest