Pirn is a data extraction and load tool. It extracts data from one or more source data stores and loads it into one or more targets. Use it to migrate data from one data store to another, or to synchronize data between data stores.
Key features:
- Configurable source queries to start from a base set of documents
- Supports multiple source and target databases, with support for different database types provided by plugins
- Copies related documents by following reference fields
npm install --save @pirn/pirn-core
import { Pirn } from '@pirn/pirn-core';
const pirn = new Pirn();
Clients are used to connect to the data stores. Depending on the type of data store, different clients are available and are installed separately. See Plugins for more information.
There are two types of clients: source and target. Source clients are used to extract data. Target clients are used to load data. A target client must have a `source` property that references the `clientId` of a source client. The `source` property must be set before calling `connect()`.
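The pairing rule between target and source clients can be checked before connecting. The sketch below works on plain configuration objects and is only an illustration: `findUnlinkedTargets` is a hypothetical helper, not part of Pirn, and it assumes each client exposes the `type`, `clientId`, and `source` properties described here.

```javascript
// Hypothetical helper (not part of Pirn): returns the clientIds of target
// clients whose `source` does not match the clientId of any source client.
function findUnlinkedTargets(clients) {
  const sourceIds = new Set(
    clients.filter((c) => c.type === 'source').map((c) => c.clientId)
  );
  return clients
    .filter((c) => c.type === 'target' && !sourceIds.has(c.source))
    .map((c) => c.clientId);
}

// Plain objects standing in for client instances.
const clientConfigs = [
  { type: 'source', clientId: 'remote-mongodb' },
  { type: 'target', clientId: 'local-couchdb', source: 'remote-mongodb' },
  { type: 'target', clientId: 'orphan-target', source: 'missing-source' },
];

const unlinked = findUnlinkedTargets(clientConfigs);
console.log(unlinked); // [ 'orphan-target' ]
```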
Each client must have a `clientId` property that is unique across all clients. If two clients have the same `clientId`, an error will be thrown. The `clientId` is used to reference the client in queries. It can be any string, but a descriptive name is recommended. Set the `clientId` before calling `connect()`; if it is not set, it will be set to a random string.
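The `clientId` rules above (random fallback, error on duplicates) can be sketched as follows. This is a minimal illustration on plain objects, not Pirn's implementation: `normalizeClientIds` is a hypothetical name, and the format of the random fallback id is an assumption.

```javascript
// Hypothetical helper (not part of Pirn): assigns a random clientId where one
// is missing, then throws if any clientId is used more than once.
function normalizeClientIds(configs) {
  const seen = new Set();
  return configs.map((config) => {
    // Assumption: any random string is acceptable as a fallback id.
    const clientId =
      config.clientId || `client-${Math.random().toString(36).slice(2, 10)}`;
    if (seen.has(clientId)) {
      throw new Error(`Duplicate clientId: ${clientId}`);
    }
    seen.add(clientId);
    return { ...config, clientId };
  });
}

const normalized = normalizeClientIds([
  { type: 'source', clientId: 'remote-mongodb' },
  { type: 'source' }, // no clientId: a random one is generated
]);
console.log(normalized.map((c) => c.clientId));
```

Setting a descriptive `clientId` explicitly remains preferable, since a generated id cannot be known in advance when writing queries.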
import { Client as MongodbClient } from '@pirn/pirn-plugin-mongodb';
import { Client as CouchdbClient } from '@pirn/pirn-plugin-couchdb';
const sourceClientA = new MongodbClient({
  type: 'source',
  clientId: 'remote-mongodb',
  db: {
    url: REMOTE_MONGODB_URL,
    name: 'mongodb-production',
  },
});
const targetClientA = new CouchdbClient({
  type: 'target',
  clientId: 'local-couchdb',
  source: sourceClientA.clientId, // <- This is the clientId of the source client defined above
  db: {
    url: LOCAL_COUCHDB_URL,
    name: 'couchdb-development',
  },
});
Depending on the type of data source, different options are available. See Client Options for more information.
Queries are used to fetch data from the clients. The results of the queries will be used to fetch related documents. See Queries for more information.
const initialQuery = {
  key: 'get-prod-products',
  clientId: sourceClientA.clientId,
  from: ['products'],
  where: {
    keys: ['_id'],
    operator: 'in',
    value: ['abc123-xyz4-1234-1234-1234567890ab', 'abc123-xyz4-1234-1234-1234567890ac'],
  },
};
const pirn = new Pirn();
// Write the JSON dump to results.json in the current working directory
pirn.setJSONDumpPath(`${process.cwd()}/results.json`);
pirn.addQueries([initialQuery]);
await pirn.addClients([sourceClientA, targetClientA]);
await pirn.connectAll();
await pirn.fetch();
await pirn.dump();
await pirn.disconnectAll();
- Node.js 14 or higher
Pirn is the main class that orchestrates the data extraction and load process. It is responsible for connecting to the data sources, fetching the data, and dumping the data to the target data sources or JSON file. It is the only class that needs to be instantiated.
Adds a client to the Pirn instance. The client must be added before calling `connect()`. The client must have a unique `clientId` property. See Client Options for more information. Client classes are available as plugins. See Plugins for more information.
Adds multiple `Client` instances to the Pirn instance. See `addClient()` for more information.
Closes the connection to the client and removes the client from the list of clients. Returns the list of clients.
Removes multiple clients from the Pirn instance.
Returns the client with the specified `clientId`.
Returns the list of clients.
Sets the path to the JSON dump file. If the path is not set, the JSON dump file will not be created. The path must be set before calling `dump()`. The path must be a valid path to a JSON file. If the file does not exist, it will be created. If the file exists, it will be overwritten. See dump() for more information.
Returns the path to the JSON dump file.
Adds a query to be executed. Depending on the query, it may be executed on all clients, or on a specific client. The query must be added before calling `fetch()`. Returns the `queries` array. See Queries for more information.
Adds multiple queries to be executed. See `addQuery()` for more information.
Removes a query from the list of queries. Returns the `queries` array.
Removes multiple queries from the list of queries. See `removeQuery()` for more information.
Returns the query with the specified `queryKey`.
Returns the list of queries.
Adds a field to ignore when fetching data from a client. The field will be ignored for all queries executed on the client. If no clientId is provided, the field will be ignored for all clients. The field must be added before calling `fetch()`. Returns the `ignoreFields` array. See Client Options for more information.
Example:
pirn.addIgnoreField('sourceA', 'password');
pirn.addIgnoreField('sourceB', 'password');
In the example above, the `password` field will be ignored for all queries executed on the `sourceA` and `sourceB` clients.
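To make the effect of an ignored field concrete, the sketch below strips ignored fields from a single document. It is an illustration only: `omitIgnoredFields` is a hypothetical helper, not a Pirn function, and it assumes that only top-level field names are matched.

```javascript
// Hypothetical helper (not part of Pirn): returns a copy of `doc` without the
// ignored fields. Assumption: only top-level field names are matched.
function omitIgnoredFields(doc, ignoreFields) {
  const ignored = new Set(ignoreFields);
  return Object.fromEntries(
    Object.entries(doc).filter(([field]) => !ignored.has(field))
  );
}

const user = { _id: 'u1', email: 'jane@example.com', password: 'secret' };
const sanitized = omitIgnoredFields(user, ['password']);
console.log(sanitized); // { _id: 'u1', email: 'jane@example.com' }
```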
Adds multiple fields to ignore when fetching data from a client or clients. See `addIgnoreField()` for more information.
Returns the list of fields to ignore for the specified client. If no clientId is provided, the list of fields to ignore for all clients will be returned.
Removes a field from the list of fields to ignore for the specified client. If no clientId is provided, the field will be removed from the list of fields to ignore for all clients. Returns the `ignoreFields` array.
Removes multiple fields from the list of fields to ignore for the specified client or clients. See `removeIgnoreField()` for more information.
`addIgnoreTable(clientId: string, table: string)` or `addIgnoreCollection(clientId: string, collection: string)`
Adds a table/collection to ignore when fetching data from a client. The table/collection will be ignored for all queries executed on the client. The table/collection must be added before calling `fetch()`. Returns the `ignoreTables` array. See Client Options for more information.
Example:
pirn.addIgnoreTable('sourceA', 'passwords');
pirn.addIgnoreCollection('sourceB', 'passwords');
In the example above, the `passwords` table/collection will be ignored for all queries executed on the `sourceA` and `sourceB` clients.
`addIgnoreTables(clientId: string, tables: string[])` or `addIgnoreCollections(clientId: string, collections: string[])`
Adds multiple tables/collections to ignore when fetching data from a client. See `addIgnoreTable()` or `addIgnoreCollection()` for more information.
Returns the list of tables/collections to ignore for the specified client. If no clientId is provided, the list of tables/collections to ignore for all clients will be returned.
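The per-client bookkeeping described above can be pictured as a small registry keyed by `clientId`. The sketch below is a stand-in, not Pirn's implementation: the class name `IgnoreTableRegistry`, the `filterFrom` method, the `sessions` collection, and the object shape returned when no `clientId` is given are all assumptions made for illustration.

```javascript
// Hypothetical in-memory registry (not Pirn's implementation): ignored
// tables/collections are stored per clientId.
class IgnoreTableRegistry {
  constructor() {
    this.byClient = new Map();
  }

  add(clientId, table) {
    if (!this.byClient.has(clientId)) {
      this.byClient.set(clientId, new Set());
    }
    this.byClient.get(clientId).add(table);
  }

  // With a clientId: that client's list. Without one: an object keyed by
  // clientId (assumed shape for "all clients").
  get(clientId) {
    if (clientId !== undefined) {
      return [...(this.byClient.get(clientId) || [])];
    }
    const all = {};
    for (const [id, tables] of this.byClient) {
      all[id] = [...tables];
    }
    return all;
  }

  // Drops ignored tables/collections from a query's `from` list.
  filterFrom(clientId, from) {
    const ignored = this.byClient.get(clientId) || new Set();
    return from.filter((table) => !ignored.has(table));
  }
}

const registry = new IgnoreTableRegistry();
registry.add('sourceA', 'passwords');
registry.add('sourceB', 'sessions');

console.log(registry.get('sourceA')); // [ 'passwords' ]
console.log(registry.filterFrom('sourceA', ['users', 'passwords'])); // [ 'users' ]
```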
Connects to all clients. This must be called before calling `fetch()`. Returns a promise that resolves when all clients are connected.
Connects to the specified client. This must be called before calling `fetch()`. Returns a promise that resolves when the client is connected.
Fetches the data from the clients. This must be called before calling `dump()`. Returns a promise that resolves when all data is fetched. This method could take a while to complete, depending on the amount of data being fetched.
Dumps the data to the target clients. This must be called before calling `disconnectAll()`. Returns a promise that resolves when all data is dumped. If the JSON dump path is set, the data will be dumped to the JSON file and target clients. If the JSON dump path is not set, the data will only be dumped to the target clients.
Disconnects from all clients. This must be called before calling `connect()` again. Returns a promise that resolves when all clients are disconnected.
Disconnects from the specified client. This must be called before calling `connect()` again. Returns a promise that resolves when the client is disconnected.
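The ordering constraints above (connect before fetch, fetch before dump, dump before disconnect) can be sketched as a single function. The `try`/`finally` wrapper is a design choice added here, not something Pirn requires, and `fakePirn` is a stand-in object that only mimics the method names so the sketch runs without a database.

```javascript
// Runs the documented call order against any object exposing the same method
// names as a Pirn instance. try/finally is an added design choice so that
// connections are closed even when fetch() or dump() rejects.
async function runPipeline(pirn) {
  await pirn.connectAll();
  try {
    await pirn.fetch();
    await pirn.dump();
  } finally {
    await pirn.disconnectAll();
  }
}

// Stand-in object used only to make the sketch runnable without a database.
const calls = [];
const fakePirn = {
  connectAll: async () => calls.push('connectAll'),
  fetch: async () => calls.push('fetch'),
  dump: async () => calls.push('dump'),
  disconnectAll: async () => calls.push('disconnectAll'),
};

const pipelineDone = runPipeline(fakePirn).then(() => {
  console.log(calls.join(' -> '));
  // connectAll -> fetch -> dump -> disconnectAll
});
```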
The following options can be set on a client:
Option | Type | Default | Description |
---|---|---|---|
`ignoreFields` | `string[]` | `[]` | An array of fields to ignore when fetching data. See addIgnoreField() for more information. |
`ignoreTables` | `string[]` | `[]` | An array of tables/collections to ignore when fetching data. See addIgnoreTable() or addIgnoreCollection() for more information. |
Queries are used to fetch data from the clients. Queries can be added to the Pirn instance before calling `fetch()`. See Pirn API for more information.
A query is an object that contains the following properties:
Property | Type | Description |
---|---|---|
`clientId` | `string` | The client ID to execute the query on. If not set, the query will be executed on all clients. |
`from` | `string[]` | An array of tables/collections to fetch data from. If not set, the query will be executed on all tables/collections. |
`where` | `string \| object` | A string or object that represents the WHERE clause of the query. If not set, the query will not fetch any data from the tables/collections. |
The `where` property of a query can be a string or an object. If it is a string, it will be used as the WHERE clause of the query. If it is an object, it will be used to build the WHERE clause of the query. The object can have the following properties:
Property | Type | Description |
---|---|---|
`keys` | `string[]` | An array of keys to use in the WHERE clause. |
`operator` | `string` | The operator to use in the WHERE clause. |
`value` | `string \| string[] \| object \| object[]` | The value to use in the WHERE clause. |
If the `where` property is an array of strings or objects, the query will be executed multiple times, once for each string or object in the array.
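One way to picture this behaviour is to split such a query into one query per `where` entry. The sketch below is illustrative only; `expandWhere` is a hypothetical helper and does not describe how Pirn schedules queries internally.

```javascript
// Hypothetical helper (not part of Pirn): splits a query whose `where` is an
// array into one query per entry; other queries are returned unchanged.
function expandWhere(query) {
  if (!Array.isArray(query.where)) {
    return [query];
  }
  return query.where.map((where) => ({ ...query, where }));
}

const expanded = expandWhere({
  from: ['users'],
  where: ["last_name LIKE '%Doe%'", { first_name: 'John' }],
});
console.log(expanded.length); // 2
```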
The following operators can be used in the `where` property of a query:
Operator | Description |
---|---|
`eq` | Equal to |
`like` | Like |
`in` | In array |
Support for more operators will be added in the future. Please be aware that a query that is too broad could take a long time to execute.
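As an illustration of how a plugin might interpret these operators, the sketch below turns a `where` object into a MongoDB-style filter using the standard `$in` and `$regex` query operators. It is not Pirn's implementation: `toMongoFilter`, `likeToRegex`, and `escapeRegex` are hypothetical names, and treating multiple `keys` as conditions that must all match is an assumption.

```javascript
// Escapes characters that have a special meaning in regular expressions.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Converts a SQL-style LIKE pattern, where % matches any run of characters,
// into an anchored regular expression source string.
function likeToRegex(pattern) {
  return `^${pattern.split('%').map(escapeRegex).join('.*')}$`;
}

// Hypothetical translation, not Pirn's implementation. Assumption: when
// several keys are given, every key must satisfy the condition.
function toMongoFilter({ keys, operator, value }) {
  const conditions = {
    eq: () => value,
    in: () => ({ $in: Array.isArray(value) ? value : [value] }),
    like: () => ({ $regex: likeToRegex(String(value)) }),
  };
  if (!Object.prototype.hasOwnProperty.call(conditions, operator)) {
    throw new Error(`Unsupported operator: ${operator}`);
  }
  const filter = {};
  for (const key of keys) {
    filter[key] = conditions[operator]();
  }
  return filter;
}

const inFilter = toMongoFilter({ keys: ['_id'], operator: 'in', value: ['a', 'b'] });
console.log(JSON.stringify(inFilter)); // {"_id":{"$in":["a","b"]}}
```

Rejecting unknown operators up front keeps an unsupported query from silently turning into one that matches everything.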
The following values can be used in the `where` property of a query:
Value | Description |
---|---|
`string` | A string value. |
`object` | An object value. |
`array` | An array of strings or objects. |
Only these value types are supported in order to limit the size of the initial query. If the query is too broad, it could take a long time to execute.
Fetch all documents from the `users` table/collection where the `last_name` field contains `Doe` and the `first_name` field is exactly `John` (case-sensitive).
const query = {
  from: ['users'],
  where: [
    "last_name LIKE '%Doe%'",
    { first_name: 'John' },
  ],
};
const query = {
  from: ['users'],
  where: { _id: ObjectId('abc123-xyz4-1234-1234-1234567890ab') },
};
Plugins can be created to support different data sources. Plugins are classes that extend the `Client` class. To create a plugin, see Contributing for more information.
Plugin | Description |
---|---|
@pirn/pirn-cli | CLI |
@pirn/pirn-plugin-mongodb | MongoDB client |
If you would like to contribute a plugin, please see Contributing for more information. If you would like to request a plugin, please open an issue.
Contributions are welcome. Please read the Contributing Guidelines for more information.
MIT