importSQL

A configurable and reusable Python script to import data from an import.io extractor into an SQL database

Dependencies

You will need to install the following:

  1. Python
  2. python-mysqldb (used as an SQL client only)
  3. A running SQL server (SQLite, MySQL, etc.)

Running it

There are a few ways to run the script, but first you need to have your database and table ready.

Run the following commands to set up the demo table:

mysql>

CREATE DATABASE ILOVEOWLS 
    DEFAULT CHARACTER SET utf8 
    DEFAULT COLLATE utf8_general_ci;
    
USE ILOVEOWLS;

CREATE TABLE OMFGowls
(
field_image varchar(255),
field_name varchar(255),
field_price varchar(255),
field_size varchar(255),
field_colour varchar(255)
) DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;
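With the demo table in place, rows are inserted through python-mysqldb, which uses %s placeholders. The helper below is a sketch of how such an INSERT might be composed, not the script's actual code:

```python
# A sketch (not importSQL's actual code) of composing a parameterized
# INSERT for the demo table, in the style python-mysqldb expects.
def build_insert(table, row):
    """Build a parameterized INSERT statement and its value list from a dict."""
    cols = sorted(row)  # fixed order so statement and values line up
    placeholders = ", ".join(["%s"] * len(cols))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, ", ".join(cols), placeholders)
    return sql, [row[c] for c in cols]

sql, values = build_insert("OMFGowls", {"field_name": "Barn Owl", "field_price": "9.99"})
# sql    → "INSERT INTO OMFGowls (field_name, field_price) VALUES (%s, %s)"
# values → ["Barn Owl", "9.99"]
```

The pair would then be passed to `cursor.execute(sql, values)` so the driver handles quoting and escaping.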

Timestamps

To add a timestamp field to an already existing table, use the following command:


ALTER TABLE OMFGowls ADD COLUMN timestamp BIGINT;
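The column is a BIGINT, which suggests a millisecond Unix timestamp (whether importSQL uses milliseconds or seconds is an assumption here); a sketch of how the value might be produced:

```python
import time

def current_timestamp():
    # Milliseconds since the Unix epoch; fits comfortably in a BIGINT column.
    # Millisecond (rather than second) precision is an assumption.
    return int(time.time() * 1000)

row = {"field_name": "Barn Owl"}
row["timestamp"] = current_timestamp()  # added when addTimestamp is enabled
```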

Using a config file

  • Using a config file, you can set up a configuration once and re-use it to get the latest data into your database table

  • Make sure you know your import.io user credentials

  • Create a config.json file that follows this pattern:

{
	// Config for import.io
	"sourceUUID": "94cdc938-c24e-42db-b94f-3fb852c450a9",
	"inputUrls": ["http://owlkingdom.com","http://owlkingdom.com"],
	"ioAPIKey": "[your API key]",

	// Config for mysql
	"table": "OMFGowls",
	"database": "ILOVEOWLS",
	"host": "localhost",
	"port": 3306,
	"username": "root",
	"password": "root",
	"addTimestamp": true,
	"mapping": {
		"image": "field_image",
		"name": "field_name",
		"price": "field_price",
		"size": "field_size",
		"colour": "field_colour"
	}
}
  • Put it in the same directory as your importSQL script.
  • Run it: importSQL [optional:-U [sql username] -P [sql password] -p [io API key]]

The JSON file above will grab the owls from Owl Kingdom and put them into your SQL table.
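One caveat: strict JSON does not allow // comments, so `json.loads` will reject the file exactly as shown above. Either delete the comment lines from your real config.json, or strip them before parsing, as in this sketch (the helper is hypothetical, not part of the project):

```python
import json
import re

def load_config(path):
    """Load config.json, tolerating whole-line // comments like those above."""
    with open(path) as f:
        text = f.read()
    # Remove lines that contain only a // comment; URLs such as "http://..."
    # inside string values are untouched because they never start a line.
    text = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)
    return json.loads(text)
```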

Mapping

The mapping field defines the mapping between the column names in import.io and the column names in your MySQL database.

"mapping": {
   	// import.io	// MySQL
	"image": 	"field_image",
	"name": 	"field_name",
	"price": 	"field_price",
	"size": 	"field_size",
	"colour": 	"field_colour"
}
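Conceptually, each result row from import.io is renamed through this mapping before being inserted; a sketch (the helper name is illustrative, not from the script):

```python
def apply_mapping(row, mapping):
    """Rename import.io column names to MySQL column names; unmapped keys are dropped."""
    return dict((sql_col, row[io_col])
                for io_col, sql_col in mapping.items() if io_col in row)

mapping = {"name": "field_name", "price": "field_price"}
row = {"name": "Barn Owl", "price": "9.99", "not_mapped": "dropped"}
apply_mapping(row, mapping)
# → {"field_name": "Barn Owl", "field_price": "9.99"}
```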

Getting Data from Crawlers

To get all the data from your crawl from import.io into your SQL database, you can use the -c option to turn on crawler mode. This will get the data from the last crawl snapshot rather than query the crawler directly with an input URL.

(If you have settings in a config file, they will be loaded, but overwritten by anything you supply on the command line.)

importSQL -c -s "your crawler guid" [optional:-U [sql username] -P [sql password] -p [io API key]]

Using commandline options

  • When using just commandline options, be aware that the script will assume that the column names from import.io match the column names in MySQL

Here is the list of commandline options you can use:

  • -U mysql username (default: root)
  • -P mysql password (default: root)
  • -H mysql host name (default: localhost)
  • -E mysql port number (default: 3306)
  • -t mysql table name
  • -d mysql database name
  • -T add timestamp
  • -p import.io APIKey
  • -s source UUID
  • -i input urls for data source (can use multiple here)
  • -c flag to tell if the source you want data from is an uploaded crawl
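The option set above could be wired up with argparse along these lines (a sketch under the documented flags; the script's actual parser may differ):

```python
import argparse

# Flags and defaults mirror the documented option list above.
parser = argparse.ArgumentParser(description="Import import.io data into an SQL database")
parser.add_argument("-U", dest="username", default="root", help="mysql username")
parser.add_argument("-P", dest="password", default="root", help="mysql password")
parser.add_argument("-H", dest="host", default="localhost", help="mysql host name")
parser.add_argument("-E", dest="port", type=int, default=3306, help="mysql port number")
parser.add_argument("-t", dest="table", help="mysql table name")
parser.add_argument("-d", dest="database", help="mysql database name")
parser.add_argument("-T", dest="timestamp", action="store_true", help="add timestamp")
parser.add_argument("-p", dest="api_key", help="import.io API key")
parser.add_argument("-s", dest="source", help="source UUID")
parser.add_argument("-i", dest="urls", action="append",
                    help="input url for data source (repeatable)")
parser.add_argument("-c", dest="crawler", action="store_true",
                    help="the source you want data from is an uploaded crawl")

# Example: crawler mode against a source UUID, with one input url
args = parser.parse_args(["-c", "-s", "abc-123", "-i", "http://owlkingdom.com"])
```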