Imports the IMDb dataset into a Neo4j database.
To use this project, complete the following steps in the given order.
The project was tested with the following software requirements:
Please ensure that the required software is available on your system.
Under the license terms of the IMDb dataset, the source files used are not included in this repository. To use the
provided importer, place the files name.basics.tsv, title.principals.tsv, and title.basics.tsv in the resources folder
of this project. You can then limit the number of relations to be imported by changing the maxRelations parameter of
the importer bean in the neo4j-config. This lets you import only a subset of the whole dataset.
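The effect of a row limit such as maxRelations can be sketched as follows. This is only an illustration, not the importer's actual code: the class name SubsetReaderSketch and the method firstRows are hypothetical, and the sketch assumes that every data row of title.principals.tsv corresponds to one relation and that the first line of the file is a header.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of this repository.
public class SubsetReaderSketch {

    /**
     * Returns at most maxRelations data rows from a stream of TSV lines.
     * The first line is assumed to be the header and is skipped.
     */
    static List<String[]> firstRows(Stream<String> tsvLines, int maxRelations) {
        return tsvLines
                .skip(1)                          // skip the header line
                .limit(maxRelations)              // stop after the requested number of rows
                .map(line -> line.split("\t", -1)) // -1 keeps trailing empty fields
                .collect(Collectors.toList());
    }
}
```

Because the stream is evaluated lazily, passing the result of java.nio.file.Files.lines(...) for a large file reads only as many lines as the limit requires.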
The IMDb data is available at https://datasets.imdbws.com/, and the documentation of the interfaces can be found at https://www.imdb.com/interfaces/.
To import the data, a Neo4j database has to be running on your system. The connection configuration must be adapted in
the resources/neo4j-config.xml file.
The data is imported into the database with a CSV bulk import statement. To execute this statement, the CSV files are
exported to the database's import directory. For this, the Neo4j installation directory has to be defined in
neo4j-config.xml as a constructor argument (neo4jDatabasePath) of the TSV2CSV class.
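The TSV-to-CSV export step could look roughly like the following sketch. It is not the repository's TSV2CSV class: the class TsvToCsvSketch and its methods are hypothetical, and it assumes that fields are separated by tabs and that CSV fields containing commas, quotes, or line breaks are wrapped in double quotes. The target path (for example an import folder below the Neo4j installation directory) is likewise an assumption and depends on your Neo4j configuration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not the repository's TSV2CSV implementation.
public class TsvToCsvSketch {

    // Wraps a field in double quotes if it contains a comma, quote, or line break;
    // embedded quotes are doubled.
    static String escapeField(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Converts one tab-separated line into one comma-separated line.
    static String toCsvLine(String tsvLine) {
        String[] fields = tsvLine.split("\t", -1); // -1 keeps trailing empty fields
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                csv.append(',');
            }
            csv.append(escapeField(fields[i]));
        }
        return csv.toString();
    }

    // Streams the TSV file line by line and writes the CSV file,
    // creating the target directory if it does not exist yet.
    static void convert(Path tsvFile, Path csvFile) throws IOException {
        Files.createDirectories(csvFile.toAbsolutePath().getParent());
        try (BufferedReader in = Files.newBufferedReader(tsvFile, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(csvFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(toCsvLine(line));
                out.newLine();
            }
        }
    }
}
```

Processing the files line by line keeps memory usage low, which matters because the IMDb source files are large.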
To start the import process, run the main function in the Neo4jImdbMain class.
If you have any questions, please contact us: contact@aist.science.
If you use this repository in a research publication, we ask you to cite us: