Read the CSV file and parse the data.
Create sets for each node type (Artist, Genre, Country).
Parse relationships and create appropriate connections between nodes.
Here's a breakdown of how you can accomplish this:

1. Read and Parse the CSV File
We'll use Python's built-in csv module to read the CSV file. Each row in the file will be read and its fields will be extracted.

2. Create Sets for Node Types
Artists: Extract spotify_id, name, followers, and popularity.
Genres: Extract genres from the genres field.
Countries: Extract the two-letter (ISO2) country codes from chart_hits.

3. Parse Relationships
Artist-Genres: Create relationships between each artist and their genres.
Artist-Country-Hits: Create relationships between each artist and countries, including the number of hits as a property of the relationship.

Python Script Outline
import csv
import json

# Function to parse genres
def parse_genres(genres_str):
    return json.loads(genres_str.replace("'", '"'))

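# Function to parse chart hits
def parse_chart_hits(chart_hits_str):
    if chart_hits_str:
        # Strip brackets, quotes and the closing parenthesis so that the
        # hit count comes out as "1" rather than "1)"
        return [tuple(hit.strip("[]')").split(" (")) for hit in chart_hits_str.split(", ")]
    return []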

# Read CSV and process data
with open('your_file.csv', 'r', encoding='utf-8') as file:
    reader = csv.DictReader(file)

    artists = set()
    genres = set()
    countries = set()
    artist_genre_relations = set()
    artist_country_relations = set()

    for row in reader:
        # Add artist
        artists.add((row['spotify_id'], row['name'], row['followers'], row['popularity']))

        # Process and add genres
        for genre in parse_genres(row['genres']):
            genres.add(genre)
            artist_genre_relations.add((row['spotify_id'], genre))

        # Process and add countries and hits
        for country, hits in parse_chart_hits(row['chart_hits']):
            countries.add(country)
            artist_country_relations.add((row['spotify_id'], country, hits))

# At this point, you have all the sets filled with data
# You can now proceed to create nodes and relationships in Neo4j

Creating Nodes and Relationships in Neo4j
You can use the Neo4j Python driver to create nodes and relationships in your Neo4j database. You'll have to iterate over the sets created and use appropriate Cypher queries to insert the data into Neo4j.
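As one possible way to prepare the sets for that step, the sketch below turns a set of tuples into a list of dictionaries, so that a Cypher UNWIND can refer to fields by name instead of by position. The helper name `to_rows` and the sample values are illustrative assumptions and are not part of the script above.

```python
def to_rows(records, keys):
    """Convert an iterable of tuples into a list of dicts keyed by `keys`."""
    return [dict(zip(keys, record)) for record in records]

# Hypothetical sample data in the same shape as the `artists` set
sample_artists = {("id_1", "Artist One", 100.0, 10)}
rows = to_rows(sample_artists, ["spotify_id", "name", "followers", "popularity"])
print(rows)
```

Passed as a query parameter, each row could then be read inside the UNWIND as `artist.spotify_id`, `artist.name`, and so on.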

Note
Ensure your CSV file path is correct in the open function.
This script assumes your CSV format is consistent with the example you provided.
You'll need to install the Neo4j Python driver (neo4j) to interact with your Neo4j database.
The script above is a basic implementation. You might need to adjust it based on the specifics of your dataset and Neo4j schema.
This script should give you a good starting point for creating your import script. Let me know if you need further assistance or more specific details!

In [32]:
import ast
import csv

# Function to parse genres
def parse_genres(genres_str):
    # Strip whitespace and check for empty genres
    genres_str = genres_str.strip()
    if genres_str in ["[]", ""]:
        return []

    try:
        # The field is written as a Python-style list literal; literal_eval
        # handles both quote styles, so genre names containing an apostrophe
        # are kept instead of being dropped by a quote-replacement step
        return ast.literal_eval(genres_str)
    except (ValueError, SyntaxError):
        # Fall back to no genres if the field is malformed
        return []

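# Function to parse chart hits
def parse_chart_hits(chart_hits_str):
    # Strip whitespace and check for empty chart hits
    chart_hits_str = chart_hits_str.strip()
    if chart_hits_str in ["[]", ""]:
        return []

    # Split each entry into its country and hit-count parts
    pairs = [hit.strip("[]'").strip(")").split(" (") for hit in chart_hits_str.split(", ")]
    # Interpret the number of hits as an integer
    return [(country, int(count)) for country, count in pairs]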

# Read CSV and process data
with open('datasets/spotify/nodes.csv', 'r', encoding='utf-8') as file:
    reader = csv.DictReader(file)

    artists = set()
    genres = set()
    countries = set()
    artist_genre_relations = set()
    artist_country_relations = set()

    for row in reader:
        # Add artist
        artists.add((row['spotify_id'], row['name'], float(row['followers']), int(row['popularity'])))

        # Process and add genres
        for genre in parse_genres(row['genres']):
            genres.add(genre)
            artist_genre_relations.add((row['spotify_id'], genre))

        # Process and add countries and hits
        for country, hits in parse_chart_hits(row['chart_hits']):
            countries.add(country)
            artist_country_relations.add((row['spotify_id'], country, hits))

# At this point, you have all the sets filled with data
# You can now proceed to create nodes and relationships in Neo4j
print(artists)


{('5I82NM6jN4Y267iHwVeNR9', 'Nata Record', 188.0, 12), ('5UOOgRWguRmVZo1voJuQpf', 'Orgânico', 196759.0, 54), ('6MlVGjgieHwMJCPBjU41dN', 'Maciej Musiałowski', 18644.0, 32), ('4kwEd1P9j15ZqUVP5zK7Pv', 'Joe Stone', 32782.0, 55), ('3zx1v6xCo7VE8vxhhyqr5Y', 'Brady Watt', 6118.0, 37), ('2UhQHcRXyVTc6IO0bFIRh3', 'DJ Abdel', 30064.0, 43), ('6sYR8JTRUUUzSD9IydLhfG', 'Guizmo', 528654.0, 52), ('4KJ6jujcNPzOyhdNoiNftp', 'lovelytheband', 360970.0, 61), ('49801AhfB2xOoOqtRZXv4H', 'Klikkmonopolet', 1505.0, 44), ('0WleeEe3UurwlNbDGhb5Yz', 'ALLMO$T', 821638.0, 53), ('5TKat8l0kj32ZBTActy8U6', 'Zlyj Reper Zenyk', 5111.0, 34), ('7EuXVmTcFfpvmFbi1CTctP', 'Kuningasidea', 19686.0, 33), ('0UtXMxHMXhwQUI6G6TFDt1', 'Gin Lee', 60282.0, 45), ('1usmBbXRC0bMPPUaaebBVw', 'Jayboogz x NAVI', 1434.0, 30), ('5t8dw8yCWNAezW6wP3ZOGh', 'Alex Kunnari', 4341.0, 33), ('0YMeriqrS3zgsX24nfY0F0', 'The Tragically Hip', 849612.0, 59), ('1vSN1fsvrzpbttOYGsliDr', 'Tori Kelly', 2052997.0, 67), ('77QIEno3j2L5WkrHkh2OnP', 'Sir Mich', 3
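The chart_hits parsing above relies on chained strip and split calls. As an alternative sketch, a regular expression can pull out the pairs directly and returns an empty list for empty or bracket-only input. The function name `parse_chart_hits_regex` and the sample string are assumptions; the entry shape (a country code followed by the count in parentheses) is inferred from the parsing code, not confirmed against the dataset.

```python
import re

# Assumed entry shape: a country code, a space, then the hit count in parentheses
HIT_PATTERN = re.compile(r"(\w+) \((\d+)\)")

def parse_chart_hits_regex(chart_hits_str):
    """Return (country, hits) pairs found in the string, with hits as int."""
    return [(country, int(count)) for country, count in HIT_PATTERN.findall(chart_hits_str)]

# Hypothetical sample value, not taken from the dataset
sample = "['us (1)', 'gb (12)']"
print(parse_chart_hits_regex(sample))
print(parse_chart_hits_regex("[]"))
```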

To add the data from your edges.csv file, which represents featuring relationships between artists, into a new set, we can follow these steps:

Read the edges.csv file: Similar to how we read the other CSV file, we'll use Python's csv module.

Create a Set for Featurings: This set will hold tuples representing the featuring relationships between two artists, identified by their id_0 and id_1 values from the CSV file.

In [25]:
import csv

# Initialize a set to store featuring relationships
featurings = set()

# Read the edges file (here a reduced copy, edges_ridotto.csv)
with open('datasets/spotify/edges_ridotto.csv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file)
    next(reader)  # Skip the header row

    for row in reader:
        if len(row) == 2:  # Ensure the row has exactly two elements
            id_0, id_1 = row
            featurings.add((id_0, id_1))

# At this point, the 'featurings' set contains all the featuring relationships
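Since the import script looks up both artists with MATCH before creating a FEATURING relationship, a pair whose id is missing from the nodes file would produce no relationship and no error. A minimal sketch for spotting such pairs beforehand is shown below; the helper name `split_known_edges` and the sample ids are hypothetical.

```python
def split_known_edges(edges, known_ids):
    """Separate edges whose endpoints are both known from those that are not."""
    usable = {edge for edge in edges if edge[0] in known_ids and edge[1] in known_ids}
    return usable, set(edges) - usable

# Hypothetical ids for illustration
artist_ids = {"a", "b", "c"}
sample_edges = {("a", "b"), ("a", "x")}
usable, dangling = split_known_edges(sample_edges, artist_ids)
print(len(usable), len(dangling))
```

In the notebook, the known ids could be taken from the first element of each tuple in `artists`, for example `{artist[0] for artist in artists}`.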

Next, we import the neo4j library and write the script that adds the nodes and relationships.

In [33]:
from neo4j import GraphDatabase

# Define the Neo4j connection details
uri = "bolt://localhost:7687"
username = "neo4j"
password = "password"

# Connect to the Neo4j database
driver = GraphDatabase.driver(uri, auth=(username, password))

# Function to create nodes and relationships
def create_nodes_and_relationships():
    with driver.session() as session:
        # Create nodes
        session.run("""
            UNWIND $artists AS artist
            MERGE (a:Artist {spotify_id: artist[0]})
            SET a.name = artist[1], a.followers = artist[2], a.popularity = artist[3]
        """, artists=list(artists))

        session.run("""
            UNWIND $genres AS genre
            MERGE (g:Genre {name: genre})
        """, genres=list(genres))

        session.run("""
            UNWIND $countries AS country
            MERGE (c:Country {name: country})
        """, countries=list(countries))

        # Create relationships
        session.run("""
            UNWIND $artist_genre_relations AS rel
            MATCH (a:Artist {spotify_id: rel[0]})
            MATCH (g:Genre {name: rel[1]})
            MERGE (a)-[:HAS_GENRE]->(g)
        """, artist_genre_relations=list(artist_genre_relations))

        session.run("""
            UNWIND $artist_country_relations AS rel
            MATCH (a:Artist {spotify_id: rel[0]})
            MATCH (c:Country {name: rel[1]})
            MERGE (a)-[r:HAS_HIT]->(c)
            SET r.hits = rel[2]
        """, artist_country_relations=list(artist_country_relations))

        session.run("""
            UNWIND $featurings AS rel
            MATCH (a:Artist {spotify_id: rel[0]})
            MATCH (b:Artist {spotify_id: rel[1]})
            MERGE (a)-[:FEATURING]->(b)
        """, featurings=list(featurings))

# Call the function to create nodes and relationships, then close the driver
create_nodes_and_relationships()
driver.close()
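Each query above sends an entire set as a single parameter. For larger files it may be preferable to send the rows in smaller batches; the sketch below shows one way to split a collection, where the helper name `chunked` and the batch size are assumptions.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]

batches = list(chunked(range(5), 2))
print(batches)  # [[0, 1], [2, 3], [4]]
```

Each batch could then be passed to `session.run` in place of the full list, for example with `featurings=batch`.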
