URL-Graph is a Neo4j project designed to store and manage relationships between URLs. This graph database allows you to model and query the connections between different web addresses, providing valuable insights into the structure of your web data.
Understanding how URLs relate to one another is central to analyzing web data. URL-Graph uses Neo4j to build a graph representation of these relationships so that they can be navigated and queried directly.
- Graph Database: Utilize Neo4j's powerful graph database to model and store URL relationships.
- Cypher Queries: Leverage the expressive Cypher query language to extract valuable insights from the graph.
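As a rough illustration of the kind of data the graph holds, the sketch below walks a small in-memory link map breadth-first and collects (parent, child) URL pairs up to a depth limit. It is not the project's crawler: the `collect_edges` function and the `LINKS` map are hypothetical, and the parent-to-child direction of each pair is an assumption.

```python
from collections import deque


def collect_edges(start_url, links, max_depth):
    """Breadth-first walk over `links`, returning (parent, child) URL pairs."""
    edges = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are not expanded further
        for child in links.get(url, []):
            edges.append((url, child))
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return edges


# Hypothetical link map standing in for pages that would be fetched.
LINKS = {
    "https://www.example.com": [
        "https://www.example.com/a",
        "https://www.example.com/b",
    ],
    "https://www.example.com/a": ["https://www.example.com/c"],
    "https://www.example.com/c": ["https://www.example.com/d"],
}

edges = collect_edges("https://www.example.com", LINKS, max_depth=2)
for parent, child in edges:
    print(f"{parent} -> {child}")
```

Each pair could then become one relationship between two URL nodes in Neo4j.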
Before you begin, ensure you have the following prerequisites installed:
- Neo4j Database (see the Neo4j download page)
- Python (see the Python download page)
- Clone the repository:
git clone https://github.com/KingAkeem/url-graph.git
- Install dependencies:
cd url-graph
pip install -r requirements.txt
Update the configuration file with your Neo4j connection details (config.yml):
neo4j:
  uri: bolt://localhost:7687
  username: your-username
  password: your-password
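For orientation, here is a minimal sketch of how these settings might be read and handed to the official Neo4j Python driver. The helper names `neo4j_settings` and `open_driver` are hypothetical, and the use of PyYAML is an assumption; the project's own loading code may differ.

```python
def neo4j_settings(config):
    """Return (uri, (username, password)) from a parsed config.yml mapping."""
    section = (config or {}).get("neo4j") or {}
    missing = [key for key in ("uri", "username", "password") if not section.get(key)]
    if missing:
        raise ValueError(f"config.yml is missing neo4j keys: {', '.join(missing)}")
    return section["uri"], (section["username"], section["password"])


def open_driver(path="config.yml"):
    """Load config.yml and open a driver (assumes PyYAML and neo4j are installed)."""
    import yaml  # PyYAML, assumed to be among the project's dependencies
    from neo4j import GraphDatabase

    with open(path, encoding="utf-8") as handle:
        uri, auth = neo4j_settings(yaml.safe_load(handle))
    driver = GraphDatabase.driver(uri, auth=auth)
    driver.verify_connectivity()  # fails fast on a bad URI or bad credentials
    return driver


# The parsed form of the configuration shown above.
example = {
    "neo4j": {
        "uri": "bolt://localhost:7687",
        "username": "your-username",
        "password": "your-password",
    }
}
uri, auth = neo4j_settings(example)
print(uri, auth[0])
```

Validating the three keys before connecting gives a clearer error than a failed connection attempt when a value is left blank.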
- Start the Neo4j database. How you start it depends on your operating system; see the Neo4j documentation for details.
- Execute the application:
python main.py -u https://www.example.com -d 3 # -u/--url sets the starting URL; -d/--depth sets the depth of the graph
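The two flags could be declared with `argparse` roughly as follows. This is only a sketch: the flag names come from the command above, while the help text, the required URL, and the default depth of 1 are assumptions rather than the behavior of the real `main.py`.

```python
import argparse


def build_parser():
    """Build a parser for the -u/--url and -d/--depth flags."""
    parser = argparse.ArgumentParser(
        description="Crawl a URL and store its link graph in Neo4j."
    )
    parser.add_argument("-u", "--url", required=True, help="starting URL")
    parser.add_argument(
        "-d",
        "--depth",
        type=int,
        default=1,  # assumed default; the project may use another value
        help="how many levels of links to follow",
    )
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args(["-u", "https://www.example.com", "-d", "3"])
print(args.url, args.depth)
```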
Docker support is planned.
Browser URL: http://localhost:7474/browser/
// Example Cypher Query to find relationships for a specific URL
MATCH (n:Node {url: 'https://example.com'})
-[relationship:parent]-()
RETURN n, relationship;
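The same query can also be run from Python. The sketch below binds the URL as a `$url` parameter instead of embedding it in the query text; `fetch_relationships` is a hypothetical helper, and `driver` is assumed to be a driver object created with the official `neo4j` package.

```python
# Same pattern as the query above, with the URL supplied as a parameter.
QUERY = (
    "MATCH (n:Node {url: $url})-[relationship:parent]-() "
    "RETURN n, relationship"
)


def fetch_relationships(driver, url):
    """Run QUERY for `url` and return the matching records as a list."""
    with driver.session() as session:
        result = session.run(QUERY, {"url": url})
        return list(result)  # consume the result before the session closes


print(QUERY)

# Usage, assuming a running database and valid credentials:
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687",
#                                 auth=("your-username", "your-password"))
#   for record in fetch_relationships(driver, "https://example.com"):
#       print(record["n"], record["relationship"])
#   driver.close()
```

Passing the URL as a parameter avoids quoting problems and lets Neo4j reuse the query plan across different URLs.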