This repository has been archived by the owner. It is now read-only.



Project Discontinued

For additional context see:

Visualize CC Catalog Data


The landscape of openly licensed content is wide and varied. Millions of web pages host and share CC-licensed works—in fact, we estimate that there are over 1.6 billion across the web! With this growth of CC-licensed works, Creative Commons (CC) is increasingly interested in learning how hosts and users of CC-licensed materials are connected, as well as the types of content published under a CC license and how this content is shared. Each month, CC uses Common Crawl data to find all domains that contain CC-licensed content. This dataset contains information about the URL of the websites and the licenses used.

In order to draw conclusions and insights from this dataset, we created the Linked Commons: a visualization that shows how the Commons is digitally connected.

A live demo of the project can be found here.

Getting Started

Directory Structure

│   docker-compose.yml # Development docker compose
└───data-release # Contains some raw unprocessed tsv files and processed output JSON files
└───frontend # Contains react.js app to render the visualization in the browser.
│  │   .env # Contains Backend Server Base Endpoint
│  │   package.json
│  │   package.lock.json
│  │
│  └───src # Contains all React Components
└───backend # Includes Django server source code and scripts to build & update the database. 
   │   requirements.txt
   │   .env # Contains list of environment variables the project needs
   └───scripts # Contains scripts to parse JSON data and upload it to MongoDB server
   └───src # Contains server side Django Apps which defines the API that feeds data to the visualization 

Setting Up Local Development Environment Without Docker


The frontend application uses React, which requires Node.js v12+ and npm. Node.js can be installed from here.

The backend application uses Django, which requires Python v3.7+. Python can be installed from here.


Frontend

  1. Navigate to the frontend/ directory.
cd frontend/
  2. Install all dependencies (make sure that a package.json exists in the current path).
npm install
  3. To start the development server, run the following command in the terminal.
npm start
  4. To create an optimized build for production, run the following command in the terminal.
npm run build

Backend and Database

  1. Navigate to the backend/ directory.
cd backend/
  2. Before proceeding further, ensure that all the variables in the .env file are updated and MONGO_HOSTNAME is set to localhost:27017.
  3. Install all dependencies.
pip install -r requirements.txt
  4. Navigate to the src/ directory, where the Django server code lives.
cd src/
  5. To start the development server, use the following command.
python manage.py runserver
  6. The backend should now be live at localhost:8000.
  7. The server needs a running instance of MongoDB. Start the MongoDB server and ensure that the authentication credentials are exactly the same as those defined in the .env file. If you wish to update the data inside the database, see the "Add data to MongoDB" section.
  8. Happy Contributing to Linked Commons! 🚀🚀🚀
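The MONGO_HOSTNAME setting mentioned above differs between the local setup (localhost:27017) and the Docker setup (mongodb:27017), so it may be worth checking before starting the server. The following is a minimal sketch, assuming the .env file consists of plain KEY=VALUE lines; the helper names `parse_env` and `mongo_hostname_matches` are hypothetical and not part of the project.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching quotes.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def mongo_hostname_matches(env, expected):
    """Return True when MONGO_HOSTNAME equals the expected host:port string."""
    return env.get("MONGO_HOSTNAME") == expected


# Illustrative input only; a real check would read backend/.env instead.
sample = "# sample values\nMONGO_HOSTNAME=localhost:27017\n"
env = parse_env(sample)
print(mongo_hostname_matches(env, "localhost:27017"))  # local setup
print(mongo_hostname_matches(env, "mongodb:27017"))    # Docker setup
```

To check a real file, the sample string could be replaced with the contents of backend/.env, for example via `pathlib.Path("backend/.env").read_text()`.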

Setting Up Local Development Environment using Docker

  1. Make sure that the root directory contains docker-compose.yml, and ensure that the backend/.env file is updated and MONGO_HOSTNAME is set to mongodb:27017.
  2. Run the following command to build and start the containers.
docker-compose up
  3. The frontend, backend and database should now be live.
  4. If this is the first time you have built the containers, see the "Add data to MongoDB" section to learn how to add data to MongoDB.
  5. Any changes in backend/ and frontend/ will trigger a rebuild, and you will be able to see the changes on the server!
  6. Happy Contributing to Linked Commons! 🚀🚀🚀

Building production version

Important: For simplicity, we will use Docker to build the production version. Please note that any changes to project files after the build won't be reflected in the running container; you will need to rebuild the image.

  1. Before building the images, ensure that all the variables in the .env file are updated and MONGO_HOSTNAME is set to mongodb:27017.
  2. Navigate to the backend directory and build the django-backend image.
cd backend/
docker build . -f -t linked_commons/backend
  3. Create a new user-defined bridge network.
docker network create --driver=bridge linkedcommons-net
  4. Run the newly built linked_commons/backend image.
docker run --name backend \
   -p 8000:8000 --env-file ./.env \
   --network=linkedcommons-net \
   --rm -d linked_commons/backend
  5. Start the database in an isolated container.
docker run -it --name mongodb \
   --network=linkedcommons-net \
   -p 27017:27017 -v mongodbdata:/data/db \
   --env-file ./.env --rm -d mongo:4.0.8
  6. You can now access the backend at port 8000 and the database at port 27017 of localhost. If you wish to add data, see the "Add data to MongoDB" section.
  7. Now, let's build the frontend. Navigate to the frontend directory and build the react-frontend image.
cd frontend
docker build . -f  -t  linkedcommons/frontend
  8. To start the frontend application, run the following command.
docker run --name frontend \
   -p 3000:80 --rm -d linkedcommons/frontend
  9. The frontend can now be accessed at localhost:3000.
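Once the containers are running, it can help to confirm that the published ports actually accept connections. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper, and the port numbers are the ones published by the docker run commands above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports published by the docker run commands above.
    for name, port in [("frontend", 3000), ("backend", 8000), ("mongodb", 27017)]:
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port}: {state}")
```

A successful TCP connection only shows that something is listening on the port; it does not confirm that the service behind it is healthy.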

Add data to MongoDB

  1. Navigate to the scripts directory.
cd backend/scripts
  2. Ensure that the directory contains fdg_input_file.json, the file that will be uploaded to the database, or update the INPUT_FILE_PATH variable to point to a different file. A sample fdg_input_file.json can be found inside the data-release/ directory.
  3. Ensure that all the variables in the .env file match the running MongoDB server.
  4. Now run the build_db_script in the terminal.
# It will connect to the database at `localhost:27017` and update the data. 
python localhost
  5. This may take a while, depending on the JSON file size.
  6. Congrats! You have successfully updated the data. 🎉🎉🎉
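Because the upload time depends on the size of the JSON file, it may be useful to confirm that the input file parses and to see how large it is before running the script. The following is a minimal sketch; `summarize_json` is a hypothetical helper, and the sample keys are placeholders, since the actual schema of fdg_input_file.json is not described here.

```python
import json


def summarize_json(text):
    """Parse JSON text and return {top-level key: item count or type name}."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed input
    if not isinstance(data, dict):
        # Non-object roots are reported under a single placeholder key.
        return {"<root>": len(data) if isinstance(data, list) else type(data).__name__}
    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            summary[key] = len(value)
        else:
            summary[key] = type(value).__name__
    return summary


# Made-up sample; the keys here are placeholders, not the project's schema.
sample = '{"nodes": [{"id": "a"}, {"id": "b"}], "links": [{"source": "a", "target": "b"}]}'
print(summarize_json(sample))
```

For a real file, the sample string could be replaced with the contents of fdg_input_file.json, read with `open(...).read()`.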


GSoC2019 - Google Summer of Code project by María Belén Guaranda


Data visualizations of CC-licensed works across the internet.



