GSoC 2024


CVAT Google Summer of Code 2024

GSoC 2024 Homepage

CVAT accepted projects


| Date | Description | Comment |
|---|---|---|
| February 6, 2024 | Mentoring organization application deadline | done |
| April 2, 2024 | GSoC contributors submit their proposals through the program website | in progress |
| April 23, 2024 | Review all submitted GSoC Contributor proposals | new |
| April 24, 2024 | Deadline to submit ranked slot requests (Org Admins enter requests) | new |
| April 29, 2024 | Google Program Admins review and assign org slots | new |
| April 30, 2024 | Organizations receive notification of their accepted GSoC 2024 Contributors | new |
| May 1, 2024 | Accepted GSoC 2024 Contributor projects are announced | new |
| May 1 - 26, 2024 | Community Bonding Period | new |
| May 25, 2024 | Deadline to notify Google Admins of an inactive GSoC Contributor | new |
| May 27, 2024 | Coding begins | new |

Resources

CVAT project ideas list

Mailing list for discussion: cvat-gsoc-2024 mailing list

Index to Ideas Below

  1. Load and visualize 16-bit medical images
  2. Keyboard shortcuts customization
  3. Quality control: consensus
  4. Internationalization and localization
  5. Enhanced multi-object tracking
  6. Annotate everything automatically
  7. API keys and token-based auth for SDK and CLI
  8. Quality control: general improvements
  9. Add or extend support for import-export formats (e.g. YOLOvN)

Idea Template

All work is in Python and TypeScript unless otherwise noted.


Ideas

  1. IDEA: Load and visualize 16-bit medical images

    • Description: All digital projection X-ray images in DICOM use more than 8 bits per pixel and are therefore encoded in two bytes, even if not all 16 bits are used. Right now CVAT converts 16-bit images into 8-bit ones. For medical images this loses important information and makes it impossible to annotate such data efficiently: a doctor has to adjust the contrast of some regions manually to annotate such visual data. (A minimal conversion sketch is included after the ideas list.)
    • Expected Outcomes:
      • Upload digital projection X-ray in DICOM and convert it to 16-bit PNG.
      • Visualize 16-bit PNG image in the browser using WebGL.
      • Implement brightness, inversion, contrast, and saturation adjustments using WebGL.
      • Import/Export datasets in CVAT format.
      • Add functional tests and documentation
    • Resources:
    • Skills Required: Python, TypeScript, WebGL
    • Possible Mentors: Boris Sekachev
    • Difficulty: Hard
    • Duration: 350 hours
  2. IDEA: Keyboard shortcuts customization

    • Description: In many cases, to achieve a good annotation speed, users need to use the mouse, keyboard, and other input devices effectively. One way is to customize keyboard shortcuts and adapt them to a specific use case. For example, if you have several labels in your task, it can be important to assign a shortcut to each label so you can switch between them quickly and annotate faster. Other users want to lock/unlock an object quickly.
    • Expected Outcomes:
      • It should be possible to configure shortcuts in settings and save them per user.
      • Add functional tests and documentation
    • Resources:
    • Skills Required: TypeScript, React
    • Possible Mentors: Maria Khrustaleva, Kirill Lakhov
    • Difficulty: Medium
    • Duration: 175 hours
  3. IDEA: Quality control: consensus

    • Description: If you use a crowd to annotate images, the easiest way to get high-quality annotations for a task is to have the same image annotated multiple times. After that you can compare the labels from multiple annotators to produce a high-quality result. Let's say you are trying to estimate people's ages: the task is very subjective, and an answer averaged over multiple annotators gives a more precise estimate for a person. (A merging sketch is included after the ideas list.)
    • Expected Outcomes:
      • It should be possible to create multiple jobs for the same segment of images (https://github.com/opencv/cvat/issues/125)
      • Support a number of built-in algorithms to merge annotations for a segment: voting, averaging, raw (put all annotations as is)
      • Add functional tests and documentation
    • Resources:
    • Skills Required: Python, Django
    • Possible Mentors: Maxim Zhiltsov, Maria Khrustaleva
    • Difficulty: Medium
    • Duration: 350 hours
  4. IDEA: Internationalization and localization

    • Description: Typical users of CVAT are data annotators from different countries who may not know English well. It is very difficult for them to work with a tool that cannot show messages and hints in their native language. The goal of internationalization and localization is to allow a single web application to offer its content in languages and formats tailored to the audience.
    • Expected Outcomes:
      • CVAT supports one more language. It should be easy to add a new language for a non-technical person.
      • It should be possible to choose a language in UI (e.g., en/fr).
      • Add functional tests and documentation
    • Resources:
    • Skills Required: Python, TypeScript
    • Possible Mentors: Andrey Zhavoronkov, Kirill Lakhov
    • Difficulty: Hard
    • Duration: 350 hours
  5. IDEA: Enhanced multi-object tracking

    • Description: CVAT supports tracks (i.e. objects that are annotated across a range of frames, e.g. a person walking in a video). It would be nice to develop a feature that tracks a segmentation mask automatically using modern deep learning approaches. Currently the tool only supports single-object trackers, which consumes a huge amount of time when users run a tracker for many objects. Moreover, it supports only bounding boxes and can't be used for more complex shapes (e.g. polygons or binary masks). (A sketch of the current per-object loop is included after the ideas list.)
    • Expected Outcomes:
      • The user uploads a video to CVAT and initiates the automatic tracking process through the user interface (by drawing a bounding box or polygon around the object, or by pressing a dedicated button). A server-side algorithm performs tracking over multiple frames and returns the result to the client, so labeling speed is accelerated significantly.
    • Resources:
    • Skills Required: Python, Computer Vision, Neural Networks, TypeScript
    • Possible Mentors: Boris Sekachev, Nikita Manovich
    • Difficulty: Medium
    • Duration: 175 hours
  6. IDEA: Annotate everything automatically

    • Description: The idea is to get instance segmentation for an image automatically, for a wide range of classes. That may be achieved by using state-of-the-art deep learning approaches (e.g. the combination of Grounding DINO and Segment Anything). These models may be integrated into CVAT to provide a powerful automatic annotation feature and allow data researchers to accelerate their annotation speed. (An illustrative sketch is included after the ideas list.)
    • Expected Outcomes:
      • The user uploads a set of images to CVAT. For a selected image the user may give a text prompt to the model or just click a button in the user interface to get automatic predictions. The deep learning model runs on the server on a GPU.
    • Resources:
    • Skills Required: Python, Computer Vision, Neural Networks, TypeScript
    • Possible Mentors: Boris Sekachev, Nikita Manovich
    • Difficulty: Medium
    • Duration: 175 hours
  7. IDEA: API keys and token-based auth for SDK and CLI

    • Description: Currently, the only official way to authorize in the SDK/CLI is by providing your username and password in the requests. This approach works, but it has security issues. The idea is to provide an option for a user to generate and manage API access keys. Such a key could be used as a replacement for the login/password pair. (A client-side usage sketch is included after the ideas list.)
    • Expected Outcomes:
      • Users can generate API access tokens in the account settings in UI
      • Users can revoke existing API access tokens in the account settings
      • Users can call API endpoints providing API access tokens
      • A token can be stored in the user profile files on their computer
      • A token can be used for auth in SDK/CLI
    • Resources:
    • Skills Required: Python, Django, TypeScript, React
    • Possible Mentors: Maxim Zhiltsov, Roman Donchenko, Andrey Zhavoronkov
    • Difficulty: Medium
    • Duration: 175 hours
  8. IDEA: Quality control: general improvements

    • Description: CVAT supports basic quality measurement for tasks, but there are many ways it can be improved, both user-requested and our own ideas. This includes, but is not limited to:
      • an option to launch quality computation explicitly from the UI (currently it's only computed periodically in the background)
      • better display for the computed metrics - e.g. showing confusion matrix in UI (and suggesting possible problems and fixes - top bad images per error type, per class etc.), per-annotation type metrics
      • better display for the available settings - more clear descriptions for the specific settings, visualizations for the parameters (e.g. OKS sigma, IoU threshold)
      • quality computation for projects (currently, only tasks and jobs are supported)
      • more metrics for computation (different tasks may have different targets - MCC, F1, pixel-level metrics for segmentation etc.)
      • more convenient display and navigation for quality conflicts, filtering
      • UI for manual selection of Ground Truth frames (currently, only random selection is available in UI)
    • Expected Outcomes:
      • Functionality is implemented within the selected scope (as discussed with the participant)
      • Add functional tests and documentation
    • Resources:
    • Skills Required: TypeScript, React, Python, Django, Machine Learning, Statistics
    • Possible Mentors: Maxim Zhiltsov, Maria Khrustaleva, Boris Sekachev, Kirill Lakhov
    • Difficulty: Medium
    • Duration: 350 hours
  9. IDEA: Add or extend support for import-export formats (e.g. YOLOvN)

    • Description: CVAT already supports a number of popular dataset formats, but some of the more recent ones may be missing. One example is YOLOv8, which is popular at the moment and has been requested by the community several times. (A sketch of the core export step is included after the ideas list.)
    • Expected Outcomes:
      • The format is available for import and export in CVAT
      • Format documentation is available online
    • Resources:
    • Skills Required: Python, Deep Learning, Computer Vision
    • Possible Mentors: Maxim Zhiltsov
    • Difficulty: Easy
    • Duration: 175 hours
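
The sketch below illustrates the first step of idea 1 (converting a digital projection X-ray DICOM file into a 16-bit PNG) without collapsing the dynamic range to 8 bits. It assumes pydicom, NumPy, and Pillow are available; the file names are placeholders, and a real implementation would live in the CVAT server code.

```python
# Idea 1 sketch: convert a DICOM X-ray into a 16-bit PNG, preserving precision
# so that brightness/contrast (window/level) can later be applied in WebGL.
import numpy as np
import pydicom
from PIL import Image

ds = pydicom.dcmread("chest_xray.dcm")       # placeholder input file
pixels = ds.pixel_array.astype(np.float32)   # stored values are often 10-14 bits

# Stretch the stored range to the full 16-bit range instead of truncating to 8 bits.
lo, hi = float(pixels.min()), float(pixels.max())
scaled = ((pixels - lo) / max(hi - lo, 1.0) * 65535.0).astype(np.uint16)

Image.fromarray(scaled).save("chest_xray_16bit.png")  # Pillow stores this as mode "I;16"
```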
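
A possible shape for the built-in merge strategies named in idea 3 (voting, averaging, raw) is sketched below. The flat per-annotator representation is a deliberate simplification, not CVAT's actual data model.

```python
# Idea 3 sketch: merging answers from several annotators for the same image.
from collections import Counter
from statistics import mean

def merge_by_voting(labels: list[str]) -> str:
    """Majority vote over categorical labels."""
    return Counter(labels).most_common(1)[0][0]

def merge_by_averaging(values: list[float]) -> float:
    """Average numeric answers, e.g. estimated ages."""
    return mean(values)

def merge_raw(annotations: list[dict]) -> list[dict]:
    """'Raw' strategy: keep every annotator's answer as is for later review."""
    return list(annotations)

print(merge_by_voting(["cat", "cat", "dog"]))  # -> cat
print(merge_by_averaging([24, 27, 30]))        # -> 27
```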
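
For idea 5, the loop below shows the kind of per-object, server-side tracking that exists today with classical single-object trackers (here OpenCV's CSRT from opencv-contrib-python), which the idea aims to replace with a modern multi-object approach that also handles polygons and masks. File names and boxes are placeholders.

```python
# Idea 5 sketch: naive per-object tracking with single-object trackers.
import cv2

video = cv2.VideoCapture("input.mp4")              # placeholder uploaded video
ok, first_frame = video.read()

# Boxes drawn by the user in the UI, in (x, y, width, height) format.
initial_boxes = [(120, 80, 60, 140), (400, 220, 90, 90)]

trackers = []
for box in initial_boxes:
    tracker = cv2.TrackerCSRT_create()
    tracker.init(first_frame, box)
    trackers.append(tracker)

results = []                                       # per-frame boxes for every object
while True:
    ok, frame = video.read()
    if not ok:
        break
    frame_boxes = []
    for tracker in trackers:
        success, box = tracker.update(frame)
        frame_boxes.append(box if success else None)
    results.append(frame_boxes)
```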
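
The snippet below illustrates the segmentation half of idea 6: given a bounding box proposed by a text-prompted detector such as Grounding DINO (omitted here), the public segment-anything package returns an instance mask. The checkpoint path, image path, and box coordinates are placeholders.

```python
# Idea 6 sketch: box-prompted mask prediction with Segment Anything.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder checkpoint
sam.to("cuda")                        # the idea assumes the model runs on a server GPU
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

box = np.array([100, 150, 400, 480])               # XYXY box from the detector
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)                         # (1, H, W) boolean mask and its score
```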
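
For idea 7, a client could use a generated key roughly as sketched below. The `Authorization: Token ...` header (Django REST Framework style) and the token value are assumptions to be settled during the project, not an existing CVAT API.

```python
# Idea 7 sketch: calling the CVAT REST API with a personal access token
# instead of a username/password pair (header scheme is hypothetical).
import requests

CVAT_URL = "https://app.cvat.ai"
API_TOKEN = "generated-in-account-settings"   # would be revocable from the UI

session = requests.Session()
session.headers["Authorization"] = f"Token {API_TOKEN}"

response = session.get(f"{CVAT_URL}/api/tasks")   # any authenticated endpoint
response.raise_for_status()
print(response.json()["count"])
```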
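
Finally, for idea 9 the core of a YOLO-style export is writing one label file per image with normalized `class x_center y_center width height` lines; the sketch below shows only that step, with a simplified stand-in for CVAT's internal annotation structures. The full YOLOv8 layout (images/labels folders, data.yaml) would follow the Ultralytics spec.

```python
# Idea 9 sketch: write YOLO-format labels for one image.
from pathlib import Path

def export_yolo_labels(image_width, image_height, boxes, out_file: Path):
    """boxes: list of (class_id, x1, y1, x2, y2) in pixel coordinates (XYXY)."""
    lines = []
    for class_id, x1, y1, x2, y2 in boxes:
        x_center = (x1 + x2) / 2 / image_width
        y_center = (y1 + y2) / 2 / image_height
        width = (x2 - x1) / image_width
        height = (y2 - y1) / image_height
        lines.append(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")
    out_file.write_text("\n".join(lines) + "\n")

export_yolo_labels(1920, 1080, [(0, 100, 200, 400, 600)], Path("frame_000001.txt"))
```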

Idea Template

1. #### _IDEA:_ <Descriptive Title>
   * ***Description:*** 3-7 sentences describing the task
   * ***Expected Outcomes:***
      * < Short bullet list describing what is to be accomplished >
      * <i.e. create a new module called "bla bla">
      * < Has method to accomplish X >
      * <...>
   * ***Resources:***
         * [For example a paper citation](https://arxiv.org/pdf/1802.08091.pdf)
         * [For example an existing feature request](https://github.com/opencv/cvat/pull/5608)
         * [Possibly an existing related module](https://github.com/opencv/cvat/tree/develop/cvat/apps/opencv) that includes OpenCV JavaScript library.
   * ***Skills Required:*** < for example mastery plus experience coding in Python, college course work in vision that covers AI topics, python. Best if you have also worked with deep neural networks. >
   * ***Possible Mentors:*** < your name goes here >
   * ***Difficulty:*** <Easy, Medium, Hard>
   * ***Duration:*** <90, 175 or 350 hours>

Potential mentors list

Nikita Manovich
Boris Sekachev
Maxim Zhiltsov
Roman Donchenko
Andrey Zhavoronkov
Maria Khrustaleva
Kirill Lakhov

Admins

Nikita Manovich
Boris Sekachev