GSOC 2024 ideas list

Oliver Kopp edited this page Apr 8, 2024 · 40 revisions

JabRef in Google Summer of Code 2024

We strongly believe in open source and provide interaction with a diverse community. JabRef aims to provide a welcoming experience to open source newcomers. We have three years of Google Summer of Code (GSoC) participation with great results. All of them were huge steps towards a well-usable research tool.

Participants will grow their technical, coding, and open source experience. They will also receive a stipend from Google. Finally, participants will expand their professional network.

Below, there are some project ideas that serve as a starting point for what could be done within a GSoC project. First, some links with more background information.

Links

(All summarized information is tentative. The definitive information is on the linked pages.)

Projects

This page lists a number of ideas for potential projects to be carried out by the persons participating in Google Summer of Code 2024. This is by no means a closed list, so the possible contributors can feel free to propose alternative activities related to the project (the list of feature requests and the GitHub issue tracker might serve as an additional source of inspiration). Students are strongly encouraged to discuss their ideas with the developers and the community to improve their proposal until submission (e.g., using the Gitter Channel or the forum). It's also a good idea to start working on one of the smaller issues to make yourself familiar with the contribution process.

Improve handling of ancient documents by OCR and AI

JabRef, a comprehensive literature management software, currently supports handling both metadata and text-based PDF documents. However, a significant limitation arises with scanned PDFs, particularly historical articles, which are not text-searchable due to their image-based format. This project aims to bridge this gap by integrating advanced OCR (Optical Character Recognition) technology, enabling full-text search in scanned PDFs.

Useful links:

Some aspects:

  1. Add an option to call an OCR engine from JabRef, e.g., a cloud-based service or a local installation
  2. Define a common interface to support multiple OCR engines
  3. Provide a good default set of settings for the OCR engines
  4. Support expert configuration of the settings
  5. Add the extracted text as a layer to the PDF so that Lucene can parse it
  6. Add an option to further process the text with Grobid for training and metadata extraction
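
Aspects 1 to 4 above could be approached along the following lines. This is only a minimal sketch: `OcrEngine`, `OcrSettings`, `OcrEngineRegistry`, and `FakeOcrEngine` are hypothetical names that do not exist in JabRef, the default values are illustrative, and the stand-in engine returns a fixed string instead of running real OCR.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical settings object; names and values are illustrative only. */
record OcrSettings(String language, int dpi, Map<String, String> expertOptions) {
    /** Aspect 3: a default set that works without any configuration. */
    static OcrSettings defaults() {
        return new OcrSettings("eng", 300, Map.of());
    }

    /** Aspect 4: experts override single engine-specific options. */
    OcrSettings withExpertOption(String key, String value) {
        Map<String, String> merged = new LinkedHashMap<>(expertOptions);
        merged.put(key, value);
        return new OcrSettings(language, dpi, Map.copyOf(merged));
    }
}

/** Aspect 2: hypothetical common interface for cloud-based or local engines. */
interface OcrEngine {
    String name();

    /** Returns the text recognized in the given scanned PDF. */
    String extractText(Path scannedPdf, OcrSettings settings);
}

/** Aspect 1: lets the caller list engines and pick one by name. */
class OcrEngineRegistry {
    private final Map<String, OcrEngine> engines = new LinkedHashMap<>();

    void register(OcrEngine engine) {
        engines.put(engine.name(), engine);
    }

    Optional<OcrEngine> find(String name) {
        return Optional.ofNullable(engines.get(name));
    }

    Set<String> availableEngines() {
        return engines.keySet();
    }
}

/** Stand-in engine: returns a fixed string instead of running real OCR. */
class FakeOcrEngine implements OcrEngine {
    @Override
    public String name() {
        return "fake";
    }

    @Override
    public String extractText(Path scannedPdf, OcrSettings settings) {
        return "text of " + scannedPdf.getFileName()
                + " (" + settings.language() + ", " + settings.dpi() + " dpi)";
    }
}
```

With such an interface, adding a further engine would only require a new `OcrEngine` implementation and one `register` call. Writing the text layer back into the PDF (aspect 5) and the Grobid step (aspect 6) are deliberately left out of this sketch.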

Expected outcome:

A) Develop a common interface within JabRef to accommodate multiple OCR engines, ensuring flexibility and expandability.
B) Enable expert users to fine-tune OCR settings, catering to specific needs or document formats.
C) Incorporate the OCR-extracted text as a searchable layer in PDFs, allowing Apache Lucene to index and search the content.

Skills required:

  • Proficiency in Java programming.
  • A keen interest and curiosity in document processing and AI technologies.

Possible mentors:

@Siedlerchr, @koppor

Project size:

175h (medium)

AI-Powered Summarization and "Interaction" with Academic Papers

This project aims to revolutionize the way researchers interact with academic literature in JabRef, utilizing the power of Artificial Intelligence (AI) to enhance user experience and efficiency. The goal is to implement an AI feature that lets users a) request summaries of PDF documents directly within JabRef and b) ask questions based on the "knowledge" inside the local PDFs. Ideally, the solution should work locally without any external cloud service.

More ideas: Support ChatGPT-powered search. See https://oa.mg/chatgpt.

Useful links:

remote:

local:

other:

Popular libraries/frameworks/applications that have been considered, but that don't offer relevant functionality as a REST API or Java bindings:

Expected outcome:

Phase 1 (90h): Develop a module to connect JabRef with configurable online AI services that can generate summaries of academic papers and answer questions. Ensure this feature is user-friendly, allowing for seamless interaction (summary, asking questions) and customization according to user preferences. It has to be possible to ask questions covering selected (or even all) PDF files of a local library (.bib file with attached .pdf files).

Phase 2 (+90h): Develop a module to connect JabRef to a local AI service that can generate summaries of academic papers and answer questions. Ensure this feature is user-friendly, allowing for seamless interaction (summary, asking questions) and customization according to user preferences. There must not be any remote connection. It has to be possible to ask questions covering all PDF files of a local library (.bib file with attached .pdf files).
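
One way to keep both phases behind the same code path is a small service abstraction with a flag that forbids remote backends. The following is a sketch under assumptions: `AiService`, `AiServiceSelector`, and `StubAiService` are hypothetical names, not existing JabRef classes, and the stub truncates text instead of calling a language model.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical abstraction over an online or a local AI backend. */
interface AiService {
    String name();

    /** True if the service runs on the user's machine only. */
    boolean isLocal();

    String summarize(String paperText);

    /** Answers a question using the texts of the selected PDF files. */
    String answer(String question, List<String> paperTexts);
}

/** Picks a backend according to the user's preference. */
final class AiServiceSelector {
    private AiServiceSelector() {
    }

    /** With allowRemote == false, only local services are eligible (Phase 2). */
    static Optional<AiService> select(List<AiService> candidates, boolean allowRemote) {
        return candidates.stream()
                         .filter(service -> allowRemote || service.isLocal())
                         .findFirst();
    }
}

/** Stand-in backend: truncates text instead of calling a language model. */
record StubAiService(String name, boolean isLocal) implements AiService {
    @Override
    public String summarize(String paperText) {
        return paperText.length() <= 40 ? paperText : paperText.substring(0, 40) + "...";
    }

    @Override
    public String answer(String question, List<String> paperTexts) {
        return name + " consulted " + paperTexts.size() + " document(s) for: " + question;
    }
}
```

In this sketch, a user preference such as "never contact remote services" maps to `allowRemote == false`, so the Phase 2 requirement is checked in one place rather than in every caller.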

Possible Mentors:

@koppor, @Siedlerchr, @ThiloteE

Project size:

  • Phase 1 only: 175h (medium)
  • Phases 1 and 2: 350h (large)

Welcome Walkthrough

This project aims to create an engaging and informative first start screen for JabRef, enhancing the initial user experience and showcasing the best features of the software. This screen will differ from the standard interface displayed when no database is open, providing a tailored introduction for new users.

Hints

  1. Configuration of Paper Directory: Implement a feature allowing users to easily set up and manage their paper directory, as detailed in Issue #41.
  2. Integration of Online Services: Include options for update checks, connecting with online services like Grobid (referencing Issue #566), fetchers, and full-text search capabilities.
    • Incorporate telemetry features with a clear and concise privacy statement.
  3. Creation of Example Library: Develop a feature to create an example library, helping new users quickly understand JabRef's functionality.
  4. Community Engagement Tools: Add links to the JabRef forum for support and Mastodon for community interaction.
  5. Donation Prompt: Encourage support for JabRef through a tastefully integrated donation option.
  6. User Group-Specific Defaults: Offer pre-configured default preferences catering to different user groups, such as "relaxed users" wanting all features, and "pro-users" who prefer managing BibTeX files without additional features (as per Issue #9491).

(These are just ideas. They need to be refined during the project.)

Expected Outcome:

A welcome dialog with a pleasant and welcoming UX.

Examples:

  1. The welcome dialog should ask for: Configuration of Paper Directory, Integration of Online Services (Grobid, Telemetry), Creation of Example Library, Community Engagement Tools, Link to Donation page
  2. The welcome dialog should offer some sensible User Group-Specific Defaults: pre-configured default preferences catering to different user groups, such as "relaxed users" wanting all features, and "pro-users" who prefer managing BibTeX files without additional features (as per Issue #9491).

Skills required:

  • Java, JavaFX

Possible Mentors:

@koppor, @tobiasdiez

Project size:

  • 175h (medium)

Improved SLR Support

Description:

With the ever-growing number of publications in computer science and other fields of research, conducting secondary studies becomes necessary to summarize the current state of the art. For software engineering research, Kitchenham popularized the systematic literature review (SLR) method to address this issue. The main idea is to systematically identify and analyze the majority of relevant publications on a specific topic. This is usually an activity that takes extensive manual effort. Some tool support does exist, but the full potential of tools has not been exploited yet. JabRef also offers basic functionality for systematic literature reviews that is used by a number of researchers to systematically "harvest" related work based on the fetching capabilities of JabRef. While using the feature, various additional feature requests came up. For instance, created search queries are currently transformed internally by JabRef to the query format of the publisher. It should also be possible to directly input a query at the publisher site, e.g., for IEEE or ACM. More information: Dominik Voigt, Oliver Kopp, Karoline Wild: Systematic Literature Tools: Are we there yet? ZEUS 2021: 83-88

One key aspect would be the improvement of the fetcher infrastructure in JabRef to better adapt to new and changing publisher/journal websites and to offer a more direct integration. As an inspiration, see BibDesk.
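
The query handling mentioned in the description (internal transformation versus direct input of a publisher query) could be modeled roughly as follows. This is a sketch under assumptions: `SearchQuery`, `PublisherQueryBuilder`, and `ExamplePublisherQueryBuilder` are hypothetical names, and the "FIELD:(value)" output is a made-up syntax for illustration, not the real query format of IEEE, ACM, or any other publisher.

```java
import java.util.Locale;

/** Hypothetical query model: JabRef-style text or raw publisher syntax. */
record SearchQuery(String text, boolean rawPublisherSyntax) {
}

/** Hypothetical per-publisher query builder. */
interface PublisherQueryBuilder {
    String publisher();

    /** Transforms a JabRef-style query into the publisher's format. */
    String transform(String jabRefQuery);

    /** Raw queries are passed through untouched; others are transformed. */
    default String build(SearchQuery query) {
        return query.rawPublisherSyntax() ? query.text() : transform(query.text());
    }
}

/** Made-up syntax for illustration: "field=value" becomes "FIELD:(value)". */
class ExamplePublisherQueryBuilder implements PublisherQueryBuilder {
    @Override
    public String publisher() {
        return "example";
    }

    @Override
    public String transform(String jabRefQuery) {
        int separator = jabRefQuery.indexOf('=');
        if (separator < 0) {
            // No field given: hand the text over unchanged.
            return jabRefQuery;
        }
        String field = jabRefQuery.substring(0, separator).trim().toUpperCase(Locale.ROOT);
        String value = jabRefQuery.substring(separator + 1).trim();
        return field + ":(" + value + ")";
    }
}
```

The `rawPublisherSyntax` flag is the only addition needed for direct input in this sketch: a researcher who already has a query that works on the publisher site can reuse it verbatim, which also makes the search step of an SLR easier to document and reproduce.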

Expected outcome:

An advanced SLR functionality that supports a researcher in conducting a systematic literature review.

We did an initial project organization at https://github.com/users/koppor/projects/2.

Skills required:

  • Java, JavaFX

Possible mentors:

@koppor, @Siedlerchr, @calixtus

Project size: 350h (large) - Can also be stripped down to medium.

Improved CSL Support (and more LibreOffice-JabRef improvements)

Description:

JabRef can connect to LibreOffice to offer premier reference management for LibreOffice. Currently, custom styles are supported. In this project, this support should be extended to offer support for the "Citation Style Language" files. A user should be able to choose the CSL style for the reference list and the citation style. Then, the LibreOffice document should adapt accordingly. For more information on CSL refer to https://citationstyles.org/. [Details: #8893]

In the LaTeX world, .bst is still popular. JabRef has BST support, but it is currently not visible in the UI. In LibreOffice, it should be possible to select a .bst file, which is then used for rendering. [Details: #624]

The internal format of references is currently a JabRef-custom format. It should be changed to a format used by Zotero. See the discussion at https://github.com/JabRef/jabref/issues/2146#issuecomment-891432507 for details. This includes: i) implementation of that format, ii) implementation of a converter from the "old" JabRef-Format to the new one. The converter could be implemented within OpenOffice (similar to JabRef_LibreOffice_Converter).

Finally, one can work on improving the JabRef-LibreOffice-Plugin. See https://github.com/JabRef/jabref/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3Aopenoffice%2Flibreoffice for ideas. For instance, it should be possible to have footnote-based citations (see https://docs.jabref.org/cite/openofficeintegration#known-issues).

Expected outcome:

  • It is possible to select and change a CSL style for a LibreOffice document.
  • It is possible to select a .bst file.
  • The internal format of citations is changed to the Zotero format.

Possible Mentors:

@koppor, @Siedlerchr, @calixtus

Project size:

  • 90h (small) (if only CSL style selection and work on Zotero format)
  • 175h (medium) (CSL + .bst + Zotero + other issues fixed)

{Your own project}

You can propose another project. JabRef offers a variety of places where it can be improved. Think as a user or talk to other users. The following places are a good start: