ChristelKrueger/Librarian

 
 


Librarian

A tool to predict the sequencing library type from the base composition of a supplied FastQ file.

Reads from high-throughput sequencing experiments show base compositions that are characteristic of their library type. For example, data from RNA-seq and WGBS-seq libraries show markedly different distributions of G, A, C and T across the reads. Librarian uses these composition signatures for library quality control: the base composition of a test library is extracted and compared against previously published data sets from mouse and human.

Please note that composition signatures from other species may vary significantly due to different overall GC content.
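To make the idea of a base composition concrete, here is a minimal sketch that computes the fraction of A, C, G and T at each read position of a FastQ stream. It is an illustration only, not Librarian's actual extraction code (which runs as WebAssembly in the frontend). The function name `base_composition` and the `max_positions` cut-off are assumptions introduced for this example.

```python
from collections import Counter
from io import StringIO

BASES = "ACGT"

def base_composition(fastq_lines, max_positions=50):
    """Return per-position base fractions for the reads in a FastQ stream.

    Hypothetical helper for illustration; assumes plain 4-line FastQ records.
    """
    counts = [Counter() for _ in range(max_positions)]
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the second line of each record
            continue
        for pos, base in enumerate(line.strip().upper()[:max_positions]):
            if base in BASES:  # ignore N and other ambiguity codes
                counts[pos][base] += 1

    composition = []
    for c in counts:
        total = sum(c.values())
        if total == 0:
            break  # stop at the first position without any A/C/G/T call
        composition.append({b: c[b] / total for b in BASES})
    return composition

# Two made-up reads, used only to show the shape of the result.
example = StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nAGGT\n+\nIIII\n"
)
profile = base_composition(example)
print(profile[1])  # fractions of A, C, G and T at the second read position
```

A profile like this, computed over many reads, is the kind of signature that differs between library types such as RNA-seq and WGBS-seq.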

Librarian produces several plots to help identify library types. For example, given the bisulfite and RNA example files, it produces the following:

  • Compositions Map: UMAP representation of compositions of published sequencing data. Different library types are indicated by colours. Compositions of test libraries are projected onto the same manifold and indicated by light green circles.

[Figure: Compositions_map-2022-08-15-13-31]

  • Probability Maps: This collection of maps shows the probability of a particular region of the map to correspond to a certain library type. The darker the colour, the more dominated the region is by the indicated library type. The location of test libraries is indicated by a light blue circle.

[Figure: Probability_maps-2022-08-15-13-31]

  • Prediction Plot: For each projected test library, the location on the Compositions/Probability Map is determined. This plot shows how published library types are represented at the same location.

[Figure: Prediction_plot-2022-08-15-13-31]
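The relationship between the Probability Maps and the Prediction Plot can be sketched as follows. This is a simplified illustration under stated assumptions, not the method used by the Librarian server: it assumes the 2D map coordinates are already available, divides the map into square grid cells, and reports which library types the published data sets in a test library's cell belong to. The function names, the `cell_size` parameter and the coordinates are all hypothetical.

```python
from collections import Counter, defaultdict

def cell_of(x, y, cell_size):
    """Return the grid cell (column, row) containing the point (x, y)."""
    return (int(x // cell_size), int(y // cell_size))

def build_probability_map(reference_points, cell_size=1.0):
    """Map each grid cell to the fraction of each library type found in it."""
    cell_counts = defaultdict(Counter)
    for x, y, library_type in reference_points:
        cell_counts[cell_of(x, y, cell_size)][library_type] += 1
    return {
        cell: {t: n / sum(c.values()) for t, n in c.items()}
        for cell, c in cell_counts.items()
    }

def predict(prob_map, x, y, cell_size=1.0):
    """Return the library-type fractions at the test library's location."""
    return prob_map.get(cell_of(x, y, cell_size), {})

# Made-up map coordinates for published libraries (illustration only).
reference = [
    (0.2, 0.3, "RNA-seq"),
    (0.7, 0.1, "RNA-seq"),
    (0.5, 0.9, "WGBS-seq"),
    (3.1, 3.4, "WGBS-seq"),
]
prob_map = build_probability_map(reference)

mixed = predict(prob_map, 0.4, 0.4)     # cell shared by two library types
specific = predict(prob_map, 3.5, 3.5)  # cell containing a single library type
empty = predict(prob_map, 10.0, 10.0)   # cell without any published data
print(mixed, specific, empty)
```

In this toy example the first test point falls in a mixed region and the second in a region specific to one library type, which is the distinction the plots are meant to make visible.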

How to interpret: Some regions on the map are very specific to a certain library type, while others are more mixed. For some test libraries the results will therefore be much clearer than for others. The different plots are intended to provide a good overview of how similar the test library is to published data. The cause of any deviations should be inspected; the interpretation depends on how characteristic the composition signature of the library type is and how far off the projection of the test sample is.

You can try Librarian at the Babraham Institute website, run a tool to query samples from the command line, or set up the server yourself.

Folder Structure

  • frontend contains code for the website, which consists of the webpage and WebAssembly code responsible for extracting base compositions from given files. Extracted base compositions are sent to the server for plotting.
  • server contains code for the server, which serves the frontend and also responds to plotting requests.
  • cli is a utility program to send queries to the server from the command line.

Attribution:

Associated repositories:
