ccb-hms/hmc-clinic-2022-23

Leveraging Relational Databases for Spatial Transcriptomics

This is the repository for the 2022-2023 Harvey Mudd Clinic Team, in collaboration with the Harvard Center for Computational Biomedicine (CCB).

Table of Contents

  1. Description
  2. Getting Started
    1. Dependencies
  3. Assignment 1
  4. Assignment 2
  5. Assignment 3
  6. Assignment 4
  7. Assignment 5
  8. Assignment 6
  9. Authors
  10. Acknowledgments

Description

At Harvard CCB, researchers are pioneering the study of biological and spatial genomic datasets using computational methods. These high-resolution datasets, collected with imaging techniques, can be quite large. Most workflows rely mainly on Python and R, which are hard to use effectively on such memory-intensive datasets. We aim to leverage relational database queries in SQL to improve scalability, make it feasible to analyze larger datasets, and eventually uncover additional spatial relationships in the original data.

Getting Started

Dependencies

To run our scripts and follow along with our process, you'll need to have the following installed.

  • Python
  • Some Python packages:
    • pandas
    • tqdm
  • Azure Data Studio
  • Git
  • Docker

Assignment 1

Assignment 1 is an introduction to SQL Server consisting of a Coursera course on Relational Databases and a few corresponding exercises.
For a breakdown of each step in assignment 1, see the assignment 1 README.

Assignment 2

Assignment 2 focuses on a few exercises with queries in SQL Server in order to gain practice in using the tools we learned about in assignment 1. The assignment uses some flight data and asks us to use queries to find information such as which plane logged the most flight miles.
For a breakdown of each step in assignment 2, see the assignment 2 README.
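The "most flight miles" question comes down to a grouped sum followed by a sort. The sketch below is only an illustration: it uses Python's built-in sqlite3 module in place of SQL Server, and the `flights` table, its columns, and its rows are hypothetical, not the assignment's actual data.

```python
import sqlite3

# Hypothetical schema and rows; the assignment's real tables are not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (tail_number TEXT, origin TEXT, dest TEXT, distance_miles INTEGER)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [
        ("N101", "BOS", "LAX", 2611),
        ("N101", "LAX", "SEA", 954),
        ("N202", "BOS", "SEA", 2496),
        ("N303", "JFK", "SFO", 2586),
        ("N303", "SFO", "JFK", 2586),
    ],
)

# Sum the miles per plane, sort largest first, and keep only the top row.
top_plane = conn.execute(
    """
    SELECT tail_number, SUM(distance_miles) AS total_miles
    FROM flights
    GROUP BY tail_number
    ORDER BY total_miles DESC
    LIMIT 1
    """
).fetchone()
print(top_plane)  # ('N303', 5172)
```

SQLite limits rows with `LIMIT 1`; in SQL Server the same query would use `SELECT TOP 1` instead.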

Assignment 3

Assignment 3 consists of two subtasks: the first is to read and present on recent reviews of spatially resolved omics profiling, and the second is to practice working with spatial omics data in SQL Server. This repository covers only the second subtask.
For a breakdown of each step in this subtask of assignment 3, see the assignment 3 README.

Assignment 4

Assignment 4 serves as a transition into working with spatial data. We are tasked with analyzing two tables: one containing weather data along with the latitude and longitude of each weather station, and one containing geographical information. Our goal is to answer questions such as which stations in Massachusetts are the windiest, or which station in Washington is the rainiest, by performing spatial intersect queries on the tables.
For a breakdown of each step of assignment 4, see the assignment 4 README.
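To make the idea of a spatial intersect concrete, here is a minimal sketch in plain Python. It assumes each region is a simple polygon and each station is a point, and it stands in for the point-in-polygon test that SQL Server's spatial types provide through methods such as `STIntersects`. The rectangles below are rough, made-up boxes rather than real state borders, and the station rows are invented for illustration.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical regions (crude rectangles) and stations: (name, lon, lat, wind speed).
regions = {
    "MA": [(-73.5, 41.2), (-69.9, 41.2), (-69.9, 42.9), (-73.5, 42.9)],
    "WA": [(-124.8, 45.5), (-116.9, 45.5), (-116.9, 49.0), (-124.8, 49.0)],
}
stations = [
    ("A", -71.0, 42.3, 12.5),
    ("B", -72.6, 42.1, 9.0),
    ("C", -122.3, 47.6, 7.5),
    ("D", -100.0, 40.0, 20.0),  # falls in neither region
]

# "Join" stations to regions on intersection, keeping the windiest station per region.
windiest = {}
for name, lon, lat, wind in stations:
    for region, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            best = windiest.get(region)
            if best is None or wind > best[1]:
                windiest[region] = (name, wind)

print(windiest)  # {'MA': ('A', 12.5), 'WA': ('C', 7.5)}
```

In a database the nested loop becomes a join whose condition is the intersection test, which lets the engine use a spatial index instead of checking every station against every region.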

Assignment 5

Assignment 5 finally brings our attention to spatial transcriptomics data in SQL Server. We are given multiple subtasks, such as creating a new gene-cell-molecule count table, reshaping that table into a gene expression matrix, and creating a convex hull around all the molecules in a given cell.
For a breakdown of each step of assignment 5, see the assignment 5 README. You may also follow along in our assignment 5 notebook.
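The count table and its reshaping into a matrix can be sketched as a `GROUP BY` followed by a pivot. The example below is an assumption-laden illustration, not the queries from the assignment: it uses sqlite3 rather than SQL Server, the `molecules` table, its columns, and its rows are hypothetical, and the orientation (genes as rows, cells as columns) is our choice for the sketch. SQLite has no `PIVOT` operator, so the pivot is written with conditional aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical molecule table: one row per detected molecule.
conn.execute("CREATE TABLE molecules (cell_id INTEGER, gene TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO molecules VALUES (?, ?, ?, ?)",
    [
        (1, "Gad1", 0.0, 0.0),
        (1, "Gad1", 1.0, 0.5),
        (1, "Oxt", 0.5, 1.0),
        (2, "Oxt", 5.0, 5.0),
        (2, "Oxt", 5.5, 4.5),
        (2, "Oxt", 6.0, 5.2),
    ],
)

# Long-format count table: one row per (gene, cell) pair that has molecules.
conn.execute(
    """
    CREATE TABLE gene_cell_counts AS
    SELECT gene, cell_id, COUNT(*) AS molecule_count
    FROM molecules
    GROUP BY gene, cell_id
    """
)

# Pivot to a wide matrix: one row per gene, one column per cell.
# The column list is generated from the cell ids present in the data.
cell_ids = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT cell_id FROM gene_cell_counts ORDER BY cell_id"
    )
]
columns = ", ".join(
    f"SUM(CASE WHEN cell_id = {int(c)} THEN molecule_count ELSE 0 END) AS cell_{int(c)}"
    for c in cell_ids
)
matrix = conn.execute(
    f"SELECT gene, {columns} FROM gene_cell_counts GROUP BY gene ORDER BY gene"
).fetchall()
print(matrix)  # [('Gad1', 2, 0), ('Oxt', 1, 3)]
```

Gene-cell pairs with no molecules do not appear in the long table, so the `ELSE 0` branch is what fills the zeros in the matrix.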

Assignment 6

Assignment 6 is a continuation of the ideas of assignment 5, but with a significantly larger dataset of tissue images from the hypothalamuses of 26 mice. This dataset is not currently publicly available but was provided for our use. With this larger dataset, we repeated the objectives of assignment 5 on an institutional computer cluster: we created a molecule count table and generated convex hulls around the molecules belonging to each cell in the first z-slice.
For a breakdown of each step of assignment 6, see the assignment 6 README. You may also follow along in our assignment 6 notebook.
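As a rough illustration of the convex hull step, the sketch below groups molecules by cell, keeps only one z-slice, and computes each cell's hull with Andrew's monotone chain algorithm. This is a plain-Python stand-in chosen for readability, not the method used in the assignment. The rows are invented, and encoding the first z-slice as `z == 0` is an assumption.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical rows: (cell_id, x, y, z).
molecules = [
    (1, 0.0, 0.0, 0),
    (1, 2.0, 0.0, 0),
    (1, 2.0, 2.0, 0),
    (1, 0.0, 2.0, 0),
    (1, 1.0, 1.0, 0),  # interior point, not on the hull
    (1, 9.0, 9.0, 1),  # different z-slice, ignored
    (2, 5.0, 5.0, 0),
    (2, 6.0, 5.0, 0),
    (2, 5.0, 6.0, 0),
]

# Keep only the first z-slice and collect the points of each cell.
points_by_cell = defaultdict(list)
for cell_id, x, y, z in molecules:
    if z == 0:
        points_by_cell[cell_id].append((x, y))

hulls = {cell_id: convex_hull(pts) for cell_id, pts in points_by_cell.items()}
print(hulls[1])  # [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
```

Because each cell's hull depends only on that cell's molecules, the work splits naturally by cell, which is what makes it practical to run on a much larger dataset.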

Authors

Chris Couto

Alicia Lu

Elizabeth Lucas-Foley

Mads Mansfield

Acknowledgments

Tim Buchheim

Ludwig Geistlinger

Robert Gentleman

Rafael Goncalves

Tyrone Lee

Jeffrey Moffitt

Nathan Palmer

Sunil Poudel

Sam Pullman

Chris Stone
