From 4ca5ae972bc2b99a2b02b08a4e218ced286b7fa9 Mon Sep 17 00:00:00 2001 From: Charlotte80 <74590698+Charlotte80@users.noreply.github.com> Date: Mon, 24 Feb 2025 05:17:56 +0800 Subject: [PATCH 1/6] Update index.md first commit - Workflow Zixuan's work summary --- intakes/11-Summer-2024-2025/index.md | 63 ++++++++++++++++++---------- 1 file changed, 40 insertions(+), 23 deletions(-) diff --git a/intakes/11-Summer-2024-2025/index.md b/intakes/11-Summer-2024-2025/index.md index 1e2c57f..651f19e 100644 --- a/intakes/11-Summer-2024-2025/index.md +++ b/intakes/11-Summer-2024-2025/index.md @@ -1,7 +1,7 @@ -#fix: links for Student Organiser PDF Coding Intake 11 - Summer 2024/2025 +# Intake 11 - Summer 2024/2025 This is the list of projects for this intake. Here you will see: -- Summary of the problem and the work done in the project +- Summary of the problem and the work done in the project - Links to the final presentation slides and/or video for the project - Links to the github repos that were part of this project - Links to other documentation, such as technical diary and other project documentation @@ -10,7 +10,7 @@ This is the list of projects for this intake. Here you will see: # REDMANE Capacity Planning -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. 
The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. @@ -27,8 +27,8 @@ What we learned was ... maximus metus id erat pharetra facilisis. Nullam ac urna # REDMANE Clinical Dashboards The challenge that we were trying to solve was how to utilize publicly available clinical metadata while ensuring patient privacy and addressing any potential security concerns while still making the data useful for research. As an example, sensitive data such as: medicare number, date of birth, location of residence, etc, are often included in clinical data. Therefore the solution is to artificially or 'synthetically' generate clinical data that replicates real world datasets. - -The way we tried to solve this was by developing code that renamed public clinical data files from cBioportal and randomly sampled a publicly available .fastq fille from genomeInABottle to generate the corresponding fastq files for each patient in the clinical data file. While these files aren’t real genome sequences, they are in the correct fastq format and can be used for other teams' data workflows. + +The way we tried to solve this was by developing code that renamed public clinical data files from cBioportal and randomly sampled a publicly available .fastq fille from genomeInABottle to generate the corresponding fastq files for each patient in the clinical data file. While these files aren’t real genome sequences, they are in the correct fastq format and can be used for other teams' data workflows. 
Not only did we learn about methods for generating synthetic clinical data, but we also gained valuable experience in writing clean, maintainable code that integrates seamlessly into larger team workflows. Since our work was being used by multiple intern teams, it was crucial to distribute data efficiently and document our code thoroughly. We quickly realized that clear cross-team communication and well-structured documentation were essential to the success of both our team and others. While we each improved our technical skills, we found that soft skills—such as collaboration and effective communication—were just as critical. Additionally, developing a strong understanding of the high-level context of our work significantly reduced redundancy and saved time, allowing us to make more informed decisions thorughout the internship. @@ -41,10 +41,10 @@ Not only did we learn about methods for generating synthetic clinical data, but - [Our Early High Level Understanding](https://wehieduau.sharepoint.com/:w:/r/sites/StudentInternGroupatWEHI/Shared%20Documents/Clinical%20Dashboards/2025%20Summer/Jordan%27s%20Understanding.docx?d=wf77f1ae9648b43559146572142039c8c&csf=1&web=1&e=FiVAIM) - [Weekly Updates]([https://wehieduau.sharepoint.com/:f:/s/StudentInternGroupatWEHI/Evmt-NPbZ09Lq7WXyYhKohsBdIVREOYRZ2ujDZ1Td6K3HA?e=OtLmDL](https://wehieduau.sharepoint.com/:w:/r/sites/StudentInternGroupatWEHI/_layouts/15/Doc.aspx?sourcedoc=%7B86A2D5F1-8DC5-4272-9ABB-77625B1B79A4%7D&file=Weekly%20Email%20Template.docx&action=default&mobileredirect=true) - + # REDMANE Data Ingestion -The challenge that we were trying to solve was ensuring that metadata uploaded to the REDMANE data registry and data portals (specifically cBioPortal) were formatted in standardised ways. Different points of data ingestion required different metadata formats. 
For example, each data portal has its own specific format for metadata, and without a streamlined way to generate these metadata files, users would struggle to verify and upload their data correctly. This lack of consistency could lead to errors in data ingestion and disorganisation in REDMANE’s database. +The challenge that we were trying to solve was ensuring that metadata uploaded to the REDMANE data registry and data portals (specifically cBioPortal) were formatted in standardised ways. Different points of data ingestion required different metadata formats. For example, each data portal has its own specific format for metadata, and without a streamlined way to generate these metadata files, users would struggle to verify and upload their data correctly. This lack of consistency could lead to errors in data ingestion and disorganisation in REDMANE’s database. We solved a part of this by developing a script for registering files onto the REDMANE data registry. This script scans a specified local directory, extracts relevant metadata, and compiles it into a JSON file summary to be uploaded to the registry. This JSON report ensures that the metadata is ingestible to the registry’s standard, which was designed in collaboration with the REDMANE Web Development team. We also looked at converting our JSON report into RO-Crate. @@ -68,7 +68,23 @@ While we gained technical insights into data organisation, including the importa # REDMANE Demo and Quality -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. 
Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. + +The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. + +What we learned was ... maximus metus id erat pharetra facilisis. Nullam ac urna ultricies diam volutpat faucibus. Sed feugiat placerat est nec scelerisque. Aenean a nisl sit amet ligula gravida fermentum eget in purus. Praesent a dui quis diam bibendum convallis vel in lacus. + +## Key links +- Final presentation slides (if supervisor agrees) +- Final presentation video (if supervisor agrees) +- GitHub repos +- Technical Diary +- Weekly Progress +- Project Management Tools + +# REDMANE Funding and Partnerships + +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. 
Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. @@ -94,14 +110,14 @@ What we learned was that there are many approaches to deploying OMERO and config - Final presentation slides (if supervisor agrees) - Final presentation video (if supervisor agrees) - GitHub repos - - [Omero DataPortal](https://github.com/DBK333/Omero-DataPortal) -- Technical Diary - - [NectarVM Setup](https://wehieduau-my.sharepoint.com/:w:/g/personal/kasikumpaiboon_d_wehi_edu_au/EbuTOVm8MwNDrV3lIGVixukBgTxFFvSCCq3v-POA0LWpyA?e=MdpZDm) - - [Technical Diary Document](https://wehieduau-my.sharepoint.com/:w:/r/personal/kasikumpaiboon_d_wehi_edu_au/Documents/Microsoft%20Teams%20Chat%20Files/Data%20Portal%20Technical%20Diary(1).docx?d=wcd787f8fb8444df78c01881e89b00ea4&csf=1&web=1&e=pJsezK) + - [Omero DataPortal](https://github.com/DBK333/Omero-DataPortal) +- Technical Diary + - [NectarVM Setup](https://wehieduau-my.sharepoint.com/:w:/g/personal/kasikumpaiboon_d_wehi_edu_au/EbuTOVm8MwNDrV3lIGVixukBgTxFFvSCCq3v-POA0LWpyA?e=MdpZDm) + - [Technical Diary Document](https://wehieduau-my.sharepoint.com/:w:/r/personal/kasikumpaiboon_d_wehi_edu_au/Documents/Microsoft%20Teams%20Chat%20Files/Data%20Portal%20Technical%20Diary(1).docx?d=wcd787f8fb8444df78c01881e89b00ea4&csf=1&web=1&e=pJsezK) # REDMANE Web Dev -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. 
Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. @@ -124,7 +140,7 @@ What we learned was ... maximus metus id erat pharetra facilisis. Nullam ac urna # REDMANE Workflows -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. @@ -135,12 +151,13 @@ What we learned was ... 
maximus metus id erat pharetra facilisis. Nullam ac urna - Final presentation video (if supervisor agrees) - GitHub repos - Technical Diary + - [Work Summary](https://wehieduau-my.sharepoint.com/:p:/r/personal/zhao_ch_wehi_edu_au/Documents/Work%20Summary.pptx?d=w3f5a83f7dcb444c0b5e84b955c1282ab&csf=1&web=1&e=oh3yJZ) - Weekly Progress - Project Management Tools # Student Organiser Data Visualisation -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. @@ -156,18 +173,18 @@ What we learned was ... maximus metus id erat pharetra facilisis. Nullam ac urna # Student Organiser PDF Coding -The challenge that we were trying to solve was the time-consuming nature of reviewing internship applications. 
This involved downloading and sifting through numerous PDF resumes, extracting key information such as skills, experience, and education, and then comparing these across applicants. The process is not only slow but also prone to human error and can make it difficult to identify suitable candidates. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. -The way we tried to solve this was by creating a web-based application using HTML, CSS, and JavaScript. The application provides a user interface where PDF files can be uploaded, viewed, and highlighted. The intended goal for the site is to streamline the resume review process, by easily being able to comment on selected text, and add categories relating to it. Additionally, being able to connect this application directly to the student organiser in order to easily open resumes and look at previous comments is ideal. All tasks were intended to be completed using open-source tools and libraries. +The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. -What we learned was open-source software can either be of great help or a major obstacle. On one side, many open-source projects have communities built around them, so one might be able to find answers easily with just a Google search. 
On the other hand, it can be hard to find solutions to your specific problems because of outdated information or bad documentations. However, as they are open-source you can read through the source code to get your answers. +What we learned was ... maximus metus id erat pharetra facilisis. Nullam ac urna ultricies diam volutpat faucibus. Sed feugiat placerat est nec scelerisque. Aenean a nisl sit amet ligula gravida fermentum eget in purus. Praesent a dui quis diam bibendum convallis vel in lacus. ## Key links -- [Final presentation slides](https://www.canva.com/design/DAGeA-99sw8/wu8bETDU_Ioi26iVPz-sQQ/view?utm_content=DAGeA-99sw8&utm_campaign=designshare&utm_medium=link2&utm_source=uniquelinks&utlId=h124da5d634) -- Final presentation video (tbc) -- [GitHub repo](https://github.com/WEHI-RCPStudentInternship/pdf-coder) +- Final presentation slides (if supervisor agrees) +- Final presentation video (if supervisor agrees) +- GitHub repos - Technical Diary -- [Weekly Progress](https://docs.google.com/document/d/11kn7avo8dtpY5Ho_D7bRSergjQzNMw7Q0_5i4YCggHg/edit?usp=sharing) +- Weekly Progress - Project Management Tools # Student Organiser RAG LLM @@ -189,7 +206,7 @@ What we learned was there are many moving parts to a RAG LLM, all of which can b # Quantum Computing -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. 
Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. From 2e99fd8102ced76bc0eb421c6b0646e7dac4971c Mon Sep 17 00:00:00 2001 From: VitaChien <46022206+VitaChien@users.noreply.github.com> Date: Wed, 26 Feb 2025 13:38:13 +1100 Subject: [PATCH 2/6] move abstract here --- intakes/11-Summer-2024-2025/index.md | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/intakes/11-Summer-2024-2025/index.md b/intakes/11-Summer-2024-2025/index.md index 651f19e..0b69fb1 100644 --- a/intakes/11-Summer-2024-2025/index.md +++ b/intakes/11-Summer-2024-2025/index.md @@ -140,18 +140,30 @@ What we learned was ... maximus metus id erat pharetra facilisis. Nullam ac urna # REDMANE Workflows -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to convert the raw data files to processed and summarised files. -The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. 
Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. +The way we tried to solve this was: + +***1. Through Nextflow and Seqera on Milton HPC*** +1. Submit tickets to ask for access to Milton HPC and Seqera +2. Upload raw data to `/vast/scratch/users/yourname/` + any subfolder you desire +3. Generate access token on Seqera and fill it in the Nextflow Tower Agent page +4. Select `Sarek_344` from Lanchpad and launch it +5. Fill in Run setup configuration and lanch the pipeline + +***2. Through Galaxy*** +Galaxy is a user-friendly interface where a lot of bioinformatics tools are available and ready to use. +Use this link for more info: [] + +What we learned was -What we learned was ... maximus metus id erat pharetra facilisis. Nullam ac urna ultricies diam volutpat faucibus. Sed feugiat placerat est nec scelerisque. Aenean a nisl sit amet ligula gravida fermentum eget in purus. Praesent a dui quis diam bibendum convallis vel in lacus. 
## Key links - Final presentation slides (if supervisor agrees) - Final presentation video (if supervisor agrees) -- GitHub repos +- [GitHub repos] (https://github.com/VitaChien/WEHI_Workflow) - Technical Diary - - [Work Summary](https://wehieduau-my.sharepoint.com/:p:/r/personal/zhao_ch_wehi_edu_au/Documents/Work%20Summary.pptx?d=w3f5a83f7dcb444c0b5e84b955c1282ab&csf=1&web=1&e=oh3yJZ) + - [Work Summary](https://wehieduau-my.sharepoint.com/:p:/r/personal/zhao_ch_wehi_edu_au/Documents/Work%20Summary.pptx?d=w3f5a83f7dcb444c0b5e84b955c1282ab&csf=1&web=1&e=oh3yJZ) - Weekly Progress - Project Management Tools From 6a399e83f2f6fe3c4fc4ea27ed7f1cc03f925506 Mon Sep 17 00:00:00 2001 From: Zixuan Charlotte Zhao <74590698+Charlotte80@users.noreply.github.com> Date: Wed, 26 Feb 2025 11:12:00 +0800 Subject: [PATCH 3/6] Update index.md --- intakes/11-Summer-2024-2025/index.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/intakes/11-Summer-2024-2025/index.md b/intakes/11-Summer-2024-2025/index.md index 0b69fb1..f7d7c6a 100644 --- a/intakes/11-Summer-2024-2025/index.md +++ b/intakes/11-Summer-2024-2025/index.md @@ -155,14 +155,17 @@ The way we tried to solve this was: Galaxy is a user-friendly interface where a lot of bioinformatics tools are available and ready to use. Use this link for more info: [] -What we learned was +What we learned include reproducibility and scalability for the two platforms, compared barriers of entry, using both established tools to create workflows and trialing known pipelines. To be able to run Nextflow pipelines using Seqera we explored how to set up the environment on HPC, writing config files, and setting parameters. We've also did the conversion from command line manually using packages like bowtie2, samtools and bcftools. We've also learned how to effectively communicate across teams, sharing files as well as sourcing information about topics we knew less about in daily stand ups and co-working sessions. 
+We've validated the final output (.vcf files) by visualising it using IGV. ## Key links -- Final presentation slides (if supervisor agrees) +- Final presentation slides (if supervisor agrees?) + - [Final Presentation Slides](https://www.canva.com/design/DAGewhgcjlc/-93DtMN5HbugyU98Hr0V4A/edit) - Final presentation video (if supervisor agrees) -- [GitHub repos] (https://github.com/VitaChien/WEHI_Workflow) - Technical Diary + - [Tish's Galaxy Workflow](https://usegalaxy.org.au/u/tishtar/w/basic-conversions) + - [GitHub repos](https://github.com/VitaChien/WEHI_Workflow) - [Work Summary](https://wehieduau-my.sharepoint.com/:p:/r/personal/zhao_ch_wehi_edu_au/Documents/Work%20Summary.pptx?d=w3f5a83f7dcb444c0b5e84b955c1282ab&csf=1&web=1&e=oh3yJZ) - Weekly Progress - Project Management Tools From 6d3ac5a4b44bdb4292a4cf7b4b7aabbb23d39896 Mon Sep 17 00:00:00 2001 From: VitaChien <46022206+VitaChien@users.noreply.github.com> Date: Sat, 1 Mar 2025 10:48:37 +1100 Subject: [PATCH 4/6] Update Nextflow part. --- intakes/11-Summer-2024-2025/index.md | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/intakes/11-Summer-2024-2025/index.md b/intakes/11-Summer-2024-2025/index.md index f7d7c6a..28a5aa0 100644 --- a/intakes/11-Summer-2024-2025/index.md +++ b/intakes/11-Summer-2024-2025/index.md @@ -145,15 +145,10 @@ The challenge that we were trying to convert the raw data files to processed and The way we tried to solve this was: ***1. Through Nextflow and Seqera on Milton HPC*** -1. Submit tickets to ask for access to Milton HPC and Seqera -2. Upload raw data to `/vast/scratch/users/yourname/` + any subfolder you desire -3. Generate access token on Seqera and fill it in the Nextflow Tower Agent page -4. Select `Sarek_344` from Lanchpad and launch it -5. 
Fill in Run setup configuration and lanch the pipeline - +We learned about the common workflow used in the bioinformatics field, Nextflow, and gained experience with an open-source pipeline designed for variant mapping called nf-core/sarek. This pipeline allowed us to efficiently process WGS data by identifying genetic variants from sequencing datasets. Additionally, we deployed the pipeline on Seqera and executed it on Milton HPC. + ***2. Through Galaxy*** Galaxy is a user-friendly interface where a lot of bioinformatics tools are available and ready to use. -Use this link for more info: [] What we learned include reproducibility and scalability for the two platforms, compared barriers of entry, using both established tools to create workflows and trialing known pipelines. To be able to run Nextflow pipelines using Seqera we explored how to set up the environment on HPC, writing config files, and setting parameters. We've also did the conversion from command line manually using packages like bowtie2, samtools and bcftools. We've also learned how to effectively communicate across teams, sharing files as well as sourcing information about topics we knew less about in daily stand ups and co-working sessions. From 7fa0ef970477a3f04a5368461d02f9e99ea96f94 Mon Sep 17 00:00:00 2001 From: VitaChien <46022206+VitaChien@users.noreply.github.com> Date: Mon, 3 Mar 2025 09:00:11 +1100 Subject: [PATCH 5/6] Fix conflicts --- intakes/11-Summer-2024-2025/index.md | 38 ++++++++++++++-------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/intakes/11-Summer-2024-2025/index.md b/intakes/11-Summer-2024-2025/index.md index 5121eef..c10468f 100644 --- a/intakes/11-Summer-2024-2025/index.md +++ b/intakes/11-Summer-2024-2025/index.md @@ -1,7 +1,7 @@ # Student Organiser Intake 11 - Summer 2024/2025 This is the list of projects for this intake. 
Here you will see: -- Summary of the problem and the work done in the project +- Summary of the problem and the work done in the project - Links to the final presentation slides and/or video for the project - Links to the github repos that were part of this project - Links to other documentation, such as technical diary and other project documentation @@ -10,7 +10,7 @@ This is the list of projects for this intake. Here you will see: # REDMANE Capacity Planning -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. @@ -27,8 +27,8 @@ What we learned was ... maximus metus id erat pharetra facilisis. 
Nullam ac urna # REDMANE Clinical Dashboards The challenge that we were trying to solve was how to utilize publicly available clinical metadata while ensuring patient privacy and addressing any potential security concerns while still making the data useful for research. As an example, sensitive data such as: medicare number, date of birth, location of residence, etc, are often included in clinical data. Therefore the solution is to artificially or 'synthetically' generate clinical data that replicates real world datasets. - -The way we tried to solve this was by developing code that renamed public clinical data files from cBioportal and randomly sampled a publicly available .fastq fille from genomeInABottle to generate the corresponding fastq files for each patient in the clinical data file. While these files aren’t real genome sequences, they are in the correct fastq format and can be used for other teams' data workflows. + +The way we tried to solve this was by developing code that renamed public clinical data files from cBioportal and randomly sampled a publicly available .fastq fille from genomeInABottle to generate the corresponding fastq files for each patient in the clinical data file. While these files aren’t real genome sequences, they are in the correct fastq format and can be used for other teams' data workflows. Not only did we learn about methods for generating synthetic clinical data, but we also gained valuable experience in writing clean, maintainable code that integrates seamlessly into larger team workflows. Since our work was being used by multiple intern teams, it was crucial to distribute data efficiently and document our code thoroughly. We quickly realized that clear cross-team communication and well-structured documentation were essential to the success of both our team and others. While we each improved our technical skills, we found that soft skills—such as collaboration and effective communication—were just as critical. 
Additionally, developing a strong understanding of the high-level context of our work significantly reduced redundancy and saved time, allowing us to make more informed decisions thorughout the internship. @@ -41,10 +41,10 @@ Not only did we learn about methods for generating synthetic clinical data, but - [Our Early High Level Understanding](https://wehieduau.sharepoint.com/:w:/r/sites/StudentInternGroupatWEHI/Shared%20Documents/Clinical%20Dashboards/2025%20Summer/Jordan%27s%20Understanding.docx?d=wf77f1ae9648b43559146572142039c8c&csf=1&web=1&e=FiVAIM) - [Weekly Updates]([https://wehieduau.sharepoint.com/:f:/s/StudentInternGroupatWEHI/Evmt-NPbZ09Lq7WXyYhKohsBdIVREOYRZ2ujDZ1Td6K3HA?e=OtLmDL](https://wehieduau.sharepoint.com/:w:/r/sites/StudentInternGroupatWEHI/_layouts/15/Doc.aspx?sourcedoc=%7B86A2D5F1-8DC5-4272-9ABB-77625B1B79A4%7D&file=Weekly%20Email%20Template.docx&action=default&mobileredirect=true) - + # REDMANE Data Ingestion -The challenge that we were trying to solve was ensuring that metadata uploaded to the REDMANE data registry and data portals (specifically cBioPortal) were formatted in standardised ways. Different points of data ingestion required different metadata formats. For example, each data portal has its own specific format for metadata, and without a streamlined way to generate these metadata files, users would struggle to verify and upload their data correctly. This lack of consistency could lead to errors in data ingestion and disorganisation in REDMANE’s database. +The challenge that we were trying to solve was ensuring that metadata uploaded to the REDMANE data registry and data portals (specifically cBioPortal) were formatted in standardised ways. Different points of data ingestion required different metadata formats. For example, each data portal has its own specific format for metadata, and without a streamlined way to generate these metadata files, users would struggle to verify and upload their data correctly. 
This lack of consistency could lead to errors in data ingestion and disorganisation in REDMANE’s database. We solved a part of this by developing a script for registering files onto the REDMANE data registry. This script scans a specified local directory, extracts relevant metadata, and compiles it into a JSON file summary to be uploaded to the registry. This JSON report ensures that the metadata is ingestible to the registry’s standard, which was designed in collaboration with the REDMANE Web Development team. We also looked at converting our JSON report into RO-Crate. @@ -101,10 +101,10 @@ What we learned was that there are many approaches to deploying OMERO and config - Whiteboard presentation - [Whiteboard presentation video](https://wehieduau.sharepoint.com/:u:/s/StudentInternGroupatWEHI/EVYobuKcom1Aqx_Cj7qawSsBd9UGhS7S_oQvg5zOkQaKxg?e=Jw8ndR) - GitHub repos - - [Omero DataPortal](https://github.com/DBK333/Omero-DataPortal) -- Technical Diary - - [NectarVM Setup](https://wehieduau-my.sharepoint.com/:w:/g/personal/kasikumpaiboon_d_wehi_edu_au/EbuTOVm8MwNDrV3lIGVixukBgTxFFvSCCq3v-POA0LWpyA?e=MdpZDm) - - [Technical Diary Document](https://wehieduau-my.sharepoint.com/:w:/r/personal/kasikumpaiboon_d_wehi_edu_au/Documents/Microsoft%20Teams%20Chat%20Files/Data%20Portal%20Technical%20Diary(1).docx?d=wcd787f8fb8444df78c01881e89b00ea4&csf=1&web=1&e=pJsezK) + - [Omero DataPortal](https://github.com/DBK333/Omero-DataPortal) +- Technical Diary + - [NectarVM Setup](https://wehieduau-my.sharepoint.com/:w:/g/personal/kasikumpaiboon_d_wehi_edu_au/EbuTOVm8MwNDrV3lIGVixukBgTxFFvSCCq3v-POA0LWpyA?e=MdpZDm) + - [Technical Diary Document](https://wehieduau.sharepoint.com/:w:/s/StudentInternGroupatWEHI/ETlszokWPf5CjFXIE2imaDMBq_HXhHTzAS7nailAALxErQ?e=spjzm2) # REDMANE Web Dev @@ -158,7 +158,7 @@ We've validated the final output (.vcf files) by visualising it using IGV. # Student Organiser Data Visualisation -The challenge that we were trying to solve was ... 
ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. @@ -174,18 +174,18 @@ What we learned was ... maximus metus id erat pharetra facilisis. Nullam ac urna # Student Organiser PDF Coding -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was the time-consuming nature of reviewing internship applications. This involved downloading and sifting through numerous PDF resumes, extracting key information such as skills, experience, and education, and then comparing these across applicants. 
The process is not only slow but also prone to human error and can make it difficult to identify suitable candidates. -The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. +The way we tried to solve this was by creating a web-based application using HTML, CSS, and JavaScript. The application provides a user interface where PDF files can be uploaded, viewed, and highlighted. The intended goal for the site is to streamline the resume review process by making it easy to comment on selected text and assign categories to it. Ideally, the application would also connect directly to the student organiser so that resumes can be opened and previous comments reviewed easily. All tasks were intended to be completed using open-source tools and libraries. -What we learned was ... maximus metus id erat pharetra facilisis. Nullam ac urna ultricies diam volutpat faucibus. Sed feugiat placerat est nec scelerisque. Aenean a nisl sit amet ligula gravida fermentum eget in purus. Praesent a dui quis diam bibendum convallis vel in lacus. +What we learned was that open-source software can be either a great help or a major obstacle. On one hand, many open-source projects have communities built around them, so one might be able to find answers easily with just a Google search. On the other hand, it can be hard to find solutions to your specific problems because of outdated information or poor documentation. However, because these projects are open source, you can read through the source code to find your answers.
## Key links -- Final presentation slides (if supervisor agrees) -- Final presentation video (if supervisor agrees) -- GitHub repos +- [Final presentation slides](https://www.canva.com/design/DAGeA-99sw8/wu8bETDU_Ioi26iVPz-sQQ/view?utm_content=DAGeA-99sw8&utm_campaign=designshare&utm_medium=link2&utm_source=uniquelinks&utlId=h124da5d634) +- Final presentation video (tbc) +- [GitHub repo](https://github.com/WEHI-RCPStudentInternship/pdf-coder) - Technical Diary -- Weekly Progress +- [Weekly Progress](https://docs.google.com/document/d/11kn7avo8dtpY5Ho_D7bRSergjQzNMw7Q0_5i4YCggHg/edit?usp=sharing) - Project Management Tools # Student Organiser RAG LLM @@ -207,7 +207,7 @@ What we learned was there are many moving parts to a RAG LLM, all of which can b # Quantum Computing -The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. +The challenge that we were trying to solve was ... ipsum dolor sit amet, consectetur adipiscing elit. Phasellus sit amet turpis lacus. Morbi a risus sed nunc venenatis vehicula sed sit amet tortor. Integer leo metus, scelerisque quis gravida quis, laoreet sed nisl. Duis lacus diam, dapibus id orci nec, pellentesque sollicitudin arcu. Vestibulum auctor nec velit sit amet ornare. The way we tried to solve this was ... lorem mauris, ut suscipit dui porta at. Aenean elementum risus vel interdum condimentum. Nunc massa turpis, bibendum in leo vitae, dapibus cursus urna. Integer placerat lacinia finibus. Etiam vitae dolor ut tortor consectetur ultrices eu eget nisi. Cras neque massa, vestibulum id purus nec, aliquam molestie dolor. 
Vivamus sollicitudin, orci ut bibendum viverra, quam felis viverra purus, ut consectetur diam turpis quis eros. From 5e359a2e9a4e4765e835dbd4d0e49ab7ba2f5ff6 Mon Sep 17 00:00:00 2001 From: VitaChien <46022206+VitaChien@users.noreply.github.com> Date: Mon, 3 Mar 2025 09:02:13 +1100 Subject: [PATCH 6/6] Remove redundant space --- intakes/11-Summer-2024-2025/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/intakes/11-Summer-2024-2025/index.md b/intakes/11-Summer-2024-2025/index.md index c10468f..57b4c09 100644 --- a/intakes/11-Summer-2024-2025/index.md +++ b/intakes/11-Summer-2024-2025/index.md @@ -104,7 +104,7 @@ What we learned was that there are many approaches to deploying OMERO and config - [Omero DataPortal](https://github.com/DBK333/Omero-DataPortal) - Technical Diary - [NectarVM Setup](https://wehieduau-my.sharepoint.com/:w:/g/personal/kasikumpaiboon_d_wehi_edu_au/EbuTOVm8MwNDrV3lIGVixukBgTxFFvSCCq3v-POA0LWpyA?e=MdpZDm) - - [Technical Diary Document](https://wehieduau.sharepoint.com/:w:/s/StudentInternGroupatWEHI/ETlszokWPf5CjFXIE2imaDMBq_HXhHTzAS7nailAALxErQ?e=spjzm2) + - [Technical Diary Document](https://wehieduau.sharepoint.com/:w:/s/StudentInternGroupatWEHI/ETlszokWPf5CjFXIE2imaDMBq_HXhHTzAS7nailAALxErQ?e=spjzm2) # REDMANE Web Dev