# Introduction to Amazon Q Developer

**Difficulty Level: Beginner**

### Overview

Amazon Q Developer is a generative artificial intelligence (AI)-powered conversational assistant designed to enhance the software development process, particularly within the AWS ecosystem. Amazon Q Developer can assist with coding tasks such as providing inline code completions, generating new code, and scanning for security vulnerabilities. It can be accessed through an IDE, such as VS Code or JupyterLab, or through the command line.

The model behind Amazon Q Developer has been supplemented with high-quality AWS content, allowing users to ask questions about AWS architecture, AWS resources, best practices, documentation, and support.


### Learning Objectives

* Understand the capabilities of Amazon Q Developer and learn how to access the tool through an IDE 
* Engage with Amazon Q Developer and ask code-related queries
* Use inline code completions
* Generate new code snippets based on specific requirements 

### Prerequisites

To complete this tutorial, you will need access to SageMaker Studio.
Alternatively, you may install Amazon Q Developer in another IDE, such as VS Code or JupyterLab.

### Installing Amazon Q Developer on SageMaker Studio
1. Navigate to SageMaker Studio and create a domain.

Note: To make Amazon Q Developer available within JupyterLab, you will need to modify the IAM permissions associated with the SageMaker execution role. You will find the domain name on your SageMaker Studio launch page.


2. Navigate to IAM and search for the execution role associated with your SageMaker domain. The execution role name typically starts with `AmazonSageMaker-ExecutionRole-`; you can confirm which role your domain uses in the domain's settings in the SageMaker console.


3. Click on the role and scroll down to the role's permissions policies.

![alt text](../../Q-IAM-role.png)

![alt text](../../Q-role-policy.png)

4. Add the following permissions to the IAM role, for example as a new inline policy:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "q:SendMessage"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Sid": "AmazonQDeveloperPermissions",
            "Effect": "Allow",
            "Action": [
                "codewhisperer:GenerateRecommendations"
            ],
            "Resource": "*"
        }
    ]
}
```
5. Review and save your changes.

![alt text](../../Q-iam-policy-review.png)

6. Open your SageMaker Studio UI and create a JupyterLab space.

![alt text](../../Q-jupy-lab.png)

7. Click on the Amazon Q logo in the left panel.

![alt text](../../Q-amazon-q-jup.png)

Alternatively, you may open a Code Editor application from the SageMaker Studio UI and install Amazon Q Developer as an extension. Note that the extension will be removed when this application is shut down.
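If you prefer to add the permissions programmatically rather than through the IAM console, the sketch below builds the same two permission statements as a complete policy document and attaches it to the role as an inline policy with boto3's `put_role_policy`. The role name and the policy name `AmazonQDeveloperAccess` are placeholders (assumptions); substitute your own execution role and run it with credentials that are allowed to modify IAM roles.

```python
import json

# Placeholder: replace with your domain's execution role name
ROLE_NAME = "AmazonSageMaker-ExecutionRole-EXAMPLE"
# Hypothetical inline policy name; any name unique within the role works
POLICY_NAME = "AmazonQDeveloperAccess"


def build_q_policy() -> dict:
    """Return an IAM policy document with the Amazon Q Developer actions used in this tutorial."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["q:SendMessage"], "Resource": "*"},
            {
                "Sid": "AmazonQDeveloperPermissions",
                "Effect": "Allow",
                "Action": ["codewhisperer:GenerateRecommendations"],
                "Resource": "*",
            },
        ],
    }


def attach_q_policy(role_name: str = ROLE_NAME, policy_name: str = POLICY_NAME) -> None:
    """Attach the policy to the role as an inline policy (requires AWS credentials)."""
    import boto3  # imported here so the policy can be built without boto3 installed

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_q_policy()),
    )


if __name__ == "__main__":
    print(json.dumps(build_q_policy(), indent=2))
    # Uncomment once ROLE_NAME points at your execution role:
    # attach_q_policy()
```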

#### Environment Setup

Download the scripts required to run this tutorial using wget: 

```bash
wget https://raw.githubusercontent.com/STRIDES/NIHCloudLabAWS/refs/heads/drafts/notebooks/GenAI/example_scripts/bioinformatics_testing.py --no-check-certificate 
wget https://raw.githubusercontent.com/STRIDES/NIHCloudLabAWS/refs/heads/drafts/notebooks/GenAI/example_scripts/quick-actions-testing.ipynb --no-check-certificate 

```

### Use Cases

#### Use Case 1: Inline code completion

Inline code completion is a feature that helps you write code faster and with fewer errors. As you type, it suggests possible completions based on what you have already written.

Try it out! Let's add a sixth step to the `bioinformatics_testing.py` script that runs `samtools sort`. As you type, Amazon Q Developer suggests completions; press the Tab key to accept a suggestion.

![alt text](../../Q-code-completion-1.png.png)

Samtools Sorting Example: 
```python
# Step 6: Run Samtools Sort
for index, row in sample_sheet.iterrows():
    bam_file = f"./star_results/{row['sample_id']}.bam"
    sorted_bam_file = f"./star_results/{row['sample_id']}_sorted.bam"
    samtools_sort_command = f"samtools sort {bam_file} -o {sorted_bam_file}"
    subprocess.run(samtools_sort_command, shell=True)
```
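Note that the script indexes BAM files in Step 5, but `samtools index` expects a coordinate-sorted BAM, so in practice sorting comes before indexing. A minimal sketch of that order is below: `build_sort_and_index_commands` is a hypothetical helper that returns the per-sample commands (sort first, then index the sorted file). The `./star_results/{sample_id}.bam` layout is taken from the tutorial script; with its default settings STAR writes SAM output named `<prefix>Aligned.out.sam`, so these paths assume STAR has been configured to produce the `.bam` files the script expects.

```python
from pathlib import Path


def build_sort_and_index_commands(sample_id: str, results_dir: str = "./star_results") -> list[list[str]]:
    """Return the samtools commands for one sample: sort first, then index the sorted BAM."""
    bam_file = Path(results_dir) / f"{sample_id}.bam"
    sorted_bam = Path(results_dir) / f"{sample_id}_sorted.bam"
    return [
        ["samtools", "sort", str(bam_file), "-o", str(sorted_bam)],
        # samtools index needs coordinate-sorted input, so index the sorted file
        ["samtools", "index", str(sorted_bam)],
    ]


if __name__ == "__main__":
    for cmd in build_sort_and_index_commands("sample1"):
        print(" ".join(cmd))
```

In the script, you would loop over `sample_sheet.iterrows()` and pass each command to `subprocess.run(cmd, check=True)`, which raises an error if a step fails instead of silently continuing.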

#### Use Case 2: Quick actions in a Python notebook

The quick actions menu lists predefined prompts that you can send to the coding assistant. In this use case, we will test the `/fix`, `/optimize`, and `/explain` quick actions.

##### **`/fix` Prompt**

1. Open the `quick-actions-testing.ipynb` file
2. Run the notebook
3. Select the cell that contains an error (Cell 3)
4. Navigate to the Amazon Q Developer chat input and type `/fix`
5. Click on the down arrow next to the send button and select "Send message with selection" 

![alt text](../../Q-send-cell-with-prompt.png)

##### **Response** 

![alt text](../../Q-fix.png)

##### **Response Breakdown**
* The response contains the corrected code, a description of what the code does, and suggestions for improving it.
* You can implement the suggested changes by clicking the three dots at the top of the response and selecting "Replace selection".

##### **`/optimize` Prompt**

1. Select the cell in which data is added to the dataframe (Cell 4)
2. Navigate to the Amazon Q Developer chat input and type `/optimize`
3. Click on the down arrow next to the send button and select "Send message with selection" 

##### **Response**

![alt text](../../Q-optimize.png)

##### **Response Breakdown** 
* Several approaches to optimizing the selected code are suggested.

##### **`/explain` Prompt**
1. Select the cell in which matplotlib is used to create a plot from the dataframe (Cell 5)
2. Navigate to the Amazon Q Developer chat input and type `/explain`
3. Click on the down arrow next to the send button and select "Send message with selection" 

##### **Response** 

![alt text](../../Q-explain.png)

##### **Response Breakdown** 
* The functions used in the code snippet are explained, and suggestions for enhancing the code are provided.

#### Use Case 3: Rewrite or add to an existing script

Prompting can also be used to modify an existing script. In this example, we will ask Amazon Q Developer to add parallel processing to a script. Parallel processing allows a program to execute multiple tasks simultaneously, which can significantly reduce execution time, especially for computationally intensive tasks such as processing many samples independently.

Note: In this prompt, we paste the script directly into the chat. When using Amazon Q Developer in other IDEs, such as VS Code or Code Editor, you may reference files by their file paths instead.

##### **Prompt** 

I would like to use parallel processing in my `bioinformatics_testing.py` script. Can you modify this script to do so?

Script: 
```python
import pandas as pd
import subprocess

# Step 1: Read the sample sheet
sample_sheet = pd.read_csv('samplesheet.csv')

# Step 2: Run FastQC
for index, row in sample_sheet.iterrows():
    fastqc_command = f"fastqc {row['file_path']} -o ./fastqc_results/"
    subprocess.run(fastqc_command, shell=True)

# Step 3: Run MultiQC
multiqc_command = "multiqc ./fastqc_results/ -o ./multiqc_report/"
subprocess.run(multiqc_command, shell=True)

# Step 4: Run STAR aligner
for index, row in sample_sheet.iterrows():
    star_command = f"STAR --genomeDir /path/to/genome --readFilesIn {row['file_path']} --outFileNamePrefix ./star_results/{row['sample_id']}"
    subprocess.run(star_command, shell=True)

# Step 5: Index BAM files with Samtools
for index, row in sample_sheet.iterrows():
    bam_file = f"./star_results/{row['sample_id']}.bam"
    samtools_command = f"samtools index {bam_file}"
    subprocess.run(samtools_command, shell=True)
```


##### **Response**

![alt text](../../Q-parallel-processing.png)

##### **Response Breakdown**

The response includes the following elements: 

1. A Modified Python Script: The complete modified script, with parallel processing for the FastQC and STAR alignment steps.
2. Key Improvements: A list of the main improvements made in the modified script.
3. Customization Instructions: Instructions on how to customize the parallel processing by specifying the number of processes.
4. Notes: Some considerations and caveats related to the script and parallel processing.
5. Performance Optimization Tips: Tips for optimizing performance when using the modified script. 
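The full modified script appears only in the screenshot above. For reference, a minimal sketch of the underlying idea follows: independent per-sample commands are submitted to a worker pool so several run at once. It uses `concurrent.futures.ThreadPoolExecutor`, which is not necessarily the approach in Amazon Q Developer's response; threads suffice here because each one only waits on an external process. `run_in_parallel` is a hypothetical helper, and the demo uses stand-in Python commands instead of FastQC.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd: list[str]) -> int:
    """Run one external command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_in_parallel(commands: list[list[str]], max_workers: int = 4) -> list[int]:
    """Run independent commands concurrently; exit codes come back in input order."""
    # Each thread just waits on an external process, so the heavy work
    # (FastQC, STAR) runs outside the Python interpreter.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; in the tutorial script each one
    # would be e.g. ["fastqc", row["file_path"], "-o", "./fastqc_results/"]
    demo = [[sys.executable, "-c", f"print('sample {i} done')"] for i in range(3)]
    print(run_in_parallel(demo, max_workers=2))
```

Keep `max_workers` modest for memory-hungry tools: STAR in particular loads the full genome index into memory for each process.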

Alternatively, you can prompt the model to suggest ways to modify your existing scripts to run more efficiently. 

##### **Prompt**

What are some ways that I can optimize this script? 

Script: 
```python
import pandas as pd
import subprocess

# Step 1: Read the sample sheet
sample_sheet = pd.read_csv('samplesheet.csv')

# Step 2: Run FastQC
for index, row in sample_sheet.iterrows():
    fastqc_command = f"fastqc {row['file_path']} -o ./fastqc_results/"
    subprocess.run(fastqc_command, shell=True)

# Step 3: Run MultiQC
multiqc_command = "multiqc ./fastqc_results/ -o ./multiqc_report/"
subprocess.run(multiqc_command, shell=True)

# Step 4: Run STAR aligner
for index, row in sample_sheet.iterrows():
    star_command = f"STAR --genomeDir /path/to/genome --readFilesIn {row['file_path']} --outFileNamePrefix ./star_results/{row['sample_id']}"
    subprocess.run(star_command, shell=True)

# Step 5: Index BAM files with Samtools
for index, row in sample_sheet.iterrows():
    bam_file = f"./star_results/{row['sample_id']}.bam"
    samtools_command = f"samtools index {bam_file}"
    subprocess.run(samtools_command, shell=True)

```

##### **Response**

![alt text](../../Q-optimize-script.png)

##### **Response Breakdown** 

The response includes:

1. An Optimized Python Script: The script uses multiprocessing to run samples in parallel, handles errors comprehensively, sets up detailed logging, organizes code into modular functions, manages resources efficiently, and tracks progress with status updates and summary reports.
2. Explanation of Optimization Techniques: The response explains each of these optimization techniques.
3. Suggestions for Further Optimizations: Further optimizations, such as checkpointing and adding a configuration file, are suggested.
4. References: The sources used for the optimization suggestions are provided.
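Two of the ideas in the response, logging and checkpointing, can be illustrated in a few lines. In the sketch below, `run_step` is a hypothetical helper that skips a step when its expected output file already exists, so a rerun after a failure resumes where it left off instead of starting over, and it logs the tool's error output when a step fails. The demo uses a stand-in Python command in place of a real tool.

```python
import logging
import subprocess
import sys
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def run_step(name: str, cmd: list[str], expected_output: Path) -> bool:
    """Run one pipeline step unless its output already exists (a simple checkpoint).

    Returns True if the expected output exists after the call, False otherwise.
    """
    if expected_output.exists():
        log.info("Skipping %s: %s already exists", name, expected_output)
        return True
    log.info("Running %s: %s", name, " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Keep the tool's own error output so failures are easy to diagnose
        log.error("%s failed (exit %d): %s", name, result.returncode, result.stderr.strip())
        return False
    if not expected_output.exists():
        log.warning("%s finished but %s was not created", name, expected_output)
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        output = Path(workdir) / "step_output.txt"
        # Stand-in for a real tool (e.g. FastQC): a Python one-liner that writes the output file
        command = [sys.executable, "-c", f"open({str(output)!r}, 'w').write('ok')"]
        print(run_step("demo", command, output))  # runs the command
        print(run_step("demo", command, output))  # skipped: the output already exists
```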




#### Use Case 4: Code conversion

Code conversion is often necessary when you want to adapt existing scripts to different environments, workflows, or tools. A common use case is converting job scripts, such as SLURM batch scripts, into a workflow language such as Snakemake.

##### **Prompt** 

Convert this `starAlign.slurm` script into a Snakemake workflow.

Script:

```bash
#!/bin/bash
#SBATCH --job-name=star_alignment
#SBATCH --output=star_output.txt
#SBATCH --error=star_error.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=02:00:00

module load star

INPUT_DIR=/path/to/input
OUTPUT_DIR=/path/to/output
GENOME_DIR=/path/to/genome

STAR --genomeDir $GENOME_DIR --readFilesIn $INPUT_DIR/sample_R1.fastq $INPUT_DIR/sample_R2.fastq --outFileNamePrefix $OUTPUT_DIR/sample_

```

##### **Response**

![alt text](../../Q-snakemake-wf.png)

##### **Response Breakdown** 

The response includes the following elements: 

1. Scripts (`Snakefile`, `config.yaml`, `cluster.yaml`, `submit.sh`):
    * The `Snakefile` defines the workflow rules, including the `all` rule for final outputs and the `star_align` rule for alignment with STAR.
    * The `config.yaml` file specifies the workflow configuration.
    * The `cluster.yaml` file specifies default SLURM settings and resource requirements for specific rules.
    * The `submit.sh` script runs the Snakemake workflow on a SLURM cluster.
2. Key Features: The key features of the workflow are highlighted and explained.
3. Running the Workflow: The response explains how to run the workflow and offers multiple options for doing so.
4. Setup Instructions: Instructions are given to create the necessary directories, modify the configuration files, and run the workflow using the SLURM profile or direct submission.
5. Suggested Extensions: The response suggests adding more rules for quality control or downstream analysis, defining dependencies in an environment file, and including quality control outputs in the workflow.
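Much of a SLURM-to-Snakemake conversion is carrying the `#SBATCH` resource requests over to the rule, for example `--cpus-per-task=8` becoming `threads: 8`. The sketch below, with hypothetical helper names, extracts the directives from a script like `starAlign.slurm` and translates three of them into `threads`, `mem_mb`, and `runtime` (in minutes) values of the kind Snakemake rules commonly declare. Treat the mapping as an assumption to check against your Snakemake version and cluster profile; it is useful for verifying that the generated workflow kept the original resource requests.

```python
import re

# Matches lines like "#SBATCH --cpus-per-task=8"
SBATCH_RE = re.compile(r"^#SBATCH\s+--([\w-]+)=(\S+)")


def parse_sbatch(script_text: str) -> dict[str, str]:
    """Collect '#SBATCH --key=value' directives from a SLURM script."""
    directives = {}
    for line in script_text.splitlines():
        match = SBATCH_RE.match(line.strip())
        if match:
            directives[match.group(1)] = match.group(2)
    return directives


def to_snakemake_resources(directives: dict[str, str]) -> dict[str, int]:
    """Translate a few SLURM settings into Snakemake-style threads/resources values.

    Assumes --mem is given in megabytes or with an M/G suffix and --time as HH:MM:SS.
    """
    resources = {}
    if "cpus-per-task" in directives:
        resources["threads"] = int(directives["cpus-per-task"])
    if "mem" in directives:
        mem = directives["mem"].upper()
        if mem.endswith("G"):  # SLURM treats G as 1024 MB
            resources["mem_mb"] = int(mem[:-1]) * 1024
        elif mem.endswith("M"):
            resources["mem_mb"] = int(mem[:-1])
        else:  # SLURM's default unit is megabytes
            resources["mem_mb"] = int(mem)
    if "time" in directives:
        hours, minutes, seconds = (int(part) for part in directives["time"].split(":"))
        resources["runtime"] = hours * 60 + minutes + (1 if seconds else 0)
    return resources


if __name__ == "__main__":
    example = """#!/bin/bash
#SBATCH --job-name=star_alignment
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=02:00:00
"""
    directives = parse_sbatch(example)
    print(directives)
    print(to_snakemake_resources(directives))
```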

#### Use Case 5: Cloud migration assistance

Cloud migration of bioinformatics pipelines involves moving data and computational workflows to the cloud. This allows researchers to use scalable cloud resources, making it easier to process large datasets and perform complex analyses efficiently and cost-effectively. Prompting can help with migrating pipelines to the cloud. Because Amazon Q Developer specializes in AWS-related queries and tasks, let's prompt it to help migrate the workflow to AWS infrastructure.

##### **Prompt** 

Modify this Snakemake workflow to run using cloud resources in AWS. What are the best practices for securing Snakemake workflows when using AWS cloud resources? 

Note: In this prompt, we continue asking questions in the same chat thread that generated the Snakemake workflow earlier. Building on your chat history and refining the response over several turns to suit your needs is an example of iterative, multi-turn prompting.


##### **Response**

![alt text](../../Q-snakemake-cloud.png)

##### **Response Breakdown** 

1. Modified Script: The response contains modifications that enable the pipeline to run on AWS infrastructure. For example, the `Snakefile` uses S3 URLs and AWS-specific parameters. Additionally, an IAM policy is defined for running in AWS, along with a `submit_aws.py` script that submits the Snakemake workflow to AWS Batch using the AWS SDK for Python (Boto3).
2. Key Features of the Implementation: The response highlights key features such as AWS integration for input/output storage and job scheduling, security best practices (IAM policies, secrets management, network security, data encryption, logging/monitoring), and resource management through AWS Batch settings.
3. Steps to Use the Workflow: Instructions are given to configure AWS settings in the configuration files, define necessary IAM policies, and run the workflow using the AWS Batch submission script.
4. Additional Security Measures: The response suggests creating VPC endpoints for AWS services, setting up S3 lifecycle policies for data retention and deletion, using AWS KMS for server-side encryption, and enabling S3 access logging for audit trails.
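The `submit_aws.py` script from the response is shown only in the screenshot. As a rough illustration of what submitting a containerized Snakemake run to AWS Batch with Boto3 involves, the sketch below builds a `submit_job` request and overrides the container command. The job queue, job definition, and container image (which would need Snakemake installed) are assumptions you would create separately, and `build_batch_job` and `submit_batch_job` are hypothetical helpers.

```python
from typing import Optional


def build_batch_job(job_name: str, job_queue: str, job_definition: str, snakemake_args: list[str]) -> dict:
    """Build keyword arguments for the AWS Batch SubmitJob API."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Override the container command so the job runs Snakemake with our arguments
        "containerOverrides": {"command": ["snakemake", *snakemake_args]},
    }


def submit_batch_job(job_kwargs: dict, region: Optional[str] = None) -> str:
    """Submit the job with Boto3 and return the job ID (requires AWS credentials)."""
    import boto3  # imported here so the request can be built without boto3 installed

    batch = boto3.client("batch", region_name=region)  # None uses your default region
    response = batch.submit_job(**job_kwargs)
    return response["jobId"]


if __name__ == "__main__":
    # Hypothetical queue and job definition names; replace with your own resources
    job = build_batch_job(
        job_name="star-alignment",
        job_queue="snakemake-queue",
        job_definition="snakemake-job-def",
        snakemake_args=["--cores", "8", "--configfile", "config.yaml"],
    )
    print(job)
    # job_id = submit_batch_job(job)
```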

#### Use Case 6: Write code from scratch

Prompting an AI tool like Amazon Q Developer to write a script can be useful for several reasons. It reduces the time and effort needed to create code from scratch. It also serves as a learning aid, helping users understand coding practices and library usage through generated examples. Finally, the generated scripts provide a flexible starting point that can be customized to meet specific requirements, allowing users to quickly adapt and expand their projects.

##### **Prompt** 

Can you assist me in writing an R script that generates a plot of gene expression levels from a dataset? Please use the ggplot2 library for visualization. The script should read a CSV file containing gene expression data and produce a bar plot showing the expression levels of each gene. 

##### **Response**

![alt text](../../Q-R-script.png)

##### **Response Breakdown**
1. Script: The response includes an R script that follows the prompt instructions. Comments within the script explain the logic behind each step.
2. Feature Explanation: The features included in the script are highlighted.
3. Usage Instructions: Instructions on how to run the script and customize it as needed are provided.
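To try the generated R script, you need a CSV file of gene expression data. The sketch below writes a small test file; the column names `gene` and `expression`, the file name `gene_expression.csv`, and the values are made-up placeholders for testing only, so rename the columns to match whatever the generated script reads (or state your column names in the prompt).

```python
import csv
from pathlib import Path

# Made-up placeholder values, for testing the plotting script only
EXAMPLE_ROWS = [
    ("GENE_A", 12.4),
    ("GENE_B", 8.1),
    ("GENE_C", 15.7),
    ("GENE_D", 5.3),
]


def write_expression_csv(path: Path, rows=EXAMPLE_ROWS) -> None:
    """Write a two-column CSV (gene, expression) with a header row."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["gene", "expression"])
        writer.writerows(rows)


if __name__ == "__main__":
    write_expression_csv(Path("gene_expression.csv"))
```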

#### Use Case 7: Error debugging

Amazon Q Developer can also be used to identify and fix errors in your code. This can save time and help uncover errors that would otherwise be difficult to resolve.
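Amazon Q Developer can usually suggest a more targeted fix when you include the exact error message along with the code. One way to collect that message for a failing command is sketched below: `capture_failure` is a hypothetical helper that runs a command and, if it fails, returns a short report (command, exit code, and error output) that you can paste into the chat together with the relevant code.

```python
import subprocess
import sys
from typing import Optional


def capture_failure(cmd: list[str]) -> Optional[str]:
    """Run a command; return None on success, or a report of the failure to share with Amazon Q."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    return (
        f"Command: {' '.join(cmd)}\n"
        f"Exit code: {result.returncode}\n"
        f"stderr:\n{result.stderr.strip()}"
    )


if __name__ == "__main__":
    # Deliberately failing example: reading a file that does not exist
    report = capture_failure([sys.executable, "-c", "open('missing.csv')"])
    print(report or "Command succeeded")
```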

### Prompting Best Practices 

Here are some tips to help you in your prompting journey!

1. Be Specific: Clearly state your request or question. Provide specific details to avoid ambiguity.

2. Provide Context: Provide background information that would be relevant to your prompt.

3. Break Down Complex Requests: Divide complex tasks into smaller, manageable parts.

4. Iterate and Refine: Refine your prompts based on the responses you receive. Provide additional context to tailor the responses to your end goal.

5. Validate the Response: Always use a human-in-the-loop approach to validate the responses you receive.


### Conclusions

Congrats! You have successfully experimented with the features of Amazon Q Developer. We hope you continue leveraging the power of GenAI and Amazon Q Developer to drive impactful results in your projects.

### Clean Up

Once you have completed the tutorial, you may stop the JupyterLab application and delete the SageMaker Studio domain. 

#### Stopping domain applications

1. Navigate to Amazon SageMaker 
2. Click on Admin configurations > Domains in the left menu bar
3. Select the SageMaker Studio domain by clicking on the circle found on the right side of the domain name
4. Scroll down to the available applications 
5. Select any applications you have created and click Stop
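Stopping apps can also be scripted. In the SageMaker API, stopping a Studio app corresponds to `DeleteApp`. The sketch below lists a domain's apps with `list_apps` and deletes the ones that are not already deleted, passing either the space name or the user profile name that owns each app. `stop_domain_apps` is a hypothetical helper, and the client is passed in so the logic can be checked without AWS access.

```python
def stop_domain_apps(sm_client, domain_id: str) -> list[str]:
    """Delete (stop) every app in the domain that is not already deleted; return their names."""
    stopped = []
    request_args = {"DomainIdEquals": domain_id}
    while True:
        page = sm_client.list_apps(**request_args)
        for app in page.get("Apps", []):
            if app.get("Status") in ("Deleted", "Deleting"):
                continue
            delete_args = {"DomainId": domain_id, "AppType": app["AppType"], "AppName": app["AppName"]}
            # An app belongs either to a space (e.g. a JupyterLab space) or to a user profile
            if app.get("SpaceName"):
                delete_args["SpaceName"] = app["SpaceName"]
            else:
                delete_args["UserProfileName"] = app["UserProfileName"]
            sm_client.delete_app(**delete_args)
            stopped.append(app["AppName"])
        # Follow pagination until there are no more pages
        if not page.get("NextToken"):
            return stopped
        request_args["NextToken"] = page["NextToken"]
```

With AWS credentials configured, you would call it as `stop_domain_apps(boto3.client("sagemaker"), "d-xxxxxxxxxxxx")`, substituting your own domain ID.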

#### Deleting the SageMaker Studio domain

1. Navigate to Amazon SageMaker 
2. Click on Admin configurations > Domains in the left menu bar
3. Click on the domain name 
4. Scroll down to the Delete domain box and select Delete domain