# Getting Started

This is a Jupyter Notebook file. You can run code directly in a Jupyter Notebook by typing commands (typically in Python), and you can also use the file as a place to document and share information.

### Getting Started

Since you are viewing this file, you have likely already done the following. If you haven't done these steps yet, please do so now:

#### Start an interactive session

- Sign in to https://ood.huit.harvard.edu/ 
- Navigate to `Interactive Apps → Jupyter Lab`
- Launch a Jupyter Lab session with the following parameters:
    - Number of hours: 2
    - Number of CPUs: 1
- When the session is ready, click “Connect to Jupyter”
  
#### Make a copy of the Jupyter Notebook that contains these instructions

You'll copy this file from the shared course folder so that you can edit it as you see fit.

Today's instructions are saved in the following place:
`~/153784/practical_instructions/GettingStarted.ipynb`

To make a copy of these instructions, run the following from within your terminal:

Now you should be able to open your own editable version of this Jupyter Notebook by double-clicking on the file called `GettingStarted.ipynb` in the file list to the left.

### Locate your mystery genome's bam file

Now it's time to find the bam file associated with your mystery genome. It should be saved in the shared course folder, in a sub-directory called `data/mystery_genomes`

Take a look at the contents of this directory by running the following command:

### View your mystery genome's bam file header

Now it's time to use the tool samtools to view your mystery genome's bam file header. 

Modify the placeholder text in the following code to view the header associated with your mystery genome 

<i> Remember, placeholder text is typically placed between brackets (be sure to replace the brackets as well) </i>

### View the first 25 lines of your mystery genome's bam file

In the above command, we used the `-H` flag to specify that we only want to view the bam header. Without this flag, the command would have returned the full contents of the bam file with no header.

The `-h` flag tells samtools to return the full bam file together with its header.

<b>Note</b> - Don't try running the above command without the `-H` flag. Your bam file is so big that printing all of it will likely crash your interactive session.
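To make the difference between `-H`, `-h`, and no flag more concrete, here is a minimal Python sketch. It does not call samtools. It relies only on the fact that, in the text (SAM) form of an alignment file, header lines start with `@`. The tiny `sam_text` example and the `split_sam` helper are made up for illustration and are not part of the course data.

```python
# Hypothetical miniature SAM text (made up for illustration; not course data)
sam_text = """\
@HD\tVN:1.6\tSO:coordinate
@SQ\tSN:chr1\tLN:1000
read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tFFFF
read2\t16\tchr1\t5\t60\t4M\t*\t0\t0\tTTGA\tFFFF
"""


def split_sam(text):
    """Separate header lines (which start with '@') from alignment records."""
    header, records = [], []
    for line in text.splitlines():
        if line.startswith("@"):
            header.append(line)
        else:
            records.append(line)
    return header, records


header, records = split_sam(sam_text)

header_only = header            # analogous to `samtools view -H`
with_header = header + records  # analogous to `samtools view -h`
records_only = records          # analogous to `samtools view` with neither flag

print(len(header_only), len(with_header), len(records_only))
```

In a real bam file the header is usually a few dozen to a few thousand lines, while the records can run to hundreds of millions of lines, which is why only the header-only view is safe to print in full.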

If you want to see just a part of your bam file, you can combine `samtools view` with the `head` command by running the following:

Notice how this time you don't see the header (since you didn't ask for it with either `-H` or `-h`).

The `-25` flag tells the `head` command to show you the first 25 lines of its input. If you hadn't used this flag, `head` would have returned the first 10 lines, which is its default.
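Piping into `head` is safe for a very large file because only the first lines are ever read. As a rough analogy in Python (not how `head` itself is implemented), the sketch below takes the first 25 items from a lazy stream without touching the rest. The `head` helper and the `records` generator are hypothetical stand-ins for the command and for the lines samtools would print.

```python
from itertools import islice


def head(lines, n):
    """Return the first n items of an iterable, stopping as soon as n are read."""
    return list(islice(lines, n))


# Hypothetical stand-in for a long stream of alignment records
records = (f"record_{i}" for i in range(1, 1001))

first_25 = head(records, 25)
print(len(first_25), first_25[0], first_25[-1])
```

Because the generator is lazy, the remaining 975 items are never produced unless something else asks for them.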

## See what else a Jupyter Notebook can do

For most of this course, I'll just be using Jupyter Notebooks to share instructions with you that you will run via a Terminal window, but Jupyter Notebooks are much more powerful. 

There are three types of Jupyter Notebook cells: Markdown, code, and raw. Try executing each of the following cells by pressing `Command+Enter` (`Ctrl+Enter` on Windows or Linux) while the cell is selected to see how they work:

This is a markdown cell, which can be used to record notes

In [None]:
print("This is a code cell, which you can run code from, like this print statement")

### A few examples of running code in a Jupyter Notebook

#### Python code

This type of Jupyter Notebook is primarily set up to run code in Python.

Here's an example of some simple Python code that prints the numbers 1 through 10.

Try executing the code:

In [None]:
for x in range(1, 11):
    print(x)
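The loop above prints each number on its own line rather than building a list. If an actual list object is wanted, a minimal sketch could look like the following (the variable name `numbers` is just an illustrative choice):

```python
# range(1, 11) starts at 1 and stops before 11, so it covers 1 through 10
numbers = list(range(1, 11))
print(numbers)
```

Storing the values in a list means they can be reused later in the notebook, for example with `sum(numbers)` or `len(numbers)`.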

#### Bash code

You can also run bash code from within a Jupyter Notebook if you start the command with an exclamation point (`!`).

Try printing the contents of the directory that this file is in by executing the following code:

In [None]:
! ls

Or, if we want to replicate the Python code from the previous section in bash, we can run the following bash code:

In [None]:
! for num in {1..10}; do echo "${num}"; done 
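The `!` prefix is a feature of the notebook environment, not of the Python language, so it won't work in a plain `.py` script. If you ever need the same behavior in ordinary Python, one possible approach is the standard library's `subprocess` module. The sketch below assumes a Unix-like system where `ls` is available, and the variable names are illustrative.

```python
import subprocess

# Run `ls` in the current working directory and capture what it prints
result = subprocess.run(["ls"], capture_output=True, text=True, check=True)

# One entry per line of output
file_names = result.stdout.splitlines()
print(file_names)
```

Unlike `! ls`, this gives you the output as a Python list that you can loop over or filter.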

#### Loading and manipulating a data table with pandas 

Jupyter Notebooks are great places to view and manipulate data tables. The following code uses the tool pandas to load data from an example CSV file in the shared class directory and display it on the screen.

In [None]:
import pandas as pd

data = pd.read_csv('~/153784/data/reference_data/getting_started_table.csv')

data

You can sort the table based on the values in a particular column, using a command like:

In [None]:
data.sort_values(by="Weight (g)")

Or you can subset the data, using a command like:

In [None]:
data[data["Season"]=="Fall"]
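Sorting and subsetting can also be chained. The sketch below uses a small made-up table so that it runs without the course file: the `Season` and `Weight (g)` column names come from the examples above, while the `Fruit` column and all of the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows for illustration; not the contents of the course CSV
example = pd.DataFrame(
    {
        "Fruit": ["Apple", "Pear", "Strawberry", "Pumpkin"],
        "Season": ["Fall", "Fall", "Spring", "Fall"],
        "Weight (g)": [180, 170, 12, 4000],
    }
)

# Keep only the Fall rows, then sort those rows from lightest to heaviest
fall_by_weight = example[example["Season"] == "Fall"].sort_values(by="Weight (g)")
print(fall_by_weight)
```

Neither step changes `example` itself. Each one returns a new table, so assign the result to a variable if you want to keep it.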

Learn more about everything you can use pandas for here: 
https://pandas.pydata.org/docs/user_guide/index.html#user-guide 

#### Making a simple plot with matplotlib

In this class, you will be asked to plot the results of some of your analyses using a tool of your choice. 

Using the data from the previous section, let's make a simple graph with the tool matplotlib (https://matplotlib.org/stable/users/index.html) 

In [None]:
import matplotlib.pyplot as plt

data.groupby('Color').size().plot(kind="pie")

Or you can make a scatter plot that compares the weight and price of each fruit:

In [None]:
plt.scatter(data["Weight (g)"], data["Price per kg"])

Make sure to add appropriate labels to your figure. You can also customize the color and style of your markers:

In [None]:
plt.scatter(data["Weight (g)"], data["Price per kg"], color="red", marker="*")
plt.xlabel("Weight (g)")
plt.ylabel("Price per kg")
plt.title("Fruit Prices")
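If you need to hand in or share a figure, it can be written to an image file. The following is a minimal sketch of one way to do that with matplotlib's `savefig`. The data values, the file name `fruit_prices.png`, and the choice of the system temporary directory are all assumptions for illustration, so substitute your own data and destination.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Hypothetical values for illustration; not taken from the course table
weights = [180, 170, 12]
prices = [4.0, 5.5, 12.0]

# Build the labeled scatter plot on an explicit figure and axes
fig, ax = plt.subplots()
ax.scatter(weights, prices, color="red", marker="*")
ax.set_xlabel("Weight (g)")
ax.set_ylabel("Price per kg")
ax.set_title("Fruit Prices")

# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.gettempdir(), "fruit_prices.png")
fig.savefig(out_path, dpi=150)
print(out_path)
```

The file extension determines the output format, so changing `.png` to `.pdf` would produce a PDF instead.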