<a href="https://colab.research.google.com/github/chris-lovejoy/CodingForMedicine/blob/main/Setting_up_Jupyter_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Testing our environment setup

The purpose of this notebook is to introduce Jupyter Notebooks and to test your environment setup (whether you are running on Google Colab or on your local computer).


### What is a Jupyter Notebook?
The Jupyter Notebook is a browser-based tool for executing code, with an emphasis on easy visualisation and interpretability. It is most commonly used for Python, although it can be used for other languages such as Julia, R, Haskell and Ruby. The file extension for Jupyter Notebook files is usually '.ipynb'.


### How to run a Jupyter Notebook

You can run Jupyter Notebooks either on [Google Colaboratory](http://colab.research.google.com/) ("Google Colab") or on your local computer. Google Colaboratory is quicker and easier to set up. It saves you from downloading the Jupyter software, as well as the associated libraries (e.g. numpy, pandas) that you may need.


But there are advantages to running Jupyter on your local computer, including:

1. **You can manage data files more easily.** In Google Colab, you may have to re-upload data files each time you re-load the notebook. This isn't true when the files are local.

2. **Version control is easier**, for example by uploading different versions of your notebook to [GitHub](https://www.github.com).

3. **You can import your own modules more easily.** Once you are more experienced with Jupyter Notebooks, you may want to import custom Python scripts into your notebooks.



### How to get set up
If you want to run the Jupyter Notebook on Google Colab, click on the 'Open in Colab' button at the top of this notebook. Then, click "File -> Save a copy in Drive". You must be logged into a Google account to do so. This will make your own copy of the notebook within your Google Drive, which you can then modify.

If you want to run the Jupyter Notebook locally, use [this guide](https://realpython.com/jupyter-notebook-introduction/) to get up and running.



#### Installing local packages
If you are running Jupyter Notebook locally, you may need to install 'packages' or 'libraries'. These are pieces of code that enable specific functionality. 

Popular Python packages include:
- **numpy**: For more advanced manipulation of numbers
- **pandas**: For better handling of data
- **matplotlib**: For visualisation of data, with plots and graphs

If you installed Python through Anaconda, the easiest way to install these for use in a Jupyter Notebook is with the 'conda' package manager. This involves running
> 'conda install [package-name]' 

from the command line/terminal. The commands for different packages can be found on the Anaconda website, such as [this command for installing numpy](https://anaconda.org/anaconda/numpy).
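
Before installing anything, it can help to check which packages Python can already find. The sketch below is one way to do that, assuming the three packages listed above are the ones you need. It uses only Python's standard library, and the names 'packages' and 'statuses' are made up for this illustration.

```python
import importlib.util

# Packages mentioned in this notebook; edit the list to suit your project
packages = ["numpy", "pandas", "matplotlib"]

statuses = {}
for name in packages:
    # find_spec returns None when Python cannot locate the package
    found = importlib.util.find_spec(name) is not None
    statuses[name] = "installed" if found else "missing"
    print(f"{name}: {statuses[name]}")
```

Any package reported as "missing" is one you would need to install before importing it.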


### Testing your setup
As a general principle, a completed Jupyter Notebook should be able to run through to completion. You can test by selecting "Kernel -> Restart and Run All".

This helps catch mistakes that can cause erroneous outputs. Two common ones are using a variable before the cell that defines it, and accidentally giving two different variables the same name. Neither may be obvious while you are writing the code, because values from cells you ran earlier stay in memory. But when you restart and run from top to bottom, the first mistake will throw an error and the second can show up as a changed output.
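
As a small illustration of the first mistake, the sketch below uses a variable one line before it is defined. The variable name 'dose_mg' and its value are invented for this example.

```python
# In a freshly restarted kernel, 'dose_mg' does not exist yet
try:
    print(dose_mg)
    outcome = "no error"
except NameError as error:
    outcome = f"NameError: {error}"

print(outcome)

# The variable is only defined here, after it was used
dose_mg = 500
```

Run in a fresh kernel, this reports a NameError. Run the same cell a second time without restarting and the error disappears, because 'dose_mg' is now stored in memory. That hidden dependence on run order is what "Restart and Run All" brings to the surface.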


## 1. Executing cells

You can 'execute' each cell of code individually by clicking 'Run' above or by pressing Shift + Enter.

Let's run the cells below.

In [1]:
# Declare a "string" of text
text = "let's learn about data science!"

In [2]:
# Declare an integer (i.e. a whole number)
number = 0

Strings, integers and other data types are covered in [this exercise](./Python_principles.ipynb) on Python principles.

## 2. Importing packages

Let's start by importing the 'numpy' package, to enable us to manipulate values more easily. A 'package' or 'library' is code written by someone else that we can use to get extra functionality.

We can import the package by running the cell below.

If we're running on Google Colab, it will load that library from the computer running the Colab notebook. If we're running on our local computer (i.e. the PC or laptop in front of you), it will load from your own computer. This means you'll need to have installed the package on your computer, ready to be imported. See the section "Installing local packages" above for help with this, or fill in [this form](https://docs.google.com/forms/d/e/1FAIpQLSdoOjVom8YKf11LxJ_bWN40afFMsWcoJ-xOrKhMbfBzgxTS9A/viewform) if you're struggling and need help.

*(Note: If you don't have a certain package available on Google Colab, you can install it with "!pip install [package name]")*

In [3]:
import numpy as np

We can now access the functions of numpy by starting with "np.[function name]". For example, we can make an 'array' using "np.array". (Arrays are covered in [this exercise](./Python_principles.ipynb) on Python principles.)

In [4]:
array_1 = np.array(["to", "I'm", "import", "numpy!", "able"])

In [5]:
sequence = [1, 4, 0, 2, 3]

for index in sequence:
    print(array_1[index], end=" ")

I'm able to import numpy! 

## 3. Loading data 

To load a dataset, we'll want to import the 'pandas' library. We can do that in the same way:

In [6]:
import pandas as pd

One of the most commonly used data formats is the 'csv' file, which stands for "comma-separated values". In this file format, the data consists of rows of values separated by commas (hence the name). CSV files can be opened by common spreadsheet software, such as Microsoft Excel, and can be easily imported using the pandas library.
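
To make the format concrete, here is a minimal sketch that reads a few lines of CSV text using Python's built-in 'csv' module. The content of 'raw_text' is invented for illustration and is not taken from the data file used later in this notebook.

```python
import csv
import io

# Hypothetical CSV content: one header row, then one row per record
raw_text = "patient,age,diagnosis\nA,54,M\nB,61,B\n"

# csv.reader splits each line into a list of values at the commas
rows = list(csv.reader(io.StringIO(raw_text)))
header, records = rows[0], rows[1:]

print(header)   # ['patient', 'age', 'diagnosis']
print(records)  # [['A', '54', 'M'], ['B', '61', 'B']]
```

Note that every value comes back as text, including the ages. Libraries such as pandas convert numeric columns for you, which is one reason they are convenient for data work.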


[Here](https://github.com/chris-lovejoy/CodingForMedicine/blob/main/exercises/data.csv) is an example csv file which we can load into our notebook.

We want to make sure that the file is in the same directory as our current notebook. To do that, we can use the "ls" command to *list* all the files. To run this from within the Jupyter Notebook, we need to add an '!', as follows. (On Windows, where "ls" may not be available, "!dir" does the same job.)

In [7]:
!ls

Breast_cancer_features.ipynb            Python_principles.ipynb
Coding_Medical_Calculator.ipynb         Sentiment_analysis_doctor_reviews.ipynb
Diagnosing_Chest_X-Rays.ipynb           Setting_up_Jupyter_Notebook.ipynb
Predicting_No_Shows.ipynb               data.csv


If you're on Google Colab, the file will not be listed until you upload it. To do so:
1. Make sure you are connected to a runtime (click Connect in the top right, if you aren't)
2. Select the 'Files' folder icon in the left-hand sidebar
3. Download the 'data.csv' file and drag it into the Files tab (it should show the file uploading in the bottom left, and then you'll see 'data.csv' within the Files tab).


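Other helpful commands for navigating files are:
- **pwd**: "Print Working Directory", i.e. show the folder you are currently operating in
- **cd**: "Change Directory" to a different folder

To run 'pwd' within the notebook, add an '!' in front, as with 'ls'. For 'cd', use '%cd' instead: a command run with '!' executes in a temporary shell, so '!cd' would not change the folder the notebook itself is working in.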
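
The same checks can also be done in plain Python, which behaves the same way on Windows, macOS, Linux and Colab. The sketch below uses the standard 'pathlib' module. The variable names are illustrative, and 'data.csv' is the file name used in this notebook.

```python
from pathlib import Path

# Equivalent of 'pwd': the folder the notebook is currently working in
current_folder = Path.cwd()

# Equivalent of 'ls': the names of everything in that folder, sorted
file_names = sorted(item.name for item in current_folder.iterdir())

# Check directly whether the data file is where we expect it
has_data = (current_folder / "data.csv").exists()

print("Working directory:", current_folder)
print("Files here:", file_names)
print("data.csv present:", has_data)
```

If the last line prints False, the file is not in the notebook's folder yet and the import in the next section will fail.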

### Importing the data with pandas

Let's now test whether we can import our data. We can use the 'read_csv' function from within the 'pandas' library.

In [8]:
dataset = pd.read_csv('./data.csv')
dataset

| | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.30010 | 0.14710 | ... | 17.33 | 184.60 | 2019.0 | 0.16220 | 0.66560 | 0.7119 | 0.2654 | 0.4601 | 0.11890 | |
| 1 | 842517 | M | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.08690 | 0.07017 | ... | 23.41 | 158.80 | 1956.0 | 0.12380 | 0.18660 | 0.2416 | 0.1860 | 0.2750 | 0.08902 | |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.19740 | 0.12790 | ... | 25.53 | 152.50 | 1709.0 | 0.14440 | 0.42450 | 0.4504 | 0.2430 | 0.3613 | 0.08758 | |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.24140 | 0.10520 | ... | 26.50 | 98.87 | 567.7 | 0.20980 | 0.86630 | 0.6869 | 0.2575 | 0.6638 | 0.17300 | |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.19800 | 0.10430 | ... | 16.67 | 152.20 | 1575.0 | 0.13740 | 0.20500 | 0.4000 | 0.1625 | 0.2364 | 0.07678 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 564 | 926424 | M | 21.56 | 22.39 | 142.00 | 1479.0 | 0.11100 | 0.11590 | 0.24390 | 0.13890 | ... | 26.40 | 166.10 | 2027.0 | 0.14100 | 0.21130 | 0.4107 | 0.2216 | 0.2060 | 0.07115 | |
| 565 | 926682 | M | 20.13 | 28.25 | 131.20 | 1261.0 | 0.09780 | 0.10340 | 0.14400 | 0.09791 | ... | 38.25 | 155.00 | 1731.0 | 0.11660 | 0.19220 | 0.3215 | 0.1628 | 0.2572 | 0.06637 | |
| 566 | 926954 | M | 16.60 | 28.08 | 108.30 | 858.1 | 0.08455 | 0.10230 | 0.09251 | 0.05302 | ... | 34.12 | 126.70 | 1124.0 | 0.11390 | 0.30940 | 0.3403 | 0.1418 | 0.2218 | 0.07820 | |
| 567 | 927241 | M | 20.60 | 29.33 | 140.10 | 1265.0 | 0.11780 | 0.27700 | 0.35140 | 0.15200 | ... | 39.42 | 184.60 | 1821.0 | 0.16500 | 0.86810 | 0.9387 | 0.2650 | 0.4087 | 0.12400 | |


## 4. Bringing it all together

Let's now test whether everything worked by running the cell below. If so, it should print out a message.

This cell combines values from throughout this notebook. So if anything didn't run, it will throw an error.

See if you can figure out how the code in the cell below works. But don't worry if you don't understand it - it uses a lot of concepts we haven't covered in this notebook. These concepts are covered in [this exercise](./Python_principles.ipynb) on Python principles.


In [10]:
array_2 = np.array(["Python!", "ready to", "all set up"])
print(array_1[1], array_2[2], "which", end=" ")
print(dataset.columns[2][-4:], end="")
print("s", array_1[1], array_2[1], text[6:-1], end=" ")
print("using", array_2[0])
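
If the square brackets with colons in the cell above look unfamiliar, they are 'slices', which pick out part of a string. The short sketch below reuses the 'text' string from cell 1 to show the two kinds of slice the cell relies on. The names 'middle' and 'last_four' are made up for this illustration.

```python
text = "let's learn about data science!"

# [6:-1] starts at position 6 (counting from 0) and stops before the last character
middle = text[6:-1]

# [-4:] keeps only the last four characters
last_four = text[-4:]

print(middle)     # learn about data science
print(last_four)  # nce!
```

The cell above applies the same '[-4:]' slice to a column name from the dataset rather than to 'text'.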