# Code Club 001: introduction to Jupyter notebooks


In [2]:
import os
import time

code_club_rule = True
count = 1

while code_club_rule == True:
    
    if str(count)[-1] == '1':
        ordinal_ind = 'st'
    elif str(count)[-1] == '2':
        ordinal_ind = 'nd'
    elif str(count)[-1] == '3':
        ordinal_ind = 'rd'
    else:
        ordinal_ind = 'th'

    print(f'The {count}{ordinal_ind} rule of Code Club is...')
    print("... always talk about Code Club!!!")

    count += 1
    time.sleep(4)

    if count == 10:
        code_club_rule = False

The 1st rule of Code Club is...
... always talk about Code Club!!!


KeyboardInterrupt: 

## Jupyter Notebooks

Jupyter is a not-for-profit project that develops open-source software for interactive computing across multiple programming languages.

  - The name Jupyter is derived from the three core programming languages supported: Julia, Python and R

Jupyter's most famous product is its *Notebook* interface, which allows users to combine text and code cells:

  - Text cells are written in `markdown`, a simple markup language that should be familiar to anyone who has written for a wiki (**n.b.** it's also supported in Teams)
    - here's a guide to its use: [https://www.markdownguide.org/](https://www.markdownguide.org/)

  - Code cells can be written in `Julia`, `python` and `R`, but the most common language used is `python`

In 2017, Jupyter Notebooks won the prestigious ACM Software System Award [https://blog.jupyter.org/jupyter-receives-the-acm-software-system-award-d433b0dfe3a2](https://blog.jupyter.org/jupyter-receives-the-acm-software-system-award-d433b0dfe3a2)

There is a range of platforms that allow you to create and distribute notebooks via the internet, including [Kaggle.com](https://www.kaggle.com/) and [Google Colab](https://colab.google/).
Notebooks and these platforms are very popular amongst Data Scientists: they are used for prototyping and sharing experiments, and the platforms offer free access to GPUs (very useful for machine learning/AI).
Jupyter Notebooks can also be used to write presentations, e.g. with [RISE](https://rise.readthedocs.io/en/latest/), and even books, e.g. [Deep Learning for Coders with fastai and PyTorch](https://github.com/fastai/fastbook).

## Kaggle

As mentioned above, Kaggle is a platform for creating, editing and sharing Jupyter Notebooks. It also hosts competitions (usually AI/machine learning) and makes datasets available, e.g. Seattle Library's Collection Inventory [https://www.kaggle.com/datasets/city-of-seattle/seattle-library-collection-inventory](https://www.kaggle.com/datasets/city-of-seattle/seattle-library-collection-inventory)

For today's Code Club session, I have created a Notebook in Kaggle - link shared via Teams

Once you have the link open, and you are signed into Kaggle, you should get the option to *Copy & Edit* in the top right-hand side of the page:
    
   - clicking that will save a version of the notebook to your profile and allow you to execute the code, add, remove and edit sections and access it wherever you have an internet connection

The notebook is designed to be a whistle-stop tour through `python`, `pandas` and `plotly`. These are all huge topics, but this notebook takes inspiration from Fastai's [Jeremy Howard](https://jeremy.fast.ai/) and takes a top-down approach. A simple analogy to explain this approach is to imagine learning to play a sport or game as a child: the first step isn't to learn all of the rules and their intricacies, it's to play! And through doing so, the game and its rules begin to make more sense. In that spirit, read through the Notebook and if things don't immediately make sense, make a note and come back to them. For each section I've linked to other tutorials which go into more depth.

### 2 + 2 equals...

If you understood why the output to the above was what it was, then you're a Python expert (for this session at least) and if you didn't, no worries: you should have a Python expert somewhere nearby! 

## Python

Python is named in reference to Monty Python. This is important because it's indicative of the mindset of lots of Python developers. Tutorials and documentation often contain references to the comedy troupe:

  - the official input and output documentation is littered with references to *Monty Python and the Holy Grail* [https://docs.python.org/3.8/tutorial/inputoutput.html#the-string-format-method](https://docs.python.org/3.8/tutorial/inputoutput.html#the-string-format-method)
  - the first chapter of [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/2e/chapter1/) contains 44 instances of [spam](https://youtu.be/anwy2MPT5RE?si=g0Y-jedgdwK-oyyp)

The community of Python developers pride themselves on doing things the 'pythonic' way. There is an entire code writing style guide [PEP 8](https://peps.python.org/pep-0008/) that goes into fine detail. The key underlying principles of being *pythonic* can be summarised as follows:

  - code should be written in a readable manner
  - explicit is better than implicit
  - simple is better than complex

This approach makes Python easier to learn, improves maintainability and generally makes understanding others' (and sometimes your own) code easier.

By way of an example here is a comparison between JavaScript and Python: both code snippets will define a list of pet names, loop through them and display them to the user. 

**JavaScript:** has a more verbose syntax and is therefore less intuitive to read

  ```js
  const pets = ['Dave the dog', 'Sammy the snake','Leo the lion','Kozzy the kangaroo'];

  for (let i=0; i < pets.length; i++) {
    console.log(pets[i]);
  }
  ```



**Python:** has a less busy syntax and reads more like plain English:

In [4]:
pets = ['Dave the dog', 'Sammy the snake','Leo the lion','Kozzy the kangaroo']

for pet in pets:
    print(pet)

Dave the dog
Sammy the snake
Leo the lion
Kozzy the kangaroo
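
As a small extension of the loop above, here is a sketch of how each pet's position in the list could be shown as well. It uses Python's built-in `enumerate` function, which is not part of the original example, and the name `numbered_pets` is only illustrative.

```python
pets = ['Dave the dog', 'Sammy the snake', 'Leo the lion', 'Kozzy the kangaroo']

# enumerate pairs each item with a counter; start=1 makes the count begin at 1
numbered_pets = [f'{position}. {pet}' for position, pet in enumerate(pets, start=1)]

for line in numbered_pets:
    print(line)
```

There is still no index variable to manage by hand, which keeps the loop close to the plain-English reading described above.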


### Python Basics & Conventions

Python follows a series of conventions, many of which are observed by most other programming languages. Here is a selection of the most fundamental ones:

1. Scripts are read and executed from top-to-bottom

Much like a person reading English, the computer *reads* Python left-to-right and top-to-bottom, e.g. the code block below returns an error because `asparagus` is used before it has been defined

In [6]:
carrots = .50
peas = .25
potatoes = .25

sub_total = carrots + peas + asparagus + potatoes # this line will cause an error

asparagus = 2.00

# sub_total = carrots + peas + asparagus + potatoes 


NameError: name 'asparagus' is not defined
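
For comparison, here is a sketch of the same calculation with `asparagus` defined before the line that uses it. The prices are the ones from the cell above; only the order of the lines has changed.

```python
carrots = 0.50
peas = 0.25
potatoes = 0.25
asparagus = 2.00  # defined before it is used this time

# every name on the right-hand side now exists, so no NameError is raised
sub_total = carrots + peas + asparagus + potatoes

print(sub_total)
```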

2. Indentation is important

Python allows the programmer to define *blocks* of code that are only executed if particular conditions are met. These blocks are identified by indentation, which must be consistent within a block; the convention is 4 spaces per level. Blocks can also be nested within other blocks.

In Python the simplest way of controlling the flow is the use of `if`, `elif` (*i.e.* else if) and `else` statements

e.g. a simple temperature control system. Try changing the temperature variable to different numerical values to see how the outcome is affected

In [10]:
temperature = 11

if temperature < 20:
    # note the indentation
    print('Temperature is below 20, close the windows')
    if temperature < 10:
        print('Turn up the heating')

elif temperature < 30:
    print('Open the windows!')

else:
    print('Turn on the AC!')


Temperature is below 20, close the windows


**N.B.** most code editors (including Kaggle) will recognise that you're writing Python and automatically indent the conventional 4 spaces when appropriate.
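
To see every branch without editing the variable by hand, the same logic can be wrapped in a function and tried with several temperatures. This is only a sketch: functions have not been introduced yet, and the name `heating_advice` and the test temperatures are made up for illustration.

```python
def heating_advice(temperature):
    """Return the messages the temperature example would print."""
    messages = []
    if temperature < 20:
        messages.append('Temperature is below 20, close the windows')
        if temperature < 10:
            # nested block: only reached when the outer condition is also true
            messages.append('Turn up the heating')
    elif temperature < 30:
        messages.append('Open the windows!')
    else:
        messages.append('Turn on the AC!')
    return messages


for temperature in [5, 11, 25, 35]:
    print(temperature, heating_advice(temperature))
```

Only one of the `if`, `elif` and `else` blocks runs for any given temperature, while the nested `if` is checked only inside the first block.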

3. Python is very literal and precise

I once heard writing code described as *"teaching a rock how to think"*. To that end, a Python script will only do what you've instructed it to do. Sometimes you may think you've told it to do one thing and the results suggest otherwise, which can be (very) frustrating, but unfortunately it's usually our fault!

Because a Python script is interpreted very literally, it is important to be precise. This includes details that our brains have learned to ignore or be flexible about, e.g. capitalisation

In [4]:
my_fav_team = 'Manchester United'

your_fave_team = 'Manchester united'

print('Do we have the same favourite team? ', my_fav_team == your_fave_team)

Do we have the same favourite team?  False
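
If capitalisation should not matter for a comparison, one common approach is to convert both strings to the same case first. The sketch below uses the built-in string method `.lower()`; the variable `same_team` is an illustrative name that does not appear in the original cell.

```python
my_fav_team = 'Manchester United'
your_fave_team = 'Manchester united'

# .lower() returns a lower-case copy of each string, so the comparison ignores case
same_team = my_fav_team.lower() == your_fave_team.lower()

print('Do we have the same favourite team? ', same_team)
```

The original strings are left unchanged; Python still treats them as different unless told to ignore the case.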


### things to explain

- variables
- comparisons
- ints
- lists
- comments

In [None]:
x = '2'
y = '2'

print(x + y)
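
The cell above is a useful test of the *literal and precise* point: `x` and `y` are strings (note the quotes), so `+` joins them rather than adding them. The sketch below contrasts that with converting the strings to whole numbers using the built-in `int()`; the names `joined` and `total` are illustrative.

```python
x = '2'
y = '2'

joined = x + y            # both are strings, so + joins them end to end
total = int(x) + int(y)   # int() converts each string to a whole number first

print(joined)
print(total)
```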

### Python learning resources

- *Hello, Python* notebook. A good beginner's introduction to Python [https://www.kaggle.com/code/colinmorris/hello-python/tutorial#Your-Turn](https://www.kaggle.com/code/colinmorris/hello-python/tutorial#Your-Turn)


## Pandas

Another section, another silly name. **PANDAS** is a portmanteau of an economics term: **PAN**el **DA**ta (with an **S** added for good measure). It is a vast and very well-used Python package and is most useful for working with lots of data.

At the core of Pandas is the concept of the `DataFrame` which is essentially a representation of data in table form (similar to Excel or a database table). The table consists of rows of data, organised into columns and has an index. 

Pandas also provides lots of functionality to understand, shape and transform your data. It also has out-of-the-box support for reading and writing data in a variety of file formats, including Excel and CSV, as well as built-in database connectors. 

- 10 mins to Pandas [https://pandas.pydata.org/docs/user_guide/10min.html](https://pandas.pydata.org/docs/user_guide/10min.html)

In [None]:
import pandas as pd
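
As a minimal sketch of the ideas above, the cell below builds a small `DataFrame` by hand, inspects it, and writes it to CSV text and reads it back. The vegetable data is made up for illustration, and the CSV is kept in memory with `io.StringIO` rather than saved to a file, so nothing is written to disk.

```python
import io

import pandas as pd

# hypothetical sample data: each dictionary key becomes a column
df = pd.DataFrame({
    'vegetable': ['carrots', 'peas', 'potatoes', 'asparagus'],
    'price': [0.50, 0.25, 0.25, 2.00],
})

print(df.shape)             # (number of rows, number of columns)
print(list(df.columns))     # the column names
print(df['price'].sum())    # total of one column

# write the table out as CSV text, then read that text back into a new DataFrame
csv_text = df.to_csv(index=False)
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv)
```

Passing `index=False` leaves the row index out of the CSV, so the table that is read back has the same two columns as the original.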

## Plotly

Plotly is an open source graphing library. It began as a JavaScript library but has been adapted so that it interfaces with many different coding languages, including Python! The Python integration also interfaces with Pandas very nicely, so you can prepare your data in a Pandas DataFrame and then create some very nifty graphs with relative ease.

Plotly is not part of the Python standard library, meaning that it needs to be installed before we can use it. The most common method for installing Python packages is `pip install <package name>`, run as a shell (e.g. *bash*) command. In Jupyter, shell commands are indicated by a `!` at the beginning of the line, e.g. see below

In [None]:
! pip install plotly

*n.b.* `pip` stands for *"pip installs packages"*, which is a recursive name because the pip at the beginning also stands for *"pip installs packages"*: another fine example of Python programming humour.
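
Before or after running the install, it can be handy to check whether Python can actually find a package. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `is_installed` is made up for illustration, and it is only intended for top-level package names such as `plotly`.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find an importable top-level module with this name."""
    return importlib.util.find_spec(package_name) is not None


print(is_installed('json'))     # json ships with Python's standard library
print(is_installed('plotly'))   # depends on whether plotly has been installed
```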

## Scatter plot example

The following are all adapted from the *Basic Charts > Scatter Plots* Plotly docs page [https://plotly.com/python/line-and-scatter/](https://plotly.com/python/line-and-scatter/)

*n.b.* the *Plotly* docs will often make reference to their framework *Dash* - this is a more complete data analytics tool, similar to Power BI and a bit beyond the scope of this notebook.

The first plot below shows how simple it is to construct a scatter chart. The `x` and `y` arguments are given the lists of values to be plotted.

It's worth noting that all the examples follow common naming conventions, e.g. the figure to be plotted is stored in a variable called `fig`. This is another example of the Python community embracing readability.

In [5]:
import plotly.express as px
fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])
fig.show()

When you install Plotly, you are also provided with sample datasets [https://github.com/plotly/datasets](https://github.com/plotly/datasets)

The following example uses one of these datasets, `iris`, which is loaded as a Pandas DataFrame

In [8]:
import plotly.express as px
df = px.data.iris()

Before plotting anything, it can be useful/wise to use some Pandas methods to understand the data a bit better e.g.

  - `df.columns` to see the column names
  - `df.shape` to see the height and width of the data
  - `df.describe()` to give a statistical summary of the data
  - `df.info()` to describe the data type and *non-null* count for each column

In [14]:
print(df.columns)
print(df.describe())
df.info()  # info() prints its summary itself and returns None, so no print() is needed
print(df.shape)

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species',
       'species_id'],
      dtype='object')
       sepal_length  sepal_width  petal_length  petal_width  species_id
count    150.000000   150.000000    150.000000   150.000000  150.000000
mean       5.843333     3.054000      3.758667     1.198667    2.000000
std        0.828066     0.433594      1.764420     0.763161    0.819232
min        4.300000     2.000000      1.000000     0.100000    1.000000
25%        5.100000     2.800000      1.600000     0.300000    1.000000
50%        5.800000     3.000000      4.350000     1.300000    2.000000
75%        6.400000     3.300000      5.100000     1.800000    3.000000
max        7.900000     4.400000      6.900000     2.500000    3.000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    flo

Now we know a bit more about the *iris* data we can assign the column names to the `x` and `y` variables in the construction of the figure

In [6]:
fig = px.scatter(df, x="sepal_width", y="sepal_length")
fig.show()

In [13]:
fig = px.scatter(df, x="petal_width", y="petal_length")
fig.show()

Finally, more *keyword arguments* can be passed to `px.scatter` to further enrich the plot:

  - `size` is set to *petal_length* and will determine the size of the dot
  - `color` is set to *species* and will assign different colours to each species
  - `hover_data` is set to *petal_width* and will include that in the information displayed when each plot point is hovered over

This is a relatively small selection of the ways in which the scatter plot can be customised; the full documentation lists all of the options [https://plotly.com/python/reference/scatter/](https://plotly.com/python/reference/scatter/)

In [7]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
                 size='petal_length', hover_data=['petal_width'])
fig.show()

### Useful links:

- plotly official docs [https://plotly.com/python/](https://plotly.com/python/) 
- official pip docs [https://pypi.org/project/pip/](https://pypi.org/project/pip/)
- intro to pip [https://www.w3schools.com/python/python_pip.asp](https://www.w3schools.com/python/python_pip.asp) 