# What is a Jupyter Notebook and how do you use it?

_*This introduction notebook was created taking Jupyter Notebook's and Colab's documentation as a starting point. For more information, please visit: [Jupyter](https://jupyter.org/) and [Colab](https://colab.research.google.com/notebooks/intro.ipynb#scrollTo=-Rh3-Vt9Nev9)._

### Introduction

The Jupyter Notebook is an interactive computing environment that enables users to author notebook documents that include:

- Live code
- Interactive widgets
- Plots
- Narrative text
- Equations
- Images
- Video

### Components 

The Jupyter Notebook combines three components:

- **The notebook web application**: An interactive web application for writing and running code and for authoring notebook documents.

- **Kernels**: Separate processes, started by the notebook web application, that run users' code in a given language and return the output to the notebook web application. Kernels also handle computations for interactive widgets, tab completion and introspection.

- **Notebook documents**: Self-contained documents that contain a representation of all content visible in the notebook web application, including inputs and outputs of the computations, narrative text, equations, images, and rich media representations of objects. Each notebook document has its own kernel.

### Web Application (Google Colab)

The advantage of using a web application is that no software installation is needed to interact with and execute the notebooks. More specifically, Google Colaboratory, or "Colab", allows you to write and execute code in the browser, see the results of computations in different media representations, and author narrative text using the "Markdown" language. The advantage of Colab over the default Jupyter Notebook web application is that you get all of Jupyter Notebook's features with zero configuration.

### Kernels

Through Jupyter's kernel and messaging architecture, the Notebook allows code to be run in a range of different programming languages. For each notebook document that a user opens, the web application starts a kernel that runs the code for that notebook. The default kernel runs Python code. Check [this notebook](https://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-1-Introduction-to-Python-Programming.ipynb) for an introduction to Python programming.
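
As a small illustration, the cell below asks the running kernel which Python implementation and version it uses. It relies only on Python's standard `sys` and `platform` modules; the variable name `kernel_info` is made up for this sketch, and the values you see depend on the environment you run it in.

```python
import platform
import sys

# Collect a few facts about the Python process behind this notebook (the kernel).
kernel_info = {
    "implementation": platform.python_implementation(),  # name of the Python implementation
    "version": platform.python_version(),                # version string of the interpreter
    "executable": sys.executable,                        # path of the interpreter binary
}

for key, value in kernel_info.items():
    print(f"{key}: {value}")
```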

### Notebook documents

Notebook documents contain the inputs and outputs of an interactive session as well as narrative text that accompanies the code but is not meant for execution. Notebook documents are just files on your local drive with a ".ipynb" extension.
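
Under the hood, an ".ipynb" file is a JSON text file. The sketch below builds a minimal notebook document by hand with Python's standard `json` module and reads it back, assuming version 4 of the notebook format. The file name `tiny_example.ipynb` and the cell contents are made up for illustration; in practice the web application writes these files for you.

```python
import json
import os
import tempfile

# A minimal notebook: one text (Markdown) cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Narrative text lives in Markdown cells"],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],            # stays empty until the cell is executed
            "source": ["print(24 * 60 * 60)"],
        },
    ],
}

# Write the notebook to a temporary folder (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "tiny_example.ipynb")
with open(path, "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Read it back and list the type of every cell.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```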

## Colab Basics
### Saving your own copy

To be able to edit and save your progress, you have to create your own copy of the notebook on your Google Drive. To do that, click on the "File" drop-down menu and select "Save a copy in Drive".

![alt text](https://drive.google.com/uc?export=view&id=1r2fu8Ja3l1ME34nAKmmsmqeR_SddllOf)

Once you have done that you can edit and save your progress by clicking on "Save" or by using the keyboard shortcut "Ctrl+S".




## Getting started

The document you are reading is not a static web page, but an interactive environment called a **Colab notebook** that lets you write and execute code.

For example, here is a **code cell** with a short Python script that computes a value, stores it in a variable, and prints the result:

In [1]:
# This is a comment

# Declare a variable
seconds_in_a_day = 24 * 60 * 60

# Print a number
print(seconds_in_a_day)

86400


To execute the code in the above cell, select it with a click and then either press the play button to the left of the code, or use the keyboard shortcut "Command/Ctrl+Enter". To edit the code, just click the cell and start editing.

Variables that you define in one cell can later be used in other cells:

In [None]:
seconds_in_a_week = 7 * seconds_in_a_day
seconds_in_a_week

Now it is time to try it for yourself. Modify the code in the cell below to calculate the number of seconds in a year and print that number.

In [None]:
# define below a variable named "seconds_in_a_year" using the "seconds_in_a_day" variable
# and print it

seconds_in_a_year

Colab notebooks allow you to combine **executable code** and **rich text** in a single document, along with **images**, **HTML**, **LaTeX** and more. When you create your own Colab notebooks, they are stored in your Google Drive account. You can easily share your Colab notebooks with co-workers or friends, allowing them to comment on your notebooks or even edit them. To learn more, see [Overview of Colab](/notebooks/basic_features_overview.ipynb). To create a new Colab notebook you can use the File menu above, or use the following link: [create a new Colab notebook](http://colab.research.google.com#create=true).

Colab notebooks are Jupyter notebooks that are hosted by Colab. To learn more about the Jupyter project, see [jupyter.org](https://www.jupyter.org).

## Interactive functions

There are multiple libraries available for notebooks that allow you to create **interactive** functions. One of them is _ipywidgets_. At the most basic level, it can generate a user interface widget that lets the user control the inputs of a function.

Run the cell below and see how you can easily compute the squares of the integers from 0 to 10 by moving the slider.

In [None]:
from ipywidgets import interact

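# Exercise: rename the function and modify it so that it prints the cube of a number "x"
def square(x):
    # Print the square of the current slider value
    print(x * x)

# (0, 10, 1) creates a slider that goes from 0 to 10 in steps of 1
interact(square, x=(0, 10, 1))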

Now modify the function above so that it calculates the cube.
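
To see what the slider does behind the scenes, here is a plain-Python sketch without any widget. The tuple `(0, 10, 1)` describes a minimum, a maximum and a step, and every slider position simply calls the function with one of those values. The helper names `slider_positions` and `squared` are made up for this sketch and are not part of _ipywidgets_.

```python
def slider_positions(minimum, maximum, step):
    """Return every value an integer slider with these settings can take."""
    # The maximum is included, so the range has to stop one past it.
    return list(range(minimum, maximum + 1, step))


def squared(x):
    """Return the square of x instead of printing it."""
    return x * x


positions = slider_positions(0, 10, 1)

# Evaluate the function once for every slider position.
squares = {x: squared(x) for x in positions}
print(squares)
```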

## Data science

With Colab you can harness the full power of popular Python libraries to analyze and visualize data. The code cell below uses **numpy** to generate some random data, and uses **matplotlib** to visualize it. To edit the code, just click the cell and start editing.

In [None]:
# Import the necessary libraries
import numpy as np
from matplotlib import pyplot as plt

# Generate 100 random numbers around 200
ys = 200 + np.random.randn(100)
# Generate an array of integers from 0 up to (but not including) the length of the ys array
x = np.arange(len(ys))

# Plot using plt.plot function from the matplotlib library
plt.plot(x, ys, '-')
plt.fill_between(x, ys, 195, where=(ys > 195), facecolor='g', alpha=0.6)

# Define plot title
plt.title("Sample Visualization")
plt.show()

One of the most common ways to store and work with tabular data is a **Pandas DataFrame**. Putting _ys_ and _x_ into a DataFrame makes the data easier to manipulate.

In [None]:
import pandas as pd

# put ys and x into a DataFrame, one column each (x keeps its integer type this way)
dataframe = pd.DataFrame({'ys': ys, 'x': x})

# setting x column as the index
dataframe = dataframe.set_index('x')

dataframe

In [None]:
dataframe.plot()
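
As an illustration of what easier data manipulation can look like, the sketch below rebuilds a similar table from scratch (so that it runs on its own) and then summarises and filters it. The seed `42`, the name `demo` and the threshold of 200 are arbitrary choices for this example.

```python
import numpy as np
import pandas as pd

# Rebuild a table like the one above, with a fixed seed so the numbers are repeatable.
rng = np.random.default_rng(42)
demo = pd.DataFrame({"ys": 200 + rng.standard_normal(100)})
demo.index.name = "x"

# Summary statistics (count, mean, std, min, quartiles, max) in one call.
summary = demo["ys"].describe()
print(summary)

# Keep only the rows whose value is above 200.
above = demo[demo["ys"] > 200]
print(len(above), "of", len(demo), "values are above 200")

# Add a new column computed from an existing one.
demo["deviation"] = demo["ys"] - demo["ys"].mean()
print(demo.head())
```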

You can import your own data into Colab notebooks from your Google Drive account, including from spreadsheets, as well as from GitHub and many other sources. To learn more about importing data, and how Colab can be used for data science, see the links below:

[Importing data in Colab](https://colab.research.google.com/notebooks/io.ipynb)

[Data analysis with Pandas](https://pandas.pydata.org/docs/getting_started/overview.html)

[Data visualization with Matplotlib](https://matplotlib.org/)

## Exercise

Now let's do some data analysis ourselves using the basic functionality of Python and Jupyter Notebooks. In 2013, [Crunchbase](https://data.crunchbase.com/docs/open-data-map) provided a snapshot of all the start-ups in the USA that got funding in that particular year. Run the cell below to upload the ".CSV" file that contains the data.

In [None]:
from google.colab import files
uploaded = files.upload()
filename = list(uploaded.keys())[0]

The Pandas library provides several functions that read data from files in different formats. In this case we are going to use "pd.read_csv".

In [None]:
data = pd.read_csv(filename)
data
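
If you want to try "pd.read_csv" without uploading anything, it also accepts file-like objects. The sketch below reads a tiny CSV from a string. The three companies and their amounts are invented for illustration; only the column names mirror the ones used later in this notebook.

```python
import io

import pandas as pd

# Invented sample data; the real file has many more rows.
csv_text = """name,category_code,status,funding_total_usd
Alpha Labs,software,operating,1000000
Beta Bio,biotech,closed,2500000
Gamma Soft,software,closed,500000
"""

# io.StringIO makes the string behave like an open file.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)   # (number of rows, number of columns)
print(toy.dtypes)  # the type pandas inferred for each column
print(toy.head())  # the first rows of the table
```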

Let's find out in which category the companies that managed to get the most funding were active.

To do that we need to perform several steps:
- set the category code as the table index
- group the data by category and sum the funding received by all the companies in each category
- sort the values to get a clearer picture

In [None]:
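# numeric_only = True leaves text columns (such as the company name) out of the sum
data_grouped = data.set_index('category_code').groupby(level = 'category_code').sum(numeric_only = True).sort_values(by = ['funding_total_usd'])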
data_grouped.plot(kind = 'bar', figsize = (20, 8), grid = True)
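
The same three steps are easier to follow on a table small enough to check by hand. The sketch below uses invented companies and amounts (only the column names match the dataset above) and passes `numeric_only=True` so that text columns are left out of the sum.

```python
import pandas as pd

# Invented sample data, small enough to add up by hand.
toy = pd.DataFrame({
    "name": ["Alpha Labs", "Beta Bio", "Gamma Soft", "Delta Web"],
    "category_code": ["software", "biotech", "software", "web"],
    "funding_total_usd": [1_000_000, 2_500_000, 500_000, 750_000],
})

toy_grouped = (
    toy.set_index("category_code")         # step 1: category becomes the index
       .groupby(level="category_code")     # step 2: one group per category ...
       .sum(numeric_only=True)             # ... summed over its companies
       .sort_values(by="funding_total_usd")  # step 3: smallest total first
)
print(toy_grouped)
```

The two software companies are merged into a single row, and the categories come out ordered from the smallest to the largest total.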

Quite interesting, isn't it? Some of these companies, however, have already closed. Let's see which category received the most funding among the closed companies.

In [None]:
# keeping only the closed companies
data_closed = data[data['status'] == 'closed']
data_closed
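
The expression inside the square brackets is a boolean mask: a column of `True`/`False` values, one per row, that selects which rows to keep. A small sketch on invented data (only the column names come from the dataset) makes the two steps visible.

```python
import pandas as pd

# Invented sample data.
toy = pd.DataFrame({
    "name": ["Alpha Labs", "Beta Bio", "Gamma Soft"],
    "status": ["operating", "closed", "closed"],
})

# Step 1: compare every row with 'closed' to get one True/False per row.
mask = toy["status"] == "closed"
print(mask.tolist())

# Step 2: use the mask to keep only the rows marked True.
toy_closed = toy[mask]
print(toy_closed)
```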

Calculate and plot it yourself. Write your code in the cell below.

In [None]:
# think about the steps you need to take to transform the data
data_closed_grouped = # data_closed.set_index('category_code').groupby(level = 'category_code').mean().sort_values(by = ['funding_total_usd'])
data_closed_grouped.plot(kind = 'bar', figsize = (20, 8), grid = True)

By running the cell below we are going to introduce another column in the dataset that contains the first letter of each company's name. 

In [None]:
# .str[0] takes the first character of each name and leaves missing names as missing
data['first_letter'] = data['name'].str[0]
data

## Question 1

Filter the table so that it contains only the companies whose names start with the same letter as your last name.

Out of your list, which three categories received the most funding?
Which three received the highest average funding per company? (Hint: use "mean()")

In [None]:
# total funding per category
data_total = 


In [None]:
# average funding per category
data_average = 

## Downloading to your local drive

To save the notebook on your local PC and be able to open it with other Jupyter Notebook applications, click on the "File" drop-down menu and select "Download .ipynb".

![alt text](https://drive.google.com/uc?export=view&id=1rXpG6hsG4I_j7NTiLAHyWEreakIxIIv3)
