# Welcome to Jupyter!

This repo contains an introduction to [Jupyter](https://jupyter.org) and [IPython](https://ipython.org).

Outline of some basics:

* [Notebook Basics](../examples/Notebook/Notebook%20Basics.ipynb)
* [IPython - beyond plain python](../examples/IPython%20Kernel/Beyond%20Plain%20Python.ipynb)
* [Markdown Cells](../examples/Notebook/Working%20With%20Markdown%20Cells.ipynb)
* [Rich Display System](../examples/IPython%20Kernel/Rich%20Output.ipynb)
* [Custom Display logic](../examples/IPython%20Kernel/Custom%20Display%20Logic.ipynb)
* [Running a Secure Public Notebook Server](../examples/Notebook/Running%20the%20Notebook%20Server.ipynb#Securing-the-notebook-server)
* [How Jupyter works](../examples/Notebook/Multiple%20Languages%2C%20Frontends.ipynb) to run code in different languages.

# Here are some of the basics!

* Jupyter is nice because it's a mix of both **code** and **markdown** (the same text formatting Discord uses!)
* Write code in cells and run individual cells to see output!
* Run a cell by hitting **Ctrl + Enter** 
* Cells must be run **in order** for code to work!
* Press **b** to create a new cell below the current selected cell

In [None]:
print("Click on this cell, then hit Ctrl + Enter to make it run!")

*Note: The **[1]** on the left of a code cell shows the order in which that cell was run. Try running the following 3 sets of cells in order to make them work!*

# Now to learn about Pandas!

![title](./img/pandas.png)

**pandas** is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

Library documentation: https://pandas.pydata.org/

# Start below

**First**, we must install and **import** pandas!

1. In the top right, click the **blue + button**. 
2. Select Terminal.
3. Type in "pip install pandas" and hit enter
4. Type in "pip install matplotlib" and hit enter

Once these four steps are done, we can run the line of code below to import pandas into our notebook!

In [None]:
import pandas as pd

In [None]:
# Pandas works well with lists and dictionaries!
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}

Then we can pass it to pandas to build something called a DataFrame! \
A DataFrame is a table with labeled rows and columns, which makes our data easier to read!


In [None]:
purchases = pd.DataFrame(data)
purchases

As seen above, the DataFrame turns each **key** into a column name and each list of **values** into that column's entries. The rows are numbered from 0 by default (this is the index). \
We can also set the index of our DataFrame ourselves, as in the cell below!

In [None]:
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])
purchases

Now the table above shows who bought how many of each fruit! Easier to read! \
We can also **loc**ate a specific customer! This looks up a row by its index label!

In [None]:
purchases.loc['June']
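
`loc` can also take a row label together with a column name, or a list of row labels. Here is a minimal sketch of both, rebuilding the same `purchases` table so the cell runs on its own. The variable names `june_apples` and `two_customers` are only illustrative.

```python
import pandas as pd

# Rebuild the purchases table from earlier so this cell is self-contained.
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

# A row label plus a column name returns a single value.
june_apples = purchases.loc['June', 'apples']

# A list of row labels returns a smaller DataFrame with just those rows.
two_customers = purchases.loc[['June', 'Lily']]

print(june_apples)
print(two_customers)
```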

## Reading from a CSV! (Spreadsheet file)

Check out the "purchases.csv" file on the left. This is a plain-text file that spreadsheet programs like Excel can open. \
CSV stands for **comma-separated values**. It's a very basic way to store a table as text.

In [None]:
df = pd.read_csv('purchases.csv')
df

In [None]:
# index_col=0 tells pandas to use the first column (position 0) as the index instead of adding a new numbered one.
df = pd.read_csv('purchases.csv', index_col=0)
df
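
To see what `index_col=0` changes without needing the file, here is a small sketch that reads CSV text from a string instead. The CSV contents are an assumption about what `purchases.csv` might look like (an unnamed first column of customer names), not a copy of the real file.

```python
import io
import pandas as pd

# Hypothetical CSV text; the real purchases.csv may look different.
csv_text = """,apples,oranges
June,3,0
Robert,2,3
Lily,0,7
David,1,2
"""

# Without index_col, pandas adds its own 0, 1, 2, ... index and keeps
# the names as an ordinary column (named "Unnamed: 0" here).
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row labels.
labeled = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())
print(labeled.index.tolist())
```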

## Here's another data set! Movies!!!
We can read this data, BUT there's a lot!

In [None]:
movies_df = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")

In [None]:
# To not read all of it, let's just look at the top!
movies_df.head()

In [None]:
# We can also look at the bottom!
movies_df.tail(2)

In [None]:
# This command tells us what data we're looking at and what data type it is!
movies_df.info()

## From this movie data, let's pull specific data!

In [None]:
# This will show us what our columns are!
movies_df.columns

In [None]:
# Now we see we have the column 'Revenue (Millions)', let's see how much money we got!
revenue = movies_df['Revenue (Millions)']

In [None]:
revenue.head()  # Revenue of the first five movies!

In [None]:
revenue_mean = revenue.mean()  # Let's calculate the mean of all movies!
revenue_mean
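
The mean is only one way to summarize a column. As a rough sketch, the cell below tries a few other summaries on a tiny made-up Series. The numbers are invented for illustration and are not taken from the movie data set.

```python
import pandas as pd

# Made-up revenue figures for illustration only.
sample_revenue = pd.Series([10.0, 20.0, 60.0], name='Revenue (Millions)')

sample_mean = sample_revenue.mean()      # average of the values
sample_median = sample_revenue.median()  # middle value when sorted
sample_max = sample_revenue.max()        # largest value

print(sample_mean, sample_median, sample_max)

# describe() gathers count, mean, std, min, quartiles, and max in one table.
print(sample_revenue.describe())
```

The same methods should work on the `revenue` column from the movie data.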

## With our data, we can search through it!

Let's look for movies made by a specific director! \
We can ask the condition **if the director is equal to "Ridley Scott"**

In [None]:
movies_df[movies_df['Director'] == "Ridley Scott"] 

In [None]:
# Beyond this, we can also check for ratings of 8.6 OR HIGHER, then show the first 3 results
movies_df[movies_df['Rating'] >= 8.6].head(3)
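
Conditions like the two above can also be combined. Here is a minimal sketch using a tiny made-up table. The director names, movie names, and ratings are placeholders, not rows from `IMDB-Movie-Data.csv`.

```python
import pandas as pd

# Tiny made-up table for illustration only.
sample_df = pd.DataFrame(
    {
        'Director': ['Director A', 'Director A', 'Director B'],
        'Rating': [8.7, 6.1, 9.0],
    },
    index=['Movie 1', 'Movie 2', 'Movie 3'],
)

# Each condition goes in its own parentheses; & means "and", | means "or".
is_director_a = sample_df['Director'] == 'Director A'
is_highly_rated = sample_df['Rating'] >= 8.6

both = sample_df[is_director_a & is_highly_rated]
either = sample_df[is_director_a | is_highly_rated]

print(both)
print(either)
```

Python's plain `and` and `or` do not work on whole columns, which is why `&` and `|` are used here.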

## This is all great, but it's a lot of text. 
## Time to visualize!

In [None]:
import matplotlib.pyplot as plt # import matplotlib's plotting module (pandas uses it to draw plots)
plt.rcParams.update({'font.size': 20, 'figure.figsize': (10, 8)}) # set font and plot size to be larger

In [None]:
# This creates an XY (scatter) plot
movies_df.plot(kind='scatter', x='Rating', y='Revenue (Millions)', title='Revenue (millions) vs Rating');

In [None]:
# We can make a Histogram!
movies_df['Rating'].plot(kind='hist', title='Rating');

In [None]:
# How about a box plot!
movies_df['Rating'].plot(kind="box");

![title](./img/boxplot.gif)

# Lab time! 

Again, there's a lot of information above! You don't have to remember it all!

For the lab, accomplish the following:
1. Create a histogram of the years of movies!
2. Create an XY coordinate plot of Year vs Metascore
3. Print out the mean Metascore.