# Notebooks: like spreadsheets for data science experiments

- One place for your narrative, code, and visualizations.
- **Markdown** (a lightweight markup language that renders to HTML) for narrative
- **Python**, R, and a bunch of other languages for computation and analysis

Several Notebook options:
- Jupyter Notebook (very popular)
- Kaggle (cloud-based Jupyter)
- Google Colab (collaborative notebook)
- Observable (uses JavaScript/d3 instead of Python/matplotlib)
- SageMaker (cloud-based Jupyter by Amazon)

# Python: programming language for general computing

In [None]:
# Say hey.
print('hello punchcut 👋🏽')

In [None]:
# Split the bill. 
party_size = 8
total_charges = 200
individual_balance = total_charges / party_size
print('You owe ${:,.2f}'.format(individual_balance))

In [None]:
# Proclaim the greatest rappers of all time.
for i in range(5):
    print('{}. Dylan'.format(i + 1))

# NumPy: an array and linear algebra library for Python

In [None]:
import numpy as np

vector_a = np.array([5, 5])
vector_b = np.array([5, -5])

# These two vectors form a 90° angle. Dot product should be 0.
print('dot product =', np.dot(vector_a, vector_b))
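
To double-check the 90° claim, we can recover the angle itself from the dot product, using the identity cos θ = (a · b) / (|a| |b|). This is only a minimal sketch; `cos_theta` and `angle_degrees` are names introduced here for illustration.

```python
import numpy as np

vector_a = np.array([5, 5])
vector_b = np.array([5, -5])

# cos(theta) = dot(a, b) / (|a| * |b|)
cos_theta = np.dot(vector_a, vector_b) / (
    np.linalg.norm(vector_a) * np.linalg.norm(vector_b)
)

# arccos returns radians, so convert to degrees for readability.
angle_degrees = np.degrees(np.arccos(cos_theta))
print('angle in degrees =', angle_degrees)
```

A dot product of 0 gives a cosine of 0, which corresponds to a right angle.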

# pandas: data structure and stats library for Python

pandas reads Excel, CSV, JSON, and other formats into a DataFrame.
A DataFrame is like a spreadsheet that we can analyze and change with code.
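
To make the spreadsheet comparison concrete, here is a minimal sketch that builds a tiny DataFrame by hand and adds a computed column. The names (`toy_df`, `cases_doubled`) and the numbers are made up for illustration and are not part of the COVID dataset.

```python
import pandas as pd

# Made-up values, for illustration only.
toy_df = pd.DataFrame({
    'state': ['A', 'B', 'C'],
    'cases': [10, 25, 5],
})

# Add a column computed from an existing one, like a spreadsheet formula.
toy_df['cases_doubled'] = toy_df['cases'] * 2
print(toy_df)
```

Each dictionary key becomes a column and each list holds that column's values, one per row.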

In [None]:
import pandas as pd

# Convert a COVID case CSV file to a DataFrame.
covid19_cases_df = pd.read_csv('../input/coronavirus-covid19-data-in-the-united-states/us-states.csv')
covid19_cases_df

Let's see which states have the highest cumulative case count.

In [None]:
# Drop columns we don't need, then take each state's maximum (latest) cumulative count.
top_states_df = covid19_cases_df.drop(['date', 'fips'], axis=1)
top_states_df = top_states_df.groupby('state').max().sort_values(by=['cases'], ascending=False).reset_index()
top_states_df.head(5)
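
Because the counts are cumulative, each state's maximum is simply its most recent total. The sketch below runs the same drop, group, max, and sort steps on a tiny made-up table so the result is easy to verify by eye. The state names and numbers are invented for illustration.

```python
import pandas as pd

# Made-up cumulative counts for two fictional states.
toy_cases_df = pd.DataFrame({
    'date': ['2020-03-01', '2020-03-02', '2020-03-01', '2020-03-02'],
    'state': ['Alpha', 'Alpha', 'Beta', 'Beta'],
    'cases': [1, 4, 2, 3],
})

toy_top_df = (
    toy_cases_df.drop(['date'], axis=1)          # keep only state and cases
    .groupby('state')                            # one group per state
    .max()                                       # highest cumulative count per state
    .sort_values(by=['cases'], ascending=False)  # largest first
    .reset_index()                               # turn 'state' back into a column
)
print(toy_top_df)
```

Alpha ends at 4 cases and Beta at 3, so Alpha lands on the first row.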

# Matplotlib: data visualization library for Python

Let's draw the case curve for a specific state. We'll start by using **pandas** to create a state-specific DataFrame.

In [None]:
state = 'California'
state_cases_df = covid19_cases_df[covid19_cases_df['state'] == state]
state_cases_df.tail()

Now let's use **matplotlib** to graph the DataFrame.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(20, 5))
plt.plot(state_cases_df['date'], state_cases_df['cases'])
plt.plot(state_cases_df['date'], state_cases_df['deaths'])
plt.xticks(state_cases_df['date'][::5], rotation=45, ha='right')
plt.ylabel('Cumulative Cases & Deaths')  # raw counts; the data is not scaled to thousands
plt.legend(['Cases', 'Deaths'])
plt.title('Cumulative Cases & Deaths in ' + state)
plt.show()

# scikit-learn: machine learning library for Python

We use scikit-learn to fit models to our data. The models help us predict, classify, and make other decisions based on the data.

Model fitting is too involved to get into here. That's what [this other notebook](https://www.kaggle.com/epassi/median-housing-value-c-1990-california) is about.